
The company I work for produces a lot of test data in raw text files from lots of different devices. Each device belongs to a 'process', of which there are only a few dozen. Currently we have a program which takes this data and creates or appends to an Access database (currently .mdb, however we are hoping to change to .accdb because it's newer and better?) for the appropriate process.

We then use Excel to view the data from the databases in a pivot table. The database files themselves rarely get close to 1 GB (maybe a few a year) at which point we archive the database and create a new one as this is the size limit. We never need to access the data of multiple processes at the same time, so we have never had an issue with them being in separate databases.

Someone in our IT department has suggested that we switch to an SQL database, where all of the data for all processes will be stored, and the data will just have an extra column labelling the process it belongs to.

My worry is that in accessing this data through Excel we would only ever be querying one process at a time, but the database would have to filter through all of the other processes, which would surely take longer, since it is having to query dozens of times more data?

Is there any disadvantage to sticking with our current system of multiple databases, one for each process which we only access one at a time, instead of collecting them all into a large database? And conversely are there advantages of having a single large database system like this? Am I wrong and it would be faster to use a large SQL database than multiple small Access databases?

philipxy

3 Answers


In my opinion, Microsoft Access is an obsolete way to manage data, especially within an organization. There are a lot of limitations and missing features compared to a modern database system, besides the data being stored in a local file (as opposed to being centralized on a server). But people still use it that way, which I suppose is why Microsoft still supports it as an application.

My worry is that in accessing this data through excel we would only ever be querying one process at a time, but the database would have to filter through all of the other processes, which would surely take longer, since it is having to query dozens of times more data?

When architected and indexed properly, this would not be something to be concerned about. A B-Tree index (the standard type of index in Microsoft SQL Server) has a search time of O(log₂ n). This means if your table had 1 billion rows, it would only need to search through roughly 30 nodes of the B-Tree (at most) to find any particular subset of data (log₂(1 billion) ≈ 30).
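To sanity-check that figure, the binary-log depth can be computed directly (a minimal sketch of the bound above; real B-tree pages hold hundreds of keys each, so actual tree depths are far smaller still):

```python
import math

# Worst-case number of comparisons for a binary search over n rows,
# i.e. the O(log2 n) bound used in the B-tree argument above.
for n in (1_000, 1_000_000, 1_000_000_000):
    print(f"{n:>13,} rows -> about {math.ceil(math.log2(n))} steps")
```

So even a thousand-fold growth in row count only adds about ten extra steps to the search.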

So as the person in your IT department mentioned, if you indexed your table on the Process column (since it sounds like you only look at one process of data at a time), ideally it would only take milliseconds to locate any subset of process data, no matter how big the table was. Then loading that data off disk would take roughly the same time as it currently does in Microsoft Access, since it would be the same amount of data.
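A minimal sketch of that single-table design, using SQLite purely for illustration (the table and column names here are hypothetical, not taken from the question; SQL Server syntax would be essentially the same):

```python
import sqlite3

# Hypothetical schema: one wide table of test data with a 'process'
# column, mirroring the single-database design the IT department suggests.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_data (process TEXT, device TEXT, value REAL)")
conn.executemany(
    "INSERT INTO test_data VALUES (?, ?, ?)",
    [(f"process_{i % 20}", f"device_{i % 100}", float(i)) for i in range(10_000)],
)

# Index on the filter column so queries for one process skip the rest.
conn.execute("CREATE INDEX ix_process ON test_data (process)")

rows = conn.execute(
    "SELECT COUNT(*) FROM test_data WHERE process = ?", ("process_3",)
).fetchone()
print(rows[0])  # 500 rows for this one process out of 10,000 total
```

With the index in place, the engine seeks straight to the `process_3` entries instead of scanning all twenty processes' data.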

To hammer the point home, anecdotally, I used to work with databases with tables that were terabytes in size, holding tens of billions of rows, on a server no better provisioned than a modern laptop, and SQL Server took less than a second to locate any subset of the data.

J.D.

Education: full-featured databases (which Access is not) have a thing called Indexes*. When configured, these allow you to pull data by indexed values without visiting the rest of the table. You can easily get the rows you want from a table with millions or billions of rows in milliseconds to seconds. So that is not a reason to avoid changing.
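The effect of an index on the access path shows up in any SQL engine's query plan. A small SQLite illustration (the table name is made up; SQL Server's equivalent is its execution plan, e.g. via SET SHOWPLAN):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (process TEXT, value REAL)")

def plan(sql):
    # Each plan row's last column is a human-readable description of the step.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT * FROM readings WHERE process = 'A'"
print(plan(query))   # full table scan: there is no index to use yet

conn.execute("CREATE INDEX ix_readings_process ON readings (process)")
print(plan(query))   # now a SEARCH ... USING INDEX step instead of a scan
```

The same query goes from visiting every row to touching only the matching index entries, which is why table size stops mattering much for single-process lookups.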

Instead, let's lay out the disadvantages and advantages of your current approach:

Advantages of Your Current Setup

  • You're used to it. You have knowledge, processes and techniques for operating your current stack that you and your coworkers are used to.
  • Your equipment (server and network) is adequate for your current approach and does not need to be augmented.
  • You don't need to buy any new software.

Disadvantages

  • Reliability: one of the chronic issues with MDB and AccDB, historically, has been their tendency towards file bloat and eventual data corruption. Recovering requires complex backup procedures that the application itself does not support. This may be a very small or very large problem for you, depending.
  • Network access: MSAccess is not designed to be accessed over the network by multiple staff at the same time. Apparently that's not currently a requirement, but if it became one, you couldn't support it.
  • Won't support new analytics tools: should your office decide that they want to be able to use better analytical tools, like Tableau or SciPy, these will not be supported by your current setup.
  • No cross-process queries: right now you don't do analysis across different processes. Should you ever decide to, though, your current setup will not support it.

So, as you can see, most of the reasons to change are around enhanced capabilities. And only you and your coworkers can decide if these enhanced capabilities are worth the cost of changing systems.

(* MSAccess actually supports indexes, but given the structure of MDB they are not nearly as effective as indexes on SQL Server)

Josh Berkus

Your question already has two excellent answers. What I'd like to add is that your IT department could start by downloading a free Developer edition of an SQL database, for example MS SQL Server. They could then "play" with it. Importing MS Access files is easy, and they could then experiment with SQL queries; help on the web is plentiful. After that, you could decide whether to upgrade to a more advanced relational database.

ypercubeᵀᴹ