1

My colleague and I are having some trouble coming up with a database design, and I am doing my best to stay away from EAV for now.

We have an entity called Preparation that stores configuration data for machines. There are two types of processes per machine, let's call these process1 and process2, and a machine can only have one process. Both processes share a lot of attributes so setting up one table seemed logical.

However process2 has a lot of extra data, and even sub-categorizes itself with process2a, processes2b that is specific to those sub categories (process1 only has one extra attribute for now). So that one table will have nulls in columns that are not specific to its process.

Splitting the table for each process was another option. However, we need to preserve any data change and splitting the tables will bring more work (most all transactions will be INSERTS with a timestamp into a history table, followed by an UPDATE on the live table if they are updating a record). For example, if a machine changes from process1 to process2, then a record will have to be inserted in the history table from process1, deleted from process1 and then inserted into process2. We are using the standard edition of SQL Server, so no CDC to help with logging changes.

I am trying to keep these pretty relational and I would hate to create one table with expected NULLS (maybe create SPARSE columns if the NULL percentage is high?).

I am hoping someone has experience with this type of situation. Thank you.

Andriy M
  • 23,261
  • 6
  • 60
  • 103
jmzagorski
  • 167
  • 1
  • 9

2 Answers2

1

Which fields will you need to do any of the following on?

  • sorting
  • filtering
  • counting
  • grouping
  • type checking

If there are some fields that you simply want to store for viewing or what-not, then you don't need to put them in their own columns. You could store them as JSON or XML or some other serialized form.

But if you need to do the above on your fields, then you almost certainly want real database columns for them. And by real database columns I mean not EAV or LOBs.

If it sounds like a lot of work, then consider using Hibernate (with inheritance) to generate your database schema for you. And use Hibernate Envers to handle change capturing.

Neil McGuigan
  • 8,653
  • 5
  • 42
  • 57
1

Treat the various processes as sub-types. There will then be entity process_base, which contains all the common attributes, process1, process2, process2a etc. for the process-specific attributes.

Implement these as a table each. A view which combines them all together may simplify usage

create view process_all as
select <whatever>
from process_base
inner join process1 <etc>

union all

select <whatever>
from process_base
inner join process2 <etc>
...

This way you minimise the NULL columns (if that is desirable to you) but maintain the unity of "process" as a single idea.

Michael Green
  • 25,255
  • 13
  • 54
  • 100