Is having a single table with about 46 columns all related to the same entity a good idea or breaking it into 3 tables better?

Question

Let's start by explaining the entity that I'm thinking of breaking it into three tables and how the UI form is organized, and data entry is happening. There is a Person who has many properties, and these can be grouped into three parts (according to the UI form).

The first part is the Person's basic info, things like his name, date of birth, location, etc. It has 21 fields.
The second part is the Person's medical condition. This part has 5 fields and all of them could be null.
The third part relates to the Person's financial situation. This part has 20 fields and many of them could be null.

The UI form is divided into these three main sections (there are other sections, like document upload, but these will have their own separate tables)

Should I keep all these data in one table (Person) which will contain about 46 columns, or do I make three tables each for each section? Am I going to run into performance issues down the line because the Person table has 46 columns? I'm inclined towards having one table for simplicity and I don't see a good reason to have three tables.

Let's talk data access patterns.

The Person table will be the most accessed table in my app in terms of reads. The user will be able to see a paginated list of persons, and this query will select only about 3 or 4 fields.

The user will also be able to filter persons based on about 10 fields (most of them from basic info and some from the person's medical condition). Also, this query will have to do some joins with another tables whose data should not be in the Person table (like Photos).

For showing a single person, the query will select about 10 fields, 9 of them is from the person's basic info and 1 from his financials. It will also retrieve other data from different tables. The listing of persons and of a single person will be the two most used queries in my app.

Another query that will be used many times will be listing all the data of a single person, the 46 columns along with other columns from different tables. There will also be the occasional updates to the Person table but mostly it's reads.

I am using Postgres.

Part of the medical condition fields is "diseases" and these can be multiple. I just have an array of "diseases" as field for the medical condition part. But essentially, a Person can have only one "medical condition" so one array of "diseases" for each Person. A person will have one-to-one with each one of these 46 fields.

For instance, if I have M1, M2 fields related to medical condition and Person can only have one value of M1 and of M2, why can't I just stick M1 and M2 in Person table? What are the disadvantages? Same goes for the financial situation part.

The medical condition section in the UI form has different fields of different things all related to the person's medical situation, let's call them M1 and M2 (in reality it's more than 2). I thought that a different table (person_medical_condition) might house these M1 and M2 columns and references the Person table but why would I do that given that there is one-to-one mapping from the Person to these fields. A person can't have multiple M1s and M2s.

I could just add these M fields to the Person table even when some of them could be null. The same case goes for the finance part. I have different fields related to the Person's finances that could be added to the Person table given they also have one-to-one mapping. For instance, a Person's monthly income. He can't have multiple monthly incomes.

score 2 · Answer 1 · answered Dec 28 '24 at 07:59

The use-case is not sufficiently clear. If you are designing a medical records system, where each person can have many interactions over a period of time, each for a different problem, then standard normalisation techniques are what you need to apply. If, however, your system is for single snapshots, where each record is for a single problem and completely unrelated to any other problems for the same person, either before or after, then a single table would do, assuming that there is only a single value for each data item.

score 2 · Answer 2 · answered Dec 28 '24 at 13:47

For this issue, you require associative entities (more commonly known as joining tables) - there are many more names on the Wiki.

To answer your question, I did the following (all of the code above is available on the fiddle here):

Create tables:

person:

CREATE TABLE person
(
  id   INT4       NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
ssn INT8        NOT NULL UNIQUE, -- Don't know how your country does ssn's 
  fname TEXT      NOT NULL,        -- may want to CHECK here
  lname TEXT      NOT NULL,           --            "
  address1 TEXT   NOT NULL,        -- may have separate address table
  ---                                 for multi-member families - reference FK!
---   other fields
blood_type TEXT NOT NULL CHECK (LENGTH(blood_type) <=3), -- longest is 'AB+/AB-')
--    other fields
comment TEXT    NOT NULL
);

medical condition:

CREATE TABLE medical_condition
(
  id   SMALLINT NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  name TEXT NOT NULL
  --
  -- other condition details (treating physician/isolation/...
  --
);

financial situation:

CREATE TABLE financial_situation
(
  id SMALLINT NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
situation_type TEXT NOT NULL UNIQUE
);

Populate these:

person:

INSERT INTO person (ssn, fname, lname, address1, blood_type, comment) VALUES
(1001, 'fname_1', 'lname_1', 'address_1', 'A-',   '1 condition'),
(1002, 'fname_2', 'lname_2', 'address_2', 'A+',   '2 conditions'),
(1003, 'fname_3', 'lname_3', 'address_3', 'B+',   '3 conditions'),
(1004, 'fname_4', 'lname_4', 'address_4', 'B-',   '4 conditions'),
(1005, 'fname_5', 'lname_5', 'address_5', 'AB-', 'No M conditions'), 
(1006, 'fname_6', 'lname_6', 'address_6', 'AB+', 'No finances'),

(1007, 'fname_7', 'lname_7', 'address_7', 'O-',  'Neither M nor F');

medical:

INSERT INTO medical_condition (name) VALUES
('Dysentry'),
('Diphteria'),
('Cholera'),
('Typhoid'),
('Tuberculosis');  -- nobody has this!

finances:

INSERT INTO financial_situation (situation_type) VALUES
('Mortgage'),
('Retired'),
('Pension'),
('Stocks'),
('Shares'),
('Pot of Gold');  -- nobody has this!

Now, for our joining tables - see here for more on composite (multi-column) PRIMARY KEYs and their use in joining tables):

CREATE TABLE person_medical
(
  person_id INT4 NOT NULL REFERENCES person (id),
  condition_id SMALLINT NOT NULL REFERENCES medical_condition (id),
CONSTRAINT person_medical_pk PRIMARY KEY (person_id, condition_id)
-- Because of the PK, nobody can have the same condition twice!
);

Populate:

INSERT INTO person_medical VALUES
(1, 1),
(2, 1),
(2, 2),
(3, 1),
(3, 2),
(3, 3),
(4, 1),
(4, 2),
(4, 3),
(4, 4),
(6, 1); -- remember p5, no medical condition, p6 no fin and p7 nothing!

and person_financial:

CREATE TABLE person_financial
(
  person_id INT4 NOT NULL REFERENCES person (id),
  financial_id INT2 NOT NULL REFERENCES financial_situation (id),
CONSTRAINT person_financial_pk PRIMARY KEY (person_id, financial_id)
);

Populate:

INSERT INTO person_financial VALUES
(1, 1),
(2, 1),
(2, 2),
(3, 1),
(3, 2),
(3, 3),
(4, 4),
(5, 5);  -- remember p6, no fin and p7 nothing!

So, how is this useful? Well, to find out who has what condition(s) and how many conditions they have overall (more urgent?), we run the following SQL:

SELECT
  p.id AS pid, p.ssn, 
  pm.person_id AS pm_pid, pm.condition_id AS pm_cid, 
  mc.name, mc.id,
  COUNT(*) 
    OVER 
    (
      PARTITION BY p.ssn 
      ORDER BY p.id, pm.condition_id
      RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING 
    ) AS "No. conditions",
  p.comment
FROM
  person p
  JOIN person_medical    pm ON p.id = pm.person_id
  JOIN medical_condition mc ON pm.condition_id = mc.id
ORDER BY pid, mc.id;  -- Always have an ORDER BY!

Result (you can remove unwanted fields - left in for didactic purposes):

pid     ssn  pm_pid pm_cid  name      id No. conditions     comment
  1     1001      1      1  Dysentry       1    1   1 condition
  2     1002      2      1  Dysentry       1    2   2 conditions
  2     1002      2      2  Diphteria      2    2   2 conditions
  3     1003      3      1  Dysentry       1    3   3 conditions
  3     1003      3      2  Diphteria      2    3   3 conditions
  3     1003      3      3  Cholera        3    3   3 conditions
  4     1004      4      1  Dysentry       1    4   4 conditions
  4     1004      4      2  Diphteria      2    4   4 conditions
  4     1004      4      3  Cholera        3    4   4 conditions
  4     1004      4      4  Typhoid        4    4   4 conditions
  6     1006      6      1  Dysentry       1    1   No finances

and to find people with no medical condition:

SELECT
  p.id AS pid, p.ssn,
  COALESCE(pm.person_id, 0),
  p.comment  
FROM
  person p
  LEFT JOIN person_medical pm ON p.id = pm.person_id
WHERE 
  pm.person_id IS NULL
ORDER BY pid;

Result:

pid     ssn  coalesce   comment
  5     1005        0   No M conditions
  7     1007        0   Neither M nor F

Now, if you have a table such as this:

CREATE TABLE person
(
  id INT...
  ssn INT...
  -- 
  -- more fields
  --
  medical_condition_1 TEXT,
  medical_condition_2 TEXT,
  medical_condition_3 TEXT...
  -- 
  -- more medical conditions
  --
  financial_situation_1 TEXT,
  financial_situation_2 TEXT,
  --
  -- more financial condititions
  --
);

Trying to find out the two things I obtained fairly easily with SQL will be nothing short of a NIGHTMARE - a world of pain awaits you! I suggest that you try and do it? Try and produce a list of people by count of conditions using your schema?

What if you have one person with (different ordering!):

---
---
--- medical_condition_1 'Cholera',
--- medical_condition_2 'Typhoid'
---
and another person with:


--- medical_condition_1 'Typhoid',
--- medical_condition_2 'Cholera'

How are you going to find out all of the people who have Cholera?

SELECT
  pid, 
  medical_condition_1,
  medical_condition_1,
  medical_condition_1,
  --
  --  more - what's the limit - what happens if you add one more - your
  --  entire schema + your app will have to change - with my schema, you
  --  just add a condition to the `medical_condition` table!
  --
  medical_condition_n,  
  --
  --
WHERE medical_condition_1 = 'Cholera' 
   OR medical_condition_2 = 'Cholera'
   OR medical_condition_3 = 'Cholera'.
   OR ... OR... 
   OR medical_condition_n = 'Cholera';

All I have to do is:

--
-- People with Cholera
--
SELECT
  p.id AS pid, p.ssn, p.fname, p.lname, 
  mc.name AS "Disease"
FROM
  person p
  JOIN person_medical    pm ON p.id = pm.person_id
  JOIN medical_condition mc ON pm.condition_id = mc.id
WHERE mc.name = 'Cholera'
ORDER BY pid, mc.id;

Result:

pid  ssn   fname    lname    Disease
  3  1003  fname_3  lname_3  Cholera
  4  1004  fname_4  lname_4  Cholera

Trivial!

I cannot urge you strongly enough to do away with your idea of having constructs like M1 and M2... Mn. As shown above, this is nothing short of a recipe for disaster, sleepless nights and unhappy stakeholders!

Play a bit with LEFT JOINs and RIGHT JOINs and see what you come up with - try finding those people who have a Pension or are Retired using both scenaios - I'll leave all the fun of that to you! Last snippet is a list of conditions not present in your population and a COUNT() number of conditions per patient.

score 1 · Answer 3 · answered Dec 29 '24 at 07:15

1

If each person has only one personal information (likely), medical condition and financial situation, keep them all in a single table. Null values don't take up any space, and 46 columns is not excessive. – laurenz-albe

answered Dec 29 '24 at 07:15

Comment Answers

1
4

Is having a single table with about 46 columns all related to the same entity a good idea or breaking it into 3 tables better?

3 Answers3

--- other fields

blood_type TEXT NOT NULL CHECK (LENGTH(blood_type) <=3), -- longest is 'AB+/AB-')

-- other fields

--- medical_condition_2 'Cholera'

Linked