
I have a table with persons, and others with interactions, notes and tasks, all related to person(s). I need quick access to:

  • last interaction date
  • number of interactions in the past 1, 6, and 12 months
  • number of notes (total)
  • number of tasks

All this info is used for sorting and highlighting in a list/table. When the user selects a person, all related info can be requested from the DB; the question is how to get this data for all persons from the DB?

Version 1: Sub-query and add the info for each person in the table. Could become expensive as the tables grow.

Version 2: Store this info in the person table and update it when new info is created/modified. This violates the 'single source of truth' principle and could be prone to errors.

Any ideas?

SharpBCD

2 Answers


Without specifics such as which database system, version, and implementation (when applicable), your table schema, how much data there is and is anticipated to be, and the general growth pattern, it's hard to recommend one option over the other.

It looks like you're trying to do standard OLAP-type querying against your data, in which case both options are valid (to a degree). Most modern RDBMSs can handle OLAP-type queries against tables with rows into the multi-millions without blinking, so option 1 can be a sufficient solution up to a certain point, as long as proper architecture and indexing are adhered to. For example (and this is where it's important to know specifics like which database system you're on), Microsoft SQL Server offers columnstore indexing, which can immensely improve the performance of an OLAP-type query by efficiently compressing the data in a columnar format that favours aggregation. YMMV on these kinds of features depending on which database system you're using.
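For illustration, a minimal sketch in SQL Server syntax - the table dbo.interaction and its columns person_id and occurred_at are made up here, so adjust to your actual schema:

-- hypothetical names throughout; the point is the index type, not the schema
CREATE NONCLUSTERED COLUMNSTORE INDEX IX_interaction_columnstore
    ON dbo.interaction (person_id, occurred_at);

-- the kind of aggregation such an index serves well:
SELECT person_id,
       MAX(occurred_at) AS last_interaction,
       COUNT(*)         AS interactions_total
FROM dbo.interaction
GROUP BY person_id;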

Alternatively, when you get to the point where the data is just too much for your regular OLTP-architected database to handle, then pre-storing data rollups and the answers to your other OLAP-type queries in data cubes (loosely using the term here) makes sense. This is accomplished either by a real-time process (such as triggers) or by periodic recalculation at a rate acceptable to the business, and the result is considered the one source of truth for whatever state it's in at any point in time. But I would still follow some level of normalization and not store this information directly on the person table; rather, store it on a table that references the person table (by storing the person_id, for example), as sketched below.
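To make that concrete, a minimal sketch in PostgreSQL syntax - the rollup table and the column names (person_id, occurred_at) are hypothetical, and it assumes your person table has a person_id primary key:

-- the rollup lives in its own table, referencing person rather than widening it
CREATE TABLE person_rollup
(
  person_id        INTEGER PRIMARY KEY REFERENCES person (person_id),
  last_interaction TIMESTAMP,
  interactions_12m INTEGER NOT NULL DEFAULT 0
);

-- periodic recalculation (a trigger-based process could do this in real time)
INSERT INTO person_rollup (person_id, last_interaction, interactions_12m)
SELECT person_id,
       MAX(occurred_at),
       COUNT(*) FILTER (WHERE occurred_at >= now() - INTERVAL '12 months')
FROM interaction
GROUP BY person_id
ON CONFLICT (person_id) DO UPDATE
   SET last_interaction = EXCLUDED.last_interaction,
       interactions_12m = EXCLUDED.interactions_12m;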

J.D.

You have 4 entities - therefore 4 tables would appear to be a good idea.

  • person (or patient... - I will use the medical scenario because it's one with which I'm familiar).

  • interaction

  • note

  • task

A person doesn't change - their name might, or even their gender, but their past history is immutable - therefore a person table is an obvious Single Point of Truth.

The whole rationale behind database normalisation is "to reduce data redundancy and improve data integrity". Wiki is great, but no substitute for primary sources - read up on normalisation and get your db into at least 3rd Normal Form!

Also, remember that database schema design is an iterative process, and with a single question here I can't possibly know about all of your legal, ethical & regulatory constraints. So, I will give you a basic starting point, but neither I (nor any other poster here) can give you a solution "out-of-the-box" - if you want that, buy some software - or take a look at a few/many of the F/LOSS solutions available in this area.

Another word of advice would be to NEVER (in so far as is possible) allow users to enter free text - see the interaction_type table below as an example of how to enforce this within the database. Your db server is your last bastion of defence for your data - put as many constraints into the schema DDL as possible; this will prevent end-users (and even devs) from corrupting your system.

I have used PostgreSQL for my fiddle (here) - if you are considering a F/LOSS database server, then I would strongly urge you to choose it over any other Open Source offering - it is by far the most stable and standards-compliant and also has more (and/or better) functionality than any of the others in many areas (JSON, GIS, regexes...).

So, I would start out with a basic skeleton like this:

  • Person:

CREATE TABLE person
(
  person_id INTEGER GENERATED ALWAYS AS IDENTITY,
  p_f_name TEXT NOT NULL,
  p_l_name TEXT NOT NULL,
  p_is_male BOOLEAN,
  p_addr_1 TEXT NOT NULL,
  --
  -- more fields
  --
  CONSTRAINT person_pk PRIMARY KEY (person_id)
);
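Populate it with a couple of (entirely made-up) rows so the later examples have something to chew on:

INSERT INTO person (p_f_name, p_l_name, p_is_male, p_addr_1)
VALUES
('Ann', 'Smith', FALSE, '1 High Street'),
('Bob', 'Jones', TRUE,  '2 Low Road');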
  • Interaction_type - how to keep control of data input!

CREATE TABLE interaction_type
(
  interaction_type_id SMALLINT GENERATED ALWAYS AS IDENTITY,
  i_t_name TEXT NOT NULL,
  i_t_desc TEXT NOT NULL,
  CONSTRAINT interaction_type_pk PRIMARY KEY (interaction_type_id),
  CONSTRAINT i_t_name_lt_25 CHECK (LENGTH(i_t_name) < 25)
);

Populate it for demonstration purposes:

INSERT INTO interaction_type (i_t_name, i_t_desc)
VALUES
('Appointment', 'Scheduled appointment'), ('Emergency', 'Emergency appointment'),
('Walk in', 'Walk in appointment'), ('Home', 'Scheduled home visit'),
('Phone A', 'Scheduled telephone appointment'), 
('Phone U', 'Unscheduled telephone appointment'); -- &c.

With the best will in the world, people make mistakes - typos, added spaces... fatigue, distraction... - by restricting the interaction_type in the interaction table below, you can perform analyses accurately and not have to put in an endless number of OR conditions to take account of errors! There's an example query after the interaction DDL below.

Then:

  • Interaction:

CREATE TABLE interaction
(
  interaction_id INTEGER GENERATED ALWAYS AS IDENTITY,
  i_person_id INTEGER NOT NULL,
  i_date_time TIMESTAMP NOT NULL,
  i_type SMALLINT NOT NULL,
  CONSTRAINT interaction_pk PRIMARY KEY (interaction_id),
  CONSTRAINT i_person_fk FOREIGN KEY (i_person_id)
    REFERENCES person (person_id),
  CONSTRAINT i_type_fk FOREIGN KEY (i_type) 
    REFERENCES interaction_type (interaction_type_id)
);
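As promised above, because i_type can only contain values from interaction_type, counting interactions per type is a clean join - no ORs chasing 'Walk in'/'walk-in'/'Walkin' variants:

SELECT it.i_t_name,
       COUNT(*) AS interactions
FROM interaction i
JOIN interaction_type it
  ON it.interaction_type_id = i.i_type
GROUP BY it.i_t_name
ORDER BY interactions DESC;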
  • Note:

CREATE TABLE note  -- assumption - a note is always accompanied by an event
(                  -- change as required   
  note_id INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY,
  note_i_id INTEGER NOT NULL,
  note_text TEXT NOT NULL,
  CONSTRAINT note_pk PRIMARY KEY (note_id),
  CONSTRAINT note_i_fk FOREIGN KEY (note_i_id) REFERENCES interaction (interaction_id)
);
  • Task:

CREATE TABLE task
(
  task_id INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY,
  task_type TEXT NOT NULL,
  task_name TEXT NOT NULL, -- could have a table "procedure" - 'X-ray', 'Blood test'
  --                       -- and link via FOREIGN KEY
  task_ts TIMESTAMP NOT NULL,  -- could have the same blood test twice on the same day?
  --
  -- other fields
  --
  CONSTRAINT task_pk PRIMARY KEY (task_id)
);
  • Interaction_task. This final table is an example of (technical jargon) an Associative Entity - more commonly called a joining table (or linking, or m:n - there are many names). A given interaction may be associated with one or many tasks and a given task may be associated with one or many interactions. This is the standard way of modelling this scenario in an RDBMS - (see here for more details).

CREATE TABLE interaction_task -- could have series of tasks associated with 
(                              -- one interaction... - or just one...
  it_i_id INTEGER NOT NULL,
  it_t_id INTEGER NOT NULL,
  CONSTRAINT interaction_task_pk PRIMARY KEY (it_i_id, it_t_id),
  CONSTRAINT it_i_id_fk FOREIGN KEY (it_i_id) REFERENCES interaction (interaction_id),
  CONSTRAINT it_t_id_fk FOREIGN KEY (it_t_id) REFERENCES task (task_id)
);
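And, coming back to your original question: against this schema, your "Version 1" (compute on demand) is a handful of scalar sub-queries per person. A sketch, showing two of your three date windows:

SELECT p.person_id,
       p.p_f_name,
       p.p_l_name,
       (SELECT MAX(i.i_date_time)
          FROM interaction i
         WHERE i.i_person_id = p.person_id) AS last_interaction,
       (SELECT COUNT(*)
          FROM interaction i
         WHERE i.i_person_id = p.person_id
           AND i.i_date_time >= now() - INTERVAL '1 month') AS interactions_1m,
       (SELECT COUNT(*)
          FROM interaction i
         WHERE i.i_person_id = p.person_id
           AND i.i_date_time >= now() - INTERVAL '12 months') AS interactions_12m,
       (SELECT COUNT(*)
          FROM note n
          JOIN interaction i ON i.interaction_id = n.note_i_id
         WHERE i.i_person_id = p.person_id) AS notes_total,
       (SELECT COUNT(*)
          FROM interaction_task it
          JOIN interaction i ON i.interaction_id = it.it_i_id
         WHERE i.i_person_id = p.person_id) AS tasks_total
FROM person p;

An index on interaction (i_person_id, i_date_time) will keep those per-person lookups cheap well into the row counts mentioned below.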

As mentioned above, schema design is iterative and your final working schema will depend on many factors of which I'm unaware. The process works like this:

  • 1 modify schema

  • 2 debug/test

  • 3 back to step 1!

Come back to us with more specific questions when and if you have them - your question is a bit broad.

Concerning performance, you won't have to start worrying until you have millions of entries in your tables - any sort of small practice (medical, business, legal...) will be able to run this on a laptop using an HDD - not that I'd recommend this - HDDs can fail!

At the very least, have a cloud backup strategy and/or external HDD and perform your backups at a quiet time (03:30?) - and retain a paper trail.

When/if you become busy, you might want to invest in a small server - with RAID 1 (mirroring) using SSDs (prices coming down by the month) - should be able to cope with most needs in this sphere - although maybe by that time, you'll be able to afford a sysadmin/dba?

Vérace