13

I am mostly a self-taught when it comes to database designs. I am posing this question because I have settled on this common structure, but am wondering if it is the most efficient or 'industry standard' method.

Most databases I design have a user table, and then a persons activty is tracked in another table. I understand that the beauty of the database is to have these sorts of efficiencies, but the activity table will gather many many events fairly quickly just from every user using it regularly, thus becoming a huge table fairly quickly with moderate user usage. Is this best practice to just let it grow in this way? Or is a tier of tables, or splitting to different tables based on dates, or per amount of users, or something else?

+--------------------+                   +------------------------+
|   UserData         |                   |   Activity             |
+-=------------------+                   +------------------------+
| ID     (auto uint) | <--1-to-many-+    | ID  (auto uint)        |
| UserName (text)    |              +--> | UserID (uint)          |
| Email    (text)    |                   | Timestamp (time)       |
| additional info... |                   | Type (ID to elsewhere) |
+--------------------+                   | additional info...     | 
                                         +------------------------+

I just would like to know of where I can improve anything, as to help me learn.

CenterOrbit
  • 440
  • 3
  • 10

2 Answers2

6

You have made a very good observation. The Activity table will grow fast and large. What I have done in the past is archive off the older data (say older than 14 days) to an ActivityHistory table. Doing so keeps the Activity table to a manageable size and if you need to do research you can always look back at the ActivityHistory table.

5

Or is a tier of tables, or splitting to different tables based on dates, or per amount of users, or something else?

You may want to look into the concept of 'partitioning' in your database. Most RDBMSes have some support for them (eg, mysql, oracle, sql server, postgresql). Basically, you let the RDBMS handle the process of creating/managing the fact that each month/year/whatever is stored in a separate table, while the code accessing it treats it as one large table.

You could partition it by user name, date, or whatever's going to be used most frequently to access the data. (there are advantages / disadvantages of making it user-centric vs. date-centrid ... but I don't know if you want me going into all that)

Joe
  • 5,189
  • 1
  • 29
  • 39