I'm building an application to calculate different KPI metrics for customers of an ecommerce website e.g. (avg. order value, avg. items count and so on). KPIs are integer and or double values e.g. number of items bought, avg. order value, gross margin...
The application fetches orders data, calculate metrics and store them. I'm using MySQL as a relational database.
about metrics:
I currently have 10 metrics to calculate for each customer.
Metrics can increase in the future but not so frequently so I can consider "10" as quite definitive. Anyway altering schema in the future is not a problem at all.
I need to calculate each metric on a weekly basis (at minimum). Metrics are about customers.
about customers:
Customers are 30k and they are growing + 0.5k/month rate.
Not all customers buy with the same frequency. I can have occasional buyers but also heavy buyers.
I want to show a graph with the overall trend of a specific KPI in a given timespan.
I want to show a graph with the trend of a metric for a specific customer in a given timespan.
My entites are:
- orders
- customers
- customers_kpi
I'm worried about storing a huge amount of useless data
52 weeks * 30k users * 4+ years = 6.2M rows at minimum
I have 2 questions:
Should I store rows for customers without orders for a given timespan (e.g. the row will be all filled with NULL)? Can avoid it somehow without affecting data visualization?
Which table structure is more efficient ("thin" table vs "fat" table) given that not all customers buy every week and number of metrics are unlikely to change often?
I'm in doubt among these 2 structures for customers_kpi table:
| customer_id | kpi1 | kpi2 | kpi3 | ..kpiN | from | to |
|---|
VS
| customer_id | kpi_name | kpi_value | from | to |
|---|