Improving performance of large SQL table for visitors stats

Question

I have a table for visitor stats as follows:

CREATE TABLE `stats` (
  `key` int(11) NOT NULL AUTO_INCREMENT,
  `u_id` int(11) NOT NULL,
  `app` varchar(20) NOT NULL,
  `type` varchar(20) DEFAULT '',
  `category` varchar(80) DEFAULT '',
  `sub_category` varchar(80) DEFAULT '',
  `date` int(11) NOT NULL,
  `ip` varchar(20) NOT NULL
  PRIMARY KEY (`key`)
) ENGINE=MyISAM AUTO_INCREMENT=41094490 DEFAULT CHARSET=utf8;

It was created without much thought a few years ago and has been collecting data ever since. It now has about 40 million rows.

Selecting by "app" takes about 1 minute to complete which is far too slow. I'm usually involved in PHP and Javascript but my MYSQL knowledge is pretty limited.

The table will only ever need rows added to it and for select queries to be made. The select queries are for generating graphs of usage stats on the frontend.

A typical query will select by matching 2-4 column values and will always be selected with a date range. Normally I would expect the date range tobe for a period of 1 week to 6 months, but there are cases where the user may want to view a few years of data. The priority is to optimise the former.

I don't mind having a separate database with this data sectioned off into different tables according to date, so have maybe one table per month of data. I can see how this would help queries of a few months at a time (using joins), but when it comes to data over a few years it would get even slower.

Is there anything else I can alter in terms of the tables structure to improve things? Would it be better separating the table into related tables with fewer columns?

The query I am using to test performance is:

select * from stats where u_id='123' and app='articles'

Using the EXPLAIN function on this I get:

"id","select_type","table","type","possible_keys","key","key_len","ref","rows","Extra"
1,"SIMPLE","stats","ALL",NULL,NULL,NULL,NULL,37462750,"Using where"

Ergest Basha · Accepted Answer · 2023-04-13T21:00:48.283

The table needs some changes.

Change Engine from MyISAM to InnoDB

The main differences between MyISAM and InnoDB

Overall, MyISAM is an older and less efficient storage engine than InnoDB. The most commonly noted differences between these two engines are as follows: InnoDB is more stable, faster, and easier to set up; it also supports transactions. It is the default storage engine of choice for a reason, and you can use the features of InnoDB with no need to worry about compatibility issues. If you need to store large amounts of data or ensure that transactions will work correctly, choose InnoDB.

The MyISAM engine is not very good at storing large amounts of data, because it stores everything in a single table. When you need to add data to the database, you have to lock the entire table, which can cause your database to stop working until it is unlocked. In the InnoDB engine, each row is stored separately in a separate table. This means that when you insert data into a MySQL database, you do not need to lock all rows.

A more detailed explanation here Which is faster, InnoDB or MyISAM?

Use proper date datatype not int. If by date you store the inserted date of the record I prefer using

insert_record_dt timestamp NOT NULL DEFAULT current_timestamp()
A typical query will select by matching 2-4 column values and will always be selected with a date range

We need to see those queries for better answer on the indexes.

For example if you are using the following query

select s.`key`
       s.`sub_category`
       ..
       ..
from stats s 
where sub_category='test' 
and `date` >='2023-03-01' ;

The needed index would be

INDEX sc_dt(`sub_category`,`date`)

If possible do not use Keywords and Reserved Words otherwise you need to use backticks

score 1 · Answer 2 · answered Apr 13 '23 at 19:32

For

where u_id='123' and app='articles'

needs

INDEX(u_id, app)  -- either order is good.

For

where `date` >='2023-03-01'  and sub_category='test';

needs

INDEX(sub_category, date)  -- in this order

More on building optimal indexes: Index Cookbook

select by matching 2-4 column values and will always be selected with a date range

If those columns are tested with =, start an INDEX with those columns, then add date at the end (because it is a range).

period of 1 week to 6 months, but there are cases where the user may want to view a few years of data

The shorter ones are likely to use an index and be fast, the longer ones may still take a long time.

If you have other example queries, I may provide more suggested indexes.

I do agree that InnoDB is better. (In fact, MySQL 8.0 has removed MyISAM.)

Improving performance of large SQL table for visitors stats

2 Answers2