8

I have an INNODB table levels:

+--------------------+--------------+------+-----+---------+-------+
| Field              | Type         | Null | Key | Default | Extra |
+--------------------+--------------+------+-----+---------+-------+
| id                 | int(9)       | NO   | PRI | NULL    |       |
| level_name         | varchar(20)  | NO   |     | NULL    |       |
| user_id            | int(10)      | NO   |     | NULL    |       |
| user_name          | varchar(45)  | NO   |     | NULL    |       |
| rating             | decimal(5,4) | NO   |     | 0.0000  |       |
| votes              | int(5)       | NO   |     | 0       |       |
| plays              | int(5)       | NO   |     | 0       |       |
| date_published     | date         | NO   | MUL | NULL    |       |
| user_comment       | varchar(255) | NO   |     | NULL    |       |
| playable_character | int(2)       | NO   |     | 1       |       |
| is_featured        | tinyint(1)   | NO   | MUL | 0       |       |
+--------------------+--------------+------+-----+---------+-------+

There are ~4 million rows. Because of the front-end functionality, I need to query this table with a variety of filters and sorts. They are on playable_character, rating, plays, and date_published. The date_published can be filtered to show by the last day, week, month, or anytime(last 3 years). There's also paging. So, depending on the user choices, the queries can look, for example, like one of these:

SELECT * FROM levels
WHERE playable_character = 0 AND
    date_published BETWEEN date_sub(now(), INTERVAL 3 YEAR) AND now()
ORDER BY date_published DESC
LIMIT 0, 1000;

SELECT * FROM levels
WHERE playable_character = 4 AND
    date_published BETWEEN date_sub(now(), INTERVAL 1 WEEK) AND now()
ORDER BY rating DESC
LIMIT 4000, 1000;

SELECT * FROM levels
WHERE playable_character = 5 AND
    date_published BETWEEN date_sub(now(), INTERVAL 1 MONTH) AND now()
ORDER BY plays DESC
LIMIT 1000, 1000;

I should add that rating and plays are always queried as DESC. Only date_published may be either DESC or ASC.

I started out with an index idx_date_char(date_published, playable_character) that worked great on the first example query here. Based on some other answers, I changed to two other indexes (date_published, playable_character, plays) and (date_published, playable_character, rating).

The first query still runs very fast, however there's some unusual things happening in EXPLAIN, when player_character = x exceeds a certain number of rows (~700,000): the USING WHERE pops on in EXPLAIN.

So, first question is are there any improvements in the query or indexes possible, and, second, what MySQL settings should get altered to allow for the large result sets.

Any suggestions greatly appreciated. TIA.

Hal50000
  • 269
  • 1
  • 6

3 Answers3

2
WHERE playable_character = 0 AND
    date_published BETWEEN date_sub(now(), INTERVAL 3 YEAR) AND now()

Start with the "=" item, then do the range:

INDEX(playable_character, date_published);

"Pagination", a la ORDER BY rating DESC LIMIT 4000, 1000; is best done by remember where you "left off". That way, you don't have scan over the 4000 records that you don't need.

András Váczi
  • 31,778
  • 13
  • 102
  • 151
Rick James
  • 80,479
  • 5
  • 52
  • 119
-1

create both indexes for better performance if not much of insert or update query:

ADD INDEX (playable_character, date_published, rating desc)

ADD INDEX (playable_character, date_published, plays desc)

otherwise if insert and update query are more using on this table than create index:

ADD INDEX (playable_character, date_published)

or don't create any index

András Váczi
  • 31,778
  • 13
  • 102
  • 151
Saddam Khan
  • 642
  • 1
  • 7
  • 25
-2

Everybody here are missing one point - SQLs in question are retrieving 1000 rows. And one cannot retrieve 1000 rows fast unless most of the data is cached. If data is not cached one has to do 1000 random reads sequentially to retrieve the data.

Good disk based storage with fast disks will give you up to ~200 reads per second. Common disks even in RAID I doubt that manage even 100. That means 10+ seconds to get results even with the best indexes.

So in the longer run such data model and queries won't work. Now the queries run fast when you hit cached data.

Mindaugas Riauba
  • 1,767
  • 1
  • 11
  • 11