I have a query that is taking a particularly long time to run (15+ seconds) and it is only getting worse with time as my dataset grows. I have optimized this in the past, and have added indices, code-level sorting and other optimizations, but it needs some further refining.
SELECT sounds.*, avg(ratings.rating) AS avg_rating, count(ratings.rating) AS votes FROM `sounds`
INNER JOIN ratings ON sounds.id = ratings.rateable_id
WHERE (ratings.rateable_type = 'Sound'
AND sounds.blacklisted = false
AND sounds.ready_for_deployment = true
AND sounds.deployed = true
AND sounds.type = "Sound"
AND sounds.created_at > "2011-03-26 21:25:49")
GROUP BY ratings.rateable_id
The query's purpose is to get me the sound id's and the average rating of the most recent, released sounds. There are about 1500 sounds, and 2 Million ratings.
I have several indices on sounds
mysql> show index from sounds;
+--------+------------+------------------------------------------+--------------+----------------------+-----------+-------------+----------+--------+------+------------+————+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------+------------+------------------------------------------+--------------+----------------------+-----------+-------------+----------+--------+------+------------+————+
| sounds | 0 | PRIMARY | 1 | id | A | 1388 | NULL | NULL | | BTREE | |
| sounds | 1 | sounds_ready_for_deployment_and_deployed | 1 | deployed | A | 5 | NULL | NULL | YES | BTREE | |
| sounds | 1 | sounds_ready_for_deployment_and_deployed | 2 | ready_for_deployment | A | 12 | NULL | NULL | YES | BTREE | |
| sounds | 1 | sounds_name | 1 | name | A | 1388 | NULL | NULL | | BTREE | |
| sounds | 1 | sounds_description | 1 | description | A | 1388 | 128 | NULL | YES | BTREE | |
+--------+------------+------------------------------------------+--------------+----------------------+-----------+-------------+----------+--------+------+------------+---------+
and several on ratings
mysql> show index from ratings;
+---------+------------+-----------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+————+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------+------------+-----------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+————+
| ratings | 0 | PRIMARY | 1 | id | A | 2008251 | NULL | NULL | | BTREE | |
| ratings | 1 | index_ratings_on_rateable_id_and_rating | 1 | rateable_id | A | 18 | NULL | NULL | | BTREE | |
| ratings | 1 | index_ratings_on_rateable_id_and_rating | 2 | rating | A | 9297 | NULL | NULL | YES | BTREE | |
+---------+------------+-----------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
Here is the EXPLAIN
mysql> EXPLAIN SELECT sounds.*, avg(ratings.rating) AS avg_rating, count(ratings.rating) AS votes FROM sounds INNER JOIN ratings ON sounds.id = ratings.rateable_id WHERE (ratings.rateable_type = 'Sound' AND sounds.blacklisted = false AND sounds.ready_for_deployment = true AND sounds.deployed = true AND sounds.type = "Sound" AND sounds.created_at > "2011-03-26 21:25:49") GROUP BY ratings.rateable_id;
+----+-------------+---------+--------+--------------------------------------------------+-----------------------------------------+---------+-----------------------------------------+---------+——————+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+--------+--------------------------------------------------+-----------------------------------------+---------+-----------------------------------------+---------+——————+
| 1 | SIMPLE | ratings | index | index_ratings_on_rateable_id_and_rating | index_ratings_on_rateable_id_and_rating | 9 | NULL | 2008306 | Using where |
| 1 | SIMPLE | sounds | eq_ref | PRIMARY,sounds_ready_for_deployment_and_deployed | PRIMARY | 4 | redacted_production.ratings.rateable_id | 1 | Using where |
+----+-------------+---------+--------+--------------------------------------------------+-----------------------------------------+---------+-----------------------------------------+---------+-------------+
I do cache the results once obtained, so site performance is not much of an issue, but my cache warmers are taking longer and longer to run due to this call taking so long, and that is starting to become an issue. This doesn't seem like a lot of numbers to crunch in one query…
What more can I do to make this perform better?