Imagine a very basic example of your average discussion board. For example:
CREATE TABLE threads (
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
title VARCHAR(100),
PRIMARY KEY (id)
);
CREATE TABLE replies (
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
thread_id INT UNSIGNED NOT NULL,
text TEXT NOT NULL,
PRIMARY KEY (id),
INDEX thread_id (thread_id)
);
There can be very lengthy discussions (people love bikeshedding!), maybe 100k or 200k replies per thread and users can read them paginated (the number of replies per page is variable, depending on user preferences, but if needed for the solution it can be limited to a fixed set). These tables might have ~40 million replies and ~2 million threads.
So you might end up running this query to get the last replies of a thread:
SELECT * FROM replies
WHERE thread_id = 1234
ORDER BY id ASC
LIMIT 125400,10 /* whoops */
Which, as you know, is quite slow since MySQL has to walk 125,400 rows just to get there and return your 10 rows.
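To make it concrete where that offset comes from, here is a rough sketch of how the application would derive it from a page number. The page number (12,541) and page size (10) are assumptions picked to match the LIMIT above, and the statement name page_stmt is made up. A prepared statement is used because MySQL does not accept arithmetic expressions directly in a LIMIT clause.

```sql
-- Assumed values: page 12541 at 10 replies per page (not from any real data).
SET @per_page  = 10;
SET @page      = 12541;
SET @offset    = (@page - 1) * @per_page;  -- (12541 - 1) * 10 = 125400
SET @thread_id = 1234;

-- page_stmt is a hypothetical name; LIMIT accepts ? placeholders
-- in prepared statements.
PREPARE page_stmt FROM
  'SELECT * FROM replies
   WHERE thread_id = ?
   ORDER BY id ASC
   LIMIT ?, ?';

-- MySQL still has to read and discard @offset rows before returning @per_page.
EXECUTE page_stmt USING @thread_id, @offset, @per_page;
DEALLOCATE PREPARE page_stmt;
```

The cost grows with the page number, so the last pages of the longest threads are the slowest ones.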
Hacky solutions I've thought of:
Create a secondary index which assigns an incrementing number for each chunk of N posts (for example, a new field in the replies table which for the first 1000 posts contains 1, for the following 1000 it contains 2, etc).
- I have to heavily modify the application, since there are tons of queries that read the replies table. It's not just a simple SELECT here and there, and I really don't want to cripple and reengineer each query of the application.
- It would force me to recalculate each time that I delete a reply or when I do other destructive operations (splitting, merging, etc).
For each link to the next page, attach the ID of the next post. That way the database can go directly to the row using the primary key of the replies table.
- This is a web application, so this solution would have tricky SEO implications which I'd prefer not to deal with.
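For reference, this is roughly what I have in mind for the two ideas above. It is only a sketch: the column name chunk_no, the index name thread_chunk, the chunk size of 1000, and the reply ID 39876543 are all made up for illustration, and the code that fills and maintains chunk_no is left out.

```sql
-- Idea 1: a chunk number per block of 1000 replies within a thread.
-- chunk_no and thread_chunk are hypothetical names.
ALTER TABLE replies
  ADD COLUMN chunk_no INT UNSIGNED NOT NULL DEFAULT 0,
  ADD INDEX thread_chunk (thread_id, chunk_no, id);

-- Offset 125400 with chunks of 1000, numbered from 1:
--   chunk_no        = FLOOR(125400 / 1000) + 1 = 126
--   offset in chunk = 125400 MOD 1000          = 400
-- so MySQL skips at most 999 rows instead of 125,400.
-- This assumes a page never straddles two chunks, which holds here
-- because 10 divides 1000 evenly.
SELECT * FROM replies
WHERE thread_id = 1234
  AND chunk_no = 126
ORDER BY id ASC
LIMIT 400, 10;

-- Idea 2: the "next page" link carries the ID of the first reply
-- of that page (39876543 is a made-up value).
-- No offset at all: the range condition on id lets the query seek
-- to the starting row and read just 10 rows.
SELECT * FROM replies
WHERE thread_id = 1234
  AND id >= 39876543
ORDER BY id ASC
LIMIT 10;
```

Both variants avoid the long walk, but the first needs chunk_no renumbered after deletes, splits and merges, and the second changes what the page URLs look like, which is where the drawbacks listed above come from.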
I might be dreaming here (and if so please do tell me!) but is there a solution that resides (almost) exclusively in the database and allows me to fix this problem without heavily modifying the application?
I've read about MySQL partitions, but I'm not sure they would help here.