I have a little web-application that is using sqlite3 as it's DB (the db is fairly small).
Right now, I am generating some content to display using the following query:
SELECT dbId,
dlState,
retreivalTime,
seriesName,
<snip irrelevant columns>
FROM DataItems
GROUP BY seriesName
ORDER BY retreivalTime DESC
LIMIT ?
OFFSET ?;
Where limit is typically ~200, and offset is 0 (they drive a pagination mechanism).
Anyways, right now, this one query is completely killing my performance. It takes approximately 800 milliseconds to execute on a table with ~67K rows.
I have indexes on both seriesName and retreivalTime.
sqlite> SELECT name FROM sqlite_master WHERE type='index' ORDER BY name;
<snip irrelevant indexes>
DataItems_seriesName_index
DataItems_time_index // This is the index on retreivalTime. Yeah, it's poorly named
However, EXPLAIN QUERY PLAN seems to indicate they're not being used:
sqlite> EXPLAIN QUERY PLAN SELECT dbId,
dlState,
retreivalTime,
seriesName
FROM
DataItems
GROUP BY
seriesName
ORDER BY
retreivalTime
DESC LIMIT 200 OFFSET 0;
0|0|0|SCAN TABLE DataItems
0|0|0|USE TEMP B-TREE FOR GROUP BY
0|0|0|USE TEMP B-TREE FOR ORDER BY
The index on seriesName is COLLATE NOCASE, if that's relevant.
If I drop the GROUP BY, it behaves as expected:
sqlite> EXPLAIN QUERY PLAN SELECT dbId, dlState, retreivalTime, seriesName FROM DataItems ORDER BY retreivalTime DESC LIMIT 200 OFFSET 0;
0|0|0|SCAN TABLE DataItems USING INDEX DataItems_time_index
Basically, my naive assumption would be that the best way to perform this query would be to walk backwards from latest value in retreivalTime, and every time a new value for seriesName is seen, append it to a temporary list, and finally return that value. That would have somewhat poor performance for cases where OFFSET is large, but that happens very rarely in this application.
How can I optimize this query? I can provide the raw query operations if needed.
Insert performance is not critical here, so if I need to create an additional index or two, that's fine.
My current thoughts are a commit-hook that updates a separate table that is used to track only unique items, but that seems like overkill.