My query has a join using a BETWEEN clause and it causes a full table scan to occur if I understand the EXPLAIN output correctly.
Query
EXPLAIN
SELECT g.id, g.name, c.id AS child_id, c.date
FROM g
JOIN c
ON c.parent_id = g.id
AND c.date BETWEEN g.start_date AND g.end_date;
Keys
- Primary key of
g: (id, start_date) - Primary key of
c: (id) - Foreign key on
c.parent_id references g.id - Index on
c: (parent_id)
EXPLAIN output
+----+-------------+-------+------+-------------------+-------------------+---------+-------------+-------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------+-------------------+-------------------+---------+-------------+-------+----------+-------------+
| 1 | SIMPLE | g | ALL | PRIMARY | | | | 18342 | 100 | "" |
| 1 | SIMPLE | c | ref | idx_parent_id | idx_parent_id | 4 | square.g.id | 3 | 100 | Using where |
+----+-------------+-------+------+-------------------+-------------------+---------+-------------+-------+----------+-------------+
In the g (parent) table, there are no rows that overlap each other in start/end date for a given ID but the database would not know that since there is no way to indicate it as far as I know.
All of the date columns are of type date.
How can I avoid a full table scan, or if I can't, what can I do to improve the performance? As of now there are approximately 250k rows in the c table and 25.7k rows in the g table.
This query is used to create a view, and in the application there are many queries that select from this view and joins it with other tables, so it is very important to optimize this view.
What has been tried as per comments:
- Adding indices
g (id, start_date, end_date)andc (parent_id, date) - Replacing
c.date BETWEEN g.start_date AND g.end_datewithg.start_date <= c.date AND c.date <= g.end_date