I am looking to optimize a JOIN between tables that uses a small subset of columns in the second table.
CREATE TABLE `employees` (
`employee_id` bigint NOT NULL,
`manager_id` bigint NOT NULL,
`org_id` bigint NOT NULL,
`union_id` bigint NOT NULL
...
PRIMARY KEY (employee_id),
INDEX (union_id)
);
CREATE TABLE managers (
manager_id bigint NOT NULL,
org_id bigint NOT NULL,
some_condition boolean NOT NULL,
PRIMARY KEY (manager_id)
);
Now I want to optimize two types of queries. Both join the tables together on manager_id and org_id, and optionally apply a filter to the some_condition column.
SELECT employees.*
FROM employees
JOIN managers
ON (employees.manager_id = managers.manager_id AND employees.org_id = managers.org_id)
WHERE (employees.union_id = ? AND managers.some_condition);
SELECT employees.*
FROM employees
JOIN managers
ON (employees.manager_id = managers.manager_id AND employees.org_id = managers.org_id)
WHERE (employees.union_id = ?);
Assuming these are very large tables and employees > manager. I am trying to create an index on managers that will speed up the query. Right now the query is slow because for each row it has to read org_id and some_condition. I want to avoid going to disk if possible.
So far I have two indexes that might work:
INDEX `join_index` (`org_id`,`some_condition`)
INDEX `id_join_index` (`manager_id`, `org_id`, `some_condition`)
My main issue is that MySQL does not use either index in the EXPLAIN statement unless I force it to with use index (...).
Which index (if either) will speed up my query, and do I need manager_id in the index to speed up the join if I do not filter on some_condition?