Optimizing index creation

Question

One thing that has bothered me for a while is creating a large number of indexes on a single table when the queries differ only slightly or share many of the same columns.

An example case would be:

table1
--------------
id
state
first_name
last_name
city
phone
cat_id

Now, imagine the below queries:

SELECT * FROM table1 WHERE state = 'WA' AND cat_id = 124;
SELECT * FROM table1 WHERE state = 'WA' AND cat_id = 124 AND first_name = 'bob';
SELECT * FROM table1 WHERE state = 'WA' AND cat_id = 124 AND phone = '12345' AND first_name = 'bob';
SELECT * FROM table1 WHERE city = 'seattle' AND last_name = 'jones' AND phone = '12345';
SELECT * FROM table1 WHERE city = 'seattle' AND cat_id = 124 AND phone = '12345' AND last_name = 'jones';
SELECT * FROM table1 WHERE city = 'seattle' AND cat_id = 124 AND phone = '12345' AND first_name = 'bob' AND last_name = 'jones';

Normally I would just create an index for each type, unless there are far too many combinations. However, this generally feels like I am just creating one big mess and, on large tables, substantially increasing INSERT/UPDATE/DELETE times.

Is there a better method to handle things like this?

Note: I don't have a table like this and I am also aware of leftmost indexing; this is purely an example case.

score 1 · Answer 1 · edited Apr 13 '17 at 12:42

Indexes on columns that hold only yes and no values (a cardinality of 2) would never get used. You would quickly subject table1 to table scans. Index merges would be out of the question: see Combining columns in index.
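A quick way to check how selective a column is before indexing it, sketched here against the person table from this answer (jacket stands in for any of the yes/no columns, and the Cardinality figure that SHOW INDEX reports is only an estimate):

```sql
-- Count the distinct values in a candidate column; a result of 2
-- means an index on it can only split the table into two groups.
SELECT COUNT(DISTINCT jacket) FROM person;

-- For columns that are already indexed, the Cardinality column
-- shows the optimizer's estimate of distinct values per index.
SHOW INDEX FROM person;
```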

In the given table, note that you could separate the clothing types from the person.

Here is your table. Let's call it person

CREATE TABLE person
(
    id INT NOT NULL AUTO_INCREMENT,
    state CHAR(2),
    jacket CHAR(3),
    coat CHAR(3),
    shoes CHAR(3),
    raincoat CHAR(3),
    sweater CHAR(3),
    pants CHAR(3),
    PRIMARY KEY (id),
    KEY (state)
);

Let's create two tables

Here is one to hold types of clothing

CREATE TABLE articles_of_clothing
(
    id INT NOT NULL AUTO_INCREMENT,
    article VARCHAR(20),
    PRIMARY KEY (id)
);
INSERT INTO articles_of_clothing (article) VALUES
('jacket'),('coat'),('shoes'),('raincoat'),('sweater'),('pants');

Here is one to hold types of clothing associated with a person

CREATE TABLE clothing
(
    id INT NOT NULL AUTO_INCREMENT,
    person_id INT NOT NULL,
    article_id INT NOT NULL,
    PRIMARY KEY (id),
    KEY person_article (person_id,article_id),
    KEY article_person (article_id,person_id)
);
INSERT INTO clothing (person_id,article_id)
SELECT A.id,B.id FROM person A,
(SELECT id FROM articles_of_clothing WHERE article='jacket') B
WHERE A.jacket='yes';
INSERT INTO clothing (person_id,article_id)
SELECT A.id,B.id FROM person A,
(SELECT id FROM articles_of_clothing WHERE article='coat') B
WHERE A.coat='yes';
INSERT INTO clothing (person_id,article_id)
SELECT A.id,B.id FROM person A,
(SELECT id FROM articles_of_clothing WHERE article='shoes') B
WHERE A.shoes='yes';
INSERT INTO clothing (person_id,article_id)
SELECT A.id,B.id FROM person A,
(SELECT id FROM articles_of_clothing WHERE article='raincoat') B
WHERE A.raincoat='yes';
INSERT INTO clothing (person_id,article_id)
SELECT A.id,B.id FROM person A,
(SELECT id FROM articles_of_clothing WHERE article='sweater') B
WHERE A.sweater='yes';
INSERT INTO clothing (person_id,article_id)
SELECT A.id,B.id FROM person A,
(SELECT id FROM articles_of_clothing WHERE article='pants') B
WHERE A.pants='yes';

Let's remove the clothing columns from the person table

CREATE TABLE person_new LIKE person;
ALTER TABLE person_new DROP COLUMN jacket;
ALTER TABLE person_new DROP COLUMN coat;
ALTER TABLE person_new DROP COLUMN shoes;
ALTER TABLE person_new DROP COLUMN raincoat;
ALTER TABLE person_new DROP COLUMN sweater;
ALTER TABLE person_new DROP COLUMN pants;
INSERT INTO person_new (id,state) SELECT id,state FROM person;
ALTER TABLE person RENAME person_old;
ALTER TABLE person_new RENAME person;
DROP TABLE person_old;

Now that they are separated, let's look at your queries. We'll use just the first one.

SELECT * FROM table1 WHERE pants = 'yes' AND sweater = 'yes';

How can you accomplish this under the new design? Do it in stages:

Create a Table to collect person_ids from each pass
Query the clothing table for all person_ids that have pants and collect them
Query the clothing table for all person_ids that have sweater and collect them
Count how many times each person_id was written and keep those written twice
Join all person_ids found twice back to person table

Here are those steps

CREATE TEMPORARY TABLE queryids (person_id INT NOT NULL,KEY (person_id));
-- Pass 1: collect every person_id that has pants
INSERT INTO queryids
    SELECT B.person_id FROM
    (SELECT id article_id FROM articles_of_clothing WHERE article='pants') A
    INNER JOIN clothing B USING (article_id)
;
-- Pass 2: collect every person_id that has a sweater
INSERT INTO queryids
    SELECT B.person_id FROM
    (SELECT id article_id FROM articles_of_clothing WHERE article='sweater') A
    INNER JOIN clothing B USING (article_id)
;
-- Keep the person_ids collected in both passes and join back to person
SELECT B.* FROM
(
    SELECT COUNT(1) rcount,person_id
    FROM queryids GROUP BY person_id
    HAVING COUNT(1)=2
) A INNER JOIN person B ON B.id = A.person_id;
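The same staged lookup can also be sketched as a single statement, without the temporary table. This is only an alternative formulation, assuming the person, clothing and articles_of_clothing tables defined above; the aliases P, C, AC and M are chosen for illustration.

```sql
-- Find people who own both pants and a sweater in one pass.
SELECT P.*
FROM person P
INNER JOIN
(
    SELECT C.person_id
    FROM clothing C
    INNER JOIN articles_of_clothing AC ON AC.id = C.article_id
    WHERE AC.article IN ('pants','sweater')
    GROUP BY C.person_id
    -- DISTINCT guards against the same article being stored twice
    HAVING COUNT(DISTINCT C.article_id) = 2
) M ON M.person_id = P.id;
```

To match a different number of articles, the value compared in HAVING has to equal the number of entries in the IN list.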

Since you only have 6 articles of clothing, this may not be the best solution. If you have over twenty articles of clothing, the Query Optimizer stands a better chance of using indexes. I say this because when the Query Optimizer has to read index entries for more than 5% of a table, it gives up and does a full table scan.

I have given this rule-of-thumb as a root cause to many index searching misadventures:

Dec 11, 2012 : How do I force a JOIN to use a specific index in MySQL?
Jun 18, 2012 : Why does MySQL only sometimes use my index for a range query?
May 07, 2012 : MySQL EXPLAIN doesn't show 'use index' for FULLTEXT
Mar 22, 2012 : Why does MySQL choose this execution plan?
Jan 18, 2012 : MySQL status variable Handler_read_rnd_next is growing a lot
Oct 20, 2011 : In MySQL, should I add an index even if the query that scans the table is only ran once a month?
Jul 12, 2011 : MySQL very slow query when changing one WHERE field despite no index/key
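To see whether the optimizer has fallen back to a table scan for a particular query, EXPLAIN can be run against it. A minimal sketch, assuming the person table above with its KEY (state); the value 'WA' and the alias rows_per_state are only examples:

```sql
-- How many rows share each state? A value that matches a large
-- share of the table is a likely candidate for a full scan.
SELECT state, COUNT(*) AS rows_per_state
FROM person
GROUP BY state
ORDER BY rows_per_state DESC;

-- In the EXPLAIN output, type = ALL means a full table scan;
-- a non-NULL key column names the index that was chosen.
EXPLAIN SELECT * FROM person WHERE state = 'WA';
```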

score 1 · Accepted Answer · answered Dec 30 '12 at 09:01

Usually, indexes shine in cases where the criteria given will narrow down the data set significantly.

For example, if you have a table of 10 million rows, and the few criteria will narrow it down to say 100 rows, then using proper indexes will help greatly.

So, in your hypothetical table, I would create a few composite keys led by the almost compulsory criteria.

index on (state, city, first_name, last_name): people typically search by state
index on (city, first_name, last_name): else they should know the city name
index on phone: if they know the phone, the rest seldom matters
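As a sketch, the three indexes above could be created on the question's table1 as follows. The index names are placeholders, not something from the original post, and the EXPLAIN at the end is just one way to check which index a given query picks.

```sql
-- Index names are placeholders chosen for this sketch.
ALTER TABLE table1
    ADD INDEX idx_state_city_name (state, city, first_name, last_name),
    ADD INDEX idx_city_name (city, first_name, last_name),
    ADD INDEX idx_phone (phone);

-- The key column of the EXPLAIN output shows which index, if any, was picked.
EXPLAIN SELECT * FROM table1
WHERE city = 'seattle' AND last_name = 'jones' AND phone = '12345';
```

Because of the leftmost-prefix rule, a query that filters on state but not city can still use the first index, though only its state column.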

While googling to see whether someone else gives a good explanation, I found this, which is worth a read as well.

Hope it helps.
