6

I have an InnoDB table of customers, partitioned by KEY, with each partition being one US state. I have an index on phone number, and the performance is great. We also need to SELECT by last name in a given US state. Should I create an index on lastname AND state, or there's no point in doing that because the table is already partitioned by state?

Will that take up unnecessary space, since InnoDB should already know what partition is which state, and it only has to find all rows matching last name in that partition?

We aren't doing any joins whatsoever...this is just for a simple query like this:

SELECT * FROM table WHERE lastname = "Smith" AND state="NY"
RolandoMySQLDBA
  • 185,223
  • 33
  • 326
  • 536
tiredofcoding
  • 147
  • 2
  • 5

2 Answers2

2

You should create an index that involves lastname and maybe state,lastname.

Think of it. What would obligate a Query Optimizer to plan a query to search by lastname within a partition? The only course of action would be a full table scan within the correct partition.

Let's assume 50 U.S. States (no Puerto Rico, Guam, US Commonwealths, or Protectorates)

If you have 1 million rows, that would be average out 20,000 rows per state. A table scan of 20,000s for a lastname may not be that bad. However, we are talking about the United States. Huge numbers could live in New York, California, Texas and Florida. Smaller numbers will reside in Rhode Island, Idaho, Wyoming, and Montana. Search times via full table scans will vary greatly. In light of this, it would be beneficial to index names within each state.

I suggest two indexes

  • LastName : The second index will get the Query Optimizer to search for LastName in every partition. Doing 50 index lookups would sure beat a 50 full table scans any given day. You mentioned you have an index on phonenumber only and the performance was great. The same would have to apply to a LastName-only index.
  • State,LastName : The first index will get the Query Optimizer to search for LastName only down one partition, the state you selected (Maybe unnecessary if the Query Optimizer is intelligent enough to examine the query and select the correct partition).

UPDATE 2012-10-09 16:21 EDT

In all fairness to MySQL's partitioning, if the Query Optimizer is smart enough to navigate data retrieval from just one partition base on the WHERE clause, then indexing (state,lastname) may be unnecesaary. Nevertheless, you must still index on lastname. That way, each partition is indexing on lastname.

My original suggestion of indexing on (state,lastname) is actually based on how I used to index MyISAM tables that were combined under the MERGE storage engine.

Here were my past examples on MERGE tables:

Since Table Partitioning is implemented, indexing on lastname should suffice for you. In fact, recall you stated that the phonenumber is indexed and performance was great, So, I say for the second time, indexing lastname should also be as good for you.

RolandoMySQLDBA
  • 185,223
  • 33
  • 326
  • 536
1

Without partitioning

Based on what you said, PARTITIONing is useless for your table. Instead, have a non-partitioned table with

INDEX(phonenumber),
INDEX(lastname, state) -- in either order

They would be optimal for your two queries. The non-partitioned lookup by phonenumber would be better because it need look only one place, not 50 (plus merge the results, 49 of which are empty).

With PARTITION BY KEY(state)

INDEX(phonenumber), -- checks 50 places
INDEX(last_name)  -- sufficient...

WHERE lastname = "Smith" AND state="NY" will first "prune" to the one partition, then search the index specific to the NY partition. Adding state to the index (either first or last) would provide no advantage, and make the index larger (a slight disadvantage).

BTW, California has 12% of the population of the US; Wyoming has 0.2%. Such imbalance has minimal impact because its BTrees may not be any deeper.

Each partition has a bunch of "free" space; the overall free space for the table is about 50 times as much as with a non-partitioned table. For that reason alone, don't bother with partitioning unless you have at least a million rows.

Rick James
  • 80,479
  • 5
  • 52
  • 119