43

Is there any hard and fast rule to decide what columns and in which order it should be put in Included in non clustered index. I was just reading this post https://stackoverflow.com/questions/1307990/why-use-the-include-clause-when-creating-an-index and I found that for the following query :

SELECT EmployeeID, DepartmentID, LastName
FROM Employee
WHERE DepartmentID = 5

The poster suggested to make index like this:

CREATE NONCLUSTERED INDEX NC_EmpDep 
  ON Employee(EmployeeID, DepartmentID)
  INCLUDE (Lastname)

here comes my question why can't we make index like this

CREATE NONCLUSTERED INDEX NC_EmpDep 
      ON Employee( EmployeeID, DepartmentID, LastName)

or

    CREATE NONCLUSTERED INDEX NC_EmpDep 
          ON Employee( EmployeeID, LastName)
INCLUDE (DepartmentID)

and what thing leads the poster to decide to keep the LastName column included. Why not other columns? and how to decide in what order we should keep the columns there?

4 Answers4

54

That index suggestion by marc_s is wrong. I've added a comment. (And it was my answer accepted too!)

The index for this query would be

CREATE NONCLUSTERED INDEX NC_EmpDep 
  ON Employee(DepartmentID)
  INCLUDE (Lastname, EmployeeID)

An index is typically

CREATE INDEX <name> ON <table> (KeyColList) INCLUDE (NonKeyColList)

Where:

  • KeyColList = Key columns = used for row restriction and processing
    WHERE, JOIN, ORDER BY, GROUP BY etc
  • NonKeyColList = Non-key columns = used in SELECT and aggregation (e.g. SUM(col)) after selection/restriction
gbn
  • 70,237
  • 8
  • 167
  • 244
19

JNK and gbn have given great answers, but it's also worth considering the big picture - not just focusing on a single query. Although this particular query might benefit from an index (#1):

Employee(DepartmentID) INCLUDE (Lastname, EmployeeID)

This index does not help at all if the query changes slightly, such as:

SELECT EmployeeID, DepartmentID, LastName
FROM Employee
WHERE DepartmentID = 5 AND LastName = 'Smith'

This would need the index (#2):

Employee(DepartmentID, LastName) INCLUDE (EmployeeID)

Imagine you had 1,000 employees in Department 5. Using index #1, to find all the Smiths, you'd need to seek through all 1,000 rows in Department 5, as the included columns are not part of the key. Using index #2, you could seek directly to Department 5, LastName Smith.

Index #2 is thus more useful at servicing a wider range of queries - but the cost is a more bloated index key, which will make the non-leaf pages of the index larger. Every system will be different, so there's no rule-of-thumb here.


As a side note, it's worth pointing out that if EmployeeID was the clustering key for this table - assuming a clustered index - then you don't need to include EmployeeID - it's present in all non-clustered indexes, meaning index #2 could just be

Employee(DepartmentID, LastName)
7

I'm not sure how you got that first one. For me, for that query, I would use:

CREATE NONCLUSTERED INDEX NC_EmpDep 
  ON Employee(DepartmentID)
  INCLUDE (EmployeeID, Lastname)

There's not a "Hard and fast rule" for pretty much anything in SQL.

But, for your example, the only field the index will use is DepartmentID because it's in the WHERE clause.

The other fields just need to be easily accessible from there. You select based on DepartmentID then the INCLUDE has those fields at the leaf node of the index.

You don't want to use your other examples because they wouldn't work for this index.

Think of an index like a phone book. Most phone books are ordered by Last Name, First Name, Middle Initial. If you know someone's first name, but not their last name, the phone book does you no good since you can't search for first name based on the order of the phone book's index.

The INCLUDE fields are like the phone number, address, etc. other information for each entry in the book.

EDIT:

To further clarify why not to use:

CREATE NONCLUSTERED INDEX NC_EmpDep 
          ON Employee( EmployeeID, LastName)
INCLUDE (DepartmentID)

This index is only useful if you have either EmployeeID or BOTH EmployeeID and LastName in your WHERE clause. This is pretty much the OPPOSITE of what you need for this query.

JNK
  • 18,064
  • 6
  • 63
  • 98
-1

I think you might still be able to use the (employee_id, department_id) index, but you'd have to include a 'dummy' line in the where phrase, like: "employee_id = employee_id)

  • having an index on (employee_id, departemnent_id),
  • having to search / restrict only on a department_id
  • knowing it won't use the index since wrong order (or things have changed by now, and the following "trick" is no longer needed. I'm an "oldy"?).
  • Use the "old" tricK?

    select * from Employee emp
    where emp.employee_id = emp.employee_id
    and emp.department_id = 5

(So I'm not focusing on the include part here of Lastname, but on the yes/or not being used of key.)

Kind regards,

Miguell