When do I use another type index than integer? (eg string) [integer vs string index]

Question

I was reading about relational databases like sqlite and I always see the schema for user storage like:

UserID (index) | Username | Password

but wouldn't it be actually much faster to use Username as an index?

For example when a user logs in you query WHERE username == some_name, the database engine has to search through all record for this data, right? (namely SQLite? - I am using this)

Why do I always see this "ID" and not just username as the index?

Assuming all names will be unique, what is the point in an integer based index?

What is the actual difference in terms of lookup speed? Or space? (assuming a maximum character count at 24 characters).

Or would it be even better to just make a additional/separate lookup table for username<->userid resolving and then load the data with userid from the main table where the index is the userid?

score 4 · Accepted Answer · answered Jul 10 '13 at 04:32

There are a lot of places internally where it is possible to reference data using a user id, which is better than a string. For example, if you had 4 tables referencing the user table, they'd each have a string column which is less efficient than 4 integer indexed columns. This is the reason why most schemas use this layout; you're more likely to want to reference a user than to refer to the user by name.

However, you can also set username to be an index, which you might want to do if you are looking it up a lot.

To answer your edited question, a 24 character text field will require roughly 24 bytes. Depending on whether it's a varchar or a char it will be stored differently, but the full amount will be needed for searching, so let's consider them both to be 24.

An integer field will require 4 bytes.

You can see how this will add up over time. For comparisons, assume a linear time comparison, meaning 24 operations to compare two a string field, while only 4 operations to compare an integer field. In practice, they will probably take roughly the same amount of time, though the string comparison cannot be faster.

If you want to search by username, create a table with the schema of username | user_id. That'll let you search by username and join with the other data you are looking for while still being efficient.

score 2 · Answer 2 · answered Jul 10 '13 at 04:34

User ID's are guaranteed to be unique. Even if a user get's deleted, his ID will never be reused. This means that anything based on ID's will always be the same for a particular user. This is important for data integrity. It is also significantly less space to store an int vs a string (especially for larger numbers) and faster to process.

score 1 · Answer 3 · answered Jul 10 '13 at 04:31

1

You can do that...just change the primary key and the sql server will create an index over it.

How ever i don't see why you would do this.

answered Jul 10 '13 at 04:31

score 1 · Answer 4 · answered Jul 10 '13 at 04:43

I wouldn't use username as a PK, not just for arguable performance reasons, but because the username represent 50% of the credentials. If you need to log the user activity, export data, send a user reference to a Third Party, etc. it is more secure to use an ID then the customer username.

When do I use another type index than integer? (eg string) [integer vs string index]

4 Answers4

Linked