What is an effective way of labeling columns in a database?

Question

I used to label columns in my databases like this:

user_id
user_name
user_password_hash

To avoid conflicts when joining two tables, but then I learnt some more on how to alias tables, and I stopped doing this.

What is an effective way of labeling columns in a database? Why?

score 33 · Answer 1 · edited Oct 25 '11 at 15:51

In your case, the prefix user is redundant. We (the devs in charge) know that this is the table user, so why add user_ prefix in front of every field?

What I would suggest to you is to do it with a more natural approach.

What are the characteristics of a Person: Last Name, First Name, Birthdate, Nationality, etc...

What are the characteristics of a Car: Model, Year, Color, Energy, etc...

Your column should be named as natural as possible, it would make the schema more clear for everybody, for you and the ones coming after you. This is also called the Maintenance phase, and anything you can do to make the maintenance easier is usually worth the effort.

score 16 · Answer 2 · answered Jan 03 '11 at 21:01

16

In addition to Spredzy's comment, label your primary keys the same ( ID ) so that when you're writing queries on the fly, you can easily recall (u.ID=c.ID) instead of having to look up "Was it countryID, country_ID, countries_ID, countriesID, ?"

answered Jan 03 '11 at 21:01

David Hall

2,295
1
18
11

bsoist · Answer 3 · 2011-01-03T21:23:51.230

9

I couldn't agree more with David Hall's addendum to Spredzy's excellent answer. Simple and natural is the way to go. Table confusion should not be an issue if you name tables naturally too.

No sense having users.user_id and cars.car_id when you could have users.id and cars.id

edited Jan 03 '11 at 21:23

answered Jan 03 '11 at 21:11

bsoist

391
2
5

score 8 · Answer 4 · answered Jan 03 '11 at 22:17

I would argue that in a database schema, every column should have a unique name, across tables. There are several reasons for that:

From a modeling point of view: You start with a soup of attributes and you normalize it down into tables. Over time, you might denormalize or normalize further or introduce views or materialized views, or introduce new tables. This is never a problem if all column names are unique.
You can use this join syntax: a JOIN b USING (a_id) JOIN c USING (a_id). Very convenient and also helps with the following point.
If you run queries with lots of joins or create materialized views with SELECT *, you will never (well, maybe rarely) a have a conflict. Think about joining person.name, product.name, country.name, etc. Urgh.
In general, if you have big queries, it's hard to keep track of what id means everywhere.

score 7 · Answer 5 · edited Aug 17 '11 at 20:33

Let's see, with your example it will look something like this:

USERS
----
id
username,
password
registration_date

I use the table name in uppercase. This lets me identify the table easily. The columns I just named is each for what it represents. I try not to use numbers or include any prefix or suffix with it. This will makes the queries dead simple and pretty straightforward.

BTW, I think you should find some style you like and stick with it. If you change it often, then you will have a messier DB schema.

score 5 · Answer 6 · answered Jan 04 '11 at 09:33

I'm working in an environment where each column name starts with a prefix derived from the table name, it is not my invention, but I'm quite happy with it.

Ideally column names are unique over all tables in the database.

Some observations:

we only need table aliases, when tables are joined multiple times in a select statement
it prevent some faults when copying code snippets, because column names must be adapted to the table name
it helps to show to which table a foreign key column points

General ideas: Most important is the consistency of each naming conventions: - singular vs. plural (ok that applies to tables and not columns) - identify primary and foreign keys (they build the structure vs the content of the database) - be consistent when you store strings and short variant of the same string - be consistent with flags, status etc.

score 5 · Answer 7 · answered Jan 04 '11 at 02:30

Like the others, I recommend that you do not include the table name as part of the column. Unless you've got hundreds of tables all with mostly similar column names: if you have multiple dozens of tables all with a column titled ID, then by all means prefix them with the table name.

I recently left a company where one of the developers preferred to prefix primary key and foreign key columns with pk and fk. This lead to some abominations where columns started with pkfk (usually a composite primary key based on 2 columns, of which one column was a foreign key to another table).

score 3 · Answer 8 · answered Jan 03 '11 at 21:01

3

I agree with Spredzy's answer but would add that as a matter of preference I would use camelCase instead of under_score.

firstName, lastName etc.

answered Jan 03 '11 at 21:01

Toby

1,128
2
12
12

Joe · Answer 9 · 2011-10-26T13:26:08.250

In the case of Oracle, you'll want to not name columns 'id' or 'name' or anything generic.

The issue is that by default in older versions, Oracle will attempt to join tables based on similar column names, so if I've named everything well, then I've also ended up specifying the default join clause between my tables.

But even if you're not using Oracle, by not chosing names that appear in multiple tables, it also means that you don't then have to go through the trouble of aliasing every time you have to do a select across two tables:

SELECT
  instrument.name as instrument_name,
  instrument.abbr as instrument_abbr,
  source.name     as source_name,
  source.abbr     as source_abbr,
  ...
FROM ...

So, if multi-table selects are the norm, longer column names save you typing. (if you're only using one table at a time ... do you really even need a relational database?)

... and saving typing brings us to another issue in Oracle -- at least in 8i (the current version when I took the Oracle SQL Tuning and Data Modeling courses) caching of execution plans is based on only the first so many characters of the query (can't remember the exact value ... 1024?), so if you have queries that only varies by something all the way at the end of the where clause, and a really long list of columns you're extracting, you can run into a performance hit as it can't cache the execution plan correctly.

Oracle had a guide on selecting what they claim are good table and column names, which is basically a guide for removing letters 'til it's about 5-8 characters, but I never much cared for it.

...

As things go other than that:

columns are always singular (tables are always plural)
all names are lower case, just in case there's something case-sensitive
as a result of the above, use underscores instead of camel case.

update : for those not familiar with Oracle's join behavior, see the last example on Mastering Oracle SQL : Join Conditions, where it mentions:

What happened? The reason lies in the fact that, aside from supplier_id, these two tables have another pair of columns with a common name. That column is name. So, when you ask for a natural join between the supplier and the part tables, the join takes place not only by equating the supplier_id column of the two tables, but the name column from the two tables is equated as well. Since, no supplier name is the same as a part name from that same supplier, no rows are returned by the query.

Under 'old join syntax' (8i and earlier), 'NATURAL JOIN' was the default join behavior, and I believe it still is if you don't specify a join condition. Once 'NATURAL JOIN' was an official option in 9i, the general recommendation was don't use it, because bad column naming can screw you up, which is my I'm advocating for good column names.

score 2 · Answer 10 · edited Mar 27 '18 at 19:58

Never use double-quotes " because in doing so, you override the native case-folding of the database. The SQL spec demands all identifiers be folded to upper case. Some databases, like PostgreSQL fold them to lower case. If nothing is quoted, it'll work in all databases and they can fold them to the spec or the rdbms-specific default.
Use an under_score (_), because as per above -- you shouldn't use camelCase.
use {entity}_id for ids (and foreign keys pointing to those ids). Because then you can use the USING clause. The globally-unique key names used in join-conditions is a convention established in the spec.
```
SELECT *
FROM employee
INNER JOIN department
  USING (department_id);

  -- compare to
  ON employee.department_id = department.department_id;
```

What is an effective way of labeling columns in a database?

10 Answers10

Linked