3

In raw form, I have data with 2 columns: skills in welsh and stats for that skill. For storing these information in a database, would it be best to combine the stats with the categories, or split them up as below?

WELSH_SKILLS_TABLE
----------------------------------------------------
|  SKILL_ID   |                SKILL
----------------------------------------------------
|      0      |            CAN'T SPEAK
----------------------------------------------------


WELSH_STATS_TABLE
----------------------------------------------------
|  SKILL_FK   |                COUNT
----------------------------------------------------
|      0      |                 235
----------------------------------------------------

Instinctively, the laid out method seems to be correct, but for such a simple task, also seems like overkill?

Thanks!

BlackBox
  • 133
  • 4

2 Answers2

4

Without knowing anything else about your system, that seems like total overkill. You'd be better off with a single table that has ID, Skill, and SkillCount.

Having two tables for such a simple schema will result in unnecessary work for both the DBA and developer. The database engine will need to perform extra work to JOIN the two tables whenever you require reading both Skill and SkillCount at the same time.

If you had a really, really large dataset (think hundreds of millions or billions of rows) and you regularly needed to access just the Skill and rarely needed to show the SkillCount field, then perhaps you might consider using your design.

Hannah Vernon
  • 70,928
  • 22
  • 177
  • 323
4

You ask about normalization.

If you start with this

skill        count
--
Can't speak  235
...

it should be clear that the only candidate key is "skill". This relation is in 5NF.

There is no normalization guideline that introduces new columns like "skill_id". (There's no such thing as "now I have an ID number" normal form.)