I've been a software developer for 8+ years now, and I think I have a pretty good grasp of the concepts, but as an entirely self-taught developer I have some unfortunate gaps in proper terminology. Understanding concepts without knowing their names has not been a problem so far, but I'm writing some tutorials now and would very much like to get everything right.

What I'd like to verify with you, my peers, is the use of the term „data normalization“. I have a very simple example in XML with which I'm trying to demonstrate normalized vs. not normalized data:

<!-- Normalized address -->
<person>
    <address>
        <city>Horní Dolní</city>
        <zip>123 00</zip>
        <street>U slepých</street>
        <house_no>53</house_no>
    </address>
</person>
<!-- Not normalized address -->
<person>
    <address>U slepých 53, Horní Dolní, 123 00</address>
</person>

I have always thought that splitting larger chunks of data into smaller, more specific pieces is a perfect example of data normalization. (Or, in the context of DBs, having several specific columns instead of one „address“ column, like in the XML above.) But the more I google the topic, the stronger my feeling that perhaps I have no clue what normalization actually is :D

So, do you concur with my data normalization example? Or am I off?

1 Answer

The answer is sort of a grey area. By the book definition, which some people such as philipxy appear to be well versed in, I don't believe your example exhibits normalization.

But even philipxy's linked answer, which gives a number of examples of common misconceptions about the book definition of normalization, concludes with: "'Normalize' also has other generic & specific uses both applicable to and outside of database design." I think that agrees with my take that, in a loose sense, you can say you are normalizing your data.

To me, one purpose of normalization is to improve data integrity. So I personally loosely call any kind of change that improves data integrity a step in the direction of normalization.

By breaking a single Address field into multiple related fields such as City, Zip, Street, etc., you're improving your data integrity, so I think it's OK to call it normalization. For example, if a street name changes, having Street in its own column minimizes the risk to data integrity when you update that name. With a single Address field, you would have to do some kind of "contains" search, which is at risk of missing records during the update.
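To make the update risk concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table names (`person_split`, `person_single`), the people, and the addresses are all made up for illustration; they are not from your tutorial. It shows the flip side of a "contains" search missing records: the substring match also touches a record it shouldn't, because the street name happens to appear inside a city name.

```python
import sqlite3

# In-memory database with two hypothetical layouts of the same data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person_split (name TEXT, street TEXT, house_no TEXT, city TEXT, zip TEXT);
    CREATE TABLE person_single (name TEXT, address TEXT);
""")

# Made-up sample data: Karel's *city* contains the street name "Lipová".
people = [
    ("Anna", "Lipová", "5", "Praha", "110 00"),
    ("Karel", "Nádražní", "2", "Lipová-lázně", "790 61"),
]
conn.executemany("INSERT INTO person_split VALUES (?, ?, ?, ?, ?)", people)
conn.executemany(
    "INSERT INTO person_single VALUES (?, ?)",
    [(n, f"{s} {h}, {c}, {z}") for n, s, h, c, z in people],
)

old, new = "Lipová", "Javorová"

# Split columns: an exact match on the street column only.
split_changed = conn.execute(
    "UPDATE person_split SET street = ? WHERE street = ?", (new, old)
).rowcount

# Single column: a substring ("contains") search plus a text replacement.
single_changed = conn.execute(
    "UPDATE person_single SET address = replace(address, ?, ?) "
    "WHERE instr(address, ?) > 0",
    (old, new, old),
).rowcount

karel_city = conn.execute(
    "SELECT city FROM person_split WHERE name = 'Karel'"
).fetchone()[0]
karel_address = conn.execute(
    "SELECT address FROM person_single WHERE name = 'Karel'"
).fetchone()[0]

print(split_changed, karel_city)      # one row changed, Karel's city untouched
print(single_changed, karel_address)  # two rows changed, Karel's city rewritten
```

With the split columns only Anna's row is updated. With the single field both rows match, and Karel's city is silently rewritten along with Anna's street. Records get missed in a similar way when the free-text address is spelled or formatted inconsistently.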

J.D.