1

I have a question.

I'm developing a Data Warehouse, I retrieve the data from my reconciled level, this in order to keep updated my Data Warehouse, but i am not sure how to handle some numeric fields that have a null value.

I was thinking to replace it with some special numeric value like -1 for example, that never appear in my column list. But during the analysis process this value replaced by -1 they must not be considered.

Now my question is, is better leave the null numeric field to null, or replace it with some numeric value?

traveller
  • 11
  • 2

1 Answers1

1

If you don't have a thorough understanding on the business meaning of the data point, you should NEVER put in a dummy or made-up value.

If this is a fact table, you will need to work with the business teams to understand the scenarios in which the NULL can occur.

If the column represents a monetary value, using zero works in most cases, but again, understanding the data point is necessary here too.

If you determine with the business teams that NULL can/should be represented by something else e.g. $0, or -1, etc., I recommend working with the folks responsible for the source data, and trying to affect the change there if possible. Then the appropriate business values flow through to the DWH in your ETL.

If the business determines a field/data point can legitimately have no value, then let there be nulls!

If the column is an identifier e.g. a primary key to another dimension or reference table, then it is a different story. In your dimension or reference table, there would typically be a row representing 'no value', which would have an id of it's own... This key would go in your fct table.

Regardless of what you work out with your business teams... ALWAYS document it all in your data dictionary, so that users of the data (and your ETL/DWH developers) have a reference back to what it was done that way.


Side Note: I hear lots of arguments for not allowing nulls which relate to "oh, what if users writing queries don't know how to properly account for NULLs in joins or where clauses, etc... This is never a good reason to support a design decision at the data model/etl level. This issue is best addressed at the data presentation or reporting layer. E.G. some BI tools have built in handling for this or abstract it away, etc.

GWR
  • 2,847
  • 9
  • 35
  • 42