I'm attempting to architect a system that will collect estimates of various quantities (10+) for each of many (300K+) objects, and make decisions from the historical record of those estimates. For example, we have a number of human and automated processes that may attempt to ascertain an object's, say, PowerLevel, and as they report results that may (or may not) supersede each other, we'd like to track the history of these reports, along with metadata (dates, tuning parameters, code versions, etc.) from the processes. Specifically, we might want to run queries like "grouping by entity id, for each distinct attribute, find the most recent estimate" or "find all entities whose attributes had any updates run by codebase 573ae4".
I've never actually used an entity-attribute-value (EAV) schema in production, but this seems like a perfect use case for one, with additional metadata columns for the sources. Specifically, I'd envision a table like this:
entity_id | attribute_id | string_value | numeric_value | datetime_value | discovery_time | discovery_source | discovery_tuning_parameters | discovery_code_hash
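To make the idea concrete, here's a minimal sketch of that table and the two example queries, using SQLite via Python purely for portability (the real system would presumably be Postgres); the sample rows, source names, and the `entity_attribute` table name are hypothetical:

```python
import sqlite3

# One row per reported estimate; provenance columns sit alongside the value.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity_attribute (
    entity_id            INTEGER NOT NULL,
    attribute_id         TEXT    NOT NULL,
    numeric_value        REAL,
    string_value         TEXT,
    discovery_time       TEXT    NOT NULL,
    discovery_source     TEXT,
    discovery_code_hash  TEXT
);
""")
rows = [
    (1, 'PowerLevel', 9000.0, None, '2024-01-01', 'scanner', '573ae4'),
    (1, 'PowerLevel', 9500.0, None, '2024-02-01', 'human',   'aa10ff'),
    (2, 'PowerLevel',   12.0, None, '2024-01-15', 'scanner', '573ae4'),
]
conn.executemany("INSERT INTO entity_attribute VALUES (?,?,?,?,?,?,?)", rows)

# "For each entity and attribute, find the most recent estimate"
# (a correlated subquery; in Postgres, DISTINCT ON would also work).
latest = conn.execute("""
    SELECT entity_id, attribute_id, numeric_value, discovery_time
    FROM entity_attribute AS ea
    WHERE discovery_time = (
        SELECT MAX(discovery_time)
        FROM entity_attribute
        WHERE entity_id = ea.entity_id AND attribute_id = ea.attribute_id
    )
""").fetchall()

# "Find all entities with any update run by codebase 573ae4"
touched = conn.execute("""
    SELECT DISTINCT entity_id
    FROM entity_attribute
    WHERE discovery_code_hash = '573ae4'
""").fetchall()
```

Both queries stay simple because every estimate, current or historical, lives in the same table with its provenance attached.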
But I've heard a lot of criticism of this type of schema, such as this answer. I'm hard-pressed to find anyone using EAV on non-legacy systems, and that makes me very hesitant; I may be reinventing a very obsolete wheel...
On the other hand, the only alternative I can think of is going full NoSQL (bleh) or some unholy hybrid like:
id | power | power_current_discovery_date | power_current_discovery_source | power_history (an array of hstores?) | foobar | foobar_current_discovery_date | ...
because we'd need metadata on each attribute.
Any thoughts? Is this one of the few times that EAV is a good fit? Thanks for your help!