Constraints in a relational databases - Why not remove them completely?

Question

Is there any reason to build constraints between tables (inside SQLserver) nowadays? If so, when? Most applications in my area are built on object principles and tables are joined on demand. Demand are based on the need from the application. I won't load a bunch of constrained tables for a simple look-up, which in turn (after action) require one another simple lookup.

ORM tools such EntityContext, Linq2Data, NHibernate are also handle the constraints by themselves, at least you know what tables that need each other. Doing constraints inside the server is just about making (forcing) same changes twice?

This is usually not a question up for decision, but this database is designed quite different. The design looks regular good, mostly mirrored the objects used by the applications. What disturbing me is all constraints configured inside SQLserver with "not cascade". Which means that you have to play "seek and find" when coding new database queries. Some cases requires up to 10 levels of an exact order to make a single delete.

This surprises me and I'm not sure how to handle it.

In my simple world, that setting makes the constraints losing most of the purpose. OK if the database were accessed from hosts without knowledge of the design.

How would you act in this scenario?
Why not just remove all constrains from db and keep them at application level?

Péter Török · Accepted Answer · 2011-05-13T11:55:18.020

Two general reasons not to remove contraints from DB:

It may be accessed by more apps, now or in the future, which may or may not use ORM. Even if the developers of those apps faithfully duplicate all the constraints there (which may be significantly more difficult using lower level non-ORM solutions), it is always extra work. And if not, even one small omission is enough to break schema integrity... which is something you don't want to risk. In most companies, the data stored in their DB is the lifeblood of their business, so its integrity must be ensured by any means. And the tried and proven best means to achieve this is to implement as many constraints in the DB as possible.
The query optimizer relies a lot on the constraints known on the DB level. If you remove constraints, query performance may start deteriorating. You may not immediately notice it, but one day it is going to hit you, and by then it may be too late to fix it easily. The nature of things is that DB performance tends to break down at peak load time, when there is the least possibility to make careful, well thought out design improvements, backed by exact performance measurements and detailed analysis to pinpoint the root causes.

Your concrete case sounds like the DB schema may have been originally generated by an ORM tool (or designed by someone not very experienced with the relational world), so it is suboptimal from the relational point of view. It is probably better to analyse and improve it towards a more "natural" relational design, while keeping it consistent with the ORM views. It may be useful to involve a DB expert in this analysis.

score 28 · Answer 2 · answered May 13 '11 at 08:12

Applications can come and go but data lives forever. In my company the DB is over 30-40 years old, it will live on as long as the company exists. The applications change, developers come and go. It is better to have integrity and a good logical data model. That way someone can look at data and get meaningful understanding without having to go through a complex codebase. This also helps in reporting significantly. Also applications can and will have bugs and DB constraint is a guard against that. My default position is to have as much constraint (FK and check) as possible.
The only reason not to have a constraint would be if your design pattern doesn't allow that e.g. Table-per-hierarchy or performance problems.

score 16 · Answer 3 · answered May 13 '11 at 14:51

What disturbing me is all constraints configured inside SQLserver with "not cascade".

That is not disturbing to me, that means someone has shown good sense. Cascading deletes are often very bad for the database. Inthe first place, sometimes you want a delete to fail if you have data in related tables. For instance, if youhave a customer who has an order inthe past, you don't want him to be deleted or you lose the data about who the order was for and a cascade delete woudl get rid of the record which would messup you financial reporting.

You seem to to think that ease of developemeant is the most important thing. In the database world this is simply not true. Data integrity is the first most critical thing followed closely by performance and data security. If it takes longer to write the queries then so be it.

Database are typically acted on by many applications = one or more web sites or desktop applications, a reporting application, web-services, the query window, ETL processes, etc. If you do not enforce contraints at the database level you first lose the integrity of the data as one of those applications may not follow all the rules. Second, you have to code those contraints multiple times and rewrite them if you decide to use a different application later. Third, you cannot control in advance whether there will be a need to to do some sort of data maintenance task that will not happen through the application(fixing the data from a bad customer data import for instance or changing all the 10,000,000 records from one client to another client when the company is bought by a competitor). Typically application developers don't think about how the data may need to be manipulated over time and thus forget to add important constraints at the database level which is the first, most important place where they belong.

score 11 · Answer 4 · answered May 13 '11 at 12:32

I read somewhere once that said basically: The data is the key of your application. If you will only EVER access data through your UI (and I mean ever, as in now and forever, for all eternity... or the lifetime of your application, anyway) then you don't need database constraints. But there's ever a chance that something other than the app itself will need to touch data, for example a web service, public API, rake task/SQL job/cron/automated script, then you will save yourself a lot of potential trouble down the road by keeping the DB constraints.

I firmly believe this is the one area of software development where you should not apply DRY (and I'm fully expecting a bevy of downvotes for that statement). Your data is the heart and soul of your application - if it's ever corrupted beyond repair, that it: game over. It's worth it IMO to enforce the constraints everywhere they are needed. If that means in the form of Triggers and constraints on the DB level, server-side validations on the middleware, and client-side Javascript on the UI (for web apps), then it's IMO a necessary evil to ensure the data is always pristine.

score 6 · Answer 5 · answered May 13 '11 at 08:42

Do you know what ORM means? Object-relational mapping. Quoting Wikipedia "technique for converting data between incompatible type systems". Yup, relational and object models don't fit together. ORMs do a pretty good conversion, by respecting rules of both type systems. RDBMS are organized in such way, that you achieve data integrity by using constraints. In general, integrity is very nice thing to have, so ORMs tend to use them when creating data model for storing object data. Your ORM probably has a good reason to use "non cascading" constraints. And if this forces you to make complicated queries instead of just create/update/drop certain objects, then something is wrong with your ORM setup.

If you consider relational concept to be annoying, then why you don't use object database? Some time ago they were slow (which is why most people still use RDBMSs) but from what I've heard things changed a bit. You would get rid of all the relational nitpicks. Simply objects in, objects out.

score 6 · Answer 6 · 2011-05-13T12:55:49.683

Well that is what eBay did and they probably have one of the largest databases in the world:

http://www.dba-oracle.com/oracle_news/news_ebay_massive_oracle.htm http://www.addsimplicity.com/downloads/eBaySDForum2006-11-29.pdf

Despite what has been said above about performance being increased by referential integrity, it can actually be degraded; which is why massive databases have been dropping their constraints and doing the work in the application layer. And as far as I can tell it is the only really good reason.

By dropping those constraints you essentially loose your safety net that keeps the data clean, and that brings about its own problems. So as with everything it is a balancing act. I guess that in general maintaining referential integrity is the right thing to do.

Having worked in a development environment with strong referential integrity, I know that from a developer's point of view that it can be a total pain; often in a development environment a bit of dirty data doesn't matter and working out how to delete a row might take an hour or more. However, it can also be very useful, as the constraints make the schema explicit.

sunwukung · Answer 7 · 2011-05-14T11:26:12.863

First - my answer: No, you should not rely on the application alone to look after your data.

This points to a larger debate: ORMs have encouraged a culture of disdain for "direct" DB interaction, often at the expense of normalisation/referential integrity. Tables are forcibly mapped to arbitrary object hierarchies, at the expense of the design implicit in the relational model. The decoupling favoured by OOP is arguably sacrificed here as the application makes it's design felt in the data structure. While ORM has demonstrated great utility, it seems to be based on the abuse or mistrust of SQL.

New paradigms are (re)emerging, take Functional programming for example. If the dev team decides to embrace a new programming methodology then what implications will this have for data that has been structured according to the ORM's requirements?

I agree with @Jacek Prucia - I think ORM is a bad match for RDBMS, I'd personally opt for a DBAL on RDBMS, or go for an OODB with ORM.

score 3 · Answer 8 · answered May 13 '11 at 15:43

Constraints are your only guarantee that you have consistency and data integrity at the database level. Sure, you can enforce constraints using your application code, but what if, in the future, you need to modify the data directly? You might understand how to maintain data integrity, but someone else might not. Keeping the constraints at the data level ensures that the integrity is ensured even when someone is monkeying around in places they don't understand.

Furthermore, let's say your application needs to be rewritten, but with the same database in place. All those constraints in code are just begging for bugs that prevent some entry whilst allowing erroneous data through.

When developing, keep it simple. Constraints let you do that. (That said, when a constraint throws an error, don't spit the same error back at the user. Make the error understandable.)

(As to the cascade issue: that's a good thing. I'd prefer to throw an error that certain other records have to be deleted first, rather than rely upon the cascade to get everything right. Cascades are nice in theory, but not necessarily so in practice.)

score 2 · Answer 9 · answered May 13 '11 at 15:18

One problem with constraints in a database are that they give the program limited information on what failed and how to fix it. This means that, for smooth handling, it is often necessary to repeat the constraint check in the application, and therefore the database constraint check is wasted effort.

This runs the risk of compromising data integrity, so we've got tradeoffs here. For important data, ensuring data integrity is almost always more important than performance, and it's far better to fail a transaction even if it looks arbitrary than to mess up the data.

To safely remove constraints, it is therefore vital to secure database access so that nothing can change the database without checking the constraints. This is not reliable when writing new applications or coming up with ad hoc ways of dealing with the data, since all it takes is one mistake and the database is corrupt.

Therefore, to dispense with database constraints, it is necessary to establish what can and what cannot be done with the database up front, so that all applications can be written, reviewed, and tested extensively. All database requirements must be established up front, and any change to database requirements will require extensive work. This is something of a frozen waterfall methodology, which works only in very specific cases. (Designing, implementing, and maintaining to requirements is much like walking on water. Something has to be frozen first, and if it isn't frozen quite enough the results can be disastrous.)

One case where it does work is the massive enterprise applications like PeopleSoft and SAP, where the application already does virtually everything, and there are carefully defined ways to extend it. There are other, very rare, possibilities.

So, unless you work on a very large enterprise project (and I wouldn't want to) or can walk on liquid water, leave those constraints in the database.

Constraints in a relational databases - Why not remove them completely?

9 Answers9

Linked