
I understand what foreign keys are, and I make a point of including them wherever they make sense in the tables I design.

However, something that has always confused me is whether or not I should explicitly set the "ON UPDATE" and "ON DELETE" clauses (for lack of a better term). Example:

CREATE TABLE "test1"
(
    id              serial,
    referenceid     integer,
    FOREIGN KEY     (referenceid) REFERENCES "othertable" (id) ON UPDATE CASCADE ON DELETE CASCADE
);
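
(Both examples assume that "othertable" already exists and that its "id" column is unique; a minimal sketch:)

CREATE TABLE "othertable"
(
    id              serial PRIMARY KEY
);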

This code goes out of its way to explicitly add the technically "unnecessary" part: "ON UPDATE CASCADE ON DELETE CASCADE".

Since this is not done by default, there must be a reason for this! After all, the default behaviour is always (or at least should always be) the most commonly needed behaviour:

CREATE TABLE "test2"
(
    id              serial,
    referenceid     integer,
    FOREIGN KEY     (referenceid) REFERENCES "othertable" (id)
);

In the test1 table, as I understand it, if the "id" column values in "othertable" are changed, or any of its records are deleted, then the referencing records in test1 will be updated or deleted accordingly. This seems, on the surface, like what should be the default behaviour.
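
For example (assuming a matching row in each table), I would expect something like this to work:

-- test1.referenceid follows the change from 1 to 2
UPDATE "othertable" SET id = 2 WHERE id = 1;

-- the matching test1 row is deleted along with the parent
DELETE FROM "othertable" WHERE id = 2;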

In the test2 table, again as I understand it, if the "id" column values in "othertable" are changed, or any of its records are deleted, PostgreSQL will refuse to perform the statement if there are records in test2 which reference the rows being modified.
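
Running the same statements against test2, PostgreSQL instead refuses with an error along the lines of:

ERROR:  update or delete on table "othertable" violates foreign key constraint "test2_referenceid_fkey" on table "test2"

(the constraint name here is the one PostgreSQL generates automatically).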

I'm basically confused about the entire concept of "ON UPDATE" and "ON DELETE". Why would one ever want a query to be refused like that? And "CASCADE" isn't even the only option (besides none); there are multiple other values you can use, each causing a different behaviour (which I don't understand).

Since there is a stated relationship between the tables (through the foreign keys), isn't the whole point that you want them to remain consistent? So why would you not want it to "CASCADE" if there are changes made to the "master" table?

This might be similar to how I could never understand why object-oriented programming had "security measures" in the code, preventing you from directly changing or retrieving an object's properties and forcing you to go through "getters" and "setters". I mean, if something can execute queries in your database, isn't "all lost" anyway? They can just do:

TRUNCATE table1, table2 CASCADE;

... or something like that.

The ON UPDATE/ON DELETE mechanism seems almost like the database engineers could not decide on the best behaviour and put the decision on the user of the product instead. For me, it adds a lot of confusion and anxiety.

It should be noted that I have used the "test2"-style code many times in the past, only to realize that I could not update or delete records when it made sense to. That's why I started using "ON UPDATE CASCADE ON DELETE CASCADE" in the first place, after asking and learning about it.

So why isn't this the default and even the only behaviour for a database? Why would you ever want a query to update/delete your "master records" to fail?

pgphant

7 Answers


I'm not sure about ON UPDATE CASCADE. If you find yourself needing this sort of cascaded update, then that is perhaps a "code smell" in your database design. In theory your primary key should be static, so changes that would need cascading shouldn't happen. Perhaps it was added as a logical extension of ON DELETE CASCADE. It is at least safer than cascading deletes.

The existence of ON DELETE CASCADE makes more sense: while PKs shouldn't really change, things do often get deleted. The cascading is simply a convenience. It saves you from having to write code to drop child entities manually when getting rid of a parent.

Furthermore, it might be considered safer than implementing this in other logic, because the database is taking care of transactional consistency, deadlocks, and so forth, so the operation should (bugs permitting) be guaranteed atomic. If you implement your own "find children, delete them, then delete the parent" logic, which may have to be nested, you have to do some legwork[!] to ensure that an error part-way through cannot delete the great-great-great-grandchildren of a row but leave the rest standing, leaving a partly deleted entity which could cause difficult-to-diagnose issues later.

[!] Taking appropriate locks (preferably not by locking whole tables) and ensuring transaction isolation settings are right isn't as simple as it might first look.
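
To illustrate the legwork, a hand-rolled equivalent for a hypothetical three-level hierarchy might look like this (table and column names are made up):

BEGIN;
-- leaves first, so no row is ever orphaned part-way through
DELETE FROM grandchild
WHERE child_id IN (SELECT id FROM child WHERE parent_id = 42);
DELETE FROM child WHERE parent_id = 42;
DELETE FROM parent WHERE id = 42;
COMMIT;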

Why Not Cascade?

As I said above, I consider a need to cascade updates routinely to be a bit of a design smell. You shouldn't need to change a primary key value during normal operations.

I'm very wary of cascaded deletes, despite the danger of bugs when deleting complex structured entities more manually. Too often you see inexperienced people perform UPSERT operations[*] using a DELETE-then-re-INSERT method, even when the DB supports single-statement upsert operations[^]. This damages your data if cascaded deletes are enabled: the delete removes the children too, and they don't get put back by the subsequent insert (see the sketch after the footnotes).

Also, in a lot of cases with real data you don't actually want a cascaded delete. For example, if a manager leaves a company, you don't want to delete their subordinates just because assigning them a new manager first was forgotten or prevented by a bug.

[*] in Postgres via INSERT ... ON CONFLICT ..., but this is not standard, and quite different syntax[†] is used elsewhere
[^] either because they aren't aware of the available syntax, or because they are avoiding it in order to be cross-DB compatible
[†] MERGE can be used in Microsoft's T-SQL for the same effect; MySQL[‡] supports INSERT ... ON DUPLICATE KEY ...
[‡] MySQL also supports REPLACE INTO, but IIRC that is just syntactic sugar for delete+insert, so it has the same dangers
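
To make that failure mode concrete, a sketch with hypothetical customer and orders tables, where orders references customer with ON DELETE CASCADE:

-- dangerous "upsert" via delete + re-insert:
-- the DELETE silently cascades and removes the customer's orders,
-- and the subsequent INSERT does not put them back
DELETE FROM customer WHERE id = 7;
INSERT INTO customer (id, name) VALUES (7, 'New name');

-- a single-statement upsert (PostgreSQL syntax) leaves the orders intact
INSERT INTO customer (id, name) VALUES (7, 'New name')
ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name;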

David Spillett

If you feel uncomfortable about cascading updates and deletes (to use the technical term), I won't dissuade you. It is certainly dangerous if deleting a single table row can wipe out half your database.

The reason for this feature to exist is convenience and speed. The convenience angle is obvious. The speed benefit is that you can delete a row and all its dependents with a single statement. Without cascading delete, you need at least two statements: one to delete the dependent rows and one to delete the row itself. So cascading delete can save you one client-server round trip.
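
For example, with a hypothetical invoice/line_item pair, where line_item references invoice:

-- without ON DELETE CASCADE: two statements
DELETE FROM line_item WHERE invoice_id = 1;
DELETE FROM invoice WHERE id = 1;

-- with ON DELETE CASCADE: one statement does both
DELETE FROM invoice WHERE id = 1;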

Laurenz Albe

Other answers that say

The reason for this feature to exist is convenience and speed

or

a safety feature to mitigate one's own mistakes.

miss the point, I'm afraid. Referential integrity constraints, like other database constraints, are tools to enforce the business rules that underlie your database model. As such, they reflect the expected behaviour of the entities upon which they act.

Others have provided examples where one or another behaviour would be appropriate from the business rules' point of view. The SQL standard, with which Postgres complies, is flexible enough to give you the means to implement these and other rules (RESTRICT, SET NULL, SET DEFAULT).

As to why NO ACTION was chosen as the default for the SQL standard, it's probably because it offers a balance of safety and flexibility: on the one hand it prevents irreversible data loss, while on the other it allows you to defer constraint validation until the end of the transaction.
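
For reference, a sketch of the available referential actions (the parent table here is hypothetical):

CREATE TABLE child
(
    parent_id       integer,
    FOREIGN KEY (parent_id) REFERENCES parent (id)
        ON UPDATE NO ACTION   -- the default; also CASCADE, RESTRICT, SET NULL, SET DEFAULT
        ON DELETE NO ACTION
        DEFERRABLE INITIALLY DEFERRED   -- NO ACTION checks can wait until commit
);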

mustaccio

I have used ON DELETE CASCADE in cases where the parent and child records belong together (a minimal sketch follows the list), such as

  • an invoice and its line items
  • a product and references to the parts needed to assemble it
  • a user and their privileges, role assignments, etc
  • a table and its columns :-)
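
A minimal sketch of the invoice case (hypothetical table names):

CREATE TABLE invoice
(
    id              serial PRIMARY KEY
);

CREATE TABLE line_item
(
    id              serial PRIMARY KEY,
    invoice_id      integer NOT NULL REFERENCES invoice (id) ON DELETE CASCADE
);

-- deleting an invoice removes its line items with it
DELETE FROM invoice WHERE id = 1;
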
Colin 't Hart

Ideally, your PK never changes and so there is no reason to ever CASCADE a modified PK to a child table. That's one reason so many people use a synthetic key, such as a UUID or SERIAL value. Those aren't "real" keys, but they're pretty darn convenient. (The real key is the attribute or combination of attributes that should be unique amongst all rows within the table.) For the real keys in the row, you can (and often should) add a UNIQUE constraint. Postgres is good at this kind of stuff.

As far as DELETE goes, that depends on the tables. If the child record should only ever exist with a parent (it's a "dependent entity"), then it's correct to delete the child records. If, on the other hand, the child may exist without the parent (it's an "independent entity"), then setting the foreign key to NULL (ON DELETE SET NULL) is the better solution.
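
To sketch both cases with hypothetical tables (and a surrogate PK plus a UNIQUE constraint on the "real" key, as described above):

CREATE TABLE manager
(
    id              serial PRIMARY KEY,
    employee_no     text NOT NULL UNIQUE   -- the "real" key
);

-- independent entity: an employee survives their manager's deletion
CREATE TABLE employee
(
    id              serial PRIMARY KEY,
    manager_id      integer REFERENCES manager (id) ON DELETE SET NULL
);

-- dependent entity: notes about a manager go with the manager
CREATE TABLE manager_note
(
    manager_id      integer REFERENCES manager (id) ON DELETE CASCADE,
    note            text
);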

Note that if you ever want to preserve historical data when you DELETE or TRUNCATE, you can do this in Postgres pretty easily. Others can say more; I'm using statement-level triggers where I need such functionality.
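
A minimal sketch of that approach for DELETE, assuming a hypothetical invoice table (PostgreSQL 11+ syntax, using a transition table):

CREATE TABLE invoice_history (LIKE invoice);

CREATE FUNCTION archive_deleted_invoices() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- copy every row deleted by the statement into the history table
    INSERT INTO invoice_history SELECT * FROM old_rows;
    RETURN NULL;
END;
$$;

CREATE TRIGGER invoice_archive
AFTER DELETE ON invoice
REFERENCING OLD TABLE AS old_rows
FOR EACH STATEMENT
EXECUTE FUNCTION archive_deleted_invoices();

(TRUNCATE does not fire DELETE triggers, so it would need a separate trigger, and transition tables aren't available there.)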

Morris de Oryx

"Why would one ever want a query to be refused like that?"

As in your object-oriented analogy, it isn't a security feature to defend against external attackers so much as a safety feature to mitigate one's own mistakes. It gives the user tools that warn them about the unexpected results of an action.

In many cases, cascading is appropriate, but there are use-cases where it might not be.

For example, consider a university database with courses and students enrolled on them. When changing the ID of an existing course, it's probably appropriate to cascade that change - it's the same course, and any students enrolled on it will still be enrolled.

If, however, you were working with a new course that had been added to the database ready for the next year's intake, but which didn't have any students enrolled on it yet, it would be more appropriate to have the update fail if it would change a foreign key dependency - you know there shouldn't be any entries pointing to this new course as a foreign key, so if the update does fail, it means something's wrong. Whether it's an erroneous premature enrolment or you're trying to edit the wrong course by mistake, it's worth investigating further to understand exactly what the problem is before forcing the change through. In this case, cascading the changes would likely propagate the error and could make debugging and recovery harder.
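
A sketch of that scenario with hypothetical tables, relying on the default (NO ACTION):

CREATE TABLE course
(
    id              integer PRIMARY KEY
);

CREATE TABLE enrolment
(
    student_id      integer,
    course_id       integer REFERENCES course (id)   -- NO ACTION by default
);

-- succeeds for a brand-new course with no enrolments;
-- fails with a foreign-key violation if any enrolment already points at 101,
-- which is exactly the warning you want
UPDATE course SET id = 102 WHERE id = 101;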

If everything's working properly, this safety mechanism is never necessary; but things sometimes don't work properly, and in some of those cases it's best to fail and be warned before continuing.

LizWeir

The question of when to cascade a delete and when to restrict it is an interesting one, and I haven't figured out a definitive answer yet. Informally, I think in terms of how important the information is, and whether it can be recreated from an external source.

That is, if the information is of little value or can be recreated from an external source, I tend to go with CASCADE. If, on the other hand, the information is valuable and cannot easily be recreated, I tend to restrict; in that case, the dependent rows have to be deleted explicitly first.

Lennart - Slava Ukraini