
I am fairly new to database administration as well as containerization/Kubernetes concepts.

Every now and then I see containerized/clustered solutions for running databases, e.g. PostgreSQL (for instance here).

This conflicts with what I was told during my training, namely that "databases cannot run in ephemeral environments like containers" because a container failure could corrupt the data. The same logic would presumably apply to Kubernetes clusters as well.

Can anybody clarify how these two points of views play together?

vrms

1 Answer


Your training was wrong. In my organisation we've been running containerised databases in production for a few years now: Postgres, MySQL, Db2, ClickHouse.

We are not alone: a survey conducted by the Data on Kubernetes community in September 2021 found that, of the 500 participating users,

90% believe [Kubernetes] is ready for stateful workloads, and a large majority (70%) are running them in production with databases topping the list.

The same DBMS program (i.e. the same machine code) doesn't suddenly become more likely to fail if it is executed inside a container. Likewise, it doesn't suddenly develop defects that cause it to corrupt its data.

DBMSes, at least those that you would want to use in production, are already designed to be resilient against crashes and to keep data durable and consistent. All you need to do is use a persistent volume for persistent data. Persistent volumes survive container restarts, and recovery is taken care of by the database engine itself, just as it would handle an ordinary abnormal termination (abend) of the database process.
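As an illustration of that last point, here is a minimal sketch of a Kubernetes StatefulSet that mounts a persistent volume at Postgres's data directory (names, the image tag, and the storage size are assumptions for the example, not a production recipe):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres              # hypothetical name
spec:
  serviceName: postgres
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16  # any supported image tag
          volumeMounts:
            - name: pgdata
              mountPath: /var/lib/postgresql/data  # Postgres data directory
  volumeClaimTemplates:       # one PersistentVolumeClaim per pod
    - metadata:
        name: pgdata
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

If the pod crashes and is rescheduled, the replacement pod reattaches the same volume, and Postgres runs its normal WAL-based crash recovery — exactly as it would after an abnormal process termination on bare metal.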

mustaccio