How can I generate fake data that adheres to the PRIMARY KEY-FOREIGN KEY constraints of my schema?

Question

I have a schema for which I would like to generate fake data. I can generate fake data using Python for each of the tables separately, but I don't know how to make sure the fake data generation adheres to the constraints I have set between the tables.

I would like to generate 1 million rows for each of the tables. My system is Ubuntu 18.04 LTS.

I came across Red Gate's SQL generator, but it is only for SQL Server AND available only in Windows. SQL Server is currently not available for Ubuntu 18.04 either. I currently have PostgreSQL and MySQL in my system.

http://www.generatedata.com I can install locally (didn't try it yet) but I still can't figure out how to make sure the data generation adheres to PK-FK constraints.

Erwin Brandstetter · Answer 1 · 2022-05-04T22:20:26.910

Basically, chain data-modifying CTEs in PostgreSQL for the purpose. Details very much depend on the undisclosed details of your setup and specific requirements. The meaning of "random" varies wildly.

Minimal demo:

CREATE TABLE fk_tbl(fk_id serial PRIMARY KEY, data int);
CREATE TABLE pk_tbl(pk_id serial PRIMARY KEY, fk_id int REFERENCES fk_tbl, data int);
WITH ins1 AS (
   INSERT INTO fk_tbl(data)
   SELECT trunc(random() * 10000)::int      -- or whatever
   FROM   generate_series(1, 10) g          -- or 1000000 instead of 10?
   RETURNING fk_id
   )
INSERT INTO pk_tbl(fk_id, data)
SELECT fk_id, trunc(random() * 55555)::int  -- or whatever
FROM   ins1
ORDER  BY random();                         -- optional

db<>fiddle here

How can I generate fake data that adheres to the PRIMARY KEY-FOREIGN KEY constraints of my schema?

1 Answers1