2

I have a schema for which I would like to generate fake data. I can generate fake data using Python for each of the tables separately, but I don't know how to make sure the fake data generation adheres to the constraints I have set between the tables.

I would like to generate 1 million rows for each of the tables. My system is Ubuntu 18.04 LTS.

I came across Red Gate's SQL generator, but it is only for SQL Server AND available only in Windows. SQL Server is currently not available for Ubuntu 18.04 either. I currently have PostgreSQL and MySQL in my system.

http://www.generatedata.com I can install locally (didn't try it yet) but I still can't figure out how to make sure the data generation adheres to PK-FK constraints.

MDCCL
  • 8,530
  • 3
  • 32
  • 63

1 Answers1

1

Basically, chain data-modifying CTEs in PostgreSQL for the purpose. Details very much depend on the undisclosed details of your setup and specific requirements. The meaning of "random" varies wildly.

Minimal demo:

CREATE TABLE fk_tbl(fk_id serial PRIMARY KEY, data int);
CREATE TABLE pk_tbl(pk_id serial PRIMARY KEY, fk_id int REFERENCES fk_tbl, data int);

WITH ins1 AS ( INSERT INTO fk_tbl(data) SELECT trunc(random() * 10000)::int -- or whatever FROM generate_series(1, 10) g -- or 1000000 instead of 10? RETURNING fk_id ) INSERT INTO pk_tbl(fk_id, data) SELECT fk_id, trunc(random() * 55555)::int -- or whatever FROM ins1 ORDER BY random(); -- optional

db<>fiddle here

Related:

Erwin Brandstetter
  • 185,527
  • 28
  • 463
  • 633