This question is quite old and already has some good answers.
However, in recent years, several new products have been released to help you subset your database. Without delving too much into the details, these products allow you to make the configuration process more declarative, so that newly connected tables and columns are handled automatically. Additionally, you can embed them into your CI/CD pipeline.
I might be biased, but we have an awesome tool in this category called Synthesized TDK: https://docs.synthesized.io/tdk/latest/. There is a free community version that supports only open-source databases, so it should work for you!
An example configuration that takes 10% of your production database and masks data (config.yaml):
default_config:
  mode: MASKING
  target_ratio: 0.1
table_truncation_mode: TRUNCATE
schema_creation_mode: CREATE_IF_NOT_EXISTS
It handles foreign keys by traversing a topologically sorted graph of tables, so referenced tables are processed before the tables that depend on them. It also masks columns so that no sensitive data leaves production. You can customise masking at the column level; see Masking Docs.
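To give a rough idea of what "topologically sorted" means here, below is a minimal Python sketch. It is not TDK's actual implementation, and the table names and the `load_order` helper are hypothetical; it only shows how a foreign-key graph can be ordered so that parent tables come before the tables that reference them.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical schema: each table maps to the set of tables
# that its foreign keys reference (its "parents").
foreign_keys = {
    "customers": set(),
    "products": set(),
    "orders": {"customers"},
    "order_items": {"orders", "products"},
}


def load_order(fk_graph):
    """Return the tables ordered so that every referenced (parent)
    table appears before any table that points to it."""
    return list(TopologicalSorter(fk_graph).static_order())


order = load_order(foreign_keys)
print(order)
```

Processing tables in this order means that, by the time a child table such as `order_items` is subset, the rows kept in `orders` and `products` are already known, so only child rows with surviving parents need to be copied. Note that `TopologicalSorter` raises `graphlib.CycleError` on circular foreign keys, which a real tool has to handle separately.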
To run the tool, simply execute the corresponding command: java -jar tdk.jar <connection options> -c config.yaml
If you have any questions, please join our community: Slack Community