You should be able to get an empty database built from source by the supplier. Failing that, generating scripts directly from the database and running them on a new blank one, as Kin suggests, will be required.
Getting a limited but useful amount of data is more complicated than simply taking X,000 rows or Y% of each table: much of the data will end up not linking together correctly. In fact, if the database is properly set up with referential integrity enforced by foreign key constraints, this simplistic process will simply fail. To create a subset of the data you need to work with the structure of the data: for instance, with our training-records system you might extract 10 teams of people out of the hundreds, then extract their training records, then the audit-trail records associated with those, and so forth, as sketched below. That way everything you have links together as real data instead of being arbitrary combinations of rows that might not relate to each other.
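As a rough sketch of that walk-down-the-foreign-keys approach, in PostgreSQL syntax against a purely hypothetical schema (`teams`, `people`, `training_records`, and `audit_trail` are illustrative names, and `test` is assumed to be a schema in the same database built from the same DDL; none of this is from the question itself):

```sql
-- Hypothetical schema: teams(id), people(id, team_id),
-- training_records(id, person_id), audit_trail(id, training_record_id).

-- 1. Pick a small, self-consistent starting set: 10 random teams.
CREATE TEMP TABLE picked_teams AS
SELECT id FROM teams ORDER BY random() LIMIT 10;

-- 2. Copy rows in dependency order so every foreign key resolves.
INSERT INTO test.teams
SELECT t.* FROM teams t JOIN picked_teams p ON p.id = t.id;

INSERT INTO test.people
SELECT pe.* FROM people pe JOIN picked_teams p ON p.id = pe.team_id;

INSERT INTO test.training_records
SELECT tr.* FROM training_records tr
JOIN test.people pe ON pe.id = tr.person_id;

INSERT INTO test.audit_trail
SELECT a.* FROM audit_trail a
JOIN test.training_records tr ON tr.id = a.training_record_id;
```

Because each step only copies rows whose parents were copied in the previous step, the subset stays internally consistent no matter how small you make the initial selection.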
Also be careful when taking copies of real data for testing/development purposes: you may be in breach of data protection rules/regulations/laws (or the owner of the database you are taking a partial copy of may be in breach of them by allowing you access). At the very least you will probably need to randomise any personally identifying or otherwise sensitive data.
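A minimal anonymisation pass over the copied subset might look like the following; the column names are hypothetical, and `md5(random()::text)` is just one cheap PostgreSQL trick for producing plausible-but-meaningless values:

```sql
-- Overwrite anything personally identifying in the test copy.
UPDATE test.people
SET full_name = 'Person ' || id,                    -- deterministic, obviously fake
    email     = 'user' || id || '@example.invalid', -- reserved TLD, can never send mail
    phone     = NULL,
    notes     = md5(random()::text);                -- keep the column populated,
                                                    -- destroy the content
```

Run this before anyone else gets access to the copy, not after.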
A better solution is to generate this data rather than copying it from the production database. This way you can control the data size fairly directly to match your needs and can engineer in all the potential oddities that you want for testing purposes to make sure your changes don't introduce regressions in dealing with edge cases. It also means that you do not need to worry about data protection issues as you are not dealing with real records about real people.
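A sketch of that, again in PostgreSQL against the same hypothetical schema, using `generate_series` to control the volume and to plant deliberate edge cases (an empty team, a person with no training records):

```sql
-- 10 teams, ~180 people, and one training record per person.
INSERT INTO test.teams (id, name)
SELECT g, 'Team ' || g FROM generate_series(1, 10) AS g;

INSERT INTO test.people (id, team_id, full_name)
SELECT g,
       ((g - 1) % 9) + 1,   -- only teams 1-9 are used: team 10 stays empty
       'Person ' || g
FROM generate_series(1, 180) AS g;

INSERT INTO test.training_records (id, person_id, completed_on)
SELECT g, g,
       CURRENT_DATE - (g % 365)   -- spread completion dates over the past year
FROM generate_series(1, 179) AS g; -- person 180 gets no records: another edge case
```

Scaling up is just a matter of changing the `generate_series` bounds, and every oddity in the data is one you put there on purpose.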
See this answer for another discussion of manufacturing test data and its benefits and pitfalls.