I typically generate my own, using some known data as input -- if it's too random, it's not always a good test; I need data that's going to be distributed similarly to what my final product will see.
All of the larger databases that I have to tune are scientific in nature -- so I can usually take some other investigation as input, rescale it, and add jitter. (e.g., taking data that was at a 5 min cadence with millisecond precision, and turning it into a 10 sec cadence, still at millisecond precision, but with a +/- 100 ms jitter added to the times)
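
The jitter part is simple enough to script. A minimal sketch in Python (the epoch timestamps and the function name are just for illustration):

    import random

    def rescale_with_jitter(start, end, cadence_s=10.0, jitter_s=0.100):
        """Emit timestamps at a new cadence, each nudged by random jitter.

        start, end -- Unix timestamps (seconds) bounding the series
        cadence_s  -- target spacing between samples (10 s here)
        jitter_s   -- maximum +/- offset per sample (100 ms here)
        """
        times = []
        t = start
        while t <= end:
            # round to 3 decimals to keep millisecond precision
            times.append(round(t + random.uniform(-jitter_s, jitter_s), 3))
            t += cadence_s
        return times

    # one hour of synthetic timestamps at a 10 sec cadence, +/- 100 ms
    stamps = rescale_with_jitter(1_700_000_000, 1_700_003_600)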
...
But another alternative, if you don't want to write your own, is to look at some of the benchmarking tools -- because they can repeat things over and over again based on a training set, you can use them to insert lots of records (and just ignore the reports on how fast it did it) ... and then you can use that same tool to test how fast the database performs once it's populated.
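
For instance, if you happen to be on PostgreSQL, its bundled pgbench tool works this way (pgbench is just one example; "mydb" is a placeholder database name), and you can drive it from a script:

    import subprocess

    # Populate: -i initializes pgbench's test tables, -s sets the scale
    # (each scale unit is 100,000 rows in pgbench_accounts, so this
    # loads about 10 million rows).
    subprocess.run(["pgbench", "-i", "-s", "100", "mydb"], check=True)

    # Later, reuse the same tool against the now-populated database:
    # 8 concurrent clients for 60 seconds, and this time keep the report.
    subprocess.run(["pgbench", "-c", "8", "-T", "60", "mydb"], check=True)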