
As a web developer, I've typically only worked with nice, normalized tables in SQL-based RDBMSs, but I'm doing some data consolidation for a researcher who will be analyzing correlations between all items of several very long questionnaires and wants a giant CSV with data for all takers and all items. So, say, 3000 rows and 5000 columns(?).

I set up tables for each questionnaire (each has its own CSV), but we're talking tables with 1000+ columns representing item responses for each taker/pkid. Obviously I can't join up four or five questionnaires like that on pkid -- I'd have 5K or more columns, and I'm already at the outer limits of MySQL just on the individual tables.

So do I: 1) finally look into NoSQL, 2) just write a one-off script to consolidate this data programmatically, 3) convince the scientist to buy Oracle or MS SQL Server, or 4) something else that I, as a non-data-scientist, don't know about?

Thanks for the advice!

Joy


1 Answer


1) Consider how this information will be used. Will the owner want to try lots of ad hoc analysis and data exploration? Does your potential data store have tools or connection libraries that allow this? Will the user have to resort to writing Map and Reduce functions for every question posed?

2) Do you mean that the same question appears on different questionnaires and you're wondering whether your data store design should recognise that they're the same? Yes, absolutely you should, whatever technology you end up with; the sketch at the end of this answer shows one way to model it.

3) The DBMSs have column limits, too. They may be larger, but they're not infinite.

4) Store your data as one row per answer. There are several questions here on quiz design. Here's one picked at random. These will give you pointers, and any detailed questions that remain will be good material for future DBA.SE posts.
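To make points 2) and 4) concrete, here is a minimal sketch of that design in MySQL. All table and column names here are illustrative, not taken from your files: each distinct question is stored once, each questionnaire maps its item numbers onto those questions, and every answer becomes a single narrow row.

    -- Illustrative schema only: one row per question, one row per answer.
    CREATE TABLE question (
        question_id   INT PRIMARY KEY,
        question_text VARCHAR(500) NOT NULL
    );

    CREATE TABLE questionnaire (
        questionnaire_id INT PRIMARY KEY,
        name             VARCHAR(100) NOT NULL
    );

    -- A question can appear on several questionnaires but is stored only once.
    CREATE TABLE questionnaire_item (
        questionnaire_id INT NOT NULL,
        item_number      INT NOT NULL,   -- position of the item on that questionnaire
        question_id      INT NOT NULL,
        PRIMARY KEY (questionnaire_id, item_number),
        FOREIGN KEY (questionnaire_id) REFERENCES questionnaire (questionnaire_id),
        FOREIGN KEY (question_id)      REFERENCES question (question_id)
    );

    -- One row per taker per answered item, instead of one column per item.
    CREATE TABLE response (
        taker_id         INT NOT NULL,   -- your pkid
        questionnaire_id INT NOT NULL,
        item_number      INT NOT NULL,
        answer           VARCHAR(255),
        PRIMARY KEY (taker_id, questionnaire_id, item_number),
        FOREIGN KEY (questionnaire_id, item_number)
            REFERENCES questionnaire_item (questionnaire_id, item_number)
    );

The giant wide CSV the researcher wants can still be produced at the end by pivoting or exporting from the response table; the point is that nothing inside the database ever needs 5,000 columns.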

Michael Green