56

I'm planning to design and set up a database to store dictionary entries (usually single words) and their meaning in another language. So, for example, the table Glossary must have entry and definition and each table record has a reference to the id of a record stored in Tag (Each entry must have a tag or category).

Since my data has a structure, I thought using a SQL database (like MySQL) is not a bad idea; but people say MongoDB is much better for performance.

At the client side, the application must be able to provide a search box with autocomplete which consumes a REST API provided by the backend. Is it safe to go with MySQL in such a scenario? or should I use MongoDB or ElasticSearch of any other solution for this? Hundred thousands of records are supposed to be stored and accessed in this way.

enderland
  • 12,201
  • 4
  • 53
  • 64
Aziz Az
  • 491

4 Answers4

94

I can't tell you why it's a bad idea. I can tell you a bunch of reasons why a relational database is a good idea though.

  1. Remember that not everyone consults a dictionary for a definition. More times than not, a dictionary is used to find the correct spelling. This means you're not just finding a needle in a haystack, you are searching the haystack for needles that are similar to the one described by the user (if I may use an idiom).

    You won't just be doing primary key look-ups. You'll be doing keyword searches

  2. Words can be related, either in meaning or spelling (read, read, red and reed)

    Whenever you see the word "related" think "Relational Database"

  3. If you need speed, you need caching on top of your relational database, not a broken relational data model

  4. A properly normalized database speeds up primary key look-ups and searches since there is simply fewer bits to sift through.

  5. The people who say normalized databases are slower are referring to the 0.1% of cases where this is true. In the other 99.9% of cases they haven't actually worked with a truly normalized database to see the performance first hand, so ignore them. I have worked with a normalized database. Love it. Don't want to go back. And I'm not a database guy. I'm a C#/JavaScript/HTML/Ruby guy.

  6. Words have an origin. In fact, many words in the same language can have the same origin, which is another word in a different language. For instance, résumé (the thing we upload to recruiters websites so we can get incessant phone calls and e-mails for the next 7 years) is a French word.

  7. A dictionary also defines what kind of word it is (noun, verb, adjective ect). This isn't just a piece of text: "noun" it has meaning as well. Plus with a relational database you can say things like "give me all the nouns for the English language" and since a normalized database will be utilizing foreign keys, and foreign keys have (or should have) indexes, the lookup will be a snap.

  8. Think of how words are pronounced. In English especially, lots of words have the same pronunciation (see my example above with read and reed, or read and red).

    The pronunciation of a word is, itself, another word. A relational database would allow you to use foreign keys to any pronunciations. That information won't be duplicated in a relational database. It gets duplicated like crazy in a no-SQL database.

  9. And now let's talk about plural and singular versions of words. :) Think "boat" and "boats". Or the very fact that a word is "singular" or "plural".

  10. Oh! And now let's talk about past tense, present tense, future tense and present participle (to be honest, I don't know what the crap "present participle" is. I think it has something to do with words ending in "ing" in English or something).

    Look up "run" and you should see the other tenses: ran, runs, running

    In fact, "tense" is another relationship itself.

  11. English doesn't do this so much, but gender is another thing that defines a word. Languages like Spanish have suffixes the define whether the subject of the noun is male or female. If you need to fill in the blanks for a sentence, gender is extremely important in many languages.

    Since you can't always rely on language conventions to determine gender (in Spanish, words ending in "o" are masculine/male, but that's not true for all words), you need an identifying value: Male or Female. This is another relationship that a normalized database handles gracefully even at millions of records.

With all the twisted rules and relationships between words, and even different languages, it's hard for me to imagine this data store as a "document store" like a no-SQL solution provides. There are so many and such a large variety of relationships between words and their components that a relational database is the only sensible solution.

27

If you go with the key-value store (which offers you a more impoverished programming model) and it turns out you need more structure (in your case, say, adding a third language), or you need to do more complex queries involving joins, you'll spend a bunch of time reorganizing your keys, denormalizing your data, and/or looping over all the data to find what you need.

If you start with a relational database, you can work through your application's design, code, and try it out concentrating more on the natural data model for your application, rather than on shoehorning it into the key-value form.

Once the application settles down, you can work on performance, by measuring various options. There are quite a few performance tricks to do in SQL before needing to switch technologies. You'll have learned a lot about your application and will be in a much better position to decide if relational is hurting you and if key-value will work for your data model.

If it turns out that key-value is exactly what your application needs, you can switch without having wasted significant investment in the relational model, whereas the other way around you might possibly end up wasting time making the key-value model do things that are trivial in the relational model.

Consider the relational database as an accelerator to getting your application designed, written, and up and running, in the face of ever-changing requirements as you learn more about your domain and users.

When you have millions of users, you'll almost certainly need to refactor the design anyway, even if you had picked key-value to start with.

Erik Eidt
  • 34,819
10

For a database this small, it's probably not going to make much difference for performance. A standard RDBMS isn't a terrible idea here because presumably, there should be far more reads than writes of a given entry. Performance doesn't seem to be a primary driver for this. Caching in the application layer also mitigates such concerns.

The other consideration is replication and resilience. Relational databases tend to be designed around a single instance. You should read up on the CAP theorem and consider what matters the most to you.

JimmyJames supports Canada
  • 30,578
  • 3
  • 59
  • 108
2

These NoSQL databases always sound like a good idea at the outset, but you'll be guaranteed to run into problems when you start dealing with edge cases (e.g. where keywords must by looked up by their value (or part of) for instance.

It would be a safer option to go with a relational database at the outset and then denormalise later. MySQL is awesome for this sort of purpose (simple relational databases with text-based searching), there's not too many use-cases where you'll find it struggling with this sort of data. Just make sure you have your indexes set up correctly and you'll find it will perform at a level comparable (or better when doing a text search) to a NoSQL database, and it will give you the flexibility to modify your app logic without being bound to a concrete data structure.

As you find the most common usage of your data (and if you ever find it's not meeting your performance needs), you can then proceed to de-normalise the data by outputting to a set format that can be loaded into (and retrieved from) a NoSQL schema.