0

I am in charge of a mysql database which is about 12 gb, and it grows pretty quickly (was 3gb a few months ago). I wanted to know what measures i can take when the database grows even larger (lets say the 50-100gb area) to keep the performance good. I heard about sharding and archiving but it was pretty overwhelming so i was hoping to get some clarification here. Thank you very much

user69153
  • 157
  • 2
  • 12

2 Answers2

2

It depends on your table structures, data layout and usage pattern. However I wouldn't consider 12 gb or actually anything that can fit into the RAM of a reasonable priced commodity server big data (~100-400 Gb). I work with MySQL servers happily running with terabytes of data. If your tables are well designed, indexed and your queries are well written you won't see issues.

After that it's usually much easier to split your dataset by common usage pattern and logical dependencies. Move tables to a different database. For example split your log tables to a new server. You can repeat that as long as want until you reach the write limit of a single server and your cannot split your database anymore (all tables are strongly related). Then you can think about sharding.

Károly Nagy
  • 3,080
  • 1
  • 14
  • 13
1

InnoDB, not MyISAM. Subject closed.

Shard? Only if you have a logical way to split the data across machines. For example, are there "users" that are "independent" of each other? That is, can all the info about two users be sitting two separate servers, with no need to JOIN or otherwise combine data between the users? If so, Sharding may be in your future.

The benefit of sharding is to scale (size or writes) beyond what one server can handle. A few GB is not a problem. You did not imply that your queries will melt down a single server.

Partitioning? This does a similar thing, but on a single machine. However, partitioning does not provide any intrinsic benefit. Will you eventually be purging "old" data? If so, the partitioning on 'time' and using DROP PARTITION is a good reason to partition. That and 3 other cases are in my partitioning blog.

If you want a more specific answer, please provide more information about your data and your application. Give us more clues of the type of data, schema, queries to be performed, data flow, etc.

If it is a Data Warehouse app, then Normalization (is that what you meant by 'extensions'?) and Summary Tables are a must.

The sooner you decide to partition or shard, the easier it will be. But if you do it without careful thought, it could be a fiasco. Or, let's say, "a costly learning experience".

Rick James
  • 80,479
  • 5
  • 52
  • 119