
I'm an administrator of a social game that uses MySQL (Percona 5.1.56, to be precise) for data storage; all tables use the InnoDB engine. The game has about 2 million players, and the database is about 100 GB and growing steadily. A few tables already have more than 500 million records.

The game database runs pretty smoothly, even unsharded, on a single reasonably powerful, non-virtualized Debian 6 Linux server (24 GB RAM, hardware Adaptec RAID-10) with a couple of read-only slaves. The problem is that every month or two MySQL crashes with data corruption like the following:

 InnoDB: Database page corruption on disk or a failed
 InnoDB: file read of page XXXX.
 InnoDB: You may have to recover from a backup.

Recovering from such errors is quite painful. It usually requires promoting one of the slaves to be the new master, redirecting traffic to it, and creating a new backup slave for it. There is some downtime, which makes players really mad...
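
Because downtime depends partly on how quickly the corruption is noticed, a log check can help. The following is a minimal sketch, not a Percona or MySQL tool: it assumes the error log contains the message text quoted above, with the page number on the same line or the next one. The function name `find_corruption_events` and the sample log are hypothetical.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Text InnoDB writes when a page read fails its checks (as quoted above).
CORRUPTION_MARKER = "Database page corruption on disk or a failed"
# The page number appears as "file read of page N", usually on the next line.
PAGE_RE = re.compile(r"file read of page (\d+)")


def find_corruption_events(lines: Iterable[str]) -> List[Tuple[int, Optional[int]]]:
    """Return (1-based line number, page number or None) for each corruption report."""
    lines = list(lines)
    events: List[Tuple[int, Optional[int]]] = []
    for i, line in enumerate(lines):
        if CORRUPTION_MARKER not in line:
            continue
        # Search this line plus the following one for "file read of page N".
        window = line + (lines[i + 1] if i + 1 < len(lines) else "")
        match = PAGE_RE.search(window)
        events.append((i + 1, int(match.group(1)) if match else None))
    return events


# Hypothetical sample log; the page number 7 is made up for illustration.
sample = """\
InnoDB: Database page corruption on disk or a failed
InnoDB: file read of page 7.
InnoDB: You may have to recover from a backup.
"""

print(find_corruption_events(sample.splitlines()))  # prints [(1, 7)]
```

To use it on a real server, pass an open file object for the log named by the `log_error` setting, e.g. `find_corruption_events(open(path))`, and alert on any non-empty result.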

The Percona folks told me it was the hardware's fault, and at first I blamed the hardware too, but after replacing several servers I really don't know what to think.

Is there any chance MySQL itself is corrupting the data? I've already started looking at alternatives (e.g. PostgreSQL, or even something radical like Cassandra), but of course every new product has its own baggage of bugs and quirks, not to mention the cost of migration...

I'm pulling my hair out (today I hit another crash), so if you have any ideas, please share...

pachanga

1 Answer


We have been running MySQL (and the Percona version in the past) for several years, with databases of up to 300 million rows and multiple read slaves. Every time I have seen this sort of issue, it was related to hardware: most frequently bad drives, bad drive controllers, or bad RAID controllers.

What kind of storage are you using? If you are using commodity hard drives, even in a RAID configuration, at your I/O levels you are going to see failures sooner than the drives' typical MTBF ratings suggest.
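
To put MTBF figures in perspective, here is a rough back-of-the-envelope sketch. It assumes a constant per-drive failure rate of 1/MTBF (exponential model), independent drives, and prompt replacement. The 8-drive array and 500,000-hour MTBF are hypothetical numbers, not taken from the question. This only gives the spec-sheet baseline; the point above is that sustained heavy I/O tends to make real-world rates worse.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760


def expected_failures_per_year(drive_count: int, mtbf_hours: float) -> float:
    """Expected drive failures per year across the array.

    Assumes a constant failure rate of 1/MTBF per drive, independent drives,
    and prompt replacement of failed drives (so failures form a Poisson process).
    """
    return drive_count * HOURS_PER_YEAR / mtbf_hours


def prob_any_failure_per_year(drive_count: int, mtbf_hours: float) -> float:
    """Probability that at least one drive fails within a year, same assumptions."""
    return 1.0 - math.exp(-drive_count * HOURS_PER_YEAR / mtbf_hours)


# Hypothetical numbers: an 8-drive RAID-10 with a rated MTBF of 500,000 hours.
drives, mtbf = 8, 500_000
print(f"expected failures/year: {expected_failures_per_year(drives, mtbf):.2f}")  # 0.14
print(f"P(at least one failure in a year): {prob_any_failure_per_year(drives, mtbf):.2f}")  # 0.13
```

Even at the rated MTBF, roughly one year in seven or eight would see a drive failure in such an array, and every additional drive, controller, or server raises that baseline.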

Craig