I am planning to set up a RAID array for scratch space in a computational server (16 cores, 128 GB RAM). The users will routinely create large (500 GB) MySQL InnoDB databases and store them temporarily in the scratch space. The databases are filled with data from a cluster, which may have up to 1000 MySQL clients connected to a database at once. The RAID controller is a PERC H710 integrated controller with a 512 MB non-volatile cache.

Since the storage is temporary, I am planning to use RAID 0 for read/write performance. The remaining question is whether to use 8 x 7,200 RPM disks or 4 x 15,000 RPM disks. One typical use pattern will be that once a database is created, there will be very few writes to it. There will be a lot of reads for analysis, so the 15K seek time would help here. However, I do not know how the RPM improvement stacks up against the RAID 0 striping speed-up from the extra disks.

Ignoring drive capacity as a factor, which setup would be preferable, 8 x 7200 RPM drives or 4 x 15000 RPM drives? I apologize if this type of question does not have a clear answer.

Edit: I have not looked into how much the RAID controller will limit the effective throughput based on the number of disks in the array yet.

2 Answers

Lots to address here, from design through to pricing and the attributes of the related technologies.

Let's assume the reason you're choosing between 8 x 7,200 RPM nearline disks and 4 x 15k enterprise disks is cost. Let's also assume that you're talking about 2.5" small form-factor disks...

I rarely buy 15k disks these days because if latency and random I/O performance are paramount, I go to SSD-based solutions. Your capacity needs aren't tremendous, so just use 6 or 8 10k RPM enterprise disks. They have a better performance and capacity profile than the 7,200 RPM disks and are a better value than the 15k enterprise disks. Right now, 600GB and 900GB 10k SAS 2.5" disks are around the same price as 1TB 7,200 RPM 2.5" drives.

How much usable storage space do you actually need? In the 2.5" disk world, capacities are:

  • 7,200 RPM - 500GB, 1TB
  • 10,000 RPM - 72GB, 146GB, 300GB, 450GB, 600GB, 900GB, 1.2TB
  • 15,000 RPM - 72GB, 146GB, 300GB

But there's the academic side of this question. If the read/write profile is sequential, the 8 x 7,200 RPM drives win on throughput because of spindle count. If it's random, it's more complicated. The edge would still go towards the 8 slower disks, but not by much.
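A back-of-the-envelope sketch of that comparison, in Python. Everything here is an assumption for illustration: the model treats random I/O service time as average seek plus half a rotation, treats sequential throughput as scaling linearly with spindle count up to an optional controller ceiling, and ignores queue depth, controller cache, and stripe size. The seek times and per-disk MB/s in the example call are placeholder values, not measurements of any particular drive, so substitute figures from the datasheets of the disks being considered.

```python
def rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency in ms: the time for half a revolution."""
    return 0.5 * 60_000.0 / rpm


def random_iops_per_disk(rpm: float, avg_seek_ms: float) -> float:
    """Rough random IOPS for one disk: 1 / (avg seek + rotational latency)."""
    service_ms = avg_seek_ms + rotational_latency_ms(rpm)
    return 1000.0 / service_ms


def raid0_random_iops(disks: int, rpm: float, avg_seek_ms: float) -> float:
    """Idealized RAID 0 random IOPS: per-disk IOPS times spindle count."""
    return disks * random_iops_per_disk(rpm, avg_seek_ms)


def raid0_sequential_mb_s(disks, per_disk_mb_s, controller_limit_mb_s=None):
    """Idealized RAID 0 streaming rate, optionally capped by the controller."""
    total = disks * per_disk_mb_s
    if controller_limit_mb_s is not None:
        total = min(total, controller_limit_mb_s)
    return total


if __name__ == "__main__":
    # Placeholder figures (assumptions): avg seek in ms and streaming MB/s.
    candidates = {
        "8 x 7,200 RPM": dict(disks=8, rpm=7200, avg_seek_ms=8.0, mb_s=110.0),
        "4 x 15,000 RPM": dict(disks=4, rpm=15000, avg_seek_ms=3.5, mb_s=170.0),
    }
    for name, c in candidates.items():
        iops = raid0_random_iops(c["disks"], c["rpm"], c["avg_seek_ms"])
        seq = raid0_sequential_mb_s(c["disks"], c["mb_s"])
        print(f"{name}: ~{iops:.0f} random IOPS, ~{seq:.0f} MB/s sequential")
```

In this model the random I/O totals for the two arrays land fairly close together, and modest changes to the assumed seek times can flip which one comes out ahead, so treat it as a way to frame the question and not as a verdict.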

If your working set of data fits within 1 Terabyte and is definitely scratch space, I'd just get a 960GB PCIe SSD (or two) and be done.

ewwhite

"Since the storage is temporary, I am planning to use RAID 0 for read/write performance"

You are wrong.

Mirroring isn't just about availability. It's also about reducing latency. If you're only doing sequential access on a single table, then mirroring is just going to slow down the writes. But with multiple users, multiple tables/indexes, or random reads, mirroring will improve performance.

If performance is the primary objective here then, like ewwhite says, why aren't you looking at SSDs?

There's more to the story than rotation speed and capacity. For a long time, vendors of "Enterprise" drives have justified the price differential on the grounds of reliability as well as performance, but there's a growing body of evidence that the reliability advantage does not hold up. On the other hand, enterprise drives do tend to behave better in failure modes: a commodity drive will try very hard to commit data to the disk instead of reporting an error promptly, which can play havoc with your MTTR. Hence using enterprise drives in an array can give better availability for the array as a whole.

The price differential has to be a consideration. IME, Enterprise drives are around 4 times the cost of basic drives but typically only offer double the performance.

Since you don't seem to be bothered about availability, then I'd recommend going with the cheaper drives - but do mirror them for performance.

symcbean