Effect of drive count on RAID-5

Posted in Storage Interconnects & RAID, Advisor - Tom by Tom Treadway

Question to the Storage Advisors, from Dean: What are the practical limitations on the number of disks in a RAID-5 set? I understand that a larger number of disks worsens the probabilities of bad things like a failure or paying the worst possible seek/delay cost in random accesses. Do you have any mathematical models for evaluating these penalties?

Dean, there are several ways to look at this issue. Some of the areas I point out below are pretty obvious, but for the sake of completeness I’ll cover it all:

Storage efficiency: As the disk count increases, so does the storage efficiency. This is because there is one disk's worth of redundancy (parity) per array. For example, a 3-disk RAID-5 has one disk's worth of parity and two disks' worth of usable space, therefore the efficiency is 67%, i.e., 67% of the total disk space is available for user data. Likewise, a 10-disk RAID-5 has an efficiency of 90%. As a formula, it looks like this:

Efficiency = (DiskCount-1) / DiskCount
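
If you'd like to plug in your own numbers, here's a quick Python sketch of that formula (the function name is just mine for illustration):

# RAID-5 storage efficiency: one disk's worth of parity per array
def raid5_efficiency(disk_count):
    return (disk_count - 1) / disk_count

for n in (3, 5, 10):
    print(n, "disks:", round(raid5_efficiency(n) * 100), "% usable")   # 67%, 80%, 90%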

Degraded performance: A degraded RAID-5 is an array with a failed disk. If the user tries to read a block on the failed disk the RAID software will have to access all the other disks in the array to reconstruct that missing data. However if the user tries to read a block on one of the remaining good disks then nothing special happens. The data is simply read from the disk.

So let’s go back to the 3-disk example and assume a single failed disk. And let’s assume that the user is reading just one block, and let’s see how that read translates into one or more disk accesses. If the user reads the two good disks then each read will be converted to just one disk read. However, if the user reads the bad disk then that read will turn into two disk reads (from the good disks). So on average, three random disk reads (one per disk) will result in four disk IOs, an increase in IOs of 33% over the optimal array case. Now let’s look at the 10-disk array. A read from nine of the disks will result in just one IO each; however, a read to the bad disk will result in nine IOs, i.e., one read from each of the remaining good disks. So ten random reads will result in 18 disk IOs, an increase of 80%. As a formula, it looks like this:

IOIncrease = ((DiskCount-1) + (DiskCount-1) - DiskCount) / DiskCount

which reduces to:

IOIncrease = (DiskCount-2) / DiskCount
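
Here's the same degraded-read bookkeeping as a small Python sketch, in case you want to play with other disk counts (the names are mine; it simply mirrors the reasoning above: one IO per read to a good disk, DiskCount-1 IOs per read to the failed disk):

# Degraded RAID-5 random reads, spread evenly across the disks.
def degraded_read_io_increase(disk_count):
    good_disk_ios = disk_count - 1      # one IO per read to a good disk
    failed_disk_ios = disk_count - 1    # reconstruct the missing block from the good disks
    optimal_ios = disk_count            # one IO per read when the array is optimal
    return (good_disk_ios + failed_disk_ios - optimal_ios) / optimal_ios

print(degraded_read_io_increase(3))    # ~0.33 -> 33% more IOs
print(degraded_read_io_increase(10))   # 0.8   -> 80% more IOs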

Now let’s look at writes - they’re a little more complicated. Each host write will typically result in four disk IOs – two reads and two writes. One read and write will be on a data disk while the other read and write will be on a parity disk. We’ll need to see what happens if any of those IOs go to a bad disk. If a read touches a bad data disk then all the other disks will have to be read, just as in the previous example. However the data disk won’t be written, obviously, because the disk is bad. Therefore the read becomes (DiskCount - 1) reads, and the write just goes away. The parity disk still has both the read and write. So the total number of IOs is (DiskCount - 1) + 2.

If the parity for that stripe lives on the bad disk, then nothing special happens. There is no parity to update and therefore the write to the data disk is just a plain ol’ write. The total number of IOs will drop from four to just one.

OK, to summarize, normally a write causes 4 IOs. However if the data disk is bad the total IOs will increase to (DiskCount + 1). Likewise, if the parity disk is bad the total IOs will decrease to 1.

If we go back to our example of ten IOs spread evenly (one per disk) across ten disks, you’d see that eight host writes will result in 4 IOs each, one host write will result in 11 IOs and one will result in 1 IO, for a total of 44 IOs. In an optimal array all ten host writes would result in 4 IOs each, for a total of 40 IOs. That’s only an increase of 10%. The formula looks like this:

Increase = ((DiskCount-2)*4 + (DiskCount+1) + 1 - (DiskCount*4)) / (DiskCount*4)

which reduces to:

Increase = (DiskCount-6) / (DiskCount*4)

It’s interesting to note that if the disk count is exactly 6 then the increase is zero and the total IOs don’t change. If the disk count is less than 6 then the total IOs actually drop!
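
For completeness, here's the write-side bookkeeping as a Python sketch (again using my own names, and following the 4-IO read-modify-write model described above, including the crossover at six disks):

# Degraded RAID-5 writes, one host write landing on each disk:
# a normal stripe costs 4 IOs, a stripe whose data disk is bad costs
# disk_count + 1 IOs, and a stripe whose parity disk is bad costs 1 IO.
def degraded_write_io_increase(disk_count):
    degraded_ios = (disk_count - 2) * 4 + (disk_count + 1) + 1
    optimal_ios = disk_count * 4
    return (degraded_ios - optimal_ios) / optimal_ios

for n in (3, 6, 10):
    print(n, "disks:", degraded_write_io_increase(n))   # -0.25, 0.0, 0.1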

Also, you may have noticed that I conveniently left out the XOR time. In general, assuming that the controller doesn’t have a memory bottleneck and the XOR isn’t done in software, the XOR time is relatively small compared to the disk time, so it can simply be left out of the equations.

Rebuild time: [Note that this section was reworded on July 18, 2007. An observant reader had noticed that I was in the weeds. :-) ] When a bad disk is replaced it is re-created by reading from all the other disks in the array. Luckily these reads can be issued in parallel, and therefore the rebuild time does NOT increase linearly as the disk count increases. In other words, rebuilding a 3-disk array requires reading two entire disks and writing a third disk. Likewise, rebuilding a 10-disk array requires reading nine entire disks and writing a tenth disk. The time to read two disks in parallel should be roughly the same as the time to read nine disks in parallel. There are some additional complications regarding XOR and potential limitations in the hardware, but they don’t have much effect on the rebuild time and probably aren’t interesting enough to repeat here.
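
To put a rough number on it, here's a back-of-the-envelope Python sketch; the sustained throughput figure is purely an assumed example (real disks, controllers, and competing host IO will change it):

# Rough rebuild-time estimate: the reads happen in parallel, so the rebuild is
# paced by streaming one disk's worth of data, regardless of disk count.
def rebuild_hours(disk_capacity_gb, sustained_mb_per_sec=60):   # assumed throughput
    seconds = disk_capacity_gb * 1000 / sustained_mb_per_sec
    return seconds / 3600

print(round(rebuild_hours(1000), 1), "hours for a 1TB disk")    # ~4.6 hours, whether 3 disks or 10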

Second disk failure: There is a small chance that another disk will fail before the first one is replaced. To a first approximation, the chance of the array failing is simply the chance of a single disk failing multiplied by the number of disks in the array. Therefore the more disks in the array, the higher the likelihood of at least one disk in the array failing. BTW, the chance of a disk failing is inversely proportional to the MTBF (Mean Time Between Failures) of the disk. MTBF is a common way of indicating disk reliability. Here’s the simple formula:

ChanceOfArrayFailure = ChanceOfDiskFailure * DiskCount
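
If you want to turn that into an actual number, here's a hedged Python sketch; approximating the per-disk failure chance as the exposure window divided by the MTBF is my own simplification, and the example MTBF and window are just illustrative:

# Chance of some disk in the array failing during a given exposure window.
# Approximation: per-disk chance ~= window_hours / mtbf_hours (fine for small values).
def chance_of_array_failure(disk_count, mtbf_hours, window_hours):
    chance_of_disk_failure = window_hours / mtbf_hours
    return chance_of_disk_failure * disk_count

# e.g. 10 disks, 500,000-hour MTBF, a 24-hour rebuild window (assumed numbers):
print(chance_of_array_failure(10, 500_000, 24))   # ~0.00048, i.e. about 0.05%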

Some folks like to correlate multiple failures, meaning that if one disk fails then there is a higher chance (such as 10X) that a second disk will fail – possibly due to power supply problems, overheating, or shared cable issues (as with parallel SCSI). But these chances are extremely small and it’s not worth going into that detail again here. A more thorough review can be found here.

Bit error during rebuild: This is probably the biggest negative with extremely large RAID-5 arrays. Every sector on the disk has a very small chance of being unreadable, even using error correction codes (ECC). This is referred to as the Bit Error Rate (BER). A typical low-end SATA disk will have an uncorrectable bit error for every 10^14 bits read, and a typical high-end SAS disk will have an uncorrectable bit error for every 10^15 bits. These seem like big numbers, but keep in mind that there are almost 10^13 bits in a 1TB disk. That means that you’re almost guaranteed to get a bit error if you read from ten SATA disks.

So what does it mean if you get a bit error? Basically it means that one entire 512-byte sector will be unreadable, which further means that the corresponding sector on the disk being rebuilt can’t be reconstructed. The bottom line is that you’ve just lost data. A good RAID controller will mark that sector “offline”, allowing the OS to get an error, which will cause the user to restore the corresponding file from backup. A bad RAID controller will either ignore the error (causing hidden data corruption) or abort the rebuild, leaving the user one disk failure away from total loss of all data.

Going back to our 10-disk example, let’s assume that we’re using 1TB SATA disks with a total of 8.8×10^12 bits each and an error rate of one error per 10^14 bits. We’ll have to read nine disks to rebuild the bad disk, resulting in 7.9×10^13 total read bits. Divide that by the 10^14 bit error rate and you have a 79% chance of getting a bit error. In other words, you probably won’t be able to rebuild the array!

(Note that the situation is often not that horrific. A good RAID controller will perform a continuous background media check looking for bit errors before the disk fails. If one is found then it is repaired while the array is still optimal. It’s difficult to say how much that improves your chances of rebuilding, but it’s generally accepted that background media checks are a “good thing”.)

Here’s the resultant formula:

ChanceOfDataLoss = (DiskCount-1) * DiskCapacityInBits / BER
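
And here it is as a Python sketch, reproducing the 10-disk SATA example (the names are mine, and it treats the expected number of bit errors as a probability, just like the formula above):

# Chance of hitting an unrecoverable bit error while reading the surviving disks.
def chance_of_data_loss(disk_count, disk_capacity_bits, ber):
    bits_read = (disk_count - 1) * disk_capacity_bits
    return bits_read / ber

print(chance_of_data_loss(10, 8.8e12, 1e14))   # ~0.79, the 79% chance from the example above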

In addition to minimizing the chance of a bit error by performing the background scan, it’s common to divide the disks into multiple smaller RAID-5 arrays combined with striping – more commonly referred to as RAID-50. All of the equations above can be easily adapted to a RAID-50 configuration.
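
For instance, splitting those ten disks into two 5-disk RAID-5 legs means a rebuild only has to read the four surviving disks of the failed leg; here's a hedged sketch of what that does to the data-loss chance (same assumed capacity and BER as above):

# RAID-50: a rebuild only reads the surviving disks of the failed RAID-5 leg.
def raid50_chance_of_data_loss(disks_per_leg, disk_capacity_bits, ber):
    bits_read = (disks_per_leg - 1) * disk_capacity_bits
    return bits_read / ber

print(raid50_chance_of_data_loss(5, 8.8e12, 1e14))   # ~0.35, versus ~0.79 for one 10-disk RAID-5

The trade-off, of course, is storage efficiency: each leg carries its own parity disk, so the 2x5 RAID-50 is 80% efficient instead of 90%.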

Dean, I hope that helps answer your question. I realize it was WAY more information than you were looking for, but I was on a roll. :)
