RAID Stripe Width
Posted by Steve Rogers
Does the number of disks in a RAID-5 array affect the performance of the array?
I received a question from a reader wanting to know if the number of drives in a RAID set affects the performance of the array. The short answer is yes, it most certainly does!
The number of disks in the array is commonly referred to as the stripe width: the number of parallel stripes that can be written to or read from simultaneously. A four-disk striped array, for example, has a stripe width of four. Read and write performance of a striped array increases as stripe width increases, all else being equal, because adding more drives increases the parallelism of the array, allowing more drives to be accessed at the same time. As an example, you will generally get better transfer performance from an array of eight 18 GB drives than from an array of four 36 GB drives of the same drive family, all else being equal.
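To put rough numbers on that parallelism, here is a minimal back-of-the-envelope sketch in Python. The 60 MB/s per-drive figure is purely an assumption for illustration, not a spec for any particular drive family:

```python
# Back-of-the-envelope model of sequential transfer rate vs. stripe width.
# The per-drive figure below is an assumed number for illustration only.

PER_DRIVE_MBPS = 60  # assumed sustained sequential throughput of one drive

def array_throughput(stripe_width, per_drive=PER_DRIVE_MBPS):
    """Ideal aggregate throughput: all stripes are read/written in parallel."""
    return stripe_width * per_drive

for width in (4, 8):
    print(f"{width}-drive stripe: ~{array_throughput(width)} MB/s ideal")
```

With those made-up numbers, doubling the stripe width from four drives to eight doubles the ideal transfer rate, which is exactly the eight-versus-four drive example above.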
I will also point out that widening your stripe is a two-edged sword, especially with today's disk capacities climbing, soon to reach 1 TB per drive. The more high-capacity drives you have in a RAID set, the more likely it is to hit bit errors and/or a drive failure that requires a RAID rebuild, and rebuild/reconstruction times on these drives can be very long. I refer you to my astute colleague, Tom Treadway, who has written several blog posts about how MTTDL decreases as drive count increases, mostly due to BER. Rather than repeat him here, I'll point you to his articles on our blog page: "NOT everybody loves SATA," "Is RAID-6 made of wood?," "Real-life RAID reliability," and "RAID reliability calculations."
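For readers who want the flavor of those calculations without digging through the posts, here is a simplified sketch using the classic RAID-5 MTTDL approximation and a Poisson estimate of hitting an unrecoverable read error during a rebuild. Every constant here (MTBF, rebuild time, BER, drive size) is an assumption chosen for illustration, not a measured value:

```python
import math

# Rough RAID-5 reliability arithmetic in the spirit of Tom's MTTDL posts.
# All figures below are assumptions for illustration, not vendor specs.

MTBF_HOURS = 1_000_000   # assumed drive MTBF
MTTR_HOURS = 24          # assumed rebuild (repair) window
BER = 1e-14              # assumed unrecoverable bit error rate, per bit read
DRIVE_BITS = 1e12 * 8    # a 1 TB drive expressed in bits

def raid5_mttdl(n_drives):
    """Classic RAID-5 MTTDL: data is lost if a second drive fails
    during the rebuild window of the first."""
    return MTBF_HOURS ** 2 / (n_drives * (n_drives - 1) * MTTR_HOURS)

def rebuild_ure_probability(n_drives):
    """Chance of at least one unrecoverable read error while reading
    the n-1 surviving drives end to end during a rebuild."""
    bits_read = (n_drives - 1) * DRIVE_BITS
    return 1 - math.exp(-bits_read * BER)

for n in (4, 8, 12):
    print(f"{n} drives: MTTDL ~{raid5_mttdl(n):,.0f} h, "
          f"URE during rebuild ~{rebuild_ure_probability(n):.0%}")
```

Even with these generous numbers, the trend is the point: MTTDL falls faster than linearly as you add drives, while the odds of tripping over a bit error mid-rebuild climb steadily.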
From a performance perspective, my testing has shown that a stripe width of 8 to 12 drives is usually the practical maximum for most controllers or storage arrays. Beyond that, you are usually saturating the interface to the host or the back end of the storage array. Moreover, the more RAID sets you have, the more a failure is isolated to a single data set, leaving the data on the other RAID sets untouched. If you do go larger, higher RAID levels like RAID 6, 50, or 60 are more data-resilient choices.
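Extending the earlier sketch with an interface ceiling shows why the scaling flattens out. The 1000 MB/s host-link figure is again an assumed number for illustration, not a specific controller's rating:

```python
# Extending the earlier model: real arrays hit an interface ceiling.
# Both constants are assumed numbers for illustration only.

PER_DRIVE_MBPS = 60    # assumed throughput of one drive
HOST_LINK_MBPS = 1000  # assumed host interface / back-end bandwidth

def effective_throughput(stripe_width):
    """Throughput scales with stripe width until the host interface
    (or the array back end) saturates."""
    return min(stripe_width * PER_DRIVE_MBPS, HOST_LINK_MBPS)

for width in (4, 8, 12, 16, 24):
    print(f"{width:>2} drives: ~{effective_throughput(width)} MB/s")
```

With these particular made-up figures the ceiling lands around 16 drives; real controllers and back ends usually top out sooner, which is where the 8-to-12 drive observation comes from.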
There really aren't many practical reasons to go with a much larger stripe width unless you have a hugely performance-dependent application, such as real-time data capture, holding temporary or transitory data: data you don't plan on keeping there very long, at least not without backing it up or having a copy somewhere else.