I have a Silicon Image 3114 SATA RAID controller built into my Asus motherboard, and I was trying to decide on the optimal stripe size for my 4-disk RAID 5 array. There are plenty of articles online about the tradeoffs between large and small stripe sizes, but I haven’t yet found a clear guide that applies specifically to my situation. Most guides say that a small stripe size is better for large files, and a large stripe size is better for numerous small files.
I’m not sure, but I would guess that a small stripe size is better for large files because it gives finer-grained streaming and more speed, since the reads and writes are closer in size to the 512 bytes of a typical disk sector. (I wonder how RAID 5 will perform on the upcoming 4KB-sector disks.) I am also guessing that a large stripe size is better for smaller files because it reduces the number of seeks needed when multiple files are retrieved at once. The largest stripe available on my controller was 128KB, and that’s a lot of text at once if I were running a database.
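To make the seek argument a bit more concrete, here is a minimal Python sketch that counts how many stripe chunks a single read touches. It is only an illustration: it treats the "stripe size" as the per-disk chunk, ignores parity placement, and the function name `chunks_spanned` is made up for this example.

```python
KB = 1024

def chunks_spanned(offset, length, chunk_size):
    """Count the stripe chunks touched by a read of `length` bytes at `offset`."""
    if length <= 0:
        return 0
    first = offset // chunk_size                 # chunk holding the first byte
    last = (offset + length - 1) // chunk_size   # chunk holding the last byte
    return last - first + 1

# Compare how a 64KB read is split up under different chunk sizes.
for chunk_kb in (16, 64, 128):
    n = chunks_spanned(0, 64 * KB, chunk_kb * KB)
    print(f"{chunk_kb}KB chunks: a 64KB read touches {n} chunk(s)")
```

With small chunks the same read is spread over several chunks (and so several disks), while with large chunks it tends to stay on one disk, leaving the others free to serve different files.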
The best guide I found for RAID 5 stripe sizing
So with all this inconclusive information, I just had to make a quick decision and hope for the best. I could try a bunch of different numbers and benchmark each one, but really all I’m after is the fault tolerance, because right now all my data is on single disks: Dangerous with a capital D.
This is what I decided: since I’m encrypting my entire disk as a controller and disk stress test, I had to take into consideration the encryption scheme, which gives me the option of several different cluster sizes. A cluster is a packet of encrypted data that will be written to the disk at once, so it determines the minimum actual size a file can have on the disk. If the largest TrueCrypt cluster size is 64KB, then the ideal 4-disk RAID 5 stripe size is 16KB, because each cluster written by TrueCrypt will theoretically be spread evenly across all four drives: 16KB on each, the perfect size to fill one stripe. Even better, the TrueCrypt website says that the larger the cluster size, the better the performance.
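The 64KB / 4 = 16KB arithmetic is worth checking against how RAID 5 lays out data. The sketch below assumes the controller's "stripe size" setting is the per-disk chunk and that the array is standard RAID 5, where one chunk in every stripe row holds parity and only the remaining chunks hold data. Whether the 3114 behaves exactly this way is an assumption, and the function names are hypothetical.

```python
KB = 1024

def raid5_row_data(n_disks, chunk_size):
    """Bytes of user data in one RAID 5 stripe row (one chunk per row is parity)."""
    return (n_disks - 1) * chunk_size

def describe_cluster(cluster_size, n_disks, chunk_size):
    """Summarise how one cluster maps onto RAID 5 stripe rows."""
    row_data = raid5_row_data(n_disks, chunk_size)
    full_rows, leftover = divmod(cluster_size, row_data)
    return {
        "data_chunks": -(-cluster_size // chunk_size),  # ceiling division
        "row_data_kb": row_data // KB,
        "full_rows": full_rows,
        "leftover_kb": leftover // KB,
    }

# The configuration from this post: 4 disks, 16KB chunks, 64KB clusters.
info = describe_cluster(64 * KB, n_disks=4, chunk_size=16 * KB)
print(info)
```

Under those assumptions, a 64KB cluster does occupy four 16KB data chunks, but each stripe row holds only 48KB of data, so the cluster fills one row and spills 16KB into the next rather than landing on exactly one stripe.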
I’m aware of the huge performance hit that 4KB-sector disks take when they are emulating 512-byte sectors and the sectors are misaligned, but I don’t think there will be a similar issue in my system, and here’s why:
Most of the time I’ll be transferring large files with thousands of clusters each.
The 4 physical hard drives will always be reading the normal 16KB at once, because it’s the size of one stripe.
The four 16KB pieces will be combined by the RAID controller as usual.
TrueCrypt will wait until a full cluster is available before decrypting.
Most likely it will receive a full cluster with each stripe read from the controller, but if not, it only has to wait one more read cycle before the cluster is complete, and that shouldn’t add much latency at all.
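The misalignment problem mentioned above comes down to a simple modulo check, sketched here in Python. The 512-byte logical and 4KB physical sector sizes come from the discussion above. The two partition start sectors (63 and 2048) are illustrative values chosen for this example, not measurements from my system, and the helper names are made up.

```python
LOGICAL_SECTOR = 512     # bytes per sector as presented to the OS
PHYSICAL_SECTOR = 4096   # bytes per sector on a 4KB-sector disk

def is_aligned(offset_bytes, boundary=PHYSICAL_SECTOR):
    """True if a byte offset falls exactly on a physical-sector boundary."""
    return offset_bytes % boundary == 0

def partition_aligned(start_lba):
    """True if a partition starting at this 512-byte LBA is 4KB-aligned."""
    return is_aligned(start_lba * LOGICAL_SECTOR)

# A 16KB stripe is a whole number of 4KB sectors.
print("16KB stripe aligned:", is_aligned(16 * 1024))

# Hypothetical partition start sectors, for illustration only.
for lba in (63, 2048):
    print(f"partition at LBA {lba} aligned:", partition_aligned(lba))
```

The stripe size alone does not settle the question: a 16KB stripe is a multiple of 4KB, but the reads only line up with physical sectors if the partition itself starts on a 4KB boundary.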
All in all, I think I’ve found the perfect balance between fault tolerance, high availability, security, and performance given the software and hardware available to me. I’m really looking for input from readers on this post, so please leave a comment if you spot technical errors or have ideas about how I can increase my system’s performance.