Choosing a RAID level for redundancy over performance

Introduction

When configuring a new storage setup, system administrators around the globe face the same dilemma: which RAID level to choose. As with most things in life, it’s a trade-off. We need to balance performance and redundancy, keeping in mind that losing raw capacity will result in increased costs.

Digital data-hoarders tend to be less focussed on performance (at least in terms of speed), and instead consider redundancy and uptime to be of prime importance. In other words, what matters most is that our data is safe and available. We’ll explore this in more detail later but, in brief, if reliability and redundancy are most important to you, RAID 6 is a good choice. RAID 5 comes in second place in most scenarios.

Of course, there are other factors to consider, such as the number of disks in your array, your workload (I’m assuming as a data-hoarder your writes are fairly infrequent), and your appetite for risk.

There are millions of articles online explaining the concept of RAID and what each RAID level means (not least of all the Wikipedia entries). It’s not my intention to regurgitate that information in this post, but instead to cover a few likely scenarios and express why a certain RAID level might be appropriate.

RAID 6 for redundancy

RAID 6 allows any two disks in the array to fail concurrently and the array will remain active. Whilst some other RAID levels can survive “up to two” disk failures, that depends on which disks fail; with RAID 6 you can lose any two disks and still be fine.
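To make the difference between “up to two disks” and “any two disks” concrete, here is a minimal Python sketch that enumerates every possible two-disk failure in a four-disk array. I’m assuming RAID 10 as the example of a level that only survives some two-disk failures, and the mirror-pair layout and function names are purely illustrative.

```python
from itertools import combinations

def raid6_survives(failed):
    # RAID 6 tolerates the loss of any two disks.
    return len(failed) <= 2

def raid10_survives(failed, mirror_pairs):
    # RAID 10 survives only while no mirror pair has lost both of its members.
    failed = set(failed)
    return all(not set(pair) <= failed for pair in mirror_pairs)

disks = [0, 1, 2, 3]
mirror_pairs = [(0, 1), (2, 3)]  # assumed layout: two mirrored pairs, striped

two_disk_failures = list(combinations(disks, 2))
raid6_ok = sum(raid6_survives(f) for f in two_disk_failures)
raid10_ok = sum(raid10_survives(f, mirror_pairs) for f in two_disk_failures)

print(f"RAID 6 survives {raid6_ok} of {len(two_disk_failures)} two-disk failures")
print(f"RAID 10 survives {raid10_ok} of {len(two_disk_failures)} two-disk failures")
```

In this layout, RAID 10 is lost whenever both failed disks belong to the same mirror pair, whereas RAID 6 doesn’t care which two disks fail.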

The price you pay for this is a loss of raw capacity (you lose the capacity of two disks in your array) and slower writes, as two sets of parity have to be calculated and written.

I mentioned that for data-hoarders, write speed isn’t always considered important, and whilst this is usually the case, there’s an important exception to bear in mind: rebuild times. When a disk (or, with RAID 6, perhaps even two disks) fails and is replaced, the system needs to rebuild the array. This involves calculating the missing blocks of data from the failed drive and writing them to the new drive. This calculation is computationally intensive and, depending on how RAID 6 is implemented, it can take a significant amount of time to bring an array back up to full redundancy following a disk failure. You might not be too worried about slow rebuild times, but it’s worth remembering that the longer a rebuild takes, the greater the chance that an additional disk will also fail. With RAID 6, I find that I don’t worry too much during a rebuild following a single failed disk because, although it takes a while to complete, I still have the safety buffer of being able to lose another disk without losing the array.
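To get a rough feel for how rebuild time affects risk, here is a back-of-the-envelope Python sketch. It rests on assumptions that are mine rather than measured figures: disks fail independently at a constant rate, and the 2% annual failure rate and 48-hour rebuild window are hypothetical. Real disks of the same age under rebuild stress are unlikely to fail independently, so treat the result as optimistic.

```python
import math

def p_additional_failures(remaining_disks, annual_failure_rate, rebuild_hours):
    """Return (P(at least one more failure), P(at least two more failures))
    among the remaining disks during the rebuild window.

    Assumes independent disks with a constant failure rate (a simplification).
    """
    hours_per_year = 24 * 365
    # Convert the annual failure rate into a constant per-hour hazard rate.
    hourly_rate = -math.log(1 - annual_failure_rate) / hours_per_year
    # Probability that a single disk fails within the rebuild window.
    p = 1 - math.exp(-hourly_rate * rebuild_hours)
    n = remaining_disks
    p_none = (1 - p) ** n
    p_exactly_one = n * p * (1 - p) ** (n - 1)
    return 1 - p_none, 1 - p_none - p_exactly_one

# Hypothetical 8-disk array with one disk already failed (7 remaining).
at_least_one, at_least_two = p_additional_failures(7, 0.02, 48)
print(f"P(at least one more failure): {at_least_one:.6f}")
print(f"P(at least two more failures): {at_least_two:.9f}")
```

With single parity, the first figure is the chance of losing the array during the rebuild; with RAID 6 and one disk already down, it takes two further failures, which is the second, much smaller figure.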

If you do choose to go with RAID 6, I would recommend using a high-quality, dedicated controller card. You want a controller that has been designed with RAID 6 use in mind and that’s going to be able to perform relatively fast writes, if for no other reason than to speed up the inevitable array rebuilds following failures. I would steer well clear of using a “fake RAID” or software-based solution, as it’s unlikely to be as efficient as a good dedicated hardware solution and it can impact overall system performance by consuming CPU cycles during any periods of intensive writes or a rebuild.

Advantages

The advantages of RAID 6 are:

  • The ability to lose two disks whilst maintaining the array for reads and writes
  • Reads are relatively fast (no noticeable detrimental impact)

Disadvantages

Some disadvantages of using RAID 6 are:

  • The reduced raw capacity of the array (in effect, the capacity of two disks is lost)
  • Writes are slow compared to other RAID levels or single-disks
  • A dedicated RAID controller is essential for most real-world use cases

Use cases

RAID 6 lends itself to situations where reliability, resilience, and uptime are of the utmost importance, or where writes are fairly infrequent. For example:

A Web server

Web hosting tends to be heavy on reads, with not too many writes (i.e. lots of users visiting a website to read or download content, with far fewer contributing or uploading to it). In addition to this, it’s usually important that a Web server remains online for extended periods of time and is unaffected by hardware failures.

Data-hoarding / Archiving

Most people doing data-archival (aka data-hoarding!) build their collections of TV shows, movies, and music over years and years, meaning that writing data (for example, adding a new show) doesn’t happen too often. On the other hand, they expect their content to be available almost 24/7, feeding Plex, Jellyfin, etc. Again, this makes RAID 6 a good choice if funds allow, as it gives extra breathing space in case of a second failed disk during an array rebuild.

RAID 6 real-world example

Aside from workload, the number of disks in your array and your willingness to sacrifice raw capacity will likely affect your decisions on whether or not to adopt RAID 6. The table below shows how using RAID 6 would affect your usable capacity and redundancy given a range of different disk quantities and raw capacities:

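Number of disks | Redundant disks | Capacity per disk (TB) | Raw capacity (TB) | Useable capacity (TB) | Lost capacity (TB) | Lost capacity (%)
----------------|-----------------|------------------------|-------------------|-----------------------|--------------------|------------------
4               | 2               | 4                      | 16                | 8                     | 8                  | 50
4               | 2               | 8                      | 32                | 16                    | 16                 | 50
4               | 2               | 14                     | 56                | 28                    | 28                 | 50
6               | 2               | 4                      | 24                | 16                    | 8                  | 33
6               | 2               | 8                      | 48                | 32                    | 16                 | 33
6               | 2               | 14                     | 84                | 56                    | 28                 | 33
8               | 2               | 4                      | 32                | 24                    | 8                  | 25
8               | 2               | 8                      | 64                | 48                    | 16                 | 25
8               | 2               | 14                     | 112               | 84                    | 28                 | 25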
Popular RAID 6 configurations, showing useable and lost capacity
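If your disk count or disk size isn’t in the table, the arithmetic is easy to reproduce. The following is a minimal Python sketch; the function name `raid_capacity` is my own, and it ignores filesystem overhead and the TB/TiB distinction, so real-world figures will be a little lower.

```python
def raid_capacity(num_disks, disk_tb, parity_disks):
    """Return (raw TB, useable TB, lost TB, lost %) for a parity-based array."""
    if num_disks <= parity_disks:
        raise ValueError("array needs more disks than it dedicates to parity")
    raw = num_disks * disk_tb
    lost = parity_disks * disk_tb      # capacity given up for redundancy
    useable = raw - lost
    lost_pct = round(100 * lost / raw, 1)
    return raw, useable, lost, lost_pct

# RAID 6 gives up two disks' worth of capacity.
for num_disks in (4, 6, 8):
    for disk_tb in (4, 8, 14):
        raw, useable, lost, lost_pct = raid_capacity(num_disks, disk_tb, parity_disks=2)
        print(f"{num_disks} x {disk_tb}TB: raw {raw}TB, useable {useable}TB, "
              f"lost {lost}TB ({lost_pct}%)")
```

The percentage lost depends only on the number of disks, not on their size, which is why larger arrays make RAID 6 easier to justify.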

RAID 5

RAID 5 is somewhat similar to RAID 6, except that it provides only one disk’s worth of redundancy (parity), rather than two. The benefit is that you lose less raw capacity with RAID 5 than you do with RAID 6. The main disadvantage, of course, is that you have less redundancy and resiliency. RAID 5 also requires a minimum of just three disks, compared with four for RAID 6.

RAID 5 is popular in industry, as well as with hobbyists and data-hoarders. If you have a good RAID monitoring system in place and are able to get hold of a replacement drive fairly quickly in case of a disk failure, then RAID 5 can be quite appealing.

The biggest risk with RAID 5 comes when one drive has failed and you’re waiting for the array to be rebuilt after installing a new replacement drive. Unlike with RAID 6, you have no breathing space, so to speak. If a second disk fails during the rebuild then the array is going to go offline and need to be restored from a backup.
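To show why a single failure is recoverable but a second one is not, here is a toy Python sketch of XOR parity, the technique conventionally used for RAID 5’s single parity. The block contents and the helper name `xor_blocks` are purely illustrative; a real array works on whole stripes and rotates the parity block across the disks.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe across a four-disk array: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the disk that held data[1]: XOR of the survivors rebuilds it.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[1])
```

Every surviving block is needed to recompute the missing one, so if a second disk drops out mid-rebuild there is no longer enough information to reconstruct either block.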

You might think that the chances of two disks failing in quick succession are slim but actually, it happens more often than any of us would care for. There are a few reasons for this.

Reasons multiple disks fail in a RAID 5 array

Similar ages and power-on hours of disks in the array

In many cases, all the disks in the array are the same make, model, and capacity, all manufactured at around the same time. This means that unfortunately, the risk of two failing at around the same point is increased. One way to mitigate this is to source the disks over a period of a few weeks or months and check that they’re all from different batches. If disks have failed and been replaced in the past at different points, then this is actually quite often a good thing. The disks will naturally be from different batches, manufactured at different times, with different levels of usage (“miles on the clock”, if you will).

Increased stress when rebuilding the array

Whenever a disk is replaced and the array needs to be rebuilt, it causes a lot of read and write activity across all disks, not just the new one. If the system has historically been only lightly used (as is often true with the likes of Plex servers and home NAS solutions), this rebuild activity will probably be the most action the disks have ever seen. If a disk is nearing its end-of-life, it’s quite possible that a heavy session of read/write activity will push it over the edge. As mentioned, with RAID 6 this is really a minor nuisance: it just means you now have to quickly replace a second disk and you’re out of pocket a bit more than you anticipated. With RAID 5, this is a major hassle, as a second disk failing during the rebuild is going to bring the array down.

High capacity / large disks

This is really a contributing factor to the two points already made above, but it’s worth mentioning. In recent years, the size of individual hard drives has massively increased. At the time of writing, it’s not unusual for a home user or data-hoarder to build a new array with 18TB disks. Whilst this is great in most respects (improved energy efficiency, reduced noise, etc.), there is a drawback that isn’t often discussed: it takes a long time to read or write the entire contents of these large disks, precisely because of their huge capacities.

For example, the Seagate Exos X18 has a sustained, sequential read or write speed of around 275MB/s. Whilst this is quite fast, because of its massive capacity (18TB), reading or writing its entire contents is going to take 18+ hours. Given the overhead of the RAID controller calculating parity, plus the fact that the reads and writes aren’t going to be sustained or sequential, it means that a rebuild of the array is likely to take several days to complete. Again, with RAID 6 you can relax a little knowing that even if another disk fails, the array will remain online. With RAID 5, unless you have nerves of steel, you’re probably going to be watching the rebuild progress, listening to all the disks spinning and clicking away at full load, praying that another disk failure doesn’t occur. If it does, of course, it’ll bring the array down with it.
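The 18+ hours figure is simple arithmetic, sketched below in Python using the 18TB and 275MB/s numbers from above. The 75MB/s “effective” rate in the second calculation is an arbitrary assumption of mine to illustrate a rebuild running under load, not a measured value.

```python
def full_pass_hours(capacity_tb, throughput_mb_s):
    """Hours needed to read or write an entire disk at a constant rate."""
    capacity_bytes = capacity_tb * 10**12        # drive makers use decimal TB
    bytes_per_second = throughput_mb_s * 10**6   # and decimal MB/s
    return capacity_bytes / bytes_per_second / 3600

best_case = full_pass_hours(18, 275)
print(f"Best case at 275MB/s: {best_case:.1f} hours")   # 18.2 hours

# Hypothetical effective rate during a rebuild on a busy array (assumption).
degraded = full_pass_hours(18, 75)
print(f"At an assumed 75MB/s: {degraded / 24:.1f} days")
```

The best case is a floor rather than an estimate: any drop in effective throughput stretches the rebuild proportionally.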

Advantages

There are a couple of advantages to RAID 5 over some of the other RAID levels:

  • It works with a minimum of three disks. This is useful for hobbyists in particular where the cost of four disks might be prohibitive
  • Writes are relatively fast as they’re striped across multiple disks. They’re faster than RAID 6 as the parity calculation is less computationally expensive

Disadvantages

The main disadvantages of RAID 5 are:

  • Reduced redundancy compared to RAID 6 (RAID 5 can only lose one disk)
  • A dedicated RAID controller is really a must. As with RAID 6, using a “fake RAID” or software solution is ineffective and risks corruption in the event of power loss

Use cases

RAID 5 is useful in cases where redundancy and resiliency are important, yet not paramount. For data-hoarding and general home use, RAID 5 is often deemed sufficient. For example, if you’re looking for a RAID level for your home NAS that holds your movies, TV shows, and music for Plex/Jellyfin, then RAID 5 is a good candidate. If a drive were to fail, you wouldn’t want to be without your media for an extended period and, with RAID 5, you wouldn’t be: the array stays online while the drive is replaced. If a second disk were to fail before the rebuild completes, then yes, you would experience some downtime whilst restoring from backup, and you’d have to dig through your VHS or DVD collection for a few days to keep you entertained. This edge case (two drives failing) probably doesn’t warrant the extra expense of implementing RAID 6 for most home users.

RAID 5 real-world example

As we did with RAID 6, let’s have a look at how choosing RAID 5 might affect your capacity depending on how many disks you have.

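Number of disks | Redundant disks | Capacity per disk (TB) | Raw capacity (TB) | Useable capacity (TB) | Lost capacity (TB) | Lost capacity (%)
----------------|-----------------|------------------------|-------------------|-----------------------|--------------------|------------------
3               | 1               | 4                      | 12                | 8                     | 4                  | 33.3
3               | 1               | 8                      | 24                | 16                    | 8                  | 33.3
3               | 1               | 14                     | 42                | 28                    | 14                 | 33.3
4               | 1               | 4                      | 16                | 12                    | 4                  | 25
4               | 1               | 8                      | 32                | 24                    | 8                  | 25
4               | 1               | 14                     | 56                | 42                    | 14                 | 25
6               | 1               | 4                      | 24                | 20                    | 4                  | 16.7
6               | 1               | 8                      | 48                | 40                    | 8                  | 16.7
6               | 1               | 14                     | 84                | 70                    | 14                 | 16.7
8               | 1               | 4                      | 32                | 28                    | 4                  | 12.5
8               | 1               | 8                      | 64                | 56                    | 8                  | 12.5
8               | 1               | 14                     | 112               | 98                    | 14                 | 12.5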
Common RAID 5 configurations showing lost capacity and useable space
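If you’re torn between the two levels, it can help to see what the second parity disk actually costs for a given array. Here is a short Python sketch along those lines; the helper `useable_tb` and the choice of 8TB disks are just for illustration.

```python
def useable_tb(num_disks, disk_tb, parity_disks):
    """Useable capacity once the parity disks' worth of space is set aside."""
    return (num_disks - parity_disks) * disk_tb

# Compare RAID 5 (one parity disk) with RAID 6 (two) using 8TB disks.
comparison = []
for num_disks in (4, 6, 8):
    raid5 = useable_tb(num_disks, 8, parity_disks=1)
    raid6 = useable_tb(num_disks, 8, parity_disks=2)
    comparison.append((num_disks, raid5, raid6))
    cost_pct = 100 * (raid5 - raid6) / raid5
    print(f"{num_disks} x 8TB: RAID 5 {raid5}TB, RAID 6 {raid6}TB "
          f"(RAID 6 costs {cost_pct:.0f}% of the RAID 5 useable space)")
```

The absolute cost of moving to RAID 6 is always one extra disk, so the relative cost shrinks as the array grows.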

Conclusion

In this post, we’ve covered the fact that if you value redundancy over performance and raw capacity, you should probably be using RAID 5 or even better, RAID 6.

This is a fairly broad statement, so we’ve covered the different scenarios where this makes sense, mainly from the perspective of a data-hoarder or home user.

It’s worth mentioning, in case you aren’t already aware, that it’s possible to “nest” RAID levels. For example, you could nest RAID 1 within RAID 0 and have RAID 1+0 (or RAID 10, as it’s often referred to). In some cases, particularly in enterprise environments, there are likely times where this is preferable over RAID 5 or even RAID 6. It all really depends on your use case, budget and expectations.

If you spend any time reading the data-hoarding or home-lab forums and subreddits, you’ll find that the majority of users are going with RAID 5 or 6 whenever performance needs to take a back seat to redundancy, resiliency, and uptime.

At this point, it almost feels customary to sign off from any post discussing data storage, hard drives, or RAID arrays by reminding you that RAID is not a substitute for a good, tried-and-tested backup strategy. Consider yourself officially reminded (again!).
