Parity RAID is dying – Long live CDP & Near-CDP

I keep reading all kinds of articles and tweets like this one from Robin Harris that predict or announce the death of parity-based RAID.  (Not everyone agrees, of course.)  The ultimate worry is that a second disk would fail while a RAID 5 group is rebuilding, or a third disk would die while a RAID 6 group is rebuilding, causing data loss.  The bigger disk drives get, the longer rebuilds take, and the bigger the chance that this would happen. This paper published at Usenix geeks out as much as you could possibly want on the subject.
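
To put rough numbers on that worry, here is a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration and does not come from the articles above: the 50 MB/s sustained rebuild rate, the 3% annual failure rate, the seven surviving drives, and the idea that drives fail independently at a constant rate (which is optimistic, since drives from the same batch tend to fail together, and unrecoverable read errors during the rebuild are not modeled at all).

```python
import math

def rebuild_hours(capacity_tb, rebuild_mb_per_s):
    """Lower bound on rebuild time: the whole replacement drive must be rewritten."""
    return capacity_tb * 1e12 / (rebuild_mb_per_s * 1e6) / 3600

def p_another_failure(surviving_drives, afr, hours):
    """Chance that at least one surviving drive fails within the window.

    Assumes independent drives with a constant failure rate derived from
    the annual failure rate (afr); real arrays are usually worse than this.
    """
    rate_per_hour = -math.log(1 - afr) / 8760  # 8760 hours in a year
    return 1 - math.exp(-rate_per_hour * surviving_drives * hours)

# Hypothetical 8-drive RAID 5 group (7 survivors) at three drive sizes.
for capacity_tb in (0.5, 2, 4):
    hours = rebuild_hours(capacity_tb, rebuild_mb_per_s=50)
    risk = p_another_failure(surviving_drives=7, afr=0.03, hours=hours)
    print(f"{capacity_tb} TB drive: {hours:.1f} h rebuild, "
          f"{risk:.4%} chance of another failure during it")
```

The absolute numbers depend entirely on the assumptions. The point is only the shape: the rebuild window grows with capacity, and the exposure grows with the window.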

At the same time all of that is happening, we keep making bigger and bigger datastores that store everything from Oracle/SQL/DB2 to hundreds of VMs stored on a single large volume. The company that had a mission-critical 300 TB database comes to mind.

All I’m saying is that when we’re talking dozens to hundreds of terabytes of data in one place, at some point traditional backup is not going to cut it.  I don’t care how fast your tape drive or favorite disk target is (deduped or not), at some point a restore of any kind is just not going to meet your RTO.
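
Here is the same kind of rough arithmetic for restores, again as a sketch and not a sizing tool. The 300 TB figure is the database mentioned above. The 1,000 MB/s sustained restore rate and the 4-hour RTO are made-up numbers; plug in your own.

```python
def restore_hours(data_tb, throughput_mb_per_s):
    """Best-case restore time: data size divided by sustained throughput."""
    return data_tb * 1e12 / (throughput_mb_per_s * 1e6) / 3600

def meets_rto(data_tb, throughput_mb_per_s, rto_hours):
    """True if even the best-case restore fits inside the RTO."""
    return restore_hours(data_tb, throughput_mb_per_s) <= rto_hours

# Assumed values: 1,000 MB/s sustained restore rate, 4-hour RTO.
hours = restore_hours(300, 1000)
print(f"300 TB at 1,000 MB/s: about {hours:.0f} hours")
print("Meets a 4-hour RTO:", meets_rto(300, 1000, rto_hours=4))
```

This ignores everything that makes real restores slower (locating media, rehydrating deduped data, database recovery after the copy), so treat the result as a floor.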

Here’s my next thought.  When we think about double or triple disk failures on a critical storage array, that sounds really bad.  But what if, in the extremely unlikely event this really bad thing happens to you, you just flip a switch and you’re now running operations from your backup system, which has a very recently updated copy of your data?  If it’s a true CDP system, you might not lose any data at all.  If it’s a near-CDP system, you might lose a few minutes or an hour.  It’s all about what you’re willing to pay for.

In summary:

  1. If you’re not thinking about CDP and near-CDP solutions, you should be.
  2. Does the idea of a CDP or near-CDP system take away a little bit of the sting of the fear of the death of RAID?


----- Signature and Disclaimer -----

Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.

10 thoughts on “Parity RAID is dying – Long live CDP & Near-CDP”

  1. nickpc says:

    Curtis,

    I’m not sure I’m ready to accept going to a DR Scenario for a couple of failed drives (and if you eliminate RAID, it would be one drive). To do this would require a 2nd set of Production level performance disk arrays in your primary data center, or a Hot secondary data center with everything needed to run production. That against the cost (not just dollars, but performance, space, etc) of the extra disks in an array just doesn’t seem to make sense.

    I’m not commenting about the viability of CDP/near-CDP, just the idea of abandoning RAID seems a bit extreme. Disks will fail, but the switch you want to flip still seems too big, for today.

    I think RAID will be another example of a technology that never really goes away, because it takes care of a problem (failing disks) in a fairly simple way. And, after all these years, no one has come up with something that’s proven to be better.

  2. cpjlboss says:

    First, it’s not that we’re saying we’re abandoning RAID. We’re saying that the drive rebuild times of 2 and 3+ TB drives significantly increase the chance of data loss. Even with RAID 6, it is still possible that a third drive could die while two others are rebuilding. The question is what do you do if/when it eventually fails.

    Second, remember that the second set of disks does not really need to be high performance. In fact, since we’re talking about 2+ TB disks, the FIRST set of disks isn’t high performance — it’s SATA.

    How about this? Suppose rebuild times and chances of data loss with parity RAID become so long that people DO start abandoning it for RAID 10 (striping and mirroring). That would be expensive, would it not?

    Well, instead of that, what if they went to RAID 6 + CDP/near-CDP? That way they have a primary AND a backup system for just a little more disk than just having a primary system based on RAID 10.

  3. nickpc says:

    So it sounds like you think RAID is still alive and well, but needs to be supplemented with other technology to be a viable solution in some scenarios.

    I think RAID will live on a long, long time. I haven’t seen anything that looks like it can replace it for drive failures.

  4. cpjlboss says:

    @Nick

    You keep saying just “RAID” and I keep saying “parity RAID.” What I specifically think is having trouble is parity RAID (e.g. RAID 3, 4, 5, 6, DP, etc.). Mirroring is also RAID, but it doesn’t use parity. Mirroring RAID levels have a much better protection level and a “rebuild” is just a copy, whereas a rebuild with parity RAID requires a lot of calculation AND copying.

    It is parity RAID that I think is going to run into trouble in the SATA world at some point.

  5. gaulfinger says:

    Two forces may slow the “too big for parity RAID” march: 1. the move to 2.5″ SFF drives and 2. the move towards SSD.

    As SFF drive prices slowly converge on 3.5″ pricing, it’s becoming as practical to put 24 500GB drives in a 2U drive tray as 12 2TB drives. The SFF drives may cost a little more and provide a little less capacity, but they return faster rebuilds, higher performance, and lower power/cooling bills.

    SSDs are so much smaller, faster, and possibly more reliable than spinning media that RAID5 or RAID6 will make sense for quite a while. And since their performance will likely increase along with capacity (unlike HDDs, which just get larger), parity RAID could make sense here for a long time to come.

  6. Joel Asaro says:

    I am still not clear on what you propose as the ideal solution (though I realize that depends on a lot of factors). You say parity RAID is dying, but aren’t you suggesting sticking with parity RAID 6 and hedging with CDP? Doesn’t the risk of total failure get pretty low with RAID 10? Don’t you still have some level of risk even with CDP? Finally, while the number of disks for RAID 10 and RAID 6 with CDP may not be hugely different, there is still a lot of infrastructure cost to implement CDP, right?

  7. cpjlboss says:

    I’m leaving the recommendations for primary storage up to those who know it much better than I. (In fact, an assumption I had always had about RAID was challenged this week, and I’ll probably blog about that too.)

    What I’m saying is that having CDP or near-CDP as your backup system makes the data protection part of the argument moot. If you have a backup system that can easily take over for your primary system if the worst happens, then I’m less concerned about that happening. (There will still be downtime, so it’s not like I don’t care, but it moves farther down on the list of priorities, IMO.)

  8. nickpc says:

    Data protection and data availability are different. Hopefully no one out there believes their data is protected just by using Parity RAID. And I’m not seeing CDP/near-CDP as a good substitute solution for data availability. Maybe someday, but not with today’s solutions.

  9. rjorgensen says:

    I’ve gone for a “design” where we replicate everything important from one storage system to another one at another site. No automatic failover, but we run some things from one site and others from the other, so that we only lose part of everything if something does go wrong.
    And if one system dies completely, well, I’m really fine with having 40% of the customers offline for ~1 hour or so while we switch everything over to the other system. Better than losing everything 🙂

    … and of course, once in a while (every hour or so) we do snapshot or similar off-system backups.

    … so in the end, it doesn’t really matter what sort of RAID the storage systems run as long as they can handle a few disk failures once in a while. We actually had a failure where too many disks in the disk group failed, so it went into recovery mode to get everything back. Worked fine; the applications lost some speed, but that’s way better than losing everything 🙂

  10. n0mb says:

    I’m a little behind on my reading… but to Nick, who believes that RAID may never go away…

    Never say never. RAID will go away when spinning disks go away. Another generation of geeks will say, “Do you remember when people actually had to spin platters coated with metal oxides to store data? Haw haw haw haw, grrmph!”

    Happy holidays if you have them. I know for some of us this is simply an opportunity for yet another migration or upgrade. I’m happy to say that’s not me!

    Mickey
