Instant recovery & dedupe are not friends

Instant recovery is the modern-day equivalent of what we used to call a hot site, as it allows you to recover immediately after some type of incident. I have personally advocated for this concept, as I strongly believe that in a true disaster (or ransomware event), time is of the essence.

As mentioned in my previous article, one company’s lack of an instant recovery system caused them to pay the ransom when they were infected with ransomware. They said recovering their entire datacenter using their backup system would have taken several days, and paying the ransom would cost less than several days of downtime. I explain in the article why I completely disagree with this reasoning, but I understand I have the luxury of Monday morning quarterbacking.

The key to being able to easily recover from a large disaster or ransomware attack is to be able to instantly spin up your entire datacenter in a hot site or an instant recovery system. This allows you to take your time addressing the cause of the incident, such as identifying and removing the ransomware itself, putting out actual fires, or replacing hardware damaged in the incident. If you can run your entire environment in a public or private cloud, you can continue your business – almost without interruption – regardless of how bad the incident is.

Dedupe is not instant recovery’s friend

Instant recovery is great, as it allows many organizations to recover far more quickly than they ever could before. Deduplication is also great: it is the technology that enables so many wonderful things, like disk-based backup and recovery, offsite replication of backups without human intervention, and significant reductions in bandwidth usage. It’s the marriage of deduplication and instant recovery that usually doesn’t work.

Deduplication systems are very good at many things, but usually are not very good at random reads and writes. Just ask anyone who has attempted to run one or more VMs using their deduplicated backup data as the datastore. The performance might be enough to handle a single server that doesn’t require a lot of random I/O, but running several servers or an entire datacenter simply isn’t possible from a deduplicated datastore.
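You can see this for yourself with a quick random-read probe. The sketch below is purely illustrative: the mount path and file name are hypothetical, and you would point it at a large file (say, a restored VMDK) sitting on the storage you intend to run VMs from, then compare the numbers against your primary datastore.

```python
# Minimal random-read latency probe (illustrative only).
# TEST_FILE is a hypothetical path; substitute a large file on the storage
# you plan to run VMs from. Repeated runs will be skewed by the OS page cache.
import os
import random
import time

TEST_FILE = "/mnt/dedupe-datastore/test.vmdk"  # hypothetical path
BLOCK_SIZE = 4096                              # typical small guest I/O size
SAMPLES = 1000

size = os.path.getsize(TEST_FILE)
latencies = []

with open(TEST_FILE, "rb") as f:
    for _ in range(SAMPLES):
        offset = random.randrange(0, size - BLOCK_SIZE)
        start = time.perf_counter()
        f.seek(offset)
        f.read(BLOCK_SIZE)          # one small random read, like a running VM issues
        latencies.append(time.perf_counter() - start)

latencies.sort()
print(f"median: {latencies[len(latencies) // 2] * 1000:.2f} ms")
print(f"p99:    {latencies[int(len(latencies) * 0.99)] * 1000:.2f} ms")
```

If the p99 latency on the deduplicated share is an order of magnitude worse than on primary storage, you have a preview of what booting a whole datacenter from it would feel like.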

This is why post-process deduplication backup appliances make such a big deal about their landing zone, where recent backups are stored in native, non-deduplicated format before being deduplicated for replication or long-term storage. They advise customers interested in instant recovery to turn off any backup software dedupe. Backups are sent to disk in their full, native format and stay that way in the landing zone until they are pushed out by newer backups. This yields much better performance if you have to run multiple VMs from your backups.
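Conceptually, the landing zone is just a fixed-size tier where new backups land in native form and the oldest ones get pushed out to the dedupe/long-term tier when space runs low. This toy sketch uses invented names and sizes; real appliances do this at the filesystem level, not in Python.

```python
# Toy illustration of a post-process "landing zone": new backups land in
# native format; when space runs low, the oldest are pushed to the
# deduplicated long-term tier. All names and sizes are invented.
from collections import OrderedDict

class LandingZone:
    def __init__(self, capacity=10_000):          # arbitrary units, e.g. GB
        self.capacity = capacity
        self.backups = OrderedDict()              # backup_id -> size, oldest first
        self.used = 0

    def ingest(self, backup_id, size):
        # Evict oldest backups until the new native-format backup fits.
        while self.used + size > self.capacity and self.backups:
            old_id, old_size = self.backups.popitem(last=False)
            self.used -= old_size
            print(f"{old_id} pushed out of landing zone to dedupe tier")
        self.backups[backup_id] = size
        self.used += size

zone = LandingZone()
for night in range(1, 6):
    zone.ingest(f"backup-night-{night}", 3_000)
print(list(zone.backups))   # the most recent backups stay native, ready to run
```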

But most people using the instant recovery feature tend to be using modern backup packages that already have deduplication integrated as a core part of the product, which means they are typically performing instant recovery from a deduplicated datastore. They should be able to recover one or two VMs at a time, but they will probably be very disappointed if they try to recover their entire datacenter.

There are other ways to do instant recovery

If you are going to use instant recovery to run your entire datacenter in a disaster, the latest copy of your VM backups needs to be in native format on storage that can support the performance that you need. There are a couple of ways of accomplishing that.

Continuous data protection (CDP) products are essentially replication with a back button. Some of these companies describe themselves as a TiVo for your backups. They store your backups in native format, along with the changed blocks needed to roll the latest version back to an earlier point in time. (A good example of such a product would be Zerto.)
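To make the "back button" idea concrete, here is a toy sketch of the journaling concept: the replica is kept current, and an undo journal records what each block looked like before every write, so any point in time can be reconstructed. All names here are invented for illustration; real CDP products work at the hypervisor or volume layer, not on Python dicts.

```python
# Toy illustration of CDP journaling: keep the replica current, plus an
# undo journal that lets you roll any block back to an earlier point in time.
import time

class CdpReplica:
    def __init__(self):
        self.blocks = {}    # block_number -> current data (the "latest version")
        self.journal = []   # (timestamp, block_number, previous data)

    def write(self, block_no, data):
        # Record the block's prior contents, then apply the new write.
        self.journal.append((time.perf_counter(), block_no, self.blocks.get(block_no)))
        self.blocks[block_no] = data

    def rewind_to(self, point_in_time):
        # Walk the journal backwards, undoing writes newer than the target time.
        for ts, block_no, old_data in reversed(self.journal):
            if ts <= point_in_time:
                break
            if old_data is None:
                self.blocks.pop(block_no, None)
            else:
                self.blocks[block_no] = old_data

replica = CdpReplica()
replica.write(7, b"good data")
checkpoint = time.perf_counter()
replica.write(7, b"ransomware-encrypted data")
replica.rewind_to(checkpoint)       # the "back button"
print(replica.blocks[7])            # b'good data'
```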

These types of products tend to do well at disaster recovery, but not at operational recovery. They’re good at recovering an entire datacenter, usually not so good at recovering a single file. The DR functionality of these products can be quite advanced, as it is their specialty. Another upside of this approach is that you only pay for one copy of your backup – plus the versioning blocks, of course. One downside is that most people also end up purchasing another product for operational recovery.

Alternatively, you can use a backup product that uses its backups to continually update a DR image stored in native format. (Druva offers this as part of their Data Protection as a Service offering.) Instant recoveries – especially large-scale recoveries of an entire datacenter – would run from this DR image. The advantage of this approach is that you get operational recovery and disaster recovery in a single system, which is both simpler and less expensive than maintaining two systems. One disadvantage is that you will need to pay for the storage the DR copy of your data uses, equivalent to one full backup. This cost is offset by the fact that you can do both operational recovery and disaster recovery with a single product.
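The mechanics of that approach amount to applying each backup's changed blocks to a full-size image kept in native format, so a large restore can boot straight from that image instead of rehydrating deduplicated data. The sketch below is a simplified illustration with invented file names and block sizes, not any vendor's actual implementation.

```python
# Toy sketch: push each backup's changed blocks into a native-format DR image.
# File name, block size, and block counts are invented for illustration.
import os

BLOCK_SIZE = 4096
DR_IMAGE = "dr-image.raw"          # hypothetical native-format DR copy

def ensure_image(total_blocks):
    # Seed the DR image from a full backup (here just a sparse file of zeros).
    if not os.path.exists(DR_IMAGE):
        with open(DR_IMAGE, "wb") as img:
            img.truncate(total_blocks * BLOCK_SIZE)

def apply_incremental(changed_blocks):
    """changed_blocks: dict of block_number -> bytes of length BLOCK_SIZE."""
    with open(DR_IMAGE, "r+b") as img:
        for block_no, data in sorted(changed_blocks.items()):
            img.seek(block_no * BLOCK_SIZE)
            img.write(data)        # overwrite only the blocks that changed

# Example: after each nightly backup, fold its changed blocks into the DR image.
ensure_image(total_blocks=1024)
apply_incremental({42: b"\x00" * BLOCK_SIZE, 43: b"\xff" * BLOCK_SIZE})
```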

Don’t pay ransoms! Get a better backup product!

As I mentioned in my previous blog post, please prepare now to be able to recover from a ransomware attack or other disaster. Investigate your company’s DR plans, because you might need to activate them for something you wouldn’t normally consider a disaster. Your entire datacenter may be fully functional, but you won’t be able to get to your data if it’s all encrypted. So make sure you have a solid plan for how you would recover from this scenario, because the likelihood that it will happen to your company goes up every day.

Written by W. Curtis Preston (@wcpreston), four-time O'Reilly author, and host of The Backup Wrap-up podcast. I am now the Technology Evangelist at Sullivan Strickler, which helps companies manage their legacy data.
