IT Admin deletes entire datacenter THEN tests his backups! (Restore it All Podcast #135)

This week’s guest tells the most incredible story we’ve ever had on the podcast. We’ve had ransomware restores and disaster recoveries after a hurricane, but we’ve never had someone who deleted their entire computing environment and then restored it from their backups. (Backups that had never been tested to this degree, BTW.)

Paul VanDyke is the IT Supervisor at the Kodiak Island Borough in Alaska. Kodiak is the second largest island in the US, and Paul has to satisfy his backup and DR needs without leaving it. Cloud resources are not a possibility due to bandwidth concerns, so he’s doing things “old school.” We first talk about the kinds of things they are protecting against, including tsunamis, fires, and strong winds. Their backups are primarily tape-based, and for DR they store copies of all backups in a nearby safe. We discuss ways they could improve their resilience, such as shipping some tapes to a location on the mainland.

But the highlight of this episode is the story of when Paul intentionally destroyed his entire environment and then tested his backup system! He learned many valuable lessons, starting with “don’t ever do that again!” Luckily, his test was successful, albeit not without some challenges. He wiped the storage arrays on five servers (two domain controllers, an email server, a file server, and an application server) and then restored them all. (He had his reasons for doing it this way, which he goes into in the podcast.)

One big thing he learned was that restores are often slower than backups. So he prioritized critical apps (e.g., email, the file server, logins) and got them up by Monday morning. It then took him a few more days to get the application server up and running due to a more complicated restore. We have a great discussion on how Paul could have done things better, including a really good idea that Prasanna came up with. Curtis also tells a similar story about the first time he “tested” backups when he actually needed them, versus doing it in advance.

We cover a number of topics and questions on this podcast:
What was an Exabyte Mammoth (M2) tape drive?
What is a helical scan tape drive?
What is multiplexing?
Why can restores be slower than backups?
What happens when you rebuild a RAID array?
Should you have a post-mortem after a large incident?
How important is recovery testing?
How important is it to set expectations in IT, especially when it comes to recovery times?

----- Signature and Disclaimer -----

Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Evangelist at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.

ZFS filesystem in the cloud – just for your backups (Restore it All Podcast #134)

The founder of rsync.net, John Kozubik, joins us on the podcast this week. It’s a unique offering: a ZFS filesystem running in a private cloud – accessible only via SSH – that is designed solely as a destination for your backup data. They support anything that can run over SSH. Use rsync, scp, etc. to copy your data unencrypted, or something like restic, duplicity, or borg if you want your backups encrypted at rest. (All backups are encrypted in flight, of course, because they all travel over SSH.)
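To make that concrete, here is a minimal sketch of both approaches. The username, hostname, and paths are hypothetical placeholders – your actual login details come from rsync.net when you sign up.

    # Unencrypted copy over SSH: rsync mirrors a local directory
    # to the remote ZFS filesystem (encrypted in flight only)
    rsync -av --delete /home/user/documents/ user123@server.rsync.net:documents/

    # Encrypted at rest: restic keeps an encrypted repository
    # on the same host via its SFTP backend
    restic -r sftp:user123@server.rsync.net:restic-repo init
    restic -r sftp:user123@server.rsync.net:restic-repo backup /home/user/documents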

The servers are completely locked down except for the SSH port, so they’re about as secure as they can be for what they are. You can configure SSH to behave the way you want (e.g., key passphrases, MFA, etc.), and the ZFS filesystem automatically creates daily snapshots of the backups you send there. (More complicated schedules can also be created.)
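Those snapshots are handy for recovery, too. On ZFS, snapshots are typically browsable as read-only directories under the hidden .zfs/snapshot path, so retrieving yesterday’s copy of a file can be as simple as the sketch below. (The snapshot name and paths here are hypothetical; the exact layout depends on how the account is set up.)

    # List the automatic snapshots of the remote filesystem
    ssh user123@server.rsync.net ls .zfs/snapshot

    # Copy a single file back out of yesterday's snapshot
    scp user123@server.rsync.net:.zfs/snapshot/daily_2021-10-12/documents/report.txt .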

You pay by the gigabyte ($0.025/GB/month) for the size of the ZFS filesystem and its associated snapshots, but they urge you to NOT over-provision. Provisioning is easy and non-disruptive, so only add storage when you need it. For an extra fee ($0.017/GB/month), they can also replicate your backups to another region.
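To put some hypothetical numbers on that: a 500 GB filesystem would run 500 × $0.025 = $12.50/month, and replicating it to another region would add 500 × $0.017 = $8.50, for a total of $21.00/month.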

It’s a no-nonsense offering that seems to be unique out there – especially when you add the ZFS features. Check out the website at rsync.net, and you’ll see they aren’t spending any money on being flashy. They just want to build a rock-solid ZFS syncing destination that is separate from any cloud provider.

----- Signature and Disclaimer -----

Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Evangelist at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.

Rclone creator Nick Craig-Wood Explains This Powerful Tool (Restore it All Podcast #133)

This week, we talk to Nick Craig-Wood, the creator and principal developer of rclone, a very popular open-source tool for copying data to and from cloud providers. Rclone is downloaded roughly 250,000 times each month and has over 30,000 stars on GitHub. There are six core developers, and a great community of users and other developers at rclone.org.

We talk a little bit about Nick’s development philosophy, which is that he doesn’t mind adding features – as long as they don’t break backwards compatibility. Then we talk about how rclone works and what it’s like to sync a filesystem to an object store – including support for multi-part uploads and downloads. We also talk about rclone’s encryption support – all while Nick is “relaxing” on holiday.

We then talk about how rclone can be used to minimize the risk of backing up to any one cloud provider, preventing situations like the OVH fire earlier in 2021. We also discuss some strategies, such as backing up directly to two different clouds versus backing up to one and then syncing to another – and how Cloudflare’s R2 might figure into things. Finally, we talk about Nick’s plans for rclone’s future, such as improving the web UI to make the tool usable by many more people – while not sacrificing the command line. Join us for a fascinating episode, the first one where we talk to the creator of the tool in question.
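As a rough sketch of those two strategies – with hypothetical remote names you would first define via rclone config:

    # Strategy 1: back up directly to two different clouds
    rclone sync /data cloud-a:backup-bucket/data
    rclone sync /data cloud-b:backup-bucket/data

    # Strategy 2: back up to one cloud, then sync cloud-to-cloud
    rclone sync /data cloud-a:backup-bucket/data
    rclone sync cloud-a:backup-bucket/data cloud-b:backup-bucket/data

The trade-off: the first approach reads your source data twice but keeps the two copies independent; the second reads it only once, at the cost of making the second copy dependent on the first.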

Don’t forget the drawing for a free e-book version of Modern Data Protection. All you have to do to be eligible is sign up for my newsletter at https://www.backupcentral.com/subscribe-to-our-newsletter/

----- Signature and Disclaimer -----

Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Evangelist at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.