Backup is evil (or at least how many people do it is)

This week we are joined by John “Ricky” Martin, Director of Strategy at NetApp (and former owner of a tape recovery business), to talk about his paper that declares that backup is fundamentally evil and done in an unintelligent way. Mr. Backup wasn’t sure how this one was going to go, and there were at least one or two arguments along the way. No blows were thrown, though. We definitely talk about what a tape recovery business is, and what it was like to do that. We also talk about tape backup, full backups, multiplexing, tape handling, and other elements of how backup is still done today by many people. It’s a fun episode where you should learn a lot. (Full transcript below the video)

Episode Transcript

 

[00:00:00] Ricky Martin: Nobody wants to talk about backup. Nobody wants it. It’s evil, but it’s a necessary evil, but nonetheless, nobody really wants to talk about it. It’s like talking about plumbing, right? Nobody cares. Nobody cares about plumbing until you get backed up. *rimshot*

oh,

[00:00:15] W. Curtis Preston: Yeah.

[00:00:36] W. Curtis Preston: Hi and welcome to Backup Central’s Restore it All podcast. I’m your host, W. Curtis Preston, AKA Mr. Backup, and I have with me my clarified butter consultant, Prasanna Malaiyandi. How’s it going, Prasanna?

[00:00:49] Prasanna Malaiyandi: I’m good, Curtis. I have to say my wife was quite surprised when we were talking about ghee, which is Indian clarified butter. And she’s like, what are you talking to Curtis about? Because I think, first, we started off with herbal teas. Because you’re like, yeah, there’s not enough flavor. Should I try using like loose leaf?

And we were talking about that. And then I switched over to ghee, and she’s like, who are you talking to? And why does Curtis care about this?

[00:01:14] W. Curtis Preston: And then suddenly I got to talk to her and she, and I was like, I’m thinking about trying Ghee like, is there a brand that I should try? And your wife’s like, well, I make my own. And I’m like, of course you do. and so,

[00:01:24] Prasanna Malaiyandi: It’s a staple in like Indian cooking, right? Or

[00:01:26] W. Curtis Preston: I didn’t even know ghee existed until a couple of years ago.

[00:01:30] Prasanna Malaiyandi: And then it just keeps coming up. There was this meme on Facebook where it’s two people, and he’s like, this butter is amazing. And he says, actually, it’s ghee. And she says, thanks for clarifying. I did see that one on Reddit.

[00:01:43] W. Curtis Preston: Yeah, it’s been coming up a lot. And so I see it for me as a way to solve a long running argument in this house, because the thing about ghee, for those that don’t know, is that it’s shelf stable. By doing what you do, you can just leave it on the counter and it lasts much longer than butter would on the counter.

[00:02:04] Prasanna Malaiyandi: At least a couple of months, or until you finish it.

[00:02:06] W. Curtis Preston: Yeah. Yeah, exactly. I don’t think this jar that I bought... By the way, I have received my first jar of ghee today and I have already eaten some. So I like to do toast with liberal butter, and I want it to be easy to spread. So I want the butter on the counter. My wife is concerned about the butter going bad. So she’s constantly putting my soft butter in the refrigerator and I’m like, dammit.

And then I have to slice it. And then I gotta nuke it just so I can spread it and it ticks me off. And so then I had this moment, I was like, ghee! Ghee can solve this problem. and so I was like, I bet Prasanna knows about ghee. And so now I’ve, I ordered.

[00:02:47] Prasanna Malaiyandi: And since you got it, how was it?

[00:02:49] W. Curtis Preston: You know what, it wasn’t life-changing, but it did taste good. It tastes like butter, obviously. Super spreadable. It was like butter.

It was like butta. But there’s going to be some moments. I’m going to make some toast and then I’m going to spread it, and, yeah, I’m going to enjoy some ghee. That’s all I’m saying.

[00:03:07] Prasanna Malaiyandi: Sometimes my wife likes to spread ghee on sourdough bread and then just toss it on a skillet and let it like crisp up on both sides.

[00:03:15] W. Curtis Preston: Oh yeah. Yeah. Yeah. So, by the way, is ghee an Indian word?

[00:03:21] Prasanna Malaiyandi: I believe it is.

[00:03:22] W. Curtis Preston: I’ve been informed that ghee is from Sanskrit, from a word that means sprinkled.

[00:03:29] Prasanna Malaiyandi: Interesting.

[00:03:31] W. Curtis Preston: Yeah.

[00:03:31] Prasanna Malaiyandi: I learn something all the time.

[00:03:33] W. Curtis Preston: Yeah, you never know what you’re going to learn here on the restore it all podcast. This is going to prove to be, I think, an interesting discussion. There could be arguments.

We, we might not agree. I might not agree with our, guest here, but he certainly has me when it comes to experience, he may be, I think he is the guest that we’ve had that has the longest time in IT. and, interestingly enough, he’s about the same age as me, but he’s been in IT for 10 years longer than me.

[00:04:10] Prasanna Malaiyandi: ’cause he wasn’t slacking. Like you.

[00:04:12] W. Curtis Preston: Yeah. Yeah. He was, I was slacking in high school and college. He’s been in IT for 40 years. You’ve been at NetApp for 16 years. Currently the director of market strategies, welcome to the podcast, John/Ricky Martin.

[00:04:28] Ricky Martin: Thank you for inviting me. It’s a pleasure to be here and talk to you face to face, so to speak. for the first time.

[00:04:33] W. Curtis Preston: So to speak, in what stands for face-to-face in the COVID world, are you in the bay area? I assume,

Ricky Martin: No. I’m in a bay area, though. I’m in Sydney. So I actually look out over a thing called Shipwrights Bay, which is next to Botany Bay, which is a bay area. So let’s go with that. Yeah.

[00:04:56] W. Curtis Preston: I actually have a little story from my visit to Sydney. I visited The Rocks. There was like a plaque, as there usually is. I was reading a plaque, and I remember it said something like the founding date or the first arrival date was like 1770 something.

I don’t remember if it was just before 1776 or just after, but I just said, oh, that’s really interesting. That’s close to the founding date of the U.S., and I wonder if there’s any relation to that. And this Aussie looked at me and they were like, you’re kidding, right? No, not kidding.

You do know that you were a prison colony first, and that then you had the big battle and you told the Brits to shove off, and so then they sent them here? Oh no. I’ve taken plenty of American history. Literally had no idea that we were a penal colony before Australia was. So yay, school system.

[00:05:53] W. Curtis Preston: But it’s interesting. it’s sort of a running joke about Australia, being, originally a penal colony. But that, for some reason, I don’t know, we just had better marketing or something. I don’t know.

[00:06:03] Prasanna Malaiyandi: No one wants to acknowledge that, I think, in the U.S.

[00:06:06] Ricky Martin: Yeah. You see kind of more on the religious persecution and freedom from that sort of

[00:06:12] W. Curtis Preston: Yeah. Yeah, we just glossed over that other stuff. History is written by the victors.

So you and I interacted somewhere out there in the Twitterverse, and there’s this paper that you wrote back in 2018 with a nice long name, “improving economics and business workloads by using a self-protecting data infrastructure.” Short title, I think, is backup is dead and NetApp is awesome.

Maybe I think that’s my version of my reading of this paper. How do I do?

[00:06:44] Ricky Martin: Pretty good. Look, the whole thinking for that came out of something else I did many years ago, even when I was still working at Legato, when I was basically the APAC guy for all Legato.

Some listeners don’t know. So Legato, they had Networker, the backup product, which currently is owned by Dell, because Legato got bought by EMC, EMC got bought by Dell. So Dell Networker used to be

[00:07:08] W. Curtis Preston: yeah. I, I spent plenty of time, making, Networker backups back in the day. So I have my time around Networker. So you were actually at Legato, a backup company. And you had a presentation. I understand. That was not very backup friendly.

[00:07:26] Ricky Martin: It was called Backup is Evil, have you ever tried to get people to like you when you had those like trade fairs and you’ve got, everybody’s like presenting their wares and things like that, and everybody kind of walks past your stand cause you’re talking about backup and let’s face it backup is just boring as bat…?

[00:07:41] W. Curtis Preston: Yeah.

Yeah.

[00:07:43] Ricky Martin: just, nobody wants to talk about backup. Nobody wants it. It’s evil, but it’s a necessary evil, but nonetheless, nobody really wants to talk about it. It’s like talking about plumbing, right? Nobody cares. Nobody cares about plumbing until you get backed up. *rimshot*

oh,

[00:07:59] W. Curtis Preston: Yeah.

[00:08:00] Ricky Martin: Sorry. So in order to get people to come and talk to me, I was sort of saying that backup is evil, right? Because back then, even 15, almost 20 years ago, there was the whole idea about doing full backups, which pulls all of your data across from all of your subsystems and pushes it across your networks. It doesn’t just touch every piece of your infrastructure. It treads all over it in great big hobnail boots. If anything’s going to break, it’s going to break during backup. Okay.

[00:08:30] W. Curtis Preston: You’re going to, you’re going to have.

[00:08:31] Prasanna Malaiyandi: true.

[00:08:32] W. Curtis Preston: You’re going to have to hobnail boots. Is that a British phrase?

[00:08:37] Ricky Martin: Hobnail boots, yeah. It’s a British thing. It’s like boots that are so thick that they’ve got like these nails at the bottom to provide tread,

W. Curtis Preston: Oh, okay. All right. All right. I’m with you. I’m bilingual, by the way. I speak both English and American. But go ahead.

[00:08:51] Ricky Martin: Yeah. So basically, having spent a long time in data centers at three o’clock in the morning, troubleshooting why backup has broken something or something isn’t working or why it goes really fast in one direction of the network because the duplexing settings are wrong. But restores go at 76 kilobytes per second.

We’ve all been there.

[00:09:10] W. Curtis Preston: Yeah, multiplexing is definitely evil.

[00:09:12] Ricky Martin: Yeah. So all of these things, which we do multiplexing and a whole bunch of stuff is done because we are trying to do something which is fundamentally a stupid idea in the first place, which is to take all of the data that we have and copy it across a network to some other device and hope that works on a regular basis.

And more to the point, hope that at some stage, when we need to restore all of that stuff, it will, it will somehow work . And actually expect that to work. And what works in IT without testing? Nothing.

When was the last time you actually tested, recovering the majority of your infrastructure from your backups?

And I would get lots of people going, and you might get a bank going, oh, we have to do that twice a year. And I’m going, I bet you didn’t restore it from tape. Yeah, then they mumbled and they shuffled and they would look uncomfortable, and I would go: so all this is, is an insurance policy, right? Backup is just there as a way of protecting you against some form of disaster.

Now it’s not a cheap insurance policy, the amount of money you spend on backup is. There’s a lot of money in data protection. So would you pay for insurance? So you just sit there and you’d go. If you were to try and run that full recovery right in there, how much do you think you would get back? Just pull a figure out of the air and people would say 40% maybe might come back successfully.

[00:10:31] W. Curtis Preston: That’s a reasonable number. Okay. I guess what would you do if your wife said if our house burns down. Our insurance policy will cover the house. You get most of the house, she’d go and get a new insurance policy. Yeah.

[00:10:48] Ricky Martin: And so yet in IT, we all wander around in the back of our heads.

We know that the chances of actually successfully recovering are pretty low. And again, I will say this isn’t an abstract thing. This is the reality of almost every single person who pays for ransomware. And that’s a butt-ton of… Right. It fundamentally doesn’t address the problem that we want it to. And that’s not to say that backup per se is evil, but the way that most people approach backup is, and it just stems from this moving all of your data from one spot across a network to another spot, putting it into something which is meant to be an offline medium, taking that offline medium and putting it into a fireproof safe.

Now here’s the other thing, all of this tape handling, right? I’ve seen situations where what could be the very last copy of a company’s data is being handed to a guy who gets paid less than the guy at McDonald’s, to be put into the back of a rusty van, driven over some of the worst roads in the Southern hemisphere, to be put into what you hope is the right environmental conditions. So you sit there and you go, this stuff is good for seven or 20 years or whatever the case may be. None of the things which are on the side of that little label, yeah, please keep between this humidity and this temperature and free from vibration and blah, blah, blah, are followed. Tape isn’t bad, but people don’t treat it the way that it needs to be treated in order for it to have the kinds of recoverability that people expect it to have.

So I ran a tape recovery business for a while. And about 25% of the tapes that were sent to me to be recovered – and a lot of this was for legal reasons and stuff like that – failed. There were just media errors or things like that. Now, to be fair, there’s probably some selection bias there. The reason why they sent them is probably because they couldn’t recover them, but still.

[00:12:35] W. Curtis Preston: Let me ask you, and by the way, this is the conversation that we started with. That’s what I, now I remember you actually said you had this company, this is how you and I first started talking. and that’s how, and that’s how you’re here. so what does that mean? A tape recovery business.

[00:12:51] Ricky Martin: So there’s a lot of people out there who have tapes that they need to recover data from, for either legal discovery or something else, and it’s usually not an operational recovery. It’s usually a recovery from, say, a tape that they no longer have the tape drive for. So my tape recovery business included things like chain of custody, where I would get the tape and I would find the old tape drives.

And I would find the old copies of the operating systems and the copies of the backup software. And I would then recover that data onto a piece of removable media and send that back to the data owner.

And you’re saying that a significant portion of the time the tape was just worthless?

They needed significant amounts of extra handling. So it was media errors, the unrecoverable media error. Most backup software won’t read a tape past that unrecoverable media portion. You actually have to do unnatural things to try and move past there and recover the rest of the data. So I would recover as much as I could from those tapes. And as I said, about 25% of the stuff that I was sent was just not there. So this whole keeping tape as an archive medium for 20 years, I just don’t... I personally, based upon my experience, would never trust that. Because you can’t test it!

[00:14:08] W. Curtis Preston: Yeah.

Yeah

[00:14:08] Prasanna Malaiyandi: No, this is interesting because we’ve had some tape specialists, like guys who understand all the physical characteristics of tape, on the podcast as well. Joe Jurneke, Mark Lance, talking about tape and the physical properties of tape. And at least from what I can gather as a complete tape newbie, it seemed, yes, there are issues, but there was a lot of resiliency built into tape to handle some of these issues. Now, I don’t know. Maybe it’s a software thing or, like you said, maybe some of these old tape drives, maybe they weren’t handled in the right way. And that’s why you’re seeing some of these issues. Maybe it’s just the fact you live in the land down under. And so that’s why there are issues with tape.

[00:14:49] W. Curtis Preston: We don’t have any of these problems in the Northern hemisphere. I’m just saying.

[00:14:54] Prasanna Malaiyandi: But my sample size is also very small. I don’t hang out with people like that, unlike Curtis. I know, Curtis, you’ve run into issues with doing restores from tapes, where you’ve had issues, like with some tape drives, it just doesn’t work.

[00:15:06] W. Curtis Preston: Yeah. My experience, tape or no tape, over 30 years of working with people with backups, is that 99% of the problem was not the medium. It was the person behind the medium, right? And I will agree that tape is a problematic medium, right?

There are some things about it that I don’t think are problematic, but I do think it’s a problematic medium for a long list of reasons. You’ve mentioned some of them, like multiplexing, and you talked about the environmental control. That 25% you said was just not there.

And again, I do agree with that. That’s probably some sample bias, right? Because you were given all the worst stuff. And I also agree with Prasanna. Like, when was this? When did you have this tape recovery business?

[00:15:57] Ricky Martin: Oh, this was not long before I joined NetApp. So that would have been 16 years ago. Right. So, you know, a lot has happened in sixteen

[00:16:05] W. Curtis Preston: So it’s not ancient history, but it’s not recent either.

[00:16:07] Ricky Martin: no,

[00:16:08] W. Curtis Preston: that would’ve been the early LTO days.

[00:16:11] Ricky Martin: Yeah, early LTO. I think we were up to LTO-2 around about that point in time. DLT, something or other. I’d say tape is only as good as its handlers.

[00:16:18] W. Curtis Preston: Hm.

And when you think about it, who gets given the job of looking after backup? It is usually the most junior or the person who is like at the end of their career.

[00:16:27] Ricky Martin: Generally speaking the care factors of both of those groups of people are usually not the same. They’re not your gun SREs that sit there and proactively think about how do I eliminate every possible cause of failure.

[00:16:38] W. Curtis Preston: Yeah, by the way, I fight that issue all the time. You know, because nobody else wanted to do that job. That’s how I got my job and I just never got out of it. I guess maybe I wasn’t smart enough. I never got out of backup. And then actually I see that towards the end of people’s career, I often see them in DR.

I see them start in backup and end in DR. I heard you talking about the full aspect, and we can both completely agree that the concept of occasional full backups is stupid. We did it back in the day when we had to do it, because if you didn’t do an occasional full, how are you going to do a restore from tape? You’re going to do one full and then you’re going to do incrementals for seven years? You’re not going to do that. You’re going to do an occasional full.

[00:17:26] Ricky Martin: Incremental forever restore. We used to talk about that with Tivoli storage mangler.
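The reason for occasional fulls on tape can be put as simple arithmetic. The following is only a sketch under a simplifying assumption that is not stated in the conversation: one backup per day, and a restore has to read the most recent full plus every incremental taken since it. The function name and the daily schedule are hypothetical, chosen for illustration.

```python
from typing import Optional


def backups_needed_for_restore(days_since_first_full: int,
                               full_every: Optional[int]) -> int:
    """Count the backup sets that must be read to restore the latest state.

    Day 0 is a full backup and every later day adds one incremental.
    full_every=None models a single full followed by incrementals forever;
    otherwise a new full is taken every `full_every` days.
    """
    if full_every is None:
        # One full plus every incremental taken since.
        return 1 + days_since_first_full
    # The most recent full plus the incrementals taken after it.
    return 1 + (days_since_first_full % full_every)


seven_years = 7 * 365
print(backups_needed_for_restore(seven_years, None))  # no periodic fulls
print(backups_needed_for_restore(seven_years, 7))     # weekly fulls
```

With a tape-style chain the first number grows without bound, which is the "incrementals for seven years" problem described above.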

[00:17:29] W. Curtis Preston: Yeah.

But the thing is, that’s not to say that doing it occasionally for a really good reason is a bad idea, but using it as your first line of defense is just living in fantasy land. Because tape has this really wonderful property, which is that a tape sitting in a fireproof safe offsite is air gapped, genuinely air gapped, right? It is safe from hackers, from network, from software failures. It’s safe from every known form of data loss, short of a nuclear bomb. And some of them are probably safe against nuclear bombs too, if they’re in the right kind of location. It is this wonderful catch-all thing that protects against all known forms of data loss. The trouble is that tape in that fireproof safe is not suitable for operational recoveries or even really disaster recoveries. It’s the last line of defense.

[00:18:21] W. Curtis Preston: Yeah.

[00:18:21] Ricky Martin: Use it for what it’s good for.

[00:18:23] Prasanna Malaiyandi: You should not be trying to do your user deleted a file. Let me try to restore it. Oh, wait, I got to go call, recall a tape off premises and do the restore, right? It’s a last line of defense, like you said, but it should not be your first.

[00:18:36] Ricky Martin: No, it shouldn’t be. and that’s kinda what I put inside these like massive tables, which sort of talk about, what protects against various failure domains and, tape is green across the board.

[00:18:47] W. Curtis Preston: Let me ask you, when you look at this paper, and it wasn’t that long ago when you wrote it, when I’m listening to you talk about it, I’m hearing a lot of complaints about tape and about things that we did because of tape. What about disk based systems that aren’t doing the things that you talked about?

So not doing repeated fulls, not doing multiplexing, and they’re less susceptible to the environmental stuff that you talked about. Generally the same companies that you were talking about before, but just not using tape as a primary mechanism. How much does that address your concerns?

[00:19:30] Ricky Martin: With the right combination of operational procedures and the right combination of technologies, it completely addresses them in a really elegant way. And I’m gonna bring up the whole, a snapshot is a backup, right? Even at NetApp. Now you will find people go, no snapshot’s not a backup. We can’t say that anymore.

A snapshot is a form of backup that protects against certain kinds of failures. So I sit there and say, what are your causes of failure? Well, there’s user failures. It protects against that pretty well. Application failures? Yep. Protects against that pretty well. Array failures? Bow, does not, right? Site failures? Bow, does not. Metro failures? Not really. In order to protect against those things, you then need to combine that with replication to a separate physical device and preferably a separate location. So suddenly: applications, yep. Users, yep. Arrays, yep. Sites, yep. Metro, yes. A malicious actor with privileged local access? A hacker that’s got your... yeah, sorry. It deletes a snapshot and goes to your remote site and deletes the snapshot, and then, bow, it’s gone. So you then have to layer on what’s sometimes referred to as operational air gapping, which is things like WORM, right? Two-factor authentication, lots of people having to turn the key at the same time.

And while it’s not a real air gap, that’s good enough to stop people using nuclear bombs in the wrong way. Okay. It’s good enough to protect your backup.

So when you combine those things together and I’ve got this thing: a geo distributed object store with replication to a separate administrative domain, protects against users, user failures, application, array, site, Metro, malicious actors, and malicious actors with site access.

With all of these things, you now have to get to the point where you’ve got commandos going into both data centers with axes, finding their way to the arrays and chopping them both apart at the same time.
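The layering described in this answer can be written down as a small coverage table. This is a rough sketch built only from what is said in the conversation, not a reproduction of the tables in the paper; the strategy names and the `uncovered` helper are hypothetical, and it says nothing about how fast each option can restore.

```python
# Failure domains named in the conversation, from smallest to largest.
FAILURE_DOMAINS = ["user", "application", "array", "site", "metro",
                   "privileged attacker"]

BASIC = {"user", "application"}
REPLICATED = BASIC | {"array", "site", "metro"}
EVERYTHING = set(FAILURE_DOMAINS)

# Which failure domains each protection strategy is said to cover.
COVERAGE = {
    "snapshot only": BASIC,
    "snapshot + replication": REPLICATED,
    "snapshot + replication + WORM/two-factor": EVERYTHING,
    "offsite tape in a safe": EVERYTHING,
}


def uncovered(strategy: str) -> list:
    """Return the failure domains a strategy does not protect against."""
    return [d for d in FAILURE_DOMAINS if d not in COVERAGE[strategy]]


for name in COVERAGE:
    print(f"{name}: not covered -> {uncovered(name) or 'nothing'}")
```

Tape and the fully layered approach both come out "green across the board" here; the difference argued in the episode is that the tape copy is slow to recover from and hard to test.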

But you went from one extreme to the other. You went from Legato to NetApp. I’m talking about all the companies in the middle. You went immediately, because I have my issues, and by the way, I’m a fan of NetApp, right? I’m a fan of snapshots.

[00:21:39] W. Curtis Preston: I do not call them backup by themselves. I call them like a convenience copy. I just, it hurts my heart to call them backup Okay. Without replication. All right. we’ll just have to agree to disagree on that, but, when we started talking about a pure play system, like NetApp, the concern that I have there is if you don’t change forms at some point, it’s that concern of the rolling code problem.

If something goes wrong with Data ONTAP, could that take out everything? Both the primary and the secondaries. You know, not a hacker, but just something goes wrong with Data ONTAP and then poof, all my stuff goes. That’s why even back when I was at the height of my NetApp love, I still wanted, and if Stephen listens to this one, it’s gonna... I wanted an NDMP backup of the, you know, back in the day.

So we have two extremes. I guess my question is, what about the other people that aren’t NetApp, that are out there and are not doing those things? Because what I heard, again, I heard you talking about the full copy, I heard you talk about multiplexing, and I heard you talk about the environmental concerns. If you look at many modern data protection systems, they’re not doing the repeated fulls. They’re doing block level incremental forever. They’re not using tape. I haven’t recommended tape as part of a backup system in, I don’t know, 15 years? More than that. The only people left that I know... there are some. Uh, Brian, I’m talking to you there. What’s that?

[00:23:19] Prasanna Malaiyandi: I was going to say Matt over at Spectra.

Yeah. So Spectra Logic uses tape for backup. Of course they do.

When they got hit by a ransomware attack, they recovered from that ransomware attack using their tape backup.

But wait, I want to just comment on that. Some part was recovered from tape. A lot of it though, they said they were able to pull back from snapshots, which they had on their...

Which, and again, I got no issue with snapshots. I do still feel that there is a role for a different system, because earlier you were talking about this craziness, you said, of the idea of copying it to a different... that’s the same thing NetApp is doing. I’m just saying, do it in a different form.

[00:23:59] W. Curtis Preston: Not use NetApp, use something else. Just don’t do the dumb things that you said, like repeated fulls. That’s been dumb for a long time. And definitely multiplexing, you talked about it, it was evil. It was a necessary evil. We had no choice back in the day. Multiplexing was the only way we could get enough data to make the tape drives go fast enough and be at least semi happy. But then we all knew that if we were going to do a restore, that was not going to be a good day. So what do you think about that?
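Why multiplexing helps backups and hurts restores can be shown with a toy model. This is a minimal sketch, assuming a tape is just a list of blocks and that multiplexing interleaves blocks from several clients round-robin; the client names and both functions are hypothetical, and real backup products use their own on-tape formats.

```python
from itertools import zip_longest


def multiplex(streams: dict) -> list:
    """Interleave blocks from several client streams onto one 'tape'."""
    tagged = [[(client, chunk) for chunk in chunks]
              for client, chunks in streams.items()]
    tape = []
    for row in zip_longest(*tagged):
        # Skip the padding zip_longest adds for streams that ended early.
        tape.extend(block for block in row if block is not None)
    return tape


def restore(tape: list, client: str):
    """Return (data, blocks_read) for one client's restore.

    The drive has to pass over every interleaved block up to the
    client's last one, even though most of them belong to others.
    """
    positions = [i for i, (owner, _) in enumerate(tape) if owner == client]
    blocks_read = positions[-1] + 1 if positions else 0
    data = [chunk for owner, chunk in tape[:blocks_read] if owner == client]
    return data, blocks_read


streams = {
    "db": ["d0", "d1", "d2"],
    "mail": ["m0", "m1", "m2"],
    "web": ["w0", "w1", "w2"],
}
tape = multiplex(streams)
data, blocks_read = restore(tape, "web")
print(data, blocks_read)
```

Three slow clients together keep the drive fed during backup, but restoring one client means reading roughly three times the data it actually needs.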

As long as you’re using some form of minimal replication, like block level incremental, or even just file level incremental, and replication, I’m a happy man. Plus WORM. It’s a good idea, right? The trouble is I still see customers who have this as an option still electing to do streaming backups, because that’s what they’re comfortable with. Well, that I can’t fix.

[00:24:53] Ricky Martin: No. So certainly it’s that. The other thing I’ll also say is that, and I didn’t want to turn this into an advert for NetApp, but I do work for NetApp and I’m still very keen about the

We’re going to mention Druva at some point.

[00:25:04] W. Curtis Preston: Yeah. Yeah. The, like you talk about pushing this into a different format. That’s exactly why the latest versions of what’s basically called SnapMirror, it’s a very different way of doing things. It’s basically taking the snapshots and pushing this off and putting it into an object store. I want to hear about that. But I just realized that I haven’t yet done our standard disclaimer, that Prasanna and I do work for different companies. I work for Druva. He works for Zoom. We’re not speaking for either company. The opinions here are ours.

And, please rate this podcast at ratethispodcast.com/restore. And if you want to come argue with me, Then have at it, baby.

[00:25:44] Prasanna Malaiyandi: Pick a time Curtis will be there.

[00:25:47] W. Curtis Preston: Or, if you want to come on and you want to go, Curtis, I think backups are amazing. I think that John Martin guy was a moron and we want to talk about just how much we love backups, whatever man, in this space, cybersecurity, ransomware, beer and backups. I keep threatening, we gotta do another beer and backups episode.

[00:26:05] Prasanna Malaiyandi: We do have.

I actually guested on a show that was called beer and bytes. That’s literally the name of their podcast, beer and bytes, and I was required. Oh, it was so horrible. I had to bring beer to, to

[00:26:18] Prasanna Malaiyandi: Oh, poor baby.

[00:26:19] W. Curtis Preston: By the end of the... they were drinking the whole episode.

By the end of the episode, I was a little loopy. Yeah. So tell me about this, the SnapMirror to object

[00:26:29] Ricky Martin: It’s been called a few things within NetApp, sometimes SnapMirror to cloud. It’s basically using the same block level replication technology you would expect to find inside of a NetApp array, it’s called SnapMirror, to go from array to array, where we’re keeping like an ONTAP file system and figuring out what changes have been made there.

And we ship across like a blob of blocks and we apply that transactionally to the other ONTAP file system and away we go. Okay. And so therein you have your problem: what happens if that blob of stuff includes a level of corruption that screws up the file system at the other end? Now I can say I have never heard of that happening, but just ’cause it’s never happened doesn’t mean it couldn’t theoretically happen. Again, what this does is, rather than replicating to another NetApp array, it replicates directly through to an S3 object store. So we’re able to apply these blobs of changes directly into an S3 object, and then we can run through and grab that stuff and re-present that as a filesystem.
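The general idea of shipping only changed blocks into an object store can be sketched in a few lines. This is an illustration of block-level incremental replication in general, not of how SnapMirror works internally: the object store is a plain dictionary, the four-byte block size is unrealistically small, and every name here is hypothetical.

```python
BLOCK_SIZE = 4  # tiny block size so the example is easy to follow


def split(data: bytes) -> list:
    """Cut a volume image into fixed-size blocks."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def changed_blocks(old: bytes, new: bytes) -> dict:
    """Map block index -> new contents for blocks that differ from `old`."""
    old_b, new_b = split(old), split(new)
    return {i: b for i, b in enumerate(new_b)
            if i >= len(old_b) or old_b[i] != b}


def put_incremental(store: dict, key: str, old: bytes, new: bytes) -> None:
    """Write one immutable object holding only the blocks that changed."""
    store[key] = {"nblocks": len(split(new)),
                  "blocks": changed_blocks(old, new)}


def rebuild(store: dict, keys: list) -> bytes:
    """Replay incrementals in order to re-present the volume image."""
    image = []
    for key in keys:
        obj = store[key]
        image = image[:obj["nblocks"]]                     # volume shrank
        image += [b""] * (obj["nblocks"] - len(image))     # volume grew
        for index, block in obj["blocks"].items():
            image[index] = block
    return b"".join(image)


store = {}
v1 = b"AAAABBBBCCCC"
v2 = b"AAAAXXXXCCCCDDDD"
put_incremental(store, "snap1", b"", v1)   # first transfer sends everything
put_incremental(store, "snap2", v1, v2)    # later transfers send only changes
print(len(store["snap2"]["blocks"]), rebuild(store, ["snap1", "snap2"]) == v2)
```

Each incremental is written once and never overwritten, which fits an object store where the basic operations are put and get rather than in-place updates.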

In fact, we can even do incremental restores from that into some other areas. So we’ve actually satisfied that, I’ll call it a fair objection, you know: if you’re not changing the nature of the format in which you’re storing the data, you have this potential problem. This neatly addresses that.

It also preserves a lot of the goodness of array to array mirroring. The other thing it does, and this is the thing where the NetApp approach is, I think, better than using a third party, because all of those things drop these things into backup formats, which are meant primarily for recoverability.

Remember how I talked about what works in IT that you can’t test. Can you easily test the fact that stuff is there and is working? Can you sit there and say, is there 90%, a hundred percent, how quickly can I go through and start using that data? Again, all of those things typically require some form of recovery process.

And if we’re talking about 80, 90% of your estate, which has just been crypto locked, we’re now talking about 10 to 20 days’ worth of recovering this stuff back onto something which is primary. If that copy is on something which is primary, right? This allows us to use this replica not just for data protection, but we can use it for compliance checking.

It’s a usable copy of the data and more to the point it’s a testable copy of that.

[00:29:00] Prasanna Malaiyandi: full disclosure. I used to work for NetApp on replication products. I love the technology, But I’m going to start poking a little, a few holes. Ricky. one of the things though, right? And this sounds very familiar. I’m sure you’re aware of it way back when there was snap mirror to tape where people want it to be able to dump to tape and ship it off places and restore it.

Except it looks like this is much, much better designed, right? With the incremental forever, being able to connect, and being able to instantly access your files. It sounds great. But just going back to Curtis’s point: one of the challenges is when you have a single vendor. Even though you are changing the format and writing it to object store, I would claim underlying it’s still the NetApp file system.

[00:29:41] Prasanna Malaiyandi: Correct.

[00:29:42] Ricky Martin: No,

[00:29:43] Prasanna Malaiyandi: You’re not writing out WAFL or anything else like that?

[00:29:46] Ricky Martin: There is no WAFL on the other end.

[00:29:47] Prasanna Malaiyandi: Okay. So then I back

[00:29:49] W. Curtis Preston: waffle on this answer,

[00:29:51] Prasanna Malaiyandi: Okay.

[00:29:51] Ricky Martin: no, not waffling on this answer.

[00:29:53] Prasanna Malaiyandi: Okay.

[00:29:53] Ricky Martin: The data structure by necessity has to be different, because of the semantics for being able to access this stuff. You can’t overwrite. Just think about it: you’ve got GET, PUT, and a bunch of other things.

I would be happy to arrange for anybody who’s interested to talk to a real expert on this. But if you think about it, coming back to SnapMirror to Tape, which was a similar kind of thing, it was still basically a stream of the WAFL file system on tape.

[00:30:22] Ricky Martin: Then there was another advance which was called advanced tape, which is still there right now, which basically allows you to do image-based backups of things because let’s face it.

Just doing file-based backups, I’ve got to say, like in the old days, doing a high-density file system backup was: oh no, God, no, please don’t make me do this. And then the answer was image-based backups. And I’d go back to Legato, which is when I started my “backup is evil” thing. The thing for that was something called, God.

Do you remember BudTool and Celestra? And that was an image-based dump that was used by 30 that’s where this started. That’s what recently our backup was able to start it because like it was back up, kill your location, did nasty things to your server and blah, blah, blah, blah, blah.

SnapMirror to Tape was that. And then came advanced tape, which allows us to do these. This then takes that and logically improves it in a number of areas. So while it’s really new technology, and it’s only been available in ONTAP for, I think, the last two or three versions, it’s the foundation for so much of NetApp’s ongoing data protection.

Cloud Backup, which you may or may not know, is based upon this same technology, right? Because it’s going through and pushing stuff directly onto S3.

[00:31:33] Prasanna Malaiyandi: Gotcha. And I do like the functionality that you mentioned, where you can instantly mount that copy from the cloud and start verifying: do I have my data there? Because Curtis and I have chatted with some folks in the past about ransomware recovery, and half the time is just figuring out where your data is before doing the actual recovery process.

Being able to quickly access it like you mentioned, and not have to wait for a tape to come back. It’s critical in being able to get your environment back up and running, identify which applications you need to bring back first and where that data exists. So

[00:32:03] Ricky Martin: Yeah. So if

you can,

[00:32:04] Prasanna Malaiyandi: that instant access functionality.

[00:32:06] Ricky Martin: Absolutely. If you can mount that recovery data store and access it just using standard tools, not the backup tool. Because that’s the other problem you have: when you put this into a relatively proprietary format, how do you access that easily? If this is just mountable via an NFS or SMB share, especially for high-density file system backups, you can start running statistical things where you just run like a chaos monkey of: can I see that file?

Can I see this file? I should be able to see this thing here. Okay, I can be statistically happy that the stuff which I should be able to recover, I can recover. It takes away that testability problem that I talked about before. And so you can always sit there and have a really high degree of confidence that if there is a disaster, I can recover this stuff. You can run tests on it, and you can sit there and say, how long would it take me?

And I can tell you, with well-designed NetApp infrastructure, that recovering four petabytes’ worth of data can literally take you 10 minutes, going back to that previous recovery point, and you can test it.
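The "chaos monkey" spot check Ricky describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the replica is assumed to be mounted at an ordinary path (for example over NFS or SMB), and the function name `sample_recoverability` and the manifest of expected relative paths are hypothetical, not part of any NetApp tooling.

```python
import random
from pathlib import Path


def sample_recoverability(mount_point, expected_paths, sample_size, seed=None):
    """Spot-check a random sample of expected files on a mounted replica.

    mount_point, expected_paths and sample_size are hypothetical inputs:
    a directory where the replica is mounted, a list of relative paths
    that should exist there, and how many of them to test.
    Returns (fraction_readable, list_of_unreadable_paths).
    """
    rng = random.Random(seed)
    paths = list(expected_paths)
    k = min(sample_size, len(paths))
    sample = rng.sample(paths, k)

    unreadable = []
    for rel in sample:
        candidate = Path(mount_point) / rel
        try:
            with open(candidate, "rb") as fh:
                fh.read(1)  # Read one byte to prove the file really opens.
        except OSError:
            unreadable.append(rel)

    # With nothing to sample there is nothing that failed.
    fraction = (k - len(unreadable)) / k if k else 1.0
    return fraction, unreadable
```

A scheduled job could run this against each mounted recovery point and raise an alert when the readable fraction drops below whatever threshold the team is comfortable with.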

[00:33:08] W. Curtis Preston: Oh, you mean because you’re not actually recovering the entire four petabytes you’re bringing back the bits that are different.

[00:33:14] Ricky Martin: All you have to do is change the metadata. And suddenly you’ve got a view of the data as it looked an hour ago, two hours ago, five days ago, 10 days ago, whatever the case may be, and you don’t have to run through a recovery process. So remember, we only back up so that we can recover; recovery is what it’s all about.

So if I can recover in, like, 10 minutes. Yeah, look, let’s say it’s an hour once you throw in all the other appropriate operational procedures. Compare that to trying to recover a petabyte’s worth of stuff from any third-party backup tool before you can start using it. That’s a week long.
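To put rough numbers on the "week long" comparison, here is a back-of-the-envelope calculation in Python. The sustained restore throughput of 2 GB/s is purely an assumption for illustration (real rates depend on the tool, the network, and the target storage), and `restore_days` is a hypothetical helper name.

```python
SECONDS_PER_DAY = 86_400
PB = 10**15  # one petabyte, decimal
GB = 10**9   # one gigabyte, decimal


def restore_days(data_bytes, throughput_bytes_per_sec):
    """Days needed to copy data_bytes back at a sustained throughput."""
    return data_bytes / throughput_bytes_per_sec / SECONDS_PER_DAY


# Assumed sustained restore rate of 2 GB/s (an illustrative figure only).
print(round(restore_days(1 * PB, 2 * GB), 1))  # 5.8
print(round(restore_days(4 * PB, 2 * GB), 1))  # 23.1
```

At that assumed rate a one-petabyte restore lands at close to a week of pure data movement, before any operational steps are added.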

[00:33:53] W. Curtis Preston: Yeah, for, for what it’s worth, at least, our position at Druva, when you look at especially a high density file system or anything like that, your primary protection mechanisms should be something like what you described, because there’s just no beating that.

I don’t care how fast you are, bringing four petabytes back from anywhere is going to take time. And I do think, you talked a lot about testing, and I couldn’t agree more. I remember back, the first company that really did this, that put this on the map, was Veeam.

They had SureBackup, and still have SureBackup, functionality where you could specify some systems that they would automatically recover. They don’t recover it, they just bring it up. That’s the key, the same thing that you talked about. And I know that with Druva we have the ability to do that on certain workloads.

[00:34:43] W. Curtis Preston: I know that we can do that with DR, and I know that there are some other systems. And by the way, we can all agree that, number one, the more testing you can do the better, and number two, the easier it is to do testing, the more you’re going to do it. I remember, when I worked at the. Um, we had to do, I think you said about every six months, right? The recovery thing. And no, we did not do the whole data center. We picked a couple of critical workloads and we would do it. And it was a huge pain in the butt that involved 50 people that all had to be there, whether they needed to be there or not. And it was horrible. It was a horrible process. That was, like, traumatic. And I know I’ve said it on the podcast before, but one of the things was that I would write the documentation, but somebody else had to read the documentation and execute the recovery.

And I wasn’t supposed to help, right? Like, the idea was that I got blown up in the whatever, and so they have to do it. And I think that’s a great way to do recovery, but we need to do more automated recovery testing. We push this at Druva, where, by the way, I have a similar thought to you, just obviously with a different tool. If it’s a workload that’s critical, that needs to come back, if it’s something that needs to be up now if you get a ransomware attack, then you should pre-configure it with our DR service. And the only way to do DR, in my opinion, today, for most people, is to do it in the cloud, number one, and number two, to do it in advance. Meaning that if it’s really important, it needs to be just like you talked about with your image that you can just mount. Because that’s what Veeam did.

[00:36:36] W. Curtis Preston: Right, that’s why Veeam can do that automated testing: because they could just mount the image, and we can do the same thing with some of our workloads. And then, basically, if you could pre-restore the data. I’ll go back to the days of when EMC was EMC, right?

One of their slogans was: if you’re reaching for a tape, you’re dead. That was one of their things, that the recovery needs to have been done already. They were pushing like SRDF and whatnot. But you and I agree on that: if you’re going to have to pull down some giant blob of data, whether it’s from a bunch of tapes, from a big dedupe disk array, or from a bunch of object storage in the cloud, if in order to get back up and running you have to bring a bunch of stuff back and you haven’t already restored it, you’re going to be in a world of hurt, no matter what technology you’re in.

[00:37:32] Ricky Martin: Yeah, absolutely. The biggest downfall of NetApp’s approach, from our perspective, is that it requires you to have your data sitting on ONTAP, on WAFL, as your primary data source, and market share says that’s about 15% of the enterprise data out there. As much as I would love it to be a hundred percent, realistically it will never be a hundred percent.

[00:37:52] W. Curtis Preston: I bet your stock price would be a lot better.

[00:37:54] Ricky Martin: Oh shit. Yeah. But, and so you have these other tools and heterogeneous backup is still an incredibly important thing. My real beef is about people. My problem is not the competitors, my problem is people’s practices and the way in which they define their requirements based upon like tape based backup thinking.

[00:38:22] W. Curtis Preston: It’s yeah,

[00:38:23] Prasanna Malaiyandi: Yeah, they haven’t modernized.

[00:38:25] W. Curtis Preston: Yeah. So a good friend of mine, Reed, who also listens to the podcast, tells a story about the grandmother that always cut, what was it, it was something about cutting the ends off the meatloaf or something. And she’d done it for years.

And when you find out why, it’s: why do we cut the ends off the meatloaf? And it’s because back in the day, it wouldn’t fit in the pan if she didn’t do that. But she’s still doing it 30 years later. And that is absolutely true. The full backup is the best example of what you’re talking about.

People that no longer use tape. So even if you’re not using a tool like Druva, Veeam, Rubrik, or Cohesity: these are all incremental forever technologies; they were built to be that way. A lot of the other products weren’t built to be that way.

But they’ve invented the concept of the, what’s it called, Prasanna? Synthetic full. Thank you. The synthetic full. So at least migrate to a synthetic full. If your product needs an occasional full, and you can do it synthetically, please do that.

[00:39:40] Prasanna Malaiyandi: don’t send all the bits over the

[00:39:41] W. Curtis Preston: I think if you’re still using a product that requires an occasional full.

Sorry, and I know it’s gonna make me sound like a Druva bigot or something, but if you’re still using a backup product that requires an occasional full in today’s world, I agree with you. But I’ve always said that. You had your presentation back with Legato.

I had a similar presentation where I was like, full backups are stupid. I remember I had a presentation, it was like “10 things wrong with most people’s backups.” And one of them was that they were doing full backups. This was back when they were still doing weekly fulls when they could go to monthly or quarterly fulls.

It’s: stop doing weekly fulls. You did that with tape because you had no other choice. and they,

[00:40:22] Prasanna Malaiyandi: it makes me feel good, Curtis

[00:40:25] W. Curtis Preston: what’s

[00:40:25] Prasanna Malaiyandi: it when they do it. That’s probably what they’re thinking. It’s the way things are.

What’s really hilarious is they do that weekly full despite the impact that has on the production environment. And then they store it on a deduplication disk array.

[00:40:38] Ricky Martin: And you know what drives me nuts is even when people have a really easy option to go from fulls to using something modern. And I can’t give out the answers, but I have seen how many NetApp customers still use NDMP dump.

And

[00:41:06] Prasanna Malaiyandi: Yeah.

please stop on me. We have

[00:41:07] Ricky Martin: this really easy. I remember being a fan of NDMP at one point. It had its purpose at one point. That purpose, I think, has passed.

And so I think my whole thing is: don’t think about building a backup system, build a recovery system. Think about what your most likely failure scenarios are. Think about failure domains: users, arrays, sites. In fact, a site is more likely to fail than an array is these days. I’ve seen more arrays and data centers taken out by plumbing accidents than I’ve seen arrays go down.

You have a good viewpoint to have that data. That’s an interesting statement.

[00:41:50] Ricky Martin: Yeah, because you sit there and go, what happens if the array goes down? This isn’t just NetApp arrays. NetApp arrays are really awesome, but arrays these days are pretty damn reliable. The state of the art is that they stay up for five years, unless you need to reboot them for. Unless the, I’m not here to be mean about the competition.

There are some that were really awful, not NetApp arrays. For the most part, when an array goes down, it’s because there’s been a messy power failure, and then certain arrays don’t come back up online. All the NetApp ones do. Or there’s been a plumbing accident. Seriously, I’ve seen it: a toilet breaks upstairs,

the data center floor is underneath that, and you’ve got water dripping down the racks, and it’s: oh shit.

[00:42:28] Prasanna Malaiyandi: Or someone forgot to change the battery on a generator fail over switch.

[00:42:31] Ricky Martin: yep. Or somebody tested the generator fail over and didn’t realize that all the PDUs were wired up wrong. I.

That is why we test things. Our last guest, who was it, darn it, we had our last guest

[00:42:49] Prasanna Malaiyandi: Yeah, he basically tried to do an Exchange recovery

[00:42:53] W. Curtis Preston: That’s right. It was Exchange recovery.

[00:42:55] Prasanna Malaiyandi: and it didn’t work. And he basically had to go through it, and luckily it was a test environment, without all the pressure of people yelling at you. But he’s like, yeah, we have to figure out how to test our restores.

[00:43:06] Ricky Martin: Yeah, keep in mind, Chernobyl was because of infrastructure testing

I did not know that.

[00:43:12] Ricky Martin: wrong.

[00:43:12] W. Curtis Preston: No, you know what? I did know that. I guess I never really thought about it. I never put it in that context. I saw the movie, and that’s literally the extent of what I know of Chernobyl. That, and a bunch of people are currently blockading it from Russian soldiers.

but, which based on what I saw in the movie sounds like a good idea,

[00:43:32] Prasanna Malaiyandi: Yeah.

[00:43:33] W. Curtis Preston: but stay away, don’t touch the building. Don’t touch the glowing building.

[00:43:39] Ricky Martin: I It’s not a laughing matter, but yeah, it’s

[00:43:42] W. Curtis Preston: Yeah.

[00:43:43] Ricky Martin: Making this infrastructure testable really matters to me, because I used to be a software developer, and test-driven development is how you do things. And if we think about all the things about creating agile data infrastructures or moving into the cloud and all the rest of the stuff, this is about how we make what we do testable and incremental, with small changes, continuous development, and continuous operations. Traditional backup approaches, which depend upon “I have to make sure my weekly full is done before I begin my change management,

and I have to allow for at least nine hours to recover my environment in case it goes wrong,” mean I can only do changes on the weekend. That means you get 50 change windows a year. And I ask people: is 50 change windows enough for you to get all the stuff you need to get done?

And they go nuts. I said, how many do you need? They go, about 300. I go, so that means you have to be able to not just back up but recover more or less instantaneously.
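The change-window arithmetic Ricky walks through can be written out as a tiny Python sketch. The two blackout weeks per year are an assumption chosen so that one weekend window per week yields the 50 he quotes, and `change_windows_per_year` is a hypothetical helper name.

```python
def change_windows_per_year(windows_per_week, weeks_per_year=52, blackout_weeks=2):
    """Usable change windows in a year.

    blackout_weeks is an assumed change freeze (for example over the
    holidays); it is not a figure from the conversation.
    """
    return windows_per_week * (weeks_per_year - blackout_weeks)


# One weekend window per week, gated on the weekly full plus restore time.
print(change_windows_per_year(1))  # 50

# Six windows per week, possible only if recovery is near-instant.
print(change_windows_per_year(6))  # 300
```

Getting from 50 to roughly 300 means changes on most days of the week, which is only practical when rolling back does not depend on a long restore.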

[00:44:47] Prasanna Malaiyandi: Yeah.

[00:44:47] Ricky Martin: And you need to be able to sit there and rehearse all of this stuff during the day when everybody is there, not like how it was at Chernobyl,

where everybody had gone home and all the good people were sleeping. You need to test all that stuff and make sure that it works. You rehearse it, you script it, you test it, you make sure it works, and then you execute it. Because I’ve been in situations where people did an SAP upgrade the wrong, slow way.

And the SAP upgrade was going to take 17 hours, but they couldn’t stop. Because that it just starts it.

[00:45:19] Prasanna Malaiyandi: Uh,

[00:45:19] Ricky Martin: mean? And the thing is, if you sit there and you rehearse that on a test dev copy, which you spin up instantaneously, and you can do this in the cloud or wherever you want to do it, then someone’s going to go, oh no, don’t do that.

Let’s do it this other way. And that’s my entire thing about that long white paper. I’m trying to make the case: think about failure scenarios, think about operational efficiencies, right? How do you support the business in doing what it is they want to do? Because the reason why the backup is given to the most junior guy, why it never really gets enough funding to do a really proper job, or if it does, it happens once every three years and becomes broken within four months, is because it is not aligned. In fact, it’s antithetical to most business-level objectives. It’s a boat anchor on change management. It can be so much more. In fact, the backup can provide the test dev copies to accelerate change.

That’s what that really long paper is all about. It’s: please stop. Don’t use tape or traditional backups or any other form of backup for anything that it’s not really designed to be good for. Yes, use tape to make off-site copies and put them in your fireproof safe, because you know what, it still is proof against nuclear bombs and all the rest of that stuff.

That’s a good thing: have at least one copy on tape. Don’t build your entire thinking around it. Build your thinking around business requirements. Sorry. There’s my rant.

[00:46:44] W. Curtis Preston: no, I get it. Yeah. I do still think that your primary beef is with a tape based system and a system that works like a tape based system. Rather than backup itself. and to me, backup is a broad tent, that

[00:46:57] Prasanna Malaiyandi: call it data protection.

[00:46:58] Ricky Martin: Yes.

[00:46:59] W. Curtis Preston: I’m okay with that. Yeah. And it includes the things that you do.

It includes DDP, right? It includes, a bunch of things and many of which I’m not a fan of, but they still meet the basic definition. and,this hasn’t been nearly as painful as I thought it might’ve been.

[00:47:17] Ricky Martin: Backup is evil is just clickbait.

[00:47:19] W. Curtis Preston: Yeah. Yeah. that’s, what’s going to have to be the title of this episode. It’s going

[00:47:23] Prasanna Malaiyandi: there you

[00:47:23] W. Curtis Preston: be so saSosa is Ricky Martin. All right. thanks for coming on, for coming on the podcast.

[00:47:32] Ricky Martin: You’re welcome. It’s been a pleasure. It really has. No, I don’t really get to talk to many people about.

[00:47:38] W. Curtis Preston: And they don’t let you out much over there. And Prasanna, I know you didn’t get too many words in edgewise

[00:47:44] Prasanna Malaiyandi: That’s okay.

[00:47:45] W. Curtis Preston: between me and Ricky.

I’m glad I did not have to tell you both to go to your corners. So that was a plus.

[00:47:52] W. Curtis Preston: That’s why you were going to be the, the, what do you call it? the moderator.

[00:47:55] Prasanna Malaiyandi: Yep,

[00:47:56] W. Curtis Preston: This was, this wasn’t so bad. I will say in the beginning when he was just really railing against backup, my

[00:48:02] Prasanna Malaiyandi: I could feel your blood boiling from here.

[00:48:04] W. Curtis Preston: It hurt a little bit. I was like, oh, but I was like, I’ve said a lot of the same things.

[00:48:09] Prasanna Malaiyandi: It’s just, Yep.

[00:48:11] W. Curtis Preston: yeah. Anyway, so thanks. thanks for not saying much

[00:48:14] Prasanna Malaiyandi: Yeah. It was entertaining. It’s all good.

[00:48:18] W. Curtis Preston: And thanks to the listeners. We only do this for you. Well, we do it so Ricky can have somebody to talk to. And make sure to subscribe to the podcast, so that you can restore it all.

