What are RTO and RPO, and how do they drive backup design?

If you don’t meet your company’s recovery time objective (RTO) and recovery point objective (RPO) with your backup design, nothing else matters. Seriously. No one cares if you can back up, only that you can restore in a timeframe they consider reasonable. The only way that’s going to happen is if you agree to these times UPFRONT. In this episode of our new back-to-basics series, we’ll jump right into this extremely important topic. We’ll explain what RTO and RPO are, what recovery time actual and recovery point actual are, and how they relate to RTO and RPO. We’ll also explain how to get your company to agree to them.

Transcript

[00:01:17] W. Curtis Preston: Hi, and welcome to Backup Central’s Restore it All podcast. I’m your host, W. Curtis Preston, AKA Mr. Backup. And I have with me a guy who, I am absolutely positive, my dad would definitely have called a hippie when I was younger.

How’s it going, Prasanna?

[00:01:35] Prasanna Malaiyandi: I was trying to figure out where you were taking that. And I was very afraid where that was going.

[00:01:44] W. Curtis Preston: Yeah, with that hair you’ve got going on, my dad would totally have called you a hippie back in the day. Actually, I’m pretty sure both of my parents would have. Did I ever tell you that I didn’t own a pair of jeans until I was 18 years old? It’s because jeans were what hippies wore.

[00:02:07] Prasanna Malaiyandi: Interesting.

[00:02:08] W. Curtis Preston: Corduroy was my, uh, yeah, exactly. You know, you could start a fire down there. But yeah, you know, you...

[00:02:25] Prasanna Malaiyandi: ...more people don’t wear corduroy. They’re super comfy. It’s nice material.

[00:02:30] W. Curtis Preston: What’s amazing is they’re pretty warm, though. And I did this in Florida, right?

[00:02:36] Prasanna Malaiyandi: I don’t know if you’d want that in Florida.

Honestly, though, I think people would still call me a hippie today, so it’s all good. Curtis, you don’t have to go back to your dad’s generation to find someone calling me a hippie. So,

how about this? If we can get five positive comments on our podcast in the next month, no, sorry, in the next two weeks from when this goes live,

[00:03:05] W. Curtis Preston: All right.

[00:03:06] Prasanna Malaiyandi: Curtis will grow a beard for the next three months. For the next three...

[00:03:10] W. Curtis Preston: Let’s see, Apple...

[00:03:12] Prasanna Malaiyandi: Yeah.

[00:03:13] W. Curtis Preston: Pulling it up to see. All right, we have to be specific. Right now we have 16 ratings on the...

[00:03:25] Prasanna Malaiyandi: So I’ll make it a little harder. So if we get to tw... hold on, I said five before.

[00:03:30] W. Curtis Preston: Yeah.

[00:03:31] Prasanna Malaiyandi: Okay. So if we get nine, how about that?

[00:03:35] W. Curtis Preston: Well, if we get nine new ratings and comments, I’ll grow a beard until Christmas.

[00:03:42] Prasanna Malaiyandi: Okay.

[00:03:44] W. Curtis Preston: I can’t commit to anything after that, but I don’t see that happening.

[00:03:51] Prasanna Malaiyandi: Just watch, Curtis. Just be careful.

[00:03:55] W. Curtis Preston: uh,

[00:03:55] Prasanna Malaiyandi: This is nine from when this episode goes live, correct?

[00:04:00] W. Curtis Preston: Yeah. Which will be sometime in September, or, well, it might be in August. We’ll see. So right now there are 16 ratings on the, uh...

[00:04:14] Prasanna Malaiyandi: Yep. Just wait. Just wait.

[00:04:19] W. Curtis Preston: All right. Well, now I’m scared. But speaking of ratings, I’ll throw out our disclaimer: Prasanna and I work for different companies. He works for Zoom. I work for Druva, and this is not a podcast of either company. The opinions that you hear are ours. Also, rate us at ratethispodcast.com/restore. And if you like what we’re talking about, if you’re somebody who’s been listening to the podcast,

we know you’re out there. Just reach out to me @wcpreston on Twitter, or W. Curtis Preston at Gmail, and we’ll get you on the podcast. We’ll talk about your favorite subject. We’ll even keep you anonymous if you want. We’ll give you a fake name. We’ve had Harry Potter and Ron Weasley on here.

You know, it’s all...

[00:05:07] Prasanna Malaiyandi: ...a couple of mystery guests without any...

[00:05:08] W. Curtis Preston: Yeah, yeah. We’ve had guests where we didn’t even give them a name. So that’s all good, right? That way you can speak to your heart’s content and not worry about what your employer thinks about it. We’ll even disguise your voice. So I thought this week we would go back to a

basic but incredibly important topic, and that is the concepts of RTO and RPO, which, of course, for those who don’t already know...

[00:05:46] Prasanna Malaiyandi: Or, of course, RTO is a...

[00:05:50] W. Curtis Preston: So, recovery time objective and recovery point objective. And then we should also talk about RTA and RPA, and how those are related but completely different. So let’s first talk about... and I guess what I’m going to make the title of this is why RTO and RPO are, you know, what did I say?

What did I put here? Oh, I didn’t put it here. What did I say in the message, Prasanna? I was quite eloquent.

What did I say the title should be?

[00:06:24] Prasanna Malaiyandi: Why RTO and RPO should drive all backup design.

[00:06:28] W. Curtis Preston: Done, right? Because let me ask you a question, Prasanna. Do backups matter?

[00:06:39] Prasanna Malaiyandi: No,

[00:06:40] W. Curtis Preston: Does anyone care if you back up? No one cares if you back up. They only care... yeah. The only thing that matters is what...

[00:06:50] Prasanna Malaiyandi: ...is restoring data. And if you fail to restore data, there’s a high likelihood your job might be gone. So.

[00:06:59] W. Curtis Preston: Yeah, and I would say, and I feel so strongly about this, and by the way, I’m not speaking to the choir. I’m speaking to old me. I spent the first I-don’t-know-how-many years of my backup career not really knowing much about RTO and RPO. I kind of used the concepts, I suppose,

but I didn’t use them to drive backup design. I didn’t use them to set expectations with my customers. I was at a very large bank, and we had all kinds of expectations. And you’ve worked at companies where you’ve got customers that have expectations. Well, you and I are both married, and so many of the arguments you have as a married couple come from what? Mismatched expectations.

Yeah. And I think that’s... go ahead. Go ahead. What were you going to say?

[00:08:19] Prasanna Malaiyandi: No, I was just thinking back. I know we’ve had Jeff Rocklin on the podcast, and in your book, right? The one you wrote, Modern Data Protection,

[00:08:30] W. Curtis Preston: Yep.

[00:08:31] Prasanna Malaiyandi: by O’Reilly. In that book, there’s that entire chapter on working with your stakeholders and setting expectations, right?

Understanding and getting agreement on “hey, this is what it is,” because, like you said, a lot of the time expectations aren’t agreed to and aren’t set upfront. Therefore, when something goes wrong, and inevitably something does, everyone’s like, “Oh, that’s not what I thought. I thought I would get my data back tomorrow. Why am I losing data?”

And it’s because these things weren’t clearly documented and discussed upfront, like you said, Curtis, when designing those backup systems.

[00:09:10] W. Curtis Preston: Exactly. So when you think about the ways you can get in trouble as a backup admin, two of them are clear: either the restore didn’t complete in the expected amount of time, or the restore lost more data than was expected. I’m saying “expected” specifically because it’s what they were expecting, not what you were expecting.

Right. You probably always knew how long it was going to take, but the powers that be... well, you may or may not know that we did an episode on why a restore usually takes longer than the backup. If you haven’t listened to that episode, I would highly recommend it, because it goes into that.

Go ahead.

[00:10:04] Prasanna Malaiyandi: Oh yeah, like you were saying, it goes into all the details about why restores can be slower than your backups, and the issues around that. I was also going to add another challenge you see with restores. And I know, Curtis, we always talk about this: verify your restores, right?

Verify that your backups are done right. But some people have never actually done a restore, so they can’t tell you. Or they’ve done such a small restore that they don’t know how long, in real life, it’s going to take to bring their Oracle database back up and running to the latest point in time.

[00:10:38] W. Curtis Preston: They’ve often done functional restores, but not performance restore tests. Right. I remember when I was at the bank, we did a handful of restores every single day, because we didn’t have snapshots back in the day. The number one reason for restores is still human action.

Right? In fact, I would say it’s even more so today than it was 20 or 30 years ago, because now we have RAID and erasure coding and highly reliable drives like SSDs, versus the rotational drives we used for so many years. I would say that at this point, something like 99% of the time you’re going to do a restore, it’s due to some kind of action by some kind of human, some...

[00:11:36] Prasanna Malaiyandi: I was just...

[00:11:37] W. Curtis Preston: But...

[00:11:37] Prasanna Malaiyandi: Have you seen a study about that? That’d be an interesting stat. I wonder if there is an industry stat on what percentage of restores are actually user...

[00:11:47] W. Curtis Preston: Yeah, I just think about, like, the...

[00:11:49] Prasanna Malaiyandi: Yeah. The day-to-day...

[00:11:51] W. Curtis Preston: ...center used to be, right? When I started, we had servers running on a disk drive. A disk drive. You had the OS disk drive, the application disk drive, and then one or more data disk drives. Disk drives, not LUNs, not LUNs on a RAID array.

What’s a RAID array? Right. And it was all, obviously, rotational disks. I’m sure you know nothing of this, but there was a big HP recall. We had a lot of HP servers, and it was an HP disk recall because they were leaking swag oil. I don’t even know what that means, but they were leaking swag oil onto the platters and thus creating data loss.

We called them the Valdez disks, for...

[00:12:46] Prasanna Malaiyandi: ...based on the Exxon Valdez, the oil...

[00:12:48] W. Curtis Preston: Based on the Exxon Valdez. Apologies if anybody works in that industry, but that was what we had back then. That just doesn’t happen now, right? First off, SSDs fail way less often than rotational drives.

And if one does fail, it’s in an array and it’s replaced right away. You have hot-swappable drives.

[00:13:19] Prasanna Malaiyandi: I think, though, the challenge is when you think about the types of scenarios and use cases. When I think about a user accidentally deleting something, or a use case like that, the amount of data I’m restoring isn’t large.

[00:13:36] W. Curtis Preston: Well, but it’s not just the user, right? Notice the way I said it: the action of some human. That could be an admin dropping a VM. It could be a hacker.

[00:13:52] Prasanna Malaiyandi: That’s what I was going to get to: the ransomware-style use cases, where, yes, that is probably a smaller percentage of overall restores. But if I look at the amount of data restored during those scenarios versus typical user restore behaviors...

[00:14:10] W. Curtis Preston: Oh, I see. Oh, that

[00:14:11] Prasanna Malaiyandi: ...is on the much larger end.

Right. And those are the RTOs you need to be considering, right?

[00:14:17] W. Curtis Preston: So you’re saying that if you look at the amount of data restored per reason, rather than the number of restores, you think the vast majority will be from ransomware attacks.

I can’t dispute that. I would say ransomware attacks and disasters. But, interesting. Yeah. So let’s go...

[00:14:48] Prasanna Malaiyandi: ...of things. Oh, sorry. Just quickly on that, from the recovery time objective side: that’s why it’s important to figure out a way to extrapolate to that full RTO restore scenario, so you understand the performance there as well.

[00:15:02] W. Curtis Preston: Yeah. You

[00:15:03] Prasanna Malaiyandi: ...whether it’s an application, a bunch of VMs, or whatever else it is.

[00:15:09] W. Curtis Preston: You know, this is going to sound like a non sequitur, but I’m pulling up a scene from The West Wing. Did you ever watch The West Wing?

[00:15:17] Prasanna Malaiyandi: Nope.

[00:15:19] W. Curtis Preston: The West Wing is an amazing show. In this house, my wife has seen the entire series at least four times.

And there’s this scene where the president, played by Martin...

[00:15:41] Prasanna Malaiyandi: Short?

[00:15:41] W. Curtis Preston: I was going to say Martin Short. It’s not Martin Short. Martin...

Martin Sheen. Okay, yeah. So the president, played by Martin Sheen, decides on his next Supreme Court justice, who was played by Edward James Olmos. And there’s this moment where the president asks, “So when’s he going to get here?” And the answer is, “Well, in a couple of days.” “What, a couple of days?” Again, this is the expectation thing.

Normally, when somebody’s nominated for a position like the Supreme Court, they hop on a plane that moment.

[00:16:17] Prasanna Malaiyandi: Yeah.

[00:16:18] W. Curtis Preston: But the Olmos character decides to drive down. He lives in Maine, and he’s going to drive down and stop in Connecticut for some antiquing. So this is the thing: what’s the expectation versus what actually happens?

[00:16:38] Prasanna Malaiyandi: Yep.

[00:16:39] W. Curtis Preston: So I would say that the bigger the restore, the greater the expectations.

[00:16:45] Prasanna Malaiyandi: Yep.

[00:16:46] W. Curtis Preston: And so what you must do, and if you have not done this yet, you must do it now, is decide on an RTO and an RPO per application.

And by the way, this is an iterative process, which I think we’ll probably talk about at the end here: how do you do this? Because, and I know we’ve talked about this on the podcast before, if you ask the typical business unit what RTO they want, they will say zero, right?

How fast do you want to restore? Immediately. How much data do you want to lose? None.

[00:17:36] Prasanna Malaiyandi: How much are you willing to spend?

[00:17:38] W. Curtis Preston: None. Their answers are always the same.

[00:17:41] Prasanna Malaiyandi: Yep. And I think, going back to what you said, this is where it’s not that you, as a backup person, decide what the RTOs and RPOs are. I think...

[00:17:51] W. Curtis Preston: You absolutely do not.

[00:17:53] Prasanna Malaiyandi: Yeah. You have to have that discussion with the business stakeholders: okay, what are you expecting?

And like you said, you ask them and it comes back to dollars, right? Because, hey, if you want that zero RTO, zero RPO, zero data loss, that is going to cost a pretty penny. And is that really needed by your application, or can you...

[00:18:13] W. Curtis Preston: It. Yeah.

[00:18:14] Prasanna Malaiyandi: Yeah, I’m okay if it takes a weekend to bring back up. It’s not mission-critical.

And so that’s fine. We’re not losing much during the downtime.

[00:18:24] W. Curtis Preston: Yeah. The answer to your last question is almost never, right? Meaning the application almost never needs zero and zero, unless it’s a financial trading firm or...

[00:18:35] Prasanna Malaiyandi: Exactly. Yeah.

[00:18:36] W. Curtis Preston: Yeah, they’re right there. On the opposite end, I worked at a paper mill. Their RTO was two weeks.

[00:18:45] Prasanna Malaiyandi: Yeah. And

[00:18:46] W. Curtis Preston: And their RPO was two weeks as well.

[00:18:48] Prasanna Malaiyandi: And I would say the companies that have a zero RTO and zero RPO aren’t talking to the backup team. They’re probably talking to the storage infrastructure team or the compute team. Backup is sort of the “okay, if everything else fails” option. It’s the last line of defense, not the first place I go in order to recover.

[00:19:08] W. Curtis Preston: Yeah. And by the way, that brings up a topic we should cover in this episode: should there be different RTOs and RPOs based on what happened? I would argue that it depends. Okay, so let’s talk about RTO and RPO. What are they? So, Prasanna, what is RTO?

[00:19:30] Prasanna Malaiyandi: So RTO is recovery time objective. It’s an objective telling you, if you needed to recover a data set, how long it should take you to bring it back.

[00:19:43] W. Curtis Preston: Right.

[00:19:43] Prasanna Malaiyandi: And the key here, and this is where I like to differentiate from what a lot of other people say, is that it’s not just bringing back your data.

It’s actually bringing back your application to a good known state.

[00:19:56] W. Curtis Preston: Bingo.

[00:19:57] Prasanna Malaiyandi: See, I didn’t forget everything.

[00:19:59] W. Curtis Preston: Yeah, good job. That wasn’t a test, by the way. But yeah, I’d say mistake number one that a lot of people make is thinking it means restore time. It doesn’t. It means the time from the moment the outage happened to the moment the application, and any related applications, are back up and running in a fully functional state.

Right. That is the objective. We’re going to talk about the reality in a minute, but that is the objective. That’s what you’ve agreed to. Another way to describe this is an SLA, right? A service level agreement. You have an SLA with your stakeholders. What? What is that?

What was that?

[00:20:44] Prasanna Malaiyandi: That was me being like, “I don’t like calling them SLAs. I like calling them SLOs.”

[00:20:50] W. Curtis Preston: Okay. All right. I’ll let that slide. So what would make it an SLA to you, if you just agreed to that RTO or RPO?

[00:21:02] Prasanna Malaiyandi: Well, I think you need to be able to provably show that you are hitting it every single time, rather than just “here’s an objective,” right? Because these are...

[00:21:16] W. Curtis Preston: But the agreement... I mean, yeah. I don’t know if we’re... I mean, an SLA is an agreement between two groups of people, maybe more than two. This is the agreement we have made; this is the objective we’re going to meet. So maybe the RTO and RPO are the metrics, the objectives, upon which you create an SLA.

All right, sure. Okay, I’ll be fine with that. So an RTO is essentially how long it should take to bring the application back online, right? And RPO, recovery point objective, is basically how much data you have agreed you’re allowed to lose.

[00:22:07] Prasanna Malaiyandi: ...data. How much you’re...

[00:22:10] W. Curtis Preston: As expressed in an amount of time, meaning you agree that you will allow a loss of one hour’s worth of data, or 24 hours’ worth of data. And I would say, at least back in my day, RPO was the one we talked about the least.

Did that come out in English? Frankly, it was less discussed because we all knew that we only backed up once a day, and that not every backup worked every day. And so we knew that the best we could do was a 24-hour RPO, and that it might actually be 48 or 72 hours, depending on what day of the week it was and all that kind of stuff.

But nobody wanted to talk about it.
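To make the “it might be 48 or 72 hours” point concrete, here is a minimal Python sketch (not something from the episode) that computes the recovery point actual at the moment of an outage: the age of the newest successful backup. The (start_time, succeeded) log format, the schedule, and the dates are all hypothetical.

```python
from datetime import datetime, timedelta

def recovery_point_actual(outage, backups):
    """Return how old the newest usable backup is at the moment of the outage.

    backups is a list of (start_time, succeeded) tuples. This is a
    hypothetical log format for illustration, not any product's job history.
    """
    good = [start for start, succeeded in backups if succeeded and start <= outage]
    if not good:
        raise ValueError("no successful backup before the outage")
    # Everything written after the newest good backup started is lost.
    return outage - max(good)

# Hypothetical nightly 01:00 backups; Tuesday's and Wednesday's jobs failed.
backups = [
    (datetime(2022, 8, 1, 1, 0), True),   # Monday
    (datetime(2022, 8, 2, 1, 0), False),  # Tuesday
    (datetime(2022, 8, 3, 1, 0), False),  # Wednesday
]
outage = datetime(2022, 8, 3, 23, 0)  # Wednesday night
rpa = recovery_point_actual(outage, backups)
print(rpa)  # 2 days, 22:00:00
```

That is roughly 70 hours of lost data against a nominal 24-hour RPO, which is the gap between what was promised and what was real.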

[00:23:03] Prasanna Malaiyandi: That’s interesting. Being on the vendor side, I remember focusing so much on the RPO side of things and less on the RTO side.

[00:23:13] W. Curtis Preston: Interesting. Now, is that because...

[00:23:17] Prasanna Malaiyandi: A lot of it was really around replication, where people do care more about the RPOs. And because a lot of it was disk-based systems, storage appliances replicating from one to another, your RTO was typically minutes or...

[00:23:31] W. Curtis Preston: Yeah, the RTO was minutes, so that was easy peasy. That was way better than anything they’d had before. So then they’re like, “Okay, now let’s talk about the amount of data we’re going to lose.” Right. Exactly. So the...

[00:23:44] Prasanna Malaiyandi: But,

[00:23:44] W. Curtis Preston: ...thing is that... sure.

[00:23:46] Prasanna Malaiyandi: One last thing in terms of the data you lose: that’s measured from the time a disaster strikes, going backwards in time, right?

[00:23:55] W. Curtis Preston: Correct. The disaster happened, and then there’s the amount of data, the transactions, whatever it is we put into the system, that we’re agreeing we can lose because we had to restore from a backup that is 10 hours old, or whatever it is. We’re agreeing in advance that, say, we need to lose less than four hours’ worth of data. And then, as we talked about before, we have an SLA based around that. Now, having spent so much time on the backup side: there was RTO and RPO, but then there was what some people call RPA and RTA, and others call RPR and RTR. That’s recovery point actual, or recovery point reality, right?

One is the recovery point objective; the other is the reality. And this is something that you, as a backup person, should know. And again, I use the term backup to include any kind of recovery mechanism. You should be aware of it for every type of system that you have.

You should know the actual recovery time, at least your portion of the recovery time. You should know that the recovery time actual and the recovery point actual are X number of hours, and you should be able to communicate that effectively. And how would you know that?

[00:25:42] Prasanna Malaiyandi: By actually doing it and trying it out.

[00:25:44] W. Curtis Preston: Yes. There’s a word for that.

[00:25:47] Prasanna Malaiyandi: Restore validation.

[00:25:50] W. Curtis Preston: Another word.

[00:25:53] Prasanna Malaiyandi: Uh, verify?

[00:25:55] W. Curtis Preston: It starts with a T.

[00:25:58] Prasanna Malaiyandi: Test. There we go. Test your backups.

[00:26:02] W. Curtis Preston: Yeah. But all those words you said were very valid. It’s just that the only way you’re going to know this is to test it. You’ve got to test your restores in order to know what your RTA and RPA are.
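Testing is how you find your RTA and RPA. As a minimal sketch, assuming you record three timestamps during a restore test (the hypothetical fields outage, recovery_point, and app_online below), both actuals fall out of simple subtraction and can be compared with the agreed objectives. Note that RTA runs until the application is fully functional, not just until the data restore finishes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class RestoreTest:
    """Timestamps captured during one restore test (hypothetical fields)."""
    outage: datetime          # simulated moment of failure
    recovery_point: datetime  # point in time the restored data comes from
    app_online: datetime      # application verified fully functional again

    def rta(self) -> timedelta:
        # Recovery time actual: outage until the application works,
        # not just until the data has been copied back.
        return self.app_online - self.outage

    def rpa(self) -> timedelta:
        # Recovery point actual: how far back in time the restored data is.
        return self.outage - self.recovery_point

def meets(result: RestoreTest, rto: timedelta, rpo: timedelta) -> bool:
    """True only if both actuals are within the agreed objectives."""
    return result.rta() <= rto and result.rpa() <= rpo

drill = RestoreTest(
    outage=datetime(2022, 8, 10, 9, 0),
    recovery_point=datetime(2022, 8, 9, 22, 0),
    app_online=datetime(2022, 8, 10, 15, 30),
)
print(drill.rta(), drill.rpa())  # 6:30:00 11:00:00
print(meets(drill, rto=timedelta(hours=4), rpo=timedelta(hours=24)))  # False
```

Here the 11-hour RPA is within a 24-hour RPO, but the 6.5-hour RTA misses a 4-hour RTO, which is exactly the kind of number you want in hand before someone asks for zero and zero.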

[00:26:19] Prasanna Malaiyandi: Uh, one thing,

[00:26:21] W. Curtis Preston: Go ahead.

[00:26:21] Prasanna Malaiyandi: No, finish.

[00:26:23] W. Curtis Preston: Well, because the next phase we’re going to talk about is how to come to some sort of agreement, right?

[00:26:33] Prasanna Malaiyandi: So

[00:26:33] W. Curtis Preston: Like, you’re in a meeting and they’re like, “I want an RTO and an RPO of zero.” Then you should be able to say immediately, “Well, currently we can do 24 hours and 16 hours,” whatever the numbers are. And then you have a discussion, right? But if you don’t know that, you’re, you know, SOL.

[00:26:56] Prasanna Malaiyandi: I think one thing to also consider is what we talked about: recovery of an application versus restoring data. You may not be responsible for the end-to-end recovery of that application; you might only be responsible for a part. So just because the application team says, “Oh, I have four hours to recover my application,” don’t think that you have all four hours to get the data back, right? Because you might only be a small part of bringing up the entire application.

[00:27:25] W. Curtis Preston: Yeah. And part of that four hours may be equipment procurement. I don’t know why that was so hard for me to get out. Equipment procurement, right? There may be repair. There may be bringing in a vendor. All of that has to be figured into it. I’ll say this: when you have a major outage, unless you’ve planned really well for it...

You have to plan. You have to have spare equipment available, spare storage capacity, spare computing capacity, and some sort of recovery system rocking and rolling and ready to go. That’s the only way you’re going to meet most modern RTOs and RPOs.

If you’re going to wait for a vendor to come replace a disk drive or a server before you start your restore, you’re never going to meet your RTO and RPO.
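The point that the backup team owns only part of the four hours can be written as a simple budget. This Python sketch is illustrative only: the step names and durations are hypothetical, and a real plan would come from the application team’s runbook.

```python
from datetime import timedelta

def restore_window(rto, other_steps):
    """Time left for the data restore once every other recovery step is counted.

    other_steps maps a step name to its estimated duration; the names and
    durations are hypothetical. A negative result means the other steps
    alone already exceed the RTO.
    """
    return rto - sum(other_steps.values(), timedelta())

steps = {
    "detect and declare the outage": timedelta(minutes=30),
    "bring replacement hardware online": timedelta(hours=1),
    "application recovery and validation": timedelta(hours=1, minutes=30),
}
window = restore_window(timedelta(hours=4), steps)
print(window)  # 1:00:00
```

If “bring replacement hardware online” really means waiting for a vendor, the window can easily go negative, which is the argument for having spare capacity ready before the outage.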

[00:28:24] Prasanna Malaiyandi: Well, and especially right now. I guess technically the pandemic’s over and we’re in an endemic stage, but during the COVID pandemic it was hard to get equipment: supply chains, people showing up in offices. So if you needed to add a server in order to do the restores, good luck trying to hit your normal RTOs.

[00:28:46] W. Curtis Preston: Yeah, we may be in the endemic phase, but trust me, the supply chain problem is not over. I am aware of competitors of Druva’s that have several-month lead times on their new systems. So the problem isn’t over. And, of course, we think of that as a competitive differentiator,

right? Because we don’t have that issue, because we’re a service.

[00:29:17] Prasanna Malaiyandi: One other thing I wanted to add about equipment: this is where you can go to the extreme, right? You could say, “Okay, for every single system I have, I’m going to double the capacity. That way I never have to worry about bringing in equipment in case a site fails.” The thing you have to worry about, though, is that people aren’t spending a whole lot of their budget on making sure there’s infrastructure ready for backup. So as someone working in backup, you need to figure out which mission-critical applications need to be immediately up and running, where maybe you need to keep some percentage of extra capacity to support them.

And which are the things where, eh, if something happens, it might take, say, a week to bring them back up? Maybe you don’t actually have equipment for those things, and that’s okay. But I think going and telling someone, “Oh yeah, your production budget? I need the exact same amount for backup,”

usually doesn’t fly in a lot of corporations.

[00:30:13] W. Curtis Preston: Yeah. I mean, and I’m talking about additional... I think this is where virtualization and cloud come in. The more virtualized you are, the more cloud-focused you are, the easier this particular issue becomes, right? You just need one or two extra servers ready to go,

not an entire... I’m just saying you can deal with a lot more...

[00:30:37] Prasanna Malaiyandi: That’s true. Then

[00:30:38] W. Curtis Preston: ...by, you know... how much you need is going to be up to you. But I’m just saying it’s easier if you’re virtualized, because if you’re virtualized and an application goes down because of some sort of data issue, you can easily restore that VM to another server without acquiring anything.

I guess that’s sort of where I was going.

[00:31:00] Prasanna Malaiyandi: Which works. But the one thing I would caution is that cloud is great, but if a disaster strikes an entire geographic region and everyone is trying to spin up in the cloud at the same time, cloud is still servers. So don’t think it’s something magical, right?

[00:31:18] W. Curtis Preston: No, it’s not magical, but you could prepare for a different-region cloud...

[00:31:26] Prasanna Malaiyandi: Yep.

[00:31:26] W. Curtis Preston: recovery,

[00:31:26] Prasanna Malaiyandi: Exactly. Just make sure you look at your options,

[00:31:29] W. Curtis Preston: Yeah, yeah. Why do we always have to have a big “but,” man?

[00:31:34] Prasanna Malaiyandi: But I’m just saying, I want to make sure people don’t think the cloud is something magical.

[00:31:41] W. Curtis Preston: Cloud is not magic.

[00:31:42] Prasanna Malaiyandi: There are lots of

[00:31:43] W. Curtis Preston: It is not magical.

[00:31:44] Prasanna Malaiyandi: Yeah, there are lots of great benefits to it, but you just need to make sure you understand the limitations as well.

[00:31:51] W. Curtis Preston: So, the title: RTO and RPO are what drive backup design. RTO drives the power, the speed, of the system, because the beefier the system, the quicker it’s able to restore, and the easier it’s going to be to meet a tighter recovery time objective, right?

The RPO is what’s going to drive your backup frequency. If you have a one-hour RPO and you’re backing up once a day, you are in trouble, right? So that’s what I meant, because all your backup design decisions should be based on how they affect RTO and RPO.

[00:32:42] Prasanna Malaiyandi: Yep.

[00:32:43] W. Curtis Preston: And if you’re not doing that, then you are doing yourself a disservice.
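A quick way to sanity-check “RPO drives backup frequency” is to compare the backup interval with the RPO. The sketch below makes a simplifying assumption, stated in the code: worst-case loss is one interval, plus one more interval per missed backup in a row, ignoring how long each job runs.

```python
from datetime import timedelta

def worst_case_loss(interval, missed_in_a_row=0):
    """Approximate worst-case data loss for a fixed backup interval.

    Simplifying assumption: loss is measured back to the start of the last
    successful backup, and backup run time is ignored, so each missed backup
    in a row adds one more interval.
    """
    return interval * (missed_in_a_row + 1)

def max_interval_for_rpo(rpo, missed_in_a_row=0):
    """Longest backup interval that still keeps worst-case loss within the RPO."""
    return rpo / (missed_in_a_row + 1)

daily = timedelta(days=1)
one_hour = timedelta(hours=1)
print(worst_case_loss(daily) <= one_hour)                 # False: once a day vs. a one-hour RPO
print(max_interval_for_rpo(one_hour, missed_in_a_row=1))  # 0:30:00
```

Tolerating even one missed job means backing up twice as often as the RPO alone would suggest.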

[00:32:47] Prasanna Malaiyandi: And it’s also not to say that you will never be able to meet a one-hour RPO with a backup system, right? Once again, it also has to take into account not only the speed, but also the amount of data you have. So take that into consideration as you’re looking at it, because maybe you have a database with a low change rate that’s fairly small.

Then yeah, a one-hour RPO you can hit, and probably like a 15-minute RTO, right? Perfectly well suited for that. But if it grows from a small database to, say, a 20-terabyte database, maybe you’re not able to hit those same RPOs and RTOs, right? I think that was one of the points you wanted to make earlier, Curtis.

Right? It depends not only on the size, but I think you also wanted to talk about the types of failures too.
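
Prasanna’s point that data size changes what the same design can hit shows up in a back-of-the-envelope restore estimate: at a fixed restore throughput, restore time grows linearly with the amount of data. The 500 MB/s figure and the function name below are hypothetical; real throughput depends on the backup system, network, and storage, and this sketch ignores everything except volume plus an optional fixed overhead.

```python
def estimated_restore_hours(data_tb: float, throughput_mb_per_s: float,
                            overhead_hours: float = 0.0) -> float:
    """Rough restore time: data volume divided by sustained restore throughput.

    Uses decimal units (1 TB = 1,000,000 MB). overhead_hours is a hypothetical
    allowance for mounting, provisioning, and verification.
    """
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = data_tb * 1_000_000 / throughput_mb_per_s
    return seconds / 3600 + overhead_hours


if __name__ == "__main__":
    rate = 500  # MB/s, hypothetical sustained restore rate
    small = estimated_restore_hours(0.1, rate)  # 100 GB database
    large = estimated_restore_hours(20, rate)   # 20 TB database
    print(f"100 GB: {small * 60:.1f} minutes")
    print(f"20 TB: {large:.1f} hours")
```

At that assumed rate the 100 GB database restores in a few minutes, well inside a 15-minute RTO, while the 20 TB database takes roughly eleven hours: the same system, two very different achievable RTOs.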

[00:33:36] W. Curtis Preston: So I, I don’t think there’s any RTO or RPO you can’t meet. And I’m not, so I’m not sure I agree with what you said just a few minutes ago. I think I understand what you were trying to say, but, well, first off, there is no RTO or RPO you can’t meet with money, right? Regardless of the size of the database,

[00:33:58] Prasanna Malaiyandi: I should say, picking a certain technology to use.

[00:34:02] W. Curtis Preston: Okay. Yeah. Yeah. Different technologies enable different RTOs and RPOs. Replication and, you know, uh, CDP, continuous data protection: these are technologies that can meet both a zero, uh, RTO and a zero RPO. They are expensive, you know, they are expensive. Uh, but the question is, how much money are you losing when you’re down?
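
Curtis’s question, how much money are you losing when you’re down, can be put into rough numbers. This is a minimal sketch under stated assumptions: downtime is priced per hour of recovery time, lost data is priced per hour of work that must be redone, and every dollar figure is invented for illustration.

```python
def outage_cost(recovery_hours: float, data_loss_hours: float,
                downtime_cost_per_hour: float, rework_cost_per_hour: float) -> float:
    """Rough cost of one outage.

    recovery_hours: the actual recovery time for this outage.
    data_loss_hours: hours of changes lost (the actual recovery point).
    Both per-hour rates are hypothetical inputs the business must supply.
    """
    return (recovery_hours * downtime_cost_per_hour
            + data_loss_hours * rework_cost_per_hour)


if __name__ == "__main__":
    downtime_rate, rework_rate = 100_000, 20_000  # invented figures, $/hour
    # Nightly backups: 8-hour restore, up to 12 hours of changes lost.
    nightly = outage_cost(8, 12, downtime_rate, rework_rate)
    # Replication/CDP-style design: 15-minute recovery, no data lost.
    replicated = outage_cost(0.25, 0, downtime_rate, rework_rate)
    print(nightly, replicated)
```

The gap between the two results, multiplied by how often such an outage is expected, is what the more expensive technology has to be weighed against.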

[00:34:28] Prasanna Malaiyandi: Yep.

[00:34:29] W. Curtis Preston: There’s that. So, okay. So if that’s how we decide on backup design, and RTO and RPO are really important, and I’ve never done that, how do I do that?

[00:34:44] Prasanna Malaiyandi: How

[00:34:45] W. Curtis Preston: you know, so yeah,

[00:34:46] Prasanna Malaiyandi: Well, I think the starting point is to go talk to your business stakeholders, right? Understand what they need, what their requirements are. And not just, oh, what do you want for RTO and RPO, but ask them what the impact would be if this application was down for a day, because that’ll change the answer they give you back.

[00:35:09] W. Curtis Preston: Yeah, what is the financial impact of this app being down for a day, or an hour, et cetera? And if they don’t have that data, then honestly they don’t deserve to be in their

[00:35:21] Prasanna Malaiyandi: Tier three, tier three.

[00:35:23] W. Curtis Preston: Yeah. Tier three. Yeah. So you get an RTO and an RPO of a week. If they don’t have that data, then I don’t know what to say.

Right. Um, but if they have that data and they know that it’s a million dollars an hour, well, that helps you go and justify the amount of money that you need to spend. So you ask for that RTO and RPO, and then you say, well, our current system, as designed and as budgeted, has an RTA and an RPA of, uh, you know, you could say it in plain English: it has the ability to meet an RTO of an hour and an RPO of 12 hours, whatever the number is for you.

And, um, so then they’re like, “What?” You know, and that’s when the conversation begins.
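
The conversation Curtis describes boils down to comparing what the business asks for (RTO and RPO) with what the current system can actually deliver (RTA and RPA). A minimal sketch, with hypothetical class and function names, using the one-hour RTO and 12-hour RPA from his example:

```python
from dataclasses import dataclass


@dataclass
class RecoveryTargets:
    """Hours of downtime (rto_hours) and hours of data loss (rpo_hours)."""
    rto_hours: float
    rpo_hours: float


def gaps(required: RecoveryTargets, actual: RecoveryTargets) -> list[str]:
    """List every place the actual capability (RTA/RPA) misses the objective."""
    problems = []
    if actual.rto_hours > required.rto_hours:
        problems.append(f"RTA {actual.rto_hours}h exceeds RTO {required.rto_hours}h")
    if actual.rpo_hours > required.rpo_hours:
        problems.append(f"RPA {actual.rpo_hours}h exceeds RPO {required.rpo_hours}h")
    return problems


if __name__ == "__main__":
    wanted = RecoveryTargets(rto_hours=1, rpo_hours=1)    # what the business asks for
    current = RecoveryTargets(rto_hours=1, rpo_hours=12)  # what the system can do today
    for line in gaps(wanted, current):
        print(line)
```

An empty list means the actuals already match the objectives; anything else is the starting point for the negotiation.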

[00:36:16] Prasanna Malaiyandi: Then it’s a negotiation, like at a car dealership.

[00:36:19] W. Curtis Preston: It’s absolute-, why’d you have to bring up car dealerships?

[00:36:22] Prasanna Malaiyandi: Sorry. It’s not, maybe it’s not as painful as a car dealership.

[00:36:26] W. Curtis Preston: I had such a non-fun experience getting my wife her new car, and it was... Why you gotta bring that up? Um, but yeah, it’s a business discussion, back and forth, right?

They say, well, we want, you know, an RTO of one hour. And you’re like, well, I’m sorry, that’s not possible. It’s totally possible; it’s just that it costs, you know, a ton of money. So you have to come to a point where you’re like, I could... And, you know, it almost always starts like this.

They want an RTO of this, and you’re able to do an RTO of that, and you need to, right, you need to meet in the middle. Almost always, you’re going to need to make some technological changes in order to do that. Those technological changes have costs. You can go back to the business and say, we can get to here.

It’s going to cost 1 million, right? And then they go, “What?” And then you go, well, we can get to here for a million. We can get to here for 25,000, right? Somewhere in there, there’s a point of decreasing marginal returns, right? Um, and you need to find that spot and get them to agree to that spot.
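
Finding that point of decreasing marginal returns can be sketched as starting from the slowest design and upgrading to each faster one only while the extra cost per hour of RTO saved stays within what the business says an hour is worth. The design names, costs, and the `value_per_hour_saved` parameter are all hypothetical; `value_per_hour_saved` stands for whatever figure the business agrees one hour of faster recovery is worth over the life of the system.

```python
def pick_design(options: list[tuple[str, float, float]],
                value_per_hour_saved: float) -> tuple[str, float, float]:
    """Pick a design by marginal cost per hour of RTO reduced.

    options: (name, rto_hours, cost) tuples, in any order.
    Starts from the slowest option and upgrades to each faster one only if
    the extra cost per hour of RTO saved is within value_per_hour_saved.
    """
    ordered = sorted(options, key=lambda o: o[1], reverse=True)  # slowest first
    chosen = ordered[0]
    for candidate in ordered[1:]:
        hours_saved = chosen[1] - candidate[1]
        if hours_saved <= 0:
            continue  # not actually faster than the current choice
        marginal = (candidate[2] - chosen[2]) / hours_saved
        if marginal <= value_per_hour_saved:
            chosen = candidate
    return chosen


if __name__ == "__main__":
    designs = [
        ("current system", 24, 0),
        ("faster backup target", 4, 25_000),
        ("replication", 0.25, 1_000_000),
    ]
    print(pick_design(designs, value_per_hour_saved=50_000)[0])
    print(pick_design(designs, value_per_hour_saved=300_000)[0])
```

With these made-up numbers, the $25,000 option saves 20 hours at $1,250 per hour, while the $1,000,000 option saves only 3.75 more hours at about $260,000 per hour, which is where the returns fall off unless the business values an hour of downtime that highly.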

[00:37:48] Prasanna Malaiyandi: The one question you brought up earlier, Curtis, which might be worth discussing: you mentioned that it depends on the type of failure, right? When you’re talking about RTO and RPO, does that come into this discussion as you’re talking to the

[00:38:03] W. Curtis Preston: I think it should. I think it should. This is a, um, a philosophical discussion. There are those who feel that all RTOs and RPOs should be the same, whether you deleted a file or, you know, you had a natural disaster take out your entire state. Um, I don’t personally feel that way.

I feel that for the most common types of things that happen, you should be able to have a pretty short RTO and RPO, right? You know, I lost a file. Boom, boom. And that should be like a minute, right? It should not take long. And I think that if there’s a major disaster, I think you will get some, some, um,

[00:38:55] Prasanna Malaiyandi: Leeway.

[00:38:55] W. Curtis Preston: understanding.

Yeah. Some leeway. Thank you. That’s a perfect word. But, but again, this is just my opinion. All that really matters is what your company will, you know, pay for. If you’re gonna have, you know, the best RTO and RPO for every kind of outage, then it’s just gonna cost you a whole lot of money.

[00:39:25] Prasanna Malaiyandi: Yeah.
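
Curtis’s view that objectives can differ by failure type can be written down as a small, explicit table that the business signs off on ahead of time. The scenario names and most of the numbers below are hypothetical placeholders; only the one-minute single-file restore and the roughly three-day disaster recovery echo figures mentioned in this episode.

```python
from datetime import timedelta

# Hypothetical agreed objectives per failure type: tight for common, small
# failures; more leeway for rare, large disasters.
AGREED_OBJECTIVES = {
    "file_deleted": {"rto": timedelta(minutes=1), "rpo": timedelta(hours=1)},
    "server_lost": {"rto": timedelta(hours=4), "rpo": timedelta(hours=1)},
    "regional_disaster": {"rto": timedelta(days=3), "rpo": timedelta(hours=24)},
}


def within_objectives(failure: str, recovery_time: timedelta,
                      data_lost: timedelta) -> bool:
    """True if an actual recovery met the objectives agreed for that failure type."""
    if failure not in AGREED_OBJECTIVES:
        raise ValueError(f"no agreed objectives for {failure!r}")
    target = AGREED_OBJECTIVES[failure]
    return recovery_time <= target["rto"] and data_lost <= target["rpo"]


if __name__ == "__main__":
    print(within_objectives("file_deleted", timedelta(seconds=30), timedelta(minutes=10)))
    print(within_objectives("regional_disaster", timedelta(days=2), timedelta(hours=12)))
    print(within_objectives("file_deleted", timedelta(hours=2), timedelta(0)))
```

Raising an error for an unlisted failure type is a deliberate choice in this sketch: a scenario nobody agreed on is exactly the case where the backup team gets blamed later.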

[00:39:26] W. Curtis Preston: As long as they’re willing to pay that money, then, you know, we’re all happy.

[00:39:30] Prasanna Malaiyandi: And also on the flip side, right? If your RTOs and RPOs are short for the most common failures and long for these unexpected outages, set that expectation so people aren’t yelling at you later, right? Set the expectation with the business and say, look, I will save you money, right?

And here’s what it is in the most general cases. And yes, if something catastrophic happens, then here is my new RPO, or my RTO is gonna be, say, three days. And as long as everyone’s okay with that, and it’s understood and documented, right, it’s something you can go forward with, cuz you’re saving a bunch of money, because it’s all about risk.

Right? How often is that catastrophic event going to happen? And is, say, three to five days to recover, is that
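
Prasanna’s "it’s all about risk" point maps onto the common annualized loss expectancy calculation: ALE = single loss expectancy (SLE) × annual rate of occurrence (ARO). The hourly cost, the once-in-20-years frequency, and the 4-hour alternative below are invented for illustration; the 72 hours corresponds to the three-day recovery Prasanna mentions.

```python
def annualized_loss(single_loss: float, events_per_year: float) -> float:
    """Annualized loss expectancy: ALE = SLE x ARO."""
    return single_loss * events_per_year


if __name__ == "__main__":
    downtime_cost_per_hour = 10_000  # hypothetical
    catastrophe_rate = 1 / 20        # hypothetical: once every 20 years

    slow = annualized_loss(72 * downtime_cost_per_hour, catastrophe_rate)  # 3-day recovery
    fast = annualized_loss(4 * downtime_cost_per_hour, catastrophe_rate)   # 4-hour recovery
    print(f"expected yearly loss: ${slow:,.0f} vs ${fast:,.0f}")
    # A faster disaster-recovery design only pays off if it costs less
    # per year than this difference.
    print(f"break-even spend per year: ${slow - fast:,.0f}")
```

With these made-up numbers, the three-day recovery costs about $36,000 a year in expected losses versus about $2,000 for the four-hour design, so any disaster-recovery upgrade costing much more than roughly $34,000 a year would not pay for itself, which is the kind of trade-off worth documenting with the business.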

[00:40:16] W. Curtis Preston: Yeah, that’s it. That’s all we’re saying, right? Get the RTO and RPO decided upon and agreed upon beforehand, and get the RTA and RPA. Hopefully the two match; get them to match. But if they don’t match, by the way, that’s another scenario: we all know we should have a better RTO and RPO, but this is what our budget currently allows due to market conditions, the condition of the company, whatever. As long as we all know that now, then when something bad happens and you go to do this large restore and it takes a really long time, they will know that that’s the case, right?

[00:41:04] Prasanna Malaiyandi: You don’t wanna be left holding the bag.

[00:41:08] W. Curtis Preston: You do not want to be, you do not want to be the one blamed for the long restore or the restore that lost an acceptable or an unacceptable amount of, uh, of data.

[00:41:19] Prasanna Malaiyandi: Yep.

[00:41:21] W. Curtis Preston: All right. Well, um,

[00:41:25] Prasanna Malaiyandi: You think we talked about,

[00:41:26] W. Curtis Preston: that time.

[00:41:27] Prasanna Malaiyandi: I, I gotta try to mix it up every once in a while, you

[00:41:30] W. Curtis Preston: You had to, you had to argue with me, man. You hurt my feelings. All right. Well, it was good. Good stuff. Um, I hope you folks out there learned a thing or two, and maybe, maybe you didn’t agree. You know what? Come on the podcast. Uh, we don’t even agree with each other sometimes. So, you know, we’d be happy to have you on. Give us a comment.

And apparently, if you make, it has to be at least nine comments more than we have today on Apple Podcasts, apparently I have to grow a beard. Nine or more.

[00:42:08] Prasanna Malaiyandi: Yep.

[00:42:09] W. Curtis Preston: Apparently I have to grow a beard for Christmas. Um, that’ll be something. All right. Well, and remember, of course, to subscribe so that you can restore it all.
