May 8, 2023

Flash expert schools Mr. Backup

A few weeks ago, Mr. Backup (W. Curtis Preston) said he didn't understand why people used flash for backups; he called it overkill. A few days later, Howard Marks of Vast took issue with that statement and asked for the chance to defend Vast's title, so to speak. Howard is a friend of the pod, and we were happy to say yes. We also take the opportunity to get an update on Vast and to discuss its data reduction techniques in more detail. Bonus points if you get the cover art reference.

Transcript
W. Curtis Preston:

Hi, and welcome to Backup Central's Restore It All podcast.

W. Curtis Preston:

I'm your host, W.

W. Curtis Preston:

Curtis Preston, AKA Mr.

W. Curtis Preston:

Backup.

W. Curtis Preston:

And I have with me my continuing advisor on my consumer backup project

W. Curtis Preston:

Prasanna Malaiyandi, how's it going?

W. Curtis Preston:

Prasanna.

Prasanna Malaiyandi:

I'm good.

Prasanna Malaiyandi:

Curtis.

Prasanna Malaiyandi:

You know what?

Prasanna Malaiyandi:

We have the expert, the backup anorak Daniel Rosehill coming

Prasanna Malaiyandi:

next week on the podcast, so

W. Curtis Preston:

yeah.

Prasanna Malaiyandi:

can definitely pick his brains.

W. Curtis Preston:

Yeah.

W. Curtis Preston:

You know, it's been interesting cuz, and it was funny, it was on the

W. Curtis Preston:

podcast where I really, I really, I had a moment where I was like,

W. Curtis Preston:

I'm not really backing up my photos.

W. Curtis Preston:

Right?

W. Curtis Preston:

Like, I'm, I'm backing up, you know, I, I use iCloud, but as, as we have

W. Curtis Preston:

discussed, iCloud is not a backup.

W. Curtis Preston:

iCloud is a sync.

W. Curtis Preston:

Right.

W. Curtis Preston:

And if something catastrophic, if you, if I ever got hacked, um, and somebody

W. Curtis Preston:

got a hold of my, my iCloud password or my iPhone and then just decided to

W. Curtis Preston:

massively delete everything, I, if I caught it soon enough, I would be okay.

W. Curtis Preston:

Cuz I do have like a deleted items thing.

W. Curtis Preston:

Right.

W. Curtis Preston:

And as we discussed, I have, well, I don't think we discussed it on the pod,

W. Curtis Preston:

but as part of this project I found out I have 11,000 photos in, in iCloud.

W. Curtis Preston:

So,

Prasanna Malaiyandi:

isn't as much as a lot of other people, you know.

Prasanna Malaiyandi:

I'm sure there are

W. Curtis Preston:

know, I am not,

Prasanna Malaiyandi:

you.

W. Curtis Preston:

yeah.

W. Curtis Preston:

As, as I, as you and I were talking earlier, I'm not, you

W. Curtis Preston:

know, on one end, I'm not Cecil B. DeMille and I'm not, you know,

W. Curtis Preston:

photographing and filming everything.

W. Curtis Preston:

On the other hand, I'm not Prasanna because you use your phone camera,

W. Curtis Preston:

like you use your, uh, Tesla,

Prasanna Malaiyandi:

Yeah, pretty much.

Prasanna Malaiyandi:

Yeah.

Prasanna Malaiyandi:

Almost never

W. Curtis Preston:

Okay.

W. Curtis Preston:

So you've had your Tesla, how long now

Prasanna Malaiyandi:

four years,

W. Curtis Preston:

and how many miles do you have on it?

Prasanna Malaiyandi:

I think like 11,600, 700, something like that.

Prasanna Malaiyandi:

And it works great because my maintenance cost has been zero.

Prasanna Malaiyandi:

My electricity cost is really minimal versus gas,

W. Curtis Preston:

Cars are incredibly reliable when you don't use them.

W. Curtis Preston:

Um, anyway, yeah,

Prasanna Malaiyandi:

powered cars, even if you don't use them,

Prasanna Malaiyandi:

you still gotta change the oil.

Prasanna Malaiyandi:

You still gotta do everything else, you know?

Prasanna Malaiyandi:

So,

W. Curtis Preston:

I'm not, we're not doing a, we're not

W. Curtis Preston:

doing an e-car thing.

W. Curtis Preston:

But anyway, but yeah, we've, we've been having some fun with, uh, with this

W. Curtis Preston:

project of figuring out the various ways.

W. Curtis Preston:

Right.

W. Curtis Preston:

Uh,

Prasanna Malaiyandi:

and, and I think, yeah.

Prasanna Malaiyandi:

I was just gonna say, I think you should mention to the listeners what you're

Prasanna Malaiyandi:

current, what you are currently trying to do for backing up your iCloud photo

W. Curtis Preston:

my current, uh, uh, uh, I don't know what,

W. Curtis Preston:

I don't know what, what, what, I don't know these different methods.

W. Curtis Preston:

My current method that I am trying is Google Photos, and it turns

W. Curtis Preston:

out Google photos, it, it's the only one that I've found so far.

W. Curtis Preston:

Uh, well, the only one that I've.

W. Curtis Preston:

Well, there's maybe one other, which is iDrive, but Google Photos, cuz

W. Curtis Preston:

the problem is that on an iPhone you can turn on optimized storage.

W. Curtis Preston:

And so I have like, I don't know, somewhere between sixty

W. Curtis Preston:

and a hundred gigabytes.

W. Curtis Preston:

We're not quite sure of photos up in, uh, iCloud and, but I only have four

W. Curtis Preston:

and a half gigabytes on my phone because it's, I'm using the optimized storage.

W. Curtis Preston:

But apparently Google Cloud photo, Google Photos pulls down a high res

W. Curtis Preston:

whatever high, the original version from.

W. Curtis Preston:

iCloud and then backs

Prasanna Malaiyandi:

that, that's our theory.

Prasanna Malaiyandi:

That's our theory.

W. Curtis Preston:

That's the theory.

W. Curtis Preston:

Well, it's, I, that's what it says in documentation.

W. Curtis Preston:

We shall see what we shall see.

W. Curtis Preston:

Um, and then we will report on the results here and then,

W. Curtis Preston:

and, and I'll blog about it.

W. Curtis Preston:

On Backup Central, uh, because iCloud is not a backup.

W. Curtis Preston:

The number of articles that I read that told me to use iCloud to

W. Curtis Preston:

back up my iPhone pissed me off.

W. Curtis Preston:

Right?

W. Curtis Preston:

Like, it, it was like 95% of the articles that I found on how to back up, uh,

W. Curtis Preston:

my photos basically said, oh, iCloud.

W. Curtis Preston:

I'm like, ah.

Prasanna Malaiyandi:

Because.

Prasanna Malaiyandi:

Because for most consumers, right, they're probably not going to do what

Prasanna Malaiyandi:

you're about to do, and they don't care.

Prasanna Malaiyandi:

And so turn on iCloud.

Prasanna Malaiyandi:

At least you have something else other than whatever's on your phone,

W. Curtis Preston:

Yeah.

W. Curtis Preston:

So we're gonna, we're gonna have an answer for the three people in the world,

W. Curtis Preston:

all of which are probably already on this recording, the three people in the

W. Curtis Preston:

world that actually care about having an actual backup of their, of their photos.

W. Curtis Preston:

Anyway, all right.

W. Curtis Preston:

Well, we're gonna bring back a, a longtime friend and a

W. Curtis Preston:

returned guest to our podcast.

W. Curtis Preston:

Uh, he is one of the few people in this industry that, um, make me feel young.

W. Curtis Preston:

Uh, we welcome.

W. Curtis Preston:

We welcome.

W. Curtis Preston:

And he's also the, uh, the Technologist Extraordinary and

W. Curtis Preston:

Plenipotentiary at Vast Data.

W. Curtis Preston:

Welcome to the podcast, Howard Marks.

W. Curtis Preston:

How's it going, Howard?

Howard Marks:

I'm really happy to be here cuz you know you guys went on your

Howard Marks:

little podcast and you said something about using flash for backup being stupid

W. Curtis Preston:

Yeah.

W. Curtis Preston:

Yeah.

W. Curtis Preston:

I might have

Prasanna Malaiyandi:

But, but, but, but wait, I wanna clarify,

Prasanna Malaiyandi:

Howard, that was from Curtis.

Prasanna Malaiyandi:

I was the one who was like, yes.

W. Curtis Preston:

Oh, he's

Prasanna Malaiyandi:

What about Howard?

Prasanna Malaiyandi:

Oh,

W. Curtis Preston:

throwing me right under the

Prasanna Malaiyandi:

throw you under the,

W. Curtis Preston:

So, we'll, we'll, we'll

Howard Marks:

you don't have to explain that to me.

Howard Marks:

I've known Curtis 35 years.

W. Curtis Preston:

that's that.

W. Curtis Preston:

We will, we will, uh, we will, we'll get to that topic.

W. Curtis Preston:

I will give you a chance to defend your, your, your honor.

W. Curtis Preston:

Um, why don't we start with an update.

W. Curtis Preston:

It's been a while since we've had you on the pod.

W. Curtis Preston:

Why don't we start with an update on, uh, how much more vast Vast

W. Curtis Preston:

Data is, uh, since we had you on.

Howard Marks:

Well, you know, from the financial side, um, we announced at

Howard Marks:

the beginning of this year that we've hit a hundred million a year in annual

Howard Marks:

recurring revenue cuz we've organized ourselves as a software company even

Howard Marks:

though the experience customers have is, look, appliances in my data center,

Howard Marks:

call Vast when something goes wrong.

Howard Marks:

Um, we arrange for customers to buy the hardware so that we are a software

Howard Marks:

company, makes life easier for us.

Howard Marks:

Um, the other big thing is that our friends at HPE

Howard Marks:

just made an announcement of a product of theirs called GreenLake Files

Howard Marks:

that will be powered by our software.

Howard Marks:

So before today, if you wanted a scale-out, expandable, low-cost all-flash file

Howard Marks:

and object system, we would facilitate your buying hardware from the OEMs that we

Howard Marks:

deal with, and we'd sell you the software and you'd have a system that was running.

Howard Marks:

Now you can buy that from HPE as part of GreenLake, and that includes management

Howard Marks:

through the GreenLake Cloud front end.

Howard Marks:

So you can manage the GreenLake for files along with GreenLake for Block

Howard Marks:

and the Compute and all the other services that are part of GreenLake.

Howard Marks:

So they've taken our software, married it to their control plane

Howard Marks:

and run it on their hardware.

Prasanna Malaiyandi:

And is for, sorry, for those who may

Prasanna Malaiyandi:

not be familiar with GreenLake.

Prasanna Malaiyandi:

GreenLake is more of a, I don't know, a managed or a hosted environment done by

Howard Marks:

It, it, it, it's as-a-service-y.

Howard Marks:

So there, there are both consumption and CapEx models the way I understand it.

Howard Marks:

But you know, you don't log into the block array and create a LUN.

Howard Marks:

You go to the cloud website and you create a LUN and their

Howard Marks:

control plane does that for you.

Howard Marks:

And so it's got more controls and you don't have to keep the detail.

Prasanna Malaiyandi:

Yeah.

W. Curtis Preston:

So then you're, you're probably paying

W. Curtis Preston:

for what you provision then,

Howard Marks:

Uh, you can do that or you either way.

Howard Marks:

And, you know, that's, that's, you know, kind of an HPE finance question.

Howard Marks:

But my understanding is they do it either way.

Prasanna Malaiyandi:

And I'm sure for Vast Data, right.

Prasanna Malaiyandi:

That's a huge win, being part of that offering.

Howard Marks:

Well, it's, you know, first of all, it just gets hundreds and hundreds

Howard Marks:

of boots on the ground out selling our software and our whole concept of you

Howard Marks:

can do all-flash for as cheap or cheaper than other guys can do spinning disk.

Howard Marks:

And why do you want spinning discs as opposed to flash?

Howard Marks:

Other than that, they're cheaper.

Howard Marks:

You know, I can't think of another advantage.

Howard Marks:

Um, and so we, you know, for most workloads have narrowed

Howard Marks:

that down or reversed it, and so, all-flash cheaper than disk.

Howard Marks:

What a great idea.

Howard Marks:

Um, so we've got, you know, a, all those HPE sales guys going out there selling it

Howard Marks:

as an HPE product, you know, it's not like Qumulo or Scality, where HPE was reselling

Howard Marks:

those products to run on HPE servers.

Prasanna Malaiyandi:

Mm-hmm.

Howard Marks:

Um, and you know, who you called for support was.

Howard Marks:

Well, is this a server problem or is this a software problem?

Howard Marks:

It's HPE GreenLake for files.

Howard Marks:

HPE takes the support calls.

Howard Marks:

It's a full HPE product.

Howard Marks:

Um, it's our software underneath it.

W. Curtis Preston:

Yeah, it's interesting.

W. Curtis Preston:

So you know, you talked about the, you, you know, you said you, you.

W. Curtis Preston:

You, you have ARR.

W. Curtis Preston:

So basically your customers are paying an annual fee to you based

W. Curtis Preston:

on the size of their storage

Howard Marks:

we, so, so we make, we make our money on a, what

Howard Marks:

we call a Gemini subscription.

Howard Marks:

That is, you know, in capacity units, we sell it at a hundred terabytes.

Howard Marks:

HPE can sell it in different ways, um, and it's per year.

Howard Marks:

And we guarantee that we'll write that agreement for any piece

Howard Marks:

of hardware for 10 years at the

W. Curtis Preston:

right, right.

W. Curtis Preston:

right.

W. Curtis Preston:

I remember

Howard Marks:

Because, you know, it's not spinning disks.

Howard Marks:

They don't start failing a lot more often in year five and six.

Howard Marks:

And so if you want to keep it for 10 years, keep it for 10 years.

Howard Marks:

If you decide you want to replace some of your hardware in year five

Howard Marks:

or six, because the new denser or faster hardware is more attractive to

Howard Marks:

you, uh, but you bought seven years of support, we'll transfer it on the,

Howard Marks:

you know, terabyte per terabyte basis.

W. Curtis Preston:

Right.

W. Curtis Preston:

Gotcha.

W. Curtis Preston:

Um, yeah, that's a pretty good deal for you.

W. Curtis Preston:

And by the way, I'll, I'll, um, I, I was gonna, uh, compare it to something

W. Curtis Preston:

else, but, but it, it reminded me of our, uh, disclaimer. Prasanna

W. Curtis Preston:

and I work for different companies.

W. Curtis Preston:

I work for myself, he works for Zoom.

W. Curtis Preston:

And, uh, these are our opinions, not theirs.

W. Curtis Preston:

And, uh, be sure to rate us at, uh, your favorite podcatcher. Give

W. Curtis Preston:

us all the stars and comments.

W. Curtis Preston:

It helps other people find us.

W. Curtis Preston:

If you think we're amazing, then maybe other people will do so as well.

W. Curtis Preston:

Uh, reach out to me, uh, @wcpreston on Twitter, or wcurtispreston

W. Curtis Preston:

at Gmail, and, you know, be part of the conversation.

W. Curtis Preston:

And we'll see.

W. Curtis Preston:

Um, you know, we'll get you on.

W. Curtis Preston:

So your arrangement with HPE reminds me of our arrangement with Dell.

W. Curtis Preston:

Basically it's the whole boots on the ground thing.

W. Curtis Preston:

Uh, you get to put your product in front of a whole, you know, giant number of

W. Curtis Preston:

other people and it's great for you.

W. Curtis Preston:

It's good for them.

W. Curtis Preston:

Their customers get the benefit of your, uh, technology with, with the company

W. Curtis Preston:

that they already, you know, know and

Howard Marks:

And you know, and, and we all know that there are loyal HPE

Howard Marks:

customers who you know now it's a lot more likely they'll buy this product cuz

Howard Marks:

it's got that stamp of approval on it,

Howard Marks:

all of which works for us.

W. Curtis Preston:

Yeah.

W. Curtis Preston:

Yeah.

W. Curtis Preston:

Well, congratulations on hitting a hundred million.

W. Curtis Preston:

Um, wish you the best of luck on your way to the, you know,

W. Curtis Preston:

doubling that and tripling that.

W. Curtis Preston:

Um, last time you were on, we talked about, we, we alluded to, I think,

W. Curtis Preston:

a little bit about how you do dedupe or dedupe-like stuff that's a little

W. Curtis Preston:

different than the rest of the world.

W. Curtis Preston:

And, and, and that it's better, you know, these are the, you know, the,

W. Curtis Preston:

the, you're saying it's better.

W. Curtis Preston:

So I, I want to give you a chance to talk about that, and then

Howard Marks:

we, we guarantee it's, we guarantee it's better because I'm

Howard Marks:

a vendor and without a guarantee you shouldn't believe anything I say.

W. Curtis Preston:

Okay.

W. Curtis Preston:

All right.

W. Curtis Preston:

That sounds good.

W. Curtis Preston:

So how so how, so first off, how is it

W. Curtis Preston:

better, and then why?

Howard Marks:

It's better cause it reduces data further.

Howard Marks:

Um, And the why is how it works.

Howard Marks:

So, you know, at the beginning it's really pretty simple.

Howard Marks:

We do variable chunk deduplication with a variation of the Rocksoft method.

Howard Marks:

So if there are insertions, we re-sync relatively quickly and the

Howard Marks:

deduplication gets more effective.
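
A minimal sketch of the variable-chunk idea described here, in Python. It is not Vast's code and not the Rocksoft algorithm itself; the window size, boundary mask, and chunk limits are assumptions chosen to keep the example small, and a production system would use a true rolling hash instead of rehashing every window. The point is only that chunk boundaries are chosen from the content, so an insertion disturbs a few chunks and then the boundaries line up again.

```python
import hashlib
import random

WINDOW = 16            # bytes of context that decide a boundary (assumed)
MASK = (1 << 6) - 1    # cut when the low 6 fingerprint bits are zero (assumed)
MIN_CHUNK = 16         # never cut before this many bytes (assumed)
MAX_CHUNK = 256        # always cut at this many bytes (assumed)


def chunk(data: bytes) -> list[bytes]:
    """Split data at content-defined boundaries."""
    chunks, start = [], 0
    for i in range(len(data)):
        length = i - start + 1
        if length < MIN_CHUNK:
            continue
        # The cut decision depends only on the last WINDOW bytes,
        # not on the absolute offset in the stream.
        window = data[i + 1 - WINDOW:i + 1]
        fingerprint = int.from_bytes(
            hashlib.blake2b(window, digest_size=4).digest(), "big"
        )
        if (fingerprint & MASK) == 0 or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(20_000))
edited = original[:100] + b"NEW" + original[100:]  # 3-byte insertion near the front

before, after = chunk(original), chunk(edited)
shared = set(before) & set(after)
print(f"{len(shared)} of {len(before)} chunks survive the insertion unchanged")
```

With fixed-size blocks the same insertion would shift every later block, so almost nothing after the edit would deduplicate.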

W. Curtis Preston:

Mm-hmm.

Howard Marks:

Um, we do Zstandard compression on the data.

Howard Marks:

We then throw some data-specific compression algorithms at the data for

Howard Marks:

things like, oh look, it's numeric data.

Howard Marks:

Well that means it's only gonna vary within this range.

Howard Marks:

We'll store deltas.

Howard Marks:

And so whichever of those compression methods reduces this block of data most.

W. Curtis Preston:

Mm-hmm.

Howard Marks:

we run.

Howard Marks:

Um, because we're doing so the, the data path is writes go to storage class memory.

W. Curtis Preston:

Mm-hmm.

Howard Marks:

then get acked, and then all of this data reduction happens

Howard Marks:

as we migrate from that write buffer to the capacity flash layer.

Howard Marks:

And since it's after the ack, as long as we're draining that buffer fast

Howard Marks:

enough, how long it takes to move any piece is irrelevant.

Howard Marks:

And so we have time to go, ah, let's try five different compression algorithms.

Howard Marks:

Use whichever one works best.
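
The "try several, keep the smallest" step can be sketched roughly as follows. This is an illustration only: Python's standard library has no Zstandard binding in most versions, so `zlib`, `bz2`, and `lzma` stand in for the real candidates, and `delta_encode` is a hypothetical example of a numeric-data trick (store differences between 32-bit values), not Vast's actual encoder.

```python
import bz2
import lzma
import struct
import zlib


def delta_encode(block: bytes) -> bytes:
    """Treat the block as little-endian uint32 values and store differences.

    Returns the block unchanged if it is empty or not a multiple of 4 bytes.
    """
    if not block or len(block) % 4:
        return block
    values = struct.unpack(f"<{len(block) // 4}I", block)
    deltas = [values[0]] + [(b - a) & 0xFFFFFFFF for a, b in zip(values, values[1:])]
    return struct.pack(f"<{len(deltas)}I", *deltas)


# Candidate encoders; "raw" guarantees the result is never larger than the input.
CANDIDATES = {
    "raw": lambda b: b,
    "zlib": lambda b: zlib.compress(b, 9),
    "bz2": lambda b: bz2.compress(b),
    "lzma": lambda b: lzma.compress(b),
    "delta+zlib": lambda b: zlib.compress(delta_encode(b), 9),
}


def reduce_block(block: bytes) -> tuple[str, bytes]:
    """Run every candidate and return (name, payload) for the smallest output."""
    results = {name: encode(block) for name, encode in CANDIDATES.items()}
    best = min(results, key=lambda name: len(results[name]))
    return best, results[best]


# Numeric data that only moves in small steps: a counter sampled 5,000 times.
counters = struct.pack("<5000I", *range(1000, 1000 + 5000 * 7, 7))
name, payload = reduce_block(counters)
print(name, len(counters), "->", len(payload))
```

Because this work happens after the write has been acknowledged, running every candidate costs compute but adds nothing to the latency the client sees.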

W. Curtis Preston:

Interesting.

W. Curtis Preston:

Yeah.

Prasanna Malaiyandi:

do you do,

W. Curtis Preston:

go ahead.

W. Curtis Preston:

Go ahead.

Prasanna Malaiyandi:

do you?

Prasanna Malaiyandi:

And that's actually very interesting how you can, like you said, by

Prasanna Malaiyandi:

storing it in memory, right.

Prasanna Malaiyandi:

You're not impacting client latencies at all.

Prasanna Malaiyandi:

Right.

Prasanna Malaiyandi:

For them it's like, hey, write

Prasanna Malaiyandi:

went through.

Prasanna Malaiyandi:

And then you have this time to do the parallel, uh, computation.

Howard Marks:

Yeah, just, just, except

Howard Marks:

it's storage class memory, so it's an SSD, so it's persistent and

Howard Marks:

there's no batteries and protection and you know, a panic when power goes

Prasanna Malaiyandi:

Yeah.

Prasanna Malaiyandi:

Now, when you are running these algorithms, like I know AI and ML

Prasanna Malaiyandi:

is all the hot topic everywhere you look these days, right?

Prasanna Malaiyandi:

Are you guys doing anything around that in terms of trying to smartly detect

Prasanna Malaiyandi:

which compression algorithms based on

Howard Marks:

We we're, we're not doing that in the data path right now.

Howard Marks:

You know, frankly, the running the five doesn't use that much

Howard Marks:

compute that it's worth it.

Howard Marks:

Um, we're using AI in our cloud platform, so if you have multiple

Howard Marks:

clusters, there's a cloud site you can go to and see one dashboard.

Howard Marks:

Um, and we're using it for the capacity projections.

Howard Marks:

So it's like, oh look, here's how much capacity you're gonna need

Howard Marks:

six months from now while you're filling out your budget request.

Howard Marks:

Let me tell you what you're gonna, there's AI behind that so that it

Howard Marks:

smooths things like, oh look, every three months they do a cleanup.

Howard Marks:

And so let me factor that the AI is good enough to factor that kind of thing

Howard Marks:

in, but not in the data path.

Howard Marks:

But let's get back to the data path.

Prasanna Malaiyandi:

Yep.

W. Curtis Preston:

yeah, your, um, your comment when you, you know, there

W. Curtis Preston:

was a comment you were like, as long as we're clearing the buffer quick enough.

W. Curtis Preston:

Um, and, and I would agree with you, um, you know, how, how do you ensure that

W. Curtis Preston:

that happens, I guess is, is one question.

Howard Marks:

Well, first of all, it becomes a parallelism issue.

Howard Marks:

So we have a large number of compute nodes, all of which are

Howard Marks:

draining this buffer in parallel.

Howard Marks:

And so when the buffer hits a high water mark, more threads

Howard Marks:

to destage it get spawned and allocated across the parallel system.

Howard Marks:

Now, if there's a huge influx of writes, and you know, we're talking.

Howard Marks:

Tens of gigabytes per second for hours on the smallest system.

W. Curtis Preston:

Mm-hmm.

Howard Marks:

Um, then we'll start introducing latency into the

Howard Marks:

writes and apply back pressure.
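
A toy model of that mechanism, assuming made-up numbers: the buffer sizes, the per-worker drain rate, and the `WriteBuffer` class are all hypothetical and only illustrate the two thresholds described here. Past a high-water mark the system adds destage workers; only when the buffer is actually full do incoming writes have to wait.

```python
from dataclasses import dataclass


@dataclass
class WriteBuffer:
    high_water: int        # above this fill level, add a destage worker
    hard_limit: int        # buffer size; beyond this, writes must wait
    per_worker_drain: int  # units one worker destages per tick
    max_workers: int
    used: int = 0
    workers: int = 1

    def tick(self, incoming: int) -> int:
        """Advance one time step; return the write units that had to wait."""
        # Scale destage workers up past the high-water mark, back down below it.
        if self.used > self.high_water and self.workers < self.max_workers:
            self.workers += 1
        elif self.used <= self.high_water and self.workers > 1:
            self.workers -= 1
        # Drain to capacity flash first, then admit new writes.
        self.used = max(0, self.used - self.workers * self.per_worker_drain)
        admitted = min(incoming, self.hard_limit - self.used)
        self.used += admitted
        return incoming - admitted  # back pressure: these writes see extra latency


light = WriteBuffer(high_water=50, hard_limit=100, per_worker_drain=10, max_workers=4)
print([light.tick(5) for _ in range(5)])    # [0, 0, 0, 0, 0]

heavy = WriteBuffer(high_water=50, hard_limit=100, per_worker_drain=10, max_workers=4)
print([heavy.tick(60) for _ in range(5)])   # [0, 0, 30, 20, 20]
```

In the heavy case the sustained write rate (60 per tick) exceeds what four workers can drain (40 per tick), so the model settles into delaying the 20-unit difference every tick.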

W. Curtis Preston:

Okay.

W. Curtis Preston:

Okay, that makes

Howard Marks:

But, but you know, that's, you know, literally,

Howard Marks:

you know, it, it don't

Howard Marks:

ha it, you know, the mechanism is there just in case,

W. Curtis Preston:

right.

Howard Marks:

happen.

Prasanna Malaiyandi:

And when you, and because you have more than more

Prasanna Malaiyandi:

capacity, I guess, more throughput at the capacity level than at the storage

Prasanna Malaiyandi:

media level, is that why you can just increase the number of parallel

Prasanna Malaiyandi:

threads and you don't have to worry about the backend being a bottleneck?

Howard Marks:

so in, in, so our, our building block, we call a DBox

Howard Marks:

or a data box, and it's got some

Howard Marks:

SCM SSDs.

Howard Marks:

We started with Optane.

Howard Marks:

We now mostly use Kioxia FL6, and then a larger number of capacity SSDs.

Howard Marks:

And they, you know, it's whatever the cheapest we can get or the cheapest that

Howard Marks:

our OEMs use is. Um, the PCIe lanes

Howard Marks:

feeding the small number of SCM SSDs are generally the bottleneck.

Prasanna Malaiyandi:

Okay.

Howard Marks:

And so we can parallelize reading data out of SCM. The

Howard Marks:

writing to, to the capacity,

Howard Marks:

we have a lot more capacity SSDs, so there's plenty of bandwidth to write

Prasanna Malaiyandi:

Gotcha.

W. Curtis Preston:

So what, why'd you stop using Optane?

Howard Marks:

Um, well first we decided just to get a second

Howard Marks:

source because it's a good idea.

Howard Marks:

Um, and then I, Intel

W. Curtis Preston:

turned out to be a really good

Howard Marks:

of, then, then Intel decided to get out of the business.

Howard Marks:

Um,

Howard Marks:

and, you know, we have supply agreements with Intel.

Howard Marks:

They still have a warehouse full of wafers.

Howard Marks:

Um, but it, you know, the, the performance advantage wasn't worth the complexity.

Howard Marks:

So we've chunked on these variable sized 32 K average blocks and we de-dupe them.

Howard Marks:

But in addition to running a strong hash.

Howard Marks:

To validate identical, we run a series of weaker hashes against the same

Howard Marks:

data blocks, and these weaker hashes are designed to generate the same

Howard Marks:

hash value for inputs across a narrow range of cryptographic distance.

Howard Marks:

So if two blocks have a sm, so cryptographic distance is the

Howard Marks:

number of bits you have to flip to turn block A into block B.

Howard Marks:

If block A is within X bits of block B, this hash will

Howard Marks:

generate the same hash value

W. Curtis Preston:

Okay.

Howard Marks:

from a data reduction point of view.

Howard Marks:

If two blocks generate the same hash value and are a small cryptographic

Howard Marks:

distance apart, they have long common strings between them

Howard Marks:

and will therefore compress with the same compression dictionary.

Howard Marks:

So the first block that generates one of these similarity hashes, we just

Howard Marks:

compress and store. When the second through Nth block generates the same hash,

Howard Marks:

we recall the first one and we use the dictionary from the first

Howard Marks:

one to compress the second one

Prasanna Malaiyandi:

So you get better compression

Howard Marks:

we can store it compressed without the overhead of

Howard Marks:

storing the dictionary a second time.
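
A rough sketch of similarity reduction under stated assumptions. The episode does not say which similarity hash Vast uses, so `similarity_hash` below is a stand-in (a min-hash over 8-byte shingles) that merely tends to give near-identical blocks the same value. The second half uses zlib's preset-dictionary support, with the first block as the dictionary, in place of the Zstandard dictionary mentioned here.

```python
import hashlib
import random
import zlib

SHINGLE = 8  # bytes per overlapping shingle (assumed)


def similarity_hash(block: bytes) -> bytes:
    """Min-hash over shingles: blocks sharing most content usually share the minimum."""
    return min(
        hashlib.blake2b(block[i:i + SHINGLE], digest_size=8).digest()
        for i in range(len(block) - SHINGLE + 1)
    )


def compress_with_reference(block: bytes, reference: bytes) -> bytes:
    """Compress `block` using an already-stored similar block as the dictionary."""
    comp = zlib.compressobj(level=9, zdict=reference)
    return comp.compress(block) + comp.flush()


def decompress_with_reference(payload: bytes, reference: bytes) -> bytes:
    decomp = zlib.decompressobj(zdict=reference)
    return decomp.decompress(payload) + decomp.flush()


rng = random.Random(1)
first = bytes(rng.getrandbits(8) for _ in range(4096))  # stored normally
second = bytearray(first)
second[2000:2003] = b"xyz"                               # a near-duplicate block
second = bytes(second)

alone = zlib.compress(second, 9)
against_first = compress_with_reference(second, first)
print("same bucket:", similarity_hash(first) == similarity_hash(second))
print("standalone:", len(alone), "bytes; against reference:", len(against_first), "bytes")
```

Reading the second block back requires the first one, which is why the reference block has to stay retrievable for as long as anything is compressed against it.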

Prasanna Malaiyandi:

yeah.

W. Curtis Preston:

Yeah.

W. Curtis Preston:

Yeah.

Howard Marks:

and it becomes essentially the difference,

Prasanna Malaiyandi:

Yeah.

Prasanna Malaiyandi:

So instead of storing a bigger block, it's like just very, very small Deltas

Prasanna Malaiyandi:

because they are cryptographically

Howard Marks:

right?

W. Curtis Preston:

that,

W. Curtis Preston:

that's

Howard Marks:

similar.

Howard Marks:

The mathematicians would say it's a limited cryptographic distance.

Prasanna Malaiyandi:

That's unique.

Prasanna Malaiyandi:

I've never heard of someone doing that.

Prasanna Malaiyandi:

Have you, Curtis?

W. Curtis Preston:

just this guy that we had on the podcast a little

Prasanna Malaiyandi:

Yeah.

W. Curtis Preston:

that looks a lot like Howard.

Howard Marks:

It, it, you know, nobody else is doing it now.

Howard Marks:

Um,

W. Curtis Preston:

Is it, are you patenting it or,

Howard Marks:

there are patents around it.

Howard Marks:

I don't, I haven't looked to see exactly

W. Curtis Preston:

gotcha.

W. Curtis Preston:

Gotcha.

Howard Marks:

to.

W. Curtis Preston:

Yeah.

W. Curtis Preston:

That

Howard Marks:

Um,

Howard Marks:

cause reading patent applications makes my brain hurt.

W. Curtis Preston:

I see, I thought that you would, let's

W. Curtis Preston:

say you got two chunks, right?

W. Curtis Preston:

And you run the really weak, but much faster, I'm assuming, uh, hashing

W. Curtis Preston:

algorithm, and that you would say these two blocks definitely aren't the same.

W. Curtis Preston:

And so let's not do anything else other than com.

W. Curtis Preston:

They're not, they're nowhere.

W. Curtis Preston:

Their, their, their cryptographic distance,

W. Curtis Preston:

I think you said, is so far apart there's no point in running the

W. Curtis Preston:

stronger dedupe, uh, thing on it.

W. Curtis Preston:

Um, that's

W. Curtis Preston:

where I

Howard Marks:

it, it turns out, it turns out even with a weak hash, the

Howard Marks:

number of identical hashes that are not identical data is so small that

Howard Marks:

the cost of testing is ignorable.

Prasanna Malaiyandi:

Yeah.

Prasanna Malaiyandi:

And especially if it's in flash, like probably

Howard Marks:

the,

Prasanna Malaiyandi:

is

Howard Marks:

compare is so rare.

Howard Marks:

It doesn't matter that it's expensive.

Prasanna Malaiyandi:

Yeah.

Prasanna Malaiyandi:

And the fact that you're not doing this in line, right.

Prasanna Malaiyandi:

So it's all been de

Prasanna Malaiyandi:

Or

Howard Marks:

it, it's not trench.

Howard Marks:

It's in line, but it's not.

Prasanna Malaiyandi:

client.

Howard Marks:

ack,

Prasanna Malaiyandi:

Yeah, yeah.

Prasanna Malaiyandi:

Exactly.

Howard Marks:

it's, it's, it's post-ack, so it doesn't have any impact on latency.

Howard Marks:

But you know, the, the SCM is a one-way write buffer.

Howard Marks:

We write new data into it, it gets demoted to the capacity flash layer

Howard Marks:

and there's so much bandwidth in the capacity flash layer that reads from

Howard Marks:

there are actually faster than from the SCM.

Howard Marks:

So there's no reason ever to promote it back.

W. Curtis Preston:

Right,

Howard Marks:

Um, but the other thing is we keep all the metadata in that SCM.

Howard Marks:

So as you expand the system, you add another enclosure that's got more SCM

Howard Marks:

and more capacity, the dedupe hash table and the similarity hash tables all grow

Howard Marks:

with it.

Howard Marks:

So it's one data reduction realm regardless of how big a cluster is.

Howard Marks:

We don't have to store that dedupe table in memory.

Prasanna Malaiyandi:

Yep.

Howard Marks:

And so, you know, the whole "well, flash would be great for backup,

Howard Marks:

except I can't afford it" is, well,

Howard Marks:

If you've got three or four conventional PBBAs,

Prasanna Malaiyandi:

Mm-hmm.

Howard Marks:

you know, first of all, the vendors of PBBAs charge

Howard Marks:

you a lot for that disk storage.

Howard Marks:

You know, they, that's a high margin product.

Prasanna Malaiyandi:

Yeah.

Howard Marks:

as soon as you have two of them, you have two deduplication realms.

W. Curtis Preston:

Right,

Howard Marks:

And we might talk about data deduping 10 to one.

Howard Marks:

That doesn't mean all your data dedupes 10 to one

W. Curtis Preston:

right.

Howard Marks:

50% of your data at least is unique.

Prasanna Malaiyandi:

Yep,

Howard Marks:

Some of your data dedupes a hundred or a thousand to one, and most

Howard Marks:

of the benefit you get is from that data that dedupes a hundred or a thousand to one.

Howard Marks:

Well, when you got two boxes, it's not a hundred or a thousand, it's 50 to 500.
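
The arithmetic behind that remark, as a small sketch. It assumes the simplest case: the copies of a piece of data are spread across the appliances, and each appliance, being its own deduplication realm, has to keep one copy. The function name is made up for the example.

```python
def effective_ratio(logical_copies: int, realms: int) -> float:
    """Dedupe ratio when identical copies are spread over independent realms.

    Each realm that sees the data must store it once, so the number of
    stored copies is the number of realms (capped at the number of copies).
    """
    stored_copies = min(realms, logical_copies)
    return logical_copies / stored_copies


for copies in (100, 1000):
    print(copies, "copies:",
          effective_ratio(copies, 1), "to one in a single realm,",
          effective_ratio(copies, 2), "to one across two boxes")
```

Under the same assumption, four boxes would cut the 100-to-one data to 25 to one, which is the loss a single cluster-wide realm avoids.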

Prasanna Malaiyandi:

yep.

Prasanna Malaiyandi:

And every time you add a new box, you're, you lose some of that benefit as well.

Prasanna Malaiyandi:

Right.

Prasanna Malaiyandi:

So,

W. Curtis Preston:

you can dedupe all our episodes down to

W. Curtis Preston:

like four or five comments.

W. Curtis Preston:

Right?

W. Curtis Preston:

Like back up, back up, all the things.

Howard Marks:

Well that

W. Curtis Preston:

3, 2, 1.

W. Curtis Preston:

Just 3, 2, 1.

Howard Marks:

that requires the next version of, uh, AI deduplication

Howard Marks:

that can take out the idle banter.

W. Curtis Preston:

Exactly, exactly.

W. Curtis Preston:

Our episodes will be like five minutes long.

Prasanna Malaiyandi:

So just to summarize or just to close on that, so

Prasanna Malaiyandi:

we talked about the how you guys do it.

Prasanna Malaiyandi:

So because of all these technologies that you're leveraging or mechanisms,

Prasanna Malaiyandi:

right, that's how you're able to offer that guarantee, right?

Prasanna Malaiyandi:

That's better than anyone else.

Howard Marks:

Yeah, we, you know, we use Zstandard.

Howard Marks:

It's a slightly newer compression algorithm than anybody else

Howard Marks:

uses, cuz we started a little bit later than everybody else.

Howard Marks:

So we got to pick the latest one.

Howard Marks:

Um, and then we have those, you know, the additional, well, oh, they're numbers,

Howard Marks:

let's just store the differences, tricks.

Howard Marks:

And then we do deduplication on variable block, which is

Howard Marks:

as well as anybody does it.

Howard Marks:

And then we throw in similarity as, oh, here's another unique

Howard Marks:

trick nobody else does.

Howard Marks:

And so the combination is, we are confident that as long as you're send,

Howard Marks:

you know, we guarantee as long as you're sending us unencrypted data,

Howard Marks:

that we'll reduce it better than the other guy, whoever the other guy is.

Howard Marks:

And if we don't, we'll provide the capacity so that you

Howard Marks:

didn't pay any more money.

Howard Marks:

Cuz

Prasanna Malaiyandi:

Yeah.

Prasanna Malaiyandi:

Yeah.

Prasanna Malaiyandi:

No, that's a great guarantee for end users and customers, especially

Prasanna Malaiyandi:

with budgets these days, right?

Prasanna Malaiyandi:

It's like, Hey, I bought this system.

Prasanna Malaiyandi:

It doesn't quite meet my expectations.

Prasanna Malaiyandi:

I can't go back to my boss and ask for more money.

Prasanna Malaiyandi:

So

Howard Marks:

Well, you know, the other side of that is, you

Howard Marks:

know, just, it's really a very simple scale out architecture.

Howard Marks:

So you don't buy today what you think you're gonna need in three years.

Howard Marks:

You buy today what you think you're gonna need in a year, and then you

Howard Marks:

can buy more when you need it later.

Howard Marks:

Or if, as some of our customers have found out much to the chagrin of

Howard Marks:

our sales guys, their data reduces better than they expected and they

Howard Marks:

don't need anymore in the next year.

Howard Marks:

Well then you're just ahead of the game.

Prasanna Malaiyandi:

So, and I know maybe you could talk in generalities,

Prasanna Malaiyandi:

but sort of like if I was a customer who had one of the competition's PBBAs, right.

Prasanna Malaiyandi:

And I now use Vast, right.

Prasanna Malaiyandi:

I buy a Vast system, sort of like, is there, like what is the savings that

Prasanna Malaiyandi:

I normally see in terms of storage?

Prasanna Malaiyandi:

Like if I had like a hundred terabyte PBBA, actual

Howard Marks:

if, if you, you know, a hundred terabytes is small for us.

Prasanna Malaiyandi:

okay.

Prasanna Malaiyandi:

Or say

Prasanna Malaiyandi:

a

Howard Marks:

So if you had, if you had a petabyte PBBA, um, then you

Howard Marks:

know, you're probably storing four or five petabytes of logical data on

Howard Marks:

it.

Howard Marks:

Um, and you bought a, you know, a petabyte of usable from us and you'd

Howard Marks:

probably store 25 or 30% more on it.

Prasanna Malaiyandi:

Okay.

Howard Marks:

But that petabyte PBBA is as big as you can buy that PBBA,

Howard Marks:

there isn't a two petabyte PBBA.

Prasanna Malaiyandi:

Yep.

Howard Marks:

And the real difference is at restore time

Prasanna Malaiyandi:

Hmm.

Howard Marks:

because PBBAs are scaled

Howard Marks:

for backup speed, not restore speed.

Howard Marks:

They don't even have restore speed on the spec sheet anymore.

Howard Marks:

And backups are not sequential operations nearly as much as

Howard Marks:

you think they used to be.

Howard Marks:

And you

Howard Marks:

know, when

Howard Marks:

Curtis, when, when, Curtis, changed block tracking, incremental forever,

W. Curtis Preston:

Right?

Howard Marks:

of those things make the backup and the restore much more random.

Prasanna Malaiyandi:

yep,

Howard Marks:

And so if you're backing up to a disk-based PBBA,

Howard Marks:

your restore speed is like a fourth or a fifth of your backup speed.

Prasanna Malaiyandi:

yep.

Howard Marks:

If you're backing up to a Vast, your restore speed

Howard Marks:

is five times your backup speed.
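
To see what those ratios mean at restore time, here is a back-of-the-envelope Python sketch. The 500 TB data set and the 20 TB/hour backup rate are invented for illustration, and giving both systems the same backup rate is a simplifying assumption, not a claim from the episode.

```python
def restore_hours(data_tb: float, backup_tb_per_hour: float,
                  restore_to_backup_ratio: float) -> float:
    """Hours to restore data_tb when restore speed is a multiple
    (or fraction) of the backup speed."""
    return data_tb / (backup_tb_per_hour * restore_to_backup_ratio)


DATA_TB = 500          # hypothetical amount to restore after an attack
BACKUP_RATE = 20       # hypothetical backup speed, TB per hour

disk_pbba = restore_hours(DATA_TB, BACKUP_RATE, 0.25)  # "a fourth" of backup speed
all_flash = restore_hours(DATA_TB, BACKUP_RATE, 5)     # "five times" backup speed
print(f"disk-based PBBA: {disk_pbba:.0f} hours, all-flash: {all_flash:.0f} hours")
```

Under these assumptions the same restore takes days on one system and hours on the other, which is the gap Howard is pointing at.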

Howard Marks:

Cause we're, cuz we are designed to serve

Howard Marks:

primary storage applications where reads happen much more frequently

Howard Marks:

than writes cuz the reads come from all the capacity SSDs, the

Howard Marks:

writes have to go to the SCM.

Howard Marks:

Um, and so what, where that really starts to get important is when, when we start

Howard Marks:

talking about ransomware attack, cuz 10 years ago Curtis and I used to teach

Howard Marks:

seminars and we'd go, yeah, 90, 95% of your restores are, you know, the file.

Howard Marks:

Somebody screwed up.

Howard Marks:

And you know, if it's on a PBBA it'll be restored in a couple of minutes.

Howard Marks:

And if it was on

Howard Marks:

tape, you'd go find the tape and then a couple of minutes.

Howard Marks:

And so, but you don't know you've been ransomware attacked till

Howard Marks:

thousands or hundreds of thousands of files have been encrypted.

Prasanna Malaiyandi:

Yep.

Howard Marks:

And so now you have to like use something like instant recovery to

Howard Marks:

check back, you know, is this backup good?

Howard Marks:

You gotta do three or four quick looks without restoring, which

Howard Marks:

is a great feature, but you know, requires a relatively high speed

Howard Marks:

backend to work relatively well.

Howard Marks:

And then you're gonna find, okay, this is my last known good point.

Howard Marks:

And then you have to restore and you are gonna have to restore a lot

Howard Marks:

of data and restore speed starts to become really important then.

Prasanna Malaiyandi:

Mm-hmm.

Howard Marks:

And then the kicker is, and the lawyers and the insurance company

Howard Marks:

won't let you use the, the system that was infected for another couple of

Howard Marks:

weeks cuz it's evidence or we have to get somebody in to clean it and certify

Howard Marks:

that it's cleaned. Well, if, you know, you can run a VMware NFS datastore

Howard Marks:

on Vast, you can just restore to Vast.

Howard Marks:

Now it's a bad idea to run your primary and your backup on the same

Howard Marks:

system for more than a day or two,

W. Curtis Preston:

Right,

Howard Marks:

but, compared to not running your primary and just, you

Howard Marks:

know, if your choice is backup only or primary and backup, and if this one

Howard Marks:

system dies, I'm really in trouble.

Howard Marks:

Not that hard a choice for me.

Howard Marks:

I want my users back up and running.

Howard Marks:

As soon as the lawyers let me get back to the old system, or my

Howard Marks:

VAR gives me a new system, or I have someplace else to Storage vMotion

Howard Marks:

to, I'm getting that stuff off there right away.

Howard Marks:

But that might mean I'm up a week earlier and a week earlier is a lot of time.

W. Curtis Preston:

my objection to flash for backup has

W. Curtis Preston:

been for two primary reasons.

W. Curtis Preston:

One is, it's expensive AF, right? Second,

W. Curtis Preston:

Do I really need it?

W. Curtis Preston:

Right?

W. Curtis Preston:

Like, because there's, there are a lot of things that we can buy in life, right?

W. Curtis Preston:

Uh, like I, I need to move fertilizer.

W. Curtis Preston:

I can totally borrow Prasanna's, uh, Tesla and it will do it, right?

W. Curtis Preston:

But, but is that what I should be using for that?

W. Curtis Preston:

Do I need, do I need a Tesla to move fertilizer or will

W. Curtis Preston:

my Prius do

Howard Marks:

need Prasanna's

Howard Marks:

Tesla to move fertilizer if you ever want Prasanna to speak to you again.

W. Curtis Preston:

no, that's, that's true.

W. Curtis Preston:

By the way, the Prius has been used to move fertilizer just for the record.

W. Curtis Preston:

Um, but so, so that's the thing.

W. Curtis Preston:

It's like, there, there are a lot of things, like, this goes back

W. Curtis Preston:

to the CDP, the CDP argument that I made back in the day.

W. Curtis Preston:

It was the same thing, the same two arguments.

W. Curtis Preston:

One was CDP was too damn expensive, right?

W. Curtis Preston:

And then the other was, does anybody actually need

W. Curtis Preston:

the, the, the, the functionality that CDP provided?

W. Curtis Preston:

And the answer is yes.

W. Curtis Preston:

0.1% of the population needed what CDP provided.

W. Curtis Preston:

And that's why you don't really see CDP as a choice very, very often these days.

W. Curtis Preston:

Right? There, there, there's one or two companies that do it now, um,

W. Curtis Preston:

and all the other products have died.

W. Curtis Preston:

So those are my two arguments.

W. Curtis Preston:

It's, I, I already know what your argument to the second one is gonna

W. Curtis Preston:

be because you just gave it, I think.

W. Curtis Preston:

Um, so

W. Curtis Preston:

why

Prasanna Malaiyandi:

about for cost?

Howard Marks:

Well,

W. Curtis Preston:

talk about costs?

Howard Marks:

so, for cost, it depends what flash systems you're talking about.

W. Curtis Preston:

Mm-hmm.

Howard Marks:

Um, I will give you that most all-flash systems are

Howard Marks:

designed to be fast as possible for a small amount of data, because that's

Howard Marks:

what you need to run the Oracle databases that make companies work.

Howard Marks:

And so if you, you know, it's a block, you know, it's block storage to be low

Howard Marks:

latency to support OLTP and therefore expensive because that's, you know, if

Howard Marks:

that system goes down, you count by the second how much money you're losing.

Howard Marks:

And so you have always bought expensive storage for that.

Howard Marks:

Um,

W. Curtis Preston:

sort of the, sort of the true normal sort of, if I, if

W. Curtis Preston:

I can use this word, pure flash array.

Howard Marks:

Yes.

W. Curtis Preston:

the, that type is designed for that, right?

W. Curtis Preston:

Um, that's technically pure with a small p, but it works the other way as well.

W. Curtis Preston:

Um,

Howard Marks:

talking either way, you know?

Howard Marks:

Yeah.

Howard Marks:

You know, I could name half a dozen other products, but

W. Curtis Preston:

And they're just too expensive.

Howard Marks:

of it, you know, it's, we're gonna design a system based on

Howard Marks:

having a, a pyramidal tiered system.

Howard Marks:

this is the one at the top.

Prasanna Malaiyandi:

Yeah.

Howard Marks:

And if you assume you're gonna build a tiered system, then you

Howard Marks:

want the one at the top to be as fast as possible, and you kind of

Howard Marks:

don't care how much it costs because

Howard Marks:

you'll just put stuff that doesn't deserve it on the next tier.

Prasanna Malaiyandi:

Yep.

Howard Marks:

Philosophically our idea was we're gonna make something that

Howard Marks:

delivers performance for everything but the very, very top tier.

W. Curtis Preston:

Mm-hmm.

Howard Marks:

And goes down in cost to where Well, if you use enough

Howard Marks:

of it, you don't need those tiers.

Howard Marks:

You don't need the complexity.

Howard Marks:

Right.

Howard Marks:

So, you know, part of our story is as you consolidate workloads, you have

Howard Marks:

workloads that need performance, and you have workloads that need capacity.

Howard Marks:

When you add capacity, performance comes with because in,

Howard Marks:

you know, spindles, how many SSDs?

Howard Marks:

Yeah.

Howard Marks:

A hundred SSDs is so much performance.

Howard Marks:

200 SSDs is twice that much performance.

Howard Marks:

So if you take the applications that need capacity, and you put them on the

Howard Marks:

same system as the applications that need performance but don't need capacity.

Howard Marks:

The performance that the capacity creates is used by the applications

Howard Marks:

that need the performance and the cost of the performance is brought down

Howard Marks:

because you've used that much capacity and you get in a virtuous cycle.

W. Curtis Preston:

I think I followed that.

Prasanna Malaiyandi:

yeah, it, it, it's basically

Prasanna Malaiyandi:

by

Howard Marks:

if you, if

Prasanna Malaiyandi:

that's

Prasanna Malaiyandi:

common.

Howard Marks:

if you, you're paying 10x for 10% and 1x for

Howard Marks:

90%, then you're paying a hundred.

Howard Marks:

If you have one tier that costs a hundred,

W. Curtis Preston:

Mm-hmm.

Howard Marks:

why have two tiers?
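
One way to read the tiering numbers, sketched in Python. The spoken figures are loose, so treat this as an interpretation: capacity is split into 100 units, the fast tier holds 10 of them at 10x the unit price and the capacity tier holds 90 at 1x. The flat single-tier price used for comparison is a placeholder.

```python
def total_cost(tiers):
    """tiers: list of (capacity_units, relative_price_per_unit)."""
    return sum(units * price for units, price in tiers)


fast_tier = total_cost([(10, 10)])           # the 10% you pay 10x for
two_tier = total_cost([(10, 10), (90, 1)])   # fast tier plus capacity tier
one_tier = total_cost([(100, 1)])            # placeholder: one flat tier at 1x

print(f"fast tier alone: {fast_tier}, both tiers: {two_tier}, single tier: {one_tier}")
```

On this reading the small fast tier alone already costs the "hundred" Howard mentions, so a single tier that covers all the capacity for about that much makes the second tier hard to justify.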

Prasanna Malaiyandi:

Yeah.

Howard Marks:

And when you use capacity, that capacity comes with performance.

W. Curtis Preston:

Mm-hmm.

Howard Marks:

And that means that performance is available

Howard Marks:

to other applications that didn't need the capacity.

Howard Marks:

So you don't need to have separate systems, you just have

W. Curtis Preston:

So, so if I could, if I could try to put this

W. Curtis Preston:

in, in, in just different words, but it'll say the same thing.

W. Curtis Preston:

If I've got a hundred QLC disks, right.

W. Curtis Preston:

Um, and, and these are how big

Howard Marks:

15 or 30 terabytes.

W. Curtis Preston:

the each, each one, right?

Howard Marks:

Each one

W. Curtis Preston:

So if I've got a hundred, I've got one and a half

W. Curtis Preston:

tera, one and a half petabytes.

W. Curtis Preston:

Did I

W. Curtis Preston:

do that

W. Curtis Preston:

right of raw?

W. Curtis Preston:

Yeah.

W. Curtis Preston:

Okay.

W. Curtis Preston:

All right.

W. Curtis Preston:

So I've got one and a half petabytes of raw capacity.

W. Curtis Preston:

And what you're saying is we can just take a slice off the top, if you will.

W. Curtis Preston:

You know, we used to call short stroking the discs.

W. Curtis Preston:

The, obviously you don't need to short stroke a, a flash, but you're

W. Curtis Preston:

basically saying, we're just gonna take a slice off the top, uh, of these

W. Curtis Preston:

150 discs and we're gonna get this massive performance slice, uh, for,

W. Curtis Preston:

for the 10% that need that performance.

W. Curtis Preston:

And then the rest we'll just put wherever we need to put it.

W. Curtis Preston:

Is that, does that sound about

W. Curtis Preston:

right?

Howard Marks:

I'm, saying all those SSDs create one pool of

Howard Marks:

performance and one pool of capacity,

W. Curtis Preston:

Right.

Howard Marks:

and a workload can draw from either one as much as it needs.

W. Curtis Preston:

Gotcha.

Prasanna Malaiyandi:

Is it separate capacity and performance pools that you

Prasanna Malaiyandi:

then assign to Applic?

Prasanna Malaiyandi:

Okay.

Prasanna Malaiyandi:

It's just one

Prasanna Malaiyandi:

pool that includes both

Howard Marks:

that, and now, you know, and you can use QoS, you can say,

Howard Marks:

okay, this workload gets a hundred thousand IOPS, or, or 50 gigabytes

Howard Marks:

per second, and this other one gets

Howard Marks:

different.

Prasanna Malaiyandi:

yeah,

W. Curtis Preston:

yeah,

W. Curtis Preston:

And

W. Curtis Preston:

you'll just use

Howard Marks:

And so you

W. Curtis Preston:

you need to

Howard Marks:

performance, right?

Howard Marks:

But you know, if you've got, um, you know, your backups and you've got the developers

Howard Marks:

who wanna do run, live copies of the database, well run it all on one system.

Howard Marks:

It's, it's an all flash system.

Howard Marks:

It's fast enough to run the database.

Prasanna Malaiyandi:

It's almost as if you're saying, you've built a

Prasanna Malaiyandi:

system that works for all workloads except that 1% or whatever, that's

Prasanna Malaiyandi:

like that very, very, very high end.

Prasanna Malaiyandi:

And you're saying you have one common architecture that allows it

Prasanna Malaiyandi:

to deal with, regardless of if your workload is capacity focused and not

Prasanna Malaiyandi:

very performance, it doesn't need a lot of performance or it's high

Prasanna Malaiyandi:

performance and maybe a little capacity.

Prasanna Malaiyandi:

It's all a

Howard Marks:

and and it doesn't matter whether your definition of

Howard Marks:

performance is bandwidth or IOPS.

Howard Marks:

You know, it's like all but that very lowest.

Howard Marks:

You know, we, you know, we're an all flash system lightly loaded.

Howard Marks:

We deliver one millisecond latency.

Prasanna Malaiyandi:

yeah,

Howard Marks:

You know, some systems can deliver half that

Howard Marks:

and some rare applications care.

Howard Marks:

But you know, between that and the ten, ten cents a gigabyte, well,

Howard Marks:

there are 20 terabyte hard drives and Supermicro servers and you

Howard Marks:

know, they don't do any IOPS, but you can write to 'em pretty fast.

Howard Marks:

You know,

Prasanna Malaiyandi:

Yeah.

Howard Marks:

in between we can cover.

W. Curtis Preston:

So we're dancing around.

W. Curtis Preston:

You're saying why you could be cheaper, but let me,

W. Curtis Preston:

let me just put a, lemme just put it right, you know, sort of, I'm

W. Curtis Preston:

assuming that you get into competitive bids with PBBAs on a regular basis.

Howard Marks:

Yes, sir.

W. Curtis Preston:

Okay.

W. Curtis Preston:

How do you do there?

Howard Marks:

They're easy.

Howard Marks:

Those are very high profit margin products for

Howard Marks:

the

W. Curtis Preston:

so you're, so you're, saying you can come in

W. Curtis Preston:

less expensive than the effective price of the typical PBBA, even

W. Curtis Preston:

though you're using all this flash.

Howard Marks:

Yes, sir.

W. Curtis Preston:

Okay.

W. Curtis Preston:

Because that, that's the short answer.

W. Curtis Preston:

I like the long answer.

W. Curtis Preston:

That's, I like the long answer.

W. Curtis Preston:

You and I

W. Curtis Preston:

live in long, right.

W. Curtis Preston:

Um, yeah, but in the end, it doesn't matter if it's still more expensive.

Howard Marks:

yeah, the, you know, the long answer is, you know, we use the

Howard Marks:

cheapest flash we can get because we designed the system to treat flash well

Howard Marks:

and understand how to minimize wear.

Howard Marks:

Our erasure codes have 3% overhead at large scale, so we're not wasting

Howard Marks:

space on RAID. We reduce data better than anybody else does.

Howard Marks:

So you know, we're getting as much capacity in there.
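
A quick Python sketch of how stripe width drives protection overhead. The two stripe geometries are illustrative assumptions, not figures quoted in the episode: a wide 146+4 stripe and a conventional 8+2 RAID 6 group.

```python
def protection_overhead(data_strips: int, parity_strips: int) -> float:
    """Fraction of raw capacity spent on parity in a data+parity stripe."""
    return parity_strips / (data_strips + parity_strips)


wide_stripe = protection_overhead(146, 4)   # assumed wide erasure-coded stripe
raid6_group = protection_overhead(8, 2)     # assumed classic RAID 6 group

print(f"wide stripe: {wide_stripe:.1%} overhead, 8+2 RAID 6: {raid6_group:.1%} overhead")
```

The wider the stripe, the smaller the share of raw capacity that goes to parity, which is how an overhead of around 3% becomes possible at large scale.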

Howard Marks:

Um, and then when you start saying, okay, it's 30 terabyte SSDs, so you get a lot

Howard Marks:

of capacity and a little bit of space and a little bit of power, and the power

Howard Marks:

and rack space start to add up as costs.

Howard Marks:

Um, especially when you start looking at the fact that the leading PBBAs are

Howard Marks:

still using eight terabyte hard drives because that rehydration tax of turning,

W. Curtis Preston:

Hmm.

Howard Marks:

making everything random, well the bigger the hard

Howard Marks:

drive gets, the worse it is, 'cause

Howard Marks:

one hard drive is a hundred IOPS.

Prasanna Malaiyandi:

Yep.

Howard Marks:

Doesn't matter whether it's a one terabyte hard

Howard Marks:

drive or a 20 terabyte hard drive.

Howard Marks:

And so they're just reaching the, the world,

Howard Marks:

the land of diminishing returns on IO density.

Howard Marks:

They can't go any lower.
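
The IO density point in numbers, as a small Python sketch. It takes Howard's round figure of a hundred IOPS per hard drive as given; the drive sizes are the ones mentioned in the conversation.

```python
def iops_per_tb(drive_tb: float, iops_per_drive: float = 100) -> float:
    """Random IOPS available per terabyte stored on one hard drive."""
    return iops_per_drive / drive_tb


for size_tb in (1, 8, 20):
    print(f"{size_tb:>2} TB drive: {iops_per_tb(size_tb):.1f} IOPS per TB")
```

Capacity per drive keeps growing while IOPS per drive stays flat, so every terabyte gets a smaller share of the drive's random IO.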

Howard Marks:

And now the sheet metal and the power supplies and the SAS

Howard Marks:

expanders are becoming a larger and larger percentage of their COGS.

Howard Marks:

And they mark 'em up a lot cuz there's a lot of IP in there in terms of

Howard Marks:

software and they have to make a margin.

Howard Marks:

Um, and so we just don't have most of those problems.

Prasanna Malaiyandi:

Yep.

W. Curtis Preston:

you're, you're, also marking up due to your dedupe, right?

W. Curtis Preston:

I mean, you

Howard Marks:

Yeah.

Howard Marks:

Or you know, some of it, you know, some small portion of the difference is, you

Howard Marks:

know, compared to the guys, the best PBBAs, we still do a little bit better.

Howard Marks:

but, but when we go into a customer who says, no, no, no, price this as

Howard Marks:

if you de-dupe the same, we're still coming in with a lower selling price.

Prasanna Malaiyandi:

Yeah, I, I'm not surprised about that.

Prasanna Malaiyandi:

The other thing, Howard, I wanted to bring up, I know you

Prasanna Malaiyandi:

mentioned sort of disk drives and the hundred IOPS limit, right?

Prasanna Malaiyandi:

That each of them typically have.

Prasanna Malaiyandi:

The other thing that I've also seen is as the drives get larger and larger, anytime

Prasanna Malaiyandi:

you have to do a RAID rebuild, right?

Prasanna Malaiyandi:

And you're talking like a 20 terabyte drive and it just takes longer and longer,

Prasanna Malaiyandi:

and now there's a potential for failure,

Howard Marks:

Yeah.

Howard Marks:

Well,

Prasanna Malaiyandi:

becomes a lot worse.

Howard Marks:

you know, I do a lot of, you know, resilience calculations

Howard Marks:

and people just don't realize how big a factor rebuild time is

Howard Marks:

in the probability of data loss.

Howard Marks:

Uh, we had one customer share with us.

Howard Marks:

They ran, you know, the leading

Howard Marks:

scale-out system before us, and the average for their rebuilds was 53 days.

Prasanna Malaiyandi:

Oh

Prasanna Malaiyandi:

wow.

W. Curtis Preston:

That's two months.

Howard Marks:

yeah, that's two months during which time your data is exposed

Howard Marks:

and you know, could be slightly exposed if you are already running

Howard Marks:

N plus three, could be really exposed if you're running N plus

Howard Marks:

one, like some vendors recommend.

Howard Marks:

So it all depends.
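
A simplified Python sketch of why rebuild time matters so much for the probability of data loss. The model assumes independent drive failures at a constant rate; the 2% annual failure rate and the 20 surviving drives are made-up inputs, and only the 53-day rebuild comes from the conversation.

```python
import math


def p_drive_fails(afr: float, days: float) -> float:
    """Chance a single drive fails within `days`, assuming a constant failure rate."""
    return 1 - math.exp(-afr * days / 365)


def p_data_loss(drives: int, failures_to_lose: int, afr: float,
                rebuild_days: float) -> float:
    """Chance that at least `failures_to_lose` more drives fail before the
    rebuild finishes, out of `drives` surviving drives."""
    p = p_drive_fails(afr, rebuild_days)
    p_fewer = sum(
        math.comb(drives, k) * p**k * (1 - p) ** (drives - k)
        for k in range(failures_to_lose)
    )
    return 1 - p_fewer


AFR = 0.02      # assumed annual failure rate per drive
DRIVES = 20     # assumed surviving drives in the protection group

for days in (1, 53):
    n_plus_1 = p_data_loss(DRIVES, 1, AFR, days)  # one more failure loses data
    n_plus_3 = p_data_loss(DRIVES, 3, AFR, days)  # three more failures lose data
    print(f"{days:>2}-day rebuild: N+1 {n_plus_1:.4%}, N+3 {n_plus_3:.6%}")
```

Stretching the rebuild window from a day to 53 days multiplies the exposure, and the effect is far harsher with single parity than with three.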

W. Curtis Preston:

right.

W. Curtis Preston:

Okay.

W. Curtis Preston:

So, so I think, I think, you know, you, you've definitely

W. Curtis Preston:

covered the cost argument.

W. Curtis Preston:

Um, the, and, and it's, I think if we just back up, you've

W. Curtis Preston:

already covered the why, right?

W. Curtis Preston:

The why.

W. Curtis Preston:

would, why

W. Curtis Preston:

is today's Restore different?

Howard Marks:

Ransomware.

W. Curtis Preston:

Yeah.

Howard Marks:

stores.

Howard Marks:

The restore's bigger.

Howard Marks:

And the restore location is less well known

W. Curtis Preston:

What do you mean by that?

Howard Marks:

you may not be able to restore back to the infected

Howard Marks:

system cuz it's still evidence,

W. Curtis Preston:

okay.

W. Curtis Preston:

Understood.

W. Curtis Preston:

Okay.

Howard Marks:

right?

Howard Marks:

You need someplace to restore to.

Howard Marks:

And you know, having it where the primary and the backup are deduped to each other

Howard Marks:

probably gives you that in a pinch.

Prasanna Malaiyandi:

Yeah.

Prasanna Malaiyandi:

Have you

Prasanna Malaiyandi:

seen.

Howard Marks:

emphasize in a pinch, cuz

W. Curtis Preston:

Yeah,

Howard Marks:

you know, we've, we've all been trained, no, no bad idea.

Howard Marks:

Don't mix the strip, don't cross the streams.

Howard Marks:

Um, but, but that's the, if you have a backup on the primary, you

Howard Marks:

don't, you don't have a backup.

Howard Marks:

But if you have to choose between backup and primary, I'd rather have primary.

Prasanna Malaiyandi:

Have you seen customers actually do this

Prasanna Malaiyandi:

in the field with Vast systems?

Howard Marks:

Oh, we have several customers doing really

Howard Marks:

large scale backup to Vast.

Howard Marks:

Um, we had one customer who was kind of shocked cuz they were

Howard Marks:

doing encryption in NetBackup.

Howard Marks:

And so they expected us to not reduce data at all, uh, but they were doing encrypted

Howard Marks:

NetBackup backups of Oracle dumps of the same database over and over again,

Howard Marks:

encrypted with the same encryption key.

Howard Marks:

And we started seeing about 20% reduction just because even when you encrypt it, if

Howard Marks:

you're backing up the same data, it looks

Howard Marks:

the same encrypted as it
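
A toy Python illustration of that effect. The "cipher" here is a deliberately simple, insecure stand-in that XORs every block with the same key-derived pad; it only shows that the same data plus the same key, with a deterministic scheme, gives the same bytes. It says nothing about how NetBackup actually encrypts, and the block size and dump contents are invented.

```python
import hashlib

BLOCK = 4096


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy deterministic 'encryption': XOR each block with one key-derived pad.
    NOT secure; for illustration only."""
    pad = hashlib.shake_256(key).digest(BLOCK)
    return b"".join(
        bytes(a ^ b for a, b in zip(plaintext[i:i + BLOCK], pad))
        for i in range(0, len(plaintext), BLOCK)
    )


def block_hashes(data: bytes) -> set:
    return {hashlib.sha256(data[i:i + BLOCK]).digest()
            for i in range(0, len(data), BLOCK)}


# Two hypothetical dumps of the same database; one block changed overnight.
monday = b"".join(bytes([i]) * BLOCK for i in range(10))
tuesday = monday[:3 * BLOCK] + bytes([99]) * BLOCK + monday[4 * BLOCK:]

same_key = block_hashes(toy_encrypt(b"k1", monday)) & block_hashes(toy_encrypt(b"k1", tuesday))
diff_key = block_hashes(toy_encrypt(b"k1", monday)) & block_hashes(toy_encrypt(b"k2", tuesday))
print(f"blocks shared with the same key: {len(same_key)}, with different keys: {len(diff_key)}")
```

Real encrypted backups dedupe far less cleanly than this toy (Howard quotes about 20%), but the mechanism is the same: repeated input under one key can produce repeated output.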

W. Curtis Preston:

right.

W. Curtis Preston:

There's like sort of two questions in my head here.

W. Curtis Preston:

One is, and, and they're, they're very much related.

W. Curtis Preston:

One is the, the whole backup container problem, right?

W. Curtis Preston:

Meaning that you get the NetBackup container and the

W. Curtis Preston:

Arcserve container and the Backup Exec container, you know, and they

W. Curtis Preston:

all stored backup data differently.

W. Curtis Preston:

And you have that issue.

W. Curtis Preston:

And then you, but you, there was something that you alluded

W. Curtis Preston:

to that I found interesting.

W. Curtis Preston:

You said commonality between the backup and the primary, but the

W. Curtis Preston:

backup is in some weirdo format.

W. Curtis Preston:

So are you able to get backup or commonality between the

W. Curtis Preston:

backup and the primary?

Howard Marks:

Now in that case, it's

Howard Marks:

more likely we'll see commonality between multiple primaries.

Howard Marks:

You know, it's more like you

Howard Marks:

restored 17 Windows VMs

W. Curtis Preston:

Does the way that you're doing dedupe make the

W. Curtis Preston:

format problem any less problematic?

W. Curtis Preston:

Right.

Howard Marks:

Only in that we reduce them all as opposed to if you were relying

Howard Marks:

on the data movers to do reduction.

Howard Marks:

So, you know, kind of the most common case is the storage guys like backup,

Howard Marks:

you know, Commvault or NetBackup or Veritas or Veeam or whatever they use.

Howard Marks:

And the Oracle DBAs don't trust them and insist on doing, doing, dumps.

W. Curtis Preston:

Right.

Howard Marks:

And so, you know, if you're doing both to, you know, they

Howard Marks:

just give the Oracle DBAs, okay dump to this NFS mount point on the vast.

Howard Marks:

Well then we'll reduce all of those dumps as well as anybody could reduce

Howard Marks:

all of those dumps. And your data mover

Howard Marks:

will do data reduction at multiple stages to manage the network traffic.

Howard Marks:

And then we'll do the final dedupe at the end cuz we're finer grained.

Howard Marks:

And the similarity works really well for things that are deduped coarse

Howard Marks:

grain, cuz the edges all look similar.

Howard Marks:

And so when

Howard Marks:

we, when we run, you know, we have a probe you can get as a VM that

Howard Marks:

scans your data and reports back, this is how much it would reduce.

Howard Marks:

And this is how much of that comes from each of these techniques.

Howard Marks:

And so when we run, when we do that with data from a data mover,

Howard Marks:

deduper, those are usually, you know, 128K or big blocks because

Howard Marks:

they have limited memory available.

Howard Marks:

And so we see more similarity cuz we're finding those pieces finer

W. Curtis Preston:

just to, just to make sure I understood correctly.

W. Curtis Preston:

So the, one of the questions that I didn't really ask was, you know,

W. Curtis Preston:

when you buy a, you know, pick your favorite PBBA, they tend to support

W. Curtis Preston:

these five backup products, and if you buy a different backup

W. Curtis Preston:

product, well, they're like, well, we don't understand that format yet.

W. Curtis Preston:

And so then they have to go and do some development work to figure

W. Curtis Preston:

out how to crack that container.

W. Curtis Preston:

Do you not have that problem or have you done that

Howard Marks:

We, We, have not optimized for any of these backup applications,

W. Curtis Preston:

And yet you

W. Curtis Preston:

still get better dedupe than the other guys.

Howard Marks:

Our, our general case data reduction against all of these reduced

Howard Marks:

data types still gets better reduction.

Howard Marks:

You know, we are not, you know, scanning for the timestamps in Oracle RAC

Howard Marks:

dumps and, you know, that level stuff.

W. Curtis Preston:

Right.

Howard Marks:

Not to

Prasanna Malaiyandi:

agnostic, right?

Howard Marks:

to say we won't in the future, but our, you know, our data

Howard Marks:

reduction was written for primary storage.

Howard Marks:

It just so happens that being an

Howard Marks:

NFS or an S3 target for a backup data mover is a simple case of primary storage,

Prasanna Malaiyandi:

Yeah.

Howard Marks:

it just works.

W. Curtis Preston:

Right.

W. Curtis Preston:

Right.

W. Curtis Preston:

Hmm.

W. Curtis Preston:

What do you think Prasanna,

Prasanna Malaiyandi:

in the future?

Prasanna Malaiyandi:

So I, I had no complaints to start with.

Prasanna Malaiyandi:

Uh, the one question

W. Curtis Preston:

I lost this argument?

W. Curtis Preston:

I think I might

Prasanna Malaiyandi:

I think, I think you lost this one.

Prasanna Malaiyandi:

Uh, Howard, the one last question I had was, I know some of these backup vendors

Prasanna Malaiyandi:

support the ability to do source-side deduplication by integrating with

Prasanna Malaiyandi:

the purpose-built backup appliances.

Prasanna Malaiyandi:

Does Vast support that?

Prasanna Malaiyandi:

Are you guys planning to support that?

Prasanna Malaiyandi:

I know you're looking, you just

Prasanna Malaiyandi:

previously said, right, that you're

Howard Marks:

don't, we don't, um, I've never been really comfortable with the

Howard Marks:

use of client side CPU for that cuz client side CPU is valuable for other things.

Howard Marks:

Um, I think, you know, doing a pass at some level in the data mover.

Prasanna Malaiyandi:

Mm-hmm.

Howard Marks:

It's like, okay, we'll we'll de-dupe at the media server at some

Howard Marks:

large grain so that we're not transferring 50 copies of Windows over the network.

Howard Marks:

Um, is a perfectly reasonable thing to do cuz it's a network

Howard Marks:

bandwidth management technique.

Howard Marks:

Um, things like DD Boost are, you know, let's offload this from the,

Howard Marks:

the PBBA to the client and we'd just rather do the work ourselves.

Howard Marks:

Um, and in our architecture, since you can just add more servers at the front end

Howard Marks:

and you just have to buy the servers, we don't even charge for that software.

Howard Marks:

If you need more compute

Howard Marks:

to do more

Prasanna Malaiyandi:

you just scale out.

Howard Marks:

to more dedupe, you just add a few more servers as opposed to stealing 5%

Howard Marks:

of the cycles of all of your VMware hosts, which means you now have to not just

Howard Marks:

buy servers, but you have to buy another VMware host, another VMware license.

Howard Marks:

All the other stuff you put on a VMware host starts to add up.

Prasanna Malaiyandi:

Yeah.

Prasanna Malaiyandi:

Gotcha.

W. Curtis Preston:

Well since, well, since you stepped into my neighborhood

W. Curtis Preston:

now Howard, I will have to say that source-side dedupe done correctly,

W. Curtis Preston:

speeds up the backup and reduces the CPU utilization on the client.

W. Curtis Preston:

But I, I want, I can't speak to the, to the implementations

W. Curtis Preston:

you were talking about.

W. Curtis Preston:

Um, I can only speak to the one that I am obviously very familiar with.

W. Curtis Preston:

Um, cuz there, you know, there that, that is the oft-discussed

W. Curtis Preston:

thing of like, well there is a,

Howard Marks:

It's

W. Curtis Preston:

know, there's a

W. Curtis Preston:

pen.

Howard Marks:

it's also a different case because of the assumed

Howard Marks:

bandwidth at all the stages.

W. Curtis Preston:

right.

Howard Marks:

You know, I'm, I'm kind of assuming that there's

Howard Marks:

a lot of bandwidth for short

Howard Marks:

distances in the data center

W. Curtis Preston:

Well, all right.

W. Curtis Preston:

I, I concede this battle, Howard, I lay down my sword.

W. Curtis Preston:

Um,

Howard Marks:

Okay.

Howard Marks:

I

W. Curtis Preston:

you know, I mean, you, what's that

W. Curtis Preston:

You expect

W. Curtis Preston:

to what?

Howard Marks:

in the mail,

W. Curtis Preston:

Um, yeah, I'll send you, I'll send you something.

W. Curtis Preston:

Um, alright, well, uh, Howard, it's been great.

W. Curtis Preston:

Uh, glad to hear the update and glad to, you know, I, I remember we did,

W. Curtis Preston:

now that I heard you describe it, I, I think we did cover it in the last one,

W. Curtis Preston:

but I think you went deeper this time and that, that's good to hear this idea

Howard Marks:

probably.

W. Curtis Preston:

you can, that you have the, that you have the, the bandwidth

W. Curtis Preston:

to, to, to, to how many different ways did you say you try each block

W. Curtis Preston:

for

Howard Marks:

there's five compression algorithms and, and then there's a strong

Howard Marks:

hash and a number of similarity hashes.

Howard Marks:

I can't remember offhand what they are,

W. Curtis Preston:

Gotcha.

W. Curtis Preston:

I got, I thought I heard you say 15 total ways.

W. Curtis Preston:

I thought I

W. Curtis Preston:

heard you say

W. Curtis Preston:

that.

Howard Marks:

It's on that order

W. Curtis Preston:

Gotcha.

W. Curtis Preston:

So the fact that you can take each chunk and try 15 different ways to

W. Curtis Preston:

compress it and pick the one that works the best is pretty damn cool.
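
The "try several ways and keep the best" idea can be sketched in a few lines. This is a hedged illustration, not Vast's code: the five algorithms mentioned in the conversation are not named, so three Python standard-library codecs stand in for them, and `CODECS` and `best_compression` are hypothetical names. The candidates run in parallel, which echoes the point about parallelizing the work.

```python
from __future__ import annotations

import bz2
import lzma
import zlib
from concurrent.futures import ThreadPoolExecutor

# Assumption: three stdlib codecs stand in for the unnamed algorithms.
CODECS = {
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


def best_compression(chunk: bytes) -> tuple[str, bytes]:
    """Compress chunk with every codec and keep the smallest result."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, chunk) for name, fn in CODECS.items()}
        results = {name: f.result() for name, f in futures.items()}
    # Store the chunk uncompressed if no codec beats the original size.
    results["raw"] = chunk
    winner = min(results, key=lambda name: len(results[name]))
    return winner, results[winner]
```

The winning codec's name has to be stored next to the chunk so the reader knows how to decompress it later.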

W. Curtis Preston:

Um, and, um, you

W. Curtis Preston:

know, it's just on

Howard Marks:

of it's just cuz we can parallelize it so well

Prasanna Malaiyandi:

Yeah.

W. Curtis Preston:

that too, right?

W. Curtis Preston:

Uh, the

W. Curtis Preston:

fact that, you can, you know, that's a.

Howard Marks:

one takes.

Howard Marks:

We're doing a lot.

Prasanna Malaiyandi:

Yeah,

W. Curtis Preston:

it's that, that beauty of the scale out architecture, right?

W. Curtis Preston:

That you can just pass that out, um, like that.

W. Curtis Preston:

All right, well, thanks for coming back, especially to, it's really funny

W. Curtis Preston:

how we had you, you're like, you're listening and you're like, Hey, you said

W. Curtis Preston:

mean things about the way I do things.

W. Curtis Preston:

I will, I accept your challenge.

W. Curtis Preston:

And I'm like, all right, come on back.

W. Curtis Preston:

Uh, happy to do that.

W. Curtis Preston:

And we'll do that with other people.

W. Curtis Preston:

By the way, if you're out there listening and you're like, the thing that Curtis or

W. Curtis Preston:

Prasanna said is wrong, we will be happy to have you on and have us prove to you

W. Curtis Preston:

why you're wrong or, or in this case,

W. Curtis Preston:

In this case, uh, right.

W. Curtis Preston:

Yeah, we concede.

W. Curtis Preston:

Well, I mean, I mean, I think my concerns are certainly valid and

W. Curtis Preston:

there are certainly vendors out there that are like, yes, we can

W. Curtis Preston:

certainly sell you this appliance for the purposes of backup, because

W. Curtis Preston:

recovery speed is really important.

W. Curtis Preston:

I'm like, but it costs five times the cost of this thing over here.

W. Curtis Preston:

I don't, like, how much better could it possibly be? Anyway,

W. Curtis Preston:

so that's, that's where those arguments tend to come from, so.

W. Curtis Preston:

Alright, well thanks.

W. Curtis Preston:

Thanks Howard for

Howard Marks:

and given, given the marketplace, they're

Howard Marks:

not completely unreasonable.

Howard Marks:

Uh, you know, we, we just do things sufficiently different that,

Howard Marks:

you know, if you think restore speed's important, then we do,

W. Curtis Preston:

right, right.

Prasanna Malaiyandi:

Yep.

W. Curtis Preston:

Yeah.

W. Curtis Preston:

Yeah.

W. Curtis Preston:

Ransomware.

W. Curtis Preston:

Ransomware.

W. Curtis Preston:

All right.

W. Curtis Preston:

Once again, ransomware, you know, I don't know.

W. Curtis Preston:

What do you, what do you call it?

W. Curtis Preston:

Uh, trumps all, um, although I don't enjoy that word as much

W. Curtis Preston:

as I used to for some reason.

W. Curtis Preston:

Um, anyways, thanks for, thanks for your questions for,

Prasanna Malaiyandi:

Uh, I try.

Prasanna Malaiyandi:

I try.

Prasanna Malaiyandi:

And Howard, great to have you back on the podcast.

Prasanna Malaiyandi:

Hopefully you'll come again.

Howard Marks:

Always a pleasure.

Howard Marks:

As long as I keep winning, I'll keep coming back.

W. Curtis Preston:

And thank you to our listeners.

W. Curtis Preston:

Uh, you know, we're nothing without you.

W. Curtis Preston:

Remember to subscribe so that you can restore it all.