No such thing as a “Pay as you go” appliance

I’ve never seen an appliance solution that I would call “pay as you go.”  I might call it “pay as you grow,” but never “pay as you go.”  There is a distinct difference between the two.

What is “pay as you go?”

I’ll give you a perfect example.  BackupCentral.com runs on a cPanel-based VM, and cPanel can automatically copy the backups of my account to an S3 bucket.  I blogged about how to do that here.

I tell cPanel to keep a week of daily backups, four weeks of weekly backups, and three months of monthly backups.  A backup of backupcentral.com is about 20 GB, and the way I store those backups in S3, I have about fifteen copies.  That’s a total of about 300 GB of data stored in Amazon S3 at any given time.

Last time I checked, Amazon bills me about $.38/month.  If I change my mind and decrease my retention, my bill drops.  If I told cPanel not to store the three monthly backups, my monthly bill would decrease by about 20%.  If I increased retention to six monthly backups, my monthly bill would increase by about 20%.
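
If it helps to see the arithmetic, here’s a minimal sketch of how retention drives a pay-as-you-go bill.  Only the retention counts and backup size come from the description above; the per-GB rate is a made-up archive-tier number, not what Amazon actually charges me.

```python
# Minimal sketch: a pay-as-you-go bill is (copies retained) x (backup size) x (rate).
# The $/GB-month rate below is an assumed archive-tier price, not my actual bill.
BACKUP_SIZE_GB = 20
PRICE_PER_GB_MONTH = 0.004   # hypothetical rate

def monthly_cost(daily=7, weekly=4, monthly=3):
    copies = daily + weekly + monthly
    return copies * BACKUP_SIZE_GB * PRICE_PER_GB_MONTH

print(f"Current retention (14 copies): ${monthly_cost():.2f}/month")
print(f"Drop the monthlies:            ${monthly_cost(monthly=0):.2f}/month")  # ~20% less
print(f"Keep six monthlies:            ${monthly_cost(monthly=6):.2f}/month")  # ~20% more
```

Change the retention arguments and the bill changes with them; that’s the whole point of pay as you go.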

What is “pay as you grow?”


Instead of using S3 — which automatically ensures my data is copied to three locations — I could buy three FTP servers and tell Cpanel to back up to them. I would buy the smallest servers I could find. Each server would need to be capable of storing 300 GB of data.  So let’s say I buy three servers with 500 GB hard drives, to allow for some growth.

Time will pass and backupcentral.com will grow.  That is the nature of things, right?  At some point, I will need more than 500 GB of storage to hold backupcentral.com.  I’ll need to buy and install another hard drive in each server.

Pay as you grow always starts with a purchase of some hardware, more than you need at the time.  This is done to allow for some growth.  Typically you buy enough hardware to hold three years of growth.  Then a few years later, when you outgrow that hardware, you either replace it with a bigger one (if it’s fully depreciated) or you grow it by adding more nodes/blocks/chunks/bricks/whatever.

Every time you do this, you are buying more than you need at that moment, because you don’t want to have to keep buying and installing new hardware every month.  Even if the hardware is the easiest in the world to buy and install, pay as you grow is still a pain, so you minimize the number of times you have to do it.  And that means you always buy more than you need.

What’s your point, Curtis?

The company I work for (Druva) has competitors that sell “pay as you grow” appliances, but they often refer to them as “pay as you go.”  And I think the distinction is important. All of them start by selling you a multi-node solution for onsite storage, and (usually) another multi-node solution for offsite storage. These things cost hundreds of thousands of dollars just to start backing up a few terabytes.

It is in their best interests (for multiple reasons) to over-provision and over-sell their appliance configuration.  If they do oversize it, nobody’s going to refund your money when that appliance is fully depreciated and you find out you bought way more than you needed for the last three to five years.

What if you under-provision it?  Then you’d have to deal with whatever the upgrade process is sooner than you’d like.  Let’s say you only buy enough to handle one year of growth.  The problem with that is now you’re dealing with the capital process every year for a very crucial part of your infrastructure.  Yuck.

In contrast, Druva customers never buy any appliances from us.  They simply install our software client and start backing up to our cloud-based system that runs in AWS.  There’s no onsite appliance to buy, nor do they need a second appliance to get the data offsite.  (There is an appliance we can rent them to help seed their data, but they do not have to buy it.)  In our design, data is already offsite.  Meanwhile, the customer pays only for the amount of storage they consume after their data has been globally deduplicated and compressed.

In a true pay as you go system, no customer ever pays for anything they don’t consume. Customers may pay up front for future consumption just to make the purchasing process easier, but if they buy too much capacity, anything they paid for in advance simply gets applied to the next renewal.  There is no wasted capacity and no wasted compute.

In one model (pay as you grow), you have wasted money, power, and cooling while your over-provisioned system sits there waiting for future data.  In the other model (pay as you go), you pay only for what you consume, with no wasted power and cooling.

What do you think?  Is this an important difference?

 

----- Signature and Disclaimer -----

Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.

Bandwidth: Backup Design Problem #2

Tape is the first challenge of designing and maintaining a traditional backup system, and it is solved by not using tape for operational backups.  The second challenge is getting enough bandwidth to get the job done.


This is a major problem with any backup software product that does occasional full backups, which is most of the products running in today’s datacenter. Products that do full-file incremental backups also have this problem, although to a lesser degree.  (A full-file incremental backup is one that backs up an entire file when even one byte has changed.)

This is such a problem that many people would agree that backups stress the network more than anything else. This is one of the main reasons people run backups at night.

This problem has been around for a long time.  I remember one time I was testing backups over the weekend, and accidentally set things up for backups to kick off at 10 AM the next day — which happened to be Monday. The network came to a screeching halt that day until we figured out what was happening and shut the backups off.

Backup system admins spend a lot of time scheduling their backups so they even out this load.  Some perform full backups only on the weekend, but this really limits the overall capacity of the system.  I prefer to perform 1/7th of the full backups each night if I’m doing weekly full backups, or 1/28th of the full backups each night if I’m doing monthly full backups.

While this increases your system capacity, it also requires constant adjustment to even the full backups out, as the size of systems changes over time. And once you’ve divided the full backups by 28 and spread them out across the month, you’ve created a barrier that you will hit at some point. What do you do when you’re doing as many full backups each night as you can? Buy more bandwidth, of course.
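
Here’s a minimal sketch of one way to stagger full backups, assuming you’re willing to assign each client a fixed slot by hashing its name.  The host names are hypothetical, and, as noted above, this balances the number of clients per night rather than their sizes, so it still needs periodic adjustment.

```python
# Minimal sketch: deterministically spread full backups across N nights.
# This balances client counts, not client sizes, so large hosts may still
# need to be moved by hand as they grow.
import hashlib

def full_backup_slot(hostname: str, nights: int = 28) -> int:
    """Assign a host to one of `nights` slots based on a hash of its name."""
    digest = hashlib.sha256(hostname.encode()).hexdigest()
    return int(digest, 16) % nights

for host in ["web01", "db01", "files01", "mail01"]:   # hypothetical hosts
    print(f"{host}: full backup on night {full_backup_slot(host) + 1} of 28")
```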

Hasn’t this been fixed?

 

Luckily this problem has been fixed. Products and services that have switched to block-level incremental-forever backups need significantly less bandwidth than those that do not.  A typical block-level incremental uses less than one-tenth the bandwidth of a typical file-level incremental backup, and less than one-hundredth the bandwidth of a typical full backup.

Another design element of modern backup products and services is that they use global deduplication, which only backs up blocks that have changed and haven’t been seen on any other system. If a given file is present on multiple systems, it only needs to be backed up from one of them. This significantly lowers the amount of bandwidth needed to perform a backup.
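
To make the idea concrete, here’s an illustrative sketch of block-level, globally deduplicated backup selection.  The 4 MiB block size and the in-memory fingerprint set are assumptions that stand in for a real product’s chunking logic and global index; this is not how any particular product implements it.

```python
# Illustrative sketch: only blocks whose fingerprints have never been seen
# anywhere (on any client) are sent over the network.
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024      # assumed 4 MiB blocks
global_index = set()              # stands in for a shared, global dedupe index

def blocks_to_send(path):
    """Yield (fingerprint, block) pairs for blocks not already in the index."""
    with open(path, "rb") as f:
        while chunk := f.read(BLOCK_SIZE):
            fingerprint = hashlib.sha256(chunk).hexdigest()
            if fingerprint not in global_index:
                global_index.add(fingerprint)
                yield fingerprint, chunk   # new data: this is all that crosses the wire

# A file that exists on many laptops is uploaded from only the first one;
# every later client finds its fingerprints already present in the index.
```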

Making the impossible possible

 


Lowering the bandwidth requirement creates two previously unheard-of possibilities: Internet-based backups and round-the-clock backups. The network impact of globally deduplicated, block-level incremental backups is so small that in many environments the data can be transferred over the Internet, and backups can often run throughout the day.  And all of this can be done without the scheduling hassle mentioned above.

The better a product identifies changed blocks, and the more granular and global its deduplication, the more these things become possible. One of the best ways to determine how efficient a backup system is with bandwidth is to ask the vendor how much storage is needed to hold 90-180 days of backups. There is a direct relationship between that number and the amount of bandwidth you’re going to need.
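
Here’s a rough sketch of how to turn that vendor answer into a bandwidth estimate.  Every number below is made up for illustration, and the math assumes that whatever is stored beyond the initial copy is roughly the deduplicated daily change multiplied by the retention period.

```python
# Rough, illustrative math connecting "storage needed for N days of backups"
# to the bandwidth those backups will need.  All inputs are assumptions.
protected_tb   = 100     # size of the data being protected
stored_tb      = 130     # what the vendor says 90 days of backups occupy
retention_days = 90
window_hours   = 8       # nightly backup window

daily_unique_tb = (stored_tb - protected_tb) / retention_days
mbps_needed = daily_unique_tb * 1e6 * 8 / (window_hours * 3600)
print(f"~{daily_unique_tb * 1000:.0f} GB of unique data per day "
      f"=> ~{mbps_needed:.0f} Mbit/s during the window")
```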

----- Signature and Disclaimer -----

Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.

First Impressions of AWS Re:Invent

We’ve come a long way, baby.  I worked at Amazon when they were just an Internet bookseller. I put in the first enterprise-wide backup system back in 1998.  I was there on the day they came out with the “universal product locator,” which is the day they sold something other than books.

Oh, and if you’re here, make sure you stop by our booth and meet Data!  We have Brent Spiner from Star Trek: The Next Generation in our booth and at our party. Details here.

It’s a big show

There are definitely tens of thousands of people here.  Amazon says it’s 40K, most of whom are actual customers.  That’s a refreshing change from some shows I’ve been at that are more about partner meetings than potential customer meetings.  Now that I’m viewing this show as a sponsor (since I now work at Druva), that’s really important.  Almost everyone here is someone we could potentially sell to.

Of course, AWS being what it is, there is everything here from a very small company with one VM or a couple of GB in S3 to a large enterprise.  Amazon says it’s more the latter than the former, of course.  But as a company with solutions aimed at the middle enterprise, figuring out which kind of attendee we’re talking to is the first thing we have to do.

The show is actually too big

It’s the first large show I’ve been to in Vegas that is spread across multiple venues. And there’s a sign telling you to expect it to take 30 minutes to travel between venues.

There are plenty of cities that can host an event of this size without requiring people to travel between venues.  (I live in one of them.  San Diego hosts Comic-Con, which is three times the size of this show.)  So I’m curious as to why Amazon has chosen Las Vegas.

The show is also sold out.  Druva has a large team here, but it would be larger if we were able to get more tickets. Even as a sponsor, we’re unable to buy more tickets for people just to work the booth.  Why is that?  Either it’s a marketing tactic or they’ve actually hit the agreed-upon capacity of the venues they chose. Either one is totally possible.

Remember when?

Amazon only sold books?  Remember when they only sold “stuff,” and weren’t the largest IaaS vendor on the planet?  Remember when we said no one would run production on Windows?  Remember when we said no one would move production to the cloud?  Ah, those were the days.

As a company that runs its entire world on Amazon, it’s now hard to imagine a world without them.  Their ability to scale infrastructure and applications like DynamoDB has enabled an entirely new class of production applications that simply weren’t possible before.  Druva is able to do things for our customers because we’re built as a cloud-native application.  We can dynamically and automatically scale (up and down) every part of our infrastructure as our customers’ needs demand.  This gives us scalability without the limits associated with typical backup apps.  This is why some of the largest organizations in the world trust us with their user data and ROBO data. And none of that would be possible without something like AWS.

Like I said, we’ve come a long way, baby.

 

----- Signature and Disclaimer -----

Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.

Why tape drives are bad for backups

Specifically, this article is about why modern tape drives are a really bad choice to store the initial copy of your backups. It’s been this way for a long time, and I’ve been saying so for at least 10 years, in case anyone thinks I’ve been swayed by my current employer.  Tape is good at some things, but receiving the first copy of your backups isn’t one of them.  There are also reasons why you don’t want to use them for your offsite copy, and I’ll look at those, too.

 

Tape drives are too fast for incremental backups

  • Tape drives are too fast
    • In case you didn’t know it, modern tape drives essentially have two speeds: stop and very fast. Yes, there are variable speed tape drives, but even the slowest speed they run at is still very fast.  For example, the slowest an LTO-7 drive can go using LTO-7 media is 79.99 MB/s native.  Add compression, and you’re at 100-200 MB/s minimum speed!
  • Incremental backups are too slow
    • Most backups are incremental backups, and incremental backups are way too slow. A file-level incremental backup supplies an unpredictable level of throughput, usually measured in single digits of megabytes per second.  That is nowhere near 100-200 MB/s.
  • The speed mismatch is the problem
    • When incoming backups are really slow and the tape drive wants to go very fast, the drive has no choice but to stop, rewind, and start up again. It does this over and over, dragging the tape back and forth across the read/write head in multiple passes (a behavior often called shoe-shining). This wears out the tape and the drive, and it is the number one reason behind tape drive failures in most companies.  Tape drives are simply not the right tool for incoming backups.  Disk drives are much better suited to the task.
  • What about multiplexing?
    • Multiplexing is simultaneously interleaving multiple backups into a single stream in order to create a stream fast enough to keep your tape drive happy. It’s better than nothing, but remember that it helps your backups and hurts your restores.  If you interleave ten backups together during backup, you have to read all ten streams during a restore, throwing away nine of them just to get the one stream you want. That literally makes your restore ten times longer.  If you don’t care about restore speed, it’s great!  (A rough sketch of this math follows the list.)
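
To put rough numbers on the speed mismatch and the multiplexing trade-off, here’s a back-of-the-envelope sketch.  The throughput figures are illustrative assumptions, not measurements of any particular drive or client.

```python
# Back-of-the-envelope sketch of the mismatch described above.
TAPE_MIN_MBPS = 80       # roughly the slowest an LTO-7 drive can stream
INCREMENTAL_MBPS = 5     # an assumed file-level incremental stream

# Fed by one incremental, the drive runs far below its minimum speed,
# so it spends its time stopping, rewinding, and restarting (shoe-shining).
print(f"Drive fed at {INCREMENTAL_MBPS / TAPE_MIN_MBPS:.0%} of its minimum speed")

# Interleave enough streams to keep the drive streaming...
streams = -(-TAPE_MIN_MBPS // INCREMENTAL_MBPS)   # ceiling division
print(f"Streams needed to keep the drive happy: {streams}")

# ...but a single-stream restore now reads and discards the other streams,
# so that restore takes roughly `streams` times longer.
print(f"Restore penalty: ~{streams}x longer for a single stream")
```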

What about offsite copies?

There have been many incidents involving tapes lost or exposed by offsite vaulting companies like Iron Mountain.  Even Iron Mountain’s CEO once admitted that it happens often enough that all tape should be encrypted. I agree with this recommendation: any transported tape ought to be encrypted.

Tape is still the cheapest way to get data offsite if you are using a traditional backup and recovery system. The alternative with such a system is to buy an expensive deduplication appliance to make the daily backup small enough to replicate.  These appliances can be effective, but they are very costly, and there are a lot of limits to their deduplication abilities, many of which make them cost even more to purchase and use.  This is why most people are still using tape to get backups offsite.

If you have your nightly backups stored on disk, it should be possible to get those backups copied over to tape.  That is assuming that your disk target is able to supply a stream fast enough to keep your tape drives happy, and there aren’t any other bottlenecks in the way.  Unfortunately, one or more of those things is often not the case, and your offsite tape copy process becomes as mismatched as your initial backup process.

In other words, tape is often the cheapest way to get backups offsite, but it’s also the riskiest, as tapes can be lost or exposed in transit. It can also be difficult to configure your backup system to create that offsite tape copy efficiently.

I thought you liked tape?

I do like tape.  In fact, I’m probably one of the biggest proponents of tape.  It has advantages in some areas.  You cannot beat the bandwidth of tape, for example.  There is no faster way to get petabytes of data from one side of the world to another.  Tape is also much better at holding onto data for multiple decades, with a much lower chance of bit rot.  But none of these advantages come into play when talking about day-to-day operational backups.

I know some of you might think that I’m saying this just because I now work at a cloud-based backup company. I will remind you that I’ve been saying these exact words above at my backup seminars for almost ten years.  Tape became a bad place to store your backups the day it started getting faster than the network connection backups were traveling over — and that was a long time ago.

What do you think?  Am I being too hard on tape?

----- Signature and Disclaimer -----

Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.

Is AWS Ready for Production Workloads?

Yes, I know they’re already there.  The question is whether or not Amazon’s infrastructure is ready for them.  And by “ready for them,” I mean “ready for them to be backed up.”  Of course that’s what I meant.  This is backupcentral.com, right?

But as I prepare to go to Amazon Re:Invent after Thanksgiving, I find myself asking this question. Before we look at the protections that are available for AWS data, let’s look at why we need them in the first place.

What are we afraid of?

There is no such thing as the cloud; there is only someone else’s datacenter.  The cloud is not magic; the things that can take out your datacenter can take out the cloud.  Yes, it’s super resilient and time-tested.  I would trust Amazon’s resources over any datacenter I’ve ever been in.  But it’s not magic and it’s not impenetrable, especially to stupidity.

  • Amazon zone/site failure
    • This is probably the thing Amazon customers are most prepared for.  All Amazon resources are continuously replicated to three geographically dispersed locations.  Something like 9/11, or even a massive hurricane or flood, should not affect the availability or integrity of data stored in AWS.  Caveat: replication is asynchronous, so you may lose some data.  But you should not lose your dataset.
  • Accidental deletion/corruption of a resource
    • People are, well, people. They do dumb things.  I’ve done dumb things. I can’t tell you the number of times I’ve accidentally deleted something I needed. And, no, I didn’t always have a backup.  Man, it sucks when that happens.  Admins can accidentally delete volumes, VMs, databases, and any other kind of resource you can think of.  In fact, one could argue that virtualization and the cloud make it easier to do more dumb things.  No one ever accidentally deleted a server when that meant pulling it out of the rack.  Backups protect against stupidity.
  • Malicious damage to a resource
    • Hackers suck. And they are out there. WordPress tells me how many people try to hack my server every day.  And they are absolutely targeting companies with malware, ransomware, and directed hacking attacks.  The problem I have with many of the methods people use to protect their Amazon resources is that they do not take this threat into account, and I think it is the most likely danger in a cloud datacenter.  EC2 snapshots and RDS snapshots (which are actually copies) are stored in the same account they are backing up.  It takes extra effort and extra cost to move those snapshots over to another account, and no one seems to be thinking about that.  People think about the resiliency and protection that Amazon offers (which it does), but they forget that if a hacker takes control of their account, they are in deep doodoo.  Just ask codespaces.com.  Oh wait, you can’t.  Because a hacker deleted them.
  • Catastrophic failure of Amazon itself
    • This is extremely unlikely to happen, but it could. What if there were some type of rolling bug (or malware) that somehow affected all instances in all AWS accounts?  Even cross-account copies of data would go bye-bye.  Like I said, this is extremely unlikely, but it’s out there.

How do we protect against these things?

I’m going to write some other blog posts about how people protect their AWS data, but here’s a quick summary.

  • Automated Snapshots
    • As I said before, these aren’t snapshots in the traditional sense of the word.  These are actually backups.   You can use the AWS Ops Automator, for example, to regularly and automatically make a “snapshot” of your EC2 instance.  The first “snapshot” copies the entire EBS volume to S3.  Subsequent “snapshots” are incremental copies of blocks that have changed since the last snapshot.  I’m going to post more on these tools later.  Suffice it to say they’re better than nothing, but they leave Mr. Backup feeling a little queasy.
  • Manual copying of snapshots to another account
    • Amazon provides command-line and PowerShell tools that can be used to copy snapshots to another account.  If I were relying on snapshots for data protection, that’s exactly what I would do: I would have a central account that holds all my snapshots, and that account would be locked down tighter than any other account.  (There’s a rough sketch of this idea after this list.)  The downside to this approach is that it isn’t automated.  We’re now in scripting and manual scheduling land. For the Unix/Linux folks among us this might be no big deal, but it’s still a step backward for backup technology, to be sure.
  • Home-grown tools
    • You could use rsync or something like that to back up some of your Amazon resources to something outside of Amazon.  Besides relying on scripting and cron, these tools are often very bandwidth-heavy, and you’re likely going to pay heavy egress charges to pull that data down.
  • Third-party tools
    • For some Amazon resources, such as EC2, you could install a third-party backup tool and back up your VMs as if they were real servers.  This would be automated and reportable, and probably the best option from a data protection perspective. The challenge here is that this approach is currently only available for EC2 instances.  We’re starting to see some point tools to back up other things that run in AWS, but I haven’t seen anything yet that tackles the whole thing.
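
As a rough illustration of the idea above of copying snapshots to a locked-down central account, here’s a hedged boto3 sketch.  The account ID, snapshot ID, region, and profile name are placeholders, it assumes an unencrypted EBS snapshot (encrypted ones also require sharing the KMS key), and it is not a description of any particular product’s method.

```python
# Sketch: share an EBS snapshot with a central backup account, then copy it
# there so the production account can no longer delete the copy.
# All identifiers below are placeholders.
import boto3

SOURCE_REGION = "us-east-1"
CENTRAL_ACCOUNT_ID = "111111111111"          # hypothetical backup account
SNAPSHOT_ID = "snap-0123456789abcdef0"       # placeholder snapshot

# In the production account: grant the backup account permission to copy it.
src_ec2 = boto3.client("ec2", region_name=SOURCE_REGION)
src_ec2.modify_snapshot_attribute(
    SnapshotId=SNAPSHOT_ID,
    Attribute="createVolumePermission",
    OperationType="add",
    UserIds=[CENTRAL_ACCOUNT_ID],
)

# In the central account (separate credentials/profile): make an independent copy.
central = boto3.Session(profile_name="central-backups")   # assumed CLI profile
dst_ec2 = central.client("ec2", region_name=SOURCE_REGION)
dst_ec2.copy_snapshot(
    SourceRegion=SOURCE_REGION,
    SourceSnapshotId=SNAPSHOT_ID,
    Description="Cross-account copy for ransomware protection",
)
```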

So is it ready?

As I said earlier, an AWS datacenter is probably more resilient and secure than most datacenters.  AWS is ready for your data. But I do think there is work to be done on the data protection front.  Right now it feels a little like déjà vu.  When I start to think about shell scripts and cron, I start feeling like it’s the 90s.  It’s been 17 years since I’ve revisited hostdump.sh, the tool I wrote to automatically back up filesystems on a whole bunch of Unix systems.  I really don’t want to go back to those days.

----- Signature and Disclaimer -----

Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.

Is a portable hard drive the best way to back up a laptop?

Short answer: no, it’s the worst way


Alright, the worst way would be to not back it up at all.  Sadly that’s the most common way. Other than that, the worst way would be to back it up to a portable hard drive.

Portable hard drives are unreliable

I have used portable hard drives for years, and I can’t tell you how many of them have failed in that time.  Let’s just say it’s in the dozens.  It could be the physics of putting a hard drive in such a small container.  That would explain why they fail much more often than the same drives in a laptop.  Maybe it gets too hot in those enclosures; maybe just being small like that lets them get roughed up more than they do in a laptop.  All I know is they fail much more often than any other hard drive I’ve ever had.  When the hard drive itself doesn’t fail, the electronics around it fail.

It’s with your laptop or PC


Using a portable hard drive as your backup means you’re probably storing it next to your PC or putting it into your laptop bag when you travel.  That means it’s right next to the thing it’s protecting.  So when the thing you’re protecting catches fire or gets stolen, your protection goes right along with it.  Remember, you’re just as likely (if not more likely) to have your laptop stolen as you are to have a hard drive failure.

What about DVD backup?

DVDs are more reliable than hard drives, but they have their own problems.  The biggest challenge is that the capacity and throughput are far below what most people need. Hard drives can easily hold many hundreds of gigabytes of data, even terabytes.  Backing that up to even Blu-ray discs is going to take a lot of discs and a lot of time.  The transfer rate of burning something in with a laser is pretty slow.
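
To put rough numbers on that, here’s a quick sketch.  The capacities and burn speed are ballpark assumptions (single-layer Blu-ray burned at roughly 8x), not benchmarks.

```python
# Quick, illustrative estimate of what optical backup of a laptop really means.
data_gb       = 500     # an assumed laptop drive's worth of data
disc_gb       = 25      # single-layer Blu-ray capacity
burn_mb_per_s = 36      # roughly an 8x Blu-ray burn speed

discs = -(-data_gb // disc_gb)                    # ceiling division
hours = data_gb * 1000 / burn_mb_per_s / 3600
swap_minutes = disc_gb * 1000 / burn_mb_per_s / 60
print(f"{discs} discs, ~{hours:.1f} hours of burning, "
      f"with a disc swap roughly every {swap_minutes:.0f} minutes")
```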

So what do you do, then?

I don’t see any other sensible method than to back it up automatically to a system designed to back up laptops and desktops over the Internet.  This could be a piece of software you purchase and install on systems in your datacenter.  If you go that route, however, you’re going to need to make sure the system works for people who aren’t on the corporate network.

What makes the most sense for this data is a cloud-based data protection system. It would support everyone no matter where they reside.  There are no hard drives to manage, no backup hardware to purchase and manage, and everyone everywhere can back up their computers and access their backups.

What do you think?  Is there a better way to back up laptops and desktops than the cloud?

 

 

----- Signature and Disclaimer -----

Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.

Where does data come from: Laptops & desktops

The datacenter is no longer the center of data.  Data that needs to be protected comes from a variety of sources, most of which are not the datacenter. The first one I’m going to talk about is laptops and desktops.

There was a time when personal computers were used to access company data, rather than create it. In my first corporate job, I remember using a 3270 terminal to run Lotus 1-2-3 or WordPerfect.  Documents created on that terminal were not stored on the terminal; it had no hard drive or floppy drive!


Documents created on that terminal were stored on the company’s servers in the datacenter, and I was responsible for backing up those servers. I remember backing up hpfs01, or HP file server 01, where all that data was stored.

If you wanted to create data, you came to the office and you used the 3270 to do so.  No one took their data home.  No one created data at home.  Even once we added the ability to dial in from your home PC, you used a terminal emulator to telnet into the Lotus or WordPerfect server to do your actual work.

Enter Windows, stage left

I still remember the first time I saw Joe (his real name) using Windows in the office, and I remember he was using some new thing called Microsoft Word. I remember fighting the idea for so many reasons, the first of which was: how was I supposed to back up the data on that guy’s floppy drive?  We forced that user to store any data he created in his home directory on hpfs01.  Problem solved.

We weren’t in danger of having Joe take his work home.  His PC was strapped to his desk, as laptops just weren’t a thing yet. I mean, come on, who would want to bring one of these things home?  (From http://www.xs4all.nl/~fjkraan/comp/ibm5140/ )

Enter the laptop

Once laptops became feasible in the mid to late 90s, things got more difficult. Many companies staved off this problem with corporate policies that forced employees to store data on the company server.

For a variety of reasons these approaches stopped working in the corporate world. People became more and more used to creating and storing data on their local PC or laptop.

A data protection nightmare

The proliferation of data outside the datacenter has been a problem since the invention of cheap hard drives.  But today it’s impossible to ignore that a significant amount of data resides on desktops and laptops, which is why that data needs to be protected.

It must be protected in a way that preserves it for when that hard drive goes bad, is dropped in a bathtub, or blows up in a battery fire.  All sorts of things can leave you needing a restore when your data lives on your own hard drive.

It also must be protected in a way that allows that data to be easily searched for electronic discovery (ED) requests, because that is the other risk of having data everywhere. Satisfying an ED request across hundreds of laptops can be quite difficult if you don’t have the ability to search for the needle in the haystack.

My next post will be about why portable hard drives are the worst way you can back up this important data.

Check out Druva, a great way to back up this data.

----- Signature and Disclaimer -----

Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.

My head’s in the clouds: I just joined Druva

After almost 25 years of specializing in backup and data protection as an end user, consultant, and analyst, I’ve decided to work for my first vendor.  I started today at Druva.

Why a vendor?  Why Now?

I figured that it was time to put up or shut up. Put my money where my mouth is.  To fully understand this industry I have to experience it from all sides, and that includes the side trying to make it all happen.  I’ve been an end user, a consultant, and an analyst.  Now it’s time to try making it happen.

Why Druva?

I’ve been a fan of cloud-based data protection for some time now, as anyone who ever attended one of my backup schools can attest.  It makes the most sense for the bulk of the market and offers a level of security and availability simply not available with traditional solutions.

Anyone who has heard me speak knows I’m not anti-tape.   In fact, I think tape is a great medium for some things. But it hasn’t been the right medium for operational backup for quite some time.  Obviously more to come on this and other subjects.

But if disk is the right medium for operational backup, how do you get that data offsite to protect against disasters?  There are many answers to this question, but I have felt for a long time the best answer is to back up to the cloud.  If your first backup is to the cloud, then it’s already offsite.

Of course, having your only copy of data in the cloud can be problematic for large restores with a short RTO. This is why Druva has the ability to have a local copy of your data to facilitate such restores.

Druva was founded in 2008 by Jaspreet Singh and Milind Borate, and it has over 4,000 happy customers running its products.  Druva’s first product was inSync, which focuses on protecting and sharing data from desktops, laptops, and cloud applications such as Office 365, G Suite, and Salesforce.com. Druva’s second product is Phoenix, which is designed to protect datacenters.  It protects VMware and Hyper-V workloads, as well as physical machines running Linux or Windows.  One of Druva’s differentiators is that all data, regardless of source or type, is stored in a central deduplicated repository to facilitate data governance, e-discovery, and data mining.  I’ll be talking more about those things as I learn more about the company and its products.

This post was going to be longer, but the first day at my new job turned out to be a lot of work.  So I’ll keep it short and sweet. Mr. Backup has joined Druva!

Keep it cloudy, my friends.

----- Signature and Disclaimer -----

Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.

Not possible to export QuickBooks Online data to QuickBooks Desktop

Intuit has created a catch 22 that can only be resolved with professional help.  The good news is they’ll help you.  The bad news is you might have to pay for it.

Suppose you’ve been using Quickbooks Online and have decided that you would like to switch to the Desktop Edition for whatever reason.  No problem.  Just google how to export your Quickbooks Online data to the desktop edition.  Google’s quick result article tells you just how to do it.  Except the instructions don’t work.  Bad Google.

So you go into Quickbooks Online and search “Export data to desktop” and you’ll find an article that has better instructions.  You need Internet Explorer.  But I’m on a Mac.  <sigh>


So I find a Windows machine so I can run Internet Explorer.  I get the login screen and I try to log in.  It just spins and spins.  So I Google “can’t login to Quickbooks Online with Internet Explorer.” I find this:

Ugh.  So I call tech support and ask them what to do.  They recommend I install IE 10.  You know, the version that was replaced over three years ago.

Except when I try to install IE 10, it says it won’t install.  Maybe I need to uninstall IE 11 first, right?  Well, it doesn’t show up in the “Uninstall Software” dialog.

So they require me to use a piece of software then tell me that’s not the best software to use.  Just wonderful.

I’m on hold right now.  They tell me that because I’m in this catch 22, they’ll do the full service export for free.  Except that now I’m being grilled and being told that it should work.  Except it doesn’t.  And your own site says it doesn’t.

So…

 

----- Signature and Disclaimer -----

Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.

Spectra takes aim at Amazon Glacier

I recently attended the Spectra Logic Deep Storage Summit in Boulder, Colorado.  (They paid for all travel and meals during the trip, but no other remuneration was offered.)  Their big announcement was a product that is aimed solidly at Amazon Glacier: Spectra ArcticBlue.

ArcticBlue is an object-based disk system, starting at 300 usable TB and going up to over 5 PB, that sits directly in front of a Spectra tape library.  It's aimed squarely at Amazon Glacier because its interface is S3.  You can do a get or put to it just like you would to a bucket in Amazon, except the data is stored in the (up to) 5 PB disk cache and then on tape in a Spectra tape library, which scales to multiple exabytes. The product is built on top of the BlackPearl architecture that they announced two years ago.
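
To illustrate what an S3 interface means in practice, here's a minimal boto3 sketch that points a standard S3 SDK at an appliance endpoint instead of AWS.  The endpoint URL, credentials, and bucket name are placeholders, and I'm assuming the appliance accepts standard S3 calls from off-the-shelf SDKs, per Spectra's description, rather than documenting its actual configuration.

```python
# Minimal sketch: talk to an S3-compatible endpoint with a standard SDK.
# Endpoint, credentials, and bucket are placeholders, not real values.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://arcticblue.example.internal",   # hypothetical appliance endpoint
    aws_access_key_id="LOCAL_KEY",
    aws_secret_access_key="LOCAL_SECRET",
)

# The same get/put calls you would make against Amazon S3; the appliance
# decides whether the object lands on the disk cache, tape, or both.
s3.put_object(Bucket="archive-bucket", Key="backups/full-2015-10.tar", Body=b"...")
obj = s3.get_object(Bucket="archive-bucket", Key="backups/full-2015-10.tar")
```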

Two products came immediately to mind when thinking about this announcement: Quantum's Lattus and Amazon's Glacier.  It would seem that Spectra is actually aiming solidly at both.  Here are a few things that are very interesting about the product.

Erasure Coding

ArcticBlue uses erasure coding — not RAID — to ensure that data on disk is not corrupted or lost.  Disks are grouped into "bands" of 23 drives, which are part of a 20+3 erasure coding group.  This very wide band offers protection from up to three simultaneous disk failures with very minimal overhead.  If you're not familiar with erasure coding and how it is definitely not RAID, check out this article from ComputerWeekly.
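
To see why a 20+3 band counts as very minimal overhead, here's the simple capacity math, with a couple of familiar layouts for comparison.  This is just arithmetic, not a claim about how Spectra implements its erasure coding.

```python
# Capacity overhead of a few protection layouts (data units + parity units).
def overhead(data_units: int, parity_units: int) -> float:
    """Extra raw capacity required per unit of usable capacity."""
    return parity_units / data_units

schemes = {
    "20+3 erasure coding": (20, 3),
    "RAID 6 (8+2)":        (8, 2),
    "3-way replication":   (1, 2),
}
for name, (d, p) in schemes.items():
    print(f"{name}: {overhead(d, p):.0%} overhead, survives {p} drive failures")
```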

Power-Down at the Band Level

When an application does a get or put to or from an S3 bucket, only the units that comprise that bucket need to be powered on.  This means that the rest of the system can be powered off, both to save power and cooling and to extend the life of the unit.  This is why they are advertising a 7-year lifespan for this product rather than a 3-year lifespan.  This was one big difference I saw between the ArcticBlue unit and Lattus: Lattus does not appear to have any power-down features.

Genetic Dispersion

An S3 bucket can be configured to span both disk and tape, ensuring that any files put onto disk are also put onto tape.  It could even span multiple tape types, since Spectra supports both LTO and IBM TS drives.  This means that the system could ensure that every file is always on disk, LTO, and IBM TS tape.  Spectra referred to this as increasing genetic dispersion.  Genetic dispersion protects against multiple types of failures by putting data on multiple different types of media.  The system can also be told to make sure one copy is kept offline.

Future iterations of the product could have a bucket that spans locations, so that any data is always copied to multiple locations.

Shingled Magnetic Recording (SMR) drives

A new type of media from Seagate is called Shingled Magnetic Recording, and it allows data tracks to be layered on top of each other, just like shingles on a roof.  The upside is that it increases the density of the disk by about 25%.  The downside is that, like roof shingles, you can't remove a lower layer without removing the upper layer.  Therefore, writing to an SMR drive is a lot like writing to tape.  You can append all you want, but once you want to go back and modify things, you have to erase the whole thing and start over.  Spectra said this is why they were uniquely suited to leverage these drives.  (Their marketing slick says, "It took a tape company to unleash the power of disk.")  Using these drives requires advanced planning and logistics that they claim are built into their system from day one.

Why would you use such drives, you may ask?  They're cheaper and bigger while being smaller.  That is, the drives have bigger capacities than are possible without SMR today, which lets you put more data in less space and also save money.

TCO

The most interesting part to me was when they compared the TCO of having your own S3 cloud onsite using ArcticBlue vs. doing the same thing with Glacier or S3.  I have not delved into their TCO model, but according to them, ArcticBlue is at least an order of magnitude cheaper than Glacier.  So there's that.

I'd be interested in hearing from anyone who actually deploys this product in his or her datacenter.


----- Signature and Disclaimer -----

Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.