Why tape drives are bad for backups

Specifically, this article is about why modern tape drives are a really bad choice to store the initial copy of your backups. It’s been this way for a long time, and I’ve been saying so for at least 10 years, in case anyone thinks I’ve been swayed by my current employer.  Tape is good at some things, but receiving the first copy of your backups isn’t one of them.  There are also reasons why you don’t want to use them for your offsite copy, and I’ll look at those, too.

Tape drives are too fast for incremental backups

  • Tape drives are too fast
    • In case you didn’t know it, modern tape drives essentially have two speeds: stop and very fast. Yes, there are variable speed tape drives, but even the slowest speed they run at is still very fast.  For example, the slowest an LTO-7 drive can go using LTO-7 media is 79.99 MB/s native.  Add compression, and you’re at 100-200 MB/s minimum speed!
  • Incremental backups are too slow
    • Most backups are incremental backups, and incremental backups are way too slow. A file-level incremental backup delivers a highly variable throughput that is usually measured in single digits of megabytes per second.  That is nowhere near 100-200 MB/s.
  • The speed mismatch is the problem
    • When incoming backups are really slow and the tape drive wants to go very fast, the drive has no choice but to stop, rewind, and start up again. It does this over and over, dragging the tape back and forth across the read/write head in multiple passes. This wears out both the tape and the drive, and it is the number one reason behind tape drive failures in most companies.  Tape drives are simply not the right tool for incoming backups.  Disk drives are much better suited to the task.
  • What about multiplexing?
    • Multiplexing simultaneously interleaves multiple backups into a single stream in order to create a stream fast enough to keep your tape drive happy. It's better than nothing, but remember that it helps your backups and hurts your restores.  If you interleave ten backups together during backup, you have to read all ten streams during a restore — and throw away nine of them just to get the one stream you want. That makes your restore roughly ten times longer (see the quick math below).  If you don't care about restore speed, then multiplexing is great!
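
To make that restore penalty concrete, here's a quick back-of-the-envelope sketch in shell.  The numbers (a 500 GB restore, ten interleaved streams, a 150 MB/s drive) are illustrative assumptions, not measurements from any particular drive.

# Illustrative only: restoring 500 GB from a tape written with 10
# multiplexed streams, read back at 150 MB/s.
RESTORE_GB=500     # the data you actually want back
STREAMS=10         # backups interleaved on the tape
DRIVE_MBPS=150     # sustained read speed of the drive

awk -v gb="$RESTORE_GB" -v mux="$STREAMS" -v speed="$DRIVE_MBPS" 'BEGIN {
  single = gb * 1024 / speed / 60        # minutes to read just your stream
  muxed  = gb * mux * 1024 / speed / 60  # minutes to read all interleaved streams
  printf "Single-stream restore: %.0f minutes\n", single
  printf "Multiplexed restore:   %.0f minutes (%.0fx longer)\n", muxed, muxed / single
}'

The drive has to read ten times the data to hand you the same restore, which is exactly the trade-off you are making against your restores.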

What about offsite copies?

There have been many incidents involving tapes lost or exposed by offsite vaulting companies like Iron Mountain.  Even Iron Mountain's CEO once admitted that it happens often enough that all tapes should be encrypted. I agree with that recommendation — any transported tape ought to be encrypted.

Tape is still the cheapest way to get data offsite if you are using a traditional backup and recovery system. The alternative with such a system is to buy an expensive deduplication appliance to make the daily backup small enough to replicate offsite. These appliances can be effective, but they are very costly, and there are a lot of limits to their deduplication abilities — many of which make them cost even more to purchase and use.  This is why most people are still using tape to get backups offsite.

If you have your nightly backups stored on disk, it should be possible to get those backups copied over to tape.  That is assuming that your disk target is able to supply a stream fast enough to keep your tape drives happy, and there aren’t any other bottlenecks in the way.  Unfortunately, one or more of those things is often not the case, and your offsite tape copy process becomes as mismatched as your initial backup process.

In other words, tape is often the cheapest way to get backups offsite, but it's also the riskiest, as tapes are often lost or exposed during transit. It can also be difficult to configure your backup system to create your offsite tape copy in an efficient manner.

I thought you liked tape?

I do like tape.  In fact, I'm probably one of the biggest proponents of tape.  It has advantages in some areas.  You cannot beat the bandwidth of tape, for example.  There is no faster way to get petabytes of data from one side of the world to the other.  Tape is also much better at holding onto data for multiple decades, with a much lower chance of bit rot.  But none of these advantages come into play when we're talking about day-to-day operational backups.

I know some of you might think that I’m saying this just because I now work at a cloud-based backup company. I will remind you that I’ve been saying these exact words above at my backup seminars for almost ten years.  Tape became a bad place to store your backups the day it started getting faster than the network connection backups were traveling over — and that was a long time ago.

What do you think?  Am I being too hard on tape?

----- Signature and Disclaimer -----

For those of you unfamiliar with my work, I've specialized in backup & recovery for 25 years. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.

Is AWS Ready for Production Workloads?

Yes, I know they're already there.  The question is whether or not Amazon's infrastructure is ready for them.  And by "ready for them," I mean "ready for them to be backed up."  Of course that's what I meant.  This is backupcentral.com, right?

But as I prepare to go to Amazon Re:Invent after Thanksgiving, I find myself asking this question. Before we look at the protections that are available for AWS data, let's look at why we need them in the first place.

What are we afraid of?

There is no such thing as the cloud; there is only someone else's datacenter.  The cloud is not magic; the things that can take out your datacenter can take out the cloud.  Yes, it's super resilient and time-tested.  I would trust Amazon's resources over any datacenter I've ever been in.  But it's not magic and it's not impenetrable – especially by stupidity.

  • Amazon zone/site failure
    • This is probably the thing Amazon customers are most prepared for.  Amazon's core storage services, such as S3, automatically replicate data across multiple geographically dispersed facilities.  Something like 9/11, or even a massive hurricane or flood, should not affect the availability or integrity of data stored in AWS.  Caveat: replication is asynchronous, so you may lose some data, but you should not lose your dataset.
  • Accidental deletion/corruption of a resource
    • People are, well, people. They do dumb things.  I've done dumb things. I can't tell you the number of times I've accidentally deleted something I needed. And, no, I didn't always have a backup.  Man, it sucks when that happens.  Admins can accidentally delete volumes, VMs, databases, and any kind of resource you can think of.  In fact, one could argue that virtualization and the cloud make it easier to do more dumb things.  No one ever accidentally deleted a server when that meant pulling it out of the rack.  Backups protect against stupidity.
  • Malicious damage to a resource
    • Hackers suck. And they are out there. WordPress tells me how many people try to hack my server every day.  And they are absolutely targeting companies with malware, ransomware, and directed hacking attacks.  The problem I have with many of the methods people use to protect their Amazon resources is that they do not take this threat into account – and I think it is the most likely danger in a cloud datacenter.  EC2 snapshots and RDS snapshots (which are actually copies) are stored in the same account they are protecting.  It takes extra effort and extra cost to move those snapshots over to another account, and no one seems to be thinking about that.  People think about the resiliency and protection that Amazon offers – which it does – but they forget that if a hacker takes control of their account they are in deep doodoo.  Just ask codespaces.com.  Oh wait, you can't.  Because a hacker deleted them.
  • Catastrophic failure of Amazon itself
    • This is extremely unlikely to happen, but it could. What if there were some type of rolling bug (or malware) that somehow affected all instances across all AWS accounts?  Even cross-account copies of data would go bye-bye.  Like I said, this is extremely unlikely, but the possibility is out there.

How do we protect against these things?

I’m going to write some other blog posts about how people protect their AWS data, but here’s a quick summary.

  • Automated Snapshots
    • As I said before, these aren’t snapshots in the traditional sense of the word.  These are actually backups.   You can use the AWS Ops Automator, for example, to regularly and automatically make a “snapshot” of your EC2 instance.  The first “snapshot” copies the entire EBS volume to S3.  Subsequent “snapshots” are incremental copies of blocks that have changed since the last snapshot.  I’m going to post more on these tools later.  Suffice it to say they’re better than nothing, but they leave Mr. Backup feeling a little queasy.
  • Manual copying of snapshots to another account
    • Amazon provides command-line and PowerShell tools that can be used to copy snapshots to another account; I've sketched what that looks like right after this list.  If I were relying on snapshots for data protection, that's exactly what I would do.  I would have a central account that is used to hold all my snapshots, and that account would be locked down tighter than any other account. The downside to this approach is that it isn't automated.  We're now in scripting and manual scheduling land. For the Unix/Linux folks among us this might be no big deal, but it's still a step backward for backup technology, to be sure.
  • Home-grown tools
    • You could use rsync or something like that to backup some of your Amazon resources to something outside of Amazon.  Besides relying on scripting and cron, these tools are often very bandwidth-heavy, and you’re likely going to pay heavy egress charges to pull that data down.
  • Third-party tools
    • For some Amazon resources, such as EC2, you could install a third-party backup tool and back up your VMs as if they were real servers.  This would be automated and reportable, and it's probably the best option from a data protection perspective. The challenge is that this approach is currently only available for EC2 instances.  We're starting to see some point tools that back up other things that run in AWS, but I haven't seen anything yet that tackles the whole thing.
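
To give you an idea of what that manual cross-account copy looks like, here's a rough sketch using the AWS CLI.  The snapshot ID, account ID, and region are placeholders, and I'm glossing over details like sharing the KMS key for encrypted snapshots, so treat it as a starting point rather than a finished script.

# Sketch only: share an EBS snapshot with a locked-down central backup
# account, then copy it from that account.  IDs and region are placeholders.
SNAPSHOT_ID="snap-0123456789abcdef0"   # snapshot in the production account
BACKUP_ACCOUNT="111122223333"          # the central backup account
REGION="us-east-1"

# Step 1 (run with production-account credentials): let the backup
# account read the snapshot.
aws ec2 modify-snapshot-attribute \
  --region "$REGION" \
  --snapshot-id "$SNAPSHOT_ID" \
  --attribute createVolumePermission \
  --operation-type add \
  --user-ids "$BACKUP_ACCOUNT"

# Step 2 (run with backup-account credentials): make a copy that the
# backup account owns, so deleting the original snapshot cannot touch it.
aws ec2 copy-snapshot \
  --region "$REGION" \
  --source-region "$REGION" \
  --source-snapshot-id "$SNAPSHOT_ID" \
  --description "Cross-account copy of $SNAPSHOT_ID"

Wrap something like that in a scheduler and you have re-invented a small piece of a backup product, which is kind of my point.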

So is it ready?

As I said earlier, an AWS datacenter is probably more resilient and secure than most datacenters.  AWS is ready for your data. But I do think there is work to be done on the data protection front.  Right now it feels a little like deja vu.  When I start to think about shell scripts and cron, I start feeling like it's the 90s.  It's been 17 years since I last revisited hostdump.sh, the tool I wrote to automatically back up filesystems on a whole bunch of Unix systems.  I really don't want to go back to those days.

----- Signature and Disclaimer -----

For those of you unfamiliar with my work, I've specialized in backup & recovery for 25 years. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.

Is a portable hard drive the best way to backup a laptop?

Short answer: no, it’s the worst way

Alright, the worst way would be to not back it up at all.  Sadly that’s the most common way. Other than that, the worst way would be to back it up to a portable hard drive.

Portable hard drives are unreliable

I have used portable hard drives for years, and I can't tell you how many of them have failed in that time.  Let's just say it's in the dozens.  It could be the physics of putting a hard drive in such a small container; that would explain why they fail much more often than the same drives do in a laptop.  Maybe it gets too hot in those enclosures; maybe being that small just lets them get roughed up more than they would in a laptop.  All I know is they fail much more often than any hard drive I've ever had.  And when the hard drive itself doesn't fail, the electronics around it do.

It’s with your laptop or PC

Using a portable hard drive as your backup means you’re probably storing it next to your PC or putting it into your laptop bag when you travel.  That means it’s right next to the thing it’s protecting.  So when the thing you’re protecting catches fire or gets stolen, your protection goes right along with it.  Remember, you’re just as likely (if not more likely) to have your laptop stolen as you are to have a hard drive failure.

What about DVD backup?

DVDs are more reliable than hard drives, but they have their own problems.  The biggest challenge is that their capacity and throughput are way off from what most people need. Hard drives can easily hold many hundreds of gigabytes of data — even terabytes.  Backing that up to optical media, even Blu-ray discs, is going to take a lot of discs and a lot of time.  The transfer rate of burning something in with a laser is pretty slow.

So what do you do, then?

I don’t see any other sensible method than to back it up automatically to a system designed to back up laptops and desktops over the Internet.  This could be a piece of software you purchase and install on systems in your datacenter.  If you go that route, however, you’re going to need to make sure the system works for people who aren’t on the corporate network.

What makes the most sense for this data is a cloud-based data protection system. It would support everyone no matter where they reside.  There are no hard drives to manage, no backup hardware to purchase and manage, and everyone everywhere can back up their computers and access their backups.

What do you think?  Is there a better way to back up laptops and desktops than the cloud?

----- Signature and Disclaimer -----

For those of you unfamiliar with my work, I've specialized in backup & recovery for 25 years. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.

Where does data come from: Laptops & desktops

The datacenter is no longer the center of data.  Data that needs to be protected comes from a variety of sources, most of which are not the datacenter. The first one I’m going to talk about is laptops and desktops.

There was a time when personal computers were used to access company data rather than create it. In my first corporate job, I remember using a 3270 terminal to run Lotus 1-2-3 or WordPerfect.  Documents created at that terminal were not stored on the terminal; it had no hard drive or floppy drive!

(From IBM 3270 on Wikipedia)

Documents created at that terminal were stored on the company's servers in the datacenter, and I was responsible for backing up those servers. I remember backing up hpfs01, or HP file server 01, where all that data was stored.

If you wanted to create data, you came to the office and you used the 3270 to do so.  No one took their data home.  No one created data at home.  Even once we added the ability to dial in from your home PC, you used a terminal emulator to telnet into the Lotus or WordPerfect server to do your actual work.

Enter Windows, stage left

I still remember the first time I saw Joe (his real name) using Windows in the office, and I remember he was using some new thing called Microsoft Word. I remember fighting the idea for so many reasons, the first of which was: how was I supposed to back up the data on that guy's floppy drive?   We forced him to store any data he created in his home directory on hpfs01.  Problem solved.

We weren’t in danger of having Joe take his work home.  His PC was strapped to his desk, as laptops just weren’t a thing yet. I mean, come on, who would want to bring one of these things home?  (From http://www.xs4all.nl/~fjkraan/comp/ibm5140/ )

Enter the laptop

Once laptops became feasible in the mid to late 90s, things got more difficult. Many companies staved off this problem with corporate policies that forced employees to store data on the company server.

For a variety of reasons these approaches stopped working in the corporate world. People became more and more used to creating and storing data on their local PC or laptop.

A data protection nightmare

The proliferation of data outside the datacenter has been a problem since the invention of cheap hard drives.  But today it’s impossible to ignore that a significant amount of data resides on desktops and laptops, which is why that data needs to be protected.

It must be protected in a way that preserves it for when that hard drive goes bad, or is dropped in a bathtub, or blows up in a battery fire.  All sorts of things can leave you needing a restore when your data lives on a local hard drive.

It also must be protected in a way that allows that data to be easily searched for electronic discovery (ED) requests, because that is the other risk of having data everywhere. Satisfying an ED request across hundreds of laptops can be quite difficult if you don't have the ability to find the needle in the haystack.

My next post will be about why portable hard drives are the worst way you can back up this important data.

Check out Druva, a great way to back up this data.

----- Signature and Disclaimer -----

For those of you unfamiliar with my work, I've specialized in backup & recovery for 25 years. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.

My head’s in the clouds: I just joined Druva

After almost 25 years of specializing in backup and data protection as an end user, consultant, and analyst, I’ve decided to work for my first vendor.  I started today at Druva.

Why a vendor?  Why Now?

I figured that it was time to put up or shut up. Put my money where my mouth is.  To fully understand this industry I have to experience it from all sides, and that includes the side trying to make it all happen.  I’ve been an end user, a consultant, and an analyst.  Now it’s time to try making it happen.

Why Druva?

I’ve been a fan of cloud-based data protection for some time now, as anyone who ever attended one of my backup schools can attest.  It makes the most sense for the bulk of the market and offers a level of security and availability simply not available with traditional solutions.

Anyone who has heard me speak knows I’m not anti-tape.   In fact, I think tape is a great medium for some things. But it hasn’t been the right medium for operational backup for quite some time.  Obviously more to come on this and other subjects.

But if disk is the right medium for operational backup, how do you get that data offsite to protect against disasters?  There are many answers to this question, but I have felt for a long time the best answer is to back up to the cloud.  If your first backup is to the cloud, then it’s already offsite.

Of course, having your only copy of data in the cloud can be problematic for large restores with a short RTO. This is why Druva has the ability to have a local copy of your data to facilitate such restores.

Druva was founded in 2008 by Jaspreet Singh and Milind Borate and it has over 4000 happy customers running its products.  Druva’s first product was inSync, which focuses on protecting & sharing data from desktops, laptops, and cloud applications such as Office365, GSuite, and Salesforce.com. Druva’s second product is Phoenix, which is designed to protect datacenters.  It protects VMware and Hyper-V workloads, as well as physical machines running Linux or Windows.   One of  Druva’s differentiators is that all data, regardless of source or type, is stored in a central deduplicated repository to facilitate data governance, ediscovery, and data mining.   I’ll be talking more about those things as I learn more about the company and its products.

This post was going to be longer, but the first day at my new job turned out to be a lot of work.  So I’ll keep it short and sweet. Mr. Backup has joined Druva!

Keep it cloudy, my friends.

----- Signature and Disclaimer -----

For those of you unfamiliar with my work, I've specialized in backup & recovery for 25 years. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.

Not possible to export Quickbooks Online Data to Quickbooks Desktop

Intuit has created a catch-22 that can only be resolved with professional help.  The good news is they'll help you.  The bad news is you might have to pay for it.

Suppose you’ve been using Quickbooks Online and have decided that you would like to switch to the Desktop Edition for whatever reason.  No problem.  Just google how to export your Quickbooks Online data to the desktop edition.  Google’s quick result article tells you just how to do it.  Except the instructions don’t work.  Bad Google.

So you go into Quickbooks Online and search “Export data to desktop” and you’ll find an article that has better instructions.  You need Internet Explorer.  But I’m on a Mac.  <sigh>

So I find a Windows machine so I can run Internet Explorer.  I get the login screen and I try to log in.  It just spins and spins.  So I Google "can't login to Quickbooks Online with Internet Explorer." I find this:

Ugh.  So I call tech support and ask them what to do.  They recommend I install IE 10.  You know, the version that was replaced over three years ago.

Except when I try to install IE 10, it says it won't install.  Maybe I need to uninstall IE 11 first, right?  Well, it doesn't show up in the "Uninstall Software" dialog.

So they require me to use a piece of software then tell me that’s not the best software to use.  Just wonderful.

I'm on hold right now.  They tell me that because I'm in this catch-22, they'll do the full-service export for free.  Except that now I'm being grilled and told that it should work.  Except it doesn't.  And their own site says it doesn't.

So…

----- Signature and Disclaimer -----

For those of you unfamiliar with my work, I've specialized in backup & recovery for 25 years. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.

Spectra takes aim at Amazon Glacier

I recently attended the Spectralogic Deep Storage Summit in Boulder, Colorado.  (They paid for all travel and meals during the trip, but no other remuneration was offered.)  Their big announcement was a product aimed solidly at Amazon Glacier: Spectra ArcticBlue.

ArcticBlue is an object-based disk system starting at 300 usable TB and growing to over 5 PB that sits directly in front of a Spectra tape library.  It's aimed squarely at Amazon Glacier because its interface is S3.  You can do a get or a put to it just like you would to a bucket in Amazon, except the data is stored in the (up to) 5 PB disk cache and then on tape in a Spectra tape library — which scales to multiple exabytes. The product is built on top of the BlackPearl architecture that they announced two years ago.

Two products came immediately to mind when thinking about this product: Quantum's Lattus and Amazon's Glacier.  It would seem that Spectra is actually aiming solidly at both.  Here are a few things that are very interesting about the product.

Erasure Coding

ArcticBlue uses erasure coding — not RAID — to ensure that data on disk is not corrupted or lost.  Disks are grouped into "bands" of 23 drives, which are part of a 20+3 erasure coding group.  This very wide band offers protection from up to three simultaneous disk failures with very minimal overhead.  If you're not familiar with erasure coding and how it is definitely not RAID, check out this article from ComputerWeekly.
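
To put a number on "very minimal overhead," here's some quick arithmetic (mine, not Spectra's):

# Share of raw capacity used for protection in a 20+3 band vs. triple mirroring
awk 'BEGIN { printf "20+3 erasure coding: %.0f%% of raw capacity used for parity, survives 3 drive failures\n", 3/23*100 }'
awk 'BEGIN { printf "Triple mirroring:    %.0f%% of raw capacity used for extra copies, survives 2 drive failures\n", 2/3*100 }'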

Power-Down at the Band Level

When an application does a get or a put to/from an S3 bucket, only the units that comprise that bucket need to be powered on.  This means that the rest of the system can be powered off, both to save power and cooling and to extend the life of the unit.  This is why they are advertising a 7-year lifespan for this product and not a 3-year lifespan.  This was one big difference I saw between the ArcticBlue unit and Lattus: Lattus does not appear to have any power-down features.

Genetic Dispersion

An S3 bucket can be configured to span both disk and tape, ensuring that any files put onto disk are also put onto tape.  It could even span multiple tape types, since Spectra supports both LTO & IBM TS drives.  This means that the system could ensure that every file is always on disk, LTO tape, and IBM TS tape.  Spectra referred to this as increasing genetic dispersion.  Genetic dispersion protects against multiple types of failures by putting data on multiple different types of media.  The system can also be told to make sure one copy is kept offline.

Future iterations of the product could have a bucket that spans locations, so that any data is always copied to multiple locations.

Shingled Magnetic Recording (SMR) drives

A new type of drive from Seagate uses Shingled Magnetic Recording, which allows data tracks to be layered on top of one another — just like shingles on a roof.  The upside is that it increases the density of the disk by about 25%.  The downside is that — like roof shingles — you can't remove a lower layer without removing the upper layers.  Therefore, writing to an SMR drive is a lot like writing to tape: you can append all you want, but once you want to go back and modify things, you have to erase the whole thing and start over.  Spectra said this is why they are uniquely suited to leverage these drives.  (Their marketing slick says, "It took a tape company to unleash the power of disk.")  Using these drives requires advance planning and logistics that they claim have been built into their system from day one.

Why would you use such drives, you may ask?  Because they're cheaper and bigger while taking up less space.  That is, the drives have bigger capacities than are possible without SMR today, which lets you put more data in less space and save money.

TCO

The most interesting part for me was when they compared the TCO of having your own S3 cloud onsite using ArcticBlue vs. doing the same thing with Glacier or S3.  I have not delved into the TCO model, but according to them it is at least an order of magnitude cheaper than Glacier.  So there's that.

I'd be interested in hearing from anyone who actually deploys this product in his or her datacenter.

----- Signature and Disclaimer -----

For those of you unfamiliar with my work, I've specialized in backup & recovery for 25 years. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.

Running CrashPlan on a headless CentOS/cpanel server

I was helping someone figure out how to back up his CentOS/cPanel-based web server using CrashPlan.  He was already backing it up via rsync, but he wanted to back it up with a cloud backup product.  Code42 advertises that CrashPlan and CrashPlanPro support Linux, so how hard could it be?  Not hard at all if you know what to do.  But if you're on a headless web server, you will be at the mercy of what you can find on the Internet, because Code42 won't help you at all once you're running an "unsupported configuration."

We got it to work, but only after trying multiple different methods that didn't work.  So I thought I'd describe what we did that eventually worked, and hopefully someone else will find this when they're in the same situation.

What didn't work

Code42 has an "unsupported" (but totally reliable) method to connect the CrashPlan app on your desktop to the CrashPlan service running on the web server by using ssh tunneling.  It's described here.  We were able to make that method work to configure the backup, but then the backup wouldn't run.  It just stayed stuck at "waiting for backup."  We contacted Code42, but they said they couldn't help us at all because we were running an unsupported configuration.  More on that at the end of this blog.

I thought the path to take would be to see if we could use the GUI that is supposed to display on the console of the server, but display it back to our desktop — a MacBook in this case.  (Something totally normal in Unix/Linux configurations.)  Since I would then be running the GUI directly from the server being backed up, I could call support.  It turned out I ended up fixing it myself, though.  Here's what we did.

Use ssh to forward X11

Since MacOS no longer ships with the X11 Window System (BTW, it's not "X Windows"), I needed to install XQuartz, which I got from here. We followed the instructions and they seemed to work without a hitch.

X11 forwarding is not turned on by default in CentOS, so you have to edit the sshd config and restart sshd.  (Thanks to this blog post for helping me with this.)

sudo vi /etc/ssh/sshd_config

Uncomment and change these two lines to these values

X11Forwarding yes
X11UseLocalhost no

Now reload sshd so it picks up the change.

$ sudo /etc/init.d/sshd reload

If you do not have xauth installed already, you need to install it, too.

$ sudo yum install xauth

Then back on the client where you want to see the GUI displayed, run this command:

$ ssh -l root -Y <linuxserver>

We saw a message that mentioned that xauth had created a new authority file.

To test if it was working correctly, we wanted to run xterm.  But that wasn't installed yet, so we installed it.

$ sudo yum install xterm
$ xterm

We waited a few seconds, and voila!  An xterm popped up on the Mac.  Awesome.

Run CrashPlanDesktop

$ /usr/local/crashplan/bin/CrashPlanDesktop
$

It just returned the prompt to us and never did anything.  When we looked at the log directory, we saw error messages like the ones mentioned in this blog post.  We followed the suggestions in that blog post about creating temporary directories that CrashPlan can write to, and then specifying those directories in the run.conf file.

$ mkdir /root/.crashplan-tmp
$ mkdir /var/crashplan
$ vi /usr/local/crashplan/bin/run.conf

Add this to the end of the GUI_JAVA_OPTS line: "-Djava.io.tmpdir=/root/.crashplan-tmp"
Add this to the end of the SRV_JAVA_OPTS line: "-Djava.io.tmpdir=/var/crashplan"

So run.conf now looks like this:

SRV_JAVA_OPTS="-Dfile.encoding=UTF-8 -Dapp=CrashPlanService -DappBaseName=CrashPlan -Xms20m -Xmx1024m -Djava.net.preferIPv4Stack=true -Dsun.net.inetaddr.ttl=300 -Dnetworkaddress.cache.ttl=300 -Dsun.net.inetaddr.negative.ttl=0 -Dnetworkaddress.cache.negative.ttl=0 -Dc42.native.md5.enabled=false -Djava.io.tmpdir=/var/crashplan"

GUI_JAVA_OPTS="-Dfile.encoding=UTF-8 -Dapp=CrashPlanDesktop -DappBaseName=CrashPlan -Xms20m -Xmx512m -Djava.net.preferIPv4Stack=true -Dsun.net.inetaddr.ttl=300 -Dnetworkaddress.cache.ttl=300 -Dsun.net.inetaddr.negative.ttl=0 -Dnetworkaddress.cache.negative.ttl=0 -Dc42.native.md5.enabled=false -Djava.io.tmpdir=/root/.crashplan-tmp"

After that, everything worked perfectly!

Epilogue: Vindication

We fixed the GUI_JAVA_OPTS line first and were then able to run the GUI and configure the backups, but the backup was still stuck at "waiting for backup."  That was exactly what happened when we used the method of running the GUI locally on the Mac and connecting to the CrashPlan service on the web server.  We then changed the SRV_JAVA_OPTS line and backups kicked off immediately.

In other words, the reason the backup wasn't working had nothing to do with us running an unsupported GUI configuration and had everything to do with the CrashPlan app trying to use directories that it couldn't write to.  Now back to Code42.

You can support something that isn't "supported"

Just because a customer is running an unsupported configuration doesn't mean you can't help them troubleshoot.  The Code42 support person could have told us where the logs are, for example.  (Yes, they were in the obvious place of /usr/local/crashplan/logs, but we didn't know that.)  Luckily we googled the right thing and found that web page.  Luckily we knew what X11 was and could figure out how to install it on our Mac.  They could have at least helped a little.  Instead, they simply said I was running a system that didn't meet the minimum requirements, so they literally could not help me in any way to troubleshoot the problem.

This is very reminiscent of when I was trying to install a Drobo on my iMac in my house. The blog post I wrote back then was to tell Data Robotics to either support Linux or drop it.  I still feel the same way right now, but in this case the problem is not that they aren't supporting Linux; it's that they don't support headless Linux, which is what most web servers are running.

It isn't that hard to provide some "best effort" support to someone.  They could also enhance that "how to run CrashPlan on a headless Linux system" post by adding this X11 Forwarding idea to it.  Then if a customer has a few questions, help them.  Tell them it's unsupported and that the support will be best effort.  But make the effort.  Seriously.

----- Signature and Disclaimer -----

For those of you unfamiliar with my work, I've specialized in backup & recovery for 25 years. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.

Is your data in the cloud getting backed up?

Roaming the aisles of Dreamforce 2014 taught me one thing: backups are here to stay.  You can move everything you can into the cloud, but your data still has to be protected against natural and human-created disasters.  Moving it to the cloud doesn’t change that.

I've always felt that way, but for a while I thought maybe I was just a lone reed in the wind; that only I was worried about data that had been moved to the cloud.  Everyone else seemed happy to put the backups of their mission-critical data into the hands of the cloud provider.

It was with some joy that I welcomed Backupify to the salesforce.com world when I first heard about them a few years ago.  (To my knowledge, they were the first vendor to offer backup of your salesforce.com data, and the first to back up Facebook, Gmail, and others.)  But I wondered whether there would be enough people concerned about their cloud-based data to justify adding that expense to their cloud infrastructure bill.  They might think, for example, that a company the size of salesforce.com is backing up their data – so why should they pay to do it as well?   Only time would tell.

Walking around Dreamforce 2014, though, put my fears to rest.  There were three other companies exhibiting backup solutions for salesforce.com (that I could see), and there are a few others that I found via a simple “backup salesforce”  search.  By the way, I’ll cover these companies in another post.

The key concept I wanted to cover here is that some people believe that by moving their data to the cloud, it’s automatically going to get backed up.  That simply isn’t the case.

Consider salesforce.com, for example.  It is well documented that they back up your data – but not so you can restore it!  Their backup exists so they can restore a datacenter that gets destroyed by a disaster, a malicious attack, or plain human error by one of their many humans.   However, if you need to use that backup to restore your salesforce instance due to an error on your end, it will cost you a minimum of $10,000, and it is a best-effort restore that might take several days.  In addition, it's an all-or-nothing restore, so you are forced to roll back your entire salesforce instance to the last good backup they took, which could be several days old!  Suffice it to say that relying on this service is a really, really bad idea.

This is still better than Amazon, though.  They do not back up customer data at all.  Their method of protecting against disasters is to replicate everything all over the place. However, if something catastrophic happens on your end, their replication will simply make it more catastrophic by immediately replicating it to multiple locations.  There is no way to recover your AWS instance if you or someone else manages to take it out.  If you don't believe me, read my post about the death of codespaces.com.

The general rule is that backup of the data you place in the cloud is your responsibility – just like it is in the datacenter.  Moving it to the cloud does not change that.

Recommendation

The first thing you need to do is to figure out what data you actually have in the cloud.  Good luck with that.  I’ve got some ideas, but we’ll save those for another post.

The next thing you need to do is find out what the cloud vendor's policies are in this area.  Do they back up your data at all, or are backups entirely your responsibility?  Please note that I believe backups are entirely your responsibility; I just want to know whether you're going to get any help from them in meeting that responsibility.  Even if you develop your own backup system, it would be nice to know whether or not there is a Plan B.

If they do back up your data, are you allowed to use that backup?  If so, is there an extra fee, as with salesforce.com, or can you use it at will?  It would be really nice to test this backup once in a while so you know how it will work when and if you need it.  But you're not going to test a backup that costs $10K just to try it.

Finally, since the goal here is to have your own independent backup, make sure to investigate the feasibility and costs of doing that.  With salesforce.com, you’ll probably need more API calls, as a regular backup is likely to exceed your base amount.  With hosting providers, you’re talking about bandwidth.  How much will it cost to perform your first off-host backup of your data, and how much will each incremental backup cost you?  You need to know these numbers before investigating alternatives.
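
Here's the kind of back-of-the-envelope math I mean.  Every number below is a placeholder; substitute your real data sizes and your provider's actual rates.

# Placeholder numbers only; plug in your own sizes and rates.
FULL_GB=2000            # size of the first full off-host backup
INCREMENTAL_GB=40       # data sent by each nightly incremental
EGRESS_PER_GB=0.09      # hypothetical $/GB bandwidth/egress charge

awk -v full="$FULL_GB" -v inc="$INCREMENTAL_GB" -v rate="$EGRESS_PER_GB" 'BEGIN {
  printf "First full backup:       $%.2f\n", full * rate
  printf "Each incremental:        $%.2f\n", inc * rate
  printf "A month of incrementals: $%.2f\n", inc * rate * 30
}'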

If you're talking about a hosted system of any kind, whether a physical machine in a colo somewhere or a VM inside AWS, you need to find out whether regular backup software will run inside that machine, or whether you are prevented in any way from running a backup application in it.  This could be anything from "we have a customized Linux kernel that doesn't run regular apps" to "you are not allowed to make outgoing connections on non-standard ports."  Find out the answers to these questions now.

Examine alternatives

If we're talking about an application like salesforce, you can start by googling "backup" plus the application name.  If you do that with salesforce, you will find several apps that you can investigate and compare pricing for. You will find that each has set its pricing structure so it is more or less attractive to smaller or larger instances.  For example, an app may have a base price that includes 50 users.  That's great if you have 50 users, but not if you have 5.  If you have 500 users, though, you might not want an app that charges by individual user if it doesn't start giving discounts at larger numbers.

If you're talking about any kind of hosted system running Windows or Linux, you can use most any cloud backup application that uses source deduplication, continuous data protection (CDP), or near-CDP (otherwise known as snapshots and replication).  This is because, after the first full backup is done, each of these sends only new, unique blocks every time it backs up.  Since you are likely paying your cloud provider by the bit, this is both financially wise and doesn't put you at odds with physics.
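
If you want to see why these approaches send so little data, here is a conceptual sketch of source deduplication in shell.  A real product uses variable-length chunks, a proper index, compression, and encryption; this just shows the core idea of hashing chunks and sending only the ones that haven't been sent before.

#!/bin/bash
# Conceptual sketch of source dedup: chunk the data, hash each chunk,
# and only "upload" chunks whose hashes we haven't sent before.
set -e

SOURCE_FILE="$1"                # the file to back up
INDEX="$HOME/.sent-chunks"      # hashes of chunks already sent offsite
WORKDIR=$(mktemp -d)
touch "$INDEX"

split -b 4M -d "$SOURCE_FILE" "$WORKDIR/chunk."   # fixed 4 MB chunks

sent=0; skipped=0
for chunk in "$WORKDIR"/chunk.*; do
  hash=$(sha256sum "$chunk" | awk '{print $1}')
  if grep -q "$hash" "$INDEX"; then
    skipped=$((skipped + 1))    # already offsite; send nothing
  else
    # placeholder for the real upload (vendor API, curl, scp, etc.)
    echo "$hash" >> "$INDEX"
    sent=$((sent + 1))
  fi
done

echo "Chunks sent: $sent, chunks skipped (already offsite): $skipped"
rm -rf "$WORKDIR"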

If you find yourself running an app that there's no way to back up, see if there is an API that can be used to get at least some of the data out.  For example, even though there are several apps that back up salesforce, what if there weren't?  There are other apps that can connect via the API to grab your leads and contacts and put them into other systems such as databases or even spreadsheets.  That would be better than nothing if you found yourself running an app with no automated backup options.

Speaking of that, it's not really a backup if it's not automated, and it also needs to be stored in some system other than where the primary data is stored.   Again, I hate to keep using salesforce.com as an example, but please don't tell me you do a weekly manual export of your various salesforce objects using Dataloader.  That is better than nothing, but not by much.  Too much human involvement means too much chance for human error.  Automate it and get it offsite.

Just do it

I can’t explain all the options in an article like this, but I can hopefully get you thinking and asking questions about this.  Is your salesforce.com data being backed up? What about those apps you have running in a Linux VM in AWS?  You can’t fix what you don’t acknowledge, so it’s time to start looking.

----- Signature and Disclaimer -----

For those of you unfamiliar with my work, I've specialized in backup & recovery for 25 years. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.

Is a Copy a Backup?

Are we breaking backup in a new way by fixing it?  That's the thought I had while interviewing Bryce Hein from Quantum. It made me think about a blog post I wrote four years ago asking whether or not snapshots and replication could be considered a backup.  The interview is an interesting one and the blog post has a lot of good points, along with quite a bit of banter in the comments section.
 
What I mean when I say "is a copy a backup?" is this: traditionally, a "backup" changed form during the backup process.  It was put into tar/cpio/dump format, or the format of some commercial backup tool.  That change of format made it slightly harder for a black hat to monkey with it.
 
I'm a fan of putting operational backup and recovery on disk.  I'm an even bigger fan of backing up in such a way that a "recovery" can simply be done by using the backup as the primary while the real primary is being repaired.  It offers the least amount of downtime in some sort of disaster.

But this raises the question of whether leaving the backup in the same format as the original leaves it vulnerable in some way that putting it into a backup format doesn't.  I think the answer is a big fat no.  Specifically, I'd say that a copy is no more or less susceptible than a file on disk that's in some kind of "backup" format.  Either one could be deleted by a malicious admin, unless you were storing it on some kind of WORM filesystem.  The same is true of backups stored on tape: if someone has control of your backup system, it doesn't take a rocket scientist to quickly relabel all your tapes, rendering them completely useless to your backup system.

As mentioned in my previous post on snapshots and replication, what makes something a backup (versus just a copy) is not its format.  The question is whether or not it has management, reporting, and cataloging built around it so that it is useful when it needs to be.

In that sense, a CDP or near-CDP style backup is actually more of a backup than a tar tape, assuming the tar tape is just the result of a quick tar command.  The tar tape has no management, reporting, or cataloging, other than what you get on the tape itself.

I just want to close by saying that backup products that make instant recovery a reality are my favorite kind of products.  These include CDP and near-CDP style products like Simpana, Zerto, Veeam, AppAssure, RecoverPoint, and any of the storage array or storage virtualization products that accomplish backup via snapshots and replication. This is the way backup should be done: back up continuously or semi-continuously, and recover instantly by being able to use the backup as the primary when bad stuff happens.

One thing's for sure: you can't do that with tape. 😉

----- Signature and Disclaimer -----

For those of you unfamiliar with my work, I've specialized in backup & recovery for 25 years. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.