Is a portable hard drive the best way to backup a laptop?

Short answer: no, it's the worst way.


Alright, the actual worst way would be to not back it up at all. Sadly, that's also the most common way. After that, though, the worst way is to back it up to a portable hard drive.

Portable hard drives are unreliable

I have used portable hard drives for years, and I can't tell you how many of them have failed in that time. Let's just say it's in the dozens. It could be the physics of putting a hard drive in such a small container; that would explain why they fail so much more often than the same drives do inside a laptop. Maybe it gets too hot in those enclosures; maybe being that small just lets them get roughed up more than they would in a laptop. All I know is that they fail far more often than any other hard drive I've ever owned, and when the drive itself doesn't fail, the electronics around it do.

It’s with your laptop or PC


Using a portable hard drive as your backup means you’re probably storing it next to your PC or putting it into your laptop bag when you travel.  That means it’s right next to the thing it’s protecting.  So when the thing you’re protecting catches fire or gets stolen, your protection goes right along with it.  Remember, you’re just as likely (if not more likely) to have your laptop stolen as you are to have a hard drive failure.

What about DVD backup?

DVDs are more reliable than hard drives, but they have their own problems. The biggest one is that their capacity and throughput are way off from what most people need. Hard drives can easily hold many hundreds of gigabytes of data — even terabytes. Backing that up to optical media, even Blu-ray discs, is going to take a lot of discs and a lot of time, because the transfer rate of burning data in with a laser is pretty slow.
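To put some rough numbers on that (my own back-of-the-envelope math, using assumed sizes and speeds rather than anything precise): a 1 TB laptop drive backed up to single-layer Blu-ray works out to about 40 discs and roughly eight hours of pure burn time, before you count swapping discs.

# Back-of-the-envelope math for backing up a laptop drive to Blu-ray.
# The drive size and burn speed below are assumptions for illustration.
drive_tb = 1.0                 # a typical laptop drive
disc_gb = 25                   # single-layer Blu-ray capacity
burn_mb_per_sec = 36           # roughly 8x Blu-ray burn speed
discs_needed = (drive_tb * 1000) / disc_gb
burn_hours = (drive_tb * 1_000_000) / burn_mb_per_sec / 3600
print(f"{discs_needed:.0f} discs, about {burn_hours:.1f} hours of burn time")
# -> 40 discs, about 7.7 hours of burn time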

So what do you do, then?

I don’t see any other sensible method than to back it up automatically to a system designed to back up laptops and desktops over the Internet.  This could be a piece of software you purchase and install on systems in your datacenter.  If you go that route, however, you’re going to need to make sure the system works for people who aren’t on the corporate network.

What makes the most sense for this data is a cloud-based data protection system. It supports everyone no matter where they reside. There are no hard drives to manage, no backup hardware to purchase and manage, and everyone everywhere can back up their computers and access their backups.

What do you think?  Is there a better way to back up laptops and desktops than the cloud?


----- Signature and Disclaimer -----

Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.

Where does data come from: Laptops & desktops

The datacenter is no longer the center of data.  Data that needs to be protected comes from a variety of sources, most of which are not the datacenter. The first one I’m going to talk about is laptops and desktops.

There was a time when personal computers were used to access company data, rather than create it. In my first corporate job, I remember using a 3270 terminal to run Lotus 1-2-3 or WordPerfect. Documents created on that terminal were not stored on that terminal; it had no hard drive or floppy drive!

(From IBM 3270 on Wikipedia)

Documents created on that terminal were stored on the company's servers in the datacenter, and I was responsible for backing up those servers. I remember backing up hpfs01, or HP file server 01, where all that data was stored.

If you wanted to create data, you came to the office and you used the 3270 to do so.  No one took their data home.  No one created data at home.  Even once we added the ability to dial in from your home PC, you used a terminal emulator to telnet into the Lotus or WordPerfect server to do your actual work.

Enter Windows, stage left

I still remember the first time I saw Joe (his real name) using Windows in the office, and I remember he was using some new thing called Microsoft Word. I fought the idea for many reasons, the first of which was: how was I supposed to back up the data on that guy's floppy drive? We forced him to store any data he created in his home directory on hpfs01. Problem solved.

We weren’t in danger of having Joe take his work home.  His PC was strapped to his desk, as laptops just weren’t a thing yet. I mean, come on, who would want to bring one of these things home?  (From http://www.xs4all.nl/~fjkraan/comp/ibm5140/ )

Enter the laptop

Once laptops became feasible in the mid to late 90s, things got more difficult. Many companies staved off this problem with corporate policies that forced employees to store data on the company server.

For a variety of reasons, that approach stopped working in the corporate world. People became more and more used to creating and storing data on their local PC or laptop.

A data protection nightmare

The proliferation of data outside the datacenter has been a problem since the invention of cheap hard drives.  But today it’s impossible to ignore that a significant amount of data resides on desktops and laptops, which is why that data needs to be protected.

It must be protected in a way that preserves it for when that hard drive goes bad, gets dropped in a bathtub, or goes up in a battery fire. All sorts of things can leave you needing a restore when your data lives on your own hard drive.

It also must be protected in a way that allows that data to be easily searched for electronic discovery (e-discovery) requests, because that is the other risk of having data everywhere. Satisfying an e-discovery request covering hundreds of laptops can be quite difficult if you don't have the ability to search for the needle in the haystack.

My next post will be about why portable hard drives are the worst way you can back up this important data.

Check out Druva, a great way to back up this data.

----- Signature and Disclaimer -----

Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.

My head’s in the clouds: I just joined Druva

After almost 25 years of specializing in backup and data protection as an end user, consultant, and analyst, I’ve decided to work for my first vendor.  I started today at Druva.

Why a vendor?  Why Now?

I figured that it was time to put up or shut up. Put my money where my mouth is.  To fully understand this industry I have to experience it from all sides, and that includes the side trying to make it all happen.  I’ve been an end user, a consultant, and an analyst.  Now it’s time to try making it happen.

Why Druva?

I’ve been a fan of cloud-based data protection for some time now, as anyone who ever attended one of my backup schools can attest.  It makes the most sense for the bulk of the market and offers a level of security and availability simply not available with traditional solutions.

Anyone who has heard me speak knows I’m not anti-tape.   In fact, I think tape is a great medium for some things. But it hasn’t been the right medium for operational backup for quite some time.  Obviously more to come on this and other subjects.

But if disk is the right medium for operational backup, how do you get that data offsite to protect against disasters?  There are many answers to this question, but I have felt for a long time the best answer is to back up to the cloud.  If your first backup is to the cloud, then it’s already offsite.

Of course, having your only copy of data in the cloud can be problematic for large restores with a short RTO. This is why Druva has the ability to have a local copy of your data to facilitate such restores.

Druva was founded in 2008 by Jaspreet Singh and Milind Borate, and it has over 4,000 happy customers running its products. Druva's first product was inSync, which focuses on protecting and sharing data from desktops, laptops, and cloud applications such as Office 365, G Suite, and Salesforce.com. Druva's second product is Phoenix, which is designed to protect datacenters. It protects VMware and Hyper-V workloads, as well as physical machines running Linux or Windows. One of Druva's differentiators is that all data, regardless of source or type, is stored in a central deduplicated repository to facilitate data governance, e-discovery, and data mining. I'll be talking more about those things as I learn more about the company and its products.

This post was going to be longer, but the first day at my new job turned out to be a lot of work.  So I’ll keep it short and sweet. Mr. Backup has joined Druva!

Keep it cloudy, my friends.

----- Signature and Disclaimer -----

Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.

Not possible to export QuickBooks Online data to QuickBooks Desktop

Intuit has created a catch-22 that can only be resolved with professional help. The good news is they'll help you. The bad news is you might have to pay for it.

Suppose you've been using QuickBooks Online and have decided that you would like to switch to the Desktop Edition for whatever reason. No problem. Just google how to export your QuickBooks Online data to the desktop edition. Google's quick-result article tells you just how to do it. Except the instructions don't work. Bad Google.

So you go into QuickBooks Online and search "Export data to desktop," and you'll find an article with better instructions. You need Internet Explorer. But I'm on a Mac. <sigh>


So I find a Windows machine so I can run Internet Explorer. I get the login screen and try to log in. It just spins and spins. So I google "can't log in to QuickBooks Online with Internet Explorer," and I find this:

Ugh.  So I call tech support and ask them what to do.  They recommend I install IE 10.  You know, the version that was replaced over three years ago.

Except when I try to install IE 10, it says it won't install. Maybe I need to uninstall IE 11 first, right? Well, it doesn't show up in the "Uninstall Software" dialog.

So they require me to use a piece of software then tell me that’s not the best software to use.  Just wonderful.

I'm on hold right now. They tell me that because I'm in this catch-22, they'll do the full-service export for free. Except that now I'm being grilled and told that it should work. Except it doesn't, and their own site says it doesn't.

So…


Spectra takes aim at Amazon Glacier

I recently attended the Spectra Logic Deep Storage Summit in Boulder, Colorado.  (They paid for all travel and meals during the trip, but no other remuneration was offered.)  Their big announcement was a product that is aimed solidly at Amazon Glacier: Spectra ArcticBlue.

ArcticBlue is an object-based disk system, starting at 300 usable TB and growing to over 5 PB, that sits directly in front of a Spectra tape library.  It's aimed squarely at Amazon Glacier because its interface is S3: you can do a get or a put to it just like you would to a bucket in Amazon, except the data lands in the (up to) 5 PB disk cache and is then stored on tape in a Spectra tape library — which scales to multiple exabytes.  The product is built on top of the BlackPearl architecture they announced two years ago.
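Since the front end speaks S3, talking to it should look much like talking to any other S3-compatible object store. Below is a minimal sketch using Python's boto3 library; the endpoint URL, credentials, and bucket name are placeholders, and I haven't verified exactly which S3 calls or extensions BlackPearl exposes, so treat this as an illustration of the idea rather than tested code.

import boto3
# Point a standard S3 client at the ArcticBlue/BlackPearl endpoint instead of AWS.
# The endpoint, credentials, and bucket below are hypothetical placeholders.
s3 = boto3.client(
    "s3",
    endpoint_url="https://blackpearl.example.com",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)
# Put an object, just as you would to an Amazon bucket...
with open("dataset.tar", "rb") as f:
    s3.put_object(Bucket="archive-bucket", Key="projects/2015/dataset.tar", Body=f)
# ...and get it back later. The client doesn't care that the bytes may
# ultimately live on tape behind the disk cache.
obj = s3.get_object(Bucket="archive-bucket", Key="projects/2015/dataset.tar")
data = obj["Body"].read()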

Two products came immediately to mind when I thought about this one: Quantum's Lattus and Amazon's Glacier.  It would seem that Spectra is actually aiming solidly at both.  Here are a few things that are very interesting about the product.

Erasure Coding

ArcticBlue uses erasure coding — not RAID — to ensure that data on disk is not corrupted or lost.  Disks are grouped into "bands" of 23 drives, which are part of a 20+3 erasure coding group.  This very wide band offers protection from up to three simultaneous disk failures with very minimal overhead.  If you're not familiar with erasure coding and how it is definitely not RAID, check out this article from ComputerWeekly.
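To put rough numbers on the 20+3 layout (my arithmetic, not Spectra's):

# Rough overhead math for a 20+3 erasure coding band (for illustration only).
data_drives = 20
parity_drives = 3
overhead = parity_drives / data_drives                  # extra capacity spent on protection
usable = data_drives / (data_drives + parity_drives)    # fraction of raw capacity you keep
print(f"Survives up to {parity_drives} simultaneous drive failures")
print(f"Protection overhead: {overhead:.0%}, usable capacity: {usable:.0%}")
# -> Protection overhead: 15%, usable capacity: 87%

Compare that to the 100 percent overhead of mirroring, and "very minimal overhead" is a fair description.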

Power-Down at the Band Level

When an application does a get or put to or from an S3 bucket, only the units that comprise that bucket need to be powered on.  This means the rest of the system can be powered off, both to save power and cooling and to extend the life of the unit.  This is why they are advertising a 7-year lifespan for this product instead of a 3-year lifespan.  This was one big difference I saw between the ArcticBlue unit and Lattus: Lattus does not appear to have any power-down features.

Genetic Dispersion

An S3 bucket can be configured to span both disk and tape, ensuring that any files put onto disk are also put onto tape.  It could even span multiple tape types, since Spectra supports both LTO and IBM TS drives.  This means the system could ensure that every file is always on disk, on LTO tape, and on IBM TS tape.  Spectra referred to this as increasing genetic dispersion.  Genetic dispersion protects against multiple types of failures by putting data on multiple different types of media.  The system can also be told to make sure one copy is kept offline.

Future iterations of the product could have a bucket that spans location, so that any data is always copied to multiple locations. 

Shingled Magnetic Recording (SMR) drives

A new type of drive from Seagate uses Shingled Magnetic Recording, which allows data tracks to overlap one another — just like shingles on a roof.  The upside is that it increases the density of the disk by about 25%.  The downside is that — like roof shingles — you can't remove a lower layer without disturbing the layer on top of it.  Therefore, writing to an SMR drive is a lot like writing to tape: you can append all you want, but once you want to go back and modify things, you have to erase the whole thing and start over.  Spectra said this is why they were uniquely suited to leverage these drives.  (Their marketing slick says, "It took a tape company to unleash the power of disk.")  Using these drives requires advanced planning and logistics that they claim are built into their system from day one.

Why would you use such drives, you may ask?  They're cheaper and bigger while taking up less space.  That is, the drives have bigger capacities than are possible without SMR today, which lets you put more data in less space and save money.

TCO

The most interesting part for me was when they compared the TCO of having your own S3 cloud onsite using ArcticBlue vs. doing the same thing with Glacier or S3.  I have not delved into their TCO model, but according to them it is at least an order of magnitude cheaper than Glacier.  So there's that.

I'd be interested in hearing from anyone who actually deploys this product in his or her datacenter.


----- Signature and Disclaimer -----

Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.

Running CrashPlan on a headless CentOS/cPanel server

I was helping someone figure out how to back up their CentOS/cPanel-based web server using CrashPlan.  He was already backing it up via rsync, but he wanted to back it up with a cloud backup product.  Code42 advertises that CrashPlan and CrashPlanPro support Linux, so how hard could it be?  Not hard at all if you know what to do.  But if you're on a headless web server, you will be at the mercy of what you can find on the Internet, because Code42 won't help you at all once you're running an "unsupported configuration."

We got it to work, but only after trying multiple different methods that didn't work.  So I thought I'd describe what we did that eventually worked, and hopefully someone else will find this when they're in the same situation.

What didn't work

Code42 has an "unsupported" (but totally reliable) method to connect the CrashPlan app on your desktop to the CrashPlan service running on the web server by using ssh tunneling.  It's described here.  We were able to make that method work to configure the backup, but then the backup wouldn't run.  It just stayed stuck at "waiting for backup."  We contacted Code42, but they said they couldn't help us at all because we were running an unsupported configuration.  More on that at the end of this blog.

I thought the path to take would be to see if we could use the GUI that is supposed to display on the console of the server, but display it back to our desktop — a MacBook in this case.  (Something totally normal in Unix/Linux configurations.)  Then, since I would be running the GUI directly from the server being backed up, I could call support.  It turned out I ended up fixing it myself, though.  Here's what we did.

Use ssh to forward X11

Since macOS no longer ships with the X11 Window System (BTW, it's not "X Windows"), I needed to install XQuartz, which I got from here. We followed the instructions and they seemed to work without a hitch.

X11 forwarding is not turned on by default in CentOS, so you have to edit the sshd config and restart sshd.  (Thanks to this blog post for helping me with this.)

sudo vi /etc/ssh/sshd_config

Uncomment and change these two lines to these values:

X11Forwarding yes
X11UseLocalhost no

Now restart sshd.

$ sudo /etc/init.d/sshd reload

If you do not have xauth installed already, you need to install it, too.  (On CentOS, the package may be named xorg-x11-xauth rather than xauth.)

$ sudo yum install xauth

Then back on the client where you want to see the GUI displayed, run this command:

$ ssh -l root -Y <linuxserver>

We saw a message that mentioned that xauth had created a new authority file.

To test if it was working correctly, we wanted to run xterm.  But that wasn't installed yet, so we installed it.

$ sudo yum install xterm
$ xterm

We waited a few seconds, and voila!  An xterm popped up on the Mac.  Awesome.

Run CrashPlanDesktop

$ /usr/local/crashplan/bin/CrashPlanDesktop
$

It just returned the prompt to us and never did anything.  When we looked at the log directory, we saw error messages like the ones mentioned in this blog post.  We followed the suggestions in that blog post about creating temporary directories that CrashPlan can write to, and then specifying those directories in the run.conf file.

$ mkdir /root/.crashplan-tmp
$ mkdir /var/crashplan
$ vi /usr/local/crashplan/bin/run.conf

Add this to the end of the GUI_JAVA_OPTS line: "-Djava.io.tmpdir=/root/.crashplan-tmp"
Add this to the end of the SRV_JAVA_OPTS line: "-Djava.io.tmpdir=/var/crashplan"

So run.conf now looks like this:

SRV_JAVA_OPTS="-Dfile.encoding=UTF-8 -Dapp=CrashPlanService -DappBaseName=CrashPlan -Xms20m -Xmx1024m -Djava.net.preferIPv4Stack=true -Dsun.net.inetaddr.ttl=300 -Dnetworkaddress.cache.ttl=300 -Dsun.net.inetaddr.negative.ttl=0 -Dnetworkaddress.cache.negative.ttl=0 -Dc42.native.md5.enabled=false -Djava.io.tmpdir=/var/crashplan"

GUI_JAVA_OPTS="-Dfile.encoding=UTF-8 -Dapp=CrashPlanDesktop -DappBaseName=CrashPlan -Xms20m -Xmx512m -Djava.net.preferIPv4Stack=true -Dsun.net.inetaddr.ttl=300 -Dnetworkaddress.cache.ttl=300 -Dsun.net.inetaddr.negative.ttl=0 -Dnetworkaddress.cache.negative.ttl=0 -Dc42.native.md5.enabled=false -Djava.io.tmpdir=/root/.crashplan-tmp"

After that, everything worked perfectly!

Epilogue: Vindication

We fixed the GUI_JAVA_OPTS line first and were then able to run the GUI and configure the backups, but the backup was still stuck at "waiting for backup."  That was exactly what happened when we used the method of running the GUI locally on the Mac and connecting to the CrashPlan service on the web server.  We then changed the SRV_JAVA_OPTS line and backups kicked off immediately.

In other words, the reason the backup wasn't working had nothing to do with us running an unsupported GUI configuration and had everything to do with the CrashPlan app trying to use directories that it couldn't write to.  Now back to Code42.

You can support something that isn't "supported"

Just because a customer is running an unsupported configuration, that doesn't mean you can't help him troubleshoot something.  The Code42 support person could have told us where the logs are, for example.  (Yes, they were in the obvious place of /usr/local/crashplan/logs, but we didn't know that.)  Luckily we googled the right thing and found that web page.  Luckily we knew what X11 was and could figure out how to install it on our Mac.  They could have at least helped a little.  Instead, they simply said I was running a system that didn't meet the minimum requirements, so they literally could not help me in any way to troubleshoot the problem.

This is very reminiscent of when I was trying to install a Drobo on my iMac in my house. The blog post I wrote back then was to tell Data Robotics to either support Linux or drop it.  I still feel the same way right now, but in this case the problem is not that they aren't supporting Linux; it's that they don't support headless Linux, which is what most web servers are running.

It isn't that hard to provide some "best effort" support to someone.  They could also enhance that "how to run CrashPlan on a headless Linux system" post by adding this X11 Forwarding idea to it.  Then if a customer has a few questions, help them.  Tell them it's unsupported and that the support will be best effort.  But make the effort.  Seriously.


----- Signature and Disclaimer -----

Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.

Is your data in the cloud getting backed up?

Roaming the aisles of Dreamforce 2014 taught me one thing: backups are here to stay.  You can move everything you want into the cloud, but your data still has to be protected against natural and human-created disasters.  Moving it to the cloud doesn't change that.

I've always felt that way, but I thought for a while that maybe I was just a lone reed in the wind, the only one worried about data that had been moved to the cloud.  Everyone else seemed happy leaving backups of their mission-critical data in the hands of the cloud provider.

It was with some joy that I welcomed Backupify to the salesforce.com world when I first heard about them a few years ago.  (To my knowledge, they were the first vendor to offer backup of your salesforce.com data, and the first to back up Facebook, Gmail, and others.)  But I wondered whether there would be enough people concerned about their cloud-based data to justify adding that expense to their cloud infrastructure bill.  They might think, for example, that a company the size of salesforce.com is backing up their data, so why should they pay to do it as well?  Only time would tell.

Walking around Dreamforce 2014, though, put my fears to rest.  There were three other companies exhibiting backup solutions for salesforce.com (that I could see), and there are a few others that I found via a simple “backup salesforce”  search.  By the way, I’ll cover these companies in another post.

The key concept I wanted to cover here is that some people believe that by moving their data to the cloud, it’s automatically going to get backed up.  That simply isn’t the case.

Consider salesforce.com, for example.  It is well documented that they back up your data – but not so you can restore it!  Their backup is for them to restore a datacenter that gets destroyed by a disaster, malicious attack, or even just plain human error of one of their many humans.   However, if you need to use that backup to restore your salesforce instance due to error on your end, it will cost you a minimum of $10,000, and it is a best effort restore that might take several days.  In addition, it’s an all-or-nothing restore, so you are forced to roll back your entire salesforce instance to the last good backup they took, which could be several days ago!  Suffice it to say that relying on this service is a really, really bad idea.

Even that is better than what you get from Amazon Web Services.  They do not back up customer data at all.  Their method of protecting against disasters is to replicate everything all over the place.  However, if something catastrophic happens on your end, their replication will simply make it more catastrophic by immediately replicating it to multiple locations.  There is no way to recover your AWS instance if you or someone else manages to take it out.  If you don't believe me, read my post about the death of codespaces.com.

The general rule is that backup of the data you place in the cloud is your responsibility – just like it is in the datacenter.  Moving it to the cloud does not change that.

Recommendation

The first thing you need to do is to figure out what data you actually have in the cloud.  Good luck with that.  I’ve got some ideas, but we’ll save those for another post.

The next thing you need to do is find out what the cloud vendor's policies are in this area.  Do they back up your data at all, or are backups entirely your responsibility?  Please note that I believe backups are entirely your responsibility; I just want to know whether you're going to get any help from them in meeting that responsibility.  Even if you develop your own backup system, it would be nice to know whether there is a Plan B.

If they do back up your data, are you allowed to use it?  If so, is there an extra fee, as with salesforce.com, or can you use it at will?  It would be really nice to test this backup once in a while so you know how it will work when and if you need it.  But you're not going to test a backup that costs $10K just to try it.

Finally, since the goal here is to have your own independent backup, make sure to investigate the feasibility and costs of doing that.  With salesforce.com, you’ll probably need more API calls, as a regular backup is likely to exceed your base amount.  With hosting providers, you’re talking about bandwidth.  How much will it cost to perform your first off-host backup of your data, and how much will each incremental backup cost you?  You need to know these numbers before investigating alternatives.
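To illustrate the kind of math I mean, here's a quick sketch.  Every number in it is made up for the example, so plug in your own data sizes and your provider's actual rates.

# Hypothetical numbers purely for illustration; substitute your own sizes and rates.
dataset_gb = 500              # data to protect in the hosted system
egress_per_gb = 0.09          # what the provider charges per GB transferred out
daily_change_rate = 0.02      # portion of the data that changes each day
first_full_cost = dataset_gb * egress_per_gb
incremental_gb = dataset_gb * daily_change_rate
incremental_cost = incremental_gb * egress_per_gb
print(f"First full backup: ~${first_full_cost:.2f}")
print(f"Each daily incremental (~{incremental_gb:.0f} GB): ~${incremental_cost:.2f}")
print(f"Rough first-month transfer bill: ~${first_full_cost + 30 * incremental_cost:.2f}")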

If you're talking about a hosted system of any kind, whether a physical machine in a colo somewhere or a VM inside AWS, you need to find out whether regular backup software will run inside that machine, or whether you are prevented in any way from running a backup application there.  This could be anything from "we have a customized Linux kernel that doesn't run regular apps" to "you are not allowed to make outgoing connections on non-standard ports."  Find out the answers to these questions now.

Examine alternatives

If we're talking about an application like salesforce, you can start by googling "backup" followed by the application's name.  If you do that with salesforce, you will find several apps that you can investigate and compare pricing for.  You will find that each has set its pricing structure to be more or less attractive to smaller or larger instances.  For example, one may have a base price that includes 50 users.  That's great if you have 50 users, but not if you have 5.  If you have 500 users, though, you might not want an app that charges per individual user unless it starts giving discounts at larger numbers.

If you're talking about any kind of hosted system running Windows or Linux, you can use almost any cloud backup application that uses source deduplication, continuous data protection (CDP), or near-CDP (otherwise known as snapshots and replication).  This is because, after the first full backup is done, each of these sends only new, unique blocks every time it backs up.  Since you are likely paying your cloud provider by the bit, this is both financially wise and doesn't put you at odds with physics.
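Here's a toy sketch of the source deduplication idea, nothing like a real product's implementation, just the core reason repeat backups send so little data: blocks whose fingerprints the service has already seen are never transmitted again.

import hashlib
# Toy illustration of source deduplication.  Fingerprint each block locally and
# only "send" blocks the backup service hasn't seen before.
BLOCK_SIZE = 4 * 1024 * 1024            # 4 MB blocks, an arbitrary choice
server_has = set()                      # stands in for the service's fingerprint index
def backup(label, data):
    sent = skipped = 0
    for i in range(0, len(data), BLOCK_SIZE):
        fingerprint = hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
        if fingerprint in server_has:
            skipped += 1                # block already stored; nothing to transfer
        else:
            server_has.add(fingerprint)
            sent += 1                   # in real life: upload the block itself
    print(f"{label}: sent {sent} blocks, skipped {skipped}")
data = b"".join(bytes([i]) * BLOCK_SIZE for i in range(10))   # pretend disk image
backup("first backup", data)            # sends all 10 blocks
backup("second backup", data)           # sends 0 blocks; everything is already stored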

If you find yourself running an app that there is simply no way to back up, see if there is an API you can use to get at least some of the data out.  For example, even though there are several apps that back up salesforce, what if there weren't?  There are other tools that can connect via the API to grab your leads and contacts and put them into other systems such as databases or even spreadsheets.  That would be better than nothing if you found yourself running an app with no automated backup options.
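To make that concrete, a bare-bones contact export via the API might look something like the sketch below.  I'm using the simple_salesforce Python library as one way to do it; the credentials and field list are placeholders, and a copy like this is a last resort, not a real backup.

import csv
from simple_salesforce import Salesforce
# Placeholder credentials; a real script would pull these from a secure store.
sf = Salesforce(username="user@example.com", password="password",
                security_token="API_TOKEN")
# Pull a few fields from every Contact and dump them to a spreadsheet-friendly CSV.
records = sf.query_all("SELECT Id, Name, Email, Phone FROM Contact")["records"]
with open("contacts_export.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["Id", "Name", "Email", "Phone"])
    for r in records:
        writer.writerow([r["Id"], r["Name"], r["Email"], r["Phone"]])
print(f"Exported {len(records)} contacts")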

Speaking of that, it's not really a backup if it's not automated, and it also needs to be stored in some system other than where the primary data is stored.  Again, I hate to keep using salesforce.com as an example, but please don't tell me you do a weekly manual export of your various salesforce objects using Data Loader.  That is better than nothing, but not by much.  Too much human involvement means too much chance for human error.  Automate it and get it offsite.

Just do it

I can’t explain all the options in an article like this, but I can hopefully get you thinking and asking questions about this.  Is your salesforce.com data being backed up? What about those apps you have running in a Linux VM in AWS?  You can’t fix what you don’t acknowledge, so it’s time to start looking.


----- Signature and Disclaimer -----

Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.

Is a Copy a Backup?

Are we breaking backup in a new way by fixing it?  That's the thought I had while interviewing Bryce Hein from Quantum. It made me think about a blog post I wrote four years ago asking whether or not snapshots and replication could be considered a backup.  The interview is an interesting one and the blog post has a lot of good points, along with quite a bit of banter in the comments section.
 
What I mean when I ask "is a copy a backup?" is this: traditionally, a "backup" changed form during the backup process.  It was put into tar/cpio/dump format, or the format of some commercial backup tool.  That change of format made it slightly harder for the data to be monkeyed with by a black hat.
 
I'm a fan of putting operational backup and recovery on disk.  I'm an even bigger fan of backing up in such a way that a "recovery" can simply be done by using the backup as the primary while the real primary is being repaired.  It offers the least amount of downtime in some sort of disaster.

But this does raise the question of whether leaving the backup in the same format as the original leaves it vulnerable in some way that putting it into a backup format doesn't.  I think the answer is a big fat no.  Specifically, I'd say that a copy is no more or less susceptible than a file on disk that's in some kind of "backup" format.  Either one could be deleted by a malicious admin, unless you were storing it on some kind of WORM filesystem.  The same is true of backups stored on tape.  If someone has control of your backup system, it doesn't take a rocket scientist to quickly relabel all your tapes, rendering them completely useless to your backup system.

As mentioned in my previous post on snapshots and replication, what makes something a backup (versus just a copy) is not its format.  The question is whether or not it has management, reporting, and cataloging built around it so that it is useful when it needs to be.

In that sense, a CDP or near-CDP style backup is actually more of a backup than a tar tape, assuming the tar tape is just the result of a quick tar command.  The tar tape has no management, reporting, or cataloging, other than what you get on the tape itself.

I just want to close out by saying that backup products that are making instant recovery a reality are my favorite kind of products.  These include CDP and near-CDP style products like Simpana, Zerto, Veeam, AppAssure, RecoverPoint, and any of the storage array or storage virtualization products that accomplish backup via snapshots and replication.  This is the way backup should be done: back up continuously or semi-continuously, and recover instantly by being able to use the backup as the primary when bad stuff happens.

One thing's for sure: you can't do that with tape. 😉


----- Signature and Disclaimer -----

Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.

Keep Your Private Data Private – Good Password Practices Part 2

This is the third post in a series on keeping your private data private.  It was inspired by the Jennifer Lawrence (et al.) nude photo scandal, and then encouraged by the "Gmail hack" (which wasn't really a Gmail hack) that was published literally while I was working on this post.  Previous posts talked about two-factor authentication and preventing hackers from guessing your password.

As I said in the last post, password best practices boil down to three things: preventing hackers from guessing your password, preventing them from stealing it in plain text, and limiting the damage if they do either one.  This blog post is about protecting yourself from the last two.  To read about protecting against the first one, read my previous blog post.

Note: if at any point in this article, you find yourself saying “give me a break” or your eyes start rolling into the back of your head due to boredom, just skip to the next blog post where I talk about password managers.

Limiting the damage if hackers steal your password 

You should assume that any given password may eventually get compromised.  Therefore, you do not want to use the same password on every system.  It's one thing to have your gmail.com account password in the hands of bad guys.  But if that same username and password are used on your amazon.com account?  You'll be buying $500 espresso machines for all your best friends in the Czech Republic before you can say Karlovy Vary.

Now I’ve gone and made it impossible, right?  I want you to use a hard-to-guess password, I don’t want you to write it down, and I want you to use a different one on every system.

One thing that people do is to combine the password mentioned above with a two- or three-letter code for each site.  Prepend it, append it, or (better yet) split your password with it.  So take the "base" password above and make it one of these for Facebook:

FbStephen12p$4#oS
FStephen12p$4#oSb
FBStephen12p$4#oS
Stephenfb12p$4#oS

Then you do the same thing for each of your other accounts.  This has the benefit of giving you a unique password for every site that's still relatively easy to remember, and it makes the password harder to guess.  Adding those extra characters increases the entropy of the password as well.
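If it helps to see the pattern spelled out, here is a trivial sketch of the "split your base password with a site code" idea.  It only illustrates the scheme; the base password is the example from this post, and I wouldn't actually generate real passwords with a script you leave lying around.

# Toy illustration of the "base password + per-site code" scheme described above.
# The base password is the example used in this post; don't reuse it for real.
BASE = "Stephen12p$4#oS"
def site_password(site_code, split_at=7):
    # Tuck the 2-3 letter site code into the middle of the base password.
    return BASE[:split_at] + site_code[:3] + BASE[split_at:]
print(site_password("fb"))    # Facebook  -> Stephenfb12p$4#oS
print(site_password("amz"))   # Amazon    -> Stephenamz12p$4#oS
print(site_password("gm"))    # Gmail     -> Stephengm12p$4#oS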

Another thing that people do is to have classes of passwords.  They use really secure and separate passwords for sites where money is involved (e.g., your bank, Amazon.com, or any site that stores your credit card), another set of passwords for sites with sensitive personal information (e.g., Facebook, Gmail, Dropbox), and then a "junk" password for places where you wouldn't care if it got hacked (e.g., the website that stores your recipes).

Preventing them from stealing your password in plain text

This blog post says that half of all internet sites store your passwords in plain text.  For example, it was revealed only a few years ago that LinkedIn was storing passwords insecurely; you'd think they'd know better.  There's literally nothing you can do to protect against that: no matter how good your password is, if they steal the password file and your password is sitting there in plain text, you're toast.  Well, shame on them.

What you can do, though, is avoid installing software that would steal your passwords by watching your keystrokes as you type them.  Don't click on links in emails you don't recognize.  Don't click on links in emails from places you do recognize, either!  If Bank of America sends you an email, open BOA's website on your own and log in; don't click on the link in the email.  If you do, at the very least you're letting a spammer know you're a real person.  Possibly it'll be a normal-looking website that is nothing but a dummy site made to look like BOA and designed to steal your password as you type it in.

Also, no bank should ever call you and ask for personally identifiable information.  They should not be calling to ask for passwords, your SSN, or anything like that.  Unfortunately, some actual banks do this.  The bank I belong to will call me about suspected fraud, and then ask me to verify my identity by giving them my account number or SSN.  I refuse to give them that information, and then I call the bank's actual number and talk to the fraud department.  In my case, it really is the bank just doing something stupid, but it could just as easily be someone trying to steal your information.  Either way, I believe it's a really bad idea for banks to teach people that someone might call them and ask for such information.

And if you get a phone call from “computer support” claiming you’ve got a virus and they need to login to your computer to fix it, again… hang up!  Tell them they’re full of crap and they are a worthless excuse for a human being.  In fact, feel free to unload the worst things you’ve ever wanted to say to a human being to them.  It’ll be cathartic, and it’s not like they can complain to anyone.

This practice of trying to get you to give up your password or other personal info is referred to as social engineering.  If you want to see how it works, watch a great movie called Sneakers, or a not-as-great movie called Trackdown.  Both are available on Netflix On-Demand, and they both show you exactly the kinds of things hackers do to get people to reveal their personal information.  Sneakers is the better movie, but Trackdown is actually more technically correct.  It’s loosely based on the story of Kevin Mitnick, considered one of the greatest hackers of all time.  (In real life, Kevin Mitnick now does what Robert Redford’s character does in Sneakers.)

Use a Password Manager

This is becoming my default recommendation. Use a password manager to create random passwords for you, remember them, and enter them for you.

I'm talking about products like 1Password, LastPass, and Dashlane.  Instead of having to create and remember dozens of different passwords, you can just have them create and store your passwords for you.  I have been trying out Dashlane and like it quite a bit.  Some of them also support two-factor authentication, something I talked about in my last post.

The first thing Dashlane did was to import all of the passwords stored in my browser.  It turns out there were 150+ of them!  If I did nothing else, it would allow me to turn off the "remember password" feature in my browser.  (That's a really bad feature, because if someone gets your laptop, they can automatically log in as you to your most important sites, and your browser's history will take them right to those sites.)

The second thing Dashlane did was to run a security audit on all my passwords.  Like many people, I failed the audit.  But then they walked me through exactly what I needed to do to make things all better.  They also synchronized my passwords to my iPad and Android phones. 

The software will remember your passwords and automatically log you in — but not before requiring you to log in to the password manager (usually once per session).  That way, if someone stole your laptop, they wouldn't be able to use the password manager to gain access to anything — assuming you didn't put your master password on a sticky note on your laptop, of course. 😉  They also allow you to specify that a particular site requires entry of the master password every single time you use it, not just once per session.  Pretty impressive stuff.

Dashlane unfortunately doesn't yet support logging into apps on iOS/Android, but it can sync your passwords to those devices.  That way, if you forget a given password, it can either display it to you or copy it to the clipboard so you can paste it into the app.  I've been pretty impressed with Dashlane.

Summary

•    Don't use easy-to-guess passwords

•    Don’t use the same password everywhere

•    Don’t open stupid stuff that’s designed to steal your data

•    Consider using a password manager

I hope this post helps and hasn’t been too overwhelming.


----- Signature and Disclaimer -----

Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.