Someone in Ohio should have been fired

[This story originally happened in 2007, but I just learned about it, so I blogged about it.  Then I learned that it was a four-year-old story.  Everything here still applies, even if the actual story is old.  But I did re-edit the story and change its title because the original wording seems a bit odd four years later.]

Someone in the office of the State of Ohio should have been fired, and it isn’t the guy who already got fired.  He should get his job back.  This story has me fuming.  I don’t often write blogs like this, but here it goes.

The story as it was published in 2007 was “Intern loses backup tape with 800,000 SSNs on it. Intern fired.”  The real story, in my opinion, is what led up to this.  I read this article and this statement from the intern, and learned that the following allegedly happened in the State of Ohio:

1. The State of Ohio used (and may still use) unencrypted backup tapes to store SSNs and names

If your company or government entity is currently making tapes of any kind with SSNs on them, then fix it.  Fixing this costs so little now that it is simply unforgivable not to be encrypting your backup tapes, especially if you’re handing them to a dude in a truck.  If you’re handing them to an intern to take them home in a car… well, I really don’t know what to say.

This is not a new problem.  It’s not like we haven’t had hundreds — hundreds — of exposures over the past 10 years that show how bad this practice is.  Ignorance of this problem simply isn’t possible at this point.

2. Employees of The State of Ohio wanted to cover this up

They told the intern to not tell the police that one of the things stolen was a tape with sensitive data on it.  Seriously.  This tells me, of course, that they knew their unencrypted backup tape was a bad idea, and that they needed to keep others from knowing what they were doing.  It also tells me that they were liars.

3. The State of Ohio (a $52B/yr enterprise) had the money to hire $150/hr and $200/hr contractors full time, but didn’t have the money to hire Iron Mountain (and still may not have it)

Seriously.  It had been the practice for apparently 10 years or more for someone to take the backup tapes home in their car.  Do I really need to say why this was stupid?  A hot car is not where tapes should ever be stored — ever.  Asking someone who is off the clock to handle company property of any kind is also wrong.  Tapes — especially unencrypted tapes — should only be handled by professionals with procedures and policies to do such things.

No one ever told this young man what to do with this tape other than to bring it back the next day.  So not only was the practice to have him take it home, the practice was not to even give him any special instructions on how to handle the tape. Wow.

4. These same employees and their lawyer were bullies who needed a scapegoat and found one

The story about how they bullied this young intern into signing a resignation is just tragic.  He asked for an hour to think it over and they said no.  He asked for 20 minutes.  No.  He asked for 10 and they said no.  Just sign the paper.

Jared, if you’re reading this, I would gladly act as an expert witness on your behalf for any kind of wrongful termination lawsuit you want to file. (I know this offer is a little late, but it’s still out there.)

Someone in Ohio should have opened an investigation about the lack of security of taxpayers’ personal information, as well as the details behind this story.  But if that never happened (and I can’t find any evidence that it did), it’s probably too late now.


----- Signature and Disclaimer -----

Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Evangelist at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.

Have we put tape out to pasture too soon?

A week at NABShow (National Association of Broadcasters) and two days at Tape Summit last week have given me a chance to revisit my thoughts on tape.  Here's a brief summary of how my opinion of tape has changed over the years:

Stage 1: Tape was it.  It was all I knew. Backing up to disk was crazy, as it was too expensive.  (early 90s)
Stage 2: Tape was still it, but tape drives were getting too fast.  Multiplexing or disk staging was starting to be required.  Disk was too expensive to hold backups long term.
Stage 3: The dedupe craze hit.  It was both theoretically possible and financially feasible (for some) to store all backups on disk and still have an offsite copy.
Stage 4: (Pretty recently).  I compared the pricing of today's dedupe systems to similarly-sized tape systems.  I was shocked at how expensive disk still was (4x-8x the price of tape).
Stage 5: (Today) I think we have prematurely put a very good backup and archive target out to pasture and we should really reconsider that.

[Update: Because people tend to read my old articles, I'm going to update this one almost a year later to reflect my current position on tape.]

[Update2: I just wrote this blog post about my response to another article about this topic.]

First, let me state that I am not saying that we should not have disk in a backup system, or that deduped systems are over-rated. What I am saying is that tape has more to offer than we've been giving it credit for lately. Here are some factors that came into my mind while considering this:

It costs 4-8 times more to acquire a disk-based backup system than it does to acquire an automated tape system.

[Update 3/9/12: Pricing obviously changes all the time, and prices on disk have come down since this original post.  I even have some vendors that claim to be as cheap as tape on the initial purchase of one disk system vs. one tape robot in some situations.]

While I've heard this from multiple sources, let me give you a real-life example to drive home this point.  I recently priced tape libraries and dedupe disk systems for a 20 TB shop, and I was surprised to learn that disk still cost far more than tape, even after dedupe.  The average street price of the tape libraries I was considering was about $15K, and the average price of the dedupe systems was about $60K.  Since the customer was getting rid of their (very old) tape library, their choices were:

A) Buy a new tape library, copy tapes and hand them to a dude in a truck  ($15K)

B) Buy a dedupe system AND a tape library.  Copy from the dedupe system to the tape library, and then hand tapes to a dude in a truck. ($60K + $15K)

C) Buy two dedupe systems and replicate between them (no truck needed) ($120K)

Option C was 8 times more expensive than Option A and was out of the question.  While it meant they could get rid of their Iron Mountain bill, they did not believe they could ever save enough money to recoup that additional $105K.  Option B offered no cost savings, so it was difficult to justify the additional $60K.  I pointed out that Option A (if done correctly) requires a disk cache in front of their tape library, but they informed me that they were already doing that.  (Based on their throughput requirements, though, adding a disk cache wouldn't have added that much to the price.)
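For anyone who wants to redo this arithmetic with their own quotes, here is a minimal Python sketch.  The two street prices are the ones quoted above; the `payback_years` helper and its `annual_offsite_saving` parameter are my own illustrative additions, since the customer's actual Iron Mountain bill isn't given here.

```python
# Street prices quoted in this post, in US dollars.
TAPE_LIBRARY = 15_000
DEDUPE_SYSTEM = 60_000

options = {
    "A: tape library only": TAPE_LIBRARY,
    "B: dedupe system + tape library": DEDUPE_SYSTEM + TAPE_LIBRARY,
    "C: two replicated dedupe systems": 2 * DEDUPE_SYSTEM,
}

baseline = options["A: tape library only"]


def payback_years(extra_cost: float, annual_offsite_saving: float) -> float:
    """Years of avoided offsite-vaulting fees needed to recoup extra_cost.

    annual_offsite_saving is hypothetical: plug in your own vaulting bill.
    """
    return extra_cost / annual_offsite_saving


for name, price in options.items():
    extra = price - baseline
    print(f"{name}: ${price:,} ({price / baseline:.0f}x Option A, ${extra:,} more)")
```

Calling `payback_years(105_000, your_annual_bill)` with a real vaulting bill shows how long Option C would take to pay for itself on acquisition cost alone.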

You can undoubtedly make an argument that a backup-to-disk system is easier to manage than a hybrid tape system, but the simple fact is that the disk system will be more expensive to purchase.

Tape actually has a better bit error rate than disk

For those unfamiliar with the concept of bit error rate (BER), the following definition from Wikipedia should be helpful:

"The bit error rate or bit error ratio (BER) is the number of bit errors divided by the total number of transferred bits during a studied time interval. … The bit error probability p_e is the expectation value of the BER. The BER can be considered as an approximate estimate of the bit error probability. This estimate is accurate for a long time interval and a high number of bit errors."

LTO-5 has a bit error rate of 1:10^17.  The TS1130 from IBM has a bit error rate of 1:10^20, and the T10000C from Oracle has a BER of 1:10^19.  SATA disk has a BER of 1:10^14 (SAS/FC is 1:10^15, but no one is using that for backup or archive).  This will probably come as a surprise to many people.  Tape has actually gotten so good at writing data, it is more reliable at writing data than disk!

While 10^15 may look really close to 10^17, it's not.  When it's bits we're talking about, it's the difference between 113 TB and 11.1 PB!  It means you are 100 times more likely to have bad data on disk than you are on an LTO-5 tape drive, and at least 10,000 times more likely than if the data is stored on a T10000C or TS1130 drive!
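If you want to check those numbers yourself, here is a small Python sketch of the conversion.  It assumes the TB and PB figures above are binary units (TiB and PiB), which is what makes 10^15 bits come out near 113 rather than 125; the function name is just mine for illustration.

```python
def data_per_expected_error(bits_per_error: int) -> tuple:
    """Return (TiB, PiB) of data transferred per one expected bit error."""
    n_bytes = bits_per_error / 8
    return n_bytes / 2**40, n_bytes / 2**50


# BER exponents taken from the comparison in this post.
for label, exponent in [("disk, 1:10^15", 15), ("LTO-5, 1:10^17", 17), ("1:10^19 tape drive", 19)]:
    tib, pib = data_per_expected_error(10**exponent)
    ratio = 10**exponent // 10**15  # how many times better than 1:10^15
    print(f"{label}: {tib:,.1f} TiB ({pib:,.2f} PiB) per expected error, {ratio:,}x")
```

The first two rows work out to roughly 113.7 TiB and 11.1 PiB, and the ratios are 100x and 10,000x.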

Tape uses less power than disk

Every time I calculate power consumption for tape systems vs. disk systems, tape systems win.  The reason for this is that tapes in slots take up no power at all, tape drives use very little power while they're not doing anything, and you need far fewer tape drives than you need disk drives.  I recently did a comparison for a 20 TB shop that resulted in at least a 2X difference in power consumption, and that included enough disk to do disk staging before the tape system.  (I plan to publish this once I double/triple check my numbers, but right now I feel pretty safe in saying at least a 2X difference.)

You buy the system once; you power it all day long every day.

Long-term (5+ years) storage of data on disk is not compatible with the typical lifecycle of disk, but it is compatible with tape.

This one is something we don't talk about.  An individual tape is made to hold data much longer than an individual disk, and the lifecycle of most tapes is much longer than the lifecycle of most datasets.  You cannot say the same about disks.  Storing data on disks for more than 5 years automatically assumes that you're going to migrate data from one disk unit to another.

In addition to the media, it is also very common for tape libraries and tape drives to outlast the disk systems sitting next to them. Where most companies migrate data at the end of the depreciation cycle for disk, they tend to keep their tape libraries and drives much longer than that.  They also tend to swap out their drives in the tape libraries; the same is not true in disk units. If you find a disk system in your data center older than five years, I'd be shocked.

What's the problem then?

Let's throw out the claims I've heard:

1. Tape has bitrot

So does disk.  It's called magnetism.  It happens.  The chances of bitrot happening on tape are far less than the chances of it happening on disk. [Update: See this post for further info on this.]

2. Tape is flimsy

Tell you what.  Move disks around the way you move tapes around and see how flimsy they are.

3. 80% of tape restores fail.  [Update 3/9/12: This is a fake statistic that never existed. See my updated blog post.]

This Gartner statistic has been thrown around so much and I really don't know where Gartner got this number from, but it's out there.  [Update: This Gartner statistic never existed.] What I can tell you is that in my entire career of working with backups, I've only had one or two restores that failed due to an actual bad tape — and that's why we make copies.  But I can tell you of dozens of situations where bad disk drives caused me all sorts of headaches.

I can also tell you that most of the restore failures I've seen have been caused by human error – not tape failure.

4. Tape is too slow

Baloney.  Check your facts again.  There isn't a disk drive alive that can keep up with the speed of today's tape drives.

5. Tape is hard to make happy during backups & restores

Agreed.  This is why I believe strongly in using at least disk caching.  I would never design a system that uses just tape to do backups at this point.  I'm actually OK with all of the designs mentioned above (in the A, B, C list).  I think dedupe systems are awesome, and the idea of replicating to another one is even better.  But I also know that doing this is more expensive than the alternative. The other thing I know is that it can't possibly be cheaper to store data on disk for many, many years, and it may even be risky to do so.  (See my comments on BER.)

What I'm really making an argument for is the use of tape for long term archiving, and as a less expensive way of getting data offsite.  (Less expensive than having a second dedupe system and replicating to it.)

6. Tapes go bad sitting on the shelf and you never know they're bad until you need them

That is correct.  This is why both Spectra Logic and Quantum have come up with products to proactively scan your old archives to find and fix any corruption issues before you need a given tape.  If a scan finds something wrong, the bad tape can be replaced by copying from the other copy that you have.
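To make the scan-and-fix idea concrete, here is a toy Python sketch.  It is not how either vendor's product works; it assumes you recorded a checksum for each tape at write time, and it uses in-memory dictionaries (`primary`, `secondary`, `catalog`, all hypothetical names) to stand in for two tape copies and the backup catalog.

```python
import hashlib


def checksum(data: bytes) -> str:
    """SHA-256 digest of one tape's contents, as a hex string."""
    return hashlib.sha256(data).hexdigest()


def scan_and_repair(primary: dict, secondary: dict, catalog: dict) -> list:
    """Verify each primary copy against the catalog; repair from the second copy.

    Returns the names of the tapes that were repaired.
    """
    repaired = []
    for name, expected in catalog.items():
        if checksum(primary[name]) == expected:
            continue  # primary copy is still good
        if checksum(secondary[name]) != expected:
            raise RuntimeError(f"both copies of {name} are bad")
        primary[name] = secondary[name]  # re-create the primary from the good copy
        repaired.append(name)
    return repaired


# Two copies of two "tapes", plus the checksums recorded when they were written.
secondary = {"tape-001": b"payroll archive", "tape-002": b"email archive"}
primary = dict(secondary)
catalog = {name: checksum(data) for name, data in secondary.items()}

primary["tape-002"] = b"email arch\x00ve"  # simulate corruption on the shelf
repaired = scan_and_repair(primary, secondary, catalog)
print(repaired)
```

The point of the sketch is the order of operations: you only find out a copy is bad by reading it, so you read it on a schedule rather than on the day you need the restore.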


Tape can be your friend for long term archives and cheap offsite storage.  Don't dismiss it so lightly.


SNW Reflections

I attended my first SNW conference in over two years last week.  (My previous employer was really good at scheduling me for competing events, so I wasn’t able to go.)  Most of my thoughts about what I saw go hand-in-hand with Don Jenning’s thoughts here.  I agree that it was very cool to see The Cube guys and to know that Infosmack was recording in the social media room.  (I got to sit in on one of those recordings with none other than Dr. Dedupe, Larry Freeman.  I look forward to that podcast.)  And it was my first experience live tweeting from a big event that had a hashtag (#SNWUSA).  That was definitely very cool.  Here are my thoughts, some of which are about SNW, and others that are about the vendors.

First, SNW is still very much alive and well, but it’s definitely not the show that it used to be.  There was a time when your company was considered dead or dying if your major reps were not in attendance.  That appears to be the case no longer.  Sure, there were a lot of companies there.  But there were a lot of companies not there as well.  Perhaps it’s because there are other options now to talk to analysts and other vendors (like The Exec Event) that pull budget money away from this.  Perhaps it’s also because there is now a whole other group of people (bloggers) that you also need to reach out to.  Vendors are either doing their own thing, like HDS’ recent event, or they’re sponsoring one of the Tech Field Day events (which cost much less than doing it yourself).  Either way, that’s more money being pulled away from this “industry event.”

The other reason (if not the main reason) that vendors do a show like SNW is for new leads.  New leads means new end users to talk to.  I remain unconvinced that this is a show to find such people in significant numbers.  Any time I ever scanned the room, I saw way more vendor, analyst and press badges than end user badges.  I also kept seeing the same end users over and over.

As a former end-user, and someone who continues to see himself as an advocate for such, I am simply not drawn to the content offered by SNW.  With very few exceptions, it’s one talk after another given either by a vendor or someone the vendor is paying to be there (either an analyst or an end user who was sponsored to the show by a vendor).  I do like the SNIA tutorials, BTW, because I know they try really hard to keep those vendor-neutral.  But every other presentation was (IMO) simply a marketing presentation.  They should all be titled: The Correct Way to do X, Which Can Only be Done by Buying Our Product.  I’m not saying that these sessions are bad, mind you.  I’m just saying that they don’t draw me.

I’d much rather hear from independent thinkers that are mostly absent from this conference.  There’s another conference that these people tend to speak at that tends to draw a much larger percentage of end users, and that is Storage Decisions.  I like that show for its content and the make-up of its audience.  It’s a real shame that I seem to be uninvited.  This upcoming Storage Decisions is the first one I haven’t spoken at in a really long time.  Although no official word was given, it’s no coincidence that this happened right after I started hosting my own road show: Backup Central Live.  No worries.  I’ve got plenty of other things to do.

In my final thoughts on SNW, I want to pass out the worst-designed booth award.  Booth design experts tell you that you have 7 seconds to catch someone’s attention.  Anobit’s booth display was whitespace with their company name and the phrase “add another bit.”  What are they, anti-dedupe software? Are they hardware, software, service, what?  Then there was RackSpace.  Their booth, which I took a picture of, said they were the world leader in hosting and cloud computing, but never said the name of their company.  This is something not lost on the people that worked the booth, because they took the ghetto little white sign (hung on the pipe and drape before the vendor gets there so they know which booth to go to) and hung it up and over their display. See this photo for what I’m talking about.  So we have a booth with a company name and no description, and a booth with a very big description but no company name.  I declare a tie.

I do have some thoughts to post about companies that I met and got briefings from.  They will come soon.
