Mr. Backup

Curtis' Random Thoughts

Don't be a Snail: Stop Changing Your Backup Software

Written by W. Curtis Preston Thursday, 10 May 2012 17:10

Sometimes I walk to work in the mornings (I live just over 2 miles from the office), and if I leave early enough I walk down this particular sidewalk that has just been sprinkled with water.  On each side of this sidewalk is grass, and watered grass here in Southern California tends to mean that snails will be there. The snails take advantage of the wet, cool sidewalk (and the fact that the sun isn't overhead), and they decide to cross from one bunch of grass to another bunch of grass. 

On any given morning there will be several hundred snails crossing from one side to another.  200 or so will be crossing from the left to the right, and 200 or so will be crossing from right to left.  You know, 'cause the grass on the other side is, well... you know.  The funniest one I saw was a snail that had made it 95% of the way from one side to the other, and then changed his mind and turned around.

I got to thinking about backup software (as one does), and all of the people I know that are moving from product A to product B.  Then a bunch of other people that are moving from product C to product A, while others are moving from product B to product C -- and they all think that this will make their lives soooooo much better.

I've done hundreds of backup assessments over the years.  I can only think of one or two where the gist of the recommendation was, "Your backup software sucks.  You should change it."

I can, however, think of many, many times where the problem was "you're not streaming your tape drives," or "you're manually specifying an include list and you should use the auto-selection feature," or "you're making too many full backups," or "you're using the scheduler in a way that it wasn't designed to work," and on and on and on.

Changing backup products is one of the riskiest things you can do to your backup environment.  The learning curve of the new backup product is almost definitely going to reduce your recoverability for a significant period of time.

What would be much better is to bring in an expert in that product for a few weeks and have him/her tell you how best to use the product you already have.  The learning curve is much easier, the cost is much lower, and the period of instability will be much shorter.

Don't be a snail.  Learn what the grass on your side of the sidewalk really tastes like before you start crossing the sidewalk.  Remember that some snails die along the way.

 

 

IT is not in charge!

Written by W. Curtis Preston Saturday, 07 April 2012 01:18

I was helping a guy on a plane understand what "the cloud" is.  Once I did that, we began a discussion on trust.  I shared with him my opinion that we have been trusting other vendors since we started IT.  We trust every hardware and software vendor we have not to put backdoor stuff in our hardware or software that is designed to do things we don't know about. We trust technicians to know enough not to use bad passwords. (Of course, sometimes we're wrong.)  I don't see trusting a cloud vendor as being so terribly different.

I'm sure a bunch of you will focus on that first paragraph, and not on what this blog post is actually about. But here goes anyway.

Eventually we got to the part of the discussion where he mentioned that "our IT department would never allow that."  He explained how he has to carry three laptops (personal, corporate 1 and corporate 2) whenever he travels and how he has to dial four digits on his phone before he makes any calls.  I'm guessing that we just hit the tip of the iceberg of how his IT department is soooo security conscious that they have forgotten their primary purpose -- to enable people to do work.  (BTW, this guy wasn't working on missile launch codes or anything.  I forgot what he does for a living but I remember wondering why security was that important for this particular company.)

I ranted a little bit about that to him, to which he replied, "well, they are in charge."  I asked who he meant, and he said, "IT."

I just about lost it.

If you are in IT and you think you are in charge, you are wrong.  The only thing you are in charge of is helping people get their job done.  We buy decent laptops & desktops, so they'll stay up and people can get their job done.  We make backups so when things go wrong, we can get people their work back, and let them get their job done.  The only reason we do security things is to keep our company from losing the efforts of the people that work there.

Sometimes IT people forget that we are there to serve the business.  If you enact a security policy that's so rigid that it slows down people's work, you forgot your job. If you turn on a backup system that slows down the servers, and by association the work of the people, you forgot your job. 

You are not in charge.  The business is.  I feel better now.

 

 

FCC is a LITTLE out of date WRT its backup designs

Written by W. Curtis Preston Wednesday, 21 March 2012 00:15

The FCC gives discounts to schools and libraries if they want to buy a tape-based backup system, but not if they want to use disk or any type of cloud-based architecture.

No, this is not me saying this is an example of how tape is better.  It's me, an American citizen expressing frustration at how inefficient my government is -- at least in this case.

For those (like me) who don't live in this world, here's what I'm talking about.  According to their website, "The Schools and Libraries Program of the Universal Service Fund, commonly known as "E-Rate," is administered by the Universal Service Administrative Company (USAC) under the direction of the Federal Communications Commission (FCC), and provides discounts to assist most schools and libraries in the United States to obtain affordable telecommunications and Internet access."

If you download the list of things that are eligible for the E-rate program, you will find that "tape backup" is eligible, but Online Backup Solutions are specifically not eligible.  There is no mention of disk-based backup devices.  Here's the best part.  Tape backup is defined as "QIC, DAT, 8mm, DLT, AIT, and ADR." ADR was end-of-lifed 9 years ago, QIC & AIT were EOLd 3 years ago.  Note that there is no mention of LTO, a device that was released 12 years ago and currently owns 90% of the market.  So to say that the FCC is a bit behind the times is an understatement.

By the way, they also list floppy disks and CD-Rs as the only examples of removable storage.  No mention of DVDs or BluRays -- and when was the last time you saw a floppy drive?

Thanks to Christina Weil (@c_weil) for pointing this out via Twitter.

Amazing, just amazing.

Update (3/22): This program is aimed at getting schools and libraries connected.  So I've been told that the network parts of this document are mostly up to date.  The only reason backup is in the document in the first place is to help ensure that the connectivity systems remain available and connected.  What I think happened is that the network vendors knew about this program and made sure their parts got updated, and the tape/storage folks have ignored it (or not known about it).

 

More misinformation about backups

Written by W. Curtis Preston Saturday, 10 March 2012 07:25

I don't care if you use disk, tape, or the cloud to back up your systems.  (In case you think I'm swayed by advertising, I have advertisers from all of those categories.)

Having said that, it bothers me when I see misinformation being used to sway you one way or the other.  This is why I wrote this article that disproved the Gartner 71% tape failure "quote," and this article disproving the Yankee Group 42% failure "quote."  And since he used my comment system to link to his article, I also thought I'd write this blog article dispelling the misinformation in the article he linked to.

He said it's been a long time since people have seen tape used for backups. 

The live survey of the hundreds of attendees to last year's Backup Central Live shows showed that 82% of them still use tape as their final destination for backups.  So much for not seeing tape in a while.

He said IT pros are still skeptical that removable drives have a legitimate place in backup

Yes, we are.  I think that 3.5" removable disk drives are a very bad place for backups. 2.5" drives yes, 3.5" drives, not so much.  They're simply not designed for excessive portability.  Adding to that is this fact:  every portable hard drive I have ever used for backing up my laptop has died long before the drive it was backing up.  Every single one.

He said cloud backup is shiny and new and that's why people are choosing it.

No, it's because it's a complete and total outsourcing of backup functionality.  Backups can be onsite and offsite without ever touching a disk drive or tape drive.  AND you will be constantly notified if your backups are working or not working.  You often get notified even if you shut off all your backups!  That's not the case with any backup software product that I've ever used.  There are a lot of reasons to use cloud backup over removable disk drives or tape.  In fact, there are so many that I strongly recommend cloud backup for small to medium sized companies.

He said disk is cheaper for small companies

Yes, it is.  It is cheaper to acquire the drives as long as you never need to add capacity.  If you do, however, need to add capacity, disk costs will double.  Tape costs will not.  It'll cost you about $.02/GB to add more capacity to a tape-based system.  (Having said that, I do not recommend backing up directly to tape; I haven't in a while.)  I also priced a slightly different tape-based system than the one he quoted in his article, and it was approximately the same cost.

As to comparing their 10-bay disk systems against an autoloader, I don't see how you can do that.  Backup software products simply don't know what to do with 10 removable disk drives, but they do know what to do with autoloaders.  (I'm sure he knows a backup software product that will work with his configuration, but I don't know of one.)

He said disk is more reliable than tape

Baloney.  I've written about this before.  Tape has a much higher reliability rate than SATA disk -- one hundred times more reliable.  I've already shown above that the statistics he quotes in his article are bunk.  Almost every failure I've ever seen with restores was the fault of anything but the media that was being used.  (I still don't think tape should be used as the initial target for backups, but it is a very reliable place to put the second copy.)

He said LTO-5 speed is 140 MB/s, but only with compression

Sorry, Charlie.  That's the native speed.  It's up to 280 MB/s with compression.  With the 1.5:1 I see all the time, it's 210 MB/s easy.  Having said that, it's very hard to feed data to a drive that fast.  This is why I don't recommend tape as the initial target for backups.  But if you've already made a copy on disk, you should have no problem streaming that tape drive.
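The arithmetic behind those numbers is simple enough to write down. Here is a minimal sketch, assuming effective throughput is just native speed multiplied by the compression ratio; the function name is made up for illustration, and it ignores whether your servers can actually feed the drive that fast.

```python
# LTO-5 native speed quoted above, in MB/s.
NATIVE_MB_PER_S = 140


def effective_throughput(native_mb_per_s: float, compression_ratio: float) -> float:
    """Data rate seen by the backup host for a given compression ratio.

    Assumes the drive is kept streaming, so throughput scales linearly
    with how well the data compresses.
    """
    return native_mb_per_s * compression_ratio


# 1:1 is the native case, 1.5:1 is the ratio mentioned above, 2:1 is the spec maximum.
for ratio in (1.0, 1.5, 2.0):
    rate = effective_throughput(NATIVE_MB_PER_S, ratio)
    print(f"{ratio}:1 compression -> {rate:.0f} MB/s")
```

At 1.5:1 this gives the 210 MB/s figure, and at 2:1 the 280 MB/s figure.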

He said single file restores are faster from disk

Yes, they are.  That extra minute it takes the tape to load and get to the single file would probably put most companies out of business.  Seriously, you're going to make a case out of a minute of tape loading time?

He said upgrading/replacing is cheaper with disk

Are you kidding me?  I addressed this already.  If you need more capacity than the initial purchase, it's much cheaper to expand a tape system. Most customers go years without upgrading their drives.

He said doing synthetic fulls is easier on disk

Yes, it is.  And CDP and CDP-like tech is also only possible on disk.  Disk has a lot of things going for it.

He said tape has to be replaced more often

Again, baloney.  A tape that is used once a week will last four years with the chart quoted in the article.  That is longer than most disks I've used.

Summary

I don't recommend using tape as the initial target for backups, but I still think it's a great place to put the next copy.  And if you are a smaller company, I think the best thing you can do is to use a cloud backup service that totally automates everything, and allows for a local copy of your data.  But I think that any backup system that requires small companies to manually swap removable media just to make backups happen is a bad idea.

 

Yankee Group never said 42% of tape restores fail

Written by W. Curtis Preston Saturday, 10 March 2012 01:37

I just wrote a blog post about how Gartner never said that 71% of tape restores fail.  They never said anything like it.  Another statistic that is often quoted is "The Yankee Group said that 42% of tape restores fail."  Guess what?  They never said that, either.

What they did say in a 2004 paper is that 40.7% of 362 IT executives believed that they had suffered at least one restore failure in the previous year due to tape unreliability.  That's not even close to saying that 42% of all tape restores fail, but who needs truth, right?
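To see how far apart those two claims are, here is a rough sketch. It assumes every restore fails independently with the same probability, and the restore counts per year are numbers I made up for illustration; the survey reported neither. Under those assumptions, the share of shops that see at least one failure in a year is 1 - (1 - p)^n, which can be inverted to get the per-restore failure rate p.

```python
def p_at_least_one_failure(per_restore_failure_rate: float, restores_per_year: int) -> float:
    """Chance a shop sees one or more failed restores in a year,
    assuming independent restores with a constant failure rate."""
    return 1 - (1 - per_restore_failure_rate) ** restores_per_year


def implied_per_restore_rate(share_with_any_failure: float, restores_per_year: int) -> float:
    """Invert the formula above: the per-restore failure rate that would
    produce the given share of shops with at least one failure."""
    return 1 - (1 - share_with_any_failure) ** (1 / restores_per_year)


SURVEY_SHARE = 0.407  # 40.7% of respondents, from the 2004 paper

# Hypothetical restore volumes; the survey did not report these.
for restores in (10, 50, 200):
    rate = implied_per_restore_rate(SURVEY_SHARE, restores)
    print(f"{restores} restores/year -> implied per-restore failure rate {rate:.2%}")
```

The only case where the two numbers coincide is a shop that does exactly one restore a year. The more restores a shop does, the lower the per-restore failure rate that is consistent with the survey answer.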

Also, I'd like to throw out that these were IT executives.  What this stat really means is that 40.7% of them were told that they had restores that failed in the previous years due to bad tapes.  That's not quite the same thing as it actually happening.  How many backup people even know that the reason for their failure is their own misconfiguration?  And if they did, how many of them would admit that to their boss, rather than saying "the dang tape failed again."

Now the only statistic left is the Strategic Research one, but I can't find anything on that one.

It appears that at least 66% of all tape statistics are made up. ;)

 

Gartner never said 71% of tape restores fail

Written by W. Curtis Preston Friday, 09 March 2012 23:41

How many times have you read that Gartner said 71% of tape restores fail?  Google it.  You'll find dozens of references to this Gartner "statistic."  It was cited again recently in an article by Highly Reliable Systems, along with a bunch of other stats about how tape sucks. I saw Dave Russell of Gartner last week and asked him about this statistic.  He said he had never heard it, but that he would look into it.  It turns out that the only way he could find it was to Google it.  He searched Gartner's entire archive and could find no paper that ever suggested a 71% failure rate for tape restores.

He said, "I am somewhere between annoyed and pretty darn angry about what I believe are continued misquotes re. Gartner and tape failure rates.  I’ve been the lead analyst for backup and recovery technologies since 2005, and none of what’s out there have been published during my watch."  The only report that referenced tape and the number 71% was a report David did in March of 2006.  Here is what it said:

New, and less-expensive, disk options make the use of disk for faster recovery a more viable option than backup to tape. In a poll of 252 attendees at the 2005 Gartner PlanetStorage conference, 26 percent reported that half or more of their recoveries were currently done from disk. That number jumped to 62 percent when the time frame was extended to 2007. As they look five years into the future to 2010, 71 percent expect that tape will be used mostly for archiving and disaster recovery.

I did a bunch of web searches for "Gartner 71% tape restores fail," and found that if I search for those words prior to March of 2006, I don't find much.  I do find an article from Jon Toigo in 2005 that says he hears IT people quoting a 10% failure rate from Gartner, but he believes that number is fictitious (which it probably was).  I also find a whitepaper from Exabyte that refers to a 2002 article from Adam Couture of Gartner Group.  I just asked David Russell to see if he can find that article.  I also found another whitepaper from Tandberg citing similar numbers and the same paper.  Maybe that one has some basis in reality.  Most interestingly, I did find this page which claims to be the text of a Feb 2003 article from Computer Technology Review that says that "A recent study [it doesn't cite the study] found that while tape backups are used extensively, restoring data from a tape backup system fails an astounding 70 percent of the time. The reasons for such an alarming rate of failure range significantly--and may vary from bad tapes or tape drives to the inability to find the backup tapes or careless processing by IT staff."  (In my experience, the cause has far more often been careless processing by IT staff than bad tapes.)

The important thing is that prior to March of 2006, a Google search shows no references to Gartner thinking that 71% of tape restores fail.  Then David Russell wrote his report in March of 2006 that said that "71 percent expect that tape will be used mostly for archiving and disaster recovery."  If you change your Google search to the year after his paper came out, you find a bunch of references to the 71%, the first of which comes from this DPM Datasheet from Microsoft -- promoting DPM.  Then all of a sudden, the floodgates are open and everyone is quoting this number -- no one (including Microsoft) actually giving their source, other than simply saying "Gartner said it."  Most of them also seem to quote the Yankee Group (saying 42%) and Strategic Research (54%).  I wonder if they ever said what these articles say they said.

Another quote I've seen is this: "according to Ben Matheson, group product manager for Microsoft’s “Data Protection Manager” Division, 42% of attempted recoveries from tape backups in the past year have failed."  (BTW, please note that this is the same number as the Yankee Group number above, so maybe he was just quoting that number.)  I saw this in an article updated last week. According to LinkedIn, Ben Matheson hasn't worked for Microsoft since February of 2006, so that quote can't be correct either.  But once you've got a great quote, why let it go?  Wait, I may have found our Gartner quote culprit. Let's see, Ben Matheson leaves Microsoft as its DPM product manager in February of 2006.  The new person took over shortly thereafter.  The next month a Gartner paper is written, and within two months we have the Microsoft DPM product group citing it incorrectly.  Could it have been a new gung-ho product manager misquoting Gartner?  Then everyone else starts quoting Gartner by quoting this Microsoft paper.  Next thing you know, it's real!  (This is just conjecture, of course. Don't sue me, person who took over from Ben Matheson.)

So what?

We all know tape backups and restores fail, right?  Who cares if no one at Gartner said it?  The first reason is truth.  This statistic is cited so often that it has been accepted as truth, and it isn't.

The second reason is that you can't debate the truth of a fake report. If it was a real report, we could check the stats behind the stat, and see how many of these "tape restore failures" were caused by human error and had nothing to do with the fact that they were using tape.   But since there never was any report, we can't do such a thing.

Please, people.  Don't quote third parties like that if you can't cite the source.  It's too easy to misquote.

 

Backup Central Live Q1 starts next week!

Written by W. Curtis Preston Thursday, 19 January 2012 08:19

We've got new and exciting content for 2012, and we're starting our seminars this year with San Jose and San Diego next week.   These free seminars are first-come, first-served (end-users only), and we're almost at capacity in the first two cities, so you'd better act now if you want to go.  I've also listed the rest of our backup seminars for Q1.  (Other cities will be announced soon.)

Date      Event                   Where
Jan 24    Backup Central Live!    San Jose, CA
Jan 26    Backup Central Live!    San Diego, CA
Feb 7     Backup Central Live!    Raleigh, NC
Feb 21    Backup Central Live!    Miami, FL
Feb 23    Backup Central Live!    Tampa, FL

See you there!

 

Stop SOPA/PIPA

Written by W. Curtis Preston Wednesday, 18 January 2012 11:10

Before I say anything about SOPA, let me say that I am not battling SOPA because I’m into illegally downloading books/music/movies.  As a provider of content (author of three books), I am strongly FOR paying for the media I use.  And don’t give me that crap about it isn’t stealing cause you were never going to buy it anyway.  You didn’t have something, now you have it, and you neither paid for it nor obtained the permission of the person who provided it.  You call it what you want; I call it stealing.

Now that THAT is out of the way….

What I’m also very much against is the government wasting time and MY money trying to stop something they will never stop.  I’m against SOPA for many of the reasons I’m against the TSA.  The TSA is security theater; SOPA is anti-piracy theater.  The only thing it will accomplish besides wasted money is some government folks getting to say that they did what they could — all while wasting millions of taxpayer dollars and the time of other companies’ IT departments.

Chime in!  Especially over at my new domain stopsopacentral.com. ;)

 

 

Contemplating File Sync/Sharing Services

Written by W. Curtis Preston Monday, 31 October 2011 15:23

I wrote a few months ago about what a difference the cloud has made for how I conduct business.  I rarely buy software for my new company anymore; I often am paying for some type of cloud-delivered service.

One of those services that I use (and love) is Dropbox.  It is an incredibly easy replacement for a file server when you need to share 10s to 100s of GB of files between multiple users.  However, I definitely have some security concerns about it, and not just since the big snafu a few months ago.

One of my issues with Dropbox is that they can access my data.  Data is encrypted in transit, but they can access my data because they have my password.  The same appears to be true of Syncplicity & SugarSync.  Why do I think that? Because they have a "reset my password" link.  If my data were encrypted with a key that only I know, how could they change my password without losing access to that data?  Compare this, for example, to Wuala's answer and Boxcryptor's answer to the question about a lost password.
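Here is a minimal sketch of the reasoning behind that suspicion. It assumes a hypothetical service that derives the encryption key solely from the user's password with PBKDF2; the function name, passwords, and iteration count are all made up for illustration, and I am not claiming this is how any of the services above work.

```python
import hashlib
import os


def derive_key(password: str, salt: bytes) -> bytes:
    """Derive a 32-byte encryption key from a password.

    PBKDF2-HMAC-SHA256 from the standard library; the iteration count
    is illustrative, not a recommendation.
    """
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


salt = os.urandom(16)  # stored alongside the encrypted data

old_key = derive_key("original password", salt)
new_key = derive_key("password after reset", salt)

# A reset password produces a different key, so data encrypted under the
# old key cannot be decrypted with it.
print(old_key == new_key)  # False
```

If the key really depended only on something I know, a "reset my password" link would leave my existing files unreadable. A reset that works without losing data suggests the provider keeps the key, or something that unlocks it, on their side.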

Even with Wuala, who says they don't know my password, how do they share encrypted data with users I specify?  If all data is encrypted/decrypted locally, how does the person with whom I'm sharing files decrypt them?  I'm curious.

The last two listed are open source alternatives.  They're too limited in functionality for me, but I thought I'd throw them on there anyway.

SugarSync

Wuala

Syncplicity

Boxcryptor

iFolder

Sparkleshare

What do you think about all this?  Anyone I left out that I shouldn't have?

 

Veeam excites and frustrates me

Written by W. Curtis Preston Saturday, 01 October 2011 23:41

Veeam is one of the most innovative backup and recovery tools designed specifically for VMware and Hyper-V.  They've also done a really good job of marketing this tool.  In a matter of a couple of years, they've gone from "who's Veeam?" to the mindshare leader in this space.  I'm not sure what their actual market share is, and there are several other tools that are also making a name for themselves, but it's hard to think of a product that has more successfully captured the hearts and minds of their target market than Veeam.

They announced their vPower functionality at Tech Field Day in Seattle quite some time ago.  To summarize, this is the ability to run a VM from their backup image of that VM.  This opens up all sorts of different levels of functionality, such as instant VM recovery and automated, full testing of the viability of your backups of a given VM.

This is why I looked forward to their presentation at Tech Field Day 7.  At first, I was not disappointed. They announced support for Hyper-V.  Yay!  They also announced further refinement of their vPower functionality.  (They even gave me credit in one of the PowerPoint slides for some suggestion I made that they acted on.) They also hinted at a new version that is almost out, but wouldn't really talk about it or show it.  We definitely were not allowed to ask questions about it.  Note to future Tech Field Day presenters: I can't think of a way to frustrate bloggers more than to tell them about a new version that you're not going to talk about, show us, or let us ask questions about.  To make matters worse, they kept hinting about the new version throughout the presentation, but then kept telling us we couldn't ask about it.

Where the wheels fell off the truck for me was when I brought up the fact that most Veeam customers use Backup Exec to back up Veeam.  Another way to say that is that Veeam can't back itself up.  This resulted in a 20 minute conversation during which I got quite riled up, while Doug Hazelmen kept looking at me like he had no idea why I had such an issue with this.  You can watch the whole conversation here.  It's from 1:24 to 1:45.  He occasionally snickered, as if to say that the whole point of the discussion was ludicrous.  At one point he actually said that the statement that they can't back themselves up was "stupid."  Yet he confirmed that the most common practice for Veeam customers was to use Backup Exec to back up Veeam.

Veeam data is stored in two places: the SQL database and the backup jobs directory.  There is no way within the product to make a special backup of the SQL catalog so that it can be easily restored without creating a catch-22 situation.  For example, one suggestion was to use one Veeam server to back up another Veeam server.  That creates a catch-22 of having to restore one server before you can restore the other server.  What if both servers are gone?   Doug hinted that losing the SQL database just isn't that big of a deal because it's just job configuration information.  You could just redo it if you lost it.  Is this really a backup company talking to me?

The second part of their data is the backup jobs history.  It has no catalog; everything that Veeam needs to know about the backups is stored with the backups.  The question is: what happens if one or more of those files gets corrupted?  What happens if some well-meaning admin looking for space deletes some jobs?  What happens if a rogue administrator deletes all of them?  As far as I could tell, Veeam has no way of recovering from this situation -- which is why most Veeam customers use Backup Exec to back up Veeam.

Doug seemed to think that I was pushing for tape support.  In a way, I was.  Tape is still the least expensive way to get data offsite.  In many organizations, it's the only way to get data offsite.  They just have too much data to be able to afford a pipe big enough to replicate their backups -- even if they have been deduplicated.  That issue aside, I wasn't pushing so much for tape as I was a method for creating a backup of my backup.  Files stored in filesystems get corrupted.  It just happened to me today.  For no apparent reason, a file whose modification time hadn't changed was telling me that it couldn't be copied.  It was a movie file on an iMac.  I can play the movie, but I can't copy the file.  Weird. That's what files on filesystems do -- and that's why we back them up.  But the guys at Veeam just don't seem to get this, and that's why they frustrate me.
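To put a rough number on the "pipe big enough" problem, here is a back-of-the-envelope sketch. The data size and link speed are hypothetical, and it assumes the link runs at full rate with no protocol overhead, which flatters the result.

```python
def replication_hours(data_tb: float, link_mbit_per_s: float) -> float:
    """Hours needed to push data_tb terabytes over a link of the given speed.

    Uses decimal units (1 TB = 10**12 bytes, 1 Mbit = 10**6 bits) and
    assumes the link is fully used the whole time.
    """
    bits_to_send = data_tb * 1e12 * 8
    seconds = bits_to_send / (link_mbit_per_s * 1e6)
    return seconds / 3600


# Hypothetical example: 10 TB of (already deduplicated) backup data over 100 Mbit/s.
hours = replication_hours(10, 100)
print(f"{hours:.0f} hours, or about {hours / 24:.1f} days")
```

If that much new data shows up more often than the link can drain it, replication never catches up, and that is the situation where a box of tapes in a truck still wins.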

On one hand, I think the idea of a backup that can test itself in a totally automated fashion is completely awesome, and a lot of other areas of functionality are very impressive as well.  On the other hand, their not understanding the issue I have (and therefore not addressing it) is really frustrating.  I hope we can work this out eventually, but they'll first have to stop calling what I'm saying "stupid." ;)

 

Dell going for the big time

Written by W. Curtis Preston Saturday, 01 October 2011 21:58

Dell is going to build a unified storage system that has everything you could ever want in a mid-tier or enterprise-tier storage system.  Or so said the presenters at Tech Field Day 7.  Only time will tell.

I was one of several bloggers visiting Dell's headquarters in Round Rock, TX (a short drive from Austin) last month just prior to VMworld.  (That's my excuse for this blog entry being so late, BTW.)  Dell apparently paid for a double-sponsorship from Stephen Foskett of Gestalt IT so that they could talk to us for four hours (instead of the usual two).  They had a lot to talk about.

They made sure we knew about all of the major acquisitions that Dell has made over the past few years:

  • EqualLogic - A scalable iSCSI grid storage array
  • Exanet - A scalable NAS system
  • Perot Systems - Professional Services
  • Ocarina - Deduplication and Compression
  • Compellent - Midrange storage arrays
  • RNA Networks - Cloud memory
  • Scalent Technologies - Datacenter management software

I believe it was Carter George who explained all this, and explained how Dell was going to integrate these technologies faster and better than any other storage company has ever done.  The way he described it, it was as if Dell would come out with a totally unified scalable storage system that supported iSCSI, NAS, dedupe and compression that could meet the needs of the mid-market and enterprise market, while being easy to manage in a datacenter -- and be cloud ready.  And they were going to do all of this reeeeal soon.  He didn't give dates, but the way he was talking, it sounded like 2012.

Dell, you see, "is starting from scratch."  Those other vendors weren't.  The problem is that I'm not sure how having several products from several different companies, all of which already have existing customers, is "starting from scratch."

The way this usually goes is each company becomes a faction in a big project, each wanting to put their technology into the finished product.  Each of them thinks that their technology is what's going to make things better.  I have one product in mind from the past, where it was pieced together from acquired technologies from a bunch of different companies.  The result was three levels of abstraction (one from each company) before the data ever got to disk.  The result was also a piece of crap.

Maybe Dell will be different.  I wish them the best of luck.  Good luck at tearing down the fiefdoms without damaging egos.  Good luck getting people to speak their mind when it's really important -- when the emperor appears to be getting undressed.  My personal experience with trying to do that with Dell did not go very well (to put it mildly), so I hope things have changed.

I also have concerns about how Dell salespeople will evolve to sell products that require upfront sales engineering to get the order right.  My personal experience with their sales teams so far suggests that they've got as much work to do here as they do with all their products I mentioned earlier.

I have been exposed to EqualLogic, Compellent, and Ocarina before, and have heard nothing but good about them from the field.  So I think Dell has chosen some really solid building blocks to build a real storage company with.  I just don't think it's going to be as easy as the presenters at Tech Field Day were trying to say it will be.  I'll be more than happy to be wrong, though.

 

My first trip to VMworld

Written by W. Curtis Preston Friday, 09 September 2011 06:53

VMworld is the new industry show.  It is the show to attend and the show to exhibit at.  I was really impressed.  Here's a list of my thoughts about my trip there.

Update: I re-read this blog post this morning and felt it was too harsh and didn't contain enough of my positive reaction to the show.  I've therefore added a new paragraph or two in the beginning that explains my overall reaction to VMworld.  I also added some photos. No, I didn't get any complaints. This is just a case of writer's remorse.

VMworld is a very impressive show.  The main session was the biggest such session I've ever seen.  Attendance was around 20,000 people, which is more than last year, which was bigger than the year before, etc.  In a world of ever-shrinking tradeshows, it's nice to see one that's growing.  I liked the way they did the virtual park, and the way they had volleyball, basketball, and badminton courts (with what appeared to be pros who would play with you).  The attendance at the opening keynote was incredible.  (The content of Paul Maritz's talk, or the what-appeared-to-be-scripted "interview" of the three CIOs later... not so much.  Due to my impression of that talk, I slept in the following morning and didn't go to the next morning's general session, only to be disappointed by all the tweets about how much better THAT talk was.) 

VMworld and the Venetian also did a very good job of shuffling 20,000 people around the various venues, including lunch.  I never felt like I was ever waiting in line anywhere.  There was the occasional traffic jam, of course, but nothing compared to what I've seen at some shows. Food was decent, and there were healthy options if that's what you were looking for. 

The treatment of the press was very good.  We had a press-only area with meals, drinks, and snacks where we could relax, write, blog, etc. Then they had a place where press could bring non-press people for interviews.  They also had dedicated Q&A sessions for the press.  All of that was very close by, which made it all very convenient.

Overall, it's a very good show with a lot of content (if that's what you're looking for) and a lot of exhibitors (if that's what you're looking for).  You could do a lot worse.  Now, my feedback...

1. Registration was easy

That is, just getting registered.  And then....

2. Session builder was horrible

You were required to register for any sessions you wanted to attend.  That's fine.  However, the system you had to use to do that was one of the worst designed web pages I've ever worked with.  Every mouse click resulted in a refresh of the entire page with a list of all sessions.  Many sessions were listed in multiple places, instead of just listing the session once with multiple times.  Registering for each session required many, many mouse clicks and a popup.  Then, of course, it was followed by a page refresh.  Yuck.

One cool feature was that you could export your schedule to your calendar.  That was nice.

3. The exhibit hall was huge, huge, huge.

It's not just that this was bigger than EMC World or Symantec Vision, or any other large industry show.  It's that it contained almost anyone who was anyone.  In the backup world, you're not going to see SyncSort or CA at EMC World, but you do see them here.  This shows how separate EMC continues to allow VMware to be.

In fact, the exhibit hall was so big, and there were so many vendors there that I hadn't seen in a long time, that I had to give up almost all the sessions I had planned to attend just to make time to see all the vendors in my space.  And that's just in the backup space!

4. The exhibit hall is a little out of control

Certain vendors (and you know who they are if you were there) sent people so far out into the aisle that you couldn't get past them without being accosted.  They would literally stand in front of you, forcing you to interact with them.  This happened regardless of how many times you went by the booth, or whether or not you had any interest in their technology.

Many vendors exceeded any reasonable noise rules.  There should be a very definite rule that your booth cannot be beyond N decibels if you're more than N feet away from the booth.  Subwoofers should be outlawed altogether.  It is soooo not cool to be the booth 30 feet away and not be able to hold a conversation because another booth is blasting away.

If you're going to hire booth babes (and there's a good argument for not doing so), can you at least have them dress professionally and not like they're going to a night club or standing on a street corner?

5. Water.  Seriously.

I was never so thirsty as when I was in the exhibit hall.  You're several minutes away from any drinks you can buy.  There are no complimentary sodas.  So there should be water dispensers everywhere -- and they should be constantly monitored for fullness and cup availability.  Almost every single water dispenser I found was either out of water, out of cups, or both.  Here's an idea: how about putting the next 5-gallon water bottle next to the dispenser?  If we're thirsty and it's empty, we'll put it in.

The first night I went to dinner after being thirsted to death in the exhibit hall.  I drank six glasses of water and -- not sure how to say this delicately -- my body showed me later it needed all six glasses. [Update: I heard from a few people that I put this too delicately and they didn't understand what I was saying.  I'm saying that I didn't need to go to the bathroom at all after drinking that much water.]  I was severely dehydrated just from walking around the exhibit hall.  Water was that hard to find.

Having said all of that, this is the new industry show and I will never miss it again if I can help it.

 
 

VMware passes Hyper-V up in the backup race

Written by W. Curtis Preston Sunday, 07 August 2011 09:00

The title may surprise none of you, but it is actually the opposite of what I said 1.5 years ago in a blog post called "Hyper-V ahead of VMware in the backup race."

Back then I was concerned that VMware did not have full VSS support.  They have since rectified that. [Update: by "full VSS support," what I mean is that it can talk properly to all versions of VSS.  Before, they did not support Windows 2008.  Now they support all versions of Windows.  There is still the problem that they only have one style of snapshot, so they aren't telling applications that they've been backed up, which means that the applications aren't truncating their logs.]

They also added changed block tracking (AKA "CBT") in vSphere, so it is possible to perform block-level incrementals on image-level backups. And since VMware is talking properly to VSS, the applications are doing what they are supposed to be doing before a backup as well. 

Now it is Hyper-V that is behind.  There is no API within Hyper-V that can present to you a map of changed blocks in order to back them up.  You can perform an incremental backup, of course, but an incremental backup via the Hyper-V host is going to back up everything, as every .VHD file will have changed.

This changed block tracking feature of VMware makes finding which blocks have changed much faster, and backing up just the blocks that have changed (vs the files that have changed) is the fastest way to do an incremental backup.
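To make the idea of a "map of changed blocks" concrete, here is a minimal shell sketch.  It is not how CBT works internally: the hypervisor records changed blocks as writes happen, whereas this sketch finds them by brute-force comparison of two equal-sized disk images, which is exactly the slow scan that CBT lets you avoid.  The function names, the fixed block size, and the idea of patching an older copy are all assumptions made for illustration.

```sh
#!/bin/sh
# Illustrative only: build a changed-block list by comparing two images.
# Real CBT gets this list from the hypervisor without reading either image.

# changed_blocks OLD NEW BLOCKSIZE
# Print the zero-based index of each BLOCKSIZE-byte block that differs.
# Assumes OLD and NEW are the same size.
changed_blocks() {
    # cmp -l prints one line per differing byte; field 1 is the 1-based offset.
    cmp -l "$1" "$2" | awk -v bs="$3" '{ print int(($1 - 1) / bs) }' | sort -un
}

# copy_block SRC DEST BLOCKSIZE INDEX
# Copy a single block from SRC into the same position in DEST.
copy_block() {
    dd if="$1" of="$2" bs="$3" skip="$4" seek="$4" count=1 conv=notrunc 2>/dev/null
}
```

A block-level incremental is then just a loop that calls copy_block for each index that changed_blocks prints, so only the changed blocks ever move.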

Just like with VMware, third parties have stepped in to fill the void.  So far, I know of Veeam and Arkeia that are using their source deduplication capabilities to perform sub-file incremental backup of Hyper-V machines.  I'm sure there are more as well -- and if any of them mention themselves in a comment, I'll update my post.

 

Moving to the cloud

Written by W. Curtis Preston Wednesday, 03 August 2011 00:34

I'm in the process of trying to convert my church's use of an onsite file and email server to a cloud file synchronization service and a hosted Exchange service.  I've chosen SugarSync, as that appears to have a little more flexibility than Dropbox, and Sherweb for the Exchange service, as that is what we're currently using for Exchange at Truth in IT.

For file services, the idea would be to move any folder that a given person needs access to onto their local PC, synchronize that folder to SugarSync, then share that folder out to other people who would need access to it.  Each of those users would then synchronize that folder to their own PC and have local access to it as well.  Changes would automatically be synchronized to every computer accessing the same folder.  (This is the same way we use Dropbox at Truth in IT. We share one big folder, and it's synchronized to all our MacBooks.)  In order for this to work at the church, I need each PC to have enough local storage space to hold the data that they need to access on a regular basis.  (They can get access to infrequently accessed files via the web.)  The good news is that most PCs today have way more storage than they need if they're using a fileserver. 

I've started the SugarSync pilot with three of the office workers, and selected about 20GB of folders to synchronize.  They have a 1.5 Mb/s T-1, so it took a little over 24 hours to upload that 20GB to SugarSync, and another 24+ hours to sync it to each computer that needs to have access to it.
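If you're planning a similar pilot, a rough estimate of the initial upload time is worth doing first.  The sketch below assumes decimal units (1 GB = 8,000 megabits), a link running at full line rate, and no compression or protocol overhead; the function name is made up for this example.

```sh
#!/bin/sh
# transfer_hours SIZE_GB LINK_MBPS
# Rough hours needed to move SIZE_GB gigabytes over a LINK_MBPS megabit/s link.
# Assumes decimal units and a fully used link; real transfers will differ.
transfer_hours() {
    awk -v gb="$1" -v mbps="$2" 'BEGIN { printf "%.1f\n", gb * 8 * 1000 / mbps / 3600 }'
}
```

Running `transfer_hours 20 1.5` prints 29.6, which is in the same ballpark as the day-plus I saw.  Actual times depend on how much data really gets sent and whether the sync client compresses it, so treat the result as an order-of-magnitude check only.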

Besides doing away with the server (and the costs associated with maintaining it), different people have experienced different benefits.

  • One staff member who does not have an Internet connection can work on his files on his laptop at home and have them automatically synchronized to SugarSync when he plugs into the church's Wi-Fi
  • One staff member who likes to work from home a lot can access all of her files at home just like she was at the church, and can stop using thumb drives to bring files back and forth, or waiting ages to download a file via the VPN
  • Another staff member needs infrequent access to office files from the house, but doesn't feel the need to sync any folders to his house.  He will instead download or upload any files he needs via the SugarSync website.

There you have it: different strokes for different folks.

 

 
 

Social Media and security

Written by W. Curtis Preston Friday, 22 July 2011 00:00

Social media incidents cost a typical company $4 million over the past 12 months, according to the results of a Symantec survey published today.

There have been a number of legal actions about social media in recent years, including a Financial Industry Regulatory Authority (FINRA) regulatory notice, the Romano vs Steelcase Inc and Bass vs Miss Porter's School cases (where both defendants were granted discovery of the plaintiff's Facebook profile), and the sexual harassment case EEOC vs Simply Storage Management LLC (where a US District Court held that social networking sites -- or SNS for short -- were discoverable).  This means that what your employees do on their personal time on SNSs can open your company to embarrassment and litigation.  The survey, then, sought to find out how big this problem is in the enterprise.  Symantec hired Applied Research to interview IT professionals from 1200+ enterprises with 1000+ employees.

45% of respondents use SNSs for personal use, and 42% use them for company use.  IT folks are worried about employees sharing too much information (46%), the loss or exposure of confidential information (41%), damage to the brand (40%), exposure to litigation (37%), malware (37%), and violating regulatory rules (36%). 

The respondents to the survey listed 9 social media "incidents" in the past 12 months, with 94% of those incidents having consequences, including damage to the brand (28%), loss of data (27%), or lost revenue (25%).  The average cost of a social media incident was listed as $4.3M!

Most of the companies are discussing creating a social media policy, training their employees, putting processes in place to capture confidential information, and putting technology in place to stop these things from happening as well.  However, what was surprising was that, while almost 90% of respondents felt they needed to have these things in place, only 24% had a social media policy, 22% were training their employees on social media, and about 20% were using technology to control this process.

Folks, it's happening and it isn't going away.  The very least you can do is to create a social media policy and train your employees on why it is important.  Those employees who are allowed to blog about company matters need to be continually reminded that their actions are discoverable.  Even if their personal site may not be demonstrated to be official company policy, it surely states the opinion of one of its employees -- and those employees make up the company.  And if it can be shown that one of its employees was continually doing something damaging on a publicly accessible social site and the company did nothing to stop it, that can be actionable.

Just remember: It's really easy to be a jerk on the Internet where you're not facing the person you're talking to.  You might want to dial it down a notch or two.  Just a thought.

Update 25 Jul 2011: I was given a briefing about this survey and didn't read the press release until today. During the briefing, Symantec seemed to be playing down the role that technology had to play in helping to solve this problem.  However, in the press release, it seems as if they're saying that Enterprise Vault is going to handle this by archiving social media content.  First, I have no idea why anyone who is not required to archive any content -- be it email or twitter -- would do such a thing.  If you're not required to keep something and keeping it adds no value to your business -- don't keep it!  Second, even if you did archive it, I'm trying to understand how that would help you in a discovery situation.  If someone wants to see your Facebook logs, they're going to subpoena Facebook.  That's what happened in the cases listed in this article.  So if you did archive it, now you're required to produce it.  So why would you do this if you weren't being forced?  And how would doing this help you in a trial?

 

Is Holographic Storage the future of archive & backup?

Written by W. Curtis Preston Thursday, 21 July 2011 20:26

And now for something completely different.  GE researchers have announced that they have successfully demonstrated a micro-holographic material that can support 500 GB in a DVD-style disc.  That's 20 times greater than most Blu-Ray discs (there is a Blu-Ray 100 in the works), and 100 times greater than DVDs.  So does this have backup and archive potential?  Let's look into that.

The first question is how fast this thing will be.  The article said that it supports "data recording at the same speed as Blu-ray discs."  The fastest a Blu-Ray disc can currently write is 12x, which translates into 54 MB/s.  That's slow in comparison to modern tape drives, but still not too shabby.  It's way faster than any of the Magneto-Optical formats. Although it's not stated anywhere, I'm assuming this is a random-access format, so its access time during restores or retrievals would be very nice when compared to tape.  Due to the load/unload process, it's still not going to be as fast as a hard drive unless we're talking about leaving the disc in the drive all the time.  In a robotic setup, you'd have to add robotic time and load/unload time.  But this would all be similar to, if not better than, the speeds we have with tape.

The next question is cost, and there's nothing on that yet.  Traditionally, other optical formats have lost this race in a big way.  Only time will tell whether or not this format will change that pattern.

Finally, there's the question of long-term stability of the media itself.  I previously posted about the differences of tape vs disk in this area, and how tape is actually more stable for longer periods of time than disk is.  However, this is holographic storage and I honestly have no idea what the long term viability of data stored on such a medium would be.   I'm leaning towards the idea that it would actually be very stable, but I know that other optical formats are not as stable as one might think they would be, so...  Only time and more research will answer that question, too.

Assuming that they address the cost concerns and my hunches are right about its long term stability, I'm really leaning towards this as a long-term archival medium -- as opposed to a backup and recovery medium.  While 54 MB/s may sound like a lot, it's just not enough for today's large data centers.  Throughput doesn't matter much in archival situations, but random access does, making this really well suited to archive.
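To put the 54 MB/s figure in perspective, here is the arithmetic worked out.  It assumes the standard 1x Blu-ray rate of 4.5 MB/s, decimal units (500 GB = 500,000 MB), and a drive that sustains the full 12x rate for the whole disc, which is an optimistic assumption.

```latex
\begin{align*}
12 \times 4.5\ \text{MB/s} &= 54\ \text{MB/s} \\
\frac{500{,}000\ \text{MB}}{54\ \text{MB/s}} &\approx 9{,}259\ \text{s} \approx 2.6\ \text{hours per disc}
\end{align*}
```

A couple of hours to fill a single disc is acceptable for archive, where data trickles in, but it is a tough fit for a backup window that has to move many terabytes a night.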

For those of you ready to dump tape or disk for anything that gives you the portability and cost of tape with the random-access nature of disk, it looks like you're going to have to wait a bit.

 
 

Include All Files; Reject Some

Written by W. Curtis Preston Wednesday, 29 June 2011 16:57

I had a twitter chat with @JLivens the other day where the question was "what do you back up?"  My first response was to, of course, say that I back up everything - thrice - 'cause I'm me. If you're curious, my critical personal data is synced on multiple computers with history using Dropbox (which I'm reconsidering based on how things have been going over there lately), then it's backed up with the free version of CrashPlan to another computer that isn't at my house, AND I can't resist the urge to throw in a Time Machine backup every once in a while.  You know what?  I haven't done one of those in a week or so.  Just a second.  My little Time Machine icon is spinning now. Ah, there.  I feel better.

Side note: for all my talk about tape lately, you'll notice that I don't have any tape in my setup for now.  I am about to embark on a project that may make me reconsider that as I might have an archiving need soon.  Can't keep it all on spinning disk!

Alright, back to the topic at hand.  What do I back up?  I actually do back up everything, but that is not the point I wanted to get across in this post. 

It's easy to come up with a list of directories you don't want to back up.  Your /tmp folder, your "Temporary Internet Files," your folder on your work laptop that contains the illegally downloaded movies that you should have never downloaded in the first place.  Yeah, I'm talking to you.  Pay for the media/software you consume.

But what I wanted to talk about was how to make your backup selection if you want to exclude things.  What I've found is that the human tendency is to say "just back up the Documents folder," or something like that.  And that is what I really want to talk you out of.  There is too much risk doing it this way.  You could accidentally put some important data in a directory you're not backing up.  You could create a whole other directory that contains really important data and forget to add it to the list.  The risk outweighs the benefit of excluding the other data.

If your backup software has the ability, please have it autoselect both filesystems/drives and folders/directories.  If it supports it and if you want to do so, you can also create an exclude list of the directories you definitely don't want to back up.
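Here's what "include everything, exclude a few things" can look like with plain tar.  This is only a sketch: the function name and the file layout are assumptions, and it relies on the --exclude-from option found in GNU tar and bsdtar.  With GNU tar's default matching, a bare pattern such as tmp excludes any directory named tmp at any depth, so check how your own tool interprets patterns before relying on it.

```sh
#!/bin/sh
# backup_everything SRC DEST_TARBALL EXCLUDE_FILE
# Archive everything under SRC except paths matching a pattern listed
# (one per line) in EXCLUDE_FILE.  Nothing is named on an include list,
# so a directory created tomorrow is picked up automatically.
# DEST_TARBALL should live outside SRC so the archive doesn't include itself.
backup_everything() {
    tar -czf "$2" --exclude-from="$3" -C "$1" .
}
```

For example, with an exclude file containing the single line tmp, running `backup_everything /home/me /backups/me.tar.gz /home/me.excludes` (hypothetical paths) backs up every directory under /home/me except those named tmp, including directories that didn't exist when the exclude list was written.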

And that's what I came to say: back up everything, but exclude what you don't want.  Hopefully the title makes sense now.

 

Schedule Tweets from the command-line

Written by W. Curtis Preston Tuesday, 21 June 2011 07:29

We at Truth in IT have several events that we need to invite people to, and twitter is one of the ways we do that.  Scheduling such tweets in advance is a great way to make sure you send the right tweet at the right time, and Twuffer.com (short for Twitter Buffer) is an easy way to make such automated tweets happen.  The only problem is that each scheduled tweet in twuffer.com takes several mouse clicks, each of which is followed by a screen refresh.

I wondered if there was an easier way.  I'm proficient in old-style Bourne shell programming in Unix/Linux (never did get very good at Perl, but I rock at Bourne Shell) and I know how to use cron, so if I could just find a way to tweet from the Linux command-line I figured I could make my own twuffer.

An Internet search for "tweet from the command line" turned up this and this article.  I got all excited, then disappointed once I realized those were using basic authentication, which was disabled in June of last year.  It was replaced by OAuth, which allows you to authorize an app to use your twitter account without giving it your twitter password.

A Google search for "oauth twitter commandline post" turned up this post from Joe Chung's "Nothing of Value" blog called "Twitter OAuth Example."  He explains a series of separate PHP scripts that, if run and edited in the proper order, will result in you having a script called tweet.php that is actually your own properly registered and authorized twitter app that can send tweets from the command line.

While I was able to figure out Joe Chung's instructions (and I'm incredibly thankful for them and the code that comes with them), I wanted to adapt his code and instructions a little bit for those who may not be as adept at coding.  And I've also added my own code around the final tweet.php script to support scheduled tweets.

Before You Start

If you want to understand more about OAuth and how it works, you should read the original blog post.  Each major step below is also a link to the original instructions from twitter.

What You'll Need

You will need a Unix/Linux command line (or something like it), php and cron to make all of this code work.  If you don't have cron or something like it, you won't be able to send scheduled tweets, but you will still be able to send tweets from the command line.  You'll also need to have a basic understanding of the command line.  Unlike the original code from Joe, though, you won't have to edit any of the PHP scripts.

Step 0: Download my modified code

You can download all of my source files here: http://www.backupcentral.com/twitterapp.zip
Unzip them into a directory, then cd into that directory.  The first six steps of this post follow the ones from the original post.   I again urge you to read the original post, as he really deserves all the credit for figuring this out.  All I did was hack his scripts to behave differently.  If you want even more information, each step is a link to the original OAuth spec from twitter.com.

Step 1: Register an application with twitter

Only registered apps can send tweets via Twitter's API.  So in order to send a tweet on the command line, you need to be your own app.  (Don't worry; the code is already written.  You just need to register the code you just downloaded as your own app.)  The first step in this process is to go to twitter.com and register your app.

Here are some pointers to help you fill out the form:

  1. Whatever you put as the name of the Twitter App is what will show up when you send tweets in the "via" column.  For example, we named ours TruthinITApp, so our scheduled tweets say "via TruthinITApp" at the end.  You can name the app whatever you want, except that the name cannot have the word "twitter" in it
  2. It doesn't matter what you put in the rest of the fields, although you should probably put a valid website, and a description of what you're up to.
  3. I put Browser as my application type, but I'm not sure if that matters
  4. Specify Read & Write or Read, Write & DM access
  5. Use twitter for login

Once you have clicked Save, you will be presented with a results page.  You need to get two values from that page: Consumer Key & Consumer Secret.  (Record these values somewhere for later.)

Step 2: Get a request token

Now you're going to do the equivalent of a user using the app for the first time.  You will log in to twitter, then try to use the app.  Twitter will ask if you authorize the app.  After you do that, it gives you another value you need.

1. Log in to twitter as the user you wish to send tweets as.
2. Run the following command, substituting the two values of consumer_key and consumer_secret you got in Step 1.

$ php getreqtok.php consumer_key consumer_secret

This will display a URL followed by a command.  You will use those two strings in the next two steps.

Step 3: Authenticate the user and authorize the app to tweet for the user

Cut and paste the URL from the previous step into your browser.  (This is the equivalent of using the app for the first time as the user you want to tweet as.)  Once you click Authorize App, it will display a seven-digit number that you will then append to the command displayed in the output of the previous step.  (Record the value for later.)

Step 4: Get the access token and secret

Now that the app has been authorized to tweet for the user, the app needs to establish a special key and secret (think username and password, but without actually giving them your password) that it will use each time it tweets on your behalf.  The command will look something like the following command, where consumer_key and consumer_secret are the values that you got when you registered your app, oauth_token and oauth_token_secret are the values the app was given when the app was authorized by the user, and authkey is the seven-digit value from the web page.

$ php getacctok.php consumer_key consumer_secret oauth_token oauth_token_secret authkey

This command will display the next command that must be run, which is the actual tweet.php command, along with all the arguments you need to pass to it.  It will look something like the following, where access_token and access_token_secret are the values returned by the previous command; they act as the unique username/password combo for this app and this user. (Notice the access token actually starts with your twitter user ID -- the number, not the name.)

$ php tweet.php "Hello World..." access_token access_token_secret consumer_key consumer_secret

Step 5: Post a tweet on the command line

Start your twitter client or monitor twitter.com for the user you're going to send the tweet as.

Run the command above, and you should see a bunch of text fly by.  As long as you don't see errors like "Invalid Token" or anything like that, your tweet should have gone through.  

You just sent your first command-line tweet!

Scheduling tweets using cron and tweet.sh

In addition to the code above that was written by Joe Chung, I wrote tweet.sh, which uses tweet.conf and tweet.txt to automate the sending of tweets using cron.  The rest of this blog post is about how to use those tools, which are also in the code you downloaded in Step 0.

Step 6: Edit tweet.conf with the appropriate keys and secrets

Put the values of consumer_key and consumer_secret as the second and third fields in the consumer_key line:

consumer_key:<consumer_key>:<consumer_key_secret>

Create a line for each user that you have authorized using the steps above, inserting that user's access token and access token secret from Step 4:

username:<access_key>:<access_key_secret>

Step 7: Add a cron job that will run tweet.sh every minute for you:

* * * * * /workingdirectory/tweet.sh workingdirectory >/tmp/tweet.out 2>&1

Where workingdirectory is the directory where you installed the code.

Step 8: Edit tweet.txt and add a tweet scheduled for sometime in the near future.

The format for tweets is as follows (where "|" is the field separator):

MON DD HH:MM|username|Tweet goes here

Here's an example.  First, get the current date

$ date
Tue Jun 21 03:20:22 EDT 2011

(Yes, I'm up a little late working on this post...)

Second, add a tweet to the file for a few minutes from now

$ echo "Jun 21 03:22|testuser|Test tweet1" >>tweet.txt

Please note that I used "|" as the field separator.  This means you cannot use the "|" character in any of your tweets.  One other note: Twitter will not let you send the same tweet twice, so you will need to change your tweet phrase if you want to do more testing.

When Jun 21, 03:22 rolls around, tweet.sh will send your tweet.  If tweet.php returns successfully (indicating a successful tweet), tweet.sh removes the line from tweet.txt and appends it to completedtweets.txt.  If there was a problem sending your tweet (such as it being a duplicate), then it leaves the line in the tweet.txt file.

That's it.  All you need to do to send tweets in the future is to add them to tweet.txt and they will magically happen.  You can put blank lines, comments, or whatever other formatting you want in tweet.txt, as long as the actual tweet lines follow the format in step 8.
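If you're curious how a scheduler like this can hang together, here's a minimal sketch of the logic described above.  To be clear, this is not the tweet.sh from the download; it's a hypothetical re-creation based only on the file formats and behavior described in this post.  The function names (due_tweets, lookup_keys, send_tweet, run_once) are made up, and it assumes your date command prints English month abbreviations.

```sh
#!/bin/sh
# Hypothetical sketch of a tweet.sh-style scheduler (NOT the downloaded script).

# due_tweets FILE NOW: print the lines of FILE whose first "|"-separated
# field equals NOW (e.g. "Jun 21 03:22").  Comments and blank lines never match.
due_tweets() {
    awk -F'|' -v now="$2" 'NF >= 3 && $1 == now' "$1"
}

# lookup_keys CONF NAME: print fields 2 and 3 of the line whose first field is NAME.
lookup_keys() {
    awk -F: -v name="$2" '$1 == name { print $2, $3; exit }' "$1"
}

# send_tweet DIR TEXT ACCESS_TOKEN ACCESS_SECRET CONSUMER_KEY CONSUMER_SECRET
# Argument order follows the tweet.php command shown in Step 4.
send_tweet() {
    php "$1/tweet.php" "$2" "$3" "$4" "$5" "$6" </dev/null >/dev/null 2>&1
}

# run_once DIR [NOW]: send every tweet scheduled for NOW (default: this minute).
run_once() {
    dir=$1
    now=${2:-$(date '+%b %d %H:%M')}
    due=$(mktemp) || return 1
    due_tweets "$dir/tweet.txt" "$now" > "$due"
    set -- $(lookup_keys "$dir/tweet.conf" consumer_key)
    ckey=$1 csecret=$2
    while IFS= read -r line; do
        user=$(printf '%s\n' "$line" | cut -d'|' -f2)
        text=$(printf '%s\n' "$line" | cut -d'|' -f3-)
        set -- $(lookup_keys "$dir/tweet.conf" "$user")
        if send_tweet "$dir" "$text" "$1" "$2" "$ckey" "$csecret"; then
            # Success: record the tweet, then drop its line from tweet.txt.
            printf '%s\n' "$line" >> "$dir/completedtweets.txt"
            # grep exits 1 when nothing is left to print; that is not an error here.
            if grep -vxF -e "$line" "$dir/tweet.txt" > "$dir/tweet.txt.new" || [ $? -eq 1 ]; then
                mv "$dir/tweet.txt.new" "$dir/tweet.txt"
            fi
        fi
    done < "$due"
    rm -f "$due"
}
```

A cron entry would call `run_once /workingdirectory` once a minute.  Because a line is removed only after send_tweet succeeds, a failed tweet stays in tweet.txt, which matches the behavior described above.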

Please let me know if this post was helpful.  Also please post any suggestions on how to make the code better.  If I can make it work, I'll update the code and the post.

 
 

Tape more reliable than disk for long term storage

Written by W. Curtis Preston Thursday, 02 June 2011 00:46

Tape is inherently a more stable magnetic medium than disk when used to store data for long periods of time.  This is simply "recording physics 101," according to Joe Jurneke of Applied Engineering Science, Inc. 

I had heard rumblings of this before, but it was Joe that finally explained it in almost plain English in a post to this thread from hell on LinkedIn.  Here's the core of his argument:

By the way, the time dependent change in magnetization of any magnetic recording is exponentially related to a term known as KuV/kt. This relates the "blocking energy" (KuV) which attempts to keep magnetization stable, driven by particle volume (V) and particle anisotropy (Ku) to the destabilizing force (kt) the temperature in degrees kelvin (t) and Boltzmans constant (k).  Modern disk systems have KuV/kt ratios of approximately 45-60. Modern production tape systems have ratios between 80 and 150. As stated earlier, it is exponentially related. The higher the ratio, the longer the magnetization is stable, and the more difficult it is to switch state.....Recording Physics 101....

I had to call him to get more information.  He explained how this came about.  Disk drives have been pushed for greater and greater densities, which caused their vendors to create a much tighter "areal density."  Tape, on the other hand, mainly got longer and fatter to accommodate more data in the same physical space.  (Yes, it increased areal density, too, but nowhere near as much as the disk drive folks did.)  The result is that the tape folks have more room to play, allowing them to use magnetic particles with a bigger particle volume (the V in the equation).  The bigger the particle volume, the more stable the magnetism is, according to the KuV/kt equation.  In addition, tapes are generally stored outside of the drive, which means their temperature is lower than disk drives.  That means they have a lower t value (degrees kelvin), which is one of the "bad" numbers in the KuV/kt equation.  Having a higher V value and a lower t value is what translates into tape systems having ratios of 80-150, vs disk systems that have ratios of approximately 45-60. While I don't have an exact cite to point to in order to show these exact values, what he's describing makes perfect sense to me.
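To see why "exponentially related" matters so much, it helps to write the relationship down.  The form below is the standard Néel-Arrhenius relaxation law, which I'm assuming is what Joe is describing; the attempt time \tau_0 is a material constant that he doesn't mention, and the 60 and 80 are simply the top of his disk range and the bottom of his tape range.

```latex
\begin{align*}
\tau &= \tau_0 \exp\!\left(\frac{K_u V}{k\,t}\right) \\
\frac{\tau_{\text{tape}}}{\tau_{\text{disk}}}
  &= \frac{\exp(80)}{\exp(60)} = e^{20} \approx 4.9 \times 10^{8}
\end{align*}
```

So even the least stable tape in his range would hold its magnetization hundreds of millions of times longer than the most stable disk, assuming the same \tau_0.  That is a statement about the physics of a single recorded bit, not a prediction of how long a cartridge lasts on a shelf.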
 

Add to this the fact that tape drives also have a lower bit error rate than disk.  SATA disk is 1:10^14, FC disk is 1:10^15, LTO is 1:10^16, and IBM 3xx0 and Oracle T10000s are 1:10^17.
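Those ratios are easier to compare when converted into how much data you can expect to read before hitting one bad bit.  This is just unit conversion on the figures above (8 bits per byte, decimal TB and PB):

```latex
\begin{align*}
\text{SATA disk: } 10^{14}\ \text{bits} &= 1.25 \times 10^{13}\ \text{bytes} = 12.5\ \text{TB} \\
\text{FC disk: } 10^{15}\ \text{bits} &= 125\ \text{TB} \\
\text{LTO: } 10^{16}\ \text{bits} &= 1.25\ \text{PB} \\
\text{IBM 3xx0 / Oracle T10000: } 10^{17}\ \text{bits} &= 12.5\ \text{PB}
\end{align*}
```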

Add to this the fact that tape drives always do a read after write, where disk drives do not always do this.

Sooo...

Tape drives:

  1. Write data more reliably than disk
  2. Read it after they've written it to make sure they did (where disks often don't do that)
  3. Have significantly less "bit rot" or "bit flip" than disk drives over time.

Like I said in a previous post, I think we've put these guys out to pasture a little too soon.

 

My Detente With EMC's DD Archiver

Written by W. Curtis Preston Friday, 20 May 2011 19:12

When I first heard about the EMC disk archiver, I blew my stack.  I don't remember exactly how it was presented to me, but what I heard was that EMC was coming out with a disk product that was designed to hold backups for seven years or more.  Since storing backups for seven years or more is fundamentally wrong (and no one -- and I mean no one -- argues with that), the idea that EMC was coming out with a product that was designed specifically to do that angered me.  Brian Biles, VP of Product Management for EMC's BRS division, said with a wry smile, "so you're saying we've become a tobacco company."

I replied saying, "No, you've become a cigarette case manufacturer.  You shouldn't smoke, kids, but here's a really pretty gold case to hold your ciggies in."  I had a similar conversation with Mark Twomey (@storagezilla) on Twitter.

Since that time, I have come to a detente.  I still wouldn't buy one of these for my long term storage needs, but I can see why some other people might want to do so -- and I don't think those people are wrong or committing evil or data treason. This blog post is about how I got here from there.

Here were my arguments against this product:

There's no way that this could cost less than tape

Some of the messaging that I saw for the Archiver suggested that it was as affordable as tape.  That's simply not possible.  First, let's talk about what we're competing with. (For these comparisons, I am assuming you have either a tape system or a Data Domain box, and that what we're talking about is adding the cost of extra capacity to support long term storage of backups or archives.)

A backup or archive that is kept for that long is not kept in the tape library; it's put on a shelf.  (This is because chances are that it's never going to be read from.)  Therefore, the cost for tape is about $.02/GB, which is the cost of an LTO-5 tape cartridge.  The daily operational cost of that tape's existence is negligible, assuming it's onsite.

The last time I checked target dedupe appliances, they were about $1/GB after discounting.  I also saw a slide that this archiver is supposed to be about 20% cheaper than a regular Data Domain.  That puts it at around $.80/GB -- 40 times greater than the cost of a tape on a shelf.  And the daily operational cost of that disk is higher than the tape because it is going to be powered on.  (The Archiver does not currently support powering down unused shelves, although it may in the future.)

Then there is the issue of dedupe ratio.  The deduped disk price above is assuming a 20:1 dedupe ratio.  Dedupe ratios do not go up over time; they actually decrease.  This is because eventually we start making new data.  (The full backup you take today is going to contain quite a bit of new data when compared to the full backup from a year ago.)  Then there's the fact that the Archiver needs to start each tier (a collection of disks) with a new full backup, thus decreasing the overall dedupe ratio of the entire unit.  (It must do this in order to keep each tier self-contained.)  The result is that you will probably get a much lower dedupe ratio on your long term data than on your short-term data.  This increases your cost.
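
The arithmetic behind the last two points can be sketched as follows. The $.02/GB and $.80/GB figures and the 20:1 ratio come from the text above; treating $.80/GB as the effective (post-dedupe) price at 20:1, and the function names themselves, are assumptions made for illustration, not vendor pricing.

```python
TAPE_COST_PER_GB = 0.02       # LTO-5 cartridge on a shelf, per the text
ARCHIVER_COST_PER_GB = 0.80   # assumed to be the effective price at 20:1 dedupe
ASSUMED_DEDUPE_RATIO = 20     # ratio the quoted disk price is based on


def raw_cost_per_gb(effective_cost: float, dedupe_ratio: float) -> float:
    """Back out the cost of a GB of physical disk from a post-dedupe price."""
    return effective_cost * dedupe_ratio


def effective_cost_per_gb(raw_cost: float, dedupe_ratio: float) -> float:
    """Cost per GB of backup data stored, given the dedupe ratio actually achieved."""
    return raw_cost / dedupe_ratio


def multiple_of_tape(effective_cost: float, tape_cost: float = TAPE_COST_PER_GB) -> float:
    """How many times more expensive than tape a given per-GB price is."""
    return effective_cost / tape_cost


if __name__ == "__main__":
    raw = raw_cost_per_gb(ARCHIVER_COST_PER_GB, ASSUMED_DEDUPE_RATIO)
    for ratio in (20, 10, 5):
        cost = effective_cost_per_gb(raw, ratio)
        print(f"{ratio}:1 dedupe -> ${cost:.2f}/GB, "
              f"about {multiple_of_tape(cost):.0f}x the cost of tape")
```

Under these assumptions the 40x gap holds only at 20:1. Halve the dedupe ratio and the gap doubles to 80x; at 5:1 it is 160x.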

If you're going to do the right thing and use archive software to store data for several years (instead of backup software), any good archive software has single-instance storage.  That means the data arrives with most of its duplicates already removed, so if you're using archive software, you're going to get an even lower dedupe ratio.

Which brings me back to my belief that there is no way this can be anywhere near as inexpensive as tape.

The good news is that I didn't hear EMC saying that the Archiver is as cheap as tape when I saw them speak about it at EMC World.  When I talked to the EMC people at the show, I told them I had heard stories of EMC sales reps showing this unit cheaper than tape by using dedupe ratios of 100:1.  (The idea is that you're going to store 100 copies of the same full backups.)  They told me that any sales rep quoting ratios like that is not speaking on behalf of EMC and is talking out of his ...  Well, you know.

There's nothing that this unit offers that justifies that difference in price

Disk offers a lot of advantages when used for day-to-day backups.  It's a whole lot easier to stream during both backups and restores.  There is no question that it adds a lot of value there.  However, the idea of backups or archives that are stored long term is that no one reads them.  If they are reading them, it's for an electronic discovery request, where the amount of time you have to retrieve the data is much greater than the time you typically have for a restore.  That longer deadline is easily met with tape as your storage medium.  Disk offers no real advantage here.

When I said this, Mark Twomey pointed out that this unit offers regular data integrity checking of backups stored on it.  I informed him that if this were important, there are now two tape library manufacturers (Quantum & Spectra Logic) that will be glad to do this for your tapes.

I will concede that disk does offer an advantage if you're using backups as your archives.  Having backups that will load instantly helps mitigate the issue of how many restores you're going to be doing to satisfy a complicated ediscovery request.

It's just wrong to store backups for many years

You should not be using your backups as archives.  You should not be using backups as archives.  If you ever get an ediscovery request for all of Joe Smith's emails for the last seven years -- and you happen to have a weekly full for each of the 364 weeks of that time frame -- you will remember what I said.

The thing is that EMC agrees. In fact, the EMC Archiver presentation starts with a few slides about how you should be doing real archiving; you should not be using your backups as archives.

They also said that they see this device as a transition device that can store both backups and archives.  Just because this device can store backups doesn't mean you have to store backups on it.  You can use proper archive software.  (But, if you did, I once again point out that your dedupe ratio will go down and therefore your effective cost per GB will go up.)

So what's changed, then?

I had a number of good conversations with EMC folks at last week's EMC World.  (Which, for the record, was a really big show.)  Some of those comments are above.  They know that this is not going to be cheaper than tape, and they're saying that anyone who claims otherwise is not being truthful.  They know that storing backups for years is wrong; they also know that more than half of the world does it that way.

The reason for the detente, however, is that I realize that many people hate tape.  I think they're wrong, as I've stated more than a few times.  There are plenty of IT departments that have a "get rid of tape" edict.  If the goal is to get rid of tape, the fact that the alternatives are much more expensive is not really an issue.  And if you're going to store backups for a really long time on disk, then at least EMC put some thought into what a disk system would need to do in order to do that right.  This includes things like fault isolation.  If you lose one tier for whatever reason, you only lose the data on that tier.  It includes things like scanning data occasionally to make sure it's still good.

Finally, Index Engines also announced an important product at EMC World that will help increase the value of the Archiver for those using it to store backups.  They already have a box that can scan tape backups and basically turn them into archives.  (One of the coolest products I've ever seen, BTW.)  They now support NFS, so you can point an Index Engines box at a DD Archiver and voila!  Those backups that you are storing on disk magically become fully searchable, ediscovery-ready archives.

Summary

Don't use your backups as archives.  Use archive software instead.  Tape is still the most economical destination for long term storage of backups or archives, and it's a pretty reliable one, too.  However, if you're going to store your backups or archives on disk for many years, there are worse places to put them than the EMC Data Domain Archiver.

 
 
