Mr. Backup

Curtis' Random Thoughts

Backup Central Live Q1 starts next week!

Written by W. Curtis Preston Thursday, 19 January 2012 08:19

We've got new and exciting content for 2012, and we're starting our seminars this year with San Jose and San Diego next week. These free seminars are first-come, first-served (end users only), and we're almost at capacity in the first two cities, so you'd better act now if you want to go. I've also listed the rest of our backup seminars for Q1. (Other cities will be announced soon.)

Date     Event                  Where
Jan 24   Backup Central Live!   San Jose, CA
Jan 26   Backup Central Live!   San Diego, CA
Feb 7    Backup Central Live!   Raleigh, NC
Feb 21   Backup Central Live!   Miami, FL
Feb 23   Backup Central Live!   Tampa, FL

See you there!

 

Stop SOPA/PIPA

Written by W. Curtis Preston Wednesday, 18 January 2012 11:10

Before I say anything about SOPA, let me say that I am not battling SOPA because I’m into illegally downloading books/music/movies. As a provider of content (author of three books), I am strongly FOR paying for the media I use. And don’t give me that crap about how it isn’t stealing because you were never going to buy it anyway. You didn’t have something, now you have it, and you neither paid for it nor obtained the permission of the person who provided it. You call it what you want; I call it stealing.

Now that THAT is out of the way….

What I’m also very much against is the government wasting time and MY money trying to stop something they will never stop.  I’m against SOPA for many of the reasons I’m against the TSA.  The TSA is security theater; SOPA is anti-piracy theater.  The only thing it will accomplish besides wasted money is some government folks getting to say that they did what they could — all while wasting millions of taxpayer dollars and the time of other companies’ IT departments.

Chime in!  Especially over at my new domain stopsopacentral.com. ;)

 

 

Contemplating File Sync/Sharing Services

Written by W. Curtis Preston Monday, 31 October 2011 15:23

I wrote a few months ago about what a difference the cloud has made for how I conduct business.  I rarely buy software for my new company anymore; I often am paying for some type of cloud-delivered service.

One of those services that I use (and love) is Dropbox. It is an incredibly easy replacement for a file server when you need to share tens to hundreds of GB of files between multiple users. However, I definitely have some security concerns about it, and not just since the big snafu a few months ago.

One of my issues with Dropbox is that they can access my data. Data is encrypted in transit, but they can access my data because they have my password. The same appears to be true of Syncplicity & SugarSync. Why do I think that? Because they have a "reset my password" link. How does encryption work if they can change my password without a problem? Compare this, for example, to Wuala's answer and Boxcryptor's answer to the question about a lost password.
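The reasoning behind the "reset my password" test can be sketched in a few lines of Python. This is only an illustration under one assumption: that a service derives the encryption key on the client purely from the user's password. The function name, the iteration count, and the sample passwords are all made up for the example, and none of the vendors above has confirmed that it works this way.

```python
import hashlib
import os

def derive_key(password: str, salt: bytes) -> bytes:
    """Derive a 32-byte key from a password using PBKDF2-HMAC-SHA256."""
    # The iteration count is an illustrative value, not a recommendation.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

salt = os.urandom(16)
old_key = derive_key("my original password", salt)
new_key = derive_key("password after a reset", salt)

# The same password and salt always produce the same key...
print(derive_key("my original password", salt) == old_key)
# ...but a different password produces a different key, so data encrypted
# under old_key cannot be decrypted with new_key.
print(new_key == old_key)
```

If a provider can reset the password and the existing files are still readable afterward, the key protecting those files was evidently not derived from the password alone, which suggests the provider holds it somewhere.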

Even with Wuala, who says they don't know my password, how do they share encrypted data with users I specify?  If all data is encrypted/decrypted locally, how does the person with whom I'm sharing files decrypt them?  I'm curious.

The last two services listed below are open source alternatives. They're too limited in functionality for me, but I thought I'd throw them on there anyway.

SugarSync

Wuala

Syncplicity

Boxcryptor

iFolder

Sparkleshare

What do you think about all this?  Anyone I left out that I shouldn't have?

 

Veeam excites and frustrates me

Written by W. Curtis Preston Saturday, 01 October 2011 23:41

Veeam is one of the most innovative backup and recovery tools designed specifically for VMware and Hyper-V. They've also done a really good job of marketing this tool. In a matter of a couple of years, they've gone from "who's Veeam?" to the mindshare leader in this space. I'm not sure what their actual market share is, and there are several other tools that are also making a name for themselves, but it's hard to think of a product that has more successfully captured the hearts and minds of its target market than Veeam.

They announced their vPower functionality at Tech Field Day in Seattle quite some time ago.  To summarize, this is the ability to run a VM from their backup image of that VM.  This opens up all sorts of different levels of functionality, such as instant VM recovery and automated, full testing of the viability of your backups of a given VM.

This is why I looked forward to their presentation at Tech Field Day 7. At first, I was not disappointed. They announced support for Hyper-V. Yay! They also announced further refinement of their vPower functionality. (They even gave me credit in one of the PowerPoint slides for a suggestion I made that they acted on.) They also hinted at a new version that is almost out, but wouldn't really talk about it or show it. We definitely were not allowed to ask questions about it. Note to future Tech Field Day presenters: I can't think of a way to frustrate bloggers more than to tell them about a new version that you're not going to talk about, show us, or let us ask questions about. To make matters worse, they kept hinting about the new version throughout the presentation, but then kept telling us we couldn't ask about it.

Where the wheels fell off the truck for me was when I brought up the fact that most Veeam customers use Backup Exec to back up Veeam. Another way to say that is that Veeam can't back itself up. This resulted in a 20-minute conversation during which I got quite riled up, while Doug Hazelman kept looking at me like he had no idea why I had such an issue with this. You can watch the whole conversation here. It's from 1:24 to 1:45. He occasionally snickered, as if to say that the whole point of the discussion was ludicrous. At one point he actually said the statement that they can't back themselves up was "stupid." Yet he confirmed that the most common practice for Veeam customers was to use Backup Exec to back up Veeam.

Veeam data is stored in two places: the SQL database and the backup jobs directory. There is no way within the product to make a special backup of the SQL catalog so that it can be easily restored without creating a catch-22 situation. For example, one suggestion was to use one Veeam server to back up another Veeam server. That creates a catch-22 of having to restore one server before you can restore the other server. What if both servers are gone? Doug hinted that losing the SQL database just isn't that big of a deal because it's just job configuration information. You could just redo it if you lost it. Is this really a backup company talking to me?

The second part of their data is the backup jobs history.  It has no catalog; everything that Veeam needs to know about the backups is stored with the backups.  The question is: what happens if one or more of those files gets corrupted?  What happens if some well-meaning admin looking for space deletes some jobs?  What happens if a rogue administrator deletes all of them?  As far as I could tell, Veeam has no way of recovering from this situation -- which is why most Veeam customers use Backup Exec to back up Veeam.

Doug seemed to think that I was pushing for tape support.  In a way, I was.  Tape is still the least expensive way to get data offsite.  In many organizations, it's the only way to get data offsite.  They just have too much data to be able to afford a pipe big enough to replicate their backups -- even if they have been deduplicated.  That issue aside, I wasn't pushing so much for tape as I was a method for creating a backup of my backup.  Files stored in filesystems get corrupted.  It just happened to me today.  For no apparent reason, a file whose modification time hadn't changed was telling me that it couldn't be copied.  It was a movie file on an iMac.  I can play the movie, but I can't copy the file.  Weird. That's what files on filesystems do -- and that's why we back them up.  But the guys at Veeam just don't seem to get this, and that's why they frustrate me.

On one hand, I think the idea of a backup that can test itself in a totally automated fashion is completely awesome, and a lot of other areas of functionality are very impressive as well. On the other hand, their not understanding the issue I do have (and therefore not addressing it) is really frustrating. I hope we can work this out eventually, but they'll first have to stop calling what I'm saying "stupid." ;)

 

Dell going for the big time

Written by W. Curtis Preston Saturday, 01 October 2011 21:58

Dell is going to build a unified storage system that has everything you could ever want in a mid-tier or enterprise-tier storage system. Or so said the presenters at Tech Field Day 7. Only time will tell.

I was one of several bloggers visiting Dell's headquarters in Round Rock, TX (a short drive from Austin) last month just prior to VMworld. (That's my excuse for this blog entry being so late, BTW.) Dell apparently paid for a double sponsorship from Stephen Foskett of Gestalt IT so that they could talk to us for four hours (instead of the usual two). They had a lot to talk about.

They made sure we knew about all of the major acquisitions that Dell has made over the past few years:

  • EqualLogic - A scalable iSCSI grid storage array
  • Exanet - A scalable NAS system
  • Perot Systems - Professional Services
  • Ocarina - Deduplication and Compression
  • Compellent - Midrange storage arrays
  • RNA Networks - Cloud memory
  • Scalent Technologies - Datacenter management software

I believe it was Carter George who explained all this, and explained how Dell was going to integrate these technologies faster and better than any other storage company has ever done. The way he described it, it was as if Dell would come out with a totally unified scalable storage system that supported iSCSI, NAS, dedupe and compression that could meet the needs of the mid-market and enterprise market, while being easy to manage in a datacenter -- and be cloud ready. And they were going to do all of this reeeeal soon. He didn't give dates, but the way he was talking, it sounded like 2012.

Dell, you see, "is starting from scratch." Those other vendors weren't. The problem is that I'm not sure how having several products from several different companies, all of which already have existing customers, is "starting from scratch."

The way this usually goes is each company becomes a faction in a big project, each wanting to put their technology into the finished product.  Each of them thinks that their technology is what's going to make things better.  I have one product in mind from the past, where it was pieced together from acquired technologies from a bunch of different companies.  The result was three levels of abstraction (one from each company) before the data ever got to disk.  The result was also a piece of crap.

Maybe Dell will be different.  I wish them the best of luck.  Good luck at tearing down the fiefdoms without damaging egos.  Good luck getting people to speak their mind when it's really important -- when the emperor appears to be getting undressed.  My personal experience with trying to do that with Dell did not go very well (to put it mildly), so I hope things have changed.

I also have concerns about how Dell salespeople will evolve to sell products that require upfront sales engineering to get the order right.  My personal experience with their sales teams so far suggests that they've got as much work to do here as they do with all their products I mentioned earlier.

I have been exposed to EqualLogic, Compellent, and Ocarina before, and have heard nothing but good about them from the field. So I think Dell has chosen some really solid building blocks to build a real storage company with. I just don't think it's going to be as easy as the presenters at Tech Field Day were trying to say it will be. I'll be more than happy to be wrong, though.

 

My first trip to VMworld

Written by W. Curtis Preston Friday, 09 September 2011 06:53

VMworld is the new industry show.  It is the show to attend and the show to exhibit at.  I was really impressed.  Here's a list of my thoughts about my trip there.

Update: I re-read this blog post this morning and felt it was too harsh and didn't contain enough of my positive reaction to the show.  I've therefore added a new paragraph or two in the beginning that explains my overall reaction to VMworld.  I also added some photos. No, I didn't get any complaints. This is just a case of writer's remorse.

VMworld is a very impressive show. The main session was the biggest such session I've ever seen. Attendance was around 20,000 people, which is more than last year, which was bigger than the year before, etc. In a world of ever-shrinking tradeshows, it's nice to see one that's growing. I liked the way they did the virtual park, and the way they had volleyball, basketball, and badminton courts (with what appeared to be pros who would play with you). The attendance at the opening keynote was incredible. (The content of Paul Maritz's talk, or the what-appeared-to-be-scripted "interview" of the three CIOs later... not so much. Due to my impression of that talk, I slept in the following morning and didn't go to the next morning's general session, only to be disappointed by all the tweets about how much better THAT talk was.)

VMworld and the Venetian also did a very good job of shuffling 20,000 people around the various venues, including lunch.  I never felt like I was ever waiting in line anywhere.  There was the occasional traffic jam, of course, but nothing compared to what I've seen at some shows. Food was decent, and there were healthy options if that's what you were looking for. 

The treatment of the press was very good.  We had a press-only area with meals, drinks, and snacks where we could relax, write, blog, etc. Then they had a place where press could bring non-press people for interviews.  They also had dedicated Q&A sessions for the press.  All of that was very close by, which made it all very convenient.

Overall, it's a very good show with a lot of content (if that's what you're looking for) and a lot of exhibitors (if that's what you're looking for). You could do a lot worse. Now, my feedback...

1. Registration was easy

That is, just getting registered.  And then....

2. Session builder was horrible

You were required to register for any sessions you wanted to attend.  That's fine.  However, the system you had to use to do that was one of the worst designed web pages I've ever worked with.  Every mouse click resulted in a refresh of the entire page with a list of all sessions.  Many sessions were listed in multiple places, instead of just listing the session once with multiple times.  Registering for each session required many, many mouse clicks and a popup.  Then, of course, it was followed by a page refresh.  Yuck.

One cool feature was that you could export your schedule to your calendar.  That was nice.

3. The exhibit hall was huge, huge, huge.

It's not just that this was bigger than EMC World or Symantec Vision, or any other large industry show.  It's that it contained almost anyone who was anyone.  In the backup world, you're not going to see SyncSort or CA at EMC World, but you do see them here.  This shows how separate EMC continues to allow VMware to be.

In fact, the exhibit hall was so big, and there were so many vendors there that I hadn't seen in a long time, that I had to give up almost all the sessions I had planned to attend just to make time to see all the vendors in my space.  And that's just in the backup space!

4. The exhibit hall is a little out of control

Certain vendors (and you know who they are if you were there) send people so far out in the aisle that you can't get past them without being accosted. They would literally stand in front of you, forcing you to interact with them. This is regardless of how many times you went by the booth, or whether or not you had any interest in their technology.

Many vendors exceeded any reasonable noise rules.  There should be a very definite rule that your booth cannot be beyond N decibels if you're more than N feet away from the booth.  Subwoofers should be outlawed altogether.  It is soooo not cool to be the booth 30 feet away and not be able to hold a conversation because another booth is blasting away.

If you're going to hire booth babes (and there's a good argument for not doing so), can you at least have them dress professionally and not like they're going to a night club or standing on a street corner?

5. Water.  Seriously.

I was never so thirsty as when I was in the exhibit hall. You're several minutes away from any drinks you can buy. There are no complimentary sodas. So there should be water dispensers everywhere -- and they should be constantly monitored for fullness and cup availability. Almost every single water dispenser I found was either out of water, out of cups, or both. Here's an idea: how about putting the next 5-gallon water bottle next to the dispenser? If we're thirsty and it's empty, we'll put it in.

The first night I went to dinner after being thirsted to death in the exhibit hall.  I drank six glasses of water and -- not sure how to say this delicately -- my body showed me later it needed all six glasses. [Update: I heard from a few people that I put this too delicately and they didn't understand what I was saying.  I'm saying that I didn't need to go to the bathroom at all after drinking that much water.]  I was severely dehydrated just from walking around the exhibit hall.  Water was that hard to find.

Having said all of that, this is the new industry show and I will never miss it again if I can help it.

 

VMware passes Hyper-V up in the backup race

Written by W. Curtis Preston Sunday, 07 August 2011 09:00

The title may surprise none of you, but it is actually the opposite of what I said 1.5 years ago in a blog post called "Hyper-V ahead of VMware in the backup race."

Back then I was concerned that VMware did not have full VSS support.  They have since rectified that. [Update: by "full VSS support," what I mean is that it can talk properly to all versions of VSS.  Before, they did not support Windows 2008.  Now they support all versions of Windows.  There is still the problem that they only have one style of snapshot, so they aren't telling applications that they've been backed up, which means that the applications aren't truncating their logs.]

They also added changed block tracking (AKA "CBT") in vSphere, so it is possible to perform block-level incrementals on image-level backups. And since VMware is talking properly to VSS, the applications are doing what they are supposed to be doing before a backup as well. 

Now it is Hyper-V that is behind. There is no API within Hyper-V that can present to you a map of changed blocks in order to back them up. You can perform an incremental backup, of course, but an incremental backup via the Hyper-V host is going to back up everything, as every .VHD file will have changed.

This changed block tracking feature of VMware makes finding which blocks have changed much faster, and backing up just the blocks that have changed (vs. the files that have changed) is the fastest way to do an incremental backup.
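To make the difference between a file-level and a block-level incremental concrete, here is a minimal Python sketch. It is not the vSphere API; the tiny block size, the function names, and the idea of receiving the changed-block map as a list of block indexes are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List

BLOCK_SIZE = 4  # unrealistically small, so the example is easy to read

def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> List[bytes]:
    """Cut a disk image into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def block_incremental(disk: bytes, changed: Iterable[int]) -> Dict[int, bytes]:
    """Copy only the blocks that the changed-block map says were modified."""
    blocks = split_blocks(disk)
    return {index: blocks[index] for index in sorted(set(changed))}

def apply_incremental(base: bytes, incremental: Dict[int, bytes]) -> bytes:
    """Rebuild the current image from the last full backup plus an incremental."""
    blocks = split_blocks(base)
    for index, block in incremental.items():
        blocks[index] = block
    return b"".join(blocks)

full_backup = b"AAAABBBBCCCCDDDD"   # image at the time of the full backup
current_disk = b"AAAAXXXXCCCCDDDD"  # only block 1 has changed since then

incremental = block_incremental(current_disk, changed=[1])
print(incremental)                                             # one 4-byte block
print(apply_incremental(full_backup, incremental) == current_disk)
```

A file-level incremental of the same virtual disk would copy all 16 bytes, because the disk file as a whole has changed. The block-level version copies 4.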

Just like with VMware, third parties have stepped in to fill the void. So far, I know of Veeam and Arkeia that are using their source deduplication capabilities to perform sub-file incremental backup of Hyper-V machines. I'm sure there are more as well -- and if any of them mention themselves in a comment, I'll update my post.

 

Moving to the cloud

Written by W. Curtis Preston Wednesday, 03 August 2011 00:34

I'm in the process of trying to convert my church's use of an onsite file and email server to a cloud file synchronization service and a hosted Exchange service.  I've chosen SugarSync, as that appears to have a little more flexibility than Dropbox, and Sherweb for the Exchange service, as that is what we're currently using for Exchange at Truth in IT.

For file services, the idea would be to move any folder that a given person needs access to onto their local PC, synchronize that folder to SugarSync, then share that folder out to other people who would need access to it. Each of those users would then synchronize that folder to their PC and have local access to it as well. Changes would automatically be synchronized to every computer accessing the same folder. (This is the same way we use Dropbox at Truth in IT. We share one big folder, and it's synchronized to all our MacBooks.) In order for this to work at the church, I need each PC to have enough local storage space to hold the data that they need to access on a regular basis. (They can get access to infrequently accessed files via the web.) The good news is that most PCs today have way more storage than they need if they're using a fileserver.

I've started the SugarSync pilot with three of the office workers, and selected about 20GB of folders to synchronize. They have a 1.5 Mb/s T-1, so it took a little over 24 hours to upload that 20GB to SugarSync, and another 24+ hours to sync it to each computer that needs to have access to it.
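The upload time is easy to sanity-check with a little arithmetic. The Python sketch below assumes decimal gigabytes, a fully saturated link, and no protocol overhead or compression; the function name and the efficiency parameter are made up for the example.

```python
def transfer_hours(size_gb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Estimate hours needed to move size_gb over a link of link_mbps megabits/s."""
    bits = size_gb * 1e9 * 8                      # decimal gigabytes to bits
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 3600

print(f"{transfer_hours(20, 1.5):.1f} hours")  # 29.6 hours at full line rate
```

That lands in the same ballpark as the day-plus upload described above. The exact figure depends on how much data was really in the folders and on what the sync client does on the wire.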

Besides doing away with the server (and the costs associated with maintaining it), different people have experienced different benefits.

  • One staff member who does not have an Internet connection can work on his files on his laptop at home and have them automatically synchronized to SugarSync when he plugs into the church's Wi-Fi
  • One staff member who likes to work from home a lot can access all of her files at home just like she was at the church, and can stop using thumb drives to bring files back and forth, or waiting ages to download a file via the VPN
  • Another staff member needs infrequent access to office files from the house, but doesn't feel the need to sync any folders to his house.  He will instead download or upload any files he needs via the SugarSync website.

There you have it: different strokes for different folks.

 

 

Social Media and security

Written by W. Curtis Preston Friday, 22 July 2011 00:00

Social media incidents cost a typical company $4 million over the past 12 months, according to the results of a Symantec survey published today.

There have been a number of legal actions about social media in recent years, including a Financial Industry Regulatory Authority (FINRA) regulatory notice, the Romano vs Steelcase Inc and Bass vs Miss Porter's School cases (where both plaintiffs were granted discovery of the defendant's Facebook Profile), and the sexual harassment case EEOC vs Simply Storage Management LLC (where a US District Court held that social networking sites -- or SNS for short -- were discoverable). This means that what your employees do on their personal time on SNSs can open your company to embarrassment and litigation. The survey, then, sought to find out how big this problem is in the enterprise. They hired Applied Research to interview IT professionals from 1200+ enterprises with 1000+ employees.

45% of respondents use SNSs for personal use, and 42% use them for company use.  IT folks are worried about employees sharing too much information (46%), the loss or exposure of confidential information (41%), damage to the brand (40%), exposure to litigation (37%), malware (37%), and violating regulatory rules (36%). 

The respondents to the survey listed 9 social media "incidents" in the past 12 months, with 94% of those incidents having consequences, including damage to the brand (28%), loss of data (27%), or lost revenue (25%).  The average cost of a social media incident was listed as $4.3M!

Most of the companies are discussing creating a social media policy, training their employees, putting processes in place to capture confidential information, and putting technology in place to stop these things from happening as well. However, what was surprising was that, while almost 90% of respondents felt they needed to have these things in place, only 24% had a social media policy, 22% were training their employees on social media, and about 20% were using technology to control this process.

Folks, it's happening and it isn't going away. The very least you can do is to create a social media policy and train your employees on why it is important. Those employees who are allowed to blog about company matters need to be continually reminded that their actions are discoverable. Even if their personal site may not be demonstrated to be official company policy, it surely states the opinion of one of its employees -- and those employees make up the company. And if it can be shown that one of its employees was continually doing something damaging on a publicly accessible social site and the company did nothing to stop it, that can be actionable.

Just remember: It's really easy to be a jerk on the Internet where you're not facing the person you're talking to.  You might want to dial it down a notch or two.  Just a thought.

Update 25 Jul 2011: I was given a briefing about this survey and didn't read the press release until today. During the briefing, Symantec seemed to be playing down the role that technology had to play in helping to solve this problem.  However, in the press release, it seems as if they're saying that Enterprise Vault is going to handle this by archiving social media content.  First, I have no idea why anyone who is not required to archive any content -- be it email or twitter -- would do such a thing.  If you're not required to keep something and keeping it adds no value to your business -- don't keep it!  Second, even if you did archive it, I'm trying to understand how that would help you in a discovery situation.  If someone wants to see your Facebook logs, they're going to subpoena Facebook.  That's what happened in the cases listed in this article.  So if you did archive it, now you're required to produce it.  So why would you do this if you weren't being forced?  And how would doing this help you in a trial?

 

Is Holographic Storage the future of archive & backup?

Written by W. Curtis Preston Thursday, 21 July 2011 20:26

And now for something completely different. GE researchers have announced that they have successfully demonstrated a micro-holographic material that can support 500 GB in a DVD-style disc. That's 20 times greater than most Blu-ray discs (there is a Blu-ray 100 in the works), and 100 times greater than DVDs. So does this have backup and archive potential? Let's look into that.

The first question is how fast this thing will be. The article said that it supports "data recording at the same speed as Blu-ray discs." The fastest a Blu-ray disc can currently write is 12x, which translates into 54 MB/s. That's slow in comparison to modern tape drives, but still not too shabby. It's way faster than any of the Magneto-Optical formats. Although it's not stated anywhere, I'm assuming this is a random-access format, so its access time during restores or retrievals would be very nice when compared to tape. Due to the load/unload process, it's still not going to be as fast as a hard drive unless we're talking about leaving the disc in the drive all the time. In a robotic setup, you'd have to add robotic time and load/unload time. But this would all be similar to, if not better than, the speeds we have with tape.
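For anyone who wants to check the 54 MB/s figure, here is the arithmetic as a short Python sketch. It relies on the standard 1x Blu-ray data rate of 36 Mbit/s (4.5 MB/s) and assumes the holographic disc could sustain the 12x rate for its entire 500 GB, which nobody has promised. The function names are made up for the example.

```python
BLU_RAY_1X_MB_S = 4.5  # 1x Blu-ray is 36 Mbit/s, i.e. 4.5 MB/s

def write_speed_mb_s(multiplier: int) -> float:
    """Convert a Blu-ray speed rating such as 12x into MB/s."""
    return multiplier * BLU_RAY_1X_MB_S

def hours_to_fill(capacity_gb: float, speed_mb_s: float) -> float:
    """Hours to write a full disc at a sustained speed (decimal units)."""
    return capacity_gb * 1000 / speed_mb_s / 3600

speed = write_speed_mb_s(12)
print(speed)                                # 54.0
print(round(hours_to_fill(500, speed), 1))  # 2.6
```

So a single 500 GB disc would take roughly two and a half hours to fill at that rate.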

The next question is cost, and there's nothing on that yet.  Traditionally, other optical formats have lost this race in a big way.  Only time will tell whether or not this format will change that pattern.

Finally, there's the question of long-term stability of the media itself.  I previously posted about the differences of tape vs disk in this area, and how tape is actually more stable for longer periods of time than disk is.  However, this is holographic storage and I honestly have no idea what the long term viability of data stored on such a medium would be.   I'm leaning towards the idea that it would actually be very stable, but I know that other optical formats are not as stable as one might think they would be, so...  Only time and more research will answer that question, too.

Assuming that they address the cost concerns and my hunches are right about its long term stability, I'm really leaning towards this as a long-term archival medium -- as opposed to a backup and recovery medium.  While 54 MB/s may sound like a lot, it's just not enough for today's large data centers.  Throughput doesn't matter much in archival situations, but random access does, making this really well suited to archive.

For those of you ready to dump tape or disk for anything that gives you the portability and cost of tape with the random-access nature of disk, it looks like you're going to have to wait a bit.

 

Include All Files; Reject Some

Written by W. Curtis Preston Wednesday, 29 June 2011 16:57

I had a Twitter chat with @JLivens the other day where the question was "what do you back up?" My first response was to, of course, say that I back up everything - thrice - 'cause I'm me. If you're curious, my critical personal data is synced on multiple computers with history using Dropbox (which I'm reconsidering based on how things have been going over there lately), then it's backed up with the free version of CrashPlan to another computer that isn't at my house, AND I can't resist the urge to throw in a Time Machine backup every once in a while. You know what? I haven't done one of those in a week or so. Just a second. My little Time Machine icon is spinning now. Ah, there. I feel better.

Side note: for all my talk about tape lately, you'll notice that I don't have any tape in my setup for now.  I am about to embark on a project that may make me reconsider that as I might have an archiving need soon.  Can't keep it all on spinning disk!

Alright, back to the topic at hand.  What do I back up?  I actually do back up everything, but that is not the point I wanted to get across in this post. 

It's easy to come up with a list of directories you don't want to back up.  Your /tmp folder, your "Temporary Internet Files," your folder on your work laptop that contains the illegally downloaded movies that you should have never downloaded in the first place.  Yeah, I'm talking to you.  Pay for the media/software you consume.

But what I wanted to talk about was how to make your backup selection if you want to exclude things. What I've found is that the human tendency is to say "just back up the Documents folder," or something like that. And that is what I really want to talk you out of. There is too much risk doing it this way. You could accidentally put some important data in a directory you're not backing up. You could create a whole other directory that contains really important data and forget to add it to the list. The risk outweighs the benefit of excluding the other data.

If your backup software has the ability, please have it autoselect both filesystems/drives and folders/directories.  If it supports it and if you want to do so, you can also create an exclude list of the directories you definitely don't want to back up.
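Here is what "select everything, then exclude" looks like as a minimal Python sketch. It is not how any particular backup product implements its selection lists; the function names, the pattern syntax (shell-style wildcards matched against the path relative to the root), and the sample folders are all assumptions for illustration.

```python
import fnmatch
import os
import tempfile
from typing import Iterator, List

def is_excluded(path: str, root: str, excludes: List[str]) -> bool:
    """True if the path, relative to root, matches any exclude pattern."""
    rel = os.path.relpath(path, root).replace(os.sep, "/")
    return any(fnmatch.fnmatch(rel, pattern) for pattern in excludes)

def select_files(root: str, excludes: List[str]) -> Iterator[str]:
    """Yield every file under root except those on the exclude list."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames
                       if not is_excluded(os.path.join(dirpath, d), root, excludes)]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not is_excluded(path, root, excludes):
                yield path

# Demo: NewProject was never added to any list, yet it is still selected.
with tempfile.TemporaryDirectory() as root:
    for rel in ("Documents/report.txt", "NewProject/plan.txt", "tmp/scratch.dat"):
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as handle:
            handle.write("data")
    selected = sorted(
        os.path.relpath(p, root).replace(os.sep, "/")
        for p in select_files(root, excludes=["tmp"])
    )

print(selected)  # ['Documents/report.txt', 'NewProject/plan.txt']
```

With an include list of just "Documents," the NewProject folder would have been silently skipped. With an exclude list, the worst that happens when you forget something is that you back up a little too much.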

And that's what I came to say: back up everything, but exclude what you don't want. Hopefully the title makes sense now.

 

Schedule Tweets from the command-line

Written by W. Curtis Preston Tuesday, 21 June 2011 07:29

We at Truth in IT have several events that we need to invite people to, and Twitter is one of the ways we do that. Scheduling such tweets in advance is a great way to make sure you send the right tweet at the right time, and Twuffer.com (short for Twitter Buffer) is an easy way to make such automated tweets happen. The only problem is that each scheduled tweet in Twuffer.com takes several mouse clicks, each of which is followed by a screen refresh.

I wondered if there was an easier way.  I'm proficient in old-style Bourne shell programming in Unix/Linux (never did get very good at Perl, but I rock at Bourne Shell) and I know how to use cron, so if I could just find a way to tweet from the Linux command-line I figured I could make my own twuffer.

An Internet search for "tweet from the command line" turned up this and this article.  I got all excited, then disappointed once I realized those were using basic authentication, which was disabled in June of last year.  It was replaced by oauth authentication, allowing you to authorize an app to use your twitter account without giving them your twitter password.

A google search for "oauth twitter commandline post" turned up this post from Joe Chung's "Nothing of Value" blog called "Twitter OAuth Example."  He explains a series of separate PHP scripts that, if run and edited in the proper order, will result in you having a script called tweet.php that is actually your own properly registered and authorized twitter app that can send tweets from the command line.

While I was able to figure out Joe Chung's instructions (and I'm incredibly thankful for them and the code that comes with them), I wanted to adapt his code and instructions a little bit for those who may not be as adept at coding.  And I've also added my own code around the final tweet.php script to support scheduled tweets.

Before You Start

If you want to understand more about Oauth and how it works, you should read the original blog post.  Each major step below is also a link to the original instructions from twitter.

What You'll Need

You will need a Unix/Linux command line (or something like it), php and cron to make all of this code work.  If you don't have cron or something like it, you won't be able to send scheduled tweets, but you will still be able to send tweets from the command line.  You'll also need to have a basic understanding of the command line.  Unlike the original code from Joe, though, you won't have to edit any of the PHP scripts.

Step 0: Download my modified code

You can download all of my source files here: http://www.backupcentral.com/twitterapp.zip
Unzip them into a directory, then cd into that directory.  The first six steps of my post follow the ones from the original post.  I again urge you to read the original post, as he really deserves all the credit for figuring this out.  All I did was hack his scripts to behave differently.  If you want even more information, each step is a link to the original oauth spec from twitter.com.

Step 1: Register an application with twitter

Only registered apps can send tweets via Twitter's API.  So in order to send a tweet on the command line, you need to be your own app.  (Don't worry; the code is already written.  You just need to register the code you just downloaded as your own app.)  The first step in this process is to go to twitter.com and register your app.

Here are some pointers to help you fill out the form:

  1. Whatever you put as the name of the Twitter App is what will show up when you send tweets in the "via" column.  For example, we named ours TruthinITApp, so our scheduled tweets say "via TruthinITApp" at the end.  You can name the app whatever you want, except that the name cannot have the word "twitter" in it.
  2. It doesn't matter what you put in the rest of the fields, although you should probably put a valid website and a description of what you're up to.
  3. I put Browser as my application type, but I'm not sure if that matters.
  4. Specify Read & Write or Read, Write & DM access.
  5. Use twitter for login.

Once you have clicked Save, you will be presented with a results page.  You need to get two values from that page: Consumer Key & Consumer Secret.  (Record these values somewhere for later.)

Step 2: Get a request token

Now you're going to do the equivalent of a user using the app for the first time.  You will log in to twitter, then try to use the app.  Twitter will ask if you authorize the app.  After you do that, it gives you another value you need.

1. Log in to twitter as the user you wish to send tweets as.
2. Run the following command, substituting the two values of consumer_key and consumer_secret you got in Step 1:

$ php getreqtok.php consumer_key consumer_secret

This will display a URL followed by a command.  You will use those two strings in the next two steps.

Step 3: Authenticate the user and authorize the app to tweet for the user

Cut and paste the URL from the previous step into your browser.  (This is the equivalent of using the app for the first time as the user you want to tweet as.)  Once you click Authorize App, it will display a seven-digit number that you will then append to the command displayed in the results of the previous step.  (Record the value for later.)

Step 4: Get the access token and secret

Now that the app has been authorized to tweet for the user, the app needs to establish a special key and secret (think username and password, but without actually giving them your password) that it will use each time it tweets on your behalf.  The command will look something like the following command, where consumer_key and consumer_secret are the values that you got when you registered your app, oauth_token and oauth_token_secret are the values the app was given when the app was authorized by the user, and authkey is the seven-digit value from the web page.

$ php getacctok.php consumer_key consumer_secret oauth_token oauth_token_secret authkey

This command will display the next command that must be run, which is the actual tweet.php command, along with all the arguments you need to pass to it.  It will look something like the following, where access_token and access_token_secret are the values that the previous command obtained; together they act as the unique username/password combo for this app and this user.  (Notice the access token actually starts with your twitter user ID -- the number, not the name.)

$ php tweet.php "Hello World..." access_token access_token_secret consumer_key consumer_secret

Step 5: Post a tweet on the command line

Start your twitter client or monitor twitter.com for the user you're going to send the tweet as.

Run the command above, and you should see a bunch of text fly by.  As long as you don't see errors like "Invalid Token" or anything like that, your tweet should have gone through.  

You just sent your first command-line tweet!

Scheduling tweets using cron and tweet.sh

In addition to the code above that was written by Joe Chung, I wrote tweet.sh, which uses tweet.conf and tweet.txt to automate the sending of tweets using cron.  The rest of this blog post is about how to use those tools, which are also in the code you downloaded in Step 0.

Step 6: Edit tweet.conf with the appropriate keys and secrets

Put the values of consumer_key and consumer_secret as the second and third fields in the consumer_key line:

consumer_key:<consumer_key>:<consumer_key_secret>

Create a line for each user that you have authorized using the steps above, inserting that user's access token and access token secret from Step 4:

username:<access_key>:<access_key_secret>
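
To show how a colon-separated file like this can be read, here is a minimal sketch of a lookup in Bourne shell. It is not the code from tweet.sh; the function name conf_lookup is made up for illustration, and it assumes none of the keys or secrets contain a colon or a space.

```shell
#!/bin/sh
# conf_lookup NAME CONF_FILE
# Prints the second and third fields of the first line in CONF_FILE whose
# first field is NAME (for example "consumer_key" or a twitter username).
conf_lookup() {
    awk -F: -v name="$1" '$1 == name { print $2, $3; exit }' "$2"
}

# Hypothetical usage: split the result into two shell variables.
#   set -- $(conf_lookup testuser tweet.conf)
#   access_token=$1
#   access_token_secret=$2
```

A name that is not in the file prints nothing, so a caller can treat empty output as "this user has not been authorized yet."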

Step 7: Add a cron job that runs tweet.sh every minute:

* * * * * /workingdirectory/tweet.sh workingdirectory >/tmp/tweet.out 2>&1

Where workingdirectory is the directory where you installed the code.

Step 8: Edit tweet.txt and add a tweet scheduled for sometime in the near future

The format for tweets is as follows (where "|" is the field separator):

MON DD HH:MM|username|Tweet goes here

Here's an example.  First, get the current date:

$ date
Tue Jun 21 03:20:22 EDT 2011

(Yes, I'm up a little late working on this post...)

Second, add a tweet to the file for a few minutes from now:

$ echo "Jun 21 03:22|testuser|Test tweet1" >>tweet.txt

Please note that I used "|" as the field separator.  This means you cannot use the "|" character in any of your tweets.  One other note: Twitter will not let you send the same tweet twice, so you will need to change your tweet phrase if you want to do more testing.
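
If you would rather not rely on remembering the "|" rule, a small wrapper can enforce it. This is only a sketch; queue_tweet is a hypothetical helper, not part of the downloaded code, and it does nothing beyond checking for the separator and appending the line.

```shell
#!/bin/sh
# queue_tweet WHEN USERNAME TEXT [FILE]
# Appends "WHEN|USERNAME|TEXT" to FILE (default: tweet.txt in the current
# directory). Refuses tweet text that contains the "|" field separator.
queue_tweet() {
    case $3 in
        *"|"*)
            echo "tweet text must not contain '|'" >&2
            return 1
            ;;
    esac
    printf '%s|%s|%s\n' "$1" "$2" "$3" >> "${4:-tweet.txt}"
}

# Hypothetical usage:
#   queue_tweet "Jun 21 03:22" testuser "Test tweet1"
```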

When Jun 21, 03:22 rolls around, tweet.sh will send your tweet.  If tweet.php returns successfully (indicating a successful tweet), tweet.sh removes the tweet from tweet.txt and appends it to completedtweets.txt.  If there was a problem sending your tweet (such as it being a duplicate), then it leaves the tweet in the tweet.txt file.
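
For the curious, the behavior described above can be sketched in a few lines of Bourne shell. This is not the actual tweet.sh from the download. The names process_due_tweets and send_tweet are made up for illustration, and send_tweet is only a stand-in for the real call to tweet.php with the right keys and secrets.

```shell
#!/bin/sh
# Hypothetical stand-in for the real sender. A real version would run
# tweet.php and return its exit status.
send_tweet() {
    echo "would tweet as $1: $2"
}

# process_due_tweets NOW QUEUE_FILE DONE_FILE
# NOW is a timestamp in the queue's own format, e.g. "Jun 21 03:22".
process_due_tweets() {
    now=$1
    queue=$2
    done_file=$3
    remaining=$(mktemp) || return 1
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            "$now|"*)
                rest=${line#*|}      # username|tweet text
                user=${rest%%|*}
                text=${rest#*|}
                if send_tweet "$user" "$text"; then
                    # Sent: move the line to the completed file.
                    printf '%s\n' "$line" >> "$done_file"
                    continue
                fi
                ;;
        esac
        # Not due yet, not a tweet line, or the send failed: keep it.
        printf '%s\n' "$line" >> "$remaining"
    done < "$queue"
    cat "$remaining" > "$queue"
    rm -f "$remaining"
}
```

A cron-driven caller could pass the current minute with something like process_due_tweets "$(LC_ALL=C date '+%b %d %H:%M')" tweet.txt completedtweets.txt, assuming days of the month are written with two digits. Lines that don't start with the current timestamp, including blank lines and comments, pass through untouched.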

That's it.  All you need to do to send tweets in the future is to add them to tweet.txt and they will magically happen.  You can put blank lines, comments, or whatever other formatting you want in tweet.txt, as long as the actual tweet lines follow the format in step 8.

Please let me know if this post was helpful.  Also please post any suggestions on how to make the code better.  If I can make it work, I'll update the code and the post.

 
 

Tape more reliable than disk for long term storage


Written by W. Curtis Preston Thursday, 02 June 2011 00:46

Tape is inherently a more stable magnetic medium than disk when used to store data for long periods of time.  This is simply "recording physics 101," according to Joe Jurneke of Applied Engineering Science, Inc. 

I had heard rumblings of this before, but it was Joe that finally explained it in almost plain English in a post to this thread from hell on LinkedIn.  Here's the core of his argument:

By the way, the time dependent change in magnetization of any magnetic recording is exponentially related to a term known as KuV/kt. This relates the "blocking energy" (KuV) which attempts to keep magnetization stable, driven by particle volume (V) and particle anisotropy (Ku) to the destabilizing force (kt) the temperature in degrees kelvin (t) and Boltzmans constant (k).  Modern disk systems have KuV/kt ratios of approximately 45-60. Modern production tape systems have ratios between 80 and 150. As stated earlier, it is exponentially related. The higher the ratio, the longer the magnetization is stable, and the more difficult it is to switch state.....Recording Physics 101....

I had to call him to get more information.  He explained how this came about.  Disk drives have been pushed for greater and greater densities, which caused their vendors to create a much tighter "areal density."  Tape, on the other hand, mainly got longer and fatter to accommodate more data in the same physical space.  (Yes, it increased areal density, too, but nowhere near as much as the disk drive folks did.)  The result is that the tape folks have more room to play, allowing them to use magnetic particles with a bigger particle volume (the V in the equation).  The bigger the particle volume, the more stable the magnetism is, according to the KuV/kt equation.  In addition, tapes are generally stored outside of the drive, which means their temperature is lower than that of disk drives.  That means they have a lower t value (temperature in kelvin), which is one of the "bad" numbers in the KuV/kt equation.  Having a higher V value and a lower t value is what translates into tape systems having ratios of 80-150, vs disk systems that have ratios of approximately 45-60.  While I don't have an exact cite to point to in order to show these exact values, what he's describing makes perfect sense to me.
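
One way to see why the ratio matters so much is the Néel-Arrhenius relation, a standard model of thermally activated magnetization decay. This assumes that relation is the exponential dependence Joe is describing, and that the attempt time (tau_0) is the same for both media, which is a simplification. Comparing the top of the disk range (60) with the bottom of the tape range (80):

```latex
\tau = \tau_0 \exp\!\left(\frac{K_u V}{k\,t}\right)
\qquad\Longrightarrow\qquad
\frac{\tau_{\text{tape}}}{\tau_{\text{disk}}}
  = \frac{\exp(80)}{\exp(60)} = e^{20} \approx 4.9 \times 10^{8}
```

So even in the least favorable comparison of the two quoted ranges, this model puts the expected stability time for tape hundreds of millions of times longer than for disk.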
 

Add to this the fact that tape drives also have a lower bit error rate than disk.  SATA disk is 1:10^14, FC disk is 1:10^15, LTO is 1:10^16, and IBM 3xx0 and Oracle T10000s are 1:10^17.

Add to this the fact that tape drives always do a read after write, where disk drives do not always do this.

Sooo...

Tape drives:

  1. Write data more reliably than disk
  2. Read it after they've written it to make sure they did (where disks often don't do that)
  3. Have significantly less "bit rot" or "bit flip" than disk drives over time.

Like I said in a previous post, I think we've put these guys out to pasture a little too soon.

 

My Detente With EMC's DD Archiver


Written by W. Curtis Preston Friday, 20 May 2011 19:12

When I first heard about the EMC disk archiver, I blew my stack.  I don't remember exactly how it was presented to me, but what I heard was that EMC was coming out with a disk product that was designed to hold backups for seven years or more.  Since storing backups for seven years or more is fundamentally wrong (and no one -- and I mean no one -- argues with that), the idea that EMC was coming out with a product that was designed specifically to do that angered me.  Brian Biles, VP of Product Management for EMC's BRS division, said with a wry smile, "so you're saying we've become a tobacco company."

I replied saying, "No, you've become a cigarette case manufacturer.  You shouldn't smoke, kids, but here's a really pretty gold case to hold your ciggies in."  I had a similar conversation with Mark Twomey (@storagezilla) on Twitter.

Since that time, I have come to a detente.  I still wouldn't buy one of these for my long term storage needs, but I can see why some other people might want to do so -- and I don't think those people are wrong or committing evil or data treason. This blog post is about how I got here from there.

Here were my arguments against this product:

There's no way that this could cost less than tape

Some of the messaging that I saw for the Archiver suggested that it was as affordable as tape.  That's simply not possible.  First, let's talk about what we're competing with. (For these comparisons, I am assuming you have either a tape system or a Data Domain box, and that what we're talking about is adding the cost of extra capacity to support long term storage of backups or archives.)

A backup or archive that is kept for that long is not kept in the tape library; it's put on a shelf.  (This is because chances are that it's never going to be read from.)  Therefore, the cost for tape is about $.02/GB, which is the cost of an LTO-5 tape cartridge.  The daily operational cost of that tape's existence is negligible, assuming it's onsite.

The last time I checked target dedupe appliances, they were about $1/GB after discounting.  I also saw a slide that this archiver is supposed to be about 20% cheaper than a regular Data Domain.  That puts it at around $.80/GB -- 40 times greater than the cost of a tape on a shelf.  And the daily operational cost of that disk is higher than the tape because it is going to be powered on.  (The Archiver does not currently support powering down unused shelves, although it may in the future.)

Then there is the issue of dedupe ratio.  The deduped disk price above is assuming a 20:1 dedupe ratio.  Dedupe ratios do not go up over time; they actually decrease.  This is because eventually we start making new data.  (The full backup you take today is going to contain quite a bit of new data when compared to the full backup from a year ago.)  Then there's the fact that the Archiver needs to start each tier (a collection of disks) with a new full backup, thus decreasing the overall dedupe ratio of the entire unit.  (It must do this in order to keep each tier self-contained.)  The result is that you will probably get a much lower dedupe ratio on your long term data than on your short-term data.  This increases your cost.
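
The effect of a falling dedupe ratio on price is simple arithmetic. Here is a rough sketch, assuming the quoted $/GB figure scales inversely with the dedupe ratio you actually achieve. The function name and the 10:1 and 5:1 ratios are only illustrative; they are not numbers from EMC.

```shell
#!/bin/sh
# effective_cost QUOTED_PER_GB ASSUMED_RATIO ACTUAL_RATIO
# Rescales a $/GB quote that assumed one dedupe ratio to the ratio actually
# achieved. Assumes cost per stored GB scales inversely with the ratio.
effective_cost() {
    awk -v q="$1" -v assumed="$2" -v actual="$3" \
        'BEGIN { printf "%.2f\n", q * assumed / actual }'
}

effective_cost 0.80 20 20   # the quoted case: 0.80
effective_cost 0.80 20 10   # half the dedupe ratio: 1.60
effective_cost 0.80 20 5    # a quarter of the ratio: 3.20
```

Against tape at $.02/GB, those three cases work out to 40, 80, and 160 times the cost.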

If you're going to do the right thing and use archive software to store data for several years (instead of backup software), any good archive software has single-instance-storage.  So if you're using archive software, you're going to get an even lower dedupe ratio.

Which brings me back to my belief that there is no way this can be anywhere near as inexpensive as tape.

The good news is that I didn't hear EMC saying that the Archiver is as cheap as tape when I saw them speak about it at EMC World.  When I talked to the EMC people at the show, I told them I had heard stories of EMC sales reps showing this unit cheaper than tape by using dedupe ratios of 100:1.  (The idea is that you're going to store 100 copies of the same full backups.)  They told me that any sales rep quoting ratios like that is not speaking on behalf of EMC and is talking out of his ...  Well, you know.

There's nothing that this unit offers that justifies that difference in price

Disk offers a lot of advantages when used for day-to-day backups.  It's a whole lot easier to stream during both backups and restores.  There is no question that it adds a lot of value there.  However, the idea of backups or archives that are stored long term is that no one reads them.  If they are reading them, it's for an electronic discovery request, where the amount of time you have to retrieve that is much greater than the time you typically have for a restore.  This increased amount of time is easily met with tape as your storage medium.  Disk offers no real advantage here.

When I said this, Mark Twomey pointed out that this unit offers regular data integrity checking of backups stored on it.  I informed him that if this were important, there are now two tape library manufacturers (Quantum & Spectra Logic) that will be glad to do this for your tapes.

I will concede that disk does offer an advantage if you're using backups as your archives.  Having backups that will load instantly helps mitigate the issue of how many restores you're going to be doing to satisfy a complicated ediscovery request.

It's just wrong to store backups for many years

You should not be using your backups as archives.  You should not be using backups as archives.  If you ever get an ediscovery request for all of Joe Smith's emails for the last seven years -- and you happen to have a weekly full for each of the 364 weeks of that time frame -- you will remember what I said.

The thing is that EMC agrees. In fact, the EMC Archiver presentation starts with a few slides about how you should be doing real archiving; you should not be using your backups as archives.

They also said that they see this device as a transition device that can store both backups and archives.  Just because this device can store backups doesn't mean you have to store backups on it.  You can use proper archive software.  (But, if you did, I once again point out that your dedupe ratio will go down and therefore your effective cost per GB will go up.)

So what's changed, then?

I had a number of good conversations with EMC folks at last week's EMC World.  (Which, for the record, was a really big show.)  Some of those comments are above.  They know that this is not going to be cheaper than tape, and they're saying that anyone that is saying that is not being truthful.  They know that storing backups for years is wrong; they also know that more than half of the world does it that way.

The reason for the detente, however, is that I realize that many people hate tape.  I think they're wrong, as I've stated more than a few times.  There are plenty of IT departments that have a "get rid of tape" edict.  If the goal is to get rid of tape, the fact that the alternatives are much more expensive is not really an issue.  And if you're going to store backups for a really long time on disk, then at least EMC put some thought into what a disk system would need to do in order to do that right.  This includes things like fault isolation.  If you lose one tier for whatever reason, you only lose the data on that tier.  It includes things like scanning data occasionally to make sure it's still good.

Finally, Index Engines also announced an important product at EMC World that will help increase the value of the Archiver for those using it to store backups.  They already have a box that can scan tape backups and basically turn them into archives.  (One of the coolest products I've ever seen, BTW.)  They now support NFS, so you can point an Index Engines box at a DD Archiver and voila!  Those backups that you are storing on disk magically become fully searchable, ediscovery-ready archives.

Summary

Don't use your backups as archives.  Use archive software instead.  Tape is still the most economical destination for long term storage of backups or archives, and it's a pretty reliable one, too.  However, if you're going to store your backups or archives on disk for many years, there are worse places to put them than the EMC Data Domain Archiver.

 
 

Server virtualization does NOT cause storage explosion


Written by W. Curtis Preston Friday, 06 May 2011 16:55

Server virtualization doesn't kill storage.  People kill storage.  That's all I'm saying.

I get hot under the collar when I hear people say things like "server virtualization increases storage requirements by huge amounts."  They slam server virtualization with this comment, as if changing a server from being a physical one to being a virtual one somehow magically increases its size.  They list it as a reason that you shouldn't use server virtualization.

So I got a little irked when I heard the CEO of Symantec, Enrique Salem, say something like it in his keynote this week at Symantec Vision. (It was a great show, by the way.)  "Server virtualization increases storage use by 200% - 800%," he said.  When we had the media Q&A with him, this was the first question out of my mouth.  "What about moving a server from being physical to being virtual increases storage requirements?"  I asked a similar question of every other Symantec person I met with that day, as well as when I met VMware CTO, Steve Herrod.

In retrospect, I was probably a little hard on Mr. Salem during my Q&A.  Even Steve Herrod from VMware verified that the typical VMware customer does see such a storage explosion.  However, I still stand by my statement that this is not VMware's fault.  Moving to VMware does not cause your storage to magically explode.  Moving to VMware probably does "help" it happen, though.  Here are my thoughts on that.

VMware's design actually reduces storage use

The average virtual machine image (VMDK in VMware speak) is significantly smaller than the smallest disk drive you can buy to put into a server.  The smallest hard drive I can configure in a Dell server is 250 GB. You can create a thin-provisioned VMDK and it will consume only as much storage as it needs to, which is going to be far less than 250 GB.  I don't know Hyper-V as well as I do VMware, but I'm guessing it's similar.  I would also say that moving servers into VMware/Hyper-V means that you can put all those very duplicated images on a single storage volume that supports deduplication, removing that huge storage explosion.  You can't do that if you're using physical servers with discrete hard drives.

Many people buy their first "real" storage array when they buy VMware/Hyper-V

They may feel that this "forces" them to increase their storage costs, because they're used to just buying discrete hard drives -- often with no RAID or monitoring.  They then blame this increase in cost on VMware/Hyper-V.  I don't buy that either.  First, they didn't have to do that.  They could have bought a nice HP/Dell/IBM server with internal storage and run VMware on that.  The decision to buy a storage array is a second decision.  Second, if VMware "forces" them into the 21st century as far as storage management is concerned, so be it.  It's about time they have real storage.

Server virtualization often means a lot of test/dev VMs

This was Mr. Salem's point.  VMware/Hyper-V makes it really easy to have many, many different images of different configurations, so people create dozens or hundreds of VMs in their test/dev environment, and that causes a huge increase in storage.  I again say that you could continue to do in your dev/test lab whatever it was you did before you had VMware/Hyper-V, so it isn't VMware/Hyper-V's fault that your lab now uses 10 times more storage than it used to.  But it sure does make it easy, doesn't it?  I would also say that this increase in storage is accompanied by a huge increase in usability of the lab.

VM sprawl is evil and real and it eats up storage

This was the universal comment from most everyone I talked to.  When we step out of the test/dev world, it is a reality that when you are buying physical servers, there tends to be much more of an approval process.  When all you have to do to create a new server is click the right button on your mouse, you tend to create new "servers" very quickly.  Next thing you know, you have a whole lot more servers (and images of Windows/Linux) than you ever would have had if you had physical servers. VM sprawl is real, and it should be addressed with process and procedure.

VMware and Hyper-V are not the problem here.  What we do with them is the problem.  Yes, they make it much easier to do dumb things like VM sprawl, but blaming VMware and Hyper-V for your storage explosion is like blaming Ferrari for your tickets.  Just saying.

 

Someone in Ohio should have been fired


Written by W. Curtis Preston Friday, 22 April 2011 18:50

[This story originally happened in 2007, but I just learned about it, so I blogged about it.  Then I learned that it was a four-year-old story.  Everything here still applies, even if the actual story is old.  But I did re-edit the story and change its title because the original wording seemed a bit odd four years later.]

Someone in the office of the State of Ohio should have been fired, and it isn't the guy who already got fired.  He should get his job back.  This story has me fuming.  I don't often write blogs like this, but here it goes.

The story as it was published in 2007 was "Intern loses backup tape with 800,000 SSNs on it. Intern fired."  The real story, in my opinion, is what led up to this.  I read this article and this statement from the intern, and learned that the following allegedly happened in the State of Ohio:

1. The State of Ohio used (and may still use) unencrypted backup tapes to store SSNs and names

If your company or government entity is currently making tapes of any kind with SSNs on them, then fix it.  Fixing this costs so little now that it is simply unforgivable not to be encrypting your backup tapes -- especially if you're handing them to a dude in a truck.  If you're handing them to an intern to take them home in a car… well, I really don't know what to say.

This is not a new problem.  It's not like we haven't had hundreds -- hundreds -- of exposures over the past 10 years that show how bad this practice is.  Ignorance of this problem simply isn't possible at this point.

2. Employees of The State of Ohio wanted to cover this up

They told the intern to not tell the police that one of the things stolen was a tape with sensitive data on it.  Seriously.  This tells me, of course, that they knew their unencrypted backup tape was a bad idea, and that they needed to keep others from knowing what they were doing.  It also tells me that they were liars.

3. The State of Ohio (a $52B/yr enterprise) had the money to hire $150/hr and $200/hr contractors full time, but didn't have the money to hire Iron Mountain (and still may not have it)

Seriously.  It had been the practice for apparently 10 years or more for someone to take the backup tapes home in their car.  Do I really need to say why this was stupid?  A hot car is not where tapes should ever be stored -- ever.  Asking someone who is off the clock to handle company property of any kind is also wrong.  Tapes -- especially unencrypted tapes -- should only be handled by professionals with procedures and policies to do such things.

No one ever told this young man what to do with this tape other than to bring it back the next day.  So not only was the practice to have him take it home, the practice was not to even give him any special instructions on how to handle the tape. Wow.

4. These same employees and their lawyer were bullies who needed a scapegoat and found one

The story about how they bullied this young intern into signing a resignation is just tragic.  He asked for an hour to think it over and they said no.  He asked for 20 minutes.  No.  He asked for 10 and they said no.  Just sign the paper.

Jared, if you're reading this, I would gladly act as an expert witness on your behalf for any kind of wrongful termination lawsuit you want to file. (I know this offer is a little late, but it's still out there.)

Someone in Ohio should have opened an investigation about the lack of security of taxpayers' personal information, as well as the details behind this story.  But if that never happened (and I can't find any evidence that it did), it's probably too late now.

 
 

Have we put tape out to pasture too soon?


Written by W. Curtis Preston Thursday, 21 April 2011 23:33

A week at NABShow (National Association of Broadcasters) and two days at Tape Summit last week have given me a chance to revisit my thoughts on tape.  Here's a brief summary of how my opinion of tape has changed over the years:

Stage 1: Tape was it.  It was all I knew. Backing up to disk was crazy, as it was too expensive.  (early 90s)
Stage 2: Tape was still it, but tape drives were getting too fast.  Multiplexing or disk staging was starting to be required.  Disk was too expensive to hold backups long term.
Stage 3: The dedupe craze hit.  It became both theoretically possible and financially feasible (for some) to store all backups on disk -- and still have an offsite copy.
Stage 4: (Pretty recently).  I compared the pricing of today's dedupe systems to similarly-sized tape systems.  I was shocked at how expensive disk still was (4x-8x the price of tape).
Stage 5: (Today) I think we have prematurely put a very good backup and archive target out to pasture and we should really reconsider that.

First, let me state that I am not saying that we should not have disk in a backup system, or that deduped systems are over-rated. What I am saying is that tape has more to offer than we've been giving it credit for lately. Here are some factors that came into my mind while considering this:

It costs 4-8 times more to acquire a disk-based backup system than it does to acquire an automated tape system.

While I've heard this from multiple sources, let me give you a real-life example to drive home this point.  I recently priced tape libraries and dedupe disk systems for a 20 TB shop, and I was surprised to learn that disk was actually still way more than the price of tape -- even after dedupe.  The average street price of the tape libraries I was considering was about $15K, and the average price of the dedupe systems was about $60K.   Since the customer was getting rid of their (very old) tape library, their choices were:

A) Buy a new tape library, copy tapes and hand them to a dude in a truck  ($15K)

B) Buy a dedupe system AND a tape library.  Copy from the dedupe system to the tape library, and then hand tapes to a dude in a truck. ($60K + $15K)

C) Buy two dedupe systems and replicate between them (no truck needed) ($120K)

Option C was 8 times more expensive than Option A and was out of the question.  While it meant they could get rid of their Iron Mountain bill, they did not believe they could ever save enough money to recoup that additional $105K.  Option B offered no cost savings, so it was difficult to justify the additional $60K.  I pointed out that Option A (if done correctly) requires a disk cache in front of their tape library, but they informed me that they were already doing that.  (Based on their throughput requirements, though, adding a disk cache wouldn't have added that much to the price.)

You can undoubtedly make an argument that a backup-to-disk system is easier to manage than a hybrid tape system, but the simple fact is that the disk system will be more expensive to purchase.

Tape actually has a better bit error rate than disk

For those unfamiliar with the concept of bit error rate (BER), the following definition from Wikipedia should be helpful:

"The bit error rate or bit error ratio (BER) is the number of bit errors divided by the total number of transferred bits during a studied time interval. … The bit error probability p^e is the expectation value of the BER. The BER can be considered as an approximate estimate of the bit error probability. This estimate is accurate for a long time interval and a high number of bit errors."

LTO-5 has a bit error rate of 1:10^17.  The TS1130 from IBM & the T10000C from Oracle both have a BER of 1:10^19.  SATA disk has a BER of 1:10^15 (SAS/FC is 1:10^16, but no one is using that for backup or archive).  This will probably come as a surprise to many people.  Tape has actually gotten so good at writing data, it is more reliable at writing data than disk!

While 10^15 may look really close to 10^17, it's not.  When it's bits we're talking about, it's the difference between 113 TB and 11.1 PB!  It means you are 100 times more likely to have bad data on disk than you are on an LTO-5 tape drive, and 10,000 times more likely than if the data is stored on a T10000C or TS1130 drive!
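Here is a rough Python sketch of that conversion, turning each quoted BER into the average amount of data moved per bit error. The helper name `tib_per_error` is mine, and I'm assuming binary units (1 TiB = 2^40 bytes), which is what the 113 TB and 11.1 PB figures correspond to; in decimal units the same two numbers work out to 125 TB and 12.5 PB.

```python
# Bit error rates quoted in the post: one bit error per this many bits.
BITS_PER_ERROR = {
    "SATA disk": 10**15,
    "LTO-5": 10**17,
    "TS1130 / T10000C": 10**19,
}

TIB = 2**40  # bytes in a tebibyte (binary units are an assumption here)


def tib_per_error(bits: int) -> float:
    """Average amount of data, in TiB, transferred per bit error."""
    return bits / 8 / TIB


for name, bits in BITS_PER_ERROR.items():
    # How many times more data you move per error compared with SATA disk.
    ratio = bits // BITS_PER_ERROR["SATA disk"]
    print(f"{name}: {tib_per_error(bits):,.1f} TiB per error ({ratio:,}x SATA)")
```

The ratios are the 100x and 10,000x mentioned above.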

Tape uses less power than disk

Every time I calculate power consumption for tape systems vs. disk systems, tape systems win.  The reason for this is that tapes in slots take up no power at all, tape drives use very little power while they're not doing anything, and you need far fewer tape drives than you need disk drives.  I recently did a comparison for a 20 TB shop that resulted in at least a 2X difference in power consumption, and that included enough disk to do disk staging before the tape system.  (I plan to publish this once I double/triple check my numbers, but right now I feel pretty safe in saying at least a 2X difference.)

You buy the system once; you power it all day long every day.

Long-term (5+ years) storage of data on disk is not compatible with the typical lifecycle of disk, but it is compatible with tape.

This one is something we don't talk about.  An individual tape is made to hold data much longer than an individual disk, and the lifecycle of most tapes is much longer than the lifecycle of most datasets.  You cannot say the same about disks.  Storing data on disks for more than 5 years automatically assumes that you're going to migrate data from one disk unit to another.

In addition to the media, it is also very common for tape libraries and tape drives to outlast the disk systems sitting next to them. Where most companies migrate data at the end of the depreciation cycle for disk, they tend to keep their tape libraries and drives much longer than that.  They also tend to swap out their drives in the tape libraries; the same is not true in disk units. If you find a disk system in your data center older than five years, I'd be shocked.

What's the problem then?

Let's throw out the claims I've heard:

1. Tape has bitrot

So does disk.  It's called magnetism.  It happens.  The chances of bitrot happening on tape are far less than the chances of it happening on disk.

2. Tape is flimsy

Tell you what.  Move disks around the way you move tapes around and see how flimsy they are.

3. 80% of tape restores fail.

This Gartner statistic has been thrown around a lot.  I really don't know where Gartner got the number, but it's out there.  What I can tell you is that in my entire career of working with backups, I've only had one or two restores that failed due to an actual bad tape -- and that's why we make copies.  But I can tell you of dozens of situations where bad disk drives caused me all sorts of headaches.

I can also tell you that most of the restore failures I've seen have been caused by human error - not tape failure.

4. Tape is too slow

Baloney.  Check your facts again.  There isn't a disk drive alive that can keep up with the speed of today's tape drives.

5. Tape is hard to make happy during backups & restores

Agreed.  This is why I believe strongly in using at least disk caching.  I would never design a system that uses just tape to do backups at this point.  I'm actually OK with all of the designs mentioned above (in the A, B, C list).  I think dedupe systems are awesome, and the idea of replicating to another one is even better.  But I also know that doing this is more expensive than the alternative. The other thing I know is that it can't possibly be cheaper to store data on disk for many, many years, and it may even be risky to do so.  (See my comments on BER.)

What I'm really making an argument for is the use of tape for long term archiving, and as a less expensive way of getting data offsite.  (Less expensive than having a second dedupe system and replicating to it.)

6. Tapes go bad sitting on the shelf and you never know they're bad until you need them

That is correct.  This is why both Spectra Logic and Quantum have come up with products to proactively scan your old archives to find and fix any corruption issues before you need a given tape.  If they find something wrong, it can be fixed by copying from the other copy that you have.

Conclusion

Tape can be your friend for long term archives and cheap offsite storage.  Don't dismiss it so lightly.

 

SNW Reflections


Written by W. Curtis Preston Thursday, 07 April 2011 18:46

I attended my first SNW conference in over two years last week.  (My previous employer was really good at scheduling me for competing events, so I wasn't able to go.)  Most of my thoughts about what I saw go hand-in-hand with Don Jenning's thoughts here.  I agree that it was very cool to see The Cube guys and to know that Infosmack was recording in the social media room.  (I got to sit in on one of those recordings with none other than Dr. Dedupe, Larry Freeman.  I look forward to that podcast.)  And it was my first experience live tweeting from a big event that had a hashtag (#SNWUSA).  That was definitely very cool.  Here are my thoughts, some of which are about SNW, and others that are about the vendors.

First, SNW is still very much alive and well, but it's definitely not the show that it used to be.  There was a time when your company was considered dead or dying if your major reps were not in attendance.  That appears to be the case no longer.  Sure, there were a lot of companies there.  But there were a lot of companies not there as well.  Perhaps it's because there are other options now to talk to analysts and other vendors (like The Exec Event) that pull budget money away from this.  Perhaps it's also because there is now a whole other group of people (bloggers) that you also need to reach out to.  Vendors are either doing their own thing, like HDS' recent event, or they're sponsoring one of the Tech Field Day events (which cost much less than doing it yourself).  Either way, that's more money being pulled away from this "industry event."

The other reason (if not the main reason) that vendors do a show like SNW is for new leads.  New leads means new end users to talk to.  I remain unconvinced that this is a show to find such people in significant numbers.  Any time I ever scanned the room, I saw way more vendor, analyst and press badges than end user badges.  I also kept seeing the same end users over and over.

As a former end-user, and someone who continues to see himself as an advocate for such, I am simply not drawn to the content offered by SNW.  With very few exceptions, it's one talk after another given either by a vendor or someone the vendor is paying to be there (either an analyst or an end user who was sponsored to the show by a vendor).  I do like the SNIA tutorials, BTW, because I know they try really hard to keep those vendor-neutral.  But every other presentation was (IMO) simply a marketing presentation.  They should all be titled: The Correct Way to do X, Which Can Only be Done by Buying Our Product.  I'm not saying that these sessions are bad, mind you.  I'm just saying that they don't draw me.

I'd much rather hear from independent thinkers that are mostly absent from this conference.  There's another conference that these people tend to speak at that tends to draw a much larger percentage of end users, and that is Storage Decisions.  I like that show for its content and the make-up of its audience.  It's a real shame that I seem to be uninvited.  This upcoming Storage Decisions is the first one I haven't spoken at in a really long time.  Although no official word was given, it's no coincidence that this happened right after I started hosting my own road show: Backup Central Live.  No worries.  I've got plenty of other things to do.

In my final thoughts on SNW, I want to pass out the worst-designed booth award.  Booth design experts tell you that you have 7 seconds to catch someone's attention.  Anobit's booth display was whitespace with their company name and the phrase "add another bit."  What are they, anti-dedupe software? Are they hardware, software, service, what?  Then there was RackSpace.  Their booth, which I took a picture of, said they were the world leader in hosting and cloud computing, but never said the name of their company.  This is something not lost on the people that worked the booth, because they took the ghetto little white sign (hung on the pipe and drape before the vendor gets there so they know which booth to go to) and hung it up and over their display. See this photo for what I'm talking about.  So we have a booth with a company name and no description, and a booth with a very big description but no company name.  I declare a tie.

I do have some thoughts to post about companies that I met and got briefings from.  They will come soon.

 
 

OK, encrypt your disks after you've done everything else


Written by W. Curtis Preston Thursday, 31 March 2011 20:53

The other day I wrote a blog entry that said encrypt your tapes but not your disks.  My fundamental premise was that encrypting data at rest in your disk drives only protects from the thing that will never happen: someone walking out with an entire disk array under their arm.  Single disk drives yanked out of the array (more likely) were not going to be any use to anyone even if you didn't encrypt them.

Turns out I was wrrrmph.

Turns out that the most sensitive data is probably very recoverable from a RAID-ed disk drive.  A whole lot of 1K database rows can be stored in a 64K block of data stored on an individual disk drive in a parity-protected disk array.  (See the comments from my previous post for details.)  And it turns out that you can't degauss hard drives and return them, so there's also the exposure of what happens when you return a disk drive to the manufacturer.
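To make the 1K-rows-in-a-64K-block point concrete, here is a simplified sketch in Python. The row and block sizes come from the paragraph above; everything else is my assumption for illustration: the number of data disks is hypothetical, rows are treated as fixed-size and aligned, and parity placement is ignored (it changes which drive a block lands on, not the fact that the whole block sits on one drive).

```python
ROW_SIZE = 1024          # 1K database row, as in the post
CHUNK_SIZE = 64 * 1024   # 64K block written to a single drive, as in the post
DATA_DISKS = 4           # hypothetical number of data drives in the array


def disk_for_offset(offset: int) -> int:
    """Which data drive holds this byte offset under simple striping."""
    return (offset // CHUNK_SIZE) % DATA_DISKS


def row_is_intact(row_index: int) -> bool:
    """True if the whole row lives inside one block, i.e. on one drive."""
    start = row_index * ROW_SIZE
    end = start + ROW_SIZE - 1
    return start // CHUNK_SIZE == end // CHUNK_SIZE


rows_per_chunk = CHUNK_SIZE // ROW_SIZE
intact = sum(row_is_intact(i) for i in range(1000))
print(f"{rows_per_chunk} complete rows per block; {intact} of 1000 rows intact on one drive")
```

Under those assumptions every row is readable in full from a single pulled drive, which is exactly the exposure described above.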

I was wrong about the risk, but I still think there are bigger fish to fry in the datacenter.  Sticking with just my world, we've got companies that:

  1. Don't copy their backups (they keep only one copy of every disk or tape they make)
  2. Don't send their backups offsite
  3. Wait a week or two before sending their backups offsite
  4. Don't back up their laptops
  5. Back up their remote offices using tapes that aren't copied and/or aren't ever sent anywhere

If you've got data that isn't being backed up and stored in a different location from where it was backed up, you will lose data.  This isn't a "maybe some guy might steal a disk drive and if he does he might be able to read some data on it."  Every company in the world has lost a disk drive somewhere in their environment.  I'm a very small company and I lost four this year alone.

The number one reason people tell me they're on the list above is money.  So my point is that if you're spending money on encrypting your disks, but you're not backing your stuff up in the first place -- you've got your priorities all wrong.

I have the same opinion when I see people spending money to make their backup server highly available, but they don't have money to make a second copy of their backups.  Who cares if your backup server goes down for an hour?  It's a big deal, but the only app that's down is backup -- not production.  But the chances of you losing data because you had a failed tape and no copies are much higher.  Save the money on the HA software for the backup server and spend it on something that actually makes your backups better.

I also think you can minimize this risk by doing a few things, all of which are cheaper than full disk encryption:

  1. Strong physical security in the data center.  Plenty of good things you can do.
  2. Video surveillance in the data center
  3. Identify really sensitive data and encrypt it in the application
  4. Strong physical security (locks) on the disk arrays themselves.  Prevent someone from grabbing a disk drive.
  5. Monitoring on same.  If a disk drive is taken, you should be immediately notified.

Like I said, there are lots of things you can do (and should do) that don't cost nearly as much as full disk encryption, most of which you should be doing anyway.

 

Announcing Backup Central Live! Q2 cities & dates


Written by W. Curtis Preston Wednesday, 30 March 2011 19:17

After our very successful five-city tour in Q1, we are now announcing cities, dates, and locations for our Q2 events.  Those of you that live in Raleigh, Boston, Philadelphia, Dallas, & Minneapolis are the next folks to be able to attend a Backup Central Live event.   In addition, those of you interested in deduplication, continuous protection of servers, and backup of laptop data have three webinars to choose from next month.

Here are all of our upcoming events and where you can register for them.

Seminars
Raleigh Apr 26 Register Now
Boston Apr 28 Register Now
Philadelphia May 17 Register Now
Dallas May 19 Register Now
Minneapolis May 24 Register Now
Webinars
Better Backup: Strategies for Better Protecting Your Data, Your Time, and Your IT Budget Apr 19 12p ET Register Now
Top 10 Backup and Disaster Recovery Secrets You Can’t Afford Not to Know Apr 20 12p ET Register Now
Enterprise Laptop Backup: Protecting Users At The Edge Apr 27 1p ET Register Now

See you there!  If you have any questions about events, feel free to e-mail us.

 
 
