More proof that one basket for all your eggs is bad: codespaces.com is gone

Codespaces.com ceased to exist on June 17th, 2014 because they failed to adhere to the standard advice of not putting all your eggs in one basket.  There are a number of things that they could have done to prevent this, but they apparently did none of them.

Before I continue, let me say this.  I know it’s been more than a year since I blogged.  I don’t know what to say other than I’ve been a lot busy building my new company.  Truth in IT now has six full time employees, several part time employees, and several more contractors.  We do a little bit of everything, including backup seminars, storage seminars, webinars, viral video production, lead nurturing programs, and some other things we’re working on in stealth at the moment.  Hopefully I’ll get back to blogging more often.  OK.  Back to the business at hand.

Here’s the short story on codespaces.com.  Their websites, storage, and backups were all stored in the Amazon.com egg basket.  Then on June 17th, they were subjected to a DDOS attack by someone who was going to extort money from them.  He gained access to their Amazon control panel.  When they took steps to try and fix the problem, he reacted by wiping out their entire company.  According to their site, he “removed all EBS snapshots, S3 buckets, all AMI’s, some EBS instances and several machine instances. In summary, most of our data, backups, machine configurations and offsite backups were either partially or completely deleted.”  I hate being a Monday morning quarterback, but this is what happens when you put all your eggs in one basket. 

I’m a fan of cloud services. (Truth in IT is run entirely in the cloud.)  I’m a fan of disk backups. (Truth in IT uses both a cloud-based sync and share service and a cloud-based backup service.)  But if it’s on disk and is accessible electronically, it is at risk.  Having your services, storage, and backups all accessible via the same system is just asking for it.  

I do not see this as a cloud failure.  I see this as a process and design failure.  They would have been just as likely to have this happen to them if they had done this in their data center. That is, if they used a single system to store their server images, applications and data, snapshots of that data, and extra copies of those snapshots.  Yes, using Amazon made it easier to do this by offering all of these services in one place. But the fact that it was in the cloud was not the issue — the fact that they stored everything in one place was the issue.

I love snapshot-based backups, which is what codespaces.com used. It should go without saying, though, that snapshots must be replicated to be any good in times like this.  However, as I have repeatedly told my friends at companies that push this model of data protection, even a replicated snapshot can be deleted by a malicious admin or a rolling bug in the code.  So if your backups are accessible electronically, I still want some other kind of backup of those backups.

Use a third-party replication/CDP system to copy them to a different vendor’s array that has a different password and control panel.  Back them up to tape once in a while.  Had they done any of these things, putting copies into a system that was not controllable via the Amazon control panel, their backups would have been safer.  (The hacker would have had to hack both systems.)  However, since all server data, application data, and backup data were all accessible via a single Amazon.com console, the hacker was able to access their data and their backups via the same console.

I love cloud-based computing services.  There’s nothing wrong with them running their company on that.  But also storing their backups via the same Amazon console as their server?  Not so much.

I love cloud-based backups.  They are certainly the best way to protect cloud-based servers.  I’m also fine with such backups being stored on S3.  But if your S3 backups are in the same account as your AWS instances, you’re vulnerable to this kind of attack.
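
To make the point concrete, here is a minimal sketch of what keeping an independent copy in a second AWS account might look like, written against the current boto3 SDK for Python.  The profile names, account ID, and snapshot ID are placeholders, and it assumes an unencrypted snapshot; this is one way to do it, not a statement of how codespaces.com should have built their system.

```python
# Minimal sketch: copy an EBS snapshot into a *separate* AWS account so that
# a compromise of the production account's console cannot delete every copy.
# Profile names, the account ID, and the snapshot ID are all placeholders.
import boto3

PROD_PROFILE = "prod"                    # credentials for the production account
BACKUP_PROFILE = "backup"                # credentials for a completely separate account
BACKUP_ACCOUNT_ID = "222222222222"       # placeholder account ID
SNAPSHOT_ID = "snap-0123456789abcdef0"   # placeholder snapshot ID
REGION = "us-east-1"

# 1. From the production account, share the (unencrypted) snapshot with the backup account.
prod_ec2 = boto3.Session(profile_name=PROD_PROFILE, region_name=REGION).client("ec2")
prod_ec2.modify_snapshot_attribute(
    SnapshotId=SNAPSHOT_ID,
    Attribute="createVolumePermission",
    OperationType="add",
    UserIds=[BACKUP_ACCOUNT_ID],
)

# 2. From the backup account (different credentials, different console login),
#    make an independent copy that the production credentials cannot touch.
backup_ec2 = boto3.Session(profile_name=BACKUP_PROFILE, region_name=REGION).client("ec2")
copy = backup_ec2.copy_snapshot(
    SourceRegion=REGION,
    SourceSnapshotId=SNAPSHOT_ID,
    Description="Independent copy held outside the production account",
)
print("Copy created in backup account:", copy["SnapshotId"])
```

The same idea applies to S3: copy or replicate your backup buckets into an account whose credentials do not live in the same console a hacker gets when he compromises production.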

I also want to say that this is one of the few advantages that tape has — the ability to create an “air gap.”  As a removable medium, it can be used to place distance (i.e. an “air gap”) between the data you’re protecting and the protection of that data.  Store those backups at an offsite storage company and make retrieval of those tapes difficult.  For example, require two-person authentication when picking up backup tapes outside of normal operations.

For those of you backing up things in a more traditional manner using servers in a non-cloud datacenter, this still applies to you.  The admin/root password to your production servers should not be the same password as your development servers — and it should not be the same one as your backup servers.  Your backup person should not have privileged access to your production servers (except via the backup software), and administrators of your production servers should not have privileged access to your backup system.  That way a single person cannot damage both your production systems and the backups of those systems.

I would add that many backup software packages have the ability to run scripts before and after backups run, and these scripts usually run as a privileged user.  If a backup user can create such a script and then run it, he/she could issue an arbitrary command, such as deleting all data — and that script would run as a privileged user.  Look into that and lock that down as much as you can.  Otherwise, the backup system could be hacked and do just what this person did.
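
If you're wondering where to start, a quick audit of who can write to those pre/post-script directories is a reasonable first step.  Here's a rough sketch; the directory path is a placeholder, so point it at wherever your particular backup product keeps its scripts.

```python
# Rough sketch: flag pre/post-backup scripts that a non-root user could modify.
# The directory list is a placeholder; point it at wherever your backup
# software actually keeps its pre/post scripts.
import os
import stat

SCRIPT_DIRS = ["/usr/local/backup/scripts"]   # hypothetical location

def permission_problems(path):
    st = os.stat(path)
    problems = []
    if st.st_uid != 0:
        problems.append("not owned by root")
    if st.st_mode & stat.S_IWGRP:
        problems.append("group-writable")
    if st.st_mode & stat.S_IWOTH:
        problems.append("world-writable")
    return problems

for script_dir in SCRIPT_DIRS:
    for root, dirs, files in os.walk(script_dir):
        for path in [root] + [os.path.join(root, f) for f in files]:
            issues = permission_problems(path)
            if issues:
                print(f"{path}: {', '.join(issues)}")
```

It won't catch everything (a backup application that lets any operator define a new script in its GUI is still a problem), but it will catch the obvious case of a world-writable script directory.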

Don’t store all your eggs in one basket.  It’s always been a bad idea. 

 

Continue reading

Get rid of tape? Inconceivable!

Stephen Manley published a blog post today called “Tape is Alive? Inconceivable!”  To which I have to reply with a quote from Inigo Montoya, “You keep using that word. I do not think it means what you think it means.”  I say that because, for me, it’s very conceivable that tape continues to play the role that it does in today’s IT departments.  Yes, its role is shrinking in the backup space, but it’s far from “dead,” which is what Stephen’s blog post suggests should happen.

He makes several good points as to why tape should be dead by now.  I like and respect Stephen very much, and I’d love to have this discussion over drinks at EMC World or VMworld sometime.  I hope that he and his employer see this post as helping him to understand what people who don’t live in the echo chamber of disk think about tape.  

Stephen makes a few good points about disk in his post.  The first point is that the fastest way to recover a disk system is to have a replicated copy standing by ready to go.  Change where you’re mounting your primary data and you’re up and running.  He’s right.  He’s also right about snapshots or CDP being the fastest way to recover from logical corruption, and the fastest way to do granular recovery of files or emails.

In my initial post on the LinkedIn discussion that started this whole thing, I make additional “pro-disk” points. First, I say that tape is very bad at what most of us use it for: receiving backups across a network — especially incremental backups.  I also mention that tape cannot be RAID-protected, whereas disk can be. I also mention that disk enables deduplication, CDP, near-CDP and replication — all better ways to get your data offsite than handing tape to a dude in a truck.  I summarize with the statement that I believe that disk is the best place for day-to-day backups.

But…

Disk has all of the above going for it.  But it doesn’t have everything going for it, and that’s why tape isn’t dead yet — nor will it be any time soon.

I do have an issue or two with the paragraph in Stephen’s post called “Archival Recovery.”  First, there is no such thing.  It may seem like semantics, but one does not recover from archives; one retrieves from archives.  If one is using archive software to do their archives, there is no “recover” or “restore” button in the GUI.  There is only “retrieve.”  Stephen seems to be hinting at the fact that most people use their backups as archives — a practice he and I agree is bad.  Where we disagree is whether or not moving many-years-old backup data to disk solves anything. My opinion is that the problem is not that the customer has really old backups on tape.  The problem is that they have really old backups.  Doing a retrieval from backups is always going to be a really bad thing (regardless of the media you use) and could potentially cost your company millions of dollars in fines and billions of dollars in lost lawsuits if you’re unable to do it quickly enough.  (I’ll be making this point again later.)

Cost

Disk is the best thing for backups, but not everyone can afford the best.  Even companies that fill their data centers with deduplicated disk  and the like still tend to use tape somewhere — mainly for cost reasons.  They put the first 30-90 days on deduped disk, then they put the next six months on tape.  Why?  Because it’s cheaper.  If it wasn’t cheaper, there would be no reason that they do this.  (This is also the reason why EMC still sells tape libraries — because people still want to buy them.)

Just to compare cost, at $35 per 1.5 TB tape, storing 20 PB on LTO-5 tapes costs about $467K with no compression, or about $233K with 2:1 compression.  In contrast, the cheapest disk system I could find (Promise VTrak 32TB unit) would cost me over $12M to store that same amount of data.  Even if I got a 20:1 dedupe ratio in software (which very few people get), it would still cost over $600K (plus the cost of the capacity-based dedupe license from my backup software company).
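
Here's the back-of-the-envelope math, in case you want to check it or plug in your own prices.  The disk unit price is my rough assumption for a populated 32 TB VTrak; swap in whatever quote you can actually get.

```python
# Back-of-the-envelope media cost for 20 PB, using the rough prices cited above.
# The disk unit price is an assumption; swap in a real quote.
TB = 20 * 1000                    # 20 PB expressed in TB

tape_cost_per_tb = 35 / 1.5       # $35 LTO-5 cartridge, 1.5 TB native
print(f"Tape, no compression:  ${TB * tape_cost_per_tb:,.0f}")       # ~$467,000
print(f"Tape, 2:1 compression: ${TB * tape_cost_per_tb / 2:,.0f}")   # ~$233,000

disk_unit_tb = 32                 # Promise VTrak 32 TB unit
disk_unit_price = 20_000          # assumed street price per populated unit
units = TB / disk_unit_tb
print(f"Disk, no dedupe:   ${units * disk_unit_price:,.0f}")         # ~$12.5M
print(f"Disk, 20:1 dedupe: ${units * disk_unit_price / 20:,.0f}")    # ~$625K
```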

It’s also the cheapest way to get data offsite and keep it there.  Making another copy on tape at $.013/GB (current LTO-5 pricing) and paying ~$1/tape/month to Iron Mountain is much cheaper than buying another disk array (deduped or not) and replicating data to it.  The disk array is much more expensive than a tape, and then you need to pay for bandwidth — and you have to power the equipment providing that bandwidth and power the disks themselves.  The power alone for that equipment will cost more than the Iron Mountain bill for the same amount of data — and then you have the bill for the bandwidth itself.

Now let’s talk about long-term archives.  This is data stored for a long time that doesn’t need to be in a library.  It can go on a shelf and that’ll be just fine.  Therefore, the only cost for this data is the cost of the media and the cost of cooling/dehumidifying something that doesn’t generate heat.  I can put it on a tape and never touch it for 30 years, and it’ll be fine (Yes, I’m serious; read the rest of the post).  If I put it on disk, I’m going to need to buy a new disk every five years and copy it.  So, even if the media were the same price (which it most certainly is not), the cost to store it on disk would be six times the cost of storing it on tape.

Unlimited Bandwidth

Never underestimate the bandwidth of a truck.  ‘Nuf said.  Lousy latency, yes.  But definitely unlimited bandwidth.

Integrity of Initial Write

LTO is two orders of magnitude better at writing bits than enterprise-grade SATA disks, which is what most data protection data is stored on.  The undetectable bit error rate of enterprise SATA is 1:10^15, and LTO is 1:10^17.  That’s one undetectable error every 100 TB with SATA disk and one undetectable error every 10 PB with LTO.  (If you want more than that, you can have one error every Exabyte with the Oracle and IBM drives.)  I would also argue that if one error every 10 PB is too much, then you can make two copies — at a cost an order of magnitude less than doing it on disk.  There’s that cost argument again.
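
The arithmetic behind those round numbers, if you want to check me (it actually comes out a bit better than 100 TB and 10 PB, which only strengthens the point):

```python
# Convert an unrecoverable bit error rate (1 error per 10^N bits) into
# the amount of data you can expect to read per undetectable error.
def tb_between_errors(exponent):
    bits = 10 ** exponent
    return bits / 8 / 1e12        # bits -> bytes -> TB

for name, exponent in [("Enterprise SATA disk", 15),
                       ("LTO tape", 17),
                       ("Oracle/IBM enterprise tape", 19)]:
    print(f"{name}: one error per ~{tb_between_errors(exponent):,.0f} TB")
# Enterprise SATA disk: ~125 TB
# LTO tape: ~12,500 TB (12.5 PB)
# Oracle/IBM enterprise tape: ~1,250,000 TB (1.25 EB)
```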

Long-term Integrity

As I have previously written, tape is also much better than disk at holding onto data for periods longer than five years.  This is due to the physics of how disks and tapes are made and operated.  There is a formula (KuV/kT) that I explain in a previous blog post: the ratio of a magnetic grain’s anisotropy energy (its anisotropy constant Ku times its grain volume V) to its thermal energy (Boltzmann’s constant k times absolute temperature T).  The bigger your magnetic grains are, the better, and the cooler your device is, the better.  The resulting value of this formula gives you an understanding of how well the device will keep its bits in place over long periods of time, and not suffer what is commonly called “bit rot.”  Disks use significantly smaller magnetic grains than tape, and disks run at very high operating temperatures, whereas tape is stored at ambient temperature.  The result is that disk cannot be trusted to hold onto data for more than five years without suffering bit rot.  If you’re going to store data longer than five years on disk, you must move it around.  And remember that every time you move it around, you’re subject to the lower write integrity of disk.
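
If you want to play with that formula yourself, here's a toy version.  The anisotropy constant, grain sizes, and temperatures below are purely illustrative placeholders, not the specs of any real disk or tape product; the only point is to show that the ratio grows with grain volume and shrinks as the device gets hotter.

```python
# Toy calculation of the thermal stability ratio KuV/kT.
# Every material value below is an ILLUSTRATIVE placeholder, not a vendor spec;
# the point is the scaling: bigger grains raise the ratio, heat lowers it.
K_B = 1.380649e-23    # Boltzmann constant, J/K (this one is real)

def stability_ratio(ku_j_per_m3, grain_diameter_nm, temp_celsius):
    d = grain_diameter_nm * 1e-9
    volume = d ** 3                               # treat the grain as a cube; fine for a toy model
    return (ku_j_per_m3 * volume) / (K_B * (temp_celsius + 273.15))

# Hypothetical small grain inside a hot, spinning drive...
print(f"{stability_ratio(2e5, 8, 45):.0f}")       # ~23
# ...versus a hypothetical larger particle sitting on a shelf at room temperature.
print(f"{stability_ratio(2e5, 20, 22):.0f}")      # ~390
```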

I know that those who are proponents of disk-based systems will say that because it’s on disk you can scan it regularly.  People who say that obviously don’t know that you can do the same thing on tape.  Any modern tape drive supports the SCSI verify command that will compare the checksums of the data stored on tape with the actual data.  And modern tape libraries have now worked this into their system, automatically verifying tapes as they have time.

Only optical (i.e. non-magnetic) formats (e.g. BluRay, UDO) do a better job of holding onto data for decades.  Unfortunately they’re really expensive. Last I checked, UDO media was 75 times more expensive than tape.

Air Gap [Update: I added this a day after writing the initial post because I forgot to add it]

One thing tape can do that replicated disk systems cannot do is create a gap of air between the protected data and the final copy of its backup.  Give the final tape copy to Iron Mountain and you create a barrier to someone destroying that backup maliciously.  One bad thing about replicated backups is that a malicious sysadmin can delete the primary system, backup system, and replicated backup system with a well-written script.  That’s not possible with an air gap.

Device Obsolescence

People that don’t like tape also like to bring up device obsolescence.  They say things like “you can’t even get a device to read the tape you wrote 10 years ago.”  They’re wrong.  Even if you completely failed to plan, there is a huge market for older tape drives and you can find any tape drive used in the last 20-30 years on eBay if you have no other choice. (I know because I just did it.)

Second, if you’re keeping tapes from twenty-year-old tape drives, you should be keeping the drives.  Duh.  And if those drives aren’t working, there are companies that will repair them for you.  No problem, easy peasy.  Device obsolescence is a myth.

Device Life

Suppose you have a misbehaving disk from many years ago.  There are no disk repair companies.  There are only data recovery companies that charge astronomical amounts of money to recover data from that drive.

Now consider what you do if you had a malfunctioning tape, which is odd, because there’s not much to malfunction.  I have been able to “repair” all of the physically malfunctioning tapes I have ever experienced (which is only a few out of the hundreds of thousands of tapes I’ve handled).  The physical structure of a modern tape spool is not that difficult to understand, take apart, and reassemble.

Now consider what happens when your old tape drive malfunctions, which is much more likely.  You know what you do?  Use a different drive!  If you don’t have another drive, you can just send the one that’s malfunctioning to a repair shop that will cost you far less than what a data recovery company will cost you.  If you’re in a hurry, buy another one off eBay and have them rush it to you.  Better yet, always have a spare drive.

Legal Issues

This isn’t really a disk-vs-tape issue, but I just had to comment on the customer that Stephen quoted in his blog post as saying, “I’m legally required to store data for 30 years, but I’m not required by law or business to ever recover it. That data is perfect for tape.” That may be a statement that amuses someone who works for a disk company, but I find the statement to be both idiotic and irresponsible.  If one is required by law to store data for 30 years, then one is required by law to be able to retrieve that data when asked for it.  This could be a request from a government agency, or an electronic discovery request in a lawsuit.  If you are unable to retrieve that data when you were required to store it, you run afoul of that agency and will be fined or worse.  If you are unable to retrieve the data for an electronic discovery request in a lawsuit, you risk receiving an adverse inference instruction by the judge that will result in you losing the lawsuit.  So whoever said that has no idea what he/she is talking about.

Think I’m exaggerating?  Just ask Morgan Stanley, who up until the mid-2000s used their backups as archives.  The SEC asked them for a bunch of emails, and their inability to retrieve those emails resulted in a $15M fine.  They also had a little over 1400 backup tapes, and they needed months to pull emails off of them to satisfy an electronic discovery request from a major lawsuit from Coleman Holdings in 2005.  (They needed this time because they stored the data via backup software, not archive software.)  The judge said “archive searches are quick and inexpensive. They do not cost ‘hundred of thousands of dollars’ or ‘take several months.’”  (He obviously had never tried to retrieve emails off of backup tapes.)  He issued an adverse inference instruction to the jury that said that this was a ploy by Morgan Stanley to hide emails, and that they should take that into consideration in the verdict.  They did, and Morgan Stanley lost the case and Coleman Holdings was given a $1.57B judgment.

Doing a retrieval for a lawsuit or a government agency request is a piece of cake — regardless of the medium you use — if you use archive software.  If you use backup software to store data for many years, it won’t matter what medium you use either — retrieval will take forever.  (I do feel it important to mention that there is one product I know of that will truly help you in this case, and that’s Index Engines. It’s a brute-force approach, but it’s manageable.  They support disk and tape.)

Summary

Why isn’t tape dead?  Because there are plenty of things that it is better at than disk.  Yes, there are plenty of things that disk is better at than tape.  But move all of today’s production, backup, and archive data to disk?  Inconceivable!

Continue reading

Love my Mac. Starting to hate Apple.

Keep it up, Apple, and I’m going back to Windows.

I was a Windows customer for many years.  Despite running virus/malware protection and being pretty good at doing the right things security-wise, I had to completely rebuild Windows at least once a year — and it usually happened when I really didn’t have the time for it.  It happened one too many times and I said, “that’s it,” and I bought my first MacBook Pro. (The last Windows OS I ran on bare metal was Windows XP.)

I made the conversion to MacOS more than four years ago.  During all this time, I have never — never — had to rebuild MacOS. When I get a new Mac, I just use Time Machine to move the OS, apps, and data to the new machine.  When a new version of the OS comes out, I just push a button and it upgrades itself.  I cannot say enough nice things about how much easier it is to have a Mac than a Windows box.  (I just got an email today from a Windows user complaining about what he was told about transferring his apps and user data to his new Windows 8 machine.  He was told that it wasn’t possible.)

My first Mac was a used MacBook Pro for roughly $600, for which I promptly got more RAM and a bigger disk drive.  I liked it.  I soon bought a brand new MacBook Pro with a 500 GB SSD drive, making it cost much more than it would have otherwise.  (In hindsight, I should’ve bought the cheapest one I could buy and then upgraded the things I didn’t like.)  It wasn’t that long before I realized that I hadn’t put enough RAM in it, so I did.  (I didn’t account for the amount of RAM that Parallels would take.)

My company’s second Mac was an iMac. After we started doing video editing on that, we decided to max out its RAM.  Another MacBook Pro had more RAM installed in it because Lion wanted more than Snow Leopard, and on another MacBook Pro we replaced the built-in hard drive with an SSD unit and upgraded its RAM.  We are still using that original MacBook Pro and it works fine — because we upgraded to more RAM and a better disk — because we could. It’s what people that know how to use computers do — they upgrade or repair the little parts in them to make them better.

The first expensive application we bought (besides Microsoft Office) was Final Cut Pro 7, and I bought it at Fry’s Electronics — an authorized reseller of Apple products.  I somehow managed to pay $1000 for a piece of software that Apple was going to replace in just a few days with a completely different product.  Not an upgrade, mind you, a complete ground-up rework of that product.  Again, anyone who followed that world knows what’s coming next.  I wish I had known at the time.

First, Apple ruins Final Cut Pro

For those who don’t follow the professional video editing space, Final Cut Pro was the industry standard for a long time.  Other products eventually passed it up in functionality and speed, but a lot of people hung onto Final Cut Pro 7 anyway because (A) they knew it already and (B) it worked with all their existing and past project files.  They waited for years for a 64-bit upgrade to Final Cut Pro 7. 

Apple responded by coming out with Final Cut Pro X, a product that was closer in functionality to iMovie than Final Cut Pro — and couldn’t open Final Cut Pro 7 projects.  (In case you missed that, the two reasons that people were holding onto Final Cut Pro 7 were gone.  They didn’t know how to use the new product because it was a night-and-day different product, and it couldn’t open the old product’s projects.)  FCP X was missing literally dozens of features that were important to the pro editing community.  (They have since restored a lot of those missing features, but not all of them.) And the day they started selling FCP X, they stopped selling FCP 7.  Without going into the details, suffice it to say that there was a mass exodus and Adobe and Avid both had a very good year.  (Both products offered, and may still be offering, big discounts to FCP customers who wanted to jump ship.)

But what really killed me is what happened to me personally. I thought that while Apple was addressing the concerns that many had with FCP X, I’d continue using FCP 7.  So I called them to pay for commercial support for FCP 7 so I could call and ask stupid questions — of which I had many — as I was learning to use the product.  Their response was to say that support for FCP 7 was unavailable.  I couldn’t pay them to take my calls on FCP 7. What?

So here I am with a piece of software that I just paid $1000 for and I can’t get any help from the company that just sold it to me.  I can’t return it to Fry’s because the package has been opened.  I can’t return it to Apple because I bought it at Fry’s.  I asked Apple to give me a free copy of FCP X to ease the pain and they told me they’d look into it and then slowly stopped returning my emails.  Thanks a bunch, Apple.  (Hey Apple: If you’re reading this, it’s never too late to make an apology & give me that free copy of FCP X.)

Apple ruins the MacBook Pro

Have you seen the new MBP?  Cool, huh?  Did you know that if you want the one with the Retina display, you’d be getting the least upgradeable, least repairable laptop in history?  That’s what iFixit had to say after they tore down the 15″ and 13″ MBPs.  You won’t be able to upgrade the RAM because it’s soldered to the motherboard.  You’ll have to replace the entire top just to replace the screen — because Apple fused the two together.

When I mention this to Apple fans and employees, what I get is, “well it’s just like the iPad!”  You’re right.  The 15-inch MacBook Pro is a $2200 iPad.  This means that they can do things like they do in the iPad where they charge you hundreds of dollars to go from a 16 GB SSD chip to a 64 GB SSD chip, although the actual difference in cost is a fraction of that.  Except now we’re not talking hundreds of dollars — we’re talking thousands.  This means that you’ll be forced to buy the most expensive one you can afford because if you do like I did and underestimate how much RAM you’ll need, you’ll be screwed.  (It costs $200 more to go from an 8GB version to a 16GB version, despite the fact that buying that same RAM directly from Crucial will cost you $30 more — not $200.)

Apple’s response is also that they’ll let the market decide.  You can have the MBP with the Retina Display and no possibility of upgrade or the MBP without the Retina Display and the ability to upgrade.

First, I want to say that that’s not a fair fight.  Second, can you please show me on the Apple website where they show any difference between the two MBPs other than CPU speed and the display?  Everyone is going to buy the cheaper laptop with the cooler display, validating Apple’s theory that you’ll buy whatever they tell you to buy. (Update: If you do order one of the Retina laptops, it does say in the memory and hard drive sections, “Please note that the memory is built into the computer, so if you think you may need more memory in the future, it is important to upgrade at the time of purchase.” But I don’t think the average schmo is going to know what that means.)

Apple Ruins the iMac

I just found out today that they did the same thing they did above, but with the iMac.  And they did this to make the iMac thinner.  My first question is why the heck did the iMac need to be thinner?  There’s already a giant empty chunk of air behind my current iMac because it’s so stinking thin already.  What exactly are they accomplishing by making it thinner?

One of the coolest things about the old iMac was how easy it was to upgrade the RAM.  There was a special door on the bottom to add more RAM.  Two screws and you’re in like Flynn.  Now it’s almost as bad as the MacBook Pros, according to the folks over at iFixit.  First, they removed the optical drive.  Great, just like FCP. They made it better by removing features!  Their teardown analysis includes sentences like the following:

  • “To our dismay, we’re forced to break out our heat gun and guitar picks to get past the adhesive holding the display down.”
  • “Repair faux pas alert! To save space and eliminate the gap between the glass and the pixels, Apple opted to fuse the front glass and the LCD. This means that if you want to replace one, you’ll have to replace both.”
  • “Putting things back together will require peeling off and replacing all of the original adhesive, which will be a major pain for repairers.”
  • “The speakers may look simple, but removing them is nerve-wracking. For seemingly no reason other than to push our buttons, Apple has added a barb to the bottom of the speaker assemblies that makes them harder-than-necessary to remove.”
  • “Good news: The iMac’s RAM is “user-replaceable.” Bad news: You have to unglue your screen and remove the logic board in order to do so. This is just barely less-terrible than having soldered RAM that’s completely non-removable.”

It is obvious to me that Apple doesn’t care at all about upgradeability and repairability.  Because otherwise they wouldn’t design a system that requires ungluing a display just to upgrade the RAM!  How ridiculous is that?  And they did all this to make something thinner that totally didn’t need to be thinner.  This isn’t a laptop.  There is absolutely no benefit to making it thinner.  You should have left well enough alone.

Will they screw up the Mac Pro, too?

I have it on good authority that they are also doing a major redesign of the Mac Pro (the tower config).  This is why we have waited to replace our iMac w/a Mac Pro, even though the video editing process could totally use the juice.  But now I’m scared that they’ll come out with another non-repairable product.

Keep it up, Apple, and I’m gone

Mac OS may be better than Windows in some ways, but it also comes with a lot of downsides.  I continually get sick of not being able to integrate my Office suite with many of today’s cool cloud applications, for example.  I still have to run a copy of Windows in Parallels so I can use Visio and Dragon Naturally Speaking. 

You are proving to me that you do not want intelligent people as your customers.  You don’t want people that try to extend the life of their devices by adding a little more RAM or a faster disk drive.  You want people that will go “ooh” and “ahh” when you release a thinner iMac, and never ask how you did that, or that don’t care that they now have to pay extra for a DVD drive that still isn’t Blu-Ray.

Like I said when I started this blog post: I like my Mac.  I love my new iPad Mini, but I am really starting to hate Apple.

Continue reading

Backing Up The Cloud

"The Cloud" has changed the way I do business, but I'm not always sure how I should back up the data I have "up there."  So I thought I'd write a blog post about my research to address this hole in our plan.

Truth in IT, Inc. is run almost entirely in the cloud.  We have a few MacBooks and one iMac & a little bit of storage where we do our video editing of our educational & editorial content, as well as our ridiculous music video parodies.  But that's it.  Everything else is "out there" somewhere.  We use all of the following:

  • Salesforce.com: CRM
  • Liquidweb.com: Managed web hosting
  • Virtualpbx.com: Phone System
  • Sherweb: Hosted Exchange Services
  • Quickbooks Online: Online bookkeeping & payroll
  • Q-Commission: Online commission management (talks to Salesforce & Quickbooks)
  • Act-On: Marketing automation system
  • iCloud: Syncs & stores data from mobile devices
  • File synchronization system with history*
  • A cloud backup service for our laptops*

We have data in Salesforce that is nowhere else, and the same is true of our web servers, email servers, & laptops.  Did you know that using Salesforce's backups to recover data that you deleted is not included in your contract, and that if you need them to recover your data (due to your error) it will cost at least $10,000?!?!?!

I want my own backups of my data in the cloud.  I don't think we're alone in this regard. I therefore took a look at what our options are.  The process was interesting.  The following is a copy of an actual chat session I had with one of our providers:

curtis preston says:    
one question i've wondered about is how people back up the email that is hosted with you
<support person> says:    
You mean when they lose it?
curtis preston says:    
Let me put it plainly/bluntly: Scenario is you do something wrong and the exchange server i'm hosted on dies and your backups are bad.  What can I do in advance to prepare for that?
<support person> says:    
Well, when using Outlook there is always a copy on the computer that is made that could be used
<support person> says:    
And to be extra-sure you can create backups from time to time
<support person> says:    
but we have a 7 days of backup on a server so the chance both the main server and the backup cannot be backup is pretty low
<support person> says:    
Everything is really well backup here you don't have to worry

And that pretty much sums up the attitude of most of the vendors: "We've got it. Don't worry. That's the whole reason you went to the cloud!" Here's my problem with that.  Maybe they do have it; maybe they don't.  If it turns out they don't know how to do IT, there's a good chance they also don't know how to configure a backup system.  I'd like to have my own copy in someone else's system and I don't mind paying for the privilege.  It turned out that all but hosted Exchange had what I would consider a decent answer.  (As far as I can tell, it's not the fault of our provider; multi-tenant Exchange has some things ripped out of it that create this problem.)

Backups for cloud apps

There are actually a lot of solutions out there to back up cloud applications.  Here's what I found:

  • Salesforce can be automatically and regularly backed up via backupify.com, asigra.com, or ownbackup.com (or, in a pinch, with a little scripting of your own; see the sketch after this list).
  • Gmail & Google Apps can be backed up via backupify.com.
  • Quickbooks Online can be backed up via OE Companion
  • Hosted servers or virtual servers can be backed up via any cloud backup service that supports the operating system that you're using. 
  • Laptops and desktops can also easily be backed up by most cloud backup services. 
  • If you're using a file synchronization service, those files will also be backed up via whatever you choose for your backup solution for your laptops & desktops.
  • Offline copies of Outlook data can be used to restore lost Exchange data, but it seems clunky, and you need to make the offline copy manually.
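
For Salesforce in particular, even a crude do-it-yourself export gives you a copy that lives entirely outside their control (and outside your backup vendor's control).  Here's a rough sketch using the simple_salesforce Python package; the credentials and object list are placeholders, and a real job would walk every object and field via the describe API and run on a schedule.

```python
# Rough DIY Salesforce export: dump a couple of objects to CSV files you control.
# Credentials and the object list are placeholders; a real job would enumerate
# every object and field via the describe API and run on a schedule.
import csv
from simple_salesforce import Salesforce   # third-party package

sf = Salesforce(username="you@example.com",          # placeholder credentials
                password="password",
                security_token="token")

QUERIES = {
    "Account": "SELECT Id, Name FROM Account",
    "Contact": "SELECT Id, FirstName, LastName, Email FROM Contact",
}

for obj, soql in QUERIES.items():
    records = sf.query_all(soql)["records"]
    if not records:
        continue
    fields = [k for k in records[0] if k != "attributes"]
    with open(f"{obj}.csv", "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=fields)
        writer.writeheader()
        for record in records:
            writer.writerow({f: record.get(f) for f in fields})
    print(f"Wrote {len(records)} {obj} records to {obj}.csv")
```

It's not a product, but a copy like this sitting in storage that Salesforce can't touch beats a $10,000 recovery fee.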

Does the lack of backups for the cloud serve as a barrier to the cloud for you or your company?  Or are you in the cloud and you have the same worries as me?  Is there a particular app that worries you?  Tell me about it in the comment section.

*I don't give the name of either of these for various reasons.

Continue reading

Amazon Glacier: Cheap & Mysterious

It's a penny per GB per month saved to multiple locations and that's all you need to know — or so Amazon.com believes. I think Glacier sounds like a paradigm-shifting service that I already wrote about when I first heard about it.

For those who haven't been following, here's a summary:

  • It's $.01/GB per month of data stored in Glacier
  • There are no upload bandwidth charges at all
  • There are no download bandwidth charges — as long as you don't exceed a daily pro-rated quota of 5% of your total storage.  (I believe this should translate into no download bandwidth charges for most people.) 
  • Amazon says that Glacier was designed to provide an "annual durability of 99.999999999%".  It's here where things get interesting and mysterious.
  • If you ask to retrieve an archive, it takes a few hours to assemble that archive for downloading.  Amazon says that "Most jobs will take between 3 to 5 hours to complete."
  • If you delete archives that are less than three months old, there is a charge.

I think the pricing is awesome. I also think the durability sounds awesome.  I'm just not a huge fan of what happens when you ask them what that means.  Before I get into my direct interaction with them, I want to point out a few things from the website.

On one hand, the availability numbers for S3 and Glacier are the same.  What's not the same is how they explain those numbers.  Are the explanations different because the implementations are different?  Or is it just an oversight?  The following are direct quotes from their website (italics added):

Q: How durable is Amazon S3?

Amazon S3 is designed to provide 99.999999999% durability of objects over a given year. This durability level corresponds to an average annual expected loss of 0.000000001% of objects. For example, if you store 10,000 objects with Amazon S3, you can on average expect to incur a loss of a single object once every 10,000,000 years. In addition, Amazon S3 is designed to sustain the concurrent loss of data in two facilities.

Q: How is Amazon S3 designed to achieve 99.999999999% durability?

Amazon S3 redundantly stores your objects on multiple devices across multiple facilities in an Amazon S3 Region. The service is designed to sustain concurrent device failures by quickly detecting and repairing any lost redundancy. When processing a request to store data, the service will redundantly store your object across multiple facilities before returning SUCCESS. Amazon S3 also regularly verifies the integrity of your data using checksums.

Q: How durable is Amazon Glacier?

Amazon Glacier is designed to provide average annual durability of 99.999999999% for an archive. The service redundantly stores data in multiple facilities and on multiple devices within each facility. To increase durability, Amazon Glacier synchronously stores your data across multiple facilities before returning SUCCESS on uploading archives. Glacier performs regular, systematic data integrity checks and is built to be automatically self-healing.

On one hand, these appear to be two different wordings of the same thing.  However, note that it says that "S3 is designed to sustain the concurrent loss of data in two facilities," but it does not say that about Glacier.  Secondly, notice the addition of the words "average annual" to the durability guarantee.  Is the data in Glacier less safe than the data in S3?  Or is this wording simply an oversight?  What happened, pray tell, when I started asking questions? First, let's talk about the questions they did answer.

I mentioned that I see that a retrieval request is only available for 24 hours, and asked what happens if the data set is large enough that it takes me longer than 24 hours to download it.  Amazon's response was basically, "don't do that."  (They said, "We anticipate that customers will size their archives in a way that allows them to comfortably download an archive within 24 hours once retrieved.")  This is therefore something you're really going to want to discuss with whoever is providing your interface to Glacier.

I als

Continue reading

One RTO/RPO or two? (Onsite & Offsite)

Disaster recovery experts do not agree on whether you should have one-and-only-one recovery time objective (RTO) and recovery point objective (RPO) for each application, or two of them.  What am I talking about?  Let me explain.

What are RTO and RPO, you ask? RTO is the amount of time it should take to restore your data and return the application to a ready state (e.g. "This server must be up w/in four hours").  RPO is the amount of data you can afford to lose (e.g. "You must restore this app to within one hour of when the outage occurred").

Please note that no one is suggesting you have one RTO/RPO for your entire site. What we're talking about is whether or not each application should have one RTO/RPO or two.  We're also not talking about whether or not to have different values for RTO and RPO (e.g. 12-hour RPO and 4-hour RTO).  Most people do that.

In defense of two RTOs/RPOs (for each app)

If you lose a building (e.g. via a bomb blast or major fire) or a campus (e.g. via an earthquake or tsunami), it's going to take a lot longer to get up and running than if you just have a triple-disk failure in a RAID6 array.  In addition, you might have an onsite solution that gets you a nice RPO or RTO as long as the building is still intact.  But when the building ceases to exist, most people are just left with the latest backup tape they sent to Iron Mountain.  This is why most people feel it's acceptable to have two RTOs/RPOs: one for onsite "disasters" and another for true, site-wide disasters.

In defense of one RTO/RPO (for each app)

It is an absolute fact that RTOs and RPOs should be based on the needs of the business unit that is using any given application.  Those who feel that there can only be one RTO/RPO say that the business can either be down for a day or it can't (24-hour RTO).  It can either lose a day of data or it can't (24-hour RPO). If they can only afford to be down for one hour (1-hour RTO), it shouldn't matter what the cause of the outage is — they can't afford one longer than an hour.

I'm with the first team

While I agree with the second team that the business can either afford (or not) a certain amount of downtime and/or data loss, I also understand that backup and disaster recovery solutions come with a cost.  The shorter the RTO & RPO, the greater the cost.  In addition, solutions that are built to survive the loss of a datacenter or campus are more expensive than those that are built to survive a simple disk or server outage.  They cost more in terms of the software and hardware to make it possible — and especially in terms of the bandwidth required to satisfy an aggressive RTO or RPO.  You can't do an RPO of less than 24-36 hours with trucks; you have to do it with replication.

This is how it plays out in my head.  Let's say a given business unit says that one hour of downtime costs $1M.  This is after considering all of the factors, including loss of revenue and damage to the brand, etc.  So they decide that they can't afford more than one hour of downtime.  No problem.  Now we go and design a solution to meet a 1-hour RTO.  Now suppose that the solution to satisfy that one-hour RTO costs $10M.  After hearing this, the IT department looks at alternatives, and it finds out that we can do a 12-hour RTO for $100K and a 6-hour RTO for $2M.

So for $10M, we are assured that we will lose only $1M in an outage.  For $2M we can have a 6-hour RTO, and for $100K we can have a 12-hour RTO.  That means that a severe outage would cost me $10M-$11M ($10M + 1 hour of downtime at $1M), $2M-$8M ($2M + 6 hours of downtime), or $100K-$12.1M ($100K + 12 hours of downtime).  A gambler would say that you're looking at definitely losing (spending) $10M, $2M, or $100K and possibly losing another $1M, $6M, or $12M.  I would probably take option two or three — probably three.  I'd then take the $9.9M I saved and make it work for me, and hopefully I'll make more for the company with that $9.9M than the amount we would lose ($12M) if we have a major outage.
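
If you'd rather see that laid out than read it in a run-on paragraph, here's the same comparison as a few lines of code, using the made-up numbers from this example:

```python
# The tradeoff above, using the made-up numbers from this example.
cost_per_hour_down = 1_000_000

options = {        # RTO in hours -> cost of the solution that delivers it
    1: 10_000_000,
    6: 2_000_000,
    12: 100_000,
}

for rto_hours, solution_cost in options.items():
    worst_case = solution_cost + rto_hours * cost_per_hour_down
    print(f"{rto_hours:>2}-hour RTO: spend ${solution_cost:,} for sure, "
          f"${worst_case:,} if the disaster actually happens")
```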

Now what if I told you that I could also give you an onsite 1-hour RTO for another $10K?  Wouldn't you want to spend another $10K to prevent a loss greater than $1M, knowing full well that this solution will only work if the datacenter remains intact?  Of course you would.

So we'll have a 12-hour RTO for a true disaster that takes out my datacenter, but we'll have a 1-hour RTO as long as the outage is local and doesn't take out the entire datacenter.

Guess what.  You just agreed to have two RTOs.  (All the same logic applies to RPOs, by the way.)

If everything cost the same, then I'd agree that each application should have one — and only one — RTO and RPO.  However, things do not cost the same.  That's why I'm a firm believer in having two completely different sets of RTOs and RPOs.  You have one that you will live up to in most situations (e.g. dead disk array) and another that you hope you never have to live up to (loss of an entire building or campus).

What do you think?  Weigh in on this in the comments section.

Continue reading

Zdnet confused about Amazon Glacier pricing

Jack Clark of ZDNet wrote an article entitled AWS Glacier's dazzling price benefits melt next to the cost of tape, where he compares what he believes is the cost of storing 10 PB on tape for five years, versus the cost of doing the same with Amazon's Glacier service.  His conclusion is that Amazon's 1c/GB price is ten times the cost of tape.

I mean no disrespect, but I don't believe Jack Clark has ever had anything to do with a total cost of ownership (TCO) study of anything in IT.  Because if he had, he'd know that the acquisition cost of the hardware is only a fraction of the TCO of any given IT system. If only IT systems only cost what they cost when you buy them…. If only.

So what does it really cost to store 10 PB on tape?  Let's take a look at two published TCO studies to find out.  Before looking at these studies, let me say that since both studies were sponsored by tape companies, the point of them was to prove that tape systems are cheaper than disk systems. If these studies are biased in any way, it would be that they might underestimate the price of tape, since the purpose of these two, uh, independent studies is to prove that tape is cheaper.  (In fact, I wrote about one of the reports being significantly biased in favor of tape.)

Clipper Group Report

The first report we'll look at is the Clipper Group report that said that tape was 15 times cheaper than disk.  It's a very different report, but I'm going to use the graph on page 3, as it gives what it believes to be the TCO of storing a TB of data on tape for a year, based on four different three-year "cycles" of a 12-year period. 

[Chart from the Clipper Group report: disk vs. tape TCO per TB, by three-year period]

As you can see, the cost per TB is much higher in the first three years, because it includes the cost of buying a tape library that is much larger than it needs to be for that period — because you must plan for growth.  (This, of course, is one of the major advantages of the Glacier model — you only pay for what you use.)  But to get close to Mr. Clark's five-year period, I need to use two three-year periods.

The other problem with the report is that they use graphs and don't show the actual numbers, and they use scales that make the tape numbers look really small.  You can see how difficult it is to figure out the actual numbers for tape.  It is easy, however, to figure out the cost numbers for disk and then divide them by the multiplier shown in the graph.

The disk number for the first three-year period looks to be about $2600, which is said to be 9x the price of tape.  I divide that $2600 by 9 and I get $288/TB for that 3-year period, which matches up with the line for tape on the graph. Divide it by 3 and we get $96/TB per year.  The disk cost of the second period is $1250/TB. Divide it by 15 and you get $83/TB for that 3-year period; divide that by 3 to get $27/TB per year.  If I average those two together, I get $61/TB per year.  Since Amazon Glacier stores your data in multiple locations, we'll need two copies, so the cost is $122/TB per year for two copies.  Since Jack Clark used 10 PB for five years, we'll multiply this by 10,000 to get to 10 PB, then by five to get to five years.  This gives us a cost of $6,100,000 to store 10 PB on tape for five years, based on the numbers from the Clipper Group study.

Crossroads Report

Let's look at a more recent report that compares a relatively new idea of using a disk front end to LTFS-based tape.  The first fully-baked system of this type is from Crossroads, and they just happen to have created a TCO study that compares the cost of storing 2 PB on their system (a combination of disk and tape) vs storing it on disk for ten years.  Awesome! Their 10-year cost for this is $1.64M.  Dividing 2 PB by 2000 gives us 1 TB, and dividing the 10-year cost by 10 gives us a cost of $80/TB for one year.  Double it like we did the last number, and we have $160/TB/yr for two copies. Multiplying that by 10,000 (10 PB) and then by five (five years) gives us a cost of $8M for 10 PB for five years based on the Crossroads report.
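
Here's the arithmetic from both reports in one place, in case you want to check my math or plug in your own numbers (it rounds slightly differently than the step-by-step figures above):

```python
# Reproducing the arithmetic from both TCO reports: two copies of 10 PB for five years.
TB = 10_000      # 10 PB expressed in TB
YEARS = 5
COPIES = 2       # Glacier keeps data in multiple facilities, so compare against two tape copies

# Clipper Group: disk $/TB per three-year period, divided by the disk-vs-tape multiplier
period1 = 2600 / 9 / 3        # ~$96/TB/yr
period2 = 1250 / 15 / 3       # ~$28/TB/yr
clipper_per_tb_yr = (period1 + period2) / 2
print(f"Clipper Group: ${clipper_per_tb_yr * COPIES * TB * YEARS:,.0f}")     # ~$6.2M

# Crossroads: $1.64M for 2 PB over ten years
crossroads_per_tb_yr = 1_640_000 / 2_000 / 10                                # ~$82/TB/yr
print(f"Crossroads:    ${crossroads_per_tb_yr * COPIES * TB * YEARS:,.0f}")  # ~$8.2M

# Glacier, for comparison: $.01/GB/month, with the multiple copies included in the service
print(f"Glacier:       ${0.01 * 1_000 * TB * 12 * YEARS:,.0f}")              # ~$6.0M
```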

On a side note, the Crossroads Strongbox system has the ability to replicate backups between two locations using their disk front end.  This makes this system a lot more like what Amazon is offering with their Glacier service.  (As opposed to traditional use of tape like the Clipper Group report was based on, where you'd also have to pay for someone like Iron Mountain to move tapes around as well.)

Net net

According to two TCO studies, storing two copies of 10 PB of data on tape for five years costs the same or more than it costs to store that same data on Amazon's Glacier.

And you don't have to buy everything up front and you only pay for what you use.  You don't have to plan for anything but bandwidth.  Yes, this will only work for data whose usage pattern matches what they offer, but they sure have made it cheap — and you don't have to manage it!

Not bad.

 

Continue reading

Amazon Glacier changes the game

In case you missed it, Amazon just announced a new storage cloud service called Glacier.  It's designed as a target for archive and backup data at a cost of $.01/GB/mth.  That's right, one penny per month per GB.  I think my first tweet on this sums up my feelings on this matter: "Amazon glacier announcement today. 1c/GB per month for backup archive type data. Wow. Seriously."

I think Amazon designed and priced this service very well.  The price includes unlimited transfers of data into the service.  The price also includes retrieving/restoring up to 5% of your total storage per month, and it includes unlimited retrievals/restores from Glacier into EC2.  If you want to retrieve/restore more than 5% of your data in a given month, additional retrievals/restores are priced at $.05/GB-$.12/GB depending on the amount you're restoring. Since most backup and archive systems store, store, store and backup, backup, backup and never retrieve or restore, I'd say that it's safe to say that most people's cost will be only $.01/GB/month.  (There are some other things you can do to drive up costs, so make sure you're aware of them, but I think as long as you take them into consideration in the design of your system, they shouldn't hit you.)
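
Here's how I'd rough out a monthly bill based on the pricing as I've described it above.  The overage rate is the high end of the published range, and this ignores the per-request and early-deletion fees (and the fact that the retrieval allowance is actually pro-rated daily), so treat it as an estimate, not a quote.

```python
# Rough monthly Glacier cost estimate, using the pricing described above.
# Ignores per-request fees, early-deletion charges, and the daily pro-rating
# of the retrieval allowance; uses the high end of the $.05-$.12/GB range.
def monthly_cost(stored_gb, retrieved_gb, overage_rate=0.12):
    storage = stored_gb * 0.01                      # $.01/GB/month
    free_allowance = stored_gb * 0.05               # 5% of stored data per month
    overage = max(0.0, retrieved_gb - free_allowance) * overage_rate
    return storage + overage

print(monthly_cost(stored_gb=50_000, retrieved_gb=0))        # -> 500.0 (store 50 TB, restore nothing)
print(monthly_cost(stored_gb=50_000, retrieved_gb=2_000))    # -> 500.0 (a 2 TB restore fits in the allowance)
print(monthly_cost(stored_gb=50_000, retrieved_gb=10_000))   # -> 1400.0 (a big restore blows past it)
```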

This low price comes at a cost, starting with the fact that retrievals take a while.  Each retrieval request initiates a retrieval job, and each job takes 3-5 hours to complete.  That's 3-5 hours before you can begin downloading the first byte to your datacenter.  Then it's available for download for another 24 hours.  

This is obviously not for mission critical data that needs to be retrieved in minutes.  If that doesn't meet your needs, don't use the service.  But my thinking is that it is perfectly matched to the way people use archive systems, and to a lesser degree how they use backup systems.

It's better suited for archive, which is why Amazon uses that term first to describe this system.  It also properly uses the term retrieve instead of restore.  (A retrieve is what an archive system does; a restore is what a backup system does.)  Good on ya, Amazon!  Glacier could be used for backup, as long as you're going to do small restores, and RTOs of many, many hours are OK.  But it's perfect for archives.

We need software!  (But not from Amazon!)

Right now Glacier is just an API; there is no backup or archive software that writes to that API.  A lot of people on twitter and on Glacier's forum seem to think this is lame and that Amazon should come out with some backup software.

First, let me say that this is how Amazon has always done things.  Here's where you can put some storage (S-3), but it's just an API.  Here's where you can put some servers (EC2), but what you put in those virtual servers is up to you.  This is no different.

Second, let me say that I don't want Amazon to come out with backup software.  I want all commercial backup software apps and appliances to write to Glacier as a backup target.  I'm sure Jungledisk, which currently writes to S-3, will add Glacier support posthaste.  So will all the other backup software products that currently know how to write to S-3. They'll never do that, though, if they have to compete with Amazon's own backup app.  These apps and appliances writing to Glacier will add deduplication and compression, significantly dropping the effective price of Glacier — and making archives and backups use far less bandwidth.
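
For the curious, writing to the API directly isn't much code.  Here's a bare-bones sketch using the current boto3 SDK for Python; the vault and file names are placeholders, and real backup software would add a catalog, dedupe, compression, multipart uploads for anything big, and SNS notifications instead of polling.

```python
# Bare-bones Glacier round trip: upload an archive, then initiate the
# (hours-long) retrieval job and wait for it. Vault and file names are
# placeholders; this is a sketch, not backup software.
import time
import boto3

glacier = boto3.client("glacier")
VAULT = "my-archive-vault"                 # hypothetical vault name

glacier.create_vault(vaultName=VAULT)

with open("backup-2012-08-31.tar.gz", "rb") as f:          # placeholder archive
    upload = glacier.upload_archive(vaultName=VAULT,
                                    archiveDescription="nightly archive",
                                    body=f)
archive_id = upload["archiveId"]

# Retrieval is a job, not a download: initiate it, wait the 3-5 hours, then fetch.
job = glacier.initiate_job(vaultName=VAULT,
                           jobParameters={"Type": "archive-retrieval",
                                          "ArchiveId": archive_id})
job_id = job["jobId"]

while not glacier.describe_job(vaultName=VAULT, jobId=job_id)["Completed"]:
    time.sleep(15 * 60)                    # this is where the 3-5 hours goes

output = glacier.get_job_output(vaultName=VAULT, jobId=job_id)
with open("restored.tar.gz", "wb") as out:
    out.write(output["body"].read())
```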

Questions

We all have questions that the Amazon announcement did not answer.  I have asked these questions of Amazon and am awaiting an answer.  I'll let you know what they say.

  1. Is this on disk, tape, or both?  (I've heard unofficially that the official answer is no answer, but I'll wait to see what they say to me directly.)
  2. The briefing says that it distributes my data across multiple locations.  Are they saying that every archive will be in at least two locations, or are they saying they're doing some type of multiple-location redundancy?  (Think RAID across locations.)
  3. It says that downloads are available for 24 hours.  What if it takes me longer than 24 hours to download something?
  4. What about tape-based seeding for large archives, or tape-based retrieval of large archives?

ZDNet's Cost Article

Jack Clark of ZDNet wrote an article that said that Glacier's 1c/GB/mth pricing was ten times that of tape.  Suffice it to say that I believe his numbers are way off.  I'm writing a blog post to respond to his article, but it will be a long one and a difficult read with lots of numbers and math.  I know you can't wait.

 

Continue reading

Dell buys a BUNCH of software products

Dell continues its global domination plans via more acquisitions. Last year, they acquired Ocarina & Compellent.  This year they’ve acquired Wyse (they’re still in business?) and Sonicwall. 

But now they’ve entered into my world.  They have announced that they are acquiring Appassure and Quest Software.  For those of you who haven’t been following these companies, Appassure is a near-CDP backup software product with quite a bit of success.  Quest has a bunch of products, but among them is NetVault (a traditional network backup software product) and vRanger (a purpose-built backup product for virtualization).

The Appassure acquisition is a very solid one.  I have personally watched that company increase their market share daily in a very aggressive way.  The near-CDP story plays very well both when backing up physical servers and even better when backing up virtual servers.  It’s not a small deal that they can recover a server in minutes.  It’s interesting that Appassure has been doing so well, given the difficulty that other CDP and CDP-like products have had.

For those unfamiliar with Quest’s history, the two backup products they have are the result of relatively recent acquisitions.  NetVault is a general-purpose network backup product that has been around a while, but has never garnered much market share.  I know they’ve been working to bring it up to speed with other products in the space, and have definitely increased marketing activities for it compared to its previous owner, Bakbone. vRanger was the king of the mountain in virtualization backup at one point, but they seem to have been out-marketed by Veeam lately — but they’re not taking that lying down.  Just look at product manager John Maxwell’s comments on my last blog post to see that! Perhaps having Dell’s marketing budget behind these products will finally get them the attention they are looking for.

Dell’s challenge will be similar to Quest’s: integrating all of these products into a single coherent product line.  This is a challenge they already know well.

Continue reading

Yesterday (Backup Version) A Parody Music Video

Finally the world can see what we've been showing in our Backup Central Live! shows.  This video is the first music parody song we produced, and the first music video that we did.  I hope you enjoy it.

Yesterday (Backup Version) from TrueBit.tv on Vimeo.

The musician, singer and actor for the video is none other than Cameron Romney, a talented young man I met in my daughters’ show choir. He’s 16 yo and did all the instruments, vocals, and sound mixing for this song. In case you’re curious, yes, he’s related to Mitt Romney. 1st cousin, twice removed, or something like that.

This is the first in a series of these videos that I’ll be publishing on TrueBit.TV. Make sure to check out our educational videos at TrueBit.TV.

Continue reading