Why tape drives are bad for backups

Specifically, this article is about why modern tape drives are a really bad choice to store the initial copy of your backups. It’s been this way for a long time, and I’ve been saying so for at least 10 years, in case anyone thinks I’ve been swayed by my current employer.  Tape is good at some things, but receiving the first copy of your backups isn’t one of them.  There are also reasons why you don’t want to use tape for your offsite copy, and I’ll look at those, too.

 

Tape drives are too fast for incremental backups

  • Tape drives are too fast
    • In case you didn’t know it, modern tape drives essentially have two speeds: stop and very fast. Yes, there are variable speed tape drives, but even the slowest speed they run at is still very fast.  For example, the slowest an LTO-7 drive can go using LTO-7 media is 79.99 MB/s native.  Add compression, and you’re at 100-200 MB/s minimum speed!
  • Incremental backups are too slow
    • Most backups are incremental backups, and incremental backups are way too slow. A file-level incremental backup supplies an unpredictable level of throughput, usually measured in single digits of megabytes per second. This number is nowhere near 100-200 MB/s.
  • The speed mismatch is the problem
    • When incoming backups are really slow, and the tape drives want to go very fast, the drive has no choice but to stop, rewind, and start up again. It does this over and over, dragging the tape back and forth across the read/write head in multiple passes. This wears out the tape and the drive, and is the number one reason behind tape drive failures in most companies.  Tape drives are simply not the right tool for incoming backups.  Disk drives are much better suited to the task.
  • What about multiplexing
    • Multiplexing is simultaneously interleaving multiple backups together into a single stream in order to create a stream fast enough to keep your tape drive happy. It’s better than nothing, but remember that it helps your backups but hurts your restores.  If you interleave ten backups together during backup, you have to read all ten streams during a restore, and throw away nine of them just to get the one stream you want. It literally makes your restore ten times longer.  If you don’t care about restore speed, then multiplexing is great!
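
The mismatch and the multiplexing trade-off above come down to simple arithmetic. Here is a minimal sketch of it, not a sizing tool. The function names are hypothetical, the 80 MB/s floor is the LTO-7 native minimum quoted above (rounded), and the 5 MB/s per-stream rate is an assumed stand-in for "single digits of megabytes per second". It also assumes the simple restore model described above, where every interleaved stream is read and all but one are thrown away.

```python
import math


def streams_needed(drive_min_mb_s: float, stream_mb_s: float) -> int:
    """Smallest number of interleaved streams whose combined rate
    reaches the drive's minimum streaming speed."""
    return math.ceil(drive_min_mb_s / stream_mb_s)


def naive_restore_slowdown(multiplex_factor: int) -> int:
    """Simple model: a restore reads every interleaved stream and keeps one,
    so it reads multiplex_factor times as much data as it needs."""
    return multiplex_factor


drive_min = 80.0    # MB/s, rounded LTO-7 native minimum (from the text above)
incremental = 5.0   # MB/s, assumed rate of one incremental backup stream

n = streams_needed(drive_min, incremental)
print(f"Streams needed to keep the drive streaming: {n}")
print(f"Restore reads {naive_restore_slowdown(n)}x the data it keeps")
```

Under these assumptions, the more streams you need to keep the drive moving, the more data each single-stream restore has to read and discard.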

What about offsite copies?

There have been many incidents involving tapes lost or exposed by offsite vaulting companies like Iron Mountain.  Even Iron Mountain’s CEO once admitted that it happens at a regular enough interval that all tape should be encrypted. I agree with this recommendation: any transported tape ought to be encrypted.

Tape is still the cheapest way to get data offsite if you are using a traditional backup and recovery system. If you’re using such a system, you have to buy an expensive deduplication appliance to make the daily backup small enough to replicate. These can be effective, but they are very costly, and there are a lot of limits to their deduplication abilities — many of which make them cost more to purchase and use.  This is why most people are still using tape to get backups offsite.

If you have your nightly backups stored on disk, it should be possible to get those backups copied over to tape.  That is assuming that your disk target is able to supply a stream fast enough to keep your tape drives happy, and there aren’t any other bottlenecks in the way.  Unfortunately, one or more of those things is often not the case, and your offsite tape copy process becomes as mismatched as your initial backup process.

In other words, tape is often the cheapest way to get backups offsite, but it’s also the riskiest, as tapes are often lost or exposed during transit. It can also be difficult to configure your backup system properly so that it creates your offsite tape copy in an efficient manner.

I thought you liked tape?

I do like tape.  In fact, I’m probably one of the biggest proponents of tape.  It has advantages in some areas.  You cannot beat the bandwidth of tape, for example.  There is no faster way to get petabytes of data from one side of the world to another.  Tape is also much better at holding onto data for multiple decades, with a much lower chance of bit rot.  But none of these advantages come into play when talking about day-to-day operational backups.

I know some of you might think that I’m saying this just because I now work at a cloud-based backup company. I will remind you that I’ve been saying these exact words above at my backup seminars for almost ten years.  Tape became a bad place to store your backups the day it started getting faster than the network connection backups were traveling over — and that was a long time ago.

What do you think?  Am I being too hard on tape?

----- Signature and Disclaimer -----

Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.

6 thoughts on “Why tape drives are bad for backups”

  1. Pingback: Bandwidth: Backup Design Problem #2 | Backup Central

  2. Piippu says:

    Multiplexing can be used for backup/restore when you use it with big enough blocks (10 GB+) and fast enough network.

    Using big blocks we can actually guarantee restore speed from a tape (if the restoring client is able to receive and flush that stream to disk). We’ve got live examples where restore has been over 6 times faster than backup of an individual stream from a multiplexed backup (backup speed for that stream was 15 MB/s, while restore speed was 113 MB/s). We’re using Oracle T10KD drives and normally write to them at about 350-450 MB/s with about 30 concurrent streams. The best thing about this is that you can scale it out linearly by adding more network speed and tape drives.

    Main disadvantage using tapes is that you cannot dedupe or run incremental forever, which causes requirement for fast network. Also you should run full backups regularly and use differentials so you need only two restores to get into required point-in-time. This actually gives you more security when you have ‘duplicated’ same data over several offline media, being secure from hacking / data corruption.

    Deduped data restore is always random IO operation in back-end, and normally it’s using slow SATA/SAS drives. Dedupe is very useful for small workloads and data behind slow connections. However in big environments (over PB FET) the disk space and power/cooling requirement makes it very costly.

    Operational protection for critical data should utilize application consistent primary storage snapshots/replication to restore really big amounts in minutes.

    • W. Curtis Preston says:

      I spent a lot of years designing backups in just the way you describe. A few thoughts.

      While you CAN use multiplexing to make the full backups happy, there’s nothing you can do for the incremental backups. They just don’t supply a stream of data fast enough. For example, you are generating 300+ MB/s with 30 streams. That requires streams of at least 10 MB/s, which you can’t really count on from an incremental backup.

      Second, you say you can guarantee the restore speed. I would say you can guarantee it won’t be more than about 10 MB/s in your config. 😉 Because you’re going to be reading 30 streams and throwing away 29 of them.

      As to comparing the cost of disk vs tape, I’ve done a bunch of TCOs over the years comparing the two. My experience has been that it’s not quite as simple as saying tape is always cheaper. Plus with deduped disk & replication, you get backups offsite without involving any humans and trucks. It’s difficult to put a price on that.

      I’ve also found over the last decade that when people switch to a fully disk-based protection system, their backups simply work better. Tape-based backups require constant vigilance to make sure everything’s working the way it should. A small change and suddenly the drives are no longer streaming and backups are failing. I don’t hate tape. I just think it’s now better suited for archive applications.

      We agree on snapshots & replication, though! 🙂

      • Piippu says:

        If we’re having slower streams than the drive, it’s not a problem because we use large multiplexer blocks, where the writing client is kind of disconnected from the target tape drive. So whatever the client is capable of writing is fine; other streams might be faster, so it could be that a client writing an incremental actually writes every 60th block on the target tape. If it’s a small incremental, it could be that it only ever writes a few full blocks to tape (remember 10+ GB/block). If all clients are slow, then we cannot fully utilize the tape drive, but it won’t affect client-side backup speed.

        The main advantage of using big blocks is in restore. You said we must throw away 29 streams when restoring one … not really, since we skip over those streams on the tape drive (we index the location of each block on tape). Average locate time on the T10KD is 35 seconds (random to random position). Reading that 10 GB takes about 40 seconds, which gives ~133 MB/s (10 GB/75 seconds) guaranteed read speed. Even in the worst case the maximum rewind time is 97 seconds, which gives ~75 MB/s (10 GB/132 seconds) guaranteed read speed. Real production restores have shown average restore speeds to be about 100+ MB/s.

        In future we might be utilizing RAO (Recommended access order), this will minimize locate times since tape drive optimizes the read order of blocks requested.

        We’ve done special measures to make our tape solution fault tolerant, it automatically handles broken backup servers, SAN fabric, tape drives and tapes.

        We use mirroring and copying tapes from site to site, so no trucks or human handling.
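
The restore figures in the comment above follow from dividing the block size by the total time spent locating and reading it. Here is a minimal sketch of that arithmetic, using only the numbers quoted in the comment. The function name is hypothetical, 1 GB is assumed to be 1000 MB, and the 132-second worst-case total is taken as stated in the comment rather than derived.

```python
def effective_mb_s(block_mb: float, total_seconds: float) -> float:
    """Effective restore throughput for one large block:
    block size divided by the time to locate and read it."""
    return block_mb / total_seconds


BLOCK_MB = 10 * 1000  # one 10 GB multiplexer block, assuming 1 GB = 1000 MB

# Average case from the comment: 35 s locate + 40 s read
average = effective_mb_s(BLOCK_MB, 35 + 40)

# Worst case from the comment: 132 s total per block, as quoted
worst = effective_mb_s(BLOCK_MB, 132)

print(f"average: {average:.0f} MB/s")
print(f"worst case: {worst:.0f} MB/s")
```

This models a single block in isolation. A restore that spans many blocks would pay one locate per block, which is why the block size matters so much in this design.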

        • W. Curtis Preston says:

          I’m not sure who “we” is, because you’re describing behavior I’ve never heard of.

          I’m also struggling w/the details. You’d have to have a very big cache to write 10 GB of data at the write speeds you specify. It would take 20-30 seconds to write 10 GB at 300-400 MB/s. You’re not streaming that from a client at that speed, so you’ve got to be doing some pretty major caching. Then you have to immediately have another cached 10GB to write the next block. And at the various write speeds of various backups, you’d have to have 30-60 caches of 10 GB each. That’s a lot of cache!

          And would this tech only work with the TK drives? They are awfully expensive. Great drives, AFAIC, but much more expensive than LTO. And they’re sold by a company whose commitment to tape is questionable. I’ve heard nothing but difficulty with buying StorageTek stuff since the Oracle takeover.

          I guess my biggest question is why go through all this effort?

          • Piippu says:

            “We” is NovaStor NovaBackup DataCenter (Hiback) at one of our customer sites.

            And yes it handles multiplexing totally differently than older enterprise backup systems which multiplex with small (16-256KB) blocks.

            We are using large SSD (write intensive) as extension for memory / cache.

            Solution is hardware agnostic. We have used it also with IBM E07 drives. You can also use LTO drives.

            Why … TCO is the answer. Over 10k clients, 30k backup jobs per day, 20 PB per month, with only a few people handling the whole environment. And backups are made to be restored, and we’ve proven that tape can restore quickly and reliably.
