Why tape drives are bad for backups

Specifically, this article is about why modern tape drives are a really bad choice to store the initial copy of your backups. It’s been this way for a long time, and I’ve been saying so for at least 10 years, in case anyone thinks I’ve been swayed by my current employer.  Tape is good at some things, but receiving the first copy of your backups isn’t one of them.  There are also reasons why you don’t want to use them for your offsite copy, and I’ll look at those, too.


Tape drives are too fast for incremental backups

  • Tape drives are too fast
    • In case you didn’t know it, modern tape drives essentially have two speeds: stop and very fast. Yes, there are variable speed tape drives, but even the slowest speed they run at is still very fast.  For example, the slowest an LTO-7 drive can go using LTO-7 media is 79.99 MB/s native.  Add compression, and you’re at 100-200 MB/s minimum speed!
  • Incremental backups are too slow
    • Most backups are incremental backups, and incremental backups are far too slow for tape. A file-level incremental backup delivers an unpredictable trickle of throughput, usually measured in single digits of megabytes per second. That is nowhere near 100-200 MB/s.
  • The speed mismatch is the problem
    • When incoming backups are really slow and the tape drive wants to go very fast, the drive has no choice but to stop, rewind, and start up again. It does this over and over, dragging the tape back and forth across the read/write head in multiple passes. This wears out both the tape and the drive, and it is the number one reason behind tape drive failures in most companies.  Tape drives are simply not the right tool for incoming backups.  Disk drives are much better suited to the task.
  • What about multiplexing?
    • Multiplexing is simultaneously interleaving multiple backups into a single stream in order to create a stream fast enough to keep your tape drive happy. It’s better than nothing, but remember that while it helps your backups, it hurts your restores.  If you interleave ten backups together during backup, you have to read all ten streams during a restore and throw away nine of them just to get the one stream you want. That makes your restore roughly ten times longer.  If you don’t care about restore speed, it’s great! (A quick back-of-the-envelope sketch of both the speed mismatch and the restore penalty follows this list.)
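
To make both problems concrete, here is a minimal back-of-the-envelope sketch. The numbers are illustrative assumptions (a 5 MB/s file-level incremental feeding a drive that wants roughly 160 MB/s, and ten-way multiplexing), not measurements of any particular product.

```python
# Back-of-the-envelope model of the tape/incremental speed mismatch and the
# multiplexed-restore penalty. All numbers are illustrative assumptions.

def streaming_fraction(source_mbps, drive_min_mbps):
    """Rough fraction of time the drive can keep streaming when fed a source
    slower than its minimum speed; the rest is spent stopping and repositioning."""
    return min(source_mbps / drive_min_mbps, 1.0)

def effective_restore_mbps(drive_mbps, interleaved_streams):
    """Effective restore speed for one stream when the drive must read every
    interleaved stream and discard all but the one you want."""
    return drive_mbps / interleaved_streams

incremental_mbps = 5.0   # typical file-level incremental (single digits)
drive_min_mbps = 160.0   # roughly what an LTO-7-class drive wants with compression

print(f"Drive can stream only ~{streaming_fraction(incremental_mbps, drive_min_mbps):.0%} of the time")
print(f"One stream from a 10-way multiplexed tape restores at "
      f"~{effective_restore_mbps(drive_min_mbps, 10):.0f} MB/s instead of {drive_min_mbps:.0f} MB/s")
```

The exact percentages vary by drive and workload, but the shape of the problem doesn’t: the drive spends most of its time repositioning during incrementals, and a restore from a ten-way multiplexed tape gets roughly a tenth of the drive’s speed.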

What about offsite copies?

There have been many incidents involving tapes lost or exposed by offsite vaulting companies like Iron Mountain.  Even Iron Mountain’s CEO once admitted that it happens often enough that all tapes should be encrypted. I agree with this recommendation: any transported tape ought to be encrypted.

Tape is still the cheapest way to get data offsite if you are using a traditional backup and recovery system. The alternative, replicating your backups offsite, requires an expensive deduplication appliance to make the daily backup small enough to replicate. These appliances can be effective, but they are very costly, and there are a lot of limits to their deduplication abilities, many of which drive up the cost of purchasing and using them.  This is why most people are still using tape to get backups offsite.

If you have your nightly backups stored on disk, it should be possible to get those backups copied over to tape.  That assumes your disk target can supply a stream fast enough to keep your tape drives happy and that there aren’t any other bottlenecks in the way.  Unfortunately, one or both of those conditions is often not met, and your offsite tape copy process becomes as mismatched as your initial backup process.

In short, tape is often the cheapest way to get backups offsite, but it’s also the riskiest, since tapes are regularly lost or exposed in transit. It can also be difficult to configure your backup system to create the offsite tape copy efficiently.

I thought you liked tape?

I do like tape.  In fact, I’m probably one of the biggest proponents of tape.  It has advantages in some areas.  You cannot beat the bandwidth of tape, for example.  There is no faster way to get petabytes of data from one side of the world to the other.  Tape is also much better at holding onto data for multiple decades, with a much lower chance of bit rot.  But none of these advantages come into play when talking about day-to-day operational backups.

I know some of you might think that I’m saying this just because I now work at a cloud-based backup company. I will remind you that I’ve been saying these exact words above at my backup seminars for almost ten years.  Tape became a bad place to store your backups the day it started getting faster than the network connection backups were traveling over — and that was a long time ago.

What do you think?  Am I being too hard on tape?

Written by W. Curtis Preston (@wcpreston), four-time O'Reilly author, and host of The Backup Wrap-up podcast. I am now the Technology Evangelist at Sullivan Strickler, which helps companies manage their legacy data.

16 comments
  • Multiplexing can be used for backup/restore when you use it with big enough blocks (10 GB+) and a fast enough network.

    Using big blocks, we can actually guarantee restore speed from tape (if the restoring client is able to receive and flush the stream to disk). We’ve got live examples where a restore has been over 6 times faster than the backup of an individual stream from a multiplexed backup (backup speed for that stream was 15 MB/s, while restore speed was 113 MB/s). We’re using Oracle T10KD drives and normally write to them at about 350-450 MB/s with about 30 concurrent streams. The best thing about this is that you can scale it out linearly by adding more network speed and tape drives.

    The main disadvantage of using tapes is that you cannot dedupe or run incrementals forever, which means you need a fast network. You should also run full backups regularly and use differentials, so you need only two restores to get to the required point in time. This actually gives you more security, since you have ‘duplicated’ the same data over several offline media, keeping it safe from hacking and data corruption.

    Restoring deduped data is always a random IO operation on the back end, and it normally uses slow SATA/SAS drives. Dedupe is very useful for small workloads and data behind slow connections. However, in big environments (over a PB FET) the disk space and power/cooling requirements make it very costly.

    Operational protection for critical data should use application-consistent primary storage snapshots/replication to restore really big amounts of data in minutes.

    • I spent a lot of years designing backups in just the way you describe. A few thoughts.

      While you CAN use multiplexing to make the full backups happy, there’s nothing you can do for the incremental backups. They just don’t supply a stream of data fast enough. For example, you are generating 300+ MB/s with 30 streams. That requires streams of at least 10 MB/s, which you can’t really count on from an incremental backup.

      Second, you say you can guarantee the restore speed. I would say you can guarantee it won’t be more than about 10 MB/s in your config. 😉 Because you’re going to be reading 30 streams and throwing away 29 of them.

      As to the cost of disk vs tape, I’ve done a bunch of TCOs over the years comparing the two. My experience has been that it’s not quite as simple as saying tape is always cheaper. Plus, with deduped disk & replication, you get backups offsite without involving any humans and trucks. It’s difficult to put a price on that.

      I’ve also found over the last decade that when people switch to a fully disk-based protection system, their backups simply work better. Tape-based backups require constant vigilance to make sure everything’s working the way it should. A small change and suddenly the drives are no longer streaming and backups are failing. I don’t hate tape. I just think it’s now better suited for archive applications.

      We agree on snapshots & replication, though! 🙂

      • If we have streams slower than the drive, it’s not a problem, because we use large multiplexer blocks, where the writing client is effectively decoupled from the target tape drive. Whatever the client is capable of writing is fine; other streams might be faster, so a client writing an incremental might actually write only every 60th block on the target tape. If it’s a small incremental, it might only ever write a few full blocks to tape (remember, 10+ GB per block). If all clients are slow, then we cannot fully utilize the tape drive, but it won’t affect client-side backup speed.

        The main advantage of using big blocks is in restores. You said we must throw away 29 streams when restoring one … not really, since we have the tape drive skip over those streams (we index the location of each block on tape). The average locate time on a T10KD is 35 seconds (random position to random position). Reading that 10 GB takes about 40 seconds, so ~133 MB/s (10 GB/75 seconds) guaranteed read speed. Even in the worst case, with a maximum rewind time of 97 seconds, that’s ~75 MB/s (10 GB/132 seconds) guaranteed read speed. Real production restores have shown average restore speeds of about 100+ MB/s. (A quick check of this arithmetic appears after this comment.)

        In the future we might utilize RAO (Recommended Access Order), which will minimize locate times, since the tape drive optimizes the read order of the requested blocks.

        We’ve taken special measures to make our tape solution fault tolerant; it automatically handles broken backup servers, SAN fabrics, tape drives, and tapes.

        We use mirroring and copying tapes from site to site, so no trucks or human handling.
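
        A quick, purely illustrative check of the arithmetic in this comment: pay one reposition per 10 GB block, then stream the block. The 35 s and 97 s figures are the ones quoted above; the ~250 MB/s native read rate is an assumption, and this is not the commenter’s actual tooling.

        ```python
        # Effective restore throughput with 10 GB multiplexer blocks: one
        # reposition per block, then stream the block. 35 s / 97 s come from
        # the comment above; the ~250 MB/s native read rate is assumed.

        BLOCK_MB = 10_000        # 10 GB block
        DRIVE_READ_MBPS = 250    # assumed native read rate for a T10KD-class drive

        def effective_mbps(locate_seconds):
            read_seconds = BLOCK_MB / DRIVE_READ_MBPS     # ~40 s to read 10 GB
            return BLOCK_MB / (locate_seconds + read_seconds)

        print(f"Average reposition (35 s):    ~{effective_mbps(35):.0f} MB/s")   # ~133 MB/s
        print(f"Worst-case reposition (97 s): ~{effective_mbps(97):.0f} MB/s")   # ~73 MB/s
        ```

        That range brackets the 100+ MB/s production restores described above, and it shows that the per-block reposition cost, not the drive’s streaming rate, is what limits worst-case restore speed.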

        • I’m not sure who “we” is, because you’re describing behavior I’ve never heard of.

          I’m also struggling w/the details. You’d have to have a very big cache to write 10 GB of data at the write speeds you specify. It would take 20-30 seconds to write 10 GB at 300-400 MB/s. You’re not streaming that from a client at that speed, so you’ve got to be doing some pretty major caching. Then you have to immediately have another cached 10 GB to write the next block. And at the various write speeds of various backups, you’d have to have 30-60 caches of 10 GB each. That’s a lot of cache! (The rough numbers are sketched after this comment.)

          And would this tech only work with the T10K drives? They are awfully expensive. Great drives, AFAIC, but much more expensive than LTO. And they’re sold by a company whose commitment to tape is questionable. I’ve heard nothing but difficulty with buying StorageTek stuff since the Oracle takeover.

          I guess my biggest question is why go through all this effort?
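
          A rough sketch of the cache arithmetic above: how long one 10 GB block takes to flush at streaming speed, and how much buffer space roughly 30 concurrent streams would imply. Purely illustrative; the block size, drive speed, and stream count are the figures from this thread.

          ```python
          # Cache sizing behind "that's a lot of cache." Illustrative arithmetic
          # only, using the block size, drive speed, and stream count quoted above.

          BLOCK_GB = 10
          DRIVE_MBPS = 350     # streaming rate while flushing a block to tape
          STREAMS = 30         # concurrent multiplexed backup streams

          flush_seconds = BLOCK_GB * 1000 / DRIVE_MBPS
          # Each stream needs at least one 10 GB bucket filling while others flush,
          # so a reasonable estimate is one to two buckets per stream.
          low_gb, high_gb = STREAMS * BLOCK_GB, 2 * STREAMS * BLOCK_GB

          print(f"Flushing one {BLOCK_GB} GB block takes ~{flush_seconds:.0f} s")   # ~29 s
          print(f"Buffer space: roughly {low_gb}-{high_gb} GB of SSD/RAM")          # 300-600 GB
          ```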

          • “We” is NovaStor NovaBackup DataCenter (Hiback) at one of our customer sites.

            And yes, it handles multiplexing totally differently than older enterprise backup systems, which multiplex with small (16-256 KB) blocks.

            We are using large (write-intensive) SSDs as an extension of memory/cache.

            The solution is hardware agnostic. We have also used it with IBM E07 drives. You can use LTO drives as well.

            Why? … TCO is the answer. Over 10k clients, 30k backup jobs per day, and 20 PB per month, with a few people to handle the whole environment. And backups are made to be restored, and we’ve proven that tape can restore fast and reliably.

        • Piippu,

          Correct me if I’m wrong, but this sounds like a clever variation on classic disk-to-disk-to-tape backup.

          Rather than waiting for, and writing, an entire backup job to tape, you fill large buckets in parallel, and when a bucket is full (or the associated job completes), you write the bucket out to tape from SSD at streaming speed. So you don’t multiplex at the block level but rather at the bucket level. This keeps the data fragments of individual jobs together, so you get full restore speed for each 10 GB and then space to the next bucket if needed. Is that what you do?
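
          For readers trying to picture the bucket approach described above, here is a minimal sketch. The class and function names are hypothetical illustrations, not NovaStor’s actual code or API.

          ```python
          # Bucket-level (rather than block-level) multiplexing: each job fills its
          # own large staging bucket, and only full buckets are flushed to tape, so
          # the drive always streams and each job's data stays contiguous on tape.

          from collections import defaultdict

          BUCKET_BYTES = 10 * 1024**3        # 10 GB buckets staged on SSD

          class BucketMultiplexer:
              def __init__(self, write_bucket_to_tape):
                  self.write_bucket_to_tape = write_bucket_to_tape  # flushes one full bucket at streaming speed
                  self.open_buckets = defaultdict(bytearray)        # one open bucket per backup job

              def ingest(self, job_id, chunk):
                  """Append a (possibly slow) client's data to its own bucket."""
                  bucket = self.open_buckets[job_id]
                  bucket.extend(chunk)
                  if len(bucket) >= BUCKET_BYTES:
                      self.write_bucket_to_tape(job_id, bytes(bucket))  # 10 GB contiguous on tape
                      bucket.clear()

              def finish(self, job_id):
                  """Flush whatever is left when the job completes."""
                  leftover = self.open_buckets.pop(job_id, None)
                  if leftover:
                      self.write_bucket_to_tape(job_id, bytes(leftover))
          ```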

          • That sounds about right. Which means that (with the help of an SSD cache) they’ve solved one of the problems of backing up to tape. The other problems, unfortunately, are still there.

            You still have the difficulties associated with transporting and storing tapes to get them offsite. It’s also unclear how this helps at restore time. Unless they are doing the reverse during restores, I’m not convinced it would solve the shoe-shining problem there.

          • @Rob, yes, you’re correct. In our case we don’t need disk cache for the whole backup, just enough to always have free ‘buckets’ for new streams.

            @Curtis, our tape library is in a different machine room than the production data. We also mirror backup streams inline between sites. So the first backup server gets the data, writes it to a local ‘bucket’, and also mirrors the stream to another site, which writes its own ‘bucket’. The idea is not to touch tapes manually (we’re using Oracle SL8500 libraries).

            And yes, a problem arises if the restoring client is too slow. Current tape drives can slow down, though, and shoe-shining is not a real problem with current network speeds and clients in our environment.

            We’re investigating using the SSD cache during restores as well: first stage the data to SSD, then serve it from there to the requesting client. This would free the tape drive for other activities while a slow client restores its own data.

  • Interesting discussion, but I want to use my own anecdotal evidence to support disk-based solutions.
    Our production environment is approximately 30 PB of mixed workloads, and during our last tech refresh we moved all of our production backups over to disk. The primary reason for this was to implement replication for offsite storage and prevent data loss during tape shipping. But as we implemented the project we noticed a dramatic shift in our backup and restore times. Across the enterprise, our backup performance improved by around 5x. Combined with the time saved on tape handling, our backup team has had a lot more time to spend on resolving actual problems and issues. That added focus allowed the team to concentrate on fixing deduplication conflicts, resulting in better deduplication ratios across the enterprise.
    Because of these factors, I will always advocate for backup to disk.

    • I have said this for years. The first thing I saw w/tape is that most people had no idea how to architect a properly designed tape system. I made my living for years doing just that: fixing people’s poorly designed tape systems, telling people things like “you know, your backups would go a lot faster if you used fewer tape drives,” and watching their heads explode.

      Almost everyone I know who switched from tape to disk as their initial backup target reported faster and much easier backups and restores. There are a few exceptions where customers had a rare combination. If they had a well-tuned backup of a large database, for example, backing up that database to a poorly designed inline dedupe system may actually slow down the initial backup. But such customers were definitely in the minority.

      Then when you add being able to replicate backups, being able to easily check them for corruption, etc…. disk backups look better and better.

  • Preston: re your original question – Yes, if people are using “incrementals forever” and synthetic fulls, I don’t see why it would be too difficult to copy the “full” to a tape without provoking “shoe-shining”.

    Piippu: You seem very focused on restore speed; that is usually restricted by the destination volume/DB insert speed rather than the tape speed, and Preston’s article is about backup.

    • If you have a properly designed system, I’d say you were right. But I’ve seen a number of situations where there was still a problem. Either the hardware was incapable of transferring data at that speed, or the backup software’s need to catalog the backups as they’re being copied sometimes slowed it down.

      But if you were able to get the copy done correctly, you’re still left with tape’s physical transport requirement. You must give it to a man in a van to get it offsite. With disk you can dedupe and replicate.

  • Iron Mountain has solved a big piece of the tape-handling issue. If you use the IM secure sync offering, you can create tapes with a disk-to-disk-to-tape solution where the tape is actually created in IM’s data center, reducing the amount of handling required for offsite tapes.
    This brings tape back into the mainstream for long-term retention, as it’s not as expensive as other cloud offerings for long-term retention.