When I first heard of LTFS (the Linear Tape File System), I have to admit my first reaction is best summed up by this sound bite from Zoolander. With VTLs, we had disk pretending to be tape; now we have tape pretending to be disk!?!! I was reminded of other filesystem-on-removable-media attempts I’ve seen that never excited me much, and the linear nature of tape doesn’t really lend itself to typical filesystem access. Are these guys crazy? Yep. Crazy like a fox.
Haven’t they heard? We’re putting everything on disk and deduping it! Word documents, database backups, Excel files, Exchange and SharePoint backups, video and audio streams, PDF files, medical imaging — everything — right? Not so much. Yes, we’re putting a lot of stuff on disk, and yes, we’re deduping a lot of it. But video, audio, PDFs, and images of any kind don’t dedupe at all. Sure, they’ll dedupe if you have multiple copies of them, but you don’t have versions of these files the way you do with other data types. And many of these files tend to be ginormous, so we don’t do repeated fulls on them. Result? It’s a waste of money to put them on deduped storage. Remember that with most deduped storage, you’re paying a premium for the dedupe feature. The vendor takes 1 TB, makes it look like 20 TB, but only charges you for 10 TB. If you’re getting no dedupe out of it (because you’re storing file types that don’t dedupe), then you’re really getting 1 TB of disk but paying for 10.
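To make the pricing argument concrete, here’s the back-of-the-envelope math from that paragraph. The $100/TB list price is a made-up number purely for illustration; only the 1 TB / 10 TB / 20:1 figures come from the example above.

```python
# Back-of-the-envelope cost math for deduped storage. The $100/TB price
# is an assumed illustrative number, not a real quote.
price_per_billed_tb = 100   # assumed $/TB the vendor charges
physical_tb = 1             # actual disk inside the appliance
billed_tb = 10              # capacity you pay for ("effective" TB)

def cost_per_logical_tb(dedupe_ratio):
    """Dollars per TB of data you actually store, given your dedupe ratio."""
    logical_tb = physical_tb * dedupe_ratio  # data that fits on the 1 TB
    return billed_tb * price_per_billed_tb / logical_tb

print(cost_per_logical_tb(20))  # 50.0   -- a bargain for data that dedupes 20:1
print(cost_per_logical_tb(1))   # 1000.0 -- a 10x premium for video/audio/images
```

Same appliance, same price; whether it’s a deal depends entirely on whether your data dedupes.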
So…. The folks at LTO decided to take the most popular tape drive in a long time and see what they could do about that. It starts with the fact that LTO-5 has the ability to partition the tape, allowing you to address two different parts of the tape as if they were two separate tapes. For the purposes of LTFS, the first partition is very small and stores the filesystem metadata; the second partition holds the bulk of the capacity and stores the actual files. If you load the tape into a drive on a server running LTFS (available at this time for Linux and MacOS, with Windows due soon), LTFS loads the information in the metadata partition into memory and makes it available to the operating system. The metadata includes everything about where the files are located on tape, their permissions, and where they sit in the virtual filesystem stored on the tape (what directory or subdirectory they are in). Once this is done, the tape appears in the operating system like any other filesystem (e.g. a drive letter in Windows, a mount point displayed by the df command in Linux). You can then do anything in that filesystem that you can do with any other filesystem, such as open files, delete files, or even modify files.
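On Linux, that mount step looks roughly like the sketch below. The `ltfs` utility is the FUSE-based mounter that ships with LTFS implementations; the device name (`/dev/st0`) and mount point (`/mnt/ltfs`) are assumptions that will vary by OS and driver, so treat this as illustrative rather than a copy-paste recipe.

```shell
# Mount an LTFS-formatted LTO-5 tape. /dev/st0 and /mnt/ltfs are
# assumed paths -- your device name and mount point will differ.
mkdir -p /mnt/ltfs
ltfs -o devname=/dev/st0 /mnt/ltfs

# Once mounted, the tape behaves like any other filesystem:
df -h /mnt/ltfs              # shows up like a disk volume
cp big-video.mov /mnt/ltfs/
ls -l /mnt/ltfs

# Unmounting flushes the updated index back to the metadata partition.
umount /mnt/ltfs
```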
When I read in the presentation that you could modify a file, my knowledge of tape architecture reared its ugly head. How could you modify a file in the middle of the tape? Although I’m still not sure of the details of how they make it happen, they do make it happen. Being linear, any new information obviously has to be stored at the end of the tape; therefore, modifying a file will take longer than it would on disk. No one is trying to pretend that LTFS will perform the same as disk; what they are saying is that you can use LTFS as a significantly less expensive alternative to disk for certain types of media and certain use cases.
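Since the presentation didn’t spell out the mechanics, here’s a toy sketch of the general technique an append-only medium can use to “modify” a file: the new data lands at the end of the tape, and the index simply repoints to the new copy. This is my own illustration of the concept, not the actual LTFS on-tape format.

```python
# Toy model of modify-by-append on a linear medium. My own illustration
# of the general idea, not the real LTFS format.

class LinearMedium:
    def __init__(self):
        self.blocks = []   # the append-only "tape"
        self.index = {}    # filename -> block number (the "metadata partition")

    def write(self, name, data):
        # Every write -- new file or modification -- lands at the end of tape.
        self.blocks.append(data)
        self.index[name] = len(self.blocks) - 1

    def read(self, name):
        return self.blocks[self.index[name]]

tape = LinearMedium()
tape.write("report.doc", b"draft 1")
tape.write("report.doc", b"draft 2")  # "modify" = append + repoint the index

print(tape.read("report.doc"))  # b'draft 2'
print(len(tape.blocks))         # 2 -- the old draft is still physically on tape
```

Note the side effect that sets up the snapshot discussion below: the superseded data is never overwritten, only orphaned by the index.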
In addition to using LTFS for media types that don’t dedupe, it’s also excellent for long-term storage (i.e. archiving). All current long-term storage options on tape require the archive software to create its own tape format, or to use an existing encapsulation system (e.g. tar). If you are talking seriously long-term storage (several years to many decades), there is a real possibility that the company that made the tape format you are using could cease to exist. Having a tape format that is independent of the application is an awesome way to work around this issue. Archive software companies have already expressed interest in supporting this new format.
What about backup companies? It’s perfectly conceivable that backup software companies will support this tape format going forward. And if they do, they’ll be creating tapes that are completely self-describing! That goes a long way toward alleviating the same fears we have about proprietary formats on the backup software side. What we need now is for a backup software upstart (hey, CommVault!) to support this format, and others will hopefully follow. I doubt the larger companies will want to do it, because it removes the stranglehold they have on their customers, but maybe someone will surprise me.
One really interesting thing these guys told me is that since nothing is ever overwritten on the tape, the tape holds not only the current version of every file, but also every previous version that fit on that tape. And the metadata holds all of the information necessary to get back to previous iterations of the tape (directory structure, everything). It’s like snapshots, but for tape! I think the next thing that needs to happen is for the people actually writing the LTFS software to add a feature that shows all of those previous versions in a read-only state, and then lets us select one of the read-only versions and promote it to read-write. (They told me that once you write to a previous state, everything that was written after that point is erased, which is why the previous states should stay read-only until you decide to force the issue.) A filesystem with snapshots! Awesome!
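Continuing the toy model from above, the snapshot idea falls out naturally: because writes are an append-only log, replaying that log up to any point reconstructs the filesystem exactly as it was then. Again, this is my own sketch of the concept, not LTFS internals.

```python
# Toy model of tape "snapshots" via an append-only write log. My own
# illustration of the idea, not the real LTFS index format.

def view_at(log, generation):
    """Rebuild the name -> data index as of the first `generation` writes."""
    index = {}
    for name, data in log[:generation]:
        index[name] = data  # later writes shadow earlier ones
    return index

# Each entry is one write to the end of tape, in order.
log = [("a.mov", "v1"), ("b.pdf", "v1"), ("a.mov", "v2")]

print(view_at(log, 2))  # {'a.mov': 'v1', 'b.pdf': 'v1'} -- before a.mov was modified
print(view_at(log, 3))  # {'a.mov': 'v2', 'b.pdf': 'v1'} -- the current state
```

This also shows why promoting an old state to read-write is destructive: appending new writes after generation 2 means everything the log held beyond that point no longer describes the tape, which is exactly the “everything written after that is erased” behavior they described.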
The more I thought about this idea the more I liked it. It’s not for everything, and disk is still my preferred initial target for backups these days, but we do need a place to put all this other stuff. LTFS sounds like a perfect one. These people are crazy… Crazy like a fox!
----- Signature and Disclaimer -----
Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.