An interesting aspect of NetApp’s primary dedupe (ASIS) came to light while talking with one of their customers the other day. It’s one of those things that should have been obvious from the start, but I never really thought about it until this customer brought it up.
Before I bring up a concern about ASIS, let me praise it just a bit. NetApp’s ASIS (Advanced Single Instance Storage) is the only product available today that does true deduplication of any data type, including active data stores such as VMware images. Customers I have spoken to tell me that there is a performance hit while the post-process dedupe job is running (generally at night), but that once it has run, there is minimal to no performance degradation on the deduped data. Add to that the fact that ASIS is included in the base OS, and I think they’ve got a pretty interesting story. Other “data reduction” products include:
- EMC Celerra
  - It provides file-level dedupe and compression of older files. This would therefore not work for VMware or database images. It will provide some space savings, but not as much as a subfile-level approach, and not as much as one that can dedupe active data.
- StorWize
  - Since half of most dedupe comes from compression, StorWize said let’s just do that! They are an inline compression system that does for NAS systems what the compression chip in your tape drive does for the tape drive. Since it’s compressing inline, it actually can improve performance of some applications. Believe it or not, they’ve even tested it in front of Data Domain systems and increased their capacity! Like the NetApp approach, it works for any data type.
- Ocarina
  - Ocarina does content-aware deduplication. While they started doing only file-level dedupe, they have recently added cross-file-level dedupe, so they also are doing “true” dedupe. But they only do this for certain file types, such as Word documents, jpg files, etc. If you have a lot of data in the file types they support, they should be able to get more dedupe out of it than other approaches, but they won’t be able to address other data types at all, such as VMware.
- Content Addressable Storage (CAS) products & Single Instance Storage (SIS) products
  - These products provide object-level or file-level dedupe and will not identify common blocks between files, but they should at least be mentioned in a list such as this. Some of these products have started calling themselves deduplication products, when (at best) they can call themselves object-level dedupe or file-level dedupe.
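To make the file-level versus block-level distinction concrete, here is a minimal Python sketch. It is not how any of these products is implemented. The 4 KB block size, the SHA-256 fingerprints, and the made-up "VM image" contents are all my assumptions for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def file_level_stored_bytes(files):
    """Bytes kept when only whole, byte-identical files are collapsed."""
    unique = {}
    for data in files.values():
        unique[hashlib.sha256(data).digest()] = len(data)
    return sum(unique.values())


def block_level_stored_bytes(files, block_size=BLOCK_SIZE):
    """Bytes kept when identical fixed-size blocks are collapsed across files."""
    unique = {}
    for data in files.values():
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            unique[hashlib.sha256(block).digest()] = len(block)
    return sum(unique.values())


def make_block(tag):
    """Build one block filled with a single byte value (hypothetical test data)."""
    return bytes([tag]) * BLOCK_SIZE


# Three hypothetical images: all share 8 "OS" blocks, and two are exact copies.
common = b"".join(make_block(i) for i in range(8))
files = {
    "vm1.img": common + make_block(100),
    "vm2.img": common + make_block(101),
    "vm2_copy.img": common + make_block(101),
}

raw = sum(len(d) for d in files.values())
print("raw bytes:        ", raw)
print("file-level bytes: ", file_level_stored_bytes(files))
print("block-level bytes:", block_level_stored_bytes(files))
```

In this toy data set, file-level dedupe only collapses the exact copy, while block-level dedupe also finds the shared blocks inside the two images that differ. That is the gap I am describing for VMware-style data.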
Alright, on to the interesting thing about NetApp’s primary dedupe. Here’s the thing: they “redupe” when replicating or when copying to tape. Let’s look at each of these use cases.
ASIS is run at the filer level, and more specifically at the flex-vol (i.e. volume) level. When that data is replicated to another filer, the data is reduped, or reconstituted to its original size. If you want to run ASIS on the other side, you can. Under “normal” circumstances, where you start out with an empty volume, start filling it, and replicate it as you go, this poses no problem. It also poses no problem if you had a full volume you were already replicating and then decided to run dedupe on it after the fact: dedupe both sides and you’re done. However, if the data in a deduped volume would, once reduped, exceed the raw capacity of the replication target, and you haven’t been replicating it as you go along, you’ll need to begin replication in stages. You’ll replicate some of the data, then dedupe that data. Then you replicate some more data and dedupe that data, and so on.
Update: The above only occurs if you use qtree-based snapmirror. If you do volume-based snapmirror, there is no problem. However, many people prefer qtree snapmirror, so they should be aware of this limitation.
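Here is a rough Python sketch of the staged replicate-then-dedupe loop described above, just to show the arithmetic. It is a planning toy, not a NetApp tool: the function name, the parameters, and especially the assumption that every chunk dedupes at the same uniform ratio are mine.

```python
def plan_staged_copy(logical_gb, dedupe_ratio, dest_capacity_gb, min_stage_gb=1.0):
    """Return the logical GB to copy in each stage.

    Assumed model: each stage copies data at full (reduped) size into the
    destination's free space, then a dedupe pass shrinks that chunk by
    dedupe_ratio before the next stage starts.
    """
    if dedupe_ratio < 1:
        raise ValueError("dedupe_ratio must be >= 1")
    if logical_gb / dedupe_ratio > dest_capacity_gb:
        raise ValueError("even fully deduped, the data does not fit")

    stages = []
    used = 0.0                      # deduped GB already on the destination
    remaining = float(logical_gb)   # logical GB still to copy
    while remaining > 0:
        free = dest_capacity_gb - used
        chunk = min(remaining, free)
        # Stop if the free space is too small to make meaningful progress.
        if chunk < min(remaining, min_stage_gb):
            raise ValueError("free space too small to make progress")
        stages.append(chunk)
        used += chunk / dedupe_ratio  # space the chunk occupies after dedupe
        remaining -= chunk
    return stages


# Hypothetical example: 1000 GB logical, 4:1 dedupe, 400 GB destination.
plan = plan_staged_copy(1000, 4, 400)
print(plan)
```

With these made-up numbers the copy takes four stages, and each stage is smaller than the free space suggests you would like, because the deduped data from earlier stages is already occupying part of the volume.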
A bigger concern is when you’re backing this data up to tape. As with almost all dedupe products, when the deduped volume is copied to tape, it is reduped. If a full volume failed and you needed to restore it, you wouldn’t be able to do so directly, as you’d have more data on tape than you could fit on the volume. You’d have to restore some data, dedupe it, restore some more, dedupe it, and so on. Therefore, it would seem that anyone with aggressive RTOs and a full deduped ASIS volume would be well advised to have a snapmirror copy of it standing by, as you won’t be able to restore it as fast as a regular volume. This limitation is confirmed by the following quote from NetApp’s ASIS Implementation Guide: “Backup of the deduplicated volume using NDMP is supported, but there is no space optimization when the data is written to tape because it’s a logical operation.”
Update: Snapmirror to tape (sm2t) doesn’t have this problem; only a regular NDMP dump does. The problem with sm2t is that it doesn’t do file-level recovery AND it’s not manageable via some backup applications. (It is manageable by TSM, NBU, BakBone, CommVault, Atempo and SyncSort.) So, sm2t is fine for a full DR of a volume if you can manage it with your backup app, and the lack of file-level recovery is alright if you have enough snapshot history to handle single-file restores (which you should be doing anyway).
Like I said, this is just something I never thought about until someone brought it up. NetApp may be able to address both these challenges at some point, and I hope they do.
----- Signature and Disclaimer -----
Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.