


Written by W. Curtis Preston
Monday, 04 August 2008 15:58
As I started working on making sure all my information was up to date on all the dedupe vendors, I thought about you! What have you always wanted to ask the dedupe vendors?
As this space is changing daily, I'm going to be speaking with each of the dedupe vendors to make sure my information is up to date. I also plan on presenting them with the FUD that I hear about them and seeing what response they have to it. (There's nothing like frank and open dialogue to increase understanding...)
Is there something you've always wanted to ask a dedupe vendor?
Is there a piece of FUD that you heard about one of them and you want to know if it's true or not?
The best way to give me this info is to send me a private note by clicking on the "Contact Curtis" link in the menu. If it's unsubstantiated FUD, PLEASE send it privately and do not post it as a comment on the site. I don't want to ADD to the confusion or start a flame war, although there might be one when the stories come out. If it's just a basic question, then go ahead and post it as a comment.
Comments
Now, on to your question. First, I don't agree that it's best practice to separate your OS and application BINARIES. I typically install OS and my applications on a single drive (virtual or not), and put my application DATA on a second drive. And when I do that, a base install of Windows and common applications (and all their patches, and undo garbage that comes with them) typically DOES come to well over 20 GB. Leave a little room for growth and play room, and you're easily at 40 GB. That's where I got the 40 GB number from. And if I've installed Windows and Exchange on 10 VMs, then they've absolutely got a lot of common blocks among them. Shoot, they're almost all common blocks.
Now let's talk about the data itself. If we're talking about Exchange, there's also a lot of common data between the information stores of multiple Exchange instances. Even with single instance storage, there is common data within a single instance of Exchange. If someone sends the same email/attachment to 20 people, and 10 of them are in unique Exchange storage groups and/or instances, then that email/attachment will be stored 11 times. (Once for each of the 10 storage groups and once in my Sent Items folder.) If each of the 20 people makes corrections and sends it back to the original user, many of the blocks are now stored 20 times: 10 times in each of their Sent Items folders and 10 times in the Inbox of the recipient. Run dedupe against all those storage groups and you save a lot of storage.
I know customers that have deduped their VMware images and dropped the amount of disk they needed by 75%-95% -- and they've done that without a significant change in performance -- and they did it at no additional cost.
As to your spindle-count comment, I think it's really not an issue. Since we're talking OS and application binaries (mainly), they're really not accessed that much anyway; performance is really not the issue. And aren't we talking about VMs? And you're asking me about high performance and spindle count? Then it probably shouldn't be in a VM. As to performance of Exchange data, you either notice a significant performance difference when you dedupe it or you don't. If you do, then don't dedupe it! If it reduces your disk consumption by 50%, but costs twice as much as not using it, then don't do it.
You sound like I or someone else is trying to shove dedupe of your C: drive down your throat. No one is trying to do that. I don't think everyone should dedupe their OS/app drives. I think you should do it if it makes sense to you.
You gave examples where dedupe might not help. Great. Deduping standard SQL/Oracle data won't get you much, and it would be silly to dedupe a single C: drive image that's being used as a common boot for multiple systems. If dedupe doesn't help, then don't use it! BUT real customers are using it and are being helped by it and aren't suffering performance loss or increase in cost, SO... (going back to my oracle-like statement that apparently has little substance and seems all gloss), "If you could do that without a reduction in performance or an increase in cost, my question is why WOULDN'T you do it?"
I think best practice in a VM environment, from what I have seen and heard, is to have the C: drive as a boot image only, so you are talking 4-8 GB, right? Application volumes tend to be separate. Exchange, SQL, Oracle, etc.: why would I put those on the C: drive? The C: drives are similar; however, the application volumes are going to be very different.
If you are talking thin clients, then all boot images would be the same, so why not have one, boot them all off that one image, and then have separate data file volumes for each user? In that case 100 desktops would only consume the space of one C: drive.
Your 'oracle'-like statements on this matter have little substance to them and seem all gloss.
How many drives am I really going to save? Let's say I have 100 Windows servers that I want to VM...
I've looked at the math and it's 4/5 of not very much at ALL. So why would I risk additional head processing on data to save a couple of drives?
Besides, it also means I am in effect reducing the amount of disk IOPS, right? Why would I want that?
Wouldn't deduping the C: drive also become ineffectual over time (images change, right?) and when doing patches, etc.? You may not patch them all at the same time.
Sorry - I just don't get it.
Yes, you can point users to an NFS or CIFS mount from a Data Domain box (or any other dedupe box with a NAS head) today and yes, that data will be deduped. BUT I'm not sure it's designed with that use case in mind. It's primarily designed as a target for backup and archive.
There are two issues I see with using a backup/archive dedupe device as a target for regular user data. The first is that the user's performance experience may change (although that might be worth testing). The second, and actually bigger, problem is the cost/benefit ratio. You probably won't get much more than 2:1 on user data (and that primarily comes from compression), but the dedupe vendor's pricing is typically based on the value provided by a dedupe ratio of 20:1 (or something like that). They make 1 TB look like 20 TB, but only charge you for 10, or something like that. But if they make 1 TB look like 2 TB (2:1 dedupe) and charge you for 10, they're not really helping, are they?
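To put rough numbers on that cost/benefit point, here is a minimal sketch in Python. The 20:1, 2:1, and "charge you for 10" figures are the illustrative ones from the paragraph above, not real vendor pricing, and the function name and the price of 1.0 per charged TB are placeholders I made up so the two cases can be compared.

```python
def cost_per_usable_tb(physical_tb, dedupe_ratio, charged_tb, price_per_charged_tb=1.0):
    """Price paid divided by the logical capacity the box appears to offer.

    price_per_charged_tb is a placeholder unit price (an assumption),
    used only to compare the two scenarios against each other.
    """
    usable_tb = physical_tb * dedupe_ratio
    return (charged_tb * price_per_charged_tb) / usable_tb


# Backup/archive data: 1 TB of disk looks like 20 TB, priced as if it were 10 TB.
backup_cost = cost_per_usable_tb(physical_tb=1, dedupe_ratio=20, charged_tb=10)

# Regular user data: the same box only reaches about 2:1, at the same price.
user_data_cost = cost_per_usable_tb(physical_tb=1, dedupe_ratio=2, charged_tb=10)

print(backup_cost, user_data_cost, user_data_cost / backup_cost)
```

With these made-up numbers the user-data case pays ten times as much per usable TB as the backup case, which is the whole problem in one line.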
Now, on to NetApp & Storewize. NetApp uses their A-SIS technology to find duplicate blocks in data stored on NetApp filers (I believe on NFS, CIFS & SAN). While there's a significant performance hit while the post-process dedupe session is running (which they ask you to run after hours), there is minimal impact on user performance when that process isn't running. So as long as you're not concerned about 100% performance 24x7, that would work well. Dedupe at night, use during the day. The other cool thing about it is that it's a free feature of Data ONTAP, essentially doubling the size of their filers for free, as long as you fit the right use case. (What works really well for this is VMware images.)
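For anyone wondering what "finding duplicate blocks" looks like in principle, here is a minimal sketch of fixed-size block dedupe in Python. This is NOT how A-SIS is implemented; the 4 KiB block size, the SHA-256 fingerprint, and the fake "VM images" are all assumptions chosen for illustration. It just counts how many blocks you would have to keep if identical blocks were stored once.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def dedupe_stats(images, block_size=BLOCK_SIZE):
    """Return (total_blocks, unique_blocks) across all images."""
    fingerprints = set()
    total = 0
    for image in images:
        for offset in range(0, len(image), block_size):
            block = image[offset:offset + block_size]
            fingerprints.add(hashlib.sha256(block).digest())
            total += 1
    return total, len(fingerprints)


# Fake "OS + application" base: 100 distinct blocks shared by every VM.
base = b"".join(i.to_bytes(4, "big") * (BLOCK_SIZE // 4) for i in range(1, 101))

# Ten fake VM images: the shared base plus one block unique to each VM.
images = [base + bytes([vm]) * BLOCK_SIZE for vm in range(10)]

total, unique = dedupe_stats(images)
savings = 1 - unique / total
print(total, unique, round(savings * 100, 1))
```

In this toy setup almost every block is shared, so only 110 of the 1010 blocks need to be kept. Real images drift apart as they get patched at different times, so the actual savings depend on how similar the images stay.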
Then there's Storewize. They sit in FRONT of any kind of filer and compress & uncompress the data (not dedupe it) as it's being written to or read from the filer. Whatever time they spend compressing it inline is made up by the reduced time it takes to write it to or read it from disk (since it's compressed), so they're TELLING me that you should be able to pop this in front of a filer and Voila! The user never notices. It should be cheaper than buying a dedupe box (based on how their pricing works, see above), and you should get similar data reduction rates to what you would get on primary data stored on a dedupe box.
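The general idea of compressing on the way in and uncompressing on the way out can be sketched in a few lines of Python. This is only an illustration of inline compression using the standard zlib module, not Storewize's product or algorithm; the CompressingStore class and the in-memory dictionary standing in for the filer are hypothetical.

```python
import zlib


class CompressingStore:
    """Toy stand-in for a filer with an inline compression layer in front."""

    def __init__(self):
        self._blobs = {}  # file name -> compressed bytes (the "disk")

    def write(self, name, data):
        # Compress inline, so fewer bytes reach the disk.
        self._blobs[name] = zlib.compress(data)

    def read(self, name):
        # Uncompress inline, so the reader sees the original bytes.
        return zlib.decompress(self._blobs[name])

    def stored_bytes(self):
        return sum(len(blob) for blob in self._blobs.values())


store = CompressingStore()
document = b"Quarterly report: nothing much changed this quarter.\n" * 200
store.write("report.txt", document)

print(len(document), store.stored_bytes())
```

The reader gets back exactly what was written, while the bytes kept on "disk" are fewer. How much fewer depends entirely on the data; this repetitive sample compresses far better than typical user files would.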
I haven't tested either of these, but this is what I've seen that answers your question.
Can CIFS / NFS users access data directly from a Data Domain box?
Thanks and Regards,
Joe