<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title>Performance Comparison of Deduped Disk Vendors</title>
		<description>Discuss Performance Comparison of Deduped Disk Vendors</description>
		<link>http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/229-dedupe-perf-comparison.html</link>
		<lastBuildDate>Fri, 10 Feb 2012 11:24:09 +0000</lastBuildDate>
		<generator>JComments</generator>
		<atom:link href="http://www.backupcentral.com/component/jcomments/feed/com_content/229/10.html" rel="self" type="application/rss+xml" />
		<item>
			<title>Matthew O'Keefe says:</title>
			<link>http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/229-dedupe-perf-comparison.html#comment-713</link>
			<description><![CDATA[Curtis, often dedupe performance on the first write of new data is lower than on later rewrites of (substantially the same) data, so are the performance numbers you've discussed mostly for first writes or for later writes? It would seem that the most common case is rewriting mostly the same data, so perhaps there is a reason to focus on quoting that number. I'd appreciate your viewpoint on this issue. Thanks, Matt]]></description>
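			<!--
			Matt's point about first writes can be made concrete with a back-of-the-envelope model. This is a minimal sketch under a stated assumption: a hypothetical serial model of inline target dedupe in which every incoming byte is fingerprinted but only new chunks are written to disk. The rates are illustrative, not measurements of any product, and real appliances overlap these stages.

```python
# Hypothetical serial model of inline (target) dedupe ingest:
# every incoming byte is chunked and fingerprinted, but only chunks
# not already in the store are written to disk.

def ingest_seconds(total_gb, unique_fraction, hash_gb_per_s, write_gb_per_s):
    hash_time = total_gb / hash_gb_per_s  # fingerprint every byte
    write_time = total_gb * unique_fraction / write_gb_per_s  # write new chunks only
    return hash_time + write_time


def effective_throughput(total_gb, unique_fraction, hash_gb_per_s, write_gb_per_s):
    seconds = ingest_seconds(total_gb, unique_fraction, hash_gb_per_s, write_gb_per_s)
    return total_gb / seconds


# Illustrative rates only; not measurements of any vendor's product.
first = effective_throughput(1000, 1.0, hash_gb_per_s=2.0, write_gb_per_s=1.0)
rewrite = effective_throughput(1000, 0.05, hash_gb_per_s=2.0, write_gb_per_s=1.0)
print(f"first write: {first:.2f} GB/s, rewrite: {rewrite:.2f} GB/s")
# first write: 0.67 GB/s, rewrite: 1.82 GB/s
```

Under these assumptions the same appliance reports very different numbers depending on how much of the incoming data is new, which is why it matters which case a vendor quotes.
			-->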
			<dc:creator>Matthew O'Keefe</dc:creator>
			<pubDate>Wed, 23 Sep 2009 13:20:09 +0000</pubDate>
			<guid>http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/229-dedupe-perf-comparison.html#comment-713</guid>
		</item>
		<item>
			<title>Didn't have the numbers</title>
			<link>http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/229-dedupe-perf-comparison.html#comment-650</link>
			<description><![CDATA[That's pretty much it!]]></description>
			<dc:creator>W. Curtis Preston</dc:creator>
			<pubDate>Mon, 13 Jul 2009 21:51:46 +0000</pubDate>
			<guid>http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/229-dedupe-perf-comparison.html#comment-650</guid>
		</item>
		<item>
			<title>What about NEC?</title>
			<link>http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/229-dedupe-perf-comparison.html#comment-649</link>
			<description><![CDATA[I noticed that you mentioned in your comments that NEC has global dedup, but they are not in your list. Is there a reason for this? Or do you just not have this information for them?]]></description>
			<dc:creator>jonronix</dc:creator>
			<pubDate>Mon, 13 Jul 2009 16:43:23 +0000</pubDate>
			<guid>http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/229-dedupe-perf-comparison.html#comment-649</guid>
		</item>
		<item>
			<title>Welcome to my world!</title>
			<link>http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/229-dedupe-perf-comparison.html#comment-607</link>
			<description><![CDATA[Thanks for joining the discussion and for your polite demeanor, even though it's obvious you really didn't like one part of the post. ;-) I realize I didn't specify that I'm talking specifically about target dedupe, but I am. Perhaps I'll update the post to say just that, and insert the word "target" in a number of places. I am not an analyst. I am not a paid blogger. Data Domain has not paid me a dime to do anything. None of the vendors mentioned above are my clients. In fact, I'm as much of an annoyance to Data Domain as I am to EMC and others. (They'd really rather I stop pointing out that they don't have global dedupe.) If I say something, it's because I believe it to be fact or, at a minimum, I believe it as my own opinion. In fact, if you had continued reading the paragraph containing the sentence to which you objected, you'd see that I gave Data Domain more crap than praise. I basically said, "Yeah, they own the market, BUT they still don't have global dedupe." (I point this out in advance because SOME would argue that I must be wrong on global dedupe because the market leader doesn't have it. I want you to see that I know who they are in the marketplace, but I also want you to see that they don't have global dedupe.) Now, as to the "owning the market" comment, I should have put the word "target" in there, so I will (and have edited the original comment to reflect that): "Let's talk about Data Domain, as they currently own the target dedupe market hands down." They've got around 3000 customers and many more shipped systems than any vendor of which I'm aware, and that number goes up every day. The mindshare they have with end users is also unparalleled. When I talk to customers about target dedupe, they automatically start talking about Data Domain, as if the two were synonymous. 
If EMC works hard enough and long enough, and continues the practices to which I alluded in my other post (http://www.backupcentral.com/content/view/234/47/), they might indeed change this, but I certainly feel that the statement holds true today. If they don't own the target dedupe market, I don't know who does.]]></description>
			<dc:creator>W. Curtis Preston</dc:creator>
			<pubDate>Thu, 07 May 2009 21:39:55 +0000</pubDate>
			<guid>http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/229-dedupe-perf-comparison.html#comment-607</guid>
		</item>
		<item>
			<title>SNIA Comments</title>
			<link>http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/229-dedupe-perf-comparison.html#comment-606</link>
			<description><![CDATA[Hi Preston, I'm the Co-Chair of the SNIA DMF Data Deduplication and Space Reduction Special Interest Group (DDSR SIG). I currently work at EMC and have previously worked at IBM, HDS, VERITAS, and Troika Networks (acquired by QLogic). Some comments on a few of your statements: "Global dedupe only comes into play with multi-node systems." After a year of vigorous debate by DDSR SIG members, the industry consensus on what global data deduplication means is captured by this definition: Data deduplication which stores only unique data across multiple deduplication systems. For example, global data deduplication stores only unique data across multiple target appliances or sends and stores only unique data from multiple source clients. At first glance this agrees with your initial comment, but it does not square with your later comment: "NetApp, Quantum, EMC & Dell, have only local dedupe." Are you restricting your comments to target data deduplication only? EMC has both source and target implementations. Your statement is at odds with the facts regarding EMC Avamar (just one example), which supports global data deduplication. "Let's talk about Data Domain, as they currently own the dedupe market hands down." I've been in the storage business for over 30 years in management, engineering, product management, marketing, field support, and consulting roles. I get that you need to evangelize the desires of your clients to make money. However, statements like the above do a disservice to all of our customers. Try to stick with the facts.]]></description>
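			<!--
			The difference between local and global dedupe in the definition above can be sketched as follows, assuming two hypothetical target nodes that fingerprint whole chunks with SHA-256. Real products use their own chunking and index designs; only the shared versus per-node index is the point of this sketch.

```python
import hashlib


def fingerprint(chunk: bytes) -> str:
    """Content fingerprint used to detect duplicate chunks."""
    return hashlib.sha256(chunk).hexdigest()


def chunks_stored(node_streams, global_dedupe):
    """Number of unique chunks physically stored across all nodes."""
    if global_dedupe:
        # Global: one fingerprint index shared by every node.
        shared = set()
        for stream in node_streams:
            shared.update(fingerprint(c) for c in stream)
        return len(shared)
    # Local: each node compares only against its own index.
    return sum(len({fingerprint(c) for c in stream}) for stream in node_streams)


# Hypothetical chunks: both servers share an OS image and application binaries.
nodes = [
    [b"os-image", b"app-binaries", b"db-node1"],
    [b"os-image", b"app-binaries", b"db-node2"],
]
print(chunks_stored(nodes, global_dedupe=False))  # 6
print(chunks_stored(nodes, global_dedupe=True))   # 4
```

With per-node indexes the shared chunks are stored once per node; with a shared index they are stored once in total.
			-->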
			<dc:creator>Mike Dutch</dc:creator>
			<pubDate>Thu, 07 May 2009 17:37:40 +0000</pubDate>
			<guid>http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/229-dedupe-perf-comparison.html#comment-606</guid>
		</item>
		<item>
			<title>I didn't say I put any trust in them</title>
			<link>http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/229-dedupe-perf-comparison.html#comment-513</link>
			<description><![CDATA[I specifically said I didn't verify any of the numbers, that I was just compiling all of the numbers that each company published. I actually spoke directly to FalconStor regarding the difference between the PDF file you referenced and this page http://www.falconstor.com/en/pages/?pn=VTLFeatures, and they said that the latter was more up to date: they had just qualified 8 nodes in their cluster and had not updated the PDF version yet. (Hey, FalconStor! Update your stinking PDF already!) As to SEPATON's "exaggerated" claims of a 500:1 dedupe ratio, consider this. When they were using those numbers, they were talking about "backup-over-backup" dedupe, meaning last night's backup got reduced by 500:1. While the numbers they were giving were valid (when looking at them that way), I and others counseled them that it made them look silly, as no one cared about how last night's backup got deduped. What we care about is how much ALL my backups were getting reduced. As a result, they changed their messaging a while ago; they don't claim those numbers any more. Look all over their site, and the most you'll find is 50:1, and it will have caveats saying that this is most likely to happen in an Exchange-centric environment. (Try a Google search for "site:www.sepaton.com 50:1" or "site:www.sepaton.com 40:1" and you will find hits. What you won't find is "site:www.sepaton.com 500:1".) So I really wouldn't say that they are more likely to exaggerate than anyone else. I actually think all of you are exaggerating. But since I can't verify (without independent testing) how MUCH each of you is exaggerating, I'm just publishing advertised numbers. I completely agree with you on the need for an independent test. It will be the subject of a later blog post.]]></description>
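			<!--
			The gap between a backup-over-backup ratio and the ratio across all retained backups is simple arithmetic. A minimal sketch, assuming a hypothetical model in which the first full backup is stored unreduced and each later full of the same size adds only full_size / nightly_ratio of new data:

```python
def cumulative_ratio(full_size, nightly_ratio, backups):
    """Overall reduction across every retained full backup.

    Hypothetical model: the first full is stored unreduced, and each later
    full of the same size adds only full_size / nightly_ratio of new data.
    """
    logical = full_size * backups  # what the backup software thinks it wrote
    physical = full_size + (backups - 1) * full_size / nightly_ratio  # what is on disk
    return logical / physical


for nightly in (50, 500):
    print(nightly, round(cumulative_ratio(1.0, nightly, backups=30), 1))
# 50 19.0
# 500 28.4
```

In this model, even an unlimited backup-over-backup ratio cannot push the overall figure past the number of retained backups (30 here).
			-->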
			<dc:creator>W. Curtis Preston</dc:creator>
			<pubDate>Fri, 24 Apr 2009 01:03:35 +0000</pubDate>
			<guid>http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/229-dedupe-perf-comparison.html#comment-513</guid>
		</item>
		<item>
			<title>Storage Consultant</title>
			<link>http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/229-dedupe-perf-comparison.html#comment-589</link>
			<description><![CDATA[Hi Curtis, Thanks for collecting this info. Valuable and arguable, the best combo for any blog posting. Below I'll delve into IBM's TSM storage pool features and how I would see a best fit between backup, restore, storage capacity and performance. IMO, you should dedup what is best suited for dedup. When backing up fileserver data, TSM only backs up new and changed files, which they call forever incremental. Chances are you'd see low dedup ratios. So, instead of using expensive VTL capacity, expensive in both cost and performance, you could best store the fileserver data in a file-based storage pool. You could call this a software VTL without any dedup or compression. Creating such a pool of cheap 1/1.5 TB SATA drives may give you both the backup and restore performance you would need for fileserver data. Database backups typically are full backups each night, so dedup could work out very nicely, both in terms of storage capacity and restore performance. Backup performance is somewhat trickier. Using multiple streams, each to a separate LTO4 drive, would easily outperform any dedup solution. As that is only best practice for really big databases, I'll leave it out of consideration for now. The one really nice feature Diligent offers (oops, nowadays IBM!) is that it can take a LUN from almost any storage system. I myself would be very curious how Compellent would work out as disk storage for a Diligent VTL. Compellent is rather cheap, writes incoming data blocks to Tier 0 (SSD) or Tier 1 (FC disk), and is able to migrate all new data blocks overnight to SATA disks. In short, it uses its SSD/FC storage as a cache for lower-cost RAID5 SATA storage layers, and this could be very effective for both backup and restore performance. If this Diligent/Compellent combo really sings, it could be a nice solution for any backup server (NetBackup, CommVault, TSM, etc.). Sadly, to date I have not had the opportunity to test such a solution.]]></description>
			<dc:creator>Alex Sons</dc:creator>
			<pubDate>Wed, 22 Apr 2009 06:06:20 +0000</pubDate>
			<guid>http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/229-dedupe-perf-comparison.html#comment-589</guid>
		</item>
		<item>
			<title>udubplate says:</title>
			<link>http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/229-dedupe-perf-comparison.html#comment-571</link>
			<description><![CDATA[Jeremy, as mentioned above, each solution is affected to a varying degree depending on how it is designed as well as how it's used. Reversing the curve may be good enough for some, but that depends on your requirements, where 99% of your restores come from (i.e., the last backup or not), how much data is being stored on the device (there's a big difference between 1 week and 1 year of retention, for example), and various other factors. As should always be the case, everyone should test the solutions and make sure they test restore performance at the retention period they actually plan to use (i.e., don't simply test restore speeds for a week's worth of backups if you're going to retain a year's worth on the device, as the effect you mention may vary widely over time depending on the solution).]]></description>
			<dc:creator>udubplate</dc:creator>
			<pubDate>Sat, 04 Apr 2009 23:28:31 +0000</pubDate>
			<guid>http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/229-dedupe-perf-comparison.html#comment-571</guid>
		</item>
		<item>
			<title>Forward referencing and fragmentation</title>
			<link>http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/229-dedupe-perf-comparison.html#comment-569</link>
			<description><![CDATA[udubplate, giving preference to the most recent copy simply reverses the performance/age curve. A number of systems use a portion of the disk space as a first-level cache to keep the most recent copy fast. In those cases the performance curve is U-shaped, which can completely hide any advantage of "forward referencing" the data for many workloads. Fundamentally, "forward referencing" doesn't solve the problem of having to seek all over the disk to build a volume stream. In fact, the problem of defragmenting reclaimed space becomes harder and more important. For large systems it's possible that the defragmentation/reclamation process becomes the system bottleneck. If you fail to adequately defragment the space reclaimed from existing volumes, then the space to store new volumes ends up scattered in non-optimal ways across the array as time progresses. If you cannot find large contiguous regions to store the new data, then you end up seeking. If the user is backing up extremely well-behaved data, where references between data streams are close in time and those references are fairly large, the problem won't initially be as noticeable. In that circumstance it's probably even possible to have datasets that don't fragment: the space is reused before the system reaches a capacity where another stream is interleaved into the space of a stream still stored on the machine. As the machine fills up, this benign behavior is going to diminish. It's also going to diminish if the volumes expire and are reused at different rates. The problem will probably be pushed off to the point where the sales guys are long gone. I'm not sure I would want to be the guy left standing there waiting for the system to rebuild an "archive" tape, or wondering why the dedupe process no longer completes in its window.]]></description>
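			<!--
			The point about reclaimed space can be illustrated with a toy model (hypothetical, not any vendor's allocator): the disk is a list of slots, two streams are written either interleaved or one after the other, and one stream then expires without any defragmentation.

```python
# Toy model of reclaimed-space fragmentation: a disk is a list of slots,
# each holding the name of the stream that wrote it, or None if free.


def expire(slots, stream):
    """Free every slot owned by an expired stream (no defragmentation)."""
    return [None if s == stream else s for s in slots]


def free_extents(slots):
    """Lengths of contiguous free runs; in this model each run needs its own seek."""
    runs, current = [], 0
    for s in slots:
        if s is None:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


interleaved = ["A", "B"] * 6          # two streams ingested concurrently
sequential = ["A"] * 6 + ["B"] * 6    # same data, written one stream at a time
print(free_extents(expire(interleaved, "A")))  # [1, 1, 1, 1, 1, 1]
print(free_extents(expire(sequential, "A")))   # [6]
```

Both layouts free six slots, but in the interleaved case a new six-chunk volume would have to be written into six separate extents.
			-->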
			<dc:creator>Jeremy</dc:creator>
			<pubDate>Thu, 02 Apr 2009 17:26:17 +0000</pubDate>
			<guid>http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/229-dedupe-perf-comparison.html#comment-569</guid>
		</item>
		<item>
			<title>udubplate says:</title>
			<link>http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/229-dedupe-perf-comparison.html#comment-561</link>
			<description><![CDATA[Jeremy, as Curtis identified, what you mention has a varying degree of effect depending on how the solution is designed. One factor is the method of referencing used in the vendor's deduplication algorithms. For those that use forward referencing (it seems a minority use it; SEPATON is one that does, and I believe there are others), the most recent data is kept in its undeduplicated format, as opposed to reverse referencing, where the reverse is often true and the most recent backup is the "most deduplicated" for lack of a better term. Forward referencing creates some unique challenges from a design perspective (especially when you begin to talk about deduplicated replication), but the idea is that most restores are done from the most recent backup, so that's the one you want to be in an undeduplicated format, where the effect you mention does not exist.]]></description>
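			<!--
			A toy model of forward versus reverse referencing, assuming each backup is a list of chunk IDs and each unique chunk is stored once inside the region of the backup generation that owns it. The seek count is simply the number of jumps between regions while reading one backup in order; the model is hypothetical and ignores caching and space reclamation.

```python
# Toy model: each backup is a list of chunk IDs, and each unique chunk is
# stored once, inside the region of the backup generation that owns it.


def chunk_homes(backups, forward):
    """Map chunk ID to the generation whose region holds its physical copy.

    Reverse referencing (forward=False): a chunk stays where it was first written.
    Forward referencing (forward=True): the newest backup owns the copy and
    older backups point forward to it.
    """
    home = {}
    for gen, chunks in enumerate(backups):
        for c in chunks:
            if forward or c not in home:
                home[c] = gen
    return home


def seeks_to_restore(backups, target, forward):
    """Count jumps between regions while reading one backup in stream order."""
    home = chunk_homes(backups, forward)
    locations = [home[c] for c in backups[target]]
    return sum(1 for a, b in zip(locations, locations[1:]) if a != b)


# One chunk changes per night.
backups = [
    ["A", "B", "C", "D"],
    ["A", "B", "C", "E"],
    ["A", "F", "C", "E"],
    ["G", "F", "C", "E"],
]
for forward in (False, True):
    print("forward" if forward else "reverse",
          seeks_to_restore(backups, 3, forward),   # newest backup
          seeks_to_restore(backups, 0, forward))   # oldest backup
# reverse 3 0
# forward 0 3
```

The two schemes swap which end of the retention window pays the seek cost: the newest backup reads without jumps under forward referencing, while the oldest one pays instead.
			-->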
			<dc:creator>udubplate</dc:creator>
			<pubDate>Wed, 01 Apr 2009 01:01:43 +0000</pubDate>
			<guid>http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/229-dedupe-perf-comparison.html#comment-561</guid>
		</item>
	</channel>
</rss>

