Written by W. Curtis Preston
Friday, 07 May 2010 15:08
I'm working on an update to my Mar 09 story on dedupe performance. While the debate over global versus non-global dedupe continues, the hardest problem I'm having is that post-process vendors don't publish their dedupe rates.
Inline vendors are a piece of cake in comparison. Just look at their data sheets; right up front they tell you how fast the box is. The post-processing vendors, on the other hand, hide behind their ingest numbers alone. Why?
Look at the data sheets from ExaGrid and SEPATON. Every single one of them advertises only an ingest rate. And every single one of them has a dedupe rate that is at best half its ingest rate, but they don't advertise it. Why is that? Are they trying to hide something?
I believe strongly in truth in advertising. I think it's bogus that Data Domain advertises a DDX "array" as one system with one throughput number when everyone knows it's 16 completely separate systems with no dedupe knowledge of each other. They might as well advertise a "DDY battery" that's made up of 1000 DD 880s and say they have a system with the throughput of 5.4 PB/hr! It would be just as truthful as the DDX array.
And I think it's wrong that post-processing vendors don't advertise their dedupe rates, because those rates are essential for comparing and architecting systems. If you can only dedupe 500 MB/s, it doesn't matter that you can ingest 10,000 MB/s. You can only dedupe about 43 TB/day, so you can effectively only ingest 43 TB/day. Maybe you can ingest data at 10,000 MB/s, but you can only do it for about 1.2 hours (4,320 seconds) before you've ingested more data than you can dedupe in a day.
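For anyone sizing one of these boxes, here's a quick back-of-the-envelope sketch of that math in Python. It assumes decimal units (1 TB = 1,000,000 MB), a dedupe engine that can run around the clock with nothing else competing for it, and the 500 MB/s dedupe and 10,000 MB/s ingest figures used above. The function names are hypothetical, made up for illustration; they aren't from any vendor's sizing tool.

```python
# Hypothetical sizing helpers for a post-process dedupe appliance.
# Assumption: decimal units (1 TB = 1,000,000 MB) and a dedupe engine
# that runs 24 hours a day without contending with ingest.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
MB_PER_TB = 1_000_000


def daily_dedupe_capacity_tb(dedupe_mb_s: float) -> float:
    """TB per day the box can dedupe if the dedupe engine never stops."""
    return dedupe_mb_s * SECONDS_PER_DAY / MB_PER_TB


def ingest_window_hours(ingest_mb_s: float, dedupe_mb_s: float) -> float:
    """Hours of full-speed ingest before one day's dedupe capacity is used up."""
    capacity_mb = dedupe_mb_s * SECONDS_PER_DAY
    return capacity_mb / ingest_mb_s / 3600


if __name__ == "__main__":
    dedupe, ingest = 500, 10_000  # MB/s
    print(f"Dedupe capacity: {daily_dedupe_capacity_tb(dedupe):.1f} TB/day")
    print(f"Full-speed ingest window: {ingest_window_hours(ingest, dedupe):.1f} hours")
```

The window simplifies to 24 × (dedupe rate ÷ ingest rate) hours, so a box that dedupes at a twentieth of its ingest speed can only run flat out for 1/20 of a day before it falls behind. That ratio is exactly the number the data sheets leave out.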
These numbers matter. So why don't they publish them?