<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title>The real deal on the EMC 3D 4000</title>
		<description>Discuss The real deal on the EMC 3D 4000</description>
		<link>http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/230-the-real-deal-on-the-emc-3d-4000.html</link>
		<lastBuildDate>Fri, 10 Feb 2012 11:06:19 +0000</lastBuildDate>
		<generator>JComments</generator>
		<atom:link href="http://www.backupcentral.com/component/jcomments/feed/com_content/230/10.html" rel="self" type="application/rss+xml" />
		<item>
			<title>Unmatched Performance &quot;potential?&quot;</title>
			<link>http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/230-the-real-deal-on-the-emc-3d-4000.html#comment-548</link>
			<description><![CDATA[Perhaps the linguistic disconnect for EMC is the age-old storage BANDWIDTH vs THROUGHPUT argument. They seem to see their solution as unmatched... I'm not sure it's a good idea, nor would it be ideal, to spend capital today competing with a similar architecture. A blade-based proxy engine similar to Oracle RMAN is a more "viable" and ASIC-optimized option: less bundled hardware, and more flexibility for hashing/shared cache and future needs. Lastly: look at the birth of a technology and then match its evolution. Example: VMware started as a virtualization engine for 2-4 CPUs max; it was never designed for 64-processor consolidation. It was meant to consolidate applications from multiple bare-metal servers onto one "multi-app" server to "save money" and simplify admin. Check the facts; it's true. For 3 years it has been challenged to master replication/DR and scalability, because it wasn't designed to go there. Its blessings all rest with its API and virtual appliance community; it's still NO MVS, and scalability is still lacking. Case in point: storage arrays that required TWO boxes will forever be two boxes. Data Domain wasn't packaged as a software-only solution. However, decisions change when EMC can increase a partner's margin... perhaps this was more decisive than "technology" or product agility. IMHO, it's just another shoe shine with more smoke and mirrors. Humans may need to re-familiarize themselves with the Delete key and "self-control." That's my CLOUD computing deduplication strategy for 2010 and beyond. SolutionsARchitect.com -TAJ]]></description>
			<dc:creator>Todd A Johnston</dc:creator>
			<pubDate>Tue, 24 Mar 2009 12:40:52 +0000</pubDate>
			<guid>http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/230-the-real-deal-on-the-emc-3d-4000.html#comment-548</guid>
		</item>
		<item>
			<title>No title</title>
			<link>http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/230-the-real-deal-on-the-emc-3d-4000.html#comment-524</link>
			<description><![CDATA[What part of "there's only one copy of data in the system" is unclear? There is no cache. There is a native pool and a block pool. There are no cached copies of data. It's either in native format because you've chosen not to de-dup it, or it's de-dup'd in the block pool. Here's what you said about replication: "It also matters because if you plan to replicate, you can't replicate it until it's deduped." Correct, but the point I was trying to make was that continuous replication replicates data from the first de-duped block onwards. It does not wait until the end of a datastream to replicate data. "The reason I don't allow you to combine your numbers is that you have two completely separate Quantum boxes behind your two separate Falconstor boxes. Each Quantum box has its own hash table and they do NOT compare data against each other. It is NOT a single block pool; Quantum's software simply doesn't support that and I don't know how you could say that. They are two islands of dedupe and are therefore not much more than two separate appliances in the same box." But the de-dup option is one box. Not two or ten or some other number: one unit. Regardless of whether it's a single-node 4100, dual-node 4200 or quad-node 4400, it's one box with one block store accessible to all engines. Why does everyone else with clustered nodes and a single block store get to post their max number but EMC doesn't? *All engines. One hash table.* And when this is pointed out, you still won't make the correction. You've now been tagged as "Anybody But EMC" and as such I wouldn't wait by the phone if I were you.]]></description>
			<dc:creator>Mark Twomey</dc:creator>
			<pubDate>Tue, 24 Mar 2009 04:20:15 +0000</pubDate>
			<guid>http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/230-the-real-deal-on-the-emc-3d-4000.html#comment-524</guid>
		</item>
		<item>
			<title>re: No title</title>
			<link>http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/230-the-real-deal-on-the-emc-3d-4000.html#comment-547</link>
			<description><![CDATA[Not sure what happened to Mark's comment that I was responding to in the comment that starts "I've read and re-read...". I'm quoting them here so they don't disappear. ]]></description>
			<dc:creator>W. Curtis Preston</dc:creator>
			<pubDate>Tue, 24 Mar 2009 04:16:25 +0000</pubDate>
			<guid>http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/230-the-real-deal-on-the-emc-3d-4000.html#comment-547</guid>
		</item>
		<item>
			<title>I decided I needed another post</title>
			<link>http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/230-the-real-deal-on-the-emc-3d-4000.html#comment-542</link>
			<description><![CDATA[I thought my response to Mark Twomey's comments was important enough that I should write it up in a separate post: http://www.backupcentral.com/content/view/232/47/]]></description>
			<dc:creator>W. Curtis Preston</dc:creator>
			<pubDate>Fri, 20 Mar 2009 18:21:15 +0000</pubDate>
			<guid>http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/230-the-real-deal-on-the-emc-3d-4000.html#comment-542</guid>
		</item>
		<item>
			<title>The REAL real deal</title>
			<link>http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/230-the-real-deal-on-the-emc-3d-4000.html#comment-541</link>
			<description><![CDATA[The REAL real deal on the 3D 4000 is that people who are buying it are returning it. I was a potential customer of both Data Domain and EMC, and asked for references from both. Data Domain gave me several that checked out just fine. EMC gave me four, so I called them. Out of the four references THAT EMC GAVE ME, THREE of them had returned the 3D 4000 and bought Data Domain appliances.]]></description>
			<dc:creator>NetBackup User</dc:creator>
			<pubDate>Fri, 20 Mar 2009 18:15:38 +0000</pubDate>
			<guid>http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/230-the-real-deal-on-the-emc-3d-4000.html#comment-541</guid>
		</item>
		<item>
			<title>W. Curtis Preston says:</title>
			<link>http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/230-the-real-deal-on-the-emc-3d-4000.html#comment-528</link>
			<description><![CDATA[I've read and re-read your comments trying to understand why we're so far apart. Now I think I get it. If you'll verify that my new understanding of what you're saying is correct, I'll be happy to correct the post to reflect that. First, let's talk about the native pool/block pool thing. When YOU say native pool and block pool, I now understand (hopefully correctly) that you mean the disk attached to the DL 4000 (AKA the Falconstor node) and the disk attached to the 3D 4000 (AKA the Quantum node). Now I understand AND AGREE that data will be in one or the other, not both. So that you understand what _I_ was saying: I wasn't talking about those pools. I was talking about the Quantum box's ability to store native copies of the data as a cache for restores. They're stored there until the box hits 75% full, at which point it starts "truncating" the native copies ON THE QUANTUM BOX. These are the copies I mean when _I_ say "native copies" or "cache copies." Does that make sense now? As to your comments about all engines using one hash table, I now understand that it is a valid configuration (in your eyes) to have a single 3D engine plugged into the back ends of two DL engines. Scott Waterhouse said in his blog that "you can have one deduplication engine per DL Engine. (A DL4106 has one, a DL4206 or DL4406 has two DL Engines.)" I assumed that meant that you would ALWAYS have two engines. What I'm hearing YOU say is that you can have one engine. So... if you have one engine (with an ingest/dedupe rate of 1.5 TB/hr, according to Scott), then I'll agree: you can combine the ingest speeds of the two front-end systems to get an ingest speed of 8 TB/hr (2200 MB/s) and a dedupe speed of 1.5 TB/hr (about 400 MB/s). I'm not really sure that's any better, though. If your ingest speed is more than 5 times your dedupe speed, you'll only be able to USE the front end for about 4 hours a day and still dedupe everything within 24 hours. Yuck. 
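A rough way to check the ingest vs. dedupe arithmetic above is sketched below in Python. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), a single post-process dedupe engine, and that each day's backups must be deduped within 24 hours. The function names are hypothetical, and the 8 TB/hr and 1.5 TB/hr inputs are simply the figures quoted in this comment, not measurements.

```python
# Back-of-the-envelope check of the ingest vs. dedupe figures quoted above.
# Assumptions: decimal units, one post-process dedupe engine, and a 24-hour
# window in which everything ingested must also be deduped.

BYTES_PER_TB = 10**12
BYTES_PER_MB = 10**6
SECONDS_PER_HOUR = 3600


def tb_per_hour_to_mb_per_s(tb_per_hour: float) -> float:
    """Convert a TB/hr rate to MB/s (decimal units)."""
    return tb_per_hour * BYTES_PER_TB / BYTES_PER_MB / SECONDS_PER_HOUR


def usable_ingest_hours(ingest_tb_hr: float, dedupe_tb_hr: float,
                        window_hours: float = 24.0) -> float:
    """Hours per day the front end can ingest at full speed if all of it
    must also be deduped within `window_hours`."""
    dedupe_capacity_tb = dedupe_tb_hr * window_hours  # TB deduped per window
    return min(window_hours, dedupe_capacity_tb / ingest_tb_hr)


if __name__ == "__main__":
    ingest, dedupe = 8.0, 1.5  # TB/hr, as quoted in the comment
    print(f"ingest ~ {tb_per_hour_to_mb_per_s(ingest):.0f} MB/s")
    print(f"dedupe ~ {tb_per_hour_to_mb_per_s(dedupe):.0f} MB/s")
    print(f"ratio  ~ {ingest / dedupe:.1f}x")
    print(f"usable ~ {usable_ingest_hours(ingest, dedupe):.1f} h/day")
```

Run as a script, it reports about 2222 MB/s of ingest, 417 MB/s of dedupe, a ratio of about 5.3x, and about 4.5 hours per day of full-speed ingest, in line with the roughly 4-hour figure above.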
As to your comment on replication, I know that replication starts when dedupe starts. But what I'm saying is that replication can't FINISH until dedupe FINISHES. That means that if your dedupe speed is so slow that it takes 24 hours to dedupe a night's backups, your "tapes" will be sent offsite MUCH later than they would have been with the old tape-and-truck method. HOWEVER, if your dedupe speed matches your incoming throughput (the way it can with inline systems and global-dedupe post-processing systems), then you can have an experience that more closely resembles the old days: backups finish at 8 am and are all offsite by 9-10 am. I never said "anything but EMC." I _am_ saying "anything but the 3D 4000." That means I'm not talking about the DL 4000 (assuming you don't want dedupe), the 3D 1500 or 3000, or the myriad other products you offer. I just think that this particular product makes no sense, and I'm standing by that until someone changes my mind. If you're correct and EMC wants to ignore me and let me stay "confused," then that's their choice. I actually think they're bigger than that. We shall see.]]></description>
			<dc:creator>W. Curtis Preston</dc:creator>
			<pubDate>Fri, 13 Mar 2009 01:58:16 +0000</pubDate>
			<guid>http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/230-the-real-deal-on-the-emc-3d-4000.html#comment-528</guid>
		</item>
		<item>
			<title>I also have been involved for a while</title>
			<link>http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/230-the-real-deal-on-the-emc-3d-4000.html#comment-523</link>
			<description><![CDATA[I have had many, MANY conversations via official and unofficial channels during which I have been told that the path from the original 4000 (i.e. the Falconstor box) to the 3D 4000 (i.e. the Quantum box) is via the DL's "tape out" interface. Yes, this movement is "hidden" from the end user, and yes, the user only sees one system, but that movement still happens. EMC has built a product in which two competitors' products (Falconstor and Quantum) share data. How did you do that? Did you (a) have backups go to the Falconstor box, then somehow teach Quantum how to read the Falconstor tape format, (b) teach the Falconstor box to write backups directly to the Quantum box without stopping on the Falconstor side, or (c) send backups to the Falconstor box, then use the existing "tape out" interface to copy backups from the Falconstor box to the Quantum box? (a) just can't be the case. No way. (b) is unlikely, and does not handle tapes in an existing DL 4000, which is what the 3D 4000 was designed for. (c) is therefore the most likely case because it doesn't require major recoding of either box; it just needs a little glue to make it work. It also deals with existing virtual tapes in an existing DL 4000, and (c) is what I've been consistently told by EVERY EMC PERSON since the first time I heard how the 3DL box worked. If (c) is not how it works, please explain to me what happens in the following (probably pretty common) scenario. Consider an existing DL 4000 customer who now wants to add the 3D option. (The exact customer the 3D 4000 was made for.) Once EMC installs the Quantum node and its storage, how do the tapes previously stored on the original 4000 get deduped? My answer is that they are copied from the original 4000 to the 3D 4000 via the tape out interface. Yes, hidden from the customer, but copied nonetheless. On to your other comments... 
You do realize that I've had EXTENSIVE in-depth conversations with the Quantum folks, and so I understand how the Quantum piece works VERY well, right? Their headquarters is a lot closer than Hopkinton (45 mins from my house), after all. (Quantum and several other vendors also answered every question in an RFI I did for backupcentral, whereas EMC declined to answer many questions. So I do say that if I misunderstand your box, it is EMC's fault, not mine.) What you call the "native copy" of the data, I call the cache. It is the native copy; its PURPOSE is to be a cache that enables restores many times faster than what is possible from the block pool. What about my replication comment is wrong? You're telling me a 3D 4000 can replicate a block BEFORE it's deduped? So if a backup is made at 6 AM (at the end of the backup window), and the box has so much data to dedupe that this backup doesn't get deduped until 24 hours later, you're saying that it'll get replicated before that? You're kidding, right? As for "days and days" of data... It is absolutely the way the Quantum piece is designed. It's designed to keep as many days of "native versions" as it can. The only question is how many days of this native data (what I call cache) a user is going to keep. If you keep only one day's worth of data in its native format, and you do a full restore of a large filesystem, the backups from last night will come from the native pool and the other 6 days (assuming weekly fulls) will come from the block pool. Considering the difference in restore speeds (as much as 75%), I would think you wouldn't want that to happen. Therefore, if you want decent restore speeds, you're going to want to keep the latest full and all incrementals since that full in native format. Hence the "days and days" comment. As to the data not being in two places at the same time, I'm going to say again that what you're saying just can't be correct. 
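To make the "two places at once" argument concrete, here is a toy Python model of a post-process dedupe store that keeps native copies as a restore cache. It is a hypothetical sketch, not EMC or Quantum code: the class name, the 4-byte block size, capacity counted in blocks, and the oldest-first truncation policy are all assumptions, and the 75% threshold is the figure cited elsewhere in this thread. After `dedupe()` runs, a backup exists both as its native copy and as blocks in the pool until truncation removes the native copy.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical, tiny block size so examples stay readable


class PostProcessStore:
    """Toy post-process dedupe store with a native-format restore cache.

    Hypothetical model: capacity is counted in blocks, and once usage
    reaches `truncate_at` of capacity, native copies of already-deduped
    backups are truncated oldest first. Not vendor code.
    """

    def __init__(self, capacity_blocks: int, truncate_at: float = 0.75):
        self.capacity = capacity_blocks
        self.truncate_at = truncate_at
        self.native = {}      # backup name -> raw bytes (native copy)
        self.block_pool = {}  # SHA-256 hex digest -> unique block
        self.recipes = {}     # backup name -> ordered list of digests

    def used_blocks(self) -> int:
        native = sum((len(d) + BLOCK_SIZE - 1) // BLOCK_SIZE
                     for d in self.native.values())
        return native + len(self.block_pool)

    def ingest(self, name: str, data: bytes) -> None:
        """Post-process: the backup lands in native format first."""
        self.native[name] = data

    def dedupe(self, name: str) -> None:
        """Copy unique blocks into the pool; the native copy is kept."""
        data = self.native[name]
        recipe = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            self.block_pool.setdefault(digest, block)  # store only if new
            recipe.append(digest)
        self.recipes[name] = recipe
        self._truncate_if_needed()

    def _truncate_if_needed(self) -> None:
        """Drop native copies of deduped backups, oldest first, until
        usage is back under the threshold."""
        for name in list(self.native):  # dicts keep insertion order
            if self.used_blocks() < self.truncate_at * self.capacity:
                break
            if name in self.recipes:
                del self.native[name]

    def restore(self, name: str) -> bytes:
        """Fast path from the native copy, else rehydrate from the pool."""
        if name in self.native:
            return self.native[name]
        return b"".join(self.block_pool[d] for d in self.recipes[name])


store = PostProcessStore(capacity_blocks=100)
store.ingest("mon", b"AAAABBBBCCCC")
store.dedupe("mon")
print("mon" in store.native, len(store.block_pool))  # both places: True 3
```

With plenty of capacity, a freshly deduped backup stays in both pools; on a nearly full store, its native copy is truncated and `restore()` rehydrates it from the block pool instead, which is the slower path the restore-speed argument is about.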
Once a block of data is identified as new/unique, it is copied into the block pool, but that block is still in the "native" pool. So it's in two places at once, until the copy in the native pool is truncated. The reason I don't allow you to combine your numbers is that you have two completely separate Quantum boxes behind your two separate Falconstor boxes. Each Quantum box has its own hash table and they do NOT compare data against each other. It is NOT a single block pool; Quantum's software simply doesn't support that and I don't know how you could say that. They are two islands of dedupe and are therefore not much more than two separate appliances in the same box. As to your question about how failover happens, that has nothing to do with global dedupe. It is managed via Falconstor's failover mechanism. Both Falconstor boxes can see both Quantum boxes (which really just appear to them as two tape libraries). They normally only write to the one they're controlling, but in a failover situation, they can write/read to/from either one, just as if you had connected two REAL tape libraries behind the two heads of the DL. The two Quantum boxes know as much about each other as two physical tape libraries do. Sorry, they do NOT have global dedupe. As to me not liking the media server inside a VTL, the fact that a bunch of people asked for it doesn't sway me in the least. Saying I'm out of touch is funny, though, given that I'm probably the backup industry's biggest independent proponent of moving things FORWARD when they make sense. (Consider my post on OST/NDMP, for example.) I've pushed CDP and near-CDP in the right setting, and source dedupe in the right setting, but I don't happen to like that particular option, and I have (IMHO) really good reasons for not liking it (the fact that Symantec hates it is a big part of it). I never said I've abandoned tape. Not once. I said I don't want to do tape via the Falconstor-style tape-out functionality. 
I think tape copies should be controlled via the backup software. (I am cool doing it via the tape out functionality if it's controlled by the backup software, but you can do that in the 1500 and 3000; no need to go to the 4000 to get that.) What I said about iSeries (while I've never seen one) is that if you want that functionality, see if you can get it from FalconStor directly. Test their dedupe and see if it works for you. (I'm speaking of the customer, of course, not EMC.) Again, no reason to go to the 4000 hybrid system to get that. As to the pricing question, it is NOT my understanding that the 3D 4000 is just disk split between native and block pools. That's what you'll find on the _Quantum_ side of the equation, of course. But with the 3D 4000, there is also the Falconstor side of the equation, where you'll find a bunch of disk, too: disk that's not needed in other vendors' implementations. As to "I wasn't in the meetings," all I really said was that this odd design was the only way you could fulfill your promise to bring dedupe to the 4000. I know you promised to do that. I was there at customers when you did it. As to it being the only way, well, the Falconstor route would have been the obvious one, but you chose not to take it for various reasons. The Avamar route was nowhere near fast enough. So the only choice left was to use an outside solution. And you chose the only one that wasn't being resold by a competitor. Seems obvious enough to me. As to me making any corrections, I don't see any to make. It's going to take more than a "no we don't" post to make that happen. I am, however, happy to have a phone conference (sooner rather than later) with anyone at EMC who can answer my questions in a way that backs up what you're saying. (We'll be typing until we're blue in the face if we do it here.)]]></description>
			<dc:creator>W. Curtis Preston</dc:creator>
			<pubDate>Wed, 11 Mar 2009 03:28:41 +0000</pubDate>
			<guid>http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/230-the-real-deal-on-the-emc-3d-4000.html#comment-523</guid>
		</item>
		<item>
			<title>Lots to correct.</title>
			<link>http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/230-the-real-deal-on-the-emc-3d-4000.html#comment-522</link>
			<description><![CDATA[And now you'll read my piece. I've been working with the DL series since its introduction and am the voice of authority. Without struggling with your awful text editor and quoting you, I have a number of corrections. This idea of Front End/Back End you have is wrong. As in other post-processing architectures, there is a native pool and a block pool. The block pool is what the de-dup option adds to the 4000 series. It is two units managed as one logical entity; backups designated for de-duplication are defined by a policy set by the end user. Data to be de-duplicated is reduced and written to the block pool. Data is not moved "multiple times" as you incorrectly state; it is moved once, from the native pool to the block pool. The use of multiple engines/CPUs is common in some of the other post-processing architectures you've listed. All post-processing architectures have to deal with latency and resource contention; distributing the work across more than one node improves performance. Others on that table will agree. Cached copy? There is no cached copy. The cache as I know it is in the block pool and is 256MB of storage used for immediate de-duplication. There is only one copy of data. I have no idea where this multiple copies thing came from, but it's wrong. If the backup data is in the native pool and you go to restore, it'll be read from the native pool. If it's in the block pool, it'll be read from the block pool. It'll be in one place or the other, never both at the same time. For performance, why does your table compare a single engine to multiple engines in other systems? The 4206 and 4406 both have multiple nodes and, just like the 4106, all have access to a block pool. How would failover work if all data was not available to all nodes? 
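Much of this thread's disagreement is over whether every dedupe engine consults one shared hash table ("All engines. One hash table.") or whether each engine keeps its own ("two islands of dedupe"). The Python sketch below shows why that distinction changes how much data gets stored. It is a hypothetical illustration with made-up names and one-byte "blocks," not a description of how the 3D 4000 is actually built.

```python
import hashlib


def fingerprint(block: bytes) -> str:
    """Content fingerprint used as the dedupe key (SHA-256 here)."""
    return hashlib.sha256(block).hexdigest()


class DedupeIndex:
    """A hash table mapping fingerprints to stored blocks."""

    def __init__(self):
        self.blocks = {}

    def write(self, block: bytes) -> bool:
        """Store the block if unseen; return True if it was new."""
        key = fingerprint(block)
        if key in self.blocks:
            return False
        self.blocks[key] = block
        return True


def stored_blocks(streams, shared: bool) -> int:
    """Count unique blocks kept when each engine gets one stream.

    shared=True  -> all engines use one index ("global" dedupe)
    shared=False -> each engine has its own index ("islands" of dedupe)
    """
    if shared:
        common = DedupeIndex()
        indexes = [common] * len(streams)
    else:
        indexes = [DedupeIndex() for _ in streams]
    for index, stream in zip(indexes, streams):
        for block in stream:
            index.write(block)
    unique = {id(ix): ix for ix in indexes}  # count each index only once
    return sum(len(ix.blocks) for ix in unique.values())


streams = [[b"A", b"B", b"C"], [b"A", b"B", b"D"]]
print(stored_blocks(streams, shared=True))   # 4
print(stored_blocks(streams, shared=False))  # 6
```

For two engines receiving those streams, a shared index stores 4 unique blocks, while separate per-engine indexes store 6, because neither island can see the other's copies of A and B.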
If your idea of global de-duplication is a unified block pool, then I ask that you update all the relevant entries in those tables to 2,200 MB per second and eliminate the incorrect assertion of no global de-duplication. The block pool scales to 148TB. Your replication comment and the idea of days and days are also incorrect. Policy-based de-dup means I can begin replication immediately and don't have to wait until the de-dup jobs have completed. Your opinion on consolidated media management is out of touch: not only is it the most frequently sold option with the Disk Library, it was the very first request for enhancement submitted to me by customers. Tape might be dead to you, but it isn't for a lot of backup admins. I personally worked on qualifying the iSeries here in Cork, and just because you haven't seen it doesn't mean it's not a critical component of a lot of people's infrastructure. And a lot of high-end infrastructures at that. Pricing and cost: why would a system with a drive count split between native and block pools be priced differently than any of the other post-processing solutions structured the same way? As for your assertions as to why this is this and that is that, I don't recall you at any of the meetings. Indeed, I don't think you have any relationship with EMC, do you? I'd appreciate it if you made all the relevant corrections; I realise how difficult it can be to pick out facts when you're not building these things from the screws up. Regards, Zilla. (Owner of every DL in EMEA marked engineering sample since product introduction. Setter-upper of systems from the cardboard box to production.)]]></description>
			<dc:creator>Mark Twomey</dc:creator>
			<pubDate>Tue, 10 Mar 2009 22:03:53 +0000</pubDate>
			<guid>http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/230-the-real-deal-on-the-emc-3d-4000.html#comment-522</guid>
		</item>
		<item>
			<title>Thank you</title>
			<link>http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/230-the-real-deal-on-the-emc-3d-4000.html#comment-506</link>
			<description><![CDATA[OK, I am definitely not taking that personally, and I definitely do appreciate the thought and reasoning that went into that. I have lots of thoughts, but more importantly, I have a plane to catch to a tropical destination. I will respond in a week or so when I get back. In the meantime, thanks again for the reasoned dialog. I think we benefit, and the user community benefits, from this sort of exchange of ideas--even if you and I don't necessarily come to agreement on each and every point.]]></description>
			<dc:creator>Scott Waterhouse</dc:creator>
			<pubDate>Sat, 07 Mar 2009 11:38:16 +0000</pubDate>
			<guid>http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/230-the-real-deal-on-the-emc-3d-4000.html#comment-506</guid>
		</item>
	</channel>
</rss>

