
Tapeless backup environments?

Posted by Anonymous 
Tapeless backup environments?
September 24, 2007 06:14AM
Wow, this has been a fun topic! I'm really enjoying debating it, and I enjoy even more that not everyone agrees with me.

I do not think that de-dupe is the only deciding factor in a purchase, but at this point, I do believe it is a show-stopper feature if you plan to use your IDT (intelligent disk target) as your PRIMARY storage device for backups (as opposed to only using it for staging). De-dupe makes the device cost 10 times less. That’s kind of huge. De-dupe is the only feature making disk affordable enough to be used as a replacement for tape (at least onsite).

You mention replication. I like replication, and I like the idea of replicating my backups off-site, and we've got some customers who have done it, and some more that are working on it right now. BUT I'd say that unless you're talking LAN replication, or you're talking about a fairly small amount of data, accomplishing replication without de-dupe is impossible. It's just math. A) The device that people already can't afford (a device big enough to hold all backups, not just be a staging device) now has to be doubled in size (one onsite and one offsite). B) The occasional fulls and full-file incremental backups are going to create a whole lot of data that needs to be replicated. So without de-dupe, replication becomes prohibitively expensive.

You mention performance. I’m all about performance. I do not, however, agree with your assertion that all de-dupe vendors have performance issues at a certain level. I agree that many of them do have these issues. Once you get to the performance ceiling of a given device, you have to buy another one and they don’t share de-dupe information. However, that’s not the way all of them are. If you find yourself in need of MB/s in the thousand(s) range, there is more than one vendor that can give you that within a single de-dupe setup (meaning it will all get de-duped together).

I completely agree that this is all new. So you have to deal with that. That doesn’t change the fact that de-dupe makes the replication idea much more feasible for many, many customers.

If you don’t mind paying 10 times more (at least) for the hardware, and needing 20 times more bandwidth to replicate your backups, then feel free to stick with the more established products, and I truly mean that. It’s just that most customers I’ve talked to just can’t ignore those numbers. They’re seeing de-dupe as making the not-affordable affordable and the impossible possible.

---
W. Curtis Preston
Backup Blog < at > [url=http://www.backupcentral.com]www.backupcentral.com[/url]
VP Data Protection, GlassHouse Technologies

[b]From:[/b] NICHOLAS MERIZZI [mailto]
[b]Sent:[/b] Saturday, September 22, 2007 12:06 PM
[b]To:[/b] veritas-bu-bounces < at > mailman.eng.auburn.edu; Curtis Preston; Kevin.Whittaker < at > syniverse.com; jlightner < at > water.com; veritas-bu < at > mailman.eng.auburn.edu
[b]Subject:[/b] Re: FW: [Veritas-bu] Tapeless backup environments?

[quote]
Curtis - Although I agree with the other responses you have given out with respect to the tape vs. disk cost I am not sure about your statements below.

Going back for a second to the cost of tape vs. disk... if you do an analysis, make sure to take all things into account when you back up to tape. This is why most people don't arrive at a proper cost for tape backup, e.g.:

1. SAN ports

2. Tape drives -> fixing them, lost time, shoe-shining

3. media cost -> fixing media, media failure cost (cost of not being able to do a restore)

4. off siting -> the cycles/dollars lost in handling that internally, the cost of dealing with Recall/Iron Mountain (or whoever), the cost associated with the delay in waiting for a tape to be recalled...

5. library maintenance cost

6. restore duration cost (i.e. if i have 100 people waiting for a Tier 1 server to be restored...)

Anyways the list of "invisible costs" associated with tapes go on...

As for your EMC CDL comments... First, I believe they are now called EDL (EMC Disk Libraries) because they take into account their new Symmetrix backend devices. Although I agree with you that de-dup is important to the future of backups, you make it seem that it should be the only deciding factor in a purchase! If you push de-dup aside for a second, what do most customers want? My guess is performance, availability, stability, integration with the backup application. This has been my thought process, and these de-dup companies you speak about, such as Sepaton, Diligent, Data Domain, all at one point or another have HUGE performance hits (i.e. we have tape drives that go faster than some of these), little capability to scale (without combining multiple devices together), or have unexplainable single points of failure.

I also agree that replication is important and if you can minimize the amount you replicate then great. Here is my dilemma: Most of the de-dup vendors out there (i.e. I am thinking of Sepaton) that can perform de-dup have only been in the replication business for a year (probably less) and have very little maturity in that space! That scares me a bit...

As for backup integration I personally like the fact that with EMC I can have a built in media server on top of my VTL and control everything from what I am familiar with... no other vendor offers that!

Anyways, just my two cents... Bottom line is that I agree that de-dup is important, but if you can push that aside and look at the other technical merits (assuming that all vendors will have de-dup sooner rather than later), suddenly the list of enterprise-level candidates drops significantly from what I am seeing.

-Nicholas

[b]From:[/b] veritas-bu-bounces < at > mailman.eng.auburn.edu [mailto] [b]On Behalf Of [/b]Curtis Preston
[b]Sent:[/b] Friday, September 21, 2007 1:13 PM
[b]To:[/b] Kevin Whittaker; Jeff Lightner; veritas-bu < at > mailman.eng.auburn.edu
[b]Subject:[/b] Re: [Veritas-bu] Tapeless backup environments?

The only issue there is that the EMC CDL does not support de-duplication, and it doesn't look like they'll be doing it any time soon. I know they're working on it, but they haven't announced anything publicly, so who knows. Compare that to the other de-dupe vendors, which announced probably a year before they were ready, and you've got some sense of my opinion of when EMC de-dupe will actually be GA, if not later.

Your design would work great if you had de-dupe. Without de-dupe, you are going to be replicating 20 times more data (or more), requiring a significantly larger pipe.

---

W. Curtis Preston

Backup Blog < at > [url=http://www.backupcentral.com/]www.backupcentral.com[/url]

VP Data Protection, GlassHouse Technologies

[b]From:[/b] veritas-bu-bounces < at > mailman.eng.auburn.edu [mailto] [b]On Behalf Of [/b]Kevin Whittaker
[b]Sent:[/b] Friday, September 21, 2007 7:48 AM
[b]To:[/b] Jeff Lightner; veritas-bu < at > mailman.eng.auburn.edu
[b]Subject:[/b] Re: [Veritas-bu] Tapeless backup environments?

We have it on our plan. We will be using tape for only long term retention of data.

Our plan is to purchase another EMC CDL, and mirror our existing EMC CDL to the EMC CDL at our DR site. Our master server already is duplicated, and this will allow us to start restores of stuff that is not tier 1 applications that already are mirrored to the DR site.

I would prefer not to save the long term on tape, but we don't have a solution for any other way to do it at this time.

Kevin

[b]From:[/b] veritas-bu-bounces < at > mailman.eng.auburn.edu [mailto] [b]On Behalf Of [/b]Jeff Lightner
[b]Sent:[/b] Friday, September 21, 2007 9:44 AM
[b]To:[/b] veritas-bu < at > mailman.eng.auburn.edu
[b]Subject:[/b] [Veritas-bu] Tapeless backup environments?

Yesterday our director said that he doesn't intend to ever upgrade existing STK L700 because eventually we'll go tapeless as that is what the industry is doing. The idea being we'd have our disk backup devices here (e.g. Data Domain) and transfer to offsite storage to another disk device so as to eliminate the need for ever transporting tapes.

It made me wonder if anyone was actually doing the above already or was planning to do so?

[/quote]
Tapeless backup environments?
September 24, 2007 06:17AM
And yet there are many companies backing up well beyond a terabyte from remote offices back to their central office using de-duplication. Consider JPMC's presentation at the last Vision. They're backing up over 200 remote offices using PureDisk, a de-duplication backup product. I don't remember the exact numbers, but many of them were quite large.

I don't think that bandwidth is free, but neither are trucks. AND if you're going the truck route, make sure you add the cost and risk of an encryption system to the mix.

---
W. Curtis Preston
Backup Blog < at > [url=http://www.backupcentral.com]www.backupcentral.com[/url]
VP Data Protection, GlassHouse Technologies

[b]From:[/b] veritas-bu-bounces < at > mailman.eng.auburn.edu [mailto] [b]On Behalf Of [/b]Ed Wilts
[b]Sent:[/b] Saturday, September 22, 2007 9:35 AM
[b]To:[/b] 'Jeff Lightner'; veritas-bu < at > mailman.eng.auburn.edu
[b]Subject:[/b] Re: [Veritas-bu] Tapeless backup environments?

Here's some simple math that may help (compliments of ExaGrid's web site).

If you have 1TB of data with a 2% change rate, you'll need to back up 20GB of daily incrementals. To replicate this to another site in 18 hours requires 3Mbps of bandwidth. If you have lots of bandwidth or not too much data, replication to an offsite location may make sense. But to think that you can replicate your backups for 20TB of data to another state is going to make your network group squirm. Iron Mountain looks pretty cheap compared to offsite electronic replication.
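To put numbers on that, here's a quick back-of-the-envelope sketch (same figures as above; the 18-hour window is from the ExaGrid example):

[code]
# Bandwidth needed to replicate daily incrementals in a fixed window.
daily_gb = 1000 * 0.02                        # 1TB at a 2% change rate = 20GB/day
mbps = daily_gb * 8 * 1000 / (18 * 3600)      # GB -> megabits, over 18 hours
print(f"1TB of data:  {mbps:.1f} Mbps")       # ~2.5 Mbps, i.e. the quoted 3Mbps pipe
print(f"20TB of data: {20 * mbps:.0f} Mbps")  # ~49 Mbps: the squirm factor
[/code]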

We have 1 application by itself that adds 30GB of new data every day. It's being replicated within the metro area over a 1Gbps pipe (real time, not via backups). We sure couldn't replicate everything...

As the OLD saying goes, never underestimate the bandwidth of a station wagon full of tapes.

.../Ed

--
Ed Wilts, RHCE, BCFP, BCSD
Mounds View, MN, USA
mailto:ewilts < at > ewilts.org

Tapeless backup environments?
September 24, 2007 06:31AM
A 1 TB array that can store 20 TB of de-duped data in it will cost about $20K. (A general rule of thumb is to base your pricing on a 20:1 de-dupe ratio, then price it at about $1/GB of effective storage. If you do that, you'll be close to list price of a lot of products.) At that cost, it's very close to the price of a tape library fully populated with tapes and drives.

As to whether or not it's worth it for a given setup, you should obviously test it against the pricing, but it's very uncommon for it not to make sense financially. I can think of three setups that are known issues:

1. If you're using it for disk staging and not storing any retention on
it. A lot of the de-dupe comes from de-duping full backups against each
other.

2. If you're trying to de-dupe non-dedupe-able things, such as seismic
data, medical imaging data, or any other data types that are
automatically created by a computer (as opposed to database entries and
Word docs.)

3. If your backup product doesn't do full backups of filesystem data, you will not get as much de-dupe as other people.

Everything is also negotiable. If you've tested and you're not getting the advertised de-dupe ratio, use that in the negotiation stage. If they generally advertise 20:1 and you're only getting 10:1, it would seem reasonable to ask for a 50% discount.

---
W. Curtis Preston
Backup Blog < at > www.backupcentral.com
VP Data Protection, GlassHouse Technologies

-----Original Message-----
From: Ed Wilts [mailto]
Sent: Saturday, September 22, 2007 9:47 AM
To: Curtis Preston; 'Justin Piszcz'; 'Jeff Lightner'
Cc: veritas-bu < at > mailman.eng.auburn.edu
Subject: RE: [Veritas-bu] Tapeless backup environments?

But Curtis, a disk drive by itself isn't very useful either - you'll need a controller or two.

And don't forget to factor in the price of the de-duplication appliances or software. Those suckers are *NOT* cheap. An appliance to support 1TB of compressed data lists out at about $20K. Unless you get a *lot* of de-duplication - and not everybody does - that appliance is going to get killed on price compared to not de-duping it.

It took me only 30 minutes with a de-dupe vendor last week to eliminate
their product from consideration in our environment.

.../Ed

--
Ed Wilts, RHCE, BCFP, BCSD
Mounds View, MN, USA
mailto:ewilts < at > ewilts.org

[quote]-----Original Message-----
From: veritas-bu-bounces < at > mailman.eng.auburn.edu [mailto] On Behalf Of Curtis Preston
Sent: Friday, September 21, 2007 12:10 PM
To: Justin Piszcz; Jeff Lightner
Cc: veritas-bu < at > mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?

First, you can't compare the cost of disk and tape directly like that. You have to include the drives and robots. A drive by itself is useful; a tape by itself is not.

Setting that aside, if I put that disk in a system that's doing 20:1 de-duplication, my cost is now 1.65c/GB vs your 3-9c/GB.

---
W. Curtis Preston
Backup Blog < at > www.backupcentral.com
VP Data Protection, GlassHouse Technologies

-----Original Message-----
From: veritas-bu-bounces < at > mailman.eng.auburn.edu
[mailto] On Behalf Of Justin
Piszcz
Sent: Friday, September 21, 2007 7:36 AM
To: Jeff Lightner
Cc: veritas-bu < at > mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?

I believe disks are 33c/gigabyte and tapes are 3-9 cents/gigabyte or even cheaper. I do not remember the exact figures, but someone I know has done a cost analysis and tapes were by far cheaper. Also, something that nobody calculates is the cost of power to keep disks spinning.

Justin.

On Fri, 21 Sep 2007, Jeff Lightner wrote:

[quote]Disk is not cheaper? You've done a cost analysis?

Not saying you're wrong and I haven't done an analysis but I'd be surprised if disks didn't actually work out to be cheaper over time:

1) Tapes age/break - We buy on average several hundred tapes a year - support on a disk array for failing disks may or may not be more expensive.

2) Transport/storage - We have to pay for offsite storage and transfer - it seems just putting an array in an offsite facility would eliminate the need for transportation (in trucks) cost. Of course there would be cost in the data transfer disk to disk but since everyone seems to have connectivity over the internet it might be possible to do this using a B2B link rather than via dedicated circuits.

3) Labor cost in dealing with mechanical failures of robots. This one is hidden in salary but every time I have to work on a robot it means I can't be working on something else. While disk drives fail it doesn't seem to happen nearly as often as having to fish a tape out of a drive or the tape drive itself having failed.

-----Original Message-----
From: Justin Piszcz [mailto]
Sent: Friday, September 21, 2007 10:08 AM
To: Jeff Lightner
Cc: veritas-bu < at > mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?

On Fri, 21 Sep 2007, Jeff Lightner wrote:

[quote]Yesterday our director said that he doesn't intend to ever upgrade existing STK L700 because eventually we'll go tapeless as that is what the industry is doing. The idea being we'd have our disk backup devices here (e.g. Data Domain) and transfer to offsite storage to another disk device so as to eliminate the need for ever transporting tapes.

It made me wonder if anyone was actually doing the above already or was planning to do so?[/quote]

That seems to be the way people are 'thinking' but the bottom line is disk still is not cheaper than LTO-3 tape and there are a lot of advantages to tape; however, convincing management of this is an uphill battle.

Justin.[/quote][/quote]

Tapeless backup environments?
September 24, 2007 06:36AM
Ed Wilts said:

[quote]1) Disk ages and breaks too.
[/quote]
But with RAID, no longer will the failure of a piece of media cause a
backup or restore failure.

[quote]2) Transport is cheap. I'd be surprised if I couldn't transport a thousand tapes for the cost of a terabyte of storage. Bandwidth to move data is *NOT* cheap. 20GB/day requires 3Mbps of pipe.[/quote]

I've done a number of cost comparisons lately, and you're right. It's
not cheap, but it's not astronomical either. And you need to weigh that
cost against not having the risk of a lost tape and all the
multi-million dollar costs that come along with that these days.

[quote]3) I spend more time replacing disk drives than I do replacing tapes or tape drives. To back up my 1200 SAN-based spindles, I have 6 LTO-3 drives.[/quote]

You have 200 times more disk drives than you have tape drives. Of
course you spend more time replacing them. But those drive failures
never have to cause backup or restore failures, as tape/drive failures
do. Try having a few hundred tape drives and see how your life changes.
I have a customer with 100 drives and their tape drive vendor is in once
a week swapping something, and each one of those swaps is associated
with a backup or restore failure.

Tapeless backup environments?
September 24, 2007 02:46PM
With VTL there is no need to multistream.

Instead of writing 8 streams to 1 drive, just create 8 virtual drives, and don't multiplex.

It's not because of a performance issue, it's an advantage of
virtualization.

As far as performance goes, with a disk-as-disk config, to create a high-perf target you would need to create HLUNs which are striped over many, many LUNs on your array, or present LUNs which are stripes of segments of many RAID groups.

Many VTLs (the one I'm using, for instance) distribute the writes over
many LUNs. I'm currently writing dozens of simultaneous jobs distributed
over 28 separate LUNs.
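As a sketch of that distribution idea (the round-robin policy here is an assumption for illustration; real VTLs have their own placement logic):

[code]
from itertools import cycle

# Toy placement: spread simultaneous backup jobs round-robin over 28
# backend LUNs, as in the setup described above.
luns = cycle(range(28))
jobs = [f"job{i:02d}" for i in range(60)]   # "dozens of simultaneous jobs"
placement = {job: next(luns) for job in jobs}
print(placement["job00"], placement["job28"], placement["job59"])  # 0 0 3
[/code]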

The data reduction (compression) & throughput I'm getting with VTL is definitely better, on a "per client job" basis, than I was getting with MPX'ed jobs going to LTO2.

Offsite is SUPER easy... we replicate our LUNs containing the de-duped data to our DR site.
To bring up the other site, once the DR LUNs are made R/W, we just start
the daemons on the DR VTL and away we go.
The devices are available there as they were at head office.

Don't even need to rediscover devices on the NBU servers.

Vault works great for spinning off copies to Physical tapes, if
necessary.

Paul

--

[quote]-----Original Message-----
From: veritas-bu-bounces < at > mailman.eng.auburn.edu
[mailto] On Behalf
Of Clem Kruger
Sent: September 22, 2007 5:12 AM
To: Jeff Lightner; Justin Piszcz
Cc: veritas-bu < at > mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?

Compression on a VTL is done by the operating system (normally Linux), which we all know is a slow process and therefore not recommended. Your VTL supplier will also recommend that you do not multistream, as this also slows down the process.
[/quote]

Tapeless backup environments?
September 24, 2007 02:47PM
I currently back up 9TB of data to VTL during a FULL window, which writes ~100GB of data to the VTL repository in that window.

Another state is one thing, but across town via DWDM is no prob.

Out of state is handled by duping that data to physical tape... wouldn't want to dupe disk outside of a DWDM connection.

Paul

--
[quote]
-----Original Message-----
[b]From:[/b] veritas-bu-bounces < at > mailman.eng.auburn.edu [mailto] [b]On Behalf Of [/b]Ed Wilts
[b]Sent:[/b] September 22, 2007 9:35 AM
[b]To:[/b] 'Jeff Lightner'; veritas-bu < at > mailman.eng.auburn.edu
[b]Subject:[/b] Re: [Veritas-bu] Tapeless backup environments?

Here's some simple math that may help (compliments of ExaGrid's web site).

If you have 1TB of data with a 2% change rate, you'll need to back up 20GB of daily incrementals. To replicate this to another site in 18 hours requires 3Mbps of bandwidth. If you have lots of bandwidth or not too much data, replication to an offsite location may make sense. But to think that you can replicate your backups for 20TB of data to another state is going to make your network group squirm. Iron Mountain looks pretty cheap compared to offsite electronic replication.

We have 1 application by itself that adds 30GB of new data every day. It's being replicated within the metro area over a 1Gbps pipe (real time, not via backups). We sure couldn't replicate everything...

As the OLD saying goes, never underestimate the bandwidth of a station wagon full of tapes.

.../Ed
[/quote]
Tapeless backup environments?
September 24, 2007 03:33PM
Guys, I've just read this thread and can say I'm very interested in it. The first thing is I learned a new term called deduplication which I didn't know existed.

Question: I gather deduplication is using other software. DataDomain I think I saw mentioned. Where does this fit in with NetBackup, and does the software reside on every client or just a server somewhere?

OK, so I'm trying to kit-refresh a backup environment for a customer which has 2 sites, production and DR, about 200 miles apart. There is a link between the sites but the customer will probably frown on increased bandwidth charges to transfer backup data across for disaster recovery purposes.

Data is probably only 1TB for the site, with perhaps 70% being required to be transferred daily to offsite media.

Currently I use tape, and I was just speccing a new tape system, as I thought by using disk-based backups, and retentions of weekly/monthly backups lasting say 6 weeks, I'm going to need a LOT of disk, plus the bandwidth transfer costs to the DR site.

LTO3 tapes are storing 200GB a tape, which is pretty good compared to disk I thought.

I guess in my setup it's a trade-off between:

Initial cost of disk array vs initial cost of tape library, drives and media

Time taken to back up (network will be the bottleneck here; still on a 100Meg LAN with just 2 DB servers using gigabit LAN to the backup server)

Offsite transfer of tapes daily to an offsite location vs cost of increased bandwidth between sites to transfer backup data.

I'm now confused what to propose :)
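For what it's worth, a rough calculation on the numbers above (the 24-hour replication window and the 20:1 de-dupe ratio are assumptions, not measurements):

[code]
# Link sizing for ~1TB with ~70% going offsite daily.
offsite_gb = 1000 * 0.70                           # ~700GB/day
mbps = offsite_gb * 8 * 1000 / (24 * 3600)         # spread over 24 hours
print(f"raw: {mbps:.0f} Mbps")                     # ~65 Mbps
print(f"with 20:1 de-dupe: {mbps / 20:.1f} Mbps")  # ~3.2 Mbps
[/code]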

Tapeless backup environments?
September 24, 2007 03:55PM
For on-demand type database backups, I had great success with setting up a simple SATA-based DSU which was seen by one of the media servers. It had a Vault policy to dump it to tape after 4-5 days, then expire the DSU image. It worked out great for Informix onbar log dumps especially...

Harry S.
Atlanta

-----Original Message-----
From: veritas-bu-bounces < at > mailman.eng.auburn.edu
[mailto] On Behalf Of Justin
Piszcz
Sent: Saturday, September 22, 2007 10:28 AM
To: Ed Wilts
Cc: veritas-bu < at > mailman.eng.auburn.edu; 'Jeff Lightner'
Subject: Re: [Veritas-bu] Tapeless backup environments?

Don't even get me started on SANs. I have seen the entire loss of an MTI (now EMC) SAN, and with the new CLARiiON SANs I have seen entire shelves go off-line due to bad SPAs etc.; IMO not reliable.

Also with disk, I have a question about VTLs etc.: if I am feeding multiple LTO-3 tape drives using 10Gbps, what type of disk/VTL (not SAN) is out there that can accept multiple 10Gbps streams of data and will not choke?

VTLs seem like a good idea for filesystem backups, but for on-demand database backups I do not see them as the holy grail.

Justin.

On Sat, 22 Sep 2007, Ed Wilts wrote:

[quote]1) Disk ages and breaks too.
2) Transport is cheap. I'd be surprised if I couldn't transport a thousand tapes for the cost of a terabyte of storage. Bandwidth to move data is *NOT* cheap. 20GB/day requires 3Mbps of pipe.
3) I spend more time replacing disk drives than I do replacing tapes or tape drives. To back up my 1200 SAN-based spindles, I have 6 LTO-3 drives.
It sounds like you need to either replace your tape drives or treat them better. We do work on our robots perhaps once every few months. We replace disk drives on a weekly basis. NetBackup requires a *lot* more time than the robots or the disk drives ever will.

.../Ed

--
Ed Wilts, RHCE, BCFP, BCSD
Mounds View, MN, USA
mailto:ewilts < at > ewilts.org[/quote]
Tapeless backup environments?
September 24, 2007 03:55PM
Data Domain makes a hardware storage device (disks) which does deduplication. Rather than backing up block for block all the time, it does that only for the first backup. For subsequent backups, rather than doing an incremental backup at file level, it backs up incrementally at block level, meaning only the blocks that changed in the source are stored on the target.

The benefit to this is good for things like databases on filesystems, where the datafile gets updated for any write to it. A standard file incremental would back up the entire datafile, but a deduplication incremental would only back up the blocks modified within the datafile. One can get what appears to be a very high level of compression to the deduplication storage. I've seen numbers like 20:1, and one person on this list last year even said something like 80:1, though that wouldn't be typical.
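A minimal sketch of that block-level idea (fixed-size blocks and SHA-256 are illustrative choices here; real products use their own segment sizes and fingerprints):

[code]
import hashlib

BLOCK = 128 * 1024  # illustrative segment size

def block_hashes(data: bytes):
    """Fingerprint each fixed-size block of a datafile."""
    return [hashlib.sha256(data[i:i + BLOCK]).hexdigest()
            for i in range(0, len(data), BLOCK)]

def changed_blocks(old: bytes, new: bytes):
    """Indexes of blocks that differ since the last backup."""
    old_h, new_h = block_hashes(old), block_hashes(new)
    return [i for i, h in enumerate(new_h)
            if i >= len(old_h) or h != old_h[i]]

# A 10MB "datafile" with one small update: a file-level incremental
# re-sends all 10MB; a block-level one re-sends a single block.
old = bytes(10 * 1024 * 1024)
new = bytearray(old)
new[4_000_000:4_000_100] = b"x" * 100
print(changed_blocks(old, bytes(new)))  # -> [30]
[/code]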

Data Domain isn't the only deduplication company out there, and we haven't yet implemented the ones we bought (though we will before the end of October). I was contacted off list by another company called Sepaton, but their solution seemed to require a one-to-one correspondence between original storage and target storage. I believe there is at least one other company doing deduplication but I don't recall who (FalconStor maybe)?

Tapeless backup environments?
September 24, 2007 03:56PM
On Mon, 24 Sep 2007, Dave Markham wrote:

[quote]LTO3 tapes are storing 200GB a tape, which is pretty good compared to disk I thought.[/quote]
LTO-3 = 400GiB

Tapeless backup environments?
September 24, 2007 04:37PM
Hi Dave,

Yes, it is a difficult decision. I have looked at DataDomain with NetBackup. I have found that the backups are faster and there is a vast amount of disk being saved.

NetBackup 6.5 includes de-duplication and I have become a great friend of it. To use the words of a supplier, "Saving me Time, Saving me Space and Saving me Money" :)

Kind Regards,
Clem Kruger

Tapeless backup environments?
September 24, 2007 04:44PM
On a similar note, how does NDMP play with disk de-dupe? All of the de-dupes I've seen are NAS devices. NDMP only talks to tape or VTL. Are there VTLs with de-dupe that would solve the NDMP problem?

Jim

Tapeless backup environments?
September 24, 2007 04:47PM
There are several.
FalconStor, Diligent, Quantum and Sepaton I believe will all present a
"tape" to an NDMP device, and provide de-dupe on the backend.

Paul

--

[quote]-----Original Message-----
From: veritas-bu-bounces < at > mailman.eng.auburn.edu
[mailto] On Behalf
Of Jim Horalek
Sent: September 24, 2007 12:43 PM
To: veritas-bu < at > mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?

On a similar note, how does NDMP play with disk de-dupe? All of the de-dupes I've seen are NAS devices. NDMP only talks to tape or VTL. Are there VTLs with de-dupe that would solve the NDMP problem?

Jim
[/quote]

Tapeless backup environments?
September 24, 2007 05:17PM
Do you need a special license for 6.5 or can those with 6.0 licenses
upgrade? I assume you need to open a case with NetBackup to get the
download links?

Justin.

On Mon, 24 Sep 2007, Clem Kruger wrote:

[quote]Hi Dave,

Yes, it is a difficult decision. I have looked at DataDomain with NetBackup. I have found that the backups are faster and there is a vast amount of disk being saved.

NetBackup 6.5 includes de-duplication and I have become a great friend of it. To use the words of a supplier, "Saving me Time, Saving me Space and Saving me Money" :)

Kind Regards,
Clem Kruger

[/quote]
Tapeless backup environments?
September 24, 2007 05:42PM
I am not quite sure how it is done there. I would contact Symantec in
your area and ask how they will manage your license.

Kind Regards,
Clem Kruger

-----Original Message-----
From: Justin Piszcz [mailto]
Sent: 24 September 2007 19:16 PM
To: Clem Kruger
Cc: dave.markham < at > fjserv.net; Jeff Lightner;
veritas-bu < at > mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?

Do you need a special license for 6.5 or can those with 6.0 licenses
upgrade? I assume you need to open a case with NetBackup to get the
download links?

Justin.

Tapeless backup environments?
September 24, 2007 07:22PM
All of those Paul said, and Data Domain too. They have both a NAS and a virtual tape interface. And yes, all of these do de-dupe.

I keep a directory of de-dupe vendors at Backup Central Wiki:
http://www.backupcentral.com/components/com_mambowiki/index.php/Disk_Targets%2C_currently_shipping

Here's a tinyurl version in case that one gets truncated:
http://tinyurl.com/2dtvh2

---
W. Curtis Preston
Backup Blog < at > www.backupcentral.com
VP Data Protection, GlassHouse Technologies

-----Original Message-----
From: veritas-bu-bounces < at > mailman.eng.auburn.edu [mailto] On Behalf Of Paul Keating
Sent: Monday, September 24, 2007 12:46 PM
To: Jim Horalek; veritas-bu < at > mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?

There are several.
FalconStor, Diligent, Quantum and Sepaton I believe will all present a
"tape" to an NDMP device, and provide de-dupe on the backend.

Paul

Tapeless backup environments?
September 24, 2007 07:35PM
Dave,

Dude, you've got to get out more. ;) I'd recommend continually perusing some of these sites to stay current on what's going on in the industry. De-dupe is about the most-mentioned topic in the storage industry since I don't know what.

http://www.searchstorage.com
http://www.byteandswitch.com
http://www.infostoremag.com
http://www.isit.com/IndexSTO.cfm
http://www.backupcentral.com (My blog)

On my blog I've got a series of entries that talk about de-duplication, starting with this one, "What is De-duplication?" I tried to link all the de-dupe entries together, so that each entry has a forwarding link to the next blog entry in the series:
http://www.backupcentral.com/content/view/58/47/

Your question about where de-dupe resides is answered in this entry "Two
different types of de-dupe:"

http://www.backupcentral.com/content/view/129/47/

We've got directories of both types:
Hardware/Target: http://tinyurl.com/384528
Software/Source: http://tinyurl.com/2dtvh2

(I use TinyUrl.com because the URLs are very long and tend to get truncated in email. BTW, TinyURL uses de-duplication-like techniques: they run an algorithm against the string to give you a smaller string. Then when you click on that string, they "restore" the original URL to your browser. Kind of cool.)
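Roughly like this, as a toy (TinyURL's actual implementation isn't public; this just shows the hash-then-restore idea):

[code]
import hashlib

table = {}  # short key -> original URL, stored once

def shorten(url: str) -> str:
    key = hashlib.sha256(url.encode()).hexdigest()[:6]
    table[key] = url
    return key

def restore(key: str) -> str:
    return table[key]

k = shorten("http://www.backupcentral.com/content/view/129/47/")
print(k, "->", restore(k))
[/code]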

---
W. Curtis Preston
Backup Blog < at > www.backupcentral.com
VP Data Protection, GlassHouse Technologies

-----Original Message-----
From: veritas-bu-bounces < at > mailman.eng.auburn.edu
[mailto] On Behalf Of Dave
Markham
Sent: Monday, September 24, 2007 11:35 AM
To: Jeff Lightner
Cc: veritas-bu < at > mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?

Guys, I've just read this thread and can say I'm very interested in it.
The first thing is I learned a new term, deduplication, which I didn't
know existed.

Question: I gather deduplication uses other software; DataDomain,
I think, was mentioned. Where does this fit in with NetBackup, and does
the software reside on every client or just a server somewhere?

OK, so I'm trying to kit-refresh a backup environment for a customer
which has 2 sites, production and DR, about 200 miles apart. There is a
link between the sites, but the customer will probably frown on increased
bandwidth charges to transfer backup data across for disaster recovery
purposes.

Data is probably only 1 TB for the site, with perhaps 70% required to be
transferred daily to offsite media.

Currently I use tape, and I was just speccing a new tape system, as I
thought that with disk-based backups and retentions of weekly/monthly
backups lasting say 6 weeks, I'm going to need a LOT of disk, plus the
bandwidth transfer costs to the DR site.

LTO3 tapes are storing 200GB a tape, which is pretty good compared to
disk, I thought.

I guess in my setup it's a trade-off between:

Initial cost of disk array vs. initial cost of tape library, drives and
media.

Time taken to back up (the network will be the bottleneck here; still on
a 100Meg LAN, with just 2 DB servers using gigabit LAN to the backup
server).

Offsite transfer of tapes daily to an offsite location vs. cost of
increased bandwidth between sites to transfer backup data.

I'm now confused what to propose :)

Tapeless backup environments?
September 24, 2007 09:13PM
[quote]Question: I gather deduplication uses other software; DataDomain,
I think, was mentioned. Where does this fit in with NetBackup, and does
the software reside on every client or just a server somewhere?
[/quote]
In the technologies I'm familiar with--one of them is old, another new,
it's conceptually simple. "The system," whether that's a standalone
system or a box of disk with some smarts or an agent on the backup
client, receives data and examines it in blocks of some size (AFAIK,
always way larger than a 512-byte disk block). Simplistically, it
checksums the "block" and looks in a table of
checksums-of-"blocks"-that-it-already-stores to see if the identical
<ahem, anyone see a hole here?> data already lives there. If so, the
data can be tossed away and the checksum kept. The "file" is stored as
a collection of these checksums (imprecise term, but works for the
example) or a list of pointers to the single instance (hence the SIS
term can be overloaded here) of the data represented by that checksum.
A simplistic example would be storing a TB of zeros. Deduplicating
devices would store the first "block" of zeros, then find that all the
rest of them had the same checksum and same data, and just store one
more pointer. That 1TB file becomes, say, one real instance of 512KB of
zeros (if that is the "block" size) plus the space for a few million
pointers to the same 512KB of data. Obviously, even this could be
compressed, but that's another story.
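
(A minimal sketch of that scheme in Python, assuming fixed-size chunks
and SHA-1 fingerprints; real products vary chunk sizes and boundaries,
and the class here is illustrative, not any product's design:)

    import hashlib

    CHUNK_SIZE = 512 * 1024  # 512KB "blocks", as in the example above

    class DedupeStore:
        def __init__(self):
            self.chunks = {}  # fingerprint -> the one stored copy of the data
            self.files = {}   # file name -> ordered list of fingerprints

        def write(self, name, data):
            refs = []
            for i in range(0, len(data), CHUNK_SIZE):
                chunk = data[i:i + CHUNK_SIZE]
                fp = hashlib.sha1(chunk).hexdigest()
                if fp not in self.chunks:
                    self.chunks[fp] = chunk  # new data: store it once
                refs.append(fp)  # duplicates cost one pointer, not one chunk
            self.files[name] = refs

        def read(self, name):
            return b"".join(self.chunks[fp] for fp in self.files[name])

    store = DedupeStore()
    store.write("zeros.dat", bytes(10 * CHUNK_SIZE))  # ten identical chunks
    print(len(store.chunks))  # 1: one real 512KB chunk, ten pointers

(Note that it trusts the fingerprint alone on a match, which is exactly
the <ahem> hole.)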

Backing up the same system with few changes would be a very small full
backup. Backing up many instances of, say, the C drive of w2k3 systems
will deduplicate like crazy. Backing up a million different JPEGs
wouldn't save any appreciable space, but backing them up twice, or
multiple instances of the same JPEG, would.

[quote]LTO3 tapes are storing 200GB a tape, which is pretty good compared to
disk, I thought.
[/quote]
But that's a horrible number for LTO3. Either your tapes aren't full or
something is broken. Look at the available_media report to get a good
idea of the range of data stored on your FULL tapes.

Tapeless backup environments?
September 24, 2007 10:00PM
[quote]Simplistically, it checksums the "block" and looks in a table of
checksums-of-"blocks"-that-it-already-stores to see if the identical
<ahem, anyone see a hole here?> data already lives there.
[/quote]
To what hole do you refer? I see one in your simplistic example, but
not in what actually happens (which requires a much longer technical
explanation).

Tapeless backup environments?
September 24, 2007 10:01PM
On Mon, Sep 24, 2007 at 05:08:31PM -0400, bob944 wrote:
[quote]In the technologies I'm familiar with--one of them is old, another new,
it's conceptually simple. "The system," whether that's a standalone
system or a box of disk with some smarts or an agent on the backup
client, receives data and examines it in blocks of some size (AFAIK,
always way larger than a 512-byte disk block). Simplistically, it
checksums the "block" and looks in a table of
checksums-of-"blocks"-that-it-already-stores to see if the identical
<ahem, anyone see a hole here?> data already lives there.
[/quote]
Yes, there's a hole there if that's all you're relying on. Not all of
them do that.

--
Darren Dunham ddunham < at > taos.com
Senior Technical Consultant TAOS http://www.taos.com/
Got some Dr Pepper? San Francisco, CA bay area
< This line left intentionally blank to confuse you. >
Tapeless backup environments?
September 24, 2007 10:44PM
There are no products on the market that rely solely on a checksum to
identify redundant data. There are a few that rely solely on a 160-bit
hash, which is significantly larger than a checksum (typically 12-16
bits). There are some who are concerned about hash collisions in this
scenario. I am not one of those people. Here is a quote from an
article I wrote. The entire article is available here:

http://tinyurl.com/2j7r52

<quote>
Hash collisions occur when two different chunks produce the same hash.
It's widely acknowledged in cryptographic circles that a determined
hacker could create two blocks of data that would have the same MD5
hash. If a hacker could do that, they might be able to create a fake
cryptographic signature. That's why many security experts are turning to
SHA-1. Its bigger key space makes it much more difficult for a hacker to
crack. However, at least one group has already been credited with
creating a hash collision with SHA-1.

The ability to forcibly create a hash collision means absolutely nothing
in the context of deduplication. What matters is the chance that two
random chunks would have a hash collision. With a 128-bit and 160-bit
key space, the odds of that happening are 1 in 2^128 with MD5, and 1 in
2^160 with SHA-1. That's 10^38 and 10^48, respectively. If you assume that
there's less than a yottabyte (1 billion petabytes) of data on the
planet Earth, then the odds of a hash collision with two random chunks
are roughly 1,461,501,637,330,900,000,000,000,000 times greater than the
number of bytes in the known computing universe.

Let's compare those odds with the odds of an unrecoverable read error on
a typical disk--approximately 1 in 100 trillion or 10^14. Even worse odds
are data miscorrection, where error-correcting codes step in and believe
they have corrected an error, but miscorrect it instead. Those odds are
approximately 1 in 10^21. So you have a 1 in 10^21 chance of writing data
to disk, having the data written incorrectly and not even knowing it.
Everybody's OK with these numbers, so there's little reason to worry
about the 1 in 10^48 chance of a SHA-1 hash collision.

If you want to talk about the odds of something bad happening and not
knowing it, keep using tape. Everyone who has worked with tape for any
length of time has experienced a tape drive writing something that it
then couldn't read. Compare that to successful deduplication disk
restores. According to Avamar Technologies Inc. (recently acquired by
EMC Corp.), none of its customers has ever had a failed restore. Hash
collisions are a nonissue.
</quote>
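
(For scale, a quick Python check of the magnitudes quoted above;
nothing vendor-specific, just the exponents:)

    # Key-space sizes quoted above, recomputed exactly:
    print(f"{2 ** 128:.2e}")   # 3.40e+38: MD5
    print(f"{2 ** 160:.2e}")   # 1.46e+48: SHA-1
    # Failure odds cited for comparison:
    print(f"{10 ** 14:.0e}")   # 1e+14: unrecoverable disk read error
    print(f"{10 ** 21:.0e}")   # 1e+21: silent miscorrection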

---
W. Curtis Preston
Backup Blog < at > www.backupcentral.com
VP Data Protection, GlassHouse Technologies

Tapeless backup environments?
September 25, 2007 01:46AM
I'm not convinced that writing to a DataDomain is going to be faster than
writing to multiple LTO-3 drives over a SAN. The DD is limited to about
90MB/sec, which is on par with 1-2 LTO-3 drives and not much more than that.
Unless, of course, you consider adding extra DD units for every 2 LTO-3
drives you currently have, and that's going to bump your costs up even higher
(which might be offset by the requirement for a Decru FC520 encrypting
appliance for every 2-3 LTO-3 drives today).

I don't think that NetBackup 6.5 includes de-duplication. It's provided by
PureDisk, which is a separately licensed product. With 6.5.1, you'll be able
to use PureDisk as a storage unit, something that's not there yet today.

.../Ed

--
Ed Wilts, RHCE, BCFP, BCSD
Mounds View, MN, USA
mailto:ewilts < at > ewilts.org

[quote]-----Original Message-----
From: veritas-bu-bounces < at > mailman.eng.auburn.edu [mailto] On Behalf Of Clem Kruger
Sent: Monday, September 24, 2007 11:32 AM
To: dave.markham < at > fjserv.net; Jeff Lightner
Cc: veritas-bu < at > mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?

Hi Dave,

Yes, it is a difficult decision. I have looked at DataDomain with
NetBackup and have found that the backups are faster and there is a vast
amount of disk being saved.

NetBackup 6.5 includes de-duplication, and I have become a great friend
of it. To use the words of a supplier, "Saving me Time, Saving me Space
and Saving me Money" :)

Kind Regards,
Clem Kruger

[/quote]
Tapeless backup environments?
September 25, 2007 05:26AM
I'm not convinced either. Although our numbers are a little different,
you and I end up roughly at the same place. There are a number of
vendors whose de-dupe targets top out at about 200-300 MB/s, which is
roughly the speed of 2-3 LTO-3 drives, depending on how well you use
them. If you need more than that, you need to buy another box. (BTW,
Data Domain's numbers have increased to about 200 MB/s.)

These numbers work just fine when we're talking backups via the LAN to
LAN-based backup servers. You're going to need at least two, possibly
three network-based backup servers to generate 200 MB/s. Assuming 70
MB/s or so per master/media server, you buy one de-dupe unit per three
master/media servers or so. You can scale pretty far that way. You
will need to make sure that backup A is always sent to de-dupe unit A,
and backup B is always sent to de-dupe unit B, and so on. (If you send
backup B to de-dupe unit A after initially sending it to de-dupe unit B,
its first backup there will not get de-duped against anything, resulting
in a significant decrease in the overall de-duplication ratio.) While
you won't get as big a de-dupe ratio as you would if you could have a
single device that could do 1000s of MB/s, there is an argument to be
made that you won't get much de-dupe when de-duping the backups of
server A against those of server B -- unless they have similar data. So
a very large setup like this will require a bit of planning, but I think
the benefits outweigh the extra planning required.
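
(To make that routing discipline concrete, a minimal sketch in Python;
the unit names and the helper are hypothetical, not any vendor's API:)

    import zlib

    DEDUPE_UNITS = ["dedupe-a", "dedupe-b", "dedupe-c"]  # hypothetical pool

    def unit_for_client(client_name):
        # Stable mapping: the same client always lands on the same unit,
        # so its fulls and incrementals de-dupe against their own history.
        return DEDUPE_UNITS[zlib.crc32(client_name.encode()) % len(DEDUPE_UNITS)]

    print(unit_for_client("exchange01"))  # same answer every run

(Caveat: adding a unit to the pool reshuffles the mapping, which costs a
one-time re-baseline of the clients that move.)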

Now, if you happen to have a SINGLE SAN media server that needs MORE
than 200 MB/s, then you're going to want a device that can handle that
level of throughput. This is going to be a pretty big server, BTW, as a
200 MB/s device can back up about 6 TB in 8 hours. And notice I said
SAN media server, not a regular media server, as a regular media server
isn't going to be able to generate more than 200 MB/s, as it's getting
its backups via IP. But a SAN media server is backing up its own data
locally, so it can go much faster. This also means you're really
looking at a SAN/block device, which means you're really looking at a
VTL. (Yes, I'm aware of the PureDisk storage unit around the corner. I
think you'll find it's not going after this part of the market.)
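
(Checking that arithmetic:)

    rate_mb_s, window_h = 200, 8
    print(rate_mb_s * window_h * 3600 / 1e6)  # 5.76 TB per 8-hour window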

If you need this kind of throughput, there are a few products that are
advertising several hundred or thousands of MB/s within a single de-dupe
setup. These are the newer kids on the de-dupe block, of course, so
they're not going to have as many customer references as the vendors
that have been selling de-dupe as long. But from what I've seen,
they're worth a look.

---
W. Curtis Preston
Backup Blog < at > www.backupcentral.com
VP Data Protection, GlassHouse Technologies

Tapeless backup environments?
September 26, 2007 08:08AM
cpreston:
[quote][quote]Simplistically, it checksums the "block" and looks in a table of
checksums-of-"blocks"-that-it-already-stores to see if the identical
<ahem, anyone see a hole here?> data already lives there.
[/quote]
To what hole do you refer?
[/quote]
The idea that N bits of data can unambiguously be represented by fewer
than N bits. Anyone who claims to the contrary might as well knock out
perpetual motion, antigravity and faster-than-light travel while they're
on a roll.
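
(You can demonstrate the inevitability by shrinking the fingerprint. A
toy Python example with a deliberately tiny 16-bit fingerprint; 128 or
160 bits only pushes the collision further out, which is exactly the
point:)

    import hashlib, itertools

    def tiny_fp(data):
        # Toy 16-bit "fingerprint": only 65,536 possible values
        return hashlib.sha1(data).digest()[:2]

    seen = {}
    for i in itertools.count():
        block = i.to_bytes(8, "big")  # distinct 64-bit "blocks"
        fp = tiny_fp(block)
        if fp in seen:
            print("blocks", seen[fp], "and", i, "share fingerprint", fp.hex())
            break  # two different blocks, one fingerprint
        seen[fp] = i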

[quote]I see one in your simplistic example, but
not in what actually happens (which requires a much longer technical
explanation).
[/quote]
Hence my introduction that began with "[s]implistically." But throw in
all the "much longer technical explanation" you like, any process which
compares a reduction-of-data to another reduction-of-data will sooner or
later return "foo" when what was originally stored was "bar."

cpreston:
[quote]There are no products in the market that rely solely on a checksum to
identify redundant data. There are a few that rely solely on
a 160-bit
hash, which is significantly larger than a checksum (typically 12-16
[/quote]
It doesn't matter. The length of the checksum/hash/fingerprint and the
sophistication of its algorithm only affect how frequently--not
whether--the incorrect answer is generated.

[quote][...] The ability to forcibly create a hash collision means
absolutely nothing in the context of deduplication.
[/quote]
Of course it does. Most examples in the literature concern storing
crafted-data-pattern-A ("pay me one dollar") in order for the data to be
read later as something different ("pay me one million dollars"). It
can't have escaped your attention that every day, some yahoo crafts
another buffer-or-stack overflow exploit; some of them are brilliant.
The notion that the bad guys will never figure out a way to plant a
silent data-change based on checksum/hash/fingerprint collisions is,
IMO, naive.

[quote]What matters is the chance that two
random chunks would have a hash collision. With a 128-bit and 160-bit
key space, the odds of that happening are 1 in 2^128 with MD5, and 1 in
2^160 with SHA-1. That's 10^38 and 10^48, respectively. If you
[/quote]
Those are impressive, and dare I guess, vendor-supplied, numbers. And
they're meaningless. We do not care about the odds that a particular
block "the quick brown fox jumps over the lazy dog"
checksums/hashes/fingerprints to the same value as another particular
block "now is the time for all good men to come to the aid of their
party." Of _course_ that will be astronomically unlikely, and with
sufficient hand-waving (to quote your article: "the odds of a hash
collision with two random chunks are roughly
1,461,501,637,330,900,000,000,000,000 times greater than the number of
bytes in the known computing universe") these totally meaningless
numbers can seem important.

They're not. What _is_ important? To me, it's important that if I read
back any of the N terabytes of data I might store this week, I get the
same data that was written, not a silently changed version because the
checksum/hash/fingerprint of one block that I wrote collides with
another checksum/hash/fingerprint. I can NOT have that happen to any
block--in a file clerk's .pst, a directory inode or the finance
database. "Probably, it won't happen" is not acceptable.

[quote]Let's compare those odds with the odds of an unrecoverable
read error on a typical disk--approximately 1 in 100 trillion
[/quote]
Bogus comparison. In this straw man, that 1/100,000,000,000,000 read
error a) probably doesn't affect anything because of the higher-level
RAID array it's in and b) if it does, there's an error, a
we-could-not-read-this-data, you-can't-proceed, stop, fail,
get-it-from-another-source error--NOT a silent changing of the data from
foo to bar on every read with no indication that it isn't the data that
was written.

[quote]If you want to talk about the odds of something bad happening and not
knowing it, keep using tape. Everyone who has worked with tape for any
length of time has experienced a tape drive writing something that it
then couldn't read.
[/quote]
That's not news, and why we've been making copies of data for, oh, 50
years or so.

[quote]Compare that to successful deduplication disk
restores. According to Avamar Technologies Inc. (recently acquired by
EMC Corp.), none of its customers has ever had a failed restore.
[/quote]
Now _there's_ an unbiased source.

Tapeless backup environments?
September 26, 2007 12:06PM
Just a teensy point - LTO3 tapes should store 400GB natively. They're
marketed as having a capacity of up to 800GB, but that's with 2:1
compression. We normally get about 550GB for MRI data.

LTO4 tapes are available with 800GB native capacity. The drives can also
encrypt data.

--
Do you want a picture of your brain - volunteer for a brain scan!
http://www.fil.ion.ucl.ac.uk/Volunteers/

Computer systems go wrong - even backup systems
Be paranoid!

Chris Freemantle, Data Manager
Wellcome Trust Centre for Neuroimaging
+44 (0)207 833 7496
www.fil.ion.ucl.ac.uk
Tapeless backup environments?
September 26, 2007 03:01PM
Most of this, while well documented, seems to boil down to the same
alarmist notion that had people trying to ban cell phones in gas
stations. The possibility that something untoward COULD happen does NOT
mean it WILL happen. To date I don't know of a single gas pump
explosion or car fire that was traced to cell phone usage at the pump.
Oddly enough, though, no one monitors gas pumps to be sure users aren't
re-entering their vehicles, and fires HAVE been traced to static
electricity caused by that.

If odds are so important, it seems it would be important to worry about
the odds that your data center, your offsite storage location and your
disaster recovery site will all be taken out at the same time.

I also suggest the argument is flawed because it seems to imply that
only the cksum is stored and not the actual data - it is the original
compressed data AND the cksum that result in the restore, not the cksum
alone.

Tapeless backup environments?
September 26, 2007 03:08PM
Please read my other post about the odds of this happening. The odds of
a hash collision with a 160-bit key space are so small that any
statistician would call them zero: 1 in 2^160. Do you know how big that
number is? It's a whole lot bigger than it looks. And those odds are
significantly better than the odds that you would write a bad block of
data to a regular disk drive and never know it.

---
W. Curtis Preston
Backup Blog < at > www.backupcentral.com
VP Data Protection, GlassHouse Technologies

Tapeless backup environments?
September 26, 2007 03:25PM
It's interesting that the probability of any 2 randomly selected hashes
being the same is quoted, rather than the probability that at least 2
out of a whole group are the same. That's probably because the minutely
small chance becomes rather bigger when you consider many hashes. This
will still be small, but I suspect not as reassuringly small.

To illustrate this, consider the 'birthday paradox'. How many people do
you need in a room to have at least a 50% chance that 2 of them have the
same birthday? The chance of any 2 randomly chosen people sharing the
same birthday is 1/365 (neglecting leap years). That's quite small, so we
need a lot of people to get a 50% chance, right? Wrong. You need 23
people. Google 'birthday paradox' for the simple maths.
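
(The same sum in Python, using the standard birthday-bound
approximation; the 1 PB and 8KB chunk figures below are illustrative
assumptions, not measurements:)

    import math

    def collision_probability(n, space):
        # P(at least two of n random draws from `space` values collide),
        # via the usual approximation 1 - exp(-n(n-1)/(2*space))
        return -math.expm1(-n * (n - 1) / (2 * space))

    print(collision_probability(23, 365))             # ~0.50 (exact: 0.507)
    n_chunks = (10 ** 15) // (8 * 1024)               # 1 PB of 8KB chunks
    print(collision_probability(n_chunks, 2 ** 160))  # ~5e-27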

For our data I would certainly not use de-duping, even if it did work
well on image data.

--
Do you want a picture of your brain - volunteer for a brain scan!
http://www.fil.ion.ucl.ac.uk/Volunteers/

Computer systems go wrong - even backup systems
Be paranoid!

Chris Freemantle, Data Manager
Wellcome Trust Centre for Neuroimaging
+44 (0)207 833 7496
www.fil.ion.ucl.ac.uk
Tapeless backup environments?
September 26, 2007 03:41PM
On Wed, Sep 26, 2007 at 04:02:49AM -0400, bob944 wrote:
[quote]Bogus comparison. In this straw man, that 1/100,000,000,000,000 read
error a) probably doesn't affect anything because of the higher-level
RAID array it's in and b) if it does, there's an error, a
we-could-not-read-this-data, you-can't-proceed, stop, fail,
get-it-from-another-source error--NOT a silent changing of the data from
foo to bar on every read with no indication that it isn't the data that
was written.
[/quote]
While I find the "compare only based on hash" a bit annoying for other
reasons, the argument above doesn't convince me.

Disks, controllers, and yes, RAID arrays can fail silently in all sorts
of ways: acknowledging a write that is never done, writing to the wrong
location, reading from the wrong location, or returning blocks where
only some of the data came from the correct location. Most RAID systems
do not verify data on read, so they protect against obvious failures but
not against silent data errors on the storage.

--
Darren Dunham ddunham < at > taos.com
Senior Technical Consultant TAOS http://www.taos.com/
Got some Dr Pepper? San Francisco, CA bay area
< This line left intentionally blank to confuse you. >
Tapeless backup environments?
September 26, 2007 04:04PM
On Wed, Sep 26, 2007 at 09:58:12AM -0400, Jeff Lightner wrote:
[quote]I also suggest the argument is flawed because it seems to imply that
only the cksum is stored and not the actual data - it is the original
compressed data AND the cksum that result in the restore, not the cksum
alone.
[/quote]
It's not that the actual data isn't stored, it's whether or not the
actual data is checked. Some algorithms search through the hash space,
and if a hit comes up, they assume that the previously stored data is a
match without a comparison.

The original data must always be stored. Even if it were possible to
run a hash algorithm in reverse quickly, there would be no way to
determine which of various possible input strings was the original.
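
(The careful variant is cheap to sketch in Python: treat the hash as a
lookup key, not as proof of identity, and byte-compare before discarding
anything. A sketch, not any particular product's implementation:)

    import hashlib

    buckets = {}  # fingerprint -> list of distinct chunks that share it

    def store_chunk(data):
        fp = hashlib.sha1(data).digest()
        bucket = buckets.setdefault(fp, [])
        for i, existing in enumerate(bucket):
            if existing == data:   # full byte comparison, not just the hash
                return (fp, i)     # true duplicate: reference it
        bucket.append(data)        # hash hit but different bytes: keep both
        return (fp, len(bucket) - 1)

(The cost is an extra read of the stored chunk on every fingerprint hit,
which is the usual performance argument against doing it.)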

--
Darren Dunham ddunham < at > taos.com
Senior Technical Consultant TAOS http://www.taos.com/
Got some Dr Pepper? San Francisco, CA bay area
< This line left intentionally blank to confuse you. >