I like another approach as well. Rather than worrying over the restore of an
entire server, focus on restoring applications and data. For instance,
in most shops the most critical application has something to do with money:
money in or money out. So the finance app is the most critical. If you focus
on the data and application for finance, and make sure they are optimized for
disaster restore, you can probably shorten the time to getting this app
functional again. Take your average large server: a couple of hundred gigs,
maybe. How big can the finance app and its data be? 20-50 GB? Restoring
20-50 GB is an easy problem.
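To put numbers on "an easy problem", here is a quick back-of-the-envelope
calculation, assuming a single 3590-class drive streaming at roughly 9 MB/s
(a rough figure of my own, not from the original post):

```python
# Back-of-the-envelope restore time for a single streaming tape drive.
# The 9 MB/s throughput is an assumed native 3590 rate, not a measurement.
def restore_minutes(gigabytes, mb_per_sec=9):
    """Minutes to restore the given number of GB at a steady rate."""
    return gigabytes * 1024 / mb_per_sec / 60

for gb in (20, 50, 200):
    print(f"{gb} GB -> ~{restore_minutes(gb):.0f} min")
```

So the 20-50 GB finance app comes back in well under two hours on one drive,
versus several hours for a 200 GB full server.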
Proceed like this, covering all the mission-critical apps. The nice thing
about apps like these is that they tend to be somewhat naturally collocated
on the primary and copy storage pool tapes due to the way they are backed up:
shut 'em down and back 'em up. Nothing special is required to ensure the data
is on just one tape.
The downside of this approach, compared to the rapid-restore approach, is
that you must have the *SM server running before you can do the restore.
When faced with the prospect of eating a whole elephant, the task can appear
daunting. However, if you eat him a little at a time, you won't get a
stomach ache. Divide and conquer, folks.
Kelly J. Lipp
Storage Solutions Specialists, Inc.
PO Box 51313
Colorado Springs CO 80919
lipp < at > storsol.com
From: ADSM: Dist Stor Manager [mailto:ADSM-L < at > VM.MARIST.EDU] On Behalf Of
Salak Juraj
Sent: Thursday, February 03, 2000 4:26 AM
To: ADSM-L < at > VM.MARIST.EDU
Subject: AW: Designing a solution for fast manual restore
I am coping with similar issues,
and until now I have not found
a 100% solution.
However, I am toying with
the idea of using TSM 3.7's
new features for this purpose:
rapid recovery, LAN-free
(once I have TSM 3.7).
On paper it looks like the following:
I would regularly generate
off-site rapid-recovery tapes
for selected important data
from selected important nodes.
It is very similar to your regular
full-backup approach, but with advantages:
- the ADSM DB will not grow (each rapid-recovery
backup tape is treated as a single object in the DB)
- the regular "full backup" is cheaper, as
both the data source and target are the ADSM server;
the clients and network are not involved at all
- the restore, assuming you attach
local tape devices to your nodes, is much quicker,
as the tapes will probably be streaming
and there is no network involved
- the restore would not depend on the ADSM server at all,
so after a disaster I can restore the
important file server hardware first and only
after that the ADSM server
- "your" regular full backups regularly generate
unnecessary new file versions in ADSM, eventually pushing
necessary, older ones out.
"My" regular full backup does not alter the
storage pools at all.
- for several reasons you will still
need the common off-site
storage pool in addition to the new approach,
so your backup effort and costs will not go down
- you will need to plan for a couple of locally
attached tape drives after a disaster
- these tape drives will probably not be of the 3590 type,
so you will have to introduce another tape technology,
such as DLT, in addition to your existing one
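For what it's worth, here is roughly what that would look like using the
3.7 backup set commands. The node, set, filespace, and device class names
are made up, and the exact client syntax for restoring from a locally
attached drive varies by release, so treat this only as a sketch:

```
# On the server: generate a backup set of the node's active files onto
# a device class the file server can read locally (names are examples)
generate backupset filesrv1 drset /data devclass=dltclass retention=60

# After a disaster, on the rebuilt file server: restore straight from
# the locally attached drive, with no ADSM/TSM server involved
dsmc restore backupset drset.12345 "/data/*" -location=tape -subdir=yes
```

The key point is the second command: the client reads the backup set tape
directly, which is why the file server can come back before the server does.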
Gewerbepark Urfahr 14 - 16
e-mail: sal < at > keba.co.at
P.S. I have a question about your process:
You create tapes in a storage pool in a tape library,
and during disaster recovery you restore
from these tapes using single tape drives.
How do you manage this?
As far as I know, one cannot move tapes among storage pools,
and one cannot mix libraries and single drives in one
device class => storage pool.
One cannot even combine two identical libraries!
From: Eric Winters [mailto:ewinters < at > AU1.IBM.COM]
Sent: Tuesday, 1 February 2000 05:18
To: ADSM-L < at > VM.MARIST.EDU
Subject: Designing a solution for fast manual restore
Dear ADSM community,
I'd be interested in hearing how others are handling requirements similar
to ours, or if you have any good ideas.
The implementation has a 3494 library and critical data resides on almost
three hundred and fifty 3590 cartridges which are part of a collocated
(node level) offsite storage pool. In the event of disaster, the file
systems would be restored via 4 external 3590 tape drives. This is of course
an extremely manual process. The objective is to minimise the manual
cartridge loading/unloading as much as possible within the constraints of
not having a library. To this end, the original system architect decided to
do a full backup once a week (thereby essentially collocating filesystem
data) and incrementals daily thereafter. Cartridges are moved offsite
daily. A disaster recovery test worked well.
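For readers unfamiliar with the setup: the weekly full is presumably a
scheduled selective backup run alongside the daily incrementals, something
along these lines (the filesystem names are illustrative):

```
# Daily: normal progressive incremental
dsmc incremental /fs1 /fs2

# Weekly: selective ("full") backup, which re-sends every file and so
# lands each filesystem's data together on freshly mounted tapes
dsmc selective "/fs1/*" "/fs2/*" -subdir=yes
```

The selective pass is what buys the de facto collocation, and, as described
below, it is also what floods the recovery log.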
This has worked up to now, but the volume of data being backed up is
increasing quickly: the full node backups are causing the recovery log to
fill, triggering DB backups, and recently this has also led to ADSM crashing.
How do others prepare for similar disaster recovery scenarios?
i.e. no automated cartridge handling and a requirement for fast restores of
large volumes of data.
I have considered collocation at the filesystem level coupled with
incremental backups, but I understand this would quickly lead to all
scratch tapes in the library being defined to the pool, even though they
might contain only small filesystems. I cannot allow this to happen, as
scratch tapes are required for other operations. Strictly speaking, I
understand I could define volumes to my offsite storage pool with
collocation enabled at the filesystem level, but this is not practical:
there are so many volumes for this storage pool being checked in and out,
I would be sure to run out of
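For context, a filespace-collocated copy pool would be defined along these
lines; MAXSCRATCH is the parameter that caps how many scratch volumes the
pool may claim, though as described above the sheer volume counts make this
awkward here. The pool and device class names are invented:

```
/* Copy pool collocated by filespace; MAXSCRATCH limits how many
   scratch tapes it can pull from the library */
define stgpool offcopy 3590class pooltype=copy collocate=filespace maxscratch=50
```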
My initial reaction was one of horror when I saw that a full backup was
being performed weekly; I had always assumed ADSM would be set to perform
incrementals ad infinitum. I'd be interested in hearing whether other sites
also run full backups, and what approaches others are taking to meet
Thanks for any input,
IBM Global Services Australia