AMANDA
Amanda, the Advanced Maryland Automatic Network Disk Archiver, is the best-known open source backup software. Amanda was initially developed at the University of Maryland in 1991 with the goal of protecting files on a large number of client workstations with a single backup server. James da Silva was one of its original developers. The Amanda project was registered on SourceForge.net in 1999. Jean-Louis Martineau of the University of Montreal has been the gatekeeper and leader of Amanda development in recent years. Over the years more than 250 developers have contributed to the Amanda code, and thousands of users have provided testing and feedback, resulting in a stable and robust package. In April 2006 Amanda was estimated to be deployed at more than 20,000 sites worldwide. Originally Amanda was deployed in production mostly by universities, technical labs and research departments. Today, with the wide adoption of Linux in IT at large, Amanda is found in many other places, especially where the focus is on applications deployed on a LAMP stack. Amanda has also received multiple awards from users; for example, in 2005 it received the Linux Journal Readers’ Choice Award for “Favorite Backup System”.

Figure 4-1. Typical Amanda network.

Amanda allows you to set up a single master backup server to back up multiple Linux, UNIX, Mac OS-X and Windows hosts to a very large selection of tape, disk, and optical devices, including tape libraries, autochangers, optical jukeboxes, RAID arrays, NAS devices and many others. The following are a few real-life examples of Amanda in production.
Real life examples

One company uses 3 Amanda servers on CentOS in three countries to protect 30+ clients on Solaris, Linux and Windows. Different versions of Amanda have been in production there for 9 years as of this writing. The total amount of protected data is more than 500 GB, and data grows on average 8 GB per week. One of the sites backs up to disk only, and the other two back up to both disk and LTO autoloaders. System administrators recover files at least once per week because users erase files by accident. A few times over the years they lost servers because of failed hard drives, and Amanda came to the rescue for bare metal recovery.

A major university in the UK has 2 Amanda servers on Fedora Core with 100+ Linux (Fedora Core, Red Hat Enterprise Linux), Mac OS-X and Solaris clients holding more than 2 TB of data. One of the Amanda servers is dedicated to backup of SAP and Oracle on Solaris.

A cinematographic post-production company has 3 Debian Amanda servers on 2 sites protecting 84 Linux and IRIX clients with a total of 26 TB of data. They recover files about twice per week due to user error. In three years of production they had three instances of total volume loss despite using RAID arrays, and Amanda was able to recover all three lost volumes.

Throughout this chapter we will use more examples of real-life Amanda implementations. Based on feedback from many Amanda users with a great variety of configurations and different levels of Amanda expertise, we believe that the key reasons for the wide adoption of Amanda are:
Amanda is available from http://www.zmanda.com as a source code tarball and as RPMs for most common versions of Linux. Source code is also available from SourceForge.net at http://sourceforge.net/projects/amanda. Some older (but stable) versions of Amanda are packaged with all common Linux distributions, including Fedora Core, Red Hat Enterprise Linux, Debian, Ubuntu, OpenSUSE and SUSE Linux Enterprise Server, including releases for Itanium, IBM p-Series and even IBM S/390 and z-Series mainframes. Amanda documentation, including a Quick Start Guide and an FAQ written by users for users, is available on the Amanda wiki at http://wiki.zmanda.com.

To wrap up this introduction to Amanda, we want to share just one of many success stories where Amanda saved the day and made a difference. The story is told by long-time Amanda user Jon LaBadie.

In 1999 I began consulting for a small service organization within one of the US Government Departments. They used about 40 Windows PCs and 3 Sun servers, the latter running Oracle. For backups they used two separate commercial products and were unhappy with each. A fourth Sun server had already been purchased and tasks were being shifted around, including the UNIX backups. I was asked for suggestions for a replacement for their backup software before an additional copy was purchased and support contracts renewed. I did a bit of research and discovered Amanda. I installed it on my home systems, ran it for a week and suggested it to my management. But as was common at that time, management would not consider free software. Who would they get support from? What if something went wrong and it was discovered that free software was being used for such an important function as backup? How good can it be if it is free? Thanks, but no thanks. We’ll make the safe choice: pay our thousands of dollars for software we are not happy with, just because it is sold by a large company.
So they migrated their backups to a different server with some difficulty. Meanwhile, without telling my management, I started a parallel backup system with Amanda using the oldest Sun server and a spare DAT drive. About a month later the crisis happened. A directory tree from several weeks earlier was needed. I was not involved in the recovery, but I thought it was a good chance to compare recovery times from the two systems. About twenty minutes later I had used Amanda recovery to get what I thought they were seeking and copied it to a directory on their system under /var/tmp. From the other camp I heard much cursing and hair pulling all morning. In the afternoon I ended their torture and, pointing to the /var/tmp directory, asked “Is this what you need?” Later I learned that the problem with the commercial backups was that the backup tapes were keyed to the backup server: restores could only be made from the same server. The backups they needed had been made on the previous backup server, which now had neither the software installed nor a license. The backup tapes were basically worthless. Management then decided to give Amanda a try as their primary backup system. Eventually they also backed up the PCs using Amanda. The last time I checked, Amanda was still in use in that department.

Summary of important features

We will start with a brief overview of the Amanda architecture. This will help in understanding the most important concepts of Amanda functionality.

Client server architecture using non-proprietary tools

Amanda is designed to handle large numbers of clients and large amounts of data, yet is reasonably simple to install and maintain. In fact, it takes more time to order a pizza than to configure an Amanda server with two Linux clients and one Windows client and start a test backup. A white paper available at http://amanda.zmanda.com/quick-backup-setup.html describes in detail how to configure an Amanda backup in less than 15 minutes.
Amanda scales well up and down, so small configurations are possible – even a single client. Many users back up just a single client that is also the Amanda server. On the other hand, many Amanda users back up hundreds and even thousands of file systems (there can be multiple file systems per protected system) to a large tape library with multiple drives.

The Amanda code is written in C (with some Perl and shell scripts) and is portable to any flavor of Linux and UNIX, including Mac OS-X. Windows clients can be backed up today via Samba or via a Cygwin client (Cygwin is a Linux-like environment for Windows). The Amanda community is actively working on a native client for Windows. The new Windows client will take advantage of Microsoft technologies such as the Volume Shadow Copy Service (VSS), which provides snapshots of a system’s volumes, including snapshots of open files.

The biggest advantage of Amanda over other backup software is that Amanda does not use any proprietary data formats. Amanda uses standard operating system utilities such as dump and tar, or open source utilities available on many operating systems such as GNU tar, smbtar and Schily tar, and writes the same archive format to the media. You can mix and match these utilities depending on which one best matches your file systems, directories and files. Because these are standard utilities, you can be confident that they will always be available to you. Another advantage of using standard utilities is that in case of disaster recovery or any other emergency you can recover your data even without Amanda. (We will explain how to recover data without Amanda when we discuss Amanda restores.) Since Amanda uses standard utilities it provides the following:
From the system administrator’s perspective, it is very important that Amanda does not use any proprietary device drivers. Any device supported by the operating system works well with Amanda. In practical terms, this means that Amanda supports a wide range of tape storage devices and that new devices are usually not difficult to add. Many tape changers, stackers, jukeboxes and tape libraries are supported through tape changer scripts that provide truly hands-off, lights-out backup. Basically, if you can read and write to your tape drive and move tapes in your tape library with standard operating system commands such as mt and mtx, Amanda will work with your tape library. Another benefit of not using proprietary device drivers is that you don’t have to worry about breaking support for a device when upgrading to the latest version of Amanda.

To understand the Amanda architecture and inner workings, let’s take a look at a simplified Amanda configuration and review an example of a backup cycle.

Figure 4-2. Amanda server with two backup clients.

To simplify our discussion, let’s assume that we have only two Amanda clients running on two workstations: workstation Copper running Solaris and workstation Iron running Linux. Each workstation has two file systems with users’ data that we want to protect. The Amanda server Quartz is installed on a different Linux host and, for simplicity’s sake, we don’t back up the Amanda server itself. (In your production and evaluation environments you should always back up the Amanda server.) Let’s also assume that we want to run a full backup once every four days and incremental backups between full backups.

Amanda has a traditional client-server architecture. The Amanda server, historically also known as the tape host, is connected to a tape drive or tape changer either directly or over a Storage Area Network. Each client backup program is instructed to write to standard output, which Amanda collects and transmits to the tape server.
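The data path described above, together with the earlier point that images can be recovered without Amanda, can be sketched in a few lines of Python. This is a toy model, not Amanda's code: the hypothetical `client_dump` writes a plain tar stream into a pipe (standing in for the client program's standard output), the hypothetical `server_collect` stores it as an image file preceded by a 32 KiB header block, and `restore_without_amanda` skips that block and reads the rest as an ordinary tar archive. The 32 KiB header is an assumption modelled on how Amanda images are commonly recovered by hand (`dd bs=32k skip=1` piped into tar); the header text written here is simplified.

```python
import os
import tarfile
import tempfile
import threading

HEADER_SIZE = 32 * 1024  # assumed size of the header block in front of each image


def client_dump(path, out_fd):
    """Hypothetical client: write a tar stream of `path` to out_fd (its 'stdout')."""
    with os.fdopen(out_fd, "wb") as out:
        with tarfile.open(fileobj=out, mode="w|") as tar:  # streaming, no seeks
            tar.add(path, arcname=os.path.basename(path))


def server_collect(in_fd, image_path, label):
    """Hypothetical server: write a padded text header, then copy the stream."""
    header = f"AMANDA: FILE {label}\n".encode().ljust(HEADER_SIZE, b"\0")
    with os.fdopen(in_fd, "rb") as src, open(image_path, "wb") as dst:
        dst.write(header)
        while chunk := src.read(HEADER_SIZE):
            dst.write(chunk)


def restore_without_amanda(image_path):
    """Skip the header block and read the remainder with plain tar."""
    recovered = {}
    with open(image_path, "rb") as f:
        f.seek(HEADER_SIZE)  # same effect as dd bs=32k skip=1
        with tarfile.open(fileobj=f, mode="r|") as tar:
            for member in tar:
                if member.isfile():
                    recovered[member.name] = tar.extractfile(member).read()
    return recovered


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = os.path.join(tmp, "home1")
        os.mkdir(data_dir)
        with open(os.path.join(data_dir, "notes.txt"), "w") as f:
            f.write("hello from iron\n")
        read_fd, write_fd = os.pipe()
        client = threading.Thread(target=client_dump, args=(data_dir, write_fd))
        client.start()  # client and server run concurrently, like a real dump
        image = os.path.join(tmp, "iron._home1.0")
        server_collect(read_fd, image, "iron /home1 lev 0")
        client.join()
        return restore_without_amanda(image)


print(demo())  # {'home1/notes.txt': b'hello from iron\n'}
```

Because the stored image is nothing more than a header plus a standard archive, the last function needs no knowledge of the backup software that produced it, which is the property the text emphasizes.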
The client-server architecture provides these benefits:
Considering the ever-increasing importance of securing backup data from the privacy and compliance perspectives, let’s go over a brief overview of Amanda security.

Amanda security

Amanda clients communicate with the Amanda server via Amanda’s own network protocols on top of TCP and UDP. Amanda’s client-server communications do not suffer from the security holes inherent in the traditional rmt approach used by dump, such as relying on an .rhosts file in root’s home directory. As in every other client-server setup, you should ensure that only your own trusted Amanda server is able to communicate with Amanda clients. Amanda achieves this with the .amandahosts file. In Figure 4-2 there are three .amandahosts files: one on the Amanda server Quartz and one on each Amanda client. On the client side you add the name of the Amanda server (or Amanda servers, if you want the same host to be protected by multiple Amanda servers) and the Amanda user that is allowed to back up the client. For example, the .amandahosts file for Linux client Iron in Figure 4-2 should have the following entry:

quartz.zmanda.com amandabackup

This tells the Amanda client Iron to accept requests from user amandabackup on the Amanda server Quartz. During restores you need access to an Amanda server. For the configuration presented in Figure 4-2, the .amandahosts file on the tape server Quartz should have the following entries:

iron.zmanda.com root
copper.zmanda.com root

These entries tell the Amanda server to allow the root user on each client to run restores. For security reasons, Amanda was designed to allow only the root user to restore data. For stronger data transport security, Amanda can also use OpenSSH. This allows Amanda to protect the transfer of data between clients and the backup server with strong authentication and authorization mechanisms.
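The host/user pairing in .amandahosts can be modelled as a simple set lookup. The sketch below is a toy illustration of that rule, not Amanda's implementation: `parse_amandahosts` and `is_allowed` are hypothetical helpers, only the first two fields of each line are used (any further fields are ignored as a simplification), and host names are compared case-insensitively as an assumption.

```python
def parse_amandahosts(text):
    """Return the set of (host, user) pairs listed in an .amandahosts file.

    Only the first two whitespace-separated fields of each line are used;
    lines with fewer than two fields are ignored (a simplification).
    """
    allowed = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            allowed.add((fields[0].lower(), fields[1]))
    return allowed


def is_allowed(allowed, host, user):
    """True if `user` connecting from `host` appears in the parsed file."""
    return (host.lower(), user) in allowed


# Entries from the Figure 4-2 example.
iron_file = parse_amandahosts("quartz.zmanda.com amandabackup\n")
quartz_file = parse_amandahosts("iron.zmanda.com root\ncopper.zmanda.com root\n")

print(is_allowed(iron_file, "quartz.zmanda.com", "amandabackup"))  # True
print(is_allowed(iron_file, "other.example.com", "amandabackup"))  # False
print(is_allowed(quartz_file, "copper.zmanda.com", "root"))        # True
```

The point of the model is that both halves must match: the right host with the wrong user (or the reverse) is refused.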
The current 2.5 version of Amanda also features an abstracted secure communication API that makes it easy for developers to add different communication plugins between backup server and client. A single Amanda backup server can use different communication mechanisms for different clients.

To protect data on the backup media itself, Amanda 2.5 can encrypt backup data with symmetric or asymmetric encryption algorithms (using either aespipe or gpg). Encryption is very expensive in terms of CPU utilization, which is why Amanda encryption can be done either on the server or on the client. (Use it wherever you have more CPU cycles available.) In addition to relieving the Amanda server CPU, client-side encryption also secures data on the wire, which can be important when backing up remote clients. Because of CPU constraints you might choose to encrypt only certain data. Amanda is flexible enough to configure data encryption for a single directory or even a single file. If aespipe and gpg don’t meet your encryption requirements, Amanda will work with your custom encryption utilities. Amanda does not manage encryption keys. The system administrator must safeguard the keys and make them available during restores.

Amanda works with Security-Enhanced Linux (SELinux), and it also works reasonably well with common types of firewalls between the Amanda server and clients, as long as you select UDP and TCP port ranges during the initial setup. Please check the installation and configuration details for firewall setup at http://wiki.zmanda.com.

To conclude this brief overview of Amanda security, we want to emphasize that the flexibility of its security configuration allows Amanda to fit well into the security policies and processes of most IT environments, including organizations with strict security requirements.

Holding disk

You might recall that Amanda is actually an acronym, and the ‘D’ in Amanda stands for Disk.
To explain how Amanda moves data from a client to its final destination on tape or disk, we will introduce a very important Amanda concept: the holding disk. In Figure 4-2 you can see that Amanda server Quartz has a holding disk attached. A holding disk is one or several directories on any file system that is accessible from the Amanda server. It could be as small as a single 10 GB directory on your Amanda server drive, or as large as 5-10 TB on a Fibre Channel-attached RAID array. As the name suggests, the holding disk is used as a cache to store backup data from all Amanda clients. Each set of backup data from a client file system or client directory is just a set of files on the holding disk. Later, an independent Amanda process flushes individual backup images from the holding disk to tape or virtual tape at the maximum possible throughput to keep the tape drive streaming.

Using the holding disk as a staging area for backups has several benefits:

Modern tape drives are very fast. Even a Gigabit network cannot feed backup data from a single client through the Amanda server to a modern tape drive fast enough to avoid shoe-shining, which reduces throughput and shortens the life of the media and the drive. The holding disk collects data from all clients and, as soon as the first backup is complete, starts feeding data to tape as fast as the Amanda server can push it. However, many users prefer to complete the backup of all clients before they start flushing data to tape. Shoe-shining is covered in more detail in Chapter 21.

A holding disk can accept data streams from multiple clients in parallel to overcome the sequential nature of tape. Instead of writing one backup to tape after another, you can configure multiple backups to run in parallel and make full use of your available network bandwidth, thus reducing total backup time.
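The staging idea, many parallel dumps feeding one strictly sequential tape writer, can be illustrated with threads and a queue. This is only an analogy under stated assumptions: each thread stands in for a client dump, the queue stands in for the holding disk, and `run_backup` plus the DLE names and timings are hypothetical. Amanda itself uses separate dumper and taper processes coordinated by its driver, not this code.

```python
import queue
import threading
import time


def run_backup(dump_seconds):
    """Simulate parallel dumps into a holding disk drained by a single taper.

    dump_seconds maps a hypothetical DLE name to how long its dump takes.
    Returns the order in which images were written to the 'tape'.
    """
    holding_disk = queue.Queue()  # finished images waiting to be flushed
    tape = []                     # the tape is only ever appended to, one image at a time

    def dumper(dle, seconds):
        time.sleep(seconds)       # stand-in for the network transfer of the dump
        holding_disk.put(dle)     # the image is now complete on the holding disk

    def taper(count):
        for _ in range(count):
            tape.append(holding_disk.get())  # flush whichever image is ready next

    flusher = threading.Thread(target=taper, args=(len(dump_seconds),))
    flusher.start()
    dumpers = [threading.Thread(target=dumper, args=item) for item in dump_seconds.items()]
    for t in dumpers:
        t.start()                 # all dumps run at the same time
    for t in dumpers:
        t.join()
    flusher.join()
    return tape


order = run_backup({"copper:/home1": 0.3, "copper:/home2": 0.1,
                    "iron:/home3": 0.2, "iron:/home4": 0.05})
print(order)
```

Total wall time is roughly that of the slowest dump rather than the sum of all dumps, while the tape still receives complete images one after another.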
If the network becomes your performance bottleneck, you can reduce total backup time by adding another NIC to your backup server or by dedicating a separate network to backups. Using multiple holding disks can also improve overall backup performance.

Lastly, a holding disk provides additional safety in case you have a bad or wrong tape, or no available tape at all. Your backup will complete even if you forget to insert a new tape before taking the day off. The holding disk also preserves the backup when there is a media error during the backup run or not enough space on the tape.

Amanda supports different algorithms for moving data from the holding disk to the media, and the algorithm you choose affects how effectively the tape is used. Amanda supports multiple holding disks, so backup images from different clients can be sent to different holding disks. This increases the scalability of Amanda and provides better I/O load balancing, since holding disks can be on different controllers.

New Amanda users often ask how large the holding disk should be. In a typical “full and incrementals” backup cycle most backups are small incrementals, so even a modest amount of holding disk space can improve the flow of backup images to tape. A good rule of thumb is to have enough holding disk space for the two largest backup images at the same time, so one image can be coming into the holding disk while the other is being written to tape. For example, if in Figure 4-2 the full backup of both file systems on Copper is 50 GB and the full backup of both file systems on Iron is 30 GB, the holding disk on Quartz should be at least 80 GB. If that is not practical, any amount that holds at least a few of the smaller incremental backups is better than no holding disk at all. With today’s low disk prices, a good-sized holding disk is well worth the investment.
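Both ideas in this passage, the sizing rule of thumb and the choice of which holding-disk image to flush next, are small enough to sketch. In the sizing example the two per-host full backups from the text (50 GB and 30 GB) are treated as the two images. The selection function mirrors the option names of amanda.conf's `taperalgo` parameter (first, firstfit, largest, largestfit, smallest, last) as they are commonly described, but it is a hypothetical simplification: in particular, it returns None when no image fits instead of modelling what Amanda actually does in that case.

```python
def min_holding_disk(image_sizes_gb):
    """Rule of thumb from the text: room for the two largest images at once."""
    return sum(sorted(image_sizes_gb, reverse=True)[:2])


def pick_next(images, tape_left, algo):
    """Choose the next holding-disk image to flush, in the spirit of taperalgo.

    images: list of (name, size) in the order they arrived on the holding disk.
    tape_left: space left on the current tape, in the same units as size.
    Returns a (name, size) tuple, or None if nothing qualifies.
    """
    if not images:
        return None
    fits = [img for img in images if img[1] <= tape_left]
    if algo == "first":        # oldest image first
        return images[0]
    if algo == "firstfit":     # oldest image that still fits on this tape
        return fits[0] if fits else None
    if algo == "largest":
        return max(images, key=lambda img: img[1])
    if algo == "largestfit":   # largest image that still fits on this tape
        return max(fits, key=lambda img: img[1]) if fits else None
    if algo == "smallest":
        return min(images, key=lambda img: img[1])
    if algo == "last":         # newest image first
        return images[-1]
    raise ValueError(f"unknown algorithm: {algo}")


print(min_holding_disk([50, 30]))  # 80, as in the Copper/Iron example
waiting = [("copper:/home1", 40), ("iron:/home3", 10), ("iron:/home4", 25)]
print(pick_next(waiting, tape_left=30, algo="largestfit"))  # ('iron:/home4', 25)
```

The "fit" variants trade strict arrival order for better use of the space remaining on the current tape.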
On the other hand, some Amanda users have holding disks with significantly larger capacities. For example, a very large Japanese manufacturing company has 4 Amanda servers running on Solaris and BSD protecting more than a hundred Amanda clients on BSD, Windows, Linux, HP-UX and Solaris running Oracle. One of their holding disks is on a RAID array with a total capacity of 4 TB. Fast arrays and Amanda servers with high I/O throughput allow streaming from holding disk to tape at approximately 120 MB/second.

Amanda is flexible enough to run without a holding disk, but then backups can only be written to tape sequentially instead of in parallel to the holding disk. The lack of a holding disk will therefore significantly reduce backup performance.

If the holding disk is temporary storage for backup files, how does Amanda decide what to send to the holding disk in the first place? Let’s take a look at Amanda’s unique way of scheduling backups.

Backup scheduling

Most backup products provide basically the same backup scheduling. The system administrator configures the software to perform a full backup on Sunday, every other Sunday, or the last day of the month, with different levels of incrementals between full backups. The biggest problem with this approach is that it does not provide any load balancing. You have to make sure that enough resources are available to handle peak demand for backup server CPU, network, and I/O during full backups. Since you perform full backups only once in a while, your resources are under-utilized most of the time. More often than anybody wants to admit, the system administrator finds out on Monday morning that Sunday’s full backup did not complete because there were not enough tapes available in the library. On other Mondays you might find that your full backups are still running and users are calling you to kill all backups.
Of course, you can work out load balancing yourself by instructing your backup software to distribute full backups among your clients throughout the week or month, but then you have to make sure that changes in your environment, such as new clients, don’t break your balancing scheme.

Amanda provides a unique approach to scheduling that optimizes the load balancing of backups and simplifies your life. Instead of giving Amanda the exact instruction “Do a full backup every Sunday for clients A, B, and C and full backups on Wednesday for clients D, E, and F, and incrementals all other times”, you just set up a few ground rules that control Amanda scheduling. For example, you might give Amanda the rule “Do a full backup of every file system at least once every 7 days, and do incrementals on all other days.” The maximum time between full backups is called the dump cycle. For any dump cycle you specify, Amanda finds an optimal combination of full and incremental backups across all clients so that the total amount of backup data per backup run is as small as possible and consistent from one run to the next. To find such a balance Amanda uses the following considerations:
To calculate the optimal backup level, Amanda starts every backup run with the estimate phase. Every Amanda client runs a special process to determine which files have changed and the total size of all changed files. The estimate phase can take some time, especially if there are many clients and file systems. If some file systems are not very dynamic and their files don’t change much, you can tell Amanda so and save time during the estimate phase. After collecting data from all clients, Amanda goes into the planning phase and calculates the optimal combination of full and incremental backups for all clients.

Let’s take a look at how Amanda will schedule backups for the clients in Figure 4-2, assuming that each home directory is 100 GB, the data change rate is 15% per run, and the dump cycle is 4 days. For simplicity, let’s assume that Amanda writes each backup run to a new tape labeled DailySet1 to DailySet4 and that all incrementals are level 1, meaning everything that changed since the last full backup (level 0 is usually defined as a full backup).

Figure 4-3. Illustration of Amanda scheduling

For each run Amanda schedules a full backup of exactly 1/(dump cycle) of the total amount of data. Since our dump cycle is 4 days, for DailySet1 Amanda will do a full backup of ¼ of the data, in this case /home1. For DailySet2 Amanda will do a full backup of another ¼ of the data, in this case /home2, and an incremental backup of /home1, which is 15 GB (15% of 100 GB). For DailySet3 Amanda will do a full backup of /home3 and incrementals of /home1 and /home2. After the initial startup period of 4 days, Amanda will run a full backup of one of the /home directories and incremental backups of all the others. Let’s calculate the total amount of data on each DailySet tape.
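The per-tape arithmetic behind Table 4-1, including the overlap correction that the chapter explains alongside Figure 4-4, can be reproduced with a short script. It encodes only the chapter's simplified illustration (level-1 incrementals, one full backup per run, 100 GB per home directory, 15% change per run); the function names are hypothetical and this is not Amanda's planner, which also uses higher incremental levels.

```python
def incremental_gb(runs_since_full, size_gb=100.0, change_rate=0.15):
    """Level-1 size after `runs_since_full` runs, using the chapter's overlap rule:
    cumulative new changes minus the share of the previous incremental that
    changed again (to avoid counting it twice)."""
    per_run = size_gb * change_rate  # 15 GB per run in the example
    inc = 0.0
    for k in range(1, runs_since_full + 1):
        inc = per_run * k - inc * change_rate
    return inc


def tape_totals(runs, dles=("/home1", "/home2", "/home3", "/home4"),
                size_gb=100.0, change_rate=0.15):
    """GB written per run when DLE i gets its first full on run i and the dump
    cycle equals the number of DLEs (one full per run, as in the example)."""
    dumpcycle = len(dles)
    totals = []
    for run in range(runs):
        total = 0.0
        for first_full, _name in enumerate(dles):
            if run < first_full:
                continue  # start-up period: this DLE has not been backed up yet
            since_full = (run - first_full) % dumpcycle
            if since_full == 0:
                total += size_gb  # level 0
            else:
                total += incremental_gb(since_full, size_gb, change_rate)
        totals.append(round(total, 2))
    return totals


print(round(incremental_gb(2), 2))  # 27.75, the DailySet3 incremental of /home1
print(round(incremental_gb(3), 2))  # 40.84, the DailySet4 incremental of /home1
print(tape_totals(5))               # [100.0, 115.0, 142.75, 183.59, 183.59]
```

Under these assumptions the per-run total settles at the same value once every home directory is in the rotation, which is the "consistent from one run to the next" behavior the scheduling rules aim for.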
Table 4-1. Illustration of Amanda scheduling – total amount of data per DailySet tape.

It is trivial to calculate the total amount of data for DailySet1 and DailySet2. For the third run’s incremental of /home1, we have to consider that 15% of the data already backed up on DailySet2 (the 15 GB incremental, itself 15% of 100 GB) changed again. See Figure 4-4 for an illustration of this.

Figure 4-4. Clarification to total amount of data per tape discussion.

To avoid double counting we have to subtract that small overlap from 30 GB. So for DailySet3 the size of the incremental for /home1 will be 30 GB - (15 GB x 15%) = 27.75 GB. Following the same logic, for DailySet4 the incremental for /home1 is not 45 GB; it is 45 GB - (27.75 GB x 15%) = 40.84 GB. This example only illustrates Amanda’s approach to scheduling. In reality Amanda uses all nine levels of incremental backups to optimize the total amount of data on tape. In addition to a traditional scheme with full backups and incrementals in between, Amanda also supports:
It’s easy to support multiple configurations on the same Amanda server, such as doing traditional full backups and incrementals on a weekly basis, plus additional monthly full backups for off-site storage. Multiple configurations can run simultaneously on the same tape server if there are multiple tape drives.

When you decide on the length of your own dump cycle, take into consideration that shorter dump cycles, such as 3-4 days, make restores easier because there are fewer incrementals, but they use more tape and require more time to back up. Longer dump cycles allow Amanda to spread the load better over multiple tapes but may require more steps during a restore. More information about how to choose a reasonably balanced dump cycle depending on the amount of data, tape drive capacity, etc., is available at http://wiki.zmanda.com.

Let’s take a look at Amanda tape management.

Tape Management

Each tape must be labeled with the Amanda command amlabel before Amanda can use it. There is a default template for labels, but you can define your own label templates. Labeling prevents tapes holding valid backup images from being overwritten and allows the Amanda server to keep track of all labeled tapes. At present Amanda starts a new tape for each backup run (for example, each nightly backup) and does not provide a mechanism to append a new run to the same tape as a previous run. Based on your backup retention policy, Amanda keeps track of the expiration date of each labeled tape and will re-use a tape for new backups after it has expired. However, you can configure Amanda not to re-use specific tapes. You might choose to never expire some backup images and use Amanda for creating archives. (Amanda’s recent support for optical media is very useful for archiving.)
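The labeling and re-use rules described above can be modelled in a few lines. In amanda.conf the label template is the `labelstr` regular expression and the size of the rotation is governed by `tapecycle`; the sketch below borrows those ideas but is a hypothetical simplification. `protected` is a stand-in for the retention that tapecycle enforces (its exact off-by-one semantics are not modelled), and `valid_label`, `next_reusable_tape` and the DailySet1 labels are illustrative only.

```python
import re

# Hypothetical template in the style of amanda.conf's labelstr regular expression.
LABELSTR = r"^DailySet1-[0-9][0-9]*$"


def valid_label(label, labelstr=LABELSTR):
    """A label must match the template before the tape is accepted."""
    return re.search(labelstr, label) is not None


def next_reusable_tape(history, protected, keep_forever=frozenset()):
    """Return the oldest tape that may be overwritten, or None (label a new tape).

    history: labels in the order they were last written, oldest first.
    protected: number of most recently written tapes that must not be reused.
    keep_forever: tapes configured never to be re-used (e.g. archives).
    """
    expired = history[:max(0, len(history) - protected)]
    for label in expired:
        if label not in keep_forever:
            return label
    return None


tapes = [f"DailySet1-0{i}" for i in range(1, 6)]  # five labeled tapes
print(all(valid_label(t) for t in tapes))       # True
print(next_reusable_tape(tapes, protected=4))   # DailySet1-01
print(next_reusable_tape(tapes, protected=3, keep_forever={"DailySet1-01"}))  # DailySet1-02
print(next_reusable_tape(tapes, protected=5))   # None
```

Marking a tape as "keep forever" simply removes it from the pool of candidates, which is how archive tapes can live alongside the normal rotation.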
For backups of large amounts of data, Amanda supports using multiple tapes in a single backup run; for example, backups from clients A, B, and C could be written to one tape and backups from clients E, F, and G to another tape. In the past Amanda could not span multiple tapes for a single backup image, and system administrators had to break large file systems into smaller chunks, for example into several directories. Starting with version 2.5 this is no longer the case: tape spanning, perhaps the most awaited Amanda feature, lets a single backup image span multiple tapes. This removes a significant limitation and is a major step forward in terms of scalability and simplicity of use. The size of a backup image is no longer restricted to a single tape, and there is no need for the system administrator to artificially segment data into parts that fit on a single tape.

Device management

We already mentioned that Amanda does not use any proprietary drivers for tape or optical devices. You have to make sure your tape devices are configured as non-rewinding devices (e.g. /dev/nst0, /dev/nst1). You also have to select the tapetype definition specific to your tape drive technology. Many tapetype definitions are provided with Amanda. Here is an example of a tapetype definition for LTO-3:

define tapetype LTO3-400-HWC {
    comment "LTO Ultrium 3 400/800, compression on"
    length 401408 mbytes
    filemark 0 kbytes
    speed 74343 kps
}
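Two quick sanity checks follow from the tapetype values above: how many tapes a run needs now that images can span tapes, and how long it takes to fill one tape at the rated speed. The sketch assumes Amanda's units (1 kbyte = 1024 bytes, 1 mbyte = 1024 kbytes, kps = kbytes per second) and that the drive really sustains the listed speed; `tapes_needed` and `hours_to_write` are hypothetical helpers.

```python
import math

# Values copied from the LTO3-400-HWC tapetype above.
LENGTH_MB = 401408  # length 401408 mbytes
SPEED_KPS = 74343   # speed 74343 kps (kbytes per second)


def tapes_needed(run_mb, length_mb=LENGTH_MB):
    """Tapes required for one run when images may span tapes (Amanda 2.5+)."""
    return max(1, math.ceil(run_mb / length_mb))


def hours_to_write(data_mb, speed_kps=SPEED_KPS):
    """Time to stream data_mb to the drive at the rated speed, in hours."""
    return data_mb * 1024 / speed_kps / 3600


print(tapes_needed(500 * 1024))             # 2: a 500 GB run needs two tapes
print(round(hours_to_write(LENGTH_MB), 2))  # 1.54: hours to fill one tape
```

The second figure is also a reminder of why the holding disk matters: the drive only reaches that time if something can feed it at its full speed.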
(Amanda does not stop writing when it reaches the length value; it keeps writing to the tape until it gets an error. The length is used when planning how much data to schedule per run.) You will also have to select a tape changer script for your tape changer. Examples of tapetype definitions for the most commonly used tape drives, and details about configuring tape drives and tape changer scripts, are available at http://wiki.zmanda.com.

For a long time Amanda has been able to use disk as the target media for backup. Dedicated directories are used as virtual tapes called vtapes. You work with vtapes exactly the same way as with real tapes; for example, you have to label vtapes before Amanda can use them. There are several usage scenarios for vtapes:
The most interesting scenario is using tapes and disk at the same time. Amanda provides an interesting feature called RAIT, which stands for “Redundant Array of Inexpensive Tapes”. RAIT was initially designed to increase redundancy; it applies the same technology as RAID, striping data over several tapes instead of several disks. Amanda supports RAIT with 2-, 3- and 5-tape sets. A 3-drive RAIT writes 2 data streams and one parity stream, giving you twice the capacity and twice the throughput, while the probability of losing data drops to the order of the square of the per-tape failure rate, since you lose data only if at least two tapes are faulty or unavailable (for example, a 1/100 per-tape failure rate becomes about 3/10,000 for a three-tape set). Similarly, a 5-drive RAIT set gives you 4 times the capacity and 4 times the throughput.

A 2-drive RAIT duplicates the output stream, and each output stream can have either the same or different media targets. If you have the same media targets, for example 2 tape drives, you get exact copies of your backup data called clones. You can keep one clone on-site for occasional restores and take the other off-site for disaster recovery. If you have different media targets, then you can keep your backup data on disk for 2-3 weeks for occasional restores and keep a copy on tape for long-term retention. Most restores happen within 10 days after a file has been lost, so the ability to restore data quickly from disk becomes very important.

Now that you understand the important Amanda concepts, let’s take a look at how to configure an Amanda backup.

Configuring Amanda

Detailed instructions on how to install and configure the Amanda client and server are available at http://wiki.zmanda.com. Here we want to provide a configuration roadmap. The preferred way to install Amanda is from the RPMs found at http://www.zmanda.com. To compile the Amanda client from source:
To install the Amanda server, you can also use RPMs. If you want to compile from source:
While one machine can be both a client and a server, there is no need to perform both of the above procedures; installing the server normally includes the client.

Figure 4-5. Amanda configuration files.

The most important file for configuring your Amanda setup is amanda.conf. The example file is quite large, with more than 700 lines (which is why we don’t reproduce it here; see details at http://wiki.zmanda.com), but it is self-explanatory, with easy-to-follow comments and examples. This file defines HOW you do your backups by configuring the following parameters:
Instructions about what to back up are provided in the disklist file. For example, to back up the /home directories for clients Iron and Copper in Figure 4-5, we need the following disk list entries, often referred to as DLEs:

# hostname diskname dumptype
Copper /home1 stable
Copper /home2 stable
Iron /home3 normal
Iron /home4 normal

The word dumptype in a disk list entry refers to a dumptype that must be defined in the amanda.conf file. Dumptypes specify backup-related parameters, such as whether to compress the backups, whether to record backup results in /etc/dumpdates, the disk’s relative priority, exclude lists, etc. Here are sample definitions for the dumptypes stable and normal used for the Copper and Iron entries in the disklist file above:

define dumptype normal {
comment “gnutar backup”
holdingdisk yes # (on by default)
index yes
program “GNUTAR”
priority medium
}
define dumptype stable {
comment “ufsdump backup”
holdingdisk yes # (on by default)
index yes
program “DUMP”
priority medium
}
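The disklist and the dumptypes have to agree: every dumptype named in a DLE must be defined in amanda.conf. The script below is a hypothetical helper, not part of Amanda (Amanda’s own tools, such as amcheck, read and validate a real configuration); it only illustrates that relationship. It assumes simple three-field DLEs like the ones above and dumptypes written as `define dumptype NAME {`, and it runs against temporary copies of the sample files, so no Amanda installation is needed.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check (not an Amanda tool): every dumptype named in a
# disklist should be defined in amanda.conf. Handles only simple
# "hostname diskname dumptype" entries like the ones in this section.
set -euo pipefail

check_dumptypes() {
    local conf=$1 disklist=$2 defined
    # Collect names from lines such as "define dumptype normal {".
    defined=$(awk '$1 == "define" && $2 == "dumptype" { print $3 }' "$conf")
    # Skip blank lines and comments; print host, disk and dumptype.
    awk 'NF == 0 || $1 ~ /^#/ { next } { print $1, $2, $3 }' "$disklist" |
        while read -r host disk dumptype; do
            if grep -qxF -- "$dumptype" <<< "$defined"; then
                echo "OK      $host $disk $dumptype"
            else
                echo "MISSING $host $disk $dumptype"
            fi
        done
}

# Demo with the two dumptypes and four DLEs from this section.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cat > "$workdir/amanda.conf" <<'EOF'
define dumptype normal {
    comment "gnutar backup"
    program "GNUTAR"
}
define dumptype stable {
    comment "ufsdump backup"
    program "DUMP"
}
EOF
cat > "$workdir/disklist" <<'EOF'
# hostname diskname dumptype
Copper /home1 stable
Copper /home2 stable
Iron   /home3 normal
Iron   /home4 normal
EOF
check_dumptypes "$workdir/amanda.conf" "$workdir/disklist"
```

A DLE that names an undefined dumptype shows up as a MISSING line; in a real setup the fix is either to add the dumptype to amanda.conf or to correct the name in the disklist.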
Many parameters in amanda.conf have default values that you don’t have to edit, but because all parameters are available to you for editing, you have full control over your backup environment. A new Amanda user should plan for a learning curve of about 2-4 weeks before running a full production backup. This does not mean that a novice user will spend the whole month studying the Amanda wiki and reading the source code. In fact, it takes less than 15 minutes to configure an Amanda server with two Linux clients and one Windows client and to start a test backup. A white paper available at http://amanda.zmanda.com/quick-backup-setup.html describes this “Start Amanda backup in 15 minutes” benchmark in detail. However, you should allocate some time to get comfortable with Amanda’s functionality and to test your restores several times before going into production. For large sites it is a good idea to add one or two clients every day until all clients are protected by Amanda. So far we have discussed the most typical situation, with an Amanda client configured on the system to be protected. However, in various scenarios a system administrator may decide to mount a file system via NFS or Samba on the Amanda server and have the Amanda client running on the Amanda server itself back up these networked file systems.
We All Make Mistakes
I had just started using Amanda to back up the main server at a small web shop. I was feeling overconfident and wanted to listen to CDs on the server, so I tried to connect the audio cable of the CD player to the sound card of the machine, without shutting it down. I got the CD connected but happened to bump the SCSI cable between the three RAID5 disks and the RAID controller. This was the backup server, so when the RAID card refused to play and the server wouldn't boot, I was looking at a bare metal recovery of the backup server. Doh!
Fortunately the folks who built the server, VA Linux, were able to put me in touch with one of their engineers who knew the voodoo to tell the RAID card to ignore the fact that the disks were in an unknown state and to just bring them back online. I've since had similar situations with a different RAID card that refused to play due to a power loss during boot. That time I had to drive a very long way just to get the floppy with the RAID software and docs on it. Moral 1: Yes, even you can make mistakes. Moral 2: Keep your RAID docs and media with your RAID card.
Backing up clients via NFS or Samba (SMB/CIFS)
Compared to the traditional approach of running an Amanda client on the system to be protected, backing up via NFS or Samba has several advantages:
However, while considering this approach you should be aware of some trade-offs:
Backing up via NFS
Figure 4-6. Configuration issues with NFS-based backup.
You need to install and configure an NFS server on the target system and an NFS client on the Amanda server. Then export the file systems to be backed up by listing them in the /etc/exports file of the client system. You need to make sure that the Amanda server can access all the files that need to be backed up. In many cases this means turning on the no_root_squash option for the NFS share being backed up, so that the Amanda server can read all files. Note that the hostname in the corresponding disk list entry is the system where the NFS share is mounted (not the client system); in Figure 4-6 it would be Quartz.
Backing up via CIFS
Figure 4-7. Configuration issues while backing up a Windows-based system using Samba.
You need to install the Samba client on the Amanda server. You don’t have to explicitly mount the remote file system: Amanda is well integrated with the smbclient utility (an ftp-like client for accessing SMB/CIFS resources on servers). It uses the -T option of smbclient to create tar-compatible backups of all the files on an SMB/CIFS share. Amanda clears the archive bit of the files it backs up on the Windows target, which is what makes incremental backups possible. A user with full access rights (read/write) to the share must be created on the Windows system; in Figure 4-7 this is the user amandabackup. Amanda connects to the share as this user. If the user does not have full access, incremental backups will not work and the whole share will be backed up every time (because the archive bits are never cleared). Note that if any other program on the Windows system clears the archive bit of a file, Amanda will not back up that file during an incremental backup.
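The Samba setup described in this section comes down to three small pieces of configuration on the Amanda server: a credentials file, a dumptype, and a DLE. The sketch below writes them into a temporary directory rather than /etc and the Amanda configuration directory. All names in it are hypothetical (Windows host pc1, share data, password secret, workgroup WORKGROUP, dumptype samba-tar), except amandabackup, the Windows user from Figure 4-7. As we understand the Amanda 2.x conventions, /etc/amandapass holds one `//host/share user%password [domain]` line per share, and a GNUTAR dumptype whose disk name starts with `//` is backed up through smbclient; please check the documentation for your Amanda version before relying on these details. Restricting the credentials file to mode 600 is our own precaution, since it holds clear-text passwords.

```shell
#!/usr/bin/env bash
# Sketch of the Samba-related files on the Amanda server. Every name here is
# a hypothetical example: Windows host "pc1", share "data", password
# "secret", workgroup "WORKGROUP", dumptype "samba-tar". A real setup edits
# /etc/amandapass and the configuration's disklist; this demo writes into a
# temporary directory instead.
set -euo pipefail
confdir=$(mktemp -d)
trap 'rm -rf "$confdir"' EXIT

# Credentials file: //host/share user%password [domain], one share per line.
cat > "$confdir/amandapass" <<'EOF'
//pc1/data amandabackup%secret WORKGROUP
EOF
chmod 600 "$confdir/amandapass"   # it holds a clear-text password

# Dumptype for SMB shares (assumption: GNUTAR + "//" disk name -> smbclient).
cat > "$confdir/amanda.conf" <<'EOF'
define dumptype samba-tar {
    comment "Windows share via smbclient"
    program "GNUTAR"
    index yes
}
EOF

# DLE: the host is the machine that runs smbclient (the Amanda server,
# Quartz), and the disk name is the share itself.
cat > "$confdir/disklist" <<'EOF'
quartz //pc1/data samba-tar
EOF
```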
In addition to the standard Amanda configuration, you need to create the file /etc/amandapass on the system where the smbclient utility runs. This file contains the authentication information needed to access specific Windows shares. Also note that the hostname in the corresponding disk list entry is the system where smbclient runs, not the Windows system being backed up; in Figure 4-7 it would be the Amanda server Quartz. We want to reiterate that many Amanda installations protect Windows servers and PCs in production. For example, the Radiology Department at a large Midwestern university has been using Amanda since 1999. In the past their Amanda server ran on IRIX, AIX and Solaris, but the current Amanda server runs on Linux, with indices replicated to another server. They back up more than 70 Linux, Solaris, IRIX, Mac OS-X and Windows clients with a total of around 4 TB of backup data. The holding disk is 1.4 TB and the dump cycle is 90 days. All Windows clients are protected via Samba. Several times per month they recover files because of user error or hard drive failures, and they have never lost data because Amanda was always able to recover the lost files. That brings us to a brief overview of Amanda recovery.
Amanda recovery
amrecover and amrestore are the two programs for restoring Amanda backups. amrecover restores files through an interface that lets you browse your backup file index as of a certain date and choose the files you need to restore. Of course, to use amrecover you must enable indexing of backup files (index yes) in the dumptype in amanda.conf. After you make your selection, Amanda finds the required tape, looks for the backup image, decompresses the image if required, brings the image over the network to the client and pipes it into the appropriate restore program with the arguments needed to extract the requested files.
If you have to restore files from incremental backups, Amanda tells you the correct order of the tapes you need. For security reasons, amrecover must run as root on the client, and you should list root as the remote user in .amandahosts on the Amanda server. Full file system recovery should be done with amrestore, which retrieves whole file system images from tape. amrecover can be run on any client, including the Amanda server; amrestore can be run only on the Amanda server. You also have to use amrestore when no backup index is available. If your backup policy specifies backup of everything, including the operating system, you can do bare metal recoveries with Amanda:
The Amanda tape format is deliberately simple, so that in an emergency data can be restored without any Amanda tools. The first tape file is a volume label with the tape Volume Serial Number and the date it was written. It is not in ANSI format, but is plain text. Each file after that contains one image written in 32 KB blocks. The first block is an Amanda header with the client, disk area and options used to create the image. As with the volume label, the header is not in ANSI format but plain text. The image follows, starting at the next tape block, until end of file. Since the image header is text, it may be viewed with:
    # mt rewind
    # mt fsf NN
    # dd if=$TAPE bs=32k count=1
In addition to describing the image, the header contains text showing the commands needed to do a restore. Here’s a typical entry for the /home2 file system on copper.zmanda.com. It is a level 1 dump done without compression using the Solaris ufsdump program:
    AMANDA: FILE 20060418 copper.zmanda.com /home2 lev 1 comp N program /usr/sbin/ufsdump
    To restore, position the tape at start of file and run:
    # dd if=$TAPE bs=32k skip=1 | /usr/sbin/ufsrestore -f... -
To retrieve an image with standard UNIX utilities if amrestore is not available, position the tape to the image, then use dd to read it:
    # mt rewind
    # mt fsf NN
    # dd if=$TAPE bs=32k skip=1 of=dump_image
The skip=1 option tells dd to skip over the AMANDA file header. Without the of= option, dd writes the image to standard output, which can be piped to the decompression program, if needed, and then to the client restore program. If RAIT is used as the media, a shell script built from the dd and mt commands must be used to restore data from the tapes without Amanda commands. As with any backup system, you should test and retest your restore procedures so you know exactly what to do when disaster strikes. We have now explained the most important core functionality of Amanda.
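The dd recipe above can be tried without a tape drive. The following sketch builds a simulated tape file in an ordinary file: a 32 KB text header padded with NUL bytes, followed by a made-up two-block “dump image”. It then reads the header with count=1 and extracts the image with skip=1, just as the commands above do against $TAPE. The simulated header contains only the first header line shown above (a real header also carries the restore instructions), and random bytes stand in for ufsdump output. With a real tape you would first position it with mt rewind and mt fsf NN, and for a compressed image (comp other than N) you would pipe the dd output through the matching decompressor.

```shell
#!/usr/bin/env bash
# Simulate one Amanda tape file with a regular file and take it apart with
# dd, the same way the text does for a real tape. No tape drive is needed;
# "$TAPE" points at the simulated file instead of a positioned device.
set -euo pipefail
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
TAPE="$work/tapefile"

# 1. Header block: plain text padded with NUL bytes to exactly 32 KiB.
dd if=/dev/zero of="$work/header" bs=32k count=1 2>/dev/null
printf 'AMANDA: FILE 20060418 copper.zmanda.com /home2 lev 1 comp N program /usr/sbin/ufsdump\n' |
    dd of="$work/header" conv=notrunc 2>/dev/null

# 2. Made-up "dump image" of two 32 KiB blocks, appended after the header.
head -c 65536 /dev/urandom > "$work/image"
cat "$work/header" "$work/image" > "$TAPE"

# 3. View the header (first block only) and drop the NUL padding.
header=$(dd if="$TAPE" bs=32k count=1 2>/dev/null | tr -d '\000')
echo "$header"

# Split the header line into fields: datestamp, host, disk, level, etc.
read -r _ _ datestamp host disk _ level _ comp _ program <<< "$header"
echo "date=$datestamp host=$host disk=$disk level=$level comp=$comp program=$program"

# 4. Extract the image: skip=1 jumps over the 32 KiB header block.
dd if="$TAPE" bs=32k skip=1 of="$work/dump_image" 2>/dev/null
cmp "$work/image" "$work/dump_image" && echo "image extracted intact"
```

The final cmp confirms that dropping exactly one 32 KB block returns the original image byte for byte, which is why skip=1 is all the header handling dd needs.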
However, Amanda is a mature and feature-rich product that provides more functionality than we can explain in one chapter. For in-depth information about Amanda monitoring, reporting, self-checking, encryption and many other features, please use the resources described below.
Community and support options
Amanda is the only open source backup software with enterprise support, available from Zmanda, Inc. (http://www.zmanda.com). Support for Amanda is sold as a subscription service (very much along the lines of subscriptions from Red Hat and MySQL). Zmanda also offers select buyers of its Amanda Enterprise Edition subscription indemnification against intellectual property infringement claims. In addition, professional services for installing and configuring Amanda are available from Zmanda and several other organizations. Amanda documentation written by users for users is available at the Amanda Wiki at http://wiki.zmanda.com. Key features of this wiki are easy remote editing by multiple users, ongoing archival of changes, and search. The Amanda community uses various collaboration tools, including the Amanda forums at http://forums.zmanda.com. Amanda users also have a very friendly mailing list at amanda-users@amanda.org, with archives available at http://archives.zmanda.com/ and http://groups.yahoo.com/group/amanda-users.
Future plans
One of the main challenges in IT today is the overall security of systems. Since security is such a fundamental part of backup (especially when people lose unencrypted tapes), the Amanda community plans to continue hardening all aspects of security in Amanda. There is a fundamental shift in the backup industry, with disk becoming the primary media for backups. Even though Amanda was designed for backup to disk from the very beginning, the Amanda team plans many backup-to-disk improvements, such as multiple simultaneous backups and restores from disk.
Many Amanda users fight a constant battle with overwhelming data growth. Amanda has to be up to the task, and we are working on increasing scalability and performance. Wide adoption of open source products (especially Linux) brings Amanda into production environments with Oracle, MySQL, SAP and many other applications. Many users successfully deploy Amanda in such demanding environments, and the Amanda team is working on an application API that will simplify backup of those applications. Amanda has always strived to simplify the life of the system administrator, and we will continue to simplify installation, administration and recovery while giving the system administrator full control over how backups are done. As you can see from this short list, Amanda development continues to address the real-world requirements of real people. The Amanda development team and the Amanda community will continue to maintain and enhance this powerful and well-known software suite. This chapter was written by Dmitri Joukovski and Stefan G. Weichinger. Since so many great technical writers before us wrote excellent articles about Amanda, we want to give due credit to John R. Jackson, Alexandre Oliva, Æleen Frisch, Paul Bijnens, and many others who contributed to the wealth of published knowledge about Amanda. Many of their ideas have made it into this chapter.
This page was last modified 17:28, 24 May 2008.