BackupPC
Most open-source solutions are not designed to solve this particular set of problems. Some don’t support disconnected or Windows DHCP clients. Others don’t allow users to schedule their own backups or restores. Finally, most of these solutions require installing additional software on the machine to be backed up, which creates challenges with machines that are not centrally administered, such as Linux systems or Mac laptops.

This chapter was written by Don Harper, Global Linux Engineer for JP Morgan/Chase. Don is a second-generation computer professional (his father was Lead Systems Analyst for Shell Oil in the 70s) and has been making his living working with Unix systems since 1987, from work in startups to large multinationals. He can be reached at donald.m.harper@gmail.com.
BackupPC Features

BackupPC is an entirely disk-based backup and recovery system. It offers a number of advantages, some of which are available only with BackupPC.
How BackupPC Works

The BackupPC model has one user per client. This fits the usage pattern of the environment it was specifically designed for: backing up several users’ PCs (hence the name). That user should typically be the person who owns the data on the machine. In the case of a large file server, it should be an administrator. BackupPC emails the owner if it cannot back up the client after a configurable time, and the owner can control restores using the web interface. The following list describes how BackupPC works.
Installation How-to

The BackupPC server runs on Unix or Linux under the Apache web server using mod_perl, none of which is a difficult requirement. The BackupPC server is not designed to run under Windows, since Windows filesystems do not readily support Unix-style hard links. Prerequisites dictate how the backups are performed. Table 5-1 lists the high-level requirements and suggestions for tools to meet the requirements.

Table 5-1. BackupPC requirements
The amount of disk space needed depends on the amount and type of the data to be backed up. The more diverse the data, the more disk space is needed. A RAID subsystem is not required, but it is a good idea for performance, data protection, and scalability reasons. The server doesn’t need anything more than a 1.5 GHz processor with at least 512 MB of RAM. Since much of the backup computation is compression, more memory and a faster processor are needed for large numbers of clients. Of course, faster disk technologies and clean networks also improve performance.

Security versus Ease of Use

As is often the case, there is a trade-off between ease of use and security. For a home installation, ease of setup is usually a higher priority than security. The opposite may be true for a division of a global company. The default configuration has the BackupPC service running as its own user on the server machine and using pre-shared ssh keys without passwords to connect to the clients as root, or SMB using a password in the case of Windows. This may not fit with local site security policies. Other options include setting up an rsync server on the client using stored passwords, or a two-step process of ssh to a low-privileged client account followed by sudo with a configuration that allows only rsync (or tar) to be executed.

All of the BackupPC processes on the server run as one user ID. This user ID should have limited privileges on the system. The install process ensures the chosen user ID has the correct permissions on the data storage areas. If a new user ID is used, setting up the web interface requires a few extra steps. If the existing ID of the web server is used, there are fewer setup steps, but the system will be less secure. If you use a new user ID, create it before starting the install process.

Basic Sizing

The amount of space needed is highly dependent on the number and type of clients being backed up.
The more homogeneous the clients, the more effective the disk pooling is, and the less disk you need. The more diverse the clients, the less effective the disk pooling is, and more space is needed. Additionally, if the data is relatively static, less space is needed for the incrementals. With a lot of data change, the incrementals are larger. The retention policy also affects storage requirements; more full backups and longer retention times naturally increase storage requirements.

What NIS+?

A client called me after they had some problems with NIS+. The manager there (engineer, not IT) wanted to clean up unwanted files on his /var partition and found some handy “log” files to delete. (He removed the transaction logs for NIS+ on the NIS+ master.) NIS+ immediately recognized the fact that its transaction logs were no longer there and stopped. When I got onsite, I asked the backup administrator for the backup tapes. They did weekly fulls and no incrementals. I found that all four weeks of tapes were incomplete; they’d been getting errors on backup for more than six weeks and hadn’t done anything about it.

To determine the amount of space needed for backups, add up the disk usage of each client, then multiply by the number of full backups configured. The resulting number is the amount of space required for the full backups (prior to pooling and compression). Next, estimate what percentage of data will change for each incremental. Multiply this by the total amount of data, then by the number of incrementals configured. Add this to the number for the full backups. Compression and disk pooling reduce storage requirements by a factor of 6 to 8, so divide the total by this factor. To provide headroom as client storage grows and as more clients are added, you might want to start with 2 to 3 times this amount of storage.

For example, consider 100 laptops backing up user data that averages 4 GB per client, with each incremental averaging about 0.4 GB.
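The sizing procedure described above can be written as a short calculation. The following Python sketch is only an illustration: the function name and parameter names are hypothetical, it takes the average incremental size per client directly (as the laptop example does) rather than a change percentage, and it uses three full and six incremental backups as the retention counts for that example.

```python
def estimate_backup_storage_gb(clients, avg_data_gb, avg_incremental_gb,
                               full_count, incremental_count,
                               reduction_low=6.0, reduction_high=8.0):
    """Estimate BackupPC storage needs in GB (hypothetical helper).

    reduction_low/reduction_high are the pooling-plus-compression
    factors of 6 to 8 quoted in the text; real results depend on the data.
    """
    # Space for all retained full backups, before pooling and compression.
    full_gb = clients * avg_data_gb * full_count
    # Space for all retained incrementals, before pooling and compression.
    incremental_gb = clients * avg_incremental_gb * incremental_count
    raw_gb = full_gb + incremental_gb
    # Dividing by the larger factor gives the optimistic (smaller) figure.
    pooled_min_gb = raw_gb / reduction_high
    pooled_max_gb = raw_gb / reduction_low
    return {
        "full_gb": full_gb,
        "incremental_gb": incremental_gb,
        "raw_gb": raw_gb,
        "pooled_min_gb": pooled_min_gb,
        "pooled_max_gb": pooled_max_gb,
        # Headroom of 2 to 3 times the pooled estimate, as suggested above.
        "headroom_min_gb": 2 * pooled_min_gb,
        "headroom_max_gb": 3 * pooled_max_gb,
    }


# The laptop example: 100 clients, 4 GB each, 0.4 GB per incremental.
estimate = estimate_backup_storage_gb(100, 4.0, 0.4, 3, 6)
for name, value in estimate.items():
    print(f"{name}: {value:.0f} GB")
```

Treat the result as a planning range rather than a prediction, since the actual reduction from pooling and compression varies with how similar the clients are.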
Storing three weekly full backups takes around 1200 GB, and six incremental backups take another 240 GB, for a total of 1440 GB of raw data. Because of pooling and compression, approximately 180–240 GB of storage is needed. To support growth of the user data or the addition of more clients, a capacity of 500 GB or greater should be sufficient for current and future needs.

Because hard links are used to store identical files compactly, the entire data store must be on a single filesystem. Using a RAID array or LVM setup allows this filesystem to be expanded over time as needed.

Installing BackupPC

Once the server is identified and the prerequisites are installed and verified to work, it is time to get down to business. Head over to http://backuppc.sourceforge.net/ and download the latest tarball. Do not use the beta unless there is a specific need for it. If there is a patch file, download it as well. Move the tarball into a working directory and unpack it. Change directory to the newly created directory. If you have a patch file to apply, do that now:

    $ patch -p0 < [path to patchfile]

If you see an error message such as Hunk #1 FAILED at 58, something went wrong; verify that the patch file version matches the distribution you downloaded. If you are still unsuccessful, search or contact the mailing list for help (see “The BackupPC Community” at the end of this chapter for details).

The next logical step is to read the README file for any details that may have changed for this release. In addition, there is up-to-date and complete documentation in the doc/ subdirectory. Read through the BackupPC.html file and note the specific requirements listed in the installation section. Generally, a few Perl modules need to be installed, depending on how the system is configured.

Now, run the following command as root:

    # perl ./configure.pl

This process inspects the system and asks some installation questions.
These include: Verify that the list of programs has been fully qualified. If a program is not found but is not planned for use, it is safe to ignore it. If all the needed programs are listed, press Y. If a critical program is missing, press Ctrl-C to stop the installation, fix the problem, and rerun the configuration script.

At this point, most of the questions should be answered. The script asks whether you want to continue before actually modifying the system. Answering y here allows the script to create the necessary directory structure and install the scripts to run. As with any install script, watch the output for any errors.

Several init scripts are updated with settings based on the configuration responses but are not automatically installed. These scripts are located in the init.d subdirectory of the directory where the configure script was run. Copy the one that matches your system into the proper location for starting the service on boot.

Note that starting in version 3.0 of BackupPC, the configure.pl script complies with the Filesystem Hierarchy Standard (FHS). One change is that all the configuration files are by default stored below /etc rather than below the data store. As described earlier, any of the default locations can be changed when running the configure script. If you are upgrading for the first time to version 3.0 or later, the configure.pl script continues to use the original locations for the data store, configuration files, and program executables.

Installation packages

An alternative to the manual installation procedure described here is to find and install a BackupPC package specific to your operating system. Packages exist for Debian, Ubuntu, Gentoo, and others. For example, BackupPC can be installed on Debian with:

    # apt-get install backuppc

Starting BackupPC

The BackupPC server is started by using the init.d script created as part of the installation. This means it is automatically started after a reboot.
Using the CGI interface

By default, the CGI interface should be accessible via the URL http://localhost/cgi-bin/BackupPC/BackupPC_Admin. Depending upon your Apache setup, you might need to create an .htaccess file for user authentication.

Configuration Files

Starting in version 3.0, the host and configuration settings can be edited using the CGI interface. You can also configure the server by manually editing the configuration and host files. To do this, change directory to the data directory defined during installation, then change into the conf directory. In version 3.0 and later, the default configuration directory is /etc/BackupPC/conf.

In the conf directory are two files that must be edited before BackupPC is usable. The first file is the hosts file. This file contains all the hosts that the server will back up. The format of the file is:

    host dhcp user moreUsers

where host is the hostname of the client; dhcp is set to 0 if the machine can be found via normal name lookups, or 1 if the service needs to look in the DHCP pool; user is the name/email of the primary owner of that machine; and moreUsers is a comma-separated list of users who are able to access this host via the web GUI.

The file config.pl in this directory is the master config file. As the name implies, this is a Perl file, and all variables are set using Perl syntax, which allows arrays to be used for values. This file defines the maximum number of backups running at the same time ($Conf{MaxBackups}), the number of full backups to hold ($Conf{FullKeepCnt}), and how long to wait between full backups ($Conf{FullPeriod}). Read this file; it is well commented and includes many settings that you may want to change.

Per Client Configuration

It is very easy to override the default settings per client. Under the data directory, there is a pc directory with one directory per client. To have custom client settings, simply create a new config.pl in the appropriate directory with the settings needed.
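The override idea can be pictured as per-client settings layered on top of the main configuration. The Python sketch below is a conceptual illustration only, not BackupPC's own code (BackupPC reads Perl config.pl files). The function name, the host roles, and the example values are hypothetical, and the set of server-level keys that cannot be overridden is an assumption limited to the wakeup schedule.

```python
# Settings that describe the server itself; assumed here, for illustration,
# to be ignored when they appear in a per-client file.
SERVER_ONLY_KEYS = {"WakeupSchedule"}


def effective_config(main_conf, client_conf):
    """Return the settings a client would use (hypothetical helper).

    Starts from the main configuration and applies every per-client
    value on top, except keys listed in SERVER_ONLY_KEYS.
    """
    merged = dict(main_conf)  # copy so the main configuration is untouched
    for key, value in client_conf.items():
        if key in SERVER_ONLY_KEYS:
            continue  # server-level setting: the per-client value is ignored
        merged[key] = value
    return merged


# Example values are made up for this sketch.
main_conf = {
    "XferMethod": "smb",          # default suits Windows clients
    "FullKeepCnt": 3,
    "WakeupSchedule": [1, 2, 3],
}
linux_client_conf = {
    "XferMethod": "rsync",        # a Linux client overrides the transfer method
    "WakeupSchedule": [12],       # ignored: server-level setting
}

linux_effective = effective_config(main_conf, linux_client_conf)
windows_effective = effective_config(main_conf, {})
print(linux_effective["XferMethod"], windows_effective["XferMethod"])
```

A client with no per-client file simply inherits every value from the main configuration, which is why only the differences need to be written down for each host.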
As of version 3.0, per-client configuration files are stored below /etc/BackupPC/pc. The per-client configuration files are used to specify different transfer settings, passwords, exclude files, and number of backups to keep. For example, the main configuration file might specify the XferMethod as smb for the Windows client machines while the per-client configuration files for the Linux machines override XferMethod to tar or rsync. Almost any setting can be overridden in the per-client configuration file. The exceptions are any settings dealing with the server itself, such as the wakeup schedule.

The BackupPC Community

How big is BackupPC’s installed base? Given the nature of an open-source project, it is very hard to give exact figures, but here are some for an overview. From September 2001 until late February 2006, there were over 87,000 downloads of the core product from SourceForge. These downloads don’t include installations via packages for standard Linux distributions such as Debian. In that time, there have been over 4 million visitors to the SourceForge project page. The project ranks in SourceForge’s top 500 projects (out of over 110,000 projects). On Freshmeat, BackupPC has a user rating among the top 50 projects (out of over 40,000).

Sites of all sizes are currently running BackupPC, from home users to small businesses, nongovernmental organizations (NGOs), schools, universities, corporate departments, and large companies. Some of the largest BackupPC installations include a large school district with 1,500 clients and a division of a large company with 4,000 clients, each of which involves multiple servers with several terabytes of storage.

The BackupPC community is very responsive and helpful. There are a few ways to join and become active in the BackupPC community, and there are many places to turn if you run into problems. The BackupPC web site at http://backuppc.sourceforge.net/ provides many helpful tools for the community.
The documentation covers the configuration in depth and is kept current with any changes in the code. The SourceForge site also has a FAQ. If the answers cannot be found there, searching or posting to the mailing list normally solves any issue. The main mailing list is BackupPC@lists.sourceforge.net. The main developers are very active on the list, and they take time to help new folks out. The list itself is not high volume, but the quality of the discussion is high. The list is very forgiving of new users and works hard to help them out. The developers respond fairly promptly and, if the issue is a bug, work with the bug reporter to get the issue solved. When posting to the mailing list, try to include as much information as possible, such as the OS of the server and the client, a copy of the config files, and the error sections from the logs.

The Future of BackupPC

Features are steadily being added to BackupPC. Recent additions include a full CGI-based configuration editor and improved internationalization support.

Currently, the big development activity is around BackupPCd, a side project being worked on in tandem with the main server. BackupPCd is a client for the BackupPC server that will handle all client-side issues. It will also implement its own transport protocol with the server. This protocol is based on the rsync protocol, ensuring a reliable and efficient transport. BackupPC will continue to support the existing transport mechanisms such as SMB, tar over ssh, and rsync. Using BackupPCd, however, will allow ACLs and other file metadata to be backed up, avoid the need to install cygwin on Windows machines for rsync, allow a uniform backup protocol across different client operating systems, and provide better performance.

Note: According to a post on the backuppc mailing list, the BackupPCd client software is no longer being developed.
This page was last modified 18:55, 6 May 2008.