BackupPC


Most small businesses and home networks now have a mix of machines to back up: desktops, servers, and laptops, often with a variety of operating systems. To this challenge, add the fact that they do not have tape robots or the money for a high-end tape solution. Now throw in the problem of laptops, which by design are mobile and may not be on the local LAN in the middle of the night when backups typically run.

Most open-source solutions are not designed to solve this particular set of problems. Some don’t support disconnected or Windows DHCP clients. Others don’t allow users to schedule their own backups or restores. Finally, most of these solutions require installing additional software on the machine to be backed up, creating challenges with machines that are not centrally administered, such as Linux systems or Mac laptops.

This chapter was written by Don Harper, Global Linux Engineer for JP Morgan/Chase. Don is a second-generation computer professional (his father was Lead Systems Analyst for Shell Oil in the 70s) and has been making his living working with Unix systems since 1987, from work in startups to large multinationals. He can be reached at donald.m.harper@gmail.com.

BackupPC Features

BackupPC is an entirely disk-based backup and recovery system. It offers a number of advantages, some of which are available only with BackupPC.

Support for any client OS
By using standard tools that either come with the base distribution or can be easily added to the system, it is possible to support a wide range of clients. In addition, there is no need to install any client software beyond standard system utilities (tar, ssh, rsync). Adding new client operating systems becomes easy, especially if the client is a Unix derivative like Mac OS X.
Web interface allows user control of and access to backups
Every major OS has a web browser, so using a web interface is another way to speed the process of supporting new operating systems. The web interface should be designed to give as much control to the client as possible and do it securely. A user should be able to request a restore without having to find the backup operator and easily browse and restore individual files. However, the user should not be able to see another user’s machine.
Support for DHCP and disconnected clients
Again, by using standard utilities, BackupPC supports DHCP clients as long as the client is registered with a name service such as DNS, Active Directory, or LDAP. The problem with hosts that roam from the network should be handled by testing for the client on the network and not raising an error until a set amount of time has passed.

How BackupPC Works

The BackupPC model has one user per client. This fits the usage pattern of the environment it was specifically designed for: backing up several users’ PCs (hence the name). The user should typically be the person who owns the data on the machine; in the case of a large file server, it should be an administrator. BackupPC emails the owner if it cannot back up the client after a configurable time, and the owner can control restores using the web interface. The following list describes how BackupPC works.

Direct to disk

BackupPC stores all its backups directly on disk. Identical files across any directory or client are stored only once, which dramatically reduces the server storage requirements. These files are stored in a disk pool. In addition to the disk pool, the backups are in a directory tree organized by host, then by backup with hard links to the disk pool. BackupPC also has a nightly process to reclaim space from the disk pool that is no longer referenced by any backups. This helps keep the overall disk usage from growing out of bounds. This is an automatic process which the administrator does not have to configure.
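The nightly reclaim step can be pictured with a short sketch. It assumes that a pool file whose hard-link count has dropped to 1 is no longer referenced by any backup tree; the function name, directory layout, and file names are hypothetical and are not taken from BackupPC's own code.

```python
import os
import tempfile


def reclaim_unreferenced(pool_dir):
    """Delete pool files that no backup tree links to; return bytes freed."""
    freed = 0
    for root, _dirs, files in os.walk(pool_dir):
        for name in files:
            path = os.path.join(root, name)
            st = os.stat(path)
            # A link count of 1 means only the pool entry itself remains.
            if st.st_nlink == 1:
                freed += st.st_size
                os.remove(path)
    return freed


def demo():
    """Build a tiny pool with one referenced and one orphaned file."""
    with tempfile.TemporaryDirectory() as top:
        pool = os.path.join(top, "pool")
        backup = os.path.join(top, "pc", "laptop1", "0")  # hypothetical layout
        os.makedirs(pool)
        os.makedirs(backup)
        for name, data in (("kept", b"still referenced"), ("orphan", b"12345")):
            with open(os.path.join(pool, name), "wb") as fh:
                fh.write(data)
        # Only "kept" is referenced from a backup tree.
        os.link(os.path.join(pool, "kept"), os.path.join(backup, "file.txt"))
        freed = reclaim_unreferenced(pool)
        return freed, sorted(os.listdir(pool))
```

Calling demo() removes only the orphaned pool file, because the other one still has a second link from the backup tree.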

Support for any client OS

The server portion of BackupPC is designed to run on a Unix-style system using perl and mod_perl under apache for best performance, but it can be run on any web server that supports perl and running perl CGIs. (It does require either mod_perl or setuid perl.) The server should have a large disk or RAID disk for backup storage. As for clients, almost any Unix or Unix-like OS can be easily backed up. Most modern versions of the commercial Unix variants (Solaris, AIX, IRIX, HP-UX) have tar, compress, gzip, rsync, rsh and/or ssh either in the base distribution or found easily on the Web. Other Unix operating systems (Linux, FreeBSD, OpenBSD, NetBSD, Mac OS X) also have these tools. Windows clients can be backed up in a few different ways. If the local policy prevents additional software being loaded, BackupPC can use part of the Samba suite (http://www.samba.org) to back up Server Message Block (SMB) shares on the client. If software can be installed locally, then rsync together with the Cygwin tool set (http://www.cygwin.com) can be used on the client.

Support for native tools

BackupPC uses standard Unix tools for its tasks. This includes programs like perl, tar, rsync, compress, gzip, bzip2, zip, apache, and samba. This makes porting the server to a new OS much smoother than trying to port C code. BackupPC does not use a database or catalog to store backup information. Instead, it uses the disk tree to store this information. This means that upgrading the operating system of the BackupPC server (or upgrading the BackupPC application itself) is painless.

User control of backup/restores through web interface

The web is the main interface for BackupPC. After the initial configuration, there is no need to have command-line access to the server to administer BackupPC. The web interface is written in perl and has been designed to run either under mod_perl or as normal CGIs running with setuid perl. The interface allows users to log in and control on-demand backups and restores. The user can request a one-time backup, a full backup, or an incremental backup. If the user needs to recover a file, there are a few options. Individual files can be downloaded simply by selecting them. Groups of files or directories can be restored back in place, or the user can download the files as a tar file or, if configured, as a zip file. The user has full control over which files or directories to restore and where to restore them. A history feature displays which files changed during each backup in each directory.

Support for DHCP and disconnected clients

Since BackupPC’s clients are referenced by hostname, if the network being backed up uses DHCP and has dynamic name resolution enabled, nothing further needs to be done for the BackupPC server to back up DHCP clients. If this is not the case and the clients are Windows machines, BackupPC can be configured to search an address pool for the clients, locating them via their SMB hostname.

If the client is not online during its normal backup period, the BackupPC server does not generate an error unless a set period of time has elapsed since the last successful backup. At that point, the server emails the owner of the client with a reminder to ensure the machine is on the network for a backup. (The server can also email any errors to the administrator.)

Clients that live on a remote LAN can be backed up by the local server as long as there is network connectivity between the sites, which means that clients connected via VPN can be backed up. If the user does not want a backup to run at that point, a trip to the web GUI can cancel the current backup. To fix the issue permanently, the user can also block out a period of time during which no backups run. BackupPC uses ping’s round-trip time to determine whether a client is on a remote network and won’t back up the machine if the round-trip time is longer than a configurable setting.
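The decision described above for roaming clients can be sketched as a small function. This is only an illustration of the logic in the text: the function name, the parameter names, and the default thresholds (20 ms and 7 days) are assumptions chosen for the example, not BackupPC's actual settings.

```python
from typing import Optional


def decide(ping_rtt_ms: Optional[float],
           days_since_last_backup: float,
           max_rtt_ms: float = 20.0,         # hypothetical threshold
           notify_after_days: float = 7.0    # hypothetical grace period
           ) -> str:
    """Return 'backup', 'skip', or 'email-owner' for one client.

    ping_rtt_ms is None when the client did not answer the ping.
    """
    reachable = ping_rtt_ms is not None and ping_rtt_ms <= max_rtt_ms
    if reachable:
        return "backup"
    # Offline, or too far away (slow link): stay quiet until the grace
    # period since the last successful backup has run out.
    if days_since_last_backup > notify_after_days:
        return "email-owner"
    return "skip"
```

A laptop that has been away for two days is skipped silently, while one that has missed its backups for ten days triggers a reminder to its owner.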

Backup pooling

If many clients use the same OS, many duplicated files will be backed up. Keeping multiple full backups increases the number of duplicate files, which increases the storage requirements for the server. BackupPC stores a directory tree per client backup but checks to see whether each file has been stored before from any client. If it has, it then uses a hard link to point to the existing file in the common disk pool, saving a great deal of space. In addition, BackupPC can optionally use compression to save more space. For example, on a server with nine clients (eight Linux machines and one Windows 2000 machine), backing up only system configuration and user files, 195 GB has been backed up before pooling and compression, but actual disk usage is below 40 GB. This is for two full backups and two weeks of daily backups per client. Pooling of common files and compression typically reduce the server’s disk storage requirements by factors of 6 to 8.
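The pooling idea can be illustrated with a minimal sketch. It assumes that pool entries are named by a SHA-256 digest of the file contents, which is a choice made for this example and not necessarily how BackupPC names its pool files; the function name and the host and path names are likewise hypothetical.

```python
import hashlib
import os
import tempfile


def store_in_pool(pool_dir, backup_path, data):
    """Store data at backup_path as a hard link into the shared pool."""
    digest = hashlib.sha256(data).hexdigest()  # assumed naming scheme
    pool_path = os.path.join(pool_dir, digest)
    if not os.path.exists(pool_path):
        # First time this content is seen: write it to the pool once.
        with open(pool_path, "wb") as fh:
            fh.write(data)
    os.makedirs(os.path.dirname(backup_path), exist_ok=True)
    # The per-client backup tree only holds a hard link to the pool file.
    os.link(pool_path, backup_path)
    return pool_path


def demo():
    """Back up the same file from two hypothetical clients."""
    with tempfile.TemporaryDirectory() as top:
        pool = os.path.join(top, "pool")
        os.makedirs(pool)
        content = b"identical system file\n"
        a = os.path.join(top, "pc", "linux1", "0", "etc", "services")
        b = os.path.join(top, "pc", "linux2", "0", "etc", "services")
        pool_path = store_in_pool(pool, a, content)
        store_in_pool(pool, b, content)
        same_inode = os.stat(a).st_ino == os.stat(b).st_ino
        return len(os.listdir(pool)), os.stat(pool_path).st_nlink, same_inode
```

After both clients are stored, the pool holds a single file with three links (the pool entry plus one per client), so the second copy costs only a directory entry. Hard links cannot cross filesystems, which is why the pool and the backup trees have to share one.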

Easy per-client configuration

After the administrator has defined what the site backup policies should be, it is very easy for her to override any configuration option on a per-client basis. This allows great flexibility on what, when, and how to back up a client. There are no classes of clients per se, but this can be achieved by symlinking configurations for clients from a master for the “class”.

Installation How-to

The BackupPC server runs on Unix or Linux under the Apache web server using mod_perl, which are not difficult requirements. The BackupPC server is not designed to run under Windows since Windows filesystems do not readily support Unix-style hard links.

Which prerequisites are needed depends on how the backups are performed. Table 5-1 lists the high-level requirements and suggestions for tools to meet the requirements.

Table 5-1. BackupPC requirements

| Function | Suggested tool | Needed for | Other tools | Notes |
|---|---|---|---|---|
| HTTP server | Apache | Control GUI | Any CGI capable HTTP server | - |
| CGI | Mod_perl | Speed | - | Optional |
| perl | Perl 5.8 | Server written in Perl | - | - |
| tar | GNU tar | Archive files in a tar container for transferring to server | Any command-line tar program | If tar method used |
| rsync | GNU rsync | Transport from client to server | Any command-line rsync program | If rsync method used |
| smbclient | Samba | Transport from client to server | - | If SMB method used |
| ssh | OpenSSH | Transport layer | - | If tar or rsync methods used |

The amount of disk space needed depends on the amount and type of the data to be backed up. The more diverse the data, the more disk space is needed. A RAID subsystem is not required, but would be a good idea for performance, data protection and scalability reasons.

The server doesn’t need anything more than a 1.5 GHz processor with at least 512 MB of RAM. Since much of the backup computation is compression, more memory and a faster processor are needed for large numbers of clients. Of course, faster disk technologies and clean networks also improve performance.

Security versus Ease of Use

As is often the case, there is a trade-off between ease of use and security. For a home installation, ease of setup is usually a higher priority than security. The opposite may be true for a division of a global company.

The default configuration has the BackupPC service running as its own user on the server machine and using pre-shared ssh keys without passwords to connect to the clients as root, or SMB using a password in the case of Windows. This may not fit with local site security policies. Other options include setting up an rsync server on the client using stored passwords or a two-step process of ssh to a low-privileged client account followed by sudo with a configuration that allows only rsync (or tar) to be executed.

All of the BackupPC processes on the server run as one user ID. This user ID should have limited privileges on the system. The install process ensures the chosen user ID has the correct permissions on the data storage areas. If a new user ID is used, setting up the web interface requires a few extra steps. If the existing ID of the web server is used, then there are fewer setup steps, but the system will be less secure. If you use a new user ID, create it before starting the install process.

Basic Sizing

The amount of space needed is highly dependent on the number and type of clients being backed up. The more homogeneous the clients, the more effective the disk pooling is, and the less disk you need. The more diverse the clients, the less effective the disk pooling is, and more space is needed. Additionally, if the data is relatively static, less space is needed for the incrementals. With a lot of data change, the incrementals are larger.

The retention policy also affects storage requirements; more full backups and longer retention times naturally increase storage requirements.

What NIS+?

A client called me after they had some problems with NIS+. The manager there (engineer, not IT) wanted to clean up unwanted files on his /var partition and found some handy “log” files to delete. (He removed the transaction logs for NIS+ on the NIS+ master.) NIS+ immediately recognized the fact that its transaction logs were no longer there and stopped. When I got onsite, I asked the backup administrator for the backup tapes. They did weekly fulls and no incrementals. I found that all four weeks of tapes were incomplete—they’d been getting errors on backup for more than six weeks and hadn’t done anything about it.

To determine the amount of space needed for backups, add up the disk usage of each client, then multiply by the number of full backups configured. The resulting number is the amount of space required for the full backups (prior to pooling and compression). Next, estimate what percentage of data will change for each incremental. Multiply this by the total amount of data, then by the number of incrementals configured. Add this to the number for the full backups. Compression and disk pooling reduce storage requirements by a factor of 6 to 8, so divide the total by this factor. To provide headroom as client storage grows and as more clients are added, you might want to start with 2 to 3 times this amount of storage.

For example, consider 100 laptops backing up user data that averages 4 GB per client, with each incremental averaging about 0.4 GB. Storing three weekly full backups takes around 1200 GB, and six incremental backups takes another 240 GB for a total of 1440 GB of raw data. Because of pooling and compression approximately 180–240 GB of storage is needed. To support growth of the user data or adding more clients, 500 GB or greater capacity should be sufficient for current and future needs.
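The arithmetic in this example can be reproduced with a short calculation. The function and parameter names below are made up for illustration; the reduction factors of 6 and 8 come from the pooling and compression estimate given earlier.

```python
def estimate_storage_gb(clients, full_gb, incr_gb, num_fulls, num_incrs,
                        reduction=(6, 8)):
    """Estimate raw and pooled storage needs in GB.

    full_gb and incr_gb are per-client averages; reduction is the
    (low, high) range of the pooling-and-compression factor.
    """
    fulls = clients * full_gb * num_fulls
    incrementals = clients * incr_gb * num_incrs
    raw = fulls + incrementals
    return {
        "fulls": fulls,
        "incrementals": incrementals,
        "raw": raw,
        # Best case divides by the larger factor, worst case by the smaller.
        "pooled_low": raw / max(reduction),
        "pooled_high": raw / min(reduction),
    }


if __name__ == "__main__":
    # 100 laptops, 4 GB each, 0.4 GB per incremental, 3 fulls, 6 incrementals
    for key, value in estimate_storage_gb(100, 4, 0.4, 3, 6).items():
        print(f"{key}: {value:.0f} GB")
```

Multiplying the pooled figures by 2 or 3 gives the headroom range suggested above.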

Because of the use of hard links to compactly store identical files, the entire data store must be on a single filesystem. Using a RAID array or LVM setup allows this filesystem to be expanded over time as needed.

Installing BackupPC

Once the server is identified and the prerequisites are installed and verified to work, it is time to get down to business. Head over to http://backuppc.sourceforge.net/ and download the latest tarball. Do not use the beta unless there is a specific need for it. If there is a patch file, download it as well.

Move the tarball into a working directory and unpack it. Change directory to the newly created directory. If you have a patch file to apply, do that now:

$ patch -p0 < [path to patchfile]

If you see an error message like Hunk #1 FAILED at 58, something went wrong; verify that the patch file version matches the distribution you downloaded. If you are still unsuccessful, search or contact the mailing list for help (see “The BackupPC Community” at the end of this chapter for details).

The next logical step is to read the README file for any details that may have changed for this release. In addition, there is up-to-date and complete documentation in the doc/ subdirectory. Read through the BackupPC.html file and note the specific requirements listed in the installation section. Generally, a few perl modules need to be installed depending on how the system is configured.

Now, run the following command as root:

# perl ./configure.pl

This process inspects the system and asks some installation questions. These include:

Full path to existing conf/config.pl

This is only used for upgrades. Press Enter for a new install.

Are these paths correct?

Verify that every program in the list has a fully qualified path. If a program is not found but is not planned for use, it is safe to ignore it. If all the needed programs are listed, press Y. If a critical program is missing, press Ctrl-C to stop the installation, fix the problem, and rerun the configuration script.

BackupPC will run on host

The script guesses the hostname. Correct as needed.

BackupPC should run as user

This is the user ID that all of the BackupPC processes will run as on the server. Either create a user with no special privileges or choose one with limited privileges.

Install directory (full path)

This is where the BackupPC program and library files will be stored.

Data directory

This is where the data store will be located. This should be on its own partition and preferably on its own disk or RAID array.

Compression level

This is the amount of compression used to store the backups. There is a trade-off between the amount of compression and speed. This value should be from 0 to 9, with 0 being no compression and 9 being the highest amount of compression with the most CPU usage. The default, 3, is a good middle ground.

CGI bin directory

The full path to the web server’s CGI bin.

Apache image directory

This is the directory which will hold the image files for the web GUI. This should be a directory that the web server can display.

URL for image directory (omit http://host; starts with ‘/’)

This is what the last part of the URL needs to be to display something placed in the image directory.
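Of the questions above, the compression level is the one with a measurable trade-off, and it can be explored before committing to a value. The sketch below assumes zlib-style levels 0 to 9, which match the range the installer asks for; the helper name and the sample data are invented for the example, and real backup data will compress differently.

```python
import zlib


def compressed_sizes(data, levels=(0, 3, 9)):
    """Return {level: compressed size in bytes} for the given zlib levels."""
    return {level: len(zlib.compress(data, level)) for level in levels}


# Highly repetitive sample text; purely illustrative.
sample = b"root:x:0:0:root:/root:/bin/bash\n" * 2000

if __name__ == "__main__":
    print("original:", len(sample))
    for level, size in compressed_sizes(sample).items():
        print(f"level {level}: {size} bytes")
```

Level 0 stores the data without compressing it, so its output is slightly larger than the input. Timing the same call on representative files gives a rough idea of the CPU cost of the higher levels.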

At this point, most of the questions should be answered. The script asks whether you want to continue before actually modifying the system. Answering y here allows the script to create the necessary directory structure and install the scripts to run. As with any install script, watch the output for any errors. Several init scripts are updated with settings based on the configuration responses but are not automatically installed. These scripts are located in the init.d subdirectory where the configure script was run. Copy the correct one into the correct location for starting the service on boot.

Note that starting in version 3.0 of BackupPC, the configure.pl script complies with the Filesystem Hierarchy Standard (FHS). One change is that all the configuration files are by default stored below /etc rather than below the data store. As described earlier, any of the default locations can be changed when running the configure script. If you are upgrading for the first time to version 3.0 or later, the configure.pl script continues to use the original locations for the data store, configuration files, and program executables.

Installation packages

An alternative to the manual installation procedure described here is to find and install a BackupPC package specific to your operating system. Packages exist for Debian, Ubuntu, Gentoo, and others. For example, BackupPC can be installed on Debian with

# apt-get install backuppc 

Starting BackupPC

The BackupPC server is started by using the init.d script created as part of the installation. This means it is automatically started after a reboot.

Using the CGI interface

By default, the CGI interface should be accessible via the URL http://localhost/cgi-bin/BackupPC/BackupPC_Admin. Depending upon your Apache setup, you might need to create a .htaccess file for user authentication.

Configuration Files

Starting in version 3.0 the host and configuration settings can be edited using the CGI interface. You can also configure the server by manually editing the configuration and host files. To do this, change directory to the data directory defined during installation. Then change directory into the conf directory. In version 3.0 and later, the default configuration directory is /etc/BackupPC/conf.

In the conf directory are two files that must be edited before BackupPC is usable. The first file is the hosts file. This file contains all the hosts that the server will back up. The format of the file is:

host dhcp user moreUsers

Here host is the hostname of the client; dhcp is set to 0 if the machine can be found via normal name lookups, or 1 if the service needs to look in the DHCP pool; user is the name/email of the primary owner of that machine; and moreUsers is a comma-separated list of users who are able to access this host via the web GUI.
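To make the four-field hosts format concrete, here is a minimal parser sketch. It assumes whitespace-separated fields, # comments, and an optional header row whose first field is the word host; those details, the function name, and the sample host and user names are assumptions for illustration.

```python
def parse_hosts(text):
    """Parse hosts-file text into a list of dicts, one per client."""
    hosts = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        fields = line.split()
        if fields[0] == "host":
            continue  # assumed header row naming the columns
        if len(fields) < 3:
            raise ValueError(f"expected host, dhcp, user: {raw!r}")
        more = fields[3].split(",") if len(fields) > 3 else []
        hosts.append({
            "host": fields[0],
            "dhcp": fields[1] == "1",  # 1 = search the DHCP pool
            "user": fields[2],
            "more_users": more,
        })
    return hosts


# Hypothetical example content.
SAMPLE = """\
host        dhcp    user    moreUsers
fileserver  0       admin
laptop7     1       alice   bob,carol   # roaming laptop
"""
```

Parsing SAMPLE yields two entries: fileserver, found by normal name lookup and owned by admin, and laptop7, located through the DHCP pool, owned by alice and also visible to bob and carol.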

The file config.pl in this directory is the master config file. As the name implies, this is a Perl file, and all variables are set using Perl syntax, which allows arrays to be used for values. This file defines the number of backups running ($Conf{MaxBackups}), the number of full backups to hold ($Conf{FullKeepCnt}), and how long between full backups ($Conf{FullPeriod}). Read this file; it is well commented and includes many settings that you may want to change.

Per Client Configuration

It is very easy to override the default settings per client. Under the data directory, there is a pc directory with one directory per client. To have custom client settings, simply create a new config.pl in the appropriate directory with the settings needed. As of version 3.0, per-client configuration files are stored below /etc/BackupPC/pc.

The per-client configuration files are used to specify different transfer settings, passwords, exclude files, and number of backups to keep. For example, the main configuration file might specify the XferMethod as smb for the Windows client machines while the per-client configuration files for the Linux machines override XferMethod to tar or rsync. Almost any setting can be overridden in the per-client configuration file. The exceptions are any settings dealing with the server itself, such as the wakeup schedule.
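The override behavior can be modeled as a simple dictionary merge. This is a conceptual sketch rather than BackupPC's implementation: XferMethod and FullKeepCnt are the setting names used in this chapter, while the function name and the key WakeupSchedule (standing in for the server-only wakeup schedule) are assumptions for the example.

```python
# Settings that belong to the server itself and so ignore per-client values.
SERVER_ONLY = {"WakeupSchedule"}  # assumed key name


def effective_config(main_conf, client_conf):
    """Return the settings in force for one client."""
    merged = dict(main_conf)
    for key, value in client_conf.items():
        if key in SERVER_ONLY:
            continue  # server-wide setting: the main value always wins
        merged[key] = value
    return merged


main = {"XferMethod": "smb", "FullKeepCnt": 2, "WakeupSchedule": [1, 2, 3]}
linux_client = {"XferMethod": "rsync", "WakeupSchedule": [12]}
```

With these inputs, effective_config(main, linux_client) switches the transfer method to rsync, keeps the site-wide FullKeepCnt of 2, and leaves the wakeup schedule unchanged.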

The BackupPC Community

How big is BackupPC’s installed base? Given the nature of an open-source project, it is very hard to give exact figures, but here are some for an overview. From September 2001 until late February 2006, there were over 87,000 downloads of the core product from SourceForge. These downloads don’t include installations via packages for standard Linux distributions such as Debian. In that time, there have been over 4 million visitors to the SourceForge project page. The project ranks in SourceForge’s top 500 projects (out of over 110,000 projects). On Freshmeat, BackupPC has a user rating among the top 50 projects (out of over 40,000). Sites of all sizes are currently running BackupPC, from home users to small businesses, nongovernmental organizations (NGOs), schools, universities, corporate departments, and large companies. Some of the largest BackupPC installations include a large school district with 1,500 clients and a division of a large company with 4,000 clients, each of which involves multiple servers with several terabytes of storage.

The BackupPC community is very responsive and helpful. There are a few ways to join and become active in the BackupPC community, and there are many places to turn if you run into problems.

The BackupPC web site at http://backuppc.sourceforge.net/ provides many helpful tools for the community. The documentation covers the configuration in depth and is kept current with any changes in the code. The SourceForge site also has a FAQ.

If the answers cannot be found there, searching or posting to the mailing list normally solves any issue. The main mailing list is BackupPC@lists.sourceforge.net. The main developers are very active on the list, and they take time to help new folks out. The list itself is not high volume, but the quality of the discussion is high. The list is very forgiving of new users and works hard to help them out.

The developers respond fairly promptly, and, if the issue is a bug, work with the bug reporter to get the issue solved. When posting to the mailing list, try to include as much information as possible, such as the OS of the server and the client, a copy of the config files, and the error sections from the logs.

The Future of BackupPC

Features are steadily being added to BackupPC. Recent additions include a full CGI-based configuration editor and improved internationalization support.

Currently, the big development activity is around BackupPCd. This is a side project being worked on in tandem with the work being done on the main server. It is a client for the BackupPC server that will handle all the issues of dealing with the client. It will also implement its own transport protocol with the server. This protocol is being based on the rsync protocol, ensuring a reliable and efficient transport. While BackupPC will continue to support the existing transport mechanisms such as SMB, tar over ssh, and rsync, using BackupPCd will allow ACLs and other file metadata to be backed up, avoid the need to install Cygwin on Windows machines for rsync, allow a uniform backup protocol across different client operating systems, and provide better performance.

Note: According to a post on the BackupPC mailing list, the BackupPCd client software is no longer being developed.