One of the most common questions I get about using Linux in a production environment is how to back the system up without resorting to non-open-source solutions.
Backing up Linux isn’t any different than backing up any of its closed-source counterparts. The options range from simply writing a script that copies your most critical files to a CD to making a complete copy of the system with an imaging program such as Ghost. These options are great if you are backing up an end-user system, such as your laptop, but they don’t really meet the requirements of a sound backup strategy when a production server is the concern.
To answer that question, I thought I’d compare and contrast the two open source Linux backup packages I use day to day: Amanda and Bacula.
Amanda is the most popular open source backup solution for Linux. I have used it for years. What sets Amanda apart from most other backup systems is how it handles media. Most backup products just ask the user what they wish to save and start writing this data to the selected media without regard to its storage capacity. When Amanda is about to perform a backup, it queries all the selected systems in the backup job and asks how much storage each one will require for a full or an incremental backup. It then looks at the size of the backup media and determines how many of each type of backup it’s going to make. This leads to very efficient use of the backup media. Within the configuration files you can specify the maximum amount of time Amanda may go between full backups of a system, so that an excessive amount of time doesn’t pass. What this means, in the end, is that Amanda automatically adjusts to ensure that any restore requires as few tapes as possible.
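To make the idea concrete, here is a minimal sketch of that kind of planning in Python. It is not Amanda's actual algorithm; the Host fields, the plan_run function, and the greedy "oldest full first" rule are all my own assumptions for illustration. Each system reports an estimated full and incremental size, and the planner upgrades as many systems to a full backup as the tape will hold, forcing a full for any system that has reached the maximum interval.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical per-system estimate gathered before a backup run."""
    name: str
    full_size: int        # estimated size of a full backup, in GB
    incr_size: int        # estimated size of an incremental backup, in GB
    days_since_full: int  # days since this system last had a full backup


def plan_run(hosts, tape_capacity, max_days_between_fulls):
    """Pick a backup level per host for one run (toy greedy planner)."""
    # Start from the cheapest plan: an incremental for everyone.
    plan = {h.name: "incremental" for h in hosts}
    used = sum(h.incr_size for h in hosts)

    # Consider the hosts whose last full backup is oldest first.
    for h in sorted(hosts, key=lambda h: h.days_since_full, reverse=True):
        extra = h.full_size - h.incr_size
        overdue = h.days_since_full >= max_days_between_fulls
        # Overdue hosts get a full regardless; others only if it still fits.
        if overdue or used + extra <= tape_capacity:
            plan[h.name] = "full"
            used += extra
    return plan, used


hosts = [
    Host("web", full_size=40, incr_size=5, days_since_full=6),
    Host("db", full_size=60, incr_size=10, days_since_full=2),
    Host("mail", full_size=30, incr_size=3, days_since_full=7),
]
plan, used = plan_run(hosts, tape_capacity=80, max_days_between_fulls=7)
print(plan, used)
```

With these made-up numbers, mail and web get full backups, db gets an incremental, and the 80 GB tape is filled exactly. A real planner has to cope with overdue fulls that do not fit on the tape, multiple incremental levels, and compression, none of which this sketch attempts.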
Bacula shares many of Amanda’s design goals, yet the two are very different. First, Bacula has a much more complicated architecture. Unlike Amanda, which employs a central controlling server and local clients, Bacula uses a central director program, client programs running on each machine to be backed up, and distributed storage servers. You can set up your backup strategy on one machine, which talks to the other machines to obtain their backups and may then distribute these among a collection of storage servers to take advantage of local tape drives or disk space. Like Amanda, Bacula supports a large collection of different tape drives and libraries. Its more complex architecture supports a degree of parallelism, so a single director can organize many backups at the same time.
The second big difference is that Bacula doesn’t rely on any other backup program; it has its own built-in backup format that can handle many different types of systems. It should always be possible to restore a Bacula backup to a system other than the one it originally came from. A third difference is that Bacula is more like a conventional backup program in that it attempts to take full and incremental backups according to a predefined schedule. Because more than one job can append to the same volume, it’s possible to have one tape volume per day that holds a number of full and incremental backups.
Both Amanda and Bacula can back up systems and restore them later, which is what backup software is supposed to do. However, both packages have other features that may be useful to different people. Starting with Amanda, there’s integrated support for encryption. The encryption options available with Amanda include not only the ability to encrypt the backup media but also the ability to encrypt the data as it travels over the network. This is an important feature on all the network security surveys I have seen.
Amanda supports either symmetric (secret-key) encryption, where the same key is used for encryption and decryption, or public-key encryption. This is a powerful feature if you need to produce secure backups of sensitive information, such as Social Security numbers, credit card information, or medical records.
Amanda also has support for RAIT (Redundant Array of Inexpensive Tapes). RAIT is similar to RAID: one tape contains the checksum data, which allows you to recover the information if any one tape in the set fails. For example, the Amanda website claims that you can set up a three-drive RAIT system that writes two data streams and one checksum stream, giving you twice the capacity, twice the throughput, and the square of the failure rate (so a 1-in-100 rate becomes 1 in 10,000), since two tapes must fail before any data is lost. I have never used this, but if you need the highest possible assurance that you can recover the data set in the future, it might be worth investigating.
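The checksum idea is easier to see in a few lines of code. The sketch below is only an illustration of the parity principle, assuming a simple XOR checksum across two data streams; it says nothing about how Amanda actually lays data out on tape, and the stripe and xor_bytes helpers are names I made up.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def stripe(data: bytes):
    """Split data into two data streams plus one parity stream."""
    if len(data) % 2:
        data += b"\0"  # pad so both halves have the same length
    half = len(data) // 2
    tape_a, tape_b = data[:half], data[half:]
    return tape_a, tape_b, xor_bytes(tape_a, tape_b)


tape_a, tape_b, parity = stripe(b"backup payload!!")

# Pretend tape A is unreadable: rebuild it from tape B and the parity tape.
rebuilt_a = xor_bytes(tape_b, parity)
assert rebuilt_a == tape_a

# The same works the other way round if tape B is the one that fails.
rebuilt_b = xor_bytes(tape_a, parity)
assert rebuilt_b == tape_b

# Data is lost only if two tapes fail. With independent 1-in-100 failures,
# the quoted "square of the failure rate" figure works out as follows.
print(0.01 * 0.01)
```

Any single stream can be rebuilt from the other two, which is why one failed tape costs you nothing. The 1-in-10,000 figure rests on the assumption that tape failures are independent of one another.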
Bacula has a host of advanced features that build on its more complex architecture. Compared with Amanda, Bacula makes it easier to set up archiving routines within your backup setup. One major advantage of Bacula is that you can use a scripting language to control how it works. With Python and Bacula you can program backup jobs with powerful results. One example of this might be using GPG to encrypt the backup before writing it to the backup media. So, even though Bacula doesn’t include encryption natively, scripting lets you extend the system yourself.
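As a rough idea of what such a script might look like, here is a small Python sketch that encrypts a file with GPG before it is handed to the backup. How the script gets hooked into a Bacula job is not shown, and the function names, file paths, and recipient address are all hypothetical. It assumes the gpg command-line tool is installed and that the recipient's public key is already in the keyring.

```python
import subprocess


def gpg_encrypt_command(src, dest, recipient):
    """Build the gpg command line that encrypts src to dest for recipient."""
    return [
        "gpg",
        "--batch",                 # never prompt; suitable for unattended jobs
        "--yes",                   # overwrite dest if it already exists
        "--recipient", recipient,  # public key to encrypt to
        "--output", dest,
        "--encrypt", src,
    ]


def encrypt_file(src, dest, recipient):
    """Run gpg and raise CalledProcessError if the encryption fails."""
    subprocess.run(gpg_encrypt_command(src, dest, recipient), check=True)


# Hypothetical paths and key ID, shown only to illustrate the call.
cmd = gpg_encrypt_command("/tmp/dump.tar", "/tmp/dump.tar.gpg", "backup@example.com")
print(" ".join(cmd))
```

Encrypting to a public key means the backup server never needs the private key, so an unattended job can encrypt without being able to decrypt. Calling encrypt_file with the same arguments would actually run the command.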
Whether you choose Amanda or Bacula will depend greatly on your individual network setup. If you have a single server with a tape drive that you use to back up all the rest of your systems, I would suggest Amanda. On the other hand, if you have multiple servers, each with their own tape drives, and a limited backup window, I would suggest Bacula. For those familiar with commercial backup products, it may help to think of Amanda as comparable to Retrospect and Bacula as comparable to Veritas Backup Exec.