Backup and Data Archiving
Expanded version: contains additional text not in the book


It is essential to back up the data on servers and other data systems throughout your network. This section describes a number of ways you can perform backups, including copying data to magnetic tape or optical disk and copying or replicating information to other systems. Before getting started, take note of the following terminology:

  • A backup is a copy of online storage information that provides fault protection. An archive is a historical backup.


  • An online storage device is a high-performance magnetic disk that stores information users access most often. Nearline and offline storage devices are slower, secondary storage devices that provide backup services or archiving services.

  • Hierarchical file systems move little-used files or large image files from online storage to nearline storage systems such as optical disk, where they remain available to users. For more information, see "Storage Management Systems."

  • Tape backup systems are the traditional backup medium, while optical disk systems meet archiving and nearline storage requirements.

  • Real-time backups take place at any time and must have a procedure for handling files that are open during backup. In most cases, the backup system tracks open files and returns to back them up later.

  • Disk mirroring is a real-time strategy that writes data to two or more disks at the same time. If one disk fails, the other continues to operate and provide access for users. Server mirroring provides the same functionality, except that an entire server is duplicated. This strategy allows users to continue accessing data if one of the servers fails. See "Fault Tolerance and High Availability" for additional information on these strategies.

  • Replication copies information to alternate servers on distributed networks to make that information more readily available to people in other locations. While replication is not necessarily a backup technique, replicated data on remote servers can be made available to local users should the server close to them go down.

  • Remote vaulting is an automatic backup technique that transmits data to alternate sites. The alternate sites can be more than just warehouses for backups. They may be entire data centers that can be brought online when the primary data center goes offline in the event of a major disaster.

Note: The remainder of this topic deals with traditional tape backup strategies. Advanced backup topics such as replication, fault management, fault tolerance, and high availability are discussed elsewhere. Refer to the links on the related entries page.

The traditional backup medium is magnetic tape. Tapes are relatively inexpensive, making it economical to devise an archiving scheme where you store tapes permanently at safe locations rather than reusing the tapes. You can refer to "Storage Systems" for additional information.

Here are some points to keep in mind:

  • Back up data regularly or whenever you make major upgrades to software, directory structures, and configurations.
  • Even if disk mirroring, server mirroring, and replication are implemented, you still need an archival storage mechanism that can restore corrupted data. For example, in a mirrored setup, corrupted data is written to all of the mirrored disks or servers at the same time. To recover, you might need to go to the most recent offline backup set.
  • Perform incremental backups of the files that have changed since you last made a major backup. If the information on your server changes constantly, you'll need to back up constantly.
  • Store a duplicate backup set at an offsite location to protect the backups from local disasters such as fires, earthquakes, and floods.
  • Before you put your server into service, back it up, then try restoring the information to make sure everything works and that you are familiar with the process.
  • A backup system must have a way to deal with files that are open at the time of backup. Where possible, schedule backups for hours when few files are likely to be open.
  • To minimize network traffic imposed by backups, attach backup systems to the systems that need to be backed up.
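
The point about open files can be sketched as a multi-pass loop: files that are busy on one pass are tracked and retried on the next. This is only a minimal illustration, not how any particular backup product works. The callbacks is_open and copy_file are hypothetical and would be supplied by the caller.

```python
from typing import Callable, Iterable, List, Tuple


def backup_with_retry(
    paths: Iterable[str],
    is_open: Callable[[str], bool],    # hypothetical: True if the file is in use
    copy_file: Callable[[str], None],  # hypothetical: copies one file to the backup medium
    max_retries: int = 3,
) -> Tuple[List[str], List[str]]:
    """Copy every file that is not open, retrying skipped files on later passes.

    Returns (backed_up, still_open).
    """
    backed_up: List[str] = []
    pending = list(paths)
    # One initial pass plus up to max_retries retry passes.
    for _ in range(max_retries + 1):
        deferred = []
        for path in pending:
            if is_open(path):
                deferred.append(path)  # track the open file and come back to it later
            else:
                copy_file(path)
                backed_up.append(path)
        pending = deferred
        if not pending:
            break
    return backed_up, pending
```

A real system would normally wait between passes and report any files left in still_open so that an operator can follow up.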

Backup Operators

Backup operators are the administrative personnel who handle backing up data. Keep in mind that they typically have access to all of your data. They can carry tapes offsite, and they have the rights/permissions to access all files on systems, giving them the opportunity to steal, corrupt, alter, or use the data for their own benefit.

Designate only trusted people as backup operators, and make sure their rights are limited to only those files and directories they need to back up. As an added precaution, you should make sure that an auditing system tracks and logs all activities of the backup operator.

Types of Backup

There are three types of backup: normal, incremental, and differential. The type of backup you choose depends on how many tapes you use, how often you want to back up, whether you are archiving tapes at a permanent storage location, and whether you rotate copies of your tapes offsite.

Normal Backup    A normal backup copies all the files selected for backup to a backup device and marks the files with a flag to indicate that they have been backed up. This method is the easiest to use and understand, because the most recent tape has the most recent backup. However, you'll need more tapes and more time for backup since all the selected files are backed up.

Incremental Backup    Incremental backup backs up only files that have been created or changed since the last normal or incremental backup. Files are marked with an archive flag so that they don't get backed up in the next backup unless they have been changed. This method requires that you create a normal backup set on a regular basis. If you need to restore from backup, you first restore the normal backup, then restore each incremental backup in order.

Differential Backup    With differential backup, you back up only files that were created or changed since the last normal (or incremental) backup. This saves time (and bandwidth on a network) because you don't need to back up all files. This method does not mark files with an archive flag to indicate that they have been backed up; consequently, the same files are picked up again by each later differential backup and by the next normal backup. If you implement this method, you should still create a normal backup on a regular basis. If you need to restore, first restore the normal backup, then restore the last differential backup tape.
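
The three types differ mainly in which files they select and whether they mark those files afterward. The following is a minimal sketch of that logic under stated assumptions: FileEntry, needs_backup, run_backup, and restore_plan are hypothetical names, and needs_backup is simply the inverse of the "has been backed up" flag described above. The restore_plan helper further assumes that incremental and differential backups are not mixed after the same normal backup.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FileEntry:
    name: str
    needs_backup: bool = True  # hypothetical stand-in for the archive flag


def run_backup(files: Dict[str, FileEntry], kind: str) -> List[str]:
    """Return the names selected by a backup of the given kind."""
    if kind == "normal":
        selected = list(files.values())  # everything, changed or not
    elif kind in ("incremental", "differential"):
        selected = [f for f in files.values() if f.needs_backup]
    else:
        raise ValueError(f"unknown backup type: {kind}")
    # Normal and incremental backups mark files as backed up;
    # a differential backup leaves the flag alone.
    if kind != "differential":
        for f in selected:
            f.needs_backup = False
    return sorted(f.name for f in selected)


def restore_plan(history: List[str]) -> List[int]:
    """Indexes of the backups to restore, in order, from a chronological history."""
    last_normal = max(i for i, k in enumerate(history) if k == "normal")
    later = list(range(last_normal + 1, len(history)))
    if later and history[later[-1]] == "differential":
        return [last_normal, later[-1]]  # normal plus the last differential only
    return [last_normal] + later         # normal plus every incremental, in order
```

For example, restore_plan(["normal", "incremental", "incremental"]) selects all three backups, while restore_plan(["normal", "differential", "differential"]) selects only the first and the last.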

The above backup procedures assume that files are being backed up one at a time. This is called a file-by-file backup. While such backups are usually slow, they allow the backup operator to back up individual files as they change between backup sessions. An alternative method is the image backup, which basically streams all the information on a disk without regard for the file structure to the backup medium. The advantage of this method is speed, but the entire volume must be backed up at the same time and the restore must usually be done on a disk that is physically the same as the original.

Stac Electronics (http://www.stac.com) has a product that bridges the gap between file-by-file and image backup. File-by-file systems are slow because each file must be opened, copied, then closed. Stac's Replica product uses a special technique to quickly copy volumes without the need to open and close every file. This provides backup speeds similar to image backups, but the disk is not copied sector by sector so there is no need to use a similar storage device in the event of a restore.

Another interesting product is SafeBack from Sydex, Inc. SafeBack is called a forensic backup system because it makes a full mirror image of a hard drive, including information in erased spaces, caches, and swap files. This information may be essential in tracking cybercrimes. Normal backups do not copy it and some backup routines destroy the data you might really need for forensics by continuing to write information to the drive. SafeBack works in DOS to avoid overwriting critical information.

Tape Rotation Methods

The number of backups you perform depends on the number of copies you want to keep, whether you want to keep onsite and offsite copies, and the age of the last backup (hours, days, weeks). You should consider a backup rotation method, which keeps incremental copies of backup data available.

The backup rotation method discussed here stores current and older data on a set of media that you can store in other locations, thus reducing the risk of losing your only backup set. If you have a five-day workweek, you need 20 tapes. Increase the number of tapes if you have six- or seven-day workweeks. Here are the key points of this rotation method:

  • Four tapes are labeled Monday, Tuesday, Wednesday, and Thursday. Use these tapes for incremental or differential backup.
  • Four tapes are labeled Week 1, Week 2, Week 3, and Week 4. Create a complete backup to these tapes every Friday.
  • Twelve tapes are labeled for each month of the year; back up to these tapes at the end of each month. These tapes are stored offsite.

To create a duplicate backup set that you can carry to an offsite location, double the number of tapes.
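
The rotation described above can be sketched as a function that maps a calendar date to a tape label. This is a minimal illustration with two assumptions that are not in the text: "the end of each month" is taken to mean the last weekday of the month, and the monthly tape takes precedence over the daily or weekly tape on that day. The names tape_for and last_weekday_of_month are hypothetical.

```python
import calendar
from datetime import date, timedelta
from typing import Optional

DAILY = ["Monday", "Tuesday", "Wednesday", "Thursday"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

TAPE_COUNT = len(DAILY) + 4 + len(MONTHS)  # 4 daily + 4 weekly + 12 monthly = 20


def last_weekday_of_month(d: date) -> date:
    """Last Monday-to-Friday date in the month containing d."""
    last = date(d.year, d.month, calendar.monthrange(d.year, d.month)[1])
    while last.weekday() > 4:  # 5 = Saturday, 6 = Sunday
        last -= timedelta(days=1)
    return last


def tape_for(d: date) -> Optional[str]:
    """Label of the tape to use on date d, or None on a weekend."""
    if d.weekday() > 4:
        return None
    if d == last_weekday_of_month(d):
        return MONTHS[d.month - 1]           # monthly tape, stored offsite
    if d.weekday() == 4:
        # A fifth Friday is always the last weekday of its month, so it is
        # caught by the monthly check above and this number stays in 1..4.
        return f"Week {(d.day - 1) // 7 + 1}"  # complete backup every Friday
    return DAILY[d.weekday()]                 # incremental or differential
```

Doubling TAPE_COUNT gives the 40 tapes needed for the duplicate offsite set.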

Note: This is only one example of a rotation method. You may need to alter this technique to fit your own needs.

With any backup system, you need to run a restoration test to ensure that your backup and restore procedures work. You might want to set aside spare servers and then run restoration tests using these servers on a regular basis. Before dismissing the concept of spare servers as an unjustifiable expense, consider how much a downed server could cost you in dollars and in customer dissatisfaction.
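
One way to check the result of a restoration test is to compare checksums between a reference copy of the data and the restored copy on the spare server. The sketch below is a minimal illustration using SHA-256 from the Python standard library. The function names and the two directory arguments are hypothetical, and it assumes the reference copy has not changed since the backup was taken.

```python
import hashlib
from pathlib import Path
from typing import List


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in 1 MB chunks to limit memory use."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_restore(original: Path, restored: Path) -> List[str]:
    """Return a list of files that are missing or differ in the restored tree."""
    problems = []
    for src in sorted(original.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(original)
        dst = restored / rel
        if not dst.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif file_digest(src) != file_digest(dst):
            problems.append(f"differs: {rel.as_posix()}")
    return problems
```

An empty list means every file in the reference tree was restored with identical contents. It does not check permissions, ownership, or files that exist only in the restored tree.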

Backup Management Systems

Automated backup systems include dedicated backup servers that automatically back up data to magnetic disk, jukebox tape libraries, and jukebox optical disk systems. They may provide hierarchical storage management functions as well. A typical automated system will run 24 hours a day and provide backup services for a number of clients (clients in this case are file and application servers that use the backup services). For example, ARCserve from Cheyenne Software runs on Novell NetWare servers and provides support for NetWare, Windows NT, UNIX systems, SUN systems, IBM/AIX, HP/UX, and SGI IRIX. It provides centralized backup administration, data compression, security, and a number of other features specific to the clients it supports. The Arback and Boole & Babbage products also support remote vaulting. A list of vendors that make backup products is presented on the related entries page.

There are many problems with tape backup. Files may have changed since the last backup, and these changes will be lost in the event of a restore. Also, the administrator is faced with creating and tracking incremental or differential backups. Tape backup is also slow. Because of these problems, many backup products now take advantage of server mirroring techniques and magnetic disk storage. To overcome the problem of file corruption being written to mirrored systems simultaneously, the arrangement includes additional backup servers that retain an older copy of the data, without the newest changes, for a period of time.

Figure Backup-1 shows a hierarchical backup system. Note that mirrored servers provide reliable real-time backup for the current state of all files.

Figure Backup-1. Hierarchical backup system

The backup server may back up files on the mirrored servers every day, every hour, or at other defined intervals. Files on the backup server move to offline optical disk storage and eventually to tape storage, which is carried offsite. In the event that corrupted information is written to the mirrored servers, users can fall back to the backup server. If corrupted files are not detected and are written to the backup server, the most recent good file can be obtained from the optical disk or tape archives. This system can be totally automated and run continuously so that the most recent uncorrupted copy of a file can be traced back through the backup server to the archive, as necessary.
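
The fallback described above amounts to searching the storage tiers from newest to oldest until a copy passes a validity check. The sketch below illustrates that idea only. Each tier is modeled as an in-memory dictionary, and is_valid is a hypothetical check (for example, a checksum or application-level test) supplied by the caller.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A tier is a (name, store) pair; the store maps file names to contents.
Tier = Tuple[str, Dict[str, bytes]]


def find_good_copy(
    name: str,
    tiers: List[Tier],                 # ordered newest first, e.g. mirror, backup server, optical, tape
    is_valid: Callable[[bytes], bool], # hypothetical corruption check
) -> Optional[Tuple[str, bytes]]:
    """Return (tier_name, data) for the most recent uncorrupted copy, or None."""
    for tier_name, store in tiers:
        data = store.get(name)
        if data is not None and is_valid(data):
            return tier_name, data
    return None
```

If the copies on the mirrored servers and the backup server are both corrupted, the search continues into the optical disk and tape archives, which matches the order in which you would fall back by hand.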




Copyright (c) 2001 Tom Sheldon and Big Sur Multimedia.
All rights reserved under Pan American and International copyright conventions.