Clustering (Linktionary term)

Clustering

Note: Many topics at this site are reduced versions of the text in "The Encyclopedia of Networking and Telecommunications." Search results will not be as extensive as a search of the book's CD-ROM.

Clustering is a fault-tolerant server technology that provides availability and scalability. It groups servers and shared resources into a single system that can provide immunity from faults. Improved performance is a byproduct of such a system. Clients interact with a cluster of servers as if it were a single system. Should one of the servers in the cluster fail, the other servers can take over its load.

An interesting aspect of clustering is that it can provide "four nines" of availability (99.99 percent uptime), which translates to about 53 minutes of downtime per year (0.01 percent of the 525,600 minutes in a year). A complete description of availability is under "Fault Tolerance and High Availability."
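The downtime figure follows from simple arithmetic, which can be sketched as below. This is only an illustration: the function name `downtime_minutes_per_year` is made up for this example, and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years are ignored


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime implied by an availability percentage."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


# Compare a few common availability levels, including "four nines".
for label, percent in [
    ("two nines", 99.0),
    ("three nines", 99.9),
    ("four nines", 99.99),
    ("five nines", 99.999),
]:
    minutes = downtime_minutes_per_year(percent)
    print(f"{label:12s} {percent:7.3f}%  {minutes:10.2f} min/year")
```

Each additional "nine" cuts the allowed downtime by a factor of ten.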

Before discussing clustering, it is important to put the range of network server/storage devices and configurations into context. Here are the typical arrangements of servers and storage devices on enterprise networks and Web sites on the Internet:

SMP (symmetric multiprocessing) systems: A single system with multiple processors, multiple power supplies, multiple network interface cards, and multiple storage devices that provide "local" fault tolerance (if one processor, power supply, or interface fails, the others remain operational), but not disaster tolerance (flood or fire). These systems can provide scalable performance, but do not provide an ideal scalable storage environment.

LAN-based server configurations: This is the traditional local or wide area network in which servers are attached to the LAN at various locations in a building or over a MAN or WAN. Replication can be used to copy data to different locations in the same building or the extended network. This provides protection against local disasters such as fires and equipment failures. However, the systems are loosely coupled and do not provide the performance and management benefits of the systems discussed next.

Clustering: A clustered system is a set of servers and attached storage devices that are in the same location and in a configuration that provides fault tolerance against the failure of any one device. All of the servers appear as a single server to users. Requests are balanced between the servers, either by external load-balancing devices or by software provided in the operating systems. All servers have access to all the storage devices, so if one server goes down, the hard drives are still accessible through other servers in the cluster.

SAN (Storage Area Network): A SAN puts storage devices on their own high-speed network, typically Fibre Channel. Because all the devices are attached to the network, any storage device becomes easily accessible. The switching nature of the network allows servers to make direct connections to any device. SANs are often created with multiple clustered systems. See "SANs (Storage Area Networks)" and "Fibre Channel."

NAS (Network Attached Storage): A NAS is now commonly referred to as a network appliance. The basic concept is to remove the server altogether and attach storage directly to the network, thus reducing cost and simplifying management. While a SAN is typically a system defined for enterprise use, a NAS is often installed at the department level. This concept relies on the idea that with standard document formats and XML, proprietary file formats used by servers are no longer needed and the server can be removed altogether. According to George Gilder, NASs are all about removing storage from the enslavement of server operating systems. See "NAS (Network Attached Storage)."

Note that clusters, SANs, and NASs all separate storage from servers, thus allowing any server to access any storage device. In addition, access to storage is not dependent on any one server since clients can get at storage through any server (or directly, in the case of NAS).

A simple clustering solution is pictured on the left in Figure C-20. In this case, two servers share the same hard disks. A cluster may also consist of more than two servers, as shown on the right in Figure C-20. The latter is often established to provide additional performance. In fact, a multiserver cluster is designed to allow scaling as well as fault tolerance. Additional servers are added to the cluster as processing requirements grow.

[Figure C-20: A two-server cluster sharing the same hard disks (left) and a cluster of more than two servers (right)]
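A back-of-the-envelope estimate gives a feel for why adding servers to a cluster improves fault tolerance. The sketch below rests on assumptions the text does not state: servers fail independently of one another, and any single surviving server can carry the full load. The function name `cluster_availability` is illustrative only.

```python
def cluster_availability(server_availability: float, servers: int) -> float:
    """Estimate the probability that at least one server is up.

    Assumes independent failures and that one surviving server is
    enough to keep the service running (both are simplifications).
    """
    if servers < 1:
        raise ValueError("a cluster needs at least one server")
    if not 0.0 <= server_availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    all_servers_down = (1.0 - server_availability) ** servers
    return 1.0 - all_servers_down


# Show how the estimate changes as 99-percent-available servers are added.
for count in range(1, 5):
    estimate = cluster_availability(0.99, count)
    print(f"{count} server(s): {estimate:.6%} estimated availability")
```

Under these assumptions, two servers that are each available 99 percent of the time already reach roughly 99.99 percent as a pair. Real clusters share components such as disks, power, and networking, so actual figures are likely to be lower than this estimate.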

Clustering technology can provide better performance than large symmetric multiprocessing servers because multiple systems provide better I/O (input/output) for a large number of network clients. Other important features and benefits are outlined here:

Load balancing: Cluster software provides load balancing among the processors to ensure that processing is distributed in a way that optimizes the system.

Failover: The process by which other servers take over the load of a failed server.

Fault resilience: The ability to provide uninterrupted service. This depends on the number of servers, disk arrays, and backup power supplies, and on the quality of the equipment.

Multiport disk access: Each system in the cluster has access to the same RAID system. RAID systems have their own built-in fault tolerance.

Scalability: The ability to scale the system up to handle more clients on an as-needed basis.
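Load balancing and failover from the list above can be illustrated with a small sketch. It is not how any particular cluster product works: the `Server` and `Cluster` classes are hypothetical, round-robin is just one possible balancing policy, and a simple `healthy` flag stands in for real failure detection.

```python
class Server:
    """A hypothetical cluster node that can be marked as failed."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.healthy = True

    def handle(self, request: str) -> str:
        return f"{self.name} handled {request}"


class Cluster:
    """Presents several servers as one and skips any that have failed."""

    def __init__(self, servers) -> None:
        self.servers = list(servers)
        self._next = 0  # index of the server to try next (round-robin)

    def dispatch(self, request: str) -> str:
        # Try each server at most once, starting from the round-robin position.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server.healthy:
                return server.handle(request)
        raise RuntimeError("no healthy servers left in the cluster")


cluster = Cluster([Server("node-a"), Server("node-b"), Server("node-c")])
for number in range(1, 4):
    print(cluster.dispatch(f"request-{number}"))

# Simulate a failure: the remaining nodes absorb node-b's share of the load.
cluster.servers[1].healthy = False
for number in range(4, 8):
    print(cluster.dispatch(f"request-{number}"))
```

Adding another `Server` to the list is the scalability case: the client-facing `dispatch` call stays the same while the work spreads over more nodes.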

Both clients and network administrators view a cluster of servers as a single server. A virtual IP (Internet Protocol) addressing scheme is implemented in which a single URL (Uniform Resource Locator) points to the entire cluster of servers. A file system distributed over a cluster appears as a single file system even if one of the servers in the cluster fails. Network administrators can run a single management application to monitor the performance of the cluster.

Some of the difficulties of clustering include handling failures of components, recovering from failures, arbitrating access to disks in a shared environment, and maintaining cache coherency (for example, preventing another user from reading a disk before updated information held in a cache has been written to it).
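The cache coherency problem can be shown with a toy model. The sketch below is an assumption-laden illustration, not a description of any real cluster file system: `SharedDisk` and `Node` are hypothetical classes, and each node keeps an unsynchronized write-back cache.

```python
class SharedDisk:
    """A disk that every node in the cluster can read and write."""

    def __init__(self) -> None:
        self.blocks = {}


class Node:
    """A cluster node with a private write-back cache (no coherency protocol)."""

    def __init__(self, name: str, disk: SharedDisk) -> None:
        self.name = name
        self.disk = disk
        self.dirty = {}  # updates held in the cache, not yet on disk

    def write(self, block: str, data: str) -> None:
        self.dirty[block] = data  # the update stays in this node's cache

    def read(self, block: str):
        if block in self.dirty:
            return self.dirty[block]
        return self.disk.blocks.get(block)

    def flush(self) -> None:
        self.disk.blocks.update(self.dirty)
        self.dirty.clear()


disk = SharedDisk()
disk.blocks["record"] = "old value"
node_a = Node("node-a", disk)
node_b = Node("node-b", disk)

node_a.write("record", "new value")
print("node-b before flush:", node_b.read("record"))  # still sees the old data

node_a.flush()
print("node-b after flush: ", node_b.read("record"))  # now sees the update
```

Between the write and the flush, the two nodes disagree about the contents of the same block. Cluster software has to prevent or resolve that window, for example by locking or by invalidating other nodes' caches.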

Clustering is a typical configuration in multitiered network architectures where the cluster represents the middle and back-end tiers. See "Multitiered Architectures" for more information.

Microsoft Windows 2000 Datacenter Server is designed for use at the core of enterprise networks. It supports four-node clusters, with expanded support planned for the future. A related product is Microsoft WLBS (Windows Load Balancing Service) for Windows NT, which is called NLB (Network Load Balancing) in Windows 2000. In Windows 2000, a feature called CLB (Component Load Balancing) balances the distribution of objects (software components) across a set of COM+ servers. Microsoft developed the MSCS (Microsoft Cluster Service) API to support cluster development.

IBM's Netfinity servers support up to eight nodes in a cluster with software that was designed using the Microsoft MSCS API. Novell's NWCS (NetWare Cluster Services) is another clustering solution.

The Virtual Interface (VI) Architecture specification defines an industry-standard architecture for communication within clusters of servers and workstations. The specification is designed to boost I/O by reducing the time it takes to exchange messages between devices. The VI architecture is promoted by Compaq, Intel, and Microsoft. More information is available at the Web site listed on the related entries page. Refer to "VI Architecture" for more information.

Load balancing is a related technology for creating what are often called virtual clusters. Refer to "Load Balancing" for more information. Other interesting related technologies are described under "InfiniBand" and "DAFS (Direct Access File System)."