Chapter 1 - Backup and Restore Design
This chapter describes the Microsoft® Internet Data Center backup and restore solution. Data center backup principles and strategies are described, together with methods of assessing what to back up in an Internet Data Center environment. The recommended backup and restore solution uses CommVault Galaxy software in each of the Internet Data Center virtual local-area networks (VLANs). The functionality of Galaxy software is explained and the design of the solution is described, including details of the planning and configuration steps required to successfully implement this backup and restore solution. This chapter is part of the Backup and Restore Solution for Windows 2000–based Data Centers.
On This Page
Galaxy Software Modules and Core Functionality
Galaxy Architecture and Deployment Strategy
Deciding What to Back Up and Restore
Galaxy Solution Design and Configuration
The Internet Data Center architecture is designed to provide resilience with no single point of failure. It is still essential, however, to make adequate backups so that data and systems configuration can be restored in the event of a catastrophic failure. Although you may take every conceivable precaution, it is impossible to plan for every disaster or outage that could affect a data center, which is why planning a strategy for disaster recovery is so important.
The Importance of Backing Up the Internet Data Center Environment
The quantity of data that is stored in Internet Data Center environments varies but can grow as large as multiple terabytes, while the number of supported users will increase as well. In this type of constantly changing environment, mission-critical applications must be available, downtimes must be kept to a minimum, and increased dependence on the multiple tiers must be managed effectively.
It is important to back up Internet Data Center environments to protect critical data and allow it to be restored quickly in the event of any data loss, whether small or large.
Data loss can result from the following:
Hard disk subsystem failure
Power failure (resulting in corrupted data)
Systems software failure
Accidental or malicious deletion or modification of data
Natural disasters (for example, fire, flood, or earthquake)
Theft or sabotage
An organization must be able to recover quickly from any outage or disaster, whether the situation involves a simple component failure or the complete destruction of a site. Therefore, when designing a backup and recovery architecture, you should consider all types of failure. The architecture you select should be based on well-defined system availability requirements and should take into account the contents and configuration of each server.
Assessing Your Situation
For each operating system and application introduced into an Internet Data Center environment, consider the following questions:
What are possible failure scenarios?
What is the critical data and where is it located?
How often are backups required?
When should full backups be done, as compared to incremental or differential backups?
What backup media will be used (magnetic disk, magneto-optical disk, or tape)?
Will backups be performed online or offline?
Will backups be started manually, or automatically according to a schedule?
What will be used to test for valid backups?
Where will backups be stored (on-site, off-site, or both)?
A good backup and recovery architecture should include a disaster avoidance plan, procedures, and tools that assist in recovering from a disaster or outage, and detailed procedures and standards for performing the recovery. For each subject area, the architecture should clearly define the people, process, and technologies required for success.
You should consider a number of factors when developing a backup solution for the Internet Data Center architecture. For example, you'll need to determine how to anticipate and avoid disasters, decide which parts of the environment should be backed up and how often, and learn how to plan a backup and recovery strategy for the environment. Your completed solution should include well-documented disaster avoidance and disaster recovery plans.
Disaster Avoidance Plan
A disaster avoidance plan must anticipate events that can affect system operation and provide for such occurrences. Events that can disrupt Internet services range from an Internet connection problem to minor failures in components that cannot readily be replaced, to more complex software problems.
Elements of a successful disaster avoidance plan include geographical redundancy and remote storage of backup tapes. Use of redundant, geographically distant data centers is a good way to ensure that a regional catastrophe does not eliminate the ability to provide service. Removing backup tapes from each data center is a good way to avoid losing both the data center and the data center's backup mechanism. Depending on the importance of the data, several off-site storage facilities can be used. Off-site storage need not add a great deal to the cost of the backup and recovery architecture; many companies provide off-site storage services and will pick up and deliver backup tapes when the tapes need to be rotated.
A disaster avoidance plan must be based on the performance and availability requirements defined for the particular application being hosted. If the application serves a specific region, for example, it may not make sense to include a second, geographically distant data center in planning.
Disaster Recovery Plan
A disaster recovery plan prepares an organization for recovering from disasters and outages that cannot be avoided. When developing the plan, consider the following:
Can business operations continue during a disaster or outage? A disaster recovery plan should include procedures for maintaining business operations during a disaster or outage (including network outages). For example, the telephones in the sales department will continue to ring even when the server is not operational, so staff may need to take orders manually until the server is operational again. Each department should work out strategies for such situations.
How is the disaster recovery plan to be created and maintained? To ensure its success, the disaster recovery plan must be managed properly. It is recommended that one or more members of the organization be responsible for supervising the organization's disaster preparation efforts. Someone must install and maintain hardware protection devices, ensure that all departments have a plan if the server fails temporarily, ensure that backups are made and rotated off-site regularly, and create extensive documentation to support the disaster recovery plan.
Best Practices for Developing a Backup Solution
When developing a backup solution, follow these recommendations:
Involve the correct personnel and use the appropriate resources when developing and testing backup and restore strategies.
Create a data protection organization chart that includes responsibilities and contact information for each person.
Perform an initial full backup of every volume that needs protection.
Back up system state for every server and ensure that Microsoft Active Directory® directory service is included for each domain controller.
Print and review backup reports for CommVault Galaxy systems to ensure that all files are being backed up correctly.
Perform trial restorations of data periodically to verify that the files are being backed up correctly.
Ensure that backup media, systems, and servers are secured in a manner that prevents a rogue administrator from restoring stolen data onto your server.
Develop and implement a disaster recovery test plan to ensure the integrity of your backup data.
You should consider a number of factors when planning for your backup solution, such as backing up only what is necessary, scheduling backups carefully, and choosing the appropriate type of backup to perform.
Avoiding Unnecessary Backups
When designing a backup strategy, you may be tempted to perform a full backup of every server in the environment. Keep in mind, however, that your objective is to successfully restore the environment after an outage or disaster. Therefore, your backup strategy should focus on the following goals:
The data to be restored should be easy to find.
The restoration should be as quick as possible.
If you back up every server indiscriminately, you will have a large volume of data to recover. Although current tape storage and backup products allow for fast restoration of data, it may increase downtime if everything must be restored from tape. For example, most backup products require the following steps:
Reinstall the operating system.
Reinstall the backup software.
Restore the backup from tape.
The more files you are backing up, the longer the backup takes to perform and, more important, the longer it takes to restore the files. When disaster strikes, time is critical, so the shorter the restoration process, the better. Furthermore, large backups that are performed on a regular basis affect network performance negatively unless you establish a dedicated backup network.
After you have determined the optimal backup strategy for your environment, it is vital that you perform a trial restoration across the entire test network. This trial identifies any problem areas and provides useful experience in restoring systems in the Internet Data Center environment, without the pressure of having to bring a production system back online.
(The CommVault Galaxy interface, which is described more fully later in this chapter, simplifies data identification so that you can select the appropriate data to back up and then restore critical data first.)
Choosing an Appropriate Time for Backup
Backing up an e-commerce environment is not the same as backing up a corporate local area network (LAN) infrastructure. On a corporate LAN, network usage usually drops outside of core business hours. In an e-commerce environment, usage generally increases in the early evening and may continue at this level until the early hours of the morning, especially if the customer base spans multiple time zones. For this reason, it may not be possible to identify an ideal time to back up your environment. To reduce impact to Web customers, follow these guidelines:
Schedule backups to avoid times of peak Web usage.
Do not back up unnecessary data.
Perform regular trial restore operations on a test network to verify that the correct backups are being made.
Choosing the Appropriate Type of Backup
The three main types of backup are:
Normal (full)
Incremental
Differential
In addition, CommVault Galaxy software provides two backup types that assist in the backup process and save time within the critical backup window:
Auxiliary (secondary) copy
Synthetic full
A normal (or full) backup copies all selected files and marks each file as having been backed up (in other words, the archive attribute is cleared). With normal backups, only the most recent copy of the backup file or tape is needed to restore all of the files. A normal backup is usually performed the first time a backup set is created.
An incremental backup backs up only those files that have been created or changed since the last normal or incremental backup. It marks files as having been backed up (in other words, the archive attribute is cleared). If a combination of normal and incremental backups is used, the last normal backup set and all incremental backup sets are needed to restore all data.
A differential backup copies only those files that have been created or changed since the last normal or incremental backup. It does not mark files as having been backed up (in other words, the archive attribute is not cleared). If a combination of normal and differential backups is used, the files or tapes from both the last normal backup and the last differential backup are needed to restore all data.
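The relationship between the three main backup types and the archive attribute can be sketched as follows. This is an illustrative model only, not CommVault Galaxy code; the dictionary-based file representation is a simplification.

```python
# Illustrative sketch: how the archive attribute drives file selection
# for each backup type. Files are modeled as dicts with an "archive"
# flag standing in for the NTFS archive attribute.

def normal_backup(files):
    """Copy every selected file and clear the archive attribute."""
    copied = list(files)
    for f in files:
        f["archive"] = False              # marked as backed up
    return copied

def incremental_backup(files):
    """Copy only files with the archive attribute set, then clear it."""
    copied = [f for f in files if f["archive"]]
    for f in copied:
        f["archive"] = False              # marked as backed up
    return copied

def differential_backup(files):
    """Copy files with the archive attribute set, but leave it set."""
    return [f for f in files if f["archive"]]   # attribute not cleared
```

Because the differential backup never clears the attribute, each differential grows to include everything changed since the last normal backup, which is why a restore needs only the last normal and last differential sets.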
An auxiliary (or secondary) copy is a copy of backup data. The data copied is a true image of the primary backup copy and can be used as a hot standby backup copy in the event that primary backup servers, devices, and media are lost or destroyed. The primary and secondary copies use different media and often use different backup libraries.
Synthetic Full Backup
A synthetic full backup combines the most recent full backup of the selected data with all subsequent incremental and/or differential backups, and stores the result in a single archive file. Synthetic full backups are used primarily to enhance the performance of restore operations, because a single backup will then be all that is needed for a successful restore.
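The consolidation a synthetic full performs can be sketched as a merge in which the newest version of each file wins. This is a conceptual model, not the Galaxy implementation; backup sets are simplified to dictionaries mapping paths to contents.

```python
# Conceptual sketch of a synthetic full backup: combine the last full
# backup with subsequent incremental backups into one restorable set.

def synthetic_full(full, incrementals):
    """full and each incremental are dicts of {path: contents}."""
    merged = dict(full)               # start from the last full backup
    for inc in incrementals:          # apply incrementals oldest-first
        merged.update(inc)            # newer versions overwrite older ones
    return merged                     # a single set suffices for restore
```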
Advantages and Disadvantages of Each Type of Backup
When deciding which type of backup to perform, you must consider the backup's impact on network bandwidth and the time required to restore the data. Table 1.1 describes the advantages and disadvantages of each type of backup.
Table 1.1 Comparison of backup types
Normal (full)
Advantages: Files are easier to find because they are on the current backup medium.
Disadvantages: Is time consuming.
Incremental
Advantages: Requires the least amount of data storage.
Disadvantages: Complete system restoration may take longer than using normal or differential backup.
Differential
Advantages: Recovery requires media from last normal and last differential backups only. Provides faster backup than normal.
Disadvantages: Complete system restoration may take longer than using normal backup. If large amounts of data change, backups may take longer than the incremental type.
Auxiliary copy
Advantages: Makes exact copies of backup tapes for redundancy. Copies are quicker to generate than actual backups. Copies can be kept onsite for disaster recovery.
Synthetic full
Advantages: Consolidates normal and incremental backups into a new normal backup within a library that is stored off of the network and/or critical servers. Reduces backup and restoration time.
Choosing the Appropriate Storage Media to Use
In addition to determining what type of backup to perform and when to perform it, you should evaluate the types of storage media available, and select appropriately.
When choosing a storage medium, consider the following factors:
The amount of data to be backed up
The type of data to be backed up
The backup window
The distance between the systems being backed up and the storage device
Your organization's budget
The Service Level Agreements for data restorations
Table 1.2 summarizes the advantages and disadvantages of the common backup media types.
Table 1.2 Comparison of backup media types
Tape
Advantages: Provides fast backup and long retention.
Disadvantages: Wears out faster and is more susceptible to errors than magnetic disks and magneto-optical disks.
Magnetic disk
Advantages: Is easy to configure and maintain.
Disadvantages: Is the most expensive medium for initial storage.
Magneto-optical disk
Advantages: Offers the longest life span without degradation of the medium.
Disadvantages: Is slowest for backup and restore.
Galaxy Software Modules and Core Functionality
The Microsoft Internet Data Center architecture uses CommVault Galaxy for Windows 2000 as its backup solution. The CommVault Galaxy framework of software modules includes the following components:
One or more Intelligent DataAgents (iDataAgents), which back up and restore particular data
One or more MediaAgents, which oversee the transfer of data between iDataAgents and backup media
One CommServe StorageManager, which controls the iDataAgents and MediaAgents
All of these software modules can exist on the same computer system, each can reside on a separate system, or they can be combined on various systems.
Together, the iDataAgents, MediaAgents, and CommServe StorageManager make up a single CommCell, which is the primary building block of the Galaxy framework.
Galaxy software supports the creation of multiple, discrete CommCells. Each CommCell contains the appropriate number of iDataAgents and MediaAgents to meet the requirements of the system it is backing up (these requirements include the backup window, performance throughput, and amount of data within the CommCell). From a single logon screen, the user can select and manage any individual CommCell from any Web-based console in the enterprise.
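The composition described above can be summarized in a simple data model. The class and attribute names here are illustrative only; they are not the Galaxy API.

```python
# Simplified, illustrative data model of a CommCell: one CommServe
# StorageManager directing any number of MediaAgents and iDataAgents.

from dataclasses import dataclass, field

@dataclass
class IDataAgent:
    client: str
    data_type: str        # e.g. "NTFS", "SQL Server 2000", "Exchange 2000"

@dataclass
class MediaAgent:
    host: str
    devices: list         # attached tape libraries, disks, and so on

@dataclass
class CommCell:
    commserve: str                              # command-and-control host
    media_agents: list = field(default_factory=list)
    idata_agents: list = field(default_factory=list)

# One iDataAgent is required per managed data type per client system.
cell = CommCell(commserve="mgmt-01")
cell.media_agents.append(MediaAgent("mgmt-01", ["tape-library-1"]))
cell.idata_agents.append(IDataAgent("sql-01", "SQL Server 2000"))
cell.idata_agents.append(IDataAgent("sql-01", "NTFS"))
```

Note that the same client ("sql-01") carries two iDataAgents, one for each managed data type, while the CommServe and a MediaAgent share a host, as the text permits.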
For more information on the CommCell and its deployment options, refer to "Galaxy Architecture and Deployment Strategy," later in this chapter.
A single CommServe StorageManager software module (a CommServe) directs combinations of MediaAgents and iDataAgents. The CommServe is the command and control center of the CommCell. The CommServe software handles all requests for activity between MediaAgents and iDataAgents, and monitors and administers all backups and restores. Only control information passes through the CommServe software module, not the backup or restore data itself.
The CommServe includes the centralized event and job managers and the logical and physical management tree, and it also houses the meta database catalog. This database includes metadata about the nature and location of the data that is backed up. The centralized event manager logs all events, providing unified notification of important events. The job manager controls all the major activity of the software and provides Galaxy with its restart capabilities. Because the CommCell console is displayed through the use of a Web browser interface or Microsoft Management Console (MMC) snap-in, you can remotely manage the entire Galaxy system either internally from any VLAN in the Internet Data Center environment, or externally by using Web-based access through a virtual private network (VPN).
The CommServe StorageManager may reside on its own dedicated system, or on a system that also contains a MediaAgent and/or iDataAgent.
The MediaAgent software module manages the movement of data between the physical backup storage devices and the corresponding iDataAgents. MediaAgents manage the backup storage devices, which are typically attached through a local bus adapter, such as a small computer system interface (SCSI). The MediaAgent software is designed to be storage-media independent; thus, it is capable of supporting a wide variety of storage models. This approach allows an organization to adapt rapidly to changes in storage technology. For example, MediaAgents communicate with the following types of storage devices:
Tape libraries. The MediaAgent manages the multiple tape media and multiple tape drives in the library, and the movement of the robotic arm within it. The use of tape libraries saves time, limits the possibility of human errors, provides lights-out data protection, and allows for the consolidation of data through synthetic full backups.
Stand-alone tape drives. The MediaAgent manages the tape drive. You must load and unload media manually, so you should avoid using stand-alone tape drives if possible.
Magnetic disk. Magnetic storage can consist of multiple disks or a redundant array of independent disks (RAID). This option has become more popular with increasing disk sizes, falling prices, and faster transfer rates. Storage costs for magnetic disk remain more expensive than for tape or magneto-optical disk, but in many Internet Data Center environments, the need for fast backups requires the use of magnetic disk for intermediate stage storage prior to backup to tape.
Magneto-optical disk. Magneto-optical disks offer data throughput of 6 megabytes (MB) per second and decades of shelf life. Bar-coded magneto-optical libraries are gaining favor as a medium for hierarchical storage management, in which you create policies to move less-frequently used files off magnetic disks, which are more expensive.
The iDataAgent (Intelligent DataAgent) is the software module that manages the data transfer to the backup media through the MediaAgent, and is specific to the data type it manages. In the Internet Data Center environment, there are specific iDataAgents for the file systems for the Microsoft Windows® 2000 Server operating system (for example, Web and application servers), the servers running Microsoft Exchange Server 2000, and Microsoft SQL Server™ 2000 databases. An iDataAgent is required for each managed data type per client system, whether the system is physical or virtual, such as on clustered systems or in SAN configurations where there are many virtual file systems or clients. Each iDataAgent can manage multiple instances per client of the appropriate data type, so a single Galaxy iDataAgent for Windows 2000 configured to manage the Windows NTFS file system can manage multiple file system instances on the same client computer.
The Galaxy software uses a two-part synchronized indexing scheme. This scheme consists of a centralized meta database catalog residing within the CommServe StorageManager software and an index that is located on the same computer as the MediaAgent software.
To enhance browsing and recovery performance, each MediaAgent maintains an index of the backup data written to the backup media. A permanent copy of the index is stored on the backup media and an active copy of the index is maintained on the media storage disk where the MediaAgent is installed. This local index cache disk is finite. As new data is written to the backup media, new indexes are created. Configurable parameters allow administrators to set the size of the cache and lifespan of the local index. If the index exceeds the preconfigured capacity, older indexes are overwritten by using a least recently used (LRU) algorithm.
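The bounded index cache with LRU eviction can be sketched as follows. This is an illustrative model of the behavior described above (capacity here is a simple entry count, whereas the real cache is sized in disk space).

```python
# Illustrative sketch of a MediaAgent's local index cache: a bounded
# cache that evicts the least recently used index when it is full.

from collections import OrderedDict

class IndexCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()    # backup_id -> index, oldest first

    def get(self, backup_id):
        if backup_id not in self.cache:
            return None               # must be restored from backup media
        self.cache.move_to_end(backup_id)     # mark as recently used
        return self.cache[backup_id]

    def put(self, backup_id, index):
        self.cache[backup_id] = index
        self.cache.move_to_end(backup_id)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict least recently used
```

An evicted index is not lost: the permanent copy on the backup media remains available, at the cost of a slower browse.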
Galaxy DataPipe technology is designed to move data as fast as the source client can provide it and as fast as the backup media device can write it. The Galaxy software uses the same process for writing backups to direct-attached SCSI devices, SAN-attached devices, and remote network Transmission Control Protocol/Internet Protocol (TCP/IP) connections.
The Galaxy DataPipe provides high-performance data movement with low overhead. Over TCP/IP networks, you can achieve data transfer rates close to the theoretical limit of the network, minus protocol overhead. The data transfer method works as well with disparate media types as it does with identical media types.
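A consequence of this design is that end-to-end throughput is bounded by the slowest stage: source read, network transfer, or device write. The following back-of-the-envelope sketch illustrates that planning rule; the rates used are made-up examples, not Galaxy specifications.

```python
# Illustrative backup-window arithmetic: effective DataPipe throughput
# is the minimum of all stage rates along the path.

def pipeline_throughput(stages_mb_per_s):
    """Effective rate is set by the slowest stage."""
    return min(stages_mb_per_s.values())

def backup_hours(data_gb, rate_mb_per_s):
    """Hours needed to move data_gb at the given sustained rate."""
    return data_gb * 1024 / rate_mb_per_s / 3600

# Example (hypothetical numbers): the tape drive is the bottleneck.
rates = {"source read": 40.0, "gigabit LAN": 90.0, "tape write": 15.0}
bottleneck = pipeline_throughput(rates)       # 15.0 MB/s
```

For instance, 200 GB at a 15-MB/s bottleneck needs roughly 3.8 hours, which is the kind of figure to compare against the backup window when choosing media and topology.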
Dynamic Drive Sharing
Galaxy can share tape devices (libraries and drives) in a switched SAN fabric configuration, with more than one MediaAgent sharing all or some drives of a tape library.
The implementation of dynamic drive sharing (DDS) in Galaxy software allows for policy-based sharing of tape library drive resources among multiple backup systems. Galaxy software uses a software layer to manage the sharing of devices. This approach provides significantly greater reliability than does a simple SCSI reserve and release strategy, and makes data easier to manage.
The advantages of DDS and library sharing include the following:
Better backup performance
Better return on hardware investment
Reduced hardware spending
Improved data protection
Faster data access
CommVault Galaxy uses authentication mechanisms to ensure that communication between the Galaxy clients (iDataAgents) and Galaxy components (CommServe and MediaAgents) is performed only between recognized modules. The graphical user interface (GUI), an application of the CommServe, processes requests from user-initiated GUI sessions; it is also responsible for the challenge/response authentication of users and for processing the requests of those users. Therefore, because the CommServe is using authenticated connections, computers outside the Galaxy realm are prevented from connecting with Galaxy processes.
The CommCell uses a network password as an internal security measure to ensure that Galaxy communications occur only between CommCell computers. By default, Galaxy assigns each computer in the CommCell a different password (which is not a user-level password). At any time, you can define a new CommCell network password for any computer in the CommCell.
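The general idea of authenticating with a shared network password, without ever sending the secret itself, can be illustrated with a challenge/response sketch. The actual Galaxy protocol is not published; the HMAC construction below is an assumption chosen only to show the pattern.

```python
# Illustrative challenge/response exchange using a shared secret.
# This is NOT the Galaxy wire protocol, just the general technique.

import hashlib
import hmac
import os

def make_challenge():
    """Server side: issue a random nonce."""
    return os.urandom(16)

def respond(password: bytes, challenge: bytes) -> bytes:
    """Client side: prove knowledge of the password without sending it."""
    return hmac.new(password, challenge, hashlib.sha256).digest()

def verify(password: bytes, challenge: bytes, response: bytes) -> bool:
    """Server side: recompute and compare in constant time."""
    expected = respond(password, challenge)
    return hmac.compare_digest(expected, response)
```

A computer outside the CommCell does not hold the network password, so it cannot produce a valid response, which is the property the text describes.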
Access to the resources and features of a CommCell is granted or denied based on a combination of the CommCell resources, capabilities, user groups, and accounts. The Galaxy administrator assigns user names and passwords. These Galaxy user accounts exist only in the context of Galaxy and are not Windows accounts.
During installation, Galaxy creates a permanent user named cvadmin, which is the default CommCell administrator. This user name cannot be changed after the installation and has all available rights within a CommCell, including rights for creating user accounts.
Galaxy Architecture and Deployment Strategy
The Internet Data Center environment is built on a multi-tiered architecture of physical VLANs and distributed Microsoft .NET–based application servers. The recommended Galaxy data storage management strategy includes procedures for backup and restore operations from the systems level through to applications and application configurations. This strategy protects data residing in each system, each virtual local area network (VLAN), and across the complete Internet Data Center environment.
Galaxy takes advantage of the technologies used in the Internet Data Center, such as clustering, storage area networks (SANs) for speed, and security techniques such as IP Security (IPSec), to keep the Internet Data Center environment highly available. In addition, Galaxy uses clustering to protect against the failure of its own components (such as the CommServe and MediaAgents). The Galaxy solution is designed to protect against component, server, or application failure, and also against the loss of an entire data center, as in a fire.
Because all of the tiers in the Internet Data Center architecture communicate through the firewalls and the VLAN switched environment, a distributed backup and restore architecture is recommended. This recommendation is not based on a Galaxy software requirement or limitation; rather, it is to enhance the availability of the Internet Data Center environment.
Distributed MediaAgent Strategy
Placing a MediaAgent in each VLAN accomplishes the following:
Enables effective control of backup data transfer.
Reduces the volume of data being transferred between VLANs.
Improves security for servers and simplifies the firewall configuration.
For example, the Infrastructure VLAN in the Internet Data Center architecture includes at least five servers running .NET-based applications, each of which contains several distributed components. Therefore, backing up each application server requires that you open ports at each server and at the firewall to provide the connection to the Data/Management VLAN for backup traffic. If a MediaAgent is placed in the Infrastructure VLAN, the MediaAgent can then back up the application servers locally and, at a time when there is less network traffic, use the Auxiliary Copy feature of the Galaxy software to copy the data to the tape media on the Data/Management VLAN. To use this method, the MediaAgent must have adequate disk space to store the backup data before transferring it to the tape media.
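The staging decision described above, back up to local disk only if the MediaAgent has room, otherwise send traffic across the firewall, can be sketched as a simple planning check. The function names and the 10 percent headroom figure are illustrative assumptions, not Galaxy settings.

```python
# Illustrative planning check for the distributed MediaAgent strategy:
# stage the backup on the local MediaAgent's magnetic disk if there is
# room, and defer the auxiliary copy to tape until an off-peak window.

def can_stage(backup_size_gb, free_disk_gb, safety_margin=0.1):
    """The MediaAgent needs room for the backup plus some headroom."""
    return free_disk_gb >= backup_size_gb * (1 + safety_margin)

def plan_backup(backup_size_gb, free_disk_gb):
    if can_stage(backup_size_gb, free_disk_gb):
        return ["back up to local MediaAgent disk",
                "auxiliary copy to Data/Management VLAN tape off-peak"]
    return ["back up across the firewall to tape directly"]
```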
Figure 1.1 illustrates the distributed Galaxy architecture for the Internet Data Center environment.
Figure 1.1: Distributed Galaxy Architecture
CommCell Deployment Strategy
As stated previously, the combination of iDataAgents, MediaAgents, and the CommServe StorageManager together form a CommCell, which is the primary building block of the Galaxy framework. The control and data paths shown in Figure 1.2 assume a traditional LAN-based computing environment. When Galaxy software is deployed in a storage-networking environment (for example, when backing up Microsoft SQL Server 2000 on the Data VLAN), the control and data flows may be different.
Figure 1.2: Galaxy CommCell
For extremely large systems, the Galaxy architecture allows you to place the iDataAgent and MediaAgent modules on the same computer to deliver high-performance direct-attached throughput (Figure 1.3).
Figure 1.3: iDataAgent and MediaAgent on a large system
In environments where centralized management and centralized storage are essential (for example, in environments that use centralized storage for a single department or in raised-floor data center environments that centralize all operations), the Galaxy solution easily conforms to the data protection strategy. Various iDataAgents across the network can pass monitoring and control information and data to a centrally managed MediaAgent and attached storage (Figure 1.4).
Figure 1.4: Centralized control in the Galaxy solution
You can also deploy Galaxy software with centralized control of distributed storage. Doing so eliminates the requirement that you move backup data across the LAN, and therefore significantly reduces backup and recovery time. Figure 1.5 illustrates centralized management control of both local and remote storage in a system running Galaxy software. Because only control information is passed between the MediaAgent and the CommServe StorageManager software, slow communication links can be used.
Figure 1.5: Management control over local and remote storage
Galaxy software supports storage area network (SAN) architectures (Figure 1.6). In the Internet Data Center environment, the cluster running SQL Server 2000 on the Data VLAN uses a SAN for data storage. The Galaxy tape library is also attached to the SAN. This means the MediaAgents can take advantage of SAN speed to transfer data to the tape library. In SAN environments, Galaxy software supports LAN-free backup as well as server-free and serverless backup and recovery of application data.
Figure 1.6: Galaxy software in SAN environments
Recommendation To use DDS to share all drives in the library, configure all drives in both the MediaAgents on the cluster running SQL Server and the MediaAgent on the CommServe. The MediaManager of the CommServe manages the resource allocation. Detailed information on configuring MediaAgents is provided in the CommVault Galaxy CommCell Media Management Administration Guide.
Deciding What to Back Up and Restore
Virtually any system component can be backed up, and backup media is relatively inexpensive. It is therefore tempting to back up every component of the Internet Data Center architecture. However, a solution of this type requires a significant amount of time and bandwidth to perform the backup and to restore the system.
It is important to look at all the parts of the implementation of the Internet Data Center architecture and determine what data must be restored if the disaster recovery plan is invoked. Doing so helps you to decide on an effective backup and restore strategy and to identify potential weaknesses in the application design. If the application architecture changes significantly, you should re-evaluate your backup strategy.
Figure 1.7 can be used to decide which servers in the Internet Data Center architecture must be backed up. The discussion that follows provides backup recommendations for typical Internet Data Center environments.
Figure 1.7: Backup design flowchart
Front-End Web Recovery
In the Internet Data Center architecture, it is recommended that a recovery process for the front-end Web farm involve rebuilding the servers by using automated builds. One of the principal aims of the design for the front-end Web in the Internet Data Center architecture is that no persistent data should be stored on any of the Web servers. All of the Web-tier servers are clones that receive their content and settings from Microsoft Application Center 2000, which resides on the Infrastructure VLAN. As a result, the Web servers for most Web applications do not need to be backed up. This dramatically reduces the volume of data that needs to be backed up and simplifies the firewall configuration.
In some cases, you may need to back up individual Web servers (for example, if you are not using Microsoft Application Center 2000, or if you are running an application such as a business-to-business (B2B) application where persistent data must be stored on the Web servers). You should avoid having to do so if possible. If required, however, you can use the CommVault Galaxy backup and restore solution through a firewall to back up and restore the Web servers in this tier.
Recommendation If Internet Data Center Web servers are to be backed up, you must install Galaxy iDataAgent for Windows 2000 on the servers. In addition, you may want to use a dedicated server on this VLAN to host a MediaAgent that writes to magnetic disk rather than to tape. This approach offers the following advantages:
Backup and restore traffic is localized in the VLAN.
The media is readily available (for example, tapes do not have to be loaded), and the backup and restoration of data to or from magnetic disk is fast.
Secondary copies can be (selectively) copied to the back-end tapes during periods of low network usage.
The Infrastructure VLAN contains the Windows 2000 domain controllers and, if the application architecture requires it, the load-balancing servers running Component Services, the business-to-consumer (B2C) components running Microsoft Commerce Server, and the B2B components running Microsoft BizTalk™ Server (including Product Catalog System, Profiling System, and Business Process Pipelines). It also contains the controller and staging servers for Application Center 2000. If the application architecture relies on Active Directory to store customer account data, including user authentication and computer accounts, then backing up and restoring Active Directory successfully should be as high a priority as restoring the data from the computers running SQL Server 2000 in the Data VLAN.
Even if the Web application does not rely on Active Directory for data storage or user authentication, it is still vital that the Active Directory domain controllers be backed up on a regular basis. Active Directory can store security credentials, such as certificates, replication components, and system resources. Furthermore, other servers in the environment will have permissions and service accounts based on instances of accounts stored in Active Directory. If it is not possible to recover a current instance of Active Directory, considerable effort will be required to resynchronize computer accounts and reapply permissions.
If the Web application is using an array of load-balancing servers running Component Services that are managed by Application Center, consider backing up only the staging server. Because the Internet Data Center architecture provides automated installations of all servers, the servers running Component Services can be rebuilt as quickly as they can be restored from a backup, and rebuilding may be less problematic than restoring from backups. However, because the staging server contains the current master copy of the Component Services–based application and the configuration for Application Center, it should be backed up to reduce the time needed to restore the entire array.
Recommendation An iDataAgent must be installed on each of the Infrastructure VLAN servers. As with the front-end Web, you can use an optional dedicated server to host a MediaAgent that writes to magnetic disk rather than to tape. The advantages of this approach are as follows:
Backup and restore traffic is localized in the VLAN.
The media is readily available (for example, tapes do not have to be loaded), and the backup and restoration of data to or from magnetic disk is fast.
Secondary copies can be (selectively) copied to the back-end tapes during periods of low network usage.
The database servers running SQL Server 2000 on the Data VLAN are most likely to need a strong backup solution. The database servers are likely to contain customer information, financial information, and crucial data for the functionality of the Internet Data Center Web application. If a strong backup and restore solution is not used, major organizational disruption could result. Therefore, all database servers containing live data must be backed up as frequently as possible. A clustered iDataAgent for SQL Server 2000 must be configured on servers running SQL Server 2000.
Recommendation Placing a MediaAgent and iDataAgents on the same physical hardware in the Data VLAN reduces backup time and enhances backup and restore performance. Because the SQL Server 2000 data layer is configured in a SAN environment, installing a MediaAgent on the same server as the iDataAgent takes advantage of the speed and robustness of the available SAN environment without incurring additional hardware costs.
Stand-By Disaster Recovery Option for the Data VLAN
Because the Data VLAN is the most important layer of the Internet Data Center architecture from a data perspective, you should back up the databases frequently and restore them to exact replicated servers running SQL Server 2000 at a different location. To do so, you can use the SQL Server 2000 log-shipping utility in combination with the Galaxy solution. This strategy provides higher availability of the data layer. It also prevents problems such as data corruption and virus attack if the data is not protected by clustering and replication. This option is inexpensive and provides a strategy for availability in environments where there is some tolerance for downtime.
Monitoring and management repositories should also be backed up because they contain historical data about the system. For example, security events may be archived to these repositories, and some organizations, such as financial service providers, may be legally required to retain this data for a set period of time.
Summary of Recommendations
Table 1.3 summarizes the state of each VLAN in an Internet Data Center environment and the relevant backup and restore recommendations.
Table 1.3 Backup and restore recommendations
Front-end Web VLAN in standard environments. Front-end interface layer (top tier).
Web server clones managed by Application Center 2000. No stored persistent data.
IIS Web Server farm: Rebuild servers by using automated builds (no backups required).
Front-end Web VLAN in particular B2B environments.
Web server clones managed by Application Center 2000. B2B applications that require persistent data to be stored on Web servers.
You must back up persistent data.
Infrastructure VLAN. Business logic layer (middle tier).
Contains business logic. Can also contain user authentication and computer accounts stored in Active Directory. Components include the controller for Application Center 2000; staging servers; Active Directory/Domain Name System (DNS) servers; Commerce Server 2000 (for B2C Web applications); BizTalk Server 2000 (for B2B Web applications); and Exchange Server 2000 (for the mail component of BizTalk Server).
Application Center 2000: You must back up the controller configuration for Application Center 2000, content on staging servers, and COM+ data.
Data VLAN. Database layer (bottom tier).
Contains crucial data used by Commerce Server 2000, BizTalk Server 2000, and Web applications. Data is managed and stored by clustered SQL Server 2000 in a SAN environment.
SQL Server 2000: You must back up system state and all databases.
Management VLAN. Administration and systems management.
Monitoring and management servers and VPN servers.
Monitoring and Management repositories: You must back up historical system data.
Galaxy Solution Design and Configuration
This section provides procedures for designing and configuring a Galaxy system for optimum performance.
To design and implement a Galaxy system, you must determine the following:
Storage devices required
Your first-pass design of a Galaxy system should provide you with preliminary hardware and storage requirements. You can then perform the same process again to refine the design for specific storage needs. "Galaxy Configuration" (later in this chapter) provides recommendations for enhancing this design. An example is included to illustrate the Galaxy system design process.
You can find details about the hardware solution used for the Internet Data Center architecture on the following site:
It is important that you determine what additional systems hardware you will need for on-site backup, equipment redundancy, and off-site storage. For many environments, it is good practice to maintain a test facility that has the same equipment as the production environment, but is in a different location.
Note: For information about hardware requirements for Galaxy systems, see the Galaxy documentation or visit: http://www.commvault.com
Storage requirements are the total amount of storage space and storage media required to maintain the backup for a specified period of time.
To calculate storage requirements for your system, you must determine the following:
Number of clients
Data retention period
Storage media required
You must first determine these requirements to estimate the amount of physical storage space your solution requires.
Determining Number of Clients
Determine the number of client computers in the Galaxy system.
Determining the Data Retention Scheme
The data retention period is the period of time for which a particular set of backup data is to remain available to restore. After the data retention period has elapsed and you have run the pruning utility, the media will be available for reuse.
To determine your data retention scheme, use the following criteria:
Number of full backup cycles maintained in storage (Cycles). A full backup cycle includes the full backups and all other backups until the next full backup.
Number of incremental/differential backups in a full cycle (Incrementals).
Determining the Storage Required
The storage required is the total amount of data that will be maintained on the storage media for the data retention period. Index is the space required on the MediaAgent to store the index data that defines the user objects saved in a given backup. This index is archived to storage media at the end of the backup.
To determine your total storage needs, use the following criteria:
Storage Required = Full Backups + Incremental Backups + Index
Full Backups = (Cycles * Total Used)
Incremental Backups = (Cycles * Daily Change * Incrementals)
Index = 4% * (Full Backups + Incremental Backups)
Total Used is the total amount of disk space used for all clients.
Cycles is the number of full backup cycles.
Daily Change is the estimated daily rate of change of data.
Incrementals is the number of incremental and/or differential backups in each full backup cycle.
For example, assume that the backup cycle for your system is four weeks, and that you run six incremental backups each week. Also assume that the total amount of disk space used on all clients (Total Used) is 1 terabyte (TB), and that the estimated daily change (Daily Change) is 10%, or 100 gigabytes (GB).
The size of all the full backups is:
(4 Cycles * 1 TB) = 4 TB
The size of all incremental backups is:
(4 Cycles * 6 Incrementals * 100 GB Daily Change) = 2.4 TB
The size of the index is:
4% of 6.4 TB = 256 GB
The storage required is:
4 TB + 2.4 TB + 256 GB = 6.656 TB
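The storage calculation above can be checked with a few lines of script. This is a minimal sketch of the arithmetic only; the variable names are illustrative and are not part of the Galaxy software.

```python
# Storage requirement for the worked example (all sizes in GB; 1 TB = 1000 GB).
cycles = 4           # full backup cycles retained (four-week retention)
total_used = 1000    # total disk space used on all clients (1 TB)
daily_change = 100   # estimated daily change (10% of 1 TB)
incrementals = 6     # incremental backups per full backup cycle

full_backups = cycles * total_used                           # 4000 GB (4 TB)
incremental_backups = cycles * daily_change * incrementals   # 2400 GB (2.4 TB)
index = 0.04 * (full_backups + incremental_backups)          # 4% of 6400 GB = 256 GB

storage_required = full_backups + incremental_backups + index
print(storage_required)  # prints 6656.0, that is, 6.656 TB
```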
Determining the Storage Media Requirement
The storage media requirement is the amount of physical media (tape, magnetic disk, or magneto-optical disk) needed to hold your total storage requirements for the data retention period.
To determine your storage media needs, use the following criteria:
Storage Media = Storage Required / (Media * Compression Rate)
Storage Media is the amount of storage media needed.
Storage Required is the total calculated in the previous example.
Media is the uncompressed capacity of the media type used.
Compression Rate is the compression ratio of the hardware.
Continuing with the previous example, assuming that the tape you are using has a capacity of 60 GB uncompressed, and that hardware compression allows a 2:1 compression ratio, the number of tapes required is:
6.656 TB / (60 GB * 2) = 56 Tapes
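The tape count can be verified the same way (a sketch with illustrative names; `math.ceil` rounds the fractional result of 55.5 tapes up to whole tapes, which is why the example arrives at 56).

```python
import math

# Tape count for the worked example: 60 GB native capacity, 2:1 hardware compression.
storage_required = 6656   # GB, from the storage calculation above
media_capacity = 60       # GB per tape, uncompressed
compression_rate = 2      # 2:1 hardware compression ratio

tapes = math.ceil(storage_required / (media_capacity * compression_rate))
print(tapes)  # prints 56
```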
Determining the Storage Device Requirement
The storage device requirement is the number of tape drives needed to perform a full backup on all clients simultaneously such that the backups are completed within a specific time period or backup window.
To determine your storage device needs, use the following criteria:
Drives Minimum = (Full / Backup Rate) / Backup Window
Drives Maximum = (Clients * Streams * Backup Duration) / Backup Window
Drives is the number of tape drives needed.
Full is the size of a single full backup for all clients.
Backup Rate is the estimated backup rate in GB per hour.
Backup Window is the backup window (available time for backups to be completed) in hours.
Clients is the number of client computers being backed up.
Streams is the number of backup streams per client.
Backup Duration is the time required to back up a single client, in hours.
Continuing with the previous example, assume that the full backup size for all clients is 1 TB, and that your drive has a backup rate of about 35 GB per hour, that the backup window is eight hours, that each client has 2 backup streams, and that the backup duration per client is two hours.
The minimum number of tape drives needed is:
(1 TB/35 GB per hour) / 8 Hours = 4 Drives
The maximum number of tape drives needed is:
(30 Clients * 2 Streams * 2 Hours) / 8 Hours = 15 Drives
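The minimum and maximum drive counts above can be sketched as follows (illustrative names only; fractional drive counts are rounded up, so the minimum of roughly 3.6 drives becomes 4).

```python
import math

# Drive counts for the worked example (sizes in GB, times in hours).
full_backup_size = 1000   # single full backup for all clients (1 TB)
backup_rate = 35          # GB per hour per drive
backup_window = 8         # hours available for backups
clients = 30
streams_per_client = 2
backup_duration = 2       # hours to back up a single client

drives_min = math.ceil((full_backup_size / backup_rate) / backup_window)
drives_max = math.ceil((clients * streams_per_client * backup_duration) / backup_window)
print(drives_min, drives_max)  # prints 4 15
```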
Determining the Number of MediaAgents
A Galaxy MediaAgent manages the library and the transmission of data between clients and backup media.
To determine MediaAgent needs for your system, use the following criteria:
MediaAgents = Drives / Drives per Library
MediaAgents is the number of MediaAgents needed.
Drives is the number of tape drives needed.
Drives per Library is the number of drives in the library.
Continuing with the previous example, assuming that a library contains 10 backup drives, the number of MediaAgents needed is:
15 Drives / 10 Drives per Library = 2 MediaAgents (rounded up from 1.5)
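The MediaAgent calculation follows the same pattern (illustrative names; the fractional result of 1.5 is rounded up to whole MediaAgents):

```python
import math

drives = 15              # tape drives, from the maximum-drives calculation above
drives_per_library = 10  # backup drives per library

media_agents = math.ceil(drives / drives_per_library)
print(media_agents)  # prints 2
```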
Best Practices for Configuring Galaxy Components
Configuring Galaxy components and following best practices for optimal performance can enhance the Galaxy system you design. You can configure specific components or applications, or adjust your storage requirements according to the requirements of your particular Internet Data Center environment.
The following configuration is provided to enhance and support the availability and performance of the Internet Data Center environment.
This section provides best practices and recommendations for implementing Galaxy CommServe in the Internet Data Center environment.
The CommServe should be implemented in a Windows 2000 clustered environment.
A single CommServe can manage up to 35 clients. The number of clients in the environment can vary, depending on the design and hardware you use. For environments with more than 35 clients, you should deploy another CommServe.
Hardware and Software for CommServe
To optimize the performance of a clustered CommServe, use a quad processor computer running the Windows 2000 Advanced Server operating system. The computer should meet the following minimum requirements:
Pentium-compatible 700 megahertz (MHz) or higher Xeon processor
2 GB of RAM
At least 8 GB of hard disk space, plus index cache storage
This section provides the best practices and recommendations for implementing Galaxy MediaAgents in the Internet Data Center environment.
To increase performance and reduce network traffic, place a MediaAgent on the same computer as any iDataAgent that will have large storage requirements.
A MediaAgent can be implemented in a Windows 2000 clustered environment where fail-over capability is needed.
Use a one-month, four-backup cycle for your data retention scheme. Perform a full backup once a week, and perform incremental backups daily.
When scheduling full backups, use Start new media in Advanced Backup Options.
Use a separate storage policy for each iDataAgent.
Include the client name and iDataAgent type in each storage policy name (for example, Server1_SQL).
Increase the number of streams so that it equals the maximum number of drives configured.
Use hardware compression for each storage policy by setting the Hardware Compression attribute.
Hardware and Software for MediaAgent
To optimize the performance of the server hosting the MediaAgent, use a quad processor computer that is running the Windows 2000 Server operating system. The computer should meet the following minimum requirements:
Pentium-compatible 700 MHz or higher Xeon processor
2 GB of RAM
At least 8 GB of hard disk space, plus index cache storage
Configuring Storage Media
When using storage media in the Internet Data Center environment, follow these recommendations.
Use magnetic media only when your total storage requirement on a client is less than 20 GB.
When the primary copy of a backup is on magnetic media, you must create an auxiliary copy from which to make secondary copies to tape.
Set short data retention periods for magnetic media (for example, two days for one full backup).
Configuring Off-site Storage
When configuring off-site storage, follow these recommendations:
Assign each iDataAgent to one storage policy. If an iDataAgent has multiple subclients, assign them all to the same storage policy.
Name the storage policy using a simple naming scheme such as host name_application type. For example, if a server running Exchange Server named avocado hosts two Galaxy iDataAgents named file system and Exchange database, the storage policies that the server points to would be named avocado_fs and avocado_exdb.
When scheduling full backups, use Start new media in Advanced Backup Options. Doing so marks as Full the previous active tape for the storage policy.
By following these steps, you can take tapes off-site weekly. To identify which tapes you can take off-site, run the Backups on Media report for the appropriate storage policies in the CommCell, and specify that the report display only those tapes marked for the library as Full and In.
If the data retention period has not elapsed for the data stored on the off-site tapes, you can bring the tapes back on-site and restore the data. If the data retention period has elapsed but the tapes have not been recycled, you can use the Galaxy Disaster Recovery tool to restore the data.
Note: The Disaster Recovery tool currently supports the restoration of data from file system backups. Later this year, CommVault will add support for Exchange 2000 and other application types.
To keep selected backup media (such as full backups for the end of the month, quarter, and year), use the View media option in the Galaxy Backup History to identify those tapes so that you can be sure not to reuse them. To restore the backups from the tapes, use the Disaster Recovery tool.
To prevent the off-site data from aging off, use the auxiliary copy feature of the Galaxy software. This requires a separate set of tapes that are used specifically for off-site storage. The auxiliary copy feature copies Galaxy archive files from the primary copy of a given storage policy to an auxiliary (secondary, tertiary, and so on) copy. This copying can occur between similar (for example, tape to tape) or dissimilar (for example, magnetic to tape) media. After running the auxiliary copy, you can take the tape belonging to the auxiliary copy off-site. Aging rules (retention times) for the auxiliary copy data can be greater than the retention times on the primary copy.
Configuring iDataAgents for SQL Server 2000
When striping is supported, use multiple-stream storage policies.
The Effect of Backup on a Client
To run, backups require hardware resources (particularly CPU and RAM) on the client computer. The degree to which a backup operation affects a client depends on what hardware resources are available and whether other operations are running on the client computer concurrently.
Make sure that the client computer meets at least the minimum requirements described in Chapter 2, Backup and Restore Deployment.
Perform all nonessential tasks or other operations (for example, virus scans) outside the backup window.
Run backups on a client during the evening, or at times when activity on the client is minimal.
Run subclient backups sequentially.
Set the operational window to prevent backups from occurring during the day, or during other active periods.