Choose a disaster recovery strategy for SharePoint Server

Summary: Understand the disaster recovery options and supported technologies for recovering a SharePoint Server 2016 or SharePoint 2013 farm if there is a disaster.

We define disaster recovery as the ability to recover from a situation in which the primary data center that hosts a SharePoint Server farm is unable to continue to operate. Regardless of the nature of the event and its cause, the data center outage is significant enough to set into motion the actions defined in your organization's disaster recovery plan. This means putting a fully operational farm into production by using computer resources that are located in a data center that is not affected by the event.

SharePoint Server 2016 and SQL Server 2014 with Service Pack 1 (SP1) or SQL Server 2016, and SharePoint 2013 and SQL Server 2008 R2 with Service Pack 1 (SP1) or SQL Server 2012, provide configuration and content recovery options that can meet the Recovery Time Objective (RTO) and Recovery Point Objective (RPO) that your business requires if there is a disaster. For more information about these and other disaster recovery concepts, see High availability and disaster recovery concepts in SharePoint Server.

Introduction

An effective disaster recovery strategy for a SharePoint Server farm must be sufficient to meet your organization's business requirements, which are typically expressed by using two measures: Recovery Time Objective (RTO) and Recovery Point Objective (RPO). RTO and RPO requirements are derived by determining the downtime cost to the organization if a disaster happens.

Important

As a best practice, we recommend that you clearly identify and quantify your organization's RTO and RPO before you develop a recovery strategy and implement a technical solution. Focus on what is required, not how to do it.

Downtime costs vary significantly between and within industries, mainly because the effects of downtime differ. Business size is the most obvious factor, but it is not the only one. Setting a measure means first establishing the nature and implications of the failure. Reduced to the simplest level, a failure of a critical application could lead to the following types of losses:

  • Loss of the application service. The effect of downtime varies with the application and the business.

  • Loss of data. The potential loss of data due to a system outage can have significant legal and financial impact.

Most organizations will incur a downtime cost from both of the previous types of loss, but the nature of the business determines which type of loss has the biggest effect. The following article, written by Chris Preimesberger at eWEEK, highlights the financial effect of data center downtime: "Unplanned IT Downtime Can Cost $5K Per Minute: Report."
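As a rough illustration of how a per-minute figure translates into a downtime cost, the following Python sketch multiplies an outage duration by a cost rate. The function name and the four-hour outage are hypothetical. The rate of 5,000 dollars per minute is only the headline figure from the article title cited above; your organization's actual rate has to come from its own analysis.

```python
def estimate_downtime_cost(outage_minutes: float, cost_per_minute: float) -> float:
    """Return an estimated outage cost (illustrative arithmetic only)."""
    if outage_minutes < 0 or cost_per_minute < 0:
        raise ValueError("inputs must be non-negative")
    return outage_minutes * cost_per_minute


# Hypothetical example: a 4-hour outage at 5,000 dollars per minute.
print(estimate_downtime_cost(4 * 60, 5_000))  # 1200000
```

An estimate like this covers only the loss of the application service. The cost of lost data usually has to be assessed separately.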

In most scenarios, SharePoint products are among several applications that must be recovered in the event of a data center shutdown that qualifies as a disaster. For this reason, we have not included information about disaster recovery planning. Instead, we focus on options for making sure that you can recover your SharePoint Server 2016 farm at another location.

Regardless of the type and scale of a disaster, recovery involves the use of a standby data center that you can recover the farm to.

Standby data center recovery options

Standby data centers are required for scenarios where local redundant systems and backups cannot recover from the outage at the primary data center. Standby data centers are commonly classified as hot, warm, or cold, according to the time and immediate effort required to get a replacement farm up and running in a different location. Our definitions for these farm recovery data centers are as follows:

  • Cold standby. A secondary data center that can provide availability within hours or days.

  • Warm standby. A secondary data center that can provide availability within minutes or hours.

  • Hot standby. A secondary data center that can provide availability within seconds or minutes.
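The definitions above can be sketched as a simple lookup from a required RTO to the slowest standby type that could still meet it. This is a minimal sketch, not an official rule: the definitions overlap ("minutes" appears under both hot and warm, "hours" under both warm and cold), so the two cutoff values below are assumptions chosen only for illustration, and the function name is hypothetical.

```python
from datetime import timedelta

# Assumed cutoffs. The article's definitions overlap, so these exact
# boundaries are hypothetical and should be replaced with your own.
HOT_MAX_RTO = timedelta(minutes=5)
WARM_MAX_RTO = timedelta(hours=4)


def slowest_acceptable_standby(rto: timedelta) -> str:
    """Return the slowest standby type that could still meet the given RTO."""
    if rto <= timedelta(0):
        raise ValueError("RTO must be a positive duration")
    if rto <= HOT_MAX_RTO:
        return "hot"
    if rto <= WARM_MAX_RTO:
        return "warm"
    return "cold"


print(slowest_acceptable_standby(timedelta(hours=1)))  # warm
```

A faster standby type always satisfies a slower RTO as well, so the result is a lower bound on what is needed, not a recommendation.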

Each of these standby data centers has specific characteristics and requirements, and also an associated cost to operate and maintain.

  • Cold standby disaster recovery strategy: A business regularly ships backups that support bare metal recovery to local and regional offsite storage, and has contracts in place for emergency server rentals in another region.

    Pros: Often the cheapest option to maintain, operationally.

    Cons: Often an expensive option to recover, because physical servers must be configured correctly after a disaster has occurred. Also the slowest option to recover.

  • Warm standby disaster recovery strategy: A business ships backups or virtual machine images to local and regional disaster recovery farms.

    Pros: Often fairly inexpensive to recover, because a virtual server farm can require little configuration upon recovery.

    Cons: Can be very expensive and time-consuming to maintain.

  • Hot standby disaster recovery strategy: A business runs multiple data centers, but serves content and services through only one data center.

    Pros: Often fairly fast to recover.

    Cons: Can be very expensive to configure and maintain.

Important

No matter which of the previous disaster recovery solutions you decide to apply, there is likely to be some data loss.

Cold standby recovery

In a cold standby disaster recovery scenario, you recover by setting up a new farm in a new location (preferably by using a scripted deployment) and restoring backups. Alternatively, you can recover by restoring the farm with a backup solution such as System Center 2016 - Data Protection Manager (DPM) or System Center 2012 - Data Protection Manager (DPM). System Center Data Protection Manager protects your data at the computer operating system level and lets you restore each server individually. This article does not contain detailed instructions for how to create and recover farms in cold standby scenarios. For more information, see:

Warm standby recovery

In a warm standby disaster recovery scenario, you create a warm standby environment by creating a duplicate farm at the alternate data center and ensuring that it is updated regularly by using full and incremental backups of the primary farm.

Virtual warm standby environments

Virtualization provides a workable and cost-effective option for a warm standby recovery solution. You can use Hyper-V as an in-house solution or Azure as a hosted solution to provide the necessary infrastructure for recovery.

You can create virtual images of the production servers and ship these images to the standby data center. If you use the virtual standby solution, you have to make sure that the virtual images are created often enough to provide the level of farm configuration and content freshness that you need for recovering the farm. At the secondary location, you must have an environment available in which you can easily configure and connect the images to re-create your farm environment. For more information, see Deploying SharePoint Server 2016 with SQL Server AlwaysOn Availability Groups in Azure.
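The freshness requirement can be expressed as a comparison between the age of the most recent image and the RPO. The following Python sketch is one way to check it, under the assumption that you can obtain the creation time of the latest shipped image. The function name and the example times are hypothetical, and the sketch does not query Hyper-V or Azure.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def image_meets_rpo(image_created_at: datetime, rpo: timedelta,
                    now: Optional[datetime] = None) -> bool:
    """Return True if the image is no older than the RPO allows.

    Both datetimes are expected to be timezone-aware.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - image_created_at <= rpo


# Hypothetical example: an image taken 6 hours ago checked against a 4-hour RPO.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
image_time = now - timedelta(hours=6)
print(image_meets_rpo(image_time, timedelta(hours=4), now))  # False
```

A result of False suggests that images are not being created often enough for the stated RPO.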

Hot standby recovery

In a hot standby disaster recovery scenario, you set up a failover farm in the standby data center so that it can assume production operations almost immediately after the primary farm goes offline. An environment that has a separate failover farm has the following characteristics:

  • A separate configuration database and a separate SharePoint Central Administration website content database must be maintained on the failover farm.

  • All customizations must be deployed on both farms.

    Tip

    To keep the two farms consistent and to reduce the possibility of error, we recommend that you use scripted deployment to create the primary and failover farms with the same configuration settings and customizations.

  • Operating system, SQL Server, and SharePoint Server software updates must be applied to both farms to maintain a consistent configuration across them.

  • You can copy SharePoint Server content databases to the failover farm by using asynchronous mirroring, asynchronous commit on an availability group replica, or log shipping.

    Note

    SQL Server mirroring can only be used to copy databases to a single mirror server, but you can log-ship to multiple secondary servers.

    The SQL Server database mirroring feature will be removed in a future version. We recommend that you avoid using this feature in new development work and plan to change applications that currently use it. Use AlwaysOn Availability Groups instead.

  • Service applications vary in whether they can be log-shipped to a farm. For more information, see Service application redundancy later in this article.

The hot standby farm topology can be repeated across more than one data center, as long as you configure SQL Server log shipping to one or more additional data centers.

Important

Available network bandwidth and latency are major considerations when you use a failover approach for disaster recovery. We recommend that you consult with your SAN vendor to determine whether you can use SAN replication or another supported mechanism to provide the hot standby level of availability across data centers.

Service application redundancy

To provide availability across data centers for service applications, we recommend that you run the services that can be run cross-farm in a separate services farm that can be accessed from both the primary and the secondary data centers.

For services that cannot be run cross-farm, and to provide availability for the services farm itself, the strategy for providing redundancy across data centers varies by service application. The strategy employed depends on whether:

  • There is business value in running the service application in the disaster recovery farm when it is not being used.

  • The databases associated with the service application can be log-shipped, asynchronously mirrored, or replicated by using asynchronous commit.

  • The service application can run against read-only databases.

Before you design a disaster recovery solution that uses a warm or hot standby data center, review the article Supported high availability and disaster recovery options for SharePoint databases.

System requirements for recovery

In an ideal scenario, the failover components and systems match the primary components and systems in all ways: platform, hardware, and number of servers. At a minimum, the failover environment must be able to handle the traffic that you expect during a failover. Keep in mind that the failover site may have to serve only a subset of users. The systems must match in at least the following:

  • Operating system version and all updates

  • SQL Server version and all updates

  • SharePoint Server version and all updates
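The matching requirement above can be checked with a simple comparison, assuming you have already collected a version or build string for each component on both farms. The following Python sketch is illustrative only: the key names, the function name, and the placeholder build strings are all hypothetical, and how you gather the real values is outside its scope.

```python
# Hypothetical component keys, one per item in the list above.
REQUIRED_KEYS = ("operating_system", "sql_server", "sharepoint_server")


def find_version_mismatches(primary: dict, failover: dict) -> list:
    """Return the components that are missing or differ between the two farms."""
    mismatches = []
    for key in REQUIRED_KEYS:
        if key not in primary or key not in failover or primary[key] != failover[key]:
            mismatches.append(key)
    return mismatches


# Placeholder build strings, not real version numbers.
primary = {"operating_system": "os-build-A", "sql_server": "sql-build-A",
           "sharepoint_server": "sp-build-A"}
failover = {"operating_system": "os-build-A", "sql_server": "sql-build-B",
            "sharepoint_server": "sp-build-A"}
print(find_version_mismatches(primary, failover))  # ['sql_server']
```

An empty list means the two farms match on all three components. A missing entry is treated as a mismatch so that incomplete inventory data is not mistaken for consistency.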

In addition to the previous requirements, farm recovery time is also affected by the availability of facilities and infrastructure components. Make sure that the following requirements are met:

  • Power, cooling, network, directory, and SMTP are fully redundant.

  • Choose a switching mechanism, whether DNS or hardware load balancing, that meets your needs.

See also

Concepts

High availability and disaster recovery concepts in SharePoint Server