Always On Failover Cluster Instances (SQL Server)

APPLIES TO: SQL Server (yes), Azure SQL Database (no), Azure SQL Data Warehouse (no), Parallel Data Warehouse (no)

As part of the SQL Server Always On offering, Always On Failover Cluster Instances use Windows Server Failover Clustering (WSFC) functionality to provide local high availability through redundancy at the server-instance level: a failover cluster instance (FCI). An FCI is a single instance of SQL Server that is installed across WSFC nodes and, possibly, across multiple subnets. On the network, an FCI appears to be an instance of SQL Server running on a single computer, but the FCI provides failover from one WSFC node to another if the current node becomes unavailable.

An FCI can use availability groups to provide remote disaster recovery at the database level. For more information, see Failover Clustering and Availability Groups (SQL Server).

Note

Windows Server 2016 Datacenter edition introduces support for Storage Spaces Direct (S2D). SQL Server Failover Cluster Instances support S2D for cluster storage resources. For more information, see Storage Spaces Direct in Windows Server 2016.

Failover Cluster Instances also support Cluster Shared Volumes (CSV). For more information, see Understanding Cluster Shared Volumes in a Failover Cluster.


Benefits of a Failover Cluster Instance

When a server has a hardware or software failure, the applications or clients connecting to that server experience downtime. When a SQL Server instance is configured as an FCI (instead of a standalone instance), the high availability of that instance is protected by the presence of redundant nodes in the FCI. Only one of the nodes in the FCI owns the WSFC resource group at a time. In case of a failure (hardware, operating system, application, or service failure) or a planned upgrade, ownership of the resource group moves to another WSFC node. This process is transparent to the client or application connecting to SQL Server, which minimizes the downtime that applications or clients experience during a failure. SQL Server failover cluster instances provide the following key benefits:

  • Protection at the instance level through redundancy

  • Automatic failover in the event of a failure (hardware, operating system, application, or service failure)

    Important

    In an availability group, automatic failover from an FCI to other nodes within the availability group is not supported. This means that FCIs and standalone nodes should not be coupled together within an availability group if automatic failover is an important component of your high availability solution. However, this coupling can be used for your disaster recovery solution.

  • Support for a broad array of storage solutions, including WSFC cluster disks (iSCSI, Fibre Channel, and so on) and server message block (SMB) file shares.

  • Disaster recovery solution using a multi-subnet FCI or running an FCI-hosted database inside an availability group. With the multi-subnet support introduced in Microsoft SQL Server 2012 (11.x), a multi-subnet FCI no longer requires a virtual LAN, which improves the manageability and security of a multi-subnet FCI.

  • Zero reconfiguration of applications and clients during failovers

  • Flexible failover policy for granular trigger events for automatic failovers

  • Reliable failovers through periodic and detailed health detection using dedicated and persisted connections

  • Configurability and predictability in failover time through indirect background checkpoints

  • Throttled resource usage during failovers

Recommendations

In a production environment, we recommend that you use static IP addresses in conjunction with the virtual IP address of a Failover Cluster Instance. We recommend against using DHCP in a production environment. In the event of downtime, if the DHCP IP lease expires, extra time is required to re-register the new DHCP IP address associated with the DNS name.

Failover Cluster Instance Overview

An FCI runs in a WSFC resource group with one or more WSFC nodes. When the FCI starts up, one of the nodes assumes ownership of the resource group and brings its SQL Server instance online. The resources owned by this node include:

  • Network name

  • IP address

  • Shared disks

  • SQL Server Database Engine service

  • SQL Server Agent service

  • SQL Server Analysis Services service, if installed

  • One file share resource, if the FILESTREAM feature is installed

At any time, only the resource group owner (and no other node in the FCI) is running its respective SQL Server services in the resource group. When a failover occurs, whether automatic or planned, the following sequence of events happens:

  1. Unless a hardware or system failure occurs, all dirty pages in the buffer cache are written to disk.

  2. All respective SQL Server services in the resource group are stopped on the active node.

  3. The resource group ownership is transferred to another node in the FCI.

  4. The new resource group owner starts its SQL Server services.

  5. Client application connection requests are automatically directed to the new active node using the same virtual network name (VNN).
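
The five steps above can be sketched as a toy state machine. This is only an illustrative model under stated assumptions: the `Node` and `ToyFci` classes, their fields, and the names `node-a`, `node-b`, and `sqlfci01` are hypothetical and do not correspond to any WSFC or SQL Server API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical WSFC node (illustration only, not a real API)."""
    name: str
    services_running: bool = False


@dataclass
class ToyFci:
    """Toy model of an FCI: one resource group with a single owner at a time."""
    vnn: str
    nodes: list
    owner: Node = None
    dirty_pages: int = 0
    log: list = field(default_factory=list)

    def start(self):
        # One node assumes ownership and brings its instance online.
        self.owner = self.nodes[0]
        self.owner.services_running = True

    def fail_over(self, target, hardware_failure=False):
        old = self.owner
        # Step 1: dirty pages are flushed unless a hardware or system
        # failure prevents it (the toy model simply skips the flush then).
        if not hardware_failure:
            self.log.append(f"flushed {self.dirty_pages} dirty pages")
            self.dirty_pages = 0
        # Step 2: services are stopped on the active node.
        old.services_running = False
        self.log.append(f"stopped services on {old.name}")
        # Step 3: resource group ownership moves to another node.
        self.owner = target
        self.log.append(f"ownership moved to {target.name}")
        # Step 4: the new owner starts its services.
        target.services_running = True
        self.log.append(f"started services on {target.name}")
        # Step 5: clients keep connecting through the same VNN.
        self.log.append(f"clients reconnect via {self.vnn}")

    def resolve(self):
        """Return the name of the node currently answering for the VNN."""
        return self.owner.name


if __name__ == "__main__":
    a, b = Node("node-a"), Node("node-b")
    fci = ToyFci(vnn="sqlfci01", nodes=[a, b], dirty_pages=42)
    fci.start()
    fci.fail_over(b)
    print(fci.resolve(), fci.log)
```

The model keeps exactly one owner at any time, which mirrors the statement that only the resource group owner runs the SQL Server services.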

The FCI is online as long as its underlying WSFC cluster is in good quorum health (the majority of the quorum WSFC nodes are available as automatic failover targets). When the WSFC cluster loses its quorum, whether because of a hardware, software, or network failure or an improper quorum configuration, the entire WSFC cluster, along with the FCI, is brought offline. In this unplanned failover scenario, manual intervention is required to reestablish quorum among the remaining available nodes and bring the WSFC cluster and FCI back online. For more information, see WSFC Quorum Modes and Voting Configuration (SQL Server).
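
As a rough illustration of the majority rule described above, the following sketch checks whether more than half of the votes are online. It assumes a plain node-majority model; the function name `has_quorum` is hypothetical, and real WSFC quorum modes (witnesses, vote weights, and so on) are not modeled here.

```python
def has_quorum(total_votes: int, votes_online: int) -> bool:
    """Simplified majority check: more than half of all votes must be online.

    Assumption: every voter carries exactly one vote. Actual WSFC quorum
    behavior depends on the configured quorum mode.
    """
    if total_votes <= 0 or not 0 <= votes_online <= total_votes:
        raise ValueError("invalid vote counts")
    return votes_online > total_votes // 2


if __name__ == "__main__":
    for total, online in [(5, 3), (5, 2), (4, 2), (3, 2)]:
        print(total, online, has_quorum(total, online))
```

Under this model a four-vote cluster with two votes online does not have quorum, because two is not a strict majority of four.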

Predictable Failover Time

Depending on when your SQL Server instance last performed a checkpoint operation, there can be a substantial number of dirty pages in the buffer cache. Consequently, a failover lasts as long as it takes to write the remaining dirty pages to disk, which can lead to long and unpredictable failover times. Beginning with Microsoft SQL Server 2012 (11.x), the FCI can use indirect checkpoints to throttle the number of dirty pages kept in the buffer cache. While this does consume additional resources under regular workload, it makes the failover time more predictable as well as more configurable. This is very useful when the service-level agreement in your organization specifies the recovery time objective (RTO) for your high availability solution. For more information on indirect checkpoints, see Indirect Checkpoints.
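
Indirect checkpoints are configured per database through a target recovery time. As a minimal sketch, the helper below only builds the corresponding `ALTER DATABASE ... SET TARGET_RECOVERY_TIME` statement as a string; it does not connect to a server. The function name and the database name `SalesDb` are assumptions made for illustration, and an appropriate value depends on your own RTO.

```python
def target_recovery_time_sql(database: str, seconds: int) -> str:
    """Build a T-SQL statement that sets the target recovery time.

    The helper itself is hypothetical; it only formats text and leaves
    execution to whatever client tooling you already use.
    """
    if seconds <= 0:
        raise ValueError("seconds must be a positive integer")
    # Bracket-quote the identifier, doubling any closing bracket.
    quoted = "[" + database.replace("]", "]]") + "]"
    return f"ALTER DATABASE {quoted} SET TARGET_RECOVERY_TIME = {seconds} SECONDS;"


if __name__ == "__main__":
    print(target_recovery_time_sql("SalesDb", 60))
```

A shorter target generally means fewer dirty pages to flush at failover, at the cost of more background write activity during normal operation.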

Reliable Health Monitoring and Flexible Failover Policy

After the FCI starts successfully, the WSFC service monitors both the health of the underlying WSFC cluster and the health of the SQL Server instance. Beginning with Microsoft SQL Server 2012 (11.x), the WSFC service uses a dedicated connection to poll the active SQL Server instance for detailed component diagnostics through a system stored procedure. The implication of this is three-fold:

  • The dedicated connection to the SQL Server instance makes it possible to reliably poll for component diagnostics all the time, even when the FCI is under heavy load. This makes it possible to distinguish between a system that is under heavy load and a system that actually has failure conditions, thus preventing issues such as false failovers.

  • The detailed component diagnostics make it possible to configure a more flexible failover policy, whereby you can choose which failure conditions trigger failovers and which do not.

  • The detailed component diagnostics also enable better retroactive troubleshooting of automatic failovers. The diagnostic information is stored in log files, which are collocated with the SQL Server error logs. You can load them into the Log File Viewer to inspect the component states leading up to the failover and determine what caused it.

For more information, see Failover Policy for Failover Cluster Instances.
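
The idea of choosing which failure conditions trigger a failover can be sketched as a small policy function. This is a conceptual model only: the component names, the state labels `clean`, `warning`, and `error`, and the function `should_fail_over` are illustrative assumptions, not the actual diagnostics output or policy settings of SQL Server.

```python
# Hypothetical severity ranking for component states (illustration only).
SEVERITY = {"clean": 0, "warning": 1, "error": 2}


def should_fail_over(diagnostics: dict, trigger_components: set) -> bool:
    """Return True if any component selected by the policy reports an error.

    diagnostics maps a component name to its state; trigger_components is
    the set of components whose errors the policy treats as failover triggers.
    """
    return any(
        component in trigger_components and SEVERITY[state] >= SEVERITY["error"]
        for component, state in diagnostics.items()
    )


if __name__ == "__main__":
    poll = {"system": "clean", "resource": "error", "query_processing": "warning"}
    # A strict policy reacts to resource errors; a lenient one ignores them.
    print(should_fail_over(poll, {"system", "resource"}))
    print(should_fail_over(poll, {"system"}))
```

The same diagnostics lead to different outcomes under different policies, which is the flexibility the second bullet describes.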

Elements of a Failover Cluster Instance

An FCI consists of a set of physical servers (nodes) that have similar hardware configurations and identical software configurations, including operating system version and patch level, and SQL Server version, patch level, components, and instance name. Identical software configuration is necessary to ensure that the FCI is fully functional as it fails over between the nodes.

WSFC Resource Group
A SQL Server FCI runs in a WSFC resource group. Each node in the resource group maintains a synchronized copy of the configuration settings and check-pointed registry keys to ensure full functionality of the FCI after a failover, and only one of the nodes in the cluster owns the resource group at a time (the active node). The WSFC service manages the server cluster, quorum configuration, failover policy, and failover operations, as well as the VNN and virtual IP addresses for the FCI. In case of a failure (hardware, operating system, application, or service failure) or a planned upgrade, the resource group ownership is moved to another node in the FCI. The number of nodes that are supported in a WSFC resource group depends on your SQL Server edition. Also, the same WSFC cluster can run multiple FCIs (multiple resource groups), depending on your hardware capacity, such as CPUs, memory, and number of disks.

SQL Server Binaries
The product binaries are installed locally on each node of the FCI, a process similar to SQL Server stand-alone installations. However, during startup, the services are not started automatically; they are managed by WSFC.

Storage
In contrast to an availability group, an FCI must use shared storage between all nodes of the FCI for database and log storage. The shared storage can be in the form of WSFC cluster disks, disks on a SAN, Storage Spaces Direct (S2D), or SMB file shares. This way, all nodes in the FCI have the same view of instance data whenever a failover occurs. This does mean, however, that the shared storage can become a single point of failure, and the FCI depends on the underlying storage solution to ensure data protection.

Network Name
The VNN for the FCI provides a unified connection point for the FCI. This allows applications to connect to the VNN without needing to know the currently active node. When a failover occurs, the VNN is registered to the new active node after it starts. This process is transparent to the client or application connecting to SQL Server, which minimizes the downtime that applications or clients experience during a failure.

Virtual IPs
In the case of a multi-subnet FCI, a virtual IP address is assigned to each subnet in the FCI. During a failover, the VNN on the DNS server is updated to point to the virtual IP address for the respective subnet. Applications and clients can then connect to the FCI using the same VNN after a multi-subnet failover.
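
The relationship between subnets, virtual IP addresses, and the VNN can be illustrated with a small lookup. This is a sketch under assumptions: the subnet ranges, addresses, and the function `vnn_target` are invented for the example, and the actual DNS update is performed by WSFC, not by code like this.

```python
import ipaddress


def vnn_target(virtual_ips: dict, active_node_ip: str) -> str:
    """Return the virtual IP that the VNN should point to.

    virtual_ips maps each subnet (CIDR notation) to the virtual IP assigned
    to the FCI in that subnet. The VNN follows the subnet that contains the
    currently active node.
    """
    node = ipaddress.ip_address(active_node_ip)
    for subnet, virtual_ip in virtual_ips.items():
        if node in ipaddress.ip_network(subnet):
            return virtual_ip
    raise LookupError(f"no virtual IP configured for {active_node_ip}")


if __name__ == "__main__":
    # Hypothetical two-subnet FCI.
    vips = {"10.1.0.0/24": "10.1.0.50", "10.2.0.0/24": "10.2.0.50"}
    print(vnn_target(vips, "10.1.0.11"))  # active node in the first subnet
    print(vnn_target(vips, "10.2.0.12"))  # after failover to the second subnet
```

Clients keep using the same name throughout; only the address behind that name changes when the active node moves to another subnet.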

SQL Server Failover Concepts and Tasks

Concepts and Tasks | Topic
Describes the failure detection mechanism and the flexible failover policy. | Failover Policy for Failover Cluster Instances
Describes concepts in FCI administration and maintenance. | Failover Cluster Instance Administration and Maintenance
Describes multi-subnet configuration and concepts. | SQL Server Multi-Subnet Clustering (SQL Server)

Related Topics

Topic descriptions | Topic
Describes how to install a new SQL Server FCI. | Create a New SQL Server Failover Cluster (Setup)
Describes how to upgrade to a SQL Server 2017 failover cluster. | Upgrade a SQL Server Failover Cluster Instance
Describes Windows Failover Clustering concepts and provides links to tasks related to Windows Failover Clustering. | Windows Server 2008: Overview of Failover Clusters; Windows Server 2008 R2: Overview of Failover Clusters
Describes the distinctions in concepts between nodes in an FCI and replicas within an availability group, and considerations for using an FCI to host a replica for an availability group. | Failover Clustering and Availability Groups (SQL Server)