
Monitor the process server

This article describes how to monitor the Site Recovery process server.

  • The process server is used when you set up disaster recovery of on-premises VMware VMs and physical servers to Azure.
  • By default, the process server runs on the configuration server and is installed when you deploy the configuration server.
  • Optionally, to handle more replicated machines and higher volumes of replication traffic, you can deploy additional scale-out process servers.

Learn more about the role and deployment of process servers.

Monitoring overview

Because the process server has many roles, particularly caching, compressing, and transferring replicated data to Azure, it's important to monitor its health on an ongoing basis.

Several situations commonly affect process server performance. Performance issues have a cascading effect on VM health, eventually pushing both the process server and its replicated machines into a critical state. These situations include:

  • A high number of VMs use the process server, approaching or exceeding the recommended limits.
  • VMs using the process server have a high churn rate.
  • Network throughput between the VMs and the process server isn't sufficient to upload replication data to the process server.
  • Network throughput between the process server and Azure isn't sufficient to upload replication data from the process server to Azure.
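The two network-throughput situations above come down to simple arithmetic: replication keeps up only if each hop can carry the combined churn of all VMs that use the process server. The following Python sketch illustrates that check. It is a simplified model, not how Site Recovery measures anything; the function name `check_throughput`, the per-VM churn figures, and the MB/s units are assumptions for illustration, and the sketch ignores the compression the process server applies before sending data to Azure.

```python
from dataclasses import dataclass


@dataclass
class ThroughputCheck:
    """Hypothetical result type for this sketch."""
    total_churn_mb_per_s: float
    source_to_ps_ok: bool   # VMs -> process server hop keeps up
    ps_to_azure_ok: bool    # process server -> Azure hop keeps up


def check_throughput(vm_churn_mb_per_s, source_to_ps_mb_per_s, ps_to_azure_mb_per_s):
    """Compare the combined churn of all VMs against each network hop.

    Simplified model: assumes steady-state churn and ignores compression.
    """
    total = sum(vm_churn_mb_per_s)
    return ThroughputCheck(
        total_churn_mb_per_s=total,
        source_to_ps_ok=total <= source_to_ps_mb_per_s,
        ps_to_azure_ok=total <= ps_to_azure_mb_per_s,
    )


# Three VMs, one with a high churn rate; here the Azure uplink is the bottleneck.
result = check_throughput([5.0, 12.5, 2.5], source_to_ps_mb_per_s=30.0, ps_to_azure_mb_per_s=15.0)
print(result)
```

In this example the combined churn (20 MB/s) fits the local link but exceeds the uplink to Azure, so data would accumulate in the process server cache.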

All of these issues can affect the recovery point objective (RPO) of VMs.

Why? Because generating a recovery point for a VM requires all disks on the VM to have a common point. If one disk has a high churn rate, replication is slow, or the process server isn't performing optimally, recovery points are created less efficiently.
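A simplified way to see why one slow disk holds back the whole VM: if each disk has replicated data up to some point in time, the newest point that all disks share is the earliest of those times, so the slowest disk sets the VM's effective RPO. The Python sketch below models only that idea. The input format (a mapping from disk name to the timestamp of its latest replicated data) and the function names are hypothetical, and real recovery points also involve consistency handling that this sketch ignores.

```python
from datetime import datetime, timedelta


def latest_common_point(disk_points):
    """Return the newest point in time that every disk has reached.

    disk_points: hypothetical mapping of disk name -> datetime of the latest
    data replicated for that disk.
    """
    if not disk_points:
        raise ValueError("a VM needs at least one disk")
    # The common point can be no newer than the slowest disk.
    return min(disk_points.values())


def effective_rpo(disk_points, now):
    """Age of the latest common point, used here as a stand-in for the VM's RPO."""
    return now - latest_common_point(disk_points)


now = datetime(2024, 1, 1, 12, 0)
disks = {
    "os-disk": now - timedelta(minutes=1),
    "data-disk-1": now - timedelta(minutes=2),
    "data-disk-2": now - timedelta(minutes=20),  # high-churn disk lagging behind
}
print(effective_rpo(disks, now))  # 0:20:00
```

Even though two disks are within two minutes of the present, the lagging disk pushes the VM's effective RPO to 20 minutes.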

Monitor proactively

To avoid issues with the process server, it's important to:

  • Understand the specific requirements for process servers by using the capacity and sizing guidance, and make sure process servers are deployed and running according to those recommendations.
  • Monitor alerts and troubleshoot issues as they occur to keep process servers running efficiently.

Process server alerts

The process server generates a number of health alerts, summarized in the following table.

Alert type   Details
Healthy      The process server is connected and healthy.
Warning      CPU utilization > 80% for the last 15 minutes.
Warning      Memory usage > 80% for the last 15 minutes.
Warning      Cache folder free space < 30% for the last 15 minutes.
Warning      Site Recovery monitors pending/outgoing data every five minutes and estimates that data in the process server cache can't be uploaded to Azure within 30 minutes.
Warning      Process server services haven't been running for the last 15 minutes.
Critical     CPU utilization > 95% for the last 15 minutes.
Critical     Memory usage > 95% for the last 15 minutes.
Critical     Cache folder free space < 25% for the last 15 minutes.
Critical     Site Recovery monitors pending/outgoing data every five minutes and estimates that data in the process server cache can't be uploaded to Azure within 45 minutes.
Critical     No heartbeat from the process server for 15 minutes.
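To make the thresholds in the table concrete, here is a Python sketch of an evaluator that applies them to a single metrics sample. It is an illustration, not Site Recovery's implementation: the `ProcessServerSample` fields and the `evaluate` function are hypothetical, each value is assumed to already reflect the condition over the last 15 minutes, and the upload-time estimate is a naive backlog-divided-by-rate calculation, since the article doesn't say how Site Recovery makes its estimate.

```python
import math
from dataclasses import dataclass
from enum import IntEnum


class Health(IntEnum):
    HEALTHY = 0
    WARNING = 1
    CRITICAL = 2


@dataclass
class ProcessServerSample:
    """Hypothetical metrics; each value is assumed to hold for the last 15 minutes."""
    cpu_percent: float
    memory_percent: float
    cache_free_percent: float
    pending_upload_mb: float
    upload_rate_mb_per_s: float
    services_running: bool
    heartbeat_received: bool


def _grade(critical, warning):
    """Map two threshold checks to a health level."""
    if critical:
        return Health.CRITICAL
    if warning:
        return Health.WARNING
    return Health.HEALTHY


def estimated_upload_minutes(pending_mb, rate_mb_per_s):
    """Naive estimate of how long the cached backlog takes to reach Azure."""
    if pending_mb <= 0:
        return 0.0
    if rate_mb_per_s <= 0:
        return math.inf
    return pending_mb / rate_mb_per_s / 60


def evaluate(s):
    """Return (alert name, level) pairs for every non-healthy row of the table."""
    upload_min = estimated_upload_minutes(s.pending_upload_mb, s.upload_rate_mb_per_s)
    levels = {
        "cpu": _grade(s.cpu_percent > 95, s.cpu_percent > 80),
        "memory": _grade(s.memory_percent > 95, s.memory_percent > 80),
        "cache_free_space": _grade(s.cache_free_percent < 25, s.cache_free_percent < 30),
        "upload_backlog": _grade(upload_min > 45, upload_min > 30),
        "services": _grade(False, not s.services_running),
        "heartbeat": _grade(not s.heartbeat_received, False),
    }
    return [(name, lvl) for name, lvl in levels.items() if lvl != Health.HEALTHY]


sample = ProcessServerSample(
    cpu_percent=85, memory_percent=50, cache_free_percent=28,
    pending_upload_mb=3600, upload_rate_mb_per_s=1.0,
    services_running=True, heartbeat_received=True,
)
for name, level in evaluate(sample):
    print(name, level.name)
```

With 3,600 MB pending at 1 MB/s, the backlog needs about 60 minutes to upload, which crosses the 45-minute critical threshold, while CPU and cache free space only reach the warning level.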


Note

The overall health status of the process server is based on the worst alert generated.
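The "worst alert wins" rule can be expressed as taking the maximum over ordered severity levels. In this minimal Python sketch, the `Health` enum and its integer ordering (healthy < warning < critical) are assumptions made so that `max` picks the most severe level; it is not an API exposed by Site Recovery.

```python
from enum import IntEnum


class Health(IntEnum):
    HEALTHY = 0
    WARNING = 1
    CRITICAL = 2


def overall_health(alert_levels):
    """Overall status = the worst individual alert; HEALTHY if there are none."""
    return max(alert_levels, default=Health.HEALTHY)


print(overall_health([Health.WARNING, Health.CRITICAL]).name)  # CRITICAL
print(overall_health([]).name)  # HEALTHY
```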

Monitor process server health

You can monitor the health of your process servers as follows:

  1. To monitor the replication health and status of a replicated machine and its process server, in the vault, select Replicated items, and then click the machine you want to monitor.

  2. In Replication Health, you can monitor the VM health status. Click the status to drill down for error details.

    Figure: Process server health in the VM dashboard

  3. In Process Server Health, you can monitor the status of the process server. Drill down for details.

    Figure: Process server details in the VM dashboard

  4. You can also monitor health by using the graphical representation on the VM page.

    • A scale-out process server is highlighted in orange if it has warnings, and in red if it has any critical issues.
    • If the process server runs in the default deployment on the configuration server, the configuration server is highlighted accordingly.
    • To drill down, click the configuration server or process server. Note any issues and any remediation recommendations.

You can also monitor process servers in the vault under Site Recovery Infrastructure. In Manage your Site Recovery infrastructure, click Configuration Servers. Select the configuration server associated with the process server, and then drill down into the process server details.

Next steps