
Apply design principles and advanced operations

The first three cloud management disciplines describe a management baseline. At a minimum, a management baseline should include a standard business commitment to minimize business interruptions and accelerate recovery if service is interrupted. Most management baselines include a disciplined focus on maintaining "inventory and visibility," "operational compliance," and "protection and recovery."

The purpose of a management baseline is to create a consistent offering that provides a minimum level of business commitment for all supported workloads. This baseline of common, repeatable management offerings allows the team to deliver a highly optimized degree of operational management, with minimal deviation. But that standard offering might not provide a rich enough commitment to the business.

The diagram in the next section illustrates three ways to go beyond the management baseline.

The management baseline should meet the minimum commitment required by 80 percent of the lowest-criticality workloads in the portfolio. The baseline should not be applied to mission-critical workloads. Nor should it be applied to common platforms that are shared across workloads. Those workloads require a focus on design principles and advanced operations.
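
As a rough illustration of the guidance above, the following Python sketch splits a portfolio into workloads covered by the baseline and workloads that need design principles and advanced operations. The `Workload` type, its fields, the 1-to-5 criticality scale, and the `MISSION_CRITICAL` threshold are all assumptions made for this example; they are not defined by the Cloud Adoption Framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    criticality: int              # hypothetical scale: 1 (lowest) to 5 (mission-critical)
    shared_platform: bool = False  # True for a common platform shared across workloads


MISSION_CRITICAL = 5  # assumed threshold for "mission-critical"


def split_portfolio(workloads, baseline_share=0.8):
    """Return (baseline, advanced) lists of workloads.

    The lowest-criticality `baseline_share` of the portfolio is a candidate
    for the management baseline. Mission-critical workloads and shared
    platforms are always routed to advanced operations.
    """
    ranked = sorted(workloads, key=lambda w: w.criticality)
    cutoff = int(len(ranked) * baseline_share)
    baseline, advanced = [], []
    for index, workload in enumerate(ranked):
        excluded = (
            workload.shared_platform
            or workload.criticality >= MISSION_CRITICAL
        )
        if index < cutoff and not excluded:
            baseline.append(workload)
        else:
            advanced.append(workload)
    return baseline, advanced
```

In practice, criticality would more likely come from a business impact assessment than from a single integer, so treat the ranking here as a placeholder for that judgment.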

Advanced operations options

There are three suggested paths for improving business commitments beyond the management baseline, as shown in the following diagram:

[Diagram: Advanced operations]

Enhanced management baseline

As outlined in the Azure Management Guide, an enhanced management baseline uses cloud-native tools to improve uptime and decrease recovery times. The improvements are significant, but less so than with workload or platform specialization. The advantage of an enhanced management baseline is the equally significant reduction in cost and implementation time.

Management specialization

Aspects of workload and platform operations might require changes to design and architecture principles. Those changes could take time and might result in increased operating expenses. An enhanced management baseline could improve the business commitment enough to reduce the number of workloads that require such investments.

For workloads that warrant a higher investment to meet a business commitment, specialization of operations is key.

Areas of management specialization

There are two areas of specialization:

  • Platform specialization: Invest in ongoing operations of a shared platform, distributing the investment across multiple workloads.
  • Workload specialization: Invest in ongoing operations of a specific workload, generally reserved for mission-critical workloads.

Central IT team or cloud center of excellence (CCoE)

Decisions between platform specialization and workload specialization are based on the criticality and impact of each workload. However, these decisions are also indicative of larger cultural decisions between the central IT team and CCoE organizational models.

Workload specialization often triggers a cultural change. Traditional IT and centralized IT both build processes that can provide support at scale. Support at scale is more achievable for repeatable services found in a management baseline, enhanced baseline, or even platform operations. Workload specialization doesn't often scale. This lack of scale makes it difficult for a centralized IT organization to provide necessary support without reaching organizational scale limitations.

Alternatively, a cloud center of excellence approach scales through purposeful delegation of responsibility and selective centralization. Workload specialization tends to better align with the delegated responsibility approach of a CCoE.

The natural alignment of roles in a CCoE is outlined as follows:

  • The cloud platform team helps build common platforms that support multiple cloud adoption teams.
  • The cloud automation team extends those platforms into deployable assets in a service catalog.
  • Cloud management delivers the management baseline centrally and helps support the use of the service catalog.
  • But the business unit (in the form of a business DevOps team or cloud adoption team) holds responsibility for day-to-day operations of the workload, pipeline, or performance.

As for aligning areas of management, both the central IT team and CCoE models can generally deliver on platform specialization, with minimal cultural change. Delivering on workload specialization might be more complex for central IT teams.

Management specialization processes

Within each specialization, the following four-step process is delivered in a disciplined, iterative approach. This approach requires partnership among cloud adoption, cloud platform, cloud automation, and cloud management experts to create a viable and informed feedback loop.

  • Improve system design: Improve the design of common systems (platforms) or specific workloads to effectively minimize interruptions.
  • Automate remediation: Some improvements are not cost effective. In such cases, it might make more sense to automate remediation and reduce the impact of interruptions.
  • Scale the solution: As systems design and automated remediation are improved, you can scale those changes across the environment through the service catalog.
  • Continuous improvement: You can use various monitoring tools to discover incremental improvements to address in the next pass of system design, automation, and scale.

Improve system design

Improving system design is the most effective approach to improving operations of any common platform. System design improvements can help increase stability and decrease business interruptions. Design of individual systems is out of scope for the environment view taken throughout the Cloud Adoption Framework.

As a complement to this framework, the Microsoft Azure Well-Architected Framework provides guiding tenets for improving the quality of a platform or a specific workload. The framework focuses on improvement across five pillars of architecture excellence:

  • Cost optimization: Manage costs to maximize the value delivered.
  • Operational excellence: Follow operational processes that keep a system running in production.
  • Performance efficiency: Scale systems to adapt to changes in load.
  • Reliability: Design systems to recover from failures and continue to function.
  • Security: Protect applications and data from threats.

Most business interruptions equate to some form of technical debt, or deficiency in the architecture. For existing deployments, systems design improvements can be viewed as payments against existing technical debt. For new deployments, systems design improvements can be viewed as avoidance of technical debt. The next section, "Automated remediation," looks at ways to address technical debt that can't or shouldn't be addressed.

To improve system design, learn more about the Microsoft Azure Well-Architected Framework. As your system design improves, return to this article to find new opportunities to improve and scale the improvements across your environment.

Automated remediation

Some technical debt can't or shouldn't be addressed. Resolution could be too expensive. It could be planned but might have a long project duration. The business interruption might not have a significant business impact, or the business priority might be to recover quickly instead of investing in resiliency.

When resolution of technical debt isn't the desired path, automated remediation is commonly the desired next step. Using Azure Automation and Azure Monitor to detect trends and provide automated remediation is the most common approach to automated remediation.
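
The detect-then-remediate pattern described above can be sketched in plain Python. This is not the Azure Automation or Azure Monitor API. The function names, the threshold, the window size, and the `runbook` callback are hypothetical stand-ins for an alert rule and the remediation it triggers.

```python
from statistics import mean
from typing import Callable, Optional, Sequence


def should_remediate(
    samples: Sequence[float], threshold: float, window: int = 3
) -> bool:
    """Return True when the mean of the last `window` samples exceeds `threshold`.

    Averaging over a window is a simple way to react to a sustained trend
    rather than to a single spike.
    """
    if len(samples) < window:
        return False  # not enough data to call it a trend
    return mean(samples[-window:]) > threshold


def remediate_if_needed(
    samples: Sequence[float],
    threshold: float,
    runbook: Callable[[], str],
    window: int = 3,
) -> Optional[str]:
    """Invoke the (hypothetical) remediation callback on a sustained breach."""
    if should_remediate(samples, threshold, window):
        return runbook()
    return None
```

For example, with CPU readings of `[50, 92, 95, 97]` and a threshold of 90, the last three samples average above the threshold, so the callback runs. A single spike such as `[50, 60, 95]` does not trigger it.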

For guidance on automated remediation, see Azure Automation and alerts.

Scale the solution with a service catalog

The cornerstone of platform specialization and platform operations is a well-managed service catalog. This is how improvements to systems design and remediation are scaled across an environment. The cloud platform team and cloud automation team align to create repeatable solutions to the most common platforms in any environment. However, if those solutions aren't consistently applied, cloud management can provide little more than a baseline offering.

To maximize adoption and minimize maintenance overhead of any optimized platform, the platform should be added to a service catalog. Each application in the catalog can be deployed for internal consumption via the service catalog, or as a marketplace offering for external consumers.

For information about publishing to a service catalog, see the series on publishing to a service catalog.

Continuous improvement

Platform specialization and platform operations both depend on strong feedback loops between adoption, platform, automation, and management teams. Grounding those feedback loops in data empowers each team to make wise decisions. For platform operations to achieve long-term business commitments, it's important to take advantage of insights that are specific to the centralized platform. Because containers and SQL Server are the two most common centrally managed platforms, consider beginning with continuous improvement data collection by reviewing the following articles: