Establish an operational fitness review

As your enterprise begins to operate workloads in Azure, the next step is to establish a process for operational fitness review. This process enumerates, implements, and iteratively reviews the nonfunctional requirements for these workloads. Nonfunctional requirements are related to the expected operational behavior of the service.

There are five essential categories of nonfunctional requirements, known as the pillars of architecture excellence:

  • Cost optimization
  • Operational excellence
  • Performance efficiency
  • Reliability
  • Security

A process for operational fitness review ensures that your mission-critical workloads meet the expectations of your business with respect to the pillars.

Create a process for operational fitness review to fully understand the problems that result from running workloads in a production environment, and how to remediate and resolve those problems. This article outlines a high-level process for operational fitness review that your enterprise can use to achieve this goal.

Operational fitness at Microsoft

From the outset, many teams across Microsoft have been involved in the development of the Azure platform. It's difficult to ensure quality and consistency for a project of such size and complexity. You need a robust process to enumerate and implement fundamental nonfunctional requirements on a regular basis.

The processes that Microsoft follows form the basis for the processes outlined in this article.

Understand the problem

As discussed in Get started: Accelerate migration, the first step in an enterprise's digital transformation is to identify the business problems to be solved by adopting Azure. The next step is to determine a high-level solution to the problem, such as migrating a workload to the cloud or adapting an existing, on-premises service to include cloud functionality. Finally, you design and implement the solution.

During this process, the focus is often on the features of the service: the set of functional requirements that describe what you want the service to do. For example, a product-delivery service requires features for determining the source and destination locations of the product, tracking the product during delivery, and sending notifications to the customer.

The nonfunctional requirements, in contrast, relate to properties such as the service's availability, resiliency, and scalability. These properties differ from the functional requirements because they don't directly affect the final function of any particular feature in the service. However, nonfunctional requirements do relate to the performance and continuity of the service.

You can specify some nonfunctional requirements in terms of a service-level agreement (SLA). For example, you can express service continuity as a percentage of availability: "available 99.99 percent of the time." Other nonfunctional requirements might be more difficult to define and might change as production needs change. For example, a consumer-oriented service might face unanticipated throughput requirements after a surge of popularity.
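To make an availability SLA concrete, it can help to convert the percentage into a downtime budget. The following Python sketch does that arithmetic. The function name `allowed_downtime` is hypothetical and isn't part of any Azure SDK, and the sketch assumes availability is measured over a fixed period such as a 30-day month.

```python
from datetime import timedelta


def allowed_downtime(availability_percent: float, period: timedelta) -> timedelta:
    """Return the maximum downtime permitted in `period` for an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    # The fraction of the period during which the service may be unavailable.
    unavailable_fraction = (100 - availability_percent) / 100
    return period * unavailable_fraction


if __name__ == "__main__":
    sla = 99.99
    for label, period in [("30-day month", timedelta(days=30)),
                          ("365-day year", timedelta(days=365))]:
        budget = allowed_downtime(sla, period)
        print(f"{sla}% over a {label}: {budget.total_seconds() / 60:.2f} minutes of downtime")
```

For a 99.99 percent target, this works out to about 4.32 minutes of downtime in a 30-day month and about 52.56 minutes in a 365-day year.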

Note

For more information about resiliency requirements, see Designing reliable Azure applications. That article includes explanations of concepts like recovery-point objective (RPO), recovery-time objective (RTO), and SLA.

Process for operational fitness review

The key to maintaining the performance and continuity of an enterprise's services is to implement a process for operational fitness review.

Figure: Overview of the process for operational fitness review

At a high level, the process has two phases. In the prerequisites phase, the requirements are established and mapped to supporting services. This phase occurs infrequently: perhaps annually or when new operations are introduced. The output of the prerequisites phase feeds the flow phase, described below as the service-review phase. The flow phase occurs more frequently, for example monthly.

Prerequisites phase

The steps in this phase capture the requirements for conducting a regular review of the important services.

  1. Identify critical business operations. Determine which of the enterprise's business operations are mission-critical. Business operations are independent of any supporting service functionality. In other words, business operations represent the actual activities that the business needs to perform and that a set of IT services supports.

    The term mission-critical (or business-critical) means that the business suffers a severe impact if the operation is impeded. For example, an online retailer might have business operations such as "enable a customer to add an item to a shopping cart" or "process a credit card payment." If either of these operations fails, a customer can't complete the transaction and the enterprise fails to realize sales.

  2. Map operations to services. For each critical business operation, identify the services that support it. In the shopping-cart example, several services might be involved, including an inventory stock-management service and a shopping-cart service. To process a credit card payment, an on-premises payment service might interact with a third-party payment-processing service.

  3. Analyze service dependencies. Most business operations require orchestration among multiple supporting services. It's important to understand the dependencies between the services, and the flow of mission-critical transactions through these services.

    Also consider the dependencies between on-premises services and Azure services. In the shopping-cart example, the inventory stock-management service might be hosted on-premises and ingest data that employees enter at a physical warehouse. However, it might store data off-premises in an Azure service, such as Azure Storage, or in a database, such as Azure Cosmos DB.
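Steps 2 and 3 amount to building a dependency map for each critical operation. The Python sketch below models the shopping-cart and payment examples as a small graph, walks it to find every service an operation depends on, and flags calls that cross between hosting environments, such as on-premises to Azure. All service names, the hosting labels, and the choice to host the shopping-cart service in Azure are hypothetical; a real map would come from your own inventory or monitoring data.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    hosting: str  # Illustrative labels: "on-premises", "azure", or "third-party"
    depends_on: list[str] = field(default_factory=list)


# Hypothetical inventory based on the shopping-cart and payment examples.
SERVICES = {
    "shopping-cart": Service("shopping-cart", "azure", ["inventory-stock"]),  # hosting assumed
    "inventory-stock": Service("inventory-stock", "on-premises", ["azure-storage"]),
    "azure-storage": Service("azure-storage", "azure"),
    "payment": Service("payment", "on-premises", ["third-party-payments"]),
    "third-party-payments": Service("third-party-payments", "third-party"),
}

# Step 2: map each critical business operation to the services that directly support it.
OPERATIONS = {
    "add item to shopping cart": ["shopping-cart"],
    "process credit card payment": ["payment"],
}


def transitive_dependencies(roots, services):
    """Return every service reachable from `roots`, including the roots themselves."""
    seen = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(services[name].depends_on)
    return seen


def hosting_crossings(roots, services):
    """Step 3: list (caller, dependency) pairs whose hosting environments differ."""
    crossings = []
    for name in sorted(transitive_dependencies(roots, services)):
        caller = services[name]
        for dep in caller.depends_on:
            if services[dep].hosting != caller.hosting:
                crossings.append((name, dep))
    return crossings


if __name__ == "__main__":
    for operation, roots in OPERATIONS.items():
        print(operation)
        print("  depends on:", sorted(transitive_dependencies(roots, SERVICES)))
        print("  crosses hosting boundaries:", hosting_crossings(roots, SERVICES))
```

Each boundary crossing is a place where a failure in one environment can stall a mission-critical transaction in another, so these pairs are natural candidates for explicit scorecard metrics.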

An output from these activities is a set of scorecard metrics for service operations. The scorecard measures criteria such as availability, scalability, and disaster recovery. Scorecard metrics express the operational criteria that you expect the service to meet. These metrics can be expressed at any level of granularity that's appropriate for the service operation.

The scorecard should be expressed in simple terms to facilitate meaningful discussion between the business owners and the engineering team. For example, a scorecard metric for scalability might be color-coded in a simple way: green means meeting the defined criteria, yellow means failing to meet the defined criteria but actively implementing a planned remediation, and red means failing to meet the defined criteria with no plan or action.

It's important to emphasize that these metrics should directly reflect business needs.
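The three-color rule is simple enough to state as code, which can help keep scorecards consistent across teams. This Python sketch assumes each metric has already been evaluated against its criteria and that you know whether a planned remediation is underway; the function name and boolean inputs are illustrative.

```python
def scorecard_color(meets_criteria: bool, remediation_underway: bool) -> str:
    """Apply the green/yellow/red rule to a single scorecard metric."""
    if meets_criteria:
        return "green"   # Meets the defined criteria.
    if remediation_underway:
        return "yellow"  # Misses the criteria, but a planned remediation is being implemented.
    return "red"         # Misses the criteria, with no plan or action.


if __name__ == "__main__":
    print(scorecard_color(meets_criteria=False, remediation_underway=True))  # yellow
```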

Service-review phase

The service-review phase is the core of the operational fitness review. It involves three steps:

  1. Measure service metrics. Use the scorecard metrics to monitor the services and ensure that they meet the business expectations. Service monitoring is essential. If you can't monitor a set of services with respect to the nonfunctional requirements, consider the corresponding scorecard metrics to be red. In this case, the first step for remediation is to implement the appropriate service monitoring. For example, if the business expects a service to operate with 99.99 percent availability, but there's no production telemetry in place to measure availability, assume that you're not meeting the requirement.

  2. Plan remediation. For each service operation whose metrics fall below an acceptable threshold, determine the cost of remediating the service to bring the operation to an acceptable level. If the cost of remediating the service is greater than the revenue that the service is expected to generate, move on to consider intangible costs, such as customer experience. For example, if customers have difficulty placing a successful order by using the service, they might choose a competitor instead.

  3. Implement remediation. After the business owners and engineering team agree on a plan, implement it. Report the status of the implementation whenever you review scorecard metrics.
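The first two steps can be sketched as two small Python functions. The first scores an availability metric and, following the rule in step 1, treats missing telemetry (`None`) as red. The second captures the initial cost check in remediation planning. The function names, the string results, and the use of a single expected-revenue figure are assumptions made for illustration; the intangible-cost assessment itself remains a judgment call for the business owner.

```python
from typing import Optional


def availability_color(target_percent: float,
                       measured_percent: Optional[float],
                       remediation_underway: bool = False) -> str:
    """Step 1: score an availability metric; no telemetry means red."""
    if measured_percent is None:
        # No production telemetry: assume the requirement isn't met.
        # The first remediation step is to add monitoring.
        return "red"
    if measured_percent >= target_percent:
        return "green"
    return "yellow" if remediation_underway else "red"


def remediation_next_step(remediation_cost: float, expected_revenue: float) -> str:
    """Step 2: compare remediation cost with the revenue the service is expected to generate."""
    if remediation_cost <= expected_revenue:
        return "plan remediation"
    # Cost exceeds expected revenue: weigh intangible costs such as customer experience.
    return "assess intangible costs"


if __name__ == "__main__":
    print(availability_color(99.99, None))          # red
    print(availability_color(99.99, 99.995))        # green
    print(remediation_next_step(300_000, 200_000))  # assess intangible costs
```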

This process is iterative, and ideally your enterprise has a team dedicated to it. This team should meet regularly to review existing remediation projects, kick off the fundamental review of new workloads, and track the enterprise's overall scorecard. The team should also have the authority to hold remediation teams accountable if they're behind schedule or fail to meet metrics.

Structure of the review team

The team responsible for operational fitness review is composed of the following roles:

  • Business owner: Provides knowledge of the business to identify and prioritize each mission-critical business operation. This role also compares the mitigation cost to the business impact, and drives the final decision on remediation.

  • Business advocate: Breaks down business operations into discrete parts, and maps those parts to services and infrastructure, whether on-premises or in the cloud. The role requires deep knowledge of the technology associated with each business operation.

  • Engineering owner: Implements the services associated with the business operation. These individuals might participate in the design, implementation, and deployment of any solutions for nonfunctional-requirement problems that the review team uncovers.

  • Service owner: Operates the business's applications and services. These individuals collect logging and usage data for these applications and services. This data is used both to identify problems and to verify fixes after they're deployed.

Review meeting

We recommend that your review team meet on a regular basis. For example, the team might meet monthly, and then report status and metrics to senior leadership on a quarterly basis.

Adapt the details of the process and meeting to fit your specific needs. We recommend the following tasks as a starting point:

  1. The business owner and business advocate enumerate and determine the nonfunctional requirements for each business operation, with input from the engineering and service owners. For business operations that have been identified previously, review and verify the priority. For new business operations, assign a priority in the existing list.

  2. The engineering and service owners map the current state of business operations to the corresponding on-premises and cloud services. The mapping is a list of the components in each service, organized as a dependency tree. The engineering and service owners then determine the critical paths through the tree.

  3. The engineering and service owners review the current state of operational logging and monitoring for the services listed in the previous step. Robust logging and monitoring are critical: they identify service components that contribute to a failure to meet nonfunctional requirements. If sufficient logging and monitoring aren't in place, the team must create and implement a plan to put them in place.

  4. The team creates scorecard metrics for new business operations. The scorecard consists of the list of constituent components for each service identified in step 2. It's aligned with the nonfunctional requirements, and includes a measure of how well each component meets the requirements.

  5. For constituent components that fail to meet nonfunctional requirements, the team designs a high-level solution, and assigns an engineering owner. At this point, the business owner and business advocate establish a budget for the remediation work, based on the expected revenue of the business operation.

  6. Finally, the team conducts a review of the ongoing remediation work. Each of the scorecard metrics for work in progress is reviewed against the expected criteria. For constituent components that meet metric criteria, the service owner presents logging and monitoring data to confirm that the criteria are met. For those constituent components that don't meet metric criteria, each engineering owner explains the problems that are preventing criteria from being met, and presents any new designs for remediation.
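Tasks 2, 4, and 5 can be supported with a little tooling. The following Python sketch enumerates root-to-leaf paths through a dependency tree as candidates for critical paths, records a color and an engineering owner for each constituent component, and lists what the meeting needs to act on. The component names, the scorecard contents, and the roll-up rule (a path is unhealthy if any component on it isn't green) are all hypothetical; deciding which paths are actually critical is still up to the engineering and service owners.

```python
# Hypothetical dependency tree for one business operation:
# each component lists the components it calls.
DEPENDENCY_TREE = {
    "checkout-api": ["cart-service", "payment-service"],
    "cart-service": ["inventory-db"],
    "payment-service": ["payment-gateway"],
    "inventory-db": [],
    "payment-gateway": [],
}

# Hypothetical scorecard: component -> (color, engineering owner or None).
SCORECARD = {
    "checkout-api": ("green", None),
    "cart-service": ("yellow", "cart-team"),
    "payment-service": ("green", None),
    "inventory-db": ("red", None),
    "payment-gateway": ("green", None),
}


def root_to_leaf_paths(component, tree):
    """Task 2: enumerate every path from `component` down to a leaf of the tree."""
    children = tree.get(component, [])
    if not children:
        return [[component]]
    return [[component] + path
            for child in children
            for path in root_to_leaf_paths(child, tree)]


def unhealthy_paths(paths, scorecard):
    """Task 4: keep paths that include at least one component that isn't green (assumed rule)."""
    return [path for path in paths
            if any(scorecard[c][0] != "green" for c in path)]


def components_needing_owner(scorecard):
    """Task 5: components that miss their requirements but have no engineering owner yet."""
    return sorted(name for name, (color, owner) in scorecard.items()
                  if color != "green" and owner is None)


if __name__ == "__main__":
    paths = root_to_leaf_paths("checkout-api", DEPENDENCY_TREE)
    for path in unhealthy_paths(paths, SCORECARD):
        print("needs attention:", " -> ".join(path))
    print("assign an engineering owner to:", components_needing_owner(SCORECARD))
```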

Recommended resources

  • Microsoft Azure Well-Architected Framework: Learn about guiding tenets for improving the quality of a workload. The framework consists of five pillars of architecture excellence:
    • Cost optimization
    • Operational excellence
    • Performance efficiency
    • Reliability
    • Security
  • Ten design principles for Azure applications: Follow these design principles to make your application more scalable, resilient, and manageable.
  • Designing resilient applications for Azure: Build and maintain reliable systems using a structured approach over the lifetime of an application, from design and implementation to deployment and operations.
  • Cloud design patterns: Use design patterns to build applications on the pillars of architecture excellence.
  • Azure Advisor: Provides personalized recommendations based on your usage and configurations to help optimize your resources for high availability, security, performance, and cost.