DevOps Checklist

DevOps is the integration of development, quality assurance, and IT operations into a unified culture and set of processes for delivering software. Use this checklist as a starting point to assess your DevOps culture and process.

Culture

Ensure business alignment across organizations and teams. Conflicts over resources, purpose, goals, and priorities within an organization can be a risk to successful operations. Ensure that the business, development, and operations teams are all aligned.

Ensure the entire team understands the software lifecycle. Your team needs to understand the overall lifecycle of the application, and which part of the lifecycle the application is currently in. This helps all team members know what they should be doing now, and what they should be planning and preparing for in the future.

Reduce cycle time. Aim to minimize the time it takes to move from ideas to usable developed software. Limit the size and scope of individual releases to keep the test burden low. Automate the build, test, configuration, and deployment processes whenever possible. Clear any obstacles to communication among developers, and between developers and operations.

Review and improve processes. Your processes and procedures, both automated and manual, are never final. Set up regular reviews of current workflows, procedures, and documentation, with a goal of continual improvement.

Do proactive planning. Proactively plan for failure. Have processes in place to quickly identify issues when they occur, escalate them to the team members who can fix them, and confirm resolution.

Learn from failures. Failures are inevitable, but it's important to learn from failures to avoid repeating them. If an operational failure occurs, triage the issue, document the cause and solution, and share any lessons that were learned. Whenever possible, update your build processes to automatically detect that kind of failure in the future.

Optimize for speed and collect data. Every planned improvement is a hypothesis. Work in the smallest increments possible. Treat new ideas as experiments. Instrument the experiments so that you can collect production data to assess their effectiveness. Be prepared to fail fast if the hypothesis is wrong.

Allow time for learning. Both failures and successes provide good opportunities for learning. Before moving on to new projects, allow enough time to gather the important lessons, and make sure those lessons are absorbed by your team. Also give the team the time to build skills, experiment, and learn about new tools and techniques.

Document operations. Document all tools, processes, and automated tasks with the same level of quality as your product code. Document the current design and architecture of any systems you support, along with recovery processes and other maintenance procedures. Focus on the steps you actually perform, not theoretically optimal processes. Regularly review and update the documentation. For code, make sure that meaningful comments are included, especially in public APIs, and use tools to automatically generate code documentation whenever possible.

Share knowledge. Documentation is only useful if people know that it exists and can find it. Ensure the documentation is organized and easily discoverable. Be creative: Use brown bags (informal presentations), videos, or newsletters to share knowledge.

Development

Provide developers with production-like environments. If development and test environments don't match the production environment, it is hard to test and diagnose problems. Therefore, keep development and test environments as close to the production environment as possible. Make sure that test data is consistent with the data used in production, even if it's sample data and not real production data (for privacy or compliance reasons). Plan to generate and anonymize sample test data.

Ensure that all authorized team members can provision infrastructure and deploy the application. Setting up production-like resources and deploying the application should not involve complicated manual tasks or detailed technical knowledge of the system. Anyone with the right permissions should be able to create or deploy production-like resources without going to the operations team.

This recommendation doesn't imply that anyone can push live updates to the production deployment. It's about reducing friction for the development and QA teams to create production-like environments.

Instrument the application for insight. To understand the health of your application, you need to know how it's performing and whether it's experiencing any errors or problems. Always include instrumentation as a design requirement, and build the instrumentation into the application from the start. Instrumentation must include event logging for root cause analysis, but also telemetry and metrics to monitor the overall health and usage of the application.
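As a rough illustration of building both event logging and metrics into application code, here is a minimal Python sketch. The component name `orders`, the metric names, and the in-memory `Counter` are assumptions made for the example; a real system would send these to whatever logging and metrics backend the team uses.

```python
import json
import logging
import time
from collections import Counter

logger = logging.getLogger("orders")  # hypothetical component name
metrics = Counter()                   # in-memory stand-in for a metrics backend


def log_event(name, **fields):
    """Emit one structured (JSON) event and return the record for inspection."""
    record = {"event": name, "ts": time.time(), **fields}
    logger.info(json.dumps(record))
    return record


def process_order(order_id, handler):
    """Run handler(order_id), recording success/failure counts and total duration."""
    start = time.perf_counter()
    try:
        result = handler(order_id)
        metrics["orders.succeeded"] += 1
        log_event("order_processed", order_id=order_id)
        return result
    except Exception as exc:
        # The event log carries the detail needed for root cause analysis.
        metrics["orders.failed"] += 1
        log_event("order_failed", order_id=order_id, error=repr(exc))
        raise
    finally:
        elapsed_ms = int((time.perf_counter() - start) * 1000)
        metrics["orders.duration_ms_total"] += elapsed_ms
```

The counters give the overall health picture, while the structured events hold the per-request detail.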

Track your technical debt. In many projects, release schedules can get prioritized over code quality to one degree or another. Always keep track when this occurs. Document any shortcuts or other nonoptimal implementations, and schedule time in the future to revisit these issues.

Consider pushing updates directly to production. To reduce the overall release cycle time, consider pushing properly tested code commits directly to production. Use feature toggles to control which features are enabled. This allows you to move from development to release quickly, using the toggles to enable or disable features. Toggles are also useful when performing tests such as canary releases, where a particular feature is deployed to a subset of the production environment.
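A feature toggle can be as simple as a named boolean checked at run time. The sketch below is one possible shape, not a prescribed implementation; the `FeatureToggles` class and the `new_checkout` flag are hypothetical names, and production systems typically load flags from a configuration service rather than from memory.

```python
class FeatureToggles:
    """In-memory toggle store; unknown features are treated as disabled."""

    def __init__(self, initial=None):
        self._flags = dict(initial or {})

    def is_enabled(self, name):
        return self._flags.get(name, False)

    def set(self, name, enabled):
        self._flags[name] = bool(enabled)


toggles = FeatureToggles({"new_checkout": False})


def checkout(cart_total):
    # Both code paths ship to production; the toggle picks one at run time.
    if toggles.is_enabled("new_checkout"):
        return f"new checkout flow: {cart_total:.2f}"
    return f"legacy checkout flow: {cart_total:.2f}"
```

Calling `toggles.set("new_checkout", True)` switches behavior without a redeployment, and setting it back to `False` disables the feature again.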

Testing

Automate testing. Manually testing software is tedious and susceptible to error. Automate common testing tasks and integrate the tests into your build processes. Automated testing ensures consistent test coverage and reproducibility. Integrated UI tests should also be performed by an automated tool. Azure offers development and test resources that can help you configure and execute testing. For more information, see Development and test.

Test for failures. If a system can't connect to a service, how does it respond? Can it recover once the service is available again? Make fault injection testing a standard part of review on test and staging environments. When your test process and practices are mature, consider running these tests in production.
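The two questions above can be turned into an automated test by injecting a fault into a test double. The following Python sketch assumes a hypothetical `Client` that falls back to the last known value; the classes and the fallback policy are illustrative only, and your own system may need a different response to an outage.

```python
class FlakyService:
    """Test double that fails while `available` is False."""

    def __init__(self):
        self.available = True

    def fetch(self, key):
        if not self.available:
            raise ConnectionError("injected fault: service unreachable")
        return f"value-for-{key}"


class Client:
    """Caller under test: serves the last known value when the service is down."""

    def __init__(self, service):
        self._service = service
        self._cache = {}

    def get(self, key):
        try:
            value = self._service.fetch(key)
            self._cache[key] = value
            return value, "live"
        except ConnectionError:
            if key in self._cache:
                return self._cache[key], "stale"
            raise  # nothing cached: let the caller see the failure
```

A test then flips `service.available` to `False`, checks that the client degrades as intended, flips it back, and checks that live responses resume.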

Test in production. The release process doesn't end with deployment to production. Have tests in place to ensure that deployed code works as expected. For deployments that are infrequently updated, schedule production testing as a regular part of maintenance.

Automate performance testing to identify performance issues early. The impact of a serious performance issue can be as severe as a bug in the code. While automated functional tests can prevent application bugs, they might not detect performance problems. Define acceptable performance goals for metrics like latency, load times, and resource usage. Include automated performance tests in your release pipeline, to make sure the application meets those goals.
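One way to express a latency goal as a pipeline check is to measure a set of calls and compare a percentile against a budget. This is a small sketch under stated assumptions: the 95th percentile and the budget value are example choices, and a real performance test would exercise a deployed endpoint under realistic load rather than a local function.

```python
import statistics
import time


def measure_latencies_ms(func, runs=50):
    """Call func() `runs` times and return each call's duration in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        samples.append((time.perf_counter() - start) * 1000)
    return samples


def p95(samples):
    """95th percentile, using the 'inclusive' method of statistics.quantiles."""
    return statistics.quantiles(samples, n=100, method="inclusive")[94]


def meets_latency_goal(samples, budget_ms):
    """True when the 95th-percentile latency is within the budget."""
    return p95(samples) <= budget_ms
```

A pipeline step could fail the build when `meets_latency_goal(samples, budget_ms)` returns `False`.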

Perform capacity testing. An application might work fine under test conditions, and then have problems in production due to scale or resource limitations. Always define the maximum expected capacity and usage limits. Test to make sure the application can handle those limits, but also test what happens when those limits are exceeded. Capacity testing should be performed at regular intervals.

After the initial release, you should run performance and capacity tests whenever updates are made to production code. Use historical data to fine-tune tests and to determine what types of tests need to be performed.

Perform automated security penetration testing. Ensuring your application is secure is as important as testing any other functionality. Make automated penetration testing a standard part of the build and deployment process. Schedule regular security tests and vulnerability scanning on deployed applications, monitoring for open ports, endpoints, and attacks. Automated testing does not remove the need for in-depth security reviews at regular intervals.

Perform automated business continuity testing. Develop tests for large-scale business continuity, including backup recovery and failover. Set up automated processes to perform these tests regularly.

Release

Automate deployments. Automate deploying the application to test, staging, and production environments. Automation enables faster and more reliable deployments, and ensures consistent deployments to any supported environment. It reduces the risk of human error caused by manual deployments. It also makes it easy to schedule releases for convenient times, to minimize any effects of potential downtime. Have systems in place to detect any problems during rollout, and have an automated way to roll forward fixes or roll back changes.

Use continuous integration. Continuous integration (CI) is the practice of merging all developer code into a central codebase on a regular schedule, and then automatically performing standard build and test processes. CI helps an entire team work on a codebase at the same time without conflicts. It also helps ensure that code defects are found as early as possible. Preferably, the CI process should run every time that code is committed or checked in. At the very least, it should run once per day.

Consider adopting a trunk-based development model. In this model, developers commit to a single branch (the trunk). Commits must never break the build. This model facilitates CI, because all feature work is done in the trunk, and any merge conflicts are resolved when the commit happens.

Consider using continuous delivery. Continuous delivery (CD) is the practice of ensuring that code is always ready to deploy, by automatically building, testing, and deploying code to production-like environments. Adding continuous delivery to create a full CI/CD pipeline helps you detect code defects as soon as possible, and ensures that properly tested updates can be released in a very short time.

Continuous deployment is an additional process that automatically takes any updates that have passed through the CI/CD pipeline and deploys them into production. Continuous deployment requires robust automatic testing and advanced process planning, and may not be appropriate for all teams.

Make small incremental changes. Large code changes have a greater potential to introduce bugs. Whenever possible, keep changes small. This limits the potential effects of each change, and makes it easier to understand and debug any issues.

Control exposure to changes. Make sure you're in control of when updates are visible to your end users. Consider using feature toggles to control when features are enabled for end users.

Implement release management strategies to reduce deployment risk. Deploying an application update to production always entails some risk. To minimize this risk, use strategies such as canary releases or blue-green deployments to deploy updates to a subset of users. Confirm the update works as expected, and then roll the update out to the rest of the system.
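A canary release needs a stable way to decide which users see the update. One common approach, sketched here in Python, is to hash a user identifier into a bucket and route the lowest buckets to the new version. The function names, the 100-bucket split, and the version labels `v1` and `v2` are assumptions for the example; in practice the routing usually lives in a load balancer or traffic manager rather than in application code.

```python
import hashlib


def bucket_for(user_id, buckets=100):
    """Map a user ID to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def in_canary(user_id, percent):
    """True if the user falls in the first `percent` of 100 buckets."""
    return bucket_for(user_id) < percent


def pick_version(user_id, percent, stable="v1", canary="v2"):
    """Route canary users to the new version and everyone else to the stable one."""
    return canary if in_canary(user_id, percent) else stable
```

Because the bucket depends only on the user ID, a user who is in the canary at 10 percent stays in it as the percentage is raised, and setting the percentage to 0 acts as a rollback.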

Document all changes. Minor updates and configuration changes can be a source of confusion and versioning conflict. Always keep a clear record of any changes, no matter how small. Log everything that changes, including patches applied, policy changes, and configuration changes. (Don't include sensitive data in these logs. For example, log that a credential was updated, and who made the change, but don't record the updated credentials.) The record of the changes should be visible to the entire team.
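The point about logging that a credential changed without logging the credential itself can be enforced in code. This Python sketch builds a change record and redacts values for keys that look sensitive; the key list, field names, and `<redacted>` marker are illustrative assumptions, not a standard format.

```python
import json
from datetime import datetime, timezone

SENSITIVE_KEYS = {"password", "secret", "token", "credential"}  # illustrative list


def change_record(actor, target, action, details=None):
    """Build a change-log entry that notes sensitive fields changed but omits their values."""
    safe = {}
    for key, value in (details or {}).items():
        safe[key] = "<redacted>" if key.lower() in SENSITIVE_KEYS else value
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "target": target,
        "action": action,
        "details": safe,
    }


def append_change(log_lines, record):
    """Append the record as one JSON line; log_lines stands in for a shared log."""
    log_lines.append(json.dumps(record, sort_keys=True))
```

The record still shows who changed what and when, which is what the rest of the team needs to see.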

Consider making infrastructure immutable. Immutable infrastructure is the principle that you shouldn't modify infrastructure after it's deployed to production. Otherwise, you can get into a state where ad hoc changes have been applied, making it hard to know exactly what changed. Immutable infrastructure works by replacing entire servers as part of any new deployment. This allows the code and the hosting environment to be tested and deployed as a block. Once deployed, infrastructure components aren't modified until the next build and deploy cycle.

Monitoring

Make systems observable. The operations team should always have clear visibility into the health and status of a system or service. Set up external health endpoints to monitor status, and ensure that applications are coded to instrument the operations metrics. Use a common and consistent schema that helps you correlate events across systems. Azure Diagnostics and Application Insights are the standard methods of tracking the health and status of Azure resources. Microsoft Operations Management Suite also provides centralized monitoring and management for cloud or hybrid solutions.
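A health endpoint typically runs a few dependency checks and reports an overall status code that a monitor can poll. The sketch below uses only the Python standard library; the `/health` path, the check names, and the 200/503 convention are assumptions for illustration, and the placeholder checks would be replaced by real probes.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def health_report(checks):
    """Run each named check (a zero-argument callable returning bool) and summarize."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "failing"
        except Exception:
            results[name] = "failing"  # a crashing check counts as unhealthy
    healthy = all(state == "ok" for state in results.values())
    body = {"status": "healthy" if healthy else "unhealthy", "checks": results}
    return (200 if healthy else 503), body


CHECKS = {"database": lambda: True, "queue": lambda: True}  # placeholder checks


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        code, body = health_report(CHECKS)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Start the endpoint on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), HealthHandler).serve_forever()
```

Keeping `health_report` separate from the HTTP handler makes the check logic easy to unit test.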

Aggregate and correlate logs and metrics. A properly instrumented telemetry system will provide a large amount of raw performance data and event logs. Make sure that telemetry and log data is processed and correlated in a short period of time, so that operations staff always have an up-to-date picture of system health. Organize and display data in ways that give a cohesive view of any issues, so that whenever possible it's clear when events are related to one another.

Consult your corporate retention policy for requirements on how data is processed and how long it should be stored.

Implement automated alerts and notifications. Set up monitoring tools like Azure Monitor to detect patterns or conditions that indicate potential or current issues, and send alerts to the team members who can address the issues. Tune the alerts to avoid false positives.

Monitor assets and resources for expirations. Some resources and assets, such as certificates, expire after a given amount of time. Make sure to track which assets expire, when they expire, and what services or features depend on them. Use automated processes to monitor these assets. Notify the operations team before an asset expires, and escalate if expiration threatens to disrupt the application.
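Expiration tracking can start as a small inventory plus a scheduled check. In this Python sketch, the asset fields (`name`, `expires`, `used_by`) and the 30-day warning and 7-day escalation windows are assumptions chosen for the example.

```python
from datetime import date, timedelta


def expiring_assets(assets, today, warn_days=30):
    """Return (name, days_left, dependents) for assets expiring within warn_days, soonest first."""
    cutoff = today + timedelta(days=warn_days)
    due = [
        (a["name"], (a["expires"] - today).days, a["used_by"])
        for a in assets
        if a["expires"] <= cutoff
    ]
    return sorted(due, key=lambda item: item[1])


def severity(days_left, escalate_days=7):
    """Escalate when expiry is imminent or already past; otherwise just notify."""
    return "escalate" if days_left <= escalate_days else "notify"
```

A scheduled job could call `expiring_assets` daily and route each result to the operations team according to `severity`.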

Management

Automate operations tasks. Manually handling repetitive operations processes is error-prone. Automate these tasks whenever possible to ensure consistent execution and quality. Code that implements the automation should be versioned in source control. As with any other code, automation tools must be tested.

Take an infrastructure-as-code approach to provisioning. Minimize the amount of manual configuration needed to provision resources. Instead, use scripts and Azure Resource Manager templates. Keep the scripts and templates in source control, like any other code you maintain.

Consider using containers. Containers provide a standard package-based interface for deploying applications. Using containers, an application is deployed using self-contained packages that include any software, dependencies, and files needed to run the application, which greatly simplifies the deployment process.

Containers also create an abstraction layer between the application and the underlying operating system, which provides consistency across environments. This abstraction can also isolate a container from other processes or applications running on a host.

Implement resiliency and self-healing. Resiliency is the ability of an application to recover from failures. Strategies for resiliency include retrying transient failures, and failing over to a secondary instance or even another region. For more information, see Designing reliable Azure applications. Instrument your applications so that issues are reported immediately and you can manage outages or other system failures.
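Retrying transient failures is usually done with a bounded number of attempts and a growing delay between them. The following Python sketch shows exponential backoff with jitter; the `TransientError` class, the default attempt count, and the delay values are assumptions for the example, and many client libraries already ship their own retry policies that should be preferred where available.

```python
import random
import time


class TransientError(Exception):
    """Marker for failures worth retrying (timeouts, throttling, and so on)."""


def retry(operation, attempts=4, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call operation(); on TransientError wait with exponential backoff plus jitter, then retry."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == attempts:
                raise  # out of attempts: surface the failure to the caller
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))
```

Only errors marked as transient are retried; any other exception propagates immediately, so that permanent failures are not hidden behind repeated attempts. The `sleep` parameter exists so tests can substitute a fake and run without waiting.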

Have an operations manual. An operations manual or runbook documents the procedures and management information needed for operations staff to maintain a system. Also document any operations scenarios and mitigation plans that might come into play during a failure or other disruption to your service. Create this documentation during the development process, and keep it up to date afterwards. This is a living document, and should be reviewed, tested, and improved regularly.

Shared documentation is critical. Encourage team members to contribute and share knowledge. The entire team should have access to documents. Make it easy for anyone on the team to help keep documents updated.

Document on-call procedures. Make sure on-call duties, schedules, and procedures are documented and shared with all team members. Keep this information up to date at all times.

Document escalation procedures for third-party dependencies. If your application depends on external third-party services that you don't directly control, you must have a plan to deal with outages. Create documentation for your planned mitigation processes. Include support contacts and escalation paths.

Use configuration management. Configuration changes should be planned, visible to operations, and recorded. This could take the form of a configuration management database, or a configuration-as-code approach. Configuration should be audited regularly to ensure that what's expected is actually in place.
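A configuration audit boils down to comparing the expected settings with what is actually deployed. This Python sketch reports drift between two flat dictionaries; the report structure is an assumption for the example, and real configurations may be nested or spread across several sources.

```python
def audit_config(expected, actual):
    """Compare expected and actual settings; return the drift as three dicts."""
    missing = {k: v for k, v in expected.items() if k not in actual}
    unexpected = {k: v for k, v in actual.items() if k not in expected}
    changed = {
        k: {"expected": expected[k], "actual": actual[k]}
        for k in expected
        if k in actual and expected[k] != actual[k]
    }
    return {"missing": missing, "unexpected": unexpected, "changed": changed}


def is_compliant(report):
    """True when the audit found no missing, unexpected, or changed settings."""
    return not any(report.values())
```

With a configuration-as-code approach, `expected` would come from source control and `actual` from the running environment.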

Get an Azure support plan and understand the process. Azure offers several support plans. Determine the right plan for your needs, and make sure the entire team knows how to use it. Team members should understand the details of the plan, how the support process works, and how to open a support ticket with Azure. If you are anticipating a high-scale event, Azure support can assist you with increasing your service limits. For more information, see the Azure Support FAQs.

Follow least-privilege principles when granting access to resources. Carefully manage access to resources. Access should be denied by default, unless a user is explicitly given access to a resource. Only grant a user access to what they need to complete their tasks. Track user permissions and perform regular security audits.

Use role-based access control. Assigning user accounts and access to resources should not be a manual process. Use role-based access control (RBAC) to grant access based on Azure Active Directory identities and groups.
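The combination of deny-by-default access and role assignments made to groups rather than individuals can be modeled in a few lines. This Python sketch is a conceptual illustration only and does not use or mirror the Azure RBAC API; the role names, group names, resources, and permissions are all hypothetical.

```python
ROLE_PERMISSIONS = {  # hypothetical roles and the permissions each grants
    "reader": {"read"},
    "contributor": {"read", "write"},
    "owner": {"read", "write", "assign_roles"},
}

GROUP_ROLES = {  # role assignments: (group, resource) -> role
    ("dev-team", "staging"): "contributor",
    ("dev-team", "production"): "reader",
    ("ops-team", "production"): "owner",
}


def is_allowed(user_groups, resource, permission):
    """Deny by default; allow only if a group's role on the resource grants the permission."""
    for group in user_groups:
        role = GROUP_ROLES.get((group, resource))
        if role and permission in ROLE_PERMISSIONS.get(role, set()):
            return True
    return False
```

Access changes then happen by editing group membership or a role assignment, not by granting permissions to individual accounts one at a time.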

Use a bug tracking system to track issues. Without a good way to track issues, it's easy to miss items, duplicate work, or introduce additional problems. Don't rely on informal person-to-person communication to track the status of bugs. Use a bug tracking tool to record details about problems, assign resources to address them, and provide an audit trail of progress and status.

Manage all resources in a change management system. All aspects of your DevOps process should be included in a management and versioning system, so that changes can be easily tracked and audited. This includes code, infrastructure, configuration, documentation, and scripts. Treat all these types of resources as code throughout the test/build/review process.

Use checklists. Create operations checklists to ensure processes are followed. It's common to miss something in a large manual, and following a checklist can force attention to details that might otherwise be overlooked. Maintain the checklists, and continually look for ways to automate tasks and streamline processes.

For more about DevOps, see What is DevOps? on the Visual Studio site.