Organize and set up Azure Machine Learning environments

When planning an Azure Machine Learning deployment for an enterprise environment, some common decision points affect how you create the workspace:

  • Team structure: How your Machine Learning teams are organized and collaborate on projects, given use case and data segregation or cost management requirements.

  • Environments: The environments used as part of your development and release workflow to segregate development from production.

  • Region: The location of your data and of the audience you need to serve your Machine Learning solution to.

Team structure and workspace setup

The workspace is the top-level resource in Azure Machine Learning. It stores the artifacts produced when you work with Machine Learning, along with managed compute and pointers to attached and associated resources. From a manageability standpoint, the workspace is an Azure Resource Manager resource: it supports Azure role-based access control (Azure RBAC) and management by Azure Policy, and it can be used as a unit for cost reporting.

Organizations typically choose one or a combination of the following solution patterns to meet manageability requirements.

Workspace per team: Choose to use one workspace for each team when all members of a team require the same level of access to data and experimentation assets. For example, an organization with three machine learning teams might create three workspaces, one for each team.

The benefit of using one workspace per team is that all Machine Learning artifacts for the team's projects are stored in one place. Productivity can increase because team members can easily access, explore, and reuse experimentation results. Organizing your workspaces by team reduces your Azure footprint and simplifies cost management by team. Because the number of experimentation assets can grow quickly, keep your artifacts organized by following naming and tagging conventions. For recommendations about how to name resources, see Develop your naming and tagging strategy for Azure resources.
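A naming and tagging convention is easier to follow when it is generated rather than typed by hand. The following is a minimal sketch, not an official convention: the `mlw-<team>-<project>-<env>` pattern, the tag keys, and the 33-character limit are all assumptions made for illustration, and should be checked against your own standards and the current Azure naming rules.

```python
import re

# Assumption: a length cap on workspace names; verify against current Azure rules.
MAX_NAME_LENGTH = 33


def _slug(value: str) -> str:
    """Lowercase the value and keep only letters and digits."""
    return re.sub(r"[^a-z0-9]", "", value.lower())


def workspace_name(team: str, project: str, env: str) -> str:
    """Build a name following the hypothetical mlw-<team>-<project>-<env> pattern."""
    name = "-".join(["mlw", _slug(team), _slug(project), _slug(env)])
    if len(name) > MAX_NAME_LENGTH:
        raise ValueError(f"{name!r} exceeds {MAX_NAME_LENGTH} characters")
    return name


def workspace_tags(team: str, project: str, env: str, cost_center: str) -> dict[str, str]:
    """Return hypothetical tags that support cost reporting by team and project."""
    return {
        "team": team,
        "project": project,
        "environment": env,
        "costCenter": cost_center,
    }


print(workspace_name("Vision", "Defect Detection", "dev"))
print(workspace_tags("Vision", "Defect Detection", "dev", "CC-1234"))
```

Generating names and tags from the same inputs keeps them consistent, which makes filtering assets and grouping costs more reliable.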

A consideration for this approach is that each team member must have similar data access permissions. Granular RBAC and access control lists (ACLs) for data sources and experimentation assets are limited within a workspace, so this pattern does not fit when you have use case data segregation requirements.

Workspace per project: Choose to use one workspace for each project if you require segregation of data and experimentation assets by project, or have cost reporting and budgeting requirements at a project level. For example, an organization with four machine learning teams that each run three projects might create 12 workspace instances.

The benefit of using one workspace per project is that costs can be managed at the project level. Teams typically create a dedicated resource group for Azure Machine Learning and associated resources for similar reasons. When you work with external contributors, for example, a project-centered workspace simplifies collaboration because external users need access only to the project resources, not the team resources.

A consideration for this approach is the isolation of experimentation results and assets. Discovering and reusing assets might be more difficult because they are spread across multiple workspace instances.

Single workspace: Choose to use one workspace for non-team or non-project related work, or when costs can't be directly associated with a specific unit of billing, for example with R&D.

The benefit of this setup is that the cost of individual, non-project related work can be decoupled from project-related costs. When you set up a single workspace for all users to do their individual work, you reduce your Azure footprint.

A consideration for this approach is that the workspace might become cluttered quickly when many Machine Learning practitioners share the same instance. Users might require UI-based filtering of assets to find their resources effectively. You can create shared Machine Learning workspaces for each business division to mitigate scale concerns or to segment budgets.
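To compare how the three patterns affect the number of workspaces, the following sketch enumerates workspace names for a given team and project structure. The function name, the pattern labels, and the `mlw-` prefix are hypothetical and exist only for this illustration.

```python
def plan_workspaces(teams: dict[str, list[str]], pattern: str) -> list[str]:
    """Return hypothetical workspace names for one organization pattern.

    teams maps a team name to the list of projects that team runs.
    """
    if pattern == "per-team":
        return [f"mlw-{team}" for team in teams]
    if pattern == "per-project":
        return [
            f"mlw-{team}-{project}"
            for team, projects in teams.items()
            for project in projects
        ]
    if pattern == "single":
        return ["mlw-shared"]
    raise ValueError(f"unknown pattern: {pattern}")


# Four teams that each run three projects, as in the per-project example above.
teams = {f"team{t}": [f"proj{p}" for p in range(1, 4)] for t in range(1, 5)}

for pattern in ("per-team", "per-project", "single"):
    print(pattern, len(plan_workspaces(teams, pattern)))
```

For four teams with three projects each, the per-project pattern yields 12 workspaces, the per-team pattern 4, and the single-workspace pattern 1, which shows the trade-off between isolation and Azure footprint.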

Environments and workspace setup

An environment is a collection of resources that deployments target based on their stage in the application lifecycle. Common examples of environment names are Dev, Test, QA, Staging, and Production.

The development process in your organization affects requirements for environment usage. Your environment affects the setup of Azure Machine Learning and associated resources, for example attached compute. Data availability, for instance, might constrain how manageable it is to have a Machine Learning instance available for each environment. The following solution patterns are common:

Single environment workspace deployment: When you choose a single environment workspace deployment, Azure Machine Learning is deployed to one environment. This setup is common for research-centered scenarios, where there is no need to release Machine Learning artifacts across environments based on their lifecycle stage. Another scenario where this setup makes sense is when only inferencing services, and not Machine Learning pipelines, are deployed across environments.

The benefit of a research-centered setup is a smaller Azure footprint and minimal management overhead. With this way of working, you don't need an Azure Machine Learning workspace deployed in each environment.

A consideration for this approach is that a single environment deployment is subject to data availability. Take care with the datastore setup. If you set up extensive access, for example writer access on production data sources, you might unintentionally harm data quality. If you bring work to production in the same environment where development is done, the same RBAC restrictions apply to both the development work and the production work. This setup might make both environments too rigid or too flexible.


Multiple environment workspace deployment: When you choose a multiple environment workspace deployment, a workspace instance is deployed for each environment. A common scenario for this setup is a regulated workplace with a clear separation of duties between environments and between the users who have resource access to those environments.

The benefits of this setup are:

  • Staged rollout of Machine Learning workflows and artifacts, for example models, across environments, with the potential to enhance agility and reduce time-to-deployment.

  • Enhanced security and control of resources, because you can assign more access restrictions in downstream environments.

  • Training scenarios on production data in non-development environments, because you can give a select group of users access.

A consideration for this approach is the risk of more management and process overhead, because this setup requires a fine-grained development and rollout process for Machine Learning artifacts across workspace instances. Additionally, data management and engineering effort might be required to make production data available for training in the development environment. Access management is required to give a team access to investigate and resolve incidents in production. Finally, your team needs Azure DevOps and Machine Learning engineering expertise to implement automation workflows.


One environment with limited data access, one with production data access: When you choose this setup, Azure Machine Learning is deployed to two environments: one that has limited data access, and one that has production data access. This setup is common if you have a requirement to segregate development and production environments. For example, you might be working under organizational constraints on making production data available in any environment, or you might want to segregate development work from production work without duplicating data more than required because of the high cost of maintenance.

The benefit of this setup is the clear separation of duties and access between development and production environments. Another benefit is lower resource management overhead when compared to a multi-environment deployment scenario.

A consideration for this approach is that a defined development and rollout process for Machine Learning artifacts across workspaces is required. Another consideration is that data management and engineering effort might be required to make production data available for training in a development environment. However, this setup might require less effort than a multi-environment workspace deployment.
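The caution about extensive access, such as writer access on production data sources, can be turned into a simple review check. The sketch below is an illustration only: the `DataGrant` structure, the environment labels, and the data source names are hypothetical and do not correspond to any Azure Machine Learning API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataGrant:
    environment: str          # hypothetical label, for example "dev" or "prod"
    data_source: str          # hypothetical data source name
    is_production_data: bool  # True when the source holds production data
    access: str               # "read" or "write"


def risky_grants(grants: list[DataGrant]) -> list[DataGrant]:
    """Flag write access to production data granted from a non-production environment."""
    return [
        g
        for g in grants
        if g.is_production_data and g.access == "write" and g.environment != "prod"
    ]


grants = [
    DataGrant("dev", "sales-sample", is_production_data=False, access="write"),
    DataGrant("dev", "sales-live", is_production_data=True, access="write"),
    DataGrant("prod", "sales-live", is_production_data=True, access="write"),
    DataGrant("dev", "sales-live", is_production_data=True, access="read"),
]

for grant in risky_grants(grants):
    print(f"Review: {grant.environment} has {grant.access} access to {grant.data_source}")
```

A check like this could run as part of a periodic access review. Whether read access to production data from development is acceptable depends on your own compliance requirements, so this sketch flags only write access.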


Regions and resource setup

The location of your resources, data, or users might require you to create Azure Machine Learning workspace instances and associated resources in multiple Azure regions. For example, one project might span the West Europe and East US Azure regions for performance, cost, and compliance reasons. The following scenarios are common:

Regional training: The machine learning training jobs run in the same Azure region where the data is located. In this setup, a Machine Learning workspace is deployed to each Azure region where data is located. It's a common scenario when you operate under compliance requirements, or when you have data movement constraints across regions.

The benefit of this setup is that experimentation can be done in the data center where the data is located, with the least network latency. A consideration for this approach is that running a Machine Learning pipeline across multiple workspace instances adds management complexity. It becomes challenging to compare experimentation results across instances, and quota and compute management carry more overhead.

If you want to attach storage across regions but use compute from one region, Azure Machine Learning supports attaching storage accounts that are in a region other than the workspace's. Metadata, for example metrics, is stored in the workspace region.


Regional serving: Machine Learning services are deployed close to where the target audience lives. For example, if target users are in Australia and the main storage and experimentation region is West Europe, deploy the Machine Learning workspace for experimentation in West Europe, and deploy an AKS cluster for inference endpoint deployment in Australia.

The benefits of this setup are the opportunity to run inferencing in the data center where new data is ingested, which minimizes latency and data movement, and compliance with local regulations.

A consideration for this approach is that, although a multi-region setup provides several advantages, it also adds overhead to quota and compute management. When there is a requirement for batch inferencing, regional serving might require a multi-workspace deployment. Data collected through inferencing endpoints might need to be transferred across regions for retraining scenarios.
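The regional serving layout can be sketched as a small planning function. This is an illustration under stated assumptions: the function names and dictionary keys are hypothetical, and `westeurope` and `australiaeast` are used only as example region identifiers matching the scenario above.

```python
def plan_regional_serving(
    experiment_region: str, audience_regions: list[str]
) -> dict[str, list[str]]:
    """Place one experimentation workspace and one inference cluster per audience region."""
    return {
        "workspace_regions": [experiment_region],
        "inference_cluster_regions": sorted(set(audience_regions)),
    }


def needs_cross_region_transfer(experiment_region: str, audience_regions: list[str]) -> bool:
    """Return True when data collected at an endpoint must move regions for retraining."""
    return any(region != experiment_region for region in audience_regions)


plan = plan_regional_serving("westeurope", ["australiaeast"])
print(plan)
print(needs_cross_region_transfer("westeurope", ["australiaeast"]))
```

The second function makes the retraining consideration explicit: whenever an endpoint is served outside the experimentation region, data collected there has to cross regions before it can be used for retraining.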


Regional fine-tuning: A base model is trained on an initial dataset, for example public data or data from all regions, and is later fine-tuned with a regional dataset. The regional dataset might exist only in a particular region because of compliance or data movement constraints. For example, base model training might be done in a workspace in region A, while fine-tuning might be done in a workspace in region B.

The benefit of this setup is that experimentation can be done in compliance in the data center where the data resides, while still taking advantage of base model training on a larger dataset in an earlier pipeline stage.

A consideration is that this approach enables complex experimentation pipelines but might create more challenges, for example comparing experiment results across regions and added overhead for quota and compute management.


Reference implementation

To illustrate the deployment of Azure Machine Learning in a larger setting, this section outlines how the organization 'Contoso' has set up Azure Machine Learning given its organizational constraints and its reporting and budgeting requirements:

  • Contoso creates resource groups on a solution basis for cost management and reporting reasons.

  • IT administrators create resource groups and resources only for funded solutions, to meet budget requirements.

  • Because of the exploratory and uncertain nature of data science, users need a place to experiment and work on use case and data exploration. Exploratory work often can't be directly associated with a particular use case, and can be associated only with the R&D budget. Contoso wants to fund some Machine Learning resources centrally that anyone can use for exploration purposes.

  • Once a Machine Learning use case proves to be successful in the exploratory environment, teams can request resource groups. For example, Dev, QA, and Prod environments can be set up for iterative experimentation project work, along with access to production data sources.

  • Data segregation and compliance requirements don't allow live production data to exist in development environments.

  • IT policy sets different RBAC requirements for various user groups per environment. For example, access is more restrictive in production.

  • All data, experimentation, and inferencing are done in a single Azure region.

To adhere to the above requirements, Contoso has set up its resources in the following way:

  • Azure Machine Learning workspaces and resource groups are scoped per project to follow budgeting and use case segregation requirements.
  • A multiple-environment setup for Azure Machine Learning and associated resources addresses cost management, RBAC, and data access requirements.
  • A single resource group and Machine Learning workspace are dedicated to exploration.
  • Azure Active Directory groups differ per user role and environment. For example, the operations that a data scientist can do in a production environment are different from those in the development environment, and access levels might differ per solution.
  • All resources are created in a single Azure region.
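The resulting layout can be written down as data, which makes it easier to review before anything is provisioned. The sketch below is a hypothetical rendering of the setup described above: the naming scheme, the group names, the project names, and the choice of `westeurope` are assumptions, because the source states only that a single Azure region is used.

```python
from itertools import product

ENVIRONMENTS = ("dev", "qa", "prod")
REGION = "westeurope"  # assumption: the source says only "a single Azure region"


def contoso_layout(projects: list[str]) -> list[dict[str, str]]:
    """Return one entry per project and environment, plus one shared exploration entry."""
    layout = [
        {
            "resource_group": f"rg-{project}-{env}",
            "workspace": f"mlw-{project}-{env}",
            "region": REGION,
            "aad_group": f"grp-{project}-{env}-datascientists",
        }
        for project, env in product(projects, ENVIRONMENTS)
    ]
    # A single resource group and workspace dedicated to exploration.
    layout.append(
        {
            "resource_group": "rg-exploration",
            "workspace": "mlw-exploration",
            "region": REGION,
            "aad_group": "grp-exploration-users",
        }
    )
    return layout


for entry in contoso_layout(["churn", "forecast"]):
    print(entry["resource_group"], entry["workspace"], entry["aad_group"])
```

With two funded projects, this produces six project-scoped workspaces (three environments each) and one exploration workspace, each with its own directory group so that access can differ per role and environment.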

Figure: Contoso reference implementation