Team Data Science Process roles and tasks

The Team Data Science Process (TDSP) is a framework developed by Microsoft that provides a structured methodology to efficiently build predictive analytics solutions and intelligent applications. This article outlines the key personnel roles and associated tasks for a data science team standardizing on this process.

This introductory article links to tutorials on how to set up the TDSP environment. The tutorials provide detailed guidance for using Azure DevOps Projects, Azure Repos repositories, and Azure Boards. The goal is to move a project from concept through modeling and into deployment.

The tutorials use Azure DevOps because that is how TDSP is implemented at Microsoft. Azure DevOps facilitates collaboration by integrating role-based security, work item management and tracking, and code hosting, sharing, and source control. The tutorials also use an Azure Data Science Virtual Machine (DSVM) as the analytics desktop, which has several popular data science tools pre-configured and integrated with Microsoft software and Azure services.

You can use the tutorials to implement TDSP using other code-hosting, agile planning, and development tools and environments, but some features may not be available.

Structure of data science groups and teams

Data science functions in enterprises are often organized in the following hierarchy:

  • Data science group
    • Data science teams within the group

In such a structure, there are group leads and team leads. Typically, a data science project is done by a data science team. Data science teams have project leads for project management and governance tasks, and individual data scientists and engineers to perform the data science and data engineering parts of the project. The initial project setup and governance are done by the group, team, or project leads.

Definition and tasks for the four TDSP roles

Assuming that the data science unit consists of teams within a group, there are four distinct roles for TDSP personnel:

  1. Group Manager: Manages the entire data science unit in an enterprise. A data science unit might have multiple teams, each of which is working on multiple data science projects in distinct business verticals. A Group Manager might delegate their tasks to a surrogate, but the tasks associated with the role do not change.

  2. Team Lead: Manages a team in the data science unit of an enterprise. A team consists of multiple data scientists. For a small data science unit, the Group Manager and the Team Lead might be the same person.

  3. Project Lead: Manages the daily activities of individual data scientists on a specific data science project.

  4. Project Individual Contributors: Data Scientists, Business Analysts, Data Engineers, Architects, and others who execute a data science project.

Note

Depending on the structure and size of an enterprise, a single person may play more than one role, or more than one person may fill a role.
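The many-to-many relationship between people and roles described in the note can be sketched in a few lines of Python. This is only an illustration: the `Role` enum mirrors the four role names above, while the `people_by_role` helper and the staffing data (Avery, Blake, Casey) are hypothetical and not part of TDSP.

```python
from collections import defaultdict
from enum import Enum


class Role(Enum):
    """The four TDSP roles named in this article."""
    GROUP_MANAGER = "Group Manager"
    TEAM_LEAD = "Team Lead"
    PROJECT_LEAD = "Project Lead"
    INDIVIDUAL_CONTRIBUTOR = "Project Individual Contributor"


def people_by_role(assignments):
    """Invert a person -> roles mapping into role -> sorted list of people."""
    result = defaultdict(list)
    for person, roles in assignments.items():
        for role in roles:
            result[role].append(person)
    return {role: sorted(people) for role, people in result.items()}


# Hypothetical staffing for a small data science unit (names are made up).
assignments = {
    "Avery": {Role.GROUP_MANAGER, Role.TEAM_LEAD},  # one person, two roles
    "Blake": {Role.PROJECT_LEAD},
    "Casey": {Role.PROJECT_LEAD, Role.INDIVIDUAL_CONTRIBUTOR},
}

staffing = people_by_role(assignments)
for role, people in staffing.items():
    print(f"{role.value}: {', '.join(people)}")
```

In this made-up example, Avery holds two roles, as the note allows for a small unit, and the Project Lead role is filled by two people.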

Tasks to be completed by the four roles

The following diagram shows the top-level tasks for each Team Data Science Process role. This schema and the following, more detailed outline of tasks for each TDSP role can help you choose the tutorial you need based on your responsibilities.

[Figure: Roles and tasks overview]

Group Manager tasks

The Group Manager or a designated TDSP system administrator completes the following tasks to adopt the TDSP:

  • Creates an Azure DevOps organization and a group project within the organization.
  • Creates a project template repository in the Azure DevOps group project, and seeds it from the project template repository developed by the Microsoft TDSP team. The Microsoft TDSP project template repository provides:
    • A standardized directory structure, including directories for data, code, and documents.
    • A set of standardized document templates to guide an efficient data science process.
  • Creates a utility repository, and seeds it from the utility repository developed by the Microsoft TDSP team. The TDSP utility repository from Microsoft provides a set of useful utilities to make the work of a data scientist more efficient. The Microsoft utility repository includes utilities for interactive data exploration, analysis, reporting, and baseline modeling and reporting.
  • Sets up the security control policy for the organization account.

For detailed instructions, see Group Manager tasks for a data science team.

Team Lead tasks

The Team Lead or a designated project administrator completes the following tasks to adopt the TDSP:

  • Creates a team project in the group's Azure DevOps organization.
  • Creates the project template repository in the project, and seeds it from the group project template repository set up by the Group Manager or delegate.
  • Creates the team utility repository, seeds it from the group utility repository, and adds team-specific utilities to the repository.
  • Optionally creates Azure file storage to store useful data assets for the team. Other team members can mount this shared cloud file store on their analytics desktops.
  • Optionally mounts the Azure file storage on the team's DSVM and adds team data assets to it.
  • Sets up security control by adding team members and configuring their permissions.

For detailed instructions, see Team Lead tasks for a data science team.

Project Lead tasks

The Project Lead completes the following tasks to adopt the TDSP:

  • Creates a project repository in the team project, and seeds it from the project template repository.
  • Optionally creates Azure file storage to store the project's data assets.
  • Optionally mounts the Azure file storage to the DSVM and adds project data assets to it.
  • Sets up security control by adding project members and configuring their permissions.

For detailed instructions, see Project Lead tasks for a data science team.

Project Individual Contributor tasks

The Project Individual Contributor, usually a Data Scientist, performs the following tasks using the TDSP:

  • Clones the project repository set up by the Project Lead.
  • Optionally mounts the shared team and project Azure file storage on their Data Science Virtual Machine (DSVM).
  • Executes the project.

For detailed instructions on onboarding to a project, see Project Individual Contributor tasks for a data science team.

Data science project execution workflow

By following the relevant tutorials, data scientists, project leads, and team leads can create work items to track all tasks and stages for a project from beginning to end. Using Azure Repos promotes collaboration among data scientists and ensures that the artifacts generated during project execution are version controlled and shared by all project members. Azure DevOps lets you link your Azure Boards work items with your Azure Repos repository branches and easily track what has been done for a work item.

The following figure outlines the TDSP workflow for project execution:

[Figure: Typical data science project workflow]

The workflow steps can be grouped into three activities:

  • Project Leads conduct sprint planning
  • Data Scientists develop artifacts on git branches to address work items
  • Project Leads or other team members do code reviews and merge working branches to the primary branch
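The second activity, developing on a git branch for a work item, might look like the following sketch. It only builds the git command lines and does not run them. The helper name `branch_workflow_commands`, the `workitem-<id>` branch naming scheme, and the `#<id>` suffix in the commit message are assumptions made for illustration, not conventions prescribed by TDSP.

```python
def branch_workflow_commands(work_item_id, message, remote="origin"):
    """Return the git commands for working on one work item in its own branch.

    The branch name and commit-message format are hypothetical conventions.
    """
    branch = f"workitem-{work_item_id}"
    return [
        # Create a working branch for the work item and switch to it.
        ["git", "checkout", "-b", branch],
        # Stage all changes made while addressing the work item.
        ["git", "add", "--all"],
        # Commit, mentioning the work item ID in the message.
        ["git", "commit", "-m", f"{message} #{work_item_id}"],
        # Publish the branch so it can be reviewed and merged.
        ["git", "push", "--set-upstream", remote, branch],
    ]


commands = branch_workflow_commands(42, "Add baseline model")
for command in commands:
    print(" ".join(command))
```

To execute the commands rather than print them, each list could be passed to `subprocess.run(command, check=True)` from inside a cloned project repository. The code review and the merge into the primary branch then happen on the hosting side, for example through a pull request.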

For detailed instructions on the project execution workflow, see Agile development of data science projects.

TDSP project template repository

Use the Microsoft TDSP team's project template repository to support efficient project execution and collaboration. The repository gives you a standardized directory structure and document templates you can use for your own TDSP projects.
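As a rough illustration of what a standardized directory structure means in practice, the sketch below scaffolds one directory each for data, code, and documents. The directory names `Data`, `Code`, and `Docs`, the `scaffold_project` helper, and the project name `my-tdsp-project` are assumptions chosen for this example. The actual layout comes from the Microsoft TDSP project template repository, which you would normally seed from rather than recreate by hand.

```python
import tempfile
from pathlib import Path

# Hypothetical directory names; the real template repository defines its own.
TEMPLATE_DIRS = ("Data", "Code", "Docs")


def scaffold_project(root):
    """Create the top-level project directories under root and return them."""
    root = Path(root)
    created = []
    for name in TEMPLATE_DIRS:
        directory = root / name
        directory.mkdir(parents=True, exist_ok=True)
        # An empty placeholder file lets git track an otherwise empty directory.
        (directory / ".gitkeep").touch()
        created.append(directory)
    return created


# Demonstrate in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold_project(Path(tmp) / "my-tdsp-project")
    names = [path.name for path in created]
    all_exist = all(path.is_dir() for path in created)
    print(names)
```

Giving every project the same top-level layout means team members can find data, code, and documents in the same place regardless of which project they join.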

Next steps

Explore more detailed descriptions of the roles and tasks defined by the Team Data Science Process: