CLI (v2) batch deployment YAML schema

Article
11/15/2023

The source JSON schema can be found at https://azuremlschemas.azureedge.net/latest/batchDeployment.schema.json.

Note

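The YAML syntax detailed in this document is based on the JSON schema for the latest version of the ML CLI v2 extension. This syntax is guaranteed to work only with the latest version of the ML CLI v2 extension. You can find the schemas for older extension versions at https://azuremlschemasprod.azureedge.net/.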

YAML syntax

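Key	Type	Description	Allowed values	Default value
`$schema`	string	The YAML schema. If you use the Azure Machine Learning VS Code extension to author the YAML file, including `$schema` at the top of the file enables schema and resource completions.
`name`	string	Required. Name of the deployment.
`description`	string	Description of the deployment.
`tags`	object	Dictionary of tags for the deployment.
`endpoint_name`	string	Required. Name of the endpoint to create the deployment under.
`type`	string	Required. Type of the batch deployment. Use `model` for model deployments and `pipeline` for pipeline component deployments. New in version 1.7.	`model`, `pipeline`	`model`
`settings`	object	Configuration of the deployment. See the specific YAML reference for model and pipeline component deployments for the allowed values. New in version 1.7.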

Tip

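The type key was introduced in version 1.7 of the CLI extension. To fully support backward compatibility, this property defaults to model. However, if type isn't specified explicitly, the settings key isn't applied, and all the properties of the model deployment settings must be placed at the root of the YAML specification.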
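The relationship between the two layouts can be sketched in Python. This is only an illustration, not part of the CLI: the helper name `to_typed_model_deployment` and the `SETTINGS_KEYS` tuple are hypothetical, and the key list is taken from the `settings.*` rows of the model deployment syntax in this article. The sketch leaves `$schema` untouched, so that value would still need to be updated by hand.

```python
from typing import Any, Dict

# Assumption: these are the keys documented under `settings.` for model deployments.
SETTINGS_KEYS = (
    "max_concurrency_per_instance",
    "error_threshold",
    "logging_level",
    "mini_batch_size",
    "retry_settings",
    "output_action",
    "output_file_name",
    "environment_variables",
)


def to_typed_model_deployment(spec: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: move root-level settings of a legacy spec under `settings`."""
    if "type" in spec:
        # The spec already declares its type explicitly; return a copy unchanged.
        return dict(spec)

    upgraded: Dict[str, Any] = {"type": "model"}  # the documented default type
    settings: Dict[str, Any] = {}
    for key, value in spec.items():
        if key in SETTINGS_KEYS:
            settings[key] = value
        else:
            upgraded[key] = value
    if settings:
        upgraded["settings"] = settings
    return upgraded
```

Applied to a dictionary loaded from a YAML file without type, the result has type: model at the root and the settings properties nested under settings.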

YAML syntax for model deployments

When type: model, the following syntax applies:

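Key	Type	Description	Allowed values	Default value
`model`	string or object	Required. The model to use for the deployment. This value can be either a reference to an existing versioned model in the workspace or an inline model specification. To reference an existing model, use the `azureml:<model-name>:<version>` syntax. To define a model inline, follow the Model schema. As a best practice for production scenarios, create the model separately and reference it here.
`code_configuration`	object	Configuration for the scoring code logic. This property isn't required if the model is in MLflow format.
`code_configuration.code`	string	The local directory that contains all the Python source code needed to score the model.
`code_configuration.scoring_script`	string	The Python file in the directory above. This file must have an `init()` function and a `run()` function. Use the `init()` function for any costly or common preparation (for example, loading the model into memory). `init()` is called only once, at the start of the process. Use `run(mini_batch)` to score each entry. The value of `mini_batch` is a list of file paths. The `run()` function must return a pandas DataFrame or an array. Each returned element indicates one successful run of an input element in `mini_batch`. For more information on how to author a scoring script, see Understanding the scoring script.
`environment`	string or object	The environment to use for the deployment. This value can be either a reference to an existing versioned environment in the workspace or an inline environment specification. This property isn't required if the model is in MLflow format. To reference an existing environment, use the `azureml:<environment-name>:<environment-version>` syntax. To define an environment inline, follow the Environment schema. As a best practice for production scenarios, create the environment separately and reference it here.
`compute`	string	Required. Name of the compute target to run the batch scoring jobs on. This value must be a reference to an existing compute in the workspace, using the `azureml:<compute-name>` syntax.
`resources.instance_count`	integer	The number of nodes to use for each batch scoring job.		`1`
`settings`	object	Specific configuration of the model deployment. Changed in version 1.7.
`settings.max_concurrency_per_instance`	integer	The maximum number of parallel `scoring_script` runs per instance.		`1`
`settings.error_threshold`	integer	The number of file failures to ignore. If the error count for the entire input goes above this value, the batch scoring job is terminated. `error_threshold` applies to the entire input, not to individual mini-batches. If omitted, any number of file failures is allowed without terminating the job.		`-1`
`settings.logging_level`	string	The log verbosity level.	`warning`, `info`, `debug`	`info`
`settings.mini_batch_size`	integer	The number of files that `code_configuration.scoring_script` can process in one `run()` call.		`10`
`settings.retry_settings`	object	Retry settings for scoring each mini-batch.
`settings.retry_settings.max_retries`	integer	The maximum number of retries for a failed or timed-out mini-batch.		`3`
`settings.retry_settings.timeout`	integer	The timeout, in seconds, for scoring a single mini-batch. Use a larger value when the mini-batch size is bigger or the model is more expensive to run.		`30`
`settings.output_action`	string	Indicates how the output is organized in the output file. Use `summary_only` if you generate the output files yourself, as described in Customize outputs in model deployments. Use `append_row` if you return predictions as part of the `run()` function's `return` statement.	`append_row`, `summary_only`	`append_row`
`settings.output_file_name`	string	Name of the batch scoring output file.		`predictions.csv`
`settings.environment_variables`	object	Dictionary of environment variable key-value pairs to set for each batch scoring job.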
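The `settings.error_threshold` rule in the table above can be read as a simple job-wide check. The following Python sketch only illustrates that reading and is not how the service is implemented: the function name `should_terminate` is hypothetical, and it assumes that the default value `-1` corresponds to the "any number of failures is allowed" behavior described for an omitted value.

```python
def should_terminate(failed_files: int, error_threshold: int = -1) -> bool:
    """Return True when the job-wide failure count exceeds the threshold."""
    if error_threshold == -1:
        # Assumption: -1 (the default) means failures never stop the job.
        return False
    # The count covers the entire input, not a single mini-batch.
    return failed_files > error_threshold
```

Under this reading, a threshold of 3 tolerates three failed files across the whole input and ends the job on the fourth.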

YAML syntax for pipeline component deployments

When type: pipeline, the following syntax applies:

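Key	Type	Description	Allowed values	Default value
`component`	string or object	Required. The pipeline component used for the deployment. This value can be either a reference to an existing versioned pipeline component in the workspace or in a registry, or an inline pipeline specification. To reference an existing component, use the `azureml:<component-name>:<version>` syntax. To define a pipeline component inline, follow the Pipeline component schema. As a best practice for production scenarios, create the component separately and reference it here. New in version 1.7.
`settings`	object	Default settings for the pipeline job. See Attributes of the settings key for the set of configurable properties. New in version 1.7.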

Remarks

The az ml batch-deployment commands can be used to manage Azure Machine Learning batch deployments.

Examples

Examples are available in the examples GitHub repository. Some of them are referenced below.

YAML: MLflow model deployment

A model deployment that contains an MLflow model, which doesn't need code_configuration or environment to be specified:

$schema: https://azuremlschemas.azureedge.net/latest/modelBatchDeployment.schema.json
endpoint_name: heart-classifier-batch
name: classifier-xgboost-mlflow
description: A heart condition classifier based on XGBoost
type: model
model: azureml:heart-classifier-mlflow@latest
compute: azureml:batch-cluster
resources:
  instance_count: 2
settings:
  max_concurrency_per_instance: 2
  mini_batch_size: 2
  output_action: append_row
  output_file_name: predictions.csv
  retry_settings:
    max_retries: 3
    timeout: 300
  error_threshold: -1
  logging_level: info

YAML: Custom model deployment with a scoring script

A model deployment that specifies the scoring script and the environment to use:

$schema: https://azuremlschemas.azureedge.net/latest/modelBatchDeployment.schema.json
name: mnist-torch-dpl
description: A deployment using Torch to solve the MNIST classification dataset.
endpoint_name: mnist-batch
type: model
model:
  name: mnist-classifier-torch
  path: model
code_configuration:
  code: code
  scoring_script: batch_driver.py
environment:
  name: batch-torch-py38
  image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest
  conda_file: environment/conda.yaml
compute: azureml:batch-cluster
resources:
  instance_count: 1
settings:
  max_concurrency_per_instance: 2
  mini_batch_size: 10
  output_action: append_row
  output_file_name: predictions.csv
  retry_settings:
    max_retries: 3
    timeout: 30
  error_threshold: -1
  logging_level: info
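This article doesn't show the batch_driver.py file that scoring_script references in the example above. As a minimal sketch only, a scoring script that follows the `init()` and `run(mini_batch)` contract described for `code_configuration.scoring_script` could look like the following. The "model" here is a hypothetical stand-in that labels a file as empty or non-empty. A real script would load the registered model in `init()` instead. The sketch returns a plain list rather than a pandas DataFrame to stay dependency-free.

```python
import os
from typing import Callable, List, Optional

# Populated once by init() and reused by every run() call in this process.
model: Optional[Callable[[bytes], str]] = None


def init() -> None:
    """Called once at process start: do expensive, shared preparation here."""
    global model
    # Hypothetical stand-in for loading a real model into memory.
    model = lambda data: "empty" if len(data) == 0 else "non-empty"


def run(mini_batch: List[str]) -> List[str]:
    """Score each file path in the mini-batch and return one row per file."""
    results = []
    for file_path in mini_batch:
        with open(file_path, "rb") as f:
            data = f.read()
        prediction = model(data)
        # One returned element per successfully processed input file.
        results.append(f"{os.path.basename(file_path)},{prediction}")
    return results
```

With output_action: append_row, as in the example above, the elements returned by `run()` are what end up as rows in the output file. The number of paths passed to each `run()` call is bounded by mini_batch_size.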

YAML: Legacy model deployment

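If the type attribute isn't specified in the YAML, a model deployment is inferred. However, the settings key isn't available, and its properties must be placed at the root of the YAML, as shown in this example. We recommend that you always specify the type property.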

$schema: https://azuremlschemas.azureedge.net/latest/batchDeployment.schema.json
endpoint_name: heart-classifier-batch
name: classifier-xgboost-mlflow
description: A heart condition classifier based on XGBoost
model: azureml:heart-classifier-mlflow@latest
compute: azureml:batch-cluster
resources:
  instance_count: 2
max_concurrency_per_instance: 2
mini_batch_size: 2
output_action: append_row
output_file_name: predictions.csv
retry_settings:
  max_retries: 3
  timeout: 300
error_threshold: -1
logging_level: info

YAML: Pipeline component deployment

A simple pipeline component deployment:

$schema: https://azuremlschemas.azureedge.net/latest/pipelineComponentBatchDeployment.schema.json
name: hello-batch-dpl
endpoint_name: hello-pipeline-batch
type: pipeline
component: azureml:hello_batch@latest
settings:
    default_compute: batch-cluster

Next steps

Install and use the CLI (v2)