Image processing with batch model deployments

Article
04/07/2024

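APPLIES TO: Azure CLI ml extension v2 (current), Python SDK azure-ai-ml v2 (current)

Batch model deployments can process tabular data, but they can also process any other file type, such as images. Both MLflow and custom models support these deployments. In this tutorial, we learn how to deploy a model that classifies images according to the ImageNet taxonomy.

About this sample

The model we work with was built using TensorFlow and the ResNet architecture (Identity Mappings in Deep Residual Networks). A sample of this model is available for download; the download URL appears later in this article, in the model registration step. The model has the following constraints that are important to keep in mind for deployment: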

It works with images of size 244x244 (tensors of shape (244, 244, 3)).
It requires inputs to be scaled to the range [0,1].
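
As a rough illustration of the second constraint, the sketch below scales 8-bit pixel values into [0, 1]. It uses plain nested Python lists so that it runs without extra packages; the helper names `scale_to_unit_range` and `shape_of` are assumptions made for this example and are not part of the sample repository.

```python
def scale_to_unit_range(pixels):
    """Scale 8-bit pixel values (0..255) to floats in [0, 1]."""
    return [[[channel / 255.0 for channel in pixel] for pixel in row] for row in pixels]


def shape_of(pixels):
    """Return (height, width, channels) of a nested-list image."""
    return (len(pixels), len(pixels[0]), len(pixels[0][0]))


# A tiny 2x2 RGB image standing in for a real, already resized picture.
tiny_image = [
    [[0, 128, 255], [255, 255, 255]],
    [[64, 64, 64], [0, 0, 0]],
]

scaled = scale_to_unit_range(tiny_image)
```

The scoring scripts later in this article do the same thing with NumPy or TensorFlow by dividing the image array by 255.0.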

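The information in this article is based on code samples contained in the azureml-examples repository. To run the commands locally without having to copy/paste YAML and other files, clone the repo, and then change directories to cli/endpoints/batch/deploy-models/imagenet-classifier if you're using the Azure CLI, or to sdk/python/endpoints/batch/deploy-models/imagenet-classifier if you're using the SDK for Python.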

git clone https://github.com/Azure/azureml-examples --depth 1
cd azureml-examples/cli/endpoints/batch/deploy-models/imagenet-classifier

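Follow along in Jupyter Notebooks

You can follow along with this sample in a Jupyter notebook. In the cloned repository, open the notebook: imagenet-classifier-batch.ipynb.

Prerequisites

Before following the steps in this article, make sure you have the following prerequisites:

An Azure subscription. If you don't have an Azure subscription, create a free account before you begin. Try the free or paid version of Azure Machine Learning.
An Azure Machine Learning workspace. If you don't have one, use the steps in the Manage Azure Machine Learning workspaces article to create one.
Make sure you have the following permissions in the workspace:
- Create or manage batch endpoints and deployments: use an Owner, Contributor, or custom role that allows Microsoft.MachineLearningServices/workspaces/batchEndpoints/*.
- Create ARM deployments in the workspace resource group: use an Owner, Contributor, or custom role that allows Microsoft.Resources/deployments/write in the resource group where the workspace is deployed.
You need to install the following software to work with Azure Machine Learning:
- Azure CLI
- Python
The Azure CLI and the ml extension for Azure Machine Learning.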
```
az extension add -n ml
```
Note

Pipeline component deployments for batch endpoints were introduced in version 2.7 of the ml extension for the Azure CLI. Use az extension update --name ml to get the latest version.
The Azure Machine Learning SDK for Python.
```
pip install azure-ai-ml
```
Note

The classes ModelBatchDeployment and PipelineComponentBatchDeployment were introduced in version 1.7.0 of the SDK. Use pip install -U azure-ai-ml to get the latest version.

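Connect to your workspace

The workspace is the top-level resource for Azure Machine Learning. It provides a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section, we connect to the workspace in which you'll perform the deployment tasks.

Azure CLI
Python

Pass in the values for your subscription ID, workspace, location, and resource group in the following code: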

az account set --subscription <subscription>
az configure --defaults workspace=<workspace> group=<resource-group> location=<location>

Import the required libraries:

from azure.ai.ml import MLClient, Input, load_component
from azure.ai.ml.entities import (
    BatchEndpoint,
    BatchDeployment,  # used by the deployment definitions later in this article
    ModelBatchDeployment,
    ModelBatchDeploymentSettings,
    PipelineComponentBatchDeployment,
    Model,
    AmlCompute,
    Data,
    BatchRetrySettings,
    CodeConfiguration,
    Environment,
)
from azure.ai.ml.constants import AssetTypes, BatchDeploymentOutputAction
from azure.ai.ml.dsl import pipeline
from azure.identity import DefaultAzureCredential

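Configure the workspace details and get a handle to the workspace:

Pass in the values for your subscription ID, workspace, and resource group in the following code: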

subscription_id = "<subscription>"
resource_group = "<resource-group>"
workspace = "<workspace>"

ml_client = MLClient(DefaultAzureCredential(), subscription_id, resource_group, workspace)

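Image classification with batch deployments

In this example, we learn how to deploy a deep learning model that classifies a given image according to the ImageNet taxonomy.

Create the endpoint

First, let's create the endpoint that will host the model: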

Azure CLI
Python

Decide on the name of the endpoint:

ENDPOINT_NAME="imagenet-classifier-batch"

The following YAML file defines a batch endpoint:

endpoint.yml

$schema: https://azuremlschemas.azureedge.net/latest/batchEndpoint.schema.json
name: imagenet-classifier-batch
description: A batch endpoint for performing image classification using a TFHub ImageNet model.
auth_mode: aad_token

Run the following code to create the endpoint.

az ml batch-endpoint create --file endpoint.yml --name $ENDPOINT_NAME

Decide on the name of the endpoint:

endpoint_name="imagenet-classifier-batch"

Configure the endpoint:

endpoint = BatchEndpoint(
    name=endpoint_name,
    description="A batch service to perform ImageNet image classification",
)

Run the following code to create the endpoint:

ml_client.batch_endpoints.begin_create_or_update(endpoint)

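Register the model

Model deployments can only deploy registered models, so we need to register the model first. You can skip this step if the model you're trying to deploy is already registered.

Download a copy of the model: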

Azure CLI
Python

wget https://azuremlexampledata.blob.core.windows.net/data/imagenet/model.zip
unzip model.zip -d .

import os
import urllib.request
from zipfile import ZipFile

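response = urllib.request.urlretrieve('https://azuremlexampledata.blob.core.windows.net/data/imagenet/model.zip', 'model.zip')

# Extract the archive into a local folder.
os.makedirs("imagenet-classifier", exist_ok=True)
with ZipFile(response[0], 'r') as archive:
    archive.extractall(path="imagenet-classifier")

# extractall() returns None, so build the path explicitly. This assumes the
# archive unpacks into a "model" folder, as the CLI steps imply.
model_path = os.path.join("imagenet-classifier", "model")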

Register the model:

Azure CLI
Python

MODEL_NAME='imagenet-classifier'
az ml model create --name $MODEL_NAME --path "model"

model_name = 'imagenet-classifier'
model = ml_client.models.create_or_update(
    Model(name=model_name, path=model_path, type=AssetTypes.CUSTOM_MODEL)
)

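Create a scoring script

We need to create a scoring script that reads the images provided by the batch deployment and returns the scores of the model. The following script:

Defines an init function that loads the model using the keras module in tensorflow.
Defines a run function that is executed for each mini-batch the batch deployment provides.
The run function reads one image file at a time.
The run method resizes the images to the size the model expects.
The run method rescales the images to the range [0,1], which is what the model expects.
It returns the classes and the probabilities associated with the predictions.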

code/score-by-file/batch_driver.py

import os
import numpy as np
import pandas as pd
import tensorflow as tf
from os.path import basename
from PIL import Image
from tensorflow.keras.models import load_model


def init():
    global model
    global input_width
    global input_height

    # AZUREML_MODEL_DIR is an environment variable created during deployment
    model_path = os.path.join(os.environ["AZUREML_MODEL_DIR"], "model")

    # load the model
    model = load_model(model_path)
    input_width = 244
    input_height = 244


def run(mini_batch):
    results = []

    for image in mini_batch:
        data = Image.open(image).resize(
            (input_width, input_height)
        )  # Read and resize the image
        data = np.array(data) / 255.0  # Normalize
        data_batch = tf.expand_dims(
            data, axis=0
        )  # create a batch of size (1, 244, 244, 3)

        # perform inference
        pred = model.predict(data_batch)

        # Compute probabilities, classes and labels
        pred_prob = tf.math.reduce_max(tf.math.softmax(pred, axis=-1)).numpy()
        pred_class = tf.math.argmax(pred, axis=-1).numpy()

        results.append([basename(image), pred_class[0], pred_prob])

    return pd.DataFrame(results)
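
To make the last step of `run` concrete, here is a minimal sketch of how a class index and its probability can be derived from one image's raw model outputs, assuming the model returns unnormalized scores (logits), which is how the script above treats them. It uses only the standard library in place of TensorFlow; `softmax` and `top_prediction` are illustrative names, not functions from the sample.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(logits):
    """Return (class_index, probability) of the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


# Three made-up scores standing in for the 1,000 ImageNet outputs.
predicted_class, predicted_prob = top_prediction([1.0, 3.0, 2.0])
```

Because softmax preserves the ordering of the scores, taking the argmax of the raw outputs (as the script does) and of the probabilities gives the same class.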

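Tip

Although the deployment provides images in mini-batches, this scoring script processes one image at a time. This is a common pattern because trying to load the entire batch and send it to the model at once can put high memory pressure on the batch executor (OOM exceptions). However, there are cases where doing so enables high throughput in the scoring task. This is the case for batch deployments on GPU hardware, where we want to achieve high GPU utilization. For an example of a scoring script that takes this approach, see High throughput deployments.

Note

If you're trying to deploy a generative model (one that generates files), see how to author the scoring script as explained in Deployment of models that produce multiple files.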

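Create the deployment

Once the scoring script is created, it's time to create a batch deployment for it. Follow these steps to create it:

Ensure you have a compute cluster where we can create the deployment. In this example, we use a compute cluster named gpu-cluster. Although it isn't required, we use GPUs to speed up processing.
We need to indicate the environment in which to run the deployment. In our case, the model runs on TensorFlow. Azure Machine Learning already has an environment with the required software installed, so we can reuse it. We just add a couple of dependencies in a conda.yml file.
- Azure CLI
- Python
The environment definition is included in the deployment file.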
```
compute: azureml:gpu-cluster
environment:
  name: tensorflow212-cuda11-gpu
  image: mcr.microsoft.com/azureml/curated/tensorflow-2.12-cuda11:latest
```
Let's get a reference to the environment:
```
environment = Environment(
    name="tensorflow27-cuda11-gpu",
    conda_file="environment/conda.yml",
    image="mcr.microsoft.com/azureml/curated/tensorflow-2.7-ubuntu20.04-py38-cuda11-gpu:latest",
)
```

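Now, let's create the deployment.

Azure CLI
Python

To create a new deployment under the created endpoint, create a YAML configuration like the following. You can check the full batch endpoint YAML schema for extra properties.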

$schema: https://azuremlschemas.azureedge.net/latest/modelBatchDeployment.schema.json
endpoint_name: imagenet-classifier-batch
name: imagenet-classifier-resnetv2
description: A ResNetV2 model architecture for performing ImageNet classification in batch
type: model
model: azureml:imagenet-classifier@latest
compute: azureml:gpu-cluster
environment:
  name: tensorflow212-cuda11-gpu
  image: mcr.microsoft.com/azureml/curated/tensorflow-2.12-cuda11:latest
  conda_file: environment/conda.yaml
code_configuration:
  code: code/score-by-file
  scoring_script: batch_driver.py
resources:
  instance_count: 2
settings:
  max_concurrency_per_instance: 1
  mini_batch_size: 5
  output_action: append_row
  output_file_name: predictions.csv
  retry_settings:
    max_retries: 3
    timeout: 300
  error_threshold: -1
  logging_level: info

Then, create the deployment with the following command:

az ml batch-deployment create --file deployment-by-file.yml --endpoint-name $ENDPOINT_NAME --set-default
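
To get a feel for how these settings interact, the sketch below estimates how the work is split, assuming that mini_batch_size counts files per mini-batch and that each instance runs max_concurrency_per_instance workers in parallel. The function `plan_batch_job` is a hypothetical helper for back-of-the-envelope planning, not part of the Azure CLI or SDK, and the "rounds" figure is an idealized estimate that ignores retries and uneven processing times.

```python
import math


def plan_batch_job(file_count, mini_batch_size, instance_count, max_concurrency_per_instance):
    """Estimate mini-batch count and parallelism for a file-based batch job."""
    mini_batches = math.ceil(file_count / mini_batch_size)
    parallel_workers = instance_count * max_concurrency_per_instance
    rounds = math.ceil(mini_batches / parallel_workers)
    return {
        "mini_batches": mini_batches,
        "parallel_workers": parallel_workers,
        "rounds": rounds,
    }


# Values taken from the YAML above and the 1,000-image sample used later.
plan = plan_batch_job(
    file_count=1000,
    mini_batch_size=5,
    instance_count=2,
    max_concurrency_per_instance=1,
)
```

With these values, the 1,000 sample images are split into 200 mini-batches that two workers process side by side.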

To create a new deployment with the indicated environment and scoring script, use the following code:

deployment = BatchDeployment(
    name="imagenet-classifier-resnetv2",
    description="A ResNetV2 model architecture for performing ImageNet classification in batch",
    endpoint_name=endpoint.name,
    model=model,
    environment=environment,
    code_configuration=CodeConfiguration(
        code="code/score-by-file",
        scoring_script="batch_driver.py",
    ),
    compute="gpu-cluster",  # the compute cluster named in the steps above
    instance_count=2,
    max_concurrency_per_instance=1,
    mini_batch_size=10,
    output_action=BatchDeploymentOutputAction.APPEND_ROW,
    output_file_name="predictions.csv",
    retry_settings=BatchRetrySettings(max_retries=3, timeout=300),
    logging_level="info",
)

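Then, create the deployment with the following command:

ml_client.batch_deployments.begin_create_or_update(deployment)

Although you can invoke a specific deployment inside an endpoint, you usually want to invoke the endpoint itself and let the endpoint decide which deployment to use. That deployment is called the "default" deployment. This lets you change the default deployment, and hence the model serving the requests, without changing the contract with the users invoking the endpoint. Use the following instructions to update the default deployment: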
- Azure Machine Learning CLI
- Azure Machine Learning SDK for Python
```
az ml batch-endpoint update --name $ENDPOINT_NAME --set defaults.deployment_name=$DEPLOYMENT_NAME
```
```
endpoint.defaults.deployment_name = deployment.name
ml_client.batch_endpoints.begin_create_or_update(endpoint)
```
At this point, our batch endpoint is ready to be used.
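
The routing idea behind the default deployment can be sketched with a toy model. The `ToyEndpoint` class below is purely illustrative and is not the Azure Machine Learning SDK; it only shows why callers don't need to change anything when the default is swapped.

```python
class ToyEndpoint:
    """Toy stand-in for a batch endpoint; NOT the Azure ML SDK."""

    def __init__(self, name):
        self.name = name
        self.deployments = {}
        self.default_deployment = None

    def add_deployment(self, deployment_name, handler, set_default=False):
        self.deployments[deployment_name] = handler
        # In this toy model, the first deployment added becomes the default.
        if set_default or self.default_deployment is None:
            self.default_deployment = deployment_name

    def invoke(self, payload, deployment_name=None):
        # Callers that name no deployment are routed to the default one.
        target = deployment_name or self.default_deployment
        return self.deployments[target](payload)


toy = ToyEndpoint("imagenet-classifier-batch")
toy.add_deployment("by-file", lambda p: f"by-file scored {p}")
toy.add_deployment("by-batch", lambda p: f"by-batch scored {p}")

result_before = toy.invoke("images")   # routed to the default deployment
toy.default_deployment = "by-batch"    # swap the default
result_after = toy.invoke("images")    # same call, different deployment
```

The caller issues the same `invoke` call both times; only the endpoint's configuration changed.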

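Test the deployment

To test our endpoint, we use a sample of 1,000 images from the original ImageNet dataset. Batch endpoints can only process data that is located in the cloud and accessible from the Azure Machine Learning workspace. In this example, we upload the sample to an Azure Machine Learning data store. In particular, we create a data asset that can be used to invoke the endpoint for scoring. Note, however, that batch endpoints accept data stored in several types of locations.

Let's download the associated sample data: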

Azure CLI
Python

wget https://azuremlexampledata.blob.core.windows.net/data/imagenet/imagenet-1000.zip
unzip imagenet-1000.zip -d data

!wget https://azuremlexampledata.blob.core.windows.net/data/imagenet-1000.zip
!unzip imagenet-1000.zip -d data

Now, let's create the data asset from the data we just downloaded:

Azure CLI
Python

Create a data asset definition in YAML:

imagenet-sample-unlabeled.yml

$schema: https://azuremlschemas.azureedge.net/latest/data.schema.json
name: imagenet-sample-unlabeled
description: A sample of 1000 images from the original ImageNet dataset. Download content from https://azuremlexampledata.blob.core.windows.net/data/imagenet-1000.zip.
type: uri_folder
path: data

Then, create the data asset:

az ml data create -f imagenet-sample-unlabeled.yml

data_path = "data"
dataset_name = "imagenet-sample-unlabeled"

imagenet_sample = Data(
    path=data_path,
    type=AssetTypes.URI_FOLDER,
    description="A sample of 1000 images from the original ImageNet dataset",
    name=dataset_name,
)

Then, create the data asset:

ml_client.data.create_or_update(imagenet_sample)

To get the newly created data asset, use:

imagenet_sample = ml_client.data.get(dataset_name, label="latest")

Now that the data is uploaded and ready to be used, let's invoke the endpoint:
- Azure CLI
- Python
```
JOB_NAME=$(az ml batch-endpoint invoke --name $ENDPOINT_NAME --input azureml:imagenet-sample-unlabeled@latest --query name -o tsv)
```
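Note

The utility jq might not be installed on every installation. See its installation instructions if you need it.
Tip

What's the difference between the inputs and input parameters when you invoke an endpoint?

In general, you can use a dictionary inputs = {} with the invoke method to provide an arbitrary number of required inputs to a batch endpoint that contains a model deployment or a pipeline deployment.

For a model deployment, you can use the input parameter as a shorter way to specify the input data location, because a model deployment always requires exactly one data input.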
```
input = Input(type=AssetTypes.URI_FOLDER, path=imagenet_sample.id)
job = ml_client.batch_endpoints.invoke(
   endpoint_name=endpoint.name,
   input=input,
)
```
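Tip

Notice that we don't indicate the deployment name in the invoke operation. That's because the endpoint automatically routes the job to the default deployment. Since our endpoint has only one deployment, that one is the default. You can target a specific deployment by indicating the argument/parameter deployment_name.
A batch job is started as soon as the command returns. You can monitor the status of the job until it finishes: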
- Azure CLI
- Python
```
az ml job show -n $JOB_NAME --web
```
```
ml_client.jobs.get(job.name)
```

Once the job finishes, we can download the predictions:

Azure CLI
Python

To download the predictions, use the following command:

az ml job download --name $JOB_NAME --output-name score --download-path ./

ml_client.jobs.download(name=job.name, output_name='score', download_path='./')

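The output predictions look like the following. Notice that the predictions have been combined with the labels for the convenience of the reader. To learn more about how to achieve this, see the associated notebook.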

import pandas as pd
score = pd.read_csv("named-outputs/score/predictions.csv", header=None,  names=['file', 'class', 'probabilities'], sep=' ')
score['label'] = score['class'].apply(lambda pred: imagenet_labels[pred])
score

file	class	probabilities	label
n02088094_Afghan_hound.JPEG	161	0.994745	Afghan hound
n02088238_basset	162	0.999397	basset
n02088364_beagle.JPEG	165	0.366914	bluetick
n02088466_bloodhound.JPEG	164	0.926464	bloodhound
...	...	...	...
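
If pandas isn't at hand, the same join can be sketched with the standard library. The example below assumes the space-separated, header-less layout read by the snippet above, and it uses a small hypothetical `labels` dictionary in place of the full ImageNet label list that the notebook loads; `read_predictions` is an illustrative helper name.

```python
import csv
import io

# Two sample rows in the space-separated layout of predictions.csv.
sample_predictions = io.StringIO(
    "n02088094_Afghan_hound.JPEG 161 0.994745\n"
    "n02088466_bloodhound.JPEG 164 0.926464\n"
)

# Hypothetical, partial lookup (class index -> label) for illustration only.
labels = {161: "Afghan hound", 164: "bloodhound"}


def read_predictions(handle, label_lookup):
    """Parse prediction rows and attach a human-readable label to each."""
    rows = []
    for file_name, class_index, probability in csv.reader(handle, delimiter=" "):
        class_index = int(class_index)
        rows.append({
            "file": file_name,
            "class": class_index,
            "probabilities": float(probability),
            "label": label_lookup.get(class_index, "unknown"),
        })
    return rows


predictions = read_predictions(sample_predictions, labels)
```

To read the real output, pass an open file handle for named-outputs/score/predictions.csv instead of the in-memory sample.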

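High throughput deployments

As mentioned before, the deployment we just created processes one image at a time, even when the batch deployment provides a batch of them. In most cases, this is the best approach because it simplifies how the model executes and avoids possible out-of-memory problems. However, in certain cases we might want to saturate the underlying hardware as much as possible. This is the case with GPUs, for instance.

In those cases, we might want to run inference on the entire batch of data. That implies loading the entire set of images into memory and sending them directly to the model. The following example uses TensorFlow to read a batch of images and score them all at once. It also uses TensorFlow ops for data preprocessing, so the entire pipeline runs on the same device (CPU/GPU).

Warning

For some models, memory consumption grows non-linearly with the size of the inputs. To avoid out-of-memory exceptions, batch the data again (as done in this example) or decrease the size of the batches created by the batch deployment.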
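
The re-batching idea can be sketched without TensorFlow: split the mini-batch that the deployment hands to the scoring script into smaller chunks and send one chunk at a time to the model. The generator `chunked` and the file names below are assumptions made for illustration.

```python
def chunked(items, chunk_size):
    """Yield consecutive slices of items with at most chunk_size elements."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


# Ten made-up file paths standing in for a mini-batch.
mini_batch = [f"image_{i}.jpg" for i in range(10)]
chunks = list(chunked(mini_batch, 4))
```

A smaller chunk size lowers peak memory use at the cost of more calls to the model; a larger one does the opposite.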

Create the scoring script:

code/score-by-batch/batch_driver.py

import os
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.models import load_model


def init():
    global model
    global input_width
    global input_height

    # AZUREML_MODEL_DIR is an environment variable created during deployment
    model_path = os.path.join(os.environ["AZUREML_MODEL_DIR"], "model")

    # load the model
    model = load_model(model_path)
    input_width = 244
    input_height = 244


def decode_img(file_path):
    file = tf.io.read_file(file_path)
    img = tf.io.decode_jpeg(file, channels=3)
    img = tf.image.resize(img, [input_width, input_height])
    return img / 255.0


def run(mini_batch):
    images_ds = tf.data.Dataset.from_tensor_slices(mini_batch)
    # Batch again (16 images at a time) before sending data to the model
    images_ds = images_ds.map(decode_img).batch(16)

    # perform inference
    pred = model.predict(images_ds)

    # Compute the top class and its probability for each image (row-wise)
    pred_prob = tf.math.reduce_max(tf.math.softmax(pred, axis=-1), axis=-1).numpy()
    pred_class = tf.math.argmax(pred, axis=-1).numpy()

    # One row per image, one column per field
    return pd.DataFrame(
        {"file": mini_batch, "probability": pred_prob, "class": pred_class}
    )

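Tip

Notice that this script constructs a tensor dataset from the mini-batch sent by the batch deployment. The dataset is preprocessed with the map operation and the decode_img function to obtain the tensors the model expects.
The dataset is batched again (16) to send the data to the model. Use this parameter to control how much information you load into memory and send to the model at once. If running on a GPU, tune this parameter carefully to reach the maximum GPU utilization just short of an OOM exception.
Once predictions are computed, the tensors are converted to numpy.ndarray.

Now, let's create the deployment.

Azure CLI
Python

To create a new deployment under the created endpoint, create a YAML configuration like the following. You can check the full batch endpoint YAML schema for extra properties.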

$schema: https://azuremlschemas.azureedge.net/latest/modelBatchDeployment.schema.json
endpoint_name: imagenet-classifier-batch
name: imagenet-classifier-resnetv2
description: A ResNetV2 model architecture for performing ImageNet classification in batch
type: model
model: azureml:imagenet-classifier@latest
compute: azureml:gpu-cluster
environment:
  name: tensorflow212-cuda11-gpu
  image: mcr.microsoft.com/azureml/curated/tensorflow-2.12-cuda11:latest
  conda_file: environment/conda.yaml
code_configuration:
  code: code/score-by-batch
  scoring_script: batch_driver.py
resources:
  instance_count: 2
tags:
  device_acceleration: CUDA
  device_batching: 16
settings:
  max_concurrency_per_instance: 1
  mini_batch_size: 5
  output_action: append_row
  output_file_name: predictions.csv
  retry_settings:
    max_retries: 3
    timeout: 300
  error_threshold: -1
  logging_level: info

Then, create the deployment with the following command:

az ml batch-deployment create --file deployment-by-batch.yml --endpoint-name $ENDPOINT_NAME --set-default

To create a new deployment with the indicated environment and scoring script, use the following code:

deployment = BatchDeployment(
    name="imagenet-classifier-resnetv2",
    description="A ResNetV2 model architecture for performing ImageNet classification in batch",
    endpoint_name=endpoint.name,
    model=model,
    environment=environment,
    code_configuration=CodeConfiguration(
        code="code/score-by-batch",
        scoring_script="batch_driver.py",
    ),
    compute=compute_name,
    instance_count=2,
    tags={ "device_acceleration": "CUDA", "device_batching": "16" },
    max_concurrency_per_instance=1,
    mini_batch_size=10,
    output_action=BatchDeploymentOutputAction.APPEND_ROW,
    output_file_name="predictions.csv",
    retry_settings=BatchRetrySettings(max_retries=3, timeout=300),
    logging_level="info",
)

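Then, create the deployment with the following command:

ml_client.batch_deployments.begin_create_or_update(deployment)

You can use this new deployment with the sample data shown before. Remember that to invoke this deployment, you should either indicate the name of the deployment in the invocation method or set it as the default one.

Considerations for MLflow models processing images

MLflow models in batch endpoints support reading images as input data. Since MLflow deployments don't require a scoring script, keep the following considerations in mind when using them:

Supported image files include: .png, .jpg, .jpeg, .tiff, .bmp, and .gif.
MLflow models should expect to receive an np.ndarray as input that matches the dimensions of the input image. To support multiple image sizes in each batch, the batch executor invokes the MLflow model once per image file.
MLflow models are highly encouraged to include a signature, and if they do, it must be of type TensorSpec. If a signature is available, inputs are reshaped to match the tensor's shape. If no signature is available, tensors of type np.uint8 are inferred.
For models that include a signature and are expected to handle images of variable size, include a signature that can guarantee it. For instance, the following signature example allows batches of 3-channel images.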

import numpy as np
import mlflow
from mlflow.models.signature import ModelSignature
from mlflow.types.schema import Schema, TensorSpec

input_schema = Schema([
  TensorSpec(np.dtype(np.uint8), (-1, -1, -1, 3)),
])
signature = ModelSignature(inputs=input_schema)

(...)

mlflow.<flavor>.log_model(..., signature=signature)
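
To illustrate what the -1 entries in that shape allow, the sketch below checks a few concrete array shapes against the spec, assuming -1 means "any size" for that dimension. `shape_matches` is a hypothetical helper written for this example; it is not MLflow's own validation code.

```python
def shape_matches(actual_shape, spec_shape):
    """Return True if actual_shape fits spec_shape, where -1 is a wildcard."""
    if len(actual_shape) != len(spec_shape):
        return False
    return all(
        spec == -1 or spec == actual
        for actual, spec in zip(actual_shape, spec_shape)
    )


# (batch, height, width, channels) with only the channel count fixed.
spec = (-1, -1, -1, 3)

fits_large = shape_matches((8, 244, 244, 3), spec)   # any batch and image size
fits_small = shape_matches((1, 100, 50, 3), spec)    # different image size
wrong_channels = shape_matches((8, 244, 244, 1), spec)
missing_batch_dim = shape_matches((244, 244, 3), spec)
```

Only the channel dimension is pinned, so images of any height and width fit as long as they have three channels and a leading batch dimension.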

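You can find a working example in the Jupyter notebook imagenet-classifier-mlflow.ipynb. For more information about how to use MLflow models in batch deployments, see Using MLflow models in batch deployments.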
