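Azure Cognitive Services Computer Vision SDK for Python

Article
04/05/2023

The Computer Vision service gives developers access to advanced algorithms that process images and return information. Computer Vision algorithms analyze the content of an image in different ways, depending on the visual features you're interested in.

You can use Computer Vision in your application to:

Analyze images for insight
Extract text from images
Generate thumbnails

Looking for more documentation? See the last section of this article.

Prerequisites

If you need a Computer Vision API account, you can create one with this Azure CLI command: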

RES_REGION=westeurope
RES_GROUP=<resourcegroup-name>
ACCT_NAME=<computervision-account-name>

az cognitiveservices account create \
    --resource-group $RES_GROUP \
    --name $ACCT_NAME \
    --location $RES_REGION \
    --kind ComputerVision \
    --sku S1 \
    --yes

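Installation

Install the Azure Cognitive Services Computer Vision SDK with pip, optionally inside a virtual environment.

Configure a virtual environment (optional)

Although it isn't required, a virtual environment keeps your base system and the Azure SDK environment isolated from each other. Run the following commands to create and then activate a virtual environment with venv, such as cogsrv-vision-env: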

python3 -m venv cogsrv-vision-env
source cogsrv-vision-env/bin/activate

Install the SDK

Install the Azure Cognitive Services Computer Vision SDK for Python package with pip:

pip install azure-cognitiveservices-vision-computervision

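Authentication

Once you create your Computer Vision resource, you need its region and one of its account keys to instantiate the client object.

Use these values when you create an instance of the ComputerVisionClient client object.

Get credentials

Use the Azure CLI snippet below to populate two environment variables with the Computer Vision account region and one of its keys (you can also find these values in the Azure portal). The snippet is formatted for the Bash shell.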

RES_GROUP=<resourcegroup-name>
ACCT_NAME=<computervision-account-name>

export ACCOUNT_REGION=$(az cognitiveservices account show \
    --resource-group $RES_GROUP \
    --name $ACCT_NAME \
    --query location \
    --output tsv)

export ACCOUNT_KEY=$(az cognitiveservices account keys list \
    --resource-group $RES_GROUP \
    --name $ACCT_NAME \
    --query key1 \
    --output tsv)

Create the client

Once you've populated the ACCOUNT_REGION and ACCOUNT_KEY environment variables, you can create the ComputerVisionClient client object.

from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import VisualFeatureTypes
from msrest.authentication import CognitiveServicesCredentials

import os
region = os.environ['ACCOUNT_REGION']
key = os.environ['ACCOUNT_KEY']

credentials = CognitiveServicesCredentials(key)
client = ComputerVisionClient(
    endpoint="https://" + region + ".api.cognitive.microsoft.com/",
    credentials=credentials
)
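
The snippet above raises a bare KeyError if either environment variable is missing. As a minimal sketch that uses only the standard library, the following shows one way to validate the variables and build the endpoint string before constructing the client. The helper names build_endpoint and load_settings are hypothetical and are not part of the SDK.

```python
import os


def build_endpoint(region):
    """Build the regional endpoint URL in the same format used above."""
    region = region.strip().lower()
    if not region:
        raise ValueError("region must not be empty")
    return "https://" + region + ".api.cognitive.microsoft.com/"


def load_settings(environ=os.environ):
    """Return (endpoint, key) read from ACCOUNT_REGION and ACCOUNT_KEY."""
    missing = [
        name for name in ("ACCOUNT_REGION", "ACCOUNT_KEY") if not environ.get(name)
    ]
    if missing:
        # Name every missing variable so the fix is obvious to the caller.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return build_endpoint(environ["ACCOUNT_REGION"]), environ["ACCOUNT_KEY"]
```

The returned endpoint and key can then be passed to ComputerVisionClient and CognitiveServicesCredentials exactly as in the snippet above.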

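Usage

Once you've initialized a ComputerVisionClient client object, you can:

Analyze an image: You can analyze an image for certain features such as faces, colors, and tags.
Generate thumbnails: Create a custom JPEG image to use as a thumbnail of the original image.
Get a description of an image: Get a description of the image based on its subject domain.

For more information about this service, see the article "What is Computer Vision?".

Examples

The following sections provide several code snippets covering some of the most common Computer Vision tasks, including:

Analyze an image
Get the subject domain list
Analyze an image by domain
Get a text description of an image
Get text from an image
Generate a thumbnail

Analyze an image

You can analyze an image for certain features with analyze_image. Use the visual_features parameter to set the types of analysis to perform on the image. Common values are VisualFeatureTypes.tags and VisualFeatureTypes.description.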

url = "https://upload.wikimedia.org/wikipedia/commons/thumb/1/12/Broadway_and_Times_Square_by_night.jpg/450px-Broadway_and_Times_Square_by_night.jpg"

image_analysis = client.analyze_image(url,visual_features=[VisualFeatureTypes.tags])

for tag in image_analysis.tags:
    print(tag)
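
Each tag returned by the service carries a name and a confidence score, so a common follow-up step is to keep only the tags you trust. The following is a minimal sketch of that filtering. The function name confident_tag_names, the 0.8 threshold, and the sample values are all assumptions made for illustration, and SimpleNamespace objects stand in for the tag objects so the snippet runs without calling the service.

```python
from types import SimpleNamespace


def confident_tag_names(tags, threshold=0.8):
    """Return names of tags at or above threshold, highest confidence first."""
    kept = [tag for tag in tags if tag.confidence >= threshold]
    kept.sort(key=lambda tag: tag.confidence, reverse=True)
    return [tag.name for tag in kept]


# Stand-in objects with made-up values, used instead of a live service response.
sample_tags = [
    SimpleNamespace(name="street", confidence=0.91),
    SimpleNamespace(name="night", confidence=0.97),
    SimpleNamespace(name="bus", confidence=0.42),
]
print(confident_tag_names(sample_tags))
```

With a real response, you would pass image_analysis.tags in place of sample_tags.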

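Get the subject domain list

Use list_models to review the subject domains used to analyze an image. These domain names are used when you analyze an image by domain. One example of a domain is landmarks.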

models = client.list_models()

for x in models.models_property:
    print(x)

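Analyze an image by domain

You can analyze an image by subject domain with analyze_image_by_domain. Get the list of supported subject domains first so that you use the correct domain name.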

domain = "landmarks"
url = "https://images.pexels.com/photos/338515/pexels-photo-338515.jpeg"
language = "en"

analysis = client.analyze_image_by_domain(domain, url, language)

for landmark in analysis.result["landmarks"]:
    print(landmark["name"])
    print(landmark["confidence"])

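Get a text description of an image

You can get a language-based text description of an image with describe_image. If you're doing text analysis for keywords associated with the image, request several descriptions by passing the number you want as the second argument (max_descriptions in the snippet below). Examples of text descriptions for the image used below include a train crossing a bridge over a body of water, a large bridge over a body of water, and a train crossing a bridge over a large body of water.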

domain = "landmarks"
url = "http://www.public-domain-photos.com/free-stock-photos-4/travel/san-francisco/golden-gate-bridge-in-san-francisco.jpg"
language = "en"
max_descriptions = 3

analysis = client.describe_image(url, max_descriptions, language)

for caption in analysis.captions:
    print(caption.text)
    print(caption.confidence)

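Get text from an image

You can get any handwritten or printed text from an image. This requires two calls to the SDK: read and get_read_result. The read call is asynchronous. In the result of the get_read_result call, use OperationStatusCodes to check whether the first call has completed before you extract the text data. The result includes the text as well as the bounding box coordinates for the text.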

import time

# import models
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes

url = "https://github.com/Azure-Samples/cognitive-services-python-sdk-samples/raw/master/samples/vision/images/make_things_happen.jpg"
numberOfCharsInOperationId = 36

# SDK call: start the asynchronous read operation
rawHttpResponse = client.read(url, language="en", raw=True)

# Get ID from returned headers
operationLocation = rawHttpResponse.headers["Operation-Location"]
idLocation = len(operationLocation) - numberOfCharsInOperationId
operationId = operationLocation[idLocation:]

# SDK call: poll until the operation is no longer pending
while True:
    result = client.get_read_result(operationId)
    if result.status not in (OperationStatusCodes.not_started, OperationStatusCodes.running):
        break
    time.sleep(1)

# Get data
if result.status == OperationStatusCodes.succeeded:
    for line in result.analyze_result.read_results[0].lines:
        print(line.text)
        print(line.bounding_box)
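
The example above takes the last 36 characters of the Operation-Location header as the operation ID. An alternative that doesn't depend on a fixed length is to take the last path segment of the URL. The following is a minimal sketch of that idea. The function name operation_id_from_location and the sample header value are hypothetical and are not taken from the SDK or from a real response.

```python
def operation_id_from_location(operation_location):
    """Return the last path segment of an Operation-Location URL."""
    operation_id = operation_location.rstrip("/").split("/")[-1]
    if not operation_id:
        raise ValueError("no operation ID found in: " + repr(operation_location))
    return operation_id


# Hypothetical header value, shaped like a URL that ends in a 36-character ID.
sample = (
    "https://westeurope.api.cognitive.microsoft.com/vision/v3.2/read/"
    "analyzeResults/01234567-89ab-cdef-0123-456789abcdef"
)
print(operation_id_from_location(sample))
```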

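Generate a thumbnail

You can generate a thumbnail (JPG) of an image with generate_thumbnail. The thumbnail doesn't need to have the same proportions as the original image.

This example uses the Pillow package to save the new thumbnail image locally.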

from PIL import Image
import io

width = 50
height = 50
url = "http://www.public-domain-photos.com/free-stock-photos-4/travel/san-francisco/golden-gate-bridge-in-san-francisco.jpg"

thumbnail = client.generate_thumbnail(width, height, url)

# The thumbnail is returned as a stream of byte chunks; join them before decoding.
image = Image.open(io.BytesIO(b"".join(thumbnail)))

image.save('thumbnail.jpg')
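
Because the thumbnail arrives as a stream of byte chunks, Pillow isn't strictly needed if you only want the file on disk. The following minimal sketch writes the chunks directly to a file. The helper name save_chunks and the output file name are hypothetical, and a stand-in generator with placeholder bytes replaces the service response so the snippet runs offline.

```python
import os
import tempfile


def save_chunks(chunks, path):
    """Write an iterable of byte chunks to path and return the byte count."""
    total = 0
    with open(path, "wb") as handle:
        for chunk in chunks:
            handle.write(chunk)
            total += len(chunk)
    return total


# Stand-in generator with placeholder bytes; with the SDK, pass the value
# returned by generate_thumbnail instead.
fake_stream = (part for part in (b"\xff\xd8", b"placeholder", b"\xff\xd9"))
target = os.path.join(tempfile.gettempdir(), "thumbnail-sketch.jpg")
print(save_chunks(fake_stream, target))
```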

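Troubleshooting

General

When you interact with the ComputerVisionClient client object using the Python SDK, the ComputerVisionErrorException class is used to return errors. Errors returned by the service correspond to the same HTTP status codes returned for REST API requests.

For example, if you try to analyze an image with an invalid key, a 401 error is returned. In the following snippet, the error is handled gracefully by catching the exception and displaying additional information about the error.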


from msrest.exceptions import HttpOperationError

url = "http://www.public-domain-photos.com/free-stock-photos-4/travel/san-francisco/golden-gate-bridge-in-san-francisco.jpg"
language = "en"
max_descriptions = 3

try:
    analysis = client.describe_image(url, max_descriptions, language)

    for caption in analysis.captions:
        print(caption.text)
        print(caption.confidence)
except HttpOperationError as e:
    # The SDK's error exception classes derive from msrest's HttpOperationError.
    if e.response.status_code == 401:
        print("Error unauthorized. Make sure your key and region are correct.")
    else:
        raise

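Handle transient errors with retries

While working with the ComputerVisionClient client, you might encounter transient failures caused by rate limits enforced by the service, or other transient problems such as network outages. For information about handling these types of failures, see the Retry pattern in the Cloud Design Patterns guide, and the related Circuit Breaker pattern.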
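
As an illustration of the retry pattern mentioned above, the following minimal sketch retries an operation with exponential backoff and a little random jitter. It is a generic helper, not something the SDK provides. The name call_with_retries, its parameters, and the default of four attempts are assumptions, and a stand-in function simulates the transient failure so the snippet runs offline.

```python
import random
import time


def call_with_retries(operation, is_transient, max_attempts=4, base_delay=1.0,
                      sleep=time.sleep):
    """Call operation(), retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as error:
            # Give up on the last attempt or on errors that retrying won't fix.
            if attempt == max_attempts or not is_transient(error):
                raise
            # Wait 1x, 2x, 4x... the base delay, plus up to 10% random jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))


# Usage with a stand-in operation that fails twice before succeeding.
attempts = []


def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("simulated network outage")
    return "ok"


result = call_with_retries(
    flaky,
    is_transient=lambda error: isinstance(error, ConnectionError),
    sleep=lambda seconds: None,  # skip real waiting in this demonstration
)
print(result)
```

With the SDK, operation could be a lambda that wraps a client call, and is_transient might check the HTTP status code of the caught error (for example, 429 for rate limiting). Which errors count as transient is a judgment call for your application.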

Next steps

More documentation

For more Computer Vision service documentation, see the Azure Computer Vision documentation on docs.microsoft.com.