Azure Cognitive Services Computer Vision SDK for Python

The Computer Vision service gives developers access to advanced algorithms that process images and return information. The algorithms analyze the content of an image in different ways, depending on the visual features you're interested in.

For more information about this service, see What is Computer Vision?

Prerequisites

If you don't have an Azure subscription

Create a free key, valid for 7 days, with the Try It experience for the Computer Vision service. When the key is created, copy the key and the endpoint name. You need both to create the client.

Keep the following after the key is created:

  • Key value: a 32-character string with the format xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  • Key endpoint: the base endpoint URL, https://westcentralus.api.cognitive.microsoft.com
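
Before storing these two values, a quick local check can catch copy-and-paste mistakes. The following is only a sketch: the helper name check_credentials is hypothetical, it assumes the key is 32 alphanumeric characters as described above, and it never contacts the service, so passing the check does not prove the key is valid.

```python
from urllib.parse import urlparse


def check_credentials(key, endpoint):
    """Return a list of problems found; an empty list means the values look plausible."""
    problems = []
    # Assumption: keys are 32 alphanumeric characters, as described above.
    if len(key) != 32 or not key.isalnum():
        problems.append("key should be a 32 character alphanumeric string")
    # The endpoint is expected to be a base URL that starts with https://
    parsed = urlparse(endpoint)
    if parsed.scheme != "https" or not parsed.netloc:
        problems.append("endpoint should be an https:// base URL")
    return problems


# A placeholder key of the right length and the endpoint listed above
print(check_credentials("x" * 32, "https://westcentralus.api.cognitive.microsoft.com"))
# A key that is too short and an endpoint without a scheme
print(check_credentials("too-short", "westcentralus.api.cognitive.microsoft.com"))
```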

If you have an Azure subscription

The easiest way to create a resource in your subscription is the following Azure CLI command. It creates a Cognitive Services key that can be used across many cognitive services. Choose an existing resource group name, for example "my-cogserv-group", and a new Computer Vision resource name, for example "my-computer-vision-resource".

RES_REGION=westeurope
RES_GROUP=<resourcegroup-name>
ACCT_NAME=<computervision-account-name>

az cognitiveservices account create \
    --resource-group $RES_GROUP \
    --name $ACCT_NAME \
    --location $RES_REGION \
    --kind CognitiveServices \
    --sku S0 \
    --yes

Install the SDK

Install the Azure Cognitive Services Computer Vision SDK for Python package with pip:

pip install azure-cognitiveservices-vision-computervision

Authentication

Once you create your Computer Vision resource, you need its endpoint and one of its account keys to instantiate the client object.

Use these values when you create the instance of the ComputerVisionClient client object.

For example, use the Bash terminal to set the environment variables:

export ACCOUNT_ENDPOINT=<computervision-endpoint-url>
export ACCOUNT_KEY=<computervision-account-key>

For Azure subscription users: get the key and endpoint

If you don't remember your endpoint and key, you can use the following method to find them. If you still need to create a key and endpoint, use the method described earlier for Azure subscription holders or the one for users without an Azure subscription.

Use the Azure CLI snippet below to populate two environment variables with the Computer Vision account endpoint and one of its keys (you can also find these values in the Azure portal). The snippet is formatted for the Bash shell.

RES_GROUP=<resourcegroup-name>
ACCT_NAME=<computervision-account-name>

export ACCOUNT_ENDPOINT=$(az cognitiveservices account show \
    --resource-group $RES_GROUP \
    --name $ACCT_NAME \
    --query endpoint \
    --output tsv)

export ACCOUNT_KEY=$(az cognitiveservices account keys list \
    --resource-group $RES_GROUP \
    --name $ACCT_NAME \
    --query key1 \
    --output tsv)

Create client

Get the endpoint and key from environment variables, then create the ComputerVisionClient client object.

from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import VisualFeatureTypes
from msrest.authentication import CognitiveServicesCredentials

# Get endpoint and key from environment variables
import os
endpoint = os.environ['ACCOUNT_ENDPOINT']
key = os.environ['ACCOUNT_KEY']

# Set credentials
credentials = CognitiveServicesCredentials(key)

# Create client
client = ComputerVisionClient(endpoint, credentials)
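
If either environment variable is missing, the lookups above fail with a bare KeyError. A minimal sketch of a friendlier check follows; the helper name read_required_env is hypothetical, and the demonstration uses a stand-in dictionary rather than the real environment.

```python
import os


def read_required_env(names, environ=None):
    """Return {name: value} for each name, or raise one error listing every missing name."""
    environ = os.environ if environ is None else environ
    # Treat unset and empty variables the same way
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}


# Demonstration with a stand-in mapping in which ACCOUNT_KEY is not set
fake_env = {"ACCOUNT_ENDPOINT": "https://westcentralus.api.cognitive.microsoft.com"}
try:
    read_required_env(["ACCOUNT_ENDPOINT", "ACCOUNT_KEY"], fake_env)
except RuntimeError as err:
    print(err)
```

Called without the second argument, the helper reads from os.environ, so its result could feed the endpoint and key values used above.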

Examples

You need a ComputerVisionClient client object before using any of the following tasks.

Analyze an image

You can analyze an image for certain features with analyze_image. Use the visual_features parameter to set the types of analysis to perform on the image. Common values are VisualFeatureTypes.tags and VisualFeatureTypes.description.

url = "https://upload.wikimedia.org/wikipedia/commons/thumb/1/12/Broadway_and_Times_Square_by_night.jpg/450px-Broadway_and_Times_Square_by_night.jpg"

image_analysis = client.analyze_image(url, visual_features=[VisualFeatureTypes.tags])

for tag in image_analysis.tags:
    print(tag)

Get subject domain list

Review the subject domains used to analyze your image with list_models. These domain names are used when analyzing an image by domain. An example of a domain is landmarks.

models = client.list_models()

for x in models.models_property:
    print(x)

Analyze an image by domain

You can analyze an image by subject domain with analyze_image_by_domain. Get the list of supported subject domains in order to use the correct domain name.

# type of prediction
domain = "landmarks"

# Public domain image of Eiffel tower
url = "https://images.pexels.com/photos/338515/pexels-photo-338515.jpeg"

# English language response
language = "en"

analysis = client.analyze_image_by_domain(domain, url, language)

for landmark in analysis.result["landmarks"]:
    print(landmark["name"])
    print(landmark["confidence"])
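
Because the loop above reads analysis.result as a dictionary of landmark entries, low-confidence matches can be filtered out before printing. The sketch below assumes that shape (a "landmarks" list whose items have "name" and "confidence" keys). The helper name confident_landmarks and the sample values are made up for illustration and are not real service output.

```python
def confident_landmarks(result, min_confidence=0.5):
    """Return (name, confidence) pairs at or above min_confidence, best match first."""
    landmarks = result.get("landmarks", [])
    ranked = sorted(landmarks, key=lambda item: item["confidence"], reverse=True)
    return [(item["name"], item["confidence"])
            for item in ranked
            if item["confidence"] >= min_confidence]


# Made-up sample in the same shape as analysis.result
sample = {
    "landmarks": [
        {"name": "Tokyo Tower", "confidence": 0.12},
        {"name": "Eiffel Tower", "confidence": 0.97},
    ]
}
print(confident_landmarks(sample))
```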

Get text description of an image

You can get a language-based text description of an image with describe_image. Request several descriptions with the max_candidates parameter (set from max_descriptions in the following code) if you are doing text analysis for keywords associated with the image. Examples of a text description for the following image include "a train crossing a bridge over a body of water", "a large bridge over a body of water", and "a train crossing a bridge over a large body of water".

url = "http://www.public-domain-photos.com/free-stock-photos-4/travel/san-francisco/golden-gate-bridge-in-san-francisco.jpg"
language = "en"
max_descriptions = 3

analysis = client.describe_image(url, max_descriptions, language)

for caption in analysis.captions:
    print(caption.text)
    print(caption.confidence)

Get text from image

You can get any handwritten or printed text from an image. This requires two calls to the SDK: batch_read_file and get_read_operation_result. The call to batch_read_file is asynchronous. In the results of the get_read_operation_result call, use TextOperationStatusCodes to check that the first call has completed before extracting the text data. The results include the text as well as the bounding box coordinates for the text.

# import models
from azure.cognitiveservices.vision.computervision.models import TextRecognitionMode
from azure.cognitiveservices.vision.computervision.models import TextOperationStatusCodes
import time

url = "https://azurecomcdn.azureedge.net/cvt-1979217d3d0d31c5c87cbd991bccfee2d184b55eeb4081200012bdaf6a65601a/images/shared/cognitive-services-demos/read-text/read-1-thumbnail.png"
mode = TextRecognitionMode.handwritten
raw = True
custom_headers = None
numberOfCharsInOperationId = 36

# Async SDK call
rawHttpResponse = client.batch_read_file(url, mode, custom_headers, raw)

# Get ID from returned headers
operationLocation = rawHttpResponse.headers["Operation-Location"]
idLocation = len(operationLocation) - numberOfCharsInOperationId
operationId = operationLocation[idLocation:]

# SDK call
while True:
    result = client.get_read_operation_result(operationId)
    if result.status not in ['NotStarted', 'Running']:
        break
    time.sleep(1)

# Get data
if result.status == TextOperationStatusCodes.succeeded:
    for textResult in result.recognition_results:
        for line in textResult.lines:
            print(line.text)
            print(line.bounding_box)
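
The example above slices a fixed 36 characters off the header and polls without an upper bound. A slightly more defensive variant is sketched below, with the caveat that both helper names (operation_id_from_location, poll_until_done) are hypothetical, the URL is a placeholder, and the service is replaced by a stand-in iterator so the snippet runs on its own.

```python
import time
from urllib.parse import urlparse


def operation_id_from_location(operation_location):
    """Return the last path segment of an Operation-Location URL."""
    path = urlparse(operation_location).path
    return path.rstrip("/").rsplit("/", 1)[-1]


def poll_until_done(fetch_status, pending=("NotStarted", "Running"),
                    interval=1.0, timeout=30.0, sleep=time.sleep):
    """Call fetch_status() until it returns a non-pending status or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status not in pending:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("operation still pending after %.0f seconds" % timeout)
        sleep(interval)


# Placeholder URL; only its last path segment matters here
print(operation_id_from_location(
    "https://example.invalid/vision/read/operations/49a36324-fc4b-4387-aa06-090cfbf0064f"))

# Stand-in for the service: reports NotStarted, then Running, then Succeeded
statuses = iter(["NotStarted", "Running", "Succeeded"])
print(poll_until_done(lambda: next(statuses), interval=0))
```

In real use, fetch_status could wrap the get_read_operation_result call and return its status. The timeout keeps a stalled operation from blocking the script indefinitely.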

Generate thumbnail

You can generate a thumbnail (JPG) of an image with generate_thumbnail. The thumbnail does not need to be in the same proportions as the original image.

Install Pillow to use this example:

pip install Pillow

Once Pillow is installed, use the package in the following code example to generate the thumbnail image.

# Pillow package
from PIL import Image

# IO package to create local image
import io

width = 50
height = 50
url = "http://www.public-domain-photos.com/free-stock-photos-4/travel/san-francisco/golden-gate-bridge-in-san-francisco.jpg"

thumbnail = client.generate_thumbnail(width, height, url)

# generate_thumbnail streams the image as chunks of bytes; join them before decoding
image = Image.open(io.BytesIO(b"".join(thumbnail)))

image.save('thumbnail.jpg')

Troubleshooting

General

When you interact with the ComputerVisionClient client object using the Python SDK, errors are raised as ComputerVisionErrorException. Errors returned by the service correspond to the same HTTP status codes returned for REST API requests.

For example, if you try to analyze an image with an invalid key, a 401 error is returned. In the following snippet, the error is handled gracefully by catching the exception and displaying additional information about the error.


from azure.cognitiveservices.vision.computervision.models import ComputerVisionErrorException

url = "http://www.public-domain-photos.com/free-stock-photos-4/travel/san-francisco/golden-gate-bridge-in-san-francisco.jpg"
language = "en"
max_descriptions = 3

try:
    analysis = client.describe_image(url, max_descriptions, language)

    for caption in analysis.captions:
        print(caption.text)
        print(caption.confidence)
except ComputerVisionErrorException as e:
    # The underlying HTTP response carries the status code
    if e.response.status_code == 401:
        print("Error unauthorized. Make sure your key and endpoint are correct.")
    else:
        raise

Handle transient errors with retries

While working with the ComputerVisionClient client, you might encounter transient failures caused by rate limits enforced by the service, or other transient problems like network outages. For information about handling these types of failures, see Retry pattern in the Cloud Design Patterns guide, and the related Circuit Breaker pattern.
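
As a rough illustration of the retry idea, the sketch below retries an operation with exponential backoff and a little jitter. It is not part of the SDK: call_with_retries and its parameters are hypothetical names, the decision about which errors count as transient is left to the caller, and a stand-in function plays the role of the service call.

```python
import random
import time


def call_with_retries(operation, is_transient, max_attempts=4,
                      base_delay=0.5, sleep=time.sleep):
    """Run operation(), retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as error:
            # Give up on the last attempt or when the error is not transient
            if attempt == max_attempts or not is_transient(error):
                raise
            # Double the delay on each attempt and add up to 10% random jitter
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 10))


# Stand-in operation: fails twice with a transient error, then succeeds
attempts = {"count": 0}


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary network problem")
    return "ok"


print(call_with_retries(flaky, lambda e: isinstance(e, ConnectionError), base_delay=0))
```

For the Computer Vision client, is_transient might inspect the HTTP status code of the raised exception. Which codes to treat as retryable is a design choice this sketch leaves open.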

Next steps