
What is Computer Vision?

Azure's Computer Vision service gives developers access to advanced algorithms that process images and return information. To analyze an image, you can either upload the image or specify an image URL. The image processing algorithms can analyze content in several different ways, depending on the visual features you're interested in. For example, Computer Vision can determine whether an image contains adult or racy content, or it can find all of the human faces in an image.

You can use Computer Vision in your application either through a native SDK or by invoking the REST API directly. This page broadly covers what you can do with Computer Vision.
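As a rough illustration of the direct REST route, the sketch below builds (but does not send) an Analyze Image request for an image URL. The route `/vision/v2.0/analyze`, the API version, the placeholder endpoint, and the helper name `build_analyze_request` are assumptions made for this example; check the API reference for the values that apply to your resource.

```python
import json
import urllib.parse
import urllib.request


def build_analyze_request(endpoint, key, image_url, features):
    """Build a POST request for the Analyze Image operation without sending it."""
    # Assumed route and API version; these are not stated on this page.
    query = urllib.parse.urlencode({"visualFeatures": ",".join(features)})
    url = f"{endpoint.rstrip('/')}/vision/v2.0/analyze?{query}"
    # The image is referenced by URL, so the body is a small JSON document.
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_analyze_request(
    "https://example.cognitiveservices.azure.com",  # placeholder endpoint
    "<your-subscription-key>",                      # placeholder key
    "https://example.com/image.jpg",                # placeholder image URL
    ["Tags", "Description"],
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. To upload an image instead of pointing at a URL, the body would carry the raw image bytes, typically with a binary content type such as `application/octet-stream`.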

Analyze images for insight

You can analyze images to detect their visual features and characteristics and gain insight from them. All of the features in the table below are provided by the Analyze Image API.

Action | Description
Tag visual features | Identify and tag visual features in an image, from a set of thousands of recognizable objects, living things, scenery, and actions. When the tags are ambiguous or not common knowledge, the API response provides 'hints' to clarify the meaning of the tag in the context of a known setting. Tagging isn't limited to the main subject, such as a person in the foreground, but also includes the setting (indoor or outdoor), furniture, tools, plants, animals, accessories, gadgets, and so on.
Detect objects | Object detection is similar to tagging, but the API returns the bounding box coordinates for each tag applied. For example, if an image contains a dog, a cat, and a person, the Detect operation lists those objects together with their coordinates in the image. You can use this functionality to further process the relationships between the objects in an image. It also lets you know when there are multiple instances of the same tag in an image.
Detect brands | Identify commercial brands in images or videos from a database of thousands of global logos. You can use this feature, for example, to discover which brands are most popular on social media or most prevalent in media product placement.
Categorize an image | Identify and categorize an entire image, using a category taxonomy with parent/child hereditary hierarchies. Categories can be used alone, or with our new tagging models. Note: Currently, English is the only supported language for tagging and categorizing images.
Describe an image | Generate a description of an entire image in human-readable language, using complete sentences. Computer Vision's algorithms generate various descriptions based on the objects identified in the image. Each description is evaluated and assigned a confidence score. A list is then returned, ordered from highest confidence score to lowest.
Detect faces | Detect faces in an image and provide information about each detected face. Computer Vision returns the coordinates, rectangle, gender, and age for each detected face. Note: Computer Vision provides a subset of the functionality found in the Face service; use Face for more detailed analysis, such as facial identification and pose detection.
Detect image types | Detect characteristics of an image, such as whether an image is a line drawing or how likely it is to be clip art.
Detect domain-specific content | Use domain models to detect and identify domain-specific content in an image, such as celebrities and landmarks. For example, if an image contains people, Computer Vision can use a domain model for celebrities included with the service to determine whether the people detected in the image match known celebrities.
Detect the color scheme | Analyze color usage within an image. Computer Vision can determine whether an image is black and white or color and, for color images, identify the dominant and accent colors.
Generate a thumbnail | Analyze the contents of an image to generate an appropriate thumbnail for that image. Computer Vision first generates a high-quality thumbnail and then analyzes the objects within the image to determine the area of interest. Computer Vision then crops the image to fit the requirements of the area of interest. The generated thumbnail can be presented using an aspect ratio that is different from the aspect ratio of the original image, depending on your needs.
Get the area of interest | Analyze the contents of an image to return the coordinates of the area of interest. This is the same function that is used to generate a thumbnail, but instead of cropping the image, Computer Vision returns the bounding box coordinates of the region, so the calling application can modify the original image as desired.
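To make the object detection and description rows more concrete, here is a minimal sketch of how a calling application might post-process an analysis result. The response shape (`description.captions`, `objects`, `rectangle` with `x`, `y`, `w`, `h`) and the sample values are assumptions for illustration, not an authoritative schema, and the helper names are hypothetical.

```python
from collections import Counter

# Hypothetical response fragment; field names and values are assumed.
sample_response = {
    "description": {
        "captions": [
            {"text": "a dog and a cat on a sofa", "confidence": 0.71},
            {"text": "a dog sitting on a sofa", "confidence": 0.88},
        ]
    },
    "objects": [
        {"object": "dog", "confidence": 0.90,
         "rectangle": {"x": 10, "y": 20, "w": 100, "h": 80}},
        {"object": "dog", "confidence": 0.75,
         "rectangle": {"x": 150, "y": 30, "w": 90, "h": 70}},
        {"object": "cat", "confidence": 0.82,
         "rectangle": {"x": 260, "y": 40, "w": 60, "h": 50}},
    ],
}


def best_caption(response):
    """Return the caption text with the highest confidence, or None."""
    captions = response.get("description", {}).get("captions", [])
    if not captions:
        return None
    return max(captions, key=lambda c: c["confidence"])["text"]


def count_objects(response):
    """Count how many instances of each object label were detected."""
    return Counter(obj["object"] for obj in response.get("objects", []))


def boxes_for(response, label):
    """List (x, y, w, h) bounding boxes for every instance of a label."""
    return [
        (o["rectangle"]["x"], o["rectangle"]["y"],
         o["rectangle"]["w"], o["rectangle"]["h"])
        for o in response.get("objects", [])
        if o["object"] == label
    ]


print(best_caption(sample_response))
print(count_objects(sample_response))
print(boxes_for(sample_response, "dog"))
```

Selecting the caption with `max` rather than taking the first entry keeps the helper correct even if a list arrives unsorted.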

Extract text from images

You can use Computer Vision to extract text from an image into a machine-readable character stream using optical character recognition (OCR). If needed, OCR corrects the rotation of the recognized text and provides the frame coordinates of each word. OCR supports 25 languages and automatically detects the language of the recognized text.
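The following sketch shows one way to turn an OCR result into plain lines of text while keeping each word's frame coordinates. The nested `regions` / `lines` / `words` layout and the `"x,y,width,height"` string format for `boundingBox` are assumptions made for this example; verify them against the actual response before relying on them.

```python
# Hypothetical OCR result; the structure and values are assumed.
sample_ocr = {
    "language": "en",
    "regions": [
        {
            "lines": [
                {"words": [
                    {"boundingBox": "28,16,120,41", "text": "Hello"},
                    {"boundingBox": "160,16,128,41", "text": "world"},
                ]},
                {"words": [
                    {"boundingBox": "28,70,90,40", "text": "Bye"},
                ]},
            ]
        }
    ],
}


def parse_box(box):
    """Convert an assumed "x,y,width,height" string into a tuple of ints."""
    return tuple(int(part) for part in box.split(","))


def extract_lines(ocr_result):
    """Flatten regions and lines into a list of text lines with word boxes."""
    lines = []
    for region in ocr_result.get("regions", []):
        for line in region.get("lines", []):
            words = line.get("words", [])
            lines.append({
                # Joining with spaces suits space-delimited languages only.
                "text": " ".join(word["text"] for word in words),
                "word_boxes": [parse_box(word["boundingBox"]) for word in words],
            })
    return lines


for line in extract_lines(sample_ocr):
    print(line["text"], line["word_boxes"])
```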

You can also use the Read API to extract both printed and handwritten text from images and text-heavy documents. The Read API uses updated models and works for a variety of objects with different surfaces and backgrounds, such as receipts, posters, business cards, letters, and whiteboards. Currently, the Read API is in preview, and English is the only supported language.

Moderate content in images

You can use Computer Vision to detect adult and racy content in an image and return a confidence score for each. The filter for adult and racy content detection can be set on a sliding scale to accommodate your preferences.
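One way to picture the sliding scale is as a pair of thresholds that your application applies to the returned confidence scores. In the sketch below, the field names `adultScore` and `racyScore`, the default thresholds of 0.5, and the helper name `moderate` are all assumptions for illustration.

```python
def moderate(scores, adult_threshold=0.5, racy_threshold=0.5):
    """Flag content whose confidence score meets or exceeds each threshold.

    `scores` is assumed to hold floats in [0, 1] under the keys
    "adultScore" and "racyScore".
    """
    return {
        "adult": scores["adultScore"] >= adult_threshold,
        "racy": scores["racyScore"] >= racy_threshold,
    }


sample_scores = {"adultScore": 0.2, "racyScore": 0.7}  # hypothetical values

# Default thresholds flag the image as racy but not adult.
print(moderate(sample_scores))
# A more permissive racy threshold lets the same image through.
print(moderate(sample_scores, racy_threshold=0.8))
```

Lowering a threshold makes the filter stricter, and raising it makes the filter more permissive.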

Use containers

Use Computer Vision containers to recognize printed and handwritten text locally by installing a standardized Docker container closer to your data.

Image requirements

Computer Vision can analyze images that meet the following requirements:

  • The image must be presented in JPEG, PNG, GIF, or BMP format
  • The file size of the image must be less than 4 megabytes (MB)
  • The dimensions of the image must be greater than 50 x 50 pixels
    • For OCR, the dimensions of the image must be between 50 x 50 and 4200 x 4200 pixels
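The requirements above can be checked on the client before an image is submitted. The sketch below is one possible pre-flight check; it assumes that 1 MB means 1,048,576 bytes, that the OCR range is inclusive and replaces the general dimension rule, and that the caller already knows the image's format, size, and dimensions. The helper name `check_image` is hypothetical.

```python
ALLOWED_FORMATS = {"JPEG", "PNG", "GIF", "BMP"}
MAX_BYTES = 4 * 1024 * 1024  # assumption: 1 MB = 1,048,576 bytes


def check_image(fmt, size_bytes, width, height, for_ocr=False):
    """Return a list of requirement violations; an empty list means OK."""
    problems = []
    if fmt.upper() not in ALLOWED_FORMATS:
        problems.append(f"unsupported format: {fmt}")
    if size_bytes >= MAX_BYTES:
        problems.append("file size must be less than 4 MB")
    if for_ocr:
        # Assumed inclusive range for OCR: 50 x 50 up to 4200 x 4200.
        if not (50 <= width <= 4200 and 50 <= height <= 4200):
            problems.append("OCR dimensions must be between 50 x 50 and 4200 x 4200")
    elif width <= 50 or height <= 50:
        problems.append("dimensions must be greater than 50 x 50")
    return problems


print(check_image("png", 120_000, 640, 480))
print(check_image("tiff", 5 * 1024 * 1024, 50, 50))
print(check_image("JPEG", 120_000, 5000, 100, for_ocr=True))
```

Returning a list of problems rather than a single boolean lets the caller report every violation at once.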

Data privacy and security

As with all of the Cognitive Services, developers using the Computer Vision service should be aware of Microsoft's policies on customer data. See the Cognitive Services page on the Microsoft Trust Center to learn more.

Next steps

Get started with Computer Vision by following a quickstart guide: