What is Computer Vision?

Azure's Computer Vision service gives developers access to advanced algorithms that process images and return information. To analyze an image, you can either upload the image or specify an image URL. The image-processing algorithms can analyze content in several different ways, depending on the visual features you're interested in. For example, Computer Vision can determine whether an image contains adult or racy content, or find all of the human faces in an image.

You can use Computer Vision in your application either through a native SDK or by invoking the REST API directly. This page broadly covers what you can do with Computer Vision.
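As a rough illustration of the REST route, the sketch below builds (but does not send) a request that asks for analysis of an image by URL. The `/vision/v3.2/analyze` path, the `visualFeatures` query parameter, and the `Ocp-Apim-Subscription-Key` header are assumptions based on one published version of the API and are not stated on this page; the endpoint, key, and function name are placeholders.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_analyze_request(endpoint, key, image_url, features):
    """Build a POST request for image analysis without sending it.

    Assumptions (not from this page): the API version in the path, the
    visualFeatures parameter, and the subscription-key header name.
    """
    # Keep commas unescaped so the feature list stays readable in the URL.
    query = urlencode({"visualFeatures": ",".join(features)}, safe=",")
    url = f"{endpoint.rstrip('/')}/vision/v3.2/analyze?{query}"
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Placeholder endpoint and key; replace with the values for your own resource.
request = build_analyze_request(
    "https://example.cognitiveservices.azure.com/",
    "YOUR_KEY",
    "https://example.com/photo.jpg",
    ["Tags", "Faces"],
)
```

Passing the returned object to `urllib.request.urlopen` would send it. To upload image bytes instead of a URL, the body would carry the raw file and the content type would change accordingly.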

Analyze images for insight

You can analyze images to detect and provide insights about their visual features and characteristics. All of the features in the table below are provided by the Analyze Image API.

Action | Description
Tag visual features | Identify and tag visual features in an image, from a set of thousands of recognizable objects, living things, scenery, and actions. When the tags are ambiguous or not common knowledge, the API response provides hints to clarify the context of the tag. Tagging isn't limited to the main subject, such as a person in the foreground, but also includes the setting (indoor or outdoor), furniture, tools, plants, animals, accessories, gadgets, and so on.
Detect objects | Object detection is similar to tagging, but the API returns the bounding box coordinates for each tag applied. For example, if an image contains a dog, a cat, and a person, the Detect operation lists those objects together with their coordinates in the image. You can use this functionality to process further relationships between the objects in an image. It also lets you know when there are multiple instances of the same tag in an image.
Detect brands | Identify commercial brands in images or videos from a database of thousands of global logos. You can use this feature, for example, to discover which brands are most popular on social media or most prevalent in media product placement.
Categorize an image | Identify and categorize an entire image, using a category taxonomy with parent/child hereditary hierarchies. Categories can be used alone or with our new tagging models. Note: Currently, English is the only supported language for tagging and categorizing images.
Describe an image | Generate a description of an entire image in human-readable language, using complete sentences. Computer Vision's algorithms generate various descriptions based on the objects identified in the image. Each description is evaluated and assigned a confidence score. A list is then returned, ordered from highest confidence score to lowest.
Detect faces | Detect faces in an image and provide information about each detected face. Computer Vision returns the coordinates, rectangle, gender, and age for each detected face. Note: Computer Vision provides a subset of the Face service functionality. You can use the Face service for more detailed analysis, such as facial identification and pose detection.
Detect image types | Detect characteristics of an image, such as whether it is a line drawing, or the likelihood that it is clip art.
Detect domain-specific content | Use domain models to detect and identify domain-specific content in an image, such as celebrities and landmarks. For example, if an image contains people, Computer Vision can use a domain model for celebrities to determine whether the people detected in the image are known celebrities.
Detect the color scheme | Analyze color usage within an image. Computer Vision can determine whether an image is black and white or color and, for color images, identify the dominant and accent colors.
Generate a thumbnail | Analyze the contents of an image to generate an appropriate thumbnail for that image. Computer Vision first generates a high-quality thumbnail and then analyzes the objects within the image to determine the area of interest. Computer Vision then crops the image to fit the requirements of the area of interest. The generated thumbnail can be presented using an aspect ratio that is different from the aspect ratio of the original image, depending on your needs.
Get the area of interest | Analyze the contents of an image to return the coordinates of the area of interest. Instead of cropping the image and generating a thumbnail, Computer Vision returns the bounding box coordinates of the region, so the calling application can modify the original image as desired.
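To make the object-detection row more concrete, here is a minimal sketch that counts how many instances of each tag appear in a detection result. The response shape (an `objects` list whose entries carry `object`, `confidence`, and a `rectangle` with `x`, `y`, `w`, `h`) is an assumption for illustration, and the sample data is invented; check the API reference for the exact fields your version returns.

```python
from collections import Counter


def count_objects(detect_response, min_confidence=0.5):
    """Count detected objects per tag, ignoring low-confidence detections."""
    counts = Counter()
    for item in detect_response.get("objects", []):
        if item.get("confidence", 0.0) >= min_confidence:
            counts[item["object"]] += 1
    return counts


# Hypothetical response, shaped as assumed above.
sample_detection = {
    "objects": [
        {"rectangle": {"x": 10, "y": 20, "w": 100, "h": 80}, "object": "dog", "confidence": 0.92},
        {"rectangle": {"x": 150, "y": 30, "w": 90, "h": 85}, "object": "dog", "confidence": 0.81},
        {"rectangle": {"x": 300, "y": 5, "w": 60, "h": 200}, "object": "person", "confidence": 0.77},
        {"rectangle": {"x": 0, "y": 0, "w": 20, "h": 20}, "object": "cat", "confidence": 0.30},
    ]
}

object_counts = count_objects(sample_detection)
```

With the sample above, `object_counts` reports two dogs and one person; the low-confidence cat is dropped by the default threshold.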

Extract text from images

You can use the Computer Vision Read API to extract printed and handwritten text from images into a machine-readable character stream. The Read API uses our latest models and works with text on a variety of surfaces and backgrounds, such as receipts, posters, business cards, letters, and whiteboards. Currently, English is the only supported language.

You can also use the optical character recognition (OCR) API to extract printed text in several languages. If needed, OCR corrects the rotation of the recognized text and provides the frame coordinates of each word. OCR supports 25 languages and automatically detects the language of the recognized text.
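The per-word frame coordinates can be pulled out of an OCR result with a few lines of code. This sketch assumes a response made of nested `regions`, `lines`, and `words`, where each word has a `text` field and a `boundingBox` string of the form "x,y,width,height"; that shape and the sample data are assumptions for illustration, not details given on this page.

```python
def ocr_words(ocr_response):
    """Return (text, (x, y, width, height)) for every recognized word."""
    words = []
    for region in ocr_response.get("regions", []):
        for line in region.get("lines", []):
            for word in line.get("words", []):
                # Assumed format: "x,y,width,height" as a comma-separated string.
                box = tuple(int(value) for value in word["boundingBox"].split(","))
                words.append((word["text"], box))
    return words


# Hypothetical response, shaped as assumed above.
sample_ocr = {
    "language": "en",
    "regions": [
        {
            "boundingBox": "20,30,200,60",
            "lines": [
                {
                    "boundingBox": "20,30,200,25",
                    "words": [
                        {"boundingBox": "20,30,90,25", "text": "Total"},
                        {"boundingBox": "120,30,100,25", "text": "$12.50"},
                    ],
                }
            ],
        }
    ],
}

recognized = ocr_words(sample_ocr)
detected_language = sample_ocr.get("language")
```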

Moderate content in images

You can use Computer Vision to detect adult and racy content in an image and return a confidence score for each. You can set the filter for adult and racy content detection on a sliding scale to accommodate your preferences.
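One way to read the "sliding scale" is as a pair of thresholds that your application applies to the returned confidence scores. The sketch below takes that approach; the field names `adultScore` and `racyScore`, the default threshold of 0.5, and the sample scores are assumptions chosen for illustration.

```python
def moderation_flags(scores, adult_threshold=0.5, racy_threshold=0.5):
    """Flag content whose confidence score meets or exceeds each threshold.

    Lower thresholds make the filter stricter; higher ones make it more lenient.
    """
    return {
        "adult": scores["adultScore"] >= adult_threshold,
        "racy": scores["racyScore"] >= racy_threshold,
    }


# Hypothetical scores for a single image.
sample_scores = {"adultScore": 0.12, "racyScore": 0.64}

default_flags = moderation_flags(sample_scores)
lenient_flags = moderation_flags(sample_scores, racy_threshold=0.8)
```

Here the default thresholds flag the image as racy but not adult, while raising the racy threshold to 0.8 lets the same image through.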

Use containers

Use Computer Vision containers to recognize printed and handwritten text locally by installing a standardized Docker container closer to your data.

Image requirements

Computer Vision can analyze images that meet the following requirements:

  • The image must be in JPEG, PNG, GIF, or BMP format
  • The file size of the image must be less than 4 megabytes (MB)
  • The dimensions of the image must be greater than 50 x 50 pixels
    • For OCR, the dimensions of the image must be between 50 x 50 and 4200 x 4200 pixels
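The requirements above can be checked on the client before an image is sent. The following is a minimal sketch, not an official validator. It assumes 1 MB means 1024 x 1024 bytes, treats "greater than 50 x 50" as strict on both sides, and treats the OCR range as inclusive and as replacing the general dimension rule; the function name and parameters are hypothetical.

```python
ALLOWED_FORMATS = {"JPEG", "PNG", "GIF", "BMP"}
MAX_FILE_BYTES = 4 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def check_image(fmt, size_bytes, width, height, for_ocr=False):
    """Return a list of requirement violations; an empty list means the image passes."""
    problems = []
    if fmt.upper() not in ALLOWED_FORMATS:
        problems.append(f"unsupported format: {fmt}")
    if size_bytes >= MAX_FILE_BYTES:
        problems.append("file size must be less than 4 MB")
    if for_ocr:
        # Assumption: the OCR range is inclusive at both ends.
        if not (50 <= width <= 4200 and 50 <= height <= 4200):
            problems.append("OCR dimensions must be between 50 x 50 and 4200 x 4200 pixels")
    elif width <= 50 or height <= 50:
        # Assumption: both sides must be strictly greater than 50 pixels.
        problems.append("dimensions must be greater than 50 x 50 pixels")
    return problems


ok_result = check_image("png", 1_000_000, 640, 480)
bad_result = check_image("tiff", 5 * 1024 * 1024, 50, 50)
```

`ok_result` is an empty list, while `bad_result` reports the format, the file size, and the dimensions. Reading the width, height, and format from an actual file would need an imaging library, which this sketch leaves out.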

Data privacy and security

As with all of the Cognitive Services, developers using the Computer Vision service should be aware of Microsoft's policies on customer data. See the Cognitive Services page on the Microsoft Trust Center to learn more.

Next steps

Get started with Computer Vision by following a quickstart guide: