What is the Text Analytics API?

The Text Analytics API is a cloud-based service that provides advanced natural language processing over raw text. It includes four main functions: sentiment analysis, key phrase extraction, language detection, and entity recognition.

The API is part of Azure Cognitive Services, a collection of machine learning and AI algorithms in the cloud for your development projects.

Text analysis can mean different things, but in Cognitive Services, the Text Analytics API provides the four types of analysis described below.

Sentiment Analysis

Use sentiment analysis to find out what customers think of your brand or topic by analyzing raw text for clues about positive or negative sentiment. The API returns a sentiment score between 0 and 1 for each document, where 1 is the most positive.
The analysis models are pretrained using an extensive body of text and natural language technologies from Microsoft. For selected languages, the API can analyze and score any raw text that you provide, returning results directly to the calling application. You can use the REST API or the .NET SDK.
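
As a rough sketch of how a calling application might consume the score, the function below maps a value in the 0 to 1 range to a coarse label. The function name `label_sentiment` and the cut-offs of 0.4 and 0.6 are illustrative assumptions; the API itself returns only the numeric score.

```python
def label_sentiment(score: float, low: float = 0.4, high: float = 0.6) -> str:
    """Map a 0..1 sentiment score to a coarse label.

    The cut-offs are hypothetical; choose values that suit your data.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    if score < low:
        return "negative"
    if score > high:
        return "positive"
    return "neutral"


for s in (0.05, 0.5, 0.97):
    print(s, label_sentiment(s))
```

Keeping the thresholds as parameters makes it easy to tune them once you have seen how scores are distributed for your own documents.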

Key Phrase Extraction

Automatically extract key phrases to quickly identify the main points. For example, for the input text "The food was delicious and there were wonderful staff", the API returns the main talking points: "food" and "wonderful staff". You can use the REST API or the .NET SDK.

Language Detection

You can detect which language the input text is written in. The API reports a single language code for every document submitted in the request, covering a wide range of languages, variants, dialects, and some regional/cultural languages. Each language code is paired with a score indicating the strength of the detection. You can use the REST API or the .NET SDK.

Named Entity Recognition

Identify and categorize entities in your text as people, places, organizations, date/time, quantities, percentages, currencies, and more. Well-known entities are also recognized and linked to more information on the web. You can use the REST API.

Use containers

Use the Text Analytics containers to extract key phrases, detect language, and analyze sentiment locally by installing standardized Docker containers closer to your data.

Typical workflow

The workflow is simple: you submit data for analysis and handle outputs in your code. Analyzers are consumed as-is, with no additional configuration or customization.

  1. Sign up for an access key. The key must be passed on each request.

  2. Formulate a request containing your data as raw unstructured text, in JSON.

  3. Post the request to the endpoint established during sign-up, appending the desired resource: sentiment analysis, key phrase extraction, language detection, or entity recognition.

  4. Stream or store the response locally. Depending on the request, results are a sentiment score, a collection of extracted key phrases, or a language code.

Output is returned as a single JSON document, with results for each text document you posted, matched by ID. You can then analyze, visualize, or categorize the results into actionable insights.
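
The workflow above can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not a definitive client: the endpoint URL shape and `v2.1` path, the `Ocp-Apim-Subscription-Key` header name, and the `documents`/`id`/`language`/`text` field names are not given in this article and should be checked against the API reference. `ENDPOINT`, `ACCESS_KEY`, `build_request`, and `results_by_id` are hypothetical names.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute the endpoint and key from your own sign-up.
ENDPOINT = "https://example-region.api.cognitive.microsoft.com/text/analytics/v2.1"
ACCESS_KEY = "your-access-key"


def build_request(resource: str, texts: list[str], language: str = "en") -> urllib.request.Request:
    """Build a POST request whose JSON body carries one document per text."""
    documents = [
        {"id": str(i), "language": language, "text": text}
        for i, text in enumerate(texts, start=1)
    ]
    body = json.dumps({"documents": documents}).encode("utf-8")
    return urllib.request.Request(
        url=f"{ENDPOINT}/{resource}",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed header name for passing the access key on each request.
            "Ocp-Apim-Subscription-Key": ACCESS_KEY,
        },
    )


def results_by_id(response_json: str) -> dict[str, dict]:
    """Index the per-document results of a response by document ID."""
    payload = json.loads(response_json)
    return {doc["id"]: doc for doc in payload.get("documents", [])}


request = build_request("sentiment", ["The food was delicious.", "Service was slow."])
print(request.full_url)
print(request.data.decode("utf-8"))

# To actually send it (requires a valid key and endpoint):
# with urllib.request.urlopen(request) as response:
#     results = results_by_id(response.read().decode("utf-8"))
```

Indexing the response by ID is what lets you join each result back to the text you submitted, since all results arrive in one JSON document.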

Data is not stored in your account. Operations performed by the Text Analytics API are stateless, which means the text you provide is processed and results are returned immediately.

Text Analytics for multiple programming experience levels

You can start using the Text Analytics API in your processes, even if you don't have much programming experience. Use these tutorials to learn how you can use the API to analyze text in different ways to fit your experience level.

Supported languages

This section has been moved to a separate article for better discoverability. Refer to Supported languages in Text Analytics API for this content.

Data limits

All of the Text Analytics API endpoints accept raw text data. The current limit is 5,120 characters for each document; if you need to analyze larger documents, you can break them up into smaller chunks. If you still require a higher limit, contact us so that we can discuss your requirements.

Limit                                      Value
Maximum size of a single document          5,120 characters as measured by StringInfo.LengthInTextElements
Maximum size of entire request             1 MB
Maximum number of documents in a request   1,000 documents

The rate limit is 100 requests per second and 1,000 requests per minute. You can submit a large quantity of documents in a single call (up to 1,000 documents).
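
The following sketch shows one way to stay within the per-document and per-request limits: split long text into chunks of at most 5,120 characters, then group the chunks into batches of at most 1,000 documents. The helper names `split_text` and `batch_documents` and the `id`/`text` document shape are assumptions for illustration. The sketch does not enforce the 1 MB request size, and Python's `len()` counts Unicode code points, which only approximates `StringInfo.LengthInTextElements`.

```python
MAX_CHARS_PER_DOCUMENT = 5120     # per-document limit from the table above
MAX_DOCUMENTS_PER_REQUEST = 1000  # per-request limit from the table above


def split_text(text: str, limit: int = MAX_CHARS_PER_DOCUMENT) -> list[str]:
    """Split text into consecutive chunks of at most `limit` characters.

    This cuts at fixed offsets; a production splitter would prefer to
    break on sentence or paragraph boundaries.
    """
    return [text[i:i + limit] for i in range(0, len(text), limit)]


def batch_documents(chunks: list[str], batch_size: int = MAX_DOCUMENTS_PER_REQUEST) -> list[list[dict]]:
    """Give each chunk an ID and group the documents into request-sized batches."""
    documents = [{"id": str(i), "text": chunk} for i, chunk in enumerate(chunks, start=1)]
    return [documents[i:i + batch_size] for i in range(0, len(documents), batch_size)]


long_text = "x" * 12000
chunks = split_text(long_text)
print([len(c) for c in chunks])      # [5120, 5120, 1760]
print(len(batch_documents(chunks)))  # 1
```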

Unicode encoding

The Text Analytics API uses Unicode encoding for text representation and character count calculations. Requests can be submitted in either UTF-8 or UTF-16 with no measurable difference in the character count. Unicode codepoints are used as the heuristic for character length and are considered equivalent for the purposes of text analytics data limits. If you use StringInfo.LengthInTextElements to get the character count, you are using the same method we use to measure data size.
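
To illustrate why the choice of encoding does not affect the character count, the short Python example below compares the code point count of a string with its byte length in each encoding. The sample string is arbitrary. Python's `len()` counts code points, which matches the heuristic described above but can differ from a text-element count when a string contains combining sequences.

```python
# Escapes are used so the exact code points are unambiguous:
# "Crème brûlée" followed by a space and U+1F600 (an emoji outside the BMP).
text = "Cr\u00e8me br\u00fbl\u00e9e \U0001F600"

code_points = len(text)                      # character count as code points
utf8_bytes = len(text.encode("utf-8"))       # byte size when sent as UTF-8
utf16_bytes = len(text.encode("utf-16-le"))  # byte size when sent as UTF-16

# Decoding either encoding yields the same string, hence the same count.
assert text.encode("utf-8").decode("utf-8") == text.encode("utf-16-le").decode("utf-16-le")

print(code_points, utf8_bytes, utf16_bytes)  # 14 20 30
```

The byte sizes differ between encodings, which matters for the 1 MB request limit, while the per-document character count stays the same.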

Next steps

  • Sign up for an access key and review the steps for calling the API.

  • The quickstart is a walkthrough of the REST API calls written in C#. Learn how to submit text, choose an analysis, and view results with minimal code. If you prefer, you can start with the Python quickstart instead.

  • Dig a little deeper with this sentiment analysis tutorial using Azure Databricks.

  • Check out our list of blog posts and videos on how to use the Text Analytics API with other tools and technologies on our External & Community Content page.