What is the Text Analytics API?

The Text Analytics API is a cloud-based service that provides advanced natural language processing over raw text. It includes four main functions: sentiment analysis, key phrase extraction, language detection, and entity recognition.

The API is part of Azure Cognitive Services, a collection of machine learning and AI algorithms in the cloud for your development projects.

Text analysis can mean different things, but in Cognitive Services, the Text Analytics API provides the four types of analysis described below. You can use these features with the REST API, or with a client library for .NET, Python, Node.js, Go, or Ruby.

Sentiment Analysis

Use sentiment analysis to find out what customers think of your brand or topic by analyzing raw text for clues about positive or negative sentiment. This API returns a sentiment score between 0 and 1 for each document, where 1 is the most positive.
The analysis models are pretrained using an extensive body of text and natural language technologies from Microsoft. For selected languages, the API can analyze and score any raw text that you provide, returning results directly to the calling application.
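
Because the score is a number between 0 and 1, calling code usually has to decide how to interpret it. The following is a minimal sketch of one way to bucket scores into coarse labels. The function name `label_sentiment`, the cut-off values 0.4 and 0.6, and the sample scores are all assumptions made for illustration; the API itself only returns the numeric score.

```python
def label_sentiment(score, negative_below=0.4, positive_above=0.6):
    """Map a 0..1 sentiment score to a coarse label.

    The cut-offs are illustrative assumptions, not values defined by the API.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    if score < negative_below:
        return "negative"
    if score > positive_above:
        return "positive"
    return "neutral"


# Hypothetical scores keyed by document ID (not real API output).
scores = {"1": 0.92, "2": 0.11, "3": 0.5}
labels = {doc_id: label_sentiment(score) for doc_id, score in scores.items()}
print(labels)
```

The thresholds are worth tuning against your own data, since a score near 0.5 may indicate either neutral or mixed sentiment.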

Key Phrase Extraction

Automatically extract key phrases to quickly identify the main points. For example, for the input text "The food was delicious and there were wonderful staff", the API returns the main talking points: "food" and "wonderful staff".

Language Detection

You can detect which language the input text is written in. The API reports a single language code for every document submitted in the request, covering a wide range of languages, variants, dialects, and some regional/cultural languages. Each language code is paired with a score indicating the confidence of the detection.

Named Entity Recognition

Identify and categorize entities in your text as people, places, organizations, date/time, quantities, percentages, currencies, and more. Well-known entities are also recognized and linked to more information on the web.

Use containers

Use the Text Analytics containers to extract key phrases, detect language, and analyze sentiment locally, by installing standardized Docker containers closer to your data.

Typical workflow

The workflow is simple: you submit data for analysis and handle outputs in your code. Analyzers are consumed as-is, with no additional configuration or customization.

  1. Create an Azure resource for Text Analytics. Afterwards, get the key generated for you to authenticate your requests.

  2. Formulate a request containing your data as raw unstructured text, in JSON.

  3. Post the request to the endpoint established during sign-up, appending the desired resource: sentiment analysis, key phrase extraction, language detection, or entity recognition.

  4. Stream or store the response locally. Depending on the request, results are a sentiment score, a collection of extracted key phrases, a language code, or a set of recognized entities.

Output is returned as a single JSON document, with results for each text document you posted, matched by ID. You can subsequently analyze, visualize, or categorize the results into actionable insights.
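
The workflow steps can be sketched in a few lines of Python using only the standard library. This is an illustration under stated assumptions rather than a reference implementation: the `ENDPOINT` path, the `Ocp-Apim-Subscription-Key` header name, the `language` field, and the shape of `sample_response` are assumptions not given in this article, and the helper names `build_body`, `post`, and `results_by_id` are hypothetical. Check the REST reference for your API version before relying on any of them.

```python
import json
import urllib.request

# Assumed placeholders: use the endpoint and key from your own Azure resource.
ENDPOINT = "https://<your-resource-endpoint>/text/analytics/v2.1"  # assumed path
API_KEY = "<your-key>"


def build_body(texts, language="en"):
    """Wrap raw strings in a JSON body, giving each document a string ID."""
    documents = [
        {"id": str(i), "language": language, "text": text}
        for i, text in enumerate(texts, start=1)
    ]
    return json.dumps({"documents": documents}).encode("utf-8")


def post(resource, body):
    """POST the body to ENDPOINT/<resource> and return the parsed JSON reply."""
    request = urllib.request.Request(
        f"{ENDPOINT}/{resource}",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": API_KEY,  # assumed header name
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def results_by_id(response):
    """Index the per-document results of a reply by document ID."""
    return {doc["id"]: doc for doc in response.get("documents", [])}


body = build_body(["The food was delicious and there were wonderful staff"])

# Hand-written reply in the assumed shape, so the sketch runs without a key.
sample_response = {"documents": [{"id": "1", "score": 0.95}], "errors": []}
print(results_by_id(sample_response)["1"]["score"])
```

With a real key and endpoint, `post("sentiment", body)` would take the place of `sample_response`. Indexing results by ID keeps each result tied to the document that produced it.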

Data is not stored in your account. Operations performed by the Text Analytics API are stateless, which means the text you provide is processed and results are returned immediately.

Text Analytics for multiple programming experience levels

You can start using the Text Analytics API in your processes even if you don't have much programming experience. Use these tutorials to learn how to analyze text with the API in ways that fit your experience level.

Supported languages

This section has been moved to a separate article for better discoverability. Refer to Supported languages in the Text Analytics API for this content.

Data limits

All of the Text Analytics API endpoints accept raw text data. The current limit is 5,120 characters for each document; if you need to analyze larger documents, you can break them up into smaller chunks.

| Limit | Value |
| --- | --- |
| Maximum size of a single document | 5,120 characters as measured by StringInfo.LengthInTextElements |
| Maximum size of entire request | 1 MB |
| Maximum number of documents in a request | 1,000 documents |
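
One way to stay within these limits is to split long text before submission and group the pieces into batches. The sketch below is an illustration, not an official helper: `split_text` and `batch_documents` are hypothetical names, and `len()` counts Unicode code points rather than text elements. Since a text element contains at least one code point, a chunk that fits by code points also fits by text elements. The 1 MB request limit is not checked here.

```python
MAX_CHARS = 5120      # per-document limit from the table above
MAX_DOCUMENTS = 1000  # per-request limit from the table above


def split_text(text, limit=MAX_CHARS):
    """Split text into pieces of at most `limit` characters.

    Cuts at the last space within the limit when there is one, so words
    stay whole; otherwise falls back to a hard cut at the limit.
    """
    chunks = []
    while len(text) > limit:
        cut = text.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no usable space: hard cut
        chunks.append(text[:cut])
        text = text[cut:].lstrip(" ")
    if text:
        chunks.append(text)
    return chunks


def batch_documents(chunks, size=MAX_DOCUMENTS):
    """Yield lists of at most `size` documents, with IDs unique across batches."""
    for start in range(0, len(chunks), size):
        yield [
            {"id": str(start + offset + 1), "text": chunk}
            for offset, chunk in enumerate(chunks[start:start + size])
        ]


text = "word " * 3000  # 15,000 characters of sample input
chunks = split_text(text)
batches = list(batch_documents(chunks))
print(len(chunks), max(len(chunk) for chunk in chunks), len(batches))
```

Splitting on spaces is a simple choice; for sentiment or key phrase analysis, splitting on sentence or paragraph boundaries may preserve more context within each chunk.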

Your rate limit will vary with your pricing tier.

| Tier | Requests per second | Requests per minute |
| --- | --- | --- |
| S / Multi-service | 1000 | 1000 |
| S0 / F0 | 100 | 300 |
| S1 | 200 | 300 |
| S2 | 300 | 300 |
| S3 | 500 | 500 |
| S4 | 1000 | 1000 |

Requests are measured for each Text Analytics feature separately. For example, you can send the maximum number of requests for your pricing tier to each feature, at the same time.
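
Because each feature is measured separately, a client can keep one limiter per feature. The following is a minimal client-side sketch of a rolling-window limiter; the `RateLimiter` class and the dictionary keys are hypothetical, and the numbers assume the S0 per-second limit from the table above. It is not thread-safe and does not replace handling of throttling responses from the service.

```python
import time
from collections import deque


class RateLimiter:
    """Allow at most `max_calls` calls in any rolling `period` seconds."""

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._calls = deque()  # timestamps of recent calls, oldest first

    def wait(self):
        """Block until another call is allowed, then record it."""
        now = self._clock()
        # Forget calls that have already left the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) >= self.max_calls:
            # Sleep until the oldest recorded call leaves the window.
            self._sleep(self.period - (now - self._calls[0]))
            now = self._clock()
            self._calls.popleft()
        self._calls.append(now)


# One limiter per feature, using the assumed S0 limit of 100 requests per second.
limiters = {
    "sentiment": RateLimiter(100, 1.0),
    "key_phrases": RateLimiter(100, 1.0),
}

for _ in range(3):
    limiters["sentiment"].wait()  # call before each sentiment request
print("3 sentiment requests allowed")
```

To respect the per-minute limit as well, a second limiter with a 60-second period could be checked before each request in the same way.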

Unicode encoding

The Text Analytics API uses Unicode encoding for text representation and character count calculations. Requests can be submitted in both UTF-8 and UTF-16 with no measurable differences in the character count. Unicode code points are used as the heuristic for character length and are considered equivalent for the purposes of text analytics data limits. If you use StringInfo.LengthInTextElements to get the character count, you are using the same method we use to measure data size.
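
To see why the unit of measurement matters, the sketch below counts the same string three ways in Python. The helper name `count_units` is hypothetical. Python's standard library has no direct equivalent of StringInfo.LengthInTextElements; `len()` on a `str` counts code points, which matches the heuristic described above but can exceed the text element count when a string contains combining sequences.

```python
def count_units(text):
    """Return the length of `text` measured in three different units."""
    return {
        "code_points": len(text),
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
        "utf8_bytes": len(text.encode("utf-8")),
    }


# "café" followed by a space and one emoji outside the Basic Multilingual Plane.
sample = "caf\u00e9 \U0001F600"
print(count_units(sample))
# {'code_points': 6, 'utf16_code_units': 7, 'utf8_bytes': 10}
```

The code point count stays the same regardless of the encoding used to submit the request, while byte and code unit counts do not.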

Next steps