com.azure.ai.textanalytics

Azure AI Language Service is a cloud-based natural language processing (NLP) service offered by Microsoft Azure. It's designed to extract valuable insights and information from text data through various NLP techniques. The service provides a range of capabilities for analyzing text, including sentiment analysis, entity recognition, key phrase extraction, language detection, and more. These capabilities can be leveraged to gain a deeper understanding of textual data, automate processes, and make informed decisions based on the analyzed content.

Here are some of the key features of Azure Text Analytics:

  • Sentiment Analysis: This feature determines the sentiment expressed in a piece of text, whether it's positive, negative, or neutral. It's useful for understanding the overall emotional tone of customer reviews, social media posts, and other text-based content.
  • Entity Recognition: Azure AI Language can identify and categorize entities mentioned in the text, such as people, organizations, locations, dates, and more. This is particularly useful for extracting structured information from unstructured text.
  • Key Phrase Extraction: The service can automatically identify and extract key phrases or important terms from a given text. This can help summarize the main topics or subjects discussed in the text.
  • Language Detection: Azure AI Language can detect the language in which the text is written. This is useful for routing content to appropriate language-specific processes or for organizing and categorizing multilingual data.
  • Named Entity Recognition: In addition to identifying entities, the service can categorize them into pre-defined types, such as person names, organization names, locations, dates, and more.
  • Entity Linking: This feature can link recognized entities to external databases or sources of information, enriching the extracted data with additional context.
  • Customizable Models: Azure AI Language allows you to fine-tune and train the service's models with your specific domain or industry terminology, which can enhance the accuracy of entity recognition and sentiment analysis.

The Azure Text Analytics library is a client library that provides Java developers with a simple and easy-to-use interface for accessing and using the Azure AI Language Service. The library can be used to analyze unstructured text for tasks such as sentiment analysis, entity recognition (PII, health, linked, and custom entities), key phrase extraction, language detection, abstractive and extractive summarization, single-label and multi-label classification, and to execute multiple actions/operations in a single request.

Getting Started

In order to interact with the Text Analytics features in Azure AI Language Service, you'll need to create an instance of the Text Analytics Client class. To do so, you'll need the service's key credential. Alternatively, you can use Azure Active Directory (AAD) authentication via Azure Identity to connect to the service, as shown in the samples below.

  1. Azure Key Credential, see AzureKeyCredential.
  2. Azure Active Directory, see TokenCredential.

Sample: Construct Synchronous Text Analytics Client with Azure Key Credential

The following code sample demonstrates the creation of a TextAnalyticsClient, using the TextAnalyticsClientBuilder to configure it with a key credential.

 TextAnalyticsClient textAnalyticsClient = new TextAnalyticsClientBuilder()
     .credential(new AzureKeyCredential("{key}"))
     .endpoint("{endpoint}")
     .buildClient();
 

Sample: Construct Asynchronous Text Analytics Client with Azure Key Credential

The following code sample demonstrates the creation of a TextAnalyticsAsyncClient, using the TextAnalyticsClientBuilder to configure it with a key credential.

 TextAnalyticsAsyncClient textAnalyticsAsyncClient = new TextAnalyticsClientBuilder()
     .credential(new AzureKeyCredential("{key}"))
     .endpoint("{endpoint}")
     .buildAsyncClient();
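
Sample: Construct Text Analytics Client with Azure Active Directory Credential

The following is a minimal sketch of building a TextAnalyticsClient with a TokenCredential; it assumes the azure-identity dependency is available and that DefaultAzureCredential is suitable for your environment (other TokenCredential implementations work the same way).

 TextAnalyticsClient textAnalyticsClient = new TextAnalyticsClientBuilder()
     // DefaultAzureCredential tries environment, managed identity, and developer tool credentials in turn.
     .credential(new DefaultAzureCredentialBuilder().build())
     .endpoint("{endpoint}")
     .buildClient();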
 

Note: See the methods in the client classes below to explore all the features this library provides.



Extract information

The Text Analytics client can use Natural Language Understanding (NLU) to extract information from unstructured text, for example, key phrases or Personally Identifiable Information (PII). The samples below show how to use these features.

Key Phrases Extraction

The extractKeyPhrases method can be used to extract key phrases, which returns a list of strings denoting the key phrases in the document.

 KeyPhrasesCollection extractedKeyPhrases =
     textAnalyticsClient.extractKeyPhrases("My cat might need to see a veterinarian.");
 for (String keyPhrase : extractedKeyPhrases) {
     System.out.printf("%s.%n", keyPhrase);
 }
 

See this for supported languages in Text Analytics API.

Note: For asynchronous sample, refer to TextAnalyticsAsyncClient.

Named Entities Recognition (NER): Prebuilt Model

The recognizeEntities method can be used to recognize entities, which returns a list of general categorized entities in the provided document.

 CategorizedEntityCollection recognizeEntitiesResult =
     textAnalyticsClient.recognizeEntities("Satya Nadella is the CEO of Microsoft");
 for (CategorizedEntity entity : recognizeEntitiesResult) {
     System.out.printf("Recognized entity: %s, entity category: %s, confidence score: %f.%n",
         entity.getText(), entity.getCategory(), entity.getConfidenceScore());
 }
 

See this for supported languages in Text Analytics API.

Note: For asynchronous sample, refer to TextAnalyticsAsyncClient.

Custom Named Entities Recognition (NER): Custom Model

The beginRecognizeCustomEntities method can be used to recognize custom entities, which returns a list of custom entities for the provided list of documents.

 List<String> documents = new ArrayList<>();
 for (int i = 0; i < 3; i++) {
     documents.add(
         "A recent report by the Government Accountability Office (GAO) found that the dramatic increase "
             + "in oil and natural gas development on federal lands over the past six years has stretched the"
             + " staff of the BLM to a point that it has been unable to meet its environmental protection "
             + "responsibilities.");
 }
 SyncPoller<RecognizeCustomEntitiesOperationDetail, RecognizeCustomEntitiesPagedIterable> syncPoller =
     textAnalyticsClient.beginRecognizeCustomEntities(documents, "{project_name}", "{deployment_name}");
 syncPoller.waitForCompletion();
 syncPoller.getFinalResult().forEach(documentsResults -> {
     System.out.printf("Project name: %s, deployment name: %s.%n",
         documentsResults.getProjectName(), documentsResults.getDeploymentName());
     for (RecognizeEntitiesResult documentResult : documentsResults) {
         System.out.println("Document ID: " + documentResult.getId());
         for (CategorizedEntity entity : documentResult.getEntities()) {
             System.out.printf(
                 "\tText: %s, category: %s, confidence score: %f.%n",
                 entity.getText(), entity.getCategory(), entity.getConfidenceScore());
         }
     }
 });
 

See this for supported languages in Text Analytics API.

Note: For asynchronous sample, refer to TextAnalyticsAsyncClient.

Linked Entities Recognition

The recognizeLinkedEntities method can be used to find linked entities, which returns a list of recognized entities with links to a well-known knowledge base for the provided document.

 String document = "Old Faithful is a geyser at Yellowstone Park.";
 System.out.println("Linked Entities:");
 textAnalyticsClient.recognizeLinkedEntities(document).forEach(linkedEntity -> {
     System.out.printf("Name: %s, entity ID in data source: %s, URL: %s, data source: %s.%n",
         linkedEntity.getName(), linkedEntity.getDataSourceEntityId(), linkedEntity.getUrl(),
         linkedEntity.getDataSource());
     linkedEntity.getMatches().forEach(entityMatch -> System.out.printf(
         "Matched entity: %s, confidence score: %f.%n",
         entityMatch.getText(), entityMatch.getConfidenceScore()));
 });
 

See this for supported languages in Text Analytics API.

Note: For asynchronous sample, refer to TextAnalyticsAsyncClient.

Personally Identifiable Information (PII) Entities Recognition

The recognizePiiEntities method can be used to recognize PII entities, which returns a list of Personally Identifiable Information (PII) entities in the provided document.

For a list of supported entity types, see this.

 PiiEntityCollection piiEntityCollection = textAnalyticsClient.recognizePiiEntities("My SSN is 859-98-0987");
 System.out.printf("Redacted Text: %s%n", piiEntityCollection.getRedactedText());
 for (PiiEntity entity : piiEntityCollection) {
     System.out.printf(
         "Recognized Personally Identifiable Information entity: %s, entity category: %s,"
             + " entity subcategory: %s, confidence score: %f.%n",
         entity.getText(), entity.getCategory(), entity.getSubcategory(), entity.getConfidenceScore());
 }
 

See this for supported languages in Text Analytics API.

Note: For asynchronous sample, refer to TextAnalyticsAsyncClient.

Text Analytics for Health: Prebuilt Model

The beginAnalyzeHealthcareEntities method can be used to analyze healthcare entities, entity data sources, and entity relations in a list of documents.

 List<String> documents = new ArrayList<>();
 for (int i = 0; i < 3; i++) {
     documents.add("The patient is a 54-year-old gentleman with a history of progressive angina over "
         + "the past several months.");
 }

 SyncPoller<AnalyzeHealthcareEntitiesOperationDetail, AnalyzeHealthcareEntitiesPagedIterable>
     syncPoller = textAnalyticsClient.beginAnalyzeHealthcareEntities(documents);

 syncPoller.waitForCompletion();
 AnalyzeHealthcareEntitiesPagedIterable result = syncPoller.getFinalResult();

 result.forEach(analyzeHealthcareEntitiesResultCollection -> {
     analyzeHealthcareEntitiesResultCollection.forEach(healthcareEntitiesResult -> {
         System.out.println("document id = " + healthcareEntitiesResult.getId());
         System.out.println("Document entities: ");
         AtomicInteger ct = new AtomicInteger();
         healthcareEntitiesResult.getEntities().forEach(healthcareEntity -> {
             System.out.printf("\ti = %d, Text: %s, category: %s, confidence score: %f.%n",
                 ct.getAndIncrement(), healthcareEntity.getText(), healthcareEntity.getCategory(),
                 healthcareEntity.getConfidenceScore());

             IterableStream<EntityDataSource> healthcareEntityDataSources =
                 healthcareEntity.getDataSources();
             if (healthcareEntityDataSources != null) {
                 healthcareEntityDataSources.forEach(healthcareEntityLink -> System.out.printf(
                     "\t\tEntity ID in data source: %s, data source: %s.%n",
                     healthcareEntityLink.getEntityId(), healthcareEntityLink.getName()));
             }
         });
         // Healthcare entity relation groups
         healthcareEntitiesResult.getEntityRelations().forEach(entityRelation -> {
             System.out.printf("\tRelation type: %s.%n", entityRelation.getRelationType());
             entityRelation.getRoles().forEach(role -> {
                 final HealthcareEntity entity = role.getEntity();
                 System.out.printf("\t\tEntity text: %s, category: %s, role: %s.%n",
                     entity.getText(), entity.getCategory(), role.getName());
             });
             System.out.printf("\tRelation confidence score: %f.%n",
                 entityRelation.getConfidenceScore());
         });
     });
 });
 

See this for supported languages in Text Analytics API.

Note: For asynchronous sample, refer to TextAnalyticsAsyncClient.



Summarize text-based content: Document Summarization

The Text Analytics client can use Natural Language Understanding (NLU) to summarize lengthy documents with extractive or abstractive summarization. The samples below show how to use these features.

Extractive summarization

The beginExtractSummary method returns a list of extractive summaries for the provided list of documents.

This method is supported since service API version com.azure.ai.textanalytics.TextAnalyticsServiceVersion#V2023_04_01.
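
If needed, you can pin the client to that API version on the builder before calling beginExtractSummary. The following is a minimal sketch of that configuration; it reuses the key-credential setup shown earlier, and the serviceVersion call is an explicit opt-in rather than a requirement (the builder targets the latest supported version by default).

 TextAnalyticsClient textAnalyticsClient = new TextAnalyticsClientBuilder()
     .credential(new AzureKeyCredential("{key}"))
     .endpoint("{endpoint}")
     // Explicitly select the API version that added extractive and abstractive summarization.
     .serviceVersion(TextAnalyticsServiceVersion.V2023_04_01)
     .buildClient();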

 List<String> documents = new ArrayList<>();
 for (int i = 0; i < 3; i++) {
     documents.add(
         "At Microsoft, we have been on a quest to advance AI beyond existing techniques, by taking a more holistic,"
             + " human-centric approach to learning and understanding. As Chief Technology Officer of Azure AI"
             + " Cognitive Services, I have been working with a team of amazing scientists and engineers to turn "
             + "this quest into a reality. In my role, I enjoy a unique perspective in viewing the relationship"
             + " among three attributes of human cognition: monolingual text (X), audio or visual sensory signals,"
             + " (Y) and multilingual (Z). At the intersection of all three, there\u2019s magic\u2014what we call XYZ-code"
             + " as illustrated in Figure 1\u2014a joint representation to create more powerful AI that can speak, hear,"
             + " see, and understand humans better. We believe XYZ-code will enable us to fulfill our long-term"
             + " vision: cross-domain transfer learning, spanning modalities and languages. The goal is to have"
             + " pretrained models that can jointly learn representations to support a broad range of downstream"
             + " AI tasks, much in the way humans do today. Over the past five years, we have achieved human"
             + " performance on benchmarks in conversational speech recognition, machine translation, "
             + "conversational question answering, machine reading comprehension, and image captioning. These"
             + " five breakthroughs provided us with strong signals toward our more ambitious aspiration to"
             + " produce a leap in AI capabilities, achieving multisensory and multilingual learning that "
             + "is closer in line with how humans learn and understand. I believe the joint XYZ-code is a "
             + "foundational component of this aspiration, if grounded with external knowledge sources in "
             + "the downstream AI tasks.");
 }
 SyncPoller<ExtractiveSummaryOperationDetail, ExtractiveSummaryPagedIterable> syncPoller =
     textAnalyticsClient.beginExtractSummary(documents);
 syncPoller.waitForCompletion();
 syncPoller.getFinalResult().forEach(resultCollection -> {
     for (ExtractiveSummaryResult documentResult : resultCollection) {
         System.out.println("\tExtracted summary sentences:");
         for (ExtractiveSummarySentence extractiveSummarySentence : documentResult.getSentences()) {
             System.out.printf(
                 "\t\t Sentence text: %s, length: %d, offset: %d, rank score: %f.%n",
                 extractiveSummarySentence.getText(), extractiveSummarySentence.getLength(),
                 extractiveSummarySentence.getOffset(), extractiveSummarySentence.getRankScore());
         }
     }
 });
 

See this for supported languages in Text Analytics API.

Note: For asynchronous sample, refer to TextAnalyticsAsyncClient.

Abstractive summarization

The beginAbstractSummary method returns a list of abstractive summaries for the provided list of documents.

This method is supported since service API version com.azure.ai.textanalytics.TextAnalyticsServiceVersion#V2023_04_01.

 List<String> documents = new ArrayList<>();
 for (int i = 0; i < 3; i++) {
     documents.add(
         "At Microsoft, we have been on a quest to advance AI beyond existing techniques, by taking a more holistic,"
             + " human-centric approach to learning and understanding. As Chief Technology Officer of Azure AI"
             + " Cognitive Services, I have been working with a team of amazing scientists and engineers to turn "
             + "this quest into a reality. In my role, I enjoy a unique perspective in viewing the relationship"
             + " among three attributes of human cognition: monolingual text (X), audio or visual sensory signals,"
             + " (Y) and multilingual (Z). At the intersection of all three, there\u2019s magic\u2014what we call XYZ-code"
             + " as illustrated in Figure 1\u2014a joint representation to create more powerful AI that can speak, hear,"
             + " see, and understand humans better. We believe XYZ-code will enable us to fulfill our long-term"
             + " vision: cross-domain transfer learning, spanning modalities and languages. The goal is to have"
             + " pretrained models that can jointly learn representations to support a broad range of downstream"
             + " AI tasks, much in the way humans do today. Over the past five years, we have achieved human"
             + " performance on benchmarks in conversational speech recognition, machine translation, "
             + "conversational question answering, machine reading comprehension, and image captioning. These"
             + " five breakthroughs provided us with strong signals toward our more ambitious aspiration to"
             + " produce a leap in AI capabilities, achieving multisensory and multilingual learning that "
             + "is closer in line with how humans learn and understand. I believe the joint XYZ-code is a "
             + "foundational component of this aspiration, if grounded with external knowledge sources in "
             + "the downstream AI tasks.");
 }
 SyncPoller<AbstractiveSummaryOperationDetail, AbstractiveSummaryPagedIterable> syncPoller =
     textAnalyticsClient.beginAbstractSummary(documents);
 syncPoller.waitForCompletion();
 syncPoller.getFinalResult().forEach(resultCollection -> {
     for (AbstractiveSummaryResult documentResult : resultCollection) {
         System.out.println("\tAbstractive summary sentences:");
         for (AbstractiveSummary summarySentence : documentResult.getSummaries()) {
             System.out.printf("\t\t Summary text: %s.%n", summarySentence.getText());
             for (AbstractiveSummaryContext abstractiveSummaryContext : summarySentence.getContexts()) {
                 System.out.printf("\t\t offset: %d, length: %d%n",
                     abstractiveSummaryContext.getOffset(), abstractiveSummaryContext.getLength());
             }
         }
     }
 });
 

See this for supported languages in Text Analytics API.

Note: For asynchronous sample, refer to TextAnalyticsAsyncClient.



Classify Text

The Text Analytics client can use Natural Language Understanding (NLU) to detect the language of your text, analyze its sentiment, or classify it with custom models. The samples below show how to use these features.

Analyze Sentiment and Mine Text for Opinions

The analyzeSentiment method can be used to analyze sentiment on a given input text string, which returns a sentiment prediction, as well as confidence scores for each sentiment label (Positive, Negative, and Neutral) for the document and each sentence within it. If includeOpinionMining in AnalyzeSentimentOptions is set to true, the output will include the opinion mining results. It mines the opinions of a sentence and conducts more granular analysis around the aspects in the text (also known as aspect-based sentiment analysis).

 DocumentSentiment documentSentiment = textAnalyticsClient.analyzeSentiment(
     "The hotel was dark and unclean.", "en",
     new AnalyzeSentimentOptions().setIncludeOpinionMining(true));
 for (SentenceSentiment sentenceSentiment : documentSentiment.getSentences()) {
     System.out.printf("\tSentence sentiment: %s%n", sentenceSentiment.getSentiment());
     sentenceSentiment.getOpinions().forEach(opinion -> {
         TargetSentiment targetSentiment = opinion.getTarget();
         System.out.printf("\tTarget sentiment: %s, target text: %s%n", targetSentiment.getSentiment(),
             targetSentiment.getText());
         for (AssessmentSentiment assessmentSentiment : opinion.getAssessments()) {
             System.out.printf("\t\t'%s' sentiment because of \"%s\". Is the assessment negated: %s.%n",
                 assessmentSentiment.getSentiment(), assessmentSentiment.getText(), assessmentSentiment.isNegated());
         }
     });
 }
 

See this for supported languages in Text Analytics API.

Note: For asynchronous sample, refer to TextAnalyticsAsyncClient.

Detect Language

The detectLanguage method returns the detected language and a confidence score between zero and one. Scores close to one indicate high certainty that the identified language is correct.

This method uses the default country hint configured via com.azure.ai.textanalytics.TextAnalyticsClientBuilder#defaultCountryHint(String). If none is specified, the service uses 'US' as the country hint.

 DetectedLanguage detectedLanguage = textAnalyticsClient.detectLanguage("Bonjour tout le monde");
 System.out.printf("Detected language name: %s, ISO 6391 name: %s, confidence score: %f.%n",
     detectedLanguage.getName(), detectedLanguage.getIso6391Name(), detectedLanguage.getConfidenceScore());
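
To control the hint, you can set a default country hint on the builder or pass one per call. The following is a minimal sketch using defaultCountryHint together with the two-argument detectLanguage overload.

 TextAnalyticsClient clientWithHint = new TextAnalyticsClientBuilder()
     .credential(new AzureKeyCredential("{key}"))
     .endpoint("{endpoint}")
     // Used whenever a call does not supply its own country hint.
     .defaultCountryHint("FR")
     .buildClient();

 // A per-call hint overrides the builder default.
 DetectedLanguage detected = clientWithHint.detectLanguage("Bonjour tout le monde", "FR");
 System.out.printf("Detected language name: %s, confidence score: %f.%n",
     detected.getName(), detected.getConfidenceScore());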
 

See this for supported languages in Text Analytics API.

Note: For asynchronous sample, refer to TextAnalyticsAsyncClient.

Single-Label Classification

The beginSingleLabelClassify method returns a list of single-label classifications for the provided list of documents.

Note: this method is supported since service API version com.azure.ai.textanalytics.TextAnalyticsServiceVersion#V2022_05_01.

 List<String> documents = new ArrayList<>();
 for (int i = 0; i < 3; i++) {
     documents.add(
         "A recent report by the Government Accountability Office (GAO) found that the dramatic increase "
             + "in oil and natural gas development on federal lands over the past six years has stretched the"
             + " staff of the BLM to a point that it has been unable to meet its environmental protection "
             + "responsibilities."
     );
 }
 // See the service documentation for regional support and how to train a model to classify your documents,
 // see https://aka.ms/azsdk/textanalytics/customfunctionalities
 SyncPoller<ClassifyDocumentOperationDetail, ClassifyDocumentPagedIterable> syncPoller =
     textAnalyticsClient.beginSingleLabelClassify(documents, "{project_name}", "{deployment_name}");
 syncPoller.waitForCompletion();
 syncPoller.getFinalResult().forEach(documentsResults -> {
     System.out.printf("Project name: %s, deployment name: %s.%n",
         documentsResults.getProjectName(), documentsResults.getDeploymentName());
     for (ClassifyDocumentResult documentResult : documentsResults) {
         System.out.println("Document ID: " + documentResult.getId());
         for (ClassificationCategory classification : documentResult.getClassifications()) {
             System.out.printf("\tCategory: %s, confidence score: %f.%n",
                 classification.getCategory(), classification.getConfidenceScore());
         }
     }
 });
 

See this for supported languages in Text Analytics API.

Note: For asynchronous sample, refer to TextAnalyticsAsyncClient.

Multi-Label Classification

The beginMultiLabelClassify method returns a list of multi-label classifications for the provided list of documents.

Note: this method is supported since service API version com.azure.ai.textanalytics.TextAnalyticsServiceVersion#V2022_05_01.

 List<String> documents = new ArrayList<>();
 for (int i = 0; i < 3; i++) {
     documents.add(
         "I need a reservation for an indoor restaurant in China. Please don't stop the music."
             + " Play music and add it to my playlist");
 }
 SyncPoller<ClassifyDocumentOperationDetail, ClassifyDocumentPagedIterable> syncPoller =
     textAnalyticsClient.beginMultiLabelClassify(documents, "{project_name}", "{deployment_name}");
 syncPoller.waitForCompletion();
 syncPoller.getFinalResult().forEach(documentsResults -> {
     System.out.printf("Project name: %s, deployment name: %s.%n",
         documentsResults.getProjectName(), documentsResults.getDeploymentName());
     for (ClassifyDocumentResult documentResult : documentsResults) {
         System.out.println("Document ID: " + documentResult.getId());
         for (ClassificationCategory classification : documentResult.getClassifications()) {
             System.out.printf("\tCategory: %s, confidence score: %f.%n",
                 classification.getCategory(), classification.getConfidenceScore());
         }
     }
 });
 

See this for supported languages in Text Analytics API.

Note: For asynchronous sample, refer to TextAnalyticsAsyncClient.



Execute multiple actions

The beginAnalyzeActions method executes multiple actions, such as entity recognition, PII entity recognition, and key phrase extraction, for a list of documents.

 List<String> documents = Arrays.asList(
     "Elon Musk is the CEO of SpaceX and Tesla.",
     "My SSN is 859-98-0987"
 );

 SyncPoller<AnalyzeActionsOperationDetail, AnalyzeActionsResultPagedIterable> syncPoller =
     textAnalyticsClient.beginAnalyzeActions(
         documents,
         new TextAnalyticsActions().setDisplayName("{tasks_display_name}")
             .setRecognizeEntitiesActions(new RecognizeEntitiesAction())
             .setExtractKeyPhrasesActions(new ExtractKeyPhrasesAction()));
 syncPoller.waitForCompletion();
 AnalyzeActionsResultPagedIterable result = syncPoller.getFinalResult();
 result.forEach(analyzeActionsResult -> {
     System.out.println("Entities recognition action results:");
     analyzeActionsResult.getRecognizeEntitiesResults().forEach(
         actionResult -> {
             if (!actionResult.isError()) {
                 actionResult.getDocumentsResults().forEach(
                     entitiesResult -> entitiesResult.getEntities().forEach(
                         entity -> System.out.printf(
                             "Recognized entity: %s, entity category: %s, entity subcategory: %s,"
                                 + " confidence score: %f.%n",
                             entity.getText(), entity.getCategory(), entity.getSubcategory(),
                             entity.getConfidenceScore())));
             }
         });
     System.out.println("Key phrases extraction action results:");
     analyzeActionsResult.getExtractKeyPhrasesResults().forEach(
         actionResult -> {
             if (!actionResult.isError()) {
                 actionResult.getDocumentsResults().forEach(extractKeyPhraseResult -> {
                     System.out.println("Extracted phrases:");
                     extractKeyPhraseResult.getKeyPhrases()
                         .forEach(keyPhrases -> System.out.printf("\t%s.%n", keyPhrases));
                 });
             }
         });
 });
 

See this for supported languages in Text Analytics API.

Note: For asynchronous sample, refer to TextAnalyticsAsyncClient.

Classes

TextAnalyticsAsyncClient

This class provides an asynchronous client that contains all the operations that apply to Azure Text Analytics.

TextAnalyticsClient

This class provides a synchronous client that contains all the operations that apply to Azure Text Analytics.

TextAnalyticsClientBuilder

This class provides a fluent builder API to help instantiate TextAnalyticsClient and TextAnalyticsAsyncClient; call buildClient() and buildAsyncClient(), respectively, to construct an instance of the desired client.

Enums

TextAnalyticsServiceVersion

The versions of Azure Text Analytics supported by this client library.