Quickstart: Detect Personally Identifiable Information (PII)

Use this article to get started detecting and redacting sensitive information in text, using the NER and PII client library and REST API. Follow these steps to try out examples code for mining text:

Reference documentation | Library source code | Package (NuGet) | Additional samples

Prerequisites

  • Azure subscription - Create one for free
  • The Visual Studio IDE
  • Once you have your Azure subscription, create a Language resource in the Azure portal to get your key and endpoint. After it deploys, click Go to resource.
    • You will need the key and endpoint from the resource you create to connect your application to the API. You'll paste your key and endpoint into the code below later in the quickstart.
    • You can use the free pricing tier (F0) to try the service, and upgrade later to a paid tier for production.
  • To use the Analyze feature, you will need a Language resource with the standard (S) pricing tier.

Setting up

Create a new .NET Core application

Using the Visual Studio IDE, create a new .NET Core console app. This will create a "Hello World" project with a single C# source file: program.cs.

Install the client library by right-clicking on the solution in the Solution Explorer and selecting Manage NuGet Packages. In the package manager that opens select Browse and search for Azure.AI.TextAnalytics. Select version 5.1.0, and then Install. You can also use the Package Manager Console.

Code example

Copy the following code into your program.cs file. Remember to replace the key variable with the key for your resource, and replace the endpoint variable with the endpoint for your resource.

Important

Go to the Azure portal. If the Language resource you created in the Prerequisites section deployed successfully, click the Go to Resource button under Next Steps. You can find your key and endpoint in the resource's key and endpoint page, under resource management.

Remember to remove the key from your code when you're done, and never post it publicly. For production, consider using a secure way of storing and accessing your credentials. For example, Azure key vault.

using Azure;
using System;
using Azure.AI.TextAnalytics;

namespace Example
{
    class Program
    {
        private static readonly AzureKeyCredential credentials = new AzureKeyCredential("replace-with-your-key-here");
        private static readonly Uri endpoint = new Uri("replace-with-your-endpoint-here");

        // Example method for detecting sensitive information (PII) from text 
        static void RecognizePIIExample(TextAnalyticsClient client)
        {
            string document = "A developer with SSN 859-98-0987 whose phone number is 800-102-1100 is building tools with our APIs.";
        
            PiiEntityCollection entities = client.RecognizePiiEntities(document).Value;
        
            Console.WriteLine($"Redacted Text: {entities.RedactedText}");
            if (entities.Count > 0)
            {
                Console.WriteLine($"Recognized {entities.Count} PII entit{(entities.Count > 1 ? "ies" : "y")}:");
                foreach (PiiEntity entity in entities)
                {
                    Console.WriteLine($"Text: {entity.Text}, Category: {entity.Category}, SubCategory: {entity.SubCategory}, Confidence score: {entity.ConfidenceScore}");
                }
            }
            else
            {
                Console.WriteLine("No entities were found.");
            }
        }

        static void Main(string[] args)
        {
            var client = new TextAnalyticsClient(endpoint, credentials);
            RecognizePIIExample(client);

            Console.Write("Press any key to exit.");
            Console.ReadKey();
        }

    }
}

Output

Recognized 2 PII entities:
Text: 859-98-0987, Category: U.S. Social Security Number (SSN), SubCategory: , Confidence score: 0.65
Text: 800-102-1100, Category: Phone Number, SubCategory: , Confidence score: 0.8

Reference documentation | Library source code | Package | Samples

Prerequisites

  • Azure subscription - Create one for free
  • Java Development Kit (JDK) with version 8 or above
  • Once you have your Azure subscription, create a Language resource in the Azure portal to get your key and endpoint. After it deploys, click Go to resource.
    • You will need the key and endpoint from the resource you create to connect your application to the API. You'll paste your key and endpoint into the code below later in the quickstart.
    • You can use the free pricing tier (F0) to try the service, and upgrade later to a paid tier for production.
  • To use the Analyze feature, you will need a Language resource with the standard (S) pricing tier.

Setting up

Add the client library

Create a Maven project in your preferred IDE or development environment. Then add the following dependency to your project's pom.xml file. You can find the implementation syntax for other build tools online.

<dependencies>
     <dependency>
        <groupId>com.azure</groupId>
        <artifactId>azure-ai-textanalytics</artifactId>
        <version>5.1.0</version>
    </dependency>
</dependencies>

Code example

Create a Java file named Example.java. Open the file and copy the below code. Remember to replace the key variable with the key for your resource, and replace the endpoint variable with the endpoint for your resource.

Important

Go to the Azure portal. If the Language resource you created in the Prerequisites section deployed successfully, click the Go to Resource button under Next Steps. You can find your key and endpoint in the resource's key and endpoint page, under resource management.

Remember to remove the key from your code when you're done, and never post it publicly. For production, consider using a secure way of storing and accessing your credentials. For example, Azure key vault.

import com.azure.core.credential.AzureKeyCredential;
import com.azure.ai.textanalytics.models.*;
import com.azure.ai.textanalytics.TextAnalyticsClientBuilder;
import com.azure.ai.textanalytics.TextAnalyticsClient;

public class Example {

    private static String KEY = "replace-with-your-key-here";
    private static String ENDPOINT = "replace-with-your-endpoint-here";

    public static void main(String[] args) {
        TextAnalyticsClient client = authenticateClient(KEY, ENDPOINT);
        recognizePiiEntitiesExample(client);
    }
    // Method to authenticate the client object with your key and endpoint
    static TextAnalyticsClient authenticateClient(String key, String endpoint) {
        return new TextAnalyticsClientBuilder()
                .credential(new AzureKeyCredential(key))
                .endpoint(endpoint)
                .buildClient();
    }

    // Example method for detecting sensitive information (PII) from text 
    static void recognizePiiEntitiesExample(TextAnalyticsClient client)
    {
        // The text that need be analyzed.
        String document = "My SSN is 859-98-0987";
        PiiEntityCollection piiEntityCollection = client.recognizePiiEntities(document);
        System.out.printf("Redacted Text: %s%n", piiEntityCollection.getRedactedText());
        piiEntityCollection.forEach(entity -> System.out.printf(
            "Recognized Personally Identifiable Information entity: %s, entity category: %s, entity subcategory: %s,"
                + " confidence score: %f.%n",
            entity.getText(), entity.getCategory(), entity.getSubcategory(), entity.getConfidenceScore()));
    }
}

Output

Redacted Text: My SSN is ***********
Recognized Personally Identifiable Information entity: 859-98-0987, entity category: U.S. Social Security Number (SSN), entity subcategory: null, confidence score: 0.650000.

Reference documentation | Library source code | Package (NPM) | Samples

Prerequisites

  • Azure subscription - Create one for free
  • The current version of Node.js.
  • Once you have your Azure subscription, create a Language resource in the Azure portal to get your key and endpoint. After it deploys, click Go to resource.
    • You will need the key and endpoint from the resource you create to connect your application to the API. You'll paste your key and endpoint into the code below later in the quickstart.
    • You can use the free pricing tier (F0) to try the service, and upgrade later to a paid tier for production.
  • To use the Analyze feature, you will need a Language resource with the standard (S) pricing tier.

Setting up

Create a new Node.js application

In a console window (such as cmd, PowerShell, or Bash), create a new directory for your app, and navigate to it.

mkdir myapp 

cd myapp

Run the npm init command to create a node application with a package.json file.

npm init

Install the client library

Install the NPM package:

nnpm install @azure/ai-text-analytics@5.1.0

Code example

Open the file and copy the below code. Remember to replace the key variable with the key for your resource, and replace the endpoint variable with the endpoint for your resource.

Important

Go to the Azure portal. If the Language resource you created in the Prerequisites section deployed successfully, click the Go to Resource button under Next Steps. You can find your key and endpoint in the resource's key and endpoint page, under resource management.

Remember to remove the key from your code when you're done, and never post it publicly. For production, consider using a secure way of storing and accessing your credentials. For example, Azure key vault.

"use strict";

const { TextAnalyticsClient, AzureKeyCredential } = require("@azure/ai-text-analytics");
const key = '<paste-your-key-here>';
const endpoint = '<paste-your-endpoint-here>';
// Authenticate the client with your key and endpoint
const textAnalyticsClient = new TextAnalyticsClient(endpoint,  new AzureKeyCredential(key));

// Example method for detecting sensitive information (PII) from text 
async function piiRecognition(client) {

    const documents = [
        "The employee's phone number is (555) 555-5555."
    ];

    const results = await client.recognizePiiEntities(documents, "en");
    for (const result of results) {
        if (result.error === undefined) {
            console.log("Redacted Text: ", result.redactedText);
            console.log(" -- Recognized PII entities for input", result.id, "--");
            for (const entity of result.entities) {
                console.log(entity.text, ":", entity.category, "(Score:", entity.confidenceScore, ")");
            }
        } else {
            console.error("Encountered an error:", result.error);
        }
    }
}
piiRecognition(textAnalyticsClient)

Output

Redacted Text:  The employee's phone number is **************.
 -- Recognized PII entities for input 0 --
(555) 555-5555 : Phone Number (Score: 0.8 )

Reference documentation | Library source code | Package (PiPy) | Samples

Prerequisites

  • Azure subscription - Create one for free
  • Python 3.x
  • Once you have your Azure subscription, create a Language resource in the Azure portal to get your key and endpoint. After it deploys, click Go to resource.
    • You will need the key and endpoint from the resource you create to connect your application to the API. You'll paste your key and endpoint into the code below later in the quickstart.
    • You can use the free pricing tier (F0) to try the service, and upgrade later to a paid tier for production.
  • To use the Analyze feature, you will need a Language resource with the standard (S) pricing tier.

Setting up

Install the client library

After installing Python, you can install the client library with:

pip install azure-ai-textanalytics==5.1.0

Code example

Create a new Python file and copy the below code. Remember to replace the key variable with the key for your resource, and replace the endpoint variable with the endpoint for your resource.

Important

Go to the Azure portal. If the Language resource you created in the Prerequisites section deployed successfully, click the Go to Resource button under Next Steps. You can find your key and endpoint in the resource's key and endpoint page, under resource management.

Remember to remove the key from your code when you're done, and never post it publicly. For production, consider using a secure way of storing and accessing your credentials. For example, Azure key vault.

key = "paste-your-key-here"
endpoint = "paste-your-endpoint-here"

from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

# Authenticate the client using your key and endpoint 
def authenticate_client():
    ta_credential = AzureKeyCredential(key)
    text_analytics_client = TextAnalyticsClient(
            endpoint=endpoint, 
            credential=ta_credential)
    return text_analytics_client

client = authenticate_client()

# Example method for detecting sensitive information (PII) from text 
def pii_recognition_example(client):
    documents = [
        "The employee's SSN is 859-98-0987.",
        "The employee's phone number is 555-555-5555."
    ]
    response = client.recognize_pii_entities(documents, language="en")
    result = [doc for doc in response if not doc.is_error]
    for doc in result:
        print("Redacted Text: {}".format(doc.redacted_text))
        for entity in doc.entities:
            print("Entity: {}".format(entity.text))
            print("\tCategory: {}".format(entity.category))
            print("\tConfidence Score: {}".format(entity.confidence_score))
            print("\tOffset: {}".format(entity.offset))
            print("\tLength: {}".format(entity.length))
pii_recognition_example(client)

Output

Redacted Text: The employee's SSN is ***********.
Entity: 859-98-0987
        Category: U.S. Social Security Number (SSN)
        Confidence Score: 0.65
        Offset: 22
        Length: 11
Redacted Text: The employee's phone number is ************.
Entity: 555-555-5555
        Category: Phone Number
        Confidence Score: 0.8
        Offset: 31
        Length: 12

Reference documentation

Prerequisites

  • The current version of cURL.
  • Once you have your Azure subscription, create a Language resource in the Azure portal to get your key and endpoint. After it deploys, click Go to resource.
    • You will need the key and endpoint from the resource you create to connect your application to the API. You'll paste your key and endpoint into the code below later in the quickstart.
    • You can use the free pricing tier (F0) to try the service, and upgrade later to a paid tier for production.

Note

  • The following BASH examples use the \ line continuation character. If your console or terminal uses a different line continuation character, use that character.
  • You can find language specific samples on GitHub.
  • Go to the Azure portal and find the key and endpoint for the Language resource you created in the prerequisites. They will be located on the resource's key and endpoint page, under resource management. Then replace the strings in the code below with your key and endpoint. To call the API, you need the following information:
parameter Description
-X POST <endpoint> Specifies your endpoint for accessing the API.
-H Content-Type: application/json The content type for sending JSON data.
-H "Ocp-Apim-Subscription-Key:<key> Specifies the key for accessing the API.
-d <documents> The JSON containing the documents you want to send.

The following cURL commands are executed from a BASH shell. Edit these commands with your own resource name, resource key, and JSON values.

Personally Identifying Information (PII) detection

  1. Copy the command into a text editor.
  2. Make the following changes in the command where needed:
    1. Replace the value <your-text-analytics-key-here> with your key.
    2. Replace the first part of the request URL <your-text-analytics-endpoint-here> with the your own endpoint URL.
  3. Open a command prompt window.
  4. Paste the command from the text editor into the command prompt window, and then run the command.
curl -X POST https://your-text-analytics-endpoint-here>/text/analytics/v3.1/entities/recognition/pii \
-H "Content-Type: application/json" \
-H "Ocp-Apim-Subscription-Key: <your-text-analytics-key-here>" \
-d '{ documents: [{ id: "1", language:"en", text: "Call our office at 312-555-1234, or send an email to support@contoso.com"}]}'

JSON response

{
   "documents":[
      {
         "redactedText":"Call our office at ************, or send an email to *******************",
         "id":"1",
         "entities":[
            {
               "text":"312-555-1234",
               "category":"PhoneNumber",
               "offset":19,
               "length":12,
               "confidenceScore":0.8
            },
            {
               "text":"support@contoso.com",
               "category":"Email",
               "offset":53,
               "length":19,
               "confidenceScore":0.8
            }
         ],
         "warnings":[
            
         ]
      }
   ],
   "errors":[
      
   ],
   "modelVersion":"2021-01-15"
}

Clean up resources

If you want to clean up and remove a Cognitive Services subscription, you can delete the resource or resource group. Deleting the resource group also deletes any other resources associated with it.

Next steps