Quickstart: Detect Personally Identifiable Information (PII)

Note

This quickstart only covers PII detection in documents. To learn more about detecting PII in conversations, see How to detect and redact PII in conversations.

Reference documentation | Additional samples | Package (NuGet) | Library source code

Use this quickstart to create a Personally Identifiable Information (PII) detection application with the client library for .NET. In the following example, you'll create a C# application that can identify recognized sensitive information in text.

Prerequisites

  • Azure subscription - Create one for free
  • The Visual Studio IDE
  • Once you have your Azure subscription, create a Language resource in the Azure portal to get your key and endpoint. After it deploys, click Go to resource.
    • You'll need the key and endpoint from the resource you create to connect your application to the API. You'll paste your key and endpoint into the code below later in the quickstart.
    • You can use the free pricing tier (Free F0) to try the service, and upgrade later to a paid tier for production.
  • To use the Analyze feature, you'll need a Language resource with the standard (S) pricing tier.

Setting up

Create a new .NET Core application

Using the Visual Studio IDE, create a new .NET Core console app. This will create a "Hello World" project with a single C# source file: program.cs.

Install the client library by right-clicking on the solution in the Solution Explorer and selecting Manage NuGet Packages. In the package manager that opens select Browse and search for Azure.AI.TextAnalytics. Select version 5.1.1, and then Install. You can also use the Package Manager Console.

Code example

Copy the following code into your program.cs file. Remember to replace the key variable with the key for your resource, and replace the endpoint variable with the endpoint for your resource.

Important

Go to the Azure portal. If the Language resource you created in the Prerequisites section deployed successfully, click the Go to Resource button under Next Steps. You can find your key and endpoint by navigating to your resource's Keys and Endpoint page, under Resource Management.

Important

Remember to remove the key from your code when you're done, and never post it publicly. For production, use a secure way of storing and accessing your credentials like Azure Key Vault. See the Cognitive Services security article for more information.

using Azure;
using System;
using Azure.AI.TextAnalytics;

namespace Example
{
    class Program
    {
        private static readonly AzureKeyCredential credentials = new AzureKeyCredential("replace-with-your-key-here");
        private static readonly Uri endpoint = new Uri("replace-with-your-endpoint-here");

        // Example method for detecting sensitive information (PII) from text 
        static void RecognizePIIExample(TextAnalyticsClient client)
        {
            string document = "Call our office at 312-555-1234, or send an email to support@contoso.com.";
        
            PiiEntityCollection entities = client.RecognizePiiEntities(document).Value;
        
            Console.WriteLine($"Redacted Text: {entities.RedactedText}");
            if (entities.Count > 0)
            {
                Console.WriteLine($"Recognized {entities.Count} PII entit{(entities.Count > 1 ? "ies" : "y")}:");
                foreach (PiiEntity entity in entities)
                {
                    Console.WriteLine($"Text: {entity.Text}, Category: {entity.Category}, SubCategory: {entity.SubCategory}, Confidence score: {entity.ConfidenceScore}");
                }
            }
            else
            {
                Console.WriteLine("No entities were found.");
            }
        }

        static void Main(string[] args)
        {
            var client = new TextAnalyticsClient(endpoint, credentials);
            RecognizePIIExample(client);

            Console.Write("Press any key to exit.");
            Console.ReadKey();
        }

    }
}

Output

Redacted Text: Call our office at ************, or send an email to *******************.
Recognized 2 PII entities:
Text: 312-555-1234, Category: PhoneNumber, SubCategory: , Confidence score: 0.8
Text: support@contoso.com, Category: Email, SubCategory: , Confidence score: 0.8

Reference documentation | Additional samples | Package (Maven) | Library source code

Use this quickstart to create a Personally Identifiable Information (PII) detection application with the client library for Java. In the following example, you will create a Java application that can identify recognized sensitive information in text.

Prerequisites

  • Azure subscription - Create one for free
  • Java Development Kit (JDK) with version 8 or above
  • Once you have your Azure subscription, create a Language resource in the Azure portal to get your key and endpoint. After it deploys, click Go to resource.
    • You will need the key and endpoint from the resource you create to connect your application to the API. You'll paste your key and endpoint into the code below later in the quickstart.
    • You can use the free pricing tier (Free F0) to try the service, and upgrade later to a paid tier for production.
  • To use the Analyze feature, you will need a Language resource with the standard (S) pricing tier.

Setting up

Add the client library

Create a Maven project in your preferred IDE or development environment. Then add the following dependency to your project's pom.xml file. You can find the implementation syntax for other build tools online.

<dependencies>
     <dependency>
        <groupId>com.azure</groupId>
        <artifactId>azure-ai-textanalytics</artifactId>
        <version>5.1.9</version>
    </dependency>
</dependencies>

Code example

Create a Java file named Example.java. Open the file and copy the below code. Remember to replace the key variable with the key for your resource, and replace the endpoint variable with the endpoint for your resource.

Important

Go to the Azure portal. If the Language resource you created in the Prerequisites section deployed successfully, click the Go to Resource button under Next Steps. You can find your key and endpoint by navigating to your resource's Keys and Endpoint page, under Resource Management.

Important

Remember to remove the key from your code when you're done, and never post it publicly. For production, use a secure way of storing and accessing your credentials like Azure Key Vault. See the Cognitive Services security article for more information.

import com.azure.core.credential.AzureKeyCredential;
import com.azure.ai.textanalytics.models.*;
import com.azure.ai.textanalytics.TextAnalyticsClientBuilder;
import com.azure.ai.textanalytics.TextAnalyticsClient;

public class Example {

    private static String KEY = "replace-with-your-key-here";
    private static String ENDPOINT = "replace-with-your-endpoint-here";

    public static void main(String[] args) {
        TextAnalyticsClient client = authenticateClient(KEY, ENDPOINT);
        recognizePiiEntitiesExample(client);
    }
    // Method to authenticate the client object with your key and endpoint
    static TextAnalyticsClient authenticateClient(String key, String endpoint) {
        return new TextAnalyticsClientBuilder()
                .credential(new AzureKeyCredential(key))
                .endpoint(endpoint)
                .buildClient();
    }

    // Example method for detecting sensitive information (PII) from text 
    static void recognizePiiEntitiesExample(TextAnalyticsClient client)
    {
        // The text that need be analyzed.
        String document = "My SSN is 859-98-0987";
        PiiEntityCollection piiEntityCollection = client.recognizePiiEntities(document);
        System.out.printf("Redacted Text: %s%n", piiEntityCollection.getRedactedText());
        piiEntityCollection.forEach(entity -> System.out.printf(
            "Recognized Personally Identifiable Information entity: %s, entity category: %s, entity subcategory: %s,"
                + " confidence score: %f.%n",
            entity.getText(), entity.getCategory(), entity.getSubcategory(), entity.getConfidenceScore()));
    }
}

Output

Redacted Text: My SSN is ***********
Recognized Personally Identifiable Information entity: 859-98-0987, entity category: USSocialSecurityNumber, entity subcategory: null, confidence score: 0.650000.

Reference documentation | Additional samples | Package (npm) | Library source code

Use this quickstart to create a Personally Identifiable Information (PII) detection application with the client library for Node.js. In the following example, you'll create a JavaScript application that can identify recognized sensitive information in text.

Prerequisites

  • Azure subscription - Create one for free
  • Node.js v16 LTS
  • Once you have your Azure subscription, create a Language resource in the Azure portal to get your key and endpoint. After it deploys, click Go to resource.
    • you'll need the key and endpoint from the resource you create to connect your application to the API. You'll paste your key and endpoint into the code below later in the quickstart.
    • You can use the free pricing tier (Free F0) to try the service, and upgrade later to a paid tier for production.
  • To use the Analyze feature, you'll need a Language resource with the standard (S) pricing tier.

Setting up

Create a new Node.js application

In a console window (such as cmd, PowerShell, or Bash), create a new directory for your app, and navigate to it.

mkdir myapp 

cd myapp

Run the npm init command to create a node application with a package.json file.

npm init

Install the client library

Install the npm package:

npm install @azure/ai-text-analytics@5.1.0

Code example

Open the file and copy the below code. Remember to replace the key variable with the key for your resource, and replace the endpoint variable with the endpoint for your resource.

Important

Go to the Azure portal. If the Language resource you created in the Prerequisites section deployed successfully, click the Go to Resource button under Next Steps. You can find your key and endpoint by navigating to your resource's Keys and Endpoint page, under Resource Management.

Important

Remember to remove the key from your code when you're done, and never post it publicly. For production, use a secure way of storing and accessing your credentials like Azure Key Vault. See the Cognitive Services security article for more information.

"use strict";

const { TextAnalyticsClient, AzureKeyCredential } = require("@azure/ai-text-analytics");
const key = '<paste-your-key-here>';
const endpoint = '<paste-your-endpoint-here>';
// Authenticate the client with your key and endpoint
const textAnalyticsClient = new TextAnalyticsClient(endpoint,  new AzureKeyCredential(key));

// Example method for detecting sensitive information (PII) from text 
async function piiRecognition(client) {

    const documents = [
        "The employee's phone number is (555) 555-5555."
    ];

    const results = await client.recognizePiiEntities(documents, "en");
    for (const result of results) {
        if (result.error === undefined) {
            console.log("Redacted Text: ", result.redactedText);
            console.log(" -- Recognized PII entities for input", result.id, "--");
            for (const entity of result.entities) {
                console.log(entity.text, ":", entity.category, "(Score:", entity.confidenceScore, ")");
            }
        } else {
            console.error("Encountered an error:", result.error);
        }
    }
}
piiRecognition(textAnalyticsClient)

Output

Redacted Text:  The employee's phone number is **************.
 -- Recognized PII entities for input 0 --
(555) 555-5555 : Phone Number (Score: 0.8 )

Reference documentation | Additional samples | Package (PiPy) | Library source code

Use this quickstart to create a Personally Identifiable Information (PII) detection application with the client library for Python. In the following example, you will create a Python application that can identify recognized sensitive information in text.

Prerequisites

  • Azure subscription - Create one for free
  • Python 3.x
  • Once you have your Azure subscription, create a Language resource in the Azure portal to get your key and endpoint. After it deploys, click Go to resource.
    • You will need the key and endpoint from the resource you create to connect your application to the API. You'll paste your key and endpoint into the code below later in the quickstart.
    • You can use the free pricing tier (Free F0) to try the service, and upgrade later to a paid tier for production.
  • To use the Analyze feature, you will need a Language resource with the standard (S) pricing tier.

Setting up

Install the client library

After installing Python, you can install the client library with:

pip install azure-ai-textanalytics==5.1.0

Code example

Create a new Python file and copy the below code. Remember to replace the key variable with the key for your resource, and replace the endpoint variable with the endpoint for your resource.

Important

Go to the Azure portal. If the Language resource you created in the Prerequisites section deployed successfully, click the Go to Resource button under Next Steps. You can find your key and endpoint by navigating to your resource's Keys and Endpoint page, under Resource Management.

Important

Remember to remove the key from your code when you're done, and never post it publicly. For production, use a secure way of storing and accessing your credentials like Azure Key Vault. See the Cognitive Services security article for more information.

key = "paste-your-key-here"
endpoint = "paste-your-endpoint-here"

from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

# Authenticate the client using your key and endpoint 
def authenticate_client():
    ta_credential = AzureKeyCredential(key)
    text_analytics_client = TextAnalyticsClient(
            endpoint=endpoint, 
            credential=ta_credential)
    return text_analytics_client

client = authenticate_client()

# Example method for detecting sensitive information (PII) from text 
def pii_recognition_example(client):
    documents = [
        "The employee's SSN is 859-98-0987.",
        "The employee's phone number is 555-555-5555."
    ]
    response = client.recognize_pii_entities(documents, language="en")
    result = [doc for doc in response if not doc.is_error]
    for doc in result:
        print("Redacted Text: {}".format(doc.redacted_text))
        for entity in doc.entities:
            print("Entity: {}".format(entity.text))
            print("\tCategory: {}".format(entity.category))
            print("\tConfidence Score: {}".format(entity.confidence_score))
            print("\tOffset: {}".format(entity.offset))
            print("\tLength: {}".format(entity.length))
pii_recognition_example(client)

Output

Redacted Text: The employee's SSN is ***********.
Entity: 859-98-0987
        Category: U.S. Social Security Number (SSN)
        Confidence Score: 0.65
        Offset: 22
        Length: 11
Redacted Text: The employee's phone number is ************.
Entity: 555-555-5555
        Category: Phone Number
        Confidence Score: 0.8
        Offset: 31
        Length: 12

Reference documentation

Use this quickstart to send Personally Identifiable Information (PII) detection requests using the REST API. In the following example, you will use cURL to identify recognized sensitive information in text.

Prerequisites

  • The current version of cURL.
  • Once you have your Azure subscription, create a Language resource in the Azure portal to get your key and endpoint. After it deploys, click Go to resource.
    • You will need the key and endpoint from the resource you create to connect your application to the API. You'll paste your key and endpoint into the code below later in the quickstart.
    • You can use the free pricing tier (Free F0) to try the service, and upgrade later to a paid tier for production.

Note

  • The following BASH examples use the \ line continuation character. If your console or terminal uses a different line continuation character, use that character.
  • You can find language specific samples on GitHub.
  • Go to the Azure portal and find the key and endpoint for the Language resource you created in the prerequisites. They will be located on the resource's key and endpoint page, under resource management. Then replace the strings in the code below with your key and endpoint. To call the API, you need the following information:
parameter Description
-X POST <endpoint> Specifies your endpoint for accessing the API.
-H Content-Type: application/json The content type for sending JSON data.
-H "Ocp-Apim-Subscription-Key:<key> Specifies the key for accessing the API.
-d <documents> The JSON containing the documents you want to send.

The following cURL commands are executed from a BASH shell. Edit these commands with your own resource name, resource key, and JSON values.

Personally Identifying Information (PII) detection

  1. Copy the command into a text editor.
  2. Make the following changes in the command where needed:
    1. Replace the value <your-text-analytics-key-here> with your key.
    2. Replace the first part of the request URL <your-text-analytics-endpoint-here> with the your own endpoint URL.
  3. Open a command prompt window.
  4. Paste the command from the text editor into the command prompt window, and then run the command.
curl -i -X POST https://<your-language-resource-endpoint>/language/:analyze-text?api-version=2022-05-01 \
-H "Content-Type: application/json" \
-H "Ocp-Apim-Subscription-Key:<your-language-resource-key>" \
-d \
'
{
    "kind": "PiiEntityRecognition",
    "parameters": {
        "modelVersion": "latest"
    },
    "analysisInput":{
        "documents":[
            {
                "id":"1",
                "language": "en",
                "text": "Call our office at 312-555-1234, or send an email to support@contoso.com"
            }
        ]
    }
}
'

JSON response

{
	"kind": "PiiEntityRecognitionResults",
	"results": {
		"documents": [{
			"redactedText": "Call our office at ************, or send an email to *******************",
			"id": "1",
			"entities": [{
				"text": "312-555-1234",
				"category": "PhoneNumber",
				"offset": 19,
				"length": 12,
				"confidenceScore": 0.8
			}, {
				"text": "support@contoso.com",
				"category": "Email",
				"offset": 53,
				"length": 19,
				"confidenceScore": 0.8
			}],
			"warnings": []
		}],
		"errors": [],
		"modelVersion": "2021-01-15"
	}
}

Clean up resources

If you want to clean up and remove a Cognitive Services subscription, you can delete the resource or resource group. Deleting the resource group also deletes any other resources associated with it.

Next steps