Quickstart: Detect Personally Identifiable Information (PII)
Use this article to get started detecting and redacting sensitive information in text, using the NER and PII client library and REST API. Follow these steps to try out examples code for mining text:
Reference documentation | Library source code | Package (NuGet) | Additional samples
Prerequisites
- Azure subscription - Create one for free
- The Visual Studio IDE
- Once you have your Azure subscription, create a Language resource in the Azure portal to get your key and endpoint. After it deploys, click Go to resource.
- You will need the key and endpoint from the resource you create to connect your application to the API. You'll paste your key and endpoint into the code below later in the quickstart.
- You can use the free pricing tier (
F0) to try the service, and upgrade later to a paid tier for production.
- To use the Analyze feature, you will need a Language resource with the standard (S) pricing tier.
Setting up
Create a new .NET Core application
Using the Visual Studio IDE, create a new .NET Core console app. This will create a "Hello World" project with a single C# source file: program.cs.
Install the client library by right-clicking on the solution in the Solution Explorer and selecting Manage NuGet Packages. In the package manager that opens select Browse and search for Azure.AI.TextAnalytics. Select version 5.1.0, and then Install. You can also use the Package Manager Console.
Code example
Copy the following code into your program.cs file. Remember to replace the key variable with the key for your resource, and replace the endpoint variable with the endpoint for your resource.
Important
Go to the Azure portal. If the Language resource you created in the Prerequisites section deployed successfully, click the Go to Resource button under Next Steps. You can find your key and endpoint in the resource's key and endpoint page, under resource management.
Remember to remove the key from your code when you're done, and never post it publicly. For production, consider using a secure way of storing and accessing your credentials. For example, Azure key vault.
using Azure;
using System;
using Azure.AI.TextAnalytics;
namespace Example
{
class Program
{
private static readonly AzureKeyCredential credentials = new AzureKeyCredential("replace-with-your-key-here");
private static readonly Uri endpoint = new Uri("replace-with-your-endpoint-here");
// Example method for detecting sensitive information (PII) from text
static void RecognizePIIExample(TextAnalyticsClient client)
{
string document = "A developer with SSN 859-98-0987 whose phone number is 800-102-1100 is building tools with our APIs.";
PiiEntityCollection entities = client.RecognizePiiEntities(document).Value;
Console.WriteLine($"Redacted Text: {entities.RedactedText}");
if (entities.Count > 0)
{
Console.WriteLine($"Recognized {entities.Count} PII entit{(entities.Count > 1 ? "ies" : "y")}:");
foreach (PiiEntity entity in entities)
{
Console.WriteLine($"Text: {entity.Text}, Category: {entity.Category}, SubCategory: {entity.SubCategory}, Confidence score: {entity.ConfidenceScore}");
}
}
else
{
Console.WriteLine("No entities were found.");
}
}
static void Main(string[] args)
{
var client = new TextAnalyticsClient(endpoint, credentials);
RecognizePIIExample(client);
Console.Write("Press any key to exit.");
Console.ReadKey();
}
}
}
Output
Recognized 2 PII entities:
Text: 859-98-0987, Category: U.S. Social Security Number (SSN), SubCategory: , Confidence score: 0.65
Text: 800-102-1100, Category: Phone Number, SubCategory: , Confidence score: 0.8
Reference documentation | Library source code | Package | Samples
Prerequisites
- Azure subscription - Create one for free
- Java Development Kit (JDK) with version 8 or above
- Once you have your Azure subscription, create a Language resource in the Azure portal to get your key and endpoint. After it deploys, click Go to resource.
- You will need the key and endpoint from the resource you create to connect your application to the API. You'll paste your key and endpoint into the code below later in the quickstart.
- You can use the free pricing tier (
F0) to try the service, and upgrade later to a paid tier for production.
- To use the Analyze feature, you will need a Language resource with the standard (S) pricing tier.
Setting up
Add the client library
Create a Maven project in your preferred IDE or development environment. Then add the following dependency to your project's pom.xml file. You can find the implementation syntax for other build tools online.
<dependencies>
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-ai-textanalytics</artifactId>
<version>5.1.0</version>
</dependency>
</dependencies>
Code example
Create a Java file named Example.java. Open the file and copy the below code. Remember to replace the key variable with the key for your resource, and replace the endpoint variable with the endpoint for your resource.
Important
Go to the Azure portal. If the Language resource you created in the Prerequisites section deployed successfully, click the Go to Resource button under Next Steps. You can find your key and endpoint in the resource's key and endpoint page, under resource management.
Remember to remove the key from your code when you're done, and never post it publicly. For production, consider using a secure way of storing and accessing your credentials. For example, Azure key vault.
import com.azure.core.credential.AzureKeyCredential;
import com.azure.ai.textanalytics.models.*;
import com.azure.ai.textanalytics.TextAnalyticsClientBuilder;
import com.azure.ai.textanalytics.TextAnalyticsClient;
public class Example {
private static String KEY = "replace-with-your-key-here";
private static String ENDPOINT = "replace-with-your-endpoint-here";
public static void main(String[] args) {
TextAnalyticsClient client = authenticateClient(KEY, ENDPOINT);
recognizePiiEntitiesExample(client);
}
// Method to authenticate the client object with your key and endpoint
static TextAnalyticsClient authenticateClient(String key, String endpoint) {
return new TextAnalyticsClientBuilder()
.credential(new AzureKeyCredential(key))
.endpoint(endpoint)
.buildClient();
}
// Example method for detecting sensitive information (PII) from text
static void recognizePiiEntitiesExample(TextAnalyticsClient client)
{
// The text that need be analyzed.
String document = "My SSN is 859-98-0987";
PiiEntityCollection piiEntityCollection = client.recognizePiiEntities(document);
System.out.printf("Redacted Text: %s%n", piiEntityCollection.getRedactedText());
piiEntityCollection.forEach(entity -> System.out.printf(
"Recognized Personally Identifiable Information entity: %s, entity category: %s, entity subcategory: %s,"
+ " confidence score: %f.%n",
entity.getText(), entity.getCategory(), entity.getSubcategory(), entity.getConfidenceScore()));
}
}
Output
Redacted Text: My SSN is ***********
Recognized Personally Identifiable Information entity: 859-98-0987, entity category: U.S. Social Security Number (SSN), entity subcategory: null, confidence score: 0.650000.
Reference documentation | Library source code | Package (NPM) | Samples
Prerequisites
- Azure subscription - Create one for free
- The current version of Node.js.
- Once you have your Azure subscription, create a Language resource in the Azure portal to get your key and endpoint. After it deploys, click Go to resource.
- You will need the key and endpoint from the resource you create to connect your application to the API. You'll paste your key and endpoint into the code below later in the quickstart.
- You can use the free pricing tier (
F0) to try the service, and upgrade later to a paid tier for production.
- To use the Analyze feature, you will need a Language resource with the standard (S) pricing tier.
Setting up
Create a new Node.js application
In a console window (such as cmd, PowerShell, or Bash), create a new directory for your app, and navigate to it.
mkdir myapp
cd myapp
Run the npm init command to create a node application with a package.json file.
npm init
Install the client library
Install the NPM package:
npm install @azure/ai-text-analytics@5.1.0
Code example
Open the file and copy the below code. Remember to replace the key variable with the key for your resource, and replace the endpoint variable with the endpoint for your resource.
Important
Go to the Azure portal. If the Language resource you created in the Prerequisites section deployed successfully, click the Go to Resource button under Next Steps. You can find your key and endpoint in the resource's key and endpoint page, under resource management.
Remember to remove the key from your code when you're done, and never post it publicly. For production, consider using a secure way of storing and accessing your credentials. For example, Azure key vault.
"use strict";
const { TextAnalyticsClient, AzureKeyCredential } = require("@azure/ai-text-analytics");
const key = '<paste-your-key-here>';
const endpoint = '<paste-your-endpoint-here>';
// Authenticate the client with your key and endpoint
const textAnalyticsClient = new TextAnalyticsClient(endpoint, new AzureKeyCredential(key));
// Example method for detecting sensitive information (PII) from text
async function piiRecognition(client) {
const documents = [
"The employee's phone number is (555) 555-5555."
];
const results = await client.recognizePiiEntities(documents, "en");
for (const result of results) {
if (result.error === undefined) {
console.log("Redacted Text: ", result.redactedText);
console.log(" -- Recognized PII entities for input", result.id, "--");
for (const entity of result.entities) {
console.log(entity.text, ":", entity.category, "(Score:", entity.confidenceScore, ")");
}
} else {
console.error("Encountered an error:", result.error);
}
}
}
piiRecognition(textAnalyticsClient)
Output
Redacted Text: The employee's phone number is **************.
-- Recognized PII entities for input 0 --
(555) 555-5555 : Phone Number (Score: 0.8 )
Reference documentation | Library source code | Package (PiPy) | Samples
Prerequisites
- Azure subscription - Create one for free
- Python 3.x
- Once you have your Azure subscription, create a Language resource in the Azure portal to get your key and endpoint. After it deploys, click Go to resource.
- You will need the key and endpoint from the resource you create to connect your application to the API. You'll paste your key and endpoint into the code below later in the quickstart.
- You can use the free pricing tier (
F0) to try the service, and upgrade later to a paid tier for production.
- To use the Analyze feature, you will need a Language resource with the standard (S) pricing tier.
Setting up
Install the client library
After installing Python, you can install the client library with:
pip install azure-ai-textanalytics==5.1.0
Code example
Create a new Python file and copy the below code. Remember to replace the key variable with the key for your resource, and replace the endpoint variable with the endpoint for your resource.
Important
Go to the Azure portal. If the Language resource you created in the Prerequisites section deployed successfully, click the Go to Resource button under Next Steps. You can find your key and endpoint in the resource's key and endpoint page, under resource management.
Remember to remove the key from your code when you're done, and never post it publicly. For production, consider using a secure way of storing and accessing your credentials. For example, Azure key vault.
key = "paste-your-key-here"
endpoint = "paste-your-endpoint-here"
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential
# Authenticate the client using your key and endpoint
def authenticate_client():
ta_credential = AzureKeyCredential(key)
text_analytics_client = TextAnalyticsClient(
endpoint=endpoint,
credential=ta_credential)
return text_analytics_client
client = authenticate_client()
# Example method for detecting sensitive information (PII) from text
def pii_recognition_example(client):
documents = [
"The employee's SSN is 859-98-0987.",
"The employee's phone number is 555-555-5555."
]
response = client.recognize_pii_entities(documents, language="en")
result = [doc for doc in response if not doc.is_error]
for doc in result:
print("Redacted Text: {}".format(doc.redacted_text))
for entity in doc.entities:
print("Entity: {}".format(entity.text))
print("\tCategory: {}".format(entity.category))
print("\tConfidence Score: {}".format(entity.confidence_score))
print("\tOffset: {}".format(entity.offset))
print("\tLength: {}".format(entity.length))
pii_recognition_example(client)
Output
Redacted Text: The employee's SSN is ***********.
Entity: 859-98-0987
Category: U.S. Social Security Number (SSN)
Confidence Score: 0.65
Offset: 22
Length: 11
Redacted Text: The employee's phone number is ************.
Entity: 555-555-5555
Category: Phone Number
Confidence Score: 0.8
Offset: 31
Length: 12
Prerequisites
- The current version of cURL.
- Once you have your Azure subscription, create a Language resource in the Azure portal to get your key and endpoint. After it deploys, click Go to resource.
- You will need the key and endpoint from the resource you create to connect your application to the API. You'll paste your key and endpoint into the code below later in the quickstart.
- You can use the free pricing tier (
F0) to try the service, and upgrade later to a paid tier for production.
Note
- The following BASH examples use the
\line continuation character. If your console or terminal uses a different line continuation character, use that character. - You can find language specific samples on GitHub.
- Go to the Azure portal and find the key and endpoint for the Language resource you created in the prerequisites. They will be located on the resource's key and endpoint page, under resource management. Then replace the strings in the code below with your key and endpoint. To call the API, you need the following information:
| parameter | Description |
|---|---|
-X POST <endpoint> |
Specifies your endpoint for accessing the API. |
-H Content-Type: application/json |
The content type for sending JSON data. |
-H "Ocp-Apim-Subscription-Key:<key> |
Specifies the key for accessing the API. |
-d <documents> |
The JSON containing the documents you want to send. |
The following cURL commands are executed from a BASH shell. Edit these commands with your own resource name, resource key, and JSON values.
Personally Identifying Information (PII) detection
- Copy the command into a text editor.
- Make the following changes in the command where needed:
- Replace the value
<your-text-analytics-key-here>with your key. - Replace the first part of the request URL
<your-text-analytics-endpoint-here>with the your own endpoint URL.
- Replace the value
- Open a command prompt window.
- Paste the command from the text editor into the command prompt window, and then run the command.
curl -X POST https://your-text-analytics-endpoint-here>/text/analytics/v3.1/entities/recognition/pii \
-H "Content-Type: application/json" \
-H "Ocp-Apim-Subscription-Key: <your-text-analytics-key-here>" \
-d '{ documents: [{ id: "1", language:"en", text: "Call our office at 312-555-1234, or send an email to support@contoso.com"}]}'
JSON response
{
"documents":[
{
"redactedText":"Call our office at ************, or send an email to *******************",
"id":"1",
"entities":[
{
"text":"312-555-1234",
"category":"PhoneNumber",
"offset":19,
"length":12,
"confidenceScore":0.8
},
{
"text":"support@contoso.com",
"category":"Email",
"offset":53,
"length":19,
"confidenceScore":0.8
}
],
"warnings":[
]
}
],
"errors":[
],
"modelVersion":"2021-01-15"
}
Clean up resources
If you want to clean up and remove a Cognitive Services subscription, you can delete the resource or resource group. Deleting the resource group also deletes any other resources associated with it.
Next steps
Povratne informacije
Pošalјite i prikažite povratne informacije za