How to detect and redact Personally Identifiable Information (PII)

The PII feature can evaluate unstructured text and extract sensitive information (PII) and health information (PHI) across several predefined categories.

Determine how to process the data (optional)

Specify the PII detection model

By default, this feature will use the latest available AI model on your text. You can also configure your API requests to use a specific model version.
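As a rough illustration of pinning a model version, the sketch below adds a `model-version` query parameter to a request URL. It assumes the v3.1 REST endpoint accepts that parameter name; the helper `with_model_version` and the version string `2021-01-15` are illustrative placeholders, not values taken from this article.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_model_version(url: str, model_version: str) -> str:
    """Return `url` with a model-version query parameter set.

    Hypothetical helper: assumes the service reads a `model-version`
    query parameter. Any existing value is replaced; other query
    parameters are preserved.
    """
    parts = urlsplit(url)
    # Keep every existing parameter except a previous model-version.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "model-version"]
    query.append(("model-version", model_version))
    # safe="," keeps comma-separated lists (such as piiCategories) readable.
    return urlunsplit(parts._replace(query=urlencode(query, safe=",")))


endpoint = (
    "https://example.cognitiveservices.azure.com"
    "/text/analytics/v3.1/entities/recognition/pii"
)
latest_url = with_model_version(endpoint, "latest")
# "2021-01-15" is a placeholder; check the service for available versions.
pinned_url = with_model_version(latest_url, "2021-01-15")
```

Omitting the parameter altogether leaves the service on its default, which the text above describes as the latest available model.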

Input languages

When you submit documents to be processed, you can specify which of the supported languages they're written in. If you don't specify a language, PII detection will default to English. The API may return offsets in the response to support different multilingual and emoji encodings.
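A minimal sketch of a request body that states the language of each document follows. It assumes the v3.1 JSON shape of a `documents` array whose items carry `id`, `language`, and `text`; the helper name `build_request_body` is hypothetical.

```python
import json
from typing import Dict, List, Sequence


def build_request_body(texts: Sequence[str], language: str = "en") -> Dict[str, List[dict]]:
    """Wrap raw strings in the assumed `documents` request structure.

    Every document gets a sequential string id and the same language
    code; pass a different code for non-English input.
    """
    return {
        "documents": [
            {"id": str(index), "language": language, "text": text}
            for index, text in enumerate(texts, start=1)
        ]
    }


body = build_request_body(["Mon permis est dans mon sac."], language="fr")
payload = json.dumps(body, ensure_ascii=False)  # string to send as the POST body
```

Setting the language explicitly avoids relying on the English default when the input is in another supported language.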

Submitting data

Analysis is performed upon receipt of the request. Using the PII detection feature synchronously is stateless. No data is stored in your account, and results are returned immediately in the response.

When using this feature asynchronously, the API results are available for 24 hours from the time the request was ingested; this time limit is indicated in the response. After this time period, the results are purged and are no longer available for retrieval.

The API will attempt to detect the defined entity categories for a given document language. If you want to specify which entities will be detected and returned, use the optional piiCategories parameter with the appropriate entity categories. This parameter can also let you detect entities that aren't enabled by default for your document language. The following URL example would detect a French driver's license number that might occur in English text, along with the default English entities.

Tip

If you don't include default when specifying entity categories, the API will only return the entity categories you specify.

https://<your-custom-subdomain>.cognitiveservices.azure.com/text/analytics/v3.1/entities/recognition/pii?piiCategories=default,FRDriversLicenseNumber
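The URL above can also be assembled programmatically. The sketch below is one way to do it; the helper `build_pii_url` and the subdomain `contoso` are hypothetical, while the path and the `piiCategories` parameter are taken from the example URL.

```python
from typing import Iterable

# Path copied from the example URL in this article.
PII_PATH = "/text/analytics/v3.1/entities/recognition/pii"


def build_pii_url(
    subdomain: str,
    extra_categories: Iterable[str] = (),
    include_default: bool = True,
) -> str:
    """Build a PII endpoint URL with an optional piiCategories filter.

    With include_default=True the literal value `default` is listed
    first, so the default categories for the document language are
    requested in addition to the extra ones.
    """
    categories = ["default"] if include_default else []
    categories.extend(extra_categories)
    url = f"https://{subdomain}.cognitiveservices.azure.com{PII_PATH}"
    if categories:
        # Only add the parameter when there is something to list.
        url += "?piiCategories=" + ",".join(categories)
    return url


with_defaults = build_pii_url("contoso", ["FRDriversLicenseNumber"])
only_french_license = build_pii_url(
    "contoso", ["FRDriversLicenseNumber"], include_default=False
)
```

The second URL omits `default`, so, per the tip above, only the French driver's license category would be returned.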

Getting PII results

When you get results from PII detection, you can stream the results to an application or save the output to a file on the local system. The API response will include recognized entities, including their categories and sub-categories, and confidence scores. The text string with the PII entities redacted will also be returned.
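To make the response handling concrete, here is a small sketch that reads entities and redacted text from a response and saves it to a local file. The `SAMPLE_RESPONSE` dictionary is hand-written for illustration, not real service output, and its field names (`redactedText`, `entities`, `category`, `subcategory`, `confidenceScore`, `offset`, `length`) are assumed to match the v3.1 response format.

```python
import json
from typing import Dict, List, Optional, Tuple

# Hand-written illustrative response; values are not real service output.
SAMPLE_RESPONSE = {
    "documents": [
        {
            "id": "1",
            "redactedText": "Call ******** at ********.",
            "entities": [
                {"text": "Jane Doe", "category": "Person",
                 "offset": 5, "length": 8, "confidenceScore": 0.9},
                {"text": "555-0100", "category": "PhoneNumber",
                 "offset": 17, "length": 8, "confidenceScore": 0.8},
            ],
        }
    ],
    "errors": [],
}


def list_entities(response: dict) -> List[Tuple[str, str, Optional[str], float]]:
    """Return (document id, category, subcategory, score) for each entity."""
    rows = []
    for doc in response.get("documents", []):
        for entity in doc.get("entities", []):
            rows.append((
                doc["id"],
                entity["category"],
                entity.get("subcategory"),  # absent for many categories
                entity["confidenceScore"],
            ))
    return rows


def redacted_texts(response: dict) -> Dict[str, str]:
    """Map each document id to its redacted text."""
    return {doc["id"]: doc["redactedText"] for doc in response.get("documents", [])}


def save_response(response: dict, path: str) -> None:
    """Write the full response to a local JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(response, handle, ensure_ascii=False, indent=2)


entities = list_entities(SAMPLE_RESPONSE)
redacted = redacted_texts(SAMPLE_RESPONSE)
```

Calling `save_response(SAMPLE_RESPONSE, "pii_results.json")` would write the output to a file in the current directory; the file name is arbitrary.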

Service and data limits

For information on the size and number of requests you can send per minute and second, see the service limits article.

Next steps

Named Entity Recognition overview