How to detect and redact personally identifiable information (PII) in conversations

Article
12/19/2023

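The conversational PII feature evaluates conversations, extracts sensitive information (PII) from the content across several predefined categories, and redacts it. The API works on both transcribed text (referred to as transcripts) and chats. For transcripts, the API also returns audio timing information for the segments that contain PII, so that those audio segments can be redacted as well.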

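Determine how to process the data (optional)

Specify the PII detection model

By default, this feature uses the latest available AI model on your input. You can also configure your API requests to use a specific model version.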
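
As a rough illustration of pinning a model version, the sketch below builds one entry for the request's tasks array. The helper name build_pii_task is hypothetical, and leaving modelVersion out to get the default model is an assumption based on the paragraph above, not confirmed behavior. The kind and modelVersion values are taken from the request samples later in this article.

```python
import json


def build_pii_task(task_name, model_version=None):
    """Build one ConversationalPIITask entry for the "tasks" array.

    build_pii_task is a hypothetical helper, not part of any SDK.
    """
    parameters = {}
    if model_version is not None:
        # Pin a specific model version. When omitted, the service is
        # assumed to fall back to its default (latest) model.
        parameters["modelVersion"] = model_version
    return {
        "taskName": task_name,
        "kind": "ConversationalPIITask",
        "parameters": parameters,
    }


pinned = build_pii_task("analyze 1", model_version="2022-05-15-preview")
print(json.dumps(pinned, indent=2))
```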

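Language support

Currently, the conversational PII preview API supports English only.

Region support

Currently, the conversational PII preview API is available in all Azure regions supported by the Language service.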

Submitting data

Note

See the Language Studio article for information on formatting conversational text for submission through Language Studio.

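You submit the input to the API as a list of conversation items. Analysis is performed upon receipt of the request. Because the API is asynchronous, there may be a delay between sending an API request and receiving the results. For information on the size and number of requests you can send per minute and per second, see the Service and data limits section below.

When you use the asynchronous feature, the API results are available for 24 hours from the time the request was ingested, as indicated in the response. After this period, the results are purged and can no longer be retrieved.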
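
The 24-hour window is simple date arithmetic. The sketch below assumes you have already read the ingestion time from the response yourself; the function names are hypothetical and the timestamp is only an example value.

```python
from datetime import datetime, timedelta, timezone

# Retention period stated above: results are kept for 24 hours.
RESULT_RETENTION = timedelta(hours=24)


def results_expire_at(ingested_at):
    """Return the moment after which results can no longer be retrieved."""
    return ingested_at + RESULT_RETENTION


def results_still_available(ingested_at, now):
    """True while `now` is still inside the 24-hour retention window."""
    return now < results_expire_at(ingested_at)


ingested = datetime(2023, 12, 19, 9, 30, tzinfo=timezone.utc)
print(results_expire_at(ingested).isoformat())  # 2023-12-20T09:30:00+00:00
```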

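When you submit data to conversational PII, you can send one conversation (chat or spoken) per request.

The API attempts to detect all the defined entity categories for a given conversation input. To specify which entities are detected and returned, use the optional piiCategories parameter with the appropriate entity categories.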
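
To make the piiCategories parameter concrete, here is a small sketch that adds a category filter to an existing parameters object. The helper with_pii_categories is hypothetical, and the category names "Name" and "PhoneNumber" are illustrative assumptions; check the supported entity categories before relying on them. The value "all" is the one used in the transcript sample in this article.

```python
def with_pii_categories(parameters, categories):
    """Return a copy of `parameters` restricted to the given PII categories.

    Hypothetical helper. Duplicates are dropped while keeping the order.
    """
    updated = dict(parameters)
    updated["piiCategories"] = list(dict.fromkeys(categories))
    return updated


base = {"modelVersion": "2022-05-15-preview"}

# "Name" and "PhoneNumber" are assumed category names for illustration.
filtered = with_pii_categories(base, ["Name", "PhoneNumber", "Name"])
print(filtered["piiCategories"])  # ['Name', 'PhoneNumber']

# Request every category, as in the transcript sample.
everything = with_pii_categories(base, ["all"])
```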

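For spoken transcripts, the detected entities are returned for the redactionSource parameter value that you provide. Currently, the supported values for redactionSource are text, lexical, itn, and maskedItn (which map to the Speech to text REST API's display\displayText, lexical, itn, and maskedItn formats, respectively). In addition, for spoken transcript input, this API provides audio timing information to support audio redaction. To use the audioRedaction feature, set the optional includeAudioRedaction flag to true. Audio redaction is performed based on the lexical input format.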
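
To show how audio timing information can drive audio redaction, the sketch below converts word timings into time ranges to silence. It makes two assumptions that this article does not state: offset and duration are 100-nanosecond ticks (the convention used by the Speech service), and the words to redact are already known. Matching on the word text is a simplification; the function name is hypothetical.

```python
# Assumption: offsets and durations are expressed in 100-nanosecond ticks.
TICKS_PER_SECOND = 10_000_000


def redaction_intervals(audio_timings, redacted_words):
    """Return (start_seconds, end_seconds) pairs for the words to silence."""
    intervals = []
    for timing in audio_timings:
        if timing["word"] in redacted_words:
            start = timing["offset"] / TICKS_PER_SECOND
            end = (timing["offset"] + timing["duration"]) / TICKS_PER_SECOND
            intervals.append((start, end))
    return intervals


# Word timings taken from the "sure that is john doe" item in the
# transcript sample in this article.
timings = [
    {"word": "sure", "offset": 5400000, "duration": 6300000},
    {"word": "that", "offset": 13600000, "duration": 2300000},
    {"word": "is", "offset": 16000000, "duration": 1300000},
    {"word": "john", "offset": 17400000, "duration": 2500000},
    {"word": "doe", "offset": 20000000, "duration": 2700000},
]

print(redaction_intervals(timings, {"john", "doe"}))
# [(1.74, 1.99), (2.0, 2.27)]
```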

Note

Conversational PII now supports a document size of up to 40,000 characters.
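
A client-side check before submitting might look like the sketch below. It assumes that the document size is the total number of characters across the text fields of all conversation items, and that Python's len matches how the service counts characters; neither is confirmed here. The function names are hypothetical.

```python
# Size limit stated in the note above.
MAX_CHARACTERS = 40_000


def conversation_length(conversation):
    """Total characters across all item texts (assumed size measure)."""
    return sum(len(item["text"]) for item in conversation["conversationItems"])


def fits_size_limit(conversation):
    """True when the conversation is within the 40,000-character limit."""
    return conversation_length(conversation) <= MAX_CHARACTERS


conversation = {
    "conversationItems": [
        {"text": "Good morning."},
        {"text": "Can I have your name?"},
        {"text": "Sure that is John Doe."},
    ]
}

print(conversation_length(conversation), fits_size_limit(conversation))
# 56 True
```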

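Getting PII results

When you get results from PII detection, you can stream them to an application or save the output to a file on the local system. The API response includes the recognized entities, with their categories and subcategories, and confidence scores. The text string with the PII entities redacted is also returned.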
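
The relationship between a recognized entity and the redacted text string can be illustrated as follows. This is only a sketch: the entity fields offset, length, and category and the asterisk mask are assumptions for illustration, not the documented response schema, and the service already returns the redacted string for you.

```python
def redact(text, entities, mask="*"):
    """Replace each entity span in `text` with mask characters.

    Each entity is assumed to carry a character `offset` and `length`.
    """
    chars = list(text)
    for entity in entities:
        start = entity["offset"]
        end = start + entity["length"]
        chars[start:end] = mask * (end - start)
    return "".join(chars)


# Hypothetical entity for "John Doe" in the sample utterance.
entities = [{"category": "Name", "offset": 13, "length": 8}]

print(redact("Sure that is John Doe.", entities))
# Sure that is ********.
```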

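1. Go to your resource overview page in the Azure portal.
2. From the menu on the left side, select Keys and Endpoint. You need one of the keys and the endpoint to authenticate your API requests.
3. Download and install the client library package for your language of choice:

Language	Package version
.NET	1.0.0
Python	1.0.0

4. For more information on the client and return object, see the following reference documentation:
- C#
- Python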

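Submit transcripts using speech to text

Use the following example if you have conversations transcribed using the Speech service's speech to text feature: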

curl -i -X POST "https://your-language-endpoint-here/language/analyze-conversations/jobs?api-version=2022-05-15-preview" \
-H "Content-Type: application/json" \
-H "Ocp-Apim-Subscription-Key: your-key-here" \
-d \
' 
{
    "displayName": "Analyze conversations from xxx",
    "analysisInput": {
        "conversations": [
            {
                "id": "23611680-c4eb-4705-adef-4aa1c17507b5",
                "language": "en",
                "modality": "transcript",
                "conversationItems": [
                    {
                        "participantId": "agent_1",
                        "id": "8074caf7-97e8-4492-ace3-d284821adacd",
                        "text": "Good morning.",
                        "lexical": "good morning",
                        "itn": "good morning",
                        "maskedItn": "good morning",
                        "audioTimings": [
                            {
                                "word": "good",
                                "offset": 11700000,
                                "duration": 2100000
                            },
                            {
                                "word": "morning",
                                "offset": 13900000,
                                "duration": 3100000
                            }
                        ]
                    },
                    {
                        "participantId": "agent_1",
                        "id": "0d67d52b-693f-4e34-9881-754a14eec887",
                        "text": "Can I have your name?",
                        "lexical": "can i have your name",
                        "itn": "can i have your name",
                        "maskedItn": "can i have your name",
                        "audioTimings": [
                            {
                                "word": "can",
                                "offset": 44200000,
                                "duration": 2200000
                            },
                            {
                                "word": "i",
                                "offset": 46500000,
                                "duration": 800000
                            },
                            {
                                "word": "have",
                                "offset": 47400000,
                                "duration": 1500000
                            },
                            {
                                "word": "your",
                                "offset": 49000000,
                                "duration": 1500000
                            },
                            {
                                "word": "name",
                                "offset": 50600000,
                                "duration": 2100000
                            }
                        ]
                    },
                    {
                        "participantId": "customer_1",
                        "id": "08684a7a-5433-4658-a3f1-c6114fcfed51",
                        "text": "Sure that is John Doe.",
                        "lexical": "sure that is john doe",
                        "itn": "sure that is john doe",
                        "maskedItn": "sure that is john doe",
                        "audioTimings": [
                            {
                                "word": "sure",
                                "offset": 5400000,
                                "duration": 6300000
                            },
                            {
                                "word": "that",
                                "offset": 13600000,
                                "duration": 2300000
                            },
                            {
                                "word": "is",
                                "offset": 16000000,
                                "duration": 1300000
                            },
                            {
                                "word": "john",
                                "offset": 17400000,
                                "duration": 2500000
                            },
                            {
                                "word": "doe",
                                "offset": 20000000,
                                "duration": 2700000
                            }
                        ]
                    }
                ]
            }
        ]
    },
    "tasks": [
        {
            "taskName": "analyze 1",
            "kind": "ConversationalPIITask",
            "parameters": {
                "modelVersion": "2022-05-15-preview",
                "redactionSource": "text",
                "includeAudioRedaction": true,
                "piiCategories": [
                    "all"
                ]
            }
        }
    ]
}
'
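
The same POST can also be assembled from Python. The following is a minimal sketch that uses only the standard library; build_submit_request is a hypothetical helper, the endpoint and key are the placeholders from the cURL sample, and the body shown is a stub rather than a complete request. The request is built but not sent.

```python
import json
import urllib.request

API_VERSION = "2022-05-15-preview"


def build_submit_request(endpoint, key, body):
    """Build (but do not send) the POST request that submits a job."""
    url = (
        f"{endpoint.rstrip('/')}/language/analyze-conversations/jobs"
        f"?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": key,
        },
        method="POST",
    )


request = build_submit_request(
    "https://your-language-endpoint-here",
    "your-key-here",
    {"displayName": "Analyze conversations from xxx"},
)
print(request.get_method(), request.full_url)

# To actually send it (requires a real endpoint and key):
# with urllib.request.urlopen(request) as response:
#     print(response.headers["operation-location"])
```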

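Submit text chats

Use the following example if you have conversations that originated in text, for example, conversations held through a text-based chat client.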

curl -i -X POST "https://your-language-endpoint-here/language/analyze-conversations/jobs?api-version=2022-05-15-preview" \
-H "Content-Type: application/json" \
-H "Ocp-Apim-Subscription-Key: your-key-here" \
-d \
' 
{
    "displayName": "Analyze conversations from xxx",
    "analysisInput": {
        "conversations": [
            {
                "id": "23611680-c4eb-4705-adef-4aa1c17507b5",
                "language": "en",
                "modality": "text",
                "conversationItems": [
                    {
                        "participantId": "agent_1",
                        "id": "8074caf7-97e8-4492-ace3-d284821adacd",
                        "text": "Good morning."
                    },
                    {
                        "participantId": "agent_1",
                        "id": "0d67d52b-693f-4e34-9881-754a14eec887",
                        "text": "Can I have your name?"
                    },
                    {
                        "participantId": "customer_1",
                        "id": "08684a7a-5433-4658-a3f1-c6114fcfed51",
                        "text": "Sure that is John Doe."
                    }
                ]
            }
        ]
    },
    "tasks": [
        {
            "taskName": "analyze 1",
            "kind": "ConversationalPIITask",
            "parameters": {
                "modelVersion": "2022-05-15-preview"
            }
        }
    ]
}
'
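
If your chat history is already available as speaker and message pairs, the conversation object from the sample above can be generated rather than written by hand. This is a sketch: build_chat_conversation is a hypothetical helper, and generating random UUIDs for the id fields assumes that any unique identifier is acceptable.

```python
import uuid


def build_chat_conversation(turns, language="en"):
    """Build a text-modality conversation from (participant, text) pairs."""
    return {
        "id": str(uuid.uuid4()),
        "language": language,
        "modality": "text",
        "conversationItems": [
            {"participantId": participant, "id": str(uuid.uuid4()), "text": text}
            for participant, text in turns
        ],
    }


chat = build_chat_conversation(
    [
        ("agent_1", "Good morning."),
        ("agent_1", "Can I have your name?"),
        ("customer_1", "Sure that is John Doe."),
    ]
)
print(len(chat["conversationItems"]), chat["modality"])  # 3 text
```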

Get the result

Get the operation-location from the response header. The value looks similar to the following URL:

https://your-language-endpoint/language/analyze-conversations/jobs/12345678-1234-1234-1234-12345678
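
Extracting the job ID from that header value can be sketched as below. The helper name is hypothetical; it takes the last path segment of the URL and ignores any query string that may follow it.

```python
from urllib.parse import urlsplit


def job_id_from_operation_location(operation_location):
    """Return the last path segment of the operation-location URL."""
    path = urlsplit(operation_location).path
    return path.rstrip("/").rsplit("/", 1)[-1]


location = (
    "https://your-language-endpoint/language/analyze-conversations/jobs/"
    "12345678-1234-1234-1234-12345678"
)
print(job_id_from_operation_location(location))
# 12345678-1234-1234-1234-12345678
```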

To get the results of the request, use the following cURL command. Be sure to replace my-job-id with the job ID value that you received in the operation-location response header:

curl -X GET https://your-language-endpoint/language/analyze-conversations/jobs/my-job-id \
-H "Content-Type: application/json" \
-H "Ocp-Apim-Subscription-Key: your-key-here"
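
Because the job runs asynchronously, the GET call usually has to be repeated until the job finishes. The sketch below shows one way to poll. It assumes the response JSON carries a status field whose terminal values are "succeeded" and "failed"; those names are assumptions, not taken from this article. The actual HTTP call is passed in as fetch_status so the loop stays independent of any HTTP library.

```python
import time


def wait_for_job(fetch_status, interval_seconds=2.0, max_attempts=30,
                 sleep=time.sleep):
    """Call fetch_status() until the job reaches a terminal state.

    fetch_status is any callable that returns the parsed job JSON.
    The status values checked here are assumed, not documented above.
    """
    for attempt in range(max_attempts):
        job = fetch_status()
        if job.get("status") in ("succeeded", "failed"):
            return job
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError("job did not reach a terminal state in time")


# Usage with canned responses instead of real HTTP calls.
responses = iter([{"status": "running"}, {"status": "succeeded"}])
result = wait_for_job(lambda: next(responses), sleep=lambda seconds: None)
print(result["status"])  # succeeded
```

Injecting the fetch and sleep callables keeps the loop easy to exercise without network access; in real use, fetch_status would issue the GET request shown above and parse the JSON body.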

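Service and data limits

For information on the size and number of requests you can send per minute and per second, see the service limits article.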