How to: Use Text Analytics for health (preview)

Important

Text Analytics for health is a preview capability provided “AS IS” and “WITH ALL FAULTS.” As such, Text Analytics for health (preview) should not be implemented or deployed in any production use. Text Analytics for health is not intended or made available for use as a medical device, clinical support, diagnostic tool, or other technology intended to be used in the diagnosis, cure, mitigation, treatment, or prevention of disease or other conditions, and no license or right is granted by Microsoft to use this capability for such purposes. This capability is not designed or intended to be implemented or deployed as a substitute for professional medical advice or healthcare opinion, diagnosis, treatment, or the clinical judgment of a healthcare professional, and should not be used as such. The customer is solely responsible for any use of Text Analytics for health. Microsoft does not warrant that Text Analytics for health or any materials provided in connection with the capability will be sufficient for any medical purposes or otherwise meet the health or medical requirements of any person.

Text Analytics for health is a feature of the Text Analytics API service that extracts and labels relevant medical information from unstructured texts such as doctor's notes, discharge summaries, clinical documents, and electronic health records. There are two ways to use this service: the hosted web API and a Docker container.

Features

Text Analytics for health performs Named Entity Recognition (NER), relation extraction, entity negation, and entity linking on English-language text to uncover insights in unstructured clinical and biomedical text.

Named Entity Recognition detects words and phrases mentioned in unstructured text that can be associated with one or more semantic types, such as diagnosis, medication name, symptom/sign, or age.

Health NER

See the entity categories returned by Text Analytics for health for a full list of supported entities. For information on confidence scores, see the Text Analytics transparency note.

Supported languages and regions

Text Analytics for health only supports English-language documents.

The Text Analytics for health hosted web API is currently available only in these regions: West US 2, East US 2, Central US, North Europe, and West Europe.

Request access to the public preview

Fill out and submit the Cognitive Services request form to request access to the Text Analytics for health public preview. You will not be billed for Text Analytics for health usage.

The form requests information about you, your company, and the user scenario for which you'll use the container. After you submit the form, the Azure Cognitive Services team will review it and email you with a decision.

Important

  • On the form, you must use an email address associated with an Azure subscription ID.
  • The Azure resource you use must have been created with the approved Azure subscription ID.
  • Check your email (both inbox and junk folders) for updates on the status of your application from Microsoft.

Using the Docker container

To run the Text Analytics for health container in your own environment, follow these instructions to download and install the container.

Using the client library

The latest prerelease of the Text Analytics client library enables you to call Text Analytics for health using a client object. Refer to the reference documentation, and see the examples on GitHub.

Sending a REST API request

Preparation

Text Analytics for health produces a higher-quality result when you give it smaller amounts of text to work on. This is the opposite of some other Text Analytics features, such as key phrase extraction, which performs better on larger blocks of text. To get the best results from these operations, consider restructuring the inputs accordingly.
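
As a rough illustration of restructuring inputs, the sketch below splits a long note into smaller chunks at sentence boundaries. The function name `split_into_chunks`, the naive sentence-splitting pattern, and the default `max_chars` value are assumptions made for this example; they are not part of the service.

```python
import re


def split_into_chunks(text, max_chars=1000):
    """Split text into chunks of at most max_chars, breaking at sentence ends.

    A single sentence longer than max_chars is kept whole (a simplification
    of this sketch); callers may need to split such sentences further.
    """
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: start a new chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be submitted as its own document. Clinical text often contains abbreviations with periods, so a real splitter would likely need something more careful than this pattern.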

Each JSON document must contain these fields: ID, text, and language.

Document size must be under 5,120 characters per document. For the maximum number of documents permitted in a collection, see the data limits article under Concepts. The collection is submitted in the body of the request.
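
The following minimal sketch builds a request body in the format described above and rejects documents that reach the size limit. The helper name `build_payload` is hypothetical, and using `len()` to count characters is an assumption, since the service may count characters differently.

```python
import json

MAX_CHARS = 5120  # per-document limit stated above ("under 5,120 characters")


def build_payload(texts, language="en"):
    """Return a request body dict with one document per input text."""
    documents = []
    for index, text in enumerate(texts, start=1):
        # len() counts Unicode code points; how the service counts
        # characters is an assumption of this sketch.
        if len(text) >= MAX_CHARS:
            raise ValueError(f"document {index} has {len(text)} characters")
        documents.append({"language": language, "id": str(index), "text": text})
    return {"documents": documents}


if __name__ == "__main__":
    body = build_payload(["Prescribed 50mg benadryl, taken twice daily."])
    print(json.dumps(body, indent=2))
```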

Structure the API request for the hosted asynchronous web API

For both the container and hosted web API, you must create a POST request. You can use Postman, a cURL command, or the API testing console in the Text Analytics for health hosted API reference to quickly construct and send a POST request to the hosted web API in your desired region.

Note

Both the asynchronous /analyze and /health endpoints are only available in the following regions: West US 2, East US 2, Central US, North Europe, and West Europe. To make successful requests to these endpoints, make sure your resource is created in one of these regions.

Below is an example of a JSON file attached to the Text Analytics for health API request's POST body:

example.json

{
  "documents": [
    {
      "language": "en",
      "id": "1",
      "text": "Subject was administered 100mg remdesivir intravenously over a period of 120 min"
    }
  ]
}
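
As a sketch of how such a body might be submitted from Python with only the standard library, the example below builds the POST request separately from sending it. Several details are assumptions rather than facts from this article: the submission path is inferred from the operation-location example (which ends in `/entities/health/jobs/<jobID>`), `Ocp-Apim-Subscription-Key` is the key header generally used by Cognitive Services resources, and the function names are hypothetical. Check the hosted API reference before relying on any of them.

```python
import json
import urllib.request

# Assumed path: the operation-location example in this article ends in
# /entities/health/jobs/<jobID>, so jobs are assumed to be submitted here.
JOBS_PATH = "/text/analytics/v3.1-preview.4/entities/health/jobs"


def build_submit_request(endpoint, key, payload):
    """Build (but do not send) the POST request that submits a job."""
    return urllib.request.Request(
        url=endpoint.rstrip("/") + JOBS_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": key,  # resource key header (assumed)
        },
        method="POST",
    )


def submit_job(request):
    """Send the request and return the operation-location header value."""
    with urllib.request.urlopen(request) as response:
        return response.headers["operation-location"]
```

Keeping request construction separate from the network call makes it easy to inspect the URL, headers, and body before anything is sent.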

Hosted asynchronous web API response

Because this POST request submits a job for the asynchronous operation, there is no text in the response object. However, you need the value of the operation-location key in the response headers to make a GET request to check the status of the job and the output. Below is an example of the value of the operation-location key in the response header of the POST request:

https://<your-custom-subdomain>.cognitiveservices.azure.com/text/analytics/v3.1-preview.4/entities/health/jobs/<jobID>

To check the job status, make a GET request to the URL in the value of the operation-location header of the POST response. The following states are used to reflect the status of a job: NotStarted, running, succeeded, failed, rejected, cancelling, and cancelled.
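
A polling loop over these states could look like the sketch below. It takes the GET call as a callable, so the loop itself needs no network access. Treating succeeded, failed, rejected, and cancelled as the final states is an assumption drawn from the state names, and `wait_for_job` and its parameters are hypothetical.

```python
import time

# States after which the job is assumed not to change any further.
TERMINAL_STATES = {"succeeded", "failed", "rejected", "cancelled"}


def wait_for_job(fetch_job, interval_seconds=2.0, max_attempts=30):
    """Call fetch_job() until the job reaches a terminal state.

    fetch_job is any callable returning the parsed JSON body of the GET
    request (a dict with a "status" key).
    """
    for _ in range(max_attempts):
        job = fetch_job()
        # Compare case-insensitively: the list mixes "NotStarted" and "running".
        if job["status"].lower() in TERMINAL_STATES:
            return job
        time.sleep(interval_seconds)
    raise TimeoutError("job did not finish within the allowed attempts")
```

In practice `fetch_job` would wrap a GET request to the operation-location URL and parse the JSON body.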

You can cancel a job with a NotStarted or running status by making a DELETE HTTP call to the same URL as the GET request. More information on the DELETE call is available in the Text Analytics for health hosted API reference.
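
A minimal sketch of the cancellation step, again using only the standard library, is shown here. The helper names are hypothetical, and the `Ocp-Apim-Subscription-Key` header is assumed to be required on the DELETE call as it is on other Cognitive Services requests.

```python
import urllib.request

# Statuses that the text above says can be cancelled.
CANCELLABLE_STATES = {"notstarted", "running"}


def is_cancellable(status):
    """True when a job with this status can still be cancelled."""
    return status.lower() in CANCELLABLE_STATES


def build_cancel_request(operation_location, key):
    """Build (but do not send) a DELETE request for the job URL."""
    return urllib.request.Request(
        url=operation_location,
        headers={"Ocp-Apim-Subscription-Key": key},  # assumed key header
        method="DELETE",
    )
```

The request can be sent with `urllib.request.urlopen(...)` once `is_cancellable` confirms the job has not already finished.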

The following is an example of the response of a GET request. The output is available for retrieval until the expirationDateTime (24 hours from the time the job was created) has passed, after which the output is purged.

{
    "jobId": "be437134-a76b-4e45-829e-9b37dcd209bf",
    "lastUpdateDateTime": "2021-03-11T05:43:37Z",
    "createdDateTime": "2021-03-11T05:42:32Z",
    "expirationDateTime": "2021-03-12T05:42:32Z",
    "status": "succeeded",
    "errors": [],
    "results": {
        "documents": [
            {
                "id": "1",
                "entities": [
                    {
                        "offset": 25,
                        "length": 5,
                        "text": "100mg",
                        "category": "Dosage",
                        "confidenceScore": 1.0
                    },
                    {
                        "offset": 31,
                        "length": 10,
                        "text": "remdesivir",
                        "category": "MedicationName",
                        "confidenceScore": 1.0,
                        "name": "remdesivir",
                        "links": [
                            {
                                "dataSource": "UMLS",
                                "id": "C4726677"
                            },
                            {
                                "dataSource": "DRUGBANK",
                                "id": "DB14761"
                            },
                            {
                                "dataSource": "GS",
                                "id": "6192"
                            },
                            {
                                "dataSource": "MEDCIN",
                                "id": "398132"
                            },
                            {
                                "dataSource": "MMSL",
                                "id": "d09540"
                            },
                            {
                                "dataSource": "MSH",
                                "id": "C000606551"
                            },
                            {
                                "dataSource": "MTHSPL",
                                "id": "3QKI37EEHE"
                            },
                            {
                                "dataSource": "NCI",
                                "id": "C152185"
                            },
                            {
                                "dataSource": "NCI_FDA",
                                "id": "3QKI37EEHE"
                            },
                            {
                                "dataSource": "NDDF",
                                "id": "018308"
                            },
                            {
                                "dataSource": "RXNORM",
                                "id": "2284718"
                            },
                            {
                                "dataSource": "SNOMEDCT_US",
                                "id": "870592005"
                            },
                            {
                                "dataSource": "VANDF",
                                "id": "4039395"
                            }
                        ]
                    },
                    {
                        "offset": 42,
                        "length": 13,
                        "text": "intravenously",
                        "category": "MedicationRoute",
                        "confidenceScore": 1.0
                    },
                    {
                        "offset": 73,
                        "length": 7,
                        "text": "120 min",
                        "category": "Time",
                        "confidenceScore": 0.94
                    }
                ],
                "relations": [
                    {
                        "relationType": "DosageOfMedication",
                        "entities": [
                            {
                                "ref": "#/results/documents/0/entities/0",
                                "role": "Dosage"
                            },
                            {
                                "ref": "#/results/documents/0/entities/1",
                                "role": "Medication"
                            }
                        ]
                    },
                    {
                        "relationType": "RouteOfMedication",
                        "entities": [
                            {
                                "ref": "#/results/documents/0/entities/1",
                                "role": "Medication"
                            },
                            {
                                "ref": "#/results/documents/0/entities/2",
                                "role": "Route"
                            }
                        ]
                    },
                    {
                        "relationType": "TimeOfMedication",
                        "entities": [
                            {
                                "ref": "#/results/documents/0/entities/1",
                                "role": "Medication"
                            },
                            {
                                "ref": "#/results/documents/0/entities/3",
                                "role": "Time"
                            }
                        ]
                    }
                ],
                "warnings": []
            }
        ],
        "errors": [],
        "modelVersion": "2021-03-01"
    }
}
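
To read entities out of a response shaped like the one above, a small sketch such as the following can walk the documents and map each entity back to the input text through its offset and length. The function names are hypothetical, and slicing a Python string by offset assumes the offsets count the same units as Python string indices, which holds for plain ASCII input like this example.

```python
def iter_entities(job):
    """Yield (document_id, entity) pairs from a parsed GET response."""
    for document in job["results"]["documents"]:
        for entity in document["entities"]:
            yield document["id"], entity


def entity_span(text, entity):
    """Return the slice of the input text that the entity points at."""
    start = entity["offset"]
    return text[start:start + entity["length"]]


if __name__ == "__main__":
    text = ("Subject was administered 100mg remdesivir intravenously "
            "over a period of 120 min")
    # Trimmed-down copy of the response above, for illustration only.
    job = {"results": {"documents": [{"id": "1", "entities": [
        {"offset": 25, "length": 5, "text": "100mg", "category": "Dosage"},
        {"offset": 73, "length": 7, "text": "120 min", "category": "Time"},
    ]}]}}
    for doc_id, entity in iter_entities(job):
        print(doc_id, entity["category"], entity_span(text, entity))
```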

Structure the API request for the container

You can use Postman or the example cURL request below to submit a query to the container you deployed, replacing the serverURL variable with the appropriate value. Note that the version of the API in the URL for the container is different from the hosted API.

curl -X POST 'http://<serverURL>:5000/text/analytics/v3.2-preview.1/entities/health' --header 'Content-Type: application/json' --header 'accept: application/json' --data-binary @example.json

The following JSON is an example of a JSON file attached to the Text Analytics for health API request's POST body:

example.json

{
  "documents": [
    {
      "language": "en",
      "id": "1",
      "text": "Patient reported itchy sores after swimming in the lake."
    },
    {
      "language": "en",
      "id": "2",
      "text": "Prescribed 50mg benadryl, taken twice daily."
    }
  ]
}

Container response body

The following JSON is an example of the Text Analytics for health API response body from the containerized synchronous call:

{
    "documents": [
        {
            "id": "1",
            "entities": [
                {
                    "offset": 25,
                    "length": 5,
                    "text": "100mg",
                    "category": "Dosage",
                    "confidenceScore": 1.0
                },
                {
                    "offset": 31,
                    "length": 10,
                    "text": "remdesivir",
                    "category": "MedicationName",
                    "confidenceScore": 1.0,
                    "name": "remdesivir",
                    "links": [
                        {
                            "dataSource": "UMLS",
                            "id": "C4726677"
                        },
                        {
                            "dataSource": "DRUGBANK",
                            "id": "DB14761"
                        },
                        {
                            "dataSource": "GS",
                            "id": "6192"
                        },
                        {
                            "dataSource": "MEDCIN",
                            "id": "398132"
                        },
                        {
                            "dataSource": "MMSL",
                            "id": "d09540"
                        },
                        {
                            "dataSource": "MSH",
                            "id": "C000606551"
                        },
                        {
                            "dataSource": "MTHSPL",
                            "id": "3QKI37EEHE"
                        },
                        {
                            "dataSource": "NCI",
                            "id": "C152185"
                        },
                        {
                            "dataSource": "NCI_FDA",
                            "id": "3QKI37EEHE"
                        },
                        {
                            "dataSource": "NDDF",
                            "id": "018308"
                        },
                        {
                            "dataSource": "RXNORM",
                            "id": "2284718"
                        },
                        {
                            "dataSource": "SNOMEDCT_US",
                            "id": "870592005"
                        },
                        {
                            "dataSource": "VANDF",
                            "id": "4039395"
                        }
                    ]
                },
                {
                    "offset": 42,
                    "length": 13,
                    "text": "intravenously",
                    "category": "MedicationRoute",
                    "confidenceScore": 1.0
                },
                {
                    "offset": 73,
                    "length": 7,
                    "text": "120 min",
                    "category": "Time",
                    "confidenceScore": 0.94
                }
            ],
            "relations": [
                {
                    "relationType": "DosageOfMedication",
                    "entities": [
                        {
                            "ref": "#/documents/0/entities/0",
                            "role": "Dosage"
                        },
                        {
                            "ref": "#/documents/0/entities/1",
                            "role": "Medication"
                        }
                    ]
                },
                {
                    "relationType": "RouteOfMedication",
                    "entities": [
                        {
                            "ref": "#/documents/0/entities/1",
                            "role": "Medication"
                        },
                        {
                            "ref": "#/documents/0/entities/2",
                            "role": "Route"
                        }
                    ]
                },
                {
                    "relationType": "TimeOfMedication",
                    "entities": [
                        {
                            "ref": "#/documents/0/entities/1",
                            "role": "Medication"
                        },
                        {
                            "ref": "#/documents/0/entities/3",
                            "role": "Time"
                        }
                    ]
                }
            ],
            "warnings": []
        }
    ],
    "errors": [],
    "modelVersion": "2021-03-01"
}

Assertion output

Text Analytics for health returns assertion modifiers, which are informative attributes assigned to medical concepts that provide deeper understanding of the concepts’ context within the text. These modifiers are divided into three categories, each focusing on a different aspect and containing a set of mutually exclusive values. Only one value per category is assigned to each entity. The most common value for each category is the Default value. The service’s output response contains only assertion modifiers that are different from the default value.

CERTAINTY - provides information regarding the presence (present vs. absent) of the concept and how certain the text is regarding its presence (definite vs. possible).

  • Positive [Default]: the concept exists or happened.
  • Negative: the concept does not exist now or never happened.
  • Positive_Possible: the concept likely exists but there is some uncertainty.
  • Negative_Possible: the concept’s existence is unlikely but there is some uncertainty.
  • Neutral_Possible: the concept may or may not exist without a tendency to either side.

CONDITIONALITY - provides information regarding whether the existence of a concept depends on certain conditions.

  • None [Default]: the concept is a fact and not hypothetical and does not depend on certain conditions.
  • Hypothetical: the concept may develop or occur in the future.
  • Conditional: the concept exists or occurs only under certain conditions.

ASSOCIATION - describes whether the concept is associated with the subject of the text or someone else.

  • Subject [Default]: the concept is associated with the subject of the text, usually the patient.
  • Someone_Else: the concept is associated with someone who is not the subject of the text.

Assertion detection represents negated entities as a negative value for the certainty category, for example:

{
                        "offset": 381,
                        "length": 3,
                        "text": "SOB",
                        "category": "SymptomOrSign",
                        "confidenceScore": 0.98,
                        "assertion": {
                            "certainty": "negative"
                        },
                        "name": "Dyspnea",
                        "links": [
                            {
                                "dataSource": "UMLS",
                                "id": "C0013404"
                            },
                            {
                                "dataSource": "AOD",
                                "id": "0000005442"
                            },
    ...
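
Because only non-default modifiers appear in the output, client code may want to fill the defaults back in. The sketch below does that for one entity. The lower-cased default values mirror the "negative" value in the fragment above, but the exact spelling and casing the service returns for the other values is an assumption, as are the function names.

```python
# Defaults taken from the category lists above, lower-cased to match the
# "negative" value in the example; the exact casing is an assumption.
ASSERTION_DEFAULTS = {
    "certainty": "positive",
    "conditionality": "none",
    "association": "subject",
}


def full_assertion(entity):
    """Return all three assertion categories, filling in omitted defaults."""
    return {**ASSERTION_DEFAULTS, **entity.get("assertion", {})}


def is_negated(entity):
    """True when the entity's certainty is 'negative' (case-insensitive)."""
    return full_assertion(entity)["certainty"].lower() == "negative"
```

For the "SOB" entity above, `is_negated` returns True, while an entity with no `assertion` object falls back to the default certainty.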

Relation extraction output

Text Analytics for health recognizes relations between different concepts, including relations between attribute and entity (for example, direction of body structure, dosage of medication) and between entities (for example, abbreviation detection).

  • ABBREVIATION
  • DIRECTION_OF_BODY_STRUCTURE
  • DIRECTION_OF_CONDITION
  • DIRECTION_OF_EXAMINATION
  • DIRECTION_OF_TREATMENT
  • DOSAGE_OF_MEDICATION
  • FORM_OF_MEDICATION
  • FREQUENCY_OF_MEDICATION
  • FREQUENCY_OF_TREATMENT
  • QUALIFIER_OF_CONDITION
  • RELATION_OF_EXAMINATION
  • ROUTE_OF_MEDICATION
  • TIME_OF_CONDITION
  • TIME_OF_EVENT
  • TIME_OF_EXAMINATION
  • TIME_OF_MEDICATION
  • TIME_OF_TREATMENT
  • UNIT_OF_CONDITION
  • UNIT_OF_EXAMINATION
  • VALUE_OF_CONDITION
  • VALUE_OF_EXAMINATION

Note

  • Relations referring to CONDITION may refer to either the DIAGNOSIS entity type or the SYMPTOM_OR_SIGN entity type.
  • Relations referring to MEDICATION may refer to either the MEDICATION_NAME entity type or the MEDICATION_CLASS entity type.
  • Relations referring to TIME may refer to either the TIME entity type or the DATE entity type.

Relation extraction output contains URI references and assigned roles of the entities of the relation type. For example:

                "relations": [
                    {
                        "relationType": "DosageOfMedication",
                        "entities": [
                            {
                                "ref": "#/results/documents/0/entities/0",
                                "role": "Dosage"
                            },
                            {
                                "ref": "#/results/documents/0/entities/1",
                                "role": "Medication"
                            }
                        ]
                    },
                    {
                        "relationType": "RouteOfMedication",
                        "entities": [
                            {
                                "ref": "#/results/documents/0/entities/1",
                                "role": "Medication"
                            },
                            {
                                "ref": "#/results/documents/0/entities/2",
                                "role": "Route"
                            }
                        ]
...
]
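
The `ref` values have the shape of JSON Pointer fragments into the response, so they can be followed with a few lines of code. The sketch below resolves each reference and pairs every role with the text of the entity it points at. Both function names are hypothetical, and the resolver handles only the simple `#/key/index/...` form shown in this article, not escaped pointer tokens.

```python
def resolve_ref(root, ref):
    """Follow a reference such as '#/results/documents/0/entities/1'."""
    if not ref.startswith("#/"):
        raise ValueError(f"unsupported reference: {ref!r}")
    node = root
    for token in ref[2:].split("/"):
        # Numeric tokens index into lists; other tokens are dict keys.
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def describe_relations(root, document):
    """Return (relationType, {role: entity text}) for each relation."""
    described = []
    for relation in document["relations"]:
        roles = {
            member["role"]: resolve_ref(root, member["ref"])["text"]
            for member in relation["entities"]
        }
        described.append((relation["relationType"], roles))
    return described
```

Pass the whole parsed response as `root`, because the references are relative to the top of the response: hosted API references start at `#/results/...`, while the container response in this article uses `#/documents/...`.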

See also