Transparency note for Text Analytics for health

What is a transparency note?


This article assumes that you're familiar with guidelines and best practices for Azure Cognitive Service for Language. For more information, see Transparency note for Azure Cognitive Service for Language.

An AI system includes not only the technology, but also the people who will use it, the people who will be affected by it, and the environment in which it is deployed. Creating a system that is fit for its intended purpose requires an understanding of how the technology works, its capabilities and limitations, and how to achieve the best performance. Microsoft's Transparency Notes are intended to help you understand how our AI technology works, the choices system owners can make that influence system performance and behavior, and the importance of thinking about the whole system, including the technology, the people, and the environment. You can use Transparency Notes when developing or deploying your own system, or share them with the people who will use or be affected by your system.

Microsoft's Transparency Notes are part of a broader effort at Microsoft to put our AI principles into practice. To find out more, see Responsible AI principles from Microsoft.

Introduction to Text Analytics for health

The Text Analytics for health feature of Azure Cognitive Service for Language uses natural language processing techniques to find and label valuable health information, such as diagnoses, symptoms, medications, and treatments, in unstructured text documents. The service can be used for many types of unstructured medical documents, such as discharge summaries, clinical notes, clinical trial protocols, and medical publications.

Text Analytics for health currently performs Named Entity Recognition (NER), relation extraction, entity negation and entity linking for English-language medical text.

  • Named Entity Recognition detects words and phrases mentioned in unstructured text that can be associated with one or more semantic types, such as diagnosis, medication name, symptom/sign, or age.
  • Relation extraction identifies meaningful connections between concepts mentioned in text. For example, a "time of condition" relation is found by associating a condition name with a time.
  • Entity linking disambiguates distinct entities by associating named entities mentioned in text with concepts in a predefined database of concepts, such as the Unified Medical Language System (UMLS).
  • Negation detection identifies entities that are negated in the text. The meaning of medical content is highly affected by modifiers such as negation, which can have critical implications if misinterpreted.
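
To make the four capabilities concrete, the following minimal sketch models what a single processed sentence might yield. The Entity and Relation classes, their field names, and the category and relation labels (SYMPTOM_OR_SIGN, TIME, TIME_OF_CONDITION) are hypothetical illustrations, not the service's actual response format.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical, simplified model of the four kinds of output described above.
# This is NOT the service's response schema; names are illustrative only.

@dataclass
class Entity:
    text: str                  # span as it appears in the document (NER)
    category: str              # semantic type assigned by NER
    negated: bool = False      # result of negation detection
    links: Dict[str, str] = field(default_factory=dict)  # entity linking: source -> concept ID

@dataclass
class Relation:
    relation_type: str         # result of relation extraction
    source: Entity
    target: Entity

# Hypothetical input: "Patient denies chest pain. Fever for 3 days."
chest_pain = Entity("chest pain", "SYMPTOM_OR_SIGN", negated=True)
fever = Entity("Fever", "SYMPTOM_OR_SIGN")
duration = Entity("3 days", "TIME")
relations: List[Relation] = [Relation("TIME_OF_CONDITION", fever, duration)]

# Only affirmed (non-negated) findings should be treated as present.
present = [e.text for e in (chest_pain, fever) if not e.negated]
print(present)
```

Ignoring the negated flag here would wrongly report chest pain as a present symptom, which is why negation detection matters for downstream use.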

An overview of the API and its capabilities, and a full list of the supported entities and relations, are available in the Text Analytics for health documentation.

Example use cases

Text Analytics for health can be used in multiple scenarios across a variety of industries.

Some common customer motivations for using Text Analytics for health include:

  • Assist and automate the processing of medical documents for proper coding to ensure accurate care and billing.
  • Increase efficiency of analyzing healthcare data to help drive success of value-based care models (e.g. Medicare).
  • Improve the aggregation of key data for tracking trends of patient care and history without adding overhead to healthcare providers.
  • Make progress towards adopting HL7 standards, which provide a framework for the exchange, integration, sharing, and retrieval of electronic health information in support of clinical practice and the management, delivery, and evaluation of health services.

The following are example use cases:

  • Insights and statistics extraction: Identify medical entities such as symptoms, medications, and diagnoses in clinical notes and other clinical documents. Use this information to produce insights and statistics on patient populations, and to search clinical documents, research documents, and publications.
  • Creation of predictive analytics and predictive models from historic data: Power solutions for planning, decision support, risk analysis, and more, based on prediction models created from historic data.
  • Assisted annotation and curation: Support solutions for clinical data annotation and curation, for example clinical coding, digitization of manually created data, and automation of registry reporting.
  • Displaying or analyzing medical information: Support solutions for displaying or analyzing medical information, for example for reporting purposes, supporting quality assurance processes, or flagging possible errors for human review.
  • Decision support: Enable solutions that provide information that can assist a human in their work or support a decision made by a human.

Considerations when choosing a use case

Text Analytics for health is a valuable tool for managing and extracting knowledge from unstructured medical text. However, given the sensitive nature of health-related data, it's important to consider your use cases carefully. In all cases, a human should make the decisions, assisted by the information the system returns, and there should be a way to review the source data and correct errors.

Do not use

  • Do not use this service as a medical device, clinical support tool, or diagnostic tool for the diagnosis, cure, mitigation, treatment, or prevention of disease or other conditions without human intervention. A qualified medical professional should always do due diligence and verify the source data behind any patient care decision.
  • Do not use this service to automatically grant or deny medical services or health insurance without human intervention. Because these decisions are extremely impactful, the source data should always be verified for any decision that affects coverage level.
  • Do not use personal health information for any purpose for which consent was not obtained. Health information has special protections regarding consent. Make sure that all data you use has consent for the purpose of your system.

Carefully consider

  • Carefully consider scenarios that use detected entities to automatically update patient records without human intervention. Always make sure there is a way to report, trace and correct any errors to avoid incorrect data propagating to other systems and affecting patient records.
  • Carefully consider scenarios that use detected entities as a part of patient billing without human intervention. Always make sure there is a way for providers and patients to report, trace, and correct data that is generating incorrect billing.
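
One way to keep a human in the loop is to treat detected entities as proposals that a reviewer must approve before they reach a patient record, with every decision logged so errors can be traced. The following is a minimal sketch under that assumption; the Proposal and ReviewQueue classes and the sample data are hypothetical and are not part of the service.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical sketch: detected entities are never written to a record directly.
# They wait in a queue until a reviewer approves or rejects them, and every
# decision is logged with its source document so errors can be traced later.

@dataclass
class Proposal:
    patient_id: str
    entity_text: str
    category: str
    source_document: str  # kept so the reviewer can check the source data

class ReviewQueue:
    def __init__(self) -> None:
        self.pending: List[Proposal] = []
        self.record: Dict[str, List[Tuple[str, str]]] = {}
        self.audit_log: List[str] = []

    def propose(self, proposal: Proposal) -> None:
        self.pending.append(proposal)

    def review(self, index: int, reviewer: str, approve: bool) -> None:
        proposal = self.pending.pop(index)
        decision = "approved" if approve else "rejected"
        self.audit_log.append(
            f"{reviewer} {decision} {proposal.category}:{proposal.entity_text} "
            f"from {proposal.source_document} for {proposal.patient_id}"
        )
        if approve:  # only approved proposals change the record
            self.record.setdefault(proposal.patient_id, []).append(
                (proposal.category, proposal.entity_text)
            )

queue = ReviewQueue()
queue.propose(Proposal("p1", "COVID-19", "EXAMINATION_NAME", "note-17"))
queue.propose(Proposal("p1", "ER", "CARE_ENVIRONMENT", "note-17"))
queue.review(0, reviewer="reviewer_a", approve=False)  # wrong category: rejected
queue.review(0, reviewer="reviewer_a", approve=True)
print(queue.record)
```

The audit log is what makes it possible to report, trace, and correct an error after the fact rather than letting it propagate silently.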

Characteristics and limitations

The system can produce both false positive and false negative errors for each capability supported by the health feature. The following sections describe examples of each error type.

Named Entity Recognition (NER)

False positive

When the system identifies an entity but assigns it to the wrong category. For example, COVID-19 in the example below was identified as EXAMINATION_NAME. COVID-19 is not an examination name; it is a diagnosis.

Named Entity Recognition False Positive

False negative

When an entity should have been identified, but wasn't. For example, the entity "ER" in the example below should have been identified as CARE_ENVIRONMENT but was not. If an entity is not recognized, it also cannot be linked to a code.

Named Entity Recognition False Negative

Relation Extraction

False positive

When a relation should not have been recognized, but was. For example, the value of the AST examination was incorrectly attributed to the ALT examination, which already had a measurement value assigned to it.

False negative

When a relation should have been recognized, but wasn't. For example, in the same text, the measurement value of 45 was not assigned to the AST examination.

Relation Extraction False Negative

Entity Linking

False positive

Entity linking requires an exact match with the recognized text, so an entity linking false positive can only occur when named entity recognition produces a false positive and the recognized text exactly matches the spelling of a valid concept.

False negative

Because entity linking requires an exact match with the original text, a false negative can occur when there is enough signal to recognize the entity but the entity is misspelled in the text. For example, in the text below, "therapies" is misspelled as "therapis", so the entity is not linked to UMLS concept C0087111.

Entity Linking False Negative
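
The effect of exact matching can be illustrated with a toy lookup. In this sketch the one-entry CONCEPTS dictionary and the link function are hypothetical stand-ins for a concept database such as UMLS and for the service's internal linking; only the "therapies" to C0087111 pairing comes from the example above.

```python
from typing import Optional, Tuple

# Toy illustration of exact-match entity linking (not the service's implementation).
# The dictionary is a hypothetical stand-in for a concept database such as UMLS.
CONCEPTS = {"therapies": ("UMLS", "C0087111")}

def link(entity_text: str) -> Optional[Tuple[str, str]]:
    """Return (source, concept ID) only when the recognized text matches exactly."""
    return CONCEPTS.get(entity_text)

print(link("therapies"))  # linked
print(link("therapis"))   # misspelled: no link, even if NER recognized the entity
```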

Negation Detection

False positive

When the system identifies a negation that does not exist in the text. For example, in the text below, the entity "respiratory disease" is incorrectly negated as a DIAGNOSIS for COVID-19.

Negation Detection False Positive

False negative

When the system fails to identify a negation. In the example below, the MEDICATION_NAME entity should be negated because the patient did not respond to it.

Negation Detection False Negative
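
One way to quantify these error types on your own data is to compare the system's output with annotations made by a qualified reviewer and count false positives and false negatives. The following minimal sketch does this for named entity recognition; the evaluate function and the sample annotations are hypothetical, and matching entities by text rather than by character offsets is a simplifying assumption.

```python
from typing import Dict, Set, Tuple

Annotation = Tuple[str, str]  # (entity text, category)

def evaluate(predicted: Set[Annotation], gold: Set[Annotation]) -> Dict[str, float]:
    """Count errors for one capability (here NER) against reviewer annotations."""
    tp = len(predicted & gold)
    fp = len(predicted - gold)  # returned by the system but absent from the gold set
    fn = len(gold - predicted)  # in the gold set but missed by the system
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}

# Hypothetical data mirroring the NER examples: COVID-19 gets the wrong category,
# and "ER" is missed entirely.
predicted = {("COVID-19", "EXAMINATION_NAME"), ("fever", "SYMPTOM_OR_SIGN")}
gold = {("COVID-19", "DIAGNOSIS"), ("fever", "SYMPTOM_OR_SIGN"), ("ER", "CARE_ENVIRONMENT")}
print(evaluate(predicted, gold))
```

Note that under this scoring a wrongly categorized entity counts twice: once as a false positive (the wrong label) and once as a false negative (the missing correct label).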

Best practices to improve performance

  • In all cases, it is important to do a full evaluation of the performance you are achieving on the real data your system will process. Using real data is key to understanding the performance you can expect to see in your specific scenario.
  • Currently the health feature only supports English text. If there are other languages embedded within the input text, the quality of the output may be affected.
  • Incorrect spelling may affect the output. Specifically, entity linking looks for terms and synonyms based on correct spelling. If a drug name, for example, is misspelled, the system may have enough information to recognize that the text is a drug name, but it may not link it to a concept as it would for the correctly spelled drug name.
  • The system does not yet recognize hypothetical context in text. For example, if a doctor writes "if the patient starts to experience nausea, I would recommend starting Dramamine b.i.d.", the system might recognize nausea as an existing symptom rather than a hypothetical one. Review your data and make sure you have other ways to recognize hypotheticals in it.
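
As one possible way to account for hypotheticals, you could flag sentences that contain conditional cues so a reviewer checks whether entities found in them describe existing conditions. The following is a minimal sketch under stated assumptions: the cue list, the naive sentence splitting (which would mis-split text such as "Dr. Smith"), and the sentences_needing_review function are all hypothetical and are not part of the service.

```python
import re
from typing import List

# Hypothetical heuristic, not part of the service: flag sentences with simple
# conditional cues for human review. The cue list is illustrative and incomplete.
HYPOTHETICAL_CUES = re.compile(r"\b(if|should|in case|would recommend)\b", re.IGNORECASE)

def sentences_needing_review(text: str) -> List[str]:
    # Naive split on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if HYPOTHETICAL_CUES.search(s)]

note = ("Patient reports mild headache. If the patient starts to experience nausea, "
        "I would recommend starting Dramamine b.i.d.")
for sentence in sentences_needing_review(note):
    print("Review:", sentence)
```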

See also