Transparency note for Text Analytics for Health


This article assumes that you're familiar with guidelines and best practices for the Text Analytics service. For more information, see Transparency note for Text Analytics.

Text Analytics for health uses natural language processing techniques to find and label valuable health information, such as diagnoses, symptoms, medications, and treatments, in unstructured text documents. The service can be used for many different types of unstructured medical documents, such as discharge summaries, clinical notes, clinical trial protocols, and medical publications.

Text Analytics for health currently performs Named Entity Recognition (NER), relation extraction, entity negation and entity linking for English-language medical text.

  • Named Entity Recognition detects words and phrases mentioned in unstructured text that can be associated with one or more semantic types, such as diagnosis, medication name, symptom/sign, or age.
  • Relation extraction identifies meaningful connections between concepts mentioned in text. For example, a "time of condition" relation is found by associating a condition name with a time.
  • Entity linking disambiguates distinct entities by associating named entities mentioned in text to concepts found in a predefined database of concepts, such as the Unified Medical Language System (UMLS).
  • Negation detection identifies entities that the text states are absent or ruled out. The meaning of medical content is highly affected by modifiers such as negation, and a negation that is missed or misinterpreted can have critical implications. Text Analytics for health supports negation detection for the different entities mentioned in the text.
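
To make the four outputs above concrete, the following sketch models them as plain Python data. It is only an illustration: the class and field names (`Entity`, `Relation`, `is_negated`, `links`), the category strings, and the sample sentence are assumptions made for this note, not the service's response schema. The only value taken from this article is the concept identifier UMLS: C0087111 for "therapies".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    text: str                  # surface form found in the document (NER)
    category: str              # semantic type; strings here are illustrative
    offset: int                # character position in the source text
    is_negated: bool = False   # negation detection result
    links: List[str] = field(default_factory=list)  # entity linking result


@dataclass
class Relation:
    relation_type: str         # e.g. a "time of condition" relation
    source: Entity
    target: Entity


def make_entity(doc: str, phrase: str, category: str, **kwargs) -> Entity:
    # Compute the offset from the text so the example cannot drift out of sync.
    return Entity(phrase, category, doc.index(phrase), **kwargs)


# Hypothetical sample text, invented for this sketch.
doc = "No fever. Asthma since 2019; continue therapies."
fever = make_entity(doc, "fever", "SYMPTOM_OR_SIGN", is_negated=True)
asthma = make_entity(doc, "Asthma", "DIAGNOSIS")
year = make_entity(doc, "2019", "TIME")
therapies = make_entity(doc, "therapies", "TREATMENT_NAME",
                        links=["UMLS: C0087111"])

relations = [Relation("TIME_OF_CONDITION", asthma, year)]

# Downstream code would typically treat negated entities separately.
asserted = [e.text for e in (fever, asthma, year, therapies) if not e.is_negated]
```

Keeping the negation flag and the links on the entity itself means a consumer cannot read a finding such as "fever" without also seeing that it was negated.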

For an overview of the API and its capabilities, and for the full list of supported entities and relations, see the Text Analytics for health documentation.

Example use cases

Text Analytics for health can be used in multiple scenarios across a variety of industries.

Some common customer motivations for using Text Analytics for health include:

  • Assist and automate the processing of medical documents for proper coding to ensure accurate care and billing.
  • Increase efficiency of analyzing healthcare data to help drive success of value-based care models (e.g. Medicare).
  • Improve the aggregation of key data for tracking trends of patient care and history without adding overhead to healthcare providers.
  • Make progress toward adopting HL7 standards, a framework for the exchange, integration, sharing, and retrieval of electronic health information in support of clinical practice and of the management, delivery, and evaluation of health services.

Example use cases:

Use case | Description
Insights and statistics extraction | Identify medical entities such as symptoms, medications, and diagnoses in clinical notes and other clinical documents. Use this information to produce insights and statistics on patient populations and to search clinical documents, research documents, and publications.
Creation of predictive analytics and predictive models from historic data | Power solutions for planning, decision support, risk analysis, and more, based on prediction models created from historic data.
Assisted annotation and curation | Support solutions for clinical data annotation and curation. For example: supporting clinical coding, digitizing data that was created manually, and automating registry reporting.
Support solutions for displaying or analyzing medical information | Support solutions for displaying or analyzing medical information. For example: reporting, supporting quality assurance processes, and flagging possible errors to be reviewed by a human.
Decision support | Enable solutions that provide information that can assist a human in their work or support a decision made by a human.

Considerations when choosing a use case

Text Analytics for health is a valuable tool for managing unstructured medical text and extracting knowledge from it. However, given the sensitive nature of health-related data, it's important to consider your use cases carefully. In all cases, a human should be making decisions, assisted by the information the system returns, and there should be a way to review the source data and correct errors.

  • Avoid scenarios that use this service as a medical device, clinical support, or diagnostic tool in the diagnosis, cure, mitigation, treatment, or prevention of disease or other conditions without human intervention. A qualified medical professional should always do due diligence and verify the source data regarding patient care decisions.
  • Avoid scenarios related to automatically granting or denying medical service or health insurance without human intervention. Since this is an extremely impactful decision, the source data should always be verified for decisions that affect coverage level.
  • Carefully consider scenarios that use detected entities to automatically update patient records without human intervention. Always make sure there is a way to report, trace and correct any errors to avoid incorrect data propagating to other systems and affecting patient records.
  • Carefully consider scenarios that use detected entities as a part of patient billing without human intervention. Always make sure there is a way for providers and patients to report, trace, and correct data that is generating incorrect billing.
  • Do not use personal health information for a purpose for which consent was not obtained. Health information has special protections regarding consent. Make sure that the consent given for all data you use covers the purpose of your system.
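
The considerations above about record updates and billing share one pattern: detected entities become proposals that a human approves or rejects, and every decision is traceable. The sketch below shows that pattern under stated assumptions. `ProposedUpdate`, `ReviewQueue`, the field names, and the sample identifiers are hypothetical and are not part of the service.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProposedUpdate:
    """A change suggested by entity extraction, not yet applied."""
    patient_id: str
    field_name: str
    value: str
    source_text: str  # original wording, kept so a reviewer can verify it
    status: str = "pending"


@dataclass
class ReviewQueue:
    records: Dict[str, Dict[str, str]] = field(default_factory=dict)
    audit_log: List[str] = field(default_factory=list)

    def approve(self, update: ProposedUpdate, reviewer: str) -> None:
        # Only an explicit human approval writes to the patient record.
        self.records.setdefault(update.patient_id, {})[update.field_name] = update.value
        update.status = "approved"
        self.audit_log.append(
            f"{reviewer} approved {update.field_name}={update.value!r}")

    def reject(self, update: ProposedUpdate, reviewer: str, reason: str) -> None:
        # Rejected proposals never reach the record but remain traceable.
        update.status = "rejected"
        self.audit_log.append(
            f"{reviewer} rejected {update.field_name}: {reason}")


queue = ReviewQueue()
proposal = ProposedUpdate(
    "patient-001", "diagnosis", "COVID-19",
    source_text="(sentence the entity was extracted from)")

record_before_review = dict(queue.records)  # empty: nothing is applied automatically
queue.approve(proposal, reviewer="dr_example")
```

Storing the source text with each proposal lets the reviewer check the extraction against the original wording, and the audit log provides the trace needed to correct errors that have already propagated.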

Characteristics and limitations

The system can produce both false positive and false negative errors for each feature supported by Text Analytics for health. Examples of the potential error types for each feature are described in the sections below.
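
One way to quantify these two error types on your own data is to compare the system's output with human annotations and count the mismatches. The sketch below is a minimal illustration, assuming each entity is represented as a `(text, category)` pair. The function name `score` and the "fever" row are assumptions added for the example, while the COVID-19 and ER rows mirror the NER examples in this section.

```python
from typing import Dict, Iterable, Tuple

Label = Tuple[str, str]  # (entity text, category)


def score(gold: Iterable[Label], predicted: Iterable[Label]) -> Dict[str, float]:
    gold_set, predicted_set = set(gold), set(predicted)
    true_positives = len(gold_set & predicted_set)
    false_positives = len(predicted_set - gold_set)   # returned but not annotated
    false_negatives = len(gold_set - predicted_set)   # annotated but not returned
    returned = true_positives + false_positives
    expected = true_positives + false_negatives
    return {
        "false_positives": false_positives,
        "false_negatives": false_negatives,
        "precision": true_positives / returned if returned else 0.0,
        "recall": true_positives / expected if expected else 0.0,
    }


# Human annotations for a small sample.
gold = [("fever", "SYMPTOM_OR_SIGN"),
        ("COVID-19", "DIAGNOSIS"),
        ("ER", "CARE_ENVIRONMENT")]

# System output: COVID-19 has the wrong category and "ER" is missing.
predicted = [("fever", "SYMPTOM_OR_SIGN"),
             ("COVID-19", "EXAMINATION_NAME")]

result = score(gold, predicted)
```

With this exact-match scoring, a wrongly categorized entity counts twice: once as a false positive for the wrong label and once as a false negative for the missed correct label.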

Named Entity Recognition (NER)

False positive

When the system assigns an entity to the wrong category. For example, COVID-19 in the example below was identified as EXAMINATION_NAME. COVID-19 is not an examination name; it is a diagnosis.

[Image: Named Entity Recognition false positive]

False negative

When an entity should have been identified, but wasn't. For example, the entity "ER" in the example below should be identified as CARE_ENVIRONMENT but was not. If the entity was not properly recognized, then the linked code would also not be recognized.

[Image: Named Entity Recognition false negative]

Relation Extraction

False positive

When a relation should not have been recognized but was. For example, the value of the AST examination was incorrectly attributed to the ALT examination, which already has a measurement value assigned to it.

False negative

When a relation should have been recognized, but wasn't. For example, in the same example, the measurement value of 45 was not assigned to the AST examination.

[Image: Relation Extraction false negative]
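
Both relation errors described above leave a detectable trace: one examination ends up with two values and another with none. The sketch below is a hedged consistency check that flags such cases for human review. It assumes relations are available as `(examination, value)` pairs. The function names and the ALT value of "30" are hypothetical, and only the AST value of 45 comes from the example above.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

ValueRelation = Tuple[str, str]  # (examination name, measurement value)


def examinations_with_conflicting_values(
        relations: Sequence[ValueRelation]) -> Dict[str, List[str]]:
    # An examination linked to more than one value suggests a false positive.
    values_by_exam: Dict[str, List[str]] = defaultdict(list)
    for exam, value in relations:
        values_by_exam[exam].append(value)
    return {exam: vals for exam, vals in values_by_exam.items() if len(vals) > 1}


def examinations_without_values(
        exams: Sequence[str], relations: Sequence[ValueRelation]) -> List[str]:
    # An examination with no value at all suggests a false negative.
    with_value = {exam for exam, _ in relations}
    return [exam for exam in exams if exam not in with_value]


exams = ["ALT", "AST"]
# The value 45 belongs to AST but was attached to ALT.
relations = [("ALT", "30"), ("ALT", "45")]

conflicts = examinations_with_conflicting_values(relations)
missing = examinations_without_values(exams, relations)
```

A check like this cannot tell which of the two values is correct. It only narrows down where a reviewer should look.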

Entity Linking

False positive

Entity linking is an exact match with the text that is recognized, so a false positive for entity linking would only happen when the source text has a false positive for named entity recognition and the source text is spelled exactly as a valid entity.

False negative

Since entity linking is an exact match with the original text, you could get a false negative if there's enough signal to properly recognize the entity, but the entity is misspelled in the text. For example, in the text below, where "therapies" was spelled "therapis", you would not get the linked entity UMLS: C0087111.

[Image: Entity Linking false negative]
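
The effect of exact matching can be reproduced with a toy lookup table. The sketch below is not how the service is implemented. `CONCEPTS`, `link_exact`, and `suggest_spelling` are hypothetical names, and the table holds only the single concept mentioned above. It shows why the misspelled term gets no link, and how a fuzzy comparison from Python's standard `difflib` module could flag a likely misspelling for correction before the text is submitted.

```python
import difflib
from typing import Optional

# Toy concept table with the one entry taken from the example above.
CONCEPTS = {"therapies": "UMLS: C0087111"}


def link_exact(term: str) -> Optional[str]:
    # Exact match only: any spelling difference yields no link.
    return CONCEPTS.get(term.lower())


def suggest_spelling(term: str, cutoff: float = 0.8) -> Optional[str]:
    # Closest known term above the similarity cutoff, if there is one.
    matches = difflib.get_close_matches(
        term.lower(), list(CONCEPTS), n=1, cutoff=cutoff)
    return matches[0] if matches else None


misspelled_link = link_exact("therapis")      # no link for the misspelling
suggestion = suggest_spelling("therapis")     # candidate correction
corrected_link = link_exact(suggestion) if suggestion else None
```

A suggestion like this is best shown to a person rather than applied automatically, because drug names and other medical terms can differ by only a letter or two.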

Negation Detection

False positive

When the system identifies a negation that should not exist in the text. For example, in the text below, the entity "respiratory disease" is incorrectly negated as a DIAGNOSIS for COVID-19.

[Image: Negation Detection false positive]

False negative

When a negation is not properly identified. In the example below, the MEDICATION_NAME entity should be negated because the patient did not respond to the medication.

[Image: Negation Detection false negative]

Best practices to improve performance

  • In all cases, it is important to do a full evaluation of the performance you are achieving on the real data your system will process. Using real data is key to understanding the performance you can expect to see in your specific scenario.
  • Currently, Text Analytics for health supports only English text. If other languages are embedded within the input text, the quality of the output may be affected.
  • Incorrect spelling may affect the output. Specifically, entity linking is looking for terms and synonyms based on correct spelling. If a drug name, for example, is spelled incorrectly, the system may have enough information to recognize that the text is a drug name, but it may not have the link identified as it would for the correctly spelled drug name.
  • The system does not yet recognize the context of a hypothetical in text. For example, if the doctor were to say "if the patient starts to experience nausea, I would recommend to start Dramamine b.i.d", the system might recognize nausea as an existing symptom rather than a hypothetical one. Review your data and ensure you have other ways to account for hypotheticals in your data.
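
One possible way to account for hypotheticals is a coarse pre-filter that flags entities appearing in sentences with conditional wording, so that a person can review them. The sketch below is a naive heuristic written for this note, not a feature of the service. The cue list, the sentence splitter, and the function names are assumptions, and the "headache" sentence is invented to contrast with the example above.

```python
import re
from typing import List

# Illustrative cue words; a real list would need clinical review.
HYPOTHETICAL_CUES = re.compile(
    r"\b(if|in case|should|unless|would)\b", re.IGNORECASE)


def split_sentences(text: str) -> List[str]:
    # Very rough splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def flag_possible_hypotheticals(text: str, entity_texts: List[str]) -> List[str]:
    flagged = []
    for sentence in split_sentences(text):
        if HYPOTHETICAL_CUES.search(sentence):
            lowered = sentence.lower()
            flagged += [e for e in entity_texts if e.lower() in lowered]
    return flagged


note = ("Patient reports headache. If the patient starts to experience "
        "nausea, I would recommend to start Dramamine b.i.d.")
needs_review = flag_possible_hypotheticals(note, ["headache", "nausea"])
```

Here only "nausea" is flagged, because it shares a sentence with "If" and "would". A heuristic this simple will both over-flag and under-flag, so it is suited to routing items for review and not to changing the extracted data.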

Public preview gating

To ensure TA for health is used for scenarios it was designed for, we are making this technology available to customers through an application process. To get access to TA for health, you will need to start by filling out our online intake form. Begin your application here.

Access to TA for health is subject to Microsoft's sole discretion based on our eligibility criteria, vetting process, and availability to support a limited number of customers during this gated preview. In public preview, we are looking for customers who have a significant relationship with Microsoft and are interested in working with us on the use cases described above and on additional scenarios that are in keeping with our responsible AI commitments.
