Transparency note for summarization

This article discusses the document summarization feature of Azure Cognitive Services. Specifically, you learn how to use the technology responsibly, through general principles and example use cases.

What is a transparency note?

An AI system includes not only the technology, but also the people who will use it, the people who will be affected by it, and the environment in which it is deployed. Creating a system that is fit for its intended purpose requires an understanding of how the technology works, its capabilities and limitations, and how to achieve the best performance.

Microsoft transparency notes are intended to help you understand how our AI technology works, and the choices that you as a system owner can make that influence system performance and behavior. It's important to think about the whole system, including the technology, the people, and the environment. You can use transparency notes when you develop or deploy your own system, or share them with the people who will use or be affected by your system.

Transparency notes are part of a broader effort at Microsoft to put our AI principles into practice. To find out more, see Responsible AI principles from Microsoft.

General principles

Consider the following when deciding how to use and implement AI-powered products and features responsibly:

  • Will this product or feature perform well in my scenario? Before deploying AI into your scenario, test how it performs by using real-life data, and make sure it can deliver the accuracy you need.
  • Are you equipped to identify and respond to errors? AI-powered products and features aren't 100 percent accurate, so consider how you will identify and respond to any errors that might occur.

Document summarization uses natural language processing techniques to generate a summary for documents. There are two general approaches to automatic summarization: extractive and abstractive. This feature uses the extractive approach.

This feature extracts sentences that collectively represent the most important or relevant information within the original content. It locates key sentences in an unstructured text document, and collectively, these sentences convey the main idea of the document.

Document summarization helps users handle content that is too long to read comfortably. The extractive process condenses articles, papers, or documents to their key sentences. The feature is provided as an API that developers can use to build intelligent solutions based on the extracted information, supporting a variety of use cases.

Extractive summarization returns the extracted sentences, each with a rank score and its position in the original document. A sentence's rank score, between 0 and 1, indicates how important or relevant that sentence is to the main idea of the document. The service returns the highest-scored sentences per request; for example, if you request a three-sentence summary, it returns the three highest-scored sentences.
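The rank-score behavior described above can be illustrated with a small self-contained sketch. This is a toy scorer based on a simple word-frequency heuristic, not the model the Azure service actually uses; the function name and result fields here are hypothetical and chosen only to mirror the shape of the response described in this article (sentence text, a 0–1 rank score, and the sentence's position).

```python
import re
from collections import Counter

def extract_summary(document: str, max_sentences: int = 3):
    """Toy extractive summarizer (illustrative only, not the Azure model):
    scores each sentence by the relative frequency of its words and
    returns the highest-scored sentences."""
    # Split into sentences on end-of-sentence punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    words = re.findall(r"\w+", document.lower())
    freq = Counter(words)
    top = freq.most_common(1)[0][1] if freq else 1

    scored = []
    for position, sentence in enumerate(sentences):
        tokens = re.findall(r"\w+", sentence.lower())
        # Rank score in [0, 1]: average relative frequency of the sentence's words.
        score = sum(freq[t] for t in tokens) / (len(tokens) * top) if tokens else 0.0
        scored.append({"text": sentence, "rank_score": round(score, 3), "position": position})

    # Return only the highest-scored sentences, like a three-sentence summary request.
    return sorted(scored, key=lambda s: s["rank_score"], reverse=True)[:max_sentences]
```

As in the service response, each returned item carries the sentence text, a score bounded by 0 and 1, and the sentence's original position, so callers can re-order the summary to match the source document if they prefer.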

Example use cases

You might want to use this feature if you need to:

  • Assist the processing of documents to improve efficiency.
  • Distill critical information from lengthy documents, reports, and other text forms.
  • Highlight key sentences in documents.
  • Quickly skim documents in a library.
  • Generate news feed content.

You can also use extractive summarization in multiple scenarios, across a variety of industries. For example, you can use extractive summarization to:

  • Extract key information from public news articles, to produce insights such as trends and news spotlights.
  • Classify documents by their key contents.
  • Distill important information from long documents to empower solutions such as search, question and answer formats, and decision support.
  • Empower solutions for clustering documents by their relevant content.

Considerations when you choose a use case

We encourage you to come up with use cases that most closely match your own particular context and requirements. Draw on actionable information that enables responsible integration in your use cases, and conduct your own testing specific to your scenarios.

The summarization models reflect certain societal views that are over-represented in the training data, relative to other, marginalized perspectives. The models reflect societal biases and other undesirable content present in the training data. As a result, we caution against using the models in high-stakes scenarios, where unfair, unreliable, or offensive behavior might be extremely costly or lead to harm.

  • Avoid real-time, critical safety alerting. Don't rely on this feature for scenarios that require real-time alerts to trigger intervention to prevent injury. For example, don't rely on summarization to shut down a piece of heavy machinery when a harmful action is detected.

  • The feature isn't suitable for scenarios where up-to-date, factually accurate information is crucial, unless you have human reviewers. The service doesn't have information about current events after its training date, probably has missing knowledge about some topics, and might not always produce factually accurate information.

  • Avoid scenarios in which the use or misuse of the system could have a consequential impact on life opportunities or legal status. For example, avoid scenarios in which the AI system could affect an individual's legal status or legal rights. Additionally, avoid scenarios in which the AI system could affect an individual's access to credit, education, employment, healthcare, housing, insurance, social welfare benefits, services, opportunities, or the terms on which they are provided.

Next steps