Data and privacy for question answering

This article provides high-level details about how question answering processes data. It's important to remember that you are responsible for your use and implementation of this technology, including compliance with all laws and regulations that apply to you. For example, it's your responsibility to:

  • Understand where your data is processed and stored by the question answering service in order to meet regulatory obligations for your application.

  • Inform the users of your applications that information such as chat logs will be recorded and can be used for further processing.

  • Ensure that you have all necessary licenses, proprietary rights, or other permissions required for the content in your knowledge base that is used as the basis for developing the QnAs.

What data does question answering process?

Question answering uses several Azure services, each with a different purpose. For a detailed explanation of how these services are used, see the question answering documentation.

Question answering handles two kinds of customer data:

  • Data sources: Any sources (documents or URLs) added to question answering via the portal or APIs are parsed to extract the QnA pairs. These QnAs are stored in an Azure Cognitive Search service in the customer's subscription. After extracting the QnA pairs, the management service discards the data sources, so no customer data is stored with the question answering service.

  • Chat logs: If diagnostic logs are turned on, all chat logs are stored in the Azure Monitor service in the customer's subscription.

In both of these cases, Microsoft acts as a data processor. Data is stored and served directly from the customer's subscription.

How does question answering process data?

There are two main parts in the question answering stack that process data:

  • Extraction of question and answer pairs: Any data sources added by the user to the knowledge base are parsed to extract these pairs. The algorithm looks for a repeating pattern in the source documents, or for a particular layout of the content, to determine which sections constitute a question and an answer. Question answering optimizes the extraction for display in a chat bot, which typically has a small surface area. The extracted QnAs are stored in Azure Cognitive Search.

  • Search for the best answer match: After the Azure Cognitive Search index is built, the ranker looks for the best match for each incoming user question by applying natural language processing techniques.
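To make the idea of pattern-based extraction more concrete, here is a minimal sketch in Python. It is not the algorithm the service uses, which is not described in this article. It assumes one very simple repeating pattern: a line ending in a question mark starts a new question, and the non-empty lines that follow form its answer. The function name extract_qna_pairs and the sample text are hypothetical and exist only for illustration.

```python
from typing import List, Optional, Tuple


def extract_qna_pairs(text: str) -> List[Tuple[str, str]]:
    """Illustrative heuristic only: pair each line ending in '?' with
    the non-empty lines that follow it, up to the next question."""
    pairs: List[Tuple[str, str]] = []
    question: Optional[str] = None
    answer_lines: List[str] = []

    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # blank lines carry no content
        if line.endswith("?"):
            # A new question closes the previous pair, if it had an answer.
            if question is not None and answer_lines:
                pairs.append((question, " ".join(answer_lines)))
            question = line
            answer_lines = []
        elif question is not None:
            # Text before the first question (titles, intros) is ignored.
            answer_lines.append(line)

    # Flush the final pair once the input is exhausted.
    if question is not None and answer_lines:
        pairs.append((question, " ".join(answer_lines)))
    return pairs


# Hypothetical FAQ-style source document.
sample = """
FAQ

How do I reset my password?
Open the sign-in page and select "Forgot password".
Follow the emailed link.

Where is my data stored?
In your own subscription.
"""

pairs = extract_qna_pairs(sample)
for question, answer in pairs:
    print(f"Q: {question}\nA: {answer}\n")
```

A real extractor would also need to handle layout cues such as headings, tables, and nested sections. This sketch covers only the "repeating pattern" case and drops any question that has no answer text.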

How is data retained and what customer controls are available?

The question answering knowledge base and the user chat logs are both stored in the customer's own subscription: the knowledge base in Azure Cognitive Search and the chat logs in Azure Monitor.

  • Only users who have access to the customer's Azure subscription can view the chat logs stored in Azure Monitor. The owner of the subscription can control who has access by using role-based access control.

  • To control access to a question answering knowledge base, you can assign users the appropriate roles from the set of roles specific to question answering.

To learn more about privacy and security commitments, see the Microsoft Trust Center.

Next steps