
What is Conversation Transcription in meetings (Preview)?

Conversation Transcription is a speech-to-text solution that combines speech recognition, speaker identification, and attribution of each sentence to its speaker (also known as diarization) to provide real-time and/or asynchronous transcription of any conversation. It distinguishes the speakers in a conversation to determine who said what and when, and makes it easy for developers to add speech-to-text with multi-speaker diarization to their applications.

Key features

  • Timestamps - each speaker utterance has a timestamp, so that you can easily find when a phrase was said.
  • Readable transcripts - formatting and punctuation are added automatically so that the text closely matches what was said.
  • User profiles - user profiles are generated by collecting user voice samples and sending them to signature generation.
  • Speaker identification - speakers are identified using user profiles, and a speaker identifier is assigned to each.
  • Multi-speaker diarization - determine who said what by synthesizing the audio stream with each speaker identifier.
  • Real-time transcription - provide live transcripts of who is saying what and when while the conversation is happening.
  • Asynchronous transcription - provide transcripts with higher accuracy by using a multichannel audio stream.

Note

Although Conversation Transcription does not limit the number of speakers in the room, it is optimized for 2-10 speakers per session.
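To make the timestamp and speaker-attribution features concrete, here is a minimal sketch of how an application might hold and display diarized results. It does not use the Speech SDK; the `Utterance` type, its field names, and the `Guest_0`-style identifiers are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    """One attributed phrase (hypothetical shape, not an SDK type)."""
    speaker_id: str        # identifier assigned to the speaker
    offset_seconds: float  # start of the phrase, relative to the start of the audio
    text: str              # formatted, punctuated transcript text


def format_transcript(utterances):
    """Return one '[mm:ss] speaker: text' line per utterance, ordered by start time."""
    lines = []
    for u in sorted(utterances, key=lambda u: u.offset_seconds):
        minutes, seconds = divmod(int(u.offset_seconds), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {u.speaker_id}: {u.text}")
    return "\n".join(lines)


def find_phrase(utterances, phrase):
    """Return (speaker_id, offset_seconds) for every utterance containing the phrase."""
    needle = phrase.lower()
    return [
        (u.speaker_id, u.offset_seconds)
        for u in utterances
        if needle in u.text.lower()
    ]
```

With a structure like this, `find_phrase(utterances, "budget")` answers the "who said it, and when" question that the timestamps are meant to support.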

Get started

See the real-time conversation transcription quickstart to get started.

Use cases

To make meetings inclusive for everyone, including participants who are deaf or hard of hearing, it is important to have transcription in real time. Conversation Transcription in real-time mode takes meeting audio and determines who is saying what, allowing all meeting participants to follow the transcript and participate in the meeting without delay.

Improved efficiency

Meeting participants can focus on the meeting and leave note-taking to Conversation Transcription. They can actively engage in the meeting and quickly follow up on next steps by using the transcript, instead of taking notes and potentially missing something during the meeting.

How it works

This is a high-level overview of how Conversation Transcription works.

Figure: Conversation Transcription diagram

Expected inputs

Note

User voice samples are optional. Without this input, the transcription still distinguishes the speakers, but labels them "Speaker1", "Speaker2", and so on, instead of recognizing them by the names of specific pre-enrolled speakers.
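The labeling behavior described in the note can be sketched as follows. This is an illustration of the idea only, not the service's implementation: `label_speakers`, the `enrolled_names` mapping, and the numbering of generic labels when only some speakers are enrolled are assumptions made for the example.

```python
def label_speakers(speaker_ids, enrolled_names=None):
    """Map each distinct speaker identifier to a display label.

    enrolled_names is a hypothetical dict of identifier -> name for speakers
    whose voice samples were provided in advance. Speakers without an entry
    get generic labels "Speaker1", "Speaker2", ... in order of first appearance.
    """
    enrolled_names = enrolled_names or {}
    labels = {}
    next_generic = 1
    for speaker_id in speaker_ids:
        if speaker_id in labels:
            continue  # this speaker already has a label
        if speaker_id in enrolled_names:
            labels[speaker_id] = enrolled_names[speaker_id]
        else:
            labels[speaker_id] = f"Speaker{next_generic}"
            next_generic += 1
    return labels
```

Called without `enrolled_names`, every speaker receives a generic label, which mirrors the case where no voice samples are supplied.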

Real-time vs. asynchronous

Conversation Transcription offers three transcription modes:

Real-time

Audio data is processed live to return the speaker identifier and transcript. Select this mode if your solution must give conversation participants a live transcript view of their ongoing conversation. For example, an application that makes meetings more accessible to participants who are deaf or hard of hearing is an ideal use case for real-time transcription.

Asynchronous

Audio data is batch processed to return the speaker identifier and transcript. Select this mode if your solution needs higher accuracy and does not need a live transcript view. For example, if you want to build an application that lets meeting participants easily catch up on missed meetings, use the asynchronous transcription mode to get high-accuracy transcription results.

Real-time plus asynchronous

Audio data is processed live to return the speaker identifier and transcript, and in addition a request is created to get a high-accuracy transcript through asynchronous processing. Select this mode if your application needs real-time transcription but also requires a higher-accuracy transcript for use after the conversation or meeting has ended.
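The choice among the three modes can be summarized as a small decision function. This is a sketch of the guidance above rather than an SDK API: `TranscriptionMode` and `choose_mode` are hypothetical names, and falling back to asynchronous when neither requirement is set is an assumption of this example.

```python
from enum import Enum


class TranscriptionMode(Enum):
    """The three modes described above (hypothetical enum, not an SDK type)."""
    REAL_TIME = "real-time"
    ASYNCHRONOUS = "asynchronous"
    REAL_TIME_PLUS_ASYNCHRONOUS = "real-time plus asynchronous"


def choose_mode(needs_live_view: bool, needs_high_accuracy: bool) -> TranscriptionMode:
    """Pick a mode from two requirements: a live transcript view and higher accuracy."""
    if needs_live_view and needs_high_accuracy:
        return TranscriptionMode.REAL_TIME_PLUS_ASYNCHRONOUS
    if needs_live_view:
        return TranscriptionMode.REAL_TIME
    # Assumption: with no need for a live view, use batch processing.
    return TranscriptionMode.ASYNCHRONOUS
```

For instance, a live-captioning application would call `choose_mode(True, False)`, while a tool for catching up on missed meetings would call `choose_mode(False, True)`.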

Language support

Currently, Conversation Transcription supports all speech-to-text languages in the following regions: centralus, eastasia, eastus, westeurope. If you require additional locale support, contact the Conversation Transcription Feature Crew.
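An application could guard against unsupported regions before starting a session. The sketch below hard-codes the four regions listed above; because the feature is in preview, that list may change. `SUPPORTED_REGIONS` and `is_supported_region` are hypothetical names, not part of any SDK.

```python
# Regions taken from the list above; verify against current documentation before relying on it.
SUPPORTED_REGIONS = frozenset({"centralus", "eastasia", "eastus", "westeurope"})


def is_supported_region(region: str) -> bool:
    """Return True if the region name (case-insensitive) is in the supported set."""
    return region.strip().lower() in SUPPORTED_REGIONS
```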

Next steps