What is speech-to-text?

In this overview, you learn about the benefits and capabilities of the speech-to-text service. Speech-to-text, also known as speech recognition, transcribes audio streams into text in real time. Your applications, tools, or devices can consume, display, and act on this text as command input. The service is powered by the same recognition technology that Microsoft uses for Cortana and Office products, and it works seamlessly with the translation and text-to-speech service offerings. For a full list of available speech-to-text languages, see supported languages.

By default, the speech-to-text service uses the Universal language model. This model was trained on Microsoft-owned data and is deployed in the cloud. It's optimal for conversational and dictation scenarios. When you use speech-to-text for recognition and transcription in a unique environment, you can create and train custom acoustic, language, and pronunciation models. Customization helps address ambient noise and industry-specific vocabulary.

Given additional reference text as input, the speech-to-text service also provides a pronunciation assessment capability that evaluates speech pronunciation and gives speakers feedback on the accuracy and fluency of spoken audio. With pronunciation assessment, language learners can practice, get instant feedback, and improve their pronunciation so that they can speak and present with confidence. Educators can use the capability to evaluate the pronunciation of multiple speakers in real time. The feature currently supports US English, and its scores correlate highly with speech assessments conducted by experts.
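As a rough illustration of how reference text might be supplied, the sketch below packs assessment parameters into a single HTTP header for a REST recognition request. The header name `Pronunciation-Assessment`, the parameter names, and their values are assumptions based on the REST API for short audio and are not stated in this overview, so check them against the REST API reference before relying on them. The helper name is hypothetical.

```python
import base64
import json


def build_pronunciation_assessment_header(reference_text: str) -> dict:
    """Return a header dict carrying pronunciation assessment parameters.

    Assumption: the service reads a base64-encoded JSON object from a
    'Pronunciation-Assessment' header; verify against the REST reference.
    """
    params = {
        "ReferenceText": reference_text,  # text the speaker is expected to read
        "GradingSystem": "HundredMark",   # assumed value: scores on a 0-100 scale
        "Granularity": "Phoneme",         # assumed value: finest scoring level
        "Dimension": "Comprehensive",     # assumed value: request all score types
    }
    raw = json.dumps(params).encode("utf-8")
    return {"Pronunciation-Assessment": base64.b64encode(raw).decode("ascii")}


if __name__ == "__main__":
    header = build_pronunciation_assessment_header("Good morning.")
    print(header)
```

The returned dictionary would be merged into the headers of an ordinary recognition request, so the audio and the reference text travel together in one call.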

Note

Bing Speech was decommissioned on October 15, 2019. If your applications, tools, or products use the Bing Speech APIs, we've created guides to help you migrate to the Speech service.

Important

TLS 1.2 is now enforced for all HTTP requests to this service. For more information, see Azure Cognitive Services security.
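For clients that build their own HTTPS connections, one way to stay compatible with this requirement is to refuse anything older than TLS 1.2 on the client side. The following is a minimal sketch using only the Python standard library; the function name is hypothetical, and recent Python versions may already apply this minimum by default.

```python
import ssl


def make_tls12_context() -> ssl.SSLContext:
    """Create a client SSL context that will not negotiate below TLS 1.2."""
    # create_default_context() enables certificate and hostname verification.
    ctx = ssl.create_default_context()
    # Reject TLS 1.0 and 1.1 handshakes explicitly.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


if __name__ == "__main__":
    ctx = make_tls12_context()
    print(ctx.minimum_version)
```

The context can be passed to `urllib.request.urlopen(request, context=ctx)` when sending REST requests.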

Get started

See the quickstart to get started with speech-to-text. The service is available through the Speech SDK, the REST API, and the Speech CLI.
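To give a sense of what the REST route looks like, the sketch below builds (but does not send) a recognition request for a short WAV clip. The endpoint path, query parameters, and header names are assumptions drawn from the REST API for short audio rather than from this overview; the region, key, and function name are placeholders.

```python
import urllib.parse
import urllib.request


def build_recognition_request(region: str, key: str, audio_bytes: bytes,
                              language: str = "en-US") -> urllib.request.Request:
    """Build a POST request that submits WAV audio for recognition.

    Assumptions: the regional host name, the URL path, and the
    'Ocp-Apim-Subscription-Key' header follow the short-audio REST API.
    """
    query = urllib.parse.urlencode({"language": language, "format": "simple"})
    url = (f"https://{region}.stt.speech.microsoft.com"
           f"/speech/recognition/conversation/cognitiveservices/v1?{query}")
    headers = {
        "Ocp-Apim-Subscription-Key": key,  # resource key (placeholder)
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, data=audio_bytes, headers=headers,
                                  method="POST")


if __name__ == "__main__":
    req = build_recognition_request("westus", "YOUR_KEY", b"RIFF")
    print(req.get_method(), req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` requires a valid key and real audio data. The Speech SDK handles streaming and continuous recognition, which this single-request sketch does not cover.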

Sample code

Sample code for the Speech SDK is available on GitHub. These samples cover common scenarios such as reading audio from a file or stream, continuous and single-shot recognition, and working with custom models.

Customization

In addition to the standard Speech service model, you can create custom models. Customization helps overcome speech recognition barriers such as speaking style, vocabulary, and background noise. For details, see Custom Speech. Customization options vary by language and locale, so check supported languages to verify support.

Batch transcription

Batch transcription is a set of REST API operations that lets you transcribe a large amount of audio in storage. You point to audio files with a shared access signature (SAS) URI and receive the transcription results asynchronously. For more information on how to use the batch transcription API, see the how-to.
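The request that starts a batch job could look roughly like the sketch below, which builds (but does not send) a POST containing the SAS URIs of the audio files. The `v3.0` path, the JSON field names, and the header name are assumptions based on the batch transcription REST API and are not given in this overview; the function name, region, key, and URI are placeholders.

```python
import json
import urllib.request


def build_batch_transcription_request(region: str, key: str, sas_uris: list,
                                      locale: str = "en-US",
                                      display_name: str = "Example batch"
                                      ) -> urllib.request.Request:
    """Build a POST request that submits audio files for batch transcription.

    Assumptions: the v3.0 endpoint path and the body fields 'contentUrls',
    'locale', and 'displayName' match the batch transcription REST API.
    """
    url = (f"https://{region}.api.cognitive.microsoft.com"
           "/speechtotext/v3.0/transcriptions")
    body = {
        "contentUrls": list(sas_uris),  # one SAS URI per audio file
        "locale": locale,
        "displayName": display_name,
    }
    headers = {
        "Ocp-Apim-Subscription-Key": key,  # resource key (placeholder)
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=json.dumps(body).encode("utf-8"),
                                  headers=headers, method="POST")


if __name__ == "__main__":
    uris = ["https://example.blob.core.windows.net/audio/a.wav?<sas-token>"]
    req = build_batch_transcription_request("westus", "YOUR_KEY", uris)
    print(req.full_url)
    print(req.data.decode("utf-8"))
```

Because results arrive asynchronously, a client would typically submit this request once and then poll the created transcription until it finishes; the polling step is omitted here.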

Reference docs

The Speech service provides two SDKs. The first is the primary Speech SDK, which provides most of the functionality needed to interact with the Speech service. The second is specific to devices and is named, appropriately, the Speech Devices SDK. Both SDKs are available in many languages.

Speech SDK reference docs

Use the following list to find the appropriate Speech SDK reference docs:

Tip

The Speech service SDK is actively maintained and updated. To track changes, updates, and feature additions, refer to the Speech SDK release notes.

Speech Devices SDK reference docs

The Speech Devices SDK is a superset of the Speech SDK, with extended functionality for specific devices. To download the Speech Devices SDK, you must first choose a development kit.

REST API references

For references to the various Speech service REST APIs, refer to the following list:

Next steps