
What is the Speech service?

The Speech service unifies speech-to-text, text-to-speech, and speech translation under a single Azure subscription. You can speech-enable your applications, tools, and devices with the Speech CLI, Speech SDK, Speech Devices SDK, Speech Studio, or REST APIs.

Important

The Speech service has replaced the Bing Speech API and Translator Speech. See How-to guides > Migration for migration instructions.

The following features are part of the Speech service. Use the links in this table to learn more about common use cases for each feature, or browse the API reference.

| Service | Feature | Description | SDK | REST |
|---|---|---|---|---|
| Speech-to-Text | Real-time Speech-to-text | Speech-to-text transcribes or translates audio streams or local files to text in real time so that your applications, tools, or devices can consume or display it. Use speech-to-text with Language Understanding (LUIS) to derive user intents from transcribed speech and act on voice commands. | Yes | Yes |
| | Batch Speech-to-Text | Batch Speech-to-text enables asynchronous speech-to-text transcription of large volumes of speech audio data stored in Azure Blob Storage. In addition to converting speech audio to text, Batch Speech-to-text also allows for diarization and sentiment analysis. | No | Yes |
| | Multi-device Conversation | Connect multiple devices or clients in a conversation to send speech-based or text-based messages, with easy support for transcription and translation. | Yes | No |
| | Conversation Transcription | Enables real-time speech recognition, speaker identification, and diarization. It's well suited to transcribing in-person meetings where speakers need to be distinguished. | Yes | No |
| | Create Custom Speech Models | If you use speech-to-text for recognition and transcription in a unique environment, you can create and train custom acoustic, language, and pronunciation models to address ambient noise or industry-specific vocabulary. | No | Yes |
| Text-to-Speech | Text-to-speech | Text-to-speech converts input text into human-like synthesized speech using Speech Synthesis Markup Language (SSML). Choose from standard voices and neural voices (see Language support). | Yes | Yes |
| | Create Custom Voices | Create custom voice fonts unique to your brand or product. | No | Yes |
| Speech Translation | Speech translation | Speech translation enables real-time, multi-language translation of speech in your applications, tools, and devices. Use this service for speech-to-speech and speech-to-text translation. | Yes | No |
| Voice assistants | Voice assistants | Voice assistants using the Speech service empower developers to create natural, human-like conversational interfaces for their applications and experiences. The voice assistant service provides fast, reliable interaction between a device and an assistant implementation that uses the Bot Framework's Direct Line Speech channel or the integrated Custom Commands (Preview) service for task completion. | Yes | No |
| Speaker Recognition | Speaker verification & identification | The Speaker Recognition service provides algorithms that verify and identify speakers by their unique voice characteristics. Speaker Recognition is used to answer the question "who is speaking?". | Yes | Yes |
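
Text-to-speech input is expressed in SSML. The following is a minimal sketch, not the service's official sample, of assembling an SSML document with Python's standard library. The helper name `build_ssml` and the default voice name are assumptions for illustration; check Language support for the voices that are actually available.

```python
import xml.etree.ElementTree as ET

# Namespace URI that ElementTree serializes with the reserved "xml:" prefix.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_ssml(text: str, voice: str = "en-US-JennyNeural", lang: str = "en-US") -> str:
    """Return an SSML string that asks one voice to speak `text`.

    The default voice name is a placeholder assumption, not a recommendation.
    """
    speak = ET.Element("speak", {"version": "1.0", f"{{{XML_NS}}}lang": lang})
    voice_el = ET.SubElement(speak, "voice", {"name": voice})
    # ElementTree escapes characters such as "&" and "<" during serialization.
    voice_el.text = text
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("Hello from the Speech service."))
```

Building the document with an XML library rather than string concatenation keeps user-supplied text from breaking the markup.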

Important

TLS 1.2 is now enforced for all HTTP requests to this service. For more information, see Azure Cognitive Services security.
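
If you call the REST APIs from your own HTTP client, you may want to make sure the client never negotiates a protocol older than TLS 1.2. The following is a minimal sketch using Python's standard `ssl` module; the function name `make_tls12_context` is an assumption for illustration, and most current clients already default to TLS 1.2 or later.

```python
import ssl


def make_tls12_context() -> ssl.SSLContext:
    """Create a client-side context that refuses anything older than TLS 1.2."""
    context = ssl.create_default_context()  # certificate verification stays on
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


if __name__ == "__main__":
    ctx = make_tls12_context()
    print(ctx.minimum_version)
```

The resulting context can be passed to standard-library calls that accept one, such as `urllib.request.urlopen(url, context=ctx)`.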

Try the Speech service for free

For the following steps, you need both a Microsoft account and an Azure account. If you don't have a Microsoft account, you can sign up for one free of charge at the Microsoft account portal. Select Sign in with Microsoft and then, when asked to sign in, select Create a Microsoft account. Follow the steps to create and verify your new Microsoft account.

Once you have a Microsoft account, go to the Azure sign-up page, select Start free, and create a new Azure account using your Microsoft account.

Note

The Speech service has two service tiers, free and subscription, which have different limitations and benefits. When you sign up for a free Azure account, it comes with $200 in service credit that you can apply toward a paid Speech service subscription, valid for up to 30 days.

If you use the free, low-volume Speech service tier, you can keep this free subscription even after your free trial or service credit expires.

For more information, see Cognitive Services pricing - Speech service.

Create the Azure resource

To add a Speech service resource (free or paid tier) to your Azure account:

  1. Sign in to the Azure portal using your Microsoft account.

  2. Select Create a resource at the top left of the portal. If you don't see Create a resource, you can find it by selecting the collapsed menu in the upper-left corner of the screen.

  3. In the New window, type "speech" in the search box and press ENTER.

  4. In the search results, select Speech.

  5. Select Create, then:

    • Give your new resource a unique name. The name helps you distinguish among multiple subscriptions tied to the same service.
    • Choose the Azure subscription that the new resource is associated with to determine how the fees are billed.
    • Choose the region where the resource will be used.
    • Choose either a free (F0) or paid (S0) pricing tier. For complete information about pricing and usage quotas for each tier, select View full pricing details. For limits on the resources you can create for each subscription, see Azure Cognitive Services Limits.
    • Create a new resource group for this Speech subscription or assign the subscription to an existing resource group. Resource groups help you keep your various Azure subscriptions organized.
    • Select Create. This takes you to the deployment overview and displays deployment progress messages.
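
The portal steps above map onto a small set of settings: resource name, resource group, region, and pricing tier. If you prefer scripting, the following sketch assembles an equivalent Azure CLI invocation from those settings. It assumes the Azure CLI is installed and that `az cognitiveservices account create` with kind `SpeechServices` matches your CLI version; the helper name and all example values are placeholders, and the command is only built here, not executed.

```python
from typing import List

# Pricing tiers named in the steps above: free (F0) and paid (S0).
VALID_TIERS = {"F0", "S0"}


def build_create_command(name: str, resource_group: str, region: str, tier: str = "F0") -> List[str]:
    """Return the argument list for creating a Speech resource with the Azure CLI.

    Assumption: the installed Azure CLI supports these options; verify with
    `az cognitiveservices account create --help` before running the command.
    """
    if tier not in VALID_TIERS:
        raise ValueError(f"tier must be one of {sorted(VALID_TIERS)}, got {tier!r}")
    return [
        "az", "cognitiveservices", "account", "create",
        "--name", name,
        "--resource-group", resource_group,
        "--kind", "SpeechServices",
        "--sku", tier,
        "--location", region,
        "--yes",
    ]


if __name__ == "__main__":
    # Placeholder values; replace them with your own before running the command.
    print(" ".join(build_create_command("my-speech-resource", "my-resource-group", "westus")))
```

Returning a list rather than a single string lets you pass the result to `subprocess.run` without shell quoting concerns.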

It takes a few moments to deploy your new Speech resource. Once deployment is complete, select Go to resource, and in the left navigation pane select Keys to display your Speech service subscription keys. Each subscription has two keys; you can use either key in your application. To quickly copy a key to your code editor or another location, select the copy button next to the key, then switch windows and paste the clipboard contents where you need them.

Important

These subscription keys are used to access your Cognitive Services API. Do not share your keys. Store them securely, for example by using Azure Key Vault. We also recommend regenerating these keys regularly. Only one key is necessary to make an API call. While you regenerate the first key, you can use the second key for continued access to the service.
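
One way to keep keys out of source code is to read them from the environment at run time. The following is a minimal sketch under stated assumptions: the environment variable names `SPEECH_KEY_PRIMARY` and `SPEECH_KEY_SECONDARY` and both helper names are hypothetical, and the `Ocp-Apim-Subscription-Key` header should be confirmed against the REST reference for the API you call. The fallback to the second key mirrors the rotation advice above.

```python
import os
from typing import Dict, Mapping

# Hypothetical variable names; choose whatever fits your deployment.
PRIMARY_VAR = "SPEECH_KEY_PRIMARY"
SECONDARY_VAR = "SPEECH_KEY_SECONDARY"


def load_speech_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the first configured key, preferring the primary one.

    Falling back to the secondary key keeps calls working while the
    primary key is being regenerated.
    """
    for var in (PRIMARY_VAR, SECONDARY_VAR):
        key = env.get(var, "").strip()
        if key:
            return key
    raise RuntimeError(f"Set {PRIMARY_VAR} or {SECONDARY_VAR} before calling the service.")


def auth_headers(key: str) -> Dict[str, str]:
    """Build the request header that carries the subscription key."""
    return {"Ocp-Apim-Subscription-Key": key}


if __name__ == "__main__":
    try:
        headers = auth_headers(load_speech_key())
        print("Key loaded; header names:", list(headers))
    except RuntimeError as err:
        print(err)
```

In production, the environment variables themselves would typically be populated from a secret store such as Azure Key Vault rather than set by hand.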

Complete a quickstart

We offer quickstarts in most popular programming languages, each designed to teach you basic design patterns and have you running code in less than 10 minutes. See the following list for the quickstart for each feature.

After you've had a chance to get started with the Speech service, try our tutorials, which show you how to solve various scenarios.

Get sample code

Sample code for the Speech service is available on GitHub. These samples cover common scenarios like reading audio from a file or stream, continuous and single-shot recognition, and working with custom models. Use these links to view SDK and REST samples:

Customize your speech experience

The Speech service works well with built-in models. However, you may want to further customize and tune the experience for your product or environment. Customization options range from acoustic model tuning to unique voice fonts for your brand.

Other products offer speech models tuned for specific purposes like healthcare or insurance, but those models are available to everyone equally. Customization in Azure Speech becomes part of your unique competitive advantage, unavailable to any other user or customer. In other words, your models are private and custom-tuned for your use case only.

| Speech Service | Platform | Description |
|---|---|---|
| Speech-to-Text | Custom Speech | Customize speech recognition models to your needs and available data. Overcome speech recognition barriers such as speaking style, vocabulary, and background noise. |
| Text-to-Speech | Custom Voice | Build a recognizable, one-of-a-kind voice for your text-to-speech apps from your available speaking data. You can further fine-tune the voice output by adjusting a set of voice parameters. |

Reference docs

Next steps