
What are the Speech Services?

Azure Speech Services unify speech-to-text, text-to-speech, and speech translation under a single Azure subscription. You can speech-enable your applications, tools, and devices with the Speech SDK, the Speech Devices SDK, or the REST APIs.


Speech Services have replaced the Bing Speech API, Translator Speech, and Custom Speech. For migration instructions, see How-to guides > Migration.

These features make up the Azure Speech Services. Use the links in this table to learn more about common use cases for each feature or to browse the API reference.

| Service | Feature | Description | SDK | REST |
|---|---|---|---|---|
| Speech-to-Text | Speech-to-text | Speech-to-text transcribes audio streams to text in real time that your applications, tools, or devices can consume or display. Use speech-to-text with Language Understanding (LUIS) to derive user intents from transcribed speech and act on voice commands. | Yes | Yes |
| | Batch Transcription | Batch transcription enables asynchronous speech-to-text transcription of large volumes of data. This is a REST-based service that uses the same endpoint as customization and model management. | No | Yes |
| | Conversation Transcription | Enables real-time speech recognition, speaker identification, and diarization. It is well suited to transcribing in-person meetings where speakers need to be distinguished. | Yes | No |
| | Create Custom Speech Models | If you use speech-to-text for recognition and transcription in a unique environment, you can create and train custom acoustic, language, and pronunciation models to address ambient noise or industry-specific vocabulary. | No | Yes |
| Text-to-Speech | Text-to-speech | Text-to-speech converts input text into human-like synthesized speech using Speech Synthesis Markup Language (SSML). Choose from standard voices and neural voices (see Language support). | Yes | Yes |
| | Create Custom Voices | Create custom voice fonts unique to your brand or product. | No | Yes |
| Speech Translation | Speech translation | Speech translation enables real-time, multi-language translation of speech in your applications, tools, and devices. Use this service for speech-to-speech and speech-to-text translation. | Yes | No |
| Voice-first Virtual Assistants | Voice-first virtual assistants | Custom virtual assistants built on Azure Speech Services let developers create natural, human-like conversational interfaces for their applications and experiences. The Bot Framework's Direct Line Speech channel enhances these capabilities by providing a coordinated, orchestrated entry point to a compatible bot, enabling voice-in, voice-out interaction with low latency and high reliability. | Yes | No |
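
As a rough illustration of the SSML input that text-to-speech accepts, the sketch below builds a minimal SSML document with Python's standard library. The helper name `build_ssml` and the voice name `en-US-JessaNeural` are assumptions for illustration only; check the Language support page for the voices actually available to you.

```python
import xml.etree.ElementTree as ET

# Namespace that SSML documents declare on the root <speak> element.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(text, voice="en-US-JessaNeural", lang="en-US"):
    """Return a minimal SSML string (hypothetical helper, not part of the Speech SDK)."""
    speak = ET.Element("speak", {"version": "1.0", "xmlns": SSML_NS, "xml:lang": lang})
    voice_el = ET.SubElement(speak, "voice", {"name": voice})  # voice name is an assumption
    voice_el.text = text  # ElementTree escapes &, <, > when serializing
    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml("Hello & welcome to the Speech Services.")
print(ssml)
```

Building the markup with an XML library rather than string concatenation keeps special characters in the input text properly escaped.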

News and updates

Learn what's new with the Azure Speech Services.

  • June 2019
    • Released Speech SDK 1.6.0. For a full list of updates, enhancements, and known issues, see Release notes.
  • May 2019 - Documentation is now available for Conversation Transcription, Call Center Transcription, and Voice-first Virtual Assistants.
  • May 2019
    • Released Speech SDK 1.5.1. For a full list of updates, enhancements, and known issues, see Release notes.
    • Released Speech SDK 1.5.0. For a full list of updates, enhancements, and known issues, see Release notes.
  • April 2019 - Released Speech SDK 1.4.0 with support for text-to-speech (Beta) for C++, C#, and Java on Windows and Linux. Additionally, the SDK now supports MP3 and Opus/Ogg audio formats for C++ and C# on Linux. For a full list of updates, enhancements, and known issues, see Release notes.
  • March 2019 - A new text-to-speech (TTS) endpoint that returns the full list of voices available in a specific region is now available. Additionally, TTS is now supported in new regions. For more information, see Text-to-speech API reference (REST).
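
The voice-list endpoint mentioned in the March 2019 entry can be called with a plain HTTPS GET. The sketch below only builds the request and does not send it. The URL pattern, the `westus` region, and the bearer-token placeholder are assumptions to verify against the Text-to-speech API reference (REST).

```python
import urllib.request


def voices_list_request(region, access_token):
    """Build (but do not send) a GET request for the regional voice list.

    The URL pattern is an assumption; confirm it in the REST reference.
    """
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/voices/list"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


# "westus" and the token value are placeholders, not working credentials.
request = voices_list_request("westus", "YOUR_ACCESS_TOKEN")
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it once a valid token is supplied.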

Try Speech Services

We offer quickstarts in most popular programming languages, each designed to have you running code in less than 10 minutes. This table contains the most popular quickstarts for each feature. Use the left-hand navigation to explore additional languages and platforms.

| Speech-to-text (SDK) | Text-to-Speech (SDK) | Translation (SDK) |
|---|---|---|
| C#, .NET Core (Windows) | C#, .NET Framework (Windows) | Java (Windows, Linux) |
| JavaScript (Browser) | C++ (Windows) | C#, .NET Core (Windows) |
| Python (Windows, Linux, macOS) | C++ (Linux) | C#, .NET Framework (Windows) |
| Java (Windows, Linux) | | C++ (Windows) |


Speech-to-text and text-to-speech also have REST endpoints and associated quickstarts.

After you've had a chance to use the Speech Services, try our tutorial that teaches you how to recognize intents from speech using the Speech SDK and LUIS.

Get sample code

Sample code is available on GitHub for each of the Azure Speech Services. These samples cover common scenarios like reading audio from a file or stream, continuous and single-shot recognition, and working with custom models. Use these links to view SDK and REST samples:

Customize your speech experience

Azure Speech Services work well with built-in models. However, you may want to further customize and tune the experience for your product or environment. Customization options range from acoustic model tuning to unique voice fonts for your brand. After you've built a custom model, you can use it with any of the Azure Speech Services.

| Speech Service | Model | Description |
|---|---|---|
| Speech-to-Text | Acoustic model | Create a custom acoustic model for applications, tools, or devices that are used in particular environments, such as in a car or on a factory floor, each with specific recording conditions. Examples include accented speech, specific background noises, or using a specific microphone for recording. |
| | Language model | Create a custom language model to improve transcription of field-specific vocabulary and grammar, such as medical terminology or IT jargon. |
| | Pronunciation model | With a custom pronunciation model, you can define the phonetic form and display of a word or term. It's useful for handling customized terms, such as product names or acronyms. All you need to get started is a pronunciation file, which is a simple .txt file. |
| Text-to-Speech | Voice font | Custom voice fonts allow you to create a recognizable, one-of-a-kind voice for your brand. It only takes a small amount of data to get started. The more data that you provide, the more natural and human-like your voice font will sound. |
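
To make the pronunciation file more concrete, the sketch below writes one as plain text. It assumes each line holds a display form and a spoken form separated by a tab. That layout, the helper name, and the sample entries are illustrative assumptions, so confirm the exact format in the Custom Speech documentation before uploading a file.

```python
import os
import tempfile


def write_pronunciation_file(path, entries):
    """Write (display form, spoken form) pairs, one tab-separated pair per line."""
    with open(path, "w", encoding="utf-8") as f:
        for display, spoken in entries:
            f.write(f"{display}\t{spoken}\n")


# Hypothetical sample entries: an acronym and a product-style name.
entries = [("IEEE", "i triple e"), ("3CPO", "three c p o")]
path = os.path.join(tempfile.mkdtemp(), "pronunciation.txt")
write_pronunciation_file(path, entries)

# Read the file back to show what was written.
with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()
print(lines)
```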

Reference docs

Next steps