
About the Speech SDK

The Speech software development kit (SDK) exposes many of the Speech service capabilities so that you can develop speech-enabled applications. The Speech SDK is available in many programming languages and on the platforms listed in the following table.

Programming language | Platform | SDK reference
C# 1 | Windows, Linux, macOS, Mono, Xamarin.iOS, Xamarin.Mac, Xamarin.Android, UWP, Unity | .NET SDK
C++ | Windows, Linux, macOS | C++ SDK
Java 2 | Android, Windows, Linux, macOS | Java SDK
JavaScript | Browser, Node.js | JavaScript SDK
Objective-C / Swift | iOS, macOS | Objective-C SDK
Python | Windows, Linux, macOS | Python SDK

1 The .NET Speech SDK is based on .NET Standard 2.0, so it supports many platforms. For more information, see .NET implementation support.

2 The Java Speech SDK is also available as part of the Speech Devices SDK.

Scenario capabilities

The Speech SDK exposes many features from the Speech service, but not all of them. The capabilities of the Speech SDK are often associated with scenarios. The Speech SDK suits both real-time and non-real-time scenarios, using local devices, files, Azure blob storage, and even input and output streams. When a scenario is not achievable with the Speech SDK, look for a REST API alternative.

Speech-to-text

Speech-to-text (also known as speech recognition) transcribes audio streams to text that your applications, tools, or devices can consume or display. Use speech-to-text with Language Understanding (LUIS) to derive user intents from transcribed speech and act on voice commands. Use Speech Translation to translate speech input to a different language with a single call. For more information, see Speech-to-text basics.
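As a rough illustration of the speech-to-text flow described above, the following Python sketch recognizes a single utterance from a WAV file. It assumes the azure-cognitiveservices-speech package is installed and that a Speech resource key and region are available. The environment variable names (SPEECH_KEY, SPEECH_REGION) and both function names are hypothetical and chosen only for this example.

```python
import os
from typing import Mapping, Optional, Tuple


def load_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Read the key and region from hypothetical environment variable names."""
    key = env.get("SPEECH_KEY")
    region = env.get("SPEECH_REGION")
    if not key or not region:
        raise RuntimeError("Set SPEECH_KEY and SPEECH_REGION first")
    return key, region


def transcribe_file(path: str, key: str, region: str) -> Optional[str]:
    """Recognize one utterance from a WAV file; return its text, or None."""
    # Imported lazily so this module loads even before the SDK is installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    audio_config = speechsdk.audio.AudioConfig(filename=path)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )
    # Blocks until a single utterance has been processed.
    result = recognizer.recognize_once()
    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        return result.text
    return None  # no match, or the request was canceled
```

A caller would typically run `key, region = load_credentials()` and then `transcribe_file("sample.wav", key, region)`, where sample.wav is a placeholder file name. Single-shot recognition keeps the sketch short; longer audio would call for continuous recognition instead.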

Speech recognition (SR), Phrase List, Intent, Translation, and on-premises containers are available on the following platforms:

  • C++/Windows & Linux & macOS
  • C# (Framework & .NET Core)/Windows & UWP & Unity & Xamarin & Linux & macOS
  • Java (JRE and Android)
  • JavaScript (Browser and Node.js)
  • Python
  • Swift
  • Objective-C
  • Go (SR only)

Text-to-speech

Text-to-speech (also known as speech synthesis) converts text into human-like synthesized speech. The input is either a plain string or a document written in the Speech Synthesis Markup Language (SSML). For more information on standard or neural voices, see Text-to-speech language and voice support.
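To make the difference between plain-string and SSML input concrete, here is a minimal Python sketch that wraps text in an SSML document using only the standard library. The function name build_ssml and the voice name in the usage note are illustrative assumptions; check the language and voice support list for the voices actually available to you.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_ssml(text: str, voice: str, lang: str = "en-US") -> str:
    """Wrap plain text in a minimal SSML document for the given voice."""
    # Serialize the SSML namespace as the default (unprefixed) namespace.
    ET.register_namespace("", SSML_NS)
    speak = ET.Element(
        f"{{{SSML_NS}}}speak",
        {"version": "1.0", f"{{{XML_NS}}}lang": lang},
    )
    voice_el = ET.SubElement(speak, f"{{{SSML_NS}}}voice", {"name": voice})
    # ElementTree escapes characters such as & and < during serialization.
    voice_el.text = text
    return ET.tostring(speak, encoding="unicode")
```

For example, `build_ssml("Tom & Jerry", voice="en-US-AriaNeural")` yields a string that could then be handed to the SDK's SSML synthesis call (`speak_ssml_async` in the Python SDK, as far as I know). Building the document with an XML library rather than string concatenation avoids malformed markup when the text contains reserved characters.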

Text-to-speech (TTS) is available on the following platforms:

  • C++/Windows & Linux
  • C#/Windows & UWP & Unity
  • Java (JRE and Android)
  • Python
  • Swift
  • Objective-C
  • The TTS REST API can be used in every other situation.

Voice assistants

Voice assistants using the Speech SDK enable developers to create natural, human-like conversational interfaces for their applications and experiences. The voice assistant service provides fast, reliable interaction between a device and an assistant. The implementation uses the Bot Framework's Direct Line Speech channel or the integrated Custom Commands (Preview) service for task completion. Additionally, voice assistants can use custom voices created in the Custom Voice Portal to add a unique voice output experience.

Voice assistants are available on the following platforms:

  • C++/Windows & Linux & macOS
  • C#/Windows
  • Java/Windows & Linux & macOS & Android (Speech Devices SDK)

Keyword spotting

The Speech SDK supports keyword spotting. Keyword spotting is the act of identifying a keyword in speech, followed by an action upon hearing the keyword. For example, "Hey Cortana" would activate the Cortana assistant.

Keyword spotting (KWS) is available on the following platforms:

  • C++/Windows & Linux
  • C#/Windows & Linux
  • Python/Windows & Linux
  • Java/Windows & Linux & Android (Speech Devices SDK)
  • Keyword spotting (KWS) functionality might work with any microphone type. Official KWS support, however, is currently limited to the microphone arrays found in the Azure Kinect DK hardware or the Speech Devices SDK.

Meeting scenarios

The Speech SDK is well suited to transcribing meetings, whether from a single device or a multi-device conversation.

Conversation Transcription

Conversation Transcription enables real-time (and asynchronous) speech recognition, speaker identification, and sentence attribution to each speaker (also known as diarization). It's well suited to transcribing in-person meetings where speakers must be distinguished.

Conversation Transcription is available on the following platforms:

  • C++/Windows & Linux
  • C# (Framework & .NET Core)/Windows & UWP & Linux
  • Java/Windows & Linux & Android (Speech Devices SDK)

Multi-device Conversation

With Multi-device Conversation, you can connect multiple devices or clients in a conversation to send speech-based or text-based messages, with easy support for transcription and translation.

Multi-device Conversation is available on the following platforms:

  • C++/Windows
  • C# (Framework & .NET Core)/Windows

Custom / agent scenarios

The Speech SDK can be used for transcribing call center scenarios, where telephony data is generated.

Call Center Transcription

Call Center Transcription is a common speech-to-text scenario: transcribing large volumes of telephony data that may come from various systems, such as Interactive Voice Response (IVR). The latest speech recognition models from the Speech service excel at transcribing this telephony data, even in cases when the data is difficult for a human to understand.

Call Center Transcription is available through the Batch Speech Service via its REST API and can be used in any situation.

Codec compressed audio input

Several of the Speech SDK programming languages support codec compressed audio input streams. For more information, see Use compressed audio input formats.

Codec compressed audio input is available on the following platforms:

  • C++/Linux
  • C#/Linux
  • Java/Linux, Android, and iOS

REST API

While the Speech SDK covers many capabilities of the Speech service, for some scenarios you might want to use the REST API.

Batch transcription

Batch transcription enables asynchronous speech-to-text transcription of large volumes of data. Batch transcription is only possible from the REST API. In addition to converting speech audio to text, batch speech-to-text also allows for diarization and sentiment analysis.
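Because batch transcription is REST-only, a request has to be assembled by hand. The sketch below builds, but does not send, such a request in Python. The endpoint path, header name, and JSON field names reflect my understanding of version 3.0 of the speech-to-text REST API and should be treated as assumptions to verify against the current REST reference; the function name and the display name are hypothetical.

```python
import json
from typing import Dict, Iterable, Tuple


def build_batch_request(
    region: str,
    key: str,
    audio_urls: Iterable[str],
    locale: str = "en-US",
    diarization: bool = False,
) -> Tuple[str, Dict[str, str], bytes]:
    """Return (url, headers, body) for a create-transcription request."""
    # Assumed v3.0 endpoint shape; confirm it in the REST API reference.
    url = (
        f"https://{region}.api.cognitive.microsoft.com"
        "/speechtotext/v3.0/transcriptions"
    )
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    body = {
        "contentUrls": list(audio_urls),  # audio files reachable by the service
        "locale": locale,
        "displayName": "batch-sketch",  # hypothetical label for this job
        "properties": {"diarizationEnabled": diarization},
    }
    return url, headers, json.dumps(body).encode("utf-8")
```

The returned triple can be passed to any HTTP client as a POST request. Since the service works asynchronously, the caller would then poll for the transcription status rather than expect the text in the response.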

Customization

The Speech service delivers great functionality with its default models across speech-to-text, text-to-speech, and speech translation. Sometimes you may want to improve on the baseline performance for your unique use case. The Speech service has a variety of no-code customization tools that make this easy and let you build a competitive advantage with custom models based on your own data. These models are available only to you and your organization.

Custom Speech-to-text

When using speech-to-text for recognition and transcription in a unique environment, you can create and train custom acoustic, language, and pronunciation models to address ambient noise or industry-specific vocabulary. The creation and management of no-code Custom Speech models is available through the Custom Speech Portal. Once the Custom Speech model is published, it can be consumed by the Speech SDK.

Custom Text-to-speech

Custom text-to-speech, also known as Custom Voice, is a set of online tools that allow you to create a recognizable, one-of-a-kind voice for your brand. The creation and management of no-code Custom Voice models is available through the Custom Voice Portal. Once the Custom Voice model is published, it can be consumed by the Speech SDK.

Get the Speech SDK

The Speech SDK supports Windows 10 and Windows Server 2016, or later versions. Earlier versions are not officially supported. It is possible to use parts of the Speech SDK with earlier versions of Windows, although it's not advised.

Windows

System requirements

The Speech SDK on Windows requires the Microsoft Visual C++ Redistributable for Visual Studio 2019 on the system.

C#

The .NET Speech SDK is available as a NuGet package and implements .NET Standard 2.0. For more information, see Microsoft.CognitiveServices.Speech.



C# NuGet package

The .NET Speech SDK can be installed from the .NET Core CLI with the following dotnet add command.

dotnet add package Microsoft.CognitiveServices.Speech

The .NET Speech SDK can be installed from the Package Manager with the following Install-Package command.

Install-Package Microsoft.CognitiveServices.Speech

Additional resources

For microphone input, the Media Foundation libraries must be installed. These libraries are part of Windows 10 and Windows Server 2016. It's possible to use the Speech SDK without these libraries, as long as a microphone isn't used as the audio input device.

The required Speech SDK files can be deployed in the same directory as your application. This way your application can directly access the libraries. Make sure you select the correct version (x86/x64) that matches your application.

Name | Function
Microsoft.CognitiveServices.Speech.core.dll | Core SDK, required for native and managed deployment
Microsoft.CognitiveServices.Speech.csharp.dll | Required for managed deployment

Note

Starting with release 1.3.0, the file Microsoft.CognitiveServices.Speech.csharp.bindings.dll (shipped in previous releases) is no longer needed. The functionality is now integrated in the core SDK.

Important

For a Windows Forms App (.NET Framework) C# project, make sure the libraries are included in your project's deployment settings. You can check this under Properties -> Publish Section. Click the Application Files button and find the corresponding libraries in the scroll-down list. Make sure the value is set to Included. Visual Studio will include the file when the project is published or deployed.

C++

The C++ Speech SDK is available on Windows, Linux, and macOS. For more information, see Microsoft.CognitiveServices.Speech.



C++ NuGet package

The C++ Speech SDK can be installed from the Package Manager with the following Install-Package command.

Install-Package Microsoft.CognitiveServices.Speech

C++ binaries and header files

Alternatively, the C++ Speech SDK can be installed from binaries. Download the SDK as a .tar package and unpack the files in a directory of your choice. The contents of this package (which include header files for both x86 and x64 target architectures) are structured as follows:

Path | Description
license.md | License
ThirdPartyNotices.md | Third-party notices
include | Header files for C++
lib/x64 | Native x64 library for linking with your application
lib/x86 | Native x86 library for linking with your application

To create an application, copy or move the required binaries (and libraries) into your development environment. Include them as required in your build process.


Python

The Python Speech SDK is available as a Python Package Index (PyPI) module. For more information, see azure-cognitiveservices-speech. The Python Speech SDK is compatible with Windows, Linux, and macOS.


pip install azure-cognitiveservices-speech

Tip

If you are on macOS, you may need to run the following command to get the pip command above to work:

python3 -m pip install --upgrade pip
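After running pip install, it can be useful to confirm that the package is visible to the interpreter you intend to use. The following sketch is one way to check; the helper name module_available is hypothetical, and the module path azure.cognitiveservices.speech is the import name I understand the PyPI package to provide.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the named module can be located by this interpreter."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name is missing.
        return False


# Assumed import name for the azure-cognitiveservices-speech package.
SPEECH_MODULE = "azure.cognitiveservices.speech"
print("Speech SDK importable:", module_available(SPEECH_MODULE))
```

If this prints False right after a successful install, pip most likely installed into a different Python environment than the one running the script.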


Java

The Java SDK for Android is packaged as an AAR (Android Library), which includes the necessary libraries and required Android permissions. It's hosted in a Maven repository at https://csspeechstorage.blob.core.windows.net/maven/ as package com.microsoft.cognitiveservices.speech:client-sdk:1.14.0.



To consume the package from your Android Studio project, make the following changes:

  1. In the project-level build.gradle file, add the following to the repositories section:
     maven { url 'https://csspeechstorage.blob.core.windows.net/maven/' }
  2. In the module-level build.gradle file, add the following to the dependencies section:
     implementation 'com.microsoft.cognitiveservices.speech:client-sdk:1.14.0'

The Java SDK is also part of the Speech Devices SDK.


Important

By downloading any of the Azure Cognitive Services Speech SDKs, you acknowledge its license.

Sample source code

The Speech SDK team actively maintains a large set of examples in an open-source repository. For the sample source code repository, visit the Microsoft Cognitive Services Speech SDK on GitHub. There are samples for C#, C++, Java, Python, Objective-C, Swift, JavaScript, UWP, Unity, and Xamarin.

