What are the Speech Services?

Azure Speech Services unify speech-to-text, text-to-speech, and speech translation under a single Azure subscription. You can speech-enable your applications, tools, and devices with the Speech SDK, the Speech Devices SDK, or the REST APIs.


Speech Services have replaced the Bing Speech API, Translator Speech, and Custom Speech. For migration instructions, see How-to guides > Migration.

The following features make up the Azure Speech Services. Use the links in this table to learn more about common use cases for each feature or to browse the API reference.

| Service | Feature | Description | SDK | REST |
|---|---|---|---|---|
| Speech-to-Text | Speech-to-text | Speech-to-text transcribes audio streams to text in real time so that your applications, tools, or devices can consume or display it. Use speech-to-text with Language Understanding (LUIS) to derive user intents from transcribed speech and act on voice commands. | Yes | Yes |
| | Batch Transcription | Batch transcription enables asynchronous speech-to-text transcription of large volumes of data. This is a REST-based service that uses the same endpoint as customization and model management. | No | Yes |
| | Conversation Transcription | Enables real-time speech recognition, speaker identification, and diarization. It is well suited to transcribing in-person meetings because it can distinguish speakers. | Yes | No |
| | Create Custom Speech Models | If you use speech-to-text for recognition and transcription in a unique environment, you can create and train custom acoustic, language, and pronunciation models to address ambient noise or industry-specific vocabulary. | No | Yes |
| Text-to-Speech | Text-to-speech | Text-to-speech converts input text into human-like synthesized speech using Speech Synthesis Markup Language (SSML). Choose from standard voices and neural voices (see Language support). | Yes | Yes |
| | Create Custom Voices | Create custom voice fonts unique to your brand or product. | No | Yes |
| Speech Translation | Speech translation | Speech translation enables real-time, multi-language translation of speech in your applications, tools, and devices. Use this service for speech-to-speech and speech-to-text translation. | Yes | No |
| Voice-first Virtual Assistants | Voice-first virtual assistants | Custom virtual assistants built with Azure Speech Services let developers create natural, human-like conversational interfaces for their applications and experiences. The Bot Framework's Direct Line Speech channel enhances these capabilities by providing a coordinated, orchestrated entry point to a compatible bot that enables voice-in, voice-out interaction with low latency and high reliability. | Yes | No |
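Because text-to-speech takes its input as SSML, it can help to see what a minimal SSML document looks like. The sketch below builds one with Python's standard library. The `build_ssml` helper and the voice name `example-voice-name` are hypothetical placeholders, not names from the service; real voice names come from the Language support page.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the W3C SSML and XML specifications.
SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_ssml(text, voice_name, lang="en-US"):
    """Return an SSML document (as a string) that speaks `text` with one voice.

    `voice_name` is whatever voice identifier the service expects; the value
    used below is a hypothetical placeholder.
    """
    # Serialize SSML elements in the default namespace (no prefix).
    ET.register_namespace("", SSML_NS)
    speak = ET.Element(
        f"{{{SSML_NS}}}speak",
        {"version": "1.0", f"{{{XML_NS}}}lang": lang},
    )
    voice = ET.SubElement(speak, f"{{{SSML_NS}}}voice", {"name": voice_name})
    # ElementTree escapes &, < and > when the tree is serialized.
    voice.text = text
    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml("Tea & biscuits at 4 < 5?", "example-voice-name")
print(ssml)
```

Building the document with an XML library rather than string formatting keeps special characters in the input text from producing malformed SSML.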

News and updates

Learn what's new with the Azure Speech Services.

  • June 2019
    • Released Speech SDK 1.6.0. For a full list of updates, enhancements, and known issues, see Release notes.
  • May 2019 - Documentation is now available for Conversation Transcription, Call Center Transcription, and Voice-first Virtual Assistants.
  • May 2019
    • Released Speech SDK 1.5.1. For a full list of updates, enhancements, and known issues, see Release notes.
    • Released Speech SDK 1.5.0. For a full list of updates, enhancements, and known issues, see Release notes.
  • April 2019 - Released Speech SDK 1.4.0 with support for text-to-speech (Beta) for C++, C#, and Java on Windows and Linux. Additionally, the SDK now supports MP3 and Opus/Ogg audio formats for C++ and C# on Linux. For a full list of updates, enhancements, and known issues, see Release notes.
  • March 2019 - A new text-to-speech (TTS) endpoint that returns the full list of voices available in a specific region is now available. Additionally, TTS now supports new regions. For more information, see Text-to-speech API reference (REST).
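The voice-list endpoint mentioned in the March 2019 entry could be called roughly as sketched below. The URL pattern, the `Ocp-Apim-Subscription-Key` header, and the `ShortName` and `Locale` response fields are assumptions to check against the Text-to-speech API reference (REST). The region, key, and sample voices are placeholders, and the request is only built, not sent.

```python
import urllib.request


def build_voices_list_request(region, subscription_key):
    """Build (but do not send) a GET request for a region's voice list.

    The URL pattern and header name are assumptions; verify them against
    the Text-to-speech API reference (REST).
    """
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/voices/list"
    return urllib.request.Request(
        url, headers={"Ocp-Apim-Subscription-Key": subscription_key}
    )


def voices_for_locale(voices, locale):
    """Return the short names of the voices that match `locale`.

    Assumes each entry is a dict with "ShortName" and "Locale" keys.
    """
    return [v["ShortName"] for v in voices if v.get("Locale") == locale]


request = build_voices_list_request("westus", "YOUR_SUBSCRIPTION_KEY")

# Illustrative data only: hypothetical voices, not a real service response.
sample_voices = [
    {"ShortName": "example-voice-a", "Locale": "en-US"},
    {"ShortName": "example-voice-b", "Locale": "de-DE"},
    {"ShortName": "example-voice-c", "Locale": "en-US"},
]
print(request.full_url)
print(voices_for_locale(sample_voices, "en-US"))
```

To send the request, pass it to `urllib.request.urlopen` and decode the body with `json.load`.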

Try Speech Services

We offer quickstarts in the most popular programming languages, each designed to have you running code in less than 10 minutes. This table contains the most popular quickstarts for each feature. Use the left-hand navigation to explore additional languages and platforms.

| C#, .NET Core (Windows) | C#, .NET Framework (Windows) | Java (Windows, Linux) |
| JavaScript (Browser) | C++ (Windows) | C#, .NET Core (Windows) |
| Python (Windows, Linux, macOS) | C++ (Linux) | C#, .NET Framework (Windows) |
| Java (Windows, Linux) | C++ (Windows) |


Speech-to-text and text-to-speech also have REST endpoints and associated quickstarts.
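As a rough illustration of the speech-to-text REST path, the sketch below generates one second of silent 16 kHz mono PCM audio in memory and wraps it in a POST request. The endpoint URL, the `language` query parameter, and the header values are assumptions to confirm against the REST quickstarts. The region and key are placeholders, and nothing is sent over the network.

```python
import io
import urllib.parse
import urllib.request
import wave


def make_silent_wav(seconds=1, sample_rate=16000):
    """Return WAV bytes holding `seconds` of 16-bit mono silence."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * sample_rate * seconds)
    return buf.getvalue()


def build_recognition_request(region, subscription_key, audio, language="en-US"):
    """Build (but do not send) a POST request carrying WAV audio.

    The URL pattern and headers are assumptions; verify them against the
    speech-to-text REST documentation.
    """
    query = urllib.parse.urlencode({"language": language})
    url = (
        f"https://{region}.stt.speech.microsoft.com"
        f"/speech/recognition/conversation/cognitiveservices/v1?{query}"
    )
    headers = {
        "Ocp-Apim-Subscription-Key": subscription_key,
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, data=audio, headers=headers, method="POST")


audio = make_silent_wav()
stt_request = build_recognition_request("westus", "YOUR_SUBSCRIPTION_KEY", audio)
print(stt_request.get_method(), stt_request.full_url)
```

In a real application, replace the silent buffer with recorded audio and pass the request to `urllib.request.urlopen`.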

After you've had a chance to use the Speech Services, try our tutorial that teaches you how to recognize intents from speech using the Speech SDK and LUIS.

Get sample code

Sample code is available on GitHub for each of the Azure Speech Services. These samples cover common scenarios such as reading audio from a file or stream, continuous and single-shot recognition, and working with custom models. Use these links to view SDK and REST samples:

Customize your speech experience

Azure Speech Services work well with the built-in models. However, you may want to further customize and tune the experience for your product or environment. Customization options range from acoustic model tuning to unique voice fonts for your brand. After you've built a custom model, you can use it with any of the Azure Speech Services.

| Speech Service | Model | Description |
|---|---|---|
| Speech-to-Text | Acoustic model | Create a custom acoustic model for applications, tools, or devices that are used in particular environments, such as in a car or on a factory floor, each with specific recording conditions. Examples include accented speech, specific background noises, or recording with a specific microphone. |
| | Language model | Create a custom language model to improve transcription of field-specific vocabulary and grammar, such as medical terminology or IT jargon. |
| | Pronunciation model | With a custom pronunciation model, you can define the phonetic form and display form of a word or term. It's useful for handling customized terms, such as product names or acronyms. All you need to get started is a pronunciation file, which is a simple .txt file. |
| Text-to-Speech | Voice font | Custom voice fonts allow you to create a recognizable, one-of-a-kind voice for your brand. It only takes a small amount of data to get started. The more data you provide, the more natural and human-like your voice font will sound. |
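To make the pronunciation file concrete, here is a minimal sketch that writes one. It assumes each line holds a display form and a spoken form separated by a tab. The text above says only that the file is a simple .txt file, so the column order, the separator, and the sample entries are assumptions to confirm against the Custom Speech documentation.

```python
def format_pronunciations(entries):
    """Serialize (display_form, spoken_form) pairs, one pair per line.

    Assumes a tab-separated layout; tabs or newlines inside a field would
    corrupt that layout, so they are rejected.
    """
    lines = []
    for display, spoken in entries:
        if any(ch in display + spoken for ch in "\t\n"):
            raise ValueError(f"field contains a tab or newline: {display!r}")
        lines.append(f"{display}\t{spoken}")
    return "\n".join(lines) + "\n"


# Illustrative entries: acronyms paired with how they are spoken aloud.
entries = [
    ("3CPO", "three c p o"),
    ("CNTK", "c n t k"),
    ("IEEE", "i triple e"),
]
pronunciation_text = format_pronunciations(entries)
print(pronunciation_text, end="")
```

Writing `pronunciation_text` to a UTF-8 `.txt` file would produce the file to upload.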

Reference docs

