What is Bing Speech?

Note

The new Speech Service and SDK are replacing Bing Speech, which will no longer work starting October 15, 2019. For information on switching to the Speech Service, see Migrating from Bing Speech to the Speech Service.

The cloud-based Microsoft Bing Speech API gives developers an easy way to build powerful speech-enabled features into their applications, such as voice command control, user dialog using natural speech conversation, and speech transcription and dictation. The Microsoft Speech API supports both Speech to Text and Text to Speech conversion.

  • The Speech to Text API converts human speech to text that can be used as input or as commands to control your application.
  • The Text to Speech API converts text to audio streams that can be played back to the user of your application.

Speech to text (speech recognition)

The Microsoft speech recognition API transcribes audio streams into text that your application can display to the user or act upon as command input. It gives developers two ways to add speech to their apps: REST APIs or WebSocket-based client libraries.

  • REST APIs: Developers can make HTTP calls from their apps to the service for speech recognition.
  • Client libraries: For advanced features, developers can download the Microsoft Speech client libraries and link them into their apps. The client libraries are available on various platforms (Windows, Android, iOS) in different languages (C#, Java, JavaScript, Objective-C). Unlike the REST APIs, the client libraries use a WebSocket-based protocol.

| Use case | REST APIs | Client libraries |
| --- | --- | --- |
| Convert short spoken audio, for example commands (audio length < 15 s), without interim results | Yes | Yes |
| Convert long audio (> 15 s) | No | Yes |
| Stream audio with interim results | No | Yes |
| Understand the text converted from audio using LUIS | No | Yes |
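
The use-case comparison above can be read as a simple decision rule. The following Python sketch encodes it; the `RecognitionNeeds` type and the `supported_interfaces` function are hypothetical names introduced for illustration and are not part of any Microsoft SDK. The comparison does not say what happens at exactly 15 seconds, so the sketch conservatively treats that case as long audio.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RecognitionNeeds:
    """Hypothetical description of what an app needs from speech recognition."""
    audio_seconds: float            # expected length of the audio clip
    interim_results: bool = False   # partial results while audio is streaming
    luis_intents: bool = False      # intent/entity extraction through LUIS


def supported_interfaces(needs: RecognitionNeeds) -> list[str]:
    """Return the interfaces that the comparison marks as suitable."""
    # Every row of the comparison lists the client libraries as "Yes".
    interfaces = ["client library"]
    # REST is listed only for short audio (< 15 s) without interim results
    # and without LUIS; exactly 15 s is treated as long audio (assumption).
    rest_ok = (
        needs.audio_seconds < 15
        and not needs.interim_results
        and not needs.luis_intents
    )
    if rest_ok:
        interfaces.insert(0, "REST API")
    return interfaces


print(supported_interfaces(RecognitionNeeds(audio_seconds=5)))
print(supported_interfaces(RecognitionNeeds(audio_seconds=60)))
```

A short voice command with no streaming requirement qualifies for either interface, while anything longer or interactive falls through to the client libraries only.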

Whichever approach developers choose (REST APIs or client libraries), the Microsoft speech service supports the following:

  • Advanced speech recognition technologies from Microsoft that are used by Cortana, Office Dictation, Office Translator, and other Microsoft products.
  • Real-time continuous recognition. The speech recognition API lets users transcribe audio into text in real time and can return intermediate results for the words recognized so far. The speech service also supports end-of-speech detection. In addition, users can choose extra formatting capabilities, such as capitalization and punctuation, profanity masking, and text normalization.
  • Optimized speech recognition results for interactive, conversation, and dictation scenarios. For user scenarios that require customized language models and acoustic models, Custom Speech Service lets you create speech models tailored to your application and your users.
  • Support for many spoken languages in multiple dialects. For the full list of supported languages in each recognition mode, see recognition languages.
  • Integration with language understanding. Besides converting the input audio into text, Speech to Text gives applications an additional capability to understand what the text means. It uses the Language Understanding Intelligent Service (LUIS) to extract intents and entities from the recognized text.
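
Because the REST option is a plain HTTP call and recognition is tuned per scenario, a client typically has to pick a recognition mode and a language when it composes a request. The Python sketch below shows one way to validate those choices before building a URL. It is only an illustration: the URL layout, the `language` query parameter, the placeholder host, and the function name `build_recognition_url` are all assumptions, not the documented endpoint format, so check the API reference before relying on any of them.

```python
from urllib.parse import urlencode

# The three scenarios named in the list above.
RECOGNITION_MODES = ("interactive", "conversation", "dictation")


def build_recognition_url(base_url: str, mode: str, language: str) -> str:
    """Compose a request URL; the path and query layout are assumed."""
    if mode not in RECOGNITION_MODES:
        raise ValueError(f"unknown recognition mode: {mode!r}")
    # Hypothetical layout: <base>/<mode>?language=<tag>
    query = urlencode({"language": language})
    return f"{base_url.rstrip('/')}/{mode}?{query}"


# Placeholder host; substitute the endpoint from the service documentation.
url = build_recognition_url(
    "https://speech.example.invalid/recognition", "dictation", "en-US"
)
print(url)
```

Rejecting an unknown mode locally avoids a round trip to the service for a request that could never succeed.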

Next steps

Text to speech (speech synthesis)

The Text to Speech APIs use REST to convert structured text to an audio stream. The APIs provide fast text-to-speech conversion in various voices and languages. In addition, users can change audio characteristics such as pronunciation, volume, and pitch by using SSML tags.
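
As a rough illustration of the structured text involved, the Python sketch below builds a small SSML document with the standard library. The `speak`, `voice`, and `prosody` elements and the `pitch` and `volume` attributes come from the W3C SSML specification; the function name `build_ssml` and the voice name `ExampleVoice` are hypothetical, and a real service may require additional attributes or specific voice names that are not shown here.

```python
import xml.etree.ElementTree as ET

# Namespace URI behind the reserved "xml:" prefix (used for xml:lang).
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_ssml(text, voice_name, lang="en-US", pitch="medium", volume="medium"):
    """Return an SSML string that wraps `text` in voice and prosody elements."""
    speak = ET.Element("speak", {"version": "1.0", f"{{{XML_NS}}}lang": lang})
    # The voice name is a placeholder; valid names depend on the service.
    voice = ET.SubElement(speak, "voice", {"name": voice_name})
    # prosody carries the audio characteristics (pitch, volume).
    prosody = ET.SubElement(voice, "prosody", {"pitch": pitch, "volume": volume})
    prosody.text = text  # ElementTree escapes &, <, > on serialization
    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml("Fish & chips", "ExampleVoice", pitch="high", volume="loud")
print(ssml)
```

Building the markup with an XML library rather than string concatenation keeps user-supplied text from breaking the document when it contains characters such as `&` or `<`.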

Next steps