speech Package

Microsoft Speech SDK for Python

Modules

audio

Classes that are concerned with the handling of audio input to the various recognizers, and audio output from the speech synthesizer.

dialog

Classes related to dialog service connector.

intent

Classes related to intent recognition from speech.

languageconfig

Classes that are concerned with the handling of language configurations.

speech

Classes related to recognizing text from speech, synthesizing speech from text, and general classes used in the various recognizers.

speech_py_impl

transcription

Classes related to conversation transcription.

translation

Classes related to translation of speech to other languages.

version

Classes

AudioDataStream

Represents audio data stream used for operating audio data as a stream.

Generates an audio data stream from a speech synthesis result (type SpeechSynthesisResult) or a keyword recognition result (type KeywordRecognitionResult).

AutoDetectSourceLanguageResult

Represents the result of automatic source language detection.

The result can be initialized from a speech recognition result.

CancellationDetails

Contains detailed information about why a result was canceled.

Connection

Proxy class for managing the connection to the speech service of the specified Recognizer.

By default, a Recognizer autonomously manages the connection to the service when needed. The Connection class provides additional methods for users to explicitly open or close a connection and to subscribe to connection status changes. Use of Connection is optional; it is intended for scenarios where fine-tuning of application behavior based on connection status is needed. Users can optionally call open to manually initiate a service connection before starting recognition on the Recognizer associated with this Connection. After starting a recognition, calling open or close might fail; this will not impact the Recognizer or the ongoing recognition. The connection might drop for various reasons; the Recognizer will always try to re-establish the connection as required to guarantee ongoing operations. In all these cases connected/disconnected events will indicate the change of the connection status.

Note

Updated in version 1.17.0.

ConnectionEventArgs

Provides data for the ConnectionEvent.

Note

Added in version 1.2.0.

EventSignal

Clients can connect to the event signal to receive events, or disconnect from the event signal to stop receiving events.

KeywordRecognitionEventArgs

Class for keyword recognition event arguments.

KeywordRecognitionModel

Represents a keyword recognition model.

KeywordRecognitionResult

Result of a keyword recognition operation.

KeywordRecognizer

A keyword recognizer.

NoMatchDetails

Detailed information for NoMatch recognition results.

PhraseListGrammar

Class that allows runtime addition of phrase hints to aid in speech recognition.

Phrases added to the recognizer are effective at the start of the next recognition, or the next time the speech recognizer must reconnect to the speech service.

Note

Added in version 1.5.0.

PronunciationAssessmentConfig

Represents a pronunciation assessment configuration.

Note

Added in version 1.14.0.

The configuration can be initialized in two ways:

  • from parameters: pass the reference text, grading system, granularity, enable-miscue flag, and scenario id.

  • from json: pass a JSON string.

For the parameters details, see https://docs.microsoft.com/azure/cognitive-services/speech-service/rest-speech-to-text#pronunciation-assessment-parameters

PronunciationAssessmentPhonemeResult

Contains the phoneme-level pronunciation assessment result.

Note

Added in version 1.14.0.

PronunciationAssessmentResult

Represents pronunciation assessment result.

Note

Added in version 1.14.0.

The result can be initialized from a speech recognition result.

PronunciationAssessmentWordResult

Contains the word-level pronunciation assessment result.

Note

Added in version 1.14.0.

PropertyCollection

Class to retrieve or set a property value from a property collection.

RecognitionEventArgs

Provides data for the RecognitionEvent.

RecognitionResult

Detailed information about the result of a recognition operation.

Recognizer

Base class for different recognizers.

ResultFuture

The result of an asynchronous operation.

SessionEventArgs

Base class for session event arguments.

SourceLanguageRecognizer

A source language recognizer: a standalone language recognizer that can be used for single-shot or continuous language detection.

Note

Added in version 1.18.0.

SpeechConfig

Class that defines configurations for speech / intent recognition and speech synthesis.

The configuration can be initialized in different ways:

  • from subscription: pass a subscription key and a region

  • from endpoint: pass an endpoint. Subscription key or authorization token are optional.

  • from host: pass a host address. Subscription key or authorization token are optional.

  • from authorization token: pass an authorization token and a region

SpeechRecognitionCanceledEventArgs

Class for speech recognition canceled event arguments.

SpeechRecognitionEventArgs

Class for speech recognition event arguments.

SpeechRecognitionResult

Base class for speech recognition results.

SpeechRecognizer

A speech recognizer. To specify source language information, provide only one of these three parameters: language, source_language_config, or auto_detect_source_language_config.

SpeechSynthesisBookmarkEventArgs

Class for speech synthesis bookmark event arguments.

Note

Added in version 1.16.0.

SpeechSynthesisCancellationDetails

Contains detailed information about why a speech synthesis result was canceled.

SpeechSynthesisEventArgs

Class for speech synthesis event arguments.

SpeechSynthesisResult

Result of a speech synthesis operation.

SpeechSynthesisVisemeEventArgs

Class for speech synthesis viseme event arguments.

Note

Added in version 1.16.0.

SpeechSynthesisWordBoundaryEventArgs

Class for speech synthesis word boundary event arguments.

Note

Updated in version 1.21.0.

SpeechSynthesizer

A speech synthesizer.

SyllableLevelTimingResult

Contains the syllable-level timing result.

Note

Added in version 1.20.0.

SynthesisVoicesResult

Contains detailed information about the retrieved synthesis voices list.

Note

Added in version 1.16.0.

VoiceInfo

Contains detailed information about a synthesis voice.

Note

Updated in version 1.17.0.

Enums

AudioStreamContainerFormat

Supported audio input container formats.

Note

Added in version 1.13.0.

Values:

CancellationErrorCode

Defines the error code in case CancellationReason is Error.

Values:

CancellationReason

Defines the possible reasons a recognition result might be canceled.

Values:

NoMatchReason

Defines the possible reasons a recognition result might not be recognized.

Values:

OutputFormat

Values:

ProfanityOption

Defines the setting for the profanity filter.

Note

Added in version 1.5.0.

Values:

PronunciationAssessmentGradingSystem

Defines the point system for pronunciation score calibration; default value is FivePoint.

Note

Added in version 1.14.0.

Values:

PronunciationAssessmentGranularity

Defines the pronunciation evaluation granularity; default value is Phoneme.

Note

Added in version 1.14.0.

Values:

PropertyId

Defines speech property ids.

Values:

SpeechServiceConnection_Key

  The Cognitive Services Speech Service subscription key. If you are using an intent recognizer, you need to specify the LUIS endpoint key for your particular LUIS app. Under normal circumstances, you shouldn't have to use this property directly. Instead, construct a <xref:azure.cognitiveservices.speech.SpeechConfig> instance from a subscription key.

SpeechServiceConnection_Endpoint

  The Cognitive Services Speech Service endpoint (url). Under normal circumstances, you shouldn't have to use this property directly. Instead, construct a <xref:azure.cognitiveservices.speech.SpeechConfig> instance from a subscription key.


  > [!NOTE]
  > This endpoint is not the same as the endpoint used to obtain an access token.
  >

SpeechServiceConnection_Region

  The Cognitive Services Speech Service region. Under normal circumstances, you shouldn't have to use this property directly. Instead, construct a <xref:azure.cognitiveservices.speech.SpeechConfig> instance from a subscription key, an endpoint, a host, or an authorization token.

SpeechServiceAuthorization_Token

  The Cognitive Services Speech Service authorization token (aka access token). Under normal circumstances, you shouldn't have to use this property directly. Instead, construct a <xref:azure.cognitiveservices.speech.SpeechConfig> instance from an authorization token, or set <xref:azure.cognitiveservices.speech.Recognizer.authorization_token>.

SpeechServiceAuthorization_Type

  The Cognitive Services Speech Service authorization type. Currently unused.

SpeechServiceConnection_EndpointId

  The Cognitive Services Custom Speech or Custom Voice Service endpoint id. Under normal circumstances, you shouldn't have to use this property directly. Instead set <xref:azure.cognitiveservices.speech.SpeechConfig.endpoint_id>.


  > [!NOTE]
  > The endpoint id is available in the Custom Speech Portal, listed under Endpoint Details.
  >

SpeechServiceConnection_Host

  The Cognitive Services Speech Service host (url). Under normal circumstances, you shouldn't have to use this property directly. Instead, construct a <xref:azure.cognitiveservices.speech.SpeechConfig> instance.

SpeechServiceConnection_ProxyHostName

  The host name of the proxy server used to connect to the Cognitive Services Speech Service. Under normal circumstances, you shouldn't have to use this property directly. Instead, use <xref:azure.cognitiveservices.speech.SpeechConfig.set_proxy>.

SpeechServiceConnection_ProxyPort

  The port of the proxy server used to connect to the Cognitive Services Speech Service. Under normal circumstances, you shouldn't have to use this property directly. Instead, use <xref:azure.cognitiveservices.speech.SpeechConfig.set_proxy>.

SpeechServiceConnection_ProxyUserName

  The user name of the proxy server used to connect to the Cognitive Services Speech Service. Under normal circumstances, you shouldn't have to use this property directly. Instead, use <xref:azure.cognitiveservices.speech.SpeechConfig.set_proxy>.

SpeechServiceConnection_ProxyPassword

  The password of the proxy server used to connect to the Cognitive Services Speech Service. Under normal circumstances, you shouldn't have to use this property directly. Instead, use <xref:azure.cognitiveservices.speech.SpeechConfig.set_proxy>.

SpeechServiceConnection_Url

  The URL string built from the speech configuration. This property is intended to be read-only. The SDK uses it internally.


  > [!NOTE]
  > This property id was added in version 1.5.0.
  >

SpeechServiceConnection_TranslationToLanguages

  The list of comma-separated languages used as target translation languages. Under normal circumstances, you shouldn't have to use this property directly. Instead use <xref:azure.cognitiveservices.speech.speech_py_impl.SpeechTranslationConfig.add_target_language> and <xref:azure.cognitiveservices.speech.translation.SpeechTranslationConfig.target_languages>.

SpeechServiceConnection_TranslationVoice

  The name of the Cognitive Service Text to Speech Service voice. Under normal circumstances, you shouldn't have to use this property directly. Instead set <xref:azure.cognitiveservices.speech.translation.SpeechTranslationConfig.voice_name>.


  > [!NOTE]
  > Valid voice names can be found [here](https://aka.ms/csspeech/voicenames).
  >

SpeechServiceConnection_TranslationFeatures

  Translation features. For internal use.

SpeechServiceConnection_IntentRegion

  The Language Understanding Service region. Under normal circumstances, you shouldn't have to use this property directly. Instead use <xref:azure.cognitiveservices.speech.intent.LanguageUnderstandingModel>.

SpeechServiceConnection_RecoMode

  The Cognitive Services Speech Service recognition mode. Can be "INTERACTIVE", "CONVERSATION", or "DICTATION". This property is intended to be read-only. The SDK uses it internally.

SpeechServiceConnection_RecoLanguage

  The spoken language to be recognized (in BCP-47 format). Under normal circumstances, you shouldn't have to use this property directly. Instead, use <xref:azure.cognitiveservices.speech.SpeechConfig.speech_recognition_language>.

Speech_SessionId

  The session id. This id is a universally unique identifier (UUID) representing a specific binding of an audio input stream and the underlying speech recognition instance to which it is bound. Under normal circumstances, you shouldn't have to use this property directly. Instead use <xref:azure.cognitiveservices.speech.SessionEventArgs.session_id>.

SpeechServiceConnection_SynthLanguage

  The spoken language to be synthesized (e.g. en-US).


  > [!NOTE]
  > This property id was added in version 1.7.0.
  >

SpeechServiceConnection_SynthVoice

  The name of the TTS voice to be used for speech synthesis.


  > [!NOTE]
  > This property id was added in version 1.7.0.
  >

SpeechServiceConnection_SynthOutputFormat

  The string to specify the TTS output audio format.


  > [!NOTE]
  > This property id was added in version 1.7.0.
  >

ResultReason

Specifies the possible reasons a recognition result might be generated.

Values:

ServicePropertyChannel

Defines channels used to pass property settings to service.

Note

Added in version 1.5.0.

Values:

SpeechSynthesisOutputFormat

Defines the possible speech synthesis output audio formats.

Note

Updated in version 1.17.0.

Values:

StreamStatus

Defines the possible status of an audio data stream.

Note

Added in version 1.7.0.

Values:

SynthesisVoiceGender

Defines synthesis voice gender.

Note

Added in version 1.17.0.

Values:

SynthesisVoiceType

Defines synthesis voice type.

Note

Added in version 1.16.0.

Values: