Namespace Microsoft::CognitiveServices::Speech

Summary

Members Descriptions
enum PropertyId Defines speech property ids. Changed in version 1.4.0.
enum OutputFormat
enum ProfanityOption
enum ResultReason Specifies the possible reasons a recognition result might be generated.
enum CancellationReason Defines the possible reasons a recognition result might be canceled.
enum CancellationErrorCode Defines the error code reported when CancellationReason is Error. Added in version 1.1.0.
enum NoMatchReason Defines the possible reasons a recognition result might not be recognized.
enum ActivityJSONType Defines the possible types for an activity JSON value. Added in version 1.5.0.
enum SpeechSynthesisOutputFormat Defines the possible speech synthesis output audio formats. Added in version 1.4.0.
enum StreamStatus Defines the possible status of audio data stream. Added in version 1.4.0.
enum ServicePropertyChannel Defines channels used to pass property settings to service. Added in version 1.5.0.
enum VoiceProfileType Defines voice profile types.
enum RecognitionFactorScope Defines the scope that a Recognition Factor is applied to.
enum EnrollmentInfoType An enum that represents the timing information of an enrollment. Added in version 1.12.0.
class AsyncRecognizer AsyncRecognizer abstract base class.
class AudioDataStream Represents an audio data stream used for operating on audio data as a stream. Added in version 1.4.0.
class AutoDetectSourceLanguageConfig Class that defines auto detection source configuration. Updated in version 1.13.0.
class AutoDetectSourceLanguageResult Contains the auto detected source language result. Added in version 1.8.0.
class BaseAsyncRecognizer BaseAsyncRecognizer class.
class CancellationDetails Contains detailed information about why a result was canceled.
class ClassLanguageModel Represents a list of grammars for dynamic grammar scenarios. Added in version 1.7.0.
class Connection Connection is a proxy class for managing the connection to the speech service of the specified Recognizer. By default, a Recognizer autonomously manages its connection to the service when needed. The Connection class provides additional methods for users to explicitly open or close a connection and to subscribe to connection status changes. The use of Connection is optional. It is intended for scenarios where fine-tuning of application behavior based on connection status is needed. Users can optionally call Open() to manually initiate a service connection before starting recognition on the Recognizer associated with this Connection. After starting a recognition, calling Open() or Close() might fail. This will not impact the Recognizer or the ongoing recognition. The connection might drop for various reasons; the Recognizer will always try to re-establish it as required to guarantee ongoing operations. In all these cases, Connected/Disconnected events will indicate the change of the connection status. Added in version 1.2.0.
class ConnectionEventArgs Provides data for the ConnectionEvent. Added in version 1.2.0.
class ConnectionMessage ConnectionMessage represents implementation specific messages sent to and received from the speech service. These messages are provided for debugging purposes and should not be used for production use cases with the Azure Cognitive Services Speech Service. Messages sent to and received from the Speech Service are subject to change without notice. This includes message contents, headers, payloads, ordering, etc. Added in version 1.10.0.
class ConnectionMessageEventArgs Provides data for the ConnectionMessageEvent
class EventArgs Base class for event arguments.
class EventSignal Clients can connect to the event signal to receive events, or disconnect from the event signal to stop receiving events.
class Grammar Represents base class grammar for customizing speech recognition. Added in version 1.5.0.
class GrammarList Represents a list of grammars for dynamic grammar scenarios. Added in version 1.7.0.
class GrammarPhrase Represents a phrase that may be spoken by the user. Added in version 1.5.0.
class KeywordRecognitionEventArgs Class for the events emitted by the KeywordRecognizer.
class KeywordRecognitionModel Represents keyword recognition model used with StartKeywordRecognitionAsync methods.
class KeywordRecognitionResult Class that defines the results emitted by the KeywordRecognizer.
class KeywordRecognizer Recognizer type that is specialized to only handle keyword activation.
class NoMatchDetails Contains detailed information for NoMatch recognition results.
class PhraseListGrammar Represents a phrase list grammar for dynamic grammar scenarios. Added in version 1.5.0.
class PropertyCollection Class to retrieve or set a property value from a property collection.
class RecognitionEventArgs Provides data for the RecognitionEvent.
class RecognitionResult Contains detailed information about result of a recognition operation.
class Recognizer Recognizer base class.
class SessionEventArgs Base class for session event arguments.
class SmartHandle Smart handle class.
class SourceLanguageConfig Class that defines source language configuration. Added in version 1.8.0.
class SpeakerIdentificationModel Represents the speaker identification model used with the speaker recognition class. Added in version 1.12.0.
class SpeakerRecognitionCancellationDetails Represents the details of a canceled speaker recognition result.
class SpeakerRecognitionResult Represents a speaker recognition result. Added in version 1.12.0.
class SpeakerRecognizer Performs speaker recognition. Added in version 1.12.0.
class SpeakerVerificationModel Represents the speaker verification model used with the speaker recognition class. Added in version 1.12.0.
class SpeechConfig Class that defines configurations for speech / intent recognition, or speech synthesis.
class SpeechRecognitionCanceledEventArgs Class for speech recognition canceled event arguments.
class SpeechRecognitionEventArgs Class for speech recognition event arguments.
class SpeechRecognitionResult Base class for speech recognition results.
class SpeechRecognizer Class for speech recognizers.
class SpeechSynthesisCancellationDetails Contains detailed information about why a result was canceled. Added in version 1.4.0.
class SpeechSynthesisEventArgs Class for speech synthesis event arguments. Added in version 1.4.0.
class SpeechSynthesisResult Contains information about the result from text-to-speech synthesis. Added in version 1.4.0.
class SpeechSynthesisWordBoundaryEventArgs Class for speech synthesis word boundary event arguments. Added in version 1.7.0.
class SpeechSynthesizer Class for speech synthesizer. Updated in version 1.13.0.
class VoiceProfile Class for VoiceProfile. Added in version 1.12.0.
class VoiceProfileCancellationDetails Class for VoiceProfileCancellationDetails. This class represents error details of a voice profile result.
class VoiceProfileClient Class for VoiceProfileClient. This class creates a voice profile client for creating, enrolling, deleting, and resetting a voice profile. Added in version 1.12.0.
class VoiceProfileEnrollmentCancellationDetails Represents the cancellation details of a result of an enrollment. Added in version 1.12.0.
class VoiceProfileEnrollmentResult Represents the result of an enrollment. Added in version 1.12.0.
class VoiceProfileResult Class for VoiceProfileResult. This class represents the result of processing voice profiles. Added in version 1.12.0.

Members

enum PropertyId

Values Descriptions
SpeechServiceConnection_Key The Cognitive Services Speech Service subscription key. If you are using an intent recognizer, you need to specify the LUIS endpoint key for your particular LUIS app. Under normal circumstances, you shouldn't have to use this property directly. Instead, use SpeechConfig::FromSubscription.
SpeechServiceConnection_Endpoint The Cognitive Services Speech Service endpoint (url). Under normal circumstances, you shouldn't have to use this property directly. Instead, use SpeechConfig::FromEndpoint. NOTE: This endpoint is not the same as the endpoint used to obtain an access token.
SpeechServiceConnection_Region The Cognitive Services Speech Service region. Under normal circumstances, you shouldn't have to use this property directly. Instead, use SpeechConfig::FromSubscription, SpeechConfig::FromEndpoint, SpeechConfig::FromHost, SpeechConfig::FromAuthorizationToken.
SpeechServiceAuthorization_Token The Cognitive Services Speech Service authorization token (aka access token). Under normal circumstances, you shouldn't have to use this property directly. Instead, use SpeechConfig::FromAuthorizationToken, SpeechRecognizer::SetAuthorizationToken, IntentRecognizer::SetAuthorizationToken, TranslationRecognizer::SetAuthorizationToken.
SpeechServiceAuthorization_Type The Cognitive Services Speech Service authorization type. Currently unused.
SpeechServiceConnection_EndpointId The Cognitive Services Custom Speech Service endpoint id. Under normal circumstances, you shouldn't have to use this property directly. Instead use SpeechConfig::SetEndpointId. NOTE: The endpoint id is available in the Custom Speech Portal, listed under Endpoint Details.
SpeechServiceConnection_Host The Cognitive Services Speech Service host (url). Under normal circumstances, you shouldn't have to use this property directly. Instead, use SpeechConfig::FromHost.
SpeechServiceConnection_ProxyHostName The host name of the proxy server used to connect to the Cognitive Services Speech Service. Under normal circumstances, you shouldn't have to use this property directly. Instead, use SpeechConfig::SetProxy. NOTE: This property id was added in version 1.1.0.
SpeechServiceConnection_ProxyPort The port of the proxy server used to connect to the Cognitive Services Speech Service. Under normal circumstances, you shouldn't have to use this property directly. Instead, use SpeechConfig::SetProxy. NOTE: This property id was added in version 1.1.0.
SpeechServiceConnection_ProxyUserName The user name of the proxy server used to connect to the Cognitive Services Speech Service. Under normal circumstances, you shouldn't have to use this property directly. Instead, use SpeechConfig::SetProxy. NOTE: This property id was added in version 1.1.0.
SpeechServiceConnection_ProxyPassword The password of the proxy server used to connect to the Cognitive Services Speech Service. Under normal circumstances, you shouldn't have to use this property directly. Instead, use SpeechConfig::SetProxy. NOTE: This property id was added in version 1.1.0.
SpeechServiceConnection_Url The URL string built from speech configuration. This property is intended to be read-only. The SDK is using it internally. NOTE: Added in version 1.5.0.
SpeechServiceConnection_TranslationToLanguages The list of comma separated languages used as target translation languages. Under normal circumstances, you shouldn't have to use this property directly. Instead use SpeechTranslationConfig::AddTargetLanguage and SpeechTranslationConfig::GetTargetLanguages.
SpeechServiceConnection_TranslationVoice The name of the Cognitive Service Text to Speech Service voice. Under normal circumstances, you shouldn't have to use this property directly. Instead use SpeechTranslationConfig::SetVoiceName. NOTE: Valid voice names are listed in the Text to Speech service documentation.
SpeechServiceConnection_TranslationFeatures Translation features. For internal use.
SpeechServiceConnection_IntentRegion The Language Understanding Service region. Under normal circumstances, you shouldn't have to use this property directly. Instead use LanguageUnderstandingModel.
SpeechServiceConnection_RecoMode The Cognitive Services Speech Service recognition mode. Can be "INTERACTIVE", "CONVERSATION", "DICTATION". This property is intended to be read-only. The SDK is using it internally.
SpeechServiceConnection_RecoLanguage The spoken language to be recognized (in BCP-47 format). Under normal circumstances, you shouldn't have to use this property directly. Instead, use SpeechConfig::SetSpeechRecognitionLanguage.
Speech_SessionId The session id. This id is a universally unique identifier (aka UUID) representing a specific binding of an audio input stream and the underlying speech recognition instance to which it is bound. Under normal circumstances, you shouldn't have to use this property directly. Instead use SessionEventArgs::SessionId.
SpeechServiceConnection_UserDefinedQueryParameters The query parameters provided by users. They will be passed to the service as URL query parameters. Added in version 1.5.0.
SpeechServiceConnection_SynthLanguage The spoken language to be synthesized (e.g. en-US). Added in version 1.4.0.
SpeechServiceConnection_SynthVoice The name of the TTS voice to be used for speech synthesis. Added in version 1.4.0.
SpeechServiceConnection_SynthOutputFormat The string to specify the TTS output audio format. Added in version 1.4.0.
SpeechServiceConnection_InitialSilenceTimeoutMs The initial silence timeout value (in milliseconds) used by the service. Added in version 1.5.0.
SpeechServiceConnection_EndSilenceTimeoutMs The end silence timeout value (in milliseconds) used by the service. Added in version 1.5.0.
SpeechServiceConnection_EnableAudioLogging A boolean value specifying whether audio logging is enabled in the service or not. Added in version 1.5.0.
SpeechServiceConnection_AutoDetectSourceLanguages The auto detect source languages. Added in version 1.8.0.
SpeechServiceConnection_AutoDetectSourceLanguageResult The auto detect source language result. Added in version 1.8.0.
SpeechServiceResponse_RequestDetailedResultTrueFalse The requested Cognitive Services Speech Service response output format (simple or detailed). Under normal circumstances, you shouldn't have to use this property directly. Instead use SpeechConfig::SetOutputFormat.
SpeechServiceResponse_RequestProfanityFilterTrueFalse The requested Cognitive Services Speech Service response output profanity level. Currently unused.
SpeechServiceResponse_ProfanityOption The requested Cognitive Services Speech Service response output profanity setting. Allowed values are "masked", "removed", and "raw". Added in version 1.5.0.
SpeechServiceResponse_PostProcessingOption A string value specifying which post processing option should be used by the service. Allowed values are "TrueText". Added in version 1.5.0.
SpeechServiceResponse_RequestWordLevelTimestamps A boolean value specifying whether to include word-level timestamps in the response result. Added in version 1.5.0.
SpeechServiceResponse_StablePartialResultThreshold The number of times a word has to be in partial results to be returned. Added in version 1.5.0.
SpeechServiceResponse_OutputFormatOption A string value specifying the output format option in the response result. Internal use only. Added in version 1.5.0.
SpeechServiceResponse_TranslationRequestStablePartialResult A boolean value to request for stabilizing translation partial results by omitting words in the end. Added in version 1.5.0.
SpeechServiceResponse_JsonResult The Cognitive Services Speech Service response output (in JSON format). This property is available on recognition result objects only.
SpeechServiceResponse_JsonErrorDetails The Cognitive Services Speech Service error details (in JSON format). Under normal circumstances, you shouldn't have to use this property directly. Instead, use CancellationDetails::ErrorDetails.
SpeechServiceResponse_RecognitionLatencyMs The recognition latency in milliseconds. Read-only, available on final speech/translation/intent results. This measures the latency between when an audio input is received by the SDK, and the moment the final result is received from the service. The SDK computes the time difference between the last audio fragment from the audio input that is contributing to the final result, and the time the final result is received from the speech service. Added in version 1.3.0.
CancellationDetails_Reason The cancellation reason. Currently unused.
CancellationDetails_ReasonText The cancellation text. Currently unused.
CancellationDetails_ReasonDetailedText The cancellation detailed text. Currently unused.
LanguageUnderstandingServiceResponse_JsonResult The Language Understanding Service response output (in JSON format). Available via IntentRecognitionResult.Properties.
AudioConfig_DeviceNameForCapture The device name for audio capture. Under normal circumstances, you shouldn't have to use this property directly. Instead, use AudioConfig::FromMicrophoneInput. NOTE: This property id was added in version 1.3.0.
AudioConfig_NumberOfChannelsForCapture The number of channels for audio capture. Internal use only. NOTE: This property id was added in version 1.3.0.
AudioConfig_SampleRateForCapture The sample rate (in Hz) for audio capture. Internal use only. NOTE: This property id was added in version 1.3.0.
AudioConfig_BitsPerSampleForCapture The number of bits of each sample for audio capture. Internal use only. NOTE: This property id was added in version 1.3.0.
AudioConfig_AudioSource The audio source. Allowed values are "Microphones", "File", and "Stream". Added in version 1.3.0.
Speech_LogFilename The file name to write logs. Added in version 1.4.0.
Conversation_ApplicationId Identifier used to connect to the backend service. Added in version 1.5.0.
Conversation_DialogType Type of dialog backend to connect to. Added in version 1.7.0.
Conversation_Initial_Silence_Timeout Silence timeout for listening. Added in version 1.5.0.
Conversation_From_Id From id to be used on speech recognition activities. Added in version 1.5.0.
Conversation_Conversation_Id ConversationId for the session. Added in version 1.8.0.
Conversation_Custom_Voice_Deployment_Ids Comma separated list of custom voice deployment ids. Added in version 1.8.0.
Conversation_Speech_Activity_Template Speech activity template, stamp properties in the template on the activity generated by the service for speech. Added in version 1.10.0.
Conversation_ParticipantId Your participant identifier in the current conversation. Added in version 1.13.0.
DataBuffer_TimeStamp The time stamp associated with the data buffer written by the client when using Pull/Push audio input streams. The time stamp is a 64-bit value with a resolution of 90 kHz. It is the same as the presentation timestamp in an MPEG transport stream. See https://en.wikipedia.org/wiki/Presentation_timestamp. Added in version 1.5.0.
DataBuffer_UserId The user id associated to data buffer written by client when using Pull/Push audio input streams. Added in version 1.5.0.

Defines speech property ids. Changed in version 1.4.0.
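
The 90 kHz resolution described for DataBuffer_TimeStamp amounts to 90 ticks per millisecond. The following is a minimal sketch of that unit conversion only. It does not use the Speech SDK, and the helper names MillisecondsToPtsTicks and PtsTicksToMilliseconds are hypothetical, introduced here for illustration.

```cpp
#include <cstdint>

// Hypothetical helpers, not part of the Speech SDK.
// A 90 kHz clock advances 90,000 ticks per second, i.e. 90 ticks per millisecond.
constexpr std::uint64_t kTicksPerMillisecond = 90;

// Converts a time offset in milliseconds to 90 kHz ticks.
constexpr std::uint64_t MillisecondsToPtsTicks(std::uint64_t milliseconds)
{
    return milliseconds * kTicksPerMillisecond;
}

// Converts 90 kHz ticks back to whole milliseconds (the remainder is truncated).
constexpr std::uint64_t PtsTicksToMilliseconds(std::uint64_t ticks)
{
    return ticks / kTicksPerMillisecond;
}
```

For example, one second of audio corresponds to MillisecondsToPtsTicks(1000), which is 90000 ticks.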

enum OutputFormat

Values Descriptions
Simple
Detailed

enum ProfanityOption

Values Descriptions
Masked
Removed
Raw

enum ResultReason

Values Descriptions
NoMatch Indicates speech could not be recognized. More details can be found in the NoMatchDetails object.
Canceled Indicates that the recognition was canceled. More details can be found using the CancellationDetails object.
RecognizingSpeech Indicates the speech result contains hypothesis text.
RecognizedSpeech Indicates the speech result contains final text that has been recognized. Speech Recognition is now complete for this phrase.
RecognizingIntent Indicates the intent result contains hypothesis text and intent.
RecognizedIntent Indicates the intent result contains final text and intent. Speech Recognition and Intent determination are now complete for this phrase.
TranslatingSpeech Indicates the translation result contains hypothesis text and its translation(s).
TranslatedSpeech Indicates the translation result contains final text and corresponding translation(s). Speech Recognition and Translation are now complete for this phrase.
SynthesizingAudio Indicates the synthesized audio result contains a non-zero amount of audio data.
SynthesizingAudioCompleted Indicates the synthesized audio is now complete for this phrase.
RecognizingKeyword Indicates the speech result contains (unverified) keyword text. Added in version 1.3.0.
RecognizedKeyword Indicates that keyword recognition completed recognizing the given keyword. Added in version 1.3.0.
SynthesizingAudioStarted Indicates the speech synthesis has now started. Added in version 1.4.0.
TranslatingParticipantSpeech Indicates the transcription result contains hypothesis text and its translation(s) for other participants in the conversation. Added in version 1.8.0.
TranslatedParticipantSpeech Indicates the transcription result contains final text and corresponding translation(s) for other participants in the conversation. Speech Recognition and Translation are now complete for this phrase. Added in version 1.8.0.
TranslatedInstantMessage Indicates the transcription result contains the instant message and corresponding translation(s). Added in version 1.8.0.
TranslatedParticipantInstantMessage Indicates the transcription result contains the instant message for other participants in the conversation and corresponding translation(s). Added in version 1.8.0.
EnrollingVoiceProfile Indicates the voice profile is being enrolled and the customer needs to send more audio to create a voice profile. Added in version 1.12.0.
EnrolledVoiceProfile Indicates the voice profile has been enrolled. Added in version 1.12.0.
RecognizedSpeakers Indicates successful identification of some speakers. Added in version 1.12.0.
RecognizedSpeaker Indicates successful verification of one speaker. Added in version 1.12.0.
ResetVoiceProfile Indicates a voice profile has been reset successfully. Added in version 1.12.0.
DeletedVoiceProfile Indicates a voice profile has been deleted successfully. Added in version 1.12.0.

Specifies the possible reasons a recognition result might be generated.
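
The table above distinguishes reasons that carry hypothesis text (Recognizing..., Translating...) from reasons that carry final text (Recognized..., Translated...). The sketch below shows one way application code might branch on that distinction. To stay compilable without the SDK it declares a local stand-in enum with a subset of the enumerators; the stand-in's order and underlying values are assumptions and are not meant to match the SDK header, and the helpers IsHypothesis and IsFinalText are hypothetical.

```cpp
// Local stand-in for a subset of ResultReason, declared only so this sketch
// compiles on its own. Order and underlying values are NOT taken from the SDK.
enum class ResultReason
{
    NoMatch,
    Canceled,
    RecognizingSpeech,
    RecognizedSpeech,
    TranslatingSpeech,
    TranslatedSpeech
};

// Hypothetical helper: true when the result carries hypothesis (interim) text.
constexpr bool IsHypothesis(ResultReason reason)
{
    return reason == ResultReason::RecognizingSpeech
        || reason == ResultReason::TranslatingSpeech;
}

// Hypothetical helper: true when the result carries final text for the phrase.
constexpr bool IsFinalText(ResultReason reason)
{
    return reason == ResultReason::RecognizedSpeech
        || reason == ResultReason::TranslatedSpeech;
}
```

NoMatch and Canceled fall into neither group; per the table, their details are found in the NoMatchDetails and CancellationDetails objects.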

enum CancellationReason

Values Descriptions
Error Indicates that an error occurred during speech recognition.
EndOfStream Indicates that the end of the audio stream was reached.

Defines the possible reasons a recognition result might be canceled.

enum CancellationErrorCode

Values Descriptions
NoError No error. If CancellationReason is EndOfStream, CancellationErrorCode is set to NoError.
AuthenticationFailure Indicates an authentication error. An authentication error occurs if subscription key or authorization token is invalid, expired, or does not match the region being used.
BadRequest Indicates that one or more recognition parameters are invalid or the audio format is not supported.
TooManyRequests Indicates that the number of parallel requests exceeded the number of allowed concurrent transcriptions for the subscription.
Forbidden Indicates that the free subscription used by the request ran out of quota.
ConnectionFailure Indicates a connection error.
ServiceTimeout Indicates a time-out error when waiting for response from service.
ServiceError Indicates that an error is returned by the service.
ServiceUnavailable Indicates that the service is currently unavailable.
RuntimeError Indicates an unexpected runtime error.
ServiceRedirectTemporary Indicates the Speech Service is temporarily requesting a reconnect to a different endpoint.
ServiceRedirectPermanent Indicates the Speech Service is permanently requesting a reconnect to a different endpoint.

Defines the error code reported when CancellationReason is Error. Added in version 1.1.0.
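
The relationship between CancellationReason and CancellationErrorCode described above can be sketched as follows: the error code is only informative when the reason is Error, and EndOfStream pairs with NoError. The enums below are local stand-ins covering a subset of the enumerators so that the sketch compiles without the SDK; their order and underlying values are assumptions. The helpers ShouldInspectErrorCode and IsRedirect are hypothetical.

```cpp
// Local stand-ins, declared only so this sketch compiles on its own.
// Order and underlying values are NOT taken from the SDK.
enum class CancellationReason
{
    Error,
    EndOfStream
};

enum class CancellationErrorCode
{
    NoError,
    ConnectionFailure,
    ServiceTimeout,
    ServiceRedirectTemporary,
    ServiceRedirectPermanent
};

// Hypothetical helper: the error code is only worth inspecting when the
// cancellation reason is Error; EndOfStream is documented to carry NoError.
constexpr bool ShouldInspectErrorCode(CancellationReason reason)
{
    return reason == CancellationReason::Error;
}

// Hypothetical helper: true when the service asks for a reconnect to a
// different endpoint, either temporarily or permanently.
constexpr bool IsRedirect(CancellationErrorCode code)
{
    return code == CancellationErrorCode::ServiceRedirectTemporary
        || code == CancellationErrorCode::ServiceRedirectPermanent;
}
```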

enum NoMatchReason

Values Descriptions
NotRecognized Indicates that speech was detected, but not recognized.
InitialSilenceTimeout Indicates that the start of the audio stream contained only silence, and the service timed out waiting for speech.
InitialBabbleTimeout Indicates that the start of the audio stream contained only noise, and the service timed out waiting for speech.
KeywordNotRecognized Indicates that the spotted keyword has been rejected by the keyword verification service. Added in version 1.5.0.

Defines the possible reasons a recognition result might not be recognized.

enum ActivityJSONType

Values Descriptions
Null
Object
Array
String
Double
UInt
Int
Boolean

Defines the possible types for an activity JSON value. Added in version 1.5.0.

enum SpeechSynthesisOutputFormat

Values Descriptions
Raw8Khz8BitMonoMULaw raw-8khz-8bit-mono-mulaw
Riff16Khz16KbpsMonoSiren riff-16khz-16kbps-mono-siren
Audio16Khz16KbpsMonoSiren audio-16khz-16kbps-mono-siren
Audio16Khz32KBitRateMonoMp3 audio-16khz-32kbitrate-mono-mp3
Audio16Khz128KBitRateMonoMp3 audio-16khz-128kbitrate-mono-mp3
Audio16Khz64KBitRateMonoMp3 audio-16khz-64kbitrate-mono-mp3
Audio24Khz48KBitRateMonoMp3 audio-24khz-48kbitrate-mono-mp3
Audio24Khz96KBitRateMonoMp3 audio-24khz-96kbitrate-mono-mp3
Audio24Khz160KBitRateMonoMp3 audio-24khz-160kbitrate-mono-mp3
Raw16Khz16BitMonoTrueSilk raw-16khz-16bit-mono-truesilk
Riff16Khz16BitMonoPcm riff-16khz-16bit-mono-pcm
Riff8Khz16BitMonoPcm riff-8khz-16bit-mono-pcm
Riff24Khz16BitMonoPcm riff-24khz-16bit-mono-pcm
Riff8Khz8BitMonoMULaw riff-8khz-8bit-mono-mulaw
Raw16Khz16BitMonoPcm raw-16khz-16bit-mono-pcm
Raw24Khz16BitMonoPcm raw-24khz-16bit-mono-pcm
Raw8Khz16BitMonoPcm raw-8khz-16bit-mono-pcm

Defines the possible speech synthesis output audio formats. Added in version 1.4.0
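
For the uncompressed PCM entries, the enumerator name encodes the sample rate, bit depth, and channel count, which is enough to estimate the audio payload size. The sketch below assumes that "16Khz" means 16,000 samples per second and that "Mono" means one channel. PcmBytesPerSecond is a hypothetical helper, not an SDK function, and the result excludes any RIFF header bytes.

```cpp
#include <cstdint>

// Hypothetical helper, not part of the Speech SDK.
// Bytes of raw PCM audio per second = sample rate * bytes per sample * channels.
constexpr std::uint32_t PcmBytesPerSecond(std::uint32_t sampleRateHz,
                                          std::uint32_t bitsPerSample,
                                          std::uint32_t channels)
{
    return sampleRateHz * (bitsPerSample / 8) * channels;
}

// Payload rates for three of the PCM formats listed above (assumed parameters).
constexpr std::uint32_t kRaw8Khz16BitMonoPcmRate  = PcmBytesPerSecond(8000, 16, 1);
constexpr std::uint32_t kRaw16Khz16BitMonoPcmRate = PcmBytesPerSecond(16000, 16, 1);
constexpr std::uint32_t kRaw24Khz16BitMonoPcmRate = PcmBytesPerSecond(24000, 16, 1);
```

Under these assumptions, ten seconds of Raw16Khz16BitMonoPcm audio occupy 10 * kRaw16Khz16BitMonoPcmRate bytes. The compressed formats (MP3, Siren, TrueSilk, mu-law) do not follow this formula.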

enum StreamStatus

Values Descriptions
Unknown The audio data stream status is unknown
NoData The audio data stream contains no data
PartialData The audio data stream contains partial data of a speak request
AllData The audio data stream contains all data of a speak request
Canceled The audio data stream was canceled

Defines the possible status of audio data stream. Added in version 1.4.0

enum ServicePropertyChannel

Values Descriptions
UriQueryParameter Uses URI query parameter to pass property settings to service.

Defines channels used to pass property settings to service. Added in version 1.5.0.

enum VoiceProfileType

Values Descriptions
TextIndependentIdentification Text independent speaker identification.
TextDependentVerification Text dependent speaker verification.
TextIndependentVerification Text independent verification.

Defines voice profile types

enum RecognitionFactorScope

Values Descriptions
PartialPhrase A Recognition Factor will apply to grammars that can be referenced as individual partial phrases.

Defines the scope that a Recognition Factor is applied to.

enum EnrollmentInfoType

Values Descriptions
EnrollmentsCount Number of enrollment audios accepted for this profile.
EnrollmentsLength Total length of enrollment audios accepted for this profile.
EnrollmentsSpeechLength Sum of pure speech (the amount of audio after removing silence and non-speech segments) across all profile enrollments.
RemainingEnrollmentsSpeechLength Amount of pure speech (the amount of audio after removing silence and non-speech segments) needed to complete profile enrollment.
RemainingEnrollmentsCount Number of enrollment audios needed to complete profile enrollment.
AudioLength Length of this enrollment audio, in units of 100 nanoseconds.
AudioSpeechLength Length of pure speech (the amount of audio after removing silence and non-speech segments) in this enrollment audio, in units of 100 nanoseconds.

An enum that represents the timing information of an enrollment. Added in version 1.12.0.
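
AudioLength and AudioSpeechLength are expressed in units of one hundred nanoseconds. The sketch below shows one way to convert such a tick count to milliseconds with std::chrono. It does not use the Speech SDK and makes no assumption about how the SDK returns these values; the HundredNanoseconds alias and the HundredNanosecondsToMilliseconds helper are hypothetical names.

```cpp
#include <chrono>
#include <cstdint>
#include <ratio>

// Hypothetical duration type whose tick is 100 ns (10,000,000 ticks per second).
using HundredNanoseconds =
    std::chrono::duration<std::int64_t, std::ratio<1, 10000000>>;

// Hypothetical helper: converts a count of 100 ns ticks to whole milliseconds.
// duration_cast truncates toward zero, so fractional milliseconds are dropped.
constexpr std::int64_t HundredNanosecondsToMilliseconds(std::int64_t ticks)
{
    return std::chrono::duration_cast<std::chrono::milliseconds>(
               HundredNanoseconds(ticks))
        .count();
}
```

For example, 10,000,000 ticks correspond to one second, so HundredNanosecondsToMilliseconds(10000000) yields 1000.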