Glossary (Speech Server)

This content is no longer actively maintained. It is provided as is, for anyone who may still be using these technologies, with no warranties or claims of accuracy with regard to the most recent product version or service release.

  • abandonment rate
    The percentage of callers who hang up before carrying out a task in an automated system.
  • access control list (ACL)
    A list of security protections that apply to an entire object, a set of the object's properties, or an individual property of an object. There are two types of access control lists: discretionary and system.
  • ACL
    See access control list (ACL).
  • activation order
    The order in which Speech Controls and Application Speech Controls are evaluated for activation.
  • active grammar
    A speech or DTMF grammar that is currently active, based on the currently executing element and the scope elements of the currently defined grammars.
  • Active Server Pages (ASP)
    A server-side scripting environment that is used to create Web pages and build Web applications. ASP files contain Hypertext Markup Language (HTML) tags, text, and script commands. ASPs can call Microsoft ActiveX components to perform tasks, such as connecting to a database, performing a business calculation, or creating a printer Web page.
  • activity
    A .NET managed code object that performs a step in a workflow. Activities expose methods that can be invoked by the workflow runtime and expose programmable properties and events. Activities can be grouped into two broad categories: basic activities that perform discrete tasks and composite activities that manage a set of child activities.
  • a-law
    A standard compression algorithm, used in digital communications systems of the European digital hierarchy, to modify and optimize the dynamic range of an analog signal for digitizing.
  • alignment
    The process of matching words in a transcript to words in a .wav file.
  • ambient noise
    The background noise in an area or environment, being a composite of sounds from many sources near and far.
  • amplitude
    A measure of the strength of a signal, such as sound or voltage, determined by the distance from the baseline to the peak of the waveform.
  • Analytics and Tuning Studio
    An integrated set of voice response application analysis tools and reports that can be used to assess and improve the effectiveness of an application and its deployment.
  • ANI
    See automatic number identification (ANI).
  • application error page
    The Web page that runs when an application error occurs.
  • application manifest file
    See manifest file.
  • approximated time block
    A period of time that does not contain boundary time.
  • .asp
    The file name extension that identifies a Web page as an Active Server Page.
  • ASP
    See Active Server Pages (ASP).
  • ASP.NET
    A component of the Microsoft .NET Framework for building, deploying, and running Web applications and distributed applications.
  • .aspx
    The file name extension for ASP.NET Web pages.
  • auto-attendant
    A system or application that replaces the traditional switchboard operator, directing telephone calls to their correct extensions or voice mail. Auto-attendant systems can implement voice prompts, touch-tone menus, or voice recognition features to send calls to their proper destinations.
  • automatic alignment
    The process by which the Prompt Editor matches words in a transcript to words in a .wav file.
  • automatic number identification (ANI)
    A means by which telephone company switches, call centers, and computer telephony gear ascertain the calling party's telephone number.
  • babble
    A stream of speech, expected by the application, that continues beyond a time limit set by the application. For example, if a telephony application user begins a conversation with a colleague instead of responding to the application, and the duration of the user's conversational utterance exceeds the time limit for a response set by the application, the application treats the user's speech as babble.
  • bargein
    The ability of the user to interrupt the system using voice or DTMF input while the application is playing a prompt.
  • CA
    See certificate authority (CA).
  • call answering precedence
    The specified order in which applications are assigned to receive calls that overlap with other applications.
  • call flow
    The set of logical steps that form the user's interactions with the system, defining how the user passes through a series of dialogs.
  • call throttling
    Limiting the number of inbound or outbound calls to a specified maximum.
  • caller ID
    A telephony network service that transmits the caller's telephone number to the called party's telephone equipment during the ringing signal or when the call is being set up but before the call is answered.
  • CamelCase
    The practice of capitalizing each subword within a word that is used in code and that is composed of multiple words or phrases joined without spaces (for example, CamelCaseMethod).
  • capacity planning
    Estimating the space, hardware, software, and connection infrastructure resources that will be needed over some future period of time.
  • CAS
    See Channel Associated Signaling (CAS).
  • certificate authority (CA)
    An issuer of digital certificates, the cyberspace equivalent of identity cards.
  • .cfg
    The file name extension for context-free grammar files.
  • CFG
    See context-free grammar (CFG).
  • Channel Associated Signaling (CAS)
    A form of digital communication signaling. As with most telecommunication signaling methods, it uses routing information to direct the payload of voice or data to its destination.
  • CIM
    See Common Information Model (CIM).
  • closed time block
    A period of time that is bounded by a start and an end time.
  • close-talk microphone
    A standard type of microphone often used in headsets and other devices in which the user speaks directly into the microphone.
  • code-behind
    For ASP.NET pages, code that is contained within a separate class file, allowing separation of HTML from presentation logic.
  • code-behind file
    A code file containing the page class that implements the program logic of a Web Forms or ASP.NET mobile Web Forms application.
  • code-behind page
    See code-behind file.
  • codec
    An abbreviation for compressor/decompressor. Software or hardware used to compress or decompress digital media.
  • Common Information Model (CIM)
    A standard designed by the Distributed Management Task Force (DMTF) to allow multiple parties to exchange system, network, application, and service management information.
  • computer accent
    The non-human quality of the speech generated by a TTS engine.
  • Computer Supported Telecommunications Applications (CSTA)
    A set of API calls that provide an international standard interface between network servers and telephone switches.
  • Computer Telephony Integration (CTI)
    The enabling of computer applications to integrate and control telephony functions.
  • concept recognition model
    The statistical language model that is generated for use in conversational grammars.
  • confidence score
    A value indicating the likelihood that the word or phrase recognized by the speech engine matches the word or phrase actually uttered by the speaker.
  • confirmation
    An acknowledgement that the system has heard a user's response.
  • confirmation threshold
    A confidence value above which an answer is accepted by the application without requiring the application to prompt the caller to verify the answer.
  • context-free grammar (CFG)
    Rules that predict the words that might follow the word just spoken, reducing the number of candidates that need to be evaluated to recognize the next word.
  • continuous speech
    An uninterrupted utterance without pauses between words.
  • Conversational Grammar Builder
    A tool for developing speech recognition grammars in Speech Server, such as single-keyword grammars.
  • Conversational Grammar Compiler
    The compiler for grammars built using Conversational Grammar Builder.
  • conversational understanding
    The ability of a system to recognize spontaneous, conversational speech.
  • Coordinated Universal Time
    See Universal Time Coordinate (UTC).
  • CSTA
    See Computer Supported Telecommunications Applications (CSTA).
  • CTI
    See Computer Telephony Integration (CTI).
  • culture
    In managed code, a class of information about a particular nation or people including their collective name, writing system, calendar used, and conventions for formatting dates and sorting strings.
  • DDI
    See Direct Dial Inward (DDI).
  • degradation
    A reduction in quality or performance of a communications channel.
  • denial of service (DOS) attack
    An assault, usually planned, that seeks to disrupt Web access. A denial of service attack overwhelms an Internet server with connection requests that cannot be completed.
  • deployment group
    A collection of related system components, such as voice response applications and trusted SIP peers, which represent different deployments in an environment.
  • design canvas
    The area in Microsoft Visual Studio on which you design and create speech applications.
  • Dialed Number Identification Service (DNIS)
    A telephone service that enables the receiver of a call to determine the number that the caller dialed. This service is commonly used by companies that have multiple 1-800 or 1-900 numbers.
  • dialog
    A turn-taking exchange of audio, such as a human-to-human or human-to-computer exchange.
  • dialog flow
    The sequence of turns in a dialog.
  • digital-signal processing (DSP)
    The study of signals in a digital representation and the processing methods of these signals.
  • diphone
    A sound consisting of two phonemes: one that leads into the sound and one that finishes the sound. For example, the word "hello" consists of these diphones: [silence-h] [h-eh] [eh-l] [l-oe] [oe-silence].
  • diphone concatenation
    The process of the text-to-speech engine concatenating short digital-audio segments and performing intersegment smoothing to produce a continuous sound.
  • Direct Dial Inward (DDI)
    A telephone service that provides companies or businesses with a block of numbers for calling into their Private Branch Exchange (PBX) system. With DDI, outside callers can dial individuals directly without intervention from a switchboard operator.
  • Distributed Management Task Force (DMTF)
    An industry consortium that develops, supports, and maintains standards for systems management of PC systems and products, to reduce total cost of ownership.
  • DMTF
    See Distributed Management Task Force (DMTF).
  • DNIS
    See Dialed Number Identification Service (DNIS).
  • DOCTYPE declaration
    A declaration at the beginning of an SGML document that gives a public or system identifier for the document type definition (DTD) of the document.
  • DSP
    See digital-signal processing (DSP).
  • DTMF
    See dual tone multi-frequency (DTMF).
  • DTMF grammar
    A grammar that recognizes dual tone multi-frequency (DTMF) inputs. Contrast with a speech grammar.
  • dual tone multi-frequency (DTMF)
    The signaling system used in telephones with touch-tone keypads, in which each digit is associated with two specific frequencies.
  • Dublin Core Metadata Initiative
    An open forum engaged in the development of interoperable online metadata standards that support a broad range of purposes and business models.
  • dynamic grammar
    A grammar that is created during application execution.
  • EC
    See explicit confirmation (EC).
  • ECMA
    See European Computer Manufacturer's Association (ECMA).
  • ECMAScript
    A scripting programming language created to capture the common core language elements of JavaScript and JScript.
  • EIF
    See Enterprise Instrumentation Framework (EIF).
  • Enterprise Instrumentation Framework (EIF)
    A Microsoft technology that provides an extensible event schema and unified API that leverage existing event, logging, and tracing mechanisms built in to Microsoft Windows.
  • Erlang
    A measure of traffic intensity through telephony equipment.
  • error page
    The Web page that is displayed when a user tries to view a page that no longer exists or whose file name has changed.
  • .etl
    The file name extension for a Windows event trace log.
  • ETL
    See event trace log (ETL).
  • European Computer Manufacturer's Association (ECMA)
    An industry organization whose goal is the standardization of Information and Communication Technology (ICT) systems. The American counterpart is the Computer and Business Equipment Manufacturer's Association (CBEMA).
  • evaluated relative date
    A period of time that is bounded by a start date and an end date.
  • event trace log (ETL)
    A file containing event trace log data.
  • explicit confirmation (EC)
    The confirmation method that introduces an extra prompt to explicitly confirm information that the user has previously provided.
  • Extensible HTML (XHTML)
    A markup language incorporating elements of HTML and XML. Web sites designed using XHTML can be more readily displayed on handheld computers and digital phones equipped with microbrowsers.
  • Extensible Markup Language (XML)
    A markup language that provides a format for describing structured data. XML is a World Wide Web Consortium (W3C) specification, and is a subset of Standard Generalized Markup Language (SGML).
  • Extensible Stylesheet Language Transformation (XSLT)
    A language used to transform an existing XML document into a restructured XML document. XSLT is primarily intended for use as part of XSL. Also called XSL Transformations.
  • external grammar
    A stand-alone grammar file that is not part of the application code and is accessed using a rule reference.
  • extraction
    A segment of a prompt that can combine dynamically with other extractions at run time to create a prompt.
  • extraction boundary
    The start and end times for an extraction, which can include all or part of the silence between the extraction and adjacent words.
  • extraction ID
    The number used to identify each extraction within the prompt database.
  • gateway
    A device that connects networks using different communications protocols so that information can be passed from one to the other.
  • .gbuilder
    The file name extension for files generated by Conversational Grammar Builder.
  • globally unique identifier (GUID)
    A program-generated number that creates a unique identity for an object.
  • grammar
    A structured list of words and phrases that are governed by rules and can be recognized by an application.
  • Grammar Collection Editor
    A tool for adding and removing Grammar objects from the Grammar collection.
  • grammar compiler
    A compiler that transforms the XML elements that define grammar elements into a binary format used by speech recognition (SR) engines.
  • Grammar Explorer
    A pane that displays the available grammar files and their component rules.
  • grammar file
    A file containing a structured list of words and phrases that are used by the speech recognition (SR) engine.
  • grammar library
    A collection of ready-to-use rules and rulesets designed to recognize commonly used types of user voice input such as dates, times, currency units, numbers, and confirmatory responses, as well as dual tone multi-frequency (DTMF) input.
  • Grammar Rule Name (GRN) referencing
    A type of Semantic Markup Language (SML) script referencing in which the script expression evaluates semantic values of, or assigns semantic values to, the Rule Variable (RV) of the rule element that contains the expression.
  • Grammar Rule Name (GRN) Rule Variable
    A predefined object that holds a semantic value that can be composed of multiple properties. Every rule element in a grammar has a single GRN Rule Variable. The GRN Rule Variable is identified by a dollar sign ($).
  • Grammar Rule Reference (GRR) referencing
    A type of Semantic Markup Language (SML) script referencing in which the script expression evaluates semantic values of the Rule Variable (RV) of a rule element outside of the rule element that contains the expression.
  • Grammar Rule Reference (GRR) Rule Variable
    The Rule Variable of the external rule element to which a grammar rule reference is made. The GRR Rule Variable is identified by a double dollar sign ($$).
  • Grammar Tuning Advisor
    A tool for finding commonly spoken words and phrases that are missing from a grammar or set of grammars.
  • Greenwich Mean Time
    See Universal Time Coordinate (UTC).
  • GRN referencing
    See Grammar Rule Name (GRN) referencing.
  • GRN Rule Variable
    See Grammar Rule Name (GRN) Rule Variable.
  • GRR referencing
    See Grammar Rule Reference (GRR) referencing.
  • GRR Rule Variable
    See Grammar Rule Reference (GRR) Rule Variable.
  • .grxml
    The file name extension for XML Form grammar files.
  • GUID
    See globally unique identifier (GUID).
  • hard closed date
    A period of time that is bounded by a start date and an end date.
  • hard open date
    A period of time that is specified by either a start date or deadline, but not both.
  • homonym
    A word with the same sound (homophone) or spelling (homograph) as another but with a different meaning.
  • IC
    See implicit confirmation (IC).
  • IETF
    See Internet Engineering Task Force (IETF).
  • IIS
    See Internet Information Services (IIS).
  • implicit confirmation (IC)
    The confirmation method that combines the confirmation question with the next information retrieval question to form a single prompt. This method uses fewer prompts than explicit confirmation (EC).
  • inbound call
    A telephone call originated by a user and directed toward the telephony server. Synonymous with incoming call.
  • inbound-calling application
    An application that receives and processes telephone calls to the system on which the application is running.
  • inline grammar
    Grammar logic that exists as XML markup in the code of a SALT page rather than in a separate grammar file.
  • inline prompt
    Static text that the prompt engine plays when the application activates a control.
  • interactive voice response (IVR) application
    A telephony application that leads a telephone caller through a hierarchy of menus, delivers voice responses, collects voice and data inputs, and performs other operations on behalf of the caller or the program sponsor.
  • interference
    Noise or other external signals that affect the performance of a communications channel. Also, the electromagnetic signals generated by electronic devices, such as computers, that can disturb radio or television reception.
  • International Phonetic Alphabet (IPA)
    A standardized system of letters and marks, mostly based on the letters of the Roman alphabet, used internationally to represent speech sounds.
  • Internet Engineering Task Force (IETF)
    The international community of network designers and professionals that defines standard Internet protocols and addresses Internet architecture issues.
  • Internet Information Services (IIS)
    Software services that support Web site creation, configuration, and management, along with other Internet functions. Internet Information Services include Network News Transfer Protocol (NNTP), File Transfer Protocol (FTP), and Simple Mail Transfer Protocol (SMTP).
  • interoperability
    The ability of components of computer systems to function in different environments.
  • inverse text normalization (ITN)
    A feature that enables a spoken numeric or symbolic value to appear as a number or symbol when translated by a speech recognition program. For example, if "twenty three" is spoken, it appears as "23" on the computer screen.
  • IP PBX
    A Private Branch Exchange (PBX) that supports the Internet Protocol along with the traditional analog and digital circuit-switched connections to the Public Switched Telephone Network (PSTN) and telephone sets.
  • IPA
    See International Phonetic Alphabet (IPA).
  • ITN
    See inverse text normalization (ITN).
  • .js
    The file name extension for JScript files.
  • keypress
    The act of pressing a key on a phone to enter a number or symbol.
  • language pack
    A set of language resources that supports the development and deployment of applications in a particular language.
  • lexicon
    The words of a language and their definitions. In speech recognition systems, a lexicon normally contains only the orthographic and the phonetic representations of words.
  • listening port
    The TCP port on which Speech Server expects Session Initiation Protocol (SIP) INVITE messages for incoming calls.
  • localhost
    The standard domain name for the computer system currently in use.
  • managed code
    Code executed by the Microsoft .NET Framework common language runtime (CLR).
  • managed code voice response application
    An automated application, which is built using managed code, that allows a person to speak responses to application prompts (using speech recognition) to interact with the system.
  • Managed Object Format (MOF)
    The standard language used to define elements of the Common Information Model (CIM).
  • Management Pack
    Predefined solutions from Microsoft that contain computer groups and rules, with filters, performance counters, and alerts defined for specific customer applications.
  • manifest file
    A file containing a list of resources, such as grammar files and prompt databases, that Speech Server preloads and caches to improve performance. Speech Server automatically creates a manifest file (Manifest.xml) when a new project is created.
  • metadata
    Data that is used to describe other data.
  • Microsoft Enterprise Instrumentation
    An application programming interface (API) used by Speech Server to perform logging services.
  • Microsoft Management Console (MMC)
    An application that provides a graphical user interface (GUI) and an operational framework for administrative and management tools.
  • Microsoft Message Queuing (MSMQ)
    A Microsoft technology that provides for the passing of messages between applications.
  • Microsoft Operations Manager (MOM)
    A family of Microsoft server software that provides event management, proactive monitoring and alerting, reporting, and trend analysis services.
  • MIME
    See Multipurpose Internet Mail Extensions (MIME).
  • mixed-initiative dialog
    A speech dialog in which the application prompts for specific information, but the user might respond with additional or different information that the application recognizes.
  • MMC
    See Microsoft Management Console (MMC).
  • MOF
    See Managed Object Format (MOF).
  • MOM
    See Microsoft Operations Manager (MOM).
  • MSMQ
    See Microsoft Message Queuing (MSMQ).
  • mu-law
    A standard analog signal compression or companding algorithm, used in digital communications systems of the North American and Japanese digital hierarchies, to optimize the dynamic range of an audio analog signal prior to digitizing.
  • Multipurpose Internet Mail Extensions (MIME)
    A protocol widely used on the Internet that extends the Simple Mail Transfer Protocol (SMTP) to permit data (such as video, sound, and binary files) to be transmitted by Internet e-mail without having to be translated into ASCII format first.
  • mumble
    An utterance that the application recognizes with a confidence level that falls below the recognition rejection threshold. A speech recognizer often classifies an utterance as a mumble when the user's pronunciation does not match the pronunciation expected by the speech recognizer or when excessive noise (background noise or line noise) is present in the input.
  • Mutual Transport Layer Security (MTLS)
    An implementation of Transport Layer Security (TLS) that requires mutual authentication of the end-points using digital certificates.
  • natural language
    A human language, as opposed to a command or programming language traditionally used to communicate with a computer.
  • natural language understanding
    The ability to infer the intended meaning of a natural language utterance based on the words contained in that utterance.
  • N-best
    The recognition results in which the speech recognition engine has the highest levels of confidence. N is the number of results returned.
  • .NET Framework
    An environment for building, deploying, and running Web Services and other applications. It consists of three main parts: the Common Language Runtime, the Framework classes, and ASP.NET.
  • node
    A word or phoneme on a recognition path in a recognition/alternative graph generated by an engine.
  • noise
    Any interference that affects the operation of a device. In communications, noise consists of random electronic signals, produced either naturally or by the circuitry, that degrade the quality or performance of a communications channel.
  • open time block
    A period of time that contains one end point (for example, noon) and a reference phrase (for example, just before), which results in "just before noon."
  • outbound call
    A telephone call originated by the telephony server and directed toward a remote party. Synonymous with outgoing call.
  • outbound-calling application
    An application that places and processes telephone calls from the system on which the application is running.
  • out-of-grammar utterance
    An utterance containing words or phrases that are not included in an application grammar.
  • Param Collection Editor
    A tool for specifying additional platform-specific and non-standard configuration parameters for the speech recognition engine.
  • PBX
    See Private Branch Exchange (PBX).
  • PCM
    See pulse code modulation (PCM).
  • PCM8
    A recording format typically used in desktop applications. In PCM8, each sample is 8 bits. This format results in lower quality audio than PCM16, but it requires less disk space.
  • PCM16
    A recording format typically used in desktop applications. In PCM16, each sample is 16 bits.
  • PEML
    See Prompt Engine Markup Language (PEML).
  • performance counter
    A set of components that allow you to track the performance of an application.
  • performance log
    A log that collects data for specific performance objects and counters over a specified period of time.
  • personal identification number (PIN)
    A sequence of digits used to verify the identity of a person.
  • .pf
    The file name extension for a prompt function file.
  • phoneme
    Abstract categories of speech sounds (vowels and consonants) grouped together to create words. For example, SAPI provides two default pronunciations of the word hello: "h ax l ow" and "h eh l ow." Each group of sounds, separated by spaces, represents a phoneme.
  • phrase
    An ordered list of words that are spoken in the same utterance.
  • PIN
    See personal identification number (PIN).
  • pitch
    The tone of a sound, which generally is determined by the sound's frequency. A high-pitched sound has a higher frequency; a low-pitched sound has a lower frequency.
  • Point-to-Point Tunneling Protocol (PPTP)
    An extension of the Point-to-Point Protocol used for communications on the Internet. PPTP was developed by Microsoft to support virtual private networks (VPNs), which allow individuals and organizations to use the Internet as a secure means of communication.
  • postamble
    Optional ending words or phrase.
  • postamble grammar
    A grammar that recognizes speech input that follows semantically significant information. Contrast with preamble grammar.
  • PPTP
    See Point-to-Point Tunneling Protocol (PPTP).
  • preamble
    Optional beginning words or phrase.
  • preamble grammar
    A grammar that recognizes speech input that precedes semantically significant information. Contrast with postamble grammar.
  • PRI
    See Primary Rate Interface (PRI).
  • Primary Rate Interface (PRI)
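    An Integrated Services Digital Network (ISDN) interface, defined by a set of international standards for telephone transmission, that carries multiple voice and data channels over a single digital trunk line.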
  • Private Branch Exchange (PBX)
    An automatic telephone switching system that enables users within an organization to place calls to each other without going through the public telephone network. This system also allows users to place calls directly to outside numbers.
  • prompt
    A question, directive, greeting, or information spoken by a speech application.
  • prompt database
    A database containing the prompt information and audio data for a prompt project, including prompt transcription text, extraction data, and archived versions of prompt .wav files in their original recorded format.
  • prompt engine
    The component of Speech Engine Services (SES) that processes text input and produces speech output by concatenating prerecorded words and phrases that match the text input. The prompt engine stores the recordings it uses on disk and indexes them in one or more prompt database files. SES is a component of Speech Server.
  • Prompt Engine Markup Language (PEML)
    The language used by the prompt engine to take text input and produce speech output by concatenating recordings of words and phrases that match the text input.
  • prompt function
    A function that dynamically generates a prompt at run time.
  • PromptDatabase Collection Editor
    A tool for specifying properties of the prompt databases associated with the control.
  • .promptdb
    The file name extension for a working file that contains transcription text, extraction data, and archived versions of prompt .wav files in their original recorded format. This file type compiles into a .prompts file.
  • .prompts
    The file name extension for a prompt database file, which is a binary file that contains all the prompt information and audio data for a prompt project. This file type is compiled from a .promptdb file.
  • pronunciation
    The way a word or a language is usually spoken.
  • Pronunciation Editor
    A tool for defining custom pronunciations of single-word Phrase elements.
  • pronunciation lexicon
    A database of pronunciations maintained by a speech recognition or text-to-speech engine.
  • pronunciation rule
    A rule followed by a text-to-speech engine to convert text into phonemes.
  • Property Builder
    In Microsoft Visual Studio, a user interface element that assists in the entry or editing of property values.
  • prosody
    A collection of phonological features (including pitch, duration, and stress) that define the rhythm of spoken language.
  • .prproj
    The file name extension for a prompt project file.
  • PSTN
    See public switched telephone network (PSTN).
  • public switched telephone network (PSTN)
    The world's collection of interconnected voice-oriented public telephone networks, both commercial and government-owned.
  • pulse code modulation (PCM)
    The most common method of encoding an analog voice signal into a digital bit stream.
  • QA control
    A single interaction with the user, which is usually but not always a "Question and Answer" dialog. A QA control that collects data places it in SemanticItem controls.
  • QOS
    See quality of service (QOS).
  • quality of service (QOS)
    A set of quality assurance standards and mechanisms for data transmission.
  • RealSpeak
    Software by Nuance Communications, Inc. that converts text into high-quality speech, in both male and female voices.
  • Real-time Transport Protocol (RTP)
    A protocol for real-time applications that provides end-to-end network transport functions that are suitable for applications transmitting real-time data over multicast or unicast network services.
  • recognition engine
    See speech recognition engine.
  • recognition grammar
    See speech recognition grammar.
  • recognition mode
    The method used by the speech recognizer to stop the recognition process. The three modes are automatic, multiple, and single.
  • recognition path
    A sequence of words or phonemes that an engine analyzed while attempting to recognize an utterance.
  • recognition rule
    A rule followed by a speech recognition engine using a context-free grammar to recognize speech.
  • rejection threshold
    A confidence value below which an answer is rejected by the application.
  • remote procedure call (RPC)
    A call by one program to a second program on a remote system. The second program generally performs a task and returns the results of that task to the first program.
  • re-recognition
    The process performed by the Re-recognizer tool to determine whether tuning changes have improved speech recognition.
  • Re-recognizer
    A tool that validates whether tuning a speech application's recordings has actually improved the accuracy of the speech recognition.
  • root Rule Variable (RRV)
    The GRN Rule Variable of the root rule of a grammar. The RRV provides the semantic result of a recognition.
  • RPC
    See remote procedure call (RPC).
  • RRV
    See root Rule Variable (RRV).
  • RTP
    See Real-time Transport Protocol (RTP).
  • rule
    See pronunciation rule and recognition rule.
  • SALT
    See Speech Application Language Tags (SALT).
  • SALT interpreter
    A software component that interprets script and markup languages such as HTML, XHTML, and SALT on speech-enabled Web pages.
  • SALT voice response application
    A Web-based voice response application built using Speech Application Language Tags (SALT).
  • SAPI
    See Speech API (SAPI).
  • Secure Hypertext Transfer Protocol (HTTPS)
    A secure version of HTTP that encrypts data using Secure Sockets Layer (SSL).
  • Secure Real-time Transport Protocol (SRTP)
    A protocol that provides encryption, message authentication and integrity, and replay protection to the RTP data in applications.
  • Secure Sockets Layer (SSL)
    A protocol for transmitting private documents over the Internet by using two keys to encrypt data.
  • semantic interpretation (SI)
    The process by which a semantic interpreter generates a result based on a spoken word or phrase that matches a grammar rule.
  • Semantic Interpretation for Speech Recognition (SISR)
    The specification that describes how tags may be inserted within SRGS grammars to support basic post-processing or full semantic interpretation.
  • semantic item
    A value returned by a grammar rule when a user's utterance matches the rule.
  • Semantic Markup Language (SML)
    An XML-based markup language that allows the application to identify and parse meaningful parts of speech recognition output.
  • Semantic Script Editor
    A tool for associating semantic interpretation information with relevant elements in a grammar rule.
  • SES
    See Speech Engine Services (SES).
  • Session Initiation Protocol (SIP)
    An Internet standard used to initiate, manage, and terminate interactive sessions between one or more users on the Internet.
  • short time-out confirmation (STC)
    The confirmation method that interprets silence as acceptance. With short time-out confirmation, the time period that the application waits for the user to speak is typically shorter than that in explicit confirmation.
  • SI
    See semantic interpretation (SI).
  • silence
    The condition in which the application detects no sound from the user.
  • silent installation
    A software installation that takes place without interaction by the user.
  • silent segment
    The amount of silence at the beginning or end of a word or phrase.
  • Simple Messaging Extension (SMEX)
    The communication mechanism by which SALT applications establish an asynchronous message exchange channel for sending and receiving messages between the SALT application and external components of the SALT platform.
  • Simple Object Access Protocol (SOAP)
    A protocol that provides a simple mechanism for exchanging structured and typed information between peers in a decentralized, distributed environment using XML. This protocol also defines a message format in XML that travels over the Internet using HTTP.
  • SIP
    See Session Initiation Protocol (SIP).
  • SIP peer
    The mechanism through which SIP connects Speech Server to the caller's endpoint. SIP peers can include IP PBXs, VoIP gateways, SIP phones and softphones, and TIMC.
  • SIP phone
    A special telephone that can natively connect to the Internet through SIP.
  • SIP phone simulator
    A softphone, which is a software program that runs on your computer and serves as a telephone, typically with a headset.
  • .sln
    The file name extension for a Microsoft Visual Studio solution file.
  • SMEX
    See Simple Messaging Extension (SMEX).
  • SML
    See Semantic Markup Language (SML).
  • SOAP
    See Simple Object Access Protocol (SOAP).
  • soft closed date
    A period of time that is specified by both a start date and an end date.
  • softphone
    A multimedia application that works with VoIP technology to enable you to make and receive calls directly from a computer.
  • Solution Explorer
    A pane that shows an organized view of projects and their files as well as ready access to the commands that pertain to them.
  • speaker
    The user who utters the speech to be recognized by an application.
  • speaker-dependent
    Describes a speech recognition engine that requires the user to train it to recognize his or her voice.
  • speaker-independent
    Describes a speech recognition engine that does not require training.
  • Speech API (SAPI)
    A set of routines, protocols, and tools that enable programmers to build speech-enabled applications for Microsoft Windows platforms.
  • speech application
    An application in which human-computer interaction is mediated either unidirectionally or bidirectionally by speech.
  • Speech Application Language Tags (SALT)
    A markup language that integrates speech services into existing markup languages (such as HTML and XHTML) and enables telephony access to information and applications from computers, telephones, and PDAs.
  • speech application project
    In Microsoft Visual Studio, a project containing a speech recognition application.
  • Speech Control Editor
    The voice user interface design tool for SALT voice response applications.
  • Speech Controls Outline
    A tool that enables developers to view and edit the evaluated run-time activation order of Dialog Speech Controls and ASP.NET Application Speech Controls.
  • Speech Engine Services (SES)
    The component of Speech Server that processes the audio (speech) streams that pass between the speech application and the user.
  • speech grammar
    A grammar that recognizes speech inputs. Contrast with a DTMF grammar.
  • Speech Grammar Editor
    A Speech Server tool used to develop and maintain grammars that conform to the W3C SRGS (World Wide Web Consortium Speech Recognition Grammar Specification) standard.
  • speech output
    A type of spoken output produced by the prompt engine by concatenating recordings of words and phrases that match the text input.
  • speech recognition (SR)
    The process of converting spoken language into printed text.
  • speech recognition engine
    The component of Speech Engine Services (SES) that converts spoken input to text and delivers the text to an application. SES is a component of Speech Server.
  • Speech Recognition Grammar Specification (SRGS)
    A specification developed by the World Wide Web Consortium (W3C) that defines syntax for representing grammars for use in speech recognition. SRGS enables developers to specify the words and patterns of words to be listened for by a speech recognizer.
  • Speech Server
    A set of server components for deploying telephony-based voice response applications. Speech Server combines Web technologies, speech-processing services, and telephony capabilities into a single system.
  • Speech Server Administrator console
    A Microsoft Management Console (MMC) snap-in that provides a graphical interface for managing the configurable settings of Speech Server components and monitoring the status of those components.
  • Speech Server Developers Tools
    A set of Speech Server tools for creating speech recognition grammars, creating and managing prompt databases, creating voice user interfaces, and debugging speech applications.
  • speech synthesis
    The process by which the text-to-speech engine synthesizes the glottal pulse from human vocal cords and applies various filters to simulate throat length, mouth cavity, lip shape, and tongue position.
  • Speech Synthesis Markup Language (SSML)
    An XML-based markup language used to control various characteristics of synthetic speech output including voice, pitch, rate, volume, pronunciation, and other characteristics.
  • speech synthesizer
    An electronic device that converts text characters into artificial speech.
  • SpeechControlSettingsItem Collection Editor
    A tool for specifying default properties and settings for various Microsoft ASP.NET Speech Controls on a page.
  • Speech Toolbox
    In Microsoft Visual Studio, a user interface element that contains speech activity components in managed code voice response applications and Speech Controls in SALT voice response applications.
  • SR
    See speech recognition (SR).
  • SRGS
    See Speech Recognition Grammar Specification (SRGS).
  • SSL
    See Secure Sockets Layer (SSL).
  • SSL certificate
    A certificate installed on a secure server that is used to identify the merchant using it and to encrypt the credit card and other sensitive data. SSL is used to encrypt the communication between a server and a client.
  • SSML
    See Speech Synthesis Markup Language (SSML).
  • Start page
    The page that is loaded when an instance of a SALT or VoiceXML interpreter is initialized. The Start Page is to SALT interpreters what a Home Page is to graphical browsers. When the caller's session terminates, the SALT interpreter resets and reloads the Start Page.
  • STC
    See short time-out confirmation (STC).
  • system error page
    The Web page that runs when a system error occurs.
  • System Monitor
    A Windows administration tool, which is part of the Performance console, to view current counter activity or logged counter data.
  • system-initiative dialog
    A speech dialog in which the application prompts for specific information and recognizes only the requested information at that point in the application.
  • tags
    See TTS control tag.
  • TAP
    See Telephony Application Proxy (TAP).
  • TAS
    See Telephony Application Services (TAS).
  • telephony
    A telephone technology (voice, fax, or modem transmissions) based on the conversion of sound into electrical signals or wireless communication.
  • Telephony Application Host
    The server component that hosts the speech application.
  • Telephony Application Proxy (TAP)
    The server component that performs call redirection.
  • Telephony Application Services (TAS)
    The component of Speech Server that maps incoming calls to the associated speech application on a Web server, and sends application requests for speech output or recognition to Speech Engine Services.
  • telephony board
    The physical connection between the telephone network and the TIM software.
  • Telephony Interface Manager (TIM)
    A software component that is tightly coupled to the installed telephony board and that enables the board to communicate with Speech Server.
  • Telephony Interface Manager Connector (TIMC)
    An intermediary layer that enables Telephony Interface Manager (TIM) to be used with a telephony hardware interface card. Called Telephony Interface Services (TIS) in Microsoft Speech Server 2004.
  • text normalization
    The process of converting non-word written symbols into words that a speaker would say when reading that symbol out loud.
  • text-to-speech (TTS)
    Technologies for converting textual (ASCII) information into synthetic speech output. Used in voice-processing applications requiring production of broad, unrelated, and unpredictable vocabularies, such as products in a catalog or names and addresses.
  • TIM
    See Telephony Interface Manager (TIM).
  • TIMC
    See Telephony Interface Manager Connector (TIMC).
  • TLS
    See Transport Layer Security (TLS).
  • token
    A string that a speech recognizer can convert to a phonetic representation.
  • training
    The process of speaking a series of preselected phrases for the engine, which provides the engine with more information about the voice of the speaker and can improve speech recognition.
  • training sentence
    A preselected phrase spoken to the speech engine by the user to improve speech recognition.
  • transcription
    A record of a speech-based conversation converted into written text. Transcriptions are commonly used to analyze the performance of a speech application by matching what was said during the call with the log file of what actually happened.
  • Transport Layer Security (TLS)
    A layer providing encryption and authentication services that can be negotiated during the startup phase of many Internet protocols.
  • trusted SIP peer
    A SIP peer (such as a VoIP gateway or softphone) that is trusted by the source computer in a deployment group.
  • TTS
    See text-to-speech (TTS).
  • TTS control tag
    An instruction that can be embedded in text sent to a text-to-speech engine to improve the prosody of the spoken text.
  • TTS engine
    The component of Speech Engine Services (SES) that processes text input and produces speech output by synthesizing words and phrases. SES is a component of Speech Server.
  • tuning
    The process of refining a speech application to improve performance.
  • turn
    A prompt followed by a response and recognition.
  • u-law
    A standard analog signal-compression algorithm, used in digital communications systems of the North American digital hierarchy, to optimize the dynamic range of an analog signal prior to digitizing.
  • Unicode
    A character encoding standard, a superset of ASCII, that allows characters from virtually any language to be represented in a text string. The Unicode character set contains a subset for International Phonetic Alphabet (IPA) phonemes.
  • Uniform Resource Identifier (URI)
    A character string used to identify a resource (such as a file) from anywhere on the Internet by type and location. The set of Uniform Resource Identifiers includes Uniform Resource Names (URNs) and Uniform Resource Locators (URLs).
  • Uniform Resource Locator (URL)
    An address for a resource on the Internet, which specifies the protocol used to access the resource, the name of the server on which the resource resides, and (optionally) the path to a resource.
  • Universal Naming Convention (UNC)
    A name used on Windows to access a drive or directory containing files shared across a network.
  • Universal Time Coordinate (UTC)
    For all practical purposes, the same as Greenwich Mean Time, which is used for the synchronization of computers on the Internet. Also called the Coordinated Universal Time format.
  • UPL
    See user-perceived latency (UPL).
  • URI
    See Uniform Resource Identifier (URI).
  • URL
    See Uniform Resource Locator (URL).
  • user-perceived latency (UPL)
    The length of time that a user perceives to occur between the end of one event and the beginning of a subsequent event.
  • UTC
    See Universal Time Coordinate (UTC).
  • UTF-8
    The 8-bit Unicode Transformation Format that serializes a Unicode scalar value as a sequence of one to four bytes.
  • UTF-16
    The 16-bit Unicode Transformation Format that serializes a Unicode scalar value as a sequence of one or two 16-bit code units (two or four bytes), in either big-endian or little-endian format.
  • utterance
    Anything heard by the engine as a finite series of sounds that the engine attempts to recognize as speech.
  • .vbs
    The file name extension for VBScript files.
  • virtual directory
    A file that points to another file or directory and is used to allow a variety of sources to point to a common destination.
  • virtual private network (VPN)
    A private communications network that uses a public telecommunication infrastructure, such as the Internet, to provide remote offices or individual users with secure access to their organization's network.
  • vocabulary
    The set of words used in the grammars that a speech application uses. Words that are not in the vocabulary, out-of-vocabulary (OOV) words, cannot be recognized by the speech application.
  • voice command
    A word or phrase associated with a voice menu. When an engine recognizes a voice command, it notifies the application that owns the voice menu containing the command.
  • voice grammar
    See speech grammar.
  • Voice over IP (VoIP)
    Audio streaming over a network that uses the Internet Protocol (IP).
  • voice response application
    An automated application that allows a person to speak responses to a voice menu (using speech recognition) to interact with the system.
  • Voice Response Debugging Window
    An integrated Microsoft Visual Studio tool for debugging speech applications.
  • voice user interface (VUI)
    A voice-controlled application on a computer, PDA, or Smartphone.
  • voice-only application
    An application that is driven by using either speech or DTMF input. Telephony applications are a type of voice-only application in which users interact with the application by speaking into the telephone or pressing buttons on the numeric keypad.
  • VoiceXML
    The W3C standard XML format for specifying interactive voice dialogues between humans and computers.
  • VoiceXML application
    A Web-based voice response application built using VoiceXML.
  • VoIP
    See Voice over IP (VoIP).
  • volume normalization
    A process for making recordings sound more natural when they are made at different volumes and then concatenated.
  • VPN
    See virtual private network (VPN).
  • VUI
    See voice user interface (VUI).
  • W3C
    See World Wide Web Consortium (W3C).
  • .wav
    The file name extension for waveform audio files.
  • Wave Editor
    A tool for editing waveform audio files.
  • waveform audio file
    An audio format for storing audio on computers.
  • WBEM
    See Web-based Enterprise Management (WBEM).
  • Web server
    The component of a Web application speech system that generates application Web pages containing HTML, SALT, and script. The Web server used by Speech Server is Internet Information Services (IIS), which is included with Microsoft Windows Server 2003.
  • Web services
    Protocols that enable computers to work together by exchanging messages. Web services are based on the standard protocols of XML, SOAP, and WSDL, which allow them to interoperate across platforms and programming languages.
  • Web-based Enterprise Management (WBEM)
    An initiative based on a set of management and Internet standard technologies developed to unify the management of enterprise computing environments.
  • Web-based voice response application
    An automated application, which is not compiled or built using managed code, that allows a person to speak responses to a voice menu (using speech recognition) to interact with the system.
  • Windows Management Instrumentation (WMI)
    A standardized programming interface for managing computers, servers, and networks. WMI originated from the Web-based Enterprise Management (WBEM) initiative and the Common Information Model (CIM) adopted by the Distributed Management Task Force (DMTF).
  • Windows Workflow Foundation
    The programming model, engine, and tools for quickly building workflow-enabled applications on Windows. It consists of a Microsoft .NET Framework 3.0 (formerly WinFX) namespace, an in-process workflow engine, and designers for Visual Studio 2005.
  • WMI
    See Windows Management Instrumentation (WMI).
  • word
    An atomic Unicode text string. A word can have several related vernacular words (such as "Los Angeles") within it because the vernacular words are always used in common.
  • word alignment
    The process of associating .wav files with transcriptions. During alignment, the prompt engine marks the boundaries between words in the .wav file.
  • word boundary
    The spacing or silence between words in a prompt. Speech Prompt Editor places default word boundaries in a prompt when it creates alignments.
  • workflow
    A sequential collection of tasks or activities that are performed by a managed code voice response application.
  • Workflow Designer
    The Speech Server feature in which managed code voice response applications are created.
  • World Wide Web Consortium (W3C)
    The organization that sets standards for the Web and HTML.
  • XHTML
    See Extensible HTML (XHTML).
  • .xml
    The file name extension for an XML file.
  • XML
    See Extensible Markup Language (XML).
  • XPath
    A language that describes a way to locate and process items in XML documents by using an addressing syntax based on a path through the document's logical structure or hierarchy.
  • XPath expression
    An expression that searches through an XML document and extracts information from the nodes (any part of the document, such as an element or attribute) in that document.
  • XPathTrigger Sample Sentence Tool
    A tool for testing sample phrases or keystrokes against speech or DTMF grammars associated with the Command grammar.
  • XSLT
    See Extensible Stylesheet Language Transformation (XSLT).
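
As an illustration of the UTF-8 and UTF-16 entries, the following minimal Python sketch counts how many bytes each format needs for a given string. It is not part of Speech Server; the function name encoded_lengths and the sample characters are assumptions chosen for this example. It relies only on Python's built-in str.encode method.

```python
def encoded_lengths(text: str) -> dict:
    """Return the number of bytes needed to encode text in UTF-8 and UTF-16."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-le" is used so that no byte order mark is added to the count.
        "utf-16": len(text.encode("utf-16-le")),
    }


if __name__ == "__main__":
    # Sample characters (hypothetical choices): Latin letter, accented letter,
    # euro sign, and an emoji that lies outside the Basic Multilingual Plane.
    for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
        print(repr(ch), encoded_lengths(ch))
```

In this sketch, a character outside the Basic Multilingual Plane (the last sample) takes four bytes in both formats, while the other samples take one to three bytes in UTF-8 and two bytes in UTF-16.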