Training versus Lexicons

I’ve received a number of requests from people who want to train the recognizer from wave files.  I’ve helped them, but one thing I’ve come to realize is that they’re confusing training with lexicons.

In particular, training the engine will not add new words to the vocabulary.

Training the engine from a set of wave files is useful only if you want to recognize more wave-file audio from the same speaker.  It's not even terribly useful for recognizing live audio from that speaker; the noise characteristics of live audio are often different from those of the wave files, resulting in a different acoustic model.

If your problem is that you have a set of words that don’t exist in the default dictation vocabulary, then what you need to do is create a lexicon.
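As a rough sketch of what that looks like in code, here's how you might add a single word to the User lexicon via ISpLexicon. (This is a minimal, Windows-only sketch with error handling pared down to HRESULT checks; the English-US language ID and the use of a NULL pronunciation, which leaves it to the engine to derive one, are my choices for the example, not anything mandated by the API.)

```cpp
// Minimal sketch: add a word to the current profile's User lexicon.
#include <atlbase.h>   // CComPtr
#include <sapi.h>

HRESULT AddWordToUserLexicon(LPCWSTR pszWord)
{
    // CLSID_SpUnCompressedLexicon is the User lexicon for the current profile.
    CComPtr<ISpLexicon> cpLexicon;
    HRESULT hr = cpLexicon.CoCreateInstance(CLSID_SpUnCompressedLexicon);
    if (SUCCEEDED(hr))
    {
        // NULL pronunciation: the word is added without an explicit
        // pronunciation, so the engine falls back on its own
        // letter-to-sound rules. Pass a SPPHONEID string (e.g. built
        // with ISpPhoneConverter) to supply one explicitly.
        hr = cpLexicon->AddPronunciation(
                 pszWord,
                 MAKELANGID(LANG_ENGLISH, SUBLANG_ENGLISH_US),
                 SPPS_Noun,
                 NULL);
    }
    return hr;
}
```

Note that this touches the User lexicon directly; for words that should be available to every user of your application, an Application lexicon (described below) is the better home.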

There are two kinds of lexicons that are interesting in this situation: User lexicons and Application lexicons.  (Engine lexicons are also available, but can't be modified.)

The difference is that User lexicons are specific to a user profile, whereas Application lexicons are shared across users and profiles.  There is only one User lexicon per profile, but there can be an arbitrary number of Application lexicons.

Entries in the User lexicon override entries in the Application lexicons, and entries in newer Application lexicons override entries in older ones.

Application lexicons are write-once: once a lexicon has been created, it cannot be modified.  Typically, an Application lexicon is created when the application is installed and removed when it is uninstalled.

Windows Speech Recognition has built-in wizards for managing the User lexicon; they're programmatically accessible via ISpRecognizer::DisplayUI or ISpTokenUI::DisplayUI, with the TypeOfUI parameter set to SPDUI_AddRemoveWord.
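A short sketch of launching that wizard from code, assuming you already hold an ISpRecognizer pointer and a parent window handle; checking IsUISupported first is my defensive choice, since not every recognizer instance supports the UI:

```cpp
// Sketch: launch the Add/Remove Word wizard for a recognizer.
#include <sapi.h>

HRESULT ShowAddRemoveWordUI(ISpRecognizer *pRecognizer, HWND hwndParent)
{
    BOOL fSupported = FALSE;
    // Ask whether this recognizer instance can display the UI at all.
    HRESULT hr = pRecognizer->IsUISupported(SPDUI_AddRemoveWord,
                                            NULL, 0, &fSupported);
    if (SUCCEEDED(hr) && fSupported)
    {
        // No title override, no extra data; the wizard edits the
        // User lexicon of the current profile.
        hr = pRecognizer->DisplayUI(hwndParent, NULL,
                                    SPDUI_AddRemoveWord, NULL, 0);
    }
    return hr;
}
```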

I’ll go into more detail on the lexicon methods in my next post.