Speech Grammars in F#
People say that Vim keys are a grammar for talking to your editor, and that's exactly what they are. One weekend some time back I had fun making VimSpeak to see how well mapping English words to Vim keys would work. It turned out quite nicely, and some pieces of how it was built (in particular the grammar description format) might be useful to others, so here's how it works. And here's a demo of VimSpeak in action:
If you want to peruse the code you can actually learn quite a bit about the grammar of Vim itself. You'll notice that it's a very declarative set of definitions. The API given by System.Speech.Recognition is very imperative and somewhat ugly, so I made this little helper:
open System
open System.Speech.Recognition

// A small declarative description of a speech grammar.
// 'a is the type of the optional semantic value attached to a word or phrase.
type GrammarAST<'a> =
    | Word       of string * 'a option
    | Optional   of GrammarAST<'a>
    | Repeatable of GrammarAST<'a>
    | Sequence   of GrammarAST<'a> list
    | Choice     of GrammarAST<'a> list
    | Dictation

// Translate the AST into the GrammarBuilder that System.Speech.Recognition expects.
let rec speechGrammar = function
    | Word (say, Some value) ->
        let g = new GrammarBuilder(say)
        g.Append(new SemanticResultValue(value.ToString()))
        g
    | Word (say, None) -> new GrammarBuilder(say)
    | Optional g -> new GrammarBuilder(speechGrammar g, 0, 1)
    | Repeatable g -> new GrammarBuilder(speechGrammar g, 1, Int32.MaxValue)
    | Sequence gs ->
        let builder = new GrammarBuilder()
        List.iter (fun g -> builder.Append(speechGrammar g)) gs
        builder
    | Choice cs -> new GrammarBuilder(new Choices(List.map speechGrammar cs |> Array.ofList))
    | Dictation ->
        // Accept either free dictation or letter-by-letter spelling
        let dict = new GrammarBuilder()
        dict.AppendDictation()
        let spelling = new GrammarBuilder()
        spelling.AppendDictation("spelling")
        new GrammarBuilder(new Choices(dict, spelling))
This lets you construct nice looking, declarative grammars from the discriminated union and then run them through the speechGrammar function to get GrammarBuilders used by System.Speech.Recognition.
You can have simple words and optionally associate them with some meaningful value. Restricted grammars are recognized much more accurately than free dictation and spelling, but dictation is available too (the Dictation case). You can have optional bits of grammar, sequences of things that must be said in a particular order, choices from a set of options, etc.
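One way to sanity-check a grammar before handing it to the recognizer is to print it as a pattern. The sketch below is not part of VimSpeak: `describe` and the sample vocabulary are made up for illustration, and the type is repeated (without the System.Speech parts) only so the snippet runs on its own.

```fsharp
// Same shape as the GrammarAST above, repeated so this snippet is self-contained.
type GrammarAST<'a> =
    | Word       of string * 'a option
    | Optional   of GrammarAST<'a>
    | Repeatable of GrammarAST<'a>
    | Sequence   of GrammarAST<'a> list
    | Choice     of GrammarAST<'a> list
    | Dictation

// Hypothetical helper: render a grammar as a regex-like pattern.
// Semantic values are ignored; only the spoken words are shown.
let rec describe grammar =
    match grammar with
    | Word (say, _) -> sprintf "\"%s\"" say
    | Optional g    -> sprintf "[%s]" (describe g)
    | Repeatable g  -> sprintf "(%s)+" (describe g)
    | Sequence gs   -> gs |> List.map describe |> String.concat " "
    | Choice cs     -> cs |> List.map describe |> String.concat " | " |> sprintf "(%s)"
    | Dictation     -> "<dictation>"

// Made-up vocabulary, not VimSpeak's actual grammar.
let sample =
    Sequence [
        Word ("delete", Some "d")
        Optional (Word ("three", Some "3"))
        Choice [
            Word ("word", Some "w")
            Word ("line", Some "d") ] ]

printfn "%s" (describe sample)
// prints: "delete" ["three"] ("word" | "line")
```

Because the grammar is plain data, helpers like this are just another recursive function over the same union, alongside speechGrammar.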
A demo should make it clear enough. Let's start by letting someone introduce themselves. We could have a grammar listing choices of possible names, but here we'll just let them dictate their name. However, the phrase preceding the name is restricted by the grammar:
let name = Dictation

let intro =
    Sequence [
        Choice [
            Word ("My name is", None)
            Word ("I'm", None)]
        name]
This lets you say, "My name is Ashley" or "I'm Fred", etc. Let's let them say various greetings and goodbye phrases as well:
let greeting =
    Sequence [
        Choice [
            Word ("Hello", Some "greeting")
            Word ("Howdy", Some "greeting")
            Word ("Hi",    Some "greeting")]
        Optional name]

let goodbye =
    Sequence [
        Choice [
            Word ("Goodbye", Some "goodbye")
            Word ("See ya",  Some "goodbye")
            Word ("Ciao",    Some "goodbye")]
        Optional name]
Now we can say "Hello Joe", "Howdy", "See ya Mr. Bean", "Ciao", ... Notice now we're attaching a semantic value indicating whether it's a "greeting" or a "goodbye". This makes it easy (without parsing) to pull this information out of recognized phrases later.
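To see which semantic value each phrase would carry, you can expand a grammar into its phrases without touching the speech engine. This is only a sketch under a few assumptions: `expand` is a hypothetical helper, Dictation is shown as a placeholder rather than real text, and Repeatable is expanded just once. The type and the greeting grammar are repeated so the snippet runs on its own.

```fsharp
type GrammarAST<'a> =
    | Word       of string * 'a option
    | Optional   of GrammarAST<'a>
    | Repeatable of GrammarAST<'a>
    | Sequence   of GrammarAST<'a> list
    | Choice     of GrammarAST<'a> list
    | Dictation

// Hypothetical helper: list every (words, semantic values) pair a grammar allows.
let rec expand grammar : (string list * 'a list) list =
    match grammar with
    | Word (say, Some v) -> [ ([say], [v]) ]
    | Word (say, None)   -> [ ([say], []) ]
    | Optional g         -> ([], []) :: expand g      // either nothing, or g
    | Repeatable g       -> expand g                  // assumption: one repetition only
    | Choice cs          -> List.collect expand cs
    | Dictation          -> [ (["<dictation>"], []) ] // placeholder for free speech
    | Sequence gs ->
        // Combine each phrase so far with each phrase of the next part, in order
        let append acc g =
            acc |> List.collect (fun (ws, vs) ->
                expand g |> List.map (fun (ws2, vs2) -> (ws @ ws2, vs @ vs2)))
        List.fold append [ ([], []) ] gs

let greeting =
    Sequence [
        Choice [
            Word ("Hello", Some "greeting")
            Word ("Howdy", Some "greeting")
            Word ("Hi",    Some "greeting") ]
        Optional Dictation ]

for (words, values) in expand greeting do
    printfn "%s -> %A" (String.concat " " words) values
```

Each of the six phrases ("Hello", "Hello <dictation>", "Howdy", ...) carries the single value "greeting", which is what the recognizer later reports as the semantic value.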
We can create and initialize the speech reco engine:
let reco = new SpeechRecognitionEngine()
try reco.SetInputToDefaultAudioDevice()
with _ -> failwith "No default audio device! Plug in a microphone, man."
reco.LoadGrammar(new Grammar(speechGrammar greeting))
reco.LoadGrammar(new Grammar(speechGrammar intro))
reco.LoadGrammar(new Grammar(speechGrammar goodbye))
And for the heck of it, let's throw in some speech synthesis while we're at it:
open System.Speech.Synthesis

let synth = new SpeechSynthesizer()
synth.SelectVoiceByHints(VoiceGender.Female)

let speak (text : string) =
    reco.RecognizeAsyncStop()
    synth.Speak text |> ignore
    reco.RecognizeAsync(RecognizeMode.Multiple)
Funnily enough, it is possible for the machine to talk to itself: the recognizer can pick up what the synthesizer says. This is why the speak function temporarily stops recognition.
Finally, we can use all this for a simple demo:
reco.SpeechRecognized.Add(fun a ->
    let res = a.Result
    if res <> null then
        printfn "%s (%f)" res.Text res.Confidence
        let sem = res.Semantics.Value
        if sem <> null then
            match sem.ToString() with
            | "greeting" -> speak "Hello there!"
            | "goodbye"  -> speak "See you later!"
            | _ -> ())  // ignore any other semantic value
reco.RecognizeAsync(RecognizeMode.Multiple)
Console.ReadLine() |> ignore  // keep the process alive while recognition runs
Here we just echo back what we think we heard and also speak back depending on the semantic value of what was said.
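If the recognizer reacts to noise, one option is to factor the "what do we say back" decision into a pure function and ignore low-confidence results. This is a sketch rather than what the demo above does: `respond` and `minConfidence` are hypothetical names, and the 0.6 threshold is an arbitrary assumption you would tune for your own microphone.

```fsharp
// Assumed threshold; not from the original demo.
let minConfidence = 0.6f

// Decide what to say back (if anything) from the semantic value and confidence.
// The semantic value arrives as obj and may be null when the phrase carries none.
let respond (semantic : obj) (confidence : float32) =
    if confidence < minConfidence then None
    else
        match semantic with
        | :? string as s when s = "greeting" -> Some "Hello there!"
        | :? string as s when s = "goodbye"  -> Some "See you later!"
        | _ -> None  // null or unknown values: stay quiet

printfn "%A" (respond (box "greeting") 0.9f)
printfn "%A" (respond (box "greeting") 0.2f)
printfn "%A" (respond null 0.9f)
```

Inside the SpeechRecognized handler this could be used as `respond res.Semantics.Value res.Confidence |> Option.iter speak`, which keeps the event handler itself free of decision logic.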
Take this and have some fun with it!