My team needs to sync words in the transcript with events from another source (specifically, button presses). With the appropriate config arguments, the final transcription results include word-level timestamps, but the intermediate results (those delivered with Recognizing events) do not. How can we get word-level timestamps during real-time transcription?
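
For concreteness, the alignment step could look something like the sketch below. It is only an illustration of the goal, not code from any SDK: the `word_at` helper and the `text`, `start_s`, and `duration_s` field names are hypothetical, and it assumes both the word timestamps and the button presses are expressed in seconds on the same clock.

```python
from bisect import bisect_right

def word_at(words, press_time_s):
    """Return the text of the word being spoken at press_time_s.

    `words` is assumed to be sorted by start time. Returns None when the
    press falls before the first word or in a gap between words.
    """
    starts = [w["start_s"] for w in words]
    # Index of the last word that starts at or before the press.
    i = bisect_right(starts, press_time_s) - 1
    if i < 0:
        return None
    w = words[i]
    if press_time_s < w["start_s"] + w["duration_s"]:
        return w["text"]
    return None

# Hypothetical word-level timestamps (seconds from the start of the audio).
words = [
    {"text": "press", "start_s": 0.50, "duration_s": 0.30},
    {"text": "the", "start_s": 0.85, "duration_s": 0.10},
    {"text": "button", "start_s": 1.00, "duration_s": 0.40},
]

# Hypothetical button-press times on the same clock.
button_presses_s = [0.6, 0.82, 1.2]
matches = [word_at(words, t) for t in button_presses_s]
```

This only works once each word has its own start time and duration, which is what the intermediate results currently lack.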