I'm trying to run real-time speech-to-text (STT) on two audio streams (one from the microphone, one from the speaker). These are the options I'm considering:
1. Combine both streams into one and use the native diarization capability
2. Use the multichannel capability
3. Create two separate sessions
Option 1: I'm considering using PullAudioInputStream and combining both streams. But I'm using the JavaScript SDK and can't figure out how to set the diarization option. Additionally, it seems diarization quality isn't that great yet.
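To make "combining both streams" concrete, here is a minimal sketch of the mixing step I have in mind. It assumes both streams are already 16-bit PCM mono at the same sample rate and arrive as Int16Array chunks; `mixPcm16` is a hypothetical helper of my own, not part of the SDK, and feeding the result into the pull stream is not shown.

```javascript
// Hypothetical helper (not an SDK function): mixes two 16-bit PCM mono
// chunks into a single mono chunk by averaging the samples.
// Assumes both chunks use the same sample rate.
function mixPcm16(mic, speaker) {
  const length = Math.max(mic.length, speaker.length);
  const mixed = new Int16Array(length);
  for (let i = 0; i < length; i++) {
    // Treat the shorter chunk as silence once it runs out.
    const a = i < mic.length ? mic[i] : 0;
    const b = i < speaker.length ? speaker[i] : 0;
    // Averaging keeps the result inside the 16-bit range (no clipping).
    mixed[i] = Math.round((a + b) / 2);
  }
  return mixed;
}

// Example usage with made-up sample values.
const micChunk = Int16Array.from([1000, -2000]);
const speakerChunk = Int16Array.from([3000, 2000, 500]);
const mixedChunk = mixPcm16(micChunk, speakerChunk);
```

Averaging halves the level of each source, which might itself hurt recognition, so this is only meant to show the idea.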
Option 2: This seems to be limited to the Conversation Transcription API, which requires 7 mics, among other things. Not viable for my use case.
Option 3: Create two separate sessions, one per stream. This would double the cost, and I'd lose synchronization between the two streams.
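The only workaround I can think of for the synchronization part is to merge the results from both sessions by their start offsets afterwards. A rough sketch, assuming both sessions start at the same moment and that each result can be reduced to a plain object with an `offset` (start time) and `text`; the object shape and `mergeByOffset` are hypothetical, not SDK types:

```javascript
// Hypothetical helper (not an SDK function): merges recognition results
// from two sessions into one timeline, ordered by start offset.
// Assumes both sessions started at the same time, so offsets are comparable.
function mergeByOffset(micResults, speakerResults) {
  const tagged = [
    ...micResults.map((r) => ({ ...r, source: "mic" })),
    ...speakerResults.map((r) => ({ ...r, source: "speaker" })),
  ];
  // Sort ascending by offset; the input arrays are left untouched.
  return tagged.sort((a, b) => a.offset - b.offset);
}

// Example usage with made-up offsets and text.
const timeline = mergeByOffset(
  [{ offset: 0, text: "Hello" }, { offset: 30, text: "How are you?" }],
  [{ offset: 10, text: "Hi there" }]
);
```

This only orders whole utterances, so overlapping speech would still not line up precisely.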
Any thoughts?