While investigating some transcription oddities, I found that one of our transcripts has offsets that are off by ~20 seconds.
The asset metadata shows:
"AssetFile": [
{
"StartTime": "PT0.528S",
"Duration": "PT37M41.635S"
}
]
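(For reference, StartTime and Duration here are ISO-8601 durations. A minimal parser, assuming only the PT…H…M…S forms shown above and not the full ISO-8601 grammar, might look like this; it is a sketch, not any service's official API:)

```python
import re

def parse_iso_duration(d: str) -> float:
    """Parse a limited ISO-8601 duration (PT[nH][nM][n.nS]) into seconds."""
    m = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?", d)
    if not m:
        raise ValueError(f"unsupported duration: {d}")
    hours, minutes, seconds = m.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds or 0)

# The metadata above parses to:
#   parse_iso_duration("PT0.528S")     -> 0.528
#   parse_iso_duration("PT37M41.635S") -> 2261.635
```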
And reviewing the first block of spoken text from the transcription:
{
  "id": 1,
  "text": "Hi Melanie. Yes,",
  "confidence": 0.558,
  "speakerId": 1,
  "language": "en-US",
  "instances": [
    {
      "adjustedStart": "0:00:32.39",
      "adjustedEnd": "0:00:35.76",
      "start": "0:00:32.39",
      "end": "0:00:35.76"
    }
  ]
},
Reviewing the audio file itself, the first word is actually spoken at ~0:00:52, so the transcript is ~20 seconds out of sync.
A separate audio track of another speaker was correctly encoded and timestamped by the service. When both audio tracks are played side by side locally, their timing is correct.
Is there a way to get the correct offsets into the file? Is there an option I'm missing, or a piece of data I should be retrieving, that would let me "fix up" these timestamps?
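In case it helps frame the question: as a stopgap I can shift the timestamps myself, assuming the drift is a constant offset (here the measured ~20 seconds; that value is my observation, not something the service reports). A sketch of such a fix-up over the instance objects shown above:

```python
from datetime import timedelta

# Measured drift between audio and transcript; an assumption, not an API value.
OFFSET = timedelta(seconds=20)

def parse_ts(ts: str) -> timedelta:
    """Parse a transcript timestamp like '0:00:32.39' into a timedelta."""
    h, m, s = ts.split(":")
    return timedelta(hours=int(h), minutes=int(m), seconds=float(s))

def format_ts(td: timedelta) -> str:
    """Format a timedelta back into the 'H:MM:SS.ff' form used above."""
    total = td.total_seconds()
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{int(h)}:{int(m):02d}:{s:05.2f}"

def shift_instance(inst: dict) -> dict:
    """Return a copy of a transcript instance with all timestamps shifted by OFFSET."""
    time_keys = ("start", "end", "adjustedStart", "adjustedEnd")
    return {k: format_ts(parse_ts(v) + OFFSET) if k in time_keys else v
            for k, v in inst.items()}
```

Applied to the first instance, `shift_instance({"start": "0:00:32.39", "end": "0:00:35.76", ...})` moves the start to 0:00:52.39, matching what I hear in the audio. But I'd much rather retrieve correct offsets than hard-code a measured drift, which is why I'm asking.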