Re: Speech to text api seems not responding (Voicemail Transcription with Azure on FusionPBX/FreeSwitch)

Konrad M WebArray 6 Reputation points
2021-11-10T02:24:35.427+00:00

This is a proposed answer to a question that was originally posted by ClearConverse over here:
https://social.msdn.microsoft.com/Forums/en-US/b06ad362-4eb7-44ee-8b84-4ce4b8b3fd20/speech-to-text-api-seems-not-responding

The original question relates to a common problem users face when integrating Azure Cognitive Services into FusionPBX for voicemail transcription.

After the API is set up, transcription appears to work but stops after a while. It may start working again at seemingly random times, usually after a prolonged period and/or when the cache is cleared, and then returns to a non-working state shortly afterwards.

Telltale signs of this problem can be spotted in the debug logs. Below is the sample provided by ClearConverse:

transcribe_provider: azure
2019-07-05 10:16:09.349686 [NOTICE] switch_cpp.cpp:1365 [voicemail] transcribe_language: en-US
2019-07-05 10:16:09.349686 [NOTICE] switch_cpp.cpp:1365 [voicemail] Azure access_token recovered from memcached
2019-07-05 10:16:10.989670 [NOTICE] switch_cpp.cpp:1365 [voicemail] CMD: curl -X POST "https://westus.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=en-US&format=detailed" -H 'Authorization: Bearer eyJhbGciOiJodHRwOi8vd3d3LnczLm9yZy8yMDAxLzA0L3htbGRzaWctbW9yZSNobWFjLXNoYTI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJ1cm46bXMuY29nbml0aXZlc2VydmljZXMiLCJleHAiOiIxNTYyMjU3Mzk4IiwicmVnaW9uIjoid2VzdHVzIiwic3Vic2NyaXB0aW9uLWlkIjoiNWY0NmFhZDgzN2NmNDliZmIwOTQxNWIyNTM4MGUyZmQiLCJwcm9kdWN0LWlkIjoiU3BlZWNoU2VydmljZXMuUzAiLCJjb2duaXRpdmUtc2VydmljZXMtZW5kcG9pbnQiOiJodHRwczovL2FwaS5jb2duaXRpdmUubWljcm9zb2Z0LmNvbS9pbnRlcm5hbC92MS4wLyIsImF6dXJlLXJlc291cmNlLWlkIjoiL3N1YnNjcmlwdGlvbnMvZmQ4ZTYwZjgtNjFkMi00ZTk3LTlmZTctMzU3YjQzZTNlM2RkL3Jlc291cmNlR3JvdXBzL3NwZWVjaHRvdGV4dC9wcm92aWRlcnMvTWljcm9zb2Z0LkNvZ25pdGl2ZVNlcnZpY2VzL2FjY291bnRzL2Z1c2lvbnBieCIsInNjb3BlIjoic3BlZWNoc2VydmljZXMiLCJhdWQiOiJ1cm46bXMuc3BlZWNoc2VydmljZXMud2VzdHVzIn0.LTJCzKWq-DOhORZQpL5bFOzdoAyULhIKjHELt5GxAU0' -H 'Content-type: audio/wav; codec="audio/pcm"; samplerate=8000; trustsourcerate=false' --data-binary @/var/lib/freeswitch/storage/voicemail/default/dhirendra.users.clearconverse.com/1001/msg_4a4fcc1e-a3a5-4029-9b1b-8fd54ccc8d23.wav
2019-07-05 10:16:10.989670 [NOTICE] switch_cpp.cpp:1365 [voicemail] RESULT: 
2019-07-05 10:16:10.989670 [NOTICE] switch_cpp.cpp:1365 [voicemail] TRANSCRIPTION: (null)

In this sample I would be looking for the log entry:

[NOTICE] switch_cpp.cpp:1365 [voicemail] Azure access_token recovered from memcached

In v1 of the Azure API, the lifetime of the access token doesn't coincide with the lifetime of the cached access_token. When the token is retrieved from "memcache" (which on recent releases is a file-based cache instead of memcached), it is already stale and Azure refuses it, leading to the no-transcription problem.

This can be verified in a couple of ways, including noting the absence of log entries in the Azure dashboard. A surefire test is as follows.

  1. Flush the cache from within your FusionPBX panel.
  2. Try the voicemail transcription. It will work.
  3. Call again immediately afterwards. It will transcribe again.
  4. Wait 20 minutes and try the voicemail transcription again. It will not work.
  5. Flush the cache and try the voicemail transcription again. It will work.

You can also verify this without waiting by checking your cache in either /tmp or /var/cache/fusionpbx (or wherever your cache file location is set to; check under default settings in the FusionPBX web interface for the exact location). When you list the files in that directory you will see one named app.voicemail.azure.access_token, and that is your culprit.

To fix the problem, a simple solution is to comment out the call that reads or saves the access_token.

This can be found in 2 locations:

/var/www/fusionpbx/app/scripts/resources/scripts/app/voicemail/resources/functions/record_message.lua
/usr/share/freeswitch/scripts/app/voicemail/resources/functions/record_message.lua

And you would be targeting either of the following lines:

cache.set(key, access_token_result, 120);
-- or --
local access_token_result = cache.get(key)

Just put two hyphens (a Lua comment) in front of either line: commenting out cache.set prevents the token from being saved, and commenting out cache.get prevents it from being read.

This solution will work as a patch but is not optimal because a new access token will be requested for each call.

A better solution would store the desired expiry timestamp in a second cache entry and then test whether the current time is still before that expiry. V1 Azure tokens vary in lifetime from token to token. I'm not completely sure why, but my best guess is that it depends on API usage: the fewer calls you make to the API, the shorter the lifetime, and the more calls, the longer. It could also be random; I'll be honest, I didn't look that far into it.

With that said, I would estimate 5 minutes as a safe lifetime if I were to hard-code a number into the script. Adding the timestamp to the cache could look something like this:

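-- os.time() already returns the current Unix epoch; passing os.date("!*t") through
-- os.time() would shift the value by the server's time zone offset.
local access_current_time = os.time()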
local access_token_lifetime = 300
local access_token_expiry = access_current_time + access_token_lifetime

cache.set("app:voicemail:azure:access_expiry", access_token_expiry, access_token_expiry)
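
The read side of that check might look something like the sketch below. It is only an illustration under a few assumptions: the small cache table is a stand-in so the snippet runs on its own (in record_message.lua you would use the real FusionPBX cache object), the token key name is inferred from the cache file name mentioned earlier, and get_valid_token and save_token are hypothetical helper names that do not exist in FusionPBX.

```lua
-- Stand-in for the FusionPBX cache object so this sketch is runnable by itself.
-- It ignores the expire argument and simply keeps values in a table.
local cache = { store = {} }
function cache.set(key, value, expire) cache.store[key] = value end
function cache.get(key) return cache.store[key] end

-- Key names: the expiry key comes from the snippet above; the token key is an
-- assumption based on the cache file name app.voicemail.azure.access_token.
local TOKEN_KEY  = "app:voicemail:azure:access_token"
local EXPIRY_KEY = "app:voicemail:azure:access_expiry"

-- Hypothetical helper: store the token together with the time at which
-- it should no longer be trusted.
local function save_token(token, now, lifetime)
    cache.set(TOKEN_KEY, token, lifetime)
    cache.set(EXPIRY_KEY, now + lifetime, lifetime)
end

-- Hypothetical helper: return the cached token only while the current time
-- is still before the recorded expiry; otherwise return nil so the caller
-- requests a fresh token from Azure.
local function get_valid_token(now)
    local token  = cache.get(TOKEN_KEY)
    local expiry = tonumber(cache.get(EXPIRY_KEY))
    if token and expiry and now < expiry then
        return token
    end
    return nil
end
```

In the script you would pass os.time() as now; taking it as a parameter here just makes the behaviour easy to check by hand.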

But the best way to approach this is to decode the access token as you receive it from Azure, calculate the delta between the token's "exp" value and the current UTC timestamp, and plug that into access_token_lifetime. That way the expiry of your cached token stays in sync with Azure, which maximizes the efficiency of your token caching.
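
A minimal sketch of that decoding step in plain Lua follows. It assumes the token is a standard three-part JWT (header.payload.signature, base64url encoded), which is what the Bearer token in the log sample looks like. The function names base64url_decode, token_expiry, and token_lifetime are hypothetical, and the "exp" claim is pulled out with a simple pattern match rather than a real JSON parser, so treat it as an illustration rather than a hardened implementation.

```lua
local B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

-- Decode a base64url string (the encoding used for JWT segments).
-- Returns nil if the input contains a character outside the alphabet.
local function base64url_decode(input)
    -- Map the URL-safe alphabet back to standard base64 and drop any padding.
    local s = input:gsub("%-", "+"):gsub("_", "/"):gsub("=", "")
    local bytes, acc, bits = {}, 0, 0
    for i = 1, #s do
        local idx = B64:find(s:sub(i, i), 1, true)
        if not idx then return nil end
        acc = acc * 64 + (idx - 1)   -- append 6 bits to the accumulator
        bits = bits + 6
        if bits >= 8 then            -- a full byte is available
            bits = bits - 8
            local divisor = 2 ^ bits
            local byte = math.floor(acc / divisor)
            bytes[#bytes + 1] = string.char(byte)
            acc = acc - byte * divisor
        end
    end
    return table.concat(bytes)
end

-- Extract the "exp" claim (Unix epoch seconds) from a JWT, or nil on failure.
local function token_expiry(jwt)
    local payload = jwt:match("^[^.]+%.([^.]+)%.")
    if not payload then return nil end
    local json = base64url_decode(payload)
    if not json then return nil end
    -- The token in the log sample carries exp as a quoted string,
    -- so accept both "exp":"123" and "exp":123.
    return tonumber(json:match('"exp"%s*:%s*"?(%d+)'))
end

-- Seconds remaining until the token expires; negative if already expired.
local function token_lifetime(jwt, now)
    local exp = token_expiry(jwt)
    if not exp then return nil end
    return exp - now
end
```

With this in place, access_token_lifetime could be set from token_lifetime(access_token, os.time()), perhaps minus a small safety margin. As a sanity check, the payload of the token in the log sample begins with "exp":"1562257398", which is 2019-07-04 16:23:18 UTC. That is earlier than the 2019-07-05 10:16 log entry in any time zone, which is consistent with a stale token having been read from the cache.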
