Azure OpenAI Service GPT-4 is excessively (and recently) slow - when will this be resolved?

dankronstal 90

There is a significant performance issue with AOAI GPT-4 models in the last week or so. We are developing a solution using this service as one of several components, and over the past week response times from the REST API have degraded to a point of un-usability. A very simple prompt exchange, as shown in the screenshot below, requires >1 minute to return. This has been tested across multiple GPT-4 models in different subscriptions and deployments, both with and without content filtering, and with API versions 2023-07-01-preview and 2023-12-01-preview, in times of load and during 'cool' periods when token throughput has rested at 0 for a few hours, during peak business hours and during the middle of the night. GPT-3.5 model deployments for the same request payload are <4 seconds, so the issue is specific to GPT-4 models only. Input as to when this issue will be resolved would be appreciated, or other ideas for testing to support the team working on the fix. Screenshot: User's image

AshokPeddakotla-MSFT 27,491 Reputation points

2024-01-25T06:43:01.7633333+00:00

dankronstal Greetings!

I understand that you are having issues with GPT4 model slow response times.

Thanks for sharing the details. Similar issue has been discussed here see if that helps. Please note that GPT 4 is much slower than 3. If you look at the docs on latency, 35-turbo is the fastest model.

Also, refer to the same documentation which provides different factors which you can control to improve the performance like Model selection, Generation size and Max tokens, streaming etc.
dankronstal 90 Reputation points

2024-01-25T15:40:39.1333333+00:00

Thanks Ashok - the real issue here isn't GPT-3.5 vs GPT-4, but rather GPT-4 today vs GPT-4 a week or two ago. That's where the degradation in performance has been observed. Documentation has been incorporated already prior to my comments.
AshokPeddakotla-MSFT 27,491 Reputation points

2024-02-20T01:55:56.23+00:00

Apologies for the delayed response. There could be several reasons for the latency as mentioned above. However, For a deeper investigation and immediate assistance on this issue, please file a support request @ https://aka.ms/azsupt? Do let me know if you face any issue while creating a support request.
Brady Begeman 15 Reputation points

2024-04-15T18:41:34.3333333+00:00

It's been months and the performance is still utterly terrible. This is with gpt-4-1106 which is GPT-4 Turbo. If OpenAI can have just a few seconds of latency for their ChatGPT product, Azure OpenAI services should be competitive with that, not literally 10x slower.
Jason A 0 Reputation points

2024-04-23T17:35:48.5+00:00
Hi Team,

We have found a work around for the GPT-4 slowness - hope it helps

Hire a resource

Route the incoming prompts via email

Have the resource watch their email 24/7 (maybe during business hours only depending on your traffic patterns)

Have the resource open Chat GPT on Openai.com

Take the response from Chat GPT and send back in a reply via email

Have an SMTP watcher then pipe the response back to the user

We have found that this is much faster than the current experience.

-J