Azure OpenAI Service GPT-4 has recently become excessively slow - when will this be resolved?
There has been a significant performance issue with AOAI GPT-4 models for the last week or so. We are developing a solution that uses this service as one of several components, and over the past week response times from the REST API have degraded to the point of unusability. A very simple prompt exchange, as shown in the screenshot below, takes more than a minute to return.

This has been tested:
- across multiple GPT-4 models in different subscriptions and deployments,
- both with and without content filtering,
- with API versions 2023-07-01-preview and 2023-12-01-preview,
- under load and during 'cool' periods when token throughput has sat at 0 for a few hours,
- during peak business hours and in the middle of the night.

GPT-3.5 model deployments return the same request payload in under 4 seconds, so the issue is specific to GPT-4 models. An indication of when this issue will be resolved would be appreciated, as would other testing ideas to support the team working on a fix.

Screenshot:
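For anyone who wants to reproduce the timing comparison, here is a minimal sketch of how we measure a single round trip against the chat-completions REST endpoint. The resource name, deployment name, and API key below are placeholders, not our actual values; swap the deployment between a GPT-4 and a GPT-3.5 deployment to compare latencies.

```python
import json
import time
import urllib.request


def build_url(resource: str, deployment: str, api_version: str) -> str:
    """Construct the Azure OpenAI chat-completions endpoint URL."""
    return (
        f"https://{resource}.openai.azure.com/openai/deployments/"
        f"{deployment}/chat/completions?api-version={api_version}"
    )


def time_chat_request(resource, deployment, api_version, api_key, prompt):
    """Send one chat request and return (elapsed_seconds, response_dict)."""
    body = json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 50,
    }).encode("utf-8")
    req = urllib.request.Request(
        build_url(resource, deployment, api_version),
        data=body,
        headers={"api-key": api_key, "Content-Type": "application/json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:  # blocks until full response
        payload = json.load(resp)
    return time.perf_counter() - start, payload


# Placeholder usage -- substitute your own resource, deployment, and key:
# elapsed, _ = time_chat_request(
#     "my-resource", "gpt-4", "2023-07-01-preview", "<API-KEY>", "Say hello."
# )
# print(f"round trip: {elapsed:.1f}s")
```

Running the same prompt against the GPT-4 and GPT-3.5 deployments back to back is how we observed the >1 minute vs <4 second difference described above.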