GPT-4 vision-preview: delayed responses, and no response in many cases

Wajid Anwar 0 Reputation points
2024-05-15T18:13:11.3066667+00:00

We are running a Node.js UI front end with Supabase for authentication on top of Azure OpenAI in our Azure environment. The UI works as expected and authentication succeeds. We use several models, including GPT-4 vision-preview, Mistral, Claude, and DALL-E 3. We have been experiencing significant delays in GPT-4 vision-preview responses, and in many cases no response at all. The issue persists across different environments, despite using various deployment keys.

Key Points:

Quota Check: Confirmed we haven't exceeded our quota.

Other Models: Mistral, Claude, and DALL-E 3 are functioning normally.

Azure OpenAI Studio: GPT-4 Vision is also not working here.

Any insights or similar experiences would be appreciated.


1 answer

  1. AshokPeddakotla-MSFT 28,916 Reputation points
    2024-05-16T06:38:18.51+00:00

    Wajid Anwar, greetings!


    I understand that you are having issues with the GPT-4 vision-preview model. Did you check How to Use GPT-4 Turbo with Vision?

    If you are using a GPT-4 model, then some latency is expected, considering that gpt-4 has more capacity than the gpt-3.5 series.

    The latest GA release of GPT-4 Turbo is: gpt-4 Version: turbo-2024-04-09

    This is the replacement for the following preview models:

    • gpt-4 Version: 1106-Preview
    • gpt-4 Version: 0125-Preview
    • gpt-4 Version: vision-preview

    Try using the latest model and see if that helps. You can also check the performance by Monitoring Azure OpenAI Service.
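
    For illustration, here is a minimal Node.js (TypeScript) sketch of calling a GA GPT-4 Turbo with Vision deployment, assuming the openai npm package (v4+) and its AzureOpenAI client. The endpoint, API version, deployment name gpt-4-turbo, and image URL are placeholders, not values from your environment:

    ```typescript
    import { AzureOpenAI } from "openai";

    // Placeholder configuration -- substitute your own resource endpoint, key,
    // API version, and the name you gave your turbo-2024-04-09 deployment.
    const client = new AzureOpenAI({
      endpoint: process.env.AZURE_OPENAI_ENDPOINT!, // e.g. https://my-resource.openai.azure.com
      apiKey: process.env.AZURE_OPENAI_API_KEY!,
      apiVersion: "2024-06-01",
      deployment: "gpt-4-turbo", // hypothetical deployment name
    });

    async function describeImage(imageUrl: string): Promise<string | null> {
      const result = await client.chat.completions.create({
        // With AzureOpenAI, the deployment set above is what is actually called.
        model: "gpt-4-turbo",
        messages: [
          {
            role: "user",
            content: [
              { type: "text", text: "Describe this image." },
              { type: "image_url", image_url: { url: imageUrl } },
            ],
          },
        ],
        max_tokens: 300, // keep the cap modest; see the latency notes below
      });
      return result.choices[0]?.message?.content ?? null;
    }

    describeImage("https://example.com/photo.jpg").then(console.log);
    ```

    The same chat.completions call shape works for a vision-preview deployment; only the deployment name changes, so switching to the GA model is a small edit.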

    This article on improving latency performance with the Azure OpenAI service covers the topic in more depth. Here are some of the best practices to lower latency:

    • Model latency: If model latency is important to you, we recommend trying out the latest models in the GPT-3.5 Turbo model series.
    • Lower max tokens: OpenAI has found that, even in cases where the total number of tokens generated is similar, the request with the higher value set for the max_tokens parameter will have more latency.
    • Lower total tokens generated: The fewer tokens generated, the faster the overall response will be. Remember, this is like a for loop: n tokens = n iterations. Lower the number of tokens generated, and overall response time will improve accordingly.
    • Streaming: Enabling streaming can be useful in managing user expectations in certain situations, by allowing the user to see the model's response as it is being generated rather than having to wait until the last token is ready (see the sketch after this list).
    • Content filtering: Content filtering improves safety, but it also impacts latency. Evaluate whether any of your workloads would benefit from modified content filtering policies.
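
    To make the streaming and max-token points concrete, here is a hedged sketch under the same assumptions as the snippet above (openai v4 AzureOpenAI client, placeholder deployment name):

    ```typescript
    // Reusing the `client` from the snippet above (placeholder deployment name).
    async function streamSummary(prompt: string): Promise<void> {
      const stream = await client.chat.completions.create({
        model: "gpt-4-turbo",
        messages: [{ role: "user", content: prompt }],
        max_tokens: 150, // lower cap: fewer generated tokens, faster completion
        stream: true,    // tokens are sent back as they are generated
      });

      // Write each delta as it arrives instead of waiting for the final token.
      for await (const chunk of stream) {
        process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
      }
    }
    ```

    Streaming does not reduce total generation time, but the first tokens appear much sooner, which is often what users perceive as responsiveness.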

    Please let me know if that helps or if you have any further queries.
