Use GPT-4 Turbo with Vision

Article
05/07/2024

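GPT-4 Turbo with Vision is a large multimodal model (LMM) developed by OpenAI that can analyze images and provide textual responses to questions about them. It incorporates both natural language processing and visual understanding.

The GPT-4 Turbo with Vision model answers general questions about what's present in images. If you use Vision enhancement, you can also show it video.

Tip

To use GPT-4 Turbo with Vision, you call the Chat Completion API on a GPT-4 Turbo with Vision model that you have deployed. If you're not familiar with the Chat Completion API, see the GPT-4 Turbo & GPT-4 how-to guide.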

GPT-4 Turbo model upgrade

The latest GA release of GPT-4 Turbo is:

gpt-4 Version: turbo-2024-04-09

This replaces the following preview models:

gpt-4 Version: 1106-Preview
gpt-4 Version: 0125-Preview
gpt-4 Version: vision-preview

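Differences between OpenAI and Azure OpenAI GPT-4 Turbo GA models

OpenAI's version of the latest 0409 turbo model supports JSON mode and function calling for all inference requests.
Azure OpenAI's version of the latest turbo-2024-04-09 currently doesn't support JSON mode or function calling when making inference requests with image (vision) input. Text-based input requests (requests without image_url and inline images) do support JSON mode and function calling.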
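
The restriction above can be checked before a request is sent. The following is a minimal sketch, not part of any SDK: `has_image_input` is a hypothetical helper that scans a `messages` list for `image_url` content parts, so a caller can decide whether JSON mode or function calling may be requested.

```python
def has_image_input(messages):
    """Return True if any message carries an image_url content part."""
    for message in messages:
        content = message.get("content")
        # Text-only messages use a plain string; multimodal ones use a list of parts.
        if isinstance(content, list):
            for part in content:
                if isinstance(part, dict) and part.get("type") == "image_url":
                    return True
    return False


text_only = [{"role": "user", "content": "Summarize this report as JSON."}]
with_image = [{"role": "user", "content": [
    {"type": "text", "text": "Describe this picture:"},
    {"type": "image_url", "image_url": {"url": "<image URL>"}},
]}]

# Only request JSON mode or function calling when there is no image input.
for messages in (text_only, with_image):
    allow_json_mode = not has_image_input(messages)
    print(allow_json_mode)
```

Inline base64 images are also passed as `image_url` parts, so the same check covers them.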

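Differences from gpt-4 vision-preview

Azure AI specific Vision enhancements integration with GPT-4 Turbo with Vision isn't supported for gpt-4 Version: turbo-2024-04-09. This includes Optical Character Recognition (OCR), object grounding, video prompts, and improved handling of your data with images.

GPT-4 Turbo provisioned managed availability

gpt-4 Version: turbo-2024-04-09 is available for both standard and provisioned deployments. Currently, the provisioned version of this model doesn't support image/vision inference requests; provisioned deployments of this model accept only text input. Standard model deployments accept both text and image/vision inference requests.

Region availability

For information on model regional availability, see the models matrix for standard and provisioned deployments.

Deploying GPT-4 Turbo with Vision GA

To deploy the GA model from the Studio UI, select GPT-4 and then choose the turbo-2024-04-09 version from the dropdown menu. The default quota for the gpt-4-turbo-2024-04-09 model is the same as the current quota for GPT-4-Turbo. See the regional quota limits.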

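Call the Chat Completion APIs

The following command shows the most basic way to use the GPT-4 Turbo with Vision model with code. If this is your first time using these models programmatically, we recommend starting with the GPT-4 Turbo with Vision quickstart.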

REST
Python

Send a POST request to https://{RESOURCE_NAME}.openai.azure.com/openai/deployments/{DEPLOYMENT_NAME}/chat/completions?api-version=2023-12-01-preview, where:

RESOURCE_NAME is the name of your Azure OpenAI resource.
DEPLOYMENT_NAME is the name of your GPT-4 Turbo with Vision model deployment.

Required headers:

Content-Type: application/json
api-key: {API_KEY}

Body: The following is a sample request body. The format is the same as the chat completions API for GPT-4, except that the message content can be an array containing text and images (either a valid HTTP or HTTPS URL to an image, or a base-64-encoded image).

Important

Remember to set a "max_tokens" value, or the return output will be cut off.

{
    "messages": [ 
        {
            "role": "system", 
            "content": "You are a helpful assistant." 
        },
        {
            "role": "user", 
            "content": [
                {
                    "type": "text",
                    "text": "Describe this picture:"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "<image URL>"
                    }
                }
            ]
        }
    ],
    "max_tokens": 100, 
    "stream": false 
}
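
For readers who want to issue the REST call from a script without the SDK, the request above can be assembled with the Python standard library. This is a minimal sketch: `build_chat_request`, `my-resource`, and `my-deployment` are hypothetical names introduced for illustration, and the request is only built, not sent.

```python
import json
import urllib.request

API_VERSION = "2023-12-01-preview"  # the version used in this article; may change


def build_chat_request(resource_name, deployment_name, api_key, image_url,
                       prompt="Describe this picture:", max_tokens=100):
    """Build (but do not send) the POST request described above."""
    url = (f"https://{resource_name}.openai.azure.com/openai/deployments/"
           f"{deployment_name}/chat/completions?api-version={API_VERSION}")
    body = {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ]},
        ],
        "max_tokens": max_tokens,  # set explicitly, otherwise the output is cut off
        "stream": False,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


# Placeholder values; replace them with your own resource details.
request = build_chat_request("my-resource", "my-deployment", "<API_KEY>", "<image URL>")
print(request.get_method(), request.full_url)
```

To send the request, pass it to `urllib.request.urlopen` and parse the response body with `json.load`.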

Define your Azure OpenAI resource endpoint and key.
Enter the name of your GPT-4 Turbo with Vision model deployment.

Create a client object using those values.

from openai import AzureOpenAI

api_base = '<your_azure_openai_endpoint>' # your endpoint should look like the following https://YOUR_RESOURCE_NAME.openai.azure.com/
api_key="<your_azure_openai_key>"
deployment_name = '<your_deployment_name>'
api_version = '2023-12-01-preview' # this might change in the future

client = AzureOpenAI(
    api_key=api_key,  
    api_version=api_version,
    base_url=f"{api_base}openai/deployments/{deployment_name}/extensions",
)

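Then call the client's create method. The following code shows a sample request body. The format is the same as the chat completions API for GPT-4, except that the message content can be an array containing text and images (either a valid HTTP or HTTPS URL to an image, or a base-64-encoded image).

Important

Remember to set a "max_tokens" value, or the return output will be cut off.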

response = client.chat.completions.create(
    model=deployment_name,
    messages=[
        { "role": "system", "content": "You are a helpful assistant." },
        { "role": "user", "content": [  
            { 
                "type": "text", 
                "text": "Describe this picture:" 
            },
            { 
                "type": "image_url",
                "image_url": {
                    "url": "<image URL>"
                }
            }
        ] } 
    ],
    max_tokens=2000 
)
print(response)

Tip

Use a local image

If you want to use a local image, you can use the following Python code to convert it to base64 so it can be passed to the API. Alternative file conversion tools are available online.

import base64
from mimetypes import guess_type

# Function to encode a local image into data URL 
def local_image_to_data_url(image_path):
    # Guess the MIME type of the image based on the file extension
    mime_type, _ = guess_type(image_path)
    if mime_type is None:
        mime_type = 'application/octet-stream'  # Default MIME type if none is found

    # Read and encode the image file
    with open(image_path, "rb") as image_file:
        base64_encoded_data = base64.b64encode(image_file.read()).decode('utf-8')

    # Construct the data URL
    return f"data:{mime_type};base64,{base64_encoded_data}"

# Example usage
image_path = '<path_to_image>'
data_url = local_image_to_data_url(image_path)
print("Data URL:", data_url)

When your base64 image data is ready, you can pass it to the API in the request body like this:

...
"type": "image_url",
"image_url": {
   "url": "data:image/jpeg;base64,<your_image_data>"
}
...
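
As a sketch of how the fragment above fits into a full message, the hypothetical helper below wraps a data URL in a user message that pairs it with a text prompt. `build_image_message` is an illustrative name, and the base64 payload in the example is placeholder data rather than a real image.

```python
def build_image_message(prompt, data_url):
    """Return a user message that pairs a text prompt with an inline image."""
    # A light sanity check: inline images are expected as base64 data URLs.
    if not data_url.startswith("data:") or ";base64," not in data_url:
        raise ValueError("expected a data URL such as data:image/jpeg;base64,...")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }


# "aGVsbG8=" is placeholder base64 data, not a real image.
message = build_image_message("Describe this picture:", "data:image/jpeg;base64,aGVsbG8=")
print(message["content"][1]["type"])
```

The returned dictionary can be appended to the `messages` list of the chat completion call shown earlier.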

Output

The API response should look like the following.

{
    "id": "chatcmpl-8VAVx58veW9RCm5K1ttmxU6Cm4XDX",
    "object": "chat.completion",
    "created": 1702439277,
    "model": "gpt-4",
    "prompt_filter_results": [
        {
            "prompt_index": 0,
            "content_filter_results": {
                "hate": {
                    "filtered": false,
                    "severity": "safe"
                },
                "self_harm": {
                    "filtered": false,
                    "severity": "safe"
                },
                "sexual": {
                    "filtered": false,
                    "severity": "safe"
                },
                "violence": {
                    "filtered": false,
                    "severity": "safe"
                }
            }
        }
    ],
    "choices": [
        {
            "finish_reason":"stop",
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "The picture shows an individual dressed in formal attire, which includes a black tuxedo with a black bow tie. There is an American flag on the left lapel of the individual's jacket. The background is predominantly blue with white text that reads \"THE KENNEDY PROFILE IN COURAGE AWARD\" and there are also visible elements of the flag of the United States placed behind the individual."
            },
            "content_filter_results": {
                "hate": {
                    "filtered": false,
                    "severity": "safe"
                },
                "self_harm": {
                    "filtered": false,
                    "severity": "safe"
                },
                "sexual": {
                    "filtered": false,
                    "severity": "safe"
                },
                "violence": {
                    "filtered": false,
                    "severity": "safe"
                }
            }
        }
    ],
    "usage": {
        "prompt_tokens": 1156,
        "completion_tokens": 80,
        "total_tokens": 1236
    }
}

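Every response includes a "finish_reason" field. It has the following possible values:

stop: API returned complete model output.
length: Incomplete model output due to the max_tokens input parameter or the model's token limit.
content_filter: Omitted content due to a flag from the content filters.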
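
A caller will usually want to check this value before trusting the output. The sketch below assumes the response has already been parsed into a dictionary; `get_finish_reason` and `is_complete` are hypothetical helpers. Because the sample responses in this article show both a flat `finish_reason` string and a nested `finish_details` object, the sketch reads whichever is present.

```python
def get_finish_reason(choice):
    """Read the finish value from a choice, whichever field the API used."""
    if "finish_reason" in choice:
        return choice["finish_reason"]
    # Some responses nest the value under finish_details.type instead.
    return choice.get("finish_details", {}).get("type")


def is_complete(response):
    """True only if every choice reports a full, unfiltered output."""
    return all(get_finish_reason(c) == "stop" for c in response["choices"])


# Hand-written stand-ins for parsed API responses.
truncated = {"choices": [{"finish_reason": "length", "index": 0}]}
finished = {"choices": [{"finish_details": {"type": "stop"}, "index": 0}]}
print(is_complete(truncated), is_complete(finished))
```

When the value is `length`, raising `max_tokens` and retrying is the usual remedy.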

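Detail parameter settings in image processing: Low, High, Auto

The detail parameter in the model offers three choices, low, high, or auto, to adjust the way the model interprets and processes images. The default setting is auto, where the model decides between low and high based on the size of the image input.

low setting: the model doesn't activate "high res" mode. Instead, it processes a lower-resolution 512x512 version, which gives quicker responses and lower token consumption for scenarios where fine detail isn't crucial.
high setting: the model activates "high res" mode. Here, the model initially views the low-resolution image and then generates detailed 512x512 segments from the input image. Each segment uses double the token budget, allowing for a more detailed interpretation of the image.

For details on how the image parameters affect tokens used and pricing, see What is OpenAI? Image tokens with GPT-4 Turbo with Vision.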
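
To make the effect of the detail setting concrete, here is a rough estimator. It is a sketch under stated assumptions: the constants (85 base tokens, 170 tokens per 512x512 segment) and the resize steps (fit within 2048x2048, then scale the shortest side down to 768) come from the publicly documented image token accounting for this model family and are not stated in this article. Verify them against the pricing documentation before relying on the numbers. The `auto` setting is not modeled, and `estimate_image_tokens` is a hypothetical name.

```python
import math

BASE_TOKENS = 85    # assumed cost of the low-resolution pass
TILE_TOKENS = 170   # assumed cost per 512x512 segment (double the base)


def estimate_image_tokens(width, height, detail="high"):
    """Estimate image tokens for detail='low' or detail='high'."""
    if detail == "low":
        return BASE_TOKENS
    if detail != "high":
        raise ValueError("only 'low' and 'high' are modeled in this sketch")
    # Step 1 (assumed): shrink to fit inside a 2048 x 2048 square.
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale
    # Step 2 (assumed): shrink so the shortest side is at most 768 px.
    scale = min(1.0, 768 / min(w, h))
    w, h = w * scale, h * scale
    # Step 3: count the 512 px segments needed to cover the image.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return BASE_TOKENS + TILE_TOKENS * tiles


print(estimate_image_tokens(2048, 4096, "high"))  # six segments under these assumptions
print(estimate_image_tokens(2048, 4096, "low"))
```

In a request, the setting is passed alongside the URL inside the image part, for example `"image_url": {"url": "<image URL>", "detail": "low"}`.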

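Use Vision enhancement with images

GPT-4 Turbo with Vision provides exclusive access to Azure AI Services tailored enhancements. When combined with Azure AI Vision, it enhances your chat experience by providing the chat model with more detailed information about visible text in the image and the locations of objects.

The Optical Character Recognition (OCR) integration allows the model to produce higher-quality responses for dense text, transformed images, and number-heavy financial documents. It also covers a wider range of languages.

The object grounding integration brings a new layer to data analysis and user interaction, because it can visually distinguish and highlight important elements in the images it processes.

Important

To use the Vision enhancement with an Azure OpenAI resource, you need to specify a Computer Vision resource. It must be in the paid (S1) tier and in the same Azure region as your GPT-4 Turbo with Vision resource. If you're using an Azure AI Services resource, you don't need an additional Computer Vision resource.

Caution

Azure AI enhancements for GPT-4 Turbo with Vision are billed separately from the core functionalities. Each specific Azure AI enhancement for GPT-4 Turbo with Vision has its own distinct charges. For details, see the special pricing information.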

REST
Python

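Send a POST request to https://{RESOURCE_NAME}.openai.azure.com/openai/deployments/{DEPLOYMENT_NAME}/extensions/chat/completions?api-version=2023-12-01-preview, where:

RESOURCE_NAME is the name of your Azure OpenAI resource.
DEPLOYMENT_NAME is the name of your GPT-4 Turbo with Vision model deployment.

Required headers:

Content-Type: application/json
api-key: {API_KEY}

Body:

The format is similar to that of the chat completions API for GPT-4, but the message content can be an array containing strings and images (either a valid HTTP or HTTPS URL to an image, or a base-64-encoded image).

You must also include the enhancements and dataSources objects. enhancements represents the specific Vision enhancement features requested in the chat. It has a grounding and an ocr property, each of which has a boolean enabled property. Use these to request the OCR service and/or the object detection/grounding service. dataSources represents the Computer Vision resource data that's needed for Vision enhancement. It has a type property, which must be "AzureComputerVision", and a parameters property. Set the endpoint and key to the endpoint URL and access key of your Computer Vision resource.

Important

Remember to set a "max_tokens" value, or the return output will be cut off.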

{
    "enhancements": {
            "ocr": {
              "enabled": true
            },
            "grounding": {
              "enabled": true
            }
    },
    "dataSources": [
    {
        "type": "AzureComputerVision",
        "parameters": {
            "endpoint": "<your_computer_vision_endpoint>",
            "key": "<your_computer_vision_key>"
        }
    }],
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe this picture:"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "<image URL>"
                    }
                }
            ]
        }
    ],
    "max_tokens": 100, 
    "stream": false 
}

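Call the same method as in the previous step, but include the new extra_body parameter. It contains the enhancements and dataSources fields.

enhancements represents the specific Vision enhancement features requested in the chat. It has a grounding and an ocr field, each of which has a boolean enabled property. Use these to request the OCR service and/or the object detection/grounding service.

dataSources represents the Computer Vision resource data that's needed for Vision enhancement. It has a type field, which must be "AzureComputerVision", and a parameters field. Set the endpoint and key to the endpoint URL and access key of your Computer Vision resource.

Important

Remember to set a "max_tokens" value, or the return output will be cut off.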

response = client.chat.completions.create(
    model=deployment_name,
    messages=[
        { "role": "system", "content": "You are a helpful assistant." },
        { "role": "user", "content": [  
            { 
                "type": "text", 
                "text": "Describe this picture:" 
            },
            { 
                "type": "image_url",
                "image_url": {
                    "url": "<image URL>"
                }
            }
        ] } 
    ],
    extra_body={
        "dataSources": [
            {
                "type": "AzureComputerVision",
                "parameters": {
                    "endpoint": "<your_computer_vision_endpoint>",
                    "key": "<your_computer_vision_key>"
                }
            }],
        "enhancements": {
            "ocr": {
                "enabled": True
            },
            "grounding": {
                "enabled": True
            }
        }
    },
    max_tokens=2000
)
print(response)

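Output

The chat responses you receive from the model should now include enhanced information about the image, such as object labels, bounding boxes, and OCR results. The API response should look like the following.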

{
    "id": "chatcmpl-8UyuhLfzwTj34zpevT3tWlVIgCpPg",
    "object": "chat.completion",
    "created": 1702394683,
    "model": "gpt-4",
    "choices":
    [
        {
            "finish_details": {
                "type": "stop",
                "stop": "<|fim_suffix|>"
            },
            "index": 0,
            "message":
            {
                "role": "assistant",
                "content": "The image shows a close-up of an individual with dark hair and what appears to be a short haircut. The person has visible ears and a bit of their neckline. The background is a neutral light color, providing a contrast to the dark hair."
            },
            "enhancements":
            {
                "grounding":
                {
                    "lines":
                    [
                        {
                            "text": "The image shows a close-up of an individual with dark hair and what appears to be a short haircut. The person has visible ears and a bit of their neckline. The background is a neutral light color, providing a contrast to the dark hair.",
                            "spans":
                            [
                                {
                                    "text": "the person",
                                    "length": 10,
                                    "offset": 99,
                                    "polygon": [{"x":0.11950000375509262,"y":0.4124999940395355},{"x":0.8034999370574951,"y":0.4124999940395355},{"x":0.8034999370574951,"y":0.6434999704360962},{"x":0.11950000375509262,"y":0.6434999704360962}]
                                }
                            ]
                        }
                    ],
                    "status": "Success"
                }
            }
        }
    ],
    "usage":
    {
        "prompt_tokens": 816,
        "completion_tokens": 49,
        "total_tokens": 865
    }
}
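
The grounding spans in the response above can be turned into pixel bounding boxes for drawing or cropping. The sketch below assumes that polygon coordinates are normalized to the 0-1 range relative to image width and height, which matches the sample values but is not stated explicitly in this article. `spans_to_pixel_boxes` is a hypothetical helper operating on a parsed choice dictionary.

```python
def spans_to_pixel_boxes(choice, image_width, image_height):
    """Convert grounding span polygons into (text, left, top, right, bottom)."""
    boxes = []
    grounding = choice.get("enhancements", {}).get("grounding", {})
    for line in grounding.get("lines", []):
        for span in line.get("spans", []):
            # Scale normalized coordinates (assumed 0-1) to pixels.
            xs = [point["x"] * image_width for point in span["polygon"]]
            ys = [point["y"] * image_height for point in span["polygon"]]
            boxes.append((span["text"], round(min(xs)), round(min(ys)),
                          round(max(xs)), round(max(ys))))
    return boxes


# A simplified stand-in for one entry of "choices" in the response.
choice = {"enhancements": {"grounding": {"lines": [{"spans": [{
    "text": "the person",
    "polygon": [{"x": 0.25, "y": 0.5}, {"x": 0.75, "y": 0.5},
                {"x": 0.75, "y": 1.0}, {"x": 0.25, "y": 1.0}],
}]}]}}}
print(spans_to_pixel_boxes(choice, image_width=1000, image_height=800))
```

The `offset` and `length` fields of each span locate the phrase within the message text, so the boxes can be linked back to the words they ground.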

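Every response includes a "finish_details" field. It has the following possible values:

stop: API returned complete model output.
length: Incomplete model output due to the max_tokens input parameter or the model's token limit.
content_filter: Omitted content due to a flag from the content filters.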

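Use Vision enhancement with video

GPT-4 Turbo with Vision provides exclusive access to Azure AI Services tailored enhancements. The video prompt integration uses Azure AI Vision video retrieval to sample a set of frames from a video and create a transcript of the speech in the video. This enables the AI model to give summaries and answers about video content.

Follow these steps to set up a video retrieval system and integrate it with your AI chat model.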

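Tip

If you prefer, you can carry out the following steps using a Jupyter notebook instead: Video chat completions notebook.

Upload videos to Azure Blob Storage

You need to upload your videos to an Azure Blob Storage container. Create a new storage account if you don't have one already.

Once your videos are uploaded, you can get their SAS URLs, which you use to access them in later steps.

Ensure proper read access

Depending on your authentication method, you might need to take some extra steps to grant access to the Azure Blob Storage container. If you're using an Azure AI Services resource instead of an Azure OpenAI resource, you need to use managed identities to grant it read access to Azure Blob Storage.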

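Using System assigned identities
Using User assigned identities

Enable a system-assigned identity on your Azure AI Services resource by following these steps:

From your AI Services resource in the Azure portal, select Resource Management -> Identity and toggle the status to ON.
Assign Storage Blob Data Read access to the AI Services resource. From the Identity page, select Azure role assignments, and then add a role assignment with the following settings:
- Scope: Storage
- Subscription: {your subscription}
- Resource: {select the Azure Blob Storage resource}
- Role: Storage Blob Data Reader
Save your settings.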

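Create a video retrieval index

Get an Azure AI Vision resource in the same region as the Azure OpenAI resource you're using.

Create an index to store and organize the video files and their metadata. The example command below demonstrates how to create an index named my-video-index using the Create Index API. Save the index name to a temporary location; you'll need it in later steps.

Tip

For more detailed instructions on creating a video index, see Do video retrieval using vectorization.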

curl.exe -v -X PUT "https://<YOUR_ENDPOINT_URL>/computervision/retrieval/indexes/my-video-index?api-version=2023-05-01-preview" -H "Ocp-Apim-Subscription-Key: <YOUR_SUBSCRIPTION_KEY>" -H "Content-Type: application/json" --data-ascii "
{
  'metadataSchema': {
    'fields': [
      {
        'name': 'cameraId',
        'searchable': false,
        'filterable': true,
        'type': 'string'
      },
      {
        'name': 'timestamp',
        'searchable': false,
        'filterable': true,
        'type': 'datetime'
      }
    ]
  },
  'features': [
    {
      'name': 'vision',
      'domain': 'surveillance'
    },
    {
      'name': 'speech'
    }
  ]
}"
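
The curl command above targets Windows shells. For other environments, the same Create Index request can be sketched with the Python standard library. `build_create_index_request` is a hypothetical helper, `example.cognitiveservices.azure.com` is a placeholder host, and the request is built without being sent.

```python
import json
import urllib.request


def build_create_index_request(endpoint_host, subscription_key,
                               index_name="my-video-index"):
    """Build (but do not send) the PUT request that creates a video index."""
    url = (f"https://{endpoint_host}/computervision/retrieval/indexes/"
           f"{index_name}?api-version=2023-05-01-preview")
    body = {
        "metadataSchema": {"fields": [
            {"name": "cameraId", "searchable": False,
             "filterable": True, "type": "string"},
            {"name": "timestamp", "searchable": False,
             "filterable": True, "type": "datetime"},
        ]},
        "features": [
            {"name": "vision", "domain": "surveillance"},
            {"name": "speech"},
        ],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Ocp-Apim-Subscription-Key": subscription_key,
                 "Content-Type": "application/json"},
        method="PUT",
    )


# Placeholder host and key; substitute your own Azure AI Vision values.
request = build_create_index_request("example.cognitiveservices.azure.com",
                                     "<YOUR_SUBSCRIPTION_KEY>")
print(request.get_method(), request.full_url)
```

Serializing the body with `json.dumps` produces standard double-quoted JSON, which avoids the shell-specific quoting used in the curl example.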

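Add video files to the index with their associated metadata. The example below demonstrates how to add two video files to the index using SAS URLs with the Create Ingestion API. Save the SAS URLs and documentId values to a temporary location; you'll need them in later steps.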

curl.exe -v -X PUT "https://<YOUR_ENDPOINT_URL>/computervision/retrieval/indexes/my-video-index/ingestions/my-ingestion?api-version=2023-05-01-preview" -H "Ocp-Apim-Subscription-Key: <YOUR_SUBSCRIPTION_KEY>" -H "Content-Type: application/json" --data-ascii "
{
  'videos': [
    {
      'mode': 'add',
      'documentId': '02a504c9cd28296a8b74394ed7488045',
      'documentUrl': 'https://example.blob.core.windows.net/videos/02a504c9cd28296a8b74394ed7488045.mp4?sas_token_here',
      'metadata': {
        'cameraId': 'camera1',
        'timestamp': '2023-06-30 17:40:33'
      }
    },
    {
      'mode': 'add',
      'documentId': '043ad56daad86cdaa6e493aa11ebdab3',
      'documentUrl': 'https://example.blob.core.windows.net/videos/043ad56daad86cdaa6e493aa11ebdab3.mp4?sas_token_here',
      'metadata': {
        'cameraId': 'camera2'
      }
    }
  ]
}"

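After you add video files to the index, the ingestion process starts. It might take some time depending on the size and number of files being ingested. To make sure the ingestion is complete before you run searches, use the Get Ingestion API to check the status. Wait for this call to return "state" = "Completed" before proceeding to the next step.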
```
curl.exe -v -X GET "https://<YOUR_ENDPOINT_URL>/computervision/retrieval/indexes/my-video-index/ingestions?api-version=2023-05-01-preview&$top=20" -H "ocp-apim-subscription-key: <YOUR_SUBSCRIPTION_KEY>"
```
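
Waiting for the "Completed" state is typically done with a polling loop. The following is a minimal sketch: `wait_for_ingestion` and its `fetch_state` callback are hypothetical, and the "Running" and "Failed" state names are assumptions, since this article only confirms "Completed". The example drives the loop with simulated states so that it runs without calling the service.

```python
import time


def wait_for_ingestion(fetch_state, poll_seconds=10.0, timeout_seconds=1800.0,
                       sleep=time.sleep):
    """Poll fetch_state() until it returns "Completed".

    fetch_state is a caller-supplied function that calls the Get Ingestion
    API and returns the ingestion's "state" string.
    """
    waited = 0.0
    while True:
        state = fetch_state()
        if state == "Completed":
            return state
        if state == "Failed":  # assumed failure value, not confirmed here
            raise RuntimeError("ingestion failed")
        if waited >= timeout_seconds:
            raise TimeoutError(f"ingestion still {state!r} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Simulated states stand in for real API calls.
states = iter(["Running", "Running", "Completed"])
result = wait_for_ingestion(lambda: next(states), poll_seconds=0,
                            sleep=lambda seconds: None)
print(result)
```

In real use, `fetch_state` would issue the GET request shown above and read the state from the parsed JSON response.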

Integrate your video index with GPT-4 Turbo with Vision

Prepare a POST request to https://{RESOURCE_NAME}.openai.azure.com/openai/deployments/{DEPLOYMENT_NAME}/extensions/chat/completions?api-version=2023-12-01-preview, where:
- RESOURCE_NAME is the name of your Azure OpenAI resource.
- DEPLOYMENT_NAME is the name of your GPT-4 Vision model deployment.
Required headers:
- Content-Type: application/json
- api-key: {API_KEY}

Add the following JSON structure to the request body:

{
    "enhancements": {
            "video": {
              "enabled": true
            }
    },
    "dataSources": [
    {
        "type": "AzureComputerVisionVideoIndex",
        "parameters": {
            "computerVisionBaseUrl": "<your_computer_vision_endpoint>",
            "computerVisionApiKey": "<your_computer_vision_key>",
            "indexName": "<name_of_your_index>",
            "videoUrls": ["<your_video_SAS_URL>"]
        }
    }],
    "messages": [ 
        {
            "role": "system", 
            "content": "You are a helpful assistant." 
        },
        {
            "role": "user",
            "content": [
                    {
                        "type": "acv_document_id",
                        "acv_document_id": "<your_video_ID>"
                    },
                    {
                        "type": "text",
                        "text": "Describe this video:"
                    }
                ]
        }
    ],
    "max_tokens": 100
}

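The request includes the enhancements and dataSources objects. enhancements represents the specific Vision enhancement feature requested in the chat. dataSources represents the Computer Vision resource data that's needed for Vision enhancement. It has a type property, which must be "AzureComputerVisionVideoIndex", and a parameters property, which contains your AI Vision and video information.

Fill in all the <placeholder> fields above with your own information. Enter the endpoint URLs and keys of your OpenAI and AI Vision resources where appropriate, and retrieve the video index information from the earlier step.
Send the POST request to the API endpoint. It should contain your OpenAI and AI Vision credentials, the name of your video index, and the ID and SAS URL of a single video.

In your Python script, call the client's create method as in the previous sections, but include the extra_body parameter. It contains the enhancements and data_sources fields. enhancements represents the specific Vision enhancement features requested in the chat. It has a video field, which has a boolean enabled property. Use this to request the video retrieval service.

data_sources represents the external resource data that's needed for Vision enhancement. It has a type field, which must be "AzureComputerVisionVideoIndex", and a parameters field.

Set computerVisionBaseUrl and computerVisionApiKey to the endpoint URL and access key of your Computer Vision resource. Set indexName to the name of your video index. Set videoUrls to a list of SAS URLs of your videos.

Important

Remember to set a "max_tokens" value, or the return output will be cut off.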

response = client.chat.completions.create(
    model=deployment_name,
    messages=[
        { "role": "system", "content": "You are a helpful assistant." },
        { "role": "user", "content": [  
            {
                "type": "acv_document_id",
                "acv_document_id": "<your_video_ID>"
            },
            { 
                "type": "text", 
                "text": "Describe this video:" 
            }
        ] } 
    ],
    extra_body={
        "data_sources": [
            {
                "type": "AzureComputerVisionVideoIndex",
                "parameters": {
                    "computerVisionBaseUrl": "<your_computer_vision_endpoint>", # your endpoint should look like the following https://YOUR_RESOURCE_NAME.cognitiveservices.azure.com/computervision
                    "computerVisionApiKey": "<your_computer_vision_key>",
                    "indexName": "<name_of_your_index>",
                    "videoUrls": ["<your_video_SAS_URL>"]
                }
            }],
        "enhancements": {
            "video": {
                "enabled": True
            }
        }
    },
    max_tokens=100
)

print(response)

Important

The content of the "data_sources" object varies depending on which Azure resource type and authentication method you're using. See the following reference:

"data_sources": [
{
    "type": "AzureComputerVisionVideoIndex",
    "parameters": {
    "endpoint": "<your_computer_vision_endpoint>",
    "computerVisionApiKey": "<your_computer_vision_key>",
    "indexName": "<name_of_your_index>",
    "videoUrls": ["<your_video_SAS_URL>"]
    }
}],

"data_sources": [
{
    "type": "AzureComputerVisionVideoIndex",
    "parameters": {
    "indexName": "<name_of_your_index>",
    "videoUrls": ["<your_video_SAS_URL>"]
    }
}],

"data_sources": [
{
    "type": "AzureComputerVisionVideoIndex",
    "parameters": {
        "indexName": "<name_of_your_index>",
        "documentAuthenticationKind": "managedidentity"
    }
}],

Output

The chat responses you receive from the model should include information about the video. The API response should look like the following.

{
    "id": "chatcmpl-8V4J2cFo7TWO7rIfs47XuDzTKvbct",
    "object": "chat.completion",
    "created": 1702415412,
    "model": "gpt-4",
    "choices":
    [
        {
            "finish_reason":"stop",
            "index": 0,
            "message":
            {
                "role": "assistant",
                "content": "The advertisement video opens with a blurred background that suggests a serene and aesthetically pleasing environment, possibly a workspace with a nature view. As the video progresses, a series of frames showcase a digital interface with search bars and prompts like \"Inspire new ideas,\" \"Research a topic,\" and \"Organize my plans,\" suggesting features of a software or application designed to assist with productivity and creativity.\n\nThe color palette is soft and varied, featuring pastel blues, pinks, and purples, creating a calm and inviting atmosphere. The backgrounds of some frames are adorned with abstract, organically shaped elements and animations, adding to the sense of innovation and modernity.\n\nMidway through the video, the focus shifts to what appears to be a browser or software interface with the phrase \"Screens simulated, subject to change; feature availability and timing may vary,\" indicating the product is in development and that the visuals are illustrative of its capabilities.\n\nThe use of text prompts continues with \"Help me relax,\" followed by a demonstration of a 'dark mode' feature, providing a glimpse into the software's versatility and user-friendly design.\n\nThe video concludes by revealing the product name, \"Copilot,\" and positioning it as \"Your everyday AI companion,\" implying the use of artificial intelligence to enhance daily tasks. The final frames feature the Microsoft logo, associating the product with the well-known technology company.\n\nIn summary, the advertisement video is for a Microsoft product named \"Copilot,\" which seems to be an AI-powered software tool aimed at improving productivity, creativity, and organization for its users. The video conveys a message of innovation, ease, and support in daily digital interactions through a visually appealing and calming presentation."
            }
        }
    ],
    "usage":
    {
        "prompt_tokens": 2068,
        "completion_tokens": 341,
        "total_tokens": 2409
    }
}

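Every response includes a "finish_reason" field. It has the following possible values:

stop: API returned complete model output.
length: Incomplete model output due to the max_tokens input parameter or the model's token limit.
content_filter: Omitted content due to a flag from the content filters.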

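Pricing example for video prompts

The pricing for GPT-4 Turbo with Vision is dynamic and depends on the specific features and inputs used. For a comprehensive view of Azure OpenAI pricing, see Azure OpenAI Pricing.

The base charges and additional features are outlined below.

Base pricing for GPT-4 Turbo with Vision is:

Input: $0.01 per 1000 tokens
Output: $0.03 per 1000 tokens

Video prompt integration with the Video Retrieval add-on:

Ingestion: $0.05 per minute of video
Transactions: $0.25 per 1000 queries of the Video Retrieval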
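
As a worked illustration of how these rates combine, the sketch below totals the charges for one video prompt. The rates are the ones listed above; the scenario (a 3-minute video, one retrieval query, and the token counts from the sample video response) is a hypothetical example, and `estimate_video_prompt_cost` is an illustrative name. Actual billing may differ, so treat the result as an estimate.

```python
# Rates taken from the list above (USD).
INPUT_PER_1K_TOKENS = 0.01
OUTPUT_PER_1K_TOKENS = 0.03
INGESTION_PER_MINUTE = 0.05
RETRIEVAL_PER_1K_QUERIES = 0.25


def estimate_video_prompt_cost(prompt_tokens, completion_tokens,
                               video_minutes, retrieval_queries):
    """Sum the token, ingestion, and retrieval charges for a video prompt."""
    token_cost = (prompt_tokens / 1000 * INPUT_PER_1K_TOKENS
                  + completion_tokens / 1000 * OUTPUT_PER_1K_TOKENS)
    ingestion_cost = video_minutes * INGESTION_PER_MINUTE
    retrieval_cost = retrieval_queries / 1000 * RETRIEVAL_PER_1K_QUERIES
    return token_cost + ingestion_cost + retrieval_cost


# Hypothetical scenario: 3-minute video, 1 query, 2068 input and 341 output tokens.
cost = estimate_video_prompt_cost(2068, 341, video_minutes=3, retrieval_queries=1)
print(f"Estimated total: ${cost:.5f}")
```

In this scenario the one-time ingestion charge ($0.15) dominates, and the total comes to about $0.18. Later prompts against the same indexed video incur only the token and retrieval charges.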
