Text Merge cognitive skill

Article
10/26/2023

The Text Merge skill consolidates text from an array of strings into a single field.

Note

This skill isn't bound to Azure AI services. It is non-billable and has no Azure AI services key requirement.

@odata.type

Microsoft.Skills.Text.MergeSkill

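Skill parameters

Parameters are case-sensitive.

Parameter name	Description
`insertPreTag`	String to be included before every insertion. The default value is `" "`. To omit the space, set the value to `""`.
`insertPostTag`	String to be included after every insertion. The default value is `" "`. To omit the space, set the value to `""`.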

Skill inputs

Input name	Description
`itemsToInsert`	Array of strings to be merged.
`text`	(Optional) Main text body to be inserted into. If `text` isn't provided, the elements of `itemsToInsert` are concatenated.
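`offsets`	(Optional) Array of positions within `text` where the elements of `itemsToInsert` should be inserted. If provided, the number of elements in `offsets` must equal the number of elements in `itemsToInsert`. Otherwise, all items are appended at the end of `text`.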

Skill outputs

Output name	Description
`mergedText`	The resulting merged text.
`mergedOffsets`	Array of positions within `mergedText` where the elements of `itemsToInsert` were inserted.

Sample input

A JSON document that provides usable input for this skill could look like this:

{
  "values": [
    {
      "recordId": "1",
      "data":
      {
        "text": "The brown fox jumps over the dog",
        "itemsToInsert": ["quick", "lazy"],
        "offsets": [3, 28]
      }
    }
  ]
}

Sample output

This example shows the output for the previous input, assuming that insertPreTag is set to " " and insertPostTag is set to "".

{
  "values": [
    {
      "recordId": "1",
      "data":
      {
        "mergedText": "The quick brown fox jumps over the lazy dog"
      }
    }
  ]
}
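
The relationship between the sample input and output above can be reproduced locally with a minimal Python sketch. This is an illustrative approximation, not the service's implementation: the function name `merge_text` and its parameter names are hypothetical, and it assumes that offsets are character positions in the original `text` and that the pre and post tags are also applied in the fallback cases.

```python
from typing import List, Optional


def merge_text(
    items_to_insert: List[str],
    text: Optional[str] = None,
    offsets: Optional[List[int]] = None,
    insert_pre_tag: str = " ",
    insert_post_tag: str = " ",
) -> str:
    """Local approximation of the Text Merge skill (assumption, not service code)."""
    # Wrap every item in the configured pre and post tags.
    wrapped = [f"{insert_pre_tag}{item}{insert_post_tag}" for item in items_to_insert]
    base = text or ""

    # Fallback: offsets are missing or their count does not match the items,
    # so every item is appended at the end of the text.
    if offsets is None or len(offsets) != len(items_to_insert):
        return base + "".join(wrapped)

    # Walk the offsets in ascending order, copying the original text
    # between insertion points and adding each wrapped item in turn.
    parts: List[str] = []
    cursor = 0
    for offset, piece in sorted(zip(offsets, wrapped), key=lambda pair: pair[0]):
        parts.append(base[cursor:offset])
        parts.append(piece)
        cursor = offset
    parts.append(base[cursor:])
    return "".join(parts)


result = merge_text(
    ["quick", "lazy"],
    text="The brown fox jumps over the dog",
    offsets=[3, 28],
    insert_pre_tag=" ",
    insert_post_tag="",
)
print(result)  # The quick brown fox jumps over the lazy dog
```

Both offsets point at the space that precedes the next word, which is why a pre tag of " " combined with an empty post tag yields correctly spaced output.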

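Extended sample skillset definition

A common scenario for Text Merge is to merge the textual representation of images (text from an OCR skill, or the caption of an image) into the content field of a document.

The following example skillset uses the OCR skill to extract text from images embedded in the document. It then creates a merged_text field that contains both the original text and the OCRed text from each image. For more detail on the first step, see the OCR skill reference.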

{
  "description": "Extract text from images and merge with content text to produce merged_text",
  "skills":
  [
    {
      "description": "Extract text (plain and structured) from image.",
      "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
      "context": "/document/normalized_images/*",
      "defaultLanguageCode": "en",
      "detectOrientation": true,
      "inputs": [
        {
          "name": "image",
          "source": "/document/normalized_images/*"
        }
      ],
      "outputs": [
        {
          "name": "text"
        }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.MergeSkill",
      "description": "Create merged_text, which includes all the textual representation of each image inserted at the right location in the content field.",
      "context": "/document",
      "insertPreTag": " ",
      "insertPostTag": " ",
      "inputs": [
        {
          "name": "text",
          "source": "/document/content"
        },
        {
          "name": "itemsToInsert",
          "source": "/document/normalized_images/*/text"
        },
        {
          "name": "offsets",
          "source": "/document/normalized_images/*/contentOffset"
        }
      ],
      "outputs": [
        {
          "name": "mergedText",
          "targetName": "merged_text"
        }
      ]
      ]
    }
  ]
}
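
To make the source paths in the skillset above more concrete, the following Python sketch shows how the `*` wildcard gathers one value per image into the arrays that Text Merge receives. The in-memory document shape, the helper name `collect_merge_inputs`, and the sample values are assumptions for illustration only; the service resolves these paths against its own enrichment tree.

```python
from typing import Any, Dict, List, Tuple


def collect_merge_inputs(document: Dict[str, Any]) -> Tuple[str, List[str], List[int]]:
    """Gather the three Text Merge inputs from a hypothetical enriched document."""
    images = document.get("normalized_images", [])
    # "/document/content" maps to the single main text body.
    text = document.get("content", "")
    # "/document/normalized_images/*/text" yields one OCR string per image.
    items_to_insert = [image["text"] for image in images]
    # "/document/normalized_images/*/contentOffset" yields one offset per image.
    offsets = [image["contentOffset"] for image in images]
    return text, items_to_insert, offsets


# Hypothetical enriched document; the values are made up for illustration.
sample_document = {
    "content": "Intro. Conclusion.",
    "normalized_images": [
        {"text": "Figure one caption", "contentOffset": 6},
        {"text": "Figure two caption", "contentOffset": 18},
    ],
}

text, items, offsets = collect_merge_inputs(sample_document)
print(items)
print(offsets)
```

Because both arrays are built from the same list of images, they always have the same length, which is the condition the `offsets` input needs to be honored.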

The example above assumes that a normalized_images field exists. To get this field, set the imageAction configuration in your indexer definition to generateNormalizedImages, as shown below.

{
  //...rest of your indexer definition goes here ...
  "parameters": {
    "configuration": {
      "dataToExtract": "contentAndMetadata",
      "imageAction": "generateNormalizedImages"
    }
  }
}
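
Before attaching the skillset, it can be useful to confirm that an indexer definition carries this setting. The sketch below is a hypothetical local check, not part of any Azure SDK: the function name and the indexer name are made up, and it tests only for the single imageAction value named in this article.

```python
from typing import Any, Dict


def generates_normalized_images(indexer: Dict[str, Any]) -> bool:
    """Return True if the indexer definition sets imageAction as this article describes."""
    configuration = indexer.get("parameters", {}).get("configuration", {})
    return configuration.get("imageAction") == "generateNormalizedImages"


# Hypothetical indexer definition, loaded here as a plain dictionary.
indexer_definition = {
    "name": "example-indexer",  # hypothetical name
    "parameters": {
        "configuration": {
            "dataToExtract": "contentAndMetadata",
            "imageAction": "generateNormalizedImages",
        }
    },
}

print(generates_normalized_images(indexer_definition))  # True
```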