How to recognize intents with custom entity pattern matching

Article
02/26/2024

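The Azure AI services Speech SDK has a built-in feature that provides intent recognition through simple language pattern matching. An intent is something the user wants to do: close a window, mark a checkbox, insert some text, and so on.

In this guide, you use the Speech SDK to develop a console application that derives intents from speech utterances spoken through your device's microphone. You learn how to:

Create a Visual Studio project referencing the Speech SDK NuGet package
Create a speech configuration and get an intent recognizer
Add intents and patterns via the Speech SDK API
Add custom entities via the Speech SDK API
Use asynchronous, event-driven continuous recognition

When to use pattern matching

Use pattern matching if:

You're only interested in matching strictly what the user said. These patterns match more aggressively than conversational language understanding (CLU).
You don't have access to a CLU model, but still want intents.

For more information, see the pattern matching overview.

Prerequisites

Be sure you have the following items before you begin this guide:

An Azure AI services resource or a unified Speech resource
Visual Studio 2019 (any edition)

Create a project

Create a new C# console application project in Visual Studio 2019 and install the Speech SDK.

Start with some boilerplate code

Open Program.cs and add some code that works as a skeleton for the project.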

using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Intent;

namespace helloworld
{
    class Program
    {
        static void Main(string[] args)
        {
            IntentPatternMatchingWithMicrophoneAsync().Wait();
        }

        private static async Task IntentPatternMatchingWithMicrophoneAsync()
        {
            var config = SpeechConfig.FromSubscription("YOUR_SUBSCRIPTION_KEY", "YOUR_SUBSCRIPTION_REGION");
        }
    }
}

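Create a Speech configuration

Before you can initialize an IntentRecognizer object, you need to create a configuration that uses the key and Azure region of your Azure AI services prediction resource.

Replace "YOUR_SUBSCRIPTION_KEY" with your Azure AI services prediction key.
Replace "YOUR_SUBSCRIPTION_REGION" with your Azure AI services resource region.

This sample uses the FromSubscription() method to build the SpeechConfig. For a full list of available methods, see the SpeechConfig class.

Initialize an IntentRecognizer

Now create an IntentRecognizer. Insert this code right below your Speech configuration.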

using (var recognizer = new IntentRecognizer(config))
{
    
}

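Add some intents

You need to associate some patterns with a PatternMatchingModel and apply it to the IntentRecognizer. Start by creating a PatternMatchingModel and adding a few intents to it.

Note

A PatternMatchingIntent can have multiple patterns added to it.

Insert this code inside the using block: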

// Creates a Pattern Matching model and adds specific intents from your model. The
// Id is used to identify this model from others in the collection.
var model = new PatternMatchingModel("YourPatternMatchingModelId");

// Creates a pattern that uses groups of optional words. "[Go | Take me]" will match either "Go", "Take me", or "".
var patternWithOptionalWords = "[Go | Take me] to [floor|level] {floorName}";

// Creates a pattern that uses an optional entity and group that could be used to tie commands together.
var patternWithOptionalEntity = "Go to parking [{parkingLevel}]";

// You can also have multiple entities of the same name in a single pattern by appending a unique identifier
// to distinguish between the instances. For example:
var patternWithTwoOfTheSameEntity = "Go to floor {floorName:1} [and then go to floor {floorName:2}]";
// NOTE: Both floorName:1 and floorName:2 are tied to the same list of entries. The identifier can be a string
//       and is separated from the entity name by a ':'

// Creates the pattern matching intents and adds them to the model
model.Intents.Add(new PatternMatchingIntent("ChangeFloors", patternWithOptionalWords, patternWithOptionalEntity, patternWithTwoOfTheSameEntity));
model.Intents.Add(new PatternMatchingIntent("DoorControl", "{action} the doors", "{action} doors", "{action} the door", "{action} door"));

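Add some custom entities

To take full advantage of the pattern matcher, you can customize your entities. Here, "floorName" becomes a list of the available floors, and "parkingLevel" becomes an integer entity.

Insert this code below your intents: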

// Creates the "floorName" entity and sets it to type list.
// Adds acceptable values. NOTE the default entity type is Any and so we do not need
// to declare the "action" entity.
model.Entities.Add(PatternMatchingEntity.CreateListEntity("floorName", EntityMatchMode.Strict, "ground floor", "lobby", "1st", "first", "one", "1", "2nd", "second", "two", "2"));

// Creates the "parkingLevel" entity as a pre-built integer
model.Entities.Add(PatternMatchingEntity.CreateIntegerEntity("parkingLevel"));

Apply the model to the recognizer

Now apply the model to the IntentRecognizer. Multiple models can be used at once, so the API takes a collection of models.

Insert this code below your entities:

var modelCollection = new LanguageUnderstandingModelCollection();
modelCollection.Add(model);

recognizer.ApplyLanguageModels(modelCollection);

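Recognize an intent

From the IntentRecognizer object, call the RecognizeOnceAsync() method. This method asks the Speech service to recognize speech in a single phrase and to stop recognizing speech once the phrase is identified.

Insert this code after applying the language models: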

Console.WriteLine("Say something...");

var result = await recognizer.RecognizeOnceAsync();

Display the recognition results (or errors)

When the Speech service returns the recognition result, print the result.

Insert this code below var result = await recognizer.RecognizeOnceAsync();:

if (result.Reason == ResultReason.RecognizedIntent)
{
    Console.WriteLine($"RECOGNIZED: Text={result.Text}");
    Console.WriteLine($"       Intent Id={result.IntentId}.");

    var entities = result.Entities;
    switch (result.IntentId)
    {
        case "ChangeFloors":
            if (entities.TryGetValue("floorName", out string floorName))
            {
                Console.WriteLine($"       FloorName={floorName}");
            }

            if (entities.TryGetValue("floorName:1", out floorName))
            {
                Console.WriteLine($"     FloorName:1={floorName}");
            }

            if (entities.TryGetValue("floorName:2", out floorName))
            {
                Console.WriteLine($"     FloorName:2={floorName}");
            }

            if (entities.TryGetValue("parkingLevel", out string parkingLevel))
            {
                Console.WriteLine($"    ParkingLevel={parkingLevel}");
            }

            break;

        case "DoorControl":
            if (entities.TryGetValue("action", out string action))
            {
                Console.WriteLine($"          Action={action}");
            }
            break;
    }
}
else if (result.Reason == ResultReason.RecognizedSpeech)
{
    Console.WriteLine($"RECOGNIZED: Text={result.Text}");
    Console.WriteLine($"    Intent not recognized.");
}
else if (result.Reason == ResultReason.NoMatch)
{
    Console.WriteLine($"NOMATCH: Speech could not be recognized.");
}
else if (result.Reason == ResultReason.Canceled)
{
    var cancellation = CancellationDetails.FromResult(result);
    Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");

    if (cancellation.Reason == CancellationReason.Error)
    {
        Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
        Console.WriteLine($"CANCELED: ErrorDetails={cancellation.ErrorDetails}");
        Console.WriteLine($"CANCELED: Did you set the speech resource key and region values?");
    }
}

Check your code

At this point, your code should look like this:

using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Intent;

namespace helloworld
{
    class Program
    {
        static void Main(string[] args)
        {
            IntentPatternMatchingWithMicrophoneAsync().Wait();
        }

        private static async Task IntentPatternMatchingWithMicrophoneAsync()
        {
            var config = SpeechConfig.FromSubscription("YOUR_SUBSCRIPTION_KEY", "YOUR_SUBSCRIPTION_REGION");

            using (var recognizer = new IntentRecognizer(config))
            {
                // Creates a Pattern Matching model and adds specific intents from your model. The
                // Id is used to identify this model from others in the collection.
                var model = new PatternMatchingModel("YourPatternMatchingModelId");

                // Creates a pattern that uses groups of optional words. "[Go | Take me]" will match either "Go", "Take me", or "".
                var patternWithOptionalWords = "[Go | Take me] to [floor|level] {floorName}";

                // Creates a pattern that uses an optional entity and group that could be used to tie commands together.
                var patternWithOptionalEntity = "Go to parking [{parkingLevel}]";

                // You can also have multiple entities of the same name in a single pattern by appending a unique identifier
                // to distinguish between the instances. For example:
                var patternWithTwoOfTheSameEntity = "Go to floor {floorName:1} [and then go to floor {floorName:2}]";
                // NOTE: Both floorName:1 and floorName:2 are tied to the same list of entries. The identifier can be a string
                //       and is separated from the entity name by a ':'

                // Adds some intents to look for specific patterns.
                model.Intents.Add(new PatternMatchingIntent("ChangeFloors", patternWithOptionalWords, patternWithOptionalEntity, patternWithTwoOfTheSameEntity));
                model.Intents.Add(new PatternMatchingIntent("DoorControl", "{action} the doors", "{action} doors", "{action} the door", "{action} door"));

                // Creates the "floorName" entity and sets it to type list.
                // Adds acceptable values. NOTE the default entity type is Any and so we do not need
                // to declare the "action" entity.
                model.Entities.Add(PatternMatchingEntity.CreateListEntity("floorName", EntityMatchMode.Strict, "ground floor", "lobby", "1st", "first", "one", "1", "2nd", "second", "two", "2"));

                // Creates the "parkingLevel" entity as a pre-built integer
                model.Entities.Add(PatternMatchingEntity.CreateIntegerEntity("parkingLevel"));

                var modelCollection = new LanguageUnderstandingModelCollection();
                modelCollection.Add(model);

                recognizer.ApplyLanguageModels(modelCollection);

                Console.WriteLine("Say something...");

                var result = await recognizer.RecognizeOnceAsync();

                if (result.Reason == ResultReason.RecognizedIntent)
                {
                    Console.WriteLine($"RECOGNIZED: Text={result.Text}");
                    Console.WriteLine($"       Intent Id={result.IntentId}.");

                    var entities = result.Entities;
                    switch (result.IntentId)
                    {
                        case "ChangeFloors":
                            if (entities.TryGetValue("floorName", out string floorName))
                            {
                                Console.WriteLine($"       FloorName={floorName}");
                            }

                            if (entities.TryGetValue("floorName:1", out floorName))
                            {
                                Console.WriteLine($"     FloorName:1={floorName}");
                            }

                            if (entities.TryGetValue("floorName:2", out floorName))
                            {
                                Console.WriteLine($"     FloorName:2={floorName}");
                            }

                            if (entities.TryGetValue("parkingLevel", out string parkingLevel))
                            {
                                Console.WriteLine($"    ParkingLevel={parkingLevel}");
                            }

                            break;

                        case "DoorControl":
                            if (entities.TryGetValue("action", out string action))
                            {
                                Console.WriteLine($"          Action={action}");
                            }
                            break;
                    }
                }
                else if (result.Reason == ResultReason.RecognizedSpeech)
                {
                    Console.WriteLine($"RECOGNIZED: Text={result.Text}");
                    Console.WriteLine($"    Intent not recognized.");
                }
                else if (result.Reason == ResultReason.NoMatch)
                {
                    Console.WriteLine($"NOMATCH: Speech could not be recognized.");
                }
                else if (result.Reason == ResultReason.Canceled)
                {
                    var cancellation = CancellationDetails.FromResult(result);
                    Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");

                    if (cancellation.Reason == CancellationReason.Error)
                    {
                        Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
                        Console.WriteLine($"CANCELED: ErrorDetails={cancellation.ErrorDetails}");
                        Console.WriteLine($"CANCELED: Did you set the speech resource key and region values?");
                    }
                }
            }
        }
    }
}

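Build and run your app

Now you're ready to build your app and test speech recognition by using the Speech service.

Compile the code - From the Visual Studio menu bar, select Build > Build Solution.
Start your app - From the menu bar, select Debug > Start Debugging, or press F5.
Start recognition - The app prompts you to say something. The default language is English. Your speech is sent to the Speech service, transcribed as text, and rendered in the console.

For example, if you say "Take me to floor 2", the output looks like this: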

Say something...
RECOGNIZED: Text=Take me to floor 2.
       Intent Id=ChangeFloors.
       FloorName=2

As another example, if you say "Take me to floor 7", the output looks like this:

Say something...
RECOGNIZED: Text=Take me to floor 7.
    Intent not recognized.

No intent was recognized because 7 isn't in the list of valid values for floorName.

Create a project

Create a new C++ console application project in Visual Studio 2019 and install the Speech SDK.

Start with some boilerplate code

Open helloworld.cpp and add some code that works as a skeleton for the project.

#include <iostream>
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;
using namespace Microsoft::CognitiveServices::Speech::Intent;

int main()
{
    std::cout << "Hello World!\n";

    auto config = SpeechConfig::FromSubscription("YOUR_SUBSCRIPTION_KEY", "YOUR_SUBSCRIPTION_REGION");
}

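Create a Speech configuration

Before you can initialize an IntentRecognizer object, you need to create a configuration that uses the key and Azure region of your Azure AI services prediction resource.

Replace "YOUR_SUBSCRIPTION_KEY" with your Azure AI services prediction key.
Replace "YOUR_SUBSCRIPTION_REGION" with your Azure AI services resource region.

This sample uses the FromSubscription() method to build the SpeechConfig. For a full list of available methods, see the SpeechConfig class.

Initialize an IntentRecognizer

Now create an IntentRecognizer. Insert this code right below your Speech configuration.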

    auto intentRecognizer = IntentRecognizer::FromConfig(config);

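Add some intents

You need to associate some patterns with a PatternMatchingModel and apply it to the IntentRecognizer. Start by creating a PatternMatchingModel and adding a few intents to it. A PatternMatchingIntent is a struct, so the sample uses the inline (brace-initialization) syntax.

Note

A PatternMatchingIntent can have multiple patterns added to it.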

auto model = PatternMatchingModel::FromId("myNewModel");

model->Intents.push_back({"Take me to floor {floorName}.", "Go to floor {floorName}."} , "ChangeFloors");
model->Intents.push_back({"{action} the door."}, "OpenCloseDoor");

Add some custom entities

To take full advantage of the pattern matcher, you can customize your entities. Here, "floorName" becomes a list of the available floors.

model->Entities.push_back({ "floorName" , Intent::EntityType::List, Intent::EntityMatchMode::Strict, {"one", "1", "two", "2", "lobby", "ground floor"} });

Apply the model to the recognizer

Now apply the model to the IntentRecognizer. Multiple models can be used at once, so the API takes a collection of models.

std::vector<std::shared_ptr<LanguageUnderstandingModel>> collection;

collection.push_back(model);
intentRecognizer->ApplyLanguageModels(collection);

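Recognize an intent

From the IntentRecognizer object, call the RecognizeOnceAsync() method. This method asks the Speech service to recognize speech in a single phrase and to stop recognizing speech once the phrase is identified. For simplicity, the sample waits for the returned future to complete.

Insert this code below your intents: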

std::cout << "Say something ..." << std::endl;
auto result = intentRecognizer->RecognizeOnceAsync().get();

Display the recognition results (or errors)

When the Speech service returns the recognition result, print the result.

Insert this code below auto result = intentRecognizer->RecognizeOnceAsync().get();:

switch (result->Reason)
{
case ResultReason::RecognizedSpeech:
    std::cout << "RECOGNIZED: Text = " << result->Text.c_str() << std::endl;
    std::cout << "NO INTENT RECOGNIZED!" << std::endl;
    break;
case ResultReason::RecognizedIntent:
{
    // The braces scope "entities" so that later case labels don't jump past its initialization.
    std::cout << "RECOGNIZED: Text = " << result->Text.c_str() << std::endl;
    std::cout << "  Intent Id = " << result->IntentId.c_str() << std::endl;
    auto entities = result->GetEntities();
    if (entities.find("floorName") != entities.end())
    {
        std::cout << "  Floor name: = " << entities["floorName"].c_str() << std::endl;
    }

    if (entities.find("action") != entities.end())
    {
        std::cout << "  Action: = " << entities["action"].c_str() << std::endl;
    }

    break;
}
case ResultReason::NoMatch:
{
    auto noMatch = NoMatchDetails::FromResult(result);
    switch (noMatch->Reason)
    {
    case NoMatchReason::NotRecognized:
        std::cout << "NOMATCH: Speech was detected, but not recognized." << std::endl;
        break;
    case NoMatchReason::InitialSilenceTimeout:
        std::cout << "NOMATCH: The start of the audio stream contains only silence, and the service timed out waiting for speech." << std::endl;
        break;
    case NoMatchReason::InitialBabbleTimeout:
        std::cout << "NOMATCH: The start of the audio stream contains only noise, and the service timed out waiting for speech." << std::endl;
        break;
    case NoMatchReason::KeywordNotRecognized:
        std::cout << "NOMATCH: Keyword not recognized" << std::endl;
        break;
    }
    break;
}
case ResultReason::Canceled:
{
    auto cancellation = CancellationDetails::FromResult(result);

    if (!cancellation->ErrorDetails.empty())
    {
        std::cout << "CANCELED: ErrorDetails=" << cancellation->ErrorDetails.c_str() << std::endl;
        std::cout << "CANCELED: Did you set the speech resource key and region values?" << std::endl;
    }
    break;
}
default:
    break;
}

Check your code

At this point, your code should look like this:

#include <iostream>
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;
using namespace Microsoft::CognitiveServices::Speech::Intent;

int main()
{
    auto config = SpeechConfig::FromSubscription("YOUR_SUBSCRIPTION_KEY", "YOUR_SUBSCRIPTION_REGION");
    auto intentRecognizer = IntentRecognizer::FromConfig(config);

    auto model = PatternMatchingModel::FromId("myNewModel");

    model->Intents.push_back({"Take me to floor {floorName}.", "Go to floor {floorName}."} , "ChangeFloors");
    model->Intents.push_back({"{action} the door."}, "OpenCloseDoor");

    model->Entities.push_back({ "floorName" , Intent::EntityType::List, Intent::EntityMatchMode::Strict, {"one", "1", "two", "2", "lobby", "ground floor"} });

    std::vector<std::shared_ptr<LanguageUnderstandingModel>> collection;

    collection.push_back(model);
    intentRecognizer->ApplyLanguageModels(collection);

    std::cout << "Say something ..." << std::endl;

    auto result = intentRecognizer->RecognizeOnceAsync().get();

    switch (result->Reason)
    {
    case ResultReason::RecognizedSpeech:
        std::cout << "RECOGNIZED: Text = " << result->Text.c_str() << std::endl;
        std::cout << "NO INTENT RECOGNIZED!" << std::endl;
        break;
    case ResultReason::RecognizedIntent:
    {
        // The braces scope "entities" so that later case labels don't jump past its initialization.
        std::cout << "RECOGNIZED: Text = " << result->Text.c_str() << std::endl;
        std::cout << "  Intent Id = " << result->IntentId.c_str() << std::endl;
        auto entities = result->GetEntities();
        if (entities.find("floorName") != entities.end())
        {
            std::cout << "  Floor name: = " << entities["floorName"].c_str() << std::endl;
        }

        if (entities.find("action") != entities.end())
        {
            std::cout << "  Action: = " << entities["action"].c_str() << std::endl;
        }

        break;
    }
    case ResultReason::NoMatch:
    {
        auto noMatch = NoMatchDetails::FromResult(result);
        switch (noMatch->Reason)
        {
        case NoMatchReason::NotRecognized:
            std::cout << "NOMATCH: Speech was detected, but not recognized." << std::endl;
            break;
        case NoMatchReason::InitialSilenceTimeout:
            std::cout << "NOMATCH: The start of the audio stream contains only silence, and the service timed out waiting for speech." << std::endl;
            break;
        case NoMatchReason::InitialBabbleTimeout:
            std::cout << "NOMATCH: The start of the audio stream contains only noise, and the service timed out waiting for speech." << std::endl;
            break;
        case NoMatchReason::KeywordNotRecognized:
            std::cout << "NOMATCH: Keyword not recognized." << std::endl;
            break;
        }
        break;
    }
    case ResultReason::Canceled:
    {
        auto cancellation = CancellationDetails::FromResult(result);

        if (!cancellation->ErrorDetails.empty())
        {
            std::cout << "CANCELED: ErrorDetails=" << cancellation->ErrorDetails.c_str() << std::endl;
            std::cout << "CANCELED: Did you set the speech resource key and region values?" << std::endl;
        }
        break;
    }
    default:
        break;
    }
}

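Build and run your app

Now you're ready to build your app and test speech recognition by using the Speech service.

Compile the code - From the Visual Studio menu bar, select Build > Build Solution.
Start your app - From the menu bar, select Debug > Start Debugging, or press F5.
Start recognition - The app prompts you to say something. The default language is English. Your speech is sent to the Speech service, transcribed as text, and rendered in the console.

For example, if you say "Take me to floor 2", the output looks like this: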

Say something ...
RECOGNIZED: Text = Take me to floor 2.
  Intent Id = ChangeFloors
  Floor name: = 2

As another example, if you say "Take me to floor 7", the output looks like this:

Say something ...
RECOGNIZED: Text = Take me to floor 7.
NO INTENT RECOGNIZED!

The intent ID is empty because 7 isn't in the list.

Reference documentation | Additional samples on GitHub

In this quickstart, you install the Speech SDK for Java.

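Platform requirements

Choose your target environment:

Java Runtime
Android

The Speech SDK for Java is compatible with Windows, Linux, and macOS.

On Windows, you must use the 64-bit target architecture. Windows 10 or later is required.

Install the Microsoft Visual C++ Redistributable for Visual Studio 2015, 2017, 2019, and 2022 for your platform. Installing this package for the first time might require a restart.

The Speech SDK for Java doesn't support Windows on ARM64.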

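Caution

This article references CentOS, a Linux distribution that is nearing End Of Life (EOL) status. Consider your use and plan accordingly. For more information, see the CentOS End Of Life guidance.

The Speech SDK for Java supports the following distributions on the x64, ARM32 (Debian/Ubuntu), and ARM64 (Debian/Ubuntu) architectures: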

Ubuntu 18.04/20.04
Debian 10/11
RHEL(Red Hat Enterprise Linux) 7/8
CentOS 7

Important

Use the most recent LTS release of the Linux distribution. For example, if you use Ubuntu 20.04 LTS, use the latest release of Ubuntu 20.04.X.

The Speech SDK depends on the following Linux system libraries:

The shared libraries of the GNU C library, including the POSIX Threads Programming library, libpthreads.
The OpenSSL library (libssl) version 1.x and certificates (ca-certificates).
The shared library for ALSA applications (libasound).

To establish secure websockets and avoid the WS_OPEN_ERROR_UNDERLYING_IO_OPEN_FAILED error, ca-certificates must also be installed.

Important

The Speech SDK doesn't yet support OpenSSL 3.0, which is the default in Ubuntu 22.04 and Debian 12.

Run these commands:

sudo apt-get update
sudo apt-get install build-essential libssl-dev ca-certificates libasound2 wget

To use the Speech SDK in Alpine Linux, create a Debian chroot environment as documented in the Alpine Linux Wiki on running glibc programs. Then follow the Debian instructions here:

sudo apt-get update
sudo apt-get install build-essential libssl-dev ca-certificates libasound2 wget

Install the development tools and libraries:

sudo yum update
sudo yum groupinstall "Development tools"
sudo yum install alsa-lib openssl wget

Important

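On RHEL/CentOS 7, follow the instructions on how to configure RHEL/CentOS 7 for the Speech SDK.
On RHEL, follow the instructions on how to configure OpenSSL for Linux.

You must install a Java Development Kit such as Azul Zulu OpenJDK. The Microsoft Build of OpenJDK or your preferred JDK should also work.

Install the Speech SDK for Java

Some of the instructions use a specific SDK version such as 1.24.2. To check the latest version, search the GitHub repository.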

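This guide shows how to install the Speech SDK for Java on the Java Runtime.

Supported operating systems

The Speech SDK for Java package is available for these operating systems:

Windows: 64-bit only.
Mac: macOS X version 10.14 or later.
Linux: See the supported Linux distributions and target architectures.

Follow these steps to install the Speech SDK for Java by using Apache Maven:

Install Apache Maven.
Open a command prompt where you want the new project, and create a new pom.xml file.

Copy the following XML content into pom.xml: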

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.microsoft.cognitiveservices.speech.samples</groupId>
    <artifactId>quickstart-eclipse</artifactId>
    <version>1.0.0-SNAPSHOT</version>
    <build>
        <sourceDirectory>src</sourceDirectory>
        <plugins>
        <plugin>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.7.0</version>
            <configuration>
            <source>1.8</source>
            <target>1.8</target>
            </configuration>
        </plugin>
        </plugins>
    </build>
    <dependencies>
        <dependency>
        <groupId>com.microsoft.cognitiveservices.speech</groupId>
        <artifactId>client-sdk</artifactId>
        <version>1.37.0</version>
        </dependency>
    </dependencies>
</project>

Run the following Maven command to install the Speech SDK and its dependencies.
```
mvn clean dependency:copy-dependencies
```

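Create an Eclipse project and install the Speech SDK

Install the Eclipse Java IDE. This IDE requires Java to be already installed.
Start Eclipse.
In the Eclipse Launcher, enter the name of a new workspace directory in the Workspace box. Then select Launch.
After a moment, the main window of the Eclipse IDE appears. If a Welcome screen is displayed, close it.
From the Eclipse menu, select File > New > Project.
The New Project dialog appears. Select Java Project, and then select Next.
The New Java Project wizard starts. In the Project name field, enter quickstart. Choose JavaSE-1.8 as the execution environment. Select Finish.
If a window titled Open Associated Perspective? appears, select Open Perspective.
In the Package Explorer, right-click the quickstart project. From the shortcut menu, select Configure > Convert to Maven Project.
The Create new POM window appears. In the Group Id field, enter com.microsoft.cognitiveservices.speech.samples. In the Artifact Id field, enter quickstart. Then select Finish.
Open the pom.xml file and edit it.
1. At the end of the file, before the closing tag </project>, add a dependencies element with the Speech SDK as a dependency: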
```
<dependencies>
  <dependency>
    <groupId>com.microsoft.cognitiveservices.speech</groupId>
    <artifactId>client-sdk</artifactId>
    <version>1.37.0</version>
  </dependency>
</dependencies>
```
1. Save the changes.

Gradle configurations

Gradle configurations require an explicit reference to the .jar dependency extension:

// build.gradle

dependencies {
    implementation group: 'com.microsoft.cognitiveservices.speech', name: 'client-sdk', version: "1.37.0", ext: "jar"
}

Start with some boilerplate code

Open Main.java from the src directory.
Replace the contents of the file with the following:

import java.util.ArrayList;
import java.util.Dictionary;
import java.util.concurrent.ExecutionException;


import com.microsoft.cognitiveservices.speech.*;
import com.microsoft.cognitiveservices.speech.intent.*;

public class Main {
    public static void main(String[] args) throws InterruptedException, ExecutionException {
        IntentPatternMatchingWithMicrophone();
    }

    public static void IntentPatternMatchingWithMicrophone() throws InterruptedException, ExecutionException {
        SpeechConfig config = SpeechConfig.fromSubscription("YOUR_SUBSCRIPTION_KEY", "YOUR_SUBSCRIPTION_REGION");
    }
}

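Create a Speech configuration

Before you can initialize an IntentRecognizer object, you need to create a configuration that uses the key and Azure region of your Azure AI services prediction resource.

Replace "YOUR_SUBSCRIPTION_KEY" with your Azure AI services prediction key.
Replace "YOUR_SUBSCRIPTION_REGION" with your Azure AI services resource region.

This sample uses the fromSubscription() method to build the SpeechConfig. For a full list of available methods, see the SpeechConfig class.

Initialize an IntentRecognizer

Now create an IntentRecognizer. Insert this code right below your Speech configuration. The recognizer is created in a try-with-resources statement to take advantage of the AutoCloseable interface.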

try (IntentRecognizer recognizer = new IntentRecognizer(config)) {

}

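Add some intents

You need to associate some patterns with a PatternMatchingModel and apply it to the IntentRecognizer. Start by creating a PatternMatchingModel and adding a few intents to it.

Note

A PatternMatchingIntent can have multiple patterns added to it.

Insert this code inside the try block: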

// Creates a Pattern Matching model and adds specific intents from your model. The
// Id is used to identify this model from others in the collection.
PatternMatchingModel model = new PatternMatchingModel("YourPatternMatchingModelId");

// Creates a pattern that uses groups of optional words. "[Go | Take me]" will match either "Go", "Take me", or "".
String patternWithOptionalWords = "[Go | Take me] to [floor|level] {floorName}";

// Creates a pattern that uses an optional entity and group that could be used to tie commands together.
String patternWithOptionalEntity = "Go to parking [{parkingLevel}]";

// You can also have multiple entities of the same name in a single pattern by appending a unique identifier
// to distinguish between the instances. For example:
String patternWithTwoOfTheSameEntity = "Go to floor {floorName:1} [and then go to floor {floorName:2}]";
// NOTE: Both floorName:1 and floorName:2 are tied to the same list of entries. The identifier can be a string
//       and is separated from the entity name by a ':'

// Creates the pattern matching intents and adds them to the model
model.getIntents().put(new PatternMatchingIntent("ChangeFloors", patternWithOptionalWords, patternWithOptionalEntity, patternWithTwoOfTheSameEntity));
model.getIntents().put(new PatternMatchingIntent("DoorControl", "{action} the doors", "{action} doors", "{action} the door", "{action} door"));
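
The pattern comments above say that an optional group such as "[Go | Take me]" matches "Go", "Take me", or nothing at all. As a rough illustration of what that means for a whole pattern, the following minimal sketch enumerates the phrase shapes a pattern with optional groups can cover. It is not how the Speech SDK matches patterns internally; the class OptionalGroupExpander and its expand method are hypothetical names, and the sketch assumes groups are not nested.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only; it is not part of the Speech SDK.
final class OptionalGroupExpander {

    // Expands each non-nested "[a | b]" group into its alternatives plus the "skipped" choice.
    static List<String> expand(String pattern) {
        List<String> variants = new ArrayList<>();
        variants.add("");
        int pos = 0;
        while (pos < pattern.length()) {
            int open = pattern.indexOf('[', pos);
            if (open < 0) {
                // No more groups: append the remaining literal text.
                variants = combine(variants, Collections.singletonList(pattern.substring(pos)));
                break;
            }
            int close = pattern.indexOf(']', open);
            if (close < 0) {
                throw new IllegalArgumentException("Unclosed '[' in pattern: " + pattern);
            }
            // Literal text in front of the group.
            variants = combine(variants, Collections.singletonList(pattern.substring(pos, open)));
            // The alternatives inside the group, plus the empty choice.
            List<String> choices = new ArrayList<>();
            for (String alternative : pattern.substring(open + 1, close).split("\\|")) {
                choices.add(alternative.trim());
            }
            choices.add("");
            variants = combine(variants, choices);
            pos = close + 1;
        }
        // Collapse the extra spaces left behind by skipped groups and drop duplicates.
        List<String> normalized = new ArrayList<>();
        for (String variant : variants) {
            String cleaned = variant.trim().replaceAll("\\s+", " ");
            if (!normalized.contains(cleaned)) {
                normalized.add(cleaned);
            }
        }
        return normalized;
    }

    // Appends every suffix to every prefix.
    private static List<String> combine(List<String> prefixes, List<String> suffixes) {
        List<String> out = new ArrayList<>();
        for (String prefix : prefixes) {
            for (String suffix : suffixes) {
                out.add(prefix + suffix);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String variant : expand("[Go | Take me] to [floor|level] {floorName}")) {
            System.out.println(variant);
        }
    }
}
```

Under these assumptions, the first pattern expands to nine phrase shapes, from "Go to floor {floorName}" down to "to {floorName}". Entity placeholders in braces are left untouched, since resolving them is a separate step.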

Add some custom entities

Insert this code below your intents:

// Creates the "floorName" entity and sets it to type list.
// Adds acceptable values. NOTE the default entity type is Any and so we do not need
// to declare the "action" entity.
model.getEntities().put(PatternMatchingEntity.CreateListEntity("floorName", PatternMatchingEntity.EntityMatchMode.Strict, "ground floor", "lobby", "1st", "first", "one", "1", "2nd", "second", "two", "2"));

// Creates the "parkingLevel" entity as a pre-built integer
model.getEntities().put(PatternMatchingEntity.CreateIntegerEntity("parkingLevel"));
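
To make the effect of a list entity in Strict mode more concrete, here is a minimal sketch of the idea: a captured value is accepted only when it appears in the declared list. The class StrictListEntitySketch is a hypothetical stand-in, not an SDK type, and the case-insensitive, whitespace-trimmed comparison is an assumption of this sketch rather than documented SDK behavior.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for a list entity in Strict mode; not an SDK type.
final class StrictListEntitySketch {
    private final Set<String> allowedValues = new HashSet<>();

    StrictListEntitySketch(String... phrases) {
        for (String phrase : phrases) {
            allowedValues.add(normalize(phrase));
        }
    }

    // Accepts a captured value only if it is one of the listed phrases.
    boolean accepts(String spokenValue) {
        return allowedValues.contains(normalize(spokenValue));
    }

    // Assumption of this sketch: compare ignoring case and surrounding whitespace.
    private static String normalize(String text) {
        return text.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        StrictListEntitySketch floorName = new StrictListEntitySketch(
                "ground floor", "lobby", "1st", "first", "one", "1", "2nd", "second", "two", "2");
        System.out.println("2 accepted: " + floorName.accepts("2"));
        System.out.println("7 accepted: " + floorName.accepts("7"));
    }
}
```

With the floor list from this guide, "2" is accepted and "7" is rejected. An entity left at the default type Any, such as "action", has no such list and takes whatever text fills the slot.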

Apply the model to the recognizer

Now apply the model to the IntentRecognizer. Multiple models can be used at once, so the API takes a collection of models.

Insert this code below your entities:

ArrayList<LanguageUnderstandingModel> modelCollection = new ArrayList<LanguageUnderstandingModel>();
modelCollection.add(model);

recognizer.applyLanguageModels(modelCollection);

Recognize an intent

Insert this code after applying the language models:

System.out.println("Say something...");

IntentRecognitionResult result = recognizer.recognizeOnceAsync().get();

Display the recognition results (or errors)

When the Speech service returns the recognition result, print the result.

Insert this code below IntentRecognitionResult result = recognizer.recognizeOnceAsync().get();:

if (result.getReason() == ResultReason.RecognizedSpeech) {
    System.out.println("RECOGNIZED: Text= " + result.getText());
    System.out.println(String.format("%17s", "Intent not recognized."));
}
else if (result.getReason() == ResultReason.RecognizedIntent)
{
    System.out.println("RECOGNIZED: Text= " + result.getText());
    System.out.println(String.format("%17s %s", "Intent Id=", result.getIntentId() + "."));
    Dictionary<String, String> entities = result.getEntities();

    switch (result.getIntentId())
    {
        case "ChangeFloors":
            if (entities.get("floorName") != null) {
                System.out.println(String.format("%17s %s", "FloorName=", entities.get("floorName")));
            }
            if (entities.get("floorName:1") != null) {
                System.out.println(String.format("%17s %s", "FloorName:1=", entities.get("floorName:1")));
            }
            if (entities.get("floorName:2") != null) {
                System.out.println(String.format("%17s %s", "FloorName:2=", entities.get("floorName:2")));
            }
            if (entities.get("parkingLevel") != null) {
                System.out.println(String.format("%17s %s", "ParkingLevel=", entities.get("parkingLevel")));
            }
            break;
        case "DoorControl":
            if (entities.get("action") != null) {
                System.out.println(String.format("%17s %s", "Action=", entities.get("action")));
            }
            break;
    }
}
else if (result.getReason() == ResultReason.NoMatch) {
    System.out.println("NOMATCH: Speech could not be recognized.");
}
else if (result.getReason() == ResultReason.Canceled) {
    CancellationDetails cancellation = CancellationDetails.fromResult(result);
    System.out.println("CANCELED: Reason=" + cancellation.getReason());

    if (cancellation.getReason() == CancellationReason.Error)
    {
        System.out.println("CANCELED: ErrorCode=" + cancellation.getErrorCode());
        System.out.println("CANCELED: ErrorDetails=" + cancellation.getErrorDetails());
        System.out.println("CANCELED: Did you update the subscription info?");
    }
}
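
The result handling above looks up suffixed keys such as "floorName:1" and "floorName:2" one at a time. If a pattern uses many instances of the same entity, it might be more convenient to collect them by base name. The following is a minimal sketch of that idea, assuming the entities were first copied into a java.util.Map (the sample itself reads them from a Dictionary); EntityKeySketch, baseName, and valuesFor are hypothetical names, not SDK APIs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; it is not part of the Speech SDK.
final class EntityKeySketch {

    // Returns the entity name without the optional ":identifier" suffix.
    static String baseName(String key) {
        int colon = key.indexOf(':');
        return colon < 0 ? key : key.substring(0, colon);
    }

    // Collects the values of every instance of one entity, in map iteration order.
    static List<String> valuesFor(Map<String, String> entities, String entityName) {
        List<String> values = new ArrayList<>();
        for (Map.Entry<String, String> entry : entities.entrySet()) {
            if (baseName(entry.getKey()).equals(entityName)) {
                values.add(entry.getValue());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Sample data standing in for a recognition result's entities.
        Map<String, String> entities = new LinkedHashMap<>();
        entities.put("floorName:1", "lobby");
        entities.put("floorName:2", "2");
        System.out.println(valuesFor(entities, "floorName")); // prints [lobby, 2]
    }
}
```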

Check your code

At this point, your code should look like this:

package quickstart;
import java.util.ArrayList;
import java.util.concurrent.ExecutionException;
import java.util.Dictionary;

import com.microsoft.cognitiveservices.speech.*;
import com.microsoft.cognitiveservices.speech.intent.*;

public class Main {
    public static void main(String[] args) throws InterruptedException, ExecutionException {
        IntentPatternMatchingWithMicrophone();
    }

    public static void IntentPatternMatchingWithMicrophone() throws InterruptedException, ExecutionException {
        SpeechConfig config = SpeechConfig.fromSubscription("YOUR_SUBSCRIPTION_KEY", "YOUR_SUBSCRIPTION_REGION");
        try (IntentRecognizer recognizer = new IntentRecognizer(config)) {
            // Creates a Pattern Matching model and adds specific intents from your model. The
            // Id is used to identify this model from others in the collection.
            PatternMatchingModel model = new PatternMatchingModel("YourPatternMatchingModelId");

            // Creates a pattern that uses groups of optional words. "[Go | Take me]" will match either "Go", "Take me", or "".
            String patternWithOptionalWords = "[Go | Take me] to [floor|level] {floorName}";

            // Creates a pattern that uses an optional entity and group that could be used to tie commands together.
            String patternWithOptionalEntity = "Go to parking [{parkingLevel}]";

            // You can also have multiple entities of the same name in a single pattern by appending a unique identifier
            // to distinguish between the instances. For example:
            String patternWithTwoOfTheSameEntity = "Go to floor {floorName:1} [and then go to floor {floorName:2}]";
            // NOTE: Both floorName:1 and floorName:2 are tied to the same list of entries. The identifier can be a string
            // and is separated from the entity name by a ':'

            // Creates the pattern matching intents and adds them to the model
            model.getIntents().put(new PatternMatchingIntent("ChangeFloors", patternWithOptionalWords, patternWithOptionalEntity, patternWithTwoOfTheSameEntity));
            model.getIntents().put(new PatternMatchingIntent("DoorControl", "{action} the doors", "{action} doors", "{action} the door", "{action} door"));

            // Creates the "floorName" entity and sets it to type list.
            // Adds acceptable values. NOTE: the default entity type is Any, so we do not need
            // to declare the "action" entity.
            model.getEntities().put(PatternMatchingEntity.CreateListEntity("floorName", PatternMatchingEntity.EntityMatchMode.Strict, "ground floor", "lobby", "1st", "first", "one", "1", "2nd", "second", "two", "2"));

            // Creates the "parkingLevel" entity as a pre-built integer
            model.getEntities().put(PatternMatchingEntity.CreateIntegerEntity("parkingLevel"));

            ArrayList<LanguageUnderstandingModel> modelCollection = new ArrayList<LanguageUnderstandingModel>();
            modelCollection.add(model);

            recognizer.applyLanguageModels(modelCollection);

            System.out.println("Say something...");

            IntentRecognitionResult result = recognizer.recognizeOnceAsync().get();

            if (result.getReason() == ResultReason.RecognizedSpeech) {
                System.out.println("RECOGNIZED: Text= " + result.getText());
                System.out.println(String.format("%17s", "Intent not recognized."));
            }
            else if (result.getReason() == ResultReason.RecognizedIntent)
            {
                System.out.println("RECOGNIZED: Text= " + result.getText());
                System.out.println(String.format("%17s %s", "Intent Id=", result.getIntentId() + "."));
                Dictionary<String, String> entities = result.getEntities();

                switch (result.getIntentId())
                {
                    case "ChangeFloors":
                        if (entities.get("floorName") != null) {
                            System.out.println(String.format("%17s %s", "FloorName=", entities.get("floorName")));
                        }
                        if (entities.get("floorName:1") != null) {
                            System.out.println(String.format("%17s %s", "FloorName:1=", entities.get("floorName:1")));
                        }
                        if (entities.get("floorName:2") != null) {
                            System.out.println(String.format("%17s %s", "FloorName:2=", entities.get("floorName:2")));
                        }
                        if (entities.get("parkingLevel") != null) {
                            System.out.println(String.format("%17s %s", "ParkingLevel=", entities.get("parkingLevel")));
                        }
                        break;

                    case "DoorControl":
                        if (entities.get("action") != null) {
                            System.out.println(String.format("%17s %s", "Action=", entities.get("action")));
                        }
                        break;
                }
            }
            else if (result.getReason() == ResultReason.NoMatch) {
                System.out.println("NOMATCH: Speech could not be recognized.");
            }
            else if (result.getReason() == ResultReason.Canceled) {
                CancellationDetails cancellation = CancellationDetails.fromResult(result);
                System.out.println("CANCELED: Reason=" + cancellation.getReason());

                if (cancellation.getReason() == CancellationReason.Error)
                {
                    System.out.println("CANCELED: ErrorCode=" + cancellation.getErrorCode());
                    System.out.println("CANCELED: ErrorDetails=" + cancellation.getErrorDetails());
                    System.out.println("CANCELED: Did you update the subscription info?");
                }
            }
        }
    }
}

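Build and run your app

Now you're ready to build your app and test intent recognition using the Speech service and the embedded pattern matcher.

In Eclipse, select the run button or press Ctrl+F11, then watch the output for the "Say something..." prompt. When it appears, speak your utterance and watch the output.

For example, if you say "Take me to floor 2", the output should look like this: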

Say something...
RECOGNIZED: Text= Take me to floor 2.
       Intent Id= ChangeFloors.
       FloorName= 2

As another example, if you say "Take me to floor 7", the output should look like this:

Say something...
RECOGNIZED: Text= Take me to floor 7.
Intent not recognized.

The intent was not recognized because 7 is not in the list of valid values for floorName.
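
To make the difference between the two utterances concrete, here is a toy illustration of strict list matching. It is not the Speech SDK's implementation. It only mimics the observable behavior of the `[Go | Take me] to [floor|level] {floorName}` pattern with a regular expression, and `StrictListDemo`, `FLOOR_NAMES`, and `matchFloor` are hypothetical names introduced for this sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StrictListDemo {
    // The same values the tutorial registers for the "floorName" list entity.
    static final List<String> FLOOR_NAMES = Arrays.asList(
        "ground floor", "lobby", "1st", "first", "one", "1", "2nd", "second", "two", "2");

    // Rough regex stand-in for "[Go | Take me] to [floor|level] {floorName}".
    // Optional word groups become optional non-capturing groups; the entity is group 1.
    static final Pattern PATTERN = Pattern.compile(
        "^(?:(?:go|take me)\\s+)?to\\s+(?:(?:floor|level)\\s+)?(.+?)[.!?]?$",
        Pattern.CASE_INSENSITIVE);

    // Returns the matched floor name, or empty when the utterance does not fit
    // the pattern or the captured value is not in the allowed list.
    static Optional<String> matchFloor(String utterance) {
        Matcher m = PATTERN.matcher(utterance.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        String candidate = m.group(1).toLowerCase(Locale.ROOT);
        return FLOOR_NAMES.contains(candidate) ? Optional.of(candidate) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(matchFloor("Take me to floor 2."));
        System.out.println(matchFloor("Take me to floor 7."));
    }
}
```

In this sketch, "Take me to floor 2." yields a match because "2" is in the list, while "Take me to floor 7." fits the sentence shape but is rejected at the list check, which mirrors the "Intent not recognized" result above.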
