Quickstart: Create a custom voice assistant

02/24/2024

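In this quickstart, you use the Speech SDK to create a custom voice assistant application that connects to a bot you have already authored and configured. If you need to create a bot, see the related tutorial for a more comprehensive guide.

After you satisfy a few prerequisites, connecting your custom voice assistant takes only a few steps:

Create a BotFrameworkConfig object from your subscription key and region.
Create a DialogServiceConnector object using the BotFrameworkConfig object from above.
Using the DialogServiceConnector object, start the listening process for a single utterance.
Inspect the ActivityReceivedEventArgs returned.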

Note

The Speech SDK for C++, JavaScript, Objective-C, Python, and Swift supports custom voice assistants, but this article doesn't yet include a guide for those languages.

You can view or download all Speech SDK C# samples on GitHub.

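Prerequisites

Before you get started, make sure to:

Create a Speech resource
Set up your development environment and create an empty project
Create a bot connected to the Direct Line Speech channel
Make sure that you have access to a microphone for audio capture

Note

See the list of regions that support voice assistants and make sure your resources are deployed in one of those regions.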

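Open your project in Visual Studio

The first step is to make sure that you have your project open in Visual Studio.

Start with some boilerplate code

Let's add some code that works as a skeleton for the project.

In Solution Explorer, open MainPage.xaml.

In the designer's XAML view, replace the entire contents with the following snippet, which defines a rudimentary user interface: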

<Page
    x:Class="helloworld.MainPage"
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns:local="using:helloworld"
    xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    mc:Ignorable="d"
    Background="{ThemeResource ApplicationPageBackgroundThemeBrush}">

    <Grid>
        <StackPanel Orientation="Vertical" HorizontalAlignment="Center"  
                    Margin="20,50,0,0" VerticalAlignment="Center" Width="800">
            <Button x:Name="EnableMicrophoneButton" Content="Enable Microphone"  
                    Margin="0,0,10,0" Click="EnableMicrophone_ButtonClicked" 
                    Height="35"/>
            <Button x:Name="ListenButton" Content="Talk to your bot" 
                    Margin="0,10,10,0" Click="ListenButton_ButtonClicked" 
                    Height="35"/>
            <StackPanel x:Name="StatusPanel" Orientation="Vertical" 
                        RelativePanel.AlignBottomWithPanel="True" 
                        RelativePanel.AlignRightWithPanel="True" 
                        RelativePanel.AlignLeftWithPanel="True">
                <TextBlock x:Name="StatusLabel" Margin="0,10,10,0" 
                           TextWrapping="Wrap" Text="Status:" FontSize="20"/>
                <Border x:Name="StatusBorder" Margin="0,0,0,0">
                    <ScrollViewer VerticalScrollMode="Auto"  
                                  VerticalScrollBarVisibility="Auto" MaxHeight="200">
                        <!-- Use LiveSetting to enable screen readers to announce 
                             the status update. -->
                        <TextBlock 
                            x:Name="StatusBlock" FontWeight="Bold" 
                            AutomationProperties.LiveSetting="Assertive"
                            MaxWidth="{Binding ElementName=Splitter, Path=ActualWidth}" 
                            Margin="10,10,10,20" TextWrapping="Wrap"  />
                    </ScrollViewer>
                </Border>
            </StackPanel>
        </StackPanel>
        <MediaElement x:Name="mediaElement"/>
    </Grid>
</Page>

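The Design view is updated to show the application's user interface.

In Solution Explorer, open the code-behind source file MainPage.xaml.cs (it's grouped under MainPage.xaml). Replace the contents of this file with the code below, which includes:

using statements for the Speech and Speech.Dialog namespaces
A simple implementation, wired to a button handler, that ensures microphone access
Basic UI helpers to present messages and errors in the application
A landing point for the initialization code path that is populated later
A helper to play back text to speech (without streaming support)
An empty button handler to start listening that is populated later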

using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using Microsoft.CognitiveServices.Speech.Dialog;
using System;
using System.Diagnostics;
using System.IO;
using System.Text;
using Windows.Foundation;
using Windows.Storage.Streams;
using Windows.UI.Xaml;
using Windows.UI.Xaml.Controls;
using Windows.UI.Xaml.Media;

namespace helloworld
{
    public sealed partial class MainPage : Page
    {
        private DialogServiceConnector connector;

        private enum NotifyType
        {
            StatusMessage,
            ErrorMessage
        };

        public MainPage()
        {
            this.InitializeComponent();
        }

        private async void EnableMicrophone_ButtonClicked(
            object sender, RoutedEventArgs e)
        {
            bool isMicAvailable = true;
            try
            {
                var mediaCapture = new Windows.Media.Capture.MediaCapture();
                var settings = 
                    new Windows.Media.Capture.MediaCaptureInitializationSettings();
                settings.StreamingCaptureMode = 
                    Windows.Media.Capture.StreamingCaptureMode.Audio;
                await mediaCapture.InitializeAsync(settings);
            }
            catch (Exception)
            {
                isMicAvailable = false;
            }
            if (!isMicAvailable)
            {
                await Windows.System.Launcher.LaunchUriAsync(
                    new Uri("ms-settings:privacy-microphone"));
            }
            else
            {
                NotifyUser("Microphone was enabled", NotifyType.StatusMessage);
            }
        }

        private void NotifyUser(
            string strMessage, NotifyType type = NotifyType.StatusMessage)
        {
            // If called from the UI thread, then update immediately.
            // Otherwise, schedule a task on the UI thread to perform the update.
            if (Dispatcher.HasThreadAccess)
            {
                UpdateStatus(strMessage, type);
            }
            else
            {
                var task = Dispatcher.RunAsync(
                    Windows.UI.Core.CoreDispatcherPriority.Normal, 
                    () => UpdateStatus(strMessage, type));
            }
        }

        private void UpdateStatus(string strMessage, NotifyType type)
        {
            switch (type)
            {
                case NotifyType.StatusMessage:
                    StatusBorder.Background = new SolidColorBrush(
                        Windows.UI.Colors.Green);
                    break;
                case NotifyType.ErrorMessage:
                    StatusBorder.Background = new SolidColorBrush(
                        Windows.UI.Colors.Red);
                    break;
            }
            StatusBlock.Text += string.IsNullOrEmpty(StatusBlock.Text) 
                ? strMessage : "\n" + strMessage;

            if (!string.IsNullOrEmpty(StatusBlock.Text))
            {
                StatusBorder.Visibility = Visibility.Visible;
                StatusPanel.Visibility = Visibility.Visible;
            }
            else
            {
                StatusBorder.Visibility = Visibility.Collapsed;
                StatusPanel.Visibility = Visibility.Collapsed;
            }
            // Raise an event if necessary to enable a screen reader 
            // to announce the status update.
            var peer = Windows.UI.Xaml.Automation.Peers.FrameworkElementAutomationPeer.FromElement(StatusBlock);
            if (peer != null)
            {
                peer.RaiseAutomationEvent(
                    Windows.UI.Xaml.Automation.Peers.AutomationEvents.LiveRegionChanged);
            }
        }

        // Waits for and accumulates all audio associated with a given 
        // PullAudioOutputStream and then plays it to the MediaElement. Long spoken 
        // audio will create extra latency and a streaming playback solution 
        // (that plays audio while it continues to be received) should be used -- 
        // see the samples for examples of this.
        private void SynchronouslyPlayActivityAudio(
            PullAudioOutputStream activityAudio)
        {
            var playbackStreamWithHeader = new MemoryStream();
            playbackStreamWithHeader.Write(Encoding.ASCII.GetBytes("RIFF"), 0, 4); // ChunkID
            playbackStreamWithHeader.Write(BitConverter.GetBytes(UInt32.MaxValue), 0, 4); // ChunkSize: max
            playbackStreamWithHeader.Write(Encoding.ASCII.GetBytes("WAVE"), 0, 4); // Format
            playbackStreamWithHeader.Write(Encoding.ASCII.GetBytes("fmt "), 0, 4); // Subchunk1ID
            playbackStreamWithHeader.Write(BitConverter.GetBytes(16), 0, 4); // Subchunk1Size: PCM
            playbackStreamWithHeader.Write(BitConverter.GetBytes(1), 0, 2); // AudioFormat: PCM
            playbackStreamWithHeader.Write(BitConverter.GetBytes(1), 0, 2); // NumChannels: mono
            playbackStreamWithHeader.Write(BitConverter.GetBytes(16000), 0, 4); // SampleRate: 16kHz
            playbackStreamWithHeader.Write(BitConverter.GetBytes(32000), 0, 4); // ByteRate
            playbackStreamWithHeader.Write(BitConverter.GetBytes(2), 0, 2); // BlockAlign
            playbackStreamWithHeader.Write(BitConverter.GetBytes(16), 0, 2); // BitsPerSample: 16-bit
            playbackStreamWithHeader.Write(Encoding.ASCII.GetBytes("data"), 0, 4); // Subchunk2ID
            playbackStreamWithHeader.Write(BitConverter.GetBytes(UInt32.MaxValue), 0, 4); // Subchunk2Size

            byte[] pullBuffer = new byte[2056];

            uint lastRead = 0;
            do
            {
                lastRead = activityAudio.Read(pullBuffer);
                playbackStreamWithHeader.Write(pullBuffer, 0, (int)lastRead);
            }
            while (lastRead == pullBuffer.Length);

            var task = Dispatcher.RunAsync(
                Windows.UI.Core.CoreDispatcherPriority.Normal, () =>
            {
                mediaElement.SetSource(
                    playbackStreamWithHeader.AsRandomAccessStream(), "audio/wav");
                mediaElement.Play();
            });
        }

        private void InitializeDialogServiceConnector()
        {
            // New code will go here
        }

        private async void ListenButton_ButtonClicked(
            object sender, RoutedEventArgs e)
        {
            // New code will go here
        }
    }
}
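The header that SynchronouslyPlayActivityAudio writes field by field follows the canonical 44-byte RIFF/WAVE layout, with all multi-byte values in little-endian order. The sketch below rebuilds the same header in Java (the other language this article covers) so the arithmetic is visible: BlockAlign is channels × bits per sample ÷ 8, and ByteRate is the sample rate × BlockAlign, which gives the 2 and 32000 used above. The class and method names are hypothetical, and unlike the C# helper, which writes UInt32.MaxValue for the sizes (presumably because the total length isn't known up front), this version assumes the data length is known.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class WavHeaderSketch {
    /** Builds a 44-byte RIFF/WAVE header for uncompressed PCM audio. */
    static byte[] buildHeader(int sampleRate, int channels, int bitsPerSample, int dataLength) {
        int blockAlign = channels * bitsPerSample / 8; // bytes per sample frame
        int byteRate = sampleRate * blockAlign;        // bytes per second of audio

        ByteBuffer header = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
        header.put("RIFF".getBytes(StandardCharsets.US_ASCII)); // ChunkID
        header.putInt(36 + dataLength);                         // ChunkSize
        header.put("WAVE".getBytes(StandardCharsets.US_ASCII)); // Format
        header.put("fmt ".getBytes(StandardCharsets.US_ASCII)); // Subchunk1ID
        header.putInt(16);                                      // Subchunk1Size for PCM
        header.putShort((short) 1);                             // AudioFormat: 1 = PCM
        header.putShort((short) channels);                      // NumChannels
        header.putInt(sampleRate);                              // SampleRate
        header.putInt(byteRate);                                // ByteRate
        header.putShort((short) blockAlign);                    // BlockAlign
        header.putShort((short) bitsPerSample);                 // BitsPerSample
        header.put("data".getBytes(StandardCharsets.US_ASCII)); // Subchunk2ID
        header.putInt(dataLength);                              // Subchunk2Size
        return header.array();
    }

    public static void main(String[] args) {
        // One second of 16 kHz, 16-bit, mono audio occupies 32000 bytes.
        byte[] header = buildHeader(16000, 1, 16, 32000);
        System.out.println("Header length: " + header.length);
    }
}
```

Computing the derived fields from the three basic parameters, rather than hard-coding them, keeps the header consistent if the audio format ever changes.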

Add the following code snippet to the method body of InitializeDialogServiceConnector. This code creates the DialogServiceConnector with your subscription information.
```
// Create a BotFrameworkConfig by providing a Speech service subscription key
// the botConfig.Language property is optional (default en-US)
const string speechSubscriptionKey = "YourSpeechSubscriptionKey"; // Your subscription key
const string region = "YourServiceRegion"; // Your subscription service region.

var botConfig = BotFrameworkConfig.FromSubscription(speechSubscriptionKey, region);
botConfig.Language = "en-US";
connector = new DialogServiceConnector(botConfig);
```
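Note

See the list of regions that support voice assistants and make sure your resources are deployed in one of those regions.

Note

For more information on configuring your bot, see the Bot Framework documentation for the Direct Line Speech channel.
Replace the strings YourSpeechSubscriptionKey and YourServiceRegion with your own values for your Speech subscription and region.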

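Add the following code snippet to the end of the method body of InitializeDialogServiceConnector. This code sets up handlers for the events that DialogServiceConnector relies on to communicate its bot activities, speech recognition results, and other information.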

// ActivityReceived is the main way your bot will communicate with the client 
// and uses bot framework activities
connector.ActivityReceived += (sender, activityReceivedEventArgs) =>
{
    NotifyUser(
        $"Activity received, hasAudio={activityReceivedEventArgs.HasAudio} activity={activityReceivedEventArgs.Activity}");

    if (activityReceivedEventArgs.HasAudio)
    {
        SynchronouslyPlayActivityAudio(activityReceivedEventArgs.Audio);
    }
};

// Canceled will be signaled when a turn is aborted or experiences an error condition
connector.Canceled += (sender, canceledEventArgs) =>
{
    NotifyUser($"Canceled, reason={canceledEventArgs.Reason}");
    if (canceledEventArgs.Reason == CancellationReason.Error)
    {
        NotifyUser(
            $"Error: code={canceledEventArgs.ErrorCode}, details={canceledEventArgs.ErrorDetails}");
    }
};

// Recognizing (not 'Recognized') will provide the intermediate recognized text 
// while an audio stream is being processed
connector.Recognizing += (sender, recognitionEventArgs) =>
{
    NotifyUser($"Recognizing! in-progress text={recognitionEventArgs.Result.Text}");
};

// Recognized (not 'Recognizing') will provide the final recognized text 
// once audio capture is completed
connector.Recognized += (sender, recognitionEventArgs) =>
{
    NotifyUser($"Final speech to text result: '{recognitionEventArgs.Result.Text}'");
};

// SessionStarted will notify when audio begins flowing to the service for a turn
connector.SessionStarted += (sender, sessionEventArgs) =>
{
    NotifyUser($"Now Listening! Session started, id={sessionEventArgs.SessionId}");
};

// SessionStopped will notify when a turn is complete and 
// it's safe to begin listening again
connector.SessionStopped += (sender, sessionEventArgs) =>
{
    NotifyUser($"Listening complete. Session ended, id={sessionEventArgs.SessionId}");
};

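Add the following code snippet to the body of the ListenButton_ButtonClicked method in the MainPage class. Because the configuration is already established and the event handlers are registered, this code only needs to set up the DialogServiceConnector to listen.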

if (connector == null)
{
    InitializeDialogServiceConnector();
    // Optional step to speed up first interaction: if not called, 
    // connection happens automatically on first use
    var connectTask = connector.ConnectAsync();
}

try
{
    // Start sending audio to your speech-enabled bot
    var listenTask = connector.ListenOnceAsync();

    // You can also send activities to your bot as JSON strings -- 
    // Microsoft.Bot.Schema can simplify this
    string speakActivity = 
        @"{""type"":""message"",""text"":""Greeting Message"", ""speak"":""Hello there!""}";
    await connector.SendActivityAsync(speakActivity);

}
catch (Exception ex)
{
    NotifyUser($"Exception: {ex.ToString()}", NotifyType.ErrorMessage);
}
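The activity sent with SendActivityAsync above is a plain JSON string, so any text placed in it must be escaped correctly. The following is a minimal sketch, written in Java (the other language this article covers), of building the same message activity by hand with only the standard library. The class and method names are hypothetical and are not part of the Speech SDK; in a real application a JSON library or Microsoft.Bot.Schema would likely be a better choice.

```java
public class ActivityJsonSketch {
    /** Escapes characters that must not appear raw inside a JSON string value. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use a four-digit hex escape.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Builds a message activity with display text and text to be spoken. */
    static String messageActivity(String text, String speak) {
        return "{\"type\":\"message\",\"text\":\"" + escape(text)
                + "\",\"speak\":\"" + escape(speak) + "\"}";
    }

    public static void main(String[] args) {
        System.out.println(messageActivity("Greeting Message", "Hello there!"));
    }
}
```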

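Build and run your app

Now you're ready to build your app and test your custom voice assistant using the Speech service.

From the menu bar, choose Build > Build Solution to build the application. The code should now compile without errors.
Choose Debug > Start Debugging (or press F5) to start the application. The helloworld window appears.
Select Enable Microphone, and when the access permission request pops up, select Yes.
Select Talk to your bot, and speak an English phrase or sentence into your device's microphone. Your speech is transmitted to the Direct Line Speech channel and transcribed to text, which appears in the window.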

Next steps

Explore C# samples on GitHub

You can view or download all Speech SDK Java samples on GitHub.

Choose your target environment

Java Runtime
Android

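Prerequisites

Before you get started, make sure to:

Create a Speech resource
Set up your development environment and create an empty project
Create a bot connected to the Direct Line Speech channel
Make sure that you have access to a microphone for audio capture

Create and configure the project

Create an Eclipse project and install the Speech SDK.

Additionally, to enable logging, update your pom.xml file to include the following dependency: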

 <dependency>
     <groupId>org.slf4j</groupId>
     <artifactId>slf4j-simple</artifactId>
     <version>1.7.5</version>
 </dependency>

Add sample code

To add a new empty class to your Java project, select File > New > Class.
In the New Java Class window, enter speechsdk.quickstart in the Package field and Main in the Name field.

Open the newly created Main class, and replace the contents of the Main.java file with the following starter code:

package speechsdk.quickstart;

import com.microsoft.cognitiveservices.speech.audio.AudioConfig;
import com.microsoft.cognitiveservices.speech.audio.PullAudioOutputStream;
import com.microsoft.cognitiveservices.speech.dialog.BotFrameworkConfig;
import com.microsoft.cognitiveservices.speech.dialog.DialogServiceConnector;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.SourceDataLine;
import java.io.InputStream;

public class Main {
    // Static so that the code added to the static main method can use it.
    private static final Logger log = LoggerFactory.getLogger(Main.class);

    public static void main(String[] args) {
        // New code will go here
    }

    // Static so that the event listeners registered in main can call it.
    private static void playAudioStream(PullAudioOutputStream audio) {
        ActivityAudioStream stream = new ActivityAudioStream(audio);
        final ActivityAudioStream.ActivityAudioFormat audioFormat = stream.getActivityAudioFormat();
        final AudioFormat format = new AudioFormat(
                AudioFormat.Encoding.PCM_SIGNED,
                audioFormat.getSamplesPerSecond(),
                audioFormat.getBitsPerSample(),
                audioFormat.getChannels(),
                audioFormat.getFrameSize(),
                audioFormat.getSamplesPerSecond(),
                false);
        try {
            int bufferSize = format.getFrameSize();
            final byte[] data = new byte[bufferSize];

            SourceDataLine.Info info = new DataLine.Info(SourceDataLine.class, format);
            SourceDataLine line = (SourceDataLine) AudioSystem.getLine(info);
            line.open(format);

            if (line != null) {
                line.start();
                int nBytesRead = 0;
                while (nBytesRead != -1) {
                    nBytesRead = stream.read(data);
                    if (nBytesRead != -1) {
                        line.write(data, 0, nBytesRead);
                    }
                }
                line.drain();
                line.stop();
                line.close();
            }
            stream.close();

        } catch (Exception e) {
            e.printStackTrace();
        }
    }

}
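playAudioStream describes the bot's audio to the Java Sound API through an AudioFormat. As a small sketch of what those constructor arguments mean, the example below builds the same kind of format from the values this quickstart uses for ActivityAudioStream (16 kHz, 16 bits per sample, mono) and derives the frame size and a playback duration from them. The class and method names are hypothetical; for PCM audio, the frame rate equals the sample rate, which is why the sample rate is passed twice.

```java
import javax.sound.sampled.AudioFormat;

public class PcmFormatSketch {
    /** Describes 16 kHz, 16-bit, mono, signed little-endian PCM audio. */
    static AudioFormat activityFormat() {
        float sampleRate = 16000f;
        int bitsPerSample = 16;
        int channels = 1;
        int frameSize = channels * bitsPerSample / 8; // bytes per frame
        return new AudioFormat(
                AudioFormat.Encoding.PCM_SIGNED,
                sampleRate,      // samples per second
                bitsPerSample,
                channels,
                frameSize,
                sampleRate,      // frames per second (same as sample rate for PCM)
                false);          // little-endian byte order
    }

    /** Converts a number of audio bytes into playback time in milliseconds. */
    static long durationMillis(AudioFormat format, long byteCount) {
        double bytesPerSecond = format.getFrameSize() * (double) format.getFrameRate();
        return Math.round(byteCount / bytesPerSecond * 1000.0);
    }

    public static void main(String[] args) {
        AudioFormat format = activityFormat();
        System.out.println("Frame size in bytes: " + format.getFrameSize());
        System.out.println("Milliseconds for 64000 bytes: " + durationMillis(format, 64000));
    }
}
```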

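In the main method, you first configure your DialogServiceConfig (a BotFrameworkConfig in the code below) and use it to create a DialogServiceConnector instance. This instance connects to the Direct Line Speech channel to interact with your bot. An AudioConfig instance is also used to specify the source for audio input. In this example, the default microphone is used with AudioConfig.fromDefaultMicrophoneInput().
- Replace the string YourSubscriptionKey with your Speech resource key, which you can get from the Azure portal.
- Replace the string YourServiceRegion with the region associated with your Speech resource.
Note

See the list of regions that support voice assistants and make sure your resources are deployed in one of those regions.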
```
final String subscriptionKey = "YourSubscriptionKey"; // Your subscription key
final String region = "YourServiceRegion"; // Your speech subscription service region
final BotFrameworkConfig botConfig = BotFrameworkConfig.fromSubscription(subscriptionKey, region);

// Configure audio input from a microphone.
final AudioConfig audioConfig = AudioConfig.fromDefaultMicrophoneInput();

// Create a DialogServiceConnector instance.
final DialogServiceConnector connector = new DialogServiceConnector(botConfig, audioConfig);
```

The DialogServiceConnector relies on several events to communicate its bot activities, speech recognition results, and other information. Add these event listeners next.

// Recognizing will provide the intermediate recognized text while an audio stream is being processed.
connector.recognizing.addEventListener((o, speechRecognitionResultEventArgs) -> {
    log.info("Recognizing speech event text: {}", speechRecognitionResultEventArgs.getResult().getText());
});

// Recognized will provide the final recognized text once audio capture is completed.
connector.recognized.addEventListener((o, speechRecognitionResultEventArgs) -> {
    log.info("Recognized speech event reason text: {}", speechRecognitionResultEventArgs.getResult().getText());
});

// SessionStarted will notify when audio begins flowing to the service for a turn.
connector.sessionStarted.addEventListener((o, sessionEventArgs) -> {
    log.info("Session Started event id: {} ", sessionEventArgs.getSessionId());
});

// SessionStopped will notify when a turn is complete and it's safe to begin listening again.
connector.sessionStopped.addEventListener((o, sessionEventArgs) -> {
    log.info("Session stopped event id: {}", sessionEventArgs.getSessionId());
});

// Canceled will be signaled when a turn is aborted or experiences an error condition.
connector.canceled.addEventListener((o, canceledEventArgs) -> {
    log.info("Canceled event details: {}", canceledEventArgs.getErrorDetails());
    connector.disconnectAsync();
});

// ActivityReceived is the main way your bot will communicate with the client and uses Bot Framework activities.
connector.activityReceived.addEventListener((o, activityEventArgs) -> {
    final String act = activityEventArgs.getActivity().serialize();
    log.info("Received activity {} audio", activityEventArgs.hasAudio() ? "with" : "without");
    if (activityEventArgs.hasAudio()) {
        playAudioStream(activityEventArgs.getAudio());
    }
});

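Connect the DialogServiceConnector to Direct Line Speech by invoking the connectAsync() method. To test your bot, invoke the listenOnceAsync method to send audio input from your microphone. Additionally, you can use the sendActivityAsync method to send a custom activity as a serialized string. These custom activities can provide additional data that your bot uses in the conversation.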
```
connector.connectAsync();
// Start listening.
System.out.println("Say something ...");
connector.listenOnceAsync();

// connector.sendActivityAsync(...)
```
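The snippet above starts the asynchronous calls without waiting for them to finish. Assuming these methods return a java.util.concurrent.Future, as the asynchronous methods in the Speech SDK for Java are understood to do, a caller that needs the result of a turn can block on it with a timeout. The sketch below shows that pattern with a stand-in future rather than the SDK itself; fakeListenOnceAsync and awaitResult are hypothetical names used only for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class AwaitTurnSketch {
    /** Hypothetical stand-in for an asynchronous call such as listenOnceAsync(). */
    static Future<String> fakeListenOnceAsync() {
        return CompletableFuture.supplyAsync(() -> "recognized text");
    }

    /** Blocks until the future completes, or gives up after the timeout. */
    static String awaitResult(Future<String> pending, long timeoutSeconds) {
        try {
            return pending.get(timeoutSeconds, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            pending.cancel(true); // stop waiting for a turn that never finished
            return "timed out";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return "interrupted";
        } catch (ExecutionException e) {
            return "failed: " + e.getCause();
        }
    }

    public static void main(String[] args) {
        System.out.println(awaitResult(fakeListenOnceAsync(), 30));
    }
}
```

A bounded wait keeps a console application from hanging indefinitely if the service never responds.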
Save changes to the Main file.
To support response playback, add an additional class that transforms the PullAudioOutputStream object returned from the getAudio() API into a Java InputStream for ease of handling. This ActivityAudioStream is a specialized class that handles audio responses from the Direct Line Speech channel. It provides accessors to fetch the audio format information that's required for handling playback. To add the class, select File > New > Class.
In the New Java Class window, enter speechsdk.quickstart in the Package field and ActivityAudioStream in the Name field.

Open the newly created ActivityAudioStream class, and replace its contents with the following code:

package speechsdk.quickstart;

import com.microsoft.cognitiveservices.speech.audio.PullAudioOutputStream;

import java.io.IOException;
import java.io.InputStream;

 public final class ActivityAudioStream extends InputStream {
     /**
      * The number of samples played per second (16 kHz).
      */
     public static final long SAMPLE_RATE = 16000;
     /**
      * The number of bits in each sample of a sound that has this format (16 bits).
      */
     public static final int BITS_PER_SECOND = 16;
     /**
      * The number of audio channels in this format (1 for mono).
      */
     public static final int CHANNELS = 1;
     /**
      * The number of bytes in each frame of a sound that has this format (2).
      */
     public static final int FRAME_SIZE = 2;

     /**
      * Reads up to a specified maximum number of bytes of data from the audio
      * stream, putting them into the given byte array.
      *
      * @param b   the buffer into which the data is read
      * @param off the offset, from the beginning of array <code>b</code>, at which
      *            the data will be written
      * @param len the maximum number of bytes to read
      * @return the total number of bytes read into the buffer, or -1 if there
      * is no more data because the end of the stream has been reached
      */
     @Override
     public int read(byte[] b, int off, int len) {
         byte[] tempBuffer = new byte[len];
         int n = (int) this.pullStreamImpl.read(tempBuffer);
         for (int i = 0; i < n; i++) {
             if (off + i >= b.length) {
                 throw new ArrayIndexOutOfBoundsException(b.length);
             }
             b[off + i] = tempBuffer[i];
         }
         if (n == 0) {
             return -1;
         }
         return n;
     }

     /**
      * Reads the next byte of data from the activity audio stream if available.
      *
      * @return the next byte of data, or -1 if the end of the stream is reached
      * @see #read(byte[], int, int)
      * @see #read(byte[])
      * @see #available
      * <p>
      */
     @Override
     public int read() {
         byte[] data = new byte[1];
         int temp = read(data);
         if (temp <= 0) {
             // we have a weird situation if read(byte[]) returns 0!
             return -1;
         }
         return data[0] & 0xFF;
     }

     /**
      * Reads up to a specified maximum number of bytes of data from the activity audio stream,
      * putting them into the given byte array.
      *
      * @param b the buffer into which the data is read
      * @return the total number of bytes read into the buffer, or -1 if there
      * is no more data because the end of the stream has been reached
      */
     @Override
     public int read(byte[] b) {
         int n = (int) pullStreamImpl.read(b);
         if (n == 0) {
             return -1;
         }
         return n;
     }

     /**
      * Skips over and discards a specified number of bytes from this
      * audio input stream.
      *
      * @param n the requested number of bytes to be skipped
      * @return the actual number of bytes skipped
      * @throws IOException if an input or output error occurs
      * @see #read
      * @see #available
      */
     @Override
     public long skip(long n) {
         if (n <= 0) {
             return 0;
         }
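         if (n <= Integer.MAX_VALUE) {
             byte[] tempBuffer = new byte[(int) n];
             // read returns -1 at end of stream; skip must report a non-negative count
             return Math.max(0, read(tempBuffer));
         }
         long count = 0;
         for (long i = n; i > 0; i -= Integer.MAX_VALUE) {
             int size = (int) Math.min(Integer.MAX_VALUE, i);
             byte[] tempBuffer = new byte[size];
             int skipped = read(tempBuffer);
             if (skipped < 0) {
                 break; // end of stream reached
             }
             count += skipped;
         }
         return count;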
     }

     /**
      * Closes this audio input stream and releases any system resources associated
      * with the stream.
      */
     @Override
     public void close() {
         this.pullStreamImpl.close();
     }

     /**
      * Fetch the audio format for the ActivityAudioStream. The ActivityAudioFormat defines the sample rate, bits per sample, and the # channels.
      *
      * @return instance of the ActivityAudioFormat associated with the stream
      */
     public ActivityAudioStream.ActivityAudioFormat getActivityAudioFormat() {
         return activityAudioFormat;
     }

     /**
      * Returns the maximum number of bytes that can be read (or skipped over) from this
      * audio input stream without blocking.
      *
      * @return the number of bytes that can be read from this audio input stream without blocking.
      * As this implementation does not buffer, this will be defaulted to 0
      */
     @Override
     public int available() {
         return 0;
     }

     public ActivityAudioStream(final PullAudioOutputStream stream) {
         pullStreamImpl = stream;
         this.activityAudioFormat = new ActivityAudioStream.ActivityAudioFormat(SAMPLE_RATE, BITS_PER_SECOND, CHANNELS, FRAME_SIZE, AudioEncoding.PCM_SIGNED);
     }

     private PullAudioOutputStream pullStreamImpl;

     private ActivityAudioFormat activityAudioFormat;

     /**
      * ActivityAudioFormat is an internal format which contains metadata regarding the type of arrangement of
      * audio bits in this activity audio stream.
      */
     static class ActivityAudioFormat {

         private long samplesPerSecond;
         private int bitsPerSample;
         private int channels;
         private int frameSize;
         private AudioEncoding encoding;

         public ActivityAudioFormat(long samplesPerSecond, int bitsPerSample, int channels, int frameSize, AudioEncoding encoding) {
             this.samplesPerSecond = samplesPerSecond;
             this.bitsPerSample = bitsPerSample;
             this.channels = channels;
             this.encoding = encoding;
             this.frameSize = frameSize;
         }

         /**
          * Fetch the number of samples played per second for the associated audio stream format.
          *
          * @return the number of samples played per second
          */
         public long getSamplesPerSecond() {
             return samplesPerSecond;
         }

         /**
          * Fetch the number of bits in each sample of a sound that has this audio stream format.
          *
          * @return the number of bits per sample
          */
         public int getBitsPerSample() {
             return bitsPerSample;
         }

         /**
          * Fetch the number of audio channels used by this audio stream format.
          *
          * @return the number of channels
          */
         public int getChannels() {
             return channels;
         }

         /**
          * Fetch the default number of bytes in a frame required by this audio stream format.
          *
          * @return the number of bytes
          */
         public int getFrameSize() {
             return frameSize;
         }

         /**
          * Fetch the audio encoding type associated with this audio stream format.
          *
          * @return the encoding associated
          */
         public AudioEncoding getEncoding() {
             return encoding;
         }
     }

     /**
      * Enum defining the types of audio encoding supported by this stream.
      */
     public enum AudioEncoding {
         PCM_SIGNED("PCM_SIGNED");

         String value;

         AudioEncoding(String value) {
             this.value = value;
         }
     }
 }
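The core idea in ActivityAudioStream is an adapter: the pull stream signals the end of the audio by returning 0 bytes, while the InputStream contract signals it with -1. The following self-contained sketch isolates that mapping using a hypothetical PullSource interface in place of PullAudioOutputStream, so it can run without the Speech SDK. All names in it are illustrative assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class PullAdapterSketch {
    /** Hypothetical pull-style source: fills the buffer, returns 0 when exhausted. */
    interface PullSource {
        long read(byte[] buffer);
    }

    /** Adapts a PullSource to InputStream by mapping "0 bytes read" to -1. */
    static final class PullInputStream extends InputStream {
        private final PullSource source;

        PullInputStream(PullSource source) {
            this.source = source;
        }

        @Override
        public int read(byte[] b, int off, int len) {
            if (len == 0) {
                return 0;
            }
            byte[] temp = new byte[len];
            int n = (int) source.read(temp);
            if (n == 0) {
                return -1; // end of stream in InputStream terms
            }
            System.arraycopy(temp, 0, b, off, n);
            return n;
        }

        @Override
        public int read() {
            byte[] one = new byte[1];
            return read(one, 0, 1) == -1 ? -1 : one[0] & 0xFF;
        }
    }

    /** In-memory PullSource used only to exercise the adapter. */
    static PullSource fromBytes(byte[] data) {
        int[] position = {0};
        return buffer -> {
            int n = Math.min(buffer.length, data.length - position[0]);
            System.arraycopy(data, position[0], buffer, 0, n);
            position[0] += n;
            return n;
        };
    }

    /** Reads the stream to the end in small chunks and returns all bytes. */
    static byte[] drain(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] audio = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        byte[] copy = drain(new PullInputStream(fromBytes(audio)));
        System.out.println("Bytes read: " + copy.length);
    }
}
```

Overriding the three-argument read is enough here, because the inherited read(byte[]) delegates to it.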

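Save changes to the ActivityAudioStream file.

Build and run the app

Select F11, or select Run > Debug. The console displays the message "Say something". At this point, speak an English phrase or sentence that your bot can understand. Your speech is transmitted to your bot through the Direct Line Speech channel, where it's recognized and processed. The response is returned as an activity. If your bot returns speech as a response, the audio is played back by the playAudioStream method.

Screenshot of console output after successful recognition

Next steps

Explore Java samples on GitHub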

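Prerequisites

Before you get started, make sure to:

Create a Speech resource
Set up your development environment and create an empty project
Create a bot connected to the Direct Line Speech channel
Make sure that you have access to a microphone for audio capture

Create and configure a project

Use Android Studio to install the Speech SDK.

Create the user interface

In this section, you create a basic user interface (UI) for the application. Start by opening the main activity layout, activity_main.xml. The basic template includes a title bar with the application's name and a TextView with the message "Hello world!".

Next, replace the contents of activity_main.xml with the following code: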

<?xml version="1.0" encoding="utf-8"?>
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
 xmlns:tools="http://schemas.android.com/tools"
 android:layout_width="match_parent"
 android:layout_height="match_parent"
 android:orientation="vertical"
 tools:context=".MainActivity">

 <Button
     android:id="@+id/button"
     android:layout_width="wrap_content"
     android:layout_height="wrap_content"
     android:layout_gravity="center"
     android:onClick="onBotButtonClicked"
     android:text="Talk to your bot" />

 <TextView
     android:layout_width="match_parent"
     android:layout_height="wrap_content"
     android:text="Recognition Data"
     android:textSize="18dp"
     android:textStyle="bold" />

 <TextView
     android:id="@+id/recoText"
     android:layout_width="match_parent"
     android:layout_height="wrap_content"
     android:text="  \n(Recognition goes here)\n" />

 <TextView
     android:layout_width="match_parent"
     android:layout_height="wrap_content"
     android:text="Activity Data"
     android:textSize="18dp"
     android:textStyle="bold" />

 <TextView
     android:id="@+id/activityText"
     android:layout_width="match_parent"
     android:layout_height="match_parent"
     android:scrollbars="vertical"
     android:text="  \n(Activities go here)\n" />

</LinearLayout>

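This XML defines a simple UI to interact with your bot.

The button element initiates an interaction and invokes the onBotButtonClicked method when clicked.
The recoText element displays the speech to text results as you talk to your bot.
The activityText element displays the JSON payload for the latest Bot Framework activity from your bot.

The text and graphical representation of your UI should now look like this:

Screenshot showing what an interaction with the bot UI looks like.

Add sample code

Open MainActivity.java, and replace the contents with the following code: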

 package samples.speech.cognitiveservices.microsoft.com;

 import android.media.AudioFormat;
 import android.media.AudioManager;
 import android.media.AudioTrack;
 import android.support.v4.app.ActivityCompat;
 import android.support.v7.app.AppCompatActivity;
 import android.os.Bundle;
 import android.text.method.ScrollingMovementMethod;
 import android.view.View;
 import android.widget.TextView;

 import com.microsoft.cognitiveservices.speech.audio.AudioConfig;
 import com.microsoft.cognitiveservices.speech.audio.PullAudioOutputStream;
 import com.microsoft.cognitiveservices.speech.dialog.BotFrameworkConfig;
 import com.microsoft.cognitiveservices.speech.dialog.DialogServiceConnector;

 import org.json.JSONException;
 import org.json.JSONObject;

 import static android.Manifest.permission.*;

 public class MainActivity extends AppCompatActivity {
     // Replace below with your own speech subscription key
     private static String speechSubscriptionKey = "YourSpeechSubscriptionKey";
     // Replace below with your own speech service region
     private static String serviceRegion = "YourSpeechServiceRegion";

     private DialogServiceConnector connector;

     @Override
     protected void onCreate(Bundle savedInstanceState) {
         super.onCreate(savedInstanceState);
         setContentView(R.layout.activity_main);

         TextView recoText = (TextView) this.findViewById(R.id.recoText);
         TextView activityText = (TextView) this.findViewById(R.id.activityText);
         recoText.setMovementMethod(new ScrollingMovementMethod());
         activityText.setMovementMethod(new ScrollingMovementMethod());

         // Note: we need to request permissions for audio input and network access
         int requestCode = 5; // unique code for the permission request
         ActivityCompat.requestPermissions(MainActivity.this, new String[]{RECORD_AUDIO, INTERNET}, requestCode);
     }

     public void onBotButtonClicked(View v) {
         // Recreate the DialogServiceConnector on each button press, ensuring that the existing one is closed
         if (connector != null) {
             connector.close();
             connector = null;
         }

         // Create the DialogServiceConnector from speech subscription information
         BotFrameworkConfig config = BotFrameworkConfig.fromSubscription(speechSubscriptionKey, serviceRegion);
         connector = new DialogServiceConnector(config, AudioConfig.fromDefaultMicrophoneInput());

         // Optional step: preemptively connect to reduce first interaction latency
         connector.connectAsync();

         // Register the DialogServiceConnector's event listeners
         registerEventListeners();

         // Begin sending audio to your bot
         connector.listenOnceAsync();
     }

     private void registerEventListeners() {
         TextView recoText = (TextView) this.findViewById(R.id.recoText); // 'recoText' is the ID of your text view
         TextView activityText = (TextView) this.findViewById(R.id.activityText); // 'activityText' is the ID of your text view

         // Recognizing will provide the intermediate recognized text while an audio stream is being processed
         connector.recognizing.addEventListener((o, recoArgs) -> {
             recoText.setText("  Recognizing: " + recoArgs.getResult().getText());
         });

         // Recognized will provide the final recognized text once audio capture is completed
         connector.recognized.addEventListener((o, recoArgs) -> {
             recoText.setText("  Recognized: " + recoArgs.getResult().getText());
         });

         // SessionStarted will notify when audio begins flowing to the service for a turn
         connector.sessionStarted.addEventListener((o, sessionArgs) -> {
             recoText.setText("Listening...");
         });

         // SessionStopped will notify when a turn is complete and it's safe to begin listening again
         connector.sessionStopped.addEventListener((o, sessionArgs) -> {
         });

         // Canceled will be signaled when a turn is aborted or experiences an error condition
         connector.canceled.addEventListener((o, canceledArgs) -> {
             recoText.setText("Canceled (" + canceledArgs.getReason().toString() + ") error details: " + canceledArgs.getErrorDetails());
             connector.disconnectAsync();
         });

         // ActivityReceived is the main way your bot will communicate with the client and uses bot framework activities.
         connector.activityReceived.addEventListener((o, activityArgs) -> {
             try {
                 // Here we use JSONObject only to "pretty print" the condensed Activity JSON
                 String rawActivity = activityArgs.getActivity().serialize();
                 String formattedActivity = new JSONObject(rawActivity).toString(2);
                 activityText.setText(formattedActivity);
             } catch (JSONException e) {
                 activityText.setText("Couldn't format activity text: " + e.getMessage());
             }

             if (activityArgs.hasAudio()) {
                 // Text to speech audio associated with the activity is 16 kHz 16-bit mono PCM data
                 final int sampleRate = 16000;
                 int bufferSize = AudioTrack.getMinBufferSize(sampleRate, AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT);

                 AudioTrack track = new AudioTrack(
                         AudioManager.STREAM_MUSIC,
                         sampleRate,
                         AudioFormat.CHANNEL_OUT_MONO,
                         AudioFormat.ENCODING_PCM_16BIT,
                         bufferSize,
                         AudioTrack.MODE_STREAM);

                 track.play();

                 PullAudioOutputStream stream = activityArgs.getAudio();

                 // Audio is streamed as it becomes available. Play it as it arrives.
                 byte[] buffer = new byte[bufferSize];
                 long bytesRead = 0;

                 // read() returns 0 once the stream has ended, so keep reading until then
                 while ((bytesRead = stream.read(buffer)) > 0) {
                     track.write(buffer, 0, (int) bytesRead);
                 }

                 track.release();
             }
         });
     }
 }

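The onCreate method includes code that requests microphone and internet permissions.
The onBotButtonClicked method is, as noted earlier, the button click handler. A button press triggers a single interaction ("turn") with your bot.
The registerEventListeners method demonstrates the events used by the DialogServiceConnector and basic handling of incoming activities.

In the same file, replace the configuration strings to match your resources:
- Replace YourSpeechSubscriptionKey with your subscription key.
- Replace YourSpeechServiceRegion with the region associated with your subscription. Only a subset of Speech service regions is currently supported with Direct Line Speech. For more information, see the regions page.

Build and run the app

Connect your Android device to your development PC. Make sure that development mode and USB debugging are enabled on the device.
To build the application, press Ctrl+F9, or choose Build > Make Project from the menu bar.
To launch the application, press Shift+F10, or choose Run > Run 'app'.
In the deployment target window that appears, choose your Android device.

After the application and its activity have launched, click the button to begin talking to your bot. Transcribed text appears as you speak, and the latest activity received from your bot appears when it arrives. If your bot is configured to provide spoken responses, the text-to-speech audio plays automatically.

Screenshot of the Android application

Next steps

Explore Java samples on GitHub

You can view or download all Speech SDK Go samples on GitHub.

Prerequisites

Before you get started:

Create a Speech resource
Set up your development environment and create an empty project
Create a bot connected to the Direct Line Speech channel
Make sure that you have access to a microphone for audio capture
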
Set up your environment

Update the go.mod file with the latest SDK version by adding the following lines (v1.15.0 is shown here):

require (
    github.com/Microsoft/cognitive-services-speech-sdk-go v1.15.0
)

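Start with some boilerplate code

Replace the contents of your source file (for example, quickstart.go) with the following, which includes:

The "main" package definition
Imports of the necessary modules from the Speech SDK
Variables for storing the bot information, which are replaced later in this quickstart
A simple implementation that uses the microphone for audio input
Event handlers for the various events that take place during a speech interaction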

package main

import (
    "fmt"
    "time"

    "github.com/Microsoft/cognitive-services-speech-sdk-go/audio"
    "github.com/Microsoft/cognitive-services-speech-sdk-go/dialog"
    "github.com/Microsoft/cognitive-services-speech-sdk-go/speech"
)

func main() {
    subscription := "YOUR_SUBSCRIPTION_KEY"
    region := "YOUR_BOT_REGION"

    audioConfig, err := audio.NewAudioConfigFromDefaultMicrophoneInput()
    if err != nil {
        fmt.Println("Got an error: ", err)
        return
    }
    defer audioConfig.Close()
    config, err := dialog.NewBotFrameworkConfigFromSubscription(subscription, region)
    if err != nil {
        fmt.Println("Got an error: ", err)
        return
    }
    defer config.Close()
    connector, err := dialog.NewDialogServiceConnectorFromConfig(config, audioConfig)
    if err != nil {
        fmt.Println("Got an error: ", err)
        return
    }
    defer connector.Close()
    activityReceivedHandler := func(event dialog.ActivityReceivedEventArgs) {
        defer event.Close()
        fmt.Println("Received an activity.")
    }
    connector.ActivityReceived(activityReceivedHandler)
    recognizedHandle := func(event speech.SpeechRecognitionEventArgs) {
        defer event.Close()
        fmt.Println("Recognized ", event.Result.Text)
    }
    connector.Recognized(recognizedHandle)
    recognizingHandler := func(event speech.SpeechRecognitionEventArgs) {
        defer event.Close()
        fmt.Println("Recognizing ", event.Result.Text)
    }
    connector.Recognizing(recognizingHandler)
    connector.ListenOnceAsync()
    <-time.After(10 * time.Second)
}

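Replace the YOUR_SUBSCRIPTION_KEY and YOUR_BOT_REGION values with the actual values from your Speech resource.

Go to the Azure portal and open your Speech resource.
Under Keys and Endpoint on the left, there are two available subscription keys.
- Use either one as the replacement for YOUR_SUBSCRIPTION_KEY.
Under Overview on the left, note the region and map it to its region identifier.
- Use the region identifier as the replacement for YOUR_BOT_REGION (for example, "westus" for West US).

Note

See the list of regions that support voice assistants, and make sure that your resource is deployed in one of those regions.

Note

For more information about configuring your bot, see the Bot Framework documentation for the Direct Line Speech channel.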

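Explanation of the code

The Speech subscription key and region are required to create the bot configuration object (BotFrameworkConfig). That configuration object is needed to instantiate the DialogServiceConnector object.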
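
The listing above hardcodes the key and region as string literals. One alternative is to read them from the environment before creating the configuration object. The following is a minimal sketch of that idea, not part of the SDK: the variable names SPEECH_KEY and SPEECH_REGION and the loadBotSettings helper are assumptions made for this example.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// botSettings holds the two values the quickstart passes to
// dialog.NewBotFrameworkConfigFromSubscription.
type botSettings struct {
	Subscription string
	Region       string
}

// loadBotSettings reads the key and region through lookup, which has the
// same shape as os.Getenv. SPEECH_KEY and SPEECH_REGION are hypothetical
// names chosen for this sketch; the SDK does not read them itself.
func loadBotSettings(lookup func(string) string) (botSettings, error) {
	key := strings.TrimSpace(lookup("SPEECH_KEY"))
	region := strings.TrimSpace(lookup("SPEECH_REGION"))

	// Reject missing values and the placeholders from the quickstart.
	if key == "" || key == "YOUR_SUBSCRIPTION_KEY" {
		return botSettings{}, errors.New("SPEECH_KEY is not set to a real subscription key")
	}
	if region == "" || region == "YOUR_BOT_REGION" {
		return botSettings{}, errors.New("SPEECH_REGION is not set to a real region")
	}
	// Region identifiers are written without spaces, such as "westus".
	if strings.ContainsAny(region, " \t") {
		return botSettings{}, fmt.Errorf("SPEECH_REGION %q looks like a display name, not a region identifier", region)
	}
	return botSettings{Subscription: key, Region: strings.ToLower(region)}, nil
}

func main() {
	settings, err := loadBotSettings(os.Getenv)
	if err != nil {
		fmt.Println("Got an error: ", err)
		return
	}
	// settings.Subscription and settings.Region would replace the
	// hardcoded subscription and region variables in the quickstart.
	fmt.Println("Using region:", settings.Region)
}
```

Passing the lookup function as a parameter keeps the helper easy to exercise without touching the real environment.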

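The connector exposes several ways to interact with your bot. In this example, ListenOnceAsync listens for a single utterance. The registered handlers print the intermediate and final recognition results, and a notice for each activity received from the bot, to the console. The program then waits 10 seconds before exiting, which gives the turn time to complete.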

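Build and run

You're now set up to build your project and test your custom voice assistant using the Speech service.

Build your project (for example, "go build").
Run the module and speak a phrase or sentence into your device's microphone. Your speech is sent to the Direct Line Speech channel and transcribed to text, which appears as output.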
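
The activity handler in the listing above only prints "Received an activity." To see what the bot actually sent, you could re-indent the activity JSON before printing it. The following sketch uses only the standard library. The formatActivity helper and the sample payload are hypothetical, and it assumes the event exposes the activity as a JSON string, which you should confirm in the SDK reference.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// formatActivity re-indents a compact activity JSON string with two
// spaces per level. It returns an error if raw is not valid JSON.
func formatActivity(raw string) (string, error) {
	var out bytes.Buffer
	if err := json.Indent(&out, []byte(raw), "", "  "); err != nil {
		return "", fmt.Errorf("couldn't format activity text: %w", err)
	}
	return out.String(), nil
}

func main() {
	// Hypothetical payload for illustration; a real activity from your
	// bot carries more fields.
	raw := `{"type":"message","text":"Hello"}`

	formatted, err := formatActivity(raw)
	if err != nil {
		fmt.Println("Got an error: ", err)
		return
	}
	fmt.Println(formatted)
}
```

Inside the quickstart's activityReceivedHandler, the same helper could be called with the activity string from the event in place of the fixed message.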

Note

The Speech SDK recognizes en-us by default. For information on choosing the source language, see How to recognize speech.

Next steps

Explore Go samples on GitHub

Additional language and platform support

If you've clicked this tab, you probably didn't see a quickstart in your favorite programming language. Don't worry, we have additional quickstart materials and code samples available on GitHub. Use the table to find the right sample for your programming language and platform/OS combination.

Language	Code samples
C#	.NET Framework, .NET Core, UWP, Unity, Xamarin
C++	Windows, Linux, macOS
Java	Android, JRE
JavaScript	Browser, Node.js
Objective-C	iOS, macOS
Python	Windows, Linux, macOS
Swift	iOS, macOS
