Quickstart: Create a custom voice assistant

Article
02/24/2024

In this quickstart, you'll use the Speech SDK to create a custom voice assistant application that connects to a bot you have already created and configured. If you need to create a bot, see the related tutorial for a more comprehensive guide.

After you satisfy a few prerequisites, connecting your custom voice assistant takes only a few steps:

Create a BotFrameworkConfig object from your subscription key and region.
Create a DialogServiceConnector object using the BotFrameworkConfig object above.
Using the DialogServiceConnector object, start the listening process for a single utterance.
Inspect the ActivityReceivedEventArgs returned.

Note

The Speech SDK for C++, JavaScript, Objective-C, Python, and Swift supports custom voice assistants, but we haven't yet included a guide here.

You can view or download all Speech SDK C# Samples on GitHub.

Prerequisites

Before you get started, make sure to:

Create a Speech resource
Set up your development environment and create an empty project
Create a bot connected to the Direct Line Speech channel
Make sure that you have access to a microphone for audio capture

Note

See the list of supported regions for voice assistants and make sure that your resources are deployed in one of those regions.

Open your project in Visual Studio

The first step is to make sure that you have your project open in Visual Studio.

Start with some boilerplate code

Let's add some code that works as a skeleton for our project.

In Solution Explorer, open MainPage.xaml.

In the designer's XAML view, replace the entire contents with the following snippet, which defines a rudimentary user interface:

<Page
    x:Class="helloworld.MainPage"
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns:local="using:helloworld"
    xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    mc:Ignorable="d"
    Background="{ThemeResource ApplicationPageBackgroundThemeBrush}">

    <Grid>
        <StackPanel Orientation="Vertical" HorizontalAlignment="Center"  
                    Margin="20,50,0,0" VerticalAlignment="Center" Width="800">
            <Button x:Name="EnableMicrophoneButton" Content="Enable Microphone"  
                    Margin="0,0,10,0" Click="EnableMicrophone_ButtonClicked" 
                    Height="35"/>
            <Button x:Name="ListenButton" Content="Talk to your bot" 
                    Margin="0,10,10,0" Click="ListenButton_ButtonClicked" 
                    Height="35"/>
            <StackPanel x:Name="StatusPanel" Orientation="Vertical" 
                        RelativePanel.AlignBottomWithPanel="True" 
                        RelativePanel.AlignRightWithPanel="True" 
                        RelativePanel.AlignLeftWithPanel="True">
                <TextBlock x:Name="StatusLabel" Margin="0,10,10,0" 
                           TextWrapping="Wrap" Text="Status:" FontSize="20"/>
                <Border x:Name="StatusBorder" Margin="0,0,0,0">
                    <ScrollViewer VerticalScrollMode="Auto"  
                                  VerticalScrollBarVisibility="Auto" MaxHeight="200">
                        <!-- Use LiveSetting to enable screen readers to announce 
                             the status update. -->
                        <TextBlock 
                            x:Name="StatusBlock" FontWeight="Bold" 
                            AutomationProperties.LiveSetting="Assertive"
                            MaxWidth="{Binding ElementName=Splitter, Path=ActualWidth}" 
                            Margin="10,10,10,20" TextWrapping="Wrap"  />
                    </ScrollViewer>
                </Border>
            </StackPanel>
        </StackPanel>
        <MediaElement x:Name="mediaElement"/>
    </Grid>
</Page>

The Design view is updated to show the application's user interface.

In Solution Explorer, open the code-behind source file MainPage.xaml.cs. (It's grouped under MainPage.xaml.) Replace the contents of this file with the code below, which includes:

using statements for the Speech and Speech.Dialog namespaces
A simple implementation to ensure microphone access, wired to a button handler
Basic UI helpers to present messages and errors in the application
A landing point for the initialization code path that will be populated later
A helper to play back text to speech (without streaming support)
An empty button handler to start listening that will be populated later

using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using Microsoft.CognitiveServices.Speech.Dialog;
using System;
using System.Diagnostics;
using System.IO;
using System.Text;
using Windows.Foundation;
using Windows.Storage.Streams;
using Windows.UI.Xaml;
using Windows.UI.Xaml.Controls;
using Windows.UI.Xaml.Media;

namespace helloworld
{
    public sealed partial class MainPage : Page
    {
        private DialogServiceConnector connector;

        private enum NotifyType
        {
            StatusMessage,
            ErrorMessage
        };

        public MainPage()
        {
            this.InitializeComponent();
        }

        private async void EnableMicrophone_ButtonClicked(
            object sender, RoutedEventArgs e)
        {
            bool isMicAvailable = true;
            try
            {
                var mediaCapture = new Windows.Media.Capture.MediaCapture();
                var settings = 
                    new Windows.Media.Capture.MediaCaptureInitializationSettings();
                settings.StreamingCaptureMode = 
                    Windows.Media.Capture.StreamingCaptureMode.Audio;
                await mediaCapture.InitializeAsync(settings);
            }
            catch (Exception)
            {
                isMicAvailable = false;
            }
            if (!isMicAvailable)
            {
                await Windows.System.Launcher.LaunchUriAsync(
                    new Uri("ms-settings:privacy-microphone"));
            }
            else
            {
                NotifyUser("Microphone was enabled", NotifyType.StatusMessage);
            }
        }

        private void NotifyUser(
            string strMessage, NotifyType type = NotifyType.StatusMessage)
        {
            // If called from the UI thread, then update immediately.
            // Otherwise, schedule a task on the UI thread to perform the update.
            if (Dispatcher.HasThreadAccess)
            {
                UpdateStatus(strMessage, type);
            }
            else
            {
                var task = Dispatcher.RunAsync(
                    Windows.UI.Core.CoreDispatcherPriority.Normal, 
                    () => UpdateStatus(strMessage, type));
            }
        }

        private void UpdateStatus(string strMessage, NotifyType type)
        {
            switch (type)
            {
                case NotifyType.StatusMessage:
                    StatusBorder.Background = new SolidColorBrush(
                        Windows.UI.Colors.Green);
                    break;
                case NotifyType.ErrorMessage:
                    StatusBorder.Background = new SolidColorBrush(
                        Windows.UI.Colors.Red);
                    break;
            }
            StatusBlock.Text += string.IsNullOrEmpty(StatusBlock.Text) 
                ? strMessage : "\n" + strMessage;

            if (!string.IsNullOrEmpty(StatusBlock.Text))
            {
                StatusBorder.Visibility = Visibility.Visible;
                StatusPanel.Visibility = Visibility.Visible;
            }
            else
            {
                StatusBorder.Visibility = Visibility.Collapsed;
                StatusPanel.Visibility = Visibility.Collapsed;
            }
            // Raise an event if necessary to enable a screen reader 
            // to announce the status update.
            var peer = Windows.UI.Xaml.Automation.Peers.FrameworkElementAutomationPeer.FromElement(StatusBlock);
            if (peer != null)
            {
                peer.RaiseAutomationEvent(
                    Windows.UI.Xaml.Automation.Peers.AutomationEvents.LiveRegionChanged);
            }
        }

        // Waits for and accumulates all audio associated with a given 
        // PullAudioOutputStream and then plays it to the MediaElement. Long spoken 
        // audio will create extra latency and a streaming playback solution 
        // (that plays audio while it continues to be received) should be used -- 
        // see the samples for examples of this.
        private void SynchronouslyPlayActivityAudio(
            PullAudioOutputStream activityAudio)
        {
            var playbackStreamWithHeader = new MemoryStream();
            playbackStreamWithHeader.Write(Encoding.ASCII.GetBytes("RIFF"), 0, 4); // ChunkID
            playbackStreamWithHeader.Write(BitConverter.GetBytes(UInt32.MaxValue), 0, 4); // ChunkSize: max
            playbackStreamWithHeader.Write(Encoding.ASCII.GetBytes("WAVE"), 0, 4); // Format
            playbackStreamWithHeader.Write(Encoding.ASCII.GetBytes("fmt "), 0, 4); // Subchunk1ID
            playbackStreamWithHeader.Write(BitConverter.GetBytes(16), 0, 4); // Subchunk1Size: PCM
            playbackStreamWithHeader.Write(BitConverter.GetBytes(1), 0, 2); // AudioFormat: PCM
            playbackStreamWithHeader.Write(BitConverter.GetBytes(1), 0, 2); // NumChannels: mono
            playbackStreamWithHeader.Write(BitConverter.GetBytes(16000), 0, 4); // SampleRate: 16kHz
            playbackStreamWithHeader.Write(BitConverter.GetBytes(32000), 0, 4); // ByteRate
            playbackStreamWithHeader.Write(BitConverter.GetBytes(2), 0, 2); // BlockAlign
            playbackStreamWithHeader.Write(BitConverter.GetBytes(16), 0, 2); // BitsPerSample: 16-bit
            playbackStreamWithHeader.Write(Encoding.ASCII.GetBytes("data"), 0, 4); // Subchunk2ID
            playbackStreamWithHeader.Write(BitConverter.GetBytes(UInt32.MaxValue), 0, 4); // Subchunk2Size

            byte[] pullBuffer = new byte[2056];

            uint lastRead = 0;
            do
            {
                lastRead = activityAudio.Read(pullBuffer);
                playbackStreamWithHeader.Write(pullBuffer, 0, (int)lastRead);
            }
            while (lastRead == pullBuffer.Length);

            var task = Dispatcher.RunAsync(
                Windows.UI.Core.CoreDispatcherPriority.Normal, () =>
            {
                mediaElement.SetSource(
                    playbackStreamWithHeader.AsRandomAccessStream(), "audio/wav");
                mediaElement.Play();
            });
        }

        private void InitializeDialogServiceConnector()
        {
            // New code will go here
        }

        private async void ListenButton_ButtonClicked(
            object sender, RoutedEventArgs e)
        {
            // New code will go here
        }
    }
}
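The header that SynchronouslyPlayActivityAudio writes before the audio bytes is the standard 44-byte RIFF/WAVE header for PCM audio. As a rough illustration of that layout, here is a minimal sketch in Java. The class name WavHeader and its build method are hypothetical and not part of the Speech SDK. Unlike the C# helper, which writes the maximum 32-bit value for both size fields because the total length isn't known in advance, this sketch assumes the data size is known up front.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of the Speech SDK.
public class WavHeader {
    /** Builds a 44-byte RIFF/WAVE header for uncompressed PCM audio. */
    static byte[] build(int sampleRate, int channels, int bitsPerSample, int dataSize) {
        int blockAlign = channels * bitsPerSample / 8; // bytes per frame
        int byteRate = sampleRate * blockAlign;        // bytes per second
        // All multi-byte numeric fields in a WAV header are little-endian.
        ByteBuffer buf = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
        buf.put("RIFF".getBytes(StandardCharsets.US_ASCII)); // ChunkID
        buf.putInt(36 + dataSize);                           // ChunkSize
        buf.put("WAVE".getBytes(StandardCharsets.US_ASCII)); // Format
        buf.put("fmt ".getBytes(StandardCharsets.US_ASCII)); // Subchunk1ID
        buf.putInt(16);                                      // Subchunk1Size for PCM
        buf.putShort((short) 1);                             // AudioFormat: PCM
        buf.putShort((short) channels);                      // NumChannels
        buf.putInt(sampleRate);                              // SampleRate
        buf.putInt(byteRate);                                // ByteRate
        buf.putShort((short) blockAlign);                    // BlockAlign
        buf.putShort((short) bitsPerSample);                 // BitsPerSample
        buf.put("data".getBytes(StandardCharsets.US_ASCII)); // Subchunk2ID
        buf.putInt(dataSize);                                // Subchunk2Size
        return buf.array();
    }

    public static void main(String[] args) {
        // 16 kHz, mono, 16-bit: the same values the C# helper writes.
        byte[] header = build(16000, 1, 16, 0);
        System.out.println("Header length: " + header.length);
    }
}
```

With 16 kHz, mono, 16-bit input, the computed byte rate and block align match the 32000 and 2 that the C# helper hard-codes.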

Add the following code snippet to the method body of InitializeDialogServiceConnector. This code creates the DialogServiceConnector with your subscription information.

// Create a BotFrameworkConfig by providing a Speech service subscription key
// the botConfig.Language property is optional (default en-US)
const string speechSubscriptionKey = "YourSpeechSubscriptionKey"; // Your subscription key
const string region = "YourServiceRegion"; // Your subscription service region.

var botConfig = BotFrameworkConfig.FromSubscription(speechSubscriptionKey, region);
botConfig.Language = "en-US";
connector = new DialogServiceConnector(botConfig);

Note

See the list of supported regions for voice assistants and make sure that your resources are deployed in one of those regions.

Note

For information on configuring your bot, see the Bot Framework documentation for the Direct Line Speech channel.

Replace the strings YourSpeechSubscriptionKey and YourServiceRegion with your own values for your Speech subscription and region.

Append the following code snippet to the end of the method body of InitializeDialogServiceConnector. This code sets up handlers for the events that DialogServiceConnector uses to communicate its bot activities, speech recognition results, and other information.

// ActivityReceived is the main way your bot will communicate with the client 
// and uses bot framework activities
connector.ActivityReceived += (sender, activityReceivedEventArgs) =>
{
    NotifyUser(
        $"Activity received, hasAudio={activityReceivedEventArgs.HasAudio} activity={activityReceivedEventArgs.Activity}");

    if (activityReceivedEventArgs.HasAudio)
    {
        SynchronouslyPlayActivityAudio(activityReceivedEventArgs.Audio);
    }
};

// Canceled will be signaled when a turn is aborted or experiences an error condition
connector.Canceled += (sender, canceledEventArgs) =>
{
    NotifyUser($"Canceled, reason={canceledEventArgs.Reason}");
    if (canceledEventArgs.Reason == CancellationReason.Error)
    {
        NotifyUser(
            $"Error: code={canceledEventArgs.ErrorCode}, details={canceledEventArgs.ErrorDetails}");
    }
};

// Recognizing (not 'Recognized') will provide the intermediate recognized text 
// while an audio stream is being processed
connector.Recognizing += (sender, recognitionEventArgs) =>
{
    NotifyUser($"Recognizing! in-progress text={recognitionEventArgs.Result.Text}");
};

// Recognized (not 'Recognizing') will provide the final recognized text 
// once audio capture is completed
connector.Recognized += (sender, recognitionEventArgs) =>
{
    NotifyUser($"Final speech to text result: '{recognitionEventArgs.Result.Text}'");
};

// SessionStarted will notify when audio begins flowing to the service for a turn
connector.SessionStarted += (sender, sessionEventArgs) =>
{
    NotifyUser($"Now Listening! Session started, id={sessionEventArgs.SessionId}");
};

// SessionStopped will notify when a turn is complete and 
// it's safe to begin listening again
connector.SessionStopped += (sender, sessionEventArgs) =>
{
    NotifyUser($"Listening complete. Session ended, id={sessionEventArgs.SessionId}");
};

Add the following code snippet to the body of the ListenButton_ButtonClicked method in the MainPage class. This code sets up DialogServiceConnector to listen, since you already established the configuration and registered the event handlers.

if (connector == null)
{
    InitializeDialogServiceConnector();
    // Optional step to speed up first interaction: if not called, 
    // connection happens automatically on first use
    var connectTask = connector.ConnectAsync();
}

try
{
    // Start sending audio to your speech-enabled bot
    var listenTask = connector.ListenOnceAsync();

    // You can also send activities to your bot as JSON strings -- 
    // Microsoft.Bot.Schema can simplify this
    string speakActivity = 
        @"{""type"":""message"",""text"":""Greeting Message"", ""speak"":""Hello there!""}";
    await connector.SendActivityAsync(speakActivity);

}
catch (Exception ex)
{
    NotifyUser($"Exception: {ex.ToString()}", NotifyType.ErrorMessage);
}

Build and run your application

Now you're ready to build your application and test your custom voice assistant using the Speech service.

From the menu bar, choose Build > Build Solution to build the application. The code should now compile without errors.
Choose Debug > Start Debugging (or press F5) to start the application. The helloworld window appears.
Select Enable Microphone, and when the access permission request appears, select Yes.
Select Talk to your bot, and speak an English phrase or sentence into your device's microphone. Your speech is transmitted to the Direct Line Speech channel and transcribed to text, which appears in the window.

Next steps

Explore C# samples on GitHub

You can view or download all Speech SDK Java Samples on GitHub.

Choose your target environment

Java Runtime
Android

Prerequisites

Before you get started, make sure to:

Create a Speech resource
Set up your development environment and create an empty project
Create a bot connected to the Direct Line Speech channel
Make sure that you have access to a microphone for audio capture

Note

See the list of supported regions for voice assistants and make sure that your resources are deployed in one of those regions.

Create and configure the project

Create an Eclipse project and install the Speech SDK.

Additionally, to enable logging, update your pom.xml file to include the following dependency:

 <dependency>
     <groupId>org.slf4j</groupId>
     <artifactId>slf4j-simple</artifactId>
     <version>1.7.5</version>
 </dependency>

Add sample code

To add a new empty class to your Java project, select File > New > Class.
In the New Java Class window, enter speechsdk.quickstart in the Package field and Main in the Name field.

Open the newly created Main class and replace the contents of the Main.java file with the following starting code:

package speechsdk.quickstart;

import com.microsoft.cognitiveservices.speech.audio.AudioConfig;
import com.microsoft.cognitiveservices.speech.audio.PullAudioOutputStream;
import com.microsoft.cognitiveservices.speech.dialog.BotFrameworkConfig;
import com.microsoft.cognitiveservices.speech.dialog.DialogServiceConnector;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.SourceDataLine;
import java.io.InputStream;

public class Main {
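    // Static so that the code added to the static main method can use it.
    private static final Logger log = LoggerFactory.getLogger(Main.class);

    public static void main(String[] args) {
        // New code will go here
    }

    // Static so that the event listeners registered in main can call it.
    private static void playAudioStream(PullAudioOutputStream audio) {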
        ActivityAudioStream stream = new ActivityAudioStream(audio);
        final ActivityAudioStream.ActivityAudioFormat audioFormat = stream.getActivityAudioFormat();
        final AudioFormat format = new AudioFormat(
                AudioFormat.Encoding.PCM_SIGNED,
                audioFormat.getSamplesPerSecond(),
                audioFormat.getBitsPerSample(),
                audioFormat.getChannels(),
                audioFormat.getFrameSize(),
                audioFormat.getSamplesPerSecond(),
                false);
        try {
            int bufferSize = format.getFrameSize();
            final byte[] data = new byte[bufferSize];

            SourceDataLine.Info info = new DataLine.Info(SourceDataLine.class, format);
            SourceDataLine line = (SourceDataLine) AudioSystem.getLine(info);
            line.open(format);

            if (line != null) {
                line.start();
                int nBytesRead = 0;
                while (nBytesRead != -1) {
                    nBytesRead = stream.read(data);
                    if (nBytesRead != -1) {
                        line.write(data, 0, nBytesRead);
                    }
                }
                line.drain();
                line.stop();
                line.close();
            }
            stream.close();

        } catch (Exception e) {
            e.printStackTrace();
        }
    }

}

In the main method, first configure your DialogServiceConfig (here, a BotFrameworkConfig) and use it to create a DialogServiceConnector instance. This instance connects to the Direct Line Speech channel to interact with your bot. An AudioConfig instance is also used to specify the source for audio input. In this example, the default microphone is used with AudioConfig.fromDefaultMicrophoneInput().
- Replace the string YourSubscriptionKey with your Speech resource key, which you can get from the Azure portal.
- Replace the string YourServiceRegion with the region associated with your Speech resource.

Note

See the list of supported regions for voice assistants and make sure that your resources are deployed in one of those regions.
```
final String subscriptionKey = "YourSubscriptionKey"; // Your subscription key
final String region = "YourServiceRegion"; // Your speech subscription service region
final BotFrameworkConfig botConfig = BotFrameworkConfig.fromSubscription(subscriptionKey, region);

// Configure audio input from a microphone.
final AudioConfig audioConfig = AudioConfig.fromDefaultMicrophoneInput();

// Create a DialogServiceConnector instance.
final DialogServiceConnector connector = new DialogServiceConnector(botConfig, audioConfig);
```
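Rather than pasting the key and region into the source file, you might read them from the environment at startup. The following is a minimal sketch under that assumption. The class name SpeechSettings and the variable names SPEECH_KEY and SPEECH_REGION are hypothetical, chosen only for illustration, and nothing in the Speech SDK requires them.

```java
import java.util.Map;

// Hypothetical helper for illustration only; not part of the Speech SDK.
public class SpeechSettings {
    // Assumed environment variable names; use whatever your deployment defines.
    static final String KEY_VAR = "SPEECH_KEY";
    static final String REGION_VAR = "SPEECH_REGION";

    /** Returns the trimmed setting, or throws if it is missing or blank. */
    static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing setting: " + name);
        }
        return value.trim();
    }

    public static void main(String[] args) {
        Map<String, String> env = System.getenv();
        try {
            String subscriptionKey = require(env, KEY_VAR);
            String region = require(env, REGION_VAR);
            // These two values would then be passed to
            // BotFrameworkConfig.fromSubscription(subscriptionKey, region).
            System.out.println("Using region: " + region);
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

Failing fast on a missing value tends to give a clearer error than a canceled event from the service later on.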

DialogServiceConnector relies on several events to communicate its bot activities, speech recognition results, and other information. Add these event listeners next.

// Recognizing will provide the intermediate recognized text while an audio stream is being processed.
connector.recognizing.addEventListener((o, speechRecognitionResultEventArgs) -> {
    log.info("Recognizing speech event text: {}", speechRecognitionResultEventArgs.getResult().getText());
});

// Recognized will provide the final recognized text once audio capture is completed.
connector.recognized.addEventListener((o, speechRecognitionResultEventArgs) -> {
    log.info("Recognized speech event reason text: {}", speechRecognitionResultEventArgs.getResult().getText());
});

// SessionStarted will notify when audio begins flowing to the service for a turn.
connector.sessionStarted.addEventListener((o, sessionEventArgs) -> {
    log.info("Session Started event id: {} ", sessionEventArgs.getSessionId());
});

// SessionStopped will notify when a turn is complete and it's safe to begin listening again.
connector.sessionStopped.addEventListener((o, sessionEventArgs) -> {
    log.info("Session stopped event id: {}", sessionEventArgs.getSessionId());
});

// Canceled will be signaled when a turn is aborted or experiences an error condition.
connector.canceled.addEventListener((o, canceledEventArgs) -> {
    log.info("Canceled event details: {}", canceledEventArgs.getErrorDetails());
    connector.disconnectAsync();
});

// ActivityReceived is the main way your bot will communicate with the client and uses Bot Framework activities.
connector.activityReceived.addEventListener((o, activityEventArgs) -> {
    final String act = activityEventArgs.getActivity().serialize();
    log.info("Received activity {} audio", activityEventArgs.hasAudio() ? "with" : "without");
    if (activityEventArgs.hasAudio()) {
        playAudioStream(activityEventArgs.getAudio());
    }
});

Connect DialogServiceConnector to Direct Line Speech by invoking the connectAsync() method. To test your bot, you can invoke the listenOnceAsync method to send audio input from your microphone. You can also use the sendActivityAsync method to send a custom activity as a serialized string. These custom activities can provide additional data that your bot uses in the conversation.
```
connector.connectAsync();
// Start listening.
System.out.println("Say something ...");
connector.listenOnceAsync();

// connector.sendActivityAsync(...)
```
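The serialized string passed to sendActivityAsync is a JSON activity. As a rough sketch of how such a string could be assembled without an extra JSON library, the hypothetical helper below builds a minimal message activity with the same type, text, and speak fields that the C# sample sends. The class and method names are assumptions for illustration. In a real project, a JSON library is usually the safer choice.

```java
// Hypothetical helper for illustration only; not part of the Speech SDK.
public class ActivityJson {
    /** Escapes characters that must not appear raw inside a JSON string. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the \\uXXXX form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Builds a minimal "message" activity with display text and spoken text. */
    static String messageActivity(String text, String speak) {
        return "{\"type\":\"message\",\"text\":\"" + escape(text)
                + "\",\"speak\":\"" + escape(speak) + "\"}";
    }

    public static void main(String[] args) {
        String activity = messageActivity("Greeting Message", "Hello there!");
        System.out.println(activity);
        // With a connector in scope: connector.sendActivityAsync(activity);
    }
}
```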
Save changes to the Main file.
To support response playback, add an additional class that transforms the PullAudioOutputStream object returned from the getAudio() API into a Java InputStream for ease of handling. This ActivityAudioStream is a specialized class that handles the audio response from the Direct Line Speech channel. It provides accessors to fetch the audio format information required for handling playback. To do so, select File > New > Class.
In the New Java Class window, enter speechsdk.quickstart in the Package field and ActivityAudioStream in the Name field.

Open the newly created ActivityAudioStream class and replace its contents with the following code:

package speechsdk.quickstart;

import com.microsoft.cognitiveservices.speech.audio.PullAudioOutputStream;

import java.io.IOException;
import java.io.InputStream;

 public final class ActivityAudioStream extends InputStream {
     /**
      * The number of samples played per second (16 kHz).
      */
     public static final long SAMPLE_RATE = 16000;
     /**
      * The number of bits in each sample of a sound that has this format (16 bits).
      */
     public static final int BITS_PER_SECOND = 16;
     /**
      * The number of audio channels in this format (1 for mono).
      */
     public static final int CHANNELS = 1;
     /**
      * The number of bytes in each frame of a sound that has this format (2).
      */
     public static final int FRAME_SIZE = 2;

     /**
      * Reads up to a specified maximum number of bytes of data from the audio
      * stream, putting them into the given byte array.
      *
      * @param b   the buffer into which the data is read
      * @param off the offset, from the beginning of array <code>b</code>, at which
      *            the data will be written
      * @param len the maximum number of bytes to read
      * @return the total number of bytes read into the buffer, or -1 if there
      * is no more data because the end of the stream has been reached
      */
     @Override
     public int read(byte[] b, int off, int len) {
         byte[] tempBuffer = new byte[len];
         int n = (int) this.pullStreamImpl.read(tempBuffer);
         for (int i = 0; i < n; i++) {
             if (off + i >= b.length) {
                 throw new ArrayIndexOutOfBoundsException(b.length);
             }
             b[off + i] = tempBuffer[i];
         }
         if (n == 0) {
             return -1;
         }
         return n;
     }

     /**
      * Reads the next byte of data from the activity audio stream if available.
      *
      * @return the next byte of data, or -1 if the end of the stream is reached
      * @see #read(byte[], int, int)
      * @see #read(byte[])
      * @see #available
      * <p>
      */
     @Override
     public int read() {
         byte[] data = new byte[1];
         int temp = read(data);
         if (temp <= 0) {
             // we have a weird situation if read(byte[]) returns 0!
             return -1;
         }
         return data[0] & 0xFF;
     }

     /**
      * Reads up to a specified maximum number of bytes of data from the activity audio stream,
      * putting them into the given byte array.
      *
      * @param b the buffer into which the data is read
      * @return the total number of bytes read into the buffer, or -1 if there
      * is no more data because the end of the stream has been reached
      */
     @Override
     public int read(byte[] b) {
         int n = (int) pullStreamImpl.read(b);
         if (n == 0) {
             return -1;
         }
         return n;
     }

     /**
      * Skips over and discards a specified number of bytes from this
      * audio input stream.
      *
      * @param n the requested number of bytes to be skipped
      * @return the actual number of bytes skipped
      * @throws IOException if an input or output error occurs
      * @see #read
      * @see #available
      */
     @Override
     public long skip(long n) {
         if (n <= 0) {
             return 0;
         }
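         if (n <= Integer.MAX_VALUE) {
             byte[] tempBuffer = new byte[(int) n];
             // read() returns -1 at end of stream; skip() must not return a negative count.
             return Math.max(0, read(tempBuffer));
         }
         long count = 0;
         for (long i = n; i > 0; i -= Integer.MAX_VALUE) {
             int size = (int) Math.min(Integer.MAX_VALUE, i);
             byte[] tempBuffer = new byte[size];
             int skipped = read(tempBuffer);
             if (skipped <= 0) {
                 break; // end of stream: nothing more to skip
             }
             count += skipped;
         }
         return count;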
     }

     /**
      * Closes this audio input stream and releases any system resources associated
      * with the stream.
      */
     @Override
     public void close() {
         this.pullStreamImpl.close();
     }

     /**
      * Fetch the audio format for the ActivityAudioStream. The ActivityAudioFormat defines the sample rate, bits per sample, and the # channels.
      *
      * @return instance of the ActivityAudioFormat associated with the stream
      */
     public ActivityAudioStream.ActivityAudioFormat getActivityAudioFormat() {
         return activityAudioFormat;
     }

     /**
      * Returns the maximum number of bytes that can be read (or skipped over) from this
      * audio input stream without blocking.
      *
      * @return the number of bytes that can be read from this audio input stream without blocking.
      * As this implementation does not buffer, this will be defaulted to 0
      */
     @Override
     public int available() {
         return 0;
     }

     public ActivityAudioStream(final PullAudioOutputStream stream) {
         pullStreamImpl = stream;
         this.activityAudioFormat = new ActivityAudioStream.ActivityAudioFormat(SAMPLE_RATE, BITS_PER_SECOND, CHANNELS, FRAME_SIZE, AudioEncoding.PCM_SIGNED);
     }

     private PullAudioOutputStream pullStreamImpl;

     private ActivityAudioFormat activityAudioFormat;

     /**
      * ActivityAudioFormat is an internal format which contains metadata regarding the type of arrangement of
      * audio bits in this activity audio stream.
      */
     static class ActivityAudioFormat {

         private long samplesPerSecond;
         private int bitsPerSample;
         private int channels;
         private int frameSize;
         private AudioEncoding encoding;

         public ActivityAudioFormat(long samplesPerSecond, int bitsPerSample, int channels, int frameSize, AudioEncoding encoding) {
             this.samplesPerSecond = samplesPerSecond;
             this.bitsPerSample = bitsPerSample;
             this.channels = channels;
             this.encoding = encoding;
             this.frameSize = frameSize;
         }

         /**
          * Fetch the number of samples played per second for the associated audio stream format.
          *
          * @return the number of samples played per second
          */
         public long getSamplesPerSecond() {
             return samplesPerSecond;
         }

         /**
          * Fetch the number of bits in each sample of a sound that has this audio stream format.
          *
          * @return the number of bits per sample
          */
         public int getBitsPerSample() {
             return bitsPerSample;
         }

         /**
          * Fetch the number of audio channels used by this audio stream format.
          *
          * @return the number of channels
          */
         public int getChannels() {
             return channels;
         }

         /**
          * Fetch the default number of bytes in a frame required by this audio stream format.
          *
          * @return the number of bytes
          */
         public int getFrameSize() {
             return frameSize;
         }

         /**
          * Fetch the audio encoding type associated with this audio stream format.
          *
          * @return the encoding associated
          */
         public AudioEncoding getEncoding() {
             return encoding;
         }
     }

     /**
      * Enum defining the types of audio encoding supported by this stream.
      */
     public enum AudioEncoding {
         PCM_SIGNED("PCM_SIGNED");

         String value;

         AudioEncoding(String value) {
             this.value = value;
         }
     }
 }

Save changes to the ActivityAudioStream file.
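The format constants in ActivityAudioStream (16 kHz sample rate, 16 bits per sample, one channel, frame size 2) are related by simple arithmetic. The sketch below shows that relationship. The class name PcmFormatMath and its methods are hypothetical and exist only to show the calculation.

```java
// Hypothetical helper for illustration only; not part of the Speech SDK.
public class PcmFormatMath {
    /** Bytes per frame: one sample for every channel. */
    static int frameSize(int channels, int bitsPerSample) {
        return channels * bitsPerSample / 8;
    }

    /** Bytes of audio consumed per second of playback. */
    static long byteRate(long sampleRate, int channels, int bitsPerSample) {
        return sampleRate * frameSize(channels, bitsPerSample);
    }

    /** Playback duration in milliseconds for a given number of PCM bytes. */
    static long durationMillis(long byteCount, long sampleRate, int channels, int bitsPerSample) {
        return byteCount * 1000 / byteRate(sampleRate, channels, bitsPerSample);
    }

    public static void main(String[] args) {
        // Values taken from the ActivityAudioStream constants.
        long sampleRate = 16000;
        int bitsPerSample = 16;
        int channels = 1;
        System.out.println(frameSize(channels, bitsPerSample));                         // 2
        System.out.println(byteRate(sampleRate, channels, bitsPerSample));              // 32000
        System.out.println(durationMillis(64000, sampleRate, channels, bitsPerSample)); // 2000
    }
}
```

A frame size of 2 bytes also means that playAudioStream, which sizes its buffer from the frame size, writes one frame at a time. A larger buffer that is a multiple of the frame size would likely reduce per-call overhead.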

Build and run the app

Press F11, or select Run > Debug. The console displays the message "Say something ...". At this point, speak an English phrase or sentence that your bot can understand. Your speech is transmitted to your bot through the Direct Line Speech channel, where it's recognized and processed by your bot. The response is returned as an activity. If your bot returns speech as a response, the audio is played back by the playAudioStream method.

Screenshot of console output after successful recognition

Next steps

Explore Java samples on GitHub

Prerequisites

Before you get started, make sure to:

Create a Speech resource
Set up your development environment and create an empty project
Create a bot connected to the Direct Line Speech channel
Have access to a microphone for audio capture

Note

See the list of supported regions for voice assistants and make sure that your resources are deployed in one of those regions.

Create and configure a project

Install the Speech SDK with Android Studio.

Create user interface

In this section, we'll create a basic user interface (UI) for the application. Start by opening the main activity layout: activity_main.xml. The basic template includes a title bar with the application's name and a TextView with the message "Hello world!".

Next, replace the contents of activity_main.xml with the following code:

<?xml version="1.0" encoding="utf-8"?>
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
 xmlns:tools="http://schemas.android.com/tools"
 android:layout_width="match_parent"
 android:layout_height="match_parent"
 android:orientation="vertical"
 tools:context=".MainActivity">

 <Button
     android:id="@+id/button"
     android:layout_width="wrap_content"
     android:layout_height="wrap_content"
     android:layout_gravity="center"
     android:onClick="onBotButtonClicked"
     android:text="Talk to your bot" />

 <TextView
     android:layout_width="match_parent"
     android:layout_height="wrap_content"
     android:text="Recognition Data"
     android:textSize="18dp"
     android:textStyle="bold" />

 <TextView
     android:id="@+id/recoText"
     android:layout_width="match_parent"
     android:layout_height="wrap_content"
     android:text="  \n(Recognition goes here)\n" />

 <TextView
     android:layout_width="match_parent"
     android:layout_height="wrap_content"
     android:text="Activity Data"
     android:textSize="18dp"
     android:textStyle="bold" />

 <TextView
     android:id="@+id/activityText"
     android:layout_width="match_parent"
     android:layout_height="match_parent"
     android:scrollbars="vertical"
     android:text="  \n(Activities go here)\n" />

</LinearLayout>

This XML defines a simple UI for interacting with your bot.

The button element starts an interaction and invokes the onBotButtonClicked method when clicked.
The recoText element displays the speech to text results as you talk to your bot.
The activityText element displays the JSON payload of the latest Bot Framework activity received from your bot.

The text and graphical representation of your UI should now look like this:

Screenshot showing what the Talk to your bot UI looks like.

Add sample code

Open MainActivity.java and replace its contents with the following code:

 package samples.speech.cognitiveservices.microsoft.com;

 import android.media.AudioFormat;
 import android.media.AudioManager;
 import android.media.AudioTrack;
 import android.support.v4.app.ActivityCompat;
 import android.support.v7.app.AppCompatActivity;
 import android.os.Bundle;
 import android.text.method.ScrollingMovementMethod;
 import android.view.View;
 import android.widget.TextView;

 import com.microsoft.cognitiveservices.speech.audio.AudioConfig;
 import com.microsoft.cognitiveservices.speech.audio.PullAudioOutputStream;
 import com.microsoft.cognitiveservices.speech.dialog.BotFrameworkConfig;
 import com.microsoft.cognitiveservices.speech.dialog.DialogServiceConnector;

 import org.json.JSONException;
 import org.json.JSONObject;

 import static android.Manifest.permission.*;

 public class MainActivity extends AppCompatActivity {
     // Replace below with your own speech subscription key
     private static String speechSubscriptionKey = "YourSpeechSubscriptionKey";
     // Replace below with your own speech service region
     private static String serviceRegion = "YourSpeechServiceRegion";

     private DialogServiceConnector connector;

     @Override
     protected void onCreate(Bundle savedInstanceState) {
         super.onCreate(savedInstanceState);
         setContentView(R.layout.activity_main);

         TextView recoText = (TextView) this.findViewById(R.id.recoText);
         TextView activityText = (TextView) this.findViewById(R.id.activityText);
         recoText.setMovementMethod(new ScrollingMovementMethod());
         activityText.setMovementMethod(new ScrollingMovementMethod());

         // Note: we need to request permissions for audio input and network access
         int requestCode = 5; // unique code for the permission request
         ActivityCompat.requestPermissions(MainActivity.this, new String[]{RECORD_AUDIO, INTERNET}, requestCode);
     }

     public void onBotButtonClicked(View v) {
         // Recreate the DialogServiceConnector on each button press, ensuring that the existing one is closed
         if (connector != null) {
             connector.close();
             connector = null;
         }

         // Create the DialogServiceConnector from speech subscription information
         BotFrameworkConfig config = BotFrameworkConfig.fromSubscription(speechSubscriptionKey, serviceRegion);
         connector = new DialogServiceConnector(config, AudioConfig.fromDefaultMicrophoneInput());

         // Optional step: preemptively connect to reduce first interaction latency
         connector.connectAsync();

         // Register the DialogServiceConnector's event listeners
         registerEventListeners();

         // Begin sending audio to your bot
         connector.listenOnceAsync();
     }

     private void registerEventListeners() {
         TextView recoText = (TextView) this.findViewById(R.id.recoText); // 'recoText' is the ID of your text view
         TextView activityText = (TextView) this.findViewById(R.id.activityText); // 'activityText' is the ID of your text view

         // Recognizing will provide the intermediate recognized text while an audio stream is being processed
         connector.recognizing.addEventListener((o, recoArgs) -> {
             recoText.setText("  Recognizing: " + recoArgs.getResult().getText());
         });

         // Recognized will provide the final recognized text once audio capture is completed
         connector.recognized.addEventListener((o, recoArgs) -> {
             recoText.setText("  Recognized: " + recoArgs.getResult().getText());
         });

         // SessionStarted will notify when audio begins flowing to the service for a turn
         connector.sessionStarted.addEventListener((o, sessionArgs) -> {
             recoText.setText("Listening...");
         });

         // SessionStopped will notify when a turn is complete and it's safe to begin listening again
         connector.sessionStopped.addEventListener((o, sessionArgs) -> {
         });

         // Canceled will be signaled when a turn is aborted or experiences an error condition
         connector.canceled.addEventListener((o, canceledArgs) -> {
             recoText.setText("Canceled (" + canceledArgs.getReason().toString() + ") error details: " + canceledArgs.getErrorDetails());
             connector.disconnectAsync();
         });

         // ActivityReceived is the main way your bot will communicate with the client and uses bot framework activities.
         connector.activityReceived.addEventListener((o, activityArgs) -> {
             try {
                 // Here we use JSONObject only to "pretty print" the condensed Activity JSON
                 String rawActivity = activityArgs.getActivity().serialize();
                 String formattedActivity = new JSONObject(rawActivity).toString(2);
                 activityText.setText(formattedActivity);
             } catch (JSONException e) {
                 activityText.setText("Couldn't format activity text: " + e.getMessage());
             }

             if (activityArgs.hasAudio()) {
                 // Text to speech audio associated with the activity is 16 kHz 16-bit mono PCM data
                 final int sampleRate = 16000;
                 int bufferSize = AudioTrack.getMinBufferSize(sampleRate, AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT);

                 AudioTrack track = new AudioTrack(
                         AudioManager.STREAM_MUSIC,
                         sampleRate,
                         AudioFormat.CHANNEL_OUT_MONO,
                         AudioFormat.ENCODING_PCM_16BIT,
                         bufferSize,
                         AudioTrack.MODE_STREAM);

                 track.play();

                 PullAudioOutputStream stream = activityArgs.getAudio();

                 // Audio is streamed as it becomes available. Play it as it arrives.
                 byte[] buffer = new byte[bufferSize];
                 long bytesRead;

                 // read() returns 0 once the stream has ended, so keep going after short reads
                 while ((bytesRead = stream.read(buffer)) > 0) {
                     track.write(buffer, 0, (int) bytesRead);
                 }

                 track.release();
             }
         });
     }
 }

The onCreate method includes code that requests microphone and internet permissions.
The onBotButtonClicked method is, as noted earlier, the button click handler. A button press triggers a single interaction ("turn") with your bot.
The registerEventListeners method demonstrates the events used by the DialogServiceConnector and basic handling of incoming activities.
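
The activityReceived listener above plays bot audio by repeatedly reading from a pull stream into a fixed-size buffer. The following SDK-free sketch illustrates that chunked read pattern under one stated assumption: the read call returns the number of bytes copied and 0 once the stream has ended. PullSource, drain, and chunked are hypothetical names used only for illustration; check the PullAudioOutputStream reference for the exact contract before relying on it.

```java
import java.io.ByteArrayOutputStream;

/** Illustration only: drains a pull-style source in fixed-size chunks. */
final class PullDrainSketch {
    private PullDrainSketch() {}

    /** Hypothetical stand-in for a pull stream: returns bytes copied, 0 at end of stream. */
    interface PullSource {
        long read(byte[] buffer);
    }

    /** Copies everything from the source, tolerating short reads before the end. */
    static byte[] drain(PullSource source, int bufferSize) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        byte[] buffer = new byte[bufferSize];
        long bytesRead;
        // Stop only when the source reports end of stream, not on the first short read.
        while ((bytesRead = source.read(buffer)) > 0) {
            sink.write(buffer, 0, (int) bytesRead);
        }
        return sink.toByteArray();
    }

    /** Test source that hands out data in chunks of at most maxChunk bytes. */
    static PullSource chunked(byte[] data, int maxChunk) {
        int[] position = {0};
        return buffer -> {
            int n = Math.min(Math.min(buffer.length, maxChunk), data.length - position[0]);
            System.arraycopy(data, position[0], buffer, 0, n);
            position[0] += n;
            return (long) n;
        };
    }
}
```

Looping until the source reports 0 bytes, rather than until the first partially filled buffer, keeps the tail of the audio from being dropped if a read returns fewer bytes than the buffer can hold.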

In the same file, replace the configuration strings to match your resources:
- Replace YourSpeechSubscriptionKey with your subscription key.
- Replace YourSpeechServiceRegion with the region associated with your subscription. Only a subset of Speech service regions is currently supported with Direct Line Speech. For more information, see regions.
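
Before running the app, it can help to fail fast when the placeholder strings were not replaced. The sketch below is only an illustration: ConfigCheck and its methods are hypothetical names, not part of the sample or the Speech SDK, and the check assumes that an unconfigured value is either empty or still starts with "Your", which is how the placeholders in this sample are spelled.

```java
/** Hypothetical helper, not part of the Speech SDK or the sample. */
final class ConfigCheck {
    private ConfigCheck() {}

    /** True when the value is missing or still looks like a "Your..." placeholder. */
    static boolean isPlaceholder(String value) {
        return value == null || value.trim().isEmpty() || value.startsWith("Your");
    }

    /** Throws if either setting was left at its placeholder value. */
    static void requireConfigured(String subscriptionKey, String region) {
        if (isPlaceholder(subscriptionKey)) {
            throw new IllegalStateException("Set speechSubscriptionKey to your Speech resource key.");
        }
        if (isPlaceholder(region)) {
            throw new IllegalStateException("Set serviceRegion to your Speech resource region.");
        }
    }
}
```

In this sample, a call such as ConfigCheck.requireConfigured(speechSubscriptionKey, serviceRegion) could go at the top of onBotButtonClicked, before the connector is created.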

Build and run the app

Connect your Android device to your development PC. Make sure that you've enabled development mode and USB debugging on the device.
To build the application, press Ctrl+F9, or choose Build > Make Project from the menu bar.
To launch the application, press Shift+F10, or choose Run > Run 'app'.
In the deployment target window that appears, choose your Android device.

Once the application and its activity have launched, click the button to begin talking to your bot. Transcribed text appears as you speak, and the latest activity received from your bot appears when it arrives. If your bot is configured to provide spoken responses, the text to speech audio is played back automatically.

Screenshot of the Android application

Next steps

Explore Java samples on GitHub

You can view or download all Speech SDK Go Samples on GitHub.

Prerequisites

Before you get started:

Create a Speech resource
Set up your development environment and create an empty project
Create a bot connected to the Direct Line Speech channel
Make sure that you have access to a microphone for audio capture

Note

See the list of supported regions for voice assistants and make sure that your resources are deployed in one of those regions.

Set up your environment

Update the go.mod file with the latest SDK version by adding this line:

require (
    github.com/Microsoft/cognitive-services-speech-sdk-go v1.15.0
)

Start with some boilerplate code

Replace the contents of your source file (for example, quickstart.go) with the following, which includes:

"main" package definition
imports of the necessary modules from the Speech SDK
variables for storing the bot information that will be replaced later in this quickstart
a simple implementation using the microphone for audio input
event handlers for various events that take place during a speech interaction

package main

import (
    "fmt"
    "time"

    "github.com/Microsoft/cognitive-services-speech-sdk-go/audio"
    "github.com/Microsoft/cognitive-services-speech-sdk-go/dialog"
    "github.com/Microsoft/cognitive-services-speech-sdk-go/speech"
)

func main() {
    subscription := "YOUR_SUBSCRIPTION_KEY"
    region := "YOUR_BOT_REGION"

    audioConfig, err := audio.NewAudioConfigFromDefaultMicrophoneInput()
    if err != nil {
        fmt.Println("Got an error: ", err)
        return
    }
    defer audioConfig.Close()
    config, err := dialog.NewBotFrameworkConfigFromSubscription(subscription, region)
    if err != nil {
        fmt.Println("Got an error: ", err)
        return
    }
    defer config.Close()
    connector, err := dialog.NewDialogServiceConnectorFromConfig(config, audioConfig)
    if err != nil {
        fmt.Println("Got an error: ", err)
        return
    }
    defer connector.Close()
    activityReceivedHandler := func(event dialog.ActivityReceivedEventArgs) {
        defer event.Close()
        fmt.Println("Received an activity.")
    }
    connector.ActivityReceived(activityReceivedHandler)
    recognizedHandle := func(event speech.SpeechRecognitionEventArgs) {
        defer event.Close()
        fmt.Println("Recognized ", event.Result.Text)
    }
    connector.Recognized(recognizedHandle)
    recognizingHandler := func(event speech.SpeechRecognitionEventArgs) {
        defer event.Close()
        fmt.Println("Recognizing ", event.Result.Text)
    }
    connector.Recognizing(recognizingHandler)
    connector.ListenOnceAsync()
    <-time.After(10 * time.Second)
}

Replace the YOUR_SUBSCRIPTION_KEY and YOUR_BOT_REGION values with actual values from the Speech resource.

Navigate to the Azure portal and open your Speech resource
Under Keys and Endpoint on the left, there are two available subscription keys
- Use either one as the YOUR_SUBSCRIPTION_KEY value replacement
Under Overview on the left, note the region and map it to the region identifier
- Use the region identifier as the YOUR_BOT_REGION value replacement, for example: "westus" for West US

Note

See the list of supported regions for voice assistants and make sure that your resources are deployed in one of those regions.

Note

For information on configuring your bot, see the Bot Framework documentation for the Direct Line Speech channel.

Code explanation

The Speech subscription key and region are required to create a BotFrameworkConfig object. That configuration object, together with the microphone audio configuration, is needed to instantiate a DialogServiceConnector.

The connector exposes events for intermediate recognition results (Recognizing), final recognition results (Recognized), and activities received from the bot (ActivityReceived). In this example, ListenOnceAsync starts listening for a single utterance, and the program then waits 10 seconds so that the handlers can run before it exits. As results arrive, the handlers write them to the console.

Build and run

You're now set up to build your project and test your custom voice assistant using the Speech service.

Build your project, for example, "go build"
Run the module and speak a phrase or sentence into your device's microphone. Your speech is transmitted to the Direct Line Speech channel and transcribed to text, which appears as output.

Note

The Speech SDK defaults to recognizing en-us as the language. See How to recognize speech for information on choosing the source language.

Next steps

Explore Go samples on GitHub

Additional language and platform support

If you've clicked this tab, you probably didn't see a quickstart in your favorite programming language. Don't worry, we have additional quickstart materials and code samples available on GitHub. Use the table to find the right sample for your programming language and platform/OS combination.

Language	Code samples
C#	.NET Framework, .NET Core, UWP, Unity, Xamarin
C++	Windows, Linux, macOS
Java	Android, JRE
JavaScript	Browser, Node.js
Objective-C	iOS, macOS
Python	Windows, Linux, macOS
Swift	iOS, macOS
