Quickstart: Translate speech with the Speech SDK for Java

Quickstarts are also available for speech-to-text and voice-first virtual assistants.

In this quickstart, you'll create a simple Java application that captures user speech from your computer's microphone, translates the speech, and prints the translated text to the command line in real time. This application is designed to run on 64-bit Windows, 64-bit Linux (Ubuntu 16.04, Ubuntu 18.04, or Debian 9), or macOS 10.13 or later. It is built with the Speech SDK Maven package and the Eclipse Java IDE.
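If you want to confirm the platform before setting anything up, the JVM exposes the operating system name and its own architecture as system properties. The following is a rough sketch only: the PlatformCheck class is hypothetical, the check is a heuristic that looks at the OS family and assumes "64-bit" means an x64 JVM, and it does not verify the Linux distribution or the macOS version.

```java
import java.util.Locale;

// Hypothetical helper, not part of the Speech SDK.
final class PlatformCheck {

    // Heuristic: accepts the OS families named in this quickstart when the JVM
    // reports an x64 architecture ("amd64" or "x86_64"). Treating "64-bit" as
    // x64 is an assumption made for this sketch.
    static boolean looksSupported(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);
        boolean knownOs = os.contains("windows") || os.contains("linux") || os.contains("mac");
        boolean x64 = arch.equals("amd64") || arch.equals("x86_64");
        return knownOs && x64;
    }

    public static void main(String[] args) {
        // os.arch describes the JVM itself, which is what determines which
        // native libraries it can load.
        String osName = System.getProperty("os.name");
        String osArch = System.getProperty("os.arch");
        System.out.println(osName + " / " + osArch + " -> looks supported: "
                + looksSupported(osName, osArch));
    }
}
```

A result of false does not prove the SDK will fail, and true does not guarantee it will work; treat the output as a hint to re-read the platform list above.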

For a complete list of languages available for speech translation, see language support.

Prerequisites

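This quickstart requires:

  • Operating system: 64-bit Windows, 64-bit Linux (Ubuntu 16.04, Ubuntu 18.04, or Debian 9), or macOS 10.13 or later

  • The Eclipse Java IDE

  • Java 8 (the project uses the JavaSE-1.8 execution environment)

  • A subscription key and service region for the Speech service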

If you're running Linux, make sure these dependencies are installed before starting Eclipse.

  • On Ubuntu:

    sudo apt-get update
    sudo apt-get install libssl1.0.0 libasound2
    
  • On Debian 9:

    sudo apt-get update
    sudo apt-get install libssl1.0.2 libasound2
    

Note

For the Speech Devices SDK and the Roobo device, see Speech Devices SDK.

Create and configure project

  1. Start Eclipse.

  2. In the Eclipse Launcher, in the Workspace field, enter the name of a new workspace directory. Then select Launch.

    Screenshot of Eclipse Launcher

  3. In a moment, the main window of the Eclipse IDE appears. Close the Welcome screen if one is present.

  4. From the Eclipse menu bar, create a new project by choosing File > New > Project.

  5. The New Project dialog box appears. Select Java Project, and select Next.

    Screenshot of New Project dialog box, with Java Project highlighted

  6. The New Java Project wizard starts. In the Project name field, enter quickstart, and choose JavaSE-1.8 as the execution environment. Select Finish.

    Screenshot of New Java Project wizard

  7. If the Open Associated Perspective? window appears, select Open Perspective.

  8. In the Package explorer, right-click the quickstart project. Choose Configure > Convert to Maven Project from the context menu.

    Screenshot of Package explorer

  9. The Create new POM window appears. In the Group Id field, enter com.microsoft.cognitiveservices.speech.samples, and in the Artifact Id field, enter quickstart. Then select Finish.

    Screenshot of Create new POM window

  10. Open the pom.xml file and edit it.

    • At the end of the file, before the closing tag </project>, create a repositories element with a reference to the Maven repository for the Speech SDK, as shown here:

      <repositories>
        <repository>
          <id>maven-cognitiveservices-speech</id>
          <name>Microsoft Cognitive Services Speech Maven Repository</name>
          <url>https://csspeechstorage.blob.core.windows.net/maven/</url>
        </repository>
      </repositories>
      
    • Also add a dependencies element, with the Speech SDK version 1.6.0 as a dependency:

      <dependencies>
        <dependency>
          <groupId>com.microsoft.cognitiveservices.speech</groupId>
          <artifactId>client-sdk</artifactId>
          <version>1.6.0</version>
        </dependency>
      </dependencies>
      
    • Save the changes.

Add sample code

  1. To add a new empty class to your Java project, select File > New > Class.

  2. In the New Java Class window, enter speechsdk.quickstart into the Package field, and Main into the Name field.

    Screenshot of New Java Class window

  3. Replace all code in Main.java with the following snippet:

    package speechsdk.quickstart;
    
    import java.io.IOException;
    import java.util.Map;
    import java.util.Scanner;
    import java.util.concurrent.ExecutionException;
    import com.microsoft.cognitiveservices.speech.*;
    import com.microsoft.cognitiveservices.speech.translation.*;
    
    public class Main {
    
        public static void translationWithMicrophoneAsync() throws InterruptedException, ExecutionException, IOException
        {
            // Creates an instance of a speech translation config with specified
            // subscription key and service region. Replace with your own subscription key
            // and service region (e.g., "westus").
            SpeechTranslationConfig config = SpeechTranslationConfig.fromSubscription("YourSubscriptionKey", "YourServiceRegion");
    
            // Sets source and target language(s).
            String fromLanguage = "en-US";
            config.setSpeechRecognitionLanguage(fromLanguage);
            config.addTargetLanguage("de");
    
            // Sets voice name of synthesis output.
            String germanVoice = "Microsoft Server Speech Text to Speech Voice (de-DE, Hedda)";
            config.setVoiceName(germanVoice);
    
            // Creates a translation recognizer using microphone as audio input.
            TranslationRecognizer recognizer = new TranslationRecognizer(config);
            {
                // Subscribes to events.
                recognizer.recognizing.addEventListener((s, e) -> {
                    System.out.println("RECOGNIZING in '" + fromLanguage + "': Text=" + e.getResult().getText());
    
                    Map<String, String> map = e.getResult().getTranslations();
                    for(String element : map.keySet()) {
                        System.out.println("    TRANSLATING into '" + element + "': " + map.get(element));
                    }
                });
    
                recognizer.recognized.addEventListener((s, e) -> {
                    if (e.getResult().getReason() == ResultReason.TranslatedSpeech) {
                        System.out.println("RECOGNIZED in '" + fromLanguage + "': Text=" + e.getResult().getText());
    
                        Map<String, String> map = e.getResult().getTranslations();
                        for(String element : map.keySet()) {
                            System.out.println("    TRANSLATED into '" + element + "': " + map.get(element));
                        }
                    }
                    else if (e.getResult().getReason() == ResultReason.RecognizedSpeech) {
                        System.out.println("RECOGNIZED: Text=" + e.getResult().getText());
                        System.out.println("    Speech not translated.");
                    }
                    else if (e.getResult().getReason() == ResultReason.NoMatch) {
                        System.out.println("NOMATCH: Speech could not be recognized.");
                    }
                });
    
                recognizer.synthesizing.addEventListener((s, e) -> {
                    System.out.println("Synthesis result received. Size of audio data: " + e.getResult().getAudio().length);
                });
    
                recognizer.canceled.addEventListener((s, e) -> {
                    System.out.println("CANCELED: Reason=" + e.getReason());
    
                    if (e.getReason() == CancellationReason.Error) {
                        System.out.println("CANCELED: ErrorCode=" + e.getErrorCode());
                        System.out.println("CANCELED: ErrorDetails=" + e.getErrorDetails());
                        System.out.println("CANCELED: Did you update the subscription info?");
                    }
                });
    
                recognizer.sessionStarted.addEventListener((s, e) -> {
                    System.out.println("\nSession started event.");
                });
    
                recognizer.sessionStopped.addEventListener((s, e) -> {
                    System.out.println("\nSession stopped event.");
                });
    
                // Starts continuous recognition. Call stopContinuousRecognitionAsync() to stop recognition.
                System.out.println("Say something...");
                recognizer.startContinuousRecognitionAsync().get();
    
                System.out.println("Press Enter to stop");
                new Scanner(System.in).nextLine();
    
                recognizer.stopContinuousRecognitionAsync().get();
            }
        }
    
        public static void main(String[] args) {
            try {
                translationWithMicrophoneAsync();
            } catch (Exception ex) {
                System.out.println("Unexpected exception: " + ex.getMessage());
                assert(false);
                System.exit(1);
            }
        }
    }
    
  4. Replace the string YourSubscriptionKey with your subscription key.

  5. Replace the string YourServiceRegion with the region associated with your subscription (for example, westus for the free trial subscription).

  6. Save changes to the project.
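Hard-coding the key in Main.java is fine for a quickstart, but it is easy to commit by accident. One alternative is to read both values from environment variables. The sketch below shows the idea using only the standard library; the SpeechSettings class and the variable names SPEECH_KEY and SPEECH_REGION are hypothetical choices for illustration and are not defined by the Speech SDK.

```java
import java.util.Map;

// Hypothetical helper, not part of the Speech SDK: looks up the subscription
// key and service region in a map of environment variables.
final class SpeechSettings {
    final String key;
    final String region;

    private SpeechSettings(String key, String region) {
        this.key = key;
        this.region = region;
    }

    // SPEECH_KEY and SPEECH_REGION are assumed names; use whatever fits your setup.
    static SpeechSettings fromEnvironment(Map<String, String> env) {
        return new SpeechSettings(require(env, "SPEECH_KEY"), require(env, "SPEECH_REGION"));
    }

    // Returns the trimmed value, or fails with a message naming the missing variable.
    private static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Environment variable " + name + " is not set");
        }
        return value.trim();
    }

    public static void main(String[] args) {
        try {
            SpeechSettings settings = fromEnvironment(System.getenv());
            // Print only the region; the key should stay out of logs.
            System.out.println("Using region: " + settings.region);
        } catch (IllegalStateException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

With something like this in place, settings.key and settings.region could be passed to SpeechTranslationConfig.fromSubscription instead of the literal strings.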

Build and run the app

Press F11, or select Run > Debug.

The speech input from your microphone will be translated into German and logged in the console window. Press Enter to stop capturing speech.
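The event handlers receive the translated text as a map from target language code to translation, so calling config.addTargetLanguage more than once simply adds entries to that map. The sketch below isolates that iteration with no SDK dependency; the TranslationPrinter class and the sample strings are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the Speech SDK: formats one result in the
// same layout the quickstart's "recognized" handler prints.
final class TranslationPrinter {

    static List<String> format(String fromLanguage, String text, Map<String, String> translations) {
        List<String> lines = new ArrayList<>();
        lines.add("RECOGNIZED in '" + fromLanguage + "': Text=" + text);
        // entrySet() yields key and value together, avoiding a second lookup per language.
        for (Map.Entry<String, String> entry : translations.entrySet()) {
            lines.add("    TRANSLATED into '" + entry.getKey() + "': " + entry.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Illustrative data only; in the quickstart the map comes from
        // e.getResult().getTranslations().
        Map<String, String> translations = new LinkedHashMap<>();
        translations.put("de", "Guten Morgen");
        translations.put("fr", "Bonjour");
        for (String line : format("en-US", "Good morning", translations)) {
            System.out.println(line);
        }
    }
}
```

Returning the lines rather than printing them directly keeps the formatting easy to check without a microphone or a subscription.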

Screenshot of console output after successful recognition

Next steps

Additional samples, such as how to read speech from an audio file and output translated text as synthesized speech, are available on GitHub.

See also