Quickstart: Recognize speech in Java on Android by using the Speech SDK

In this article, you learn how to develop a Java application for Android using the Cognitive Services Speech SDK to transcribe speech to text. The application is based on the Speech SDK Maven package, version 1.6.0, and Android Studio 3.3. The Speech SDK currently supports Android devices with 32/64-bit ARM and Intel x86/x64-compatible processors.

Note

For the Speech Devices SDK and the Roobo device, see Speech Devices SDK.

Prerequisites

You need a Speech Services subscription key to complete this Quickstart. You can get one for free. See Try the Speech Services for free for details.

Create and configure a project

  1. Launch Android Studio, and choose Start a new Android Studio project in the Welcome window.

    Screenshot of Android Studio Welcome window

  2. When the Choose your project wizard appears, select Phone and Tablet, and select Empty Activity in the activity selection box. Select Next.

    Screenshot of Choose your project wizard

  3. In the Configure your project screen, enter Quickstart as Name, com.microsoft.cognitiveservices.speech.samples.quickstart as Package name (it must match the package declaration in the sample code), and choose a project directory. For Minimum API level, pick API 23: Android 6.0 (Marshmallow), leave all other checkboxes unchecked, and select Finish.

    Screenshot of Configure your project wizard

Android Studio takes a moment to prepare your new Android project. Next, configure the project to know about the Speech SDK and to use Java 8.

Important

By downloading any of the Speech SDK for Azure Cognitive Services components on this page, you acknowledge its license. See the Microsoft Software License Terms for the Speech SDK.

The current version of the Cognitive Services Speech SDK is 1.6.0.

The Speech SDK for Android is packaged as an AAR (Android Library), which includes the necessary libraries and required Android permissions. It is hosted in a Maven repository at https://csspeechstorage.blob.core.windows.net/maven/.
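Because the AAR already declares the Android permissions the SDK needs, you don't have to add them to your manifest yourself. For reference, the merged manifest ends up containing declarations along these lines (a sketch, matching the microphone and internet permissions the sample code requests at run time):

```xml
<!-- Sketch: permission entries contributed to the merged AndroidManifest.xml -->
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.INTERNET" />
```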

Set up your project to use the Speech SDK. Open the Project Structure window by choosing File > Project Structure from the Android Studio menu bar. In the Project Structure window, make the following changes:

  1. In the list on the left side of the window, select Project. Edit the Default Library Repository settings by appending a comma and our Maven repository URL enclosed in single quotes: 'https://csspeechstorage.blob.core.windows.net/maven/'

    Screenshot of Project Structure window

  2. In the same screen, on the left side, select app. Then select the Dependencies tab at the top of the window. Select the green plus sign (+), and choose Library dependency from the drop-down menu.

    Screenshot of Project Structure window

  3. In the window that comes up, enter the name and version of our Speech SDK for Android, com.microsoft.cognitiveservices.speech:client-sdk:1.6.0. Then select OK. The Speech SDK should be added to the list of dependencies now, as shown below:

    Screenshot of Project Structure window

  4. Select the Properties tab. For both Source Compatibility and Target Compatibility, select 1.8.

  5. Select OK to close the Project Structure window and apply your changes to the project.
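If you prefer to edit the Gradle files directly, the Project Structure changes above correspond to entries like the following in the module-level build.gradle file (a sketch of what the dialog writes for you, using the repository URL and SDK version from this quickstart):

```groovy
// Module-level build.gradle (app) — sketch of the settings the
// Project Structure dialog applies.
repositories {
    maven { url 'https://csspeechstorage.blob.core.windows.net/maven/' }
}

dependencies {
    implementation 'com.microsoft.cognitiveservices.speech:client-sdk:1.6.0'
}

android {
    compileOptions {
        sourceCompatibility JavaVersion.VERSION_1_8
        targetCompatibility JavaVersion.VERSION_1_8
    }
}
```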

Create user interface

We will create a basic user interface for the application. Edit the layout for your main activity, activity_main.xml. Initially, the layout includes a title bar with your application's name, and a TextView containing the text "Hello World!".

  • Click the TextView element. Change its ID attribute in the upper-right corner to hello.

  • From the Palette in the upper left of the activity_main.xml window, drag a button into the empty space above the text.

  • In the button's attributes on the right, in the value for the onClick attribute, enter onSpeechButtonClicked. We'll write a method with this name to handle the button event. Change its ID attribute in the upper-right corner to button.

  • Use the magic wand icon at the top of the designer to infer layout constraints.

    Screenshot of magic wand icon

The text and graphical representation of your UI should now look like this:

<?xml version="1.0" encoding="utf-8"?>
<android.support.constraint.ConstraintLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    tools:context=".MainActivity">

    <TextView
        android:id="@+id/hello"
        android:layout_width="366dp"
        android:layout_height="295dp"
        android:text="Hello World!"
        app:layout_constraintBottom_toBottomOf="parent"
        app:layout_constraintLeft_toLeftOf="parent"
        app:layout_constraintRight_toRightOf="parent"
        app:layout_constraintTop_toTopOf="parent"
        app:layout_constraintVertical_bias="0.925" />

    <Button
        android:id="@+id/button"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_marginStart="16dp"
        android:onClick="onSpeechButtonClicked"
        android:text="Button"
        app:layout_constraintBottom_toTopOf="@+id/hello"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toTopOf="parent"
        app:layout_constraintVertical_bias="0.072" />

</android.support.constraint.ConstraintLayout>

Add sample code

  1. Open the source file MainActivity.java. Replace all the code in this file with the following.

    package com.microsoft.cognitiveservices.speech.samples.quickstart;
    
    import android.support.v4.app.ActivityCompat;
    import android.support.v7.app.AppCompatActivity;
    import android.os.Bundle;
    import android.util.Log;
    import android.view.View;
    import android.widget.TextView;
    
    import com.microsoft.cognitiveservices.speech.ResultReason;
    import com.microsoft.cognitiveservices.speech.SpeechConfig;
    import com.microsoft.cognitiveservices.speech.SpeechRecognitionResult;
    import com.microsoft.cognitiveservices.speech.SpeechRecognizer;
    
    import java.util.concurrent.Future;
    
    import static android.Manifest.permission.*;
    
    public class MainActivity extends AppCompatActivity {
    
        // Replace below with your own subscription key
        private static String speechSubscriptionKey = "YourSubscriptionKey";
        // Replace below with your own service region (e.g., "westus").
        private static String serviceRegion = "YourServiceRegion";
    
        @Override
        protected void onCreate(Bundle savedInstanceState) {
            super.onCreate(savedInstanceState);
            setContentView(R.layout.activity_main);
    
            // Note: we need to request the permissions
            int requestCode = 5; // unique code for the permission request
            ActivityCompat.requestPermissions(MainActivity.this, new String[]{RECORD_AUDIO, INTERNET}, requestCode);
        }
    
        public void onSpeechButtonClicked(View v) {
            TextView txt = (TextView) this.findViewById(R.id.hello); // 'hello' is the ID of your text view
    
            try {
                SpeechConfig config = SpeechConfig.fromSubscription(speechSubscriptionKey, serviceRegion);
                assert(config != null);
    
                SpeechRecognizer reco = new SpeechRecognizer(config);
                assert(reco != null);
    
                Future<SpeechRecognitionResult> task = reco.recognizeOnceAsync();
                assert(task != null);
    
                // Note: this will block the UI thread, so eventually, you want to
                //        register for the event (see full samples)
                SpeechRecognitionResult result = task.get();
                assert(result != null);
    
                if (result.getReason() == ResultReason.RecognizedSpeech) {
                    txt.setText(result.toString());
                }
                else {
                    txt.setText("Error recognizing. Did you update the subscription info?" + System.lineSeparator() + result.toString());
                }
    
                reco.close();
            } catch (Exception ex) {
                Log.e("SpeechSDKDemo", "unexpected " + ex.getMessage());
                assert(false);
            }
        }
    }
    
    • The onCreate method includes code that requests microphone and internet permissions. Requesting these runtime permissions is only required once; it should be done early, during application initialization.

    • The method onSpeechButtonClicked is, as noted earlier, the button click handler. A button press triggers speech to text transcription.

  2. In the same file, replace the string YourSubscriptionKey with your subscription key.

  3. Also replace the string YourServiceRegion with the region associated with your subscription (for example, westus for the free trial subscription).
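As the comment in the sample notes, task.get() blocks the UI thread; in a real app you would wait for the recognition result on a worker thread and post the text back to the UI. The pattern can be sketched in plain Java as follows. The class RecognizeOffUiThread and the method fakeRecognizeOnceAsync are hypothetical stand-ins for the activity and for reco.recognizeOnceAsync(), so the sketch runs without the SDK; only the threading pattern carries over.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class RecognizeOffUiThread {
    // Stand-in for reco.recognizeOnceAsync(): any Future whose get() blocks.
    static Future<String> fakeRecognizeOnceAsync(ExecutorService pool) {
        return pool.submit(() -> {
            Thread.sleep(100);           // simulated recognition latency
            return "hello world";
        });
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<String> task = fakeRecognizeOnceAsync(pool);

        // Wait for the result on a worker thread instead of the UI thread.
        // In the activity, replace println with runOnUiThread(() -> txt.setText(text)).
        CompletableFuture
                .supplyAsync(() -> {
                    try {
                        return task.get();   // blocks this worker, not the UI thread
                    } catch (Exception ex) {
                        return "Error recognizing: " + ex.getMessage();
                    }
                })
                .thenAccept(System.out::println)
                .join();                     // demo only; an app would not block here

        pool.shutdown();
    }
}
```

The full samples referenced above go further and register for the recognizer's events instead of polling a Future; this sketch only shows how to keep the blocking get() off the UI thread.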

Build and run the app

  1. Connect your Android device to your development PC. Make sure you have enabled development mode and USB debugging on the device.

  2. To build the application, press Ctrl+F9, or choose Build > Make Project from the menu bar.

  3. To launch the application, press Shift+F10, or choose Run > Run 'app'.

  4. In the deployment target window that appears, choose your Android device.

    Screenshot of Select Deployment Target window

Press the button in the application to begin a speech recognition session. The next 15 seconds of English speech will be sent to the Speech Services and transcribed. The result appears in the Android application, and in the logcat window in Android Studio.

Screenshot of the Android application
