Quickstart: Recognize speech in Objective-C on iOS by using the Speech SDK

Quickstarts are also available for speech synthesis.

In this article, you learn how to create an iOS app in Objective-C by using the Azure Cognitive Services Speech SDK to transcribe speech to text from a microphone or from a file with recorded audio.

Prerequisites

Before you get started, you'll need a subscription key and service region for the Speech service, a Mac with Xcode installed, and an iOS device or the iOS simulator to run the app.

Get the Speech SDK for iOS

Important

By downloading any of the Speech SDK for Azure Cognitive Services components on this page, you acknowledge its license. See the Microsoft Software License Terms for the Speech SDK.

The Cognitive Services Speech SDK for iOS is currently distributed as a Cocoa framework. It can be downloaded from this website; save the downloaded file to your home directory.

Create an Xcode project

Start Xcode, and start a new project by selecting File > New > Project. In the template selection dialog box, select the iOS Single View App template.

In the dialog boxes that follow, make the following selections.

  1. In the Project Options dialog box:

    1. Enter a name for the quickstart app, for example, helloworld.
    2. Enter an appropriate organization name and organization identifier if you already have an Apple developer account. Otherwise, for testing purposes, use a name like testorg. To sign the app, you need a proper provisioning profile. For more information, see the Apple developer site.
    3. Make sure Objective-C is selected as the language for the project.
    4. Clear the check boxes for tests and Core Data.

    Project settings

  2. Select a project directory:

    1. Choose your home directory to put the project in. This step creates a helloworld directory in your home directory that contains all the files for the Xcode project.

    2. Disable the creation of a Git repo for this example project.

    3. Adjust the paths to the SDK on the project settings screen.

      1. On the General tab, under the Embedded Binaries header, add the SDK library as a framework by selecting Add embedded binaries > Add other. Go to your home directory and select the file MicrosoftCognitiveServicesSpeech.framework. This action also adds the SDK library under the Linked Frameworks and Libraries header automatically.

      Added framework

      2. Go to the Build Settings tab, and select the All setting.
      3. Add the directory $(SRCROOT)/.. to Framework Search Paths under the Search Paths heading.

      Framework Search Paths setting

Set up the UI

The example app has a simple UI: two buttons to start speech recognition, either from a file or from microphone input, and a text label to display the result. The UI is set up in the Main.storyboard part of the project. Open the XML view of the storyboard by right-clicking the Main.storyboard entry in the project tree and selecting Open As > Source Code.

Replace the autogenerated XML with this code:

<?xml version="1.0" encoding="UTF-8"?>
<document type="com.apple.InterfaceBuilder3.CocoaTouch.Storyboard.XIB" version="3.0" toolsVersion="14113" targetRuntime="iOS.CocoaTouch" propertyAccessControl="none" useAutolayout="YES" useTraitCollections="YES" useSafeAreas="YES" colorMatched="YES" initialViewController="BYZ-38-t0r">
    <device id="retina4_7" orientation="portrait">
        <adaptation id="fullscreen"/>
    </device>
    <dependencies>
        <deployment identifier="iOS"/>
        <plugIn identifier="com.apple.InterfaceBuilder.IBCocoaTouchPlugin" version="14088"/>
        <capability name="Safe area layout guides" minToolsVersion="9.0"/>
        <capability name="documents saved in the Xcode 8 format" minToolsVersion="8.0"/>
    </dependencies>
    <scenes>
        <!--View Controller-->
        <scene sceneID="tne-QT-ifu">
            <objects>
                <viewController id="BYZ-38-t0r" customClass="ViewController" sceneMemberID="viewController">
                    <view key="view" contentMode="scaleToFill" id="8bC-Xf-vdC">
                        <rect key="frame" x="0.0" y="0.0" width="375" height="667"/>
                        <autoresizingMask key="autoresizingMask" widthSizable="YES" heightSizable="YES"/>
                        <subviews>
                            <button opaque="NO" contentMode="scaleToFill" fixedFrame="YES" contentHorizontalAlignment="center" contentVerticalAlignment="center" buttonType="roundedRect" lineBreakMode="middleTruncation" translatesAutoresizingMaskIntoConstraints="NO" id="qFP-u7-47Q">
                                <rect key="frame" x="84" y="247" width="207" height="82"/>
                                <autoresizingMask key="autoresizingMask" flexibleMaxX="YES" flexibleMaxY="YES"/>
                                <accessibility key="accessibilityConfiguration" hint="Start speech recognition from file" identifier="recognize_file_button">
                                    <accessibilityTraits key="traits" button="YES" staticText="YES"/>
                                    <bool key="isElement" value="YES"/>
                                </accessibility>
                                <fontDescription key="fontDescription" type="system" pointSize="30"/>
                                <state key="normal" title="Recognize (File)"/>
                                <connections>
                                    <action selector="recognizeFromFileButtonTapped:" destination="BYZ-38-t0r" eventType="touchUpInside" id="Vfr-ah-nbC"/>
                                </connections>
                            </button>
                            <label opaque="NO" userInteractionEnabled="NO" contentMode="center" horizontalHuggingPriority="251" verticalHuggingPriority="251" fixedFrame="YES" text="Recognition result" textAlignment="center" lineBreakMode="tailTruncation" numberOfLines="5" baselineAdjustment="alignBaselines" adjustsFontSizeToFit="NO" translatesAutoresizingMaskIntoConstraints="NO" id="tq3-GD-ljB">
                                <rect key="frame" x="20" y="408" width="335" height="148"/>
                                <autoresizingMask key="autoresizingMask" flexibleMaxX="YES" flexibleMaxY="YES"/>
                                <accessibility key="accessibilityConfiguration" hint="The result of speech recognition" identifier="result_label">
                                    <accessibilityTraits key="traits" notEnabled="YES"/>
                                    <bool key="isElement" value="NO"/>
                                </accessibility>
                                <fontDescription key="fontDescription" type="system" pointSize="30"/>
                                <color key="textColor" red="0.5" green="0.5" blue="0.5" alpha="1" colorSpace="custom" customColorSpace="sRGB"/>
                                <nil key="highlightedColor"/>
                            </label>
                            <button opaque="NO" contentMode="scaleToFill" fixedFrame="YES" contentHorizontalAlignment="center" contentVerticalAlignment="center" buttonType="roundedRect" lineBreakMode="middleTruncation" translatesAutoresizingMaskIntoConstraints="NO" id="91d-Ki-IyR">
                                <rect key="frame" x="16" y="209" width="339" height="30"/>
                                <autoresizingMask key="autoresizingMask" flexibleMaxX="YES" flexibleMaxY="YES"/>
                                <accessibility key="accessibilityConfiguration" hint="Start speech recognition from microphone" identifier="recognize_microphone_button"/>
                                <fontDescription key="fontDescription" type="system" pointSize="30"/>
                                <state key="normal" title="Recognize (Microphone)"/>
                                <connections>
                                    <action selector="recognizeFromMicButtonTapped:" destination="BYZ-38-t0r" eventType="touchUpInside" id="2n3-kA-ySa"/>
                                </connections>
                            </button>
                        </subviews>
                        <color key="backgroundColor" red="1" green="1" blue="1" alpha="1" colorSpace="custom" customColorSpace="sRGB"/>
                        <viewLayoutGuide key="safeArea" id="6Tk-OE-BBY"/>
                    </view>
                    <connections>
                        <outlet property="recognitionResultLabel" destination="tq3-GD-ljB" id="kP4-o4-s0Q"/>
                    </connections>
                </viewController>
                <placeholder placeholderIdentifier="IBFirstResponder" id="dkx-z0-nzr" sceneMemberID="firstResponder"/>
            </objects>
            <point key="canvasLocation" x="135.19999999999999" y="132.68365817091455"/>
        </scene>
    </scenes>
</document>

Add the sample code

  1. Download the sample wav file by right-clicking the link and selecting Save target as. Add the wav file to the project as a resource by dragging it from a Finder window into the root level of the Project view. Select Finish in the following dialog box without changing the settings.

  2. Replace the contents of the autogenerated ViewController.m file with the following code:

    #import "ViewController.h"
    #import <MicrosoftCognitiveServicesSpeech/SPXSpeechApi.h>
    
    @interface ViewController () {
        NSString *speechKey;
        NSString *serviceRegion;
    }
    
    @property (weak, nonatomic) IBOutlet UIButton *recognizeFromFileButton;
    @property (weak, nonatomic) IBOutlet UIButton *recognizeFromMicButton;
    @property (weak, nonatomic) IBOutlet UILabel *recognitionResultLabel;
    - (IBAction)recognizeFromFileButtonTapped:(UIButton *)sender;
    - (IBAction)recognizeFromMicButtonTapped:(UIButton *)sender;
    @end
    
    @implementation ViewController
    
    - (void)viewDidLoad {
        [super viewDidLoad];
        speechKey = @"YourSubscriptionKey";
        serviceRegion = @"YourServiceRegion";
    }
    
    - (IBAction)recognizeFromFileButtonTapped:(UIButton *)sender {
        dispatch_async(dispatch_get_global_queue(QOS_CLASS_DEFAULT, 0), ^{
            [self recognizeFromFile];
        });
    }
    
    - (IBAction)recognizeFromMicButtonTapped:(UIButton *)sender {
        dispatch_async(dispatch_get_global_queue(QOS_CLASS_DEFAULT, 0), ^{
            [self recognizeFromMicrophone];
        });
    }
    
    - (void)recognizeFromFile {
        NSBundle *mainBundle = [NSBundle mainBundle];
        NSString *weatherFile = [mainBundle pathForResource: @"whatstheweatherlike" ofType:@"wav"];
        NSLog(@"weatherFile path: %@", weatherFile);
        if (!weatherFile) {
            NSLog(@"Cannot find audio file!");
            [self updateRecognitionErrorText:(@"Cannot find audio file")];
            return;
        }
    
        SPXAudioConfiguration* weatherAudioSource = [[SPXAudioConfiguration alloc] initWithWavFileInput:weatherFile];
        if (!weatherAudioSource) {
            NSLog(@"Loading audio file failed!");
            [self updateRecognitionErrorText:(@"Audio Error")];
            return;
        }
    
        SPXSpeechConfiguration *speechConfig = [[SPXSpeechConfiguration alloc] initWithSubscription:speechKey region:serviceRegion];
        if (!speechConfig) {
            NSLog(@"Could not load speech config");
            [self updateRecognitionErrorText:(@"Speech Config Error")];
            return;
        }
    
        [self updateRecognitionStatusText:(@"Recognizing...")];
    
        SPXSpeechRecognizer* speechRecognizer = [[SPXSpeechRecognizer alloc] initWithSpeechConfiguration:speechConfig audioConfiguration:weatherAudioSource];
        if (!speechRecognizer) {
            NSLog(@"Could not create speech recognizer");
            [self updateRecognitionResultText:(@"Speech Recognition Error")];
            return;
        }
    
        SPXSpeechRecognitionResult *speechResult = [speechRecognizer recognizeOnce];
        if (SPXResultReason_Canceled == speechResult.reason) {
            SPXCancellationDetails *details = [[SPXCancellationDetails alloc] initFromCanceledRecognitionResult:speechResult];
            NSLog(@"Speech recognition was canceled: %@. Did you pass the correct key/region combination?", details.errorDetails);
            [self updateRecognitionErrorText:([NSString stringWithFormat:@"Canceled: %@", details.errorDetails ])];
        } else if (SPXResultReason_RecognizedSpeech == speechResult.reason) {
            NSLog(@"Speech recognition result received: %@", speechResult.text);
            [self updateRecognitionResultText:(speechResult.text)];
        } else {
            NSLog(@"There was an error.");
            [self updateRecognitionErrorText:(@"Speech Recognition Error")];
        }
    }
    
    - (void)recognizeFromMicrophone {
        SPXSpeechConfiguration *speechConfig = [[SPXSpeechConfiguration alloc] initWithSubscription:speechKey region:serviceRegion];
        if (!speechConfig) {
            NSLog(@"Could not load speech config");
            [self updateRecognitionErrorText:(@"Speech Config Error")];
            return;
        }
        
        [self updateRecognitionStatusText:(@"Recognizing...")];
        
        SPXSpeechRecognizer* speechRecognizer = [[SPXSpeechRecognizer alloc] init:speechConfig];
        if (!speechRecognizer) {
            NSLog(@"Could not create speech recognizer");
            [self updateRecognitionResultText:(@"Speech Recognition Error")];
            return;
        }
        
        SPXSpeechRecognitionResult *speechResult = [speechRecognizer recognizeOnce];
        if (SPXResultReason_Canceled == speechResult.reason) {
            SPXCancellationDetails *details = [[SPXCancellationDetails alloc] initFromCanceledRecognitionResult:speechResult];
            NSLog(@"Speech recognition was canceled: %@. Did you pass the correct key/region combination?", details.errorDetails);
            [self updateRecognitionErrorText:([NSString stringWithFormat:@"Canceled: %@", details.errorDetails ])];
        } else if (SPXResultReason_RecognizedSpeech == speechResult.reason) {
            NSLog(@"Speech recognition result received: %@", speechResult.text);
            [self updateRecognitionResultText:(speechResult.text)];
        } else {
            NSLog(@"There was an error.");
            [self updateRecognitionErrorText:(@"Speech Recognition Error")];
        }
    }
    
    - (void)updateRecognitionResultText:(NSString *) resultText {
        dispatch_async(dispatch_get_main_queue(), ^{
            self.recognitionResultLabel.textColor = UIColor.blackColor;
            self.recognitionResultLabel.text = resultText;
        });
    }
    
    - (void)updateRecognitionErrorText:(NSString *) errorText {
        dispatch_async(dispatch_get_main_queue(), ^{
            self.recognitionResultLabel.textColor = UIColor.redColor;
            self.recognitionResultLabel.text = errorText;
        });
    }
    
    - (void)updateRecognitionStatusText:(NSString *) statusText {
        dispatch_async(dispatch_get_main_queue(), ^{
            self.recognitionResultLabel.textColor = UIColor.grayColor;
            self.recognitionResultLabel.text = statusText;
        });
    }
    
    @end
    
  3. Replace the string YourSubscriptionKey with your subscription key.

  4. Replace the string YourServiceRegion with the region associated with your subscription. For example, use westus for the free trial subscription. (If you'd rather not hard-code these values, see the sketch after this list.)

  5. Add the request for microphone access. Right-click the Info.plist entry of the project tree, and select Open As > Source Code. Add the following lines into the <dict> section, and then save the file. (An optional way to request microphone access explicitly at run time is sketched after this list.)

    <key>NSMicrophoneUsageDescription</key>
    <string>Need microphone access for speech recognition from microphone.</string>
    
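The quickstart hard-codes the key and region for simplicity. If you'd rather keep them out of source code, one option is to read them from Info.plist. The following is a minimal sketch, assuming you add the hypothetical keys SpeechSubscriptionKey and SpeechServiceRegion to Info.plist yourself; it is not part of the original sample.

    // Sketch only: read the Speech service credentials from Info.plist instead of hard-coding them.
    // The keys SpeechSubscriptionKey and SpeechServiceRegion are hypothetical; add them to
    // Info.plist if you follow this approach.
    - (void)viewDidLoad {
        [super viewDidLoad];
        NSDictionary *info = [[NSBundle mainBundle] infoDictionary];
        speechKey = info[@"SpeechSubscriptionKey"];
        serviceRegion = info[@"SpeechServiceRegion"];
        if (!speechKey || !serviceRegion) {
            NSLog(@"Speech service key or region is missing from Info.plist");
        }
    }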

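iOS shows the microphone permission prompt automatically the first time the app opens the microphone, using the usage description you just added. If you prefer to request access explicitly before starting recognition, a minimal sketch using AVFoundation could look like the following; the helper name requestMicrophoneAccessThen: is made up for this example and is not part of the original sample.

    // Add at the top of ViewController.m:
    #import <AVFoundation/AVFoundation.h>

    // Sketch only: request microphone permission explicitly before starting recognition.
    // The helper name requestMicrophoneAccessThen: is hypothetical.
    - (void)requestMicrophoneAccessThen:(void (^)(void))onGranted {
        [[AVAudioSession sharedInstance] requestRecordPermission:^(BOOL granted) {
            if (granted) {
                dispatch_async(dispatch_get_global_queue(QOS_CLASS_DEFAULT, 0), onGranted);
            } else {
                [self updateRecognitionErrorText:(@"Microphone access denied")];
            }
        }];
    }

With this helper in place, recognizeFromMicButtonTapped: could call [self requestMicrophoneAccessThen:^{ [self recognizeFromMicrophone]; }] instead of dispatching to the background queue directly.
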
Build and run the sample

  1. Make the debug output visible by selecting View > Debug Area > Activate Console.

  2. Choose either the iOS simulator or an iOS device connected to your development machine as the destination for the app from the list in the Product > Destination menu.

  3. Build and run the example code in the chosen destination by selecting Product > Run from the menu. You can also select the Play button.

  4. After you select the Recognize (File) button in the app, you should see the recognized text of the audio file, "What's the weather like?", on the lower part of the screen.

    Simulated iOS app

  5. After you select the Recognize (Microphone) button in the app and say a few words, you should see the text you have spoken on the lower part of the screen.

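The sample uses recognizeOnce, which returns after a single utterance has been recognized. If you want the result label to update with intermediate text while you're still speaking, the recognizer also supports event handlers and continuous recognition. The following rough sketch illustrates the idea inside ViewController.m; confirm the exact event-handler and continuous-recognition method names against the Speech SDK Objective-C reference, because they aren't covered by this quickstart.

    // Sketch only: continuous recognition with intermediate results, using a speechConfig
    // created the same way as in recognizeFromMicrophone above.
    SPXSpeechRecognizer *recognizer = [[SPXSpeechRecognizer alloc] init:speechConfig];

    // Fires repeatedly with partial text while the user is still speaking.
    [recognizer addRecognizingEventHandler:^(SPXSpeechRecognizer *reco, SPXSpeechRecognitionEventArgs *eventArgs) {
        [self updateRecognitionStatusText:eventArgs.result.text];
    }];

    // Fires once per utterance with the final recognition result.
    [recognizer addRecognizedEventHandler:^(SPXSpeechRecognizer *reco, SPXSpeechRecognitionEventArgs *eventArgs) {
        [self updateRecognitionResultText:eventArgs.result.text];
    }];

    [recognizer startContinuousRecognition];
    // ... later, when you're done listening:
    [recognizer stopContinuousRecognition];
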
Next steps