Quickstart: Recognize speech in Objective-C on iOS using the Speech SDK

In this article, you learn how to create an iOS app in Objective-C that uses the Cognitive Services Speech SDK to transcribe speech to text from the microphone or from a file of recorded audio.

Prerequisites

Before you get started, you need a subscription key for the Speech service, the service region associated with that key, and a Mac with a current version of Xcode installed.

Get the Speech SDK for iOS

Important

By downloading any of the Speech SDK for Azure Cognitive Services components on this page, you acknowledge its license. See the Microsoft Software License Terms for the Speech SDK.

The current version of the Cognitive Services Speech SDK is 1.5.1.

The Cognitive Services Speech SDK for iOS is currently distributed as a Cocoa Framework. It can be downloaded from here. Download the file to your home directory.

Create an Xcode Project

Start Xcode, and start a new project by clicking File > New > Project. In the template selection dialog, choose the "iOS Single View App" template.

In the dialogs that follow, make the following selections:

  1. Project Options Dialog
    1. Enter a name for the quickstart app, for example helloworld.
    2. Enter an appropriate organization name and organization identifier, if you already have an Apple developer account. For testing purposes, you can just pick any name like testorg. To sign the app, you need a proper provisioning profile. Refer to the Apple developer site for details.
    3. Make sure Objective-C is chosen as the language for the project.
    4. Disable all checkboxes for tests and Core Data.
  2. Select a project directory
    1. Choose your home directory to put the project in. This creates a helloworld directory in your home directory that contains all the files for the Xcode project.
    2. Disable the creation of a Git repo for this example project.
  3. Adjust the paths to the SDK in the project settings.
    1. In the General tab under the Embedded Binaries header, add the SDK library as a framework: Add embedded binaries > Add other... > Navigate to your home directory and choose the file MicrosoftCognitiveServicesSpeech.framework. This adds the SDK library to the Linked Frameworks and Libraries header automatically.
    2. Go to the Build Settings tab and activate All settings.
    3. Add the directory $(SRCROOT)/.. to the Framework Search Paths setting under the Search Paths heading. A quick way to verify this setup is shown after this list.
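
Once the framework is embedded and the search path is set, the SDK's umbrella header should resolve from any source file in the project; the same import appears at the top of ViewController.m later in this quickstart. If Xcode reports that the header cannot be found, recheck the Framework Search Paths entry.

    // If the framework and search path are configured correctly, this import compiles without errors.
    #import <MicrosoftCognitiveServicesSpeech/SPXSpeechApi.h>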

Set up the UI

The example app has a very simple UI: two buttons that start speech recognition either from a file or from microphone input, and a text label that displays the result. The UI is set up in the Main.storyboard part of the project. Open the XML view of the storyboard by right-clicking the Main.storyboard entry of the project tree and selecting Open As... > Source Code. Replace the autogenerated XML with this code:

<?xml version="1.0" encoding="UTF-8"?>
<document type="com.apple.InterfaceBuilder3.CocoaTouch.Storyboard.XIB" version="3.0" toolsVersion="14113" targetRuntime="iOS.CocoaTouch" propertyAccessControl="none" useAutolayout="YES" useTraitCollections="YES" useSafeAreas="YES" colorMatched="YES" initialViewController="BYZ-38-t0r">
    <device id="retina4_7" orientation="portrait">
        <adaptation id="fullscreen"/>
    </device>
    <dependencies>
        <deployment identifier="iOS"/>
        <plugIn identifier="com.apple.InterfaceBuilder.IBCocoaTouchPlugin" version="14088"/>
        <capability name="Safe area layout guides" minToolsVersion="9.0"/>
        <capability name="documents saved in the Xcode 8 format" minToolsVersion="8.0"/>
    </dependencies>
    <scenes>
        <!--View Controller-->
        <scene sceneID="tne-QT-ifu">
            <objects>
                <viewController id="BYZ-38-t0r" customClass="ViewController" sceneMemberID="viewController">
                    <view key="view" contentMode="scaleToFill" id="8bC-Xf-vdC">
                        <rect key="frame" x="0.0" y="0.0" width="375" height="667"/>
                        <autoresizingMask key="autoresizingMask" widthSizable="YES" heightSizable="YES"/>
                        <subviews>
                            <button opaque="NO" contentMode="scaleToFill" fixedFrame="YES" contentHorizontalAlignment="center" contentVerticalAlignment="center" buttonType="roundedRect" lineBreakMode="middleTruncation" translatesAutoresizingMaskIntoConstraints="NO" id="qFP-u7-47Q">
                                <rect key="frame" x="84" y="247" width="207" height="82"/>
                                <autoresizingMask key="autoresizingMask" flexibleMaxX="YES" flexibleMaxY="YES"/>
                                <accessibility key="accessibilityConfiguration" hint="Start speech recognition from file" identifier="recognize_file_button">
                                    <accessibilityTraits key="traits" button="YES" staticText="YES"/>
                                    <bool key="isElement" value="YES"/>
                                </accessibility>
                                <fontDescription key="fontDescription" type="system" pointSize="30"/>
                                <state key="normal" title="Recognize (File)"/>
                                <connections>
                                    <action selector="recognizeFromFileButtonTapped:" destination="BYZ-38-t0r" eventType="touchUpInside" id="Vfr-ah-nbC"/>
                                </connections>
                            </button>
                            <label opaque="NO" userInteractionEnabled="NO" contentMode="center" horizontalHuggingPriority="251" verticalHuggingPriority="251" fixedFrame="YES" text="Recognition result" textAlignment="center" lineBreakMode="tailTruncation" numberOfLines="5" baselineAdjustment="alignBaselines" adjustsFontSizeToFit="NO" translatesAutoresizingMaskIntoConstraints="NO" id="tq3-GD-ljB">
                                <rect key="frame" x="20" y="408" width="335" height="148"/>
                                <autoresizingMask key="autoresizingMask" flexibleMaxX="YES" flexibleMaxY="YES"/>
                                <accessibility key="accessibilityConfiguration" hint="The result of speech recognition" identifier="result_label">
                                    <accessibilityTraits key="traits" notEnabled="YES"/>
                                    <bool key="isElement" value="NO"/>
                                </accessibility>
                                <fontDescription key="fontDescription" type="system" pointSize="30"/>
                                <color key="textColor" red="0.5" green="0.5" blue="0.5" alpha="1" colorSpace="custom" customColorSpace="sRGB"/>
                                <nil key="highlightedColor"/>
                            </label>
                            <button opaque="NO" contentMode="scaleToFill" fixedFrame="YES" contentHorizontalAlignment="center" contentVerticalAlignment="center" buttonType="roundedRect" lineBreakMode="middleTruncation" translatesAutoresizingMaskIntoConstraints="NO" id="91d-Ki-IyR">
                                <rect key="frame" x="16" y="209" width="339" height="30"/>
                                <autoresizingMask key="autoresizingMask" flexibleMaxX="YES" flexibleMaxY="YES"/>
                                <accessibility key="accessibilityConfiguration" hint="Start speech recognition from microphone" identifier="recognize_microphone_button"/>
                                <fontDescription key="fontDescription" type="system" pointSize="30"/>
                                <state key="normal" title="Recognize (Microphone)"/>
                                <connections>
                                    <action selector="recognizeFromMicButtonTapped:" destination="BYZ-38-t0r" eventType="touchUpInside" id="2n3-kA-ySa"/>
                                </connections>
                            </button>
                        </subviews>
                        <color key="backgroundColor" red="1" green="1" blue="1" alpha="1" colorSpace="custom" customColorSpace="sRGB"/>
                        <viewLayoutGuide key="safeArea" id="6Tk-OE-BBY"/>
                    </view>
                    <connections>
                        <outlet property="recognitionResultLabel" destination="tq3-GD-ljB" id="kP4-o4-s0Q"/>
                    </connections>
                </viewController>
                <placeholder placeholderIdentifier="IBFirstResponder" id="dkx-z0-nzr" sceneMemberID="firstResponder"/>
            </objects>
            <point key="canvasLocation" x="135.19999999999999" y="132.68365817091455"/>
        </scene>
    </scenes>
</document>
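
The storyboard sets ViewController as the scene's custom class and connects the two buttons' Touch Up Inside events and the result label to members of that class. For reference, the connection names used above correspond to the following declarations, which are part of the full ViewController.m shown in the next section:

    @interface ViewController ()
    // Outlet connected to the storyboard label with accessibility identifier "result_label".
    @property (weak, nonatomic) IBOutlet UILabel *recognitionResultLabel;
    // Actions connected to the "Recognize (File)" and "Recognize (Microphone)" buttons.
    - (IBAction)recognizeFromFileButtonTapped:(UIButton *)sender;
    - (IBAction)recognizeFromMicButtonTapped:(UIButton *)sender;
    @end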

Add the sample code

  1. Download the sample whatstheweatherlike.wav audio file by right-clicking the link and choosing Save target as.... Add the .wav file to the project as a resource by dragging it from a Finder window into the root level of the Project view. Click Finish in the following dialog without changing the settings.

  2. Replace the contents of the autogenerated ViewController.m file with the following code:

    #import "ViewController.h"
    #import <MicrosoftCognitiveServicesSpeech/SPXSpeechApi.h>
    
    @interface ViewController () {
        NSString *speechKey;
        NSString *serviceRegion;
    }
    
    @property (weak, nonatomic) IBOutlet UIButton *recognizeFromFileButton;
    @property (weak, nonatomic) IBOutlet UIButton *recognizeFromMicButton;
    @property (weak, nonatomic) IBOutlet UILabel *recognitionResultLabel;
    - (IBAction)recognizeFromFileButtonTapped:(UIButton *)sender;
    - (IBAction)recognizeFromMicButtonTapped:(UIButton *)sender;
    @end
    
    @implementation ViewController
    
    - (void)viewDidLoad {
        [super viewDidLoad];
        // Replace these placeholder values with your own subscription key and service region (see steps 3 and 4 below).
        speechKey = @"YourSubscriptionKey";
        serviceRegion = @"YourServiceRegion";
    }
    
    - (IBAction)recognizeFromFileButtonTapped:(UIButton *)sender {
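        // Recognition blocks until it completes, so dispatch it to a background queue to keep the UI responsive.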
        dispatch_async(dispatch_get_global_queue(QOS_CLASS_DEFAULT, 0), ^{
            [self recognizeFromFile];
        });
    }
    
    - (IBAction)recognizeFromMicButtonTapped:(UIButton *)sender {
        dispatch_async(dispatch_get_global_queue(QOS_CLASS_DEFAULT, 0), ^{
            [self recognizeFromMicrophone];
        });
    }
    
    - (void)recognizeFromFile {
        NSBundle *mainBundle = [NSBundle mainBundle];
        NSString *weatherFile = [mainBundle pathForResource: @"whatstheweatherlike" ofType:@"wav"];
        NSLog(@"weatherFile path: %@", weatherFile);
        if (!weatherFile) {
            NSLog(@"Cannot find audio file!");
            [self updateRecognitionErrorText:(@"Cannot find audio file")];
            return;
        }
    
        // Create an audio configuration that reads its input from the bundled WAV file.
        SPXAudioConfiguration* weatherAudioSource = [[SPXAudioConfiguration alloc] initWithWavFileInput:weatherFile];
        if (!weatherAudioSource) {
            NSLog(@"Loading audio file failed!");
            [self updateRecognitionErrorText:(@"Audio Error")];
            return;
        }
    
        SPXSpeechConfiguration *speechConfig = [[SPXSpeechConfiguration alloc] initWithSubscription:speechKey region:serviceRegion];
        if (!speechConfig) {
            NSLog(@"Could not load speech config");
            [self updateRecognitionErrorText:(@"Speech Config Error")];
            return;
        }
    
        [self updateRecognitionStatusText:(@"Recognizing...")];
    
        SPXSpeechRecognizer* speechRecognizer = [[SPXSpeechRecognizer alloc] initWithSpeechConfiguration:speechConfig audioConfiguration:weatherAudioSource];
        if (!speechRecognizer) {
            NSLog(@"Could not create speech recognizer");
            [self updateRecognitionResultText:(@"Speech Recognition Error")];
            return;
        }
    
        // recognizeOnce performs blocking, single-shot recognition and returns after one utterance has been recognized.
        SPXSpeechRecognitionResult *speechResult = [speechRecognizer recognizeOnce];
        if (SPXResultReason_Canceled == speechResult.reason) {
            SPXCancellationDetails *details = [[SPXCancellationDetails alloc] initFromCanceledRecognitionResult:speechResult];
            NSLog(@"Speech recognition was canceled: %@. Did you pass the correct key/region combination?", details.errorDetails);
            [self updateRecognitionErrorText:([NSString stringWithFormat:@"Canceled: %@", details.errorDetails ])];
        } else if (SPXResultReason_RecognizedSpeech == speechResult.reason) {
            NSLog(@"Speech recognition result received: %@", speechResult.text);
            [self updateRecognitionResultText:(speechResult.text)];
        } else {
            NSLog(@"There was an error.");
            [self updateRecognitionErrorText:(@"Speech Recognition Error")];
        }
    }
    
    - (void)recognizeFromMicrophone {
        SPXSpeechConfiguration *speechConfig = [[SPXSpeechConfiguration alloc] initWithSubscription:speechKey region:serviceRegion];
        if (!speechConfig) {
            NSLog(@"Could not load speech config");
            [self updateRecognitionErrorText:(@"Speech Config Error")];
            return;
        }
        
        [self updateRecognitionStatusText:(@"Recognizing...")];
        
        // No audio configuration is passed here, so the recognizer uses the device's default microphone as input.
        SPXSpeechRecognizer* speechRecognizer = [[SPXSpeechRecognizer alloc] init:speechConfig];
        if (!speechRecognizer) {
            NSLog(@"Could not create speech recognizer");
            [self updateRecognitionResultText:(@"Speech Recognition Error")];
            return;
        }
        
        SPXSpeechRecognitionResult *speechResult = [speechRecognizer recognizeOnce];
        if (SPXResultReason_Canceled == speechResult.reason) {
            SPXCancellationDetails *details = [[SPXCancellationDetails alloc] initFromCanceledRecognitionResult:speechResult];
            NSLog(@"Speech recognition was canceled: %@. Did you pass the correct key/region combination?", details.errorDetails);
            [self updateRecognitionErrorText:([NSString stringWithFormat:@"Canceled: %@", details.errorDetails ])];
        } else if (SPXResultReason_RecognizedSpeech == speechResult.reason) {
            NSLog(@"Speech recognition result received: %@", speechResult.text);
            [self updateRecognitionResultText:(speechResult.text)];
        } else {
            NSLog(@"There was an error.");
            [self updateRecognitionErrorText:(@"Speech Recognition Error")];
        }
    }
    
    // Recognition runs on a background queue, so the label updates below are dispatched back to the main queue.
    - (void)updateRecognitionResultText:(NSString *) resultText {
        dispatch_async(dispatch_get_main_queue(), ^{
            self.recognitionResultLabel.textColor = UIColor.blackColor;
            self.recognitionResultLabel.text = resultText;
        });
    }
    
    - (void)updateRecognitionErrorText:(NSString *) errorText {
        dispatch_async(dispatch_get_main_queue(), ^{
            self.recognitionResultLabel.textColor = UIColor.redColor;
            self.recognitionResultLabel.text = errorText;
        });
    }
    
    - (void)updateRecognitionStatusText:(NSString *) statusText {
        dispatch_async(dispatch_get_main_queue(), ^{
            self.recognitionResultLabel.textColor = UIColor.grayColor;
            self.recognitionResultLabel.text = statusText;
        });
    }
    
    @end
    
  3. Replace the string YourSubscriptionKey with your subscription key.

  4. Replace the string YourServiceRegion with the region associated with your subscription (for example, westus for the free trial subscription).

  5. Add the request for microphone access. Right-click the Info.plist entry of the project tree and select Open As... > Source Code. Add the following lines into the <dict> section and then save the file. An optional sketch for requesting microphone permission ahead of time is shown after the plist entry.

    <key>NSMicrophoneUsageDescription</key>
    <string>Need microphone access for speech recognition from microphone.</string>
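
With this entry in place, iOS prompts the user for microphone permission automatically the first time the app accesses the microphone. If you prefer to trigger the prompt yourself before starting recognition, you can optionally add a method like the following to the ViewController implementation. This is a minimal sketch and not part of the original sample; it assumes AVFoundation is imported at the top of ViewController.m.

    // Optional sketch: request microphone permission up front so the system prompt
    // does not interrupt the first recognition attempt.
    // Requires: #import <AVFoundation/AVFoundation.h> at the top of ViewController.m.
    - (void)requestMicrophonePermission {
        [[AVAudioSession sharedInstance] requestRecordPermission:^(BOOL granted) {
            if (!granted) {
                NSLog(@"Microphone access was denied; recognition from the microphone will fail.");
            }
        }];
    }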
    

Build and run the sample

  1. Make the debug output visible (View > Debug Area > Activate Console).

  2. Choose either the iOS simulator or an iOS device connected to your development machine as the destination for the app from the list in the Product > Destination menu.

  3. Build and run the example code in the iOS simulator by selecting Product > Run from the menu or clicking the Play button. Currently, the Speech SDK supports only 64-bit iOS platforms.

  4. After you click the "Recognize (File)" button in the app, you should see the transcribed contents of the audio file, "What's the weather like?", on the lower part of the screen.

  5. After you click the "Recognize (Microphone)" button in the app and say a few words, you should see the text you have spoken on the lower part of the screen.

Next steps