Quickstart: Recognize speech in Objective-C on macOS by using the Speech SDK

Quickstarts are also available for speech synthesis.

In this article, you learn how to create a macOS app in Objective-C by using the Azure Cognitive Services Speech SDK to transcribe speech recorded from a microphone to text.

Prerequisites

Before you get started, you'll need a subscription key for the Speech service and the corresponding service region. You use these values later in this article in place of the YourSubscriptionKey and YourServiceRegion placeholders.

Get the Speech SDK for macOS

Important

By downloading any of the Speech SDK for Azure Cognitive Services components on this page, you acknowledge its license. See the Microsoft Software License Terms for the Speech SDK.

The Cognitive Services Speech SDK for Mac is distributed as a framework bundle. It can be used in Xcode projects as a CocoaPod or downloaded from https://aka.ms/csspeech/macosbinary and linked manually. This article uses a CocoaPod.

Create an Xcode project

Start Xcode, and create a new project by selecting File > New > Project. In the template selection dialog box, select the Cocoa App template.

In the dialog boxes that follow, make the following selections.

  1. In the Project Options dialog box:

    1. Enter a name for the quickstart app, for example, helloworld.
    2. Enter an appropriate organization name and organization identifier if you already have an Apple developer account. Otherwise, for testing purposes, use a name like testorg. To sign the app, you need a proper provisioning profile. For more information, see the Apple developer site.
    3. Make sure Objective-C is selected as the language for the project.
    4. Clear the check boxes to use storyboards and to create a document-based application. The simple UI for the sample app is created programmatically.
    5. Clear all the check boxes for tests and Core Data.

    Project settings

  2. Select a project directory:

    1. Choose a directory to put the project in, for example, your home directory. This step creates a helloworld directory in the chosen location that contains all the files for the Xcode project.
    2. Disable the creation of a Git repo for this example project.
  3. Set the entitlements for network and microphone access. Select the app name in the first line in the overview on the left to get to the app configuration. Then select the Capabilities tab.

    1. Enable the App sandbox setting for the app.
    2. Select the check boxes for Outgoing Connections and Microphone access. (The corresponding entitlement keys appear in the sketch after this list.)

    Sandbox settings

  4. The app also needs to declare use of the microphone in the Info.plist file. Select the file in the overview, and add the Privacy - Microphone Usage Description key with a value like Microphone is needed for speech recognition (see the sketch after this list).

    Settings in Info.plist

  5. Close the Xcode project. You'll reopen it later through the workspace that CocoaPods generates.
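
If you prefer to check these settings in the raw project files, the capabilities from step 3 correspond to the standard App Sandbox entitlement keys in the app's .entitlements file, and the step 4 entry corresponds to the NSMicrophoneUsageDescription key in Info.plist. A minimal sketch (the entitlements file name depends on your project name):

    <!-- helloworld.entitlements -->
    <key>com.apple.security.app-sandbox</key>
    <true/>
    <key>com.apple.security.network.client</key>
    <true/>
    <key>com.apple.security.device.audio-input</key>
    <true/>

    <!-- Info.plist -->
    <key>NSMicrophoneUsageDescription</key>
    <string>Microphone is needed for speech recognition</string>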

Install the SDK as a CocoaPod

  1. Install the CocoaPods dependency manager as described in its installation instructions. (A typical installation command appears in the terminal sketch after this list.)

  2. Go to the directory of your sample app, which is helloworld. Place a text file with the name Podfile and the following content in that directory:

    target 'helloworld' do
      platform :osx, '10.13'
      pod 'MicrosoftCognitiveServicesSpeech-macOS', '~> 1.7.0'
    end
    
  3. Go to the helloworld directory in a terminal, and run the command pod install (see the terminal sketch after this list). This command generates a helloworld.xcworkspace Xcode workspace that contains both the sample app and the Speech SDK as a dependency. This workspace is used in the following steps.
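
On the command line, steps 1 and 3 might look like the following sketch. It assumes CocoaPods is installed through RubyGems, which is the standard method; see the CocoaPods installation instructions for alternatives:

    # Install the CocoaPods dependency manager (one-time setup).
    sudo gem install cocoapods

    # From the directory that contains the Podfile:
    cd helloworld
    pod install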

Add the sample code

  1. Open the workspace helloworld.xcworkspace in Xcode.

  2. Replace the contents of the autogenerated AppDelegate.m file with the following code:

    #import "AppDelegate.h"
    #import <MicrosoftCognitiveServicesSpeech/SPXSpeechApi.h>
    
    
    @interface AppDelegate ()
    @property (weak) IBOutlet NSWindow *window;
    @property (weak) NSButton *button;
    @property (strong) NSTextField *label;
    @end
    
    @implementation AppDelegate
    
    - (void)applicationDidFinishLaunching:(NSNotification *)aNotification {
        self.button = [NSButton buttonWithTitle:@"Start Recognition" target:nil action:nil];
        [self.button setTarget:self];
        [self.button setAction:@selector(buttonPressed:)];
        [self.window.contentView addSubview:self.button];
    
        self.label = [[NSTextField alloc] initWithFrame:NSMakeRect(100, 100, 200, 17)];
        [self.label setStringValue:@"Press Button!"];
        [self.window.contentView addSubview:self.label];
    }
    
    - (void)buttonPressed:(NSButton *)button {
        // Creates an instance of a speech config with the specified subscription key and service region.
        // Replace with your own subscription key and service region (e.g., "westus").
        NSString *speechKey = @"YourSubscriptionKey";
        NSString *serviceRegion = @"YourServiceRegion";
    
        SPXAudioConfiguration *audioConfig = [[SPXAudioConfiguration alloc] initWithMicrophone:nil];
        SPXSpeechConfiguration *speechConfig = [[SPXSpeechConfiguration alloc] initWithSubscription:speechKey region:serviceRegion];
        SPXSpeechRecognizer *speechRecognizer = [[SPXSpeechRecognizer alloc] initWithSpeechConfiguration:speechConfig audioConfiguration:audioConfig];
    
        NSLog(@"Say something...");
    
        // Starts speech recognition and returns after a single utterance is recognized. The end of a
        // single utterance is determined by listening for silence at the end, or until a maximum of 15
        // seconds of audio is processed. The method returns the recognized text as its result.
        // Note: Because recognizeOnce returns only a single utterance, it is suitable only for
        // single-shot recognition, like a command or query.
        // For long-running, multi-utterance recognition, use startContinuousRecognition instead.
        SPXSpeechRecognitionResult *speechResult = [speechRecognizer recognizeOnce];
    
        // Checks result.
        if (SPXResultReason_Canceled == speechResult.reason) {
            SPXCancellationDetails *details = [[SPXCancellationDetails alloc] initFromCanceledRecognitionResult:speechResult];
            NSLog(@"Speech recognition was canceled: %@. Did you pass the correct key/region combination?", details.errorDetails);
            [self.label setStringValue:([NSString stringWithFormat:@"Canceled: %@", details.errorDetails])];
        } else if (SPXResultReason_RecognizedSpeech == speechResult.reason) {
            NSLog(@"Speech recognition result received: %@", speechResult.text);
            [self.label setStringValue:(speechResult.text)];
        } else {
            NSLog(@"There was an error.");
            [self.label setStringValue:(@"Speech Recognition Error")];
        }
    }
    
    - (void)applicationWillTerminate:(NSNotification *)aNotification {
        // Insert code here to tear down your application
    }
    
    - (BOOL)applicationShouldTerminateAfterLastWindowClosed:(NSApplication *)theApplication {
        return YES;
    }
    
    @end
    
  3. Replace the string YourSubscriptionKey with your subscription key.

  4. Replace the string YourServiceRegion with the region associated with your subscription. For example, use westus for the free trial subscription.
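
Note that recognizeOnce blocks the calling thread, which in this sample is the main UI thread, until a result is available. The SDK also offers an asynchronous variant. The following sketch (a hypothetical restructuring of buttonPressed:, assuming the same configuration objects as in the sample) uses recognizeOnceAsync: and updates the label on the main queue:

    [speechRecognizer recognizeOnceAsync:^(SPXSpeechRecognitionResult *speechResult) {
        // The handler runs on a background thread; UI updates must happen on the main queue.
        dispatch_async(dispatch_get_main_queue(), ^{
            if (SPXResultReason_RecognizedSpeech == speechResult.reason) {
                [self.label setStringValue:speechResult.text];
            } else {
                [self.label setStringValue:@"Speech Recognition Error"];
            }
        });
    }];
    // In production code, keep a strong reference to the recognizer (for example, in a
    // property) so it isn't deallocated before the handler runs.

For long-running, multi-utterance recognition, the recognizer exposes continuous recognition with event handlers; a minimal sketch, again assuming the recognizer setup from the sample:

    [speechRecognizer addRecognizedEventHandler:^(SPXSpeechRecognizer *recognizer, SPXSpeechRecognitionEventArgs *eventArgs) {
        NSLog(@"Recognized: %@", eventArgs.result.text);
    }];
    [speechRecognizer startContinuousRecognition];
    // ... later, when recognition should end:
    [speechRecognizer stopContinuousRecognition];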

Build and run the sample

  1. Make the debug output visible by selecting View > Debug Area > Activate Console.
  2. Build and run the example code by selecting Product > Run from the menu. You can also select the Play button.
  3. The first time you run the app, you should be prompted to give it access to your computer's microphone. After you select the button and say a few words, you should see the text you spoke in the lower part of the window.

Next steps