Quickstart: Recognize speech in Swift on macOS using the Speech SDK

Quickstarts are also available for speech synthesis.

In this article, you learn how to create a macOS app in Swift using the Cognitive Services Speech SDK to transcribe speech recorded from a microphone to text.

Prerequisites

Before you get started, you need a Speech service subscription key and its service region; both are used in the sample code later in this quickstart. You also need a Mac with Xcode installed.

Get the Speech SDK for macOS

Important

By downloading any of the Speech SDK for Azure Cognitive Services components on this page, you acknowledge its license. See the Microsoft Software License Terms for the Speech SDK.

Note that this tutorial does not work with versions of the SDK earlier than 1.6.0.

The Cognitive Services Speech SDK for macOS is distributed as a framework bundle. It can be used in Xcode projects as a CocoaPod, or downloaded from https://aka.ms/csspeech/macosbinary and linked manually. This guide uses a CocoaPod.

Create an Xcode project

Start Xcode, and create a new project by clicking File > New > Project. In the template selection dialog, choose the "Cocoa App" template.

In the dialogs that follow, make the following selections:

  1. Project Options Dialog
    1. Enter a name for the quickstart app, for example helloworld.
    2. Enter an appropriate organization name and organization identifier if you already have an Apple developer account. For testing purposes, you can pick any name, such as testorg. Note that to sign the app, you need a proper provisioning profile; refer to the Apple developer site for details.
    3. Make sure Swift is chosen as the language for the project.
    4. Disable the checkboxes to use storyboards and to create a document-based application. The simple UI for the sample app will be created programmatically.
    5. Disable all checkboxes for tests and core data.
  2. Select project directory
    1. Choose a directory to put the project in. This creates a helloworld directory in the chosen directory that contains all the files for the Xcode project.
    2. Disable the creation of a Git repo for this example project.
  3. Set the entitlements for network and microphone access. Click the app name in the first line in the overview on the left to get to the app configuration, and then choose the "Capabilities" tab.
    1. Enable the "App sandbox" setting for the app.
    2. Enable the checkboxes for "Outgoing Connections" and "Microphone" access.
  4. The app also needs to declare use of the microphone in the Info.plist file. Click the file in the overview, and add the "Privacy - Microphone Usage Description" key with a value like "Microphone is needed for speech recognition". (A sketch of the resulting entitlement and Info.plist entries is shown after this list.)
  5. Close the Xcode project. After setting up CocoaPods, you will reopen it through a generated workspace instead of the project file itself.
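
For reference, enabling these capabilities and adding the usage description corresponds roughly to the following raw keys in the app's entitlements file and in Info.plist. This is only a sketch of what Xcode writes on your behalf; you normally don't need to edit these files by hand, and the exact contents may differ:

    <!-- helloworld.entitlements (written by the Capabilities tab) -->
    <key>com.apple.security.app-sandbox</key>
    <true/>
    <key>com.apple.security.network.client</key>
    <true/>
    <key>com.apple.security.device.audio-input</key>
    <true/>

    <!-- Info.plist: raw key behind "Privacy - Microphone Usage Description" -->
    <key>NSMicrophoneUsageDescription</key>
    <string>Microphone is needed for speech recognition</string>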

Add the sample code

  1. Place a new header file with the name MicrosoftCognitiveServicesSpeech-Bridging-Header.h into the helloworld directory inside the helloworld project, and paste the following code into it:

    #ifndef MicrosoftCognitiveServicesSpeech_Bridging_Header_h
    #define MicrosoftCognitiveServicesSpeech_Bridging_Header_h
    
    #import <MicrosoftCognitiveServicesSpeech/SPXSpeechAPI.h>
    
    #endif /* MicrosoftCognitiveServicesSpeech_Bridging_Header_h */
    
  2. In the Swift project settings for the helloworld target, set the Objective-C Bridging Header field to the relative path helloworld/MicrosoftCognitiveServicesSpeech-Bridging-Header.h of the bridging header.

  3. Replace the contents of the autogenerated AppDelegate.swift file with:

    import Cocoa
    
    @NSApplicationMain
    class AppDelegate: NSObject, NSApplicationDelegate {
        var label: NSTextField!
        var fromMicButton: NSButton!
    
        var sub: String!
        var region: String!
    
        @IBOutlet weak var window: NSWindow!
    
        func applicationDidFinishLaunching(_ aNotification: Notification) {
            print("loading")
            // load subscription information
            sub = "YourSubscriptionKey"
            region = "YourServiceRegion"
    
            label = NSTextField(frame: NSRect(x: 100, y: 50, width: 200, height: 200))
            label.textColor = NSColor.black
            label.lineBreakMode = .byWordWrapping
    
            label.stringValue = "Recognition Result"
            label.isEditable = false
    
            self.window.contentView?.addSubview(label)
    
            fromMicButton = NSButton(frame: NSRect(x: 100, y: 300, width: 200, height: 30))
            fromMicButton.title = "Recognize"
            fromMicButton.target = self
            fromMicButton.action = #selector(fromMicButtonClicked)
            self.window.contentView?.addSubview(fromMicButton)
        }
    
        @objc func fromMicButtonClicked() {
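            // Run recognition on a background queue so the blocking recognition call does not freeze the UI.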
            DispatchQueue.global(qos: .userInitiated).async {
                self.recognizeFromMic()
            }
        }
    
        func recognizeFromMic() {
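            // Configure the speech service using the subscription key and region set in applicationDidFinishLaunching.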
            var speechConfig: SPXSpeechConfiguration?
            do {
                try speechConfig = SPXSpeechConfiguration(subscription: sub, region: region)
            } catch {
                print("error \(error) happened")
                speechConfig = nil
            }
            speechConfig?.speechRecognitionLanguage = "en-US"
            let audioConfig = SPXAudioConfiguration() // use the default microphone input
    
            let reco = try! SPXSpeechRecognizer(speechConfiguration: speechConfig!, audioConfiguration: audioConfig)
    
            // Subscribe to intermediate results so partial text is shown while you speak.
            reco.addRecognizingEventHandler() {reco, evt in
                print("intermediate recognition result: \(evt.result.text ?? "(no result)")")
                self.updateLabel(text: evt.result.text, color: .gray)
            }
    
            updateLabel(text: "Listening ...", color: .gray)
            print("Listening...")
    
            // Recognize a single utterance; this call blocks until recognition has completed.
            let result = try! reco.recognizeOnce()
            print("recognition result: \(result.text ?? "(no result)"), reason: \(result.reason.rawValue)")
            updateLabel(text: result.text, color: .black)
    
            if result.reason != SPXResultReason.recognizedSpeech {
                let cancellationDetails = try! SPXCancellationDetails(fromCanceledRecognitionResult: result)
                print("cancelled: \(result.reason), \(cancellationDetails.errorDetails)")
                updateLabel(text: "Error: \(cancellationDetails.errorDetails)", color: .red)
            }
        }
    
        func updateLabel(text: String?, color: NSColor) {
            // UI updates must be dispatched to the main thread.
            DispatchQueue.main.async {
                self.label.stringValue = text ?? ""
                self.label.textColor = color
            }
        }
    }
    
  4. In AppDelegate.swift, replace the string YourSubscriptionKey with your subscription key.

  5. Replace the string YourServiceRegion with the region associated with your subscription (for example, westus for the free trial subscription).
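
If you prefer not to hard-code the credentials while experimenting, you could instead read them from environment variables in applicationDidFinishLaunching. This is only a sketch; the variable names SPEECH_SUBSCRIPTION_KEY and SPEECH_SERVICE_REGION are arbitrary choices for this example, not names defined by the SDK:

    // Hypothetical alternative to the hard-coded strings above: read the
    // subscription data from the process environment.
    let env = ProcessInfo.processInfo.environment
    sub = env["SPEECH_SUBSCRIPTION_KEY"] ?? "YourSubscriptionKey"
    region = env["SPEECH_SERVICE_REGION"] ?? "YourServiceRegion"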

Install the SDK as a CocoaPod

  1. Install the CocoaPods dependency manager as described in its installation instructions.

  2. Navigate to the directory of your sample app (helloworld), and place a text file named Podfile with the following content in that directory:

    target 'helloworld' do
        platform :osx, '10.14'
        pod 'MicrosoftCognitiveServicesSpeech-macOS', '~> 1.6'
        use_frameworks!
    end
    
  3. Navigate to the helloworld directory in a terminal and run the command pod install. This generates a helloworld.xcworkspace Xcode workspace that contains both the sample app and the Speech SDK as a dependency; you will use this workspace in the following steps (see the example terminal session after this list).
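
Put together, the terminal session for this section looks roughly like the following. This assumes CocoaPods is installed as a Ruby gem; the CocoaPods installation instructions describe other options:

    # install the CocoaPods dependency manager (one-time setup)
    sudo gem install cocoapods

    # run from the helloworld directory: fetches the Speech SDK pod and generates the workspace
    pod install

    # open the generated workspace (not the .xcodeproj) in Xcode
    open helloworld.xcworkspace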

Build and run the sample

  1. Open the helloworld.xcworkspace workspace in Xcode.
  2. Make the debug output visible (View > Debug Area > Activate Console).
  3. Build and run the example code by selecting Product > Run from the menu or clicking the Play button.
  4. After you click the "Recognize" button in the app and say a few words, you should see the text you have spoken in the lower part of the app window.

Next steps