Quickstart: Recognize speech in Swift on iOS by using the Speech SDK

Quickstarts are also available for speech synthesis.

In this article, you learn how to create an iOS app in Swift that uses the Azure Cognitive Services Speech SDK to transcribe speech from a microphone to text.

Prerequisites

Before you get started, you'll need a subscription key and service region for the Speech service, and a Mac with a recent version of Xcode installed. You can run the app in the iOS simulator or on an iOS device.

Get the Speech SDK for iOS

Important

By downloading any of the Speech SDK for Azure Cognitive Services components on this page, you acknowledge its license. See the Microsoft Software License Terms for the Speech SDK.

This quickstart won't work with a version of the SDK earlier than 1.6.0.

The Cognitive Services Speech SDK for iOS is distributed as a framework bundle. It can be used in Xcode projects as a CocoaPod or downloaded from https://aka.ms/csspeech/iosbinary and linked manually. This article uses a CocoaPod.

Create an Xcode project

Start Xcode, and start a new project by selecting File > New > Project. In the template selection dialog box, select the iOS Single View App template.

In the dialog boxes that follow, make the following selections.

  1. In the Project Options dialog box:

    1. Enter a name for the quickstart app, for example, helloworld.
    2. Enter an appropriate organization name and an organization identifier if you already have an Apple developer account. For testing purposes, use a name like testorg. To sign the app, you need a proper provisioning profile. For more information, see the Apple developer site.
    3. Make sure Swift is chosen as the language for the project.
    4. Disable the check boxes to use storyboards and to create a document-based application. The simple UI for the sample app is created programmatically.
    5. Clear all the check boxes for tests and core data.
  2. Select a project directory:

    1. Choose a directory to put the project in. This step creates a helloworld directory in the chosen directory that contains all the files for the Xcode project.
    2. Disable the creation of a Git repo for this example project.
  3. The app also needs to declare use of the microphone in the Info.plist file. Select the file in the project overview, and add the Privacy - Microphone Usage Description key with a value like Microphone is needed for speech recognition. (The raw key name is NSMicrophoneUsageDescription; see the example after this list.)

    (Screenshot: Settings in Info.plist)

  4. Close the Xcode project. You'll reopen it later as part of a workspace, after you set up the CocoaPod.
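
If you prefer to edit Info.plist as source code (right-click the file and select Open As > Source Code), the entry from step 3 corresponds to the raw key NSMicrophoneUsageDescription. A minimal sketch of the entry as it appears in the file's XML source:

    <key>NSMicrophoneUsageDescription</key>
    <string>Microphone is needed for speech recognition</string>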

Add the sample code

  1. Place a new header file with the name MicrosoftCognitiveServicesSpeech-Bridging-Header.h into the helloworld directory inside the helloworld project. Paste the following code into it:

    #ifndef MicrosoftCognitiveServicesSpeech_Bridging_Header_h
    #define MicrosoftCognitiveServicesSpeech_Bridging_Header_h
    
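    // Make the Speech SDK's Objective-C API visible to Swift by importing its main header here.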
    #import <MicrosoftCognitiveServicesSpeech/SPXSpeechAPI.h>
    
    #endif /* MicrosoftCognitiveServicesSpeech_Bridging_Header_h */
    
  2. In the Swift build settings for the helloworld target, set the Objective-C Bridging Header field (the SWIFT_OBJC_BRIDGING_HEADER build setting) to the relative path of the bridging header, helloworld/MicrosoftCognitiveServicesSpeech-Bridging-Header.h.

    (Screenshot: Header properties)

  3. Replace the contents of the autogenerated AppDelegate.swift file with the following code:

    import UIKit
    
    @UIApplicationMain
    class AppDelegate: UIResponder, UIApplicationDelegate {
        
        var window: UIWindow?
        
        func application(_ application: UIApplication, didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?) -> Bool {
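            // The project doesn't use storyboards, so the window and root view controller are created in code.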
            window = UIWindow(frame: UIScreen.main.bounds)
            
            let homeViewController = ViewController()
            homeViewController.view.backgroundColor = UIColor.white
            window!.rootViewController = homeViewController
            
            window?.makeKeyAndVisible()
            return true
        }
    }
    
  4. Replace the contents of the autogenerated ViewController.swift file with the following code:

    import UIKit
    
    class ViewController: UIViewController {
        var label: UILabel!
        var fromMicButton: UIButton!
        
        var sub: String!
        var region: String!
        
        override func viewDidLoad() {
            super.viewDidLoad()
            
            // load subscription information
            sub = "YourSubscriptionKey"
            region = "YourServiceRegion"
            
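            // Build the UI programmatically: a label that shows recognition results and a button that starts recognition.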
            label = UILabel(frame: CGRect(x: 100, y: 100, width: 200, height: 200))
            label.textColor = UIColor.black
            label.lineBreakMode = .byWordWrapping
            label.numberOfLines = 0
            
            label.text = "Recognition Result"
            
            fromMicButton = UIButton(frame: CGRect(x: 100, y: 400, width: 200, height: 50))
            fromMicButton.setTitle("Recognize", for: .normal)
            fromMicButton.addTarget(self, action:#selector(self.fromMicButtonClicked), for: .touchUpInside)
            fromMicButton.setTitleColor(UIColor.black, for: .normal)
            
            self.view.addSubview(label)
            self.view.addSubview(fromMicButton)
        }
        
        
        @objc func fromMicButtonClicked() {
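            // Run recognition on a background queue so the blocking recognition call doesn't freeze the UI.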
            DispatchQueue.global(qos: .userInitiated).async {
                self.recognizeFromMic()
            }
        }
        
        func recognizeFromMic() {
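            // Create a speech configuration from the subscription key and service region set in viewDidLoad.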
            var speechConfig: SPXSpeechConfiguration?
            do {
                try speechConfig = SPXSpeechConfiguration(subscription: sub, region: region)
            } catch {
                print("error \(error) happened")
                speechConfig = nil
            }
            speechConfig?.speechRecognitionLanguage = "en-US"
            
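            // An audio configuration created with no arguments uses the device's default microphone as input.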
            let audioConfig = SPXAudioConfiguration()
            
            let reco = try! SPXSpeechRecognizer(speechConfiguration: speechConfig!, audioConfiguration: audioConfig)
            
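            // The recognizing event fires repeatedly with intermediate (partial) results while speech is being processed.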
            reco.addRecognizingEventHandler() {reco, evt in
                print("intermediate recognition result: \(evt.result.text ?? "(no result)")")
                self.updateLabel(text: evt.result.text, color: .gray)
            }
            
            updateLabel(text: "Listening ...", color: .gray)
            print("Listening...")
            
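            // recognizeOnce() blocks until the first utterance has been recognized and returns its final result.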
            let result = try! reco.recognizeOnce()
            print("recognition result: \(result.text ?? "(no result)")")
            updateLabel(text: result.text, color: .black)
        }
        
        func updateLabel(text: String?, color: UIColor) {
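            // UI updates must be dispatched back to the main queue.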
            DispatchQueue.main.async {
                self.label.text = text
                self.label.textColor = color
            }
        }
    }
    
  5. In ViewController.swift, replace the string YourSubscriptionKey with your subscription key.

  6. Replace the string YourServiceRegion with the region associated with your subscription. For example, use westus for the free trial subscription.

Install the SDK as a CocoaPod

  1. Install the CocoaPods dependency manager as described in its installation instructions.

  2. Go to the directory of your sample app, which is helloworld. Place a text file with the name Podfile and the following content in that directory:

    target 'helloworld' do
        platform :ios, '9.3'
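        # Pull in the Speech SDK for iOS (version 1.6.x); the SDK is distributed as a framework bundle, so use_frameworks! is specified below.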
        pod 'MicrosoftCognitiveServicesSpeech-iOS', '~> 1.6'
        use_frameworks!
    end
    
  3. Go to the helloworld directory in a terminal, and run the command pod install. This command generates a helloworld.xcworkspace Xcode workspace that contains both the sample app and the Speech SDK as a dependency. This workspace is used in the following steps.

Build and run the sample

  1. Open the workspace helloworld.xcworkspace in Xcode.
  2. Make the debug output visible by selecting View > Debug Area > Activate Console.
  3. Choose either the iOS simulator or an iOS device connected to your development machine as the destination for the app from the list in the Product > Destination menu.
  4. Build and run the example code by selecting Product > Run from the menu, or select the Play button.
  5. After you select the Recognize button in the app and say a few words, you should see the text you have spoken appear in the upper part of the screen.

Next steps