Using the Natural Language framework with Xamarin.iOS

08/20/2018

Introduced in iOS 12, the Natural Language framework enables on-device natural language processing. It supports language recognition, tokenization, and tagging. Tokenization splits text into its component words, sentences, or paragraphs; tagging identifies parts of speech, people, places, and organizations.

The Natural Language framework can also use custom Core ML models to classify and tag text in specialized contexts.

The NSLinguisticTagger class is still available, but the Natural Language framework is the preferred mechanism for natural language processing.

Sample app: XamarinNL

To learn how to use the Natural Language framework with Xamarin.iOS, explore the following concepts:

Recognize languages.
Tokenize text into words and sentences.
Tag named entities and parts of speech.

Recognizing languages

The Recognizer tab of the sample app demonstrates how to use an NLLanguageRecognizer to determine the language for a block of text.

Note

Language recognition is a specific type of text classification. The Natural Language framework also supports custom text classification via developer-provided Core ML models. For more information, take a look at the Introducing Natural Language Framework session from WWDC 2018.

Dominant language

Tap the Language button to identify the dominant language in the user input.

The HandleDetermineLanguageButtonTap method of the LanguageRecognizerViewController calls the static NLLanguageRecognizer.GetDominantLanguage method to fetch the NLLanguage value for the primary language found in the text:

partial void HandleDetermineLanguageButtonTap(UIButton sender)
{
    UserInput.ResignFirstResponder();
    if (!String.IsNullOrWhiteSpace(UserInput.Text))
    {
        NLLanguage lang = NLLanguageRecognizer.GetDominantLanguage(UserInput.Text);
        DominantLanguageLabel.Text = lang.ToString();
    }
}

Language probabilities

Tap the Language probabilities button to fetch a list of language hypotheses for the user input.

The HandleLanguageProbabilitiesButtonTap method of the LanguageRecognizerViewController class instantiates an NLLanguageRecognizer and calls its Process method on the user's text. It then calls the recognizer's GetNativeLanguageHypotheses method, which returns a dictionary that maps each candidate language to its probability, limited to the requested maximum number of hypotheses (10 in this example). The LanguageRecognizerTableViewController class then renders these languages and probabilities.

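partial void HandleLanguageProbabilitiesButtonTap(UIButton sender)
{
    UserInput.ResignFirstResponder();
    if (!String.IsNullOrWhiteSpace(UserInput.Text))
    {
        var recognizer = new NLLanguageRecognizer();
        recognizer.Process(UserInput.Text);
        // Request at most 10 hypotheses. probabilities is assumed to be a field of type
        // NSDictionary<NSString, NSNumber> declared elsewhere in this view controller, so the
        // result is still available when the segue hands it to LanguageRecognizerTableViewController.
        probabilities = recognizer.GetNativeLanguageHypotheses(10);
        PerformSegue(ShowLanguageProbabilitiesSegue, this);
    }
}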
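GetNativeLanguageHypotheses returns a dictionary, which has no inherent order, so a table that lists the most likely language first has to sort the entries itself. The following is a minimal plain .NET sketch of that step, not code from the sample: it assumes the NSDictionary<NSString, NSNumber> has already been copied into an IDictionary<string, double> (for example by reading each NSNumber's DoubleValue), and RankHypotheses is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HypothesisRanking
{
    // Hypothetical helper (not part of the sample or the framework): orders language
    // hypotheses from most to least probable so they can be shown as table rows.
    // Keys model language identifier strings such as "en"; values model the
    // NSNumber probabilities after conversion to double.
    public static List<(string Language, double Probability)> RankHypotheses(
        IDictionary<string, double> hypotheses)
    {
        if (hypotheses == null)
            throw new ArgumentNullException(nameof(hypotheses));

        return hypotheses
            .OrderByDescending(pair => pair.Value)             // most probable first
            .ThenBy(pair => pair.Key, StringComparer.Ordinal)  // deterministic order for ties
            .Select(pair => (Language: pair.Key, Probability: pair.Value))
            .ToList();
    }
}
```

For example, given made-up probabilities of 0.2 for "fr", 0.7 for "en", and 0.1 for "de", the rows come back in the order "en", "fr", "de". Sorting in the display layer leaves the recognizer's result untouched, so the same dictionary can still be used elsewhere.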

Potential NLLanguage values include:

Amharic
Arabic
Armenian
Bengali
Bulgarian
Burmese
Catalan
Cherokee
Croatian
Czech
Danish
Dutch
English
Finnish
French
Georgian
German
Greek
Gujarati
Hebrew
Hindi
Hungarian
Icelandic
Indonesian
Italian
Japanese
Kannada
Khmer
Korean
Lao
Malay
Malayalam
Marathi
Mongolian
Norwegian
Oriya
Persian
Polish
Portuguese
Punjabi
Romanian
Russian
SimplifiedChinese
Sinhalese
Slovak
Spanish
Swedish
Tamil
Telugu
Thai
Tibetan
TraditionalChinese
Turkish
Ukrainian
Undetermined
Urdu
Vietnamese

A full list of supported languages is available as part of the NLLanguage enum API documentation.

Tokenizing text into words, sentences, and paragraphs

The Tokenizer tab of the sample app demonstrates how to separate a block of text into its component words or sentences with an NLTokenizer.

Tap the Words or Sentences button to fetch a list of tokens. Each token is associated with a word or sentence in the original text.

ShowTokens splits the user's input into tokens by creating an NLTokenizer for the requested NLTokenUnit and calling its GetTokens method. This method returns an array of NSValue objects, each wrapping an NSRange value that corresponds to a token in the original text.

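void ShowTokens(NLTokenUnit unit)
{
    if (!String.IsNullOrWhiteSpace(UserInput.Text))
    {
        var tokenizer = new NLTokenizer(unit);
        tokenizer.String = UserInput.Text;
        // NSRange counts UTF-16 code units, the same unit as string.Length in .NET.
        var range = new NSRange(0, UserInput.Text.Length);
        // tokens is assumed to be an NSValue[] field declared elsewhere in this view controller,
        // so the result is still available when the segue hands it to LanguageTokenizerTableViewController.
        tokens = tokenizer.GetTokens(range);
        PerformSegue(ShowTokensSegue, this);
    }
}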

LanguageTokenizerTableViewController renders a single token in each table cell. It extracts an NSRange from a token NSValue, finds the corresponding string in the original text, and sets a label on the table view cell:

public override UITableViewCell GetCell(UITableView tableView, NSIndexPath indexPath)
{
    var cell = TableView.DequeueReusableCell(TokenCell);
    NSRange range = Tokens[indexPath.Row].RangeValue;
    cell.TextLabel.Text = Text.Substring((int)range.Location, (int)range.Length);
    return cell;
}
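The Substring call works because an NSRange counts UTF-16 code units, which is also how a .NET string is indexed, so Location and Length can be passed straight to Substring even when the text contains characters outside the Basic Multilingual Plane, such as emoji. The plain .NET sketch below illustrates this under stated assumptions: each NSRange is modeled as a (Location, Length) tuple, the ranges are written by hand rather than produced by NLTokenizer, and ExtractTokens is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;

public static class TokenRanges
{
    // Hypothetical helper: copies the text covered by each (Location, Length) range.
    // The tuples stand in for NSRange values unwrapped from NLTokenizer's NSValue[];
    // both count UTF-16 code units, so they index a .NET string directly.
    // Substring throws ArgumentOutOfRangeException if a range exceeds the text.
    public static List<string> ExtractTokens(
        string text, IEnumerable<(int Location, int Length)> ranges)
    {
        var tokens = new List<string>();
        foreach (var (location, length) in ranges)
        {
            tokens.Add(text.Substring(location, length));
        }
        return tokens;
    }
}
```

In the string "Hi \U0001F44B there", the waving-hand emoji occupies two UTF-16 code units (offsets 3 and 4), so "there" starts at offset 6 and the whole string has a Length of 11. Offsets computed by counting user-perceived characters instead would place every token after the emoji one position too early.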

Tagging named entities and parts of speech

The Tagger tab of the XamarinNL sample app demonstrates how to use the NLTagger class to associate categories with tokens of an input string. The Natural Language framework includes built-in support for recognizing people, places, organizations, and parts of speech.

Note

The Natural Language framework also supports custom tagging schemes via developer-provided Core ML models. For more information, take a look at the Introducing Natural Language Framework session from WWDC 2018.

Tap the Named entities or Parts of speech button to fetch:

An array of NSValue objects, each wrapping an NSRange for a token in the original text.
An array of NLTag values, where each tag is the category of the token at the same array index.

In LanguageTaggerViewController, HandlePartsOfSpeechButtonTap and HandleNamedEntitiesButtonTap each call ShowTags, passing along an NLTagScheme – either NLTagScheme.LexicalClass (for parts of speech) or NLTagScheme.NameType (for named entities).

ShowTags creates an NLTagger, initializing it with an array of the NLTagScheme values it will be queried for (in this case, only the passed-in NLTagScheme). It then calls the tagger's GetTags method to tag the user input word by word, omitting whitespace, and receives both the tags and the NSValue-wrapped ranges of the tokens they describe.

void ShowTags(NLTagScheme tagScheme)
{
    if (!String.IsNullOrWhiteSpace(UserInput.Text))
    {
        // Create a tagger that will only be queried for the requested scheme.
        var tagger = new NLTagger(new NLTagScheme[] { tagScheme });
        var range = new NSRange(0, UserInput.Text.Length);
        tagger.String = UserInput.Text;

        // tags (NLTag[]) and tokenRanges (NSValue[]) are assumed to be fields declared elsewhere
        // in this view controller. The arrays are parallel: tags[i] categorizes tokenRanges[i].
        tags = tagger.GetTags(range, NLTokenUnit.Word, tagScheme, NLTaggerOptions.OmitWhitespace, out NSValue[] ranges);
        tokenRanges = ranges;
        detailViewTitle = tagScheme == NLTagScheme.NameType ? "Named Entities" : "Parts of Speech";

        PerformSegue(ShowEntitiesSegue, this);
    }
}

The tags are then displayed in a table by the LanguageTaggerTableViewController.
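Because GetTags returns two parallel arrays, a table controller can pair each tag with the token at the same index to build its rows. The plain .NET sketch below shows that pairing and is not taken from the sample: the (Location, Length) tuples stand in for NSValue-wrapped NSRange values, plain strings stand in for NLTag values, and PairTags is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;

public static class TagPairing
{
    // Hypothetical helper: zips parallel tag and range arrays into (Token, Tag) rows,
    // mirroring how each tag from GetTags describes the token range at the same index.
    public static List<(string Token, string Tag)> PairTags(
        string text, string[] tags, (int Location, int Length)[] ranges)
    {
        // The pairing is only meaningful if the arrays line up index by index.
        if (tags.Length != ranges.Length)
            throw new ArgumentException("tags and ranges must have the same length.");

        var rows = new List<(string Token, string Tag)>(tags.Length);
        for (int i = 0; i < tags.Length; i++)
        {
            var (location, length) = ranges[i];
            rows.Add((text.Substring(location, length), tags[i]));
        }
        return rows;
    }
}
```

For example, with the text "Anna lives in Oslo", hand-written ranges (0, 4), (5, 5), (11, 2), and (14, 4), and illustrative tags "PersonalName", "OtherWord", "OtherWord", and "PlaceName" (not actual tagger output), the last row pairs "Oslo" with "PlaceName".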

Potential NLTag values include:

Adjective
Adverb
Classifier
CloseParenthesis
CloseQuote
Conjunction
Dash
Determiner
Idiom
Interjection
Noun
Number
OpenParenthesis
OpenQuote
OrganizationName
Other
OtherPunctuation
OtherWhitespace
OtherWord
ParagraphBreak
Particle
PersonalName
PlaceName
Preposition
Pronoun
Punctuation
SentenceTerminator
Verb
Whitespace
Word
WordJoiner

A full list of supported tags is available as part of the NLTag enum API documentation.