WordsSegmenter Class

Definition

A segmenter class that splits provided text into words.

The language supplied when this object is constructed is matched against the languages with word breakers on the system, and the best word segmentation rules available are used. The language need not be one of the app's supported languages. If there are no supported language rules available specifically for that language, the language-neutral rules are used (an implementation of Unicode Standard Annex #29 Unicode Text Segmentation), and the ResolvedLanguage property is set to "und" (undetermined language).

public sealed class WordsSegmenter
Public NotInheritable Class WordsSegmenter
Attributes
Windows 10 requirements
Device family
Windows 10 (introduced v10.0.10240.0)
API contract
Windows.Foundation.UniversalApiContract (introduced v1)

Constructors

WordsSegmenter(String)

Creates a WordsSegmenter object. See the introduction in WordsSegmenter for a description of how the language supplied to this constructor is used.

public WordsSegmenter(String language)
Public Sub New(language As String)
Parameters
language
System.String

A BCP-47 language tag.

Attributes
Additional features and requirements
Device family
Windows 10 (introduced v10.0.10240.0)
API contract
Windows.Foundation.UniversalApiContract (introduced v1)

Properties

ResolvedLanguage

Gets the language of the rules used by this WordsSegmenter object.

"und" (undetermined language) is returned when language-neutral rules are in use.

public string ResolvedLanguage { get; }
Public ReadOnly Property ResolvedLanguage As String
Value
string

The BCP-47 language tag of the rules employed.

Attributes
Additional features and requirements
Device family
Windows 10 (introduced v10.0.10240.0)
API contract
Windows.Foundation.UniversalApiContract (introduced v1)
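
The fallback described above can be checked at run time by comparing ResolvedLanguage with "und". The following is a minimal sketch, not an official sample: the class name SegmenterLanguageExample and the helper UsesLanguageNeutralRules are hypothetical names introduced for illustration, and the case-insensitive comparison is a defensive assumption.

```csharp
using System;
using Windows.Data.Text;

public static class SegmenterLanguageExample
{
    // Hypothetical helper: returns true when no language-specific word
    // breaker was matched and the language-neutral rules are in use.
    public static bool UsesLanguageNeutralRules(string languageTag)
    {
        // languageTag is a BCP-47 tag such as "en-US" or "ja".
        var segmenter = new WordsSegmenter(languageTag);

        // ResolvedLanguage is "und" when the language-neutral rules apply.
        // The case-insensitive comparison is an assumption made for safety.
        return string.Equals(
            segmenter.ResolvedLanguage, "und", StringComparison.OrdinalIgnoreCase);
    }
}
```

An app might call UsesLanguageNeutralRules once per language and log the result, since the outcome depends on which word breakers are installed on the system.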

Methods

GetTokenAt(String, UInt32)

Determines and returns the word which contains or follows a specified index into the provided text.

public WordSegment GetTokenAt(String text, UInt32 startIndex)
Public Function GetTokenAt(text As String, startIndex As UInt32) As WordSegment
Parameters
text
System.String

Provided text from which the word is to be returned.

startIndex
System.UInt32

A zero-based index into text. It must be less than the length of text.

Returns

A WordSegment object that represents the word that contains or follows startIndex.

Attributes
Additional features and requirements
Device family
Windows 10 (introduced v10.0.10240.0)
API contract
Windows.Foundation.UniversalApiContract (introduced v1)
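
A typical use of GetTokenAt is to find the word at a caret or tap position. The sketch below is illustrative only: GetTokenAtExample and WordAt are hypothetical names, the "en-US" default is an arbitrary choice, and the null handling is a defensive assumption because this page does not state what is returned when no word is found.

```csharp
using Windows.Data.Text;

public static class GetTokenAtExample
{
    // Hypothetical helper: returns the text of the word that contains or
    // follows index, or null if the index is out of range.
    public static string WordAt(string text, uint index, string languageTag = "en-US")
    {
        // startIndex must be less than the length of text, so reject
        // empty input and out-of-range indexes before calling the API.
        if (string.IsNullOrEmpty(text) || index >= text.Length)
        {
            return null;
        }

        var segmenter = new WordsSegmenter(languageTag);
        WordSegment segment = segmenter.GetTokenAt(text, index);

        // Assumption: guard against a null segment rather than rely on
        // undocumented behavior.
        return segment?.Text;
    }
}
```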

GetTokens(String)

Determines and returns all of the words in the provided text.

public IVectorView<WordSegment> GetTokens(String text)
Public Function GetTokens(text As String) As IVectorView(Of WordSegment)
Parameters
text
System.String

Provided text containing words to be returned.

Returns

A collection of WordSegment objects that represent the words.

Attributes
Additional features and requirements
Device family
Windows 10 (introduced v10.0.10240.0)
API contract
Windows.Foundation.UniversalApiContract (introduced v1)
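
When every word is needed at once, GetTokens returns them as a collection. The following sketch lists each word with its position in the source string, using the Text and SourceTextSegment members of WordSegment. GetTokensExample and DescribeWords are hypothetical names, and the empty-input guard is a defensive assumption rather than documented behavior.

```csharp
using System.Collections.Generic;
using Windows.Data.Text;

public static class GetTokensExample
{
    // Hypothetical helper: describes each word found in text together
    // with its start position and length in the source string.
    public static List<string> DescribeWords(string text, string languageTag = "en-US")
    {
        var lines = new List<string>();

        // Assumption: skip the call entirely for null or empty input.
        if (string.IsNullOrEmpty(text))
        {
            return lines;
        }

        var segmenter = new WordsSegmenter(languageTag);

        foreach (WordSegment word in segmenter.GetTokens(text))
        {
            // SourceTextSegment locates the word within the original text.
            TextSegment source = word.SourceTextSegment;
            lines.Add($"{word.Text} (start {source.StartPosition}, length {source.Length})");
        }

        return lines;
    }
}
```

For long documents where only a few words near a position are needed, Tokenize is likely the better fit, because GetTokens materializes the whole collection.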

Tokenize(String, UInt32, WordSegmentsTokenizingHandler)

Calls the provided handler with two iterators that iterate through the words prior to and following a given index into the provided text.

public void Tokenize(String text, UInt32 startIndex, WordSegmentsTokenizingHandler handler)
Public Sub Tokenize(text As String, startIndex As UInt32, handler As WordSegmentsTokenizingHandler)
Parameters
text
System.String

Provided text containing words to be returned.

startIndex
System.UInt32

A zero-based index into text. It must be less than the length of text.

handler
WordSegmentsTokenizingHandler

The function that receives the two iterators.

Attributes
Additional features and requirements
Device family
Windows 10 (introduced v10.0.10240.0)
API contract
Windows.Foundation.UniversalApiContract (introduced v1)

Remarks

The iterators in WordSegmentsTokenizingHandler are lazy and evaluate small chunks of text at a time.

The handler is called at most once per call to Tokenize(String, UInt32, WordSegmentsTokenizingHandler). The handler is not called if there are no selectable words in text.
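
Because the iterators are lazy, a handler can stop early and avoid segmenting the rest of the text. The sketch below collects at most count words that contain or follow startIndex. It is illustrative only: TokenizeExample and NextWords are hypothetical names, and it assumes the handler runs before Tokenize returns, which this page does not state explicitly.

```csharp
using System.Collections.Generic;
using System.Linq;
using Windows.Data.Text;

public static class TokenizeExample
{
    // Hypothetical helper: returns up to count words that contain or
    // follow startIndex. The list stays empty if the handler is not called.
    public static List<string> NextWords(
        string text, uint startIndex, int count, string languageTag = "en-US")
    {
        var result = new List<string>();

        // startIndex must be less than the length of text.
        if (string.IsNullOrEmpty(text) || startIndex >= text.Length)
        {
            return result;
        }

        var segmenter = new WordsSegmenter(languageTag);

        // The lambda is converted to a WordSegmentsTokenizingHandler.
        // precedingWords iterates the words before startIndex (unused here);
        // words iterates the words that contain or follow startIndex.
        segmenter.Tokenize(text, startIndex, (precedingWords, words) =>
        {
            // Take(count) stops the enumeration early, so only as much
            // text as needed is evaluated.
            foreach (WordSegment word in words.Take(count))
            {
                result.Add(word.Text);
            }
        });

        return result;
    }
}
```

The iterators are consumed inside the handler on purpose: this page does not say whether they remain valid after the handler returns, so copying the needed values out is the cautious choice.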