TransformsCatalog.TextTransforms Class

Definition

Class used by MLContext to create instances of text data transform components.

public sealed class TransformsCatalog.TextTransforms
type TransformsCatalog.TextTransforms = class
Public NotInheritable Class TransformsCatalog.TextTransforms
Inheritance
TransformsCatalog.TextTransforms
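
As an illustration of how this class is typically reached, the sketch below obtains it through the `Transforms.Text` property of an `MLContext` and calls `FeaturizeText`, one of the extension methods listed under Extension Methods. The `TextInput` and `TextFeatures` classes, the column names, and the sample sentences are assumptions made for this example only, and the code assumes the Microsoft.ML NuGet package is referenced.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;

// Hypothetical input row type: a single text column named "Text".
public class TextInput
{
    public string Text { get; set; } = "";
}

// Hypothetical output row type: receives the "Features" column.
public class TextFeatures
{
    public float[] Features { get; set; } = Array.Empty<float>();
}

public static class Program
{
    public static void Main()
    {
        var mlContext = new MLContext();

        // Made-up sample data, used only for illustration.
        var samples = new List<TextInput>
        {
            new TextInput { Text = "The quick brown fox jumps over the lazy dog." },
            new TextInput { Text = "Text transforms turn strings into numeric features." }
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);

        // mlContext.Transforms.Text is the TransformsCatalog.TextTransforms instance.
        var pipeline = mlContext.Transforms.Text.FeaturizeText("Features", "Text");

        // Fit the estimator and apply the resulting transformer to the same data.
        IDataView transformed = pipeline.Fit(data).Transform(data);

        // Read the output back as strongly typed rows.
        foreach (var row in mlContext.Data.CreateEnumerable<TextFeatures>(
                     transformed, reuseRowObject: false))
        {
            Console.WriteLine($"Feature vector length: {row.Features.Length}");
        }
    }
}
```

The length of the feature vector depends on the n-grams and char-grams found in the training data, so no specific value is shown here.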

Extension Methods

ApplyWordEmbedding(TransformsCatalog+TextTransforms, String, String, WordEmbeddingEstimator+PretrainedModelKind)

Create a WordEmbeddingEstimator, which is a text featurizer that converts a vector of text into a numerical vector using pre-trained embeddings models.

ApplyWordEmbedding(TransformsCatalog+TextTransforms, String, String, String)

Create a WordEmbeddingEstimator, which is a text featurizer that converts vectors of text into numerical vectors using pre-trained embeddings models.

FeaturizeText(TransformsCatalog+TextTransforms, String, TextFeaturizingEstimator+Options, String[])

Create a TextFeaturizingEstimator, which transforms a text column into a featurized vector of Single that represents normalized counts of n-grams and char-grams.

FeaturizeText(TransformsCatalog+TextTransforms, String, String)

Create a TextFeaturizingEstimator, which transforms a text column into a featurized vector of Single that represents normalized counts of n-grams and char-grams.

LatentDirichletAllocation(TransformsCatalog+TextTransforms, String, String, Int32, Single, Single, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Boolean)

Create a LatentDirichletAllocationEstimator, which uses LightLDA to transform text (represented as a vector of floats) into a vector of Single indicating the similarity of the text with each topic identified.

NormalizeText(TransformsCatalog+TextTransforms, String, String, TextNormalizingEstimator+CaseMode, Boolean, Boolean, Boolean)

Create a TextNormalizingEstimator, which normalizes incoming text in inputColumnName by optionally changing case and removing diacritical marks, punctuation marks, and numbers, and outputs the new text as outputColumnName.

ProduceHashedNgrams(TransformsCatalog+TextTransforms, String, String, Int32, Int32, Int32, Boolean, UInt32, Boolean, Int32, Boolean)

Create an NgramHashingEstimator, which copies the data from the column specified in inputColumnName to a new column, outputColumnName, and produces a vector of counts of hashed n-grams.

ProduceHashedNgrams(TransformsCatalog+TextTransforms, String, String[], Int32, Int32, Int32, Boolean, UInt32, Boolean, Int32, Boolean)

Create an NgramHashingEstimator, which takes the data from the multiple columns specified in inputColumnNames, writes it to a new column, outputColumnName, and produces a vector of counts of hashed n-grams.

ProduceHashedWordBags(TransformsCatalog+TextTransforms, String, String, Int32, Int32, Int32, Boolean, UInt32, Boolean, Int32)

Create a WordHashBagEstimator, which maps the column specified in inputColumnName to a vector of counts of hashed n-grams in a new column named outputColumnName.

ProduceHashedWordBags(TransformsCatalog+TextTransforms, String, String[], Int32, Int32, Int32, Boolean, UInt32, Boolean, Int32)

Create a WordHashBagEstimator, which maps the multiple columns specified in inputColumnNames to a vector of counts of hashed n-grams in a new column named outputColumnName.

ProduceNgrams(TransformsCatalog+TextTransforms, String, String, Int32, Int32, Boolean, Int32, NgramExtractingEstimator+WeightingCriteria)

Create an NgramExtractingEstimator, which produces a vector of counts of n-grams (sequences of consecutive words) encountered in the input text.

ProduceWordBags(TransformsCatalog+TextTransforms, String, String, Int32, Int32, Boolean, Int32, NgramExtractingEstimator+WeightingCriteria)

Create a WordBagEstimator, which maps the column specified in inputColumnName to a vector of n-gram counts in a new column named outputColumnName.

ProduceWordBags(TransformsCatalog+TextTransforms, String, String[], Int32, Int32, Boolean, Int32, NgramExtractingEstimator+WeightingCriteria)

Create a WordBagEstimator, which maps the multiple columns specified in inputColumnNames to a vector of n-gram counts in a new column named outputColumnName.

RemoveDefaultStopWords(TransformsCatalog+TextTransforms, String, String, StopWordsRemovingEstimator+Language)

Create a StopWordsRemovingEstimator, which copies the data from the column specified in inputColumnName to a new column, outputColumnName, and removes a predefined set of stop words specific to language from it.

RemoveStopWords(TransformsCatalog+TextTransforms, String, String, String[])

Create a CustomStopWordsRemovingEstimator, which copies the data from the column specified in inputColumnName to a new column, outputColumnName, and removes the text specified in stopwords from it.

TokenizeIntoCharactersAsKeys(TransformsCatalog+TextTransforms, String, String, Boolean)

Create a TokenizingByCharactersEstimator, which tokenizes by splitting text into sequences of characters using a sliding window.

TokenizeIntoWords(TransformsCatalog+TextTransforms, String, String, Char[])

Create a WordTokenizingEstimator, which splits input text into words using the characters in separators as delimiters.
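
Several of the methods above are commonly chained into one pipeline. The following is a minimal sketch, not a definitive recipe, that combines `NormalizeText`, `TokenizeIntoWords`, `RemoveDefaultStopWords`, and `ProduceWordBags`. The `Review` and `ProcessedReview` classes, all column names, the parameter values, and the sample sentences are assumptions chosen for illustration, and the code assumes the Microsoft.ML NuGet package is referenced.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Transforms.Text;

// Hypothetical input row type with a single text column.
public class Review
{
    public string Text { get; set; } = "";
}

// Hypothetical output row type; property names match the pipeline's output columns.
public class ProcessedReview
{
    public string Normalized { get; set; } = "";
    public string[] Words { get; set; } = Array.Empty<string>();
    public string[] FilteredWords { get; set; } = Array.Empty<string>();
    public float[] BagOfWords { get; set; } = Array.Empty<float>();
}

public static class Program
{
    public static void Main()
    {
        var mlContext = new MLContext();

        // Made-up sample data, used only for illustration.
        var reviews = new List<Review>
        {
            new Review { Text = "This is a GREAT product, and I would buy it again!" },
            new Review { Text = "The delivery was late and the box was damaged." }
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(reviews);

        var text = mlContext.Transforms.Text;

        // 1. Lower-case the text and strip diacritics and punctuation.
        // 2. Split the normalized text into words on spaces.
        // 3. Drop the built-in English stop words from the word vector.
        // 4. Separately, count unigrams in the normalized text.
        var pipeline = text.NormalizeText("Normalized", "Text",
                caseMode: TextNormalizingEstimator.CaseMode.Lower,
                keepDiacritics: false,
                keepPunctuations: false,
                keepNumbers: true)
            .Append(text.TokenizeIntoWords("Words", "Normalized",
                separators: new[] { ' ' }))
            .Append(text.RemoveDefaultStopWords("FilteredWords", "Words",
                language: StopWordsRemovingEstimator.Language.English))
            .Append(text.ProduceWordBags("BagOfWords", "Normalized",
                ngramLength: 1,
                useAllLengths: false,
                weighting: NgramExtractingEstimator.WeightingCriteria.Tf));

        IDataView transformed = pipeline.Fit(data).Transform(data);

        foreach (var row in mlContext.Data.CreateEnumerable<ProcessedReview>(
                     transformed, reuseRowObject: false))
        {
            Console.WriteLine($"Normalized:     {row.Normalized}");
            Console.WriteLine($"Filtered words: {string.Join(", ", row.FilteredWords)}");
            Console.WriteLine($"Bag size:       {row.BagOfWords.Length}");
        }
    }
}
```

Note that `ProduceWordBags` is applied here to the normalized text column directly, whereas `ProduceNgrams` and `ProduceHashedNgrams` operate on tokenized input, so the two families are not interchangeable in this position.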
