Microsoft.ML.Transforms.Text Namespace

Namespace containing text data transformation components.

Classes

CustomStopWordsRemovingEstimator

IEstimator<TTransformer> for the CustomStopWordsRemovingTransformer.

CustomStopWordsRemovingEstimator.Options

Uses a stop words remover that can remove a language-specific list of stop words (most common words) already defined in the system.

CustomStopWordsRemovingTransformer

ITransformer resulting from fitting a CustomStopWordsRemovingEstimator.

LatentDirichletAllocationEstimator

The LDA transform implements LightLDA, a state-of-the-art implementation of Latent Dirichlet Allocation.

LatentDirichletAllocationTransformer

ITransformer resulting from fitting a LatentDirichletAllocationEstimator.

LatentDirichletAllocationTransformer.ModelParameters

Provide details about the topics discovered by LightLDA.

NgramExtractingEstimator

Produces a vector of counts of n-grams (sequences of consecutive words) encountered in the input text.

NgramExtractingTransformer

ITransformer resulting from fitting an NgramExtractingEstimator.

NgramHashingEstimator

IEstimator<TTransformer> for the NgramHashingTransformer.

NgramHashingTransformer

ITransformer resulting from fitting an NgramHashingEstimator.

StopWordsRemovingEstimator

IEstimator<TTransformer> for the StopWordsRemovingTransformer.

StopWordsRemovingEstimator.Options

Uses a stop words remover that can remove a language-specific list of stop words (most common words) already defined in the system.

StopWordsRemovingTransformer

ITransformer resulting from fitting a StopWordsRemovingEstimator.

TextFeaturizingEstimator

An estimator that turns a collection of text documents into numerical feature vectors. The feature vectors are normalized counts of word and/or character n-grams (based on the options supplied).

TextFeaturizingEstimator.Options

Advanced options for the TextFeaturizingEstimator.

TextNormalizingEstimator

IEstimator<TTransformer> for the TextNormalizingTransformer.

TextNormalizingTransformer

ITransformer resulting from fitting a TextNormalizingEstimator.

TokenizingByCharactersEstimator

IEstimator<TTransformer> for the TokenizingByCharactersTransformer.

TokenizingByCharactersTransformer

ITransformer resulting from fitting a TokenizingByCharactersEstimator.

WordBagEstimator

IEstimator<TTransformer> whose fitted ITransformer produces a bag of n-gram counts from the input text.

WordBagEstimator.Options

Options for how the n-grams are extracted.

WordEmbeddingEstimator

Text featurizer which converts vectors of text tokens into a numerical vector using a pre-trained embeddings model.

WordEmbeddingTransformer

ITransformer resulting from fitting a WordEmbeddingEstimator.

WordHashBagEstimator

IEstimator<TTransformer> whose fitted ITransformer produces a bag of hashed n-gram counts from the input text.

WordTokenizingEstimator

Tokenizes input text using specified delimiters.

WordTokenizingTransformer

ITransformer resulting from fitting a WordTokenizingEstimator.
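
The tokenizing, stop words removing, and n-gram extracting components listed above can be pictured with a small conceptual sketch. The Python below is not the Microsoft.ML API: the function names, the delimiter handling, and the tiny stop word list are all hypothetical and chosen only to illustrate the idea of splitting text on delimiters, dropping common words, and counting sequences of consecutive words.

```python
from collections import Counter

# Hypothetical stop word list for illustration only;
# the library defines its own language-specific lists.
STOP_WORDS = {"the", "a", "is", "on"}

def tokenize(text, delimiters=(" ",)):
    """Split text on any of the given single-character delimiters."""
    tokens, current = [], []
    for ch in text:
        if ch in delimiters:
            if current:  # skip empty tokens caused by repeated delimiters
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens whose lowercase form is in the stop word set."""
    return [t for t in tokens if t.lower() not in stop_words]

def count_ngrams(tokens, n=2):
    """Count each run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

tokens = tokenize("the cat sat on the mat")
content = remove_stop_words(tokens)
bigrams = count_ngrams(content, n=2)
```

In this sketch the stop words are removed before the n-grams are counted, so the bigrams span only the remaining content words. Whether that ordering suits a real pipeline is a design choice that depends on the task.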

Structs

LatentDirichletAllocationTransformer.ModelParameters.ItemScore
LatentDirichletAllocationTransformer.ModelParameters.WordItemScore

Interfaces

IStopWordsRemoverOptions

Defines the different types of stop words removers supported.

Enums

NgramExtractingEstimator.WeightingCriteria

A statistical measure used to evaluate how important a word is to a document in a corpus. This enumeration is serialized.

StopWordsRemovingEstimator.Language

Stop words language. This enumeration is serialized.

TextFeaturizingEstimator.Language

Text language. This enumeration is serialized.

TextFeaturizingEstimator.NormFunction

Text vector normalizer kind.

TextNormalizingEstimator.CaseMode

Case normalization mode of text. This enumeration is serialized.

WordEmbeddingEstimator.PretrainedModelKind

Specifies which word embeddings to use.
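
As a rough illustration of what a weighting criterion and a vector normalizer do to n-gram counts, the following Python sketch computes term frequency, inverse document frequency, and their product, then applies L2 normalization. It uses the common textbook definition idf = log(N / df), which is an assumption here; the formulas the library actually applies may differ, and every function name below is hypothetical.

```python
import math
from collections import Counter

def term_frequencies(doc_tokens):
    """Raw count of each term in one document."""
    return Counter(doc_tokens)

def inverse_document_frequencies(corpus):
    """Textbook IDF: log(number of documents / documents containing the term)."""
    n_docs = len(corpus)
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}

def tf_idf(doc_tokens, idf):
    """Weight each term count by its IDF; unseen terms get weight 0."""
    tf = term_frequencies(doc_tokens)
    return {term: count * idf.get(term, 0.0) for term, count in tf.items()}

def l2_normalize(weights):
    """Scale the weights so their squared values sum to 1."""
    norm = math.sqrt(sum(v * v for v in weights.values()))
    return {k: (v / norm if norm else 0.0) for k, v in weights.items()}

corpus = [["cat", "sat", "mat"], ["cat", "ran"], ["dog", "sat"]]
idf = inverse_document_frequencies(corpus)
weights = tf_idf(corpus[0], idf)
normalized = l2_normalize(weights)
```

In this toy corpus "mat" appears in only one document, so it receives a larger weight in the first document than "cat" or "sat", which each appear in two.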