TransformsCatalog.TextTransforms Class

Reference

Definition

Namespace: Microsoft.ML

Assembly: Microsoft.ML.Data.dll

Package: Microsoft.ML v3.0.1 (also listed: v1.0.0, v1.1.0, v1.2.0, v1.3.1, v1.4.0, v1.5.5, v1.6.0, v1.7.0, v2.0.0)

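Important

Some information relates to prerelease product that may be substantially modified before it is released. Microsoft makes no warranties, express or implied, with respect to the information provided here.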

Class used by MLContext to create instances of text data transform components.

public sealed class TransformsCatalog.TextTransforms

type TransformsCatalog.TextTransforms = class

Public NotInheritable Class TransformsCatalog.TextTransforms

Inheritance: Object → TransformsCatalog.TextTransforms

Extension Methods

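ApplyWordEmbedding(TransformsCatalog+TextTransforms, String, String, WordEmbeddingEstimator+PretrainedModelKind)	Creates a WordEmbeddingEstimator, a text featurizer that converts vectors of text into numerical vectors using a pre-trained embedding model.
ApplyWordEmbedding(TransformsCatalog+TextTransforms, String, String, String)	Creates a WordEmbeddingEstimator, a text featurizer that converts vectors of text into numerical vectors using a pre-trained embedding model.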
FeaturizeText(TransformsCatalog+TextTransforms, String, TextFeaturizingEstimator+Options, String[])	Creates a TextFeaturizingEstimator, which transforms a text column into a featurized vector of Single representing normalized counts of n-grams and char-grams.
FeaturizeText(TransformsCatalog+TextTransforms, String, String)	Creates a TextFeaturizingEstimator, which transforms a text column into a featurized vector of Single representing normalized counts of n-grams and char-grams.
LatentDirichletAllocation(TransformsCatalog+TextTransforms, String, String, Int32, Single, Single, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Boolean)	Creates a LatentDirichletAllocationEstimator, which uses LightLDA to transform text (represented as a vector of floats) into a vector of Single indicating the similarity of the text to each identified topic.
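NormalizeText(TransformsCatalog+TextTransforms, String, String, TextNormalizingEstimator+CaseMode, Boolean, Boolean, Boolean)	Creates a TextNormalizingEstimator, which normalizes the incoming text in `inputColumnName` by optionally changing case and removing diacritical marks, punctuation marks, and numbers, and outputs the new text as `outputColumnName`.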
ProduceHashedNgrams(TransformsCatalog+TextTransforms, String, String, Int32, Int32, Int32, Boolean, UInt32, Boolean, Int32, Boolean)	Creates an NgramHashingEstimator, which copies the data from the column specified in `inputColumnName` to a new column, `outputColumnName`, and produces a vector of counts of hashed n-grams.
ProduceHashedNgrams(TransformsCatalog+TextTransforms, String, String[], Int32, Int32, Int32, Boolean, UInt32, Boolean, Int32, Boolean)	Creates an NgramHashingEstimator, which takes the data from the multiple columns specified in `inputColumnNames` into a new column, `outputColumnName`, and produces a vector of counts of hashed n-grams.
ProduceHashedWordBags(TransformsCatalog+TextTransforms, String, String, Int32, Int32, Int32, Boolean, UInt32, Boolean, Int32)	Creates a WordHashBagEstimator, which maps the column specified in `inputColumnName` to a vector of counts of hashed n-grams in a new column named `outputColumnName`.
ProduceHashedWordBags(TransformsCatalog+TextTransforms, String, String[], Int32, Int32, Int32, Boolean, UInt32, Boolean, Int32)	Creates a WordHashBagEstimator, which maps the multiple columns specified in `inputColumnNames` to a vector of counts of hashed n-grams in a new column named `outputColumnName`.
ProduceNgrams(TransformsCatalog+TextTransforms, String, String, Int32, Int32, Boolean, Int32, NgramExtractingEstimator+WeightingCriteria)	Creates an NgramExtractingEstimator, which produces a vector of counts of n-grams (sequences of consecutive words) encountered in the input text.
ProduceWordBags(TransformsCatalog+TextTransforms, String, Char, Char, String, Int32)	Creates a WordBagEstimator, which maps the column specified in `inputColumnName` to a vector of n-gram counts in a new column named `outputColumnName`.
ProduceWordBags(TransformsCatalog+TextTransforms, String, String, Int32, Int32, Boolean, Int32, NgramExtractingEstimator+WeightingCriteria)	Creates a WordBagEstimator, which maps the column specified in `inputColumnName` to a vector of n-gram counts in a new column named `outputColumnName`.
ProduceWordBags(TransformsCatalog+TextTransforms, String, String[], Int32, Int32, Boolean, Int32, NgramExtractingEstimator+WeightingCriteria)	Creates a WordBagEstimator, which maps the multiple columns specified in `inputColumnNames` to a vector of n-gram counts in a new column named `outputColumnName`.
RemoveDefaultStopWords(TransformsCatalog+TextTransforms, String, String, StopWordsRemovingEstimator+Language)	Creates a StopWordsRemovingEstimator, which copies the data from the column specified in `inputColumnName` to a new column, `outputColumnName`, and removes from it the predefined set of stop words specific to `language`.
RemoveStopWords(TransformsCatalog+TextTransforms, String, String, String[])	Creates a CustomStopWordsRemovingEstimator, which copies the data from the column specified in `inputColumnName` to a new column, `outputColumnName`, and removes from it the text specified in `stopwords`.
TokenizeIntoCharactersAsKeys(TransformsCatalog+TextTransforms, String, String, Boolean)	Creates a TokenizingByCharactersEstimator, which tokenizes by splitting text into sequences of characters using a sliding window.
TokenizeIntoWords(TransformsCatalog+TextTransforms, String, String, Char[])	Creates a WordTokenizingEstimator, which tokenizes the input text using `separators` as the delimiters.
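
The following is a minimal sketch of how a few of these extension methods might be chained through `mlContext.Transforms.Text`. It assumes the Microsoft.ML NuGet package is referenced. The `Review` and `Tokens` classes, the sample sentences, and the column names (`Text`, `Normalized`, `Tokens`, `Words`) are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Transforms.Text;

// Hypothetical input row: one free-text column.
public class Review
{
    public string Text { get; set; }
}

// Hypothetical output row: the tokens left after stop word removal.
public class Tokens
{
    public string[] Words { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var mlContext = new MLContext(seed: 0);

        // Illustrative in-memory data (not taken from the documentation).
        var samples = new List<Review>
        {
            new Review { Text = "ML.NET makes text featurization simple!" },
            new Review { Text = "The quick brown fox jumps over the lazy dog." }
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);

        // NormalizeText -> TokenizeIntoWords -> RemoveDefaultStopWords.
        // Each call takes the output column name first, then the input column name.
        var pipeline = mlContext.Transforms.Text
            .NormalizeText("Normalized", "Text",
                caseMode: TextNormalizingEstimator.CaseMode.Lower,
                keepDiacritics: false,
                keepPunctuations: false,
                keepNumbers: true)
            .Append(mlContext.Transforms.Text.TokenizeIntoWords(
                "Tokens", "Normalized", separators: new[] { ' ' }))
            .Append(mlContext.Transforms.Text.RemoveDefaultStopWords(
                "Words", "Tokens",
                language: StopWordsRemovingEstimator.Language.English));

        // Fit the estimator chain and apply the resulting transformer.
        IDataView transformed = pipeline.Fit(data).Transform(data);

        // Read back only the "Words" column and print the remaining tokens.
        foreach (var row in mlContext.Data.CreateEnumerable<Tokens>(
                     transformed, reuseRowObject: false))
        {
            Console.WriteLine(string.Join(" | ", row.Words));
        }
    }
}
```

The same pattern should apply to the other methods in the table: each returns an estimator that can be appended to a pipeline, fitted on an `IDataView`, and then used to transform data.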

Applies to
