# Data transformations

Data transformations are used to:

- Prepare data for model training
- Apply an imported model in TensorFlow or ONNX format
- Post-process data after it has been passed through a model

## Column mapping and grouping

| Transform | Definition |
| --- | --- |
| `Concatenate` | Concatenate one or more input columns into a new output column |
| `CopyColumns` | Copy and rename one or more input columns |
| `DropColumns` | Drop one or more input columns |
| `SelectColumns` | Select one or more columns to keep from the input data |
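
The column operations above can be pictured on a single row of data. The following is a minimal Python sketch, not ML.NET code: rows are modeled as plain dictionaries, and the function names (`concatenate`, `select_columns`, `drop_columns`) and the sample column names are hypothetical.

```python
def concatenate(row, output, inputs):
    """Return a copy of the row with the input columns joined into one list column."""
    combined = []
    for name in inputs:
        value = row[name]
        # Vector columns are flattened; scalar columns contribute a single slot.
        combined.extend(value if isinstance(value, list) else [value])
    return {**row, output: combined}


def select_columns(row, keep):
    """Keep only the named columns."""
    return {name: row[name] for name in keep}


def drop_columns(row, drop):
    """Remove the named columns and keep everything else."""
    return {name: value for name, value in row.items() if name not in drop}


# Hypothetical sample row.
row = {"Age": 41.0, "Income": [52.0, 3.5], "Id": 7}
with_features = concatenate(row, "Features", ["Age", "Income"])
```

Note that selecting and dropping are complements of each other: one names the columns to keep, the other names the columns to remove.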

## Normalization and scaling

| Transform | Definition |
| --- | --- |
| `NormalizeMeanVariance` | Subtract the mean (of the training data) and divide by the variance (of the training data) |
| `NormalizeLogMeanVariance` | Normalize based on the logarithm of the training data |
| `NormalizeLpNorm` | Scale input vectors by their lp-norm, where p is 1, 2 or infinity. Defaults to the l2 (Euclidean distance) norm |
| `NormalizeGlobalContrast` | Scale each value in a row by subtracting the mean of the row data, dividing by either the standard deviation or the l2-norm (of the row data), and multiplying by a configurable scale factor (default 2) |
| `NormalizeBinning` | Assign the input value to a bin index and divide by the number of bins to produce a float value between 0 and 1. The bin boundaries are calculated to evenly distribute the training data across bins |
| `NormalizeSupervisedBinning` | Assign the input value to a bin based on its correlation with the label column |
| `NormalizeMinMax` | Scale the input by the difference between the minimum and maximum values in the training data |
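
To make two of these definitions concrete, here is a small Python sketch of textbook min-max scaling and lp-norm scaling. It is an illustration of the arithmetic only, under the assumption that min-max maps the training range onto [0, 1]; the library's exact handling of offsets and options is not shown, and all function names are hypothetical.

```python
import math


def fit_min_max(training_values):
    """Learn the minimum and maximum from the training data."""
    return min(training_values), max(training_values)


def apply_min_max(value, lo, hi):
    """Scale a value by the training range (textbook form, an assumption here)."""
    span = hi - lo
    return 0.0 if span == 0 else (value - lo) / span


def lp_normalize(vector, p=2):
    """Divide a vector by its lp-norm, for p = 1, 2 or math.inf."""
    if p == math.inf:
        norm = max(abs(v) for v in vector)
    else:
        norm = sum(abs(v) ** p for v in vector) ** (1.0 / p)
    # A zero vector has no direction, so it is returned unchanged.
    return list(vector) if norm == 0 else [v / norm for v in vector]


lo, hi = fit_min_max([2.0, 4.0, 10.0])
scaled = apply_min_max(4.0, lo, hi)
unit = lp_normalize([3.0, 4.0])
```

The statistics are learned once from the training data and then reused unchanged on new data, which is why fitting and applying are separate steps.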

## Conversions between data types

| Transform | Definition |
| --- | --- |
| `ConvertType` | Convert the type of an input column to a new type |
| `MapValue` | Map values to keys (categories) based on the supplied dictionary of mappings |
| `MapValueToKey` | Map values to keys (categories) by creating the mapping from the input data |
| `MapKeyToValue` | Convert keys back to their original values |
| `MapKeyToVector` | Convert keys back to vectors of original values |
| `MapKeyToBinaryVector` | Convert keys back to a binary vector of original values |
| `Hash` | Hash the value in the input column |
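
The round trip between values and keys can be sketched in a few lines of Python. This is an illustration rather than the library's implementation: it assumes keys are assigned in order of first appearance, start at 1, and that 0 stands for a value not seen while building the mapping. The function names are hypothetical.

```python
def fit_value_to_key(values):
    """Build a value -> key mapping from the input data."""
    mapping = {}
    for value in values:
        if value not in mapping:
            # Assumption of this sketch: keys start at 1, 0 means "unknown".
            mapping[value] = len(mapping) + 1
    return mapping


def map_value_to_key(value, mapping):
    """Look up the key for a value, falling back to 0 for unseen values."""
    return mapping.get(value, 0)


def map_key_to_value(key, mapping):
    """Convert a key back to its original value (None for the unknown key)."""
    inverse = {k: v for v, k in mapping.items()}
    return inverse.get(key)


colors = ["red", "green", "red", "blue"]
mapping = fit_value_to_key(colors)
keys = [map_value_to_key(c, mapping) for c in colors]
```

Because the mapping is learned from the data, the same fitted mapping has to be reused at prediction time for the keys to stay meaningful.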

## Text transformations

| Transform | Definition |
| --- | --- |
| `FeaturizeText` | Transform a text column into a float array of normalized ngram and char-gram counts |
| `TokenizeIntoWords` | Split one or more text columns into individual words |
| `TokenizeIntoCharactersAsKeys` | Split one or more text columns into individual characters, represented as keys |
| `NormalizeText` | Change case, remove diacritical marks, punctuation marks, and numbers |
| `ProduceNgrams` | Transform a text column into a bag of counts of ngrams (sequences of consecutive words) |
| `ProduceWordBags` | Transform a text column into a vector of bag-of-ngram counts |
| `ProduceHashedNgrams` | Transform a text column into a vector of hashed ngram counts |
| `ProduceHashedWordBags` | Transform a text column into a bag of hashed ngram counts |
| `RemoveDefaultStopWords` | Remove default stop words for the specified language from input columns |
| `RemoveStopWords` | Remove specified stop words from input columns |
| `LatentDirichletAllocation` | Transform a document (represented as a vector of floats) into a vector of floats over a set of topics |
| `ApplyWordEmbedding` | Convert vectors of text tokens into sentence vectors using a pre-trained model |
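
The difference between plain and hashed ngram counts is easiest to see side by side. The Python sketch below is an illustration under simplifying assumptions: tokens are split on whitespace, only ngrams of exactly length `n` are counted, and `zlib.crc32` stands in for whatever hash function the library uses. All function names are hypothetical.

```python
import zlib
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def ngram_counts(tokens, n):
    """Count each run of n consecutive tokens."""
    grams = (" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return Counter(grams)


def hashed_ngram_counts(tokens, n, num_buckets):
    """Add each ngram count to a fixed-size vector slot chosen by hashing."""
    vector = [0] * num_buckets
    for gram, count in ngram_counts(tokens, n).items():
        # crc32 is a stand-in hash for this sketch; colliding ngrams share a slot.
        vector[zlib.crc32(gram.encode("utf-8")) % num_buckets] += count
    return vector


tokens = tokenize("The cat sat on the cat")
bigrams = ngram_counts(tokens, 2)
hashed = hashed_ngram_counts(tokens, 2, num_buckets=8)
```

Hashing trades exactness for a fixed output size: the vector length is known before any data is seen, at the cost of possible collisions between different ngrams.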

## Image transformations

| Transform | Definition |
| --- | --- |
| `ConvertToGrayscale` | Convert an image to grayscale |
| `ConvertToImage` | Convert a vector of pixels to ImageDataViewType |
| `ExtractPixels` | Convert pixels from an input image into a vector of numbers |
| `ResizeImages` | Resize images |
| `DnnFeaturizeImage` | Apply a pre-trained deep neural network (DNN) model to transform an input image into a feature vector |

## Categorical data transformations

| Transform | Definition |
| --- | --- |
| `OneHotEncoding` | Convert one or more text columns into one-hot encoded vectors |
| `OneHotHashEncoding` | Convert one or more text columns into hash-based one-hot encoded vectors |
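
As a rough illustration of the two encodings, the Python sketch below builds a one-hot vector from a learned category index and a hash-based one-hot vector from a fixed number of slots. The slot ordering (sorted categories) and the use of `zlib.crc32` as the hash are assumptions of this sketch, and the function names are hypothetical.

```python
import zlib


def fit_one_hot(values):
    """Assign one vector slot per distinct category (sorted, by assumption)."""
    return {category: slot for slot, category in enumerate(sorted(set(values)))}


def one_hot(value, index):
    """Return a vector with a single 1.0 in the slot of the given category."""
    vector = [0.0] * len(index)
    slot = index.get(value)
    if slot is not None:  # unseen categories produce an all-zero vector
        vector[slot] = 1.0
    return vector


def one_hot_hash(value, num_slots):
    """Pick the slot by hashing, so no category index has to be learned."""
    vector = [0.0] * num_slots
    vector[zlib.crc32(value.encode("utf-8")) % num_slots] = 1.0
    return vector


index = fit_one_hot(["dog", "cat", "dog", "bird"])
encoded = one_hot("cat", index)
hashed = one_hot_hash("cat", num_slots=16)
```

The hash-based variant handles categories never seen during training, but two different categories can land in the same slot.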

## Time series data transformations

| Transform | Definition |
| --- | --- |
| `DetectAnomalyBySrCnn` | Detect anomalies in the input time series data using the Spectral Residual (SR) algorithm |
| `DetectChangePointBySsa` | Detect change points in time series data using singular spectrum analysis (SSA) |
| `DetectIidChangePoint` | Detect change points in independent and identically distributed (IID) time series data using adaptive kernel density estimations and martingale scores |
| `ForecastBySsa` | Forecast time series data using singular spectrum analysis (SSA) |
| `DetectSpikeBySsa` | Detect spikes in time series data using singular spectrum analysis (SSA) |
| `DetectIidSpike` | Detect spikes in independent and identically distributed (IID) time series data using adaptive kernel density estimations and martingale scores |

## Missing values

| Transform | Definition |
| --- | --- |
| `IndicateMissingValues` | Create a new boolean output column, the value of which is true when the value in the input column is missing |
| `ReplaceMissingValues` | Create a new output column, the value of which is set to a default value if the value is missing from the input column, and the input value otherwise |
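
The two missing-value transforms pair naturally: one records where values were missing, the other fills them in. The Python sketch below assumes missing numeric values are represented as NaN and that the replacement value is supplied by the caller; the function names are hypothetical.

```python
import math


def indicate_missing(column):
    """Return True for each position where the input value is missing (NaN)."""
    return [math.isnan(value) for value in column]


def replace_missing(column, replacement=0.0):
    """Substitute the replacement for missing values and keep the rest."""
    return [replacement if math.isnan(value) else value for value in column]


column = [1.0, float("nan"), 3.0]
flags = indicate_missing(column)
filled = replace_missing(column, replacement=0.0)
```

Keeping the indicator column alongside the filled column lets a model distinguish a genuine 0.0 from a value that was filled in.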

## Feature selection

| Transform | Definition |
| --- | --- |
| `SelectFeaturesBasedOnCount` | Select features whose count of non-default values is greater than a threshold |
| `SelectFeaturesBasedOnMutualInformation` | Select the features on which the data in the label column is most dependent |
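
Count-based selection can be sketched as follows in Python. The sketch counts, for each feature slot, how many rows hold a non-default value and keeps the slots whose count exceeds the threshold. Whether the comparison is strict or inclusive is an assumption here, as is treating 0.0 as the default value; the function name is hypothetical.

```python
def select_features_by_count(rows, threshold, default=0.0):
    """Return the kept slot indices and the rows reduced to those slots."""
    num_slots = len(rows[0])
    counts = [
        sum(1 for row in rows if row[slot] != default)
        for slot in range(num_slots)
    ]
    # Strict comparison is an assumption of this sketch.
    kept = [slot for slot, count in enumerate(counts) if count > threshold]
    return kept, [[row[slot] for slot in kept] for row in rows]


rows = [
    [1.0, 0.0, 3.0],
    [0.0, 0.0, 2.0],
    [4.0, 0.0, 5.0],
]
kept, reduced = select_features_by_count(rows, threshold=1)
```

In this example the middle slot is always at its default value, so it carries no information and is dropped.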

## Feature transformations

| Transform | Definition |
| --- | --- |
| `ApproximatedKernelMap` | Map each input vector onto a lower dimensional feature space, where inner products approximate a kernel function, so that the features can be used as inputs to the linear algorithms |
| `ProjectToPrincipalComponents` | Reduce the dimensions of the input feature vector by applying the Principal Component Analysis algorithm |

## Explainability transformations

| Transform | Definition |
| --- | --- |
| `CalculateFeatureContribution` | Calculate contribution scores for each element of a feature vector |

## Calibration transformations

| Transform | Definition |
| --- | --- |
| `Platt(String, String, String)` | Transform a binary classifier raw score into a class probability using logistic regression with parameters estimated using the training data |
| `Platt(Double, Double, String)` | Transform a binary classifier raw score into a class probability using logistic regression with fixed parameters |
| `Naive` | Transform a binary classifier raw score into a class probability by assigning scores to bins, and calculating the probability based on the distribution among the bins |
| `Isotonic` | Transform a binary classifier raw score into a class probability by assigning scores to bins, where the position of boundaries and the size of bins are estimated using the training data |
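
Platt calibration with fixed parameters amounts to passing the raw score through a logistic function. The Python sketch below assumes the form 1 / (1 + exp(slope * score + offset)), in which a negative slope makes higher scores map to higher probabilities; this sign convention, the parameter names and the function name are assumptions of the sketch.

```python
import math


def platt_probability(score, slope, offset):
    """Map a raw classifier score to a probability with a logistic function."""
    # Assumed convention: probability = 1 / (1 + exp(slope * score + offset)).
    return 1.0 / (1.0 + math.exp(slope * score + offset))


neutral = platt_probability(0.0, slope=-1.0, offset=0.0)
confident = platt_probability(2.0, slope=-1.0, offset=0.0)
```

The first overload in the table estimates the two parameters from training data; the second takes them as given, which is what this sketch models.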

## Deep learning transformations

| Transform | Definition |
| --- | --- |
| `ApplyOnnxModel` | Transform the input data with an imported ONNX model |
| `LoadTensorFlowModel` | Transform the input data with an imported TensorFlow model |

## Custom transformations

| Transform | Definition |
| --- | --- |
| `CustomMapping` | Transform existing columns onto new ones with a user-defined mapping |
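
The idea of a user-defined mapping can be sketched in Python as a function that receives a row and returns the new columns to add. This is a conceptual illustration, not the library's API: rows are plain dictionaries, and `custom_mapping`, `is_adult` and the column names are hypothetical.

```python
def custom_mapping(rows, mapper):
    """Apply a user-defined function to every row and merge in its output columns."""
    return [{**row, **mapper(row)} for row in rows]


def is_adult(row):
    """Example user-defined mapping: derive a boolean column from Age."""
    return {"IsAdult": row["Age"] >= 18}


people = [{"Name": "Ana", "Age": 34}, {"Name": "Ben", "Age": 12}]
mapped = custom_mapping(people, is_adult)
```

The existing columns are left untouched and the mapping only adds to them, so the transform can be chained with the built-in ones.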