Machine learning glossary of important terms

The following list is a compilation of important machine learning terms that are useful as you build your custom models in ML.NET.

Accuracy

In classification, accuracy is the number of correctly classified items divided by the total number of items in the test set. Ranges from 0 (least accurate) to 1 (most accurate). Accuracy is one of the evaluation metrics of model performance. Consider it in conjunction with precision, recall, and F-score.

Area under the curve (AUC)

In binary classification, an evaluation metric that is the value of the area under the curve that plots the true positive rate (on the y-axis) against the false positive rate (on the x-axis). Ranges from 0.5 (worst) to 1 (best). Also known as the area under the ROC curve, i.e., the receiver operating characteristic curve. For more information, see the Receiver operating characteristic article on Wikipedia.

Binary classification

A classification case where the label is only one out of two classes. For more information, see the Binary classification section of the Machine learning tasks topic.

Calibration

Calibration is the process of mapping a raw score onto a class membership, for binary and multiclass classification. Some ML.NET trainers have a NonCalibrated suffix. These algorithms produce a raw score that then must be mapped to a class probability.

Catalog

In ML.NET, a catalog is a collection of extension functions, grouped by a common purpose.

For example, each machine learning task (binary classification, regression, ranking, etc.) has a catalog of available machine learning algorithms (trainers). The catalog for the binary classification trainers is: BinaryClassificationCatalog.BinaryClassificationTrainers.
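For illustration, a minimal sketch (the specific trainer and the column names are hypothetical choices, not prescribed by this glossary): the MLContext object exposes each catalog as a property, and the catalog's extension methods create estimators.

```csharp
using Microsoft.ML;

var mlContext = new MLContext();

// The binary classification catalog groups the trainers, calibrators, and
// evaluation methods available for the binary classification task.
var trainer = mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(
    labelColumnName: "Label", featureColumnName: "Features");
```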

Classification

When the data is used to predict a category, the supervised machine learning task is called classification. Binary classification refers to predicting only two categories (for example, classifying an image as a picture of either a 'cat' or a 'dog'). Multiclass classification refers to predicting multiple categories (for example, when classifying an image as a picture of a specific breed of dog).

Coefficient of determination

In regression, an evaluation metric that indicates how well data fits a model. Ranges from 0 to 1. A value of 0 means that the data is random or otherwise cannot be fit to the model. A value of 1 means that the model exactly matches the data. This is often referred to as r², R², or r-squared.
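In standard notation, with $y_i$ the correct label values, $\hat{y}_i$ the predicted values, and $\bar{y}$ the mean of the correct labels, the coefficient of determination is $R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$.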

Data

Data is central to any machine learning application. In ML.NET, data is represented by IDataView objects; a minimal loading sketch follows the list below. Data view objects:

  • are made up of columns and rows
  • are lazily evaluated, that is, they only load data when an operation calls for it
  • contain a schema that defines the type, format, and length of each column
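A minimal loading sketch, in which the HousePrice class, its column layout, and the file name are hypothetical:

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

var mlContext = new MLContext();

// Nothing is read yet: loading is lazy, so the file is only consumed when a
// later operation (training, preview, enumeration) pulls rows from the view.
IDataView data = mlContext.Data.LoadFromTextFile<HousePrice>(
    "house-prices.csv", separatorChar: ',', hasHeader: true);

// One row of the data set; the LoadColumn attributes map file columns to
// properties and so define the schema (type and position of each column).
public class HousePrice
{
    [LoadColumn(0)] public float Size { get; set; }
    [LoadColumn(1)] public float Price { get; set; }
}
```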

Estimator

A class in ML.NET that implements the IEstimator<TTransformer> interface.

An estimator is a specification of a transformation (both data preparation transformation and machine learning model training transformation). Estimators can be chained together into a pipeline of transformations. The parameters of an estimator or pipeline of estimators are learned when Fit is called. The result of Fit is a Transformer.
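A minimal sketch, reusing the hypothetical mlContext and data from the Data entry above (the Size column is likewise hypothetical):

```csharp
// An estimator describes a transformation but holds no learned state yet.
IEstimator<ITransformer> estimator = mlContext.Transforms.NormalizeMinMax("Size");

// Fit scans the data to learn the estimator's parameters (here, the observed
// minimum and maximum of Size) and returns the resulting transformer.
ITransformer transformer = estimator.Fit(data);
```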

Extension method

A .NET method that is part of a class but is defined outside of the class. An extension method is declared as a static method whose first parameter carries the this modifier, identifying the type that the method extends.

Extension methods are used extensively in ML.NET to construct instances of estimators.
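A hypothetical example, unrelated to the ML.NET API, showing the shape of an extension method:

```csharp
using System;

Console.WriteLine("hello".Shout());   // prints "HELLO!" as if Shout were an instance method of string

public static class StringExtensions
{
    // A static method whose first parameter carries the 'this' modifier,
    // which makes it callable as if it were defined on string itself.
    public static string Shout(this string text) => text.ToUpperInvariant() + "!";
}
```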

Feature

A measurable property of the phenomenon being measured, typically a numeric (double) value. Multiple features are referred to as a feature vector and typically stored as double[]. Features define the important characteristics of the phenomenon being measured. For more information, see the Feature article on Wikipedia.

Feature engineering

Feature engineering is the process that involves defining a set of features and developing software that produces feature vectors from available phenomenon data, i.e., feature extraction. For more information, see the Feature engineering article on Wikipedia.

F-score

In classification, an evaluation metric that balances precision and recall.
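In its most common form, the F1 score, which weights the two equally: $F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$.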

Hyperparameter

A parameter of a machine learning algorithm. Examples include the number of trees to learn in a decision forest or the step size in a gradient descent algorithm. Hyperparameter values are set before training the model and govern the process of finding the parameters of the prediction function, for example, the comparison points in a decision tree or the weights in a linear regression model. For more information, see the Hyperparameter article on Wikipedia.

Label

The element to be predicted with the machine learning model. For example, the breed of dog or a future stock price.

Log loss

In classification, an evaluation metric that characterizes the accuracy of a classifier. The smaller the log loss is, the more accurate a classifier is.
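For the binary case, with $y_i \in \{0, 1\}$ the correct labels and $p_i$ the predicted probability of the positive class, the standard form is $-\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]$.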

Loss function

A loss function quantifies the difference between the training label values and the predictions made by the model. The parameters of the model are estimated by minimizing the loss function.

Different trainers can be configured with different loss functions.
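One common example is the squared loss often used for regression, $L(y, \hat{y}) = (y - \hat{y})^2$, which penalizes large errors much more heavily than small ones.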

Mean absolute error (MAE)

In regression, an evaluation metric that is the average of all the model errors, where model error is the distance between the predicted label value and the correct label value.
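In formula form, with $y_i$ the correct labels and $\hat{y}_i$ the predictions: $\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|$.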

Model

Traditionally, the parameters for the prediction function. For example, the weights in a linear regression model or the split points in a decision tree. In ML.NET, a model contains all the information necessary to predict the label of a domain object (for example, an image or text). This means that ML.NET models include the necessary featurization steps as well as the parameters for the prediction function.

Multiclass classification

A classification case where the label is one out of three or more classes. For more information, see the Multiclass classification section of the Machine learning tasks topic.

N-gram

A feature extraction scheme for text data: any sequence of N words turns into a feature value. For example, with N = 2, the sentence "the quick brown fox" yields the bigrams "the quick", "quick brown", and "brown fox".

Normalization

Normalization is the process of scaling floating point data to values between 0 and 1. Many of the training algorithms used in ML.NET require input feature data to be normalized. ML.NET provides a series of transforms for normalization.
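A minimal sketch, assuming the hypothetical mlContext and data from the Data entry above:

```csharp
// NormalizeMinMax is one of the normalization transforms: it rescales each value
// of the column into the [0, 1] range using the observed minimum and maximum.
var normalizer = mlContext.Transforms.NormalizeMinMax("Size");
IDataView normalized = normalizer.Fit(data).Transform(data);
```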

Numerical feature vector

A feature vector consisting only of numerical values. This is similar to double[].

Pipeline

All of the operations needed to fit a model to a data set. A pipeline consists of data import, transformation, featurization, and learning steps. Once a pipeline is trained, it turns into a model.
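A minimal end-to-end sketch, again assuming the hypothetical HousePrice data from the Data entry (the column names and the choice of trainer are illustrative):

```csharp
// Chain featurization and a regression trainer into a single pipeline.
var pipeline = mlContext.Transforms.Concatenate("Features", "Size")
    .Append(mlContext.Regression.Trainers.Sdca(labelColumnName: "Price"));

// Training the whole pipeline on the data set yields the model.
ITransformer model = pipeline.Fit(data);
```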

Precision

In classification, the precision for a class is the number of items correctly predicted as belonging to that class divided by the total number of items predicted as belonging to the class.
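Equivalently, with TP the number of true positives and FP the number of false positives for that class: $\text{precision} = \frac{TP}{TP + FP}$.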

Recall

In classification, the recall for a class is the number of items correctly predicted as belonging to that class divided by the total number of items that actually belong to the class.
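Equivalently, with TP the number of true positives and FN the number of false negatives for that class: $\text{recall} = \frac{TP}{TP + FN}$.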

Regularization

Regularization penalizes a linear model for being too complicated. There are two types of regularization (a sketch of the two penalty terms follows the list below):

  • $L_1$ regularization zeros weights for insignificant features. The size of the saved model may become smaller after this type of regularization.
  • $L_2$ regularization minimizes the weight range for insignificant features. This is a more general process and is less sensitive to outliers.
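In standard notation, both kinds add a penalty on the model weights $w$ to the training objective: $\min_w \, L(w) + \lambda \lVert w \rVert_1$ for $L_1$ and $\min_w \, L(w) + \lambda \lVert w \rVert_2^2$ for $L_2$, where $L(w)$ is the loss function and $\lambda$ controls the strength of the penalty.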

Regression

A supervised machine learning task where the output is a real value, for example, double. Examples include predicting stock prices. For more information, see the Regression section of the Machine learning tasks topic.

Relative absolute error

In regression, an evaluation metric that is the sum of all absolute errors divided by the sum of distances between correct label values and the average of all correct label values.
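In formula form, with $\bar{y}$ the average of the correct label values: $\text{RAE} = \frac{\sum_i |y_i - \hat{y}_i|}{\sum_i |y_i - \bar{y}|}$.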

Relative squared error

In regression, an evaluation metric that is the sum of all squared absolute errors divided by the sum of squared distances between correct label values and the average of all correct label values.
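In formula form: $\text{RSE} = \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$.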

Root of mean squared error (RMSE)

In regression, an evaluation metric that is the square root of the average of the squares of the errors.
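In formula form: $\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$.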

Scoring

Scoring is the process of applying new data to a trained machine learning model and generating predictions. Scoring is also known as inferencing. Depending on the type of model, the score may be a raw value, a probability, or a category.
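A minimal single-example sketch, assuming the hypothetical model and HousePrice class from the earlier entries; the PricePrediction output class is also hypothetical:

```csharp
// PredictionEngine wraps the trained model for scoring one example at a time.
var engine = mlContext.Model.CreatePredictionEngine<HousePrice, PricePrediction>(model);
var prediction = engine.Predict(new HousePrice { Size = 98f });

// The output class maps the model's "Score" column onto a property.
public class PricePrediction
{
    [ColumnName("Score")] public float Price { get; set; }
}
```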

Supervised machine learning

A subclass of machine learning in which a desired model predicts the label for yet-unseen data. Examples include classification, regression, and structured prediction. For more information, see the Supervised learning article on Wikipedia.

Training

The process of identifying a model for a given training data set. For a linear model, this means finding the weights. For a tree, it involves identifying the split points.

Transformer

An ML.NET class that implements the ITransformer interface.

A transformer transforms one IDataView into another. A transformer is created by training an estimator, or an estimator pipeline.
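A one-line sketch, assuming the hypothetical model and data from the earlier entries:

```csharp
// Applying the transformer produces a new IDataView with the predicted columns added.
IDataView scored = model.Transform(data);
```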

Unsupervised machine learning

A subclass of machine learning in which a desired model finds hidden (or latent) structure in data. Examples include clustering, topic modeling, and dimensionality reduction. For more information, see the Unsupervised learning article on Wikipedia.