Machine learning glossary of important terms

The following list is a compilation of important machine learning terms that are useful as you build your custom models in ML.NET.

Note

This documentation refers to ML.NET, which is currently in Preview. Material may be subject to change. For more information, see the ML.NET introduction.

Accuracy

In classification, accuracy is the number of correctly classified items divided by the total number of items in the test set. It ranges from 0 (least accurate) to 1 (most accurate). Accuracy is one of the evaluation metrics of model performance. Consider it in conjunction with precision, recall, and F-score.
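
Restated as a formula:

Accuracy = (number of correctly classified items) / (total number of items in the test set)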

Area under the curve (AUC)

In binary classification, an evaluation metric that is the value of the area under the curve that plots the true positive rate (on the y-axis) against the false positive rate (on the x-axis). It ranges from 0.5 (worst) to 1 (best). Also known as the area under the ROC curve, that is, the receiver operating characteristic curve. For more information, see the Receiver operating characteristic article on Wikipedia.
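
Equivalently, with TPR and FPR denoting the true positive and false positive rates that trace out the ROC curve, the metric can be written as the area integral:

AUC = ∫₀¹ TPR d(FPR)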

Binary classification

A classification case where the label is only one out of two classes. For more information, see the Binary classification section of the Machine learning tasks topic.

Classification

When the data is used to predict a category, the supervised machine learning task is called classification. Binary classification refers to predicting only two categories (for example, classifying an image as a picture of either a 'cat' or a 'dog'). Multiclass classification refers to predicting multiple categories (for example, classifying an image as a picture of a specific breed of dog).

Coefficient of determination

In regression, an evaluation metric that indicates how well data fits a model. It ranges from 0 to 1. A value of 0 means that the data is random or otherwise cannot be fit to the model. A value of 1 means that the model exactly matches the data. This is often referred to as r2, R2, or r-squared.
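
A common way to write this metric, with y_i the correct label values, ŷ_i the predicted values, and ȳ the mean of the correct values, is:

R² = 1 - ( Σ (y_i - ŷ_i)² ) / ( Σ (y_i - ȳ)² )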

Feature

A measurable property of the phenomenon being measured, typically a numeric (double) value. Multiple features are referred to as a feature vector and are typically stored as double[]. Features define the important characteristics of the phenomenon being measured. For more information, see the Feature article on Wikipedia.

Feature engineering

Feature engineering is the process of defining a set of features and developing software that produces feature vectors from available phenomenon data, that is, feature extraction. For more information, see the Feature engineering article on Wikipedia.

F-score

In classification, an evaluation metric that balances precision and recall.
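
The most common form is the F1 score, the harmonic mean of the two:

F1 = 2 × (Precision × Recall) / (Precision + Recall)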

Hyperparameter

A parameter of a machine learning algorithm. Examples include the number of trees to learn in a decision forest or the step size in a gradient descent algorithm. Values of hyperparameters are set before training the model and govern the process of finding the parameters of the prediction function, for example, the comparison points in a decision tree or the weights in a linear regression model. For more information, see the Hyperparameter article on Wikipedia.

Label

The element to be predicted with the machine learning model. For example, the breed of dog or a future stock price.

Log loss

In classification, an evaluation metric that characterizes the accuracy of a classifier. The smaller the log loss, the more accurate the classifier.
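
For binary classification, a common form of the metric, with y_i the true label (0 or 1), p_i the predicted probability of the positive class, and N the number of items, is:

Log loss = -(1/N) Σ [ y_i log(p_i) + (1 - y_i) log(1 - p_i) ]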

Mean absolute error (MAE)

In regression, an evaluation metric that is the average of all the model errors, where model error is the distance between the predicted label value and the correct label value.
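
Written as a formula, with y_i the correct label values, ŷ_i the predicted values, and N the number of items:

MAE = (1/N) Σ |y_i - ŷ_i|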

Model

Traditionally, the parameters for the prediction function. For example, the weights in a linear regression model or the split points in a decision tree. In ML.NET, a model contains all the information necessary to predict the label of a domain object (for example, an image or text). This means that ML.NET models include the necessary featurization steps as well as the parameters for the prediction function.

Multiclass classification

A classification case where the label is one out of three or more classes. For more information, see the Multiclass classification section of the Machine learning tasks topic.

N-gram

A feature extraction scheme for text data: any sequence of N words turns into a feature value. For example, with N = 2 the sentence 'the quick brown fox' yields the bigrams 'the quick', 'quick brown', and 'brown fox'.

Numerical feature vector

A feature vector consisting only of numerical values. This is similar to double[].

Pipeline

All of the operations needed to fit a model to a data set. A pipeline consists of data import, transformation, featurization, and learning steps. Once a pipeline is trained, it turns into a model.

Precision

In classification, the precision for a class is the number of items correctly predicted as belonging to that class divided by the total number of items predicted as belonging to the class.
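
In terms of true positives (TP) and false positives (FP) for the class:

Precision = TP / (TP + FP)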

Recall

In classification, the recall for a class is the number of items correctly predicted as belonging to that class divided by the total number of items that actually belong to the class.
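
In terms of true positives (TP) and false negatives (FN) for the class:

Recall = TP / (TP + FN)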

Regression

A supervised machine learning task where the output is a real value, for example, double. Examples include predicting stock prices. For more information, see the Regression section of the Machine learning tasks topic.

Relative absolute error

In regression, an evaluation metric that is the sum of all absolute errors divided by the sum of distances between the correct label values and the average of all correct label values.
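
Written as a formula, with y_i the correct label values, ŷ_i the predicted values, and ȳ the mean of the correct values:

Relative absolute error = Σ |y_i - ŷ_i| / Σ |y_i - ȳ|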

Relative squared error

In regression, an evaluation metric that is the sum of all squared absolute errors divided by the sum of squared distances between the correct label values and the average of all correct label values.
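
Written as a formula, with y_i the correct label values, ŷ_i the predicted values, and ȳ the mean of the correct values:

Relative squared error = Σ (y_i - ŷ_i)² / Σ (y_i - ȳ)²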

Root of mean squared error (RMSE)

In regression, an evaluation metric that is the square root of the average of the squares of the errors.
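
Written as a formula, with y_i the correct label values, ŷ_i the predicted values, and N the number of items:

RMSE = √( (1/N) Σ (y_i - ŷ_i)² )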

Supervised machine learning

A subclass of machine learning in which a desired model predicts the label for yet-unseen data. Examples include classification, regression, and structured prediction. For more information, see the Supervised learning article on Wikipedia.

Training

The process of identifying a model for a given training data set. For a linear model, this means finding the weights. For a tree, it involves identifying the split points.

Transform

A pipeline component that transforms data. For example, from text to a vector of numbers.

Unsupervised machine learning

A subclass of machine learning in which a desired model finds hidden (or latent) structure in data. Examples include clustering, topic modeling, and dimensionality reduction. For more information, see the Unsupervised learning article on Wikipedia.