selectFeatures:機器學習的特徵選取轉換

特徵選取轉換會使用指定的模式,從指定的變數中選取特徵。

使用方式

  selectFeatures(vars, mode, ...)

引數

vars

如果模式為 minCount(),則為公式或字串的向量/字串清單,其指定執行特徵選取時所依據的變數名稱。 例如: ~ var1 + var2 + var3 。 如果模式為 mutualInformation(),則為描述相依變數和獨立變數的公式或具名字串清單。 例如: label ~ ``var1 + var2 + var3

mode

指定特徵選取的模式。 這可以是 minCountmutualInformation

...

直接傳遞至 Microsoft Compute Engine 的額外引數。

詳細資料

特徵選取轉換會使用下列兩種模式之一,從指定的變數中選取特徵:計數或相互資訊。 如需詳細資訊,請參閱 minCountmutualInformation

定義轉換的 maml 物件。

另請參閱

minCountmutualInformation

範例


 trainReviews <- data.frame(review = c( 
         "This is great",
         "I hate it",
         "Love it",
         "Do not like it",
         "Really like it",
         "I hate it",
         "I like it a lot",
         "I kind of hate it",
         "I do like it",
         "I really hate it",
         "It is very good",
         "I hate it a bunch",
         "I love it a bunch",
         "I hate it",
         "I like it very much",
         "I hate it very much.",
         "I really do love it",
         "I really do hate it",
         "Love it!",
         "Hate it!",
         "I love it",
         "I hate it",
         "I love it",
         "I hate it",
         "I love it"),
      like = c(TRUE, FALSE, TRUE, FALSE, TRUE,
         FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE,
         FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, 
         FALSE, TRUE, FALSE, TRUE), stringsAsFactors = FALSE
     )

     testReviews <- data.frame(review = c(
         "This is great",
         "I hate it",
         "Love it",
         "Really like it",
         "I hate it",
         "I like it a lot",
         "I love it",
         "I do like it",
         "I really hate it",
         "I love it"), stringsAsFactors = FALSE)

 # Use a categorical hash transform which generated 128 features.
 outModel1 <- rxLogisticRegression(like~reviewCatHash, data = trainReviews, l1Weight = 0, 
     mlTransforms = list(categoricalHash(vars = c(reviewCatHash = "review"), hashBits = 7)))
 summary(outModel1)

 # Apply a categorical hash transform and a count feature selection transform
 # which selects only those hash slots that has value.
 outModel2 <- rxLogisticRegression(like~reviewCatHash, data = trainReviews, l1Weight = 0, 
     mlTransforms = list(
   categoricalHash(vars = c(reviewCatHash = "review"), hashBits = 7), 
   selectFeatures("reviewCatHash", mode = minCount())))
 summary(outModel2)

 # Apply a categorical hash transform and a mutual information feature selection transform
 # which selects only 10 features with largest mutual information with the label.
 outModel3 <- rxLogisticRegression(like~reviewCatHash, data = trainReviews, l1Weight = 0, 
     mlTransforms = list(
   categoricalHash(vars = c(reviewCatHash = "review"), hashBits = 7), 
   selectFeatures(like ~ reviewCatHash, mode = mutualInformation(numFeaturesToKeep = 10))))
 summary(outModel3)