selectFeatures: Machine Learning Feature Selection Transform

Article
05/23/2023

The feature selection transform selects features from the specified variables using the specified mode.

Usage

  selectFeatures(vars, mode, ...)

Arguments

`vars`

The form of this argument depends on the mode. If the mode is `minCount()`: a formula, or a vector or list of strings, naming the variables on which feature selection is performed. For example, `~ var1 + var2 + var3`. If the mode is `mutualInformation()`: a formula or a named list of strings describing the dependent variable and the independent variables. For example, `label ~ var1 + var2 + var3`.
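The difference between the two forms of `vars` can be seen by inspecting the formulas in base R. The following is a minimal sketch: `var1`, `var2`, `var3`, and `label` are the placeholder names used above, and `countSpec`, `miSpec`, and the value `numFeaturesToKeep = 2` are illustrative choices. The last step is skipped when the MicrosoftML package is not installed.

```r
# One-sided formula: names only the variables to filter (used with minCount()).
countVars <- ~ var1 + var2 + var3
countNames <- all.vars(countVars)        # "var1" "var2" "var3"

# Two-sided formula: dependent variable on the left, candidate features on
# the right (used with mutualInformation()).
miVars <- label ~ var1 + var2 + var3
dependent <- all.vars(miVars[[2]])       # "label"
independent <- all.vars(miVars[[3]])     # "var1" "var2" "var3"

# Build the transform specifications only where MicrosoftML is available.
if (requireNamespace("MicrosoftML", quietly = TRUE)) {
  countSpec <- MicrosoftML::selectFeatures(
    countVars, mode = MicrosoftML::minCount())
  miSpec <- MicrosoftML::selectFeatures(
    miVars, mode = MicrosoftML::mutualInformation(numFeaturesToKeep = 2))
}
```

The specifications built this way do nothing by themselves; they take effect when passed in the `mlTransforms` list of a training function, as in the Examples section.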

`mode`

Specifies the mode of feature selection. This can be either `minCount` or `mutualInformation`.

`...`

Additional arguments to be passed directly to the Microsoft Compute Engine.

Details

The feature selection transform selects features from the specified variables using one of two modes: count or mutual information. For more information, see minCount and mutualInformation.
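The idea behind the two modes can be illustrated in base R. This is a conceptual sketch only, not the MicrosoftML implementation: the toy matrix `x`, the helper `mutualInfo`, and the thresholds `minNonZero` and `numToKeep` are hypothetical. It assumes that a feature "has a value" when it is non-zero and that all features are already discrete.

```r
# Toy indicator features (hypothetical data).
x <- cbind(
  slotA = c(1, 0, 1, 0, 1, 0),   # tracks the label exactly
  slotB = c(0, 0, 0, 0, 0, 0),   # never set
  slotC = c(1, 1, 0, 0, 1, 1)    # set often, but independent of the label
)
label <- c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)

# Count-style selection: keep columns with at least minNonZero non-zero rows.
minNonZero <- 1
keptByCount <- colnames(x)[colSums(x != 0) >= minNonZero]

# Mutual information (in bits) between two discrete vectors.
mutualInfo <- function(a, b) {
  joint <- table(a, b) / length(a)                 # joint distribution
  indep <- outer(rowSums(joint), colSums(joint))   # product of the marginals
  nz <- joint > 0                                  # skip empty cells (0 * log 0)
  sum(joint[nz] * log2(joint[nz] / indep[nz]))
}

# Mutual-information-style selection: keep the numToKeep highest-scoring columns.
numToKeep <- 1
mi <- apply(x, 2, mutualInfo, b = label)
keptByMI <- names(sort(mi, decreasing = TRUE))[seq_len(numToKeep)]
```

Here the count filter only drops `slotB`, which is never set, while the mutual information ranking also discards `slotC`, because knowing its value says nothing about the label.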

Value

A maml object defining the transform.

See also

minCount, mutualInformation

Examples


 library(MicrosoftML)  # provides rxLogisticRegression, categoricalHash, selectFeatures

 trainReviews <- data.frame(review = c(
         "This is great",
         "I hate it",
         "Love it",
         "Do not like it",
         "Really like it",
         "I hate it",
         "I like it a lot",
         "I kind of hate it",
         "I do like it",
         "I really hate it",
         "It is very good",
         "I hate it a bunch",
         "I love it a bunch",
         "I hate it",
         "I like it very much",
         "I hate it very much.",
         "I really do love it",
         "I really do hate it",
         "Love it!",
         "Hate it!",
         "I love it",
         "I hate it",
         "I love it",
         "I hate it",
         "I love it"),
      like = c(TRUE, FALSE, TRUE, FALSE, TRUE,
         FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE,
         FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, 
         FALSE, TRUE, FALSE, TRUE), stringsAsFactors = FALSE
     )

     testReviews <- data.frame(review = c(
         "This is great",
         "I hate it",
         "Love it",
         "Really like it",
         "I hate it",
         "I like it a lot",
         "I love it",
         "I do like it",
         "I really hate it",
         "I love it"), stringsAsFactors = FALSE)

 # Use a categorical hash transform, which generates 128 features (2^7 hash slots).
 outModel1 <- rxLogisticRegression(like~reviewCatHash, data = trainReviews, l1Weight = 0, 
     mlTransforms = list(categoricalHash(vars = c(reviewCatHash = "review"), hashBits = 7)))
 summary(outModel1)

 # Apply a categorical hash transform and a count feature selection transform
 # that keeps only those hash slots that have a value.
 outModel2 <- rxLogisticRegression(like~reviewCatHash, data = trainReviews, l1Weight = 0, 
     mlTransforms = list(
   categoricalHash(vars = c(reviewCatHash = "review"), hashBits = 7), 
   selectFeatures("reviewCatHash", mode = minCount())))
 summary(outModel2)

 # Apply a categorical hash transform and a mutual information feature selection transform
 # that keeps only the 10 features with the largest mutual information with the label.
 outModel3 <- rxLogisticRegression(like~reviewCatHash, data = trainReviews, l1Weight = 0, 
     mlTransforms = list(
   categoricalHash(vars = c(reviewCatHash = "review"), hashBits = 7), 
   selectFeatures(like ~ reviewCatHash, mode = mutualInformation(numFeaturesToKeep = 10))))
 summary(outModel3)