minCount: Feature Selection Count Mode

Article
05/23/2023

Count mode of feature selection, used in the feature selection transform selectFeatures.

Usage

  minCount(count = 1, ...)

Arguments

`count`

The threshold for count-based feature selection. A feature is selected if and only if at least count examples have a non-default value for that feature. The default value is 1.
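The threshold rule can be illustrated with a minimal base R sketch. It is not the MicrosoftML implementation: `select_by_min_count` is a hypothetical helper, and the sketch assumes that the "default" value of a numeric feature is 0.

```r
# Hypothetical helper, not part of MicrosoftML: keep the columns that have
# at least `count` non-default entries. Assumption: the default value is 0.
select_by_min_count <- function(x, count = 1) {
  stopifnot(is.matrix(x), count >= 0)
  non_default <- colSums(x != 0)           # non-default examples per feature
  x[, non_default >= count, drop = FALSE]  # keep features meeting the threshold
}

# Toy feature matrix: five examples, three features.
features <- cbind(
  a = c(1, 0, 2, 0, 3),  # three non-zero values
  b = c(0, 0, 0, 0, 0),  # no non-zero values
  c = c(0, 4, 0, 0, 0)   # one non-zero value
)

kept_default <- colnames(select_by_min_count(features))             # count = 1
kept_two     <- colnames(select_by_min_count(features, count = 2))

print(kept_default)  # "a" "c"
print(kept_two)      # "a"
```

With the default threshold of 1 only the all-zero feature `b` is dropped; raising the threshold to 2 also drops `c`.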

`...`

Additional arguments to be passed directly to the Microsoft Compute Engine.

Details

In count mode, the feature selection transform selects a feature if at least the specified number of examples have a non-default value for that feature. Count mode is particularly useful in combination with a categorical hash transform (see also categoricalHash): count-based feature selection can remove the features generated by the hash transform that contain no data in any of the examples.
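To see why hashing leaves features without data, consider the following base R sketch. It is only an illustration under stated assumptions: `toy_hash_slot` is a hypothetical stand-in hash (a character-code sum), not the hash function used by categoricalHash, and the indicator matrix is a simplified picture of the transform's output.

```r
# Hypothetical toy hash, NOT the hash used by MicrosoftML: map a string to
# one of 2^hash_bits slots (1-based) using the sum of its character codes.
toy_hash_slot <- function(s, hash_bits) {
  (sum(utf8ToInt(s)) %% 2^hash_bits) + 1
}

reviews   <- c("I love it", "I hate it", "Love it", "I love it")
hash_bits <- 3
n_slots   <- 2^hash_bits  # 8 hash features

# One indicator row per review: a single 1 in the slot the review hashes to.
indicators <- matrix(0, nrow = length(reviews), ncol = n_slots)
for (i in seq_along(reviews)) {
  indicators[i, toy_hash_slot(reviews[i], hash_bits)] <- 1
}

# Count-based selection with a threshold of 1: drop slots no review hit.
slot_counts <- colSums(indicators != 0)
kept <- indicators[, slot_counts >= 1, drop = FALSE]

print(ncol(indicators))  # 8 slots before selection
print(ncol(kept))        # at most the number of distinct reviews
```

Because there are only three distinct reviews but eight slots, most hash features are empty in every example, and a threshold of 1 is enough to discard them.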

Value

A character string that defines the count mode.

Author(s)

Microsoft Corporation Microsoft Technical Support

See also

mutualInformation, selectFeatures

Examples


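 # Load the MicrosoftML package, which provides rxLogisticRegression,
 # categoricalHash, selectFeatures, and minCount.
 library(MicrosoftML)

 trainReviews <- data.frame(review = c(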
         "This is great",
         "I hate it",
         "Love it",
         "Do not like it",
         "Really like it",
         "I hate it",
         "I like it a lot",
         "I kind of hate it",
         "I do like it",
         "I really hate it",
         "It is very good",
         "I hate it a bunch",
         "I love it a bunch",
         "I hate it",
         "I like it very much",
         "I hate it very much.",
         "I really do love it",
         "I really do hate it",
         "Love it!",
         "Hate it!",
         "I love it",
         "I hate it",
         "I love it",
         "I hate it",
         "I love it"),
      like = c(TRUE, FALSE, TRUE, FALSE, TRUE,
         FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE,
         FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, 
         FALSE, TRUE, FALSE, TRUE), stringsAsFactors = FALSE
     )

 testReviews <- data.frame(review = c(
         "This is great",
         "I hate it",
         "Love it",
         "Really like it",
         "I hate it",
         "I like it a lot",
         "I love it",
         "I do like it",
         "I really hate it",
         "I love it"), stringsAsFactors = FALSE)

 # Use a categorical hash transform, which generates 2^7 = 128 features (hashBits = 7).
 outModel1 <- rxLogisticRegression(like~reviewCatHash, data = trainReviews, l1Weight = 0, 
     mlTransforms = list(categoricalHash(vars = c(reviewCatHash = "review"), hashBits = 7)))
 summary(outModel1)

 # Apply a categorical hash transform and a count feature selection transform,
 # which selects only those hash features that have a value (default count = 1).
 outModel2 <- rxLogisticRegression(like~reviewCatHash, data = trainReviews, l1Weight = 0, 
     mlTransforms = list(
   categoricalHash(vars = c(reviewCatHash = "review"), hashBits = 7), 
   selectFeatures("reviewCatHash", mode = minCount())))
 summary(outModel2)

 # Apply a categorical hash transform and a count feature selection transform,
 # which selects only those features that have a value in at least 5 examples.
 outModel3 <- rxLogisticRegression(like~reviewCatHash, data = trainReviews, l1Weight = 0, 
     mlTransforms = list(
   categoricalHash(vars = c(reviewCatHash = "review"), hashBits = 7), 
   selectFeatures("reviewCatHash", mode = minCount(count = 5))))
 summary(outModel3)