IDF Class

Definition

Inverse document frequency (IDF). The standard formulation is used: idf = log((m + 1) / (d(t) + 1)), where m is the total number of documents and d(t) is the number of documents that contain term t.

This implementation supports filtering out terms that do not appear in a minimum number of documents (controlled by the minDocFreq parameter). For terms that appear in fewer than minDocFreq documents, the IDF is set to 0, which results in TF-IDF values of 0.
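The formula and the minDocFreq cutoff can be illustrated with a small plain C# sketch that does not depend on Spark. The class name IdfSketch and its parameter names are hypothetical, and the natural logarithm is an assumption made to match the Scala implementation that this class wraps.

```csharp
using System;

// Hypothetical helper for illustration only; it is not part of Microsoft.Spark.
public static class IdfSketch
{
    // Computes idf = log((m + 1) / (d(t) + 1)), where totalDocs is m and docFreq is d(t).
    // Returns 0 when the term appears in fewer than minDocFreq documents.
    // Assumption: "log" is the natural logarithm.
    public static double Idf(long totalDocs, long docFreq, int minDocFreq = 0)
    {
        if (docFreq < minDocFreq)
        {
            return 0.0;
        }

        return Math.Log((totalDocs + 1.0) / (docFreq + 1.0));
    }
}
```

With 9 documents, IdfSketch.Idf(9, 1) evaluates log(10 / 2), which is about 1.609, while IdfSketch.Idf(9, 9) evaluates log(10 / 10) = 0 because a term found in every document carries no weight. Calling IdfSketch.Idf(9, 1, minDocFreq: 2) returns 0 because the term falls below the cutoff.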

public class IDF : Microsoft.Spark.ML.Feature.FeatureBase<Microsoft.Spark.ML.Feature.IDF>
type IDF = class
    inherit FeatureBase<IDF>
Public Class IDF
Inherits FeatureBase(Of IDF)
Inheritance

Constructors

IDF()

Creates an IDF without any parameters.

IDF(String)

Creates an IDF with a UID that is used to give the IDF a unique ID.

Methods

Clear(Param)

Clears any value that was previously set for this Microsoft.Spark.ML.Feature.Param. The value is reset to the default value.

(Inherited from FeatureBase<T>)
ExplainParam(Param)

Returns a description of how a specific Microsoft.Spark.ML.Feature.Param works and is currently set.

(Inherited from FeatureBase<T>)
ExplainParams()

Returns a description of how all of the Microsoft.Spark.ML.Feature.Param objects that apply to this object work and how they are currently set.

(Inherited from FeatureBase<T>)
Fit(DataFrame)

Fits a model to the input data.

GetInputCol()

Gets the column that the IDF should read from.

GetMinDocFreq()

Gets the minimum number of documents in which a term must appear to avoid being filtered out.

GetOutputCol()

Gets the name of the new column that the IDF creates in the DataFrame.

GetParam(String)

Retrieves a Microsoft.Spark.ML.Feature.Param so that it can be used to set the value of the Microsoft.Spark.ML.Feature.Param on the object.

(Inherited from FeatureBase<T>)
Load(String)

Loads the IDF that was previously saved using Save.

Save(String)

Saves the object so that it can be loaded later using Load. Saved objects can be shared with Scala: an object saved here can be loaded in Scala, and an object saved in Scala can be loaded here.

(Inherited from FeatureBase<T>)
Set(Param, Object)

Sets the value of a specific Microsoft.Spark.ML.Feature.Param.

(Inherited from FeatureBase<T>)
SetInputCol(String)

Sets the column that the IDF should read from.

SetMinDocFreq(Int32)

Sets the minimum number of documents in which a term must appear to avoid being filtered out.

SetOutputCol(String)

Sets the name of the new column that the IDF creates in the DataFrame.

ToString()

Returns the JVM toString value rather than the .NET ToString default.

(Inherited from FeatureBase<T>)
Uid()

The UID that was used to create the object. If no UID is passed in when the object is created, a random UID is generated.

(Inherited from FeatureBase<T>)
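The members listed above might be combined as in the following minimal sketch, which configures an IDF, fits it, and applies the fitted model. SparkSession, Tokenizer, HashingTF, and IDFModel are not described on this page and are assumed to be available from the Microsoft.Spark packages. The sample sentences, column names, and feature count are made up for illustration. Running it requires a .NET for Apache Spark environment.

```csharp
using Microsoft.Spark.ML.Feature;
using Microsoft.Spark.Sql;

// Illustrative program; class, column, and app names are hypothetical.
public static class IdfUsageSketch
{
    public static void Main()
    {
        SparkSession spark = SparkSession
            .Builder()
            .AppName("idf-usage-sketch")
            .GetOrCreate();

        // Tiny made-up corpus with one sentence per row.
        DataFrame sentences = spark.Sql(
            "SELECT 'spark makes big data simple' AS sentence " +
            "UNION ALL SELECT 'big data needs big clusters' AS sentence");

        // Assumption: Tokenizer and HashingTF produce the term-frequency
        // vectors that the IDF reads from its input column.
        DataFrame words = new Tokenizer()
            .SetInputCol("sentence")
            .SetOutputCol("words")
            .Transform(sentences);

        DataFrame termFrequencies = new HashingTF()
            .SetInputCol("words")
            .SetOutputCol("rawFeatures")
            .SetNumFeatures(32)
            .Transform(words);

        // Configure the IDF with the setters documented above.
        IDF idf = new IDF()
            .SetInputCol("rawFeatures")
            .SetOutputCol("features")
            .SetMinDocFreq(1);

        // Fit returns a model (assumed type: IDFModel) that rescales the vectors.
        IDFModel model = idf.Fit(termFrequencies);
        DataFrame rescaled = model.Transform(termFrequencies);

        rescaled.Select("sentence", "features").Show();

        spark.Stop();
    }
}
```

Raising the value passed to SetMinDocFreq would zero out the weights of terms that appear in fewer documents than the threshold, which is one way to suppress very rare terms.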
