Microsoft.Spark.Sql Namespace

Classes

ArrowFunctions

Functions available for DataFrame operations.

Builder

Builder for SparkSession, the entry point to programming Spark with the Dataset and DataFrame API.

Column

Column class represents a column that will be computed based on the data in a DataFrame.

DataFrame

A distributed collection of data organized into named columns.

DataFrameFunctions

Functions available for a managed DataFrame.

DataFrameNaFunctions

Provides functionality for working with missing data in a DataFrame.

DataFrameReader

DataFrameReader provides functionality to load a DataFrame from external storage systems (e.g., file systems and key-value stores).

DataFrameStatFunctions

Provides statistical functions for a DataFrame.

DataFrameUdfRegistrationExtensions

Extension methods for UdfRegistration.

DataFrameWriter

Interface used to write a DataFrame to external storage systems (e.g., file systems and key-value stores).

DataFrameWriterV2

Interface used to write a DataFrame to external storage using the v2 API.

Functions

Functions available for DataFrame operations.

GenericRow

Represents a row object in an RDD, equivalent to GenericRow in Spark.

RelationalGroupedDataset

A set of methods for aggregations on a DataFrame.

Row

Represents a row object in an RDD, equivalent to GenericRowWithSchema in Spark.

RuntimeConfig

Runtime configuration interface for Spark.

SparkSession

The entry point to programming Spark with the Dataset and DataFrame API.

StorageLevel

Flags for controlling the storage of an RDD. Each StorageLevel records whether to use memory, whether to drop the RDD to disk if it falls out of memory, whether to keep the data in memory in a Java-specific serialized format, and whether to replicate the RDD partitions on multiple nodes. Also contains static properties for some commonly used storage levels, such as MEMORY_ONLY.

UdfRegistration

Functions for registering user-defined functions.

UdfRegistrationExtensions

Extension methods for UdfRegistration.

Interfaces

IForeachWriter

Interface for writing custom logic to process data generated by a query. This is often used to write the output of a streaming query to arbitrary storage systems.

Enums

SaveMode

SaveMode is used to specify the expected behavior of saving a DataFrame to a data source.
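
Example

The following is a minimal sketch of how several of the types listed above might be used together, not a definitive implementation. It assumes the Microsoft.Spark NuGet package is referenced and that the program is launched through a working Spark installation. The application name, the file paths (people.json, adults.parquet), and the column names (name, age) are hypothetical and chosen only for illustration.

```csharp
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

class Program
{
    static void Main(string[] args)
    {
        // Builder: configure and create (or reuse) a SparkSession.
        SparkSession spark = SparkSession
            .Builder()
            .AppName("namespace-overview-sketch") // hypothetical application name
            .GetOrCreate();

        // DataFrameReader: load a DataFrame from external storage.
        // "people.json" is a hypothetical input path.
        DataFrame people = spark.Read().Json("people.json");

        // Column and Functions: Col("age") builds a Column expression
        // that is evaluated against the DataFrame's data.
        DataFrame adults = people.Filter(Col("age") > 21);

        // RelationalGroupedDataset: GroupBy returns a grouped dataset
        // on which aggregations such as Count can be run.
        adults.GroupBy("age").Count().Show();

        // DataFrameWriter and SaveMode: write the result to external
        // storage, replacing any existing output at the hypothetical path.
        adults.Write().Mode(SaveMode.Overwrite).Parquet("adults.parquet");

        spark.Stop();
    }
}
```

SaveMode.Overwrite is used here so the sketch can be rerun without failing on existing output; other SaveMode values change that behavior.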