Catalog Class

Definition

Catalog interface for Spark. To access this, use SparkSession.Catalog.

public sealed class Catalog
type Catalog = class
Public NotInheritable Class Catalog
Inheritance
Object → Catalog

Methods

CacheTable(String)

Caches the specified table in memory.

Spark SQL can cache tables in an in-memory columnar format by calling CacheTable("tableName") or DataFrame.Cache(). Spark SQL then scans only the required columns and automatically tunes compression to minimize memory usage and GC pressure. Call UncacheTable("tableName") to remove the table from memory.
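A minimal sketch of the cache lifecycle, assuming a .NET app that references the Microsoft.Spark package and is launched with spark-submit. The view name "numbers" is hypothetical; a temporary view keeps the example independent of any existing table.

```csharp
using Microsoft.Spark.Sql;

SparkSession spark = SparkSession.Builder().AppName("catalog-cache-demo").GetOrCreate();

// Register a hypothetical temporary view so it can be cached by name.
spark.Range(100).CreateOrReplaceTempView("numbers");

// Mark the view for in-memory columnar caching.
spark.Catalog.CacheTable("numbers");
bool cachedAfterCache = spark.Catalog.IsCached("numbers");

// Queries over the view read from the in-memory cache once it is populated.
long rowCount = spark.Sql("SELECT * FROM numbers").Count();

// Release the memory held by this one table.
spark.Catalog.UncacheTable("numbers");
bool cachedAfterUncache = spark.Catalog.IsCached("numbers");

spark.Stop();
```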

ClearCache()

Removes all cached tables from the in-memory cache. You can either clear all cached tables at once using this or clear each table individually using UncacheTable("tableName").
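A short sketch contrasting ClearCache with per-table uncaching, assuming the app is launched with spark-submit and references Microsoft.Spark. The view names "small_a" and "small_b" are hypothetical.

```csharp
using Microsoft.Spark.Sql;

SparkSession spark = SparkSession.Builder().AppName("catalog-clearcache-demo").GetOrCreate();

// Two hypothetical temporary views, each cached individually.
spark.Range(10).CreateOrReplaceTempView("small_a");
spark.Range(20).CreateOrReplaceTempView("small_b");
spark.Catalog.CacheTable("small_a");
spark.Catalog.CacheTable("small_b");

// One call removes every cached table instead of calling UncacheTable per table.
spark.Catalog.ClearCache();

bool aCached = spark.Catalog.IsCached("small_a");
bool bCached = spark.Catalog.IsCached("small_b");

spark.Stop();
```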

CreateTable(String, String)

Creates a table from the data at the given path, registers it in the catalog (the Hive metastore when Hive support is enabled), and returns the corresponding DataFrame. By default the files are read as Parquet. To use another format, call CreateTable(tableName, path, source), or set the spark.sql.sources.default configuration option, either when building the session with Config("spark.sql.sources.default", "csv") or on an existing session with Conf().Set("spark.sql.sources.default", "csv").
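A sketch of creating a table over existing Parquet files, assuming local mode (so the driver's temp folder is readable by Spark) and a spark-submit launch. The directory "catalog_demo_parquet" and table "demo_parquet" are hypothetical names.

```csharp
using System.IO;
using Microsoft.Spark.Sql;

SparkSession spark = SparkSession.Builder().AppName("catalog-createtable-demo").GetOrCreate();

// Hypothetical scratch directory for the table's files.
string path = Path.Combine(Path.GetTempPath(), "catalog_demo_parquet");

// Write Parquet files for the table to point at.
spark.Range(5).Write().Mode("overwrite").Parquet(path);

// CreateTable fails if the name is already taken, so clear leftovers from earlier runs.
spark.Sql("DROP TABLE IF EXISTS demo_parquet");

// No source argument: the files are read with spark.sql.sources.default (Parquet unless changed).
DataFrame table = spark.Catalog.CreateTable("demo_parquet", path);
long rows = table.Count();

spark.Sql("DROP TABLE IF EXISTS demo_parquet");
spark.Stop();
```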

CreateTable(String, String, String)

Creates a table from the data at the given path, reading it with the specified data source, and returns the corresponding DataFrame.

The data source format (for example csv or parquet) is given by the source parameter.
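The same flow with an explicit source, as a sketch under the same assumptions (local mode, spark-submit launch). The directory "catalog_demo_csv" and table "demo_csv" are hypothetical. The CSV is written without a header row, so it is also read back without one.

```csharp
using System.IO;
using Microsoft.Spark.Sql;

SparkSession spark = SparkSession.Builder().AppName("catalog-createtable-csv-demo").GetOrCreate();

// Hypothetical scratch directory holding three CSV rows.
string path = Path.Combine(Path.GetTempPath(), "catalog_demo_csv");
spark.Range(3).Write().Mode("overwrite").Csv(path);

spark.Sql("DROP TABLE IF EXISTS demo_csv");

// The third argument names the data source used to read the files at path.
DataFrame table = spark.Catalog.CreateTable("demo_csv", path, "csv");
long rows = table.Count();

spark.Sql("DROP TABLE IF EXISTS demo_csv");
spark.Stop();
```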

CurrentDatabase()

Returns the current database in this session. By default, a session uses the database named "default". To change it, call SetCurrentDatabase("databaseName") or run SparkSession.Sql("USE databaseName").
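A sketch of switching databases both ways, assuming a fresh application launched with spark-submit (so the session starts in "default"). The database "catalog_demo_db" is hypothetical and is dropped at the end.

```csharp
using Microsoft.Spark.Sql;

SparkSession spark = SparkSession.Builder().AppName("catalog-currentdb-demo").GetOrCreate();

string initial = spark.Catalog.CurrentDatabase();

// Hypothetical database created only for this example.
spark.Sql("CREATE DATABASE IF NOT EXISTS catalog_demo_db");

// Switch through the Catalog API.
spark.Catalog.SetCurrentDatabase("catalog_demo_db");
string afterSet = spark.Catalog.CurrentDatabase();

// Switch back through SQL.
spark.Sql("USE default");
string afterUse = spark.Catalog.CurrentDatabase();

spark.Sql("DROP DATABASE IF EXISTS catalog_demo_db");
spark.Stop();
```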

DatabaseExists(String)

Check if the database with the specified name exists. This checks the list of Hive databases in the current session.
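A minimal check, assuming a spark-submit launch. "no_such_database" is a hypothetical name that is not expected to exist.

```csharp
using Microsoft.Spark.Sql;

SparkSession spark = SparkSession.Builder().AppName("catalog-dbexists-demo").GetOrCreate();

// Spark's session catalog always contains a database named "default".
bool hasDefault = spark.Catalog.DatabaseExists("default");

// Hypothetical name that should not exist.
bool hasMissing = spark.Catalog.DatabaseExists("no_such_database");

spark.Stop();
```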

DropGlobalTempView(String)

Drops the global temporary view with the given view name in the catalog.

You can create global temporary views by taking a DataFrame and calling DataFrame.CreateOrReplaceGlobalTempView.
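A sketch of the global temporary view lifecycle, assuming a spark-submit launch and the default global temporary database name, global_temp. The view name "shared_numbers" is hypothetical.

```csharp
using Microsoft.Spark.Sql;

SparkSession spark = SparkSession.Builder().AppName("catalog-globaltemp-demo").GetOrCreate();

// Global temporary views are qualified with the global_temp database by default.
spark.Range(5).CreateOrReplaceGlobalTempView("shared_numbers");
long rows = spark.Sql("SELECT * FROM global_temp.shared_numbers").Count();

// DropGlobalTempView returns true when a view was dropped and false when none existed.
bool dropped = spark.Catalog.DropGlobalTempView("shared_numbers");
bool droppedAgain = spark.Catalog.DropGlobalTempView("shared_numbers");

spark.Stop();
```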

DropTempView(String)

Drops the local temporary view with the given view name in the catalog. A local temporary view is session-scoped: its lifetime is the lifetime of the session that created it, so it is dropped automatically when the session terminates. It is not tied to any database, so it cannot be referenced with a qualified name such as db1.view1.

You can create temporary views by taking a DataFrame and calling DataFrame.CreateOrReplaceTempView.
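A sketch of creating and dropping a local temporary view, assuming a spark-submit launch. The view name "local_numbers" is hypothetical; TableExists is used to observe the view before and after the drop.

```csharp
using Microsoft.Spark.Sql;

SparkSession spark = SparkSession.Builder().AppName("catalog-droptemp-demo").GetOrCreate();

// Hypothetical session-scoped view.
spark.Range(5).CreateOrReplaceTempView("local_numbers");
bool existsBefore = spark.Catalog.TableExists("local_numbers");

// Returns true because the view existed and was dropped.
bool dropped = spark.Catalog.DropTempView("local_numbers");
bool existsAfter = spark.Catalog.TableExists("local_numbers");

spark.Stop();
```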

FunctionExists(String)

Check if the function with the specified name exists. FunctionExists includes built-in functions such as abs. To check for a built-in function, use its unqualified name; a function you created can also be checked by its qualified name.

FunctionExists(String, String)

Check if the function with the specified name exists in the specified database. To check whether a built-in function exists, pass null as dbName or use FunctionExists(functionName).
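A sketch of both FunctionExists overloads, assuming a spark-submit launch. "no_such_function" is a hypothetical name. The null-dbName call follows the description above; its result is printed rather than relied on, since the lookup rules can differ across Spark versions.

```csharp
using System;
using Microsoft.Spark.Sql;

SparkSession spark = SparkSession.Builder().AppName("catalog-funcexists-demo").GetOrCreate();

// Built-in functions are checked by their unqualified name.
bool absExists = spark.Catalog.FunctionExists("abs");

// Hypothetical function name that should not be registered anywhere.
bool missing = spark.Catalog.FunctionExists("no_such_function");
bool missingInDefault = spark.Catalog.FunctionExists("default", "no_such_function");

// Per the description above, a null dbName checks built-in functions.
Console.WriteLine(spark.Catalog.FunctionExists(null, "abs"));

spark.Stop();
```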

GetDatabase(String)

Get the database with the specified name.

Calling GetDatabase gives you access to the Hive database's name, description, and location.
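A sketch of reading database metadata, assuming a spark-submit launch and that the returned object exposes Name, Description, and LocationUri properties matching the fields named above.

```csharp
using System;
using Microsoft.Spark.Sql;

SparkSession spark = SparkSession.Builder().AppName("catalog-getdb-demo").GetOrCreate();

// Look up the built-in "default" database.
var db = spark.Catalog.GetDatabase("default");
Console.WriteLine($"{db.Name}: {db.Description} ({db.LocationUri})");

string dbName = db.Name;
spark.Stop();
```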

GetFunction(String)

Get the function with the specified name. To get a built-in function, use its unqualified name.

GetFunction(String, String)

Get the function with the specified name in the specified database. To get a built-in function, pass null as dbName.
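A sketch of looking up a built-in function, assuming a spark-submit launch and that the returned object exposes Name, ClassName, Description, and IsTemporary properties (mirroring the fields that ListFunctions reports).

```csharp
using System;
using Microsoft.Spark.Sql;

SparkSession spark = SparkSession.Builder().AppName("catalog-getfunction-demo").GetOrCreate();

// Built-in functions are looked up by their unqualified name.
var abs = spark.Catalog.GetFunction("abs");
Console.WriteLine($"{abs.Name}: class {abs.ClassName}, temporary = {abs.IsTemporary}");
Console.WriteLine(abs.Description);

string functionName = abs.Name;
spark.Stop();
```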

GetTable(String)

Get the table or view with the specified name. You can use this to find the table's description, database, and type, and whether it is a temporary table.

GetTable(String, String)

Get the table or view with the specified name in the specified database. You can use this to find the table's description, database, and type, and whether it is a temporary table.
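A sketch of inspecting a temporary view's metadata, assuming a spark-submit launch and that the returned object exposes Name, Database, TableType, and IsTemporary properties. The view name "people_view" is hypothetical.

```csharp
using System;
using Microsoft.Spark.Sql;

SparkSession spark = SparkSession.Builder().AppName("catalog-gettable-demo").GetOrCreate();

// Hypothetical temporary view to inspect.
spark.Range(3).CreateOrReplaceTempView("people_view");

var table = spark.Catalog.GetTable("people_view");
// Temporary views belong to no database, so Database may print as empty.
Console.WriteLine($"{table.Name} in {table.Database}: type {table.TableType}, temporary = {table.IsTemporary}");

string tableName = table.Name;
bool isTemporary = table.IsTemporary;
spark.Stop();
```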

IsCached(String)

Returns true if the table is currently cached in memory. A cached table consumes memory; to remove it from the cache, use UncacheTable or ClearCache.

ListColumns(String)

Returns a DataFrame listing the columns of the given table, view, or temporary view. For each column, it includes the name, description, and dataType, and whether the column is nullable, a partition column, or a bucket column.

ListColumns(String, String)

Returns a DataFrame listing the columns of the given table or view in the specified database. For each column, it includes the name, description, and dataType, and whether the column is nullable, a partition column, or a bucket column.
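A sketch of listing columns, assuming a spark-submit launch. The view name "ids" is hypothetical; spark.Range produces a single column named "id", which makes the result easy to check.

```csharp
using Microsoft.Spark.Sql;

SparkSession spark = SparkSession.Builder().AppName("catalog-listcolumns-demo").GetOrCreate();

// Hypothetical temporary view with one column, "id".
spark.Range(3).CreateOrReplaceTempView("ids");

DataFrame columns = spark.Catalog.ListColumns("ids");
columns.Show();

// Filter the metadata rows by the "name" column.
long idColumns = columns.Filter("name = 'id'").Count();
long totalColumns = columns.Count();

spark.Stop();
```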

ListDatabases()

Returns a list of databases available across all sessions. The DataFrame contains the name, description and locationUri of each database.
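A sketch of listing databases and filtering the result like any other DataFrame, assuming a spark-submit launch.

```csharp
using Microsoft.Spark.Sql;

SparkSession spark = SparkSession.Builder().AppName("catalog-listdbs-demo").GetOrCreate();

DataFrame databases = spark.Catalog.ListDatabases();
databases.Show();

// The built-in "default" database should appear exactly once.
long defaultRows = databases.Filter("name = 'default'").Count();

spark.Stop();
```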

ListFunctions()

Returns a list of functions registered in the current database. This includes all temporary functions. The DataFrame contains the class name, database, description, whether it is temporary and the name of each function.

ListFunctions(String)

Returns a list of functions registered in the specified database. This includes all temporary functions. The DataFrame contains the class name, database, description, whether it is temporary, and the name of each function.
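A sketch of both ListFunctions overloads, assuming a spark-submit launch. Built-in functions such as abs are expected to appear alongside temporary functions in the listing.

```csharp
using Microsoft.Spark.Sql;

SparkSession spark = SparkSession.Builder().AppName("catalog-listfunctions-demo").GetOrCreate();

// Functions visible from the current database.
DataFrame functions = spark.Catalog.ListFunctions();
long absMatches = functions.Filter("name = 'abs'").Count();

// Functions visible from an explicitly named database.
DataFrame defaultFunctions = spark.Catalog.ListFunctions("default");
defaultFunctions.Show();

spark.Stop();
```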

ListTables()

Returns a list of tables/views in the current database. The DataFrame includes the name, database, description, table type and whether the table is temporary or not.

ListTables(String)

Returns a list of tables/views in the specified database. The DataFrame includes the name, database, description, table type and whether the table is temporary or not.
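A sketch of listing tables, assuming a spark-submit launch. The view name "listed_view" is hypothetical; the isTemporary column marks it as a temporary view.

```csharp
using Microsoft.Spark.Sql;

SparkSession spark = SparkSession.Builder().AppName("catalog-listtables-demo").GetOrCreate();

// Hypothetical temporary view to find in the listing.
spark.Range(3).CreateOrReplaceTempView("listed_view");

DataFrame tables = spark.Catalog.ListTables();
tables.Show();
long matches = tables.Filter("name = 'listed_view' AND isTemporary").Count();

// The overload that takes an explicit database name.
spark.Catalog.ListTables("default").Show();

spark.Stop();
```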

RecoverPartitions(String)

Recovers all the partitions in the directory of a table and updates the catalog. This works only for partitioned tables, not for unpartitioned tables or views.
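A sketch of registering partitions that already exist on disk, assuming local mode, a spark-submit launch, and that CreateTable infers the "part" partition column from the directory layout. The directory and table names are hypothetical. In Spark, partition metadata for an external table created over existing files may not be registered until the partitions are recovered.

```csharp
using System;
using System.IO;
using Microsoft.Spark.Sql;

SparkSession spark = SparkSession.Builder().AppName("catalog-recover-demo").GetOrCreate();

// Hypothetical scratch directory.
string path = Path.Combine(Path.GetTempPath(), "catalog_demo_partitioned");

// Write four rows into two partition directories: part=0 and part=1.
spark.Sql("SELECT id, id % 2 AS part FROM range(4)")
    .Write().Mode("overwrite").PartitionBy("part").Parquet(path);

spark.Sql("DROP TABLE IF EXISTS demo_partitioned");
spark.Catalog.CreateTable("demo_partitioned", path);

// Depending on Spark version and configuration, this can be lower than 4.
long rowsBefore = spark.Sql("SELECT * FROM demo_partitioned").Count();

// Scan the table's directory and register every partition found there.
spark.Catalog.RecoverPartitions("demo_partitioned");
long rowsAfter = spark.Sql("SELECT * FROM demo_partitioned").Count();

Console.WriteLine($"before: {rowsBefore}, after: {rowsAfter}");
spark.Sql("DROP TABLE IF EXISTS demo_partitioned");
spark.Stop();
```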

RefreshByPath(String)

Invalidates and refreshes all the cached data (and the associated metadata) for any DataFrame that contains the given data source path. Path matching is by prefix; for example, "/" would invalidate everything that is cached.
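A sketch of where RefreshByPath fits, assuming local mode and a spark-submit launch. The directory name is hypothetical. In practice the files usually change outside this application (another job or a copy tool); here a Spark write stands in for that change.

```csharp
using System.IO;
using Microsoft.Spark.Sql;

SparkSession spark = SparkSession.Builder().AppName("catalog-refreshpath-demo").GetOrCreate();

// Hypothetical scratch directory with three rows.
string path = Path.Combine(Path.GetTempPath(), "catalog_demo_refresh_path");
spark.Range(3).Write().Mode("overwrite").Parquet(path);

// Cache a DataFrame that reads from the path.
DataFrame cached = spark.Read().Parquet(path).Cache();
long before = cached.Count();

// Stand-in for the files under the path being replaced.
spark.Range(10).Write().Mode("overwrite").Parquet(path);

// Invalidate cached data and metadata for everything read from this path prefix.
spark.Catalog.RefreshByPath(path);
long after = spark.Read().Parquet(path).Count();

spark.Stop();
```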

RefreshTable(String)

Invalidates and refreshes all the cached data and metadata of the given table. For performance reasons, Spark SQL or the external data source library it uses might cache certain metadata about a table, such as the location of blocks. When those change outside of Spark SQL, call this function to invalidate the cache. If the table is cached as an InMemoryRelation, the original cached version is dropped and the new version is cached lazily.
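A sketch of refreshing a cached table after its files change, assuming local mode and a spark-submit launch. The directory and table "demo_refresh" are hypothetical, and a Spark write stands in for an external process rewriting the files.

```csharp
using System.IO;
using Microsoft.Spark.Sql;

SparkSession spark = SparkSession.Builder().AppName("catalog-refreshtable-demo").GetOrCreate();

// Hypothetical scratch directory and table.
string path = Path.Combine(Path.GetTempPath(), "catalog_demo_refresh_table");
spark.Range(3).Write().Mode("overwrite").Parquet(path);
spark.Sql("DROP TABLE IF EXISTS demo_refresh");
spark.Catalog.CreateTable("demo_refresh", path);

spark.Catalog.CacheTable("demo_refresh");
long before = spark.Sql("SELECT * FROM demo_refresh").Count();

// Stand-in for an external rewrite of the table's files.
spark.Range(10).Write().Mode("overwrite").Parquet(path);

// Drop stale cached data and metadata; the table is re-cached lazily.
spark.Catalog.RefreshTable("demo_refresh");
long after = spark.Sql("SELECT * FROM demo_refresh").Count();
bool stillCached = spark.Catalog.IsCached("demo_refresh");

spark.Sql("DROP TABLE IF EXISTS demo_refresh");
spark.Stop();
```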

SetCurrentDatabase(String)

Sets the current default database in this session.

TableExists(String)

Check if the table or view with the specified name exists. This can either be a temporary view or a table/view.

TableExists(String, String)

Check if the table or view with the specified name exists in the specified database.
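A sketch of both TableExists overloads, assuming a spark-submit launch. "exists_view" and "no_such_table" are hypothetical names.

```csharp
using Microsoft.Spark.Sql;

SparkSession spark = SparkSession.Builder().AppName("catalog-tableexists-demo").GetOrCreate();

// The single-argument overload also finds temporary views.
spark.Range(1).CreateOrReplaceTempView("exists_view");
bool viewExists = spark.Catalog.TableExists("exists_view");

// The two-argument overload looks in a named database.
bool missing = spark.Catalog.TableExists("default", "no_such_table");

spark.Stop();
```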

UncacheTable(String)

Removes the specified table from the in-memory cache.
