read_statestore table-valued function

Applies to: check marked yes Databricks Runtime 14.3 and above

Important

This feature is in Public Preview.

A table valued function for reading records from the state store of streaming queries. The returned relation only supports running as a batch query.

Syntax

read_statestore ( path [, option_key => option_value ] [ ... ] )

Arguments

read_statestore requires named parameter invocation for its optional parameters.

  • path: A STRING literal. The path of the streaming query checkpoint location.
  • option_key: The name of the option to configure. You need to use backticks (`) for options that contain dots (.).
  • option_value: A constant expression to set the option to. Accepts literals and scalar functions.

All option_values are case-insensitive.

option_key Type Default Description
batchId BIGINT latest batch id Represents the target batch to read from. This option is used when users want to perform time-travel. The batch should be committed but not yet cleaned up.
OperatorId BIGINT 0 Represents the target operator to read from. This option is used when the query is using multiple stateful operators.
storeName STRING 'DEFAULT' Represents the target side to read from. This option is used when users want to read the state from stream-stream join.
joinSide STRING 'None' Represents the target side to read from. This option is used when users want to read the state from stream-stream join. One of: 'Left', 'Right', 'None'.

Returns

The function returns a result set with the following columns.

Note

The nested columns for key and value heavily depend on the input schema of the stateful operator and the type of operator.

Name Data type Nullable Standard Description
id STRUCT No Key row of the stateful operator stored in the state checkpoint.
value STRUCT Yes Value row of the stateful operator stored in the state checkpoint.
partition_id INTEGER No The partition that contains the record.
shardId STRING No A unique identifier for the shard where the data was read from.
sequenceNumber BIGINT No The unique identifier of the record within its shard.
approximateArrivalTimestamp TIMESTAMP No The approximate time that the record was inserted into the stream.

Examples

- Read from state
> SELECT * FROM read_statestore('/checkpoint/path');

– Read from state with storeName option
> SELECT * FROM read_statestore(
      '/checkpoint/path',
      operatorId => 0,
      batchId => 2,
      storeName => 'default'
  );

– Read from state with joinSide option

> SELECT * FROM read_statestore(
      '/checkpoint/path',
      joinSide => 'left'
  );