CACHE (Delta Lake on Azure Databricks)

Caches the data accessed by the specified simple SELECT query in the Delta cache. You can cache a subset of columns by providing a list of column names, and a subset of rows by providing a predicate. This lets subsequent queries avoid scanning the original files as much as possible. This construct applies only to Parquet tables. Views are also supported, but the expanded query must be a simple query of the form described above.

Syntax

CACHE SELECT column_name[, column_name, ...] FROM table_identifier [ WHERE boolean_expression ]

See Delta and Apache Spark caching for the differences between the RDD cache and the Databricks IO cache.

  • table_identifier
    • [database_name.] table_name: A table name, optionally qualified with a database name.
    • delta.`<path-to-table>`: The location of an existing Delta table.
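
As a rough illustration of how the pieces of the syntax fit together, the following Python sketch assembles a CACHE SELECT statement from a column list, a table identifier, and an optional predicate. The helper name `build_cache_select` and its parameters are hypothetical and not part of any Databricks API. It only concatenates strings and performs no escaping or validation.

```python
from typing import Optional, Sequence


def build_cache_select(
    columns: Sequence[str],
    table_identifier: str,
    predicate: Optional[str] = None,
) -> str:
    """Assemble a CACHE SELECT statement following the syntax above.

    Hypothetical helper: it only joins strings and does no escaping
    or validation, so pass trusted identifiers and predicates only.
    """
    if not columns:
        raise ValueError("at least one column name (or '*') is required")
    statement = f"CACHE SELECT {', '.join(columns)} FROM {table_identifier}"
    if predicate:
        # The WHERE clause is optional and selects a subset of rows.
        statement += f" WHERE {predicate}"
    return statement


# Rebuild two statements of the kind this page documents.
cache_all = build_cache_select(["*"], "boxes")
cache_subset = build_cache_select(["width", "length"], "boxes", "height=3")
print(cache_all)
print(cache_subset)
```

In a notebook, the resulting string could then be passed to `spark.sql(...)`, assuming a SparkSession named `spark` is available on a cluster where the Delta cache is enabled.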

Examples

CACHE SELECT * FROM boxes
CACHE SELECT width, length FROM boxes WHERE height=3