DBFS CLI

To run a Databricks DBFS CLI command, append it to databricks fs (or its alias, dbfs) and prefix every DBFS path with dbfs:/.
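To illustrate the dbfs:/ prefix rule, the sketch below normalizes a path before it is passed to the CLI. to_dbfs_path is a hypothetical helper, not part of the Databricks CLI, and the example paths are placeholders.

```shell
# Hypothetical helper (not part of the Databricks CLI): turn "/tmp/data.csv"
# or "tmp/data.csv" into the "dbfs:/tmp/data.csv" form the DBFS CLI expects.
# Paths that already carry the dbfs:/ prefix are returned unchanged.
to_dbfs_path() {
  local p="$1"
  case "$p" in
    dbfs:/*) printf '%s\n' "$p" ;;        # already prefixed
    /*)      printf 'dbfs:%s\n' "$p" ;;   # absolute: add the scheme only
    *)       printf 'dbfs:/%s\n' "$p" ;;  # relative: add scheme and root slash
  esac
}

# Usage (requires an installed and configured CLI):
#   databricks fs ls "$(to_dbfs_path /tmp)"
#   dbfs ls "$(to_dbfs_path /tmp)"    # same command through the alias
```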

databricks fs -h
Usage: databricks fs [OPTIONS] COMMAND [ARGS]...

  Utility to interact with DBFS. DBFS paths are all prefixed
  with dbfs:/. Local paths can be absolute or local.

Options:
  -v, --version
  -h, --help     Show this message and exit.

Commands:
  cat        Shows the contents of a file. Does not work for directories.
  configure
  cp         Copies files to and from DBFS.
    Options:
      -r, --recursive
      --overwrite     Overwrites files that exist already.
  ls         Lists files in DBFS.
    Options:
      --absolute      Displays absolute paths.
      -l              Displays full information including size and file type.
  mkdirs     Makes directories in DBFS.
  mv         Moves a file between two DBFS paths.
  rm         Removes files from DBFS.
    Options:
      -r, --recursive
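The following sketch runs each subcommand from the help text once, using only the options listed there. dbfs_demo, the DBFS_CLI override variable, and all paths are hypothetical; setting DBFS_CLI=echo prints the commands instead of contacting a workspace.

```shell
# Hypothetical walk-through of the subcommands listed in the help above.
# DBFS_CLI is our own override for the executable (defaults to the "dbfs"
# alias); DBFS_CLI=echo prints each command instead of running it.
# Each step stops the walk-through if the previous command failed.
dbfs_demo() {
  local cli="${DBFS_CLI:-dbfs}"
  "$cli" mkdirs dbfs:/tmp/demo-dir || return                              # create a directory
  "$cli" cp --overwrite notes.txt dbfs:/tmp/demo-dir/notes.txt || return  # upload, replacing an existing file
  "$cli" ls -l --absolute dbfs:/tmp/demo-dir || return                    # size, type, absolute paths
  "$cli" cat dbfs:/tmp/demo-dir/notes.txt || return                       # print a file (not a directory)
  "$cli" mv dbfs:/tmp/demo-dir/notes.txt dbfs:/tmp/demo-dir/old.txt || return  # move within DBFS
  "$cli" rm -r dbfs:/tmp/demo-dir                                         # delete the directory recursively
}
```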

We strongly discourage using the DBFS CLI for operations that list, move, or delete more than 10,000 files:

  • The list operation (databricks fs ls) times out after approximately 60 seconds.
  • The move operation (databricks fs mv) times out after approximately 60 seconds, which can leave data only partially moved.
  • The delete operation (databricks fs rm) deletes files incrementally, in batches.
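Because a timed-out mv can leave data partially moved, a script that calls it should at least check the exit status. The sketch below assumes the CLI exits with a non-zero status when the request fails or times out; safe_dbfs_mv and the DBFS_CLI override are hypothetical names, not part of the CLI.

```shell
# Sketch: move a DBFS path and flag a possible partial move.
# Assumption: the CLI exits non-zero when the request fails or times out.
# DBFS_CLI is our own override for the executable (defaults to "dbfs").
safe_dbfs_mv() {
  local src="$1" dst="$2" status
  if "${DBFS_CLI:-dbfs}" mv "$src" "$dst"; then
    echo "moved $src -> $dst"
  else
    status=$?   # exit status of the failed mv
    echo "mv failed (exit $status): $src may be partially moved to $dst; compare both paths before retrying" >&2
    return "$status"
  fi
}
```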

We recommend that you perform such operations in the context of a cluster, using File system utilities (dbutils.fs). dbutils.fs covers the functional scope of the DBFS REST API, but from notebooks. Running such operations from notebooks gives you better control (for example, selective deletes), better manageability, and the option to automate them as periodic jobs.

Copy a file to DBFS

dbfs cp test.txt dbfs:/test.txt
# Or recursively
dbfs cp -r test-dir dbfs:/test-dir
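For scripts that may run more than once, the cp options in the help text can be combined. The sketch below is a hypothetical upload_to_dbfs helper that picks -r for directories and passes --overwrite so that re-running it replaces files copied earlier; DBFS_CLI is an illustrative override for the executable, not a CLI setting.

```shell
# Sketch: re-runnable upload of a local file or directory to DBFS.
# -r and --overwrite are the cp options listed in the help text.
# DBFS_CLI is our own override for the executable (defaults to "dbfs").
upload_to_dbfs() {
  local src="$1" dst="$2"
  if [ -d "$src" ]; then
    "${DBFS_CLI:-dbfs}" cp -r --overwrite "$src" "$dst"   # whole directory tree
  elif [ -f "$src" ]; then
    "${DBFS_CLI:-dbfs}" cp --overwrite "$src" "$dst"      # single file
  else
    echo "upload_to_dbfs: no such local file or directory: $src" >&2
    return 1
  fi
}

# Usage: upload_to_dbfs test-dir dbfs:/test-dir
```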

Copy a file from DBFS

dbfs cp dbfs:/test.txt ./test.txt
# Or recursively
dbfs cp -r dbfs:/test-dir ./test-dir
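Copying a file back from DBFS is also a simple way to check an earlier upload. The hypothetical verify_dbfs_copy helper below downloads a DBFS file into a temporary local directory and compares it byte for byte with the local original using cmp; DBFS_CLI is an illustrative override for the executable, not a CLI setting.

```shell
# Sketch: confirm an upload by downloading the DBFS copy and comparing it
# with the local original. DBFS_CLI is our own override (defaults to "dbfs").
verify_dbfs_copy() {
  local local_file="$1" dbfs_file="$2" tmp_dir
  tmp_dir="$(mktemp -d)" || return 1
  if "${DBFS_CLI:-dbfs}" cp "$dbfs_file" "$tmp_dir/downloaded" &&
     cmp -s "$local_file" "$tmp_dir/downloaded"; then
    echo "match: $dbfs_file"
    rm -rf "$tmp_dir"   # clean up the temporary download
    return 0
  fi
  echo "mismatch or download failed: $dbfs_file" >&2
  rm -rf "$tmp_dir"
  return 1
}

# Usage: verify_dbfs_copy ./test.txt dbfs:/test.txt
```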