Data loading and model checkpointing are crucial to deep learning (especially distributed DL) workloads.
In Databricks Runtime 6.0 and above, Azure Databricks provides a high-performance FUSE mount. In Databricks Runtime 5.3 to Databricks Runtime 5.5, Azure Databricks provides
dbfs:/ml, a special folder that offers high-performance I/O for deep learning workloads and maps to
file:/dbfs/ml on driver and worker nodes. Azure Databricks recommends using Databricks Runtime 5.3 or above and saving data under
/dbfs/ml. This FUSE mount also works around the local file I/O API limitation in Databricks Runtime, which otherwise supports only files smaller than 2 GB.
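Because /dbfs/ml is a FUSE mount, checkpoints can be written with ordinary local file APIs rather than DBFS-specific utilities. The sketch below illustrates this with plain Python file I/O; the `/dbfs/ml/checkpoints` path and the `save_checkpoint` helper are illustrative assumptions, not part of any Databricks API, and the demo writes to a temporary directory so it can run anywhere:

```python
import os
import tempfile

def save_checkpoint(data: bytes, directory: str, name: str) -> str:
    """Write checkpoint bytes using ordinary local-file APIs.

    On Databricks Runtime 5.3+, passing a directory under /dbfs/ml
    (e.g. '/dbfs/ml/checkpoints') routes the write through the
    high-performance FUSE mount. The helper itself is generic.
    """
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, name)
    with open(path, "wb") as f:
        f.write(data)
    return path

# On a Databricks cluster you would use a /dbfs/ml path instead of a temp dir.
demo_dir = tempfile.mkdtemp()
ckpt = save_checkpoint(b"\x00" * 16, demo_dir, "model_step_100.ckpt")
print(os.path.getsize(ckpt))  # prints 16
```

The same pattern applies to framework-specific savers (for example, passing a `/dbfs/ml/...` path to a model's save function), since they ultimately use local file I/O underneath.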
If you use a Databricks Runtime version lower than 5.3, or you want to use your own storage, Databricks recommends the blobfuse client, an open source project that provides a virtual file system backed by Azure Blob storage. To mount an Azure Blob storage container as a file system with blobfuse, you can use an init script. The following notebook explains how to generate the init script and configure a cluster to run it.
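A minimal sketch of what such an init script might contain, assuming an Ubuntu-based cluster image. The storage account name, key, container name, and mount paths are placeholders you must replace (ideally sourcing credentials from a secret store rather than hardcoding them), and the package URL and mount options should be checked against the blobfuse documentation for your distribution:

```shell
#!/bin/bash
# Sketch of a cluster init script that mounts an Azure Blob storage
# container with blobfuse. All names below are placeholders.
set -e

# Install blobfuse from the Microsoft package repository (Ubuntu example;
# adjust the URL for your distribution version).
wget -q https://packages.microsoft.com/config/ubuntu/16.04/packages-microsoft-prod.deb
dpkg -i packages-microsoft-prod.deb
apt-get update
apt-get install -y blobfuse

# blobfuse reads storage credentials from a config file; restrict its
# permissions since it contains the account key.
cat > /tmp/blobfuse.cfg <<EOF
accountName YOUR_STORAGE_ACCOUNT
accountKey YOUR_STORAGE_KEY
containerName YOUR_CONTAINER
EOF
chmod 600 /tmp/blobfuse.cfg

# Mount the container; --tmp-path is blobfuse's local file cache.
mkdir -p /mnt/blobfusetmp /mnt/blob
blobfuse /mnt/blob --tmp-path=/mnt/blobfusetmp \
  --config-file=/tmp/blobfuse.cfg \
  -o attr_timeout=240 -o entry_timeout=240 -o negative_timeout=120 \
  -o allow_other
```

After the mount succeeds, code on the cluster can read and write /mnt/blob with local file APIs, just as with /dbfs/ml.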
blobfuse init script notebook