Blobfuse is an open source project that provides a virtual filesystem backed by Azure Blob storage. It supports the following features:
- Mount a Blob storage container on Linux
- Basic file system operations such as mkdir, opendir, readdir, rmdir, open, read, create, write, close, unlink, truncate, stat, rename
- Local cache to improve subsequent access times
- Parallel download and upload features for fast access to large blobs
- Multiple nodes can mount the same container in read-only scenarios
You can install blobfuse from the Linux Software Repository for Microsoft products; the process is explained on the blobfuse installation page. Alternatively, you can clone the blobfuse repository, install the dependencies (fuse, libcurl, gcrypt and GnuTLS), and build from source. See the wiki and the GitHub repository for details.
Blobfuse and Data Science Virtual Machine
Blobfuse is already installed on the Ubuntu DSVM. To use it, create a configuration file, /opt/blobfuse.cfg, as described at https://docs.microsoft.com/en-us/azure/storage/blobs/storage-how-to-mount-container-linux.
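As a minimal sketch, assuming the blobfuse v1 configuration format (one "key value" pair per line, using the keys accountName, accountKey and containerName), the configuration file could be generated as shown here. The helper name write_blobfuse_cfg and all argument values are hypothetical; treat the linked documentation as the authoritative description of the format.

```shell
# Hypothetical helper: write a blobfuse (v1-style) configuration file.
# Arguments: <config path> <storage account> <account key> <container>
write_blobfuse_cfg() {
  local cfg_path="$1" account="$2" key="$3" container="$4"
  # umask 077 inside a subshell: a newly created file is readable only by its owner.
  (
    umask 077
    printf 'accountName %s\naccountKey %s\ncontainerName %s\n' \
      "$account" "$key" "$container" > "$cfg_path"
  )
  # Tighten permissions in case the file already existed with a looser mode.
  chmod 600 "$cfg_path"
}
```

For example, write_blobfuse_cfg /opt/blobfuse.cfg myaccountname myaccountkey mycontainer. Writing to /opt normally requires elevated privileges, and because the file ends up with mode 600, it should be owned (for example, via chown) by the user who runs blobfuse, otherwise blobfuse cannot read it.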
Once you have installed blobfuse, configure your account credentials either in the template provided in the blobfuse folder (connection.cfg) or in environment variables. For brevity, this example uses environment variables:
export AZURE_STORAGE_ACCOUNT=myaccountname
export AZURE_STORAGE_ACCESS_KEY=myaccountkey
Then mount your blob storage on the VM:
Using a high-performance disk or a ramdisk for the local cache is recommended. On Azure VMs, this is the ephemeral disk, which is mounted on /mnt in Ubuntu and on /mnt/resource in RHEL. Make sure that your user has write access to this location; if not, create the directory and chown it to your user.
sudo mkdir /images
sudo mkdir /mnt/blobfusecache
sudo chown -R <your-user-account> /images
sudo chown -R <your-user-account> /mnt/blobfusecache/
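Before mounting, it can help to verify that the mount point and cache directory exist, are given as absolute paths (blobfuse does not accept relative or ~/ paths), and that the cache is writable by the current user. The following is a minimal sketch; check_blobfuse_dirs is a hypothetical helper, not part of blobfuse.

```shell
# Hypothetical pre-mount check: check_blobfuse_dirs <mount dir> <cache dir>
# Returns non-zero and prints a reason on stderr if a requirement is not met.
check_blobfuse_dirs() {
  local mount_dir="$1" cache_dir="$2" dir
  for dir in "$mount_dir" "$cache_dir"; do
    # Reject anything that does not start with "/" (relative or "~/" paths).
    case "$dir" in
      /*) ;;
      *) echo "not an absolute path: $dir" >&2; return 1 ;;
    esac
    if [ ! -d "$dir" ]; then
      echo "directory does not exist: $dir" >&2
      return 1
    fi
  done
  # The local cache must be writable by the user who runs blobfuse.
  if [ ! -w "$cache_dir" ]; then
    echo "cache directory is not writable by $(id -un): $cache_dir" >&2
    return 1
  fi
}
```

For example, check_blobfuse_dirs /images /mnt/blobfusecache && echo "ready to mount".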
Create your mount point (mkdir /path/to/mount) and mount a Blob container (which must already exist) with blobfuse:
blobfuse /images --tmp-path=/mnt/blobfusecache -o big_writes -o max_read=131072 -o max_write=131072 -o attr_timeout=240 -o fsname=blobfuse -o entry_timeout=240 -o negative_timeout=120 --config-file=/opt/blobfuse.cfg
NOTE: Use absolute paths for directories in the command; relative paths and shortcut paths (~/) do not work. Blobfuse does not support multiple writers to a single blob, so you must ensure that the file names generated during the extraction step are unique.
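One way to keep file names unique across writers is to build each name from the host name, the process ID, a start timestamp and a per-process counter. This is a minimal sketch; unique_blob_name and its prefix/extension parameters are hypothetical, and uniqueness assumes host names differ between nodes and that a PID is not reused on the same host within the same second.

```shell
# Captured once when this snippet is loaded; separates runs that reuse a PID.
_blob_name_start=$(date +%s)
_blob_name_counter=0

# Hypothetical helper: unique_blob_name <prefix> <extension>
# Stores the result in REPLY instead of printing it, so the counter is
# incremented in the calling shell rather than in a command-substitution subshell.
unique_blob_name() {
  _blob_name_counter=$((_blob_name_counter + 1))
  REPLY="$1-${HOSTNAME:-$(uname -n)}-$$-${_blob_name_start}-${_blob_name_counter}.$2"
}
```

For example, unique_blob_name frame jpg && cp extracted.jpg "/images/$REPLY". Call it from the writing process itself: a background subshell shares the parent's $$ and gets its own copy of the counter, so two subshells could produce the same name.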
For more information, see the wiki.