Using the HDFS CLI with Data Lake Storage Gen2
You can access and manage the data in your storage account by using a command line interface just as you would with a Hadoop Distributed File System (HDFS). This article provides some examples that will help you get started.
HDInsight provides access to the distributed container that is locally attached to the compute nodes. You can access this container by using the shell that directly interacts with the HDFS and the other file systems that Hadoop supports.
If you're using Azure Databricks instead of HDInsight, and you want to interact with your data by using a command line interface, you can use the Databricks CLI to interact with the Databricks file system. See Databricks CLI.
Use the HDFS CLI with an HDInsight Hadoop cluster on Linux
#Connect to the cluster via SSH.
ssh <ssh-user-name>@<cluster-name>-ssh.azurehdinsight.net

#Execute basic HDFS commands. Display the hierarchy.
hdfs dfs -ls /

#Create a sample directory.
hdfs dfs -mkdir /samplefolder
You can find the connection string in the "SSH + Cluster login" section of the HDInsight cluster blade in the Azure portal. The SSH credentials were specified when you created the cluster.
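To confirm that the sample directory was created, you can list its contents; /samplefolder is the folder created by the commands above:
hdfs dfs -ls /samplefolder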
HDInsight cluster billing starts after a cluster is created and stops when the cluster is deleted. Billing is pro-rated per minute, so you should always delete your cluster when it is no longer in use. To learn how to delete a cluster, see our article on the topic. However, data stored in a storage account with Data Lake Storage Gen2 enabled persists even after an HDInsight cluster is deleted.
Create a container
hdfs dfs -D "fs.azure.createRemoteFileSystemDuringInitialization=true" -ls abfs://<container-name>@<storage-account-name>.dfs.core.windows.net/
Replace the <container-name> placeholder with the name that you want to give your container.
Replace the <storage-account-name> placeholder with the name of your storage account.
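For example, with a hypothetical container named my-file-system in a hypothetical storage account named mystorageaccount (both names are illustrative), the command would look like this:
hdfs dfs -D "fs.azure.createRemoteFileSystemDuringInitialization=true" -ls abfs://my-file-system@mystorageaccount.dfs.core.windows.net/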
Get a list of files or directories
hdfs dfs -ls <path>
Replace the <path> placeholder with the URI of the container or container folder. Here's an example:
hdfs dfs -ls abfs://<container-name>@<storage-account-name>.dfs.core.windows.net/my-directory-name
Create a directory
hdfs dfs -mkdir [-p] <path>
Replace the <path> placeholder with the root container name or a folder within your container. Here's an example:
hdfs dfs -mkdir abfs://<container-name>@<storage-account-name>.dfs.core.windows.net/
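You can also create a folder inside the container by appending a folder name to the URI; my-directory-name here is only an illustrative name:
hdfs dfs -mkdir abfs://<container-name>@<storage-account-name>.dfs.core.windows.net/my-directory-name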
Delete a file or directory
hdfs dfs -rm <path>
Replace the <path> placeholder with the URI of the file or folder that you want to delete. Here's an example:
hdfs dfs -rm abfs://<container-name>@<storage-account-name>.dfs.core.windows.net/my-directory-name/my-file-name
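To delete a directory along with its contents, you can pass the standard HDFS recursive flag; the path below is illustrative:
hdfs dfs -rm -r abfs://<container-name>@<storage-account-name>.dfs.core.windows.net/my-directory-name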
Display the Access Control Lists (ACLs) of files and directories
hdfs dfs -getfacl [-R] <path>
hdfs dfs -getfacl -R /dir
Set ACLs of files and directories
hdfs dfs -setfacl [-R] [-b|-k -m|-x <acl_spec> <path>]|[--set <acl_spec> <path>]
hdfs dfs -setfacl -m user:hadoop:rw- /file
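To remove a specific ACL entry instead, the -x option takes the ACL spec without permissions; the hadoop user name is carried over from the example above:
hdfs dfs -setfacl -x user:hadoop /file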
Change the owner of files
hdfs dfs -chown [-R] <new_owner>:<users_group> <URI>
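For example, to recursively transfer ownership of a hypothetical /sampledata directory to an illustrative user and group:
hdfs dfs -chown -R sampleuser:samplegroup /sampledata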
Change group association of files
hdfs dfs -chgrp [-R] <group> <URI>
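For example, to recursively assign an illustrative group to the same hypothetical directory:
hdfs dfs -chgrp -R samplegroup /sampledata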
Change the permissions of files
hdfs dfs -chmod [-R] <mode> <URI>
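For example, to recursively grant the owner full access, and everyone else read and execute access, on the hypothetical directory:
hdfs dfs -chmod -R 755 /sampledata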
You can view the complete list of commands on the Apache Hadoop 2.4.1 File System Shell Guide website.