Filesystem operations on Data Lake Store using Java SDK

Learn how to use the Azure Data Lake Store Java SDK to perform basic operations such as creating folders, uploading and downloading data files, and so on. For more information about Data Lake Store, see Azure Data Lake Store.

You can access the Java SDK API docs for Azure Data Lake Store at Azure Data Lake Store Java API docs.

Prerequisites

  • Java Development Kit (JDK 7 or higher; that is, Java version 1.7 or higher)
  • Azure Data Lake Store account. Follow the instructions at Get started with Azure Data Lake Store using the Azure portal.
  • Maven. This tutorial uses Maven for builds and project dependencies. Although it is possible to build without a build system like Maven or Gradle, these systems make it much easier to manage dependencies.
  • (Optional) An IDE such as IntelliJ IDEA or Eclipse.

Create a Java application

The code sample available on GitHub walks you through the process of creating files in the store, concatenating files, downloading a file, and deleting some files in the store. This section of the article walks you through the main parts of the code.

  1. Create a Maven project using mvn archetype from the command line or using an IDE. For instructions on how to create a Java project using IntelliJ, see here. For instructions on how to create a project using Eclipse, see here.

  2. Add the following dependencies to your Maven pom.xml file by placing this snippet before the </project> tag:

     <dependencies>
       <dependency>
         <groupId>com.microsoft.azure</groupId>
         <artifactId>azure-data-lake-store-sdk</artifactId>
         <version>2.1.5</version>
       </dependency>
       <dependency>
         <groupId>org.slf4j</groupId>
         <artifactId>slf4j-nop</artifactId>
         <version>1.7.21</version>
       </dependency>
     </dependencies>
    

    The first dependency adds the Data Lake Store SDK (azure-data-lake-store-sdk) from the Maven repository. The second dependency specifies the logging framework (slf4j-nop) for this application. The Data Lake Store SDK uses the slf4j logging façade, which lets you choose from a number of popular logging frameworks, like log4j, Java logging, logback, etc., or no logging. For this example, we disable logging, hence we use the slf4j-nop binding. To use other logging options in your app, see here.

  3. Add the following import statements to your application.

     import com.microsoft.azure.datalake.store.ADLException;
     import com.microsoft.azure.datalake.store.ADLStoreClient;
     import com.microsoft.azure.datalake.store.DirectoryEntry;
     import com.microsoft.azure.datalake.store.IfExists;
     import com.microsoft.azure.datalake.store.oauth2.AccessTokenProvider;
     import com.microsoft.azure.datalake.store.oauth2.ClientCredsTokenProvider;
    
     import java.io.*;
     import java.util.Arrays;
     import java.util.List;
    

Authentication
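The client-creation code in the next section expects an AccessTokenProvider named provider. For service-to-service authentication, the SDK's ClientCredsTokenProvider constructor takes an OAuth 2.0 token endpoint, a client (application) ID, and a client key. The sketch below only builds the endpoint string, assuming the Azure AD v1 format https://login.microsoftonline.com/<tenant-id>/oauth2/token; the TokenEndpoint class and its forTenant method are hypothetical, and the provider construction is shown as a comment because it needs the SDK and real credentials.

```java
final class TokenEndpoint {
    // Assumed Azure AD v1 token endpoint format; confirm it for your tenant.
    private static final String PREFIX = "https://login.microsoftonline.com/";
    private static final String SUFFIX = "/oauth2/token";

    private TokenEndpoint() { }

    // Hypothetical helper: builds the token endpoint URL for a tenant ID (GUID or domain name).
    static String forTenant(String tenantId) {
        if (tenantId == null || tenantId.trim().isEmpty()) {
            throw new IllegalArgumentException("tenantId must not be empty");
        }
        return PREFIX + tenantId.trim() + SUFFIX;
    }

    public static void main(String[] args) {
        String authTokenEndpoint = forTenant("FILL-IN-HERE");
        // With the SDK on the classpath, the provider would then be created like this:
        // AccessTokenProvider provider =
        //     new ClientCredsTokenProvider(authTokenEndpoint, clientId, clientKey);
        System.out.println(authTokenEndpoint);
    }
}
```

Keeping the client key out of source code (for example, reading it from an environment variable) is advisable; the FILL-IN-HERE placeholder only mirrors the style used elsewhere in this article.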

Create an Azure Data Lake Store client

Creating an ADLStoreClient object requires you to specify the Data Lake Store account name and the token provider you generated when you authenticated with Data Lake Store (see the Authentication section). The account name must be a fully qualified domain name (FQDN): for example, replace FILL-IN-HERE with something like mydatalakestore.azuredatalakestore.net.

private static String accountFQDN = "FILL-IN-HERE";  // full account FQDN, not just the account name
ADLStoreClient client = ADLStoreClient.createClient(accountFQDN, provider);
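Passing the bare account name instead of the FQDN is an easy mistake. A minimal sketch of a hypothetical helper that appends the DNS suffix when it is missing, assuming the public-cloud suffix azuredatalakestore.net (other Azure clouds may use a different suffix, so treat the constant as an assumption):

```java
final class AccountNames {
    // Assumed public-cloud DNS suffix for Data Lake Store accounts.
    static final String DEFAULT_SUFFIX = ".azuredatalakestore.net";

    private AccountNames() { }

    // Hypothetical helper: returns the input unchanged if it already contains a dot
    // (treated as an FQDN); otherwise treats it as a bare account name and appends the suffix.
    static String toFqdn(String accountOrFqdn) {
        String value = accountOrFqdn.trim();
        if (value.isEmpty()) {
            throw new IllegalArgumentException("account name must not be empty");
        }
        return value.contains(".") ? value : value + DEFAULT_SUFFIX;
    }

    public static void main(String[] args) {
        // Would be passed to ADLStoreClient.createClient(..., provider).
        System.out.println(toFqdn("mydatalakestore"));
    }
}
```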

The code snippets in the following sections contain examples of some common filesystem operations. For other operations, see the ADLStoreClient class in the Data Lake Store Java SDK API docs.

Create a directory

The following snippet creates a directory structure in the root of the Data Lake Store account you specified.

// create directory
client.createDirectory("/a/b/w");
System.out.println("Directory created.");
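A single createDirectory call on /a/b/w produces the whole structure under the root, so it can help to see which levels that covers. The helper below is a hypothetical, SDK-independent sketch that lists every directory level of an absolute path, assuming (as the description of the snippet implies) that missing parent directories are created along with the leaf.

```java
import java.util.ArrayList;
import java.util.List;

final class DirectoryChain {
    private DirectoryChain() { }

    // Returns every directory level of an absolute path, from the root downward.
    // For "/a/b/w" this is ["/a", "/a/b", "/a/b/w"].
    static List<String> levels(String absolutePath) {
        if (!absolutePath.startsWith("/")) {
            throw new IllegalArgumentException("expected an absolute path: " + absolutePath);
        }
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String part : absolutePath.split("/")) {
            if (part.isEmpty()) {
                continue; // skip the leading empty segment and any doubled slashes
            }
            current.append('/').append(part);
            result.add(current.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(levels("/a/b/w"));
    }
}
```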

Create a file

The following snippet creates a file (c.txt) in the directory structure and writes some data to the file.

// create file and write some content
String filename = "/a/b/c.txt";
OutputStream stream = client.createFile(filename, IfExists.OVERWRITE);
PrintStream out = new PrintStream(stream);
for (int i = 1; i <= 10; i++) {
    out.println("This is line #" + i);
    out.format("This is the same line (%d), but using formatted output. %n", i);
}
out.close();
System.out.println("File created.");
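The snippet above closes the PrintStream by hand, so an exception thrown mid-loop would leave the stream open, and PrintStream never throws IOException from println or format: it records failures internally. A hedged variant, assuming Java 7 or later (which the prerequisites already require): try-with-resources closes the underlying stream, an explicit UTF-8 encoding avoids depending on the platform default, and checkError() surfaces any recorded failure. A ByteArrayOutputStream stands in for the stream returned by client.createFile; the class and method names are illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;

final class SampleWriter {
    private SampleWriter() { }

    // Writes the tutorial's sample lines to target as UTF-8, then closes target.
    static void writeSampleLines(OutputStream target, int count) throws IOException {
        try (OutputStream stream = target) {
            PrintStream out = new PrintStream(stream, false, "UTF-8");
            for (int i = 1; i <= count; i++) {
                out.println("This is line #" + i);
                out.format("This is the same line (%d), but using formatted output. %n", i);
            }
            // checkError() flushes and reports whether any earlier write failed.
            if (out.checkError()) {
                throw new IOException("writing to the output stream failed");
            }
        } // closing the underlying stream directly lets an IOException from close() propagate
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for client.createFile(filename, IfExists.OVERWRITE).
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeSampleLines(buffer, 10);
        System.out.print(buffer.toString("UTF-8"));
    }
}
```

Closing the wrapped stream rather than the PrintStream is deliberate: PrintStream.close() also swallows IOException, whereas the underlying stream's close() reports it.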

You can also create a file (d.txt) using byte arrays.

// create file using byte arrays
stream = client.createFile("/a/b/d.txt", IfExists.OVERWRITE);
byte[] buf = getSampleContent();
stream.write(buf);
stream.close();
System.out.println("File created using byte array.");

The definition of the getSampleContent function used in the preceding snippet is available as part of the sample on GitHub.
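If you want to try the byte-array snippet without the GitHub sample, any method that returns a byte[] will do. The following is a hypothetical stand-in, not the sample's actual definition: it encodes a few lines as UTF-8 explicitly, so the bytes written to the store do not depend on the JVM's default charset.

```java
import java.nio.charset.StandardCharsets;

final class SampleContent {
    private SampleContent() { }

    // Hypothetical replacement for the sample's getSampleContent():
    // three newline-terminated text lines, encoded as UTF-8.
    static byte[] getSampleContent() {
        StringBuilder text = new StringBuilder();
        for (int i = 1; i <= 3; i++) {
            text.append("Sample line ").append(i).append('\n');
        }
        return text.toString().getBytes(StandardCharsets.UTF_8);
    }
}
```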

Append to a file

The following snippet appends content to an existing file.

// append to file
stream = client.getAppendStream(filename);
stream.write(getSampleContent());
stream.close();
System.out.println("File appended.");


Read a file

The following snippet reads content from a file in a Data Lake Store account.

// Read File
InputStream in = client.getReadStream(filename);
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
String line;
while ((line = reader.readLine()) != null) {
    System.out.println(line);
}
reader.close();
System.out.println();
System.out.println("File contents read.");
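The read loop above relies on the platform's default charset and closes the reader only if no exception occurs. A minimal sketch of the same loop with try-with-resources and an explicit UTF-8 decoder, assuming the file was written as UTF-8; a ByteArrayInputStream stands in for client.getReadStream(filename), and the LineReader name is illustrative.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class LineReader {
    private LineReader() { }

    // Reads every line from the stream as UTF-8; the stream is closed even if reading fails.
    static List<String> readAllLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for client.getReadStream(filename).
        InputStream in = new ByteArrayInputStream("first\nsecond\n".getBytes(StandardCharsets.UTF_8));
        for (String line : readAllLines(in)) {
            System.out.println(line);
        }
    }
}
```

Collecting lines into a list is convenient for small files; for large files, processing each line inside the loop avoids holding the whole file in memory.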

Concatenate files

The following snippet concatenates two files in the Data Lake Store account. If successful, the concatenated file replaces the two existing files.

// concatenate the two files into one
List<String> fileList = Arrays.asList("/a/b/c.txt", "/a/b/d.txt");
client.concatenateFiles("/a/b/f.txt", fileList);
System.out.println("Two files concatenated into a new file.");
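Two consequences of this behavior are easy to miss: the sources are joined in list order, and after a successful call the source files are gone, so later code must not refer to /a/b/c.txt. The toy in-memory model below (hypothetical class name, no SDK calls) makes both visible. Rejecting an existing target is an assumption of the model, chosen to match the "new file" wording above.

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model of the concatenate behavior described above. It makes no SDK calls.
final class ConcatModel {
    private final Map<String, byte[]> files = new HashMap<>();

    void put(String path, byte[] content) {
        files.put(path, content.clone());
    }

    boolean exists(String path) {
        return files.containsKey(path);
    }

    byte[] get(String path) {
        byte[] data = files.get(path);
        return data == null ? null : data.clone();
    }

    // Joins the sources into target in list order, then removes the sources.
    // Refusing an existing target is an assumption of this model.
    void concatenate(String target, List<String> sources) {
        if (files.containsKey(target)) {
            throw new IllegalStateException("target already exists: " + target);
        }
        ByteArrayOutputStream joined = new ByteArrayOutputStream();
        for (String source : sources) {
            byte[] data = files.get(source);
            if (data == null) {
                throw new IllegalArgumentException("missing source file: " + source);
            }
            joined.write(data, 0, data.length);
        }
        files.put(target, joined.toByteArray());
        for (String source : sources) {
            files.remove(source);
        }
    }
}
```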

Rename a file

The following snippet renames a file in a Data Lake Store account.

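// rename the file
client.rename("/a/b/f.txt", "/a/b/g.txt");
System.out.println("New file renamed.");
// concatenateFiles removed c.txt, so point filename at the renamed file for the snippets that follow
filename = "/a/b/g.txt";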

Get metadata for a file

The following snippet retrieves the metadata for a file in a Data Lake Store account.

// get file metadata
DirectoryEntry ent = client.getDirectoryEntry(filename);
printDirectoryInfo(ent);
System.out.println("File metadata retrieved.");

Set permissions on a file

The following snippet sets permissions on the file whose metadata you retrieved in the previous section.

// set file permission
client.setPermission(filename, "744");
System.out.println("File permission set.");
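The permission string uses the same octal notation as POSIX chmod: one digit each for the owner, the owning group, and everyone else, where 4 = read, 2 = write, and 1 = execute. So "744" gives the owner read, write, and execute access, and gives the group and others read-only access. The small helper below is hypothetical (not part of the SDK) and handles only three-digit strings; it decodes such a string into symbolic form, which can be handy when logging or checking permissions.

```java
final class OctalPermission {
    private OctalPermission() { }

    // Converts a three-digit octal permission string (owner, group, other) such as "744"
    // into POSIX-style symbolic notation such as "rwxr--r--".
    static String toSymbolic(String octal) {
        if (!octal.matches("[0-7]{3}")) {
            throw new IllegalArgumentException("expected three octal digits: " + octal);
        }
        StringBuilder sb = new StringBuilder(9);
        for (char c : octal.toCharArray()) {
            int bits = c - '0';
            sb.append((bits & 4) != 0 ? 'r' : '-');
            sb.append((bits & 2) != 0 ? 'w' : '-');
            sb.append((bits & 1) != 0 ? 'x' : '-');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toSymbolic("744")); // rwxr--r--
    }
}
```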

List directory contents

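The following snippet lists the contents of a directory. enumerateDirectory returns only the immediate children of /a/b (here, at most 2,000 entries); it does not descend into subdirectories.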

// list directory contents
List<DirectoryEntry> list = client.enumerateDirectory("/a/b", 2000);
System.out.println("Directory listing for directory /a/b:");
for (DirectoryEntry entry : list) {
    printDirectoryInfo(entry);
}
System.out.println("Directory contents listed.");

The definition of the printDirectoryInfo function used in the preceding snippets is available as part of the sample on GitHub.
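enumerateDirectory returns the entries of one directory only. If you need every path under /a/b, including those in subdirectories, you can recurse on the entries that are directories. The sketch below keeps the SDK out of the picture with a hypothetical Lister interface; with the SDK, its methods could wrap client.enumerateDirectory and the returned DirectoryEntry objects (check the API docs for the exact fields). Each enumerateDirectory call is capped by its maximum-entries argument (2000 in the snippet above), so very large directories may need separate handling.

```java
import java.util.ArrayList;
import java.util.List;

final class TreeWalk {
    private TreeWalk() { }

    // Hypothetical one-level lister standing in for the SDK's directory listing.
    interface Lister {
        List<String> children(String directory); // full paths of the immediate children
        boolean isDirectory(String path);
    }

    // Depth-first, pre-order walk: each directory appears before its contents.
    static List<String> listRecursively(Lister lister, String root) {
        List<String> paths = new ArrayList<>();
        walk(lister, root, paths);
        return paths;
    }

    private static void walk(Lister lister, String directory, List<String> paths) {
        for (String child : lister.children(directory)) {
            paths.add(child);
            if (lister.isDirectory(child)) {
                walk(lister, child, paths);
            }
        }
    }
}
```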

Delete files and folders

The following snippet deletes the specified files and folders in a Data Lake Store account, recursively.

// delete directory along with all the subdirectories and files in it
client.deleteRecursive("/a");
System.out.println("All files and folders deleted recursively.");
promptEnterKey();  // helper defined in the GitHub sample; waits for the Enter key

Build and run the application

  1. To run from within an IDE, locate and press the Run button. To run from Maven, use exec:exec.
  2. To produce a standalone JAR that you can run from the command line, build a JAR with all dependencies included by using the Maven assembly plugin. The pom.xml in the sample source code on GitHub shows an example.

Next steps