Use curl to load data into HDFS on SQL Server Big Data Clusters

THIS TOPIC APPLIES TO:yesSQL Server noAzure SQL DatabasenoAzure SQL Data Warehouse noParallel Data Warehouse

This article explains how to use curl to load data into HDFS on SQL Server 2019 Big Data Clusters (preview).

Obtain the service external IP

WebHDFS is started when deployment is completed, and its access goes through Knox. The Knox endpoint is exposed through a Kubernetes service called gateway-svc-external. To create the necessary WebHDFS URL to upload/download files, you need the gateway-svc-external service external IP address and the name of your big data cluster. You can get the gateway-svc-external service external IP address by running the following command:

kubectl get service gateway-svc-external -n <big data cluster name> -o json | jq -r .status.loadBalancer.ingress[0].ip

Note

The <big data cluster name> here is the name of the cluster that you specified in the deployment configuration file. The default name is mssql-cluster.

Construct the URL to access WebHDFS

Now, you can construct the URL to access the WebHDFS as follows:

https://<gateway-svc-external service external IP address>:30443/gateway/default/webhdfs/v1/

For example:

https://13.66.190.205:30443/gateway/default/webhdfs/v1/

List a file

To list file under hdfs:///airlinedata, use the following curl command:

curl -i -k -u root:root-password -X GET 'https://<gateway-svc-external IP external address>:30443/gateway/default/webhdfs/v1/airlinedata/?op=liststatus'

Put a local file into HDFS

To put a new file test.csv from local directory to airlinedata directory, use the following curl command (the Content-Type parameter is required):

curl -i -L -k -u root:root-password -X PUT 'https://<gateway-svc-external IP external address>:30443/gateway/default/webhdfs/v1/airlinedata/test.csv?op=create' -H 'Content-Type: application/octet-stream' -T 'test.csv'

Create a directory

To create a directory test under hdfs:///, use the following command:

curl -i -L -k -u root:root-password -X PUT 'https://<gateway-svc-external IP external address>:30443/gateway/default/webhdfs/v1/test?op=MKDIRS'

Next steps

For more information about SQL Server big data cluster, see What is SQL Server big data cluster?.