RajD-9527 asked:

Import a JSON payload from a REST API and save as JSON documents in ADLS Gen2

Hi, I am trying to import a JSON payload from a REST API GET method and save the JSON documents into ADLS Gen2 using Azure Databricks.
GET: https://myapi.com/api/v1/city

GET method output:

    [
    {"id":2643743,
     "name":"London"},
    {"id":2643744,
     "name":"Manchester"}
    ]

Powershell:

    [Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
   
    $username = "user"
    $password = "password"
   
    $params = @{uri = 'https://myapi.com/api/v1/city';
                       Method = 'Get';
                       Headers = @{Authorization = 'Basic ' + [Convert]::ToBase64String([Text.Encoding]::ASCII.GetBytes("$($username):$($password)"));
               } #end headers hash table
       } #end $params hash table
   
    # Invoke-RestMethod already parses the JSON response; there is no .Content property here
    $var = Invoke-RestMethod @params -ContentType "application/json" | ConvertTo-Json


Now, I'm stuck with how to save json document in Azure Data Lake Storage Gen2. Please guide me.

Thank you.

Tags: windows-server-powershell, azure-databricks, azure-data-lake-storage

PRADEEPCHEEKATLA-MSFT answered:

Hello,

Welcome to Microsoft Q&A platform.

You can use the df.write.json API to write to any location you need.

Syntax: df.write.json('<location where you want to save the JSON file>')

Example: df.write.json("abfss://<file_system>@<storage-account-name>.dfs.core.windows.net/iot_devices.json")

Here are the steps to save the JSON documents to Azure Data Lake Gen2 using Azure Databricks.

Step 1: Use the spark.read.json API to read the JSON file and create a DataFrame.

Step 2: Mount the ADLS Gen2 storage location to a Databricks DBFS directory, using the instructions in the doc below:

https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/azure/azure-datalake-gen2

Step 3: Use the df.write.json API to write to the mount point, which writes into the storage account.
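The three steps above can be sketched as follows. This is a minimal outline, not tested code: the helper and function names are mine, `spark` only exists inside a Databricks notebook, and the file system, account name, and paths are placeholders.

```python
def abfss_path(file_system, account_name, relative_path):
    """Build the abfss:// URI shown in the example above.

    All arguments are placeholders for your own container, storage
    account, and target file name.
    """
    return f"abfss://{file_system}@{account_name}.dfs.core.windows.net/{relative_path}"


def save_json_to_adls(spark, source_path, target_path):
    """Steps 1 and 3: read a JSON file into a DataFrame, then write it out.

    Run from a Databricks notebook, where `spark` is available and the
    cluster is already configured (or mounted, per step 2) to reach the
    storage account.
    """
    df = spark.read.json(source_path)  # Step 1: JSON file -> DataFrame
    df.write.json(target_path)         # Step 3: DataFrame -> ADLS Gen2


# Usage from a notebook cell might look like:
# save_json_to_adls(spark, "/tmp/city.json",
#                   abfss_path("<file_system>", "<storage-account-name>", "city.json"))
```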

For more details, refer to the articles below:

Azure Databricks – JSON files

Sample notebook: https://docs.microsoft.com/en-us/azure/databricks/_static/notebooks/adls-passthrough-gen2.html


Hope this helps. Do let us know if you have any further queries.


Do click on "Accept Answer" and upvote the post that helps you, as this can be beneficial to other community members.





Hello @RajD-9527,
Just checking in to see if the above answer helped. If this answers your query, do click "Accept Answer" and up-vote it. And if you have any further queries, do let us know.



Hello @RajD-9527,
Following up to see if the above suggestion was helpful. If you have any further queries, do let us know.

RajD-9527 replied to PRADEEPCHEEKATLA-MSFT:

Hi PRADEEPCHEEKATLA-MSFT, can we save the JSON documents from the API call to ADLS Gen2 containers? Is there a difference between reading or writing to a file share vs. a container?

    import requests
    response = requests.get('https://myapi.com/api/v1/city',
                            auth=('user', 'password'))
    data = response.json()
    df = spark.read.json(data)
    df.write.json("abfss://<file_system>@<storage-account-name>.dfs.core.windows.net/city.json")

Error: java.lang.ArrayStoreException: java.util.HashMap
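For what it's worth, this ArrayStoreException is most likely because spark.read.json expects a path (or an RDD of JSON strings), while response.json() returns an already-parsed Python list of dicts. A sketch of two possible fixes follows; the function names are mine, and `spark` is assumed to be the session available in a Databricks notebook.

```python
import json


def to_json_strings(payload):
    """Serialize the parsed payload (a list of dicts) back into JSON
    strings, a form spark.read.json can consume via an RDD."""
    return [json.dumps(record) for record in payload]


def write_payload(spark, payload, target_path):
    """Write the API payload to ADLS Gen2 from a Databricks notebook."""
    # Fix 1: hand spark.read.json an RDD of JSON strings
    rdd = spark.sparkContext.parallelize(to_json_strings(payload))
    df = spark.read.json(rdd)
    # Fix 2 (alternative): let Spark infer the schema from the dicts directly
    # df = spark.createDataFrame(payload)
    df.write.json(target_path)
```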

Thank you



RichMatheisen-8856 answered:

Is this what you're looking for?

data-lake-storage-directory-file-acl-powershell



Hi RichMatheisen, thanks very much for your response. I need a way to save JSON documents to ADLS Gen2 using Azure Databricks, and I'm not sure how to accomplish that in Azure Databricks.
