Writing binary data to ADLS in Synapse notebook

Ljubo Jurkovic 66 Reputation points
2024-04-18T19:40:19.28+00:00

Hi all,

Is it possible to write binary data to ADLS from a Synapse notebook using PySpark? I retrieve data via an API call where the content type is application/excel (basically binary data) and want to save it to a specific ADLS location like this:

abfss://<container_name>@full_storage_account_name/<path_to_file>

It seems like a trivial task, but I can't find any examples of this.

Regards,

Ljubo

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

Accepted answer
  1. Vinodh247-1375 11,206 Reputation points
    2024-04-26T08:26:13.2966667+00:00

    Hi Ljubo Jurkovic,

    Thanks for reaching out to Microsoft Q&A.

    Glad to know that you were able to figure this out, and thanks for sharing the root cause. Note that the Microsoft Q&A community has a policy that "The question author cannot accept their own answer, they can only accept answers by others." Please accept this as the answer so we can close this thread. Below I summarize the issue and the root cause you found.

    Issue Summary:

    How can binary data be written to ADLS from a Synapse notebook using PySpark? The data is retrieved via an API call where the content type is application/excel (basically binary data) and needs to be saved to a specific ADLS location like this:

    abfss://<container_name>@full_storage_account_name/<path_to_file>

    Solution:

    The OP reported that the data can't be saved as Parquet because the original content format would not be preserved. He decided instead to use an Azure Function to make the API calls and save the content to ADLS, and to run the next data processing steps after that.

    Workaround shared by the OP:

    Used C# libraries to issue the API call and save the content as a blob in ADLS:

        // Call the API with a bearer token and fail on a non-success status.
        var client = new HttpClient();
        var accessToken = await GetToken();
        var request = new HttpRequestMessage(HttpMethod.Get, url);
        request.Headers.Add("Authorization", "Bearer " + accessToken);
        var response = await client.SendAsync(request);
        response.EnsureSuccessStatusCode();

        // Upload the raw response bytes as a blob, overwriting any existing file.
        var blobContainerClient = new BlobContainerClient(connectionString, container);
        var responseContent = await response.Content.ReadAsByteArrayAsync();
        Stream responseContentStream = new MemoryStream(responseContent);
        // Blob paths use forward slashes, so replace any backslashes from Path.Combine.
        string filePath = Path.Combine(folder, surveyName + ".xlsx").Replace("\\", "/");
        var blobClient = blobContainerClient.GetBlobClient(filePath);
        await blobClient.UploadAsync(responseContentStream, overwrite: true);

    Published the Azure Function and called it from the Synapse pipeline using the Web activity.
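
    For readers who would rather stay in Python, the same download-then-upload flow can be sketched roughly as below. This is not the OP's code: it assumes the azure-storage-blob package is installed and that the environment can reach both the API and the storage account. The function names (build_blob_path, download_bytes, make_container_client, save_bytes) are hypothetical helpers made up for this illustration; url, access_token, connection_string, container, folder and survey_name stand in for the values used in the C# version.

```python
import posixpath
import urllib.request


def build_blob_path(folder: str, survey_name: str) -> str:
    """Build the blob path with forward slashes, as blob storage expects."""
    return posixpath.join(folder.replace("\\", "/"), survey_name + ".xlsx")


def download_bytes(url: str, access_token: str) -> bytes:
    """GET the URL with a bearer token and return the raw response body.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, which plays
    the role of EnsureSuccessStatusCode in the C# version.
    """
    request = urllib.request.Request(
        url, headers={"Authorization": "Bearer " + access_token}
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


def make_container_client(connection_string: str, container: str):
    """Create a container client (assumption: azure-storage-blob is installed)."""
    from azure.storage.blob import ContainerClient  # imported lazily on purpose

    return ContainerClient.from_connection_string(
        connection_string, container_name=container
    )


def save_bytes(container_client, blob_path: str, content: bytes) -> None:
    """Upload the bytes unchanged, replacing any existing blob at that path."""
    blob_client = container_client.get_blob_client(blob_path)
    blob_client.upload_blob(content, overwrite=True)
```

    A call sequence would then look like content = download_bytes(url, access_token), followed by save_bytes(make_container_client(connection_string, container), build_blob_path(folder, survey_name), content). Because the bytes are uploaded as-is rather than passed through a DataFrame writer, the .xlsx content keeps its original format.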

