
JasonW-5564 asked · HarithaMaddi-MSFT commented

How to query a 3rd party Azure Data Lake (Gen2) via Shared Access Signature (SAS URL) in Azure Synapse?

I have been granted read-only access to a vendor's Azure Data Lake and given a SAS connection string. How can I query this data via Azure Synapse and store the results in my own data lake?

FYI: the vendor's data is stored as JSON files in the following folder structure: YYYY/MM/DD/HH/ (with hundreds of JSON files in each HH folder).

azure-synapse-analytics · azure-data-lake-analytics

1 Answer

HarithaMaddi-MSFT answered · HarithaMaddi-MSFT commented

Hi @JasonW-5564,

Welcome to Microsoft Q&A Platform. Thanks for posting the query.

The credential for the data lake can be defined with a SAS secret, and the documents below are tutorials that explain the loading process with examples. Wildcards can be used to select all the JSON files under a specific folder.

sql-data-warehouse-load-from-azure-data-lake-store
copy-into-transact-sql
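As a rough sketch of the credential setup those documents describe, assuming placeholder account, container, and object names (the SAS token shown is not real), a database-scoped credential can hold the vendor's SAS secret and an external data source can point at their container:

```sql
-- Sketch only: all names and the SAS token below are placeholders.
-- A master key must exist before a database-scoped credential can be created.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Str0ng!Passw0rd-Example';

-- Store the vendor's SAS token (without the leading '?') as the secret.
CREATE DATABASE SCOPED CREDENTIAL VendorSasCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET   = 'sv=2020-08-04&ss=b&srt=co&sp=rl&sig=...';

-- Point an external data source at the vendor's Data Lake Gen2 container.
CREATE EXTERNAL DATA SOURCE VendorDataLake
WITH (
    LOCATION   = 'https://vendoraccount.dfs.core.windows.net/vendorcontainer',
    CREDENTIAL = VendorSasCredential
);
```

Note that the exact options differ slightly between dedicated and serverless SQL pools (for example, dedicated pools may require `TYPE = HADOOP` on the data source), so treat this as an outline to adapt from the linked documents rather than a copy-paste recipe.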

Please reach out with further queries and we will be glad to assist.




Thank you for the reply; let me clarify what I am trying to do a bit more. I want to query and aggregate the raw JSON files stored in a 3rd party data lake and store those aggregates in my own data lake or relational DB. I do not want to physically copy all of those raw JSON files, because of the data volume and storage cost, or introduce any unnecessary latency. I am looking for how to do that and what the best tool set is for this.


Hi @JasonW-5564,

Please let us know if the approach posted in the other thread is useful for implementing this scenario.

Looking forward to your response!



Thanks @HarithaMaddi-MSFT, but neither of those seems to support the JSON format? They seem to only support CSV, Parquet, and ORC (which I am not familiar with). Is there a way to do this?


Hi @JasonW-5564,

Please refer to this document, which explains with examples how to read JSON files in Synapse by treating them as CSV files.

Kindly let us know if you have further queries and we will be glad to assist.
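The trick in that document, sketched here with a placeholder storage URL and made-up JSON property names, is to read each JSON document as a single CSV column (using a field quote/terminator that never appears in the data) and then parse it with OPENJSON:

```sql
-- Sketch only: the storage URL and JSON property names are placeholders.
-- Each line-delimited JSON document lands in one NVARCHAR(MAX) column,
-- which OPENJSON then shreds into typed columns.
SELECT j.deviceId, j.reading
FROM OPENROWSET(
        BULK 'https://vendoraccount.dfs.core.windows.net/vendorcontainer/2020/12/01/00/sample.json',
        FORMAT = 'CSV',
        FIELDQUOTE = '0x0b',      -- vertical tab: assumed absent from the data
        FIELDTERMINATOR = '0x0b'  -- so the whole document stays in one field
    )
    WITH (jsonContent NVARCHAR(MAX)) AS rows
CROSS APPLY OPENJSON(jsonContent)
    WITH (deviceId VARCHAR(50) '$.deviceId',
          reading  FLOAT       '$.reading') AS j;
```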



Thanks @HarithaMaddi-MSFT, that is VERY close to what I need. In those examples, though, they point to one specific .json file. I want to query across all .json files in a folder. For instance, I want to use a path like this:

bulk 'latest/yyyy/mm/dd/HH/*.json',

Is this possible? If so, is Synapse the only product that supports this, or can I do this in Azure SQL as well?




Thanks @JasonW-5564 for confirming that this matches the requirement. Yes, wildcard expressions using * can be used to load all .json files in a folder. For loading JSON files into Azure SQL, please refer to this document.

Please let us know if you have further queries and we will be glad to assist.
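Putting the pieces together for the original goal (aggregate in place, store only the results), here is a hedged sketch, with placeholder names throughout: a wildcard BULK path reads every .json file under a day's folders, and CETAS (CREATE EXTERNAL TABLE AS SELECT) writes only the aggregate to one's own data lake, so the raw vendor files are never copied:

```sql
-- Sketch only: all names, paths, and JSON properties are placeholders.
-- MyOwnDataLake and ParquetFormat are assumed to be a pre-created external
-- data source (for your own lake) and external file format.
CREATE EXTERNAL TABLE dbo.DailyAggregates
WITH (
    LOCATION    = 'aggregates/2020/12/01/',  -- output folder in your lake
    DATA_SOURCE = MyOwnDataLake,
    FILE_FORMAT = ParquetFormat
)
AS
SELECT j.deviceId,
       AVG(j.reading) AS avgReading,
       COUNT(*)       AS readingCount
FROM OPENROWSET(
        BULK 'https://vendoraccount.dfs.core.windows.net/vendorcontainer/2020/12/01/*/*.json',
        FORMAT = 'CSV',
        FIELDQUOTE = '0x0b',
        FIELDTERMINATOR = '0x0b'
    )
    WITH (jsonContent NVARCHAR(MAX)) AS rows
CROSS APPLY OPENJSON(jsonContent)
    WITH (deviceId VARCHAR(50) '$.deviceId',
          reading  FLOAT       '$.reading') AS j
GROUP BY j.deviceId;
```

The `*/*.json` wildcard covers each HH folder under the chosen day; only the grouped result set is materialized, which addresses the data-volume and storage-cost concerns raised above.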


Hi @JasonW-5564,

We have not received a response from you. Please let us know if the approach suggested above is helpful. Otherwise, we will continue to engage with you on the query.

Please consider clicking "Accept Answer" and "Up-vote" on the post that helps you, as it can be beneficial to other community members.
