How to store an array in a json file in ADF
I have a web activity with url drive/drive-id/items/folder-id/children then I have a filter activity that filters all folders and then I'm appending it in an array. Now what I want is to store all folder ids in a json file. and later I will load that file using lookup activity and will iterate those folders and will download the files inside it. Please give me an approach. And all of this will be done in a loop so need to create new file every time
Azure Data Factory
-
phemanth 6,885 Reputation points • Microsoft Vendor
2024-05-17T06:05:55.45+00:00 Welcome to the Microsoft Q&A platform and thank you for posting your question here.
Here's an approach to achieve your goal of storing folder IDs in a JSON file, iterating through them, and downloading files within each folder using Azure Data Factory (ADF):
1. Define Variables:
Create three variables:
-
folderId
: This will store the current folder ID being processed. -
folderArray
: This will be an array variable to hold all the folder IDs. -
jsonString
: This will store the final JSON string containing the folder IDs.
2. Loop through Folders:
- Use a For Each activity to iterate through the output of your web activity.
- Inside the loop, access the current folder ID using the dynamic content expression
@item('folderId')
. - Use a Conditional Split activity to filter only folders. You can achieve this by adding a condition like
@item('isFolder') == true
.
3. Capture Folder IDs:
Inside the success branch of the Conditional Split activity, add a Set Variable activity.
- Set the variable name to
folderId
and its value to the current folder ID (@item('id')
). - Now, in the same Set Variable activity, append the current folder ID to the
folderArray
variable. You can use the expressionconcat(variables('folderArray'), @item('id'))
for this. Make sure to handle the initial case wherefolderArray
is empty (use an if-else condition or a ternary operator).
4. Convert Array to JSON:
After the loop, add another Set Variable activity.
- Set the variable name to
jsonString
and its value to the string representation of thefolderArray
. - Use the function
json(variables('folderArray'))
to convert the array to a JSON string.
5. Write JSON to File:
- Use a Data Factory Expression activity to create the desired JSON structure. This structure can include the folder IDs as an array.
- Use a Web Activity (or another suitable activity depending on your storage) to write the content of the expression activity (containing the JSON structure) to a file. You can use the
variables('jsonString')
within the expression to include the folder IDs.
6. Download Files (Separate Loop):
- Create another loop (For Each activity) to iterate through the
jsonString
. - Inside the loop, parse the JSON string using a Data Factory Expression activity with the
json(variables('jsonString'))
function. - Access the folder IDs array within the parsed JSON.
- Use another Web Activity (or suitable activity) to download files within each folder ID. You can use dynamic content from the parsed JSON to construct the download URL for each file.
Hope this helps. Do let us know if you any further queries.
-
-
Mansi Yadav 40 Reputation points
2024-05-17T08:46:20.02+00:00 Sorry It's not clear. My array is ready with the folder ids now what are the further steps to store it in a json file?
-
phemanth 6,885 Reputation points • Microsoft Vendor
2024-05-20T06:34:41.2866667+00:00 @Mansi Yadav Since you already have the folder IDs stored in an array variable (
folderArray
) within your Azure Data Factory pipeline, here are the steps to store it in a JSON file:- Convert Array to JSON String:
- Use a Set Variable activity to create a new variable named
jsonString
. - Set the value of
jsonString
to the JSON representation of thefolderArray
. - Use the expression
json(variables('folderArray'))
to achieve this conversion.
- Use a Set Variable activity to create a new variable named
- Create JSON Structure (Optional):
- If you want additional information besides folder IDs in your JSON file, use a Data Factory Expression activity.
- Within the expression, create the desired JSON structure using functions like
{}
for objects and[]
for arrays. - Include the
variables('folderArray')
within the structure to embed the folder IDs array.
- Write JSON to File:
Use a Web Activity (or another suitable activity depending on your storage) to write the JSON content to a file.
In the web activity configuration:
- Set the HTTP method to POST (or PUT if you want to overwrite existing files).
- Set the URL to the location where you want to store the JSON file (e.g., Azure Blob Storage endpoint).
- In the body of the request, use the output from the previous activity (either
variables('jsonString')
directly if it contains the complete JSON structure, or the output of the Data Factory Expression activity).
Here's an example expression for the Data Factory Expression activity if you want a simple JSON structure with just the folder IDs array:
Code snippet
{ "folderIds": json(variables('folderArray')) }
This expression creates a JSON object with a key named "folderIds" containing the folder IDs array.
Choose the appropriate activity based on your storage destination:
- Azure Blob Storage: Use a Web Activity.
- Azure Data Lake Storage (ADLS Gen1 or Gen2): Use a Data Lake Storage Gen1/Gen2 activity.
- Other Storage Locations: Consult ADF documentation for supported activities.
Configure the chosen activity:
a. HTTP Method: Set to POST (or PUT if you want to overwrite existing files).b. URL: Set to the location where you want to store the JSON file (e.g., Azure Blob Storage endpoint).c. Body:
- If the
jsonString
variable holds the complete JSON structure, usevariables('jsonString')
directly. - If you used a Data Factory Expression activity, use its output.
Example Web Activity Configuration (Azure Blob Storage):
- HTTP Method: POST
- URL:
https://<your-storage-account>.blob.core.windows.net/<container-name>/<filename>.json
(replace placeholders with your details) - Headers: Set appropriate headers for authentication (e.g., Authorization: Bearer <your-access-token>)
- Body:
@{variables('jsonString')}
(assumingjsonString
contains the complete JSON structure) - Choose the appropriate activity based on your storage destination:
- Azure Blob Storage: Use a Web Activity.
- Azure Data Lake Storage (ADLS Gen1 or Gen2): Use a Data Lake Storage Gen1/Gen2 activity.
- Other Storage Locations: Consult ADF documentation for supported activities.
Remember to configure your web activity with appropriate authentication and authorization settings based on your storage location.
By following these steps, you'll successfully convert your folder ID array to a JSON string and store it in a designated file within your chosen storage platform.
- Convert Array to JSON String:
-
Mansi Yadav 40 Reputation points
2024-05-20T07:10:09.3933333+00:00 For azure data lake storage, can't we use web activity? And you have mentioned :- Azure Data Lake Storage (ADLS Gen1 or Gen2): Use a Data Lake Storage Gen1/Gen2 activity. Which activity I can use for ADLS. Let's say I have a adls container hakuna where I have a folder logs now I want to store that json file inside logs folder in my adls container. How can I achieve this
-
phemanth 6,885 Reputation points • Microsoft Vendor
2024-05-21T10:06:38.9033333+00:00 @Mansi Yadav Yes, you can use a
Web Activity
for Azure Data Lake Storage (ADLS), but it might require additional setup for authentication and might not be as straightforward as using the dedicated activities for ADLS.So Here's the approach using Data Lake Storage Gen1/Gen2 activity for your specific scenario:
Steps:
Data Lake Storage Gen1/Gen2 Activity:
Drag and drop this activity into your ADF pipeline after processing your folder IDs.
Configure the activity:
- Name: Give it a descriptive name (e.g., WriteFolderIdsToADLS).
- Connection: Select your existing ADLS Gen1 or Gen2 linked service.
- Operation: Choose "Append" (if you want to add the JSON file to the existing ones in the logs folder) or "Create" (if you want to overwrite any existing file).
- File path: Set the path to the desired location within the logs folder. Here's the format:
Replace placeholders with your details (e.g., adl://contosoadls.azuredatalake.net/hakuna/logs/folderIds.json).adl://<your-adls-account-name>.azuredatalake.net/<container-name>/logs/<filename>.json
- Data:
- If your
jsonString
variable holds the complete JSON structure, use@variables('jsonString')
directly. - If you used a Data Factory Expression activity to create the JSON structure, use its output here.
- If your
Trigger the Activity:
- Connect the previous activity (where you prepared the JSON string) to the Data Lake Storage Gen1/Gen2 activity using an arrow to initiate the write operation when the pipeline reaches that point.
Example Configuration:
Assuming
jsonString
holds the complete JSON structure:- Activity Name: WriteFolderIdsToADLS
- Connection: MyADLSGen2LinkedService (replace with your linked service name)
- Operation: Append
- File path: adl://contosoadls.azuredatalake.net/hakuna/logs/folderIds.json
- Data: @variables('jsonString')
-
phemanth 6,885 Reputation points • Microsoft Vendor
2024-05-23T04:51:05.2766667+00:00 @Mansi Yadav We haven’t heard from you on the last response and was just checking back to see if you have a resolution yet. In case if you have any resolution please do share that same with the community as it can be helpful to others. Otherwise, will respond with more details and we will try to help.
Sign in to comment