question

sachingupta-1921 avatar image
0 Votes"
sachingupta-1921 asked ShaikMaheer-MSFT commented

Azure Data Factory merge 2 csv files with different schema

I am trying to merge the 2 csv files(in Azure data factory) which has different schema. Below is the scenario

CSV 1: 15 columns -> say 5 dimensions and 10 metrices(x1, x2,...x10) CSV 2: 15 columns -> 5 dimensions(same as above) and 10 metrices(different from above, y1, y2...y10) So my schema is different. Now I have to merge both CSV files so that only 5 dimensions comes with all 20 metrices.

I tried with Data Transformation using Select operation. That is giving me 2 rows in the merged file. One row with first 5 dimensions and 10 metrices and second row with next 5 dimensions and 10 metrices, which is incorrect as I am looking only for one row with 5 dimensions and all 20 metrics(x1,x2...x10, y1,y2...y10)

Any help is much appreciated on this issue

azure-data-factory
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

1 Answer

ShaikMaheer-MSFT avatar image
0 Votes"
ShaikMaheer-MSFT answered ShaikMaheer-MSFT commented

Hi @sachingupta-1921 ,

Thank you for posting query in Microsoft Q&A Platform.

You need to perform join using join transformation on both files and then use select transformation to select only desired rows.

Please check below detailed example implementation.

Step1: I have two csv files csv1(columns are dim1,x1,x2,x3) & csv2(columns are dim1,y1,y2,y3). I am going to generate file final with columns as dim1,x1,x2,x3,y1,y2,y3.

130136-image.png

Step2: Added both csv files as source transformations in dataflow.

130174-source.gif

Setp3: Join transformation to join both csv file columns. If you don't have unique column to join then add surrogate key transformation on both sources and then join based that key column.

130137-join.gif

Step4: Select transformation to consider only required columns.

130215-selecttrans.gif

Hope this will help. Please let us know if any further queries. Thank you.


  • Please accept an answer if correct. Original posters help the community find answers faster by identifying the correct answer. Here is how.

  • Want a reminder to come back and check responses? Here is how to subscribe to a notification.

  • If you are interested in joining the VM program and help shape the future of Q&A: Q&A Volunteer Moderators


image.png (161.6 KiB)
source.gif (291.4 KiB)
join.gif (506.4 KiB)
selecttrans.gif (421.0 KiB)
· 3
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

@ShaikMaheer-MSFT ...thanks for the prompt response. Appreciate it. The solution worked for me.

0 Votes 0 ·

@ShaikMaheer-MSFT ... I have one more question

II am calling the google analytics API and I have the refresh token which is valid for an year. I use the refresh token to generate the Access Token to authenticate the API call. Below is the scenario

1) I need to store the refresh token in Azure Key Vault -> How to do this?
2) Using a pipeline I want to retrieve the refresh token from Key vault, hit the api to generate the Access token and then store that access token also in Key Vault -> How to do this also.

I read couple of articles on Azure key vault, but did not get the clear understanding.
I appreciate any guidance on this.

Thanks,
Sachin

0 Votes 0 ·

Hi @sachingupta-1921 ,

Thank you for follow -up query.

You may store your refresh token in Azure Key Vault secret & using pipeline you may consider making API call to key vault to get secret value.

Please feel free to raise a new question on Microsoft Q&A Platform for this. Thank you.

0 Votes 0 ·