question

TinasheChinyati-4059 avatar image
0 Votes"
TinasheChinyati-4059 asked RajneeshKumar-1044 published

extract substring from a filename

Hi Guys, I am new in ADF and would like to extract a date time string in my filename to use as a date hierarchy in my sink to ADLS gen2. for example I have a filename as "A_ODP_20200914_094647_00.CSV" and I want the 20200914 which I will partition as Year=2020 folder, Month=09 and Day=14 dynamically as the files come in. Can anyone assist with how I can go about it. Thank you

azure-data-lake-storageazure-files
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

HimanshuSinha-MSFT avatar image
1 Vote"
HimanshuSinha-MSFT answered TinasheChinyati-4059 commented

Hello @TinasheChinyati-4059 ,

Thanks for the ask and using the forum . The below expression will work

 @concat(substring(split(variables('Filename'),'_')[2],0,4),'/',substring(split(variables('Filename'),'_')[2],4,2),'/',substring(split(variables('Filename'),'_')[2],6,2))

Just a quick inside as to what i am doing here .

  1. The date is assigned to the variable Filename , i did used A_ODP_20200914_094647_00.CSV to test .

  2. Split the name with "_" , and since it return an array , i have the data at index 2 .

  3. Use substring to get the year , month and day .

  4. I thought you may be need "\" as you talked about partitioning so used concat function .

The below animation should help .

42833-yeardaymonthissue.gif

Thanks
Himanshu
Please do consider to click on "Accept Answer" and "Up-vote" on the post that helps you, as it can be beneficial to other community members



yeardaymonthissue.gif (213.2 KiB)
· 2
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Hi Himanshu

Thank you for your quick response. So just to spell out my entire problem, please guide me on the most effective way to do it. I am planning to copy multiple files from an on-prem storage to ADLS gen2. I used the last modified date to copy new files but because the last modified date changes when we sink to the ADLS gen2, i want to use that date time in my filename so we sink files with that stamp in the respective folders.
for example A_ODP_20200914_094647_00.CSV will be copied and sinked in the Year=2020, Month=09 and day=14 despite it being modified

A_ODP_20201126_094647_00.CSV will be sinked in Year=2020, Month=11 and Day=26 etc.
How do i integrate the variables into my copy pipeline so its done dynamically with every file that comes in? Thank you again

0 Votes 0 ·

The set variable activity should be able to dynamically get filename

0 Votes 0 ·
AyushChauhan-3425 avatar image
0 Votes"
AyushChauhan-3425 answered RajneeshKumar-1044 published

@HimanshuSinha-MSFT I do have the same issue can you assist me, how do I split the file name ' Amazon US Daily Sales Diagnostic_2021-05-10.xlsx'

Currently I'm using - @concat(split(variables('FileName'),'_')[1])

· 1
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Hi @AyushChauhan-3425
You can try this :- @split(variables('FileName'),'_')[1]

0 Votes 0 ·