Data flow: toBase64 adds break lines after 76th character in specific ADF location

Andrii Danyl'tsiv 0 Reputation points
2024-04-26T08:01:34.9833333+00:00

Hi! We recently started getting unexpected output from our ADF pipelines, with no prior changes to the source code. The toBase64 function in data flow activities suddenly began to add line breaks (\r\n) after the 76th character of its output values. While investigating, we suspected this could be something specific to the underlying virtual machine or SDK, so we experimented with the ADF location, and that turned out to matter.

If the ADF location is West Europe, all is good and no line breaks are added: [image]

If we change it to UK West, line breaks start to appear: [image]

I am attaching the ARM template of the ADF. It contains the pipeline that uses the data flow that converts the input into Base64. The payload is simple JSON with a single example property, "ContactEmail". You will also need a Storage Account with a container "testbase64" and two blobs: "files/test1", which serves as the input, and "output/test1output", which serves as the output.

Steps to reproduce:

  1. Create the Storage Account
     1.1) Create container "testbase64"
     1.2) Create blobs "files/test1", "output/test1output" in that container
  2. Create 2 ADFs:
     2.1) Location = West Europe
     2.2) Location = UK West
  3. Import ARM template into created ADFs
  4. Put a long enough value into "test1" file so that it exceeds 76 chars in output, like: {"ContactEmail":"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"}
  5. Put empty "output/test1output" blob
  6. Run pipelines and compare results
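
To make the comparison in step 6 less error-prone than eyeballing the blobs, here is a minimal Python sketch that reports whether an output value contains CRLF and how long each line is. The function name `describe_line_breaks` and the two sample strings are hypothetical stand-ins, not actual pipeline output.

```python
def describe_line_breaks(encoded: str) -> dict:
    """Summarise the line breaks found in a Base64 output string."""
    return {
        "has_crlf": "\r\n" in encoded,
        "line_lengths": [len(line) for line in encoded.splitlines()],
    }


# Hypothetical stand-ins for the two output blobs (assumption: same payload,
# one unwrapped and one wrapped with CRLF after every 76 characters).
west_europe_output = "QUJD" * 30  # 120 Base64 characters on one line
uk_west_output = "\r\n".join(
    west_europe_output[i:i + 76] for i in range(0, len(west_europe_output), 76)
)

print(describe_line_breaks(west_europe_output))
print(describe_line_breaks(uk_west_output))
```

If the wrapped output shows a first line of exactly 76 characters, that matches the behaviour described above.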

As a solution, we are considering redeploying the ADF to a different location. If you have other options in mind, we'd appreciate hearing them.
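
One other option, if a downstream consumer of the output is under our control, might be to remove the line breaks there instead of redeploying. The Python sketch below is only an illustration under that assumption; `unwrap_base64` is a hypothetical helper name, and the wrapped value is simulated locally rather than taken from ADF.

```python
import base64


def unwrap_base64(encoded: str) -> str:
    """Remove CR and LF characters so the value is a single Base64 line."""
    return encoded.replace("\r", "").replace("\n", "")


# Simulated input: a payload shaped like the one in the repro steps.
payload = b'{"ContactEmail":"' + b"a" * 100 + b'"}'
single_line = base64.b64encode(payload).decode("ascii")

# Simulated UK West style output: CRLF after every 76 characters (assumption).
wrapped = "\r\n".join(
    single_line[i:i + 76] for i in range(0, len(single_line), 76)
)

cleaned = unwrap_base64(wrapped)
print(cleaned == single_line)
# validate=True rejects any leftover non-alphabet characters such as CR/LF.
print(base64.b64decode(cleaned, validate=True) == payload)
```

Removing the breaks does not change the decoded payload, since CR and LF are not part of the Base64 alphabet.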

Thanks in advance!

Arm template attached: ArmTemplate_0.txt

P.S: I filed an issue in the repo: https://github.com/Azure/Azure-DataFactory/issues/661 but I don't see any activity in other issues within the repo

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. Amira Bedhiafi 15,676 Reputation points
    2024-04-27T13:12:26.2933333+00:00

    The problem is that Base64 encoders following the MIME convention add the newline in order to be compatible with older systems that have a maximum line width.

    It won't happen only at the end of the encoding but rather after every 76 characters of the encoded output. Many old programs couldn't handle reading long lines, so a newline character was inserted after a fixed number of characters. In computer science jargon this is called text wrapping. I guess that the number 76 comes from the good practice of keeping code lines to at most 80 characters, with 2 characters per side as a margin (although I'm not sure why exactly two). This choice of 76 characters (or columns) comes from the standard in RFC 2045 (page 19, paragraph 5) and is also the default wrap width of the Linux base64 command.
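
The two behaviours can be reproduced outside ADF. As a small sketch using only Python's standard library, `base64.b64encode` returns one unbroken line, while `base64.encodebytes` follows the RFC 2045 (MIME) convention and inserts a newline after every 76 output characters. Python uses "\n" here rather than the "\r\n" seen in the question, and this says nothing about which implementation ADF uses internally.

```python
import base64

data = b"a" * 100  # arbitrary sample input, long enough to exceed one line

unwrapped = base64.b64encode(data)       # single line, no line breaks
mime_wrapped = base64.encodebytes(data)  # MIME style: "\n" after every 76 chars

line_lengths = [len(line) for line in mime_wrapped.splitlines()]
print(line_lengths)        # [76, 60]
print(b"\n" in unwrapped)  # False

# Both forms carry the same data; the default decoder skips the newlines.
print(base64.b64decode(mime_wrapped) == base64.b64decode(unwrapped) == data)
```
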

    More links:

    https://mathematica.stackexchange.com/questions/100509/why-base64-ends-with-a-newline-n

    https://superuser.com/questions/1225134/why-does-the-base64-of-a-string-contain-n