Value mismatch using copy data activity in ADF

Ekambaram 1 Reputation point
2022-06-29T17:55:05.613+00:00

I am using copy data activity to load an xlsx file to a sql server table(columns with nvarchar datatype).

I find that for some random values there is a difference in the value for 14th decimal place(I have the value upto 14 decimal places and the last digit varies- there is no digit after the varying digit for rounding off) For example: xlsx source has 1.12345678901234 and sink has 1.12345678901235

This is not happening for all values but some, however consistently happening for the same value in any number of reloads.

I tried to convert to csv file using copy data activity, but still with the same output(1.12345678901235).

Could anyone advise on why its the case and if there are any ways to overcome it.

Thanks in advance.

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
9,478 questions
{count} votes

1 answer

Sort by: Most helpful
  1. MartinJaffer-MSFT 26,011 Reputation points
    2022-06-30T21:51:40.85+00:00

    Hello @Ekambaram ,
    Thanks for the question and using MS Q&A platform.

    It sounds like you are running into Floating-point errors. This is something inherent to the way computer store fractional values. It happens when the output of a calculation cannot be expressed or stored precisely. Wikipedia entry for more info -- see the section "Accuracy problems". So, here's what I think is happening:

    Excel stores (at rest) data as text, like SQL char, and unlike SQL int. However, the excel tells anything reading the file what data type it is intended as -- in this case a float or double or decimal or the like. So when it came time to read, Data Factory converted the number-as-text to interim type float or double. At this point, the value didn't have an exact representation, so it did the best it could. Then, it got rounded and converted back to text when put on SQL.

    To avoid this we need to tell Data Factory that column is text, not float. I think we can do this by editing the Excel Dataset's Schema. Change that column to String data type.

    Also see:
    https://en.wikipedia.org/wiki/Floating-point_error_mitigation

    Please do let me if you have any queries.

    Thanks
    Martin


    • Please don't forget to click on 130616-image.png or upvote 130671-image.png button whenever the information provided helps you. Original posters help the community find answers faster by identifying the correct answer. Here is how
    • Want a reminder to come back and check responses? Here is how to subscribe to a notification
      • If you are interested in joining the VM program and help shape the future of Q&A: Here is how you can be part of Q&A Volunteer Moderators