question

RyanAbbey-0701 avatar image
0 Votes"
RyanAbbey-0701 asked RyanAbbey-0701 commented

Copy Activity - mapping translator

According to copy-activity-schema-and-type-mapping, it is possible to define the schema for the copy activity. We have been using this process to alter column names in the sink which has been working well. However, we've noticed that all columns are being written to the sink (parquet) as string types, this is causing problems in the subsequent process when trying to write the "string" to a numeric field. To counter that, we've attempted to add the "type" to our mapping definition which caused the failure "Could not load file or assembly '10)' ", the likelihood of the obscure "10), is likely from our type definition being "decimal(18,10)" e.g.

          "source": {
             "name": "EXCHANGE_RATE_DIVISOR"
         },
         "sink": {
             "name": "EXCHANGE_RATE_DIVISOR",
             "type": "decimal(18,10)"
         }

  1. Is this the right way to alter the type?

  2. Should it be on the source or the sink?

  3. What are the correct type definitions?




azure-data-factory
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

MartinJaffer-MSFT avatar image
0 Votes"
MartinJaffer-MSFT answered RyanAbbey-0701 commented

Hello @RyanAbbey-0701 and welcome to Microsoft Q&A.

I pulled up some data and made a new dataset for parquet sink. Then in the mapping I tried several datatypes, including decimal. After it ran successfully, I opened up the code to see how decimal was mapped for parquet. Below is an excerpt.

 {
                                 "source": {
                                     "name": "Sep-20",
                                     "type": "Decimal",
                                     "physicalType": "String"
                                 },
                                 "sink": {
                                     "name": "Sep",
                                     "physicalType": "UTF8"
                                 }
                             },

It appears you do not need to specify the precision of the decimal.

· 3
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Did you do that in Data Mapping Flow or Copy Activity? You effectively did what I did (I did in Copy Activity) to figure out where I was going wrong and it never gave me "physicalType" in my JSON definition, any idea what the difference between "type" and "physicalType" is? I had to manually change the JSON to give myself a "type" within the sink definition but it worked and gave me enough to figure out my dynamic requirements.

I'll accept this answer just to give others a base point to work from

0 Votes 0 ·

Thanks for marking as answer @RyanAbbey-0701 . Did I provide something you had not yet figured out?

I did it in Copy Activity. I didn't see your solution until after I posted. I don't have an absolute answer on the physicalType, but I have a theorey.

Suppose you are writing to a delimted text 5, "abc", 0.234 . You might have types integer, string, and float. However a text file is a text file, everything is physically encoded as characters, whether ASCII or UTF-8 , letter or number. Instead of demanding you make everything string, the character essence is described by physicalType.

This is different if your sink is a database. Databases are strongly typed, physically storing the data in different ways. As an integer, the number 5 is stored as 0101. Floats, decimals, etc. are stored in their appropriate representations.

Now this may seem obvious, but take a moment to think about the source side. For the delimited text, whether you tell it 5 is an integer or a string makes a difference in what transformation can be applied, and how the sink might interpret them. In this way, we can do some graceful type changes.

As for how the data is sent over the wire... that depends upon the services. The delited text is changed to text. I'm not sure how SQL wire protocol works, but I expect it to be more varied.

0 Votes 0 ·

No, was about the same as what I did, just marked as answer as you gave the indication of how the JSON should look which will be helpful to anyone looking for answers in future.

I was hoping that things could be controlled better via the type/physicalType differences but what you say makes sufficient sense not to look further

0 Votes 0 ·
RyanAbbey-0701 avatar image
0 Votes"
RyanAbbey-0701 answered MartinJaffer-MSFT commented

For the most part, have this resolved

While I haven't investigated sufficiently to identify exact reason why it wasn't working, it's looking like it is because the translator is case sensitive, at best guess, if it can't interpret the definition, it ignores the definition and does a straight copy.

When I changed "decimal" to "Decimal" and "datetime" to "DateTime", the translation worked... "Decimal" translates to Decimal(38,18), still to find out if I can adjust the precision but based on this not being an option in the mapping, I suspect not

· 1
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Thank you for keeping us updated @RyanAbbey-0701 . Glad to hear you figured out a solution. I haven't tried adjusting precision before. Let us know how it goes.

0 Votes 0 ·