question

PalashAich-7056 avatar image
0 Votes"
PalashAich-7056 asked ErlandSommarskog answered

Data Type and Format In External table PolyBase

Hello there,

I am trying to query Azure Storage Account CDM csv file from SQL server on-prem. I created external table with all the data type as varchar. However, I need to display date time as datetime and need to remove double quote from string while displaying result. What should be the file format should I choose for the same. Please refer screenshot with current field values.

The file format used is below.
CREATE EXTERNAL FILE FORMAT [TextFile]
WITH
(FORMAT_TYPE = DELIMITEDTEXT, FORMAT_OPTIONS (FIELD_TERMINATOR = N',', USE_TYPE_DEFAULT = False))

I am aware there is DATE_FORMAT option, however, I tried with few and it is failing with varchar to datetime conversion error.

Current records that need to modify with right format.
85402-polybasefileformat.jpg


sql-server-generalazure-storage-accountsazure-blob-storageazure-filesazure-hdinsight
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

ErlandSommarskog avatar image
0 Votes"
ErlandSommarskog answered

You posted another question, which included a longer sample of the data. From that I could tell that there were more datetime values in that file - but the other fields had a different format than the two above in the screenshot. Since DATE_FORMAT is on table level, it seems that you lose here.

And in any case, the data source may be trusted to always have dates in the same format, or have correct dates.

I would recommend that you have the columns as varchar in the table. Then create a view on the top of table, where you do a try_convert to datetime2(0) with the appropriate format codes. (You find these these in the topic for CAST and CONVERT in Books Online.) You may also want to expose a raw character column in that view, so that you can easily analyse dates that do not convert.


5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

ErlandSommarskog avatar image
1 Vote"
ErlandSommarskog answered

Looking that the topic for CREATE EXTERNAL FILE FORMAT, I see that there is a STRING_DELIMITER option. I have never played with external file formats, but I would try this.

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

PalashAich-7056 avatar image
0 Votes"
PalashAich-7056 answered PalashAich-7056 commented

Hi @ErlandSommarskog

Thanks for your response. I tried with STRING_DELIMITER. It removed extra double quote, however, if I make the data type of a column to date time, query fails with conversion error. Any suggestions on this. If the field is varchar, it works.

The below format I am using.
CREATE EXTERNAL FILE FORMAT [CustomFormat]
WITH
(
FORMAT_TYPE = DELIMITEDTEXT
, FORMAT_OPTIONS
(
FIELD_TERMINATOR = N','
, STRING_DELIMITER = N'"'
, DATE_FORMAT = N'yyyy-MM-dd HH:mm'
, USE_TYPE_DEFAULT = False
)
)

· 2
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Maybe I am saying something incredibly stupid now, but in your command you have specified a format like 2021-04-18 22:58, but in your screenshot you have a format which is radically different. There are no hyphens, but slashes, The year seems to be at the end of the date string, and what is day and month, no one knows. And the time part as one as AM marker at the end.

So, my guess is that you need to specify a date format that matches the file.

Then again, dates as string are notoriously difficult. You may be better mapping it to a string column, at least in the first run. You can then use the TRY_PARSE function to see if all values obeys to the same format. (It will return NULL, if the string does not parse.)

0 Votes 0 ·

Thanks @ErlandSommarskog

The format I mentioned under DATE_FORMAT anyway does not work. It does not give me error as in my external table datatype is VARCHAR. I am able to convert VARCHAR date field using DATETIME2 when I select the result from table. It returns me correct result, however, I would like to keep the field as DATETIME in the table to ensure data consumer avoid worrying of conversion again.

When I keep the data type as DATETIME, none of the date format worked. I tried with many formats. What should be the correct format should I use under DATE_FORMAT for the data above please?

Thanks,
Palash

0 Votes 0 ·