question

KumarDinesh-3444 avatar image
0 Votes"
KumarDinesh-3444 asked KumarDinesh-3444 commented

Create External Table with Multiple CSV's in Azure Blob folder

Hi All

I am trying to create a single external tables in Azure Synapse. Files location is Azure blob, format CSV.
If I connect the external table with file location then it works perfectly but I connect it just folder locations then instead of showing me data from multiple files it returns empty table. There is no error but it is not returning data.

CREATE EXTERNAL TABLE [stage].[TableName]
(
[Columns] Datatype
)
WITH (LOCATION = '/Legacy/WEBXL',
DATA_SOURCE = [BlobStorage],
FILE_FORMAT = [CsvFormatPipeSeperated]
)

azure-synapse-analytics
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

PRADEEPCHEEKATLA-MSFT avatar image
0 Votes"
PRADEEPCHEEKATLA-MSFT answered PRADEEPCHEEKATLA-MSFT commented

Hello @KumarDinesh-3444,

Welcome to the Microsoft Q&A platform.

(UPDATE): I had tested with the folder contains two csv files.

107520-image.png

Here is the sample to create External Table with Multiple CSV's in Azure Blob folder.

 -- Create a database master key if one does not already exist, using your own password. This key is used to encrypt the credential secret in next step.
     CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'S0me!nfo' ;
        
 -- Create a database scoped credential with Azure storage account key as the secret.
 CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredentialKumarDinesh
 WITH
   IDENTITY = 'SHARED ACCESS SIGNATURE',
   SECRET = 'gv7nVISeXXXXXXXXXXXXXXXXXXXXXXdlOiA==' ;
    
 -- Create an external data source with CREDENTIAL option.
 CREATE EXTERNAL DATA SOURCE MyAzureStorageKumarDinesh
 WITH
   ( LOCATION = 'wasbs://azure@chepra.blob.core.windows.net/' ,
     CREDENTIAL = AzureStorageCredentialKumarDinesh,
     TYPE = HADOOP
   ) ;
    
 --FILE FORMAT
   CREATE EXTERNAL FILE FORMAT KumarDinesh
   WITH
   (
       FORMAT_TYPE = DELIMITEDTEXT,
       FORMAT_OPTIONS (FIELD_TERMINATOR=',',
       FIRST_ROW = 2)
   );
    
   -- EXXTERNAL TABLE
    
   CREATE EXTERNAL TABLE KumarDinesh
   (
       ID INT,
       Name VARCHAR(12),
       Age INT
   )
   WITH (
       LOCATION = '/csvfiles/',
       DATA_SOURCE = MyAzureStorageKumarDinesh,
       FILE_FORMAT = KumarDinesh 
   )
    
   SELECT * from KumarDinesh

107568-image.png

Proof of completion:

107681-synapse-multicsv.gif


You can specify the pattern that the files must satisfy in order to be referenced by the external table. The pattern is required only for Parquet and CSV tables. If you are using Delta Lake format, you need to specify just a root folder, and the external table will automatically find the pattern.

You can create external tables that read data from a set of files placed on Azure storage:

 CREATE EXTERNAL TABLE Taxi (
      vendor_id VARCHAR(100) COLLATE Latin1_General_BIN2, 
      pickup_datetime DATETIME2, 
      dropoff_datetime DATETIME2,
      passenger_count INT,
      trip_distance FLOAT,
      fare_amount FLOAT,
      tip_amount FLOAT,
      tolls_amount FLOAT,
      total_amount FLOAT
 ) WITH (
          LOCATION = 'yellow/puYear=*/puMonth=*/*.parquet',
          DATA_SOURCE = nyctlc,
          FILE_FORMAT = ParquetFormat
 );

For more details, refer to Azure Synapse Analytics - External table on a set of files.

Hope this helps. Do let us know if you any further queries.


  • Please accept an answer if correct. Original posters help the community find answers faster by identifying the correct answer. Here is how.

  • Want a reminder to come back and check responses? Here is how to subscribe to a notification.



image.png (17.1 KiB)
image.png (96.1 KiB)
image.png (87.3 KiB)
synapse-multicsv.gif (1001.0 KiB)
· 6
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

@PRADEEPCHEEKATLA-MSFT Location ".csv" doesn't work (.csv not found error). If I used just the folder location like delta lake then it works but returns empty tables.
I don't have any pattern, all the files with same name (just year difference) as csv stored in one folder. so I tried "/MainFolderName/Foldername/.csv" as location. And it throws ".csv" not found error. If I specify the filename.csv from same location, it works fine.

Note I am using Data Source Type Hadoop (no other type works) with abfss blob location connectivity

0 Votes 0 ·

Hello @KumarDinesh-3444,

You can checkout the updated answer with the repro steps to create External Table with Multiple CSV's in Azure Blob folder.

Hope this helps.

0 Votes 0 ·

@PRADEEPCHEEKATLA-MSFT That's very strange , in your code you are using File_format =emp although you created file format as Azure.
I tried your code as well by passing Azure and Emp both as file format. emp returns error and Azure returns empty table

0 Votes 0 ·

Hello @KumarDinesh-3444,

I'm able to successfully run the query because the File_format =emp, azure has same file_format.

Now I was able to update the answer with the latest changes and added the proof of completion to confirm the above solution works to create external table with multiple csv files.


0 Votes 0 ·

@PRADEEPCHEEKATLA-MSFT Thanks for putting these efforts. Appreciate.
I have to use Managed service identity connection for Blob storage instead of shared key access.

CREATE EXTERNAL DATA SOURCE [AzureBlobStorage] WITH
(TYPE = HADOOP,
LOCATION = N'abfss://wre@XXXXXXXXXXX.blob.core.windows.net',
CREDENTIAL = [AzureStorageCredential])

Do you think that could be the issue. Everything else is exactly same. I tried your files as well and executed your script, the results are still same, returning empty table instead of data from Emp and Emp1.csv. But works fine if I put LOCATION = '/csvfiles/Emp.csv'

0 Votes 0 ·

Hello @KumarDinesh-3444,

Could you please share the script which you are running along with screenshot of the error message?

0 Votes 0 ·
KumarDinesh-3444 avatar image
0 Votes"
KumarDinesh-3444 answered KumarDinesh-3444 commented

@PRADEEPCHEEKATLA-MSFT : Hi Pradeep I am not getting any error just the table is returning 0 row. Below is the script I am using, both specifying the csv and just the folder location. Screenshot of output for both the external tables

CREATE EXTERNAL FILE FORMAT Emp
WITH
(
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS (FIELD_TERMINATOR=',',
FIRST_ROW = 2)
);

    -- EXXTERNAL TABLE
    --Connecting to folder
    CREATE EXTERNAL TABLE azure
    (
        ID INT,
        Name VARCHAR(12),
        Age INT
    )
    WITH (
        LOCATION = '/TestDK/' ,
        DATA_SOURCE = [IrrWreeBlobStorage],
        FILE_FORMAT = Emp    
    )
        
    --------------------------------------------
    --Connecting to file
    CREATE EXTERNAL TABLE csvazure
    (
        ID INT,
        Name VARCHAR(12),
        Age INT
    )
    WITH (
        LOCATION = '/TestDK/Emp.csv' ,
        DATA_SOURCE = [IrrWreeBlobStorage],
        FILE_FORMAT = Emp    
    )
        
    SELECT * from azure
    SELECT * from csvazure![108101-output.png][1]


[1]: /answers/storage/attachments/108101-output.png


output.png (26.4 KiB)
· 7
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Hi Pradeep

Thanks for your help. I finally able to solve the problem with SSMS.

The only issue was using a dfs rather than blob in Managed service identity in location while setting up Data source

New LOCATION = 'abfss://Container@StorageAccount.dfs.core.windows.net'
Old LOCATION = 'abfss://Container@StorageAccount.Blob.core.windows.net'

1 Vote 1 ·

Hello @KumarDinesh-3444,

Glad to know that your issue has resolved. Now you can accept it as answer( by clicking on the 113724-image.png). This can be beneficial to other community members. Thank you.

0 Votes 0 ·

Hello @KumarDinesh-3444,

Thanks for the additional details.

Just to confirm, could you please run the same script from Synapse studio and see if the same behaviour exists?

0 Votes 0 ·

@PRADEEPCHEEKATLA-MSFT I don't have Synapse studio setup. I tried to setup one from last few days but couldn't due to permissions to the subscription. Is it possible for you to test your code on SSMS

0 Votes 0 ·

Hello @KumarDinesh-3444,

I had tested on above code on SSMS, I had received a error message stating the external data sources is not found.

That's the reason, I had asked you to try in Synapse studio. Please do try synapse studio to get the expected results as shown in the above answer.

0 Votes 0 ·

@PRADEEPCHEEKATLA-MSFT
Hi Pradeep
I tried with Data studio and Query editor as well , its the same issue that I am not getting any records from the csv files.

May I ask , if you are using Gen1 or gen 2 data lake

As I am using Gen2 , Managed service Identity access and using abfss drivers instead of wasbs

Do you think that could be an issue?

0 Votes 0 ·

Hello @KumarDinesh-3444,

I'm able to create external tables using both Azure Blob Storage, ADLS Gen1 & Gen2.

For more details, refer to Use external tables with Synapse SQL.

0 Votes 0 ·