Asked by Raja-1642 · edited by SaurabhSharma-msft

Copy data from Snowflake to CosmosDB

Hi Team,
For one of my requirements I need to copy data from Snowflake to Cosmos DB. Later I want to expose the data through a REST API on top of Cosmos DB (a built-in feature), but that's for later; right now my problem is copying the data.

Since the data is simple and does not require much transformation, I thought this should be a simple thing to do with ADF. So I plan to use an ADF pipeline, and inside the pipeline a Copy Data activity.
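As a rough sketch, such a Copy activity definition would look like the following (the dataset names, view name, and sink type here are placeholders for illustration, not taken from an actual pipeline):

```json
{
  "name": "CopySnowflakeToCosmos",
  "type": "Copy",
  "inputs": [ { "referenceName": "SnowflakeSourceDataset", "type": "DatasetReference" } ],
  "outputs": [ { "referenceName": "CosmosDbSinkDataset", "type": "DatasetReference" } ],
  "typeProperties": {
    "source": { "type": "SnowflakeSource", "query": "SELECT * FROM MY_VIEW" },
    "sink": { "type": "CosmosDbSqlApiSink", "writeBehavior": "insert" }
  }
}
```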

The data in Snowflake (the source) looks like this:

110295-image.png

And the data in Cosmos DB should look like this:

{
  "id": "123",
  "String Col": "Some String",
  "Some Ints": [
    123,
    234
  ]
}
After configuring all the basic steps, like creating the linked services and mapping the columns, when I try to publish the pipeline it gives an error:

Direct copying data from Snowflake is only supported when sink dataset is DelimitedText, Parquet or JSON with Azure Blob Storage linked service, for other dataset or linked service, please enable staging

110220-image.png


Question 1: Is copying from Snowflake to Cosmos DB not supported?

As I understand it, the answer to the above question is "not supported directly", and to copy data between these two stores we need to use the "staging" option, which means a blob storage account is used as temporary staging storage.

Question 2: Is my above understanding correct?

If the answer to question #2 is "yes": I went ahead and tried to use an existing blob storage account. It gives another error:

110249-image.png

Question 3: I really do not understand what we should do now. There is no concrete example of how to retrieve the SAS URL for the blob. Does this URL need to point to an existing file location?

Question 4: Is it worth doing all this just to copy a simple table? Is this a good solution?


Help is highly appreciated.


Tags: azure-data-factory, azure-cosmos-db

1 Answer

Answered by SaurabhSharma-msft · edited by SaurabhSharma-msft

Hi @Raja-1642,


Question 1: Is the copying from Snowflake to Cosmos DB not supported? Question 2: Is my above understanding correct?
No, it is not currently supported. Direct copy from Snowflake is supported only when the sink is Azure Blob Storage (with shared access signature authentication) and the sink dataset format is Parquet, delimited text, or JSON. Please refer to the documentation.

Question 3: I really do not understand what should we do now? There's no concrete example as how to retrieve the Sas url to the blob. Does this url need to be pointing to an existing file location?
You need to enable staging, as stated in the error message. To retrieve the SAS URL for the blob, go to your blob storage account in the Azure portal and navigate to the "Shared access signature" blade. You can configure the set of permissions this SAS token will contain by selecting the "Allowed services" (Blob/File/Queue/Table). In your case you can select Blob as the service for staging.
110433-image.png

You can also define the allowed resource types, which can be Service/Container/Object. See below for details:

Service (s): Access to service-level APIs (e.g., Get/Set Service Properties, Get Service Stats, List Containers/Queues/Tables/Shares)
Container (c): Access to container-level APIs (e.g., Create/Delete Container, Create/Delete Queue, Create/Delete Table, Create/Delete Share, List Blobs/Files and Directories)
Object (o): Access to object-level APIs for blobs, queue messages, table entities, and files (e.g., Put Blob, Query Entity, Get Messages, Create File)

Select the allowed permissions (Read/Write/Delete/List, etc.). You can select multiple options from the above settings as per your requirements.

For staging, Read and Write should be enough. You can then generate the SAS token, copy the "SAS token" value from the blade as highlighted above, and use it in your linked service like below:
110422-image.png
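For reference, an Azure Blob Storage linked service using SAS authentication looks roughly like the following sketch (the linked service name is a placeholder, and the angle-bracket values stand in for your storage account and the generated token):

```json
{
  "name": "AzureBlobStagingLinkedService",
  "properties": {
    "type": "AzureBlobStorage",
    "typeProperties": {
      "sasUri": {
        "type": "SecureString",
        "value": "https://<storageaccount>.blob.core.windows.net/?<SAS-token>"
      }
    }
  }
}
```

In practice the SAS token should be stored as a secret (for example in Azure Key Vault) rather than pasted in plain text.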

Hope this helps. Please let me know if you have any questions.

Please refer to the documentation - Staged Copy from Snowflake for any other additional details.

Thanks
Saurabh


Please do not forget to "Accept the answer" wherever the information provided helps you to help others in the community.





Hi @SaurabhSharma-msft ,
I really appreciate the detailed answer you provided. Thanks for this!

Interestingly, when I went through the documentation you suggested, I found an important point:

If your sink data store and format meet the criteria described in this section, you can use the Copy activity to directly copy from Snowflake to sink

For JSON format, direct copy only supports the case that source Snowflake table or query result only has single column and the data type of this column is VARIANT, OBJECT, or ARRAY.

Now, the data I want to copy (in my first post) has multiple columns (it's actually a view resulting in, say, 10 columns).
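For completeness: the single-VARIANT-column condition for direct JSON copy could in principle be met by collapsing the columns in the source query with Snowflake's OBJECT_CONSTRUCT. A sketch of such a copy-activity source, with hypothetical column and view names:

```json
{
  "source": {
    "type": "SnowflakeSource",
    "query": "SELECT OBJECT_CONSTRUCT('id', ID, 'String Col', STRING_COL, 'Some Ints', SOME_INTS) AS DOC FROM MY_VIEW"
  }
}
```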

Question 1: So in that case, will this Copy activity not work?

Question 2: If not, what is the alternative solution?
a. Can we do something like: query from Snowflake -> format -> dump to Cosmos DB, as a 3-step pipeline? How can we achieve this?
b. Any better automated approach?

Appreciate any help!


Hi @Raja-1642 ,

The statement you are referring to in the document applies to direct copy, i.e., when you copy from Snowflake directly to Azure Blob Storage. In your case, when you use blob storage as staging, it works fine with multiple columns. I tried the staging approach (using a SAS token with resource types Container and Object) and was able to see the row with all columns in Cosmos DB. Please check the screenshots below.
Source data
110774-image.png

Staging Settings
110764-image.png

Pipeline Run Details
110755-image.png

Target
110756-image.png
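The staged copy shown in the screenshots above corresponds roughly to the following activity settings (a sketch; the linked service name and container path are placeholders):

```json
{
  "name": "CopySnowflakeToCosmosStaged",
  "type": "Copy",
  "typeProperties": {
    "source": { "type": "SnowflakeSource" },
    "sink": { "type": "CosmosDbSqlApiSink" },
    "enableStaging": true,
    "stagingSettings": {
      "linkedServiceName": {
        "referenceName": "AzureBlobStagingLinkedService",
        "type": "LinkedServiceReference"
      },
      "path": "stagingcontainer"
    }
  }
}
```

ADF writes the Snowflake output to the staging container first and then loads it into Cosmos DB, so the blob storage holds only transient data.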

Please try the staging option and let me know if you see any issues; I would be happy to help you with the same.

Thanks
Saurabh



Hi @SaurabhSharma-msft,

Thank you very much for clarifying this. I'll probably go ahead with the approach you have suggested, but I feel that with this approach I'm using an extra storage resource.

My scenario is simple:
1. I want to transfer data (one view with, say, 10 columns) from Snowflake to Cosmos DB. A REST API will be used on top of Cosmos DB.
2. We want to run this batch, say, once a day.
3. The data is not huge.



In this scenario, what do you suggest as the best approach?
a. The approach I'm currently using (Copy activity)?
b. Using a 'Data flow' activity, where I transform the columnar data to JSON using the expression language?
c. Any other approach, maybe using Functions, or anything else you want to suggest?




