NikhilKumar-4615 asked SaurabhSharma-msft edited

Copying 100 MB of data using ADF causes Cosmos DB partitions to increase from 2 to 50

We are copying data from West US 2 to North Europe using Azure Data Factory. Azure Data Factory shows that around 104 MB of data was copied before the pipeline failed with a timeout exception:

Type=Microsoft.Azure.Documents.RequestTimeoutException,Message=Request timed out.\r\nActivityId: ebac173e-97a9-4012-85ae-6e86aabd6ab3, Request URI: /dbs/+HxXAA==/colls/+HxXAKY0To8=/docs, RequestStats: , SDK: documentdb-dotnet-sdk/2.5.1 Host/64-bit MicrosoftWindowsNT/6.2.9200.0,Source=Microsoft.Azure.Documents.Client,''Type=System.Threading.Tasks.TaskCanceledException,Message=A task was canceled.,Source=mscorlib

The Cosmos DB metrics for North Europe show that the number of partitions increased from 2 to 50 during this copy activity. As a result, the RU per partition dropped considerably.

Copying 100 MB of data should not increase the number of partitions to 50. What might be going wrong here?

azure-data-factory azure-cosmos-db

Hi @nikhilkumar-4615,

Thanks for using Microsoft Q&A!
What is the source for this copy activity? Is this an existing pipeline, or a new one that you created and ran to move the data?

Thanks
Saurabh


The source and sink are both Cosmos DB. The structure of the databases, collections, and documents is exactly the same. The source database is in the West US 2 region, while the sink is in North Europe. It is an existing pipeline that has run fine before, although we are running it again after a few months. Also, the data being copied is quite large compared to what it was before.


1 Answer

SaurabhSharma-msft answered SaurabhSharma-msft edited

Hi @nikhilkumar-4615,

There are two things that can increase the number of physical partitions:

• Data growth (50 GB limit per physical partition)
• Increasing RUs (10K RU limit per physical partition)

Clearly, data growth is not a possibility here. So the only other explanation for the circumstances you have described (partitions increasing) is that the RUs for the container were increased, either manually or programmatically.
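As a rough illustration of the two limits listed above, the following Python sketch estimates the minimum number of physical partitions a container would need. The function name `min_physical_partitions` and the sample inputs are hypothetical, and the limits are simply the figures quoted in this answer; this is not how the service itself decides when to split.

```python
import math

# Limits as quoted in this answer; treat them as assumptions that may change.
MAX_STORAGE_GB_PER_PARTITION = 50
MAX_RU_PER_PARTITION = 10_000


def min_physical_partitions(storage_gb: float, provisioned_ru: int) -> int:
    """Lower bound on physical partitions implied by storage and throughput."""
    by_storage = math.ceil(storage_gb / MAX_STORAGE_GB_PER_PARTITION)
    by_throughput = math.ceil(provisioned_ru / MAX_RU_PER_PARTITION)
    return max(1, by_storage, by_throughput)


# Hypothetical inputs: about 0.1 GB of data at two throughput levels.
print(min_physical_partitions(0.1, 12_800))   # 2
print(min_physical_partitions(0.1, 500_000))  # 50
```

Under these assumptions, a small data set only needs 50 partitions when the provisioned throughput is in the region of 500,000 RU, which is why the RU setting is the first thing to check.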

An important point to note: if RUs are increased to a level that requires 50 physical partitions to service the throughput, and are then reduced to the previous level, the physical partitions will not be merged back to the previous count (“partition merge” is a feature on our roadmap, but it is not available currently). This kind of scenario could result in unexpectedly low RU per partition.
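To illustrate why the RU per partition drops, here is a small sketch. It assumes that provisioned throughput is divided evenly across physical partitions, and the figures (12,800 RU spread over 2 and then 50 partitions) are illustrative values rather than measurements from your account. The helper `ru_per_partition` is a hypothetical name.

```python
def ru_per_partition(total_ru: int, physical_partitions: int) -> float:
    """Assumes provisioned throughput is spread evenly across physical partitions."""
    return total_ru / physical_partitions


# Same total throughput, before and after the partition count grows.
before = ru_per_partition(12_800, 2)
after = ru_per_partition(12_800, 50)
print(before, after)  # 6400.0 256.0
```

With the partition count stuck at the higher value, each partition gets a much smaller share of the same total, so a write-heavy copy that targets a few partitions can be throttled or time out.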

One question, though: is the structure of the databases, collections, and documents exactly the same in both regions? If so, is there a particular reason that ADF is being used to copy data from one region to another, rather than simply using Cosmos DB’s multi-region replication?

If none of the above sheds light on the issue, I would suggest that you raise a support case. If you have any limitation in filing a support ticket, please let me know and I will provide a one-time free support ticket for you.

Thanks
Saurabh




Thanks for the help @SaurabhSharma-msft

Increasing the RU might be the reason. Before the copy, the total RU was 12,800 and the number of partitions was 2. It was manually increased to 50,000, so that definitely sounds like a reason for partition splits. But even then, shouldn’t the number of partitions have increased only to 8 (splitting each partition into two halves as long as the RU per partition is higher than 10K)? Is this logic correct, or are there other considerations when splitting a partition?

This is not a customer issue. The reason for copying the data is compliance. We are copying only the customer's data to the region of their domicile. Is there a better way to copy existing data than using ADF?
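The splitting logic described in this comment can be sketched as follows. This models only the assumption stated here (every partition is split in two until the RU per partition is at or below 10K); it is not a description of the actual Cosmos DB split algorithm, and the function name is hypothetical.

```python
MAX_RU_PER_PARTITION = 10_000  # per-partition limit quoted in the answer


def partitions_after_doubling(start_partitions: int, total_ru: int) -> int:
    """Double the partition count until RU per partition fits the limit."""
    partitions = start_partitions
    while total_ru / partitions > MAX_RU_PER_PARTITION:
        partitions *= 2
    return partitions


# 2 partitions at 50,000 RU: 25,000 -> 12,500 -> 6,250 RU per partition.
print(partitions_after_doubling(2, 50_000))  # 8
```

Under this model the count stops at 8, so a jump to 50 suggests either a higher RU setting at some point or splitting behavior that this simple model does not capture.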


I understand that the number of physical partitions won't decrease even if we decrease the RU. Will it decrease if we delete the data that was copied over?


@NikhilKumar-4615 I am checking on this and will get back to you.

Thanks
Saurabh
