SAP Table Connector - RFCtableoption VS PartitionOption

Anonymous
2020-08-05T22:24:03.503+00:00

Dears,

Please help me understand the difference between rfctableoption and PartitionOption in the Copy activity of the SAP table connector.

The docs say that rfctableoption is a dataset filter applied in SAP, and that PartitionOption is the read mechanism for pulling data from SAP.

  • Extract data from 1-31st July.

Option 1: rfctableoption WHERE DATE GE 20200701 AND DATE LE 20200731
Option 2: PartitionOption on column DATE with PartitionLowerBound 20200701 and PartitionUpperBound 20200731

Both options return exactly the same number of records. Since both seem to serve the same purpose, what exactly is the difference between these two options?

Please also let me know whether the choice has any performance impact on the extraction.

Thanks,


1 answer

  1. MartinJaffer-MSFT
    2020-08-06T21:16:08.023+00:00

    Hello anonymous user and thank you for your question.

    For a very small volume of data, you may not see a difference. For a larger volume the difference matters.

    Please let me explain to the best of my understanding.

    Suppose you have a table "OrdersReceived" growing at 20 million records per month. You want to do a detailed analysis of the past month's data, focusing on the trends of small orders over time. 20 million records is a lot and might take a while to download, especially if kept as a single file.

    In the RFC table options, you specify a filter so that only orders with a value under $1000 are pulled.

    To speed up the process, you would use Option 2 as you described, setting the partition on date, the lower bound to the start of the month, and the upper bound to the end of the month. Furthermore, you set the maximum number of partitions to 32. This causes up to 32 separate requests to be sent to SAP. Each request fetches a portion of the specified date range. Each may be written to a separate file, depending upon the copy activity sink details.
    In the Copy activity settings you also choose to set parallelism to 4. This causes Data Factory to pull 4 of the partitions at a time. In the best case, this is up to a fourfold improvement over option 1.
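
To make the scenario above concrete, here is a rough sketch of how the Copy activity settings might look, written as a Python dictionary that is dumped to JSON. The property names (rfcTableOptions, partitionOption, partitionSettings, parallelCopies) reflect my understanding of the SAP table connector documentation, so please verify them there. The column names ORDER_DATE and ORDER_VALUE are hypothetical and only stand in for the "OrdersReceived" example.

```python
import json

# Hypothetical Copy activity fragment for the "OrdersReceived" scenario.
# Property names follow the SAP table connector docs as I understand them;
# ORDER_DATE and ORDER_VALUE are made-up column names for illustration.
copy_activity_type_properties = {
    "source": {
        "type": "SapTableSource",
        # Filter evaluated in SAP: only small orders are returned.
        "rfcTableOptions": "ORDER_VALUE LT 1000",
        # Read mechanism: split the date range into separate requests.
        "partitionOption": "PartitionOnCalendarDate",
        "partitionSettings": {
            "partitionColumnName": "ORDER_DATE",
            "partitionLowerBound": "20200701",
            "partitionUpperBound": "20200731",
            "maxPartitionsNumber": 32,
        },
    },
    # How many partitions are pulled at the same time.
    "parallelCopies": 4,
}

print(json.dumps(copy_activity_type_properties, indent=2))
```

The point of the sketch is that the filter and the partition settings sit side by side in the same source definition: one decides which rows are wanted, the other decides how the read is split up.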

    If you had specified only 1 partition, then the parallelism would not help; this would be similar to option 1. If you had specified many partitions but a parallelism of only 1, only 1 partition would be pulled at a time.
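
The interaction between partition count and parallelism can be illustrated with a small back-of-the-envelope model. This is an idealized sketch, not how Data Factory actually schedules work: it assumes every partition holds the same amount of data, that partitions are pulled in equal-length "waves", and that SAP and the network are never the bottleneck. The function name relative_copy_time is made up for this example.

```python
import math

def relative_copy_time(partitions: int, parallelism: int) -> float:
    """Idealized copy time relative to one unpartitioned read (1.0).

    Assumes equal-sized partitions pulled in waves of `parallelism`
    at a time, with no overhead per request.
    """
    waves = math.ceil(partitions / parallelism)  # sequential rounds needed
    return waves / partitions                    # each wave covers 1/partitions of the data

for partitions, parallelism in [(32, 4), (1, 4), (32, 1)]:
    t = relative_copy_time(partitions, parallelism)
    print(f"{partitions} partitions, parallelism {parallelism}: {t:.2f}")
```

Under these assumptions, 32 partitions with parallelism 4 take a quarter of the time, while 1 partition with parallelism 4, or 32 partitions with parallelism 1, both take the full time.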

    In short, the partitions enable you to take advantage of parallelism. The RFC table option filter is applied to every partition. The partitioning chunks the data into more manageably sized pieces. Depending upon how you choose the partition, this can also be leveraged for improved performance in later steps.
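
As a purely conceptual illustration of "the filter is applied to every partition", the sketch below splits a date range into chunks and combines each chunk with the same filter. I do not know the exact requests Data Factory sends to SAP, so the splitting logic and the generated WHERE text are assumptions, and the helper names and the ORDER_DATE and ORDER_VALUE columns are hypothetical.

```python
import math
from datetime import date, timedelta

def split_date_range(lower: date, upper: date, max_partitions: int):
    """Split [lower, upper] into at most max_partitions contiguous day ranges."""
    total_days = (upper - lower).days + 1
    chunk_days = math.ceil(total_days / max_partitions)
    ranges = []
    start = lower
    while start <= upper:
        end = min(start + timedelta(days=chunk_days - 1), upper)
        ranges.append((start, end))
        start = end + timedelta(days=1)
    return ranges

def partition_filters(rfc_filter: str, column: str, ranges):
    """Combine the same row filter with each partition's date bounds."""
    return [
        f"({rfc_filter}) AND {column} GE '{s:%Y%m%d}' AND {column} LE '{e:%Y%m%d}'"
        for s, e in ranges
    ]

# 4 partitions are used here only to keep the printed output short.
ranges = split_date_range(date(2020, 7, 1), date(2020, 7, 31), max_partitions=4)
filters = partition_filters("ORDER_VALUE LT 1000", "ORDER_DATE", ranges)
for f in filters:
    print(f)
```

Each printed line stands for one request: the date bounds differ per partition, while the small-orders filter is the same in all of them.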