question

RyanJames-9249 avatar image
2 Votes"
RyanJames-9249 asked vmteppo commented

Proper Pagination Format for Rest API with Azure Data Factory

I'm trying to use the Azure Data Factory to copy data from a REST API into an Azure SQL Database.

The next page is returned as a Link Header formatted as below:

</well-rollups?action=next&countyparish=TYLER+%5C%28TX%5C%29&deleteddate=null&next_page=api14+%3C+%2742457308280200%27&pagesize=100&state=TX>; rel="next"

How do I format this into the Pagination Rules in the copy data activity?

Thanks,

azure-data-factory
· 8
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

I am having the exact same issue, and oddly enough it looks like I'm using the exact same API (@RyanJames-9249: DI V2?).

@MartinJaffer-MSFT the vendor's API uses RFC-5988/RFC-8288 web-linking headers to handle pagination. For example:

Link: </permits?action=next&api10=ne%28nil%29&api10=ne%28UNKNOWN%29&deleteddate=nil&next_page=permitid+%3C+2068234&pagesize=10>; rel="next"


The relative URL value is contained within the first angle-brackets-block: </permits?...>; ... and the link type is contained in the rel tag -- in this case it's a link to the next page.

The problem with using the value in the activity pagination rule is that it seemingly naively uses the entire string rather than just plucking the relative URL.

I haven't been able to find a way to handle this with dynamic content, nor have I been able to find a way to parse the response headers in another activity/loop.

1 Vote 1 ·

Yes, @vince-6126 I'm working with DIV2 API too. Haven't made any breakthroughs on fixing this yet.

0 Votes 0 ·

Hello @RyanJames-9249 and welcome to Microsoft Q&A.

The sample you provided looks like either xml or html (because it starts with </ ... > ). The REST dataset type expects JSON response. Pagination aside, are you getting the data OK?

This pagination type should be used with AbsoluteUrl as the pagination key

0 Votes 0 ·

Hey @MartinJaffer-MSFT, according to Postman the NextLink format is actually RFC 5988, ( I could be wrong on that ).

The rest of the output from the API is in Jason, but the headers don't appear to be, so the data is coming in just fine. I was hoping to figure a way to parse this header response, but I've been unable to thus far.

0 Votes 0 ·

@RyanJames-9249 the pagination options for REST are very limited. I am not certain the pagination rules will be able to parse this.
I know how to make a pipeline dynamic expression to parse this, but, last I checked, the pagination rules did not allow dynamic expression.

If the pagination rules cannot handle this, then another option is implementing a custom pagination mechanism. This typically consists of placing the copy activity in a loop. Each iteration adjusts the headers / parameters / URL. Are you open to this?

0 Votes 0 ·

@MartinJaffer-MSFT I would be open to hearing about. It can parse the returned link? There's millions of rows here, hundred of millions in a couple of cases. Appreciate the help!

0 Votes 0 ·

@RyanJames-9249, Your API , does it happen to be public, something I could do a few test runs on? I understand if it is not. It would just make things easier to reproduce on my side.

0 Votes 0 ·
Show more comments
MartinJaffer-MSFT avatar image
1 Vote"
MartinJaffer-MSFT answered MartinJaffer-MSFT converted comment to answer

@RyanJames-9249 , @vince-6126
I heard back from the product group. Unfortunately this specific scenario is not supported at this time.

To help this scenario be supported in the future, please request / upvote it in the feedback forum.

You can use a Custom Activity to run your own custom code and save the results. Then you can use a Copy Activity to load the results to the destination of your choice.

Other alternatives include Function App or possibly Logic App.

· 6
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

@vince-6126 I've now got a working ADF model that uses the Batch process and a modified Python script. If you are interested, I'm happy to share it.

1 Vote 1 ·

RyanJames-9249 - yes please, that would be helpful. Just about to have to code the same so be great to slurp from the pool of prior art!

0 Votes 0 ·

@MattSymes-6244 can you send me a direct message or provide an email and I can get you some docs.

0 Votes 0 ·
Show more comments

@MartinJaffer-MSFT - This post is probably quite old. But wanted to check if this scenario is supported now. Although it says here in this document that ADF supports it by default, it does not seem to be working for us. https://docs.microsoft.com/en-us/azure/data-factory/connector-rest?tabs=data-factory

0 Votes 0 ·
MartinJaffer-MSFT avatar image
0 Votes"
MartinJaffer-MSFT answered vmteppo commented

@Preeti-2583

RFC 5988 is supported now. In Pagination rules, there is an option for it. When creating a new Copy activity, this option is added. (see picture below). However if you remove the option, then it will not be used.

224556-image.png



image.png (94.6 KiB)
· 1
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Looks like this new pagination rule using RFC5988 works nicely, when there is only one link section in the header. If there are multiple link sections, data factory is not working properly, especially when first link section contains something else than rel="next" relation type.

For example, this is ok for data factory:
Link: <https://example.org/>;; rel="self",
<https://example.org/page2>;; rel="next"

But this kind of header link combination is not working:
Link: <https://example.org/>;; rel="self"
Link: <https://example.org/page2>;; rel="next"

The latter example is not in the RFC5988 documentation, but it is in the RFC8288 documentation which obsoletes RFC5988. In my opinion, data factory should support both ways. We are struggling with one API which uses multiple link sections and, of course, we really wanted to use this new feature instead parsing link for next page other ways.

Hopefully product team is able to fix this as soon as possible.

1 Vote 1 ·