Using CSOM - are there any better ways of fetching List item pages in parallel?

nicholas dipiazza 26 Reputation points
2020-10-27T13:49:22.773+00:00

Using SharePoint CSOM, I need to fetch list item pages in parallel. Because some lists that I am fetching are 1,000,000+ items large, doing the fetch in serial fashion is extremely slow. Fetching them in parallel increases fetch speed by many times.

Please see the following post:
https://sharepoint.stackexchange.com/questions/266172/is-there-a-way-to-get-all-paging-info-for-a-list-up-front-so-that-very-large-lis/266498#266498

We are using the following approach to fetch list item pages:

First get the last ID in the list:

ClientContext clientContext = new ClientContext("your site"); 
Microsoft.SharePoint.Client.List spList = 
clientContext.Web.Lists.GetByTitle("TestList"); 
clientContext.Load(spList); 
clientContext.ExecuteQuery(); 

if (spList != null && spList.ItemCount > 0) 
{
 Microsoft.SharePoint.Client.CamlQuery camlQuery = new CamlQuery(); 
camlQuery.ViewXml = 
  @"<View Scope='RecursiveAll'><Query><OrderBy><FieldRef Name ='ID' Ascending='FALSE'/></OrderBy></Query><RowLimit Paged ='TRUE'>1</RowLimit></View>";  

ListItemCollection listItems = spList.GetItems(camlQuery); 
clientContext.Load(listItems); 
clientContext.ExecuteQuery(); 
}

Now that we know the last ID of the list, we request these pages (in parallel):
0-4999
5000-9999
10000-14990
15000-20000
....
until we get to the last ID as the max.

This is not ideal however.

When you request for 0-5000 it basically fetches 5000 entries and the last item id returned could be 6800. Now on the next request you are making is from item Id 5000. Then it will fetch those 1800+ additional items. This is undesirable because it causes you to re-download list item metadata for 1800 items twice.

Are there any better approaches to this?

SharePoint Server Development
SharePoint Server Development
SharePoint Server: A family of Microsoft on-premises document management and storage systems.Development: The process of researching, productizing, and refining new or existing technologies.
1,564 questions
0 comments No comments
{count} votes

5 answers

Sort by: Most helpful
  1. Jerryzy 10,561 Reputation points
    2020-10-28T03:10:18.803+00:00

    Hi @nicholas dipiazza

    No need to get last item Id.

    Instead, use CamlQuery.ListItemCollectionPosition to get the paing information like this:

            using (var context = new ClientContext("https://tenant.sharepoint.com/sites/dev/"))  
            {  
                context.Credentials = new Microsoft.SharePoint.Client.SharePointOnlineCredentials(userName, securePassword);  
                Web web = context.Web;  
                context.Load(web);  
                context.ExecuteQuery();  
      
                var list = context.Web.Lists.GetByTitle("CamlList");  
                Microsoft.SharePoint.Client.CamlQuery camlQuery = new CamlQuery();  
                camlQuery.ViewXml =  
                   @"<View Scope='RecursiveAll'>  
                    <RowLimit Page='True' >5000</RowLimit>  
                    </View>";  
      
                //Creating a single buffer for storing all the ListItems  
                List<ListItem> lstListItemCollection = new List<ListItem>();  
      
                do  
                {  
                    ListItemCollection itemCollection = list.GetItems(camlQuery);   
                    context.Load(itemCollection);  
                      
                    try  
                    {  
                        context.ExecuteQuery();  
                        lstListItemCollection.AddRange(itemCollection);  
                        camlQuery.ListItemCollectionPosition = itemCollection.ListItemCollectionPosition;  
      
                    }  
                    catch (Exception exec)  
                    {  
                        Console.WriteLine(exec.ToString());  
                    }  
                }  
                while (camlQuery.ListItemCollectionPosition != null);  
      }  
    

    Reference:

    Query ListItems in Batches – SharePoint Online


    If an Answer is helpful, please click "Accept Answer" and upvote it.
    Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.

    0 comments No comments

  2. nicholas dipiazza 26 Reputation points
    2020-10-28T18:23:02.823+00:00

    Hi @Jerryzy-MSFT

    This is exactly what I'm trying to avoid. What you are mentioning is a single threaded pagination through the sharepoint list. If the list has 1,000,000+ records, this single thread process takes a very long time. If you break the pages into pieces then fetch the pieces in multiple threads, you can greatly increase the speed.

    My question is how to, most efficiently, get the last id in the list.

    0 comments No comments

  3. Jerryzy 10,561 Reputation points
    2020-10-29T00:49:45.713+00:00

    Hi @nicholas dipiazza ,

    SharePoint CSOM objects are not thread safe so multiple threads is not suitable in this case.

    Please check out the blog which expalin this:

    Parallel processing and CSOM; friends or foes?

    0 comments No comments

  4. nicholas dipiazza 26 Reputation points
    2020-10-29T01:20:38.917+00:00

    I don't see how that is applicable here. You can definitely multi-thread fetching the pages of a single list. I've been doing the fetch of pages of a single list simultaneously with multiple threads for several years now. Without that ability my csom app would be super slow.

    Just looking for a way to make it even more efficient in case anyone else has seen a way to do it better than I am.


  5. Hubler, Benjamin 1 Reputation point
    2021-02-12T14:38:59.59+00:00

    Hello, @nicholas dipiazza

    I ran into a similar situation as you have. I would like to use a "Where" clause in the caml query with paging, but that is not allowed. So back to calling the whole list and filtering after. Here is what I am doing using ExecuteQueryAsync() calls. I have reduced the time to fetch a list of 3000 items from 8 seconds down to about 1.5 seconds. Take a look at the code and let me know if you have any suggestions, maybe it will help you out too.

        private async Task<List<ListItem>> GetSharePointDataAsync(ClientContext actx)  
        {              
            List<Task> tasks = new List<Task>();  
            List employees = actx.Web.Lists.GetByTitle("ListName");  
            actx.Load(employees);  
            actx.ExecuteQuery();  
    
            int rowlimit = 250;  
            List<ListItemCollection> colsEmployees = new List<ListItemCollection>();  
            ListItemCollectionPosition posEmployees = null;  
            List<ListItem> itemsEmployees = new List<ListItem>();  
            CamlQuery query = new CamlQuery() { ViewXml = "<View><RowLimit>" + rowlimit + "</RowLimit></View>" };  
            int positioncount = 0;  
            do  
            {  
                query.ListItemCollectionPosition = posEmployees;  
                colsEmployees.Add(employees.GetItems(query));  
                actx.Load(colsEmployees.Last());  
                tasks.Add(actx.ExecuteQueryAsync());  
                positioncount += rowlimit;  
                if (positioncount > employees.ItemCount)  
                {  
                    posEmployees = null;  
                }  
                else  
                {  
                    posEmployees = new ListItemCollectionPosition { PagingInfo = "Paged=TRUE&p_ID=" + positioncount };  
                }  
            } while (posEmployees != null);  
            await Task.WhenAll(tasks);  
            return itemsEmployees;  
        }  
    
    0 comments No comments