question

MoritzYBecker-6196 avatar image
0 Votes"
MoritzYBecker-6196 asked ·

Cosmos DB case-insensitive search consumes lots of RUs

I have a large Comos DB collection with 4 million small documents (4.79 GiB data size). The partition key is /objecttype, and the default indexing scheme is used. 2 million documents have objecttype = person. The following query consumes only 2.79 RUs, as expected (and returns 0 results):

SELECT * FROM c WHERE c.objecttype = 'person' AND STRING_EQUALS(c.displayname, 'foo', false)

However, the following case-insensitive query consumes 1228.11 RUs (also returning 0 results):

SELECT * FROM c WHERE c.objecttype = 'person' AND STRING_EQUALS(c.displayname, 'foo', true)

I would have expected the RU consumption in the second case to be only slightly higher. Doesn't case-insensitive search use an index? Or is that a bug?




Additional information


Query Statistics for the first query


Request Charge 2.79 RUs
Retrieved document count 0
Retrieved document size 0 bytes
Output document count 0
Output document size 49 bytes
Index hit document count 0
Index lookup time 0.11 ms
Document load time 0 ms
Query engine execution time 0.02 ms
System function execution time 0 ms
User defined function execution time 0 ms
Document write time 0 ms
Round Trips 1


Query statistics for the second (case-insensitive) query


Request Charge 1228.11 RUs
Retrieved document count 0
Retrieved document size 0 bytes
Output document count 0
Output document size 49 bytes
Index hit document count 0
Index lookup time 540.6700000000001 ms
Document load time 0 ms
Query engine execution time 0.030000000000000002 ms
System function execution time 0 ms
User defined function execution time 0 ms
Document write time 0 ms
Round Trips 1


Indexing Policy



{
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
        {
            "path": "/*"
        }
    ],
    "excludedPaths": [
        {
            "path": "/\"_etag\"/?"
        }
    ]
}


azure-cosmos-db
· 7
10 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

@MoritzYBecker-6196 Welcome to Microsoft Q&A forums and thank you for reaching out.
Could you please post the query stats for the first query as well?

Also, I want to confirm if displayname is included in the indexing policy

I will try and repro this at my end and we will surely help you with a resolution.

0 Votes 0 ·

@KalyanChanumolu-MSFT Thanks for assisting. I've added the query stats for the first query and the indexing policy.

0 Votes 0 ·

@MoritzYBecker-6196 I am not able to reproduce this at my end.
I have escalated this internally. I will get back to you if I need your subscription and cosmos account details.

0 Votes 0 ·
Show more comments
KalyanChanumolu-MSFT avatar image
0 Votes"
KalyanChanumolu-MSFT answered ·

@MoritzYBecker-6196 I have checked this internally.
For case insensitive queries, RU charge will increase as the cardinality increases.
Please refer to the complete post here.

In your case, you can try to reduce the number of documents by adding more criteria to the where clause.
You could also try optimizing this by adding a composite index (and adding ORDER BY):

Composite index: (objecttype ASC, displayname ASC)

 SELECT * 
 FROM c 
 WHERE c.objecttype = 'person' AND STRING_EQUALS(c.displayname, 'foo', true)
 ORDER BY c.objecttype ASC, c.displayname ASC

Please find more details here: Troubleshoot query issues when using Azure Cosmos DB | Microsoft Docs


If an answer is helpful, please "Accept answer" or "Up-Vote" for the same which might be beneficial to other community members reading this thread.

· 1
10 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Thanks for the clarification even though this is a disappointing answer. I would have expected property values to be to be indexed case-insensitively to support cardinality-independent query performance. The solution seems to be to store duplicates of property values in lower case.

0 Votes 0 ·
MoritzYBecker-6196 avatar image
0 Votes"
MoritzYBecker-6196 answered ·

I have created a Feature Request for efficient case-insensitive search. You can vote on it here.


10 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.