question

LucaV-4185 avatar image
0 Votes"
LucaV-4185 asked SeeyaXi-msft commented

Polybase access to Mongodb big tables is very slow

Hi all,

we have installed on SQL 2019 Developer Edition the Polybase features.
After that, we've created an external data source towards our Mongodb server, where we've stored lots of big tables containing applicative logs.

*CREATE DATABASE SCOPED CREDENTIAL mongo_creds WITH IDENTITY = 'XXXXXXXXXXXX', Secret = 'XXXXX!';
go
CREATE EXTERNAL DATA SOURCE MongoDBSource
WITH (LOCATION = 'mongodb://nnn.nnn.nnn.nn:27017',
CREDENTIAL = mongo_creds,
CONNECTION_OPTIONS = 'ssl=false;')
go
CREATE EXTERNAL TABLE...



It works, but we have experienced many performance problems in querying large tables (about 50-100 GB) on Mongodb, as soon as we try to query with conditions.
The "select top 100 *" is lightning fast, but running a query "...where field = 15" can take more than twenty minutes. On any table, despite the same query on Mongo lasts just a second.
In addition, we have noticed that when we run the query from SQL, the system tries to "download" the entire table to the swap files on the Windows machine ( C: \ Windows \ ServiceProfiles \ NetworkService \ AppData \ Local \ Temp)...
Anyone have any idea what we could try to speed up queries and avoid this kind of swap?

Thanks in advance.

Luca



sql-server-general
· 1
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Hi @LucaV-4185 ,

We have not received a response from you. Did the reply could help you? If the response helped, do "Accept Answer". If it doesn't work, please let us know the progress. By doing so, it will benefit all community members who are having this similar issue. Your contribution is highly appreciated.

Best regards,
Seeya

0 Votes 0 ·
LucaV-4185 avatar image
0 Votes"
LucaV-4185 answered SeeyaXi-msft commented

Hello,
yes, we created indexes and they're very useful when you run the query on Mongodb... Running queries from Sql Server have no benefit at all.
I think the point is what Sql seems to do every time we run a query: uses the swap directory to download the dataset to be queried...
If I check the sys.external_data_sources table, we defined correctly the pushdown parameter (= ON), so the query activity should be beared by the Mongo dataserver ...


...I just found on the Microsoft manuals that pushdown vs Mongodb community dataserver is not active in many situations:

If the join can be done at the external data source, this reduces the amount of data movement and improves the query's performance. Without join pushdown, the data from the tables to be joined must be brought locally into tempdb, then joined.

Maybe this could be the answer...

Luca



· 1
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Hi @LucaV-4185,

Yes.
You found the correct document. I put the link here to make it easier for more people to view.
https://docs.microsoft.com/en-us/sql/relational-databases/polybase/polybase-pushdown-computation?view=sql-server-ver15#pushdown-of-joins
Here is a link about How to tell if external pushdown occurred.

Best regards,
Seeya

0 Votes 0 ·
SeeyaXi-msft avatar image
0 Votes"
SeeyaXi-msft answered

Hi @LucaV-4185,

Have you built indexes on large tables? For tables with such large data, full table scans should be avoided as much as possible.
Here is an article about improving performance: https://docs.mongodb.com/manual/tutorial/optimize-query-performance-with-indexes-and-projections/

Best regards,
Seeya


If the response is helpful, please click "Accept Answer" and upvote it, as this could help other community members looking for similar queries.
Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.