Failing to connect to metastore when using dbx to launch an ephemeral cluster in Databricks

Enrico Mosca 6 Reputation points
2022-12-13T12:39:33.237+00:00

Dear all,

I am using dbx to deploy and launch jobs on ephemeral clusters on Databricks.
I initialized the cicd-sample-project and connected it to a fresh, empty Databricks free trial environment, and everything works.

But when I try to do the same on my company's project, the launch command fails. In the UI I get this error: 269905-error.png

and in the log I get the following

WARN MetastoreMonitor: Failed to connect to the metastore InternalMysqlMetastore(DbMetastoreConfig{host=consolidated-westeuropec2-prod-metastore-3.mysql.database.azure.com, port=3306, dbName=organization5367007285973203, user=[REDACTED]}). (timeSinceLastSuccess=0) org.skife.jdbi.v2.exceptions.UnableToObtainConnectionException: java.sql.SQLTransientConnectionException: metastore-monitor - Connection is not available, request timed out after 15090ms. at org.skife.jdbi.v2.DBI.open(DBI.java:230)

My company's Databricks environment connects to Azure through a VPN, so I think that is where the problem lies.

This is an overview of the ports, protocol and destination opened in our firewall:

PROTOCOL: TCP; PORT: 5671,5672,9350-9354,9093; DESTINATION: *
PROTOCOL: UDP; PORT: 123; DESTINATION: *
PROTOCOL: ICMP; PORT: *; DESTINATION: *
PROTOCOL: MSSQL; PORT: 3306; DESTINATION:
westeurope-prod-metastore.mysql.database.azure.com,consolidated-westeurope-prod-metastore-addl-1.mysql.database.azure.com,consolidated-westeurope-prod-metastore-addl-2.mysql.database.azure.com,consolidated-westeurope-prod-metastore-addl-3.mysql.database.azure.com,consolidated-westeuropec2-prod-metastore-0.mysql.database.azure.com,consolidated-westeuropec2-prod-metastore-1.mysql.database.azure.com,consolidated-westeuropec2-prod-metastore-2.mysql.database.azure.com,consolidated-westeuropec2-prod-metastore-3.mysql.database.azure.com
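
One way to check whether rules like these actually let traffic through is to attempt a plain TCP connection to each metastore host on port 3306, for example from a notebook running on the affected cluster. The following is only a minimal sketch: the host list is a subset copied from the rule above, and the helper names `can_connect` and `check_hosts` are made up for illustration. A successful connection only shows that the network path is open, not that the metastore accepts the login.

```python
import socket

# A few of the hostnames from the firewall rule above; extend as needed.
METASTORE_HOSTS = [
    "consolidated-westeuropec2-prod-metastore-0.mysql.database.azure.com",
    "consolidated-westeuropec2-prod-metastore-3.mysql.database.azure.com",
]
# Port taken from the log message (port=3306).
METASTORE_PORT = 3306


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def check_hosts(hosts, port, timeout=5.0):
    """Map each host to the result of a TCP connection attempt on `port`."""
    return {host: can_connect(host, port, timeout) for host in hosts}
```

Calling `check_hosts(METASTORE_HOSTS, METASTORE_PORT)` returns a dictionary of host to `True`/`False`. Any host that maps to `False` is a candidate for a missing firewall rule or route.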

Here are the screenshots of my current firewall configuration:
270110-firewall-rules.png
270154-firewall-rules2.png

I am quite stuck and don't know what else to try. Could you help me out?
I have attached a full log in case someone wants to have a look
269999-fail.txt

Any help or suggestion is welcome. Thanks, and have a great day.

Azure Databricks

2 answers

  1. Enrico Mosca 6 Reputation points
    2023-01-05T13:43:36.613+00:00

    Thanks for your reply!
    A colleague found out that some routes were missing, and adding them fixed the problem.
    For anyone else who may encounter the same problem, please have a look at Configure user-defined routes with Azure service tags.

    1 person found this answer helpful.

  2. Alexander 0 Reputation points
    2024-04-19T07:07:13.3133333+00:00

    @Enrico Mosca The route in the UDR will work, but the firewall should also work. I think you have made a mistake. Note that the UDR grants access to all Azure-hosted SQL services (the SQL service tag).

    You have created an Application Rule (L7) for MSSQL on port 3306, but Databricks uses MySQL for the metastore, not MSSQL. MSSQL does NOT use port 3306; that is the MySQL port.

    Instead, configure the rule under Network Rules (L4) with destination consolidated-westeuropec2-prod-metastore-0.mysql.database.azure.com (all domains here: Databricks) on TCP port 3306.

    There is no need for "wide open" UDRs with next hop Internet. Please verify this.
