
AlexWadmin-1045 asked MartinJaffer-MSFT commented

ServicePrincipalAuthentication no longer working in Databricks

Hi all.

I've had this problem for MONTHS now and, not having the option to give up, I'm getting desperate.

I have a Databricks setup where an Azure file share is mounted; this is used to extract data and read it into a database. Up until Monday it had been working fine.

Recently, although nothing has changed about the way the drive is mounted (via Azure ML libraries):

from azureml.core import Workspace
from azureml.core.authentication import ServicePrincipalAuthentication

sp = ServicePrincipalAuthentication(
    tenant_id="x",                   # tenantId
    service_principal_id="y",        # clientId
    service_principal_password="z")  # clientSecret

ws = Workspace.get(name="wsname", auth=sp, subscription_id="a")

Listing the contents of a directory suddenly takes an enormous amount of time to finish (around 50 minutes) before it can no longer find the folder at all. Essentially, it repeatedly tries to fall back to interactive authentication before failing altogether with [Errno 22] Invalid argument.

import os

folder = "/mnt/tmp/xx/a/b/c"

patient_names = os.listdir(folder)
print(patient_names)
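To see exactly where it dies, here is a minimal, stdlib-only sketch of how I've been timing the call and surfacing the exact errno rather than letting it fail opaquely (the path is the same mount path as above; `timed_listdir` is just a helper name I made up):

```python
import errno
import os
import time


def timed_listdir(path):
    """Time os.listdir and report the exact OSError if it fails."""
    start = time.monotonic()
    try:
        entries = os.listdir(path)
    except OSError as exc:
        elapsed = time.monotonic() - start
        code = errno.errorcode.get(exc.errno, "?")
        print(f"listdir failed after {elapsed:.1f}s: errno={exc.errno} ({code}): {exc}")
        return None
    elapsed = time.monotonic() - start
    print(f"listdir returned {len(entries)} entries in {elapsed:.1f}s")
    return entries


patient_names = timed_listdir("/mnt/tmp/xx/a/b/c")
```

That way, a 50-minute hang followed by errno 22 shows up with a timestamp and an error code instead of a silent stall.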

I'm lost. Is there anywhere I should be looking to find out what's wrong?

It works fine using interactive authentication, and it WAS working with service principal authentication, but suddenly it does not.
I've tried:



  • Recreating the datastore and dataset in ML

  • Creating new databricks clusters on which to run the code.

  • Creating another service principal

  • Running the code on my windows work machine with Pycharm and Python 3.7

  • Creating an Ubuntu environment with Pycharm and Python 3.7

  • Creating a new machine learning environment

  • Running the code with the Python logging module to see if anything useful came back.

  • Trying several different versions of azureml-sdk[databricks]

  • Trying yet another service principal.

Nothing has worked. I don't understand how or why service principals are suddenly being ignored, why there are no error messages or useful information of any kind, or why nobody else seems to have come across this problem.

Please can someone help?

Thank you.



azure-machine-learning azure-databricks

Hmm, okay. There are a couple of things that came to mind, @AlexWadmin-1045.

You wrote that you were able to start iterating through the folder, then suddenly it took a long time and started asking for authentication again.

Were you able to look at the progress made while iterating? Like, did it get partway through at an acceptable pace, then slow down, then suddenly stop?

There are several things I could think to check. Is someone else running a job on the same cluster that unmounts the same location, or tries to remount it?
How large a list are you trying to iterate? I have seen workloads where iterating through all the files starts to break because of how many there are. Could your job be so large that the time it takes to process exceeds the security/authentication timeout?


Hi Martin

Thanks for the reply :)

The log information I've been able to gather implies that it's trying to access the share over and over again; I don't think it can get on at all.

I'm the only one who uses the cluster on a regular basis, so I know nobody else is unmounting anything, and the failure happens consistently.

I thought the large list might be a factor, but I've also tried the code with small file shares (with a single file) and gotten the same result.

Kind Regards

Alex


You are using the file share of a storage account, not a blob container, correct?

It is worth checking the permissions again, as well as how identity-based authentication is configured on the file share, and any firewall rules you may have set up.



0 Answers