question

Schade-0685 asked jkersting commented

Cannot upload local files to AzureML datastore (python SDK)

Hi everybody,

I just started learning how to use MS Azure and I got stuck with an apparently trivial issue.

I have my own pet ML project, a python script that runs a classification analysis with Tensorflow and Keras.
It runs smoothly locally and I am happy with it.

Now I am trying to run this script on Azure ML, hoping to take advantage of the available computing power and, in general, to gain some experience with the Azure services. I am a bit old-style and I like the idea of running my code from my local IDE rather than in a notebook, so I focused on the Python SDK libraries.

I created a free trial account on Azure and created a workspace. To adapt my original code to the
new task, I followed the example at https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-train-models-with-aml?WT.mc_id=aisummit-github-amynic

The problem arises when I try to upload my locally stored training data to the datastore of the workspace. The data is saved locally in a Parquet file, about 70 MB in size. The transfer fails after some time with a ProtocolError; after that it keeps retrying and failing with a NewConnectionError.

The snippet that reproduces the error is:

 import numpy as np
 import pandas as pd
 from os import listdir
 from os.path import join as osjoin
    
 import azureml.core
 from azureml.core import Workspace, Experiment, Dataset, Datastore
 from azureml.core.compute import AmlCompute, ComputeTarget
    
 workdir = "."
 # Set up the Azure workspace:
 # load the workspace configuration from config.json in the current folder.
 try:
     ws = Workspace.from_config()
 except Exception as e:
     print(f"Could not load AML workspace: {e}")
    
 # Collect all local Parquet files from the data directory.
 datadir = osjoin(workdir, "data")
 local_files = [osjoin(datadir, f) for f in listdir(datadir) if f.endswith(".parquet")]
    
 # Get the default datastore and upload the prepared data.
 datastore = ws.get_default_datastore()
 datastore.upload_files(files=local_files, target_path=None, show_progress=True)

Everything runs smoothly until the last line. The program starts to upload the file; I can see from my VPN monitor that there is outbound traffic. Judging from the upload speed and the size of the file, I would say the upload completes, or nearly so, and then I get this message*:

 WARNING - Retrying (Retry(total=2, connect=3, read=2, redirect=None, status=None)) after connection broken by 'ProtocolError('Connection aborted.', OSError("(10054, 'WSAECONNRESET')"))': /azureml-blobstore-xxx/creditcard.parquet?comp=block&blockid=TURBd01...TURB...RA%3D%3D
 WARNING - Retrying (Retry(total=1, connect=2, read=2, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000002210A8BAF48>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed')': /azureml-blobstore-xxx/creditcard.parquet?comp=block&blockid=TURBd01...TURB...RA%3D%3D
 WARNING - Retrying (Retry(total=0, connect=1, read=2, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000002210B446748>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed')': /azureml-blobstore-xxx/creditcard.parquet?comp=block&blockid=TURBd01...TURB...RA%3D%3D
 WARNING - Retrying (Retry(total=2, connect=2, read=3, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000002210A8B5148>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed')': /azureml-blobstore-xxx/creditcard.parquet?comp=block&blockid=TURBd01...TURB...RA%3D%3D
 WARNING - Retrying (Retry(total=1, connect=1, read=3, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x000002210A891288>, 'Connection to creditfraudws2493375317.blob.core.windows.net timed out. (connect timeout=20)')': /azureml-blobstore-xxx/creditcard.parquet?comp=block&blockid=TURBd01...TURB...RA%3D%3D
 WARNING - Retrying (Retry(total=0, connect=0, read=3, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000002210A8BD3C8>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed')': /azureml-blobstore-xxx/creditcard.parquet?comp=block&blockid=TURBd01...TURB...RA%3D%3D

From the initial ProtocolError, I understand that the Azure cloud server is bouncing me back, but it is
unclear to me why. Checking the workspace from the Azure portal, I would guess that the container of the workspace is still empty, but I am not 100% sure I checked that correctly.

Maybe I have misunderstood the different components of the storage services in Azure ML and I am not using
the API correctly. Am I doing something wrong? Is there a way for me to extract more information about
the reasons for this error?
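One way to surface more detail, assuming the SDK routes its HTTP traffic through `urllib3` (as the retry warnings above suggest), is to raise the log level of the relevant loggers before calling `upload_files`. This is only a sketch using the standard `logging` module; the exact logger names used by the Azure SDKs may differ between versions:

```python
import logging

def enable_verbose_http_logging():
    """Turn on DEBUG output for the HTTP stack used by the upload."""
    logging.basicConfig(level=logging.DEBUG)
    # urllib3 emits per-request and per-retry details (connection resets,
    # redirects, timeouts) once its logger is set to DEBUG.
    logging.getLogger("urllib3").setLevel(logging.DEBUG)
    # Assumption: the Azure SDKs log under an "azure" logger namespace.
    logging.getLogger("azure").setLevel(logging.DEBUG)

enable_verbose_http_logging()
```

With this in place, the retry warnings should be preceded by the individual request attempts, which often narrows down whether the failure is DNS, TLS, or the server closing the connection.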

Thanks a lot in advance for any help you can provide



[*] (I manually edited portions of the error message to obfuscate the blobstore name.)






azure-machine-learning

@schade-0685 The steps used to upload the files are correct and I do not think there is an issue there. Based on the error log you see on your VPN monitor, could you please check if there are any firewall settings that could be hampering the upload? I have looked up this error code online; it looks like a known issue, and there is a hotfix for older versions of Windows. Here is the support link with the solution from Microsoft support.


It would be interesting to check whether you can run the same steps from a different machine or a notebook, to rule out any issues with the blob storage of your workspace.
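The `getaddrinfo failed` retries in the log also suggest that DNS resolution of the blob endpoint is dropping out mid-upload. A quick check from the affected machine could look like this (the hostname below is copied from the warning messages and will differ for other workspaces):

```python
import socket

def can_resolve(host: str) -> bool:
    """Return True if the host name resolves to at least one address."""
    try:
        return len(socket.getaddrinfo(host, 443)) > 0
    except socket.gaierror:
        return False

# Blob endpoint name taken from the warnings in the question.
print(can_resolve("creditfraudws2493375317.blob.core.windows.net"))
```

If this intermittently returns False while the upload is running, the problem is likely local DNS or VPN behavior rather than the Azure storage service.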




@schade-0685 Did the above suggestion help? Were you able to re-upload the file?


1 Answer

JasonKoh-8531 answered jkersting commented

@romungi-MSFT

I am having the same issue. With the tutorial here: https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-1st-experiment-bring-data

WARNING - Retrying (Retry(total=0, connect=3, read=0, redirect=None, status=None)) after connection broken by 'ProtocolError('Connection aborted.', timeout('The write operation timed out'))': /azureml-blobstore-6cf75ce5-9da9-4149-bcfd-c844582dc038/datasets/cifar10/cifar-10-batches-py/test_batch
Uploading ./data/cifar-10-batches-py/data_batch_2
--- Logging error ---
Traceback (most recent call last):
  File "/home/user01/ws/azure-ml-tutorial/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 677, in urlopen
    chunked=chunked,
  File "/home/user01/ws/azure-ml-tutorial/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 392, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/home/user01/anaconda3/lib/python3.7/http/client.py", line 1252, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/home/user01/anaconda3/lib/python3.7/http/client.py", line 1298, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/home/user01/anaconda3/lib/python3.7/http/client.py", line 1247, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/home/user01/anaconda3/lib/python3.7/http/client.py", line 1065, in _send_output
    self.send(chunk)
  File "/home/user01/anaconda3/lib/python3.7/http/client.py", line 987, in send
    self.sock.sendall(data)
  File "/home/user01/anaconda3/lib/python3.7/ssl.py", line 1034, in sendall
    v = self.send(byte_view[count:])
  File "/home/user01/anaconda3/lib/python3.7/ssl.py", line 1003, in send
    return self._sslobj.write(data)
socket.timeout: The write operation timed out

During handling of the above exception, another exception occurred:


Three small files upload successfully, but it fails on the other, larger data files.

I'm using Ubuntu 18.04 and Python 3.7.6, on home Wi-Fi, which I don't think has a firewall that would affect this.

Any idea?


I solved the problem by manually setting the timeout to None in the azureml package.

I have two questions remaining, @romungi-MSFT:

1. I still often see a warning like this:
```
WARNING - Retrying (Retry(total=2, connect=3, read=2, redirect=None, status=None)) after connection broken by 'ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))': /azureml-blobstore-6cf75ce5-9da9-4149-bcfd-c844582dc038/datasets/cifar10/cifar-10-python.tar.gz?comp=block&blockid=TURBd01EQXdNREF3TURBd01EQXdNREF3TURBd01EQXdOVEF6TXpFMk5EZyUzRA%3D%3D
```
meaning that the connection is reset on the Azure side. Is that normal?

2. How can I programmatically set the timeout instead of using the default one? Is there documentation for it? (I can't find it.)


thanks!


Hey JasonKoh-8531!

I do have the same problem. Where in the Azureml package did you set the timeout?

Best regards,

Jens


(Sorry, I didn't realize this post should have been a comment instead of an answer.)
