question

futomitsuishi-0224 asked SietseBrouwer-1929 commented

Can I build the environment in the computing cluster using pip?

I want to train an AI model. On the VM instance, running the commands below worked well:

 pip install -r requirement.txt
 python ~

Then, in order to train the AI model in the same environment on the compute cluster, I executed the code below in the Python 3.8 - AzureML notebook (I'm sorry I couldn't attach the screenshot):

 import azureml.core
 from azureml.core import Workspace
 import os
 from azureml.core import ScriptRunConfig
 from azureml.core import Datastore
 from azureml.core import Experiment
 from azureml.core import Dataset
 from azureml.core.compute import AmlCompute
 from azureml.core.compute import ComputeTarget
 from azureml.core import Environment
 import datetime
    
 cluster_name = 'high-2x-v100-1'
 gpu_name = 'Standard_NC12s_v3'
 experiment_name = 'training_agent_print'
 hyperparameters = [
     '--max_train_time', '172800'
 ]
 script_folder = './script_folder'
    
 # workspace
 ws = Workspace.from_config()
 print(ws.name, ws.location, ws.resource_group, sep='\t')
    
 # compute cluster
 compute_name = os.environ.get("AML_COMPUTE_CLUSTER_NAME", cluster_name)
 # environment variables are strings, so convert the node counts to int
 compute_min_nodes = int(os.environ.get("AML_COMPUTE_CLUSTER_MIN_NODES", 0))
 compute_max_nodes = int(os.environ.get("AML_COMPUTE_CLUSTER_MAX_NODES", 4))
 vm_size = os.environ.get("AML_COMPUTE_CLUSTER_SKU", gpu_name)
    
 if compute_name in ws.compute_targets:
     compute_target = ws.compute_targets[compute_name]
     if compute_target and type(compute_target) is AmlCompute:
         print('found compute target. just use it. ' + compute_name)
 else:
     print('creating a new compute target...')
     provisioning_config = AmlCompute.provisioning_configuration(vm_size=vm_size,
                                                                 min_nodes=compute_min_nodes,
                                                                 max_nodes=compute_max_nodes)
     compute_target = ComputeTarget.create(
         ws, compute_name, provisioning_config)

 # environment
 env = Environment.from_pip_requirements(name = "m8-pip-training", file_path = "./requirements.txt")
 exp = Experiment(workspace=ws,name=experiment_name)
    
 # run
 src = ScriptRunConfig(source_directory=script_folder,
     script='main.py',
     arguments=hyperparameters,
     compute_target=compute_target,
     environment=env
 )
 run = exp.submit(config=src)


As a result, I got the log below in the 20_image_build_log.txt file:

 ==> WARNING: A newer version of conda exists. <==
   current version: 4.9.2
   latest version: 4.10.3
    
 Please update conda by running
    
     $ conda update -n base -c defaults conda
    
    
 Pip subprocess error:
 ERROR: Could not find a version that satisfies the requirement parlai==1.3.0 (from -r /azureml-environment-setup/condaenv.5svatkzc.requirements.txt (line 55)) (from versions: 0.1.20200409, 0.1.20200416, 0.1.20200610, 0.1.20200713, 0.1.20200716, 0.8.0, 0.9.0, 0.9.1, 0.9.2, 0.9.3, 0.9.4)
 ERROR: No matching distribution found for parlai==1.3.0 (from -r /azureml-environment-setup/condaenv.5svatkzc.requirements.txt (line 55))
    
    
 CondaEnvException: Pip failed
    
 The command '/bin/sh -c ldconfig /usr/local/cuda/lib64/stubs && conda env create -p /azureml-envs/azureml_ba289e67ead35c3dbaac125150111737 -f azureml-environment-setup/mutated_conda_dependencies.yml && rm -rf "$HOME/.cache/pip" && conda clean -aqy && CONDA_ROOT_DIR=$(conda info --root) && rm -rf "$CONDA_ROOT_DIR/pkgs" && find "$CONDA_ROOT_DIR" -type d -name __pycache__ -exec rm -rf {} + && ldconfig' returned a non-zero code: 1
 2021/08/10 15:13:41 Container failed during run: acb_step_0. No retries remaining.
 failed to run step ID: acb_step_0: exit status 1
    
 Run ID: caj failed after 2m24s. Error: failed during run, err: exit status 1


And the experiment failed. I have 3 questions:
1. Why is the computing cluster using conda to build the image even though I exported the file from pip?
2. Can I build the environment using pip?
3. There is a WARNING in the log; if I update conda to the latest version, the experiment might not fail. Can I update conda on the computing cluster?

Thank you so much



azure-machine-learning

1 Answer

SietseBrouwer-1929 answered SietseBrouwer-1929 commented

Summary:
- Option 1: create a working Conda environment, either on your own computer or on the VM; run conda env export > my-conda-specification.yml (conda list --export writes a plain-text list, not YAML), and specify your Environment with Environment.from_conda_specification('my-env-name', 'my-conda-specification.yml')
- Option 2: create a Docker image, for example FROM mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04, and install your Python packages in there. Once it's working, publish the Docker image and tell Environment to use it.

More details and some links below.

Why is the computing cluster using conda to build the image even though I exported the file from pip?

Why MS uses Conda: unlike Pip, Conda can also manage non-Python dependencies. Conda is also better at managing precompiled packages and at tracking and solving their dependencies. (Conda hands the packages listed under pip: in its environment file to Pip, which is why you're seeing "Pip subprocess error" in 20_image_build_log.txt.)

It is not very hard to translate Pip's requirements.txt file into something Conda understands; I think Conda can even read requirements.txt directly. That is how you can export a requirements.txt file from Pip and still have Conda build the environment from it.
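To make that translation concrete, here is a minimal sketch of wrapping requirements.txt lines in a Conda environment file. It is not what Azure ML runs internally; the function name requirements_to_conda_spec and the python_version parameter are made up for illustration, and the value 3.8 is only an assumption taken from the notebook kernel named in the question.

```python
import re


def requirements_to_conda_spec(requirement_lines, env_name, python_version):
    """Render a Conda environment YAML whose pip: section holds the given requirements.

    Hypothetical helper for illustration; Azure ML's own translation may differ.
    """
    pip_deps = []
    for raw in requirement_lines:
        # In a requirements file a comment starts with '#' at the start of the
        # line or after whitespace.
        line = re.sub(r"(^|\s)#.*$", "", raw).strip()
        if line:
            pip_deps.append(line)

    out = [
        f"name: {env_name}",
        "dependencies:",
        f"  - python={python_version}",  # assumption: pin the interpreter explicitly
        "  - pip",
        "  - pip:",
    ]
    out += [f"      - {dep}" for dep in pip_deps]
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    lines = ["# training deps", "parlai==1.3.0", "tqdm  # progress bars"]
    print(requirements_to_conda_spec(lines, "m8-pip-training", "3.8"))
```

Everything under pip: is still installed by Pip, so a requirement that Pip cannot resolve fails the Conda build in the same way.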

Can I build the environment using pip?

There are two ways you can reproducibly specify the environment you need: either create a conda specification, or get pip working inside a Docker image and use the resulting image.

A. Create a conda specification that successfully builds the environment.

If you have a working conda environment:
- you can activate it and run conda list --export > conda-specification.txt to get the specification file (it will include any pip-installed dependencies!)
- you can create a new environment from that file using conda create --name my_env_name --file conda-specification.txt
- Environment.from_conda_specification('my-env-name', 'my-conda-specification.yml') expects a YAML environment file, though, not the plain-text list that conda list --export creates. Running conda env export > my-conda-specification.yml writes the YAML form.
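If all you have is the plain-text output of conda list --export, a rough sketch like the one below can turn it into a YAML environment file. It assumes each line has the form name=version=build and that a build string starting with pypi marks a Pip-installed package; the function name export_to_conda_yaml is made up for illustration.

```python
def export_to_conda_yaml(export_text, env_name):
    """Convert `conda list --export` text into a Conda environment YAML string.

    Assumption: a build string starting with 'pypi' means the package came from Pip.
    """
    conda_deps, pip_deps = [], []
    for raw in export_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and header lines
        if not line:
            continue
        parts = line.split("=")
        name = parts[0]
        version = parts[1] if len(parts) > 1 else ""
        build = parts[2] if len(parts) > 2 else ""
        if build.startswith("pypi"):
            pip_deps.append(f"{name}=={version}")
        else:
            # Build strings are dropped so the file is less platform-specific.
            conda_deps.append(f"{name}={version}" if version else name)

    out = [f"name: {env_name}", "dependencies:"]
    out += [f"  - {dep}" for dep in conda_deps]
    if pip_deps:
        if not any(dep.split("=")[0] == "pip" for dep in conda_deps):
            out.append("  - pip")  # Conda needs pip itself to process the pip: section
        out.append("  - pip:")
        out += [f"      - {dep}" for dep in pip_deps]
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "sqlite=3.13.0=0\ntqdm=4.62.1=pypi_0\nwheel=0.29.0=py27_0\n"
    print(export_to_conda_yaml(sample, "my-env-name"))
```

Dropping the build strings trades exact reproducibility for a file that is more likely to solve on a different machine; keep them if both machines use the same platform.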

If you're creating a YAML file to specify an environment, it probably looks like this (below) and is called something like conda-spec.yml.

# conda-spec.yml
name: img-classification-part3-deploy-encrypted
dependencies:
 - package1  # installed by `conda install`
 - package2  # installed by conda
 - pip:
   - azureml-sdk
   - matplotlib
   - pandas
   - azureml-opendatasets
   - encrypted-inference==0.9
   - azure-storage-blob


Creation from the YAML file takes place via one of
- conda env create --name my_env_name --file my-conda-yaml.yml
- Environment.from_conda_specification('my-env-name', 'my-conda-yaml.yml')

More details in these two URLs:
- https://docs.microsoft.com/en-us/azure/machine-learning/how-to-troubleshoot-environments
- https://azure.github.io/azureml-cheatsheets/docs/cheatsheets/python/v1/environment/


B. Build a Docker image with a working environment, and tell Environment to use that Docker image.
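A minimal sketch of option B, under a few assumptions: the base image has pip on its PATH, the image you build is published somewhere the workspace can pull it from, and azureml-core (SDK v1) is installed where you run environment_from_image. Both function names are made up for illustration.

```python
def render_dockerfile(base_image, requirements_file="requirements.txt"):
    """Return Dockerfile text that installs the pip requirements on top of base_image."""
    return "\n".join([
        f"FROM {base_image}",
        f"COPY {requirements_file} /tmp/requirements.txt",
        # Assumption: the base image has pip on its PATH.
        "RUN pip install --no-cache-dir -r /tmp/requirements.txt",
    ]) + "\n"


def environment_from_image(name, image):
    """Build an Azure ML Environment that uses a prebuilt image as-is."""
    # Imported lazily so the Dockerfile helper works without azureml-core installed.
    from azureml.core import Environment

    env = Environment(name=name)
    env.docker.base_image = image
    # Tell Azure ML not to create a Conda environment on top of the image.
    env.python.user_managed_dependencies = True
    return env


if __name__ == "__main__":
    print(render_dockerfile("mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04"))
```

The resulting Environment can be passed to ScriptRunConfig through its environment argument, just like the one in the question.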

There is a WARNING in the log; if I update conda to the latest version, the experiment might not fail. Can I update conda on the computing cluster?

I don't specifically know if you can update conda in the cluster; but I know that updating Conda should not change which packages Conda finds or (tries to) install, so this probably will not help.
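The build log in the question already names the real failure: Pip could not find parlai==1.3.0 among the versions it was offered, and the Conda version has no influence on that list. Pip only offers releases that are compatible with the Python interpreter it runs under, so one possible explanation (an assumption; the log does not confirm it) is that the Python version in the image differs from the one on the VM. Below is a small sketch for pulling the requested and available versions out of such a log line; the function name parse_pip_no_match is made up, and the regular expression assumes the line has the shape shown in the question.

```python
import re

_NO_MATCH = re.compile(
    r"requirement (\S+?)==(\S+).*?\(from versions: ([^)]*)\)"
)


def parse_pip_no_match(log_line):
    """Return (package, requested_version, available_versions), or None if no match."""
    match = _NO_MATCH.search(log_line)
    if match is None:
        return None
    name, wanted, versions = match.groups()
    available = [v.strip() for v in versions.split(",") if v.strip()]
    return name, wanted, available


if __name__ == "__main__":
    line = (
        "ERROR: Could not find a version that satisfies the requirement "
        "parlai==1.3.0 (from -r requirements.txt (line 55)) "
        "(from versions: 0.8.0, 0.9.0, 0.9.4)"
    )
    name, wanted, available = parse_pip_no_match(line)
    if wanted not in available:
        print(f"{name} {wanted} is not offered to this interpreter; newest offered: {available[-1]}")
```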


I hope something of the above will help you. Good luck!



EDIT 2021-08-17:
- Use the correct command to export a conda env definition. I accidentally wrote the create command, instead...






Thank you for nice reply!!!

Option A failed, so I'll try B.
I have one more question about option A. In the conda environment, I ran conda install pip and pip install -r requirement.txt to get the environment that I want to reproduce on the cluster, and exported it to a YAML file. Then I created an environment from that YAML file. However, it turned out that the new environment doesn't perfectly reproduce the one I wanted, because a module I had installed was missing.

My question is: are there cases where exporting the YAML file and creating an environment from it can't perfectly reproduce the environment? Is there any possible solution?

When I ran pip install -r requirement.txt, one dependency conflict occurred, but pip resolved it.


Can I ask some additional questions as well? I'd be happy to get an answer.
1. Every time I run conda remove -n {myenv} --all, the terminal switches to the azureml_py38 environment. Is this expected behavior? I always close the terminal and re-open it.
2. When I run conda env create -n {myenv} -f {yaml file}, I sometimes get the error /anaconda/pkge/~ can't be deleted. Please remove manually. Is this expected behavior?


Thank you so much


Are there cases where exporting the YAML file and creating an environment from it can't perfectly reproduce the environment? Is there any possible solution?

May I ask what command you used to export the environment? In my original answer, I wrongly said to use conda create --name {myenv} --file {myfile}, which creates an environment instead of exporting it; I have just now edited my answer to use conda list --export, but I am curious if you used the same command.

For me, conda list --export produces very specific dependency strings, and even includes packages that I installed via pip, like this fragment:

...
sqlite=3.13.0=0
tqdm=4.62.1=pypi_0   # <-- I installed this one via pip, probably that is what `=pypi_0` means.
traitlets=4.3.1=py27_0
wcwidth=0.1.7=py27_0
wheel=0.29.0=py27_0
zlib=1.2.8=3
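One way to find out which module went missing is to compare the requirements file against the export. This is a rough sketch, assuming the export has the name=version=build form shown above and that names can be compared after lowercasing and unifying -, _ and . (the function names are made up for illustration).

```python
import re


def normalize(name):
    """Lowercase a package name and treat '-', '_' and '.' as the same separator."""
    return re.sub(r"[-_.]+", "-", name).lower()


def exported_names(export_text):
    """Collect normalized package names from `conda list --export` output."""
    names = set()
    for raw in export_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line:
            names.add(normalize(line.split("=", 1)[0]))
    return names


def missing_requirements(requirement_lines, export_text):
    """Return requirement names that do not appear in the exported environment."""
    present = exported_names(export_text)
    missing = []
    for raw in requirement_lines:
        line = re.sub(r"(^|\s)#.*$", "", raw).strip()
        if not line or line.startswith("-"):  # skip blanks and options such as -r
            continue
        # The name ends at the first version operator, extras bracket or marker.
        name = re.split(r"[\s<>=!~;\[]", line, maxsplit=1)[0]
        if normalize(name) not in present:
            missing.append(name)
    return missing


if __name__ == "__main__":
    export = "tqdm=4.62.1=pypi_0\nwheel=0.29.0=py27_0\n"
    print(missing_requirements(["tqdm==4.62.1", "parlai==1.3.0"], export))
```

An empty result only means every requirement name shows up in the export; it does not check that the versions match.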


In regards to the two additional questions you asked, I regret that I do not know the answers. I wish we could sit behind one computer, so that we could figure it out together and laugh together when we solved it; but alas, we are strangers on the Internet, connected only by copper and fiber, electricity and light, the thinnest of virtual threads.

Finally, in reply to your other comment below: congratulations on getting it working!

Kind regards,

Sietse


While trying the options, I noticed that my real problem was not an environment issue, and I was able to solve it!
I will refer to this the next time I run into an environment problem!

Thank you so much!!!!!!!
