HenryX-2282 asked SarthakAgrawal-2215 commented

Upgrade HDInsight Python Version

I know that HDInsight uses Python 3.5, but is there a way to upgrade the minor version to Python 3.6 or above? We have a third-party package that only works on Python 3.6 or above.

https://docs.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-python-package-installation

The documentation does not mention how to do that. Is upgrading allowed at all?

azure-hdinsight

Hello @HenryX-2282,

Welcome to Microsoft Q&A platform.

I’m working with the product team and will get back to you when I have more information.

PRADEEPCHEEKATLA-MSFT answered

Hello @HenryX-2282,

This is not recommended: using a Python version other than the one built into the cluster is an unsupported scenario.

WARNING: The HDInsight cluster depends on the built-in Python environment (Python 3.5). Directly installing custom packages into the default built-in environments may cause unexpected library version changes and break the cluster.

If you still want to use Python 3.6, change “python=3.5” to “python=3.6” in the following command, then follow the remaining steps in the document.

 sudo /usr/bin/anaconda/bin/conda create --prefix /usr/bin/anaconda/envs/py36new python=3.6 anaconda --yes
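
The version swap described above can be sketched as a small helper that only prints the command for a given Python version, so it can be reviewed before it is run on a node. The function name `conda_create_cmd` and the derived environment name are illustrative assumptions; the Anaconda paths are copied from the command above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the HDInsight documentation):
# prints the conda command for a given Python version without executing it.
conda_create_cmd() {
  version="$1"                                            # e.g. 3.6
  env_name="py$(printf '%s' "$version" | tr -d .)new"     # 3.6 -> py36new
  echo "sudo /usr/bin/anaconda/bin/conda create --prefix /usr/bin/anaconda/envs/${env_name} python=${version} anaconda --yes"
}

# Print the command for Python 3.6; copy it to the node once it looks right.
conda_create_cmd 3.6
```

Once the environment exists, running `/usr/bin/anaconda/envs/py36new/bin/python --version` should confirm which interpreter it contains.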

Hope this helps. Do let us know if you have any further queries.


FurcyPin-8848 answered SarthakAgrawal-2215 commented

Hi @PRADEEPCHEEKATLA-MSFT ,

Python 3.5 reached end of life on September 5, 2020, with the release of version 3.5.10.
PyCharm has also marked this version as unsupported.

Will the Python version be upgraded in the next release of HDInsight? If so, when will that happen?

Regards


Hello @FurcyPin-8848,

Welcome to the Microsoft Q&A platform.

I’m working with the product team and will get back to you when I have more information.


Hello @FurcyPin-8848,

We are planning to upgrade to a later version that is not end of life. It will take us time to investigate the dependencies and test the full scenarios. We will update you once the Python version has been updated on HDInsight clusters.


@PRADEEPCHEEKATLA-MSFT any updates on this?

GolofastovVasiliyAdminUSUS-3195 answered GolofastovVasiliyAdminUSUS-3195 published

FYI, here is my way to update.

I host a custom script on github.com and run it through script actions, with code like the following. It can fail on the first or second run; this is just an example.


 #!/bin/bash
 # NOTE: a _possible_ fix for Azure HDInsight nodes: prepares them for Spark 3 and upgrades to Python 3.7
 set -e -x
    
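 # Script actions run as root; `sudo -s` would only open an interactive shell, so check for root instead
 [ "$(id -u)" -eq 0 ] || { echo "Run this script as root" >&2; exit 1; }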
    
 # install python 3.7
 add-apt-repository -y ppa:deadsnakes/ppa || true
 apt update || true
 apt install -y python3.7
 apt install -y python3.7-dev
 apt install -y libcairo2-dev
    
 # Use -f so a rerun does not abort under `set -e` when the files are already gone
 rm -f /usr/bin/python3
 rm -f /usr/bin/python3-config
 rm -rf /usr/lib/python3/dist-packages/PyYAML*
 rm -rf /usr/lib/python3/dist-packages/yaml
 ln -sf /usr/bin/python3.7 /usr/bin/python3
 ln -sf /usr/bin/python3.7-config /usr/bin/python3-config
    
 # install packages
 python3 -m pip install --upgrade pip setuptools || true
 python3 -m pip install wheel || true
 python3 -m pip install pypandoc==1.5 PyYAML==5.3.1 || true
 python3 -m pip install pandas==1.1.2 spacy==2.3.4 kmodes==0.10.2 numpy==1.19.4 metaphone==0.6 unidecode==1.1.1 StringDist==1.0.9 rapidfuzz==0.13.4 urltools==0.4.0 nltk==3.5 phonenumbers==8.12.14 python-dateutil==2.8.1 dateparser==1.0.0 datefinder==0.7.1 geotext==0.4.0 scikit-learn==0.23.2 || true
 python3 -m pip install pyhocon==0.3.51 pyspark==3.0.1 findspark==1.4.2 wordninja==2.0.0 pyspellchecker==0.5.5 cleanco==2.0.1 gensim==3.8.3 scipy==1.5.4 sympy==1.7 nose==1.3.7 PyYAML==5.3.1 || true
 python3 -m pip install --force-reinstall --upgrade adal==1.2.5 requests==2.25.0 requests_toolbelt==0.9.1 pyOpenSSL==20.0.0 cryptography==3.2.1 || true
 python3 -m pip install six==1.15.0 pyasn1-modules==0.2.8 || true
 python3 -m pip install boto3
 python3 -m pip install pyJWT==1.7.1 --upgrade
 python3 -m pip install pyap pyarrow==0.17.1 pyspark-patch==0.0.6 zipp==1.0.0 pycairo==1.13.4 recordlinkage==0.14  dython==0.6.2 || true
 python3 -m spacy download en_core_web_sm || true
 python3 -m spacy download en_core_web_md || true
 python3 -m nltk.downloader punkt || true
 mv -v /root/nltk_data/ /usr/share || true
 python3 -m nltk.downloader averaged_perceptron_tagger || true
 python3 -m nltk.downloader stopwords || true
    
 # replace the bundled pyspark with the pip-installed pyspark==3.0.1
 #mv /usr/hdp/current/spark3-client/python/pyspark /usr/hdp/current/spark3-client/python/pyspark-old
 cp -r  /usr/local/lib/python3.7/dist-packages/pyspark /usr/hdp/current/spark3-client/python
    
 # Patch
 sudo python3 /usr/local/lib/python3.7/dist-packages/pyspark-patch/unzip.py --file /usr/local/lib/python3.7/dist-packages/pyspark-patch/pyspark_sql_session.zip --dest /usr/local/lib/python3.7/dist-packages/pyspark-patch --password pyspArkp@tch
 sudo cp -v /usr/local/lib/python3.7/dist-packages/pyspark-patch/pyspark_sql_types.py /usr/hdp/current/spark3-client/python/pyspark/sql/types.py || true
 sudo cp -v /usr/local/lib/python3.7/dist-packages/pyspark-patch/pyspark_sql_dataframe.py /usr/hdp/current/spark3-client/python/pyspark/sql/dataframe.py || true
 sudo cp -v /usr/local/lib/python3.7/dist-packages/pyspark-patch/pyspark_sql_session.py /usr/hdp/current/spark3-client/python/pyspark/sql/session.py || true
 sudo cp -v /usr/local/lib/python3.7/dist-packages/pyspark-patch/pyspark_sql_context.py /usr/hdp/current/spark3-client/python/pyspark/sql/context.py || true
    
 # It may also be necessary to update /usr/hdp/current/spark3-client/conf/spark-defaults.conf with: spark.yarn.appMasterEnv.PYSPARK3_PYTHON /usr/bin/python3
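
The last comment in the script can be sketched as a small idempotent helper, assuming `spark-defaults.conf` uses whitespace-separated `key value` lines. The function name `set_spark_conf` is hypothetical; the property name and value are copied from the comment. Ambari manages this file on HDInsight and may overwrite manual edits, so the example below writes to a scratch file rather than the live cluster config.

```shell
#!/bin/sh
# Hypothetical helper: set "key value" in a Spark properties file,
# replacing any existing line for that key so reruns do not add duplicates.
set_spark_conf() {
  conf_file="$1"; key="$2"; value="$3"
  touch "$conf_file"
  # Keep every line whose first field is not the key, then append the new setting.
  awk -v k="$key" '$1 != k' "$conf_file" > "${conf_file}.tmp"
  printf '%s %s\n' "$key" "$value" >> "${conf_file}.tmp"
  mv "${conf_file}.tmp" "$conf_file"
}

# Example against a scratch file; calling it twice still leaves a single entry.
demo_conf="$(mktemp)"
set_spark_conf "$demo_conf" spark.yarn.appMasterEnv.PYSPARK3_PYTHON /usr/bin/python3
set_spark_conf "$demo_conf" spark.yarn.appMasterEnv.PYSPARK3_PYTHON /usr/bin/python3
cat "$demo_conf"
```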


SarthakAgrawal-2215 answered

Is there any update on whether HDInsight supports newer Python versions? It's been more than a year and the documentation still says Python 3.5.

