Upgrade HDInsight Python Version

HenryX 21 Reputation points
2020-10-26T12:34:49.347+00:00

I know that HDInsight uses Python 3.5, but is there a way to upgrade it to Python 3.6 or above? The reason is that we have a third-party package that only works on Python 3.6 or above.

https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-python-package-installation

The documentation does not mention how to do that. Is upgrading allowed at all?

Azure HDInsight

Accepted answer
  1. PRADEEPCHEEKATLA-MSFT 76,586 Reputation points Microsoft Employee
    2020-10-28T04:22:29.76+00:00

    Hello @HenryX,

    This is not recommended, because using a Python version other than the cluster's built-in one is an unsupported scenario.

    WARNING: The HDInsight cluster depends on the built-in Python environment (Python 3.5). Directly installing custom packages into the default built-in environments may cause unexpected library version changes and can break the cluster.

    If you still want to install Python 3.6, change “python=3.5” to “python=3.6” in the following command, then follow the remaining steps in the document.

    sudo /usr/bin/anaconda/bin/conda create --prefix /usr/bin/anaconda/envs/py36new python=3.6 anaconda --yes
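
    One way to confirm that the new environment really provides the version the package needs is a small check like the sketch below. The helper names version_ge and python_version_of are hypothetical, the interpreter path assumes the py36new prefix from the command above, and the comparison relies on GNU sort -V.

    ```shell
    #!/bin/bash
    # Hypothetical helper: succeeds when version $1 is greater than or equal to version $2.
    # Assumes GNU sort, whose -V flag orders dotted version strings.
    version_ge() {
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
    }

    # Hypothetical helper: prints the version number reported by the interpreter at $1.
    python_version_of() {
        "$1" --version 2>&1 | awk '{print $2}'
    }

    # Assumed location of the interpreter inside the conda prefix created above.
    env_python=/usr/bin/anaconda/envs/py36new/bin/python
    if [ -x "$env_python" ]; then
        found="$(python_version_of "$env_python")"
        if version_ge "$found" "3.6"; then
            echo "OK: $env_python is Python $found"
        else
            echo "Too old: $env_python is Python $found" >&2
        fi
    fi
    ```

    On a machine without that environment the final block does nothing, so the check is safe to run before and after creating it.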
    

    Hope this helps. Do let us know if you have any further queries.


3 additional answers

  1. Furcy Pin 1 Reputation point
    2021-01-04T09:54:31.667+00:00

    Hi @PRADEEPCHEEKATLA-MSFT,

    Python 3.5 reached end of life on September 5, 2020, with the release of version 3.5.10.
    PyCharm has also marked this version as unsupported.

    Do you think the Python version will be upgraded in the next release of HDInsight? If so, when will that happen?

    Regards


  2. Anonymous
    2021-08-17T08:27:39.4+00:00

    FYI, here is my way to upgrade.

    I keep a custom script on github.com and run it through script actions, with code like the following. It can fail on the first or second run; this is just an example.

    #!/bin/bash
    # !!! This is _POSSIBLE_ code to fix Azure HDInsight nodes for the new Spark 3 and update them to Python 3.7
    set -e -x
    
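    # Re-run the whole script as root; a bare "sudo -s" would only open a nested shell
    # and would not elevate the commands that follow it.
    [ "$(id -u)" -eq 0 ] || exec sudo bash "$0" "$@"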
    
    # install python 3.7
    add-apt-repository -y ppa:deadsnakes/ppa || true
    apt update || true
    apt install -y python3.7
    apt install -y python3.7-dev
    apt install -y libcairo2-dev
    
    # Replace the system python3 with 3.7; -f keeps a rerun from aborting under "set -e"
    rm -f /usr/bin/python3
    rm -f /usr/bin/python3-config
    rm -f /usr/lib/python3/dist-packages/PyYAML*
    rm -rf /usr/lib/python3/dist-packages/yaml
    ln -sf /usr/bin/python3.7 /usr/bin/python3
    ln -sf /usr/bin/python3.7-config /usr/bin/python3-config
    
    # install packages
    python3 -m pip install --upgrade pip setuptools || true
    python3 -m pip install wheel || true
    python3 -m pip install pypandoc==1.5 PyYAML==5.3.1 || true
    python3 -m pip install pandas==1.1.2 spacy==2.3.4 kmodes==0.10.2 numpy==1.19.4 metaphone==0.6 unidecode==1.1.1 StringDist==1.0.9 rapidfuzz==0.13.4 urltools==0.4.0 nltk==3.5 phonenumbers==8.12.14 python-dateutil==2.8.1 dateparser==1.0.0 datefinder==0.7.1 geotext==0.4.0 scikit-learn==0.23.2 || true
    python3 -m pip install pyhocon==0.3.51 pyspark==3.0.1 findspark==1.4.2 wordninja==2.0.0 pyspellchecker==0.5.5 cleanco==2.0.1 gensim==3.8.3 scipy==1.5.4 sympy==1.7 nose==1.3.7 PyYAML==5.3.1 || true
    python3 -m pip install --force-reinstall --upgrade adal==1.2.5 requests==2.25.0 requests_toolbelt==0.9.1 pyOpenSSL==20.0.0 cryptography==3.2.1 || true
    python3 -m pip install six==1.15.0 pyasn1-modules==0.2.8 || true
    python3 -m pip install boto3
    python3 -m pip install pyJWT==1.7.1 --upgrade
    python3 -m pip install pyap pyarrow==0.17.1 pyspark-patch==0.0.6 zipp==1.0.0 pycairo==1.13.4 recordlinkage==0.14  dython==0.6.2 || true
    python3 -m spacy download en_core_web_sm || true
    python3 -m spacy download en_core_web_md || true
    python3 -m nltk.downloader punkt || true
    mv -v /root/nltk_data/ /usr/share || true
    python3 -m nltk.downloader averaged_perceptron_tagger || true
    python3 -m nltk.downloader stopwords || true
    
    # Update the bundled pyspark to the real pyspark==3.0.1
    #mv /usr/hdp/current/spark3-client/python/pyspark /usr/hdp/current/spark3-client/python/pyspark-old
    cp -r  /usr/local/lib/python3.7/dist-packages/pyspark /usr/hdp/current/spark3-client/python
    
    # Patch
    sudo python3 /usr/local/lib/python3.7/dist-packages/pyspark-patch/unzip.py --file /usr/local/lib/python3.7/dist-packages/pyspark-patch/pyspark_sql_session.zip --dest /usr/local/lib/python3.7/dist-packages/pyspark-patch --password pyspArkp@tch
    sudo cp -v /usr/local/lib/python3.7/dist-packages/pyspark-patch/pyspark_sql_types.py /usr/hdp/current/spark3-client/python/pyspark/sql/types.py || true
    sudo cp -v /usr/local/lib/python3.7/dist-packages/pyspark-patch/pyspark_sql_dataframe.py /usr/hdp/current/spark3-client/python/pyspark/sql/dataframe.py || true
    sudo cp -v /usr/local/lib/python3.7/dist-packages/pyspark-patch/pyspark_sql_session.py /usr/hdp/current/spark3-client/python/pyspark/sql/session.py || true
    sudo cp -v /usr/local/lib/python3.7/dist-packages/pyspark-patch/pyspark_sql_context.py /usr/hdp/current/spark3-client/python/pyspark/sql/context.py || true
    
    # It may be required to update /usr/hdp/current/spark3-client/conf/spark-defaults.conf to set spark.yarn.appMasterEnv.PYSPARK3_PYTHON to /usr/bin/python3
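
    The spark-defaults.conf change mentioned in the last comment could be scripted roughly as follows. This is only a sketch: set_spark_conf is a hypothetical helper, it assumes the file uses whitespace-separated "key value" lines, and on a cluster where the configuration is managed centrally a direct edit may be overwritten later.

    ```shell
    #!/bin/bash
    # Hypothetical helper: set KEY to VALUE in a "key value" style config file.
    # Any existing line for KEY is dropped first, so reruns do not add duplicates.
    set_spark_conf() {
        local file="$1"
        local key="$2"
        local value="$3"
        local tmp
        tmp="$(mktemp)"
        if [ -f "$file" ]; then
            # Keep every line whose first field is not the key being set.
            awk -v k="$key" '$1 != k' "$file" > "$tmp"
        fi
        printf '%s %s\n' "$key" "$value" >> "$tmp"
        # Write back through the original path so its owner and mode are kept.
        cat "$tmp" > "$file"
        rm -f "$tmp"
    }

    # Example call (commented out; the path and key are the ones named in the script comment):
    # set_spark_conf /usr/hdp/current/spark3-client/conf/spark-defaults.conf \
    #     spark.yarn.appMasterEnv.PYSPARK3_PYTHON /usr/bin/python3
    ```

    Dropping the old line before appending the new one keeps the function idempotent, which matters for a script that may need to be run more than once.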
    

  3. Sarthak Agrawal 1 Reputation point Microsoft Employee
    2021-12-23T07:25:44.923+00:00

    Is there any update on whether HDInsight supports newer Python versions? It's been more than a year and the documentation still says Python 3.5.
