Set up the PySpark interactive environment for Visual Studio Code
The following steps show you how to set up the PySpark interactive environment in VS Code.
We use the python/pip commands to build the virtual environment in your home path. If you want to use another version, you need to change the default version of the python/pip commands manually. For more details, see update-alternatives.
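On Debian and Ubuntu systems, switching the default python version with update-alternatives might look like the following sketch. The paths, version, and priority values are illustrative; substitute the interpreters actually installed on your machine.

```shell
# Register an installed interpreter as an alternative for "python"
# (path and priority are examples, adjust to your system).
sudo update-alternatives --install /usr/bin/python python /usr/bin/python3.8 1

# Interactively choose which registered version "python" points to.
sudo update-alternatives --config python
```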
Install Python from https://www.python.org/downloads/.
Install pip from https://pip.pypa.io/en/stable/installing, if it wasn't included with the Python installation.
Validate that Python and pip were installed successfully by using the following commands. (Optional)
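For example, the version checks might look like this (the exact version numbers printed will depend on what you installed):

```shell
# Print the installed Python version.
python --version

# Print the installed pip version.
pip --version
```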
It is recommended to manually install Python instead of using the macOS default version.
Install virtualenv by running the command below.
pip install virtualenv
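Once virtualenv is installed, creating and activating an environment in your home path might look like the following sketch. The environment name pysparkenv is illustrative.

```shell
# Create a virtual environment in your home path
# ("pysparkenv" is an example name).
python -m virtualenv ~/pysparkenv

# Activate it; the shell prompt changes to show the active environment.
source ~/pysparkenv/bin/activate
```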
For Linux only, if you encounter an error message, install the required packages by running the commands below.
sudo apt-get install libkrb5-dev
sudo apt-get install python-dev
Restart VS Code, and then go back to the script editor that's running HDInsight: PySpark Interactive.
- HDInsight for VS Code: Video
Tools and extensions
- Use Azure HDInsight Tool for Visual Studio Code
- Use Azure Toolkit for IntelliJ to create and submit Apache Spark Scala applications
- Use Azure Toolkit for IntelliJ to debug Apache Spark applications remotely through SSH
- Use Azure Toolkit for IntelliJ to debug Apache Spark applications remotely through VPN
- Use HDInsight Tools in Azure Toolkit for Eclipse to create Apache Spark applications
- Use Apache Zeppelin notebooks with an Apache Spark cluster on HDInsight
- Kernels available for Jupyter notebook in an Apache Spark cluster for HDInsight
- Use external packages with Jupyter notebooks
- Install Jupyter on your computer and connect to an HDInsight Spark cluster
- Visualize Apache Hive data with Microsoft Power BI in Azure HDInsight
- Use Apache Zeppelin to run Apache Hive queries in Azure HDInsight