Data Science in a Box using IPython: Installing IPython notebook (2/4)

In the previous blog, we demonstrated in detail how to create a Windows Azure Linux VM. In this post, we continue with the installation of the IPython notebook and related packages.

Python 2.7 or 3.3

One of the discussions at the Python in Finance conference was which version of Python you should use. My personal opinion is that unless you have a special need, you should stick with Python 2.7. Version 2.7 comes as the default on most of the latest Linux distros. Until 3.3 becomes the default Python interpreter on your OS, it is better to use 2.7.


The Basics of package management for Python

There are several ways you can get Python packages installed. The easiest is probably the OS default installer, but the packages it offers for your Linux release are sometimes not the latest versions. For Ubuntu, apt-get is the installer for the OS. apt-get will install your packages in /usr/lib/python/dist-packages.

Another option is to use easy_install. easy_install comes from the Python setuptools project, not from the Ubuntu OS, so we first need to install the python-setuptools package with apt-get before we can use it. If you use easy_install, all your packages will end up in /usr/local/lib/python/site-packages instead.

Type: sudo apt-get install python-setuptools
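If you are unsure which of these directories your interpreter actually searches, a small script can list them. This is a minimal sketch, not part of the official setup; the function name package_dirs is our own, and it simply filters Python's module search path for the two directory names mentioned above. It should run on both Python 2.7 and 3.

```python
from __future__ import print_function
import sys


def package_dirs(paths=None):
    """Return the search-path entries that hold installed packages."""
    if paths is None:
        paths = sys.path
    # apt-get uses "dist-packages"; easy_install and pip use "site-packages"
    # or the "dist-packages" directory under /usr/local, depending on the OS.
    return [p for p in paths
            if p.endswith("dist-packages") or p.endswith("site-packages")]


if __name__ == "__main__":
    for path in package_dirs():
        print(path)
```

The exact directories printed depend on your Python version and distribution, so treat the output as a description of your own machine rather than a reference.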


pip is another tool for installing and managing Python packages, and it is recommended over easy_install. For our purposes, we will simply use whichever of these tools installs our packages easily and correctly.

Type: sudo apt-get install python-pip   This might take a few minutes, as pip has many dependency packages that it must install.



Installing IPython, Tornado web server, Matplotlib and other packages



To install, type:  sudo apt-get install python-matplotlib

IPython notebook is browser based and uses the Tornado webserver. The Python-based Tornado webserver supports web sockets for interactive and efficient communication between the webserver and the browser.

To install, type:  sudo apt-get install python-tornado

Upon completion, we will install IPython itself. The IPython team recommends installing through easy_install to get the latest package from their website.

sudo easy_install

This should install the 1.0 dev version of IPython.



We also need to install a package called PyZMQ. ZeroMQ is a very fast networking package that IPython uses for its clustered configuration. IPython is capable of interactively controlling a cluster of machines and running massively parallel Big Compute and Big Data applications.

Type:  sudo apt-get install python-zmq

Finally, Jinja2 is a fast, modern and designer-friendly templating language for Python that is now required for IPython notebook.

Type:  sudo apt-get install python-jinja2
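Before moving on to configuration, it can help to confirm that every package installed above can actually be imported. The script below is a minimal sketch under our own assumptions: the helper name check_modules is hypothetical, and the module names are the import names we expect these packages to use (zmq for PyZMQ, jinja2 for Jinja2). Some packages do not expose a __version__ attribute, in which case the script reports "unknown" rather than guessing.

```python
from __future__ import print_function
import importlib


def check_modules(names):
    """Map each module name to its version string, "unknown", or None if missing."""
    results = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The package is not installed for this interpreter.
            results[name] = None
        else:
            results[name] = getattr(module, "__version__", "unknown")
    return results


if __name__ == "__main__":
    required = ["IPython", "matplotlib", "tornado", "zmq", "jinja2"]
    for name, version in sorted(check_modules(required).items()):
        print("%-12s %s" % (name, version if version else "MISSING"))
```

Any line that shows MISSING points to a package that should be reinstalled before you continue.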

Configuring IPython notebook

Type: ipython profile create nbserver  to create a profile.  The command generates a default configuration in your home directory under .ipython/profile_nbserver/    Note that any directory whose name starts with a “.” is a hidden directory in Linux. You must type ls -al to see it.

The .ipython directory is shown below in blue.


Once we’ve created a profile, the next step is to create an SSL certificate and generate a password to protect the notebook webpage.

Type: cd ~/.ipython/profile_nbserver  to switch into the profile we just created.

Then, type: openssl req -x509 -nodes -days 365 -newkey rsa:1024 -keyout mycert.pem -out mycert.pem   to create a certificate. Below is a sample session we used to create the certificate.



Since this is a self-signed certificate, your browser will give you a security warning when you open the notebook. For long-term production use, you will want to use a properly signed certificate associated with your organization. Since certificate management is beyond the scope of this demo, we will stick to a self-signed certificate for now.
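Because the openssl command above writes the key and the certificate to the same file, mycert.pem should contain both sections. The following is a rough sketch for checking that; the helper name pem_sections is our own, and it only looks for the standard PEM header lines rather than validating the certificate itself.

```python
from __future__ import print_function
import os


def pem_sections(path):
    """Report which PEM sections (certificate, private key) a file contains."""
    with open(path) as handle:
        text = handle.read()
    return {
        "certificate": "-----BEGIN CERTIFICATE-----" in text,
        # Depending on the OpenSSL version the header reads either
        # "BEGIN RSA PRIVATE KEY" or "BEGIN PRIVATE KEY"; match both.
        "private_key": "PRIVATE KEY-----" in text,
    }


if __name__ == "__main__":
    pem_path = os.path.expanduser("~/.ipython/profile_nbserver/mycert.pem")
    if os.path.exists(pem_path):
        print(pem_sections(pem_path))
    else:
        print("No certificate found at", pem_path)
```

If either entry is False, rerun the openssl command from inside the profile directory.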

The next step is to create a password to protect your notebook.

Type: python -c "import IPython;print IPython.lib.passwd()" # password generation
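The command prints a string with three parts separated by colons. Our understanding, which you should treat as an assumption rather than a specification, is that the parts are the hash algorithm, a random salt, and the hex digest of the passphrase followed by the salt. The sketch below illustrates that format with the standard hashlib module; make_hash and check_passphrase are hypothetical helper names for illustration and are not part of IPython.

```python
from __future__ import print_function
import hashlib


def make_hash(passphrase, salt, algorithm="sha1"):
    """Build an 'algorithm:salt:digest' string (assumed format, for illustration)."""
    h = hashlib.new(algorithm)
    h.update(passphrase.encode("utf-8") + salt.encode("ascii"))
    return ":".join((algorithm, salt, h.hexdigest()))


def check_passphrase(hashed, passphrase):
    """Return True if the passphrase matches an 'algorithm:salt:digest' string."""
    try:
        algorithm, salt, digest = hashed.split(":", 2)
        h = hashlib.new(algorithm)
    except ValueError:
        # Wrong number of fields, or an algorithm hashlib does not know.
        return False
    h.update(passphrase.encode("utf-8") + salt.encode("ascii"))
    return h.hexdigest() == digest


if __name__ == "__main__":
    # "0123456789ab" is a fixed demo salt; real salts should be random.
    demo = make_hash("not-my-real-password", "0123456789ab")
    print(check_passphrase(demo, "not-my-real-password"))
```

The point of the salt is that only the hashed string, never the plain password, ends up in the configuration file.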


Next, we will edit the profile's configuration file, the file in the profile directory you are in. This file has a number of fields, and by default all of them are commented out. You can open it with any text editor you like, and you should ensure that it has at least the content shown below. You may use either the Unix vi editor or nano, which is easier for beginners.

Make sure you make a copy of the sha1:c70c9b9671ef:43cf678c8dcae580fb87b2d18055abd084d0e2ad  string you got from the python password generator line above.


Type: nano

This will open the editor. Copy the appropriate lines below into it.  Note that # is the comment sign for Python.

 c = get_config()
 # This starts plotting support always with matplotlib
 c.IPKernelApp.pylab = 'inline'

 # You must give the path to the certificate file.

 # If using a Linux VM:
 c.NotebookApp.certfile = u'/home/azureuser/.ipython/profile_nbserver/mycert.pem'

 # Create your own password as indicated above
 c.NotebookApp.password = u'sha1:c70c9b9671ef:43cf678c8dcae580fb87b2d18055abd084d0e2ad'  #use your own
 # Network and browser details. We use a fixed port (8888) so it matches
 # our Windows Azure setup, where we've allowed traffic on that port

 c.NotebookApp.ip = '*'
 c.NotebookApp.port = 8888
 c.NotebookApp.open_browser = False


 Press Ctrl+X to exit nano and press Y to save the file.
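Since the configuration file is ordinary Python, a stray leading space or a missing quote will stop the notebook server from starting. As a hedged sketch, the helper below (syntax_error_in is our own name, not an IPython function) compiles a file without running it and reports the first syntax problem it finds.

```python
from __future__ import print_function


def syntax_error_in(path):
    """Return None if the file parses as Python, else a short error description."""
    with open(path) as handle:
        source = handle.read()
    try:
        # compile() only parses the source; nothing in the file is executed.
        compile(source, path, "exec")
    except SyntaxError as err:
        # IndentationError is a subclass of SyntaxError, so it is caught too.
        return "line %s: %s" % (err.lineno, err.msg)
    return None

# Usage: call syntax_error_in() with the path of the configuration file you
# just saved and print the result; None means no syntax errors were found.
```

Note that this only catches syntax errors. A misspelled option name will still parse cleanly, so compare your settings against the listing above as well.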

Configure the Windows Azure Virtual Machines Firewall

 This was done in Post 1 of this blog series.  Please see the "Create your first Linux Virtual Machine" section of that post.


Run the IPython Notebook

At this point we are ready to start the IPython Notebook. To do this, navigate to the directory you want to store notebooks in and start the IPython Notebook Server:

 Type: ipython notebook --profile=nbserver
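Before switching to the browser, you can confirm from a second SSH session that the server is listening. This is a minimal sketch using only the standard socket module; port_is_open is a hypothetical helper name, and the port 8888 is taken from the configuration listing above.

```python
from __future__ import print_function
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        conn = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        return False
    conn.close()
    return True


if __name__ == "__main__":
    # 127.0.0.1 tests the server from inside the VM itself.
    print(port_is_open("127.0.0.1", 8888))
```

If this prints True on the VM but the page does not load from your own machine, the likely cause is the firewall endpoint rather than IPython.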

You should now be able to access your IPython Notebook at the address https://[Your Chosen Name Here]

In our case it is:



Type in the Password you set when you ran the python -c "import IPython;print IPython.lib.passwd()" command.


Once logged in, you should see an empty directory.  Click on “New NoteBook” to start.


To reward your hard work, we’ll have IPython notebook plot a few donuts for us.  You can copy and paste the code from:  Place your cursor at the end of the last line and press Shift + Enter to run the code.  If all goes well, you should see a set of 4 chocolate donuts almost instantly.



In this second part of the blog series, we showed you the minimum steps to install the IPython notebook inside a Windows Azure VM running Ubuntu Linux 12.10.  In the next blog, we’ll take a look at a few popular packages for machine learning, data analysis, and scientific computing.  If you have questions, please contact me at @wenmingye on Twitter.