Volume 28 Number 10A
OData - Harness Open Data with CKAN, OData and Windows Azure
By Mark Gayler | Government 2013
Open data is becoming increasingly important for governments and public research. Recently some key open data technologies have become available on the Microsoft Windows Azure platform. Comprehensive Knowledge Archive Network (CKAN) is one of the most popular open data platforms in use by governments, universities and enterprises around the globe. Using Windows Azure, open source platforms such as CKAN are able to take advantage of the scalability, flexibility, load-balancing and interoperability features of the cloud. The Open Data Protocol (OData) is an emerging open data standard that enables data to be easily consumed within applications across different platforms, devices and services.
In this article, I’ll show you how to deploy a CKAN open data Web site using the virtual machine (VM) capabilities of Windows Azure and VM Depot; how you can access open data published on Windows Azure using the Windows Azure Marketplace; and, finally, how that data can be reused in your applications using OData.
Deploy CKAN in the Cloud
CKAN (ckan.org) is an open source, Web-based data management system that makes data accessible by providing tools to streamline publishing, sharing, finding and using data. CKAN is aimed at data publishers (such as national and regional governments, universities, companies and organizations) that want to make their data open and available. The CKAN project is maintained by the Open Knowledge Foundation (OKFN), and its implementations are commonly used as a public platform for various government data catalogs, including data.gov.uk in the United Kingdom and data.gov in the United States. For example, the U.K. Meteorological Office (Met Office) is using the capabilities of CKAN and Windows Azure to host its weather data archives and make the data broadly available via data.gov.uk/metoffice-data-archive.
Developers who wish to deploy a CKAN installation in the cloud can now do so quickly and easily using the CKAN images available on VM Depot (bit.ly/16PLKxt). VM Depot is a community-driven catalog of preconfigured OSes, applications and development stacks that can be quickly deployed on Windows Azure. VM Depot enables you to find your favorite software and deploy it in minutes, or you can join the community, build a VM image and share it with others.
VM Depot has been developed by Microsoft Open Technologies Inc., a subsidiary of Microsoft. The VM images on the VM Depot site are provided and licensed to you by community members. You can learn more about publishing and deploying VM images in the VM Depot help materials.
The CKAN images on VM Depot are the result of a partnership between Microsoft UK, Microsoft Open Tech and OKFN, which has provided CKAN version 2.1 as a pair of Ubuntu VMs running on Windows Azure. The first image provides the database back end, and the second provides a Web front end. There are two methods of installing CKAN on Windows Azure using these images, as I’ll explain in the next section.
Now I’ll walk you through a step-by-step guide to deploying CKAN on Windows Azure. The CKAN project also maintains instructions in the CKAN Wiki on GitHub (bit.ly/16kCPWp).
How to Install CKAN 2.0 on Windows Azure from VM Depot
You can install CKAN on Windows Azure using the images available on VM Depot in two different ways: through the management interface and via the command-line tool. For either method you’ll need a Windows Azure subscription; if you don’t already have one, you can sign up for a free trial subscription at bit.ly/17pzkur. I’ll discuss each method.
Deploying CKAN Using the Windows Azure Command-Line Tool You can find details on how to set CKAN up via the command-line tool in the VM Depot documentation. Once you’ve installed the tools and configured your Windows Azure credentials, you can publish a CKAN VM. Search for CKAN in VM Depot by typing “ckan” into the search box on the VM Depot site. In your browser, as shown in Figure 1, you’ll see the two CKAN images. I’ll first work with the CKAN Web VM item. To the right of the description you’ll see a Deployment Script link. Click the Deployment Script link, click Agree and then select the region to which you want CKAN deployed. A script will appear in the text box. Copy this script and then replace the capitalized text with your specific deployment information. Items in square brackets are optional and—unless you know what you’re doing—can be safely ignored for now. Your final script will look something like this:
azure vm create ckantest -o vmdepot-1398-1-32 -l "West Europe" user pass azure vm endpoint create-multiple ckantest 80:80
Figure 1 The VM Depot Interface Showing a CKAN Virtual Machine Listing
The first line of this script creates a new VM with a DNS name that will be available as a VM at DNS_PREFIX.cloudapp.net, for example: “ckantest.cloudapp.net,” Linux username “user” and password “pass.” The second part opens up port 80 to the outside world. Run both lines of this script.
Now click on the Deployment Script link for the CKAN DB VM item. As you did before, select your region, copy the script and add the same DNS_PREFIX information. Also, it’s important to add a -c flag to the command to connect this VM to the previous one. For example:
azure vm create ckantest -o vmdepot-1397-2-32 -l "West Europe" user pass –c
After running this script, CKAN should be available as a VM at DNS_PREFIX.cloudapp.net (for example, ckantest.cloudapp.net).
Deploying CKAN Using the Management Portal Interface Log in to the Windows Azure Management Portal and click on the Virtual Machines section, then click on the Images tab. Click on Browse VM Depot at the bottom and then find CKAN Web image in the library. Now follow the instructions to install it in the region of your choice. It will take some time to copy this image into your subscription storage account; once you’ve completed this step, you can deploy as many images as you like. While waiting, you should click on Browse VM Depot again but this time select CKAN DB. Make sure you select the same region for both of them when asked. Once both images have copied to your account, you must register them by selecting each and clicking on the register button at the bottom of the virtual machines page on the portal (see Figure2).
Figure 2 Registering the CKAN Virtual Machines
Next, click NEW in the bottom-left corner and select “Virtual Machine from the Gallery.” Find the CKAN DB in the image list and select it. Follow the wizard, calling the instance whatever name you like (making sure you remember what you named the VM, for example, “ckanweb”). Provide the DNS prefix so that the machine’s DNS name will be useful. For example, if you call it “ckantest,” the address will end up being ckantest.cloudapp.net. You can call the user whatever name you like and supply a password or a public Secure Shell (SSH) key. Once the VM has been provisioned, follow the previous steps again for the CKAN Web image. Note that on this occasion, when the wizard asks if you want to create a standalone VM or connect to an existing VM (see Figure3), you should choose to connect to the CKAN DB VM you created previously. Remember, the previous VM must finish provisioning before you can connect another image to it.
Figure 3 Connecting to the Existing CKAN Virtual Machine
Once the wizard is complete and the VMs are provisioned—that is, they have a status of “running”—you need to click on the Web VM to open its details page. Next, click on the Endpoints tab so you can open the necessary port to allow users to access the Web interface. Call the endpoint “web” and put 80 in both the public and private ports fields (see Figure4).
Figure 4 Opening the Web Endpoints
Once this task has finished, your site should be accessible via your chosen domain name (ckantest.cloudapp.net, for example).
CKAN is a useful platform for publishing government data, but you’ll often want to analyze the data contained within it. There are a variety of analysis and business intelligence (BI) tools available for this purpose, such as Microsoft Excel. In order to enable such tools to work with a variety of data sources, Microsoft and industry partners have been collaborating on the development of the OData specification (odata.org), which works across a variety of devices and services. In the following section, I’ll explore how software developers and data scientists can use OData to access open data published in the cloud.
Windows Azure and OData Overview
Data that’s hosted on Windows Azure can also be published using the Windows Azure Marketplace (datamarket.azure.com), which utilizes OData. For example, because the Windows Azure Marketplace provides support for OData, users and developers can easily analyze the aforementioned U.K. national weather data using an OData feed that allows the use of tools such as Power Pivot within Excel. This provides unprecedented access to extensive meteorological data including hourly, daily and five-day forecasts. Based on collaboration between Citrix Systems Inc., IBM Corp., Microsoft, Progress Software Corp., SAP AG, WSO2 Inc. and others, Microsoft Open Technologies recently announced that the OASIS OData Technical Committee initiated a public review of OData 4.0, which is expected to become an OASIS standard this year.
Now I’ll show how you can import open data hosted in Windows Azure Marketplace into Excel 2013 via the Power Pivot add-in and OData.
Debrief: Windows Azure and Open Data
Open data is becoming increasingly important for governments and public research. Using Windows Azure, open data platforms such as Comprehensive Knowledge Archive Network (CKAN) are able to take advantage of the scalability, flexibility, load-balancing and interoperability features of the cloud. The Open Data Protocol (OData) is an emerging open data standard that enables data to be easily consumed within applications across different platforms, devices and services.
Publishing government data is easy and secure using the cloud.
- The cloud can provide open access to data at low cost
- Open data on the cloud platform is easily accessible by citizens, developers and commercial partners
- Open data in the cloud minimizes impact on local infrastructure
- Open data in the cloud encourages developers to reuse data for development of third-party applications and services
Accessing open data in the cloud provides a rich source of information and content for new innovative applications.
- Government open data can be reused in new citizen-centric applications
- Government-published information often includes rich data assets such as transport, geospatial and sensor data
- Use of the cloud and OData enables app developers to easily consume open data in applications running across different devices and platforms, including mobile, tablet and PC
- VM Depot: bit.ly/16PLKxt
- CKAN Wiki on GitHub: bit.ly/16kCPWp
- The Open Data Protocol (OData) Specification: odata.org
- Windows Azure Marketplace: datamarket.azure.com
- Newly Available Government Datasets: bit.ly/15bAPvq
Import Open Data from Windows Azure into Excel Using OData
Open data that’s hosted on Windows Azure can easily be imported into an application or service using OData. In this example, I’ll show how data can be imported into Excel 2013 using OData and the Power Pivot add-in. You can use this add-in to perform powerful data analysis in Excel 2013. The add-in is available in Microsoft Office Professional Plus and Office 365 Professional Plus editions. It’s built in to Excel 2013 but isn’t enabled by default. I’ll explain how you enable Power Pivot before you use it for the first time. In previous versions of Excel, Power Pivot had to be downloaded as a separate component. In Excel 2013, Power Pivot can be enabled as an add-in using the File | Add-Ins menu. For more information, see the Power Pivot help documentation at bit.ly/17NCVRO.
Accessing Open Data in the Windows Azure Marketplace I’ll now go through the steps to get a different government open dataset into Excel using the latest version of Windows Azure Marketplace and Excel 2013. In July, the DataMarket team published a blog post describing some new government datasets available (bit.ly/15bAPvq), including one from the French Postal Office, La Poste. This open dataset, a free subscription, is available in the Windows Azure Marketplace (be aware that some datasets in the marketplace might have specific authentication and login requirements, depending on the nature of the subscription). The Windows Azure Marketplace provides a useful overview page for the data, including your subscription status and more information. When you click on Explore This Dataset, you’ll be given the Service Explorer view of the dataset with the default table selected: PointsDeContactPostaux (see Figure5).
Figure 5 French La Poste Open Data PointsDeContactPostaux Table Displayed in the Windows Azure Marketplace
Importing Open Data from Windows Azure into Excel 2013 This data can be imported into Excel 2013 in several ways. For example, you can use the Data Connection Wizard from the Data menu and then select From Other Sources and then From Windows Azure Marketplace. You can also use the Data Connection Wizard to import open data from the Windows Azure Marketplace using OData. From the Data menu, select From Other Sources and then From OData Data Feed. The URL required for the OData Data Feed can be found in the Windows Azure Marketplace dataset, “URL for current expressed query” (see Figure5). You can also use the Service Explorer to filter data prior to importing it. In this example, select the dropdown by column Caracteristique and you’ll see a dialog box enabling you to filter the column. For this example, I only want Post Office data (rather than retail postal outlets) so underneath “is equal to” enter “Bureau de poste” and then select Filter. Note how the OData query changes with the result. In this example, the URL becomes: https://api.datamarket.azure.com/La\_Poste/Points\_De\_Contact\_La\_Poste/v1/PointsDeContactPostaux?$filter=Caracteristique%20eq%20%27Bureau%20de%20pos.
Enter that URL in the Data Connection Wizard box, and you’ll connect to the data feed in the Windows Azure Marketplace (see Figure6).
Figure 6 Importing Open Data from the Windows Azure Marketplace Using an OData Data Feed in Excel 2013
However, a more direct and navigable way to get Windows Azure Marketplace open data into Excel 2013 is to use the Power Pivot feature. This gives you more data table manipulation options during the import. Once you’ve enabled the Power Pivot add-in in Excel 2013 (as described earlier), you’ll see the Power Pivot option appear on the top menu. Click on Power Pivot and the Power Pivot menu will drop down. Click on Manage and the Power Pivot dialog box will open. Select Get External Data, then From Data Service, then From Windows Azure Marketplace. At this point, the Table Import Wizard dialog will populate with datasets directly from Windows Azure Marketplace and you can search, filter and select a dataset based on your criteria. You can see government open datasets by clicking on the Government category at the left and then searching for “open data.” However, in this case, you’ll search specifically for the French Postal Office data. In the Search the Marketplace box, enter “Poste” and then click the search symbol (see Figure7).
Figure 7 Selecting Open Datasets from the Windows Azure Marketplace Using Power Pivot in Excel 2013
Click on the Liste des Points de Contact du Réseau Postal Français link and you’ll see the Service Explorer view as displayed in the earlier example. You can still use the Service Explorer to filter data prior to importing it. As before, select the dropdown by column Caracteristique and you’ll see a dialog box enabling you to filter the column. In the Filter dialog enter “Bureau de poste” and then select Filter. Click on Select Query and the Table Import Wizard dialog will reappear with a default friendly name for your dataset (change it if you wish). Click Next and the wizard will display the tables you’ve selected. At this point, you have the option to Preview and Filter if you wish to choose only specific columns to import from your table. Given that you no longer need the Caracteristique column (as you’ve already filtered on it), you’ll use Preview and Filter to omit that column during the import. Select Preview and Filter, and when the table columns are displayed, uncheck the Caracteristique column and select OK. Next, click Finish and the filtered PointsDeContactPostaux table will be imported into your Power Pivot workbook, as shown in Figure8 (you’ll see a Success! dialog that you need to close).
Figure 8 French Postal Office Open Data Imported from the Windows Azure Marketplace Using Power Pivot in Excel 2013
CKAN and Windows Azure in Yokohama City
There are several examples of governments around the world publishing their data using Comprehensive Knowledge Archive Network (CKAN) and Windows Azure. Yokohama, one of the largest cities in Japan, recently published its open data catalog using CKAN running on Windows Azure. This is the first implementation in Japan of CKAN and Windows Azure for a local open data catalogue. You can see it at data.yokohamaopendata.jp.
How to Access Open Data from Catalogs Using OData
So far, I’ve shown how to set up an open data catalog on Windows Azure using CKAN. I’ve also looked at how you can import open data from the Windows Azure Marketplace into Excel 2013. Next, I want to explain how you can use the power of OData to incorporate open data into your application code.
To do this, I’ll use an external open data catalog built by the city of Regina in Saskatchewan, Canada: openregina.cloudapp.net. The Regina catalog publishes open data using the DataLab API available on GitHub (bit.ly/14M5tAq). DataLab is an open source library that enables governments and citizens to publish open data using Windows Azure, and it provides features targeted specifically at developers who want to incorporate open data within their applications.
In this example, I’ll show how DataLab generates application code dynamically to query open data using Windows Azure and OData. This code can be generated in a variety of languages, as you’ll see later.
First, go to openregina.cloudapp.net. The first page of the catalog shows the available datasets. You can provide some basic search and filter capabilities here. To start, scroll down and select the PointOfInterest dataset (see Figure 9).
Figure 9 City of Regina Open Data Catalog Showing the OData Expression Builder
You’ll note that in this view, you can see into the dataset (up to 1,000 rows) and you can provide some filtering and querying of the data. Also note the “Full query URL,” which will construct an OData feed URL as you query the data. This OData URL can be copied directly into Excel 2013, for example.
The code dynamically generated can be used directly in your application to query the open data catalog using OData. Government organizations around the world, including the city of Regina, are publishing open data in the expectation that application developers will build a new generation of innovative applications on the Web, mobile devices and so on.
These initiatives add to the many open data projects based on Windows Azure. Together they facilitate openness and transparency in public data. This work provides an extremely solid foundation upon which open data services can be built. The availability of government- and citizen-published open data represents an opportunity for you to produce innovative new applications and services using a rich data source. When coupled with the work of Microsoft Open Technologies on cross-platform, client-side tooling, you get an environment in which new and exciting opportunities are revealed.
Mark Gayler is a senior technical evangelist at Microsoft Open Technologies, specializing in open data initiatives. Reach him at email@example.com.
Thanks to the following Microsoft technical experts for reviewing this article: Ross Gardler and Max Uritsky
Ross Gardler is a Senior Technology Evangelist for Microsoft Open Technologies, Inc., and President of the Apache Software Foundation. Reach him at firstname.lastname@example.org.
Max Uritsky is a Principal Group Program Manager for the Microsoft Windows Azure Marketplace. email@example.com.