Classifying the UK's roofs from aerial imagery using deep learning with CNTK

By Tempest van Schaik, Software Engineer (AI & data) at Microsoft

Dome, Dormer or Dutch Gable? Deep neural network! How can we automatically identify the roof-shape of millions of buildings across the UK? In this blog post, we describe how we worked with Ordnance Survey, Britain's mapping agency, to classify roof types from their geospatial data.

Here in the UK, Ordnance Survey are responsible for sending out the planes, sensors and human surveyors that gather the data behind the country’s maps and digital geospatial services. Some of Ordnance Survey's maps are free and publicly available (e.g. the kind we use for country walks), and some of their specialist maps are made for specific customers.

Microsoft's new team, Commercial Software Engineering, did a week-long hack with Ordnance Survey at their Southampton headquarters. We were there to build a proof-of-concept system to extract a new feature, roof type, from their existing data. As home-owners will know, roof type is an important factor in insuring a building. Certain roof types are more prone to developing leaks than others, so a map of roof types could help insurers calculate risk more efficiently. We tried two approaches side by side: the Custom Vision API, and building a machine learning model with Cognitive Toolkit.


What's so interesting about roofs?

And what does your roof say about you? Potentially quite a lot, so there's been growing interest in getting information about roofs from geospatial data. Roofs can be classified into various types which have implications for insurance, disaster preparedness, urban planning and even poverty alleviation.


Roof types. We had no idea there were so many!


For example, the charity GiveDirectly has used roof types as a way to streamline the identification of poor villages in Kenya and Ghana. From satellite imagery, they identify the proportion of thatch and metal roofed homes, with thatch as a proxy for extreme poverty, where financial aid is needed most. With the increasing use of agronomic machine learning, it may be possible to create a credit score for unbanked farmers who need a loan, using data extracted from aerial imagery such as dwelling type, property size, and proximity to water supply and the produce market.

After the first morning spent around the whiteboard with the OS data scientists Charis Doidge, Izzy Sargent and Jacob Rainbow, we decided that we'd focus on roof classification, while another part of our team focused on roof detection (a related code story on that work is coming soon).


A week-long hack with Ordnance Survey, Britain's mapping agency


Crowd-sourcing the training data

Sometimes, manually labelling training data is unavoidable. A few weeks before our hack, Ordnance Survey cleverly crowd-sourced their labelling. Using Zooniverse, colleagues across Ordnance Survey shared the task of classifying roofs in aerial RGB images, producing a bountiful dataset of about 18,500 labelled roofs. Some were duplicates, but there were 7,600 unique roofs. Some of the roofs were difficult for humans to classify, but out of these, 6,800 were classified with high confidence. Some roof types, like hipped, are very common, while some of the unusual roofs were classified fewer than 100 times.
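One plausible way to collapse repeated crowd-sourced labels into a single label per roof is a majority vote with an agreement threshold. The sketch below is an illustration only, not Ordnance Survey's actual method: the roof IDs, the function name `aggregate_labels` and the `min_agreement` threshold of 0.8 are all assumptions.

```python
from collections import Counter, defaultdict

def aggregate_labels(votes, min_agreement=0.8):
    """Collapse repeated (roof_id, label) votes into one label per roof.

    Returns {roof_id: (label, agreement)} for roofs whose most common
    label received at least `min_agreement` of that roof's votes.
    """
    by_roof = defaultdict(list)
    for roof_id, label in votes:
        by_roof[roof_id].append(label)

    confident = {}
    for roof_id, labels in by_roof.items():
        label, count = Counter(labels).most_common(1)[0]
        agreement = count / len(labels)
        if agreement >= min_agreement:
            confident[roof_id] = (label, agreement)
    return confident

# Hypothetical votes: roof "a" is unanimous, roof "b" is contested.
votes = [
    ("a", "hipped"), ("a", "hipped"), ("a", "hipped"),
    ("b", "flat"), ("b", "gabled"),
]
result = aggregate_labels(votes)
print(result)
```

Roofs that fall below the threshold are simply left out, which mirrors the idea of keeping only the roofs that were classified with high confidence.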


Ordnance Survey set up an internal Zooniverse project to label roofs. Thanks to the commitment and patience of their team, they labelled about 18,500 roofs this way.


Overview of the roof classification system

Ultimately, the Ordnance Survey data scientists needed a fully customisable model for their research and a pipeline for doing high-scale machine learning.

Our colleagues Luke Vinton, Hassaan Ahmed and Andrew Fryer, together with architects from Ordnance Survey, built a high-scale system for processing data from across the UK. It takes geospatial data stored on Cosmos DB and cuts it into patches, each containing one building. It then classifies each of these patches with a machine learning API.

Microsoft’s Custom Vision API is easy to get up and running so we tried this approach first for classifying roof types. This method gave us insights into our data which saved us time when we built our own Cognitive Toolkit model. The classification system is designed so that it can either use the Custom Vision API or the machine learning model published as an API.

The roof classifications are then associated with each building. At the hack, we also had a representative from Esri who produce ArcGIS 3D geospatial software. Esri used the roof labels to create amazing new map visualisations of the UK's roofs, which brought to life how the new data could be used for products and services.


Roof classification using Custom Vision

Microsoft's Custom Vision service is an in-preview, UI-based image classification service.

To classify roofs, we set up a model using the Custom Vision UI, and trained it on a set of RGB rooftop images manually labelled as either “Flat”, “Gabled”, or “Hipped”.


Loading manually labelled training data into the Custom Vision UI.


Single image classification with the Custom Vision UI

Using this model to classify similar images from an unseen test set yielded promising results, with high precision and recall (based on k-fold cross-validation), and every image we tested was classified correctly. However, after manually testing 10 individual images via the UI, we ran out of time and motivation to continue. We also noticed that the analytics offered by Custom Vision were quite limited, and the Ordnance Survey research team asked for access to the raw test results for manual, in-depth analysis.


The analytics provided in the “performance” tab of the Custom Vision UI. We classified flat or not-flat roofs using RGB imagery.


Bulk classification

We wrote a quick .NET console application to classify images in bulk. It interfaces with the Custom Vision API using its C# libraries and produces a .csv file containing raw classification results and confidence scores for an arbitrarily large set of test images. Code for this is available on GitHub. Microsoft is currently working on an open source React web app so that anyone can test Custom Vision models against large numbers of images, and get access to the raw results in a .csv (you can see the project code in development on GitHub). An initial publicly usable version will be available very soon (check the README to see whether it’s available today!).
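Once the raw results are in a .csv, analysis beyond the built-in metrics becomes straightforward. As a minimal sketch, the Python below computes per-class precision and recall from such a file. The column names (`filename`, `true_label`, `predicted_label`, `confidence`) and the sample rows are assumptions made for illustration; the console application's real output format may differ.

```python
import csv
import io
from collections import Counter

def precision_recall(rows):
    """Return {label: (precision, recall)} from rows with true and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for row in rows:
        truth, pred = row["true_label"], row["predicted_label"]
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this class, but it was wrong
            fn[truth] += 1  # missed an image of the true class
    scores = {}
    for label in sorted(set(tp) | set(fp) | set(fn)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        scores[label] = (precision, recall)
    return scores

# Hypothetical results file, inlined so the example is self-contained.
raw_csv = """filename,true_label,predicted_label,confidence
roof1.png,Flat,Flat,0.97
roof2.png,Flat,Gabled,0.55
roof3.png,Gabled,Gabled,0.91
roof4.png,Hipped,Hipped,0.88
"""
rows = list(csv.DictReader(io.StringIO(raw_csv)))
scores = precision_recall(rows)
for label, (p, r) in scores.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f}")
```

The same rows could equally feed a confusion matrix or a confidence histogram, which is the kind of in-depth analysis the UI does not offer.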


Current limitations

The Custom Vision service is a great candidate technology for use in Ordnance Survey’s rooftop classification system, and really for any typical image classification problem, with the added benefit of being extremely quick and easy to set up.

However, most of our work with this tool during this engagement went into working around its shortcomings, and for a simple reason: it’s as new a service as it is powerful. It has some key restrictions to consider at this time, which presented challenges to overcome for practical use:

  • Restriction of API calls (classification requests) to 1000 per day
  • Restriction of testing to one image at a time (through the GUI)
  • Analytics restricted to measures of precision and recall

Play with it, and look out for it, but don’t expect to operationalise your Custom Vision model today. It can’t yet handle the workloads needed for commercial use (more than 1000 classifications per day), but that capability is surely on its way.
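To put the request limit in perspective, here is a small back-of-the-envelope calculation. The 1000-per-day figure and the 6,800 roofs come from this post; the one-million-building workload is a hypothetical number chosen only for scale.

```python
import math

DAILY_LIMIT = 1000  # classification requests per day, as listed above

def days_needed(num_images, daily_limit=DAILY_LIMIT):
    """Whole days required to classify `num_images` under the daily limit."""
    return math.ceil(num_images / daily_limit)

print(days_needed(6_800))      # the high-confidence labelled roofs → 7
print(days_needed(1_000_000))  # a hypothetical one million buildings → 1000
```

Even the labelled dataset alone would take a week of quota, and a national-scale run would take years.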


Roof classification with Cognitive Toolkit and Machine Learning Workbench

The Custom Vision API was able to quickly classify roofs with >80% accuracy, which was a great start. But the Ordnance Survey researchers needed a machine learning model that they could control fully and experiment with. That's why we decided to try out Cognitive Toolkit, the open source deep learning engine which powers Microsoft's AI products. Our aim was to get a classification model up and running quickly, that Ordnance Survey could tweak and refine over time.

We decided to use Cognitive Toolkit within Azure Machine Learning Workbench, which is another new tool (in preview). Workbench is a data science environment that comes free in Azure and supports both Cognitive Toolkit and TensorFlow.

Despite using such new tools, we were lucky enough to find a very relevant code example that we could adapt. The code, written by Patrick Buehler et al., uses Workbench and Cognitive Toolkit to classify images of t-shirts as spotty, stripy or leopard print. This was easy to adapt to our multi-class image classification problem. This particular image example can easily be adapted for new domains, like classifying retina scans as healthy or pathological.


Spoilt for choice: which geospatial data to use?

Our labelled training data was individual roofs in aerial RGB images. For the same roofs we had additional data types: multispectral aerial images, polygon outlines of buildings from OS MasterMap, and elevation/altitude maps called Digital Surface Models (DSM).


RGB aerial image. This is the kind of data that the human classifiers were given to label.


Digital Surface Model (DSM) is based on elevation data. White represents the high points while black represents the low areas.


However, we decided that slope (which we derived from elevation) brought out the distinguishing features of different roof types most clearly. In a slope image you can clearly see the characteristic apex lines of roofs, so we used this data:


For classifying roofs we derived slope from DSM, which highlights the intersection of planes of the roofs.
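For readers unfamiliar with slope rasters, the following dependency-free sketch shows one common way to derive slope from an elevation grid: take the elevation gradient in x and y with finite differences, then convert its magnitude to an angle. This is an assumption about the general technique, not a description of the tooling Ordnance Survey used; the function name, the `cell_size` parameter and the toy grid are hypothetical.

```python
import math

def slope_degrees(dsm, cell_size=1.0):
    """Slope in degrees for each cell of a 2-D elevation grid (list of rows).

    Uses central differences in the interior and one-sided differences at
    the edges. The grid must have at least two rows and two columns.
    """
    rows, cols = len(dsm), len(dsm[0])

    def diff(values, i):
        # Finite difference of `values` around index i, clamped at the edges.
        lo, hi = max(i - 1, 0), min(i + 1, len(values) - 1)
        return (values[hi] - values[lo]) / ((hi - lo) * cell_size)

    slope = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            dz_dx = diff(dsm[r], c)
            dz_dy = diff([dsm[k][c] for k in range(rows)], r)
            out_row.append(math.degrees(math.atan(math.hypot(dz_dx, dz_dy))))
        slope.append(out_row)
    return slope

# Toy gabled roof: elevation rises to a ridge along the middle column.
dsm = [[0, 1, 2, 1, 0] for _ in range(3)]
slope = slope_degrees(dsm)
print([round(s) for s in slope[1]])
```

In this toy example the ridge shows up as a line of near-zero slope flanked by steep cells, which is the kind of apex line that makes roof types easy to tell apart in a slope image.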


We didn't get a chance to use the near-infrared data, but it would be useful for identifying overhanging trees which obscure roofs. Ordnance Survey also had fun assembling a "Frankenband" image, which is a multi-dimensional image containing several spectral bands as well as RGB and elevation. This would be interesting to use for classification too.


Data pre-processing

We pre-processed the slope data to make classification as easy as possible. We used building outlines to extract buildings from the background, and then replaced the background with black (the extraction was done with geospatial software called FME).


Flat, gable and hip roofs respectively, as slope images, with backgrounds removed, ready for training and classification.
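The background removal itself was done in FME, but the idea can be illustrated in a few lines of plain Python. The sketch below is an assumption-laden stand-in: it treats the building outline as a polygon in pixel coordinates, keeps pixels whose centre falls inside it, and sets everything else to black (0). The function names and the toy 4x4 patch are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon [(x0, y0), ...]?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def mask_background(image, polygon, background=0):
    """Keep pixels whose centre lies inside the polygon; set the rest to `background`."""
    return [
        [
            value if point_in_polygon(c + 0.5, r + 0.5, polygon) else background
            for c, value in enumerate(row)
        ]
        for r, row in enumerate(image)
    ]

# Hypothetical 4x4 slope patch and a building outline covering its centre.
patch = [[9, 9, 9, 9] for _ in range(4)]
outline = [(1, 1), (3, 1), (3, 3), (1, 3)]
masked = mask_background(patch, outline)
for row in masked:
    print(row)
```

For real imagery a raster library would do this far faster; the point here is only to show what "replace the background with black" means at the pixel level.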


Machine learning model

At the core of the machine learning model, we use a transfer learning approach to "featurise" our roof images. This means we start with an original model that was trained on a very large number of generic images, and we use a small set of images from our own dataset to adapt it to our problem. We essentially build upon the features that were learned during the training of the original model. For a convolutional DNN (we used ResNet 18, which was trained on the ImageNet corpus), this means that we cut off the final dense layer that is responsible for predicting the class labels of the original model. The output of the truncated network for each roof image is then a feature vector that can be used for classification. We used a simple SVM trained on these feature vectors to do the classification; the obvious next step would be to replace the removed layer with a new dense layer and use the neural network itself to do the classification, instead of the SVM.
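The featurise-then-classify idea can be sketched without any deep learning library. In the illustration below, three-number lists stand in for the feature vectors that the pre-trained network would produce for each roof image, and a tiny from-scratch linear SVM (batch subgradient descent on the hinge loss, one classifier per class) stands in for the SVM. None of this is the project's actual code: the function names, hyperparameters and feature values are all assumptions.

```python
def train_binary_svm(features, targets, lam=0.01, lr=0.1, epochs=200):
    """Linear SVM by batch subgradient descent on the regularised hinge loss.

    `targets` holds +1 or -1 for each feature vector.
    """
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for x, y in zip(features, targets):
            score = sum(wj * xj for wj, xj in zip(w, x)) + b
            if y * score < 1:  # inside the margin: hinge loss is active
                grad_w = [g - y * xj / n for g, xj in zip(grad_w, x)]
                grad_b -= y / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def train_one_vs_rest(features, labels):
    """One binary SVM per class, each separating that class from the others."""
    return {
        cls: train_binary_svm(
            features, [1 if label == cls else -1 for label in labels]
        )
        for cls in sorted(set(labels))
    }

def predict(model, x):
    """Return the class whose SVM gives the highest score for feature vector x."""
    def score(cls):
        w, b = model[cls]
        return sum(wj * xj for wj, xj in zip(w, x)) + b
    return max(model, key=score)

# Stand-in feature vectors. In a real pipeline each would be the network's
# output for one roof image, not three hand-written numbers.
features = [
    [1.0, 0.1, 0.0], [0.9, 0.0, 0.1],  # flat
    [0.0, 1.0, 0.1], [0.1, 0.9, 0.0],  # gabled
    [0.1, 0.0, 1.0], [0.0, 0.1, 0.9],  # hipped
]
labels = ["flat", "flat", "gabled", "gabled", "hipped", "hipped"]

model = train_one_vs_rest(features, labels)
print(predict(model, [0.95, 0.05, 0.0]))
```

The design point is that the expensive part (the deep network) is reused as-is, while the part trained on roof data is small and cheap, which is why a few hundred images per class can be enough.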

The training data is stored in Azure Blob storage. We used an equal number of images from each of the 3 classes (flat, hipped and gabled), starting with only a few hundred images from each class. Our initial experimentation with the Custom Vision API had shown us that these were the easiest classes to spot.
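Because some roof types are far more common than others, getting an equal number of images per class means sampling down the larger classes. A minimal sketch of a balanced train/test split follows; the file names, class sizes, `per_class` value and `test_fraction` are hypothetical and are not taken from the project.

```python
import random

def balanced_split(paths_by_class, per_class, test_fraction=0.25, seed=0):
    """Sample `per_class` items from every class, then split each into train/test."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label, paths in sorted(paths_by_class.items()):
        if len(paths) < per_class:
            raise ValueError(f"class {label!r} has only {len(paths)} images")
        chosen = rng.sample(paths, per_class)
        n_test = int(per_class * test_fraction)
        test += [(p, label) for p in chosen[:n_test]]
        train += [(p, label) for p in chosen[n_test:]]
    return train, test

# Hypothetical, deliberately imbalanced file lists.
paths_by_class = {
    "flat": [f"flat_{i}.png" for i in range(40)],
    "gabled": [f"gabled_{i}.png" for i in range(120)],
    "hipped": [f"hipped_{i}.png" for i in range(300)],
}
train, test = balanced_split(paths_by_class, per_class=40)
print(len(train), len(test))
```

Splitting within each class, rather than across the pooled data, keeps the test set balanced too, so a single accuracy figure is easier to interpret.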

With minimal tweaking of hyperparameters we were able to achieve >85% accuracy on our test set of roofs. At the end of the week, we left confident that the data scientists at Ordnance Survey would be able to bring this accuracy up quickly by refining model parameters. After experimenting with a transfer learning approach, they may even have enough data to train a neural network from scratch.


How to run the project

The code is on GitHub and borrows from the excellent t-shirt classifying tutorial.

We used Azure Machine Learning Workbench as a convenient environment to run our Python scripts in. Workbench comes free if you're using Azure for data storage/compute, and it has useful features like being able to see a history of successful and failed script executions, to track model accuracy as you fiddle with parameters, and to run the code on a remote GPU (see the Workbench install instructions for more info).

Probably the most challenging part of the hack, and where we spent most of our time, was helping Ordnance Survey set up their environment for high-scale machine learning. We learnt that this is a major difference between theoretical machine learning and applied machine learning in a real government department or enterprise, with real data and real security infrastructure. Thankfully we had the IT department on hand to help us remove barriers.


Install dependencies

Before running the Python scripts in this project, you need to install the following dependencies:

 pip install
pip install opencv_python-3.3.1-cp35-cp35m-win_amd64.whl

...after downloading the OpenCV wheel (the exact filename and version can change). Note that if you save this download in your Workbench project folder, Workbench may complain that the project is too big. If that happens, just move this file out of the project folder, which is only for code. There is a special folder called Outputs for big files like images.

conda install pillow
pip install -U numpy
pip install bqplot
jupyter nbextension enable --py --sys-prefix bqplot


Computing with a GPU

This project is configured to run on a GPU. You can configure it for CPU but it may just kill your machine. The easiest way to access a GPU is to create a Deep Learning Virtual Machine which has a GPU. Within the Azure Portal, click on Virtual Machines>Add and then search for Deep Learning Virtual Machine. Once deployed, start it up and then connect to it using RDP.


Run the Python scripts

As you run scripts 0-5 in series, take a look at the files that they generate, like the downloaded ResNet 18 and the pickled models. A detailed step-by-step description of this approach can be found in the original t-shirt tutorial.

The scripts are as follows:

* # pull down the training data from Blob storage
  # (you will need to add your Blob storage credentials, or bypass this step and use local data)
* # split the data into training and test sets
* # load the pre-trained ResNet model, which can be refined
* # featurise the images
* # train the SVM to classify
* # evaluate the model and look at classification accuracy
* # the fun script with all the hyperparameters you can play with



After only 4 days of hacking with Ordnance Survey and a rich geospatial dataset of the UK we built the following proof-of-concept system, whose finer details can be improved over time:

  • a high-scale storage and compute system using Cosmos DB
  • a roof detection system, picking out roofs from aerial RGB images
  • a roof classification system using the Custom Vision API (>80% accuracy)
  • a customisable roof classification system using Python and Cognitive Toolkit (>85% accuracy and growing)



Feedback from Ordnance Survey after the hack was that they were pleased with how quickly we got the proof-of-concept up and running, which we could use to calculate the computation cost of a full-scale project. They also learnt a lot from pair-programming with us rather than us coming in and building something for them. We really appreciated this feedback because "coding with" is exactly how we like to work.

We look forward to seeing how Ordnance Survey productionises the proof-of-concept that we built, and integrates cutting-edge machine learning into more of their research, products, and services. Special thanks to everyone at Ordnance Survey for hosting us, for having so much energy and enthusiasm, and patiently helping us wrap our heads around geospatial data.

What we learnt from this hack was just how powerful and useful transfer learning is, so we'd encourage other developers to give it a try! In our case, we used a neural network trained on general internet images to successfully classify quite weird-looking black-and-white aerial images of roofs. And it works with a tiny amount of data: Custom Vision needs just 30 or so images, and we were getting good results with just a few hundred images. Please let us know if you use the method described in our code story, and get in touch if you have ideas for an interesting AI/data collaboration. We welcome GitHub pull requests, comments below, and Tweets @Dr_Tempest and @MasonCusack.