New Tutorial: Using Apache ZooKeeper on Windows Azure to Manage Solr on Linux Virtual Machines

Here at Microsoft Open Technologies, Inc., we’ve been working with the top open source DevOps, application and data tools to make popular open source packages easy to deploy and manage on Windows Azure. This work lets developers combine powerful open source technologies in new and unusual ways in the cloud to build new applications and solve old problems.

As a showcase of these possibilities, we’ve built a search engine infrastructure using Apache Solr that is managed by an external implementation of Apache ZooKeeper. ZooKeeper keeps the setup scalable and reliable, and ensures consistent results for every search, regardless of which search servers are accessible at any given time. You can find all the details in this tutorial. Once you’ve completed it, you will have multiple Solr instances (called SolrCores) synchronized across more than one server, with synchronization managed by ZooKeeper.

By default, Solr runs an internal, customized version of ZooKeeper to synchronize SolrCloud shards on the same server, but for our ZooKeeper example, we’ll show you multiple SolrCores distributed across servers. That means that multiple cores at multiple IP addresses are made to look like one server.

Here’s what the Solr dashboard will show when you’ve completed this tutorial:

[Image: Solr dashboard]

The tutorial configures one ZooKeeper instance and two Solr VMs as the minimum needed to test our configuration, but you could scale up well beyond that. With ZooKeeper managing Solr, as long as at least one SolrCloud instance that ZooKeeper is keeping in sync remains accessible, you will still be able to index documents and run queries.
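
To make the "at least one instance is accessible" idea concrete, here is a minimal client-side sketch in Python. It is not part of the tutorial: it simply tries each Solr VM in turn and returns the first answer it gets. The host names (solr-vm-1, solr-vm-2), the port 8983, and the collection name collection1 are assumptions for illustration, so substitute the values from your own deployment. The helper names http_get_json and query_any_node are hypothetical as well.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def http_get_json(url, timeout=5):
    """Default fetcher: GET a URL and decode its JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def query_any_node(nodes, collection, query, fetch=http_get_json):
    """Send the query to each node in order; return the first successful reply.

    nodes      -- base URLs of the Solr VMs (assumed values, see below)
    collection -- name of the Solr collection to search (assumed)
    fetch      -- callable taking a URL and returning parsed JSON;
                  replaceable so the logic can be tried without a network
    """
    params = urlencode({"q": query, "wt": "json"})
    errors = []
    for base_url in nodes:
        url = f"{base_url}/solr/{collection}/select?{params}"
        try:
            return fetch(url)
        except OSError as exc:
            # urllib's URLError and socket timeouts are OSError subclasses,
            # so an unreachable node lands here and we move on to the next.
            errors.append(f"{base_url}: {exc}")
    raise ConnectionError("no Solr node reachable: " + "; ".join(errors))


if __name__ == "__main__":
    # Hypothetical addresses for the two Solr VMs from the tutorial.
    solr_nodes = ["http://solr-vm-1:8983", "http://solr-vm-2:8983"]
    reply = query_any_node(solr_nodes, "collection1", "*:*")
    print(reply["response"]["numFound"])
```

Because ZooKeeper keeps the cores in sync, whichever VM answers should return the same results, which is why a simple "first node that responds" loop is enough for this sketch. A production client would more likely ask ZooKeeper which nodes are live rather than hard-coding a list.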

Try it out yourself, and let us know what you think!