Sizing Elasticsearch on Azure IaaS

Are you an architect? Does your work involve Azure IaaS and PaaS? Then sooner or later a customer will ask you how to size Elasticsearch on Azure IaaS, if one has not asked already.

Now, sizing is a tricky business. It primarily involves two parts: finding an optimum server/VM, and then finding out how many of them you need. Let’s take these one by one.

In the search world, there is no silver bullet for sizing; most of the time it is done through rigorous testing and observation. For example, for Microsoft’s own SharePoint Enterprise Search, the product team suggested sizing benchmarks based on the number of items to be crawled and indexed (10 million, 40 million and so on). These were based on a good number of tests conducted by the product team and other groups.

In the case of Elasticsearch, some information and observations are provided on their website, which can help us arrive at a sizing estimate.

Let’s look at the following article: Based on this we can derive the following:

  1. Memory: The article advises 64 GB. In the Azure world, however, the nearest options are 56 GB and 112 GB. 112 GB would be overkill, and as per the above article more than 64 GB is not advisable. So we stick to 56 GB.
  2. CPU: There is no definitive guidance. One thing to note here: the article says that, between faster CPUs and more cores, we should choose more cores, so a 16-core machine looks like the better bet. But as we do not have specific information about a “sweet spot” as we do with memory, let’s start with 8 cores.
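The selection rule in the two points above can be written down as a small sketch. It assumes the 64 GB memory ceiling and the 8-core starting point from the list; the `VmSize` type, the `pick_vm` helper, and the pairing of 16 cores with 112 GB are illustrative assumptions, not an Azure or Elasticsearch API.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class VmSize:
    """A candidate VM shape (hypothetical type, for illustration only)."""
    label: str
    cores: int
    memory_gb: int


def pick_vm(candidates: Iterable[VmSize],
            memory_cap_gb: int = 64,
            min_cores: int = 8) -> Optional[VmSize]:
    """Return the candidate with the most memory that stays at or under the cap.

    Candidates with fewer than min_cores are ignored. On a memory tie, the
    smaller core count wins, matching the "start with 8 cores" choice.
    """
    eligible = [vm for vm in candidates
                if vm.memory_gb <= memory_cap_gb and vm.cores >= min_cores]
    if not eligible:
        return None
    return max(eligible, key=lambda vm: (vm.memory_gb, -vm.cores))


# Assumed candidate shapes; check the current Azure size list before relying on them.
CANDIDATES = [
    VmSize("8 cores / 56 GB", cores=8, memory_gb=56),
    VmSize("16 cores / 112 GB", cores=16, memory_gb=112),
]

choice = pick_vm(CANDIDATES)
print(choice.label if choice else "no size fits")  # prints "8 cores / 56 GB"
```

Raising `min_cores` or changing the cap is how you would re-run the same reasoning once you have test results that point to a different sweet spot.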

Now, for disk, let’s see this: There is no definitive guidance on IOPS, but high-speed SSDs are clearly advised for storing the index. Moreover, local SSDs are preferred over attached storage such as Premium storage. For data and logs we can use regular Azure storage; Premium storage is not essential. At least, it is not indicated anywhere that I noticed.

This brings us to Azure VMs like DS13 or D13 v2, which have 8 cores, 56 GB RAM, and 400 GB of local SSD storage. In my opinion, this can be considered the baseline for my VMs. In other words, if we need to scale higher, we will scale out rather than scale up.

Now, the question is: how many VMs? Here I couldn’t find any definitive guidance; it looks like this is mostly done on a trial-and-error basis. But for a production setup with terabytes of data, I’d recommend a 3-node cluster to start with, then scaling out as required.
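As a starting point for the trial and error, a back-of-the-envelope estimate can tell you roughly when 3 nodes stop being enough. The sketch below is only that: it assumes the index lives on the 400 GB local SSD, one replica per shard, and that disks should not be filled beyond 75%. The replica count, the fill limit, and the `estimate_nodes` helper are my assumptions, not guidance from Elasticsearch. The index size itself has to be measured, since it is usually not the same as the raw data size.

```python
import math


def estimate_nodes(index_gb: float,
                   replicas: int = 1,
                   disk_per_node_gb: float = 400,
                   max_fill_percent: float = 75,
                   min_nodes: int = 3) -> int:
    """Rough node count from measured index size and per-node disk.

    index_gb is the primary index size on disk. The result never drops
    below min_nodes, the suggested starting cluster size.
    """
    if index_gb < 0 or replicas < 0:
        raise ValueError("index_gb and replicas must be non-negative")
    if disk_per_node_gb <= 0 or max_fill_percent <= 0:
        raise ValueError("disk_per_node_gb and max_fill_percent must be positive")
    # Every replica is a full extra copy of the primary data.
    total_gb = index_gb * (1 + replicas)
    # Leave headroom on each disk instead of filling it completely.
    usable_per_node_gb = disk_per_node_gb * max_fill_percent / 100
    return max(min_nodes, math.ceil(total_gb / usable_per_node_gb))


print(estimate_nodes(300))   # small index: the 3-node floor applies
print(estimate_nodes(2000))  # larger index: disk capacity drives the count
```

Treat the output as the first guess to test against, not as a replacement for load testing.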

Regarding the OS, I’d vote for Ubuntu, only because I have seen at least 2 customers successfully running Elasticsearch on Ubuntu on Azure IaaS. Here is a guide which, though a bit old, may still be useful:

As the last link mentioned correctly says, you don’t need a load balancer (Azure or 3rd party).
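One reason a load balancer is not needed is that any node can accept a request, and Elasticsearch client libraries generally let you pass a list of nodes and rotate between them. The sketch below illustrates that idea with the standard library only. The `NodeRotation` class and the node addresses are hypothetical, and this is not how any particular client is implemented.

```python
from itertools import cycle
from typing import Iterable


class NodeRotation:
    """Client-side round-robin over cluster nodes, skipping nodes marked down."""

    def __init__(self, nodes: Iterable[str]) -> None:
        self._nodes = list(nodes)
        if not self._nodes:
            raise ValueError("at least one node is required")
        self._down = set()
        self._ring = cycle(self._nodes)

    def mark_down(self, node: str) -> None:
        """Exclude a node from rotation, e.g. after a failed request."""
        self._down.add(node)

    def mark_up(self, node: str) -> None:
        """Put a node back into rotation."""
        self._down.discard(node)

    def next_node(self) -> str:
        """Return the next available node in round-robin order."""
        # One full pass over the ring is enough to find any node that is up.
        for _ in range(len(self._nodes)):
            node = next(self._ring)
            if node not in self._down:
                return node
        raise RuntimeError("no Elasticsearch node is currently marked as available")


# Hypothetical private addresses of a 3-node cluster.
rotation = NodeRotation([
    "http://10.0.0.4:9200",
    "http://10.0.0.5:9200",
    "http://10.0.0.6:9200",
])
print(rotation.next_node())  # prints "http://10.0.0.4:9200"
```

In practice you would rely on the rotation and retry behavior already built into the client library you use, rather than writing your own.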

Finally, a disclaimer: this article is based on my experience with Elasticsearch and Azure. Your opinions and suggestions are most welcome.