Running a High Volume Website on Azure Infrastructure Services
The Tech a CEO Needs to Know
Cloud technologies like Microsoft Azure offer a quick and simple way for software companies to run their business and reach a global customer base. Companies that traditionally delivered their solution via CD or some other distribution medium are looking to the cloud as a way to expand into new markets, or simply to centralise their intellectual property into a multi-tenanted environment or SaaS solution. I personally work with quite a few technology startups, via our BizSpark programme, that are considered "born in the cloud". These companies have no option but to use the cloud: it allows them to start small and grow at their own pace, a key requirement for a startup.
One of the talks I do for startups is the 'Tech a CEO Needs to Know'. It is targeted at CEOs. Having worked myself as a CTO of a startup and angel investor fund, I know that CEOs build great business plans but sometimes don't think about the really important technical aspects of their business. I don't mean a CEO needs to know how to develop or even architect the next Uber, but there are a few simple concepts that a CEO should know about, or at least have their CTO develop a plan for. Too often I see software company CEOs defer these challenges to their CTO or Developer and assume they are being handled. In my opinion it is critical that the business leaders within the organisation know there is a plan around these fundamental concepts:
- Availability - How do I make sure my site and business stays up so I don't lose customers?
- Scalability - How can I scale my infrastructure up or down to meet demand or gain new customers?
- Security - Is my data secure and what do I need to do to keep it that way?
- Disaster Recovery - What do I do if there is some sort of disaster or issue with my data?
In my experience of working with CEOs of startups they sometimes assume that this is all handled by the platform, i.e. Microsoft Azure, or that their 'Developer/CTO' is looking after all of this. In fact, Azure has quite a few platform and infrastructure services that do look after a lot of these things, assuming you use them a certain way or choose to allow Microsoft to do them for you. But the assumption that everything is fine because you are using the cloud is wrong. Your CTO/Developer must understand how to best architect for the cloud before he/she can take full advantage of the speed and scale it offers.
The Tech a CTO Needs to Know
As a CTO, you need to take control of these requirements and develop a plan that reassures your CEO that the technology platform is sound. For example, if you experience a major spike in traffic, which should be a positive thing for your business, will the infrastructure stay up? How many times have you gone to a website you've seen advertised, only to find it down or just slow? You give up! If this is your website, you have lost a customer. Now, you might say that your hosting company offer an SLA, a service level agreement, which dictates that the infrastructure will be kept up for an agreed proportion of the time, e.g. 99.9%. But in the case of a spike in traffic, if this infrastructure is simply not adequate then your website will slow down or just stop handling requests. You will lose customers but your hosting provider will say the infrastructure stayed up, and it probably did; it just wasn't adequate. In this scenario, what you really need is an infrastructure that you can scale up or out easily and then scale back in when it's no longer needed.
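To put an SLA figure into perspective, it helps to translate the percentage into minutes. The snippet below is just a back-of-the-envelope sketch: the function name and the 30-day period are my own assumptions, and real SLAs define their measurement period and exclusions in their own terms.

```python
def allowed_downtime_minutes(sla_percent: float, period_days: int = 30) -> float:
    """Return the downtime, in minutes, that an SLA still permits over a period.

    Assumption: the SLA is measured over a simple fixed period of
    `period_days` days, with no exclusions.
    """
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


if __name__ == "__main__":
    for sla in (99.0, 99.9, 99.95):
        minutes = allowed_downtime_minutes(sla)
        print(f"{sla}% uptime allows about {minutes:.1f} minutes down per 30 days")
```

Note that this only measures whether the infrastructure was up. It says nothing about whether it was adequate for the traffic you received.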
This is where Microsoft Azure comes in, if used correctly. You could just think of Azure as another hosting provider, but actually it's way more. Azure consists of a number of services that can help you build out your site. These include compute services like Virtual Machines, application services like caching and mobile services, and data services to support your relational or big data and analytics requirements. The great thing about Azure is that you can turn these services on and off yourself, or scale them in and out yourself. No need to wait for new infrastructure: simply click a button or slide a slider and get what you need, and of course only pay for what you use. That said, it's important to understand how to deploy your site to Azure. If you do what you have always done with your traditional hosting provider you may be no better off, and I for one have seen this happen many times.
For this blog post I am going to focus on infrastructure services, but there is probably another set of posts I could do for data and application services, although the main points apply across the board. In terms of infrastructure you can build out your web servers using standard Linux or Windows VMs or you can use the gallery to spin up a pre-built environment, for example the Magento solution.
If your site goes down it can be because of many things, a bug in your code or inadequate infrastructure for example. Another cause could be maintenance. The fact is, no matter who manages your infrastructure, servers will go down, and this can be due to planned or unplanned issues. Within Azure, Virtual Machines come with an SLA of 99.95% as long as you deploy two or more VMs within an availability set. This allows us to place the VMs in different update and fault domains within the data centre, meaning that during either planned or unplanned maintenance at least one VM stays available. It's worth noting that you don't have to use availability sets; you can just deploy a VM on its own, outside of any availability set, which is perfectly acceptable for Dev & Test or non-critical workloads. In this scenario Microsoft will even send you an email (when possible) alerting you that your VM may need to be rebooted, but you will not have an SLA.
TIP: Avoid leaving a single-instance virtual machine in an Availability Set by itself. Virtual machines in this configuration do not qualify for the SLA guarantee and will face downtime during Azure planned maintenance events. Furthermore, if you deploy a single VM instance within an availability set, you will receive no advance warning or notification of platform maintenance. In this configuration, your single virtual machine instance can and will be rebooted with no advance warning when platform maintenance occurs.
So this first scenario, a single VM on its own outside any availability set, is not wrong, but understand the risks associated with it.
Another configuration I quite commonly see is where the front-end web VM and the back-end database VM appear in the same availability set. This configuration is wrong and should never be used: as Microsoft have no idea what is running in these VMs, either VM could be rebooted as part of a maintenance activity. In this situation your site is down and Microsoft will have honoured their SLA, i.e. at least one VM will be running within the availability set.
TIP: In the scenario where you use availability sets you will not receive email notifications about VM reboots due to planned or unplanned maintenance. This is because Microsoft assume that if you use availability sets you have at least two VMs in them and it is safe to reboot any of them, but not all at the same time.
The correct scenario is to use an availability set for each tier within your application, one for the web tier and one for the data tier. In this scenario, at least one VM in each tier is guaranteed to be kept up in the case where VMs need to be rebooted for maintenance.
More Reading: For more detail on ensuring availability when using virtual machines check out the below article which talks through the availability options as well as how to then load balance traffic across your VMs. http://azure.microsoft.com/en-in/documentation/articles/virtual-machines-manage-availability/
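The availability set rules discussed above are simple enough to check mechanically. Here is a minimal sketch that models a deployment as plain Python dictionaries and flags the problem configurations described in this post. The field names and VM names are made up for illustration, and nothing here calls Azure itself.

```python
from collections import defaultdict


def check_availability_sets(vms):
    """Return a list of warnings for a planned VM layout.

    `vms` is a list of dicts with the hypothetical keys
    'name', 'tier' and 'availability_set' (None if the VM stands alone).
    """
    problems = []
    sets = defaultdict(list)
    for vm in vms:
        if vm["availability_set"] is None:
            problems.append(f"{vm['name']}: not in an availability set, so no SLA")
        else:
            sets[vm["availability_set"]].append(vm)

    for set_name, members in sets.items():
        tiers = sorted({vm["tier"] for vm in members})
        if len(tiers) > 1:
            problems.append(f"{set_name}: mixes tiers {tiers}")
        if len(members) < 2:
            problems.append(f"{set_name}: only one VM, so no SLA and no warning of reboots")
    return problems


if __name__ == "__main__":
    # The layout I see too often: web and database VMs sharing one set.
    bad_layout = [
        {"name": "web1", "tier": "web", "availability_set": "as-site"},
        {"name": "db1", "tier": "data", "availability_set": "as-site"},
    ]
    for problem in check_availability_sets(bad_layout):
        print(problem)
```

A layout with two web VMs in one set and two database VMs in another returns an empty list.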
One of the really cool features of Microsoft Azure is that you can scale your resources in and out. This means you only ever have the infrastructure you need and no more. When you build your own infrastructure, inconsistent demand means the percentage utilisation of hardware is typically quite low; Azure can drive the efficiency of the infrastructure required to run websites way up. Think about an online eCommerce site that really gets hit around Christmas time but returns to normal demand for the rest of the year. Even during a normal week, the site may be hit more during business hours. If we had to provision the infrastructure for this upfront, we would probably build out to support the peak demand, plus some extra to account for unexpected spikes. This, of course, results in a lot of waste. Azure allows us to spin up this infrastructure and then shut it down with the click of a button, a PowerShell command, or even rules and schedules you define. This allows you to scale to meet demand and, because you don't pay for VM compute time when a VM is stopped (deallocated), gives you much better control over your infrastructure costs.
More Reading: Take a look at this article which shows you how to set up auto-scaling and manual scaling for Virtual Machines. http://azure.microsoft.com/en-us/documentation/articles/cloud-services-how-to-scale/
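To make the waste argument concrete, here is a rough sketch that compares VM-hours for a fixed, peak-sized deployment against one that scales with demand. All the numbers (requests per hour, capacity per VM) are invented for illustration, and the minimum of two VMs is my own assumption to stay in line with the availability set guidance earlier in this post.

```python
import math


def vms_needed(requests_per_hour: int, capacity_per_vm: int, minimum: int = 2) -> int:
    """VMs required for a given load, never dropping below `minimum`."""
    return max(minimum, math.ceil(requests_per_hour / capacity_per_vm))


def compare_vm_hours(hourly_demand, capacity_per_vm):
    """Return (fixed, scaled) VM-hours for a list of hourly request counts."""
    per_hour = [vms_needed(demand, capacity_per_vm) for demand in hourly_demand]
    fixed = max(per_hour) * len(per_hour)  # provision for the peak all day
    scaled = sum(per_hour)                 # run only what each hour needs
    return fixed, scaled


if __name__ == "__main__":
    # Hypothetical day: quiet night, busy business hours, moderate evening.
    demand = [200] * 8 + [1500] * 10 + [400] * 6
    fixed, scaled = compare_vm_hours(demand, capacity_per_vm=500)
    print(f"Fixed at peak: {fixed} VM-hours, scaled with demand: {scaled} VM-hours")
```

The gap between the two figures grows with the difference between your peak and your normal load, which is exactly the Christmas scenario.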
Another challenge in scaling applications is that they may be global, with users all over the world. In this scenario it may not be feasible to host your site in a single data centre. Instead, you may want to deploy multiple copies of your application in different data centres and then direct people to one of them depending on their geographical location. Azure offers data centres in 19 regions so you can choose where you want to run your site as well as store your data. If you deploy multiple sites you can use Azure Traffic Manager to load balance across those sites. You should also consider using the Azure CDN to create local caches of resources like images, videos or application scripts.
TIP: If you want to improve the availability of your site even more beyond the discussion earlier, you can deploy a second copy of the application to another data centre and use Traffic Manager to redirect users to it if there is a problem with the primary site. Of course you will need to consider how your data is shared between data centres, but it is possible to stretch a virtual network across Azure data centres and leverage our dedicated and encrypted pipe between them.
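The idea behind geographic routing with failover can be sketched in a few lines. This is only a conceptual model of the behaviour described above, not how Traffic Manager is implemented or configured; the region names, URLs and the `healthy` flag are all hypothetical.

```python
# Hypothetical deployments of the same site in two data centres.
ENDPOINTS = {
    "north-europe": {"url": "https://eu.example.com", "healthy": True},
    "east-us": {"url": "https://us.example.com", "healthy": True},
}

# Hypothetical preference order for each user geography, nearest first.
PREFERENCES = {
    "europe": ["north-europe", "east-us"],
    "americas": ["east-us", "north-europe"],
}


def choose_endpoint(user_geo, endpoints, preferences):
    """Return the URL of the nearest healthy deployment, or None if all are down."""
    for region in preferences.get(user_geo, []):
        endpoint = endpoints.get(region)
        if endpoint is not None and endpoint["healthy"]:
            return endpoint["url"]
    # Unknown geography or all preferred regions down: try anything healthy.
    for endpoint in endpoints.values():
        if endpoint["healthy"]:
            return endpoint["url"]
    return None


if __name__ == "__main__":
    print(choose_endpoint("europe", ENDPOINTS, PREFERENCES))
```

If the European deployment is marked unhealthy, European users are sent to the US copy instead, which is the failover behaviour the TIP describes.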
Securing your online site and making sure nobody gets at your data is vital, and it's important that you understand that this is not always done for you. Most hosting providers will of course have comprehensive security processes in place to protect their networks, but this doesn't mean your data or applications are safe. Maybe you have left bugs in your software that allow an attacker to retrieve data they shouldn't, or maybe you have left some ports open on a VM which allow attackers to get into your servers. The fact is, combating cyber-crime is an ongoing initiative you will need to invest in, especially if you are handling payments or sensitive data.
Here are just a few things you should consider:
Software development Life-cycle.
If you are developing software then you should have security built in to your processes. In general your website is the easiest place for an attacker to start. If a hacker can figure out how to retrieve data from your website or application then they can use this to attack another part of the system and may end up getting access to even more data. Your code should be written in such a way that it defends against typical attacks like cross-site scripting or SQL injection. The best thing to do is follow the guidelines at OWASP.org.
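As an illustration of the SQL injection point, here is a small self-contained sketch using Python's built-in sqlite3 module. The table, the data and the function names are invented for the example; the technique it shows, passing user input as a query parameter instead of building the SQL string yourself, applies to most database libraries.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a couple of made-up users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "alice@example.com"), ("bob", "bob@example.com")],
    )
    return conn


def find_user_unsafe(conn, name):
    # DON'T do this: user input is pasted straight into the SQL text.
    query = f"SELECT email FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Do this: the driver passes `name` as data, never as SQL.
    return conn.execute("SELECT email FROM users WHERE name = ?", (name,)).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    malicious = "x' OR '1'='1"
    print("unsafe:", find_user_unsafe(conn, malicious))  # leaks every row
    print("safe:  ", find_user_safe(conn, malicious))    # matches nothing
```

The unsafe version returns every email address in the table for the malicious input, while the parameterised version returns nothing.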
Operational controls to prevent, detect, contain and respond.
Although Microsoft have extensive measures in place to prevent attacks, it is also prudent to have measures in place to detect, contain and respond in the case where an attack is successful. If an attack were to go undetected then its impact could be massive, whereas an attack that is detected immediately may have little or even no impact. In terms of how you can protect your application within Azure, here are some of the things you can do:
- Understand how networks work in Azure. Know how to allow and deny access to VMs, subnets and VNets within Azure. Starting with a virtual network is always a good idea when deploying virtual machines. After that you need to understand concepts like security groups and ACLs. Check out this article for a detailed explanation.
- Consider running a WAF (web application firewall) in front of your web servers. This will add an extra layer of protection and will monitor traffic for specific threats. Take a look at the WAF appliances available within the Azure marketplace. http://azure.microsoft.com/en-in/marketplace/?term=waf
- Run regular vulnerability scans on your application to check for open ports or other vulnerabilities that could expose your data. There are a number of services out there to do that, for example www.mcafeesecure.com. If you are using the Azure App Service to host your site, rather than Infrastructure Services, you can implement built-in vulnerability scanning using Tinfoil, see here.
- If you are storing sensitive information on Virtual Machines within Azure you can consider encrypting the VMs at rest. Take a look at the CloudLink Secure VM solution.
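To show what allowing and denying access with security group rules means in practice, here is a simplified model. It assumes rules are evaluated in priority order with the lowest number first, that the first matching rule wins, and that anything unmatched is denied. The rule fields and addresses are made up, and real security groups have more dimensions (direction, protocol, destination) than this sketch covers.

```python
import ipaddress

# Hypothetical inbound rules for a database subnet.
RULES = [
    {"priority": 100, "source": "10.0.2.0/24", "port": 3306, "action": "allow"},
    {"priority": 200, "source": "10.0.0.0/16", "port": 22, "action": "allow"},
    {"priority": 4000, "source": "0.0.0.0/0", "port": None, "action": "deny"},
]


def evaluate(rules, source_ip, port):
    """Return 'allow' or 'deny' for a connection, first matching rule wins."""
    address = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r["priority"]):
        source_matches = address in ipaddress.ip_network(rule["source"])
        port_matches = rule["port"] is None or rule["port"] == port
        if source_matches and port_matches:
            return rule["action"]
    return "deny"  # nothing matched: deny by default


if __name__ == "__main__":
    print(evaluate(RULES, "10.0.2.15", 3306))    # application tier to database
    print(evaluate(RULES, "203.0.113.9", 3306))  # internet to database
```

Writing your intended rules down like this, even on paper, is a good way to spot a port you did not mean to leave open.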
Further Reading: Check out the detailed information available on the Microsoft website to see the security measures taken within the Azure network to ensure your applications are protected: http://azure.microsoft.com/en-us/support/trust-center/security/. Also check out the Web Application Security Consortium website here.
By providing customers with compliant, independently verified cloud services, Microsoft makes it easier for customers to achieve compliance for the infrastructure and applications they run in Azure. Microsoft provides Azure customers with detailed information about our security and compliance programs, including audit reports and compliance packages, to help customers assess our services against their own legal and regulatory requirements.
Check out this link for more details: http://azure.microsoft.com/en-us/support/trust-center/compliance
So, now that we have looked at some of the areas to think about, here is a simple list of the steps you should consider. Again, I am assuming you are using infrastructure services, are comfortable with Linux VMs, and have access to the Azure Portal (https://portal.azure.com/) via an Azure subscription.
- Start by setting up a virtual network and configure subnets for each tier in your application, e.g. data, application and web. Create a subnet specifically for a DMZ which will contain internet facing VMs.
- Set up your security groups to restrict access between the tiers of your architecture.
- Create your VMs and configure them to be placed within the appropriate subnet. You could use the Magento image for the application tier and the NGINX image or Barracuda image for your DMZ tier.
- Set up the appropriate availability sets and load balanced sets to achieve your high availability targets.
- Set up a CDN to host your website assets like scripts, CSS files and images.
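Before you create anything in the portal, it can help to sketch the network plan from the first step. The snippet below uses Python's standard ipaddress module to carve a virtual network address space into one subnet per tier. The 10.0.0.0/16 range, the /24 subnet size and the tier names are assumptions for illustration only; pick ranges that suit your own environment.

```python
import ipaddress


def plan_subnets(vnet_cidr, tiers, new_prefix=24):
    """Assign one non-overlapping subnet of the virtual network to each tier."""
    vnet = ipaddress.ip_network(vnet_cidr)
    subnets = list(vnet.subnets(new_prefix=new_prefix))
    if len(tiers) > len(subnets):
        raise ValueError("address space too small for the requested tiers")
    return dict(zip(tiers, subnets))


if __name__ == "__main__":
    plan = plan_subnets("10.0.0.0/16", ["dmz", "web", "application", "data"])
    for tier, subnet in plan.items():
        print(f"{tier:12} {subnet}  ({subnet.num_addresses} addresses)")
```

The resulting plan gives you the address ranges to refer to when you write the security group rules in the second step.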