N-tier application with Apache Cassandra
This reference architecture shows how to deploy VMs and a virtual network configured for an N-tier application, using Apache Cassandra on Linux for the data tier. Deploy this solution.
Download a Visio file of this architecture.
The architecture has the following components:
Resource group. Resource groups are used to group resources so they can be managed by lifetime, owner, or other criteria.
Virtual network (VNet) and subnets. Every Azure VM is deployed into a VNet that can be segmented into multiple subnets. Create a separate subnet for each tier.
NSGs. Use network security groups (NSGs) to restrict network traffic within the VNet. For example, in the 3-tier architecture shown here, the database tier does not accept traffic from the web front end, only from the business tier and the management subnet.
VM scale set (not shown). A VM scale set is an alternative to using an availability set. A scale set makes it easy to scale out the VMs in a tier, either manually or automatically based on predefined rules.
Azure Load balancers. The load balancers distribute incoming Internet requests to the VM instances. Use a public load balancer to distribute incoming Internet traffic to the web tier, and an internal load balancer to distribute network traffic from the web tier to the business tier.
Public IP address. A public IP address is needed for the public load balancer to receive Internet traffic.
Jumpbox. Also called a bastion host. A secure VM on the network that administrators use to connect to the other VMs. The jumpbox has an NSG that allows remote traffic only from public IP addresses on a safe list. The NSG should permit ssh traffic.
Apache Cassandra database. Provides high availability at the data tier, by enabling replication and failover.
Azure DNS. Azure DNS is a hosting service for DNS domains, providing name resolution using Microsoft Azure infrastructure. By hosting your domains in Azure, you can manage your DNS records using the same credentials, APIs, tools, and billing as your other Azure services.
Your requirements might differ from the architecture described here. Use these recommendations as a starting point.
VNet / Subnets
When you create the VNet, determine how many IP addresses your resources in each subnet require. Specify a subnet mask and a VNet address range large enough for the required IP addresses, using CIDR notation. Use an address space that falls within the standard private IP address blocks, which are 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16.
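As a quick sanity check when sizing subnets, the usable address count for a given prefix length can be computed directly. Note that Azure reserves 5 IP addresses in every subnet, so the usable count is smaller than the raw CIDR math suggests:

```shell
# Usable IP addresses in an Azure subnet of a given prefix length.
# Azure reserves 5 addresses per subnet, so subtract 5 from the
# raw address count of 2^(32 - prefix).
prefix=24
usable=$(( 2 ** (32 - prefix) - 5 ))
echo "A /$prefix subnet provides $usable usable addresses"
```

For example, a /24 subnet provides 251 usable addresses, not 256.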
Choose an address range that does not overlap with your on-premises network, in case you need to set up a gateway between the VNet and your on-premises network later. Once you create the VNet, you can't change the address range.
Design subnets with functionality and security requirements in mind. All VMs within the same tier or role should go into the same subnet, which can be a security boundary. For more information about designing VNets and subnets, see Plan and design Azure Virtual Networks.
Do not expose the VMs directly to the Internet, but instead give each VM a private IP address. Clients connect using the IP address of the public load balancer.
Define load balancer rules to direct network traffic to the VMs. For example, to enable HTTP traffic, create a rule that maps port 80 from the front-end configuration to port 80 on the back-end address pool. When a client sends an HTTP request to port 80, the load balancer selects a back-end IP address by using a hashing algorithm that includes the source IP address. In that way, client requests are distributed across all the VMs.
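A sketch of such a rule with the Azure CLI; the resource group, load balancer, front-end configuration, back-end pool, and probe names below are illustrative placeholders, not values from this architecture:

```bash
# Map port 80 on the public front end to port 80 on the back-end pool.
# All resource names here are illustrative placeholders.
az network lb rule create -g my-rg --lb-name web-lb --name http-rule \
  --protocol Tcp --frontend-port 80 --backend-port 80 \
  --frontend-ip-name public-frontend --backend-pool-name web-pool \
  --probe-name http-health
```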
Network security groups
Use NSG rules to restrict traffic between tiers. For example, in the 3-tier architecture shown above, the web tier does not communicate directly with the database tier. To enforce this, the database tier should block incoming traffic from the web tier subnet. Define the following NSG rules for the database subnet:
- Deny all inbound traffic from the VNet. (Use the `VIRTUAL_NETWORK` tag in the rule.)
- Allow inbound traffic from the business tier subnet.
- Allow inbound traffic from the database tier subnet itself. This rule allows communication between the database VMs, which is needed for database replication and failover.
- Allow ssh traffic (port 22) from the jumpbox subnet. This rule lets administrators connect to the database tier from the jumpbox.
Create rules 2 – 4 with higher priority than the first rule (in NSGs, a lower priority number means higher priority), so they override it.
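The rules above can be sketched with the Azure CLI. The resource group name, NSG name, and subnet CIDR ranges are illustrative placeholders; substitute your own values. The example uses the current `VirtualNetwork` service tag for the deny rule:

```bash
# Allow rules get low priority numbers (evaluated first) so they
# override the broad deny rule at priority 4000.
az network nsg rule create -g my-rg --nsg-name db-nsg --name allow-business \
  --priority 100 --direction Inbound --access Allow --protocol '*' \
  --source-address-prefixes 10.0.1.0/24 --destination-port-ranges '*'
az network nsg rule create -g my-rg --nsg-name db-nsg --name allow-db-replication \
  --priority 110 --direction Inbound --access Allow --protocol '*' \
  --source-address-prefixes 10.0.2.0/24 --destination-port-ranges '*'
az network nsg rule create -g my-rg --nsg-name db-nsg --name allow-ssh-jumpbox \
  --priority 120 --direction Inbound --access Allow --protocol Tcp \
  --source-address-prefixes 10.0.3.0/24 --destination-port-ranges 22
az network nsg rule create -g my-rg --nsg-name db-nsg --name deny-vnet \
  --priority 4000 --direction Inbound --access Deny --protocol '*' \
  --source-address-prefixes VirtualNetwork --destination-port-ranges '*'
```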
We recommend DataStax Enterprise for production use, but these recommendations apply to any Cassandra edition. For more information on running DataStax in Azure, see DataStax Enterprise Deployment Guide for Azure.
Put the VMs for a Cassandra cluster in an availability set to ensure that the Cassandra replicas are distributed across multiple fault domains and update domains. For more information about fault domains and update domains, see Manage the availability of virtual machines.
Configure three fault domains (the maximum) per availability set and 18 update domains per availability set. This provides the maximum number of update domains that can still be distributed evenly across the fault domains.
Configure nodes in rack-aware mode. Map fault domains to racks in the `cassandra-rackdc.properties` file.
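For example, when using Cassandra's GossipingPropertyFileSnitch, each node's `cassandra-rackdc.properties` file can map the Azure region to the data center and the node's fault domain to its rack. The values below are illustrative, not prescribed by this architecture:

```properties
# Illustrative values: map the Azure region to the Cassandra data center
# and this node's fault domain to its rack.
dc=eastus
rack=FD2
```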
You don't need a load balancer in front of the cluster. The client connects directly to a node in the cluster.
For high availability, deploy Cassandra in more than one Azure region. Within each region, nodes are configured in rack-aware mode with fault and upgrade domains, for resiliency inside the region.
Do not allow ssh access from the public Internet to the VMs that run the application workload. Instead, all ssh access to these VMs must come through the jumpbox. An administrator logs into the jumpbox, and then logs into the other VM from the jumpbox. The jumpbox allows ssh traffic from the Internet, but only from known, safe IP addresses.
The jumpbox has minimal performance requirements, so select a small VM size. Create a public IP address for the jumpbox. Place the jumpbox in the same VNet as the other VMs, but in a separate management subnet.
To secure the jumpbox, add an NSG rule that allows ssh connections only from a safe set of public IP addresses. Configure the NSGs for the other subnets to allow ssh traffic from the management subnet.
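These two rules can be sketched with the Azure CLI. The resource group and NSG names, the management subnet range, and the safe public range (203.0.113.0/24, a documentation address block) are illustrative placeholders:

```bash
# Allow ssh to the jumpbox only from a safe list of public addresses.
az network nsg rule create -g my-rg --nsg-name jumpbox-nsg --name allow-ssh-safe-list \
  --priority 100 --direction Inbound --access Allow --protocol Tcp \
  --source-address-prefixes 203.0.113.0/24 --destination-port-ranges 22

# Allow ssh into the other tiers only from the management subnet.
az network nsg rule create -g my-rg --nsg-name web-nsg --name allow-ssh-mgmt \
  --priority 100 --direction Inbound --access Allow --protocol Tcp \
  --source-address-prefixes 10.0.4.0/24 --destination-port-ranges 22
```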
VM scale sets help you to deploy and manage a set of identical VMs. Scale sets support autoscaling based on performance metrics. As the load on the VMs increases, additional VMs are automatically added to the load balancer. Consider scale sets if you need to quickly scale out VMs, or need to autoscale.
There are two basic ways to configure VMs deployed in a scale set:
- Use extensions to configure the VM after it is provisioned. With this approach, new VM instances may take longer to start up than a VM with no extensions.
- Deploy a managed disk with a custom disk image. This option may be quicker to deploy. However, it requires you to keep the image up to date.
For additional considerations, see Design considerations for scale sets.
When using any autoscale solution, test it with production-level workloads well in advance.
Each Azure subscription has default limits in place, including a maximum number of VMs per region. You can increase the limit by filing a support request. For more information, see Azure subscription and service limits, quotas, and constraints.
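Before scaling out, you can check your current VM quota for a region with the Azure CLI (the region below is an illustrative example):

```bash
# Show current usage against the subscription's compute quotas
# for a region, including the VM count limit.
az vm list-usage --location eastus --output table
```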
If you are not using VM scale sets, put VMs in the same tier into an availability set. Create at least two VMs in the availability set to support the availability SLA for Azure VMs. For more information, see Manage the availability of virtual machines.
The load balancer uses health probes to monitor the availability of VM instances. If a probe cannot reach an instance within a timeout period, the load balancer stops sending traffic to that VM. However, the load balancer will continue to probe, and if the VM becomes available again, the load balancer resumes sending traffic to that VM.
Here are some recommendations on load balancer health probes:
- Probes can test either HTTP or TCP. If your VMs run an HTTP server, create an HTTP probe. Otherwise create a TCP probe.
- For an HTTP probe, specify the path to an HTTP endpoint. The probe checks for an HTTP 200 response from this path. This can be the root path ("/"), or a health-monitoring endpoint that implements some custom logic to check the health of the application. The endpoint must allow anonymous HTTP requests.
- The probe is sent from a known IP address, 168.63.129.16. Make sure you don't block traffic to or from this IP address in any firewall policies or network security group (NSG) rules.
- Use health probe logs to view the status of the health probes. Enable logging in the Azure portal for each load balancer. Logs are written to Azure Blob storage. The logs show how many VMs on the back end are not receiving network traffic due to failed probe responses.
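An HTTP probe can be created with the Azure CLI as sketched below; the resource group, load balancer name, and health endpoint path are illustrative placeholders:

```bash
# Create an HTTP health probe that polls a health endpoint every 15 seconds.
# The probe expects an HTTP 200 response from the given path.
az network lb probe create -g my-rg --lb-name web-lb --name http-health \
  --protocol Http --port 80 --path /healthcheck --interval 15
```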
For the Cassandra cluster, the failover scenarios to consider depend on the consistency levels used by the application, as well as the number of replicas used. For consistency levels and usage in Cassandra, see Configuring data consistency and Cassandra: How many nodes are talked to with Quorum? Data availability in Cassandra is determined by the consistency level used by the application and the replication mechanism. For replication in Cassandra, see Data Replication in NoSQL Databases Explained.
Virtual networks are a traffic isolation boundary in Azure. VMs in one VNet cannot communicate directly with VMs in a different VNet. VMs within the same VNet can communicate, unless you create network security groups (NSGs) to restrict traffic. For more information, see Microsoft cloud services and network security.
For incoming Internet traffic, the load balancer rules define which traffic can reach the back end. However, load balancer rules don't support IP safe lists, so if you want to add certain public IP addresses to a safe list, add an NSG to the subnet.
Consider adding a network virtual appliance (NVA) to create a DMZ between the Internet and the Azure virtual network. NVA is a generic term for a virtual appliance that can perform network-related tasks, such as firewall, packet inspection, auditing, and custom routing. For more information, see Implementing a DMZ between Azure and the Internet.
Encrypt sensitive data at rest and use Azure Key Vault to manage the database encryption keys. Key Vault can store encryption keys in hardware security modules (HSMs). It's also recommended to store application secrets, such as database connection strings, in Key Vault.
Deploy the solution
A deployment for this reference architecture is available on GitHub.
Clone, fork, or download the zip file for the reference architectures GitHub repository.
Make sure you have the Azure CLI 2.0 installed on your computer. To install the CLI, follow the instructions in Install Azure CLI 2.0.
Install the Azure building blocks npm package.
npm install -g @mspnp/azure-building-blocks
From a command prompt, bash prompt, or PowerShell prompt, log in to your Azure account by using one of the commands below, and follow the prompts.
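For example, with the Azure CLI you can sign in interactively, or non-interactively with a service principal (the angle-bracket values are placeholders for your own credentials):

```bash
# Interactive login (opens a browser or device-code prompt):
az login

# Or sign in non-interactively with a service principal:
az login --service-principal -u <app-id> -p <password-or-cert> --tenant <tenant-id>
```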
Deploy the solution using azbb
To deploy the Linux VMs for an N-tier application reference architecture, follow these steps:
Navigate to the `virtual-machines\n-tier-linux` folder for the repository you cloned in step 1 of the prerequisites above.
The parameter file specifies a default administrator user name and password for each VM in the deployment. You must change these before you deploy the reference architecture. Open the `n-tier-linux.json` file and replace each `adminUsername` and `adminPassword` field with your new settings. Save the file.
Deploy the reference architecture using the azbb command line tool as shown below.
azbb -s <your subscription_id> -g <your resource_group_name> -l <azure region> -p n-tier-linux.json --deploy
For more information on deploying this sample reference architecture using Azure Building Blocks, visit the GitHub repository.