Volume 32 Number 10
Building Better Cloud Deployments: 5 Steps to Immutability
By Martin Albisetti | October 2017
Cloud delivery has emerged as the accepted standard for Web service deployment, whether those services reside on public or private cloud infrastructures. While cloud promises on-demand availability, automation and agility, not all Web services are designed to take full advantage of these promises. Unpredictable cloud infrastructure and agile DevOps cycles have created a need for isolated software components, which enable rapid movement through testing, development and production of Web services in the cloud.
At Bitnami, we've been living and breathing the evolution of software deployment for a very long time, and we've invested heavily in making it easy for developers to embrace the best deployment techniques available. One technique is the concept of immutable infrastructure, which is an approach to managing services and software deployments in a way that components are replaced rather than changed. In effect, an application or service is redeployed each time any change occurs.
Pets Versus Cattle
Randy Bias, a founding board member of the OpenStack Foundation, popularized the analogy of pets versus cattle to describe this approach. Traditional servers are treated like pets. They have their own personalities, and if they get sick you nurse them back to health. Cattle, on the other hand, all look the same. If they get sick you just get rid of them. It costs you more to treat cattle than to replace them. Also, cattle scale. A farmer will add to the herd as the operation grows, based on demand and budget.
Clouds provide just enough functionality to let you continue treating your servers as pets. The problem is, they're pet goldfish. It's boring. If the pet gets sick, you might be able to treat it, but vets will usually laugh at you if you try. The fish will often be unresponsive and will die at any time without warning.
This is why immutable (or image-based) deployments and using public or private clouds usually go together. If you want to confidently run production-grade Web services in the cloud, it's time to set aside pet ownership and deploy infrastructure and software that can be easily reproduced or replaced. Like cattle.
To understand the subtleties in this approach, let's consider the pros and cons of using clouds. Clouds offer benefits that used to be prohibitively expensive to achieve with private infrastructure. In addition to worldwide redundancy and fast delivery by serving content geographically close to users, cloud deployments enable elasticity. IT organizations can call upon resources as they're needed, paying only for what they use at any time. This on-demand model also makes it easy to experiment with new technologies and topologies without steep, up-front investments.
There's also the benefit that comes with the "Everything"-as-a-Service model. Clouds offer most of the common services you'd want in modern Web software, including databases (SQL and NoSQL), HTTP front ends, load balancers, DNS, block storage, object storage, log analysis, monitoring, queues, AI and more. The list grows with each month. And clouds let you focus on your core business, by offloading common services to a provider.
Of course, everything comes at a cost. Adopting on-demand infrastructure means you forgo reliability guarantees. An instance in the cloud could disappear or reboot at any time without warning. Likewise, failure modes can be impossible to reproduce. Erratic behavior can also occur. For example, a new instance might fail to boot, or servers might experience slow disk I/O or spiking CPU loads with no clear cause.
Above the Clouds
Given the advantages and challenges of cloud deployments, immutable services are a perfect fit. Among the key benefits:
- Predictably deploy exactly the same bytes everywhere in the world.
- Launch dozens, hundreds or even thousands of new instances in response to demand, with little or no preparation.
- Build automatic error recovery into deployments. Misbehaving instances can be automatically shut down or removed from serving requests and replaced with new ones, then left for later inspection. That way, debugging can happen whenever convenient instead of in the middle of an outage.
- Automate deployments by employing images that don't have to be transformed after boot.
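As a rough illustration of the automatic error recovery described in the list above, the following Python sketch models a pool of instances in memory. The `Instance` and `Pool` classes and the `is_healthy` callback are hypothetical stand-ins, not any cloud provider's API; a real deployment would call the provider's own health checks and launch operations instead.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Instance:
    name: str
    image: str
    in_rotation: bool = True  # False once the instance stops serving requests


@dataclass
class Pool:
    """Hypothetical in-memory model of a group of identical instances."""
    image: str
    instances: List[Instance] = field(default_factory=list)
    quarantined: List[Instance] = field(default_factory=list)
    _counter: int = 0

    def launch(self) -> Instance:
        # Every new instance boots from the same image, with no post-boot changes.
        self._counter += 1
        inst = Instance(name=f"web-{self._counter}", image=self.image)
        self.instances.append(inst)
        return inst

    def reconcile(self, is_healthy: Callable[[Instance], bool]) -> None:
        # Iterate over a copy so replacements launched here are not re-checked.
        for inst in list(self.instances):
            if not is_healthy(inst):
                inst.in_rotation = False        # stop serving requests
                self.instances.remove(inst)
                self.quarantined.append(inst)   # keep it around for later inspection
                self.launch()                   # bring up a fresh replacement


if __name__ == "__main__":
    pool = Pool(image="shop-v42")
    for _ in range(3):
        pool.launch()
    pool.reconcile(lambda inst: inst.name != "web-2")  # pretend web-2 is misbehaving
    print([inst.name for inst in pool.instances])
    print([inst.name for inst in pool.quarantined])
```

The point of the design is that nothing is repaired in place: the unhealthy instance is set aside untouched, and capacity is restored from the known image.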
Beyond working well with clouds, there are many benefits to deploying your services immutably, from reduced complexity and speedier deployments to improved security and compliance.
Reduced Complexity
If you have a basic set of three services, each with an average of four instances, you quickly end up with 12 pets. That's a lot of pets to feed! You might argue that the four instances of each service are all the same, and that's probably how they started off. Over a period of six months, however, each instance will have been upgraded, debugged and revived in different ways, so there's no longer any guarantee that they're actually the same.
By contrast, an immutable approach means that you have exactly one image per service. Each of a service's four instances will contain the exact same code at deployment time, configured in the exact same way. Unless you change something afterward, you only have three variations of the images to understand and worry about. Additionally, any drift that occurs while in production will be reset on each subsequent deployment.
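To make the idea of drift concrete, here is a minimal Python sketch that fingerprints the files of an image and reports which instances no longer match it. The function names and the in-memory `dict` representation of a file system are assumptions made for illustration; a real check would walk the actual file system or compare image digests reported by your tooling.

```python
import hashlib
from typing import Dict, List


def fingerprint(files: Dict[str, bytes]) -> str:
    """Return one digest covering every path and its content, in a stable order."""
    digest = hashlib.sha256()
    for path in sorted(files):
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")  # separator so path and content cannot run together
        digest.update(hashlib.sha256(files[path]).digest())
    return digest.hexdigest()


def drifted(baseline: Dict[str, bytes],
            instances: Dict[str, Dict[str, bytes]]) -> List[str]:
    """Names of instances whose files differ from the baseline image."""
    expected = fingerprint(baseline)
    return sorted(name for name, files in instances.items()
                  if fingerprint(files) != expected)


if __name__ == "__main__":
    image = {"app.py": b"print('v1')", "php.ini": b"memory_limit=64M"}
    running = {
        "web-1": dict(image),
        "web-2": {**image, "php.ini": b"memory_limit=8M"},  # hand-edited in production
    }
    print(drifted(image, running))
```

With mutable servers, every instance that shows up in this list is a new variation to understand. With immutable deployments, the next deployment simply replaces it.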
Speedier Deployment
Immutable deployments shift a lot of the heavy lifting (including downloading, compiling and validating) to the build stage, adding time to this phase of deployment but protecting against failures in production.
Once images are built, deployments are as fast as you can bring up a server on a particular cloud, including rolling back a problematic change. Also, deployments become effectively atomic, letting you perform rolling or cluster-scale deployments while ensuring no half-state exists on any particular server.
Simpler Experimentation and Rollbacks
A single image per deployment means you can create a short-lived experiment and destroy it once it's complete. Experiments are also cheaper because you know you're starting from the same base you have running in production, down to the last byte. This makes it easier to track the changes you're trying to make.
Better Reproducibility
Immutability ensures that the same code returns the same output on every laptop, every continuous integration (CI) system and every server. There is no uncertainty as to what combination of code and configuration is being used. If you have to make changes to the environment or running images, you can easily redeploy from the same image to go back to a known-good state. This provides the closest thing to a "production-like environment," given that the images themselves are produced every time you make a change.
Tighter Security and Compliance
By baking an image each time, you know exactly what's running on any server at any point in time, allowing you to audit a single artifact and ensure that's what's running everywhere. In extreme cases, you can use resources like the Docker read-only mode to ensure no subsequent changes are made to instances while they're running.
Improved Disaster Recovery
To create immutable infrastructure, you need to have a minimum level of automation in your deployments. When (not if) your infrastructure collapses, this automation can allow you to recover in minutes instead of hours, days or weeks.
Climb the Immutable Ladder
Creating immutable services isn't always easy, especially if your project was built on assumptions common to legacy datacenters, such as being able to write anywhere on the file system or overwrite existing files for fast in-place upgrades. However, you can work toward an immutable service in just five steps, as depicted in Figure 1, with each step providing tangible benefits.
Figure 1 5 Steps to Immutability
A climb all the way up the immutable ladder requires investment and patience, but you'll see early benefits even with some small changes. Let's explore the steps in an immutable services effort.
Step 0: Initial Disentanglement
This step is where a lot of disentangling happens. By taking time to see how current processes, scripts and templates are deeply coupled to each other, you can reap many of the benefits of immutability by having a more reproducible deployment strategy:
- Immutable deployments forbid in-place upgrades of your code.
- Choose the fastest path that allows for image-based deployments instead of in-place upgrades. There are several ways to tackle this: You can snapshot your current VM and use that as the base image and replace the code on each deployment, for instance, or you can tweak your config management software to fully configure a vanilla image.
- With the new model you'll deploy each instance as a new image. That means you must be able to switch traffic to new instances either at the DNS level or in the HTTP front end, be it load balancers, HTTP servers, caching layers or the like.
- Critical data must be stored outside of the image file system, either externally or in a mounted volume. This will continue to be a requirement in all subsequent steps.
At this stage, many things can fail—for example, you might not be able to download a system package dependency or the system could fail to configure properly due to a race condition or unexpected version skew. Rebuilding the whole image each time in one way or another will usually take a significant amount of time.
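The traffic switch this step calls for can be sketched in a few lines of Python. The `FrontEnd` class below is a hypothetical stand-in for a load balancer, HTTP server or DNS record set, and `is_ready` stands in for whatever readiness check you use; neither is a real API. The sketch only shows the ordering: bring up the new instances first, verify them, and only then point traffic at them.

```python
from typing import Callable, List


class FrontEnd:
    """Hypothetical stand-in for a load balancer or DNS record set."""

    def __init__(self, backends: List[str]) -> None:
        self.backends = list(backends)

    def deploy(self, new_backends: List[str],
               is_ready: Callable[[str], bool]) -> bool:
        """Switch traffic to new_backends only if every one of them is ready.

        On failure the old backends keep serving, so there is no half-switched state.
        """
        if not new_backends or not all(is_ready(b) for b in new_backends):
            return False
        self.backends = list(new_backends)  # single swap of the whole pool
        return True


if __name__ == "__main__":
    front = FrontEnd(["10.0.0.1", "10.0.0.2"])
    switched = front.deploy(["10.0.1.1", "10.0.1.2"], is_ready=lambda b: True)
    print(switched, front.backends)
```

Because the old instances are left untouched until the swap, rolling back amounts to pointing the front end at them again.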
Step 1: Isolate from the OS
Once you've completed step 0, the next target is to untangle yourself from the OS layer. Doing this lets you start many different services and deployments from the same, well-known barebones image. It can also reduce the number of images being maintained and speed up the build process. Among things to keep in mind:
- Start by moving all your dependencies out of system directories.
- Programming languages and frameworks are generally well-suited to the separation required in this step. For example, Python's Virtualenv, Ruby's Gemfiles and PHP's Composer all have easy ways to isolate dependencies from the rest of the system by using relative paths to store them by default.
- Compiled dependencies can be more difficult to handle, because they often expect to find dependencies on specific system paths.
- At this stage you'll be replacing the whole server on each deployment, so you'll need to make sure you don't throw away valuable data. It's important that system-level logs be stored externally to the instance.
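As one possible illustration of moving dependencies out of system directories, the sketch below uses `venv`, the Python standard library's built-in counterpart to Virtualenv, to create an isolated environment inside the application's own directory. The `build_isolated_env` helper and the `.venv` location are assumptions for this example. Passing `with_pip=False` keeps the sketch self-contained; a real build would go on to install pinned dependencies into the environment.

```python
import tempfile
import venv
from pathlib import Path


def build_isolated_env(app_dir: Path) -> Path:
    """Create a dependency directory inside the application tree, not in system paths."""
    env_dir = app_dir / ".venv"  # hypothetical location, relative to the app
    # system_site_packages=False keeps system-wide packages invisible to the app.
    venv.create(env_dir, system_site_packages=False, with_pip=False)
    return env_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env = build_isolated_env(Path(tmp))
        print((env / "pyvenv.cfg").read_text())
```

Because everything the application needs now lives under its own directory, the same barebones OS image can host many different services.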
Step 2: Isolate from the Runtime
So you've untangled your service from the underlying OS. Now you need to make sure your language runtime is predictable and reliable. Roll your language runtime into your base image—including any language configuration options you may need to set by default—and you'll know you can rely on the same behavior from it in every instance:
- Runtimes are trivial to update. They don't change that often and are easy to keep up-to-date even when rolled into base images.
- Rolling runtimes into base images lets you control when updates happen, which is important because updates can break compatibility and force you to port applications.
- An isolated runtime can eliminate baffling production problems caused by small configuration differences, such as one instance running with an 8MB memory_limit and another with 64MB.
- Immutability at this step is still easy to achieve because both the OS and the language runtime are amenable to isolation from everything else on the system.
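One way to guard against the configuration differences mentioned above is to check the runtime against pinned values when an image is built or booted. The following Python sketch is illustrative only: `runtime_mismatches` is a hypothetical helper, and the settings dictionaries are assumed to have been read from wherever your runtime keeps its configuration.

```python
import sys
from typing import Dict, List, Optional, Tuple


def runtime_mismatches(pinned_version: Tuple[int, int],
                       pinned_settings: Dict[str, str],
                       actual_settings: Dict[str, str],
                       actual_version: Optional[Tuple[int, int]] = None) -> List[str]:
    """Describe every difference between the pinned runtime and the one found."""
    if actual_version is None:
        # Default to the interpreter running this check.
        actual_version = (sys.version_info.major, sys.version_info.minor)
    problems = []
    if actual_version != pinned_version:
        problems.append(f"runtime version {actual_version} != pinned {pinned_version}")
    for key, expected in sorted(pinned_settings.items()):
        actual = actual_settings.get(key)
        if actual != expected:
            problems.append(f"{key}: expected {expected!r}, found {actual!r}")
    return problems


if __name__ == "__main__":
    pinned = {"memory_limit": "64M"}
    found = {"memory_limit": "8M"}  # the kind of small difference that bites in production
    for problem in runtime_mismatches((3, 11), pinned, found, actual_version=(3, 11)):
        print(problem)
```

Failing the build when this list is non-empty moves the surprise from production to the build stage.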
Step 3: Isolate from the Framework
If you got to this step, you've probably got the hang of this immutability thing. You can start to see why the immutable ladder is worth climbing. With step 3, your commitment will get tested, as some unique challenges start to show themselves when working with frameworks. Among them:
- Frameworks change a bit more often than runtimes, so you'll have to update base images more often. That said, being able to manage framework updates on your own schedule is important, as even minor updates can break compatibility.
- Frameworks often present more security issues, because their attack surface is wider and usually exposed to the Internet. For instance, Python has 26 recorded Common Vulnerabilities and Exposures (CVE) entries versus 48 for Django.
- Disentangling your deployments gets tougher at this stage, because frameworks tend to assume they're part of your code base. You must learn more about the inner workings of your framework to understand the best way to manage the separation between code and framework.
Step 4: Fully Baked Image per Deploy
You've made the climb! This is your destination. When your images boot, they're ready for action. There's no downloading, initializing or major configuring of any software in the image. The only things that need tweaking at this stage are minor per-instance values, such as IP addresses or instance names:
- Your app code and data are completely separate from the underlying OS, runtime and framework. Bake those three underlying elements into your base image, and you can update code and data independently.
- You don't need to rebuild your service from scratch to get here, but some extra effort may be needed to ensure you're only writing in specific places.
- Once you've managed to set everything up to match this step, you can use auto-scale solutions to spin images up or down quickly and reliably.
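With a fully baked image, the only input an instance needs at boot is its handful of per-instance values. The Python sketch below reads them from environment-style variables and validates them up front. The variable names `INSTANCE_NAME` and `INSTANCE_IP`, the `InstanceSettings` class and the `load_settings` helper are all assumptions for illustration; your cloud may deliver these values through a metadata service or a similar mechanism instead.

```python
import ipaddress
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)  # frozen: settings cannot be changed after boot
class InstanceSettings:
    instance_name: str
    ip_address: str


def load_settings(env: Mapping[str, str]) -> InstanceSettings:
    """Build the per-instance settings, failing fast if anything is missing or malformed."""
    missing = [key for key in ("INSTANCE_NAME", "INSTANCE_IP") if not env.get(key)]
    if missing:
        raise ValueError(f"missing per-instance values: {', '.join(missing)}")
    ip = ipaddress.ip_address(env["INSTANCE_IP"])  # raises ValueError on malformed input
    return InstanceSettings(instance_name=env["INSTANCE_NAME"], ip_address=str(ip))


if __name__ == "__main__":
    print(load_settings({"INSTANCE_NAME": "web-7", "INSTANCE_IP": "10.0.0.7"}))
```

In a real service you would pass `os.environ` (or the equivalent metadata) to `load_settings` once at startup. Everything else the instance needs is already in the image.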
Bonus Side Step: Immutable + Stateless
Not all services will fit into this model. However, if you happen to have a stateless service such as a worker farm, a caching system or a static Web site, this approach allows you to keep all the data in the image itself. This arrangement reduces the complexity of your deployment by serving it directly from the instance, and is ideal for:
- Read-only data like static Web sites (no runtime in production).
- Workers that process data and farm out to a queue (same runtime, no local state).
The Immutable Ladder
Now that your infrastructure is immutable, your development-to-production cycle will be optimized, seamless and much more fluid, as shown in Figure 2.
Figure 2 The Optimized Production Cycle
Public clouds are generally built to be less reliable than datacenters, because they share underlying resources with potential noisy neighbors. In exchange, you can automatically scale and shut down resources in response to changes in demand. This arrangement means you should optimize for instances to be shut down and replaced. The more your service scales, the more the investment in automation pays off, because you'll need to spin up instances more frequently. You'll find that optimizing your instances becomes a much more seamless process, enabling you to focus on development rather than handholding deployments.
To quote Sylvester Stallone in the movie "Rocky Balboa": "It hurts now ... but, one day, it'll be your warm up." As you start at step 0 of the immutable ladder, you may have to remind yourself that the pain is worth the gain. But the gain is considerable. An immutable approach allows you to make sure that things work well before they go to production, rather than discovering issues after OSes, frameworks, runtimes and application code have a chance to work against each other in a live fire deployment. Now more failures can be discovered and dealt with "on the ground" before deployment, rather than in the air when your environment is up and running.
The 3 Layers
Once you're producing an image per deployment, you'll want to think about your system architecture as having three layers: a volatile layer, a persistent data layer and a routing layer. At Bitnami, we've spent a lot of time thinking about these layers and how they affect each open source application, especially those that are usually used as open source building blocks for others, such as databases and language frameworks.
Layer 1: Volatile
Deployments are now disposable images, which get thrown away at each deployment and replaced with a freshly built one. That makes the code and all the data it writes locally disposable. You should be prepared to delete and replace any of these instances at any point in time. These effectively become stateless servers.
Layer 2: Persistent Data
This is where you'll store your important databases, user files, session state and anything else that's important to keep around. You'll want to store service and system logs at this layer for compliance and debugging purposes, as well. It can be hard to ensure that no single point of failure exists in this layer, so cloud Software-as-a-Service solutions can come in handy to address these concerns. This layer should be robust, highly available and fully backed up.
Layer 3: Routing
Once you have a volatile layer, you need a predictable and addressable layer for external users and other services to access your services. You'll need to have predictable public IP addresses and domain names, decoupled from the volatile servers running your code and able to forward requests to a flexible pool of servers.
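The three layers can be modeled in a short Python sketch to show how they fit together. All three classes are hypothetical and live in memory only: `PersistentStore` stands in for an external database, `VolatileInstance` for a disposable server, and `Router` for a stable address in front of a replaceable pool.

```python
from typing import Dict, List


class PersistentStore:
    """Layer 2: data that must survive every deployment (stand-in for a database)."""

    def __init__(self) -> None:
        self._visits: Dict[str, int] = {}

    def increment(self, user: str) -> int:
        self._visits[user] = self._visits.get(user, 0) + 1
        return self._visits[user]


class VolatileInstance:
    """Layer 1: a disposable server that keeps no state of its own."""

    def __init__(self, name: str, store: PersistentStore) -> None:
        self.name = name
        self.store = store

    def handle(self, user: str) -> str:
        visits = self.store.increment(user)
        return f"{self.name}: visit {visits} for {user}"


class Router:
    """Layer 3: a stable address that forwards to whatever pool is current."""

    def __init__(self, address: str, pool: List[VolatileInstance]) -> None:
        self.address = address
        self.replace_pool(pool)

    def replace_pool(self, pool: List[VolatileInstance]) -> None:
        self._pool = list(pool)
        self._next = 0

    def route(self, user: str) -> str:
        instance = self._pool[self._next % len(self._pool)]  # simple round robin
        self._next += 1
        return instance.handle(user)


if __name__ == "__main__":
    store = PersistentStore()
    router = Router("shop.example.com", [VolatileInstance("v1-a", store),
                                         VolatileInstance("v1-b", store)])
    print(router.route("ana"))
    router.replace_pool([VolatileInstance("v2-a", store)])  # a new deployment
    print(router.route("ana"))
```

Replacing the whole pool changes neither the address users see nor the visit count, because the only state that matters lives in the persistent layer.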
The take-away for developers is clear: Building and running immutable services provides enormous benefits across the pipeline from development and build to production. With each step you take on the immutable ladder, your existing services gain extra benefits and advantages.
Immutable infrastructure at its heart is about full automation of deployments and enforcing a process that guarantees—to the maximum extent possible—what's running on your servers. Step 0 is the place to start. Even if you only walk up that first step, your cloud workloads will be in a much better situation than they were before.
Martin Albisetti is a senior architect for Bitnami and was previously director of engineering at Canonical (Ubuntu). Albisetti spends his free time working with his hands in rural Uruguay, where he lives with his family. You can find him on Twitter: @beuno.
Thanks to the following Microsoft technical expert for reviewing this article: James McCaffrey