Modern operations

Earlier in this module, we claimed that modern operations practices can help with reliability work. However, we didn't say much about what those practices are or why it makes sense to look to them for assistance. Let's discuss that now as a prelude to the map we use to navigate our reliability work.

In this module and throughout the Modern Operations learning path, we use the term modern operations practices as a container for two different practices:

  • DevOps
  • Site reliability engineering (SRE)

Let's take a quick look at both of them and what they have to offer us in this context.

DevOps

DevOps is a many-splendored thing that different people have understood and interpreted in different ways. There are many good definitions for it. Let's look at a few of them to give you a well-rounded view:

Continuous delivery of value

DevOps is the union of people, process, and products to enable continuous delivery of value to our end users. The contraction of "Dev" and "Ops" refers to replacing siloed Development and Operations to create multidisciplinary teams that now work together with shared and efficient practices and tools. Essential DevOps practices include agile planning, continuous integration, continuous delivery, and monitoring of applications.

-- Sam Guckenheimer, What is DevOps?

Culture of collaboration, ownership and learning

DevOps is an engineering culture of collaboration, ownership, and learning with the purpose of accelerating the software development life cycle from ideation to production.

-- Emily Freeman, DevOps for Dummies

Work together to imagine, develop, deploy and operate a system

DevOps: a culture where people, regardless of title or background, work together to imagine, develop, deploy and operate a system.

-- Ken Mugrage

Site Reliability Engineering

If a key word in the DevOps realm is delivery, the key word for site reliability engineering is reliability. A few definitions of SRE have been discussed in the field, but there appears to be more consensus about its meaning than there is for DevOps. Here's our definition, which captures the key aspects of that consensus:

Site reliability engineering is an engineering discipline devoted to helping an organization sustainably achieve the appropriate level of reliability in their systems, services, and products.

Why these practices? Why now?

Operations professionals created DevOps and SRE in the last 15 years to respond to a specific set of challenges in the operations space. They recognized that many factors were going to require a new approach, including:

  • The growing complexity of production environments.
  • Increasing business dependency on the continuous functioning of those environments.
  • The inability to scale the workforce linearly with the size of these environments.

While these are different practices with somewhat different focuses (for example, delivery vs. reliability, as mentioned before), DevOps and SRE have both developed principles and practices that are core to reliability work. These include monitoring/observability, proper incident response, automation, and collaboration. All of these play a prominent role in the material we discuss in the Improve your Reliability learning path.

So let's get into that work. We're going to start with a model that can help us structure and plan our reliability efforts.

Check your knowledge

1. Which of these modern operations practices has a specific focus on the continuous delivery of value to end users?

2. Which of these things isn't a core part of the engineering culture of DevOps?

3. Which of these practices are common to both DevOps and SRE?

4. Which of these modern operations practices focuses primarily on reliability?