Security in DevOps (DevSecOps)

Security is a key part of DevOps. But how does a team know if a system is secure? Is it really possible to deliver a completely secure service?

Unfortunately, the answer is no. DevSecOps is a continuous and ongoing effort that requires the attention of everyone in both development and IT operations. While the job is never truly done, the practices that teams employ to prevent and handle breaches can help produce systems that are as secure and resilient as possible.

"Fundamentally, if somebody wants to get in, they're getting in...accept that. What we tell clients is: number one, you're in the fight, whether you thought you were or not. Number two, you almost certainly are penetrated." -- Michael Hayden, Former Director of NSA and CIA

The security conversation

Teams that don't have a formal DevSecOps strategy are encouraged to begin planning as soon as possible. At first there may be resistance from team members who don't fully appreciate the threats that exist. Others may not feel that the team is equipped to face the problem and that any special investment would be a wasteful distraction from shipping features. However, it's necessary to begin the conversation to build consensus as to the nature of the risks, how the team can mitigate them, and whether the team needs resources they don't currently have.

Expect skeptics to bring some common arguments, such as:

  • How real is the threat? Teams often don't appreciate the potential value of the services and data they're charged with protecting.
  • Our team is good, right? A security discussion may be perceived as doubt in the team's ability to build a secure system.
  • I don't think that's possible. This is a common argument from junior engineers. Those with experience usually know better.
  • We've never been breached. But how do you know? How would you know?
  • Endless debates about value. DevSecOps is a serious commitment that may be perceived as a distraction from core feature work. While the security investment should be balanced with other needs, it can't be ignored.

The mindset shift

DevSecOps culture requires an important shift in mindset: you need not only to prevent breaches, but also to assume they will happen.

Security strategy components

There are many techniques that can be applied in the quest for more secure systems.

Preventing breaches:

  • Threat models
  • Code reviews
  • Security testing
  • Security development lifecycle (SDL)

Assuming breaches:

  • War game exercises
  • Central security monitors
  • Live site penetration tests

Every team should already have at least some practices in place for preventing breaches. Writing secure code has become more of a default, and there are many free and commercial tools to aid in static analysis and other forms of security testing.

However, many teams lack a strategy that assumes system breaches are inevitable. Assuming that you've been breached can be hard to admit, especially during difficult conversations with management, but that assumption lets you work through security questions on your own schedule. You don't want to figure it all out during a real security emergency.

Common questions to think through include:

  • How will you detect an attack? (See the detection sketch after this list.)
  • How will you respond if there is an attack or penetration?
  • How will you recover from an attack, such as when data has been leaked or tampered with?
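
To make the detection question concrete, here is a minimal sketch of one possible signal: alerting when an account accumulates too many failed logins in a short window. The window, threshold, and account name are hypothetical; a real system would feed from centralized logs and combine many more signals.

```python
from collections import deque
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)  # hypothetical detection window
THRESHOLD = 10                 # hypothetical alert threshold; tune to your baseline

failed_logins: dict[str, deque] = {}

def record_failed_login(account: str, when: datetime) -> bool:
    """Record a failed login and return True if the account should be flagged."""
    events = failed_logins.setdefault(account, deque())
    events.append(when)
    # Discard events that have fallen out of the sliding window.
    while events and when - events[0] > WINDOW:
        events.popleft()
    return len(events) >= THRESHOLD

# Example: a brute-force burst trips the alert.
start = datetime.now()
for i in range(12):
    flagged = record_failed_login("alice", start + timedelta(seconds=i))
print("alert" if flagged else "ok")  # prints "alert"
```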

Key DevSecOps practices

There are several common DevSecOps practices that apply to virtually any team.

First, focus on improving mean time to detection and mean time to recovery. These metrics indicate how long it takes to detect a breach and how long it takes to recover, respectively. They can be tracked through ongoing live site testing of security response plans. When evaluating potential policies, improving these metrics should be an important consideration.
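
As a simple illustration, here is a sketch of computing these two metrics from incident timestamps. The records are hypothetical, and this uses one common definition (MTTD measured from the start of the breach, MTTR from detection to recovery); your organization may define the windows differently.

```python
from datetime import datetime

incidents = [
    # (breach began, breach detected, service recovered) -- hypothetical records
    (datetime(2023, 5, 1, 2, 0), datetime(2023, 5, 1, 8, 0), datetime(2023, 5, 1, 20, 0)),
    (datetime(2023, 6, 12, 14, 0), datetime(2023, 6, 12, 15, 0), datetime(2023, 6, 12, 18, 0)),
]

def mean_hours(deltas):
    return sum(d.total_seconds() for d in deltas) / len(deltas) / 3600

mttd = mean_hours([detected - began for began, detected, _ in incidents])
mttr = mean_hours([recovered - detected for _, detected, recovered in incidents])
print(f"MTTD: {mttd:.1f}h, MTTR: {mttr:.1f}h")  # MTTD: 3.5h, MTTR: 7.5h
```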

Practice defense in depth. When a breach happens, attackers can get access to internal networks and everything inside them. While it would be ideal to stop attackers before they get that far, a policy of assuming breaches drives teams to minimize exposure from an attacker who has already gotten in.

Finally, perform periodic post-breach assessments of your practices and environments. After a breach has been resolved, your team should evaluate the performance of the policies, as well as their own adherence to them. Policies are most effective when teams actually follow them. Every breach, whether real or practiced, should be seen as an opportunity to improve.

Strategies for mitigating threats

There are too many threats to enumerate them all. Some security holes are due to issues in dependencies like operating systems and libraries, so keeping them up-to-date is critical. Others are due to bugs in system code that require careful analysis to find and fix. Poor secret management is the cause of many breaches, as is social engineering. It's a good practice to think about the different kinds of security holes and what they mean to the system.

Attack vectors

Consider a scenario where an attacker has gained access to a developer's credentials. What can they do?

  • Can they send emails? Phish colleagues.
  • Can they access other machines? Log on, run mimikatz, and repeat.
  • Can they modify source? Inject code.
  • Can they modify the build/release process? Inject code and run scripts.
  • Can they access a test environment? If a production environment takes a dependency on the test environment, exploit it.
  • Can they access the production environment? So many options...

How can your team defend against these vectors?

  • Store secrets in protected vaults
  • Remove local admin accounts
  • Restrict SAMR
  • Credential Guard
  • Remove dual-homed servers
  • Separate subscriptions
  • Multi-factor authentication
  • Privileged access workstations
  • Detect with ATP & Microsoft Defender for Cloud

Secret management

All secrets must be stored in a protected vault. Secrets include:

  • Passwords, keys, and tokens
  • Storage account keys
  • Certificates
  • Even credentials used in shared non-production environments

You should use a hierarchy of vaults to eliminate duplication of secrets. Also consider how and when secrets are accessed. Some are used at deploy-time when building environment configurations, whereas others are accessed at run-time. Deploy-time secrets typically require a new deployment in order to pick up new settings, whereas run-time secrets are accessed when needed and can be updated at any time.

Most platforms offer secure storage for managing secrets in CI/CD pipelines and cloud environments; examples include Azure Key Vault and GitHub Actions encrypted secrets.
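
As an example of run-time access, here is a minimal sketch using Azure Key Vault's Python SDK. The vault URL and secret name are placeholders, and the azure-identity and azure-keyvault-secrets packages are assumed to be installed.

```python
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# DefaultAzureCredential picks up a managed identity in production,
# or developer credentials locally.
credential = DefaultAzureCredential()
client = SecretClient(vault_url="https://my-vault.vault.azure.net",  # placeholder
                      credential=credential)

# Because the value is fetched at run time, a rotated secret is picked up
# on the next call; no redeployment is needed, unlike a deploy-time setting.
db_password = client.get_secret("db-password").value  # placeholder name
```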

Helpful tools

  • Microsoft Defender for Cloud is great for generic infrastructure alerts, such as for malware, suspicious processes, etc.
  • Source code analysis tools for static application security testing (SAST); a toy example of one such check appears after this list.
  • GitHub advanced security for analysis and monitoring of repos.
  • mimikatz extracts passwords, keys, pin codes, tickets, and more from the memory of lsass.exe, the Local Security Authority Subsystem Service on Windows. It only requires administrative access to the machine, or an account with the debug privilege enabled.
  • BloodHound builds a graph of the relationships within an Active Directory environment. It can be used by the red team to quickly identify attack vectors that would otherwise be difficult to find.
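
To give a flavor of what the SAST tools mentioned above automate, here is a toy rule that flags likely hard-coded secrets. Real analyzers combine hundreds of rules with data-flow analysis; the pattern and scanned directory here are illustrative only.

```python
import re
from pathlib import Path

# One illustrative rule: an assignment of a long quoted literal to a
# password/secret/key-like name is suspicious.
SECRET_PATTERN = re.compile(
    r"""(password|passwd|secret|api[_-]?key|token)\s*[:=]\s*['"][^'"]{8,}['"]""",
    re.IGNORECASE,
)

def scan(root: str) -> None:
    for file in Path(root).rglob("*.py"):
        for lineno, line in enumerate(file.read_text(errors="ignore").splitlines(), 1):
            if SECRET_PATTERN.search(line):
                print(f"{file}:{lineno}: possible hard-coded secret")

scan("src")  # hypothetical source directory
```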

War game exercises

A common practice at Microsoft is to engage in war game exercises. These are security testing events where two teams are tasked with testing the security and policies of a system.

The red team takes on the role of an attacker. They attempt to model real-world attacks in order to find gaps in security. If they can exploit any, they also demonstrate the potential impact of their breaches.

The blue team takes on the role of the DevOps team. They test their ability to detect and respond to the red team's attacks. This helps to enhance situational awareness and measure the readiness and effectiveness of the DevSecOps strategy.

Evolve a war games strategy

War games are effective at hardening security because they motivate the red team to find and exploit issues. Early on, breaking in will probably be much easier than expected. Teams that haven't actively tried to attack their own systems are generally unaware of the size and quantity of security holes available to attackers. The blue team may be demoralized at first, since they'll get run over repeatedly. Fortunately, the system and practices should evolve over time such that the blue team consistently wins.

Prepare for war games

Before starting war games, the team should take care of any issues they can find through a security pass. This pass establishes a baseline that everyone can compare against once the first exploits are found later on. Start by identifying vulnerabilities through manual code review and static analysis tools.

Organize teams

Red and blue teams should be organized by specialty. The goal is to build the most capable teams for each side in order to execute as effectively as possible.

The red team should include some security-minded engineers and developers deeply familiar with the code. It's also helpful to augment the team with a penetration testing specialist, if possible. If there are no specialists in-house, many companies provide this service along with mentoring.

The blue team should be made up of ops-minded engineers who have a deep understanding of the systems and logging available. They have the best chance of detecting and addressing suspicious behavior.

Run early war games

Expect the red team to be effective in the early war games. They should be able to succeed through fairly simple attacks, such as finding poorly protected secrets, exploiting SQL injection, and running phishing campaigns. Take plenty of time between rounds to apply fixes and incorporate feedback on policies. The cadence will vary by organization, but you don't want to start the next round until everyone is confident that the previous round has been mined for all it's worth.
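
As a reminder of how simple those early exploits can be, here is a sketch of the SQL injection class and its standard fix. It uses Python's built-in sqlite3 module; the table and input are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.execute("INSERT INTO users VALUES ('alice', 1)")

user_input = "' OR '1'='1"  # classic injection payload

# Vulnerable: attacker-controlled input is spliced into the query string.
rows = conn.execute(f"SELECT * FROM users WHERE name = '{user_input}'").fetchall()
print(len(rows))  # 1: the injected OR clause matched every row

# Fixed: a parameterized query treats the input as data, not SQL.
rows = conn.execute("SELECT * FROM users WHERE name = ?", (user_input,)).fetchall()
print(len(rows))  # 0: no user is literally named that
```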

Ongoing war games

After a few rounds, the red team will need to rely on more sophisticated techniques, such as cross-site scripting (XSS), deserialization exploits, and engineering system vulnerabilities. Bringing in outside security experts in areas like Active Directory may be helpful for probing more obscure vulnerabilities. By this time, the blue team should not only have a hardened platform to defend, but should also make use of comprehensive, centralized logging for post-breach forensics.
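
To illustrate one of those later techniques, here is a minimal sketch of why deserializing untrusted data is dangerous in Python: pickle can execute arbitrary code during load, while JSON only reconstructs plain data.

```python
import json
import pickle

class Evil:
    def __reduce__(self):
        # Whatever this returns is called during unpickling.
        return (print, ("arbitrary code ran during pickle.loads",))

payload = pickle.dumps(Evil())
pickle.loads(payload)  # the side effect fires: never unpickle untrusted bytes

# Safer: JSON only reconstructs plain data types, never code.
data = json.loads('{"user": "alice", "role": "admin"}')
print(data["role"])
```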

"Defenders think in lists. Attackers think in graphs. As long as this is true, attackers win." -- John Lambert (MSTIC)

Over time, the red team will take much longer to reach their objectives. When they do succeed, it will often require discovering and chaining multiple vulnerabilities, and the impact will be limited. Through the use of real-time monitoring tools, the blue team should start to catch attempts in real time.

Guidelines

War games shouldn't be a free-for-all. It's important to recognize that the goal is to produce a more effective system run by a more effective team.

Code of conduct

Here is a sample code of conduct used by Microsoft:

  1. Both the red and blue teams will do no harm. If the potential to cause damage is significant, it should be documented and addressed.
  2. The red team should not compromise more than needed to capture target assets.
  3. Common sense rules apply to physical attacks. While the red team is encouraged to be creative with non-technical attacks, such as social engineering, they shouldn't print fake badges, harass people, etc.
  4. If a social engineering attack is successful, don't disclose the name of the person who was compromised. The lesson can be shared without alienating or embarrassing a team member everyone needs to continue to work with.

Rules of engagement

Here are sample rules of engagement used by Microsoft:

  1. Do not impact the availability of any system.
  2. Do not access external customer data.
  3. Do not significantly weaken in-place security protections on any service.
  4. Do not intentionally perform destructive actions against any resources.
  5. Safeguard credentials, vulnerabilities, and other critical information obtained.

Deliverables

Any security risks or lessons learned should be documented in a backlog of repair items. Teams should define a service level agreement (SLA) for how quickly security risks will be addressed. Severe risks should be addressed as soon as possible, whereas minor issues may have a two-sprint deadline.
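
As a simple illustration, an SLA like this can even be encoded directly into backlog tooling. The severity levels and windows below are hypothetical; the 28-day window stands in for the two-sprint deadline mentioned above.

```python
from datetime import date, timedelta

# Hypothetical severity-to-deadline mapping for security repair items.
SLA_DAYS = {"critical": 1, "high": 7, "medium": 14, "minor": 28}

def repair_due_date(severity: str, found_on: date) -> date:
    """Return the date by which a risk of this severity must be fixed."""
    return found_on + timedelta(days=SLA_DAYS[severity])

print(repair_due_date("critical", date(2024, 5, 1)))  # 2024-05-02
print(repair_due_date("minor", date(2024, 5, 1)))     # 2024-05-29
```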

A report should be presented to the entire organization with lessons learned and vulnerabilities found. It's a learning opportunity for everyone, so make the most of it.

Lessons learned at Microsoft

Microsoft regularly practices war games and has learned a lot of lessons along the way.

  • War games are an effective way to change DevSecOps culture and keep security top-of-mind.
  • Phishing attacks are very effective for attackers and should not be underestimated. The impact can be contained by limiting production access and requiring two-factor authentication.
  • Control of the engineering system leads to control of everything. Be sure to strictly control access to the build/release agent, queue, pool, and definition.
  • Practice defense in depth to make it harder for attackers. Every boundary they have to breach slows them down and offers another opportunity to catch them.
  • Don't ever cross trust realms. Production should never trust anything in test.

Next steps

Learn more about the security development lifecycle and DevSecOps on Azure.