Shielded VMs in Windows Server 2016
A lot of customers ask service providers: "How can you guarantee that your technical staff won't steal our confidential data?" Honestly speaking, there is no simple answer to that. Some service providers say that they have organizational policies and security monitoring in place; others say that they will encrypt the tenant's data, and so on. But usually there is no technical barrier that guarantees a virtualization admin won't get access to the data inside a tenant's VM. As a result, we see a huge market of customers who are not even thinking of using a service provider's cloud, because they believe that keeping all VMs inside their own office is much more secure. This problem also keeps current service provider customers from moving from legacy colocation to modern cloud offers.
Windows Server 2016 solves that problem with a new technology called "Shielded VMs". It is the kind of technology that makes you think, "Hmm, how could I live without that?"
Shielded VMs is a set of technologies that share the same goal: protecting tenant secrets from service provider technical staff (aka "rogue admins") and from hackers who have gained elevated rights inside the service provider's virtualized environment. Shielded VMs close the attack vectors that exist only because the VM is virtualized. Before I dig into the details of Shielded VMs, let's look at the typical attack vectors on tenant VMs from the service provider side.
So, let's imagine that I'm the system administrator of a service provider's IaaS environment. I could be a VMM fabric admin for a Hyper-V based cloud, a vCenter admin for a VMware-based cloud, a root user on Xen/KVM hosts, etc. I'm the user who has full control of the virtualization hosts. My company has a customer - a big and successful internet store. That tenant has 10 VMs in our cloud, all connected to a dedicated virtual network. There are security solutions on the edge that block all suspicious traffic - port scans, password brute force attempts, network attacks etc. Strict firewall rules are applied to the network. So everything is aligned with industry security standards for public cloud.
I don't know the administrator password that is used inside the tenant's VMs. But I have plenty of other options:
- I can take a snapshot (checkpoint) of the VM that contains a SQL database with all of the customer's data. Then I can copy its content to my personal flash drive without any downtime, and the tenant won't even know (yes, snapshots are great for this purpose).
- The easiest attempt: I can mount the VHDX/VMDK file and see all the content of the tenant's virtual disk. I can just copy the SQL database, attach it to my own SQL Server and get the list of all customers. Then I can save the list of customers with their purchase history and contact information to Excel and sell that data to my customer's main competitor. Done.
- But what should I do if the virtual disk is encrypted? First, regular disk encryption at the guest OS level adds extra steps to booting the VM, because the tenant needs to provide a password or PIN to unlock the encrypted drive during boot. Second, customers think that disk encryption inside the VM is good enough to protect against rogue admins and hackers, and that's usually not true. I, as a rogue admin, can boot that VM on my laptop (outside corporate firewalls and security solutions, outside the corporate security monitoring perimeter) and try to guess the admin/root password over the network. No firewall - all ports are open. No attempt limit - I can restart the activity from the snapshot copy. And password guessing will work a thousand times faster, because I will do it not over a regular LAN but over the virtual network bus between my laptop and the virtual network adapter of the VM.
- Password brute forcing can take a day, a week or a month. But who cares - I have plenty of time. The default password change policy in AD is 42 days, and that's usually enough to guess even a complex password when the VM runs on my laptop. A lot of organizations change the admin password once a year, and that is more than enough time for a hacker to guess the admin password and use it on the production system.
- After guessing the admin/root password, I can log on to the VM via RDP/SSH/PowerShell and copy all the files that I need, even if the virtual disk is encrypted. Because I've used the snapshot trick, I've got a copy of the VM in the running state, so I don't need to provide an unlock code at boot time - the encrypted disk is already unlocked!
This was the case of a smart rogue admin who wanted to steal the customer's confidential data for some purpose - fun, revenge or profit. But there are even easier ways to hack tenant VMs:
- A virtualization admin can connect to the VM via the console. When you have console access you can do a lot. You can use the Ease of Access exploit to reset the local admin password. You can reset the root password in Linux. Then you will be able to log on to the VM and do whatever you like inside it.
- A virtualization admin can send a shutdown command to the tenant's VM, mount its disk, inject any script into the OS and then turn the VM back on. All these actions can be scripted, so for the tenant it will look like a quick VM reboot with 10 seconds of downtime. It can be done during a planned maintenance window, so the tenant won't even think that something went wrong. I've already shown how to inject a startup script into a tenant VM to enable the SIL agent inside it - that was easy and didn't require admin credentials. A similar approach can be used by a rogue admin to inject a malware script that does something bad to the tenant's VM - e.g. creates a new admin user with a password set by the rogue admin.
- A network admin can listen to the Live Migration or vMotion network traffic and then analyze that traffic to learn what's inside the tenant VM. If they manage to get a password hash from the traffic dump, they will be able to log on to the systems inside the tenant VMs using a username and the hash instead of the password (aka a "Pass-the-hash" attack).
- A storage admin can take a snapshot of the VM's virtual disk, mount it and steal the data. A backup admin can restore the VM from backup and do the same. They can get the NTDS.dit file from the virtual disk of an AD Domain Controller and use tools that extract user passwords from the AD database. Then they can use those credentials to connect to the tenant VMs remotely.
- Admins of the secondary datacenter can steal tenant secrets from the replicated VM copy in a secondary site.
As you can see, a lot of technical employees inside a service provider organization are able to steal customer data from tenant VMs. Don't get me wrong - even if you are the CEO or CTO of a service provider, you fully trust your employees and you are sure that they won't be bribed, there is still no guarantee that one day their admin credentials won't be deliberately stolen and used for a rogue admin attack.
All those tricks can be used on different modern IaaS platforms - Hyper-V 2012 R2, vSphere 6, OpenStack Newton etc. A lot of customers live with it; some of them are afraid of those risks, and some don't even know that the service provider can get to their data. Shielded VMs functionality was added to Windows Server 2016 to give service providers and their customers a comprehensive solution that protects against such attacks.
Here is what Windows Server 2016 Shielded VMs include:
- Shielded VM mode. In this mode, Secure Boot and vTPM are enforced, and the Saved State file and Live Migration traffic are encrypted. Also, some potentially insecure VM extensions such as console access, keyboard and mouse drivers, COM/Serial ports and the debugger are disabled.
- BitLocker Virtual Disk encryption using vTPM. No need to provide an unlock code after reboot - use guest disk encryption everywhere without any administration overhead. Encryption keys are securely sealed inside the virtual TPM device, which moves with the VM when it moves to another host.
- Host Guardian Service (HGS). HGS is a Windows Server role that measures the health of Hyper-V hosts and releases keys to healthy Hyper-V hosts when powering on or live migrating Shielded VMs. These two capabilities are fundamental to a Shielded VMs solution and are referred to as the Attestation service and the Key Protection Service (KPS) respectively. HGS won't allow Shielded VMs to boot on any host that is not part of a pre-authorized guarded fabric (e.g. the personal laptop of a rogue admin) or on a compromised host. It is expected that HGS will be managed by a different group of people inside the service provider organization, to keep the keys to the kingdom away from the kingdom.
- Hypervisor-enforced Code Integrity Policies. Security policies, applied on Hyper-V hosts, that define the standards by which a server running Windows Server 2016 determines whether an application is trustworthy and can be run or should be blocked. A Code Integrity policy helps ensure that only the executables you trust are allowed to run on a host. Malware and other executables outside the trusted set are prevented from running. HGS in TPM-trusted attestation mode won't allow Shielded VMs to boot on hosts where the host's Code Integrity Policy does not match the Code Integrity Policy specified on HGS.
- Shielded VM Tools. Several command line and GUI tools that are required to create Shielded Templates for VMM and Azure Pack. Those tools are also used to prepare the Shielding Data File (also called a provisioning data file or PDK file). That's an encrypted file that a tenant creates to protect important VM configuration information, such as the administrator password, RDP certificate, domain-join credentials, and so on. A fabric administrator uses the shielding data file when creating a shielded VM, but is unable to view or use the information contained in the file.
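To make the interplay between attestation and key release more concrete, here is a minimal toy model in Python. It is only a sketch of the idea described in the list above, not how HGS is actually implemented: the class and field names (`ToyHostGuardian`, `HostClaims`, `tpm_identity` and so on) are made up for illustration, and plain strings stand in for real TPM identities and Code Integrity policy measurements.

```python
from dataclasses import dataclass

# Toy model only: all names are hypothetical and do not mirror real HGS APIs.


@dataclass(frozen=True)
class HostClaims:
    """What a Hyper-V host reports about itself when it wants to start a shielded VM."""
    tpm_identity: str           # stand-in for the host's TPM identity
    code_integrity_policy: str  # stand-in for a measurement of the applied CI policy
    debugger_attached: bool


class ToyHostGuardian:
    def __init__(self, authorized_tpms, trusted_ci_policies):
        self.authorized_tpms = set(authorized_tpms)
        self.trusted_ci_policies = set(trusted_ci_policies)
        self._vm_keys = {}  # vm_id -> key material guarded by this service

    def register_vm_key(self, vm_id, key):
        self._vm_keys[vm_id] = key

    def attest(self, claims):
        """Attestation: accept only a known host in a healthy state."""
        return (
            claims.tpm_identity in self.authorized_tpms
            and claims.code_integrity_policy in self.trusted_ci_policies
            and not claims.debugger_attached
        )

    def release_key(self, vm_id, claims):
        """Key protection: hand out the VM key only after attestation passes."""
        if not self.attest(claims):
            return None
        return self._vm_keys.get(vm_id)
```

In this model a host from the guarded fabric gets the key back from `release_key`, while a rogue admin's laptop (unknown `tpm_identity`) or a host with a debugger attached gets `None`, so the VM copy stays locked.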
Here is the diagram that shows the boot process of the Shielded VM:
In the following table you can see how Shielded VMs technologies protect the tenant's data from typical rogue admin attacks:
|Attack type|How Shielded VMs protect tenant secrets|
|---|---|
|Admin/root password reset in the Guest OS using specialized tools|Console access, PowerShell Direct and potentially insecure WMI extensions are disabled inside the Guest OS. Secure Boot is enforced.|
|OS file system access and custom script injection (e.g. add a new admin user during startup)|VM OS disk and checkpoints are encrypted by BitLocker using the virtual TPM. Encryption keys are securely stored in HGS.|
|Copy the VM outside the security perimeter, launch it on a personal laptop and run a password brute force attack|HGS provides the keys for booting the VM only after Hyper-V host attestation. If either the fabric or the host is unknown, the VM won't be able to boot.|
|Change the VM bootloader|Secure Boot will block the VM boot if the bootloader has been changed.|
|Analyze VM dump|VMWP debugger is disabled. Saved State file is encrypted. Hyper-V hosts cannot run VMs with debuggers attached because HGS considers that unhealthy.|
|Analyze network traffic during VM migration from one host to another|Live Migration traffic is encrypted.|
|Hack the tenant VM in the secondary site|All security capabilities are still active in the secondary site - VM files are encrypted, and HGS won't allow the VM to boot if the Hyper-V hosts in the secondary site are not part of the guarded fabric.|
|Malware running on the hypervisor host|Code Integrity Policies won't allow untrusted software to run. HGS won't allow VMs to run on a host without the specified Code Integrity Policy.|
|Malicious script injection into the VM template|Disk signature is checked against the Volume Signature Catalog in HGS; if the template disk was changed, the VM won't be created.|
|Steal the admin password when the tenant types it during VM creation from a template on the IaaS portal (e.g. WAP or vCloud Director)|Admin password is stored encrypted inside the PDK file and is not provided in clear text during VM creation. Man-in-the-middle attacks are blocked through RDP mutual authentication, which is why the shielding data contains an RDP certificate.|
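The template check from the table (a changed template disk is rejected) boils down to comparing what is on the disk with what the tenant has approved. The Python sketch below shows only that idea, under loud assumptions: a plain SHA-256 digest stands in for the real disk signature, a set of digests stands in for the Volume Signature Catalog, and the function names are hypothetical.

```python
import hashlib
import hmac


def disk_digest(disk_bytes: bytes) -> str:
    """Stand-in for a template disk signature: a SHA-256 digest of the disk content."""
    return hashlib.sha256(disk_bytes).hexdigest()


def template_is_trusted(disk_bytes: bytes, trusted_digests) -> bool:
    """Return True only if the disk digest matches an entry the tenant approved.

    trusted_digests plays the role of the signature catalog in this sketch.
    """
    digest = disk_digest(disk_bytes)
    # compare_digest avoids leaking match position through timing differences.
    return any(hmac.compare_digest(digest, trusted) for trusted in trusted_digests)
```

If a rogue admin injects even a one-byte script change into the template disk, its digest no longer matches any approved entry and the check fails, which is the point at which VM creation would be refused.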
Shielded VMs are fully integrated into the "Microsoft Datacenter vNext" platform. System Center 2016 and Windows Azure Pack already support Shielded VMs and allow you to:
- Add new hosts to the guarded fabric and create Shielded VM templates using VMM 2016
- Back up and restore Shielded VMs in DPM 2016
- Monitor your guarded fabric using SCOM 2016
- Add Shielded VMs capabilities to Azure Pack plans. Tenants can upload their shielding data files, download volume signature catalogs and create new VMs as Shielded.
As you can see, Shielded VMs is not just security theater that provides the feeling of improved security while doing little or nothing to actually achieve it. It's an end-to-end solution that provides protection from the bottom to the top. Here is how it can be used in real life:
- Shielded VMs for a Service Provider - if you are a service provider, you can use Shielded VMs to assure your customers (current and potential ones) that it is secure to run VMs with sensitive data in your cloud. It can be a new business opportunity to reach customers who were afraid to use external datacenters before.
- Shielded VMs for a Tenant - I've met a lot of organizations that have already heard about Shielded VMs. They realize how insecure it can be to run VMs with confidential data in a public cloud environment without such technologies. They require "Shielded VMs or similar technology" in the RFPs that they send to service providers. If you are choosing a service provider to host your data, ask your potential provider whether they can offer you Shielded VMs. Start with Domain Controllers - the core of your corporate identity.
Shielded VMs FAQ
Q: Looks like a great technology. But do you have any real-world cases of service providers running such?
A: Sure! Well-known market leaders like Rackspace, Acuutech, Brightsolid and Convergent already offer Shielded VMs to their customers. And you'll see many more in the near future.
Q: What are the requirements for the solution?
A: Detailed explanation is here. You will need Windows Server 2016 Datacenter and System Center 2016 Datacenter (VMM and SPF components), so CIS Datacenter per-core licensing will be optimal. Servers for Hyper-V hosts must support Windows Server 2016. For TPM-trusted attestation (which is recommended), hosts also need to support TPM 2.0, UEFI and SLAT. The last two are probably supported by your existing hardware, but TPM 2.0 is a pretty new thing and may not be available in servers built before December 2015. Some hardware vendors let you add a TPM 2.0 chip on an external card to an existing server that doesn't have one onboard. If your current servers don't support TPM 2.0 at all, use Admin-trusted attestation mode as a temporary workaround. You can convert the fabric to TPM-trusted without reinstalling it once your hardware supports TPM 2.0.
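The hardware part of that answer can be written down as a small decision helper. This is just my reading of the paragraph above expressed in Python, with hypothetical names (`HostHardware`, `pick_attestation_mode`); it does not query real hardware and is not part of any Microsoft tooling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HostHardware:
    """Capabilities of a candidate Hyper-V host, filled in by hand for this sketch."""
    tpm_version: Optional[str]  # e.g. "2.0", "1.2", or None when there is no TPM
    uefi: bool
    slat: bool


def pick_attestation_mode(host: HostHardware) -> str:
    """Suggest an attestation mode based on the requirements listed above."""
    if host.tpm_version == "2.0" and host.uefi and host.slat:
        return "TPM-trusted"
    # Temporary workaround until the hardware supports TPM 2.0.
    return "Admin-trusted"
```

For example, `pick_attestation_mode(HostHardware("2.0", True, True))` suggests TPM-trusted attestation, while an older server with TPM 1.2 or no TPM at all falls back to Admin-trusted mode.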
Q: What Guest operating systems are compatible with Shielded VMs?
A: Windows Server 2012 or later. Linux is not supported at this time, but support is planned for future updates. Detailed Guest OS requirements are available here.
Q: Is it hard to deploy?
A: Not at all. The full deployment path is described here. In short: install all available updates for Windows Server 2016 and System Center 2016 (important), deploy the HGS environment, configure the Hyper-V hosts, create shielded VM templates, and make them available to WAP tenants.
Q: Is it possible to convert an existing tenant VM into a Shielded VM?
A: Yes, but the preferred way is to deploy Shielded VMs from scratch using Shielded templates and an encrypted PDK file with admin credentials. If you really need to convert an existing Gen2 VM into a Shielded VM, read here.
Q: How can a service provider control license compliance if there is no access to the tenant's Guest OS?
A: Shielded VMs support SIL for that purpose. With SIL and SILA, a service provider can see what Microsoft software is installed inside the tenant's VM. Similar software can be used to control licensing for other vendors' products.