Test results for Hyper-V replication to a secondary site

This article provides the results of performance testing when replicating Hyper-V VMs in System Center Virtual Machine Manager (VMM) clouds to a secondary datacenter.

Test goals

The goal of testing was to examine how Site Recovery performs during steady state replication.

  • Steady state replication occurs when VMs have completed initial replication, and are synchronizing delta changes.
  • It’s important to measure performance during steady state, because it’s the state in which most VMs remain unless unexpected outages occur.
  • The test deployment consisted of two on-premises sites, with a VMM server in each site. This test deployment is typical of a head office/branch office deployment, with the head office acting as the primary site and the branch office as the secondary or recovery site.

What we did

Here's what we did in the test pass:

  1. Created VMs using VMM templates.
  2. Started VMs, and captured baseline performance metrics over 12 hours.
  3. Created clouds on the primary and recovery VMM servers.
  4. Configured replication in Site Recovery, including mapping between source and recovery clouds.
  5. Enabled protection for VMs, and allowed them to complete initial replication.
  6. Waited a couple of hours for the system to stabilize.
  7. Captured performance metrics over 12 hours, during which all VMs remained in the expected replication state.
  8. Measured the delta between the baseline performance metrics, and the replication performance metrics.
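
The overhead measurement in step 8 reduces to comparing the averaged counter values from the baseline window with the same counters from the replication window. A minimal sketch in Python; the sample values below are hypothetical illustrations, not the measured test figures:

```python
# Sketch of the step-8 delta calculation: compare the baseline average of a
# counter with its average while replication was running.
# All sample values below are hypothetical, not the measured test figures.

def overhead_percent(baseline_samples, replication_samples):
    """Relative overhead of replication for one counter, as a % of baseline."""
    baseline_avg = sum(baseline_samples) / len(baseline_samples)
    replication_avg = sum(replication_samples) / len(replication_samples)
    return (replication_avg - baseline_avg) / baseline_avg * 100

# Hypothetical periodic averages of \PhysicalDisk(_Total)\Disk Transfers/sec.
baseline_iops = [4000, 4100, 3950, 4050]
replication_iops = [4200, 4300, 4150, 4250]

print(round(overhead_percent(baseline_iops, replication_iops), 1))  # → 5.0
```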

Primary server performance

  • Hyper-V Replica (used by Site Recovery) asynchronously tracks changes to a log file, with minimal storage overhead on the primary server.

  • Hyper-V Replica uses a self-maintained memory cache to minimize the IOPS overhead of tracking. It stores writes to the VHDX in memory, and flushes them to the log file just before the log is sent to the recovery site. A disk flush also occurs if the buffered writes hit a predetermined limit.

  • The following graph shows the steady-state IOPS overhead for replication. The IOPS overhead due to replication is around 5%, which is quite low.

    Graph: IOPS overhead on the primary site
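
The caching behavior described above can be sketched as a toy model: writes are buffered in memory, and flushed to the log either when the log is about to be sent or when the buffered data hits a size limit. This is an invented illustration of the general technique, not Hyper-V Replica's actual implementation; the class, limit, and sizes are all hypothetical:

```python
# Toy model of in-memory write tracking with a size-triggered disk flush.
# Illustration only -- not the actual Hyper-V Replica implementation.

class WriteTracker:
    def __init__(self, flush_limit_bytes):
        self.flush_limit = flush_limit_bytes
        self.buffered = []          # in-memory cache of tracked writes
        self.log = []               # simulated on-disk log file
        self.disk_flushes = 0

    def track_write(self, offset, data):
        self.buffered.append((offset, data))
        # Flush to the log file if buffered writes hit the predetermined limit.
        if sum(len(d) for _, d in self.buffered) >= self.flush_limit:
            self._flush()

    def send_log(self):
        self._flush()               # flush before the log is sent to recovery
        sent, self.log = self.log, []
        return sent

    def _flush(self):
        if self.buffered:
            self.log.extend(self.buffered)
            self.buffered = []
            self.disk_flushes += 1

tracker = WriteTracker(flush_limit_bytes=8)
for i in range(6):
    tracker.track_write(i * 4096, b"abcd")  # six 4-byte writes
print(tracker.disk_flushes)       # → 3 (every second write triggers a flush)
print(len(tracker.send_log()))    # → 6 (all tracked writes in the sent log)
```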

Hyper-V Replica uses memory on the primary server to optimize disk performance. As the following graph shows, memory overhead on all servers in the primary cluster is marginal. The memory overhead shown is the memory used by replication as a percentage of the total installed memory on the Hyper-V server.

Graph: Memory overhead on the primary site

Hyper-V Replica has minimal CPU overhead. As the graph shows, replication overhead is in the range of 2-3%.

Graph: CPU overhead on the primary site

Secondary server performance

Hyper-V Replica uses a small amount of memory on the recovery server to optimize the number of storage operations. The following graph summarizes memory usage on the recovery server. The memory overhead shown is the memory used by replication as a percentage of the total installed memory on the Hyper-V server.

Graph: Memory usage on the recovery site

The number of I/O operations on the recovery site is a function of the number of write operations on the primary site. Let’s look at the total I/O operations on the recovery site in comparison with the total I/O and write operations on the primary site. The graphs show that the total IOPS on the recovery site is:

  • Around 1.5 times the write IOPS on the primary.
  • Around 37% of the total IOPS on the primary site.
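
Taken together, the two ratios imply that writes made up roughly a quarter of the primary site's total I/O in this test (0.37 / 1.5 ≈ 0.25). A quick check with hypothetical counter values (not the measured figures):

```python
# Hypothetical primary-site counters, chosen so writes are ~25% of total I/O.
# These are illustrative values, not the measured test figures.
primary_total_iops = 4050
primary_write_iops = 1000

recovery_iops = 1.5 * primary_write_iops        # ratio reported above
print(recovery_iops)                                     # → 1500.0
print(round(recovery_iops / primary_total_iops * 100))   # → 37 (% of primary total)
```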

Graph: Recovery site IOPS compared with primary site write IOPS

Graph: Recovery site IOPS compared with primary site total IOPS

Effect on network utilization

An average of 275 Mbps of network bandwidth was used between the primary and recovery nodes (with compression enabled), against an available bandwidth of 5 Gbps.
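
From those figures, replication consumed only a small fraction of the available link:

```python
# Replication traffic as a fraction of available bandwidth (figures from the test).
replication_mbps = 275        # average replication traffic, in Mbps
link_mbps = 5 * 1000          # 5 Gbps link, expressed in Mbps

print(round(replication_mbps / link_mbps * 100, 1))  # → 5.5 (% of the link used)
```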

Graph: Network utilization

Effect on VM performance

An important consideration is the impact of replication on production workloads running on the virtual machines. If the primary site is adequately provisioned for replication, there shouldn’t be any impact on the workloads. Hyper-V Replica’s lightweight tracking mechanism ensures that workloads running in the virtual machines are not impacted during steady-state replication. This is illustrated in the following graphs.

This graph shows the IOPS performed by virtual machines running different workloads, before and after replication was enabled. You can see that there is no difference between the two.

Graph: Effect of replication on VM IOPS

The following graph shows the throughput of virtual machines running different workloads, before and after replication was enabled. You can observe that replication has no significant impact.

Graph: Effect of replication on VM throughput

Conclusion

The results show that Site Recovery, coupled with Hyper-V Replica, scales well with minimal overhead for a large cluster. Site Recovery provides simple deployment, replication, management, and monitoring. Hyper-V Replica provides the infrastructure needed to scale replication successfully.

Test environment details

Primary site

  • The primary site has a cluster containing five Hyper-V servers, running 470 virtual machines.
  • The VMs run different workloads, and all have Site Recovery protection enabled.
  • Storage for the cluster nodes is provided by an iSCSI SAN (model: Hitachi HUS130).
  • Each cluster server has four 1-Gbps network cards (NICs).
  • Two of the NICs are connected to a private iSCSI network, and two are connected to an external enterprise network. One of the external networks is reserved for cluster communications only.

Primary hardware requirements

| Server | RAM (GB) | Model | Processor | Number of processors | NIC | Software |
|---|---|---|---|---|---|---|
| Hyper-V servers in cluster: ESTLAB-HOST11, ESTLAB-HOST12, ESTLAB-HOST13, ESTLAB-HOST14, ESTLAB-HOST25 | 128 (ESTLAB-HOST25 has 256) | Dell™ PowerEdge™ R820 | Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz | 4 | 1 Gbps x 4 | Windows Server Datacenter 2012 R2 (x64) + Hyper-V role |
| VMM server | 2 | | | 2 | 1 Gbps | Windows Server Datacenter 2012 R2 (x64) + VMM 2012 R2 |

Secondary site

  • The secondary site has a six-node failover cluster.
  • Storage for the cluster nodes is provided by an iSCSI SAN (model: Hitachi HUS130).

Secondary hardware specification

| Server | RAM (GB) | Model | Processor | Number of processors | NIC | Software |
|---|---|---|---|---|---|---|
| Hyper-V servers in cluster: ESTLAB-HOST07, ESTLAB-HOST08, ESTLAB-HOST09, ESTLAB-HOST10 | 96 | Dell™ PowerEdge™ R720 | Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz | 2 | 1 Gbps x 4 | Windows Server Datacenter 2012 R2 (x64) + Hyper-V role |
| ESTLAB-HOST17 | 128 | Dell™ PowerEdge™ R820 | Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz | 4 | | Windows Server Datacenter 2012 R2 (x64) + Hyper-V role |
| ESTLAB-HOST24 | 256 | Dell™ PowerEdge™ R820 | Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz | 2 | | Windows Server Datacenter 2012 R2 (x64) + Hyper-V role |
| VMM server | 2 | | | 2 | 1 Gbps | Windows Server Datacenter 2012 R2 (x64) + VMM 2012 R2 |

Server workloads

  • For test purposes, we picked workloads commonly used in enterprise customer scenarios.
  • We used IOMeter, with the workload characteristics summarized in the following table, to simulate the workloads.
  • All IOMeter profiles are set to write random bytes, to simulate worst-case write patterns for workloads.
| Workload | I/O size (KB) | % Access | % Read | Outstanding I/Os | I/O pattern |
|---|---|---|---|---|---|
| File Server | 4 | 60% | 80% | 8 | 100% random |
| File Server | 8 | 20% | 80% | 8 | 100% random |
| File Server | 16 | 5% | 80% | 8 | 100% random |
| File Server | 32 | 5% | 80% | 8 | 100% random |
| File Server | 64 | 10% | 80% | 8 | 100% random |
| SQL Server (volume 1) | 8 | 100% | 70% | 8 | 100% random |
| SQL Server (volume 2) | 64 | 100% | 0% | 8 | 100% sequential |
| Exchange | 32 | 100% | 67% | 8 | 100% random |
| Workstation/VDI | 4 | 66% | 70% | 1 | 100% random |
| Workstation/VDI | 64 | 34% | 95% | 1 | 100% random |
| Web File Server | 4 | 33% | 95% | 8 | 75% random |
| Web File Server | 8 | 34% | 95% | 8 | 75% random |
| Web File Server | 64 | 33% | 95% | 8 | 75% random |

VM configuration

  • 470 VMs on the primary cluster.
  • All VMs have a VHDX disk.
  • The VMs run the workloads summarized in the following table. All were created from VMM templates.
| Workload | # VMs | Minimum RAM (GB) | Maximum RAM (GB) | Logical disk size (GB) per VM | Maximum IOPS |
|---|---|---|---|---|---|
| SQL Server | 51 | 1 | 4 | 167 | 10 |
| Exchange Server | 71 | 1 | 4 | 552 | 10 |
| File Server | 50 | 1 | 2 | 552 | 22 |
| VDI | 149 | 0.5 | 1 | 80 | 6 |
| Web server | 149 | 0.5 | 1 | 80 | 6 |
| TOTAL | 470 | | | 96.83 TB (total) | 4108 |
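
The TOTAL row can be re-derived from the per-workload rows above:

```python
# Re-derive the TOTAL row from the per-workload rows:
# (workload, VM count, logical disk size in GB per VM, max IOPS per VM)
workloads = [
    ("SQL Server",      51, 167, 10),
    ("Exchange Server", 71, 552, 10),
    ("File Server",     50, 552, 22),
    ("VDI",            149,  80,  6),
    ("Web server",     149,  80,  6),
]

total_vms = sum(n for _, n, _, _ in workloads)
total_disk_tb = sum(n * gb for _, n, gb, _ in workloads) / 1024
total_iops = sum(n * iops for _, n, _, iops in workloads)

print(total_vms)                # → 470
print(round(total_disk_tb, 2))  # → 96.83
print(total_iops)               # → 4108
```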

Site Recovery settings

  • Site Recovery was configured for on-premises to on-premises protection.
  • The VMM server has four clouds configured, containing the Hyper-V cluster servers and their VMs.
| Primary VMM cloud | Protected VMs | Replication frequency | Additional recovery points |
|---|---|---|---|
| PrimaryCloudRpo15m | 142 | 15 mins | None |
| PrimaryCloudRpo30s | 47 | 30 secs | None |
| PrimaryCloudRpo30sArp1 | 47 | 30 secs | 1 |
| PrimaryCloudRpo5m | 235 | 5 mins | None |

Performance metrics

The table summarizes the performance metrics and counters that were measured in the deployment.

| Metric | Counter |
|---|---|
| CPU | \Processor(_Total)\% Processor Time |
| Available memory | \Memory\Available MBytes |
| IOPS | \PhysicalDisk(_Total)\Disk Transfers/sec |
| VM read operations/sec (IOPS) | \Hyper-V Virtual Storage Device(&lt;VHD&gt;)\Read Operations/Sec |
| VM write operations/sec (IOPS) | \Hyper-V Virtual Storage Device(&lt;VHD&gt;)\Write Operations/Sec |
| VM read throughput | \Hyper-V Virtual Storage Device(&lt;VHD&gt;)\Read Bytes/sec |
| VM write throughput | \Hyper-V Virtual Storage Device(&lt;VHD&gt;)\Write Bytes/sec |

Next steps

Set up replication