Performance implications of unmanaged array accesses

Article
02/08/2008

I was recently shown the following code and asked why the loop calling SafeAccess executed significantly faster than the second loop calling UnsafeAccess:

static int [] intarray = new int [5000];

static void SafeAccess(int a, int b)

{

int temp = intarray[a];

intarray[a] = intarray[b];

intarray[b] = temp;

}

static Unsafe void UnsafeAccess(int a, int b)

{

fixed (int* pi = &intarray[0])

{

int temp = pi[a];

pi[a] = pi[b];

pi[b] = temp;

}

static Unsafe void Main(string[] args)

{

for (int i = 0; i < testCount; i++)

{

SafeAccess(0, i);

}

for (int i = 0; i < testCount; i++)

{

UnsafeAccess(0, i);

}

Safe Loop:

I examined the code generated by the 64-bit JIT compiler for the SafeAccess loop (which was inlined into Main by the JIT). Vance Morrison posted a useful article describing how to accomplish this from within Visual Studio: https://blogs.msdn.com/vancem/archive/2006/02/20/535807.aspx

00000642`801501f0 418b08 mov ecx,dword ptr [r8]

00000642`801501f3 8b02 mov eax,dword ptr [rdx]

00000642`801501f5 418900 mov dword ptr [r8],eax

00000642`801501f8 890a mov dword ptr [rdx],ecx

00000642`801501fa 4883c204 add rdx,4

00000642`801501fe 493bd1 cmp rdx,r9

00000642`80150201 7ced jl 00000642`801501f0

There are 7 instructions and 4 memory accesses per loop iteration, with no range checks remaining inside the loop body after optimization. In this case there is no performance cost incurred for safety.

Unsafe Loop:

By contrast, the unsafe version is a mess. UnsafeAccess is larger MSIL (50 bytes vs 31) because Unsafe array accesses require more MSIL instructions than safe ones. Given an array and index on the evaluation stack, safe array accesses require only a single 1-byte instruction: ldelem. The C# compiler generates a much more complex sequence for Unsafe accesses:

IL_000c: /* 06 | */ ldloc.0 // &array[0]

IL_000d: /* D3 | */ conv.i

IL_000e: /* 02 | */ ldarg.0 // index

IL_000f: /* D3 | */ conv.i

IL_0010: /* 1A | */ ldc.i4.4

IL_0011: /* 5A | */ mul

IL_0012: /* 58 | */ add

IL_0013: /* 4A | */ ldind.i4

Ignoring the first and third instructions, which are used to get the array and index, there are six instructions (and bytes) required to load an array element. These extra instructions make UnsafeAccess larger than SafeAccess. When determining which methods should be inlined by the JIT, one of the most highly weighted factors is the size of the inlinee method. In this case UnsafeAccess was rejected for inlining, and because of this, the range check at &intarray[0] could not be removed. In fact the unsafe loop variant actually caused more runtime range checks to occur than the safe variant!

Finally, the presence of a pinned variable inhibits many optimizations in the 64-bit JIT. As a result, the generated code for UnsafeAccess is far worse than that of the safe variant. Keep in mind that the following excerpt shows only the UnsafeAccess method itself, and does not even include the the loop in Main, as the SafeAccess example above does.

image00000000_00e40000!Arrays.UnsafeAccess(Int32, Int32):

00000642`80150260 4883ec38 sub rsp,38h

00000642`80150264 448bc1 mov r8d,ecx

00000642`80150267 48c744242000000000 mov qword ptr [rsp+20h],0

00000642`80150270 48b9102e352000000000 mov rcx,20352E10h

00000642`8015027a 488b09 mov rcx,qword ptr [rcx]

00000642`8015027d 488b4108 mov rax,qword ptr [rcx+8]

00000642`80150281 4885c0 test rax,rax

00000642`80150284 7641 jbe 00000642`801502c7

00000642`80150286 488d4110 lea rax,[rcx+10h]

00000642`8015028a 4889442420 mov qword ptr [rsp+20h],rax

00000642`8015028f 4d63c8 movsxd r9,r8d

00000642`80150292 488b442420 mov rax,qword ptr [rsp+20h]

00000642`80150297 468b0488 mov r8d,dword ptr [rax+r9*4]

00000642`8015029b 4863d2 movsxd rdx,edx

00000642`8015029e 488b442420 mov rax,qword ptr [rsp+20h]

00000642`801502a3 8b0c90 mov ecx,dword ptr [rax+rdx*4]

00000642`801502a6 488b442420 mov rax,qword ptr [rsp+20h]

00000642`801502ab 42890c88 mov dword ptr [rax+r9*4],ecx

00000642`801502af 488b442420 mov rax,qword ptr [rsp+20h]

00000642`801502b4 44890490 mov dword ptr [rax+rdx*4],r8d

00000642`801502b8 48c744242000000000 mov qword ptr [rsp+20h],0

00000642`801502c1 4883c438 add rsp,38h

00000642`801502c5 f3c3 rep ret

Conclusion:

Unsafe array accesses have a lot of potential problems: correctness, GC heap fragmentation due to pinning, and as we have just seen, performance. I hope that this example will help developers understand that safety does not necessarily incur a runtime cost. Before attempting to evade a ‘safety tax’ it is a good idea to check if you are currently paying one. The first step in doing that is viewing disassembly of the optimized code

-Matt Grice

Performance implications of unmanaged array accesses

I was recently shown the following code and asked why the loop calling SafeAccess executed significantly faster than the second loop calling UnsafeAccess:

Safe Loop:

Unsafe Loop:

Conclusion:

Additional resources