Shooting yourself in the foot with threads
Hello gentle readers
I only had one request for different content and that was for more SOS debugging tips. This is a very easy request for me to fill because I can refer you to my esteemed colleagues Tess and Doug at http://blogs.msdn.com/tess/default.aspx and http://blogs.msdn.com/dougste/default.aspx who speak of this subject at some length and with great expertise.
Given that no-one other than Naveen (Thank you very much, by the way) has requested anything different, I will continue to write brain dumps in whatever topic seems like a good idea at the time. Today, I am going to talk about a thread safety issue that sometimes bites.
On modern processors, reading and writing of 32 bit values is atomic. By that, I mean that nothing can interrupt in the middle leaving you with half of the value updated and half at its old value. So, it would seem reasonable to think that there was no need to protect that value since it is atomic. You would almost be right as well. Things which are almost right give me a great sense of job security. Imagine the following scenario
Thread 1 does things with a 32 bit integer that we are using for synchronisation which we will call lngLockCount. Because it is atomic, there is no protection around it. The code does the following check:
// lngLockCount is a module level variable
// I am good to go
// I do stuff
lngLockCount=0; // done now
Thread 2 does the exact same thing with the exact same code. Is this safe? No but you might not notice at first. When you compile for debug, the compiler treats most variables as if they were marked as volatile. So, in this case, the memory location that contains lngLockCount will be read for the test and written for the assignments. The other thread will see the changes. You will almost certainly be fine. When you compile for release, the compiler treats most variables as if they were marked as register. That is interesting here since the compiler has no reason to think that anyone else cares about this value. There is no apparent reason why it shouldn’t load the value in to a register before the test then set the register ONLY to 1 and then set the register ONLY to 0 again – and as long as they update the memory at some point then the compiler has done what you asked. The only value that ever gets in to the memory location is 0 so the other thread always feels free to go ahead. That isn't a good thing.
Most C++ developers are familiar with the register keyword. It means that you would like the compiler to use a register for this variable if at all possible. The compiler may honour it or it may not depending on what registers are available. Fewer people know the volatile keyword. In some ways, volatile is the opposite. It means that the value should always come and go from memory and never be cached in a register and unlike register, volatile is not a request but an instruction to the compiler. Volatile is not one that you see so often. Why would you want to increase the number of memory fetches, especially after all the things that I said in one of the performance blogs? Multithreading is why.
If we forced this to be volatile or compiled in debug, would that be OK? Very nearly. Let us look at the assembler.
cmp dword ptr [MyApp!lngLockCount (00428508)],0x0
jnz MyApp!MyFunc+0x72 (00411a92)
mov dword ptr [MyApp!lngLockCount (00428508)],0x1
What happens if thread 2 is executing dword ptr [MyApp!lngLockCount],0x1 after the cmp but before the jnz? Both threads think that they are good to go. The timing is unlikely to work out that way. On a single CPU machine, it is a vanishingly unlikely and you probably wouldn’t hit it in a million iterations. It is a bit more likely on a multi-CPU box. Do enough iterations and one day it will bite you. If the optimiser can find a register for the variable and you haven’t marked it as volatile then it will bite you as soon as you build in release mode. The assembler for that would look more like:
jnz MyApp!MyFunc+0x72 (00411a92)
Each thread has its own EAX register so there will be no synchronisation at all there. A bad situation gets worse - except that it will fail much faster and give you a better chance of catching the bug.
Happily, there are functions designed to help you not to shoot yourself in the foot. Check out InterlockedIncrement, InterlockedCompareExchange, InterlockedDecrement, InterlockedExchange, and InterlockedExchangeAdd on MSDN. If anyone is interested, I will explain how you could use those to do this safely - but it is generally easier and safer to use the synchronisation objects already provided. When looking at code that will be multithreaded, you can’t be too paranoid.