Debugging Winsock LSPs
Several people have asked for tips on debugging LSPs, and unfortunately there’s no easy method to do so. One difficulty is that once an LSP is installed, any number of processes can immediately load it and, if it is faulty, begin to behave badly. This usually isn’t fatal unless one of those processes happens to be WINLOGON.EXE, LSASS.EXE, or another system-critical process, in which case your computer will shut down.
Problems with LSPs typically manifest in one of two ways:
1. An application crash
2. The application fails with an unexpected error
The first problem is usually easier to diagnose. Either attach a debugger to the process and dump the stack that caused the violation, or run the application under the debugger from the start. Almost all application crashes caused by LSPs are due to invalid memory access (i.e. an access violation). With a debugger such as Visual Studio, WinDbg.exe, or NTSD.EXE, you can view the faulting instruction and dump the memory address that your LSP was trying to access. This usually gives you enough information to trace through your code, see how and where that structure or variable is manipulated, and track down the culprit.
However, in some instances that might not be enough. In that case the next step is to step through your LSP code for the typical socket operations that lead up to the crash. For example, if the crash occurred in your LSP’s overlapped completion routine then setting breakpoints in your LSP’s WSPSend and OverlappedCompletionRoutine would be the first step. When those breakpoints are hit, you can then validate the LSP’s internal state for each socket and for that request. Setting additional breakpoints on routines that manipulate the LSP’s internal state may also be necessary.
Other areas of the LSP to look at when debugging access violations are:
1. Access to the LSP socket context structure from multiple threads – many apps perform simultaneous socket operations on the same socket from multiple threads. Verify the LSP correctly references and utilizes CRITICAL_SECTIONs when necessary to handle this case.
2. Validating calling parameters – the Winsock API specification states that Winsock will catch invalid memory pointers for some input parameters and return WSAEFAULT. This means any LSP must also do this validation.
3. Overlapped completion – LSPs add considerable overhead to the amount of time it takes for an overlapped operation to fully complete. Some poorly written applications use stack-based WSAOVERLAPPED structures and then leave that function even though the operation is still pending. When the operation completes, the LSP ends up overwriting a random stack location. This isn’t a bug in the LSP but a bug in the application, and there’s nothing the LSP can do to prevent this. The only recourse here is to contact that application owner.
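For the first item in the list above, one common pattern is to pair the lock with a reference count on the per-socket context, so the context cannot be freed by a close on one thread while another thread is still using it. The following is only a minimal sketch: SOCK_CTX, CtxCreate, CtxAcquire, CtxRelease, and CtxClose are hypothetical names, not part of Winsock, and the pthread fallback is there only so the logic can be compiled and exercised off Windows.

```c
#include <assert.h>
#include <stdlib.h>

#ifdef _WIN32
#include <windows.h>
typedef CRITICAL_SECTION ctx_lock_t;
#define LOCK_INIT(l)   InitializeCriticalSection(l)
#define LOCK_ENTER(l)  EnterCriticalSection(l)
#define LOCK_LEAVE(l)  LeaveCriticalSection(l)
#define LOCK_DELETE(l) DeleteCriticalSection(l)
#else
#include <pthread.h>
typedef pthread_mutex_t ctx_lock_t;
#define LOCK_INIT(l)   pthread_mutex_init((l), NULL)
#define LOCK_ENTER(l)  pthread_mutex_lock(l)
#define LOCK_LEAVE(l)  pthread_mutex_unlock(l)
#define LOCK_DELETE(l) pthread_mutex_destroy(l)
#endif

/* Hypothetical per-socket state kept by the LSP. */
typedef struct SOCK_CTX {
    ctx_lock_t lock;      /* guards every field below */
    long       ref_count; /* one for the socket itself, one per in-flight call */
    int        closing;   /* set once the close path has run */
} SOCK_CTX;

/* Allocate a context holding the socket's own reference. */
SOCK_CTX *CtxCreate(void)
{
    SOCK_CTX *ctx = (SOCK_CTX *)calloc(1, sizeof(*ctx));
    if (ctx == NULL)
        return NULL;
    LOCK_INIT(&ctx->lock);
    ctx->ref_count = 1;
    return ctx;
}

/* Take a reference before touching the context.
 * Returns 0 on success, -1 if the socket is already closing. */
int CtxAcquire(SOCK_CTX *ctx)
{
    int ok;
    assert(ctx != NULL);
    LOCK_ENTER(&ctx->lock);
    ok = !ctx->closing;
    if (ok)
        ctx->ref_count++;
    LOCK_LEAVE(&ctx->lock);
    return ok ? 0 : -1;
}

/* Drop a reference. Frees the context and returns 1 when the last
 * reference goes away, otherwise returns 0. */
int CtxRelease(SOCK_CTX *ctx)
{
    long remaining;
    assert(ctx != NULL);
    LOCK_ENTER(&ctx->lock);
    remaining = --ctx->ref_count;
    LOCK_LEAVE(&ctx->lock);
    if (remaining == 0) {
        LOCK_DELETE(&ctx->lock);
        free(ctx);
        return 1;
    }
    return 0;
}

/* Close path: refuse new references, then drop the socket's own one. */
int CtxClose(SOCK_CTX *ctx)
{
    assert(ctx != NULL);
    LOCK_ENTER(&ctx->lock);
    ctx->closing = 1;
    LOCK_LEAVE(&ctx->lock);
    return CtxRelease(ctx);
}
```

In this sketch each entry point that uses the context would call CtxAcquire on entry and CtxRelease on exit (or in the completion routine for overlapped calls). The lookup from socket handle to context needs its own lock in a real LSP; that part is omitted here.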
If you are experiencing a crash in a system-critical process like LSASS.EXE then life is much more difficult. In this case, debugging can only be performed by piping the user-mode debugger to the kernel-mode debugger (kd.exe). This stops the entire OS while you step through the faulting process and prevents the OS from rebooting underneath you. This type of debugging can only be accomplished using the Windows debugging tools found here:
For the second type of LSP failure, unexpected application errors, the starting point is more nebulous. An example of this is installing an LSP and then having your web browser fail to load a web page you know should succeed. Since you don’t immediately know what call is unexpectedly failing, these problems require more work. Two approaches are:
1. Use a debugger to step through the expected Winsock calls if known (e.g. Internet Explorer would call socket, connect, select, send, and recv).
2. Add tracing to your LSP to log unexpected failures.
Tracing is generally a good thing to add since it’s useful and can be reused (if any other problems come up). Additionally, since you probably won’t know what the failing application is explicitly doing in terms of Winsock calls, adding tracing usually takes less time in the long run.
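As a rough illustration of the tracing idea, the sketch below formats one line per failed call, tagged with the process ID (since many processes may load the same LSP), and flushes it to a log immediately. TraceFormatFailure and TraceFailure are hypothetical helper names, and the line format is an assumption; a real LSP might send the same text to OutputDebugString instead of a file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TRACE_LINE_MAX 128

/* Build one trace line such as "[pid 1234] WSPConnect failed: error 10061".
 * Returns the number of characters stored in buf (output is truncated
 * to fit and always NUL-terminated). */
int TraceFormatFailure(char *buf, size_t cap, unsigned long pid,
                       const char *func, int err)
{
    int n;
    assert(buf != NULL && cap > 0 && func != NULL);
    n = snprintf(buf, cap, "[pid %lu] %s failed: error %d", pid, func, err);
    if (n < 0) {
        buf[0] = '\0';
        return 0;
    }
    if ((size_t)n >= cap)
        n = (int)(cap - 1);
    return n;
}

/* Append the line to an already opened log and flush right away, so the
 * entry is not lost if the host process crashes shortly afterwards. */
void TraceFailure(FILE *log, unsigned long pid, const char *func, int err)
{
    char line[TRACE_LINE_MAX];
    if (log == NULL)
        return;
    TraceFormatFailure(line, sizeof(line), pid, func, err);
    fprintf(log, "%s\n", line);
    fflush(log);
}
```

A provider entry point would call something like TraceFailure(log, GetCurrentProcessId(), "WSPConnect", *lpErrno) just before returning SOCKET_ERROR; those call sites are not shown here.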
Some other random LSP tips:
1. Always remember to install your LSP over both TCP and UDP entries for IPv4 and IPv6. Several applications will call select and pass handles owned by multiple providers, and if your LSP doesn’t layer over that provider then there is a possibility that your LSP is completely bypassed and the owning provider will fail its WSPSelect call (as it won’t understand the LSP sockets passed to it).
2. When debugging an LSP you may install and remove your LSP multiple times. After a while you can get into a state where various processes have loaded different versions of your LSP. It’s a good idea to reboot your computer after several iterations of installing and removing your LSP.
3. Debug your LSP on a Windows Vista computer :-). Winsock LSP categorization will prevent system services from loading your LSP, which avoids the flakiness mentioned in the previous point. See the following blog posting about LSP categorization: http://blogs.msdn.com/wndp/archive/2006/02/09/529031.aspx
--Anthony Jones (AJones)