In some cases, the debugger's stack trace commands fail. This can happen if a call to an invalid address caused the debugger to lose track of the return address, if you have come across a stack pointer for which you cannot directly get a stack trace, or if there is some other debugger problem. In any case, being able to walk a stack manually is often valuable.
The basic concept is fairly simple: dump the memory at the stack pointer, find out where the modules are loaded, pick out values that could be function (return) addresses, and verify each candidate by checking whether the code just before it makes a call to the next function up the stack.
Before going through an example, note that on Intel (x86) systems the kb (Display Stack Backtrace) command has an additional feature: if you enter kb=[ebp] [eip] [esp], the debugger displays the stack trace for the frame with the given base pointer, instruction pointer, and stack pointer values, respectively.
The example uses a failure for which the debugger does produce a stack trace, so that the manually derived result can be checked at the end.
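When every function uses a frame pointer, a starting frame is enough to find the callers: after the standard x86 prologue (push ebp; mov ebp, esp), [ebp] holds the caller's saved EBP and [ebp+4] holds the return address. The sketch below walks such a chain. It is an illustration of the frame-pointer technique, not a description of what kb does internally (the debugger can also use symbol information for functions that omit the frame pointer). It assumes every frame on the stack uses EBP, models target memory as a Python dict, and walk_ebp_chain is a hypothetical name.

```python
# Minimal EBP-chain walker for 32-bit x86 code that uses the standard
# "push ebp; mov ebp, esp" prologue. Memory is modelled as a dict mapping
# DWORD addresses to 32-bit values (a stand-in for reading target memory).
from typing import Dict, List, Tuple

def walk_ebp_chain(memory: Dict[int, int], ebp: int, eip: int,
                   max_frames: int = 64) -> List[Tuple[int, int]]:
    """Return (frame_ebp, code_address) pairs, innermost frame first."""
    frames = [(ebp, eip)]
    while len(frames) < max_frames:
        saved_ebp = memory.get(ebp)       # [ebp]   = caller's saved EBP
        ret_addr = memory.get(ebp + 4)    # [ebp+4] = return address into caller
        if saved_ebp is None or ret_addr is None:
            break                         # chain runs into unreadable memory
        if saved_ebp <= ebp:
            break                         # stack grows down, so callers sit higher
        frames.append((saved_ebp, ret_addr))
        ebp = saved_ebp
    return frames
```

Functions that omit the frame pointer break this chain, which is one reason the manual approach below scans raw stack contents instead of following saved EBP values.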
The first step is to find out which modules are loaded and where. This is done with the x (Examine Symbols) command (some entries are edited out for length):
kd> x *!
start    end        module name
77f70000 77fb8000   ntdll     (C:\debug\ntdll.dll, \\ntstress\symbols\dll\ntdll.DBG)
80010000 80012320   Aha154x   (load from Aha154x.sys deferred)
80013000 8001aa60   SCSIPORT  (load from SCSIPORT.SYS deferred)
8001b000 8001fba0   Scsidisk  (load from Scsidisk.sys deferred)
80100000 801b7b40   NT        (ntoskrnl.exe, \\ntstress\symbols\exe\ntoskrnl.DBG)
802f0000 8033c000   Ntfs      (load from Ntfs.sys deferred)
80400000 8040c000   hal       (load from hal.dll deferred)
fe4c0000 fe4c38c0   vga       (load from vga.sys deferred)
fe4d0000 fe4d3e60   VIDEOPRT  (load from VIDEOPRT.SYS deferred)
fe4e0000 fe4f0e40   ati       (load from ati.SYS deferred)
fe500000 fe5057a0   Msfs      (load from Msfs.SYS deferred)
fe510000 fe519560   Npfs      (load from Npfs.SYS deferred)
fe520000 fe521f60   ndistapi  (load from ndistapi.sys deferred)
fe530000 fe54ed20   Fastfat   (load from Fastfat.SYS deferred)
fe5603e0 fe575360   NDIS      (NDIS.SYS, \\ntstress\symbols\SYS\NDIS.DBG)
fe580000 fe585920   elnkii    (elnkii.sys, \\ntstress\symbols\sys\elnkii.DBG)
fe590000 fe59b8a0   ndiswan   (load from ndiswan.sys deferred)
fe5a0000 fe5b7c40   nbf       (load from nbf.sys deferred)
fe5c0000 fe5c1b40   TDI       (load from TDI.SYS deferred)
fe5d0000 fe5dd580   nwlnkipx  (load from nwlnkipx.sys deferred)
fe5e0000 fe5ee220   nwlnknb   (load from nwlnknb.sys deferred)
fe5f0000 fe5fb320   afd       (load from afd.sys deferred)
fe610000 fe62bf00   tcpip     (load from tcpip.sys deferred)
fe630000 fe648600   netbt     (load from netbt.sys deferred)
fe650000 fe6572a0   netbios   (load from netbios.sys deferred)
fe660000 fe660000   Parport   (load from Parport.SYS deferred)
fe670000 fe670000   Parallel  (load from Parallel.SYS deferred)
fe680000 fe6bcf20   rdr       (rdr.sys, \\ntstress\symbols\sys\rdr.DBG)
fe6c0000 fe6f0920   srv       (load from srv.sys deferred)
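If the module list is captured as text, the range lookup can be automated. The sketch below is an illustration under stated assumptions: it expects each module line to begin with two 8-digit hexadecimal addresses followed by the module name, as in the output above, and it treats the end address as exclusive. parse_module_list and module_for are hypothetical names.

```python
# Parse the "start end module name" table printed by "x *!" into
# (start, end, name) tuples, then look up which module contains an address.
import re
from typing import List, Optional, Tuple

MODULE_LINE = re.compile(r"^\s*([0-9a-fA-F]{8})\s+([0-9a-fA-F]{8})\s+(\S+)")

def parse_module_list(text: str) -> List[Tuple[int, int, str]]:
    modules = []
    for line in text.splitlines():
        m = MODULE_LINE.match(line)
        if m:  # the "kd>" prompt and the header line do not match
            modules.append((int(m.group(1), 16), int(m.group(2), 16), m.group(3)))
    return modules

def module_for(address: int, modules: List[Tuple[int, int, str]]) -> Optional[str]:
    """Name of the module whose [start, end) range contains address, else None."""
    for start, end, name in modules:
        if start <= address < end:
            return name
    return None
```

A module whose start and end are equal (Parport and Parallel above) contains no addresses under this rule.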
The second step is to dump the memory at the stack pointer and look for values that fall inside the modules listed by the x *! command:
kd> dd esp
fe4cc97c  80136039 00000270 00000000 00000000
fe4cc98c  fe682ae4 801036fe 00000000 fe68f57a
fe4cc99c  fe682a78 ffb5b030 00000000 00000000
fe4cc9ac  ff680e08 801036fe 00000000 00000000
fe4cc9bc  fe6a1198 00000001 fe4cca78 ffae9d98
fe4cc9cc  02000901 fe4cca68 ffb50030 ff680e08
fe4cc9dc  ffa449a8 8011c901 fe4cca78 00000000
fe4cc9ec  80127797 80110008 00000246 fe6a1430
kd> dd
fe4cc9fc  00000270 fe6a10ae 00000270 ffa44abc
fe4cca0c  ffa449a8 ff680e08 fe6b2c04 ff680e08
fe4cca1c  ffa449a8 e12820c8 e1235308 ffa449a8
fe4cca2c  fe685968 ff680e08 e1235308 ffa449a8
fe4cca3c  ffb0ad48 ffb0ad38 00100000 ffb0ad38
fe4cca4c  00000000 ffa44a84 e1235308 0000000a
fe4cca5c  c00000d6 00000000 004ccb28 fe4ccbc4
fe4cca6c  fe680ba4 fe682050 00000000 fe4ccbd4
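Each dd line is a starting address followed by up to four DWORDs, so the stack address of every value is the line address plus 4 times its column index. A minimal sketch of that bookkeeping, assuming the text layout shown above (parse_dd is a hypothetical name):

```python
# Turn "dd" output into (stack_address, value) pairs.
from typing import List, Tuple

def parse_dd(text: str) -> List[Tuple[int, int]]:
    slots = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("kd>"):
            continue  # skip blank lines and debugger prompts
        base = int(fields[0], 16)
        for column, word in enumerate(fields[1:]):
            slots.append((base + 4 * column, int(word, 16)))  # DWORDs are 4 bytes apart
    return slots
```

Keeping each value's address matters later: a confirmed return address at a given slot also tells you where that frame sits on the stack.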
To determine which values are likely function addresses and which are parameters or saved registers, first consider what the different types of information look like on the stack. Most integers are small values, so when displayed as DWORDs they are mostly zeros (such as 00000270). Most pointers to local addresses are close to the stack pointer (such as fe4cca78). Status codes usually begin with c (such as c00000d6). Unicode and ASCII strings can be identified by the fact that each character falls in the range 0x20 to 0x7F. (In KD, the dc (Display Memory) command shows the characters on the right.) Most importantly, function addresses fall within the ranges listed by x *!.
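These rules of thumb can be written down as a rough classifier. The sketch below is only illustrative: the cut-offs SMALL_LIMIT and NEAR_STACK are assumptions chosen for the example, not values the debugger uses, and classify is a hypothetical name. Strings are tested both as four 8-bit (ASCII) characters and as two 16-bit (Unicode) characters.

```python
# Heuristic labelling of a stack DWORD, following the rules of thumb above.
from typing import List, Tuple

SMALL_LIMIT = 0x10000   # assumed: "small" integers are below this
NEAR_STACK = 0x10000    # assumed: local pointers lie within this many bytes of ESP

def _text_char(unit: int) -> bool:
    return 0x20 <= unit <= 0x7F

def looks_like_string(value: int) -> bool:
    ascii_chars = [(value >> shift) & 0xFF for shift in (0, 8, 16, 24)]
    unicode_chars = [value & 0xFFFF, value >> 16]
    return all(map(_text_char, ascii_chars)) or all(map(_text_char, unicode_chars))

def classify(value: int, esp: int, modules: List[Tuple[int, int, str]]) -> str:
    for start, end, name in modules:
        if start <= value < end:
            return "possible code address in " + name
    if value < SMALL_LIMIT:
        return "small integer"
    if abs(value - esp) < NEAR_STACK:
        return "stack address"
    if looks_like_string(value):
        return "possible string"
    if value >> 28 == 0xC:
        return "possible status code"
    return "unknown"
```

The module check comes first because, as noted above, a value inside a module's range is the most useful signal.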
Notice that all of the listed modules fall within two ranges: 77f70000 to 8040c000 and fe4c0000 to fe6f0920. Based on these ranges, the possible function addresses in the preceding dump are: 80136039, 801036fe (which appears twice, so it is more likely a parameter), fe682ae4, fe68f57a, fe682a78, fe6a1198, 8011c901, 80127797, 80110008, fe6a1430, fe6a10ae, fe6b2c04, fe685968, fe680ba4, and fe682050. Investigate these locations by using the ln (List Nearest Symbols) command on each address:
kd> ln 80136039
(80136039)   NT!_KiServiceExit+0x1e  |  (80136039)   NT!_KiServiceExit2-0x177
kd> ln fe682ae4
(fe682ae4)   rdr!_RdrSectionInfo+0x2c  |  (fe682ae4)   rdr!_RdrFcbReferenceLock-0xb4
kd> ln 801036fe
(801036fe)   NT!_KeWaitForSingleObject  |  (801036fe)   NT!_MmProbeAndLockPages-0x2f8
kd> ln fe68f57a
(fe68f57a)   rdr!_RdrDereferenceDiscardableCode+0xb4  |  (fe68f57a)   rdr!_RdrUninitializeDiscardableCode-0xa
kd> ln fe682a78
(fe682a78)   rdr!_RdrDiscardableCodeLock  |  (fe682a78)   rdr!_RdrDiscardableCodeTimeout-0x38
kd> ln fe6a1198
(fe6a1198)   rdr!_SubmitTdiRequest+0xae  |  (fe6a1198)   rdr!_RdrTdiAssociateAddress-0xc
kd> ln 8011c901
(8011c901)   NT!_KeSuspendThread+0x13  |  (8011c901)   NT!_FsRtlCheckLockForReadAccess-0x55
kd> ln 80127797
(80127797)   NT!_ZwCloseObjectAuditAlarm+0x7  |  (80127797)   NT!_ZwCompleteConnectPort-0x9
kd> ln 80110008
(80110008)   NT!_KeWaitForMultipleObjects+0x27c  |  (80110008)   NT!_FsRtlLookupMcbEntry-0x164
kd> ln fe6a1430
(fe6a1430)   rdr!_RdrTdiCloseConnection+0xa  |  (fe6a1430)   rdr!_RdrDoTdiConnect-0x4
kd> ln fe6a10ae
(fe6a10ae)   rdr!_RdrTdiDisconnect+0x56  |  (fe6a10ae)   rdr!_SubmitTdiRequest-0x3c
kd> ln fe6b2c04
(fe6b2c04)   rdr!_CleanupTransportConnection+0x64  |  (fe6b2c04)   rdr!_RdrReferenceServer-0x20
kd> ln fe685968
(fe685968)   rdr!_RdrReconnectConnection+0x1b6  |  (fe685968)   rdr!_RdrInvalidateServerConnections-0x32
kd> ln fe682050
(fe682050)   rdr!__strnicmp+0xaa  |  (fe682050)   rdr!_BackPackSpinLock-0xa10
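The range check and the "appears twice" hint used to pick these candidates can be expressed as a small filter. The sketch below is an illustration under stated assumptions: it checks each value against the individual module ranges (slightly stricter than the two coarse ranges), takes the stack contents as plain integers in stack order, and candidate_return_addresses is a hypothetical name. Repeated values are flagged rather than dropped, since a repeat only makes a parameter more likely.

```python
# Pick stack values that fall inside a loaded module and flag repeats.
from collections import Counter
from typing import Iterable, List, Tuple

def candidate_return_addresses(stack_values: Iterable[int],
                               modules: List[Tuple[int, int, str]]) -> List[Tuple[int, bool]]:
    """Return (value, repeated) for each distinct in-module value, in stack order."""
    in_module = [v for v in stack_values
                 if any(start <= v < end for start, end, _ in modules)]
    counts = Counter(in_module)
    seen = set()
    result = []
    for v in in_module:
        if v not in seen:
            seen.add(v)
            result.append((v, counts[v] > 1))  # True: seen more than once
    return result
```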
As noted before, 801036fe is probably not part of the stack trace because it appears twice. In addition, any candidate whose symbol offset is zero can be ignored, because you cannot return to the beginning of a function; this rules out both 801036fe (NT!_KeWaitForSingleObject) and fe682a78 (rdr!_RdrDiscardableCodeLock). Based on this information, the candidate stack trace is:
NT!_KiServiceExit+0x1e
rdr!_RdrSectionInfo+0x2c
rdr!_RdrDereferenceDiscardableCode+0xb4
rdr!_SubmitTdiRequest+0xae
NT!_KeSuspendThread+0x13
NT!_ZwCloseObjectAuditAlarm+0x7
NT!_KeWaitForMultipleObjects+0x27c
rdr!_RdrTdiCloseConnection+0xa
rdr!_RdrTdiDisconnect+0x56
rdr!_CleanupTransportConnection+0x64
rdr!_RdrReconnectConnection+0x1b6
rdr!__strnicmp+0xaa
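The zero-offset rule can be applied mechanically to ln output. The sketch below is a rough illustration: it reads only the left-hand (nearest preceding) symbol on each ln line, assumes the "(address)   module!symbol+0xoffset" layout shown above, does not apply the duplicate rule, and uses the hypothetical names parse_ln and plausible_return_sites.

```python
# Drop ln results whose nearest preceding symbol has offset zero: a return
# address never points at the first byte of a function.
import re
from typing import List, Optional, Tuple

def parse_ln(line: str) -> Optional[Tuple[int, str, int]]:
    """Return (address, symbol, offset) from the left-hand side of an ln line."""
    left = line.split("|")[0].strip()
    m = re.match(r"\(([0-9a-fA-F]+)\)\s*(\S+)", left)
    if not m:
        return None
    name, sep, offset = m.group(2).partition("+0x")
    return int(m.group(1), 16), name, int(offset, 16) if sep else 0

def plausible_return_sites(ln_lines: List[str]) -> List[Tuple[int, str, int]]:
    sites = []
    for line in ln_lines:
        parsed = parse_ln(line)
        if parsed and parsed[2] != 0:   # offset 0 = start of a function or data symbol
            sites.append(parsed)
    return sites
```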
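To verify each candidate, unassemble the instruction immediately before its return address and check whether it is a call to the function above it in the candidate trace. The following output is edited for length. The offsets used (2, 5, or 6 bytes back) were found by trial and error; they match common x86 call encodings: a call through a register (such as call ebx) is 2 bytes, a direct call is 5 bytes, and an indirect call through an absolute memory address is 6 bytes.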
kd> u 80136039-2 l1          // looks OK, it's a call
NT!_KiServiceExit+0x1c:
80136037 ffd3             call    ebx
kd> u fe682ae4-2 l1          // paged out (all zeroes), unknown
rdr!_RdrSectionInfo+0x2a:
fe682ae2 0000             add     [eax],al
kd> u fe68f57a-6 l1          // looks OK, it's a call, but not to anything above
rdr!_RdrDereferenceDiscardableCode+0xae:
fe68f574 ff15203568fe     call    dword ptr [rdr!__imp__ExReleaseResourceForThreadLite]
kd> u fe682a78-6 l1          // paged out (all zeroes), unknown
rdr!_DiscCodeInitialized+0x2:
fe682a72 0000             add     [eax],al
kd> u fe6a1198-5 l1          // looks good, call to something above
rdr!_SubmitTdiRequest+0xa9:
fe6a1193 e82ee3feff       call    rdr!_RdrDereferenceDiscardableCode (fe68f4c6)
kd> u 8011c901-2 l1          // not good, it's a jump within the function
NT!_KeSuspendThread+0x11:
8011c8ff 7424             jz      NT!_KeSuspendThread+0x37 (8011c925)
kd> u 80127797-2 l1          // looks good, an int 2e -> KiServiceExit
NT!_ZwCloseObjectAuditAlarm+0x5:
80127795 cd2e             int     2e
kd> u 80110008-2 l1          // not good, it's a test instruction, not a call
NT!_KeWaitForMultipleObjects+0x27a:
80110006 85c9             test    ecx,ecx
kd> u 80110008-5 l1          // paged out (all zeroes), unknown
NT!_KeWaitForMultipleObjects+0x277:
80110003 0000             add     [eax],al
kd> u fe6a1430-6 l1          // looks good, it's a call to ZwClose...
rdr!_RdrTdiCloseConnection+0x4:
fe6a142a ff15f83468fe     call    dword ptr [rdr!__imp__ZwClose (fe6834f8)]
kd> u fe6a10ae-2 l1          // paged out (all zeroes), unknown
rdr!_RdrTdiDisconnect+0x54:
fe6a10ac 0000             add     [eax],al
kd> u fe6b2c04-5 l1          // looks good, call to something above
rdr!_CleanupTransportConnection+0x5f:
fe6b2bff e854e4feff       call    rdr!_RdrTdiDisconnect (fe6a1058)
kd> u fe685968-5 l1          // looks good, call to immediately above
rdr!_RdrReconnectConnection+0x1b1:
fe685963 e838d20200       call    rdr!_CleanupTransportConnection (fe6b2ba0)
kd> u fe682050-2 l1          // paged out (all zeroes), unknown
rdr!__strnicmp+0xa8:
fe68204e 0000             add     [eax],al
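The byte patterns behind those trial offsets can be checked mechanically. The sketch below recognizes only the four encodings that occur in this example: FF D0 through FF D7 (call through a register), E8 rel32 (direct call), FF 15 disp32 (call through an absolute pointer), and CD 2E (int 2e). It will miss calls that use other addressing modes, such as call dword ptr [ebp+8]. The caller is assumed to supply the raw bytes ending exactly at the return address, and call_before is a hypothetical name. For a direct call, the target is the return address plus the signed 32-bit displacement, which reproduces targets such as fe68f4c6 in the output above.

```python
# Decide whether the bytes ending at a return address form a call instruction.
from typing import Optional, Tuple

def call_before(code: bytes, return_address: int) -> Optional[Tuple[str, Optional[int]]]:
    """Return (kind, address) or None. For "call rel32" the address is the call
    target; for "call [disp32]" it is the pointer slot; otherwise it is None."""
    # E8 xx xx xx xx: target = address of next instruction + signed rel32
    if len(code) >= 5 and code[-5] == 0xE8:
        rel = int.from_bytes(code[-4:], "little", signed=True)
        return "call rel32", (return_address + rel) & 0xFFFFFFFF
    # FF 15 xx xx xx xx: call dword ptr [disp32], e.g. an import table slot
    if len(code) >= 6 and code[-6] == 0xFF and code[-5] == 0x15:
        return "call [disp32]", int.from_bytes(code[-4:], "little")
    # FF D0..D7: call through a 32-bit register (FF /2 with a register operand)
    if len(code) >= 2 and code[-2] == 0xFF and 0xD0 <= code[-1] <= 0xD7:
        return "call reg", None
    # CD 2E: int 2e, the system-service entry seen in this example
    if len(code) >= 2 and code[-2] == 0xCD and code[-1] == 0x2E:
        return "int 2e", None
    return None
```

For example, call_before(bytes.fromhex("e82ee3feff"), 0xfe6a1198) returns ("call rel32", 0xfe68f4c6), the same target the debugger shows for rdr!_RdrDereferenceDiscardableCode.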
Based on this, it appears that RdrReconnectConnection called CleanupTransportConnection, which called RdrTdiDisconnect; the path then continues through RdrTdiCloseConnection (whose call to ZwClose checks out) and ZwCloseObjectAuditAlarm (int 2e) to KiServiceExit. The other functions on the stack are probably left over from previously active stacks.
In this case, the stack trace command works properly, so its output can be used to check the answer:
kd> k
ChildEBP RetAddr
fe4cc978 80136039 NT!_NtClose+0xd
fe4cc978 80127797 NT!_KiServiceExit+0x1e
fe4cc9f4 fe6a1430 NT!_ZwCloseObjectAuditAlarm+0x7
fe4cca10 fe6b2c04 rdr!_RdrTdiCloseConnection+0xa
fe4cca28 fe685968 rdr!_CleanupTransportConnection+0x64
fe4cca78 fe688157 rdr!_RdrReconnectConnection+0x1b6
fe4ccbd4 80106b1e rdr!_RdrFsdCreate+0x45b
fe4ccbe8 8014b289 NT!IofCallDriver+0x38
fe4ccc98 8014decd NT!_IopParseDevice+0x693
fe4ccd08 8014d6d2 NT!_ObpLookupObjectName+0x487
fe4ccde4 8014d3ad NT!_ObOpenObjectByName+0xa2
fe4cce90 8016660d NT!_IoCreateFile+0x433
fe4cced0 80136039 NT!_NtCreateFile+0x2d
The first entry, NT!_NtClose+0xd, is the current location (the instruction pointer) rather than a return address, so it cannot be found by scanning the stack. Otherwise, the manually derived stack was correct up to the point where RdrReconnectConnection was called, and the same process could have been continued to trace the entire stack. For a more exact method of manual stack walking, you would need to unassemble each potential function and follow each push and pop to identify every DWORD on the stack.
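Following each push and pop amounts to tracking how far ESP has moved since the function was entered: at entry, [esp] holds the return address, each 32-bit push moves it 4 bytes further from ESP, and sub esp, n moves it n bytes further. The sketch below does this for a deliberately tiny instruction subset written as plain strings (an assumption of this example, not real disassembler output). Any other instruction is assumed not to change ESP, which real code does not guarantee, and return_address_offset is a hypothetical name.

```python
# Track the distance from ESP to the return address through a toy
# subset of x86 instructions executed from the start of a function.
from typing import List

def return_address_offset(instructions: List[str]) -> int:
    """Byte offset n such that the return address is at [esp + n]."""
    depth = 0  # bytes ESP has moved down since function entry
    for ins in instructions:
        op, _, rest = ins.strip().partition(" ")
        operands = [o.strip() for o in rest.split(",")] if rest else []
        if op == "push":
            depth += 4                      # 32-bit push
        elif op == "pop":
            depth -= 4
        elif op in ("sub", "add") and len(operands) == 2 and operands[0] == "esp":
            amount = int(operands[1], 0)    # accepts "0x18" or "24"
            depth += amount if op == "sub" else -amount
        # anything else is assumed, for this sketch, not to modify ESP
    return depth
```

For the prologue push ebp / mov ebp, esp / sub esp, 0x18 / push ebx / push esi, the function returns 0x24, so the return address would be at [esp+0x24]; the DWORDs between ESP and that slot are the two saved registers, 0x18 bytes of locals, and the saved EBP.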