Quick tips: Kernel Dumps, Blue Screens and !Analyze -v

Hello,

This time I’m going to address something that is usually fairly straightforward to analyze, yet many people I deal with don't know how to proceed when a blue screen appears. In this blog post I assume the server is already configured to generate kernel dumps or minidumps. (This is something I always advise: configure your servers to generate memory dumps if something goes wrong.)

Usually I get emails like “my server just got a BSOD (blue screen of death). I've got a minidump and I need to understand what happened”. My approach is always the same, and most of the time it is enough to find the root cause.

First Step

Open WinDbg and make sure the symbol server is properly configured. More info at https://www.microsoft.com/whdc/devtools/debugging/debugstart.mspx
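As a rough illustration of what "properly configured" usually means, here is a minimal Python sketch that builds a symbol path pointing at the public Microsoft symbol server with a local cache. The cache folder C:\Symbols and the helper names are my own assumptions, not something from the original setup; the srv*cache*URL syntax and the _NT_SYMBOL_PATH variable are the standard ones the debugger reads.

```python
import os

# Public Microsoft symbol server used by the debuggers.
MS_SYMBOL_SERVER = "https://msdl.microsoft.com/download/symbols"


def build_symbol_path(cache_dir):
    """Return a symbol path that caches downloaded symbols in cache_dir."""
    # Syntax: srv*<local cache folder>*<symbol server URL>
    return f"srv*{cache_dir}*{MS_SYMBOL_SERVER}"


def debugger_environment(cache_dir):
    """Copy the current environment and add _NT_SYMBOL_PATH for the debugger."""
    env = dict(os.environ)
    env["_NT_SYMBOL_PATH"] = build_symbol_path(cache_dir)
    return env


# Hypothetical cache folder; any writable local directory should work.
print(build_symbol_path(r"C:\Symbols"))
```

The same string can also be set from inside the debugger with .sympath followed by .reload.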

Second Step

Open the memory dump in WinDbg. Below is the output when opening a kernel memory dump:

(…)

Loading Kernel Symbols

...............................................................

................................................................

......................

Loading User Symbols

Loading unloaded module list

....................................

*******************************************************************************

* *

* Bugcheck Analysis *

* *

*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck 50, {d44cdde4, 0, 818ed8b3, 0}

(…)

As you can see above, the debugger states “Use !analyze -v to get detailed debugging information”. Let’s follow the expert (the debugger) and issue !analyze -v:

1: kd> !analyze -v

*******************************************************************************

* *

* Bugcheck Analysis *

* *

*******************************************************************************

PAGE_FAULT_IN_NONPAGED_AREA (50)

Invalid system memory was referenced. This cannot be protected by try-except,

it must be protected by a Probe. Typically the address is just plain bad or it

is pointing at freed memory.

Arguments:

Arg1: d44cdde4, memory referenced.

Arg2: 00000000, value 0 = read operation, 1 = write operation.

Arg3: 818ed8b3, If non-zero, the instruction address which referenced the bad memory

                    address.

Arg4: 00000000, (reserved)

Debugging Details:

------------------

(…)

TRAP_FRAME: d28db9e0 -- (.trap 0xffffffffd28db9e0)

ErrCode = 00000000

eax=d44cdde0 ebx=d28dbad0 ecx=d28dba94 edx=00000000 esi=b2a6c298 edi=09f56e20

eip=818ed8b3 esp=d28dba54 ebp=d28dba60 iopl=0 nv up ei ng nz na po nc

cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010282

nt!RtlTimeToSecondsSince1980+0x16:

818ed8b3 ff7004 push dword ptr [eax+4] ds:0023:d44cdde4=????????

Resetting default scope

LAST_CONTROL_TRANSFER: from 8185edb4 to 818a936d

STACK_TEXT:

d28db9c8 8185edb4 00000000 d44cdde4 00000000 nt!MmAccessFault+0x10a

d28db9c8 818ed8b3 00000000 d44cdde4 00000000 nt!KiTrap0E+0xdc

d28dba60 a658794a d44cdde0 d28dba94 b2a6c298 nt!RtlTimeToSecondsSince1980+0x16

d28dba9c a6586d85 bfca9448 b2a6c298 00000020 srvnet!FillSessionInfoBuffer+0x8c

d28dbae4 a6587be7 bfca9448 00000007 00000006 srvnet!SvcEnumApiHandler64+0x7f

d28dbb10 a657bb5a bfca9448 09f56d60 00002000 srvnet!SvcSessionEnum+0x2f

d28dbb6c a658c102 87629550 00000001 10b017e8 srvnet!SrvAdminProcessFsctl+0x2de

d28dbbd0 a657c3aa 87629550 00000001 10b017e8 srvnet!SrvNetProcessFsctl+0x54

d28dbc18 a658c043 872aaf08 00146027 87629550 srvnet!SrvNetDeviceControl+0xc6

d28dbc2c 81855976 872aaf08 c5e69880 c5e69880 srvnet!SrvNetDefaultDispatch+0x3e

d28dbc44 81a576a1 87629550 c5e69880 c5e698f0 nt!IofCallDriver+0x63

d28dbc64 81a57e46 872aaf08 87629550 10b01701 nt!IopSynchronousServiceTail+0x1d9

d28dbd00 81a56b2c 872aaf08 c5e69880 00000000 nt!IopXxxControlFile+0x6b7

d28dbd34 8185bc7a 000007a8 000033a4 00000000 nt!NtFsControlFile+0x2a

d28dbd34 77275e74 000007a8 000033a4 00000000 nt!KiFastCallEntry+0x12a

WARNING: Frame IP not in any known module. Following frames may be wrong.

043bf878 00000000 00000000 00000000 00000000 0x77275e74

(…)

MODULE_NAME: srvnet

IMAGE_NAME: srvnet.sys

(…)

I usually look at three things in the output above:

· Error – PAGE_FAULT_IN_NONPAGED_AREA

· Stack – STACK_TEXT

· IMAGE_NAME
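If you save the !analyze -v output to a text file, those three items can be pulled out with a small script. The sketch below is only an illustration: the function name and the regular expressions are my own, and the stack pattern assumes the 32-bit layout shown in this sample (five hex columns followed by module!function+offset).

```python
import re

# "PAGE_FAULT_IN_NONPAGED_AREA (50)": symbolic name plus hex bugcheck code.
TITLE = re.compile(r"([A-Z][A-Z0-9_]+) \(([0-9A-Fa-f]+)\)")
# Stack row as in the 32-bit sample: five hex columns, then module!function+offset.
FRAME = re.compile(r"(?:[0-9a-f]{8}\s+){5}(\S+!\S+)$")
# Any "NAME:" header other than WARNING ends the stack section.
SECTION = re.compile(r"[A-Z][A-Z0-9_]*:")


def summarize_analysis(log_text):
    """Extract the bugcheck name/code, the stack symbols and IMAGE_NAME."""
    summary = {"bugcheck": None, "code": None, "stack": [], "image": None}
    in_stack = False
    for raw in log_text.splitlines():
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between rows
        if line.startswith("STACK_TEXT:"):
            in_stack = True
        elif line.startswith("IMAGE_NAME:"):
            summary["image"] = line.split(":", 1)[1].strip()
            in_stack = False
        elif in_stack:
            frame = FRAME.match(line)
            if frame:
                summary["stack"].append(frame.group(1))
            elif SECTION.match(line) and not line.startswith("WARNING:"):
                in_stack = False
        elif summary["bugcheck"] is None:
            title = TITLE.fullmatch(line)
            if title:
                summary["bugcheck"] = title.group(1)
                summary["code"] = int(title.group(2), 16)
    return summary


sample = """\
PAGE_FAULT_IN_NONPAGED_AREA (50)
STACK_TEXT:
d28db9c8 8185edb4 00000000 d44cdde4 00000000 nt!MmAccessFault+0x10a
d28db9c8 818ed8b3 00000000 d44cdde4 00000000 nt!KiTrap0E+0xdc
d28dba60 a658794a d44cdde0 d28dba94 b2a6c298 nt!RtlTimeToSecondsSince1980+0x16
d28dba9c a6586d85 bfca9448 b2a6c298 00000020 srvnet!FillSessionInfoBuffer+0x8c
MODULE_NAME: srvnet
IMAGE_NAME: srvnet.sys
"""
result = summarize_analysis(sample)
print(result["bugcheck"], hex(result["code"]), result["image"])
```

Frames without a module!function symbol (such as the raw 0x77275e74 address at the bottom of the stack) are skipped on purpose, since they carry nothing to search for.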

The next step is to check the version of the module (srvnet in my sample):

1: kd> lmvm srvnet

(…)

    FileVersion: 6.0.6002.18005

(…)
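The version matters because a hotfix article normally lists the file version it ships, so you can tell whether the server already has it. A minimal sketch, assuming the lmvm output was saved as text and contains a FileVersion line like the one above (the function name is hypothetical):

```python
import re


def parse_file_version(lmvm_text):
    """Return the FileVersion from lmvm output as a tuple of ints, or None."""
    match = re.search(r"^\s*FileVersion:\s*(\d+(?:\.\d+){3})", lmvm_text, re.MULTILINE)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


installed = parse_file_version("    FileVersion: 6.0.6002.18005\n")
print(installed)
```

Tuples compare element by element, so installed < fixed_version gives a numeric comparison once you have taken the fixed version from the relevant article.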

And finally, Bing it (https://www.bing.com/search?q=%22PAGE_FAULT_IN_NONPAGED_AREA%22+srvnet+msdn&go=&form=QBRE&filt=all) with some keywords (depending on the results, I try one or more combinations, such as including the method name, …):

"PAGE_FAULT_IN_NONPAGED_AREA" srvnet msdn

In my case the first link is to https://support.microsoft.com/kb/951418. After installing the hotfix the issue no longer occurs.
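If you go through this often, the search query can be assembled from the bugcheck name and the module. This is just a sketch: the helper name and its parameters are mine, and it only builds the q parameter rather than the full URL shown earlier.

```python
from urllib.parse import urlencode


def build_search_url(bugcheck_name, module, extra_keywords=("msdn",)):
    """Build a Bing search URL with the bugcheck name quoted as an exact phrase."""
    terms = [f'"{bugcheck_name}"', module, *extra_keywords]
    return "https://www.bing.com/search?" + urlencode({"q": " ".join(terms)})


print(build_search_url("PAGE_FAULT_IN_NONPAGED_AREA", "srvnet"))
# Variation that adds a method name taken from the stack:
print(build_search_url("PAGE_FAULT_IN_NONPAGED_AREA", "srvnet",
                       ("FillSessionInfoBuffer", "msdn")))
```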

Of course it's not always as simple as this, but most of the time it’s enough.

See you next time

Bruno