I’ve got your NBLs right here

The most common issue we see in NDIS drivers is a “lost packet”.  You have lost a packet when NDIS gives your driver a NET_BUFFER_LIST (NBL) and your driver never returns it to NDIS.  A lost packet will often show up as a hang during Pause or as a 0x9F (DRIVER_POWER_STATE_FAILURE) bugcheck.

These issues are also very difficult to debug.  NDIS says “hey, I sent you 1,044,949,195,033 NBLs, but you only returned 1,044,949,195,032.”  Now what?

Starting with Windows 7 SP1, NDIS can track every packet that goes through each NDIS driver.  The packet tracking works like this: just before NDIS gives an NBL to a driver, the NBL is stamped with the driver’s handle.  This means that if you can just search through all the NBLs on the system, you can identify the lost NBL by finding the one NBL that is still stamped with your driver’s handle.
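
To make the idea concrete, here is a toy model of the stamp-and-search approach in plain C.  This is only a sketch: the toy_driver and toy_nbl types and every function name below are made up for illustration, and none of it is meant to reflect how NDIS actually lays out or stamps a NET_BUFFER_LIST.  Returning an NBL is modeled, as a simplification, by stamping it with its allocator again.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these types are hypothetical, not real NDIS structures. */
typedef struct toy_driver {
    const char *name;
} toy_driver;

typedef struct toy_nbl {
    const toy_driver *allocated_by;  /* driver that allocated the NBL (its "home") */
    const toy_driver *held_by;       /* stamp: driver the NBL was last given to */
} toy_nbl;

/* Allocation: a fresh NBL starts out held by its allocator. */
static void toy_nbl_init(toy_nbl *nbl, const toy_driver *allocator)
{
    assert(nbl != NULL && allocator != NULL);
    nbl->allocated_by = allocator;
    nbl->held_by = allocator;
}

/* Just before handing the NBL to a driver, stamp it with that driver. */
static void toy_give_to(toy_nbl *nbl, const toy_driver *receiver)
{
    assert(nbl != NULL && receiver != NULL);
    nbl->held_by = receiver;
}

/* An NBL is pending when it is not held by the driver that allocated it. */
static int toy_is_pending(const toy_nbl *nbl)
{
    return nbl->held_by != nbl->allocated_by;
}

/* Scan every NBL and collect the pending ones still stamped with `suspect`.
   Stores up to `max_out` matches in `out`; returns the total number found. */
static size_t toy_find_held_by(const toy_nbl *all, size_t count,
                               const toy_driver *suspect,
                               const toy_nbl **out, size_t max_out)
{
    size_t found = 0;
    for (size_t i = 0; i < count; i++) {
        if (toy_is_pending(&all[i]) && all[i].held_by == suspect) {
            if (found < max_out) {
                out[found] = &all[i];
            }
            found++;
        }
    }
    return found;
}
```

If a driver is given two NBLs and hands back only one, toy_find_held_by reports exactly the one that is still stamped with that driver, which is the lost packet.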

Fortunately, you don’t have to do this manually.  The !ndiskd.pendingnbls debugger extension is clever enough to do the search for you.  !ndiskd.pendingnbls will identify every NBL that is not “at home”, i.e., is not currently held by the same driver that allocated the NBL.

Let’s look at a short example:

kd> !ndiskd.pendingnbls
PHASE 1/3: Found 19 NBL pool(s).
PHASE 2/3: Found 0 freed NBL(s).
    Pending Nbl        Currently held by
    ffffcf800287cd20   ffffcf8002750c70 - NDIS Sample LightWeight Filter-0000  [Filter]
PHASE 3/3: Found 1 pending NBL(s) of 1885 total NBL(s).
Search complete.

What is this showing?  The debugger extension counted 1,885 total NBLs on the system.  Of those, most are currently held by whichever driver allocated them, so they're not considered “pending”.  There’s only one NBL, 0xffffcf800287cd20, that is still missing.  NDIS last gave that NBL to a Filter driver named “NDIS Sample LightWeight Filter”.  That filter driver rises to the top of the list of suspects.

Not all pending NBLs are bad.  Every time a packet is sent or received, an NBL goes pending.  If you want to see a pending NBL in action, just set a breakpoint on your datapath handler and run !ndiskd.pendingnbls — you should see the NBL that was just passed to your driver.

NBLs that are pending for “a long time” are bad — they’re likely leaks, and can cause bugchecks or app hangs.  If you’re debugging a 0x9F bugcheck or hang during Pause, the datapath has been stopped for some time, so any NBLs that are still pending are likely leaks.

One last note.  There’s a small (<1% path length) cost to NBL tracking, so NDIS does not enable it by default on Windows Server.  If you are doing NDIS development, you should enable NBL tracking.  There are two ways to enable NBL tracking on Windows Server:

    1. Windows Server 2012 R2 and later:  Just enable Driver Verifier on NDIS.SYS.  This is a best practice for NDIS developers, so you should be doing it already.
    2. Windows Server 2008 R2 SP1 and later:  Set the following registry value to 1:

HKLM\SYSTEM\CurrentControlSet\Services\NDIS\Parameters ! TrackNblOwner [REG_DWORD]

Next time we’ll talk about another nifty way to keep an eye on your NBLs.