Diagnosing Deadlock-Stall Conditions Within Microsoft Windows CE .NET

 

Nat Frampton, Windows Embedded MVP, President
Real Time Development Corp.

June 2002

Applies to:
    Microsoft® Windows® CE .NET with Microsoft Platform Builder 4.0

Summary: Microsoft Windows CE .NET provides a rich set of tools and development environments for creating embedded applications. We start by examining the Kernel Tracker Tool to understand its power. Often an embedded developer faces situations where external tools are not able to provide assistance because of their dependence on particular operating system (OS) resources that are not available. As a solution to diagnosing these deadlock or stall situations, where the tools have been rendered useless, we develop a simple Priority Runner application to identify the lowest schedulable thread priority in the system. (14 printed pages)

Contents

Introduction
Kernel Tracker—Snapshot
Deadlock-Stall Thread
Strategy for Priority Runner
Implementing Priority Runner
Thread and Event Initialization
Priority Runner Thread
Interrupt Initialization
IST-Interrupt Service Thread
Testing Priority Runner
Summary

Introduction

There are many demonstrations and examples that use tools such as the integrated development environment of Microsoft® Platform Builder 4.0 or Microsoft eMbedded Visual C++® to explain the behavior of the system. When the system is well behaved, the combination of tools provides an excellent view of the system state. What happens when the system is in a deadlocked state? I want to explore diagnosing a scenario on Microsoft Windows® CE .NET where a particular high priority thread is consuming a large portion of the CPU bandwidth. In this scenario, the drivers that deliver the messages and refresh the screen display will never be scheduled. We will explore a simple software example titled Priority Runner, which can work in combination with hardware as simple as LEDs or a serial port to diagnose this problem. The first thing you will want to know is which thread priority is consuming the CPU and not allowing the other thread priorities to run. Answering that question is the goal of our software solution. The diagnostic thread will identify the last priority at which it was still scheduled.

We start with two quick definitions. A deadlock occurs when a thread requires a system resource such as a semaphore or mutex and is prevented from running because that resource is not available. A stall occurs when another thread at equal or higher priority is consuming the CPU and not allowing threads of equal or lower priority the opportunity to run. We will create a thread that demonstrates both conditions.

Kernel Tracker—Snapshot

The existing graphical tools and data acquisition mechanisms built into the operating system are very powerful. They work together to provide a critical high resolution view of the history of activities on the device. The Kernel Tracker Tool provides a detailed view of activities of all process and operating system threads in a single view. Below is a simple example of a Kernel Tracker display snapshot from a Windows CE .NET Device.

Figure 1.

All system processes and interrupts are represented in the Workspace area to the left of the screen. A high resolution graphical display is in the center of the diagram, with a complete symbol key to the right of the display. In the diagram above, the individual interrupts are graphically represented as leaves of the Interrupts tree branch. This diagram had a screen resolution of 1000 ms, which creates a good overall view of the present activity. We can zoom in on a particular point in the diagram by selecting an interrupt of interest and setting the Zoom Range to 1 ms. The following diagram represents this high resolution view.

Figure 2.

Thread execution is represented by the Horizontal green bars. Context switches are seen as the vertical lines between these bars. In this example, Interrupt 24 occurs, causing the Kernel to pulse the event registered for the Interrupt Service Thread (IST-Thread 0xCDD92926) in the RealTimeLab.exe. The IST's execution is seen in two sections of activity in which it waits on an OS object, which causes activity in CEMON.EXE. The first context switch is the result of a retail message and the second is the publishing of the interrupt count back to Kernel Tracker through the CELOG service on the Target.

A full explanation of Kernel Tracker or the Interrupt Mechanisms is not the focus of this article. Explore the documentation in Platform Builder focusing on Kernel Tracker as a starting point and, most importantly, start exploring your target system with these tools. It is important to understand the available tools and exhaust their ability before we move on to manually creating our own debug techniques.

Deadlock-Stall Thread

We will create a rogue thread to give us a representative problem to solve. The deadlock-stall thread for our example is the function STThreadHigh. The thread holds a critical section while it spins on the performance counter for one second, and then sleeps for 5 ms. Because lower priority threads can run only during that sleep, this bounds all threads below the priority of this thread to roughly 0.5% of the CPU (5 ms out of every 1,005 ms).

DWORD WINAPI STThreadHigh( LPVOID lpvParam )
{
    LARGE_INTEGER liStart;      // Counter value at the start of the spin
    LARGE_INTEGER liCurrent;    // Most recent counter value
    ULONG         ulDiffuS;     // Elapsed time in microseconds

    while( g_fSTRun )
    {
        // Grab the critical section
        //
        EnterCriticalSection( &g_csSTCriticalSection );
        // Spin on the performance counter for 1 second
        //
        ulDiffuS = 0;
        QueryPerformanceCounter( &liStart );
        while( ulDiffuS < 1000000 )
        {
            QueryPerformanceCounter( &liCurrent );
            ulDiffuS = STCalcDeltaUS( liStart, liCurrent );
        }
        // Release the critical section
        //
        LeaveCriticalSection( &g_csSTCriticalSection );
        // Sleep for a while
        //
        Sleep( 5 );
    }
    return 0;
}

The key components to this high priority thread are:

  • Enter the Critical Section
  • Wait for the 1,000,000 us to expire on the Performance Counter (1 sec)
  • Leave the Critical Section
  • Sleep
  • Repeat
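
The listing above calls STCalcDeltaUS, which is prototyped later in this article but never shown. The following is a minimal sketch of the arithmetic such a helper would need, written in portable C so that it can be checked away from the target. The name CalcDeltaUS, the explicit frequency parameter, and the use of int64_t in place of LARGE_INTEGER are assumptions made for illustration; on the device the frequency would come from QueryPerformanceFrequency.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for STCalcDeltaUS: converts two performance
   counter readings into elapsed microseconds. ticksPerSecond is the
   value QueryPerformanceFrequency would report on the target. */
uint32_t CalcDeltaUS(int64_t startTicks, int64_t currentTicks,
                     int64_t ticksPerSecond)
{
    int64_t delta;
    int64_t wholeSeconds;
    int64_t remainderTicks;

    assert(ticksPerSecond > 0);

    delta = currentTicks - startTicks;
    if (delta < 0)
    {
        return 0;   /* Counter went backwards; report no elapsed time */
    }

    /* Split into whole seconds and a remainder so that the
       multiplication by 1,000,000 cannot overflow 64 bits. */
    wholeSeconds   = delta / ticksPerSecond;
    remainderTicks = delta % ticksPerSecond;

    /* The 32-bit result wraps after roughly 71 minutes, which is
       far longer than the 1 second spin measured here. */
    return (uint32_t)(wholeSeconds * 1000000
                      + (remainderTicks * 1000000) / ticksPerSecond);
}
```

With a counter that ticks 1,000 times per second, 500 ticks come out as 500,000 us, so the spin loop in the stall thread would exit after 1,000 ticks.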

We are going to create this thread and set its priority to 20. Windows CE .NET makes 256 priorities available to applications. The highest priority is priority 0; increasing priority numbers represent lower priority threads. The result of STThreadHigh's execution on the CPU is that all threads with priority lower than 20 (priorities 21-255) are unable to run except during its brief sleeps. Drivers, the file system, the GUI, and ISTs running at those priorities are starved, waiting to be scheduled.
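
The effect of this priority numbering can be sketched with a toy model. This is not the Windows CE scheduler, only an illustration of the rule described above: among the threads that are ready to run, the one with the lowest priority number gets the CPU. The ToyThread type and the PickNextThread function are hypothetical names introduced for this sketch.

```c
#include <stddef.h>

/* Hypothetical model of a thread as the scheduler sees it */
typedef struct
{
    int priority;   /* 0 = highest priority, 255 = lowest */
    int ready;      /* Nonzero when the thread wants the CPU */
} ToyThread;

/* Returns the index of the ready thread with the lowest priority
   number, or -1 if no thread is ready. Ties go to the earliest
   entry; a real scheduler would share the CPU between them. */
int PickNextThread(const ToyThread *threads, size_t count)
{
    int    best = -1;
    size_t i;

    for (i = 0; i < count; i++)
    {
        if (!threads[i].ready)
        {
            continue;
        }
        if (best < 0 || threads[i].priority < threads[best].priority)
        {
            best = (int)i;
        }
    }
    return best;
}
```

While the priority 20 entry is ready, which in our case is for the whole one second spin, entries at priority 21 or 251 are never selected. They are picked only once the priority 20 entry is marked not ready, which corresponds to the 5 ms sleep.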

The deadlock-stall thread and event initialization can be placed anywhere within an application. Examine the function:

void InitializeDeadStallThread( void )
{
    DWORD dwThreadID;               // Receives the new thread's ID
    int   nSTHighPriority = 20;     // Stall thread priority

    // Initialize the Critical Section
    //
    InitializeCriticalSection( &g_csSTCriticalSection );
    // Create the High Priority Thread
    //
    g_htSTHigh = CreateThread(
                NULL,             // CE Has No Security
                0,                // Default Stack Size
                STThreadHigh,     // Deadlock-Stall Thread
                NULL,             // No Parameters
                CREATE_SUSPENDED, // Create Suspended
                &dwThreadID       // Thread Id
                );
    // Look for a good thread handle
    //
    if( !g_htSTHigh )
    {
        RETAILMSG(1,(TEXT("RTL-ST : Failed creating High Thread.\r\n")));
        return;
    }
    // Set the thread priority to real time
    //
    if( !CeSetThreadPriority( g_htSTHigh, nSTHighPriority ))
    {
        RETAILMSG(1,(TEXT("RTL-ST : Failed setting High Thread Priority.\r\n")));
        return;
    }
    // Get the thread going
    //
    g_fSTRun = TRUE;
    ResumeThread( g_htSTHigh );
}

The key steps in initializing the deadlock-stall thread are:

  • Initialize the Critical Section.
  • Create the High Priority Thread suspended.
  • Set the High Priority Thread's priority.
  • Resume the Stall Thread to get it started.

Strategy for Priority Runner

The basic strategy for finding which thread priority is stalled is simple. We will use a thread to run through the priorities from start to finish. The priority runner thread will be launched and immediately wait on a start event. Upon receiving the start event, the thread will, at each priority, flash the LED for 100 ms, increment and set its priority, sleep for 1 second, and check whether it has reached the stop priority. Below is a diagram of this behavior.

Figure 3.
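
The strategy can also be sketched as a small model that predicts how many flashes to expect. This is only a toy calculation, not code for the target. It assumes that the runner is scheduled while its priority number is less than or equal to that of the stalling thread (a thread at equal priority still gets a share of the CPU) and is starved once it drops below it. CountFlashes is a hypothetical name.

```c
/* Toy model of the Priority Runner: counts the LED flashes produced
   before the runner is starved by a thread at stallPriority.
   Assumption: the runner is scheduled only while its priority
   number is <= stallPriority. */
int CountFlashes(int startPriority, int stopPriority, int stallPriority)
{
    int flashes  = 0;
    int priority = startPriority;

    while (priority < stopPriority)
    {
        if (priority > stallPriority)
        {
            break;          /* Starved: no more flashes are seen */
        }
        flashes++;          /* Flash once at this priority */
        priority++;         /* Then step down to the next priority */
    }
    return flashes;
}
```

For a start priority of 0, a stop priority of 255, and a stalling thread at priority 20, the model returns 21, one flash for each of priorities 0 through 20.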

The Priority Runner functionality can be implemented with your choice of mechanisms. If possible, ensure that it can be started by a hardware switch or push button, because the graphics environment on the target may be prevented from running, depending on the rogue thread's priority. The external switch will be tied to an IST. The following code examples implement the setting of the start event from an IST.

Implementing Priority Runner

There are four key parts to the implementation, covered in the four sections that follow. The following global variables and prototypes are required by the functions in those sections:

// Globals
//
HANDLE            g_hevInterrupt;           // Interrupt Event
HANDLE            g_htIST;                  // Interrupt Thread
DWORD             g_dwInterruptCount;       // Interrupt Count
DWORD             g_dwSysInt = 8;           // System Interrupt Number
BOOL              g_fRun = FALSE;           // Interrupt Running Flag
CRITICAL_SECTION  g_csSTCriticalSection;    // ST Critical Section
HANDLE            g_htSTHigh;               // ST High Priority Thread Handle
BOOL              g_fSTRun = TRUE;          // ST Running Flag
HANDLE            g_htPR;                   // PR Thread
int               g_nPRStartPriority;       // PR Start Thread Priority
int               g_nPRStopPriority;        // PR Stop Thread Priority
int               g_nPRPriority;            // PR Priority
BOOL              g_fPRRunning = FALSE;     // PR Running Flag
HANDLE            g_hevPRFinished;          // PR Finished Event
HANDLE            g_hevPRStart;             // PR Start Event

// Prototypes
//
DWORD WINAPI      ThreadIST( LPVOID lpvParam );       // Interrupt Routine IST
DWORD WINAPI      STThreadHigh( LPVOID lpvParam );    // ST High Priority Thread
ULONG             STCalcDeltaUS( LARGE_INTEGER liStart,
                                 LARGE_INTEGER liCurrent );
                                                      // ST Calc Delta us
DWORD WINAPI      PRThread( LPVOID lpvParam );        // PR Priority Runner Thread
DWORD             PRLED( BYTE ucPort, BOOL fState );  // LED Control
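
The body of PRLED depends on how the LEDs are wired on your board, so it is not shown in this article. As a rough sketch of what such a routine might look like, assume that each LED is one active-high bit in a single memory-mapped register. Here the register is simulated by an ordinary variable so that the logic can be exercised off the target. The names SketchLED and g_ulLedRegister are hypothetical, as is the bit layout.

```c
#include <stdint.h>

/* Simulated LED register. On real hardware this would be a pointer
   to a memory-mapped port; the layout here is an assumption. */
static volatile uint32_t g_ulLedRegister = 0;

/* Turns LED number ucPort on (fState nonzero) or off (fState zero)
   by setting or clearing its bit. Returns the register value after
   the update. */
uint32_t SketchLED(uint8_t ucPort, int fState)
{
    uint32_t ulMask;

    if (ucPort >= 32)
    {
        return g_ulLedRegister;     /* No such LED; leave state alone */
    }

    ulMask = (uint32_t)1 << ucPort;
    if (fState)
    {
        g_ulLedRegister |= ulMask;  /* Set the bit: LED on */
    }
    else
    {
        g_ulLedRegister &= ~ulMask; /* Clear the bit: LED off */
    }
    return g_ulLedRegister;
}
```

If your LEDs are active low, or sit behind a GPIO driver rather than a raw register, only the two lines that set and clear the bit need to change.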

Thread and Event Initialization

The thread and event initialization can be placed anywhere within an application. The following function demonstrates the necessary initialization:

void InitializePriorityRunner( void )
{
    DWORD dwThreadID;    // Receives the new thread's ID

    // Create the finished event
    //
    g_hevPRFinished = CreateEvent(NULL, FALSE, FALSE, NULL);
    if (g_hevPRFinished == NULL)
    {
        RETAILMSG(1, (TEXT("RTL-PR : Finished Event creation failed!!!\r\n")));
        return;
    }
    // Create the start event
    //
    g_hevPRStart = CreateEvent(NULL, FALSE, FALSE, NULL);
    if (g_hevPRStart == NULL)
    {
        RETAILMSG(1, (TEXT("RTL-PR : Start Event creation failed!!!\r\n")));
        return;
    }
    // Create the Priority Runner Thread
    //
    g_htPR = CreateThread( NULL,                // Windows CE Has No Security
                           0,                   // Default Stack Size
                           PRThread,            // Priority Runner Thread
                           NULL,                // No Parameters
                           CREATE_SUSPENDED,    // Create Suspended
                           &dwThreadID          // Thread Id
                         );
    // Look for a good Thread handle
    //
    if( !g_htPR )
    {
        RETAILMSG(1,(TEXT("RTL-PR : Failed create Priority Runner Thread.\r\n")));
        return;
    }
    // Setup the start and stop priorities. The stop value shown here
    // (255, the lowest priority) is an example; adjust it as needed.
    //
    g_nPRStartPriority = 0;
    g_nPRStopPriority  = 255;
    g_nPRPriority      = g_nPRStartPriority;
    // Set the thread priority to the start priority
    //
    if( !CeSetThreadPriority( g_htPR, g_nPRPriority ))
    {
        RETAILMSG(1,(TEXT("RTL-PR : Failed setting Start Priority.\r\n")));
        return;
    }
    // Get the thread started
    //
    g_fPRRunning = TRUE;
    ResumeThread( g_htPR );
}

The key steps in initializing the priority runner are:

  • Create the Start and Finished Events.
  • Create a Priority Runner thread that is suspended.
  • Set the thread priority to the start priority.
  • Set the g_fPRRunning Flag to true.
  • Resume the Thread.

Priority Runner Thread

The Priority Runner Thread is in the function PRThread.

DWORD WINAPI PRThread( LPVOID lpvParam )
{
    DWORD dwCount;
    DWORD dwStatus;

    // Wait for the start event and make sure we have the object
    //
    dwStatus = WaitForSingleObject( g_hevPRStart, INFINITE );
    if( dwStatus != WAIT_OBJECT_0 )
    {
        return 0;
    }
    // Run until we have reached the stop priority
    //
    while( g_nPRPriority < g_nPRStopPriority )
    {
        // Capture the current priority so that it can be logged
        //
        dwCount = (DWORD) g_nPRPriority;
        // Set the LED On
        //
        PRLED( 1, TRUE );
        // Wait
        //
        Sleep( 100 );
        // Set the LED Off
        //
        PRLED( 1, FALSE );
        // Go to the next Priority
        //
        g_nPRPriority++;
        // Set the new thread priority
        //
        if( !CeSetThreadPriority( g_htPR, g_nPRPriority ))
        {
            RETAILMSG(1,(TEXT("RTL-PR : Failed Setting Priority %d!!!\r\n"),
                      g_nPRPriority));
            return 0;
        }
        // Wait for next cycle
        //
        Sleep( 1000 );
    }
    // We are done, so set the finished event
    //
    SetEvent( g_hevPRFinished );
    return 0;
}

The key components to this Priority Runner thread are:

  • Waiting for the Start Event, which is set in the IST based on a button press.
  • Cycle through the priorities from the start priority up to the stop priority.
  • Turn the LED On.
  • Wait 100 ms.
  • Turn the LED Off.
  • Set our thread priority to the next thread priority.
  • Sleep for 1000 ms (1 second).
  • Repeat cycling through the thread priorities.
  • Set the g_hevPRFinished Event.
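
It helps to know roughly how long a sweep takes before you start counting flashes. Each step of the loop costs one flash (100 ms) plus one pause (1000 ms). The following sketch does that arithmetic; SweepDurationMs is a hypothetical helper, and because Sleep only guarantees a minimum delay, the result is a lower bound rather than an exact figure.

```c
/* Minimum time, in milliseconds, for the Priority Runner to walk
   from startPriority to stopPriority when nothing starves it.
   Each step is one LED flash (flashMs) plus one pause (pauseMs). */
long SweepDurationMs(int startPriority, int stopPriority,
                     long flashMs, long pauseMs)
{
    if (stopPriority <= startPriority)
    {
        return 0;   /* Nothing to sweep */
    }
    return (long)(stopPriority - startPriority) * (flashMs + pauseMs);
}
```

With the timings used in PRThread, an unobstructed sweep from priority 0 to priority 255 takes at least 280,500 ms, or a little over four and a half minutes. The flash at priority 20 begins about 22 seconds after the start event.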

Interrupt Initialization

There are two key parts to the interrupt code: the interrupt initialization and the IST itself. The interrupt initialization can be placed in the initialization code of your application or executed in response to a particular start event. Examine the following code snippet:

void InitializeInterrupt( void )
{
    BOOL  fRetVal;          // Result of the IRQ translation
    DWORD dwThreadID;       // Receives the new thread's ID
    int   nISTPriority;     // IST priority

    // Create an event
    //
    g_hevInterrupt = CreateEvent(NULL, FALSE, FALSE, NULL);
    if (g_hevInterrupt == NULL)
    {
        RETAILMSG(1, (TEXT("RTL-IST: Event creation failed!!!\r\n")));
        return;
    }
    // Have the OEM adaptation layer (OAL) translate the interrupt request
    // (IRQ) to a system interrupt. dwIrq holds the platform-specific IRQ
    // number for the button and is defined elsewhere.
    //
    fRetVal = KernelIoControl( IOCTL_HAL_TRANSLATE_IRQ,
                               &dwIrq,
                               sizeof( dwIrq ),
                               &g_dwSysInt,
                               sizeof( g_dwSysInt ),
                               NULL );
    if( !fRetVal )
    {
        RETAILMSG(1, (TEXT("RTL-IST: IRQ translation failed!!!\r\n")));
        return;
    }
    // Create a thread that waits for signaling
    //
    g_fRun = TRUE;
    g_htIST = CreateThread(NULL,             // Windows CE Has No Security
                           0,                // Default Stack Size
                           ThreadIST,        // Interrupt Thread
                           NULL,             // No Parameters
                           CREATE_SUSPENDED, // Create Suspended
                           &dwThreadID       // Thread Id
                           );
    // Set the thread priority to real time
    //
    nISTPriority = 0;
    if( !CeSetThreadPriority( g_htIST, nISTPriority ))
    {
        RETAILMSG(1,(TEXT("RTL-IST: Failed setting Thread Priority.\r\n")));
        return;
    }
    // Initialize the interrupt
    //
    if ( !InterruptInitialize(g_dwSysInt,g_hevInterrupt,NULL,0) )
    {
        RETAILMSG (1, (TEXT("RTL-IST: InterruptInitialize failed!!!\r\n")));
        return;
    }
    // Update the dialog with the thread number. These calls assume the
    // function is a member of the application's dialog class.
    //
    m_szISTHandle.Format( L"0x%08X", g_htIST );
    UpdateData( FALSE );
    m_ISTEnableButton.EnableWindow    ( FALSE );
    m_ISTDisableButton.EnableWindow    ( TRUE );
    // Get the thread started
    //
    ResumeThread( g_htIST );
}

The key steps in initializing an interrupt are:

  • Create an Event.
  • Get the System Interrupt number for your IRQ.
  • Create an interrupt service thread (IST) that is suspended.
  • Set the thread priority to the highest priority, which is 0.
  • Call InterruptInitialize to create an association of the IRQ->Event.
    • Creating an IST that isn't suspended may cause InterruptInitialize to fail because the event is already being waited on.
  • Resume the IST.

IST-Interrupt Service Thread

The actual IST is in the function ThreadIST. When it sees a button release, the IST sets the g_hevPRStart event to get the Priority Runner thread started.

DWORD WINAPI ThreadIST( LPVOID lpvParam )
{
    DWORD dwStatus;
    BOOL  fState = TRUE;

    // Always check the running flag
    //
    while( g_fRun )
    {
        dwStatus = WaitForSingleObject(g_hevInterrupt, INFINITE);
        // Check to see if we are finished
        //
        if( !g_fRun ) return 0;
        // Make sure we have the object
        //
        if( dwStatus == WAIT_OBJECT_0 )
        {
            // Only look for button ups. We get an interrupt on up and down.
            // g_pButtonPort and BUTTON_MASK are platform specific.
            //
            if (!( READ_REGISTER_ULONG(g_pButtonPort) & BUTTON_MASK))
            {
                RETAILMSG(1, (TEXT("Button up...")));
                g_dwInterruptCount ++;
                // Store our count out to the CELOG
                //
                CELOGDATA( TRUE,
                           CELID_RAW_LONG,
                           &g_dwInterruptCount,
                           (WORD) (sizeof(DWORD)),
                           1,
                           CELZONE_MISC);
                RETAILMSG(1, (TEXT("RTL-IST  \\Button Count : %d! \r\n"),
                          g_dwInterruptCount));
                // See if we are to start the Priority Runner
                //
                if( g_fPRRunning )
                {
                    SetEvent( g_hevPRStart );
                }
                // Flash the LED
                //
                PRLED( 0, fState );
                fState = !fState;
            }
            // Finish the interrupt
            //
            InterruptDone( g_dwSysInt );
        }
    }
    return 0;
}

The key components to this IST interrupt handling thread are:

  • Wait for the Interrupt Event.
  • Confirm that we have a pulsed event from the OS.
  • Confirm the button is in an up state.
  • Handle the interrupt in the shortest time possible.
  • Create CELOGDATA to be viewed in Kernel Tracker.
  • Check to see if the g_fPRRunning flag is set and then set the g_hevPRStart Event.
  • Toggle the LED state.
  • Report InterruptDone().
    • The OS will not provide another interrupt on this IRQ until the InterruptDone is reported.
  • Wait for the Interrupt Event again.
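
Because the button raises an interrupt on both press and release, the IST has to filter for releases. The following sketch isolates that test so that it can be tried off the target. It assumes, as the IST does, that the masked bit reads 0 when the button is up. SKETCH_BUTTON_MASK, IsButtonUp, and CountButtonUps are hypothetical names, and the mask value is an example only.

```c
#include <stddef.h>
#include <stdint.h>

/* Example mask: the button is assumed to be bit 0 of the port */
#define SKETCH_BUTTON_MASK 0x00000001u

/* Returns nonzero when the masked button bit is clear (button up) */
int IsButtonUp(uint32_t ulPortValue)
{
    return (ulPortValue & SKETCH_BUTTON_MASK) == 0;
}

/* Given one port reading per interrupt, counts how many of those
   interrupts were button releases. */
unsigned CountButtonUps(const uint32_t *pulReadings, size_t count)
{
    unsigned ups = 0;
    size_t   i;

    for (i = 0; i < count; i++)
    {
        if (IsButtonUp(pulReadings[i]))
        {
            ups++;
        }
    }
    return ups;
}
```

A press followed by a release produces two interrupts but only one count, which is why a single push of the button sets the start event once.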

We are now ready to run and test the Priority Runner.

Testing Priority Runner

Integrate the previous functions into your application. Once you have initialized the events, start the interrupt handler, deadlock-stall, and Priority Runner threads. Then set the start event for the Priority Runner; in this example, you push the external button, which causes the IST to execute. The Priority Runner thread flashes the LED once at each priority, beginning with the start priority, which is 0 in this case. Count the flashes. The LED stops flashing when the Priority Runner moves past the stalled priority, which is 20 in this case. You should count 21 LED flashes in total, one for each of priorities 0 through 20. You can configure the start and stop priorities for Priority Runner in the initialization code.
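
Turning the flash count back into a priority is simple arithmetic, sketched here for completeness. LastScheduledPriority is a hypothetical helper; it returns the last priority at which the Priority Runner was still scheduled, which in our scenario is the priority of the stalling thread.

```c
/* Converts an observed flash count into the last priority at which
   the Priority Runner was scheduled. Returns -1 if no flash was
   seen, meaning the runner never ran at the start priority. */
int LastScheduledPriority(int startPriority, int flashCount)
{
    if (flashCount <= 0)
    {
        return -1;
    }
    /* The first flash happens at startPriority itself */
    return startPriority + flashCount - 1;
}
```

Counting 21 flashes from a start priority of 0 gives priority 20, which matches the priority we assigned to the deadlock-stall thread.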

Summary

Windows CE .NET provides the embedded developer a rich development environment and tools to develop and debug embedded applications. Kernel Tracker provides a high resolution view into the scheduler of Windows CE .NET. This tool will serve most of your debugging needs. As with all embedded systems, there are times when hardware-only interface solutions are required to identify and correct problems. The Priority Runner application provides one such mechanism. Debugging stalled thread priorities is difficult without such a tool. The development of embedded applications and debugging tools must be based on an understanding of all operating system mechanisms. This article serves as a beginning to the process. Good luck on the journey!