February 2010

Volume 25 Number 02

Going Places - Gesture Magic

By Marcus Perryman | February 2010

Touchable screens have been synonymous with Windows Mobile since the first devices appeared back in 2002; however, Windows Mobile 6.5 is the first version to claim any form of gesture support that is exposed to developers. So what is a gesture and why all the fuss?

The traditional touch screens found on Windows Mobile Professional devices provide a mouse simulation surface producing mouse-left-button and mouse-move messages through the screen driver interface. These messages are processed and delivered as if the screen and stylus were a physical mouse, and there are definite similarities: a mouse produces a stream of location coordinates in a linear fashion and can be used as a very precise pointing device, just like a stylus on a screen.

There are differences, as well. For example, a mouse sends position information independent of button information, but the touch screen always simulates the left button being pressed and sends position information only when there is contact with the screen. This paradigm can continue as long as the similarities remain strong. However, with the ever-increasing screen sizes on modern phones, the most natural and intuitive stylus rapidly becomes the user’s index finger. For consumer markets, reliance on a fiddly and easily lost stylus is fast going out of fashion, replaced with the demand for a bold and interactive interface that shouts “touch me!” to encourage an emotional connection with a user.

Fingers and thumbs present a different profile, in total contrast to the precision of a stylus tip, so we see the similarities to mouse input break down. The input data is no longer pinpoint accurate, and often the linear input is more akin to a lunar orbit than a straight line. And it’s not just the data that’s different; input is expected to result in a smooth, animated response, proportional to the input sequence. At this point, it’s clear the mouse paradigm no longer fits and we need something new and different to help describe this type of input and understand how to respond. Enter gestures.

It’s Not Just About Touch Gestures

Before we get to the details of touch gestures, let’s take a moment to step back and think on a broader level about gestures in general. A gesture can mean lots of different things. It might be a finger movement on a computer screen, but shaking your head is also a gesture, as is waving your arm or shaking hands with someone. My point is that it would be shortsighted to just consider the input on a screen as the only source of gestures. Many devices today have multiple sensors, including touch screens, accelerometers, compasses, GPS instruments and cameras. Shaking a device, turning it over, turning it around in a circle or even just smiling at the camera could all be interpreted as gestures to which the software needs to respond, and that’s just with the sensors we know about today.

With this in mind, the architecture in Windows Mobile 6.5 was designed to separate a gesture’s source and recognition process from the routing, delivery and response to that gesture. Although we may have only touch gesture recognition today, new gestures can be delivered through the system once the sensor and recognition components are present. New sensors and recognition software can be added by hardware manufacturers and integrated into the existing gesture delivery architecture—see Figure 1. I’ll come back later to look more closely at gesture targeting and delivery.

Figure 1 General Gesture Architecture

Touch Recognizer

Figure 2 shows the new touch gesture components in Windows Mobile 6.5: Gesture Recognizer, Gesture Delivery, Physics Engine, and Window Auto Gesture (WAG). We will take a look at each, starting with the Gesture Recognizer.

Figure 2 Touch Gesture Components

The Gesture Recognizer component connects directly to the input from the existing touch driver. The input information provided by the driver remains unchanged in Windows Mobile 6.5 in order to keep OEM development costs low and to encourage adoption.

There are five recognized touch gestures in Windows Mobile 6.5 (see Figure 3):

  • Select: touch location stays within a movement threshold (that is, the maximum allowed movement of the finger), and touch duration is less than the threshold time.
  • Hold: touch location stays within a more lenient threshold and exceeds the select threshold time.
  • Double Select: two correctly recognized select gestures are recognized within a threshold time, and occur within a distance tolerance.
  • Pan: touch point movement exceeds the select threshold. This gesture is slightly different and is classed as continuous because it generates more than one gesture event.
  • Scroll: the most complex gesture to recognize, as it has speed, angle deviation and distance thresholds.

Figure 3 Five Core Gestures
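
The threshold rules for Select, Hold and Pan can be sketched in a few lines of C++. This is only an illustration of the idea, not the recognizer's actual logic: the names (TouchThresholds, classifyTouch) and every threshold value are assumptions of mine, the touch is classified only once it has finished, and Double Select and Scroll are left out. Thresholds are stored in inches and converted with the screen's dots per inch, so the same finger travel is used on every screen.

```cpp
#include <cassert>
#include <cmath>

// Gesture kinds from the list above (Double Select and Scroll omitted).
enum class TouchGesture { Select, Hold, Pan };

// Hypothetical threshold values; the real recognizer's values are not
// given in this article.
struct TouchThresholds {
    double selectMoveInches = 0.05;  // maximum finger travel for Select
    double holdMoveInches   = 0.10;  // more lenient travel allowed for Hold
    int    selectTimeMs     = 400;   // a Select must finish before this
};

// Convert a physical distance to pixels for a given screen density.
inline double inchesToPixels(double inches, int dpi) {
    assert(dpi > 0);
    return inches * dpi;
}

// Classify a finished touch from its total travel (pixels) and duration.
inline TouchGesture classifyTouch(double dxPx, double dyPx, int durationMs,
                                  int dpi,
                                  const TouchThresholds& t = TouchThresholds()) {
    const double travel = std::hypot(dxPx, dyPx);
    if (durationMs < t.selectTimeMs &&
        travel <= inchesToPixels(t.selectMoveInches, dpi)) {
        return TouchGesture::Select;
    }
    if (durationMs >= t.selectTimeMs &&
        travel <= inchesToPixels(t.holdMoveInches, dpi)) {
        return TouchGesture::Hold;
    }
    return TouchGesture::Pan;  // travel exceeded the applicable threshold
}
```

With these assumed values, an 8-pixel movement is a Pan at 100 dpi but still a Select at 200 dpi, because the physical distance covered is halved.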

You might wonder why we have some of these gestures, because the mouse behavior appears sufficient to acquire the same information. For example, the Select gesture seems just like clicking on a button, and Pan is just like a mouse move. There are two main reasons why all five of these gestures are important.

Consistency: A mouse click is received as two messages, down and up of the mouse button. The exact behavior for recognizing a click is specific to the control that recognizes it. For example, a button control recognizes the mouse down and mouse up as a click when both locations are within the window’s bounds. In contrast, the ListView control recognizes the same event, but for each item in its list. The Select gesture is recognized independently of the control, using consistent parameters. The distance thresholds used for gesture recognition are resolution-aware (or more accurately, dots-per-inch-aware) and are set in order to work with the broadest range of finger profiles (there is a surprising range of finger shapes). So the same physical distances are used on different-sized screens to provide consistency among devices.

Routing: A finger is not an accurate pointing device, especially when the user is moving or walking around, so it’s vital that applications maximize the touchable target area. The Gesture Delivery component implements some specific rules to assist with this task and increase the value of these simple gestures.

Routing

Gesture information is delivered via the new WM_GESTURE message, and as with all window messages, there are associated parameters (DWORD wParam and LONG lParam) that contain the details of the message. The wParam carries the gesture ID, indicating which gesture is being delivered, and the lParam carries a handle to the full gesture information. A mouse message is always sent to the topmost window at the location of the mouse coordinates (discounting mouse capture scenarios), but for gestures the rules are different: gesture messages are always sent to the topmost window under the very first touch point of the gesture sequence. This subtlety doesn’t make much of an impact for Select, Hold and Double Select gestures, which have only small screen movement tolerances. However, the Pan gesture is quite different. When you start panning, all Pan messages are sent to the window in which the panning starts, even if the panning movement takes the touch point outside of that original window.

In the same way, the Scroll gesture is recognized many pixels from its original touch-point location. But it makes sense that the Scroll should be routed to the same window as the preceding Pan messages, as the user started the input sequence in that original control and intended to target it. Considering that the Pan gesture is often associated with direct manipulation—moving content around the screen as if it were a piece of paper on a desktop—this routing makes a lot of sense, because the control or screen point under the finger on the initial touch should remain under the finger as the content is moved around the screen.
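
To make the routing rule concrete, here is a small simulation in C++. It does not use any Windows Mobile API: Rect, SimWindow and GestureRouter are hypothetical types I've made up for the sketch. The target is found by hit-testing the first touch point only, and every later point in the sequence goes to that same target wherever the finger has moved.

```cpp
#include <string>
#include <utility>
#include <vector>

struct Rect {
    int left, top, right, bottom;
    bool contains(int x, int y) const {
        return x >= left && x < right && y >= top && y < bottom;
    }
};

// Hypothetical stand-in for a window; not a real HWND.
struct SimWindow {
    std::string name;
    Rect bounds;
};

class GestureRouter {
public:
    // Windows are ordered bottom-most first; later entries sit on top.
    explicit GestureRouter(std::vector<SimWindow> windows)
        : windows_(std::move(windows)) {}

    // First touch point: hit-test once and remember the target.
    // Returns nullptr if no window is under the point.
    const SimWindow* begin(int x, int y) {
        target_ = nullptr;
        for (auto it = windows_.rbegin(); it != windows_.rend(); ++it) {
            if (it->bounds.contains(x, y)) {
                target_ = &*it;
                break;
            }
        }
        return target_;
    }

    // Later points in the same sequence: no hit-testing, same target.
    const SimWindow* route(int /*x*/, int /*y*/) const { return target_; }

    // End of the sequence: forget the target.
    void end() { target_ = nullptr; }

private:
    std::vector<SimWindow> windows_;
    const SimWindow* target_ = nullptr;
};
```

If a list control covers the top of a form and the user starts a pan on the list, route() keeps returning the list even after the finger has moved down over the form.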

Unhandled Message Routing

Another unusual aspect of gesture message routing is what happens to unhandled gesture messages. Like all unhandled messages, they end up being sent to DefWindowProc for default processing. When DefWindowProc receives a gesture message, it attempts to find the window’s parent and send the message on to that window. This is done to maximize the touchable area available to the user.

To help explain, consider a scrollable window with a number of child label controls. The parent window implements Pan and Scroll gesture response logic to move the child label controls up and down on the visible surface. However, the label controls are unmodified and know nothing about gesture support. If the user happens to start a gesture by touching on a label control instead of the parent window, the user’s expectation is the same—that the form will move in response to input movement. By forwarding the unhandled gesture messages from the label control to the parent window, the user’s expectation is met and the content moves as if the user had touched on the form directly. This behavior is illustrated in Figure 4.

Figure 4 Message Routing
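
The forwarding behavior can be modeled as a walk up the parent chain. The sketch below is a simulation with hypothetical names (GestureWindow, deliverGesture), not the DefWindowProc implementation. The hop limit is my own crude stand-in for loop detection and says nothing about how the real detection works.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a window in a parent/child hierarchy.
struct GestureWindow {
    std::string name;
    GestureWindow* parent;   // nullptr for a top-level window
    bool handlesGestures;    // true if the window processes gesture messages
};

// Returns the window that ends up handling the gesture, or nullptr if no
// window in the chain handles it. maxHops guards against a cyclic chain.
inline const GestureWindow* deliverGesture(const GestureWindow* target,
                                           int maxHops = 32) {
    assert(maxHops > 0);
    for (int hop = 0; target != nullptr && hop < maxHops; ++hop) {
        if (target->handlesGestures) {
            return target;
        }
        target = target->parent;  // default processing: pass to the parent
    }
    return nullptr;
}
```

A label that ignores gestures and sits on a gesture-aware form resolves to the form, which is the behavior the user expects.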

There is a small gotcha to call out here: Never send gesture messages from parent to child window or you risk invoking an infinite loop and an inevitable stack overflow crash. There is some basic loop detection implemented in DefWindowProc to try to prevent this situation, but it may not detect all occurrences.

Gesture Messages

Windows Mobile 6.5 recognizes five gestures, but applications can receive seven gesture types. The extra two gesture types are BEGIN and END, sent at the beginning and end of a gesture sequence (all gesture types are prefixed with GID_ to indicate Gesture IDentifier, so these are GID_BEGIN and GID_END). For example, if a Select gesture is recognized, the application will receive three gesture messages: GID_BEGIN, GID_SELECT and GID_END. For a Pan sequence ending in a Scroll gesture, the application will receive GID_BEGIN, GID_PAN, GID_PAN …, GID_SCROLL and finally GID_END.

GID_BEGIN is useful as it contains the screen coordinates of the original touch point. GID_END is handy as it indicates when the user input has ended and no further gestures will be sent for the current sequence.
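
One way an application might keep track of a sequence is a small state object that remembers the origin from the BEGIN message and resets on END. The sketch below is a simulation: the GestureId enumeration is a hypothetical stand-in for the GID_ identifiers (its values are not the real ones), and GestureSequence is a name I've invented.

```cpp
// Hypothetical stand-ins for the GID_ identifiers; values are not real.
enum class GestureId { Begin, End, Select, Hold, DoubleSelect, Pan, Scroll };

class GestureSequence {
public:
    // Feed one gesture message. Returns false if it arrives out of order,
    // for example a Pan with no preceding Begin.
    bool onGesture(GestureId id, int x, int y) {
        if (id == GestureId::Begin) {
            if (active_) return false;
            active_ = true;
            originX_ = x;          // Begin carries the original touch point
            originY_ = y;
            panCount_ = 0;
            return true;
        }
        if (!active_) return false;
        if (id == GestureId::End) {
            active_ = false;       // no more gestures for this sequence
        } else if (id == GestureId::Pan) {
            ++panCount_;           // Pan is continuous: many per sequence
        }
        return true;
    }

    bool active() const { return active_; }
    int originX() const { return originX_; }
    int originY() const { return originY_; }
    int panCount() const { return panCount_; }

private:
    bool active_ = false;
    int originX_ = 0;
    int originY_ = 0;
    int panCount_ = 0;
};
```

Feeding it Begin, Pan, Pan, Scroll, End leaves the object inactive with the original touch point and a pan count of two still available for inspection.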

To help introduce the basic gesture recognition and delivery system in Windows Mobile 6.5, I’ve included a Visual Studio project in the attached samples called SimpleGestureCapture. This sample shows a listbox and adds a new line for every gesture message received by the main window, including location information for all gestures and the angle and speed of scroll gestures. You will need Visual Studio 2005 or Visual Studio 2008 plus the Windows Mobile 6 Professional SDK and the Windows Mobile 6.5 Developer Tool Kit installed. From this sample you can see how the gesture message is received and the data extracted.

Physics

The most exciting part of gesture support is the natural response users experience when manipulating screen content. The key part of this response is the consistent, predictable and natural experience across the device. To achieve this consistency, a new component has been added to the OS called the Physics Engine. This module provides a suite of number-crunching algorithms that take input information, such as the angle and speed from a Scroll gesture, and decay the speed over time using a specific deceleration coefficient. Also, the Physics Engine can be used to apply boundary animations when the input speed is sufficient to move the animation point outside a bounding rectangle.

To use the Physics Engine in Windows Mobile 6.5, a new instance of the Physics Engine must first be created and initialized. Then, at regular time intervals, it’s polled to retrieve the current animation location and the calling application redraws its client region appropriately. The Physics Engine will continue to decay the speed of the animation until it falls below a minimum threshold value, at which point it’s marked as complete and can be released.

As part of the initialization data, the application must specify the bounding rectangle of the data space as well as the view rectangle for the display space (see Figure 5). If the view rectangle moves outside the bounding rectangle, the Physics Engine will use the selected boundary animation (again, part of the initialization data) to bring the view rectangle back inside. The Physics Engine initialization is flexible enough to allow animation in just one axis or to have different boundary animation for each axis if required.

Figure 5 How the Physics Engine Handles Bounding and Display Rectangles
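
The relationship between the two rectangles reduces to a simple calculation per axis. The following C++ sketch shows that arithmetic for one axis only; the function names are hypothetical and it says nothing about how the Physics Engine animates the return, only where the view would need to end up.

```cpp
#include <cassert>

// How far the view sticks out of the bounds along one axis:
// negative when it is before the start, positive when it is past the end,
// zero when it lies fully inside.
inline int overshoot(int viewStart, int viewLength,
                     int boundsStart, int boundsLength) {
    assert(viewLength >= 0 && boundsLength >= viewLength);
    if (viewStart < boundsStart) {
        return viewStart - boundsStart;
    }
    const int viewEnd = viewStart + viewLength;
    const int boundsEnd = boundsStart + boundsLength;
    if (viewEnd > boundsEnd) {
        return viewEnd - boundsEnd;
    }
    return 0;
}

// Position the view must be brought back to so it is inside the bounds.
inline int pullBackTarget(int viewStart, int viewLength,
                          int boundsStart, int boundsLength) {
    return viewStart -
           overshoot(viewStart, viewLength, boundsStart, boundsLength);
}
```

For a 320-pixel view over 1,000 pixels of data, a view position of 700 overshoots the end by 20 pixels and would be pulled back to 680.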

By default the Physics Engine decays the speed based on a time delta taken from the point of initialization to the time of each location retrieval call. The calling app can override this by specifying a “user time” value and have the Physics Engine calculate the location at that time. This can be useful for finding the screen position where an animation will complete.
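
To show how a time-based decay and a "user time" query fit together, here is a sketch that assumes an exponential decay of speed. The Physics Engine's actual decay model is not described in this article, so treat the formula, the ScrollDecay structure and the function names as assumptions. Because position depends only on elapsed time, the same function serves both regular polling and asking where the animation will finish.

```cpp
#include <cassert>
#include <cmath>

// Assumed model: speed(t) = startSpeed * exp(-decayPerSec * t).
struct ScrollDecay {
    double startPos;     // pixels
    double startSpeed;   // pixels per second (sign gives direction)
    double decayPerSec;  // deceleration coefficient, must be > 0
    double minSpeed;     // animation is complete below this speed
};

inline double speedAt(const ScrollDecay& d, double t) {
    return d.startSpeed * std::exp(-d.decayPerSec * t);
}

// Position after t seconds: the integral of speedAt from 0 to t.
inline double positionAt(const ScrollDecay& d, double t) {
    assert(d.decayPerSec > 0.0);
    return d.startPos +
           d.startSpeed / d.decayPerSec * (1.0 - std::exp(-d.decayPerSec * t));
}

// Time at which the speed has decayed to minSpeed (0 if already below).
inline double completionTime(const ScrollDecay& d) {
    assert(d.decayPerSec > 0.0 && d.minSpeed > 0.0);
    const double v = std::fabs(d.startSpeed);
    return v <= d.minSpeed ? 0.0 : std::log(v / d.minSpeed) / d.decayPerSec;
}

// "User time" style query: where will the animation come to rest?
inline double finalPosition(const ScrollDecay& d) {
    return positionAt(d, completionTime(d));
}
```

Under this model, a scroll starting at 1,000 pixels per second with a coefficient of 2 and a minimum speed of 10 comes to rest about 495 pixels from where it started.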

Another interesting Physics Engine configuration is that of item size. This information is used to impose a grid of valid stopping positions over the data space, forcing the Physics Engine to allow the view location final position to end only at one of these grid coordinates. This behavior is helpful when an application is displaying a list of items on the screen and doesn’t want a partial item to be displayed at the top of the screen. The behavior works in either or both axes and will adjust the animation decay and stop algorithms to extend or contract the duration of the animation so it hits the required stopping points.
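
The grid of stopping positions amounts to rounding a predicted resting position to a multiple of the item size. The sketch below shows only that rounding step for one axis, with hypothetical function names. How the Physics Engine stretches or shortens the animation to reach the chosen point is not covered here.

```cpp
#include <cassert>
#include <cmath>

// Nearest valid stopping position on a grid of itemSize pixels.
inline int snapToItem(double stopPos, int itemSize) {
    assert(itemSize > 0);
    return static_cast<int>(std::lround(stopPos / itemSize)) * itemSize;
}

// Extra distance (positive or negative) the animation must cover so that
// it ends on the grid instead of at its natural resting position.
inline double snapAdjustment(double stopPos, int itemSize) {
    return snapToItem(stopPos, itemSize) - stopPos;
}
```

With 20-pixel items, a natural resting position of 495 snaps forward to 500, while 489 snaps back to 480.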

Putting It Together

For an application to fully support touch gestures, it needs to be enhanced to recognize the relevant gesture messages and respond appropriately. Where necessary, it needs to create and query a Physics Engine instance to drive the screen redraw. Moreover, the application needs to consider what should happen if an animation or gesture sequence is interrupted by further user input or other events, and ensure that it’s handled in an efficient way. Although all of this is relatively straightforward to achieve, it does require a reasonable amount of boilerplate code that must be created for each window that responds to gestures. So in Windows Mobile 6.5, a number of steps have been taken to simplify this task.

First, a number of the inbuilt controls have already been updated to support gestures, including the LISTVIEW, LISTBOX, TREEVIEW and WEBVIEW controls (some modes don’t support gestures). If you are already using any of these controls, your app is already gesture-enabled.

For applications that don’t make use of the inbuilt controls, there is a new API that significantly simplifies the work required to enable gesture support in the most common scenarios, called Window Auto Gesture (WAG).

Window Auto Gesture

The WAG logic is tightly bound to the DefWindowProc() processing to provide a default gesture response available for any window. When enabled, WAG will automatically respond to GID_PAN and GID_SCROLL gestures, create a Physics Engine instance and send the relevant positioning data back to the application through notification messages. WAG also implements gesture interruption by monitoring the input queue when a pan or scroll gesture is in progress, providing appropriate transitions to and from an animation state.

The default configuration for WAG is to ignore gesture messages, so any window that wants to use the WAG behavior must enable it first. To turn gesture support on, the application must call TKSetWindowAutoGesture for each window that requires support and pass the configuration settings required. As I said earlier, WAG is intended to simplify the most common scenarios for gesture support, and in order for WAG to drive your window, it must have been created with the WS_VSCROLL and/or WS_HSCROLL style set in the axes that can be manipulated by touch gestures. Additionally, the application is required to correctly manage the scroll bar, maintaining the range, min/max and page size as appropriate. This is required so that WAG can calculate the data area size that your window is displaying.

WAG has a number of options worth calling out:

  • WAG will handle both GID_PAN and GID_SCROLL gestures, but either can be disabled if required.
  • Like the Physics Engine, WAG also supports setting item width and height. This information is used not only to set the snapping points, but also to expand the scroll range values from an item count to a pixel count. For example, if the scroll bar range is 0 to 9 for a list of 10 items, and each item requires 20 pixels vertically to display its content, then the item height should be set to 20. WAG will multiply the scroll range (10) by the pixel height (20) to identify the full pixel range of the data (200 pixels).
  • WAG supports a special mode that will drive the window movement by generating WM_xSCROLL messages to the application instead of the more common owner animation messages. This is useful if you have a legacy application and want touch gesture support with the absolute minimum changes to its code. This mode is enabled by setting the nOwnerAnimateMessage value that is part of the TKSetWindowAutoGesture() initialization data to 0 instead of the normal WM_USER + x value. Some functionality is limited in this mode, such as no support for pixel-by-pixel manipulation—the control can only be manipulated item by item. Also, there is no way to go outside the scroll range in this mode, so the extent values are ignored. This option doesn’t work well for scrolling in both axes at the same time because each axis must be moved independently.
  • Extents describe the distance the display area can be dragged beyond the data range and are expressed as a percentage of the display size. Take care when enabling extents, because this allows the user to drag the display beyond the scroll limits and expose a screen area that many existing applications aren’t capable of handling correctly. Ensure the application is correctly clearing the screen when space appears beyond the top or to the left of the data range.
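
The arithmetic behind the item size and extent options is easy to check by hand. The following C++ sketch reproduces it with hypothetical helper names. It is not the WAG implementation, and the way the extent is added to both ends of the range is my reading of the description above.

```cpp
#include <algorithm>
#include <cassert>

// Expand a scroll range counted in items to a size in pixels.
inline int dataSizePixels(int scrollMin, int scrollMax, int itemSizePx) {
    assert(scrollMax >= scrollMin && itemSizePx > 0);
    return (scrollMax - scrollMin + 1) * itemSizePx;
}

// Extent, given as a percentage of the display size, in pixels.
inline int extentPixels(int displaySizePx, int extentPercent) {
    assert(displaySizePx >= 0 && extentPercent >= 0);
    return displaySizePx * extentPercent / 100;
}

// Range of view positions the user can reach, including the extents.
struct ViewLimits {
    int minPos;
    int maxPos;
};

inline ViewLimits viewLimits(int dataPx, int displayPx, int extentPercent) {
    const int extent = extentPixels(displayPx, extentPercent);
    const int lastPos = std::max(0, dataPx - displayPx);
    return ViewLimits{-extent, lastPos + extent};
}
```

For the 10-item list in the example, dataSizePixels(0, 9, 20) gives 200 pixels. With a 100-pixel display and a 25 percent extent, the view can be dragged anywhere from -25 to 125.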

Typically an application will configure WAG with nOwnerAnimateMessage as a value in the range WM_USER to WM_APP. WAG will use this value in the message sent back to the application each time the application needs to redraw its display area. The first animation message in a sequence will be preceded by a status message indicating that the control is now responding to gesture input. WAG automatically aggregates GID_PAN gesture messages and only sends an animation message to the application at a maximum frequency of 24 times per second (regulated using the GESTURE_ANIMATION_FRAME_DELAY_MS timer duration found in gesturephysics.h from the Windows Mobile 6.5 Developer Tool Kit). The same applies for scroll animations, where WAG uses the same timer to query its Physics Engine a maximum of 24 times per second.
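
The aggregation idea can be illustrated with a small class that sums pan movement and releases it no more often than once per frame. This is a simplified sketch with invented names. The constant below is simply 1000 / 24 and is not the actual value of GESTURE_ANIMATION_FRAME_DELAY_MS. Unlike WAG, which uses a timer, the sketch only checks the clock when a new pan arrives.

```cpp
#include <cassert>

// Assumed frame delay for 24 updates per second (41 ms, rounded down).
const int kFrameDelayMs = 1000 / 24;

class PanAggregator {
public:
    // Record a pan movement of dy pixels at time nowMs. Returns true when
    // an animation message should be sent; *outDy then holds everything
    // accumulated since the previous message.
    bool onPan(int dy, long nowMs, int* outDy) {
        assert(outDy != nullptr);
        pendingDy_ += dy;
        if (hasSent_ && nowMs - lastSentMs_ < kFrameDelayMs) {
            return false;  // too soon: keep accumulating
        }
        *outDy = pendingDy_;
        pendingDy_ = 0;
        lastSentMs_ = nowMs;
        hasSent_ = true;
        return true;
    }

private:
    int pendingDy_ = 0;
    long lastSentMs_ = 0;
    bool hasSent_ = false;
};
```

Pans arriving 10ms and 20ms after the last message are held back, and the next pan after the frame delay has elapsed delivers their combined movement in one message.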

The status message option for WAG is especially useful if your control supports focus or changes visually without user interaction, for example via asynchronous updates. Status messages tell the control when the user is interacting through the touch interface. They should be used as a trigger to halt any updates that might change the visual aspects of the control or its content, or unnecessarily take resources from the screen animation. Producing a full-screen animated effect can be resource-intensive, so it’s important to halt any unnecessary background processing and concentrate the resources to provide smooth and timely response to the user. Once the touch interaction is done, use the status message to trigger a data refresh and update, if required.

For more information on the WAG API, see the MSDN documentation for Windows Mobile 6.5 (msdn.microsoft.com/library/ee220917).

Tips and Tricks

Using the gesture API to accept and process gesture information is straightforward. However, it can be a little trickier to produce smooth animation in response to the gestures. Here are some tips that may help.

First frame time is vital. It’s surprising how sensitive the human eye can be to user interface latency. For example, a delay of more than 100ms between a screen touch and a graphical response can result in a feeling of sluggishness, even if the application then maintains a steady 24 frames per second (fps). Work to ensure the first frame response is fast, ideally below 50ms. It’s worth noting that the overhead of the Gesture Recognizer and Gesture Delivery has been carefully optimized, resulting in only 1ms or 2ms from touch to application.

Prefer a consistent frame rate. In our testing, users preferred a slightly slower but more consistent frame rate over a faster but more variable rate. We applied this information by making a timer to regulate the frame update frequency, and tuning the timer to ensure some free CPU time in each frame to handle other tasks.
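
One way to apply this tip is to derive the timer interval from the worst-case cost of drawing a frame, leaving a fixed share of every frame free. The sketch below is my own illustration with hypothetical names and is not code from the product. It also never goes faster than 24 fps, matching the rate mentioned above.

```cpp
#include <algorithm>
#include <cassert>

// Shortest interval allowed: 24 frames per second (41 ms, rounded down).
const int kMinFrameIntervalMs = 1000 / 24;

// Pick a fixed frame interval so that a frame costing worstDrawMs still
// leaves freePercent of the interval idle for other work.
inline int chooseFrameIntervalMs(int worstDrawMs, int freePercent) {
    assert(worstDrawMs >= 0 && freePercent >= 0 && freePercent < 100);
    const int busyPercent = 100 - freePercent;
    // Integer ceiling of worstDrawMs * 100 / busyPercent.
    const int needed = (worstDrawMs * 100 + busyPercent - 1) / busyPercent;
    return std::max(needed, kMinFrameIntervalMs);
}
```

With 25 percent of each frame reserved, a 30ms worst-case draw fits the 41ms minimum interval, while a 60ms draw calls for an 80ms interval: slower, but steady.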

Remove unnecessary overhead during animation. It’s obvious that the less work there is per frame, the more frames can be drawn per second. However, it’s sometimes harder to identify exactly what work can be left out. During touch manipulation, and especially during scroll animations, the user is less interested in detail and more interested in broad indicators. For example, while scrolling a list of e-mail messages, the user might be less interested in a preview of each message but more interested in its location in the list and its title. So it may be okay to stop updating or retrieving the preview text in order to allow extra time for smooth animation.

Judicious use of off-screen buffers. Double-buffering can be an excellent way of improving drawing performance and reducing fragmented drawing of the screen. However, it must be applied carefully, as an off-screen buffer is costly in resources. Ensure the buffer is held for the shortest possible time and is kept to a minimum size. Using the ScrollWindowEx API can often achieve similar results without the memory overhead of an off-screen buffer.

Measure first and then apply appropriate improvements. It’s standard performance-analysis practice to ensure you’re fixing something that is actually broken. So before changing any code, make sure you understand where the costs are in your animation loop by measuring them first, and then apply your effort to the areas that will yield the most significant benefits to your application.

Doing It Managed

Managed code applications that use common controls (such as LISTBOX, LISTVIEW, WEBVIEW and TREEVIEW) will automatically benefit from the touch gesture support added to these controls without any code changes. For applications that have custom controls, the control code will need to be modified to make use of gestures through the API exposed in the Windows Mobile 6.5 Developer Tool Kit. The tool kit contains C++ headers and samples and is aimed at native code developers. However, the APIs are designed to be easy to use through a simple interop from managed code.

The trickiest part of implementing gesture support is being able to receive the new WM_GESTURE message and the WAG animation messages, because unlike the desktop .NET Framework, the .NET Compact Framework doesn’t expose the WndProc handler. Getting at these messages requires the common technique of subclassing the window to get a first look at all messages sent to it and filter out the ones you need. This can be done by using a native helper DLL or by simply calling directly to the native APIs. In the sample code available with this article on the MSDN online site, I’ve included some examples that show how this might be achieved, along with three projects showing touch gestures, the Physics Engine and WAG all in use with managed code. You’ll also find several solutions available in the community.

Next Steps

To get started with gestures on Windows Mobile 6.5, be sure to download the Developer Tool Kit from http://www.microsoft.com/en-us/download/details.aspx?id=17284. It includes emulators and samples to explore many of the possibilities. Also, the MSDN documentation for this native API is available at msdn.microsoft.com/library/ee220920. If you’re looking for managed code solutions, take a look at the sample code attached to this article on the MSDN page or at Maarten Struys’ blog (http://mstruys.com/) or Alex Yakhnin’s blog (blogs.msdn.com/priozersk/archive/2009/08/28/managed-wrapper-of-the-gesture-apis.aspx).

There are also more of my ramblings about touch gestures on my blog:


Marcus Perryman has worked at Microsoft for more than 10 years in various technical roles, including developer evangelist and developer consultant. At present, Perryman works as a software design engineer in the Windows Mobile product group designing and developing the next generation of mobile operating systems.

Thanks to the following technical experts for reviewing this article: Tim Benton, David Goon, John Lawrence, Stewart Tootill and Marcin Stankiewicz