Let’s Talk About Touch (Part 2)

In part 1 I introduced several of the key pieces in the Windows Mobile 6.5 gesture story, including the basics of what a touch gesture is and how the system identifies and delivers gesture messages to your app. I finished up by introducing you to the Physics Engine, so let's continue from there.

Natural Physical Interaction

I talked briefly about the new Physics Engine component in part 1 but skipped over a couple of important bits, so let me cover them here.

Panning a list of data is relatively simple to implement because the expected behaviour is fairly obvious, especially when the input device is a finger or thumb: it should work just like sliding a piece of paper around on a slippery desktop. However, modelling the scroll gesture at the end of a pan sequence is a little harder to get right because the expectations are not so clear cut: what speed should the data move at? Does the speed decay, and if so at what rate?

The touch team did quite a bit of research into what the 'right' (i.e. most natural) response should be and captured the corresponding deceleration algorithm in the Physics Engine. We then built apps around it and put the results in front of users so we could fine-tune the algorithm parameters.

It was evident from this research that the human eye is very sensitive to the movement and deceleration used in the scroll animation, and that it's important to ensure the response to a scroll gesture is always predictable and the feedback instantaneous. This helped us focus our performance tuning efforts when working on the OS controls.

The Physics Engine also drives the animation for several other behaviours, such as snapping to an item in a list (more on that later) and rubber banding, where the velocity of a scroll gesture would take the display position outside the visible content region.

The key point: when you are implementing UI that moves content in response to a scroll gesture, use the Physics Engine to drive the animation so the experience stays consistent and predictable for the user.
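To make this concrete, here's a minimal sketch of the usual pattern: create the engine in response to the scroll gesture, then query it from a frame timer and redraw at each new position. TKQueryPhysicsEngine() and TKDestroyPhysicsEngine() are the calls I remember from the DTK's gesturephysicsengine.h, but the PHYSICSENGINESTATE field names below (ptPosition, fComplete) and the IDT_ANIMATE timer id are my assumptions – check them against the header and the PhysicsEngineSample:

    case WM_TIMER:
        if (wParam == IDT_ANIMATE && g_hPhysicsEngine != NULL)
        {
            PHYSICSENGINESTATE state = {sizeof(state)};
            if (SUCCEEDED(TKQueryPhysicsEngine(g_hPhysicsEngine, &state)))
            {
                g_nXPos = state.ptPosition.x;   // assumed field names - verify
                g_nYPos = state.ptPosition.y;   // against the DTK header
                InvalidateRect(hwnd, NULL, FALSE);
                if (state.fComplete)            // animation has come to rest
                {
                    KillTimer(hwnd, IDT_ANIMATE);
                    TKDestroyPhysicsEngine(g_hPhysicsEngine);
                    g_hPhysicsEngine = NULL;
                }
            }
        }
        return 0;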

What do you call the space that’s not client and not ‘non-client’?

Extending the concept of touchable content gives rise to some scenarios new to Windows Mobile. Imagine an application showing a list of items with a nice scroll bar to indicate the relative position of the visible content. Through gesture support the user can freely navigate up and down the list by direct manipulation (i.e. without touching the scrollbar), but what happens at the limit of the content? Extending the idea of direct manipulation means we really want the top and bottom of the content to be visually flexible, so the user can pan the list beyond its limit, see a clear indication of the top of the document (a physical document border, for example) and then, on release, see a smooth animation back to the edge of the document.

We've always had the concept of client space, where data is drawn, and non-client space: the menu, title bar, border and so on. But this is something new! The space uncovered by going beyond the list limit doesn't have a name. It's not really client space because it's beyond the client limit (and beyond the scrollbar range!), but it's definitely not non-client space because it's drawn in the client area. In 6.5 we didn't formally define this space; however, when updating the controls and applications to become touch aware we had to solve the problem of what to draw there, which was not as straightforward as it sounds when you consider that most of our controls can be owner drawn, support background themes, and were originally designed to work with a scrollbar.

If you are planning to support direct manipulation via gestures and make use of the Physics Engine, then make sure your controls support the concept of this new window space. There is an option to disable rubber banding in the Physics Engine, which will stop your window from uncovering this new space – but be aware that doing so is likely to degrade the user experience.
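For example, an owner-drawn list can handle the uncovered strip explicitly in its paint routine. This is just a sketch using standard GDI, where g_nYPos is the current scroll offset (negative while rubber banding past the top) and DrawVisibleItems is a hypothetical helper:

    case WM_PAINT:
    {
        PAINTSTRUCT ps;
        HDC hdc = BeginPaint(hwnd, &ps);
        if (g_nYPos < 0)
        {
            // rubber banded past the top: fill the strip above the content so
            // the user sees a clean document edge rather than stale pixels
            RECT rcAbove = ps.rcPaint;
            rcAbove.bottom = -g_nYPos;
            FillRect(hdc, &rcAbove, (HBRUSH)GetStockObject(WHITE_BRUSH));
        }
        DrawVisibleItems(hdc, g_nXPos, g_nYPos);  // hypothetical helper
        EndPaint(hwnd, &ps);
        return 0;
    }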

Item Snapping

While we're on the topic of the Physics Engine, another feature it exposes is item snapping. Somewhere in Windows' long-ago past, scrollbars were introduced to support display areas smaller than the data being shown, and they have served us well for many years. Scrollbars allow the data space and view space dimensions to be specified while keeping some of the physical display details hidden, such as the real number of pixels needed to draw a piece of the screen.

There are lots of list-style controls out there that use the scrollbar in this way because it's very convenient: the scroll range is specified as the number of items, the current scroll position represents the item number, and the page size is the number of items that can be displayed at once. The built-in listbox and listview controls work in this way.

Unfortunately this approach causes some problems when introducing direct manipulation, specifically the pan gesture. The user wants to see pixel-by-pixel movement in response to the touch input, but there aren't any valid scrollbar positions available to represent each pixel position, because the scroll range is in items and each item is more than one pixel. That may not be such a problem while the gesture is in progress (round the scroll position to the nearest item; after all, it's just an indicator at this point), but when the session ends the user is unlikely to have left the content aligned exactly on an item boundary that the scrollbar can represent.
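To illustrate, here's the classic item-based setup using the standard Win32 scrollbar API. With a range like this there is simply no scrollbar position that means 'item 3 plus 7 pixels':

    SCROLLINFO si = {sizeof(si)};
    si.fMask = SIF_RANGE | SIF_PAGE | SIF_POS;
    si.nMin  = 0;
    si.nMax  = nItemCount - 1;   // one scroll unit per item, not per pixel
    si.nPage = nVisibleItems;    // page size in items
    si.nPos  = nTopItem;         // current position is an item index
    SetScrollInfo(hwnd, SB_VERT, &si, TRUE);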

The Physics Engine supports the concept of item snapping to help solve this problem. The item height/width can be specified when creating the Physics Engine, and all animations are then guaranteed to end exactly on an item boundary. In the case of a scroll, the end point is either stretched or shrunk to the nearest boundary and the animation adjusted so it still looks smooth and predictable. The Physics Engine can also be used with a zero initial velocity to provide an item snap animation when the user ends a pan session without a flick.

Here is a snippet from the Physics Engine sample in the DTK showing how to use item snapping:

{
    PHYSICSENGINEINIT initState = {sizeof(initState)};
...
    initState.dwEngineType = 0;
    initState.dwFlags = 0;
    initState.lInitialVelocity = -nTransitionSpeed;
    initState.dwInitialAngle = nTransitionAngle;
    initState.bXAxisMovementMode = PHYSICSENGINE_MOVEMENT_MODE_DECELERATE;
    initState.bYAxisMovementMode = PHYSICSENGINE_MOVEMENT_MODE_DECELERATE;
    initState.bXAxisBoundaryMode = PHYSICSENGINE_BOUNDARY_MODE_RUBBERBAND;
    initState.bYAxisBoundaryMode = PHYSICSENGINE_BOUNDARY_MODE_RUBBERBAND;

    GetClientRect(hwnd, &rctClient);
    initState.rcBoundary.left = 0;
    initState.rcBoundary.right = rctClient.right + g_nMaxXExtent;
    initState.rcBoundary.top = 0;
    initState.rcBoundary.bottom = rctClient.bottom + g_nMaxYExtent;
    initState.sizeView.cx = rctClient.right;
    initState.sizeView.cy = rctClient.bottom;
    initState.ptInitialPosition.x = g_nXPos;
    initState.ptInitialPosition.y = g_nYPos;

    // item snapping: all animations will end on a 100x100 pixel item boundary
    initState.sizeItem.cx = 100;
    initState.sizeItem.cy = 100;

    // create the physics engine and store it
    if (SUCCEEDED(TKCreatePhysicsEngine(&initState, &g_hPhysicsEngine)))
...

In this code the item size is set to 100, indicating that each scrollbar value represents 100 pixels of screen real estate. If you are writing new code I would recommend you design it to support per-pixel scrolling from the outset and keep the scroll range in pixels – it just makes things a bit easier for you as the developer and slightly more natural for the user. But this feature is really useful if you are updating 'legacy' code to support touch.

WAGI (Window Auto Gesture Interface) – Make it Simple

I was writing a touch presentation for some of our Asian partners and put a 'small' sample together (native code) showing the basics of implementing direct manipulation with gestures. Walking through the code in front of the partners, it dawned on me, as I was searching through pages and pages of source, that maybe we needed to simplify some of the common scenarios. The Physics Engine interface was revamped and we introduced the new WindowAutoGesture APIs, which provide a very simple way of implementing the most common direct manipulation scenarios.

The WindowAutoGesture Interface (WAGI) provides configurable gesture handling for individual windows, taking away the complexity of dealing directly with the gesture messages and the Physics Engine. WAGI is implemented as part of the window manager and handles the pan and scroll gestures on the window's behalf, creating and driving the Physics Engine as appropriate. WAGI then instructs the application where to draw content through custom animation messages.

The window remains responsible for drawing its content and updating its scrollbar in response to animation commands from WAGI. The WAGI API was originally designed to go the extra step and take control of content drawing and scrollbars as well, but there wasn't time in the schedule to implement this for 6.5.

You enable and configure WAGI behaviour for a specific window by calling the new TKSetWindowAutoGesture(). Gestures are delivered to the WAGI infrastructure through DefWindowProc(), so the application must ensure all unprocessed messages – the scroll and pan gestures specifically – are passed on to DefWindowProc(). WAGI then processes the gesture and delivers the appropriate animation messages back to the window.
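In practice this mostly means not swallowing messages you don't recognise. A minimal window procedure sketch (WM_GESTURE is the gesture message from the DTK headers; OnPaint is a hypothetical helper):

    LRESULT CALLBACK ListWndProc(HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam)
    {
        switch (msg)
        {
        case WM_PAINT:
            return OnPaint(hwnd);
        // note: no WM_GESTURE case here - the pan and scroll gestures
        // deliberately fall through to DefWindowProc() so WAGI sees them
        }
        return DefWindowProc(hwnd, msg, wParam, lParam);
    }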

The most significant restriction with WAGI is that the window must have a scroll style (WS_VSCROLL or WS_HSCROLL) set and a scroll range greater than the visible area, i.e. range > page size, which means there must be visible scrollbars on your window. WAGI supports gestures in one or both axes and dynamically detects the scroll range, adjusting its behaviour appropriately.

WAGI can operate in two modes: one manipulates the application by directly simulating scrollbar messages such as WM_VSCROLL and WM_HSCROLL, and the other works through private animation messages that tell the window the pixel positions at which to draw content. In both modes WAGI also provides notification messages to tell the window when touch interaction starts and ends – this is relevant to the focus issue discussed later.

Scrollbar Manipulation Mode

Scrollbar manipulation mode makes adding touch support to existing code that already supports scrollbar navigation very easy: just set up the WAGI information via a single call to TKSetWindowAutoGesture() and the existing scroll logic will take care of everything else. However, there are a couple of reasons why you might want to consider upgrading to the animation message mode of WAGI:

· If the scrollbar range for the existing window is something other than pixel positions, the movement of the UI can appear quite granular and jumpy, because the scrollbar can only be manipulated to represent whole item positions and can't animate through the intermediate pixel positions.

· There is no way, when using scrollbars, to make the content draw beyond its scroll limit – for example when rubber banding at the end of the list after the user has flicked fast on the screen, or when the user drags past the top of the list to see its limit. Scrollbars have min and max values that bound the content, and for lots of reasons it's not possible to set the current position outside that range.

Scrollbar manipulation mode is selected by setting nOwnerAnimateMessage to zero.
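Here's a sketch of the setup, based on the WAGSample in the DTK. Apart from nOwnerAnimateMessage, the WAGINFO field names (cbSize, nAnimateStatusMessage) are from my memory of the header, so treat them as assumptions and verify against the DTK:

    WAGINFO wagInfo = {0};
    wagInfo.cbSize = sizeof(wagInfo);
    wagInfo.nOwnerAnimateMessage = 0;             // 0 selects scrollbar manipulation mode
    wagInfo.nAnimateStatusMessage = WM_USER + 1;  // notifies touch interaction start/end
    TKSetWindowAutoGesture(hwnd, &wagInfo);       // typically called when the window is created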

Animation Message Mode

This mode is selected by setting nOwnerAnimateMessage to a valid window message value at or above WM_USER; this becomes the message id that carries the animation information back to the window. When an animation message is received, the window must call TKGetAnimateMessageInfo() to get the x,y coordinates of the top left of the display area. TKGetAnimateMessageInfo() also identifies the type of animation, and although there is currently only one, make sure you check it for AMI_ANIMATION_SCROLL because the set may be extended in future releases. Then you need to force a redraw of the window using the new x,y content position, not forgetting to update the scrollbar position to the nearest item.
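A sketch of the receiving side, assuming we passed nOwnerAnimateMessage = WM_USER + 2 to TKSetWindowAutoGesture(). TKGetAnimateMessageInfo() and AMI_ANIMATION_SCROLL are from the DTK, but the exact function signature and the ANIMATEMESSAGEINFO field names below are assumptions to check against the header; UpdateScrollBarToNearestItem is a hypothetical helper:

    case WM_USER + 2:   // our nOwnerAnimateMessage value
    {
        ANIMATEMESSAGEINFO ami = {sizeof(ami)};
        if (SUCCEEDED(TKGetAnimateMessageInfo(wParam, lParam, &ami)) &&
            ami.dwAnimateMessageId == AMI_ANIMATION_SCROLL)  // may be extended in future
        {
            g_nXPos = ami.ptPosition.x;   // assumed field: new top left of the content view
            g_nYPos = ami.ptPosition.y;
            UpdateScrollBarToNearestItem(hwnd, g_nYPos);
            InvalidateRect(hwnd, NULL, FALSE);
            UpdateWindow(hwnd);           // draw this frame now rather than waiting
        }
        return 0;
    }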

This mode supports much more flexibility because the scroll range is no longer a limiting factor, so per-pixel positioning is possible. Also, in this mode the x,y pixel position might fall outside the window's data area, uncovering the new client area space I talked about before, so you need to update the paint routine to make sure it copes with this.

There are a number of other options available through WAGI, like disabling scroll or pan gesture support, limiting the extent to which a user can go beyond the window content (including 0% if you don't want to solve that problem), and the magical Lock Axes option. Lock Axes isn't some medieval battle cry; unfortunately it's something much more mundane. This option only makes sense when both the horizontal and vertical axes are scrollable, and it tells WAGI to ignore scroll gestures unless they run roughly along one of the two axes, i.e. with this option turned on you will only get left/right or up/down scroll movement. You can see very similar behaviour in IE6 on WM 6.5 when scrolling around a page.

See the WAGSample project in the DTK for more details.

Item Focus

Direct manipulation through gestures brings in a whole bunch of issues around focus. Windows has traditionally encouraged support for a region or point of focus so that a user can interact with the application without being forced to use a pointing device like a mouse or a stylus, or now a finger. Focus is traditionally set on the mouse-down or touch-down event, but with direct manipulation the application cannot tell which gesture the user intends until some time after that initial event: the user may be starting a pan or scroll gesture rather than a tap, and moving the focus at the start of every pan gesture is more than likely not what the user wants. Also, what should happen if the user hits a hardware key, like a delete key or context menu key, while the control is animating in response to a gesture message such as a scroll? Should it act on the item with focus at that moment, stop the animation and move the item into view first, or something else?

To help solve this problem, on many of the built-in controls we have changed the point at which focus is chosen from when the initial input occurs to the end of the gesture sequence. Visually there are some subtle differences from 6.1, but the overall experience should feel right to the user.

So here are a couple of recommendations for your application design:

· The ideal solution is to design your touch-based applications to work without focus. However, it's important to ensure the application navigation remains accessible through the direction pad (DPad) keys, and if you are planning to use Marketplace then the application must also work on pre-6.5 images.

· If you have a custom list-style control with a concept of focus (the built-in list controls, Listbox and Listview, already do this), move focus selection to the mouse-up event, or even better use the select gesture (see the last point in this list). That way you can detect whether a gesture happened and ignore the focus change; there's a sketch of this after the list. The exception here is the hold gesture, which probably needs to move focus to the item under the gesture before displaying the context menu.

· During a gesture session, such as a pan or scroll animation, any hardware key should interrupt the animation but not action the focused item. If you use WAGI the animation interruption is done for you, but the application code must still block the associated action.

· This is fairly specific but may be relevant to you: if the focused item is off screen and the user presses a DPad key, move the focus to an item on screen instead of scrolling the list to show the current/next/previous focused item. Have a look at how the listview does this in, say, the Outlook email client – we spent quite a bit of time getting it right.

· If you can, use only gesture messages or only mouse messages, but not both. Focus issues get easier to solve if you stick with just one or the other.
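Here's the sketch promised above: moving focus selection into the gesture handler so that a pan never moves focus. TKGetGestureInfo(), GID_SELECT and GID_HOLD are the names I remember from the DTK gesture headers; SetFocusItemFromPoint and ShowItemContextMenu are hypothetical helpers:

    case WM_GESTURE:
    {
        GESTUREINFO gi = {sizeof(gi)};
        if (TKGetGestureInfo((HGESTUREINFO)lParam, &gi))
        {
            switch (gi.dwID)
            {
            case GID_SELECT:   // a tap: a safe point to move focus
                SetFocusItemFromPoint(hwnd, gi.ptsLocation);
                return 0;
            case GID_HOLD:     // hold: move focus first, then show the context menu
                SetFocusItemFromPoint(hwnd, gi.ptsLocation);
                ShowItemContextMenu(hwnd);
                return 0;
            }
        }
        // everything else (begin, pan, scroll, end) goes to DefWindowProc for WAGI
        return DefWindowProc(hwnd, msg, wParam, lParam);
    }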

Which 6.5 Controls are Gesture Aware?

The primary OS controls we updated are these:

· Listview

· Listbox (includes combo)

· Webview

· Treeview

· Tab (scroll left / right to change page)

We also updated a number of applications that have their own custom controls such as the Getting Started app and IE.

Why do you need to know? Well, possibly more for interest than anything else, but you do need to be aware of this if you subclass any of the above controls. Listview, treeview and webview all use WAGI to drive their animations, which means they require all gesture messages to reach DefWindowProc(), and they also use WM_USER+x for the WAGI animation and status messages, which must get through to the control. I know this is "programming 101" but we've seen it cause problems a few times already. Remember: if you subclass a control you should not be intercepting messages in the WM_USER range; that range is private to the original window proc.
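A sketch of the safe subclassing pattern: handle only what you need and pass everything else, including the entire WM_USER range, straight through to the original window procedure:

    static WNDPROC g_pfnOrigProc;

    LRESULT CALLBACK MySubclassProc(HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam)
    {
        // ...handle only the specific messages you care about here...
        // never swallow WM_USER+x - WAGI's animation and status messages
        // for the control live in that range
        return CallWindowProc(g_pfnOrigProc, hwnd, msg, wParam, lParam);
    }

    // installing the subclass:
    g_pfnOrigProc = (WNDPROC)SetWindowLong(hwndListView, GWL_WNDPROC,
                                           (LONG)MySubclassProc);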

Getting the Right Frame Rate

Direct manipulation by the user demands a fast response – users are surprisingly adept at spotting UI lag. However, a fast response does not translate directly into a demand for a blistering frame rate; in fact the user experience is much better with a slower, more consistent frame rate than with a faster but less consistent one. In our testing, the key factors were:

1. The initial response to a touch input must be fast, on the order of 50ms, but the faster the better.

2. The frame rate must be as consistent as possible.

3. With a consistent frame rate above ~15 fps, our own research showed the untrained eye cannot easily distinguish between small differences in frame rate, e.g. 20 fps vs 25 fps. (Note: this was not exhaustive research and there are likely more scientifically sound results available, but it was sufficient for our needs.)

We found that many of the legacy controls were not designed with rapid frame updates in mind, but with some relatively minor adjustments and performance optimizations we were able to hit our target frame rates with ease. Here are some of the things I applied while optimizing the listview:

· As with all optimization, it's vital to have the right measurements in place so you can be sure you are spending time optimizing the right bits.

· It's possible to receive touch input / gesture messages at a much faster rate than is required to update the screen, so to keep the frame rate consistent it's important to separate the gesture input from the frame update (see the sketch after this list). For WAGI we use GESTURE_ANIMATION_FRAME_DELAY_MS, found in gesturephysicsengine.h in the DTK. Currently this is set to 34ms which, when used in a timer, gives a frame rate of just under 30 fps, and WAGI will deliver animation updates no more often than this. For non-WAGI gesture handling I would recommend you use the same frame delay and aggregate pan messages between frame events. The frame timer actually works against us when measuring the initial gesture response time, because the first update is made a minimum of 40ms after the first gesture message. We made a number of changes to improve the situation: WAGI now delivers the first animation request as soon as a pan delta has been detected, and the GID_BEGIN message was modified to include the initial position of the touch point, enabling the first GID_PAN to trigger a frame update. The key point here: do as little as possible in the first frame update in order to keep the lag to a minimum.

· Don't do anything unnecessary in the update loop. This one sounds like plain common sense, but there are some subtleties worth pointing out. It's fine to use a special-case drawing loop when processing gesture animation, especially if you have high-detail / high-cost drawing code. For example, if you have a list of contacts that also shows online presence information, it doesn't make sense to wait for an update of the presence information on each frame – wait until the gesture interaction has finished and then update the screen with the slower information. I got quite an improvement here by reducing the frequency of scrollbar drawing during the gesture animation: take a look at the scrollbar in the Outlook client when you flick up/down a long list of emails. You have probably never noticed, but it actually updates at a much lower frequency than the frame rate.

· Judicious use of an off-screen cache can also boost performance significantly. Take care here, because an off-screen bitmap takes memory from the GWES heap, a scarce, shared resource pool – so don't go and create a 4MB cache just for your screen. However, if you are likely to redraw the same area of the screen from the original data for several frames of an animation, it may be worth using a small cache. The key thing is to carefully measure the benefit to application performance and be pragmatic about the results.
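Here's the sketch referred to in the first bullet: aggregating raw pan input into fixed-rate frame updates for non-WAGI gesture handling. GESTURE_ANIMATION_FRAME_DELAY_MS comes from gesturephysicsengine.h; the IDT_FRAME timer id, the g_nPendingDeltaY accumulator, the g_fPanInProgress flag and the ScrollContentBy helper are hypothetical:

    // on each GID_PAN: accumulate the movement, don't redraw yet
    g_nPendingDeltaY += nPanDeltaY;
    SetTimer(hwnd, IDT_FRAME, GESTURE_ANIMATION_FRAME_DELAY_MS, NULL);

    // on WM_TIMER with IDT_FRAME: apply everything received since the last frame
    case WM_TIMER:
        if (wParam == IDT_FRAME)
        {
            ScrollContentBy(hwnd, g_nPendingDeltaY);
            g_nPendingDeltaY = 0;
            if (!g_fPanInProgress)    // gesture over and nothing pending: stop
                KillTimer(hwnd, IDT_FRAME);
        }
        return 0;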

Managed Code

You may have noticed that there is currently no update to support gestures in managed code for CF 2.0 or 3.5. As a managed code developer there are still some options open to you. CF controls are implemented in terms of the OS controls, so if you are using any of the updated common controls in your managed app then you get gesture support built in. However, if you want to gesture-enable a bespoke or custom control then you will need to interop and interact with the gesture architecture that way.

To help you get this up and running, there are a couple of projects I'm aware of that will help. First, Maarten Struys has recently posted managed code versions of the DTK samples showing pretty much everything you need to get started – take a look at his blog post here. In addition, we've been working on a set of simple managed code extension classes that can be used to access gesture messages, the Physics Engine and WAGI from managed code. It's not quite ready yet, but I will post more details when it's baked.

What I've Not Covered

There are a couple of things I've not covered here, like gestures other than touch, gesture-enabling forms, the touch filter, what's in the DTK, etc. I might post more on these in the near future.

I'm due to deliver a session on WM 6.5 gestures at the upcoming EMEA TechEd in Berlin in November, and I'm hoping to get Maarten to join me and share some real field experience of using gestures.

Anyway, that's about it on touch from me. The UK team has moved on to other things now, but hopefully it's stuff I can blog about more readily.

Marcus