March 2011

Volume 26 Number 03

UI Frontiers - Touch Gestures on Windows Phone

By Charles Petzold | March 2011

As someone who spends much of his professional life observing the evolution of APIs, I’ve been quite entertained by that little corner of the API universe occupied by multi-touch. I’m not sure I’d even want to count the number of different multi-touch APIs spread out over Windows Presentation Foundation (WPF), Microsoft Surface, Silverlight, XNA and Windows Phone, but what’s most evident is that a “unified theory” of multi-touch is still elusive.

Of course, this plethora of touch APIs shouldn’t be surprising for a technology that’s still comparatively young. Moreover, multi-touch is more complex than the mouse. That’s partially due to the potential interaction of multiple fingers, but it also reflects the difference between a purely artificial device such as the mouse and all-natural fingers. We humans have a lifetime of experience using our fingers, and we expect them to interact with the world in well-known ways, even if we’re touching the glossy surface of a video display.

For the application programmer, Windows Phone 7 defines four—yes, four—different touch interfaces.

Silverlight applications written for Windows Phone 7 have the option of obtaining low-level touch input through the static Touch.FrameReported event, or higher-level input through the various Manipulation routed events. These Manipulation events are mostly a subset of similar events in WPF, but they’re different enough to cause major headaches.

XNA applications for Windows Phone 7 use the static TouchPanel class to obtain touch input, but that single class actually incorporates two touch interfaces: The GetState method obtains low-level finger activity, and the ReadGesture method obtains higher-level gestures. The gestures supported by the ReadGesture method are not stylus-like gestures such as checkmarks and circles. They’re much simpler gestures described by names such as Tap, Drag and Pinch. In keeping with XNA architecture, touch input is polled by the application rather than being delivered through events.

Gestures Come to Silverlight

I naturally assumed that Silverlight for Windows Phone 7 already had a sufficient number of multi-touch APIs, so I was quite surprised to see a third one added to the mix—albeit in a toolkit that came out a little too late for me to describe in my book, “Programming Windows Phone 7” (Microsoft Press, 2010).

As you probably know, various releases of WPF and Silverlight over the past several years have been supplemented by toolkits released through CodePlex. These toolkits allow Microsoft to get new classes to developers outside of the usual ship cycle and often give us a “sneak peek” at enhancements to the frameworks that might be incorporated in future releases. Full source code is an extra bonus.

Windows Phone 7 now also benefits from this custom. The Silverlight for Windows Phone Toolkit (available at silverlight.codeplex.com) contains DatePicker, TimePicker and ToggleSwitch controls already familiar to users of Windows Phone 7; a WrapPanel (handy for dealing with phone orientation changes); and multi-touch gesture support.

This new Silverlight gesture support in the toolkit is intended to be similar to the XNA TouchPanel.ReadGesture method, except it’s delivered through routed events rather than polling.

How similar is it? Much more so than I expected! Looking at the source code, I was quite surprised to discover that these new Silverlight gesture events were entirely derived from a call to the XNA TouchPanel.ReadGesture method. I wouldn’t have thought that a Silverlight application on Windows Phone was allowed to call this XNA method, but there it is.

Although the Silverlight and XNA gestures are fairly similar, the properties associated with the gestures are not. The XNA properties use vectors, for example, and because Silverlight doesn’t include a Vector structure (an omission I feel is ridiculous), the properties had to be redefined for Silverlight in certain simple ways.

As I’ve been working with these gesture events, they’ve come to be my favorite multi-touch API for Silverlight for Windows Phone. I’ve found them to be comprehensive for much of what I need to do and also fairly easy to use.

Let me demonstrate by giving these gestures actual work to do.

Gesture Service and Listener

All the source code for this column is in a downloadable Visual Studio solution named GestureDemos that contains three projects. You’ll need to have the Windows Phone 7 development tools installed, of course, and also the Silverlight for Windows Phone Toolkit.

After installing the toolkit, you can use it in your own Windows Phone projects by adding a reference to the Microsoft.Phone.Controls.Toolkit assembly. In the Add Reference dialog box, it should be listed under the .NET tab.

In a XAML file, you’ll then need an XML namespace declaration like this one (but all on one line):

xmlns:toolkit=
"clr-namespace:Microsoft.Phone.Controls;
assembly=Microsoft.Phone.Controls.Toolkit"

Here are the 12 available gesture events, roughly in the order that I’ll discuss them (the events that I’ve grouped on a single line are related and occur in a sequence):

GestureBegin, GestureCompleted
Tap
DoubleTap
Hold
DragStarted, DragDelta, DragCompleted
Flick
PinchStarted, PinchDelta, PinchCompleted

Suppose you want to handle Tap and Hold events that occur on a Grid or any child of the Grid. You can specify that in the XAML file like so:

<Grid ... >
  <toolkit:GestureService.GestureListener>
    <toolkit:GestureListener 
      Tap="OnGestureListenerTap"
      Hold="OnGestureListenerHold" />
  </toolkit:GestureService.GestureListener>
    ...
</Grid>

You indicate the events and handlers in a GestureListener tag that’s a child of the GestureListener attached property of the GestureService class.

Alternatively, in code, you’ll need a using directive for the Microsoft.Phone.Controls namespace and the following code:

GestureListener gestureListener = 
  GestureService.GetGestureListener(element);
gestureListener.Tap += OnGestureListenerTap;
gestureListener.Hold += OnGestureListenerHold;

In either case, if you’re setting this gesture listener on a panel, make sure that the Background property is at least set to Transparent! Events will simply fall through a panel with a default background of null.

Tap and Hold

All gesture events are accompanied by event arguments of type GestureEventArgs or a type that derives from GestureEventArgs. The OriginalSource property indicates the top-most element touched by the first finger that meets the screen; the GetPosition method provides the current coordinates of that finger relative to any element.

The gesture events are routed, which means that they can travel up the visual tree and be handled for any element that has a GestureListener installed. As usual, an event handler can set the Handled property of GestureEventArgs to true to prevent an event from traveling further up the visual tree. However, this only affects other elements using these gesture events. Setting Handled to true does not prevent elements higher in the visual tree from obtaining touch input through other interfaces.

The GestureBegin event indicates that a finger has touched a previously fingerless screen; GestureCompleted signals when all fingers have left the screen. These events may be handy for initialization or cleanup, but you’ll generally be more focused on gesture events that occur between these two events.

I’m not going to spend much time on the simpler gestures. A Tap occurs when a finger touches the screen and then lifts up within about 1.1 seconds, without moving too far from the original position. If two taps are close in succession, the second one comes through as a DoubleTap. A Hold occurs when a finger is pressed on the screen and remains in roughly the same spot for about 1.1 seconds. The Hold event is generated at the end of this time without waiting for the finger to lift.
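As a quick illustration, here’s a minimal sketch of Tap and Hold handlers matching the XAML shown earlier; the statusText TextBlock is my own stand-in for whatever your handlers actually do:

void OnGestureListenerTap(object sender, GestureEventArgs args)
{
    // Position of the tap relative to this page
    Point pt = args.GetPosition(this);
    statusText.Text = String.Format("Tap at ({0}, {1})", pt.X, pt.Y);

    // Optionally stop the event from routing further up the visual tree
    args.Handled = true;
}

void OnGestureListenerHold(object sender, GestureEventArgs args)
{
    // Hold fires after about a second, before the finger lifts
    statusText.Text = "Hold detected";
}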

Drag and Flick

A Drag sequence—consisting of a DragStarted event, zero or more DragDelta events and a DragCompleted event—occurs when a finger touches the screen, moves and lifts. Because it isn’t known that dragging will occur when a finger first touches the screen, the DragStarted event is delayed until the finger actually starts moving beyond the Tap threshold. The DragStarted event might be preceded by a Hold event if the finger has been on the screen without moving for about a second.

Because the finger has already begun moving when the DragStarted event is fired, the DragStartedEventArgs object can include a Direction property of type Orientation (Horizontal or Vertical). The DragDeltaEventArgs object accompanying the DragDelta event includes more information: HorizontalChange and VerticalChange properties that are convenient for adding to the X and Y properties of a TranslateTransform, or the Canvas.Left and Canvas.Top attached properties.
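For example, a DragDelta handler that slides an element around can be as simple as this sketch, assuming the element’s RenderTransform is a TranslateTransform named dragTranslate:

void OnGestureListenerDragDelta(object sender, DragDeltaGestureEventArgs args)
{
    // Accumulate the per-event deltas into the transform
    dragTranslate.X += args.HorizontalChange;
    dragTranslate.Y += args.VerticalChange;
}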

The Flick event occurs when a finger leaves the screen as it’s still moving, suggesting that the user wants inertia to occur. The event arguments include an Angle (measured clockwise from the positive X axis) and HorizontalVelocity and VerticalVelocity values, both in pixels per second.

The Flick event can occur in isolation; or it can occur between DragStarted and DragCompleted events without any DragDelta events; or it might follow a series of DragDelta events before DragCompleted. Generally you’ll want to handle Drag events and Flick events in conjunction, almost as if the Flick is a continuation of the Drag. However, you’ll need to add your own inertia logic.

This is demonstrated in the DragAndFlick project. The display contains an ellipse that the user simply drags around with a finger. If the finger leaves the screen with a flicking motion, then a Flick event occurs and the Flick handler saves some information and installs a handler for the CompositionTarget.Rendering event. This event—which occurs in synchronization with the video display refresh—keeps the ellipse moving while applying a deceleration to the velocity.

Bouncing off the sides is handled a bit unusually: The program maintains a position as if the ellipse simply keeps moving in the same direction until it stops; that position is folded into the area in which it can bounce.
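The inertia portion might look something like the following sketch. This isn’t the actual DragAndFlick source; the field names, the ellipseTranslate transform and the constant deceleration are my own assumptions, and the sketch omits the bouncing logic:

double velocityX, velocityY;    // pixels per second, saved from the Flick event
DateTime lastRenderTime;

void OnGestureListenerFlick(object sender, FlickGestureEventArgs args)
{
    velocityX = args.HorizontalVelocity;
    velocityY = args.VerticalVelocity;
    lastRenderTime = DateTime.Now;
    CompositionTarget.Rendering += OnCompositionTargetRendering;
}

void OnCompositionTargetRendering(object sender, EventArgs args)
{
    DateTime now = DateTime.Now;
    double elapsed = (now - lastRenderTime).TotalSeconds;
    lastRenderTime = now;

    // Move the ellipse by the current velocity
    ellipseTranslate.X += velocityX * elapsed;
    ellipseTranslate.Y += velocityY * elapsed;

    // Apply a constant deceleration (the value is an arbitrary choice)
    const double deceleration = 1000;     // pixels per second per second
    double speed = Math.Sqrt(velocityX * velocityX + velocityY * velocityY);
    double newSpeed = speed - deceleration * elapsed;

    if (newSpeed <= 0)
    {
        // Motion has stopped; detach the per-frame handler
        CompositionTarget.Rendering -= OnCompositionTargetRendering;
    }
    else
    {
        velocityX *= newSpeed / speed;
        velocityY *= newSpeed / speed;
    }
}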

Pinch Me, I Must Be Dreaming

The Pinch sequence occurs when two fingers are touching the screen; it’s generally interpreted to expand or contract an on-screen object, possibly rotating it as well.

There’s no question that the pinching operation constitutes one of the most treacherous areas of multi-touch processing, and it’s not unusual to see higher-level interfaces fail at providing adequate information. Most notoriously, the Windows Phone 7 ManipulationDelta event is particularly tricky to use.

When handling gestures, Drag sequences and Pinch sequences are mutually exclusive. They don’t overlap but they can occur back to back. For example, press a finger to the screen and drag it. That generates a DragStarted and multiple DragDelta events. Now press a second finger to the screen. You’ll get a DragCompleted to complete the Drag sequence followed by a PinchStarted and multiple PinchDelta events. Now lift the second finger while the first finger keeps moving. That’s a PinchCompleted to complete the Pinch sequence, followed by DragStarted and DragDelta. Depending on the number of fingers touching the screen, you’re basically alternating between Drag sequences and Pinch sequences.

One helpful characteristic of this Pinch gesture is that it doesn’t discard information. You can use properties of the event arguments to entirely reconstruct the positions of the two fingers, so you can always go back to first principles if you need to.

During a Pinch sequence, the current location of one finger—let’s call it the primary finger—is always available with the GetPosition method. For this discussion, call that return value pt1. For the PinchStarted event, the PinchStartedGestureEventArgs class has two additional properties named Distance and Angle indicating the location of the second finger relative to the first. You can easily calculate that actual location using the following statement:

Point pt2 = new Point(pt1.X + args.Distance * Cos(args.Angle),
                      pt1.Y + args.Distance * Sin(args.Angle));

The Angle property is in degrees, so you’ll need Cos and Sin methods to convert to radians before calling Math.Cos and Math.Sin. Before the PinchStarted handler has completed, you’ll also want to save the Distance and Angle properties in fields, perhaps named pinchStartDistance and pinchStartAngle.
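Put together, a PinchStarted handler along those lines might look like this sketch, with the degree-based Cos and Sin helpers just described:

double pinchStartDistance, pinchStartAngle;

void OnGestureListenerPinchStarted(object sender,
  PinchStartedGestureEventArgs args)
{
    // Primary finger, and the reconstructed position of the second finger
    Point pt1 = args.GetPosition(this);
    Point pt2 = new Point(pt1.X + args.Distance * Cos(args.Angle),
                          pt1.Y + args.Distance * Sin(args.Angle));

    // Save for use during subsequent PinchDelta events
    pinchStartDistance = args.Distance;
    pinchStartAngle = args.Angle;
}

// The Angle property is in degrees; convert before calling Math.Cos and Math.Sin
static double Cos(double degrees)
{
    return Math.Cos(Math.PI * degrees / 180);
}

static double Sin(double degrees)
{
    return Math.Sin(Math.PI * degrees / 180);
}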

The PinchDelta event is accompanied by a PinchGestureEventArgs object. Once again, the GetPosition method gives you the location of the primary finger, which has perhaps moved from its original location. For the second finger, the event arguments provide DistanceRatio and TotalAngleDelta properties.

The DistanceRatio is the ratio of the current distance between the fingers to the original distance, which means you can calculate the current distance like so:

double distance = args.DistanceRatio * pinchStartDistance;

The TotalAngleDelta is a difference between the current angle between the fingers and the original angle. You can calculate the current angle like this:

double angle = args.TotalAngleDelta + pinchStartAngle;

Now you can calculate the location of the second finger as before:

Point pt2 = new Point(pt1.X + distance * Cos(angle),
                      pt1.Y + distance * Sin(angle));

You don’t need to save any additional information to fields during PinchDelta handling to process further PinchDelta events.
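Assembled into a single handler, the reconstruction might look like this sketch, reusing the fields and helpers from the PinchStarted sketch above:

void OnGestureListenerPinchDelta(object sender, PinchGestureEventArgs args)
{
    // Current position of the primary finger
    Point pt1 = args.GetPosition(this);

    // Reconstruct the current distance and angle between the fingers
    double distance = args.DistanceRatio * pinchStartDistance;
    double angle = args.TotalAngleDelta + pinchStartAngle;

    // Reconstruct the position of the second finger
    Point pt2 = new Point(pt1.X + distance * Cos(angle),
                          pt1.Y + distance * Sin(angle));

    // pt1 and pt2 now give both finger positions; use them as needed
}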

The TwoFingerTracking project demonstrates this logic by displaying blue and green ellipses that track one or two fingers around the screen.

Scale and Rotate

The PinchDelta event also provides sufficient information to perform scaling and rotation on objects. I had to supply my own matrix multiplication method, but that was about the extent of the hassles.

To demonstrate, the ScaleAndRotate project implements what is now a “traditional” type of demonstration that lets you drag, scale and optionally rotate a photograph. To perform these transforms, I defined the Image element with a double-barreled RenderTransform as shown in Figure 1.

Figure 1 The Image Element in ScaleAndRotate

<Image Name="image"
  Source="PetzoldTattoo.jpg"
  Stretch="None"
  HorizontalAlignment="Left"
  VerticalAlignment="Top">
  <Image.RenderTransform>
    <TransformGroup>
      <MatrixTransform x:Name="previousTransform" />
        <TransformGroup x:Name="currentTransform">
          <ScaleTransform x:Name="scaleTransform" />
          <RotateTransform x:Name="rotateTransform" />
          <TranslateTransform x:Name="translateTransform" />
        </TransformGroup>
    </TransformGroup>
  </Image.RenderTransform>
</Image>

When a Drag or Pinch operation is in progress, the three transforms in the nested TransformGroup are manipulated to move the picture around the screen, scale it and rotate it. When a DragCompleted or PinchCompleted event occurs, the Matrix in the MatrixTransform named previousTransform is multiplied by the composite transform available as the Value property of the TransformGroup. The three transforms in this TransformGroup are then set back to their default values.

Scaling and rotation are always relative to a center point, which is the point that remains in the same location when the transform occurs. A photograph scaled or rotated relative to its upper-left corner ends up in a different location than a photograph scaled or rotated relative to its lower-right corner.

The ScaleAndRotate code is shown in Figure 2. I use the primary finger as the scaling and rotation center; these center points are set on the transforms during PinchStarted handling and they don’t change for the duration of the Pinch sequence. During PinchDelta events, the DistanceRatio and TotalAngleDelta properties provide scaling and rotation information relative to that center. Any change in movement of the primary finger (which must be detected with a saved field) then becomes an overall translation factor.

Figure 2 The ScaleAndRotate Code

public partial class MainPage : PhoneApplicationPage
{
    bool isDragging;
    bool isPinching;
    Point ptPinchPositionStart;
    public MainPage()
    {
        InitializeComponent();
    }
    void OnGestureListenerDragStarted(object sender, DragStartedGestureEventArgs args)
    {
        isDragging = args.OriginalSource == image;
    }
    void OnGestureListenerDragDelta(object sender, DragDeltaGestureEventArgs args)
    {
        if (isDragging)
        {
            translateTransform.X += args.HorizontalChange;
            translateTransform.Y += args.VerticalChange;
        }
    }
    void OnGestureListenerDragCompleted(object sender, 
      DragCompletedGestureEventArgs args)
    {
        if (isDragging)
        {
            TransferTransforms();
            isDragging = false;
        }
    }
    void OnGestureListenerPinchStarted(object sender, 
      PinchStartedGestureEventArgs args)
    {
        isPinching = args.OriginalSource == image;
        if (isPinching)
        {
            // Set transform centers
            Point ptPinchCenter = args.GetPosition(image);
            ptPinchCenter = previousTransform.Transform(ptPinchCenter);
            scaleTransform.CenterX = ptPinchCenter.X;
            scaleTransform.CenterY = ptPinchCenter.Y;
            rotateTransform.CenterX = ptPinchCenter.X;
            rotateTransform.CenterY = ptPinchCenter.Y;
            ptPinchPositionStart = args.GetPosition(this);
        }
    }
    void OnGestureListenerPinchDelta(object sender, PinchGestureEventArgs args)
    {
        if (isPinching)
        {
            // Set scaling
            scaleTransform.ScaleX = args.DistanceRatio;
            scaleTransform.ScaleY = args.DistanceRatio;
            // Optionally set rotation
            if (allowRotateCheckBox.IsChecked.Value)
                rotateTransform.Angle = args.TotalAngleDelta;
            // Set translation
            Point ptPinchPosition = args.GetPosition(this);
            translateTransform.X = ptPinchPosition.X - ptPinchPositionStart.X;
            translateTransform.Y = ptPinchPosition.Y - ptPinchPositionStart.Y;
        }
    }
    void OnGestureListenerPinchCompleted(object sender, PinchGestureEventArgs args)
    {
        if (isPinching)
        {
            TransferTransforms();
            isPinching = false;
        }
    }
    void TransferTransforms()
    {
        previousTransform.Matrix = Multiply(previousTransform.Matrix, 
          currentTransform.Value);
        // Set current transforms to default values
        scaleTransform.ScaleX = scaleTransform.ScaleY = 1;
        scaleTransform.CenterX = scaleTransform.CenterY = 0;
        rotateTransform.Angle = 0;
        rotateTransform.CenterX = rotateTransform.CenterY = 0;
        translateTransform.X = translateTransform.Y = 0;
    }
    // Silverlight has no built-in matrix multiplication, so compose
    // the two affine matrices by hand
    Matrix Multiply(Matrix A, Matrix B)
    {
        return new Matrix(A.M11 * B.M11 + A.M12 * B.M21,
                          A.M11 * B.M12 + A.M12 * B.M22,
                          A.M21 * B.M11 + A.M22 * B.M21,
                          A.M21 * B.M12 + A.M22 * B.M22,
                          A.OffsetX * B.M11 + A.OffsetY * B.M21 + B.OffsetX,
                          A.OffsetX * B.M12 + A.OffsetY * B.M22 + B.OffsetY);
    }
}

That’s certainly the simplest pinch code I’ve ever written, and that fact is perhaps the best endorsement I can provide for this new gesture interface.

Perhaps a unified theory of multi-touch isn’t far off after all.


Charles Petzold is a longtime contributing editor to MSDN Magazine. His new book, “Programming Windows Phone 7” (Microsoft Press, 2010), is available as a free download at bit.ly/cpebookpdf.

Thanks to the following technical expert for reviewing this article: Richard Bailey