Custom Gestures for 3D Manipulation Using Windows Touch in C++

My CodeProject Entries



At PDC 2009, Reed Townsend showed some very exciting multitouch samples in his presentation, Windows Touch Deep Dive. The following video shows the presentation:

This blog post covers the code behind the custom 3D manipulations that were presented.  You can download the 3D manipulations code sample for this from my server.  Note that this project uses the DirectX SDK. If you don't already have the DirectX SDK installed, you can download it from the DirectX Developer Center.  After you install the DirectX SDK, you also need to add its header and library paths to Visual Studio.  If you get the following errors, add the libraries D3D11.lib and d3d10_1.lib to the additional dependencies in the linker input:


1>D3DXDriver.obj : error LNK2019: unresolved external symbol _D3D10CreateDeviceAndSwapChain1@36 referenced in function "private: long __thiscall D3DXDriver::CreateSwapChain(unsigned int,unsigned int)" (?CreateSwapChain@D3DXDriver@@AAEJII@Z)

1>D3DXDriver.obj : error LNK2019: unresolved external symbol _D3D11CreateDevice@40 referenced in function "private: long __thiscall D3DXDriver::CreateSwapChain(unsigned int,unsigned int)" (?CreateSwapChain@D3DXDriver@@AAEJII@Z)

For the 3D manipulations demo that was shown at PDC 2009, a custom implementation of Windows Touch input is used to map gestures to transformations in 3D space.  The following image shows the application in use.





For this demo, a few utility classes are created that simplify and organize the code. The D3DXDriver class encapsulates the Direct3D setup and control. The CComTouchDriver class encapsulates Windows Touch handling. The Camera class inherits from the InertiaObj class, a generic inertia class, and encapsulates the various transformations that are made to the camera object within the scene. The following data flow diagram shows how messages are propagated through the utility classes in the application.



In the diagram on the left, the WM_TOUCH message is generated by user input on the device and the message is sent to the main window’s WndProc method. From there, the message gets passed to the CComTouchDriver class, which sends the event data to the Camera class, which in turn feeds the input to its input handler. The input then causes the manipulation processor (represented in the diagram on the right) to raise events such as ManipulationStarted and ManipulationDelta. The event handler for the ManipulationDelta event updates the camera position based on the event’s values.

Demo Component Details

The following sections describe the various tasks that were completed to create this demo.

· Set up Direct3D

· Add WM_TOUCH support

· Add manipulations support

· Map manipulations to 3D navigation

· Add inertia and tweak the application 



Set up Direct3D

This project uses the D3DXDriver class, which simplifies hooking up Direct3D to a project.  For this project, the D3DXDriver class encapsulates the rendering methods and manages some of the scene objects.  The render method uses the camera object to set up the camera position, so that updates the camera object makes to itself are reflected when the scene renders.

Once you have Direct3D working using the driver, you can set up the basic elements of the scene.

Create boxes

The boxes that are seen in the scene are generated by a few calls to the Direct3D API.  The following code shows you how the boxes are generated and randomly placed.

    //seed the prng to get consistent output
    //1987 is an arbitrary number that happens to have good results
    srand(1987);

    //build the world transforms for each of the colored boxes
    for (int i = 0; i < NUM_BOXES; i++)
    {
        m_amBoxes[i] = new D3DXMATRIX();
        if (m_amBoxes[i] == NULL)
        {
            hr = E_FAIL;
            break;
        }

        //each box gets a random translation as its world transform
        D3DXMatrixTranslation(m_amBoxes[i],
            (FLOAT)(rand() % 20 - 5),
            (FLOAT)(rand() % 10 - 5),
            (FLOAT)(rand() % 20 - 5));
    }



The following code shows how the boxes are rendered.



    //render the scattered cubes
    for (int i = 0; i < NUM_BOXES; i++)
    {
        RenderBox(m_amBoxes[i], FALSE, NULL);
    }






Create axes

The x, y, and z axes are added to the scene to give you some reference points as you move around the scene.  In implementation, these objects are just stretched boxes created in a manner similar to the randomly generated boxes that were created for the scene.  The following code shows how the world transforms for the axes are initialized.



        //build the world transforms for each of the axes
        FLOAT afScale[3], afTrans[3];
        D3DXMATRIX mScale, mTranslate;
        for (int i = 0; i < 3; i++)
        {
            m_amAxes[i] = new D3DXMATRIX();
            if (m_amAxes[i] == NULL)
            {
                hr = E_FAIL;
                break;
            }

            //each axis is just a cube, stretched and transformed to the proper position about
            //the origin
            for (int j = 0; j < 3; j++)
            {
                afScale[j] = AXIS_SIZE;
                afTrans[j] = 0.0f;
            }

            //the axis should go onto the - side of the origin just a bit
            afScale[i] = AXIS_LENGTH;
            afTrans[i] = AXIS_LENGTH * 0.98f;

            //scale first, then translate
            D3DXMatrixScaling(&mScale, afScale[0], afScale[1], afScale[2]);
            D3DXMatrixTranslation(&mTranslate, afTrans[0], afTrans[1], afTrans[2]);
            D3DXMatrixMultiply(m_amAxes[i], &mScale, &mTranslate);
        }





The following code shows how the axes are rendered in D3DXDriver.cpp.



    //render the axes
    for (int i = 0; i < 3; i++)
    {
        RenderBox(m_amAxes[i], TRUE, &(g_vAxesColors[i]));
    }





Set up initial position for the camera

The camera must be initialized so that the view a user sees includes the entire scene that was set up in previous steps.  The following code shows how the camera is set up in the camera class.



VOID Camera::Reset()
{
    m_vPos = D3DXVECTOR3( 20.0f, 20.0f, -20.0f);
    m_vLookAt = D3DXVECTOR3( 0.0f, 0.0f, 0.0f );
    m_vUp = D3DXVECTOR3( 0.0f, 1.0f, 0.0f );
}






The following code shows how the camera object is used to render from the camera’s position.



    //update camera





        &(m_pCamera->m_vUp) );





Add WM_TOUCH support

By default, applications receive WM_GESTURE messages.  Since this is a custom implementation of input based on touch, you need to call RegisterTouchWindow to get WM_TOUCH messages instead. A manipulation processor will ultimately interpret some of those WM_TOUCH messages. The following code shows how RegisterTouchWindow is called in InitWindow in the project’s main source file.



    if( SUCCEEDED( hr ) )
    {
        // Ready for handling WM_TOUCH messages
        RegisterTouchWindow(*hWnd, 0);
        ShowWindow( *hWnd, nCmdShow );
    }






An advantage of using WM_TOUCH messages and custom handling of touch messages over using WM_GESTURE messages is that you can perform 2 different types of manipulation simultaneously (zoom while panning, rotation while zooming, and so on).  The following code maps the WM_TOUCH message to the TouchProc method.



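LRESULT CALLBACK WndProc( HWND hWnd, UINT message, WPARAM wParam, LPARAM lParam )
{
    switch( message )
    {
        case WM_TOUCH:
            //hand the touch message to TouchProc
            TouchProc(hWnd, message, wParam, lParam);
            break;

        default:
            return DefWindowProc(hWnd, message, wParam, lParam);
    }
    return 0;
}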





The following code shows how the TouchProc method propagates the WM_TOUCH message to the CComTouchDriver class.



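LRESULT TouchProc( HWND hWnd, UINT /*message*/, WPARAM wParam, LPARAM lParam )
{
    PTOUCHINPUT pInputs;
    HTOUCHINPUT hInput;
    int iNumContacts;
    POINT ptInputs;

    //the low word of wParam holds the number of touch points in this message
    iNumContacts = LOWORD(wParam);
    hInput = (HTOUCHINPUT)lParam;
    pInputs = new (std::nothrow) TOUCHINPUT[iNumContacts];

    // Get each touch input info and feed each TOUCHINPUT into the process input handler
    if(pInputs != NULL)
    {
        if(GetTouchInputInfo(hInput, iNumContacts, pInputs, sizeof(TOUCHINPUT)))
        {
            for(int i = 0; i < iNumContacts; i++)
            {
                // Bring touch input info into client coordinates
                // (TOUCHINPUT x and y are in hundredths of a pixel, in screen space)
                ptInputs.x = TOUCH_COORD_TO_PIXEL(pInputs[i].x);
                ptInputs.y = TOUCH_COORD_TO_PIXEL(pInputs[i].y);
                ScreenToClient(hWnd, &ptInputs);
                pInputs[i].x = ptInputs.x;
                pInputs[i].y = ptInputs.y;

                // g_ctDriver is an assumed name for the application's CComTouchDriver instance
                g_ctDriver->ProcessInputEvent(&pInputs[i]);
            }
        }
        delete [] pInputs;
    }

    // The touch input handle must be closed once the message has been handled
    CloseTouchInputHandle(hInput);
    return 0;
}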
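
The coordinate conversion in TouchProc is easy to get wrong, so here is a minimal, portable sketch of just that arithmetic. It assumes a left-to-right window where ScreenToClient amounts to subtracting the screen position of the client area's top-left corner. RawTouch, PixelPoint, CoordToPixel, and ToClientPixels are hypothetical names used only for illustration; they are not part of the sample or of the Windows API.

```cpp
// Portable stand-ins for the x/y fields of TOUCHINPUT and for POINT.
struct RawTouch   { long x; long y; };  // hundredths of a pixel, screen space
struct PixelPoint { long x; long y; };  // whole pixels

// Same arithmetic as the TOUCH_COORD_TO_PIXEL macro in winuser.h.
inline long CoordToPixel(long hundredths)
{
    return hundredths / 100;
}

// clientOrigin is the screen position of the client area's top-left corner
// (an assumed input; in the real application ScreenToClient does this step).
inline PixelPoint ToClientPixels(const RawTouch& touch, const PixelPoint& clientOrigin)
{
    PixelPoint p;
    p.x = CoordToPixel(touch.x) - clientOrigin.x;
    p.y = CoordToPixel(touch.y) - clientOrigin.y;
    return p;
}
```

For example, a raw touch at (64050, 48025) in a window whose client area starts at screen pixel (100, 50) maps to client pixel (540, 430).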



The following code shows how the CComTouchDriver class handles the input event.

VOID CComTouchDriver::ProcessInputEvent(TOUCHINPUT * inData)
{
    //look for gestures
    //if there is a gesture, we don't want to change the event stream
    //(i.e. remove the tap events) because we don't want to accidentally
    //destroy any account keeping in an event sink (e.g. have a consumer see
    //a touch down without a corresponding touch up)
    if ( IsMultiFingerTap( inData ) )
    {
        //assumed response to the tap: reset the camera to its starting view
        m_pCamera->Reset();
    }

    //The only object we want to respond to touch events is the camera.
    //If we wanted to add more objects and do hit detection, this would be
    //the place to do it
    //e.g. Find which object the user was manipulating and route the touch
    //input to it
    m_pCamera->ProcessInputEvent(inData, (int)m_pointMap.size());
}

The method first checks for the custom gesture and then sends the input data to the camera.  The following code shows how the camera handles the input data.




VOID Camera::ProcessInputEvent(TOUCHINPUT const * inData, int iNumContacts)
{
    TrackNumContacts(inData->dwTime, iNumContacts);
    InertiaObj::ProcessInputEvent(inData, iNumContacts);
}






The following code shows how the InertiaObj class handles WM_TOUCH data.




VOID InertiaObj::ProcessInputEvent(TOUCHINPUT const * inData, int /*iNumContacts*/)
{
    DWORD dwCursorID = inData->dwID;
    DWORD dwTime = inData->dwTime;
    DWORD dwEvent = inData->dwFlags;
    FLOAT fpX = (FLOAT)inData->x, fpY = (FLOAT)inData->y;

    //forward the event to the manipulation processor based on its type
    if(dwEvent & TOUCHEVENTF_DOWN)
    {
        m_manipulationProc->ProcessDownWithTime(dwCursorID, fpX, fpY, dwTime);
    }
    else if(dwEvent & TOUCHEVENTF_MOVE)
    {
        m_manipulationProc->ProcessMoveWithTime(dwCursorID, fpX, fpY, dwTime);
    }
    else if(dwEvent & TOUCHEVENTF_UP)
    {
        m_manipulationProc->ProcessUpWithTime(dwCursorID, fpX, fpY, dwTime);
    }
}







In summary, the touch data propagates from the main application, to the touch driver, to the camera.



Add manipulations and map manipulations to 3D navigation

This project uses two helper classes: CComTouchDriver, which encapsulates much of the touch input and has places where the input handling can be easily customized, and InertiaObj, which encapsulates touch input handling for inertia. As described in the previous section, WM_TOUCH messages are handed to the touch driver from the main window’s WndProc method, and the driver routes them to the camera, which derives from InertiaObj.  Once the messages reach the manipulation processor, manipulation events are raised on the classes that implement the _IManipulationEvents interface.  You can then map the manipulations to 3D navigation.  The following sections describe the various manipulation mappings.

Zoom / Pinch to move the camera’s distance or pan about the z-axis

These transforms are hooked up within the manipulation processor to modify the camera’s distance while keeping the camera locked to the focal point or by panning about the z-axis.

    //handle pinch - zoom or z-axis pan depending on the Ctrl button state
    SHORT sCtrlState = GetKeyState(VK_CONTROL);
    if (sCtrlState < 0)
    {
        vPan = D3DXVECTOR2(0.0f, 0.0f);
        Pan(vPan, CalcZPan(delta.scaleDelta));
    }







Note that if you hold the Control key, the pinch gesture performs an operation that is similar to zooming but slightly different.  If you hold Control while pinching, the focal distance of the camera remains fixed and the camera moves in the coordinate space instead.
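
To make the difference between the two pinch behaviors concrete, here is a minimal, portable sketch. It is an illustration under stated assumptions, not the sample's code: Vec3, CameraPose, ZoomAboutLookAt, and PanAlongView are hypothetical names, and the sketch assumes that a scale delta greater than 1 (fingers spreading) should bring the camera closer.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

inline Vec3 Add(Vec3 a, Vec3 b) { return { a.x + b.x, a.y + b.y, a.z + b.z }; }
inline Vec3 Sub(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
inline Vec3 Scale(Vec3 v, float s) { return { v.x * s, v.y * s, v.z * s }; }

struct CameraPose { Vec3 pos; Vec3 lookAt; };

// Zoom: the look-at point stays put and only the camera's distance changes.
inline CameraPose ZoomAboutLookAt(CameraPose c, float scaleDelta)
{
    if (scaleDelta <= 0.0f) return c;  // ignore degenerate input
    Vec3 radius = Sub(c.pos, c.lookAt);
    c.pos = Add(c.lookAt, Scale(radius, 1.0f / scaleDelta));
    return c;
}

// z-axis pan: camera and look-at point move together along the view
// direction, so the distance between them (the focal distance) is unchanged.
inline CameraPose PanAlongView(CameraPose c, float distance)
{
    Vec3 view = Sub(c.lookAt, c.pos);
    float len = std::sqrt(view.x * view.x + view.y * view.y + view.z * view.z);
    if (len == 0.0f) return c;  // no view direction to move along
    Vec3 step = Scale(view, distance / len);
    c.pos = Add(c.pos, step);
    c.lookAt = Add(c.lookAt, step);
    return c;
}
```

Starting from the camera's reset position (20, 20, -20) looking at the origin, ZoomAboutLookAt with a scale delta of 2 moves the camera to (10, 10, -10) and leaves the look-at point alone, whereas PanAlongView shifts both points by the same offset.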


Spherical panning

Panning is done by rotating the camera about a focal point behind the scene created by the boxes.  The following code shows how the camera is panned in the manipulation event handler.

    //spherical pan if 1 finger, pan if 2+ fingers

    if (m_uLagNumContacts >= 2)


        vPan = D3DXVECTOR2(-delta.translationDeltaX, delta.translationDeltaY);

        Pan(vPan, 0);




        vSpherePan = D3DXVECTOR2(





The following code shows how the Pan method is implemented.

VOID Camera::Pan(D3DXVECTOR2 vPan, FLOAT zPan)
{
    D3DXVECTOR3 vTPan, vRadius;
    RECT rClient;
    GetClientRect(m_hWnd, &rClient);
    FLOAT fpWidth = (FLOAT)(rClient.right - rClient.left);
    FLOAT fpHeight = (FLOAT)(rClient.bottom - rClient.top);

    vRadius = m_vPos - m_vLookAt;
    FLOAT fpRadius = D3DXVec3Length(&vRadius);

    //our field of view determines how far it is from one side of the screen to the other
    //in world coordinates
    //determine this distance, and scale our normalized xy pan vector to it
    FLOAT fpYPanCoef = 2*fpRadius / tan( (D3DX_PI - FOV_Y) / 2.0f);
    FLOAT fpXPanCoef = fpYPanCoef * (fpWidth / fpHeight);
    vTPan.x = vPan.x * fpXPanCoef;
    vTPan.y = vPan.y * fpYPanCoef;
    vTPan.z = zPan;

    //move the camera and its focal point together
    ScreenVecToCameraVec(&vTPan, vTPan);
    m_vPos += vTPan;
    m_vLookAt += vTPan;
}
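
The pan coefficient in the Pan method can look odd at first. Since tan((π - FOV)/2) equals 1/tan(FOV/2), the expression 2 * radius / tan((π - FOV_Y) / 2) is the same as 2 * radius * tan(FOV_Y / 2), which is the world-space height of the view at the focal distance. The following sketch checks the two forms against each other; the function names and kPi are hypothetical and only serve the illustration.

```cpp
#include <cmath>

const float kPi = 3.14159265f;

// The form used in the Pan method.
inline float PanCoefFromSample(float radius, float fovY)
{
    return 2.0f * radius / std::tan((kPi - fovY) / 2.0f);
}

// The equivalent "height of the visible area at this distance" form.
inline float VisibleHeightAtDistance(float radius, float fovY)
{
    return 2.0f * radius * std::tan(fovY / 2.0f);
}
```

With a 90 degree vertical field of view and a focal distance of 1, both forms give a visible height of 2, so a pan across the full height of the window moves the camera 2 world units.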


The following code shows how the SphericalPan method is implemented.

VOID Camera::SphericalPan(D3DXVECTOR2 vPan)
{
    D3DXQUATERNION q;
    D3DXMATRIX mRot;
    D3DXVECTOR3 vRotAxis;
    D3DXVECTOR3 vRadius = m_vPos - m_vLookAt;
    FLOAT radius = D3DXVec3Length(&vRadius);

    //translate the relative pan vector to the absolute distance
    //we want to travel
    //the radius of the orbit the camera makes with a screen x-axis input
    FLOAT cameraHeight = D3DXVec3Dot(&vRadius, &m_vUp);
    FLOAT xOrbitRadius = sqrt( pow(radius, 2) - pow(cameraHeight, 2));

    //panning across the entire screen will rotate the view 180 degrees
    FLOAT ySpherePanCoef = 2 * sqrt(2.0f * pow(radius, 2));
    FLOAT xSpherePanCoef = 2 * sqrt(2.0f * pow(xOrbitRadius, 2));
    vPan.x *= xSpherePanCoef;
    vPan.y *= ySpherePanCoef;
    D3DXVECTOR3 vTPan = D3DXVECTOR3(vPan.x, vPan.y, 0);

    //the angle of the arc of the path around the sphere we want to take
    FLOAT theta = D3DXVec2Length(&vPan) / radius;

    //the other angle (the triangle is isosceles) of the triangle interior to the arc
    FLOAT gamma = (FLOAT)((D3DX_PI - theta) / 2.0f);

    //the length of the chord beneath the arc we are traveling
    //ultimately we will set vTPan to be this chord,
    //therefore m_vPos+vTPan will be the new position
    FLOAT chordLen = (radius * sin(theta)) / sin(gamma);

    //translate pan to the camera's frame of reference
    ScreenVecToCameraVec(&vTPan, vTPan);

    //then set pan to the length of the chord
    D3DXVec3Normalize(&vTPan, &vTPan);
    vTPan *= chordLen;

    //rotate the chord into the sphere by pi/2 - gamma
    D3DXQuaternionRotationAxis(&q,
        D3DXVec3Cross(&vRotAxis, &vTPan, &m_vPos),
        -(FLOAT)((D3DX_PI / 2.0f) - gamma));
    D3DXMatrixRotationQuaternion(&mRot, &q);
    D3DXVec3TransformCoord(&vTPan, &vTPan, &mRot);

    //vTPan is now equal to the chord beneath the arc we wanted to travel along
    //our view sphere
    D3DXVECTOR3 vNewPos = m_vPos + vTPan;

    //watch to see if the cross product flipped directions
    //this happens if we go over the top/bottom of our sphere
    D3DXVECTOR3 vXBefore, vXAfter;
    vRadius = m_vPos - m_vLookAt;
    D3DXVec3Cross(&vXBefore, &vRadius, &m_vUp);
    D3DXVec3Normalize(&vXBefore, &vXBefore);
    vRadius = vNewPos - m_vLookAt;
    D3DXVec3Cross(&vXAfter, &vRadius, &m_vUp);
    D3DXVec3Normalize(&vXAfter, &vXAfter);
    D3DXVECTOR3 vXPlus = vXBefore + vXAfter;

    //if we went straight over the top the vXPlus would be zero
    //the < 0.5 lets it go almost straight over the top too
    if ( D3DXVec3Length(&vXPlus) < 0.5f )
    {
        //go upside down
        m_vUp = -m_vUp;
    }

    //update our camera position
    m_vPos = vNewPos;
}
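
The chord geometry in SphericalPan is the part that usually needs a second look. The arc the camera should travel subtends the angle theta at the focal point, and the chord beneath that arc, together with the two radii, forms an isosceles triangle whose other two angles are gamma = (π - theta) / 2. By the law of sines the chord length is radius * sin(theta) / sin(gamma), which simplifies to 2 * radius * sin(theta / 2), and the chord is tilted into the sphere by π/2 - gamma = theta / 2. The following sketch reproduces just that arithmetic; ArcChord and ChordForArc are hypothetical names, and the arc length is passed in directly rather than derived from a pan vector.

```cpp
#include <cmath>

struct ArcChord
{
    float length;     // length of the chord beneath the arc
    float tiltAngle;  // angle between the chord and the tangent at the start point
};

inline ArcChord ChordForArc(float radius, float arcLength)
{
    const float pi = 3.14159265f;
    float theta = arcLength / radius;   // angle subtended by the arc
    float gamma = (pi - theta) / 2.0f;  // base angle of the isosceles triangle
    ArcChord c;
    c.length = radius * std::sin(theta) / std::sin(gamma);
    c.tiltAngle = pi / 2.0f - gamma;    // equals theta / 2
    return c;
}
```

Note that as theta approaches π, sin(gamma) approaches zero, so a single pan step that covers half the sphere would need special handling in this form.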


2-finger tap detection

This custom gesture is implemented by detecting when more than one input comes down and comes up within a certain window of time.  To handle the gesture, the code records the time and position at which fingers come down as well as the time and position at which they come up. To track the point inputs and calculate the distance each point has travelled, a map, m_pointMap, is created to store points.  To track the time and number of contacts, the start time for the gesture is stored along with the maximum number of contacts seen. The following code shows how 2-finger tap detection is implemented.


    //used for gesture detection

    unsigned int m_uMaxNumContactsSeen;

    DWORD m_dwGestureStartTime;

    //used to track finger travel distance in gesture detection

    std::map<DWORD, D3DXVECTOR2> m_pointMap;

    FLOAT m_fpMaxDist;


BOOL CComTouchDriver::IsMultiFingerTap(TOUCHINPUT const * inData)
{
    BOOL fResult = FALSE;
    DWORD dwPTime = inData->dwTime;
    DWORD dwEvent = inData->dwFlags;
    DWORD dwCursorID = inData->dwID;
    FLOAT x = (FLOAT)(inData->x);
    FLOAT y = (FLOAT)(inData->y);

    if(dwEvent & TOUCHEVENTF_DOWN)
    {
        if (m_pointMap.size() == 0) //if this is the first contact in a gesture
        {
            m_dwGestureStartTime = dwPTime;
            m_fpMaxDist = 0;
        }

        try
        {
            //remember where this contact came down
            m_pointMap.insert(std::pair<DWORD, D3DXVECTOR2>(dwCursorID, D3DXVECTOR2(x,y)));
            if (m_pointMap.size() > m_uMaxNumContactsSeen)
            {
                m_uMaxNumContactsSeen = (unsigned int)m_pointMap.size();
            }
        }
        catch (const std::bad_alloc&)
        {
            //if we can't keep track of the distance traveled, assume it was too far
            m_fpMaxDist = MAX_TAP_DIST + 1;
        }
    }
    else if(dwEvent & TOUCHEVENTF_UP)
    {
        //calculate the distance this contact traveled - from touch down to touch up
        std::map<DWORD, D3DXVECTOR2>::iterator it = m_pointMap.find(dwCursorID);
        if(it != m_pointMap.end())
        {
            D3DXVECTOR2 ptStart = (*it).second;
            D3DXVECTOR2 ptEnd = D3DXVECTOR2( x, y );
            D3DXVECTOR2 vDist = ptEnd - ptStart;
            FLOAT fpDist = D3DXVec2Length( &vDist );
            if (fpDist > m_fpMaxDist)
            {
                m_fpMaxDist = fpDist;
            }

            //this contact is finished, so stop tracking it
            m_pointMap.erase(it);
        }

        //if the gesture is over (no more contacts), look for a two finger tap
        if (m_pointMap.size() == 0)
        {
            //at least 2 fingers, in a quick enough succession to be called a tap
            //if we wanted to capture exactly n-finger taps, we would just change
            //this comparison
            if (m_uMaxNumContactsSeen >= 2 && dwPTime - m_dwGestureStartTime < MAX_TAP_TIME)
            {
                fResult = TRUE;
            }

            //clear the num of contacts we've seen - the gesture is over
            m_uMaxNumContactsSeen = 0;
        }
    }

    //if any contact traveled more than MAX_TAP_DIST between touchdown and touchup,
    //we did not have a tap
    if (m_fpMaxDist > MAX_TAP_DIST)
    {
        fResult = FALSE;
    }

    return fResult;
}
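
If you want to experiment with the tap logic outside of a Windows Touch application, the same idea can be sketched without any Windows or Direct3D types. The following is a minimal sketch, not the sample's class: MultiFingerTapDetector, TouchPhase, and the constructor parameters (which stand in for MAX_TAP_TIME and MAX_TAP_DIST) are all hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <map>

enum class TouchPhase { Down, Move, Up };

class MultiFingerTapDetector
{
    struct Point { float x; float y; };

public:
    MultiFingerTapDetector(unsigned long maxTapTimeMs, float maxTapDist)
        : m_maxTapTimeMs(maxTapTimeMs), m_maxTapDist(maxTapDist),
          m_startTimeMs(0), m_maxDist(0.0f), m_maxContacts(0) {}

    // Feed one contact event; returns true only on the event that completes a tap.
    bool OnTouch(unsigned long id, TouchPhase phase, float x, float y, unsigned long timeMs)
    {
        if (phase == TouchPhase::Down)
        {
            if (m_points.empty())  // first contact of a new gesture
            {
                m_startTimeMs = timeMs;
                m_maxDist = 0.0f;
                m_maxContacts = 0;
            }
            m_points[id] = Point{ x, y };
            if (m_points.size() > m_maxContacts) m_maxContacts = m_points.size();
            return false;
        }
        if (phase != TouchPhase::Up) return false;

        std::map<unsigned long, Point>::iterator it = m_points.find(id);
        if (it == m_points.end()) return false;

        // Track the farthest any contact moved between touch down and touch up.
        float dx = x - it->second.x;
        float dy = y - it->second.y;
        float dist = std::sqrt(dx * dx + dy * dy);
        if (dist > m_maxDist) m_maxDist = dist;
        m_points.erase(it);

        if (!m_points.empty()) return false;  // gesture still in progress

        // Gesture over: two or more fingers, quick enough, and nearly stationary.
        return m_maxContacts >= 2
            && (timeMs - m_startTimeMs) < m_maxTapTimeMs
            && m_maxDist <= m_maxTapDist;
    }

private:
    unsigned long m_maxTapTimeMs;
    float m_maxTapDist;
    unsigned long m_startTimeMs;
    float m_maxDist;
    std::size_t m_maxContacts;
    std::map<unsigned long, Point> m_points;
};
```

Unlike the sample, this sketch folds the distance check into the final comparison instead of applying it after the fact, which gives the same result for the event that ends the gesture.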


Manipulation smoothing

The input that you get by default has some variability, which is not optimal for this particular project because it causes jittery motion when panning around.  The jitter could be caused by noise on the input device or by the manipulation processor interpreting the gesture as a combined pan and rotate gesture.  To fix this, a rolling window of manipulation deltas is kept and averaged before the deltas are applied.  Smoothing the input fixes the wobbly panning and zooming.  The following code shows how the window is stored and averaged.

VOID Camera::SmoothManipulationDelta(Delta *delta)
{
    //we smooth by keeping a rolling window of inputs and using their average value as
    //the return. The aim is to have small jitters in the input cancel each other out
    //before they are seen by the user
    Delta sumDeltas;
    m_ucWindowIndex = m_ucWindowIndex % SMOOTHING_WINDOW_SIZE;
    m_pDeltaWindow[m_ucWindowIndex++] = *delta;

    //start the sums at 0 and the product at 1
    sumDeltas.translationDeltaX = 0.0f;
    sumDeltas.translationDeltaY = 0.0f;
    sumDeltas.rotationDelta = 0.0f;
    sumDeltas.scaleDelta = 1.0f;

    for (int i = 0; i < SMOOTHING_WINDOW_SIZE; i++)
    {
        sumDeltas.translationDeltaX += m_pDeltaWindow[i].translationDeltaX;
        sumDeltas.translationDeltaY += m_pDeltaWindow[i].translationDeltaY;
        sumDeltas.rotationDelta += m_pDeltaWindow[i].rotationDelta;
        //scaleDelta is a multiplicative delta, not an additive delta like the others
        sumDeltas.scaleDelta *= m_pDeltaWindow[i].scaleDelta;
    }

    //arithmetic mean for the additive deltas, geometric mean for the scale
    sumDeltas.translationDeltaX /= SMOOTHING_WINDOW_SIZE;
    sumDeltas.translationDeltaY /= SMOOTHING_WINDOW_SIZE;
    sumDeltas.rotationDelta /= SMOOTHING_WINDOW_SIZE;
    sumDeltas.scaleDelta = pow(sumDeltas.scaleDelta, 1.0f/SMOOTHING_WINDOW_SIZE);

    delta->translationDeltaX = sumDeltas.translationDeltaX;
    delta->translationDeltaY = sumDeltas.translationDeltaY;
    delta->scaleDelta = sumDeltas.scaleDelta;
    delta->rotationDelta = sumDeltas.rotationDelta;
}



Note that this is implemented by simply averaging the input deltas in members of the _IManipulationEvents interface via the InertiaObj class.
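
The smoothing idea is small enough to try in isolation. The following is a minimal, portable sketch of a rolling window that averages the additive deltas arithmetically and the scale delta geometrically. DeltaSmoother and this stripped-down Delta struct are hypothetical stand-ins for the sample's types, and the sketch assumes scale deltas are always positive and that the window starts out filled with "no change" deltas.

```cpp
#include <cmath>
#include <cstddef>

struct Delta
{
    float translationDeltaX;
    float translationDeltaY;
    float rotationDelta;
    float scaleDelta;  // multiplicative: 1 means "no change"
};

template <std::size_t N>
class DeltaSmoother
{
public:
    DeltaSmoother() : m_next(0)
    {
        const Delta identity = { 0.0f, 0.0f, 0.0f, 1.0f };
        for (std::size_t i = 0; i < N; ++i) m_window[i] = identity;
    }

    // Store the latest delta and return the average over the whole window.
    Delta Push(const Delta& latest)
    {
        m_window[m_next] = latest;
        m_next = (m_next + 1) % N;

        Delta avg = { 0.0f, 0.0f, 0.0f, 1.0f };
        for (std::size_t i = 0; i < N; ++i)
        {
            avg.translationDeltaX += m_window[i].translationDeltaX;
            avg.translationDeltaY += m_window[i].translationDeltaY;
            avg.rotationDelta += m_window[i].rotationDelta;
            avg.scaleDelta *= m_window[i].scaleDelta;
        }
        const float n = static_cast<float>(N);
        avg.translationDeltaX /= n;
        avg.translationDeltaY /= n;
        avg.rotationDelta /= n;
        avg.scaleDelta = std::pow(avg.scaleDelta, 1.0f / n);  // geometric mean
        return avg;
    }

private:
    Delta m_window[N];
    std::size_t m_next;
};
```

A larger window gives smoother motion but makes the camera lag further behind the fingers, so the window size is a trade-off you would tune by feel.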

Add inertia and other tweaks

Inertia is handled by triggering a timer on manipulation completion.  This timer calls the Process method on the inertia processor encapsulated in the camera class.  The following code shows how the timer is triggered in the InertiaObj class.

HRESULT STDMETHODCALLTYPE InertiaObj::ManipulationCompleted(
    FLOAT /*x*/,
    FLOAT /*y*/,
    FLOAT /*cumulativeTranslationX*/,
    FLOAT /*cumulativeTranslationY*/,
    FLOAT /*cumulativeScale*/,
    FLOAT /*cumulativeExpansion*/,
    FLOAT /*cumulativeRotation*/)
{
    HRESULT hr = S_OK;

    if (!m_bIsInertiaActive)
    {
        //the user's manipulation just ended, so hand off to the inertia processor
        hr = SetupInertia(m_inertiaProc, m_manipulationProc);
        m_bIsInertiaActive = TRUE;

        // Kick off timer that handles inertia
        SetTimer(m_hWnd, m_iTimerId, DESIRED_MILLISECONDS, NULL);
    }
    else
    {
        //inertia itself just ended
        m_bIsInertiaActive = FALSE;

        // Stop timer that handles inertia
        KillTimer(m_hWnd, m_iTimerId);
    }

    return hr;
}


The following code shows the handler for the WM_TIMER event in the 3dManipulation implementation file.


        case WM_TIMER:

            //process inertia



The following code shows how the CComTouchDriver class implements the ProcessChanges method to trigger inertia on the camera.

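// Handler for activating the inertia processor
VOID CComTouchDriver::ProcessChanges()
{
    BOOL bCompleted = FALSE;
    if (m_pCamera->m_bIsInertiaActive == TRUE)
    {
        // Advance the inertia processor by one step
        // (assumes the camera exposes its inertia processor as m_inertiaProc)
        m_pCamera->m_inertiaProc->Process(&bCompleted);
    }
}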






Inertia Camera


The camera derives from the InertiaObj class, which implements the _IManipulationEvents interface in order to enable inertia features. The parameters for the inertia settings are configured within the class.  The following code shows how inertia is configured.

HRESULT InertiaObj::SetupInertia(IInertiaProcessor* ip, IManipulationProcessor* mp)
{
    HRESULT hr = S_OK;

    // Set desired properties for inertia events
    // Deceleration for translations in pixel / msec^2
    HRESULT hrPutDD = ip->put_DesiredDeceleration(0.006f);

    // Deceleration for rotations in radians / msec^2
    HRESULT hrPutDAD = ip->put_DesiredAngularDeceleration(0.00002f);

    // Read the velocities the manipulation ended with
    FLOAT fVX = 0.0f;
    FLOAT fVY = 0.0f;
    FLOAT fVR = 0.0f;
    HRESULT hrGetVX = mp->GetVelocityX(&fVX);
    HRESULT hrGetVY = mp->GetVelocityY(&fVY);
    HRESULT hrGetAV = mp->GetAngularVelocity(&fVR);

    // Set initial velocities for inertia processor
    HRESULT hrPutIVX = ip->put_InitialVelocityX(fVX);
    HRESULT hrPutIVY = ip->put_InitialVelocityY(fVY);
    HRESULT hrPutIAV = ip->put_InitialAngularVelocity(fVR);

    if(FAILED(hrPutDD) || FAILED(hrPutDAD) || FAILED(hrGetVX)
        || FAILED(hrGetVY) || FAILED(hrGetAV) || FAILED(hrPutIVX)
        || FAILED(hrPutIVY) || FAILED(hrPutIAV))
    {
        hr = E_FAIL;
    }

    return hr;
}
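
To get a feel for what the deceleration values mean, you can estimate how long and how far the camera will coast. The following sketch assumes simple constant deceleration, which is what the property names suggest; the inertia processor's actual extrapolation may differ. CoastEstimate and EstimateCoast are hypothetical names.

```cpp
#include <cmath>

struct CoastEstimate
{
    float durationMs;  // time until the motion stops
    float distancePx;  // distance covered while decelerating
};

// initialVelocity in pixels / msec, deceleration in pixels / msec^2.
inline CoastEstimate EstimateCoast(float initialVelocity, float deceleration)
{
    CoastEstimate e = { 0.0f, 0.0f };
    if (deceleration <= 0.0f) return e;  // no meaningful estimate
    float speed = std::fabs(initialVelocity);
    e.durationMs = speed / deceleration;                   // t = v / a
    e.distancePx = speed * speed / (2.0f * deceleration);  // d = v^2 / (2a)
    return e;
}
```

Under this assumption, a flick that ends at 1.2 pixels per millisecond with the deceleration of 0.006 used above coasts for about 200 milliseconds and covers about 120 pixels. Lowering the deceleration makes the scene glide longer.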

