Head and eye gaze in DirectX

In Windows Mixed Reality, gaze input is used to determine what the user is looking at. Gaze can drive primary input models such as gaze and commit, and it can provide contextual information for other interaction types. There are two types of gaze vector available through the API: head gaze and eye gaze. Both are provided as a three-dimensional ray with an origin and direction. Applications can then raycast into their scenes, or into the real world, and determine what the user is targeting.

Head gaze represents the direction that the user's head is pointed in. Think of this as the position and forward direction of the device itself, with the position representing the center point between the two displays. Head gaze is available on all Mixed Reality devices.

Eye gaze represents the direction that the user's eyes are looking towards. The origin is located between the user's eyes. It is available on Mixed Reality devices that include an eye tracking system.

Both head and eye gaze rays are accessible through the SpatialPointerPose API. Simply call SpatialPointerPose::TryGetAtTimestamp to receive a new SpatialPointerPose object at the specified timestamp and coordinate system. This SpatialPointerPose contains a head gaze origin and direction. It also contains an eye gaze origin and direction if eye tracking is available.

Using head gaze

To access the head gaze, start by calling SpatialPointerPose::TryGetAtTimestamp to receive a new SpatialPointerPose object. You need to pass the following parameters:

  • A SpatialCoordinateSystem that represents the desired coordinate system for the head gaze. This is represented by the coordinateSystem variable in the following code. For more information, visit our coordinate systems developer guide.
  • A Timestamp that represents the exact time of the head pose requested. Typically, you'll use a timestamp that corresponds to the time when the current frame will be displayed. You can get this predicted display timestamp from a HolographicFramePrediction object, which is accessible through the current HolographicFrame. This HolographicFramePrediction object is represented by the prediction variable in the following code (see the sketch after this list for one way to obtain both values).
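
Both values typically come from your app's rendering loop. Here's a minimal sketch, assuming m_holographicSpace is your app's HolographicSpace and m_stationaryReferenceFrame is a SpatialStationaryFrameOfReference you created at startup (both names are placeholders for your own members).

using namespace winrt::Windows::Graphics::Holographic;
using namespace winrt::Windows::Perception::Spatial;

// Create the frame for the current render pass and read its prediction.
HolographicFrame holographicFrame = m_holographicSpace.CreateNextFrame();
HolographicFramePrediction prediction = holographicFrame.CurrentPrediction();

// Resolve the coordinate system in which to express the gaze.
SpatialCoordinateSystem coordinateSystem = m_stationaryReferenceFrame.CoordinateSystem();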

Once you have a valid SpatialPointerPose, the head position and forward direction are accessible as properties. The following code shows how to access them.

using namespace winrt::Windows::UI::Input::Spatial;
using namespace winrt::Windows::Foundation::Numerics;

SpatialPointerPose pointerPose = SpatialPointerPose::TryGetAtTimestamp(coordinateSystem, prediction.Timestamp());
if (pointerPose)
{
	float3 headPosition = pointerPose.Head().Position();
	float3 headForwardDirection = pointerPose.Head().ForwardDirection();

	// Do something with the head gaze
}
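
A common use of head gaze is to position content a fixed distance along the ray. The following hedged snippet continues inside the if block above and computes a point two meters in front of the user, which you might use to place a cursor or a tag-along hologram.

	// Position content two meters along the head gaze ray.
	constexpr float distanceInMeters = 2.0f;
	float3 gazeTargetPosition = headPosition + (distanceInMeters * headForwardDirection);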

Using eye gaze

The eye gaze API is very similar to head gaze. It uses the same SpatialPointerPose API, which provides a ray origin and direction that you can raycast against your scene. The only difference is that you must explicitly enable eye tracking before using it by requesting access.

Declaring the Gaze Input capability

Double-click the appxmanifest file in Solution Explorer. Then navigate to the Capabilities section and check the Gaze Input capability.


This adds the following lines to the Package section in the appxmanifest file:

  <Capabilities>
    <DeviceCapability Name="gazeInput" />
  </Capabilities>

Requesting access to gaze input

When your app is starting up, call EyesPose::RequestAccessAsync to request access to eye tracking. The system will prompt the user if needed, and return GazeInputAccessStatus::Allowed once access has been granted. This is an asynchronous call, so it requires a bit of extra management. The following example spins up a detached std::thread to wait for the result, which it stores in a member variable called m_isEyeTrackingEnabled.

using namespace winrt::Windows::Perception::People;
using namespace winrt::Windows::UI::Input;

std::thread requestAccessThread([this]()
{
	auto status = EyesPose::RequestAccessAsync().get();

	if (status == GazeInputAccessStatus::Allowed)
		m_isEyeTrackingEnabled = true;
	else
		m_isEyeTrackingEnabled = false;
});

requestAccessThread.detach();

Starting a detached thread is just one option for handling async calls. Alternatively, you could use the new co_await functionality supported by C++/WinRT.
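
For example, here's a minimal coroutine sketch that performs the same request with co_await. RequestEyeTrackingAccessAsync and the SampleAppMain class are hypothetical names; m_isEyeTrackingEnabled is the same member variable as above.

using namespace winrt::Windows::Foundation;
using namespace winrt::Windows::Perception::People;
using namespace winrt::Windows::UI::Input;

IAsyncAction SampleAppMain::RequestEyeTrackingAccessAsync()
{
	// co_await suspends the coroutine instead of blocking a thread.
	GazeInputAccessStatus status = co_await EyesPose::RequestAccessAsync();
	m_isEyeTrackingEnabled = (status == GazeInputAccessStatus::Allowed);
}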

Getting the eye gaze ray

Once you have received access to eye tracking, you can get the eye gaze ray every frame. Just as with head gaze, get the SpatialPointerPose by calling SpatialPointerPose::TryGetAtTimestamp with a desired timestamp and coordinate system. The SpatialPointerPose contains an EyesPose object through the Eyes property. This is non-null only if eye tracking is enabled. From there, you can check whether the user wearing the device has an eye tracking calibration by calling EyesPose::IsCalibrationValid. Next, use the Gaze property to get a SpatialRay containing the eye gaze position and direction. The Gaze property can sometimes be null, so be sure to check for this. This can happen if a calibrated user temporarily closes their eyes.

The following code shows how to access the eye gaze ray.

using namespace winrt::Windows::UI::Input::Spatial;
using namespace winrt::Windows::Foundation::Numerics;

SpatialPointerPose pointerPose = SpatialPointerPose::TryGetAtTimestamp(coordinateSystem, prediction.Timestamp());
if (pointerPose)
{
	if (pointerPose.Eyes() && pointerPose.Eyes().IsCalibrationValid())
	{
		if (pointerPose.Eyes().Gaze())
		{
			auto spatialRay = pointerPose.Eyes().Gaze().Value();
			float3 eyeGazeOrigin = spatialRay.Origin;
			float3 eyeGazeDirection = spatialRay.Direction;
			
			// Do something with the eye gaze
		}
	}
}

Correlating gaze with other inputs

Sometimes you may find that you need a SpatialPointerPose that corresponds with an event in the past. For example, if the user performs an Air Tap, your app might want to know what they were looking at. For this purpose, simply using SpatialPointerPose::TryGetAtTimestamp with the predicted frame time would be inaccurate because of the latency between system input processing and display time.

One way to handle this scenario is to make an additional call to SpatialPointerPose::TryGetAtTimestamp, using a historical timestamp that corresponds to the input event. However, for input that routes through the SpatialInteractionManager, there's an easier method. The SpatialInteractionSourceState has its very own TryGetPointerPose function. Calling that will provide a perfectly correlated SpatialPointerPose without the guesswork. For more information on working with SpatialInteractionSourceState objects, take a look at the Hands and Motion Controllers in DirectX documentation.
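
For example, here's a hedged sketch of a SourcePressed handler that uses SpatialInteractionSourceState::TryGetPointerPose. The coordinateSystem variable is assumed to be the same one used in the earlier examples, and OnSourcePressed is a placeholder name for your own handler.

using namespace winrt::Windows::UI::Input::Spatial;
using namespace winrt::Windows::Foundation::Numerics;

void OnSourcePressed(SpatialInteractionManager const& sender, SpatialInteractionSourceEventArgs const& args)
{
	// This pose is correlated with the time of the input event rather than the predicted frame time.
	SpatialPointerPose pointerPose = args.State().TryGetPointerPose(coordinateSystem);
	if (pointerPose)
	{
		float3 headPosition = pointerPose.Head().Position();
		float3 headForwardDirection = pointerPose.Head().ForwardDirection();

		// Raycast into the scene to determine what the user was looking at when they pressed.
	}
}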

See also