Head-gaze and eye-gaze input in DirectX

In Windows Mixed Reality, eye-gaze and head-gaze input is used to determine what the user is looking at. You can use it to drive primary input models such as head-gaze and commit, and to provide context for other types of interactions. Two types of gaze vectors are available through the API: head-gaze and eye-gaze. Both are provided as a three-dimensional ray with an origin and a direction. Applications can then raycast into their scenes, or the real world, to determine what the user is targeting.
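Raycasting means intersecting the gaze ray with your scene's geometry. The following is a minimal sketch, assuming each hologram is approximated by a bounding sphere, that returns the closest hologram hit by a gaze ray. Vec3, Target, IntersectSphere, and FindGazeTarget are hypothetical names for this example, not Windows APIs; a real app would use float3 and its engine's own collision queries.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <optional>
#include <vector>

// Minimal stand-in for winrt::Windows::Foundation::Numerics::float3 (hypothetical).
struct Vec3
{
    float x, y, z;
};

Vec3 operator-(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Hypothetical scene object: a hologram approximated by a bounding sphere.
struct Target
{
    Vec3 center;
    float radius;
};

// Returns the distance along the ray to the first hit, or std::nullopt on a miss.
// Assumes `direction` is unit length; normalize it first if you are unsure.
std::optional<float> IntersectSphere(Vec3 origin, Vec3 direction, const Target& target)
{
    Vec3 toCenter = target.center - origin;
    float projection = Dot(toCenter, direction);                  // distance to closest approach
    float distSq = Dot(toCenter, toCenter) - projection * projection;
    float radiusSq = target.radius * target.radius;
    if (distSq > radiusSq)
        return std::nullopt;                                      // ray passes outside the sphere
    float halfChord = std::sqrt(radiusSq - distSq);
    float t = projection - halfChord;                            // near intersection
    if (t < 0.0f)
        t = projection + halfChord;                              // origin is inside the sphere
    if (t < 0.0f)
        return std::nullopt;                                      // sphere is behind the ray origin
    return t;
}

// Returns the index of the closest target hit by the gaze ray, or std::nullopt.
std::optional<std::size_t> FindGazeTarget(Vec3 origin, Vec3 direction, const std::vector<Target>& targets)
{
    std::optional<std::size_t> best;
    float bestDistance = 0.0f;
    for (std::size_t i = 0; i < targets.size(); ++i)
    {
        if (auto hit = IntersectSphere(origin, direction, targets[i]))
        {
            if (!best || *hit < bestDistance)
            {
                best = i;
                bestDistance = *hit;
            }
        }
    }
    return best;
}
```

Bounding spheres keep the test cheap enough to run against every hologram each frame; finer-grained hit testing can then be limited to the winner.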

Head-gaze represents the direction that the user's head is pointed in. Think of this as the position and forward direction of the device itself, with the position representing the center point between the two displays. Head-gaze is available on all Mixed Reality devices.

Eye-gaze represents the direction that the user's eyes are looking towards. The origin is located between the user's eyes. It is available on Mixed Reality devices that include an eye tracking system.

Both head and eye-gaze rays are accessible through the SpatialPointerPose API. Simply call SpatialPointerPose::TryGetAtTimestamp to receive a new SpatialPointerPose object at the specified timestamp and coordinate system. This SpatialPointerPose contains a head-gaze origin and direction. It also contains an eye-gaze origin and direction if eye tracking is available.

Device support

Feature   | HoloLens (1st gen) | HoloLens 2 | Immersive headsets
Head-gaze | ✔️                 | ✔️         | ✔️
Eye-gaze  |                    | ✔️         |

Using head-gaze

To access the head-gaze, start by calling SpatialPointerPose::TryGetAtTimestamp to receive a new SpatialPointerPose object. You need to pass the following parameters.

  • A SpatialCoordinateSystem that represents the desired coordinate system for the head-gaze. This is represented by the coordinateSystem variable in the following code. For more information, visit our coordinate systems developer guide.
  • A Timestamp that represents the exact time of the head pose requested. Typically you will use a timestamp that corresponds to the time when the current frame will be displayed. You can get this predicted display timestamp from a HolographicFramePrediction object, which is accessible through the current HolographicFrame. This HolographicFramePrediction object is represented by the prediction variable in the following code.

Once you have a valid SpatialPointerPose, the head position and forward direction are accessible as properties. The following code shows how to access them.

#include <winrt/Windows.Perception.People.h>
#include <winrt/Windows.UI.Input.Spatial.h>

using namespace winrt::Windows::UI::Input::Spatial;
using namespace winrt::Windows::Foundation::Numerics;

SpatialPointerPose pointerPose = SpatialPointerPose::TryGetAtTimestamp(coordinateSystem, prediction.Timestamp());
if (pointerPose)
{
	float3 headPosition = pointerPose.Head().Position();
	float3 headForwardDirection = pointerPose.Head().ForwardDirection();

	// Do something with the head-gaze
}
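A common use of the head-gaze ray is to place a cursor or hologram a fixed distance in front of the user. The sketch below computes that point from any ray origin and direction; Vec3 and PointAlongGaze are hypothetical stand-ins, not Windows APIs. With the real float3 type you could write headPosition + headForwardDirection * distance directly, provided the direction is unit length.

```cpp
#include <cassert>
#include <cmath>

// Minimal stand-in for winrt::Windows::Foundation::Numerics::float3 (hypothetical).
struct Vec3
{
    float x, y, z;
};

// Returns the point `distance` meters along a gaze ray, e.g. to place a head-gaze cursor.
// The direction is normalized first, so `distance` stays in meters even if the
// input vector is not unit length.
Vec3 PointAlongGaze(Vec3 origin, Vec3 direction, float distance)
{
    float length = std::sqrt(direction.x * direction.x + direction.y * direction.y + direction.z * direction.z);
    if (length == 0.0f)
        return origin;  // degenerate direction: stay at the ray origin
    float scale = distance / length;
    return { origin.x + direction.x * scale,
             origin.y + direction.y * scale,
             origin.z + direction.z * scale };
}
```

Spatial coordinate systems in Windows Mixed Reality use meters, so a distance of 2.0f places the point two meters along the ray.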

Using eye-gaze

Note that before anyone can use eye-gaze input, each user must complete an eye tracking user calibration the first time they use the device. The eye-gaze API is very similar to head-gaze. It uses the same SpatialPointerPose API, which provides a ray origin and direction that you can raycast against your scene. The only difference is that you need to explicitly enable eye tracking before using it. To do so, complete two steps:

  1. Request user permission to use eye tracking in your app.
  2. Enable the "Gaze Input" capability in your package manifest.

Requesting access to eye-gaze input

When your app is starting up, call EyesPose::RequestAccessAsync to request access to eye tracking. The system will prompt the user if needed, and return GazeInputAccessStatus::Allowed once access has been granted. This is an asynchronous call, so it requires a bit of extra management. The following example spins up a detached std::thread to wait for the result, which it stores in a member variable called m_isEyeTrackingEnabled.

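#include <thread>
#include <winrt/Windows.Foundation.h>
#include <winrt/Windows.Perception.People.h>
#include <winrt/Windows.UI.Input.h>

using namespace winrt::Windows::Perception::People;
using namespace winrt::Windows::UI::Input;

// m_isEyeTrackingEnabled is written from another thread, so declare it as std::atomic<bool>.
std::thread requestAccessThread([this]()
{
	// Blocks this worker thread (not the UI thread) until the request completes.
	auto status = EyesPose::RequestAccessAsync().get();

	m_isEyeTrackingEnabled = (status == GazeInputAccessStatus::Allowed);
});

requestAccessThread.detach();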

Starting a detached thread is just one option for handling async calls. Alternatively, you can use the co_await support in C++/WinRT. Here is another example for asking for user permission, which also:

  • Calls EyesPose::IsSupported() so that the application triggers the permission dialog only if there is an eye tracker.
  • Stores the result in m_gazeInputAccessStatus to prevent popping up the permission prompt over and over again.

#include <winrt/Windows.Foundation.h>
#include <winrt/Windows.Perception.People.h>
#include <winrt/Windows.UI.Input.h>

using namespace winrt::Windows::Perception::People;
using namespace winrt::Windows::UI::Input;

// Member variable. This is to prevent popping up the permission prompt over and over again.
GazeInputAccessStatus m_gazeInputAccessStatus = GazeInputAccessStatus::Unspecified;

// Hypothetical member function of your app's main class.
// The object that owns it must stay alive until the request completes.
winrt::fire_and_forget RequestEyeGazeAccessAsync()
{
	// This triggers the permission prompt.
	// Ask for access if there is a corresponding device and registry flag did not disable it.
	if (EyesPose::IsSupported() &&
		(m_gazeInputAccessStatus == GazeInputAccessStatus::Unspecified))
	{
		// Resumes here, possibly on a background thread, once the request completes.
		GazeInputAccessStatus status = co_await EyesPose::RequestAccessAsync();

		// GazeInputAccessStatus::{Allowed, DeniedBySystem, DeniedByUser, Unspecified}
		m_gazeInputAccessStatus = status;

		// Let's be sure to not ask again.
		if (status == GazeInputAccessStatus::Unspecified)
		{
			m_gazeInputAccessStatus = GazeInputAccessStatus::DeniedBySystem;
		}
	}
}
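Because the request completes asynchronously, m_gazeInputAccessStatus stays Unspecified until the user responds, so two calls made before the first one finishes could each show the prompt. One way to make the "ask only once" rule hold across threads is an atomic state with an extra in-flight value. The sketch below models that logic in portable C++; AccessState (a stand-in for GazeInputAccessStatus plus a hypothetical Pending value) and GazeAccessGuard are illustrative names, not Windows APIs.

```cpp
#include <atomic>
#include <cassert>

// Stand-in for winrt::Windows::UI::Input::GazeInputAccessStatus, plus a hypothetical
// Pending value that exists only in this sketch to mark an in-flight request.
enum class AccessState
{
    Unspecified,
    Allowed,
    DeniedByUser,
    DeniedBySystem,
    Pending
};

// Hypothetical helper, not part of the Windows API.
class GazeAccessGuard
{
public:
    // Returns true for exactly one caller while no request has been made yet.
    // Later callers, including concurrent ones, get false, so the prompt is shown at most once.
    bool TryBeginRequest(bool isSupported)
    {
        if (!isSupported)
            return false;
        AccessState expected = AccessState::Unspecified;
        return m_state.compare_exchange_strong(expected, AccessState::Pending);
    }

    // Records the result. An Unspecified result is stored as DeniedBySystem,
    // matching the "do not ask again" rule in the example above.
    void CompleteRequest(AccessState result)
    {
        m_state = (result == AccessState::Unspecified) ? AccessState::DeniedBySystem : result;
    }

    bool IsAllowed() const { return m_state == AccessState::Allowed; }
    AccessState State() const { return m_state; }

private:
    std::atomic<AccessState> m_state{ AccessState::Unspecified };
};
```

In an app, you would call TryBeginRequest(EyesPose::IsSupported()) before RequestAccessAsync and pass the mapped result to CompleteRequest.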

Declaring the Gaze Input capability

Double-click the appxmanifest file in Solution Explorer. Then navigate to the Capabilities section and check the Gaze Input capability.


This adds the following lines to the Package section in the appxmanifest file:

  <Capabilities>
    <DeviceCapability Name="gazeInput" />
  </Capabilities>

Getting the eye-gaze ray

Once your app has been granted access to eye tracking, you are free to grab the eye-gaze ray every frame. Just as with head-gaze, get the SpatialPointerPose by calling SpatialPointerPose::TryGetAtTimestamp with the desired timestamp and coordinate system. The SpatialPointerPose contains an EyesPose object through its Eyes property, which is non-null only if eye tracking is enabled. From there, you can check whether the user wearing the device has a valid eye tracking calibration by calling EyesPose::IsCalibrationValid. Next, use the Gaze property to get a SpatialRay containing the eye-gaze origin and direction. The Gaze property can sometimes be null, so be sure to check for this. This can happen, for example, if a calibrated user temporarily closes their eyes.

The following code shows how to access the eye-gaze ray.

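#include <winrt/Windows.Foundation.h>
#include <winrt/Windows.Perception.People.h>
#include <winrt/Windows.Perception.Spatial.h>
#include <winrt/Windows.UI.Input.Spatial.h>

using namespace winrt::Windows::Perception::People;
using namespace winrt::Windows::Perception::Spatial;
using namespace winrt::Windows::UI::Input::Spatial;
using namespace winrt::Windows::Foundation::Numerics;

SpatialPointerPose pointerPose = SpatialPointerPose::TryGetAtTimestamp(coordinateSystem, prediction.Timestamp());
if (pointerPose)
{
	// Eyes() is null if eye tracking is not enabled for this app.
	EyesPose eyes = pointerPose.Eyes();
	if (eyes && eyes.IsCalibrationValid())
	{
		// Gaze() is null when no eye-gaze is available, e.g. while the user's eyes are closed.
		if (auto gaze = eyes.Gaze())
		{
			SpatialRay spatialRay = gaze.Value();
			float3 eyeGazeOrigin = spatialRay.Origin;
			float3 eyeGazeDirection = spatialRay.Direction;

			// Do something with the eye-gaze
		}
	}
}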

Fallback when eye tracking is not available

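As mentioned in our eye tracking design docs, both designers and developers should be aware that there may be instances in which eye tracking data is not available to your app. Reasons range from a user not being calibrated, to the user having denied the app access to their eye tracking data, to simple temporary interferences (such as smudges on the HoloLens visor or hair occluding the user's eyes). While some of the APIs have already been mentioned in this document, here is a summary of how to detect that eye tracking is available, as a quick reference:

  • The device has an eye tracker: EyesPose::IsSupported() returns true.
  • The user has granted your app access: EyesPose::RequestAccessAsync returned GazeInputAccessStatus::Allowed.
  • Eye tracking is enabled: the Eyes property of the SpatialPointerPose is non-null.
  • The current user is calibrated: EyesPose::IsCalibrationValid() returns true.
  • Eye-gaze data is currently available: the Gaze property of the EyesPose is non-null.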

In addition, you may want to check that your eye tracking data is not stale by adding a timeout between received eye tracking data updates, and otherwise fall back to head-gaze. Please visit our fallback design considerations for more information.
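The availability checks and the staleness timeout can be combined into a per-frame choice between eye-gaze and head-gaze. The following is a minimal sketch, assuming you have already read EyesPose::IsSupported(), the stored access status, IsCalibrationValid(), and Gaze() into a plain struct each frame. EyeTrackingState, GazeRay, and GazeSelector are hypothetical names, and staleness is measured on the app side with std::chrono::steady_clock rather than any Windows timestamp.

```cpp
#include <cassert>
#include <chrono>
#include <optional>

struct Vec3 { float x, y, z; };                 // stand-in for float3 (hypothetical)
struct GazeRay { Vec3 origin; Vec3 direction; }; // stand-in for a gaze ray (hypothetical)

// Per-frame snapshot of the eye tracking checks (hypothetical field names).
struct EyeTrackingState
{
    bool supported = false;          // EyesPose::IsSupported()
    bool accessAllowed = false;      // GazeInputAccessStatus::Allowed
    bool calibrationValid = false;   // EyesPose::IsCalibrationValid()
    std::optional<GazeRay> gaze;     // empty when Eyes() or Gaze() is null
};

class GazeSelector
{
public:
    explicit GazeSelector(std::chrono::milliseconds timeout) : m_timeout(timeout) {}

    // Returns the eye-gaze ray while it is usable and fresh, otherwise the head-gaze ray.
    GazeRay Select(const EyeTrackingState& eyes, const GazeRay& head,
                   std::chrono::steady_clock::time_point now)
    {
        bool usable = eyes.supported && eyes.accessAllowed && eyes.calibrationValid;
        if (usable && eyes.gaze)
        {
            m_lastEyeGaze = *eyes.gaze;   // remember the latest eye-gaze sample
            m_lastEyeGazeTime = now;
        }
        // Reuse the last eye-gaze ray until it is older than the timeout, then fall back.
        if (usable && m_lastEyeGaze && now - m_lastEyeGazeTime <= m_timeout)
            return *m_lastEyeGaze;
        return head;
    }

private:
    std::chrono::milliseconds m_timeout;
    std::optional<GazeRay> m_lastEyeGaze;
    std::chrono::steady_clock::time_point m_lastEyeGazeTime{};
};
```

Holding the last eye-gaze ray for a short timeout (200 ms in the test values is an arbitrary example) avoids flickering back to head-gaze during a blink, while still falling back when data stops arriving.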


Correlating gaze with other inputs

Sometimes you may find that you need a SpatialPointerPose that corresponds with an event in the past. For example, if the user performs an Air Tap, your app might want to know what they were looking at. For this purpose, simply using SpatialPointerPose::TryGetAtTimestamp with the predicted frame time would be inaccurate because of the latency between system input processing and display time. In addition, if you use eye-gaze for targeting, users' eyes tend to move on even before they finish a commit action. This is less of an issue for a simple Air Tap, but becomes more critical when combining long voice commands with fast eye movements. One way to handle this scenario is to make an additional call to SpatialPointerPose::TryGetAtTimestamp, using a historical timestamp that corresponds to the input event.

However, for input that routes through the SpatialInteractionManager, there's an easier method. The SpatialInteractionSourceState has its own TryGetPointerPose function. Calling it provides a SpatialPointerPose that is correlated with the input event, without the guesswork. For more information on working with SpatialInteractionSourceState objects, take a look at the Hands and Motion Controllers in DirectX documentation.
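If your app only learns an event's timestamp later (for example, when a long voice command finishes), an app-side option, which is an assumption of this sketch rather than guidance from the APIs above, is to record the gaze ray you computed each frame in a small time-indexed buffer and look up the sample at or before the event. TimestampedHistory is a hypothetical helper; it uses std::chrono::steady_clock instead of PerceptionTimestamp and assumes samples are added in increasing time order.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>
#include <optional>

// Hypothetical fixed-capacity history of timestamped samples (e.g. gaze rays).
template <typename Sample>
class TimestampedHistory
{
public:
    using Clock = std::chrono::steady_clock;

    explicit TimestampedHistory(std::size_t capacity) : m_capacity(capacity) {}

    // Appends a sample; the oldest one is dropped once capacity is exceeded.
    void Add(Clock::time_point time, const Sample& sample)
    {
        m_entries.push_back({ time, sample });
        if (m_entries.size() > m_capacity)
            m_entries.pop_front();
    }

    // Returns the newest sample recorded at or before `eventTime`, if one is still buffered.
    std::optional<Sample> AtOrBefore(Clock::time_point eventTime) const
    {
        for (auto it = m_entries.rbegin(); it != m_entries.rend(); ++it)
        {
            if (it->time <= eventTime)
                return it->sample;
        }
        return std::nullopt;
    }

private:
    struct Entry
    {
        Clock::time_point time;
        Sample sample;
    };

    std::size_t m_capacity;
    std::deque<Entry> m_entries;
};
```

Size the capacity to cover the longest delay you expect between where the user looked and when the event arrives.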


Calibration

For eye tracking to work accurately, each user is required to go through an eye tracking user calibration. This allows the device to adjust the system for a more comfortable and higher quality viewing experience for the user and to ensure accurate eye tracking at the same time. Developers don’t need to do anything on their end to manage user calibration. The system will ensure that the user gets prompted to calibrate the device under the following circumstances:

  • The user is using the device for the first time
  • The user previously opted out of the calibration process
  • The calibration process did not succeed the last time the user used the device

Developers should make sure to provide adequate support for users for whom eye tracking data may not be available. Learn more about considerations for fallback solutions at Eye tracking on HoloLens 2.


See also