01/25/2019

January 2017

Volume 32 Number 1

[HoloLens]

Introduction to the HoloLens, Part 2: Spatial Mapping

By Adam Tuliper | January 2017

In my last article, I talked about the three pillars of input for the HoloLens—gaze, gesture and voice (msdn.com/magazine/mt788624). These constructs allow you to physically interact with the HoloLens and, in turn, the world around you. You’re not constrained to working only with them, however, because you can access information about your surroundings through a feature called spatial mapping, and that’s what I’m going to explore in this article.

If I had to choose a single favorite feature on the HoloLens, it would be spatial mapping. Spatial mapping allows you to understand the space around you, either explicitly or implicitly. I can explicitly choose to work with the information taken in, or I can proceed implicitly by allowing natural physical interactions, like dropping a virtual ball onto a physical table, to take place. Recently, with some really neat updates to the HoloToolkit from Asobo Studio, it’s easy to scan for features in your environment, such as a chair, walls and more.

What Is a 3D Model?

It might be helpful to understand what a 3D model is before looking at what a spatial map of your area represents. 3D models come in a number of file formats, such as .ma (Maya) or .blend (Blender), but often you’ll find them in either the proprietary Autodesk .FBX (Filmbox) format or the Wavefront .OBJ format. .FBX files can contain not only 3D model information, but also animation data, though that isn’t applicable to this discussion.

A 3D model is a fairly simple object, commonly stored as a face-vertex mesh, which means it tracks faces and vertices. On nearly all modern hardware, faces are triangles, because triangles are the simplest polygons. Inside a 3D model you’ll find a list of all vertices in the model (made up of x,y,z values in space); a list of the vertex indices that make up each triangle; normals, which are just descriptive vectors (arrows) coming off each vertex that lighting calculations use to determine how light should interact with your model; and, finally, UV coordinates, essentially X,Y coordinates that tell you how to take a 2D image, called a texture, and wrap it around your model like wrapping paper so it looks the way it was designed. Figure 1 shows virtual Adam, a model that the company xxArray created for me because, well, I wanted to put myself into a scene with zombies. This is just a 3D model, but note the legs, which are made of vertices and triangles, and that the pants texture is, in simple terms, wrapped around the 3D model of the legs to look like pants. That’s nearly all the magic behind a 3D model.

Figure 1 UV Mapping of 2D Texture to 3D Object
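To make the face-vertex idea concrete, here’s a minimal sketch in plain C# (using System.Numerics rather than Unity, so it runs outside the editor). FaceVertexMesh and its FloorQuad helper are hypothetical names for illustration, not part of Unity or the HoloToolkit. The mesh stores a 1 x 1 meter quad as four vertices, six triangle indices and four UV coordinates, and derives the lighting normals from the triangles:

```csharp
using System.Numerics;

// Hypothetical minimal face-vertex mesh (not Unity's Mesh class).
public sealed class FaceVertexMesh
{
    public Vector3[] Vertices { get; }   // x,y,z positions in meters
    public int[] Triangles { get; }      // every 3 entries index one triangle
    public Vector2[] UVs { get; }        // one texture coordinate per vertex

    public FaceVertexMesh(Vector3[] vertices, int[] triangles, Vector2[] uvs)
    {
        Vertices = vertices;
        Triangles = triangles;
        UVs = uvs;
    }

    public int TriangleCount => Triangles.Length / 3;

    // Face normal from the cross product of two triangle edges.
    // Its direction depends on the winding order of the three indices.
    public Vector3 FaceNormal(int triangle)
    {
        Vector3 a = Vertices[Triangles[3 * triangle]];
        Vector3 b = Vertices[Triangles[3 * triangle + 1]];
        Vector3 c = Vertices[Triangles[3 * triangle + 2]];
        return Vector3.Normalize(Vector3.Cross(b - a, c - a));
    }

    // Per-vertex normals for lighting: the normalized sum of the normals
    // of every triangle that uses the vertex.
    public Vector3[] VertexNormals()
    {
        var normals = new Vector3[Vertices.Length];
        for (int t = 0; t < TriangleCount; t++)
        {
            Vector3 n = FaceNormal(t);
            for (int k = 0; k < 3; k++)
                normals[Triangles[3 * t + k]] += n;
        }
        for (int i = 0; i < normals.Length; i++)
            if (normals[i] != Vector3.Zero)
                normals[i] = Vector3.Normalize(normals[i]);
        return normals;
    }

    // A 1 x 1 meter quad lying flat on the floor (the XZ plane), made of
    // two triangles that share the edge between vertices 0 and 2.
    public static FaceVertexMesh FloorQuad() => new FaceVertexMesh(
        new[] { new Vector3(0, 0, 0), new Vector3(0, 0, 1),
                new Vector3(1, 0, 1), new Vector3(1, 0, 0) },
        new[] { 0, 1, 2, 0, 2, 3 },
        new[] { new Vector2(0, 0), new Vector2(0, 1),
                new Vector2(1, 1), new Vector2(1, 0) });
}
```

For the floor quad, both triangles produce an upward normal of (0, 1, 0), which is what lighting would use to shade a floor. Unity’s Mesh class holds the same pieces in its vertices, triangles, normals and uv properties.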

What Does Spatial Mapping Look Like?

Spatial mapping is easier in some ways because you’re not dealing with the textures of your environment. All you typically care about is a fairly accurate mesh of your environment, built up as the environment is scanned, so you can discover its surfaces and interact with them. Figure 2 shows a contrived scenario that’s slightly closer to what you’ll actually get. The model on the left shows the vertices, triangles and normals. You can’t see the normals directly, of course, but you see their effect in how the object is shaded.

Figure 2 What’s Needed for Rendering and for the Physics Engine

What you’ve seen thus far in both 3D model scenarios is purely for rendering and has absolutely nothing to do (yet) with physics. The green box outline on the right in Figure 2 is the shape of the collider, which I’ve moved off the cube to make a point: the collider is the component that defines the object’s region to the physics system. If you want to fully interact with the world, whether on the HoloLens, in a game or in any 3D experience, you need a collider for the physics system to use.
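The separation between rendering data and physics data is easy to see in code. The following sketch, in plain C# with System.Numerics, computes an axis-aligned box (the simplest collider shape) from a mesh’s vertices and answers the kinds of questions a physics system asks. AabbCollider is a hypothetical type for illustration; in Unity you’d use a BoxCollider component instead:

```csharp
using System;
using System.Numerics;

// Hypothetical axis-aligned box collider: just a center and half extents,
// with no triangles, normals or UVs. That's all a box collider needs.
public readonly struct AabbCollider
{
    public Vector3 Center { get; }
    public Vector3 HalfExtents { get; }

    public AabbCollider(Vector3 center, Vector3 halfExtents)
    {
        Center = center;
        HalfExtents = halfExtents;
    }

    // Fit the box around a mesh's vertices.
    public static AabbCollider FromVertices(Vector3[] vertices)
    {
        if (vertices.Length == 0)
            throw new ArgumentException("Mesh has no vertices.", nameof(vertices));
        Vector3 min = vertices[0], max = vertices[0];
        foreach (Vector3 v in vertices)
        {
            min = Vector3.Min(min, v);
            max = Vector3.Max(max, v);
        }
        return new AabbCollider((min + max) * 0.5f, (max - min) * 0.5f);
    }

    // Is a point inside (or on the surface of) the box?
    public bool Contains(Vector3 point)
    {
        Vector3 d = Vector3.Abs(point - Center);
        return d.X <= HalfExtents.X && d.Y <= HalfExtents.Y && d.Z <= HalfExtents.Z;
    }

    // Two boxes overlap only if they overlap along every axis.
    public bool Overlaps(AabbCollider other)
    {
        Vector3 d = Vector3.Abs(other.Center - Center);
        Vector3 r = HalfExtents + other.HalfExtents;
        return d.X <= r.X && d.Y <= r.Y && d.Z <= r.Z;
    }
}
```

The rendered cube could have thousands of triangles, but its box collider stays two vectors, which is why primitive colliders are so much cheaper for the physics system than mesh colliders.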

When you turn the HoloLens on and are in the holographic shell, it’s always mapping your environment. The HoloLens does this to understand where to place your windows. If I walk around my house with the HoloLens, it’s always updating its information about my environment. This serves two purposes: First, when I walk into a room I’ve been in previously, the HoloLens should show me the windows I had open. Second, environments are always changing and it needs to detect those changes. Think of the following common scenarios: someone walks in front of me, my kids are running around in the house, our pet bear walks by and creates a large occlusion zone I can’t see through. The point is, the environment is potentially always changing and the HoloLens is looking for these changes. Before delving into the API, let’s see the spatial mapping in practice (and, by the way, I don’t have a real pet bear).

To view spatial mapping in action, you can connect to the Windows Device Portal on a HoloLens, which allows remote management and viewing of the device, including a live 30 FPS video stream of what the device sees. Once the portal has been enabled in the HoloLens Developer Settings, you can reach it by browsing to the device’s IP address or, for a device plugged in over USB, to 127.0.0.1:10080. The device portal is available on nearly any Windows 10 device; bit.ly/2f0cnfM explains how to enable it. Figure 3 and Figure 4 show the spatial mesh retrieved from the 3D view in the device portal. Figure 3 shows what the HoloLens sees as soon as I turn it on, while Figure 4 displays the view after a brief walk through my living room. Note the chair next to the far wall on the right, as that appears later on (in Figure 9) when I ask the spatial understanding library to find me a sittable surface.

Figure 3 HoloLens Spatial Mesh Right After HoloLens Is Turned on in a New Room

Figure 4 HoloLens Spatial Mesh After a Quick Walk-Through a Portion of the Room

How Spatial Mapping Works

Spatial mapping works via a SurfaceObserver object, as you’re observing surface volumes, watching for new, updated and removed surfaces. All the types you need to work with come with Unity out of the box. You don’t need any additional libraries, though the HoloToolkit-Unity repository on GitHub has lots of functionality for the HoloLens, including some amazing surface detection I’ll look at later, so this repository should be considered essential for hitting the ground running.

First, you tell the SurfaceObserver that you’re observing a volume:

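// Fields on the MonoBehaviour that manages spatial mapping.
// SurfaceObserver is in the UnityEngine.VR.WSA namespace in Unity 5.5.
public Vector3 Extents = new Vector3(10, 10, 10);
private SurfaceObserver observer;

private void Start()
{
  observer = new SurfaceObserver();
  // Start from 0,0,0 and fill in a 10 meter cube volume
  // as you explore more of that volume area.
  observer.SetVolumeAsAxisAlignedBox(Vector3.zero, Extents);
}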

The larger the region, the greater the potential computational cost. According to the documentation, spatial mapping scans a 70-degree cone at distances between 0.8 and 3.1 meters (about 10 feet) from the device, and the docs note that these values might change in the future. If an object is farther away, it won’t be scanned until the HoloLens gets closer to it. The 0.8-meter minimum also ensures the user’s hands won’t accidentally be included as part of the spatial mesh of the room.
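To make those numbers concrete, here’s a sketch of a check for whether a point falls inside the scanning range just described. It models the 70-degree field as a simple circular cone around the gaze direction, which is my simplification; the real sensor coverage isn’t necessarily circular, and as the docs say, the values may change. ScanRange is a hypothetical helper, not a Unity or HoloToolkit API:

```csharp
using System;
using System.Numerics;

// Hypothetical helper; the cone shape is an assumption, the numbers come from the docs.
public static class ScanRange
{
    public const float MinDistance = 0.8f;       // meters
    public const float MaxDistance = 3.1f;       // meters
    public const float ConeAngleDegrees = 70f;   // full angle of the cone

    public static bool IsInScanRange(Vector3 head, Vector3 forward, Vector3 point)
    {
        Vector3 toPoint = point - head;
        float distance = toPoint.Length();
        if (distance < MinDistance || distance > MaxDistance)
            return false;

        // Compare the angle between the gaze and the point with the cone's half angle.
        float cosAngle = Vector3.Dot(Vector3.Normalize(forward), toPoint / distance);
        float halfAngleRadians = ConeAngleDegrees * 0.5f * MathF.PI / 180f;
        return cosAngle >= MathF.Cos(halfAngleRadians);
    }
}
```

A point half a meter in front of the user, where hands typically are, fails the distance test, while a point 2 meters away and 30 degrees off the gaze direction passes.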

The process to get spatial data into an application is as follows:

1. Notify the SurfaceObserver to observe a region of size A and shape B.
2. At a predefined interval (such as every 3 seconds), ask the SurfaceObserver for an update if you aren’t waiting on other results to be processed. (It’s best not to overlap results; let one mesh finish before the next is processed.)
3. The SurfaceObserver lets you know whether a surface volume has been added, updated or removed.
4. If there’s an add or update to your known spatial mesh:
   a. Clean up the old surface if one exists for this id.
   b. Reuse a surface object that isn’t being used (to save memory), or allocate a new SurfaceObject with mesh, collider and world anchor components.
   c. Make an async request to bake the mesh data.
5. If there’s a removal, remove the volume and make it inactive so you can reuse its game object later. This avoids additional allocations and therefore reduces garbage collections.
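The add/update and removal steps above boil down to an object pool keyed by surface id. Here’s a minimal sketch in plain C#; SurfacePool and SurfaceObject are hypothetical stand-ins (in Unity, SurfaceObject would be a GameObject with mesh, collider and world anchor components, and deactivating it would be a call to SetActive(false)):

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the game object that holds one surface volume.
public sealed class SurfaceObject
{
    public int SurfaceId { get; set; }
    public bool Active { get; set; }
}

// Hypothetical pool: reuses objects from removed surfaces instead of
// allocating new ones, which means less garbage to collect.
public sealed class SurfacePool
{
    private readonly Dictionary<int, SurfaceObject> active = new Dictionary<int, SurfaceObject>();
    private readonly Stack<SurfaceObject> inactive = new Stack<SurfaceObject>();

    public int Allocations { get; private set; }
    public int ActiveCount => active.Count;

    // Added or updated surface: return the object already tracking this id,
    // otherwise reuse an inactive object, allocating only as a last resort.
    public SurfaceObject GetOrCreate(int surfaceId)
    {
        if (active.TryGetValue(surfaceId, out SurfaceObject existing))
            return existing;

        SurfaceObject obj;
        if (inactive.Count > 0)
        {
            obj = inactive.Pop();
        }
        else
        {
            obj = new SurfaceObject();
            Allocations++;
        }
        obj.SurfaceId = surfaceId;
        obj.Active = true;
        active[surfaceId] = obj;
        return obj;
    }

    // Removed surface: deactivate the object and keep it for later reuse.
    public void Remove(int surfaceId)
    {
        if (active.TryGetValue(surfaceId, out SurfaceObject obj))
        {
            active.Remove(surfaceId);
            obj.Active = false;
            inactive.Push(obj);
        }
    }
}
```

If surfaces 1 and 2 arrive, surface 1 is removed and surface 3 arrives, the pool hands surface 3 the object that used to hold surface 1, so only two objects are ever allocated.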

To use spatial mapping, SpatialPerception is a required capability in a Universal Windows Platform (UWP) app. Because an end user should be aware that an application can scan the room, this needs to be noted in the capabilities either in the Unity player settings as shown in Figure 5, or added manually in your application’s package.appxmanifest.

Figure 5 Adding SpatialPerception in File-Build Settings

The spatial meshes are processed in surface volumes that are different from the bounding volume you defined for the SurfaceObserver to observe. The key is that once the SurfaceObserver_OnSurfaceChanged delegate is called to report surface volume changes, you request the changed meshes in a subsequent frame. The meshes are then prepared in a process called baking, and the SurfaceObserver_OnDataReady callback is invoked when a mesh is ready.

Baking is a standard term in the 3D universe that usually refers to calculating something ahead of time. It’s typically used to talk about calculating lighting information and storing it in a special image called a lightmap, which helps avoid lighting calculations at runtime. Baking a mesh can take several frames from the time you ask for it in your Update function (see Figure 6). For performance’s sake, call RequestMeshAsync only for meshes you’re actually going to use; otherwise you’re baking them for no reason.

Figure 6 The Update Function

private void Update()
{
  // Only do processing if you should be observing.
  // This is a flag that should be turned on or off.
  if (ObserverState == ObserverStates.Running)
  {
    // If you don't have a mesh creation pending but you could
    // schedule a mesh creation now, do it!
    if (surfaceWorkOutstanding == false && surfaceWorkQueue.Count > 0)
    {
      SurfaceData surfaceData = surfaceWorkQueue.Dequeue();
      // If RequestMeshAsync succeeds, then you've scheduled mesh creation.
      // OnDataReady is left out of this demo code, as it performs
      // some basic cleanup and sets some material/shadow settings.
      surfaceWorkOutstanding = observer.RequestMeshAsync(surfaceData,
        SurfaceObserver_OnDataReady);
    }
    // If you don't have any other work to do, and enough time has passed since
    // previous update request, request updates for the spatial mapping data.
    else if (surfaceWorkOutstanding == false &&
      (Time.time - updateTime) >= TimeBetweenUpdates)
    {
      // You could choose a new origin here if you need to scan
      // a new area extending out from the original or make Extents bigger.
      observer.SetVolumeAsAxisAlignedBox(observerOrigin, Extents);
      observer.Update(SurfaceObserver_OnSurfaceChanged);
      updateTime = Time.time;
    }
  }
}
private void SurfaceObserver_OnSurfaceChanged(
  SurfaceId id, SurfaceChange changeType, Bounds bounds, System.DateTime updateTime)
{
  GameObject surface;
  switch (changeType)
  {
    case SurfaceChange.Added:
    case SurfaceChange.Updated:
      // Create (or get existing if updating) object on a custom layer.
      // This creates the new game object to hold a piece
      // of the spatial mesh.
      surface = GetSurfaceObject(id.handle, transform);
 
      // Queue the request for mesh data to be handled later.
      QueueSurfaceDataRequest(id, surface);
      break;
 
    case SurfaceChange.Removed:
      // Remove surface from list.
      // ...
      break;
  }
}

Unity calls this Update method every frame on whichever game object you’ve made responsible for retrieving the spatial meshes.
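The scheduling logic in Figure 6 is easier to reason about with Unity stripped out of it. Here’s a hypothetical, self-contained simulation of the same rules: at most one bake is outstanding, queued surfaces are baked one at a time, and the observer is polled for changes only when nothing else is pending and enough time has passed. SpatialUpdateScheduler and its counters are illustrative stand-ins for the real RequestMeshAsync and observer.Update calls:

```csharp
using System.Collections.Generic;

// Hypothetical simulation of the Figure 6 scheduling rules, with no Unity types.
// Surface ids are plain ints; "now" plays the role of Time.time.
public sealed class SpatialUpdateScheduler
{
    private readonly Queue<int> workQueue = new Queue<int>();
    private readonly float timeBetweenUpdates;
    private float lastUpdateTime;   // starts at 0, like an unset float field

    public bool WorkOutstanding { get; private set; }
    public int UpdateRequests { get; private set; }            // stands in for observer.Update(...)
    public List<int> BakeRequests { get; } = new List<int>();  // stands in for RequestMeshAsync(...)

    public SpatialUpdateScheduler(float timeBetweenUpdates)
    {
        this.timeBetweenUpdates = timeBetweenUpdates;
    }

    // Called once per frame, like Update().
    public void Tick(float now)
    {
        if (!WorkOutstanding && workQueue.Count > 0)
        {
            // Bake the next queued surface; only one bake is in flight at a time.
            BakeRequests.Add(workQueue.Dequeue());
            WorkOutstanding = true;
        }
        else if (!WorkOutstanding && now - lastUpdateTime >= timeBetweenUpdates)
        {
            // Nothing pending and the interval has elapsed: poll for surface changes.
            UpdateRequests++;
            lastUpdateTime = now;
        }
    }

    // Like SurfaceObserver_OnSurfaceChanged for an added or updated surface.
    public void OnSurfaceChanged(int surfaceId) => workQueue.Enqueue(surfaceId);

    // Like SurfaceObserver_OnDataReady: the bake finished, so the next one may start.
    public void OnDataReady() => WorkOutstanding = false;
}
```

While a bake is outstanding, Tick starts neither a new bake nor a new observer update, which is the "don't overlap results" advice in executable form.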

When surface volume baking is requested via RequestMeshAsync, the request is passed a SurfaceData structure in which you can specify the scanning density (resolution) in triangles per cubic meter. When TrianglesPerCubicMeter is greater than 1,000, you get fairly smooth results that more closely match the surfaces you’re scanning. On the other hand, the lower the triangle count, the better the performance. A resolution below 100 is very fast, but you lose surface detail, so I recommend starting at 500 and adjusting from there. Figure 7 uses about 500 TrianglesPerCubicMeter. The HoloLens already does some optimizations on the mesh, so you’ll need to performance test your applications and decide whether to scan at a lower resolution and fix up the mesh yourself (using less memory) or simply scan at a higher resolution, which is easier but uses more memory.
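To get a feel for the memory side of that trade-off, you can estimate a mesh’s footprint from its vertex and triangle counts. This back-of-the-envelope sketch assumes each vertex stores a position and a normal (three 4-byte floats each) and each triangle stores three indices; Unity 5.x meshes use 16-bit (2-byte) indices. It ignores collider data and engine overhead, so treat the result as a lower bound. MeshMemory is a hypothetical helper:

```csharp
// Hypothetical estimator; the per-vertex layout is an assumption, not Unity's exact format.
public static class MeshMemory
{
    private const int BytesPerFloat = 4;
    private const int FloatsPerVertex = 6; // position (x,y,z) + normal (x,y,z)

    // Rough lower bound, in bytes, for a mesh's vertex and index data.
    public static long EstimateBytes(long vertexCount, long triangleCount, int bytesPerIndex = 2)
    {
        long vertexBytes = vertexCount * FloatsPerVertex * BytesPerFloat;
        long indexBytes = triangleCount * 3 * bytesPerIndex;
        return vertexBytes + indexBytes;
    }
}
```

For example, a surface with 600 vertices and 1,000 triangles comes to 14,400 bytes of vertex data plus 6,000 bytes of indices, or 20,400 bytes. Raising the density raises both counts for every surface in the observed volume.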

Figure 7 A Virtual Character Detecting and Sitting on a Real-World Item (from the Fragments Application)

Creating the spatial mesh isn’t a super high-resolution process by design, because higher resolution requires significantly more processing power and usually isn’t necessary to interact with the world around you. You won’t be using spatial mapping to capture a highly detailed small figurine on your countertop; that’s not what it’s designed for. For that, there are plenty of software solutions, such as Microsoft 3D Builder and many others listed at bit.ly/2fzcH1z and bit.ly/1UjAt1e, that use a technique called photogrammetry to create 3D models from images. The HoloLens doesn’t include anything for scanning and capturing a textured 3D model, but you can find applications to create 3D models on the HoloLens, such as HoloStudio, or you can create them in 3D Builder (or in any 3D modeling software for that matter) and bring them into Unity to use on the HoloLens. You can also now live stream models from Unity to the HoloLens during development with the new Holographic emulation in Unity 5.5.

Mesh colliders in Unity are the least-performant colliders, but they’re necessary for surfaces that don’t fit primitive shapes like boxes and spheres. The more triangles your surfaces have, the more mesh colliders on them can cost the physics system. The last parameter of the SurfaceData constructor specifies whether to bake a collider:

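SurfaceData surfaceData = new SurfaceData(id,
  surface.GetComponent<MeshFilter>(),   // receives the baked mesh
  surface.GetComponent<WorldAnchor>(),  // locks the surface in place in the real world
  surface.GetComponent<MeshCollider>(), // receives the baked collider, if requested
  TrianglesPerCubicMeter,               // mesh density
  bakeCollider);                        // true to also bake collision data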

You may never need a collider on your spatial mesh (and thus pass in bakeCollider=false) if you only want to detect features in the user’s space, but not integrate with the physics system. Choose wisely.

There are plenty of considerations for the scanning experience when using spatial mapping. Applications may opt not to scan, to scan only part of the environment or to ask users to scan their environment looking for certain-size surfaces like a couch. Design guidelines are listed on the “Spatial Mapping Design” page of the Windows Dev Center (bit.ly/2gDqQQi) and are important to consider, especially because scanning conditions can introduce various imperfections into your mesh, which fall into three general categories discussed on that page: bias, hallucinations and holes. One workflow is to ask the user to scan everything up front, as is done at the beginning of every “RoboRaid” session to find the appropriate surfaces for the game to work with. Once you’ve found applicable surfaces, the experience starts and uses the meshes that have been provided. Another workflow is to scan up front, then scan continually at a smaller interval to find real-world changes.

Working with the Spatial Mesh

Once the mesh has been created, you can interact with it in various ways. If you use the HoloToolkit, the spatial mesh is placed on a custom layer. In Unity, you can include or ignore layers in various operations. For example, in a common operation called a raycast, you shoot an invisible ray out into the scene, and it reports the first collider it hits, optionally restricted to the layers you specify.
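Unity represents a set of layers as a 32-bit integer mask in which bit N stands for layer N; that’s what the layerMask parameter of Physics.Raycast expects, and Unity’s LayerMask.GetMask and LayerMask.NameToLayer helpers produce or look up these values for you. The sketch below reproduces the bit arithmetic in plain C# so you can see what including or excluding a layer means. The LayerMasks class is hypothetical, and layer 31 is just an assumed spatial mapping layer for the example:

```csharp
// Hypothetical helper mirroring Unity's layer-mask convention: bit N = layer N.
// Layers must be in the range 0..31.
public static class LayerMasks
{
    // Mask containing only the given layer.
    public static int FromLayer(int layer) => 1 << layer;

    // Mask containing every listed layer.
    public static int Combine(params int[] layers)
    {
        int mask = 0;
        foreach (int layer in layers)
            mask |= 1 << layer;
        return mask;
    }

    public static bool Includes(int mask, int layer) => (mask & (1 << layer)) != 0;

    // Same mask with one layer removed, for example to make a raycast ignore it.
    public static int Exclude(int mask, int layer) => mask & ~(1 << layer);

    // Every layer except the given one.
    public static int AllExcept(int layer) => ~(1 << layer);
}
```

A raycast given FromLayer(31) can only hit colliders on layer 31, so holograms on any other layer are invisible to it; AllExcept(31) does the opposite and ignores the spatial mesh.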

Often I’ll want to place holograms in my environment, such as on a table, or, as in “Young Conker” (bit.ly/2f4Ci4F), let the user select an area in the real world (via the spatial mesh) for the character to move to. To do that, you need to know where you can intersect with the physical world. The code in Figure 8 performs a raycast out to 10 meters, but reports back only hits on the spatial mapping mesh. Other holograms are ignored if they aren’t on this layer.

Figure 8 Performing a Raycast

// Do a raycast into the world that will only hit the Spatial Mapping mesh.
var headPosition = Camera.main.transform.position;
var gazeDirection = Camera.main.transform.forward;

RaycastHit hitInfo;
// Ensure you specify a length as a best practice. Shorter is better as
// performance hit goes up roughly linearly with length.
if (Physics.Raycast(headPosition, gazeDirection, out hitInfo,
    10.0f, SpatialMappingManager.Instance.LayerMask))
{
  // Move this object to where the raycast hit the Spatial Mapping mesh.
  this.transform.position = hitInfo.point;

  // Rotate this object to face the user, using only the head's yaw
  // (rotation around the vertical axis) so the object stays upright.
  float yaw = Camera.main.transform.eulerAngles.y;
  transform.rotation = Quaternion.Euler(0f, yaw, 0f);
}
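The rotation step in Figure 8 keeps only the head’s yaw so the hologram stays upright. To see what that yaw is, here’s a plain C# sketch that extracts it from a forward vector. It assumes Unity’s conventions (Y is up, a yaw of 0 faces +Z and positive yaw turns toward +X); the Yaw class is hypothetical:

```csharp
using System;
using System.Numerics;

// Hypothetical helper using Unity's axis conventions (Y up, Z forward).
public static class Yaw
{
    // Yaw in degrees [0, 360) of a direction, measured around the +Y axis.
    // Pitch (looking up or down) is ignored because only X and Z are used.
    public static float Degrees(Vector3 forward)
    {
        float degrees = MathF.Atan2(forward.X, forward.Z) * (180f / MathF.PI);
        return degrees < 0f ? degrees + 360f : degrees;
    }

    // The forward direction flattened onto the horizontal plane.
    // Looking straight up or down has no meaningful yaw, so fall back to +Z.
    public static Vector3 FlatForward(Vector3 forward)
    {
        var flat = new Vector3(forward.X, 0f, forward.Z);
        return flat.LengthSquared() < 1e-8f ? Vector3.UnitZ : Vector3.Normalize(flat);
    }
}
```

As long as the head isn’t pointing straight up or down, this angle matches Unity’s Transform.eulerAngles.y, which is why Quaternion.Euler(0, yaw, 0) is a convenient way to build an upright rotation from the camera.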

I don’t have to use the spatial mesh, of course. If I want a hologram to show up and the user to be able to place it wherever he wants (maybe it always follows him) and it will never integrate with the physical environment, I surely don’t need a raycast or even the mesh collider.

Now let’s do something fun with the mesh. I want to try to determine where in my living room an area exists that a character could sit down, much like the scene in Figure 7, which is from “Fragments,” an amazing nearly five-hour mystery-solving experience for the HoloLens that has virtual characters sitting in your room at times. Some of the code I’ll walk through is from the HoloToolkit. It came from Asobo Studio, which worked on “Fragments.” Because this is mixed reality, it’s just plain awesome to develop experiences that mix the real world with the virtual world. Figure 9 is the end result from a HoloToolkit-Examples—SpatialUnderstandingExample scene that I’ve run in my living room. Note that it indicates several locations that were identified as sittable areas.

Figure 9 The HoloToolkit SpatialUnderstanding Functionality

The entire code example for this is in the HoloToolkit, but let’s walk through the process. I’ve trimmed the code down to the applicable pieces. (I’ve already covered SurfaceObserver, so it’s excluded from this section.) SpatialUnderstandingSourceMesh wraps the SurfaceObserver through a SpatialMappingObserver class to process meshes, and creates the appropriate MeshData objects to pass to the SpatialUnderstanding DLL. Most of the heavy lifting in this API happens inside that DLL in the HoloToolkit.

In order to look for shapes in my spatial mesh using the DLL, I must define the custom shape I’m looking for. If I want a sittable surface that’s between 0.2 and 0.6 meters off the floor, made of at least one discrete flat surface, with a total surface area of at least 0.2 square meters, I can create a shape definition that gets passed to the DLL through AddShape (see Figure 10).

Figure 10 Creating a Shape Definition

ShapeDefinitions.cs
// A "Sittable" space definition.
shapeComponents = new List<SpatialUnderstandingDllShapes.ShapeComponent>()
{
  new SpatialUnderstandingDllShapes.ShapeComponent(
    new List<SpatialUnderstandingDllShapes.ShapeComponentConstraint>()
    {
      // A surface between 0.2 and 0.6 meters above the floor...
      SpatialUnderstandingDllShapes.ShapeComponentConstraint.Create_SurfaceHeight_Between(0.2f, 0.6f),
      // ...made of at least one discrete flat surface...
      SpatialUnderstandingDllShapes.ShapeComponentConstraint.Create_SurfaceCount_Min(1),
      // ...with a total area of at least 0.2 square meters.
      SpatialUnderstandingDllShapes.ShapeComponentConstraint.Create_SurfaceArea_Min(0.20f),
    }),
};
// Register this shape with the DLL under the name "Sittable".
AddShape("Sittable", shapeComponents);
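The real matching happens in native code inside the SpatialUnderstanding DLL, and I don’t know its exact algorithm. Purely to illustrate what the three constraints in Figure 10 express, here’s a simplified, hypothetical C# filter over candidate regions, where each region is a list of flat surfaces with a height above the floor and an area. FlatSurface and ShapeDefinition are illustrative types, not toolkit APIs:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical flat surface found in the spatial mesh.
public readonly struct FlatSurface
{
    public float HeightAboveFloor { get; }  // meters
    public float Area { get; }              // square meters

    public FlatSurface(float heightAboveFloor, float area)
    {
        HeightAboveFloor = heightAboveFloor;
        Area = area;
    }
}

// Hypothetical shape definition mirroring the "Sittable" constraints.
public sealed class ShapeDefinition
{
    public float MinHeight { get; set; }
    public float MaxHeight { get; set; }
    public int MinSurfaceCount { get; set; }
    public float MinTotalArea { get; set; }

    // A region matches if enough of its surfaces lie in the height band
    // and those surfaces together cover at least the minimum area.
    public bool Matches(IReadOnlyList<FlatSurface> region)
    {
        var inBand = region
            .Where(s => s.HeightAboveFloor >= MinHeight && s.HeightAboveFloor <= MaxHeight)
            .ToList();
        return inBand.Count >= MinSurfaceCount && inBand.Sum(s => s.Area) >= MinTotalArea;
    }
}
```

With the Sittable numbers, a 0.45-meter-high chair seat of 0.25 square meters matches, while a 0.75-meter-high tabletop (too high) and a 0.1-square-meter stool top (too small) don’t.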

Next, I can detect the regions and then visualize or place game objects there. I’m not limited to asking for a type of shape and getting all of them; I can also use topology queries such as QueryTopology_FindLargePositionsOnWalls or QueryTopology_FindLargestWall. Figure 11 shows the shape query that finds the sittable locations.

Figure 11 Querying for a Shape

SpaceVisualizer.cs (abbreviated)

const int QueryResultMaxCount = 512;
private ShapeResult[] resultsShape = new ShapeResult[QueryResultMaxCount];
public GameObject Beacon;

public void FindSittableLocations()
{
  // Pin managed object memory going to native code, so the DLL can
  // write its results directly into the resultsShape array.
  IntPtr resultsShapePtr =
    SpatialUnderstanding.Instance.UnderstandingDLL.
    PinObject(resultsShape);

  // Find the half dimensions of "Sittable" objects via the DLL.
  // The return value is the number of results written to the array.
  int shapeCount = SpatialUnderstandingDllShapes.QueryShape_FindShapeHalfDims(
    "Sittable",
    resultsShape.Length, resultsShapePtr);

  // Process found results.
  for (int i = 0; i < shapeCount; i++)
  {
    // Create a beacon at each "sittable" location.
    Instantiate(Beacon, resultsShape[i].position, Quaternion.identity);

    // Log the half bounds of our sittable area, substituting a default
    // size when the reported half dimensions are close to zero.
    Vector3 halfDims = (resultsShape[i].halfDims.sqrMagnitude < 0.01f) ?
      new Vector3(0.25f, 0.025f, 0.25f) : resultsShape[i].halfDims;
    Debug.Log("Sittable area half dimensions: " + halfDims);
  }
}
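The PinObject call in Figure 11 is needed because the DLL writes its results straight into the managed resultsShape array, so the garbage collector must not move that array while native code holds its address. I haven’t verified how the toolkit implements PinObject, but in plain .NET the standard tool for this is a pinned GCHandle. Here’s a self-contained sketch of the pattern, with a hypothetical FakeNativeQuery standing in for the DLL; it writes through the raw pointer with Marshal.WriteInt32 and returns how many results it wrote, never exceeding the buffer’s capacity:

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinnedQuery
{
    // Hypothetical stand-in for a native query: writes up to 'capacity'
    // results into the caller's buffer and returns how many it wrote.
    public static int FakeNativeQuery(IntPtr buffer, int capacity, int available)
    {
        int count = Math.Min(capacity, available);
        for (int i = 0; i < count; i++)
            Marshal.WriteInt32(buffer, i * sizeof(int), i * 10);
        return count;
    }

    public static int[] Run(int capacity, int available)
    {
        var results = new int[capacity];
        // Pin the array so the GC can't move it while "native" code writes to it.
        GCHandle handle = GCHandle.Alloc(results, GCHandleType.Pinned);
        try
        {
            IntPtr ptr = handle.AddrOfPinnedObject();
            int count = FakeNativeQuery(ptr, results.Length, available);

            // Only the first 'count' entries are valid results.
            var found = new int[count];
            Array.Copy(results, found, count);
            return found;
        }
        finally
        {
            // Always unpin, or the array stays fixed in memory.
            handle.Free();
        }
    }
}
```

The fixed-size buffer is presumably why Figure 11 declares QueryResultMaxCount: the native side can only report as many shapes as the pinned array has room for.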

There’s also a solver in the HoloToolkit that places objects according to rules you provide, such as “at least 1.5 meters away from other objects”:

List<ObjectPlacementRule> rules = new List<ObjectPlacementRule>()
{
  ObjectPlacementRule.Create_AwayFromOtherObjects(1.5f),
};
// Simplified API for demo purposes; see LevelSolver.cs in the HoloToolkit.
var queryResults = Solver_PlaceObject(....);
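The solver’s rules run inside the native DLL. As a hypothetical illustration of what a rule like AwayFromOtherObjects(1.5f) checks, here’s a plain C# sketch that rejects candidate positions closer than a minimum distance to anything already placed and returns the first candidate that passes. PlacementRules is not a toolkit class:

```csharp
using System.Collections.Generic;
using System.Numerics;

// Hypothetical re-creation of an "away from other objects" placement rule.
public static class PlacementRules
{
    // True if the candidate is at least minDistance from every placed object.
    public static bool AwayFromOtherObjects(Vector3 candidate, IEnumerable<Vector3> placed, float minDistance)
    {
        foreach (Vector3 p in placed)
            if (Vector3.DistanceSquared(candidate, p) < minDistance * minDistance)
                return false;
        return true;
    }

    // First candidate that satisfies the rule, or null if none does.
    public static Vector3? FirstValid(IEnumerable<Vector3> candidates, IList<Vector3> placed, float minDistance)
    {
        foreach (Vector3 c in candidates)
            if (AwayFromOtherObjects(c, placed, minDistance))
                return c;
        return null;
    }
}
```

Comparing squared distances avoids a square root per check; a candidate exactly 1.5 meters away still passes, because the rule only rejects positions that are strictly closer.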

After executing the preceding query to place an object, you get back results you can use to determine the location, the bounds (as half dimensions) and the direction vectors that give the orientation of the surface:

public class ObjectPlacementResult
{
  public Vector3 Position;
  public Vector3 HalfDims;
  public Vector3 Forward;
  public Vector3 Right;
  public Vector3 Up;
};
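Position, HalfDims and the three direction vectors together describe an oriented box. As a sketch, assuming HalfDims.X runs along Right, HalfDims.Y along Up and HalfDims.Z along Forward (my assumption; check the toolkit source for the exact convention), here’s how you could compute the box’s corners, or the center of its bottom face where you’d rest an object’s base. PlacementBox is a hypothetical helper:

```csharp
using System.Numerics;

// Hypothetical helper for an oriented box described like ObjectPlacementResult.
public static class PlacementBox
{
    // The eight corners: center +/- half dimensions along each axis.
    public static Vector3[] Corners(Vector3 position, Vector3 halfDims,
                                    Vector3 right, Vector3 up, Vector3 forward)
    {
        var corners = new Vector3[8];
        int i = 0;
        for (int sx = -1; sx <= 1; sx += 2)
            for (int sy = -1; sy <= 1; sy += 2)
                for (int sz = -1; sz <= 1; sz += 2)
                    corners[i++] = position
                        + right * (sx * halfDims.X)
                        + up * (sy * halfDims.Y)
                        + forward * (sz * halfDims.Z);
        return corners;
    }

    // Center of the box's bottom face.
    public static Vector3 BottomCenter(Vector3 position, Vector3 halfDims, Vector3 up)
        => position - up * halfDims.Y;
}
```

Because the corners are built from Right, Up and Forward rather than the world axes, the same code works for a placement on a tilted or wall-mounted surface.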

Wrapping Up

Spatial mapping lets you truly integrate with the world around you and engage in mixed-reality experiences. You can guide a user to scan her environment, give her feedback about what you’ve found, and intelligently use her environment so your holograms can interact with it. There’s no other device like the HoloLens for mixing worlds. Check out HoloLens.com and start developing mind-blowing experiences today. Next time around, I’ll talk about shared experiences on the HoloLens. Until then, keep developing!

Adam Tuliper is a senior technical evangelist with Microsoft living in sunny SoCal. He’s a Web dev/game dev Pluralsight.com author and all-around tech lover. Find him on Twitter: @AdamTuliper or at adamtuliper.com.

Thanks to the following Microsoft technical expert for reviewing this article: Jackson Fields