September 2014

Volume 29 Number 9

DirectX Factor : Vertex Shaders and Transforms

Charles Petzold

In 1975, the artist Richard Haas painted the flat side of a building in the SoHo district of Manhattan to resemble the façade of a classic cast-iron building, including such features as a cat in a window. Time hasn’t been kind to this painting, but in its early days, it looked very real and fooled many people.

Such paintings—called trompe-l’œil, meaning “deceive the eye”—use shadows and shading to bring out a third dimension on a flat two-dimensional surface. We tend to be tickled by such illusions, but at the same time eager to satisfy ourselves that we can discern the trick. One simple way is to try observing the painting from different perspectives to see if it looks the same.

A similar mental process occurs when a computer program displays some graphics that seem to straddle the line between 2D and 3D. Is it really 3D? Or just 2D with some clever overlapping and shading? We can’t move our heads from side to side to establish the facts, but we might be able to persuade the program to rotate its graphics to see what happens.

The image in Figure 1 looks a lot like the screen displayed by the ThreeTriangles program in the previous installment of this column (msdn.microsoft.com/magazine/dn768854). But the downloadable program for this column, called ThreeRotatingTriangles, does indeed rotate that assemblage of three triangles. The effect is visually quite interesting, and as the three triangles move in relation to each other, the program does establish that there is indeed some 3D graphics processing going on. However, if you look at the code, you’ll discover that the program is still written entirely with Direct2D rather than Direct3D, using the powerful feature of Direct2D effects.

Figure 1 The ThreeRotatingTriangles Program Display

Getting Data into an Effect

The overall organization of the earlier ThreeTriangles program has been preserved in ThreeRotatingTriangles. The class that provides the Direct2D effect implementation is now named RotatingTriangleEffect rather than SimpleTriangleEffect, but it continues to implement the ID2D1EffectImpl (“effect implementation”) and ID2D1DrawTransform interfaces.

The earlier SimpleTriangleEffect wasn’t versatile at all. It contained hardcoded vertices to display three overlapping triangles. RotatingTriangleEffect allows the vertices to be defined from outside the class, and both the effect implementation and the vertex shader have been enhanced to accommodate matrix transforms.

Generally, an effect implementation such as RotatingTriangleEffect contains a static method that registers itself by calling RegisterEffectFromString and associating itself with a class ID. In the ThreeRotatingTriangles program, the ThreeRotatingTrianglesRenderer class calls this static method in its constructor to register the effect.

ThreeRotatingTrianglesRenderer also defines an object of type ID2D1Effect as a private field:

Microsoft::WRL::ComPtr<ID2D1Effect>
            m_rotatingTriangleEffect;

To use the effect, the program must create this object by referencing the effect’s class ID in a call to CreateEffect. In the ThreeRotatingTrianglesRenderer class, this occurs in the CreateDeviceDependentResources method:

d2dContext->CreateEffect(
  CLSID_RotatingTriangleEffect, &m_rotatingTriangleEffect);

The effect can then be rendered with a call to the DrawImage method. Here’s how ThreeRotatingTrianglesRenderer makes the call in its Render method:

d2dContext->DrawImage(m_rotatingTriangleEffect.Get());

But there’s more that can be done between those two calls. ID2D1Effect derives from ID2D1Properties, which has methods named SetValue and GetValue that allow a program to set properties on the effect. These properties can range from simple effect options expressed as Boolean flags to large buffers of data. However, SetValue and GetValue aren’t often used. They require identifying the particular property by index, and greater program clarity is obtained by instead using the methods SetValueByName and GetValueByName.

Keep in mind that these SetValueByName and GetValueByName methods are part of the ID2D1Effect object, which is the object returned from the CreateEffect call. The ID2D1Effect object passes the values of these properties to the effect implementation class you’ve written—the class that implements the ID2D1EffectImpl and ID2D1DrawTransform interfaces. This seems a bit roundabout, but it’s done this way so registered effects can be used without access to the classes that implement the effect.

But this means that an effect implementation such as the RotatingTriangleEffect class must itself indicate that it can accept properties of various types, and it must provide methods for setting and getting those properties.

This information is provided by the effect implementation when it registers itself using RegisterEffectFromString. Required in this call is some XML that includes the names and types of the various properties the effect implementation supports. RotatingTriangleEffect supports four properties, with the following names and data types:

  • VertexData of type blob, that is, a memory buffer referenced by a byte pointer.
  • ModelMatrix of type matrix4x4.
  • ViewMatrix of type matrix4x4.
  • ProjectionMatrix of type matrix4x4.   

The names of these data types are specific to Direct2D effects. When an effect that supports properties is registered, the effect implementation must also supply an array of D2D1_PROPERTY_BINDING structures, most conveniently created with the D2D1_VALUE_TYPE_BINDING macro. Each of the entries in this array associates a named property, for example VertexData, with two methods in the effect implementation that set and get the data. For VertexData, the two methods are named SetVertexData and GetVertexData. (When you define these Get methods, make sure to include the const keyword, or you’ll get one of those weird template errors that are completely baffling.)

Similarly, the RotatingTriangleEffect class defines methods named SetModelMatrix and GetModelMatrix, and so forth. These methods aren’t called by any application program—indeed, they’re private to RotatingTriangleEffect. Instead, the program calls the SetValueByName and GetValueByName methods on the ID2D1Effect object, which then calls the Set and Get methods in the effect implementation.
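To make the shape of that contract concrete, here is a minimal, self-contained sketch of such a Set/Get pair. Matrix4x4 and the class name are stand-ins invented for this sketch (the real code works with D2D1_MATRIX_4X4_F and returns HRESULT); only the pattern, including the const on the Get method, is the point.

```cpp
#include <cassert>

// Stand-in for D2D1_MATRIX_4X4_F: sixteen floats.
struct Matrix4x4
{
    float m[4][4];
};

class RotatingTriangleEffectSketch
{
public:
    // Invoked (indirectly) when the app calls SetValueByName(L"ModelMatrix", ...)
    long SetModelMatrix(const Matrix4x4& matrix)
    {
        m_modelMatrix = matrix;
        return 0;  // S_OK
    }

    // Note the const: without it, the property-binding templates fail to match.
    Matrix4x4 GetModelMatrix() const
    {
        return m_modelMatrix;
    }

private:
    Matrix4x4 m_modelMatrix = {};
};
```

The application never calls these methods directly; the ID2D1Effect object does, in response to SetValueByName and GetValueByName.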

The Vertex Buffer

The ThreeRotatingTrianglesRenderer class registers the RotatingTriangleEffect in its constructor, and renders the effect in its Render method. But between those two calls, the renderer class calls SetValueByName on the ID2D1Effect object to pass data into the effect implementation.

The first of the four effect properties listed earlier is VertexData, which is a collection of vertices used to define a vertex buffer. For Direct2D effects, the number of items in a vertex buffer must be a multiple of three, grouped into triangles. The ThreeRotatingTriangles program displays only three triangles with three vertices each, but the effect can handle a larger buffer.
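As a side note, because the buffer is consumed as a list of triangles, a guard such as the following hypothetical helper (not part of the program) can reject a malformed vertex count up front:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: a Direct2D effect vertex buffer is consumed as a
// triangle list, so the vertex count must be a nonzero multiple of three.
bool IsValidTriangleListSize(std::size_t vertexCount)
{
    return vertexCount > 0 && vertexCount % 3 == 0;
}
```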

The format of the vertex buffer expected by RotatingTriangleEffect is defined in a structure in RotatingTriangleEffect.h:

struct PositionColorVertex
{
  DirectX::XMFLOAT3 position;
  DirectX::XMFLOAT3 color;
};

This is the same format used by SimpleTriangleEffect, but defined somewhat differently. Figure 2 shows how the CreateDeviceDependentResources method in ThreeRotatingTrianglesRenderer transfers the vertex array to the effect after the effect has been created.

Figure 2 Creating the Effect and Setting the Vertex Buffer

void ThreeRotatingTrianglesRenderer::CreateDeviceDependentResources()
{
  ID2D1DeviceContext1* d2dContext =
    m_deviceResources->GetD2DDeviceContext();
  // Create the effect
  DX::ThrowIfFailed(d2dContext->CreateEffect(
                  CLSID_RotatingTriangleEffect,
                  &m_rotatingTriangleEffect)
    );
  // Set the vertices
  std::vector<PositionColorVertex> vertices =
  {
    // Triangle 1
    { XMFLOAT3(0, -1000, -1000), XMFLOAT3(1, 0, 0) },
    { XMFLOAT3(985, -174, 0), XMFLOAT3(0, 1, 0) },
    { XMFLOAT3(342, 940, 1000), XMFLOAT3(0, 0, 1) },
    // Triangle 2
    { XMFLOAT3(866, 500, -1000), XMFLOAT3(1, 0, 0) },
    { XMFLOAT3(-342, 940, 0), XMFLOAT3(0, 1, 0) },
    { XMFLOAT3(-985, -174, 1000), XMFLOAT3(0, 0, 1) },
    // Triangle 3
    { XMFLOAT3(-866, 500, -1000), XMFLOAT3(1, 0, 0) },
    { XMFLOAT3(-643, -766, 0), XMFLOAT3(0, 1, 0) },
    { XMFLOAT3(643, -766, 1000), XMFLOAT3(0, 0, 1) }
  };
  DX::ThrowIfFailed(
    m_rotatingTriangleEffect->SetValueByName(L"VertexData",
      (byte *) vertices.data(),
      (UINT32) (vertices.size() * sizeof(PositionColorVertex)))
    );
  // Ready to render!
  m_readyToRender = true;
}

The X and Y coordinates are based on sines and cosines of angles in 40-degree increments with a radius of 1,000. The Z coordinates range from –1,000 for the foreground to 1,000 for the background. In the earlier SimpleTriangleEffect I was very careful to set Z between 0 and 1 because of the conventions used to clip 3D output. As you’ll see, that’s not necessary here because actual camera transforms will be applied to the vertices.
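To see where those numbers come from, the listed coordinates are consistent with points on a circle of radius 1,000 at angles of 30, 70, 110 and so on up to 350 degrees. Here is a small reconstruction (my own arithmetic, not code from the program) that regenerates the X and Y values:

```cpp
#include <cassert>
#include <cmath>

// One coordinate of a point on a circle of radius 1,000 at the given angle,
// rounded to the nearest integer: cosine for X, sine for Y.
long CircleCoord(double degrees, bool wantY)
{
    const double radians = degrees * 3.14159265358979323846 / 180.0;
    const double value = 1000.0 * (wantY ? std::sin(radians) : std::cos(radians));
    return std::lround(value);
}
```

For example, 70 degrees yields (342, 940) and 350 degrees yields (985, –174), matching the vertex table in Figure 2.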

The SetValueByName call with a name of VertexData causes the ID2D1Effect object to call the SetVertexData method in RotatingTriangleEffect to pass along the data. This method casts the byte pointer back to its original type and calls CreateVertexBuffer to store the information in a way that allows it to be passed to the vertex shader.

Applying the Transforms

Figure 3 shows the Update method in ThreeRotatingTrianglesRenderer calculating three matrix transforms and making three calls to SetValueByName. The code has been slightly simplified to remove checks for errant HRESULT returns.

Figure 3 Setting the Three Transform Matrices

void ThreeRotatingTrianglesRenderer::Update(DX::StepTimer const& timer)
{
  if (!m_readyToRender)
      return;
  // Apply model matrix to rotate vertices
  float angle = float(XM_PIDIV4 * timer.GetTotalSeconds());
  XMMATRIX matrix = XMMatrixRotationY(angle);
  XMFLOAT4X4 float4x4;
  XMStoreFloat4x4(&float4x4, XMMatrixTranspose(matrix));
  m_rotatingTriangleEffect->SetValueByName(L"ModelMatrix", float4x4);
  // Apply view matrix
  matrix = XMMatrixLookAtRH(XMVectorSet(0, 0, -2000, 0),
                            XMVectorSet(0, 0, 0, 0),
                            XMVectorSet(0, 1, 0, 0));
  XMStoreFloat4x4(&float4x4, XMMatrixTranspose(matrix));
  m_rotatingTriangleEffect->SetValueByName(L"ViewMatrix", float4x4);
  // Base view width and height on coordinates of model
  float width = 2000;
  float height = 2000;
  // Adjust width and height for landscape and portrait modes
  Windows::Foundation::Size logicalSize = 
    m_deviceResources->GetLogicalSize();
  if (logicalSize.Width > logicalSize.Height)
      width *= logicalSize.Width / logicalSize.Height;
  else
      height *= logicalSize.Height / logicalSize.Width;
  // Apply projection matrix   
  matrix = XMMatrixOrthographicRH(width, height, 500, 4000);
  XMStoreFloat4x4(&float4x4, XMMatrixTranspose(matrix));
  m_rotatingTriangleEffect->SetValueByName(L"ProjectionMatrix", float4x4);
}

As usual, Update is called at the frame rate of the video display. The first matrix it calculates is applied to the vertices to rotate them around the Y axis. The second matrix is a standard camera view transform that results in shifting the scene so the viewer is on the origin of the three-dimensional coordinate system and looking straight along the Z axis. The third is a standard projection matrix, which results in X and Y coordinates being normalized to values between –1 and 1, and Z coordinates between 0 and 1.

These matrices must be applied to the vertex coordinates. So why doesn’t the program just multiply the array of vertices defined in the CreateDeviceDependentResources method by these matrices and then set a new vertex buffer in the RotatingTriangleEffect?

It’s certainly possible to define a dynamic vertex buffer that changes at the frame rate of the video display, and in some cases it’s necessary. But if the vertices need only be modified by matrix transforms, a dynamic vertex buffer isn’t as efficient as maintaining the same buffer throughout the program and applying the transforms later on in the pipeline—specifically, in the vertex shader that’s running on the video GPU.

This means the vertex shader needs new matrix transforms for every frame of the video display, and that raises another issue: How does an effect implementation get data into the vertex shader?

The Shader Constant Buffer

Data is transferred from application code into a shader through a mechanism called a constant buffer. Don’t let the name deceive you into thinking its contents remain constant throughout the course of the program. That’s definitely not the case. Very often the constant buffer changes with every frame of the video display. However, the contents of the constant buffer are constant for all the vertices in each frame, and the format of the constant buffer is fixed at compile time by the program.

The format of the vertex shader constant buffer is defined in two places: in C++ code and in the vertex shader itself. In the effect implementation, it looks like this:

struct VertexShaderConstantBuffer
{
  DirectX::XMFLOAT4X4 modelMatrix;
  DirectX::XMFLOAT4X4 viewMatrix;
  DirectX::XMFLOAT4X4 projectionMatrix;
} m_vertexShaderConstantBuffer;
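Incidentally, the sizes work out conveniently here: Direct3D requires a constant buffer’s size to be a multiple of 16 bytes, and three tightly packed 4 × 4 float matrices come to 192 bytes. A compile-time check, using a local stand-in for XMFLOAT4X4 and assuming the usual tight packing, confirms the arithmetic:

```cpp
#include <cassert>

// Stand-in for DirectX::XMFLOAT4X4: sixteen 32-bit floats, tightly packed.
struct Float4x4 { float m[4][4]; };

struct VertexShaderConstantBuffer
{
    Float4x4 modelMatrix;
    Float4x4 viewMatrix;
    Float4x4 projectionMatrix;
};

// Each matrix is 64 bytes; three of them total 192 bytes, a multiple of 16.
static_assert(sizeof(Float4x4) == 64, "matrix should be 64 bytes");
static_assert(sizeof(VertexShaderConstantBuffer) == 192,
              "constant buffer should be 192 bytes");
```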

When the Update method in the renderer calls SetValueByName to set one of the matrices, the ID2D1Effect object calls the appropriate Set method in the RotatingTriangleEffect class. These methods are named SetModelMatrix, SetViewMatrix, and SetProjectionMatrix, and they simply transfer the matrix to the appropriate field in the m_vertexShaderConstantBuffer object.

The ID2D1Effect object assumes that any call to SetValueByName changes the effect in a way that probably affects the graphical output, so it calls the PrepareForRender method in the effect implementation. That’s where the effect implementation can call SetVertexShaderConstantBuffer to transfer the contents of m_vertexShaderConstantBuffer to the vertex shader.

The New Vertex Shader

Now, finally, you can look at the High Level Shader Language (HLSL) code that’s doing much of the work in rotating the vertices of those three triangles and orienting them in 3D space. This is the new and improved vertex shader, shown in its entirety in Figure 4.

Figure 4 The Vertex Shader for the Rotating Triangle Effect

// Per-vertex data input to the vertex shader
struct VertexShaderInput
{
  float3 position : MESH_POSITION;
  float3 color : COLOR0;
};
// Per-vertex data output from the vertex shader
struct VertexShaderOutput
{
  float4 clipSpaceOutput : SV_POSITION;
  float4 sceneSpaceOutput : SCENE_POSITION;
  float3 color : COLOR0;
};
// Constant buffer provided by effect.
cbuffer VertexShaderConstantBuffer : register(b1)
{
  float4x4 modelMatrix;
  float4x4 viewMatrix;
  float4x4 projectionMatrix;
};
// Called for each vertex.
VertexShaderOutput main(VertexShaderInput input)
{
  // Output structure
  VertexShaderOutput output;
  // Get the input vertex, and include a W coordinate
  float4 pos = float4(input.position.xyz, 1.0f);
  // Pass through the resultant scene space output value
  output.sceneSpaceOutput = pos;
  // Apply transforms to that vertex
  pos = mul(pos, modelMatrix);
  pos = mul(pos, viewMatrix);
  pos = mul(pos, projectionMatrix);
  // The result is clip space output
  output.clipSpaceOutput = pos;
  // Transfer the color
  output.color = input.color;
  return output;
}

Notice how the structures are defined: The VertexShaderInput in the shader is the same format as the PositionColorVertex structure defined in the C++ header file. The VertexShaderConstantBuffer is the same format as the same-named structure in C++ code. The VertexShaderOutput structure matches the PixelShaderInput structure in the pixel shader.
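That correspondence is byte-for-byte: an HLSL float3 in a vertex buffer occupies 12 bytes, so each vertex is 24 bytes. A compile-time check with local stand-ins for the DirectX types (assuming the usual tight packing) makes this explicit:

```cpp
#include <cassert>

// Stand-in for DirectX::XMFLOAT3: three tightly packed 32-bit floats,
// matching an HLSL float3 in a vertex buffer.
struct Float3 { float x, y, z; };

struct PositionColorVertex
{
    Float3 position;  // feeds float3 position : MESH_POSITION
    Float3 color;     // feeds float3 color    : COLOR0
};

static_assert(sizeof(Float3) == 12, "a float3 occupies 12 bytes");
static_assert(sizeof(PositionColorVertex) == 24, "one vertex is 24 bytes");
```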

In the vertex shader associated with the SimpleTriangleEffect in last month’s column, a ClipSpaceTransforms buffer was provided automatically to convert from scene space (the pixel coordinates used for the vertices of the triangles) to clip space, which involves normalized X and Y coordinates ranging from –1 to 1, and Z coordinates that range from 0 to 1.

That’s no longer necessary, so I removed it without any unfortunate consequences. Instead, the projection matrix does the equivalent job. As you can see, the main function applies the three matrices to the input vertex position, and sets the result to the clipSpaceOutput field of the output structure.

That clipSpaceOutput field is required. This is how the depth buffer is managed, and how the results are mapped to the display surface. However, the sceneSpaceOutput field of the VertexShaderOutput structure isn’t required. If you remove that field—and also remove the field from the PixelShaderInput structure of the pixel shader—the program will run the same.

Row Major and Column Major

The shader performs three multiplications of positions by matrices:

pos = mul(pos, modelMatrix);
pos = mul(pos, viewMatrix);
pos = mul(pos, projectionMatrix);

In mathematical notation, these multiplications look like this:

                   | m11  m12  m13  m14 |
 | x  y  z  w |  × | m21  m22  m23  m24 |
                   | m31  m32  m33  m34 |
                   | m41  m42  m43  m44 |

When this multiplication is performed, the four numbers that describe the point (x, y, z, w) are multiplied by the four numbers in the first column of the matrix (m11, m21, m31, m41) and the four products are summed, and then the process continues with the second, third and fourth columns.
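That column-by-column dot product can be written out in ordinary C++ (a plain rendering of the arithmetic, not shader code):

```cpp
#include <cassert>

struct Vec4 { float v[4]; };
struct Mat4 { float m[4][4]; };  // m[row][column]

// Row vector (x, y, z, w) times a 4 x 4 matrix: the j-th output component
// is the dot product of the vector with the matrix's j-th column.
Vec4 Multiply(const Vec4& p, const Mat4& a)
{
    Vec4 result = {};
    for (int j = 0; j < 4; j++)        // for each column of the matrix...
        for (int i = 0; i < 4; i++)    // ...dot the vector with that column
            result.v[j] += p.v[i] * a.m[i][j];
    return result;
}
```

With a row-major translation matrix (offsets in the fourth row), multiplying the point (1, 2, 3, 1) by a translation of (10, 20, 30) yields (11, 22, 33, 1), just as expected.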

The vector (x, y, z, w) consists of four numbers stored in adjacent memory. Consider hardware that implements parallel processing of matrix multiplication. Do you think it might be faster to perform these four multiplications in parallel if the numbers in each column were also stored in adjacent memory? This seems very likely, which implies the optimum way to store the matrix values in memory is the order m11, m21, m31, m41, m12, m22 and so forth.

That’s known as column-major order. The memory block begins with the first column of the matrix, then the second, third and fourth. And this is what vertex shaders assume to be the organization of matrices in memory when performing these multiplications.

However, this isn’t the way DirectX normally stores matrices in memory. The XMMATRIX and XMFLOAT4X4 structures in the DirectX Math library store matrices in row-major order: m11, m12, m13, m14, m21, m22 and so forth. This seems like a more natural order to many of us because it’s similar to the order that we read lines of text—across and then down.

Regardless, there’s an incompatibility between DirectX and shader code, and that’s why you’ll notice in the code in Figure 3 that every matrix is subjected to an XMMatrixTranspose call before being sent off to the vertex shader. The XMMatrixTranspose function swaps a matrix’s rows and columns, converting a row-major matrix to column-major order (and, applied a second time, back again).

That’s the most common solution to this problem, but it’s not the only one. You can alternatively compile the shader with the D3DCOMPILE_PACK_MATRIX_ROW_MAJOR flag so it expects row-major order, or you can leave the matrices untransposed and simply swap the order of the matrix and the vector in the multiplications:

pos = mul(viewMatrix, pos);
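It’s easy to verify numerically that the two forms agree: a row vector times a transposed matrix equals the untransposed matrix times a column vector. Here is a small self-contained check, again in ordinary C++ rather than HLSL:

```cpp
#include <cassert>

struct Vec4T { float v[4]; };
struct Mat4T { float m[4][4]; };  // m[row][column]

// Row vector times matrix: result[j] = dot(vector, column j).
Vec4T MulVecMat(const Vec4T& p, const Mat4T& a)
{
    Vec4T r = {};
    for (int j = 0; j < 4; j++)
        for (int i = 0; i < 4; i++)
            r.v[j] += p.v[i] * a.m[i][j];
    return r;
}

// Matrix times column vector: result[i] = dot(row i, vector).
Vec4T MulMatVec(const Mat4T& a, const Vec4T& p)
{
    Vec4T r = {};
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            r.v[i] += a.m[i][j] * p.v[j];
    return r;
}

// Swap rows and columns, as XMMatrixTranspose does.
Mat4T Transpose(const Mat4T& a)
{
    Mat4T t;
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            t.m[i][j] = a.m[j][i];
    return t;
}
```

For any matrix and vector, MulVecMat(p, Transpose(a)) and MulMatVec(a, p) produce the same four components, which is why the two fixes are interchangeable.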

The Final Step

The shading of the three triangles certainly demonstrates that vertex colors are interpolated over the surface of each triangle, but the result is rather crude. Surely I can do better in using this powerful shading tool to mimic the reflection of light. That will be the final step in this plunge into the versatile world of Direct2D.


Charles Petzold is a longtime contributor to MSDN Magazine and the author of “Programming Windows, 6th Edition” (Microsoft Press, 2013), a book about writing applications for Windows 8. His Web site is charlespetzold.com.

Thanks to the following Microsoft technical expert for reviewing this article: Doug Erickson