UMA Optimizations: CPU Accessible Textures and Standard Swizzle
Unified Memory Architecture (UMA) GPUs offer some efficiency advantages over discrete GPUs, especially when optimizing for mobile devices. Giving resources CPU access when the GPU is UMA can reduce the amount of copying that occurs between CPU and GPU. While we don’t recommend that applications blindly give CPU access to all resources on UMA designs, there are opportunities to improve efficiency by giving the right resources CPU access. Unlike with discrete GPUs, on UMA the CPU can technically have a pointer to every resource that the GPU can access.
Overview of CPU accessible textures
CPU accessible textures in the graphics pipeline are a feature of UMA architectures that gives the CPU read and write access to textures. On the more common discrete GPUs, the CPU does not have access to textures in the graphics pipeline.
The general best practice advice for textures is to accommodate discrete GPUs, which typically involves following the processes in Uploading Texture Data Through Buffers, summarized as:
- Not having any CPU access for the majority of textures.
- Setting the texture layout to D3D12_TEXTURE_LAYOUT_UNKNOWN.
- Uploading the textures to the GPU with CopyTextureRegion.
However, in certain cases the CPU and GPU interact so frequently on the same data that mapping textures helps save power, or speeds up a particular design on particular adapters or architectures. Applications should detect these cases and optimize out the unnecessary copies. In these cases, for best performance, consider the following:
Consider mapping textures for better performance only when D3D12_FEATURE_DATA_ARCHITECTURE::UMA is TRUE. Then check CacheCoherentUMA when deciding which CPU cache properties to choose for the heap.
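The decision described above can be sketched in portable C++ as follows. This is only an illustration: ArchitectureInfo, CpuPageChoice and chooseCpuPageProperty are hypothetical names, not part of D3D12. In a real application the two flags would be read from D3D12_FEATURE_DATA_ARCHITECTURE after calling CheckFeatureSupport, and the result would guide the CPU page property of a custom heap. Preferring write-back on cache-coherent UMA and write-combine otherwise is an assumed starting point that should be confirmed by profiling.

```cpp
// Hypothetical stand-in for the two fields of D3D12_FEATURE_DATA_ARCHITECTURE
// used here. A real application fills them in through
// ID3D12Device::CheckFeatureSupport.
struct ArchitectureInfo {
    bool uma;
    bool cacheCoherentUma;
};

// Hypothetical mirror of the CPU cache choices for a CPU-visible heap.
enum class CpuPageChoice {
    NotAvailable,  // not UMA: keep textures without CPU access
    WriteCombine,  // UMA without cache coherency
    WriteBack      // cache-coherent UMA
};

// Assumed policy: only consider mapping textures on UMA, then let
// CacheCoherentUMA pick the CPU cache behaviour of the heap.
CpuPageChoice chooseCpuPageProperty(const ArchitectureInfo& arch) {
    if (!arch.uma) {
        return CpuPageChoice::NotAvailable;
    }
    return arch.cacheCoherentUma ? CpuPageChoice::WriteBack
                                 : CpuPageChoice::WriteCombine;
}
```

An application would typically evaluate this once per adapter at start-up and cache the result.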
Leveraging CPU access for textures is more complicated than for buffers. The most efficient texture layouts for GPUs are rarely row_major. In fact, some GPUs can only support row_major textures when copying texture data around.
UMA GPUs should universally benefit from a simple optimization to reduce level-load times. After recognizing UMA, the application can optimize out the initial CopyTextureRegion used to populate textures that the GPU will not modify. Instead of creating the texture in a heap with D3D12_HEAP_TYPE_DEFAULT and marshalling the texture data through an upload buffer, the application can use WriteToSubresource, which avoids the need to understand the actual texture layout.
In D3D12, textures created with D3D12_TEXTURE_LAYOUT_UNKNOWN and no CPU access are the most efficient for frequent GPU rendering and sampling. When performance testing, those textures should be compared against D3D12_TEXTURE_LAYOUT_UNKNOWN with CPU access, D3D12_TEXTURE_LAYOUT_STANDARD_SWIZZLE with CPU access, and D3D12_TEXTURE_LAYOUT_ROW_MAJOR for cross-adapter support.
Using D3D12_TEXTURE_LAYOUT_UNKNOWN with CPU access enables the methods WriteToSubresource, ReadFromSubresource, Map (which does not give the application access to a pointer), and Unmap, but can sacrifice efficiency of GPU access.
Using D3D12_TEXTURE_LAYOUT_STANDARD_SWIZZLE with CPU access enables WriteToSubresource, ReadFromSubresource, Map (which returns a valid pointer to the application), and Unmap. It can also sacrifice the efficiency of GPU access more than D3D12_TEXTURE_LAYOUT_UNKNOWN with CPU access does.
Overview of Standard Swizzle
D3D12 (and D3D11.3) introduce a standard multi-dimensional data layout. This enables multiple processing units to operate on the same data without copying it or swizzling it between multiple layouts. A standardized layout enables efficiency gains through network effects and allows algorithms to take shortcuts by assuming a particular pattern.
For a detailed description of the texture layouts, refer to D3D12_TEXTURE_LAYOUT.
Note though that this standard swizzle is a hardware feature, and may not be supported by all GPUs.
For background information on swizzling, refer to Z-order curve.
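As background only, a generic two-dimensional Z-order (Morton) index can be computed by interleaving the bits of the x and y coordinates. The sketch below illustrates that general idea and is not the exact D3D12 standard swizzle pattern, which is defined in the D3D12_TEXTURE_LAYOUT documentation. The names spreadBits and mortonIndex are hypothetical.

```cpp
#include <cstdint>

// Spread the low 16 bits of v so that they occupy the even bit positions
// of the result (bit i moves to bit 2*i).
inline std::uint32_t spreadBits(std::uint32_t v) {
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

// Generic Z-order index: x supplies the even bits, y the odd bits.
// Texels that are close in 2D tend to end up close in memory.
inline std::uint32_t mortonIndex(std::uint32_t x, std::uint32_t y) {
    return spreadBits(x) | (spreadBits(y) << 1);
}
```

With this ordering, the four texels of a 2x2 block at the origin map to indices 0 through 3, which is the locality property that swizzled layouts rely on.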
Unlike D3D11.3, D3D12 supports texture mapping by default, so there is no need to query D3D12_FEATURE_DATA_D3D12_OPTIONS for map support. However, D3D12 does not always support standard swizzle. To query for this feature, call CheckFeatureSupport and check the StandardSwizzle64KBSupported field of D3D12_FEATURE_DATA_D3D12_OPTIONS.
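One way to turn these capability checks into a list of configurations worth benchmarking is sketched below. AdapterCaps, Candidate and layoutCandidates are hypothetical names introduced for illustration. The two flags stand in for values that a real application reads through CheckFeatureSupport, and the D3D12_TEXTURE_LAYOUT_ROW_MAJOR cross-adapter case is left out for brevity.

```cpp
#include <vector>

// Hypothetical summary of the capability bits discussed above:
//   uma                          <- D3D12_FEATURE_DATA_ARCHITECTURE::UMA
//   standardSwizzle64KBSupported <- D3D12_FEATURE_DATA_D3D12_OPTIONS::
//                                   StandardSwizzle64KBSupported
struct AdapterCaps {
    bool uma;
    bool standardSwizzle64KBSupported;
};

// Texture configurations an application might compare when profiling.
enum class Candidate {
    UnknownNoCpuAccess,        // baseline, suitable for every adapter
    UnknownCpuAccess,          // only considered on UMA
    StandardSwizzleCpuAccess   // UMA plus standard swizzle support
};

std::vector<Candidate> layoutCandidates(const AdapterCaps& caps) {
    std::vector<Candidate> out{Candidate::UnknownNoCpuAccess};
    if (caps.uma) {
        out.push_back(Candidate::UnknownCpuAccess);
        if (caps.standardSwizzle64KBSupported) {
            out.push_back(Candidate::StandardSwizzleCpuAccess);
        }
    }
    return out;
}
```

The baseline is always included so that any CPU accessible variant is measured against the layout that is normally the most efficient for the GPU.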
The following APIs reference texture mapping:
- D3D12_TEXTURE_LAYOUT : controls the swizzle pattern of default textures and enables map support on CPU accessible textures.
- D3D12_RESOURCE_DESC : describes a resource, such as a texture. This is an extensively used structure.
- D3D12_HEAP_DESC : describes a heap.
- ID3D12Device::CreateCommittedResource : creates a single resource and backing heap of the right size and alignment.
- ID3D12Device::CreateHeap : creates a heap for a buffer or texture.
- ID3D12Device::CreatePlacedResource : creates a resource that is placed in a specific heap, usually a faster method of creating a resource than CreateHeap.
- ID3D12Device::CreateReservedResource : creates a resource that is reserved but not yet committed or placed in a heap.
- ID3D12CommandQueue::UpdateTileMappings : updates mappings of tile locations in tiled resources to memory locations in a resource heap.
- ID3D12Resource::Map : gets a pointer to the specified data in the resource, and denies the GPU access to the subresource.
- ID3D12Resource::GetDesc : gets the resource properties.
- ID3D12Heap::GetDesc : gets the heap properties.
- ReadFromSubresource : copies data from a texture which was mapped using Map.
- WriteToSubresource : copies data into a texture which was mapped using Map.
Resources and parent heaps have alignment requirements:
- D3D12_DEFAULT_MSAA_RESOURCE_PLACEMENT_ALIGNMENT (4MB) for multi-sample textures.
- D3D12_DEFAULT_RESOURCE_PLACEMENT_ALIGNMENT (64KB) for single sample textures and buffers.
- Linear subresource copying must be aligned to D3D12_TEXTURE_DATA_PLACEMENT_ALIGNMENT (512 bytes), with row pitch being aligned to D3D12_TEXTURE_DATA_PITCH_ALIGNMENT (256 bytes).
- Constant buffer views must be aligned to D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT (256 bytes).
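The linear-copy alignment rules above can be applied with a small amount of arithmetic. The following sketch hard-codes the byte values quoted in the list instead of including the D3D12 headers. The names alignUp, LinearFootprint and placeLinearSubresource are hypothetical, and the helper assumes an uncompressed format with a fixed number of bytes per texel.

```cpp
#include <cassert>
#include <cstdint>

// Values quoted in the list above, duplicated here so the sketch does not
// depend on the D3D12 headers.
const std::uint64_t kTextureDataPitchAlignment = 256;      // bytes
const std::uint64_t kTextureDataPlacementAlignment = 512;  // bytes

// Round value up to the next multiple of a power-of-two alignment.
inline std::uint64_t alignUp(std::uint64_t value, std::uint64_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

// Where one linear subresource would sit inside an upload buffer.
struct LinearFootprint {
    std::uint64_t offset;     // aligned start of the subresource data
    std::uint64_t rowPitch;   // aligned distance between rows
    std::uint64_t sizeBytes;  // rowPitch * rows (padded upper bound)
};

// cursor is the first free byte of the buffer before this subresource.
inline LinearFootprint placeLinearSubresource(std::uint64_t cursor,
                                              std::uint32_t widthInTexels,
                                              std::uint32_t rows,
                                              std::uint32_t bytesPerTexel) {
    LinearFootprint fp;
    fp.offset = alignUp(cursor, kTextureDataPlacementAlignment);
    fp.rowPitch = alignUp(std::uint64_t(widthInTexels) * bytesPerTexel,
                          kTextureDataPitchAlignment);
    fp.sizeBytes = fp.rowPitch * rows;
    return fp;
}
```

For example, a row of 100 texels at 4 bytes each occupies 400 bytes of data but needs a 512-byte row pitch.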
Textures smaller than 64KB should be processed through CreateCommittedResource.
With dynamic textures (textures that change every frame), the CPU writes linearly to the upload heap, and a GPU copy operation follows.
Typically, to create dynamic resources, create a large buffer in an upload heap (refer to Suballocation Within Buffers). To create staging resources, create a large buffer in a readback heap. To create default static resources, create adjacent resources in a default heap. To create default aliased resources, create overlapping resources in a default heap.
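The bookkeeping behind suballocating from one large upload buffer can be sketched as a simple linear allocator. LinearSubAllocator is a hypothetical helper, not a D3D12 type. It only tracks byte offsets, and the sketch assumes the application calls reset() only after the GPU has finished reading the data, for example after waiting on a fence.

```cpp
#include <cstdint>

// Hands out aligned offsets from a single large buffer, front to back.
class LinearSubAllocator {
public:
    explicit LinearSubAllocator(std::uint64_t capacityBytes)
        : capacity_(capacityBytes), head_(0) {}

    // Reserves size bytes at a multiple of alignment (must be non-zero).
    // Returns false and leaves the allocator unchanged if the request
    // does not fit in the remaining space.
    bool allocate(std::uint64_t size, std::uint64_t alignment,
                  std::uint64_t& offsetOut) {
        const std::uint64_t start =
            (head_ + alignment - 1) / alignment * alignment;
        if (start > capacity_ || size > capacity_ - start) {
            return false;
        }
        offsetOut = start;
        head_ = start + size;
        return true;
    }

    // Makes the whole buffer available again. Assumed to be called only
    // once the GPU no longer reads from earlier allocations.
    void reset() { head_ = 0; }

private:
    std::uint64_t capacity_;
    std::uint64_t head_;
};
```

Each returned offset would then be used as the location of the data inside the upload buffer when recording the GPU copy.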
WriteToSubresource and ReadFromSubresource rearrange texture data between a row-major layout and an undefined resource layout. The operation is synchronous, so the application should keep CPU scheduling in mind. The application can always break up the copying into smaller regions or schedule this operation in another task. MSAA resources and depth-stencil resources with opaque resource layouts are not supported by these CPU copy operations, and will cause a failure. Formats that don’t have a power-of-two element size are also not supported and will also cause a failure. Out-of-memory return codes can occur.
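Breaking a synchronous copy into smaller regions, as suggested above, can be as simple as walking the texture in bands of rows. In the sketch below, RowBand, splitIntoRowBands and isPowerOfTwo are hypothetical helpers. Each band would correspond to the top and bottom of the destination box passed to one WriteToSubresource call, and the sketch assumes an uncompressed format where any row count is a valid region height.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Half-open range of rows [top, bottom) covered by one copy call.
struct RowBand {
    std::uint32_t top;
    std::uint32_t bottom;
};

// Pre-check for the element size restriction described above.
inline bool isPowerOfTwo(std::uint32_t bytesPerElement) {
    return bytesPerElement != 0 &&
           (bytesPerElement & (bytesPerElement - 1)) == 0;
}

// Split height rows into bands of at most maxRowsPerBand rows so the CPU
// copy can be spread over several tasks or frames.
inline std::vector<RowBand> splitIntoRowBands(std::uint32_t height,
                                              std::uint32_t maxRowsPerBand) {
    std::vector<RowBand> bands;
    if (maxRowsPerBand == 0) {
        return bands;  // nothing sensible to do with an empty band size
    }
    std::uint32_t top = 0;
    while (top < height) {
        const std::uint32_t rows = std::min(maxRowsPerBand, height - top);
        bands.push_back(RowBand{top, top + rows});
        top += rows;
    }
    return bands;
}
```

The source pointer for each band would start at the band's first row, that is, top multiplied by the source row pitch from the beginning of the image.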