FFmpeg (D3D11VA) + D3D11 shader (NV12->RGBA conversion): CPU usage too high

剑川 孙 1 Reputation point
2021-11-19T07:34:41.07+00:00

Hello:
I use FFmpeg to decode a 4K (8-bit) video file with the D3D11 hardware decoder. FFmpeg outputs a texture array (NV12 format) together with an array slice index that tells me which slice holds the decoded frame.
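For context, the texture and slice index come out of the decoded frame roughly like this (a minimal sketch, not my exact code; for AV_PIX_FMT_D3D11 hardware frames FFmpeg stores the ID3D11Texture2D pointer in data[0] and the array slice index in data[1], and "frame" is assumed to be the AVFrame returned by avcodec_receive_frame):

extern "C" {
#include <libavutil/frame.h>   // AVFrame
}
#include <d3d11.h>

// For AV_PIX_FMT_D3D11 hardware frames:
//   frame->data[0] holds the ID3D11Texture2D* (the decoder's NV12 texture array)
//   frame->data[1] holds this frame's array slice index, stored as an intptr_t
ID3D11Texture2D *srcNV12Texture = reinterpret_cast<ID3D11Texture2D *>(frame->data[0]);
UINT srcTextureSliceIndex = static_cast<UINT>(reinterpret_cast<intptr_t>(frame->data[1]));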
Then, for each frame, I use a D3D11 pixel shader to convert the NV12 texture to a BGRA texture with the following steps:
1. Create two shader resource views (one for the Y plane, one for the UV plane) and bind them to the pixel shader stage.
2. Upload the color conversion matrix through a constant buffer.
3. Draw.

It works, but CPU usage climbs to about 50% (GPU decode usage around 40%) when I play the video at 30 fps.
When I use the ID3D11VideoDevice video processor instead of the shader converter, CPU usage is only about 15%.
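For reference, the video-processor path I compared against looks roughly like this (a minimal sketch, not my exact code; videoDevice and videoContext are assumed to be the ID3D11VideoDevice/ID3D11VideoContext interfaces queried from the D3D11 device and context, and 3840x2160 is just the 4K example size):

// Requires d3d11.h (video interfaces) and atlbase.h for CComPtr.
// One-time setup: enumerator + processor for 4K progressive content.
D3D11_VIDEO_PROCESSOR_CONTENT_DESC contentDesc = {};
contentDesc.InputFrameFormat = D3D11_VIDEO_FRAME_FORMAT_PROGRESSIVE;
contentDesc.InputWidth   = 3840;
contentDesc.InputHeight  = 2160;
contentDesc.OutputWidth  = 3840;
contentDesc.OutputHeight = 2160;
contentDesc.Usage = D3D11_VIDEO_USAGE_PLAYBACK_NORMAL;
CComPtr<ID3D11VideoProcessorEnumerator> vpEnum;
CComPtr<ID3D11VideoProcessor> vp;
videoDevice->CreateVideoProcessorEnumerator(&contentDesc, &vpEnum);
videoDevice->CreateVideoProcessor(vpEnum, 0, &vp);

// Per frame: views over the decoded NV12 array slice and the BGRA destination, then one blit.
D3D11_VIDEO_PROCESSOR_INPUT_VIEW_DESC ivDesc = {};
ivDesc.ViewDimension = D3D11_VPIV_DIMENSION_TEXTURE2D;
ivDesc.Texture2D.MipSlice = 0;
ivDesc.Texture2D.ArraySlice = srcTextureSliceIndex;
CComPtr<ID3D11VideoProcessorInputView> inputView;
videoDevice->CreateVideoProcessorInputView(srcNV12Texture, vpEnum, &ivDesc, &inputView);

D3D11_VIDEO_PROCESSOR_OUTPUT_VIEW_DESC ovDesc = {};
ovDesc.ViewDimension = D3D11_VPOV_DIMENSION_TEXTURE2D;
ovDesc.Texture2D.MipSlice = 0;
CComPtr<ID3D11VideoProcessorOutputView> outputView;
videoDevice->CreateVideoProcessorOutputView(dstRGBATexture, vpEnum, &ovDesc, &outputView);

D3D11_VIDEO_PROCESSOR_STREAM stream = {};
stream.Enable = TRUE;
stream.pInputSurface = inputView;
videoContext->VideoProcessorBlt(vp, outputView, 0, 1, &stream);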

Then I checked my code: I call CreateShaderResourceView on every frame render to create the views for the shader, because the output texture's array slice index is different every frame.
As a test, when I don't call CreateShaderResourceView every frame (obviously the output is then wrong), CPU usage is only about 15%.
So the high CPU usage appears to be tied to the CreateShaderResourceView calls.
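To avoid the per-frame calls entirely, one option I can think of is to create the Y/UV view pair for every slice of the decoder's texture array once, and then just pick a pair by index each frame. A rough sketch (m_ySRVs and m_uvSRVs are assumed std::vector<CComPtr<ID3D11ShaderResourceView>> members, not my actual code, and it assumes FFmpeg keeps reusing the same texture array for the whole stream):

// One-time setup after the decoder texture array is known.
D3D11_TEXTURE2D_DESC texDesc = {};
srcNV12Texture->GetDesc(&texDesc);            // texDesc.ArraySize == size of the decoder surface pool
m_ySRVs.resize(texDesc.ArraySize);
m_uvSRVs.resize(texDesc.ArraySize);
for (UINT slice = 0; slice < texDesc.ArraySize; ++slice) {
    D3D11_SHADER_RESOURCE_VIEW_DESC desc = {};
    desc.ViewDimension = D3D11_SRV_DIMENSION_TEXTURE2DARRAY;
    desc.Texture2DArray.MostDetailedMip = 0;
    desc.Texture2DArray.MipLevels = 1;
    desc.Texture2DArray.FirstArraySlice = slice;
    desc.Texture2DArray.ArraySize = 1;
    desc.Format = DXGI_FORMAT_R8_UNORM;       // Y plane
    m_d3d11Device->CreateShaderResourceView(srcNV12Texture, &desc, &m_ySRVs[slice]);
    desc.Format = DXGI_FORMAT_R8G8_UNORM;     // UV plane
    m_d3d11Device->CreateShaderResourceView(srcNV12Texture, &desc, &m_uvSRVs[slice]);
}

// Per frame: no CreateShaderResourceView, just bind the cached pair.
ID3D11ShaderResourceView *views[2] = { m_ySRVs[srcTextureSliceIndex], m_uvSRVs[srcTextureSliceIndex] };
m_d3d11DeviceContext->PSSetShaderResources(0, 2, views);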

I could create a class-member NV12 texture and call CreateShaderResourceView on it just once, and then each frame use CopySubresourceRegion to copy the FFmpeg output slice into that member texture, so that CreateShaderResourceView is no longer called for each render (sketched below). But this way the GPU has to copy a 4K image every frame!
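Roughly, that copy-based workaround would look like this (a sketch only; m_nv12CopyTexture is an assumed new member, and the decoder texture is assumed to have a single mip level):

// One-time setup: a single-slice NV12 texture matching the decoder output,
// with the Y/UV shader resource views created over it once.
D3D11_TEXTURE2D_DESC copyDesc = {};
srcNV12Texture->GetDesc(&copyDesc);           // inherit width/height/NV12 format
copyDesc.ArraySize = 1;
copyDesc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
copyDesc.MiscFlags = 0;
copyDesc.Usage = D3D11_USAGE_DEFAULT;
m_d3d11Device->CreateTexture2D(&copyDesc, nullptr, &m_nv12CopyTexture);
// ...CreateShaderResourceView for the Y (R8_UNORM) and UV (R8G8_UNORM) planes once here...

// Per frame: copy mip 0 of the decoded array slice into the member texture,
// then reuse the cached views. This is the extra 4K copy I want to avoid.
UINT srcSubresource = D3D11CalcSubresource(0, srcTextureSliceIndex, 1 /* mip levels */);
m_d3d11DeviceContext->CopySubresourceRegion(m_nv12CopyTexture, 0, 0, 0, 0,
                                            srcNV12Texture, srcSubresource, nullptr);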

So what is the right way to solve this problem?

My hardware is a notebook PC with a GeForce 1050 Ti GPU and an Intel 630 integrated GPU.
My conversion code:
HRESULT hr = NV_NOERROR;
//
// Set shader
//
m_d3d11DeviceContext->VSSetShader(m_vertexShader, nullptr, 0);
m_d3d11DeviceContext->PSSetShader(m_pixelShader, nullptr, 0);
m_d3d11DeviceContext->IASetInputLayout(m_inputLayout);

//
// Set shader resource.
//
CComPtr<ID3D11ShaderResourceView> YPlaneTextureView, UVPlaneTextureView;
D3D11_SHADER_RESOURCE_VIEW_DESC YPlaneDesc, UVPlaneDesc;
ZeroMemory(&YPlaneDesc, sizeof(YPlaneDesc));
ZeroMemory(&UVPlaneDesc, sizeof(UVPlaneDesc));
YPlaneDesc.Format = DXGI_FORMAT_R8_UNORM;
YPlaneDesc.ViewDimension = D3D11_SRV_DIMENSION_TEXTURE2DARRAY;
YPlaneDesc.Texture2DArray.MostDetailedMip = 0;
YPlaneDesc.Texture2DArray.MipLevels = 1;
YPlaneDesc.Texture2DArray.FirstArraySlice = srcTextureSliceIndex;
YPlaneDesc.Texture2DArray.ArraySize = 1;
hr = m_d3d11Device->CreateShaderResourceView(srcNV12Texture,
                                             &YPlaneDesc,
                                             &YPlaneTextureView);
if (FAILED(hr)) {
    NvError() << "Create shader resource view(y plane texture view) failed.";
    return;
}
UVPlaneDesc.Format = DXGI_FORMAT_R8G8_UNORM;
UVPlaneDesc.ViewDimension = D3D11_SRV_DIMENSION_TEXTURE2DARRAY;
UVPlaneDesc.Texture2DArray.MostDetailedMip = 0;
UVPlaneDesc.Texture2DArray.MipLevels = 1;
UVPlaneDesc.Texture2DArray.FirstArraySlice = srcTextureSliceIndex;
UVPlaneDesc.Texture2DArray.ArraySize = 1;
hr = m_d3d11Device->CreateShaderResourceView(srcNV12Texture,
                                             &UVPlaneDesc,
                                             &UVPlaneTextureView);
if (FAILED(hr)) {
    NvError() << "Create shader resource view(uv plane texture view) failed.";
    return;
}

std::array<ID3D11ShaderResourceView *, 2> const resourceTextureView = { YPlaneTextureView, UVPlaneTextureView };
m_d3d11DeviceContext->PSSetShaderResources(0, (uint)resourceTextureView.size(), resourceTextureView.data());

//
// Upload shader const buffer.
//
D3D11_MAPPED_SUBRESOURCE mappedResource;
hr = m_d3d11DeviceContext->Map(m_pixelStageConstBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &mappedResource);
if (FAILED(hr)) {
    NvError()<<"Map shader const buffer failed.";
    return;
}
SNvD3DPixelStageConstBuffer *dataPtr = (SNvD3DPixelStageConstBuffer *)mappedResource.pData;
FillShaderConstBufferData(colorMode, colorTrans, hdrInfo, dataPtr);
m_d3d11DeviceContext->Unmap(m_pixelStageConstBuffer, 0);
std::array<ID3D11Buffer *, 1> const constBuffer = { m_pixelStageConstBuffer };
m_d3d11DeviceContext->PSSetConstantBuffers(0, (uint)constBuffer.size(), constBuffer.data());

std::array<ID3D11SamplerState *, 1> const samplerState = { m_samplerState };
m_d3d11DeviceContext->PSSetSamplers(0, (uint)samplerState.size(), samplerState.data());

//
// Set render target.
//
CComPtr<ID3D11RenderTargetView> renderTargetView;
hr = m_d3d11Device->CreateRenderTargetView(dstRGBATexture, nullptr, &renderTargetView);
if (FAILED(hr)) {
    NvError()<<"Create render target view failed.";
    return;
}
std::array<ID3D11RenderTargetView *, 1> const RTV = { renderTargetView };
m_d3d11DeviceContext->OMSetRenderTargets((uint)RTV.size(), RTV.data(), nullptr);

//
// Draw call.
//

uint vertexBufferStride = sizeof(SNvD3DVertexBuffer);
uint offset = 0;
std::array<ID3D11Buffer *, 1> const vertexBuffer = { m_vertexBuffer };
m_d3d11DeviceContext->IASetVertexBuffers(0, (uint)vertexBuffer.size(), vertexBuffer.data(), &vertexBufferStride, &offset);
m_d3d11DeviceContext->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
m_d3d11DeviceContext->Draw(6, 0);
