Everyone battle-hardened in the world of Vulkan and D3D12 knows that debugging becomes ridiculously hard once we enter the domain of crashes and hangs. No one wants to do it, and seeing a random GPU crash show up is enough to make you want to quit graphics programming and take up farming on a remote island. Someone has to do it though, and given how questionable a lot of D3D12 content is w.r.t. correctness, this comes up a lot more often than we’d like in vkd3d-proton land.
The end goal of this blog is to demonstrate the magical UMR tool on Linux, which I would argue is the only reasonable post-mortem debugging method currently available on PC, but before we go that deep, we need to look at the current state of crash debugging on PC and the bespoke tooling we have in vkd3d-proton to deal with crashes.
Eating just crumbs makes for a miserable meal
Breadcrumbs is a common technique that most modern APIs implement in some form. The goal of breadcrumbs is simply to narrow down which draws or dispatches caused the failure. This information is extremely limited, but it can sometimes be enough to figure out a crash if you’re very lucky and you know the application’s intentions with every shader (from vkd3d-proton’s point of view, we obviously don’t).
Depending on the competency of the breadcrumb tool, you’d get this information:
- A range of draws or dispatches which could potentially have caused failure.
- Ideally, exactly the draw or dispatch which caused failure.
- If page fault, which address caused failure?
- Which resource corresponds to that failure? It is also possible that the address does not correspond to any resource. Causing true OOB on D3D12 and Vulkan is very easy.
As far as I know, this is where D3D12 on Windows ends, with two standard alternatives:
- WriteBufferImmediate (Basically VK_AMD_buffer_marker)
- DRED
There are vendor tools at least from NVIDIA and AMD which should make this neater, but I don’t have direct experience with any of these tools in D3D12, so let’s move on to the Vulkan side of things.
VK_AMD_buffer_marker breadcrumbs
Buffer markers are the simplest possible solution for implementing breadcrumbs. The basic idea is that a value is written to memory either before the GPU processes a command, or after the work is done. On device lost, the counters can be inspected. The user will have to instrument the code somehow, either through a layer or directly. In vkd3d-proton, we can enable debug code which automatically does this for all D3D12 commands with VKD3D_CONFIG=breadcrumbs (not available in release builds).
For example, from our dispatch implementation:
    VK_CALL(vkCmdDispatch(list->vk_command_buffer, x, y, z));
    VKD3D_BREADCRUMB_AUX32(x);
    VKD3D_BREADCRUMB_AUX32(y);
    VKD3D_BREADCRUMB_AUX32(z);
    VKD3D_BREADCRUMB_COMMAND(DISPATCH);
Then it’s a matter of writing the breadcrumb somewhere:
    cmd.type = VKD3D_BREADCRUMB_COMMAND_SET_BOTTOM_MARKER;
    cmd.count = trace->counter;
    vkd3d_breadcrumb_tracer_add_command(list, &cmd);
    VK_CALL(vkCmdWriteBufferMarkerAMD(list->vk_command_buffer,
            VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,
            host_buffer, ...));

    trace->counter++;

    cmd.type = VKD3D_BREADCRUMB_COMMAND_SET_TOP_MARKER;
    cmd.count = trace->counter;
    vkd3d_breadcrumb_tracer_add_command(list, &cmd);
    VK_CALL(vkCmdWriteBufferMarkerAMD(list->vk_command_buffer,
            VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,
            host_buffer, ...));
We’ll also record commands and the parameters used in a side band buffer so that we can display the faulting command buffers.
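To make the side band idea concrete, here is a minimal sketch of what such a recording could look like. Every name in it (crumb_log, crumb_log_add, crumb_log_report) is hypothetical and made up for illustration; it is not how vkd3d-proton lays things out, just the general shape: append a small record per command on the CPU, then replay the records that sit between two marker values.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record types, loosely mirroring the markers, AUX32 values and commands above. */
enum crumb_type
{
    CRUMB_TOP_MARKER,
    CRUMB_BOTTOM_MARKER,
    CRUMB_AUX32,
    CRUMB_DISPATCH
};

struct crumb
{
    enum crumb_type type;
    uint32_t value; /* marker value or auxiliary argument, unused for plain commands */
};

#define MAX_CRUMBS 256

/* CPU-side log, filled while the command list is being recorded. */
struct crumb_log
{
    struct crumb entries[MAX_CRUMBS];
    size_t count;
};

static const char *crumb_name(enum crumb_type type)
{
    switch (type)
    {
        case CRUMB_TOP_MARKER: return "top_marker";
        case CRUMB_BOTTOM_MARKER: return "bottom_marker";
        case CRUMB_AUX32: return "aux32";
        case CRUMB_DISPATCH: return "dispatch";
        default: return "unknown";
    }
}

/* Appends one record. Returns 1 on success, 0 if the log is full. */
static int crumb_log_add(struct crumb_log *log, enum crumb_type type, uint32_t value)
{
    if (log->count >= MAX_CRUMBS)
        return 0;
    log->entries[log->count].type = type;
    log->entries[log->count].value = value;
    log->count++;
    return 1;
}

/* Prints every record from top marker `first` up to and including
 * bottom marker `last`. Returns the number of records printed. */
static size_t crumb_log_report(const struct crumb_log *log,
        uint32_t first, uint32_t last, FILE *out)
{
    size_t i, reported = 0;
    int active = 0;

    for (i = 0; i < log->count; i++)
    {
        const struct crumb *c = &log->entries[i];

        if (c->type == CRUMB_TOP_MARKER && c->value == first)
            active = 1;
        if (!active)
            continue;

        fprintf(out, "Command: %s (%u)\n", crumb_name(c->type), (unsigned int)c->value);
        reported++;

        if (c->type == CRUMB_BOTTOM_MARKER && c->value == last)
            break;
    }

    return reported;
}
```

The point of keeping this on the CPU side is that it survives the crash untouched; only the marker values themselves have to come back from the GPU.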
Another thing to consider is that the buffer we write to must be coherent with the host. If the device is lost at some random point inside a command, we won’t have the opportunity to perform host memory barriers and signal a fence properly, so we must make sure the memory punches straight through to VRAM. On AMD, we can do this with:
    memory_props = VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
            VK_MEMORY_PROPERTY_HOST_COHERENT_BIT |
            VK_MEMORY_PROPERTY_HOST_CACHED_BIT;

    if (is_supported(VK_AMD_device_coherent_memory))
    {
        memory_props |= VK_MEMORY_PROPERTY_DEVICE_COHERENT_BIT_AMD |
                VK_MEMORY_PROPERTY_DEVICE_UNCACHED_BIT_AMD;
    }
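Once the flags above are known, a memory type that actually has them still needs to be found. The following is a rough sketch of the usual Vulkan-style search, written against stand-in types so it builds without the Vulkan headers. The struct and function names are hypothetical, the flag constants are meant to mirror the VkMemoryPropertyFlagBits values, and falling back to plain host-coherent memory when no device-coherent type matches is my own assumption rather than something vkd3d-proton is known to do.

```c
#include <stdint.h>

/* Stand-ins intended to mirror the VkMemoryPropertyFlagBits values. */
#define PROP_HOST_VISIBLE    0x02u
#define PROP_HOST_COHERENT   0x04u
#define PROP_HOST_CACHED     0x08u
#define PROP_DEVICE_COHERENT 0x40u /* ..._DEVICE_COHERENT_BIT_AMD */
#define PROP_DEVICE_UNCACHED 0x80u /* ..._DEVICE_UNCACHED_BIT_AMD */

/* Hypothetical, simplified view of VkPhysicalDeviceMemoryProperties. */
struct memory_types
{
    uint32_t count;
    uint32_t property_flags[32];
};

/* Returns the first memory type allowed by type_mask that has all
 * wanted flags, or -1 if none matches. */
static int find_memory_type(const struct memory_types *types,
        uint32_t type_mask, uint32_t wanted)
{
    uint32_t i;

    for (i = 0; i < types->count && i < 32; i++)
    {
        if (!(type_mask & (1u << i)))
            continue;
        if ((types->property_flags[i] & wanted) == wanted)
            return (int)i;
    }

    return -1;
}

/* Prefers device-coherent, uncached memory when the extension is
 * available. Assumption: otherwise fall back to host-coherent memory. */
static int pick_breadcrumb_memory_type(const struct memory_types *types,
        uint32_t type_mask, int has_device_coherent_memory)
{
    const uint32_t base = PROP_HOST_VISIBLE | PROP_HOST_COHERENT | PROP_HOST_CACHED;
    int index = -1;

    if (has_device_coherent_memory)
        index = find_memory_type(types, type_mask,
                base | PROP_DEVICE_COHERENT | PROP_DEVICE_UNCACHED);
    if (index < 0)
        index = find_memory_type(types, type_mask, base);

    return index;
}
```

In a real application, type_mask would come from the buffer's memory requirements and the flags array from the physical device's memory properties.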
On a fault, we scan through the host buffer. For any command list whose TOP and BOTTOM markers are not both 0 (never executed) or both UINT32_MAX (done), we walk the recorded commands and report the range that may have failed.
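The scan itself boils down to very little logic. Here is a minimal sketch, assuming one pair of marker values per command list and assuming that everything after the last completed BOTTOM marker, up to and including the last observed TOP marker, counts as potentially in flight. The struct and function names are hypothetical and only serve the illustration.

```c
#include <stdint.h>

/* Hypothetical layout: the last marker values the GPU managed to write. */
struct marker_pair
{
    uint32_t top;    /* last value written at TOP_OF_PIPE */
    uint32_t bottom; /* last value written at BOTTOM_OF_PIPE */
};

enum context_state
{
    CONTEXT_NEVER_EXECUTED,
    CONTEXT_DONE,
    CONTEXT_PENDING
};

/* 0 / 0 means the GPU never touched the list, UINT32_MAX / UINT32_MAX
 * means it ran to completion, anything else was in flight at the fault. */
static enum context_state classify_context(const struct marker_pair *m)
{
    if (m->top == 0 && m->bottom == 0)
        return CONTEXT_NEVER_EXECUTED;
    if (m->top == UINT32_MAX && m->bottom == UINT32_MAX)
        return CONTEXT_DONE;
    return CONTEXT_PENDING;
}

/* Fills [*first, *last] with the range of markers that may have been
 * executing when the device was lost. Returns 1 if there is such a range. */
static int pending_marker_range(const struct marker_pair *m,
        uint32_t *first, uint32_t *last)
{
    if (classify_context(m) != CONTEXT_PENDING)
        return 0;
    /* Nothing started after the last completed marker. */
    if (m->bottom >= m->top)
        return 0;

    *first = m->bottom + 1;
    *last = m->top;
    return 1;
}
```

With a TOP marker of 44 and a BOTTOM marker of 0, this reports markers 1 through 44 as suspects, which is a very wide net.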
RADV speciality, making buffer markers actually useful
GPUs execute commands concurrently unless barriers are emitted. This means that there is a large range of potential draws or dispatches in flight at any one time. RADV_DEBUG=syncshaders adds barriers in between every command so that we’re guaranteed a hang will narrow down to a single command. No other Vulkan driver supports this, and it makes RADV the only practical driver for breadcrumb techniques, at least on Vulkan. Sure, it is possible to add barriers yourself between every command to emulate this, but for render passes, this becomes extremely annoying since you have to consider restarting the render pass for every draw call …
As a simple example, I’ve hacked up one of the vkd3d-proton tests to write a bogus root descriptor address, which is a great way to crash GPUs in D3D12 and Vulkan.
When running with just breadcrumbs, it’s useless:
    Device lost observed, analyzing breadcrumbs ...
    Found pending command list context 1 in executable state, TOP_OF_PIPE marker 44, BOTTOM_OF_PIPE marker 0.
    ===== Potential crash region BEGIN (make sure RADV_DEBUG=syncshaders is used for maximum accuracy) =====
    Command: top_marker
        marker: 1
    Command: set_shader_hash
        hash: db5d68a6143611ad, stage: 20
    Set arg: 0 (#0)
    Set arg: 18446603340520357888 (#ffff800100400000)
    Command: root_desc
    Set arg: 0 (#0)
    Set arg: 18446603340788793344 (#ffff800110400000)
    Command: root_desc
    Tag: ExecuteIndirect [MaxCommandCount, ArgBuffer cookie, ArgBuffer offset, Count cookie, Count offset]
    Set arg: 1 (#1)
    Set arg: 7 (#7)
    Set arg: 16 (#10)
    Set arg: 0 (#0)
    Set arg: 0 (#0)
    Set arg: 0 (#0)
    Command: execute_indirect_unroll_compute
    Command: bottom_marker
        marker: 1
    Command: top_marker
        marker: 2
    Command: execute_indirect
    Command: bottom_marker
    ... A ton of commands
    Command: barrier
    Command: bottom_marker
        marker: 44
    ===== Potential crash region END =====
Instead, with syncshaders, it becomes:
    ===== Potential crash region BEGIN (make sure RADV_DEBUG=syncshaders is used for maximum accuracy) =====
    Command: top_marker
        marker: 1
    Command: set_shader_hash
        hash: db5d68a6143611ad, stage: 20
    Set arg: 0 (#0)
    Set arg: 18446603340520357888 (#ffff800100400000)
    Command: root_desc
    Set arg: 0 (#0)
    Set arg: 18446603340788793344 (#ffff800110400000) <-- bogus pointer
    Command: root_desc
    Tag: ExecuteIndirect [MaxCommandCount, ArgBuffer cookie, ArgBuffer offset, Count cookie, Count offset]
    Set arg: 1 (#1)
    Set arg: 7 (#7)
    Set arg: 16 (#10)
    Set arg: 0 (#0)
    Set arg: 0 (#0)
    Set arg: 0 (#0)
    Command: execute_indirect_unroll_compute
    Command: bottom_marker
        marker: 1
    ===== Potential crash region END =====
That’s actionable.