* This function patches a mesh shader to write to BDA instead of the output
object, and also stores the vertex and primitive counts. It separates indices,
per-vertex rate outputs and per-primitive rate outputs. The output is not fully
interleaved, because we replace pointers in place and they may be combined if
an output is a struct.
* If preceded by a task shader, it reads a per-dispatch offset to account for
the fact that each task group produces an independent series of mesh groups
that can't otherwise be distinguished from each other. Within that set of
groups, each group assigns its output linearly, and this output may be sparse
because it is sized for the worst-case output.
* This task shader is relatively degenerate: it just loads the payload and
dispatch size from a buffer, writes to the payload, and dispatches. It is used
to deterministically replay the same set of tasks that were saved.
* We do this by patching the shader so that every payload pointer writes to a
BDA pointer instead, and by also storing the number of mesh groups dispatched.
* After running this shader we read the results back and feed the data to the
GPU again through our own task shader, so that we can conservatively size the
mesh output buffer to account for whatever task shader expansion happened.
* This supports capture and replay of mesh draws, shader editing with printf
support, overlays, and pixel shader debugging.
* Not yet supported: the mesh viewer and shader debugging.
* The new enum values are placed after compute, to preserve the indices of the
normal vertex pipeline stages.
* Mesh dispatches are considered a new action type, rather than being bundled
into the `Drawcall` type. This will allow API backends to distinguish them as
needed. The UI treats them as drawcalls.
* We apply this universally even though it's not relevant to D3D11/GL. This
means a couple of empty array entries, but it should not cause any significant
issues.
* Shader messages will be identified by group and thread as with compute
shaders. For mesh shaders there is an additional subdivision to identify them
by task group, since each task group can submit a grid of mesh groups.
* Add helpers for creating DX op instructions with less boilerplate.
* On encode, strip any unused functions or globals to comply with strict
DXIL validation requirements.
* Create attribute sets on demand to match functions.
* Add some extra helpers for creating constants, blocks, and patching the
runtime chunk.
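The mesh output layout described in the first few points can be modelled on the CPU side. This is a minimal sketch under assumed names: `TaskDispatch`, `MeshLimits`, `ComputeLayout` and `MeshGroupSlot` are hypothetical, as is the order of data inside each slot. Each task group's mesh groups start at an exclusive prefix-sum offset (the per-dispatch offset), and every mesh group gets a fixed worst-case slot holding its counts, indices, per-vertex outputs and per-primitive outputs.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record of one task group's mesh dispatch, as read back after
// running the patched task shader.
struct TaskDispatch
{
  uint32_t x, y, z;    // mesh group grid launched by this task group
};

// Hypothetical worst-case limits declared by the mesh shader.
struct MeshLimits
{
  uint32_t maxVertices;
  uint32_t maxPrimitives;
  uint32_t bytesPerVertex;         // all per-vertex outputs for one vertex
  uint32_t bytesPerPrimitive;      // all per-primitive outputs for one primitive
  uint32_t indicesPerPrimitive;    // 1, 2 or 3 for points/lines/triangles
};

struct MeshOutputLayout
{
  std::vector<uint32_t> taskGroupOffset;    // first mesh group index per task group
  uint32_t totalMeshGroups = 0;
  uint64_t bytesPerMeshGroup = 0;           // worst-case slot size
  uint64_t totalBytes = 0;
};

MeshOutputLayout ComputeLayout(const std::vector<TaskDispatch> &tasks, const MeshLimits &lim)
{
  MeshOutputLayout layout;
  // Exclusive prefix sum: task group i's mesh groups follow all earlier ones.
  for(const TaskDispatch &t : tasks)
  {
    layout.taskGroupOffset.push_back(layout.totalMeshGroups);
    layout.totalMeshGroups += t.x * t.y * t.z;
  }
  // Assumed slot layout: vertex count, primitive count, then indices,
  // per-vertex outputs and per-primitive outputs, each sized for the worst case.
  layout.bytesPerMeshGroup = 2 * sizeof(uint32_t) +
                             uint64_t(lim.maxPrimitives) * lim.indicesPerPrimitive * sizeof(uint32_t) +
                             uint64_t(lim.maxVertices) * lim.bytesPerVertex +
                             uint64_t(lim.maxPrimitives) * lim.bytesPerPrimitive;
  layout.totalBytes = layout.bytesPerMeshGroup * layout.totalMeshGroups;
  return layout;
}

// Byte offset of a mesh group's slot, from the task group that launched it and
// its linear index within that task group's mesh grid.
uint64_t MeshGroupSlot(const MeshOutputLayout &layout, uint32_t taskGroup, uint32_t localMeshGroup)
{
  assert(taskGroup < layout.taskGroupOffset.size());
  return (uint64_t(layout.taskGroupOffset[taskGroup]) + localMeshGroup) * layout.bytesPerMeshGroup;
}
```

Because every slot is sized for `maxVertices` and `maxPrimitives`, a mesh group that emits less leaves the tail of its slot unused, which is where the sparseness comes from.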
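A sketch of why appending the task and mesh stages after compute keeps existing indices stable. The `ShaderStage` enum and `PipelineShaders` struct here are hypothetical stand-ins, not the actual definitions. Values up to `Compute` keep their meaning, and an API without these stages (such as D3D11 or GL) just leaves the two extra array entries empty.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stage enum: Task and Mesh are appended after Compute so that the
// existing values, and anything indexed by them, are unchanged.
enum class ShaderStage : uint32_t
{
  Vertex = 0,
  Hull,
  Domain,
  Geometry,
  Pixel,
  Compute,
  Task,    // new
  Mesh,    // new
  Count,
};

constexpr std::size_t NumStages = std::size_t(ShaderStage::Count);

// Hypothetical per-stage state. Backends without task/mesh stages never write
// those entries, so they simply stay empty.
struct PipelineShaders
{
  std::array<std::string, NumStages> shaderNames;

  std::string &operator[](ShaderStage s) { return shaderNames[std::size_t(s)]; }
};
```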
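Identifying a shader message from a mesh shader needs one more coordinate than compute. This minimal sketch uses a hypothetical `MeshMessageLocation` type; its field names and the x-fastest flattening in `Linearise` are assumptions. Two messages with the same mesh group and thread but different task groups compare as distinct, which compute-style (group, thread) keys alone could not express.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <tuple>

using UVec3 = std::array<uint32_t, 3>;

// Hypothetical message location: compute-style (group, thread) plus the task
// group that launched the mesh group grid.
struct MeshMessageLocation
{
  UVec3 taskGroup;    // task group that submitted the grid of mesh groups
  UVec3 meshGroup;    // mesh group within that task group's grid
  UVec3 thread;       // thread within the mesh group

  bool operator<(const MeshMessageLocation &o) const
  {
    // Lexicographic order: task group, then mesh group, then thread.
    return std::tie(taskGroup, meshGroup, thread) < std::tie(o.taskGroup, o.meshGroup, o.thread);
  }
  bool operator==(const MeshMessageLocation &o) const
  {
    return std::tie(taskGroup, meshGroup, thread) == std::tie(o.taskGroup, o.meshGroup, o.thread);
  }
};

// Flatten a 3D index within a grid of size dim, with x varying fastest.
uint32_t Linearise(const UVec3 &id, const UVec3 &dim)
{
  assert(id[0] < dim[0] && id[1] < dim[1] && id[2] < dim[2]);
  return id[0] + dim[0] * (id[1] + dim[1] * id[2]);
}
```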
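Stripping unused functions and globals on encode amounts to a reachability walk from the entry point. The sketch below uses a deliberately simplified, hypothetical module model (`Module`, `Function`, `StripUnused`). A real DXIL encoder would discover references by walking instructions, and would also need to account for globals referenced from other globals' initializers or from metadata, which this ignores.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified module: each function (including external
// declarations such as dx.op intrinsics) lists the functions it calls and the
// globals it references.
struct Function
{
  std::vector<std::string> callees;
  std::vector<std::string> globals;
};

struct Module
{
  std::map<std::string, Function> functions;
  std::set<std::string> globals;
};

// Remove every function and global not reachable from the entry point.
void StripUnused(Module &m, const std::string &entry)
{
  assert(m.functions.count(entry) == 1);

  std::set<std::string> liveFuncs, liveGlobals;
  std::vector<std::string> worklist = {entry};

  // Worklist walk over the call graph.
  while(!worklist.empty())
  {
    std::string name = worklist.back();
    worklist.pop_back();
    if(!liveFuncs.insert(name).second)
      continue;    // already visited

    auto it = m.functions.find(name);
    if(it == m.functions.end())
      continue;    // unknown callee, nothing further to follow

    for(const std::string &callee : it->second.callees)
      worklist.push_back(callee);
    for(const std::string &g : it->second.globals)
      liveGlobals.insert(g);
  }

  // Erase anything the walk did not reach.
  for(auto it = m.functions.begin(); it != m.functions.end();)
  {
    if(liveFuncs.count(it->first))
      ++it;
    else
      it = m.functions.erase(it);
  }
  for(auto it = m.globals.begin(); it != m.globals.end();)
  {
    if(liveGlobals.count(*it))
      ++it;
    else
      it = m.globals.erase(it);
  }
}
```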