- Split the pixel history copy pixel shader into two shaders, one for
colour copy and one for depth
- Allocate and update descriptor sets on demand
- Add another compute shader for pixel history depth copy
* If we don't do this, the application could create a buffer/memory, use it,
then destroy it and create another one in the same capture. Because the two
never overlap in actual execution, the driver could assign the same opaque
capture address to both, even though they "overlap" in the capture.
* Artificially extending the life of each resource to the end of the capture
ensures the driver gives them non-overlapping device addresses.
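
One possible shape for this lifetime extension, as a minimal sketch only: the
class and method names below are hypothetical and are not taken from the
actual implementation. Destroy requests made while a capture is active are
queued, and the queue is handed back when the capture ends.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a resource handle; not a real API type.
using ResourceId = uint64_t;

// Sketch: defer destruction of resources until the end of the capture so
// that their lifetimes (and hence their addresses) cannot be reused inside it.
class DeferredDestroyList
{
public:
  void SetCapturing(bool capturing) { m_Capturing = capturing; }

  // Returns true if the caller may destroy the resource immediately,
  // false if the destroy was queued until the capture ends.
  bool RequestDestroy(ResourceId id)
  {
    if(!m_Capturing)
      return true;

    m_Pending.push_back(id);
    return false;
  }

  // Ends the capture and returns every resource whose destruction was held
  // back, so the caller can destroy them now.
  std::vector<ResourceId> EndCapture()
  {
    m_Capturing = false;
    std::vector<ResourceId> ret;
    ret.swap(m_Pending);
    return ret;
  }

private:
  bool m_Capturing = false;
  std::vector<ResourceId> m_Pending;
};
```

Outside a capture nothing changes: `RequestDestroy` returns true and the
resource is destroyed as usual.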
Add support for requesting pixel history for depth/stencil images.
Also, adjust which index is used to patch primitive ID and fixed
fragment color shaders. Previously it used the index of the target
image in the framebuffer attachments, but it should be the index of the
corresponding color attachment.
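
The difference between the two indices can be sketched as follows. This is an
illustration under stated assumptions, not the actual patching code: the
function name, the sentinel, and the flat vector of attachment references are
all hypothetical. Entry i of the vector holds the framebuffer attachment index
that color attachment i of the subpass refers to.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sentinel for "this image is not a color attachment here".
constexpr uint32_t kNoColorAttachment = ~0U;

// Translate a framebuffer attachment index into the index of the color
// attachment that references it, which is the index a shader output uses.
uint32_t ColorAttachmentIndex(const std::vector<uint32_t> &colorAttachmentRefs,
                              uint32_t framebufferIndex)
{
  for(uint32_t i = 0; i < colorAttachmentRefs.size(); i++)
  {
    if(colorAttachmentRefs[i] == framebufferIndex)
      return i;
  }

  return kNoColorAttachment;
}
```

For example, if the subpass color attachments reference framebuffer
attachments 2, 0 and 3 in that order, the image at framebuffer index 3 is
color attachment 2, so the patched shader has to write to output 2.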
Add support for other depth/stencil formats (other than D32_SFLOAT).
Remember the depth/stencil attachment format to correctly update the
values.
* We preserve each API's interpretation of bit order for packed formats like
RGBA4 or R5G6B5 when displaying the raw data in the UI, but when we need to
proxy it or save to disk, we always transform to D3D's order as standard.
* This lets us proxy them reliably, because there is always one standard bit
order: APIs that need a different order transform to the standard format when
fetching data, and from the standard format when setting proxy data.
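
As a rough illustration of what such a transform looks like for a 16-bit 5:6:5
format, the sketch below swaps the two 5-bit channels and leaves the 6-bit
channel in place. The function name is hypothetical, and which API needs the
swap for which format is deliberately not asserted here.

```cpp
#include <cstdint>

// Swap the two outer 5-bit channels of a 16-bit 5:6:5 packed texel.
// Applying the function twice returns the original value, so the same
// routine can convert to and from the standard order.
uint16_t SwapOuterChannels565(uint16_t texel)
{
  const uint16_t low = texel & 0x001F;           // bits 0-4
  const uint16_t mid = texel & 0x07E0;           // bits 5-10, unchanged
  const uint16_t high = (texel >> 11) & 0x001F;  // bits 11-15

  return uint16_t((low << 11) | mid | high);
}
```

Because the swap is its own inverse, one helper covers both directions:
fetching data into the standard order and setting proxy data from it.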
- For counting the number of fragments, we also need to disable the depth
bounds test, since we do not currently initialize the depth value
- Reset depth to 0.0f for shader out, and set the depth test to always
pass, so that we can get depth values from just one fragment.
- Initialize premod value for individual fragment events. This is not
surfaced in the UI, but is available from the Python API and checked in
tests.
The rules for merging semantics into an array were not strict enough.
If any of the type, interpolation mode, or size differs, or if a semantic
is not using a register's x component, then we can't combine them. Also,
the rules for marking a semantic as array length 1 were too strict,
resulting in some semantics packing into other registers incorrectly.
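
The stricter merge rule can be sketched as a predicate. The types and field
names below are hypothetical simplifications, and "using a register's x
component" is interpreted here as the element starting at component x; the
real signature data carries more information than this.

```cpp
#include <cstdint>

// Hypothetical, simplified description of one signature element.
enum class VarKind { Float, SInt, UInt };
enum class Interpolation { Linear, Flat, NoPerspective };

struct SigElement
{
  VarKind type;
  Interpolation interp;
  uint32_t compCount;  // number of components used (the "size")
  uint32_t firstComp;  // first register component used, 0 == x
};

// Two elements may only be merged into one array if type, interpolation
// mode and size all match, and both start at their register's x component.
bool CanMergeIntoArray(const SigElement &a, const SigElement &b)
{
  if(a.type != b.type)
    return false;
  if(a.interp != b.interp)
    return false;
  if(a.compCount != b.compCount)
    return false;
  if(a.firstComp != 0 || b.firstComp != 0)
    return false;

  return true;
}
```

Any single mismatch rejects the merge, which is the sense in which the rule
is stricter than before.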
* This also fixes a case where pipelines with dynamic stencil masks wouldn't
have the masks properly set for stencil counting and we wouldn't get shader
output properly.
* We now use VarType instead of CompType for signature parameters, which
allows us to represent more variable types than just unsigned/signed integer
and float.
Use a separate compute shader module for MSAA copy, and output
directly into the destination buffer instead of creating
staging resources.
Support the case where there is no depth/stencil attachment when getting
post mod values in per fragment reporting. Previously we used the original
framebuffer, which might not have had a depth/stencil view, so we couldn't
count the fragments. Now we use the sub image.
To get the post mod color, we need to blend with the premod color, so
we use vkCmdCopyImage to copy from the original image.