* This is potentially slightly slower, as the initial states aren't in
GPU memory for a faster copy, but we're much less likely to hit OOM
from making far more GPU allocations, and it's still pretty fast.
* Further optimisation is possible either by reducing the number of
images that need initial states at all (detect when an image's first
use is a clear via a renderpass), or by detecting images that are frame
invariant, so that we only keep initial states for immutable contents
and avoid copying them more than once.
* This lets the usage actually have the right descriptor sets to check
against.
* It was lost as a result of a bad fix after copy-pasting removed the
previous read-time tracking.
* Sort of a confused false positive - some pipeline SPIR-V blobs were
leaking, but that was because they weren't being cached when they
should have been, not because saving the shader cache doesn't destroy
its blobs.
* These are some leaks, some mismatched new/deletes and some uninit'd
values. Mostly the leaks are what we care about so that the replay
host can be kept alive for a long time rather than needing to be
constantly restarted.
* Also added a valgrind suppression file to suppress some of the false
positives I ran into while testing.
* No need to base the readback buffer on the image memory requirements,
which could be packed tighter than our requirements for readback (e.g.
combined depth and stencil formats).
* Descriptions for readers, for better error reporting and usage help.
This also allows these descriptions to be multiline, and they're
indented correctly.
* A bit better formatting of options and defaults.
* Print the full list of errors when there is more than one.
* #undef max
* Header as well as footer on single-line command help.
* Remove parse and parse_check variants I don't use.
* Allow processing without looking at argv[0].
* Optionally stop the processing at the first non-command, so that
you can have a program and its arguments without trying to parse
the arguments themselves.
* On 32-bit, dispatchable objects are 32-bit wide since they're pointer
sized, vs 64-bit non-dispatchable objects. Using a dispatchable
pointer to write both types of objects leads to incorrect stepping.
* The spec allows descriptor set updates that write beyond a given
binding's array count to continue on into the next consecutive binding,
provided that it is the same type.
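
The rollover behaviour in the last item can be sketched roughly as below.
This is only an illustration under stated assumptions, not the actual
tracking code: `BindingLayout`, `Slot` and `ResolveWrite` are hypothetical
names, the `type` field stands in for the real descriptor type, and a start
element past the end of its binding is simply rolled forward rather than
rejected.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical, simplified stand-in for one binding in a set layout.
struct BindingLayout
{
  int type;            // stand-in for the descriptor type
  uint32_t arrayCount; // number of array elements in this binding
};

// One resolved destination for a single written descriptor.
struct Slot
{
  uint32_t binding;
  uint32_t element;
};

// Expand a write of `count` descriptors starting at (binding, element)
// into individual slots, continuing into consecutive bindings whenever
// the current binding's array is exhausted.
std::vector<Slot> ResolveWrite(const std::vector<BindingLayout> &layout,
                               uint32_t binding, uint32_t element,
                               uint32_t count)
{
  std::vector<Slot> slots;
  if(count == 0)
    return slots;
  if(binding >= layout.size())
    throw std::out_of_range("write starts past the last binding");

  // every binding touched by the write must match the first one's type
  const int type = layout[binding].type;

  while(count > 0)
  {
    if(binding >= layout.size())
      throw std::out_of_range("write runs past the last binding");
    if(layout[binding].type != type)
      throw std::invalid_argument("rollover into a binding of another type");

    if(element >= layout[binding].arrayCount)
    {
      // this binding is full: carry on at element 0 of the next binding
      binding++;
      element = 0;
      continue;
    }

    slots.push_back({binding, element});
    element++;
    count--;
  }

  return slots;
}
```

With a layout of two same-typed bindings of sizes 2 and 3, a write of 3
descriptors starting at binding 0, element 1 resolves to (0,1), (1,0) and
(1,1). Stepping per slot like this keeps each descriptor attributed to the
binding it actually lands in, rather than assuming the whole write stays
within the starting binding.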