* In heavy D3D11 workloads the refcounting overhead, especially during fast
binding changes, was significant. Refactoring the refcounting to work on a
different model and deferring destruction of objects removes most of that
overhead.
* The cost of searching a few more pools to check allocations isn't so bad,
especially if we can move IsAlloc() off the hot path. Better that than
allocating 100MB in pools.
* We only enable this if we find an existing struct somewhere enabling the base
feature. Otherwise we might try to enable this when it's not supported at all.
* We used to get this 'for free' by serialising our patched device next chains,
but we no longer do that, so we need to enable this bit on replay as well.
* If we insert chunks next to the recorded commands for indirect draws, we need
to update all other commands which are recorded but not yet submitted and
which hold chunk indices. Updating these could be very complex if the command
buffer record is only partially complete when the submit happens (which is
quite possible if it's not submitted until later), so instead we abandon
trying to keep indirect chunks next to the recorded command chunks, since it's
not strictly necessary.
* EXT_transform_feedback allows us to use what we want and is core in the
minimum replay version (3.0 for GL/GLES), but it doesn't include separate xfb
objects.
* In D3D12, if the user passes NULL for the UAV or SRV descriptor when calling
Create*View, we don't have the runtime to generate a default one for us when
we query; we'll just have nothing stored. So instead we need to generate a
default "whole resource" descriptor to look up.
* GenerateGLSLShader is reasonably expensive because it uses glslang to
preprocess the shaders. If we can cache the input hash and look up the cache
with that hash then we can skip it entirely.
* The spec says we only return a function pointer for device or device-child
functions. In practice the loader wraps instances and physical devices, so
when a function returned directly by GDPA is called the loader won't unwrap
them, and we won't get our proper wrapped objects and will crash.
* If we keep the page set in the command buffer and destroy it on reset, we'll
free the pages behind the baked chunks that we stored. If a command buffer is
recorded and reset multiple times within a capture, we need to store multiple
baked command buffers.
* So instead we give the baked commands ownership of those pages and reset them
when the baked commands record is destroyed (either when the command buffer
is reset, as before, or, if we hold onto a reference during capture, after
the capture is done).
* We tune the pipeline state view and texture viewer to only iterate over a
small list of dynamically used binds in the (vastly more common) case where
unused binds are not being shown.
* GL needs special handling because cubemaps need to be treated partially as
arrays to select the target, then not as arrays when the data is retrieved.
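
The GenerateGLSLShader note above comes down to memoising an expensive step
on a hash of its input. A minimal sketch of that idea follows. The class name
ShaderSourceCache, the FNV-1a hash and the generator callback are all
assumptions made for illustration, not the actual implementation.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical cache: maps a hash of the generation input to the generated
// source, so the expensive generator only runs on a miss.
class ShaderSourceCache
{
public:
  using Generator = std::function<std::string(const std::string &)>;

  explicit ShaderSourceCache(Generator gen) : m_Generate(std::move(gen)) {}

  // Returns the cached output for this input, generating it on first use.
  const std::string &Get(const std::string &input)
  {
    const uint64_t key = HashInput(input);

    auto it = m_Cache.find(key);
    if(it != m_Cache.end())
    {
      m_Hits++;
      return it->second;
    }

    m_Misses++;
    // unordered_map keeps references to elements valid across rehashes
    return m_Cache.emplace(key, m_Generate(input)).first->second;
  }

  size_t Hits() const { return m_Hits; }
  size_t Misses() const { return m_Misses; }

private:
  // 64-bit FNV-1a. Assumption: collisions are ignored in this sketch, a real
  // cache would want to guard against them.
  static uint64_t HashInput(const std::string &s)
  {
    uint64_t h = 14695981039346656037ULL;
    for(unsigned char c : s)
    {
      h ^= c;
      h *= 1099511628211ULL;
    }
    return h;
  }

  Generator m_Generate;
  std::unordered_map<uint64_t, std::string> m_Cache;
  size_t m_Hits = 0;
  size_t m_Misses = 0;
};
```

Hashing the input is far cheaper than preprocessing it, so a hit skips the
expensive path entirely and a miss only pays for one extra hash.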
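
The two notes above about page sets and baked commands describe an ownership
transfer. As a rough sketch, assuming hypothetical Page, BakedCommands and
CommandBufferRecord types that only loosely mirror the real records, it could
look like this:

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical page of chunk storage.
struct Page
{
  explicit Page(size_t n) : bytes(n) {}
  std::vector<unsigned char> bytes;
};

using PageSet = std::vector<std::unique_ptr<Page>>;

// The baked record owns the pages backing its chunks. They are freed only
// when the last reference to the baked record goes away.
struct BakedCommands
{
  PageSet pages;
};

class CommandBufferRecord
{
public:
  // Recording allocates pages into the command buffer's current page set.
  void Record(size_t chunkBytes) { m_Pages.push_back(std::make_unique<Page>(chunkBytes)); }

  // Baking moves the page set into the baked record instead of keeping it here.
  std::shared_ptr<BakedCommands> Bake()
  {
    auto baked = std::make_shared<BakedCommands>();
    baked->pages = std::move(m_Pages);
    m_Pages.clear();    // leave the moved-from set in a known empty state
    m_Baked = baked;
    return baked;
  }

  // Reset drops our own reference. Pages survive if a capture still holds one.
  void Reset()
  {
    m_Pages.clear();
    m_Baked.reset();
  }

  size_t PendingPages() const { return m_Pages.size(); }

private:
  PageSet m_Pages;
  std::shared_ptr<BakedCommands> m_Baked;
};
```

With this shape a command buffer can be recorded and reset several times in
one capture, and each baked record kept by the capture still has valid pages
behind it.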