* In heavy D3D11 workloads the refcounting overhead was significant, especially
during rapid binding changes. Refactoring the refcounting to use a different
model and deferring destruction of objects removes most of that overhead.
* The cost of searching a few more pools to check allocations isn't so bad,
especially if we can move IsAlloc() off the hot path. Better that than
allocating 100MB in pools.
* By default CMake prints huge scary warnings if the packages aren't available,
instead of silently continuing and letting us check. That is a bad default for
optional packages.
* We only enable this if we find an existing struct somewhere enabling the base
feature. Otherwise we might try to enable this when it's not supported at all.
* We used to get this 'for free' by serialising our patched device next chains,
but we no longer do that so we need to enable this bit on replay as well.
* If we insert chunks next to the recorded commands for indirect draws, we need
to update the chunk indices in all other commands that are recorded but not yet
submitted. That update could be very complex if the command buffer record is
only partially complete when the submit happens (quite possible if it isn't
submitted until later). So instead we abandon trying to keep indirect chunks
next to the recorded command chunks, since it's not strictly necessary.
* EXT_transform_feedback allows us to use what we want and is core in the
minimum replay version (3.0 for GL/GLES), but it doesn't include separate xfb
objects.
* In D3D12, if the user passes NULL for the UAV or SRV descriptor when calling
Create*View, we don't have the runtime to generate a default one for us when we
query; we'll just have nothing stored. So instead we need to generate a default
"whole resource" descriptor to look up.
* QNetworkAccessManager is supposed to be asynchronous and threaded internally,
but calling get() the first time can take multiple *seconds* while it
initialises proxy data and loads SSL libraries.
* Qt's threading rules are so strict that it isn't feasible to move
QNetworkAccessManager to another thread.
* Instead we use Qt's cross-thread signals and slots to move the whole thing
into a wrapper object. It's stupid.
* GenerateGLSLShader is reasonably expensive because it uses glslang to
preprocess the shaders. If we hash the input and cache the result under that
hash, a cache hit lets us skip it entirely.
* If we default to D3D11 at construction time and we have persist data (very
likely) that is for another API, we'll have to destroy the D3D11 viewer and
recreate the other API's viewer.
* It seems that on NVIDIA on Windows we need to explicitly rebind the main
context to the main thread and give each worker thread its own window, to
prevent the worker thread from sometimes being unable to bind its context.
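
The GenerateGLSLShader note above describes skipping expensive generation by looking up a hash of the input. Below is a minimal sketch of that idea only, not the actual implementation: `ShaderSourceCache`, its `Generator` callback and the choice of FNV-1a as the hash are all hypothetical names and assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical cache: maps a hash of the generator input to the generated
// output, so the expensive generator only runs on a cache miss.
class ShaderSourceCache
{
public:
  // Stand-in for the expensive step (e.g. preprocessing a shader).
  using Generator = std::function<std::string(const std::string &)>;

  explicit ShaderSourceCache(Generator gen) : m_Generate(std::move(gen))
  {
    assert(m_Generate && "a generator callback is required");
  }

  // Returns the cached result for this input, generating it on first use.
  const std::string &Get(const std::string &input)
  {
    const uint64_t key = Hash(input);

    auto it = m_Cache.find(key);
    if(it != m_Cache.end())
      return it->second;    // hit: the generator is skipped entirely

    // miss: run the generator once and remember the result
    return m_Cache.emplace(key, m_Generate(input)).first->second;
  }

private:
  // 64-bit FNV-1a, chosen here only because it is short. A hash collision
  // would return the wrong entry, so a real cache may want a stronger hash
  // or to store the input alongside the result and compare it.
  static uint64_t Hash(const std::string &s)
  {
    uint64_t h = 14695981039346656037ULL;
    for(unsigned char c : s)
    {
      h ^= c;
      h *= 1099511628211ULL;
    }
    return h;
  }

  Generator m_Generate;
  std::unordered_map<uint64_t, std::string> m_Cache;
};
```

The lookup costs one pass over the input to hash it, which only pays off when that is much cheaper than the generator, as the note says is the case for glslang preprocessing.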
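
The first note mentions deferring destruction of objects to cut refcounting overhead. The sketch below illustrates only the deferral half under simple assumptions: `Resource`, `DeferredDeleter`, `Release()` and `Flush()` are hypothetical names, and the actual refcounting model referred to in the note is not shown here.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical refcounted base type; objects start with one reference.
struct Resource
{
  virtual ~Resource() = default;
  std::atomic<int> refs{1};
};

// Hypothetical helper: when the last reference goes away the object is
// queued rather than deleted, so the hot path never runs a destructor.
class DeferredDeleter
{
public:
  // Drop one reference. On the final release the object is only queued.
  void Release(Resource *r)
  {
    const int prev = r->refs.fetch_sub(1, std::memory_order_acq_rel);
    assert(prev > 0 && "released more times than referenced");
    if(prev == 1)
    {
      std::lock_guard<std::mutex> lock(m_Lock);
      m_Dead.emplace_back(r);
    }
  }

  // Destroy everything queued so far. Intended to be called at a quiet
  // point (for example once per frame). Returns how many were destroyed.
  size_t Flush()
  {
    std::vector<std::unique_ptr<Resource>> dead;
    {
      std::lock_guard<std::mutex> lock(m_Lock);
      dead.swap(m_Dead);
    }
    const size_t count = dead.size();
    dead.clear();    // destructors run here, outside the lock
    return count;
  }

  // Number of objects waiting to be destroyed.
  size_t Pending()
  {
    std::lock_guard<std::mutex> lock(m_Lock);
    return m_Dead.size();
  }

private:
  std::mutex m_Lock;
  std::vector<std::unique_ptr<Resource>> m_Dead;
};
```

Swapping the queue out under the lock and destroying it afterwards keeps the lock hold time short, so a `Release()` on another thread is not blocked behind slow destructors.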