* If an application allocates from and resets descriptor pools at very high
frequency, the overhead of freeing and reallocating those descriptor sets can
be high. Instead, use the descriptor pool as a pool for its children and check
the freelist for an existing descriptor set before trying to allocate a new
one.
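
A minimal sketch of the freelist idea, assuming a hypothetical `RecyclingDescriptorPool` class and a plain integer `DescSetHandle` standing in for the real descriptor set object. None of these names come from the actual code, and a real fallback path would call the API's allocation function rather than hand out a counter.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a descriptor set handle; not a real API type.
using DescSetHandle = std::uint64_t;

// Sketch of a pool that recycles its child descriptor sets through a freelist
// instead of freeing and reallocating them on every reset.
class RecyclingDescriptorPool
{
public:
  // Returns a recycled set when one is available, otherwise creates a new one.
  // Creating a new one is simulated here by incrementing a counter.
  DescSetHandle Acquire()
  {
    if(!m_FreeList.empty())
    {
      DescSetHandle set = m_FreeList.back();
      m_FreeList.pop_back();
      return set;
    }

    m_Allocations++;
    return m_NextHandle++;
  }

  // Puts a set back on the freelist rather than destroying it.
  void Release(DescSetHandle set) { m_FreeList.push_back(set); }

  // Number of times the slow allocation path was taken.
  std::uint32_t AllocationCount() const { return m_Allocations; }

private:
  std::vector<DescSetHandle> m_FreeList;
  DescSetHandle m_NextHandle = 1;
  std::uint32_t m_Allocations = 0;
};
```

With this shape, an acquire/release/acquire sequence on one set takes the slow path only once, which is the saving the note above is after.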
* This is still accurate; what we're missing is "read data as int, then cast to
float" which is represented by setting 'floatCast' to true. A normalized cast
or interpret is accurately represented by saying the input is snorm/unorm
typed.
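
To illustrate the distinction, here is a small sketch for 8-bit data. The function names are hypothetical; only the `floatCast` flag is from the note above. The unorm and snorm mappings follow the usual graphics API conventions (divide by 255, or divide by 127 with -128 clamped to -1.0).

```cpp
#include <cstdint>

// Plain cast, i.e. the 'floatCast' case: read the stored integer, then
// convert its numeric value to float. 255 becomes 255.0f.
inline float CastToFloat(std::uint8_t v)
{
  return static_cast<float>(v);
}

// Unorm-typed input: the integer range [0, 255] maps onto [0.0, 1.0].
inline float UnormToFloat(std::uint8_t v)
{
  return static_cast<float>(v) / 255.0f;
}

// Snorm-typed input: [-127, 127] maps onto [-1.0, 1.0], and -128 clamps to -1.0.
inline float SnormToFloat(std::int8_t v)
{
  float f = static_cast<float>(v) / 127.0f;
  return f < -1.0f ? -1.0f : f;
}
```

The same stored byte therefore produces very different floats depending on which interpretation applies, which is why the two cases need to be represented separately.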
* When we roll over from one binding to another due to descriptor count being
larger than a single binding, we need to update the frame reftype since it
might go from storage to sampled or vice versa, and so change from read-only to
read-write.
* While actively capturing we might do significant work to flush coherent mapped
memory regions and prepare initial contents for postponed resources that are
about to be write-referenced. We need to do that before submitting the actual
work to the queue or else the contents may be corrupted.
* We track memory bindings to see which regions of a memory object are only used
for tiled images, and discard any writes in case this was accidental detection
of changes by the GPU which we don't want to replay. In the case of aliasing,
if there are both linear and tiled resources then we still replay the writes.
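
The decision can be sketched roughly as follows. `MemBinding`, `BindType` and `ShouldDiscardWrite` are hypothetical names for illustration, and the sketch makes two simplifying assumptions of its own: bindings are kept as a flat list, and a write that touches no tracked binding is kept rather than discarded.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical classification of what is bound to a range of a memory object.
enum class BindType
{
  Linear,
  Tiled,
};

// Hypothetical record of one resource binding within a memory object.
struct MemBinding
{
  std::uint64_t offset;
  std::uint64_t size;
  BindType type;
};

// Returns true if a detected write to [offset, offset + size) overlaps only
// tiled bindings, in which case it can be discarded. Any overlap with a linear
// binding, including one aliased with a tiled binding, keeps the write.
bool ShouldDiscardWrite(const std::vector<MemBinding> &bindings, std::uint64_t offset,
                        std::uint64_t size)
{
  bool touchesTiled = false;

  for(const MemBinding &b : bindings)
  {
    // half-open interval overlap test
    bool overlaps = offset < b.offset + b.size && b.offset < offset + size;
    if(!overlaps)
      continue;

    if(b.type == BindType::Linear)
      return false;

    touchesTiled = true;
  }

  return touchesTiled;
}
```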
* Note that we have to take a slower path involving a copy since we can't
serialise straight into memory in this case, so applications should avoid
mapping memory behind
* When we changed to serialise render target descriptor contents at list record
time we also updated all descriptor writes to happen immediately so we'd get
the latest contents. However we didn't also update copies, so copies before
OMSetRenderTargets weren't properly reflected.
* There's nothing that needs the 'old' copy of descriptors so we can remove any
pending/deferring of updates and do it immediately, which also saves some
tracking.
* The function is illegal to call regardless of whether we get a non-NULL
function pointer. Core GLES doesn't support glBindFragDataLocation but
fortunately we don't need to call it ourselves unless the user has done some
dynamic binding - which assumes glBindFragDataLocation is available.
* Resources which aren't referenced in the frame don't need initial states
unless we have 'Ref All Resources' enabled. These initial states can be
stripped on replay as they aren't needed.
* We also renamed the WrittenRecords to state more explicitly that this is the
list of resources needing initial contents, whether because they were dirty
(and so had initial contents) or because they were written mid-frame and so
need to be reset.
* Instead of waiting for idle, we allocate a command buffer per swapchain image
to render the text overlay and use semaphores and fences to properly
synchronise with other ongoing GPU work.
* On discrete GPUs that expose PCI-E window memory (device local but still host
visible) this memory is extremely slow to read from on the CPU. It's
significantly faster to issue a command buffer to get the GPU to copy into CPU
memory and wait on that command buffer to finish, then read from the copy.
* We do this for detected coherent writes in queue submit, issuing the copy on
the queue being submitted to. We do *not* do this for memory unmaps or
explicit application flushes. This does mean those will remain slow. However,
with no queue to use the synchronisation challenges become more significant,
and most applications leave memory persistently mapped.
* We only care about tracking two things:
1. Resources that have been written very recently. These should not be
postponed as there's a high chance they'll be written mid-frame and so we'd
need their initial contents.
2. Resources whose last non-complete-write reference was a while ago.
However, in the second case we can acceptably ignore any resources that haven't
been written recently either, since if the resource hasn't been written and
also hasn't been complete-written then it hasn't been used at all.
* So when updating the non-complete-write time we only do this if the resource
has had a write reference, and intermittently we remove any resources that
haven't had a write at all.
* Postponed resources will be exactly the same set: we treat a resource as
postponable if we have no write time for it at all, so it's fine to remove
old resources from the list. Fewer resources will be skipped, as we now treat
resources that have no known age as non-skippable. However in the majority of
these cases we expect either for the resource to not be used at all (thus the
postpone will never be forced to prepare and we won't serialise anything), or
else if it is used the chances are high it will be used read-only so the
postpone will still be enough.
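
The tracking described in the preceding items could look something like the sketch below. Everything here is an illustrative assumption rather than the actual implementation: the `ResourceAgeTracker` class, the integer timestamps, the two window thresholds, and the reading of "remove any resources that haven't had a write" as "remove entries whose last write is older than the recent window".

```cpp
#include <cstdint>
#include <unordered_map>

using ResourceId = std::uint64_t;    // hypothetical id type
using Timestamp = std::uint64_t;     // arbitrary monotonic units (assumption)

class ResourceAgeTracker
{
public:
  ResourceAgeTracker(Timestamp recentWindow, Timestamp staleWindow)
      : m_RecentWindow(recentWindow), m_StaleWindow(staleWindow)
  {
  }

  // A write reference creates or refreshes the entry.
  void MarkWrite(ResourceId id, Timestamp now) { m_Entries[id].lastWrite = now; }

  // Only recorded if the resource already has a write time.
  void MarkNonCompleteWriteRef(ResourceId id, Timestamp now)
  {
    auto it = m_Entries.find(id);
    if(it == m_Entries.end())
      return;
    it->second.lastRef = now;
    it->second.hasRef = true;
  }

  // Postponable if there is no write time at all, or the write is not recent.
  bool CanPostpone(ResourceId id, Timestamp now) const
  {
    auto it = m_Entries.find(id);
    return it == m_Entries.end() || now - it->second.lastWrite > m_RecentWindow;
  }

  // Skippable only with a known, old non-complete-write reference.
  // Resources with no known age are treated as non-skippable.
  bool CanSkip(ResourceId id, Timestamp now) const
  {
    auto it = m_Entries.find(id);
    if(it == m_Entries.end() || !it->second.hasRef)
      return false;
    return now - it->second.lastRef > m_StaleWindow;
  }

  // Called intermittently to drop entries without a recent write.
  void Prune(Timestamp now)
  {
    for(auto it = m_Entries.begin(); it != m_Entries.end();)
    {
      if(now - it->second.lastWrite > m_RecentWindow)
        it = m_Entries.erase(it);
      else
        ++it;
    }
  }

  std::size_t Size() const { return m_Entries.size(); }

private:
  struct Entry
  {
    Timestamp lastWrite = 0;
    Timestamp lastRef = 0;
    bool hasRef = false;
  };

  std::unordered_map<ResourceId, Entry> m_Entries;
  Timestamp m_RecentWindow;
  Timestamp m_StaleWindow;
};
```

In this sketch pruning never changes the answer of `CanPostpone`, since a pruned entry was already postponable, but it can turn a previously skippable resource into a non-skippable one, matching the trade-off described above.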
* This means we don't have to iterate the whole bindrefs array every time we
want to propagate references in the background; instead we can submit them as
a batch.
* Almost all dirty-able resources (memory and images) become dirty almost
immediately, so spending time tracking dirty state is wasted. Instead we treat
these resources as dirty at creation and rely on the postponing logic to avoid
preparing initial states for newly created resources that are not used in the
frame.
* This may cause more 'last-minute' postponed prepares for newly created
resources, which would previously.