Subgroup_Zoo : unit tests, non-trivial convergence tests moved to Workgroup_Zoo
Workgroup_Zoo : convergence tests, small number of unit tests (not full coverage)
Added checks for workgroup convergence in Workgroup_Zoo tests
* Vulkan uses barrier()
* D3D12 uses AllMemoryBarrierWithGroupSync()
* dispatches workgroup of 2x1x1
* test debug results for workgroup 1,0,0
* It is impossible to emit a true 16-bit type on fxc. We round the minXX types
up internally to a 32-bit type, since that's how they are defined to appear in
external resources like cbuffers and SRV/UAVs.
* The new 16-bit type enums that are shared between fxc/dxc structs are not
actually ever emitted by fxc for RDEF types.
* Maps are recorded as open whenever we intercept them, usually only falling off
for high traffic resources or direct maps like WRITE_NO_OVERWRITE.
* Unmaps can be successful any time as long as they're intercepted as reads
(no-op) or write discard (since we just need to intercept these).
* Unmaps from other write types require a map during an active capture to ensure
we properly set up shadow pointers.
* This follows PIX's algorithm in most places, which, although undocumented, is
something many people expect to work. It deviates only when there would be
significant performance penalties for little gain.
* This is apparently in a format capability class to R8 it seems, and since we
don't expect anyone to be rendering to A8 let alone in MSAA, there's no point
in testing this.
With GLES, a precision specifier is mandatory for float types.
Specifying one in the user shader is not enough because it comes too
late, after the uvec2 and uvec4 uses in the custom prefix.
Check the renderdoc log for lines matching "Assertion" or "Error"
Uses the new helper function in testcase.py:
def check_renderdoc_log(self, asserts: bool = True, errors: bool = True):
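
A minimal standalone sketch of what such a log check could look like, assuming the log is plain text with one message per line. The function name `find_log_problems` and the sample log lines are hypothetical; the real helper is a method in testcase.py and its body is not shown in this message.

```python
from typing import List


def find_log_problems(log_text: str, asserts: bool = True, errors: bool = True) -> List[str]:
    """Return the log lines that contain "Assertion" and/or "Error".

    Hypothetical stand-in for the helper described above; the flags mirror
    the 'asserts' and 'errors' parameters of check_renderdoc_log.
    """
    needles = []
    if asserts:
        needles.append("Assertion")
    if errors:
        needles.append("Error")
    # Keep every line that contains at least one enabled pattern.
    return [line for line in log_text.splitlines()
            if any(needle in line for needle in needles)]


# Made-up log contents, for illustration only.
sample_log = "\n".join([
    "Log     - Opening capture",
    "Error   - Failed to create resource",
    "Warning - Assertion failed: 'ptr != NULL'",
])

problems = find_log_problems(sample_log)
```

Passing `asserts=False` or `errors=False` narrows the check to one kind of line, which is how a test that expects a known error could still check for assertions.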
Set the thread property SubgroupId for the extra lanes outside of the subgroup
Pass the workgroup laneIndex to BeginDebug
Set the thread property GroupThreadIndex, GroupFlatIndex for all workgroup lanes
Use SV_GroupThreadID to fill in threadid in the compute fetcher instead of SV_DispatchThreadID
Keep SV_DispatchThreadID to identify the candidate thread
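
As a reminder of how the two semantics relate, the sketch below applies the standard HLSL definitions: SV_DispatchThreadID is SV_GroupID * numthreads + SV_GroupThreadID, and the flattened group index is z*X*Y + y*X + x. The function names and the numthreads value of (8, 4, 1) are assumptions for illustration, not taken from the debugger code.

```python
from typing import Tuple

Vec3 = Tuple[int, int, int]


def dispatch_thread_id(group_id: Vec3, group_thread_id: Vec3, numthreads: Vec3) -> Vec3:
    """SV_DispatchThreadID = SV_GroupID * numthreads + SV_GroupThreadID, per component."""
    x, y, z = (g * n + t for g, n, t in zip(group_id, numthreads, group_thread_id))
    return (x, y, z)


def group_flat_index(group_thread_id: Vec3, numthreads: Vec3) -> int:
    """Flattened index of a thread inside its group (same formula as SV_GroupIndex)."""
    x, y, z = group_thread_id
    nx, ny, _ = numthreads
    return z * nx * ny + y * nx + x


numthreads = (8, 4, 1)   # assumed [numthreads(8, 4, 1)]
group_id = (1, 0, 0)     # second workgroup of a 2x1x1 dispatch
thread_in_group = (3, 2, 0)

candidate = dispatch_thread_id(group_id, thread_in_group, numthreads)
flat = group_flat_index(thread_in_group, numthreads)
```

Here the group-relative ID (3, 2, 0) is the same for the matching lane in every workgroup, while the dispatch-wide ID (11, 2, 0) is unique across the dispatch, which is why the latter is suited to identifying the candidate thread.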
Tests specifically aimed at workgroup debugging, i.e. GSM and non-aligned subgroups
Not focused on unit tests of subgroup/quad instructions; those are handled by *_Subgroup_Zoo
* The problem here is that, due to design flaws in the extension, when ASs are in
use we don't know whether a memory allocation will need BDA or not, and the
application doesn't have to set any flag - unlike for normal buffer BDA. So we
promote (almost) all memory allocations to BDA when using ASs even if they're
not needed.
* This normally works fine, except that if during self-capture the replay
process allocates some normal memory before all application replayed
allocations have been made, the self-capture will promote it to BDA and
request a replayable address that might clash with an address the application
used later and that would still be needed.
* To solve this, we ensure that during capture we don't create wrapped
allocations more than necessary - to avoid causing clashes - as well as
ensuring that on replay we only create new allocations after all replayed
allocations.
* We also take advantage of dedicated allocations for fake swapchain images,
since dedicated image allocations will not be promoted to BDA.
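
The ordering constraint can be illustrated with a deliberately simplified toy model. This is not RenderDoc or Vulkan code: the `ToyAddressSpace` class, the slot numbers, and the "lowest free slot" policy are all assumptions made up for the example. It only shows why an internal allocation made before all replayed application allocations can take an address that a later replayed allocation needs.

```python
from typing import List, Optional


class ToyAddressSpace:
    """Toy address space made of fixed-size slots (hypothetical model)."""

    def __init__(self, slots: int) -> None:
        self.owner: List[Optional[str]] = [None] * slots

    def alloc_any(self, name: str) -> int:
        # Internal allocation: the toy "driver" hands out the lowest free slot.
        for slot, current in enumerate(self.owner):
            if current is None:
                self.owner[slot] = name
                return slot
        raise MemoryError("toy address space is full")

    def alloc_at(self, name: str, slot: int) -> bool:
        # Replayed allocation: must land on the captured address or it fails.
        if self.owner[slot] is not None:
            return False
        self.owner[slot] = name
        return True


def replay(internal_before_app_done: bool) -> bool:
    """Return True if every replayed allocation got its captured slot."""
    space = ToyAddressSpace(4)
    captured = [("app0", 0), ("app1", 1)]  # made-up captured addresses
    ok = space.alloc_at(*captured[0])
    if internal_before_app_done:
        space.alloc_any("internal")        # takes slot 1, which app1 needs
    ok = space.alloc_at(*captured[1]) and ok
    if not internal_before_app_done:
        space.alloc_any("internal")        # safe: all captured slots are taken
    return ok


deferred_ok = replay(internal_before_app_done=False)
early_ok = replay(internal_before_app_done=True)
```

In this model, deferring the internal allocation until all captured allocations are placed avoids the clash, which mirrors the "only create new allocations after all replayed allocations" rule described above.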