* This follows PIX's algorithm in most places which, although undocumented, is
something many people expect to work. It deviates only when there would be
significant performance penalties for little gain.
* This is apparently in a format capability class to R8, and since we
don't expect anyone to be rendering to A8, let alone in MSAA, there's no point
in testing this.
With GLES, a precision specifier is mandatory for float types.
Specifying one in the user shader is not enough because it comes too late,
after the uvec2 and uvec4 uses in the custom prefix.
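
The ordering issue can be illustrated with a minimal sketch of assembling the final shader source. The function name, its parameters and the choice of `highp` are assumptions made for illustration and are not taken from the actual code:

```python
def assemble_gles_source(version_line: str, custom_prefix: str, user_source: str) -> str:
    """Join shader pieces so a default float precision precedes the custom prefix.

    Hypothetical helper: the real shader assembly may be structured differently.
    """
    # Assumed default; the precision actually emitted is not stated in the note above.
    precision_line = "precision highp float;"

    # The #version directive must stay first, so the precision statement goes
    # directly after it and before anything the prefix declares.
    parts = [
        version_line.strip(),
        precision_line,
        custom_prefix.strip(),
        user_source.strip(),
    ]
    return "\n".join(parts) + "\n"
```

Any precision statement the user shader provides still appears in the output, but only after the prefix, which is why it cannot be relied on.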
Check the renderdoc log for lines matching "Assertion" or "Error"
Uses the new helper function in testcase.py:
def check_renderdoc_log(self, asserts: bool = True, errors: bool = True):
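
Only the signature of the helper is given above, so the following is a standalone sketch of what such a check might do rather than the real method body. The name `find_log_problems` and the use of plain substring matching are assumptions:

```python
from typing import Iterable, List


def find_log_problems(lines: Iterable[str], asserts: bool = True,
                      errors: bool = True) -> List[str]:
    """Return log lines containing "Assertion" and/or "Error".

    Hypothetical standalone version: the real helper is a test case method
    and may match, report or fail differently.
    """
    needles = []
    if asserts:
        needles.append("Assertion")
    if errors:
        needles.append("Error")

    # Plain substring matching is assumed here.
    return [line.rstrip("\n") for line in lines
            if any(needle in line for needle in needles)]
```

A test could fail when the returned list is non-empty, with the two flags letting individual tests opt out of either check.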
Set the thread property SubgroupId for the extra lanes outside of the subgroup
Pass the workgroup laneIndex to BeginDebug
Set the thread property GroupThreadIndex, GroupFlatIndex for all workgroup lanes
Use SV_GroupThreadID to fill in threadid in the compute fetcher instead of SV_DispatchThreadID
Keep SV_DispatchThreadID to identify the candidate thread
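
For reference, the relationship between the HLSL thread ID semantics mentioned above can be sketched as follows. The function names are hypothetical and this is not the debugger's code, only the documented arithmetic:

```python
from typing import Tuple

Vec3 = Tuple[int, int, int]


def dispatch_thread_id(group_id: Vec3, group_thread_id: Vec3, num_threads: Vec3) -> Vec3:
    """SV_DispatchThreadID = SV_GroupID * numthreads + SV_GroupThreadID."""
    x, y, z = (g * n + t for g, t, n in zip(group_id, group_thread_id, num_threads))
    return (x, y, z)


def group_flat_index(group_thread_id: Vec3, num_threads: Vec3) -> int:
    """Flattened index within the group (SV_GroupIndex): z*X*Y + y*X + x."""
    x, y, z = group_thread_id
    size_x, size_y, _size_z = num_threads
    return z * size_x * size_y + y * size_x + x
```

The group-relative ID is the same for corresponding lanes in every workgroup, while the dispatch ID is unique across the whole dispatch, which is why the latter is kept for identifying the candidate thread.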
Tests specifically aimed at workgroup debugging, i.e. GSM and non-aligned subgroups
Not focused on unit tests of subgroup/quad instructions; those are handled by *_Subgroup_Zoo
* The problem here is that, due to design flaws in the extension, when ASs are
in use we don't know whether a memory allocation will need BDA or not, and the
application doesn't have to set any flag - unlike for normal buffer BDA. So we
promote (almost) all memory allocations to BDA when using ASs even if they're
not needed.
* This normally works fine. However, if during self-capture the replay process
allocates some normal memory before all of the application's replayed
allocations have been made, the self-capturing will promote it to BDA and
request a replayable address. That address might clash with one the
application had used and which a later replayed allocation will need.
* To solve this, we ensure that during capture we don't create more wrapped
allocations than necessary - to avoid causing clashes - as well as
ensuring that on replay we only create new allocations after all replayed
allocations.
* We also take advantage of dedicated allocations for fake swapchain images,
since dedicated image allocations will not be promoted to BDA.
* For very large shader symbol stores, especially those on network drives,
PIX's behaviour of recursively searching all possible subdirectories and
enumerating all files can be really slow. Most of the time a file is
identified by its hash filename and looked up directly. If that isn't a hit,
in many cases users would rather exit quickly with no symbols.
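
The trade-off described above can be sketched as a direct lookup with an opt-in recursive fallback. The function name, the `recursive_fallback` parameter and the flat store layout are assumptions for illustration and do not reflect the actual option or implementation:

```python
import os
from typing import Optional


def find_symbol_file(store_root: str, hash_name: str,
                     recursive_fallback: bool = False) -> Optional[str]:
    """Look up a symbol file by its hash filename.

    Hypothetical sketch: tries the direct path first and only walks the
    whole store when the caller explicitly opts in.
    """
    # Fast path: a single filesystem query, cheap even on a network drive.
    direct = os.path.join(store_root, hash_name)
    if os.path.isfile(direct):
        return direct

    # Fast exit: report no symbols rather than enumerating every file.
    if not recursive_fallback:
        return None

    # Slow path: recursive search of every subdirectory.
    for dirpath, _dirnames, filenames in os.walk(store_root):
        if hash_name in filenames:
            return os.path.join(dirpath, hash_name)
    return None
```

With the fallback disabled, a miss costs one failed lookup instead of a full directory enumeration.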