* The internal printf was already used for all numeric types, but PIX of course
has a special encoding for strings, so these were handled manually. Instead, use
the callback-based formatter and decode strings from there, so that varargs
string length and string padding formatters are supported.
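A minimal sketch of the idea, not the actual implementation: string arguments arrive as opaque handles and are decoded through a callback while the format string is expanded, so `%.*s` (varargs length) and `%-8s` (padding) work like any other specifier. The names `format_with_callback`, `decode_string` and `string_table` are hypothetical, and Python's own `%` operator stands in for the internal printf.

```python
import re

# Matches "%%" or one conversion: flags, width, precision, conversion letter.
_SPEC = re.compile(r"%(?:%|([-+ #0]*)(\*|\d+)?(?:\.(\*|\d+))?([diouxXeEfgGcs]))")


def format_with_callback(fmt, args, decode_string):
    """Expand fmt, resolving every %s argument through decode_string."""
    remaining = iter(args)

    def expand(match):
        if match.group(0) == "%%":
            return "%"
        flags, width, precision, conv = match.groups()
        # A '*' width or precision consumes an extra integer argument.
        if width == "*":
            width = str(next(remaining))
        if precision == "*":
            precision = str(next(remaining))
        value = next(remaining)
        if conv == "s":
            # Strings are passed as encoded handles; decode them here.
            value = decode_string(value)
        spec = "%" + flags + (width or "")
        if precision is not None:
            spec += "." + precision
        return (spec + conv) % (value,)

    return _SPEC.sub(expand, fmt)


# Hypothetical handle -> text mapping standing in for the encoded strings.
string_table = {1: "hello", 2: "abcdef"}
line = format_with_callback(
    "[%-8s] [%.*s] %d", [1, 3, 2, 42], string_table.__getitem__
)
```

Because decoding happens inside the formatter, the width and precision handling is shared with every other conversion instead of being reimplemented for strings.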
* On the active thread, GSM reads come from the local GSM cache (not the global GSM data)
* This makes the debugger UI consistent between viewing the GSM data as variables (which are populated from the local GSM cache) and viewing the results of reading from the GSM data
* GSM writes already populated both the local GSM cache and the global GSM data
* GSM Sync populates the local GSM cache with the data from the global GSM data
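The read, write and sync rules above can be modelled roughly as follows. This is an illustrative sketch only: the class `GsmModel` and its method names are hypothetical, and `write_from_other_thread` is an assumed stand-in for any write that reaches the global data without passing through the active thread.

```python
class GsmModel:
    """Toy model of the active thread's view of groupshared memory (GSM)."""

    def __init__(self, size):
        self.global_data = [0] * size   # shared by every thread
        self.local_cache = [0] * size   # the active thread's snapshot

    def read(self, index):
        # Reads on the active thread come from the local cache.
        return self.local_cache[index]

    def write(self, index, value):
        # Writes populate both the local cache and the global data.
        self.local_cache[index] = value
        self.global_data[index] = value

    def write_from_other_thread(self, index, value):
        # Assumption: another thread's write only touches the global data.
        self.global_data[index] = value

    def sync(self):
        # GSM Sync refreshes the local cache from the global data.
        self.local_cache = list(self.global_data)


gsm = GsmModel(4)
gsm.write(0, 5)
gsm.write_from_other_thread(0, 9)
stale = gsm.read(0)    # still the active thread's own value
gsm.sync()
fresh = gsm.read(0)    # now reflects the global data
```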
More checking of GSM local/global cache behaviour when debugging
One test is not stable across GPUs and its results are verified against hard-coded expectations (this tests the expected behaviour of the local GSM cache on the active thread)
Step Forwards: the first appearance of a variable must have "before" = {}
Step Forwards: any later appearance of a variable must have "before" equal to the currently known value
Step Backwards: the first appearance of a variable must have "after" = {}
Step Backwards: any later appearance of a variable must have "after" equal to the currently known value
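The four rules above could be checked along these lines. This is a sketch under assumptions: each change is a dict with `name`, `before` and `after` keys, `known` is a tracking dict kept by the test, and "first appearance" is taken to mean "not yet present in `known`". None of these names come from the actual test code.

```python
def check_step(changes, known, forwards):
    """Validate one step's variable changes and update the tracked values.

    Returns a list of error strings; an empty list means the step is valid.
    """
    # Forwards compares "before" with what is known; backwards compares "after".
    checked_key = "before" if forwards else "after"
    new_key = "after" if forwards else "before"
    errors = []
    for change in changes:
        name = change["name"]
        # A variable that has not been seen yet is expected to be empty.
        expected = known.get(name, {})
        if change[checked_key] != expected:
            errors.append(
                f"{name}: {checked_key} is {change[checked_key]!r}, expected {expected!r}"
            )
        known[name] = change[new_key]
    return errors


forward_known = {}
first = check_step([{"name": "x", "before": {}, "after": 1}], forward_known, True)
bad = check_step([{"name": "x", "before": 2, "after": 3}], forward_known, True)

backward_known = {}
back = check_step([{"name": "x", "before": 1, "after": {}}], backward_known, False)
```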
* Update any SSA max points which are inside a loop to the next uniform block
* This covers the case of SSA IDs that are assigned to but never accessed
Example HLSL construct that is correct after this change (and incorrect before this change):
if (id >= 2 && id <= 20)
{
...
}
else
{
... <THE CORRECT THREADS MERGE HERE INTO A SINGLE TANGLE>
}
Also tweaked DXIL ControlFlow GraphVis output
Generate the graph in a string for use with RDCLOG or writing to a file
Changed (full) convergence nodes to be blue rectangles
Partial convergence nodes are rounded green rectangles
Partial convergence edges are green dashed lines
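For illustration, a graph string with the styling described above might be produced like this. The function `build_dot` and its parameters are hypothetical; only the Graphviz DOT attributes (`shape`, `style`, `color`) are standard.

```python
def build_dot(blocks, edges, full_convergence=(), partial_convergence=(),
              partial_edges=()):
    """Return a Graphviz DOT description of a control flow graph as a string."""
    lines = ["digraph ControlFlow {"]
    for block in blocks:
        if block in full_convergence:
            # Full convergence nodes: blue rectangles.
            attrs = " [shape=box, color=blue]"
        elif block in partial_convergence:
            # Partial convergence nodes: rounded green rectangles.
            attrs = " [shape=box, style=rounded, color=green]"
        else:
            attrs = ""
        lines.append(f'  "{block}"{attrs};')
    for src, dst in edges:
        # Partial convergence edges: green dashed lines.
        attrs = " [color=green, style=dashed]" if (src, dst) in partial_edges else ""
        lines.append(f'  "{src}" -> "{dst}"{attrs};')
    lines.append("}")
    return "\n".join(lines)


dot = build_dot(
    blocks=[0, 1, 2, 3],
    edges=[(0, 1), (0, 2), (1, 3), (2, 3)],
    full_convergence={3},
    partial_convergence={2},
    partial_edges={(0, 2)},
)
```

Building the graph as a string first means the same text can be logged or written to a file without regenerating it.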
FIXUP GENERATE PARTIAL CONVERGENCE
Subgroup_Zoo : unit tests, non-trivial convergence tests moved to Workgroup_Zoo
Workgroup_Zoo : convergence tests, small number of unit tests (not full coverage)
Added checks for workgroup convergence in Workgroup_Zoo tests
* Vulkan uses barrier()
* D3D12 uses AllMemoryBarrierWithGroupSync()
* dispatches workgroup of 2x1x1
* test debug results for workgroup 1,0,0
* It is impossible to emit a true 16-bit type on fxc; we round the minXX types
up internally to a 32-bit type, since that's how they are defined to appear in
external resources like cbuffers and SRVs/UAVs.
* The new 16-bit type enums that are shared between fxc/dxc structs are never
actually emitted by fxc for RDEF types.
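As a rough illustration of the rounding described above, the scalar HLSL minimum-precision types can be mapped to their 32-bit counterparts. The table and helper names are hypothetical and cover scalar types only; this is not the internal representation.

```python
# HLSL minimum-precision scalar types and the 32-bit type each is treated as.
MIN_PRECISION_TO_32BIT = {
    "min16float": "float",
    "min10float": "float",
    "min16int": "int",
    "min12int": "int",
    "min16uint": "uint",
}


def storage_type(hlsl_type):
    """Return the type used for layout; non-minXX types pass through unchanged."""
    return MIN_PRECISION_TO_32BIT.get(hlsl_type, hlsl_type)


def storage_bytes(hlsl_type):
    """Size in bytes of a scalar once any minXX type is rounded up to 32 bits."""
    sizes = {"float": 4, "int": 4, "uint": 4}
    return sizes[storage_type(hlsl_type)]
```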
* Maps are recorded as open whenever we intercept them, usually only falling off
for high-traffic resources or direct maps like WRITE_NO_OVERWRITE.
* Unmaps can succeed at any time as long as they're intercepted as reads (a
no-op) or write discards (since we just need to intercept these).
* Unmaps from other write types require a map during an active capture to ensure
we properly set up shadow pointers.
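The unmap rules above reduce to a small decision, sketched here under assumptions: the map type is given as a plain string, and `mapped_during_capture` records whether the matching map was intercepted while a capture was active. The function name and parameters are hypothetical.

```python
def unmap_can_be_handled(map_type, mapped_during_capture):
    """Decide whether an intercepted unmap can be processed successfully."""
    if map_type == "READ":
        # Reads need no work at unmap time (no-op).
        return True
    if map_type == "WRITE_DISCARD":
        # Old contents are discarded, so intercepting the unmap is enough.
        return True
    # Other write types rely on shadow pointers set up by a map that
    # happened during an active capture.
    return mapped_during_capture
```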
* This follows PIX's algorithm in most places, which, although undocumented, is
something many people expect to work. It deviates only where there would be
significant performance penalties for little gain.
* This is apparently in a different format capability class to R8, and since we
don't expect anyone to be rendering to A8, let alone in MSAA, there's no point
in testing this.
With GLES, a precision specifier is mandatory for float types.
Specifying one in the user shader is not enough because it comes too
late, after uvec2 and uvec4 are used in the custom prefix.
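A sketch of the ordering this implies, assuming the final source is assembled from a version line, a custom prefix and the user shader. The function name, the `#version 310 es` default and the sample prefix are all hypothetical; only the `precision highp float;` statement is standard GLSL ES syntax.

```python
def build_gles_source(custom_prefix, user_shader, version="#version 310 es"):
    """Assemble shader source with the precision statement ahead of the prefix."""
    # The precision statement must precede the custom prefix, because the
    # user shader's own specifier would only appear after the prefix.
    precision = "precision highp float;"
    return "\n".join([version, precision, custom_prefix, user_shader])


source = build_gles_source(
    custom_prefix="uvec2 example_prefix_value;",   # hypothetical prefix content
    user_shader="precision mediump float;\nvoid main() {}",
)
```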