Sunday, April 26, 2015

Floating point, precision qualifiers, and optimization

ESSL permits optimizations that may change the value of floating point expressions (lowp and mediump precision changes, reassociation of addition/multiplication, etc.), which means that identical expressions may give different results in different shaders. This may cause problems with e.g. alignment of geometry in multi-pass algorithms, so output variables may be decorated with the invariant qualifier to force the compiler to be consistent in how it generates code for them. The compiler is still allowed to do value-changing optimizations for invariant expressions, but it needs to do them in the same way for all shaders. This may give us interesting problems if optimizations and code generation are done without knowledge of each other...
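
To make this concrete, here is a minimal sketch (not taken from any real application) of how a multi-pass algorithm typically uses the qualifier; each pass redeclares gl_Position as invariant so that identical position expressions compile consistently in all the shaders:
#version 310 es

// Sketch only: every pass redeclares gl_Position as invariant so that
// identical position expressions generate consistent code in all shaders.
invariant gl_Position;

in vec4 position;
uniform mat4 mvp;

void main() {
    gl_Position = mvp * position;
}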

Example 1

As an example of the problems we may get with invariant, consider an application that generates optimized SPIR-V using an offline ESSL compiler, and uses the IR with a Vulkan driver that has a simple backend. The backend works on one basic block at a time, and generates FMA (Fused Multiply-Add) instructions when a multiplication is followed by an addition. This is fine for invariant, even though FMA changes the precision, as the backend is consistent and always generates FMA when possible (i.e. identical expressions in different shaders will generate identical instructions).
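
A source-level sketch of the pattern the backend fuses (illustrative only; the backend's real input is SPIR-V, and fma() is used here purely as notation for the hardware instruction):
float t = a * b;
float r = t + c;   // a multiply followed by an add in the same basic block
                   // is emitted as a single fused operation, the equivalent
                   // of fma(a, b, c), which rounds once instead of twice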

The application has a shader
#version 310 es

in float a, b, c;
invariant out float result;

void main() {
    float tmp = a * b;
    if (c < 0.0) {
       result = tmp - 1.0;
    } else {
       result = tmp + 1.0;
    }
}
This is generated exactly as written if no optimization is done: first a multiplication, followed by a compare and branch, and we have two basic blocks doing one addition each. But the offline compiler optimizes this with if-conversion, so it generates SPIR-V as if main were written as
void main()
{
    float tmp = a * b;
    result = (c < 0.0) ? (tmp - 1.0) : (tmp + 1.0);
}
The optimization has eliminated the branches, and the backend will now see that it can use FMA instructions as everything is in the same basic block.
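
At the source level, the code the backend now generates corresponds to something like this sketch (again, fma() is used purely as notation for the fused instruction):
void main()
{
    // tmp - 1.0 and tmp + 1.0 are fused with the multiplication a * b
    result = (c < 0.0) ? fma(a, b, -1.0) : fma(a, b, 1.0);
}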

But the application has one additional shader where main looks like
void main() {
    float tmp = a * b;
    if (c < 0.0) {
       foo();
       result = tmp - 1.0;
    } else {
       result = tmp + 1.0;
    }
}
The optimization cannot transform the if-statement here, as the basic blocks are too complex. The multiplication and the additions therefore stay in separate basic blocks, so the backend does not generate FMA, and the invariance guarantee is broken.

Example 2

It is not only invariant expressions that are problematic; you may get surprising results from normal code too when optimizations done offline and in the backend interact in interesting ways. For example, you can get different precision in different threads from "redundant computation elimination" optimizations. This happens for cases such as
mediump float tmp = a + b;
if (x == 0) {
  /* Code not using tmp */
  ...
} else if (x == 1) {
  /* Code using tmp */
  ...
} else {
  /* Code using tmp */
  ...
}
where tmp is calculated, but not used, for the case "x == 0". The optimization moves the tmp calculation into the two basic blocks where it is used
if (x == 0) {
  /* Code not using tmp */
  ...
} else if (x == 1) {
  mediump float tmp = a + b;
  /* Code using tmp */
  ...
} else {
  mediump float tmp = a + b;
  /* Code using tmp */
  ...
}
and the backend may now choose to use different precisions for the two mediump tmp calculations.
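
A hedged numerical illustration (the values are chosen just for this example, and it assumes a backend that may evaluate mediump in either 16-bit or 32-bit floating point, which ESSL permits):
mediump float a = 1.0;
mediump float b = 0.0001;
mediump float tmp = a + b;  // evaluated in 16-bit: rounds back to 1.0
                            // evaluated in 32-bit: approximately 1.0001
                            // so the two branches may see different values for tmp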

Offline optimization with SPIR-V

The examples above are of course silly: higher-level optimizations should not be allowed to change control flow for invariant statements, and the "redundant computation elimination" does not make sense for warp-based architectures. But the first optimization would have been fine with a better backend that could combine instructions from different basic blocks, and not all GPUs are warp-based. That is, it is reasonable to do these kinds of optimizations, but they need to be done in the driver, where you have full knowledge of the backend and architecture.

My impression is that many developers believe that SPIR-V and Vulkan imply that the driver will just do simple code generation, and that all optimizations are done offline. But that will prevent some optimizations. It may work for a game engine generating IR for a known GPU, but I'm not sure that the GPU vendors will provide enough information about their architectures/backends for this to be viable either.

So my guess is that the drivers will continue to do all the current optimizations on SPIR-V too, and that offline optimizations will not matter...
