Intro
In this article I want to correct a popular misconception that has been making the rounds in computer graphics circles for a long time now. It has to do with branching on GPUs. Unfortunately, a couple of educational websites are spreading misinformation about it, and it would be nice to correct that. I tried contacting the authors without success, so without further ado, here is my attempt to fix things:
The issue
So, say I have this code, which I actually published the other day:
vec2 snap45( in vec2 v )
{
    vec2  s = sign(v);
    float x = abs(v.x);
    return x>0.923880 ? vec2(s.x,0.0) :   // 0.923880 ~ cos(22.5 degrees)
           x>0.382683 ? s*sqrt(0.5) :     // 0.382683 ~ cos(67.5 degrees)
                        vec2(0.0,s.y);
}
The exact details of what it does don’t matter for this discussion. All we care about is the two ternary operations, which, as you know, implement conditional execution: depending on the value of the variable x, the function returns different results. This could also be written with regular if statements, and everything I’m going to say would stay the same.
But here’s the problem – when seeing code like this, somebody somewhere will invariably propose the following “optimization”, which replaces what they (erroneously) believe are “conditional branches” with arithmetic operations. They will suggest something like this:
vec2 snap45( in vec2 v )
{
    vec2  s = sign(v);
    float x = abs(v.x);
    float w0 = step(0.92387953,x);
    float w1 = step(0.38268343,x)*(1.0-w0);
    float w2 = 1.0-w0-w1;
    vec2 res0 = vec2(s.x,0.0);
    vec2 res1 = vec2(s.x,s.y)*sqrt(0.5);
    vec2 res2 = vec2(0.0,s.y);
    return w0*res0 + w1*res1 + w2*res2;
}
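For readers who want to see what the arithmetic in the step() variant amounts to, here is a small CPU-side sketch in C, not GLSL. The helper names `glsl_sign`, `glsl_step`, `snap45_weights`, `is_one_hot` and `snap45_step`, and the `vec2` struct, are assumptions of this sketch. It shows that the three weights always form a one-hot selector: exactly one of them is 1 and the other two are 0. In other words, the multiplications and additions merely pick one of three candidates that have all been computed up front.

```c
#include <assert.h>
#include <math.h>

/* Minimal stand-in for GLSL's vec2 (an assumption of this sketch). */
typedef struct { float x, y; } vec2;

/* Mirrors GLSL sign(): returns -1, 0 or +1. */
float glsl_sign(float a) { return (float)((a > 0.0f) - (a < 0.0f)); }

/* Mirrors GLSL step(edge, x): 0.0 if x < edge, otherwise 1.0. */
float glsl_step(float edge, float a) { return a < edge ? 0.0f : 1.0f; }

/* Computes the three blend weights used by the step() variant. */
void snap45_weights(float vx, float w[3])
{
    float x = fabsf(vx);
    w[0] = glsl_step(0.92387953f, x);
    w[1] = glsl_step(0.38268343f, x) * (1.0f - w[0]);
    w[2] = 1.0f - w[0] - w[1];
}

/* Returns 1 if exactly one weight is 1 and the others are 0. */
int is_one_hot(const float w[3])
{
    int ones = 0;
    for (int i = 0; i < 3; i++) {
        if (w[i] == 1.0f) ones++;
        else if (w[i] != 0.0f) return 0;
    }
    return ones == 1;
}

/* Port of the step() variant: builds all three candidates, then
   blends them with weights that are only ever 0 or 1. */
vec2 snap45_step(vec2 v)
{
    vec2  s = { glsl_sign(v.x), glsl_sign(v.y) };
    float h = sqrtf(0.5f);
    float w[3];
    snap45_weights(v.x, w);
    assert(is_one_hot(w));  /* the blend is really just a selection */

    vec2 res0 = { s.x, 0.0f };
    vec2 res1 = { s.x * h, s.y * h };
    vec2 res2 = { 0.0f, s.y };
    return (vec2){ w[0] * res0.x + w[1] * res1.x + w[2] * res2.x,
                   w[0] * res0.y + w[1] * res1.y + w[2] * res2.y };
}
```

For example, `snap45_weights(0.5f, w)` yields the weights 0, 1, 0, so only the diagonal candidate survives the blend.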
13 Comments
ttoinou
Thanks Inigo !
How are we supposed to know what OpenGL functions are emulated rather than calling GPU primitives ?
doctorhandshake
I don’t know enough about these implementations to know if this can be interpreted as a blanket ‘conditionals are fine’ or, rather, ‘ternary operations which select between two themselves non-branching expressions are fine’.
Like does this apply if one of the two branches of a conditional is computationally much more expensive? My (very shallow) understanding was that having, eg, a return statement on one branch and a bunch of work on the other would hamstring the GPU’s ability to optimize execution.
toredo1729_2
Unrelated, but somehow similar: I really hate it that it's not possible to force gcc to transform things like this into a conditional move:
x > c ? y : 0.;
It annoyed me many times and it still does.
ryao
Do shader compilers have optimization passes to undo this mistake and if not, could they be added?
mirsadm
I've been caught by this. Even Claude/ChatGPT will suggest it as an optimisation. Every time I've measured a performance drop doing this. Sometimes significant.
londons_explore
So why isn't the compiler smart enough to see that the 'optimised' version is the same?
Surely it understands "step()" and can optimize the "step()=0.0" and "step()==1.0" cases separately?
This is presumably always worth it, because you would at least remove one multiplication (usually turning it into a conditional load/store/something else)
magicalhippo
Processors change, compilers change. If you care about such details, best to ship multiple variants and pick the fastest one at runtime.
As I've mentioned here several times before, I've made code significantly faster by removing the hand-rolled assembly and replacing it with plain C or similar. While the assembly might have been faster a decade or two ago, things have changed…
quuxplusone
I'm sure TFA's conclusion is right; but its argument would be strengthened by providing the codegen for both versions, instead of just the better version. Quote:
"The second wrong thing with the supposedly optimizer [sic] version is that it actually runs much slower than the original version […] wasting two multiplications and one or two additions. […] But don't take my word for it, let's look at the generated machine code for the relevant part of the shader"
—then proceeds to show only one codegen: the one containing no multiplications or additions. That proves the good version is fine; it doesn't yet prove the bad version is worse.
TinkersW
It is weird how long misinformation like this sticks around, the conditional move/select approach has been superior for decades on both CPU & GPU, but somehow some people still write the other approach as an "optimization".
mahkoh
The author seems unaware of
which encodes the desired behavior and also works for vectors:
alkonaut
I wish there was a good way of knowing when an if forces an actual branch rather than when it doesn't. The reason people do potentially more expensive mix/lerps is because while it might cost a tiny overhead, they are scared of making it a branch.
I do like that the most obvious v = x > y ? a : b; actually works, but it's also concerning that we have syntax where an if is sometimes a branch and sometimes not. In a context where you really can't branch, you'd almost like branch-if and non-branching-if to be different keywords. The non-branching one would fail compilation if the compiler couldn't do it without branching. The branching one would warn if it could be done without branching.
DrNosferatu
This should be quantified and generalized for a full set of cases – that way the argument would stand far more clearly.