Don’t “optimize” conditional moves in shaders with mix()+step() by romes

13 Comments

  • Post Author
    ttoinou
    Posted February 9, 2025 at 1:25 pm

    Thanks, Inigo!

      The second wrong thing with the supposedly optimizer version is that it actually runs much slower than the original version. The reason is that the step() function is actually implemented like this:
    
      float step( float x, float y )
      {
          return x < y ? 1.0 : 0.0;
      }
    

    How are we supposed to know which OpenGL functions are emulated rather than mapped directly to GPU primitives?
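For reference, the GLSL specification defines step(edge, x) as 0.0 when x < edge and 1.0 otherwise (the quoted snippet differs from that only at x == y), and mix(x, y, a) as x*(1-a) + y*a. The following is a minimal C sketch of those semantics, not a claim about how any particular driver lowers them; the names `step_f`, `mix_f`, `select_mix` and `select_ternary` are made up for illustration.

```c
/* GLSL-style step(edge, x): 0.0 if x < edge, otherwise 1.0. */
static float step_f(float edge, float x) { return x < edge ? 0.0f : 1.0f; }

/* GLSL-style mix(x, y, a): linear blend x*(1-a) + y*a. */
static float mix_f(float x, float y, float a) { return x * (1.0f - a) + y * a; }

/* The "optimized" form: a compare inside step(), then a subtract,
   two multiplies and an add inside mix(). */
static float select_mix(float b, float c, float x, float y) {
    return mix_f(b, c, step_f(y, x));
}

/* The plain form: one compare and one select. */
static float select_ternary(float b, float c, float x, float y) {
    return x < y ? b : c;
}
```

For finite inputs both functions return the same value; the mix() form simply performs extra arithmetic on top of the comparison that step() already contains.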

  • Post Author
    doctorhandshake
    Posted February 9, 2025 at 1:28 pm

    I don’t know enough about these implementations to know whether this can be read as a blanket ‘conditionals are fine’ or, rather, as ‘ternary operations that select between two expressions which are themselves non-branching are fine’.

    For example, does this still apply if one of the two branches of a conditional is computationally much more expensive? My (very shallow) understanding was that having, e.g., a return statement on one branch and a bunch of work on the other would hamstring the GPU’s ability to optimize execution.

  • Post Author
    toredo1729_2
    Posted February 9, 2025 at 1:32 pm

    Unrelated, but somehow similar: I really hate it that it's not possible to force gcc to transform things like this into a conditional move:

    x > c ? y : 0.;

    It annoyed me many times and it still does.
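One workaround sometimes used in C when the compiler will not emit a conditional move is to spell the select out with a bit mask, so the source contains no if or ?: at all. The sketch below assumes a 32-bit IEEE 754 float; `select_or_zero` is a hypothetical helper name. The compiler remains free to choose whatever instructions it likes, so this is not guaranteed to be faster and should be measured.

```c
#include <stdint.h>
#include <string.h>

/* Returns y when x > c, otherwise +0.0f, using a mask instead of a branch.
   Assumes float is a 32-bit IEEE 754 type. */
static float select_or_zero(float x, float c, float y) {
    uint32_t bits;
    memcpy(&bits, &y, sizeof bits);      /* reinterpret y as raw bits */
    uint32_t mask = -(uint32_t)(x > c);  /* all ones if x > c, else zero */
    bits &= mask;                        /* keep y's bits or clear them */
    float out;
    memcpy(&out, &bits, sizeof out);
    return out;
}
```

Calling `select_or_zero(x, c, y)` gives the same value as `x > c ? y : 0.` for ordinary inputs, including returning zero when the comparison is false because x is NaN.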

  • Post Author
    ryao
    Posted February 9, 2025 at 1:40 pm

    Do shader compilers have optimization passes to undo this mistake and if not, could they be added?

  • Post Author
    mirsadm
    Posted February 9, 2025 at 1:45 pm

    I've been caught by this. Even Claude/ChatGPT will suggest it as an optimisation. Every time I've measured it, doing this caused a performance drop, sometimes a significant one.

  • Post Author
    londons_explore
    Posted February 9, 2025 at 1:51 pm

    So why isn't the compiler smart enough to see that the 'optimised' version is the same?

    Surely it understands "step()" and can optimize the "step() == 0.0" and "step() == 1.0" cases separately?

    This is presumably always worth it, because you would at least remove one multiplication (usually turning it into a conditional load/store/something else)
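One plausible reason a compiler cannot always treat the two forms as identical is that they differ for non-finite inputs: in the blend, the unselected operand is still multiplied by zero, and infinity times zero is NaN under IEEE 754. The C sketch below illustrates that difference with hypothetical helper names; whether a given shader compiler is allowed to ignore such cases depends on its floating-point rules, which this example does not try to describe.

```c
#include <math.h>

/* GLSL-style step(edge, x) and mix(x, y, a), written out in C. */
static float step_f(float edge, float x) { return x < edge ? 0.0f : 1.0f; }
static float mix_f(float x, float y, float a) { return x * (1.0f - a) + y * a; }

/* Blend form: the unselected operand still takes part in the arithmetic. */
static float via_mix(float b, float c, float x, float y) {
    return mix_f(b, c, step_f(y, x));
}

/* Select form: the unselected operand never touches the result. */
static float via_select(float b, float c, float x, float y) {
    return x < y ? b : c;
}
```

With b = 1.0f, c = INFINITY, x = 1.0f and y = 2.0f, `via_select` returns b unchanged, while `via_mix` computes 1*1 + INFINITY*0 and yields NaN.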

  • Post Author
    flowzai4
    Posted February 9, 2025 at 1:56 pm

    [dead]

  • Post Author
    magicalhippo
    Posted February 9, 2025 at 1:59 pm

    Processors change, compilers change. If you care about such details, best to ship multiple variants and pick the fastest one at runtime.

    As I've mentioned here several times before, I've made code significantly faster by removing the hand-rolled assembly and replacing it with plain C or similar. While the assembly might have been faster a decade or two ago, things have changed…
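The "ship multiple variants and pick the fastest at runtime" idea can be sketched on the CPU side as a table of function pointers plus a small timing loop. Everything named here (`select_fn`, `variant_ternary`, `variant_mix`, `time_variant`, `pick_fastest`) is hypothetical, and `clock()` is a coarse timer; for shaders the timing would have to come from GPU-side timers instead, which this sketch does not cover.

```c
#include <stddef.h>
#include <time.h>

typedef float (*select_fn)(float b, float c, float x, float y);

/* Variant 1: plain select. */
static float variant_ternary(float b, float c, float x, float y) {
    return x < y ? b : c;
}

/* Variant 2: step()+mix() written out as arithmetic. */
static float variant_mix(float b, float c, float x, float y) {
    float t = x < y ? 0.0f : 1.0f;
    return b * (1.0f - t) + c * t;
}

/* Time `iters` calls of one variant; returns elapsed clock ticks. */
static clock_t time_variant(select_fn fn, int iters) {
    volatile float sink = 0.0f;  /* keeps the calls from being optimized away */
    clock_t start = clock();
    for (int i = 0; i < iters; ++i)
        sink += fn(1.0f, 2.0f, (float)(i & 7), 4.0f);
    return clock() - start;
}

/* Return the variant with the smallest measured time (n must be >= 1). */
static select_fn pick_fastest(const select_fn *variants, size_t n, int iters) {
    select_fn best = variants[0];
    clock_t best_ticks = time_variant(best, iters);
    for (size_t i = 1; i < n; ++i) {
        clock_t ticks = time_variant(variants[i], iters);
        if (ticks < best_ticks) {
            best_ticks = ticks;
            best = variants[i];
        }
    }
    return best;
}
```

A caller would build an array such as `{ variant_ternary, variant_mix }` once at startup, call `pick_fastest` on it, and use the returned pointer afterwards. A single short measurement like this is noisy, so a real selection step would likely repeat it and warm up first.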

  • Post Author
    quuxplusone
    Posted February 9, 2025 at 2:14 pm

    I'm sure TFA's conclusion is right; but its argument would be strengthened by providing the codegen for both versions, instead of just the better version. Quote:

    "The second wrong thing with the supposedly optimizer [sic] version is that it actually runs much slower than the original version […] wasting two multiplications and one or two additions. […] But don't take my word for it, let's look at the generated machine code for the relevant part of the shader"

    —then proceeds to show only one codegen: the one containing no multiplications or additions. That proves the good version is fine; it doesn't yet prove the bad version is worse.

  • Post Author
    TinkersW
    Posted February 9, 2025 at 2:15 pm

    It is weird how long misinformation like this sticks around. The conditional move/select approach has been superior for decades on both CPU and GPU, yet some people still write the other approach as an "optimization".

  • Post Author
    mahkoh
    Posted February 9, 2025 at 2:22 pm

        So, if you ever see somebody proposing this
    
        float a = mix( b, c, step( y, x ) );
    

    The author seems unaware of

        float a = mix( b, c, x >= y );
    

    which encodes the desired behavior and also works for vectors:

        The variants of mix where a is genBType select which vector each returned component comes from. For a component of a that is false, the corresponding component of x is returned. For a component of a that is true, the corresponding component of y is returned.
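The quoted selection rule can be sketched in C for a three-component vector. Since step(y, x) is 1.0 exactly when x >= y, the boolean selector that matches the step() version is x >= y per component (GLSL spells the vector comparison greaterThanEqual). The types `vec3` and `bvec3` and the functions `mix_bool` and `greater_than_equal` below are illustrative stand-ins, not a real GLSL binding.

```c
#include <stdbool.h>

typedef struct { float x, y, z; } vec3;
typedef struct { bool x, y, z; } bvec3;

/* Per-component comparison, in the spirit of GLSL's greaterThanEqual(). */
static bvec3 greater_than_equal(vec3 a, vec3 b) {
    bvec3 r = { a.x >= b.x, a.y >= b.y, a.z >= b.z };
    return r;
}

/* Boolean mix: a false component picks from a, a true component picks from b.
   No blending arithmetic is involved. */
static vec3 mix_bool(vec3 a, vec3 b, bvec3 sel) {
    vec3 r = {
        sel.x ? b.x : a.x,
        sel.y ? b.y : a.y,
        sel.z ? b.z : a.z
    };
    return r;
}
```

Each output component is copied from exactly one input, so the result cannot be contaminated by the unselected operand.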

  • Post Author
    alkonaut
    Posted February 9, 2025 at 2:56 pm

    I wish there was a good way of knowing when an if forces an actual branch and when it doesn't. The reason people reach for potentially more expensive mix/lerps is that, while they might cost a tiny overhead, people are scared of introducing a branch.

    I do like that the most obvious form, v = x > y ? a : b;, actually works, but it's also concerning that we have syntax where an if is sometimes a branch and sometimes not. In a context where you really can't branch, you'd almost like branch-if and non-branching-if to be different keywords. The non-branching one would fail compilation if the compiler couldn't do it without branching. The branching one would warn if it could be done without branching.

  • Post Author
    DrNosferatu
    Posted February 9, 2025 at 3:28 pm

    This should be quantified and generalized for a full set of cases – that way the argument would stand far more clearly.
