Hello yuz-ers! There were fewer individual changes this month, but the changes that were made are substantial! You won’t want to miss this.
Poor Melia up there.
Project Y.F.C. 1.90!
Blinkhawk showed up one weekend and asked, “Want to test a 50% performance boost and almost perfect rendering on Normal GPU accuracy?”
And that’s exactly what we did.
A more accurate name for this change would be “a rewrite of the Buffer Cache Rewrite”, perhaps rBCR for short?
Essentially, Blinkhawk rewrote most of the old buffer cache changes that Rodrigo introduced two years ago, taking into account the new demands of recent games and the issues found with the original BCR.
Part of the work also involves:
- Allowing the verification of fencing and writing of asynchronous downloads in a separate thread (a rough sketch of this pattern follows the list).
- Restructuring how accuracy is managed by skipping host-guest fence synchronization and not downloading on host conditional rendering for Normal GPU accuracy.
- Improving consistency for Query Cache asynchronous downloads.
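To give a rough idea of what moving downloads to a separate thread can look like, here is a minimal C++ sketch of the pattern: the GPU thread queues a finished copy and moves on, while a worker thread waits for the copy’s fence to signal and only then writes the data back to guest memory. This is purely illustrative and not yuzu’s buffer cache code; PendingDownload, AsyncDownloader, and fence_signalled are hypothetical names, and the real implementation tracks far more state.

```cpp
// Minimal illustrative sketch of handling buffer downloads on a worker thread.
// Names and structure are hypothetical and greatly simplified compared to
// yuzu's actual buffer cache.
#include <condition_variable>
#include <cstddef>
#include <cstring>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

struct PendingDownload {
    std::vector<std::byte> staging;         // Host-visible data written by the GPU
    std::byte* guest_memory = nullptr;      // Destination in emulated guest memory
    std::function<bool()> fence_signalled;  // Checks the associated GPU fence
};

class AsyncDownloader {
public:
    AsyncDownloader() : worker{[this] { Run(); }} {}

    ~AsyncDownloader() {
        {
            std::scoped_lock lock{mutex};
            stop = true;
        }
        cv.notify_one();
        worker.join();
    }

    // Called from the GPU thread; returns immediately instead of stalling.
    void Push(PendingDownload download) {
        {
            std::scoped_lock lock{mutex};
            queue.push_back(std::move(download));
        }
        cv.notify_one();
    }

private:
    void Run() {
        for (;;) {
            PendingDownload download;
            {
                std::unique_lock lock{mutex};
                cv.wait(lock, [this] { return stop || !queue.empty(); });
                if (queue.empty()) {
                    return;  // Only reached when stopping
                }
                download = std::move(queue.front());
                queue.pop_front();
            }
            // Verify the fence on this thread so the GPU thread never waits.
            while (!download.fence_signalled()) {
                std::this_thread::yield();
            }
            std::memcpy(download.guest_memory, download.staging.data(),
                        download.staging.size());
        }
    }

    std::deque<PendingDownload> queue;
    std::mutex mutex;
    std::condition_variable cv;
    bool stop = false;
    std::thread worker;  // Declared last so the other members exist first
};
```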
The results are amazing. Most games that used to need High GPU accuracy to render correctly can now run on Normal with no issues.
Additionally, all this wizardry reduces bandwidth usage and boosts performance by up to 87% for everyone (50% on average), from low-end APUs to high-end beasts.
Here’s an incomplete list of changes:
- As noted previously, many games that required High GPU accuracy to be visually accurate now work with Normal GPU accuracy with minimal sacrifice.
- Particles and character lighting/shading in Pokémon Sword & Shield have been fixed on Normal GPU accuracy. Performance has improved by up to 40% on Normal GPU accuracy.
- Models (the BowWow, for example) and particle rendering are fixed on Normal GPU accuracy in The Legend of Zelda: Link's Awakening. Performance on Normal accuracy, with correct rendering, is now up to 70% higher than before.
- Lighting in Diablo II: Resurrected has been fixed and will no longer flicker.
- Lighting and shadows in Luigi's Mansion 3 will no longer randomly flicker.
- Pokémon photograph detection and data in New Pokémon Snap have been fixed on Normal GPU accuracy. This results in up to a 50% increase in performance with working photograph detection.
- Vertex explosions, lighting, and particles in Kirby and the Forgotten Land have been fixed on Normal GPU accuracy. This results in an up to 40% performance increase, with accurate rendering on Normal accuracy.
- Red lights in some machines in Xenoblade Chronicles 2 have been fixed.
- Fire Emblem Warriors now renders accurately and no longer requires a workaround.
- MONSTER HUNTER RISE now renders accurately on Normal GPU accuracy, resulting in an up to 50% performance increase (note, however, that updates after 3.0.0 still have issues and require more work).
- Vertex explosions in Persona 5 Royal no longer occur with Normal GPU accuracy, resulting in an up to 30% increase in performance.
- Atelier Ryza series games now render correctly.
- The pessimistic flushes option in advanced graphics settings is no longer needed in any of the games it used to benefit, so we have removed it.
- Mortal Kombat 11 no longer has any vertex explosions.
- NieR:Automata The End of YoRHa Edition now renders correctly.
- Bayonetta 3 no longer requires High GPU accuracy to render correctly.
- Splatoon 2's ink physics work correctly on AMD GPUs while using High GPU accuracy.
- Particles in The Legend of Zelda: Breath of the Wild have been fixed, resulting in 40% higher performance and accurate rendering on Normal GPU accuracy.
- Tree flickering in The Legend of Zelda: Breath of the Wild has been fixed on all GPU accuracy options.
- And much, much more!
No option needs to be enabled to take advantage of all of this; just switch GPU accuracy to Normal if you haven’t already. What are you waiting for?
Here are some stats for some of the most popular games.
We compared High GPU accuracy in Mainline 1407, and Normal GPU accuracy in Mainline 1421.
All tests are done at 2X resolution scaling, and using mods to disable dynamic resolution when possible.
And then we have these four, the high FPS squad. They’re reason enough to consider asking the modding community to start releasing 240 FPS mods!
Expect even higher numbers with a Zen 4 3D V-cache chip.
For example, in the same testing spot of Breath of the Wild, a non-3D 7900X gets 90 FPS.
Other graphical changes
Citra-legend GPUCode stepped up to give us a hand with presentation. Presentation is the final step of most graphics code — the process of getting the output to the screen.
GPUCode’s work moves swapchain operations to a separate thread in order to avoid stalling the main GPU thread. This improves performance in more demanding titles and on low-end hardware, and in many cases can make the difference between barely reaching 60 and holding a smooth 60 frames per second.
However, it can also make frametimes less consistent, so we’ve turned it off by default to allow for further testing. We need to determine which systems and games benefit the most.
For those interested in trying it, the toggle is available in Emulation > Configure… > Graphics > Advanced > Enable asynchronous presentation (Vulkan only).
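As a rough illustration of the idea, the sketch below keeps only the most recently finished frame in a single slot and performs the blocking present call on a dedicated thread, letting the GPU thread immediately move on to the next frame. It is a simplified model built on our own assumptions, not GPUCode’s actual code; Frame, AsyncPresenter, and PresentToSwapchain are hypothetical names.

```cpp
// Minimal sketch of presenting frames from a dedicated thread so the main GPU
// thread never blocks on the swapchain. Purely an illustration of the idea;
// Frame, PresentToSwapchain, and AsyncPresenter are hypothetical names.
#include <condition_variable>
#include <mutex>
#include <optional>
#include <thread>

struct Frame {
    int image_index = 0;  // Stand-in for a rendered swapchain image
};

// Stand-in for the real blocking present call; in practice this is where
// something like vkQueuePresentKHR would run and swapchain recreation would
// be handled.
void PresentToSwapchain(const Frame& frame) {
    (void)frame;
}

class AsyncPresenter {
public:
    AsyncPresenter() : presenter{[this] { Run(); }} {}

    ~AsyncPresenter() {
        {
            std::scoped_lock lock{mutex};
            stop = true;
        }
        cv.notify_one();
        presenter.join();
    }

    // The GPU thread hands off a finished frame and immediately starts the next
    // one; if presentation falls behind, the previous frame is simply replaced.
    void QueueFrame(Frame frame) {
        {
            std::scoped_lock lock{mutex};
            pending = frame;
        }
        cv.notify_one();
    }

private:
    void Run() {
        for (;;) {
            Frame frame;
            {
                std::unique_lock lock{mutex};
                cv.wait(lock, [this] { return stop || pending.has_value(); });
                if (stop) {
                    return;
                }
                frame = *pending;
                pending.reset();
            }
            PresentToSwapchain(frame);  // Blocking call happens off the GPU thread
        }
    }

    std::optional<Frame> pending;
    std::mutex mutex;
    std::condition_variable cv;
    bool stop = false;
    std::thread presenter;  // Declared last so the other members exist first
};
```

Dropping stale frames instead of queueing them is one reasonable way such a design keeps latency down, and it also hints at why frame pacing can behave differently with presentation running on its own thread.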
vonchenplus continues to work on making the code match the information NVIDIA has made public in their latest documentation.
You may remember Wollnashorn from their role in overhauling the Vulkan pipeline cache.
Now, Wollnashorn presents us with a technique to bypass hardware limitations in order to make The Legend of Zelda: Breath of the Wild render accurately on non-NVIDIA hardware.
Object edges, especially grass blades, had distinct black borders on AMD and Intel GPUs.
The issue occurred regardless of the driver in use, so it was clearly a hardware limitation, and an incompatibility with what the game expects.
The Legend of Zelda: Breath of the Wild uses a technique called deferred rendering; in this particular case, shadows render at half the resolution.
Four pixels of the full resolution depth texture are sampled simultaneously with a textureGather call.
textureGather works with normalized floating-point coordinates for the texture, so each fragment always sits at the boundary of the four selected pixels. And since textureGather uses floating-point, each GPU design will have a different rounding precision.
Additionally, thanks to a blog post by Nathan Reed, we know the integer coordinates of the pixel on the texture are calculated by the GPU after a conversion from a floating-point number to a fixed-point number.
With floating point conversions involved, you may be able to tell where this is going.
If the user’s GPU is not using the same rounding precision as the Nintendo Switch, different pixels can be sampled. Ergo, only NVIDIA GPUs got the four correct pixels the game intended.
How did Wollnashorn solve this? With a clever little trick, of course.
Adding a very tiny (1/512) subpixel offset to the sample coordinates is sufficient to fudge the rounding.
Achieving that required modifying the code of the SPIR-V and GLSL backends, altering how the operation is handled for AMD and Intel hardware for now, with the option to force it for any other future hardware that may require it (a certain fruit company, for example).
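To make the rounding problem concrete, here is a toy C++ model, not the actual shader backend change, of how the 2x2 gather footprint is chosen and why a small offset stabilizes it. We assume the offset is 1/512 of a texel; GatherBaseTexel and its rounding_bias parameter are purely illustrative stand-ins for hardware behaviour.

```cpp
// Toy model (not yuzu's shader backend) of why a tiny coordinate offset fixes
// textureGather on some GPUs. textureGather picks a 2x2 texel footprint around
// a normalized coordinate; when that coordinate lands exactly on a texel
// boundary, the chosen footprint depends on how the hardware rounds during its
// internal float -> fixed-point conversion. Nudging the coordinate by 1/512 of
// a texel moves it safely off the boundary so every GPU picks the same texels.
#include <cmath>
#include <cstdio>

// Column of the base texel of the 2x2 gather footprint, using the usual
// normalized -> texel-space conversion. rounding_bias is an exaggerated
// stand-in for GPU-specific rounding in the fixed-point conversion.
int GatherBaseTexel(float normalized_u, int texture_width, float rounding_bias) {
    const float texel_space =
        normalized_u * static_cast<float>(texture_width) - 0.5f;
    return static_cast<int>(std::floor(texel_space + rounding_bias));
}

int main() {
    const int width = 1024;            // Width of a half-resolution depth buffer
    const float u = 640.5f / 1024.0f;  // Coordinate exactly on a texel boundary

    // Two hypothetical GPUs that round the boundary case differently end up
    // gathering different texels:
    std::printf("GPU A: %d\n", GatherBaseTexel(u, width, -1e-4f));  // 639
    std::printf("GPU B: %d\n", GatherBaseTexel(u, width, 0.0f));    // 640

    // The workaround: offset the coordinate by 1/512 of a texel so it no longer
    // sits on the boundary, and both GPUs agree again.
    const float offset_u = u + (1.0f / 512.0f) / static_cast<float>(width);
    std::printf("GPU A, offset: %d\n", GatherBaseTexel(offset_u, width, -1e-4f));  // 640
    std::printf("GPU B, offset: %d\n", GatherBaseTexel(offset_u, width, 0.0f));    // 640
    return 0;
}
```

Running it, the two simulated GPUs disagree on the base texel of the footprint until the offset is applied, at which point they pick the same pixels again.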
Here’s the final result:
Funny how we end up sharing the same problems Cemu faced (The Legend of Zelda: Breath of the Wild)
Something we have to mention is that this doesn’t fix a very similar-looking black line issue present when using anisotropic filtering values higher than Default with AMD and Intel GPUs.
That’s a separate issue, and we recommend that Red and Blue Team users at least add a per-game setting for The Legend of Zelda: Breath of the Wild that keeps anisotropic filtering at Default. The game doesn’t benefit from higher values anyway, since its terrain textures don’t seem to take advantage of them.
That’s a clean look all around (The Legend of Zelda: Breath of the Wild)
Linux got its well-deserved share of love too, thanks to byte[].
First up, he fixed the initialization of the Vulkan swapchain on Wayland, helping Linux NVIDIA users launch their games.
As some of you may know, NVIDIA has historically been very stubborn about its Wayland support, and it doesn’t help that most Wayland compositors are quite stubborn in their own right.
The year of the Linux desktop