Shade - WebGPU graphics

Spent a bit more time recently improving TAA in minor ways, specifically texture sampling under TAA.

TAA is known to smudge or blur textures a little. It’s a known issue: TAA is essentially a blur kernel itself, about 1 pixel in size. That would sound pointless, except we have more than 1 sample within that pixel, just spread over time.

There are 2 key things you have to do when using TAA, with respect to texture sampling:

  1. You have to correctly remove TAA jitter from the UVs you use to sample material textures. That is to say - we want the UV coordinates to stay put relative to the screen instead of moving with the jitter. They will still move, but only because of differences in world-space sampling.
  2. You need to bias your mip level. Essentially, TAA renders at a higher resolution, but our rendering engine samples textures without being aware of that. If you roughly assume that TAA gives you 2x resolution when sampling (which is not true, but let’s assume it for the sake of the argument), you’d need to sample a lower mip level for your textures, specifically 1 level lower in this case. (A rough sketch of both steps follows right after this list.)
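To make both steps concrete, here’s a rough TypeScript-flavoured sketch. The names and conventions (jitter stored in pixels, screen UVs reconstructed from the rasterized pixel position) are assumptions for illustration, not Shade’s actual API:

```ts
// Minimal sketch, not Shade's actual code. Assumes jitter is stored in pixels
// and that screen-space UVs are reconstructed from the rasterized pixel position.

interface FrameInfo {
  jitter: [number, number];     // current sub-pixel jitter offset, in pixels
  resolution: [number, number]; // render resolution, in pixels
}

// 1. Take the jitter back out before turning a pixel position into a screen UV,
//    so anything derived from it stays put while the camera jitters.
function unjitteredScreenUV(pixel: [number, number], frame: FrameInfo): [number, number] {
  return [
    (pixel[0] - frame.jitter[0]) / frame.resolution[0],
    (pixel[1] - frame.jitter[1]) / frame.resolution[1],
  ];
}

// 2. Bias the mip level. If we pretend TAA resolves roughly S× the detail of a
//    single frame, we sample log2(S) mips lower (a negative bias = sharper).
function taaMipBias(assumedSupersampling: number): number {
  return -Math.log2(assumedSupersampling); // e.g. 2 -> bias of -1
}
```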

I’m doing both things, but I’ve improved the first point slightly, which resulted in more stable UVs. Here are pictures before and after:

after:

If we zoom in on the bricks at the back, the difference is very clear


after

or the lion’s head



For comparison, I also tested it with text against three.js, in as close a match as I could get.

Shade:

Three.js


A few closer crops, shade first, three.js second





The differences might be subtle, but the TAA result is better. There’s less blurring, and the textures are sharper.

Native MSAA in WebGL (three.js) is a great technique and has a significant advantage over post-processing techniques like TAA in terms of how much information it has available. But we can beat MSAA with enough mathematics thrown at the problem.

Here’s another more realistic shot




Note the pavement detail and the tread on the front tire

6 Likes

Another demo

A few minor improvements:

  1. TAA filtering speed improved, overall perf up ~20%
  2. TAA color clamp modified slightly, giving better stability with less ghosting, basically exploiting the perceptual weighting of YCoCg space (sketched below)
  3. SSR combine stage improved, making colors more vivid and closer to being physically accurate
  4. Fixed a few minor shading bugs
  5. Integrated infinite far plane
  6. Improved hashed alpha performance by ~20-40% depending on the hardware, skewing more towards bigger gains on the lower end
  7. Separated background velocity calculations for TAA, which makes it rock-solid now, previously it would smear ever-so-slightly under motion.
  8. Integrated STBN into the engine
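For the curious, point 2 boils down to something like the following sketch of a standard YCoCg neighborhood clamp. Illustrative only, not Shade’s exact shader, and written here as plain TypeScript for readability:

```ts
// Illustrative YCoCg neighborhood clamp for TAA history. Luma (Y) and chroma
// (Co, Cg) get separate bounds, which lets the clamp treat perceptually
// important luma differently from chroma.

type Vec3 = [number, number, number];

function rgbToYCoCg([r, g, b]: Vec3): Vec3 {
  return [0.25 * r + 0.5 * g + 0.25 * b, 0.5 * r - 0.5 * b, -0.25 * r + 0.5 * g - 0.25 * b];
}

function yCoCgToRgb([y, co, cg]: Vec3): Vec3 {
  return [y + co - cg, y + cg, y - co - cg];
}

// Clamp the reprojected history sample to the AABB of the current frame's
// 3x3 neighborhood, computed in YCoCg space.
function clampHistory(history: Vec3, neighborhood: Vec3[]): Vec3 {
  const min: Vec3 = [Infinity, Infinity, Infinity];
  const max: Vec3 = [-Infinity, -Infinity, -Infinity];
  for (const rgb of neighborhood) {
    const s = rgbToYCoCg(rgb);
    for (let i = 0; i < 3; i++) {
      min[i] = Math.min(min[i], s[i]);
      max[i] = Math.max(max[i], s[i]);
    }
  }
  const h = rgbToYCoCg(history);
  const clamped: Vec3 = [0, 0, 0];
  for (let i = 0; i < 3; i++) clamped[i] = Math.min(Math.max(h[i], min[i]), max[i]);
  return yCoCgToRgb(clamped);
}
```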

I realized that I’ve been putting out demos of various things that weren’t very representative of what people are actually doing, so I hope this will be more useful.

The cool thing here is that there is 0 configuration. All the demo does is load the GLTF for Sponza and place the camera, that’s it. Everything else is automatic, from shadows to post-processing.

One more thing: I tried a whole bunch of techniques while going through some SIGGRAPH papers from the past, and decided to try the EON (Energy-compensated Oren-Nayar) diffuse BRDF. Here are the results:

And here’s the Burley

Burley is darker, especially at glancing angles; check out the green curtain in the bottom left of the screen.

I like EON, but without introducing more parameters like fuzz (sheen) or a subsurface term - I feel Burley is better, so I’m keeping it for now.
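For reference, the Burley (Disney) diffuse term I’m comparing against looks roughly like this. A textbook-style sketch, not necessarily the exact code in Shade:

```ts
// Standard Burley/Disney diffuse sketch. NdotL, NdotV, LdotH are the usual
// clamped dot products; all vectors are assumed normalized.
function burleyDiffuse(NdotL: number, NdotV: number, LdotH: number, roughness: number): number {
  const fd90 = 0.5 + 2.0 * roughness * LdotH * LdotH;
  const lightScatter = 1.0 + (fd90 - 1.0) * Math.pow(1.0 - NdotL, 5.0);
  const viewScatter = 1.0 + (fd90 - 1.0) * Math.pow(1.0 - NdotV, 5.0);
  // Multiply by albedo and NdotL in the actual lighting loop.
  return (lightScatter * viewScatter) / Math.PI;
}
```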

4 Likes

Worked on texture quality again.

I’m already performing texture subsampling via TAA, which gives improved texture clarity, but there’s something else I wanted to do for a long while - rework mipmap generation.

Normally, when we generate mipmaps - we lose detail, because the standard mipmap generation process is essentially a box blur.

We take 4 pixels and reduce them to one by just averaging their values.

What happens is we lose detail: the more mips in the chain, the more detail we lose.

Modern engines use smarter mipmap generation methods; both Unity and Unreal use the Mitchell-Netravali filter for mipmaps. This filter preserves detail during mip generation.
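For the curious, the Mitchell-Netravali kernel itself is tiny. Here’s a sketch with the usual B = C = 1/3 parameters; the idea is that the downsampler weights source texels with this kernel instead of a flat 2x2 average:

```ts
// Mitchell-Netravali reconstruction kernel (B = C = 1/3 by default).
// x is the distance from the destination texel center, in source texels.
function mitchell(x: number, B = 1 / 3, C = 1 / 3): number {
  const ax = Math.abs(x);
  if (ax < 1) {
    return ((12 - 9 * B - 6 * C) * ax ** 3 + (-18 + 12 * B + 6 * C) * ax ** 2 + (6 - 2 * B)) / 6;
  }
  if (ax < 2) {
    return ((-B - 6 * C) * ax ** 3 + (6 * B + 30 * C) * ax ** 2 + (-12 * B - 48 * C) * ax + (8 * B + 24 * C)) / 6;
  }
  return 0;
}
```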

Here’s a simple visualisation of the standard mipmap generation results

And here’s Mitchell

source : Guest_Jim_

The end result is not massive, but it’s noticeable. Here’s a scene with books, using 4K textures:

If we zoom in, we can have a look at what the Mitchell filter looks like:

Here’s what that looks like with the linear (old) filter:


The differences are a little subtle, but the end result is that textures are less blurry.


Here’s a comparison with three.js, the best match I could get.

Three.js

Shade + Mitchell


Three.js

Shade + Mitchell


Three.js


Shade + Mitchell


Quite happy with the result. It always bothered me that we have all those beautiful pixels in the source texture, but we’re losing some clarity on-screen.

Plus, this new addition is basically free. Mipmap generation does take a little longer, but rendering performance is completely unaffected.

2 Likes

Reworked pipeline state management.

There’s an interesting aspect to GPU-resident drawing: you can draw as much as you like in a single call, BUT you can’t switch rasterization state. That is - you can’t go from drawing triangles to drawing points, you can’t change the winding order of triangles, and you can’t change which triangle side gets culled.

All of these are often referred to as “pipeline state”. Well, it’s part of the state, but whatever.

Engines like Northlight and RE Engine (Capcom) explicitly bin draws by pipeline state. This is not particularly novel.
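A minimal sketch of what that binning looks like, with made-up types rather than the engine’s actual API:

```ts
// Hypothetical sketch of binning draws by rasterization state. Each unique
// combination of topology / winding / cull mode gets its own bucket, and each
// bucket becomes one GPU-resident multi-draw.

interface RasterState {
  topology: "triangle-list" | "point-list" | "line-list";
  frontFace: "ccw" | "cw";
  cullMode: "none" | "front" | "back";
}

interface Draw {
  state: RasterState;
  meshId: number;
}

function binDrawsByState(draws: Draw[]): Map<string, Draw[]> {
  const bins = new Map<string, Draw[]>();
  for (const draw of draws) {
    const key = `${draw.state.topology}|${draw.state.frontFace}|${draw.state.cullMode}`;
    let bin = bins.get(key);
    if (!bin) bins.set(key, (bin = []));
    bin.push(draw);
  }
  return bins; // one indirect multi-draw per bin
}
```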

I wanted to be clever, so I imposed a constraint that only “Front”-facing triangles are allowed, which works in theory. Because if you need “Back” - you can just duplicate geometry and flip normals, and if you need “DoubleSided” - you can always duplicate and flip - et voila.

This was costing more performance than I was happy with, so I ended up expanding the rasterizer somewhat to do the same thing as everyone else. On one of the test scenes I use for measuring performance, the “Emerald Square”, I got a nice 33% FPS boost as a result, because I don’t need to duplicate triangles anymore.

It doesn’t look like much, but it’s a pretty challenging scene. Especially this view. First, there are a lot of instances:

There are 2500 instances in total, and most of them are concentrated on the … Square, the one that’s green, you could even call it “Emerald”

We have a ton of alpha-tested foliage, which is a performance killer due to the discard usage. This, for some reason I still don’t quite understand, disables EarlyZ optimization on the GPU.

Finally, we draw 3 cascades of shadows, which is essentially drawing the scene 3 times on top of the main draw

Sounds wasteful perhaps, but it’s the standard in pretty much every game engine to date.

Anyway, all of this used to be ~90 FPS on my hardware, and now it’s a solid 120 FPS, which is about a 33% increase. Not magic, just fewer triangles.

Incidentally, here’s what that scene looks like in Three.js

Three.js takes ~2.74ms to render this

If we add 3 cascades of shadows, it would be reasonable to expect about 4x the time, around ~10.96 ms, totalling around 91 FPS. I’m pretty happy with this gap.

Considering I’m doing more:

  • SSAO + bent normals
  • SSR
  • Bloom
  • TAA
  • various denoising passes
3 Likes

Tried contact shadows, once again.

A while back I had full RTX shadows as the default. Then, inspired by the Unreal guys, I thought I would cast some rays in screen space and only fall back to RTX if the ray is invalid.

This was a mixed bag back then, as in truth no ray cast in screen space is trustworthy. You always get artifacts; it’s just a question of what geometry it takes to expose them.

Here are a few shots from back then, first full RTX shadows, second just screen-space rays








In the end, I was unsatisfied with the artifacts, so I shelved the idea.

Recently I was working more on SSR, and I thought I would add some shadow micro-detail to existing CSM shadows. This is a pretty well-known technique. The idea is the same as what I did in the past, but instead of full rays - we trace just a few pixels, just enough to restore high-frequency detail in shadow.

The technique is commonly known as “contact shadows”. Here is my latest take on this, using a linear tracer. The shadows here are purely screen-space.
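The core of a linear screen-space tracer is small. Here’s a rough sketch of the idea; the buffer access and coordinate conventions are made up for illustration, not Shade’s actual shader:

```ts
// Rough sketch of screen-space contact shadows with a linear tracer. Marches a
// handful of pixels from the shaded point towards the light in screen space;
// if any marched sample ends up behind the depth buffer by more than a small
// threshold, the point is considered shadowed. `sampleLinearDepth` is hypothetical.

function contactShadow(
  startUV: [number, number],
  stepUV: [number, number],   // per-step offset towards the light, in UV space
  startDepth: number,          // linear depth of the shaded point
  depthStep: number,           // how much the ray's depth changes per step
  steps: number,               // e.g. 16 pixels worth of marching
  thickness: number,           // tolerance to avoid false occlusion
  sampleLinearDepth: (uv: [number, number]) => number,
): number {
  let uv: [number, number] = [startUV[0], startUV[1]];
  let rayDepth = startDepth;
  for (let i = 0; i < steps; i++) {
    uv = [uv[0] + stepUV[0], uv[1] + stepUV[1]];
    rayDepth += depthStep;
    const sceneDepth = sampleLinearDepth(uv);
    // Something in front of the ray and close enough to it -> occluded.
    if (sceneDepth < rayDepth && rayDepth - sceneDepth < thickness) {
      return 0.0; // fully shadowed
    }
  }
  return 1.0; // unshadowed
}
```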





If you look closely, you can probably spot artifacts quite easily. But the rays are pretty reliable close-up; the further we trace, the higher the chance you’ll see something janky.

Here’s what we get if we combine CSM with 16 pixels worth of contact shadows


Most of the time the technique is quite subtle; you will not notice it even if you look for it, but it does add extra detail.

Here’s an example, one of the screenshots is with contact shadows and the other one isn’t



Unless you’re some kind of a visual ninja, you probably can’t tell which is which. The first one is with contact shadows, the second is without.

Again, but zoomed in



The nice thing about this technique is that it’s very cheap, especially in a deferred pipeline like mine, where we only shade once.

2 Likes

Was curious what adding screen-space shadows to every light would look like.

So here are a few shots. There are no shadow maps here except for the directional light (sun)







Looks pretty, but there are too many artifacts without a proper world-space shadow to fall back to. Contact shadows make sense because they fill in the gaps left by the shadowmap.

3 Likes

Got back to working on GI. The plan is to have an offline mode, where you will be able to bake probes and then load them together with the scene.

So far, almost everything is done. I was working on a massive refactor of how indirect lighting is added to the final image in preparation for this.

Here’s how it looks now:

  • We prepare indirect diffuse. This can be straight from the environment map or from probes, or anything else really. Right now it’s probes or the environment.
  • We prepare indirect specular. This is separate as well, and a bit more complex, as we get specular from SSR with a fallback to probes. However, the probes really do have irradiance, so the fallback is actually really high quality.
  • Finally, we resolve indirect using the split-sum approximation and combine it with direct lighting (a rough sketch of this combine step follows below).
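The combine step is essentially the textbook split-sum form. A hedged sketch, with the pre-integrated BRDF lookup reduced to a scale/bias pair (illustrative, not the engine’s exact resolve pass):

```ts
// Sketch of the indirect resolve using the split-sum approximation.
// `envBRDF` stands in for the usual pre-integrated BRDF LUT lookup (scale and
// bias for F0), indexed by NdotV and roughness.

type Vec3 = [number, number, number];

function resolveIndirect(
  indirectDiffuse: Vec3,   // irradiance from probes or the environment
  indirectSpecular: Vec3,  // prefiltered radiance from SSR, falling back to probes
  albedo: Vec3,
  f0: Vec3,
  envBRDF: { scale: number; bias: number },
): Vec3 {
  const out: Vec3 = [0, 0, 0];
  for (let i = 0; i < 3; i++) {
    const diffuse = indirectDiffuse[i] * albedo[i];
    // Split-sum: prefiltered radiance * (F0 * scale + bias).
    const specular = indirectSpecular[i] * (f0[i] * envBRDF.scale + envBRDF.bias);
    out[i] = diffuse + specular; // combined with direct lighting afterwards
  }
  return out;
}
```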

As a refresher, here’s what the irradiance probe data looks like:

It’s packed into an atlas at the desired resolution, currently 32x32 pixels per probe.

How can you pack a probe into a square texture?

I hear you ask. We use octahedral encoding.
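Octahedral mapping folds a unit direction onto a square. The encode side looks roughly like this (a standard formulation, not necessarily the exact code in Shade):

```ts
// Standard octahedral encoding: maps a unit direction to [0, 1]^2.
// The lower hemisphere is folded over the octahedron's edges.
function octEncode(dir: [number, number, number]): [number, number] {
  const [x, y, z] = dir;
  const invL1 = 1.0 / (Math.abs(x) + Math.abs(y) + Math.abs(z));
  let u = x * invL1;
  let v = y * invL1;
  if (z < 0) {
    const ou = u;
    u = (1 - Math.abs(v)) * Math.sign(ou || 1);
    v = (1 - Math.abs(ou)) * Math.sign(v || 1);
  }
  return [u * 0.5 + 0.5, v * 0.5 + 0.5]; // texel coordinate within the probe's tile
}
```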

And to show how good the specular indirect is, here’s a Sponza shot rendering indirect specular with just the probes.

You’d be hard-pressed to find any issues in the reflections. The probe placements are not perfect, and you can see some artifacts here, but in the final combined render you will not find them.

For reference, here are a few shots with and without GI




5 Likes

Have you considered comparing the rendering effects of web 3D engines like babylon.js and playCanvas.js?

1 Like

Seems like that would take time away from developing his awesome engine? What do you need to know about the differences, perhaps I can help?

1 Like

Yep, pretty much exactly that. It’s embarrassing to say, but every time I benchmark the engine against something visually or in terms of performance, it takes hours. But I will attempt, at least a little :sweat_smile:

Lighting is not standardized, cameras are not standardized, GLTF support is all over the place (pretty much everyone implements materials differently, and geometry too in subtle ways).

The problem with engines like playcanvas and babylon in particular is that their support for things like post-processing and GI is spotty.

playcanvas has lightmaps. I haven’t tried the implementation, but it’s largely on the user to supply properly UV-unwrapped models. And lightmaps are flat; there is no specular response. As far as the BRDF goes, it’s pretty middle of the road, last I checked: not cutting-edge, but nothing incredible either.

babylon keeps re-inventing itself; new features pop up and die. Currently there is a plethora of exciting features, like:

  • Reflective Shadow Maps
  • IBL shadows
  • SSR (screen-space reflections)

The problem is in support and integration. If you say “okay, I like all of the above, can I have this?”, the answer would be “no”: each technique works in isolation, without being aware of the others. And you can’t just mix them, sadly.

Then, there’s the question of AA (anti-aliasing). Once you go for any kind of post-processing, you’re pretty much forced to throw away MSAA (native AA).

Case in point, here’s the reflective shadow map demo from Babylon:

If we zoom in a bit, we start to see artifacts

If we disable FXAA, which is a pretty horrible AA to begin with, we get to see what we’re working with:

Which is jagged

Let’s take a look at the SSR example

Again, let’s zoom in a bit


I’m not even sure what exactly they do here, but the aliasing is bad, really really bad. If you’re in doubt, consider that the rims have sharp lines going radially.

In the screenshots above they are completely destroyed, and under the slightest camera motion there’s just a mess of pixels in there.

Now, with all that, there is a solution, and that’s TAA. However, TAA is an intrusive technique; it requires your entire rendering pipeline to be aware of TAA. Here’s Babylon’s:

I won’t bore you with details, but it’s not a good implementation. Solid TAA is hard to do. Now, I’m stupid, so it took me collectively over a year to tune it to what it is currently in Shade, but even for smart people it’s a very hard problem to solve. And I dare say that I have solved it. And I can say with confidence that Babylon.js did not.

Where does this bring us? Can something like playcanvas look as good as Shade? Without a static solution like lightmaps - no. With lightmaps, close, but only for fully diffuse (roughness = 1) materials. And only for fully static scenes. Once you start having some specularity in the scene, reflections will be very apparent and will dominate your perception.

Can babylon.js look as good as Shade? No. You can import lightmaps and be in a similar situation to playcanvas, you can add SSAO (which is not very good in Babylon), you can add SSR and IBL shadows and TAA and RSM, then you’ll need to spend a long time integrating all of these to produce 1 sensible image output, and then… then you will still not be there, because there is no light bounce.

I can say with certainty that Unreal looks better than Shade, but Unreal looks better than everything else :sweat_smile:

I wrote Shade from the ground up targeting top-of-the-line visuals, so it’s not exactly fair to compare Shade with the likes of playcanvas or Babylon.js on that ground alone. Shade was written as a deferred renderer from the start, with TAA integrated from the start, and with indirect lighting being a major part of the engine, as well as post-processing.

It’s not even that playcanvas or babylon, or even three.js, don’t look as good as something else, like Shade or Unreal or Unity HDRP; it’s more that they can’t. They can’t compete on high-end visuals because of the architecture of the engine itself.

You can certainly build something on top of, say, three.js, that will look as good as Unreal. Seriously - you can. It would be hard work, but you could. You would be fighting the architecture along the way, and you just might be better served writing it from scratch.

It may sound bitter, but that’s 100% not the intention. I’ve been there, I’ve been writing that exact renderer on top of three.js for about a decade I suppose, and Shade is just that natural next step. Where I said to myself “WebGPU is a good-enough reasons to start from scratch”, and I happened to have the time.

To leave something a bit more tangible, here are 3 GLTF viewers from 3 respective engines:


three.js



Babylon.js



playcanvas


And here’s Shade


There are different FoVs, the lighting setup differs, etc. But these are all “defaults”. I.e. I didn’t do anything special in Shade, I just loaded a GLTF and positioned the camera; shadows are automatic, GI is automatic, all post-processing is dynamic. There is no manual tuning.

7 Likes

That last screenshot… a picture is worth a thousand words. Any chance to increase the gamma a bit?

On the topic of WebGPU: would three or Babylon stand in the way of implementing something as good as Unity, or would they actually help? My understanding is that the API is so low-level that you simply cannot optimize for, say, both a game engine and a generic rendering library. Thus, it would be next to impossible if not building it from scratch with your own purpose in mind. I always saw the examples as starting points; the idea is that one would pick and choose and adjust according to needs.

Also, not sure if you’re aware, but people are linking to this thread when explaining what three does with WebGPU. It might not be clear that we aren’t actually looking at images created with three? We aren’t, unless it’s a comparison?

By all means, if you’ve done this with WebGPU there is no reason Babylon or three wouldn’t? But the philosophy, architecture etc, may not necessarily guarantee that this is the direction that either is taking. Perhaps for the ease of use, they might have to make compromises.

A good example might be: would I use Shade to visualize point clouds or some volumes? Do I need anything more than Lambert, for example, to render an isosurface, or this whole engine to apply some kind of clipping on said isosurface? I imagine the answer is no. Three r68 or so would probably be perfect.

4 Likes

I noticed all Babylon demos have the equivalent of Three.js renderer.setPixelRatio(1) regardless of whether the display is actually 2 or 3, so the web browser’s native renderer is spreading large pixels across multiple hardware pixels.

Would it be more fair to run those examples with proper devicePixelRatio? How does it look if you add engine.setHardwareScalingLevel(1/devicePixelRatio) to the Back to the Future SSR example?

1 Like

Looks better

But it’s still there.


Unrelated, but I’ve been working on GI the last few days, and thought I’d share a fly-through of the famous UE4 “Sun Temple” scene.

2 Likes

Yeah, does look a bit poppy!

The GI is looking good! There’s one thing that my eye picked up on: at 0:18 the light seems to penetrate through the pot on the floor. I see it happen often with default Three.js: lighting reflects off all surfaces, even those that are behind another surface such that light can’t reach them directly. Sometimes it becomes obvious and the human brain picks up on it. How might that be solved?

Ha, that’s a good catch. It’s actually nothing to do with GI per se, it’s just the lack of shadowing on the point lights. That brazier has a pretty big but dim light attached to it.

As far as the GI goes, it’s so complex it would take hours to explain everything. But I will try to just explain the relevant bits.

So… GI. It’s largely based on the Gentle Light Probing write-up.

This is DDGI (Dynamic Diffuse Global Illumination). How do we deal with light bleeding?

There are 5 broad parts:

  1. GI probes are split into cells, cells provide locality.
  2. Each probe has its own depth map, so we have pretty accurate visibility to any given probe.
  3. When weighing probes, we take the surface normal into account to check for visibility, i.e. “Is this surface visible from the probe’s perspective?” (see the sketch after this list)
  4. We apply parallax correction when sampling. This is broadly based on “Local Image-based Lighting With Parallax-corrected Cubemaps”.
  5. But we perform additional refinement here, similar to how parallax occlusion maps work.
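To give a flavour of points 2 and 3, the per-probe weight ends up looking something like this simplified, DDGI-style sketch. The names and the exact visibility test are illustrative, and the real thing also folds in the trilinear cell weights and the parallax correction from points 4 and 5:

```ts
// Simplified sketch of weighting a single probe, DDGI-style. `probeDepth` and
// `probeDepthSq` stand in for the per-probe depth map (mean and mean-squared
// distance along the direction towards the surface).

type Vec3 = [number, number, number];

const sub = (a: Vec3, b: Vec3): Vec3 => [a[0] - b[0], a[1] - b[1], a[2] - b[2]];
const dot = (a: Vec3, b: Vec3) => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
const len = (a: Vec3) => Math.sqrt(dot(a, a));
const normalize = (a: Vec3): Vec3 => { const l = len(a) || 1; return [a[0] / l, a[1] / l, a[2] / l]; };

function probeWeight(
  surfacePos: Vec3,
  surfaceNormal: Vec3,
  probePos: Vec3,
  probeDepth: number,    // mean distance stored by the probe for this direction
  probeDepthSq: number,  // mean squared distance (for the Chebyshev test)
): number {
  const toProbe = sub(probePos, surfacePos);
  const dist = len(toProbe);
  const dir = normalize(toProbe);

  // Point 3: back-face weight. Probes behind the surface contribute nothing.
  const backface = Math.max(dot(dir, surfaceNormal), 0.0);

  // Point 2: visibility via a Chebyshev test against the probe's depth map.
  let visibility = 1.0;
  if (dist > probeDepth) {
    const variance = Math.max(probeDepthSq - probeDepth * probeDepth, 1e-4);
    const d = dist - probeDepth;
    visibility = variance / (variance + d * d);
  }

  return backface * visibility;
}
```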

The last point about refinement is something I was working on in the past couple of days.

To try and help visualize this, here’s a test scene called “PicaPica” lit by the direct+GI

Here are the probe locations

There are ~2500 probes in the scene overall.

If we isolate just the radiance from the probes, we get this

It’s not super accurate, but consider that each pixel samples from its own set of 4 probes, and we do points 1 through 4 here to make sure we don’t “leak” light.

Now here’s the last step, step 5, the refinement of parallax correction

Pretty dramatic improvement

And here’s a path-traced reference with mirror reflections


To show it from a different angle, here’s a famous Sibenik cathedral model lit by just the rays of light getting through the tower windows

And here’s a “Breakfast room” scene

lit by just the light streaming through the blinds.

Here’s a different angle

Note the reflection of the windows in the glossy surfaces of the jug and the teapot, and the shadowed interior of the jug.


Back to the light shadowing in the temple scene

It definitely needs a shadow. I’ve been putting off doing omni-directional shadows and I should get them done sooner rather than later :sweat_smile:

3 Likes

Why does it feel like all these images would benefit from increased gamma?

This reminds me of the arch viz world pre “linear workflow”. Exteriors just kinda worked, they might even benefit from higher contrast, but interiors benefited immensely from fiddling with the gamma.

1 Like

Do these feel better?




I’ve been using boosted exposure previously, same as what three.js does: boosting exposure by about 2.2 (not the same as gamma).

It compresses the color a bit more in the tonemapper and you get brighter highlights and more midtones.

The reason I stopped using it is that it blows out the colors and forces compression in the tonemapper sooner than you’d generally want. But I’m curious if this looks better to everyone?

2 Likes

Much better, I would say.

1 Like

Better but not by a whole lot :frowning:

It feels like this:

It looks like the info is there, just not very visible.