Shade - WebGPU graphics

@Usnul

Have you had a look at the new WebGPURenderer / NodeMaterial / TSL architecture?

I feel like all the stuff you’re working on should be able to work well in the new setup.

For example, here’s the new SSRNode:

https://raw.githack.com/mrdoob/three.js/dev/examples/index.html?q=ssr#webgpu_postprocessing_ssr

5 Likes

@mrdoob

I’ve had a look at all of those, didn’t see the SSRNode before though. I do have some thoughts.

WebGPURenderer

I think in general the approach is sound, and I like how minimalist a lot of the code is. It’s very few lines for the benefit that the API provides. I admire that.

To me, the issue is in the overall architecture. Three.js is going for a fairly traditional rendering architecture with WebGPU. Which is fine, and you can build other architectures on top of it, but that’s not what I wanted to build.

My goal was to build a GPU-resident renderer, something that can render millions of instances in real-time without much CPU overhead.

Let me paint a rough picture:
Around 2014, DirectX 12 arrived, providing a much lower-level graphics API.
Vulkan came soon after, and now we also have Metal. The primary problem that these APIs set out to address was this:

pushing commands to the GPU is slow, let’s make it faster.

To do that, command queues were introduced and we were given the ability to record command buffers in different threads. This was huge, you could now push almost 2 orders of magnitude more commands to the GPU each frame. The “bottleneck” of CPU ↔ GPU communication was widened.
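
To make that concrete in today’s WebGPU terms (same model, just a single queue exposed to JavaScript), recording and submission look roughly like this, assuming you already have a device, a pipeline and a render pass descriptor:

    const encoder = device.createCommandEncoder();

    const pass = encoder.beginRenderPass( renderPassDescriptor );
    pass.setPipeline( pipeline );
    pass.draw( 3 );
    pass.end();

    // nothing has reached the GPU yet, we've only recorded a command buffer
    const commandBuffer = encoder.finish();

    // submission is the point where the CPU actually hands work to the GPU queue
    device.queue.submit( [ commandBuffer ] );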

At the time, it was believed that this widening would solve the issue for good, but GPUs got faster and CPUs didn’t really.

As AI started to adopt GPUs and take off, money started to pour into manufacturers’ hands from clients who wanted to do general-purpose compute on the GPU. They didn’t need “shaders”, they needed compute. So compute shaders started to develop and, with time, dominate.

Graphics programmers saw this happening, so they started to move more and more traditionally CPU-based workloads to the GPU. A lot of that trend was driven by the console market, as console hardware incorporated compute more readily and the architectures were more… fluid: you had a bunch of not-quite-CPU cores that could do a lot of work, but required a special programming model, much like compute shaders.

Let’s fast-forward to today: a lot of graphics engines are idling on the CPU side, there’s little for them to do there, as most of the work is happening on the GPU. All you do is build descriptors of what changed, and sometimes even that is handled on the GPU.

With this, the paradigm of drawing one object at a time and sending a bunch of commands to the GPU for each object is dying. I don’t think it’s dead; graphics APIs like DirectX, OpenGL, Vulkan etc. are still centered around the concepts of attributes, indices and the traditional draw call. But these are getting less and less use.

Granted, a GPU-resident renderer is much harder to build: there’s more complexity, there’s less educational material, and the APIs allow you to do it but don’t really encourage or help you in any specific way.

So, long story short - WebGPURenderer the way it is now, and the direction it’s going is not compatible with what I’m doing. On a pretty fundamental level.

Three.js is fast enough, it’s much faster to get into, and the programming model is very clear and straightforward. I’m creating something arguably not for today, but for 2+ years from now.

NodeMaterial

I like the idea. I think node-based languages are really powerful. I’ve written a few in my day. Heck, meep has a few node-based languages inside it.

The problem I see with node-based languages is the user interface. A node-based language can be useful as an API, but only if it is sufficiently high-level; a low-level node-based API is just pain with no gain. You can sort of see it in Unreal and Unity: they have node-based shaders, but they offer very complex nodes for you to use there. I have used Unity’s shader editor a fair bit, and I came to realize that it’s a massive pain, because it doesn’t offer a good user interface. The UI is slow, there’s no search, grouping is non-existent, etc. So, in my view, a node-based language without a great UI is a bad investment.

Can you write an SSR shader in a node-based language? Yes, as evidenced by @Mugen87’s work that you linked. But:

  • is it clearer than GLSL?
  • is it more concise than GLSL?
  • does it offer lower system complexity?
  • is it easier to learn?
  • does it compile faster?

I know I’m cherry picking here, but I hope this clarifies my view a bit.

Do I think GLSL, or even WGSL, is great? No, I think they are pretty bad languages. WGSL especially is a massive pain. But it’s a standardized pain, with a lot of reference material. I was missing a module system in WGSL, so I wrote a bare-bones one, like so:

    /**
     * @param {string} code - WGSL source of this chunk
     * @param {CodeChunk[]} [dependencies] - chunks whose code this chunk relies on
     * @returns {CodeChunk}
     */
    CodeChunk.from = function ( code, dependencies = [] ) { /* ... */ };
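
To give a sense of how it’s used (the WGSL below is just an illustration, not actual Shade code):

    const colorUtils = CodeChunk.from( `
        fn luminance( c: vec3<f32> ) -> f32 {
            return dot( c, vec3<f32>( 0.2126, 0.7152, 0.0722 ) );
        }
    ` );

    const tonemap = CodeChunk.from( `
        fn tonemapReinhard( c: vec3<f32> ) -> vec3<f32> {
            return c / ( 1.0 + luminance( c ) );
        }
    `, [ colorUtils ] );

    // emitting the final WGSL is then just a walk over the dependency graph,
    // with de-duplication, so shared chunks are only included once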

This isn’t perfect, but it’s good enough for me. I looked at a bunch of different language abstractions for WGSL in particular, or, let’s say, SPIR-V; the problem is that they all sacrifice expressiveness and specificity for the sake of compatibility. Compare TypeScript: it is dominant partly because it compiles to exactly 1 language, JavaScript, so it’s able to capture every aspect of JavaScript perfectly. As soon as you start to target multiple languages in translation, you’re playing a losing game.

Here’s a simple example: Unreal Engine can target WebGL as a compile target. And it works! It looks like :poop: though. Why is that? Do the folks at Epic not know how to use a graphics API? It’s because the compiler is forced to target the lowest common denominator; it doesn’t really know about WebGL in particular. It targets OpenGL ES 3.0 and then disables all features that don’t translate directly to WebGL.

There are some successful examples out there, such as C or LLVM. But they have the benefit that C wasn’t translating into an existing high-level language at the time, and it actually provided more conciseness and expressiveness, while LLVM is not oriented towards programmers in the first place.

I think the NodeMaterial concept is not bad, and it can be great, but it needs to come with an amazing set of tooling, specifically a UI.

TSL - I don’t dislike it, but I don’t love it. It’s WGSL with extra steps, and you’re using it as a declarative language written in a functional language (JavaScript), which makes it awkward on top of what I mentioned before.

SSRNode

I’m not sure whether this point is specifically fair or not, but SSRNode is a toy. If you read the original code you can see the step-wise ray marching through the depth buffer and the basic denoising with the edge-preserving blur pass, but it’s not a practical tool.

The traversal is way too slow, and it considers everything to be a mirror surface. You can use it to produce some pretty pictures, but it’s not physically grounded in the slightest, it doesn’t respect StandardMaterial and it’s not energy-preserving.

I don’t think that’s a problem; it’s a good teaching tool, and there are use cases where everything is glossy and it behaves close to “realistic”.

7 Likes
  • What if at some point Three.js evolves by adding GPU-based meshes, scenes and other entities?
    Mesh → InstancedMesh → GPUMesh
    i.e. InstancedMesh standing somewhat halfway between a traditional CPU mesh and an eventual future GPU mesh.
  • What if at some point CPUs and GPUs themselves merge into one processor, similarly to how FPUs are nowadays absorbed by CPUs?
  • What if AIPUs become so dominant that the majority of seasonal programmers start to ignore the mere existence of CPUs and GPUs?

Isn’t the point of a system like this to just be able to add a node and have it work automagically? It doesn’t seem that the target user is concerned with any of this?

I wouldn’t be surprised if there were a React solution today, such as:

<MyMaterial>
    <PBREffect/>
    <SSREffect/>
</MyMaterial>

Or what in the old days of three.js would look like:

myMaterial.ssr = true

This is the first take on TSL like this that I’ve seen, but you also seem to be doing much lower-level work than the people who have been raving about it so far. Any chance you could elaborate on these cons?

It might, and I think it would open up a new market for 3D on the web. It would make the engine less approachable though, and require some pretty good documentation. All of this would put a strain on development resources and increase the barrier to contribution.

That’s essentially what I have: there’s just a mesh, no instancing. It makes sense when all of your data is on the GPU and you’re building your draw calls there as well. Instancing basically becomes the default, and the overhead of single-instance draws disappears.

Ha, I like to think about the future, but honestly I’m not that good at guessing the hardware future. I think that we’re not going to have something radically different. It’s true that we’ve had dedicated hardware for various things over the years that has been absorbed into the CPU/GPU or even the chipset. But the overall architecture hasn’t actually changed that much. We still have ALUs, and we still have a tiered memory model. As long as there is distance between the ALU hardware and the data, time is going to be a factor, and moving code as close to the data as possible will remain a trend.

GPU-resident architecture is essentially an observation that your data is on the GPU, so instead of controlling the logic from the CPU, which is far away, why not control it directly on the GPU as well?
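
As a stripped-down sketch of what that means in WebGPU terms (buffer and pipeline names here are placeholders, and a command encoder, pipelines and bind groups are assumed to already exist - the real thing has a lot more plumbing): a compute pass decides what gets drawn and writes the draw arguments into GPU memory, and the render pass just consumes them, without the CPU ever knowing the counts.

    // draw arguments live on the GPU, the CPU never touches them per frame
    const drawArgs = device.createBuffer( {
        size: 5 * 4, // indexCount, instanceCount, firstIndex, baseVertex, firstInstance
        usage: GPUBufferUsage.INDIRECT | GPUBufferUsage.STORAGE
    } );

    // a culling / scene traversal compute pass fills drawArgs
    const cullPass = encoder.beginComputePass();
    cullPass.setPipeline( cullPipeline );
    cullPass.setBindGroup( 0, cullBindGroup );
    cullPass.dispatchWorkgroups( Math.ceil( instanceCount / 64 ) );
    cullPass.end();

    // the render pass doesn't know (or care) how much survived culling
    const renderPass = encoder.beginRenderPass( renderPassDescriptor );
    renderPass.setPipeline( drawPipeline );
    renderPass.setBindGroup( 0, drawBindGroup );
    renderPass.setIndexBuffer( indexBuffer, 'uint32' );
    renderPass.drawIndexedIndirect( drawArgs, 0 );
    renderPass.end();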

I find that many “educators” like to sensationalize the topics they describe. I think that’s a really sad fact, because things like GPU-resident architecture sound scary and arcane - but they are really trivial to understand, it just takes a few more words and a little bit more time. When we present topics as being “super complex”, it makes others think that there’s no hope, they are too dumb to ever understand it.

It’s true that some topics are very very complex, but typically it’s because there is a lot to learn, not because each individual thing is hard.

Yes, you’re right. However, when you do this, a shader still needs to be compiled, and TSL adds a lot of overhead to that compilation process. The TSL compiler is not nothing either, so you’ll have to pay the cost of (down)loading it as well. Now you have the same functionality, but your bundle is larger and your app takes longer to start.

That’s on the user’s side. On the programmer’s side you’re affected more directly; most of what I wrote was in that vein, about the developer experience.

In software engineering we work with abstractions. Your CPU runs instructions; those instructions come from assembling x86 assembly, which in turn was compiled from C++ code in the case of Chrome, for example.

In the same way, TSL is an abstraction on top of GLSL and WGSL. An abstraction is not free: you typically sacrifice some performance, because you lose some semantic power, and you add compilation/translation overhead.

It’s a trade-off. What does TSL give you for the negatives it incurs?

  • It lets you run your code on multiple graphics APIs, such as WebGL and WebGPU
  • It provides modules: you can split your code across multiple files, which helps the software development process by improving code reuse

I think that’s about it. You could argue that it provides static analysis as well, but I think this is really debatable, as Dawn (the WebGPU backend in Chrome) and driver vendors already do a ton of static analysis for you and have way more resources to do so than our three.js project here.

Again, I think nodes make sense, but not as an API, not in this case. ← only my opinion though.

3 Likes

Is 1 feasible without 2 in the real world? If one is using things that are available on WebGPU but not on WebGL, does that defeat the purpose of running on multiple APIs? Would one still assemble different programs, but by virtue of 2 this should be easier? (Provided one is willing to pay the price that you mentioned.)

Specifically on TSL, as people seem to care about the topic.

The idea of a “common” language for the GPU is not new at all. In many ways that’s what GLSL or HLSL are. They are common languages across vastly different GPU architectures.

The idea of having a language that compiles to multiple different API-specific languages like HLSL and GLSL is not new either. Most of the big companies out there do it; EA’s Frostbite engine, for example, has had a node-based shader language that compiles to HLSL and GLSL for a long, long time now.

The Unity guys thought to themselves that they are very clever, so they invented their own oh-so-special shader language. Even Godot has its own shader language.

NVIDIA created their own language not so long ago, called Slang. Well, if nothing else, I like their sense of naming.

AMD’s famous FidelityFX has a kind-of lightweight language abstraction called ffx, very unimaginative. It mostly targets GLSL and HLSL.

Unreal tried a few different options and settled on HLSL, using a cross-compiler toolkit from Intel to target other languages.

My point is - TSL seems like a new shiny thing, but only in the very narrow scope of a browser and WebGL/WebGPU APIs.


Just to make sure I don’t get more flak for this than I deserve: I don’t think that TSL is a bad idea. I think it’s an idea that needs many more pieces to be good, in my view.

4 Likes

I think you pretty much got it right. Imagine the case of using a compute shader. These don’t exist in WebGL, so what does this abstraction layer do for you?

It can do one of two things:

  1. Cry and fail
  2. Translate the compute shader into equivalent functionality using fragment shaders rendering to textures, or the transform feedback feature of WebGL

The first option is kind of useless, as it means you don’t really support both APIs, and the second has extra cost, as performance will likely be bad and you will be pulling in a massive engine just to correctly translate compute to WebGL.

How about atomics? WebGPU has atomics, WebGL does not. How do you reconcile that?

  1. You exclude atomics entirely
  2. You use the atomics API in WebGPU and just hope for the best without atomics in WebGL (spoiler: not going to work out well)

How about memory address spaces in WebGPU that give us groupshared memory?

You get the picture. You have two languages that are inherently different.
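
To make it concrete, here is the kind of WGSL that has no direct counterpart in WebGL’s GLSL ES 3.0 - workgroup-shared memory plus atomics (a toy histogram, purely for illustration):

    const histogramWGSL = /* wgsl */ `
        var<workgroup> localBins : array<atomic<u32>, 256>;

        @group(0) @binding(0) var<storage, read>       values : array<u32>;
        @group(0) @binding(1) var<storage, read_write> bins   : array<atomic<u32>, 256>;

        @compute @workgroup_size(256)
        fn main( @builtin(local_invocation_index) li : u32,
                 @builtin(global_invocation_id) gid : vec3<u32> ) {

            atomicStore( &localBins[ li ], 0u );
            workgroupBarrier();

            atomicAdd( &localBins[ values[ gid.x ] & 255u ], 1u );
            workgroupBarrier();

            // fold the workgroup-local result into the global bins
            atomicAdd( &bins[ li ], atomicLoad( &localBins[ li ] ) );
        }
    `;

You can emulate something like this in WebGL with multiple passes and blending tricks, but at that point it stops being “the same shader” very quickly.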


With that said, there are more commonalities than there are differences, and you can share a lot of code. So in your shaders you might be able to share, say, 95% of the codebase between WebGPU and WebGL. The problem now is that you have to have disclaimers like:

“This TSL module will only work with WebGPU”

Aaand we’re back to asking the question: Why not just write WGSL instead then?

4 Likes

( silently noting that the discussion now has the potential to slip off into off-topic )

2 Likes

That did not sound very silent lol. Is it regarding mrdoob’s post? I think it’s super valid to gather user feedback.

@Usnul thank you for the explanation!

Way to go @PavelBoytchev, this sounds not only feasible but also like a great deal.

I’ve been searching myself for GPU data structures as a more efficient way to work, especially for geometry. About a month ago I found RXMesh from Autodesk, and there are others exploring the same concepts. Somehow your post gives a very vivid picture of “totally doable ideas”, truly inspiring.

And let’s just appreciate @Usnul’s efforts in sharing his advances and receiving feedback, this forum is a pure gem.

6 Likes

RXMesh looks really promising! Just saw this as well: https://youtu.be/3EMdMD1PsgY?si=G2JXCCJQLPYVtiuu

2 Likes

Yep, that’s pretty much what Shade does as well. I watched that presentation and the referenced one from Remedy; my architecture is very close to theirs.

Mesh shaders are cool, but by many accounts compute shaders are close enough in performance that it makes little difference.

Also, you’d need the full-fat drawIndirect API to work with mesh shaders, which we don’t have in WebGPU. We have a severely lobotomized version, which is practically useless.

Therefore - compute shaders

3 Likes

I have dealt with this intensively. I made a PR last week about this. I owe thanks to Sunag and RenaudRohlinger for their suggestions for improvement and their review of it.
I worked very intensively with the node system to understand it better, because it was very important to me to have drawIndirect in the WebGPUBackend.
With this PR, three.webgpu.js r170 will support drawIndirect/drawIndexedIndirect. We then really have the opportunity to build INDIRECT draw buffers in compute shaders and assign them to geometries. TSL also always allows the option to use raw WGSL if one prefers that, like I do. In any case, I’m really looking forward to r170 with the drawIndirect extension :blush:
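
For anyone curious, the indirect arguments themselves are tiny: for an indexed draw it is five u32 values per draw record sitting in a GPU buffer, which a compute shader can then fill or patch on the GPU (indexCount here is a placeholder):

    // layout of one drawIndexedIndirect record: 5 x u32 = 20 bytes
    // [ indexCount, instanceCount, firstIndex, baseVertex, firstInstance ]
    const args = new Uint32Array( [ indexCount, 0, 0, 0, 0 ] );
    // a compute pass later bumps instanceCount for every instance that survives culling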

1 Like

A small update: added a bloom post-process effect.
Here’s a shot with some emissive materials:

and here it is with bloom on:

The light intensity is ~2.2 here, and emissive coefficient on the traffic light LEDs is ~10.

Here’s a shot with a different light direction, putting most of the scene in shadow:

and with bloom on

For comparison, here’s a version without any post-processing (SSR, SSAO, TAA, bloom all off):


and again with all post on:

Here’s a shot of Sponza with high directional light intensity


without:

Here’s the Bistro scene; I cranked up emissiveness to 10 on all emissive materials:


and here’s without


For those who are interested in details, the technique I use is a hybrid of a few different existing techniques.

  • The basic structure is the same as Unreal’s bloom (not the convolution one), that is, 5 mip levels with progressive blur.
  • The technique is fully HDR.
  • There is no threshold, we just add bloom on top; that’s also why the version with bloom looks brighter overall.
  • Blur and other filtering are mostly in line with the 2014 SIGGRAPH talk by Sledgehammer Games’ Jorge Jimenez.
  • I’m using 2 render targets for the actual work, one for downsampling and one for upsampling. The targets are 1/2 resolution, the first downsampling pass uses the source HDR image, and compositing just relies on the final 1/2-resolution upsampled bloom image and bilinear hardware filtering for upscaling. I’m guessing Sledgehammer did something similar.
  • The bloom implementation, like everything else in Shade, relies on the RenderGraph, so render targets are reused by other effects, such as SSR.
  • I fade out the coarser mip’s intensity during upsampling using 0.8 as a multiplier. No specific reason for this, just empirically derived to look good. This makes the “glow” fall off a little more sharply with distance. To my eye it’s more attractive for less gamey applications. A compromise.
  • There are only 2 parameters exposed, the number of mips and intensity. However, the number of mips has very little impact on performance, as we do less and less work with every mip. The screenshots are done with intensity = 1.

Because the filter size is quite small, with only 13 taps, the technique is almost free in terms of performance. The luma-weighted filtering helps get rid of temporal instability and fireflies.
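
If it helps anyone, the pass schedule looks roughly like this (helper names are made up, and each mip is shown as if it were its own target for clarity - in Shade it is really the two targets managed by the RenderGraph):

    // mips[ 0 ] is 1/2 resolution, every further mip is half of the previous one
    blurDownsample( sourceHDR, mips[ 0 ] ); // 13-tap filtered downsample
    for ( let i = 1; i < mipCount; i++ ) {
        blurDownsample( mips[ i - 1 ], mips[ i ] );
    }

    // walk back up, fading the coarser mip by 0.8 at every step
    for ( let i = mipCount - 2; i >= 0; i-- ) {
        // mips[ i ] += 0.8 * upsample( mips[ i + 1 ] )
        upsampleAdd( mips[ i + 1 ], mips[ i ], 0.8 );
    }

    // final composite: sceneColor + intensity * bilinearUpscale( mips[ 0 ] ), no threshold
    composite( sceneColor, mips[ 0 ], intensity );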

Overall I’m very happy with the results.

I was on the fence about bloom before, but I think if it’s implemented well - it does enhance the image in a very significant way, and virtually for free too.

6 Likes

A while ago you suggested you may have some live demos, have I missed them?

1 Like

Demo link

Controls

  • Mouse
    • left drag : Rotation
    • right drag : Pan
    • wheel : zoom
  • Keyboard
    • AD : pan left / right
    • WS : forward / back

Touch controls are incidental, but should work at least to some degree ( if you’re lucky enough to have WebGPU support on your touch-screen device)


What’s there:

  • Post process:
    • Screen-space ambient occlusion (GTAO)
    • Screen-space stochastic reflections
    • Temporal Anti-aliasing ( fallback to FXAA on failed reprojection)
    • Bloom
    • Standard ACES tonemap exposure and dither stuff
  • Soft RTX shadows
    • Spatial filtering is on, but temporal is off, still not sorted out
  • Full GPU-driven draw pipeline
  • Culling ( rough sketch after this list ):
    • Occlusion culling based on HZB
    • Progressive frustum culling ( instance > meshlet group > meshlet > triangle )
    • Small primitive culling ( elements that fall between texel centers would not be rasterized anyway, so we filter them )
  • Depth-buffer-based material evaluation (similar to stencil, but all materials at once instead of one stencil per material, see Unreal’s SIGGRAPH presentation on Nanite for details )
  • IBL (limited to diffuse only, as SSR takes care of the specular part)
  • Physically-based sky, see Unreal’s presentation by Hillaire
  • Specular anti-aliasing (normal filtering really)
  • More physically-accurate diffuse material model based on Disney’s Burley instead of Lambert
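
Since the culling tests tend to raise questions, here is roughly what the small-primitive and HZB checks look like. This is simplified WGSL for illustration only, assuming conventional depth (larger = farther) and an HZB that stores the farthest occluder depth per texel; the real shaders deal with clamping, reverse-Z and so on:

    const cullingWGSL = /* wgsl */ `
        @group(0) @binding(0) var hzb : texture_2d<f32>;

        // small-primitive test: if the projected bounds don't straddle a texel
        // center on either axis, the rasterizer can't produce any samples
        fn coversNoSamples( screenMin : vec2<f32>, screenMax : vec2<f32> ) -> bool {
            return any( round( screenMin ) == round( screenMax ) );
        }

        // HZB test: pick the mip where the bounds fit in roughly one texel and
        // compare the object's nearest depth against the farthest depth stored there
        fn occludedByHZB( screenMin : vec2<f32>, screenMax : vec2<f32>, nearestDepth : f32 ) -> bool {
            let sizePx = screenMax - screenMin;
            let mip    = ceil( log2( max( sizePx.x, sizePx.y ) ) );
            let texel  = vec2<i32>( ( screenMin + screenMax ) * 0.5 / exp2( mip ) );
            let farthestOccluder = textureLoad( hzb, texel, i32( mip ) ).r;
            return nearestDepth > farthestOccluder;
        }
    `;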

Just to preempt some of the expectation:

Shadows are going to take up anywhere from 50% to 95% of render time. This is due to the lack of an RTX API in WebGPU; RTX shadows are just expensive, unfortunately. I’m working on an alternative solution, so this is a known perf limitation right now.

Disclaimer

This is not open-source, and the code is under copyright. I know that sucks for some of you, but it’s literally months of my work out of pocket, so please respect that.

7 Likes

Would you consider just contributing this to the main threejs repo? Under “examples” or “addons”?

1 Like

Hey @dubois ,

No, I don’t think so. My stance is relatively simple:

  1. I don’t think this would benefit those who are trying to learn; instead it would benefit those who have commercial needs and years and years of experience.
  2. This would not be maintained by anyone except me, pretty much, again because of the complexity

As it relates to three.js specifically, the architecture is drastically different, so it doesn’t really fit.

But the main thing is that this is intended to be a commercial product at the end of the day, so open-sourcing would not align with that.

I still think there’s a lot of value in just seeing the tech run in the browser.

A lot of the time you think to yourself: “I don’t know if this technique would even be viable or not”. Then you spend weeks prototyping, only to arrive at the answer “no”, or you arrive at something that kind of works, but you give up because you don’t believe you could get to something truly viable.

Seeing a complete solution eliminates a lot of the guesswork. For example, the core of Epic’s Nanite had been pretty much solved in the late ’90s and early 2000s; I know this because I researched the topic extensively when working on my own implementation. But why hadn’t anyone made a commercial version of it before? Because it’s a complex thing that would take a large time expenditure, and it’s not at all clear whether it would be viable at the end of the day.

Once Epic published their own work, suddenly you see versions of it popping up left and right, because the guesswork has been eliminated.

Anyway, I do hope this will serve as an inspiration and a proof of viability to others.

7 Likes