For a while, I've had an idea to write a graphics engine. I like three.js, and I understand why it is the way it is, but I've always wanted something with the ease and elegance of three.js and the feature set of something like Unreal.
When WebGPU was announced, it got me thinking that you could finally implement a fully GPU-resident renderer in a browser. This year I had some time, so I thought I would give it a try.
Before I go on, here are some mandatory pictures:
Why make another graphics engine?
I’ve had a set of very specific goals:
- Occlusion culling. I don't want to draw stuff that's behind other stuff. This is one of the biggest performance killers for complex scenes. It's so crucial that there is even a company that does nothing but occlusion culling, and does very well for itself.
- GPU draw dispatch. I don't want to dispatch an individual draw command for each mesh. I've done some experiments, and on the web you're limited to about 60,000 draw calls per frame (at 60 FPS), as long as you're using the same material and geometry. If you switch geometries, that drops significantly, and if you switch materials as well, you're in the ballpark of about 3,000 draw calls per frame. Of course the CPU, GPU etc. will affect these numbers, and I'm on very high-end hardware, so on typical machines they will be significantly lower. I want to send the entire scene to the GPU and let it draw all objects, with no communication with the CPU. This gives you the best possible GPU utilization and essentially removes the draw call limit, meaning you can draw even millions of objects per frame. Unreal has been so-called GPU-resident for ages now, and Unity seems to be pushing for it as well.
- Deferred visibility-based shading. This basically means that we only want to shade pixels that will be visible on the screen, eliminating all material and lighting calculations for pixels that will be occluded. This is what Unreal has been doing since 4.2, and it's a really significant step toward allowing a large number of materials in a single scene.
- Efficient and good-looking post-processing. Not much to say here, but I think a modern graphics engine has to come with things like SSAO, SSR and AA out of the box.
- Global illumination solution. Again, I think that turn-key global illumination is possible today, and as such it's a very desirable feature for a modern graphics engine. You can see that if you look at the in-house graphics engine of any major AAA game studio.
- Rock-solid shadowing solution. Not much to add here: shadows are a pain to tweak, and there is no technical reason for that today. There are solutions that remove that need.
As I said earlier, three.js is great, but as a project it's optimized toward being easy to understand and easy to work with. That goal is at odds with complexity, and complexity, unfortunately, is inherent in graphics. This means that under its current philosophy, three.js is unable to support complex features.
You can extend three.js quite far - you can add features and modify existing behavior - and I think the team has done an incredible job with that. In my experience three.js is getting more flexible and extensible every year. That being said, it is still quite rigid. And it has to be, in some sense: extensibility sacrifices performance, increases complexity and creates a headache for maintainers.
Since I got into three.js around 2013, I have been extending it: with shaders, with alternative lighting, with simulation systems etc. But I would always run up against the limits of what I was allowed to do.
What is already achieved
GPU draw dispatch
Here's a scene that the Blender Foundation released, which has 40,765 individual pebbles:
Here’s the same scene in three.js 162:
You'll notice that it runs at around 9.5 FPS.
Here's another scene that looks pretty bad because of the Blender export, but it has 118,352 blades of grass and flowers:
Here’s Shade:
And here’s three.js again (FPS is 1.5 or so):
Just to reiterate: no matter how complex your scene is, it's always O(1) work on the CPU side - all of the actual culling and draw command dispatch happens 100% on the GPU. This means that the only bottleneck is your GPU: if it can draw 100,000 objects per frame, that's what you'll get; if it can do 100,000,000, you'll get that instead.
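To make the pattern concrete, here's a minimal sketch of GPU-driven dispatch in WebGPU. This is illustrative only, not Shade's actual code: `cullPipeline`, `cullBindGroup`, the render pipeline, the buffers and `totalInstances` are all assumed to be set up elsewhere. A compute pass culls instances and writes the survivor count into an indirect buffer, and the render pass then issues a single `drawIndexedIndirect`:

```ts
// indexCount, instanceCount, firstIndex, baseVertex, firstInstance
const indirectBuffer = device.createBuffer({
  size: 5 * 4,
  usage: GPUBufferUsage.INDIRECT | GPUBufferUsage.STORAGE,
});

const encoder = device.createCommandEncoder();

// 1. Cull on the GPU: test each instance's bounds, append survivors to a
//    visible-ID buffer and atomically bump instanceCount in indirectBuffer.
const cull = encoder.beginComputePass();
cull.setPipeline(cullPipeline);           // assumed compute pipeline
cull.setBindGroup(0, cullBindGroup);      // bounds, camera, indirectBuffer
cull.dispatchWorkgroups(Math.ceil(totalInstances / 64));
cull.end();

// 2. One draw call for everything that survived; the arguments are read
//    straight from GPU memory, the CPU never sees the visible count.
const pass = encoder.beginRenderPass(renderPassDescriptor);
pass.setPipeline(renderPipeline);
pass.setBindGroup(0, sceneBindGroup);     // transforms + visible-ID buffer
pass.setVertexBuffer(0, vertexBuffer);
pass.setIndexBuffer(indexBuffer, 'uint32');
pass.drawIndexedIndirect(indirectBuffer, 0);
pass.end();

device.queue.submit([encoder.finish()]);
```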
Occlusion culling
Here's the Sponza scene:
We can see that culling is doing a good job: there are ~270,000 triangles in the scene, and we're only drawing ~96,000 in the current view.
If we move the camera behind the pillar, we're not going to see much of the complex geometry in the curtains, the lion head at the back etc.:
And thanks to occlusion culling, we’re not drawing those triangles:
We have reduced the amount of work for the GPU by 85%.
The culling is conservative, and it's heavily based on research published by Activision and Epic, yet computationally it's almost free.
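For a flavor of how such a test works, here's a minimal WGSL sketch of a hierarchical-Z visibility check - my illustration, not Shade's actual shader. It assumes a max-downsampled depth pyramid, a standard depth convention (smaller is closer), and screen-space bounds computed earlier in the shader:

```ts
const hzbTestWGSL = /* wgsl */ `
  @group(0) @binding(0) var hzb: texture_2d<f32>;  // max-depth mip pyramid
  @group(0) @binding(1) var hzbSampler: sampler;   // nearest, non-filtering

  // rectMin/rectMax: the object's bounds projected to UV space;
  // nearestZ: the closest depth of those bounds.
  fn is_visible(rectMin: vec2f, rectMax: vec2f, nearestZ: f32) -> bool {
    let sizeInTexels = (rectMax - rectMin) * vec2f(textureDimensions(hzb));
    // Pick the mip where the whole rect fits in a 2x2 texel footprint.
    let mip = ceil(log2(max(sizeInTexels.x, sizeInTexels.y) * 0.5));
    // Four taps conservatively cover the rect at that mip.
    let d = max(
      max(textureSampleLevel(hzb, hzbSampler, rectMin, mip).r,
          textureSampleLevel(hzb, hzbSampler, rectMax, mip).r),
      max(textureSampleLevel(hzb, hzbSampler, vec2f(rectMin.x, rectMax.y), mip).r,
          textureSampleLevel(hzb, hzbSampler, vec2f(rectMax.x, rectMin.y), mip).r));
    // Occluded only if everything in the rect is closer than the object's
    // nearest point - conservative, so a false positive just draws harmlessly.
    return nearestZ <= d;
  }
`;
```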
Deferred visibility-based shading
Not too much to say here - visually you can't see much of a difference. It eliminates over-shading completely. This is huge. Another huge benefit is that there is no extraneous material switching: all pixels on the screen that should be drawn with a particular material are drawn at the same time, even if they span multiple separate meshes. Yet another benefit shows up in scenes with high poly counts, where you start running into poor quad utilization and end up shading pixels that are never visible, simply because of how GPUs shade in 2×2 quads.
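As a rough illustration of the idea (not Shade's actual code, and the 12/20-bit packing here is arbitrary): the visibility pass writes one packed instance-plus-triangle ID per pixel, and a later pass shades straight from that, once per visible pixel:

```ts
const materialResolveWGSL = /* wgsl */ `
  // The visibility pass has already written a packed ID per pixel.
  @group(0) @binding(0) var visBuffer: texture_2d<u32>;

  @compute @workgroup_size(8, 8)
  fn shade(@builtin(global_invocation_id) gid: vec3u) {
    let packed = textureLoad(visBuffer, vec2i(gid.xy), 0).r;
    let instanceId = packed >> 20u;       // arbitrary 12/20-bit split
    let triangleId = packed & 0xFFFFFu;
    // Fetch the triangle's vertices from storage buffers, reconstruct
    // barycentrics for this pixel, interpolate attributes, and evaluate the
    // material - all pixels of one material together, regardless of which
    // mesh they came from, and never for a pixel that ends up hidden.
  }
`;
```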
Efficient and good-looking post-processing
GTAO
So far I've got a FrameGraph system implemented. I've had a prototype lying around for years now, but I never managed to fit it into meep's graphics pipeline; with a little bit of tweaking I got it to work in Shade with WebGPU, and I must say it's a dream to work with. It automagically takes care of efficient resource allocation and aliasing. Some people talk about post-processing in three.js and say things like "let's reuse FrameBuffer X for Y, and avoid the extra draw in Z" - well, FrameGraph gives you all of that for free; you don't even need to think about it. And if you want to turn something off, you can do it at runtime and it just works.
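To give a flavor of the programming model, here's a hypothetical sketch of what declaring a pass against a frame graph can look like - the API names are illustrative, not Shade's actual interface:

```ts
// Each pass declares what it reads, writes and creates; the graph derives
// execution order, allocation and memory aliasing from those declarations.
frameGraph.addPass('gtao', (builder) => {
  const depth = builder.read(depthPyramid);
  const normals = builder.read(gbufferNormals);
  const ao = builder.create({ format: 'r8unorm' }); // transient texture
  builder.write(ao);

  return (encoder) => {
    // Record the actual GPU commands here. By this point the graph has
    // already allocated `ao`, possibly aliasing its memory with a resource
    // whose lifetime doesn't overlap this pass.
  };
});

// Toggling a pass off just drops it from the graph; any resource that only
// it used is never allocated in the first place.
```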
I've implemented an SSAO solution following the GTAO paper and an excellent resource published by Intel. The solution uses temporal and spatial blue noise, so you get incredible value out of it: with just 9 taps you get pictures like these:
Here’s without:
Here’s another one:
And without:
Here are some screenshots of the raw AO without denoising:
One key trick used here is sampling a depth mip chain; this way we can accumulate a much larger contribution with very few samples. With just 2 depth mips we can effectively sample 16 pixels with a single tap; I'm using 5 mip levels, so we're getting up to 1024 texels' worth of contribution from each tap. To my knowledge, this is the current state of the art. The implementation is very cheap - cheaper even than what we have in three.js - and incredibly good-looking.
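The arithmetic behind those numbers, for the record: one texel at mip level $m$ covers a $2^m \times 2^m$ block of full-resolution depth, so a single tap aggregates

$$(2^m)^2 = 4^m \text{ texels}, \qquad 4^2 = 16, \qquad 4^5 = 1024.$$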
TAA
I've looked at a few different options for anti-aliasing. Hardware multi-sampling is not an option in a deferred renderer, so we have to look for alternatives. I tried FXAA, and it just looks terrible - no surprise there. I wanted to go for TAA from the beginning, so after reading a whole load of papers I got something that doesn't smear and has no visible ghosting. Here are 2 screenshots, with and without TAA:
Here's with TAA on:
And here’s with TAA and mip-map bias:
The effect is somewhat subtle, but you can notice it in the book titles especially as they become more crisp and readable.
For completeness, here’s the FXAA:
And here’s the GTAO + TAA:
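For the curious, the core of a typical TAA resolve fits in a few lines of WGSL. This is a generic sketch of the standard approach (history reprojection, neighborhood clamping, exponential blend), not Shade's exact shader; `texelSize` is an assumed uniform:

```ts
const taaResolveWGSL = /* wgsl */ `
  @group(0) @binding(0) var currentColor: texture_2d<f32>;
  @group(0) @binding(1) var historyColor: texture_2d<f32>;
  @group(0) @binding(2) var linearSampler: sampler;

  fn taa_resolve(uv: vec2f, velocity: vec2f) -> vec3f {
    let current = textureSampleLevel(currentColor, linearSampler, uv, 0.0).rgb;
    // Reproject: fetch where this pixel was last frame.
    var history =
      textureSampleLevel(historyColor, linearSampler, uv - velocity, 0.0).rgb;

    // Min/max of the 3x3 neighborhood around the current pixel.
    var nMin = current;
    var nMax = current;
    for (var y = -1; y <= 1; y++) {
      for (var x = -1; x <= 1; x++) {
        let c = textureSampleLevel(currentColor, linearSampler,
                                   uv + vec2f(f32(x), f32(y)) * texelSize, 0.0).rgb;
        nMin = min(nMin, c);
        nMax = max(nMax, c);
      }
    }
    // Clamping history into the neighborhood box is what kills ghosting.
    history = clamp(history, nMin, nMax);
    // Blend in ~10% of the new frame each time.
    return mix(history, current, 0.1);
  }
`;
```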
Conclusion
It's a new era for graphics engines. You can see the influence of traditional graphics engines all over even modern APIs like WebGPU, but for high-performance graphics you just have to go with a GPU-driven pipeline. The CPU is no longer in charge; most of your data doesn't even live on the CPU anymore. It's just a glorified butler for the GPU now.
I still don't have a shadowing solution. I looked into CSM and virtual shadow maps, but I'm not entirely happy with either: the first is quite expensive in terms of draw calls, and the second, as it turns out, is a pain to implement in WebGPU. I might go for ray-traced shadows with a bit of temporal reprojection, and maybe use upscaling to make it cheaper too.
As for global illumination, I plan to port my existing work with light probes over.
Overall I'm very happy with what I've got so far, and I'm really excited about the future, both for this project and for 3D on the web in general.
PS:
Probably close to 90% of the shading model is lifted from three.js, so thank you everyone for your amazing work. I spent a lot of time researching the state of the art, and almost every time I would find that three.js already implements it, or that it's not particularly feasible to do on the web.