Shade - WebGPU graphics

Usnul · February 1, 2026, 8:01pm

Worked on ensuring C0 continuity near surfaces across the entire map. For now this is achieved by purging incomplete node levels, which works well enough even if it’s a bit of a blunt tool.

Actual locations for probes during the bake now go through an optimization phase, which lets me push probes that sit behind surfaces out into the open, resulting in far fewer light leakage artifacts.

This video here has only 28,376 probes in the map, and takes 1.8 MB of VRAM

Antonio · February 2, 2026, 11:57am

Holy cow, that’s a lot of bounces. I never go beyond 5 and 256 samples on CPU. Very interesting to see your results, keep up the good work!

Usnul · February 2, 2026, 11:59am

Spent more time optimizing probe placements. Here’s what I started with

and here’s what we have now

At a first glance it may look like there is some denoising going on - that’s not the case.

The first is noisier because it’s baked at 1024 samples per probe, while the second uses 16k samples per probe. But that’s not super important.

Let’s take a look at some artifacts which are a result of poor probe placement

These are just some of the more prominent leakage artifacts.

The reason this happens is because our probes have implicit locations, based on a recursive grid

The geometry of the scene doesn’t care about this fact, so we commonly end up with situations like this

If we follow the surface across the probe grid, from A to B

We can see that lighting will change drastically, because the closest probe at A is on the left side of the surface and at B it’s on the right side. Imagine if the surface is a solid sphere and B is inside of the sphere - we’d get a massive light leak, with B being shadowed just because the nearest probe is sunken into the surface.

So B is problematic. But so is A: it is too close to the surface and is going to oversample it. This is often just referred to as aliasing.

Ideally this is what we want

We take probes behind surfaces and push them through, so they don’t cause leaks, and we take probes in front of the surfaces that are too close and push them out from the surface.

Let’s back up a bit. I said just a little earlier that the probe locations are implicit from the recursive grid, which means that where we sample is fixed. So we want the locations in blue, but when we sample the light map, we will always have the locations in pink.

Can we have both? This may seem like cheating, but the answer is “yes”. That is, we can bake with the locations in blue and sample with the locations in pink.

But isn’t this wrong?

Yes, it’s wrong, in that it creates a bias. But this bias produces an end result which is less wrong than if we didn’t bias. So actually we’re cancelling out the bias that comes from the grid-like nature of our probe mesh.

The second thing about this bias is that if we have to choose between light leaks and slight lighting shifts, lighting shifts are preferable. Light leaks are very obvious to our eyes, whereas a lighting shift caused by moving a probe during baking is going to be incredibly subtle.

Light leaks create visual discontinuities and increase contrast (erroneously).

How do we achieve this?

Here’s the relevant piece of code:

const hit = new SurfacePoint3();

for (let i = 0; i < probe_count; i++) {
	let probe_location_x = locations[i * 3];
	let probe_location_y = locations[i * 3 + 1];
	let probe_location_z = locations[i * 3 + 2];

	if (!bvh.query_point_distance_to_nearest(hit, probe_location_x, probe_location_y, probe_location_z)) {
		// nothing nearby, this should never happen
		continue;
	}

	// got something close by

	const near_surface_x = hit.position.x;
	const near_surface_y = hit.position.y;
	const near_surface_z = hit.position.z;

	const to_hit_x = near_surface_x - probe_location_x;
	const to_hit_y = near_surface_y - probe_location_y;
	const to_hit_z = near_surface_z - probe_location_z;

	const near_surface_orientation = v3_dot(
		to_hit_x, to_hit_y, to_hit_z,
		hit.normal.x, hit.normal.y, hit.normal.z
	);

Hopefully this is enough to figure out the rest.

One quite important thing to keep in mind is that when you move probes, you should be careful not to worsen aliasing. I cast a ray from the original position to the desired location, and if we get a collision, we move the probe to the mid-point between where it was and the raycast hit.
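As a rough illustration of how the remaining steps might fit together, here is a minimal sketch. It is not Shade's actual code: `relocateProbe`, `minDistance` and the `raycast` callback are hypothetical names, and the `hit` object only mirrors the `position` / `normal` fields used in the snippet above. The sketch also assumes that `raycast` skips the surface a sunken probe is escaping from (for example by ignoring back faces), which the post does not specify.

```javascript
// Hypothetical sketch, not Shade's API.
// probe:       {x, y, z} original (implicit grid) location
// hit:         nearest surface point, {position: {x,y,z}, normal: {x,y,z}} (unit normal)
// minDistance: assumed minimum clearance from the surface
// raycast:     assumed callback (origin, direction, maxDistance) => hit distance or null
function relocateProbe(probe, hit, minDistance, raycast) {
  const toHitX = hit.position.x - probe.x;
  const toHitY = hit.position.y - probe.y;
  const toHitZ = hit.position.z - probe.z;

  // > 0 means the vector towards the surface agrees with the normal,
  // so the probe sits behind the surface
  const orientation =
    toHitX * hit.normal.x + toHitY * hit.normal.y + toHitZ * hit.normal.z;
  const distance = Math.hypot(toHitX, toHitY, toHitZ);

  if (orientation <= 0 && distance >= minDistance) {
    // in front of the surface and far enough away: leave it alone
    return { x: probe.x, y: probe.y, z: probe.z };
  }

  // desired location: minDistance out from the surface along its normal
  const target = {
    x: hit.position.x + hit.normal.x * minDistance,
    y: hit.position.y + hit.normal.y * minDistance,
    z: hit.position.z + hit.normal.z * minDistance,
  };

  const dx = target.x - probe.x;
  const dy = target.y - probe.y;
  const dz = target.z - probe.z;
  const travel = Math.hypot(dx, dy, dz);
  if (travel === 0) return target;

  const dir = { x: dx / travel, y: dy / travel, z: dz / travel };
  const t = raycast(probe, dir, travel);
  if (t === null) return target; // path is clear

  // blocked: stop at the mid-point between the original position and the ray hit
  return {
    x: probe.x + dir.x * t * 0.5,
    y: probe.y + dir.y * t * 0.5,
    z: probe.z + dir.z * t * 0.5,
  };
}
```

The relocated position would only be used for baking; sampling still uses the implicit grid location.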

It’s dry and boring stuff, but it’s something I’ve learned the hard way not to neglect.

Usnul · February 2, 2026, 12:10pm

Yeah, 7 is a bit of an overkill. You get 90% of the lighting from 3 bounces typically.

As for the samples - that’s a tough one, if your scene has a lot of complexity and you want the nearby samples to be uniform - you need a lot of samples.

I remember watching an EA presentation from around 2013-15 where they were presenting their light map baking approach, and they were citing ~30,000 samples per pixel.

You usually start to see convergence around 4k in my experience, but unless you denoise your probes, you’re going to need a lot of samples to achieve a smooth transition across your probe mesh.

Usnul · February 2, 2026, 8:12pm

Latest results

Fixed a bunch of smaller bugs

light map stats:

VRAM Size: 20 MB
Probe samples: 16,384
Probe count: 324,674
Bake time: 267s
Bake hardware: RTX 4090

Usnul · February 3, 2026, 4:01pm

Implemented a different compression scheme for probes, using 26 bytes per probe now, instead of previous 56. Visually there is no difference, so a definite win.

Reworked statistics for outlier filtering during baking. Previously it was based on mean, now I’m using median, which is less susceptible to blowing up.

Calculating median on the GPU is a pain, especially per-probe, so I’m using a histogram instead. 32 buckets seems to produce a good result.
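To make the histogram idea concrete, here is a minimal CPU-side sketch of approximating a median with a fixed number of buckets. It only illustrates the general technique: the function name, the fixed `[min, max]` value range, the clamping of out-of-range samples, and returning the bucket centre are all assumptions, and the per-probe GPU version in Shade may differ in every one of those details.

```javascript
// Approximate median from a fixed-size histogram (illustrative sketch only).
// Samples outside [min, max] are clamped into the first / last bucket (an assumption).
function histogramMedian(samples, min, max, bucketCount = 32) {
  if (samples.length === 0) return NaN;

  const counts = new Uint32Array(bucketCount);
  const bucketWidth = (max - min) / bucketCount;

  for (const s of samples) {
    let b = Math.floor((s - min) / bucketWidth);
    if (b < 0) b = 0;
    if (b >= bucketCount) b = bucketCount - 1;
    counts[b]++;
  }

  // walk the cumulative distribution until half of the samples are covered
  const half = samples.length / 2;
  let cumulative = 0;
  for (let b = 0; b < bucketCount; b++) {
    cumulative += counts[b];
    if (cumulative >= half) {
      return min + (b + 0.5) * bucketWidth; // centre of the median bucket
    }
  }
  return max; // fallback, e.g. if NaN samples were silently dropped
}
```

The result is accurate to within one bucket width, and a single extreme sample only ever adds one count to the last bucket, which is why this kind of estimate is far more stable than a mean.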

Slightly changed the energy compensation process of the outlier elimination as well; it diffuses into L0 only now, while still respecting the chromaticity of the probe.

Here’s with old:

And here’s with new:

Here’s Sibenik. It’s lit almost entirely indirectly, so it’s a torture test for the system

Old

New

To highlight how bad of a stress-test it is, here’s the same scene path-traced

Old

New

The effect is more pronounced in highly specular scenes.

Bounce counts and sample counts are the same.

Usnul · February 5, 2026, 12:48pm

Integrated the sparse volumetric lightmap into the GI pipeline:

Here’s GI off

Specular is done via probes as well, using GGX convolution

Usnul · February 7, 2026, 7:54pm

Spent more time on the specular component of the light maps

Still using SH3 probes, using GGX ZH basis ( thanks to Matt Pettineo ).

There’s a bit of chroma undersampling going on, but in the final output it’s not particularly noticeable.

Using reservoir sampling to pull 2 unique probes per pixel, instead of blending all 8 corners of the voxel.

Thanks to NVIDIA’s Marcos Fajardo et al. for the inspiration from their 2023 paper “Stochastic Texture Filtering”
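For readers unfamiliar with the idea, here is a minimal sketch of picking 2 distinct corners out of 8 with probability driven by their trilinear weights. It uses the Efraimidis-Spirakis weighted reservoir scheme (key = u^(1/w), keep the largest keys), which is a swapped-in textbook technique and not necessarily what Shade does. The function names and the corner bit layout are assumptions, and how the two picked probes are then blended is deliberately left out.

```javascript
// Trilinear weights of the 8 voxel corners for a fractional position (fx, fy, fz) in [0, 1].
// Assumed corner layout: bit 0 = +x, bit 1 = +y, bit 2 = +z.
function trilinearWeights(fx, fy, fz) {
  const weights = new Array(8);
  for (let i = 0; i < 8; i++) {
    const wx = (i & 1) ? fx : 1 - fx;
    const wy = (i & 2) ? fy : 1 - fy;
    const wz = (i & 4) ? fz : 1 - fz;
    weights[i] = wx * wy * wz;
  }
  return weights;
}

// Weighted reservoir sampling without replacement, reservoir size 2.
// Returns [first, second] corner indices; an entry is -1 if fewer than
// two corners have a positive weight.
function pickTwoCorners(weights, random = Math.random) {
  let first = -1, firstKey = -1;
  let second = -1, secondKey = -1;

  for (let i = 0; i < weights.length; i++) {
    const w = weights[i];
    if (w <= 0) continue; // zero-weight corners can never be picked

    const key = Math.pow(random(), 1 / w);
    if (key > firstKey) {
      second = first; secondKey = firstKey;
      first = i; firstKey = key;
    } else if (key > secondKey) {
      second = i; secondKey = key;
    }
  }
  return [first, second];
}
```

In a shader the `random` source would typically be a per-pixel, per-frame noise value, so that temporal accumulation averages the picks out over time.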

Applying parallax correction weights to the samples using sphere proxies. This is different from correcting individual probes, but it still improves accuracy.

Frame timing is 0.05ms on RTX 4090 at 1080x1080 resolution.

Usnul · February 8, 2026, 4:44pm

Worked on the specular GI some more. Improved the selection logic for 2 samples, there was a bit of a bias in the second sample selection.

Decided to drop parallax correction after some testing.

SH3 is too low frequency to have enough angular resolution for parallax to make a lot of difference. I didn’t measure it numerically, but overlaying 2 images, with and without, I can’t tell the difference.

My “Sparse Volumetric Lightmap” implementation ends up having relatively high spatial resolution, which reduces average correction that parallax would produce even further.

Glad to have investigated this, but in the end doing less work on the GPU is always better

Usnul · February 10, 2026, 5:49pm

GI demo with sparse volumetric lightmap

The light map is only 2.2MB, the format (SVLM) was created specifically for this project and it maps directly to the GPU buffer without any translation.

For comparison, this grass albedo texture is 7.52MB as PNG

It needs to be decoded before we can push it to the GPU, where it will take up 2048*2048 pixels at 4 bytes per pixel, or 16MB

But this texture will not be enough to render the grass material, you also need the normal map and ORM (occlusion, roughness, metalness), each of which also needs 16 MB of VRAM. So just this grass material will need 48 MB of VRAM in total, vs this lightmap which takes up 2.2 MB for the entire scene.
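The arithmetic behind those numbers can be spelled out as a short sketch. It assumes uncompressed RGBA8 textures with no mip chain, as in the description above; `uncompressedTextureBytes` is just an illustrative helper name.

```javascript
// Size of an uncompressed texture without mipmaps (illustrative helper).
function uncompressedTextureBytes(width, height, bytesPerPixel = 4) {
  return width * height * bytesPerPixel;
}

const MIB = 1024 * 1024;

const perMapBytes = uncompressedTextureBytes(2048, 2048); // one 2048 x 2048 RGBA8 map
const materialBytes = perMapBytes * 3;                    // albedo + normal + ORM

console.log(perMapBytes / MIB);   // 16
console.log(materialBytes / MIB); // 48
```

At 2.2 MB, the lightmap for the whole scene is roughly 22 times smaller than that single material.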

The map was baked at 7 bounces per sample, and 32,000 samples per probe.

There are a total of 60,826 probes in the lightmap.

Here’s a flythrough:

Would be curious to know what the performance is like, for me the GI part is blazingly fast, taking ~0.1ms in total for both diffuse and specular.

Soma · February 11, 2026, 8:40pm

Hey Antonio - how can I reach you for help with a 360 pano viewer? Multiresolution panorama | Pannellum

dubois · February 12, 2026, 6:33am

Perhaps through a private message?

Usnul · February 12, 2026, 4:56pm

So, as someone working on a WebGPU renderer, I think I’ve got this.

First you click their avatar and then you look for a button that says “Message”, click that thing and you’re good to go!

dubois · February 13, 2026, 12:36am

Just to make it 100% foolproof, @Soma should click on @Antonio’s profile, not his own. Odd, but it seems that you can message yourself.

Usnul · March 8, 2026, 11:32pm

Ignacio Castaño on twitter pointed out a new thing to me: MKS (Magic Kernel Sharp)

It’s a different type of filter kernel. I whipped up support for it in Shade. It’s way more expensive than what I currently use, which is Mitchell-Netravali, but it has some nice properties.

Linear - this is base (what three.js uses)

Mitchell

MKS

The screenshots are pure albedo, that’s why they look a bit weird. But the textures are very high resolution (4k each book cover) which serves as a good test case.

MKS is actually a little softer than Mitchell, which is not surprising, as Mitchell is still a sharpening filter. However, MKS does an amazing job at removing ringing artefacts and moire patterns. You can see this on the “robot dreams” cover most prominently, as the upper part of the cover has some texture to it which results in ringing artefacts for both linear and Mitchell filters, but MKS does an incredible job of blending it out.

Not sure if it’s worth the cost. I might refactor my mipmap generation code to split up the kernel and make it run in reasonable time, but it’s something genuinely new to me! You live and you learn
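For reference, here is a minimal sketch of what such a kernel can look like. It uses the 2013 form of Magic Kernel Sharp, taken here to be a quadratic B-spline (the "Magic Kernel") combined with a `[-1/4, 3/2, -1/4]` sharpening step. Which MKS variant and tuning Shade actually uses is not stated in this thread, so treat both the kernel choice and the 1D `downsample2x` helper (with clamped edges) as assumptions for illustration.

```javascript
// Magic Kernel Sharp, 2013 variant (assumed): support is [-2.5, 2.5].
function magicKernelSharp2013(x) {
  const a = Math.abs(x);
  if (a <= 0.5) return 17 / 16 - (7 / 4) * a * a;
  if (a <= 1.5) return (1 - a) * (7 / 4 - a);
  if (a <= 2.5) return -0.125 * (a - 2.5) * (a - 2.5);
  return 0;
}

// Halve a 1D signal. For 2x minification the kernel is stretched by 2,
// so each output texel reads 10 source texels.
function downsample2x(src) {
  const out = new Float32Array(src.length >> 1);
  for (let i = 0; i < out.length; i++) {
    const center = (i + 0.5) * 2; // output texel centre in source coordinates
    let sum = 0;
    let weightSum = 0;
    for (let j = Math.floor(center - 5); j <= Math.ceil(center + 5); j++) {
      const w = magicKernelSharp2013((j + 0.5 - center) / 2);
      const clamped = Math.min(src.length - 1, Math.max(0, j)); // clamp-to-edge
      sum += w * src[clamped];
      weightSum += w;
    }
    out[i] = sum / weightSum;
  }
  return out;
}
```

A 2D mip level can be produced by running the same 1D pass over rows and then over columns, since the kernel is separable.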

Usnul · March 9, 2026, 2:16pm

Spent way more time on mipmap texture filtering than is healthy. Ended up switching to MKS as a default filter for color textures.

Spent a lot of time tuning MKS specifically.

Here are screenshots for comparison:

Linear

MKS

CatmullRom

Mitchell

Wronski 2021

(10 tap kernel with MagicKernel pre-pass)

MKS does well at removing aliasing and ringing. It preserves overall image sharpness quite well, but is less aggressive than cubic spline filters.

Here is another scene with the books

Linear

MKS

CatmullRom

Mitchell

Wronski

References

Usnul · April 7, 2026, 12:25pm

Reworked occlusion culling architecture.

The reason being 2-fold:

I was using the OneSweep prefix scan algorithm on the GPU, and it doesn’t jibe with Apple silicon, which was causing horrible performance artifacts like stuttering and generally low FPS
HZB rebuilds were taking a significant chunk of overall frame time on lower-end GPUs
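To show what an HZB rebuild involves, here is a minimal CPU-side sketch of reducing one level of a hierarchical Z-buffer. It is a generic illustration, not Shade's implementation: it assumes even (power-of-two) dimensions and a conventional depth range where larger values are farther away, so each coarse texel keeps the farthest depth of its 2 x 2 block. With reversed Z the `Math.max` would become `Math.min`.

```javascript
// Build the next (half-resolution) HZB level from a depth buffer.
// depth: Float32Array of width * height values, row-major.
// Assumes even dimensions and "larger depth = farther" (both assumptions).
function buildHzbLevel(depth, width, height) {
  const outWidth = Math.max(1, width >> 1);
  const outHeight = Math.max(1, height >> 1);
  const out = new Float32Array(outWidth * outHeight);

  for (let y = 0; y < outHeight; y++) {
    for (let x = 0; x < outWidth; x++) {
      const x0 = Math.min(width - 1, x * 2);
      const x1 = Math.min(width - 1, x * 2 + 1);
      const y0 = Math.min(height - 1, y * 2);
      const y1 = Math.min(height - 1, y * 2 + 1);

      // keep the farthest depth so occlusion tests stay conservative
      out[y * outWidth + x] = Math.max(
        depth[y0 * width + x0],
        depth[y0 * width + x1],
        depth[y1 * width + x0],
        depth[y1 * width + x1]
      );
    }
  }
  return { data: out, width: outWidth, height: outHeight };
}
```

The full pyramid repeats this step until a 1 x 1 level is reached, which is why the rebuild cost scales with resolution and is more noticeable on lower-end GPUs.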

So now the engine runs pretty well on older macs. I got 2 updated demos:

→ Full resolution version
→ Performance upscale version (60% internal resolution)

Some perf numbers

Demo	Device	FPS	Resolution
A	Apple M1 Pro	32	3456 x 2234
B	Apple M1 Pro	47	3456 x 2234
B	GTX 1080	61	3840 x 2160
A	rdna2 iGPU	19	1080 x 1080
B	rdna2 iGPU	33	1080 x 1080

The demos feature:

GTAO
Bent Normals
Volumetric Lightmap ( Diffuse & Specular )
Bloom
Automatic Exposure
3-cascade CSM (shadows)

The scene stats:

Meshes: 5202
Materials: 32
Lights: 131
Polycount: 267,302

If you do run the demo, I would be very grateful if you could post your performance numbers

Performance Uplift

Somewhat unrelated, but because of this and a few other changes the overall FPS has gone up by 10 to 15% on most scenes, with higher complexity scenes seeing more benefit.

The most notable is the Blender 3.3 splash screen scene, which is basically a torture test, I wrote about it earlier in this topic:

Mesh count: 374,734
Unique Geometries: 353
Total polycount on the scene: 717,869,562

It was previously running at 21 FPS, now it’s 46 FPS, and I wish I could say exactly why, but I honestly have no idea exactly what was the main cause, as I made so many little improvements since that time.

The upshot is that it’s ~21.74ms of frame time. That’s with shadows, volumetric light map etc (see above).

Three.js takes 3671ms on this scene, which makes Shade about 168 times faster on this scene. Or about 2.2 times faster relative to before.
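The numbers above follow from simple frame-time arithmetic, shown here as a short sketch using only the figures quoted in this post (46 FPS now, 21 FPS before, 3671 ms per frame for three.js):

```javascript
// Convert frames per second into milliseconds per frame.
const frameTimeMs = (fps) => 1000 / fps;

const shadeNowMs = frameTimeMs(46);          // about 21.74 ms
const speedupVsThree = 3671 / shadeNowMs;    // about 168.9
const speedupVsBefore = 46 / 21;             // about 2.19

console.log(shadeNowMs.toFixed(2), speedupVsThree.toFixed(1), speedupVsBefore.toFixed(2));
```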

Usnul · April 12, 2026, 3:40pm

Been working on a website for Shade, compiled a comparison table. I’m not super happy with it, but I tried to keep things fair. Feedback is very welcome:

All entries reflect web-deployed (browser-based) capabilities only.

Native/desktop-only features are excluded. Data current as of April 2026.

Rendering Architecture

Feature	Shade	three.js	Babylon.js	Unity	PlayCanvas
Rendering pipeline	GPU-driven visibility buffer	CPU-driven forward	CPU-driven forward	CPU-driven SRP	CPU-driven clustered fwd
Draw dispatch	GPU-resident indirect	CPU per mesh	CPU per mesh	CPU, SRP Batcher	CPU per mesh
Meshlet rendering	✓ Built-in	✗ None	✗ None	✗ None	✗ None
Visibility buffer	✓ Deferred visibility shading	✗ None	✗ None	✗ None	✗ None
FrameGraph	✓ Engine-centric, auto VRAM aliasing	✗ None	~ API exists, unused internally	✗ None	~ Render-pass based
Language	JavaScript (native)	JavaScript	TypeScript	C# → WASM	JavaScript

Culling & Scene Scale

Feature	Shade	three.js	Babylon.js	Unity	PlayCanvas
Frustum culling	GPU, per meshlet	CPU, per mesh	CPU, per mesh	CPU, per mesh	CPU, per mesh
Occlusion culling	✓ GPU HZB	✗ None	✗ None (raw queries only)	~ CPU-side, baked	✗ None
Culling granularity	Sub-mesh (meshlet)	Object bounding box	Object bounding box	Object bounding box	Object bounding box
Max meshes @ 60 FPS	Millions	Hundreds–low thousands	Thousands (instanced)	Thousands (batched)	Thousands (instanced)
Max active lights	Thousands (clustered)	~5–50 (forward)	Hundreds (clustered)	Dozens–hundreds	Hundreds (clustered)
Instancing required	No — every object dynamic	Yes, manual	Yes, manual	Yes, manual	Yes, manual

Shadows

Feature	Shade	three.js	Babylon.js	Unity	PlayCanvas
Cascaded shadows	✓ On by default	~ Addon, manual config	~ Manual config	✓ Built-in	✓ Built-in
Cascade blending	✓ Cross-cascade	✗ Hard splits	~ Manual	✓ Built-in	~ Limited
Cascade selection	✓ Projection-based (+50% texel density)	Distance-based	Distance-based	Distance-based	Distance-based
Shadow GPU culling	✓ Same GPU pipeline	✗ CPU-issued	✗ CPU-issued	✗ CPU-issued	✗ CPU-issued
Ray-traced shadows	✓ Software BVH (TLAS+BLAS)	✗ None	✗ None	✗ None	✗ None
Out-of-box quality	✓ No tweaking needed	✗ Manual biases, bounds, resolution	~ Manual biases & bounds	~ Some tweaking	~ Some tweaking

Post-Processing

Feature	Shade	three.js	Babylon.js	Unity	PlayCanvas
Integrated stack	✓ Full stack, production-grade	✗ Third-party required	~ Build-your-own (API only)	✓ URP pipeline	✓ CameraFrame
TAA	✓ Motion vectors, disocclusion detection, variance clipping, YCoCg	✗ FXAA / SMAA only	~ Heavy ghosting, disabled during motion	~ FXAA / SMAA (TAA experimental)	~ Basic TAA
Temporal upscaling	✓ TAAU (dynamic resolution)	✗ None	✗ None	~ Experimental (STP)	✗ None
SSAO	✓ GTAO, temporal reprojection, à-trous spatial filter, PBR-integrated	~ Image-level, ignores PBR AO	~ Image-level, ignores PBR AO	✓ Integrated	~ Image-level
SSR	✓ HiZ stochastic trace+resolve, temporal+spatial denoising, IBL energy-conserving	~ No PBR awareness, no IBL mixing	~ No IBL mixing, energy issues	~ Basic	✗ None
HDR Bloom	✓ Multi-pass, Karis-filtered, spatially-stable HDR bloom	✗ SDR source, not true HDR	✗ SDR source, not true HDR	✓ HDR pipeline	~ Bloom
Auto exposure	✓ Eye adaptation	✗ None	~ Manual	✓ Built-in	✗ None
HDR display output	✓ Native (>100 nits)	✗ None	✗ None	✗ None	✗ None

Transparency & Materials

Feature	Shade	three.js	Babylon.js	Unity	PlayCanvas
OIT	✓ MBOIT	✗ Sort-based	~ Depth peeling (expensive)	✗ Sort-based	✗ Sort-based
Alpha testing	Hashed (volume-preserving)	Binary cutoff	Binary cutoff	Binary cutoff	Binary cutoff
Material overdraw	Zero (ID buffer)	Full (forward)	Full (forward)	Full (fwd/deferred)	Full (forward)
Shader compilation	Single shader, instant	~ Per-material variant	~ Per-material variant	~ Per-material variant	~ Per-material variant
Runtime stutter	None by design	✗ On-demand compilation, recompiles on light/feature changes	✗ On-demand compilation, recompiles on changes	~ Pre-warm available, stutter possible	✗ On-demand compilation
Vertex compression	✓ On-line (~40% savings)	✗ None	✗ None	~ Offline only	✗ None

Global Illumination & Ray Tracing

Feature	Shade	three.js	Babylon.js	Unity	PlayCanvas
2D Lightmaps (SDR)	✓ Via PBR AO + UV2	✓ External bake + UV2	✓ External bake + UV2	✓ Baked in editor	✓ Editor bake tool
3D Lightmaps (HDR)	✓ Sparse Volumetric, SH3, full HDR	✗ None	✗ None	✗ None	✗ None
Lightmap specular	✓ GGX SH convolution	✗ Diffuse only, no angular data	✗ Diffuse only, no angular data	~ Limited	✗ Diffuse only, no angular data
Extra UV required	✓ No — volumetric	Yes (UV2)	Yes (UV2)	Yes (UV2)	Yes (UV2)
Software ray tracing	✓ Full (TLAS+BLAS+Materials)	✗ None	✗ None	✗ None	✗ None
In-engine GI bake	✓ Own RT engine	✗ None	✗ External tools only	~ Editor only, not in browser	~ Editor bake, SDR only

Memory, Performance & Integration

Feature	Shade	three.js	Babylon.js	Unity	PlayCanvas
Memory management	✓ Custom allocator + pooling	✗ Manual / GC	~ Semi-automatic	~ WASM heap + GC	~ Semi-automatic
Resolution scaling	✓ Dynamic + temporal upscale	✗ Manual	✗ Manual	~ Limited	✗ Manual
Built-in profiler	✓ GPU timing	✗ DevTools only	✓ Inspector (CPU)	~ Limited in browser	✓ Profiler
Web integration	Native JS	Native JS	Native TS/JS	WASM (heavy)	Native JS
Bundle size	~250 KB gz	~150 KB core, ~400 KB typical	~400 KB gz	10–50+ MB (WASM)	~300 KB gz
License	Commercial	MIT	Apache 2.0	Commercial	MIT

Summary

Strength area	Shade	three.js	Babylon.js	Unity	PlayCanvas
Best for	Massive scenes, AAA-quality web rendering	Prototyping, small-medium scenes, ecosystem	Full-featured 3D apps, tooling	Porting native games to web	Lightweight collaborative 3D apps
Unique advantage	GPU-driven pipeline with RT, TAAU, HZB — unmatched scene scale and visual quality on web	Massive community, simple API	Rich tooling, Node Material Editor	Full game engine feature set	In-browser editor, Gaussian splatting
Key limitation	No animation/skinning yet, no editor	No occlusion culling, no built-in PP, stutter-prone, limited scene scale	No GPU-driven pipeline, no RT, stutter-prone	Huge bundle, WASM overhead, experimental WebGPU	No occlusion culling, no RT, no SSR

Kitanga_Nday · April 13, 2026, 5:37pm

First of all, this is amazing stuff: Kaze playtest 2026 April

I literally just saw this, looks incredible. ~~Though I’m assuming some of your assets are ai~~, but even that is hard to tell. Good stuff. Did you try implementing GI in this game too?

Usnul · April 13, 2026, 5:54pm

Hey Kitanga,

Sorry for the confusion. The video in question is Unity

I run a small gamedev studio and we’ve been working on an ARPG for the past 4+ years

There are a few custom shaders there, but it’s mostly just Unity HDRP

None of the assets are AI generated though. They are mostly licensed assets, as we have no 3D artists on the team

So yeah, that’s not Shade. I guess Shade would look somewhat similar, because the techniques are similar under the hood and the post-process stack is similar, but it’s not Shade, it’s Unity 6.0 with HDRP

One might ask 2 questions:

Why not use Shade for your own game?

Shade as a project started almost 2 full years after the game project started

Why not use Meep for your own game?

I judged that it would be much easier and faster to go with Unity, and we were aiming for consoles primarily

In hindsight - I probably would go with Meep, because Unity doesn’t give you nearly as much as it claims on paper.

Anyway, a huge tangent. Glad you liked the gameplay video though.

As for GI specifically in the video, here’s what the shading stack looks like:

SSGI
SSR
GTAO
TAA
IBL

But again - it’s all Unity’s built-in stack
