Shade - WebGPU graphics

Worked on ensuring C0 continuity near surfaces across the entire map. For now this is achieved by purging incomplete node levels; it works well enough, even if it’s a bit of a blunt tool.

Actual probe locations during the bake now go through an optimization phase, which lets me push probes that are behind surfaces out into the open, resulting in far fewer light-leakage artifacts.

This video has only 28,376 probes in the map, which take up 1.8 MB of VRAM

3 Likes

Holy cow, that’s a lot of bounces. I never go beyond 5 bounces and 256 samples on the CPU. Very interesting to see your results, keep up the good work!

Spent more time optimizing probe placements. Here’s what I started with

and here’s what we have now

At first glance it may look like there is some denoising going on - that’s not the case.

The first is noisier because it’s baked at 1024 samples per probe, while the second is at 16k samples per probe. But that’s not super important.
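As a rough rule of thumb (a general property of Monte Carlo estimators, not something measured in this project), noise falls off with the square root of the sample count, so 16× the samples only buys a 4× reduction in noise:

```javascript
// Standard error of a Monte Carlo estimate scales as 1 / sqrt(N),
// so going from 1024 to 16,384 samples per probe (16x the work)
// only reduces the noise by a factor of 4.
function relative_noise(sample_count) {
	return 1 / Math.sqrt(sample_count);
}

console.log(relative_noise(1024) / relative_noise(16384)); // 4
```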

Let’s take a look at some artifacts which are a result of poor probe placement

These are just some of the more prominent leakage artifacts.

This happens because our probes have implicit locations, based on a recursive grid

The geometry of the scene doesn’t care about this fact, so we commonly end up with situations like this

If we follow the surface across the probe grid, from A to B

We can see that lighting will change drastically, because the closest probe at A is on the left side of the surface and at B it’s on the right side. Imagine if the surface is a solid sphere and B is inside of the sphere - we’d get a massive light leak, with B being shadowed just because the nearest probe is sunken into the surface.

So B is problematic. But actually, so is A: A is too close to the surface and is going to oversample it. This is often just referred to as aliasing.

Ideally this is what we want

We take probes behind surfaces and push them through so they don’t cause leaks, and we take probes that are in front of surfaces but too close, and push them away from the surface.

Let’s back up a bit. I said a little earlier that the probe locations are implicit from the recursive grid, which means that where we sample is fixed. So we want the locations in blue, but when we sample the light map, we will always get the locations in pink.

This may seem like cheating - can we really use different locations for baking and for sampling? The answer is “yes”: we bake with the locations in blue, and sample with the locations in pink.

But isn’t this wrong?

Yes, it’s wrong, in that it creates a bias. But this bias produces an end result that is less wrong than if we didn’t bias at all. In effect, we’re cancelling out the bias that comes from the grid-like nature of our probe mesh.

The second thing about this bias is that if we have to choose between light leaks and slight lighting shifts, lighting shifts are preferable. Light leaks are very obvious to our eyes, while the subtle lighting shift from moving a probe during baking is barely noticeable.

Light leaks create visual discontinuities and increase contrast (erroneously).


How do we achieve this?

Here’s the relevant piece of code:

const hit = new SurfacePoint3();

for (let i = 0; i < probe_count; i++) {
	let probe_location_x = locations[i * 3];
	let probe_location_y = locations[i * 3 + 1];
	let probe_location_z = locations[i * 3 + 2];

	if (!bvh.query_point_distance_to_nearest(hit, probe_location_x, probe_location_y, probe_location_z)) {
		// nothing nearby, this should never happen
		continue;
	}

	// got something close by

	const near_surface_x = hit.position.x;
	const near_surface_y = hit.position.y;
	const near_surface_z = hit.position.z;

	const to_hit_x = near_surface_x - probe_location_x;
	const to_hit_y = near_surface_y - probe_location_y;
	const to_hit_z = near_surface_z - probe_location_z;

	const near_surface_orientation = v3_dot(
		to_hit_x, to_hit_y, to_hit_z,
		hit.normal.x, hit.normal.y, hit.normal.z
	);

	// ... relocate the probe based on the sign of near_surface_orientation ...
}

Hopefully this is enough to figure out the rest.

One quite important thing to keep in mind is that when you move probes, you should be careful not to worsen aliasing. I cast a ray from the original position to the desired location, and if we get a collision, we move the probe to the midpoint between where it was and the raycast hit.
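To sketch that safeguard (names and the raycast callback are illustrative, not the project’s actual code - substitute whatever first-hit query your BVH exposes):

```javascript
// Move a probe towards a desired spot near the surface, but never past
// blocking geometry. `raycast(from, to)` is a stand-in callback that
// returns the hit position as a fraction of the segment, or null.
function relocate_probe(probe, hit_position, hit_normal, offset, raycast) {
	// desired location: the nearest surface point, pushed `offset`
	// units out along the surface normal, into the open
	const desired = {
		x: hit_position.x + hit_normal.x * offset,
		y: hit_position.y + hit_normal.y * offset,
		z: hit_position.z + hit_normal.z * offset
	};

	const t = raycast(probe, desired);
	if (t === null) return desired; // clear path, take the new spot

	// something is in the way - settle for the midpoint between the
	// original position and the raycast hit
	return {
		x: probe.x + (desired.x - probe.x) * t * 0.5,
		y: probe.y + (desired.y - probe.y) * t * 0.5,
		z: probe.z + (desired.z - probe.z) * t * 0.5
	};
}
```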

It’s dry and boring stuff, but it’s something I’ve learned the hard way not to neglect.

1 Like

Yeah, 7 is a bit of overkill. You typically get 90% of the lighting from 3 bounces.

As for the samples - that’s a tough one. If your scene has a lot of complexity and you want nearby samples to be uniform, you need a lot of samples.

I remember watching an EA presentation from around 2013-15 where they were presenting their light map baking approach, and they were citing ~30,000 samples per pixel.

You usually start to see convergence around 4k in my experience, but unless you denoise your probes, you’re going to need a lot of samples to achieve a smooth transition across your probe mesh.

1 Like

Latest results

Fixed a bunch of smaller bugs

light map stats:

  • VRAM size: 20 MB
  • Probe samples: 16,384
  • Probe count: 324,674
  • Bake time: 267s
  • Bake hardware: RTX 4090
3 Likes

Implemented a different compression scheme for probes: 26 bytes per probe now, instead of the previous 56. Visually there is no difference, so it’s a definite win.

Reworked the statistics for outlier filtering during baking. Previously they were based on the mean; now I’m using the median, which is less susceptible to blowing up.

Calculating median on the GPU is a pain, especially per-probe, so I’m using a histogram instead. 32 buckets seems to produce a good result.
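A true median needs a sort, which is awkward per-probe in a compute shader; a histogram only needs a prefix walk. A minimal CPU-side sketch of the idea (the value range and names here are assumptions, not the actual shader code):

```javascript
// Approximate the median of `values` with a fixed-bucket histogram,
// assuming values fall in [min_value, max_value). Returns the centre
// of the bucket where the cumulative count crosses half the samples.
function histogram_median(values, min_value, max_value, bucket_count = 32) {
	const histogram = new Uint32Array(bucket_count);
	const scale = bucket_count / (max_value - min_value);

	for (const v of values) {
		const b = Math.min(bucket_count - 1, Math.max(0, Math.floor((v - min_value) * scale)));
		histogram[b]++;
	}

	// walk the buckets until we pass half of the samples
	const half = values.length / 2;
	let accumulated = 0;
	for (let b = 0; b < bucket_count; b++) {
		accumulated += histogram[b];
		if (accumulated >= half) {
			return min_value + (b + 0.5) / scale;
		}
	}
	return max_value;
}
```

The precision is limited to the bucket width, which is fine for outlier rejection - we only need a robust reference value, not an exact median.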

Slightly changed the energy-compensation step of the outlier elimination as well: it now diffuses into L0 only, while still respecting the chromaticity of the probe.

Here’s with old:

And here’s with new:


Here’s Sibenik. It’s lit almost entirely indirectly, so it’s a torture test for the system

Old

New

To highlight how bad of a stress test it is, here’s the same scene path-traced


Old

New


The effect is more pronounced in highly specular scenes.

Bounce counts and sample counts are the same.

Integrated the sparse volumetric lightmap into the GI pipeline:

Here’s GI off


Specular is done via probes as well, using GGX convolution

4 Likes

Spent more time on the specular component of the light maps

Still using SH3 probes, with a GGX ZH basis (thanks to Matt Pettineo).
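For anyone curious about the mechanics: convolving an SH-projected radiance signal with a zonal (rotationally symmetric) kernel is just a per-band scale. The GGX part is where the ZH coefficients come from - Pettineo publishes roughness-dependent fits; the `zh` values below are placeholders, not those fits:

```javascript
// Convolve band-limited SH radiance (SH3, 9 coefficients) with a zonal
// kernel: each band l is scaled by sqrt(4*pi / (2l + 1)) * zh[l].
// `zh` holds the 3 zonal coefficients of the kernel (placeholders here -
// for GGX they would come from a roughness-dependent fit).
function sh3_convolve_zh(sh, zh) {
	const out = new Array(9);
	for (let l = 0; l <= 2; l++) {
		const scale = Math.sqrt((4 * Math.PI) / (2 * l + 1)) * zh[l];
		for (let m = -l; m <= l; m++) {
			const i = l * l + l + m; // flattened SH index
			out[i] = sh[i] * scale;
		}
	}
	return out;
}
```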

There’s a bit of chroma undersampling going on, but in the final output it’s not particularly noticeable.

Using reservoir sampling to pull 2 unique probes per pixel, instead of blending all 8 corners of the voxel.

Thanks to NVIDIA’s Marcos Fajardo et al. for the inspiration from their 2023 paper “Stochastic Texture Filtering”
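The two-probe pick can be sketched with the sort-based form of weighted reservoir sampling (Efraimidis–Spirakis keys). Weights and names here are illustrative, not the renderer’s actual selection code:

```javascript
// Draw `count` unique indices with probability proportional to `weights`:
// each item gets the key random()^(1 / weight), and the largest keys win.
function weighted_pick(weights, count, random = Math.random) {
	const keyed = weights.map((w, i) => ({
		index: i,
		key: w > 0 ? Math.pow(random(), 1 / w) : -1
	}));
	keyed.sort((a, b) => b.key - a.key);
	return keyed.slice(0, count).map(e => e.index);
}

// e.g. pick 2 of the 8 voxel corners using trilinear-style weights
const corner_weights = [0.30, 0.25, 0.15, 0.10, 0.08, 0.06, 0.04, 0.02];
const picked = weighted_pick(corner_weights, 2);
```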

Applying parallax correction weights to the samples using sphere proxies. This is different from correcting individual probes, but it still improves accuracy.
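The post applies per-sample weights rather than redirecting lookups, but the geometry behind a sphere proxy is easiest to see in the classic direction-correction form: intersect the reflection ray with a proxy sphere around the probe and re-aim the lookup at the hit point. A sketch (all names hypothetical; it assumes a normalized direction and a shading point inside the sphere):

```javascript
// Intersect the ray p(t) = shaded_point + t * reflect_dir with a sphere
// of `radius` around `probe_center`, and return the (unnormalized)
// corrected lookup direction from the probe centre to the hit point.
function parallax_correct(shaded_point, reflect_dir, probe_center, radius) {
	const ox = shaded_point.x - probe_center.x;
	const oy = shaded_point.y - probe_center.y;
	const oz = shaded_point.z - probe_center.z;
	// quadratic t^2 + 2bt + c = 0 for a unit-length direction
	const b = ox * reflect_dir.x + oy * reflect_dir.y + oz * reflect_dir.z;
	const c = ox * ox + oy * oy + oz * oz - radius * radius;
	const t = -b + Math.sqrt(b * b - c); // far root: point is inside the sphere
	return {
		x: ox + t * reflect_dir.x,
		y: oy + t * reflect_dir.y,
		z: oz + t * reflect_dir.z
	};
}
```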

Frame timing is 0.05ms on RTX 4090 at 1080x1080 resolution.

2 Likes

Worked on the specular GI some more. Improved the selection logic for the 2 samples; there was a bit of a bias in the second sample selection.

Decided to drop parallax correction after some testing.

SH3 is too low-frequency to have enough angular resolution for parallax to make much of a difference. I didn’t measure it numerically, but overlaying the two images, with and without, I can’t tell the difference.

My “Sparse Volumetric Lightmap” implementation ends up having relatively high spatial resolution, which reduces average correction that parallax would produce even further.

Glad to have investigated this, but in the end doing less work on the GPU is always better :sweat_smile:



3 Likes

GI demo with sparse volumetric lightmap

The light map is only 2.2 MB. The format (SVLM) was created specifically for this project, and it maps directly to the GPU buffer without any translation.

For comparison, this grass albedo texture is 7.52 MB as a PNG

It needs to be decoded before we can push it to the GPU, where it will take up 2048*2048 pixels at 4 bytes per pixel, or 16 MB

But this texture alone will not be enough to render the grass material; you also need the normal map and ORM (occlusion, roughness, metalness)


each of which also needs 16 MB of VRAM. So just this grass material will need 48 MB of VRAM in total, vs this lightmap, which takes up 2.2 MB for the entire scene.
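The arithmetic, spelled out (the 2048×2048 RGBA8 assumption comes from the figures above):

```javascript
// Uncompressed RGBA8 texture footprint on the GPU
const texture_bytes = 2048 * 2048 * 4;            // 16,777,216 bytes
const texture_mb = texture_bytes / (1024 * 1024); // 16 MB
const material_mb = texture_mb * 3;               // albedo + normal + ORM
console.log(material_mb);                         // 48 (MB)
```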

The map was baked at 7 bounces per sample, and 32,000 samples per probe.

There are a total of 60,826 probes in the lightmap.


Here’s a flythrough:


Would be curious to know what the performance is like. For me, the GI part is blazingly fast, taking ~0.1 ms in total for both diffuse and specular.

3 Likes

Hey Antonio - how can I reach you for help with a 360 pano viewer? Multiresolution panorama | Pannellum

Perhaps through a private message?

1 Like

So, as someone working on a WebGPU renderer, I think I’ve got this.

First you click their avatar :one: and then you look for a button that says “Message” :two:, click that thing and you’re good to go!

2 Likes

Just to make it 100% foolproof: @Soma should click on @Antonio’s profile, not his own. Odd, but it seems that you can message yourself.

2 Likes