Shade - WebGPU graphics

The number of draw calls is roughly 111, i.e. the number of materials.

This is not exactly true, as there is some fixed overhead.

Here’s a basic breakdown of how a frame is rendered in Shade:

  1. All instances (meshes) are filtered using a compute shader into 2 sets (see the sketch after this list):
    a. Visible (passing frustum check, and a conservative occlusion check)
    b. Potentially visible (“maybe” set)
  2. All visible instances are expanded to meshlets; again, we get “visible” and “maybe” groups
  3. All visible meshlets are expanded to triangles, same story with “visible” and “maybe” sets
  4. All visible triangles are now rasterized into a visibility buffer, an rg32uint texture holding mesh_id and triangle_id. The actual rasterizer is dead simple, about as complex as a depth pre-pass shader.
  5. Using what we rasterized, we build a depth pyramid; this is the basis for the occlusion testing mentioned above
  6. Using the current depth pyramid, we process the “maybe” sets; this yields the remaining visible triangles for this frame
  7. We rasterize what we filtered in the previous step
  8. We once again rebuild the depth pyramid; this will be used in the next frame for steps 1-4
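To make step 1 concrete, here's a minimal sketch of that classification pass. This is my hypothetical reconstruction, not Shade's actual code; the buffer layouts, names, and the stubbed-out tests are all illustrative:

```ts
// Hypothetical sketch of the step-1 instance classification (two-phase
// occlusion culling). All names and layouts are illustrative.
const cullWGSL = /* wgsl */ `
struct Instance { center: vec3f, radius: f32 }

@group(0) @binding(0) var<storage, read> instances: array<Instance>;
@group(0) @binding(1) var<storage, read_write> visible: array<u32>;
@group(0) @binding(2) var<storage, read_write> visibleCount: atomic<u32>;
@group(0) @binding(3) var<storage, read_write> maybe: array<u32>;
@group(0) @binding(4) var<storage, read_write> maybeCount: atomic<u32>;

// Bounding-sphere test against the 6 frustum planes (plane data omitted).
fn inFrustum(center: vec3f, radius: f32) -> bool { return true; } // stub

// Conservative test against LAST frame's depth pyramid: true only when the
// sphere's screen-space rect is fully behind the pyramid's farthest depth.
fn occludedLastFrame(center: vec3f, radius: f32) -> bool { return false; } // stub

@compute @workgroup_size(64)
fn classify(@builtin(global_invocation_id) id: vec3u) {
  if (id.x >= arrayLength(&instances)) { return; }
  let inst = instances[id.x];
  if (!inFrustum(inst.center, inst.radius)) { return; } // culled outright
  if (occludedLastFrame(inst.center, inst.radius)) {
    maybe[atomicAdd(&maybeCount, 1u)] = id.x;     // re-tested in step 6
  } else {
    visible[atomicAdd(&visibleCount, 1u)] = id.x; // rasterized in step 4
  }
}`;
```

Steps 2 and 3 repeat the same visible/maybe split one level down, for meshlets and then for triangles.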

At this point we have a Visibility Buffer, and we've spent 2 draw calls on actual geometry so far. We also spent something like 20 draw calls on the depth pyramid, but those are relatively cheap: all the mip passes combined touch only about a third of the screen's pixel count (1/4 + 1/16 + 1/64 + … ≈ 1/3).
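The pyramid build itself is just a chain of downsamples, one draw per mip. A hypothetical sketch of that loop, with pipeline and shader creation omitted and all names illustrative:

```ts
// Hypothetical depth-pyramid (hi-z) build: one fullscreen draw per mip
// (~20 draws for a 1080p-class target). The fragment shader reads the
// previous mip with textureLoad and keeps the farthest depth of a 2x2
// footprint, so the occlusion tests in steps 1 and 6 stay conservative.
function buildPyramid(device: GPUDevice, encoder: GPUCommandEncoder,
                      pipeline: GPURenderPipeline, pyramid: GPUTexture) {
  for (let mip = 1; mip < pyramid.mipLevelCount; mip++) {
    const bindGroup = device.createBindGroup({
      layout: pipeline.getBindGroupLayout(0),
      entries: [{
        binding: 0, // previous mip, read in the fragment shader
        resource: pyramid.createView({ baseMipLevel: mip - 1, mipLevelCount: 1 }),
      }],
    });
    const pass = encoder.beginRenderPass({
      colorAttachments: [{
        view: pyramid.createView({ baseMipLevel: mip, mipLevelCount: 1 }),
        loadOp: "clear",
        storeOp: "store",
      }],
    });
    pass.setPipeline(pipeline);
    pass.setBindGroup(0, bindGroup);
    pass.draw(3); // fullscreen triangle
    pass.end();
  }
}
```

Binding mip N-1 for reading while rendering into mip N of the same texture is valid in WebGPU, since usage is tracked per subresource.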

Next is the material pass: we fetch mesh_id from the visibility buffer and draw “depth” in a depth-only pass, where the “depth” value is the material ID of the mesh at that pixel.
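A sketch of that ID-to-depth pass as I understand it; the buffer names and the normalization constant are assumptions:

```ts
// Hypothetical material-ID pass: a fullscreen draw reads mesh_id from the
// visibility buffer and emits the mesh's material ID as the *depth* value,
// normalized into [0, 1]. Names and MAX_MATERIALS are illustrative.
const materialDepthWGSL = /* wgsl */ `
@group(0) @binding(0) var vizBuffer: texture_2d<u32>; // rg32uint: (mesh_id, triangle_id)
@group(0) @binding(1) var<storage, read> materialOfMesh: array<u32>;

// Assumed upper bound used to normalize IDs into the depth range.
const MAX_MATERIALS: f32 = 1024.0;

@fragment
fn materialIdToDepth(@builtin(position) pos: vec4f) -> @builtin(frag_depth) f32 {
  let meshId = textureLoad(vizBuffer, vec2u(pos.xy), 0).r;
  return f32(materialOfMesh[meshId]) / MAX_MATERIALS;
}`;
```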

Next we do a draw pass for each material, with the depth test set to “equal” and the depth value set to match the material ID. Essentially, we abuse the depth-test hardware to get 0 overdraw. And I don't mean that hyperbolically, like “virtually zero”: the material shader runs once per pixel, and only for pixels that are actually visible in the final render.
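And the matching per-material pipeline state; a minimal sketch, assuming a shader module whose fullscreen vertex entry point places the triangle at z = materialId / MAX_MATERIALS, mirroring the ID pass above:

```ts
// Hypothetical per-material pipeline; entry point names are illustrative.
function createMaterialPipeline(device: GPUDevice, module: GPUShaderModule,
                                gBufferFormats: GPUTextureFormat[]): GPURenderPipeline {
  return device.createRenderPipeline({
    layout: "auto",
    vertex: { module, entryPoint: "fullscreenAtMaterialDepth" },
    fragment: {
      module,
      entryPoint: "shadeMaterial",
      targets: gBufferFormats.map((format) => ({ format })),
    },
    depthStencil: {
      format: "depth24plus",    // holds material IDs here, not scene depth
      depthWriteEnabled: false, // compare only, never write
      depthCompare: "equal",    // hardware rejects pixels of other materials
    },
  });
}
// One pass.draw(3) per material; only matching pixels run shadeMaterial.
```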

As a result, the cost of texture switching is actually very low. Also, the material shader is uniform, meaning we don't do any lighting here; we output a g-buffer instead, with things like roughness, albedo, normal etc.
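In WGSL terms, the material shader's output is just a struct of render-target attributes; a minimal sketch, with formats and channel packing purely illustrative:

```ts
// Hypothetical g-buffer output: attributes, not lit color. Lighting is
// resolved in a later pass. Channel layout is illustrative.
const gBufferWGSL = /* wgsl */ `
struct GBufferOut {
  @location(0) albedo: vec4f, // rgb albedo, alpha unused
  @location(1) normal: vec4f, // encoded world-space normal
  @location(2) params: vec4f, // roughness, metalness, ...
}

@fragment
fn shadeMaterial(@builtin(position) pos: vec4f) -> GBufferOut {
  // Real code would fetch the triangle via the visibility buffer and sample
  // the material's textures; placeholder values keep the sketch short.
  var out: GBufferOut;
  out.albedo = vec4f(1.0);
  out.normal = vec4f(0.0, 0.0, 1.0, 0.0);
  out.params = vec4f(0.5, 0.0, 0.0, 0.0);
  return out;
}`;
```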

The advantage is that we scale incredibly well with material and texture counts, as well as with instance counts and geometry size, at the cost of high GPU bandwidth.

I can’t say if this is a good trade, as it obviously depends on your use case. But if you’re dealing with large scenes, and/or you want to run some post-processing, it’s definitely a huge win.

I did think about using a texture array for the materials; the problem is uniformity: you have to force every texture to have the same dimensions. I’m not exactly opposed to it, but it seems like a big ask. I already have texture resizing shaders that would make this transparent to the user, but the loss of quality due to scaling would be a nasty surprise.

One more issue is the layer count limit. I actually use a texture array for the ray-tracing path already: I have a special code path that does full inline ray tracing, and there you need access to all textures at the same time, so I pack them into a texture array.

However, even then I ran into an issue with larger scenes, like Lumberyard’s Bistro: that scene has 400 textures, while GPUDevice.limits.maxTextureArrayLayers is 256 by default. So, you simply can’t support larger scenes, full stop.
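For reference, the limit is queryable at runtime, and you can request more than the default when the adapter supports it; since only 256 is guaranteed everywhere, though, a portable renderer can't rely on that. A small sketch of the check (the 2048 cap is an arbitrary example):

```ts
// Query the adapter's actual ceiling and request a raised limit if possible.
const adapter = await navigator.gpu.requestAdapter();
if (!adapter) throw new Error("WebGPU not available");
console.log(adapter.limits.maxTextureArrayLayers); // what this GPU could do
const device = await adapter.requestDevice({
  requiredLimits: {
    maxTextureArrayLayers: Math.min(adapter.limits.maxTextureArrayLayers, 2048),
  },
});
console.log(device.limits.maxTextureArrayLayers); // what we actually got
```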

I had to get creative: I emulate texture sampling in the shader, skip mip maps, and treat the texture array as an atlas, packing multiple textures per layer. This works alright for the ray-tracing API, as I mostly use it for global illumination, where the loss of texture quality and mips is quite acceptable.
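Here’s a rough sketch of what that emulated sampling can look like; the slot size, packing scheme, and names are my illustration, not Shade’s actual layout:

```ts
// Hypothetical atlas emulation: every texture occupies a fixed 128x128 slot,
// 16x16 = 256 slots per 2048x2048 layer, and uvs are remapped by hand.
// No mips, so we always sample level 0.
const atlasSampleWGSL = /* wgsl */ `
@group(0) @binding(0) var atlas: texture_2d_array<f32>;
@group(0) @binding(1) var atlasSampler: sampler;

const SLOT: f32 = 128.0;
const LAYER_SIZE: f32 = 2048.0;
const SLOTS_PER_ROW: u32 = 16u;
const SLOTS_PER_LAYER: u32 = 256u;

fn sampleAtlas(textureId: u32, uv: vec2f) -> vec4f {
  let layer = textureId / SLOTS_PER_LAYER;
  let slot = textureId % SLOTS_PER_LAYER;
  let slotOrigin = vec2f(f32(slot % SLOTS_PER_ROW), f32(slot / SLOTS_PER_ROW));
  // fract() keeps tiled uvs inside the slot; the half-texel inset avoids
  // filtering bleeding into the neighboring slot.
  let inset = 0.5 / SLOT;
  let local = clamp(fract(uv), vec2f(inset), vec2f(1.0 - inset));
  let atlasUv = (slotOrigin + local) * (SLOT / LAYER_SIZE);
  return textureSampleLevel(atlas, atlasSampler, atlasUv, layer, 0.0);
}`;
```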

Here’s an example of what the ray-tracer sees with all texture resolutions fixed at 128x128:

It looks surprisingly good for such a low resolution, but this is not acceptable for the general use case.

So, in short: the best answer would be bindless textures, but alas, those are not part of the WebGPU spec, and it doesn’t look like we’re getting bindless resources anytime soon.

The other alternative, which I consider workable, would be virtual textures. In fact, virtual textures have the benefit of managing memory as well: the “physical” texture is where sampling actually happens, and it’s quite small, so you get much better cache utilization on the GPU. Virtual textures are hard, though. Even though I’ve implemented them before, on WebGL, which is a less powerful API, a proper solution is still a lot of work, so it’s something to look into in the future.
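For the curious, the core of a virtual-texture lookup is a small indirection step before the real sample; a very rough sketch, with a hypothetical page-table encoding:

```ts
// Hypothetical virtual-texture lookup: an indirection texture maps virtual
// pages to slots in the small resident "physical" texture where the actual
// sampling happens. Encoding and names are illustrative.
const vtWGSL = /* wgsl */ `
@group(0) @binding(0) var pageTable: texture_2d<u32>; // virtual page -> physical page
@group(0) @binding(1) var physical: texture_2d<f32>;  // small resident cache
@group(0) @binding(2) var physSampler: sampler;

const PHYS_PAGES: f32 = 32.0; // physical texture is 32x32 pages

fn sampleVirtual(virtualUv: vec2f, virtualSizePages: vec2f) -> vec4f {
  let page = vec2u(virtualUv * virtualSizePages);
  let entry = textureLoad(pageTable, page, 0).r;  // packed physical page coords
  let physPage = vec2f(f32(entry & 0xffffu), f32(entry >> 16u));
  let inPage = fract(virtualUv * virtualSizePages);
  let physUv = (physPage + inPage) / PHYS_PAGES;
  return textureSampleLevel(physical, physSampler, physUv, 0.0);
}`;
```

The hard parts are everything around this lookup: feedback on which pages are needed, streaming them in, and evicting stale ones.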

Anyway, thanks for the interesting questions @Lawrence3DPK
