WebGPURenderer ~2× slower CPU and ~5–10× slower first frame than WebGLRenderer on many-mesh scenes (r183) — same on both backends

I’m evaluating moving a CAD/BIM viewer (thousands of separate meshes) from WebGLRenderer to WebGPURenderer, and the unified renderer is consistently slower per frame than the classic one — and the gap is the same whether it runs on its WebGPU backend or its forceWebGL backend, which suggests it’s a renderer-class cost, not a backend one.

Live repro (3 renderers, same scene, side by side): https://codepen.io/S-bastien-GOUBIER/pen/bNBaEor

Same synthetic scene in all three (N separate Meshes over a pool of 200 shared geometries, one trivial material, frustumCulled=false, matrixAutoUpdate=false). Only the renderer/material class differs (MeshNormalNodeMaterial vs MeshNormalMaterial). “CPU render” = time spent inside renderer.render(); “first frame” = the cold first render().

| Renderer (4,000 meshes, ~8M tris) | CPU render | First frame (cold) | ≈ max fps (1000/CPU) |
|---|---|---|---|
| WebGLRenderer (classic) | 4.5 ms | 32 ms | ~225 |
| WebGPURenderer (WebGPU backend) | 9.7 ms | 167 ms | ~103 |
| WebGPURenderer({forceWebGL:true}) | 10.3 ms | 344 ms | ~97 |

(The three run on one rAF so their displayed FPS is shared/identical — that’s why I report CPU-render time and a derived ≈ max fps, which are per-render()-call and valid concurrently. Scaling the scene up to ~9k meshes widens the gap further.)
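
For anyone reproducing the numbers, the two metrics can be collected roughly as follows. This is a minimal sketch, not the CodePen's actual code: `timeRender`, `maxFps`, `slowdown` and the `renderOnce` callback are hypothetical names, and in a real page `renderOnce` would presumably be `() => renderer.render(scene, camera)`.

```typescript
// Hypothetical helpers; names are illustrative and not taken from the CodePen.

// Wall-clock time of one synchronous call, in milliseconds. This captures the
// CPU time spent inside the call only; GPU work completes asynchronously.
function timeRender(renderOnce: () => void): number {
  const start = performance.now();
  renderOnce();
  return performance.now() - start;
}

// Upper bound on frame rate if render() were the only per-frame cost.
function maxFps(cpuMs: number): number {
  return 1000 / cpuMs;
}

// How many times slower `candidateMs` is than `baselineMs`.
function slowdown(candidateMs: number, baselineMs: number): number {
  return candidateMs / baselineMs;
}

// Measured values copied from the table above.
const results = [
  { name: "WebGLRenderer (classic)", cpuMs: 4.5, firstFrameMs: 32 },
  { name: "WebGPURenderer (WebGPU backend)", cpuMs: 9.7, firstFrameMs: 167 },
  { name: "WebGPURenderer({forceWebGL:true})", cpuMs: 10.3, firstFrameMs: 344 },
];

const baseline = results[0];
for (const r of results) {
  console.log(
    `${r.name}: ~${maxFps(r.cpuMs).toFixed(0)} fps max, ` +
      `${slowdown(r.cpuMs, baseline.cpuMs).toFixed(1)}x CPU, ` +
      `${slowdown(r.firstFrameMs, baseline.firstFrameMs).toFixed(1)}x first frame`,
  );
}
```

Because each `timeRender` call brackets a single `render()`, the three measurements stay independent even when all three renderers share one rAF loop.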

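So on this workload the unified renderer costs ~2× the CPU per frame on both of its backends (9.7 ms and 10.3 ms vs 4.5 ms), and its cold first frame is ~5× slower on the WebGPU backend (167 ms vs 32 ms) and ~10× slower with forceWebGL (344 ms vs 32 ms).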

Questions:

  1. Is the ~2× higher per-draw CPU cost of WebGPURenderer vs the classic WebGLRenderer expected (on both backends)?

  2. The cold first frame is much slower — presumably lazy RenderObject / bind-group creation per object. Is there a recommended way to pre-warm or amortize it to improve time-to-first-paint with thousands of meshes?
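
To make the "amortize" half of question 2 concrete, here is a minimal sketch of one generic option: feeding meshes to the scene in batches across frames, so that any lazy per-object setup is spread out instead of landing on a single cold frame. It assumes (unverified) that the setup cost is paid the first time each object is rendered. `addInChunks`, `addOne` and `nextFrame` are hypothetical names, not three.js APIs.

```typescript
// Hypothetical helper, not part of three.js: hands `items` to `addOne` in
// batches of `chunkSize`, waiting on `nextFrame` after every batch.
// Resolves with the number of batches (frames) used.
async function addInChunks<T>(
  items: readonly T[],
  chunkSize: number,
  addOne: (item: T) => void,
  nextFrame: () => Promise<void>,
): Promise<number> {
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  let frames = 0;
  for (let i = 0; i < items.length; i += chunkSize) {
    // Add one batch; the first batch is added before the first wait.
    for (const item of items.slice(i, i + chunkSize)) {
      addOne(item);
    }
    frames++;
    // Yield so the caller's render loop can draw what has been added so far.
    await nextFrame();
  }
  return frames;
}
```

In a browser, `addOne` could be `(mesh) => scene.add(mesh)` and `nextFrame` could be `() => new Promise<void>((resolve) => requestAnimationFrame(() => resolve()))`. This does not reduce the total setup cost; it only trades a complete first frame for an earlier partial one.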

Environment: three.js r183 (0.183.2), Chrome 148.0.7778.216 (64-bit), Windows 11, RTX 4090 Laptop.

Update: this looks like the same underlying problem already tracked in three.js issue #30560 on GitHub, “WebGPURenderer: Current UBO system has severe performance issues with many render items”.

The diagnosis there (per-object UBO / bind-group management for many non-instanced meshes → lots of setBindGroup()/writeBuffer() calls per frame) matches what I’m seeing, and it’s consistent with the gap being identical on both backends — i.e. a renderer-class cost rather than a backend one.

Two things my repro might add to that thread:

  • It’s on Windows / NVIDIA (RTX 4090), not Metal/Mac — so the slowdown isn’t Metal-specific.
  • The cold first frame is ~5–10× slower (likely lazy RenderObject / bind-group creation per object), which the existing issue doesn’t really cover.

Happy to move the discussion to GitHub if that’s the better place for it.

WebGLRenderer is over a decade old, and has been optimized continuously throughout.

WebGPURenderer is only a couple of years old and still has lots of bugs and edge cases, let alone full performance parity with WebGL. Performance is a secondary concern to it actually working and being stable.

The only reason I would recommend using WebGPURenderer is if you have to do something that isn’t attainable with WebGLRenderer.

It sounds like, since you are comparing them both, there is nothing about your workload that needs WebGPU-specific functionality, so just stick with WebGLRenderer. It’s more widely supported, less buggy, and better optimized in almost all ways. IMO, for a CAD app, WebGLRenderer is the safer choice.

If your workflow actually required using WebGPURenderer, it wouldn’t be a question. You either need it or you don’t.

For a FEM solver, you might actually see a benefit from WebGPU, and it might reduce your code complexity. But even then, a lot of bulk math is often trivially achievable with regular WebGL, just using render targets + shaders.

It’s only when you have a specific problem that requires the granularity of WebGPU’s data models that you may actually NEED to use it.

The support matrix for WebGL vs WebGPU shows part of the story:

Can I use, “WebGL - 3D Canvas graphics” ← supported literally everywhere except Opera Mini

Can I use, “WebGPU” ← supported mostly on desktop Chromium-based browsers, with partial support in Firefox/Safari

Admittedly, Chromium browsers account for >90% of the ecosystem, but if you care about that other 10 percent of weirdos that don’t want to use a Chromium browser, then WebGL is kinda your only choice.

The story of WebGPU adoption is similar in trajectory to WebGL2: WebGL2 landed, it was buggier and slower than WebGL1, then it got more solid, faster, and widely supported, and now it is the default when using WebGLRenderer.

WebGPU will follow a similar trajectory: it will get less buggy and faster, and eventually, due to countless optimizations, it will be faster than any of the other renderers, simply because it gives more granular access to the GPU.

Thanks, that’s a really helpful and fair framing.

To clarify the motivation: I wasn’t comparing them to pick a renderer for its own sake — I was scoping a port to WebGPURenderer specifically to unlock things that aren’t really attainable on WebGL, namely GPU occlusion culling and a meshlet pipeline. That’s genuinely WebGPU-territory, so the “you either need it or you don’t” test does apply here.

But before going down that road, I wanted to start from a healthy baseline and understand the current per-draw cost — and given the issue is well understood (#30560: per-object UBO/bind-group churn) and still actively being worked on, I think the sensible move is to pause this investigation for now and let WebGPURenderer mature a bit more before revisiting the occlusion-culling / meshlet work.

Appreciate the perspective — agreed on the overall trajectory.