WebGPURenderer: Shader compilation time is long with models with a lot of morphs

I wasn’t sure where to post this, so I’m leaving it here as a quick report. @sunag

I found that WebGPURenderer takes a long time to compile shaders when it renders models with a large number of morph targets.
The stall seems more noticeable on Windows (possibly specific to NVIDIA GPUs?).

Tested with this model (note that redistribution is prohibited):

Though it’s a VRM file, loading it as a plain glTF file still reproduces the issue.
The webgpu_loader_gltf.html example should reproduce it once the model URL is changed.
In my environment (Core i9 13900HK, RTX 4070 Laptop, Chrome 120 and 121), loading this model stalls the tab for 6-7 seconds.

As one possible solution, I’m wondering whether we could use a LoopNode to iterate over morph targets.
We currently iterate over morph targets on the JS side, which generates a massive amount of shader code. I suspect this is what causes the long shader compilation time when the model has a lot of morph targets. Low confidence.

The current r160 MorphNode implementation:

The generated code:

To achieve a WGSL-side loop, I think we currently have two problems:

  • There is no way to give a uniform array to a shader.
    • We don’t have a way to handle arrays in the shader.
    • If we could pass a uniform array, we could iterate over the weights in a WGSL-side loop.
  • TextureNode always creates an implicit variable, which results in a large number of variables when we iterate on the JS side.
    • I think we can drop this behavior: texture sampling is not the only case where temporary variables help performance, and end developers should already know that.
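
To make the difference concrete, here is a minimal sketch of how JS-side iteration and a WGSL-side loop affect the size of the generated code. The helper names (emitUnrolledMorphWGSL, emitLoopMorphWGSL) and the WGSL identifiers (morphTexture, vertexIndex, weights, weightN) are hypothetical and only for illustration; this is not the actual MorphNode implementation, and the emitted WGSL is simplified and has not been validated against a compiler.

```typescript
// Illustrative only: these helpers are hypothetical and are NOT the three.js
// MorphNode implementation. The emitted WGSL is a simplified sketch.

/** JS-side iteration: emits one texture load and one accumulate per morph target. */
function emitUnrolledMorphWGSL(morphCount: number): string {
  const lines: string[] = [];
  for (let i = 0; i < morphCount; i++) {
    // Every target gets its own temporary variable and its own scalar weight.
    lines.push(`let morph${i} = textureLoad(morphTexture, vec2<u32>(vertexIndex, 0u), ${i}u, 0).xyz;`);
    lines.push(`position += morph${i} * weight${i};`);
  }
  return lines.join("\n");
}

/** WGSL-side iteration: emits a single loop whose line count does not depend on morphCount. */
function emitLoopMorphWGSL(morphCount: number): string {
  return [
    // Assumes `weights` is an array binding that can be indexed by `i`.
    `for (var i = 0u; i < ${morphCount}u; i++) {`,
    `  let morph = textureLoad(morphTexture, vec2<u32>(vertexIndex, 0u), i, 0).xyz;`,
    `  position += morph * weights[i];`,
    `}`,
  ].join("\n");
}

const morphCount = 50;
const unrolledLineCount = emitUnrolledMorphWGSL(morphCount).split("\n").length;
const loopLineCount = emitLoopMorphWGSL(morphCount).split("\n").length;
console.log(unrolledLineCount, loopLineCount);
```

The unrolled variant grows linearly with the number of morph targets and needs one variable per target, while the loop variant stays constant in size but depends on being able to index a weights array from the shader, which is the missing piece described above.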

@0b5vr If this topic does not get a response soon, it’s okay to move it over to GitHub so it gets more attention.

@0b5vr Sorry for the delay. This is related to the code being built with a native JS for loop instead of a TSL loop. To fix it we need a uniform array that works in this case, as you also correctly identified. I’m checking this and I think I’ll solve the problem in a few days.

Thank you! This is much faster!

r162, takes ~8.5s since model load

current dev (3ebe3ddd), takes ~1s since model load

Going from r161 to dev doesn’t change runtime performance much on Android. It might have improved CPU performance a bit, but low confidence.

Startup performance improved noticeably on Android, from ~4.38s to ~1.84s.
Tested on Android 14, Pixel 7 Pro, Chrome 121, using webgpu_loader_gltf.html with the model I mentioned in the original post.