Updating buffer attribute performance is incredibly slow

Hi folks,

A quick question about the performance of a buffer attribute with dynamically changing values.
I have an InterleavedBuffer that wraps a Float32Array with 1,000,000 * 10 elements.

I’m using a SharedArrayBuffer to let a worker-thread simulate my particles. Everything works just fine. However, for some reason the buffer isn’t used as a pointer when bound to the WebGL context (does that make sense?).

I have to set buffer.needsUpdate = true to actually see the changes, which makes me think that Three makes a copy of the entire buffer before uploading it to the GPU. I’m assuming this because my framerate dips from 320 FPS to 24~40 FPS :exploding_head:

Is there a way around this?

I’m already using DynamicDrawUsage on my buffer.

tl;dr: How can I bind a pointer of my array buffer, rather than copying its contents to the GPU using Three?

edit: Since the entire particle “pool” is allocated statically anyway, changing the buffer usage to StaticDrawUsage has increased the performance somewhat, and everything still seems to work. It still isn’t optimal though. I wonder if what I’m trying to do can actually work, since system RAM and GPU memory are two separate things :thinking:

To illustrate: simulating ~1000 particles is perfectly fine:

However, simulating ~10.000 in a single emitter causes frame drops that my profiler is somehow missing (notice the FPS at top left compared to all the timings on the right).

The only thing I can think of is that InterleavedBuffer.needsUpdate = true causes the renderer to perform a lot of work. I’m not familiar enough with the internals of the renderer itself, but if anyone can confirm my suspicions, that would be greatly appreciated so I know where I could possibly start optimizing.

edit 2: Just as I uploaded the second screenshot, I noticed the env-map update taking an enormous amount of time. I’ll investigate further…

edit 3: The env-map update time in the screenshots isn’t happening real-time, but only when the environment (skybox and such) changes.

edit 4: The drastic slowdown starts at the first call to WebGLRenderer.render() in the pipeline whenever .needsUpdate is set to true.


On the subject of buffers, to clear up a few things:

  • there are 2 copies of the buffer, one on the CPU and one on the GPU, this is standard WebGL and pretty much every other API behavior
  • to update anything in the buffer, you typically have to push some or all of the CPU buffer’s content to the GPU; in a nutshell, this is what .needsUpdate = true does for you
  • The usage parameter doesn’t change what you have to do. It’s only a hint to the WebGL implementation about how you intend to use the buffer. For clarity:
    • Static = I intend not to change data
    • Dynamic = I intend to change data sometimes
    • Streaming = I intend to change data all the time
      As you can see, these terms are super vague and don’t actually change how you interact with WebGL; they only promise potential performance benefits when you pick the usage that matches your access pattern.
  • And yes, pushing data to the GPU can be costly. I noticed that as well. This is why you generally don’t want to update particles on the CPU if you have too many. Beyond that, pay attention to how much space you use per particle: at the WebGL level it boils down to how many buffers you are updating and how much data you’re pushing across. In my experience it’s the volume that’s the larger bottleneck.
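
To put rough numbers on the volume point, here is a minimal TypeScript sketch. It assumes live particles are packed at the front of the pool, so only that prefix would need uploading; `UploadRange` and `liveUploadRange` are hypothetical names for illustration, not three.js API. The pool size (1,000,000 particles of 10 floats) and the ~10,000 live particles are taken from the first post.

```typescript
// Hypothetical description of the slice of a Float32Array to upload.
interface UploadRange {
  start: number;      // first float index to upload
  count: number;      // number of floats to upload
  byteLength: number; // size of the upload in bytes
}

// Assumption: live particles occupy the first `liveParticles` slots of the pool.
function liveUploadRange(liveParticles: number, floatsPerParticle: number): UploadRange {
  const count = liveParticles * floatsPerParticle;
  return { start: 0, count, byteLength: count * Float32Array.BYTES_PER_ELEMENT };
}

// Whole pool: 1,000,000 particles * 10 floats * 4 bytes per upload.
const full = liveUploadRange(1000000, 10);
// Only the ~10,000 live particles.
const live = liveUploadRange(10000, 10);

console.log(`full pool: ${full.byteLength} bytes, live only: ${live.byteLength} bytes`);
```

In three.js, a start/count pair like this would typically be handed to the buffer's update-range mechanism before setting needsUpdate = true (`addUpdateRange(start, count)` in recent releases, the `updateRange` object in older ones), so check which one your version exposes.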

Hope that helps. Also, maybe try to use Streaming instead of Dynamic, who knows, maybe the copying will be a bit faster on your target browser/os/hardware combo? :woman_shrugging:
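
For reference, the three hints correspond to plain WebGL enum values. The sketch below lists them and adds a hypothetical `pickUsageHint` helper; the cutoff of 30 updates per second is an arbitrary assumption for illustration, not something WebGL or three.js defines.

```typescript
// Numeric values of the WebGL usage enums. three.js exposes the same numbers
// as StaticDrawUsage, DynamicDrawUsage and StreamDrawUsage.
const UsageHint = {
  StaticDraw: 0x88e4,  // gl.STATIC_DRAW
  DynamicDraw: 0x88e8, // gl.DYNAMIC_DRAW
  StreamDraw: 0x88e0,  // gl.STREAM_DRAW
} as const;

type UsageHintValue = (typeof UsageHint)[keyof typeof UsageHint];

// Hypothetical helper: choose a hint from how often the CPU rewrites the data.
function pickUsageHint(updatesPerSecond: number): UsageHintValue {
  if (updatesPerSecond <= 0) return UsageHint.StaticDraw;
  if (updatesPerSecond < 30) return UsageHint.DynamicDraw; // arbitrary cutoff
  return UsageHint.StreamDraw;
}

// A buffer rewritten every frame at 60 FPS lands on the stream hint.
console.log(pickUsageHint(60) === UsageHint.StreamDraw);
```

With three.js you would pass the equivalent constant to the buffer, e.g. `buffer.setUsage(THREE.StreamDrawUsage)`, ideally before the first render, since as far as I know the usage can't be changed once the GPU-side buffer has been created.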


Thank you so much for the detailed info @Usnul. Very much appreciated!

My particle shader uses (only?) 5 buffers and one packed texture atlas, so there isn’t much to optimize there.

  • there are 2 copies of the buffer, one on the CPU and one on the GPU, this is standard WebGL and pretty much every other API behavior

I had my suspicion this would be the case. I should’ve known this from reading through the WebGL spec/documentation. Apologies for the laziness :sweat:

In my experience it’s the volume that’s the larger bottleneck.

I think this is the case as well. I’m using one Points-object for the entire particle system (there is one per level). I still have to implement frustum culling on the emitter and/or particle level, so maybe that’ll help somewhat.

Thanks again! <3


@Usnul Setting the buffer usage property to StreamDrawUsage did the trick! :heart:

This results in:

  • The renderer FPS remains steady (capped at monitor VSync speed)
  • The particle renderer remains at sub-millisecond latency
  • The simulation thread is sweaty, but that is to be expected and still needs some unrelated optimization work.

The result now is that the game itself runs smoothly, and only the particles stutter due to slow simulation updates in the worker-thread :smiley:

Previously, the particle render pass was also reporting low latencies, but the actual FPS dropped. Using StreamDrawUsage somehow keeps the FPS smooth.
