Really high process resource utilization on both GPU and CPU when modifying layers of a texture array

We have a raw shader material that takes a texture array as a uniform so that we can render different textures on instances of an instanced mesh. As instanced meshes go in and out of the camera’s view frustum (or as we load textures over the network) we modify the texture array. We’re observing that this is extremely slow.

Profiling shows that we’re getting a ton of calls to compressedTexSubImage3D (which makes sense) that are taking a ton of time on the CPU and GPU processes. See the images below:

Zoomed in CPU view

Zoomed in GPU view

A few misc notes

  • all of our textures are about 10 KB each
  • we’ll observe anywhere from 5ms to 200ms transferring that data to the GPU
  • typically we only transfer 1-4 textures per frame
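A quick back-of-envelope check on those numbers (plain arithmetic, nothing three.js-specific) suggests raw upload bandwidth isn’t the bottleneck:

```javascript
// Back-of-envelope: worst observed case is ~10 KB taking 200 ms.
const bytes = 10 * 1024;          // ~10 KB texture
const worstMs = 200;              // worst observed transfer time
const kbPerSec = (bytes / 1024) / (worstMs / 1000);
console.log(kbPerSec);            // 50 KB/s -- far below any real upload bandwidth
```

50 KB/s is orders of magnitude below what a GPU upload path can sustain, which points at per-call overhead or synchronization rather than data volume.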

We haven’t really worked with texture arrays much before, so it’s totally possible that this is just expected behavior. Nonetheless, I’d love to know if there are things we can do to make this more efficient, or alternatively other things we can do to load the texture arrays.

One hypothesis we have is that the texture arrays are locked while the GPU is rendering, causing lock contention when we try to synchronize the GPU and CPU. But we also don’t really know how to test that hypothesis and would love some input on that as well.

Is there a reason that you need to dynamically swap them in and out vs. just keeping them all resident at the same time?
It’s not quite clear what your setup is, since you mention texSubImage3d, but then mention using texture arrays… are these texture arrays of 3d textures or something?

compressedTexSubImage3D also sounds like it would be expected to incur some kind of latency, since operations on only a portion of a compressed texture may involve the driver decompressing and recompressing it on the CPU before uploading?
If you are indeed using compressed textures, perhaps you could decompress them after loading, so that selective updates are faster?
We probably need more information about the setup to make any good guesses.

One approach to testing whether it’s purely a fixed synchronization cost, vs. something related to the amount of data, would be to swap the textures out with a single-pixel solid-color texture each and see if the timings change?

Is there a reason that you need to dynamically swap them in and out vs. just keeping them all resident at the same time?

When our application loads in, they come down the network incrementally. So we load them in dynamically based on when the textures become available. We preload a texture array that’s ‘empty’ (but of the right byte size) to start, and then as the textures stream in we swap individual slices of the empty texture array.

It’s not quite clear what your setup is, since you mention texSubImage3d, but then mention using texture arrays… are these texture arrays of 3d textures or something?

My understanding is that the texture array is essentially a “3 dimensional” stack of 2D images. So each slice of the texture array is 2D, but the overall data structure is 3D.
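To illustrate that mental model with a toy sketch (hypothetical names, not three.js code, and assuming an uncompressed RGBA8 format for simplicity): each layer is a contiguous 2D slice stacked back to back, so a layer’s byte offset is just its index times the slice size:

```javascript
// Hypothetical sketch of an array texture's layout: layers are 2D slices
// stacked back to back in one contiguous block of memory.
function layerByteOffset(width, height, bytesPerTexel, layer) {
  const sliceBytes = width * height * bytesPerTexel; // size of one 2D slice
  return layer * sliceBytes;                         // slices are contiguous
}

// e.g. 256x256 RGBA8 slices: layer 3 starts at byte 3 * 256*256*4
console.log(layerByteOffset(256, 256, 4, 3)); // 786432
```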

since you mention texSubImage3d, but then mention using texture arrays

It’s interesting that you call these out – are these not related in your mind? We don’t call compressedTexSubImage3D directly, I assumed that call is somewhere deep in the THREE webgl callstack when we move data over to the GPU in our uniform. I think all we are actually doing is setting needsUpdate on this data structure: three.js docs

compressedTexSubImage3D also sounds like it would be expected to incur some kind of latency, since operations on only a portion of a compressed texture may involve the driver decompressing and recompressing it on the CPU before uploading?
If you are indeed using compressed textures, perhaps you could decompress them after loading, so that selective updates are faster?

Under the hood, each slice of the texture arrays we are using is a basisu compressed png image which is loaded into THREE using the ktx2 loader. I was under the impression that the individual slices of the texture array could be compressed separately and swapped in and out, without having to update the entire texture array. (And note we are only ever changing slices, not partial changes within each slice.) I may just be misunderstanding how the texture array is actually represented though – maybe there is actually compression latency?
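For clarity, here’s that mental model as a toy sketch (the buffer and function names are made up, not three.js API): each slice occupies an independent byte range, so swapping a slice should just overwrite that range without touching the others:

```javascript
// Toy model of a CPU-side staging buffer for a 4-layer array texture.
// Each compressed slice is an independent byte range; replacing a slice
// overwrites only its span -- no recompression of the other slices.
const sliceBytes = 16;                          // tiny slices for the example
const staging = new Uint8Array(4 * sliceBytes); // 4-layer array, starts empty

function swapSlice(buffer, layer, data) {
  buffer.set(data, layer * sliceBytes);         // overwrite just that layer's span
}

swapSlice(staging, 2, new Uint8Array(sliceBytes).fill(7));
console.log(staging[2 * sliceBytes]);           // 7 -- new slice data in place
console.log(staging[0]);                        // 0 -- other layers untouched
```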

You can have an array of 2d textures. (an array of 2d textures, each possibly different width/height) (referenced in the shader as an array sampler2D[ ])

Or you can have a 3d texture… (not an array… fixed width/height, all slabs are the same dimensions… the storage is one contiguous block of memory of width × height × depth × 4 bytes) (referenced in the shader as a single sampler3D)

I’m not clear on which you are using… but compressedTexSubImage3D sounds like a 3d texture…

You said that you preallocate the buffer of the right byte size but I’m not clear how you know the compressed size of the texture before the texture is loaded?

And if it is a compressed 3d texture, I sort of expect partial update operations on that 3d texture to require decompressing the entire texture buffer, splicing in the new texture data and… re-compressing?! I didn’t know that was even a thing possible with compressed texture data…

Forgive my confusion… I just haven’t implemented something of this complexity before. :) But I am pretty curious!

In your shader, how is this texture object defined… Is it a sampler2D[ ] or a sampler3D?

You can have an array of 2d textures. (an array of 2d textures, each possibly different width/height) (referenced in the shader as an array sampler2D)

Or you can have a 3d texture… (not an array… fixed width/height, all slabs are the same dimensions… the storage is one contiguous block of memory of width × height × depth × 4 bytes) (referenced in the shader as a single sampler3D)

Ah I misunderstood your question. It’s a 3D texture. To be specific, it is a THREE.CompressedArrayTexture.

You said that you preallocate the buffer of the right byte size but I’m not clear how you know the compressed size of the texture before the texture is loaded?

We know the dimensions of each slice of the TextureArray, so we preallocate a TextureArray of the right size.
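Concretely, the preallocation math looks something like this sketch (hypothetical names, not three.js API; the 16 bytes/block figure assumes the slices transcode to a 4×4 block-compressed format such as BC7 or ASTC 4×4 — adjust for the actual transcode target):

```javascript
// Compressed size of one slice in a 4x4 block-compressed format:
// 16 bytes/block for BC7/ASTC 4x4, 8 bytes/block for BC1/ETC1.
function compressedSliceBytes(width, height, bytesPerBlock) {
  const blocksX = Math.ceil(width / 4);  // blocks round up for non-multiples of 4
  const blocksY = Math.ceil(height / 4);
  return blocksX * blocksY * bytesPerBlock;
}

// Total size of the preallocated array: one slice size times the layer count.
function arrayTextureBytes(width, height, layers, bytesPerBlock) {
  return layers * compressedSliceBytes(width, height, bytesPerBlock);
}

// 64 layers of 128x128 at 16 bytes/block:
console.log(arrayTextureBytes(128, 128, 64, 16)); // 1048576 (1 MiB)
```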

And if it is a compressed 3d texture, I sort of expect partial update operations on that 3d texture to require decompressing the entire texture buffer, splicing in the new texture data and… re-compressing?! I didn’t know that was even a thing possible with compressed texture data…

I could be wrong, but I am pretty sure the entire TextureArray itself is not compressed; rather, the individual slices of the array are compressed. But I guess now that you mention it, it is sort of unclear. Digging in a bit deeper, three.js/src/renderers/webgl/WebGLTextures.js at ef80ac74e6716a50104a57d8add6c8a950bff8d7 · mrdoob/three.js · GitHub suggests it is maybe just a single compressed texture, or at least that the data is passed to the GPU all at once even if only one slice is changing. But I too am somewhat unsure of all of this, because I’m digging deeper into THREE internals than normal.
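If the whole array really is re-uploaded on every change, the cost difference is easy to ballpark (purely illustrative arithmetic, using the ~10 KB-per-slice figure from earlier in the thread and a made-up layer count):

```javascript
// Ballpark of full-array vs. per-layer upload cost.
const layerBytes = 10 * 1024;                // ~10 KB per compressed slice
const layers = 100;                          // hypothetical array depth

const fullArrayUpload = layers * layerBytes; // re-uploading every layer
const perLayerUpload = layerBytes;           // uploading only the changed slice

console.log(fullArrayUpload / perLayerUpload); // 100x more data per changed slice
```

So even with small slices, an all-at-once upload path scales with the layer count rather than with the amount of data that actually changed.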
