Optimization of large amounts (100-1000) of Skinned Meshes (CPU bottlenecks)

Greetings everyone! :wave: :smiley:

First off, this is going to be a bit of a long post, not about any specific error, but rather about optimization strategies for applications with a lot of animated characters. I’ll provide a lot of information and maybe turn this into a sort of case study. You don’t have to read all the details if you don’t want to; the intro is enough. I would appreciate any advice on optimizing such scenes and where to go from here, whether it’s general or detailed advice :slight_smile:

Intro

I am working on a game with a lot of animated units (up to a maximum of 500 visible at the same time) created with SkeletonUtils.clone() on a SkinnedMesh, each having its own animation state and its own AnimationMixer. I thought I would need to work on instancing, but to my surprise, the bottleneck turned out to be CPU calculations (not rendering), and it showed up much sooner than I expected.

The game is a top-down RTS, so the meshes are only ever seen from a large distance.

This allows me to aggressively optimize the skinned model as much as possible:

  • 700 vertices :eight_pointed_black_star:
  • 900 triangles :small_red_triangle:
  • 128x128 textures :framed_picture:
  • 25 bones :bone: (lowest possible skeleton LOD in Mixamo) :skull_and_crossbones:

However, even with this, the frame rate starts dropping below 60 fps once the number of Skinned Meshes exceeds 200 (no shadow casting).

General performance details:

Carefully monitoring the performance :bar_chart: with 300 units (no shadows) on an older gaming laptop with an Intel(R) Core™ i7-4720HQ CPU @ 2.60GHz and 8GB RAM, this is what I get in a single frame:

Note this is all just Three.js internal stuff. I have disabled all of my game logic.

Aside from just rendering, draw calls, etc, I noticed 3 main culprits which take up 75% of the scene CPU work (listed in the order of left to right on the profiler):

  1. AnimationMixer.update() of all the skinned meshes :man_dancing:
  2. updateMatrixWorld() of all the meshes and most importantly - their bones :skull_and_crossbones:
  3. projectObject() on all the skinned meshes and their bones :movie_camera:

Let’s look at each one by one:

1. Animation mixers

It’s mostly PropertyMixer.apply() calls, whose time seems to split roughly 50/50 between Euler.setFromRotationMatrix() and AnimationAction._update(). In the latter, all the time is spent in the final loop over the interpolants:

for ( let j = 0, m = interpolants.length; j !== m; ++ j ) {
	interpolants[ j ].evaluate( clipTime );
	propertyMixers[ j ].accumulate( accuIndex, weight );
}

I tried optimizing my animation as much as possible, getting rid of scale tracks and position tracks, leaving really only rotation interpolation. I also decreased the density of keyframes. Unfortunately this still seems to be a lot to process :frowning:

2. updateMatrixWorld() - Matrices local & world updates

On every frame Three.js traverses the scene and performs updateMatrixWorld() on all objects.
Not much to ponder here. 25 bones per skeleton is pretty low by game standards (around 30 is recommended for mobile), but that’s still 300 x 25 = 7500 nested Object3Ds that need their matrices calculated each frame for animation. I tried:

  • flattening the skeleton bone hierarchy, but no noticeable perf improvement was seen
  • different matrix multiplication algorithms instead of the default one. Many, many tries at optimizing it, using no memory allocation inside the function, or the gl-matrix.js code, or other multiplication algos - no noticeable performance gain.
    But hey, at least it means Three.js is pretty optimized in this hot path, so good job to the maintainers I suppose :clap: :smiley:

3. projectObject()

On every frame Three.js traverses the scene and performs projectObject() on its objects.
Since bones aren’t renderable, I added a quick early return at the start of projectObject():

if ( object.visible === false || object.isBone ) return;

I may be wrong, but I believe I noticed at least a small improvement due to this. Unfortunately it’s not significant enough, as the majority of the time seems to be spent in WebGLObjects.update(), which calls Skeleton.update(). That function is just a loop over all bones, calculating the offset matrices. I tried:

  • making it a bit more efficient, but not much gain was seen. Most of it is just matrix multiplication (going back to point 2.)
  • simplifying the model skeleton to the lowest possible LOD of 25 bones (no hand/face bones, etc). Any further would be very difficult and likely produce visual artifacts.
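
For readers unfamiliar with that loop, the per-bone work amounts to one 4x4 multiply per bone per frame. Below is a minimal plain-JS sketch of the idea, assuming column-major matrices stored in flat arrays. The function names multiplyMat4 and updateBoneMatrices are made up for illustration; this is not the actual Three.js source.

```javascript
// Multiply two column-major 4x4 matrices: out = a * b.
// All three are flat arrays; `out` must not overlap `a` or `b`.
function multiplyMat4( a, aOffset, b, bOffset, out, outOffset ) {

	for ( let col = 0; col < 4; col ++ ) {

		for ( let row = 0; row < 4; row ++ ) {

			let sum = 0;

			for ( let k = 0; k < 4; k ++ ) {

				// element ( row, k ) of a times element ( k, col ) of b
				sum += a[ aOffset + k * 4 + row ] * b[ bOffset + col * 4 + k ];

			}

			out[ outOffset + col * 4 + row ] = sum;

		}

	}

}

// For every bone: boneMatrix = boneWorldMatrix * boneInverse (the "offset" matrix).
// Each array holds boneCount * 16 numbers.
function updateBoneMatrices( worldMatrices, boneInverses, boneMatrices, boneCount ) {

	for ( let i = 0; i < boneCount; i ++ ) {

		multiplyMat4( worldMatrices, i * 16, boneInverses, i * 16, boneMatrices, i * 16 );

	}

	return boneMatrices;

}
```

With 300 units and 25 bones each, this inner multiply runs 7500 times per frame, on top of the world matrix updates from point 2.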

Possible solutions to try:

I have been at it for a couple of days now, struggling to make a breakthrough on the Three.js side and by optimizing the model, but I haven’t been able to reach anything substantial. Looking further afield at external tools, there are some things I am considering:

  • InstancedSkinnedMesh - a great demo (three.js - instanced skinned mesh, on CodeSandbox) by @CodyJasonBennett. I am not sure, but it seems to allow different animation states for each instance. It additionally relieves the GPU, memory and draw calls; however, those are not the bottlenecks atm. Unfortunately, after increasing the number of instances to a couple of hundred and firing up the profiling tools, it also seems to suffer from matrix multiplication CPU bottlenecks.
  • Multiple SkinnedMeshes sharing a single Skeleton - seems to completely eliminate the issue of bone matrix multiplications, since there is only one skeleton. Unfortunately it comes at the cost of all SkinnedMeshes sharing the same animation state. One way around it would be to store a skeleton for each animation state and, when the entity needs an animation change, rebind it to another skeleton with that animation. Seems hacky but might work. It’s a little similar to combining instancing with LOD (one instanced mesh for each LOD). This is likely to be my next try.
  • Moving matrix multiplications to workers - not sure how straightforward that would be in Three.js, but at least it sounds possible. JS workers are asynchronous, which might result in some matrices not being calculated in time, but maybe it’s not that big of a deal for one frame.
  • :question: - I don’t know what else can be tried. I will be grateful for any other suggestions.

! Summarizing !

I am really hoping I can do something better and that I haven’t hit the limits of Three.js/WebGL/hardware. There have been many RTS games and battle games (Mount & Blade for example) with hundreds of animated units running smoothly, on my PC as well, so I have hopes for solving this.
I looked a bit through Unity forums, and while there are posts about rendering large numbers of animated characters, none of them really mention the things outlined in the profiler above. Could it be that game engines do these matrix multiplications on the GPU? Or is C++ so much faster than JS that it never surfaces as a problem on the CPU?

If you have any answers to these questions and/or recommendations on the details of the Three.js bottlenecks, how to remedy them, how to work around it, or improve it for this specific scenario, I will gladly hear it. General strategies for optimizations of such games are also greatly appreciated :sparkling_heart:

Thank you for your time and thank you for reading! :bowing_man:

I’m looking at the CPU matrix multiplication in my InstancedSkinnedMesh demo (_offsetMatrix.multiplyMatrices(matrix, boneInverses[i])), and I wonder if we can move it to the GPU at the cost of doubling memory. That can be optimized further if we know the types of transforms it contains (don’t need a 4x4 matrix for pure rotation or pure translation), or if, like with glTF for instance, there will be no hierarchy and we can safely remove it altogether. Another idea would be to create keyframes or similar that are merely interpreted on the GPU.

Afaik skinning is already performed on the GPU via a bone texture.

I think for the style of units you have, @DolphinIQ you might get better performance with morph targets… or a similar scheme of just storing raw vertex animation frames and tweening them in the shader?

Skinning is done on the GPU with a bone texture in all cases; that’s not the problem – (re)populating the data is, which I think is caused by a poor data structure. I think we’re all on the same page about keyframes, or however you want to call them. I’d be curious if there’s prior work on precomputing this to a texture array, VAT, or however you consume it since I believe that’s the main issue here.

@CodyJasonBennett I think moving the matrix multiplication to the GPU would be a huge win! Seems like a perfect job for the GPU, even at the cost of doubling memory. If the users don’t intend to animate bones themselves on the CPU (which is very rare), it would be a large boost to exported animations.

Update!

I tried @mjurczyk’s advice - animation throttling. That is simply not doing animation on every frame, but rather once every X frames. This technique is used by many games, Nintendo uses it, and even Roblox https://devforum.roblox.com/t/new-clientanimatorthrottling-property/1553697.

Did it work? Well, yes, but only to a degree. Unfortunately, out of the 3 performance issues, this technique solves only one of them - the first one - AnimationMixer.update(). Out of the 3, this is the only one whose update can be skipped for a couple of frames while still working as intended:

If I try to skip the Bone world matrix updates or Skeleton.update(), the character breaks. Specifically, the skinned mesh doesn’t follow its proper position, but rather only updates its mesh transform once the bones are updated. So when throttling by skipping 3 frames, it only updates on every 4th frame, resulting in it looking very choppy:


You can see that the character’s selection circle marks its real position, with the skinned mesh lagging behind it.

Seems like once SkinnedMesh is bound to the skeleton, it no longer really cares about non-bone matrices and is only interested in its bones? But maybe I am missing something here. If it should be possible, please let me know.

Unfortunately, optimizing just one of the 3 issues will not cut it, and the number of animated skinned meshes is still very limited. I will continue to look for other solutions.

I’m still going to release that one sometime soon (probably around christmas :christmas_tree:): Animated Instanced Skinned Meshes (GLTF) - #9 by Fyrestar

It doesn’t just make SkinnedMesh instances; it also handles hierarchical LOD reduction, to process less per LOD, as well as throttling the updates. The actual bottleneck with instancing and skinning is the sheer amount of matrix math going on.

I also figured I’ll extend it to move these computations off the main thread to workers. Depending on the skeleton structure (like humanoid), that amount should already be no problem. However, this optimization depends on sparse distribution: if a large, dense crowd is moving nearby, it still becomes expensive. So moving the computations into an async backend API will help with that: workers first, then I’ll look into the GPU too.

That sounds great! Do you need any help? I could provide a test skinned model with multiple LODs. I would love to help deliver this by December. That would be an awesome christmas gift to Three.js :heart_eyes: :christmas_tree:

Hide bones to save performance.

mesh.traverse( function ( child ) {
	// Bones are not rendered themselves, so mark them invisible
	if ( child.isBone ) { child.visible = false; }
} );

If the users don’t intend to animate bones themselves on the CPU (which is very rare),

Is this that rare? Any kind of physics or environment based animation like IK or ragdolling would require CPU updating. And I don’t know the details of what performing all these animations on the GPU might entail but animation blending might also pose problems.

A few other thoughts on how you could handle this, some of which have already been mentioned:

  • Move bone animation calculations over to workers and update the bone matrix texture buffer quickly to the main thread.
  • Update animations every few frames (I wonder if it’s possible to update an animation step and then interpolate over the next frames so you avoid the choppiness). You could choose to update more complex or interesting animations more frequently or based on distance / frustum culling.
  • Reuse animations across characters. They don’t all need to walk with the same animation but maybe you could have a pool of 5-10 currently animated skeletons at different states that could be randomly picked from when starting a character walking.
  • Swap out models and skeletons for lower levels of detail based on distance / frustum culling.
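
The "update based on distance / frustum culling" idea from the list above could be sketched as a simple mapping from camera distance to an update interval. The tier thresholds and the name updateIntervalFor are made-up assumptions for illustration and would need tuning per game.

```javascript
// Hypothetical mapping from camera distance to "update every N frames".
// The thresholds are made-up values, not recommendations.
const UPDATE_TIERS = [
	{ maxDistance: 20, interval: 1 }, // close: animate every frame
	{ maxDistance: 60, interval: 2 },
	{ maxDistance: 150, interval: 4 },
	{ maxDistance: Infinity, interval: 8 } // far away: animate rarely
];

function updateIntervalFor( distance, inFrustum = true ) {

	// Units outside the view frustum could skip animation entirely; 0 means "skip".
	if ( ! inFrustum ) return 0;

	for ( const tier of UPDATE_TIERS ) {

		if ( distance <= tier.maxDistance ) return tier.interval;

	}

	return 0; // only reachable when distance is NaN

}
```

A unit would then be animated only on frames where its interval divides the frame counter, with the elapsed time accumulated in between.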

Seems like once SkinnedMesh is bound to the skeleton, it no longer really cares about non-bone matrices and is only interested in its bones? But maybe I am missing something here. If it should be possible, please let me know.

Look into “bindMode” on SkinnedMesh. I’m not that familiar with the details of it but it looks like setting it is intended to enable skinned mesh root to be moved independent of the bones.

@Chaser_Code Unfortunately bone visibility does not affect performance :confused:

@gkjohnson Personally I rarely see procedural bone animations. Usually just very specific, advanced games. Could just be really subjective to my happenstances of course :smile_cat:

Update!

I also tried implementing the hot path matrix multiplication in Rust compiled to WASM, but the performance only got worse :sweat_smile: It likely needs to be in a larger scope than just replacing specific functions, but would require a lot of reorganizing of Three.js internals at which point it would probably be better to just use workers for that :construction_worker_man: :hammer_and_pick:

@gkjohnson Thanks for the advice, those sound very effective. In the end I went with animation throttling, so updating animation once every N frames. In my case 4 seems like a good value. After that it becomes too visible.
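
One possible way to organize "update once every N frames" is to stagger the units, so that each frame updates only about 1/N of them instead of all of them on the same frame, and to accumulate the elapsed time per unit so the animation speed stays correct. This is a minimal sketch under those assumptions; createAnimationThrottle and the unit.update( delta ) method are hypothetical names, and the unit list is assumed to stay fixed.

```javascript
// Hypothetical helper: updates each unit once every `interval` frames,
// staggered so roughly 1 / interval of the units are updated per frame.
function createAnimationThrottle( units, interval ) {

	let frame = 0;

	// Time accumulated since each unit's last update, so no time is lost.
	const pending = new Array( units.length ).fill( 0 );

	return function tick( delta ) {

		for ( let i = 0; i < units.length; i ++ ) {

			pending[ i ] += delta;

			// Unit i is due on frames where ( frame + i ) is a multiple of interval.
			if ( ( frame + i ) % interval === 0 ) {

				// e.g. mixer.update(), bone world matrices and skeleton update
				units[ i ].update( pending[ i ] );
				pending[ i ] = 0;

			}

		}

		frame ++;

	};

}
```

Usage would be `const tick = createAnimationThrottle( units, 4 );` once, then `tick( delta )` every frame. Staggering does not reduce the total work, but it avoids a spike on every Nth frame.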

Also thanks to your tip SkinnedMesh.bindMode = "detached" (and adding the root bone to the scene becomes mandatory with detached) it allowed me to throttle world matrix multiplications and Skeleton.update() as well as mixer.update() (eliminating the issue from my previous comment). With this, my app can handle a lot more units than before, reaching even 500 Skinned Meshes on the screen at 60fps! :partying_face: :tada: (slightly lowered on the recording due to OBS screen capture)

And here is a closer look into animation throttling visuals. The number changed in the GUI means to update animations/matrices/skeleton every N frames (meaning at 4 it will skip 3 frames, then update, then skip 3, etc)

I only start seeing a very slight difference at 3. At 12 you can fully see how the animation update rate is much lower than the overall framerate, producing choppy, Starcraft1-looking animations :space_invader: :space_invader: Still doesn’t seem too bad, but of course the biggest performance gains are in the lower range anyway.

To summarize!

The issue may not be entirely solved: the bottlenecks have simply been spread out over more frames, so they will return with a larger number of meshes. However, I’m satisfied! :heart_eyes:
I have 500 units walking around, each with their own animation state, pathfinding and collisions. I don’t need more at the moment, and if an issue arises, this thread is really a gold mine of incredible ideas :magic_wand: :sparkles: so it will be easy to pick one in the future. I’m sure others who encounter this problem will also benefit greatly from the suggestions here. Thanks to everyone posting, this community is the best! :sparkling_heart: :heartbeat: Have a great day!

Edit: If I make more progress on this topic in the future, I will post more updates here, and I encourage others to do so as well, as this thread has become quite rich with great ideas.

In the video I saw that some models are stopped, so they don’t need to update their matrices, do they? Or do they have an idle animation that I don’t notice?

Have you disabled matrixAutoUpdate and matrixWorldAutoUpdate?

Which video, which timestamp? :sweat_smile:

In general they never stop animating (not talking about throttling atm), they just change their animation state from Walk to Idle. The idle animation isn’t very noticeable, it’s just them standing in place, breathing, but it’s still an animation. The default is T-pose.

As for the details, I did something more or less like this:

// Setup inside the unit class
this.rootBone.matrixWorldAutoUpdate = false;
this.skinnedMesh.bindMode = "detached";

// Setup outside the unit class
scene.add( unit.rootBone ); // Necessary with "detached" bind mode!

// Update loop
if ( shouldUpdateAnimationDependingOnFrame ) {
    unit.mixer.update( delta * skippedFramesCount ); // Compensate for skipped frames
    unit.skinnedMesh.skeleton.needsUpdate = true;
    unit.rootBone.updateWorldMatrix( false, true );
}

I had to hack three.js internals in a few places for this to work so here are the important notes:

  1. Implement Skeleton.needsUpdate flag to indicate whether the skeleton should be updated inside WebGLRenderer.projectObject(). Very similar to many other .needsUpdate flags inside Three.js
  2. Edit the Object3D.updateMatrixWorld( force ) method (see my PR “Remove force flag and make `Object3D.matrixWorldAutoUpdate` work properly again”, Pull Request #27261 in mrdoob/three.js on GitHub), because currently (r158) Object3D.matrixWorldAutoUpdate = false doesn’t work and bone matrices would be recalculated regardless of that flag.

Sorry, answer took me a while because I was working on the PR above :construction_worker_man: :hammer_and_wrench:

Could you explain how you implemented the ‘skeleton.needsUpdate’ feature?

I developed a repository specifically for experimentation purposes. Initially, I was achieving 10 fps while handling 2,000 instances. However, through a mix of various optimization techniques, including animation throttling and the use of instanced skinned meshes, I was able to dramatically increase the performance, hitting 144 fps.

Before:

After:

You can check the repo here, which includes a guide on how to use the classes and a link to a demo.

I hope you find this repository helpful. I believe there’s still potential for further performance enhancements @CodyJasonBennett @DolphinIQ @Fyrestar . Your feedback and contributions are greatly appreciated.

Good performance on my machine. Turning off animation at a distance is a good solution; it increases fps. Sprite impostors could maybe also be used for far objects, though perhaps not if the far objects already have few triangles.

Really cool demo! Unfortunately it suffers from the same issue as Cody’s original codepen - bone world matrix calculations. On a really good gaming PC I am already down to 40fps at 500 instances due to the CPU lagging. The issue disappears with animation throttling, but that makes the animations rather choppy. I ran into this as well; unfortunately it’s only redistributing the workload over multiple frames, while the core issue remains. Throttling over X frames gives you a linear X increase.

Looking for even better scaling I ended up with reusing skeletons for many Skinned Meshes sharing the same animation state using some sort of skeleton manager:

import { AnimationClip, AnimationMixer, Bone, Skeleton } from 'three';
import * as SkeletonUtils from 'three/addons/utils/SkeletonUtils.js';

class SkeletonManager {

    constructor( model ) {

        this.sourceModel = model; // gltf.scene
        
        const shareSkinnedMesh = model.getObjectByProperty( 'isSkinnedMesh', true );
        if ( !shareSkinnedMesh ) {
            throw new Error('SkeletonManager provided model does not have a SkinnedMesh!');
        }
        /**
         * Requirements per 1 animation:
         * - 1 or more skeletons (for binding skinned meshes to it)
         * - 1 mixer (for playing the animations)
         * - 1 root bone (to be added to the scene)
         */
        // Animation names which will be used as identifiers for maps below
        /** @type { Set <string> } */ this.animations = new Set(); 
        /** @type { Object <string, AnimationMixer> } */ this.mixers = {};
        /** @type { Object <string, Skeleton> } */ this.skeletons = {};
        /** @type { Object <string, Bone> } */ this.rootBones = {};
        /** @type { Object <string, number> } */ this.animationSpeedFactors = {};
    }

    update( delta ) {

        for ( const animationName of this.animations ) {

            this.mixers[ animationName ].update( delta );
            this.skeletons[ animationName ].needsBonesUpdate = true;
        }
    }

    /**
     * Registers an animation (currently only 1 skeleton per animation)
     * @param { AnimationClip } animation 
     * @param { Float } animationSpeedFactor 
     * @param { Integer } skeletonsPerAnimationCount 
     */
    addAnimation( animation, animationSpeedFactor = 1.0, skeletonsPerAnimationCount = 1 ) {

        const clipName = animation.name.toLowerCase();

        const animationMesh = SkeletonUtils.clone( this.sourceModel );

        const skeleton = animationMesh.getObjectByProperty( 'isSkinnedMesh', true ).skeleton;
        const rootBone = skeleton.bones[ 0 ];
        const mixer = new AnimationMixer( rootBone );
        mixer.clipAction( animation ).play();
        mixer.timeScale = animationSpeedFactor;

        this.animations.add( clipName );
        this.animationSpeedFactors[ clipName ] = animationSpeedFactor;
        this.mixers[ clipName ] = mixer;
        this.skeletons[ clipName ] = skeleton;
        this.rootBones[ clipName ] = rootBone;

        return rootBone; // return root bone which needs to be added to the scene
    }

    requestSkeleton( animationName ) {

        const lowercase = animationName.toLowerCase();
        if ( this.animations.has( lowercase ) === false ) {
            throw new Error( 'SkeletonManager.requestSkeleton() - SkeletonManager does not possess any animations of provided animationName!' );
        }

        return this.skeletons[ lowercase ];
    }
}

export { SkeletonManager };

And usage is like so:

// General setup 
const unitSkeletonManager = new SkeletonManager( gltf.scene );
for ( const animation of gltf.animations ) { // Register all gltf animations
      // Let's assume 'gltf.animations' contains an AnimationClip called "walk"
      const rootBone = unitSkeletonManager.addAnimation( animation, 1, 1 );
      scene.add( rootBone ); // need to add the root bones in case of "detached"
}

// Unit specific setup. Do for each unit sharing the same model & animations
unit.skinnedMesh.bindMode = "detached";

// And then whenever one of those units changes animation state, we simply call:
const skeleton = unitSkeletonManager.requestSkeleton( "walk" );
unit.skinnedMesh.bind( skeleton, _someIdentityMat4 );
// THAT'S ALL!

With this configuration I can get 1000 animated skinned meshes on the screen at the same time at 60fps! :partying_face:
And that is without animation throttling and on a low-end old laptop! :older_man: :computer: :tada:
The animation transitions actually aren’t visible at all, unless you’re really close and looking for them :eyes:

I went with this because I am making a top-down game, where distance throttling isn’t very useful, since all characters are at roughly the same distance from the camera. For a first person game you could maybe use this for distant characters and normal skinning for close ones. Implementing multiple skeletons per animation and then choosing the one closest to the animation start should also help; that’s one of the things on my list. Alternatively, for closeup first person games @luisherasme’s solution should also work very well, if not better.
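
The "multiple skeletons per animation, pick the one closest to the animation start" idea could look roughly like this. It is only a sketch: the pool entry shape ( skeleton, localTime, duration ) and the function name are hypothetical, and it assumes looping clips with a non-negative localTime in seconds.

```javascript
// Hypothetical pool entry: { skeleton, localTime, duration }, where localTime is
// the current playback time of that skeleton's looping clip.
function pickSkeletonClosestToStart( pool ) {

	let best = null;
	let bestDistance = Infinity;

	for ( const entry of pool ) {

		// Normalized position within the loop, in [ 0, 1 ).
		const phase = ( entry.localTime % entry.duration ) / entry.duration;

		// Circular distance to the loop start: phase 0.95 counts as close as 0.05.
		const distance = Math.min( phase, 1 - phase );

		if ( distance < bestDistance ) {

			best = entry;
			bestDistance = distance;

		}

	}

	return best; // null for an empty pool

}
```

The returned entry's skeleton would then be the one passed to skinnedMesh.bind(). Treating "just before the loop restarts" as close to the start is a design choice; a stricter version could use the phase alone.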

After sharing skeletons, the bottleneck moves over to rendering thousands of skinned meshes. That’s a lot of draw calls for WebGL2, and I also believe Skinned Meshes are a bit more expensive to render than normal ones.
This could be further improved upon with instancing and changing bone attributes directly per instance. This way it could likely reach 10k and more characters since the skeleton costs are now just O(1) instead of O(n) like with throttling.
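
To make the O(1) versus O(n) comparison concrete, here is a rough count of bone matrix updates per frame for the strategies discussed in this thread. It is a back-of-the-envelope model only (the function name and parameters are made up) and it ignores draw calls and mixer costs.

```javascript
// Rough count of bone matrix updates per frame for each strategy.
// This only models how the cost scales; it is not a profiler.
function boneUpdatesPerFrame( { units, bonesPerSkeleton, throttleInterval = 1, sharedSkeletons = 0 } ) {

	if ( sharedSkeletons > 0 ) {

		// Shared skeletons: cost depends on the number of skeletons, not units.
		return sharedSkeletons * bonesPerSkeleton;

	}

	// One skeleton per unit, optionally updated only every `throttleInterval` frames.
	return ( units * bonesPerSkeleton ) / throttleInterval;

}
```

For example, 300 units with 25 bones give 7500 updates per frame, throttling every 4 frames averages 1875, and 5 shared skeletons cost 125 regardless of whether there are 300 or 10000 units.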

But first I have to optimize my Quadtree implementation and maybe do offscreen canvas threading :sweat_smile:

@luisherasme

Could you explain how you implemented the ‘skeleton.needsUpdate’ feature?

Look in WebGLObjects.js, at the end of the update() method:

if ( object.isSkinnedMesh ) {

	const skeleton = object.skeleton;

	// if ( updateMap.get( skeleton ) !== frame ) { // change this line with the one below
	if ( skeleton.needsBonesUpdate && updateMap.get( skeleton ) !== frame ) {

		skeleton.update();
		updateMap.set( skeleton, frame );

		skeleton.needsBonesUpdate = false; // Add this to reset the update flag
	}
}

Then just manually set .needsBonesUpdate = true; on your Skeleton whenever you need it to update.
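
For anyone who would rather not edit the library source, a similar effect might be achievable by wrapping update() on each skeleton instance. This is an untested-against-Three.js sketch: it assumes the renderer only reaches the skeleton through skeleton.update(), as in the snippet above, and gateSkeletonUpdate is a hypothetical helper name.

```javascript
// Wraps an object's update() so it only runs when `needsBonesUpdate` is set.
// Works on any object with an update() method; a Skeleton is the intended target.
function gateSkeletonUpdate( skeleton ) {

	const originalUpdate = skeleton.update;

	skeleton.needsBonesUpdate = true; // make sure the first call updates

	skeleton.update = function () {

		if ( this.needsBonesUpdate !== true ) return;

		originalUpdate.call( this );
		this.needsBonesUpdate = false; // reset until flagged again

	};

	return skeleton;

}
```

The wrapper lives on the instance, so a skeleton created later (for example by cloning) would need to be gated separately.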

Hey, can you try again? I just updated the demo to use LOD for the animations. It’s much faster on my machine; the bottleneck is now on the GPU.

https://instanced-animation.vercel.app/

I feel I’m left off-board. 2000 skinned meshes, instancing turned on, I barely reach 5 fps in Firefox and 10 fps in Chrome.

Hey, can you run the Chrome profiler and show me the results? Before I added the LOD for the animations, I tested this demo on a tablet and found that the bottleneck was on the GPU. However, on my desktop, the bottleneck was in the CPU.

Summary:


Call tree of a small fragment: