Greetings everyone!
First off, this is going to be a bit of a long post, not about any specific error, but rather about optimization strategies for applications with a lot of animated characters. I’ll provide a lot of information and maybe turn this into a bit of a case study. You don’t have to read all the details if you don’t want to; the intro is enough. I would appreciate any advice on optimizing such scenes and where to go from here, whether it’s general or detailed.
Intro
I am working on a game with a lot of animated units (up to a maximum of 500 visible at the same time), created with SkeletonUtils.clone() on a SkinnedMesh, each having its own animation state and its own AnimationMixer. I thought I would need to work on instancing, but to my surprise, the bottleneck turned out to be CPU calculations (not rendering), and I hit it much sooner than I expected.
The game is a top-down, RTS-style view, and meshes are only ever seen from a great distance.
This allows me to aggressively optimize the skinned model as much as possible:
- 700 vertices
- 900 triangles
- 128x128 textures
- 25 bones (lowest possible skeleton LOD in Mixamo)
However, even with this, the frame rate starts dropping below 60 FPS once the number of skinned meshes exceeds 200 (no shadow casting).
General performance details:
Carefully monitoring the performance with 300 units (no shadows) on an older gaming laptop with an Intel(R) Core™ i7-4720HQ CPU @ 2.60 GHz and 8 GB RAM, this is what I get in a single frame:
Note this is all just Three.js internal stuff. I have disabled all of my game logic.
Aside from just rendering, draw calls, etc, I noticed 3 main culprits which take up 75% of the scene CPU work (listed in the order of left to right on the profiler):
- AnimationMixer.update() of all the skinned meshes
- updateMatrixWorld() of all the meshes and, most importantly, their bones
- projectObject() on all the skinned meshes and their bones
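For anyone who wants numbers rather than a profiler screenshot, here is a minimal sketch of accumulating the time spent in a single method. `instrument` and `fakeMixer` are hypothetical names made up for this example, not three.js APIs. The same wrapper could be applied to a real mixer's `update` or to the scene's `updateMatrixWorld`. As far as I can tell, `projectObject()` is internal to the renderer, so from the outside it can only be measured as part of `renderer.render()`.

```javascript
// Hypothetical helper (not a three.js API): wraps one method of an object and
// accumulates the wall-clock time spent inside it, in milliseconds.
function instrument(target, methodName, totals) {
  const original = target[methodName];
  totals[methodName] = 0;
  target[methodName] = function (...args) {
    const start = performance.now();
    try {
      return original.apply(this, args);
    } finally {
      totals[methodName] += performance.now() - start;
    }
  };
  // Returns a function that puts the original method back
  return () => { target[methodName] = original; };
}

// Stand-in for an AnimationMixer; a real one would be wrapped the same way.
const fakeMixer = {
  time: 0,
  update(deltaSeconds) { this.time += deltaSeconds; return this; },
};

const totals = {};
const restore = instrument(fakeMixer, 'update', totals);
for (let frame = 0; frame < 3; frame++) fakeMixer.update(1 / 60);
restore();
console.log('ms spent in update():', totals.update);
```

Resetting `totals` at the start of each frame would give per-frame figures comparable to one slice of the profiler view.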
Let’s look at each one by one:
1. Animation mixers
It’s mostly PropertyMixer.apply() calls, which seem to split roughly 50/50 between Euler.setFromRotationMatrix() and AnimationAction._update(). In the case of the latter, all the time is spent in the loop over interpolants at the end:
for ( let j = 0, m = interpolants.length; j !== m; ++ j ) {
    interpolants[ j ].evaluate( clipTime );
    propertyMixers[ j ].accumulate( accuIndex, weight );
}
I tried optimizing my animations as much as possible, getting rid of scale tracks and position tracks and leaving only rotation interpolation. I also decreased the density of keyframes. Unfortunately this still seems to be a lot to process.
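The track stripping described above can be sketched as follows. This is a minimal example, assuming the clip's tracks are named in the usual `boneName.property` form; `keepRotationTracks`, `demoClip` and the bone names are made up for illustration. Dropping the root position track also removes any hip movement baked into the animation, so whether that is acceptable depends on the clip.

```javascript
// Hypothetical helper, not part of three.js. It relies only on the shape of an
// AnimationClip: a `tracks` array whose entries have a `name` such as
// "mixamorigHips.position" or "mixamorigSpine.quaternion".
function keepRotationTracks(clip) {
  const before = clip.tracks.length;
  // Keep only rotation tracks; position and scale tracks are dropped
  clip.tracks = clip.tracks.filter((track) => track.name.endsWith('.quaternion'));
  return { before, after: clip.tracks.length };
}

// Stand-in clip for illustration; a real one would come from a loaded model.
const demoClip = {
  name: 'walk',
  tracks: [
    { name: 'mixamorigHips.position' },
    { name: 'mixamorigHips.quaternion' },
    { name: 'mixamorigHips.scale' },
    { name: 'mixamorigSpine.quaternion' },
  ],
};

const trackCounts = keepRotationTracks(demoClip);
console.log(trackCounts); // { before: 4, after: 2 }
```

Each remaining track still costs one interpolant evaluation and one accumulate per mixer per frame in the loop quoted above, so the per-frame work scales with units x tracks.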
2. updateMatrixWorld() - Matrices local & world updates
On every frame Three.js traverses the scene and performs updateMatrixWorld() on all objects.
Not much to ponder here. 25 bones per skeleton is pretty low by game standards (around 30 is recommended for mobile), but that’s still 300 x 25 = 7500 nested Object3Ds that need their matrices to be calculated each frame for animation. I tried:
- flattening the skeleton bone hierarchy, but saw no noticeable perf improvement
- different matrix multiplication algorithms instead of the default one. After many, many tries at optimizing it (avoiding memory allocation inside the function, using the gl-matrix.js code, or other multiplication algorithms), there was no noticeable performance gain.
But hey, at least it means Three.js is pretty optimized in this hot path, so good job to the maintainers I suppose.
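To make the 300 x 25 figure concrete, here is a simplified sketch of the per-node work: each node's world matrix is its parent's world matrix multiplied by its own local matrix. This is not the three.js source. The node shape, `multiply4x4`, `updateWorld` and the straight 25-bone chain are assumptions made for illustration (a real skeleton is a tree, and the library also recomposes each local matrix from position, quaternion and scale first).

```javascript
// Matrices are 16-element column-major arrays, the layout Matrix4.elements uses.
// `out` must be a different array from `a` and `b`.
function multiply4x4(a, b, out) {
  for (let col = 0; col < 4; col++) {
    for (let row = 0; row < 4; row++) {
      let sum = 0;
      for (let k = 0; k < 4; k++) sum += a[k * 4 + row] * b[col * 4 + k];
      out[col * 4 + row] = sum;
    }
  }
  return out;
}

function translation(x, y, z) {
  const m = new Float32Array(16);
  m[0] = m[5] = m[10] = m[15] = 1;
  m[12] = x; m[13] = y; m[14] = z;
  return m;
}

function makeNode(local) {
  return { local, world: new Float32Array(16), children: [] };
}

// world = parentWorld * local, applied top-down to every node
function updateWorld(node, parentWorld, counter) {
  if (parentWorld === null) node.world.set(node.local);
  else multiply4x4(parentWorld, node.local, node.world);
  counter.count++;
  for (const child of node.children) updateWorld(child, node.world, counter);
}

// Illustrative skeleton: a straight chain of bones, each offset by 1 on Y
function makeBoneChain(boneCount) {
  const root = makeNode(translation(0, 1, 0));
  let tip = root;
  for (let i = 1; i < boneCount; i++) {
    const bone = makeNode(translation(0, 1, 0));
    tip.children.push(bone);
    tip = bone;
  }
  return { root, tip };
}

const counter = { count: 0 };
const units = [];
for (let i = 0; i < 300; i++) units.push(makeBoneChain(25));
for (const unit of units) updateWorld(unit.root, null, counter);

console.log(counter.count);          // 7500
console.log(units[0].tip.world[13]); // 25
```

Because every bone depends on its parent's result, flattening or reordering the hierarchy does not remove any of these multiplications, which is consistent with the experiments above showing no gain.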
3. projectObject()
On every frame Three.js traverses the scene and performs projectObject() on its objects.
Since bones aren’t renderable, I added a quick-return check at the start of projectObject():
if ( object.visible === false || object.isBone ) return;
I may be wrong, but I believe I noticed at least a small improvement due to this. Unfortunately it’s not significant enough, as the majority of the time seems to be spent in WebGLObjects.update(), which calls Skeleton.update(). It’s a function which just loops over all bones and calculates offset matrices. I tried:
- making it a bit more efficient, but not much gain was seen. Most of it is just matrix multiplication (going back to point 2.)
- simplifying the model skeleton to the lowest possible LOD of 25 bones (no hand/face bones, etc). Any further would be very difficult and likely produce visual artifacts.
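The bone loop mentioned above can be sketched roughly like this: for every bone, the offset matrix is the bone's current world matrix multiplied by its inverse bind matrix, written into one flat array that is later uploaded to the GPU. This is a simplified illustration, not the library code; `computeBoneMatrices` and `translation` are hypothetical names, and matrices are assumed to be column-major 16-element arrays.

```javascript
// offset[i] = boneWorld[i] * boneInverse[i], packed into one flat Float32Array
function computeBoneMatrices(boneWorlds, boneInverses, out) {
  for (let i = 0; i < boneWorlds.length; i++) {
    const w = boneWorlds[i];
    const inv = boneInverses[i];
    const base = i * 16;
    for (let col = 0; col < 4; col++) {
      for (let row = 0; row < 4; row++) {
        out[base + col * 4 + row] =
          w[row] * inv[col * 4] +
          w[4 + row] * inv[col * 4 + 1] +
          w[8 + row] * inv[col * 4 + 2] +
          w[12 + row] * inv[col * 4 + 3];
      }
    }
  }
  return out;
}

function translation(x, y, z) {
  const m = new Float32Array(16);
  m[0] = m[5] = m[10] = m[15] = 1;
  m[12] = x; m[13] = y; m[14] = z;
  return m;
}

// One bone whose bind pose sits at y = 2, so its inverse bind matrix is y = -2
const boneInverses = [translation(0, -2, 0)];
const boneMatrices = new Float32Array(16);

// In the bind pose the offset matrix is the identity (no deformation)
computeBoneMatrices([translation(0, 2, 0)], boneInverses, boneMatrices);
console.log(boneMatrices[13]); // 0

// After the animation lifts the bone to y = 3, vertices move up by 1
computeBoneMatrices([translation(0, 3, 0)], boneInverses, boneMatrices);
console.log(boneMatrices[13]); // 1
```

With one skeleton per unit this is another 300 x 25 = 7500 matrix multiplications per frame on top of the world-matrix update, which matches the observation that the cost leads back to point 2.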
Possible solutions to try:
I have been at it for a couple of days now, trying to break through on the Three.js side and optimizing the model, but haven’t been able to reach anything substantial. Looking further afield, there are some things I am considering:
- InstancedSkinnedMesh - there is a great CodeSandbox demo, "three.js - instanced skinned mesh", by @CodyJasonBennett. I am not sure, but it seems to allow different animation states for each instance. It additionally relieves the GPU, memory and draw calls, however those are not the bottlenecks atm. Unfortunately, after increasing the number of instances to a couple of hundred and firing up the profiling tools, it also seems to suffer from the matrix multiplication CPU bottleneck.
- Multiple SkinnedMeshes sharing a single Skeleton - seems to completely eliminate the issue of bone matrix multiplications, since there is only one skeleton. Unfortunately it comes at the cost of all SkinnedMeshes sharing the same animation state. One way around it would be to store a skeleton for each animation state and, when an entity needs an animation change, rebind it to another skeleton with that animation. Seems hacky but might work. It’s a little similar to combining instancing with LOD (one instanced mesh for each LOD). This is likely to be my next try.
- Moving matrix multiplications to workers - not sure how straightforward that would be in Three.js, but it at least sounds possible. JS workers are asynchronous, which might result in some matrices not being calculated in time, but maybe it’s not that big of a deal for one frame.
- I don’t know what else can be tried. I will be grateful for any other suggestions.
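The "one skeleton per animation state" idea from the list above could be organized along these lines. This is only a sketch under assumptions: `SharedSkeletonPool` and the `createSkeleton` factory are hypothetical, the meshes are stand-ins, and the only real three.js call assumed is `SkinnedMesh.bind( skeleton, bindMatrix )`. Details such as how each mesh is positioned relative to the shared bones are left out and would need checking in a real scene.

```javascript
// Hypothetical helper: lazily creates one skeleton per animation state and
// rebinds meshes to it, so bone matrices are computed per state, not per unit.
class SharedSkeletonPool {
  constructor(createSkeleton) {
    this.createSkeleton = createSkeleton; // assumed factory supplied by the game
    this.byState = new Map();
  }

  get(state) {
    let skeleton = this.byState.get(state);
    if (skeleton === undefined) {
      skeleton = this.createSkeleton(state);
      this.byState.set(state, skeleton);
    }
    return skeleton;
  }

  setState(mesh, state) {
    const skeleton = this.get(state);
    // Rebind only when the state actually changes
    if (mesh.skeleton !== skeleton) mesh.bind(skeleton, mesh.bindMatrix);
    return skeleton;
  }
}

// Stand-ins for illustration; real code would pass SkinnedMesh instances and
// a factory that builds a Skeleton driven by its own AnimationMixer.
const states = ['idle', 'walk', 'attack'];
let skeletonsCreated = 0;
const pool = new SharedSkeletonPool((state) => ({ id: ++skeletonsCreated, state }));

const meshes = [];
for (let i = 0; i < 500; i++) {
  const mesh = {
    skeleton: null,
    bindMatrix: null,
    bind(skeleton) { this.skeleton = skeleton; },
  };
  pool.setState(mesh, states[i % states.length]);
  meshes.push(mesh);
}

console.log(skeletonsCreated); // 3
```

The trade-off is the one already noted: every unit in the same state plays the animation in lockstep. Keeping a few phase-shifted skeletons per state would be one possible way to hide that, at the cost of more skeletons to update.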
Summarizing
I am really hoping I can do something better and that I haven’t hit the limits of Three.js/WebGL/hardware. There have been many RTS games and battle games (Mount & Blade for example) with hundreds of animated units running smoothly, on my PC as well, so I have hopes of solving this.
I looked a bit through Unity forums, and while there are posts about rendering large numbers of animated characters, none of them really mention things outlined in the profiler above. Could it be game engines do these matrix multiplications on the GPU? Or is C++ that much superior to JS that it never surfaces as a problem on the CPU?
If you have any answers to these questions and/or recommendations on the details of the Three.js bottlenecks, how to remedy them, how to work around them, or how to improve things for this specific scenario, I will gladly hear them. General strategies for optimizing such games are also greatly appreciated.
Thank you for your time and thank you for reading!