The speed isn’t really satisfactory yet and my client still finds it a bit slow.
Is there a faster or more efficient way to build something like this? Right now I’m doing wrist tracking and attaching the 3D model manually, but I’m wondering if there are better approaches like using WebXR anchors, AR frameworks, or different tracking methods that could improve performance or stability.
Yes, I know TensorFlow. I've used MediaPipe directly, and since TensorFlow uses MediaPipe under the hood, it just adds a layer on top of it, which makes it even slower. I even tried compiling the C# translation to WebAssembly, but it still wasn't fast enough.
In TensorFlow there's no built-in wrist detection, and combining full-body and hand tracking adds even more load on mobile devices; TensorFlow itself is also quite heavy.
I seem to recall from the last time I implemented virtual try-on that it was easy to flood the model with inference requests and have them back up and slow everything down. Perhaps check that you're not requesting inference on every frame? (I could be misremembering this...)
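One way to avoid flooding the model is to drop frames while an inference is still in flight, rather than queuing them. A minimal sketch; `runInference` here is a hypothetical stand-in for whatever async call your model exposes (e.g. a MediaPipe `send()`):

```javascript
// Gate inference so at most one request is in flight at a time.
// Frames that arrive while the model is busy are dropped, not queued,
// so they can never back up behind a slow inference.
function makeThrottledInference(runInference) {
  let busy = false;
  let dropped = 0;
  return {
    async onFrame(frame) {
      if (busy) {
        dropped++; // drop this frame instead of queuing it
        return null;
      }
      busy = true;
      try {
        return await runInference(frame);
      } finally {
        busy = false; // ready for the next fresh frame
      }
    },
    get droppedFrames() {
      return dropped;
    },
  };
}
```

Calling `onFrame` from your `requestAnimationFrame` loop then automatically adapts to however fast the device can actually run the model.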
Part of getting it to run smoothly for me also required filtering the inference data: I was getting a lot of jitter when calling it at high rates, so I ended up applying a kind of running-average filter to reduce the noise. I think I started out with a Kalman filter, but then went with a simpler running average.
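The running-average idea above can be sketched as an exponential moving average applied per landmark coordinate. This is an assumption about the data shape (MediaPipe-style `{x, y, z}` landmark arrays), not the poster's exact filter:

```javascript
// Exponential moving average smoother for landmark arrays.
// alpha close to 1 tracks quickly but keeps jitter;
// alpha close to 0 is very smooth but lags behind fast motion.
function makeLandmarkSmoother(alpha = 0.4) {
  let prev = null; // previous smoothed landmarks
  return function smooth(landmarks) {
    if (prev === null) {
      prev = landmarks.map((p) => ({ ...p })); // first frame: pass through
      return prev;
    }
    prev = landmarks.map((p, i) => ({
      x: alpha * p.x + (1 - alpha) * prev[i].x,
      y: alpha * p.y + (1 - alpha) * prev[i].y,
      z: alpha * p.z + (1 - alpha) * prev[i].z,
    }));
    return prev;
  };
}
```

Feeding the smoothed wrist position to the 3D model attachment (instead of the raw inference output) is what kills most of the visible jitter; tune `alpha` per device.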
For a JS MediaPipe body-tracking project, I discovered that on some devices running the inference on the HTML video element was significantly slower than drawing the video feed onto an HTML canvas and running the inference on the canvas.
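A sketch of that canvas detour, assuming your model accepts a canvas as input (MediaPipe's JS solutions generally do); drawing also gives you a cheap place to downscale the frame, which helps on mobile. The `maxW` cap and `scaledSize` helper are my own additions, not from the original posts:

```javascript
// Pure helper: cap the frame width at maxW, preserving aspect ratio.
function scaledSize(videoW, videoH, maxW) {
  const scale = Math.min(1, maxW / videoW);
  return { w: Math.round(videoW * scale), h: Math.round(videoH * scale) };
}

// Draw the current <video> frame onto a canvas (optionally downscaled),
// then pass the canvas to the model instead of the video element.
function drawFrameToCanvas(video, canvas, maxW = 640) {
  const { w, h } = scaledSize(video.videoWidth, video.videoHeight, maxW);
  canvas.width = w;
  canvas.height = h;
  canvas.getContext('2d').drawImage(video, 0, 0, w, h);
  return canvas; // feed this to the inference call
}
```

Downscaling before inference trades a little landmark precision for a noticeably cheaper model input on low-end phones.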