"Codec avatars"-like faces with Gaussian splats and tasks-vision

I recently saw this incredible interview where the faces of the two people are recreated in VR:

I immediately wondered how this could be implemented using three.js/web technologies:

  1. Using Gaussian splats, couldn’t we recreate a mesh of our face captured with the webcam?

  2. Then, using the tasks-vision FaceLandmarker, couldn’t we apply blendshapes to it?
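To make step 2 a bit more concrete, here is a minimal sketch of how blendshape scores could drive a mesh's morph targets. It assumes the FaceLandmarker result exposes `faceBlendshapes[0].categories` entries with `categoryName` and `score`, and that the mesh exposes three.js-style `morphTargetDictionary` and `morphTargetInfluences`. The `BLENDSHAPE_TO_TARGET` table and the `applyBlendshapes` helper are hypothetical names for illustration; the real target names depend on the model you load.

```typescript
// One entry of FaceLandmarker's faceBlendshapes[0].categories (only the fields used here).
interface BlendshapeCategory {
  categoryName: string;
  score: number;
}

// Hypothetical mapping from blendshape names to the mesh's morph target names.
// The right-hand names are assumptions and must match whatever the model defines.
const BLENDSHAPE_TO_TARGET: Record<string, string> = {
  jawOpen: "jawOpen",
  eyeBlinkLeft: "eyeBlink_L",
  eyeBlinkRight: "eyeBlink_R",
};

// Writes each known blendshape score into the matching morph target slot.
// Unknown blendshapes and targets missing from the mesh are skipped.
function applyBlendshapes(
  categories: BlendshapeCategory[],
  morphTargetDictionary: Record<string, number>,
  morphTargetInfluences: number[],
): void {
  for (const { categoryName, score } of categories) {
    const targetName = BLENDSHAPE_TO_TARGET[categoryName];
    if (targetName === undefined) continue;
    const index = morphTargetDictionary[targetName];
    if (index === undefined) continue;
    // Clamp to [0, 1], the usual range for morph target weights.
    morphTargetInfluences[index] = Math.min(1, Math.max(0, score));
  }
}
```

In a render loop this would be called once per detection result, passing `mesh.morphTargetDictionary` and `mesh.morphTargetInfluences` of the face mesh.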

The closest demo we have so far is, I guess, the recent three.js example from @mrdoob:

It uses facecap.glb, which I suppose contains specific morph targets (the shapes that morphTargetInfluences weight) that we would probably have to re-create dynamically from the captured mesh.

What do you think?

→ I’m interested if you know more about that facecap.glb model and the “influences” it embeds.

EDIT: I also found this great “morphable” face kit, which could maybe be used as an alternative to the facecap model.


I don’t think they’re using morph targets, or some kind of “rigged” face. I think they are doing mesh streaming over a high bandwidth connection.

:open_mouth: Surely streaming the weights in [0, 1] of, say, 20 morph targets would be more efficient/lightweight than streaming the complete vertex position data for each model (potentially tens of thousands of data points per update), right? If not, that’s staggering!
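A back-of-the-envelope comparison makes the gap concrete. This is only a sketch of the uncompressed arithmetic: the 20 weights come from the post above, while 30,000 vertices, 32-bit floats, and 60 updates per second are assumptions picked for illustration.

```typescript
// Raw (uncompressed) data rate for a stream of fixed-size numeric updates.
function bytesPerSecond(
  valuesPerUpdate: number,
  bytesPerValue: number,
  updatesPerSecond: number,
): number {
  return valuesPerUpdate * bytesPerValue * updatesPerSecond;
}

const UPDATES_PER_SECOND = 60; // assumed update rate
const FLOAT32_BYTES = 4;

// 20 morph target weights per update.
const morphRate = bytesPerSecond(20, FLOAT32_BYTES, UPDATES_PER_SECOND);

// Assumed 30,000 vertices, each with x, y, z.
const meshRate = bytesPerSecond(30_000 * 3, FLOAT32_BYTES, UPDATES_PER_SECOND);

console.log(`morph weights: ${morphRate} B/s`); // 4800 B/s
console.log(`full mesh:     ${meshRate} B/s`); // 21600000 B/s
console.log(`ratio:         ${meshRate / morphRate}x`); // 4500x
```

Under these assumptions the full mesh needs about 21.6 MB/s against 4.8 KB/s for the weights, before any compression on either side.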

I’m just speculating. But things like… the eye movement… mouth interiors… tongue… wrinkles… it all looks too precise to me to be canned. But this isn’t new technology either… The film industry has been using similar facial mocap for a while now, and this looks like an extension of that.

A 4K MP4 streams 8,294,400 RGB pixels per frame at 60 Hz, so it’s not unfathomable to stream meshes at similar rates. It comes down to bandwidth, codecs, and capture hardware.
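For reference, the pixel count and the raw data rate behind it can be worked out as below. The sketch assumes 3840×2160 frames with 8-bit RGB (3 bytes per pixel) and no compression; an actual MP4 stream is compressed by the codec and is far smaller on the wire.

```typescript
const WIDTH = 3840;
const HEIGHT = 2160;
const BYTES_PER_PIXEL = 3; // assumed 8-bit R, G, B
const FRAMES_PER_SECOND = 60;

const pixelsPerFrame = WIDTH * HEIGHT;
const rawBytesPerFrame = pixelsPerFrame * BYTES_PER_PIXEL;
const rawBytesPerSecond = rawBytesPerFrame * FRAMES_PER_SECOND;
const rawGigabitsPerSecond = (rawBytesPerSecond * 8) / 1e9;

console.log(pixelsPerFrame); // 8294400
console.log(rawBytesPerFrame); // 24883200
console.log(rawBytesPerSecond); // 1492992000
console.log(rawGigabitsPerSecond.toFixed(2)); // "11.94"
```

So the decoded picture is roughly 1.5 GB/s of pixel data, and it is the codec that brings this down to something a network can carry.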

I would speculate that the initial “capture” process they allude to in the video is capturing diffuse/normal/roughness maps of the face in a default pose, and then at runtime the hardware is capturing a depth map, streaming that to the clients, and then reconstructing 3D from there…
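To illustrate what “reconstructing 3D” from a depth map could look like on the client, here is a minimal sketch that unprojects a depth image into 3D points with a standard pinhole camera model. Nothing here is known about the actual system in the video; the `unprojectDepth` helper and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions for illustration.

```typescript
// Unprojects a row-major depth image into packed [x, y, z, x, y, z, ...] points
// using a pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
// Pixels with non-positive depth are treated as invalid and skipped.
function unprojectDepth(
  depth: ArrayLike<number>,
  width: number,
  height: number,
  fx: number,
  fy: number,
  cx: number,
  cy: number,
): number[] {
  const points: number[] = [];
  for (let v = 0; v < height; v++) {
    for (let u = 0; u < width; u++) {
      const z = depth[v * width + u];
      if (!(z > 0)) continue; // skips 0, negatives, and NaN
      points.push(((u - cx) * z) / fx, ((v - cy) * z) / fy, z);
    }
  }
  return points;
}
```

The resulting array could be fed into a three.js `BufferGeometry` position attribute and textured with the maps from the initial capture, but that part is equally speculative.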
