Before searching for solutions, you should clarify your workflow: how are those images generated in the first place, and what does three.js have to do with them?
One way to tie the two together is camera solving: estimate the intrinsic and extrinsic camera parameters so that a three.js camera-scene combo actually lines up with those images/masks. If you happen to know the parameters in advance, simply apply them.
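As a rough illustration of applying known intrinsics, here is a minimal sketch that converts a pinhole focal length in pixels into the vertical field of view and aspect ratio that a three.js `PerspectiveCamera` expects. The function name `intrinsicsToPerspective` and its parameter names are made up for this example, and it assumes square pixels, a principal point at the image center, and no lens distortion.

```javascript
// Hypothetical helper: pinhole intrinsics (in pixels) -> PerspectiveCamera parameters.
// Assumes square pixels, principal point at the image center, no distortion.
function intrinsicsToPerspective({ fy, width, height }) {
  // Vertical FOV of a pinhole camera: 2 * atan(h / (2 * fy)), converted to degrees.
  const fov = (2 * Math.atan(height / (2 * fy)) * 180) / Math.PI;
  const aspect = width / height;
  return { fov, aspect };
}
```

The result can be passed to `new THREE.PerspectiveCamera(fov, aspect, near, far)`. The extrinsics still have to be applied separately, and many solvers use a convention where the camera looks along +Z with Y pointing down, whereas a three.js camera looks along -Z with Y pointing up, so an axis flip is usually needed.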
If that is your plan, then once the images and the three.js camera-scene combo are in place, you could try something like:
- masking objects in a postprocessing pass
- masking objects by making their materials transparent
- the stencil buffer, maybe
- CSG, though that may be overkill
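To make the stencil option a bit more concrete, here is a minimal sketch of a material pair: one that only writes a reference value into the stencil buffer, and one that draws only where that value is present. The helper name `makeStencilMaskMaterials` is hypothetical, and `THREE` is passed in as an argument so the snippet does not assume any particular way of loading three.js. Whether this fits depends on how your masks are represented.

```javascript
// Hypothetical helper: builds a mask-writer material and a masked material.
// `THREE` is the three.js namespace, passed in by the caller.
function makeStencilMaskMaterials(THREE, ref = 1) {
  // Draws nothing visible; only stamps `ref` into the stencil buffer.
  const maskMaterial = new THREE.MeshBasicMaterial({
    colorWrite: false,
    depthWrite: false,
    stencilWrite: true,
    stencilRef: ref,
    stencilFunc: THREE.AlwaysStencilFunc,
    stencilZPass: THREE.ReplaceStencilOp,
  });
  // Draws only on pixels where the stencil buffer equals `ref`.
  const maskedMaterial = new THREE.MeshBasicMaterial({
    stencilWrite: true,
    stencilRef: ref,
    stencilFunc: THREE.EqualStencilFunc,
  });
  return { maskMaterial, maskedMaterial };
}
```

The mask mesh has to be drawn before the masked one, for example by giving it a lower `renderOrder`. Depending on the three.js release, the renderer may also need to be created with `stencil: true` for the stencil buffer to exist.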