Best way to do Instanced Mesh picking in 2024?

Hey folks!

I am working on a project where I am rendering thousands to tens of thousands of images simultaneously using an instanced mesh. The instanced mesh has a vertex shader on it that effectively moves all of the associated images around in world space for animations.

I want to implement object picking (both single-click mouse raycasting and group selection using a lasso or rectangle select) on this setup. There are a few possible ways to do this, but I’m not sure what is ideal given advancements in web graphics.

The naive approach is to calculate the picking logic in JavaScript instead of doing GPU picking. I could create a second picking geometry with a bunch of planes, update that geometry to match the shader behavior, and then do hit testing with the mouse cursor. This would work, but it would be super slow and would probably eliminate most of the benefit of using an instanced mesh in the first place. Also, maintaining two sources of truth for the shader transforms seems bad from an engineering perspective.

A slightly different approach is to do color-based picking. The basic idea here is to have an off-screen pixel buffer where the individual pixel color values correspond 1:1 to unique ids for instances in the visible instanced mesh. That off-screen buffer could be associated with its own scene that is updated with the same shader logic as the instanced mesh. This is essentially the approach used here: three.js/examples/webgl_interactive_cubes_gpu.html at master · mrdoob/three.js · GitHub. The upshot of this approach is that it should be relatively performant (everything is happening on the GPU). The downside is that it won’t work for group/lasso selecting instances that are occluded by other instances (is there a modification to the above three.js example that supports group selection? :thinking:). Further, when I tried implementing this the first time around, I ran into weird GPU synchronization issues – I’m not really sure how best to describe the issue, but I was seeing up to 25ms per frame spent just waiting on the GPU.
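For context, the core of the color-picking trick is just a reversible mapping between instance ids and RGB values written into that off-screen buffer. A minimal sketch (function names are mine, not from the three.js example):

```javascript
// Pack a 24-bit instance id into an RGB triple for the off-screen ID buffer,
// and decode it back from the pixel read under the cursor.
// Id 0 can be reserved for "background / no hit".

function idToColor(id) {
  return [(id >> 16) & 0xff, (id >> 8) & 0xff, id & 0xff];
}

function colorToId(r, g, b) {
  return (r << 16) | (g << 8) | b;
}

// Round-trips any id up to 2^24 - 1, which is plenty for tens of
// thousands of instances.
const [r, g, b] = idToColor(123456);
const decoded = colorToId(r, g, b);
```

The picking material renders each instance with `idToColor(instanceId)` as a flat color, and the mouse handler decodes the single pixel under the cursor with `colorToId`.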

There are modifications to the above approach that may allow us to resolve the occlusion problem for group selection – maybe running multiple scenes at different z heights or something? – but none of these feel particularly elegant and they all are much more expensive.

A third way of approaching the problem is to use some kind of compute shader. AFAICT these are supported in WebGPU, but not in WebGL or WebGL2. And WebGPU itself isn’t that widely supported yet – just Chrome, iirc.

Are there other approaches that I am missing, or modifications to the above that would make them more tractable? Color picking may have been the optimal approach 5 years ago – have recent advances in WebGL2 or elsewhere improved things?

Maybe too much to hope for a silver bullet :sweat_smile:

we’ve had nothing but bad experiences with gpu and color picking. i remember it was fairly complex, and it changed the nature of the app since we now had to transmit data into uniforms, which unhinged the data flow, and we needed new shaders for meshes and lines. it never felt that fast either because of all the pixel processing, edge detection and things like that. whenever the app or its models and scene have to conform to a technical detail i would be careful.

whatever difference in performance was there can probably be remedied with bvh GitHub - gkjohnson/three-mesh-bvh: A BVH implementation to speed up raycasting and enable spatial queries against three.js meshes.

although i have no idea if it will work with instanced meshes ootb. you most likely would have to adapt THREE.InstancedMesh.prototype.raycast to this three-mesh-bvh/src/utils/ExtensionUtilities.js at 39b479b41b9ecfc473a9abc1ac300b3ce5c82f86 · gkjohnson/three-mesh-bvh · GitHub or maybe it exists already @gkjohnson


BVH seems useful for making raycasting itself fast, especially in a situation where we have many raycasts. But unless I’m missing something, it probably won’t help in a scenario where we have only one raycast but the picking geometry itself needs to update a lot (which is more or less what I’m dealing with).

The reason I was looking into GPU picking at all is that the shader for the animation transformations is pretty expensive to run in JS.

@Mugen87 I hope you don’t mind the tag, I noticed that you’d responded to some previous questions that were similar (e.g. Instances GPU picking with complex shaders) and some related github PRs (e.g. InstancedMesh: Add support for raycast(). by webglzhang · Pull Request #17961 · mrdoob/three.js · GitHub).

Curious if you had any thoughts on the above. I notice too that InstancedMesh has a native raycaster, but I am pretty sure that it does not support shader animations – wanted to check if it does.

@theahura

I notice too that InstancedMesh has a native raycaster, but I am pretty sure that it does not support shader animations

If you’re animating the geometry in the shader then you’re on your own in terms of raycasting. The cpu does not know about the gpu-computed transformations.

A slightly different approach is to do color-based picking.

Further, when I tried implementing this the first time around, I ran into weird GPU synchronization issues – I’m not really sure how best to describe this issue, but it was taking up to 25ms per frame of just waiting on the GPU.

GPU-picking is a fine approach, but you have to read the data back asynchronously. Otherwise the application will stall waiting for all other GPU work to finish. Ultimately this means that you’ll have to read the data from a frame or two before. You can look into WebGLQueries to be able to check when your draw is actually complete and ready to read.
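One way to sketch the async-readback idea with raw WebGL2 fence syncs (a sketch, not three.js-specific; the helper name is mine, and `gl` is assumed to be a `WebGL2RenderingContext`, e.g. from `renderer.getContext()`):

```javascript
// Poll a fence sync object so readPixels doesn't stall the CPU waiting for
// the GPU to finish all queued work.
function clientWaitAsync(gl, sync, intervalMs = 4) {
  return new Promise((resolve, reject) => {
    function check() {
      const res = gl.clientWaitSync(sync, 0, 0); // timeout 0 = just poll
      if (res === gl.WAIT_FAILED) {
        reject(new Error('clientWaitSync failed'));
      } else if (res === gl.TIMEOUT_EXPIRED) {
        setTimeout(check, intervalMs); // GPU not done yet; try again later
      } else {
        resolve(); // ALREADY_SIGNALED or CONDITION_SATISFIED
      }
    }
    check();
  });
}

// Usage after issuing the picking draw (sketch):
//   const sync = gl.fenceSync(gl.SYNC_GPU_COMMANDS_COMPLETE, 0);
//   gl.flush();
//   await clientWaitAsync(gl, sync);
//   gl.deleteSync(sync);
//   gl.readPixels(...); // now completes without blocking on earlier GPU work
```

The tradeoff is that the pick result arrives a few milliseconds (or a frame or two) after the click, which is what makes the pick events asynchronous from the app's point of view.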

A couple of other simple optimizations: you can use MRT (multiple render targets) to draw the ids and the color in a single draw pass to save time, and you should only read the single pixel under the mouse cursor back to the CPU to check which item is being picked.

The downside is that it won’t work for group/lasso selecting instances that are occluded by other instances

There may be some fancy shader work you can do, but even with compute shaders I don’t think this would be simple. You can use a depth-peeling-like approach or run the checks in parallel in a web worker. I don’t know your requirements, but it would be much simpler if you change the behavior to only allow the center of each target to be selected by the lasso – then you can run this check on the GPU more easily and performantly.

@drcmda

it never felt that fast either because of all the pixel processing, edge detection and things like that.

GPU picking is fast – it needs to be implemented properly, though. There’s no need for edge detection or additional processing, and you need to make sure you read the data back once it’s ready. Though if supporting asynchronous pick events is a problem for an app, then yes, another approach may be better.

whatever difference in performance was there can probably be remedied with bvh GitHub - gkjohnson/three-mesh-bvh: A BVH implementation to speed up raycasting and enable spatial queries against three.js meshes.

although i have no idea if it will work with instanced meshes ootb. you most likely would have to adapt

Three-mesh-bvh works for instanced meshes in the sense that it will perform BVH-based raycasting against the individual geometry of the instanced mesh, but not across the set of instanced objects. In this case the geometries are planes, so there’s little benefit to be had.


i remember we had to implement something for lines, for instance, because one pixel wasn’t enough. filter order was also a problem, where some objects are to be prioritised over others. all in all it wasn’t so much the performance, but all the back and forth, conforming the whole app to the process of selecting, and of course the async nature, which then caused other issues. we moved the product to BVH and never looked back. we do not change the underlying geometry, so it was a good fit – in this case i don’t know, i thought maybe it could be a consideration.


GPU-picking is a fine approach but you have read the data asynchronously. Otherwise the application will stall waiting for all other GPU work to finish. Ultimately this means that you’ll have to read the data from a frame or two before. You can look into WebGLQueries to be able to check when your draw is actually complete and ready to read

Never heard of this before! Do you have a code pointer by any chance? I would’ve expected the three.js example (three.js webgl - interactive cubes (gpu)) to have this, but it doesn’t look like it does.

@gkjohnson sorry, what do you mean here? I’m not immediately sure why this would be better

@theahura

Never heard of this before! Do you have a code pointer by any chance?

I don’t.

@gkjohnson sorry, what do you mean here? I’m not immediately sure why this would be better

You can render the lasso shape to a texture and then sample the texture on the gpu to determine whether or not a point is within the lasso shape. You can render the result for each item to a single pixel and read it back to get the selected set.

Reading just the center pixel is a way to speed things up. You can of course read more over the surface of each shape but iterating over every pixel on the surface of every shape will likely be prohibitively slow.
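To make the center-of-target idea concrete, here is a CPU-side analogue (a sketch with made-up names; the real version would rasterize the lasso to a texture and do the equivalent lookup in a shader): given a rasterized lasso mask and each instance's projected center, selection is a single array lookup per instance.

```javascript
// mask: width*height Uint8Array where 1 = pixel is inside the lasso shape
// centers: array of { id, x, y } instance centers in screen-pixel coordinates
function selectByLassoMask(mask, width, height, centers) {
  const selected = [];
  for (const c of centers) {
    const px = Math.round(c.x);
    const py = Math.round(c.y);
    if (px < 0 || py < 0 || px >= width || py >= height) continue; // off screen
    if (mask[py * width + px]) selected.push(c.id); // center falls inside lasso
  }
  return selected;
}
```

This is O(instances) rather than O(pixels), which is why restricting the test to each target's center makes the group-selection problem so much cheaper.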

@drcmda

all in all it wasn’t so much the performance, but all the back and forth, conforming the whole app to the process of selecting, and of course the async nature which then caused other issues.

Yeah generally I consider gpu picking to be a very use-case specific approach for selection.

So based on the discussion above + some other poking around, I put together the following example:

import * as THREE from 'three';
import { TrackballControls } from 'three/addons/controls/TrackballControls.js';

// Basic scene setup
const scene = new THREE.Scene();
const camera = new THREE.PerspectiveCamera(75, window.innerWidth / window.innerHeight, 0.1, 1000);
camera.position.z = 5;

const renderer = new THREE.WebGLRenderer({ preserveDrawingBuffer: true });
renderer.setSize(window.innerWidth, window.innerHeight);
document.body.appendChild(renderer.domElement);

const controls = new TrackballControls(camera, renderer.domElement);
controls.rotateSpeed = 5.0;
controls.zoomSpeed = 1.2;
controls.panSpeed = 0.8;
controls.noZoom = false;
controls.noPan = false;
controls.staticMoving = true;
controls.dynamicDampingFactor = 0.3;

const planeGeometries = [];
const planeMaterials = [];
const planeMeshes = [];
const NUM_PLANES = 10; // Number of plane geometries

// Create plane geometries
for (let i = 0; i < NUM_PLANES; i++) {
  const geometry = new THREE.PlaneGeometry(1, 1);
  const material = new THREE.MeshBasicMaterial({ color: 0xffffff });
  const plane = new THREE.Mesh(geometry, material);
  plane.position.set(5 - i, 5 - i, 5 - i);
  scene.add(plane);

  planeGeometries.push(geometry);
  planeMaterials.push(material);
  planeMeshes.push(plane);
}

const pickingScene = new THREE.Scene();
const pickingTexture = new THREE.WebGLRenderTarget(NUM_PLANES, 1);
const pickingMaterial = new THREE.ShaderMaterial({
  // Vertex shader
  vertexShader: `
        void main() {
            gl_Position = projectionMatrix * modelViewMatrix * vec4(position, 1.0);
        }
    `,
  // Fragment shader
  fragmentShader: `
        uniform mat4 visibleCameraViewMatrix;
        uniform mat4 visibleCameraProjectionMatrix;
        uniform vec3 planes[${NUM_PLANES}];
        uniform vec2 mouse;
        uniform vec2 resolution;

        // Convert world coordinates to screen-pixel coordinates
        vec2 worldToScreen(vec3 worldPosition) {
            vec4 clipSpacePosition = visibleCameraProjectionMatrix * visibleCameraViewMatrix * vec4(worldPosition, 1.0);
            vec3 ndc = clipSpacePosition.xyz / clipSpacePosition.w; // Normalized Device Coordinates
            vec2 screenPosition = vec2(ndc.x, -ndc.y) * 0.5 + 0.5;
            return screenPosition * resolution;
        }

        void main() {
            // Each fragment of the 1xN target corresponds to one plane
            int planeIndex = int(gl_FragCoord.x);
            vec3 planePosition = planes[planeIndex];

            // Convert plane world position to screen position
            vec2 screenPos = worldToScreen(planePosition);

            // Check if the mouse is over the plane
            float radius = 10.0; // Radius for picking, in pixels

            // Convert from webgl coordinates (center is 0, 0) to pixel
            // coordinates (top left is 0, 0, scaled by resolution)
            vec2 mousePos = vec2((mouse.x + 1.0) / 2.0 * resolution.x, (1.0 - (mouse.y + 1.0) / 2.0) * resolution.y);

            if (distance(screenPos, mousePos) < radius) {
                gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0); // Red if under mouse
            } else {
                gl_FragColor = vec4(0.0, 0.0, 1.0, 1.0); // Blue otherwise
            }
        }
    `,
  uniforms: {
    visibleCameraViewMatrix: { value: camera.matrixWorldInverse },
    visibleCameraProjectionMatrix: { value: camera.projectionMatrix },
    planes: {
      value: planeMeshes.map((mesh) => [mesh.position.x, mesh.position.y, mesh.position.z]).flat(),
    },
    mouse: { value: new THREE.Vector2() },
    resolution: { value: new THREE.Vector2(window.innerWidth, window.innerHeight) },
  },
});

const pickingPlane = new THREE.PlaneGeometry(NUM_PLANES, 1);
const pickingMesh = new THREE.Mesh(pickingPlane, pickingMaterial);
pickingScene.add(pickingMesh);

const aspect = NUM_PLANES / 1; // Width divided by height
const frustumHeight = 1; // Or any other value that matches your scene setup
const pickingCamera = new THREE.OrthographicCamera(
  (-aspect * frustumHeight) / 2,
  (aspect * frustumHeight) / 2,
  frustumHeight / 2,
  -frustumHeight / 2,
  0,
  10,
);
pickingCamera.position.z = 5;

document.addEventListener('mousemove', function (event) {
  // Convert mouse position (top left is 0, 0) to webgl coordinates
  // (center is 0, 0) scaled as a proportion of the resolution
  pickingMaterial.uniforms.mouse.value.x = (event.clientX / window.innerWidth) * 2 - 1;
  pickingMaterial.uniforms.mouse.value.y = -(event.clientY / window.innerHeight) * 2 + 1;
  console.log(pickingMaterial.uniforms.mouse.value);
});

let renderPicking = false;
const alternatePicking = () => {
  setTimeout(() => {
    renderPicking = !renderPicking;
    alternatePicking();
  }, 1000);
};
// alternatePicking();

function render() {
  requestAnimationFrame(render);

  controls.update();

  // Render picking scene
  renderer.setRenderTarget(pickingTexture);
  renderer.setClearColor(new THREE.Color('green'), 1);
  renderer.clear();
  renderer.render(pickingScene, pickingCamera);

  // Read pixel data (note: this is a synchronous read, so it will stall until
  // the GPU finishes the draw – see the async readback discussion above)
  const pixelBuffer = new Uint8Array(4 * NUM_PLANES);
  renderer.readRenderTargetPixels(pickingTexture, 0, 0, NUM_PLANES, 1, pixelBuffer);

  for (let i = 0; i < NUM_PLANES; i++) {
    if (pixelBuffer[i * 4] === 255) {
      // Red channel
      planeMaterials[i].color.set(0xff0000); // Highlight if under mouse
    } else {
      planeMaterials[i].color.set(0xffffff); // Default color
    }
  }

  // Render main scene
  renderer.setRenderTarget(null);

  if (renderPicking) {
    renderer.render(pickingScene, pickingCamera);
  } else {
    renderer.render(scene, camera);
  }
}

render();

The example implements GPU picking in such a way that multiple geometries can register as being under the mouse cursor. It does this by evaluating a separate 1xN pixel array, where each pixel represents whether or not the corresponding geometry is near the mouse on screen. The positions for the actual geometries are passed in as uniforms, the shader calculates in parallel whether each geometry is near the mouse, and the colors change on a hit accordingly. Right now it only gives a binary value as to whether the geometry is near the mouse, but it could easily return a distance that could then be used on the CPU side to figure out the top-most hit.
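The CPU side of that distance variant might look like this (a sketch, assuming the shader writes a quantized distance into the red channel with 255 meaning "no hit" – this is not code from the example above):

```javascript
// pixelBuffer: RGBA bytes read back from the 1xN picking target, where the
// red channel holds a quantized mouse distance per instance (255 = miss).
function closestInstance(pixelBuffer, numInstances) {
  let bestIndex = -1;
  let bestDist = 255; // anything at 255 is treated as a miss
  for (let i = 0; i < numInstances; i++) {
    const dist = pixelBuffer[i * 4]; // red channel of pixel i
    if (dist < bestDist) {
      bestDist = dist;
      bestIndex = i;
    }
  }
  return bestIndex; // -1 when nothing is near the mouse
}
```

The shader would write something like `clamp(distance(screenPos, mousePos) / radius, 0.0, 1.0)` into the red channel instead of the binary red/blue colors.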

All that said, this seems rather poor for hit testing. Since I’m using planar geometries, picking near corners is really unreliable/nonexistent without making the radius fairly large, which in turn can result in false positives.

Also seems somewhat computationally inefficient, and this is before we add more logic to do things like lasso selection.

Having implemented this now, it feels like a mistake to move away from a constant-time lookup of the single pixel directly under the mouse (always O(1)) to a lookup that requires evaluating every single instance in the instanced mesh (which is O(N)). Still, maybe I’m wrong and this is fine?

Curious to hear other thoughts

All that said, this seems rather poor for hit testing. Since I’m using planar geometries, picking near corners is really unreliable/nonexistent without making the radius fairly large, which in turn can result in false positives.

If you want a precise result, you need to use precise calculations. Using a circle for hit detection is not a good approximation of a rectangle.
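For example, one more precise option is to project the plane’s four corners to screen space and test the mouse against the resulting quad instead of a radius around the center. A sketch of the same-side cross-product test (plain geometry, nothing three.js-specific):

```javascript
// p: point to test; quad: four corners in consistent winding order,
// all as { x, y } in the same (screen-pixel) coordinate space.
function pointInQuad(p, quad) {
  let sign = 0;
  for (let i = 0; i < 4; i++) {
    const a = quad[i];
    const b = quad[(i + 1) % 4];
    // Cross product tells us which side of edge a->b the point is on
    const cross = (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
    if (cross !== 0) {
      if (sign === 0) sign = Math.sign(cross);
      else if (Math.sign(cross) !== sign) return false; // outside one edge
    }
  }
  return true; // on the same side of every edge
}
```

The same test ports directly to GLSL, so the 1xN shader above could run it per instance against the projected corners instead of comparing a distance to a fixed radius.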

Also seems somewhat computationally inefficient, and this is before we add more logic to do things like lasso selection.

Having implemented this now, it feels like a mistake to move away from doing a constant look up with a single pixel buffer that is directly under the mouse (always O(1)) to a lookup that requires evaluating every single instance in the instanced mesh (which is O(N)). Still, maybe I’m wrong and this is fine?

There are always dozens of ways to achieve something like this, and you’re the only one who knows your project requirements. No one can give you an answer as to what the “best” way is for your case, since there are a lot of factors to balance (i.e. implementation complexity, performance, feature set, etc.). Pick a solution that seems good enough for you and use it, and try something different if you find the performance isn’t good enough.

There are always dozens of ways to achieve something like this… try something different if you find the performance isn’t good enough.

I’ll be honest, I’m not well versed enough in GLSL or picking to know offhand all the possible approaches and their tradeoffs – that’s why I asked the question! :laughing: I’m happy to provide as many details as possible re: the factors that I care about here, and I tried to do that a bit in the first post as well, and of course to write and share code that may be relevant. But yeah, unfortunately it’s just not super obvious to me what alternatives exist.