Fast raycasting and other spatial queries

I was doing some optimization work on my engine recently and remembered various points being made about ray casting performance here on discourse as well as on GitHub, so I decided to share some numbers from an actual real-world usecase. Goal is to give you some perspective on what you can expect from a JS spatial index.

A decently implemented spatial index has fairly low impact on overall frame time in my experience. Here are some numbers for those who might be interested:
image
For clarity: traverseFrustumIntersections is a method used for extracting just about everything that’s within a frustum, such as checking if animated models should be animating or if they can be put to sleep, or particle emitters etc. as well as replacing Three.js frustum culling. As you can see with all the usage it amounts to only about 3% of the overall frame timing, that’s being written with a visitor pattern, if you write that in a dumber way - it could be a fair bit faster still. The dataset in question is my game, here’s the scene:


there are total of ~15,000 objects in the tree, so it’s not super huge.

I use another instance of that index just for representing terrain polygons, that’s just under ~ 200,000 polygons and doing a ray test every frame (from mouse pointer) amounts to being barely registered in the profile:


as you can see, it comes in with about 0.02% frame cost, which is practically nothing, you could do 50 of those casts and it would barely make a dent in your budget at ~1%. Again, it’s visitor based, so doing a dumb, array-filling, implementation would be a fair bit faster.

That profile is ~10s worth of recording, frustum testing amounted to total of 149ms. If 10s has 6000 frames, that’s around 25 microseconds per frame on frustum testing or 0.023ms. There are a few of those tests done each frame in different parts of the engine, such as animation system, rendering, particles.

For purposes of completeness, my index is BVH binary AABB tree. Here are signatures of 2 methods mentioned above:

/**
 *
 * @param {number} startX
 * @param {number} startY
 * @param {number} startZ
 * @param {number} directionX
 * @param {number} directionY
 * @param {number} directionZ
 * @param {function(node:LeafNode)} visitor
 */
BinaryNode.prototype.traverseRayLeafIntersections = function (startX, startY, startZ, directionX, directionY, directionZ, visitor) {
...
};


/**
 *
 * @param {Frustum[]} frustums Collection of THREE.js Frustums
 * @param {function(node:LeafNode, boolean)} visitor
 * @returns {number}
 */
BinaryNode.prototype.threeTraverseFrustumsIntersections = function (frustums, visitor) {
...
};

I’m curious to find out what are others’ experiences with spatial indices in terms of performance, am I the only one for whom it’s a non-issue? I’m struggling to come up with good practical examples where you would need more than a handful of rays per frame/simulation tick. :thinking:

2 Likes

I’m not quiet sure what the numbers tell, i mean there are no other specs like your hardware and this is very scene dependend. The complexity of the scene, it’s content and the indexing technique are crucial how effective it is. I think in your game you benefit from a top-down perspective what reduces the traversal generally. A compasion of with and without index would give a better overview.

The spatial index (IndexedVolume) i made uses different techniques to effectively hierarchically perform culling, but is also used for raycasting and accessing the scene content spatially, what is very important for collisions, since every object will know everytime which objects it can collide with very reduced, without testing the tree top-down. By storing objects related to their size this also helps for distance culling and raycasting. Generally how dense the index tree is will determine how efficient it will work as well.

I’ll give some numbers when i made a new test scene and a comparsion with and without index. In some comparsion tests the results were from 1-8 fps to crash without index and stable 60 with, though it uses more techniques such as auto-instancing, impostor etc. The only real downside of a index is the insertion time, depending on if it works dynamic though. Generating a forest with a few million trees can take a while, you wouldn’t do this in production though. Instead the scene gets either serialized indexed, or it also supports a progressive chunked insertion for procedural or streamed content. As mentioned it uses more techniques for distanced content what changes the representation of the content itself, but that’s not really relevant for this topic.

btw, shouldn’t this rather go in Discussion?

2 Likes

That’s fair, I didn’t think many people would care, my specs are:

  • Browser: Chrome Version 75.0.3770.100 (Official Build) (64-bit)
  • OS: Windows 10 x64
  • CPU: i9-9900k @ 3.6GHz
  • RAM: DDR4 Dual Channel 14-14-14-34 3200MHz

Without it the game is just crashing, so I can’t really provide a reliable comparison.

that’s completely true. A top-down view makes it so that your content is distributed mostly in 1 plane, and you’re able to reject most of the sub-volumes pretty quickly. I did a test with 10,000,000 boxes and I don’t think it’s worth mentioning that that would simply not work on a lot of hardware. That example is not top-down, your view is instantiated at the center of the volume, so it’s close to worst-case.

I’d be curious to know what kinds of techniques you use.

Actually I both agree and disagree, the example I linked earlier, with 10,000,000 boxes does just that, it inserts a lot of boxes into the tree at runtime, so it’s possible to do it without impacting performance, but there are caveats, for sure.

I have 3 tools for making insertion efficient:

  • Add one box to the tree. This is relatively slow, but it finds a good placement for a node.
  • Add many boxes at once. This uses spatial hash to build a sub-tree that’s relatively good and inserts it as a single unit. Fast, but it creates sub-optimal overall tree as it could have been better to insert some of the boxes into other branches instead of keeping them all in a single sub-tree.
  • Incremental tree optimizer. This thing runs relatively slowly performing 1-2 depth rotations on the tree, optimizing spatial locality. The optimizer is kind of like magic sauce, it allows you to sacrifice tree quality in short-term for faster insertion/removal and lets you improve that quality later on with completely adjustable runtime overhead.

I use both. My implementation of the tree has barriers to prevent migration of nodes out of a sub-tree, this enables chunking, and I also have a couple of serialization formats, I store my foliage in such a format, as it lets me optimize the tree beforehand and just load the optimal tree from disk later on.

Sure, we can do that. Mainly I was thinking it might be a resource for those who are thinking about using a spatial index and want to get a feel for what to expect. But I’m personally more interested in having a discussion myself. :+1: