Millions of facets with three.js?

Hello,
Like 3 years ago, I tested Three.js, with great joy.
I tried to make a few demos, like a logo for my company, some 3D scenes, etc…
But I quickly struggled to a limitation: after 10K facets, it was unusable because the FPS dropped down to 25 images per seconds.

I’m not sure how many facets Three.js can draw nowadays, perharps it’s the same I think unlesse the engine has been rewritten.

Nevertheless, in those years, I tried to overpass this limitation of the 10K facets.
And I managed to draw more than a million of facets.
Actually, not all facets are drown, only 100K maximum.
But I could still manage worlds up to 10M facets before droppping to 25 FPS.

I never published my work, I had no time to rework the Three.js engine nor engage a long discussion about that.
But nowadays, I see that WebGL is still not very used, and the games I played, there were not so many facets as well.

So, if you’re interested, here is a quick overview of what I did.

Firstly, I completely rethinked the facet indexation. With Three.js I saw that a octree is used. But actually an octree in JS is really slow because the structure itself is really heavy.
So, I started with the quickest way to sort in JS: the Array.sort() function.
But it can only sort numbers, otherwise it’s really slow. So, the first thing I did is choose a main vertex for a facet (the verrtex numer 0), and converted its coordinates to integers, and mixed the bits and X/Y/Z coordinates to obtain a plain 32 bits number, with 10 bits for each X/Y/Z, that’s a world with 1 million coordinate steps in each axis. The facets can be sorted with the plain JS sort function. They can be searched for with dichotomia algorithm, and also merged, substracted, union, intersection. You just need the min/max X/Y/Z coordinates to reduce the time spent in those operations. Also, I used ASM.js tricks to mix X/Y/Z coordinates quickly.

Now, inserting new facets might seem difficult with only that trick. So, I also used an array of those sorted facets. The first slot of the array holds at most 8 facets. Then if the slot is full, the second slot is used and can contain up to 16 facets, and so on until the 20th slot wich can contain 1 million facets. Now I could merge nearly instantly lots of facets together. Inserting 10K facets is just finding the first slot that can fit them, and adding all browsed slots together until a free slot is found, then sorting once the collected facets into one slot. This is a linear algorithm on average. Over 1 mililion facets however, it started to drop below 50 FPS.

Removing the facets, now: I just kept track of the slot sizes, and removed the facets by setting it to null. If the size of the slot drops under half its size, then I create/merge with the lower slot.

Ok that’s fine, but what about WebGL? I had the problem that sending lots facets
to WebGL is slow. So, each time I sort a bunch of facets, I also prepare a WebGl segment and attach it to its slot. When I update facets in the slot, I also change WegGL facets and only those that are needed, removed facets have all 0’s coordinates and are detected by the shaders as deleted facets. So I never need to resize the WebGL segments, only create them, and reuse them (I don’t actually delete the WebGL segment until the number of slots really diminuish).
Also, asking WebGL to draw what’s inside the camera view is easy: just find lower/upper bounds in each slot, and ask WebGL to draw just that part.
And, large objects can be added or removed easily: facets usually stand close together, so they are in the same slot, and merging slots keeps them together. There is always a slot close to the size of any object, so they can be removed/added/updated quickly.

The rest of the functionalities are working too: camera clipping, finding a facet in any way, shaders, colors, etc… But also geometry operations are much faster, especially with lots of facets. Only, the indexation is different, and also the way to pass facets to WebGL (facet segments are kept between frames).

Although, it might not work good with some special worlds, like having 1 million facets in a place and another million in another distant place, because of integer indexation. Other than those particular features, it works very well. The overall number of JS code line was about one thousand for the core indexation algorithms (then I wrote a few thousand more for other features, like the ability to transpile JS shaders to WegGL, shader & object library etc…, and because I couldn’t use Three.js after those changes).

I hope my explanations are clear enough, but if you have any question, I may answer them if I can. I have no time to rewrite this piece of code, but I remember it took me 3 week ends to write it. But then I am not very used to Three.js code base, so I had no time to change Three.js engine.
I hope that someone will be interested enough in drawing 1 million facets at 50 FPS to apply those techniques to Three.js and manage even bigger worlds.

Thanks,
Nicoco

I don’t think that rendering 10k faces was ever a potential performance bottleneck. Can you proof this statement in any way? Maybe with a live demo?

I’m afraid this statement is not true. three.js never had an octree implementation in the core.

3 Likes

I remember it was slow over 10K facets. I tried many things to speed up: making smaller objets, bigger objets, etc… the only thing that would work was not drawing everything.
May be it is because I benchmarked it on a laptop and a cheap GPU.
But anyway, if I managed to multiply the performances by a factor of 100 (in real looking world with textures and animations…) to 1000 (more simple worlds with reapeating textures and not much animations), then there is potential to overcome any limitation that subsist on tower computers.

Also, what was really slow is the downloading of facet coordinates to the GPU. So, may be it has changed, but it was like 3 years ago and I remember that most of the optimization came from not downloading all the scene each frame. Since the WebGL segments where kept between frames, you could move 10K facet, plus drawing 100K (GPU was freed from drawing everything in the scene with fast JS camera clipping instead of GPU), and managing 1 million (that’s the double-array index).

I didn’t tried the Unity framework, it seem they have less limitations, though not much higher, objects are just removed from the scene to draw bigger worlds.

And yes, I read the source code, and it was indeed an octree. Each object is stored on a cube, which itself is cut in 8 sub-parts, etc… If the object stand in the bounding box of a sub-part it is stored in the sub node, but if you have a very large object with a big bounding box, it is stored in a bigger cube.

I don’t known if it is still the same… I found it: https://collinhover.github.io/threeoctree/
It’s a plugin actually, not in the three.js core.

Do you have exemples of very big scenes? I still have a laptop ^^
(may be some GPUs are better than others… especially gaming GPUs, but as of WebGL, the possibilities are quite tiny, I couldn’t do anything but drawing for exemple, whereas with OpenCL I made some machine learning with sucess).

Let’s start with this one: https://threejs.org/examples/webgl_buffergeometry

It renders 160.000 triangles with a single draw call (you can evaluate WebGLRenderer.info.render in order to see these stats). It runs smoothly at 60 FPS on my five year old iMac.

This third party octree implementation is not used in any of the official three.js examples (https://threejs.org/examples/). So when you do a benchmark with them, forget any relations to octrees^^.

1 Like

That would be only 2^10=1024 coordinate steps per axis.

I don’t quite understand the indexing you are describing, in particular what happens on JS side and what is passed to the GPU. Did you delete the old code, or is it perhaps locked into a commercial project? Would you be able to share an example visual output of your technique? :slight_smile:

I’m sorry, I lost the original code with my old laptop.

You’re right about the 2^10 possibilities, then I must not remember well. I used 2 integers with 20+20+20 bits perhaps. I don’t recall exactly how I did it and all the implementation details. I just know it’s possible to draw millions of facets with a basic GPU.

The scene with 160 000 facets is actually 160 000 dots or nearly… The problems arise when facets are human-sized facets. Real world looking scenes. That mostly span out of the camera view. And this scene is not moving, only the camera is. My GPU could not render 1 million facets whose only 100 000 is in the view and 900 000 is outside the view. It is like rendering twice 1000 * 1000 pixels (nearly 1 million facets to skip and 1 million screen pixel to draw). In that case this scene whould render at 30 FPS instead of 60 (supposing that 60FPS is nearly the max given it’s a demo).

I “interleaved” bits of the X/Y/Z coordinates. It would be more precise than “mix”. The higher bit of X goes with the higer bit of Y and the higher bit of Z, and so on until the least significant bit.
I don’t remember exactly the details of the ASM.js routine, I just remember that I achieve the goal of millions of facets by interleaving bits. And millions of coordinates in each axis.

Once bits are interleaved, the integers can be sorted in 1 dimension, but actually sort 3 dimensions.

There were 3 structures: 1 raw JS array holding facets real coordinates and the interleaved X/Y/Z coordinates (10 elements), with the 0th element being a pointer to more elaborate JS structure to hold various information about the segment, including a second structure: the WebGL segment.
The third structure is again a JS array that holds these arrays in slots having sizes of: 0…7 (special low slot faster than a smaller 0…1 or 0…3 slot), then 8…15, 16…31, 32…63, and so on.

Most operations need to be done on all slots. For example, for rendering, I needed to get the camera view bounding box. Find the minimum and maximum point of the bounding box. Transform it to interleaved integers. Find the minimum and the maximum by dichotomy. Ask the GPU to render these corresponding facets. And do that for all slots.
Other geometry operations (substraction, union, …) are based on the same algorithm, plus the trick of degenerated thus invisible facets and unions work with Array.push.call.
This is an optimization for the X coordinates only. You can also optimize with X and Y. And Z. But I didn’t tried: it was fast enough for what I wanted to do. And it would probably need more WebGL calls which are really slow. The most important is that the GPU doesn’t need to scan most of the facets. Perhaps some GPUs can index facets in a better way, but as far as I know, they just scan all facets and pass them to the shader, which in turn must quickly find if a facet is out of view to return quickly from the shader routine.

To merge slots, the Array.push.call function is the best.

What is passed to the GPU is just an array of coordinates. The parameters of the shader has the other needed information to draw the facets: world matrix, camera matrix, texture IDs, lights, and so on. The JS side manages all non-pixel and all non-dynamic texture data (the said structure of facets, objects, texture affectation, animations, shader selection, …).

The code is not locked, it’s lost. It doesn’t really matter, once you have the algorithm it’s only a thousand lines of code. It took me 3 weed ends to make it work. The rest is much more important (shaders, reflexion, refraction, animated textures, shader textures, light management, etc…).

To be more clear about the problem.
Let’s take a simple example with 1 million facets.
You give a WebGL segment with those facets to the GPU.
The shaders runs on all given facets to compute the pixel and its color.
Now it happens that most pixels are out of the screen.
The GPU can’t know it until the shader ends.
So it need to scan all the facets even if a small part will actually have a pixel to render.
And you lost all that reflexion and refraction computations you’ve done for non visible facets (I’m exagerating but at least you loose matrix computations). (See note below).

Let’s say you make more segments, and select which segments to render prior to call the shader.
You can’t do that with WebGL (perhaps now you can I didn’t checked) because you can only have drawing shader, not computing shaders.
I tried to do computing shader with drawing shaders but then you need somehow to convert in JS some complicated data to integers/floats (what size? which bits in what order?), and that’s tricky and full of particular cases, you then have to convert again the pixel result to ints/floats/complicated structures, you can’t have a generic way of doing computing shader with drawing shaders. I tried with textures also, but it’s the same problem.
Plus you arise with the problem of having many segments to manage in JS.
Each segment is slow to manage: segment creation/deletion/load is slow.

So the only way I found is the coordinates interleave bits. One generic algorithm. Fast to do in JS (ASM.js especially). Fast to sort. Fast to search. Fast to operate on geometry that spans groups of facets, with slots. Not too many segments. Easy to load. Not needed to create/destroy/load often. Not needed to render all/most facets. And you’re still free to do what you want on the shader side.
But the drawback is that it completely changes the structure of the data, so without good knwledge of three.js codebase I couldn’t easily make them work together.

Note: I remember I used a library to compute reverse camera matrix in order to do clipping in the camera view in the JS part.
Note: Just do the logo of your company like I did: fonts with many facets, relexion, refraction, textures, animation, etc… all the effets. In full screen, I didn’t mesured, but it wasn’t 60FPS. Though relection and refraction are known to be slow and there’s nothing to do about it except approximations.

The GPU will know after running the vertex shader whether each fragment will be within the screen. It doesn’t have to compute the final color of fragments that fall outside the screen.

It sounds like you did a great experiment with an indexing method that sounds a lot like a middle-split Kd-tree (eventually evaluated on the GPU, as far as I can understand). It is still not entirely obvious how three.js would fail at the same tasks, though. Could there be a chance that you used three.js in a suboptimal way? Since this was some time ago, maybe you used Geometry instead of BufferGeometry, for instance, and made a lot of online changes?

Note: I remember I used a library to compute reverse camera matrix in order to do clipping in the camera view in the JS part.

three.js clips to the view frustum on the level of complete geometries by default.

I know some users, at least including myself, @Usnul and @Fyrestar, have missed built-in spatial indexing, which can greatly accelerate certain operations.

If we ignore the “Millions of facets” for now (because a million faces is not a big deal for the GPU (although it is for grandma’s mobile phone)), and aim for a geometry with a billion faces, whereof only a few million are within view and non-occluded at any given time, what would be needed? That would be a case where I would expect three.js to fail badly.

1 Like

I’ve heard terms like “polygon soup” thrown around quite often in recent years. That a “game” or, really just a “scene” is considered as a large collection of polygons and nothing more. You forget about objects and hierarchies for the most part, because it simplifies the concept drastically. If you do that - you only have to think about “how do I make my soup run at interactive frame rate”, admittedly, this approach is not all-encompassing as it is not as well suited for dynamic elements in the scene and may run counter to some memory optimization techniques such as instancing.

This is mostly a tangent, in the same direction as what @EliasHasle was talking about.

2 Likes

It is also hardware, shader and other factors related, if people only talk about how many million bare triangles they can render this mostly doesn’t reflect any real world scenario or the average hardware specs of the target audience, especially the gap between desktop and mobile.

1 Like

967 x 12 x 12 x 12 = 1,653,696 triangles

1 Like

@looeee
Are we gonna have instancedMesh with r109 ? :star_struck:

1 Like

http://dusanbosnjak.com/test/webGL/three-instanced-mesh/webgl_instanced_mesh_v4.html

Here’s an example of this. It too renders a lot of triangles. The sphere in particular renders over 32 million triangles (i think) since the shadows are turned on. The box is 1.2 million, times two. The demo itself is from:

Yes, but if you want to use something similar to this in <r109 you can check these out:


4 Likes

Looks like it!!

3 Likes

http://dusanbosnjak.com/test/webGL/three-instanced-mesh/webgl_instanced_mesh_v4.html
Ok so it hasn’t changed a lot: that scene, with 100 000 boxes (tha’ts half a million facets) renders at 32 FPS on my laptop.
And those facets are plain colored, they don’t have textures.
Now if you add textures, you’ll see that it can’t even reach half a million.

Phong doesn’t count much: the GPU is good at that kind of computings, it does them in parallel moreover. The problem is that you have to choose between lights (basic shading is fast but when you add lot of relexion and refraction it becomes slow), fog (really intense volumetric computations), facets (read lots of memory) or textures (read lots of memory). You can’t have all of them and a decent FPS.

The particular way I indexed facets make it possible to skip facets much faster than enumerating them so the number of facets is not a problem anymore.

The other example is a particular case: it doesn’'t have to read lots of memory, it is based on repeating facets.

As for objects and hierarchy, the way I indexed facets have nothing to do with object management. You can have any hierarchy you want, and you can have multiple indexes and merge them together fastly. There can be a decorelation between raw facets in the index and objets. At the end of the frame time, when you have moved all your objects, you update the index. And if objects use that kind of index, they can themselves know quickly if there move places them in the camera view or not.

Example: you have 1000 object of 1 million facets. You ask each object: give me the facets that are in view. There you have your billion facets. You won’t be able to draw more than millions of facets without halving the FPS: 1 million facets and 1 million pixels that’s twice the time spent.

As for instanced mesh, it is neat for repeating geometry. It doesn’t cope with lots different objects.

Though there are solutions for certain types of objects. For example trees can be rendered with 5 kind of instanced mesh, but in the shader you move the vertice randomly. Then you’ll have all different trees. But you have to write your own shader, and it doesn’t work so well with all kind of objets. Cars for example.

Ok if I have some time to spent, I’ll make another demo. Because you just don’t get the point: when you REALLY give 1 million facets to the GPU it is as much as the number of pixels to render, and then the GPU spends twice as much time as if it had to render only a few facets and 1 million pixels.

How did you come up with this number? A box has 6 sides with 2 triangles each, making it 1.2 million “facets”. I’m not even sure how much these facets matter since it’s 24 vertices being transformed. Keep in mind that shadows are being rendered so, double the number of facets…

3 Likes

Ok, so how many WebGL programmers does it take to change 1.2 mil faucets?

5 Likes

At least 10 instances and a spatial index

btw When I first looked at this topic I thought it was about instancing 10k bathroom faucets models or something. Is that actually a common synonym for a face or just something lost in translation? :face_with_monocle:

4 Likes