QUESTION: Hi, I am very sorry if this question already exists, but I have not been able to find anything yet. So, here goes: I am working on a project where I want to render a 3D scene on the server side and then send the scene over to the client as an image. Up to this point I have been able to find quite a few options for the server-side rendering part.
However, the second part of the issue remains unanswered. Once I have done the server-side rendering and sent the scene over to the client as an image, how on earth do I capture the user’s interaction with that scene?
I know that I may be getting tunnel vision on the solution I have discussed above, so I thought I’d mention my objective for this project. If anyone knows a better approach, I’ll be very grateful if you can share it with me.
PROJECT OBJECTIVES:
Allow users to load detailed 3D models (with high-resolution textures) on any mobile device (including older cellphones and computers with not-so-great CPUs).
Allow users to interact with the 3D model and specify different locations (vertices) where we load new 3D objects.
Above all, I am looking for seamless loading of rendered objects without relying on the client-side hardware specs. Having said that, I am happy to beef up my server, or even have multiple servers with a dedicated server for rendering.
I am kind of getting overwhelmed with all the information out there and do not know the correct jargon to look up. Any help will be much appreciated.
I’m not sure I understand what you are trying to do. I’m gonna interpret your question as: “I want to do server-side rendering of a 3D scene, but still have full client-side interaction with the 3D scene”.
If that’s really your question then good luck to you, and I hope you don’t expect to be able to have real-time (30+ fps) rendering because you’ll basically be streaming from the server. That being said, if you really need to go down this road, I’d say you have to work the other way around; start with a normal, fully functional scene on the client (meaning also client-side rendering), and work your way towards outsourcing the rendering to the server (instead of assuming everything happens on the server and just some pieces of data are shared with the client).
I’d start with a fully functional client and keep the rendering very simple (no textures, complex shaders, lighting, etc.), just a “debug” visualisation of your scene.
Then I would find a way to have all necessary data for rendering the scene in high quality on the server. I can see two approaches for this:
1. Your application keeps a copy of the scene data on the server and updates it based on user interactions, so it stays synchronised with the client.
2. You find a way to serialise all the information needed to build and render your scene and send it to the server with every render request (a more RESTful approach; there is a rough sketch of this below).
Option 2 would definitely be my first choice, because this way the server is stateless; keeping multiple states synchronised is definitely gonna give you headaches. Only if option 2 is proven to give too much of a performance hit (if your scenes are very complex and it’s too much data to send to the server with every render request) then, and only then, I’d go looking into option 1.
Finally, when you’re able to reproduce the client-side scene on the server and produce high-quality renders, then you can have the client request these renders and overlay them on your canvas.
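To make option 2 a bit more concrete, here is a minimal sketch. The `/render` endpoint and the `overlayImage` element are placeholders of my own invention; the serialisation itself uses the `toJSON()` method three.js provides on scenes and cameras:

```js
// Minimal sketch of option 2: serialise the client-side scene, ask a
// hypothetical /render endpoint for a high-quality frame, and overlay the
// returned image on top of the low-quality client-side canvas.
async function requestServerRender(scene, camera, overlayImage) {
  const payload = {
    scene: scene.toJSON(),   // three.js built-in serialisation
    camera: camera.toJSON(),
  };

  const response = await fetch('/render', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });

  const blob = await response.blob();            // rendered image from the server
  overlayImage.src = URL.createObjectURL(blob);  // <img> positioned over the canvas
}
```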
Hi Mark, Thank you for responding to this. I appreciate the detailed response. You did get that right. That is exactly what I am trying to achieve.
As for the options you mentioned, you are absolutely right, I should try and see if the 2nd approach works. I can see that the first approach can get really messy.
But let me see if I understand the second approach. If I am not wrong, in the second approach you are suggesting something along the lines of “Progressive Mesh Streaming”? I do not know too much about this approach but from what I understand, it keeps a low-res mesh as a base mesh and transmits that to the client initially. After that, once the client is done rendering the base mesh (which should be fairly quick) the client then requests “vertex split instructions” to increase the mesh resolution. I got that information from this paper. Is that close to what you are suggesting?
If it is what you are referring to, do you know if there is any such functionality built into Threejs? I looked up some packages that might allow me to achieve Progressive Mesh Streaming and I came across 3P, from what I have read, it should be able to do progressive mesh streaming but I could not find an example of this so I am not sure how well is this package supported.
In my experience there is nothing like that available. I’ve invested some time in this topic in the past and came to the conclusion that Progressive Mesh Streaming is a nice topic for the scientific community but more or less not feasible for practical use cases.
That is simply because there is no standardization of this technique and thus no reference implementation, and implementing everything on your own is too time-consuming, especially if you want a solution that blends in well with established professional workflows. If it’s only about reducing download size, I suggest you use glTF and compress your mesh data with Draco. There are really good tools for this like glTF Pipeline, and three.js will handle the decompression automatically, as demonstrated by this example, as long as you correctly configure GLTFLoader:
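The wiring is roughly like this (the decoder path and model file name are placeholders for your own project):

```js
import { GLTFLoader } from 'three/examples/jsm/loaders/GLTFLoader.js';
import { DRACOLoader } from 'three/examples/jsm/loaders/DRACOLoader.js';

// Point DRACOLoader at the directory containing the decoder files, then hand it
// to GLTFLoader so Draco-compressed meshes are decoded automatically on load.
const dracoLoader = new DRACOLoader();
dracoLoader.setDecoderPath('/draco/');

const loader = new GLTFLoader();
loader.setDRACOLoader(dracoLoader);

loader.load('models/character.glb', (gltf) => {
  scene.add(gltf.scene); // assumes an existing THREE.Scene called `scene`
});
```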
Thank you Mugen, that makes a lot of sense. I must say that I am surprised that nothing like this exists yet. But then again, maybe that is because of the better 3D support in browsers and the better hardware in newer devices.
Thank you for your help. I think I’ll stick with the glTF loader and try to minimize the size of the meshes.
Are there any other ways in which I can reduce the rendering load on the client while still maintaining the 3D model’s integrity? That is the reason I was looking into all these alternatives. Even though WebGL support has grown a lot, I still find that some mobile phones struggle to render large 3D models.
I think you have just hit on the core concern of the entire field of real-time 3D graphics. Nearly everything we do is an attempt to maintain visual quality while reducing load on the client. If generating models on a server and streaming them were feasible, you can bet people would already be doing it. Game streaming services like GeForce Now or Stadia have recently started to do this, but they have huge resources to throw at the problem and require servers very close to the user. I don’t know exactly how far the user can be from the server before latency becomes an issue, but I reckon it’s probably on the order of a few hundred kilometres. In other words, 3D streaming services will probably only ever work well for users in cities.
This means you must focus on client-side optimization. three.js already does most of the work for you, as long as you set up your scene correctly.
Further optimization is specific to the type of models you want to display. For example, if you want to display a scene with many repeating separate pieces (e.g. a landscape with trees), you might choose instancing or merging the meshes. If, on the other hand, you have a single highly detailed model (like a human face), you might try to store as much of the detail as possible in the texture maps while reducing the number of polygons. If you want to display highly detailed map data, you might be better off using software specifically designed for that, like CesiumJS.
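As a rough sketch of the instancing idea, three.js provides InstancedMesh, which draws many copies of the same geometry in a single draw call (the geometry, count, and placement below are made up for illustration):

```js
import * as THREE from 'three';

// Draw 1000 "trees" that share one geometry and material in a single draw call.
const treeGeometry = new THREE.ConeGeometry(1, 4, 8);
const treeMaterial = new THREE.MeshStandardMaterial({ color: 0x228b22 });
const trees = new THREE.InstancedMesh(treeGeometry, treeMaterial, 1000);

// Position each instance by writing its transform matrix.
const dummy = new THREE.Object3D();
for (let i = 0; i < trees.count; i++) {
  dummy.position.set(Math.random() * 200 - 100, 0, Math.random() * 200 - 100);
  dummy.updateMatrix();
  trees.setMatrixAt(i, dummy.matrix);
}
trees.instanceMatrix.needsUpdate = true; // needed if matrices change after the first render

scene.add(trees); // assumes an existing THREE.Scene called `scene`
```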
Can you share the models you want to display here? We can recommend some optimizations you can do.
Thank you for your insightful response. You are absolutely right, which is why I am now planning to abandon my efforts to offload the rendering burden to the back end. Instead I am putting the focus on optimizing the load on the front end, using formats like glTF as suggested in an earlier post. And I quite like your idea of storing as much of the detail as possible in the texture maps while reducing the number of polygons; this may be the answer for the application that I am making.
And you are right, let me be more specific about the models I’ll be using and better explain the goals of the app.
App and Models
The app will load animated human characters and will allow users to design their own avatar. The focus of the app will be on the costume design of the characters. I am not very concerned about the characters themselves but more about their clothing, because I want to be able to show the materials used with as much clarity as possible, for example being able to differentiate between wool and denim.
Lastly, I am still learning about textures and wrapping them onto different meshes. I am going to create a separate discussion for the specific questions I have about textures, but if you have any resources on putting more detail into the texture (and lessening the rendering load on the client), please let me know; I am very keen on learning more about this.
I would say that you should consider building a custom material. The MeshStandardMaterial does a great job on the plastic-metal spectrum, and it’s OK for things like bricks or wood, but it’s a general-purpose material and won’t look as good as one designed specifically for cloth.
On the other hand, it might be good enough and it’s definitely easier to stick with a pre-made material. You will have to make that call yourself.
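If you do stick with a pre-made material, one possible middle ground (my own suggestion, not something mentioned above) is MeshPhysicalMaterial, which in recent three.js releases exposes sheen parameters that roughly approximate the soft highlight of fabric. A minimal sketch, with placeholder texture paths and values:

```js
import * as THREE from 'three';

const textureLoader = new THREE.TextureLoader();

// Fabric-like look using MeshPhysicalMaterial's sheen parameters
// (texture paths and values are placeholders to be tuned per material).
const clothMaterial = new THREE.MeshPhysicalMaterial({
  map: textureLoader.load('textures/denim_albedo.jpg'),
  normalMap: textureLoader.load('textures/denim_normal.jpg'),
  roughness: 0.9,
  sheen: 1.0,
  sheenColor: new THREE.Color(0x6688aa),
  sheenRoughness: 0.8,
});
```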
if you have any resources for me regarding putting more details into the texture
This really depends on the software you use. If you use Blender it will be very different than if you use Maya, for example. However, this is not really a three.js skill (it’s related but done in another application), so you’ll get better answers if you ask on a forum for the software you will use, or maybe a general game dev forum.
Check out renderfarm.js from resources. It’s still quite far from what you need right now, but it’s a step in that direction.
I think the most time-consuming part of the pipeline is still the ray-traced rendering needed for really high-quality graphics. But that’s been getting better year on year, and it feels like this might be doable in the not-too-distant future.
Check out Lumion. It’s a desktop rendering software used widely in architecture and landscape which does a pretty neat looking real-time render. Unity has real-time raytracing in their preview already. Enscape is another one with nice realtime graphics.
Then comes the part of sending it to the client. If one were to send diffed pixel buffers across the wire, something like WebRTC or sockets might do the job; they’re already used pretty commonly for video.
Sorry to be the optimist here, but this is pretty exciting.
With that point I was referring to the part of getting the rendered view delivered to the client. Once the rendering is done on the server, it needs to be sent to the client in real time. New renders would be generated on every change in the scene, so there would be a continuous stream of renders being sent from the server to the client. Let’s call each render a frame, as the term is commonly used in video.
An approach that’s common in video encoding and streaming is that instead of sending the complete frame, only the pixels that have changed with respect to the previous frame are sent. Tom Scott has a good video explaining this concept in the context of video compression. This reduces the amount of data that needs to be sent and can help improve speed.
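A toy sketch of that idea, assuming two RGBA pixel buffers of the same size (real codecs work on blocks and motion vectors rather than individual pixels):

```js
// Compare the current frame with the previous one and collect only the pixels
// that changed; only this list would be sent over the wire.
function diffFrames(prevPixels, currPixels) {
  const changed = [];
  for (let i = 0; i < currPixels.length; i += 4) { // RGBA, 4 bytes per pixel
    if (prevPixels[i]     !== currPixels[i]     ||
        prevPixels[i + 1] !== currPixels[i + 1] ||
        prevPixels[i + 2] !== currPixels[i + 2] ||
        prevPixels[i + 3] !== currPixels[i + 3]) {
      changed.push({ offset: i, rgba: currPixels.slice(i, i + 4) });
    }
  }
  return changed; // e.g. serialise and push through a WebRTC data channel or WebSocket
}
```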
WebRTC and UDP sockets are just technologies that could be used for sending streams of data and are well suited to this kind of application. You can find a ton of material about them by Googling, and I don’t have any particular reference to recommend. There might be better alternatives as well; these were just the first that came to my mind.
It’s not quite that simple, because you need to receive user input as well as render. The limiting factor is not only how fast you can render a frame (at least as long as you have the budget for servers) or whether every user has a connection fast enough to stream HD video; it’s the latency of the entire round trip of sending a frame and receiving user input.
You can test this by using ping to check how long a round trip to your favorite websites is. Here are some results I got:
I’m not sure how this would work out but I guess you would have to double this value (and probably more) to get the actual latency.
Let’s say you can get 64 ms. That’s equivalent to roughly 15 frames per second. I’m sure there are all kinds of tricks you could use to compress the stream so that the user gets a 60 FPS video but interacts at 15 FPS or whatever.
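As a back-of-the-envelope check of that number:

```js
// A ~64 ms round trip caps interactive updates at roughly 1000 / 64 ≈ 15.6 per
// second, no matter how fast the server itself renders.
const roundTripMs = 64;
const maxInteractiveFps = 1000 / roundTripMs; // ≈ 15.6
```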
However, this is the best-case scenario. I’m on a 50 Mb broadband connection here; the latency of a mobile connection will be higher. Users will find your website laggy and frustrating to use. Perhaps the internet of the future will be fast enough for this, but the internet of the present is not. Rural and mobile users will hate you.
If you read reviews of Google Stadia, nearly all of them have this complaint. Apparently Google refuses to answer any questions related to Stadia latency, but this article says it’s around 200 ms, which is going to feel really laggy.
I’m sure there are a lot more challenges in the pipeline. That was just the gist of one way it could be done.
More than gaming, I am imagining cases like renders or product configurators, where it’s easier to mask the latency or where the finer milliseconds are just not as critical.
Scaling would still be a challenge, but I’m not sure why the risk of a poorer experience should stop anyone from experimenting with the possibility.
This is great! I like the opinions you all have shared. I do agree with @looeee about the latency. You are right that it does come down to the copper wire lengths (or optic fiber in some cases).
I do have a few small questions:
Hi Looeee, I was just reviewing your earlier comment. Could you recommend a forum for general game dev? I’ve found one on Stack Overflow, but I am wondering if there is a better one.
If I understand this correctly, are you suggesting something along the lines of:
Two video streams layered almost on top of each other, where the rendered high-quality stream does not allow any interaction, and the lower-quality one is not visible but is only present to collect the user interaction?
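To put what I’m imagining in code, something like this hypothetical sketch (the element id, server URL, and message format are all made up), where a transparent element sits above the streamed image or video and just forwards normalised pointer coordinates to the render server:

```js
// Transparent overlay above the streamed render; it never draws anything,
// it only captures pointer events and forwards them to the server.
const overlay = document.getElementById('interaction-overlay'); // hypothetical element
const socket = new WebSocket('wss://render-server.example/input'); // hypothetical endpoint

overlay.addEventListener('pointerdown', (event) => {
  const rect = overlay.getBoundingClientRect();
  socket.send(JSON.stringify({
    type: 'pointerdown',
    // Normalised 0..1 coordinates so the server can map them onto its own render size.
    x: (event.clientX - rect.left) / rect.width,
    y: (event.clientY - rect.top) / rect.height,
  }));
});
```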
Could you perhaps point me to any tool that allows capturing user interaction like this? All I can think of is using ffmpeg streams and having some sort of a VNC session to capture the user’s responses.