3D sonar vision demo

A couple of months ago, I got it into my head that it would be cool to make a 3D game with no visuals.

The idea was that the player would navigate dungeons, temples, and labyrinths by means of listening only.

They would have to listen carefully to ambient sounds (such as water dripping, or the sound of leaves under their feet) and make use of an auditory compass and a sonar sensor system that describes shapes to them with sound.

Fast forward a bit and here we are now. I’ve made a simple demo that shows how it could work.

I think the sensors I’ve programmed on top of threeJS internals need to be tweaked for maximum usability, but the basic concept is in there.

Let me know if you manage to make it to the centre of the maze without looking 🙂


http://www.socket-two.com/main/resource/sonar-vision


Wonderful, I quite like the presentation. One thing, though: the “hum” associated with ray-casts is missing for me. I did like the idea of creating a unique sound wave to represent what’s essentially a low-res depth buffer.

The demo runs quite smoothly and controls well. One thing I would ask for, though, is the ability to skip to the next piece of text; it felt a bit too slow for me.

Thanks, Usnul!

You are absolutely right about the text. The frustration caused by a timed script was at the forefront of my mind after I implemented it and played it through a couple of times…but I was being lazy and listened to the evil, complacent half of my brain. But I think you have just clinched it for me…I might implement button-based message navigation now.

It works OK for me on Chrome and Firefox. Regarding the hum…did you press 3 on your keyboard? I could probably do with making that a bit clearer in the interface.

I have pressed 3, 3 times at least.

I’m using the latest Chrome. Everything else listed in your text seems to work fine, except for the hum.

By the way, you might think about casting rays in all directions, with more rays closer to the facing vector. The camera is a visual abstraction for the most part; providing spatial presence via sound might work better if you have a wider context. Just a thought, though.

Also, I think this approach would work way better with many more rays. The 20 or so rays that you use are simply too few, considering that the information they convey is entirely discrete when it comes to depth discontinuities.
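
Roughly, a denser-toward-the-front ray fan might look something like this. Just a sketch - the helper name, ray count, and cubic distribution are made up for illustration, not taken from the demo:

```js
import * as THREE from 'three';

// Hypothetical helper: cast `count` rays in a full circle around the player,
// biased so that more of them land near the facing direction.
function castSonarRays(scene, player, count = 64) {
  const raycaster = new THREE.Raycaster();
  const up = new THREE.Vector3(0, 1, 0);
  const results = [];

  for (let i = 0; i < count; i++) {
    // t goes from -1 to +1 across the fan.
    const t = (i / (count - 1)) * 2 - 1;
    // Cubing t compresses samples toward t = 0 (straight ahead)
    // while the ends still reach +/- 180 degrees (directly behind).
    const angleOffset = Math.pow(t, 3) * Math.PI;

    const direction = new THREE.Vector3(0, 0, -1)
      .applyQuaternion(player.quaternion)   // player's forward vector
      .applyAxisAngle(up, angleOffset);     // swung around the vertical axis

    raycaster.set(player.position, direction);
    const hit = raycaster.intersectObjects(scene.children, true)[0];
    results.push({ angleOffset, distance: hit ? hit.distance : Infinity });
  }

  return results; // each entry could drive one oscillator / sound voice
}
```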

About the “compass” - I think it’s not bad, but perhaps a similar idea can be applied, like having different notes for different directions and lerping between them depending on the current facing angle. I found the distinct clicks too annoying, especially the part where, if you move your cursor ever so slightly in the “click” region, you’ll hear a machine-gun “click-click-click…”.

Overall, for global orientation, I think ambient sounds are the most useful; they are positioned globally, so they offer very good audio landmarks.

I think I’ve found and fixed the issue.

It works well on Linux and Mac, but not on Windows. When Synth.js loads, it tries to access the audio context immediately, but fails because no user gesture has happened yet. Windows is the one behaving correctly here; it’s odd that Mac and Linux are OK with this.
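
For reference, the usual fix is to create or resume the AudioContext from inside a user-gesture handler - something along these lines (the `startAudio` name and the button id are placeholders, not the actual demo code):

```js
let audioCtx = null;

// Browsers that enforce autoplay rules keep a new AudioContext suspended
// until a user gesture, so create/resume it inside a click handler.
function startAudio() {
  if (!audioCtx) {
    audioCtx = new (window.AudioContext || window.webkitAudioContext)();
  }
  if (audioCtx.state === 'suspended') {
    audioCtx.resume();
  }
  return audioCtx;
}

// '#start-button' is a placeholder id for whatever element starts the game.
document.querySelector('#start-button').addEventListener('click', () => {
  const ctx = startAudio();
  // ...hand ctx to the synth / sonar code from here...
});
```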

(As an aside, having been spurred to test my latest projects on Windows, I am now noticing that they run a bit choppily compared to Mac and Linux. I don’t know why that is, but it makes me a sad panda.)

By the way, you might think about casting rays in all directions, with more rays closer to the facing vector. The camera is a visual abstraction for the most part; providing spatial presence via sound might work better if you have a wider context. Just a thought, though.

That sounds good. Having a limited FOV is a constraint imposed by normal vision, so if I am building a system that is not like normal vision, why not dispense with it! But on the other hand, I suppose you could argue that the human brain is better adapted to receiving spatial information from a FOV that is less than 180 degrees.

And, like a spell to summon a dragon, the rays are expensive to cast, no? Casting more than 20 or so of them 60 times a second… :-/

But I do wonder…

In order to render faces, ThreeJS and WebGL work out the distance between the camera and practically infinite points on a mesh, at high resolution. I wonder if it’s possible to utilise whatever methodology is used there to create a depth sensor that is more like a continuous wave, as opposed to an array of discrete, probing rays.

If it were possible, I suppose the natural progression of that would maybe be to implement it as a custom ThreeJS renderer. It would be very cool to have a general-purpose “Click here to view scene with audio” button that could be added to any ThreeJS project for improved web accessibility.

But I am not profoundly knowledgeable of how 3D graphics work at a low level (or ThreeJS itself, for that matter) so this may be a bit of a quixotic idea.

About the “compass” - I think it’s not bad, but perhaps a similar idea can be applied, like having different notes for different directions and lerping between them depending on the current facing angle. I found the distinct clicks too annoying, especially the part where, if you move your cursor ever so slightly in the “click” region, you’ll hear a machine-gun “click-click-click…”.

The compass has a different sound for each cardinal direction (N, S, E, W). But you are right - it makes sense that this could be adjusted so that we also have a unique sound for each of the 7 x 4 interstitial points too.
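
As a rough sketch of the lerping idea (purely illustrative - the pitches and helper names are not from the demo), something like this with Web Audio oscillators could crossfade between the cardinal sounds based on the facing angle:

```js
// Four voices for N, E, S, W at headings 0, 90, 180 and 270 degrees.
const NOTES = [261.63, 329.63, 392.0, 523.25]; // example pitches only

function createCompass(audioCtx) {
  const voices = NOTES.map((freq) => {
    const osc = audioCtx.createOscillator();
    const gain = audioCtx.createGain();
    osc.frequency.value = freq;
    gain.gain.value = 0;
    osc.connect(gain).connect(audioCtx.destination);
    osc.start();
    return gain;
  });

  // Call every frame with the player's heading in radians.
  return function update(heading) {
    for (let i = 0; i < voices.length; i++) {
      const target = i * Math.PI / 2;
      // Shortest angular distance between the heading and this direction.
      let diff = Math.abs(heading - target) % (2 * Math.PI);
      if (diff > Math.PI) diff = 2 * Math.PI - diff;
      // Full volume when facing the direction, silent 90 degrees away.
      const level = Math.max(0, 1 - diff / (Math.PI / 2));
      voices[i].gain.setTargetAtTime(level, audioCtx.currentTime, 0.05);
    }
  };
}
```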

I think that the sense of irritation caused by repetitive sound is hard to avoid with systems such as this (or screen readers, for that matter) - they do have to emit a stream of information that is both broad and continuous if they are to do a reasonable job of filling in for eyesight.

I think that making the audio enjoyable, and not irritating, is probably a question of sitting down and doing some killer sound design.

This is not true, I think. Our ears are on the sides of our head - at least that’s the case for me; I wouldn’t dare to presume what that’s like for others. Nor do I condemn those who have ears in other locations - I’m not earist, honest.

Sound is received more uniformly from all around you, with some caveats, such as your shoulders blocking and reflecting sound, and your ears being pointed slightly forward.

That can be true, although with a spatial index it would be manageable. However, given the uniform nature of your use case, it’s possible to simply render the scene to a low-resolution depth buffer and then read the depth values back from it. That would be fairly fast: a few milliseconds at most to render the buffer, and then a fraction of that to analyse it and produce the sound signature.
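
As a sketch of what I mean (the 16×16 size and the names are arbitrary, not tested against your project), something along these lines with MeshDepthMaterial and readRenderTargetPixels:

```js
import * as THREE from 'three';

const SIZE = 16; // deliberately tiny "sonar" resolution
const depthTarget = new THREE.WebGLRenderTarget(SIZE, SIZE);
const depthMaterial = new THREE.MeshDepthMaterial();
const pixels = new Uint8Array(SIZE * SIZE * 4);

// Render every mesh as its depth, then read the tiny image back to the CPU
// so it can drive the sound signature.
function readDepth(renderer, scene, camera) {
  const previousOverride = scene.overrideMaterial;
  scene.overrideMaterial = depthMaterial;

  renderer.setRenderTarget(depthTarget);
  renderer.render(scene, camera);
  renderer.setRenderTarget(null);
  renderer.readRenderTargetPixels(depthTarget, 0, 0, SIZE, SIZE, pixels);

  scene.overrideMaterial = previousOverride;

  // With the default (basic) depth packing the red channel is a rough
  // 0-255 depth value: near surfaces bright, far surfaces dark.
  const depths = [];
  for (let i = 0; i < pixels.length; i += 4) {
    depths.push(pixels[i] / 255);
  }
  return depths; // one coarse depth sample per pixel of the sonar image
}
```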

Our ears are on the sides of our head, at least that’s the case for me, I wouldn’t dare to presume what that’s like for others.

We are all different, that’s for sure! But I think it is fair to say that, broadly speaking, the human brain is adapted to more readily process auditory information that represents an actual sound source (volume, frequency) than information originating from a visual source (light) that has been converted into audio. And that’s regardless of where your ears are!

But I believe this doesn’t necessarily mean it’s a bad idea to cast rays in all directions, converting distance to the pan, pitch and volume of oscillators. In fact, I think it’s a great idea! All I’m saying is that it could have a downside: that configuration of sensors would provide users with more information about their surroundings, but people may find it more challenging to interpret.
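
For illustration, the distance-to-sound mapping might be something like this (the frequency and gain ranges are guesses, not tuned values from the demo):

```js
// Turn one ray's result into a voice: direction -> stereo pan,
// distance -> pitch and volume. Numbers are purely illustrative.
function rayToVoice(audioCtx, angleOffset, distance, maxRange = 20) {
  const osc = audioCtx.createOscillator();
  const gain = audioCtx.createGain();
  const panner = audioCtx.createStereoPanner();

  // 1 = touching the wall, 0 = at or beyond maxRange.
  const nearness = 1 - Math.min(distance, maxRange) / maxRange;

  panner.pan.value = Math.sin(angleOffset); // left/right of the facing direction
  osc.frequency.value = 110 + 660 * nearness; // closer surfaces sound higher...
  gain.gain.value = 0.02 + 0.08 * nearness;   // ...and louder

  osc.connect(gain).connect(panner).connect(audioCtx.destination);
  osc.start();
  return osc;
}
```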

I’m not familiar with depth buffers. What does depth buffer data look like?

hmm… here’s a useful link, I hope:

https://threejs.org/docs/#api/en/materials/MeshDepthMaterial

In short - it’s a monochrome image where each pixel represents the depth from the camera’s near plane to the nearest fragment in the scene.