Is there anything in JS to detect surfaces, like what ARKit or ARCore do?

I think I’ve read somewhere that Google’s ARCore uses the cellphone’s depth camera to figure out …depth, and if that’s true, then it’s sort of cheating: even my grandma could do it with ready-made depth as input.

This works by having an infrared source project a pattern, e.g. a grid of dots, onto the environment; the depth camera then observes those dots and analyzes how the pattern is distorted by depth differences.
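To make the geometry concrete, here is a minimal sketch of the triangulation behind it. The focal length and baseline values are made up for illustration; the real ones come from the sensor’s factory calibration.

```js
// Structured-light depth from dot disparity (hypothetical calibration values).
// Depth follows the stereo triangulation relation: Z = f * b / d.
const focalPx = 580;      // camera focal length in pixels (assumed)
const baselineM = 0.075;  // projector-to-camera baseline in meters (assumed)

function depthFromDisparity(disparityPx) {
  if (disparityPx <= 0) return Infinity; // dot not displaced: effectively "at infinity"
  return (focalPx * baselineM) / disparityPx; // depth in meters
}

// A dot shifted by 15 px sits roughly 2.9 m away with these numbers.
console.log(depthFromDisparity(15).toFixed(2));
```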

I’ve also read that we don’t have access to the depth camera via WebGL / WebXR, so Google still has the advantage. Given ARCore’s poor performance though, they haven’t achieved anything remarkable, so it leaves a lot of room for creativity and competition, and I think it can be surpassed, even without that dumb camera.

That said, I’ve noticed 2 serious limitations:

  1. lack of camera frame rate control (it is auto-determined by the lighting),
  2. camera lag.

Both limitations can only be explained by the common hardware and software plague: sloppiness.

“IF” the phone even has a depth camera though, right? RGBD, basically two lenses (or… it can also just be two pinholes over the CCD, but that's not very common),
but yes, an IR pattern thrown over the scene in that case.

I think you're right though (to be honest, I've only witnessed ARCore in action for less than 4 seconds in my life). From what I saw, it was almost flawless motion-matching of objects into the scene…
(I'm not surprised we can't access such a camera, honestly. Exactly like the Kinect over the years.)
But yeah, it seems like after Project Tango laid out its fundamentals, that was about it. Really no improvements. I don't know if I want to blame Johnny Lee for that, or Regina Dugan.
Interestingly, Tango and Lee "graduated" out from her ATAP research division in 2015, and she left less than a year later.

And that's when its growth stopped, it seems.
The only improvements I would predict would be BLE triangulation, GPS 2.0, and 5G beamforming additions.

And that's an excellent point: FRAME RATES. I need to study more aspects of that.

But it appears…

ARCore attempts to give frame rate control, but usually only if forced by an app, and…
certain ARCore-supported devices require internal buffering to operate correctly.
Then consider that it's tracking points PLUS doing object detection…
and adjusting for errors every so often. With a 24/30 fps camera?
60 fps seems like the only way.

But yeah, using OpenCV does allow you to control a lot more regarding FPS and these functions.

I wouldn't quite say sloppiness though, just a refusal to allow compromises.
Like they wanted to be able to report an increase in the number of devices capable of running ARCore, even if those devices just shouldn't.

Still, without 60 fps, I'm sure this is hugely frustrating.

(Oh, I heard about some company named… Apple? And ARKit, they're using time-of-flight IR depth imaging. Like they're pulsing IR signals and actually timing the return flight time of each pulse? Very fast FPS? Or just using all their lenses to share timing offsets to time that return fast enough?

Oh… yeah, like the unique identifying pulses for cars with lidar.)
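For what it's worth, the time-of-flight idea reduces to one line of arithmetic, since the pulse covers twice the depth at the speed of light. A toy sketch with illustrative numbers (not anyone's actual sensor specs):

```js
// Time-of-flight depth: the IR pulse covers 2 * depth at the speed of light.
const SPEED_OF_LIGHT = 299_792_458; // m/s

function depthFromRoundTrip(roundTripSeconds) {
  return (SPEED_OF_LIGHT * roundTripSeconds) / 2; // meters
}

// A surface ~1.5 m away returns the pulse after ~10 nanoseconds,
// which is why ToF sensors need extremely fine timing precision.
console.log(depthFromRoundTrip(10e-9).toFixed(3)); // ≈ 1.499 m
```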

The fact that giants like Google and Apple (although Google cannot compare to a manufacturer, let alone to Apple’s creative history) resort to infrared scanning devices on Android and iPhone to detect depth shows how extremely primitive the science of robotic vision still is.

Most phones that are powerful enough for a decent 3D app or game have a depth camera (but we can’t use it anyway via web apps, so no need to bother with that anymore).

I wouldn't quite say sloppiness though, just a refusal to allow compromises.

It’s very wrong to assume what another developer can or cannot do. The phone developers think they have thought of all the possible cases, that they are …wiser and know better, and then they place ridiculous limits on software developers! That’s superficial judgment in my book, aka sloppiness.

The fact that they don’t allow control of the FPS is a serious obstacle to motion-based robotic vision, because a stable frame rate is far more important than the image darkness and noise that those “geniuses” prioritized.
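For completeness, the web side does let you ask for a frame rate via a getUserMedia constraint, but it is only a hint and the driver can still drop below it in low light. A sketch of requesting ~30 fps and checking what was actually granted:

```js
// Ask the camera for a steady 30 fps; the browser/driver treats this as a hint
// and may still deliver less in low light.
async function openCamera() {
  const stream = await navigator.mediaDevices.getUserMedia({
    video: {
      facingMode: 'environment',
      frameRate: { ideal: 30 }, // use { exact: 30 } to fail instead of degrading
    },
  });
  const [track] = stream.getVideoTracks();
  // Report what the camera actually settled on.
  console.log('granted frameRate:', track.getSettings().frameRate);
  return stream;
}
```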

Still, without 60 fps, I'm sure this is hugely frustrating.

Although 60 fps would be ideal, in my testing 30 fps is enough, provided it is stable.
It would also allow for heavier processing of the video stream (the app/game should run at 60 fps), and I would be very happy if that were the case, but I only get an unstable frame rate with lots of hiccups that is below 30 fps most of the time (I’m guessing around 15-30), except in extremely bright light conditions like outdoors, or when looking up close at a bright monitor (filling the whole camera frame) from a small tripod.
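To put numbers on the hiccups, the gaps between delivered frames can be logged with requestVideoFrameCallback (where the browser supports it). A rough sketch, assuming `video` is the &lt;video&gt; element playing the getUserMedia stream:

```js
// Measure the delivered camera frame rate and its jitter.
function measureFps(video, sampleCount = 120) {
  const deltas = [];
  let last = null;

  function onFrame(now, metadata) {
    if (last !== null) deltas.push(metadata.mediaTime - last); // seconds between frames
    last = metadata.mediaTime;
    if (deltas.length < sampleCount) {
      video.requestVideoFrameCallback(onFrame);
    } else {
      const mean = deltas.reduce((a, b) => a + b, 0) / deltas.length;
      const worst = Math.max(...deltas);
      console.log(
        `average ≈ ${(1 / mean).toFixed(1)} fps, worst gap ≈ ${(worst * 1000).toFixed(0)} ms`
      );
    }
  }

  video.requestVideoFrameCallback(onFrame);
}
```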

Add to this the camera lag (not one, but several frames behind), which is evidence of not just failing to optimize, but of not even designing for performance, and thus seriously limits the development possibilities. Poor design is also sloppiness in my book.

Buffering camera frames seems interesting (thanks), but it won’t solve the problem of auto-fps control…

But yeah, using OpenCV does allow you to control a lot more regarding FPS and these functions.

That’s C++, so nothing we can use in WebGL/XR.

@dllb OpenCV has a guide on compiling to WebAssembly: OpenCV: Build OpenCV.js
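For reference, once opencv.js has loaded (prebuilt or compiled via that guide), it runs entirely in the browser. A minimal sketch of edge detection on a frame drawn to a canvas, assuming the global `cv` module has finished initializing:

```js
// Canny edge detection on a video frame with opencv.js (WebAssembly build).
// Assumes <script src="opencv.js"> has loaded and cv is ready.
function detectEdges(canvas) {
  const src = cv.imread(canvas);        // RGBA frame from the canvas
  const gray = new cv.Mat();
  const edges = new cv.Mat();
  cv.cvtColor(src, gray, cv.COLOR_RGBA2GRAY);
  cv.Canny(gray, edges, 50, 150);       // thresholds picked arbitrarily, tune as needed
  cv.imshow(canvas, edges);             // draw the result back for inspection
  src.delete(); gray.delete(); edges.delete(); // opencv.js Mats must be freed manually
}
```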


Interesting, thanks!

Hi, new here but have been referred to this conversation by @ThorstenBux.

While by no means a computer vision expert, I am interested in this, primarily to enhance location-based (i.e. using GPS and sensors) AR by detecting surfaces and thus allowing more realistic placement of objects so that they actually appear on the ground. My specific use case is outdoor AR for navigation for walkers and hikers.

I have been working with Thorsten a little on this, and have investigated the PTAM library which he has suggested, but so far I am encountering problems with using it from Emscripten, which are not yet solved.

Thanks to this thread, Thorsten’s advice and a bit of research, I have some idea of the general procedure needed. We need point or edge detection, which seem to be fairly well-established algorithms and can be done from JS libraries such as tracking.js or JSFeat. Once we have collections of points, we can then detect edges or planes (again, there seem to be well-defined algorithms to do this).
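As a concrete starting point for the point-detection step, here is a rough sketch with JSFeat’s FAST corner detector; the threshold and border values are arbitrary starting points to tune:

```js
// FAST corner detection on a camera frame with JSFeat.
// Assumes `ctx` is the 2D context of a canvas the video frame was drawn into.
function detectCorners(ctx, width, height) {
  const imageData = ctx.getImageData(0, 0, width, height);

  // Convert the RGBA frame to a single-channel grayscale matrix.
  const gray = new jsfeat.matrix_t(width, height, jsfeat.U8_t | jsfeat.C1_t);
  jsfeat.imgproc.grayscale(imageData.data, width, height, gray);

  // Preallocate keypoints, as in the JSFeat samples.
  const corners = [];
  for (let i = 0; i < width * height; i++) {
    corners[i] = new jsfeat.keypoint_t(0, 0, 0, 0, -1);
  }

  jsfeat.fast_corners.set_threshold(20);                 // lower = more (noisier) corners
  const count = jsfeat.fast_corners.detect(gray, corners, 5); // 5 px border margin
  return corners.slice(0, count);                        // keypoints in pixel coordinates
}
```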

The difficulty is converting the 2D points to 3D coordinates, ready for rendering (for example with three.js). For this a camera pose estimation is needed, e.g. using solvePnP. However, to obtain this (from my rudimentary knowledge of this area) a calibration step is needed using known correspondences of 2D and 3D points (e.g. a paper marker of specific dimensions at a given distance from the camera).

This (i.e. 2D-to-3D point conversion) is the main stumbling block, I think. If the 2D-to-3D conversion is solved, the rest of it, while not easy, at least looks well-defined and there are plenty of algorithms available to help us. PTAM looks like one possible approach, but does anyone have any other suggestions?
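One pragmatic shortcut while the solvePnP/calibration route is worked out (a simplification, not a substitute for proper pose estimation): if the device orientation sensors already drive the camera rotation and the ground is assumed to be a flat plane at roughly eye height, a 2D screen point can be unprojected to a ray and intersected with that plane. A sketch with three.js, where the 1.6 m camera height is a guess rather than a measured value:

```js
import * as THREE from 'three';

// Assumptions: `camera` is a THREE.PerspectiveCamera whose orientation is driven by
// the device sensors, and the ground is a horizontal plane ~1.6 m below the camera.
const ground = new THREE.Plane(new THREE.Vector3(0, 1, 0), 1.6); // plane y = -1.6
const raycaster = new THREE.Raycaster();

function screenPointToGround(xPx, yPx, camera, viewportW, viewportH) {
  // Pixel coordinates -> normalized device coordinates in [-1, 1].
  const ndc = new THREE.Vector2(
    (xPx / viewportW) * 2 - 1,
    -(yPx / viewportH) * 2 + 1
  );
  raycaster.setFromCamera(ndc, camera);
  const hit = new THREE.Vector3();
  // Returns null when the ray points above the horizon (no ground intersection).
  return raycaster.ray.intersectPlane(ground, hit);
}
```

For the hiking use case the flat plane could later be swapped for a terrain model built from elevation data, but this version is enough to sanity-check the projection math.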

Personally, for myself, I was planning on finding edges first…
then trying to assume shapes among the edges on screen, then pinning tracking points in the most sensible places.

And as far as point tracking, yes, definitely JSFeat.

As far as edge detection, hands down I highly recommend GammaCV (but I think that project's dead).

It's extremely fast.
Oh, and they've got 'PC lines'. That seems like a perfect combo for planar detection.
(And stroke width transform? Interesting.)

Main point: look into GammaCV.
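If it helps, here is roughly what that combo looks like as a GammaCV pipeline. This is a sketch from memory of its docs (operation names and arguments should be double-checked), assuming a 640×480 RGBA camera frame is copied into the input tensor each frame:

```js
import * as gm from 'gammacv';

// Build the GPU pipeline once: grayscale -> blur -> Canny edges -> PC-lines.
const width = 640;
const height = 480;
const input = new gm.Tensor('uint8', [height, width, 4]); // RGBA camera frame

let op = gm.grayscale(input);
op = gm.gaussianBlur(op, 5, 3);
op = gm.cannyEdges(op, 0.25, 0.75);
op = gm.pcLines(op); // line candidates to feed a planar-detection step

const session = new gm.Session();
session.init(op);
const output = gm.tensorFrom(op);

// Per frame: copy the latest camera image into `input`, then run the pipeline.
function processFrame(frameIndex) {
  session.runOp(op, frameIndex, output);
  // `output` now holds the detected line parameters.
}
```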

Calibration step for correspondences of 2D and 3D points…
Yeah, I myself am far from anything but a (very) determined noob.
But that being said, my guess is…

  1. Knowing what the application is intended for and where it's to be used will help answer that question. Take, for example, self-driving cars: they tend to almost always look down roads towards the horizon, so some amount of calibration is done with the horizon and the left and right curbs (and accompanying strips) all meeting up in the center.
  2. The first thing that came to my mind was watching my friend play with ARCore on his phone today. It had him wiggle his phone around to calibrate… but I noticed that the model it showed, no matter how many times he zoomed into it, would always fix itself into the corner of the room (and about 30 ft away despite it being only 5).

I think it's just finding the two furthest planes and a "ground truth", and assuming each is 90° from the other.

(Oh, you said outdoor hiking. I'll have to think about that.)
What about calibrating with the user's hand outstretched, palm away from the camera?
It may not be perfectly sized, but I mean, will that make a huge difference?
Besides having them stop and measure their hand, you might ask questions about height, weight, age, gender (if it's fitness related) and narrow down the dimensions of the calibration hand model.
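For what it's worth, the reason a roughly-known reference (a palm, a sheet of paper) helps is the pinhole relation between real size, apparent size in pixels, and distance. A toy sketch, where the palm width is nothing more than a guessed average:

```js
// Estimate distance to a reference object of (approximately) known real width.
// Pinhole model: distance = focalLengthPx * realWidth / apparentWidthPx.
const ASSUMED_PALM_WIDTH_M = 0.085; // rough average adult palm width, just a guess

function distanceToHand(focalLengthPx, palmWidthPx) {
  return (focalLengthPx * ASSUMED_PALM_WIDTH_M) / palmWidthPx; // meters
}

// A palm ~200 px wide seen by a camera with a 1000 px focal length is ~0.43 m away,
// which is in the right ballpark for an outstretched arm.
console.log(distanceToHand(1000, 200).toFixed(2));
```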

Just guessing, throwing out ideas.