Is there anything in JS to detect surfaces, like what ARKit or ARCore do?

I think I’ve read somewhere that Google’s ARCore uses the cellphone’s depth camera to figure out… depth, and if that’s true, then it’s sort of cheating; even my grandma could do it with ready-made depth as input.

This works by having an infrared source project a pattern, e.g. a grid of dots, onto the environment; the depth camera then sees and analyzes those patterns, which are distorted by depth differences.
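The principle boils down to triangulation. A toy sketch with made-up numbers (the focal length, baseline, and disparity below are illustrative, not real hardware specs):

```js
// Structured-light depth, reduced to its triangulation core.
// f and b are made-up values for illustration, not real hardware specs.
const f = 580;   // camera focal length in pixels
const b = 0.075; // projector-to-camera baseline in metres

// Depth from the pixel disparity of one projected dot: the closer the
// surface, the more the dot appears shifted from its expected position.
function depthFromDisparity(disparityPx) {
  return (f * b) / disparityPx;
}

console.log(depthFromDisparity(29)); // ≈ 1.5 m with these made-up numbers
```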

I’ve also read that we don’t have access to the depth camera via WebGL/WebXR, so Google still has the advantage. Given ARCore’s poor performance, though, they haven’t achieved anything remarkable, so there is a lot of room for creativity and competition, and I think it can be surpassed, even without that dumb camera.

That said, I’ve noticed 2 serious limitations:

  1. lack of camera frame-rate control (it is auto-determined by the lighting),
  2. camera lag.

Both limitations can only be explained by the common hardware and software plague: sloppiness.

“IF” the phone even has a depth camera, though, right? RGBD is basically two lenses (or… it can also just be two pinholes over the CCD, but that’s not very common).
But yes, an IR pattern thrown over the scene in that case.

I think you’re right though (to be honest, I’ve only witnessed ARCore in action for less than 4 seconds in my life). From what I saw, it was almost flawless motion-matching of objects into the scene…
(I’m not surprised we can’t access such a camera, honestly. Exactly like the Kinect over the years.)
But yeah, it seems like after Project Tango laid out its fundamentals, that was about it. Really no improvements. I don’t know if I want to blame Johnny Lee for that, or Regina Dugan.
Interestingly, Tango and Lee “graduated” out of her ATAP research division in 2015, and she left less than a year later.

And that’s when its growth stopped, it seems.
The only improvements I would predict are BLE triangulation, GPS 2.0, and 5G beamforming additions.

And that’s an excellent point: FRAME RATES. I need to study more aspects of that.

But it appears…

ARCore attempts to give frame-rate control, but usually only if forced by an app, and…
certain ARCore-supported devices require internal buffering to operate correctly.
Then consider that it’s tracking points PLUS doing object detection…
and adjusting for errors every so often. With a 24/30 fps camera?
60 fps seems like the only way.

But yeah, using OpenCV does allow you to control a lot more regarding FPS and these functions.

I wouldn’t quite say sloppiness though, just a refusal to allow compromises.
Like they wanted to be able to report an increase in the number of devices capable of running ARCore, even if those devices just shouldn’t run it.

Still, without 60 fps, I’m sure this is hugely frustrating.

(Oh, I heard about some company named… Apple? And ARKit? They’re using time-of-flight IR depth imaging. Like, they’re pulsing IR signals and actually timing the return flight of each pulse? Very fast FPS? Or just using all their lenses to share timing offsets, to time that return fast enough?

Oh… yeah, like the uniquely identifying pulses for cars with lidar.)
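The timing math itself is trivial; the hard part is hardware fast enough to measure nanoseconds. A sketch (numbers for illustration only):

```js
// Time-of-flight: depth = (speed of light × round-trip time) / 2.
const C = 299792458; // speed of light, m/s

function tofDepth(roundTripSeconds) {
  return (C * roundTripSeconds) / 2;
}

// A surface 1 m away reflects the pulse back in about 6.7 nanoseconds,
// which is why this needs dedicated sensor hardware, not camera FPS:
console.log(tofDepth(6.67e-9)); // ≈ 1 m
```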

The fact that giants like Google and Apple (albeit Google cannot compare to a manufacturer, let alone to Apple’s creative history) use infrared scanning devices on Android and iPhone to detect depth shows how extremely primitive the science of robotic vision still is.

Most phones that are powerful enough for a decent 3D app or game have a depth camera (but we can’t use it via web apps anyway, so no need to bother with that anymore).

> I wouldn’t quite say sloppiness though, just a refusal to allow compromises.

It’s very wrong to assume what another developer can or cannot do. The phone developers think they have thought of all the possible cases, that they are… wiser and know better, and then… place ridiculous limits on software developers! That’s superficial judgment in my book, a.k.a. sloppiness.

The fact that they don’t allow control of FPS is a serious obstacle to motion-based robotic vision, because a stable frame rate is far more important than the image darkness and noise levels that those “geniuses” prioritized.

> Still, without 60 fps, I’m sure this is hugely frustrating.

Although 60 fps would be ideal, in my testing 30 fps is enough, provided it is stable.
That would also allow heavier processing of the video stream (the app/game itself should run at 60 fps), and I would be very happy if that were the case. But I only get an unstable frame rate with lots of hiccups, below 30 fps most of the time (I’m guessing around 15-30), except in extremely bright conditions like outdoors, or when looking up close at a bright monitor (filling the whole camera frame) from a small tripod.
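For what it’s worth, the browser does let you *request* a frame rate through getUserMedia constraints; in my experience the camera stack is free to ignore the hint (which is exactly the complaint), but it’s worth trying:

```js
// Request a fixed-ish frame rate from the camera.
const stream = await navigator.mediaDevices.getUserMedia({
  video: {
    width: { ideal: 640 },
    height: { ideal: 480 },
    // 'ideal' is only a hint; { exact: 30 } would instead throw an
    // OverconstrainedError if the camera can't guarantee it.
    frameRate: { ideal: 30 }
  }
});

const track = stream.getVideoTracks()[0];
console.log(track.getSettings().frameRate); // what we actually got
```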

Add to this the camera lag: not one, but several frames behind. That is evidence not just of failing to optimize, but of not even designing for performance, thus seriously limiting the development possibilities. Poor design is also sloppiness in my book.

Buffering camera frames seems interesting (thanks), but it won’t solve the problem of auto-FPS control…

> But yeah, using OpenCV does allow you to control a lot more regarding FPS and these functions.

That’s C++, so nothing we can use in WebGL/XR.

@dllb OpenCV has a guide on compiling to WebAssembly: OpenCV: Build OpenCV.js
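Once opencv.js is loaded (prebuilt, or compiled via that guide), the calls look roughly like this; `canvasInput`/`canvasOutput` are placeholder canvas ids:

```js
// Assuming opencv.js has loaded and <canvas id="canvasInput"> holds a frame.
const src = cv.imread('canvasInput');
const edges = new cv.Mat();

cv.cvtColor(src, src, cv.COLOR_RGBA2GRAY, 0);
cv.Canny(src, edges, 50, 150); // low/high thresholds; tune per scene

cv.imshow('canvasOutput', edges); // draw result to another canvas
src.delete(); edges.delete();     // OpenCV.js memory is manual
```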


Interesting, thanks!

Hi, new here but have been referred to this conversation by @ThorstenBux.

While by no means a computer vision expert, I am interested in this, primarily to enhance location-based AR (i.e. using GPS and sensors) by detecting surfaces, allowing more realistic placement of objects so that they actually appear on the ground. My specific use case is outdoor AR navigation for walkers and hikers.

I have been working with Thorsten a little on this, and have investigated the PTAM library he suggested, but so far I am encountering problems using it from Emscripten, which are not yet solved.

Thanks to this thread, Thorsten’s advice and a bit of research, I have some idea of the general procedure needed. We need point or edge detection, which seem to be fairly well-established algorithms and can be done with JS libraries such as tracking.js or JSFeat. Once we have collections of points, we can then detect edges or planes (again, there seem to be well-defined algorithms to do this).
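As a rough sketch of the point-detection step with JSFeat (FAST corners on a grayscale frame; I am going from the JSFeat samples, so the exact option values should be checked against its docs), assuming a playing `<video>` element and a 2D canvas context:

```js
// FAST corner detection on one video frame with JSFeat.
// 'video' is a playing <video>, 'ctx' a 2D context of a same-size canvas.
const w = video.videoWidth, h = video.videoHeight;
ctx.drawImage(video, 0, 0, w, h);
const rgba = ctx.getImageData(0, 0, w, h);

const gray = new jsfeat.matrix_t(w, h, jsfeat.U8_t | jsfeat.C1_t);
jsfeat.imgproc.grayscale(rgba.data, w, h, gray);

// Pre-allocate keypoints, then detect.
const corners = [];
for (let i = 0; i < w * h; i++) corners[i] = new jsfeat.keypoint_t(0, 0, 0, 0);
jsfeat.fast_corners.set_threshold(20);                      // sensitivity, tune it
const count = jsfeat.fast_corners.detect(gray, corners, 5); // 5 px border
console.log('corners found:', count);
```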

The difficulty is converting the 2D points to 3D coordinates, ready for rendering (for example with three.js). For this, a camera pose estimation is needed, e.g. using solvePnP. However, to obtain this (from my rudimentary knowledge of the area) a calibration step is needed, using known correspondences of 2D and 3D points (e.g. a paper marker of specific dimensions at a given distance from the camera).
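To illustrate why the calibration is unavoidable: the pinhole model maps 3D to 2D through the camera intrinsics, and going back from 2D only yields a ray, not a point. A toy version with made-up intrinsics:

```js
// Pinhole model with made-up intrinsics: fx, fy = focal lengths (px),
// (cx, cy) = principal point. These are what calibration recovers.
const fx = 600, fy = 600, cx = 320, cy = 240;

// 3D point in camera space -> 2D pixel
function project([X, Y, Z]) {
  return [fx * (X / Z) + cx, fy * (Y / Z) + cy];
}

// 2D pixel -> 3D ray (depth is lost; recovering it is the whole problem)
function backProject([u, v]) {
  return [(u - cx) / fx, (v - cy) / fy, 1];
}

console.log(project([0.5, 0.2, 2])); // [470, 300]
```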

This (i.e. 2D-to-3D point conversion) is the main stumbling block, I think. If the 2D-to-3D conversion is solved, the rest of it, while not easy, at least looks well-defined, and there are plenty of algorithms available to help us. PTAM looks like one possible approach, but does anyone have any other suggestions?

Personally, I was planning on finding edges first…
then trying to infer shapes among the edges on screen, then pinning tracking points in the most sensible places.

And as far as point tracking, yes, definitely JSFeat.

As far as edge detection, hands down I highly recommend GammaCV (though I think that project’s dead).

It’s extremely fast.
Oh, and they’ve got ‘PC lines’; that seems like a perfect combo for planar detection.
(And stroke width transform? Interesting.)

Main point: look into GammaCV.
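From memory of the GammaCV README, everything runs as a GPU op graph, which is where the speed comes from; something like this (op names and signatures should be double-checked against the current docs):

```js
// GammaCV sketch: build an op graph, then run it per frame.
// (From memory of the README; verify names against the current docs.)
const input = await gm.imageTensorFromURL('frame.png', 'uint8', [480, 640, 4]);
const op = gm.grayscale(input); // gm.pcLines would chain on in the same way

const sess = new gm.Session();
sess.init(op);
const output = gm.tensorFrom(op);
sess.runOp(op, 0, output); // 0 = frame index
gm.canvasFromTensor(document.getElementById('out'), output);
```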

A calibration step for correspondences of 2D and 3D points…
Yeah, I myself am far from anything but a (very) determined noob.
But that being said, my guess is…

  1. Knowing what the application is intended for, and where it’s to be used, will help answer that question. Take self-driving cars, for example: they tend to almost always look down roads towards the horizon, so some amount of calibration is done with the horizon and the left and right curbs (and accompanying strips) all meeting up in the centre.
  2. The first thing that came to my mind was watching my friend play with ARCore on his phone today. It had him wiggle his phone around to calibrate… but I noticed that the model it showed, no matter how many times he zoomed into it, would always fix itself into the corner of the room (and about 30 ft away, despite it being only 5).

I think it’s just finding the two furthest planes and a “ground truth”, and assuming each is 90° from the others.

(Oh, you said outdoor hiking. I’ll have to think about that.
…Calibrate with the user’s hand outstretched, palm away from the camera!?
It may not be perfectly sized, but I mean, will that make a huge difference?
Besides having them stop and measure their hand, you might ask questions about height, weight, age, gender (if it’s fitness-related) and narrow down the dimensions of the calibration hand model.

Just guessing, throwing out ideas.)

I’m sooo sorry for my disappearance, but I had numerous large setbacks that I absolutely was not expecting, and other issues popping up around the house. But I’ve been diligently chipping away at the project, with some great results.
Overall, through all the available tools I’ve encountered, I do believe JSFeat is all that’s needed to get the job done, and surprisingly efficiently (I feel a small glitch in my code might have helped in that regard).

I was really hoping GammaCV’s PC lines were going to be very useful here, but I’m having problems with it running smoothly on Android, and there’s little material on it that I can find. If anyone has a solution to that, I’d be extremely thankful.

Pretty sure I’ve got 6DoF and some structure from motion established, and now I’m just attaching it all into three.js.
(I’d really like some fast Hough lines though, to lock down planar tracking in the SfM portion.)
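If opencv.js (mentioned above) were an acceptable dependency, its probabilistic Hough transform is one candidate for those fast lines. A rough sketch, with `canvasInput` as a placeholder id and thresholds that would need tuning:

```js
// Probabilistic Hough lines on an edge image with opencv.js.
const src = cv.imread('canvasInput');
cv.cvtColor(src, src, cv.COLOR_RGBA2GRAY, 0);
cv.Canny(src, src, 50, 150);

const lines = new cv.Mat();
// rho = 1 px, theta = 1°, 50 votes, min length 30 px, max gap 10 px
cv.HoughLinesP(src, lines, 1, Math.PI / 180, 50, 30, 10);

for (let i = 0; i < lines.rows; i++) {
  const [x1, y1, x2, y2] = lines.data32S.slice(i * 4, i * 4 + 4);
  // ...feed the segments into plane fitting
}
src.delete(); lines.delete();
```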

I apologize again for the wait.

I’m still new to this, but can I detect a plane or surface in a single image using hit test? I want to input an image and detect the plane or surface in that image (same as markerless), just like in this video, but only on a website: https://www.youtube.com/watch?v=aliy3qNiGHo&feature=youtu.be

Your response will be very helpful.

So yeah, stumbled on this thread looking at all the same stuff.

I’ve been thinking of having a bash at this myself. First I thought about Canny edge detection too, but looking into it more, I think I may have another idea.

  1. Using the gyro on mobile, it should be fairly easy (famous last words) to get pitch/yaw/roll for the camera within three.js; see the sketch after this list.

  2. For X and Z (left/right/up/down), I’m thinking you could use basic object detection like ml5.js (COCO) to pick out everyday items in the area. ml5 places a bounding box around each object and gives you the x,y coordinates in the frame, so this could be translated to x,z on the camera.

  3. I’m a little stumped here. ml5 gives you the width/height of the object, so you could translate the image getting bigger into getting closer to the camera (+X direction) and smaller into getting further away (-X direction)… The problem is that when the camera is pitched, the height will be affected, and the same for width and yaw. I’m guessing some calculation could be done between size and pitch/yaw at any given point, to figure out whether the object has gotten smaller because you’re further away or because of pitch or yaw; unfortunately I’m only a hobby coder and maths isn’t my strong suit.
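Here is a minimal sketch of point 1, feeding deviceorientation into a three.js camera. Deliberately simplified: it ignores screen orientation and quaternion edge cases, which a real version must handle:

```js
// Sketch for point 1: device orientation -> three.js camera rotation.
// Simplified on purpose: no screen-orientation or gimbal handling.
import * as THREE from 'three';

const camera = new THREE.PerspectiveCamera(70, innerWidth / innerHeight, 0.1, 100);
const deg = Math.PI / 180;

window.addEventListener('deviceorientation', (e) => {
  // alpha = yaw (compass), beta = pitch, gamma = roll, all in degrees
  camera.rotation.set(e.beta * deg, e.alpha * deg, -e.gamma * deg, 'YXZ');
});
```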

Finally… obviously this doesn’t detect planes, so it can’t tell whether to place something on a wall or on a floor, nor will it give the distance between worktop and floor. But that’s where edge detection can play a role, as previously stated: GammaCV + PC lines. Together, that should give you the basics of markerless AR movement, in theory (maybe).

My other thought is some sort of point tracker, where you could pick out three points and measure the triangle for left/right/forward/back movement, like head tracking with three IR LEDs in flight sims. I just thought ml5 was kind of an out-of-the-box solution for my first attempt…

Thanks for this thread and everyone who has supplied info so far. It would be cool if we could get a working, maybe even open-source, solution going from this. The whole WebXR-on-iOS, low-browser-support thing is a killer, especially when 8th Wall has a working JS solution (out of reach for hobbyists though).

I must add, for point 3: I would store the size of the bounding box and the angle of the camera at the moment said object enters the frame, then reference camera angle/object size against that as the scene goes on, to figure out the distance… And if you move so that this object leaves the frame, you would grab the next main object, replace the previous size/angle etc., and gradually add onto the position of the camera… I imagine the camera like someone swinging from tree to tree (object to object) through a jungle.
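A sketch of that reference idea, with made-up names and a very crude pitch correction; this is guesswork on the same level as the paragraph above, where `box` would be one result from ml5’s objectDetector:

```js
// Guesswork sketch: apparent size scales ~1/distance, so
//   distance ≈ referenceDistance × (referenceHeight / currentHeight).
// 'box' is an ml5 objectDetector result: { label, x, y, width, height }.
let ref = null; // captured when the object first enters the frame

function estimateDistance(box, pitchDeg, refDistanceMeters = 1) {
  // Very crude pitch compensation: undo vertical foreshortening.
  const h = box.height / Math.cos((pitchDeg * Math.PI) / 180);
  if (!ref) {
    ref = { height: h };       // "grab the next main object"
    return refDistanceMeters;
  }
  return refDistanceMeters * (ref.height / h);
}
```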

Re 1): this does work on Android/Chrome, but not on iOS/Safari or iOS/Chrome, because Apple disabled the sensors in the browser by default. I tried it just the other day and was very happy until I got feedback from iOS users.
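For completeness: since iOS 13, Safari does expose the sensors, but only after an explicit permission prompt triggered by a user gesture. `button` and `onOrientation` below are placeholder names:

```js
// iOS 13+ requires an explicit permission request from a user gesture.
button.addEventListener('click', async () => {
  if (typeof DeviceOrientationEvent.requestPermission === 'function') {
    const state = await DeviceOrientationEvent.requestPermission();
    if (state === 'granted') {
      window.addEventListener('deviceorientation', onOrientation);
    }
  } else {
    // Browsers without the permission gate (e.g. Android Chrome)
    window.addEventListener('deviceorientation', onOrientation);
  }
});
```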

I’m also looking for a solution to this problem. The available services are way too expensive and lock you into their platforms. Open-source libs like AR.js only support marker-based tracking.

My naive approach would be much like yours … somehow extract features and match them across frames to determine 3D perspective from 2D position changes.

I’ve started using ‘model-viewer’:

Make + export model/scene with three.js,

Save to server,

Load it into ‘model-viewer’.

Supports WebXR / Scene Viewer / Quick Look, all from a single file…

Certainly not perfect.

8th Wall is pretty much the industry standard for web AR at the moment (I have worked on projects for Ford, Xfinity and Doritos using it over the past year or so).
It’s JS + WASM; you can do face tracking, image tracking, and ground tracking.
Unfortunately it’s quite expensive (and the customer support isn’t great; I emailed them to ask a basic question about setting the renderer’s device pixel ratio and never got an answer…).
The free option is AR.js, but last time I checked, the performance wasn’t good.
