Show HN: Object Detection in an Hour

(strayrobots.io)

166 points · kekeblom · 4y ago · 34 comments

31 comments · 11 top-level

posix_compliant · 4y ago · 5 in thread

Heads up for anyone else: I was interested in trying the Stray Scanner app on my iPhone 11, but I’m getting an error when trying to record: “unsupported device: this device doesn’t seem to have the required level of ARKit support”.

kekeblom (OP) · 4y ago

Ah, yeah. The App Store doesn’t seem to have a way to restrict downloads to LiDAR devices only. The description does mention the limitation, but there doesn’t seem to be a way to set a hard constraint. So sorry about this! I wonder if there is a way to issue refunds on the App Store.

bredren · 4y ago

I was surprised to read this, but it appears you are correct.

This other app uses the entire first paragraph to warn about this. Maybe something to copy: https://apps.apple.com/us/app/sitescape-lidar-3d-scanner/id1...

pzo · 4y ago

Might it be worth supporting the front camera (TrueDepth) as well? Record3D gives pretty good accuracy (https://record3d.app/). I know it’s probably not the best way to scan something, since you can’t see the screen, but it’s better than nothing. As a workaround, people can also use a small mirror to scan and see the result on the screen at the same time.

kekeblom (OP) · 4y ago

It seems that only Apple can issue refunds on purchases and not developers.

TomVDB · 4y ago

I just got caught by this myself: it doesn't work on a regular iPhone 12 either. It needs to be an iPhone 12 Pro or 12 Pro Max.

AndrewKemendo · 4y ago · 3 in thread

Just so I understand the idealized pipeline here, a user does the following:

1. Use the Scanner app to take the images and camera pose data

2. Export the scene directory (color and depth images and json files) somehow to your computer

3. Import (integrate, open) the directory via the Stray CLI

4. Annotate voxels via 3D bounding box in Studio GUI

5. Generate labels from the annotated voxels

6. Import data and labels, train and test a Detectron model with PyTorch

7. Export trained model in TorchScript format

8. Profit

I assume you require users to "ETL" the scene directory from your phone to your desktop/laptop via some manual transfer process?

Is there any reason I couldn't stop at step 5 and push my new labeled data to my own training system?

kekeblom (OP) · 4y ago

Pretty much yeah. Just to be clear, we only use the color and depth images from the camera. There is actually an offline calibration step to obtain camera intrinsic parameters, which are copied into each scene.

The integrate step runs a SLAM pipeline to compute the trajectory of the camera. Then we run an integration step to obtain the mesh.

Our core philosophy is to not stand in the way once you want to do something custom. So if you just want to read the camera poses and 3D labels and do your own thing, you can totally do that; the data is available in each scene folder.
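
For readers wondering what the integration step consumes: depth-fusion pipelines of this kind start by lifting each depth pixel into 3D using the camera intrinsics (here, from the offline calibration) and the per-frame pose from SLAM. Below is a minimal NumPy sketch of that back-projection only, not Stray's code, assuming a pinhole camera model, depth in meters, and a 4x4 camera-to-world pose; the function name `backproject_depth` is hypothetical.

```python
import numpy as np


def backproject_depth(depth, fx, fy, cx, cy, T_world_camera):
    """Lift a depth image (H x W, meters) to world-frame 3D points (N x 3).

    Hypothetical helper for illustration; not part of the Stray toolkit.
    Assumes a pinhole camera and a 4x4 camera-to-world pose matrix.
    Pixels with zero depth (no measurement) are skipped.
    """
    h, w = depth.shape
    # Pixel grid: u runs along columns (x), v along rows (y).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.reshape(-1)
    valid = z > 0
    u = u.reshape(-1)[valid]
    v = v.reshape(-1)[valid]
    z = z[valid]
    # Invert the pinhole projection u = fx * x / z + cx, v = fy * y / z + cy.
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)  # N x 4, homogeneous
    # Row-vector form of T_world_camera @ p for every point p.
    points_world = points_cam @ T_world_camera.T
    return points_world[:, :3]
```

A real integration step would then fuse many such frames (for example into a TSDF volume) to extract the mesh; that part is not sketched here.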

rocauc · 4y ago

Do you understand how the tool reduces labeling time in steps (4) and (5) (compared to labeling with e.g. CVAT), as the post claims?

mainframed · 4y ago

You only have to label the 3D Bounding Box once. Then you can automatically generate 2D bounding boxes for every frame of the video. So instead of annotating every frame with a 2D box, you only annotate once with a 3D box.

Though I wonder whether the hassle of relying on a phone’s RGB-D sensor, copying the data off the phone, and using yet another annotation tool is worth it when you can instead use a tracking bbox annotation tool that interpolates across many frames. With those, you can even annotate moving and distant objects, which I would argue is even better for generalization (since the background changes).

But I bet there are some use cases/users which can profit from it.
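
The "label once in 3D, get a 2D box in every frame" idea can be illustrated by projecting the eight corners of a 3D box into each frame and taking the min/max of the projected pixels. This is a minimal sketch under stated assumptions, not Stray's implementation: an axis-aligned box, a 3x3 intrinsics matrix `K`, a 4x4 camera-to-world pose per frame, and no occlusion handling; `box_corners` and `project_box` are hypothetical names.

```python
import numpy as np


def box_corners(center, size):
    """Return the 8 corners (8 x 3) of an axis-aligned 3D box in world coordinates.

    Hypothetical helper for illustration; real annotation boxes may also be rotated.
    """
    c = np.asarray(center, dtype=float)
    half = np.asarray(size, dtype=float) / 2.0
    signs = np.array([[sx, sy, sz] for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])
    return c + signs * half


def project_box(corners_world, K, T_world_camera, width, height):
    """Project 3D box corners into one frame; return a clipped (x0, y0, x1, y1) or None.

    Returns None if any corner is behind the camera (kept simple on purpose)
    or if the projected box falls entirely outside the image.
    """
    T_camera_world = np.linalg.inv(T_world_camera)
    homo = np.hstack([corners_world, np.ones((len(corners_world), 1))])
    cam = (homo @ T_camera_world.T)[:, :3]  # corners in camera coordinates
    if np.any(cam[:, 2] <= 0):
        return None
    pix = cam @ K.T
    pix = pix[:, :2] / pix[:, 2:3]  # perspective divide
    x0, y0 = pix.min(axis=0)
    x1, y1 = pix.max(axis=0)
    # Clip to the image bounds so partially visible boxes stay valid.
    x0, x1 = np.clip([x0, x1], 0, width)
    y0, y1 = np.clip([y0, y1], 0, height)
    if x1 <= x0 or y1 <= y0:
        return None
    return float(x0), float(y0), float(x1), float(y1)
```

Because the same world-frame box is reused for every frame, this only yields correct labels while the object stays put, which is the static-scene requirement discussed further down the thread.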

notum · 4y ago · 3 in thread

Looks great! Very innovative approach. Are the generated models compatible with the OpenCV OAK camera?

wokwokwok · 4y ago

https://github.com/facebookresearch/detectron2/issues/1602

^ probably not, since they use detectron2, but given that the labeled images are really the core part of this, there’s no reason you can’t use them with a different model that is compatible.

hietalajulius · 4y ago

Yeah, the labels are loaded into the Detectron2 format from the 3D annotations JSON at train time; we plan to add similar data loading for YOLOv3 etc. soon. Starting out with Detectron2 was mainly for POC/demo purposes; the idea is to eventually be able to feed the data anywhere it might be needed.
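
For reference, here is roughly what the two target formats look like. This sketch turns pixel boxes into a record in Detectron2's standard dataset-dict layout (`file_name`, `image_id`, `height`, `width`, and `annotations` entries with `bbox`, `bbox_mode`, `category_id`) and into a normalized YOLO label line (`class x_center y_center width height`). It does not reflect how Stray's loader actually works; the input tuple layout and both function names are hypothetical.

```python
try:
    from detectron2.structures import BoxMode

    XYXY_ABS = BoxMode.XYXY_ABS
except ImportError:  # detectron2 not installed: BoxMode is an IntEnum and XYXY_ABS == 0
    XYXY_ABS = 0


def to_detectron2_record(file_name, image_id, width, height, boxes):
    """Build one Detectron2-style dataset dict.

    `boxes` is a hypothetical list of (x0, y0, x1, y1, category_id) in absolute pixels.
    """
    return {
        "file_name": file_name,
        "image_id": image_id,
        "height": height,
        "width": width,
        "annotations": [
            {"bbox": [x0, y0, x1, y1], "bbox_mode": XYXY_ABS, "category_id": cid}
            for x0, y0, x1, y1, cid in boxes
        ],
    }


def to_yolo_line(box, width, height):
    """Convert one (x0, y0, x1, y1, category_id) pixel box to a YOLO label line."""
    x0, y0, x1, y1, cid = box
    # YOLO expects the box center and size, each normalized by the image size.
    xc = (x0 + x1) / 2 / width
    yc = (y0 + y1) / 2 / height
    bw = (x1 - x0) / width
    bh = (y1 - y0) / height
    return f"{cid} {xc:.6f} {yc:.6f} {bw:.6f} {bh:.6f}"
```

Keeping the projected boxes in one neutral tuple form and converting at load time is what makes "feed the data anywhere" cheap: each new trainer only needs another small converter.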

hietalajulius · 4y ago

I don't know the specifics of that camera/its software, but the trained models are saved as TorchScript (https://pytorch.org/docs/stable/jit.html) which can be used very flexibly in python/C++.

jonatron · 4y ago · 2 in thread

Using video to automatically build a large training set is smart! Well done! I was thinking about making a properly free and open dataset from just walking around London, and this gives me some ideas...

hietalajulius · 4y ago

Nice! Let us know if using Studio might be helpful there.

drooby · 4y ago

Reminds me of: “How would a human do it”

nathan_phoenix · 4y ago · 2 in thread

Just curious, what's the business plan behind this?

kekeblom (OP) · 4y ago

We plan to charge for some of the algorithms. Also, eventually some parts will run in our cloud and we could charge for compute credits.

nathan_phoenix · 4y ago

Okay, thanks for the info!

actually_a_dog · 4y ago · 1 in thread

Super cool, especially the way it was able to differentiate that Posti box from the scooters, even though they have vaguely the same shape. Just out of curiosity, what confidence level did the classifier assign to the Posti box as a scooter?

hietalajulius · 4y ago

Looks like it picks up parts of the mail thing as a scooter in only a few frames, and the score is way below 1% (I set the minimum threshold to 0.01%). Here's an example: https://imgur.com/a/9pSTwut
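
To make the threshold point concrete: a detector emits a confidence score per box, and anything under the minimum threshold is dropped before display. With a threshold as low as 0.01%, very low-confidence hits like the mail-box one still show up, while a stricter threshold would hide them. A minimal sketch, assuming a hypothetical list-of-dicts detection layout; in Detectron2 itself the corresponding setting is `cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST`.

```python
from typing import Dict, List

MIN_SCORE = 1e-4  # 0.01%, the minimum threshold mentioned in the comment above


def filter_detections(detections: List[Dict], min_score: float = MIN_SCORE) -> List[Dict]:
    """Keep only detections whose confidence score is at least `min_score`.

    Each detection is assumed (hypothetically) to look like
    {"label": "scooter", "score": 0.93, "box": (x0, y0, x1, y1)}.
    """
    return [d for d in detections if d["score"] >= min_score]


# Made-up example values for one frame, purely illustrative.
frame = [
    {"label": "scooter", "score": 0.93, "box": (40, 60, 220, 400)},
    {"label": "scooter", "score": 0.004, "box": (300, 80, 360, 260)},  # low-confidence hit
]
print(len(filter_detections(frame)))       # 2: a 0.4% hit survives a 0.01% threshold
print(len(filter_detections(frame, 0.5)))  # 1: a stricter threshold hides it
```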

tadeegan · 4y ago · 1 in thread

This is only beneficial in static scenes right? Otherwise you can’t get free labels across the whole video.

kekeblom (OP) · 4y ago

Yes, it relies on the target being static when capturing the training data, but it’s OK for the background to move. We were actually surprised by how well it works on moving objects without being trained on them. In the post you can see Julius riding a scooter, and that is an unseen example for a detector that was only trained on static scooters.

kayhi · 4y ago · 1 in thread

Newbie here, where's the intersection between object detection and OCR?

For example, if I have images in different PDF files that I want to compare, or I'm trying to identify information on a wine label, what criteria should I consider when choosing which method to use?

mlboss · 4y ago

Object detection will find the bounding box around the text; then you need a separate seq2seq model to convert the cropped image into text.
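
A minimal sketch of that two-stage split, assuming the text boxes come from some detector (stage 1) and `recognize` is a placeholder for any image-to-text model, such as a seq2seq recognizer (stage 2). Neither refers to a specific library, and `read_text_regions` is a hypothetical name.

```python
from typing import Callable, List, Sequence, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


def read_text_regions(image: np.ndarray, boxes: Sequence[Box],
                      recognize: Callable[[np.ndarray], str]) -> List[str]:
    """Stage 2 of a detect-then-recognize OCR pipeline.

    `boxes` come from a text detector; `recognize` is applied to each crop.
    Hypothetical helper for illustration only.
    """
    h, w = image.shape[:2]
    texts = []
    for x0, y0, x1, y1 in boxes:
        # Clamp to the image so slightly out-of-bounds detections still crop cleanly.
        x0, x1 = max(0, x0), min(w, x1)
        y0, y1 = max(0, y0), min(h, y1)
        if x1 <= x0 or y1 <= y0:
            continue  # skip empty or fully out-of-bounds boxes
        texts.append(recognize(image[y0:y1, x0:x1]))
    return texts
```

Keeping the stages separate means the detector only has to answer "where is text?" and the recognizer only "what does this crop say?", so either can be swapped independently.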

luisramalho · 4y ago · 1 in thread

Good job guys! Greetings to @hietalajulius (Smartly alumni here)

hietalajulius · 4y ago

Thanks!!!

tvirosi · 4y ago · 1 in thread

Cool article! Those clips are from Sweden, right? :) Just curious if I spotted it right

kekeblom (OP) · 4y ago

Close! They are actually from Helsinki, Finland.

kekeblom (OP) · 4y ago

Hi HN! hietalajulius and I have been working on a toolkit for solving computer vision problems.

These days, there are a lot of fancy solutions to many computer vision problems, but good implementations of the algorithms are scarce, getting to a working solution requires figuring out lots of different steps, tools are buggy and poorly maintained, and you often need a lot of training data to feed the algorithms. Projects easily balloon into months-long R&D efforts, even when done by seasoned computer vision engineers. With the Stray Robots toolkit, we aim to lower the barrier for deploying computer vision solutions.

Currently, the toolkit allows you to build 3D scenes from a stream of depth camera images, annotate the scenes using a GUI and fit computer vision algorithms to infer the labels from single images, among a few other things. In this project, we used the toolkit to build a simple electric scooter detector using only 25 short video clips of electric scooters.

If you want to try it out, you can install the toolkit by following the instructions here: https://docs.strayrobots.io/installing/index.html

Going forward we plan to add other components such as 3D keypoint detection, semantic segmentation and 6D object pose estimation.

Let us know what you think! Both of us are here to answer any questions you may have.
