This other app uses the entire first paragraph to warn about this. Maybe something to copy: https://apps.apple.com/us/app/sitescape-lidar-3d-scanner/id1...
1. Use the Scanner app to take the images and camera pose data
3. Export the scene directory (color and depth images plus JSON files) somehow to your computer
3. Import (integrate, open) the directory via the Stray CLI
4. Annotate voxels via 3D bounding box in Studio GUI
5. Generate labels from the annotated voxels
6. Import data and labels, train and test a Detectron model with PyTorch
7. Export trained model in TorchScript format
8. Profit
I assume you require users to "ETL" the scene directory from your phone to your desktop/laptop via some manual transfer process?
Is there any reason I couldn't stop at step 5 and push my new labeled data to my own training system?
The integrate step runs a SLAM pipeline to compute the trajectory of the camera. Then we run an integration step to obtain the mesh.
Our core philosophy is to not stand in the way once you want to do something custom. So if you want to just read the camera poses and 3D labels and do your own thing, you can totally do that: the data is available in each scene folder.
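For anyone wondering what "do your own thing" with camera poses and 3D labels might look like, here is a minimal sketch of turning one 3D bounding box into a 2D image label. Everything here is an assumption for illustration, not the Stray file format or API: the pose is taken to be a 4x4 camera-to-world matrix, the box is axis-aligned and given by a center and size in meters, and the camera is an ideal pinhole with intrinsics `fx`, `fy`, `cx`, `cy`. All function names are hypothetical.

```python
from itertools import product


def invert_rigid(T):
    """Invert a 4x4 rigid transform (rotation + translation) given as nested lists."""
    R = [row[:3] for row in T[:3]]
    t = [row[3] for row in T[:3]]
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]  # transpose of R
    t_inv = [-sum(Rt[i][j] * t[j] for j in range(3)) for i in range(3)]
    return [Rt[i] + [t_inv[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def box_corners(center, size):
    """Eight corners of an axis-aligned box in world coordinates."""
    return [
        tuple(c + sign * extent / 2.0 for c, extent, sign in zip(center, size, signs))
        for signs in product((-1.0, 1.0), repeat=3)
    ]


def project_box(center, size, T_world_camera, fx, fy, cx, cy):
    """Project a 3D box into one image.

    Returns (x_min, y_min, x_max, y_max) in pixels, or None if any corner
    lies at or behind the camera plane (assumed convention: +z looks forward).
    """
    T_cw = invert_rigid(T_world_camera)  # world -> camera
    us, vs = [], []
    for X in box_corners(center, size):
        x, y, z = (
            sum(T_cw[i][j] * X[j] for j in range(3)) + T_cw[i][3] for i in range(3)
        )
        if z <= 0:
            return None
        us.append(fx * x / z + cx)
        vs.append(fy * y / z + cy)
    return min(us), min(vs), max(us), max(vs)
```

Running `project_box` once per frame with that frame's pose gives a 2D box per image from a single 3D annotation, which is the general idea behind annotating once in 3D. A real pipeline would also clip to the image bounds and handle partially visible boxes.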
Though I wonder whether the whole hassle of relying on a phone's RGB-D sensor, copying data off the phone, and using yet another annotation tool is worth it when you could instead use a tracking bounding-box annotation tool that interpolates across many frames. With those, you can even annotate moving and distant objects, which I would argue is even better for generalization (since the background changes).
But I bet there are some use cases/users which can profit from it.
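To make the comparison concrete, the keyframe interpolation that tracking annotation tools rely on can be sketched roughly like this. This is a simplified illustration, assuming plain linear interpolation between manually annotated keyframes and boxes given as `(x_min, y_min, x_max, y_max)` in pixels; real tools often add a visual tracker on top, and `interpolate_boxes` is a hypothetical name.

```python
def interpolate_boxes(keyframes):
    """Linearly interpolate 2D boxes between annotated keyframes.

    keyframes: dict mapping frame index -> (x_min, y_min, x_max, y_max).
    Returns a dict with one box for every frame from the first keyframe
    to the last keyframe, inclusive.
    """
    frames = sorted(keyframes)
    if not frames:
        return {}
    boxes = {}
    for start, end in zip(frames, frames[1:]):
        a, b = keyframes[start], keyframes[end]
        for f in range(start, end):
            t = (f - start) / (end - start)  # 0 at start, approaches 1 at end
            boxes[f] = tuple((1 - t) * p + t * q for p, q in zip(a, b))
    # The final keyframe is not covered by the loop above, so add it explicitly.
    boxes[frames[-1]] = tuple(float(v) for v in keyframes[frames[-1]])
    return boxes
```

For example, annotating frames 0 and 4 by hand yields boxes for frames 1 to 3 for free. The trade-off against the 3D approach is that linear interpolation drifts when the object or camera moves non-linearly, so more keyframes are needed.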
^ probably not, since they use detectron2, but given the labeled images are really the core part of this, there’s no reason you can’t use them on a different model that is compatible.
For example, if I have images in different PDF files that I want to compare, or I'm trying to identify information on a wine label, what criteria should I consider when choosing a method?
These days, there are plenty of fancy solutions to computer vision problems, but good implementations of the algorithms are rare. Getting to a working solution requires figuring out many different steps, the tools are often buggy and poorly maintained, and you usually need a lot of training data to feed the algorithms. Projects easily balloon into months-long R&D efforts, even when run by seasoned computer vision engineers. With the Stray Robots toolkit, we aim to lower the barrier for deploying computer vision solutions.
Currently, the toolkit allows you to build 3D scenes from a stream of depth camera images, annotate the scenes using a GUI and fit computer vision algorithms to infer the labels from single images, among a few other things. In this project, we used the toolkit to build a simple electric scooter detector using only 25 short video clips of electric scooters.
If you want to try it out, you can install the toolkit by following the instructions here: https://docs.strayrobots.io/installing/index.html
Going forward, we plan to add other components, such as 3D keypoint detection, semantic segmentation and 6D object pose estimation.
Let us know what you think! Both of us are here to answer any questions you may have.