Object detection is an extremely tough problem (some would say it is the computer vision problem ;-)), and while we've made a lot of progress in the past decade, the best methods are still terrible [2] -- average detection precision between 30-50%. For reference, most consumer applications require an AP of 90+% to be considered usable.
So if this is a completely automated solution, it's not going to be able to do much better, unless the creators can make massive (I mean orders-of-magnitude) improvements on the state-of-the-art.
That being said, there are some applications where lower performance is acceptable. And if you add some manual verification, you could conceivably make this much better (at the cost of higher latency, though). Another possibility is to specialize in a certain type of input image (e.g., if you're a company taking photos in your warehouse, where all your photos look very similar and/or you can control the lighting and environment).
Still, I'm excited to see companies attempting to take object detection out to the real world. All the best to these guys!
[1] http://pascallin.ecs.soton.ac.uk/challenges/VOC/
[2] http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2011/resu...
Our current coverage is of the trees of the northeast US (about 200 species), but we are working on expanding that.
In this case, you have to tell it which object you're looking for.
Language dependent/creative tasks run much higher (smaller worker pool, more brain power needed).
We've developed RTFM at CrowdFlower to handle the similar task of moderating images and providing detailed reasons for why they are flagged. It's a common problem that the computers can't solve well enough yet.
Sheep were detected as horses and faces, but not as cats or cars. This seems to be the current state of the art for general-purpose classification. I haven't seen anything better yet (unless you specialize in sheep detection).
As a computer vision researcher, I am not impressed by this. It is primarily an API for smartphone app makers who want a binary result for detection. It does not help with scene context analysis. For instance, if I have a big picture of an airplane on a wall, it will detect the airplane. Does it know whether this airplane is in the sky or on a wall? There are a thousand failure cases.
http://www.keithcarter.com/wp-content/uploads/2009/10/blue-a...
The idea was a simple (but clever) one - use virtual reality to segment the world into solid blocks of identified objects. The solid blocks are identifiable to those with poor vision in a way that the real world is not.
Essentially this meant processing an image, identifying items (e.g. cars, fences, roads) and then colouring them solid. So instead of a confusing scene of blur, you have a blurred but still identifiable scene: a solid strip of grey for the road, a solid blob of red for the car, another solid yellow strip for a fence, etc. A poorly sighted person could still identify from this something that made sense in a way that they couldn't in the real world.
What was required was an input, real-time visual processing and then display back to the user, all of which was fantasy 15 years ago.
However, attempt this today with a visual feed, real-time processing like this, and then near-instantaneous display of the results back to the person with e.g. Google Glass, and you might have a viable way to show the world categorised in a visual way that will help those with poor vision. Interesting times.
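To make the "colour them solid" step concrete, here is a minimal sketch in Python. It assumes some detector has already assigned a class label to every pixel; the palette, the label names and the `flatten_scene` helper are all made up for illustration and are not taken from the original project.

```python
# Hypothetical palette: class label -> flat RGB colour (assumed values).
PALETTE = {
    "road": (128, 128, 128),   # solid grey strip
    "car": (255, 0, 0),        # solid red blob
    "fence": (255, 255, 0),    # solid yellow strip
}
UNKNOWN = (0, 0, 0)  # anything the detector could not label is drawn black


def flatten_scene(label_map):
    """Replace each pixel's class label with that class's solid colour."""
    return [[PALETTE.get(label, UNKNOWN) for label in row] for row in label_map]


# A tiny 2x3 "image" of per-pixel labels, standing in for detector output.
labels = [
    ["fence", "fence", "car"],
    ["road", "road", "road"],
]
flat = flatten_scene(labels)
```

The hard part is of course producing the label map in real time; the recolouring itself is just a lookup.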
According to http://news.ycombinator.com/item?id=4985100 it won't be today, perhaps tomorrow.
One thing that I want to mention: our service was built favoring Precision over Recall; we reasoned that we'd rather have a low number of false positives and make sure that when we do report a detection, that it actually is one. Thus, our service may occasionally miss instances.
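For anyone unfamiliar with the terms, here is a rough sketch of how precision and recall are computed from detection counts. The counts below are invented purely to illustrate a precision-favoring detector; they are not measurements of our service.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) for a set of detection counts.

    precision = TP / (TP + FP): how many reported detections were real.
    recall    = TP / (TP + FN): how many real objects were reported.
    """
    detected = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Illustrative counts only: 8 correct detections, 1 false alarm, 7 missed objects.
p, r = precision_recall(8, 1, 7)
```

With counts like these, precision is high (few false alarms) while recall is much lower (many misses), which is the trade-off described above.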
I'm going to implement a button on the Experiment page that lets you flag a detection as something that we need to work on; we will use your feedback to improve the accuracy.
Your application detects none of them... Is it because my ancient phone camera's pics are too grainy? Or do the bikes need to be in profile to be detected properly? Or maybe it's trained to detect bikes with people on them, instead of bikes parked in the street?
This is worse than OpenCV (I thought you were using OpenCV but apparently aren't?)
http://www.bostonglobe.com/rw//Boston/2011-2020/WebGraphics/...
https://lh3.ggpht.com/-GbPgbhUtmnE/UH0p3VmMWoI/AAAAAAAAApM/u...
I want to clarify: the 4 object concurrent detection refers to 4 classes of objects. On the Experiment page, you can only choose one class to detect on (whether that is person, bottles, cars, etc). However, by using the API, you can simultaneously search for cars, planes, people, and motorcycles, for example.
Any plans to increase the number of objects you can search for at once? Very interested in using this but I'd want to be able to scan for ~20 objects.
The man is detected, but also a shape above the umbrella.
Edit: direct link to the result picture: https://s3.amazonaws.com/dextro_detection_results/debug13579...
- your pricing won't work for video (even at only 5fps)
- I can't really use the data without a confidence level for each detection, because for some applications I'd rather discard a bounding box that is below a threshold I set.
Other than that, congrats for the great work :)
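To show what I mean by thresholding on confidence, here is a small sketch. The `Detection` structure and its fields are my own assumption about what a response could contain; the current API does not return a confidence value, so none of these names come from its documentation.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical detection record; field names are assumptions."""
    label: str
    confidence: float  # assumed to lie in [0, 1]
    box: tuple         # (x, y, width, height) in pixels


def filter_by_confidence(detections, threshold):
    """Keep only detections whose confidence meets the caller's threshold."""
    return [d for d in detections if d.confidence >= threshold]


# Made-up example: one confident detection and one doubtful one.
detections = [
    Detection("plane", 0.92, (10, 20, 200, 80)),
    Detection("plane", 0.31, (240, 15, 60, 30)),
]
kept = filter_by_confidence(detections, threshold=0.5)
```

Each application could then pick its own threshold instead of relying on a single cut-off chosen by the service.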
In the meantime, if you want to experiment with Dextro for video, shoot us an email at team@dextrorobotics.com and we will hook you up!
With regard to confidence level, that's something that we provide the enterprise-class service with; if this is a critical feature, we can potentially offer it to everyone as well.
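On the video pricing point raised above, a back-of-the-envelope calculation shows why per-image pricing gets expensive quickly. The price of 1 cent per image is an assumption based on the "one or two cents" figure mentioned elsewhere in this thread, not an official rate, and `video_cost_cents` is just an illustrative helper.

```python
def video_cost_cents(frames_per_second, duration_seconds, cents_per_image):
    """Cost in cents of sending every frame of a video as a separate image."""
    frames = frames_per_second * duration_seconds
    return frames * cents_per_image


# Assumed price: 1 cent per image. One hour of video sampled at 5 fps.
one_hour = video_cost_cents(frames_per_second=5, duration_seconds=3600, cents_per_image=1)
dollars = one_hour / 100
```

Under that assumption, an hour of 5 fps video is 18,000 images, or 180 dollars.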
In the works: shoes, balls, smartphones and tablets, dogs, keyboards, cups and glasses, doors, keys.
When I used http://www.airbus.com/fileadmin/media_gallery/aircraft_pages... it detected 2 planes, but there was only 1.
I hope it will improve with additional training images, though.
http://englishrussia.com/wp-content/uploads/2007/08/130-cats...
http://3rdarm.biz/images/2010/02/faces.jpg
It got almost all of them, but with so many errors. It can't detect sheep either.
I was really impressed at first, but as I tried out more and more images, it became apparent that the API isn't mature enough to be worth one or two cents. There is a 90% chance that the algorithm gets an image right, but sometimes it doesn't detect the entire object. For example, I used another image of two jets, and it only found one of them even though the jets were identical apart from one being smaller than the other.
2 horses / detected 0: http://images4.fanpop.com/image/photos/23500000/horse-horses...
4 horses / detected all as 1: http://4.bp.blogspot.com/-Rso9vw4BmSE/TqZU6vHl3kI/AAAAAAAACL...
* The documentation is pretty weak.
* I am not sure what a classID is, and I don't see any links to where the numbers come from.
* The example request is posting to an insecure http address, but the secret api key is required?
* The example request doesn't fit on one line? It took me a while to see it was in the "GET / HTTP/1.1" style.
* How do errors work? Having clearly specified error responses would be really useful.
If you're trying to sell me on your API, show it to me.
BTW, it can find only two airplanes in this photo http://www.q8.com/SiteCollectionImages/Gatwick%20Airport.jpg
http://artificial-intelligence-projects.com/augmented-realit...
It's still in the development stage because I can only fiddle with it when I have the time and impetus to do so. Criticisms/comments welcome.
See this: http://i.imgur.com/ulith.png?1
You need to get a higher percentage of actual matches before you can use this for anything.
I've been searching lately for a post-face.com API and have been following a few for a while, but they seem to have similar issues with poor results.
detected 3 planes... there is only 1 plane and a car
http://i.dailymail.co.uk/i/pix/2012/11/06/article-2228752-15...
does a good job with painting too, but it did find the phantom neighbor peeping in as well:
http://img822.imageshack.us/img822/347/screenshot20130111at5...
seems too buggy to pay just yet