
0 points · oliverx0 · 7y ago · 0 comments

(disclosure: I work for a competitor)

These are valid points, and they are why most solutions take a sensor fusion approach that combines computer vision with other sensor modalities. Ceiling cameras alone will not get you to the accuracy most retailers need before they can confidently rely on the technology.

Even if you placed more cameras (not only on the ceiling, but also in the shelves and other areas of the store, pointing at different angles), and leaving aside the cost (which adds up), people will always be able to occlude items from the cameras, even unintentionally. To put it in numbers (as an example): computer vision can get you 80% of the way in terms of accuracy at detecting which items were grabbed; for the rest you need other sensors.

The main benefit over other self-checkout systems is that customers no longer wait in line. If you have been to Amazon Go, the experience of walking out without waiting is quite magical.


2 comments · 2 top-level

Animats · 7y ago

Occlusion is a big vision problem here. The customer-facing problem is being stuck at checkout with a "Wait for Attendant" message on screen.

There's "LaneHawk", which has a camera mounted near the floor looking at the bottom level of shopping carts.[1] This is to catch bags of dog food, cases of beer, and similar big items the cashier might miss. Came out around 2010. Saves about $10 per lane per day. That's probably the most successful system in this area right now.

[1] https://www.youtube.com/watch?v=rpHqWSYTF2s

TheEzEzz · 7y ago

We were initially unsure that we could achieve high accuracy with overhead cameras only. It's definitely not immediately obvious that it should be feasible, and we weren't convinced until we arrived at a working solution (and a working store!). Getting our action recognition models to high enough accuracy was a big part of the crux, which, not coincidentally, is an area where the literature isn't super mature yet.

Our system still makes mistakes, of course, but we're in the high 90s for accuracy, which is good enough economically to start deploying and getting real-world users. And there's still plenty of signal to squeeze out.
