> What good is speed if the accuracy isn't significantly better than a coin flip?
Because distinguishing an object as belonging to one class out of a thousand with 50% accuracy doesn't mean it's a coin flip. You'd need a thousand-sided coin. Random chance in that case is 0.1%, which makes 50% way, way better.
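To put numbers on that, here's a quick sketch of the arithmetic. It assumes the classes are equally likely and the guesser picks uniformly at random; the function names are made up for illustration.

```python
def chance_accuracy(num_classes: int) -> float:
    """Expected accuracy of uniform random guessing over num_classes classes."""
    return 1.0 / num_classes


def lift_over_chance(accuracy: float, num_classes: int) -> float:
    """How many times better than random guessing a given accuracy is."""
    # accuracy / (1 / num_classes) simplifies to accuracy * num_classes
    return accuracy * num_classes


if __name__ == "__main__":
    print(f"coin flip (2 classes): {chance_accuracy(2):.1%}")
    print(f"1000 classes:          {chance_accuracy(1000):.1%}")
    print(f"50% on 1000 classes is {lift_over_chance(0.5, 1000):.0f}x chance")
```

So 50% on a two-class problem is exactly chance, while 50% on a thousand-class problem is 500 times better than chance.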
It’s more nuanced than this. It’s not “look at this image and tell me yes or no if there’s a car in it”; it’s more like “tell me where all the cars are in this image, if any.” We use this a lot, and by ramping up recall we can support some interesting use cases.
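One common way to ramp up recall in a detector is to lower the confidence threshold for keeping a box, at some cost in precision. I'm assuming that's roughly the knob being described; the scores and counts below are invented for illustration, not real model output.

```python
# Hypothetical detections for one image: (confidence score, matches a real car?)
DETECTIONS = [
    (0.95, True),
    (0.80, True),
    (0.60, True),
    (0.45, False),
    (0.30, True),
    (0.20, False),
]
TOTAL_REAL_CARS = 5  # hypothetical ground truth; one car is never detected at all


def precision_recall(threshold: float) -> tuple[float, float]:
    """Precision and recall when keeping only boxes scoring >= threshold."""
    kept = [is_hit for score, is_hit in DETECTIONS if score >= threshold]
    true_positives = sum(kept)
    # With nothing kept there are no false positives, so call precision 1.0
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / TOTAL_REAL_CARS
    return precision, recall


if __name__ == "__main__":
    for threshold in (0.5, 0.25):
        p, r = precision_recall(threshold)
        print(f"threshold={threshold}: precision={p:.2f} recall={r:.2f}")
```

In this toy data, dropping the threshold from 0.5 to 0.25 finds one more real car and lets in one false positive. Whether that trade is worth it depends on the use case.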
I assure you it’s highly useful in the real, real world.
And that's why nobody actually uses it for those things, at least not yet. Don't forget that advancement is often incremental, and that in this case advancement has actually been somewhat fast. YOLOv3 came out in 2018.