These are valid points, and they are why most solutions go for a sensor fusion approach that combines computer vision with other sensor modalities. Ceiling cameras alone will not get you to the accuracy most retailers need before they can rely on the technology with confidence.
Even if you added more cameras (in the shelves and elsewhere in the store, pointing at different angles, not only on the ceiling), and leaving aside the cost, which adds up, people will always be able to occlude items from the cameras, even unintentionally. To put an illustrative number on it: computer vision can get you roughly 80% of the way in terms of accuracy at detecting grabbed items; for the rest you need other sensors.
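As a rough back-of-the-envelope sketch of why a second modality helps: if you assume the two sensors miss grabs independently, the fused detection rate is one minus the product of the miss rates. Independence is a strong assumption (an occluded item may also be hard for another sensor to catch), the 0.90 rate for the second sensor is purely hypothetical, and the 0.80 is just the illustrative figure from above, so treat this as arithmetic rather than a model of any real system.

```python
def fused_detection_rate(vision_rate: float, other_rate: float) -> float:
    """Chance that at least one of two sensors detects a grab.

    Assumes the sensors miss independently of each other, which is an
    idealization; correlated misses would lower the real figure.
    """
    for rate in (vision_rate, other_rate):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rates must be between 0 and 1")
    # A grab goes undetected only if both sensors miss it.
    return 1.0 - (1.0 - vision_rate) * (1.0 - other_rate)


if __name__ == "__main__":
    vision = 0.80  # illustrative figure from the comment above
    other = 0.90   # hypothetical rate for a second sensor modality
    print(f"fused detection rate: {fused_detection_rate(vision, other):.2f}")
```

Under those assumptions the second sensor only has to cover the 20% of grabs that vision misses, which is why adding a modality tends to pay off more than adding yet another camera with the same blind spots.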
The main benefit over other self-checkout systems is the time customers save by not waiting in line. If you have been to Amazon Go, the experience of walking out without waiting is quite magical.