I was just responding to the concerns about self-driving systems relying solely (or primarily) on visual data from cameras. It's true that this is how humans drive, but the human visual system is much more sophisticated than the current (published) state of the art in image processing seems to be. I do not trust vision-only systems today (especially if they employ deep learning shenanigans).
I know those problems aren't insurmountable, but I'd feel much safer if they threw in a LIDAR there too.