From the OP:
> humans can build near perfect 3D representations of the world with 2D images stitched together with the parallax neural nets in our brain
This is a statement about cognition, and the response addresses it.
Your response:
> The person you're responding to is talking about depth from stereo, not cognition.
I think this is the disconnect. The person _is_ talking about cognition. The OP makes a claim about how humans see, tied to how the human brain works. The response explains why camera-based image recognition is currently a lot worse than your eyes (a big piece of the answer is your brain).
> The cameras just have to replace eye well enough
So yes, this is nice in theory. But I also get the sense most people don't realize just how large the chasm is today between cameras and human eyes. They don't "just provide line of sight depth." Dynamic range, field of view, reliability even under conditions like high heat -- there are many other dimensions where the two just aren't analogous yet.