All you have is two overlapping cameras with poor resolution and they let you behind the wheel, right?
This is purely a software and processing-hardware problem. Human beings drive OK for the most part, and we don't have lidar/radar/sonar/etc.
With 8 cameras providing 360-degree vision and an advanced NVIDIA hardware suite, this problem is entirely solvable, and this is a very reasonable solution to it.