I will say, we solved the draw-order problem a different, easier way: by adopting a fixed orthographic perspective and rendering objects as spritesheets/tiles. We can then author levels using conventional 2D methods. Objects now sort by their pivot's Y value, more or less, so walking behind things isn't an issue.
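For anyone who hasn't seen it, the pivot-Y sort can be sketched roughly like this. It assumes screen-space Y grows downward and each sprite's pivot sits at its base (its "feet"); `Sprite`, `pivot_y`, and `draw_order` are made-up names for illustration, not from any particular engine.

```python
from dataclasses import dataclass


@dataclass
class Sprite:
    name: str
    x: float
    pivot_y: float  # assumed: screen-space Y of the sprite's base; larger = lower on screen


def draw_order(sprites):
    """Return sprites back-to-front: a smaller pivot_y (higher on screen) is drawn first."""
    # sorted() is stable, so sprites with equal pivot_y keep their original relative order.
    return sorted(sprites, key=lambda s: s.pivot_y)


scene = [
    Sprite("player", 40, 120),
    Sprite("tree", 42, 100),
    Sprite("rock", 10, 140),
]
order = [s.name for s in draw_order(scene)]
```

Here the tree is drawn first and the rock last, so the player overlaps the tree but is covered by the rock. The "more or less" matters in practice: tall or wide objects often need a hand-placed pivot or a tie-break rule.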
Seeing the Cave Story screenshot has me thinking, there might be an especially good opportunity for pre-rendered side-scrollers... Just entirely eliminate whole classes of problems, plus you easily bring back some of that animation-inspired back/mid/foreground goodness.
The technique does seem like it would be a great fit for mobile, where users have limited control and efficiency is really important.
I did a 3D game with pre-rendered backgrounds on the PlayStation 1. We had an operating steel mill as an animated background, with high-resolution characters pre-rendered to 3D cards, plus z-buffer data for both, so characters could pass behind background set elements and, when characters fought, their geometries overlapped and interpenetrated correctly. Using the PSX's MDEC video decoder, we could have up to a dozen background frames. That let the giant rotating gears and assembly lines of the steel mill 'operate' with 3-6 frame loops, and let each game level support multiple perspectives (camera views) of the action. Each camera view could be hundreds of millions of polygons, all precalculated to 2D elements, with the final game engine treating the hardware more as a real-time compositing engine (with a 3D simulation running logically in parallel).
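To illustrate the general idea of compositing pre-rendered layers by depth, here is a rough per-pixel sketch. It is not the actual PS1 engine, which I am not reproducing here; the function name, the list-of-lists buffers, the `None` transparency marker, and the "smaller depth is nearer" convention are all assumptions made for the example.

```python
def composite(frame_color, frame_depth, sprite_color, sprite_depth, ox, oy):
    """Depth-composite a pre-rendered sprite into a frame at offset (ox, oy).

    Assumed conventions: buffers are lists of rows, a smaller depth value is
    nearer the camera, and a sprite color of None marks a transparent pixel.
    Returns new (color, depth) buffers; the inputs are left unmodified.
    """
    out_color = [row[:] for row in frame_color]
    out_depth = [row[:] for row in frame_depth]
    height, width = len(out_color), len(out_color[0])
    for sy, row in enumerate(sprite_color):
        for sx, color in enumerate(row):
            if color is None:
                continue  # transparent sprite pixel
            x, y = ox + sx, oy + sy
            if not (0 <= x < width and 0 <= y < height):
                continue  # sprite pixel falls outside the frame
            if sprite_depth[sy][sx] < out_depth[y][x]:
                out_color[y][x] = color
                # Record the new depth so a later sprite can interpenetrate this one.
                out_depth[y][x] = sprite_depth[sy][sx]
    return out_color, out_depth


# One-row background with a near "pillar" (depth 2) at x = 2.
bg_color = [["B", "B", "B", "B"]]
bg_depth = [[10, 10, 2, 10]]
hero_color = [["S", "S", "S"]]
hero_depth = [[5, 5, 5]]
color, depth = composite(bg_color, bg_depth, hero_color, hero_depth, ox=1, oy=0)
```

The hero shows up on either side of the pillar but is hidden behind it at x = 2. Because the depth buffer is updated as well, compositing a second character into the result lets the two overlap per pixel rather than per sprite.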
The game was not popular and was misunderstood at the time, and the studio was a film VFX studio whose staff did not like being put on game production. There were 75 levels, and the team was quite large for the time: about 45 animators, 15 level developers, and 6 engine developers. https://www.youtube.com/watch?v=n9w1e7D5ucY