The hardware is likely optimized for the common case, so I would expect that to be a lot slower. It wouldn’t surprise me, for example, if there are image sensors out there that can only be read out in top-to-bottom, left-to-right order.
Also, for RAW images and sensors that aren’t rectangular grids, I think that would complicate RAW image parsing. The parsing code might have to support up to four different formats, depending on how the sensor is designed,