You're right, it would be a fun side effect to see the color change between different scenes :)
I guess there's not enough in commercial interest in fixing these problems, but it can be probably done with the current algorithms. Precise 3D reconstruction is much more important than colour reconstruction.
Would it be possible to have some sort of semantic naming coupled to it 'the car in this scene is red', that could be fixed by humans, maybe even by talking to the NN? That's when this stuff starts getting fun IMO - human language together with superhuman domain knowledge.
Maybe it's about a truck painting shop, and every time they paint something, it either doesn't change color, or get a color different from what the customer ordered. I would watch that.
Are there any other issues or advantages that you're able to see that would be unique to movies? While I'm unable to think of a use, seems like there's the potiental for side channel analyis via music, dialog, background-noises, scripts, etc.
Script analysis would take much more effort probably, but it could work if the colour is in the movie script...in that case object detection gets important as well. Just using more training data to cover all the dog breeds and car colours would help a bit though.