I agree the YouTube example isn't the worst one, but I don't agree with you that there is nothing buggy or janky about it.
https://tonsky.me/blog/every-frame-perfect/youtube@1x.png?t=...
There is no logical reason for there to be two copies of the video rendered at once. The video is literally resizing into position, while all of the UI elements shift around it. Why would there be more than one copy of the thing that is resizing?
I will relent on only one thing: "If I take a screenshot of your app at any moment, it must make sense" is too strong a statement on its own. The context that it is a screenshot of an animation is important, just as it is for inbetween frames in a cartoon. However, I think if you're being generous with interpretation, you can allow this to be implied.