This simply isn't a good-faith take, because you're straw-manning the implementation of the query that the original poster put forward. They aren't asserting that the AI would need to do supernatural, super-real-time decoding of MPEG-encoded files. What if the AI had already seen them, and was able to encode the information it needs, in the compressed way LLMs typically do, so that it could answer questions like that without re-decoding the original movies?
This raises many valid questions: how data is structured within an LLM, how large LLMs may eventually become, and what systems should orbit around the LLM (does it make more sense for LLMs to watch YouTube videos, or to have already watched them?).
My definition of AI is the same one Nick Bostrom uses in his 2014 book Superintelligence. There's no moving of goalposts; the goalposts have been set in cement since 2014. Achieving human-level parity has obviously only been a "goal" insofar as it's a 10-millisecond stop on the gradient toward superintelligence. OpenAI is not worth $150 billion because it purports to be building a human-and-nothing-more in a box.