It’s a dual-core MCU that runs at around 240 MHz.
One of the main limitations tends to be pushing pixels to the screen - most displays are serial (SPI) based, which tops out at 80 MHz.
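To put a rough number on that, here is a back-of-envelope sketch of the best-case full-screen refresh rate over SPI. The resolutions and the 16 bits per pixel (RGB565) are my assumptions for typical small displays, and it ignores command overhead and gaps between transfers, so real figures will be lower.

```python
def max_fps(width, height, bits_per_pixel=16, spi_hz=80_000_000):
    """Upper bound on full-frame refreshes per second over a single-lane SPI link.

    Assumes one bit per clock (plain SPI), RGB565 pixels by default, and
    zero protocol overhead, so this is a ceiling rather than a prediction.
    """
    bits_per_frame = width * height * bits_per_pixel
    return spi_hz / bits_per_frame


# Hypothetical display sizes, chosen only for illustration.
for w, h in [(240, 240), (320, 240), (480, 320)]:
    print(f"{w}x{h}: {max_fps(w, h):.1f} fps max")
```

So a 320x240 panel is capped at about 65 fps even before any decoding happens, and a 480x320 one at about 33 fps.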
The other limitation is a lack of hardware video decoding. Some of the latest ESP32 chips are RISC-V based and have SIMD instructions, which can improve decode performance.
But if you pick an easy-to-decode codec (MJPEG is popular), then you can get decent performance with the size of displays available.
https://youtu.be/2NLblyCvJBU?si=_c-ycaS4cNZEJBaD
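Part of why MJPEG is easy to handle is that every frame is an independent JPEG, with no dependency on earlier frames. As a rough sketch (not necessarily what the video above does), this is how you could split a raw stream of concatenated JPEGs into frames on the host side. `split_mjpeg` is a name I made up, and it assumes a bare stream rather than an AVI container, with no embedded thumbnails that would contain their own end marker.

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def split_mjpeg(stream: bytes):
    """Yield each complete JPEG frame found in a raw concatenated-JPEG stream.

    Naive marker scan: assumes no frame carries an embedded thumbnail,
    which would introduce an early end-of-image marker.
    """
    pos = 0
    while True:
        start = stream.find(SOI, pos)
        if start < 0:
            return
        end = stream.find(EOI, start + len(SOI))
        if end < 0:
            return  # trailing frame is incomplete, so stop here
        yield stream[start:end + len(EOI)]
        pos = end + len(EOI)


# Tiny synthetic example: two fake "frames" plus a truncated third.
data = b"junk" + SOI + b"AAA" + EOI + SOI + b"BB" + EOI + SOI + b"partial"
print(len(list(split_mjpeg(data))))
```

Each frame it yields can be handed to a JPEG decoder on its own, which is what keeps the decode loop simple.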