The entire Linux world is fragmented. Luckily, things are getting better: systemd, Wayland, etc. aim to give developers a simple, standardized way to interface with parts of the system.
Both Firefox and mpv use the same decoder for YouTube (ffvp9). With accelerated layers forced on and the video in fullscreen, you should see roughly comparable performance. In windowed mode, Firefox is currently expected to be somewhat more taxing than mpv, because all of the extra window contents are composited every frame.