Yes, until we get client-side transcoding, it is that difficult. Video files, even ones compressed with modern codecs, are really big, and the concepts are really hard for most people to grok. The knowledge is so arcane that if you have a basic grasp of ffmpeg you are already a wizard. Most people's eyes will glaze over the second you start talking about how they should use a different container, or codec, or whatever. That's why GIFs (for sub-one-minute videos) and Flash (for long videos and/or videos with sound) became the universal language of video; there was no more "Here's this video, but you have to install WMP/RealPlayer/QuickTime/BonziBuddy to see it!" In fact, Flash probably became the standard because it did such a good job of integrating with the rest of the browser and didn't shove obtrusive branding in the user's face.
The tl;dr is that support needs to be very nearly universal for this to work, and no one is interested in WebM. Google probably could've forced the issue with YouTube, and they initially claimed they were going to, but they chickened out for some reason. I've heard it's because VP8 didn't live up to expectations and that they'll renew the push when VP9 is done, but whatever the motive, the reality is that if you want HTML5 video to work, you must use H.264, as even Mozilla has been forced to admit.