I think the main problem is that XUL is like HTML, but not. People's brains are already wired to understand HTML, so when something looks identical to it but has its own set of quirks, it takes a lot of time to get used to. It's as if someone took C++ and changed just enough stuff that you wouldn't notice until you tried to run it.
How many HTML tags do you recognize here: https://developer.mozilla.org/en/XUL_Reference ?
Lots of the tags are somewhat historical because CSS wasn't as powerful back in 2002 as it is today. Most people starting to hack XUL won't know the difference between a bbox and a deck tag.