I know it's fun to hate on XML, but compared to inventing a new pseudo-text, pseudo-binary format, its parsing mechanics are well understood.
I'm not claiming all of PDF's woes come from its encoding, but the encoding's share isn't zero, either. Start with the fact that XML documents have XML Schema, which lets you formally specify what can and cannot appear where. The PDF specification is a bunch of English prose, which makes for shitty constraint boundaries.