Empirical validation did not really take off until the late 2000s.
https://hci.stanford.edu/publications/bds/4p-guidelines.html
Hmmm, I don't quite see where that supports "Apple didn't do empirical validation"? Is it just that it doesn't mention empirical validation at all, instead focusing on designer-imposed UI consistency?
ISTR hearing a lot about how the Mac team did user research back in the 1980s, though I don't have a citation handy. Specific aspects like the one-button mouse and the menu bar at the top of the screen were derived by watching users try out different variations.
I take that to be "empirical validation", but maybe you have a different / stricter meaning in mind?
Admittedly the Apple designers tried to extract general principles from the user studies (like "UI elements should look and behave consistently across different contexts") and then imposed those as top-down design rules. But it's hard to see how you could realistically test those principles. What's the optimal level of consistency vs inconsistency across an entire OS? And is anyone actually testing that sort of thing today?
I cannot shake the feeling that the guidelines were simply used to brand Apple as user-friendly in the 90s, and that Apple never actually adopted their principles -- it just invoked them when it fit the company's marketing.
I personally think Apple did follow its own guidelines pretty closely in the 90s, but in the OS X era they've been gradually eroded. iOS 7 in particular was probably a big inflection point -- I think that's when many formerly crucial principles, like borders around buttons, were dropped.