At a company with tens of thousands of employees, you have everyone's interview scores, performance feedback and promotion history in a database. You also have the interview scores of everyone who failed the interview process. Put a statistician on that for a day and you will get a lot of statistically significant findings about your hiring pipeline.
It is not hard to do; the data just isn't public, and it never will be. That is why public researchers will always lag behind private ones: the private ones have access to the interesting data.
Edit: Also, this is not just a theory; I have seen internal studies on this myself.