The central limit theorem tells us that, in practice, Gaussian distributions can be expected to be quite common, which makes the Gaussian distribution a good first guess. Least squares gives the ML estimate for Gaussian residuals. I don't find this connection very direct, though, and the existence of a rationale doesn't mean that the rationale is what actually drives the usage.
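For reference, the link between least squares and the Gaussian ML estimate can be sketched roughly as follows. The notation is an assumption for illustration and not from the thread: a model $f(x_i;\theta)$, observations $y_i$, and i.i.d. residuals $\varepsilon_i \sim \mathcal{N}(0,\sigma^2)$ with fixed $\sigma^2$.

$$
y_i = f(x_i;\theta) + \varepsilon_i, \qquad
\log L(\theta) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(y_i - f(x_i;\theta)\bigr)^2
$$

$$
\hat{\theta}_{\mathrm{ML}} = \arg\max_{\theta}\,\log L(\theta) = \arg\min_{\theta}\sum_{i=1}^{n}\bigl(y_i - f(x_i;\theta)\bigr)^2
$$

The first term of the log-likelihood does not depend on $\theta$, so under these assumptions maximizing the likelihood is the same as minimizing the sum of squared residuals. If the residuals are not Gaussian, or their variance is not constant, the equivalence need not hold.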
I mention the relation to the Gaussian distribution. Which part of the comment is incorrect?