In load testing there's a difference between testing whether something performs as specified and testing where it breaks. I'm pretty sure GitHub has tests that validate their performance specification, and they may well have tests far in excess of it. But they may not have tested where it breaks. They may have had a discussion like "what if someone tries such and such?", and the answer may have been: we have good monitoring, we'll catch it before it gets out of hand and lock out the user in question (which is ultimately what they did). In other words, spending engineering resources on finding the breaking point may not have been a priority. I'm not saying I would agree with that in all circumstances, but it's their determination to make.
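
To make the distinction concrete, here's a toy sketch in Python. It doesn't touch any real service: `simulated_latency_ms` is a made-up latency model, and the numbers (a 500-user capacity, a 200 ms limit) are assumptions purely for illustration, not anything GitHub has published. The first function answers "does it meet the spec?", the second keeps ramping load until the limit is exceeded.

```python
from typing import Optional


def simulated_latency_ms(concurrent_users: int) -> float:
    """Hypothetical latency model: flat up to capacity, then climbs steeply."""
    capacity = 500   # assumed capacity of the toy service
    base_ms = 50.0   # assumed latency while under capacity
    if concurrent_users <= capacity:
        return base_ms
    return base_ms + 2.0 * (concurrent_users - capacity)


def meets_spec(spec_users: int, max_latency_ms: float) -> bool:
    """Spec validation: one check at the load the spec promises to handle."""
    return simulated_latency_ms(spec_users) <= max_latency_ms


def find_breaking_point(
    start_users: int, step: int, max_latency_ms: float, ceiling: int
) -> Optional[int]:
    """Breaking-point search: ramp load until the latency limit is exceeded.

    Returns the first load level that violates the limit, or None if the
    limit still holds at the ceiling.
    """
    users = start_users
    while users <= ceiling:
        if simulated_latency_ms(users) > max_latency_ms:
            return users
        users += step
    return None


if __name__ == "__main__":
    print(meets_spec(200, 200.0))                      # True
    print(find_breaking_point(100, 100, 200.0, 2000))  # 600
```

The spec check is cheap and passes here, yet it says nothing about what happens at 600 users. Only the ramp finds that, and against a real system the ramp is the expensive part.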