See: https://www.quora.com/Is-website-scraping-legal-and-ethical
But I don't necessarily want to go down this rabbit hole here, as it detracts from the interesting technical issues outlined in the article. If you haven't read the entire article, I recommend that you do, because the authors had to overcome many interesting technical challenges while working on this.
The article actually mostly talks about how they scaled working with the accumulated data, and only a little about scaling the scraper.
Just as the interesting part about Google is how they process and index the data rather than their crawler, the most interesting part of this article is actually not the scraping, but how they handled and processed all the data.
Edit: I found GitHub's policy on scraping. I hope the link brings some closure to this concern: https://help.github.com/articles/github-terms-of-service-dra...