How to Scrape 20M Records from GitHub
(opensourcewatch.io)
6 points · cglee · 9y ago · 7 comments
minimaxir
9y ago
Yikes, HTML scraping? That's bad/against TOS, especially when GitHub has a robust API.
michaelrm
9y ago
:) We explored using the API. Unfortunately, at the volume of data we're pulling down, the API isn't feasible because of the number of authorized API keys it would require.
minimaxir
9y ago
And you don't see this as unethical?