Ask HN: How to Get Sports Data?

4 points · 49yearsold · 5y ago · 12 comments

I am thinking of creating a free REST API service that will allow users to list tournaments, schedules, match results, etc. for basketball, cricket, and badminton. How do I get the data into my database? Do I need to purchase it from someone, or can I simply scrape different websites to get the data I want?

12 comments

JudgePenitent · 5y ago

If you want to give this data away for free, that's noble of you, but it might trigger lawsuits.

In general, larger players have connections to private APIs that come right from the field. Sports betting companies, ESPN, etc. all pay for expensive connections to get live data.

The historical data will be time-consuming to collect but is most likely legally safe.

It's the live match results data that you should not scrape, in my opinion. Technically it will take some time to set up, but it's possible. It's the legal part of all this that I would not recommend.

If I were you, I'd make two services: free historical data and a paid live data API. I don't know much about sports data APIs in particular, but my question is: why isn't this already out there? Probably because it's very expensive to get enough feeds to have a live API anyone wants to use.

runawaybottle · 5y ago

Don’t purchase anything if it isn’t for live play-by-play feeds. That can run you almost 500-1k a year per sport (so it will add up).

A personal solution I’ve been considering is a scraper that knows how to scrape across a few different sites (so you have fallbacks). This can be used to scrape real-time data and to distribute data fetching, which avoids rate limits and reliance on one site (let’s say CBS Sports).
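A minimal sketch of the fallback idea above, assuming each site has its own fetcher function that either returns a score dict or raises on failure. The names `fetch_with_fallback`, `site_a`, and `site_b` are hypothetical stand-ins, and no real site is contacted:

```python
from typing import Callable, Optional

# Hypothetical fetcher type: takes a match id, returns a score dict,
# or raises on failure (rate limit, layout change, timeout).
Fetcher = Callable[[str], dict]


def fetch_with_fallback(match_id: str,
                        fetchers: list[tuple[str, Fetcher]]) -> Optional[dict]:
    """Try each source in order and return the first successful result."""
    for name, fetch in fetchers:
        try:
            result = fetch(match_id)
        except Exception as exc:  # a real scraper would catch narrower errors
            print(f"{name} failed: {exc}")
            continue
        # Record which source answered so bad data can be traced later.
        return {**result, "source": name}
    return None  # every source failed


# Stand-in fetchers for illustration only.
def site_a(match_id: str) -> dict:
    raise TimeoutError("rate limited")


def site_b(match_id: str) -> dict:
    return {"match_id": match_id, "home": 101, "away": 98}


result = fetch_with_fallback("game-1", [("site_a", site_a), ("site_b", site_b)])
```

Rotating the order of `fetchers` between calls would be one simple way to spread requests across sites, though that is an assumption about how the load would be distributed, not something the sketch does.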

For historical data, you’ll have to get familiar with wikis or specific sports sites that have archives (they are out there). It’s a perfect job to outsource to anyone, really, if you don’t feel like doing the tedious work.

I’d say scraping is the way to go, so long as your scraper knows how to get the same data from multiple sources. If you’ve got the cash, then just Google a sports data API.

49yearsold (OP) · 5y ago

Thanks. But I am still curious how ESPN gets the schedules for all sports into their persistence store. Do the NFL, NBA, IPL, or other sports leagues provide them with APIs to fetch data? Does ESPN also have web scraping scripts that go after the NBA/NFL/MLB websites to get the data they need? Or do they have people manually entering all of this information into their own database via some UI?

JudgePenitent · 5y ago

I know a guy who works in sports betting software at a pretty large player. They pay a lot for bulk access to direct feeds from CBS Sports, NBC, etc. I don't know how much, but it's no small sum. ESPN does not use web scraping for live stats; they use high-speed APIs that get info right from the field (which either their own crew gathers, or CBS gathers and they pay for access). The overwhelming majority is not manually entered.

I also do some scraping in my spare time, and I can tell you that large companies will have... precautions in place to prevent you from scraping the by-the-minute sports stats they pay millions for. I'd also say that right now scraping is a gray market. Airlines in particular have proven that they are willing to fight for their (public) data, so tread carefully.

Scraping is a problematic way to do this too, because how can you verify what is up to date? You have to rely on one single authoritative source; otherwise you'll be relying on some stat calc which “guesses” what the right stat is. Additionally, if you take live data that multiple large businesses feel they own, they can split the lawyering fees against you, and even if they lose, it will be cheaper for them together.
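To make the trade-off above concrete, here is a rough sketch of what such a "stat calc" could look like: it trusts a designated authoritative source when one has reported, and otherwise falls back to a majority vote that refuses to answer without enough agreement. The function name `reconcile`, the `min_agreement` threshold, and the source names are all assumptions for illustration:

```python
from collections import Counter
from typing import Hashable, Optional


def reconcile(values_by_source: dict[str, Hashable],
              authoritative: Optional[str] = None,
              min_agreement: int = 2) -> Optional[Hashable]:
    """Pick one value from several scraped reports of the same stat."""
    # Prefer the authoritative source whenever it reported a value.
    if authoritative is not None and authoritative in values_by_source:
        return values_by_source[authoritative]
    if not values_by_source:
        return None
    # Otherwise "guess": take the most common value, but only if enough
    # sources agree. Ties go to whichever value was seen first.
    value, count = Counter(values_by_source.values()).most_common(1)[0]
    return value if count >= min_agreement else None


# Scores as (home, away) tuples reported by three hypothetical sites.
reports = {"site_a": (101, 98), "site_b": (101, 98), "site_c": (99, 98)}
consensus = reconcile(reports)
trusted = reconcile(reports, authoritative="site_c")
```

Note that the vote can still be wrong when most sources are stale in the same way, which is the weakness the comment points at.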

The historical content would probably be trivial to get, and its hosts most likely don't care if you have it. It's the recent/live data that may be particularly difficult to take. In my professional opinion, this is two separate projects: a paid, up-to-date version with live stats and a free historical wiki.

akudha · 5y ago

Are there any legal problems with scraping sports data?

quickthrower2 · 5y ago

The Betfair API might help, but check their terms of service to make sure you are not breaking them.

49yearsold (OP) · 5y ago

How does Betfair get this data into their system? Who provides them with this data, or do they too write their own web scraping application to collect and consolidate it?

quickthrower2 · 5y ago

It's probably manual, because they need to take bets and decide what to pay out on. If they get it wrong, they lose a lot of money and/or credibility. To be totally sure, DYOR.

There is no way they can trust scraped data, especially when you can have different matches with only slightly different names. For example, the U21 version of A vs B for a given sport (learned that the hard way when arbing!).
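As a sketch of the naming problem above, one way to keep "A vs B" and its U21 version apart is to build a match key that treats the age group as its own field instead of comparing raw titles. The helper `match_key`, the "senior" default, and the title formats are assumptions; real feeds name fixtures in many more ways than this handles:

```python
import re

# Matches age-group tags such as "U21", "u19", or "U-23".
AGE_GROUP = re.compile(r"\bU-?(\d{2})\b", re.IGNORECASE)
# Matches the separator between teams: " v ", " vs ", or " vs. ".
VERSUS = re.compile(r"\s+v(?:s\.?)?\s+", re.IGNORECASE)


def match_key(title: str, sport: str) -> tuple:
    """Build a comparable key: (sport, age group, sorted team names)."""
    age = AGE_GROUP.search(title)
    age_group = f"U{age.group(1)}" if age else "senior"
    # Remove the age tag so it does not pollute the team names.
    stripped = AGE_GROUP.sub("", title)
    teams = [" ".join(t.split()).lower() for t in VERSUS.split(stripped)]
    # Sort so that "A vs B" and "B vs A" produce the same key.
    return (sport.lower(), age_group, tuple(sorted(teams)))


senior = match_key("Team A vs Team B", "cricket")
under21 = match_key("Team A U21 v Team B U21", "cricket")
```

Here `senior` and `under21` share the same teams and sport but differ in the age-group field, so a naive title match would not silently merge them.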

snyena · 5y ago

Would it be possible to follow your progress somehow?
