... by externalizing the costs to a third party.
In general, I'm really surprised that they published this article. It's like they described exactly the data that somebody working on preventing scraping would need to block this traffic, in a totally unnecessary level of detail. (E.g. exactly which ASN this traffic would arrive from, the very specific timing of their traffic spikes, and the kind of multi-city searches that probably see almost no organic traffic.)
I just don't get it. It's like they're intentionally trying to get blocked so that they can write a follow-up "how Google blocked our bootstrapped business" blog post.
I'm always surprised by the level of ignorance, but I've seen more than one startup burn because the founders didn't understand which taxes were due and, thus, failed to account for them in their pricing.
> I'm always surprised by the level of ignorance
Such as the ignorance displayed in your comment?
It's not illegal. Google can sue them and bury them in court fees and potentially win a civil suit, but it sure as hell isn't illegal.
I do think that if they ever get traction they'll have a lot of problems - there's a reason GDS access to flight availability is slow, expensive, and difficult to implement well. Scraping definitely won't scale.
The article mentions that they are using rotating residential proxies.
I am very interested in what a 'rotating residential proxy' is. Are they routing requests through random people's internet connections? Are these people willing participants? Where do they come from?
A residential proxy is listed as an "IP address provided by an Internet Service Provider", but I still don't really understand how they get access to them. ISPs have to be selling them access, right?
Providers of the 'free' Hola VPN.
"80M+ Monthly devices hosting Luminati's SDK" & "100% Peers chose to opt-in to Luminati's network" (https://luminati.io/network-details)
There is a 0% chance that 80M+ people are agreeing "I am OK with Luminati selling access to my home internet connection to any party able to pay", which seems like an honest description of their business model. More likely Luminati is paying unscrupulous app developers to include this SDK in their apps, and some of those developers bury legalese in 10,000-word install-time agreements that no one reads.
Which is what this guy could have done, instead of behaving like pond scum. It’s not like it’s particularly complicated to get programmatic access to a GDS API, that’s what they’re there for.
> Brisk Voyage finds cheap, last-minute weekend trips for our members. The basic idea is that we continuously check a bunch of flight and hotel prices, and when we find a trip that’s a low-priced outlier, we send an email with booking instructions.
Edit: OK, this could actually be interesting, at least for a short while :)
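For anyone wondering what "a low-priced outlier" might mean in practice, here is a minimal sketch, assuming you keep a history of past prices for one route and flag a new price that sits far below the median, measured in units of the median absolute deviation (MAD). The function name, the threshold `k=3.0`, and the minimum sample size of 5 are hypothetical choices for illustration, not Brisk Voyage's actual method.

```python
from statistics import median


def is_low_price_outlier(history: list[float], current: float, k: float = 3.0) -> bool:
    """Return True if `current` is unusually low relative to `history`.

    Hypothetical sketch: uses the median and the median absolute deviation
    (MAD), which are robust to the occasional extreme fare in the history.
    """
    # Hypothetical minimum sample size; too few prices gives no useful baseline.
    if len(history) < 5:
        return False
    med = median(history)
    mad = median([abs(p - med) for p in history])
    if mad == 0:
        # All historical prices are identical; any lower price stands out.
        return current < med
    # How many MADs below the median the current price sits.
    return (med - current) / mad > k
```

With a history like `[420, 410, 435, 400, 445, 415, 430]` (median 420, MAD 15), a fare of 250 is more than 11 MADs below the median and gets flagged, while 400 is not. The median/MAD choice is just one robust option; a plain mean and standard deviation would be skewed by any earlier deal already in the history.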
[1] https://www.eff.org/deeplinks/2019/09/victory-ruling-hiq-v-l...
Blogging about it publicly, as they're doing it: that may appear newish, but I'm sure some other startup did that 15 years ago.
> As a user of the Site, you agree not to:
> 1. Systematically retrieve data or other content from the Site to create or compile, directly or indirectly, a collection, compilation, database, or directory without written permission from us.
It's data scraping/middlemen all the way down... I wonder if Google indexes their scrape results to throw some loops into the mix.
When you block small scrapers from your site but permit giants like Googlebot and Bing, all you're doing is locking in a monopoly that's bad for everyone.
Google does not need anyone's permission to scrape publicly accessible data, and they are not required to follow any opt-out requests.
(Disclaimer: I work for Priceline.)
I wonder if that could already put them on Google's radar. If so, Google would probably send a cease-and-desist letter and this startup would simply give up.
I wonder if Google would also demand their legal expenses? Probably a couple thousand dollars?
I know, nobody would go to court against Google - but what would happen if this did go to court? Which laws would Google cite to deem this illegal?