Google's primary out here is its reputation (not guarantee) for obeying robots.txt. If Google indexed a page that disallowed it in robots.txt, the case would be much stronger. There's also the unofficial out, which is that judges think Google is a cool large company, so they rule in its favor based on their personal biases.
Fair use is a case-by-case basis, so you can't say that Google's infringing conduct is generally accepted to be fair use. The EFF had to take on Universal in Lenz v. Universal Music Corp., and that went up to the Ninth Circuit (the Supreme Court declined to hear it). That's how individuals are left to assert their fair use rights.
>Fair use is a case-by-case basis, so you can't say that Google's infringing conduct is generally accepted to be fair use.
There is so much wrong with this statement. For one, how can you call something infringing at the same time you point out that nothing has been proven? That simply defies all common logic.
Secondly, in general terms, the activities in question have been found to be non-infringing by the courts. Sure, fair use is case-by-case, but if you're operating within similar parameters as a previously litigated case, then the legal risk is immensely reduced.
I don't disagree with your assertion that the legal system greatly favours the well monied/connected (I don't think anyone would). But you can't claim it to be fact that Google Search is infringing anything with little to no evidence or rulings to cite. Unless you're just stating an opinion in which case you should clearly indicate that.
Fair use is an affirmative defense. Google admits that it copies content without legal license to do so, but claims that said copies are non-infringing under fair use exemptions. I guess you're probably correct that it's no longer appropriate to refer to Google's behavior specifically as "infringing", just "copying without authorization", which, for those of us without $5 million to commit to a legal team, means "infringing". I will try to remember the special standard of law which has been allowed to Google and refer to their copying only as "unauthorized" and not "infringing" in the future.
If you review the points summarized in the Wikipedia articles you helpfully linked, you'll see that Google's defense is mostly "Yeah, but we're Google".
In Field, "the court found that the plaintiff had granted Google an implied, nonexclusive license to display the work because of Field’s failure in using meta tags to prevent his site from being cached by Google.", i.e., because Field already knew Google existed and knew there was a standard way to prevent its access but chose not to employ it, he gave Google an implied license.
Who else does that work for? Can I send an email to Netflix and tell them "Hey, if you don't want me to copy your shows, please add this in your page's HEAD element: <meta name='please-dont-download-my-shows-sir'>"? No?
I understand there are other criteria which were used to decide if Google's use was specifically infringing in addition to the implied license. Just demonstrating that Google is getting favored treatment from the judiciary that would not be available to a normal entity.
In Perfect 10 [0], the judge even explicitly indicated that he was loath to find Google's use of thumbnails infringing because he didn't want to "impede the advance of internet technology", but that he felt the law obligated him to do so (his ruling in that matter was overturned on appeal, when the Ninth Circuit found Google's usage non-infringing). What if the defendant had been some company perceived as less technically advanced than Google? This is probably as close as you can get to an explicit statement of favoritism. The Ninth Circuit also rejected Perfect 10's claim that RAM copies were infringing (which was not the case with an unlucky non-Google company discussed further down).
What if I started indexing and rehosting thumbnails? I can assure you that I would get C&D'd almost immediately and I would be forced to shut down because I can't afford to pay lawyers for 3 years while the case works through the system (and to be honest, I'm surprised it only took 3 years). And even if I could, with a reputation less sterling than Google's, there's no reason to believe that a judge would rule in favor of one useless guy instead of a big company. A judge would look at the case and say "Google's use was fair because it provided a public service [actually cited as part of the justification in most of your linked cases], but this guy is just using it for a few hundred people, it's definitely unfair, he owes that company more money than he'll make in his life, case dismissed".
There are many such cases on the books. I don't know if Google has a direct connection to the reptilian overlords or what, but it seems in most cases where they're not involved, the good side loses.
In Craigslist v. 3Taps, while primarily a CFAA case, 3Taps was found to be infringing copyrights by sampling Craigslist postings in order to allow its clients to plot them on a map. Being a "public service" or a "referential use" didn't matter for them. They were raked over the coals, and it's been that way with most cases.
In Ticketmaster v. RMG Technologies [1], RMG was found to infringe just by parsing a page. "Defendant's direct liability for copyright infringement is based on the automatically-created copies of ticketmaster.com webpages that are stored on Defendant's computer each time Defendant accesses ticketmaster.com. [...] Defendant contends [...] that such copies could not give rise to copyright liability because their creation constitutes fair use[.] [...] Defendant's fair use defense fails."
The case specifically discusses how, despite the precedent in Perfect 10, since the Defendant is not Google, it is bound by a site's Terms of Use and copyright law, and RAM copies, which are specifically non-infringing for Google, were infringing for RMG.
Very similar findings were made in Facebook v. Power Ventures, and the founder was left holding a bag of $3 million in personal liability.
This is a thread about the legality of HN users scraping. It seems Google is the only entity capable of making unauthorized copies and then getting courts to agree that it's fair use. For the rest of us, it's infringement, which carries stiff penalties (and this doesn't even broach the CFAA portion of the issue).
So when I say "infringing", I mean something that would be considered infringing if you aren't Google. It's apparently only infringement if the judges involved don't personally use your site and don't have to worry about personally suffering the consequences of not having access to it. :)
[0] https://www.eff.org/document/perfect-10-v-google-ninth-circu...
[1] https://scholar.google.com/scholar_case?case=147697505884223...
>Can I send an email to Netflix and tell them "Hey, if you don't want me to copy your shows, please add this in your page's HEAD element: <meta name='please-dont-download-my-shows-sir'>"?
Actually, under fair use you certainly can make a personal copy (see Betamax case). If you distribute the work you would likely run afoul of the criteria summarized above.
The relevance of robots.txt is being overstated in your argument. The main criteria used in this case are summarized above. The fact that Google provides an opt-out mechanism is a secondary, supporting argument.
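For anyone in this thread who scrapes, the opt-out mechanism is at least easy to honor in code. Here's a minimal sketch, assuming Python's standard `urllib.robotparser`; the robots.txt content, the `example-bot` user agent, the example.com URLs, and the `may_fetch` helper are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical crawler name and URLs.
allowed = may_fetch(ROBOTS_TXT, "example-bot", "https://example.com/public/page.html")
blocked = may_fetch(ROBOTS_TXT, "example-bot", "https://example.com/private/page.html")

print(allowed, blocked)
```

In a real crawler you would load the site's actual file with `set_url()` and `read()` instead of a hard-coded string. This only covers the mechanics of the opt-out, not the fair use criteria discussed above.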
>What if I started indexing and rehosting thumbnails? I can assure you that I would get C&D'd almost immediately
A determination of infringement would depend entirely on the context as related to the aforementioned criteria. The fact that someone might try to sue is a product of the terrible system in general and you're absolutely right: as with any legal matter, the entity with the deeper pockets can often bully the other guy into submission.
>In Craigslist v. 3Taps, while primarily a CFAA case, 3Taps was found to be infringing copyrights
My understanding is that the copyright part of the case was thrown out [1] and thus was settled solely around CFAA matters.
>In Ticketmaster v. RMG Technologies, RMG was found to infringe just by parsing a page.
I agree that the logic used for the judgement is absurd (for reasons that are plainly obvious to any HN user). But it's less clear whether the case would meet the fair use criteria outlined above should it have come to that. My guess is that it wouldn't qualify, since the usage affects the copyright holder's ability to make money on the work and doesn't meet any of the other criteria for fair use.
>Facebook v. Power Ventures
This is not a case involving a defense of fair use (as far as I can tell). Facebook even acknowledged the users owned the data and had a right to it. The defendant was actually found to be violating the CFAA and the CAN-SPAM Act.
>It seems Google is the only entity capable of making unauthorized copies and then getting courts to agree that it's fair use. For the rest of us, it's infringement
Provably false [2]. It sounds like perhaps your personal experience has soured your opinion on the matter? That's understandable. But none of the evidence you've cited supports the argument that Google is infringing copyrights in its core activities nor that Google is the only entity where copyright laws and fair use legislation don't apply.
PS: To be clear, my argument revolves specifically around copyright infringement and fair use. I don't have enough understanding of other, separate legislation like the CFAA to comment on that, except to say that it seems overly broad and unrealistic. But that's another topic. I'm specifically arguing against calling Google a copyright infringer in a broad sense, which is what you've done. That's not been proven.
[0] https://en.wikipedia.org/wiki/Fair_use#U.S._fair_use_factors
[1] https://techcrunch.com/2013/04/30/craigslist-3taps-lawsuit-d...
[2] http://fairuse.stanford.edu/overview/fair-use/cases/