
anileated · 3y ago

There’s a reason scraping is a legally grey area.

> Web scraping is legal, US appeals court reaffirms

First, the case is not closed. [0]

Second, to draw an analogy, you can use scraping in the same way you can use a computer: for legal purposes. That is, you cannot use scraping to violate copyright, just as you cannot use a computer to violate copyright.

What follows is my conjecture (IANAL): there is fair use and there is copyright violation, and scraping can serve either. It does not automatically make you a criminal, but neither is it automatically OK. If what you do is demonstrably fair use, presumably you’d be fine; but OpenAI, with its products, cannot prove fair use in principle (and arguably the use stops being fair at the point where it compiles works with intent to profit).

[0] https://news.ycombinator.com/item?id=31079231


cykros · 3y ago

It seems the issue with scraping as it pertains to copyright isn't the scraping itself, any more than buying a book to sell off cheap photocopies of it indicates a problem with buying books. The issue is the copying and, more importantly, the distribution of those copies.

Fair use of course being the exception.

Now, accessing things like credentials left in unsecured AWS buckets is the bigger area where courts are less likely to recognize the legality of scraping. Never mind the fact that these people literally published their private data on a globally accessible platform in a public fashion. I'm not a lawyer, but I've seen reports of this leaning both directions in court, and yes, I've seen wget listed as a "hacker tool."

This is what happens when feelings matter more to the legal system than principles.

And before it's brought up, I may as well point out that no, I don't condone the actual USE of obviously private credentials found in an AWS bucket any more than I condone the use of a credit card that one may find on the sidewalk. Both are clearly in the public sphere, unprotected, but for both there is a pretty good expectation that someone put it there by accident, and that it's not YOUR credential to use.

Basically, getting back to the OP, ChatGPT hasn't done anything I've seen that'd constitute copyright infringement -- fair use seems to apply fairly well. As for the ad-supported model, adblockers did this all first. If you wanted to stop anything accessing your site that didn't view ads, there are solutions out there to achieve this. Don't be surprised when it chases away a good amount of traffic though -- you're likely serving up ad-supported content because it's not content you expected your users to pay for to begin with.

faktory · 3y ago

Yes but that's a technical issue. I took the parent as making a philosophical point and responded in that spirit.

williamcotton · 3y ago

Wouldn’t it be nice if the people on these forums were not ignorant of both philosophy and the legal system before diving into incoherent conversations about both at the same time, where the main thrust is the emotions they have about these tools?

anileated (OP) · 3y ago

One can dream.

faktory · 3y ago

yup
