undefined | Better HN

0 pointsdebugnik8mo ago0 comments

> as AI scrapers bother implementing the PoW

That's what it's for, isn't it? Make crawling slower and more expensive. Shitty crawlers not being able to run the PoW efficiently or at all is just a plus. Although:

> which is trivial for them, as the post explains

Sadly the site's being hugged to death right now so I can't really tell if I'm missing part of your argument here.

> figure out that they can simply remove "Mozilla" from their user-agent

And flag themselves in the logs to get separately blocked or rate limited. Servers win if malicious bots identify themselves again, and forcing them to change the user agent does that.

0 comments

throwawayffffas8mo ago

> That's what it's for, isn't it? Make crawling slower and more expensive.

The default settings produce a computational cost of milliseconds for a week of access. For this to be relevant it would have to be significantly more expensive to the point it would interfere with human access.

mfost8mo ago

I thought the point (which the article misses) is that a token gives you an identity, and an identity can be tracked and rate limited.

So a crawlers that goes very ethically and does very little strain on the server should indeed be able to crawl for a whole week on a cheap compute, one that hammers the server hard will not.

throwawayffffas8mo ago

Sure but it's really cheap to mint new identities, each node on their scrapping cluster can mint hundreds of thousands of tokens per second.

Provisioning new ips is probably more costly than calculating the tokens, at least with the default difficulty setting.

seba_dos18mo ago

...unless you're sus, then the difficulty increases. And if you unleash a single scrapping bot, you're not a problem anyway. It's for botnets of thousands, mimicking browsers on residual connections to make them hard to filter out or rate limit, effectively DDoSing the server.

Perhaps you just don't realize how much did the scraping load increase in the last 2 years or so. If your server can stay up after deploying Anubis, you've already won.

dale_glass8mo ago

How is it going to hurt those?

If it's an actual botnet, then it's hijacked computers belonging to other people, who are the ones paying the power bills. The attacker doesn't care that each computer takes a long time to calculate. If you have 1000 computers each spending 5s/page, then your botnet can retrieve 200 pages/s.

If it's just a cloud deployment, still it has resources that vastly outstrip a normal person's.

The fundamental issue is that you can't serve example.com slower than a legitimate user on a crappy 10 year old laptop could tolerate, because that starts losing you real human users. So if let's say say user is happy to wait 5 seconds per page at most, then this is absolutely no obstacle to a modern 128 core Epyc. If you make it troublesome to the 128 core monster, then no normal person will find the site usable.

throwawayffffas8mo ago

It's not really hijacked computers, there is a whole market for vpns with residential exit nodes.

The way i think it works is they provide free VPN to the users or even pay their internet bill and then sell the access to their ip.

The client just connects to a vpn and has a residential exit IP.

The cost of the VPN is probably higher than the cost for the proof of work though.

seba_dos18mo ago

> How is it going to hurt those?

In an endless cat-and-mouse game, it won't.

But right now, it does, as these bots tend to be really dumb (presumably, a more competent botnet user wouldn't have it do an equivalent of copying Wikipedia by crawling through its every single page in the first place). With a bit of luck, it will be enough until the bubble bursts and the problem is gone, and you won't need to deploy Anubis just to keep your server running anymore.

shkkmo8mo ago

The explanation of how the estimate is made is more detailed, but here is the referenced conclusion:

>> So (11508 websites * 2^16 sha256 operations) / 2^21, that’s about 6 minutes to mine enough tokens for every single Anubis deployment in the world. That means the cost of unrestricted crawler access to the internet for a week is approximately $0.

>> In fact, I don’t think we reach a single cent per month in compute costs until several million sites have deployed Anubis.

kbelder8mo ago

If you use one solution to browse the entire site, you're linking every pageload to the same session, and can then be easily singled out and blocked. The idea that you can scan a site for a week by solving the riddle once is incorrect. That works for non-abusers.

shkkmo8mo ago

Well, since they can get a unique token for every site every 6 minutes only using a free GCP VPS that doesn't really matter, scraping can easily be spread out across tokens or they can cheaply and quickly get a new one whenever the old one gets blocked.

hiccuphippo8mo ago

Wasn't sha256 designed to be very fast to generate? They should be using bcrypt or something similar.

throwawayffffas8mo ago

Unless they require a new token for each new request or every x minutes or something it won't matter.

And as the poster mentioned if you are running an AI model you probably have GPUs to spare. Unlike the dev working from a 5 year old Thinkpad or their phone.

_flux8mo ago

Apparently bcrypt has design that makes it difficult to accelerate effectively on a GPU.

Indeed a new token should be requested per request; the tokens could also be pre-calculated, so that while the user is browsing a page, the browser could calculate tickets suitable to access the next likely browsing targets (e.g. the "next" button).

The biggest downside I see is that mobile devices would likely suffer. Possible the difficulty of the challange is/should be varied by other metrics, such as the number of requests arriving per time unit from a C-class network etc.

debugnikOP8mo ago

That's a matter of increasing the difficulty isn't it? And if the added cost is really negligible, we can just switch to a "refresh" challenge for the same added latency and without burning energy for no reason.

Retr0id8mo ago

If you increase the difficulty much beyond what it currently is, legitimate users end up having to wait for ages.

debugnikOP8mo ago

And if you don't increase it, crawlers will DoS the sites again and legitimate users will have to wait until the next tech hype bubble for the site to load, which is the reason why software like Anubis is being installed in the first place.

1 more reply

therein8mo ago

I am guessing you don't realize that that means people using not the latest generation phones will suffer.

debugnikOP8mo ago

I'm not using the latest generation of phones, not in the slightest, and I don't really care, because the alternative to Anubis-like intersitials is the sites not loading at all when they're mass-crawled to death.

dcminter8mo ago

> Sadly the site's being hugged to death right now

Luckily someone had already captured an archive snapshot: https://archive.ph/BSh1l

j / k navigate · click thread line to collapse

0 comments

throwawayffffas8mo ago

> That's what it's for, isn't it? Make crawling slower and more expensive.

mfost8mo ago

I thought the point (which the article misses) is that a token gives you an identity, and an identity can be tracked and rate limited.

So a crawlers that goes very ethically and does very little strain on the server should indeed be able to crawl for a whole week on a cheap compute, one that hammers the server hard will not.

throwawayffffas8mo ago

Sure but it's really cheap to mint new identities, each node on their scrapping cluster can mint hundreds of thousands of tokens per second.

Provisioning new ips is probably more costly than calculating the tokens, at least with the default difficulty setting.

seba_dos18mo ago

Perhaps you just don't realize how much did the scraping load increase in the last 2 years or so. If your server can stay up after deploying Anubis, you've already won.

dale_glass8mo ago

How is it going to hurt those?

If it's just a cloud deployment, still it has resources that vastly outstrip a normal person's.

throwawayffffas8mo ago

It's not really hijacked computers, there is a whole market for vpns with residential exit nodes.

The way i think it works is they provide free VPN to the users or even pay their internet bill and then sell the access to their ip.

The client just connects to a vpn and has a residential exit IP.

The cost of the VPN is probably higher than the cost for the proof of work though.

seba_dos18mo ago

> How is it going to hurt those?

In an endless cat-and-mouse game, it won't.

shkkmo8mo ago

The explanation of how the estimate is made is more detailed, but here is the referenced conclusion:

>> In fact, I don’t think we reach a single cent per month in compute costs until several million sites have deployed Anubis.

kbelder8mo ago

shkkmo8mo ago

hiccuphippo8mo ago

Wasn't sha256 designed to be very fast to generate? They should be using bcrypt or something similar.

throwawayffffas8mo ago

Unless they require a new token for each new request or every x minutes or something it won't matter.

And as the poster mentioned if you are running an AI model you probably have GPUs to spare. Unlike the dev working from a 5 year old Thinkpad or their phone.

_flux8mo ago

Apparently bcrypt has design that makes it difficult to accelerate effectively on a GPU.

debugnikOP8mo ago

Retr0id8mo ago

If you increase the difficulty much beyond what it currently is, legitimate users end up having to wait for ages.

debugnikOP8mo ago

1 more reply

therein8mo ago

I am guessing you don't realize that that means people using not the latest generation phones will suffer.

debugnikOP8mo ago

dcminter8mo ago

> Sadly the site's being hugged to death right now

Luckily someone had already captured an archive snapshot: https://archive.ph/BSh1l

j / k navigate · click thread line to collapse