Are captchas and DDoS bot protection ruining the web?
I hate these "are you human" checks too, but when a persistent threat is poking your defenses and legitimate web traffic is only 10% - 20% of your server load... you have to do something.
So the alternative here to receiving a challenge is that the site would just be blocked in your country or for your network provider.
Would you prefer to be outright blocked, or is it ok to have an annoying "are you human?" challenge?
Pretty interesting story about a tiny IP-checking tool and how it sadly got out of hand due to abuse. The solution? Major sold the site to Cloudflare for $1. Kind of a shame overall.
This constant and unrelenting beating at your doors doesn't go away unless you add perimeter protection.
The options here are:
1) Block the IP addresses and CIDR ranges that are giving you trouble
2) Silently scan the connection request and block it when things look fishy
3) Provide a challenge in the return response that is difficult for bots to complete
Most of the bot protection on the internet is #2 where you don't notice you've been verified as a human and the site just loads. People hate #3 of completing a challenge, but the other option here is #1 where the site doesn't load at all.
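For what it's worth, option #1 is the easy one to sketch. Below is a minimal Python illustration of checking a client address against a list of blocked CIDR ranges using the standard `ipaddress` module. The `BLOCKED_NETWORKS` list and the `is_blocked` helper are hypothetical names, and the ranges are reserved documentation prefixes rather than real offenders. In practice this check usually lives in a firewall or load balancer, not in application code.

```python
import ipaddress

# Hypothetical blocklist. These are reserved documentation ranges
# (RFC 5737 and RFC 3849), used here only as placeholders.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.64/26"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked CIDR range."""
    addr = ipaddress.ip_address(client_ip)
    # Compare only against networks of the same IP version.
    return any(
        addr.version == net.version and addr in net
        for net in BLOCKED_NETWORKS
    )
```

The downside mentioned above still applies: everyone sharing a blocked range gets no page at all, whether they are a bot or not.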
I'd argue that bots are breaking the internet.
4) Provide a challenge in the return response that is impossible for anyone to complete
One way to see this one is to use Selenium to launch your browser. E.g., run this code in Python:
from selenium import webdriver
browser = webdriver.Chrome()
then when the browser launches, start using it manually to surf the web [1]. This works great on most sites I've visited this way, including my financial institutions. But if it hits a Cloudflare CAPTCHA it fails. For example, try this on fanfiction.net. It hits the browser check page if I try to go to any category or story page. I click the checkbox to tell it I'm real, get the challenge to identify the lions or whatever, do that until it is satisfied I really can identify lions... and then it just goes back to the browser check page. As far as I can tell it is just an endless loop of checking the box and identifying the things at that point.
There are some settings you can change in Selenium to somewhat hide from the site that Selenium is involved. For a while that allowed getting past the CAPTCHA, but it eventually stopped working.
There's also a project somewhere on GitHub to make a Selenium Chrome driver specifically designed to not trigger bot detection, which also worked for a while and then stopped.
[1] Why would I want a Selenium-launched browser if I'm going to be using it manually? It's for sites where I want to do some automated things on just some pages. For example one of my financial institutions has a lot of options on their transaction download page, so after I finish manually doing things like checking balances, looking at recent activity, paying bills and want to finish by downloading transactions, I can have the script that launched the browser handle that.
This is just #2 and #3 combined.
It sounds like this is working as intended: it wastes your time with unpassable captchas so that you don't spend that time figuring out how to get around their bot protection.
Another observation here is that you really shouldn't be hacking some scripts on top of your bank login. The banks know this and they are trying everything possible to dissuade you from doing this.
(Or is "success", for the anti-gun crowd, mostly about winning performative virtue contests in their own social media bubbles - while tens of thousands of "99.9% of 'em aren't like us, so we only pretend to care" people die?)
MEANWHILE, back at the Greatest Hypocrite Playoffs - I clicked on the link in a browser with cookies and js blocked. The article's web page (at Imgur.com) only says:
"If you're seeing this message, that means JavaScript has been disabled on your browser, please enable JS to make Imgur work."
(Privacy Badger & NoScript say they're blocking cookies from 6+ domains, and js from 12+ domains & subdomains. I know of Cloudflare-protected sites where allowing cookies from 1 domain and js from 2 subdomains are plenty to make them work right.)
You can blame me, I think. I haven't done it yet, but I can see the day coming where one of the sites I maintain will move behind Cloudflare. That site gets too much shit. The other day Little Bobby Tables browsed the site, and it wasn't the first time. I have to choose: Deal with the low-level shit or require javascript from a bunch of users who mostly have that enabled anyway. So blame me, and all the site operators who face the same choice.
If your feed reader periodically requests a feed, Cloudflare starts showing its JavaScript-based "checking your browser" thingy.
For people not using a CDN who want to keep bots off their static content, this can for now be partially accomplished by doing two things: forcing HTTP/2.0, and adding one iptables rule in the raw table to drop TCP SYN packets that do not have an MSS in the desired range. Most poorly written bots do not even bother to set MSS. I'd wager this is something CF looks at in their eBPF logic. Blocking non-HTTP/2.0 requests will drop all search engine crawlers except for Bing.
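To make the MSS check concrete, here is a rough Python sketch of the same decision the firewall rule makes: parse the options of a raw TCP header, pull out the MSS (option kind 2), and drop the SYN if it is missing or out of range. This is an illustration of the logic only, not the poster's actual iptables rule. The function names and the default 1220 to 1460 range are assumptions picked for the example; the comment above does not say which range was used.

```python
import struct
from typing import Optional


def syn_mss(tcp_header: bytes) -> Optional[int]:
    """Return the MSS option value from a raw TCP header, or None if absent."""
    if len(tcp_header) < 20:
        return None
    # Upper 4 bits of byte 12 hold the header length in 32-bit words.
    header_len = (tcp_header[12] >> 4) * 4
    if header_len < 20 or header_len > len(tcp_header):
        return None
    options = tcp_header[20:header_len]
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:        # End of option list
            break
        if kind == 1:        # NOP padding, single byte
            i += 1
            continue
        if i + 1 >= len(options):
            break
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break            # Malformed option, stop parsing
        if kind == 2 and length == 4:
            return struct.unpack("!H", options[i + 2:i + 4])[0]
        i += length
    return None


def should_drop_syn(tcp_header: bytes,
                    mss_min: int = 1220,   # assumed lower bound
                    mss_max: int = 1460    # assumed upper bound
                    ) -> bool:
    """Drop a SYN whose MSS is missing or outside [mss_min, mss_max]."""
    mss = syn_mss(tcp_header)
    return mss is None or not (mss_min <= mss <= mss_max)
```

A SYN carrying MSS 1460 passes, while one with no options at all gets dropped, which matches the observation that sloppy bots often don't set MSS.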
How did you set up Lua scripting to provide a hidden JS browser test with Nginx and HAProxy?
Most of these give site-wide examples, but one can run the Lua by location or other ACLs to protect specific resources, or to exclude specific resources from protection, e.g. RSS feeds.
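The scoping idea is independent of Lua, so here is a small Python sketch of just the decision: which request paths get the browser test and which are exempt. The prefixes and the `needs_challenge` name are made up for illustration. A real setup would express this as Nginx locations or HAProxy ACLs rather than application code.

```python
# Hypothetical path lists, chosen only for this example.
EXEMPT_PREFIXES = ("/feed", "/rss", "/atom.xml")   # never challenged
PROTECTED_PREFIXES = ("/login", "/search")         # challenged when not site-wide


def _under(path: str, prefix: str) -> bool:
    """True if path equals prefix or is nested below it (segment-aware)."""
    return path == prefix or path.startswith(prefix + "/")


def needs_challenge(path: str, site_wide: bool = True) -> bool:
    """Decide whether a request path should get the JS browser test."""
    path = path.split("?", 1)[0]  # ignore any query string
    if any(_under(path, p) for p in EXEMPT_PREFIXES):
        return False
    if site_wide:
        return True
    return any(_under(path, p) for p in PROTECTED_PREFIXES)
```

Exempting the feed paths is what keeps feed readers, which cannot run JavaScript, from hitting the check page.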
> bad traffic (about 20% of requests to my sites)
I have run many websites too and have not needed Cloudflare to deal with that "problem".
This is also controlled by the Cloudflare customer. If I'm having issues with my server due to fake/hostile traffic coming to my website, you're dang right I will do what it takes to stop it.
Every day we stray further from what the web could have been if we could have nice things.
Even Cloudflare's DNS product is just standard DNS protocol, sitting behind network-level DDoS protections. It's only HTTP where they tamper with the application layer.