I don't want to defend them, because they gate away a good chunk of the internet with their "bot protection", but unless you do PoW (which is also ecologically a nightmare), probably fingerprinting is the way to go - completely destroying the privacy of everyone involved.
Cromite, a privacy-conscious fork of Chromium for Android, constantly has issues with Cloudflare Turnstile [2] because they (Cloudflare) try to fingerprint it in multiple ways before letting it pass the challenge. The only way to get it to work would be to join the Cloudflare Browser Developer program - which requires signing an NDA. Rightfully so, the project maintainer didn't want to do it.
If you want to see the extent of what Cloudflare does to fingerprint browsers, just have a look at the issue [2] and see which flags need to be disabled in order for the browser to pass the Cloudflare challenge.
I understand both sides, but at least Cloudflare could be flexible enough to fall back to PoW instead of just blocking people from sending forms or accessing websites...
Tools are inherently amoral; only people can have motives we can celebrate or condemn.
Those might ignore it, but there are always alternatives.
Can you expand? I don't see a problem with some napkin math. 5W load for 2 seconds is about 0.0028Wh (we have to let smartphones pass, and not by doing PoW for tens of seconds). 8 billion checks a day for a year = roughly 8GWh.
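For anyone who wants to check the napkin math, here is a minimal sketch. The 5 W load, 2 s solve time and 8 billion checks a day are the figures from the comment above; the 365-day year and the function names are my own assumptions.

```python
# Napkin math for the energy cost of proof-of-work challenges.
# All inputs are rough assumptions, not measurements.
POWER_WATTS = 5.0        # assumed device load while solving
SOLVE_SECONDS = 2.0      # assumed time per challenge
CHECKS_PER_DAY = 8e9     # assumed global challenge volume
DAYS_PER_YEAR = 365


def energy_per_check_wh(power_watts: float, seconds: float) -> float:
    """Energy of one challenge in watt-hours (1 Wh = 3600 J)."""
    return power_watts * seconds / 3600.0


def yearly_energy_gwh(per_check_wh: float, checks_per_day: float, days: int) -> float:
    """Total energy over `days` days in gigawatt-hours (1 GWh = 1e9 Wh)."""
    return per_check_wh * checks_per_day * days / 1e9


per_check = energy_per_check_wh(POWER_WATTS, SOLVE_SECONDS)
yearly = yearly_energy_gwh(per_check, CHECKS_PER_DAY, DAYS_PER_YEAR)
print(f"{per_check:.4f} Wh per check, {yearly:.1f} GWh per year")
```

With these inputs it comes out to about 0.0028 Wh per check and a bit over 8 GWh per year. Change the constants if you think the load or solve time is off.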
In any case, according to some napkin math done by Kimi 2.6 (which by itself is probably already consuming more than all of my PoW challenges for the upcoming 5 years) - the situation looks incredibly in favor of PoW: https://www.kimi.com/share/19e7ef40-a432-8912-8000-0000b4a71...
Which makes me wonder why Cloudflare isn't switching to this already.
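For readers who haven't seen one, this is roughly what a PoW challenge looks like. It is a hashcash-style sketch, not what Turnstile or any Cloudflare product actually does; the function names, the 8-byte nonce encoding and the difficulty value are all assumptions picked for illustration.

```python
import hashlib
import itertools
import os


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def _digest(challenge: bytes, nonce: int) -> bytes:
    # Hypothetical encoding: challenge followed by an 8-byte big-endian nonce.
    return hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()


def solve(challenge: bytes, difficulty_bits: int) -> int:
    """Client side: brute-force a nonce. Expected work doubles per extra bit."""
    for nonce in itertools.count():
        if leading_zero_bits(_digest(challenge, nonce)) >= difficulty_bits:
            return nonce


def verify(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Server side: a single hash, so checking is cheap."""
    return leading_zero_bits(_digest(challenge, nonce)) >= difficulty_bits


challenge = os.urandom(16)          # server picks a fresh random challenge
nonce = solve(challenge, 12)        # ~4096 hashes on average at 12 bits
print(verify(challenge, nonce, 12))
```

The asymmetry is the point: the client burns many hashes, the server verifies with one. It says nothing about whether the client is a human, only that it paid the cost.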
your doctor seeing you naked does not destroy your privacy, it's your doctor sharing the photos with everybody that does. i.e. the problem here is that intermediaries like cloudflare don't work for you, they work for somebody else or sell the data themselves.
Only as long as legislation and law enforcement are off the table. Almost like we have those because everyone doing their own policing is not a reasonable way to run a society.
A Firefox non-default profile can be created like this:
./firefox -CreateProfile "profile-name /home/user/.mozilla/firefox/profile-dir/"
# For, say, cloudflare that would be:
./firefox -CreateProfile "cloudflare /home/user/.mozilla/firefox/cloudflare/"
And you can launch it like this:
./firefox -profile "/home/user/.mozilla/firefox/profile-dir/"
# For cloudflare that would be:
./firefox -profile "/home/user/.mozilla/firefox/cloudflare/"
So, given that /usr/bin/firefox is just a shell script, you can:
- create a copy of it, say, /usr/bin/firefox-cloudflare
- adjust the relevant line, adding the -profile argument
If you use an icon to run firefox (say, /usr/share/applications/firefox.desktop), you'll need to copy it and adjust the launch line for the icon as well. Of course, "./firefox" from the examples above should be replaced with the actual path to the executable. For a default installation of Firefox that path can be found in the /usr/bin/firefox script.
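If you'd rather not edit the system script, a small launcher does the same job. This is only a sketch: the -CreateProfile and -profile arguments are the ones shown above, while the FIREFOX path, the function names and the use of Python instead of a copied shell script are my assumptions.

```python
import subprocess

# Assumption: adjust to wherever the real Firefox executable lives on your system.
FIREFOX = "/usr/bin/firefox"


def create_profile_args(name: str, profile_dir: str, firefox: str = FIREFOX) -> list:
    """Build the command that creates a named profile in profile_dir.

    -CreateProfile takes the name and the directory as one quoted argument,
    which is why they are joined into a single list element here.
    """
    return [firefox, "-CreateProfile", f"{name} {profile_dir}"]


def launch_args(profile_dir: str, firefox: str = FIREFOX) -> list:
    """Build the command that starts Firefox with the given profile directory."""
    return [firefox, "-profile", profile_dir]


def launch(profile_dir: str, firefox: str = FIREFOX) -> subprocess.Popen:
    """Start Firefox with the profile without waiting for it to exit."""
    return subprocess.Popen(launch_args(profile_dir, firefox))


# Example (not executed here):
# subprocess.run(create_profile_args("cloudflare", "/home/user/.mozilla/firefox/cloudflare/"), check=True)
# launch("/home/user/.mozilla/firefox/cloudflare/")
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.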
So, you can have separate profiles for anything sensitive/invasive (linkedin, cloudflare, shops, banks, etc.) and then another profile for everything else.
And each profile can have its own set of extensions.
(That said, I still keep separate machines. One for doing "official" things, the other for everything else)
It's either proof of humanity (increasingly hard to get in this day and age, particularly if accessibility is a concern), proof of identity (even worse) or proof of system integrity, which is the least bad out of all the terrible options.
They also gate away a good many people with their "bot protection". I am extremely worried about how so many seem to have outsourced the control over who can access their websites to a company, with no second thoughts whatsoever.
As someone responsible for mitigating card testing "attacks", account harvesting, and DDOS attacks..
It is unfortunate, but the ISP industries (from telco up to transit) and CC industries aren't providing a lot of great options. This idea that people are doing things "without a second thought" is usually false when it comes to businesses.
I think the Web is on its last legs, anyway. Generative AI and LLM-instead-of-search has destroyed what little value remained.
1. If X% of the population gets wrongly branded with the scarlet letter B[ot], how do they appeal and get it fixed?
2. How will sites notice and know if their choice of "bot protection" is losing them X% of users/customers/job-seekers etc.? If it's a really robust system, they'll never even see the complaints either...
3. If everyone does detect that something is awry, will it be such a monopoly that there's no choice but to let it happen?
Bot protection with fingerprinting is just an illusion. Any client-side signal like this can be spoofed by an above-average person. Fingerprinting is just a way to consolidate the market for the advertising business. Assigning reputation to residential IP addresses and commercial blocks is another approach to achieve the desired result. Providers would be a lot more careful about letting their IP addresses be misused; however, it turns out that this would bring down the DDoS business on both sides, attackers and protectors.
Ironically, more often than not it's the same companies that invest in building their own bots and in finding ways to stop bots from other companies.
At the upper bound, fraud can always be committed by paying real people with real accounts to perform the desired action in a way that is 100% truly indistinguishable from organic. There's fundamentally no actual prevention technique at the limit.
So the entire game is only "increasing the costs until it's not viable ROI", not "holistically prevent", which is why fingerprinting is a relevant technique here.
Well I mean maybe it wasn't useless 2 years ago, but in the age of AI it definitely is.
The WebGL fingerprinting thing is cute, too. I guess it'll buy them some time since off-the-shelf solutions are going to probably not handle this well yet. That said, as long as the reward for bypassing turnstile and other anti-bot protections remains high, these things really can't do much. A decently resourced adversary can probably come up with a dozen different approaches to make this less useful. Without really looking into it much, my kneejerk is you could probably tweak Mesa to have deterministically random behavior for whatever edge cases it looks for, but you could also just have lots of different GPU/driver combos to proxy to. The web gets less open, but in an asymmetrical way. If you really have an incentive to keep botting, you'll surely find a way.
The next step is to fully give up and just essentially implement WEI. And then the bot problem disappears?
Nope. Botting will still hold tremendous value, so likely there will be many crafty workarounds and bypasses over time. And there will be countermeasures for those and workarounds for that. Guess we'll start to find out who actually has the resources and incentives to keep botting in this environment.
So what's the real solution? Well the most obvious thing to do would be to make botting less valuable. Can we? I dunno. It may have been a mistake to move so many important things to the Internet after all. I mean, some of this is just threat actors catching up with what's possible and was inevitable to begin with. But, some of it is just trying to find solutions to problems that were unnecessary to begin with. Or failing to implement solutions despite an obvious need to do so.
There are a lot of threads to pull on, here. Account takeover still holds tremendous value to threat actors. Why? In my opinion, it's because passkeys were a tremendous failure, no matter what adoption shows. If we wanted to just improve security for users, I think we didn't need to restructure the internet around another authentication mechanism that, of course, provides attestation capabilities; we could've just improved on passwords. For more secure handling of passwords, PAKEs exist. Password managers exist. For anti-phishing, TOTPs exist. What if you could have the exact same passkey experience, but in such a way that everything can gracefully fall back to just passwords and TOTP, because they're the real key material at the end of it? Add a web standard that lets browsers and browser extensions hook into the login process, standardize PAKEs as part of the web. Cross-vendor synchronization? A problem easily solved if we ever wanted to.
Instead of that, we got the dumbest possible world. Passkeys are sometimes available, but often not. Can you sync your passkeys across devices? Probably, maybe they have blacklisted KeepassXC by now so maybe I can't :)
But a lot of stuff doesn't even offer me the option to use passkeys, so they still use passwords. Can I enter my password to log in still? No, of course not. See, I will helpfully get the option to enter my password, in addition to the option to use email or SMS, the most secure authentication scheme known to Man, but if I actually select password and enter my secure password from my secure password manager, what I get to find out is that the password option is actually password and email or SMS and there's no option to use TOTP. Oh, and you randomly get logged out for no reason sometimes.
Some of the bots will probably disappear. Like, whatever bot is throwing me several terabytes of nonsense traffic every month will probably eventually disappear since they're wasting so much bandwidth on doing literally nothing. I have no idea what the point is, but I know it can't be terribly valuable for them, and it's not terribly expensive for me. I'd love to know who the hell is doing that and why, though.
But since the web is run mostly by crap companies like Google, it will never get its shit together, and we will get solutions like WEI and identity verification to solve problems that were entirely manufactured (or caused by a significant lack thereof) in the first place.
By virtue of incompetent and ignorant devs and middle managers. Or by virtue of greed and maliciousness.
Yeah yeah never attribute to malice what can be explained by stupidity... This time no. It's both.
More to the point, these systems actually help scraping because proof of work unlocks essentially unlimited scraping, in my experience.
That said - from my experience on the other side, sure you can’t stop people like me or you, but you can stop 99% of the others. That’s more than worth it operationally.
It sure seems to keep me, the casual visitor, far away from just about any site they "protect". I have zero desire to alter my browsing configuration or use extra tools to get around turnstile, I'd rather not even visit the site in the first place.
I hate what the anti-scraper mechanisms have become but it really is the lesser evil. The alternative for many small operators is to just completely shut down.
Obviously this is terrible, but I think there's a possibility it's the least terrible option? Another option is IP reputation, which I think is worse. Or scanning a code with a non-rooted phone, which I think is even worse than that!
There isn't one, and pretending otherwise is nonsense because humans will always provide their credentials to something to act on their behalf.
In the limit you end up with Chinese phone farms.
Cloudflare, Google Captcha, HCaptcha etc. are all shitty technical solutions because, as we are all discovering, they come at the cost of our privacy (i.e. our personal data may be monetised by these services) and/or our computing resources and time. If current copyright laws aren't sufficient to prevent this, we have to acknowledge the system is broken. The answer could be enhancing it with some kind of Digital Millennium Copyright Act (DMCA)-like laws, but in favour of the creators against BigTech or rogue actors.
- Web-scraping and copyright law - https://www.neudata.co/blog/web-scraping-and-copyright-law
- Why DMCA Claims Against Web Scrapers Face Long Odds - https://capstonedc.com/insights/why-dmca-claims-against-web-...
> we have to acknowledge the system is broken
The system is broken. It probably takes, what, 10 seconds or less to use a residential or foreign proxy, 6+ months to internationally track and prosecute a single offender? So like a million times more effort going the regulatory route.
As for issues like bots overloading websites or using too many resources scaling laws will take care of it quickly, it’s not like you can’t serve thousands of RPS from a Raspberry Pi these days.
The reason Cloudflare got invented isn't AI scrapers. These are just the latest development... the original reason Cloudflare was created, and why it experienced such meteoric growth, is DDoS and botnets.
Yes. We need regulation in the AI space. But it will be useless as long as bad actors aren't held accountable - and a lot of the bad actors aren't in our jurisdictions. You've got hacked devices all over the world in giant botnets, controlled by Russian, Chinese, Iranian and North Korean actors. You've got Chinese AI scraper bots as China is heavily investing into training their own models. You've got Indian, Filipino and Myanmar-based scammers.
And frankly I have no idea how to get all of that under control. As much as I'd like to see sanctions against both domestic and foreign enablers of abuse (which includes residential ISPs) - it's going to be one giant ass whack-a-mole game.
Which sounds extremely difficult to differentiate
You can forget about it. It is not possible. Simple as that.
But in principle I agree that there's no good answer to this. Scraping _is_ useful and I bet most of us here have scraped something. It is AI companies and their use of humans' material for training, without consent or anything in return, that led us to this. (I know botting has existed on forums for as long as forums have been a thing, but it is easily solved by human moderators and keyword filters.)
This stupid "war against bots" is going to lead to the downfall of the Internet and effectively turn it into another walled garden where only "approved" (anti-)user agents are allowed. Don't fall for the nonsense about "AI scrapers" --- it's just a way to manufacture consent.
Those images also used to crash all the early GUI irc and chat clients that showed inline images without size checks...
These are sad times we're living as far as openness of the web goes. People would have less of a scraping problem if their websites didn't ship with 20MB of JS.
Imagine you run a company register for a local government. You want to let people look up companies by their registration number (which they must disclose in all communications to you) to see if they're legit and whether any warnings have been raised against them. You don't want unscrupulous marketers to just be able to `SELECT * FROM companies WHERE type='nail_salon' AND city='london'`.
If you aren't super strict about scraping, some shadowy business in Neverland, completely unconcerned with following your laws, will build that database.
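To make the distinction concrete, here is a minimal sketch of a register that only answers exact lookups by registration number and exposes no search by type or city. The table layout, sample rows and function names are all made up for illustration.

```python
import sqlite3


def make_register() -> sqlite3.Connection:
    """Build a tiny in-memory company register with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE companies ("
        "reg_no TEXT PRIMARY KEY, name TEXT, type TEXT, city TEXT, warnings INTEGER)"
    )
    conn.executemany(
        "INSERT INTO companies VALUES (?, ?, ?, ?, ?)",
        [
            ("REG-0001", "Example Nails Ltd", "nail_salon", "london", 0),
            ("REG-0002", "Sample Plumbing Ltd", "plumber", "leeds", 2),
        ],
    )
    return conn


def lookup(conn: sqlite3.Connection, reg_no: str):
    """The only public query: exact match on registration number.

    Returns just the fields a member of the public needs, or None.
    """
    row = conn.execute(
        "SELECT name, warnings FROM companies WHERE reg_no = ?", (reg_no,)
    ).fetchone()
    if row is None:
        return None
    return {"name": row[0], "warnings": row[1]}


register = make_register()
print(lookup(register, "REG-0002"))
print(lookup(register, "REG-9999"))
```

Note what this does not solve: if registration numbers are guessable, someone can still walk the whole key space one request at a time, which is exactly where the scraping defences come in.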
Rate limits didn't work because they kept rotating IP addresses.
I'm pretty sure Turnstile would allow more people through than my current solution, but this was quick and easy. I expect to have to ban more ASNs from other countries in the future but the worst bots are now gone.
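For the curious, blocking by ASN instead of by individual IP looks roughly like this. It is a sketch under assumptions: the prefix table is hand-written using documentation address ranges and private-use AS numbers, whereas a real setup would load prefix-to-ASN data from a routing or GeoIP-style database.

```python
import ipaddress

# Hypothetical prefix-to-ASN table (documentation ranges, private-use ASNs).
PREFIX_TO_ASN = {
    ipaddress.ip_network("192.0.2.0/24"): 64512,
    ipaddress.ip_network("198.51.100.0/24"): 64513,
    ipaddress.ip_network("203.0.113.0/24"): 64514,
}

# Hypothetical blocklist: every address announced by these ASNs is refused.
BLOCKED_ASNS = {64513}


def asn_for(ip: str):
    """Return the ASN whose prefix contains ip, or None if unknown."""
    addr = ipaddress.ip_address(ip)
    for network, asn in PREFIX_TO_ASN.items():
        if addr in network:
            return asn
    return None


def is_blocked(ip: str) -> bool:
    """True if the address belongs to a blocked ASN."""
    return asn_for(ip) in BLOCKED_ASNS


# Rotating addresses inside one ASN does not help the client:
for ip in ("198.51.100.7", "198.51.100.200", "192.0.2.1"):
    print(ip, is_blocked(ip))
```

This is why rotation defeats per-IP rate limits but not this: the key is the network operator, not the address. The cost is the one mentioned above, since every legitimate user on that network is locked out too.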
I can't, because every request comes from a new IP!!!
I'm not good at creating petitions but would happily sign one. Same as with Stop Killing Games and anti-Chat Control.
I can imagine this getting traction if it's explained in a YouTube video to "normal" people.
I doubt politicians care much about fingerprinting, though. They're more afraid of actual businesses getting attacked by bots than they are about Linux users with weird setups not being able to access some websites.
b. Accept Only Necessary Fingerprinting
For good reason. I've run that setting for ages but I kept having to disable it and add workarounds because websites would break in weird ways. Timezones in scheduling websites being messed up nearly made me miss a couple of appointments. There's no way to tell the user Firefox isn't broken without displaying a permanent banner like "if websites are broken in any way or you see weird glitches or your computer's time is wrong or fonts look weird or videos don't always work right, click here to disable fingerprinting protection".
Interestingly, Turnstile breaks with resistfingerprinting but works with fingerprintingProtection, I guess the latter takes this crap into account.
The reason for spoofing the time zone (to UTC) is that it is one of the many things used to fingerprint users. There is an unintended side effect however: a mismatch with the IP geolocation could out you as a VPN user even if no VPN is actually used.
When Youtube still supported trends, visiting the trends page from Poland with Safari set to English gave you really interesting results. Mostly intellectually-stimulating content from channels like Veritasium, with a smattering of reviews, trailers, music and focus soundtracks thrown in. Meanwhile, visiting that same page on (Windows) Chrome set to Polish gave you the typical "you won't believe what this man just did!!!" crap.
I somewhat expect sites to break with strict settings; I don't expect a still wide-open tracking path.
That’s deceiving.
Websites already break often with the strictest protections enabled, adding a "super duper strict protections" mode will just lead to bug reports. Even more-than-bare-basic tracking prevention has HN threads full of comments like "doesn't work on <Firefox fork>" because they don't see the connection between fingerprinting protection, WebRTC/WebGL/WebGPU, and websites not working.
People who are willing to take that bet can enable it in about:config.
>Turns out it's because Cloudflare wants to have a fingerprint of your device via WebGL, the only reason for doing this would be tracking.
> So Cloudflare just banned all WebKitGTK browsers as I guess they put an exception for Safari.
This is false. I ran firefox with:
* hardware acceleration disabled (so software renderer, nothing to fingerprint)
* resistfingerprinting enabled, including letterboxing with default window size
* webgl disabled
* VPN enabled
* In a Windows VM
By all accounts this should be the most suspicious fingerprint ever, but turnstile happily lets me through. If they want to track people, they're doing a pretty bad job. My guess is that OP's browser is getting banned because his WebKitGTK has a weird fingerprint, not because of webgl or whatever.
> Such things are blocked in WebKit, and have been for years. Meaning it's tracking so awful that even Apple would block it, and as far as I can tell it's not the kind of privacy protection you can easily disable in it.
This is also false. Webgl fingerprinting works just fine on Safari. They might try to mitigate it by adding some noise, but that's not so different than what firefox does, and is certainly not "blocked".
Official Firefox can be leaky unless you build it yourself with some build-time changes or use a fork with such[0]. Am I guessing right that you still have Webcompat, RemoteSettings, and Nimbus enabled? How do you know a compatibility intervention isn't causing your browser to open the kimono just enough to "unbreak the page"?
> My guess is that OP's browser is getting banned because his WebKitGTK has a weird fingerprint, not because of webgl or whatever.
My guess is a different flavor of the same: Not matching an expected fingerprint (simplified: whitelist vs blacklist approach) combined with other factors.
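A toy illustration of the whitelist vs blacklist difference, with everything in it assumed: the attribute names, the sample values and the idea of hashing attributes into one fingerprint are made up for the example and say nothing about how Turnstile actually scores clients.

```python
import hashlib


def fingerprint(attributes: dict) -> str:
    """Hash a set of client-reported attributes into one stable identifier."""
    canonical = "\n".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def passes_blocklist(fp: str, known_bad: set) -> bool:
    """Blacklist approach: anything not known to be bad gets through."""
    return fp not in known_bad


def passes_allowlist(fp: str, known_good: set) -> bool:
    """Whitelist approach: only fingerprints seen as good get through."""
    return fp in known_good


# Hypothetical clients: a mainstream setup and a hardened, unusual one.
common = {"renderer": "ExampleGPU 1000", "timezone": "Europe/Warsaw", "fonts": "42"}
unusual = {"renderer": "software", "timezone": "UTC", "fonts": "7"}

known_good = {fingerprint(common)}
known_bad = set()

print(passes_blocklist(fingerprint(unusual), known_bad))   # unknown, so allowed
print(passes_allowlist(fingerprint(unusual), known_good))  # unknown, so refused
```

Under the whitelist model a rare browser fails without having done anything bot-like, simply because nobody has vouched for its fingerprint yet.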
[0]: I'm currently aware of Tor Browser, Konform Browser (am dev), Mullvad Browser, and to a certain extent Waterfox, LibreWolf, and r3df0x doing that.
fingerprintingProtection works fine on the other hand, but then again that's intentionally less intrusive.
So why is Cloudflare saying the author got blocked because of WebGL?
> > Such things are blocked in WebKit, and have been for years. Meaning it's tracking so awful that even Apple would block it, and as far as I can tell it's not the kind of privacy protection you can easily disable in it.
> This is also false. Webgl fingerprinting works just fine on Safari. They might try to mitigate it by adding some noise, but that's not so different than what firefox does, and is certainly not "blocked".
While I don't have an iDevice to try, the assumption that they are special cased is fair... because they are: https://blog.cloudflare.com/eliminating-captchas-on-iphones-...
(Yes, this is basically WEI in a shinier package.)
Another case of the much predicted downfall of freedom due to "people who hide themselves must have something to hide, so they are automatically suspicious"
If tech companies weren't waging a war against public resources in an attempt to get their grubby little hands on every bit of data they can, we wouldn't be in this mess.
If there were a more reliable "I'm not a bot" signal and maybe some reliable method of rate limiting, we could do away with Turnstile and just let people through again. Unfortunately, every well-intended privacy measure is abused by AI's war against the public good.
They send these emails, you know? "CF saved you XXX GB of data and protected you from YYY attacks". I have a few high-load websites which I turned CF on for a while. Knowing my traffic pretty well, I can say this "CF saved you XXX GB of data and protected you from YYY attacks" is absolute bullshit with numbers greatly exaggerated.
Since we can't catch them on this lie, they can put in any number they like to make their "service" attractive.
What all security extensions do you run? After running into issues over the years, with extensions doing multiple things that fight each other, I switched to trying to block via ublock origin as much as possible, then prefer other extensions to just do one thing to extend coverage, like this one. Makes it much easier to troubleshoot/exclude/disable when it breaks something vs. fiddling in settings.
You were never entitled to it in the first place
I'll make sure to fail all cloudflare turnshit in the future.
WebGL fingerprinting is just one of many things you need to do if you actually want to stop automation. There is no way round it other than requiring ID of some sort.
Also by default addons.mozilla.org is a privileged site so of course they include google tracking in it and they get the proper fingerprint no matter what you have configured.
AMO's privileges are limited to (a) installing extensions with only one prompt (instead of two), (b) launching some sort of "UI Tour" feature that highlights some features of the UI, and (c) extensions cannot, by default, operate on the site. That last one is an unfortunate trade-off we've made because of the massive waves of malicious extensions. You can re-enable extension access to AMO on a case-by-case basis: https://support.mozilla.org/en-US/kb/quarantined-domains but I recognize this is an opt-in, non-default configuration.
I am saddened to hear we use Google Analytics on the site, but I can tell you with certainty that it is not bypassing any of Firefox's built-in fingerprinting protections or getting any privileged access that way.
Aside from general dev, could use a hand in bringing it to more platforms (mobile and flatpak are frequently asked) and taking a closer look at fingerprinting protections and what's currently tripping up the turnstile.
Yeah, this needs to be burned to the ground.
Normally websites feature test and just skip using obscure disabled APIs, or more likely, websites don't use those APIs at all or only tracking scripts use it, which are already optional usually.
Problem with CF is that if you want increased security they'll prevent you from gaining it everywhere, even on sites they don't protect, or prevent you from accessing services even the ones you paid for. Browsers don't allow disabling APIs per domain, so you're either at risk everywhere or you're blocked from accessing a lot of things for no particular reason.
CF can't be bothered to feature test.
That pref is there for the Tor Browser.
Also enabled by default for Konform Browser and Mullvad Browser, which borrow many of the privacy- and security-related patches from Tor Browser.
If randomized canvas stuff was cracked down upon as a bot thing but now everyone with a copy of Firefox is doing it, maybe Cloudflare should just “legalize” it?
The breadth of responses here about people who can't reproduce this (or can) is one of the most frustrating things about working on fingerprinting protection. I also cannot reproduce this behavior, and have to assume that there is some complicated, behind-the-scenes risk assessment that is being done and some people trigger it and some don't. If any Cloudflare devs want to chat, I would love to. While not a normal way to contact us (support requests will be ignored), I can be reached at security@mozilla.com
[0]: https://konform-browser.codeberg.page/
[1]: Most? All? Without any telemetry, relying on user reports and our own testing here.
Internet Archive passed?
I would get locked out of the account on all devices after saying these things until I completed their turnstile. For many accounts I just never used them again.
I could go more into this, but I'm highly suspicious of Cloudflare and of course X/Twitter in this regard. On anonymous Twitter accounts I've been recommended people to follow whom I went to elementary school with, haven't spoken to in years, and have no digital connection to. It's very weird.
So no real benefit for bot detection here. Just a privacy nightmare for everyone else.
I use Cloudflare protection on all my websites but only the account creation page uses Turnstile.
Which, to be clear, is the entire problem: given how much of the internet goes through them, they should have enough alternative signals as to whether you're a bad actor that are stronger than this specific one.
However, this also presents the problem that there are barely any users in their base with your exact configuration, so getting any actual solutions might just take forever.
So if you want privacy, you have to accept poor and sometimes insecure services.
I'd like to hear from someone who worked on WebGL and how they feel about their ambitions being utterly subverted. Remember when the dream was playing games in the browser?
I keep getting the turnstile and having to click the "I am a human" button.
This can mean the WebContent process is crashing.