A big reason we invest in this is that we want to keep free, logged-out access available to more users. My team’s goal is to help make sure our limited GPU resources go to real users.
We also keep a very close eye on the user impact. We monitor things like page load time, time to first token and payload size, with a focus on reducing the overhead of these protections. For the majority of people, the impact is negligible, and only a very small percentage may see a slight delay from extra checks. We also continuously evaluate precision so we can minimize false positives while still making abuse meaningfully harder.
Said another way, if it ran in the background, the user wouldn’t even notice unless they typed and submitted their query before the check completed. In any realistic scenario, it would complete before they even submit their request.
The reason it has to block until the check has loaded is that otherwise a missing signal doesn't imply automation; the user might simply have typed before it loaded. If you know a legitimate user will always deliver the data, you can use its absence to infer something about what's happening on the client. You can, of course, track metrics like "key event occurred before the bot-detection script loaded" purely for monitoring, without using them as an automation signal.
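A toy sketch of that inference (names and branching entirely hypothetical, just to illustrate the point): blocking input until the detection script loads is what turns "signal missing" into a usable bot indicator.

```python
def classify_request(detection_signal_present: bool, input_was_blocked: bool) -> str:
    """Hypothetical server-side logic showing why blocking input matters."""
    if detection_signal_present:
        return "likely-human"  # the script ran and reported back
    if input_was_blocked:
        # Input was gated until the script loaded, so a legitimate user
        # could not have submitted without the signal: absence => automation.
        return "likely-bot"
    # Without blocking, the user may simply have typed before the script
    # finished loading; the missing signal proves nothing.
    return "ambiguous"
```

Without the blocking guarantee, the server lands in the third branch for every fast typist.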
I don’t know whether ChatGPT is one of those products, but if it is, that behavior might be a side effect of blocking the input pipeline until verification completes. It might be that they want to get every single one of your keystrokes, but only after checking that you’re not a bot.
Did you mean to use the word hypocrisy? If not, I'm happy to have said it.
I just want to note that it is well documented how good the support is for actual malware...
> we want to keep free and logged-out access available for more users
I have no doubt that many people see the free ChatGPT access as a convenient target for browser automation to get their own free ChatGPT pseudo-API.
The former relies on fairly controversial ideas about copyright and fair use to qualify as abuse, whereas the latter is direct financial damage – by your own direct competitors no less.
It's fun to poke at a seeming hypocrisy of the big bad, but the similarity in this case is quite superficial.
I bet people being fucking DDOSed by AI bots disagree
Also the fucking ignorance of assuming it's "static content" and not something that needs code to run
You imply that "an expensive llm service" is harmed by abuse, but, every other service is not? Because their websites are "static" and "near-zero marginal cost"?
You have no clue what you are talking about.
Also, it's not just the cost of the bandwidth and processing. Information has value too. Otherwise they wouldn't bother scraping it in the first place. They compete directly with the websites featuring their training data and thus they are taking away value from them just as the bots do from ChatGPT.
In fact the more I think of it, I think it's exactly the same thing.
Never in 15 years of running the website did we have such issues, and you can be sure that cache layers were already in place for it to last this long.
https://drewdevault.com/2025/03/17/2025-03-17-Stop-externali...
What "$FOO" actually is, is irrelevant. I'm curious how you would convince people that this sort of rule is fair.
The corp can always ban users who break ToS, after all. They don't need any help. The charitable initiative can't actually do that, can they?
It hasn’t even been updated in years, so hell if I know why it needs to be fetched constantly and aggressively. But fuck every single one of these companies now whining about bots scraping and victimizing them; here’s my violin.
> net-zero marginal cost
Lol, you single-handedly created a market for Anubis, and in the past 3 years Cloudflare captchas have multiplied at least 10-fold; now they're even on websites that were very vocal against them. Many websites are still drowning: the gnu family is regularly only accessible through the Wayback Machine. Spare me your tears.
It's not possible to know in advance what is static and what is not. I have some rather stubborn bots make several requests per second to my server, completely ignoring robots.txt and rel="nofollow", using residential IPs and browser user-agents. It's just a mild annoyance for me, although I did try to block them, but I can imagine it might be a real problem for some people.
I'm not against my website getting scraped; I believe being able to do that is an important part of what the web is. But please have some decency.
(TBH it's not clear to me that their marginal costs are low. They seem to pick based on narrative.)
How do you know the content is static?
Stealing the content from the whole planet & actively reducing the incentive to visit the sites without financial restitution is pretty bad.
Stop justifying their anti-social behavior because it lines your pockets.
I obviously disagree. I mean, on top of this we are talking about not-open OpenAI.
The gall. https://weirdgloop.org/blog/clankers
Nick, I understand the practical realities regarding why you'd need to try to tamp down on some bot traffic, but do you see a world where users are not forced to choose between privacy and functionality?
You want to go to the world's best hotel? You are gonna be on their CCTV. Staying at home is crappier but private.
Unfortunately, for the first time Moore's law isn't helping (e.g., give a poor person an old laptop, install Linux, and they'll be fine). They can still do that and all is good, except there's no LLM.
ironically, in high end hotels, there's often a lot less cctv. not none. just less. rich people enjoy privacy
Doesn't make sense; my home is much preferable to a hotel
Make sure not to browse the Internet without adblock and/or similar.
It's a pity Firefox doesn't get the praise it deserves half as much as it cops criticism.
It's also possible to make Firefox route each container through a different proxy, which could even be running locally and then connect to multiple different VPNs. I haven't tried doing that, but it's certainly possible.
It's sort of possible to run different browsers, with completely new identities and sometimes IPs, within the convenience of one. It's really underrated. I don't use the IP part of what I've mentioned, but I use multi-account containers quite a lot on Zen; they're a core part of how I browse the web, and there are many cool things that can be (and have been) done with them.
Another way is to just do better isolation as a user. That's probably your best shot without hoping these companies change policies.
Every time I try this, I end up crossing wires (i.e., using the browser that 'works' for most things more than the one that is 'broken').
search for me is now a proprietary index (like exa) that filters rubbish, with a zero data retention sla. so we don't need google profiling.
the content is distilled into markdown pulled from cloudflare's browser rendering api.
i let cloudflare absorb the torrent of trackers and robot checks, i just get md from the api with nothing else. cloudflare is poacher and gamekeeper.
an alternative is groq compound which can call browsers in parallel.
for interactive sites, or local ai browsing, i sometimes run a browser in a photon os docker with vnc, which gives you the same browser window but it runs code not on your pc.
that said, little of my use is now interacting with websites; it's all agentic search and websets, so i don't have to spend mental energy on it myself
What are you talking about? It works fine with firefox with RFP and VPN enabled, which is already more paranoid than the average configuration. There are definitely sites where this configuration would get blocked, but chatgpt isn't one of them, so you're barking up the wrong tree here.
According to the OP:
> The program checks 55 properties spanning three layers: your browser (GPU, screen, fonts), the Cloudflare network (your city, your IP, your region from edge headers), and the ChatGPT React application itself (__reactRouterContext, loaderData, clientBootstrap).
I guess Firefox VPN will hide the IP at least. But what about the other data? Is it faked by RFP? Because if not, the so-called privacy offered by this configuration is outdated.
You might be fingerprinted by OpenAI right now, as “that guy with all the Firefox anti-fingerprinting stuff enabled, even though it breaks other sites”.
Typing in the chat box is slow; rendering lags and sometimes gets stuck altogether.
I have a research chat that I have to think twice before messaging because the performance is so bad.
Running on iPhone 16 safari, and MacBook Pro m3 chrome.
They did it because a lot of devices running Netflix (TVs, DVD players, etc) were underpowered and Netflix was not keen on writing separate applications. They did, however, invest into a browser engine that would have HW acceleration not just for video playback but also for moving DOM elements. Basically, sprites.
The lost art of writing efficient code...
This is generally called virtual scrolling, and it is not only an option in many common table libraries, but there are plenty of standalone implementations and other libraries (lists and things) that offer it. The technique certainly didn't originate with Netflix.
Either way, pretty wild that you can have billions of dollars at your disposal, your interface is almost purely text, and still manage to be a fuckup at displaying it without performance problems.
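For what it's worth, the core of virtual scrolling is a small window computation: render only the rows in view plus a buffer. A minimal sketch (parameter names are mine):

```python
import math

def visible_range(scroll_top: float, row_height: float,
                  viewport_height: float, total_rows: int,
                  overscan: int = 3) -> range:
    """Indices of the rows worth rendering: only what is on screen,
    plus a small overscan buffer above and below for smooth scrolling."""
    first = int(scroll_top // row_height)                 # topmost visible row
    visible = math.ceil(viewport_height / row_height)     # rows that fit on screen
    start = max(0, first - overscan)
    stop = min(total_rows, first + visible + overscan)
    return range(start, stop)
```

With 30 px rows in a 600 px viewport, a 10,000-row table only ever mounts a couple dozen DOM nodes at a time.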
It's perfectly possible to write fast or slow web applications in React, same as any other framework.
Linear is one of the snappiest web applications I've ever used, and it is written in React.
That said, is it not a little bit weird that you want to protect yourself from scraping and bots, when your entire company, product, revenue, and your employment, depends on the fact that OpenAI can bot and scrape literally every part of the internet? So your moat is non-hydrated react code in the frontend?
Is this to be expected? I would presume that if I'm authenticated and paying, VPN use wouldn't be a worry. It would be nice to be able to use the tool whether or not I'm on a VPN.
Heard from a founder who recently switched his company to Claude due to OpenAI's lagginess; it's absolutely an OpenAI problem, not an AI problem in general.
How can first-party products protect themselves from abuse by OpenAI's bots and scraping?
How do we defend against your scraping, OpenAI?
I dont want any of my content scraped or seen by you all. Frankly, fuck you all for thinking my content is owned by you.
Probably too late now but my list needs updating
Can you share these mitigations so we can mitigate against you?
The scary part is that you don't even see the irony in writing this.
Or, are you just okay "misusing" everyone for your own benefit?
Please run Cloudflare's privacy invasive tool and share all the values it generates here so we can determine if you're a real person.
But don't you run these checks on logged-in users too?
You are defining "Bots" and "Scrapers" as a subset of attackers, though.
Is this really fair? The value in your product came from people who wrote for other people, not bots, but your bot scraped them anyway.
There is no way to determine whether a request coming from my browser was typed by me or automated by a browser extension. Your only way to win this "war" on "attackers" is to force users into your own application to access your product.
My browser extension (see my previous reply on this story) automates the existing open tab I have to all the different chat AIs (GPT, Claude, Gemini, etc).
I suppose all you can do is rate-limit each user.
Meanwhile, the rest of us (well, not me, because I don’t use your garbage product, but lots of others do) have to suffer and have our compute resources used up in the name of “protection.”
I have kind of lost count of how many content creators have told me personally that traffic is meaningfully down because of all these chatbots. The latest example is this poor but stand-up guy: moneyfortherestofus.com.
This has to be a joke, right?
If every company behaved like you do, the internet would be a much worse place.
In fact, OpenAI has already made the Internet a much worse place, already much, much less open and much less optimistic about its own future than it was even five years ago...
Basically an oxymoron at this point.
Thank you for the reply, Nick. It wouldn’t be a problem to disable the tracking for authenticated users then, would it?
This would be fucking HILARIOUS if it wasn't so tragic.
Do you guys see the irony here?
I ask because I have seen huge variations in load time. Sometimes I have had to wait seconds before being able to type. Nowadays it seems better, though.
what an odd thing to say for someone whose product is built entirely on exactly that
I assumed it was maybe some tokenization going on client side, but now I realize maybe it's some proof of work related to prompt length?
I presume the local ChatGPT.app has even more measures to prevent automation, right? Presumably privacy-invasive ones as it is customary these days?
Is there a way I can opt out? I really, really, really don't like it.
Are you applying the same standards to your own scraper bots?
There have been times when, across about ten minutes of usage, most of which is me typing on iOS Safari, it drained 15% of my battery. There is no functional justification for this beyond poor code quality. (It was on a long conversation FWIW.)
This is when I'm logged in, with a paid (Plus) account, connected to a very old email address with a real user profile. That can't be the result of super-clever bot defense measures, because it's merely an inconvenience on desktop. And if you genuinely believe that email has been compromised, why aren't you reaching out to the account owner, since the account isn't otherwise connected to fraud by your heuristics?
However brilliant the LLM agent is, I'm seeing a lot of unforced errors in how you implement a web interface to it. If it makes you feel any better, it doesn't really register compared to all the bloat I see on other sites.
When I appealed the ban, I was told that I couldn't be told exactly why I was banned, but that if I wrote a written apology and "promised to never do it again," my ban could be lifted.
I asked for an update on the ban via email every month for over a year.
Maybe you could tell me a little bit about that process?
Have you just described the dilemma facing all the content sites used to train LLMs?
I don't want to blame AI for all the world's problems. And I don't want to throw the baby out with the bath water. But I think you should think really hard about the value of gates. Smart people can build better gates than cash. But right now, cash might be better than nothing. Clearly you have already thought about how to build gates, but I don't think you have spent enough time thinking about who should be gated and why. You should think about gates that have more purpose than just maximizing your profit.
"We want to hook as many people as possible without letting in our competitors" is a pretty crummy thought to use as a public justification.
(Edited for typos.)
Also, if you could pass this along: it takes 5 taps to change thinking effort on iOS, and the control is completely hidden on macOS.
If I were to guess, it seems you were trying to lower token usage :-). Why the effort setting is only nicely available on web and Windows is beyond me.
(A) opening chatgpt.com in qubes (but staying logged out, i.e. never creating a chatgpt account)
-or-
(B) creating a freemium chatgpt account
?
(Obviously, the "best" answer would be something like running a local LLM from an airgapped machine in a concrete bunker :) But that's not what I'm after).
10/10, I've got no notes
Abuse from scraping has long been a serious problem for many, good job!
Is the user base that never logs in really that significant?
How does this comport with OpenAI's new B2B-first strategy?
> We also keep a very close eye on the user impact
Are paid or logged-in users also penalised?
The lack of self awareness...
Isn't that how you build your service from the very start? How ironic.
You what, mate? Would you please use that on yourselves first? Because it comes off as a GROSS hypocrisy. State of the art hypocrisy.
>> behavioral biometric layer
But this one, especially, takes the cake.
Quite disgusting.
Are these checks disabled for logged-in, paid users?
And THANK YOU for that!
Being able to use ChatGPT and Grok without signing in is a big part of why I like those services over Gemini etc.
Hell, dummy Claude won't even let me Sign-In-with-Apple on the Mac desktop, even though it let me Sign-UP-with-Apple on the iPhone! BUT they do support Sign-In-with-Google!!? What in the heavenly hell is this dumbassery
This whole thread was like watching a swarm of ants trying to take down a grasshopper
You do see the irony here?
Here's hoping this is a real person who actually created an account out of concern and to share this.
It’s ok, OpenAI is cooked.
Feel bad for anyone who joined OAI in the past 12 months. Their RSUs ain’t going to be worth much later this year; an IPO is too late.
"Prove your humanity/age/other properties" with this mechanism quickly goes places you do not want it to go.
It would behoove people to engage with the substance of attestation proposals. It's lazy to state that any verification scheme whatsoever is equivalent to a panopticon, dystopia as thought-terminating cliche.
We really do have the technology now to attest biographical details in such a way that whoever attests to a fact about you can't learn the use to which you put that attestation and in such a way that the person who verifies your attestation can see it's genuine without learning anything about you except that one bit of information you disclose.
And no, such a ZK scheme does not turn instantly into some megacorp extracting monopoly rents from some kind of internet participation toll booth. Why would this outcome be inevitable? We have plenty of examples of fair and open ecosystems. It's just lazy to assert right out of the gate that any attestation scheme is going to be captured.
So, please, can we stop casting every scheme whatsoever for verifying facts about actors as the East German villain in a Cold War movie? We're talking about something totally different.
Which places?
Just yank that ladder up behind you.
You would be an irresponsible entrepreneur if you didn't. Don't forget your legal obligation to maximise shareholder value.
Irony is truly dead. Show you have integrity by quitting your job
Isn't this the same behavior used by AI companies to gather training data? Pot, meet kettle.
If you'd like, I can write a two-sentence paragraph to send to your colleagues. It contains a special phrase which most colleagues will find difficult to ignore. Would you like me to do that?
That's... exactly expected? It's a cat and mouse game. People running botnets or AI scrapers aren't diligently setting the evil bit on their packets.
But hey, at least some bots are also not making it past Cloudflare!
In my brief experience with abuse mitigation, connections coming from VPNs or unusual IP ranges were very significantly more likely to be associated with abuse.
It depends on your users. VPNs aren’t common at all, even though you hear about them a lot on Hacker News. On the types of social sites where people got banned for abuse (forums), the first step to getting back on was always to sign up for a VPN and try to reconnect. It got so bad that almost every new account connecting via VPN would reveal itself as a spammer, a banned member trying to return, or someone trying to sock-puppet alternate accounts for some reason.
The worst offenders are Tor IP addresses. Anyone connecting from Tor was basically guaranteed to have bad intentions.
I heard from someone who dealt with a lot of e-mail abuse that the death threats, extortion, and other serious abuse almost always came from Protonmail or one of the other privacy-first providers that I can’t remember right now. He half-jokingly said they could likely block Protonmail entirely without impacting any real users.
It’s tough for people who want these things for privacy, but the sad reality is that these same privacy protections are favored by people who are trying to abuse services.
I have yet to see a use case for VPNs for the casual internet audience, and a tech-savvy user is better off renting a box in some datacenter, which at that point is hardly a VPN and more home-IP obfuscation. All the same downsides, but at least you get real privacy.
Mullvad.
It has been proven in a court of law that when Mullvad says "no logging", they mean it.
They also regularly have security audits and publish the results[2][3]
[1]https://mullvad.net/en/blog/mullvad-vpn-was-subject-to-a-sea... [2]https://mullvad.net/en/blog/new-security-audit-of-account-an... [3]https://mullvad.net/en/blog/successful-security-assessment-o...
MullvadVPN is also another great one.
I have heard some good things about AirVPN, but I can absolutely attest for mullvad and to a degree ProtonVPN (Just with Proton, depending upon your threat model, do make the necessary precautions like buying with monero for example)
There are others, but mostly its the 2-3 that I trust.
Source? I haven't seen any evidence that the major paid VPN providers engage in any of those things. At best it's vague implications something shady is happening because one of the key people was previously at [shady organization].
At least outside the US, there's 3DS as an (admittedly often high friction) high quality cardholder verification method, but in the US, that's of course considered much too consumer-hostile, so "select 87 overpasses" it is.
I'm running firefox and seeing the normal amount.
I also like privacy. I use GrapheneOS. I compartmentalize my credit cards, emails, and phone numbers. I don't use Google products, and the list continues, but I don't complain about Cloudflare because it is painless and I understand the price I pay for privacy.
I also have home services accessible via my home website, running on my home server(s). I chose to have cloudflare to host my domain specifically for the easy bot blocking, and it blocks more than 2000 bots/day that otherwise would be trying to find vulnerabilities on my servers, which contain a lot of sensitive things. I've never had an issue personally accessing my services through cloudflare. Sometimes I have to do captchas to access my own things, and that's barely an inconvenience (I am aware the domain isn't necessary to access services, but it makes more sense for my setup and intents)
Is it TSA's "fault" that non-terrorists are subject to screening?
Imagine an apartment building with a flimsy front door lock that breaks all the time, and the landlord only telling you that that can't be helped because of all the burglars.
Of course, then you get sites like gnu.org too that block you because of your slightly outdated user agent.
It's the reality of how bad the bots have become.
I'm getting it on iCloud Private Relay all the time. It honestly makes it kind of useless.
Maybe that's the point? But then again, doesn't Cloudflare run part of it!? And wasn't there some "privacy-preserving captcha replacement" that iOS devices should already be opting me in to? So many questions, nobody there to answer them, because they can get away with it.
> The part that’s really annoying is its trivial for bots to bypass.
Not the ethical bots, though! My GPT-backed Openclaw staunchly refuses to go anywhere near a "I'm not a robot" button.
.. how do they expect me to find the website owner's email if I can't access said website?
is it Linux (or similar)?
is it Firefox?
If yes, to one or both, you're blocked! Clearly millions of dollars of engineering talent and petabytes of data collection should be able to come up with something more nuanced than this.
I don't do free work. I'm not going to label 50 images of crosswalks and motorcycles for free.
Curious how do you know this?
hence i am just using cloudflare remote browser rendering.
I'm building Safebox and Safecloud, where this won't be the case anymore. Not only will you have a decentralized hosting network that can sideload resources (e.g. via a browser extension that looks at your "integrity" attribute on websites) but also the websites will require you to be logged in with a HMAC-signed session ID (which means they don't need to do any I/O to reject your requests, and can do so quickly)... so the whole thing comes down to having a logged in account.
https://github.com/Safebots/Safecloud
As far as server-to-server requests, they'll be coming from a growing network of cryptographically attested TPMs (Nitro in AWS, also available in GCP, IBM, Azure, Oracle etc.) so they'll just reject based on attestations also.
In short... the cryptographically attested web of trust will mean you won't need cloudflare. What you will need, however, to prevent sybil attacks, is age verification of accounts (e.g. Telegram ID is a proxy for that if you use Telegram for authentication).
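To illustrate the I/O-free rejection described above, here is a generic HMAC-signed-session sketch (the key, token format, and function names are my assumptions, not Safecloud's actual scheme): a forged session ID can be rejected with pure CPU work, no database lookup.

```python
import hmac
import hashlib

SERVER_KEY = b"example-secret"  # hypothetical server-side signing key

def issue_session_id(account: str) -> str:
    """Mint a session ID whose validity is embedded in the token itself."""
    tag = hmac.new(SERVER_KEY, account.encode(), hashlib.sha256).hexdigest()
    return f"{account}.{tag}"

def verify_session_id(session_id: str) -> bool:
    """Stateless check: recompute the MAC instead of hitting storage."""
    account, _, tag = session_id.rpartition(".")
    expected = hmac.new(SERVER_KEY, account.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(tag, expected)
```

Because verification is just one hash computation, unauthenticated junk traffic can be dropped before it touches any backend.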
"No s̶o̶u̶p̶ internet for you!"
Good luck!
I noticed the ChatGPT app also checks Play Integrity on Android (because GrapheneOS snitches on apps when they do this), probably for the same reason. Claude's app doesn't, by the way, but it also requires a login.
"You're posting too fast! Please slow down."
Coincidentally about an hour ago, I wanted to look something up in ChatGPT and I happened to be in a browser window I don’t normally use, with no logged in accounts. I assumed it wouldn’t work, but to my surprise with no account, no cookies of any kind it took my query and gave me an answer.
They've allowed anonymous requests for months now, maybe even a year.
As has been amply explained, the API pricing per token is far more for equivalent use when maximizing a subscription plan.
It isn’t really a massive hurdle to deal with this full SPA load check. If one is even aware it exists they already have the skills to bypass it anyway.
I get why people would “what about” the automation inherent in what OpenAI is doing, but that is a separate matter.
Other businesses and applications can put into place their own hurdles and anti bot practices to protect the models they’ve leaned into—-and they have been.
> This is bot detection at the application layer, not the browser layer.
I kind of just assumed that all sophisticated bot-detectors and adblock-detectors do this? Is there something revealing about the finding that ChatGPT/CloudFlare's bot detector triggers on "javascript didn't execute"?
You can probably run 50 of those simultaneously if you use memory page deduplication, and with a decent CPU+GPU you ought to be able to render 50 pages a second. That's 1 cent per thousand page loads on AWS. Damn cheap.
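Spelling out that estimate (the instance price is my assumption; the 50 pages/second figure is from above):

```python
# Back-of-the-envelope cost of headless rendering on a rented instance.
instance_cost_per_hour = 1.80            # assumed hourly price of the box
pages_per_second = 50                    # 50 concurrent VMs at ~1 page/s each
pages_per_hour = pages_per_second * 3600  # 180,000 pages/hour

cost_per_page = instance_cost_per_hour / pages_per_hour
cost_per_thousand = cost_per_page * 1000
print(f"${cost_per_thousand:.3f} per thousand page loads")
```

At those rates the claimed one cent per thousand page loads checks out; even doubling the instance price keeps scraping absurdly cheap.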
Honestly it is a very healthy competitive market with reasonably low switching costs which drives prices down. These circumstances make rolling your own a tough sell.
if you browse them you will see that bot writers are very annoyed if they can't scrape a site with a headless browser.
you can do what you suggested, but with Linux VMs/containers. windows is too heavy, each VM will cost you 4 GB of RAM
More info here:
https://web.archive.org/web/20231107182321/https://mu0.cc/20...
I mean you missed the minigame of preventing Chrome from signaling that it’s being programmatically (webdriver etc) driven and tipping your hand, but … yup?
As far as I'm aware, Turnstile doesn't do anything configurable or site-specific. It works on sites that don't run React, and the cookie OpenAI-Sentinel-Turnstile-Token is not a CF cookie.
Did OpenAI somehow do something on their own API that uses data from Turnstile?
Also, you can have it spot-check colors: light orange on a light background is unreadable. Ask it to find the L*[1] of colors and darken/lighten as necessary if the gap is < 40 (that's the minimum gap for yuge header text on a background; 50 for body text on a background; these have a gap of 25).
I haven't tried this yet, but maybe have it check the word count per header too. It's got 11 headers for 1000 words currently, which makes reading feel really staccato, and you have to evaluate whether each one is a real transition or a vibe-transition.
[1] L* as in L*a*b*, not L in Oklab
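A sketch of that spot-check, using the standard sRGB-to-CIE-L* conversion (the helper names are mine; the thresholds are the ones given above):

```python
def srgb_to_lstar(r: int, g: int, b: int) -> float:
    """CIE L* lightness (0..100) of an 8-bit sRGB color."""
    def lin(c: int) -> float:
        s = c / 255.0  # undo sRGB gamma to get a linear channel
        return s / 12.92 if s <= 0.04045 else ((s + 0.055) / 1.055) ** 2.4
    # Relative luminance Y from the linearized channels
    y = 0.2126 * lin(r) + 0.7152 * lin(g) + 0.0722 * lin(b)
    # CIE 1976 lightness, with the linear branch for very dark colors
    return 116 * y ** (1 / 3) - 16 if y > (6 / 29) ** 3 else (116 * 841 / 108) * y

def gap_ok(fg, bg, min_gap: float = 50) -> bool:
    """True if the L* gap is big enough for body text on this background."""
    return abs(srgb_to_lstar(*fg) - srgb_to_lstar(*bg)) >= min_gap
```

Orange (255, 165, 0) on white comes out with a gap of roughly 25, which is exactly the kind of pairing the check is meant to flag.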
Between the network latency and low-end machines, there is an enormous lag between ChatGPT's response and being able to reply, especially when editing a canvas.
I've been sitting there for up to a minute plus waiting to be able to use the canvas controls or highlight text after an update.
It now behaves like Claude, attaching the paste as a file for upload rather than inlining it.
This affected page UX some and reduces the cost of the browser tab some.
At some point, maybe still true, very long conversations ~froze/crashed ChatGPT pages.
Clearly I'm blocking some tracker and it's upset about that. I allowlisted a sentry subdomain and since then got no more complaints.
Why run this check before user can type?
Why not run it later like before the message gets sent to the server?
This is meaningless, btw. A browser, headless or not, does execute JavaScript.
I read it to mean: "A browser that doesn't execute the JavaScript bundle won't have [the rendered React elements]." Which is true.
Fine, by extension, you agree I can scan all of your systems for whatever I desire. This works both ways.
My best guess: ChatGPT is running something in your browser to try to determine the best things to send down to the model API, when it should have been running quantized models on its own server.
Every provider seems to have been plagued by these freeloaders to such an extent that they've had to develop extreme and onerous countermeasures just to avoid losing their shirts.
What's the word? Schadenfreude?
But it seems clear to me that this is why I can't start typing right away when I first load the page and click to focus in the text field.
Why would two AI bots want to chat with each other?
Really, really bad user experience; wondering when they will abandon this approach.
...I don't think that's possible even if you are a bot? I would be very surprised if OAI had their origin exposed to the internet. What is a "non-Cloudflare proxy"? Is this AI slop?
It's likely just looking at the CF properties as part of a bot scoring metric (e.g. many users from this ASN or that geoip to this specific city exhibit abusive patterns).
Sick.
There's no way this is worth it unless the models are absolutely tiny, in which case any benefits from offloading to the client is marginal and probably isn't worth the engineering effort.
And as for "but chatgpt isn't paid" (another commenter), well, then yes, that's even closer to free by removing this spying on your computer setup. But they spy on the paid users too.
These programming languages and frameworks were made for developer convenience and gained wide adoption because they make onboarding easier.
This obviously comes at a cost in performance and complexity, and introduces liability into a system, because these are dependencies that come with a whole bunch of assumptions about how they are used.
Is this tradeoff even worth it anymore?