ChatGPT won't let you type until Cloudflare reads your React state

(buchodi.com)

986 points by alberto-m 1mo ago, 615 comments

Hey! I'm Nick, and I work on Integrity at OpenAI. These checks are part of how we protect our first-party products from abuse like bots, scraping, fraud, and other attempts to misuse the platform.

A big reason we invest in this is because we want to keep free and logged-out access available for more users. My team’s goal is to help make sure the limited GPU resources are going to real users.

We also keep a very close eye on the user impact. We monitor things like page load time, time to first token and payload size, with a focus on reducing the overhead of these protections. For the majority of people, the impact is negligible, and only a very small percentage may see a slight delay from extra checks. We also continuously evaluate precision so we can minimize false positives while still making abuse meaningfully harder.

vlovich1231mo ago

That still doesn’t explain why you can’t even start typing until that check completes. You could hold back the outbound request until the check has passed. But preventing typing seems like it’s just worse UX, and the problem will fail to appear in any metrics you can track because you have no way of measuring “how quickly would the user have submitted their request without all this other stuff in the way”.

Said another way, if done in the background the user wouldn’t even notice unless they typed and submitted their query before the check completed. In the realistic scenario this would complete before they even submit their request.
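A minimal sketch of the background approach described in this comment, not a description of how ChatGPT works. `GatedSubmitter`, `CheckResult`, `runCheck`, and `send` are all hypothetical names: the check starts immediately, typing is never blocked, and only the outbound request waits for the result.

```typescript
// Hypothetical result of a background integrity check.
type CheckResult = { ok: boolean; token?: string };

class GatedSubmitter {
  private readonly check: Promise<CheckResult>;

  // `runCheck` stands in for whatever verification the page performs.
  constructor(runCheck: () => Promise<CheckResult>) {
    // Start the check right away; nothing in the UI waits on it.
    this.check = runCheck();
  }

  // Only sending waits for the check. If the check already finished,
  // this resolves without any added delay.
  async submit(
    text: string,
    send: (text: string, token: string) => Promise<string>
  ): Promise<string> {
    const result = await this.check;
    if (!result.ok || result.token === undefined) {
      throw new Error("integrity check failed");
    }
    return send(text, result.token);
  }
}
```

With this shape, a user who takes longer to type than the check takes to run would never observe the check at all.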

mike_hearn1mo ago

I developed the first version of Google's equivalent of this (albeit theirs actually computes a constantly rotating key from the environment, it doesn't just hard-code it in the program!).

The reason it has to block until it's loaded is that otherwise the signal being missing doesn't imply automation. The user might have just typed before it loaded. If you know a legit user will always deliver the data, you can use the absence of it to infer something about what's happening on the client. You can obviously track metrics like "key event occurred before bot detection script did" without using it as an automation signal, just for monitoring.
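The monitoring-only metric mentioned at the end of this comment could be tracked along these lines. This is a sketch under assumptions: `EarlyKeyMonitor` and its methods are hypothetical, and timestamps are passed in explicitly so the logic stays independent of any browser API.

```typescript
// Records whether a session's first key event arrived before the
// bot-detection script reported that it had loaded.
class EarlyKeyMonitor {
  private scriptLoadedAt: number | null = null;
  private firstKeyAt: number | null = null;

  // Call once when the detection script signals readiness.
  markScriptLoaded(nowMs: number): void {
    if (this.scriptLoadedAt === null) this.scriptLoadedAt = nowMs;
  }

  // Call on every key event; only the first one is kept.
  markKeyEvent(nowMs: number): void {
    if (this.firstKeyAt === null) this.firstKeyAt = nowMs;
  }

  // True when the user typed before the script was ready, or the script
  // never loaded. Intended for dashboards only, not as an automation signal.
  keyBeforeScript(): boolean {
    if (this.firstKeyAt === null) return false;
    return this.scriptLoadedAt === null || this.firstKeyAt < this.scriptLoadedAt;
  }
}
```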

p-e-w1mo ago

Many cloud products now continuously send themselves the input you type while you are typing it, to squeeze the maximum possible amount of data from your interactions.

I don’t know whether ChatGPT is one of those products, but if it is, that behavior might be a side effect of blocking the input pipeline until verification completes. It might be that they want to get every single one of your keystrokes, but only after checking that you’re not a bot.

dncornholio1mo ago

You cannot know what verifications they use. I could argue the disabled textbox is some sort of part of the verification process. Humans will click on it while bots won't.

m3kw91mo ago

Because of the way they have the server architecture set up and how it loads the screen. You don’t even want all the bots hitting the servers.

matchagaucho1mo ago

Keyboard response feels 10x slower in ChatGPT Projects (possibly for reasons other than react state).

QEDCTrL1mo ago

Sounds like anti-distillation to me. But, know what? Meh.

deadbabe1mo ago

Remember you’re talking to a vibe coder who just stares at code being printed out by AI.

Imnimo1mo ago

It's interesting to me that OpenAI considers scraping to be a form of abuse.

DrinkyBird1mo ago

It’s funny because the first AI scraper I remember blocking was OpenAI’s, as it got stuck in a loop somehow and was impacting the performance of a wiki I run. All to violate every clause of the CC BY-NC-SA license of the content it was scraping :)

raincole1mo ago

Quite sure even literal thieves would consider thievery a form of abuse.

zer00eyz1mo ago

" Integrity at OpenAI .. protect ... abuse like bots, scraping, fraud "

Did you mean to use the word hypocrisy? If not, I'm happy to have said it.

I just want to note, that it is well covered how good the support is for actual malware...

jordanb1mo ago

They don't want anyone to take that which they have rightfully stolen.

axegon_1mo ago

The levels of irony that shouldn't be possible...

ProofHouse1mo ago

The irony is thick

sabedevops1mo ago

Seriously. The hypocrisy is staggering!

wiseowise1mo ago

Church, politicians, moralists are all the biggest hypocrites that want to teach you something.

gib4441mo ago

And have absolutely no reservations about making such an obvious statement on a public forum

RobotToaster1mo ago

"You're trying to kidnap what I've rightfully stolen!"

Aurornis1mo ago

I interpreted scraping to mean in the context of this:

> we want to keep free and logged-out access available for more users

I have no doubt that many people see the free ChatGPT access as a convenient target for browser automation to get their own free ChatGPT pseudo-API.

rsrsrs861mo ago

This

miki1232111mo ago

It's not scraping they're concerned about, it's abusing free GPU resources to (anonymously) generate (abusive) content.

nikitaga1mo ago

Scraping static content from a website at near-zero marginal cost to its server, vs scraping an expensive LLM service provided for free, are different things.

The former relies on fairly controversial ideas about copyright and fair use to qualify as abuse, whereas the latter is direct financial damage – by your own direct competitors no less.

It's fun to poke at a seeming hypocrisy of the big bad, but the similarity in this case is quite superficial.

PunchyHamster1mo ago

> Scraping static content from a website at near-zero marginal cost to its server, vs scraping an expensive LLM service provided for free, are different things.

I bet people being fucking DDOSed by AI bots disagree

Also the fucking ignorance assuming it's "static content" and not something needing code running

not2b1mo ago

I understand why OpenAI is trying to reduce its costs, but it simply isn't true that AI crawlers aren't creating very significant load, especially those crawlers that ignore robots.txt and hide their identities. This is direct financial damage and it's particularly hard on nonprofit sites that have been around a long time.

lm4111mo ago

That is ridiculous.

You imply that "an expensive llm service" is harmed by abuse, but, every other service is not? Because their websites are "static" and "near-zero marginal cost"?

You have no clue what you are talking about.

cicko1mo ago

Interesting how other people's cost is "near-zero marginal cost" while yours is "an expensive LLM service". Also, others' rights are "fairly controversial ideas about copyright and fair use" while yours is "direct financial damage". I like how you frame this.

sandeepkd1mo ago

Let's not try to qualify the wrongs by picking a metric and evaluating just one side of it. A static website owner could be running with a very small budget and the scraping from bots can bring down their business too. The chances of a static website owner burning through their own life savings are probably higher.

alsetmusic1mo ago

Have you not seen the multiple posts that have reached the front page of HN with people taking self-hosted Git repos offline or having their personal blogs hammered to hell? Cause if you haven't, they definitely exist and get voted up by the community.

bakugo1mo ago

The cost is so marginal that many, many websites have been forced to add cloudflare captchas or PoW checks before letting anyone access them, because the server would slow to a crawl from 1000 scrapers hitting it at once otherwise.
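For readers unfamiliar with the PoW (proof-of-work) checks mentioned here, the idea can be sketched as follows. This is an illustration only, not the scheme any particular tool uses: the hash is a toy 32-bit FNV-1a-style function chosen to keep the example self-contained, whereas real deployments use a cryptographic hash such as SHA-256. The names `fnv1a`, `verify`, and `solve` are hypothetical.

```typescript
// Toy 32-bit FNV-1a-style hash over UTF-16 code units.
// Assumption for brevity: a real scheme would use SHA-256 or similar.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force unsigned 32-bit
}

// A nonce is valid when hash(challenge:nonce) has `bits` leading zero bits.
// `bits` must be between 1 and 32.
function verify(challenge: string, nonce: number, bits: number): boolean {
  return fnv1a(`${challenge}:${nonce}`) >>> (32 - bits) === 0;
}

// The client brute-forces a nonce. Expected work doubles with each extra
// bit, while the server verifies with a single hash.
function solve(challenge: string, bits: number): number {
  let nonce = 0;
  while (!verify(challenge, nonce, bits)) nonce++;
  return nonce;
}

const nonce = solve("example-challenge", 10);
console.log(verify("example-challenge", nonce, 10));
```

The asymmetry is the point: one visitor pays a small cost per page, while a scraper issuing thousands of requests pays it thousands of times.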

AmbroseBierce1mo ago

It's not like those models are expensive because of the usefulness that they extracted from scraping others without permission, right? You are not even scratching the surface of the hypocrisy.

wolvoleo1mo ago

It's more ironic because without all the scraping openai has done, there would have been no ChatGPT.

Also, it's not just the cost of the bandwidth and processing. Information has value too. Otherwise they wouldn't bother scraping it in the first place. They compete directly with the websites featuring their training data and thus they are taking away value from them just as the bots do from ChatGPT.

In fact the more I think of it, I think it's exactly the same thing.

VadimPR1mo ago

Getting scraped by abusive bots who bring down the website because they overload the DB with unique queries is not marginal. I spent a good half of last year with extra layers of caching, CloudFlare, you name it because our little hobby website kept getting DDoS'd by the bots scraping the web for training data.

Never in 15 years of running the website did we have such issues, and you can be sure that cache layers were in place already for it to last this long.

unsungNovelty1mo ago

"near-zero marginal costs". For whom exactly????

https://drewdevault.com/2025/03/17/2025-03-17-Stop-externali...

lelanthran1mo ago

I don't think a rule along the lines of "Doing $FOO to a corporate is forbidden, but doing $FOO to a charitable initiative is fine" is at all fair.

What "$FOO" actually is, is irrelevant. I'm curious how you would convince people that this sort of rule is fair.

The corp can always ban users who break ToS, after all. They don't need any help. The charitable initiative can't actually do that, can they?

ungreased06751mo ago

You’re describing the tragedy of the commons. No single raindrop thinks it’s responsible for the flood.

razingeden1mo ago

It is direct financial damage if my server's not on an unmetered connection: after years of bills coming in around $3/mo I got a surprise >$800 bill on a site nobody on earth appears to care about besides AI scrapers.

It hasn’t even been updated in years so hell if I know why it needs to be fetched constantly and aggressively, - but fuck every single one of these companies now whining about bots scraping and victimizing them, here’s my violin.

the_sleaze_1mo ago

60% of our traffic is bot, on average. Sometimes almost 100%.

not_your_vase1mo ago

  > net-zero marginal cost

Lol, you single-handedly created a market for Anubis, and in the past 3 years the Cloudflare captchas have multiplied at least 10-fold; now they are even on websites that were very vocal against it. Many websites are still drowning: the GNU family of sites is regularly only accessible through the Wayback Machine.

Spare me your tears.

grishka1mo ago

> Scraping static content from a website at near-zero marginal cost to its server

It's not possible to know in advance what is static and what is not. I have some rather stubborn bots make several requests per second to my server, completely ignoring robots.txt and rel="nofollow", using residential IPs and browser user-agents. It's just a mild annoyance for me, although I did try to block them, but I can imagine it might be a real problem for some people.

I'm not against my website getting scraped, I believe being able to do that is an important part of what the web is, but please have some decency.

xmcqdpt21mo ago

AI providers also claim to have small marginal costs. The costs of token is supposedly based on pricing in model training, so not that different from eg your server costs being low but the content production costs being high. And in many cases AI companies are direct competitors (artists, musicians etc.)

(TBH it's not clear to me that their marginal costs are low. They seem to pick based on narrative.)

SkiFire131mo ago

> Scraping static content

How do you know the content is static?

ori_b1mo ago

My website serving git that only works from Plan 9 is serving about a terabyte of web traffic monthly. Each page load is about 10 to 30 kilobytes. Do you think there's enough organic, non-scraper interest in the site that scrapers are a near-zero part of the cost?
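To put those figures in perspective, here is a back-of-the-envelope calculation. It uses only the numbers from the comment, plus two stated assumptions: decimal units (1 TB = 10^12 bytes) and a 30-day month.

```typescript
// Figures from the comment: about 1 TB per month, 10 to 30 KB per page load.
const BYTES_PER_MONTH = 1e12;                // assumption: decimal terabyte
const SECONDS_PER_MONTH = 30 * 24 * 60 * 60; // assumption: 30-day month

// Sustained page loads per second needed to reach the monthly total.
function pageLoadsPerSecond(bytesPerPage: number): number {
  return BYTES_PER_MONTH / bytesPerPage / SECONDS_PER_MONTH;
}

const low = pageLoadsPerSecond(30e3);  // larger pages, fewer loads
const high = pageLoadsPerSecond(10e3); // smaller pages, more loads
console.log(low.toFixed(1), high.toFixed(1)); // prints "12.9 38.6"
```

That works out to roughly 33 to 100 million page loads a month, or about 13 to 39 requests every second around the clock.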

make31mo ago

Absolutely not, the former relies on controversial ideas to qualify as legal.

Stealing the content from the whole planet & actively reducing the incentive to visit the sites without financial restitution is pretty bad.

foobiekr1mo ago

You are, of course, ignoring the production costs of the static content that OpenAi is stealing.

Stop justifying their anti-social behavior because it lines your pockets.

swagmoney16061mo ago

And yet I have to pay in my time and cash to handle the constant ddos'es from the constant LLM scraping

AtlasBarfed1mo ago

Because you say it is?

I obviously disagree. I mean, on top of this we are talking about not-open OpenAI.

gmerc1mo ago

It’s not for techbros to decide at what threshold of theft it’s actually theft. “My GPU time is more valuable than your CPU time” isn’t a thing, and Wikipedia’s latest numbers on scraping show that marginal costs at scale are a valid concern.

mcfedr1mo ago

I'm sure the copyright holders would consider your use of their content as direct financial damage

nozzlegear1mo ago

Are they, actually?

nickphx1mo ago

Speak for yourself.

karlshea1mo ago

I don’t know what world you live in but it’s not this one.

andrepd1mo ago

> Scraping static content from a website at near-zero marginal cost to its server

The gall. https://weirdgloop.org/blog/clankers

platybubsy1mo ago

Bait or genuine techbro? Hard to say

everdrive1mo ago

It's getting to the point where a user needs at minimum two browsers. One to allow all this horrendous client checking so that crucial services work, and another browser to attempt to prevent tracking users across the web.

Nick, I understand the practical realities regarding why you'd need to try to tamp down on some bot traffic, but do you see a world where users are not forced to choose between privacy and functionality?

mememememememo1mo ago

Local models for privacy.

You want to go to the world's best hotel? You are gonna be on their CCTV. Staying at home is crappier but private.

Unfortunately, for the first time Moore's law isn't helping. Give a poor person an old laptop with Linux installed and they will be fine for everything else, but they won't be able to run an LLM on it.

karlgkk1mo ago

> You want to go to the world's best hotel? You are gonna be on their CCTV.

ironically, in high end hotels, there's often a lot less cctv. not none. just less. rich people enjoy privacy

nozzlegear1mo ago

> Staying at home is crappier but private.

Doesn't make sense, my home is much more preferable to a hotel

0x3f1mo ago

Meet me in a cafe and I will sign a JWT saying you're not a bot. You can submit this to whoever will accept it.

magicseth1mo ago

If Apple approves it, I've got a solution: a keyboard that attests to your humanity https://typed.by/magicseth/2451#2NyGLfAQxmqRiAOTlaX7ma3G4d1o...

jagged-chisel1mo ago

Sounds like we’re bringing back the PGP key signing parties

tshaddox1mo ago

Doesn’t really make sense, because any service can just say “you must paste your human-attestation JWT here to use this service” and plenty of people will.

kevin_thibedeau1mo ago

I've been doing that for years. Cloudflare is slowly breaking more and more of the web.

atoav1mo ago

What if I run a website and OpenAI produces bot traffic? Do they also consider it abuse when they do it?

subscribed1mo ago

This is indeed what I do. And you also should. Separate browser for banking, trusted shipping sites etc, and the normal one.

Make sure not to browse the Internet without adblock and/or similar.

SV_BubbleTime1mo ago

Firefox multicontainers are pretty cool. But it’s an advanced process that most people wouldn’t do or do correctly.

Sabinus1mo ago

I love the containers too. My current use case is to keep my YouTube account separate from my Google one. Google doesn't need all that behavioural data in one place.

It's a pity Firefox doesn't get the praise it deserves half as much as it cops criticism.

halJordan1mo ago

It is absolutely not an advanced process. It's clicking a gui. It's not advanced thinking to understand profiles. It's a basic ability to hold multiple things in your mind at once. Telling people that's difficult only increases the societal problem that being ignorant is ok.

Imustaskforhelp1mo ago

The possibilities with Firefox multi containers and automation scripts as well are truly endless.

It's also possible to make Firefox route each container through a different proxy, which could even be running locally and can then connect to multiple different VPNs. I haven't tried doing that but it's certainly possible.

It's sort of possible to run different browsers with completely new identities and sometimes IP within the convenience of one. It's really underrated. I don't use the IP part of this that I have mentioned but I use multi containers quite a lot on zen and they are kind of core part of how I browse the web and there are many cool things which can be done/have been done with them.

madrox1mo ago

I am not Nick, but there's a few ways that world happens: the free tier goes away and what people pay for more correctly reflects what they use, this all becomes cheap enough that it doesn't matter, or we come up with an end to end method of determining usage is triggered by a person.

Another way is to just do better isolation as a user. That's probably your best shot without hoping these companies change policies.

gib4441mo ago

> It's getting to the point where a user needs at minimum two browsers. One to allow all this horrendous client checking so that crucial services work, and another browser to attempt to prevent tracking users across the web.

Every time I try this, I end up crossing wires (ie using the browser that 'works' for most things, more than the one that is 'broken')

lukewarm7071mo ago

i am increasingly moving towards a model of 'no browser'.

search for me is now a proprietary index (like exa) that filters rubbish, with a zero data retention sla. so we don't need google profiling.

the content is distilled into markdown pulled from cloudflare's browser rendering api.

i let cloudflare absorb the torrent of trackers and robot checks, i just get md from the api with nothing else. cloudflare is poacher and gamekeeper.

an alternative is groq compound which can call browsers in parallel.

for interactive sites, or local ai browsing, i sometimes run a browser in a photon os docker with vnc, which gives you the same browser window but it runs code not on your pc.

that said, little of my use is now interacting with websites, it's all agentic search and websets so i don't have to spend mental energy on it myself

cruffle_duffle1mo ago

There is also the browser I use to get Claude to route around people blocking its webfetch. Both Playwright and chrome-mcp.

gruez1mo ago

>It's getting to the point where a user needs at minimum two browsers. One to allow all this horrendous client checking so that crucial services work, and another browser to attempt to prevent tracking users across the web.

What are you talking about? It works fine with firefox with RFP and VPN enabled, which is already more paranoid than the average configuration. There are definitely sites where this configuration would get blocked, but chatgpt isn't one of them, so you're barking up the wrong tree here.

scared_together1mo ago

Is your interlocutor barking up the wrong tree, or are you missing the forest for the trees?

According to the OP:

> The program checks 55 properties spanning three layers: your browser (GPU, screen, fonts), the Cloudflare network (your city, your IP, your region from edge headers), and the ChatGPT React application itself (__reactRouterContext, loaderData, clientBootstrap).

I guess Firefox VPN will hide the IP at least. But what about the other data, is it faked by RFP? Because if not, the so-called privacy offered by this configuration is outdated.

You might be fingerprinted by OpenAI right now, as “that guy with all the Firefox anti-fingerprinting stuff enabled, even though it breaks other sites”.

halflife1mo ago

Don’t know if it’s related to the article, but the chat UI performance becomes absolutely horrendous in long chats.

Typing in the chat box is slow, rendering lags and sometimes gets stuck altogether.

I have a research chat that I have to think twice before messaging because the performance is so bad.

Running on iPhone 16 safari, and MacBook Pro m3 chrome.

DenisM1mo ago

In the good old days Netflix had "Dynamic HTML" code that would take a DOM element which scrolled out of the viewport and move it to the position where it was about to be scrolled in from the other end. Hence the number of DOM elements stayed constant no matter how far you scroll and the only thing that grows is the Y coordinate.

They did it because a lot of devices running Netflix (TVs, DVD players, etc) were underpowered and Netflix was not keen on writing separate applications. They did, however, invest into a browser engine that would have HW acceleration not just for video playback but also for moving DOM elements. Basically, sprites.

The lost art of writing efficient code...

zdragnar1mo ago

> Hence the number of DOM elements stayed constant no matter how far you scroll and the only thing that grows is the Y coordinate.

This is generally called virtual scrolling, and it is not only an option in many common table libraries, but there are plenty of standalone implementations and other libraries (lists and things) that offer it. The technique certainly didn't originate with Netflix.
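The core of virtual scrolling is a small calculation: given the scroll offset, work out which rows are visible and render only those. The sketch below assumes fixed-height rows; `visibleRange` and its parameters are hypothetical names, not the API of any particular library.

```typescript
interface VisibleRange {
  start: number;   // index of the first row to render (inclusive)
  end: number;     // index after the last row to render (exclusive)
  offsetY: number; // vertical offset, in pixels, for the first rendered row
}

// Computes which rows need DOM elements for the current scroll position.
// `overscan` renders a few extra rows on each side to hide pop-in.
function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  rowCount: number,
  overscan: number = 2
): VisibleRange {
  const first = Math.floor(scrollTop / rowHeight);
  const visible = Math.ceil(viewportHeight / rowHeight);
  const start = Math.max(0, first - overscan);
  const end = Math.min(rowCount, first + visible + overscan);
  return { start, end, offsetY: start * rowHeight };
}

// 10,000 rows of 20px in a 300px viewport, scrolled down by 1000px.
console.log(visibleRange(1000, 300, 20, 10000));
// → { start: 48, end: 67, offsetY: 960 }
```

Only 19 rows exist in the DOM at a time here, regardless of how long the list is, which is the constant-element-count property described above.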

groundzeros20151mo ago

This is how every scrolling list has been implemented since the 80s. We actually lost knowledge about how to build UI in the move to web

bschwindHN1mo ago

Almost certainly running some sort of O(n^2) algorithm on the chat text every key press. Or maybe just insane hierarchies of HTML.

Either way, pretty wild that you can have billions of dollars at your disposal, your interface is almost purely text, and still manage to be a fuckup at displaying it without performance problems.

stacktraceyo1mo ago

Same. It’s wild how bad it can get with just like a normal longer running conversation

qingcharles1mo ago

OpenAI sites are the only ones that do this to me. I have to keep a separate browser profile just for my OpenAI login with absolutely nothing installed on it or it'll end up being dogshit slow and unusable.

moffkalast1mo ago

Yeah just had this earlier today, I had to write my response in vscode and paste it in, there were literal seconds of lag for typing each character. Typical bloated React.

scq1mo ago

Just because a web application uses React and is slow, it does not follow that it is slow because of React.

It's perfectly possible to write fast or slow web applications in React, same as any other framework.

Linear is one of the snappiest web applications I've ever used, and it is written in React.

PunchyHamster1mo ago

That's how eating your own dogshit works, or whatever was that saying

lionkor1mo ago

Hi Nick, first of all, very cool of you to respond here instead of letting us all sit in the dark. I think that's what makes HN special.

That said, is it not a little bit weird that you want to protect yourself from scraping and bots, when your entire company, product, revenue, and your employment, depends on the fact that OpenAI can bot and scrape literally every part of the internet? So your moat is non-hydrated react code in the frontend?

Schiendelman1mo ago

Don't beat up an engineer for decisions made by company leadership. It's really inappropriate.

sebmellen1mo ago

Great to hear from a first-party source. I'm a Pro subscriber and my team spends well over two thousand dollars per month on OpenAI subscriptions. However, even when I'm logged in with my Pro account, if I'm using a VPN provider like Mullvad, I often have trouble using the chat interface or I get timeout errors.

Is this to be expected? I would presume that if I'm authenticated and paying, VPN use wouldn't be a worry. It would be nice to be able to use the tool whether or not I'm on a VPN.

JumpCrisscross1mo ago

> even when I'm logged in with my Pro account, if I'm using a VPN provider like Mullvad, I often have trouble using the chat interface or I get timeout errors

Heard from a founder who recently switched his company to Claude due to OpenAI's lagginess–it's absolutely an OpenAI problem. Not an AI problem in general.

vkou1mo ago

> Hey! I'm Nick, and I work on Integrity at OpenAI. These checks are part of how we protect our first-party products from abuse like bots, scraping, fraud, and other attempts to misuse the platform.

How can first-party products protect themselves from abuse by OpenAI's bots and scraping?

mystraline1mo ago

This is a completely in-scope question.

How do we defend against your scraping, OpenAI?

I don't want any of my content scraped or seen by you all. Frankly, fuck you all for thinking my content is owned by you.

CableNinja1mo ago

I use nginx conditionals and useragent checking, then respond with 418 or 410.

Probably too late now but my list needs updating
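The same user-agent check can be expressed as application code. This is a sketch, not the commenter's nginx configuration: `BLOCKED_AGENTS` and `statusFor` are hypothetical names, and the crawler tokens listed are ones OpenAI is understood to publish, so treat the list as illustrative and verify it against the vendor's current documentation.

```typescript
// Substrings matched against the User-Agent header (illustrative list).
const BLOCKED_AGENTS: string[] = ["GPTBot", "OAI-SearchBot", "ChatGPT-User"];

// Returns the HTTP status to send: 410 (Gone) for listed crawlers,
// 200 for everything else, including requests with no User-Agent.
function statusFor(userAgent: string | undefined): number {
  const ua = (userAgent ?? "").toLowerCase();
  const blocked = BLOCKED_AGENTS.some((name) => ua.includes(name.toLowerCase()));
  return blocked ? 410 : 200;
}

console.log(statusFor("Mozilla/5.0 (compatible; GPTBot/1.0)")); // 410
console.log(statusFor("Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0")); // 200
```

This only catches crawlers that identify themselves honestly. Bots that send browser user-agents, as other comments in this thread describe, pass straight through.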

tedsanders1mo ago

It's documented here: https://developers.openai.com/api/docs/bots

wilg1mo ago

robots.txt bro https://developers.openai.com/api/docs/bots/

seba_dos11mo ago

Hi! It's all perfectly understandable - after all, we use things like Anubis to protect our services from OpenAI and similar actors and keep them available to the real users for exactly the same reasons.

noosphr1mo ago

>These checks are part of how we protect our first-party products from abuse like bots, scraping, fraud, and other attempts to misuse the platform.

Can you share these mitigations so we can mitigate against you?

0x3f1mo ago

It's just Cloudflare. Bypassing it is a whole industry.

zenethian1mo ago

I read the comment as “use it to mitigate against OpenAI bots scraping the web” and not to mitigate Cloudflare.

dawnerd1mo ago

Flaresolverr is one way. Isn’t perfect but bypasses a lot.

lm4111mo ago

"we protect our first-party products from abuse like bots, scraping, fraud, and other attempts to misuse the platform"

The scary part is that you don't even see the irony in writing this.

Or, are you just okay "misusing" everyone for your own benefit?

driverdan1mo ago

Brand new account with 2 comments in this thread. How can we be sure you're not a bot deployed to defend OpenAI?

Please run Cloudflare's privacy invasive tool and share all the values it generates here so we can determine if you're a real person.

mehov1mo ago

> because we want to keep free and logged-out access

But don't you run these checks on logged-in users too?

MyNameIsNickT1mo ago

Yep, on logged-in users too. The reason is basically the same: we want scarce compute going to real people, not attackers. Being logged in is one useful signal, but it doesn’t fully prevent automation, account abuse, or other malicious traffic, so we apply protections in both cases.

lelanthran1mo ago

> The reason is basically the same: we want scarce compute going to real people, not attackers.

You are defining "Bots" and "Scrapers" as a subset of attackers, though.

Is this really fair? The value in your product came from people who wrote for other people, not bots, but your bot scraped them anyway.

There is no way to determine if a request that is coming from my browser is typed in by me or automated with a browser extension. Your only way to win this "war" on "attackers" is by forcing users into using your own application to access your product.

My browser extension (see my previous reply on this story) automates the existing open tab I have to all the different chat AIs (GPT, Claude, Gemini, etc).

I suppose all you can do is rate-limit each user.
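Per-user rate limiting is commonly done with a token bucket. The sketch below is one minimal way to do it, assuming an in-memory store and caller-supplied timestamps; `TokenBucketLimiter` is a hypothetical name, and nothing here reflects how OpenAI actually limits usage.

```typescript
// Each user gets a bucket holding up to `capacity` tokens, refilled at
// `refillPerSec` tokens per second. Every request spends one token.
class TokenBucketLimiter {
  private readonly capacity: number;
  private readonly refillPerSec: number;
  private readonly buckets = new Map<string, { tokens: number; last: number }>();

  constructor(capacity: number, refillPerSec: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
  }

  // Returns true if the request is allowed at time `nowMs`.
  allow(userId: string, nowMs: number): boolean {
    // New users start with a full bucket.
    const bucket = this.buckets.get(userId) ?? { tokens: this.capacity, last: nowMs };
    // Refill in proportion to elapsed time, capped at capacity.
    const elapsedSec = (nowMs - bucket.last) / 1000;
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsedSec * this.refillPerSec);
    bucket.last = nowMs;
    const allowed = bucket.tokens >= 1;
    if (allowed) bucket.tokens -= 1;
    this.buckets.set(userId, bucket);
    return allowed;
  }
}
```

A limiter like this bounds what any one account can consume whether the requests are typed by hand or automated, which is why it does not need to tell the two apart.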

angoragoats1mo ago

Nothing you do can fully prevent automation. Someone who wants to automate requests badly enough will be able to do it, especially when the “protections” are as easy to decrypt and analyze as the OP proved.

Meanwhile, the rest of us (well, not me, because I don’t use your garbage product, but lots of others do) have to suffer and have our compute resources used up in the name of “protection.”

salawat1mo ago

More like "We want your money, but don't want to provide service." Are you sure OpenAI isn't morphing into a finance/insurance company?

jorvi1mo ago

I'm glad you guys at least went with CloudFlare. LMarena went with Google's ReCaptcha, which is plain evil. It'll often gaslight you and pretend you failed a captcha of identifying something as simple as fire hydrants. Another lovely trick is asking you to identify bridges or busses, but in actuality it also wants you to identify viaducts or semi-trucks.

c0_0p_1mo ago

Can't have those bots or scrapers running amok can we...

ghm21991mo ago

Would OpenAI also consider remuneration for every site they have scraped that had a robots.txt file which they chose to ignore anyway? Feel free to not answer this question.

I have kind of lost count of how many content creators have said personally to me traffic is meaningfully down because of all these chatbots. The latest example is this poor but standup guy: moneyfortherestofus.com.

timeinput1mo ago

I'm really glad Hacker News disallows AI generated comments. The response I got from asking that question really is quite enlightening. Short answer: "no", long answer: "no -- fuck off", longer answer: "no -- fuck off -- if you want I can dig into whether or not you should fuck off harder"

pdntspa1mo ago

Y'all just salty that DeepSeek et al are training their LLMs on yours

dev1ycan1mo ago

"abuse like bots, scraping, fraud, and other attempts to misuse the platform"

This has to be a joke, right?

pera1mo ago

I really can't tell for sure (new user posting a ridiculously hypocritical corporate message on a Sunday) but if GP actually works for OpenAI the lack of self-awareness is seriously striking

singpolyma31mo ago

How?

conartist61mo ago

Still feels very anti-consumer.

If every company behaved like you do, the internet would be a much worse place.

In fact, OpenAI has already made the Internet a much worse place, already much, much less open and much less optimistic about its own future than it was even five years ago...

lm4111mo ago

"Integrity at OpenAI"

Basically an oxymoron at this point.

wiseowise1mo ago

> A big reason we invest in this is because we want to keep free and logged-out access available for more users.

Thank you for the reply, Nick. It wouldn’t be a problem to disable the tracking for authenticated users then, would it?

lloydatkinson1mo ago

It would because someone's KPI depends on number of tracked users lol

andrepd1mo ago

> OpenAI: These checks are part of how we protect products from abuse like bots, scraping, and other attempts to misuse the platform.

This would be fucking HILARIOUS if it wasn't so tragic.

rchaud1mo ago

Manifest destiny for me, border enforcement for thee.

lmz1mo ago

This kind of flawed thinking again. Like the natives didn't fight and lose wars against the manifest destiny types.

Chance-Device1mo ago

It can be both

witx1mo ago

> These checks are part of how we protect our first-party products from abuse like bots, scraping,

Do you guys see the irony here?

hosteur1mo ago

They obviously get it. They just do not care.

the_gipsy1mo ago

But is the title true, is typing specifically blocked? Or does it just block submitting the text?

I ask because I have seen huge variations in load time. Sometimes I had to wait seconds until being able to type. Nowadays it seems better though.

numlock861mo ago

> [...] we protect our first-party products from abuse like [...] scraping [...]

what an odd thing to say for someone whose product is built entirely on exactly that

huertouisj1mo ago

sometimes I paste giant texts (think summarization) in the chatgpt (paid) webapp and I noticed that the CPU fans spin up for about 5 seconds after, as if the text is "processed" client side somehow. this is before hitting "submit" to send the prompt to the model.

I assumed it was maybe some tokenization going on client side, but now I realize maybe it's some proof of work related to prompt length?
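For readers unfamiliar with the term, client-side proof of work generally means the browser has to burn CPU finding a value the server can verify cheaply. Nothing in this thread confirms that ChatGPT ties such a puzzle to prompt length; the following is only a generic hashcash-style sketch, and the `challenge` string and `difficulty_bits` parameter are hypothetical names chosen for illustration.

```python
import hashlib
from itertools import count


def solve(challenge: str, difficulty_bits: int) -> int:
    """Search for a nonce whose SHA-256 digest falls below the target."""
    target = 1 << (256 - difficulty_bits)
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce


def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Check a claimed solution with a single hash."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))
```

Each extra difficulty bit roughly doubles the expected solving work while verification stays at one hash, which is why a scheme like this could plausibly explain a few seconds of fan noise.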

egorfine1mo ago

Paying customer since inception here.

I presume the local ChatGPT.app has even more measures to prevent automation, right? Presumably privacy-invasive ones as it is customary these days?

Is there a way I can opt out? I really, really, really don't like it.

radicality1mo ago

The way I use the products is something like this. My main account on my MacBook: ChatGPT website, codex cli. Then, a Mac VM running via UTM with a shared writable dir for anything more ‘shady’ in terms of permissions and for playing with new AI apps, e.g. ChatGPT/Codex standalone apps, Atlas, Claude desktop app etc. Seems to work decently enough. And I do totally agree that there should be a way to opt out of all these privacy-invasive measures, especially after paying $200/mo

tipiirai1mo ago

I don't trust what OpenAI says. Sam Altman gives me shivers, and these kinds of blog posts make things look even worse.

myHNAccount1231mo ago

Can you fix the resizing text box issue on Safari when a new line is inserted? When your question wraps to a newline Safari locks up for a few seconds and it's really annoying. You can test by pasting text too.

xg151mo ago

> how we protect our first-party products from abuse like bots, scraping, fraud, and other attempts to misuse the platform.

Are you applying the same standards to your own scraper bots?

SilasX1mo ago

It has not been negligible for me, and, however you're doing this, there is significant room for improvement.

There have been times when, across about ten minutes of usage, most of which is me typing on iOS Safari, it drained 15% of my battery. There is no functional justification for this beyond poor code quality. (It was on a long conversation FWIW.)

This is when I'm logged in, with a paid (Plus) account, connected to a very old email address with a real user profile. That can't be the result of super-clever bot defense measures, because it's merely an inconvenience on desktop. And if you genuinely believe that email has been compromised, why aren't you reaching out to the account owner, as the account isn't otherwise connected to fraud by your heuristics?

However brilliant the LLM agent is, I'm seeing a lot of unforced errors regarding how you implement a web interface to it. If it makes you feel any better, it doesn't really register compared to all the bloat I see on other sites.

potsandpans1mo ago

Chatgpt banned me after I said disparaging things about Sam Altman in a chat.

When I appealed the ban, I was told that I couldn't be told exactly why I was banned, but if I wrote a written apology and "promised to never do it again" my ban could be appealed.

I asked for an update on the ban via email every month for over a year.

Maybe you could tell me a little bit about that process?

jgalt2121mo ago

> we protect our first-party products from abuse like bots, scraping, fraud, and other attempts to misuse the platform

Have you just described the dilemma facing all the content sites used to train LLMs?

leros1mo ago

Fwiw, I stopped using ChatGPT and went to a competitor because the checks slow down ChatGPT so much that the webapp becomes unusable in anything but a new short chat. CPU usage goes to 100%, you can't type, the entire tab freezes, etc. It's a miserable experience to use and I'm on a relatively new MacBook not some old computer. If you read around it's a very common problem people have been having for a while now.

freeopinion1mo ago

It's your business and your call. But my opinion is that I wish you would quit offering free services. I'm pretty concerned about the horrible effect your free services are having on education. Yes, AI can be an incredible tool to enhance education. But the reality is that it is decimating children's will to learn anything.

I don't want to blame AI for all the world's problems. And I don't want to throw the baby out with the bath water. But I think you should think really hard about the value of gates. Smart people can build better gates than cash. But right now, cash might be better than nothing. Clearly you have already thought about how to build gates, but I don't think you have spent enough time thinking about who should be gated and why. You should think about gates that have more purpose than just maximizing your profit.

"We want to hook as many people as possible without letting in our competitors" is a pretty crummy thought to use as a public justification.

(Edited for typos.)

piskov1mo ago

Tangential question: are there chatgpt app devs on X? There are a few from Codex team but I couldn’t find guys from “ordinary” chatgpt.

Also if you could pass this over: it takes 5 taps to change thinking effort on ios and none (as in completely hidden) on macos.

If I were to guess it seems that you were trying to lower the token usage :-). Why the effort is only nicely available on web and windows is beyond me

xtajv1mo ago

Earnest question: if I was feeling lazy and security-conscious at the same time, would I be better off...

(A) opening chatgpt.com in qubes (but staying logged out, i.e. never creating a chatgpt account)

-or-

(B) creating a freemium chatgpt account

(Obviously, the "best" answer would be something like running a local LLM from an airgapped machine in a concrete bunker :) But that's not what I'm after).

20k1mo ago

>abuse like bots, scraping

10/10, I've got no notes

cheese_van1mo ago

Abuse from scraping has long been a serious problem for many, good job!

jesuslop1mo ago

Hi Nick, the lag is quite bad in the field, honest. In the desktop app, in this case/datapoint. There was that "halt and catch fire" episode where they spoke about a millisecond threshold of delay that separated usable from unusable. Capable hw and a fiber connection.

invalidusernam31mo ago

But why block the ui until then? Surely you can just not make any requests until the checks are complete?
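The alternative this comment proposes (accept typing immediately, hold only the outbound request until the check completes) can be sketched as follows. This is a toy model in Python with `asyncio`, not OpenAI's or Cloudflare's code; the `Composer` class and the sleep standing in for the check are assumptions made for illustration.

```python
import asyncio


class Composer:
    """Toy chat input whose typing never waits for the bot check."""

    def __init__(self) -> None:
        self.draft = ""
        self.check_done = asyncio.Event()  # set once the hypothetical check finishes
        self.sent = []

    def type_text(self, text: str) -> None:
        # Typing only touches local state, so it does not wait for the check.
        self.draft += text

    async def run_check(self, delay: float) -> None:
        # Stand-in for the real client-side check; here it is just a sleep.
        await asyncio.sleep(delay)
        self.check_done.set()

    async def submit(self) -> None:
        # Only the outbound request is held until the check has completed.
        await self.check_done.wait()
        self.sent.append(self.draft)
        self.draft = ""


async def demo() -> list:
    composer = Composer()
    check = asyncio.create_task(composer.run_check(0.05))
    composer.type_text("hello")  # accepted before the check finishes
    await composer.submit()      # waits for the check, then sends
    await check
    return composer.sent
```

The trade-off is that with this ordering, key events can legitimately arrive before the check has loaded, so the absence of check data at typing time can no longer be read as a sign of automation.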

rglullis1mo ago

I shouldn't be giving ideas to your boss, but I bet he would be interested in making ChatGPT available only to paying customers, or free for those who get their eyes scanned by The Orb. Give 30 days of raised limits and we're all set to live in the dystopia he wants.

gck11mo ago

I always wondered why you even have logged out access. I'm glad I can use ChatGPT in incognito when I want a "clean room" response, but surely that's not the primary use case.

Is user base that never logs in really that significant?

pocksuppet1mo ago

This episode proves they know who you are, even when you're logged out. If they didn't know, they wouldn't let you use the service.

aucisson_masque1mo ago

Why send the Turnstile bytecode encrypted? Surely people savvy enough to abuse the system will find out how to decrypt it, see OP, and it gives the impression that you are trying to hide stuff you're not proud of.

pocksuppet1mo ago

Because they want to make it as hard as possible to reverse engineer. If they wanted it to be easy, they'd use <input type="checkbox" name="ishuman">I am a human

toddmorey1mo ago

Why are all these checks still performed on an authenticated, paid user?

JumpCrisscross1mo ago

> we want to keep free and logged-out access available for more users

How does this comport with OpenAI's new B2B-first strategy?

> We also keep a very close eye on the user impact

Are paid or logged-in users also penalised?

account421mo ago

> These checks are part of how we protect our first-party products from abuse like bots, scraping, fraud, and other attempts to misuse the platform.

The lack of self awareness...

prmoustache1mo ago

> we protect our first-party products from abuse like bots, scraping, fraud, and other attempts to misuse the platform.

Isn't that how you build your service from the very start? How ironic.

diebillionaires1mo ago

As a free tier user I only get like three queries in now without model quality reduction, so I'd say your bases are covered as far as GPU costs around misuse.

subscribed1mo ago

> "abuse like bots, scraping"

You what, mate? Would you please use that on yourselves first? Because it comes off as a GROSS hypocrisy. State of the art hypocrisy.

>> behavioral biometric layer

But this one, especially, takes the cake.

Quite disgusting.

kelnos1mo ago

> A big reason we invest in this is because we want to keep free and logged-out access available for more users.

Are these checks disabled for logged-in, paid users?

sourcecodeplz1mo ago

I really appreciate the free options, without even needing a login. Wish they would also keep the small free weekly allowance for Codex.

user39393821mo ago

Have you given any thought to what we trade when big tech elects one corporation as the gatekeeper for vast swaths of the Internet?

nicbou1mo ago

For what it's worth, I switched to Gemini because of the long ChatGPT load time. Gemini loads as fast as Google Search.

owebmaster1mo ago

The reason why you did it is clear; why you guys settled for such a poor implementation is why this thread exists

mghackerlady1mo ago

No, leave it. Surely the mighty OpenAI can deal with the scraping. At least, it seems to think everyone else can

Razengan1mo ago

> we want to keep free and logged-out access available for more users.

And THANK YOU for that!

Being able to use ChatGPT and Grok without signing in is a big part of why I like those services over Gemini etc.

Hell, dummy Claude won't even let me Sign-In-with-Apple on the Mac desktop, even though it let me Sign-UP-with-Apple on the iPhone! BUT they do support Sign-In-with-Google!!? What in the heavenly hell is this dumbassery

arendtio1mo ago

Do you do those checks only for users without accounts or also for those with accounts?

gmerc1mo ago

the company that scrapes everything until it collapses really needs to protect itself from scraping. Lol.

AndrewKemendo1mo ago

Kudos for trying

This whole thread was like watching a swarm of ants try and take a grasshopper down

SubiculumCode1mo ago

In long threads in chatgpt, it grinds to a halt in both Chrome and Firefox. Please fix

htx80nerd1mo ago

Thanks. I've used ChatGPT a million times and never had any input issues.

matheusmoreira1mo ago

> protect our first-party products from abuse like bots, scraping

You do see the irony here?

marxisttemp1mo ago

History will not be kind to you and your ilk. Quit your job.

tekawade1mo ago

Hey Nick, I find it concerning that this account was created just to comment on this thread, and never even replied back to any of the real concerns.

Here's to hoping this is a real person who actually created the account out of concern and a wish to share.

lifis1mo ago

Are you disabling them for paying subscribers?

crest1mo ago

Then make sure they only target the free tier!

sandeepkd1mo ago

You do not ever trust the client side. Sometimes being simple is good enough. The maximum you can do is put rate limits on the IP address and/or user account. You just do not want some one to use the product at machine speeds.
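The server-side rate limiting suggested here can be sketched as a token bucket kept per key, where the key is an IP address or an account ID. This is a minimal in-memory illustration, not a description of any real service's limiter; the `RateLimiter` class, its parameters, and the injectable `now` argument (used so the behaviour can be checked deterministically) are all assumptions.

```python
import time
from typing import Dict, Optional, Tuple


class RateLimiter:
    """Token bucket per key (an IP address or an account ID)."""

    def __init__(self, capacity: int, refill_per_sec: float) -> None:
        self.capacity = capacity              # maximum burst size
        self.refill_per_sec = refill_per_sec  # sustained requests per second
        self._buckets: Dict[str, Tuple[float, float]] = {}  # key -> (tokens, last_seen)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        tokens, last = self._buckets.get(key, (float(self.capacity), now))
        # Refill in proportion to elapsed time, capped at the bucket size.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_sec)
        if tokens < 1:
            self._buckets[key] = (tokens, now)
            return False
        self._buckets[key] = (tokens - 1, now)
        return True
```

With `RateLimiter(capacity=2, refill_per_sec=1.0)`, a key gets two requests in a burst and then one more per second. Other comments in the thread point out the weakness of keying on IP alone: users behind CGNAT share an address, and botnets spread requests across many.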

0dayman1mo ago

Hi Nick, your software is a horrendous encroachment on users' privacy and its quality is subpar to those of us who know what we're working with. We don't use your product here.

chronc63931mo ago

> Hi Nick, your software is a horrendous encroachment on users' privacy and its quality is subpar to those of us who know what we're working with. We don't use your product here.

It’s ok, OpenAI is cooked.

Feel bad for anyone who joined OAI in the past 12 months. Their RSU ain’t going to be worth much later this year. IPO is too late.

quotemstr1mo ago

We really need ZKPs of humanity

ctoth1mo ago

No, we really don't. We don't need worldcoin, we don't need papers, please. We just don't.

"Prove your humanity/age/other properties" with this mechanism quickly goes places you do not want it to go.

quotemstr1mo ago

No, it doesn't go places we "do not want it to go". What part of zero knowledge doesn't make sense? How precisely does a free, unlinkable, multi-vendor, open-source cryptographic attestation of recent humanity create something terrible?

It would behoove people to engage with the substance of attestation proposals. It's lazy to state that any verification scheme whatsoever is equivalent to a panopticon, dystopia as thought-terminating cliche.

We really do have the technology now to attest biographical details in such a way that whoever attests to a fact about you can't learn the use to which you put that attestation and in such a way that the person who verifies your attestation can see it's genuine without learning anything about you except that one bit of information you disclose.

And no, such a ZK scheme does not turn instantly into some megacorp extracting monopoly rents from some kind of internet participation toll booth. Why would this outcome be inevitable? We have plenty of examples of fair and open ecosystems. It's just lazy to assert right out of the gate that any attestation scheme is going to be captured.

So, please, can we stop casting every scheme whatsoever for verifying facts about actors as the East German villain in a cold war movie? We're talking about something totally different.

Muromec1mo ago

> quickly goes places you do not want it to go.

Which places?

gzread1mo ago

Sure. I'll provide an API to provide mine to your bot for $1 each time.

ryanmcbride1mo ago

Protecting your site from bots and scraping is absolutely hilarious considering how you acquired (read: stole) the data you trained your bot on dude.

Just yank that ladder up behind you.

pocksuppet1mo ago

> Just yank that ladder up behind you.

You would be an irresponsible entrepreneur if you didn't. Don't forget your legal obligation to maximise shareholder value.

blactuary1mo ago

> I work on Integrity at OpenAI

Irony is truly dead. Show you have integrity by quitting your job

MisterTea1mo ago

> These checks are part of how we protect our first-party products from abuse like bots, scraping, fraud, and other attempts to misuse the platform.

Isn't this the same behavior used by AI companies to gather training data? Pot, meet kettle.

thegreatpeter1mo ago

You’re doing gods work sir, thank you!

wackget1mo ago

I understand it's not your area, but can you please politely tell your colleagues that the clickbait-type teaser questions from the latest model are absolutely infuriating and are quickly leading me to abandon the platform entirely?

If you'd like, I can write a two-sentence paragraph to send to your colleagues. It contains a special phrase which most colleagues will find difficult to ignore. Would you like me to do that?

rsrsrs861mo ago

Hi Nick, do you believe what you say? You scraped the shit out of everyone

tomalbrc1mo ago

Fake Account

nickphx1mo ago

the irony of your statement is hilarious, disappointing, and infuriating.

boesboes1mo ago

lol, hypocrites.

lxgr1mo ago

It's absurd how unusable Cloudflare is making the web when using a browser or IP address they consider "suspicious". I've lately been drowning in captchas for the crime of using Firefox. All in the interest of "bot protection", of course.

lucasfin0001mo ago

The real frustrating part is that Cloudflare's "definition" of suspicious keeps changing and expanding. VPN users, privacy-first browsers, uncommon IP ranges, they all get flagged. The people most likely to get caught by these systems are exactly the ones who care most about their privacy, and not the bots that they are apparently targeting.

gruez1mo ago

>The real frustrating part is that Cloudflare's "definition" of suspicious keeps changing and expanding.

That's... exactly expected? It's a cat and mouse game. People running botnets or AI scrapers aren't diligently setting the evil bit on their packets.

lxgr1mo ago

So the stable state here is all humans eventually being locked out? (Bots are getting better every day; I doubt the same is true for all humans, including those with weird browsers or networks unwilling to install some dystopian Cloudflare "Internet passport".)

But hey, at least some bots are also not making it past Cloudflare!

Gormo1mo ago

To the contrary, people running botnets or AI scrapers are likely going out of their way to mimic ordinary web traffic from consumer devices. Ultimately, these measures will only affect users who are trying to protect their privacy and security, and will be ineffective at stopping bots.

jagged-chisel1mo ago

That’s obviously because they’re not being “evil”

Aurornis1mo ago

> The people most likely to get caught by these systems are exactly the ones who care most about their privacy, and not the bots that they are apparently targeting.

In my brief experience with abuse mitigation, connections coming from VPNs or unusual IP ranges were very significantly more likely to be associated with abuse.

It depends on your users. VPNs aren’t common at all, even though you hear about them a lot on Hacker News. For types of social sites where people got banned for abuse (forums) the first step to getting back on the forum was always to sign up for a VPN and try to reconnect. It got so bad that almost every new account connecting via VPN would reveal itself as a spammer, a banned member trying to return, or someone trying to sock puppet alternate accounts for some reason.

The worst offenders are Tor IP addresses. Anyone connecting from Tor was basically guaranteed to have bad intentions.

I heard from someone who dealt with a lot of e-mail abuse that the death threats, extortion, and other serious abuse almost always came from Protonmail or one of the other privacy-first providers that I can’t remember right now. He half-jokingly said they could likely block Protonmail entirely without impacting any real users.

It’s tough for people who want these things for privacy, but the sad reality is that these same privacy protections are favored by people who are trying to abuse services.

whatisthiseven1mo ago

Which VPNs are people using that actually care about the user's privacy? Most of them don't, sell their home IP to buyers, sell their DNS history to others, etc. Worse, some of them could require invasive MITM cert stuff most users will just click yes through.

I have yet to see a use case for VPNs for the casual internet audience, and for a tech-savvy user, they're better off renting through some datacenter or something, which at that point is hardly a VPN and more home IP obfuscation. All the same downsides, and at least you get real privacy.

traceroute661mo ago

> Which VPNs are people using that actually care about the user's privacy?

Mullvad.

It has been proven in a court of law that when Mullvad says "no logging", they mean it.[1]

They also regularly have security audits and publish the results[2][3]

[1] https://mullvad.net/en/blog/mullvad-vpn-was-subject-to-a-sea...

[2] https://mullvad.net/en/blog/new-security-audit-of-account-an...

[3] https://mullvad.net/en/blog/successful-security-assessment-o...

evilduck1mo ago

Using any popular datacenter's IP range for a personal VPN is likely to be outright blocked.

lxgr1mo ago

I'm forced to use a VPN to occasionally check my US bank account, since a foreign IP address is obviously a harbinger of unspeakable evil (while the friendly Youtube advertised neighborhood VPN is obviously evidence of pure intentions).

Imustaskforhelp1mo ago

ProtonVPN with bitcoin which you get from a monero swap is a good idea for complete privacy if you want port forwarding.

MullvadVPN is also another great one.

I have heard some good things about AirVPN, but I can absolutely vouch for Mullvad and to a degree ProtonVPN (just with Proton, depending upon your threat model, do take the necessary precautions like buying with monero for example)

There are others, but mostly it's the 2-3 that I trust.

gruez1mo ago

>Most of them don't, sell their home IP to buyers, sell their DNS history to others, etc. Worse, some of them could require invasive MITM cert stuff most users will just click yes through.

Source? I haven't seen any evidence that the major paid VPN providers engage in any of those things. At best it's vague implications something shady is happening because one of the key people was previously at [shady organization].

ymolodtsov1mo ago

Yes, using an incognito window is more than enough to kick off their checks.

ehnto1mo ago

I recently had the insane experience of filling out 15 consecutive captchas after I had checked out and entered my payment information into the payment processor widget. I just wanted to submit the order. I was logged in to their website, and the bank even needed a one-time code for payment. If the bank is pretty sure I am human then your ecomm site can surely figure it out.

lxgr1mo ago

That's my favorite combination: Shitty bot detection meeting shitty payment security systems.

At least outside the US, there's 3DS as an (admittedly often high friction) high quality cardholder verification method, but in the US, that's of course considered much too consumer-hostile, so "select 87 overpasses" it is.

amatecha1mo ago

A while back I was buying tickets for a gondola for a trip in Europe and the checkout process failed during payment because their site didn't load their analytics/tracking stuff with proper error-handling, so when my ad-blocker prevented the tracking stuff, their checkout process failed to handle my CC's 2-factor auth and the checkout would fail. Had to contact my CC company and work with the gondola company to tell them what they're doing wrong so they could fix their website code. Pretty sad to know whoever built their stuff actually shipped a checkout flow (for a VERY popular tourist destination) without testing with ad-blockers enabled.

lxgr1mo ago

To be fair, this sometimes seems on the ad blocker. I've definitely seen mine accidentally nuke part of the payment Javascript (or maybe the 3DS iframe?) because some substring of it matched some common ad URL, which is obviously unrecoverable for the site itself.

girvo1mo ago

Surprising really, because I'm a Firefox + Ublock Origin die hard and I never get Cloudflare captchas. Wonder what the difference is? I have CGNAT turned off, if that matters at all (probably not).

lxgr1mo ago

I could definitely imagine a public IPv4 with lots of good, logged-in Cloudflare traffic to act as a positive signal for their heuristics, possibly even overriding the Firefox penalty.

danielheath1mo ago

Maybe check your network isn't sending web traffic you're not aware of?

I'm running firefox and seeing the normal amount.

jychang1mo ago

Most people are on a CGNAT these days, drowning in captchas is the new normal. You’re at the mercy of one of your neighbors not hosting a botnet from their home computer.

perching_aix1mo ago

For better or for worse, CF's fingerprinting and traffic filtering is a lot more in-depth than just IP trend analysis. Kind of by necessity, exactly because of what you mention. So I'd think that's not as big a worry per se.

tokioyoyo1mo ago

Not even remotely true, I genuinely have no idea what you're talking about. The only time I get captcha'ed is when I sometimes VPN around, or do some custom browser stuff and etc. I'll even say I get captcha'ed less now than maybe 5 years ago.

cogman101mo ago

Every so often, usually after a firefox update, CF will get into an "I'm convinced you're a bot" mode with me. I can get out of it by solving 20 CAPTCHAs.

hansvm1mo ago

It's probably just a higher rate of autonomous vehicles needing stop signs and buses identified at that moment, and cognitive bias causes you to only remember when that happens when you recently performed an update. /s

g-b-r1mo ago

Maybe you allow tracking and cookies?

Eji17001mo ago

I don't, and I rarely have issues with firefox. Private + blockers + VPN causes issues, as expected, but otherwise I'm usually fine?

onion2k1mo ago

Is that because botnets spoof being Firefox? It's not really fair to blame Cloudflare if it is. That's on the bots.

doctaj1mo ago

In what way would that not be fair? Their product giving false positives (unnecessary challenges for a normal browser humans commonly use) to real people is definitely their fault.

eks3911mo ago

That sounds like it is working as intended, not a false positive. A false positive would mean it blocked you whereas a challenge means more information is needed. You aren't noticing all of the times it correctly decides you are human, only the times when it needs to "inconvenience" you for more information because you prioritize privacy, a key similarity with some bots.

I also like privacy. I use GrapheneOS. I compartmentalize my credit cards, emails, and phone numbers. I don't use Google products, and the list continues, but I don't complain about Cloudflare because it is painless and I understand the price I pay for privacy.

I also have home services accessible via my home website, running on my home server(s). I chose to have cloudflare to host my domain specifically for the easy bot blocking, and it blocks more than 2000 bots/day that otherwise would be trying to find vulnerabilities on my servers, which contain a lot of sensitive things. I've never had an issue personally accessing my services through cloudflare. Sometimes I have to do captchas to access my own things, and that's barely an inconvenience (I am aware the domain isn't necessary to access services, but it makes more sense for my setup and intents)

gruez1mo ago

>Their product giving false positives (unnecessary challenges for a normal browser humans commonly use) to real people is definitely their fault.

Is it TSA's "fault" that non-terrorists are subject to screening?

lxgr1mo ago

No, using a stupid authentication/verification method with lots of false positives is always on whoever deploys it.

Imagine an apartment building with a flimsy front door lock that breaks all the time, and the landlord only telling you that that can't be helped because of all the burglars.

josephcsible1mo ago

If it's just as easy to spoof being Chrome as it is to spoof being Firefox, then it is indeed fair to blame Cloudflare if they give Firefox users more CAPTCHAs than Chrome users.

conradkay1mo ago

Not really, there's camoufox but the vast majority use modified chrome/chromium

binaryturtle1mo ago

I'm on a slightly older Firefox and can't use many websites at all anymore because of the Cloudflare cancer.

Of course then you've got sites like gnu.org too that block you because of your slightly outdated user agent.

mghackerlady1mo ago

I... Don't think it does that? It shouldn't, anyway. How long has that been a thing? They've been hit pretty hard by the slop crew lately but I couldn't imagine it being so bad they require an up to date UA

geysersam1mo ago

I use firefox daily and I don't encounter the problems you describe, might be worth looking if there's some other issue.

mghackerlady1mo ago

Heaven forbid you not use JavaScript, then they can't <s>track you</s> keep the internet safe!

lm4111mo ago

That's not Cloudflare trying to make your life hard.

It's the reality of how bad the bots have become.

dawnerd1mo ago

I’ve been getting it in safari too. It’s ridiculous frankly. My residential ip must have been flagged or something. The part that’s really annoying is it's trivial for bots to bypass.

lxgr1mo ago

> I’ve been getting it in safari too.

I'm getting it on iCloud Private Relay all the time. It honestly makes it kind of useless.

Maybe that's the point? But then again, doesn't Cloudflare run part of it!? And wasn't there some "privacy-preserving captcha replacement" that iOS devices should already be opting me in to? So many questions, nobody there to answer them, because they can get away with it.

> The part that’s really annoying is its trivial for bots to bypass.

Not the ethical bots, though! My GPT-backed Openclaw staunchly refuses to go anywhere near a "I'm not a robot" button.

gzread1mo ago

Cloudflare makes money on both sides. It makes money from Apple to run Private Relay and it makes money from website operators to block Private Relay. It hosts the websites of DDoS services and protects them from DDoS, too.

segmondy1mo ago

try using firefox and then using a cellphone network for internet. sometimes i can't access a site, because i get infinite captcha. i know what a damn bus, stairwell, stop light or motorcycle looks like.

tshaddox1mo ago

Is anyone talking about the fact that this is a fundamental design flaw of the web? Or arguably even the entire Internet?

3form1mo ago

It's hard to call something a "fundamental flaw of web" if it wasn't an issue for 30 years. Unless you mean something more general that I'm missing.

tshaddox1mo ago

Arguably it didn’t see widespread commercial adoption for 30 years, and you wouldn’t expect fundamental design flaws regarding commercial incentives to manifest before that.

fastball1mo ago

Cloudflare isn't providing Turnstile as a service in a vacuum, this is a direct response to bad actors who can trivially abuse the web.

pixl971mo ago

A flaw can be fundamental but not immediate. It's probably better to say it's a fundamental flaw of the open web, that is the system collapses as the number of bad actors increases, and there is no way to prevent bad actors and have the system keep the name as open web.

lazycouchpotato1mo ago

At times I'm completely locked out of a website and Cloudflare asks me to email the website owner to get the issue resolved.

.. how do they expect me to find the website owner's email if I can't access said website?

wongarsu1mo ago

Once upon a time we had whois lookup for exactly that usecase (finding a domain's owner without visiting the site). Of course now nearly everyone has meaningless entries from some domain privacy service

amatecha1mo ago

These days I just close sites that show that "checking if you're a bot" shit. If this is how the web is going to be now, I don't care, I'll just not use it. I didn't need to see that article or post that badly anyways. I'm tired of paying the price for the sociopathic, greedy actions of others. It's especially bad for anyone who uses an open source OS like Linux or *BSD (to the extent many sites just block me automatically with a 403 Forbidden simply for using OpenBSD + Firefox, completely free pass if I try the same site from a Windows or Linux computer).

jgalt2121mo ago

We use Cloudflare to protect our content, but at the same time our machines mostly run Linux / Firefox so it really is quite a frustrating relationship. It really bums me out how much of Turnstile boils down to these two questions:

is it Linux (or similar)?

is it Firefox?

If yes, to one or both, you're blocked! Clearly millions of dollars of engineering talent and petabytes of data collection should be able to come up with something more nuanced than this.

dheera1mo ago

Exactly. For the most part all this bot protection is only protecting these websites against humans.

I don't do free work. I'm not going to label 50 images of crosswalks and motorcycles for free.

ronbenton1mo ago

> For the most part all this bot protection is only protecting these websites against humans.

Curious how do you know this?

lukewarm7071mo ago

sometimes when there is mafia you get no option but pay pizzo

hence i am just using cloudflare remote browser rendering.

EGreg1mo ago

Well, that's for the public internet.

I'm building Safebox and Safecloud, where this won't be the case anymore. Not only will you have a decentralized hosting network that can sideload resources (e.g. via a browser extension that looks at your "integrity" attribute on websites) but also the websites will require you to be logged in with a HMAC-signed session ID (which means they don't need to do any I/O to reject your requests, and can do so quickly)... so the whole thing comes down to having a logged in account.

https://github.com/Safebots/Safecloud

As far as server-to-server requests, they'll be coming from a growing network of cryptographically attested TPMs (Nitro in AWS, also available in GCP, IBM, Azure, Oracle etc.) so they'll just reject based on attestations also.

In short... the cryptographically attested web of trust will mean you won't need cloudflare. What you will need, however, to prevent sybil attacks, is age verification of accounts (e.g. Telegram ID is a proxy for that if you use Telegram for authentication).
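For readers unfamiliar with the "HMAC-signed session ID" idea mentioned above, here is a minimal sketch of how a server could reject forged session IDs with CPU work only. It is not taken from Safebox or Safecloud; `SERVER_KEY`, `issue_session_id`, and `is_authentic` are hypothetical names, and the token layout (`id.signature`) is an assumption made for illustration.

```python
import base64
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; a real deployment would load this from config.
SERVER_KEY = secrets.token_bytes(32)


def issue_session_id(key: bytes = SERVER_KEY) -> str:
    """Return 'session_id.signature', where signature = HMAC-SHA256(key, session_id)."""
    session_id = secrets.token_urlsafe(16)  # never contains '.'
    tag = hmac.new(key, session_id.encode(), hashlib.sha256).digest()
    return session_id + "." + base64.urlsafe_b64encode(tag).decode()


def is_authentic(token: str, key: bytes = SERVER_KEY) -> bool:
    """Verify the signature without any database or cache lookup."""
    session_id, sep, tag_b64 = token.partition(".")
    if not sep:
        return False
    try:
        tag = base64.urlsafe_b64decode(tag_b64.encode())
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False
    expected = hmac.new(key, session_id.encode(), hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(tag, expected)
```

A valid signature only proves the server issued the ID; expiry and revocation would still need a timestamp in the signed payload or a lookup, which this sketch leaves out.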

password43211mo ago

Wow, if Seinfeld can have a soup nazi, I think it's within reason for you to be called the internet nazi.

"No s̶o̶u̶p̶ internet for you!"

Good luck!

ale421mo ago

This was sarcasm, right?

EGreg1mo ago

Why would you assume it needs to be? You don’t think that websites on the Internet might not want to allow random bots and scrapers to waste their resources, and require people to have an account in order to access non-static resources on the website? You do realize that API keys exist, right?

simonw1mo ago

Presumably this is all because OpenAI offers free ChatGPT to logged out users and don't want that being abused as a free API endpoint.

NotPractical1mo ago

But do they do it whether you're logged in or not?

I noticed the ChatGPT app also checks Play Integrity on Android (because GrapheneOS snitches on apps when they do this), probably for the same reason. Claude's app doesn't, by the way, but it also requires a login.

Gander57391mo ago

Because accounts are free, and could still be used to abuse as a free endpoint, with a little trickiness.

gzread1mo ago

Don't you need a Google account and to get a Google account you need a phone number?

"You're posting too fast! Please slow down."

appreciatorBus1mo ago

Yup.

Coincidentally about an hour ago, I wanted to look something up in ChatGPT and I happened to be in a browser window I don’t normally use, with no logged in accounts. I assumed it wouldn’t work, but to my surprise with no account, no cookies of any kind it took my query and gave me an answer.

gruez1mo ago

>I assumed it wouldn’t work, but to my surprise with no account, no cookies of any kind it took my query and gave me an answer.

They allowed anonymous requests for months now, maybe even a year.

solaire_oa1mo ago

Yeah, additionally gemini.google.com is also free unauthenticated, which I've been using for a very long time (a year?). Why this is being treated as news is confusing.

aziaziazi1mo ago

I used to mostly use ChatGPT in an incognito tab, logged out, until I noticed it seemed to have some context of my logged-in session, and of the logged-out one as well. It may be paranoia or prompt deduction, but it felt strange.

FergusArgyll1mo ago

Yeah it works but it's a dumber model. Prob mini

lelandfe1mo ago

You get a couple requests in at a smarter model and then it prompts you to sign up, and from there uses an extremely dumb model.

bredren1mo ago

It is also intended to protect the usage patterns of pro subscribers.

As has been amply explained, the API pricing per token is far more for equivalent use when maximizing a subscription plan.

It isn’t really a massive hurdle to deal with this full SPA load check. If one is even aware it exists they already have the skills to bypass it anyway.

I get why people would “what about” the automation inherent in what OpenAI is doing, but that is a separate matter.

Other businesses and applications can put their own hurdles and anti-bot practices in place to protect the models they’ve leaned into, and they have been.

darepublic1mo ago

Using 5.2 at 20 a month would also be a steal. Other shoe will drop on codex sooner or later

thisisnow1mo ago

It's probably the same for copilot.microsoft.com and their cloudfart usage

petcat1mo ago

> These properties only exist if the ChatGPT React application has fully rendered and hydrated. A headless browser that loads the HTML but doesn't execute the JavaScript bundle won't have them. A bot framework that stubs out browser APIs but doesn't actually run React won't have them.

> This is bot detection at the application layer, not the browser layer.

I kind of just assumed that all sophisticated bot-detectors and adblock-detectors do this? Is there something revealing about the finding that ChatGPT/CloudFlare's bot detector triggers on "javascript didn't execute"?

iancarroll1mo ago

It’s pretty interesting to me that Cloudflare is collecting additional client-side data for individual customers. This is not widely done by most anti-bot solutions.

supriyo-biswas1mo ago

OpenAI is on an enterprise plan and (presumably) gets a customized version of Turnstile.

red_admiral1mo ago

"Sophisticated" may vary, but for a lot of EU media products you can just block the script that launches the paywall/consent overlay. Sometimes disabling JS does it; sometimes activating reading mode works.

Chance-Device1mo ago

Perhaps the author should have made it clearer why we should care about any of this. OpenAI want you to use their real react app. That’s… ok? I skimmed the article looking for the punchline and there doesn’t seem to be one.

raincole1mo ago

Why does every article need a 'punchline'? It's a technical analysis. Do you expect punchlines when you read recipes or source code?

Chance-Device1mo ago

Where did I say “every article”? This is AI slop that’s set up like it’s some investigative expose of something scandalous and then shows us nothing interesting. A competent human writer would have reframed the whole thing or just not published it.

dmos621mo ago

For me the interesting parts of the article are how the author got to the decompiled checks and what the checks are. Anti-bot is an interesting space.

elwebmaster1mo ago

That's because the article is AI slop.

londons_explore1mo ago

I just don't understand why bot owners can't just run a complete windows 11 VM running Google Chrome complete with graphics acceleration.

You can probably run 50 of those simultaneously if you use memory page deduplication, and with a decent CPU+GPU you ought to be able to render 50 pages a second. That's 1 cent per thousand page loads on AWS. Damn cheap.
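The cost claim above can be checked with simple arithmetic. The sketch below only rearranges the two figures given in the comment (50 pages per second, 1 cent per thousand page loads) to show what hourly instance price they imply; it does not use any actual AWS pricing, and the function name is made up for illustration.

```python
PAGES_PER_SECOND = 50        # throughput claimed in the comment
COST_PER_1000_PAGES = 0.01   # USD, "1 cent per thousand page loads"


def implied_hourly_budget(pages_per_second: float, cost_per_1000: float) -> float:
    """Hourly spend that makes the per-page cost claim hold at the given throughput."""
    pages_per_hour = pages_per_second * 3600
    return pages_per_hour / 1000 * cost_per_1000


budget = implied_hourly_budget(PAGES_PER_SECOND, COST_PER_1000_PAGES)
print(f"{PAGES_PER_SECOND * 3600:,} pages/hour -> ${budget:.2f}/hour")
```

In other words, the claim holds if a machine that sustains 50 page loads per second costs about $1.80 per hour; whether a CPU+GPU instance at that price can actually run 50 Windows VMs is the part that remains an assumption.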

jaccola1mo ago

There are myriad providers competing to offer this, nicely packaged with all the accoutrements (IP rotation, location spoofing, language settings, prebuilt parsers, etc.) behind an easy to use API.

Honestly it is a very healthy competitive market with reasonably low switching costs which drives prices down. These circumstances make rolling your own a tough sell.

arcfour1mo ago

They do, but the fact that they have to do this means there are fewer bots because it's less economical to go to such lengths, compared to something much less complex (which is orders of magnitude cheaper).

huertouisj1mo ago

there are scraping subreddits.

if you browse them you will see that bot writers are very annoyed if they can't scrape a site with a headless browser.

you can do what you suggested, but with Linux VMs/containers. windows is too heavy, each VM will cost you 4 GB of RAM

londons_explore1mo ago

The reason to use windows is that anti bot tech is going to be a lot stricter if Linux is detected...

xmcp1231mo ago

I’m in those. xvfb and headless=false still works great

poly2it1mo ago

If you know of a simple way to run a Windows 11 VM with good graphics acceleration (no GPU passthrough), please contact me.

MarioMan1mo ago

I assume your concern with GPU passthrough is that each VM needs a whole GPU? You can use GPU-PV to split your GPU between VM instances. Then the main bottleneck becomes how thin you split out your VRAM.

More info here:

https://web.archive.org/web/20231107182321/https://mu0.cc/20...

https://youtu.be/XLLcc29EZ_8?t=570

https://github.com/jamesstringer90/Easy-GPU-PV

deltoidmaximus1mo ago

Wouldn't virtualbox or vmware's paravirtual GPUs be a better fit for this use case? Unfortunately the offerings with qemu/libvirt still lag vmwares by a lot.

himata41131mo ago

284 on 296gb of ram with deduplication enabled on a 128c with 32Q vgpu.

YetAnotherNick1mo ago

I am reasonably sure that these kind of fingerprints can detect if the browser is inside a VM.

kristjansson1mo ago

… yup?

I mean you missed the minigame of preventing Chrome from signaling that it’s being programmatically (webdriver etc) driven and tipping your hand, but … yup?

technion1mo ago

To prompt a discussion that's purely technical: I'm interested in how this was done.

Specifically, Turnstile as far as I'm aware doesn't do anything specifically configurable or site specific. It works on sites that don't run React, and the cookie OpenAI-Sentinel-Turnstile-Token is not a CF cookie.

Did OpenAI somehow do something on their own API that uses data from Turnstile?

XYen0n1mo ago

Cloudflare should be able to determine whether a website uses React by analyzing data flowing through its CDN.

technion1mo ago

Whilst true, "validate the right state is loaded" would surely be something not done without developer input.

ripbozo1mo ago

and chatgpt was then used to write this article. at least try to clean it up a bit

hx81mo ago

Ah yes, the timeless hallmark of web blogs: a draft so messy even a language model would ask for a second pass.

i18nagentai1mo ago

The irony of a company that sells DDoS protection making the browsing experience worse for legitimate users. The real issue is that Cloudflare's bot detection runs JavaScript that introspects the page state — which means any site using Cloudflare is implicitly giving Cloudflare access to read the DOM of the protected application. That's a much bigger concern than the typing delay.

refulgentis1mo ago

If you have AI write a blog post for ya, when you think it's set, check word count (can c+p to google docs if AI can't pull it off with built in tools), and ask it to identify repetitions if it's over 1000.

Also, you can have it spotcheck colors: light orange on light background is unreadable, ask it to find the L*[1] of colors and dark/lighten as necessary if gap < 40 (that's minimum gap for yuge header text on background, 50 for text on background, these have gap of 25)

I haven't tried this yet, but maybe have it count words per header too. It's got 11 headers for 1000 words currently, which makes reading feel really staccato, and you gotta evaluate "is this a real transition or vibetransition"

[1] L* as in L*a*b*, not L in Oklab
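The L* spot-check described above can be scripted. Here is a minimal sketch using the standard sRGB to CIE L* conversion (D65 white point); the function names are hypothetical, and the 40/50 thresholds are the rule of thumb from the comment, not a formal accessibility standard.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def lightness(hex_color: str) -> float:
    """CIE L* (0 = black, 100 = white) of an '#rrggbb' sRGB color."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    # Relative luminance Y from linear RGB (sRGB primaries).
    y = 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)
    if y > (6 / 29) ** 3:
        return 116 * y ** (1 / 3) - 16
    return (29 / 3) ** 3 * y


def lightness_gap(foreground: str, background: str) -> float:
    """Absolute L* difference between two colors."""
    return abs(lightness(foreground) - lightness(background))


def readable(foreground: str, background: str, header: bool = False) -> bool:
    """Apply the comment's rule of thumb: gap >= 40 for large headers, >= 50 for text."""
    return lightness_gap(foreground, background) >= (40 if header else 50)
```

For example, `readable("#000000", "#ffffff")` passes easily, while two light colors close in L* would fail and need to be darkened or lightened.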

tommodev1mo ago

Ah, this explains chatgpt (and probably copilot) performance behind corporate firewalls such as zscaler.

Between the network latency and low end machines, there is an enormous lag between chatgpts response and being able to reply, especially for editing a canvas.

I've been sitting there for up to a minute plus waiting to be able to use the canvas controls or highlight text after an update.

bredren1mo ago

On a related note, ChatGPT.com changed how it handles large text pastes this past week.

It now behaves like Claude, attaching the paste as a file for upload rather than inlining it.

This affected page UX some and reduces the cost of the browser tab some.

At some point, maybe still true, very long conversations ~froze/crashed ChatGPT pages.

NSPG9111mo ago

I was using KeepChatGPT[1] for a while back in 2023-2024, pre-Gemini-in-Google era, and I was fascinated as to how it was able to mask being a user without needing any API or help from the end user. I stopped using it after 2024 because 1) Gemini and 2) It breaks quite a lot. I did however, like how you had an option to push the AI panel to the right, if only Google even considers doing so.

[1]: https://github.com/xcanwin/keepchatgpt

qingcharles1mo ago

I have a little helper app I run sometimes that I have a button to push a query into ChatGPT and get a json response. You wouldn't even know OpenAI had any anti-bot tools because it doesn't get flagged at all. It just uses a webview inside WinForms.

natdempk1mo ago

Does anyone know how this is integrated on the Cloudflare side and across the app? Is this beyond standard turnstile? Is this custom/enterprise functionality? Something else?

croemer1mo ago

When using ChatGPT Android app with some NextDNS block lists, I get an error modal in app saying "security misconfiguration blah blah".

Clearly I'm blocking some tracker and it's upset about that. I allowlisted a sentry subdomain and since then got no more complaints.

tosh1mo ago

It used to be possible to type immediately while the page is loading and have all key presses end up in the input field.

Why run this check before user can type?

Why not run it later like before the message gets sent to the server?
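The alternative being asked about can be sketched as follows: start the check in the background, let typing proceed, and only wait for the result at submit time. This is an illustration of the proposed design, not how ChatGPT works; `run_integrity_check`, `Composer`, and the token value are all hypothetical. Note that elsewhere in the thread it is argued that blocking input is what makes a missing signal meaningful, which this design gives up.

```python
import asyncio


async def run_integrity_check() -> str:
    """Hypothetical stand-in for the client-side check; returns a token."""
    await asyncio.sleep(0.05)  # simulate the time the check takes
    return "check-token"


class Composer:
    """Input box that accepts typing immediately and gates only on submit."""

    def __init__(self) -> None:
        self.draft = ""
        # Kick off the check right away; must be created inside a running event loop.
        self._check = asyncio.create_task(run_integrity_check())

    def type(self, text: str) -> None:
        # Typing never waits on the check.
        self.draft += text

    async def submit(self) -> dict:
        # Only the outbound request waits; usually the check has already finished.
        token = await self._check
        message, self.draft = self.draft, ""
        return {"message": message, "token": token}


async def demo() -> dict:
    composer = Composer()
    composer.type("hello")
    return await composer.submit()
```

Running `asyncio.run(demo())` returns the typed message together with the token, showing that the user's keystrokes were accepted before the check completed.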

tripdout1mo ago

AI-written article?

CorneredCoroner1mo ago

> A headless browser that loads the HTML but doesn't execute the JavaScript bundle won't have them.

this is meaningless btw. A browser headless or not does execute javascript.

jaccola1mo ago

I disagree, a browser can have javascript execution disabled (and this is somewhat common in scraping to save time/resources).

I read it to mean: "A browser that doesn't execute the JavaScript bundle won't have [the rendered React elements]." Which is true.

maxwellg1mo ago

Wouldn't a browser that doesn't execute JS also not execute the browser fingerprinting code in the first place?

XYen0n1mo ago

If JavaScript is disabled, why use a headless browser instead of making HTTP requests directly?

girvo1mo ago

A bunch of the points in this AI generated blog post were like that. Makes me feel dirty when I'm 1/3rd of the way through and I realise how off it is.

thisisnow1mo ago

Hah, sure, you just let random JS execute from random sites on your machine...

lightedman1mo ago

Preventing me from typing until you SCAN MY SYSTEM?

Fine, by extension, you agree I can scan all of your systems for whatever I desire. This works both ways.

tristor1mo ago

This explains some of the weird performance behavior I've seen in the last 24 hours with ChatGPT, sometimes lagging my entire browser while typing. Note, I'm a paying user with a Teams account, so it's kind of annoying that this is being applied to logged in paying users as well. I might have to vibe-code my own chat webUI using the APIs.

TimLeland1mo ago

It seems they fixed the biggest issue I've had, where you start typing and then it erases the content once the page fully loads

EGreg1mo ago

Why does ChatGPT slow down so much when the conversations get long, while Claude does compaction?

My best guess is -- ChatGPT is running something in your browser to try to determine the best things to send down to the model API -- when it should have been running quantized models on its own server.

themafia1mo ago

My theory is that "AI" doesn't really have any long term paying customers and the majority of the "users" are people who have cooked up some clever hack to effectively siphon computing power from these providers in an effort to crank out the lowest effort ad supported slop imaginable.

Every provider seems to have been plagued by these freeloaders to such an extent that they've had to develop extreme and onerous countermeasures just to avoid losing their shirts.

What's the word? Schadenfreude?

jtbayly1mo ago

Others here are asking if this is the cause of slow performance in a long chat.

But it seems clear to me that this is why I can't start typing right away when I first load the page and click to focus in the text field.

darepublic1mo ago

I imagine to stop web automation from getting free API like use of the model

dsparkman1mo ago

That explains why ChatGPT has been running like shit all weekend. In the desktop app on Mac, it could not even complete a response. On the web, it would hang before you could input anything.

heliumtera1mo ago

I am shocked OpenAI collects data about its users before users have the opportunity to send the same data to OpenAI servers!

pautasso1mo ago

AI goes through great lengths to ensure it's talking with humans.

Why would two AI bots want to chat with each other?

edg50001mo ago

The chat client has serious performance issues on lower end systems. Now I see why!

self-portrait1mo ago

A/B testing /dev/ kit that tokenizes four permutations of language

apsurd1mo ago

Haven't read yet but instantly matched with my experience of the chat being unusable at times. The latency and glitch-like feel is unbearable.

aslihana1mo ago

I mean, I can easily understand them behaving defensively so they aren't abused. But on an MBP with an M5 here, my ChatGPT tab always gets stuck when I submit a prompt.

Really, really bad user experience; I wonder when they will drop this approach.

arcfour1mo ago

> They exist only if the request passed through Cloudflare's network. A bot making direct requests to the origin server or running behind a non-Cloudflare proxy will produce missing or inconsistent values.

...I don't think that's possible even if you are a bot? I would be very surprised if OAI had their origin exposed to the internet. What is a "non-Cloudflare proxy"? Is this AI slop?

It's likely just looking at the CF properties as part of a bot scoring metric (e.g. many users from this ASN or that geoip to this specific city exhibit abusive patterns).

j451mo ago

This is a lot of fingerprinting.

AndreyK19841mo ago

CamuFox will fix it easy peasy.

seker181mo ago

How can I access a cell phone?

aucisson_masque1mo ago

Mistral chat is also free to use without account and doesn't do that.

tom-blk1mo ago

Wild insight

gobdovan1mo ago

Imagine if they'd put as much effort into making a decent frontend experience.

littlecranky671mo ago

How is this fingerprinting even GDPR compliant? Fingerprinting + profiling need consent, and the service must work without tracking+profiling consent.

yapyap1mo ago

wow OpenAI sure doesn't like bots for a company enabling the botification of the world wide web

baggachipz1mo ago

"We wouldn't want somebody scraping our data, that's ours!"

Josephjackjrob11mo ago

Cloudflare will not be around for long; it's a shame, as it is the GOAT lol

avazhi1mo ago

Another AI-slop article.

Sick.

pencilcode1mo ago

ai slop analysis finding CF detects non javascript capable browsers with no punchline

blinkbat1mo ago

Ok... so... ?

beering1mo ago

So are you able to get free inference now that you decrypted this?

superkuh1mo ago

It doesn't look like it in the full sense of "free". But part of how one pays these services is by running a permissive modern browser which allows the corporation to spy on you even when you already paid in currency. In a sense, by depriving them of the ability to easily spy on you, this workaround is closer to "free".

gruez1mo ago

>My best guess is -- ChatGPT is running something in your browser to try to determine the best things to send down to the model API

There's no way this is worth it unless the models are absolutely tiny, in which case any benefits from offloading to the client is marginal and probably isn't worth the engineering effort.

danny_codes1mo ago

It’s free as a loss leader. The trick is to upsell later. Unfortunately for OpenAI there are plenty of competitors with fungible products, so it might be hard to pull a classic monopoly rug-pull.

beering1mo ago

They already see everything I’m doing because I send my prompts to them. What “workaround” are you referring to?

superkuh1mo ago

They see everything you're doing because you send the text. But this is talking about everything about your computer system. You would not normally be sending this to them or having it involved at all. This workaround allows you to not involve unneeded information about your computer setup. It is not about avoiding sending prompt text.

And as for "but chatgpt isn't paid" (another commenter), well, then yes, that's even closer to free by removing this spying on your computer setup. But they spy on the paid users too.

voxic111mo ago

But isn't ChatGPT access free through the browser? What do you mean already paid in currency?

pocksuppet1mo ago

If you want to send more than a few prompts each day, you have to pay. With currency.

dgb231mo ago

Why are companies like OpenAI and others that are all-in on LLMs still using ReactJS, Python and so on?

These programming languages and frameworks were made for developer convenience and got wide adoption, because it makes on-boarding easier.

This obviously comes at a cost of performance, complexity and introduces a liability into a system, because they are dependencies that come with a whole bunch of assumptions about how they are used.

Is this tradeoff even worth it anymore?

robmccoll1mo ago

Probably training data. The largest number of public repos are built on that stack. We recently picked React for new projects because LLMs seemed to be the most reliable when writing React code.

MyNameIsNickT1mo ago

Hey! I'm Nick, and I work on Integrity at OpenAI. These checks are part of how we protect our first-party products from abuse like bots, scraping, fraud, and other attempts to misuse the platform.

A big reason we invest in this is because we want to keep free and logged-out access available for more users. My team’s goal is to help make sure the limited GPU resources are going to real users.

vlovich1231mo ago

mike_hearn1mo ago

I developed the first version of Google's equivalent of this (albeit theirs actually computes a constantly rotating key from the environment, it doesn't just hard-code it in the program!).

p-e-w1mo ago

Many cloud products now continuously send themselves the input you type while you are typing it, to squeeze the maximum possible amount of data from your interactions.

dncornholio1mo ago

You cannot know what verifications they use. I could argue the disabled textbox is some sort of part of the verification process. Humans will click on it while bots won't.

m3kw91mo ago

Because of the way they have the server architecture set up and how it loads the screen. You don’t even want all the bots hitting servers

matchagaucho1mo ago

Keyboard response feels 10x slower in ChatGPT Projects (possibly for reasons other than react state).

QEDCTrL1mo ago

Sounds like anti-distillation to me. But, know what? Meh.

deadbabe1mo ago

Remember you’re talking to a vibe coder who just stares at code being printed out by AI.

Imnimo1mo ago

It's interesting to me that OpenAI considers scraping to be a form of abuse.

DrinkyBird1mo ago

raincole1mo ago

Quite sure even literal thieves would consider thievery a form of abuse.

zer00eyz1mo ago

" Integrity at OpenAI .. protect ... abuse like bots, scraping, fraud "

Did you mean to use the word hypocrisy? If not, I'm happy to have said it.

I just want to note, that it is well covered how good the support is for actual malware...

jordanb1mo ago

They don't want anyone to take that which they have rightfully stolen.

axegon_1mo ago

The levels of irony that shouldn't be possible...

ProofHouse1mo ago

The irony is thick

sabedevops1mo ago

Seriously. The hypocrisy is staggering!

wiseowise1mo ago

Church, politicians, moralists are all the biggest hypocrites that want to teach you something.

gib4441mo ago

And have absolutely no reservations about making such an obvious statement on a public forum

RobotToaster1mo ago

"You're trying to kidnap what I've rightfully stolen!"

Aurornis1mo ago

I interpreted scraping to mean in the context of this:

> we want to keep free and logged-out access available for more users

I have no doubt that many people see the free ChatGPT access as a convenient target for browser automation to get their own free ChatGPT pseudo-API.

rsrsrs861mo ago

This

miki1232111mo ago

It's not scraping they're concerned about, it's abusing free GPU resources to (anonymously) generate (abusive) content.

nikitaga1mo ago

Scraping static content from a website at near-zero marginal cost to its server, vs scraping an expensive LLM service provided for free, are different things.

The former relies on fairly controversial ideas about copyright and fair use to qualify as abuse, whereas the latter is direct financial damage – by your own direct competitors no less.

It's fun to poke at a seeming hypocrisy of the big bad, but the similarity in this case is quite superficial.

PunchyHamster1mo ago

> Scraping static content from a website at near-zero marginal cost to its server, vs scraping an expensive LLM service provided for free, are different things.

I bet people being fucking DDOSed by AI bots disagree

Also the fucking ignorance assuming it's "static content" and not something needing code running

not2b1mo ago

lm4111mo ago

That is ridiculous.

You imply that "an expensive llm service" is harmed by abuse, but, every other service is not? Because their websites are "static" and "near-zero marginal cost"?

You have no clue what you are talking about.

cicko1mo ago

sandeepkd1mo ago

alsetmusic1mo ago

bakugo1mo ago

AmbroseBierce1mo ago

It's not like those models are expensive because of the usefulness they extracted from scraping others without permission, right? You are not even scratching the surface of the hypocrisy

wolvoleo1mo ago

It's more ironic because without all the scraping openai has done, there would have been no ChatGPT.

In fact the more I think of it, I think it's exactly the same thing.

VadimPR1mo ago

Never in 15 years of running the website did we have such issues, and you can be sure that cache layers were in place already for it to last this long.

unsungNovelty1mo ago

"near-zero marginal costs". For whom exactly????

https://drewdevault.com/2025/03/17/2025-03-17-Stop-externali...

lelanthran1mo ago

I don't think a rule along the lines of "Doing $FOO to a corporate is forbidden, but doing $FOO to a charitable initiative is fine" is at all fair.

What "$FOO" actually is, is irrelevant. I'm curious how you would convince people that this sort of rule is fair.

The corp can always ban users who break ToS, after all. They don't need any help. The charitable initiative can't actually do that, can they?

ungreased06751mo ago

You’re describing the tragedy of the commons. No single raindrop thinks it’s responsible for the flood.

razingeden1mo ago

the_sleaze_1mo ago

60% of our traffic is bot, on average. Sometimes almost 100%.

not_your_vase1mo ago

> net-zero marginal cost

Spare me your tears.

grishka1mo ago

> Scraping static content from a website at near-zero marginal cost to its server

I'm not against my website getting scraped, I believe being able to do that is an important part of what the web is, but please have some decency.

xmcqdpt21mo ago

(TBH it's not clear to me that their marginal costs are low. They seem to pick based on narrative.)

SkiFire131mo ago

> Scraping static content

How do you know the content is static?

ori_b1mo ago

make31mo ago

Absolutely not, the former relies on controversial ideas to qualify as legal.

Stealing the content from the whole planet & actively reducing the incentive to visit the sites without financial restitution is pretty bad.

foobiekr1mo ago

You are, of course, ignoring the production costs of the static content that OpenAi is stealing.

Stop justifying their anti-social behavior because it lines your pockets.

swagmoney16061mo ago

And yet I have to pay in my time and cash to handle the constant ddos'es from the constant LLM scraping

AtlasBarfed1mo ago

Because you say it is?

I obviously disagree. I mean, on top of this we are talking about not-open OpenAI.

gmerc1mo ago

mcfedr1mo ago

I'm sure the copyright holders would consider your use of their content as direct financial damage

nozzlegear1mo ago

Are they, actually?

nickphx1mo ago

Speak for yourself.

karlshea1mo ago

I don’t know what world you live in but it’s not this one.

andrepd1mo ago

> Scraping static content from a website at near-zero marginal cost to its server

The gall. https://weirdgloop.org/blog/clankers

platybubsy1mo ago

Bait or genuine techbro? Hard to say

everdrive1mo ago

mememememememo1mo ago

Local models for privacy.

You want to go to the world's best hotel? You are gonna be on their CCTV. Staying at home is crappier but private.

Unfortunately, for the first time Moore's law isn't helping (e.g. give a poor person an old laptop with Linux installed and they will be fine). They can do that and all is good, except no LLM.

karlgkk1mo ago

> You want to go to the world's best hotel? You are gonna be on their CCTV.

ironically, in high end hotels, there's often a lot less cctv. not none. just less. rich people enjoy privacy

nozzlegear1mo ago

> Staying at home is crappier but private.

Doesn't make sense, my home is much more preferable to a hotel

0x3f1mo ago

Meet me in a cafe and I will sign a JWT saying you're not a bot. You can submit this to whoever will accept it.

magicseth1mo ago

If Apple approves it, I've got a solution: a keyboard that attests to your humanity https://typed.by/magicseth/2451#2NyGLfAQxmqRiAOTlaX7ma3G4d1o...

jagged-chisel1mo ago

Sounds like we’re bringing back the PGP key signing parties

tshaddox1mo ago

Doesn’t really make sense, because any service can just say “you must paste your human-attestation JWT here to use this service” and plenty of people will.

kevin_thibedeau1mo ago

I've been doing that for years. Cloudflare is slowly breaking more and more of the web.

atoav1mo ago

What if I run a website and OpenAI produces bot traffic? Do they also consider it abuse when they do it?

subscribed1mo ago

This is indeed what I do. And you also should. Separate browser for banking, trusted shipping sites etc, and the normal one.

Make sure not to browse the Internet without adblock and/or similar.

SV_BubbleTime1mo ago

Firefox multicontainers are pretty cool. But it’s an advanced process that most people wouldn’t do or do correctly.

Sabinus1mo ago

I love the containers too. My current use case is to keep my YouTube account separate from my Google one. Google doesn't need all that behavioural data in one place.

It's a pity Firefox doesn't get the praise it deserves half as much as it cops criticism.

halJordan1mo ago

Imustaskforhelp1mo ago

The possibilities with Firefox multi containers and automation scripts as well are truly endless.

madrox1mo ago

Another way is to just do better isolation as a user. That's probably your best shot without hoping these companies change policies.

gib4441mo ago

Every time I try this, I end up crossing wires (ie using the browser that 'works' for most things, more than the one that is 'broken')

lukewarm7071mo ago

i am increasingly moving towards a model of 'no browser'.

search for me is now a proprietary index (like exa) that filters rubbish, with a zero data retention sla. so we don't need google profiling.

the content is distilled into markdown pulled from cloudflare's browser rendering api.

i let cloudflare absorb the torrent of trackers and robot checks, i just get md from the api with nothing else. cloudflare is poacher and gamekeeper.

an alternative is groq compound which can call browsers in parallel.

for interactive sites, or local ai browsing, i sometimes run a browser in a photon os docker with vnc, which gives you the same browser window but it runs code not on your pc.

that said, little of my use is now interacting with websites; it's all agentic search and websets so i don't have to spend mental energy on it myself

cruffle_duffle1mo ago

There is also the browser I use to get Claude to route around people blocking its webfetch. Both Playwright and chrome-mcp.

gruez1mo ago

>It's getting to the point where a user needs at minimum two browsers. One to allow all this horrendous client checking so that crucial services work, and another browser to attempt to prevent tracking users across the web.

scared_together1mo ago

Is your interlocutor barking up the wrong tree, or are you missing the forest for the trees?

According to the OP:

I guess Firefox VPN will hide the IP at least. But what about the other data, is it faked by RFP? Because if not, the so-called privacy offered by this configuration is outdated.

You might be fingerprinted by OpenAI right now, as “that guy with all the Firefox anti-fingerprinting stuff enabled, even though it breaks other sites”.

halflife1mo ago

Don’t know if it’s related to the article, but the chat UI's performance becomes absolutely horrendous in long chats.

Typing in the chat box is slow, rendering lags, and sometimes it gets stuck altogether.

I have a research chat that I have to think twice before messaging because the performance is so bad.

Running on iPhone 16 safari, and MacBook Pro m3 chrome.

DenisM1mo ago

The lost art of writing efficient code...

zdragnar1mo ago

> Hence the number of DOM elements stayed constant no matter how far you scroll and the only thing that grows is the Y coordinate.

groundzeros20151mo ago

This is how every scrolling list has been implemented since the 80s. We actually lost knowledge about how to build UI in the move to web
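For readers who have not met the technique, here is a minimal Python sketch of the windowing arithmetic behind a virtualized list. It assumes fixed-height rows; the names `visible_window` and `overscan` are made up for illustration and do not come from any particular UI toolkit.

```python
from dataclasses import dataclass


@dataclass
class Window:
    first: int      # index of the first row to render
    last: int       # index one past the last row to render
    offset_px: int  # Y coordinate at which the first rendered row is placed


def visible_window(scroll_top, viewport_height, row_height, total_rows, overscan=2):
    """Return the slice of rows worth rendering for the current scroll position."""
    top_row = scroll_top // row_height
    # Rows that can intersect the viewport: ceil(viewport / row) plus one
    # partially visible row when the scroll position is not row-aligned.
    visible = -(-viewport_height // row_height) + 1
    first = max(0, top_row - overscan)
    last = min(total_rows, top_row + visible + overscan)
    return Window(first, last, first * row_height)
```

With 20px rows and a 600px viewport, scrolling a million pixels down still yields only about 35 rendered rows; only `offset_px` grows, which is the behaviour the quoted comment describes.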

bschwindHN1mo ago

Almost certainly running some sort of O(n^2) algorithm on the chat text every key press. Or maybe just insane hierarchies of HTML.

Either way, pretty wild that you can have billions of dollars at your disposal, your interface is almost purely text, and still manage to be a fuckup at displaying it without performance problems.

stacktraceyo1mo ago

Same. It’s wild how bad it can get with just like a normal longer running conversation

qingcharles1mo ago

moffkalast1mo ago

Yeah just had this earlier today, I had to write my response in vscode and paste it in, there were literal seconds of lag for typing each character. Typical bloated React.

scq1mo ago

Just because a web application uses React and is slow, it does not follow that it is slow because of React.

It's perfectly possible to write fast or slow web applications in React, same as any other framework.

Linear is one of the snappiest web applications I've ever used, and it is written in React.

PunchyHamster1mo ago

That's how eating your own dogshit works, or whatever was that saying

lionkor1mo ago

Hi Nick, first of all, very cool of you to respond here instead of letting us all sit in the dark. I think that's what makes HN special.

Schiendelman1mo ago

Don't beat up an engineer for decisions made by company leadership. It's really inappropriate.

sebmellen1mo ago

Is this to be expected? I would presume that if I'm authenticated and paying, VPN use wouldn't be a worry. It would be nice to be able to use the tool whether or not I'm on a VPN.

JumpCrisscross1mo ago

> even when I'm logged in with my Pro account, if I'm using a VPN provider like Mullvad, I often have trouble using the chat interface or I get timeout errors

Heard from a founder who recently switched his company to Claude due to OpenAI's lagginess–it's absolutely an OpenAI problem. Not an AI problem in general.

vkou1mo ago

> Hey! I'm Nick, and I work on Integrity at OpenAI. These checks are part of how we protect our first-party products from abuse like bots, scraping, fraud, and other attempts to misuse the platform.

How can first-party products protect themselves from abuse by OpenAI's bots and scraping?

mystraline1mo ago

This is a completely in-scope question.

How do we defend against your scraping, OpenAI?

I don't want any of my content scraped or seen by you all. Frankly, fuck you all for thinking my content is owned by you.

CableNinja1mo ago

I use nginx conditionals and useragent checking, then respond with 418 or 410.

Probably too late now but my list needs updating
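The same idea, sketched in Python rather than nginx config (this is not the commenter's actual setup). The crawler tokens in `BLOCKED_AGENT_TOKENS` are an illustrative assumption; check the vendor's published crawler documentation for the current list. `status_for` is a hypothetical helper name.

```python
import re
from typing import Optional

# Illustrative list of crawler tokens (an assumption; keep it up to date yourself).
BLOCKED_AGENT_TOKENS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User")

_BLOCKED = re.compile(
    "|".join(re.escape(token) for token in BLOCKED_AGENT_TOKENS),
    re.IGNORECASE,
)


def status_for(user_agent: Optional[str]) -> int:
    """Return the HTTP status to send for a request with this User-Agent."""
    if user_agent and _BLOCKED.search(user_agent):
        return 410  # Gone; 418 would work the same way here
    return 200
```

Keep in mind that a User-Agent header is self-reported, so this only stops crawlers that identify themselves honestly.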

tedsanders1mo ago

It's documented here: https://developers.openai.com/api/docs/bots

wilg1mo ago

robots.txt bro https://developers.openai.com/api/docs/bots/
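A small sketch of what that looks like in practice, using Python's standard `urllib.robotparser` to check how a compliant crawler would read a robots.txt. The file contents and the `example.com` URL are made up for illustration, and the `GPTBot` token is assumed to be the one you want to block.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, url="https://example.com/post/1"):
    """Would a crawler that honors robots.txt fetch this URL?"""
    return parser.can_fetch(agent, url)
```

robots.txt is advisory: it only affects crawlers that choose to honor it.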

seba_dos11mo ago

noosphr1mo ago

>These checks are part of how we protect our first-party products from abuse like bots, scraping, fraud, and other attempts to misuse the platform.

Can you share these mitigations so we can mitigate against you?

0x3f1mo ago

It's just Cloudflare. Bypassing it is a whole industry.

zenethian1mo ago

I read the comment as “use it to mitigate against OpenAI bots scraping the web” and not to mitigate Cloudflare.

dawnerd1mo ago

Flaresolverr is one way. Isn’t perfect but bypasses a lot.

lm4111mo ago

"we protect our first-party products from abuse like bots, scraping, fraud, and other attempts to misuse the platform"

The scary part is that you don't even see the irony in writing this.

Or, are you just okay "misusing" everyone for your own benefit?

driverdan1mo ago

Brand new account with 2 comments in this thread. How can we be sure you're not a bot deployed to defend OpenAI?

Please run Cloudflare's privacy invasive tool and share all the values it generates here so we can determine if you're a real person.

mehov1mo ago

> because we want to keep free and logged-out access

But don't you run these checks on logged-in users too?

MyNameIsNickT1mo ago

lelanthran1mo ago

> The reason is basically the same: we want scarce compute going to real people, not attackers.

You are defining "Bots" and "Scrapers" as a subset of attackers, though.

Is this really fair? The value in your product came from people who wrote for other people, not bots, but your bot scraped them anyway.

My browser extension (see my previous reply on this story) automates the existing open tab I have to all the different chat AIs (GPT, Claude, Gemini, etc).

I suppose all you can do is rate-limit each user.
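A per-user rate limit could be sketched as a token bucket, shown below in Python. This is a generic illustration, not a description of what OpenAI does; the class name, default capacity, and refill rate are all assumptions.

```python
import time


class PerUserRateLimiter:
    """Token bucket per user: bursts up to `capacity`, then `refill_per_sec`."""

    def __init__(self, capacity=5, refill_per_sec=1.0, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._state = {}    # user_id -> (tokens, timestamp of last update)

    def allow(self, user_id):
        now = self.clock()
        tokens, last = self._state.get(user_id, (float(self.capacity), now))
        # Refill for the time elapsed since the last request, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_sec)
        if tokens >= 1.0:
            self._state[user_id] = (tokens - 1.0, now)
            return True
        self._state[user_id] = (tokens, now)
        return False
```

The limiter does not care whether a human or an extension sent the request; it only bounds how much compute any one account can draw.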

angoragoats1mo ago

Meanwhile, the rest of us (well, not me, because I don’t use your garbage product, but lots of others do) have to suffer and have our compute resources used up in the name of “protection.”

salawat1mo ago

More like "We want your money, but don't want to provide service." Are you sure OpenAI isn't morphing into a finance/insurance company?

jorvi1mo ago

c0_0p_1mo ago

Can't have those bots or scrapers running amok can we...

ghm21991mo ago

Would OpenAI also consider remunerating every site it has scraped that had a robots.txt file it chose to ignore anyway? Feel free to not answer this question.

timeinput1mo ago

pdntspa1mo ago

Y'all just salty that DeepSeek et al are training their LLMs on yours

dev1ycan1mo ago

"abuse like bots, scraping, fraud, and other attempts to misuse the platform"

This has to be a joke, right?

pera1mo ago

I really can't tell for sure (new user posting a ridiculously hypocritical corporate message on a Sunday) but if GP actually works for OpenAI the lack of self-awareness is seriously striking

singpolyma31mo ago

How?

conartist61mo ago

Still feels very anti-consumer.

If every company behaved like you do, the internet would be a much worse place.

In fact, OpenAI has already made the Internet a much worse place, already much, much less open and much less optimistic about its own future than it was even five years ago...

lm4111mo ago

"Integrity at OpenAI"

Basically an oxymoron at this point.

wiseowise1mo ago

> A big reason we invest in this is because we want to keep free and logged-out access available for more users.

Thank you for the reply, Nick. It wouldn’t be a problem to disable the tracking for authenticated users then, would it?

lloydatkinson1mo ago

It would because someone's KPI depends on number of tracked users lol

andrepd1mo ago

> OpenAI: These checks are part of how we protect products from abuse like bots, scraping, and other attempts to misuse the platform.

This would be fucking HILARIOUS if it wasn't so tragic.

rchaud1mo ago

Manifest destiny for me, border enforcement for thee.

lmz1mo ago

This kind of flawed thinking again. Like the natives didn't fight and lose wars against the manifest destiny types.

Chance-Device1mo ago

It can be both

witx1mo ago

> These checks are part of how we protect our first-party products from abuse like bots, scraping,

Do you guys see the irony here?

hosteur1mo ago

They obviously get it. They just do not care.

the_gipsy1mo ago

But is the title true, is typing specifically blocked? Or does it just block submitting the text?

I ask because I have seen huge variations in load time. Sometimes I had to wait seconds until being able to type. Nowadays it seems better though.

numlock861mo ago

> [...] we protect our first-party products from abuse like [...] scraping [...]

what an odd thing to say for someone whose product is built entirely on exactly that

huertouisj1mo ago

I assumed it was maybe some tokenization going on client side, but now I realize maybe it's some proof of work related to prompt length?

egorfine1mo ago

Paying customer since inception here.

I presume the local ChatGPT.app has even more measures to prevent automation, right? Presumably privacy-invasive ones as it is customary these days?

Is there a way I can opt out? I really, really, really don't like it.

radicality1mo ago

tipiirai1mo ago

I don't trust what OpenAI says. Sam Altman gives shivers, and these kinds of blog posts make things look even worse.

myHNAccount1231mo ago

xg151mo ago

> how we protect our first-party products from abuse like bots, scraping, fraud, and other attempts to misuse the platform.

Are you applying the same standards to your own scraper bots?

SilasX1mo ago

It has not been negligible for me, and, however you're doing this, there is significant room for improvement.

potsandpans1mo ago

Chatgpt banned me after I said disparaging things about Sam Altman in a chat.

When I appealed the ban, I was told that I couldn't be told exactly why I was banned, but if I wrote a written apology and "promised to never do it again" my ban could be appealed.

I asked for an update on the ban via email every month for over a year.

Maybe you could tell me a little bit about that process?

jgalt2121mo ago

> we protect our first-party products from abuse like bots, scraping, fraud, and other attempts to misuse the platform

Have you just described the dilemma facing all the content sites used to train LLMs?

leros1mo ago

freeopinion1mo ago

"We want to hook as many people as possible without letting in our competitors" is a pretty crummy thought to use as a public justification.

(Edited for typos.)

piskov1mo ago

Tangential question: are there chatgpt app devs on X? There are a few from Codex team but I couldn’t find guys from “ordinary” chatgpt.

Also if you could pass this over: it takes 5 taps to change thinking effort on ios and none (as in completely hidden) on macos.

If I were to guess it seems that you were trying to lower the token usage :-). Why the effort is only nicely available on web and windows is beyond me

xtajv1mo ago

Earnest question: if I was feeling lazy and security-conscious at the same time, would I be better off...

(A) opening chatgpt.com in qubes (but staying logged out, i.e. never creating a chatgpt account)

-or-

(B) creating a freemium chatgpt account

(Obviously, the "best" answer would be something like running a local LLM from an airgapped machine in a concrete bunker :) But that's not what I'm after).

20k1mo ago

>abuse like bots, scraping

10/10, I've got no notes

cheese_van1mo ago

Abuse from scraping has long been a serious problem for many, good job!

jesuslop1mo ago

invalidusernam31mo ago

But why block the ui until then? Surely you can just not make any requests until the checks are complete?
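To make the suggestion concrete, here is a rough Python `asyncio` sketch of "typing is never blocked, only the submit waits for the check". The `Composer` class and the 50 ms delay are invented for illustration; a real client would be JavaScript, and the thread elsewhere raises the counterargument that letting users type early weakens the bot signal.

```python
import asyncio


class Composer:
    """Input box model: drafts are always editable, submit waits for the check."""

    def __init__(self):
        self.draft = ""
        self.sent = []
        self.check_passed = asyncio.Event()

    def type(self, text):
        self.draft += text  # never gated on the integrity check

    async def submit(self):
        await self.check_passed.wait()  # only the outbound request is held back
        self.sent.append(self.draft)
        self.draft = ""


async def demo():
    composer = Composer()

    async def background_check():
        await asyncio.sleep(0.05)  # stand-in for the real check
        composer.check_passed.set()

    task = asyncio.create_task(background_check())
    composer.type("hello")
    typed_before_check = not composer.check_passed.is_set()
    await composer.submit()
    await task
    return typed_before_check, composer.sent
```

Running `asyncio.run(demo())` shows the draft being written before the check completes and sent only afterwards.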

rglullis1mo ago

gck11mo ago

I always wondered why you even have logged out access. I'm glad I can use ChatGPT in incognito when I want a "clean room" response, but surely that's not the primary use case.

Is the user base that never logs in really that significant?

pocksuppet1mo ago

This episode proves they know who you are, even when you're logged out. If they didn't know, they wouldn't let you use the service.

aucisson_masque1mo ago

pocksuppet1mo ago

Because they want to make it as hard as possible to reverse engineer. If they wanted it to be easy, they'd use <input type="checkbox" name="ishuman">I am a human

toddmorey1mo ago

Why are all these checks still performed on an authenticated, paid user?

JumpCrisscross1mo ago

> we want to keep free and logged-out access available for more users

How does this comport with OpenAI's new B2B-first strategy?

> We also keep a very close eye on the user impact

Are paid or logged-in users also penalised?

account421mo ago

> These checks are part of how we protect our first-party products from abuse like bots, scraping, fraud, and other attempts to misuse the platform.

The lack of self awareness...

prmoustache1mo ago

> we protect our first-party products from abuse like bots, scraping, fraud, and other attempts to misuse the platform.

Isn't that how you build your service from the very start? How ironic.

diebillionaires1mo ago

As a free tier user I only get like three queries in now without model quality reduction, so I'd say your bases are covered as far as GPU costs around misuse.

subscribed1mo ago

> "abuse like bots, scraping"

You what, mate? Would you please use that on yourselves first? Because it comes off as a GROSS hypocrisy. State of the art hypocrisy.

>> behavioral biometric layer

But this one, especially, takes the cake.

Quite disgusting.

kelnos1mo ago

> A big reason we invest in this is because we want to keep free and logged-out access available for more users.

Are these checks disabled for logged-in, paid users?

sourcecodeplz1mo ago

I really appreciate the free options, without even needing a login. Wish they would also keep the small free weekly allowance for Codex.

user39393821mo ago

Have you given any thought to what we trade when big tech elects one corporation as the gatekeeper for vast swaths of the Internet?

nicbou1mo ago

For what it's worth, I switched to Gemini because of the long ChatGPT load time. Gemini loads as fast as Google Search.

owebmaster1mo ago

The reason why you did it is clear; why you guys settled for such a poor implementation is why this thread exists.

mghackerlady1mo ago

No, leave it. Surely the mighty OpenAI can deal with the scraping. At least, it seems to think everyone else can

Razengan1mo ago

> we want to keep free and logged-out access available for more users.

And THANK YOU for that!

Being able to use ChatGPT and Grok without signing in is a big part of why I like those services over Gemini etc.

arendtio1mo ago

Do you do those checks only for users without accounts or also for those with accounts?

gmerc1mo ago

the company that scrapes everything until it collapses really needs to protect itself from scraping. Lol.

AndrewKemendo1mo ago

Kudos for trying

This whole thread was like watching a swarm of ants try and take a grasshopper down

SubiculumCode1mo ago

In long threads in chatgpt, it grinds to a halt in both Chrome and Firefox. Please fix

htx80nerd1mo ago

Thanks. I've used ChatGPT a million times and never had any input issues.

matheusmoreira1mo ago

> protect our first-party products from abuse like bots, scraping

You do see the irony here?

marxisttemp1mo ago

History will not be kind to you and your ilk. Quit your job.

tekawade1mo ago

Hey Nick, I find it concerning that this account was created just to comment on this thread, and that it never even replies to any of the real concerns.

Here's to hoping this is a real person who actually created the account out of concern and to share.

lifis1mo ago

Are you disabling them for paying subscribers?

crest1mo ago

Then make sure they only target the free tier!

sandeepkd1mo ago

0dayman1mo ago

Hi Nick, your software is a horrendous encroachment on users' privacy and its quality is subpar to those of us who know what we're working with. We don't use your product here.

chronc63931mo ago

> Hi Nick, your software is a horrendous encroachment on users' privacy and its quality is subpar to those of us who know what we're working with. We don't use your product here.

It’s ok, OpenAI is cooked.

Feel bad for anyone who joined OAI in the past 12 months. Their RSU ain’t going to be worth much later this year. IPO is too late.

quotemstr1mo ago

We really need ZKPs of humanity

ctoth1mo ago

No, we really don't. We don't need worldcoin, we don't need papers, please. We just don't.

"Prove your humanity/age/other properties" with this mechanism quickly goes places you do not want it to go.

quotemstr1mo ago

So, please, can we stop matching every scheme whatsoever for verifying facts about actors to the East German villain in a cold war movie? We're talking about something totally different.

Muromec1mo ago

> quickly goes places you do not want it to go.

Which places?

gzread1mo ago

Sure. I'll provide an API to provide mine to your bot for $1 each time.

ryanmcbride1mo ago

Protecting your site from bots and scraping is absolutely hilarious considering how you acquired (read: stole) the data you trained your bot on dude.

Just yank that ladder up behind you.

pocksuppet1mo ago

> Just yank that ladder up behind you.

You would be an irresponsible entrepreneur if you didn't. Don't forget your legal obligation to maximise shareholder value.

blactuary1mo ago

> I work on Integrity at OpenAI

Irony is truly dead. Show you have integrity by quitting your job

MisterTea1mo ago

> These checks are part of how we protect our first-party products from abuse like bots, scraping, fraud, and other attempts to misuse the platform.

Isn't this the same behavior used by AI companies to gather training data? Pot, meet kettle.

thegreatpeter1mo ago

You’re doing gods work sir, thank you!

wackget1mo ago

If you'd like, I can write a two-sentence paragraph to send to your colleagues. It contains a special phrase which most colleagues will find difficult to ignore. Would you like me to do that?

rsrsrs861mo ago

Hi Nick, do you believe what you say? You scraped the shit out of everyone

tomalbrc1mo ago

Fake Account

nickphx1mo ago

the irony of your statement is hilarious, disappointing, and infuriating.

boesboes1mo ago

lol, hypocrites.

lxgr1mo ago

lucasfin0001mo ago

gruez1mo ago

>The real frustrating part is that Cloudflare's "definition" of suspicious keeps changing and expanding.

That's... exactly expected? It's a cat and mouse game. People running botnets or AI scrapers aren't diligently setting the evil bit on their packets.

lxgr1mo ago

But hey, at least some bots are also not making it past Cloudflare!

Gormo1mo ago

jagged-chisel1mo ago

That’s obviously because they’re not being “evil”

Aurornis1mo ago

> The people most likely to get caught by these systems are exactly the ones who care most about their privacy, and not the bots that they are apparently targeting.

In my brief experience with abuse mitigation, connections coming from VPNs or unusual IP ranges were very significantly more likely to be associated with abuse.

The worst offenders are Tor IP addresses. Anyone connecting from Tor was basically guaranteed to have bad intentions.

It’s tough for people who want these things for privacy, but the sad reality is that these same privacy protections are favored by people who are trying to abuse services.

whatisthiseven1mo ago

traceroute661mo ago

> Which VPNs are people using that actually care about the user's privacy?

Mullvad.

It has been proven in a court of law that when Mullvad says "no logging", they mean it.

They also regularly have security audits and publish the results[2][3]

evilduck1mo ago

Using any popular datacenter's IP range for a personal VPN is likely to be outright blocked.

lxgr1mo ago

Imustaskforhelp1mo ago

ProtonVPN with bitcoin which you get from a monero swap is a good idea for complete privacy if you want port forwarding.

MullvadVPN is also another great one.

There are others, but mostly it's the 2-3 that I trust.

gruez1mo ago

>Most of them don't, sell their home IP to buyers, sell their DNS history to others, etc. Worse, some of them could require invasive MITM cert stuff most users will just click yes through.

ymolodtsov1mo ago

Yes, using an incognito window is more than enough to kick off their checks.

ehnto1mo ago

lxgr1mo ago

That's my favorite combination: Shitty bot detection meeting shitty payment security systems.

amatecha1mo ago

lxgr1mo ago

girvo1mo ago

Surprising really, because I'm a Firefox + Ublock Origin die hard and I never get Cloudflare captchas. Wonder what the difference is? I have CGNAT turned off, if that matters at all (probably not).

lxgr1mo ago

I could definitely imagine a public IPv4 with lots of good, logged-in Cloudflare traffic to act as a positive signal for their heuristics, possibly even overriding the Firefox penalty.

danielheath1mo ago

Maybe check your network isn't sending web traffic you're not aware of?

I'm running firefox and seeing the normal amount.

jychang1mo ago

Most people are on a CGNAT these days, drowning in captchas is the new normal. You’re at the mercy of one of your neighbors not hosting a botnet from their home computer.

perching_aix1mo ago

tokioyoyo1mo ago

cogman101mo ago

Every so often, usually after a Firefox update, CF will get into an "I'm convinced you're a bot" mode with me. I can get out of it by solving 20 CAPTCHAs.

hansvm1mo ago

g-b-r1mo ago

Maybe you allow tracking and cookies?

Eji17001mo ago

I don't, and I rarely have issues with Firefox. Private + blockers + VPN causes issues, as expected, but otherwise I'm usually fine?

onion2k1mo ago

Is that because botnets spoof being Firefox? It's not really fair to blame Cloudflare if it is. That's on the bots.

doctaj1mo ago

In what way would that not be fair? Their product giving false positives (unnecessary challenges for a normal browser humans commonly use) to real people is definitely their fault.

eks3911mo ago

gruez1mo ago

>Their product giving false positives (unnecessary challenges for a normal browser humans commonly use) to real people is definitely their fault.

Is it TSA's "fault" that non-terrorists are subject to screening?

lxgr1mo ago

No, using a stupid authentication/verification method with lots of false positives is always on whoever deploys it.

Imagine an apartment building with a flimsy front door lock that breaks all the time, and the landlord only telling you that that can't be helped because of all the burglars.

josephcsible1mo ago

If it's just as easy to spoof being Chrome as it is to spoof being Firefox, then it is indeed fair to blame Cloudflare if they give Firefox users more CAPTCHAs than Chrome users.

conradkay1mo ago

Not really, there's camoufox but the vast majority use modified chrome/chromium

binaryturtle1mo ago

I'm on a slightly older Firefox and can't use many websites at all anymore because of the Cloudflare cancer.

Of course then you've got sites like gnu.org too that block you because of your slightly outdated user agent.

mghackerlady1mo ago

geysersam1mo ago

I use firefox daily and I don't encounter the problems you describe, might be worth looking if there's some other issue.

mghackerlady1mo ago

Heaven forbid you not use JavaScript, then they can't <s>track you</s> keep the internet safe!

lm4111mo ago

That's not Cloudflare trying to make your life hard.

It's the reality of how bad the bots have become.

dawnerd1mo ago

I’ve been getting it in Safari too. It’s ridiculous frankly. My residential IP must have been flagged or something. The part that’s really annoying is it's trivial for bots to bypass.

lxgr1mo ago

> I’ve been getting it in safari too.

I'm getting it on iCloud Private Relay all the time. It honestly makes it kind of useless.

> The part that’s really annoying is it's trivial for bots to bypass.

Not the ethical bots, though! My GPT-backed Openclaw staunchly refuses to go anywhere near a "I'm not a robot" button.

gzread1mo ago

segmondy1mo ago

tshaddox1mo ago

Is anyone talking about the fact that this is a fundamental design flaw of the web? Or arguably even the entire Internet?

3form1mo ago

It's hard to call something a "fundamental flaw of web" if it wasn't an issue for 30 years. Unless you mean something more general that I'm missing.

tshaddox1mo ago

Arguably it didn’t see widespread commercial adoption for 30 years, and you wouldn’t expect fundamental design flaws regarding commercial incentives to manifest before that.

fastball1mo ago

Cloudflare isn't providing Turnstile as a service in a vacuum, this is a direct response to bad actors who can trivially abuse the web.

pixl971mo ago

lazycouchpotato1mo ago

At times I'm completely locked out of a website and Cloudflare asks me to email the website owner to get the issue resolved.

.. how do they expect me to find the website owner's email if I can't access said website?

wongarsu1mo ago

amatecha1mo ago

jgalt2121mo ago

is it Linux (or similar)?

is it Firefox?

If yes, to one or both, you're blocked! Clearly millions of dollars of engineering talent and petabytes of data collection should be able to come up with something more nuanced than this.

dheera1mo ago

Exactly. For the most part all this bot protection is only protecting these websites against humans.

I don't do free work. I'm not going to label 50 images of crosswalks and motorcycles for free.

ronbenton1mo ago

> For the most part all this bot protection is only protecting these websites against humans.

Curious how do you know this?

lukewarm7071mo ago

sometimes when there is mafia you get no option but pay pizzo

hence i am just using cloudflare remote browser rendering.

EGreg1mo ago

Well, that's for the public internet.

https://github.com/Safebots/Safecloud

password43211mo ago

Wow, if Seinfeld can have a soup nazi, I think it's within reason for you to be called the internet nazi.

"No s̶o̶u̶p̶ internet for you!"

Good luck!

ale421mo ago

This was sarcasm, right?

EGreg1mo ago

simonw1mo ago

Presumably this is all because OpenAI offers free ChatGPT to logged-out users and doesn't want that being abused as a free API endpoint.

NotPractical1mo ago

But do they do it whether you're logged in or not?

Gander57391mo ago

Because accounts are free, and could still be used to abuse as a free endpoint, with a little trickiness.

gzread1mo ago

Don't you need a Google account and to get a Google account you need a phone number?

"You're posting too fast! Please slow down."

appreciatorBus1mo ago

Yup.

gruez1mo ago

>I assumed it wouldn’t work, but to my surprise with no account, no cookies of any kind it took my query and gave me an answer.

They allowed anonymous requests for months now, maybe even a year.

solaire_oa1mo ago

Yeah, additionally gemini.google.com is also free unauthenticated, which I've been using for a very long time (a year?). Why this is being treated as news is confusing.

aziaziazi1mo ago

FergusArgyll1mo ago

Yeah it works but it's a dumber model. Prob mini

lelandfe1mo ago

You get a couple requests in at a smarter model and then it prompts you to sign up, and from there uses an extremely dumb model.

bredren1mo ago

It is also intended to protect the usage patterns of pro subscribers.

As has been amply explained, the API pricing per token is far more for equivalent use when maximizing a subscription plan.

It isn’t really a massive hurdle to deal with this full SPA load check. If one is even aware it exists they already have the skills to bypass it anyway.

I get why people would “what about” the automation inherent in what OpenAI is doing but that is a separate matter.

Other businesses and applications can put into place their own hurdles and anti bot practices to protect the models they’ve leaned into—-and they have been.

darepublic1mo ago

Using 5.2 at 20 a month would also be a steal. Other shoe will drop on codex sooner or later

thisisnow1mo ago

It's probably the same for copilot.microsoft.com and their cloudfart usage

petcat1mo ago

> This is bot detection at the application layer, not the browser layer.

iancarroll1mo ago

It’s pretty interesting to me that Cloudflare is collecting additional client-side data for individual customers. This is not widely done by most anti-bot solutions.

supriyo-biswas1mo ago

OpenAI is on an enterprise plan and (presumably) gets a customized version of Turnstile.

red_admiral1mo ago

Chance-Device1mo ago

raincole1mo ago

Why does every article need a 'punchline'? It's a technical analysis. Do you expect punchlines when you read recipes or source code?

Chance-Device1mo ago

dmos621mo ago

For me the interesting parts of the article are how the author got to the decompiled checks and what the checks are. Anti-bot is an interesting space.

elwebmaster1mo ago

That's because the article is AI slop.

londons_explore1mo ago

I just don't understand why bot owners can't just run a complete windows 11 VM running Google Chrome complete with graphics acceleration.

jaccola1mo ago

There are myriad providers competing to offer this, nicely packaged with all the accoutrements (IP rotation, location spoofing, language settings, prebuilt parsers, etc.) behind an easy to use API.

Honestly it is a very healthy competitive market with reasonably low switching costs which drives prices down. These circumstances make rolling your own a tough sell.

arcfour1mo ago

huertouisj1mo ago

there are scraping subreddits.

if you browse them you will see that bot writers are very annoyed if they can't scrape a site with a headless browser.

you can do what you suggested, but with Linux VMs/containers. windows is too heavy, each VM will cost you 4 GB of RAM

londons_explore1mo ago

The reason to use windows is that anti bot tech is going to be a lot stricter if Linux is detected...

xmcp1231mo ago

I’m in those. xvfb and headless=false still works great

poly2it1mo ago

If you know of a simple way to run a Windows 11 VM with good graphics acceleration (no GPU passthrough), please contact me.

MarioMan1mo ago

More info here:

https://web.archive.org/web/20231107182321/https://mu0.cc/20...

https://youtu.be/XLLcc29EZ_8?t=570

https://github.com/jamesstringer90/Easy-GPU-PV

deltoidmaximus1mo ago

Wouldn't virtualbox or vmware's paravirtual GPUs be a better fit for this use case? Unfortunately the offerings with qemu/libvirt still lag vmwares by a lot.

himata41131mo ago

284 on 296gb of ram with deduplication enabled on a 128c with 32Q vgpu.

YetAnotherNick1mo ago

I am reasonably sure that these kinds of fingerprints can detect if the browser is inside a VM.

kristjansson1mo ago

… yup?

I mean you missed the minigame of preventing Chrome from signaling that it’s being programmatically (webdriver etc) driven and tipping your hand, but … yup?

technion1mo ago

To prompt a discussion that's purely technical: I'm interested in how this was done.

Did OpenAI somehow do something on their own API that uses data from Turnstile?

XYen0n1mo ago

Cloudflare should be able to determine whether a website uses React by analyzing data flowing through its CDN.

technion1mo ago

Whilst true, "validate the right state is loaded" would surely be something not done without developer input.

ripbozo1mo ago

and chatgpt was then used to write this article. at least try to clean it up a bit

hx81mo ago

Ah yes, the timeless hallmark of web blogs: a draft so messy even a language model would ask for a second pass.

i18nagentai1mo ago

refulgentis1mo ago

[1] L* as in L*a*b*, not L in Oklab

tommodev1mo ago

Ah, this explains chatgpt (and probably copilot) performance behind corporate firewalls such as zscaler.

Between the network latency and low-end machines, there is an enormous lag between ChatGPT's response and being able to reply, especially when editing a canvas.

I've been sitting there for up to a minute plus waiting to be able to use the canvas controls or highlight text after an update.

bredren1mo ago

On a related note, ChatGPT.com changed how it handles large text pastes this past week.

It now behaves like Claude, attaching the paste as a file for upload rather than inlining it.

This changes the page UX somewhat and reduces the cost of the browser tab a bit.

At some point, maybe still true, very long conversations ~froze/crashed ChatGPT pages.

NSPG9111mo ago

[1]: https://github.com/xcanwin/keepchatgpt

qingcharles1mo ago

natdempk1mo ago

Does anyone know how this is integrated on the Cloudflare side and across the app? Is this beyond standard turnstile? Is this custom/enterprise functionality? Something else?

croemer1mo ago

When using ChatGPT Android app with some NextDNS block lists, I get an error modal in app saying "security misconfiguration blah blah".

Clearly I'm blocking some tracker and it's upset about that. I allowlisted a sentry subdomain and since then got no more complaints.

tosh1mo ago

It used to be possible to type immediately while the page is loading and have all key presses end up in the input field.

Why run this check before user can type?

Why not run it later like before the message gets sent to the server?

tripdout1mo ago

AI-written article?

CorneredCoroner1mo ago

> A headless browser that loads the HTML but doesn't execute the JavaScript bundle won't have them.

this is meaningless btw. A browser headless or not does execute javascript.

jaccola1mo ago

I disagree, a browser can have javascript execution disabled (and this is somewhat common in scraping to save time/resources).

I read it to mean: "A browser that doesn't execute the JavaScript bundle won't have [the rendered React elements]." Which is true.
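To illustrate that reading, here is a rough sketch, assuming two made-up HTML snapshots of a client-rendered page (the markup, the `root` id, and the helper names are hypothetical, not taken from ChatGPT). Without executing the bundle, the root element stays empty, so there are no rendered React elements to inspect.

```python
from html.parser import HTMLParser

# Hypothetical server response: the root stays empty until the bundle runs.
STATIC_HTML = (
    '<html><body><div id="root"></div>'
    '<script src="/bundle.js"></script></body></html>'
)
# Hypothetical DOM snapshot after the bundle has executed and rendered.
RENDERED_HTML = (
    '<html><body><div id="root"><main><p>Hello</p></main></div></body></html>'
)


class RootChildCounter(HTMLParser):
    """Counts elements nested inside <div id="root">.

    Simplification: void elements such as <br> are not handled.
    """

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0      # nesting depth inside the root; 0 means outside
        self.children = 0   # elements seen inside the root

    def handle_starttag(self, tag, attrs):
        if self.depth > 0:
            self.children += 1
            self.depth += 1
        elif tag == "div" and ("id", "root") in attrs:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth > 0:
            self.depth -= 1


def count_root_children(html: str) -> int:
    parser = RootChildCounter()
    parser.feed(html)
    parser.close()
    return parser.children
```

With these snapshots, `count_root_children(STATIC_HTML)` finds nothing under the root, while the rendered snapshot contains the `main` and `p` elements.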

maxwellg1mo ago

Wouldn't a browser that doesn't execute JS also not execute the browser fingerprinting code in the first place?

XYen0n1mo ago

If JavaScript is disabled, why use a headless browser instead of making HTTP requests directly?

girvo1mo ago

A bunch of the points in this AI generated blog post were like that. Makes me feel dirty when I'm 1/3rd of the way through and I realise how off it is.

thisisnow1mo ago

Hah, sure, you just let random JS execute from random sites on your machine...

lightedman1mo ago

Preventing me from typing until you SCAN MY SYSTEM?

Fine, by extension, you agree I can scan all of your systems for whatever I desire. This works both ways.

tristor1mo ago

TimLeland1mo ago

It seems they fixed the biggest issue I've had, where you start typing and then it erases the content once the page fully loads.

EGreg1mo ago

Why does ChatGPT slow down so much when the conversations get long, while Claude does compaction?

themafia1mo ago

Every provider seems to have been plagued by these freeloaders to such an extent that they've had to develop extreme and onerous countermeasures just to avoid losing their shirts.

What's the word? Schadenfreude?

jtbayly1mo ago

Others here are asking if this is the cause of slow performance in a long chat.

But it seems clear to me that this is why I can't start typing right away when I first load the page and click to focus in the text field.

darepublic1mo ago

I imagine it's to stop web automation from getting free API-like use of the model.

dsparkman1mo ago

That explains why ChatGPT has been running like shit all weekend. In the desktop app on Mac, it could not even complete a response. On the web, it would hang before you could input anything.

heliumtera1mo ago

I am shocked OpenAI collects data about its users before users have the opportunity to send the same data to OpenAI servers!

pautasso1mo ago

AI goes to great lengths to ensure it's talking with humans.

Why would two AI bots want to chat with each other?

edg50001mo ago

The chat client has serious performance issues on lower end systems. Now I see why!

self-portrait1mo ago

A/B testing /dev/ kit that tokenizes four permutations of language

apsurd1mo ago

Haven't read yet but instantly matched with my experience of the chat being unusable at times. The latency and glitch-like feel is unbearable.

aslihana1mo ago

I mean, I can easily understand them behaving defensively to avoid being abused. But I'm on an MBP with an M5 here, and my ChatGPT tab always gets stuck when I submit a prompt.

Really, really bad user experience. I wonder when they will drop this approach.

arcfour1mo ago

...I don't think that's possible even if you are a bot? I would be very surprised if OAI had their origin exposed to the internet. What is a "non-Cloudflare proxy"? Is this AI slop?

It's likely just looking at the CF properties as part of a bot scoring metric (e.g. many users from this ASN or that geoip to this specific city exhibit abusive patterns).
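A rough sketch of what such a scoring metric could look like, assuming made-up abuse rates and weights and a hypothetical `EdgeProps` record; nothing here reflects how OpenAI or Cloudflare actually score requests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeProps:
    """Hypothetical subset of edge-provided request properties."""
    asn: int
    country: str
    city: str


# Made-up historical abuse rates; purely illustrative values.
# ASN 64500 is from the range reserved for documentation.
ASN_ABUSE_RATE = {64500: 0.8}
CITY_ABUSE_RATE = {("ZZ", "Exampleville"): 0.6}


def bot_score(props: EdgeProps,
              asn_weight: float = 0.6,
              city_weight: float = 0.4) -> float:
    """Weighted sum of per-ASN and per-city abuse rates, in [0, 1]."""
    asn_rate = ASN_ABUSE_RATE.get(props.asn, 0.0)
    city_rate = CITY_ABUSE_RATE.get((props.country, props.city), 0.0)
    return asn_weight * asn_rate + city_weight * city_rate


suspicious = bot_score(EdgeProps(asn=64500, country="ZZ", city="Exampleville"))
unknown = bot_score(EdgeProps(asn=64501, country="ZZ", city="Elsewhere"))
```

In a setup like this the properties feed a score used alongside other signals, rather than acting as a hard block on their own.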

j451mo ago

This is a lot of fingerprinting.

AndreyK19841mo ago

CamuFox will fix it easy peasy.

seker181mo ago

How can I access a cell phone

aucisson_masque1mo ago

Mistral chat is also free to use without account and doesn't do that.

tom-blk1mo ago

Wild insight

gobdovan1mo ago

Imagine if they'd put as much effort into making a decent frontend experience.

littlecranky671mo ago

How is this fingerprinting even GDPR compliant? Fingerprinting + profiling need consent, and the service must work without tracking+profiling consent.

yapyap1mo ago

wow, OpenAI sure doesn't like bots for a company enabling the botification of the world wide web

baggachipz1mo ago

"We wouldn't want somebody scraping our data, that's ours!"

Josephjackjrob11mo ago

Cloudflare will not be around for long, it's a shame as it is the GOAT lol

avazhi1mo ago

Another AI-slop article.

Sick.

pencilcode1mo ago

ai slop analysis finding CF detects non javascript capable browsers with no punchline

blinkbat1mo ago

Ok... so... ?

beering1mo ago

So are you able to get free inference now that you decrypted this?

superkuh1mo ago

gruez1mo ago

>My best guess is -- ChatGPT is running something in your browser to try to determine the best things to send down to the model API

There's no way this is worth it unless the models are absolutely tiny, in which case any benefits from offloading to the client are marginal and probably aren't worth the engineering effort.

danny_codes1mo ago

It’s free as a loss leader. The trick is to upsell later. Unfortunately for OpenAI there are plenty of competitors with fungible products, so it might be hard to pull a classic monopoly rug-pull.

beering1mo ago

They already see everything I’m doing because I send my prompts to them. What “workaround” are you referring to?

superkuh1mo ago

And as for "but chatgpt isn't paid" (another commenter), well, then yes, that's even closer to free by removing this spying on your computer setup. But they spy on the paid users too.

voxic111mo ago

But isn't ChatGPT access free through the browser? What do you mean already paid in currency?

pocksuppet1mo ago

If you want to send more than a few prompts each day, you have to pay. With currency.

dgb231mo ago

Why are companies like OpenAI and others that are all-in on LLMs still using ReactJS, Python and so on?

These programming languages and frameworks were made for developer convenience and got wide adoption, because it makes on-boarding easier.

This obviously comes at a cost of performance, complexity and introduces a liability into a system, because they are dependencies that come with a whole bunch of assumptions about how they are used.

Is this tradeoff even worth it anymore?

robmccoll1mo ago

Probably training data. The largest number of public repos are built on that stack. We recently picked React for new projects because LLMs seemed to be the most reliable when writing React code.
