Cloudlflare builds OAuth with Claude and publishes all the prompts (opens in new tab)

(github.com)

889 pointsgregorywegory9mo ago529 comments

529 comments

The commits are revealing.

Look at this one:

> Ask Claude to remove the "backup" encryption key. Clearly it is still important to security-review Claude's code!

> prompt: I noticed you are storing a "backup" of the encryption key as `encryptionKeyJwk`. Doesn't this backup defeat the end-to-end encryption, because the key is available in the grant record without needing any token to unwrap it?

I don’t think a non-expert would even know what this means, let alone spot the issue and direct the model to fix it.

victorbjorklund9mo ago

That is how LLM:s should be used today. An expert prompts it and checks the code. Still saves a lot of time vs typing everything from scratch. Just the other day I was working on a prototype and let claude write code for a auth flow. Everything was good until the last step where it was just sending the user id as a string with the valid token. So if you got a valid token you could just pass in any user id and become that user. Still saved me a lot of time vs doing it from scratch.

Vinnl9mo ago

At least for me, I'm fairly sure that I'm better at not adding security flaws to my code (which I'm already not perfect at!) than I am at spotting them in code that I didn't write, unfortunately.

1 more reply

otabdeveloper49mo ago

> Still saves a lot of time vs typing everything from scratch

No it doesn't. Typing speed is never the bottleneck for an expert.

As an offline database of Google-tier knowledge, LLM's are useful. Though current LLM tech is half-baked, we need:

a) Cheap commodity hardware for running your own models locally. (And by "locally" I mean separate dedicated devices, not something that fights over your desktop's or laptop's resources.)

b) Standard bulletproof ways to fine-tune models on your own data. (Inference is already there mostly with things like llama.cpp, finetuning isn't.)

3 more replies

XCSme9mo ago

> Still saves a lot of time vs typing everything from scratch.

In my experience, it takes longer to debug/instruct the LLM than to write it from scratch.

1 more reply

zx80809mo ago

> An expert prompts it and checks the code. Still saves a lot of time vs typing everything from scratch.

It's a lie. The marketing one, to be specific, which makes it even worse.

1 more reply

noone_youknow9mo ago

For me, it’s not the typing - it’s the understanding. If I’m typing code, I have a mental model already or am building one as I type, whereas if I have an LLM generate the code then it’s “somebody else’s code” and I have to take the time to understand it anyway in order to usefully review it. Given that’s the case, I find it’s often quicker for me to just key the code myself, and come away with a better intuition for how it works at the end.

827a9mo ago

I tend to disagree, but I don't know what my disagreement means for the future of being able to use AI when writing software. This workers-oauth-provider project is 1200 lines of code. An expert should be able to write that on the scale of an hour.

The main value I've gotten out of AI writing software comes from the two extremes; not from the middle-ground you present. Vibe coding can be great and seriously productive; but if I have to check it or manually maintain it in nearly any capacity more complicated than changing one string, productivity plummets. Conversely; delegating highly complex, isolated function writing to an AI can also be super productive, because it can (sometimes) showcase intelligence beyond mine and arrive at solutions which would take me 10x longer; but definitionally I am not the right person to check its code output; outside of maybe writing some unit tests for it (a third thing AI tends to be quite good at)

2 more replies

0points9mo ago

I really don't agree with the idea that expert time would just be spent typing, and I'd be really surprised if that's the common sentiment around here.

An expert reasons, plans ahead, thinks and reasons a little bit more before even thinking about writing code.

If you are measuring productivity by lines of code per hour then you don't understand what being a dev is.

2 more replies

dismalaf9mo ago

> Still saves a lot of time vs typing everything from scratch

Probably very language specific. I use a lot of Ruby, typing things takes no time it's so terse. Instead I get to spend 95% of my time pondering my problems (or prompting the LLM)...

2 more replies

signa119mo ago

> ... Still saves a lot of time vs typing everything from scratch ...

how ? the prompts have still to be typed right ? and then the output examined in earnest.

3 more replies

blinded9mo ago

Sure! But over half the fun of coding is writing and learning.

i5heu9mo ago

Revealing against what?

If you look at the README it is completely revealed... so i would argue there is nothing to "reveal" in the first place.

> I started this project on a lark, fully expecting the AI to produce terrible code for me to laugh at. And then, uh... the code actually looked pretty good. Not perfect, but I just told the AI to fix things, and it did. I was shocked.

> To emphasize, this is not "vibe coded". Every line was thoroughly reviewed and cross-referenced with relevant RFCs, by security experts with previous experience with those RFCs.

JW_000009mo ago

I think OP meant "revealing" as in "enlightening", not as "uncovering something that was hidden intentionally".

rienbdj9mo ago

> Revealing against what?

Revealing of what it is like working with an LLM in this way.

kortilla9mo ago

Revealing the types of critical mistakes LLMs make. In particular someone that didn’t already understand OAuth likely would not have caught this and ended up with a vulnerable system.

risyachka9mo ago

If the guy knew how to properly implement oauth - did he save any time though by prompting or just tried to prove a point that if you actually already know all details of impl you can guide llm to do it?

Thats the biggest issue I see. In most cases I don't use llm because DIYing it takes less time than prompting/waiting/checking every line.

3 more replies

throwaway20379mo ago

While I think this is a cool (public) experiment by Claude, asking an LLM to write security-sensitive code seems crazy at this point. Ad absurdum: Can you imagine asking Claude to implement new functionality in OpenSSL libs!?

PeterStuer9mo ago

Which is exactly why AI coding assistants work with your expertise rather than replace it. Most people I see fail at AI assisted development are either non-technical people expecting the AI will solve it all, or technical people playing gotcha with the machine rather than collaborating with it.

bootsmann9mo ago

There is also one quite early in the repo where the dev has to tell Claude to store only the hashes of secrets

kentonv9mo ago

Yeah I was disappointed in that one.

I hate to say, though, but I have reviewed a lot of human code in my time, and I've definitely caught many humans making similar-magnitude mistakes. :/

hn_throwaway_999mo ago

I just wanted to say thanks so much publishing this, and especially your comments here - I found them really helpful and insightful. I think it's interesting (though not unexpected) that many of the other commenters' comments here show what a Rorschach test this is. I think that's kind of unfortunate, because your experience clearly showed some of the benefits and limitations/pitfalls of coding like this in an objective manner.

I am curious, did you find the work of reviewing Claude's output more mentally tiring/draining than writing it yourself? Like some other folks mentioned, I generally find reviewing code more mentally tiring than writing it, but I get a lot of personal satisfaction by mentoring junior developers and collaborating with my (human) colleagues (most of them anyway...) Since I don't get that feeling when reviewing AI code, I find it more draining. I'm curious how you felt reviewing this code.

1 more reply

jjcm9mo ago

Most interesting aspect of this is it likely learned this pattern from human-written code!

1 more reply

jofzar9mo ago

I know I'm preaching to the masses here, but isn't this why PR are so important?

bananapub9mo ago

this seems like a true but pointless observation? if you're producing security-sensitive code then experts need to be involved, whether that's me unwisely getting a junior to do something, or receiving a PR from my cat, or using an LLM.

removing expert humans from the loop is the deeply stupid thing the Tech Elite Who Want To Crush Their Own Workforces / former-NFT fanboys keep pushing, just letting an LLM generate code for a human to review then send out for more review is really pretty boring and already very effective for simple to medium-hard things.

toofy9mo ago

> …removing expert humans from the loop is the deeply stupid thing the Tech Elite Who Want To Crush Their Own Workforce…

this is completely expected behavior by them. departments with well paid experts will be one of the first they’ll want to cut. in every field. experts cost money.

we’re a long, long, long way off from a bot that can go into random houses and fix under the sink plumbing, or diagnose and then fix an electrical socket. however, those who do most of their work on a computer, they’re pretty close to a point where they can cut these departments.

in every industry in every field, those will be jobs cut first. move fast and break things.

hn_throwaway_999mo ago

I think it's a critically important observation.

I thought this experience was so helpful as it gave an objective, evidence-based sample on both the pros and cons of AI-assisted coding, where so many of the loudest voices on this topic are so one-sided ("AI is useless" or "developers will be obsolete in a year"). You say "removing expert humans from the loop is the deeply stupid thing the Tech Elite Who Want To Crush Their Own Workforces / former-NFT fanboys keep pushing", but the fact is many people with the power to push AI onto their workers are going to be more receptive to actual data and evidence than developers just complaining that AI is stupid.

ActionHank9mo ago

But AIbros will be running around telling everyone that Claude invented OAuth for Cloudflare all on its own and then opensourced it.

october81409mo ago

It's a Jr Developer that you have to check all their code over. To some people that is useful. But you're still going to have to train Jr Developers so they can turn into Sr Developers.

PeterStuer9mo ago

I don't like the jr dev analogy. It neither has the same weaknesses nor the same strenghts.

It's more like the genious coworker that has an overassertive ego and sometimes shows up drunk, but if you know how to work with them and see past their flaws, can be a real asset.

1 more reply

Cthulhu_9mo ago

I don't really agree; a junior developer, if they're curious enough, wouldn't just write insecure code, they would do self-study and find out best practices etc before writing code, including not storing plaintext passwords and the like.

1 more reply

paxys9mo ago

This is exactly the direction I expect AI-assisted coding to go in. Not software engineers being kicked out and some business person pressing a few buttons to have a fully functional app (as is playing out in a lot of fantasies on LinkedIn & X), but rather experienced engineers using AI to generate bits of code and then meticulously reviewing and testing them.

The million dollar (perhaps literally) question is – could @kentonv have written this library quicker by himself without any AI help?

kentonv9mo ago

It took me a few days to build the library with AI.

I estimate it would have taken a few weeks, maybe months to write by hand.

That said, this is a pretty ideal use case: implementing a well-known standard on a well-known platform with a clear API spec.

In my attempts to make changes to the Workers Runtime itself using AI, I've generally not felt like it saved much time. Though, people who don't know the codebase as well as I do have reported it helped them a lot.

I have found AI incredibly useful when I jump into other people's complex codebases, that I'm not familiar with. I now feel like I'm comfortable doing that, since AI can help me find my way around very quickly, whereas previously I generally shied away from jumping in and would instead try to get someone on the team to make whatever change I needed.

srhtftw9mo ago

> It took me a few days to build the library with AI. ... > I estimate it would have taken a few weeks, maybe months to write by hand.

I don't think this is a fair assessment give the summary of the commit history https://pastebin.com/bG0j2ube shows your work started on 2025-02-27 and started trailing off at 2025-03-20 as others joined in. Minor changes continue to present.

> That said, this is a pretty ideal use case: implementing a well-known standard on a well-known platform with a clear API spec.

Still, this allowed you to complete in a month what may have taken two. That's a remarkable feat considering the time and value of someone of your caliber.

3 more replies

michelsedgh9mo ago

The fascinating part is that each person is finding their own way of using these tools from kids to elders and everyone in between no matter what your background or language or whatever is

1 more reply

9dev9mo ago

Funny thing. I have built something similar recently, that is a 2.1-compliant authorisation server in TypeScript[0]. I did it by hand, with some LLM help on the documentation. I think it took me about two weeks full time, give or take, and there’s still work to do, especially on the testing side of things, so I would agree with your estimate.

I’m going to take a very close look at your code base :)

[0] https://github.com/colibri-hq/colibri/blob/next/packages/oau...

nipah9mo ago

Your estimation maybe right, but maybe also there is a point on why it is right: https://neilmadden.blog/2025/06/06/a-look-at-cloudflares-ai-...

Maybe because (and I'm quoting that article) it is still lacking in what it should have that you managed to accomplish this task in "few days" instead of "a few weeks, maybe months".

Maybe the bottleneck was not your typing speed, but the [specific knowledge] to build that system. Because if you know something well enough, you can build it way faster, like rebuilding something from scratch, you will be faster as you already know the paths. In which case, my question would be: would not be writing this as fast, or maybe at least more secure and reasonable, if you had the complete knowledge of the system first.

Because contrary to LLMs, humans can actually improve and learn when they do things, and they don't whey they don't do things. Not knowing the code to the full extent is worth the time "gained" by using the LLM to write it?

I think it's very hard to estimate those other aspects of the thing.

upstairs-war9mo ago

Thanks kentonv. I picked up where you left off, supported with oauth2.1 rfc, and integrated ms oauth to our internal mcp server. Cool to have Claude be business aware

jdbohrman9mo ago

YES!!!! I've actually been thinking about starting a studio specifically geared to turning complex RFPs and protocols into usable tools with AI-assisted coding. I built these using Cursor just to test how for it could go. I think the potential of doing that as a service is huge:

https://github.com/jdbohrman-tech/hermetic-mls https://github.com/jdbohrman-tech/roselite

I think it's funny that Roselite caused a huge meltdown to the Veilid team simply because they have a weird adamancy to no AI assistance. They even called it "plagiarism"

graeme9mo ago

>I have found AI incredibly useful when I jump into other people's complex codebases, that I'm not familiar with. I now feel like I'm comfortable doing that

This makes sense. Are there codebases where you find this doesn't work as well, either from the codebase's min required context size or the code patterns not being in the training data?

1 more reply

aprilthird20219mo ago

Matches my experiences well. Making changes to large, complex codebases I know well? Teaching the AI to get up to speed with me takes too much time.

Code I know nothing about? AI is very helpful there

philipwhiuk9mo ago

> Though, people who don't know the codebase as well as I do have reported it helped them a lot.

My problem I guess is that maybe this is just Dunning-Kruger esq. When you don't know what you don't know you get the impression it's smart. When you do, you think it's rubbish.

Like when you see a media report on a subject you know about and you see it's inaccurate but then somehow still trust the media on a subject you're a non-expert on.

3 more replies

gokhan9mo ago

> Not software engineers being kicked out ... but rather experienced engineers using AI to generate bits of code and then meticulously reviewing and testing them.

But what if you only need 2 kentonv's instead of 20 at the end? Do you assume we'll find enough new tasks that will occupy the other 18? I think that's the question.

And the author is implementing a fairly technical project in this case. How about routine LoB app development?

thewebguyd9mo ago

> But what if you only need 2 kentonv's instead of 20 at the end? Do you assume we'll find enough new tasks that will occupy the other 18? I think that's the question.

This is likely where all this will end up. I have doubts that AI will replace all engineers, but I have no doubt in my mind that we'll certainly need a lot less engineers.

A not so dissimilar thing happened in the sysadmin world (my career) when everything transitioned from ClickOps to the cloud & Infrastructure as Code. Infrastructure that needed 10 sysadmins to manage now only needed 1 or 2 infrastructure folks.

The role still exists, but the quantity needed is drastically reduced. The work that I do now by myself would have needed an entire team before AWS/Ansible/Terraform, etc.

5 more replies

paxys9mo ago

Increased productivity means increased opportuntity. There isn't going to be a time (at least not anytime soon) when we can all sit back and say "yup, we have accomplished everything there is to do with software and don't need more engineers".

1 more reply

simonw9mo ago

I guess I have trouble emphasizing with "But what if you only need 2 kentonv's instead of 20 at the end?" because I'm an open source oriented developer.

What's open source for if not allowing 2 developers to achieve projects that previously would have taken 20?

bigstrat20039mo ago

> but rather experienced engineers using AI to generate bits of code and then meticulously testing and reviewing them.

My problem is that (in my experience anyways) this is slower than me just writing the code myself. That's why AI is not a useful tool right now. They only get it right sometimes so it winds up being easier to just do it yourself in the first place. As the saying goes: bad help is worse than no help at all, and AI is bad help right now.

motorest9mo ago

> My problem is that (in my experience anyways) this is slower than me just writing the code myself.

In my experience, the only times LLMs slow down your task is when you don't use them effectively. For example, if you provide barely any context or feedback and you prompt a LLM to write you the world, of course it will output unusable results, primarily because it will be forced to interpolate and extrapolate through the missing context.

If you take the time to learn how to gently prompt a LLM into doing what you need, you'll find out it makes you far more productive.

JimDabell9mo ago

> My problem is that (in my experience anyways) this is slower than me just writing the code myself.

How much experience do you have writing code vs how much experience do you have prompting using AI though? You have to factor in that these tools are new and everybody is still figuring out how to use them effectively.

1 more reply

uludag9mo ago

I feel this is on point. So not only is there the time lost correcting and testing AI generated code, but there's also the mental model you build of the code when you write it yourself.

Assuming you want a strong mental model of what the code does and how it works (which you'd use in conversations with stakeholders and architecture discussions for example), writing the code manually, with perhaps minor completion-like AI assistance, may be the optimal approach.

dkdcio9mo ago

> The million dollar (perhaps literally) question is – could @kentonv have written this library quicker by himself without any AI help?

I *think* the answer to this is clearly no: or at least, given what we can accomplish today with the tools we have now, and that we are still collectively learning how to effectively use this, there's no way it won't be faster (with effective use) in another 3-6 months to fully-code new solutions with AI. I think it requires a lot of work: well-documented, well-structured codebases with fast built-in feedback loops (good linting/unit tests etc.), but we're heading there no

motorest9mo ago

> I think the answer to this is clearly no: or at least, given what we can accomplish today with the tools we have now, and that we are still collectively learning how to effectively use this, there's no way it won't be faster (with effective use) in another 3-6 months to fully-code new solutions with AI.

I think these discussions need to start from another point. The techniques changed radically, and so did the way problems are tackled. It's not that a software engineer is/was unable to deliver a project with/without LLMs. That's a red herring. The key aspects are things like the overall quality of the work being delivered vs how much time it took to reach that level of quality.

For example, one of the primary ways a LLM is used is not to write code at all: it's to explain to you what you are looking at. Whether it's used as a Google substitute or a rubber duck, developers are able to reason with existing projects and even explore approaches and strategies to tackle problem like they were never able to do so. You no longer need to book meetings with a principal engineer to as questions: you just drop a line in Copilot Chat and ask away.

Another critical aspect is that LLMs help you explore options faster, and iterate over them. This allows you to figure out what approach works best for your scenario and adapt to emerging requirements without having to even chat with anyone. This means that, within the timeframe you would deliver the first iteration of a MVP, you can very easily deliver a much more stable project.

2 more replies

necovek9mo ago

In a "well-documented, well-structured codebase with fast built-in feedback loops", a human programmer is really empowered to make changes fast. This is exactly what's needed for fast iteration, including in unfamiliar codebases.

When you are not introducing a new pattern in the code structure, it's mostly copy-paste and then edit.

But it's also extremely rare, so a pretty high bar to be able to benefit from tools like AI.

0xbadcafebee9mo ago

That's not the million dollar question; anyone who's done any kind of AI coding will tell you it's ridiculously faster. I haven't touched JavaScript, CSS & HTML in like a decade. But I got a whole website created with complex UI interactions in 20 minutes - and no frameworks - by just asking ChatGPT to write stuff for me. And that's the crappy, inefficient way of doing this work. Would have taken me a week to figure out all that. If I'd known how to do it already, and I was very good, perhaps it would have taken the same amount of time? But clearly there is a force-multiplier at work here.

The million dollar question is, what are the unintended, unpredicted consequences of developing this way?

If AI allows me to write code 10x faster, I might end up with 10x more code. Has our ability to review it gotten equally fast? Will the number of bugs multiply? Will there be new classes of bugs? Will we now hire 1 person where we hired 5 before? If that happens, will the 1 person leaving the company become a disaster? How will hiring work (cuz we have such a stellar track record at that...)? Will the changing economics of creating software now make SaaS no longer viable? Or will it make traditional commercial software companies no longer viable? Will the entire global economy change, the way it did with the rise of the first tech industry? Are we seeing a rebirth?

We won't know for sure what the consequences are for a while. But there will be consequences.

stackskipton9mo ago

>experienced engineers using AI to generate bits of code and then meticulously reviewing and testing them

And where are supposed to get experienced engineers if replaced all Jr Devs with AI? There is a ton of benefit from drudgery of writing classes even if seems like grunt work at the time.

motorest9mo ago

> This is exactly the direction I expect AI-assisted coding to go in. Not software engineers being kicked out and some business person pressing a few buttons to have a fully functional app (as is playing out in a lot of fantasies on LinkedIn & X), but rather experienced engineers using AI to generate bits of code and then meticulously reviewing and testing them.

There is a middle ground: software engineers being kicked out because now some business person can hand over the task of building the entire OAuth infrastructure to a single inexperienced developer with a Claude account.

petersellers9mo ago

I'm not so sure that would work well in practice. How would the inexperienced developer know that the code created by the AI was correct? What if subtle bugs are introduced that the inexperienced developer didn't catch until it went out into production? What if the developer didn't even know how to debug those problems correctly? Would they know that the code they are writing is maintainable and extensible, or are they just going to generate a new layer of code on top of the old one any time they need a new feature?

2 more replies

belter9mo ago

The million-dollar question is not whether you can review at the speed the model is coding. It is whether you can trust review alone to catch everything.

If a robot assembles cars at lightning speed... but occasionally misaligns a bolt, and your only safeguard is a visual inspection afterward, some defects will roll off the assembly line. Human coders prevent many bugs by thinking during assembly.

pton_xd9mo ago

> Human coders prevent many bugs by thinking during assembly.

I'm far from an AI true believer but come on -- human coders write bugs, tons and tons of bugs. According to Peopleware, software has "an average defect density of one to three defects per hundred lines of code"!

1 more reply

chrisweekly9mo ago

THIS.

IMHO more rigorous test automation (including fuzzing and related techniques) is needed. Actually that holds whether AI is involved or not, but probably more so if it is.

Shorn9mo ago

And yet, doors still fall off airplanes without any AI in sight.

jstummbillig9mo ago

This is not where AI-assisted coding is going. Where it is going is: The AI will quickly become better at avoiding these types of mistakes than humans ever were (and are ever going to be), because they can and thus will be RL'ed away. What will be left standing longest is providing the vision wrt what the actual problem is, you want to solve.

kypro9mo ago

> Not software engineers being kicked out and some business person pressing a few buttons to have a fully functional app (as is playing out in a lot of fantasies on LinkedIn & X), but rather experienced engineers using AI to generate bits of code and then meticulously reviewing and testing them.

Why would a human review the code in a few years when AI is far better than the average senior developer? Wouldn't that be as stupid as a human reviewing stockfish's moves in Chess?

danans9mo ago

> Not software engineers being kicked out and some business person pressing a few buttons to have a fully functional app (as is playing out in a lot of fantasies on LinkedIn & X)

The theory of enshittification says that "business person pressing a few buttons" approach will be pursued, even if it lowers quality, to save costs, at least until that approach undermines quality so much that it undermines the business model. However, nobody knows how much quality tradeoff tolerance is there to mine.

hooverd9mo ago

AI is great for undifferentiated heavy lifting and surfacing knowledge, but by the time I've made all the decisions, I can just write the code that matters myself there.

tkiolp49mo ago

Why is speed important in this context? If the code is published one week/month later, would that affect what exactly? It’s open source.

kentonv9mo ago

As it happens, if this were released a month later, it would have been a huge loss for us.

This OAuth library is a core component of the Workers Remote MCP framework, which we managed to ship the day before the Remote MCP standard dropped.

And because we were there and ready for customers right at the beginning, a whole lot of people ended up building their MCP servers on us, including some big names:

https://blog.cloudflare.com/mcp-demo-day/

(Also if I had spent a month on this instead of a few days, that would be a month I wasn't spending on other things, and I have kind of a lot to do...)

c-linkage9mo ago

I very much appreciate the fact that the OP posted not just the code developed by AI but also posted the prompts.

I have tried to develop some code (typically non-web-based code) with LLMs but never seem to get very far before the hallucinations kick in and drive me mad. Given how many other people claim to have success, I figure maybe I'm just not writing the prompts correctly.

Getting a chance to see the prompts shows I'm not actually that far off.

Perhaps the LLMs don't work great for me because the problems I'm working on a somewhat obscure (currently reverse engineering SAP ABAP code to make a .NET implementation on data hosted in Snowflake) and often quite novel (I'm sure there is an OpenAuth implementation on gitbub somewhere from which the LLM can crib).

8-prime9mo ago

This is something that I have noticed as well. As soon as you venture into somewhat obscure fields, the output quality of LLMs drastically drops in my experience.

Side note, reverse engineering SAP ABAP sounds torturous.

1 more reply

theshrike799mo ago

The usual solution is a multi-tiered one.

First you use any LLM with a large context to write down the plan - preferably in a markdown file with checkboxes "- [ ] Task 1"

Then you can iterate on the plan and ask another LLM more focused on the subject matter to do the tasks one by one, which allows it to work without too much hallucination as the context is more focused.

mtlynch9mo ago

>In all seriousness, two months ago (January 2025), I (@kentonv) would have agreed.

I'm confused by "I (@kentonv)" means here because kentonv is a different user.[0] Are you saying this is your alt? Or is this a typo/misunderstanding?

Edit: Figured out that most of your post is quoting the README. Consider using > and * characters to clarify.

[0] https://news.ycombinator.com/user?id=kentonv

kentonv9mo ago

He is quoting from the project readme. I wrote all this text.

mdaniel9mo ago

Thanks for weighing in here

If I might make a suggestion, based on how fast things change, even within a model family, you may benefit from saying Claude what. I was especially cognizant of this given the recent v4 release which (of course) hailed as the second coming. Regardless, you may want to update your readme to say

It may also be wildly out of scope for including in a project's readme, but knowing which of the bazillions of coding tools you used would also help a tiny bit with this reproduction crises found in every single one of these style threads

2 more replies

diggan9mo ago

It's a literal copy-paste from the README, I think it was supposed to be quoted but parent messed it up somehow.

https://github.com/cloudflare/workers-oauth-provider/blob/fe...

dang9mo ago

(this comment was originally a reply to https://news.ycombinator.com/item?id=44159167, which summarized the readme in a confusing way.)

jes51999mo ago

I’ve been using Claude (via Cursor) on a greenfield project for the last couple months and my observation is:

1. I am much more productive/effective

2. It’s way more cognitively demanding than writing code the old-fashioned way

3. Even over this short timespan, the tools have improved significantly, amplifying both of the points above

pton_xd9mo ago

This mirrors my experience and those I've talked to.

LLM assisted coding is a way to get stuff done much faster, at a greatly increased mental cost / energy spent. Oddly enough.

piker9mo ago

The small dopamine hits you get from "it compiles" are completely automated away, and you're forced to survive on the goal alone. The issues are necessarily complex and require thinking about how the LLM has gotten it subtly wrong.

Painful, but effective?

chii9mo ago

> Oddly enough.

i actually dont find that outcome odd at all. The high cognative demand comes from the elimination of spurious busy work that would normally come with coding (things like syntax sugars, framework outline, and such). If an AI takes care of all of these things, and lets an author "code" at the speed of thought, you'd be running your engine at maximum.

Not to mention the need to also critically look at the generated code to ensure it's actual correctness (hopefully this can also be helped/offloaded by an ai in the future).

diggan9mo ago

> It’s way more cognitively demanding than writing code the old-fashioned way

How are you using it?

I've been mainly doing "pair programming" with my own agent (using Devstral as of late) and find the reviewing much easier than it would been to literally type all of the code it produces, at least time wise.

I've also tried vibe coding for a bit, and for that I'd agree with you, as you don't have any context if you end up wanting to review something. Basically, if the project was vibe coded from the beginning, it's much harder to get into the codebase.

But when pair programming with the LLM, I already have a built up context, and understand how I want things to be and so on, so reviewing pair programmed code goes a lot faster than reviewing vibe coded code.

jes51999mo ago

I’ve tried a bunch of things but now I’m mostly using Cursor in agent mode with Claude Sonnet 4, doing small-ish pull-request-sized prompts. I don’t have to review code as carefully as I did with Claude 3.7

but I’m finding the bottleneck now is architecture design. I end up having these long discussions with chatGPT-o3 about design patterns, sometimes days of thinking, and then relatively quick implementation sessions with Cursor

1 more reply

SkyPuncher9mo ago

> 2. It’s way more cognitively demanding than writing code the old-fashioned way

Funnily, enough, I find the exact opposite. I feel so much relief that I don't have to waste time figuring out every, single detail. It frees me up to focus on architectural and higher level changes.

jes51999mo ago

I guess what I mean is, I found the details sort of “mindless” before. Code that I could write in my sleep. Now I only have to do the thinky parts

layer89mo ago

This means that you fully trust the LLM to get the details right.

infinitebattery9mo ago

From this commit: https://github.com/cloudflare/workers-oauth-provider/commit/...

===

"Fix Claude's bug manually. Claude had a bug in the previous commit. I prompted it multiple times to fix the bug but it kept doing the wrong thing.

So this change is manually written by a human.

I also extended the README to discuss the OAuth 2.1 spec problem."

===

This is super relatable to my experience trying to use these AI tools. They can get halfway there and then struggle immensely.

diggan9mo ago

> They can get halfway there and then struggle immensely.

Restart the conversation from scratch. As soon as you get something incorrect, begin from the beginning.

It seems to me like any mistake in a messages chain/conversation instantly poisons the output afterwards, even if you try to "correct" it.

So if something was wrong at one point, you need to go back to the initial message, and adjust it to clarify the prompt enough so it doesn't make that same mistake again, and regenerate the conversation from there on.

int_19h9mo ago

Chatbot UIs really need better support for conversation branching all around. It's very handy to be able to just right-click on any random message in the conversation in LM Studio and say, "branch from here".

3 more replies

dingnuts9mo ago

Can you imagine if Excel worked like this? the formula put out the wrong result, so try again! It's like that scene from The Office where Michael has an accountant "run it again." It's farcical. They have created computers that are bad at math and I will never forgive them.

Also, each try costs money! You're pulling the lever on a god damned slot machine!

I will TRY AGAIN with the same prompt when I start getting a refund for my wasted money and time when the model outputs bullshit, otherwise this is all confirmation and sunk cost bias talking, I'm sure if it.

1 more reply

eikenberry9mo ago

I thought Claude still has a problem generating the same output for the same input? That you can't just rewind and rerun and get to the same point again.

2 more replies

viktorcode9mo ago

It can be done, but for my environment the sum of all prompts that I end up typing to get the right result ends up being longer than the actual code.

So now I'm using LLMs as crapshoot machines for generating ideas which I then implement manually

krooj9mo ago

The comment in lines 163 - 172 make some claims that are outright false and/or highly A/S dependent, to the point where I question the validity of this post entirely. While it's possible that an A/S can be pseudo-generated based on lots of training data, each implementation makes very specific design choices: i.e.: Auth0's A/S allows for a notion of "leeway" within the scope of refresh token grant flows to account for network conditions, but other A/S implementations may be far more strict in this regard.

My point being: assuming you have RFCs (which leave A LOT to the imagination) and some OSS implementations to train on, each implementation usually has too many highly specific choices made to safely assume an LLM would be able to cobble something together without an amount of oversight effort approaching simply writing the damned thing yourself.

nicce9mo ago

I am waiting for studies whether we have just an illusion of production or these actually save man hours in the long term in creation of production-level systems.

arendtio9mo ago

One way to mitigate the issue is to use tests or specifications and let the AI find a solution to the spec.

A few months ago, solving such a spec riddle could take a while, and most of the time, the solutions that were produced by long run times were worse than the quick solutions. However, recently the models have become significantly better at solving such riddles, making it fun (depending on how well your use case can be put into specs).

In my experience, sonnet 3.7 represented a significant step forward compared to sonnet 3.5 in this discipline, and Gemini 2.5 Pro was even more impressive. Sonnet 4 makes even fewer mistakes, but it is still necessary to guide the AI through sound software engineering practices (obtaining requirements, discovering technical solutions, designing architecture, writing user stories and specifications, and writing code) to achieve good results.

Edit: And there is another trick: Provide good examples to the AI. Recently, I wanted to create an app with the OpenAI Realtime API and at first it failed miserably, but then I added the most important two pages of the documentation and one of the demo projects into my workspace and just like that it worked (even though für my use-case the API calls had to be use quite differently).

fxnn9mo ago

That's one thing where I love Golang. I just tell Aider to `/run go doc github.com/some/package`, and it includes the full signatures in the chat history.

It's true: often enough AI struggles to use libraries, and doesn't remember the usage correctly. Simply adding the go doc fixed that often.

mysterydip9mo ago

This to me is why I think these tools don't have actual understanding, and are instead producing emergent output from pooling an incomprehensibly large set of pattern-recognized data.

diggan9mo ago

> these tools don't have actual understanding, and are instead producing emergent output from pooling an incomprehensibly large set of pattern-recognized data

I mean, bypassing the fact that "actual understanding" doesn't have any consensus about what it is, does it matter if it's "actual understanding" or "kind of understanding", or even "barely understanding", as long as it produces the results you expect?

2 more replies

nisegami9mo ago

Same. But I personally find it a lot easier to do those bits at the end than to begin from a blank file/function, so it's a good match for me.

SkyPuncher9mo ago

Same here. Sometimes you just need time to stew in the problem/solution space.

LLMs let me be ultraproductive upfront then come in at the end to clean up when I have a full understanding.

jauntywundrkind9mo ago

> Again, please check out the commit history -- especially early commits -- to understand how this went.

Direct link to earliest page of history: https://github.com/cloudflare/workers-oauth-provider/commits...

A lot of very explicit & clear prompting, with direct directions to go. Some examples on the first page: https://github.com/cloudflare/workers-oauth-provider/commit/... https://github.com/cloudflare/workers-oauth-provider/commit/...

aeneas_ory9mo ago

Very impressive, and at the same time very scary because who knows what security issues are hidden beneath the surface. Not even Claude knows! There is very reliable tooling like https://github.com/ory/hydra readily available that has gone through years of iteration and pentests. There are also lots of libraries - even for NodeJS - that have gone through certification.

In my view this is an antipattern of AI usage and „roll your own crypto“ reborn.

simonw9mo ago

The most clearly Claude-written commits are on the first page, this link should get you to them: https://github.com/cloudflare/workers-oauth-provider/commits...

declan_roberts9mo ago

Getting a "Too Many Requests" error is kind of hilarious given the company involved.

rcastellotti9mo ago

same

_tqr39mo ago

I’ve tried building a web app with LLMs before. Two of them went in circles—I'd ask them to fix an infinite loop, they’d remove the code for a feature; I’d ask them to add the feature back, they’d bring back the infinite loop, and so on. The third one kept losing context—after just 2–3 messages, it would rebuild the whole thing differently.

They’ll probably get better, but for now I can safely say I’ve spent more time building and tweaking prompts than getting helpful results.

diggan9mo ago

Rather than doing that approach which eventually builds up to 10+ messages or more, iterate on your initial prompt and you'll see better results. So if the first prompt correctly fixed the infinite loop, but removed something else, instead of saying "Add that back again", change the initial prompt to include "Don't remove anything else than what's explicitly mentioned" or similar, and you'll either get exactly what you want, or some other issue. Then rinse and repeat until completed.

Eventually you'll build up a somewhat reusable template you can use as a system prompt to guide it exactly how you want.

Basically, you get what you ask for, nothing else and nothing more. If you're unclear, it'll produce unclear outputs, if you didn't mention something, it'll do whatever with that. You have to be really, really explicit about everything.

vaidhy9mo ago

I think the discussions are also missing another key element. The time in takes to read someone else code is way more mentally tiring.

When I am writing the code, my mind tracks what I have done and the new pieces flow. When I am reading code written by someone else, there is no flow.. I have to track individual pieces and go back and forth on what was done before.

I can see myself using LLMs for short snippets rather than start something top down.

qsort9mo ago

I think this is pretty cool, but it doesn't really move my priors that much. Looking at the commit history shows a lot of handholding even in pretty basic situations, but on the other hand they probably saved a lot of time vs. doing everything manually.

Luker889mo ago

Hello Cloudflare, impressive result, I did not think things were this advanced.

Still, legal question where I'd like to be wrong: AFAIK (and IANAL) if I use AI to generate images, I can't attach copyright to it.

But the code here is clearly copyrighted to you.

Is that possible because you manually modify the code?

How does it work in examples like this one where you try to have close to all code generated by AI?

kentonv9mo ago

I am also not a lawyer, but I believe the law here is yet to be fully settled. Here in the US, there have been lower-court rulings but surely it will go to the supreme court.

There are parts of the library that I did write by hand, which are presumably copyright Cloudflare either way. As for whether the AI-generated parts are, I guess we'll see.

But given the thing is MIT-licensed, it doesn't seem like it matters much in this case?

dvrp9mo ago

Did you check the latest documents from copyright.gov? They’re interesting exactly because of what you’re saying

Luker889mo ago

I did not, especially seeing as I am not from the USA, so I'd like to have the point of view of a multinational company

--edit: didn't the same office have a controversy a few weeks ago where AI training was almost declared not-fair-use, and the boss was fired on the spot byt the administration, or something like that?

Things sounds confusing to me, which is why I'm asking

fastball9mo ago

I believe you are wrong about AI-generated images as well.

kentonv9mo ago

I'm the author of this library! Or uhhh... the AI prompter, I guess...

I'm also the lead engineer and initial creator of the Cloudflare Workers platform.

--------------

Plug: This library is used as part of the Workers MCP framework. MCP is a protocol that allows you to make APIs available directly to AI agents, so that you can ask the AI to do stuff and it'll call the APIs. If you want to build a remote MCP server, Workers is a great way to do it! See:

https://blog.cloudflare.com/remote-model-context-protocol-se...

https://developers.cloudflare.com/agents/guides/remote-mcp-s...

--------------

OK, personal commentary.

As mentioned in the readme, I was a huge AI skeptic until this project. This changed my mind.

I had also long been rather afraid of the coming future where I mostly review AI-written code. As the lead engineer on Cloudflare Workers since its inception, I do a LOT of code reviews of regular old human-generated code, and it's a slog. Writing code has always been the fun part of the job for me, and so delegating that to AI did not sound like what I wanted.

But after actually trying it, I find it's quite different from reviewing human code. The biggest difference is the feedback loop is much shorter. I prompt the AI and it produces a result within seconds.

My experience is that this actually makes it feels more like I am authoring the code. It feels similarly fun to writing code by hand, except that the AI is exceptionally good at boilerplate and test-writing, which are exactly the parts I find boring. So... I actually like it.

With that said, there's definitely limits on what it can do. This OAuth library was a pretty perfect use case because it's a well-known standard implemented in a well-known language on a well-known platform, so I could pretty much just give it an API spec and it could do what a generative AI does: generate. On the other hand, I've so far found that AI is not very good at refactoring complex code. And a lot of my work on the Workers Runtime ends up being refactoring: any new feature requires a bunch of upfront refactoring to prepare the right abstractions. So I am still writing a lot of code by hand.

I do have to say though: The LLM understands code. I can't deny it. It is not a "stochastic parrot", it is not just repeating things it has seen elsewhere. It looks at the code, understands what it means, explains it to me mostly correctly, and then applies my directions to change it.

davidwu9mo ago

Thanks for so meticulously documenting the prompts you used and whether or not a commit was done manually or via the AI.

rethab9mo ago

Fancy! Why are the first twenty commits or so created in the same minute though? Surely you can’t be that fast if you need to prompt for each commit

kentonv9mo ago

That's weird! It must be due to a history rewrite I did later on to clean up the repo, removing some files that weren't really part of the project. I didn't realize when I first started the experiment that we'd actually end up releasing the code so I had to go back and clean it up later. I am surprised though that this messed up the timestamps -- usually rebases retain timestamps. I think I used `git filter-branch`, though. Maybe that doesn't retain timestamps.

1 more reply

tveita9mo ago

Some examples of prompt exchanges that seem representative:

https://claude-workerd-transcript.pages.dev/oauth-provider-t... ("Total cost: $6.45")!

https://github.com/cloudflare/workers-oauth-provider/commit/...

The first transcript includes the cost, would be interesting to know the ballpark of total Claude spend on this library so far.

This is opportune for me, as I've been looking for a description of AI workflows from people of some presumed competency. You'd think there would be many, but it's hard to find anything reliable amidst all the hype. Is anyone live coding anything but todo lists?

antirez: https://antirez.com/news/144#:~:text=Yesterday%20I%20needed%...

tptacek: https://news.ycombinator.com/item?id=44163292

kentonv9mo ago

I didn't keep extract track but I'd estimate the total cost of Claude credits to build this library was somewhere around $50, which is pretty negligible compared to the time saved.

multimoon9mo ago

I think this reinforces that “vibecoding” is silly and won’t survive. It still needed immensely skilled programmers to work with it and check its output, and fix several bugs it refused to fix.

Like anything else it will be a tool to speed up a task, but never do the task on its own without supervision or someone who can already do the task themselves, since at a minimum they have to already understand how the service is to work. You might be able to get by to make things like a basic website, but tools have existed to autogenerate stuff like that for a decade.

NitpickLawyer9mo ago

I don't think it does. Vibecoding is currently best suited for low-stakes stuff. Get a gui up, crud stuff, write an app for a silly one time use, etc. There's a ton of usage there. And it's putting that power in the hands of people that didn't have the capabilities before.

This isn't vibecoding. This is LLM-assisted coding.

subarctic9mo ago

I get the sense that "vibecoding" is used like a strawman these days, something people keep moving the goal posts on so they can keep saying it's silly. Getting an LLM to write code for you that mostly works with some tweaks is vibe coding, isn't it?

Izkata9mo ago

No. Vibe coding is never even looking at the code and using the LLM as the only interface.

mmaunder9mo ago

Claude 4 in agent mode is incredible. Nothing compares. But you need to have a deep technical understanding of what you’re building and how to split it into achievable milestones and build on each one. It also helps to provide it with URLs with specs, standards, protocols, RFCs etc that are applicable and then tell it what to use from the docs.

weinzierl9mo ago

"I thoughts LLMs were glorified Markov chain generators"

"the code actually looked pretty good. Not perfect, but I just told the AI to fix things, and it did. I was shocked."

These two views are by no means mutually exclusive. I find LLMs extremely useful and still believe they are glorified Markov generators.

The take away should be that that is all you need and humans likely are nothing more than that.

kentonv9mo ago

I suppose it's all a continuum and we can each have different opinions on what the threshold for "glorified markov generator" is.

But there have been many cases in my experience where the LLM could not possibly have been simply pattern-matching to something it had seen before. It really did "understand" the meaning of the code by any definition that makes sense to me.

palata9mo ago

> It really did "understand" the meaning of the code by any definition that makes sense to me.

I find it dangerous to say it "understands". People are fast to say it "is sentient by any definition that makes sense to them".

Also, would we say that a compiler "understands" the meaning of the code?

smallnix9mo ago

> humans likely are nothing more than that

Relevant post: https://news.ycombinator.com/item?id=44089156

bufferoverflow9mo ago

> I find LLMs extremely useful and still believe they are glorified Markov generators.

Then you should be able to make a markov chain generator without deep neural nets, and it should be on the same level of performance as current LLMs.

But we both know you can't.

ronsor9mo ago

You can, but it will require far more memory than any computer has.

Flemlo9mo ago

The way the input doesn't match the output should imply that it's not just statistics.

As soon as compression happens, optimization happens which can lead to rules/learning of principles which got feed by statistics.

immibis9mo ago

That's "just" more statistics though.

1 more reply

hattmall9mo ago

I guess for me the questions is, at what point do you feel it would be reasonable to this without the experts involved in your case?

As an edit, after reading some of the prompts, what is the likelihood that a non-expert could even come up with those prompts?

The really really interesting thing would be if an AI could actually generate the prompts.

kentonv9mo ago

(I'm the author of this library -- or, the guy who prompted the AI at least.)

I absolutely would not vibe code an OAuth implementation! Or any other production code at Cloudflare. We've been using more AI internally, but made this rule very clear: the human engineer directing the AI must fully understand and take responsibility for any code which the AI has written.

I do think vibe coding can be really useful in low-stakes environments, though. I vibe-coded an Android app to use as a baby monitor (it just streams audio from a Unifi camera in the kid's room). I had no previous Android experience, and it would have taken me weeks to learn without AI, but it only took a few hours with AI.

I think we are in desperate need of safe vibe coding environments where code runs in a sandbox with security policies that make it impossible to screw up. That would enable a whole lot of people to vibe-code personal apps for personal use cases. It happens I have some background building such platforms...

But those guardrails only really make sense at the application level. At the systems level, I don't think this is possible. AI is not smart enough yet to build systems without serious bugs and security issues. So human experts are still going to be necessary for a while there.

diggan9mo ago

> I think we are in desperate need of safe vibe coding environments where code runs in a sandbox with security policies that make it impossible to screw up.

OpenAI's new Rust version of Codex might be of interest, haven't dived deeper into the codebase but seems they're thinking about sandboxing from the get-go: https://github.com/openai/codex/blob/7896b1089dbf702dd079299...

freedomben9mo ago

What tools did you use for the vibe coding an Android app? And was it able to do the UI stuff too?

I've wanted to do this but am not sure how to get started. For example, should I generate a new app in Android Studio and then point Claude Code at it? Or can I ask Claude Code (or another agent) to start it from scratch? (in the past that did not work, but I'm curious if it's just a PEBKAC error)

1 more reply

aerhardt9mo ago

> I do think vibe coding can be really useful in low-stakes environments, though. I vibe-coded an Android app to use as a baby monitor (it just streams audio from a Unifi camera in the kid's room). I had no previous Android experience, and it would have taken me weeks to learn without AI, but it only took a few hours with AI.

Bro, you're still an engineer at Cloudflare!

One problem I see with "vibe coding" is how it means one thing if Ilya Sutskever says it, and another if a non-tech executive parrots it and imagines "citizens developers" coding their own business apps.

rangerelf9mo ago

> I guess for me the questions is, at what point do you feel it would be reasonable to this without the experts involved in your case?

I don't know if it was the intent but these kind of questions bother me, the seem to hint at an agenda, "when can I have a farm of idiots with keyboards paid minimum wage churn out products indistinguishable from expertly designed applications".

To me that's the danger of AI, not it's purported intelligence, but our manifested greed.

1 more reply

dkdcio9mo ago

Why do you need a non-expert? We built on layers of abstractions, AI will help you at whichever layer you're the "expert" at. Of course you'll need to understand low-level stuff to work on low-level code

i.e. I might not use AI to build an OAuth library, but I might use AI to build a web app (which I am an expert at) that may use an OAuth library Cloudfare developed (which theya are experts at). Trying to make "anyone" code "anything" doesn't seem like the point to me

nisegami9mo ago

GP is just quoting the readme, they aren't the author.

My 2 cents:

>I guess for me the questions is, at what point do you feel it would be reasonable to this without the experts involved in your case?

No sooner and no later than we could say the same thing about a junior developer. In essence, if you can't validate the code produced by a LLM then you shouldn't really have been writing that code to begin with.

>The really really interesting thing would be if an AI could actually generate the prompts.

I think you've hit on something that is going underexplored right now in my opinion. Orchestration of AI agents, where a we have a high level planning agent delegating subtasks to more specialized agents to perform them and report back. I think an approach like that could help avoid context saturation for longer tasks. Cline / Aider / Roo Code / etc do something like this with architect mode vs coding mode but I think it can be generalized.

freedomben9mo ago

On a meta-note, it's (seriously) kind of refreshing to see that other people make this same typo when trying to type Cloudflare. I also often write CLoudflare, Cloudlfare, and Cloudfare:

> Cloudlflare builds OAuth with Claude and publishes all the prompts

gcr9mo ago

This library has some pretty bad security bugs. For example, the author forgot to check that redirect_uri — matches one of the URLs listed during client registration.

The CVE is uncharacteristically scornful: https://nvd.nist.gov/vuln/detail/cve-2025-4143

I’m glad this was patched, but it is a bit worrying for something “not vibe coded” tbh

eGQjxkKF6fif9mo ago

Looking at all of these arguments and viewpoints really is something to witness.

Congratulations Cloudflare, and thank you for showing that a pioneer, and leader in the internet security space can use the new methods of 'vibe coding' to build something that connects people in amazing ways, and that you can use these prompts, code, etc to help teach others to seek further in their exploration of programming developments.

Vibe programming has allowed me to break through depression and edit and code the way I know how to do; it is a helpful and very meaningful to me. I hope that, it can be meaningful for others.

I envision the current generation and future generations of people to utilize these things; but we need to accept, that this way of engineering, developing things, creation, is paving a new way for peoples.

Not a single comment in here is about people traumatized, broken, depressed, or have a legitimate reason for vibe coding.

These things assist us, as human beings; we need to be mindful that it isn't always about us. How can we utilize these things to to the betterment of the things we are passionate about? I humbly look forward to seeing how projects in the open source space can showcase not only developmental talent, but the ability to reason and use logic and project building thoughtfulness to use these tools to build.

Good job, Cloudflare.

nop_slide9mo ago

It literally says in the post it’s not “vibe coded”. That has a very specific meaning of not reviewing the code at all and accepting everything.

lapcat9mo ago

If my future career consists of constantly prompting and code-reviewing a semi-competent, nonhuman coder in order to eventually produce something decent, then I want no part in that future, even if it's more "efficient" in the sense of taking less time overall. That sounds extremely frustrating, personally unrewarding, alienating. I've read the prompts and the commit messages, and to be honest, I don't have the patience to deal with a Claude-level coder. I'd be yelling at the idiot and shaking my fists the whole time. I'd rather just take more time and write the code myself. It's much more pleasant that way. This future of A.I. work sounds like a dystopia to me. I didn't sign up for that. I never wanted to be a glorified babysitter.

It feels infinitely worse than mentoring an inexperienced engineer, because Claude is inhuman. There's no personal relationship, it doesn't make human mistakes or achieve human successes, and if Claude happens to get better in the future, that's not because you personally taught it anything. And you certainly can't become friends.

They want to turn artists and craftsmen into assembly line supervisors.

chii9mo ago

> They want to turn artists and craftsmen into assembly line supervisors.

the same was uttered by blacksmiths and other craftsman who has been displaced by technology. Yet they are mercilessly crushed.

Your enjoyment of a job is not a consideration to those paying you to do it; and if there's a more efficient way, it will be adopted. The idea that your job is your identity may be at fault here - and when someone's identity is being threatened (as it very much is right now with these new AI tools), they respond very negatively.

1 more reply

mattgreenrocks9mo ago

I deeply resent the notion that we engineers should let non-engineers tell us how to achieve agreed-upon objectives (e.g. "use LLMs more!"). I'm happy to use LLMs when they are useful. If I have to babysit them excessively, then it's a double loss: I'm not accruing domain knowledge, and I'm wasting time. The contract of work I was sold in the early 2000s: decision makers specify what should be be built, and what the time constraints are. This bounds the space of possibilities along with the local engineering culture. I bear the responsibility of execution, clarifying requirements, and bringing up potential issues sooner rather than later.

However, at no point was the exact technical approach prescribed to me. It'd be asinine if someone came to me and said, "you need to be using VSCode, not vim." It's irrelevant to execution. Yet, that's exactly what's happening with LLMs.

The denial of agency to devs via prescriptive LLM edicts will only end badly.

1 more reply

zeroq9mo ago

Holy cow!

My latest try with Gemini went like this:

  - Write me a simple todo app on CloudFlare with auth0 authentication.
  - Let's proceed with a simple todo app on CloudFlare. We start by importing the @auth0-cloudflare and...
  - Does that @auth0-cloudflare actually exists?
  - Oh, it doesn't. I can give you a walkthrough on how to set up an account on auth0. Would you like me to?
  - Yes, please.
  - Here. I'm going to write the walkthrough in a document... (proceed to create an empty document)
  - That seems to be an empty document.
  - Oh, my bad. I'll produce it once more. (proceed to create another empty document)
  - Seems like you're md parsing library is broken, can you write it in chat instead?
  - Yes... (Your Gemini trial has expired. Would you like to pay $100 to continue?)

My idea was to try the new model with a low hanging fruit - as kentov mentioned, it's a very basic task that has been made thousand of times on the internet with extremely well documented APIs (officially and on reddit/stackoverflow/etc.).

Sure, it was a short hike before my trial expired, and kentov himself admited it took him couple of days to put it together, but... holy cow.

thih99mo ago

Congrats and thanks for sharing, both the code and the story.

Which Claude plan did you use? Was it enough or did you feel limited by the quotas?

kentonv9mo ago

This was mostly Claude Code, which runs on API credits. I think I spent a two-digit number of dollars. The model was Sonnet 3.7 (this was all a couple months ago, before Claude 4).

alanfranz9mo ago

Carefully reviewed greenfield project; I don’t think this is astonishing, and I very much love they recorded the prompts.

Question is: will this work for non-greenfield projects as well? Usually 95% of work in a lifetime is not greenfield.

Or will we throw away more and more code as we go, since AI will rewrite it, and we’ll probably introduce subtle bugs as we go?

DaiPlusPlus9mo ago

> Question is: will this work for non-greenfield projects as well?

Depends on the project. Word on the street is the closer your project is to an archetypical React tutorial TODO App then you'll likely be pleased with the results. Whereas if your project is a WDK driver in Rust where every file is a minefield then you'll spend the next few evenings having to audit everything with a fine toothed comb.

> since AI will rewrite it

That depends if you believe in documentation-first or program-first definitions of a specification.

ookblah9mo ago

AI critics always have to make strawmen arguments about how there has to be a human in the loop to "fix" things when that's never been the argument AI proponents ever make (at least those who deal with it day to day). This will only get better with time. AI can frequently one-shot throwaway scripts that I need get things done. For actual features I typically start and have it go thru the initial slog and then finish it off. You must be reviewing the entire time, but it takes a huge cognitive load off. You can rubber-duck debug with it.

I do agree if you have no idea what you are doing or are still learning it could be a detriment, but like anything it's just a tool. I feel for junior devs and the future. Lazy coders get lazier, those who utilize them to the fullest extent get even better, just like with any tech.

skydhash9mo ago

The one thing about concocting throwaway scripts yourself is the increased familiarity with the tooling you use. And you're not actually throwing away those scripts. I have random scripts laying around my file system (and my shell history) to check how I did a task in the past.

2 more replies

nipah9mo ago

> This will only get better with time.

Prove it.

skybrian9mo ago

Looking at the commit history, there’s a fair bit of manual intervention to fix bugs and remove unused code.

wooque9mo ago

Not surprised, this is perfect task for AI, boilerplaty code that implements something that is implemented 100 times. And it's small project, 1200 lines of pure code.

I'm surprised I took them more than 2 days to do that with AI.

dang9mo ago

We changed the URL from https://github.com/cloudflare/workers-oauth-provider/commits... to the project page.

abroadwin9mo ago

Oh hey, looks like it's mostly Kenton Varda, who you may recognize from his LAN party house: https://news.ycombinator.com/item?id=42156977

davidjfelix9mo ago

Or Cap'n'Proto, Protobuf, Cloudflare workers, Cloudflare Durable Objects. The LAN house is cool too.

EtienneK9mo ago

> This is a TypeScript library that implements the provider side of the OAuth 2.1 protocol with PKCE support.

What is the "provider" side? OAuth 2.1 has no definition of a "provider". Is this for Clients? Resource Servers? Authorization Server?

Quickly skimming the rest of the README it seems this is for creating a mix of a Client and a Resource Server, but I could be mistaken.

> To emphasize, this is not "vibe coded". Every line was thoroughly reviewed and cross-referenced with relevant RFCs, by security experts with previous experience with those RFCs

Experience with the RFCs but have not been able to correctly name it.

kentonv9mo ago

This library helps implement both the resource server and authorization server. Most people understand these two things to be, collectively, the "provider" side of OAuth -- the service provider, who is providing an API that requires authorization. The intent when using this library is that you write one Worker that does both. This library has no use on the client side.

This is intended for building lightweight services quickly. Historically there has been no real need for "lightweight" OAuth providers -- if you were big enough that people wanted to connect to you using OAuth, you were not lightweight. MCP has sort of changed that as the "big" side of an MCP interaction is the client side (the LLM provider), whereas lots of people want to create all kinds of little MCP servers to do all kinds of little things. But MCP specifies OAuth as the authentication mechanism. So now people need to be able to implement OAuth from the provider side easily.

> Experience with the RFCs but have not been able to correctly name it.

These docs are written for people building MCP servers, most of whom only know they want to expose an API to AIs and have never read OAuth RFCs. They do not know or care about the difference between an authorization server and a resource server.

1 more reply

DaiPlusPlus9mo ago

> OAuth 2.1 has no definition of a "provider"

Strictly speaking, yes. But speaking of IDPs more broadly, it’s perfectly acceptable to refer to the authorisation-server as an auth-provider, especially in OIDC (which is OAuth, with extensions) where it’s explicitly called “OpenID provider” - so it’s natural for anyone well-versed in both to cross terminology like that.

cyberax9mo ago

I looked through the source code, and it looks reasonable. The code is well-commented (even _over_ commented a bit). There are probably around ~700 meaningful lines of pure code. So this should be about 2 weeks of work for a good developer. This is without considering the tests.

And OAuth is not particularly hard to implement, I did that a bunch of times (for server and the client side). It's well-specified and so it fits well for LLMs.

So it's probably more like 2x acceleration for such code? Not bad at all!

nipah9mo ago

700 lines of code is 2 weeks of work for a good developer? My friend, I wrote 350 lines of executable code (excluding boilerplate) in a morning (4AM to like 9AM, maybe a bit more) to make a test with voxel octrees like yesterday. There's no reason it would take "2 weeks of work for a good developer" to write 700. What takes times in those projects is the research, if you already have this fresh in your head it should not take more than 3 days to make something very simple but reasonable, and a week at max to make something good (not perfect, but good).

globular-toast9mo ago

Should I be impressed? Oauth already exists and there are countless libraries implementing it. Is it impressive that an LLM can regurgitate yet another one?

jplehmann9mo ago

Fascinating share and discussion. I read many of the comments, but extracted key take-aways using GPT here: https://chatgpt.com/share/6840c9e8-a498-8005-971b-3b91e09b9d... for anyone interested.

scherlock9mo ago

Is it really good form in TypeScript to make all functions async, even when functions don't use await? like this, https://github.com/cloudflare/workers-oauth-provider/blob/fe...

kentonv9mo ago

env.OAUTH_KV.get(...) is reading from Workers KV storage. It returns a promise.

ccorcos9mo ago

No. It’s possible env[?].get returns a promise though

bsder9mo ago

> Claude's output was thoroughly reviewed by Cloudflare engineers with careful attention paid to security and compliance with standards.

So, for those of us who are not OAuth experts, don't have a team of security engineers on call, and are likely to fall into all the security and compliance traps, how does this help?

I don't need AI to write my shitty code. I need AI to review and correct my shitty code.

baq9mo ago

You want hammers to review your woodwork or your hammering technique, too?

…anyway, Gemini pro is a quite good reviewer if you are specific about what you need reviewed and provide relevant dependencies in the context.

IncreasePosts9mo ago

Has this source been compared with other oauth libraries, to see if it is just license-violating some other open source code it was trained on?

varispeed9mo ago

The thing is you need to know what exactly LLM should create and you need to know what it is doing wrong and tell it to fix it. Meaning, if you don't already have skill to build something yourself, AI might not be as useful. Think of it as keyboard on steroids. Instead of typing literally what you want to see, you just describe it in detail and LLM decompresses that thought.

_pdp_9mo ago

It is a single file with 2630 locs and it is a straightforward problem. 1/3 of the code is just interface definitions and comments.

DJBunnies9mo ago

I feel like well defined RFCs and standards are easily coded against, and I question the investment/value/time tradeoff here. These things happily regurgitate training data, but seriously struggle when they don’t have a pool of perfect examples to pull from.

When Claude can do something new, then I think it will be impressive.

Otherwise it’s just piecing together existing examples.

alienbaby9mo ago

Reading the authors comments on the github page I can relate. Over this paast weekend I attempted to use copilot to write some code for a home project and expected it to be terrible, like the last time I tried.

Except, this time it wasn't. It got most things right first time, and fixed things I asked it to.

I was pleasantly surprised.

topspin9mo ago

Been here many times:

    This time Claude fixed the problem, but:
    - It also re-ordered some declarations, even though I told it not to. AFAICT they aren't changed, just reordered, and it also added some doc comments.
    - It fixed an unrelated bug, which is that `getClient()` was marked `private` in `OAuthProvider` but was being called from inside `OAuthHelpers`. I hadn't noticed this before, but it was indeed a bug.

Frequently can't get LLMs to limit themselves to what has been prompted, and instead they run around and "best practice" everything, "fixing" unrelated issues, spewing commentary everywhere, and creating huge, unnecessary diffs.

dboreham9mo ago

Tbf I've seen human developers do this and similar irritating things many times.

ZiiS9mo ago

Shouldn't they really have asked it to read https://developers.cloudflare.com/workers/examples/protect-a...

kentonv9mo ago

The secret token is hashed first, and it's the hash that is looked up in storage. In this arrangement, an attacker cannot use timing to determine the correct value byte-by-byte, because any change to the secret token is expected to randomize the whole hash. So, timing-safe equality is not needed.

That said, if you have spotted a place in the code where you believe there is such a vulnerability, please do report it. Disclosure guidelines are at: https://github.com/cloudflare/workers-oauth-provider/blob/ma...

1 more reply

kiitos9mo ago

Is this not... embarrassing? to the engineers who submit these commits?

It seems that way to me...

Certainly if I were on a hiring panel for anyone who had this kind of stuff in their Google search results, it would be a hard-no from me -- but what do i know?

catigula9mo ago

When I want to spend a dollar or two, it's much faster to just instruct Claude on how to write my code and prompt/correct it than to write it myself.

It feels probably similarly from going from dumb or semi-dumb text editor to an IDE.

zackify9mo ago

Oauth isn’t that complicated. It’s not a surprise to see an llm build out from the spec. Honestly I was playing around writing a low level implementation recently just for fun as I built out my first oauth mcp server.

ab_testing9mo ago

Sorry this might be a dumb question but where are the prompts in the source code ? I was thinking like I prompt ChatGPT and it prints some code , there would be similar prompts. Is the readme the prompt?

animex9mo ago

https://github.com/cloudflare/workers-oauth-provider/commits...

Start at the bottom...they are in the commit messages, or sometimes the .md file

paulddraper9mo ago

> To emphasize, this is not "vibe coded". Every line was thoroughly reviewed and cross-referenced with relevant RFCs, by security experts with previous experience with those RFCs.

vjerancrnjak9mo ago

I like how it is just 1 file.

Wonder how well incremental editing works with such a big file. I keep pushing for 1 file implementations, yet people split it up into bazillion files because it works better with AI.

weird-eye-issue9mo ago

Unfortunately Claude Code falls apart as soon as you hit 25k tokens in a single file. It's a hard coded limit where they will no longer put the full file into the prompt so it's up to the model to read it chunk by chunk or by using search tools

mehdibl9mo ago

That's great.

But Claude don't allow yet to add APPS in their backend.

Mainly only closed beta for integration.

How you can configure an app to leverage correctly Oauth and have your own app secret ID/ Client ID!

teaearlgraycold9mo ago

> Every line was thoroughly reviewed and cross-referenced with relevant RFCs, by security experts with previous experience with those RFCs.

This sounds like coding but slower

kentonv9mo ago

I would say it ended up being much faster than had I written it by hand. It took a few days to produce this library -- it would almost certainly have taken me weeks to write it myself.

3 more replies

TeMPOraL9mo ago

The point was validating a hypothesis. That is the validation part.

1 more reply

jonplackett9mo ago

This doesn’t seem like much of a surprise that it’s possible - if you are a security expert, you can make LLMs write secure code.

Bluestein9mo ago

From the docs:

> "NOOOOOOOO!!!! You can't just use an LLM to write an auth library!"

> "haha gpus go brrr"

jbeus9mo ago

Getting rate limited…Cloudflare, do you you think you can help with caching and load balancing for github?

rienbdj9mo ago

Why not use an existing OAuth library?

throwaway3141559mo ago

"built OAuth" here means they "implemented OAuth for CloudFlare workers" FYI.

hamdouni9mo ago

Claude is not mentioned in the 'contributors' section.

jwally9mo ago

fwiw, I feel like LLM code generation is scaffolding on steroids, and strapped to a rocket. A godsend if you know what you're doing, but really easy to blow yourself up if you complacent. At least with where models are today; imho.

caycep9mo ago

tbh I would find it annoying to have to go audit someone else (i.e. an LLM's) code...

Also, maybe the humbling question is, maybe we humans aren't so exceptional if 90% of the sum of human knowledge can be predicted by next-word-prediction

horacemorace9mo ago

I did the same thing a few months ago with 4o. This stuff works fine if done with care.

unshavedyak9mo ago

Yup. I'm more skeptic than pro-AI these days, but nonetheless i'm still trying to use AI in my workflows.

I don't actually enjoy it, i generally find it difficult to use as i have more trouble explaining what i want than actually just doing it. However it seems clear that this is not going away and to some degree it's "the future". I suspect it's better to learn the new tools of my craft than to be caught unaware.

With that said i still think we're in the infancy of actual tooling around this stuff though. I'm always interested to see novel UXs on this front.

qsort9mo ago

Probably unrelated to the broader discussion, but I don't think the "skeptic vs pro-AI" distinction even makes that much sense.

For example, I usually come off as being relatively skeptic within the HN crowd, but I'm actually pushing for more usage at work. This kind of "opinion arbitrage" is common with new technologies.

4 more replies

thewebguyd9mo ago

> I don't actually enjoy it, i generally find it difficult to use as i have more trouble explaining what i want than actually just doing it.

This is my problem I run into quite frequently. I have more trouble trying to explain computing or architectural concepts in natural language to the AI than I do just coding the damn thing in the first place. There are many reasons we don't program in natural language, and this is one of them.

I've never found natural language tools easier to use, in any iteration of them, and so I get no joy out of prompting AI. Outside of the increasingly excellent autocomplete, I find it actually slows me down to try and prompt "correctly."

cheema339mo ago

> I don't actually enjoy it, i generally find it difficult to use as i have more trouble explaining what i want than actually just doing it.

Most people who are good at a skill start here with AI. Over time, they learn how to explain things better to AI. And the output from AI improves significantly as a result. Additionally AI models keep improving over time as well.

If you stay with it, you will reach an inflection point soon enough and will actually start enjoying it.

helsinki9mo ago

I don’t see any prompts?

sensanaty9mo ago

The expanded commit messages have the prompts

stego-tech9mo ago

On the one hand, I would expect LLMs to be able to crank out such code when prompted by skilled engineers who also understand prompting these tools correctly. OAuth isn’t new, has tons of working examples to steal as training data from public projects, and in a variety of existing languages to suit most use cases or needs.

On the other hand, where I remain a skeptic is this constant banging-on that somehow this will translate into entirely new things - research, materials science, economies, inventions, etc - because that requires learning “in real time” from information sources you’re literally generating in that moment, not decades of Stack Overflow responses without context. That has been bandied about for years, with no evidence to show for it beyond specifically cherry-picked examples, often from highly-controlled environments.

I never doubted that, with competent engineers, these tools could be used to generate “new” code from past datasets. What I continue to doubt is the utility of these tools given their immense costs, both environmentally and socially.

btown9mo ago

It's said that much of research is data janitorial work, and from my experience that's not just limited to the machine learning space. Every research scientist wishes that they had an army of engineers to build bespoke tooling for their niche, so they could get back to trying ideas at the speed of thought rather than needing to spend a day writing utility functions for those tools and poring over tables to spot anomalies. Giving every researcher a priceless level of leverage is a tremendous social good.

Of course, we won't be able to tell the real effects, now, because every longitudinal study of researchers will now be corrupted by the ongoing evisceration of academic research in the current environment. Vibe-coding won't be a net creativity gain to a researcher affected by vibe-immigration-policy, vibe-grant-availability, and vibe-firings, for all of which the unpredictability is a punitive design goal.

Whether fear of LLMs taking jobs has contributed to a larger culture of fear and tribalism that has emboldened anti-intellectual movements worldwide, and what the attributable net effect on research and development will be... it's incredibly hard to quantify.

4 more replies

abalone9mo ago

I like to make a rough analogy with autonomous vehicles. There's a leveling system from 1 (old school cruise control) to 5 (full automation):

* We achieved Level 2 autonomy first, which requires you to fully supervise and retain control of the vehicle and expect mistakes at any moment. So kind of neat but also can get you in big trouble if you don't supervise properly. Some people like it, some people don't see it as a net gain given the oversight required.

^ This is where Tesla "FSD beta" is at, and probably where LLM codegen tools are at today.

* After many years we have achieved a degree of Level 4 autonomy on well-trained routes albeit with occasional human intervention. This is where Waymo is at in certain cities. Level 4 means autonomy within specific but broad circumstances like a given area and weather conditions. While it is still somewhat early days it looks like we can generally trust these to operate safely and ask for help when they are not confident. Humans are not out of the loop.[1]

^ This is probably what where we can expect codegen to grow after many more years of training and refinement in specific domains. I.e. a lot of what CloudFlare engineers did with their prompt engineering tweaking was of this nature. Think of them as the employees driving the training vehicles around San Francisco for the past decade. And similarly, "L4 codegen" needs to prioritize code safety which in part means ensuring humans can understand situations and step in to guide and debug when the tool gets stuck.

* We are still nowhere close to Level 5 "drive anywhere and under any conditions a human can." And IMHO it's not clear we ever will based purely on the technology and methods that got us to L4. There are other brain mechanisms at work that need to be modeled.

[1] https://www.cnbc.com/2023/11/06/cruise-confirms-robotaxis-re...

1 more reply

diggan9mo ago

> On the other hand, where I remain a skeptic is this constant banging-on that somehow this will translate into entirely new things - research, materials science, economies, inventions, etc

Does it even have to be able to do so? Just the ability to speed up exploration and validation based on what a human tells it to do is already enormously useful, depending on how much you can speed up those things, and how accurate it can be.

Too slow or too inaccurate and it'll have a strong slowdown factor. But once some threshold been reached, where it makes either of those things faster, I'd probably consider the whole thing "overall useful". Nut of course that isn't the full picture and ignoring all the tradeoffs is kind of cheating, there are more things to consider too as you mention.

I'm guessing we aren't quite over the threshold because it is still very young all things considered, although the ecosystem is already pretty big. I feel like generally things tend to grow beyond their usefulness initially, and we're at that stage right now, and people are shooting it all kind of directions to see what works or not.

2 more replies

svara9mo ago

> On the other hand, where I remain a skeptic is this constant banging-on that somehow this will translate into entirely new things

Really a lot of innovation, even at the very cutting edge, is about combining old things in new ways, and these are great productivity tools for this.

I've been "vibe coding" quite a bit recently, and it's been going great. I still end up reading all the code and fixing issues by hand occasionally, but it does remove a lot of the grunt work of looking up simple things and typing out obvious code.

It helps me spend more time designing and thinking about how things should work.

It's easily a 2-3x productivity boost versus the old fashioned way of doing things, possibly more when you take into account that I also end up implementing extra bells and whistles that I would otherwise have been too lazy to add, but that come almost for free with LLMs.

I don't think the stereotype of vibe coding, that is of coding without understanding what's going on, actually works though. I've seen the tools get stuck on issues they don't seem to be able to understand fully too often to believe that.

I'm not worried at all that LLMs are going to take software engineering jobs soon. They're really just making engineers more powerful, maybe like going from low level languages to high level compiled ones. I don't think anyone was worried about the efficiency gains from that destroying jobs either.

There's still a lot of domain knowledge that goes into using LLMs for coding effectively. I have some stories on this too but that'll be for another day...

TeMPOraL9mo ago

> where I remain a skeptic is this constant banging-on that somehow this will translate into entirely new things - research, materials science, economies, inventions, etc - because that requires learning “in real time” from information sources you’re literally generating in that moment, not decades of Stack Overflow responses without context.

Personally I hope this will materialize, at the very least because there's plenty of discoveries to be made by cross-correlating discoveries already made; the necessary information should be there, but reasoning capability (both that of the model and that added by orchestration) seems to be lacking. I'm not sure if pure chat is the best way to access it, either. We need better, more hands-on tools to explore the latent spaces of LLMs.

1 more reply

waynenilsen9mo ago

most engineering is glorified plumbing so as far as labour productivity goes, this should go a long way

1 more reply

lincoln20xx9mo ago

I have a non-zero number of industrial process patents under my belt. Allegedly, that means that I had ideas that had not previously been recorded. Once I wrote them down, paid some lawyers a bunch of money, and did some paperwork, I have the right to pay lawyers more money to make someone's life difficult if I think that someone ever tries to do something with the same thoughts, regardless of if they had those thoughts before, after, or independently of me.

In my opinion, there is a very valid argument that the vast majority of things that are patented are not "new" things, because everything builds on something else that came before it.

The things that are seen as "new" are not infrequently something where someone in field A sees something in field B, ponders it for a minute, and goes "hey, if we take that idea from field B, twist it clockwise a bit, and bolt it onto the other thing we already use, it would make our lives easier over in this nasty corner of field A." Congratulations! "New" idea, and the patent lawyers and finance wonks rejoice.

LLMs may not be able to truly "invent" "new" things, depending on where you place those particular goalposts.

However, even a year or two ago - well before Deep Research et al - they could be shockingly useful for drawing connections between disparate fields and applications. I was working through a "try to sort out the design space of a chemical process" type exercise, and decided to ask whichever GPT was available and free at the time about analogous applications and processes in various industries.

After a bit of prodding it made some suggestions that I definitely could have come up on my own if I had the requisite domain knowledge, but would almost certainly never have managed on my own. It also caused me to make a connection between a few things that I don't think I would have stumbled upon otherwise.

I checked with my chemist friends, and they said the resulting ideas were worth testing. After much iteration, one of the suggested compounds/approaches ended up generating the least bad result from that set of experiments.

I've previously sketched out a framework for using these tools (combined with other similar machine learning/AI/simulation tools) to massively improve the energy consumption of industrial chemical processes. It seems to me that that type of application is one where the LLM's environmental cost could be very much offset by the advances it provides.

The social cost is a completely different question though, and I think a very valid one. I also don't think our economic system is structured in such a way that the social costs will ever be mitigated.

Where am I going with this? I'm not sure.

Is there a "ghost in the machine"? I wouldn't place a bet on yes, at least not today. But I think that there is a fair bit of something there. Utility, if nothing else. They seem like a force multiplier to me, and I think that with proper guidance, that force multiplier could be applied to basic research, material science, economics, and "inventions".

Right now, it does seem that it takes someone with a lot of knowledge about the specific area, process, or task to get really good results out of LLMs.

Will that always be true? I don't know. I think there's at least one piece of the puzzle we don't have sorted out yet, and that the utility of the existing models/architectures will ride the s-curve up a bit longer but ultimately flatten out.

I'm also wrong a LOT, so I wouldn't bet a shiny nickel on that.

okthrowman2839mo ago

The sheer amount of copium in this thread is illuminating, it’s fascinating the lengths people will go to downplaying advancements like this when their egos/livelihoods are threatened - pretty natural though I suppose.

blibble9mo ago

> I thoughts LLMs were glorified Markov chain generators that didn't actually understand code and couldn't produce anything novel.

so he's been convinced by it shitting out yet another javascript oauth library?

this experiment proves nothing re: novelty

kentonv9mo ago

While implementing the OAuth standard itself is not novel, many of the specific design details in this implementation are. I gave it a rather unusual API spec, an unusual storage schema, and an unusual end-to-end encryption scheme. It was totally able to understand these requests, even reasoning about the motivation behind them, and implement what I wanted. That's what convinced me.

BTW, the vast majority of JS OAuth libraries are implementing the client side of OAuth. Provider-side implementations are relatively rare, as historically it's mostly only big-name services that ever get to the point of being a OAuth providers, and they tend to build it all in-house and not release code.

1 more reply

ayuhito9mo ago

Good thing most of my tasks don’t require novelty, just working code.

tonyhart79mo ago

same argument with me but only for claude

another models feels like shit to use, but claude is good

yapyap9mo ago

bit of a typo in the title

csmpltn9mo ago

There are tens (if not hundreds) of thousands of OAuth libraries out there. Probably millions of relevant codebases on GitHub, Bitbucket, etc. Possibly millions of questions on StackOverflow, Reddit, Quora. Vast amounts of documentation across many products and websites. RFCs. All kinds of forums. Wikipedias…

Why are you so surprised an LLM could regurgitate one back? I wouldn’t celebrate this example as a noteworthy achievement…

ThrowawayTestr9mo ago

Could you imagine typing the words "write an oauth library in typescript" into a computer and it actually working even 5 years ago? This is literally science fiction.

1 more reply

bigcat123456789mo ago

Kenton at it again!

arrty889mo ago

im using AI to build a cloudflare replica :)

apwell239mo ago

yes there are plenty of examples of ppl writing tic-tac-toe or a flying simulator with llm all over youtube. what does that prove exactly? oauth is as routine as it gets.

Phiality9mo ago

This is so cool

pier259mo ago

Did you really save time given that every line of code was "thoroughly reviewed"?

dboreham9mo ago

It turns out that "actually understanding" is a fictional concept. It's just the delusion some LLMs (yours and mine) has about what's going on inside itself.

keeda9mo ago

A number of comments point out that OAuth is a well known standard and wonder how AI would perform on less explored problem spaces. As it happens I have some experience there, which I wrote about in this long-ass post nobody ever read: https://www.linkedin.com/pulse/adventures-coding-ai-kunal-ka...

It’s now a year+ old and models have advanced radically, but most of the key points still hold, which I've summarized here. The post has way more details if you need. Many of these points have also been echoed by others like @simonw.

Background:

* The main project is specialized and "researchy" enough that there is no direct reference on the Internet. The core idea has been explored in academic literature, a couple of relevant proprietary products exist, but nobody is doing it the way I am.

* It has the advantage of being greenfield, but the drawback of being highly “prototype-y”, so some gnarly, hacky code and a ton of exploratory / one-off programs.

* Caveat: my usage of AI is actually very limited compared to power users (not even on agents yet!), and the true potential is likely far greater than what I've described.

Highlights:

* At least 30% and maybe > 50% of the code is AI-generated. Not only are autocompletes frequent, I do a lot of "chat-oriented" and interactive "pair programming", so precise attribution is hard. It has written large, decently complicated chunks of code.

* It does boilerplate extremely easily, but it also handles novel use-cases very well.

* It can refactor existing code decently well, but probably because I'ver worked to keep my code highly modular and functional, which greatly limits what needs to be in the context (which I often manage manually.) Errors for even pretty complicated requests are rare, especially with newer models.

Thoughts:

* AI has let me be productive – and even innovate! – despite having limited prior background in the domains involved. The vast majority of all innovation comes from combining and applying well-known concepts in new ways. My workflow is basically a "try an approach -> analyze results -> synthesize new approach" loop, which generates a lot of such unique combinations, and the AI handles those just fine. As @kentonv says in the comments, there is no doubt in my mind that these models “understand” code, as opposed to being stochastic parrots. Arguments about what constitutes "reasoning" are essentially philosophical at this point.

* While the technical ideas so far have come from me, AI now shows the potential to be inventive by itself. In a recent conversation ChatGPT reasoned out a novel algorithm and code for an atypical, vaguely-defined problem. (I could find no reference to either the problem or the solution online.) Unfortunately, it didn't work too well :-) I suspect, however, that if I go full agentic by giving it full access to the underlying data and letting it iterate, it might actually refine its idea until it works. The main hurdles right now are logistics and cost.

* It took me months to become productive with AI, having to find a workflow AND code structure that works well for me. I don’t think enough people have put in the effort to find out what works for them, and so you get these polarized discussions online. I implore everyone, find a sufficiently interesting personal project and spend a few weekends coding with AI. You owe it to yourself, because 1) it's free and 2)...

* Jobs are absolutely going to be impacted. Mostly entry-level and junior ones, but maybe even mid-level ones. Without AI, I would have needed a team of 3+ (including a domain expert) to do this work in the same time. All knowledge jobs rely on a mountain of donkey work, and the donkey is going the way of the dodo. The future will require people who uplevel themselves to the state of the art and push the envelope using these tools.

* How we create AI-capable senior professionals without junior apprentices is going to be a critical question for many industries. My preliminary take is that motivated apprentices should voluntarily eschew all AI use until they achieve a reasonable level of proficiency.

revskill9mo ago

So who's the experts here ?

gregorywegoryOP9mo ago

From the readme: This library (including the schema documentation) was largely written with the help of Claude, the AI model by Anthropic. Claude's output was thoroughly reviewed by Cloudflare engineers with careful attention paid to security and compliance with standards. Many improvements were made on the initial output, mostly again by prompting Claude (and reviewing the results). Check out the commit history to see how Claude was prompted and what code it produced.

"NOOOOOOOO!!!! You can't just use an LLM to write an auth library!"

"haha gpus go brrr"

In all seriousness, two months ago (January 2025), I (@kentonv) would have agreed. I was an AI skeptic. I thoughts LLMs were glorified Markov chain generators that didn't actually understand code and couldn't produce anything novel. I started this project on a lark, fully expecting the AI to produce terrible code for me to laugh at. And then, uh... the code actually looked pretty good. Not perfect, but I just told the AI to fix things, and it did. I was shocked.

To emphasize, this is not "vibe coded". Every line was thoroughly reviewed and cross-referenced with relevant RFCs, by security experts with previous experience with those RFCs. I was trying to validate my skepticism. I ended up proving myself wrong.

Again, please check out the commit history -- especially early commits -- to understand how this went.

JackSlateur9mo ago

Fascinating

It's like cooking with a toddler

The end result has a lower quality than your own potential, it takes more time to be producted, and it is harder too because you always need to supervise and correct what's done

aerhardt9mo ago

> it takes more time to be producted, and it is harder too because you always need to supervise and correct what's done

This is hogwash, the lead dev in charge of this has commented elsewhere that he's saved inordinate amounts of time. He mentioned that he gets about a day a week to code and produced this in under a month, which under those circumstances would've been impossible without LLM assistance.

2 more replies

chrisweekly9mo ago

mods: typo in title "CloudLflare"

mdaniel9mo ago

There is no "@" system here, you are welcome to email hn@ycombinator.com or hope that we're still within the edit window for the title

1 more reply

Squeeeez9mo ago

[flagged]

kentonv9mo ago

What are you talking about? The entire library is 2600 lines. There are no 2500-line methods.

1 more reply

sceptic1239mo ago

If you need to be an expert to use AI tools safely, what does that say about AI tools?

dkdcio9mo ago

Genuinely curious what your point is? Do you know how to use a ventillator? A A timing gun? A tonometer? A keratometer? Can you use all of those in a "production" setting safely without expertise?

2 more replies

j / k navigate · click thread line to collapse

529 comments

rienbdj9mo ago

The commits are revealing.

Look at this one:

> Ask Claude to remove the "backup" encryption key. Clearly it is still important to security-review Claude's code!

I don’t think a non-expert would even know what this means, let alone spot the issue and direct the model to fix it.

victorbjorklund9mo ago

Vinnl9mo ago

At least for me, I'm fairly sure that I'm better at not adding security flaws to my code (which I'm already not perfect at!) than I am at spotting them in code that I didn't write, unfortunately.

1 more reply

otabdeveloper49mo ago

> Still saves a lot of time vs typing everything from scratch

No it doesn't. Typing speed is never the bottleneck for an expert.

As an offline database of Google-tier knowledge, LLM's are useful. Though current LLM tech is half-baked, we need:

a) Cheap commodity hardware for running your own models locally. (And by "locally" I mean separate dedicated devices, not something that fights over your desktop's or laptop's resources.)

b) Standard bulletproof ways to fine-tune models on your own data. (Inference is already there mostly with things like llama.cpp, finetuning isn't.)

3 more replies

XCSme9mo ago

> Still saves a lot of time vs typing everything from scratch.

In my experience, it takes longer to debug/instruct the LLM than to write it from scratch.

1 more reply

zx80809mo ago

> An expert prompts it and checks the code. Still saves a lot of time vs typing everything from scratch.

It's a lie. The marketing one, to be specific, which makes it even worse.

1 more reply

noone_youknow9mo ago

827a9mo ago

2 more replies

0points9mo ago

I really don't agree with the idea that expert time would just be spent typing, and I'd be really surprised if that's the common sentiment around here.

An expert reasons, plans ahead, thinks and reasons a little bit more before even thinking about writing code.

If you are measuring productivity by lines of code per hour then you don't understand what being a dev is.

2 more replies

dismalaf9mo ago

> Still saves a lot of time vs typing everything from scratch

Probably very language specific. I use a lot of Ruby, typing things takes no time it's so terse. Instead I get to spend 95% of my time pondering my problems (or prompting the LLM)...

2 more replies

signa119mo ago

> ... Still saves a lot of time vs typing everything from scratch ...

how ? the prompts have still to be typed right ? and then the output examined in earnest.

3 more replies

blinded9mo ago

Sure! But over half the fun of coding is writing and learning.

i5heu9mo ago

Revealing against what?

If you look at the README it is completely revealed... so i would argue there is nothing to "reveal" in the first place.

> To emphasize, this is not "vibe coded". Every line was thoroughly reviewed and cross-referenced with relevant RFCs, by security experts with previous experience with those RFCs.

JW_000009mo ago

I think OP meant "revealing" as in "enlightening", not as "uncovering something that was hidden intentionally".

rienbdj9mo ago

> Revealing against what?

Revealing of what it is like working with an LLM in this way.

kortilla9mo ago

Revealing the types of critical mistakes LLMs make. In particular someone that didn’t already understand OAuth likely would not have caught this and ended up with a vulnerable system.

risyachka9mo ago

Thats the biggest issue I see. In most cases I don't use llm because DIYing it takes less time than prompting/waiting/checking every line.

3 more replies

throwaway20379mo ago

PeterStuer9mo ago

bootsmann9mo ago

There is also one quite early in the repo where the dev has to tell Claude to store only the hashes of secrets

kentonv9mo ago

Yeah I was disappointed in that one.

I hate to say, though, but I have reviewed a lot of human code in my time, and I've definitely caught many humans making similar-magnitude mistakes. :/

hn_throwaway_999mo ago

1 more reply

jjcm9mo ago

Most interesting aspect of this is it likely learned this pattern from human-written code!

1 more reply

jofzar9mo ago

I know I'm preaching to the masses here, but isn't this why PR are so important?

bananapub9mo ago

toofy9mo ago

> …removing expert humans from the loop is the deeply stupid thing the Tech Elite Who Want To Crush Their Own Workforce…

this is completely expected behavior by them. departments with well paid experts will be one of the first they’ll want to cut. in every field. experts cost money.

in every industry in every field, those will be jobs cut first. move fast and break things.

hn_throwaway_999mo ago

I think it's a critically important observation.

ActionHank9mo ago

But AIbros will be running around telling everyone that Claude invented OAuth for Cloudflare all on its own and then opensourced it.

october81409mo ago

It's a Jr Developer that you have to check all their code over. To some people that is useful. But you're still going to have to train Jr Developers so they can turn into Sr Developers.

PeterStuer9mo ago

I don't like the jr dev analogy. It neither has the same weaknesses nor the same strenghts.

It's more like the genious coworker that has an overassertive ego and sometimes shows up drunk, but if you know how to work with them and see past their flaws, can be a real asset.

1 more reply

Cthulhu_9mo ago

1 more reply

paxys9mo ago

The million dollar (perhaps literally) question is – could @kentonv have written this library quicker by himself without any AI help?

kentonv9mo ago

It took me a few days to build the library with AI.

I estimate it would have taken a few weeks, maybe months to write by hand.

That said, this is a pretty ideal use case: implementing a well-known standard on a well-known platform with a clear API spec.

srhtftw9mo ago

> It took me a few days to build the library with AI. ... > I estimate it would have taken a few weeks, maybe months to write by hand.

> That said, this is a pretty ideal use case: implementing a well-known standard on a well-known platform with a clear API spec.

Still, this allowed you to complete in a month what may have taken two. That's a remarkable feat considering the time and value of someone of your caliber.

3 more replies

michelsedgh9mo ago

The fascinating part is that each person is finding their own way of using these tools from kids to elders and everyone in between no matter what your background or language or whatever is

1 more reply

9dev9mo ago

I’m going to take a very close look at your code base :)

[0] https://github.com/colibri-hq/colibri/blob/next/packages/oau...

nipah9mo ago

Your estimation maybe right, but maybe also there is a point on why it is right: https://neilmadden.blog/2025/06/06/a-look-at-cloudflares-ai-...

Maybe because (and I'm quoting that article) it is still lacking in what it should have that you managed to accomplish this task in "few days" instead of "a few weeks, maybe months".

I think it's very hard to estimate those other aspects of the thing.

upstairs-war9mo ago

Thanks kentonv. I picked up where you left off, supported with oauth2.1 rfc, and integrated ms oauth to our internal mcp server. Cool to have Claude be business aware

jdbohrman9mo ago

https://github.com/jdbohrman-tech/hermetic-mls https://github.com/jdbohrman-tech/roselite

I think it's funny that Roselite caused a huge meltdown to the Veilid team simply because they have a weird adamancy to no AI assistance. They even called it "plagiarism"

graeme9mo ago

>I have found AI incredibly useful when I jump into other people's complex codebases, that I'm not familiar with. I now feel like I'm comfortable doing that

This makes sense. Are there codebases where you find this doesn't work as well, either from the codebase's min required context size or the code patterns not being in the training data?

1 more reply

aprilthird20219mo ago

Matches my experiences well. Making changes to large, complex codebases I know well? Teaching the AI to get up to speed with me takes too much time.

Code I know nothing about? AI is very helpful there

philipwhiuk9mo ago

> Though, people who don't know the codebase as well as I do have reported it helped them a lot.

My problem I guess is that maybe this is just Dunning-Kruger esq. When you don't know what you don't know you get the impression it's smart. When you do, you think it's rubbish.

Like when you see a media report on a subject you know about and you see it's inaccurate but then somehow still trust the media on a subject you're a non-expert on.

3 more replies

gokhan9mo ago

> Not software engineers being kicked out ... but rather experienced engineers using AI to generate bits of code and then meticulously reviewing and testing them.

But what if you only need 2 kentonv's instead of 20 at the end? Do you assume we'll find enough new tasks that will occupy the other 18? I think that's the question.

And the author is implementing a fairly technical project in this case. How about routine LoB app development?

thewebguyd9mo ago

> But what if you only need 2 kentonv's instead of 20 at the end? Do you assume we'll find enough new tasks that will occupy the other 18? I think that's the question.

This is likely where all this will end up. I have doubts that AI will replace all engineers, but I have no doubt in my mind that we'll certainly need a lot less engineers.

The role still exists, but the quantity needed is drastically reduced. The work that I do now by myself would have needed an entire team before AWS/Ansible/Terraform, etc.

5 more replies

paxys9mo ago

1 more reply

simonw9mo ago

I guess I have trouble emphasizing with "But what if you only need 2 kentonv's instead of 20 at the end?" because I'm an open source oriented developer.

What's open source for if not allowing 2 developers to achieve projects that previously would have taken 20?

bigstrat20039mo ago

> but rather experienced engineers using AI to generate bits of code and then meticulously testing and reviewing them.

motorest9mo ago

> My problem is that (in my experience anyways) this is slower than me just writing the code myself.

If you take the time to learn how to gently prompt a LLM into doing what you need, you'll find out it makes you far more productive.

JimDabell9mo ago

> My problem is that (in my experience anyways) this is slower than me just writing the code myself.

1 more reply

uludag9mo ago

I feel this is on point. So not only is there the time lost correcting and testing AI generated code, but there's also the mental model you build of the code when you write it yourself.

dkdcio9mo ago

> The million dollar (perhaps literally) question is – could @kentonv have written this library quicker by himself without any AI help?

motorest9mo ago

2 more replies

necovek9mo ago

When you are not introducing a new pattern in the code structure, it's mostly copy-paste and then edit.

But it's also extremely rare, so a pretty high bar to be able to benefit from tools like AI.

0xbadcafebee9mo ago

The million dollar question is, what are the unintended, unpredicted consequences of developing this way?

We won't know for sure what the consequences are for a while. But there will be consequences.

stackskipton9mo ago

>experienced engineers using AI to generate bits of code and then meticulously reviewing and testing them

And where are supposed to get experienced engineers if replaced all Jr Devs with AI? There is a ton of benefit from drudgery of writing classes even if seems like grunt work at the time.

motorest9mo ago

petersellers9mo ago

2 more replies

belter9mo ago

The million-dollar question is not whether you can review at the speed the model is coding. It is whether you can trust review alone to catch everything.

pton_xd9mo ago

> Human coders prevent many bugs by thinking during assembly.

1 more reply

chrisweekly9mo ago

THIS.

IMHO more rigorous test automation (including fuzzing and related techniques) is needed. Actually that holds whether AI is involved or not, but probably more so if it is.

Shorn9mo ago

And yet, doors still fall off airplanes without any AI in sight.

jstummbillig9mo ago

kypro9mo ago

Why would a human review the code in a few years when AI is far better than the average senior developer? Wouldn't that be as stupid as a human reviewing stockfish's moves in Chess?

danans9mo ago

> Not software engineers being kicked out and some business person pressing a few buttons to have a fully functional app (as is playing out in a lot of fantasies on LinkedIn & X)

hooverd9mo ago

AI is great for undifferentiated heavy lifting and surfacing knowledge, but by the time I've made all the decisions, I can just write the code that matters myself there.

tkiolp49mo ago

Why is speed important in this context? If the code is published one week/month later, would that affect what exactly? It’s open source.

kentonv9mo ago

As it happens, if this were released a month later, it would have been a huge loss for us.

This OAuth library is a core component of the Workers Remote MCP framework, which we managed to ship the day before the Remote MCP standard dropped.

And because we were there and ready for customers right at the beginning, a whole lot of people ended up building their MCP servers on us, including some big names:

https://blog.cloudflare.com/mcp-demo-day/

(Also if I had spent a month on this instead of a few days, that would be a month I wasn't spending on other things, and I have kind of a lot to do...)

c-linkage9mo ago

I very much appreciate the fact that the OP posted not just the code developed by AI but also posted the prompts.

Getting a chance to see the prompts shows I'm not actually that far off.

8-prime9mo ago

This is something that I have noticed as well. As soon as you venture into somewhat obscure fields, the output quality of LLMs drastically drops in my experience.

Side note, reverse engineering SAP ABAP sounds torturous.

1 more reply

theshrike799mo ago

The usual solution is a multi-tiered one.

First you use any LLM with a large context to write down the plan - preferably in a markdown file with checkboxes "- [ ] Task 1"

mtlynch9mo ago

>In all seriousness, two months ago (January 2025), I (@kentonv) would have agreed.

I'm confused by "I (@kentonv)" means here because kentonv is a different user.[0] Are you saying this is your alt? Or is this a typo/misunderstanding?

Edit: Figured out that most of your post is quoting the README. Consider using > and * characters to clarify.

[0] https://news.ycombinator.com/user?id=kentonv

kentonv9mo ago

He is quoting from the project readme. I wrote all this text.

mdaniel9mo ago

Thanks for weighing in here

2 more replies

diggan9mo ago

It's a literal copy-paste from the README, I think it was supposed to be quoted but parent messed it up somehow.

https://github.com/cloudflare/workers-oauth-provider/blob/fe...

dang9mo ago

(this comment was originally a reply to https://news.ycombinator.com/item?id=44159167, which summarized the readme in a confusing way.)

jes51999mo ago

I’ve been using Claude (via Cursor) on a greenfield project for the last couple months and my observation is:

1. I am much more productive/effective

2. It’s way more cognitively demanding than writing code the old-fashioned way

3. Even over this short timespan, the tools have improved significantly, amplifying both of the points above

pton_xd9mo ago

This mirrors my experience and those I've talked to.

LLM assisted coding is a way to get stuff done much faster, at a greatly increased mental cost / energy spent. Oddly enough.

piker9mo ago

Painful, but effective?

chii9mo ago

> Oddly enough.

Not to mention the need to also critically look at the generated code to ensure it's actual correctness (hopefully this can also be helped/offloaded by an ai in the future).

diggan9mo ago

> It’s way more cognitively demanding than writing code the old-fashioned way

How are you using it?

jes51999mo ago

1 more reply

SkyPuncher9mo ago

> 2. It’s way more cognitively demanding than writing code the old-fashioned way

Funnily, enough, I find the exact opposite. I feel so much relief that I don't have to waste time figuring out every, single detail. It frees me up to focus on architectural and higher level changes.

jes51999mo ago

I guess what I mean is, I found the details sort of “mindless” before. Code that I could write in my sleep. Now I only have to do the thinky parts

layer89mo ago

This means that you fully trust the LLM to get the details right.

infinitebattery9mo ago

From this commit: https://github.com/cloudflare/workers-oauth-provider/commit/...

===

"Fix Claude's bug manually. Claude had a bug in the previous commit. I prompted it multiple times to fix the bug but it kept doing the wrong thing.

So this change is manually written by a human.

I also extended the README to discuss the OAuth 2.1 spec problem."

===

This is super relatable to my experience trying to use these AI tools. They can get halfway there and then struggle immensely.

diggan9mo ago

> They can get halfway there and then struggle immensely.

Restart the conversation from scratch. As soon as you get something incorrect, begin from the beginning.

It seems to me like any mistake in a messages chain/conversation instantly poisons the output afterwards, even if you try to "correct" it.

int_19h9mo ago

3 more replies

dingnuts9mo ago

Also, each try costs money! You're pulling the lever on a god damned slot machine!

1 more reply

eikenberry9mo ago

I thought Claude still has a problem generating the same output for the same input? That you can't just rewind and rerun and get to the same point again.

2 more replies

viktorcode9mo ago

It can be done, but for my environment the sum of all prompts that I end up typing to get the right result ends up being longer than the actual code.

So now I'm using LLMs as crapshoot machines for generating ideas which I then implement manually

krooj9mo ago

nicce9mo ago

I am waiting for studies whether we have just an illusion of production or these actually save man hours in the long term in creation of production-level systems.

arendtio9mo ago

One way to mitigate the issue is to use tests or specifications and let the AI find a solution to the spec.

fxnn9mo ago

That's one thing where I love Golang. I just tell Aider to `/run go doc github.com/some/package`, and it includes the full signatures in the chat history.

It's true: often enough AI struggles to use libraries, and doesn't remember the usage correctly. Simply adding the go doc fixed that often.

mysterydip9mo ago

This to me is why I think these tools don't have actual understanding, and are instead producing emergent output from pooling an incomprehensibly large set of pattern-recognized data.

diggan9mo ago

> these tools don't have actual understanding, and are instead producing emergent output from pooling an incomprehensibly large set of pattern-recognized data

2 more replies

nisegami9mo ago

Same. But I personally find it a lot easier to do those bits at the end than to begin from a blank file/function, so it's a good match for me.

SkyPuncher9mo ago

Same here. Sometimes you just need time to stew in the problem/solution space.

LLMs let me be ultraproductive upfront then come in at the end to clean up when I have a full understanding.

jauntywundrkind9mo ago

> Again, please check out the commit history -- especially early commits -- to understand how this went.

Direct link to earliest page of history: https://github.com/cloudflare/workers-oauth-provider/commits...

aeneas_ory9mo ago

In my view this is an antipattern of AI usage and „roll your own crypto“ reborn.

simonw9mo ago

The most clearly Claude-written commits are on the first page, this link should get you to them: https://github.com/cloudflare/workers-oauth-provider/commits...

declan_roberts9mo ago

Getting a "Too Many Requests" error is kind of hilarious given the company involved.

rcastellotti9mo ago

same

_tqr39mo ago

They’ll probably get better, but for now I can safely say I’ve spent more time building and tweaking prompts than getting helpful results.

diggan9mo ago

Eventually you'll build up a somewhat reusable template you can use as a system prompt to guide it exactly how you want.

vaidhy9mo ago

I think the discussions are also missing another key element. The time in takes to read someone else code is way more mentally tiring.

I can see myself using LLMs for short snippets rather than start something top down.

qsort9mo ago

Luker889mo ago

Hello Cloudflare, impressive result, I did not think things were this advanced.

Still, legal question where I'd like to be wrong: AFAIK (and IANAL) if I use AI to generate images, I can't attach copyright to it.

But the code here is clearly copyrighted to you.

Is that possible because you manually modify the code?

How does it work in examples like this one where you try to have close to all code generated by AI?

kentonv9mo ago

I am also not a lawyer, but I believe the law here is yet to be fully settled. Here in the US, there have been lower-court rulings but surely it will go to the supreme court.

There are parts of the library that I did write by hand, which are presumably copyright Cloudflare either way. As for whether the AI-generated parts are, I guess we'll see.

But given the thing is MIT-licensed, it doesn't seem like it matters much in this case?

dvrp9mo ago

Did you check the latest documents from copyright.gov? They’re interesting exactly because of what you’re saying

Luker889mo ago

I did not, especially seeing as I am not from the USA, so I'd like to have the point of view of a multinational company

Things sounds confusing to me, which is why I'm asking

fastball9mo ago

I believe you are wrong about AI-generated images as well.

kentonv9mo ago

I'm the author of this library! Or uhhh... the AI prompter, I guess...

I'm also the lead engineer and initial creator of the Cloudflare Workers platform.

--------------

https://blog.cloudflare.com/remote-model-context-protocol-se...

https://developers.cloudflare.com/agents/guides/remote-mcp-s...

--------------

OK, personal commentary.

As mentioned in the readme, I was a huge AI skeptic until this project. This changed my mind.

davidwu9mo ago

Thanks for so meticulously documenting the prompts you used and whether or not a commit was done manually or via the AI.

rethab9mo ago

Fancy! Why are the first twenty commits or so created in the same minute though? Surely you can’t be that fast if you need to prompt for each commit

kentonv9mo ago

1 more reply

tveita9mo ago

Some examples of prompt exchanges that seem representative:

https://claude-workerd-transcript.pages.dev/oauth-provider-t... ("Total cost: $6.45")!

https://github.com/cloudflare/workers-oauth-provider/commit/...

The first transcript includes the cost, would be interesting to know the ballpark of total Claude spend on this library so far.

antirez: https://antirez.com/news/144#:~:text=Yesterday%20I%20needed%...

tptacek: https://news.ycombinator.com/item?id=44163292

kentonv9mo ago

I didn't keep extract track but I'd estimate the total cost of Claude credits to build this library was somewhere around $50, which is pretty negligible compared to the time saved.

multimoon9mo ago

I think this reinforces that “vibecoding” is silly and won’t survive. It still needed immensely skilled programmers to work with it and check its output, and fix several bugs it refused to fix.

NitpickLawyer9mo ago

This isn't vibecoding. This is LLM-assisted coding.

subarctic9mo ago

Izkata9mo ago

No. Vibe coding is never even looking at the code and using the LLM as the only interface.

mmaunder9mo ago

weinzierl9mo ago

"I thoughts LLMs were glorified Markov chain generators"

"the code actually looked pretty good. Not perfect, but I just told the AI to fix things, and it did. I was shocked."

These two views are by no means mutually exclusive. I find LLMs extremely useful and still believe they are glorified Markov generators.

The take away should be that that is all you need and humans likely are nothing more than that.

kentonv9mo ago

I suppose it's all a continuum and we can each have different opinions on what the threshold for "glorified markov generator" is.

palata9mo ago

> It really did "understand" the meaning of the code by any definition that makes sense to me.

I find it dangerous to say it "understands". People are fast to say it "is sentient by any definition that makes sense to them".

Also, would we say that a compiler "understands" the meaning of the code?

smallnix9mo ago

> humans likely are nothing more than that

Relevant post: https://news.ycombinator.com/item?id=44089156

bufferoverflow9mo ago

> I find LLMs extremely useful and still believe they are glorified Markov generators.

Then you should be able to make a markov chain generator without deep neural nets, and it should be on the same level of performance as current LLMs.

But we both know you can't.

ronsor9mo ago

You can, but it will require far more memory than any computer has.

Flemlo9mo ago

The way the input doesn't match the output should imply that it's not just statistics.

As soon as compression happens, optimization happens which can lead to rules/learning of principles which got feed by statistics.

immibis9mo ago

That's "just" more statistics though.

1 more reply

hattmall9mo ago

I guess for me the questions is, at what point do you feel it would be reasonable to this without the experts involved in your case?

As an edit, after reading some of the prompts, what is the likelihood that a non-expert could even come up with those prompts?

The really really interesting thing would be if an AI could actually generate the prompts.

kentonv9mo ago

(I'm the author of this library -- or, the guy who prompted the AI at least.)

diggan9mo ago

> I think we are in desperate need of safe vibe coding environments where code runs in a sandbox with security policies that make it impossible to screw up.

freedomben9mo ago

What tools did you use for the vibe coding an Android app? And was it able to do the UI stuff too?

1 more reply

aerhardt9mo ago

Bro, you're still an engineer at Cloudflare!

rangerelf9mo ago

> I guess for me the questions is, at what point do you feel it would be reasonable to this without the experts involved in your case?

To me that's the danger of AI, not it's purported intelligence, but our manifested greed.

1 more reply

dkdcio9mo ago

nisegami9mo ago

GP is just quoting the readme, they aren't the author.

My 2 cents:

>I guess for me the questions is, at what point do you feel it would be reasonable to this without the experts involved in your case?

>The really really interesting thing would be if an AI could actually generate the prompts.

freedomben9mo ago

On a meta-note, it's (seriously) kind of refreshing to see that other people make this same typo when trying to type Cloudflare. I also often write CLoudflare, Cloudlfare, and Cloudfare:

> Cloudlflare builds OAuth with Claude and publishes all the prompts

gcr9mo ago

This library has some pretty bad security bugs. For example, the author forgot to check that redirect_uri — matches one of the URLs listed during client registration.

The CVE is uncharacteristically scornful: https://nvd.nist.gov/vuln/detail/cve-2025-4143

I’m glad this was patched, but it is a bit worrying for something “not vibe coded” tbh

eGQjxkKF6fif9mo ago

Looking at all of these arguments and viewpoints really is something to witness.

Vibe programming has allowed me to break through depression and edit and code the way I know how to do; it is a helpful and very meaningful to me. I hope that, it can be meaningful for others.

Not a single comment in here is about people traumatized, broken, depressed, or have a legitimate reason for vibe coding.

Good job, Cloudflare.

nop_slide9mo ago

It literally says in the post it’s not “vibe coded”. That has a very specific meaning of not reviewing the code at all and accepting everything.

lapcat9mo ago

They want to turn artists and craftsmen into assembly line supervisors.

chii9mo ago

> They want to turn artists and craftsmen into assembly line supervisors.

the same was uttered by blacksmiths and other craftsman who has been displaced by technology. Yet they are mercilessly crushed.

1 more reply

mattgreenrocks9mo ago

The denial of agency to devs via prescriptive LLM edicts will only end badly.

1 more reply

zeroq9mo ago

Holy cow!

My latest try with Gemini went like this:

  - Write me a simple todo app on CloudFlare with auth0 authentication.
  - Let's proceed with a simple todo app on CloudFlare. We start by importing the @auth0-cloudflare and...
  - Does that @auth0-cloudflare actually exists?
  - Oh, it doesn't. I can give you a walkthrough on how to set up an account on auth0. Would you like me to?
  - Yes, please.
  - Here. I'm going to write the walkthrough in a document... (proceed to create an empty document)
  - That seems to be an empty document.
  - Oh, my bad. I'll produce it once more. (proceed to create another empty document)
  - Seems like you're md parsing library is broken, can you write it in chat instead?
  - Yes... (Your Gemini trial has expired. Would you like to pay $100 to continue?)

Sure, it was a short hike before my trial expired, and kentov himself admited it took him couple of days to put it together, but... holy cow.

thih99mo ago

Congrats and thanks for sharing, both the code and the story.

Which Claude plan did you use? Was it enough or did you feel limited by the quotas?

kentonv9mo ago

This was mostly Claude Code, which runs on API credits. I think I spent a two-digit number of dollars. The model was Sonnet 3.7 (this was all a couple months ago, before Claude 4).

alanfranz9mo ago

Carefully reviewed greenfield project; I don’t think this is astonishing, and I very much love they recorded the prompts.

Question is: will this work for non-greenfield projects as well? Usually 95% of work in a lifetime is not greenfield.

Or will we throw away more and more code as we go, since AI will rewrite it, and we’ll probably introduce subtle bugs as we go?

DaiPlusPlus9mo ago

> Question is: will this work for non-greenfield projects as well?

> since AI will rewrite it

That depends if you believe in documentation-first or program-first definitions of a specification.

ookblah9mo ago

skydhash9mo ago

2 more replies

nipah9mo ago

> This will only get better with time.

Prove it.

skybrian9mo ago

Looking at the commit history, there’s a fair bit of manual intervention to fix bugs and remove unused code.

wooque9mo ago

Not surprised, this is perfect task for AI, boilerplaty code that implements something that is implemented 100 times. And it's small project, 1200 lines of pure code.

I'm surprised I took them more than 2 days to do that with AI.

dang9mo ago

We changed the URL from https://github.com/cloudflare/workers-oauth-provider/commits... to the project page.

abroadwin9mo ago

Oh hey, looks like it's mostly Kenton Varda, who you may recognize from his LAN party house: https://news.ycombinator.com/item?id=42156977

davidjfelix9mo ago

Or Cap'n'Proto, Protobuf, Cloudflare workers, Cloudflare Durable Objects. The LAN house is cool too.

EtienneK9mo ago

> This is a TypeScript library that implements the provider side of the OAuth 2.1 protocol with PKCE support.

What is the "provider" side? OAuth 2.1 has no definition of a "provider". Is this for Clients? Resource Servers? Authorization Server?

Quickly skimming the rest of the README it seems this is for creating a mix of a Client and a Resource Server, but I could be mistaken.

> To emphasize, this is not "vibe coded". Every line was thoroughly reviewed and cross-referenced with relevant RFCs, by security experts with previous experience with those RFCs

Experience with the RFCs but have not been able to correctly name it.

kentonv9mo ago

> Experience with the RFCs but have not been able to correctly name it.

1 more reply

DaiPlusPlus9mo ago

> OAuth 2.1 has no definition of a "provider"

cyberax9mo ago

And OAuth is not particularly hard to implement, I did that a bunch of times (for server and the client side). It's well-specified and so it fits well for LLMs.

So it's probably more like 2x acceleration for such code? Not bad at all!

nipah9mo ago

globular-toast9mo ago

Should I be impressed? Oauth already exists and there are countless libraries implementing it. Is it impressive that an LLM can regurgitate yet another one?

jplehmann9mo ago

Fascinating share and discussion. I read many of the comments, but extracted key take-aways using GPT here: https://chatgpt.com/share/6840c9e8-a498-8005-971b-3b91e09b9d... for anyone interested.

scherlock9mo ago

Is it really good form in TypeScript to make all functions async, even when functions don't use await? like this, https://github.com/cloudflare/workers-oauth-provider/blob/fe...

kentonv9mo ago

env.OAUTH_KV.get(...) is reading from Workers KV storage. It returns a promise.

ccorcos9mo ago

No. It’s possible env[?].get returns a promise though

bsder9mo ago

> Claude's output was thoroughly reviewed by Cloudflare engineers with careful attention paid to security and compliance with standards.

So, for those of us who are not OAuth experts, don't have a team of security engineers on call, and are likely to fall into all the security and compliance traps, how does this help?

I don't need AI to write my shitty code. I need AI to review and correct my shitty code.

baq9mo ago

You want hammers to review your woodwork or your hammering technique, too?

…anyway, Gemini pro is a quite good reviewer if you are specific about what you need reviewed and provide relevant dependencies in the context.

IncreasePosts9mo ago

Has this source been compared with other oauth libraries, to see if it is just license-violating some other open source code it was trained on?

varispeed9mo ago

_pdp_9mo ago

It is a single file with 2630 locs and it is a straightforward problem. 1/3 of the code is just interface definitions and comments.

DJBunnies9mo ago

When Claude can do something new, then I think it will be impressive.

Otherwise it’s just piecing together existing examples.

alienbaby9mo ago

Except, this time it wasn't. It got most things right first time, and fixed things I asked it to.

I was pleasantly surprised.

topspin9mo ago

Been here many times:

    This time Claude fixed the problem, but:
    - It also re-ordered some declarations, even though I told it not to. AFAICT they aren't changed, just reordered, and it also added some doc comments.
    - It fixed an unrelated bug, which is that `getClient()` was marked `private` in `OAuthProvider` but was being called from inside `OAuthHelpers`. I hadn't noticed this before, but it was indeed a bug.

dboreham9mo ago

Tbf I've seen human developers do this and similar irritating things many times.

ZiiS9mo ago

Shouldn't they really have asked it to read https://developers.cloudflare.com/workers/examples/protect-a...

kentonv9mo ago

1 more reply

kiitos9mo ago

Is this not... embarrassing? to the engineers who submit these commits?

It seems that way to me...

Certainly if I were on a hiring panel for anyone who had this kind of stuff in their Google search results, it would be a hard-no from me -- but what do i know?

catigula9mo ago

When I want to spend a dollar or two, it's much faster to just instruct Claude on how to write my code and prompt/correct it than to write it myself.

It feels probably similarly from going from dumb or semi-dumb text editor to an IDE.

zackify9mo ago

ab_testing9mo ago

animex9mo ago

https://github.com/cloudflare/workers-oauth-provider/commits...

Start at the bottom...they are in the commit messages, or sometimes the .md file

paulddraper9mo ago

> To emphasize, this is not "vibe coded". Every line was thoroughly reviewed and cross-referenced with relevant RFCs, by security experts with previous experience with those RFCs.

vjerancrnjak9mo ago

I like how it is just 1 file.

Wonder how well incremental editing works with such a big file. I keep pushing for 1 file implementations, yet people split it up into bazillion files because it works better with AI.

weird-eye-issue9mo ago

mehdibl9mo ago

That's great.

But Claude don't allow yet to add APPS in their backend.

Mainly only closed beta for integration.

How you can configure an app to leverage correctly Oauth and have your own app secret ID/ Client ID!

teaearlgraycold9mo ago

> Every line was thoroughly reviewed and cross-referenced with relevant RFCs, by security experts with previous experience with those RFCs.

This sounds like coding but slower

kentonv9mo ago

I would say it ended up being much faster than had I written it by hand. It took a few days to produce this library -- it would almost certainly have taken me weeks to write it myself.

3 more replies

TeMPOraL9mo ago

The point was validating a hypothesis. That is the validation part.

1 more reply

jonplackett9mo ago

This doesn’t seem like much of a surprise that it’s possible - if you are a security expert, you can make LLMs write secure code.

Bluestein9mo ago

From the docs:

> "NOOOOOOOO!!!! You can't just use an LLM to write an auth library!"

> "haha gpus go brrr"

jbeus9mo ago

Getting rate limited…Cloudflare, do you you think you can help with caching and load balancing for github?

rienbdj9mo ago

Why not use an existing OAuth library?

throwaway3141559mo ago

"built OAuth" here means they "implemented OAuth for CloudFlare workers" FYI.

hamdouni9mo ago

Claude is not mentioned in the 'contributors' section.

jwally9mo ago

caycep9mo ago

tbh I would find it annoying to have to go audit someone else (i.e. an LLM's) code...

Also, maybe the humbling question is, maybe we humans aren't so exceptional if 90% of the sum of human knowledge can be predicted by next-word-prediction

horacemorace9mo ago

I did the same thing a few months ago with 4o. This stuff works fine if done with care.

unshavedyak9mo ago

Yup. I'm more skeptic than pro-AI these days, but nonetheless i'm still trying to use AI in my workflows.

With that said i still think we're in the infancy of actual tooling around this stuff though. I'm always interested to see novel UXs on this front.

qsort9mo ago

Probably unrelated to the broader discussion, but I don't think the "skeptic vs pro-AI" distinction even makes that much sense.

For example, I usually come off as being relatively skeptic within the HN crowd, but I'm actually pushing for more usage at work. This kind of "opinion arbitrage" is common with new technologies.

4 more replies

thewebguyd9mo ago

> I don't actually enjoy it, i generally find it difficult to use as i have more trouble explaining what i want than actually just doing it.

cheema339mo ago

> I don't actually enjoy it, i generally find it difficult to use as i have more trouble explaining what i want than actually just doing it.

If you stay with it, you will reach an inflection point soon enough and will actually start enjoying it.

helsinki9mo ago

I don’t see any prompts?

sensanaty9mo ago

The expanded commit messages have the prompts

stego-tech9mo ago

btown9mo ago

4 more replies

abalone9mo ago

I like to make a rough analogy with autonomous vehicles. There's a leveling system from 1 (old school cruise control) to 5 (full automation):

^ This is where Tesla "FSD beta" is at, and probably where LLM codegen tools are at today.

[1] https://www.cnbc.com/2023/11/06/cruise-confirms-robotaxis-re...

1 more reply

diggan9mo ago

> On the other hand, where I remain a skeptic is this constant banging-on that somehow this will translate into entirely new things - research, materials science, economies, inventions, etc

2 more replies

svara9mo ago

> On the other hand, where I remain a skeptic is this constant banging-on that somehow this will translate into entirely new things

Really a lot of innovation, even at the very cutting edge, is about combining old things in new ways, and these are great productivity tools for this.

It helps me spend more time designing and thinking about how things should work.

There's still a lot of domain knowledge that goes into using LLMs for coding effectively. I have some stories on this too but that'll be for another day...

TeMPOraL9mo ago

1 more reply

waynenilsen9mo ago

most engineering is glorified plumbing so as far as labour productivity goes, this should go a long way

1 more reply

lincoln20xx9mo ago

In my opinion, there is a very valid argument that the vast majority of things that are patented are not "new" things, because everything builds on something else that came before it.

LLMs may not be able to truly "invent" "new" things, depending on where you place those particular goalposts.

Where am I going with this? I'm not sure.

Right now, it does seem that it takes someone with a lot of knowledge about the specific area, process, or task to get really good results out of LLMs.

I'm also wrong a LOT, so I wouldn't bet a shiny nickel on that.

okthrowman2839mo ago

blibble9mo ago

> I thoughts LLMs were glorified Markov chain generators that didn't actually understand code and couldn't produce anything novel.

so he's been convinced by it shitting out yet another javascript oauth library?

this experiment proves nothing re: novelty

kentonv9mo ago

1 more reply

ayuhito9mo ago

Good thing most of my tasks don’t require novelty, just working code.

tonyhart79mo ago

same argument with me but only for claude

another models feels like shit to use, but claude is good

yapyap9mo ago

bit of a typo in the title

csmpltn9mo ago

Why are you so surprised an LLM could regurgitate one back? I wouldn’t celebrate this example as a noteworthy achievement…

ThrowawayTestr9mo ago

Could you imagine typing the words "write an oauth library in typescript" into a computer and it actually working even 5 years ago? This is literally science fiction.

1 more reply

bigcat123456789mo ago

Kenton at it again!

arrty889mo ago

im using AI to build a cloudflare replica :)

apwell239mo ago

yes there are plenty of examples of ppl writing tic-tac-toe or a flying simulator with llm all over youtube. what does that prove exactly? oauth is as routine as it gets.

Phiality9mo ago

This is so cool

pier259mo ago

Did you really save time given that every line of code was "thoroughly reviewed"?

dboreham9mo ago

It turns out that "actually understanding" is a fictional concept. It's just the delusion some LLMs (yours and mine) has about what's going on inside itself.

keeda9mo ago

Background:

* It has the advantage of being greenfield, but the drawback of being highly “prototype-y”, so some gnarly, hacky code and a ton of exploratory / one-off programs.

* Caveat: my usage of AI is actually very limited compared to power users (not even on agents yet!), and the true potential is likely far greater than what I've described.

Highlights:

* It does boilerplate extremely easily, but it also handles novel use-cases very well.

Thoughts:

revskill9mo ago

So who's the experts here ?

gregorywegoryOP9mo ago

"NOOOOOOOO!!!! You can't just use an LLM to write an auth library!"

"haha gpus go brrr"

Again, please check out the commit history -- especially early commits -- to understand how this went.

JackSlateur9mo ago

Fascinating

It's like cooking with a toddler

The end result has a lower quality than your own potential, it takes more time to be producted, and it is harder too because you always need to supervise and correct what's done

aerhardt9mo ago

> it takes more time to be producted, and it is harder too because you always need to supervise and correct what's done

2 more replies

chrisweekly9mo ago

mods: typo in title "CloudLflare"

mdaniel9mo ago

There is no "@" system here, you are welcome to email hn@ycombinator.com or hope that we're still within the edit window for the title

1 more reply

Squeeeez9mo ago

[flagged]

kentonv9mo ago

What are you talking about? The entire library is 2600 lines. There are no 2500-line methods.

1 more reply

sceptic1239mo ago

If you need to be an expert to use AI tools safely, what does that say about AI tools?

dkdcio9mo ago

Genuinely curious what your point is? Do you know how to use a ventillator? A A timing gun? A tonometer? A keratometer? Can you use all of those in a "production" setting safely without expertise?

2 more replies

j / k navigate · click thread line to collapse