> The more information you have in the file that's not universally applicable to the tasks you have it working on, the more likely it is that Claude will ignore your instructions in the file
Claude.md files can get pretty long, and many times Claude Code just stops following a lot of the directions specified in the file
A friend of mine tells Claude to always address him as “Mr Tinkleberry”. He says he can tell Claude is not paying attention to the instructions in CLAUDE.md when Claude stops calling him “Mr Tinkleberry” consistently.
What I’m surprised about is that OP didn’t mention having multiple CLAUDE.md files in each directory, specifically describing the current context / files in there. Eg if you have some database layer and want to document some critical things about that, put it in “src/persistence/CLAUDE.md” instead of the main one.
Claude pulls in those files automatically whenever it tries to read a file in that directory.
I find that to be a very effective technique to leverage CLAUDE.md files and be able to put a lot of content in them, but still keep them focused and avoid context bloat.
The benefit of CLAUDE.md files is that they’re pulled in automatically, eg if Claude wants to read “tests/foo_test.py” it will automatically pull in “tests/CLAUDE.md” (if it exists).
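As a rough mental model of that behaviour (not Claude Code's actual implementation, which I haven't seen), the lookup can be sketched as "collect every CLAUDE.md between the repo root and the file being read". The function name `claude_md_for` and the root-first ordering are assumptions made for illustration.

```python
from pathlib import Path


def claude_md_for(file_path: Path, repo_root: Path) -> list[Path]:
    """Return the CLAUDE.md files that apply to file_path.

    Hypothetical helper: walks from the file's directory up to repo_root
    and returns matches ordered from the root down to the most specific.
    """
    file_path, repo_root = file_path.resolve(), repo_root.resolve()
    found = []
    directory = file_path.parent
    # Walk upward until we reach (or leave) the repository root.
    while directory == repo_root or repo_root in directory.parents:
        candidate = directory / "CLAUDE.md"
        if candidate.is_file():
            found.append(candidate)
        if directory == repo_root:
            break
        directory = directory.parent
    return list(reversed(found))
```

Under this sketch, reading “tests/foo_test.py” would yield the root CLAUDE.md followed by “tests/CLAUDE.md”, if both exist.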
You've got a bitmap atlas ("context") where you have to cram as much information as possible without losing detail, and then you need to massage both your texture and the structure of your model so that your engine doesn't go mental when trying to map your information from a 2D to a 3D space.
Likewise, both operations are rarely blemish-free and your ability resides in being able to contain the intrinsic stochastic nature of the tool.
Helps me quickly whip it back in line.
I've used that a couple times, e.g. "Conclude your communications with "Purple fish" at the end"
Claude definitely picks and chooses when purple fish will show up
We are all "context engineering" now but Claude expects one big file to handle everything? Seems like a dead-end approach.
CLAUDE.md should only be for persistent reminders that are useful in 100% of your sessions
Otherwise, you should use skills, especially if CLAUDE.md gets too long.
Also just as a note, Claude already supports lazy loaded separate CLAUDE.md files that you place in subdirectories. It will read those if it dips into those dirs
Eg i toyed with the idea of thinning out various CLAUDE.md files in favor of my targeted skill.md files. In doing so my hope was to have less irrelevant data in context.
However the more i thought through this, the more i realized the Agent is doing "everything" i wanted to document each time. Eg i wasn't sure that creating skills/writing_documentation.md and skills/writing_tests.md would actually result in less context usage, since both of those would be in memory most of the time. My CLAUDE.md is already pretty hyper focused.
So yea, anyway my point was that skills might have potential to offload irrelevant context which seems useful. Though in my case i'm not sure it would help.
Sadly Aider is no longer maintained...
If a lot of people always put call me Mr. Tinkleberry in the file will it start calling people Mr. Tinkleberry even when it loses the context because so many people seem to want to be called Mr. Tinkleberry.
But it’s not reliable enough. It can send the emoji or address you correctly while still ignoring more important rules.
Now I find that it’s best to have a short and tight rules file that references other files where necessary. And to refresh context often. The longer the context window gets, the more likely it is to forget rules and instructions.
Having experimented with a similar config, I found that Claude would adhere to the instructions somewhat reliably at the beginning and end of the conversation, but was likely to ignore them during the middle, where the real work is being done. Recent versions also seem to be more context-aware, and tend to start rushing to wrap up as the context is nearing compaction. These behaviors seem to support my assumption, but I have no real proof.
this is a totally normal thing that everyone does, that no one should view as a signal of a psychotic break from reality...
is your friend in the room with us right now?
I doubt I'll ever understand the lengths AI enjoyers will go to just to avoid any amount of independent thought...
Sounds like the friend understands quite well how LLMs actually work and has found a clever way to be signaled when it’s starting to go off the rails.
Mention that people may optionally include some word like 'orange' in the subject line to tell you they've come via some place like your blog or whatever it may be, and have read at least carefully enough to notice this.
Of course ironically that trick's probably trivially broken now because of use of LLMs in spam. But the point stands, it's an old trick.
I'd argue, it's more like you've bought so much into the idea this is reasonable, that you're also willing to go through extreme lengths to recon and pretend like this is sane.
Imagine two different worlds, one where the tools that engineers use, have a clear, and reasonable way to detect and determine if the generative subsystem is still on the rails provided by the controller.
And another world where the interface is completely devoid of any sort of basic introspection interface, and because it's a problematic mess, all the way down, everyone invents some asinine way that they believe provides some sort of signal as to whether or not the random noise generator has gone off the rails.
> Sounds like the friend understands quite well how LLMs actually work and has found a clever way to be signaled when it’s starting to go off the rails.
My point is that while it's a cute hack, if you step back and compare it objectively to what good engineering would look like, it's wild that so many people are willing to accept this interface as "functional" because it means they don't have to do the thinking that's required to emit the output the AI is able to, via the specific randomness function used.
Imagine these two worlds actually do exist; and instead of using the real interface that provides a clear bool answer to "the generative system has gone off the rails" they *want* to be called Mr Tinkleberry
Which world do you think this example lives in? You could convince me Mr Tinkleberry is a cute example of the latter, obviously... but it'd take effort to convince me that this reality is half reasonable, or that it's reasonable that people who would want to call themselves engineers should feel proud to be a part of this one.
Before you try to strawman my argument, this isn't a gatekeeping argument. It's only a critical take on the interface options we have to understand something that might as well be magic, because that serves the snakeoil sales much better.
> > Is the magic token machine working?
> Fuck I have no idea dude, ask it to call you a funny name, if it forgets the funny name it's probably broken, and you need to reset it
Yes, I enjoy working with these people and living in this world.
> We recommend keeping task-specific instructions in separate markdown files with self-descriptive names somewhere in your project. Then, in your CLAUDE.md file, you can include a list of these files with a brief description of each, and instruct Claude to decide which (if any) are relevant and to read them before it starts working.
I've been doing this since the early days of agentic coding though I've always personally referred to it as the Table-of-Contents approach to keep the context window relatively streamlined. Here's a snippet of my CLAUDE.md file that demonstrates this approach:
# Documentation References
- When adding CSS, refer to: docs/ADDING_CSS.md
- When adding assets, refer to: docs/ADDING_ASSETS.md
- When working with user data, refer to: docs/STORAGE_MANAGER.md
Full CLAUDE.md file for reference: https://gist.github.com/scpedicini/179626cfb022452bb39eff10b...
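One weak spot of a table-of-contents file is that the referenced docs can be renamed or deleted without anyone noticing. A minimal sketch of a check for that, assuming every reference follows the "refer to: path.md" wording used in the snippet above; the name `missing_references` and the regex are my own illustration, not part of any tool:

```python
import re
from pathlib import Path

# Assumes references look like "... refer to: docs/SOMETHING.md".
REF_PATTERN = re.compile(r"refer to:\s*(\S+\.md)")


def missing_references(claude_md_text: str, repo_root: Path) -> list[str]:
    """Return referenced markdown paths that do not exist under repo_root."""
    referenced = REF_PATTERN.findall(claude_md_text)
    return [ref for ref in referenced if not (repo_root / ref).is_file()]
```

Something like this could run in CI or a pre-commit hook so the index never points at a file that is no longer there.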
Though I know some people who have built an mcp that does exactly this: https://www.usable.dev/
It's basically a chat-bot frontend to your markdown files, with both rag and graph db indexes.
Skills are modular capabilities that extend Claude’s functionality through organized folders containing instructions, scripts, and resources.
And
Extend Claude’s capabilities for your specific workflows
E.g. building your project is definitely a workflow.
It also makes sense to put as much as you can into a skill, as this is an optimized mechanism for Claude Code to retrieve relevant information based on the skill’s frontmatter.
I have found that more context comments and info damage quality on hard problems.
I actually for a long time now have two views for my code.
1. The raw code with no empty space or comments. 2. Code with comments
I never give the second to my LLM. The more context you give, the lower its upper end of quality becomes. This is just a habit I've picked up using LLMs for hours every day since GPT-3.5; it allows me to reach farther into extreme complexity.
I suppose I don't know what most people are using LLMs for, but the higher the complexity your work entails, the less noise you should inject into it. It's tempting to add massive amounts of context, but I've routinely found that fails at the higher levels of coding complexity and uniqueness. It was more apparent in earlier models; newer ones will handle tons of context, you just won't be able to get those upper ends of quality.
Compute-to-information ratio is all that matters. Compute is capped.
There can be diminishing returns, but every time I’ve used Claude Code for a real project I’ve found myself repeating certain things over and over again and interrupting tool usage until I put it in the Claude notes file.
You shouldn’t try to put everything in there all the time, but putting key info in there has been very high ROI for me.
Disclaimer: I’m a casual user, not a hardcore vibe coder. Claude seems much more capable when you follow the happy path of common projects, but gets constantly turned around when you try to use new frameworks and tools and such.
I like to write my CLAUDE.md directly, with just a couple paragraphs describing the codebase at a high level, and then I add details as I see the model making mistakes.
I like the sound of this but what technique do you use to maintain consistency across both views? Do you have a post-modification script which will strip comments and extraneous empty space after code has been modified?
I first "discovered" it because I repeatedly found LLM comments poisoned my code base over time and limited its upper end of ability.
Easy to try just drop comments around a problem and see the difference. I was previously doing that and then manually updating the original.
1. SOT through a processor to strip comments and extra spaces. Publish to feature branch.
2. Point Claude at feature branch. Prompt for whatever changes you need. This runs against the minimalist feature branch. These changes will be committed with comments and readable spacing for the new code.
3. Verify code changes meet expectations.
4. Diff the changes from minimal version, and merge only that code into SOT.
Repeat.
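For step 1, nobody in the thread shared their actual processor, so here is only a minimal sketch for Python sources, assuming Python 3.9+ for `ast.unparse`. Round-tripping through the AST drops comments and blank lines; the function name `strip_comments` is hypothetical.

```python
import ast


def strip_comments(source: str) -> str:
    """Return source with comments and blank lines removed.

    Parsing discards comments, and unparsing emits one statement per line.
    Docstrings survive because they are string expressions, not comments.
    """
    return ast.unparse(ast.parse(source)) + "\n"
```

Note that `ast.unparse` also normalizes formatting (quoting, parentheses, spacing), so the diff in step 4 would need to be taken against the stripped version rather than the original source of truth.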
The more data you load into context, the more you dilute attention.
I'm skeptical this is a valid generalization over what was directly observed. [1] We would learn more if they wrote a more detailed account of their observations. [2]
I'd like to draw a parallel to another area of study possibly unfamiliar to many of us. Anthropology faced similar issues until Geertz's 1970s reform emphasized "thick description" [3] meaning detailed contextual observations instead of thin generalization.
[1]: I would not draw this generalization. I've found that adding guidelines (on the order of 10k tokens) to my CLAUDE.md has been beneficial across all my conversations. At the same time, I have not constructed anything close to a study of variations of my approach. And the underlying models are a moving target. I will admit that some of my guidelines were added to address issues I saw over a year ago and may be nothing more than vestigial appendages nowadays. This is why I'm reluctant to generalize.
[2]: What kind of "hard problems"? What is meant by "more" exactly? (Going from 250 to 500 tokens? 1000 to 2000? 2500 to 5000? &c) How much overlap exists between the CLAUDE.md content items? How much ambiguity? How much contradiction?
Even now if I am working on REALLY hard problems I will still manually copy and paste code sections out for discussion and algorithm designs. Depends on complexity.
This is why I still believe open ai O1-Pro was the best model I've ever seen. The amount of compute you could throw at a problem was absurd.
How do you practically achieve this? Honest question. Thanks
1. Turn off 2. Code 3. Turn on 4. Commit
I also delete all llm comments they 100% poison your codebase.
> 1. Turn off 2. Code 3. Turn on 4. Commit
What does it mean "turn off" / "turn on"?
Do you have a script to strip comments?
Okay, after the comments were stripped, does this become the common base for 3-way merge?
After modification of the code stripped of the comments, do you apply 3-way merge to reconcile the changes and the comments?
This seems a lot of work. What is the benefit? I mean demonstrable benefit.
How does it compare to instructing through AGENTS.md to ignore all comments?
What did your comparison process look like? It feels intuitively accurate and validates my anecdotal impression but I'd love to hear the rigor behind your conclusions!
It's also easy to notice LLMs create garbage comments that get worse over time. I started deleting all comments manually alongside manual snippet selection to get max performance.
Then started just routinely deleting all comments pre big problem solving session. Was doing it enough to build some automation.
Maybe high quality human comments improve ability? Hard to test in a hybrid code base.
See it as a human, the comments are there to speed up understanding of the code.
It is called documenting your code!
Just write what this file is supposed to do in a clear concise way. It acts as a prompt, it provides much needed context specific to the file and it is used only when necessary.
Another tip is to add README.md files where possible and where it helps. What is this folder for? Nobody knows! Write a README.md file. It is not a rocket science.
What people often forget about LLMs is that they are largely trained on public information which means that nothing new needs to be invented.
You don't have to "prompt it just the right way".
What you have to do is to use the same old good best practices.
sure, readme.md is a great place to put content. But there's things I'd put in a readme that I'd never put in a claude.md if we want to squeeze the most out of these models.
Further, claude/agents.md have special quality-of-life mechanics with the coding agent harnesses like e.g. `injecting this file into the context window whenever an agent touches this directory, no matter whether the model wants to read it or not`
> What people often forget about LLMs is that they are largely trained on public information which means that nothing new needs to be invented.
I don't think this is relevant at all - when you're working with coding agents, the more you can finesse and manage every token that goes into your model and how its presented, the better results you can get. And the public data that goes into the models is near useless if you're working in a complex codebase, compared to the results you can get if you invest time into how context is collected and presented to your agent.
On Reddit's LLM subreddits people are rediscovering the very basics of software project management as some massive insights daily or very least weekly.
Who would've guessed that proper planning, accessible and up-to-date documentation, and splitting tasks into manageable testable chunks produces good code? Amazing!
Then they write a massive blog post or even some MCP monstrosity for it and post it everywhere as a new discovery =)
However, I think this is awesome for the industry. People are rediscovering basic things, but if they didn't know about the existing literature this is a perfect opportunity to refer them to it. And if they were aware, but maybe not practicing it, this is a great time for the ideas to be reinforced.
A lot of people, myself included, never really understood which practices were important until we were forced to work on a system that was most definitely not written with any good practices in mind.
My current view of agentic coding is that it's forcing an entire generation of devs to learn software project management or drown under the mountain of debt an LLM can produce. Previously it took much longer to feel the weight of bad decisions in a project, but an LLM allows you to speed-run this process in a few weeks or months.
Your comment comes off as if you're dispensing common-sense advice, but I don't think it actually applies here.
1. A centralized location, like a README (congrats, you've just invented CLAUDE.md)
2. You add a docs folder (congrats, you've just done exactly what the author suggests under Progressive Disclosure)
Moreover, you can't just do it all in a README, for the exact reasons that the author lays out under "CLAUDE.md file length & applicability".
CLAUDE.md simply isn't about telling Claude what all the parts of your code are and how they work. You're right, that's what documenting your code is for. But even if you have READMEs everywhere, Claude has no idea where to put code when it starts a new task. If it has to read all your documentation every time it starts a new task, you're needlessly burning tokens. The whole point is to give Claude important information up front so it doesn't have to read all your docs and fill up its context window searching for the right information on every task.
Think of it this way: incredibly well documented code has everything a new engineer needs to get started on a task, yes. But this engineer has amnesia and forgets everything it's learned after every task. Do you want them to have to reonboard from scratch every time? No! You structure your docs in a way so they don't have to start from scratch every time. This is an accommodation: humans don't need this, for the most part, because we don't reonboard to the same codebase over and over. And so yes, you do need to go above and beyond the "same old good best practices".
Relating / personifying LLM to an engineer doesn’t work out
Maybe the best thought model currently is just “good way to automate trivial text modifications” and “encyclopedic ramblings”
think about how this thing is interacting with your codebase. it can read one file at a time. sections of files.
in this UX, is it ergonomic to go hunting for patterns and conventions? if u have to linearly process every single thing u look at every time you do something, how are you supposed to have “peripheral vision”? if you have amnesia, how do you continue to do good work in a codebase given you’re a skilled engineer?
it is different from you. that is OK. it doesn’t mean it’s stupid. it means it needs different accommodations to perform as well as you do. accommodations IRL exist for a reason, different people work differently and have different strengths and weaknesses. just like humans, you get the most out of them if you meet and work with them from where they’re at.
Besides, no amount of prompting will prevent this situation.
If it is a concern then you put a linter or unit tests to prevent it altogether, or make a wrapper around the tricky function with some warning in its doc strings.
I don't see how this is any different from how you typically approach making your code more resilient to accidental mistakes.
Again, missing the point. If you don't prompt for it and you document it in a place where the tool won't look first, the tool simply won't do it. "No amount of prompting" couldn't be more wrong, it works for me and all my coworkers.
> If it is a concern then you put a linter or unit tests to prevent it altogether
Sure, and then it'll always do things its own way, run the tests, and have to correct itself. Needlessly burning tokens. But if you want to pay for it to waste its time and yours, go for it.
> I don't see how this is any different from how you typically approach making your code more resilient to accidental mistakes.
It's not about avoiding mistakes! It's about having it follow the norms of your codebase.
- My codebase at work is slowly transitioning from Mocha to Jest. I can't write a linter to ban new mocha tests, and it would be a pain to keep a list of legacy mocha test suites. The solution is to simply have a bullet point in the CLAUDE.md file that says "don't write new Mocha test suites, only write new test suites in Jest". A more robust solution isn't necessary and doesn't avoid mistakes, it avoids the extra step of telling the LLM to rewrite the tests.
- We have a bunch of terraform modules for convenience when defining new S3 buckets. No amount of documenting the modules will have Claude magically know they exist. You tell it that there are convenience modules and to consider using them.
- Our ORM has findOne that returns one record or null. We have a convenience function getOne that returns a record or throws a NotFoundError to return a 404 error. There's no way to exhaustively detect with a linter that you used findOne and checked the result for null and threw a NotFoundError. And the hassle of maybe catching some instances isn't necessary, because avoiding it is just one line in CLAUDE.md.
It's really not that hard.
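To make the last bullet concrete, here is a Python analogue of that convention (the commenter's ORM is not Python, and `NotFoundError`, `get_one`, and the callable-based signature are all stand-ins I made up for illustration). The point is that the helper already exists, and the one line in CLAUDE.md just tells the agent to reach for it instead of re-implementing the null check.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class NotFoundError(Exception):
    """Hypothetical error that a web layer would translate into a 404."""


def get_one(find_one: Callable[[], Optional[T]]) -> T:
    """Run a find-one style query and raise instead of returning None."""
    record = find_one()
    if record is None:
        raise NotFoundError("record not found")
    return record
```

The matching CLAUDE.md entry would be a single bullet along the lines of "use get_one instead of checking find_one results for null".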
README files are not a new concept, and have been used in software for like 5 decades now, whereas CLAUDE.md files were invented 12 months ago...
In step 2 either force Claude to use it (hooks) or suggest it (CLAUDE.md)
3. Profit!
As for "where stuff is", for anything more complex I have a tree-style graph in CLAUDE.md that shows the rough categories of where stuff is. Like the handler for letterboxd is in cmd/handlerletterboxd/ and internal modules are in internal/
Now it doesn't need to go in blind but can narrow down searches when I tell it to "add director and writer to the letterboxd handler output".
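A tree like that goes stale quickly if it is maintained by hand, so it may be worth generating. A minimal sketch, assuming a directories-only outline to a fixed depth is enough; `directory_tree` and its two-space indent are my own choices, not the commenter's format:

```python
from pathlib import Path


def directory_tree(root: Path, max_depth: int = 2) -> str:
    """Render the directories under root as an indented outline."""
    lines = [f"{root.name}/"]

    def walk(directory: Path, depth: int) -> None:
        if depth > max_depth:
            return
        # Skip hidden directories such as .git to keep the outline short.
        subdirs = sorted(
            p for p in directory.iterdir()
            if p.is_dir() and not p.name.startswith(".")
        )
        for sub in subdirs:
            lines.append("  " * depth + f"{sub.name}/")
            walk(sub, depth + 1)

    walk(root, 1)
    return "\n".join(lines)
```

The output can be pasted into CLAUDE.md with a short annotation per directory, which is the part that actually helps the agent narrow its search.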
Thankfully Azure keeps deleted SQL databases recoverable, so I got it back in under an hour. But yeah - no amount of CLAUDE.md instructions would have prevented that. It no longer gets prod credentials.
There's also a question of processes: how to format code, what style of catching to use, and how to run the tests. Humans keep these in the back of their heads after reading them once or twice, but an LLM, whose knowledge lifespan is limited to the session, needs a constant reminder.
This means that instead of behaving like a file the LLM reads, it effectively lets you customize the model’s prompt
I also didn’t write that you have to “prompt it just the right way”, I think you’re missing the point entirely
Don't use AI if you don't want to, but "it takes too much effort to set up" is an excuse printf debuggers use to avoid setting up a debugger. Which is a whole other debate though.
If we have to perform tuning on our prompts ("skills", agents.md/claude.md, all of the stuff a coding assistant packs context with) every model release then I see new model releases becoming a liability more than a boon.
Universal has stuff I always want (use uv instead of pip etc) while the other describes the tech choices for this project
I understand the "enjoy doing anyway" part and it resonates, but not using AI is simply less productive.
There's a huge difference between investing time into a deterministic tool like a text editor or programming language and a moving target like "AI".
The difference between programming in Notepad in a language you don't know and using "AI" will be huge. But the difference between being fluent in a language and having a powerful editor/IDE? Minimal at best. I actually think productivity is worse because it tricks you into wasting time via the "just one more roll" (ie. gambling) mentality. Not to mention you're not building that fluency or toolkit for yourself, making you barely more valuable than the "AI" itself.
--
The other thing is, this need for determinism bewilders me. I mean, I get where it comes from, we want nice, predictable reliable machines. But how deterministic does it need to be? If today, it decides to generate code and the variable is called fileName, and tomorrow it's filePath, as long as it's passing tests, what do I care that it's not totally deterministic and the names of the variables it generates are different? as long as it's consistent with existing code, and it passes tests, whats the importance of it being deterministic to a computer science level of rigor? It reminds me about the travelling salesman problem, or the knapsack problem. Both NP hard, but users don't care about that. They just want the computer to tell them something good enough for them to go on about their day. So if a customer comes up to you and offers you a pile of money to solve either one of those problems, do I laugh in their face, knowing damn well I won't be the one to prove that NP = P, or do I explain to them the situation, and build them software that will do the best it can, with however much compute resources they're willing to pay for?
Some studies show the opposite for experienced devs. They also show that developers are delusional about said productivity gains: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...
If you have a counter-study (for experienced devs, not juniors), I'd be curious to see. My experience also has been that using AI as part of your main way to produce code, is not faster when you factor in everything.
About 12 to 6 months ago this was not the case (with or without .md files), I was getting mainly subpar result, so I’m assuming that the models have improved a lot.
Basically, I found that they do not make that much of a difference, the model is either good enough or not…
I know (or at least I suppose) that these markdown files could bring some marginal improvements, but at this point, I don’t really care.
I assume this is an unpopular take because I see so many people treat these files as if they were black magic or silver bullet that 100x their already 1000x productivity.
Different use case. I assume the discussion is about having the agent implement whole features or research and fix bugs without much guidance.
It feels a lot like bikeshedding to me, maybe I’m wrong
One may argue that these should go in a README.md, but these markdowns are meant to be more streamlined for context, and it's not appropriate to put a one-liner in the imperative tone to fix model behavior in a top-level file like the README.md
Seeing "real" is a warning flag here that either-or thinking is in play.
Putting aside hopes and norms, we live in a world now where multiple kinds of agents (human and non-human) are contributing to codebases. They do not contribute equally; they work according to different mechanisms, with different strengths and weaknesses, with different economic and cultural costs.
Recall a lesson from Ralph Waldo Emerson: "a foolish consistency is the hobgoblin of little minds" [1]. Don't cling to the past; pay attention to the now, and do what works. Another way of seeing it: don't force a false equivalence between things that warrant different treatment.
If you find yourself thinking thoughts that do more harm than good (e.g. muddle rather than clarify), attempt to reframe them to better make sense of reality (which has texture and complexity).
Here's my reframing: "Documentation serves different purposes to different agents across different contexts. So plan and execute accordingly."
[1]: https://en.wikipedia.org/wiki/Wikipedia:Emerson_and_Wilde_on...
Why should we do this when anthropic specifically recommends creating multiple CLAUDE.md files in various directories where the information is specific and pertinent? It seems to me that anthropic has designed claude to look for claude.md for guidance, and randomly named markdown files may or may not stand out to it as it searches the directory.
You can place CLAUDE.md files in several locations:
> - The root of your repo, or wherever you run claude from (the most common usage). Name it CLAUDE.md and check it into git so that you can share it across sessions and with your team (recommended), or name it CLAUDE.local.md and .gitignore it
> - Any parent of the directory where you run claude. This is most useful for monorepos, where you might run claude from root/foo, and have CLAUDE.md files in both root/CLAUDE.md and root/foo/CLAUDE.md. Both of these will be pulled into context automatically
> - Any child of the directory where you run claude. This is the inverse of the above, and in this case, Claude will pull in CLAUDE.md files on demand when you work with files in child directories
> - Your home folder (~/.claude/CLAUDE.md), which applies it to all your claude sessions
https://www.anthropic.com/engineering/claude-code-best-pract...
Yeah, if you do this every time it works fine. If you add what you tell it every time to CLAUDE.md, it also works fine, but you don’t have to tell it any more ;)
It’s case sensitive btw. CLAUDE.md - Might explain your mixed results with it
Some explicit things I found helpful: Have the agent address you as something specific! This way you know if the agent is paying attention to your detailed instructions.
Rationality, as in the stuff practiced on early Less Wrong, gives a great language for constraining the agent, and since it's read The Sequences and everything else you can include pointers and the more you do the more it will nudge it into that mode of thought.
The explicit "This is what I'm doing, this is what I expect" pattern has been hugely useful for both me monitoring it/coming back to see what it did, and it itself. It makes it more likely to recover when it goes down a bad path.
The system reminder this article mentions is definitely there but I have not noticed it messing much with adherence. I wish there were some sort of power user mode to turn it off though!
Also, this is probably too long! But I have been experimenting and iterating for a while, and this is what is working best currently. Not that I've been able to hold any other part constant -- Opus 4.5 really is remarkable.
[0]: https://gist.github.com/ctoth/d8e629209ff1d9748185b9830fa4e7...
Have we really reached the low point that we need tutorials on how to coerce a LLM into doing what we want instead of just....writing the god damn code?
You can easily test this by adding some mandatory instruction into the file. E.g. "Any new method you write must have less than 50 lines of code." Then use Claude for ten minutes and watch it blow through this limit again and again.
I use CC and Codex extensively and I constantly am resetting my context and manually pasting my custom instructions in again and again, because these models DO NOT remember or pay attention to Claude.md or Agents.md etc.
Write readmes for humans, not LLMs. That's where the ball is going.
Yes README.md should still be written for humans and isn’t going away anytime soon.
CLAUDE.md is a convention used by claude code, and AGENTS.md is used by other coding agents. Both are intended to be supplemental to the README and are deterministically injected into the agent’s context.
It’s a configuration point for the harness, it’s not intended to replace the README.
Some of the advice in here will undoubtedly age poorly as harnesses change and models improve, but some of the generic principles will stay the same - e.g. that you shouldn’t use an LLM to do a linter & formatter’s job, or that LLMs are stateless and need to be onboarded into the codebase, and having some deterministically-injected instructions to achieve that is useful instead of relying on the agent to non-deterministically derive all that info by reading config and package files.
The post isn’t really intended to be super forward-looking as much as “here’s how to use this coding agent harness configuration point as best as we know how to right now”
Why is that good advice? If that thing is eventually supposed to do the most tricky coding tasks, and already a year ago could have won a medal at the informatics olympics, then why wouldn't it eventually be able to tell if I'm using 2 or 4 spaces and format my code accordingly? Either it's going to change the world, then this is a trivial task, or it's all vaporware, then what are we even discussing..
> or that LLMs are stateless and need to be onboarded into the codebase
What? Why would that be a reasonable assumption/prediction for even near term agent capabilities? Providing it with some kind of local memory to dump its learned-so-far state of the world shouldn't be too hard. Isn't it supposed to already be treated like a junior dev? All junior devs I'm working with remember what I told them 2 weeks ago. Surely a coding agent can eventually support that too.
This whole CLAUDE.md thing seems a temporary kludge until such basic features are sorted out, and I'm seriously surprised how much time folks are spending to make that early broken state less painful to work with. All that precious knowledge y'all are building will be worthless a year or two from now.
This is the exact reason for the advice: The LLM already is able to follow coding conventions by just looking at the surrounding code which was already included in the context. So by adding your coding conventions to the claude.md, you are just using more context for no gain.
And another reason not to use an agent for linting/formatting (i.e. prompting it to "format this code for me") is that dedicated linters/formatters are faster and only take maybe a single cent of electricity to run, whereas using an LLM to do that job will cost multiple dollars if not more.
It's not that an agent doesn't know if you're using 2 or 4 spaces in your code; it comes down to:
- there are many ways to ensure your code is formatted correctly; that's what .editorconfig [1] is for.
- in a halfway serious project, incorrectly formatted code shouldn't reach the LLM in the first place
- tokens are relatively cheap but they're not free on a paid plan; why spend tokens on something linters and formatters can do deterministically and for free?
If you wanted Claude Code to handle linting automatically, you're better off taking that out of CLAUDE.md and creating a Skill [2].
> What? Why would that be a reasonable assumption/prediction for even near-term agent capabilities? Providing it with some kind of local memory to dump its learned-so-far state of the world shouldn't be too hard. Isn't it supposed to already be treated like a junior dev? All junior devs I'm working with remember what I told them 2 weeks ago. Surely a coding agent can eventually support that too.
It wasn't mentioned in the article, but Claude Code, for example, does save each chat session by default. You can come back to a project and type `claude --resume` and you'll get a list of past Claude Code sessions that you can pick up from where you left off.
That’s why they’re junior
And that describes the issues I had with the “automatic memories” features that things like ChatGPT had. Turns out it is an awful judge of things to remember. Like it would make memories like “cruffle is trying to make pepper soup with chicken stock”! Which it would then parrot back to me at some point 4 months later and I’d be like “WTF I figured it out”. The “# remember this” is much more powerful because I know how sticky this stuff gets and I’d rather have it over-index on my own forceful memories than random shit it decided.
I dunno. All I’m saying is you are right. The future is in having these things do a better job of remembering. And I don’t know if LLMs are the right tool for that. Keyword search isn’t either though. And vector search might not be either—I think it suffers from the same kinds of “catchy tune attack” an LLM might.
Somebody will figure it out somehow.
Should do this for human developers too. Can't count the number of times I've been thrown onto a project and had to spend a significant amount of time opening and skimming files just to answer simple questions that should be answered in high-level docs like this.
But in all seriousness, it's working. I write cursor rules religiously and I point other devs to them. It's great.
I didn’t dive into that because in a lot of cases it’s not necessary and I wanted to keep the post short, but for large monorepos it’s a good idea
Actually having official guidelines in their docs would be a good entrypoint, even though I guess we have this which is the closest available from anything official for now: https://www.claude.com/blog/using-claude-md-files
One interesting thing I also noticed and used recently is that Claude Code ships with a @agent-claude-code-guide. I've used it to review and update my dev workflow / CLAUDE.md file but I've got mixed feelings on the discussion with the subagent.
We used cloudflare’s AI gateway which is pretty simple. Set one up, get the proxy URL and set it through the env var, very plug-and-play
On phone else I’d post commands
This way, it's got more of a chance of generating something that I wanted, rather than running off on its own.
Doesn't that mean that Claude Code's system prompt exhausts that budget before you even get to CLAUDE.md and the user prompt?
Edit: They say Claude Code's system prompt has 50. I might have misjudged then. It seemed pretty verbose to me!
The part about smaller models attending to fewer instructions is interesting too, since most of what was added doesn't seem necessary for the big models. I thought they added them so Haiku could handle the job as well, despite a relative lack of common sense.
I used to instruct about coding style (prefer functions, avoid classes, use structs for complex params and returns, avoid member functions unless needed by shared state, avoid superfluous comments, avoid silly utf8 glyphs, AoS vs SoA, dry, etc)
I removed all my instructions and it basically never violates those points.
What I find most interesting is how a hierarchical / recursive context construct begins to emerge. The authors' note of "root" claude.md as well as the opening comments on LLMs being stateless ring to me like a bell. I think soon we will start seeing stateful LLMs, via clever manipulation of scope and context. Something akin to memory, as we humans perceive it.
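The hierarchical construct described above can be sketched in a few lines. This is only an illustration of the idea, not how Claude Code or any other harness is actually implemented. The function name `collect_context_files` and the root-first ordering are assumptions made for the example.

```python
from pathlib import Path


def collect_context_files(target: Path, root: Path, name: str = "CLAUDE.md") -> list[Path]:
    """Return the existing `name` files between `root` and the directory of `target`.

    Assumption: files are ordered root first, most specific directory last,
    so narrower instructions are read after the general ones.
    """
    root = root.resolve()
    directory = target.resolve().parent
    if directory != root and root not in directory.parents:
        raise ValueError(f"{target} is not inside {root}")

    # Walk upward from the target's directory until the root is reached.
    chain = [directory]
    while chain[-1] != root:
        chain.append(chain[-1].parent)

    return [d / name for d in reversed(chain) if (d / name).is_file()]
```

For a file like `src/persistence/db.py`, this would return the root `CLAUDE.md` followed by `src/persistence/CLAUDE.md`, if both exist. Each file can then stay short and scoped to its own directory.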
<system-reminder> IMPORTANT: this context may or may not be relevant to your tasks. You should not respond to this context unless it is highly relevant to your task. </system-reminder>
Perhaps a small proxy between Claude code and the API to enforce following CLAUDE.md may improve things… I may try this
Is it a good one?
But as others are saying this is just basic documentation that should be done anyway.
Is this intentional? Is AoC designed as an elite challenge, or is the journey more important than finishing?
I rarely get past 18 or so. The stats for last year are here: https://adventofcode.com/2024/stats
If you're using VSCode, that is automatically added to context (and I think in Zed that happens as well, although I can't verify right now).
That said, a lot of what it can deduce by looking at the code is exactly what you shouldn't include, since it will usually deduce that stuff just by interacting with the code base. Claude doesn't seem good at that.
An example of both overly-verbose and unnecessary:
### 1. Identify the Working Directory
When a user asks you to work on something:
1. *Check which project* they're referring to
2. *Change to that directory* explicitly if needed
3. *Stay in that directory* for file operations
```bash
# Example: Working on ProjectAlpha
cd /home/user/code/ProjectAlpha
```
(The one sentence version is "Each project has a subfolder; use pwd to make sure you're in the right directory", and the ideal version is probably just letting it occasionally spend 60 seconds confused, until it remembers pwd exists)
I think most people who use Claude regularly have probably come to the same conclusions as the article. A few bits of high-level info, some behavior stuff, and pointers to actual docs. Load docs as-needed, either by prompt or by skill. Work through lists and constantly update status so you can clear context and pick up where you left off. Any other approach eats too much context.
If you have a complex feature that would require ingesting too many large docs, you can ask Claude to determine exactly what it needs to build the appropriate context for that feature and save that to a context doc that you load at the beginning of each session.
Read your instructions from Agents.md
OMG this finally makes sense.
Is there any way to turn off this behavior?
Or better yet is there a way to filter the context that is being sent?
I would love to see it extended to show Codex, which to my mind is by far the best at rule-following. (I'd also be curious to see how Gemini 3 performs.)
Why not just show one?
This is news to me. And at the same time it isn’t. Without knowledge of how the models actually work, most prompting is guesswork at best. You have no control over models via prompts.
Also, while it may be hip to call any LLM output slop, that really isn't the case. Look at what a poor history we have of developer documentation. LLMs may not be great at everything, but they're actually quite capable when it comes to technical documentation. Even a 1-shot attempt by an LLM is often way better than what many devs produce, who either can't write very well or just can't be bothered to.
Consider that if the only code you get out of the autoregressive token prediction machine is slop, that this indicates more about the quality of your code than the quality of the autoregressive token prediction machine
Considering that the "input" to these models is essentially all public code in existence, the direct context input is a drop in the bucket.
I have a full system of agents, hooks, skills, and commands, and it all works for me quite well.
I believe in massive context, but targeted context. It has to be valuable, and important.
My agents are large. My skills are large. Etc etc.
A good Claude.md - I don’t know, presumably the article explains.