We put Claude Code in Rollercoaster Tycoon (opens in new tab)

(labs.ramp.com)

526 pointsiamwil2mo ago288 comments

288 comments

I’ve always found it crazy that my LLM has access to such terrible tools compared to mine.

It’s left with grepping for function signatures, sending diffs for patching, and running `cat` to read all the code at once.

I however, run an IDE and can run a simple refactoring tool to add a parameter to a function, I can “follow symbol” to see where something is defined, I can click and get all usages of a function shown at a glance, etc etc.

Is anyone working on making it so LLM’s get better tools for actually writing/refactoring code? Or is there some “bitter lesson”-like thing that says effort is always better spent just increasing the context size and slurping up all the code at once?

nbardy2mo ago

> Claude Code officially added native support for the Language Server Protocol (LSP) in version 2.0.74, released in December 2025.

I think from training it's still biased towards simple tooling.

But also, there is real power to simple tools, a small set of general purpose tools beats a bunch of narrow specific use case tools. It's easier for humans to use high level tools, but for LLM's they can instantly compose the low level tools for their use case and learn to generalize, it's like writing insane perl one liners is second nature for them compared to us.

If you watch the tool calls you'll see they write a ton of one off small python programs to test, validate explore, etc...

If you think about it any time you use a tool there is probably a 20 line python program that is more fit to your use case, it's just that it would take you too long to write it, but for an LLM that's 0.5 seconds

frumplestlatz2mo ago

> but for LLM's they can instantly compose the low level tools for their use case and learn to generalize

Hard disagree; this wastes enormous amounts of tokens, and massively pollutes the context window. In addition to being a waste of resources (compute, money, time), this also significantly decreases their output quality. Manually combining painfully rudimentary tools to achieve simple, obvious things -- over and over and over -- is *not* an effective use of a human mind or an expensive LLM.

Just like humans, LLMs benefit from automating the things they need to do repeatedly so that they can reserve their computational capacity for much more interesting problems.

I've written[1] custom MCP servers to provide narrowly focused API search and code indexing, build system wrappers that filter all spurious noise and present only the material warnings and errors, "edit file" hooks that speculatively trigger builds before the LLM even has to ask for it, and a litany of other similar tools.

Due to LLM's annoying tendency to fall back on inefficient shell scripting, I also had to write a full bash syntax parser and shell script rewriting ruleset engine to allow me to silently and trivially rewrite their shell invocations to more optimal forms that use the other tools I've written, so that they don't have to do expensive, wasteful things like pipe build output through `head`/`tail`/`grep`/etc, which results in them invariably missing important information, and either wandering off into the weeds, or -- if they notice -- consuming a huge number of turns (and time) re-running the commands to get what they need.

Instead, they call build systems directly with arbitrary options, | filters, etc, and magically the command gets rewritten to something that will produce the ideal output they actually need, without eating more context and unnecessary turns.

LLMs benefit from an IDE just like humans do -- even if an "IDE" for them looks very different. The difference is night and day. They produce vastly better code, faster.

[1] And by "I've written", I mean I had an LLM do it.

forty2mo ago

Note that the Claude code LSP integration was actually broken for a while after it was released, so make sure you have a very recent version if you want to try it out.

However as parent comment said, it seems to always grep instead, unless explicitly said to use the LSP tool.

cududa2mo ago

Correct. If you try to create a coding agent using the raw Codex or Claude code API and you build your own “write tool”, and don’t give the model their “native patch tool”, 70%+ of the time it’s write/ patch fails because it tries to do the operation using the write/ patch tool it was trained on.

1 more reply

cm21872mo ago

We are back to RISC vs CISC!

1 more reply

KronisLV2mo ago

> I however, run an IDE and can run a simple refactoring tool to add a parameter to a function, I can “follow symbol” to see where something is defined, I can click and get all usages of a function shown at a glance, etc etc

I am so surprised that all of the AI tooling mostly revolves around VSC or its forks and that JetBrains seem to not really have done anything revolutionary in the space.

With how good their refactoring and code inspection tools are, you’d really think they’d pass of that context information to AI models and that they’d be leaps and bounds ahead.

harikb2mo ago

Recently, all these agents can talk LSP (language server protocol) so it should get better soon. That said, yeah they don't seem to default to use `ripgrep` when that is clearly better than `grep`

2 more replies

eek21212mo ago

Are you? I'm not surprised at all, considering that the biggest investment juggernaut in AI is also the author of VSC. I wonder what the connection is? ;)

1 more reply

penneyd2mo ago

Agreed - this seems like a no brainer, surely this is something that is being worked on.

htrp2mo ago

Jetbrains is trying but I feel like they're very very behind in the space

epicureanideal2mo ago

Claude and other LLMs can be used through JetBrains, and the IDE provides a significantly better experience than VS Code in my opinion.

PlatoIsADisease2mo ago

I haven't seen JetBrains as 'great'. I think they have a strong marketing team that gets into universities and potentially astroturfs on the internet, but I have always found better tools for every language. Although, I can't remember what I ended up choosing for PHP.

mulmboy2mo ago

LLMs aren't like you or me. They can comprehend large quantities of code quickly and piece things together easily from scattered fragments. so go to reference etc become much less important. Of course though things change as the number of usages of a symbol becomes large but in most cases the LLM can just make perfect sense of things via grep.

To provide it access to refactoring as a tool also risks confusing it via too many tools.

It's the same reason that waffling for a few minutes via speech to text with tangents and corrections and chaos is just about as good as a carefully written prompt for coding agents.

fragmede2mo ago

Anthropic, for one.

> Added LSP (Language Server Protocol) tool for code intelligence features like go-to-definition, find references, and hover documentation

https://github.com/anthropics/claude-code/blob/main/CHANGELO...

novaleaf2mo ago

their c# LSP theoretically worked for a week or so (I never saw it in action though), but now it always errors on launch :(

1 more reply

hippo222mo ago

If you can read fast enough, grepping is probably faster than waiting for a compiler to tell you anything.

gf0002mo ago

Faster for worse results, though. Determining the source of a symbol is not as trivial as finding the same piece of text somewhere else, it should also reliably be able to differentiate among them. What better source for that then the compiler itself?

2 more replies

fancy_pantser2mo ago

Zed Editor gives the LLM tools that use the LSP as you'd expect as a normal IDE user, like "go to symbol definition" so it greps a lot less.

selcuka2mo ago

JetBrain IDEs come with an MCP server that supports some refactoring tools [1]:

> Starting with version 2025.2, IntelliJ IDEA comes with an integrated MCP server, allowing external clients such as Claude Desktop, Cursor, Codex, VS Code, and others to access tools provided by the IDE. This provides users with the ability to control and interact with JetBrains IDEs without leaving their application of choice.

[1] https://www.jetbrains.com/help/idea/mcp-server.html#supporte...

ricw2mo ago

Tidewave.ai does exactly that. It’s made Claude code so much more functional. It provides mcp servers to

- search all your code efficiently - search all documentation for libraries - access your database and get real data samples (not just abstract data types) - allows you to select design components from your figma project and implements them for you - allows Claude to see what is rendered in the browser

It’s basically the ide for your LLM client. It really closes the loop and has made Claude and myself so much more productive. Highly recommended and cheap at $10/month

Ps: my personal opinion. I have Zero affiliation with them

Wowfunhappy2mo ago

LLMs operate on text. They can take in text, and they can produce text. Yes, some LLMs can also read and even produce images, but at least as of today, they are clearly much better at using text[1].

So cat, ripgrep, etc are the right tools for them. They need a command line, not a GUI.

1: Maybe you'd argue that Nano Banana is pretty good. But would you say its prompt adherence is good enough to produce, say, a working Scratch program?

kelipso2mo ago

Inputs to functions are text, as in variables, or file names, directory names, symbol names with symbol searching. Outputs you get from these functions for things like symbol searching is text too, or at least easily reformatted to text. Like API calls are all just text input and output.

1 more reply

JimDabell2mo ago

Kit looks like a good step in this direction:

https://github.com/cased/kit

girvo2mo ago

You can give agents the ability to check VSCode Diagnostics, LSP servers and the like.

But they constantly ignore them and use their base CLI tools instead, it drives me batty. No matter what I put in AGENTS.md or similar, they always just ignore the more advanced tooling IME.

worksonmine2mo ago

Doesn't have to be a bad thing, not all languages have good LSP support. If the AI can optimize for simple cross-language tools it won't be as dependent on the LSP implementation.

I used grep and simple ctags to program in vanilla vim for years. It can be more useful than you'd think. I do like the LSP in Neovim and use it a lot, but I don't need it.

1 more reply

hahahahhaah2mo ago

An LSP MCP?

ninkendo2mo ago

Yeah, or something even smarter than that.

If you are willing to go language-specific, the tooling can be incredibly rich if you go through the effort. I’ve written some rust compiler drivers for domain-specific use cases, and you can hook into phases of the compiler where you have amazingly detailed context about every symbol in the code. All manner of type metadata, locations where values are dropped, everything is annotated with spans of source locations too. It seems like a worthy effort to index all of it and make it available behind a standard query interface the LLM can use. You can even write code this way, I think rustfmt hooks into the same pipeline to produce formatted code.

I’ve always wished there were richer tools available to do what my IDE already does, but without needing to use the UI. Make it a standard API or even just CLI, and free it from the dependency on my IDE. It’d be very worth looking into I think.

1 more reply

rudedogg2mo ago

LSP also kind of sucks. But the problem is all the big companies want big valuations, so they only chase generic solutions. That's why everything is a VS Code clone, etc..

https://paulgraham.com/ds.html

dexwiz2mo ago

I've never used an LSP plugin half as good as a JetBrains IDE.

immibis2mo ago

Always wondered what happened to the era of IDEs actually knowing the language you're using.

ramraj072mo ago

Not coding agents but we do a lot of work trying to find the best tools, and the result is always that the simplest possible general tool that can get the job done always beats a suite of complicated tools and rules on how to use them.

eru2mo ago

Well, jump to definition isn't exactly complicated?

And you can use whatever interface the language servers already use to expose that functionality to eg vscode?

1 more reply

elif2mo ago

Surely there is an embedding for emacs giving it full elisp control

BryantD2mo ago

This isn’t completely the answer to what you want but skills do open a lot of doors here. Anything you can do on a command line can turn into a skill, after all.

karlgkk2mo ago

I’ve been saying this for a while. CPU demand is about to go through the roof.

I think about it, to get these tools to be most effective you have to be able to page things in and out of their context windows.

What was once a couple of queries is now gonna be dozens or hundreds or even more from the LLM

For code that means querying the AST and query it in a way that allows you to limit the results of the output

I wonder which SAST vendor Anthropic will buy.

throwawaygo2mo ago

Workin on it

Jaysobel2mo ago

Author here - some bonus links!

Session transcript using Simon Willison's claude-code-transcripts

https://htmlpreview.github.io/?https://gist.githubuserconten...

Reddit post

https://www.reddit.com/r/ClaudeAI/comments/1q9fen5/claude_co...

OpenRCT2!!

https://github.com/jaysobel/OpenRCT2

Project repo

https://github.com/jaysobel/OpenRCT2

theptip2mo ago

Did you eval using screenshots or some sort of rendered visualization instead of the CLI? I wonder if Claude has better visual intelligence when viewing images (lots of these in its training set) rather than ascii schematics (probably very few of these in the corpus).

cheema332mo ago

Computer use and screenshots are context intensive. Text is not. The more context you give to an LLM, the dumber it gets. Some people think at 40% context utilization, the LLM starts to get into the dumb zone. That is where the limitations are as of today. This is why CLI based tools like Claude Code are so good. And any attempt at computer use has fallen by the wayside.

There are some potential solutions to this problem that come to mind. Use subagents to isolate the interesting bits about a screenshot and only feed that to the main agent with a summary. This will all still have a significantly higher token usage compared to a text based interface, but something like this could potentially keep the LLM out of the dumb zone a little longer.

1 more reply

Jaysobel2mo ago

I had tried the browser screenshotting feature for agents in Cursor and found it wasn't very reliable - screenshots eat a lot of context, and the agent didn't have a good sense for when to use them. I didn't try it in this project. I bet it would work in some specific cases.

nanapipirara2mo ago

Claude helped me immensely getting an image converter to work. Giving it screenshots of wrong output (lots of layers had an unpredictable offsets that was not supposed to be there) and output as I expected it helped Claude understand the problems and it fixed the bugs immediately.

deepl_y2mo ago

I'm not sure if this proves anything, but i saw this article of Opus playign pokemon, and here they were given actual screenshots, and it still says it navigated visual space pretty poorly despite the advancements https://www.lesswrong.com/posts/u6Lacc7wx4yYkBQ3r/insights-i...

cheschire2mo ago

Did you intend the last link to link to your project? It’s a copy of the OpenRCT2 project.

philipwhiuk2mo ago

I think the first one should have been https://github.com/OpenRCT2/OpenRCT2

The actual change to implement CC is: https://github.com/jaysobel/OpenRCT2/commit/5d49dc960fcfc133...

fragmede2mo ago

> Claude is at a pretty steep visuo-spatial disadvantage,

How hard would it be to use with OpenAI's offerings instead? Particularly, imo, OpenAI's better at "looking" at pictures than Claude.

rashidae2mo ago

> As a mirror to real-world agent design: the limiting factor for general-purpose agents is the legibility of their environments, and the strength of their interfaces. For this reason, we prefer to think of agents as automating diligence, rather than intelligence, for operational challenges.

hk__22mo ago

> The only other notable setback was an accidental use of the word "revert" which Codex took literally, and ran git revert on a file where 1-2 hours of progress had been accumulating.

qaboutthat2mo ago

If I tell Claude to "revert that last change, it isn't right, try this instead" and Claude hasn't committed recently it will happily `git checkout ...` and blow away all recent changes instead of reverting the "last change".

(Which, it's not wrong or anything -- I did say "revert that change" -- it's just annoying. And telling `CLAUDE.md` to commit more often doesn't work consistently, because Claude is a dummy sometimes).

mh-2mo ago

I haven't tried it, but theoretically one could use Claude Code's hooks facility to enforce committing at some determined thresholds.

1 more reply

_flux2mo ago

Amazing that these tools don't maintain a replayable log of everything they've done.

Although git revert is not a destructive operation, so it's surprising that it caused any loss of data. Maybe they meant git reset --hard or something like that. Wild if Codec would run that.

arcanemachiner2mo ago

I was looking at the insanity known as Gas Town [0] the other day, and it does use Git to store historical work state in something it calls "beads":

https://github.com/steveyegge/gastown?tab=readme-ov-file

2 more replies

rabf2mo ago

I have had codex recover things for me from its history after claude had done a git reset hard, codex is one of the more reliable models/harneses when it comes to performing undo and redo operations in my experience.

theptip2mo ago

Claude Code has had this feature for a few months now.

CPLX2mo ago

I found this tool to be the solution I was looking for to address this specific problem:

https://contextify.sh

MattGaiser2mo ago

Claude Code has /rewind. Not sure if it is foolproof, but this has been tried.

2 more replies

defunct342mo ago

Claude (can’t remember if was 4.1 Opus, 4.5 Sonnet, or 4.5 Opus) once just started playing with git worktrees and royally f-d up the local repo and lost several hours of work. Since then, I watch it like a hawk.

stinkbeetle2mo ago

`git reset --hard` doesn't remove unreferenced commits or rewrite the reflog so I don't think that would do it. Something like `git reset && git gc` would have to be done.

2 more replies

alt2272mo ago

I wonder how they accidentaly used a word like that.

gbear6052mo ago

“Please revert that last change you did”, referring to like a smaller change that had just been done

GardenLetter272mo ago

Codex reverted kindly.

esafak2mo ago

Does Codex not let you set command permissions?

legojoey172mo ago

Yea, it does so this would likely have been to be a `--yolo` (I don't care, let me `rm -rf /`). I've found even with the "workspace write" mode and no additional writable directories I can't do git operations without approval so it seems to exclude `.git` by default.

Filligree2mo ago

Yet another reason to use Jujutsu. And put a `jj status` wrapper in your PS1. ;-)

westurner2mo ago

Start with env args like AGENT_ID for indicating which Merkle hash of which model(s) generated which code with which agent(s) and add those attributes to signed (-S) commit messages. For traceability; to find other faulty code generated by the same model and determine whether an agent or a human introduced the fault.

Then, `git notes` is better for signature metadata because it doesn't change the commit hash to add signatures for the commit.

And then, you'd need to run a local Rekor log to use Sigstore attestations on every commit.

Sigstore.dev is SLSA.dev compliant.

Sigstore grants short-lived release attestation signing keys for CI builds on a build farm to sign artifacts with.

So, when jujutsu autocommits agent-generated code, what causes there to be an {{AGENT_ID}} in the commit message or git notes? And what stops a user from forging such attestations?

1 more reply

diath2mo ago

> Yet another reason to use Jujutsu

And what would that reason be? You can git revert a git revert.

2 more replies

glemion432mo ago

It's not going to happen...

Stop spamming

2 more replies

pocketarc2mo ago

I love the interview at the end of the video. The kubectl-inspired CLI, and the feedback for improvements from Claude, as well as the alerts/segmentation feedback.

You could take those, make the tools better, and repeat the experience, and I'd love to see how much better the run would go.

I keep thinking about that when it comes to things like this - the Pokemon thing as well. The quality of the tooling around the AI is only going to become more and more impactful as time goes on. The more you can deterministically figure out on behalf of the AI to provide it with accurate ways of seeing and doing things, the better.

Ditto for humans, of course, that's the great thing about optimizing for AI. It's really just "if a human was using this, what would they need"? Think about it: The whole thing with the paths not being properly connected, a human would have to sit down and really think about it, draw/sketch the layout to visualize and understand what coordinates to do things in. And if you couldn't do that, you too would probably struggle for a while. But if the tool provided you with enough context to understand that a path wasn't connected properly and why, you'd be fine.

wonnage2mo ago

I see this sentiment of using AI to improve itself a lot but it never seems to work well in practice. At best you end up with a very verbose context that covers all the random edge cases encountered during tasks.

For this to work the way people expect you’d need to somehow feed this info back into fine tuning rather than just appending to context. Otherwise the model never actually “learns”, you’re just applying heavy handed fudge factors to existing weights through context.

pilord3142mo ago

I've been playing around with an AI generated knowledge base to grok our code base, I think you need good metrics on how the knowledge base is used. A few things is:

1. Being systematic. Having a system for adding, improving and maintaining the knoweldge base 2. Having feedback for that system 3. Implementing the feedback into a better system

I'm pretty happy I have an audit framework and documentation standards. I've refactored the whole knowledge base a few times. In the places where it's overly specific or too narrow in it's scope of use for the retained knowledge, you just have to prune it.

Any garden has weeds when you lay down fertile soil.

Sometimes they aren't weeds though, and that's where having a person in the driver's seat is a boon.

mcintyre19942mo ago

The features it asked for in this case were better tools, I thought they were really sensible. It said it wanted a —dry-run (like the CLIs the rct one was modelled on), it wanted to be able to segment guest feedback, and it wanted better feedback from its path tools. Those might not be actually possible in rct, but in a different context they’re pretty smart requests and not just verbose edge cases.

lukebechtel2mo ago

> We don't know any C++ at all, and we vibe-coded the entire project over a few weeks. The core pieces of the build are…

what a world!

AndrewKemendo2mo ago

I would’ve walked for days to a CompUSA and spent my life savings if there was anything remotely equivalent to this when I was learning C on my Macintosh 4400 in 1997

People don’t appreciate what they have

lifetimerubyist2mo ago

It’s worse. They’re proud they don’t know.

2 more replies

imiric2mo ago

Did you actually learn C? Be thankful nothing like this existed in 1997.

A machine generating code you don't understand is not the way to learn a programming language. It's a way to create software without programming.

These tools can be used as learning assistants, but the vast majority of people don't use them as such. This will lead to a collective degradation of knowledge and skills, and the proliferation of shoddily built software with more issues than anyone relying on these tools will know how to fix. At least people who can actually program will be in demand to fix this mess for years to come.

5 more replies

yoyohello132mo ago

Everyone should read that section. It was really interesting reading about their experiences/challenges getting it all working.

falloutx2mo ago

First time I am seeing realistic timelines from a vibe-coded project. Usually everyone who vibe codes just says they did in few hours, no matter the project.

ben_w2mo ago

Hmm. My experience with it is that a few hours of that will get you a sprint if you're lucky and the prompt hits the happy path. I had… I think two of those, over 5 weeks? I can believe plenty of random people stumble across happy-path examples.

Exciting when it works, but I think a much more exciting result for people with less experience who may not know that the "works for me" demo is the dreaded "first 90%", and even fairly small projects aren't done until the fifth-to-tenth 90%.

(That, and that vibe coding in the sense of "no code review" are prone to balls of mud, so you need to be above average at project management to avoid that after a few sprint-equivalents of output).

Aurornis2mo ago

It’s possible to vibe code certain generic things in a few hours if you’re basically combining common, thoroughly documented, mature building blocks. It’s not going to be production ready or polished but you can get surprisingly far with some things.

For real work, that phase is like starting from a template or a boilerplate repo. The real work begins after the basics are wired together.

fnordpiglet2mo ago

Interesting article but it doesn’t actually discuss how well it performs at playing the game. There is in fact a 1.5 hour YouTube video but it woulda been nice for a bit of an outcome postmortem. It’s like “here’s the methods and set up section of a research paper but for the conclusion you need to watch this movie and make your own judgements!”

Sharlin2mo ago

It does discuss that? Basically it has good grasp of finances and often knows what "should" be done, but it struggles with actually building anything beyond placing toilets and hotdog stalls. To be fair, its map interface is not exactly optimal, and a multimodal model might fare quite a bit better at understanding the 2D map (verticality would likely still be a problem).

cyanydeez2mo ago

I was told the important part of AI is the generation part, not the verification or quality.

nipponese2mo ago

> kept the context above the ~60% remaining level where coding models perform at their absolute best

Maybe this is obvious to Claude users but how do you know your remaining context level? There is UI for this?

adithyareddy2mo ago

You can also show context in the statusline within claude code: https://code.claude.com/docs/en/statusline#context-window-us...

nipponese2mo ago

Follow up Q: what are you supposed to do when the context becomes too large? Start a new conversation/context window and let Claude start from scratch?

6 more replies

d4rkp4ttern2mo ago

Yes you can literally just ask Claude Code to create a status line showing context usage. I had it make this colored progress bar of context usage, changing thru green, yellow, orange, red as context fills up. Instructions to install:

https://github.com/pchalasani/claude-code-tools?tab=readme-o...

neilfrndes2mo ago

Claude code has a /context command.

MattGaiser2mo ago

/context

margorczynski2mo ago

I think something like Civilization would be better because:

1) The map is a grid

2) Turn based

maxall42mo ago

> In this article we'll tell you why we decided to put Claude Code into RollerCoaster Tycoon, and what lessons it taught us about B2B SaaS.

What is this? A LinkedIn post?

mcintyre19942mo ago

> Your outlook above is too self critical. This is the first time an AI has beaten this park much less played a full game of RollerCoaster Tycoon through a TUI. There are important learnings for B2B SaaS. This isn't LinkedIn (it is, in fact, LinkedIn). But seriously. What can we learn here.

From the transcript: https://htmlpreview.github.io/?https://gist.githubuserconten... :)

haunter2mo ago

This is what I want but for PoE/PoE2 builds. I always get a headache just looking at the passive tree https://poe.ninja/poe2/passive-skill-tree

TaupeRanger2mo ago

I corroborate that spatial reasoning is a challenge still. In this case, it's the complexity of the game world, but anyone who has used Codex/Claude with complex UIs in CSS or a native UI library will recognize the shortcomings fairly quickly.

khoury2mo ago

Can't wait for someone to let Claude control a runescape character from scratch

itsgrimetime2mo ago

I've done this! Given the right interface I was surprised at how well it did. Prompted it "You're controlling a character in Old School RuneScape, come up with a goal for yourself, and don't stop working on it until you've achieved it". It decided to fish for and cook 100 lobsters, and it did it pretty much flawlessly!

Biggest downside was it's inability to see (literally), getting lists of interact-able game objects, NPCs, etc was fine when it decided to do something that didn't require any real-time input. Sailing, or anything that required it to react to what's on screen was pretty much impossible without more tooling to manage the reacting part for it (e.g. tool to navigate automatically to some location).

runfrook2mo ago

RuneScape is packet based and there are tools for inspecting packets. I wonder if these tools can give some insight to Claude Code.

The only thing is you would need a description of the worlmap on each tick (i.e. where npcs are, where objects are, where players are)

reactordev2mo ago

https://www.reddit.com/r/2007scape/comments/1qeh3nc/i_added_...

https://ubos.tech/mcp/runescape-mcp-server-rs-osrs/

ASpring2mo ago

People have been botting on Runescape since the early 2000s. Obviously not quite at the Claude level :). The botting forums were a group of very active and welcoming communities. This is actually what led me to Java programming and computer science more broadly--I wrote custom scripts for my characters.

I still have some parts of the old Rei-net forum archived on an external somewhere.

ideashower2mo ago

Wouldn't that break Jagex's TOS though? Is there a way of getting caught?

AstroBen2mo ago

I imagine Jagex must be up there with having the most sophisticated bot detection out of anyone. Its been a thing for decades

1 more reply

phreeza2mo ago

Claude Code in dwarf fortress would be wild

rsanek2mo ago

https://www.youtube.com/watch?v=FLmPN03ZQbM

__turbobrew__2mo ago

Given dwarf fortress has an ASCII interface it may actually be a lot easier to set up claude to work with it. Also, a lot of the challenges of dwarf fortress is just knowing all the different mechanics and how they work which is something claude should be good at.

vunderba2mo ago

And it’s (Claude) almost certainly accumulated a fair amount of knowledge about the game itself, given the number of tutorials, guides, and other resources that have been written about DF over the last two decades.

wtetzner2mo ago

Unfortunately it's rendering ASCII characters as sprites using SDL, so it's not really a text interface.

sodafountan2mo ago

This was an interesting application of AI, but I don't really think this is what LLMs excel at. Correct me if I'm wrong.

It was interesting that the poster vibe-coded (I'm assuming) the CTL from scratch; Claude was probably pretty good at doing that, and that task could likely have been completed in an afternoon.

Pairing the CTL with the CLI makes sense, as that's the only way to gain feedback from the game. Claude can't easily do spatial recognition (yet).

A project like this would entirely depend on the game being open source. I've seen some very impressive applications of AI online with closed-source games and entire algorithms dedicated to visual reasoning.

I'm still trying to figure out how this guy: https://www.youtube.com/watch?v=Doec5gxhT_U

Was able to have AI learn to play Mario Kart nearly perfectly. I find his work to be very impressive.

I guess because RCT2 is more data-driven than visually challenging, this solution works well, but having an LLM try to play a racing game sounds like it would be disastrous.

tadfisher2mo ago

Not sure if you clocked this, but the Mario Kart AI is not an LLM. It's a randomized neural net that was trained with reinforcement learning. Apologies if I misread.

sodafountan2mo ago

Yeah, that was the point of my post. LLMs traditionally aren't used in gaming like this.

deadbabe2mo ago

While this seems cool at first, it does not demonstrate superiority over a true custom built AI for rollercoaster tycoon.

It is a curiosity, good for headlines, but the takeaway is if you really need an actual good AI, you are still better off not using an LLM powered solution.

colesantiago2mo ago

> We don't know any C++ at all, and we vibe-coded the entire project over a few weeks.

And these are the same people that put countless engineers through gauntlets of bizarre interview questions and exotic puzzles to hire engineers.

But when it comes to C++ just vibe it obviously.

falloutx2mo ago

Oh, I almost didn't realise this is done by a company. I was like this must have costed a lot, didn't realize its just an advertisement for ramp

equinumerous2mo ago

This is a cool idea. I wanted to do something like this by adding a Lua API to OpenRCT2 that allows you to manipulate and inspect the game world. Then, you could either provide an LLM agent the ability to write and run scripts in the game, or program a more classic AI using the Lua API. This AI would probably perform much better than an LLM - but an interesting experiment nonetheless to see how a language model can fare in a task it was not trained to do.

equinumerous2mo ago

As far as a scripting API, it looks like the devs beat me to it with a JS/TS plugin system: https://github.com/OpenRCT2/OpenRCT2/blob/develop/distributi...

mentos2mo ago

The opening paragraph I thought was the agent prompt haha

> The park rating is climbing. Your flagship coaster is printing money. Guests are happy, for now. But you know what's coming: the inevitable cascade of breakdowns, the trash piling up by the exits, the queue times spiraling out of control.

karanveer2mo ago

the beauty of this game was that it was developed in Assembly Code and on top of that by majorly one person.

I've been trying to locate the dev of this game since a long time, so I can thank them for an amazing experience.

If anyone knows their social or anything, please do share, including OP.

Also, nice work on CC in this. May actually be interested in Claude Code now.

kinduff2mo ago

It's been several times that I see ASCII being used initially for these kinds of problems. I think it's because its counter-intuitive, in the sense that for us humans ASCII is text but we tend to forget spacial awareness.

I find this very interesting of us humans interacting with AIs.

js4ever2mo ago

Most interesting phrase: "Keeping all four agents busy took a lot of mental bandwidth."

neom2mo ago

Wonder how it would do with Myst.

alt2272mo ago

Surely it must have digested plenty of walkthroughs for any game?

A linear puzzle game like that I would just expect the ai to fly through first time, considering it has probably read 30 years of guides and walkthroughs.

singpolyma32mo ago

The real test would be to try it on a new game of the same style and complexity

1 more reply

skybrian2mo ago

Would a way to take screenshots help? It seems to work for browser testing.

joshribakoff2mo ago

I’ve been doing game development and it starts to hallucinate more rapidly when it doesn’t understand things like the direction it placing things or which way the camera is oriented

Gemini models are a little bit better about spatial reasoning, but we’re still not there yet because these models were not designed to do spatial reasoning they were designed to process text

In my development, I also use the ascii matrix technique.

kleene_op2mo ago

Spatial awareness was also a huge limitation to Claude playing pokemon.

It really seems to me that the first AI company getting to implement "spatial awareness" vector tokens and integrating them neatly with the other conventional text, image and sound tokens will be reaping huge rewards. Some are already partnering with robot companies, it's only a matter of time before one of those gets there.

1 more reply

hypercube332mo ago

I disagree. With opus I'll screenshot an app and draw all over it like a child with me paint and paste it into the chat - it seems to reasonably understand what I'm asking with my chicken scratch and dimensions.

As far as 3d I don't have experience however it could be quite awful at that

1 more reply

miohtama2mo ago

They would need a spatial reason or layout specific tool, to translate to English and back

1 more reply

neonmagenta2mo ago

But will Claude pick up complaining guests and put them in a tiny isolated section of the park that only has a bathroom that charges $10 to use?

vermilingua2mo ago

I want to get off MR ALTMANS WILD RIDE.

petcat2mo ago

Question: There is still a competitive AoE2 community. Will that be destroyed by AI?

pbmonster2mo ago

Dota 2 is a real time strategy game with an arguably more complex micro game (but a far simpler macro game than AoE2, but that's far easier for an AI to master), and OpenAI Five completely destroyed the reigning champions. In 2019. Perfect coordination between units, superhuman mechanical skill, perfect consistency.

I see no reason why AoE2 would be any different.

Worth noting that openAI Five was mostly deep reinforcement learning and massive distributed training, it didn't use image to text and an LLM for reasoning about what it sees to make its "decisions". But that wouldn't be a good way to do an AI like that anyway.

Oh, and humans still play Dota. It's still a highly competitive community. So that wasn't destroyed at all, most teams now use AI to study tactics and strategy.

bawolff2mo ago

I suspect the fun is playing against real people and the unexpected things they do. Just because the AI can beat you does not necessarily make it fun. People still play chess despite stock fish existing.

ddtaylor2mo ago

Does this website do anything besides host the article with an animated background?

HelloUsername2mo ago

*OpenRCT2

seu2mo ago

> also completely unfazed by the premise that it has been 'hacked into' a late-90's computer game. This was surprising, but fits with Claude's playful personality and flexible disposition.

When I read things like this, I wonder if it's just me not understanding this brave new world, or half of AI developers are delusional and really believe that they are dealing with a sentient being.

bspammer2mo ago

It can be non-sentient and still have an observable personality. The same way a character in a novel can have a personality despite not being real.

vinyl72mo ago

Delusional

sriram_sun2mo ago

> "Where Claude excels:"

Am I reading a Claude generated summary here?

alt2272mo ago

I thought it sounded more like an ad for Claude written by Anthropic:

> "This was surprising, but fits with Claude's playful personality and flexible disposition."

vidarh2mo ago

This sounds as expected to me as a heavy user of Opus. Claude absolutely has a "personality" that is a lot less formal and more willing to "play along" with more creative tasks than Codex. If you want an agent that's prepared to just jump in, it's a plus. If you want an agent that will be careful, considered and plan things out meticulously, it's not always so great - I feel that when you want Claude to do reptitive, tedious tasks, you need to do more work to prevent it from getting "bored" and try to take shortcuts or find something else to do, for example.

1 more reply

afro882mo ago

Yes I believe so. Also things like forcing a "key insight" summary after the excels vs struggles section.

I would take any descriptions like "comprehensive", "sophisticated" etc with a massive grain of salt. But the nuts and bolts of how it was done should be accurate.

rnmmrnm2mo ago

this is cute but i imagined prompting the ai for a loop-di-loop roller coaster. If this could build complex ride it would be a game changer.

blibble2mo ago

yeah I was expecting it to... do something in the game? like build a ride

not just make up bullshit about events

azhenley2mo ago

Edit: HN's auto-resubmit in action, ignore.

Bluescreenbuddy2mo ago

What

eterm2mo ago

So, this link is actually 5 days old, if you hover the "2 hours ago" you'll see the date 5 days ago.

HN second-chance pool shenanigans.

1 more reply

bawolff2mo ago

Honestly i thought the AI would do better then what is described. RCT is pretty simple when it comes to things like what to set ride price to. I think the game has a straightforward formula for how guests respond to prices.

joshcsimmons2mo ago

Interesting this is on the ramp.com domain? I'm surprised in this tech market they can pay devs to hack on Rollercoaster Tycoon. Maybe there's some crossover I'm missing but seems like a sweet gig honestly.

emeril2mo ago

yeah really - ramp.com is a credit card/expense platform that surely loses money right now...

pretty heavy/slow javascript but pretty functional nonetheless...

mock-possum2mo ago

Why would they be losing money? It’s what we use for tracking expenses and getting comped for travel, meals, software licenses etc - works great in my experience. I can click a few buttons and get a new business expense card spun up in less than a minute, use it to make a purchase, get approval and have the funds transferred. Boom easy.

Do you not think they’re charging enough or something?

ulf-777232mo ago

This is brilliant SEO work, I doubt that they loose money with it. With 40h and some additional for the landingpage it might be an expensive link bait, but definitely worth it. Kudos!

If not for SEO, it’s building quite a good reputation for this company, they got a lot of open positions.

I’m a big fan of transport tycoon, used to play it for hours as a kid and with Open Transport Tycoon it also might have been a good choice, but maybe not B2C?

fuzzy_lumpkins2mo ago

so the janitors will finally stay on their assigned footpaths?

nacozarina2mo ago

next up: Crusader Kings III

mcphage2mo ago

> You’re right, I did accidentally slaughter all the residents of Béziers. I won’t do that again. But I think that you’ll find God knows his own.

Forgeties792mo ago

Paradox future hire right here

Deukhoofd2mo ago

Crusader Kings is a franchise I really could see LLMs shine. One of the current main criticisms on the game is that there's a lack of events, and that they often don't really feel relevant to your character.

An LLM could potentially make events far more aimed at your character, and could actually respond to things happening in the world far more than what the game currently does. It could really create some cool emerging gameplay.

Braini2mo ago

In general you are right, I expect something like this to appear in the future and it would be cool.

But isn't the criticism rather that there are too many (as you say repetitive, not relevant) events - its not like there are cool stories emerging from the underlying game mechanics anymore ("grand strategy") but players have to click through these boring predetermined events again and again.

1 more reply

Kapura2mo ago

"i vibe coded a thing to play video games for me"

i enjoy playing video games my own self. separately, i enjoy writing code for video games. i don't need ai for either of these things.

gordonhart2mo ago

Yeah, but can you use your enjoyment of video games as marketing material to justify a $32B valuation?

falloutx2mo ago

If you look at submissions from this website, its all just self glazing and "We did X with claude code"

yawnr2mo ago

Haha exactly. This screams “we have too many people working here and don’t know what to do with them”.

Jaysobel2mo ago

actually it was all to drive traffic to my 'rollercoaster coasters' Etsy store

https://bansostudio.etsy.com

TaupeRanger2mo ago

^ this guy funds

1 more reply

bigyabai2mo ago

That's fine. Tool-assisted speedruns long predate LLMs and they're boring as hell: https://youtu.be/W-MrhVPEqRo

It's still a neat perspective on how to optimize for super-specific constraints.

ai_2mo ago

That TAS is spliced. The stairs beyond the door aren't loaded, you need the key to load it.

This is a real console 0-star TAS: https://youtu.be/iUt840BUOYA

throwaway3141552mo ago

> Tool-assisted speedruns long predate LLMs and they're boring as hell

You and I have _very_ different definitions for the word boring. A lot of effort goes into TAS runs.

rangestransform2mo ago

I actually think it would be pretty fun to code something to play video games for me, it has a lot of overlap with robotics. Separately, I learned about assembly from cheat engine when I was a kid.

markbao2mo ago

That’s not the point of this. This was an exercise to measure the strengths and weaknesses of current LLMs in operating a company and managing operations, and the video game was just the simulation engine.

echelon2mo ago

You do you. I find this exceedingly cool and I think it's a fun new thing to do.

It's kind of like how people started watching Let's Plays and that turned into Twitch.

One of the coolest things recently is VTubers in mocap suits using AI performers to do single person improv performances with. It's wild and cool as hell. A single performer creating a vast fantasy world full of characters.

LLMs and agents playing Pokemon and StarCraft? Also a ton of fun.

1 more reply

jsbisviewtiful2mo ago

AI for the sake of AI. Feels like a lot of the internet right now

j / k navigate · click thread line to collapse

288 comments

ninkendo2mo ago

I’ve always found it crazy that my LLM has access to such terrible tools compared to mine.

It’s left with grepping for function signatures, sending diffs for patching, and running `cat` to read all the code at once.

nbardy2mo ago

> Claude Code officially added native support for the Language Server Protocol (LSP) in version 2.0.74, released in December 2025.

I think from training it's still biased towards simple tooling.

If you watch the tool calls you'll see they write a ton of one off small python programs to test, validate explore, etc...

frumplestlatz2mo ago

> but for LLM's they can instantly compose the low level tools for their use case and learn to generalize

Just like humans, LLMs benefit from automating the things they need to do repeatedly so that they can reserve their computational capacity for much more interesting problems.

LLMs benefit from an IDE just like humans do -- even if an "IDE" for them looks very different. The difference is night and day. They produce vastly better code, faster.

[1] And by "I've written", I mean I had an LLM do it.

forty2mo ago

Note that the Claude code LSP integration was actually broken for a while after it was released, so make sure you have a very recent version if you want to try it out.

However as parent comment said, it seems to always grep instead, unless explicitly said to use the LSP tool.

cududa2mo ago

1 more reply

cm21872mo ago

We are back to RISC vs CISC!

1 more reply

KronisLV2mo ago

I am so surprised that all of the AI tooling mostly revolves around VSC or its forks and that JetBrains seem to not really have done anything revolutionary in the space.

With how good their refactoring and code inspection tools are, you’d really think they’d pass of that context information to AI models and that they’d be leaps and bounds ahead.

harikb2mo ago

Recently, all these agents can talk LSP (language server protocol) so it should get better soon. That said, yeah they don't seem to default to use `ripgrep` when that is clearly better than `grep`

2 more replies

eek21212mo ago

Are you? I'm not surprised at all, considering that the biggest investment juggernaut in AI is also the author of VSC. I wonder what the connection is? ;)

1 more reply

penneyd2mo ago

Agreed - this seems like a no brainer, surely this is something that is being worked on.

htrp2mo ago

Jetbrains is trying but I feel like they're very very behind in the space

epicureanideal2mo ago

Claude and other LLMs can be used through JetBrains, and the IDE provides a significantly better experience than VS Code in my opinion.

PlatoIsADisease2mo ago

mulmboy2mo ago

To provide it access to refactoring as a tool also risks confusing it via too many tools.

It's the same reason that waffling for a few minutes via speech to text with tangents and corrections and chaos is just about as good as a carefully written prompt for coding agents.

fragmede2mo ago

Anthropic, for one.

> Added LSP (Language Server Protocol) tool for code intelligence features like go-to-definition, find references, and hover documentation

https://github.com/anthropics/claude-code/blob/main/CHANGELO...

novaleaf2mo ago

their c# LSP theoretically worked for a week or so (I never saw it in action though), but now it always errors on launch :(

1 more reply

hippo222mo ago

If you can read fast enough, grepping is probably faster than waiting for a compiler to tell you anything.

gf0002mo ago

2 more replies

fancy_pantser2mo ago

Zed Editor gives the LLM tools that use the LSP as you'd expect as a normal IDE user, like "go to symbol definition" so it greps a lot less.

selcuka2mo ago

JetBrain IDEs come with an MCP server that supports some refactoring tools [1]:

[1] https://www.jetbrains.com/help/idea/mcp-server.html#supporte...

ricw2mo ago

Tidewave.ai does exactly that. It’s made Claude code so much more functional. It provides mcp servers to

It’s basically the ide for your LLM client. It really closes the loop and has made Claude and myself so much more productive. Highly recommended and cheap at $10/month

Ps: my personal opinion. I have Zero affiliation with them

Wowfunhappy2mo ago

So cat, ripgrep, etc are the right tools for them. They need a command line, not a GUI.

1: Maybe you'd argue that Nano Banana is pretty good. But would you say its prompt adherence is good enough to produce, say, a working Scratch program?

kelipso2mo ago

1 more reply

JimDabell2mo ago

Kit looks like a good step in this direction:

https://github.com/cased/kit

girvo2mo ago

You can give agents the ability to check VSCode Diagnostics, LSP servers and the like.

But they constantly ignore them and use their base CLI tools instead, it drives me batty. No matter what I put in AGENTS.md or similar, they always just ignore the more advanced tooling IME.

worksonmine2mo ago

Doesn't have to be a bad thing, not all languages have good LSP support. If the AI can optimize for simple cross-language tools it won't be as dependent on the LSP implementation.

I used grep and simple ctags to program in vanilla vim for years. It can be more useful than you'd think. I do like the LSP in Neovim and use it a lot, but I don't need it.

1 more reply

hahahahhaah2mo ago

An LSP MCP?

ninkendo2mo ago

Yeah, or something even smarter than that.

1 more reply

rudedogg2mo ago

LSP also kind of sucks. But the problem is all the big companies want big valuations, so they only chase generic solutions. That's why everything is a VS Code clone, etc..

https://paulgraham.com/ds.html

dexwiz2mo ago

I've never used an LSP plugin half as good as a JetBrains IDE.

immibis2mo ago

Always wondered what happened to the era of IDEs actually knowing the language you're using.

ramraj072mo ago

eru2mo ago

Well, jump to definition isn't exactly complicated?

And you can use whatever interface the language servers already use to expose that functionality to eg vscode?

1 more reply

elif2mo ago

Surely there is an embedding for emacs giving it full elisp control

BryantD2mo ago

This isn’t completely the answer to what you want but skills do open a lot of doors here. Anything you can do on a command line can turn into a skill, after all.

karlgkk2mo ago

I’ve been saying this for a while. CPU demand is about to go through the roof.

I think about it, to get these tools to be most effective you have to be able to page things in and out of their context windows.

What was once a couple of queries is now gonna be dozens or hundreds or even more from the LLM

For code that means querying the AST and query it in a way that allows you to limit the results of the output

I wonder which SAST vendor Anthropic will buy.

throwawaygo2mo ago

Workin on it

Jaysobel2mo ago

Author here - some bonus links!

Session transcript using Simon Willison's claude-code-transcripts

https://htmlpreview.github.io/?https://gist.githubuserconten...

Reddit post

https://www.reddit.com/r/ClaudeAI/comments/1q9fen5/claude_co...

OpenRCT2!!

https://github.com/jaysobel/OpenRCT2

Project repo

https://github.com/jaysobel/OpenRCT2

theptip2mo ago

cheema332mo ago

1 more reply

Jaysobel2mo ago

nanapipirara2mo ago

deepl_y2mo ago

cheschire2mo ago

Did you intend the last link to link to your project? It’s a copy of the OpenRCT2 project.

philipwhiuk2mo ago

I think the first one should have been https://github.com/OpenRCT2/OpenRCT2

The actual change to implement CC is: https://github.com/jaysobel/OpenRCT2/commit/5d49dc960fcfc133...

fragmede2mo ago

> Claude is at a pretty steep visuo-spatial disadvantage,

How hard would it be to use with OpenAI's offerings instead? Particularly, imo, OpenAI's better at "looking" at pictures than Claude.

rashidae2mo ago

hk__22mo ago

> The only other notable setback was an accidental use of the word "revert" which Codex took literally, and ran git revert on a file where 1-2 hours of progress had been accumulating.

qaboutthat2mo ago

mh-2mo ago

I haven't tried it, but theoretically one could use Claude Code's hooks facility to enforce committing at some determined thresholds.

1 more reply

_flux2mo ago

Amazing that these tools don't maintain a replayable log of everything they've done.

Although git revert is not a destructive operation, so it's surprising that it caused any loss of data. Maybe they meant git reset --hard or something like that. Wild if Codec would run that.

arcanemachiner2mo ago

I was looking at the insanity known as Gas Town [0] the other day, and it does use Git to store historical work state in something it calls "beads":

https://github.com/steveyegge/gastown?tab=readme-ov-file

2 more replies

rabf2mo ago

theptip2mo ago

Claude Code has had this feature for a few months now.

CPLX2mo ago

I found this tool to be the solution I was looking for to address this specific problem:

https://contextify.sh

MattGaiser2mo ago

Claude Code has /rewind. Not sure if it is foolproof, but this has been tried.

2 more replies

defunct342mo ago

stinkbeetle2mo ago

`git reset --hard` doesn't remove unreferenced commits or rewrite the reflog so I don't think that would do it. Something like `git reset && git gc` would have to be done.

2 more replies

alt2272mo ago

I wonder how they accidentaly used a word like that.

gbear6052mo ago

“Please revert that last change you did”, referring to like a smaller change that had just been done

GardenLetter272mo ago

Codex reverted kindly.

esafak2mo ago

Does Codex not let you set command permissions?

legojoey172mo ago

Filligree2mo ago

Yet another reason to use Jujutsu. And put a `jj status` wrapper in your PS1. ;-)

westurner2mo ago

Then, `git notes` is better for signature metadata because it doesn't change the commit hash to add signatures for the commit.

And then, you'd need to run a local Rekor log to use Sigstore attestations on every commit.

Sigstore.dev is SLSA.dev compliant.

Sigstore grants short-lived release attestation signing keys for CI builds on a build farm to sign artifacts with.

So, when jujutsu autocommits agent-generated code, what causes there to be an {{AGENT_ID}} in the commit message or git notes? And what stops a user from forging such attestations?

1 more reply

diath2mo ago

> Yet another reason to use Jujutsu

And what would that reason be? You can git revert a git revert.

2 more replies

glemion432mo ago

It's not going to happen...

Stop spamming

2 more replies

pocketarc2mo ago

I love the interview at the end of the video. The kubectl-inspired CLI, and the feedback for improvements from Claude, as well as the alerts/segmentation feedback.

You could take those, make the tools better, and repeat the experience, and I'd love to see how much better the run would go.

wonnage2mo ago

pilord3142mo ago

I've been playing around with an AI generated knowledge base to grok our code base, I think you need good metrics on how the knowledge base is used. A few things is:

1. Being systematic. Having a system for adding, improving and maintaining the knoweldge base 2. Having feedback for that system 3. Implementing the feedback into a better system

Any garden has weeds when you lay down fertile soil.

Sometimes they aren't weeds though, and that's where having a person in the driver's seat is a boon.

mcintyre19942mo ago

lukebechtel2mo ago

> We don't know any C++ at all, and we vibe-coded the entire project over a few weeks. The core pieces of the build are…

what a world!

AndrewKemendo2mo ago

I would’ve walked for days to a CompUSA and spent my life savings if there was anything remotely equivalent to this when I was learning C on my Macintosh 4400 in 1997

People don’t appreciate what they have

lifetimerubyist2mo ago

It’s worse. They’re proud they don’t know.

2 more replies

imiric2mo ago

Did you actually learn C? Be thankful nothing like this existed in 1997.

A machine generating code you don't understand is not the way to learn a programming language. It's a way to create software without programming.

5 more replies

yoyohello132mo ago

Everyone should read that section. It was really interesting reading about their experiences/challenges getting it all working.

falloutx2mo ago

First time I am seeing realistic timelines from a vibe-coded project. Usually everyone who vibe codes just says they did in few hours, no matter the project.

ben_w2mo ago

(That, and that vibe coding in the sense of "no code review" are prone to balls of mud, so you need to be above average at project management to avoid that after a few sprint-equivalents of output).

Aurornis2mo ago

For real work, that phase is like starting from a template or a boilerplate repo. The real work begins after the basics are wired together.

fnordpiglet2mo ago

Sharlin2mo ago

cyanydeez2mo ago

I was told the important part of AI is the generation part, not the verification or quality.

nipponese2mo ago

> kept the context above the ~60% remaining level where coding models perform at their absolute best

Maybe this is obvious to Claude users but how do you know your remaining context level? There is UI for this?

adithyareddy2mo ago

You can also show context in the statusline within claude code: https://code.claude.com/docs/en/statusline#context-window-us...

nipponese2mo ago

Follow up Q: what are you supposed to do when the context becomes too large? Start a new conversation/context window and let Claude start from scratch?

6 more replies

d4rkp4ttern2mo ago

https://github.com/pchalasani/claude-code-tools?tab=readme-o...

neilfrndes2mo ago

Claude code has a /context command.

MattGaiser2mo ago

/context

margorczynski2mo ago

I think something like Civilization would be better because:

1) The map is a grid

2) Turn based

maxall42mo ago

> In this article we'll tell you why we decided to put Claude Code into RollerCoaster Tycoon, and what lessons it taught us about B2B SaaS.

What is this? A LinkedIn post?

mcintyre19942mo ago

From the transcript: https://htmlpreview.github.io/?https://gist.githubuserconten... :)

haunter2mo ago

This is what I want but for PoE/PoE2 builds. I always get a headache just looking at the passive tree https://poe.ninja/poe2/passive-skill-tree

TaupeRanger2mo ago

khoury2mo ago

Can't wait for someone to let Claude control a runescape character from scratch

itsgrimetime2mo ago

runfrook2mo ago

RuneScape is packet based and there are tools for inspecting packets. I wonder if these tools can give some insight to Claude Code.

The only thing is you would need a description of the worlmap on each tick (i.e. where npcs are, where objects are, where players are)

reactordev2mo ago

https://www.reddit.com/r/2007scape/comments/1qeh3nc/i_added_...

https://ubos.tech/mcp/runescape-mcp-server-rs-osrs/

ASpring2mo ago

I still have some parts of the old Rei-net forum archived on an external somewhere.

ideashower2mo ago

Wouldn't that break Jagex's TOS though? Is there a way of getting caught?

AstroBen2mo ago

I imagine Jagex must be up there with having the most sophisticated bot detection out of anyone. Its been a thing for decades

1 more reply

phreeza2mo ago

Claude Code in dwarf fortress would be wild

rsanek2mo ago

https://www.youtube.com/watch?v=FLmPN03ZQbM

__turbobrew__2mo ago

vunderba2mo ago

wtetzner2mo ago

Unfortunately it's rendering ASCII characters as sprites using SDL, so it's not really a text interface.

sodafountan2mo ago

This was an interesting application of AI, but I don't really think this is what LLMs excel at. Correct me if I'm wrong.

It was interesting that the poster vibe-coded (I'm assuming) the CTL from scratch; Claude was probably pretty good at doing that, and that task could likely have been completed in an afternoon.

Pairing the CTL with the CLI makes sense, as that's the only way to gain feedback from the game. Claude can't easily do spatial recognition (yet).

I'm still trying to figure out how this guy: https://www.youtube.com/watch?v=Doec5gxhT_U

Was able to have AI learn to play Mario Kart nearly perfectly. I find his work to be very impressive.

I guess because RCT2 is more data-driven than visually challenging, this solution works well, but having an LLM try to play a racing game sounds like it would be disastrous.

tadfisher2mo ago

Not sure if you clocked this, but the Mario Kart AI is not an LLM. It's a randomized neural net that was trained with reinforcement learning. Apologies if I misread.

sodafountan2mo ago

Yeah, that was the point of my post. LLMs traditionally aren't used in gaming like this.

deadbabe2mo ago

While this seems cool at first, it does not demonstrate superiority over a true custom built AI for rollercoaster tycoon.

It is a curiosity, good for headlines, but the takeaway is if you really need an actual good AI, you are still better off not using an LLM powered solution.

colesantiago2mo ago

> We don't know any C++ at all, and we vibe-coded the entire project over a few weeks.

And these are the same people that put countless engineers through gauntlets of bizarre interview questions and exotic puzzles to hire engineers.

But when it comes to C++ just vibe it obviously.

falloutx2mo ago

Oh, I almost didn't realise this is done by a company. I was like this must have costed a lot, didn't realize its just an advertisement for ramp

equinumerous2mo ago

As far as a scripting API, it looks like the devs beat me to it with a JS/TS plugin system: https://github.com/OpenRCT2/OpenRCT2/blob/develop/distributi...

mentos2mo ago

The opening paragraph I thought was the agent prompt haha

karanveer2mo ago

the beauty of this game was that it was developed in Assembly Code and on top of that by majorly one person.

I've been trying to locate the dev of this game since a long time, so I can thank them for an amazing experience.

If anyone knows their social or anything, please do share, including OP.

Also, nice work on CC in this. May actually be interested in Claude Code now.

kinduff2mo ago

I find this very interesting of us humans interacting with AIs.

js4ever2mo ago

Most interesting phrase: "Keeping all four agents busy took a lot of mental bandwidth."

neom2mo ago

Wonder how it would do with Myst.

alt2272mo ago

Surely it must have digested plenty of walkthroughs for any game?

A linear puzzle game like that I would just expect the ai to fly through first time, considering it has probably read 30 years of guides and walkthroughs.

singpolyma32mo ago

The real test would be to try it on a new game of the same style and complexity

1 more reply

skybrian2mo ago

Would a way to take screenshots help? It seems to work for browser testing.

joshribakoff2mo ago

I’ve been doing game development and it starts to hallucinate more rapidly when it doesn’t understand things like the direction it placing things or which way the camera is oriented

Gemini models are a little bit better about spatial reasoning, but we’re still not there yet because these models were not designed to do spatial reasoning they were designed to process text

In my development, I also use the ascii matrix technique.

kleene_op2mo ago

Spatial awareness was also a huge limitation to Claude playing pokemon.

1 more reply

hypercube332mo ago

As far as 3d I don't have experience however it could be quite awful at that

1 more reply

miohtama2mo ago

They would need a spatial reason or layout specific tool, to translate to English and back

1 more reply

neonmagenta2mo ago

But will Claude pick up complaining guests and put them in a tiny isolated section of the park that only has a bathroom that charges $10 to use?

vermilingua2mo ago

I want to get off MR ALTMANS WILD RIDE.

petcat2mo ago

Question: There is still a competitive AoE2 community. Will that be destroyed by AI?

pbmonster2mo ago

I see no reason why AoE2 would be any different.

Oh, and humans still play Dota. It's still a highly competitive community. So that wasn't destroyed at all, most teams now use AI to study tactics and strategy.

bawolff2mo ago

ddtaylor2mo ago

Does this website do anything besides host the article with an animated background?

HelloUsername2mo ago

*OpenRCT2

seu2mo ago

> also completely unfazed by the premise that it has been 'hacked into' a late-90's computer game. This was surprising, but fits with Claude's playful personality and flexible disposition.

When I read things like this, I wonder if it's just me not understanding this brave new world, or half of AI developers are delusional and really believe that they are dealing with a sentient being.

bspammer2mo ago

It can be non-sentient and still have an observable personality. The same way a character in a novel can have a personality despite not being real.

vinyl72mo ago

Delusional

sriram_sun2mo ago

> "Where Claude excels:"

Am I reading a Claude generated summary here?

alt2272mo ago

I thought it sounded more like an ad for Claude written by Anthropic:

> "This was surprising, but fits with Claude's playful personality and flexible disposition."

vidarh2mo ago

1 more reply

afro882mo ago

Yes I believe so. Also things like forcing a "key insight" summary after the excels vs struggles section.

I would take any descriptions like "comprehensive", "sophisticated" etc with a massive grain of salt. But the nuts and bolts of how it was done should be accurate.

rnmmrnm2mo ago

this is cute but i imagined prompting the ai for a loop-di-loop roller coaster. If this could build complex ride it would be a game changer.

blibble2mo ago

yeah I was expecting it to... do something in the game? like build a ride

not just make up bullshit about events

azhenley2mo ago

Edit: HN's auto-resubmit in action, ignore.

Bluescreenbuddy2mo ago

What

eterm2mo ago

So, this link is actually 5 days old, if you hover the "2 hours ago" you'll see the date 5 days ago.

HN second-chance pool shenanigans.

1 more reply

bawolff2mo ago

joshcsimmons2mo ago

emeril2mo ago

yeah really - ramp.com is a credit card/expense platform that surely loses money right now...

pretty heavy/slow javascript but pretty functional nonetheless...

mock-possum2mo ago

Do you not think they’re charging enough or something?

ulf-777232mo ago

This is brilliant SEO work, I doubt that they loose money with it. With 40h and some additional for the landingpage it might be an expensive link bait, but definitely worth it. Kudos!

If not for SEO, it’s building quite a good reputation for this company, they got a lot of open positions.

I’m a big fan of transport tycoon, used to play it for hours as a kid and with Open Transport Tycoon it also might have been a good choice, but maybe not B2C?

fuzzy_lumpkins2mo ago

so the janitors will finally stay on their assigned footpaths?

nacozarina2mo ago

next up: Crusader Kings III

mcphage2mo ago

> You’re right, I did accidentally slaughter all the residents of Béziers. I won’t do that again. But I think that you’ll find God knows his own.

Forgeties792mo ago

Paradox future hire right here

Deukhoofd2mo ago

Braini2mo ago

In general you are right, I expect something like this to appear in the future and it would be cool.

1 more reply

Kapura2mo ago

"i vibe coded a thing to play video games for me"

i enjoy playing video games my own self. separately, i enjoy writing code for video games. i don't need ai for either of these things.

gordonhart2mo ago

Yeah, but can you use your enjoyment of video games as marketing material to justify a $32B valuation?

falloutx2mo ago

If you look at submissions from this website, its all just self glazing and "We did X with claude code"

yawnr2mo ago

Haha exactly. This screams “we have too many people working here and don’t know what to do with them”.

Jaysobel2mo ago

actually it was all to drive traffic to my 'rollercoaster coasters' Etsy store

https://bansostudio.etsy.com

TaupeRanger2mo ago

^ this guy funds

1 more reply

bigyabai2mo ago

That's fine. Tool-assisted speedruns long predate LLMs and they're boring as hell: https://youtu.be/W-MrhVPEqRo

It's still a neat perspective on how to optimize for super-specific constraints.

ai_2mo ago

That TAS is spliced. The stairs beyond the door aren't loaded, you need the key to load it.

This is a real console 0-star TAS: https://youtu.be/iUt840BUOYA

throwaway3141552mo ago

> Tool-assisted speedruns long predate LLMs and they're boring as hell

You and I have _very_ different definitions for the word boring. A lot of effort goes into TAS runs.

rangestransform2mo ago

I actually think it would be pretty fun to code something to play video games for me, it has a lot of overlap with robotics. Separately, I learned about assembly from cheat engine when I was a kid.

markbao2mo ago

echelon2mo ago

You do you. I find this exceedingly cool and I think it's a fun new thing to do.

It's kind of like how people started watching Let's Plays and that turned into Twitch.

LLMs and agents playing Pokemon and StarCraft? Also a ton of fun.

1 more reply

jsbisviewtiful2mo ago

AI for the sake of AI. Feels like a lot of the internet right now

j / k navigate · click thread line to collapse