Coding agent in 94 lines of Ruby

(radanskoric.com)

153 points · radanskoric · 1y ago · 84 comments

38 comments · 12 top-level

RangerScience1y ago· 5 in thread

This is very cool, somewhat inspiring, and (personally) very informative: I didn't actually know what "agentic" AI use was, but this did an excellent job (incidentally!) explaining it.

Might poke around...

What makes something a good potential tool, if the shell command can (technically) do anything - like running tests?

(or is it just the things requiring user permission vs not?)

radanskoricOP1y ago

Thanks, sharing my learnings on how coding agents work was my main intention with the article. Personally I was a bit surprised by how much of the "magic" is coming directly from the underlying LLM.

The shell command can run anything really. When I tested it, it asked me multiple times to run the tests and then I could see it fixing the tests in iterations. Very interesting to observe.

If I were to improve this to be a better Ruby agent (which I don't plan to do, at least not yet), I would probably try adding some RSpec/Minitest-specific tools that would parse the test output and present it back to the LLM in a cleaned-up format.
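A minimal sketch of what such a cleanup step could look like, assuming Minitest's default plain-text reporter. The helper name `summarize_test_output` and the shape of the returned hash are hypothetical and not part of the article's agent; it only picks up failures that are reported with a `[file:line]` location.

```ruby
# Hypothetical helper: condenses Minitest-style output into a small hash
# that is cheaper for an LLM to read than the raw terminal output.

# Matches the final summary line, e.g. "2 runs, 2 assertions, 1 failures, 0 errors, 0 skips".
SUMMARY_LINE = /(\d+) runs, (\d+) assertions, (\d+) failures, (\d+) errors, (\d+) skips/

# Matches failure headers such as "CalculatorTest#test_adds [test/calculator_test.rb:12]:".
FAILURE_HEADER = /^(\S+#\S+) \[([^\]]+)\]:$/

def summarize_test_output(raw)
  summary = raw.match(SUMMARY_LINE)
  return { error: "No test summary line found" } unless summary

  runs, assertions, failures, errors, skips = summary.captures.map(&:to_i)

  # Collect the name and location of every failing test that has a location.
  failing = raw.scan(FAILURE_HEADER).map do |test, location|
    { test: test, location: location }
  end

  { runs: runs, assertions: assertions, failures: failures,
    errors: errors, skips: skips, failing_tests: failing }
end
```

A tool built around this would run the test command, pass the captured output through the helper, and hand only the resulting hash back to the model.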

RangerScience1y ago

Do you know of examples of other agents with more defined tools, to use as inspiration/etc?

(Like - what would it look like to clean up test results for an LLM?)

elif1y ago

Why stop there? Give it a capybara tool and make it a full TDD agent

tough1y ago

> What makes something a good potential tool, if the shell command can (technically) do anything - like running tests?

Think of it as -semantic- wrappers so the LLM can -decide- what action to take at any given moment given its context, the user prompt, and available tools names and descriptions.

Creating wrappers for the most used basic tools can be useful, even if they all pipe to terminal Unix commands.

Also giving it a specific knowledge base it can consult on demand, like a wiki of its own stack, etc.
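The "semantic wrappers" idea can be sketched in plain Ruby, without any LLM library: each tool is a name, a description the model gets to read, and a narrow handler. The `Tool` struct, the `TOOLS` registry, and the `tool_catalog` and `dispatch` helpers are all hypothetical names for illustration; this is not how RubyLLM or the article's agent is implemented.

```ruby
# Hypothetical registry: each tool has a name, a description for the model,
# and a handler that does one narrow thing.
Tool = Struct.new(:name, :description, :handler)

TOOLS = [
  Tool.new("list_files", "List the entries of a directory",
           ->(args) { Dir.children(args.fetch("path")).sort }),
  Tool.new("read_file", "Read the contents of a file",
           ->(args) { File.read(args.fetch("path")) })
].to_h { |tool| [tool.name, tool] }

# What the model would be shown: names and descriptions only.
def tool_catalog
  TOOLS.values.map { |tool| { name: tool.name, description: tool.description } }
end

# Runs the tool the model asked for; anything outside the registry is refused,
# and failures are returned as data instead of crashing the agent.
def dispatch(name, args)
  tool = TOOLS[name]
  return { error: "Unknown tool: #{name}" } unless tool

  tool.handler.call(args)
rescue StandardError => e
  { error: e.message }
end
```

Because the model can only pick from `tool_catalog`, the names and descriptions do the steering, and a request for anything else comes back as an error hash.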

notpushkin1y ago

Also it’s safer than just giving unrestricted shell access to an LLM.

johnisgood1y ago· 5 in thread

Side-note: I do not understand the inclusion of "N lines of X". You import a library, which presumably consists of many lines. I do not see the point. It would be true that this is only 94 lines of Ruby if and only if there was no `require "ruby_llm/tool"` at the top.

radanskoricOP1y ago

I put the lines of code into the title to communicate to the reader that they can get a good understanding just by reading this article.

Basically, what I wanted to say was: "Here is an article on building a prototype coding agent in Ruby that explains how it works and the code is just 94 lines so you'll really be able to get a good understanding just by reading this article."

But that's a bit too long for a title. :)

When understanding a certain concept, it's very useful to be able to see just the code that's relevant to the concept. Ruby language design enables that really well. Also, Ruby community in general puts a lot of value on readability. Which is why with Ruby it's often possible to eliminate almost all of the boilerplate while still keeping the code relatively flexible.

zoky1y ago

That actually is exactly the point. It has to do with the expressiveness of the language as well as how much you can do with the available toolset. If I showed you a 200-line program to play hangman written in C and a 2000-line equivalent program written in assembly, it wouldn’t really be useful to take into account the 15 million lines of code in the C compiler when trying to compare the two languages.

johnisgood1y ago

I do not think it is meaningful at all. If you have such a library in C, or Common Lisp, or Forth, then using that library is probably always going to be just a few lines of code. The library just has to have a good enough API.

monooso1y ago

Given that this post is a response to an article about achieving the same in "N lines of Go" (also using a library), it seems like an appropriate title.

johnisgood1y ago

The original post uses "github.com/anthropics/anthropic-sdk-go", the Ruby uses a different library, does it not? If they are two different libraries, then the comparison does not make too much sense.

Mystery-Machine1y ago· 4 in thread

Just out of curiosity, I never understood why people do `ENV.fetch("ANTHROPIC_API_KEY", nil)` which is the equivalent of `ENV["ANTHROPIC_API_KEY"]`. I thought the whole point of calling `.fetch` was to "fail fast". Instead of assigning `nil` as default and having `NoMethodError: undefined method 'xxx' for nil` somewhere random down the line, you could fail on the actual line where a required (not optional) ENV var wasn't found. Can someone please explain?

jaredsohn1y ago

There might be code later that says that if the anthropic api key is not set, then turn off the LLM feature. Wouldn't make sense for this LLM-related code but the concept makes sense for using various APIs from dev.

riffraff1y ago

But if you do ENV[xxx] the value is also set to nil.

Using .fetch with a default of nil is what's arguably not very useful.

IMO it's just a RuboCop rule to use .fetch, which is useful in general for exploding on missing configuration but not useful if a missing value is handled.
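The three lookup styles discussed in this thread can be compared side by side. This is a small sketch; the helper `lookup_styles` and the variable name `EXAMPLE_AGENT_API_KEY` are made up for illustration.

```ruby
# Compares the three ways of reading an environment variable.
def lookup_styles(name)
  {
    # Returns nil when the variable is missing.
    brackets: ENV[name],
    # Also returns nil when missing: equivalent to ENV[name].
    fetch_with_nil: ENV.fetch(name, nil),
    # Raises KeyError when missing, so the failure happens on this line.
    fetch_strict: begin
      ENV.fetch(name)
    rescue KeyError => e
      "raised #{e.class}"
    end
  }
end

ENV.delete("EXAMPLE_AGENT_API_KEY") # hypothetical variable name
p lookup_styles("EXAMPLE_AGENT_API_KEY")
```

Only the one-argument `ENV.fetch(name)` fails fast; the other two defer the problem to wherever the `nil` is first used.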

radanskoricOP1y ago

Author here. You're actually right here.

I took the code from RubyLLM configuration documentation. If you're pulling in a lot of config options and some have default values then there's value in symmetry. Using fetch with nil communicates clearly "This config, unlike those others, has no default value". But in my case, that benefit is not there so I think I'll change it to your suggestion when I touch the code again.

zeckalpha1y ago

This may be a Pythonism, where the exception raising convention is inverse of Ruby's.

{}["key"] # KeyError in Python

elif1y ago· 3 in thread

Thank you for showing off why Ruby is useful not just in the current year, but particularly in the current AI situation. When you're dealing with code written with hallucinations, you want a language that is quick and easy to understand (for which Ruby is S tier), where out-of-place behavior cannot hide in code so repetitive and unnecessary that your mind tries to skip over it.

radanskoricOP1y ago

That's an excellent point.

Code was always read more than written. With AI it shifts even more towards reading so language readability becomes even more important. And Ruby really shines there.

dontlaugh1y ago

Ruby, the language famous for lots of difficult to understand runtime “magic”?

radanskoricOP1y ago

It's a sharp knife. You can create a messy nightmare or a clean super readable codebase, it's up to how good the author is.

thih91y ago· 2 in thread

> Claude is trained to recognise the tool format and to respond in a specific format.

Does that mean that it wouldn’t work with other LLMs?

E.g. I run Qwen3-14B locally; would that or any other model similar in size work?

radanskoricOP1y ago

It would work with most other Tool enabled LLMs. RubyLLM abstracts away the format. Some will work better than the others, depending on the provider, but almost all have tool support.

Claude is just an example. I pulled the actual payloads by looking at what is actually being sent to Claude and what it responds with. It might vary slightly for other providers. I used Claude because I already had a key ready from trying it out before.
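To make "abstracts away the format" concrete, here is a rough sketch of how one neutral tool description could be turned into two provider-specific shapes. The shapes follow the publicly documented tool definitions of the Anthropic Messages API and OpenAI-style function calling as best understood here; the `READ_FILE` hash and the helper names are hypothetical, and this does not claim to be what RubyLLM does internally.

```ruby
require "json"

# Neutral, hypothetical description of a single tool.
READ_FILE = {
  name: "read_file",
  description: "Read the contents of a file at a relative path",
  params: { path: { type: "string", description: "Relative path to the file" } }
}.freeze

# Both providers describe tool arguments with a JSON Schema object.
def json_schema(tool)
  { type: "object", properties: tool[:params], required: tool[:params].keys.map(&:to_s) }
end

# Anthropic-style tool definition: the schema lives under input_schema.
def anthropic_tool(tool)
  { name: tool[:name], description: tool[:description], input_schema: json_schema(tool) }
end

# OpenAI-style tool definition: the schema lives under function.parameters.
def openai_tool(tool)
  { type: "function",
    function: { name: tool[:name], description: tool[:description], parameters: json_schema(tool) } }
end

puts JSON.pretty_generate(anthropic_tool(READ_FILE))
puts JSON.pretty_generate(openai_tool(READ_FILE))
```

The tool itself is identical in both cases; only the envelope around the JSON Schema differs, which is the part a wrapper library can hide.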

simonw1y ago

Qwen3 was trained for tool usage too. Most models are these days.

https://qwenlm.github.io/blog/qwen3/#agentic-usages

thih91y ago· 2 in thread

> return { error: "User declined to execute the command" }

I wonder if AIs that receive this information within their prompt might try to change the user’s mind as part of reaching their objective. Perhaps even in a dishonest way.

To be safe I’d write “error: Command cannot be executed at the time”, or “error: Authentication failure”. Unless you control the training set; or don’t care about the result.

Interesting times.

radanskoricOP1y ago

If a certain user is susceptible to having the LLM convince them to run an unsafe command, I fear we can't fix that by trying to trick the LLM. :D

Either the user needs to be educated or we need to restrict what the user themselves can do.

johnisgood1y ago

I am leaning towards the former. Please let us have nice things despite the people unwilling to learn.

rbitar1y ago· 1 in thread

RubyLLM has been a joy to work with so nice to see it’s being used here. This project is also great and will make it easier to build an agent that can fetch data outside of the codebase for context and/or experiment with different system prompts. I’ve been a personal fan of claude code but this will be fun to work with

radanskoricOP1y ago

Author here. The code I made took me 3 hours (including getting up to speed on RubyLLM). I also intentionally DIDN'T use a coding assistant to write it (although I use Windsurf in my regular work). :D

It's clearly not a full featured agent but the code is here and it's a nice starting point for a prototype: https://github.com/radanskoric/coding_agent

My best hope for it is that people will use it to experiment with their own ideas. So if you like it, please feel free to fork it. :)

fullstackwife1y ago· 1 in thread

This reminds me about PHP hello world programs which would take a string from GET, use it as a path, read a file from this path, and return the content in the response. You could make a website while not using any knowledge about websites.

Agents are the new PHP scripts!

zeckalpha1y ago

RCE as a service!

ColinEberhardt1y ago· 1 in thread

Great post, thanks for sharing. I wrote something similar a couple of years ago, showing just how simple it is to work with LLMs directly rather than through LangChain, adding tool use etc …

https://blog.scottlogic.com/2023/05/04/langchain-mini.html

It is of course quite out of date now as LLMs have native tool use APIs.

However, it proves a similar point to yours, in most applications 99% of the power is within the LLM. The rest is often just simple plumbing.

radanskoricOP1y ago

Thanks for sharing this. The field moves fast so yes, it's out of date, but it's useful to see how the tools concept evolved. Especially since I wasn't paying attention to that area of development back when you wrote your article. Very interesting.

mattbrewsbytes1y ago· 1 in thread

Wow, so that RubyLLM gem makes writing an agent more about basic IO operations. I have somehow thought there needed to be deep understanding of LLMs and/or AI APIs to build things like this where I would need to research and read a lot of docs, stay up to date on the endless updates the various AI systems have, etc. The example from the article is about files and directories, this same concept could apply to any text inputs, like data out of a Rails app.

radanskoricOP1y ago

That was my misunderstanding as well. That's why I wrote the article.

Btw, it's not even about the RubyLLM gem. The gem abstracts away the calling of various LLM providers and gives a very clean and easy to use interface. But it's not what gives the "agentic magic". The magic is pretty much all in the underlying LLMs.

Seeing all the claims made by some closed source agent products (remember the "world's first AI software engineer"?) I thought that a fair amount of AI innovation is in the agent tool itself. So I was surprised when I realised that almost all of the "magic" parts are coming from the underlying LLM.

It's also kind of nice because it means that if you wanted to work on an agent product you can do that even if you're not an AI specialised engineer (like I am not).
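The "simple plumbing" around the model can be sketched as a short loop. The model here is a scripted stand-in (`fake_model`) so the example runs without an API key; the message hashes, `run_agent`, and the tool table are hypothetical and much simpler than a real provider payload.

```ruby
# Minimal agent loop: ask the model, run any tool it requests, feed the
# result back, and stop when it answers with plain text.
def run_agent(model, tools, user_prompt, max_steps: 10)
  messages = [{ role: :user, content: user_prompt }]

  max_steps.times do
    reply = model.call(messages)
    messages << { role: :assistant, content: reply }

    # No tool requested: the model is done and this is the final answer.
    return reply[:text] unless reply[:tool]

    # Unknown tools get an error string instead of crashing the loop.
    tool = tools.fetch(reply[:tool]) { ->(_args) { "Unknown tool: #{reply[:tool]}" } }
    messages << { role: :tool, content: tool.call(reply[:args]) }
  end

  raise "Gave up after #{max_steps} steps"
end

# Scripted stand-in for an LLM: first asks for a tool, then answers.
fake_model = lambda do |messages|
  last = messages.last
  if last[:role] == :tool
    { text: "The file says: #{last[:content]}" }
  else
    { tool: "read_file", args: { path: "notes.txt" } }
  end
end

tools = { "read_file" => ->(args) { "contents of #{args[:path]}" } }

puts run_agent(fake_model, tools, "What is in notes.txt?")
```

Everything that decides which tool to call and when to stop sits inside `model.call`; the loop only relays messages, which is the point being made above.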

sagarpatil1y ago· 1 in thread

I don’t understand the hype in the original post.

OpenAI launched function calls two years ago and it was always possible to create a simple coding agent.

radanskoricOP1y ago

Author here. The part about coding agents that wasn't clear to me was how much of the "magic" is in the underlying LLM and how much in the code around it making it into an agent.

When I realised that it's mostly in the LLM I found that a bit surprising. Also, since I'm not an AI Engineer, I was happy to realise that my "regular programming" skills would be enough if I wanted to build a coding agent.

It sounds like you were aware of that for a while now, but I and a lot of other people weren't. :)

That was my motivation for writing the article.

melvinroest1y ago

The way I'd create extra functionality is to give command-line access with a permission step in between. I'd then create a folder of useful scripts and give it permission to execute those.

You can make it much more than just a coding agent. I use my personal LLMs for data analysis by integrating them with some APIs.

These types of LLM systems are basically acting as a frontend that responds to very fuzzy user input. Such an LLM can reach out to your own defined functions (aka a backend).

The app space that I think is interesting, and that I'm working on, is combining these systems with some solid data to create advising/coaching/recommendation systems.

If you want some input on building something like that, my email is in my profile. Currently I'm playing around with an LLM chat interface with database access that gives study advice based on:

* HEXACO data (personality)

* Motivational data (self-determination theory)

* ESCO data (skills data)

* Descriptions of study programs described in ESCO data

If you want to chat about creating these systems, my email is in my profile. I'm currently also looking for freelance opportunities based on things like this as I think there are many LLM applications to which we've only scratched the surface.
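The "folder of useful scripts with a permission step in between" idea from this comment could look roughly like the sketch below. It assumes the scripts are Ruby files; `resolve_script`, `run_script`, and the `confirm:` callback are hypothetical names, and the declined-command message is the one quoted earlier in the thread.

```ruby
require "open3"
require "rbconfig"

# Returns the absolute path of a script only if it lives inside scripts_dir.
# Requests such as "../something.rb" resolve outside the folder and are rejected.
def resolve_script(scripts_dir, name)
  root = File.expand_path(scripts_dir)
  path = File.expand_path(name, root)
  return nil unless path.start_with?(root + File::SEPARATOR) && File.file?(path)

  path
end

# Runs an allowed script after asking for permission via the confirm callback.
def run_script(scripts_dir, name, confirm:)
  path = resolve_script(scripts_dir, name)
  return { error: "Not an allowed script: #{name}" } unless path
  return { error: "User declined to execute the command" } unless confirm.call(path)

  # Assumption: scripts are Ruby files, run with the current interpreter.
  # Passing separate arguments avoids going through a shell.
  output, status = Open3.capture2e(RbConfig.ruby, path)
  { output: output, exit_status: status.exitstatus }
end
```

In an interactive agent, `confirm` could wrap a y/n prompt on the terminal; keeping it as a callback also makes the permission step easy to test.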
