I updated a Svelte component at work, and while I could test it in the browser and see it worked fine, the existing unit test suddenly started failing. I spent about an hour trying to figure out why the results logged in the test didn't match the results in the browser.
I got frustrated, gave in, and asked Claude Code, an AI agent. The tool call loop is something like: it reads my code, then looks up the documentation, then proposes a change to the test which I approve, then it re-runs the test, feeds the output back into the AI, re-checks the documentation, and then proposes another change.
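For anyone who hasn't watched one of these tools run, the loop has roughly this shape. A minimal runnable sketch; every method here is a made-up stand-in, not Claude Code's actual internals:

    # All stand-ins: the real agent shells out to your files, docs, and test runner.
    def read_code  = File.read(__FILE__)
    def read_docs  = "relevant docs excerpt"
    def run_tests  = "1 test failed"                  # imagine `npm test` output
    def tests_pass?(output) = output.include?("0 tests failed")
    def propose_change(ctx) = "a diff based on #{ctx.length} chars of context"
    def approved?(diff)     = true                    # the human-in-the-loop gate
    def apply_change(diff)  = nil

    context = read_code + read_docs
    5.times do                                        # in practice, a token budget
      diff = propose_change(context)
      apply_change(diff) if approved?(diff)
      output = run_tests
      break if tests_pass?(output)
      context += output + read_docs                   # feed results back, re-check docs
    end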
It's all quite impressive, or it would be if it hadn't at one point randomly said "we fixed it! The first element is now active" -- except it wasn't. Claude thought the first element was element [1], when of course the first element of an array is [0]. The test hadn't even actually passed.
An hour and a few thousand Claude tokens my company paid for, with nothing to show for it, lol.
Even in this example the coding agent is short-lived. I am curious about continuously running agents that are never done.
An example of my own, not agentic or running in a loop, but maybe an interesting use case for this stuff: I had a CSV file of old coupon codes I needed to process. Everything would start in limbo, uncategorized. Then I wanted to be able to search for some common substrings and delete them, and search for other common substrings and keep them. I described what I wanted to Claude 3.7 and it built out a Ruby script that gave me an interactive menu of commands: search to select, show all, delete selected, keep selected (something like the sketch below). It was an awesome little throwaway script that would’ve taken me embarrassingly long to write myself. I could’ve done it all by hand in Excel, or at the command line with grep and the like, but I think that would’ve taken longer.
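From memory, what it produced was something in this spirit. A rough sketch, not the actual script it wrote; the filenames and command names are placeholders:

    require "csv"

    # Everything starts in limbo, uncategorized.
    codes    = CSV.read("coupons.csv").flatten    # headerless CSV, one code per row
    status   = codes.to_h { |c| [c, :limbo] }
    selected = []

    loop do
      print "search <substr> | all | delete | keep | quit > "
      cmd, arg = (gets || "quit").strip.split(" ", 2)
      case cmd
      when "search"   # select still-uncategorized codes containing the substring
        selected = codes.select { |c| status[c] == :limbo && c.include?(arg.to_s) }
        puts "#{selected.size} codes selected"
      when "all"      # show every code and its current status
        status.each { |c, st| puts format("%-8s %s", st, c) }
      when "delete"
        selected.each { |c| status[c] = :deleted }
        selected = []
      when "keep"
        selected.each { |c| status[c] = :kept }
        selected = []
      when "quit"
        break
      else
        puts "unknown command"
      end
    end

    # Dump the triaged codes back out for whatever comes next.
    CSV.open("coupons_triaged.csv", "w") do |out|
      status.each { |c, st| out << [c, st.to_s] }
    end

Nothing fancy, just a select/act loop over a hash of statuses, but exactly the kind of throwaway tool that is annoying to write by hand.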
Honestly, one of the hard things about using AI for me is remembering to try it, or coming up with interesting things to try -- building up that new pattern recognition.