Better HN

disgruntledphd2 · 5mo ago

> (And the automated test suite will help them confirm that the refactoring worked properly, because naturally you had them construct an automated test suite when they built those original features, right?)

I dunno, maybe I have high standards, but I generally find that the test suites generated by LLMs are both over- and under-determined: over-determined in the sense that some of the tests focus on implementation details, and under-determined in the sense that they don't test the conceptual things that a human might.

That being said, I've come across loads of human-written tests that are very similar, so I can see where the agents are coming from.
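
To make the over/under-determined distinction concrete, here is a minimal sketch in pytest style. `PriceCalculator`, `_subtotal`, and the test names are hypothetical, not taken from anyone's codebase. The first test pins an implementation detail, so it fails if `_subtotal` is inlined even though behaviour is unchanged; the other two check the observable contract, including a conceptual edge case.

```python
from unittest import mock


class PriceCalculator:
    """Hypothetical code under test."""

    def total(self, prices, tax_rate):
        return round(self._subtotal(prices) * (1 + tax_rate), 2)

    def _subtotal(self, prices):
        return sum(prices)


# Over-determined: asserts *how* total() works (it must call _subtotal),
# so a harmless refactor that inlines sum() would break this test.
def test_total_calls_subtotal():
    with mock.patch.object(PriceCalculator, "_subtotal", return_value=10) as sub:
        PriceCalculator().total([4, 6], 0.1)
    # The mock is a plain class attribute, so it is called without `self`.
    sub.assert_called_once_with([4, 6])


# Behavioural: asserts *what* total() returns, independent of internals.
def test_total_applies_tax():
    assert PriceCalculator().total([4, 6], 0.1) == 11.0


# Conceptual edge case a human might think of: an empty basket costs nothing.
def test_empty_basket_is_free():
    assert PriceCalculator().total([], 0.2) == 0
```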

You often mention that this is why you're getting good results from LLMs, so it would be great if you could expand on how you do it at some point in the future.

7 comments · 7 top-level

simonw · 5mo ago

I work in Python, which helps a lot because there are a TON of good examples of pytest tests floating around in the training data, including fixture libraries for mocking external HTTP APIs, snapshot testing, and other neat patterns.

Or I can say "use pytest-httpx to mock the endpoints" and Claude knows what I mean.
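
pytest-httpx (which provides an `httpx_mock` fixture) is the library named here. To keep this sketch dependency-free, it swaps in the standard library's `unittest.mock` and `urllib.request` instead, so it shows the general pattern of mocking an external HTTP endpoint rather than pytest-httpx's own API. `get_username` and the URL are hypothetical.

```python
import json
import urllib.request
from unittest import mock

USER_URL = "https://api.example.com/users/{}"  # hypothetical endpoint


def get_username(user_id: int) -> str:
    """Hypothetical code under test: fetch a user record and return its name."""
    with urllib.request.urlopen(USER_URL.format(user_id)) as resp:
        return json.loads(resp.read())["name"]


def test_get_username_with_mocked_endpoint():
    # Fake response object usable as a context manager, like urlopen's result.
    fake_resp = mock.MagicMock()
    fake_resp.read.return_value = b'{"name": "simon"}'
    fake_resp.__enter__.return_value = fake_resp

    # Patch urlopen so no real network request is made.
    with mock.patch("urllib.request.urlopen", return_value=fake_resp) as urlopen:
        assert get_username(42) == "simon"

    # Also check that the code hit the URL we expected.
    urlopen.assert_called_once_with("https://api.example.com/users/42")
```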

Keeping an eye on the tests is important. The most common anti-pattern I see is large amounts of duplicated test setup code. That isn't a huge deal (I'm much more tolerant of duplicated logic in tests than in implementation), but it's still worth pushing back on.

"Refactor those tests to use pytest.mark.parametrize" and "extract the common setup into a pytest fixture" work really well there.
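
The two refactors named above look roughly like this in pytest. `slugify`, the `items_db` fixture, and the SQLite schema are hypothetical stand-ins: several copy-pasted test functions collapse into one `pytest.mark.parametrize` test, and setup repeated across tests moves into a `yield` fixture that also handles teardown.

```python
import sqlite3

import pytest


def slugify(text: str) -> str:
    """Hypothetical function under test."""
    return "-".join(text.lower().split())


# One parametrized test replaces several near-identical copies.
@pytest.mark.parametrize(
    ("text", "expected"),
    [
        ("Hello World", "hello-world"),
        ("  extra   spaces ", "extra-spaces"),
        ("already-slugged", "already-slugged"),
    ],
)
def test_slugify(text, expected):
    assert slugify(text) == expected


# Setup that used to be duplicated at the top of each test now lives in a fixture.
@pytest.fixture
def items_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (name TEXT)")
    conn.executemany("INSERT INTO items VALUES (?)", [("a",), ("b",)])
    yield conn  # the test body runs here
    conn.close()  # teardown runs after the test finishes


def test_item_count(items_db):
    assert items_db.execute("SELECT COUNT(*) FROM items").fetchone()[0] == 2


def test_item_names(items_db):
    rows = items_db.execute("SELECT name FROM items ORDER BY name").fetchall()
    assert [name for (name,) in rows] == ["a", "b"]
```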

Generally though the best way to get good tests out of a coding agent is to make sure it's working in a project with an existing test suite that uses good patterns. Coding agents pick the existing patterns up without needing any extra prompting at all.

I find that once a project has clean basic tests, the new tests added by the agents tend to match them in quality. It's similar to working on a large project with a team of other developers: keeping the code clean means that when people look for examples of how to write a test, they'll be pointed in the right direction.

One last tip I use a lot is this:

  Clone datasette/datasette-enrichments
  from GitHub to /tmp and imitate the
  testing patterns it uses

I do this all the time with different existing projects I've written - the quickest way to show an agent how you like something to be done is to have it look at an example.

jihadjihad · 5mo ago

In my experience asking the model to construct an automated test suite, with no additional context, is asking for a bad time. You'll see tests for a custom exception class that you (or the LLM) wrote that check that the message argument can be overwritten by the caller, or that a class responds to a certain method, or some other pointless and/or tautological test.

If you start with an example file of tests that follow a pattern you like, along with the code the tests are for, it's pretty good at following along. Even adding a sentence to the prompt about avoiding tautological tests and focusing on the seams of functions/objects/whatever (integration tests) can get you a long way toward a solid test suite.
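
As a sketch of the difference, using a hypothetical `QuotaExceeded` exception and `consume` function: the first test below only re-verifies that Python's built-in `Exception` stores its message, while the others exercise the function boundary where the exception actually matters, including the failure path.

```python
import pytest


class QuotaExceeded(Exception):
    """Hypothetical custom exception."""


def consume(quota: int, amount: int) -> int:
    """Hypothetical function: return remaining quota, or raise if it would go negative."""
    if amount > quota:
        raise QuotaExceeded(f"need {amount}, have {quota}")
    return quota - amount


# Tautological: this tests Python's Exception class, not our code.
def test_exception_stores_message():
    assert str(QuotaExceeded("boom")) == "boom"


# Useful: tests behaviour at the seam, on both the happy and failure paths.
def test_consume_within_quota():
    assert consume(10, 4) == 6


def test_consume_over_quota_raises():
    with pytest.raises(QuotaExceeded, match="need 5, have 3"):
        consume(3, 5)
```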

archagon · 5mo ago

I get the sense that many programmers resent writing tests and see them as a checkbox item or even boilerplate, not a core part of their codebase. Writing great tests takes a lot of thought about the myriad of bizarre and interesting ways your code will run. I can’t imagine that prompting an LLM to “write tests for this code” will result in anything but the most trivial of smoke test suites.

Incidentally, I wonder if anyone has used LLMs to generate complex test scenarios described in prose, e.g. “write a test where thread 1 calls foo, then before hitting block X, thread 2 calls bar, then foo returns, then bar returns” or "write a test where the first network call Framework.foo makes returns response X, but the second call returns error Y, and ensure the daemon runs the appropriate mitigation code and clears/updates database state." How would they perform in this scenario? Would they add the appropriate shims, semaphores, test injection points, etc.?
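
Whatever an agent would produce, a test for the second scenario would plausibly be built on `unittest.mock`'s `side_effect`: when given an iterable, each call returns the next item, and any item that is an exception is raised instead. A minimal sketch, where `sync`, `TransientError`, and the two-table schema are all hypothetical:

```python
import sqlite3
from unittest import mock


class TransientError(Exception):
    """Hypothetical error raised by the network client."""


def sync(client, db):
    """Hypothetical daemon step: store two fetched payloads atomically.

    On a network error, roll back partial writes and flag the data as stale.
    """
    try:
        for _ in range(2):
            db.execute("INSERT INTO items (value) VALUES (?)", (client.fetch(),))
        db.commit()
    except TransientError:
        db.rollback()  # mitigation: discard the half-finished batch
        db.execute("UPDATE status SET stale = 1")
        db.commit()


def test_second_call_fails_and_mitigation_runs():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE items (value TEXT)")
    db.execute("CREATE TABLE status (stale INTEGER)")
    db.execute("INSERT INTO status VALUES (0)")
    db.commit()

    client = mock.Mock()
    # First call returns response X, second call raises error Y.
    client.fetch.side_effect = ["response-x", TransientError("error-y")]

    sync(client, db)

    assert client.fetch.call_count == 2
    # The first insert was rolled back and the stale flag was set.
    assert db.execute("SELECT COUNT(*) FROM items").fetchone()[0] == 0
    assert db.execute("SELECT stale FROM status").fetchone()[0] == 1
```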

krschacht · 4mo ago

For something like tests, where I have very specific opinions on how I want them written, I have a simple doc (tests.md) and I’ll regularly tag Claude with it.

Claude writes a bunch of new code and I’ll tell it, “Before I review this code, make sure all tests adhere to the guidance of @tests.md” (you can probably make this a slash command too).

I find that if I put these instructions in the system prompt, they’re followed only loosely once a conversation has used up a lot of the context window. But when I tag it in like this, Claude will strongly and thoughtfully follow the guidance and examples I’ve written up about how I want my tests.

kaydub · 5mo ago

Once the agent writes your tests, have another agent review them: ask it to look for pointless tests, make sure the tests cover more than just the "happy path", etc. etc.

Just like anything else in software, you have to iterate. The first pass is just to thread the needle.

wvenable · 5mo ago

> I dunno, maybe I have high standards

I don't get it. I have insanely high standards so I don't let the LLM get away with not meeting my standards. Simple.

touristtam · 5mo ago

Embrace TDD? Write those tests and tell the agent to write the subject under test?
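
A minimal sketch of that split, assuming a hypothetical `parse_duration` helper: the human writes the tests first as the specification, and the agent's job is to produce an implementation that makes them pass.

```python
import re

import pytest


# Written first, by a human: the tests are the specification.
def test_parse_duration_accepts_unit_suffixes():
    assert parse_duration("90s") == 90
    assert parse_duration("2m5s") == 125
    assert parse_duration("1h30m") == 5400


def test_parse_duration_rejects_garbage():
    with pytest.raises(ValueError):
        parse_duration("ninety seconds")


# Written second, by the agent: just enough code to make the tests pass.
_SECONDS_PER_UNIT = {"h": 3600, "m": 60, "s": 1}


def parse_duration(text: str) -> int:
    """Convert strings like '1h30m' into a number of seconds."""
    if not re.fullmatch(r"(?:\d+[hms])+", text):
        raise ValueError(f"unrecognised duration: {text!r}")
    return sum(
        int(amount) * _SECONDS_PER_UNIT[unit]
        for amount, unit in re.findall(r"(\d+)([hms])", text)
    )
```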
