- Cursor attempted to make a browser from scratch: https://cursor.com/blog/scaling-agents
- Anthropic attempted to make a C Compiler: https://www.anthropic.com/engineering/building-c-compiler
I have been wondering if there are software packages that can be easily reproduced by taking the available test suites and tasking agents to work on projects until the existing test suites pass.
After playing with this concept by having Claude Code reproduce redis and sqlite, I began looking for software packages where an agent-made reproduction might actually be useful.
I found libxml2, a widely used, open-source C language library designed for parsing, creating, and manipulating XML and HTML documents. Three months ago it became unmaintained with the update, "This project is unmaintained and has [known security issues](https://gitlab.gnome.org/GNOME/libxml2/-/issues/346). It is foolish to use this software to process untrusted data.".
With a few days of work, I was able to create xmloxide, a memory-safe Rust replacement for libxml2 that passes both the compatibility suite and the W3C XML Conformance Test Suite. Performance is similar on most parsing operations and better on serialization. It comes with a C API so it can serve as a replacement for existing uses of libxml2.
- crates.io: https://crates.io/crates/xmloxide
- GitHub release: https://github.com/jonwiggins/xmloxide/releases/tag/v0.1.0
While I don't expect people to cut over to this new and unproven package, I do think there is something interesting to consider here in how coding agents like Claude Code can quickly iterate given a test suite. It's possible the legacy-code problem that COBOL and other systems present will go away as rewrites become easier. Ongoing maintenance work, fixing CVEs and updating to later package versions, then becomes a larger share of software package management.
It should also be noted that the remaining security issues in the core parser have to do with algorithmic complexity, not memory safety. Many other parts of libxml2 aren't security-critical at all.
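As a hedged aside: the classic algorithmic-complexity issue in XML parsing is entity-expansion blowup (the "billion laughs" attack), where a few kilobytes of input expand to gigabytes of output without any memory-unsafe code being involved. A tiny sketch of the arithmetic, assuming the usual shape of 10 nesting levels with 10 entity references each and a 3-byte leaf string:

```rust
fn main() {
    let refs_per_level: u64 = 10;
    let levels: u32 = 10;
    let leaf_len: u64 = 3; // bytes in the leaf entity, e.g. "lol"

    // Each nesting level multiplies the number of leaf copies by
    // refs_per_level, so expansion is exponential in document depth.
    let copies = refs_per_level.pow(levels); // 10^10 copies
    let bytes = copies * leaf_len;

    println!("{} copies, ~{} GB expanded", copies, bytes / 1_000_000_000);
    // prints: 10000000000 copies, ~30 GB expanded
}
```

This is why parsers cap entity expansion (as libxml2 does with its entity limits) rather than relying on memory safety alone.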
Second, I found this entirely by accident just now: https://www.sovereign.tech/programs/fellowship
> For the duration of the fellowship, one “maintainer-in-residence” will be employed up to full-time (32-40 hours per week) as part of the Sovereign Tech Agency team.

> This option offers the maintainer the personal and professional advantages of being part of a team, as well as the stability of being employed to continue working on critical FOSS infrastructure.

> This position is only available for maintainers located in Germany.
I know a few companies have programs where engineers can designate specific projects as important and direct funds to them. But it doesn't happen often enough to support all the projects that currently need work; maybe AI coding tools will lower the cost of maintenance enough to improve this.
I do think there are two possible approaches that policy makers could consider.
1) There could probably be tax credits or deductions for SWEs who 'volunteer' their time to work on these projects.
2) Many governments have tried to create cyber reserve corps. I bet they could designate people as maintainers of key projects they rely on, sustaining both the projects themselves and a pool of people skilled with the tools they deem important.
The alternative is another XZ backdoor.
Why exclusive to SWEs? They tend to be more time-restricted than financially restricted (assuming the "SWE" comes from a job description). I'd be more interested in making sure that those with less well-paying jobs are able to access such benefits, rather than stacking them onto those already (probably) making six figures.
Of course, the problems arise in the details. Define "volunteer": if $DAYJOB also uses it (in a way related to my role), is it actually, instead, wage theft? Also, quantifying the benefit is a sticky question. Is maintaining 10k emoji packages on NPM equivalent to volunteer work on libcurl? Could it ever be? Is it volunteer work if it ends up with a bug bounty payday? Google's fuzzing grant incentives?
Red Hat, Apple, Samsung, Huawei, Google, etc...
Conclusion: support OSS from general taxation, like the Sovereign Tech Fund in Germany does. It's a public good!
OSS is allowed to make money, and there are projects that require paid licenses for commercial use.
The source is available and collaborative.
Qt states this on their site:

> Simply put, this is how it works: In return for the value you receive from using Qt to create your application, you are expected to give back by contributing to Qt or buying Qt.
And there are a lot of companies out there that make their money from open source software; Red Hat is maybe the biggest and most well known.
EDIT: Sorry, I’ve had a shitty day and that wasn’t a helpful comment at all. I should’ve said that as I understand it TOTC primarily relates to finite resources, so I don’t think it applies here. Sorry again for being a dick.
As a side note, and this isn't a knock on your project specifically: I think the community needs to normalize disclaimers for "vibe-coded" packages. Consumers really need to understand the potential risks of relying on agent-generated code upfront.
Unlike the development work of old (pre-2025), work with high-end models incurs a very direct monetary cost: one burns tokens, which cost money, and you can't run something that powerful locally (even if you happened to have a Mac Pro Ultra with RAM maxed out).
Some of my friends burned through hundreds of dollars a day while doing large amounts of (allegedly efficient) work with Claude Code.
As for the workflow, I think the best advice I can give is to set up as many guardrails and tools as possible, so Claude can do as many iterations as possible before needing any intervention. So in this case I set up pre-commit hooks for linting and formatting, gave it access to the full testing suite, and let it rip. The majority of the work was done in a single thinking loop that lasted ~3 hours, where Claude was able to run the tests, see what failed, and iterate until they all passed. From there, there were still lots of iterations to add features, clean up, test, and improve performance - but allowing Claude to iterate quickly on its own without my involvement was crucial.
If I were looking for an XML parser/generator library, I might stumble across this, think it might be production-quality, and assume it was built by humans, or at least that humans had fully vetted and understood the code.
If you want to know whether the code is good or bad, read the code and check the tests. Assuming human = good, LLM = bad doesn't make much sense given the amount of bad human code I've seen.
Sure, if the code is from a reputable company or creator, then I'd take that as a strong signal of quality over an LLM, but I wouldn't take a random human programmer as a strong signal over generated code.
Why "in the public API"? Does this imply it's using unsafe under the hood? If so, what for?
The only usages of unsafe are in src/ffi, which is only compiled when the ffi feature is enabled. FFI is fundamentally unsafe ("unsafe" meaning "the compiler can't automatically verify this code won't result in undefined behavior"), so using it there is reasonable, and the rest of the crate is properly free of unsafe.
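To make the distinction concrete, here is a minimal sketch (not xmloxide's actual API; the function names are hypothetical) of how an FFI boundary confines `unsafe` to one thin layer while the core logic stays in safe, compiler-checked Rust:

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Safe core: ordinary Rust the compiler fully verifies.
fn count_open_tags(xml: &str) -> usize {
    xml.matches('<').count()
}

/// FFI boundary: dereferencing a raw pointer handed over from C cannot be
/// checked by the compiler, so this one function is `unsafe`. It validates
/// the input and immediately delegates to the safe core.
#[no_mangle]
pub unsafe extern "C" fn xml_count_open_tags(ptr: *const c_char) -> usize {
    if ptr.is_null() {
        return 0;
    }
    match CStr::from_ptr(ptr).to_str() {
        Ok(s) => count_open_tags(s),
        Err(_) => 0, // not valid UTF-8
    }
}

fn main() {
    use std::ffi::CString;
    let doc = CString::new("<a><b/></a>").unwrap();
    // Calling the FFI export from Rust, just to demonstrate the boundary.
    let n = unsafe { xml_count_open_tags(doc.as_ptr()) };
    println!("{}", n); // prints 3
}
```

The point is that C callers only ever cross one audited `unsafe` surface; everything behind it carries the usual safe-Rust guarantees.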
It is absolutely a useful distinction whether your users need to deal with unsafe themselves or not.
Little or no unsafe internal code, verified where it exists, is the bar for many Rust reimplementations. It's also what keeps the code memory-safe.
This is a point I've tried to advocate for a while, especially to empower non-coders and make them see that we CAN approach automation with control.
Some aspects will be the classic unit or integration tests for validation. Others will be AI evals [1], which to me could become the common language for product design across families/disciplines that don't quite understand how to collaborate with each other.
The amount of progress in a short time is amazing to see.
- [1] https://ai-evals.io/
Words win when they're used. Just because Agent Skills is just a pattern for standardization and saving context doesn't mean it wasn't incredibly useful.
Think beyond software developers by trade. Think beyond those who realized they needed tests, to those who thought "the models will just get smarter" and "they told me there are guardrails".
It could be doing double checks in both the tokeniser and the parser, and things like that.
It actually looks like a good starting point and reference for someone working on XML parsers in Rust.
libxml2 was always one of those libraries I had trouble with across different platforms.
I think it's great that more and more OSS projects are getting attention now with AI coding agents.
Doesn't seem to have shut down or even be unmaintained. Perhaps it was briefly, and has now been resurrected?
It’s time to make this mandatory.
Nothing against AI - just to inform people about the quality, maintainability, and future of this library. No human has a mental model of the code, so don't waste your time creating one - the original author didn't have one either.
None of your arguments make sense here.