I built Magpie because I was tired of AI code reviewers being too "nice."
Most AI tools just say "LGTM" or nitpick formatting. To fix this, Magpie uses an adversarial approach: it spawns two different AI agents (e.g., a Security Expert and a Performance Critic) and forces them to debate your changes.
They don't just list bugs; they attack each other's arguments until they reach a consensus. This cuts down on hallucinations and lazy approvals.
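The core loop can be sketched roughly like this. To be clear, this is not Magpie's actual code: call_model() is a hypothetical stub standing in for any chat-completion API, and real consensus detection would be more robust than a string prefix.

```python
# Sketch of an adversarial-review loop: alternate between two reviewer
# personas, feeding each one the debate transcript so far, until one of
# them signals consensus or we hit a round limit.

def call_model(persona: str, diff: str, transcript: list[str]) -> str:
    """Stub: a real implementation would send the persona prompt, the diff,
    and the transcript to an LLM API and return its reply."""
    if len(transcript) >= 4:
        return "CONSENSUS: the remaining disagreements are stylistic."
    return f"[{persona}] I disagree with the previous point about {diff!r}."

def debate(diff: str,
           personas=("Security Expert", "Performance Critic"),
           max_rounds=6) -> list[str]:
    transcript: list[str] = []
    for round_no in range(max_rounds):
        persona = personas[round_no % len(personas)]
        reply = call_model(persona, diff, transcript)
        transcript.append(reply)
        if reply.startswith("CONSENSUS"):  # agents converged; stop arguing
            break
    return transcript

verdict = debate("example.diff")
```

The round limit is what prevents the "infinite argument" case: if the agents never converge, the debate is cut off and the transcript is surfaced as-is.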
Features:
Adversarial Debate: Watch Claude and GPT-4o fight over your code.
Local & CI: Works on local files or GitHub PRs.
Model Agnostic: Supports OpenAI, Anthropic, and Gemini.
The Experiment: This is also an experiment in "coding without coding." I didn't manually write a single line of TypeScript for this project; the entire repo was built with Claude Code.
I'd love to hear your feedback—especially if you manage to make the models get into an infinite argument.
gdb showed that a critical pointer was garbage: 0x676974736e6f5373.
Usually, I’d suspect a race condition or a use-after-free. I stared at the hex for a while, checking for alignment issues or bit-flips, but it just looked like random entropy.
Out of frustration, I pasted the "info locals" dump into Gemini 3. I didn't ask it to fix the code; I just asked: "What do you see?"
It didn't try to analyze the C++ logic. Instead, it treated the address as data. It pointed out that on an x86-64 (Little Endian) system, 0x676974736e6f5373 decodes perfectly to the ASCII string: "sSonstig".
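The decode step is mechanical once you see it. A quick standalone sketch (not part of the original debugging session):

```python
# Decoding the "garbage" pointer: interpret its eight bytes in
# little-endian (in-memory) order as ASCII.
val = 0x676974736E6F5373
text = val.to_bytes(8, "little").decode("ascii")
print(text)  # sSonstig

# And the reverse direction: the bytes "sSonstig" sitting in memory read
# back as exactly that pointer value on a little-endian (x86-64) machine.
import struct
ptr, = struct.unpack("<Q", b"sSonstig")
print(hex(ptr))  # 0x676974736e6f5373
```

Note the byte reversal: the string reads "backwards" through the hex value precisely because x86-64 stores the least significant byte first.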
It clicked immediately. "Sonstig" is German for "Miscellaneous".
It turns out a legacy localization function was writing the category name "Sonstiges" into a stack buffer that was too small. The write overflowed and cleanly overwrote the adjacent FiberManager pointer with the bytes of the string.
I think we often focus too much on LLMs for "Code Generation" (writing boilerplate). For me, the real killer feature is Pattern Recognition in raw data. I would have stared at that hex for hours seeing only noise; the model recognized the semantic meaning in milliseconds.
Has anyone else found LLMs useful specifically for decoding raw dumps or logs like this?
Six months in, the runtime performance is amazing, but our iteration speed is absolutely tanking.
It feels like we are paying a massive tax on every single feature. Just yesterday, I wasted an entire afternoon fighting CMake just to link a library that would have been a one-line go get or npm install in any other ecosystem. We also constantly deal with phantom bugs that turn out to be subtle ABI mismatches between our M1 Macs and the Linux CI runners—issues that simply don't exist in modern toolchains.
It’s frustrating because our "slower" competitors are shipping features weekly while we are stuck debugging linker errors or waiting for 20-minute clean builds.
I'm starting to wonder if the "performance moat" is a trap. For those who recently started infra projects: did you stick with C++? Did you bail for Rust/Go? Or do you just accept that velocity will be terrible in exchange for raw speed?