—
v0.5.0 was about figuring out why models weren’t using tilth tools consistently — even when they were available.
Results vs baseline (built-in tools only):
Sonnet 4.6: -44% $/correct (84% → 94% accuracy, 31% fewer turns)
Opus 4.6: -39% $/correct (91% → 92% accuracy, 37% fewer turns)
Haiku 4.5: -38% $/correct (54% → 73% accuracy, 7% fewer turns)
—
https://github.com/jahala/tilth/
Full results: https://github.com/jahala/tilth/blob/main/benchmark/README.m...
— PS: I don't have the budget to run the benchmark a lot (especially with Opus), so if any token whales have capacity to run some benchmarks, please feel free to PR results.
v0.4.4: Added adaptive 2nd-hop impact analysis to callers search — when a function has ≤10 unique callers, tilth automatically traces callers-of-callers in a single scan. First full 26-task Opus baseline (previously 5 hard tasks only). Haiku adoption improved from 42% to 78%, flipping Haiku from a cost regression to -38% $/correct.
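For the curious, the adaptive second hop amounts to something like the sketch below. It assumes a callers index mapping each function to its set of direct callers; the names and the exact cutoff handling are illustrative, not tilth's actual internals.

    # Sketch of adaptive 2nd-hop impact analysis (illustrative, not tilth's real code).
    # callers_index maps a function name to the set of functions that call it directly.
    SECOND_HOP_LIMIT = 10  # only fan out when the direct caller set is small

    def impact(callers_index: dict[str, set[str]], target: str) -> dict[str, set[str]]:
        direct = callers_index.get(target, set())
        result = {target: direct}
        if len(direct) <= SECOND_HOP_LIMIT:
            # Small blast radius: trace callers-of-callers in the same pass.
            for caller in direct:
                result[caller] = callers_index.get(caller, set())
        return result

    if __name__ == "__main__":
        index = {
            "parse_config": {"load_settings", "reload"},
            "load_settings": {"main"},
            "reload": {"signal_handler"},
        }
        print(impact(index, "parse_config"))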
v0.4.5: Bumped TOKEN_THRESHOLD from 3500 to 6000 estimated tokens (~24KB), so mid-sized files return full content instead of an outline that agents then read back via 5–7 sequential --section calls. Fixed two major regressions: gin_radix_tree (+35% → ~tie) and rg_search_dispatch (+90% → -26% win). Sonnet hit 100% accuracy (52/52) and -34% $/correct overall.
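Roughly how the new threshold plays out in practice (the ~4 bytes/token estimate is my assumption to match the ~24KB figure; the real heuristic may differ):

    # Sketch of the outline-vs-full-content cutoff (approximate, not tilth's exact logic).
    TOKEN_THRESHOLD = 6000   # was 3500 before v0.4.5
    BYTES_PER_TOKEN = 4      # rough estimate, so 6000 tokens is about 24KB

    def read_mode(file_size_bytes: int) -> str:
        estimated_tokens = file_size_bytes / BYTES_PER_TOKEN
        # Mid-sized files now come back whole instead of forcing 5-7 --section round trips.
        return "full" if estimated_tokens <= TOKEN_THRESHOLD else "outline"

    print(read_mode(20_000))  # "full" (~5000 tokens; an outline under the old 3500 cap)
    print(read_mode(40_000))  # "outline"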
--
https://github.com/jahala/tilth/
Full results: https://github.com/jahala/tilth/blob/main/benchmark/README.m...
-- PS: I don't have the budget to run the benchmark a lot (especially with Opus), so if any token whales have capacity to run some benchmarks, please feel free to PR results.
New stuff: files sync to each other. Edit the CSS in any file, run --sync css, and every sibling file gets the update. No build tool, no shared imports. Just files copying sections between themselves using markers.
Dark mode, responsive layout, search — the usual. Still zero deps beyond bash and Claude Code.
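If you're curious what the marker mechanism amounts to, here's a minimal sketch. The marker format and function names are made up for illustration; check the repo for the real ones.

    # Minimal sketch of marker-based section sync (marker format is illustrative).
    import re
    from pathlib import Path

    def sync_section(name: str, source: Path, siblings: list[Path]) -> None:
        # A synced section sits between "<!-- sync:NAME -->" and "<!-- /sync:NAME -->" markers (assumed format).
        pattern = re.compile(
            rf"<!-- sync:{re.escape(name)} -->.*?<!-- /sync:{re.escape(name)} -->", re.DOTALL
        )
        block = pattern.search(source.read_text()).group(0)
        for path in siblings:
            text = path.read_text()
            if pattern.search(text):
                # Replace the sibling's copy of the section with the freshly edited one.
                path.write_text(pattern.sub(lambda _: block, text))

    # Usage (hypothetical file names):
    # sync_section("css", Path("index.html"), [Path("about.html"), Path("blog.html")])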
--
v0.4.0 added search ranking, sibling surfacing, transitive callees, cognitive load stripping, smart truncation, and bloom filters. Got -17% $/correct on Sonnet, -20% on Opus.
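The bloom-filter piece is the usual trick: a cheap membership pre-check so most files can be skipped without being opened. A standard-construction sketch, not tilth's actual implementation:

    # Standard bloom-filter pre-check: skip files that definitely don't contain a token.
    import hashlib

    class Bloom:
        def __init__(self, bits: int = 1 << 16, hashes: int = 3):
            self.bits, self.hashes = bits, hashes
            self.array = bytearray(bits // 8)

        def _positions(self, token: str):
            for i in range(self.hashes):
                h = hashlib.blake2b(f"{i}:{token}".encode()).digest()
                yield int.from_bytes(h[:8], "big") % self.bits

        def add(self, token: str) -> None:
            for p in self._positions(token):
                self.array[p // 8] |= 1 << (p % 8)

        def might_contain(self, token: str) -> bool:
            # False means "definitely absent"; True means "maybe", so fall back to a real scan.
            return all(self.array[p // 8] & (1 << (p % 8)) for p in self._positions(token))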
v0.4.1 was pure instruction tuning with zero code changes, and that alone jumped Sonnet adoption from 89% to 98% and moved cost per correct answer from -17% to -29%.
The instruction tuning result surprised me. The model already knew tilth tools existed — it just wasn’t choosing them consistently. Making the replacement relationship explicit in the tool description was worth more than all the search ranking work in v0.4.0.
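To make "explicit replacement relationship" concrete, here's the flavor of the change as a hypothetical before/after tool description. This is paraphrased for illustration, not tilth's actual wording.

    # Hypothetical illustration of making the replacement relationship explicit
    # in an MCP tool description (not quoted from tilth).
    DESCRIPTION_BEFORE = "Searches the codebase and returns ranked results."

    DESCRIPTION_AFTER = (
        "Use this INSTEAD OF the built-in Grep/Read tools for code search: "
        "it returns ranked, pre-trimmed results in fewer tokens."
    )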
Haiku remains the outlier — only 42% tilth adoption despite instruction tuning.
--
https://github.com/jahala/tilth/
Full results: https://github.com/jahala/tilth/blob/main/benchmark/README.m...
-- PS: I don't have the budget to run the benchmark a lot (especially with Opus), so if any token whales have capacity to run some benchmarks, please feel free to PR results.
-> https://github.com/jahala/tilth
Results: Sonnet 4.5 — 26% cheaper per correct answer (79% → 86% accuracy). Opus 4.6 — 14% cheaper (and the only model+mode combo to crack the hardest task). Haiku 4.5 — 82% cheaper when forced to use tilth (69% → 100% accuracy at $0.04/answer).
We measure “cost per correct answer” — what you’d expect to spend before getting a usable answer under retry. A wrong answer isn’t a cheap success.
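Concretely, the metric is expected spend under retry-until-correct, i.e. cost per attempt divided by accuracy. The numbers below are illustrative, not from the benchmark:

    # "Cost per correct answer" = expected spend if you retry until you get a usable answer.
    def cost_per_correct(cost_per_attempt: float, accuracy: float) -> float:
        return cost_per_attempt / accuracy

    print(cost_per_correct(0.10, 0.80))  # 0.125 -> a $0.10 run at 80% accuracy costs $0.125 per usable answer
    print(cost_per_correct(0.10, 0.50))  # 0.20  -> at 50% accuracy the same run costs $0.20 per usable answer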
Interesting finding: smarter models adopt MCP tools voluntarily (Sonnet 95%, Opus 94%), but Haiku ignores them (9%). Instruction tuning didn’t help. Removing the overlapping built-in tools did.
https://github.com/jahala/tilth/blob/main/benchmark/README.m...
PS: I don't have the budget to run the benchmark a lot with Opus, so if any token whales have capacity to run some benchmarks, please feel free to PR results.