1 · Show HN: Cheddar-bench – unsupervised benchmark for coding agents (github.com) · 9 · przadka · 1mo ago · 0
4 · I compared my daughter against SOTA models on math puzzles (blog.michalprzadka.com) · 15 · przadka · 1y ago · 3