It was really helpful to make and run a benchmark - it led to some important changes and improvements, so thanks again for your question, kp!
The result is a ~17% reduction in raw cost. Calculated per correct answer, it's a ~25% reduction.
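A rough sanity check on how those two percentages relate, assuming cost per correct answer is simply total cost divided by the number of correct answers, and that both runs used the same question set (both are assumptions here, and the function name is just illustrative). Under those assumptions, the gap between the two figures implies the number of correct answers also went up:

```python
def implied_correct_answer_ratio(raw_cost_reduction, per_correct_reduction):
    """Ratio of correct answers (after / before) implied by the two reductions.

    Assumes cost_per_correct = total_cost / num_correct, so
    num_correct_after / num_correct_before
        = (cost_after / cost_before) / (cpc_after / cpc_before).
    """
    cost_ratio = 1.0 - raw_cost_reduction      # e.g. 0.83 for a 17% reduction
    cpc_ratio = 1.0 - per_correct_reduction    # e.g. 0.75 for a 25% reduction
    return cost_ratio / cpc_ratio


# The inputs are the rounded figures quoted above, so the result is approximate.
ratio = implied_correct_answer_ratio(0.17, 0.25)
print(f"{(ratio - 1) * 100:.1f}% more correct answers")  # prints "10.7% more correct answers"
```

So the per-correct-answer figure beats the raw-cost figure because roughly 10-11% more answers came out correct, not only because each run got cheaper.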
Just posted the update -> https://news.ycombinator.com/item?id=47016959