Can LLMs Do Accounting?

(accounting.penrose.com)

6 points · yunyu · 11mo ago · 8 comments

8 comments · 4 top-level

yunyu (OP) · 11mo ago · 2 in thread

LLMs are on the verge of replacing data scientists and investment bankers. But can they perform simple accounting tasks for a real business?

We built AccountingBench, a test where LLMs must "close the books" for a real SaaS business using 1 year of Stripe/Ramp/Rippling/Mercury data.

Claude 4 and Grok 4 start strong - within 1% of human CPA baselines in month 1.

But as time progresses, all models inevitably accumulate compounding errors and exhibit erratic behavior, causing significant deviations.

That said, the early accuracy here is promising. With targeted post-training, models may be able to replace humans for this kind of work.
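A minimal sketch of why a bookkeeping error compounds rather than washing out, assuming each month's closing balance is carried forward as the next month's opening balance. The function names and the dollar amounts are hypothetical and are not taken from AccountingBench.

```python
from decimal import Decimal

def closing_balances(opening, monthly_changes):
    """Roll a balance forward: each month's close is the next month's open."""
    balances = []
    balance = opening
    for change in monthly_changes:
        balance += change
        balances.append(balance)
    return balances

def deviations(true_changes, recorded_changes, opening=Decimal("0")):
    """Difference between recorded and true closing balance, month by month."""
    true_bal = closing_balances(opening, true_changes)
    recorded_bal = closing_balances(opening, recorded_changes)
    return [r - t for t, r in zip(true_bal, recorded_bal)]

# Made-up numbers: one 10.00 mistake in month 2, nothing wrong afterwards.
true_changes = [Decimal("1000"), Decimal("1000"), Decimal("1000")]
recorded_changes = [Decimal("1000"), Decimal("990"), Decimal("1000")]
print(deviations(true_changes, recorded_changes))
```

In this toy case the month-2 error shows up in every later closing balance until someone posts a correcting entry, and any further mistakes stack on top of it.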

simmerup · 11mo ago

Accounting isn't really the type of thing that can accept errors though, is it?

Like, it needs a 0% error rate.

yunyu (OP) · 11mo ago

A certain level of error is tolerable and inevitable, but accountants need to be able to correct the errors once they build up.

AlSweigart · 11mo ago · 1 in thread

LLMs are really not good at following specific processes like math. They operate off vibes.

Ask Claude to multiply two ten-digit numbers. It gets the first one or two digits correct, and then makes up the rest.

ChatGPT used to have the same problem, but now it writes a program to perform the math for it.
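A minimal sketch of the "write a program to do the math" approach, assuming Python. The operands are made up, and this is not ChatGPT's actual tooling. Python integers are arbitrary-precision, so the product is exact. The float version is included only as a loose analogy for "leading digits right, trailing digits wrong"; it is not a claim about how an LLM computes.

```python
def multiply_exact(a: int, b: int) -> int:
    """Exact product; Python ints have arbitrary precision."""
    return a * b

def multiply_float(a: int, b: int) -> int:
    """Same product via 64-bit floats, which keep roughly 15-17 significant digits."""
    return int(float(a) * float(b))

# Two made-up ten-digit operands.
a, b = 1234567890, 9876543210

exact = multiply_exact(a, b)
approx = multiply_float(a, b)

print(exact)
print(approx)
print(exact == approx)  # the float result drifts in the trailing digits
```

Handing the arithmetic to an interpreter this way sidesteps digit-by-digit generation entirely, which is the behavior described above.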

yunyu (OP) · 11mo ago

This was true until labs started training models with reinforcement learning from verifier feedback (starting with o1). By sticking a calculator in the training loop, they seem to have gotten out of the arithmetic error regime. That said, the ChatGPT default is 4o, which is still susceptible to these issues.

bell-cot · 11mo ago · 1 in thread

Given their inclination to fabricate user-pleasing answers...could I let an LLM do my tax returns?

yunyu (OP) · 11mo ago

No comment. The good news is that accounting and taxes are verifiable, so in principle it is possible to RL models to do them correctly.
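A rough sketch of what "verifiable" could mean in practice, assuming double-entry bookkeeping: every journal entry must have equal debits and credits, and the resulting cash balance must reconcile to a known bank balance. The `Line` type, the `reward` function, and the sample entries are all hypothetical. This is one possible checker, not the one used in AccountingBench or in any lab's training loop.

```python
from decimal import Decimal
from typing import NamedTuple

class Line(NamedTuple):
    account: str
    debit: Decimal
    credit: Decimal

ZERO = Decimal("0")

def entry_balances(lines) -> bool:
    """A journal entry is valid only if total debits equal total credits."""
    debits = sum((l.debit for l in lines), ZERO)
    credits = sum((l.credit for l in lines), ZERO)
    return debits == credits

def reward(entries, expected_cash, cash_account="Cash") -> float:
    """1.0 if every entry balances and cash reconciles to the bank, else 0.0."""
    if not all(entry_balances(e) for e in entries):
        return 0.0
    cash = sum(
        (l.debit - l.credit for e in entries for l in e if l.account == cash_account),
        ZERO,
    )
    return 1.0 if cash == expected_cash else 0.0

# Made-up books: one sale for 100.00 and one expense of 30.00.
entries = [
    [Line("Cash", Decimal("100"), ZERO), Line("Revenue", ZERO, Decimal("100"))],
    [Line("Software", Decimal("30"), ZERO), Line("Cash", ZERO, Decimal("30"))],
]
print(reward(entries, expected_cash=Decimal("70")))
```

A binary check like this only catches entries that fail to balance or reconcile. It would not catch an amount booked to the wrong expense account, so a real verifier would need more checks than this.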

mmarian · 11mo ago

I was just thinking of that earlier today, really cool!
