I do software for a retail company now and we've been having a similar debate: AI helps me and other departments work more efficiently, and me getting a feature out the door faster is better for the business, but it doesn't get more products off the shelves. So, to the shareholders and the C-suite, is AI doing anything for the company?
So the right place to look would be employees' free time, or the cognitive load/stress they are under.
How is it possible to measure that?
Anyway, Goldman might not be the right firm to measure it, because they are not interested in anything that isn't money.
I think finding out whether or not AI is actually boosting productivity is a problem of measuring productivity, period, which my current company, at the very least, is pretty bad at. For a developer, is their productivity their lines of code produced, hours worked, project tasks completed per unit of time, agile points completed per unit of time, PRs reviewed, PRs submitted? In more human metrics, is it what their coworkers say about them, what leadership says about them, what customers say about them, testers, QA? The number of bugs they fix, the number of bugs they don't ship?
Sorry for the ramble, but apply this productivity measurement conundrum to entire corporations and it's no wonder that no productivity boost is being recorded. I'd be surprised if semi-accurate productivity measurements were taken in the first place.
“Bottom line” refers to net income, which is revenue minus costs, so it doesn’t matter whether a department is a profit center. If AI is making these departments more efficient, the lower costs should show up in the bottom line.
Someone I know works for a municipality in digital transformation. They have a public facing website where local residents can report things to the municipality, such as potholes, water system issues, noise complaints, etc.
The UI has this huge taxonomy with like 200 categories with three levels of nesting that route to different departments. There are multiple "other" categories.
When a resident chooses other, it becomes an employee's job to choose a different category. But 30% of the categories employees choose are wrong (about the same for residents).
It's a UX problem, a taxonomy problem, a training problem, a change management problem, a work routing problem, and all these contribute to longer resolution times.
My friend started tinkering with a small quantized local LLM, testing whether it could classify the reported issues more accurately than the public/staff. Of course it could. They're preparing to integrate it into production. Installing it will dramatically improve the UX for the residents, save staff time (at least hundreds of hours a year), improve resolution time, etc.
My friend's boss mentioned it at a conference and apparently "no one else is doing this".
So yeah, we are just barely scratching the surface of the productivity opportunities LLMs offer. It's early, not because the technology isn't developed enough to help (it is), but because most people are still figuring out how it can help.
You know you can choose to not always assume the worst about everything, right?
Wrapping business processes around these LLMs is the same kind of hard organizational problem plaguing most internal IT projects. People are still the bottleneck.
You also run into the issue of errors compounding. In a multi-step AI flow the per-step success rates multiply, which dramatically increases the chances of a full-job failure. E.g. even at a 99% success rate for any single step, a 30-step process only succeeds about 74% of the time without errors (assuming the steps fail independently). If you go down to 95% success for each step, you are already at that same ~74% likelihood of flawless execution after about 6 steps.
So it’s also about getting those per step success rates way up.
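To make the arithmetic above concrete, here is a minimal sketch in Python. It assumes every step fails independently with the same success rate, which real pipelines may not satisfy; the function names `flawless_probability` and `max_steps_for_target` are made up for illustration.

```python
def flawless_probability(per_step_success: float, steps: int) -> float:
    """Chance that every step succeeds, assuming independent steps."""
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_success ** steps


def max_steps_for_target(per_step_success: float, target: float) -> int:
    """Longest flow whose flawless-run chance stays at or above target."""
    if not 0.0 < per_step_success < 1.0:
        raise ValueError("per_step_success must be strictly between 0 and 1")
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    steps = 0
    probability = 1.0
    # Keep adding steps while one more step still meets the target.
    while probability * per_step_success >= target:
        probability *= per_step_success
        steps += 1
    return steps


if __name__ == "__main__":
    print(f"{flawless_probability(0.99, 30):.1%}")  # 74.0%
    print(f"{flawless_probability(0.95, 6):.1%}")   # 73.5%
    print(max_steps_for_target(0.95, 0.70))         # 6
    print(max_steps_for_target(0.99, 0.70))         # 35
```

Under these assumptions, going from 95% to 99% per step buys you roughly six times as many steps for the same overall reliability, which is why the per-step number matters so much.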
I had a conversation with an acquaintance a few weeks ago who was adamant that ChatGPT only really showed up less than a year ago. They were absolutely mind-blown when I pointed out that ChatGPT got crazy popular literally years ago, in late 2022 and early 2023. They were convinced this was still very new stuff, like 8 months ago or so.
I don't really blame people either. Personally I feel like the years since COVID have been a weird blur. People don't realize that lockdowns were half a decade ago already.
On-prem to cloud -- took about 10 years. De-facto success.
Blockchain to new age of finance -- 12 years later, not much success overall (instant regret for anyone dumb enough to buy junk crypto and NFTs).
Quantum computing -- 10ish years later, nothing major, but it might bear fruit long-term.
LLMs -- Let's average all the above and say 11 years from 2022. So 2033 is when you can say it was a bust.
It mainly helps with mundane tasks. I think it mostly gives employees a better life within the company by handling those stupid tasks, like yet another email or meeting notes.
Seems like the hand-waving and layoffs will have to stop before we get real data.