undefined | Better HN

0 pointsafro881y ago0 comments

> potential applications > if you ... > for example ...

Yes there seems to be lots of potential. Yes we can brainstorm things that should work. Yes there is a lot of examples of incredible things in isolation. But it's a little bit like those youtube videos showing amazing basketball shots in 1 try, when in reality lots of failed attempts happened beforehand. Except our users experience the failed attempts (LLM replies that are wrong, even when backed by RAG) and it's incredibly hard to hide those from them.

Show me the things you / your team has actually built that has decent retention and metrics concretely proving efficiency improvements.

LLMs are so hit and miss from query to query that if your users don't have a sixth sense for a miss vs a hit, there may not be any efficiency improvement. It's a really hard problem with LLM based tools.

There is so much hype right now and people showing cherry picked examples.

0 comments

7 comments · 7 top-level

jihadjihad1y ago

> Except our users experience the failed attempts (LLM replies that are wrong, even when backed by RAG) and it's incredibly hard to hide those from them.

This has been my team's experience (and frustration) as well, and has led us to look at using LLMs for classifying / structuring, but not entrusting an LLM with making a decision based on things like a database schema or business logic.

I think the technology and tooling will get there, but the enormous amount of effort spent trying to get the system to "do the right thing" and the nondeterministic nature have really put us into a camp of "let's only allow the LLM to do things we know it is rock-solid at."

2 more replies

fnordpiglet1y ago

We have built quite a few highly useful LLM applications in my org that have reduced cost and improved outcomes in several domains - fraud detection, credit analysis, customer support, and a variety of other spaces. By in large they operate as cognitive load reducers but also handle through automation the vast majority of work since in our uses false negatives are not as bad as false positives but the majority of things we analyze are not true positives (99.999%+). As such the LLMs do a great job at anomaly detection and allow us to do tasks it would be prohibitively expensive with humans and their false positive and negative rates are considerably higher than LLMs.

I see these statements often here about “I’ve never seen an effective commercial use of LLMs,” which tells me you aren’t working with very creative and competent people in areas that are amenable to LLMs. In my professional network beyond where I work now I know at least a dozen people who have successful commercial applications of LLMs. They tend to be highly capable people able to build the end to end tool chains necessary (which is a huge gap) and understand how to compose LLMs in hierarchical agents with effective guard rails. Most ineffectual users of LLMs want them to be lazy buttons that obviate the need to think. They’re not - like any sufficiently powerful tool they require thought up front and are easy to use wrong. This will get better with time as patterns and tools emerge to get the most use out of them in a commercial setting. However the ability to process natural language and use an emergent (if not actual) abductive reasoning is absurdly powerful and was not practically possible 4 years ago - the assertion such an amazing capability in an information or decisioning system is not commercially practical is on the face absurd.

3 more replies

VeejayRampay1y ago

really agree with this and I think it's been the general experience: people wanting LLMs to be so great (or making money off them) kind of cherry picking examples that fit their narrative, which LLMs are good at because they produce amazing results some of the time like the deluxe broken clock that they are (they're right many many times a day)

at the end of the day though, it's not exactly reliable or particularly transformative when you get past the party tricks

anilgulecha1y ago

LLMs are not hype.

In education at least, we've actively improved efficiency by ~25% across a large swath of educators (direct time saved) - agentic evaluators, tutors and doubt clarifiers. The wins in this industry are clear. And this is that much more time to spend with students.

I also know from 1-1 conversation with my peers in large-finance world, and there too the efficiency improvements on multiple fronts are similar.

1 more reply

physicsguy1y ago

We’ve found that the text it generates in our RAG application is good, but it cocks up probably 5-10% of the time doing the inline references to the documents which users think is a bug and which we aren’t able to fix. This is static rather than interactively generated too

jacobr11y ago

This is why we are only at the start of exploring the solution space. What applications don't require 100% accuracy? What tooling can we build that enables a human in the loop to choose between options? What options do we have to better testing or checking accuracy? There is a lot more to be done to invest hybrid systems that use other types of models or novel training date or heuristics or human workflows in novel ways that shore up the shortcomings ... but in aggregate allow us to do new things. It will take many years for us to figure where this makes the most sense.

archiepeach1y ago

To be fair in the human-based teams I've worked with in startups I couldn't show you products with decent retention.

j / k navigate · click thread line to collapse