This is probably why almost no one in sectors outside those that already provide white-label customer service is seriously considering implementing these things. There's no financial or technical barrier to doing so; in fact, there's every incentive to try. Businesses like the one I'm in are just waiting to see how exactly the first movers will be dismembered.
After that it's just a gradual creep into LLM ops and madness. I'm speaking from the other side of that descent.
As obvious as it may be, production LLM tools work on your data. You can't simply use an external benchmark to verify if your tool works for your use case. You will always have to build evaluation processes.
I'd say there are two types of tests you will end up running.
1) Statistical tests - AKA good old ML. 2) Semantic tests - here be dragons.
Semantic tests break down further based on HOW you are using the LLM (categorization, summarization, etc.).
The issue with semantic testing is the amount of human effort. It's more akin to setting up exams and evaluating answers. Also, your student may be tripping randomly.
Categorization you can simplify down to almost-standard ML workflows. Summarization? That takes real effort to verify.
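The categorization case really can be reduced to a classic ML-style accuracy check. A minimal sketch, where `classify` is a hypothetical stand-in for your LLM call and the labeled set is hand-built toy data:

```python
# Minimal categorization eval: compare LLM outputs against a hand-labeled set.
# `classify` is a hypothetical wrapper around the real LLM call.

def classify(text: str) -> str:
    # Stand-in for the LLM; in production this would be an API call.
    return "billing" if "invoice" in text.lower() else "other"

labeled_set = [
    ("Where is my invoice for March?", "billing"),
    ("The app crashes on startup", "other"),
    ("Please resend the invoice", "billing"),
]

correct = sum(classify(text) == label for text, label in labeled_set)
accuracy = correct / len(labeled_set)
print(f"accuracy: {accuracy:.2f}")
```

The hard part isn't this loop; it's building and maintaining the labeled set, which is exactly the "setting up exams" effort described above.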
I suspect that a very large number of applications that could've been "normal" programs are going to be AI'd with no clear pros but many cons.
Let's assume for a moment that you're a non-technical 55-70yo manager or owner of a business -- perhaps someone like your mom, dad, uncle, or grandparents.
What they're really asking is: "There's a whole lot of lip service right now about big leaps in AI capabilities. I've tried ChatGPT, and it's pretty cool. I have no idea what the limitations are, or how practical any given application is. You are technically inclined. Do you think any part of our business could benefit?"
still shocks me. It's a company that uses AI to produce survey results. I'll let you read their pitch/description and decide for yourself, but I think it's very fair to say that this is a service to fabricate survey results to validate whatever idea it is you had beforehand. But even side-stepping that, they claim to have overcome bias in their datasets and refused to elaborate on 1) how they did that and 2) how they could prove that they did that.
As long as this community, which is far more technically sophisticated than the general public, isn't laughing companies like that out of the room, we're in serious trouble.
There was also this thread https://news.ycombinator.com/item?id=37259753 which was an individual's project to provide an AI therapist and while people here and there did mention the cons of having a program provide medical treatment, the overall sentiment wasn't at all negative.
I'm not even some AI luddite: I use and greatly benefit from some AI tools. But just like crypto, AI isn't the be-all-end-all technology. The difference is that where crypto is primarily a financial risk to people duped into using dubious-at-best, scams-at-worst products, AI will cause real, concrete harm, e.g., https://www.euronews.com/next/2023/03/31/man-ends-his-life-a...
routing emails.
voicemail routing and handling.
commenting code. easy enough to add something that strips comments and verifies the code is the same.
taking meeting notes.
going to google and pulling down notes from 10 websites.
writing a user guide for a module.
any kind of corporate-ese emails, memos etc.
generating test data.
I wouldn't allow this stuff to work without human supervision for the most part, but this list of table-stakes stuff seems to work pretty well and be worth the time to ask.
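The code-commenting check mentioned in the list is straightforward to sketch in Python: comments never reach the AST, so you can verify that an LLM's commented version is functionally identical by comparing parse trees. A toy version, assuming the model only adds comments (docstrings would appear in the AST and need separate handling):

```python
import ast

def same_code(original: str, commented: str) -> bool:
    """True if both sources parse to identical ASTs, i.e. the
    commented version only added comments or whitespace."""
    return ast.dump(ast.parse(original)) == ast.dump(ast.parse(commented))

original = "def add(a, b):\n    return a + b\n"
commented = "# Adds two numbers.\ndef add(a, b):\n    return a + b  # simple sum\n"

print(same_code(original, commented))  # True: only comments were added
```

This is language-specific, of course; for languages without an easy AST library, stripping comments lexically and diffing the remainder gets you most of the way.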
In hindsight, the advent of the PC, mobile, and the internet was economically transformative. However, PC vendors generally fared poorly, mobile was dominated by a small number of behemoths, and the internet is dominated by giant consumer tech firms and a dizzying array of B2B firms.
It's pretty tough to pick the winners today. I thought consumer plays were fool's errands up until the phi-1.5/WebLLM papers; now it's looking like we'll have GPT-3.5-like behavior in the browser, on common consumer hardware, by year end.
As for the cons, a nice way to put it would be that adding a black box into the workflow makes everything after it undefined behavior. And there's no shortage of C-Suite guys with itchy trigger feet.
Future LLMs should gradually become more powerful, and any work on such an assistant today will be good preparation for more powerful assistants.
Vertical specialties like logistics are in fact the BEST place to use LLMs, quoting from the article:
"The most interesting (and likely valuable) companies are the ones that take boring industries and find non-obvious use cases for AI. In those cases, the key is having a team that can effectively distribute a product to users, with or without AI"
Attention all driverless car companies: hire this guy and make him your CEO.
Edit: here they are:
Energy:
- https://www.ycombinator.com/companies/elyos-energy
- https://www.ycombinator.com/companies/line-build
- https://www.ycombinator.com/companies/helios-climate-industr...
- https://www.ycombinator.com/companies/orbio-earth
Materials: https://www.ycombinator.com/companies/osium-ai
Security:
- https://www.ycombinator.com/companies/kobalt-labs
For example, my latest module (I'm putting the finishing touches on it as we speak) uses LLMs to review catalysis literature, summarize it, and direct another coding LLM that has been trained to run the simulation tools I created, trying to reproduce the work in the papers. Yes, it works: the first catalyst discoveries were made just a few days ago.
What specifically have you trained your coding LLM on? Is it LoRA or something more advanced? Have you created a corpus by hand specifically for training?
Is interpreting the other papers and translating them into simulation parameters a rate limiting step in catalysis research? Or is this like, fitting your simulation package’s parameters to get the same output as someone else’s?
I remember, e.g., Italy and Brazil had projects more than ten years ago that used some sort of machine learning to find hidden historical buildings under terrain or jungle by looking at patterns in satellite/aerial images, and it was successful in finding archeological sites in both countries.
In general, not just YC
I work at an AI startup that, with squinted eyes, could be described as "AI for pricing insurance policies". As the company grows, we keep adding more non-AI pieces. A lot more goes into the frontend, which keeps getting non-AI features. In a closed industry you can't get a clean dataset for everything, so lots of heuristics and domain knowledge go into some pieces of the equation. Custom APIs and integrations for customers, etc.
My point is, any "AI startup" will, by the time of exit, not be an AI startup but a "problem X startup", where AI was initially used to address X. It will have far more non-AI pieces than AI ones. The rare exceptions built on base AI technology will get commoditized pretty soon anyway.
I work in a very similar domain, and my company is also in the business of "solving problem x" with AI as the means to do so. It's in an area where effectiveness in solving the problem can be clearly measured, so it's easy to calculate ROI for customers.
The main downside of the AI hype, IMHO, is the conflation with LLMs and the AI bubble. We do plan to leverage LLMs in some specific ways, but it's not the core of our business, just part of the solution.
It's been on my mind since watching a vintage episode of "Computer Chronicles" recently on the coming AI boom, in the 80s. I'm not aware of any of those companies being still around, so I could not help but wonder what's different now.
Here is the episode:
https://youtu.be/_S3m0V_ZF_Q?si=2XrE4nnw1hB4X1xy
Real talk, to be fair, I've helped architect, design, and bring to market several successful "AI" products in the medical space. There are good, useful, value-add applications for the tech out there. But also, to be fair, the companies I've seen succeed are the ones that don't call themselves AI companies. For example, a surgical robotics company that calls itself a surgical robotics company, and uses some AI to enable certain value-add features: I've seen success there. A company that calls itself an "AI company" that does robotic surgery? I've not seen those types of companies be successful.
Ok, so pre. What's the list?
Citation needed.
I work in this space, so I'm fairly optimistic, but it's worth pointing out we're still mostly talking about AI hype. Nothing has really been eaten at all yet from what I've seen.
The article claims that there are tons of startups in this space, which is true, but that doesn't mean any of these startups have actually solved any major problems yet.
As a reminder, we still haven't "solved" autonomous driving, and five years ago we seemed much further along that road than we do now.
Working on shipping LLM driven products everyday, I'm becoming increasingly concerned that there's no way the products proposed can possibly match the hype for them. Which is a bit of a bummer since these models have a lot of potential in the sub-hype space, but I fear backlash in the future will lead us to squander that potential.
We know it's all hype (We do the same to our customers too, by the by).
The LLMs still give us huge savings and (hopefully soon) some advantages in the marketplace. 15x time savings seems common. Our copy-editing teams are maaaaaybe going to be on track for the first time ever. Our customer support teams, in test cases, really love the LLMs, and their concerns are being worked on (no, actually, really this time!). Field people are also really liking them for rubber-ducky exercises. Customers and stockholders are demanding that we have some sort of LLM to interact with.
For such a new tech, I've seen pretty widespread adoption inside the company at honestly record rates. We had issues where our IT banned most of the LLMs internally but there were too many people just grabbing a personal computer/mobile to use them anyways. I want to repeat this: People were going around IT to get more work done. Not playing games, not watching Netflix. Actual real honest work.
I've never even heard of something like that in many years of working in corporate America. All for tens of dollars a month per seat.
Keep doing what you're doing with these things. We can't get enough of them.
So in comes LLMs. The developer thinks, “I know, I will just take all the input coming into the screen and put it in a magic LLM box and get the right answer!” Like so many AI startups before them, the failure is written before they even started.
Would have expected to see more vertically focused solutions in the top 4 list. E.g., transportation, oil & gas, agriculture, etc.: all huge markets.
(1) much harder to launch as some amorphous vertical AI - like, what would a 'Transportation AI' look like? Versus targeting a specific workflow within it - "AI copilot for truck drivers".
(2) The large incumbent software platforms that already are powering the day-to-day workflows are THE companies to implement AI first and most accessibly. So the likely winner of the vertical AI race is whatever incumbent platform is already in the highest % of companies or powering the highest % of workflows.
And then AI comes in and... generates resumes for an individual against the job descriptions they feed it; or, sifts through a thousand candidates at once and presents the "best 10 options," which in reality are basically 10 candidates chosen at random or worse; or, generates random technical questions; or, feeds an internal or external recruiter inaccurate information as the recruiter uses it as a drop-in replacement for google and asking their colleagues questions.
I'm with you, AI is going to be used to soften the annoying part of bad practices while just cementing them further. I predict we'll get to a point where 10,000 AI-generated JDs and candidate choosers and interviewers are wrestling with 10,000,000 AI-generated resumes and auto-screening-call answering AIs. We'll see "AI optimized" resumes that look like the old "SEO optimized resumes" (JS Javascript Java Script ES5 ES6 ES7 EcmaScript 5 6 7 Ecma Script 5 6 7 typescript type script TS TSX JSX....) (maybe we'll start seeing "Pretend you are my dying uncle and you want to ensure I have a livelihood after you pass")
Most of the time you think you can get rid of the problem, it just means you don't understand what the "real problem" is. This is Chesterton's Fence in practice. Quite often when you have this 'silly process', it's about visibility into that process for auditing by others and ensuring legal compliance.
And for this reason it will always be exponentially easier to provide a drop-in replacement for a process than to attempt to understand the system it exists in.
But as that paper lays out there's a really simple and logical explanation for it as well. Intelligence is highly heritable, attractiveness is highly heritable. Intelligent people are more likely to succeed making them more able to seek out other attractive/intelligent individuals, thus resulting in more attractive/intelligent offspring.
[1] - https://www.sciencedirect.com/science/article/abs/pii/S01602...
https://pubmed.ncbi.nlm.nih.gov/25131282/
> One-standard-deviation increase in childhood general intelligence (15 IQ points) decreases women's odds of parenthood by 21-25%. Because women have a greater impact on the average intelligence of future generations, the dysgenic fertility among women is predicted to lead to a decline in the average intelligence of the population in advanced industrial nations.
I look way better in my professional headshots than in reality (I'm also 10 years younger!).
So this tracks for my N of 1.
I’ve known of this phenomenon since I was a kid and saw a 20/20 episode on it where they took attractive people and not attractive people and ran them through job interviews etc. The outcome was as you might expect.
Attractiveness is basically fitness and low neuroticism. If you have those and enough emotional intelligence to not be a criminal, you're going to generally fail upwards. Add any one of an elite education, a stdev above average intelligence, good mentoring, self awareness, or a competition level skill, and you can get a seat a most tables.
B) the people promoting themselves the most on LinkedIn also tend to care the most about their image (including looks)
C) there’s a large component of sales for any startup. There’s plenty of ugly yet successful entrepreneurs, but looks do seem to matter.
I think I once had to pick a medical doctor from just names and photographs. My choice was probably different from what I'd have made if I'd been picking a partner for a tennis competition, though there'd be a strong positive correlation, I expect, between those two things. On the other hand, if I'd been picking a bouncer for a one-off event at a nightclub, there'd be less correlation, probably.
and everything else in life that helped you get in a position to be able to drop everything and go on a roadshow.