It's important to note that in most jurisdictions you can't actually do this legally. Like, you may be able to get away with it, but it is actually illegal to sell financial services by misrepresentation.
"They will have $1B in revenue by year end" is perfectly fine to completely make up.
That domain knowledge is acquired by talking to people, which AI can't do. All kinds of people, since the knowledge isn't written down.
I know this from having dated a girl who did M&A deals for media properties: you know, your big TV shows, movies, etc.
It might be lower stakes, but isn't that still a juicy target for data-exfiltration attacks?
In other words, imagine if one of your direct competitors was watching everything your employee read while making spreadsheets and slideshows.
If AI is really as wondrous as everybody says, why didn't all the employees of all the AI companies simply type "Claude, file my taxes for me" as a prompt and walk away?
Luckily there is still a significant market for the services.
Currently we don't know the risk, so it is kind of hard to absorb.
Why, they can sell user data to other brokers. Experts indeed! But not in insurance or finance, of course.
But there's a process risk here based on their current practices. I'm hoping those practices change so that I can recommend Claude to everyone I know, but as of now, there's existential risk exposure here that's greater than Google's.
Anthropic's automated systems can and will ban you for pretty arbitrary things, and you won't get human support or Claude, even if you are an enterprise paying through the nose. And there's zero redress unless you go viral on social media. Or know someone who knows someone. See: https://x.com/Whizz_ai/status/2051180043355967802 https://x.com/theo/status/2045618854932734260
And I say that as someone who likes how Anthropic has been training Claude and Opus. I just don't think they're prepared to be the trillion dollar company they've become. They are – in a very real way – suffering from success. Which is extremely inconvenient to be on the receiving end of when you're on a deadline.
Code review has become unbearable because, before AI, developers were reviewing code as they wrote it in the first place. Granted, that was never perfect, which is why a second person reviewing code was (is?) a best practice. But effectively there was always some level of code review happening as developers wrote code.
I fear it is way more boring to review financial and medical documents completely written by AI than it is to write (and simultaneously review) them yourself. And it's way more dangerous to ship mistakes than in most software.
But more often than not that developer ends up reviewing far more lines of code due to the typical verbosity of an LLM.
The analysis itself? I'm doing it by hand.
Far too often people think productivity is the point. Maybe the point is that the developers' understanding of the product IS the product?
You're not engineering black boxes, you're engineering legible boxes.
Here are some of the horrible things I've seen. A frontend dashboard with PHI/PII deployed via Vercel/Next.js because AI told them how to get their site online. The login is hardcoded into the frontend, so anyone with inspect can find the password.
Another "fixed" dashboard deployed the same way. This time they added Firebase Auth, so they got Sign in with Google, supposedly restricted to our domain. Wait, how would they be able to create a token for our domain? They didn't; the frontend just blocks other domains from calling firebase.auth, but Firebase doesn't care. So simply calling the function in the console lets me log in with any Gmail account...
They also were showing me their RBAC with Firebase. Again, they don't have access to our Organization/Directory/Groups, so I wondered how they did this. Wouldn't you guess, it's a hardcoded list of approved users. You can literally call firebase.auth and sign in anonymously. Again, only the frontend checks the email addresses. So now that I have a Firebase auth, all the backend Firebase functions just check that you have auth'd, and I can make any request I want to the backend. The frontend simply won't show me the code.
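For what it's worth, the fix has to live server-side, not in frontend checks. A minimal sketch of Firestore security rules that reject anonymous sessions and enforce a domain check at the database layer (example.com stands in for the real domain; a real deployment would more likely use custom claims or a backend verification step):

```
rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    match /{doc=**} {
      // A frontend-only email check can be bypassed from the browser
      // console; these conditions are evaluated by Firebase itself.
      allow read, write: if request.auth != null
        && request.auth.token.firebase.sign_in_provider != 'anonymous'
        && request.auth.token.email_verified == true
        && request.auth.token.email.matches('.*@example[.]com');
    }
  }
}
```

With rules like these, the anonymous `firebase.auth` session described above gets a permission-denied error no matter what the frontend shows or hides.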
I could go on and on about the stupidity levels I'm facing but I don't feel like crashing out.
All I can say is this tool is only useful if you already know how to correctly implement these things. Does it save me time? Sure, but I have to call out its mistakes and explain why not to do things. Honestly, I feel like Claude is good for people who like to gamble. When it gets it right it feels great, but I don't want to roll the dice 30 times to get it correct.
Sadly this sounds like par for the course when it comes to tech. Too many messages and requests for help depend on knowing someone in the right slack groups.
The templates being: pitch builder, meeting preparer, earnings reviewer, model builder, market researcher, valuation reviewer, general ledger reconciler, month-end closer, statement auditor, KYC (Know Your Customer) screener.
Seems pretty scattershot. Reminds me of GPT Store.
Any idea how they ensure this doesn't happen? As in, how can a user verify that the model did not touch any of the numbers and that it only built pipelines for them?
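One cheap, deterministic spot check (not a full answer to the question): diff the set of numeric literals in the source data against the model's output, so any altered or invented figure is flagged. A rough sketch, with made-up strings standing in for the real documents:

```python
import re

def numbers_in(text):
    # Extract every numeric literal so source and output can be compared.
    return sorted(re.findall(r"-?\d+(?:\.\d+)?", text))

source = "revenue 1250.75; costs 980.10"
output = "Revenue of 1250.75 against costs of 980.10"

# If the model changed or invented a figure, the two sets won't match.
print(numbers_in(source) == numbers_in(output))  # True
```

This obviously doesn't prove the pipeline logic is right, only that no number was silently rewritten on the way through.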
What I've been telling my CFO, who wants to get AI involved in things, is that for a lot of accounting and finance work "trust but verify" doesn't work, because verifying is often the same process as doing the work.
Build a deterministic query set and automate it for monthly or daily reporting reconciliation.
Leave AI out of it.
How do you verify that all the tariffs are properly allocated to the correct GL code without going through the invoices and checking for each tariff on the list? How do you make sure none were accidentally assigned to other GL codes? All you have is PDFs; you don't know what the AI did or didn't do with the info in the PDFs, and there are not many ways to catch its errors without doing the work yourself.
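The deterministic check suggested above can be sketched in a few lines: join extracted invoice tariff lines against ledger postings by key and flag anything not posted to the expected GL code. All the names, codes, and amounts here are illustrative:

```python
# Minimal sketch of a deterministic tariff-to-GL reconciliation.
EXPECTED_TARIFF_GL = "2310"  # hypothetical GL code for tariff expense

invoice_tariffs = [  # (invoice_id, amount) extracted from invoices
    ("INV-001", 120.00),
    ("INV-002", 75.50),
    ("INV-003", 10.00),
]

gl_postings = [  # (invoice_id, gl_code, amount) from the ledger
    ("INV-001", "2310", 120.00),
    ("INV-002", "5100", 75.50),   # misposted to a freight account
    # INV-003 was never posted at all
]

def reconcile(invoice_tariffs, gl_postings, expected_gl=EXPECTED_TARIFF_GL):
    # Index postings by (invoice, gl_code) and flag every tariff line
    # that is missing from, or mismatched in, the expected GL account.
    posted = {(inv, gl): amt for inv, gl, amt in gl_postings}
    exceptions = []
    for inv, amt in invoice_tariffs:
        if posted.get((inv, expected_gl)) != amt:
            exceptions.append(inv)
    return exceptions

print(reconcile(invoice_tariffs, gl_postings))  # ['INV-002', 'INV-003']
```

Run the same query set every close and you get a stable exception list to work through, with no model in the loop.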
If anything, it's going to add a step to these "kids'" work, where they have to use the AI to do the work and then redo 90% of the work just to verify the output, and then the AI is going to get the credit anyway.
Or the overworked people are going to use AI and not verify it, which means not catching any errors or hallucinations, which apparently is fine because someone claims it's a solved problem for the black box of infinite possibility and inconsistent output.
I feel like there’s a metaphor in there... maybe I’ll ask Claude about it.
My money's on that.
I’ve also had some great results with a /reflect skill that asks the agent to look at the work in the broader context of the project. But those are the only two skills I use regularly that aren’t specific to our company, codebase, or tools.
The AI is an expert in both following and generating prompts.
It seems the initial product footprint tries to sidestep this problem by not giving the agents control over whom to lend to or which applications to approve. Even so, I think it's quite an optimistic read on their end. Happy to share reports with anyone who's interested (montana@latentevals.com), especially if you work at a frontier model lab and are interested in plugging my evals into your RL systems!
All I did was upgrade Claude Code and use the new model. It most definitely exhibits misaligned behavior (compared to 4.6).
I assume that 4.6 will become unavailable at some point, but I hope not any time soon. 4.7 hit usage limits faster, didn't do anything obviously better, and had more annoying behaviors in other aspects. I don't know if this is strictly a model issue or if there are also problems with how it's harnessed through Claude Code. I'm not willing to spend more time digging into it until I'm forced to.
This probably killed a thousand startups in this space.
In the early internet you wouldn't see Google creating their own news site or Facebook building their own animal farm. What happened to the platformication of everything?
I have given up on trying to get through to him how bad of an idea this is. He's unemployed and has been working on this for over a year.
Y Combinator is accepting applications for the Summer 2026 Batch funding cycle. Make sure they don't miss out!
Before, some idiot would pitch their stupid idea to dozens of local web-dev companies and banks and get told, dozens of times, that their idea is straight-up stupid and never going to work, and that they are stupid.
Now these LLMs allow them to bypass all of that advice and create what they want without any input, or even knowing how the tech behind it works.
We are so fucked lol
No, why would they if they have the choice?
> what happened to platformication of everything?
Business happened. The web works differently from how it used to. The users are different. LLM inference and AI tools are a different core product from search and ads. That, and we have the benefit of hindsight now. Maybe a Google newsroom would've actually been a good idea in 2006, in hindsight; who knows.
Also realistically you could say the same thing about Google Maps and Street View. That probably also killed some startups. Google isn't running a charity for startups.
They are also fighting for their lives because these insane valuations simply aren’t justified by being dumb pipes. Fortunately, open weights models are widely available and have crossed a threshold of usefulness that cements their place as good substitutes.
The issue with that is obviously that most of the generated value would be captured by that company in the middle, while Anthropic would stay in the cost-conscious inference market.
We're not talking about what is best for the consumer (e.g., more competition to force iteration and improvement), but about what Anthropic thinks is best for Anthropic.
Is this a serious question?
Without the big labs with deep pockets investing to change the consumer mindset, do you think a small company with no funding has any chance of even existing?
I remember when paying $1.99 for a mobile game on iOS was considered too expensive, and now it seems most consumers are primed to spend more on in-app purchases every week. That mind-shift did not happen overnight.
It was not that long ago that $200 for a ChatGPT subscription was considered extravagant, but now even wrappers can charge this price without hesitation, and some of them do.
What Anthropic is doing is priming a market of which they will potentially be one of the main beneficiaries, as long as they can continue existing. But I don't think anyone will go to Anthropic directly to source their financial services agent. They will go to financial service companies that use Anthropic to build the capabilities.
I think someone stated it clearly: they can't take on these kinds of businesses until they build out the risk side and the personnel, all of which is a human problem, not a tech one. A lot of processes still require physical steps and backstops because it's not possible to source all the data needed to act on them in the first place. Then you have audits and reconciliations, and a bunch of strict workflow rules and atomicity requirements to reach the level of software that bigger financial institutions would accept.
My gut reaction to stuff like this is a mix of "oh shit, they could take over my company" and "they're the next script kiddie who thinks software is anywhere near a majority of the work in some software spaces".
Yes they can? They have infinitely more cash to pay off any risk. What do you need personnel for, besides sign-off, if the AI does it right?
Will Anthropic externalize the risk, selling access to agents? Or will it internalize the risk and liability, selling financial services? Maybe both? I guess lots of companies want both: doing some things internally and keeping other things at arm's length by outsourcing to third-party accountants.
Google News was definitely a thing (and actually still exists).
Unfortunately no.
The TAM for Anthropic and OpenAI is anything that runs software or has a screen.
Any software or technology business with high margins that Anthropic and OpenAI are not already in will be a target.
After both their IPOs, Wall Street will mandate that they push for more growth by competing in other technology business areas, or they will get punished in the markets.
It is ROI or bust.
There was an app for OS X that added window snapping, long before Apple added it to their desktop environment. $5 or something for a feature that just makes sense to build into the product from the start. Apple is king at eventually absorbing this sort of paid add-on. AI makes that faster.
Less cynically, you might say that "use AI to do <obvious thing>" is not really a viable startup pitch anymore. That's not necessarily bad.
What's even sadder is it can work for way too long.
The car industry, oil and gas… all could have played out differently if different players had gained wider adoption or if governments used a different economic model.
There isn't going to be any moat for the hosted providers besides hardware scale. They can run your request on shared 1TB memory hardware, or whatever.
But local hardware is going to catch up, the hosted providers are going to become commoditized, and the costs are just going to be compute, whether it's your hardware or theirs.
And your laptop is going to be powerful enough to be good enough for most cases.
Building is the easy part. There is a lot of service-level stuff that I am sure Anthropic will not be able to provide, which is why they are trying to partner with other orgs in that realm.
I am very skeptical about their stuff now.
If you are a builder, I believe you should avoid Anthropic; it can default to monopolistic behavior. I am not saying they are doing it, but they could: they see what you are building, and if you have traction, they position a product in that realm. Just saying.
If you can’t prove PMF and differentiation with $10m, I’m sorry but you’re not a serious enterprise.
And if what you’re building is “pitch deck AI”, I mean, come on.
This is an attempt to inflate token generation to fool people into increasing Anthropic's valuation.
I've really only seen it used for research/exploration thus far, either for economic research slide decks or for exploring trading hypotheses.
Though we've had a few incidents where employees have submitted AI-generated receipts for reimbursement, which is another issue...
I'm in that space so naturally interested in what people are up to :)
Nowhere near self sufficient tools though, just great to answer questions over the data that would usually take a few hours of custom scripting/excel. I wouldn't trust our stakeholders using AI directly either, being frank.
For research and thesis evaluation, we're observing that firms (names we all know) are bullish and even eager to try AI products.
Regarding automated asset management and the like, there is indeed much more apprehension.
> I've really only seen it used for research / exploration thus far
Summaries and translation for sure.
Speaking with devs in the field, I know that AI tools are used to summarize and extract data from... PDFs. Now, thankfully, LLMs got better at answering "How many 'r's in 'strawberry'?" and it looks like they're good enough for summarizing PDFs and extracting key numbers, but I'd still be cautious.
And I've got a friend who's a translator specifically for financial documents. She's a contractor getting about 1/10th of the work (and 1/10th of the pay) she used to have, because now she's only tasked with verifying that the translations are correct. Of course she already had lots of tools, way before the LLM era, automating some of her work, but she was still billing for the use of those tools. Now LLMs are doing nearly all the work, and not "for her": it happens upstream, and she only gets the output of the LLMs and has to verify it. And there aren't that many errors.
https://www.bloomberg.com/professional/insights/press-announ...
It feels like juggling pipe bombs and I have a ton of empathy for the teams being pressured by the business to roll them out with no appreciation for the regulatory rat's nest that ensues.
More industry exposure to well-managed agentic experiences will create oodles of opportunities to reduce premiums for consumers and offset some inflation-driven increases in the cost of coverage.
However, the result (the Excel spreadsheet) looks different each time you run it, which is annoying when you run it at the end of each month.
By the way, this is not surprising when you look at how little detail the skills contain.
Just yesterday I told a colleague that he should buy some of their vests for his company :-D
"ready-to-run agent templates for the most time-consuming work in financial services: building pitchbooks, screening KYC files, and closing the books at month-end"
Ok, maybe you can squeeze a vaguely passable pitchbook out of Claude.
But screening KYC files or closing books at month-end?
"I'll have some of what they're smoking" as the cool kids say.
No regulator or tax office on this planet is going to accept the "but Claude said it was ok" excuse.
The only people who are going to profit out of this are Anthropic, Lawyers and Governments (through increased fines).
LLMs do not change the equation all that much: the human ability to imagine is the scarcest resource on the planet, and LLMs will not help all that much with it.
Better Call Saul when (not if) it does.
https://www.lawnext.com/2025/05/ai-hallucinations-strike-aga...
Why didn’t I think of that.
Is the plan to have an LLM do everything? And do it worse?
"Oh yeah my Claude didn't agree with the pitch from their Claude"
The goal of current tech is to make humanity a gerbil running on a Claude wheel
I don't necessarily disagree with that but doing it through LinkedIn slop companies? Come on man you know better than that
What I predict instead is that we will have a common UI layer plugin and a "protocol" that can speak to UI elements; this might be more composable.
As someone who has been interviewing lately, I think this is the next step after leetcode and whiteboard style interviews.
2. I’m almost certainly talking about health insurance, made obvious by you even mentioning that. There’s a HN guideline about discussing in good faith.
3. I find it humorous you hand-wave away our inhuman healthcare system as “for a variety of reasons”.
4. I see your career is in hedge funds, defense, and big tech. Best of luck ;)
As mentioned the problems with the US healthcare system are numerous, complex, and interrelated. I don't think they have a simple solution, nor do I think they are insurance problems at their core. For example the cost of drugs in the US vs the rest of the world has very little to do with insurance.