Better HN

0 points · ilaksh · 17d ago

3 years max. Maybe 5 if you are lucky. The models will continue to improve. The exponential gains in compute efficiency that have been ongoing for 70+ years will continue, and that will result in even smarter models. There are dramatic hardware changes in the pipeline.

But really, that particular issue could have been solved by simply telling it, in a markdown file or in its instructions, something like "verify all facts or compliance requirements with web search and include citations in responses".


22 comments · 5 top-level

ofjcihen · 17d ago · 9 in thread

This is akin to “don’t make mistakes”

“Verify all facts and compliance requirements” leaves enormous holes even if you assume the LLM has a concept of facts and requirements (it does not).

What facts? What requirements? For what industry? For what subset of that industry? For what country or countries that you will be doing business in? Are these current “facts” and “requirements” or is the LLM referencing a dusty article from 1992 for which the subject matter has been radically overhauled?

In my job I regularly see small but incredibly important mistakes like this lead to major issues. Some of those are human driven but increasingly the defense of the person responsible has turned into “Claude said it was fine though!”

rfgplk · 17d ago

> “Verify all facts and compliance requirements”

No. This is a disastrous instruction. Not only is it vague, it's also meaningless. When giving instructions to an LLM, your prompt must be concise and exact. Tell it _exactly_ which requirements need to be followed; ideally, have it write or (preferably) pass audited tests that enforce these requirements. You also need to provide it with a hard source of truth it can rely on. Instead of saying "verify facts", you're better off saying "... make sure [whatever you're doing] matches the data at X.Y.Z, verify by running [instruction/command/program]"

ofjcihen · 16d ago

I think you might have meant to reply to the parent comment.

zuzululu · 17d ago

Especially in cybersecurity.

zuzululu · 17d ago

If someone can’t distinguish between the two, then I honestly wonder what company would be comfortable putting them anywhere near a regulated or security-sensitive workflow, especially someone who condescendingly views their own job as a daycare for people seemingly beneath them.

ilaksh (OP) · 17d ago

It can make mistakes and will sometimes, but what he specifically mentioned was a case where it did not pull up a reference that it needed. So using a web search tool effectively would make a big difference.

ofjcihen · 17d ago

It still does not rise to the standard he requires, which your response indicated would be easy for the model to achieve with a simple prompt.

Additionally, using a specific tool does not suddenly give the model enough common sense to say “this piece of information doesn’t answer the question of whether this solution fits in this specific industry at this time in this place”.


kolinko · 17d ago

Well, you wouldn't just give a human the task "verify all facts and compliance requirements" and expect it to end well either, no?

ofjcihen · 17d ago

If I were working with someone who had experience in the specific industry, then yes, that is in fact what I would do.

If I plucked a random passerby and gave them the task, then no, I’d find myself spelling out every specific for them.

You’re equating the LLM to the least qualified candidate. I don’t think your argument is communicating what you intended.

zuzululu · 17d ago

Of course not. Nobody experienced at their job would (or should) say that and expect it to be followed through flawlessly, especially in cybersecurity.

I feel like the parent you are replying to literally views their workplace as a daycare, which is very condescending.


vor_ · 17d ago · 4 in thread

> 3 years max. Maybe 5 if you are lucky. The models will continue to improve. The exponential gains in compute efficiency that have been ongoing for 70+ years will continue and that will result in even smarter models. There are dramatic hardware changes in the pipeline.

I remember hearing that 10 years ago about self-driving.

oblio · 17d ago

60 years ago about flying cars, 40 years ago about cold fusion, the list is long.

We need a lot more basic research into LLMs and also a lot cheaper hardware.

The current batch of LLMs will turn a lot of fields upside down, but not to the tune of $3tn or whatever crazy amounts are being invested right now.

ilaksh (OP) · 17d ago

I mean, you and I are basically living in parallel universes. Waymo has been running for years, and there are other services, including some in China, plus Tesla's, which is not 100% there but is actually very effective.

And the thing he complained about is fixable with a web search, and AI does programming and office work today. So it's already here; it's just a question of degree.

habinero · 17d ago

Waymo heavily relies on real humans to get their robots unstuck. They also rely on extremely detailed mapping data, which is why they're only in a few cities.

Tesla has been a couple years away from FSD for, what, like ten years now?

If you scrape off the glitter, you'll find a lot more duct tape and wire than you think.

DaSHacka · 17d ago

"Just 2 more weeks guys, and AI will be able to do everything!"

eikenberry · 17d ago · 3 in thread

The classic 3-5 year window for a new technology that is uncertain and requires just a few more breakthroughs to get there...

Upvoter33 · 17d ago

written with confidence too. I'm amazed at the levels of confidence people have in predicting the (unclear) future.

latentsea · 17d ago

LLMs learnt to be confidently wrong from us.

weakfish · 17d ago

Like full self driving!

suttontom · 17d ago · 1 in thread

Ah yes, the magical equivalent of "you are a senior software engineer who writes bug-free code".

IME people would benefit greatly from the process, albeit tedious and time-consuming, of testing the same prompt sequence/session with the exact same model multiple times. It becomes clear extremely quickly how capable, yet unreliable and inconsistent, a model can be even when given the same context. If you have ever completed a long, complicated task with an agent, then lost the session and tried doing the same thing again from scratch, you may have seen the subtle changes in the model's thinking that lead it to accept or reject certain paths and to ignore or incorporate prompt instructions like the one you've provided.

ilaksh (OP) · 17d ago

Change the temperature to 0 and it will be more consistent.
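To make the temperature point concrete, here is a minimal Python sketch of temperature-scaled sampling over a hypothetical list of logits. The helper name and the numbers are illustrative assumptions, not any particular model's or provider's API. At temperature 0 the sketch falls back to greedy argmax, which is deterministic for a fixed set of logits; at higher temperatures repeated runs can pick different tokens. Hosted models may still vary somewhat at temperature 0 in practice, so this only shows the sampling math, not a guarantee about any service.

```python
import math
import random


def sample_with_temperature(logits, temperature, rng):
    """Pick a token index from raw logits (hypothetical helper).

    temperature == 0 is treated as greedy decoding: always the argmax,
    so the choice is deterministic for a fixed list of logits.
    Higher temperatures flatten the distribution, so repeated calls
    can return different indices.
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - top) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [2.0, 1.5, 0.3]  # hypothetical scores for three candidate tokens
rng = random.Random(0)
greedy = {sample_with_temperature(logits, 0, rng) for _ in range(100)}
sampled = {sample_with_temperature(logits, 1.0, rng) for _ in range(100)}
print(greedy)   # {0}
print(sampled)
```

Running the greedy case 100 times always yields the same index, while the temperature-1.0 case spreads its picks across several indices, which is the consistency difference being described.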


jppope · 17d ago

Stuff like that is risk tolerance... it's not strictly codified, and it's more akin to probability. Different companies, at different stages and in different industries, will all interpret their risk differently... how will a smarter model improve that?
