undefined | Better HN

0 pointsamelius1mo ago0 comments

At that point why don't we ask the AI directly to filter through our data? The AI query language is much more powerful.

0 comments

Because the output you get can have hallucinations, which don’t happen with a deterministic tool. Furthermore, by getting the `jq` command you get something which is reusable, fast, offline, local, doesn’t send your data to a third-party, doesn’t waste a bunch of tokens, … Using an LLM to filter the data is worse in every metric.

alwillis1mo ago

I get that AI isn’t deterministic by definition, but IMHO it’s become the go-to response for a reason to not use AI, regardless of the use case.

I’ve never seen AI “hallucinate” on basic data transformation tasks. If you tell it to convert JSON to YAML, that’s what you’re going to get. Most LLMs are probably using something like jq to do the conversion in the background anyway.

AI experts say AI models don’t hallucinate, they confabulate.

tkclough1mo ago

Just because you haven't seen it hallucinate on these tasks doesn't mean it can't.

When I'm deciding what tool to use, my question is "does this need AI?", not "could AI solve this?" There's plenty of cases where its hard to write a deterministic script to do something, but if there is a deterministic option, why would you choose something that might give you the wrong answer? It's also more expensive.

The jq script or other script that an LLM generates is way easier to spot check than the output if you ask it to transform the data directly, and you can reuse it.

1 more reply

djhn1mo ago

LLMs will often helpfully predict made up tokens for the content of the data fields.

For 100% of jq use cases I have the data wouldn’t fit into context. But even for the smaller things, I have never, not even once, had an LLM not mangle data that is fed into it.

Take a feed of blog posts (and select the first 50 or so just to give the model a fighting chance). I’ll give you 80% likelihood of the output being invalid JSON. And if you manage to get valid JSON out of it, the actual dates, times and text content will have changed.

1 more reply

ameliusOP1mo ago

You can use a local LLM and you can ask it to use tools so it is faster.

sigseg1v1mo ago

"so it is faster" than what? A cloud hosted LLM? That's a pretty low bar. It's certainly not faster than jq.

kelvinjps101mo ago

There is hardware that is able to run jq but no a local AI model that's powerful enough to make the filtering reliable. Ex a raspberry pi

imcritic1mo ago

Because the input might be sensitive.

Because the input might be huge.

Because there is a risk of getting hallucinations in the output.

Isn't this obvious?

aduitsis1mo ago

...and because it's going to burn a million times the energy of what jq would require.

Shorel1mo ago

You really need to go and learn about the concept of determinism and why for some tasks we need and want deterministic solutions.

It's an important idea in computer science. Go and learn.

ameliusOP1mo ago

You need to learn to adapt to the real world where most things are not deterministic. Go and learn.

Shorel1mo ago

I already know that. That's why we have deterministic algorithms, to simplify that complexity. You have much to learn, witty answers mean nothing here, particularly empty witty answers, which are no better than jokes. Maybe stand-up comedy is your call in life.

johnisgood1mo ago

That may be true, but do you not want determinism where possible, especially within this context, i.e. filtering data?

skipants1mo ago

Is your argument that the world isn't deterministic and so we should also apply nondeterminism to filtering json data?

j / k navigate · click thread line to collapse

0 comments

latexr1mo ago

alwillis1mo ago

I get that AI isn’t deterministic by definition, but IMHO it’s become the go-to response for a reason to not use AI, regardless of the use case.

AI experts say AI models don’t hallucinate, they confabulate.

tkclough1mo ago

Just because you haven't seen it hallucinate on these tasks doesn't mean it can't.

The jq script or other script that an LLM generates is way easier to spot check than the output if you ask it to transform the data directly, and you can reuse it.

1 more reply

djhn1mo ago

LLMs will often helpfully predict made up tokens for the content of the data fields.

For 100% of jq use cases I have the data wouldn’t fit into context. But even for the smaller things, I have never, not even once, had an LLM not mangle data that is fed into it.

1 more reply

ameliusOP1mo ago

You can use a local LLM and you can ask it to use tools so it is faster.

sigseg1v1mo ago

"so it is faster" than what? A cloud hosted LLM? That's a pretty low bar. It's certainly not faster than jq.

kelvinjps101mo ago

There is hardware that is able to run jq but no a local AI model that's powerful enough to make the filtering reliable. Ex a raspberry pi

imcritic1mo ago

Because the input might be sensitive.

Because the input might be huge.

Because there is a risk of getting hallucinations in the output.

Isn't this obvious?

aduitsis1mo ago

...and because it's going to burn a million times the energy of what jq would require.

Shorel1mo ago

You really need to go and learn about the concept of determinism and why for some tasks we need and want deterministic solutions.

It's an important idea in computer science. Go and learn.

ameliusOP1mo ago

You need to learn to adapt to the real world where most things are not deterministic. Go and learn.

Shorel1mo ago

johnisgood1mo ago

That may be true, but do you not want determinism where possible, especially within this context, i.e. filtering data?

skipants1mo ago

Is your argument that the world isn't deterministic and so we should also apply nondeterminism to filtering json data?

j / k navigate · click thread line to collapse