undefined | Better HN

0 pointsBoppreH1mo ago0 comments

An LLM company using regexes for sentiment analysis? That's like a truck company using horses to transport parts. Weird choice.

0 comments

lopsotronic1mo ago

The difference in response time - especially versus a regex running locally - is really difficult to express to someone who hasn't made much use of LLM calls in their natural language projects.

Someone said 10,000x slower, but that's off - in my experience - by about four orders of magnitude. And that's average, it gets much worse.

Now personally I would have maybe made a call through a "traditional" ML widget (scikit, numpy, spaCy, fastText, sentence-transformer, etc) but - for me anyway - that whole entire stack is Python. Transpiling all that to TS might be a maintenance burden I don't particularly feel like taking on. And on client facing code I'm not really sure it's even possible.

cyanydeez1mo ago

So, think of it as a business man: You don't really care if your customers swear or whatever, but you know that it'll generate bad headlines. So you gotta do something. Just like a door lock isn't designed for a master criminal, you don't need to design your filter for some master swearer; no, you design it good enough that it gives the impression that further tries are futile.

So yeah, you do what's less intesive to the cpu, but also, you do what's enough to prevent the majority of the concerns where a screenshot or log ends up showing blatant "unmoral" behavior.

true_religion1mo ago

This door lock doesn’t even work against people speaking French, so I think they could have tried a mite harder.

ben_w1mo ago

The up-side of the US market is (almost) everyone there speaks English. The down side is, that includes all the well-networked pearl-clutchers. Europe (including France) will have the same people, but it's harder to coordinate a network of pearl-clutching between some saying "Il faut protéger nos enfants de cette vulgarité!" and others saying "Η τηλεόραση και τα μέσα ενημέρωσης διαστρεβλώνουν τις αξίες μας!" even when they care about the exact same media.

For headlines, that's enough.

For what's behind the pearl-clutching, for what leads to the headlines pandering to them being worth writing, I agree with everyone else on this thread saying a simple word list is weird and probably pointless. Not just for false-negatives, but also false-positives: the Latin influence on many European languages leads to one very big politically-incorrect-in-the-USA problem for all the EU products talking about anything "black" (which includes what's printed on some brands of dark chocolate, one of which I saw in Hungary even though Hungarian isn't a Latin language but an Ugric language and only takes influences from Latin).

1 more reply

sebastiennight1mo ago

En toute honnêteté, je pense avoir dit "damn it" plus d'une fois à chat gépété avant de fermer la fenêtre dans un accès de rage

tomaskafka1mo ago

Nom de dieu de putain de bordel de merde de saloperie de connard d'enculé de ta mère.

1 more reply

bigbuppo1mo ago

There are only Americans on the internet.

themafia1mo ago

Yea.. but.. in English only.

Fortunately I can swear pretty well in Spanish.

senderista1mo ago

Only a native speaker can tell if you swear well in a foreign language.

1 more reply

jacquesm1mo ago

That's like saying you can use a chisel for woodworking.

wcrossbow1mo ago

If it’s good enough it’s good enough, but just like there are many more options than going full blown LLM or just use a regex there are more options than transpile a massive Python stack to TS or give up.

mlmonkey1mo ago

> Someone said 10,000x slower, but that's off - in my experience - by about four orders of magnitude.

You do know that 10,000x _is_ four orders of magnitude, right? :-D

jonbwhite1mo ago

OP is saying that in their experience it is more like eight orders of magnitude

mlmonkey1mo ago

I guess I need reading glasses ... :-D

noprof66911mo ago

They're sending it to an llm anyway tho? Not sure why they wouldn't just add a sentiment field to the requested response shape.

FuckButtons1mo ago

because a regex on the client is free vs gpu compute is absolutely not.

noprof66911mo ago

BUT THEY'RE ALREADY RUNNING IT THROUGH THE LLM.

stingraycharles1mo ago

Because they want it to be executed quickly and cheaply without blocking the workflow? Doesn’t seem very weird to me at all.

_fizz_buzz_1mo ago

They probably have statistics on it and saw that certain phrases happen over and over so why waste compute on inference.

crem1mo ago

More likely their LLM Agent just produced that regex and they didn't even notice.

mycall1mo ago

The problem with regex is multi-language support and how big the regex will bloat if you to support even 10 languages.

doublesocket1mo ago

Supporting 10 different languages in regex is a drop in the ocean. The regex can be generated programmatically and you can compress regexes easily. We used to have a compressed regex that could match any placename or street name in the UK in a few MB of RAM. It was silly quick.

2 more replies

TeMPOraL1mo ago

We're talking about Claude Code. If you're coding and not writing or thinking in English, the agents and people reading that code will have bigger problems than a regexp missing a swear word :).

3 more replies

crimsonnoodle581mo ago

They only need to look at one language to get a statistically meaningful picture into common flaws with their model(s) or application.

If they want to drill down to flaws that only affect a particular language, then they could add a regex for that as well/instead.

b1121mo ago

Did you just complain about bloat, in anything using npm?

Foobar85681mo ago

Why do you need to do it at the client side? You are leaking so much information on the client side. And considering the speed of Claude code, if you really want to do on the client side, a few seconds won't be a big deal.

plorntus1mo ago

Depends what its used by, if I recall theres an `/insights` command/skill built in whatever you want to call it that generates a HTML file. I believe it gives you stats on when you're frustrated with it and (useless) suggestions on how to "use claude better".

Additionally after looking at the source it looks like a lot of Anthropics own internal test tooling/debug (ie. stuff stripped out at build time) is in this source mapping. Theres one part that prompts their own users (or whatever) to use a report issue command whenever frustration is detected. It's possible its using it for this.

matkoniecz1mo ago

> a few seconds won't be a big deal

it is not that slow

orphea1mo ago

It looks like it's just for logging, why does it need to block?

jflynn21mo ago

Better question - why would you call an LLM (expensive in compute terms) for something that a regex can do (cheap in compute terms)

Regex is going to be something like 10,000 times quicker than the quickest LLM call, multiply that by billions of prompts

orphea1mo ago

This is assuming the regex is doing a good job. It is not. Also you can embed a very tiny model if you really want to flag as many negatives as possible (I don't know anthropic's goal with this) - it would be quick and free.

1 more reply

nojs1mo ago

Oh it’s worse than that. This one ended up getting my account banned: https://github.com/anthropics/claude-code/issues/22284

lanbin1mo ago

This is a tricky problem, I mean, Pinyin also uses the English alphabet.

foodevl1mo ago

It is not a tricky problem because it has a simple and obvious solution: do not filter or block usage just because the input includes a word like "gun".

cryptonector1mo ago

Wow, that's horrible.

toraway1mo ago

... and closed for inactivity like basically every issue in the repo, of course.

blks1mo ago

Because they actually want it to work 100% of the time and cost nothing.

mohsen11mo ago

Maybe hard to believe but not everyone is speaking English to Claude

orphea1mo ago

Then they made it wrong. For example, "What the actual fuck?" is not getting flagged, neither is "What the *fuck*".

arcfour1mo ago

It is exceedingly obvious that the goal here is to catch at least 75-80% of negative sentiment and not to be exhaustive and pedantic and think of every possible way someone could express themselves.

Zamaamiro1mo ago

Classic over-engineering. Their approach is just fine 90% of the time for the use case it’s intended for.

orphea1mo ago

75-80% [1], 90%, 99% [2]. In other words, no one has any idea.

I doubt it's anywhere that high because even if you don't write anything fancy and simply capitalize the first word like you'd normally do at the beginning of a sentence, the regex won't flag it.

Anyway, I don't really care, might just as well be 99.99%. This is not a hill I'm going to die on :P

[1]: https://news.ycombinator.com/item?id=47587286

[2]: https://news.ycombinator.com/item?id=47586932

1 more reply

morkalork1mo ago

Except that it's a list of English keywords. Swearing at the computer is the one thing I'll hear devs switch back to their native language for constantly

vntok1mo ago

They evidently ran a statistical analysis and determined that virtually no one uses those phrases as a quick retort to a model's unsatisfying answer... so they don't need to optimize for them.

codegladiator1mo ago

what you are suggesting would be like a truck company using trucks to move things within the truck

argee1mo ago

That’s what they do. Ever heard of a hand truck?

eadler1mo ago

I never knew the name of that device.

Thanks

freedomben1mo ago

Depending on the region you live in, it's also frequently called a "dolly"

1 more reply

istoleabread1mo ago

Do we have a hand llm perchance?

svnt1mo ago

Yeah it’s called a regex. With a lot of human assistance it can do less but fits in smaller spaces and doesn’t break down.

1 more reply

floralhangnail1mo ago

Well, regex doesn't hallucinate....right?

raw_anon_11111mo ago

I just went to expertSexChange.com…

geon1mo ago

buttbuttination

mmh00001mo ago

The Clbuttical problem[1]

[1] https://en.wikipedia.org/wiki/Scunthorpe_problem

draxil1mo ago

Good to have more than a hammer in your toolbox!

lazysheepherd1mo ago

Because they are engineers? The difference between an engineer and a hobbyist is an engineer has to optimize the cost.

As they say: any idiot can build a bridge that stands, only an engineer can build a bridge that barely stands.

raw_anon_11111mo ago

Cloud hosted call centers using LLMs is one of my specialties. While I use an LLM for more nuanced sentiment analysis, I definitely use a list of keywords as a first level filter.

nitekode1mo ago

A lot if things dont make sense until you involve scale. Regex could be good enough do give a general gist.

j451mo ago

Asking a non deterministic software to act like a deterministic one (regex) can be a significantly higher use of tokens/compute for no benefit.

Some things will be much better with inference, others won’t be.

__alexs1mo ago

Using some ML to derive a sentiment regex seems like a good actually?

ldobre1mo ago

It's more like a truck company using people to transport some parts. I could be wrong here, but I bet this happens in Volvo's fabrics a lot.

throwaw121mo ago

because impact of WTF might be lost in the result of the analysis if you solely rely on LLM.

parsing WTF with regex also signifies the impact and reduces the noise in metrics

"determinism > non-determinism" when you are analysing the sentiment, why not make some things more deterministic.

Cool thing about this solution, is that you can evaluate LLM sentiment accuracy against regex based approach and analyse discrepancies

irthomasthomas1mo ago

This just proves its vibe coded because LLMs love writing solutions like that. I probably have a hundred examples just like it in my history.

irthomasthomas1mo ago

Actually, this could be a case where its useful. Even it only catches half the complaints, that's still a lot of data, far more than ordinary telemetry used to collect.

makeitrain1mo ago

Don’t worry, they used an llm to generate the regex.

arnarbi1mo ago

It's more like workers on a large oil tanker using bicycles to move around it, rather than trying to use another oil tanker.

ojr1mo ago

I used regexes in a similar way but my implementation was vibecoded, hmmm, using your analysis Claude Code writes code by hand.

harikb1mo ago

Not everything done by claude-code is decided by LLM. They need the wrapper to be deterministic (or one-time generated) code?

intended1mo ago

The amount of trust and safety work that depends on google translate and the humble regex, beggars the imagination.

apgwoz1mo ago

> That's like a truck company using horses to transport parts. Weird choice.

Easy way to claim more “horse power.”

mghackerlady1mo ago

More like a car company transporting their shipments by truck. It's more efficient

pdntspa1mo ago

LLMs cost money, regular expressions are free. It really isn't so strange.

scotty791mo ago

As far as I can tell they do nothing with it. They just log it.

pfortuny1mo ago

They had the problem of sentiment analysis. They use regexes.

You know the drill.

artrockalter1mo ago

LLMs are good at writing complex regex, from my experience

kjshsh1231mo ago

Using regex with LLMs isn't uncommon at all.

slashdave1mo ago

Maybe. Could just be a pre filter.

lou13061mo ago

They're searching for multiple substrings in a single pass, regexes are the optimal solution for that.

noosphr1mo ago

The issue isn't that regex are a solution to find a substring. The issue is that you shouldn't be looking for substrings in the first place.

This has buttbuttin energy. Welcome to the 80s I guess.

lou13061mo ago

> The issue is that you shouldn't be looking for substrings in the first place.

Why? They clearly just want to log conversations that are likely to display extreme user frustration with minimal overhead. They could do a full-blown NLP-driven sentiment analysis on every prompt but I reckon it would not be as cost-effective as this.

noosphr1mo ago

>Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.

The only time to use a regex is when searching with a human in the loop. All other uses are better handled some other way.

>They could do a full-blown NLP-driven sentiment analysis on every prompt but I reckon it would not be as cost-effective as this.

Every conversation is sent to an llm at least a thousand times the size of gpt2 which could one shot this nearly a decade ago.

1 more reply

8cvor6j844qw_d61mo ago

Very likely vibe coded.

I've seen Claude Code went with a regex approach for a similar sentiment-related task.

mr_00ff001mo ago

My understanding of vibe coding is when someone doesn’t look at the code and just uses prompts until the app “looks and acts” correct.

I doubt you are making regex and not looking at it, even if it was AI generated.

1 more reply

rdiddly1mo ago

Clbuttic!

BoppreHOP1mo ago

It's fast, but it'll miss a ton of cases. This feels like it would be better served by a prompt instruction, or an additional tiny neural network.

And some of the entries are too short and will create false positives. It'll match the word "offset" ("ffs"), for example. EDIT: no it won't, I missed the \b. Still sounds weird to me.

hk__21mo ago

It’s fast and it matches 80% of the cases. There’s no point in overengineering it.

NitpickLawyer1mo ago

> There’s no point in overengineering it.

I swear this whole thread about regexes is just fake rage at something, and I bet it'd be reversed had they used something heavier (omg, look they're using an LLM call where a simple regex would have worked, lul)...

vharuck1mo ago

The pattern only matches if both ends are word boundaries. So "diffs" won't match, but "Oh, ffs!" will. It's also why they had to use the pattern "shit(ty|tiest)" instead of just "shit".

BoppreHOP1mo ago

You're right, I missed the \b's. Thanks for the correction.

feketegy1mo ago

It's all regex anyways

make31mo ago

it's like a faster than light spaceship company using horses. There's been infinite solutions to do this better even CPU only for years lol.

sumtechguy1mo ago

hmm not a terrible idea (I think).

You have a semi expensive process. But you want to keep particular known context out. So a quick and dirty search just in front of the expensive process. So instead of 'figure sentiment (20seconds)'. You have 'quick check sentiment (<1sec)' then do the 'figure sentiment v2 (5seconds)'. Now if it is just pure regex then your analogy would hold up just fine.

I could see me totally making a design choice like that.

sfn421mo ago

It's almost as if LLMs are unreliable

j / k navigate · click thread line to collapse

0 comments

lopsotronic1mo ago

The difference in response time - especially versus a regex running locally - is really difficult to express to someone who hasn't made much use of LLM calls in their natural language projects.

Someone said 10,000x slower, but that's off - in my experience - by about four orders of magnitude. And that's average, it gets much worse.

cyanydeez1mo ago

So yeah, you do what's less intesive to the cpu, but also, you do what's enough to prevent the majority of the concerns where a screenshot or log ends up showing blatant "unmoral" behavior.

true_religion1mo ago

This door lock doesn’t even work against people speaking French, so I think they could have tried a mite harder.

ben_w1mo ago

For headlines, that's enough.

1 more reply

sebastiennight1mo ago

En toute honnêteté, je pense avoir dit "damn it" plus d'une fois à chat gépété avant de fermer la fenêtre dans un accès de rage

tomaskafka1mo ago

Nom de dieu de putain de bordel de merde de saloperie de connard d'enculé de ta mère.

1 more reply

bigbuppo1mo ago

There are only Americans on the internet.

themafia1mo ago

Yea.. but.. in English only.

Fortunately I can swear pretty well in Spanish.

senderista1mo ago

Only a native speaker can tell if you swear well in a foreign language.

1 more reply

jacquesm1mo ago

That's like saying you can use a chisel for woodworking.

wcrossbow1mo ago

mlmonkey1mo ago

> Someone said 10,000x slower, but that's off - in my experience - by about four orders of magnitude.

You do know that 10,000x _is_ four orders of magnitude, right? :-D

jonbwhite1mo ago

OP is saying that in their experience it is more like eight orders of magnitude

mlmonkey1mo ago

I guess I need reading glasses ... :-D

noprof66911mo ago

They're sending it to an llm anyway tho? Not sure why they wouldn't just add a sentiment field to the requested response shape.

FuckButtons1mo ago

because a regex on the client is free vs gpu compute is absolutely not.

noprof66911mo ago

BUT THEY'RE ALREADY RUNNING IT THROUGH THE LLM.

stingraycharles1mo ago

Because they want it to be executed quickly and cheaply without blocking the workflow? Doesn’t seem very weird to me at all.

_fizz_buzz_1mo ago

They probably have statistics on it and saw that certain phrases happen over and over so why waste compute on inference.

crem1mo ago

More likely their LLM Agent just produced that regex and they didn't even notice.

mycall1mo ago

The problem with regex is multi-language support and how big the regex will bloat if you to support even 10 languages.

doublesocket1mo ago

2 more replies

TeMPOraL1mo ago

We're talking about Claude Code. If you're coding and not writing or thinking in English, the agents and people reading that code will have bigger problems than a regexp missing a swear word :).

3 more replies

crimsonnoodle581mo ago

They only need to look at one language to get a statistically meaningful picture into common flaws with their model(s) or application.

If they want to drill down to flaws that only affect a particular language, then they could add a regex for that as well/instead.

b1121mo ago

Did you just complain about bloat, in anything using npm?

Foobar85681mo ago

plorntus1mo ago

matkoniecz1mo ago

> a few seconds won't be a big deal

it is not that slow

orphea1mo ago

It looks like it's just for logging, why does it need to block?

jflynn21mo ago

Better question - why would you call an LLM (expensive in compute terms) for something that a regex can do (cheap in compute terms)

Regex is going to be something like 10,000 times quicker than the quickest LLM call, multiply that by billions of prompts

orphea1mo ago

1 more reply

nojs1mo ago

Oh it’s worse than that. This one ended up getting my account banned: https://github.com/anthropics/claude-code/issues/22284

lanbin1mo ago

This is a tricky problem, I mean, Pinyin also uses the English alphabet.

foodevl1mo ago

It is not a tricky problem because it has a simple and obvious solution: do not filter or block usage just because the input includes a word like "gun".

cryptonector1mo ago

Wow, that's horrible.

toraway1mo ago

... and closed for inactivity like basically every issue in the repo, of course.

blks1mo ago

Because they actually want it to work 100% of the time and cost nothing.

mohsen11mo ago

Maybe hard to believe but not everyone is speaking English to Claude

orphea1mo ago

Then they made it wrong. For example, "What the actual fuck?" is not getting flagged, neither is "What the *fuck*".

arcfour1mo ago

It is exceedingly obvious that the goal here is to catch at least 75-80% of negative sentiment and not to be exhaustive and pedantic and think of every possible way someone could express themselves.

Zamaamiro1mo ago

Classic over-engineering. Their approach is just fine 90% of the time for the use case it’s intended for.

orphea1mo ago

75-80% [1], 90%, 99% [2]. In other words, no one has any idea.

I doubt it's anywhere that high because even if you don't write anything fancy and simply capitalize the first word like you'd normally do at the beginning of a sentence, the regex won't flag it.

Anyway, I don't really care, might just as well be 99.99%. This is not a hill I'm going to die on :P

[1]: https://news.ycombinator.com/item?id=47587286

[2]: https://news.ycombinator.com/item?id=47586932

1 more reply

morkalork1mo ago

Except that it's a list of English keywords. Swearing at the computer is the one thing I'll hear devs switch back to their native language for constantly

vntok1mo ago

They evidently ran a statistical analysis and determined that virtually no one uses those phrases as a quick retort to a model's unsatisfying answer... so they don't need to optimize for them.

codegladiator1mo ago

what you are suggesting would be like a truck company using trucks to move things within the truck

argee1mo ago

That’s what they do. Ever heard of a hand truck?

eadler1mo ago

I never knew the name of that device.

Thanks

freedomben1mo ago

Depending on the region you live in, it's also frequently called a "dolly"

1 more reply

istoleabread1mo ago

Do we have a hand llm perchance?

svnt1mo ago

Yeah it’s called a regex. With a lot of human assistance it can do less but fits in smaller spaces and doesn’t break down.

1 more reply

floralhangnail1mo ago

Well, regex doesn't hallucinate....right?

raw_anon_11111mo ago

I just went to expertSexChange.com…

geon1mo ago

buttbuttination

mmh00001mo ago

The Clbuttical problem[1]

[1] https://en.wikipedia.org/wiki/Scunthorpe_problem

draxil1mo ago

Good to have more than a hammer in your toolbox!

lazysheepherd1mo ago

Because they are engineers? The difference between an engineer and a hobbyist is an engineer has to optimize the cost.

As they say: any idiot can build a bridge that stands, only an engineer can build a bridge that barely stands.

raw_anon_11111mo ago

Cloud hosted call centers using LLMs is one of my specialties. While I use an LLM for more nuanced sentiment analysis, I definitely use a list of keywords as a first level filter.

nitekode1mo ago

A lot if things dont make sense until you involve scale. Regex could be good enough do give a general gist.

j451mo ago

Asking a non deterministic software to act like a deterministic one (regex) can be a significantly higher use of tokens/compute for no benefit.

Some things will be much better with inference, others won’t be.

__alexs1mo ago

Using some ML to derive a sentiment regex seems like a good actually?

ldobre1mo ago

It's more like a truck company using people to transport some parts. I could be wrong here, but I bet this happens in Volvo's fabrics a lot.

throwaw121mo ago

because impact of WTF might be lost in the result of the analysis if you solely rely on LLM.

parsing WTF with regex also signifies the impact and reduces the noise in metrics

"determinism > non-determinism" when you are analysing the sentiment, why not make some things more deterministic.

Cool thing about this solution, is that you can evaluate LLM sentiment accuracy against regex based approach and analyse discrepancies

irthomasthomas1mo ago

This just proves its vibe coded because LLMs love writing solutions like that. I probably have a hundred examples just like it in my history.

irthomasthomas1mo ago

Actually, this could be a case where its useful. Even it only catches half the complaints, that's still a lot of data, far more than ordinary telemetry used to collect.

makeitrain1mo ago

Don’t worry, they used an llm to generate the regex.

arnarbi1mo ago

It's more like workers on a large oil tanker using bicycles to move around it, rather than trying to use another oil tanker.

ojr1mo ago

I used regexes in a similar way but my implementation was vibecoded, hmmm, using your analysis Claude Code writes code by hand.

harikb1mo ago

Not everything done by claude-code is decided by LLM. They need the wrapper to be deterministic (or one-time generated) code?

intended1mo ago

The amount of trust and safety work that depends on google translate and the humble regex, beggars the imagination.

apgwoz1mo ago

> That's like a truck company using horses to transport parts. Weird choice.

Easy way to claim more “horse power.”

mghackerlady1mo ago

More like a car company transporting their shipments by truck. It's more efficient

pdntspa1mo ago

LLMs cost money, regular expressions are free. It really isn't so strange.

scotty791mo ago

As far as I can tell they do nothing with it. They just log it.

pfortuny1mo ago

They had the problem of sentiment analysis. They use regexes.

You know the drill.

artrockalter1mo ago

LLMs are good at writing complex regex, from my experience

kjshsh1231mo ago

Using regex with LLMs isn't uncommon at all.

slashdave1mo ago

Maybe. Could just be a pre filter.

lou13061mo ago

They're searching for multiple substrings in a single pass, regexes are the optimal solution for that.

noosphr1mo ago

The issue isn't that regex are a solution to find a substring. The issue is that you shouldn't be looking for substrings in the first place.

This has buttbuttin energy. Welcome to the 80s I guess.

lou13061mo ago

> The issue is that you shouldn't be looking for substrings in the first place.

noosphr1mo ago

>Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.

The only time to use a regex is when searching with a human in the loop. All other uses are better handled some other way.

>They could do a full-blown NLP-driven sentiment analysis on every prompt but I reckon it would not be as cost-effective as this.

Every conversation is sent to an llm at least a thousand times the size of gpt2 which could one shot this nearly a decade ago.

1 more reply

8cvor6j844qw_d61mo ago

Very likely vibe coded.

I've seen Claude Code went with a regex approach for a similar sentiment-related task.

mr_00ff001mo ago

My understanding of vibe coding is when someone doesn’t look at the code and just uses prompts until the app “looks and acts” correct.

I doubt you are making regex and not looking at it, even if it was AI generated.

1 more reply

rdiddly1mo ago

Clbuttic!

BoppreHOP1mo ago

It's fast, but it'll miss a ton of cases. This feels like it would be better served by a prompt instruction, or an additional tiny neural network.

And some of the entries are too short and will create false positives. It'll match the word "offset" ("ffs"), for example. EDIT: no it won't, I missed the \b. Still sounds weird to me.

hk__21mo ago

It’s fast and it matches 80% of the cases. There’s no point in overengineering it.

NitpickLawyer1mo ago

> There’s no point in overengineering it.

vharuck1mo ago

The pattern only matches if both ends are word boundaries. So "diffs" won't match, but "Oh, ffs!" will. It's also why they had to use the pattern "shit(ty|tiest)" instead of just "shit".

BoppreHOP1mo ago

You're right, I missed the \b's. Thanks for the correction.

feketegy1mo ago

It's all regex anyways

make31mo ago

it's like a faster than light spaceship company using horses. There's been infinite solutions to do this better even CPU only for years lol.

sumtechguy1mo ago

hmm not a terrible idea (I think).

I could see me totally making a design choice like that.

sfn421mo ago

It's almost as if LLMs are unreliable

j / k navigate · click thread line to collapse