
0 points · lazysheepherd · 2y ago · 0 comments

Or maybe they are just A/B testing and aggressively optimizing the response generation?

LLMs are known to be compute/energy hungry to execute. It is a developing technology, if not downright experimental.

Therefore, this explanation is very likely. I cannot see any reason to call this a conspiracy.


3 comments · 1 top-level

willsmith72 · 2y ago · 2 in thread

A/B testing on what? A/B tests need to produce some results which are then compared. How would releasing different versions in production help with that?

It would make more sense if that was internal and the responses were then graded.

A failed canary release would be more likely, where they released this version to a small number of people, not realising it was bad.

lazysheepherd (OP) · 2y ago

Off the top of my head: responses have feedback buttons below them.

You can simply deploy different versions and compare the (neutral + positive) / negative feedback ratio.

It would be sinful if they did not add other metrics like how many times the user had to correct and update their prompt before ending the chat, etc.

Data, data, data...
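
A rough sketch of what that comparison could look like, assuming each deployed version just collects counts of positive, neutral and negative feedback clicks. The names here (`FeedbackCounts`, `non_negative_rate`, `two_proportion_z`) and the numbers in the usage lines are made up for illustration; nothing is known about how any provider actually does this. It uses a plain two-proportion z-test on the share of non-negative feedback, which is only one of several reasonable ways to compare two variants.

```python
import math
from dataclasses import dataclass


@dataclass
class FeedbackCounts:
    """Hypothetical feedback tallies for one deployed variant."""
    positive: int
    neutral: int
    negative: int

    @property
    def total(self) -> int:
        return self.positive + self.neutral + self.negative

    @property
    def non_negative(self) -> int:
        return self.positive + self.neutral


def non_negative_rate(c: FeedbackCounts) -> float:
    """Share of feedback that was positive or neutral."""
    return c.non_negative / c.total


def two_proportion_z(a: FeedbackCounts, b: FeedbackCounts) -> float:
    """z statistic for the difference in non-negative rates (b minus a)."""
    pooled = (a.non_negative + b.non_negative) / (a.total + b.total)
    se = math.sqrt(pooled * (1 - pooled) * (1 / a.total + 1 / b.total))
    if se == 0:
        # All feedback identical in both variants: no measurable difference.
        return 0.0
    return (non_negative_rate(b) - non_negative_rate(a)) / se


def two_sided_p_value(z: float) -> float:
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Illustrative numbers only, not real data.
variant_a = FeedbackCounts(positive=400, neutral=500, negative=100)
variant_b = FeedbackCounts(positive=300, neutral=500, negative=200)
z = two_proportion_z(variant_a, variant_b)
print(non_negative_rate(variant_a), non_negative_rate(variant_b), z, two_sided_p_value(z))
```

A strongly negative z with a small p-value would suggest variant B is getting noticeably more negative feedback than A. The same shape works for other metrics, like the fraction of chats where the user had to rewrite their prompt.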

timthelion · 2y ago

There are the thumbs up/down buttons, plus automatic sentiment analysis, as a test.
