Correct me if I'm wrong, but you can pick which dated GPT model snapshot to use and expect it not to act as a continually changing black box. I've been using the API for a long time and have always been able to pin the version.
For example: gpt-4-0314, gpt-3.5-turbo-0613, and so on.
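As a minimal sketch of what pinning looks like in practice: the dated snapshot goes in the `model` field of the chat completions request, while a bare alias like `gpt-4` can be re-pointed at newer snapshots over time. The code below only builds the request payload (actually sending it needs an API key and an HTTP client); the prompt text is made up for illustration.

```python
import json

def build_request(prompt, model="gpt-4-0314"):
    # "model" selects a frozen, dated snapshot; an undated alias
    # like "gpt-4" may silently move to newer snapshots.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_request("Summarize this ticket.", model="gpt-3.5-turbo-0613")
print(json.dumps(payload, indent=2))
```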
The latency issue is definitely real. Ideally the extra cost could be limited to a small percentage of hard cases, which you first have to identify.
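One hypothetical way to identify those hard cases is a cascade: run a cheap local heuristic first and escalate to the slow, expensive model only when the heuristic is unsure. Everything here is a stand-in for illustration; `classify_cheap` fakes a cheap classifier with a keyword lookup, and the escalation branch is where a real call to the pinned model would go.

```python
def classify_cheap(text):
    """Stand-in cheap classifier: returns (label, confidence)."""
    positive = {"great", "good", "love"}
    negative = {"bad", "terrible", "hate"}
    words = set(text.lower().split())
    pos, neg = len(words & positive), len(words & negative)
    if pos > neg:
        return "positive", 0.9
    if neg > pos:
        return "negative", 0.9
    return "neutral", 0.3  # low confidence -> treat as a hard case

def route(text, threshold=0.8):
    label, conf = classify_cheap(text)
    if conf >= threshold:
        return label, "cheap"
    # Hard case: in practice, call the pinned GPT snapshot here
    # and accept its latency/cost for this small fraction of inputs.
    return "needs-llm", "expensive"
```

The point is that only inputs falling below the confidence threshold pay the latency and cost of the big model.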