0 points · simonw · 3y ago

I'm going to guess it involves engineering prompts.

Which requires a surprising amount of skill and experience!

I still haven't found a 100% reliable way of getting a LLM to always produce results in JSON for example. See https://twitter.com/genmon/status/1646194992761782278

    State of the art techniques to get GPT to return JSON

    - logic: Responses must match this JSON schema
    - demonstration: For example…
    - appeal to identity: You are a chatbot that speaks perfect JSON
    - cajoling: Remember always return JSON!
    - threat: if you don't I SWEAR I'm gonna–

I wrote more about why I think prompt engineering deserves more respect than it gets here: https://simonwillison.net/2023/Feb/21/in-defense-of-prompt-e...

12 comments · 6 top-level

dpflan · 3y ago · 3 in thread

A glaring problem is the non-determinism of LLMs: the same prompt can produce different answers. I appreciate your blogging and analysis in this space, so I am interested in your response. The non-determinism implies that prompt engineering is brittle and difficult, and that it lacks formal techniques for evaluating correctness.

jaredsohn · 3y ago

Set the temperature to zero to make it more deterministic.
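To make the temperature point concrete, here is a minimal sketch of temperature sampling over a toy list of logits. The function name `sample_token` and the example logits are made up for illustration; this is not any particular library's sampler. At temperature zero it falls back to picking the highest-scoring token, which is why the output stops varying. Hosted models may still show small differences at temperature zero, which fits the "more deterministic" wording above.

```python
import math
import random

def sample_token(logits, temperature, rng=random):
    """Pick a token index from raw logits after temperature scaling (illustrative only)."""
    if temperature <= 0:
        # Greedy decoding: always take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    # Subtract the max before exponentiating for numerical stability.
    weights = [math.exp(s - top) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

toy_logits = [1.0, 3.0, 2.0]  # hypothetical scores for three tokens
greedy_picks = {sample_token(toy_logits, 0) for _ in range(20)}
rng = random.Random(0)
hot_picks = {sample_token(toy_logits, 5.0, rng) for _ in range(200)}
```

With temperature 0 every call returns index 1; with a high temperature the picks spread across several indices.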

simonw (OP) · 3y ago

I agree with everything you've said there.

dpflan · 3y ago

It is certainly an interesting phenomenon, and I wonder what techniques from neuroscience for brain mapping could be used for model "brain mapping", which could lend itself more to prompt engineering as a science (latent space mapping).

AmazingTurtle · 3y ago · 1 in thread

You can use vicuna-7b-1.1. No need for chat prompts. Just slam in your data and end the prompt with the start of the JSON you want, like so:

    Generate a JSON with this and that {"this": "

Lower the temperature to the minimum for deterministic results, and tune the other parameters if needed. Also set a stop token for the JSON closing brace, like so: `}`.

That usually works perfectly fine for me in most scenarios. Best of all, that stuff runs on an RTX 3080 at 15 tokens/s (quite fast!). Also, vicuna-7b is pretty much as good as GPT-3 was when it came out.
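A rough sketch of the prefill-and-stop trick described above, in Python. The `generate` callable is hypothetical: it stands in for whatever runs the model, and is assumed to take a prompt plus a stop string and return the continuation without the stop string. `fake_generate` is a stub so the example runs without a model.

```python
import json

def complete_json(generate, instruction, prefix='{"this": "'):
    """Prefill the prompt with the start of a JSON object, stop at '}', then reassemble."""
    prompt = f"{instruction} {prefix}"
    # Hypothetical model call: returns only the text generated before the stop string.
    continuation = generate(prompt, stop="}")
    # Re-attach the prefilled opening and the closing brace consumed by the stop token.
    text = prefix + continuation + "}"
    return json.loads(text)  # raises json.JSONDecodeError if the model drifted

def fake_generate(prompt, stop):
    # Stub standing in for a real model; returns a canned continuation.
    return 'hello", "that": 42'

result = complete_json(fake_generate, "Generate a JSON with this and that")
```

One caveat with this design: stopping at the first `}` only works for flat objects, so nested JSON would need a different stopping rule.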

arthurcolle · 3y ago

Vicuna-7b is much better than GPT4All, but it still struggles with math. I can't wait until my new work computer comes in; I will try to run the new StableLM models.

bugglebeetle · 3y ago · 1 in thread

> I still haven't found a 100% reliable way of getting a LLM to always produce results in JSON for example.

Have you tried Guardrails?

https://github.com/ShreyaR/guardrails

dpflan · 3y ago

Interesting project…thanks for sharing.

gcr · 3y ago · 1 in thread

why not just adjust the decoder / beam search to not emit any tokens that aren't syntactically valid JSON?

ie. instead of using temperature to sample something from the top k most likely tokens, first exclude all the tokens that cause the output to be malformed. the model can only emit {, ", [, or a number for the first token, for example.

if someone would like a fun project to try this right away, one place to start would be to modify llama.cpp's chat example just before the line that samples tokens [1], going through `lctx.logits` to zero out invalid tokens (or rather, since these are logits, set them to -INFINITY). as a smoke test, fix the first token of the model's output to "{" without any other changes and I bet you'd get something approaching JSON out.

[1]: here's the line to change: https://github.com/ggerganov/llama.cpp/blob/c4fe84fb0d28851a... see the bit on lines 317-319 about how it ignores the end-of-sequence token by zeroing out the probability of sampling it? just like that!
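the masking idea can be sketched in a few lines of Python over a toy vocabulary. this is not llama.cpp code: `mask_logits`, `allowed_first_token`, the vocabulary and the logit values are all made up for illustration, and the rule only covers the first token, as in the comment above.

```python
import math

def mask_logits(logits, vocab, is_allowed):
    """Set the logit of every disallowed token to -inf so it can never be picked."""
    return [
        logit if is_allowed(token) else -math.inf
        for logit, token in zip(logits, vocab)
    ]

def allowed_first_token(token):
    # Toy rule for the first token only: {, [, " or a digit.
    # (Real JSON also permits true/false/null and a leading minus sign.)
    return token[:1] in ("{", "[", '"') or token[:1].isdigit()

vocab = ["Sure", "{", "[", "Here", '"', "42"]   # hypothetical tokens
logits = [5.0, 1.0, 0.5, 4.0, 0.2, 0.1]         # hypothetical model scores
masked = mask_logits(logits, vocab, allowed_first_token)
best = vocab[max(range(len(vocab)), key=lambda i: masked[i])]
```

without the mask the top token here would be the prose opener "Sure"; with it, the top token is "{". a full version would need the allowed set to depend on everything emitted so far, which is where the parsing theory comes in.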

i mean, the most principled approach probably requires some theoretic CS knowledge about regular expression derivatives or parsing machine derivatives, but i'm surprised it isn't more common to just hook into the decoder design a little, given how much we want structured data out of these models

i wish i knew how to voice my ignorant skepticism in a less disparaging way, sorry.... but i feel like a lot of this "legitimization of prompt engineering as a useful trade/practice" thinking assumes that we're trapped in the "magic circle" where the only input we have to the model is picking the prompt and the only possible output is the most likely token. but these are generative models! conditioned on their output, we have our choice about which token to accept, so why not just condition on the distribution of possible JSON output instead of the distribution of possible prose?

i suspect very quickly the most competitive prompt engineers will combine their solid understanding of theoretic machine learning and statistics with a solid understanding of computer science, perhaps even combined with a dash of persuasion / neurolinguistic programming experience. kinda worries me but it's how it is

int_19h · 3y ago

You're basically describing https://github.com/newhouseb/clownfish/, except there it tries to validate the output against JSON schema on the fly.

williamcotton · 3y ago

Hey Simon! I've been digging your writings on LLMs lately.

I've been having some decent luck with some of the approaches that I've discussed in the following articles and projects:

From Prompt Alchemy to Prompt Engineering: An Introduction to Analytic Augmentation: https://github.com/williamcotton/empirical-philosophy/blob/m...

Writing Web Applications with LLMs: https://www.williamcotton.com/articles/writing-web-applicati...

https://github.com/williamcotton/transynthetical-engine

I'd love to hear your thoughts on the matter!

One of the techniques that I've found for reliably returning JSON is... ask for multiple responses and then use one of the responses that successfully parses!
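A minimal sketch of that last technique in Python, assuming the multiple responses have already been collected into a list of strings. The helper name `first_valid_json` and the sample responses are illustrative only.

```python
import json

def first_valid_json(candidates):
    """Return the parsed value of the first candidate that is valid JSON."""
    for text in candidates:
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            continue  # this response did not parse, try the next one
    raise ValueError("no candidate parsed as JSON")

responses = [
    'Sure! Here you go: {"a": 1}',  # chatty preamble, fails to parse
    '{"a": 1}',
    '{"a": 2}',
]
parsed = first_valid_json(responses)
```

This only checks that the text parses, not that it has the expected keys, so a schema check would be a separate step.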

arthurcolle · 3y ago

Hi Simon, your blog posts have been invaluable in my ongoing process of refining a document that covers major concepts in prompt engineering and LLM fine-tuning and I'd love to pick your brain over email or a call if you have any bandwidth!
