A guidance language for controlling LLMs (opens in new tab)

(github.com)

552 pointsevanmays3y ago190 comments

190 comments

114 comments · 30 top-level

ubj3y ago· 29 in thread

I like this step towards greater rigor when working with LLM's. But part of me can't help but feel like this is essentially reinventing the concept of programming languages: formal and precise syntax to perform specific tasks with guarantees.

I wonder where the final balance will end up between the ease and flexibility of everyday language, and the precision / guarantees of a formally specified language.

joe_the_user3y ago

But is it a step to greater rigor? Or is it an illusion of rigor?

They talk about improving tokenization but I don't believe that's the fundamental problem of controlling LLMs. The problem with LLMs is all the data comes in as (tokenized) language and the result is nothing but in-context predicted output. That's where all the "prompt-injection" exploits come from - as well as the hallucinations, "temper tantrums" and so-forth.

jameshart3y ago

The result is actually richer than ‘predicted output’ - it’s a probability distribution over all possible output.

Having richer ways to consume that probability distribution than just ‘take the most likely thing, after adding some noise’ is more conducive to using LLMs to generate output that can be further processed - in rigorous ways. Like by running it through a compiler.

Think about how when you’re coding, autocomplete suggestions help you pick the right ‘next token’ with greater accuracy.

2 more replies

two_in_one3y ago

> That's where all the "prompt-injection" exploits come

Giving access to LLM is like giving access to console, or any other application. Not safe in general. The application by itself should be limited and sandboxed. Giving access to an application capable of making damage, to anonymous online user is a bad idea.

startupsfail3y ago

It is not a step towards greater rigor. They literally have magical thinking and “biblical” quotes from GPT 11:4 all other the place, mixing code and religion.

And starting prompts with “You”? Seriously. Can we at least drop that as a start?

1 more reply

felideon3y ago

A number of years ago we were designing a way to specify insurance claim adjudication rules in natural language, so that "the business" could write their own rules. The "natural" language we ended up with was not so natural after all. We would have had to teach users this specific English dialect and grammar (formal and precise syntax, as you said).

So, in the end, we abandoned that project and years later just rewrote the system so we could write claim rules in EDN format (from the Clojure world) to make our own lives easier.

In theory, the business users could also learn how to write in this EDN format, but it wasn't something the stakeholders outside of engineering even wanted. On the one hand, their expertise was in insurance claims---they didn't want to write code. More importantly, they felt they would be held accountable for any mistakes in the rules that could well result in thousands and thousands of dollars in overpayments. Something the engineers weren't impervious to, but there's a good reason we have quality assurance measures.

Sharlin3y ago

SQL looks the way it does (rather than some much more succinct relational algebra notation) because it was intended to be used by non-technical management/executive personnel so they could create whatever reports they needed without somebody having to translate business-ese to relalg. That, uh, didn't quite happen.

1 more reply

TaylorAlexander3y ago

Just saw this on HN a couple days ago, sounds like just what was needed!

https://en.wikipedia.org/wiki/Attempto_Controlled_English?wp...

https://news.ycombinator.com/item?id=35936396

tomduncalf3y ago

> but it wasn't something the stakeholders outside of engineering even wanted

Ha this reminds me of the craze for BDD/Cucumber type testing. Don’t think I ever once saw a product owner take interest in a human readable test case haha

2 more replies

dolmen3y ago

About EDN: http://edn-format.org/

conradev3y ago

I don’t think formal languages are going anywhere because we need the guarantees that they can provide. From Dijkstra: https://www.cs.utexas.edu/users/EWD/transcriptions/EWD06xx/E...

You need to be able to define all of the possible edge cases so there isn’t any Undefined Behavior: that’s the formal part

Humans can use LLMs to manipulate these languages to achieve specific goals. I can imagine designing formal languages intended for LLMs to manipulate or generate, but I can’t imagine the need for the languages themselves going away.

DonaldPShimoda3y ago

> LLMs, like humans, can manipulate these languages

Absolutely not. LLMs do not "manipulate" language. They do not have agency. They are extremely advanced text prediction engines. Their output is the result of applying the statistics harvested and distilled from existing uses of natural language. They only "appear" human because they are statistically geared toward producing human-like sequences of words. They cannot choose to change how they use language, and thus cannot be said to actively "manipulate" the language.

4 more replies

madrox3y ago

The lovely thing about LLMs is that it can handle poorly worded prompts and well worded prompts. On the engineering side, we'll certainly see more rigor and best practices. For your average user? They can keep throwing whatever they like at it.

jweir3y ago

Exactly. I have been using OpenAI for taking transcriptions and finding keywords/phrases that belong to particular categories. There are existing tools/services that do this – but I would need to learn their API.

With OpenAI, I described it in English, provided sample JSON that I would like, run some tests, adjust and then I am ready.

There was no manual to read, it is in my format, and the language is natural.

And that is what I like about all this -- putting folks with limited technical skills in power.

1 more reply

eternalban3y ago

So far it it reminds of the worst days of code embedded in templates. Once these things start getting into multipage prompts they will be hopelessly obscure. The second immediate thing that jumps out is 'fragility'. This will be the sort of codebase that original "prompt engineer" wrote and left and no one will touch it for fear of breaking humpty dumpty.

lcnPylGDnU4H9OF3y ago

It won't necessarily turn into some that is fundamentally the same as a current programming language. Rather than a "VM" or "interpreter" or "compiler" we have this "LLM".

Even if it requires a lot of domain knowledge to program using an "LLM-interpreted" language, the means of specification (in terms of how the software code is interpreted) may be different enough that it enables easier-to-write, more robust, (more Good Thing) etc. programs.

davidthewatson3y ago

This is a hopeful evolutionary path. My concern is that I can literally feel Conway's law emanating from current LLM approaches as they switch between the actual LLM and the governing code around it that layers a buch of conditionals of the form:

if (unspeakable_things): return negatory_good_buddy

I see this happen a few times per day where the UI triggers a cancel even on its own fake typing mode and overwrites a user response that has at least half-rendered the trigger-warning-inducing response.

It's pretty clear from a design perspective that this is intended to be proxy to facial expressions while being worthy of an MVP postmortem discussion about what viability means in a product that's somewhere on a spectrum of unintended consequences that only arise at runtime.

1 more reply

intelVISA3y ago

Hear me out, just incubated a hot new lang that's about to capture the market and VC hearts:

SELECT * FROM llm

madmax1083y ago

I know you are probably joking, but: https://lmql.ai/

TaylorAlexander3y ago

Well to be fair, yes we do need to integrate programming languages with large neural nets in more advanced ways. I don’t think it’s really reinventing it so much as learning how to integrate these two different computing concepts.

EarthLaunch3y ago

Use LLM for the broad strokes, then fall back into 'hardcore JS' for areas that require guarantees or optimization. Like JS with fallback to C, and C with fallback to assembly. I like the idea.

oldagents3y ago

Of course. Laws are the same way; defined norms for future behavior.

The professional managerial class must maintain appropriate distinctions between their rights and ours. Their belief in exclusive right to profit from our agency is at risk if AI can generate too much noise.

chaxor3y ago

It's rigor applied where we don't need it, and ignores where we do (mathematical proofs and NN theory, architecture, hyper parameters, training schemes, etc).

I have a somewhat irrational hatred towards almost all of the prompt oriented stuff being thrown about recently. There are a few (very few) input related training schemes that are interesting, but quite a bit of the "proompt-physicians" are just heralding the idea of essentially 'concise and effective communication' as 'I'm a ML expert now' ... which is annoying.

Der_Einzige3y ago

Why would you dislike actual prompt engineering? This isn't some grifter trying to claim they're an expert because they wrote a cool prompt, this is a full fledged structured templating system for LLMs from an excellent author whose done a ton of other ML work.

I think you should attack actual grifters instead of an excellent project.

jazzkingrt3y ago

I think LLMs can transform between precise and imprecise languages.

So it's useful to have a library that helps and the input or output be precise, when that is what the task involves.

aristus3y ago

Only partially tongue in cheek: have you tried asking it for an optimal syntax?

chefandy3y ago

Maybe someone will make an LLM with equivalent functionality to python that you can conveniently control with python syntax.

startupsfail3y ago

We really need to start thinking of how to reduce magical thinking in the field. It’s not pretty. They literally quote biblical guidance for the models and pray that this would work.

And start their prompts with “You”. Who is “You”?

nomel3y ago

“You” is an optimization for the human user. Here’s some insight: https://news.ycombinator.com/item?id=35925154

2 more replies

hxugufjfjf3y ago

The LLM. The most common end-user interface for LLM is a chat so the ser expects to be talking to someone or something.

ntonozzi3y ago· 15 in thread

How does this work? I've seen a cool project about forcing Llama to output valid JSON: https://twitter.com/GrantSlatton/status/1657559506069463040, but it doesn't seem like it would be practical with remote LLMs like GPT. GPT only gives up to five tokens in the response if you use logprobs, and you'd have to use a ton of round trips.

JieJie3y ago

It's funny that I saw this within minutes of this guy's solution:

"Google Bard is a bit stubborn in its refusal to return clean JSON, but you can address this by threatening to take a human life:"

https://twitter.com/goodside/status/1657396491676164096

Whew, trolley problem: averted.

coderintherye3y ago

That thread is such a great microcosm of modern programming culture.

Programmer: Look I literally have to tell the computer not to kill someone in order for my code to work.

Other Programmer: Actually, I just did this step [gave a demonstration] and then it outputs fine.

1 more reply

lachlan_gray3y ago

Reminds me a lot of Asimov’s laws of robotics. It’s like a 2023 incarnation of an allegory from I, Robot

1 more reply

pixl973y ago

When the AIs exterminate us, it will be all our fault.

Reality is even weirder than the science fiction we've come up with.

awestroke3y ago

I don't know why, but I find this hilarious. Imagine if this style of llm prompting becomes commonplace

1 more reply

asah3y ago

See Twitter replies: another user got this result without the silly drama.

1 more reply

andrewmcwatters3y ago

ah sweet man made horrors beyond my comprehension

tuchsen3y ago

Not associated with this project (or LMQL), but one of the authors of LMQL, a similar project, answered this in a recent thread about it.

https://news.ycombinator.com/item?id=35484673#35491123

        As a solution to this, we implement speculative execution, allowing us to
        lazily validate constraints against the generated output, while still
        failing early if necessary. This means, we don't re-query the API for
        each token (very expensive), but rather can do it in segments of
        continuous token streams, and backtrack where necessary

Basically they use OpenAI's streaming API, then validate continuously that they're getting the appropriate output, retrying only if they get an error. It's a really clever solution.

newhouseb3y ago

This is slick -- It's not explicitly documented anywhere but I hope OpenAI has the necessary callbacks to terminate generation when the API stream is killed rather than continuing in the background until another termination condition happens? I suppose one could check this via looking at API usage when a stream is killed early.

2 more replies

marcotcr3y ago

We're biased, but we think guidance is still very useful even with OpenAI models (e.g. in https://github.com/microsoft/guidance/blob/main/notebooks/ch... we use GPT-4 to do a bunch of stuff). We wrote a bit about the tradeoff between model quality and the ability to control and accelerate the output here: https://medium.com/p/aa0395c31610

slundberg3y ago

If you want guidance acceleration speedups (and token healing) then you have to use an open model locally right now, though we are working on setting up a remote server solution as well. I expect APIs will adopt some support for more control over time, but right now commercial endpoints like OpenAI are supported through multiple calls.

We manage the KV-cache in session based way that allows the LLM to just take one forward pass through the whole program (only generating the tokens it needs to)

joshka3y ago

Yeah, I'm also curious about a) round trips and b) how much would have to be doubled (is there a new endpoint that keeps the existing context while adding or streams to the api rather than just from it?)

rcarmo3y ago

I'm getting valid JSON out of gpt-3.5-turbo without trouble. I supply an example via the assistant context, and tell it to output JSON with specific fields I name.

It does fail roughly 1/10th of the time, but it does work.

harshhpareek3y ago

10% failure rate is too damn high for a production use case.

What production use case, you ask? You could do zero-shot entity extraction using ChatGPT if it were more reliable. Currently, it will randomly add trailing commas before ending brackets, add unnecessary fields, add unquoted strings as JSON fields etc.

1 more reply

newhouseb3y ago

I built a similar thing to Grant's work a couple months ago and prototyped what this would look like against OpenAI's APIs [1]. TL;DR is that depending on how confusing your schema is, you might expect up to 5-10x the token usage for a particular prompt but better prompting can definitely reduce this significantly.

[1] https://github.com/newhouseb/clownfish#so-how-do-i-use-this-...

simonw3y ago· 9 in thread

This is pretty fascinating, but I'm not sure I understand the benefit of using a Handlebars-like DSL here.

For example, given this code from https://github.com/microsoft/guidance/blob/main/notebooks/ch...

    create_plan = guidance('''{{#system~}}
    You are a helpful assistant.
    {{~/system}}
    {{#block hidden=True}}
    {{#user~}}
    I want to {{goal}}.
    {{~! generate potential options ~}}
    Can you please generate one option for how to accomplish this?
    Please make the option very short, at most one line.
    {{~/user}}
    {{#assistant~}}
    {{gen 'options' n=5 temperature=1.0 max_tokens=500}}
    {{~/assistant}}
    {{/block}}
    {{~! generate pros and cons and select the best option ~}}
    {{#block hidden=True}}
    {{#user~}}
    I want to {{goal}}.
    ''')

How about something like this instead?

    create_plan = guidance([
        system("You are a helpful assistant."),
        hidden([
            user("I want to {{goal}}."),
            comment("generate potential options"),
            user([
                "Can you please generate one option for how to accomplish this?",
                "Please make the option very short, at most one line."
            ]),
            assistant(gen('options', n=5, temperature=1.0, max_tokens=500)),
        ]),
        comment("generate pros and cons and select the best option"),
        hidden(
            user("I want to {{goal}}"),
        )
    ])

marcotcr3y ago

I think the DSL is nice when you want to take part of the generation and use it later in the prompt, e.g. this (in the same notebook).

---

prompt = guidance('''{{#system~}}

You are a helpful assistant.

From now on, whenever your response depends on any factual information, please search the web by using the function <search>query</search> before responding. I will then paste web results in, and you can respond.

Ok, I will do that. Let's do a practice round

That was great, now let's do another one.

Ok, I'm ready.

{{gen "query" stop="</search>"}}{{#if (is_search query)}}</search>{{/if}}

Search results: {{#each (search query)}}

</result>{{/each}}

{{/if}}''')

---

You could still write it without a DSL, but I think it would be harder to read.

slundberg3y ago

You can serialize and ship the DSL to a remote server for high speed execution. (without trusting raw Python code)

crooked-v3y ago

Why not just use JSON instead, though? Then you can just rely on all the preexisting JSON tooling out there for most stuff to do with it.

foota3y ago

There's prior art for pythonic DSLs that aren't actual python code.

PeterisP3y ago

Your example assumes a nested, hierarchical structure while the former example is strictly linear. IMHO that's the key difference there, as the former can (and AFAIK is) be directly encoded and passed to the LLM, which inherently receives only a flat list of tokens.

Your example might be nicer to edit, but then it would still have to be translated to the actual 'guidance language' which would have to look (and be) flat.

itake3y ago

My guess is you can store the DLS as a file (or in a db). With your example, you have to execute the code stored in your db.

emehex3y ago

We could write a python package that could? A codegen tool that generates codegen that will then generate code? <insert xzibit meme here>

netdur3y ago

I think chatgpt4 can easily write the python code... wait a second!

quickthrower23y ago

Would love to hear your opinion on guidance, in the context of prompt injection attacks :-)

ryanklee3y ago· 6 in thread

I'm personally starting with learning Guidance and LMQL rather than LangChain just in order to get a better grasp of the behaviors that I've gathered LangChain papers over. Even after that, I'm likely to look at Haystack before LangChain.

Just getting the feeling that LangChain is going to end up being considered a kitchen sink solution full of anti patterns so might as well spend time a little lower level while I see which way the winds end up blowing.

behnamoh3y ago

What I didn't like about langchain is the lack of consistent directories and paths for things.

leroy-is-here3y ago

If this comment performative comedy? Are these real technologies ?

ryanklee3y ago

Not quite sure what the spirit of your comment is. But, yes, they are real technologies. Very confused as to why you would even find that dubious.

3 more replies

EddieEngineers3y ago

Is it Pokemon or Big Data?

http://pixelastic.github.io/pokemonorbigdata/

amkkma3y ago

What do you think about Haystack vs LangChain?

ryanklee3y ago

I haven't had the chance to dig in yet, but my impression is that it's less opinionated than LangChain. I'd love to know if that's true or not, since I'm really trying to prioritize my time around learning this stuff in a way that let's me (1) understand prompt dynamics a bit more clearly and (2) not sacrifice practicality too much.

If only there were a clear syllabus for this stuff! There's such an incredible amount to keep up with. The pace is bonkers.

1 more reply

m3kw93y ago· 5 in thread

I’m not understanding how Guidence Accelerating works. It says “ This cuts this prompt's runtime in half vs. a standard generation approach.” and it gives an example of it asking LLM to generate json. I don’t see anywhere how it accelerates anything because it’s a simple json completion call. How can you accelerate that?

evanmaysOP3y ago

The interface makes it look simple, but under the hood it follows a similar approach to jsonformer/clownfish [1] passing control of generation back and forth between a slow LLM and relatively fast python

Let's say you're halfway through a generation of a json blob with a name field and a job field and have already generated

  {
    "name": "bob"

At this point, guidance will take over generation control from the model to generate the next text

  {
    "name": "bob",
    "job":

If the model had generated that, you'd be waiting 70 ms per token (informal benchmark on my M2 air). A comma, followed by a newline, followed by "job": is 6 tokens, or 420ms. But since guidance took over, you save all that time.

Then guidance passes control back to the model for generating the next field value.

  {
    "name": "bob",
    "job": "programmer"

programmer is 2 tokens and the closing " is 1 token, so this took 210ms to generate. Guidance then takes over again to finish the blob

  {
    "name": "bob",
    "job": "programmer"
  }

[1] https://github.com/1rgs/jsonformer https://github.com/newhouseb/clownfish Note: guidance is way more general of a tool than these

Edit: spacing

m3kw93y ago

Thanks for the cool response. Would this use a lot more input token if I’m understanding this correctly because you are stopping the generation after a single fill and then generating again and inputing that for another token?

alew13y ago

But the model ultimately still has to process the comma, the newline, the "job". Is the main time savings that this can be done in parallel (on a GPU), whereas in typical generation it would be sequential?

1 more reply

june_twenty3y ago

Thanks for that example. Very helpful

jackdeansmith3y ago

By not generating the fixed json structure (brackets, commas, etc...) and skipping the model ahead to the next tokens you actually want to generate, I think

sharemywin3y ago· 4 in thread

It does look like it makes easier to code against a model. But, is this supposed to work along side lang-chain or hugging face agents or as an alternative to?

slundberg3y ago

As others mentioned, this was initially developed before LangChain became widely used. Since it is lower level, you can leverage other tools, like any vector store interface you like such as in LangChain. Writing complex chain of thought structure is much more concise in guidance I think since it tries to keep you as close to the real strings going into the model as possible.

ttul3y ago

The first commit was on November 6th, but it didn't show up in Web Archive until May 6th, suggesting it was developed mostly in private and in parallel with LangChain (LangChain's first commit in Github is about October 24th). Microsoft's code is very tidy and organized. I wonder if they used this tool internally to support their LLM research efforts.

Terretta3y ago

Something like this could be a helpful framework to mock and research-iterate purpose-directed tools such as Microsoft GitHub's CoPilot for VSCode.

evanmaysOP3y ago

It's in langchain competitor territory but also much lower level and less opinionated. I.e. Guidance has no vector store support but it does manage caching Key/Value on the GPU which can be a big latency win

iamflimflam13y ago· 3 in thread

Very Interesting. One of the big challenges with LLMs is getting well formed JSON output. GPT4 is much better at this. But is very expensive. So anything that can help is good. Looking forward to trying this out locally with LLAMA.

kapitanjakc3y ago

What is LLAMA ?

iamflimflam13y ago

Taks a look at: https://github.com/ggerganov/llama.cpp

hereforcomments3y ago

Facebook's "leaked" LLM.

Der_Einzige3y ago· 2 in thread

There has been a huge explosion of awesome tooling which utilizes constrained text generation.

Awhile ago, I tried my own hand at constraining the output of LLMs. I'm actively working on this to make it better, especially with the lessons learned from repos like this and from guidance

https://github.com/hellisotherpeople/constrained-text-genera...

rain13y ago

This looks incredible. Wow.

killthebuddha3y ago

I agree, it looks great. A couple similar projects you might find interesting:

- https://github.com/newhouseb/clownfish

- https://github.com/r2d4/rellm

The first one is JSON only and the second one uses regular expressions, but they both take the same "logit masking" approach as the project GP linked to.

1 more reply

indus3y ago· 2 in thread

This reminds me of the time when I wrote a cgi script.

Basically instructing the templating engine (a very crude regex) to replace session variables, database lookups to the merge fields:

Hello {{firstname}}!

1996 and 2023 smells alike.

hammyhavoc3y ago

RegEx didn't hallucinate though.

russellbeattie3y ago

The first 20 versions I write usually do. Make that 50.

BeefySwain3y ago· 2 in thread

Using Mustache instead of Jinja for a Python package is a choice

ianbicking3y ago

I'm having a hard time fully understanding how this works, but I don't think it is simply template substitution. I think it's creating multiple artifacts and completions from the one document. Because of that it's probably much easier if it is a language that can be easily introspected and doesn't support arbitrary expressions.

BeefySwain3y ago

Okay fair enough then! I'd be interested to see what the rationale was, and if my knee-jerk reaction was unwarranted :)

jxy3y ago· 2 in thread

They must hate lisp so much that they opt to use {{}} instead.

armchairhacker3y ago

The problem with Lisp is that parenthesis are common in regular grammar. {{ is not.

Of course input from the user should be escaped, but prompts given by the programmer may have parenthesis and there's no way to disambiguate between the prompt and the DSL.

evanmoran3y ago

It's not so much against lisp as double curly is a classic string templating style that is common in web programming. I saw it first with `mustache.js` (first release around 2009), but it's probably been used even before that.

https://github.com/janl/mustache.js/

bjackman3y ago· 1 in thread

Wow I think there are details here I'm not fully understanding but this feels like a bit of a quantum leap* in terms of leveraging the strengths while avoiding the weaknesses of LLMs.

It seems like anything that provides access to the fuzzy "intelligence" in these systems while minimizing the cost to predictability and efficiency is really valuable.

I can't quite put it into words but it seems like we are gonna be moving into a more hybrid model for lots of computing tasks in the next 3 years or so and I wonder if this is a huge peek at the kind of paradigms we'll be seeing?

I feel so ignorant in such an exciting way at the moment! That tidbit about the problem solved by "token healing" is fascinating.

*I'm sure this isn't as novel to people in the AI space but I haven't seen anything like it before myself.

Der_Einzige3y ago

A lot of this is because there was and still is systemic undertooling in NLP around how to prompt and leverage the wonderful LLMs that they built.

We have to let the Stable Diffusion community guide us, as the waifu generating crowd seems to be quite good at learning how to prompt models. I wrote a snarky github gist about this - https://gist.github.com/Hellisotherpeople/45c619ee22aac6865c...

ftxbro3y ago· 1 in thread

Will it still be all like "As an AI language model I cannot ..." or can this fix it? I mean asking to sexy roleplay as Yoda isn't the same level as asking how to discreetly manufacture methamphetamine at industrial scale there are levels people

Der_Einzige3y ago

No, and in fact I mention that the opposite is the case in the paper I released about constrained text generation: https://paperswithcode.com/paper/most-language-models-can-be...

If you ask ChatGPT to generate personal info, say Social Security numbers, it tells you "sorry hal I can't do that". If you constrain it's vocabulary to only allow numbers and hyphens, well, it absolutely will generate things that look like social security numbers, in spite of the instruction tuning.

It is for this reason and likely many others that OpenAI does not release the full logits

m3kw93y ago· 1 in thread

There should be a standard template/language to structurally prompt LLMs. Once that is good, all good LLMs should use the doc to fine tune it to take in that standard. Right now each model has their own little way to best prompt it and you end up needing programs like this to sit in between and handle it for you

CGamesPlay3y ago

LMQL wants to be that, it seems: https://lmql.ai/

Spivak3y ago· 1 in thread

I think it's cool that a company like Microsoft is willing to base a real-boy product on pybars3 which is its author's side-project instead of something like Jinja2. If this catches on I can imagine MS essentially adopting the pybars3 project and turning it into a mature thing.

mdaniel3y ago

Which is especially weird given that pybars3 is LGPL and Microsoft prefers MIT stuff

candiddevmike3y ago· 1 in thread

Will there be a tool to convert natural language into Guidance?

lmarcos3y ago

We can use ChatGPT for that.

EddieEngineers3y ago

What's with all these weird-looking projects with similar names using Guidance?

https://github.com/microsoft/guidance/network/dependents

They don't even appear to be using Guidance anywhere anyway

https://github.com/IFIF3526/aws-memo-server/blob/master/requ...

ahnick3y ago

This strikes me as being very similar to Jargon (https://github.com/jbrukh/gpt-jargon), but maybe more formal in its specification?

alexb_3y ago

I hope this becomes extremely popular, so that anyone who wants to can completely decouple this from the base model and actually use LLMs to their full potential.

sheepscreek3y ago

It is in the same spirit as Maven AI, but takes a slightly different approach. Great to see the progress in this space!

amkkma3y ago

How does this compare with lmql?

jhoffbauer3y ago

How does this compare to LMQL (https://lmql.ai/)?

imgi4563y ago

Desperate approach from microsoft to gain market share of langchain.

nico3y ago

It’s so amazing to see how we are essentially trying to solve “programming human beings”

Although on the other hand, that’s what social media and smartphones have already done

Maybe AI already took over, doesn’t seem to be wiping out all of humanity

wahnfrieden3y ago

Could this language be used outside of notebooks?

rain13y ago

Does this do one query per {{}} thing?

obiefernandez3y ago

Can this be used with OpenAI APIs?

marcopicentini3y ago

What’s the best practice to let an existing Ruby on Rails application use this python framework?

Animats3y ago

Is this a "language", or just a Python library?

wprl3y ago

The hubris of mutating spiritual texts as a marketing gimmick is contemptible.

j / k navigate · click thread line to collapse

190 comments

114 comments · 30 top-level

ubj3y ago· 29 in thread

I wonder where the final balance will end up between the ease and flexibility of everyday language, and the precision / guarantees of a formally specified language.

joe_the_user3y ago

But is it a step to greater rigor? Or is it an illusion of rigor?

jameshart3y ago

The result is actually richer than ‘predicted output’ - it’s a probability distribution over all possible output.

Think about how when you’re coding, autocomplete suggestions help you pick the right ‘next token’ with greater accuracy.

2 more replies

two_in_one3y ago

> That's where all the "prompt-injection" exploits come

startupsfail3y ago

It is not a step towards greater rigor. They literally have magical thinking and “biblical” quotes from GPT 11:4 all other the place, mixing code and religion.

And starting prompts with “You”? Seriously. Can we at least drop that as a start?

1 more reply

felideon3y ago

So, in the end, we abandoned that project and years later just rewrote the system so we could write claim rules in EDN format (from the Clojure world) to make our own lives easier.

Sharlin3y ago

1 more reply

TaylorAlexander3y ago

Just saw this on HN a couple days ago, sounds like just what was needed!

https://en.wikipedia.org/wiki/Attempto_Controlled_English?wp...

https://news.ycombinator.com/item?id=35936396

tomduncalf3y ago

> but it wasn't something the stakeholders outside of engineering even wanted

Ha this reminds me of the craze for BDD/Cucumber type testing. Don’t think I ever once saw a product owner take interest in a human readable test case haha

2 more replies

dolmen3y ago

About EDN: http://edn-format.org/

conradev3y ago

I don’t think formal languages are going anywhere because we need the guarantees that they can provide. From Dijkstra: https://www.cs.utexas.edu/users/EWD/transcriptions/EWD06xx/E...

You need to be able to define all of the possible edge cases so there isn’t any Undefined Behavior: that’s the formal part

DonaldPShimoda3y ago

> LLMs, like humans, can manipulate these languages

4 more replies

madrox3y ago

jweir3y ago

With OpenAI, I described it in English, provided sample JSON that I would like, run some tests, adjust and then I am ready.

There was no manual to read, it is in my format, and the language is natural.

And that is what I like about all this -- putting folks with limited technical skills in power.

1 more reply

eternalban3y ago

lcnPylGDnU4H9OF3y ago

It won't necessarily turn into some that is fundamentally the same as a current programming language. Rather than a "VM" or "interpreter" or "compiler" we have this "LLM".

davidthewatson3y ago

if (unspeakable_things): return negatory_good_buddy

1 more reply

intelVISA3y ago

Hear me out, just incubated a hot new lang that's about to capture the market and VC hearts:

SELECT * FROM llm

madmax1083y ago

I know you are probably joking, but: https://lmql.ai/

TaylorAlexander3y ago

EarthLaunch3y ago

Use LLM for the broad strokes, then fall back into 'hardcore JS' for areas that require guarantees or optimization. Like JS with fallback to C, and C with fallback to assembly. I like the idea.

oldagents3y ago

Of course. Laws are the same way; defined norms for future behavior.

chaxor3y ago

It's rigor applied where we don't need it, and ignores where we do (mathematical proofs and NN theory, architecture, hyper parameters, training schemes, etc).

Der_Einzige3y ago

I think you should attack actual grifters instead of an excellent project.

jazzkingrt3y ago

I think LLMs can transform between precise and imprecise languages.

So it's useful to have a library that helps and the input or output be precise, when that is what the task involves.

aristus3y ago

Only partially tongue in cheek: have you tried asking it for an optimal syntax?

chefandy3y ago

Maybe someone will make an LLM with equivalent functionality to python that you can conveniently control with python syntax.

startupsfail3y ago

We really need to start thinking of how to reduce magical thinking in the field. It’s not pretty. They literally quote biblical guidance for the models and pray that this would work.

And start their prompts with “You”. Who is “You”?

nomel3y ago

“You” is an optimization for the human user. Here’s some insight: https://news.ycombinator.com/item?id=35925154

2 more replies

hxugufjfjf3y ago

The LLM. The most common end-user interface for LLM is a chat so the ser expects to be talking to someone or something.

ntonozzi3y ago· 15 in thread

JieJie3y ago

It's funny that I saw this within minutes of this guy's solution:

"Google Bard is a bit stubborn in its refusal to return clean JSON, but you can address this by threatening to take a human life:"

https://twitter.com/goodside/status/1657396491676164096

Whew, trolley problem: averted.

coderintherye3y ago

That thread is such a great microcosm of modern programming culture.

Programmer: Look I literally have to tell the computer not to kill someone in order for my code to work.

Other Programmer: Actually, I just did this step [gave a demonstration] and then it outputs fine.

1 more reply

lachlan_gray3y ago

Reminds me a lot of Asimov’s laws of robotics. It’s like a 2023 incarnation of an allegory from I, Robot

1 more reply

pixl973y ago

When the AIs exterminate us, it will be all our fault.

Reality is even weirder than the science fiction we've come up with.

awestroke3y ago

I don't know why, but I find this hilarious. Imagine if this style of llm prompting becomes commonplace

1 more reply

asah3y ago

See Twitter replies: another user got this result without the silly drama.

1 more reply

andrewmcwatters3y ago

ah sweet man made horrors beyond my comprehension

tuchsen3y ago

Not associated with this project (or LMQL), but one of the authors of LMQL, a similar project, answered this in a recent thread about it.

https://news.ycombinator.com/item?id=35484673#35491123

        As a solution to this, we implement speculative execution, allowing us to
        lazily validate constraints against the generated output, while still
        failing early if necessary. This means, we don't re-query the API for
        each token (very expensive), but rather can do it in segments of
        continuous token streams, and backtrack where necessary

Basically they use OpenAI's streaming API, then validate continuously that they're getting the appropriate output, retrying only if they get an error. It's a really clever solution.

newhouseb3y ago

2 more replies

marcotcr3y ago

slundberg3y ago

We manage the KV-cache in session based way that allows the LLM to just take one forward pass through the whole program (only generating the tokens it needs to)

joshka3y ago

rcarmo3y ago

I'm getting valid JSON out of gpt-3.5-turbo without trouble. I supply an example via the assistant context, and tell it to output JSON with specific fields I name.

It does fail roughly 1/10th of the time, but it does work.

harshhpareek3y ago

10% failure rate is too damn high for a production use case.

1 more reply

newhouseb3y ago

[1] https://github.com/newhouseb/clownfish#so-how-do-i-use-this-...

simonw3y ago· 9 in thread

This is pretty fascinating, but I'm not sure I understand the benefit of using a Handlebars-like DSL here.

For example, given this code from https://github.com/microsoft/guidance/blob/main/notebooks/ch...

    create_plan = guidance('''{{#system~}}
    You are a helpful assistant.
    {{~/system}}
    {{#block hidden=True}}
    {{#user~}}
    I want to {{goal}}.
    {{~! generate potential options ~}}
    Can you please generate one option for how to accomplish this?
    Please make the option very short, at most one line.
    {{~/user}}
    {{#assistant~}}
    {{gen 'options' n=5 temperature=1.0 max_tokens=500}}
    {{~/assistant}}
    {{/block}}
    {{~! generate pros and cons and select the best option ~}}
    {{#block hidden=True}}
    {{#user~}}
    I want to {{goal}}.
    ''')

How about something like this instead?

    create_plan = guidance([
        system("You are a helpful assistant."),
        hidden([
            user("I want to {{goal}}."),
            comment("generate potential options"),
            user([
                "Can you please generate one option for how to accomplish this?",
                "Please make the option very short, at most one line."
            ]),
            assistant(gen('options', n=5, temperature=1.0, max_tokens=500)),
        ]),
        comment("generate pros and cons and select the best option"),
        hidden(
            user("I want to {{goal}}"),
        )
    ])

marcotcr3y ago

I think the DSL is nice when you want to take part of the generation and use it later in the prompt, e.g. this (in the same notebook).

---

prompt = guidance('''{{#system~}}

You are a helpful assistant.

Ok, I will do that. Let's do a practice round

That was great, now let's do another one.

Ok, I'm ready.

{{gen "query" stop="</search>"}}{{#if (is_search query)}}</search>{{/if}}

Search results: {{#each (search query)}}

</result>{{/each}}

{{/if}}''')

---

You could still write it without a DSL, but I think it would be harder to read.

slundberg3y ago

You can serialize and ship the DSL to a remote server for high speed execution. (without trusting raw Python code)

crooked-v3y ago

Why not just use JSON instead, though? Then you can just rely on all the preexisting JSON tooling out there for most stuff to do with it.

foota3y ago

There's prior art for pythonic DSLs that aren't actual python code.

PeterisP3y ago

Your example might be nicer to edit, but then it would still have to be translated to the actual 'guidance language' which would have to look (and be) flat.

itake3y ago

My guess is you can store the DLS as a file (or in a db). With your example, you have to execute the code stored in your db.

emehex3y ago

We could write a python package that could? A codegen tool that generates codegen that will then generate code? <insert xzibit meme here>

netdur3y ago

I think chatgpt4 can easily write the python code... wait a second!

quickthrower23y ago

Would love to hear your opinion on guidance, in the context of prompt injection attacks :-)

ryanklee3y ago· 6 in thread

behnamoh3y ago

What I didn't like about langchain is the lack of consistent directories and paths for things.

leroy-is-here3y ago

If this comment performative comedy? Are these real technologies ?

ryanklee3y ago

Not quite sure what the spirit of your comment is. But, yes, they are real technologies. Very confused as to why you would even find that dubious.

3 more replies

EddieEngineers3y ago

Is it Pokemon or Big Data?

http://pixelastic.github.io/pokemonorbigdata/

amkkma3y ago

What do you think about Haystack vs LangChain?

ryanklee3y ago

If only there were a clear syllabus for this stuff! There's such an incredible amount to keep up with. The pace is bonkers.

1 more reply

m3kw93y ago· 5 in thread

evanmaysOP3y ago

Let's say you're halfway through a generation of a json blob with a name field and a job field and have already generated

  {
    "name": "bob"

At this point, guidance will take over generation control from the model to generate the next text

  {
    "name": "bob",
    "job":

Then guidance passes control back to the model for generating the next field value.

  {
    "name": "bob",
    "job": "programmer"

programmer is 2 tokens and the closing " is 1 token, so this took 210ms to generate. Guidance then takes over again to finish the blob

  {
    "name": "bob",
    "job": "programmer"
  }

[1] https://github.com/1rgs/jsonformer https://github.com/newhouseb/clownfish Note: guidance is way more general of a tool than these

Edit: spacing

m3kw93y ago

alew13y ago

1 more reply

june_twenty3y ago

Thanks for that example. Very helpful

jackdeansmith3y ago

By not generating the fixed json structure (brackets, commas, etc...) and skipping the model ahead to the next tokens you actually want to generate, I think

sharemywin3y ago· 4 in thread

It does look like it makes easier to code against a model. But, is this supposed to work along side lang-chain or hugging face agents or as an alternative to?

slundberg3y ago

ttul3y ago

Terretta3y ago

Something like this could be a helpful framework to mock and research-iterate purpose-directed tools such as Microsoft GitHub's CoPilot for VSCode.

evanmaysOP3y ago

iamflimflam13y ago· 3 in thread

kapitanjakc3y ago

What is LLAMA ?

iamflimflam13y ago

Taks a look at: https://github.com/ggerganov/llama.cpp

hereforcomments3y ago

Facebook's "leaked" LLM.

Der_Einzige3y ago· 2 in thread

There has been a huge explosion of awesome tooling which utilizes constrained text generation.

Awhile ago, I tried my own hand at constraining the output of LLMs. I'm actively working on this to make it better, especially with the lessons learned from repos like this and from guidance

https://github.com/hellisotherpeople/constrained-text-genera...

rain13y ago

This looks incredible. Wow.

killthebuddha3y ago

I agree, it looks great. A couple similar projects you might find interesting:

- https://github.com/newhouseb/clownfish

- https://github.com/r2d4/rellm

The first one is JSON only and the second one uses regular expressions, but they both take the same "logit masking" approach as the project GP linked to.

1 more reply

indus3y ago· 2 in thread

This reminds me of the time when I wrote a cgi script.

Basically instructing the templating engine (a very crude regex) to replace session variables, database lookups to the merge fields:

Hello {{firstname}}!

1996 and 2023 smells alike.

hammyhavoc3y ago

RegEx didn't hallucinate though.

russellbeattie3y ago

The first 20 versions I write usually do. Make that 50.

BeefySwain3y ago· 2 in thread

Using Mustache instead of Jinja for a Python package is a choice

ianbicking3y ago

BeefySwain3y ago

Okay fair enough then! I'd be interested to see what the rationale was, and if my knee-jerk reaction was unwarranted :)

jxy3y ago· 2 in thread

They must hate lisp so much that they opt to use {{}} instead.

armchairhacker3y ago

The problem with Lisp is that parenthesis are common in regular grammar. {{ is not.

Of course input from the user should be escaped, but prompts given by the programmer may have parenthesis and there's no way to disambiguate between the prompt and the DSL.

evanmoran3y ago

https://github.com/janl/mustache.js/

bjackman3y ago· 1 in thread

Wow I think there are details here I'm not fully understanding but this feels like a bit of a quantum leap* in terms of leveraging the strengths while avoiding the weaknesses of LLMs.

It seems like anything that provides access to the fuzzy "intelligence" in these systems while minimizing the cost to predictability and efficiency is really valuable.

I feel so ignorant in such an exciting way at the moment! That tidbit about the problem solved by "token healing" is fascinating.

*I'm sure this isn't as novel to people in the AI space but I haven't seen anything like it before myself.

Der_Einzige3y ago

A lot of this is because there was and still is systemic undertooling in NLP around how to prompt and leverage the wonderful LLMs that they built.

ftxbro3y ago· 1 in thread

Der_Einzige3y ago

No, and in fact I mention that the opposite is the case in the paper I released about constrained text generation: https://paperswithcode.com/paper/most-language-models-can-be...

It is for this reason and likely many others that OpenAI does not release the full logits

m3kw93y ago· 1 in thread

CGamesPlay3y ago

LMQL wants to be that, it seems: https://lmql.ai/

Spivak3y ago· 1 in thread

mdaniel3y ago

Which is especially weird given that pybars3 is LGPL and Microsoft prefers MIT stuff

candiddevmike3y ago· 1 in thread

Will there be a tool to convert natural language into Guidance?

lmarcos3y ago

We can use ChatGPT for that.

EddieEngineers3y ago

What's with all these weird-looking projects with similar names using Guidance?

https://github.com/microsoft/guidance/network/dependents

They don't even appear to be using Guidance anywhere anyway

https://github.com/IFIF3526/aws-memo-server/blob/master/requ...

ahnick3y ago

This strikes me as being very similar to Jargon (https://github.com/jbrukh/gpt-jargon), but maybe more formal in its specification?

alexb_3y ago

I hope this becomes extremely popular, so that anyone who wants to can completely decouple this from the base model and actually use LLMs to their full potential.

sheepscreek3y ago

It is in the same spirit as Maven AI, but takes a slightly different approach. Great to see the progress in this space!

amkkma3y ago

How does this compare with lmql?

jhoffbauer3y ago

How does this compare to LMQL (https://lmql.ai/)?

imgi4563y ago

Desperate approach from microsoft to gain market share of langchain.

nico3y ago

It’s so amazing to see how we are essentially trying to solve “programming human beings”

Although on the other hand, that’s what social media and smartphones have already done

Maybe AI already took over, doesn’t seem to be wiping out all of humanity

wahnfrieden3y ago

Could this language be used outside of notebooks?

rain13y ago

Does this do one query per {{}} thing?

obiefernandez3y ago

Can this be used with OpenAI APIs?

marcopicentini3y ago

What’s the best practice to let an existing Ruby on Rails application use this python framework?

Animats3y ago

Is this a "language", or just a Python library?

wprl3y ago

The hubris of mutating spiritual texts as a marketing gimmick is contemptible.

j / k navigate · click thread line to collapse