Show HN: Magentic – Use LLMs as simple Python functions

(github.com)

283 points · jackmpcollins · 2y ago · 63 comments

This is a Python package that allows you to write function signatures to define LLM queries. This makes it easy to mix regular code with calls to LLMs, which enables you to use the LLM for its creativity and reasoning while also enforcing structure/logic as necessary. LLM output is parsed for you according to the return type annotation of the function, including complex return types such as streaming an array of structured objects.
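To make the idea concrete, here is a minimal, self-contained sketch of the pattern described above. It does not use magentic itself: `toy_prompt` and `fake_llm` are hypothetical names invented for illustration, the "model" is a stub that returns canned text, and parsing is reduced to "plain string or JSON". It only shows how a template plus a function signature and return annotation can stand in for a query.

```python
import functools
import inspect
import json
from typing import Callable, get_type_hints


def fake_llm(prompt_text: str) -> str:
    """Stand-in for a real model call; always returns the same canned JSON."""
    return '["Pollution Man", "Concrete Woman"]'


def toy_prompt(template: str, model: Callable[[str], str] = fake_llm):
    """Hypothetical decorator: fill the template from the call arguments,
    then parse the model's text according to the return annotation."""

    def decorator(func):
        signature = inspect.signature(func)
        return_type = get_type_hints(func).get("return", str)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            raw = model(template.format(**bound.arguments))
            # Plain strings pass through; anything else is treated as JSON here.
            return raw if return_type is str else json.loads(raw)

        return wrapper

    return decorator


@toy_prompt("List two enemies of the superhero {name}.")
def list_enemies(name: str) -> list[str]:
    ...  # never executed: the decorator supplies the behaviour


print(list_enemies("Garden Man"))  # ['Pollution Man', 'Concrete Woman']
```

A real implementation would also have to validate the parsed value against the annotation and handle streaming, which this sketch leaves out.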

I built this to show that we can think about using LLMs more fluidly than just chains and chats, i.e. more interchangeably with regular code, and to make it easy to do that.

Please let me know what you think! Contributions welcome.

https://github.com/jackmpcollins/magentic

59 comments · 28 top-level

ElectricalUnion2y ago· 5 in thread

Is it really LLMs (plural) when you only have OpenAI integration?

jackmpcollinsOP2y ago

Right now it just works with OpenAI chat models (gpt-3.5-turbo, gpt-4) but if there's interest I plan to extend it to have several backends. These would probably each be an existing library that implements generating structured output like https://github.com/outlines-dev/outlines or https://github.com/guidance-ai/guidance. If you have ideas how this should be done let me know - on a github issue would be great to make it visible to others.

jackmpcollinsOP2y ago

Oh, and some companies offer APIs that match the OpenAI API and there are some open-source projects that do this for llama running locally. Since those would be compatible with the openai python package they will work with magentic too - though some of these do not support function calling.

See for example Anyscale Endpoints https://app.endpoints.anyscale.com/landing and https://github.com/AmineDiro/cria

AmazingTurtle2y ago

I tried out guidance. Encountered endless bugs

msikora2y ago

OpenAI offers a few different LLMs :)

dragonwriter2y ago

text-generation-webui offers an OpenAI API implementation, specifically to support OpenAI API clients, so you can get something more than just OpenAI support by just wrapping the OpenAI API.

You could have more flexibility by abstracting out the underlying LLM APIs, but then you also have more to deal with: features that only some APIs support, the same conceptual feature exposed through very different parameter structures, and so on.

hitori2y ago· 4 in thread

I am amazed that `...` is valid syntax in Python, not pseudo-grammar.

This library is impressive, I appreciate it and I will apply it to my project.

joelthelion2y ago

What's the difference between '...' and the more common 'pass'?

hoosieree2y ago

I find students correctly infer what to do with "..." whereas they were afraid to touch "pass".

E.g., if I gave them this:

    def foo(x):
      ...  #add your implementation here
    
    def bar(x):
      pass #add your implementation here

I'd get back this:

    def foo(x):
      return x+1
      
    def bar(x):
      return x+1
      pass

inpaner2y ago

In code, using ... implies that the code is yet to be written. pass means it's explicitly a noop.

jwestbury2y ago

In this case, functionally, nothing. Some other commenters have suggested it does something interesting by implying "AI will provide the logic," whereas "pass" doesn't necessarily do that.
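The "functionally, nothing" answer can be checked directly with the standard interpreter. In the sketch below (function names are made up for illustration), `...` is an expression statement that evaluates the `Ellipsis` singleton and discards it, while `pass` is a statement that does nothing; either way the body falls through and the function returns `None`.

```python
def with_ellipsis(x):
    ...  # evaluates the Ellipsis object and throws it away


def with_pass(x):
    pass  # explicit no-op statement


print(with_ellipsis(1), with_pass(1))  # None None
print(... is Ellipsis)                 # True
print(type(...).__name__)              # ellipsis
```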

jstarfish2y ago· 3 in thread

This looks really useful. Langchain is not my idea of a fun time.

Love the examples too. Low-effort humor is the best:

> create_superhero("Garden Man")

> # Superhero(name='Garden Man', age=30, power='Control over plants', enemies=['Pollution Man', 'Concrete Woman'])

brandall102y ago

FWIW, at my last company we had a section in the developer guide encouraging using humor in tests - not only did it make them more fun to write, but it engaged the readership better.

phatskat2y ago

I’ve been integrating humor into our unit tests for a bit now and have gotten feedback from a few engineers who really seem to appreciate it.

cosmonoot2y ago

Would check out https://www.askmarvin.ai/ if you're into this.

I haven't downloaded 1.5 yet, but they released this last week: https://www.askmarvin.ai/prompting/prompt_function/

smilingemoji2y ago· 3 in thread

The API looks very clean. Today I learned about "..." in Python

quickthrower22y ago

It is just a noop, but here it looks very appropriate/readable because it reads as saying "AI will fill this in".

politelemon2y ago

It's a misuse of the Python Ellipsis, though PEP has no opinion on it. The Ellipsis is "Special value used mostly in conjunction with extended slicing syntax for user-defined container data types."

In other words, it happens to work and look neat, but pass is the correct way to do it.
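For reference, the slicing use that the quoted definition refers to looks roughly like this. `Grid` is a hypothetical container written only to echo back whatever index it receives, so the sketch shows what Python passes to `__getitem__` when `...` appears inside square brackets.

```python
class Grid:
    """Hypothetical container that simply returns the index it was given."""

    def __getitem__(self, index):
        return index


grid = Grid()
print(grid[..., 0])    # (Ellipsis, 0)
print(grid[1:3, ...])  # (slice(1, 3, None), Ellipsis)
```

Outside of slicing, `...` also appears as a placeholder body in type stub (`.pyi`) files and in annotations such as `tuple[int, ...]`, so its use as a "body goes elsewhere" marker is at least an established convention.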

nodesocket2y ago

The same as “pass”?

jedberg2y ago· 3 in thread

Curious as to why you chose to do it as a decorator instead of just a function call?

jackmpcollinsOP2y ago

I found this was the most compact way to represent what I wanted to define, and makes it easy to keep the type hints for parameters. If you look inside `@prompt` it's creating a `PromptFunction` instance which I think would be a similar API to what you would end up with without using decorators https://github.com/jackmpcollins/magentic/blob/afdb22513385b...

3abiton2y ago

I never got on board with decorators in Python, but you sold me on it.

dragonwriter2y ago

Looking at the code, it looks like it is a way to support typing; just making it a function with the string template would let you return a dynamically-defined function but I think would make it harder to get static typing.

BoorishBears2y ago· 2 in thread

I've personally found frameworks like this to get in the way of quality CoT: it's rare for a prompt that takes great advantage of the LLM's reasoning to fit in the format these generators encourage.

A friend mentioned how terrible most cold email generators are at actually generating natural feeling emails. It just took asking him questions about how actual people in marketing come up with emails to come up with a chain of thought that produces intentionally uncanny emails for a wide range of inputs: https://rentry.co/54hbz

It's not like you can't technically fit what I described into a bunch of comments (or an obnoxiously long multiline comment), but it'd be bulky and not conducive to the general happiness of anyone involved.

I much prefer repurposing Nunjucks templates to keep all of that in a separate document that's easy to manage with version control.

jackmpcollinsOP2y ago

With magentic you could do chain-of-thought in two or more steps: one function that generates a string output containing the chain-of-thought reasoning and answer, and a second that takes that output and converts it to the final answer object. I agree though that this is not encouraged or made obvious by the framework.

The approach I'm encouraging with this is to write many functions to achieve your goal. So in the case of your email writing example you might have some of the following prompt-functions:

- write key bullet points for email about xyz -> list[str]
- write email based on bullet points -> str
- generate feedback for email to meet criteria abc -> str
- update email based on feedback -> str
- does email meet all criteria abc -> bool

And between these you could have regular python code check things like blacklist/whitelist of keywords, length of paragraphs, and even add hardcoded strings to the feedback based on these checks.
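The "many small functions with regular code in between" approach can be sketched like this. Everything here is hypothetical: the three `stub_*` functions stand in for prompt-functions (in practice each would call an LLM), and `BANNED_WORDS`, the word limit, and `MAX_ROUNDS` are made-up values. The deterministic checks run in plain Python and their hardcoded messages are fed back as feedback.

```python
BANNED_WORDS = {"synergy", "guarantee"}  # hypothetical keyword blacklist
MAX_WORDS = 120                          # hypothetical length limit
MAX_ROUNDS = 3                           # stop revising after a few attempts


def rule_based_feedback(email: str) -> list[str]:
    """Deterministic checks that need no model call."""
    lowered = email.lower()
    feedback = [
        f"Remove the word '{word}'."
        for word in sorted(BANNED_WORDS)
        if word in lowered
    ]
    if len(email.split()) > MAX_WORDS:
        feedback.append(f"Shorten the email to under {MAX_WORDS} words.")
    return feedback


def write_email(topic, write_bullets, draft_email, revise_email) -> str:
    """Chain the prompt-functions, with plain-Python checks between calls."""
    bullets = write_bullets(topic)
    email = draft_email(bullets)
    for _ in range(MAX_ROUNDS):
        feedback = rule_based_feedback(email)
        if not feedback:
            break
        email = revise_email(email, "\n".join(feedback))
    return email


# Stubs standing in for LLM-backed prompt-functions.
def stub_bullets(topic: str) -> list[str]:
    return [f"{topic} saves time", "Free trial available"]


def stub_draft(bullets: list[str]) -> str:
    return "We guarantee synergy: " + "; ".join(bullets)


def stub_revise(email: str, feedback: str) -> str:
    # Crude "revision": drop the offending opening clause.
    return "Hello! " + email.split(": ", 1)[1]


result = write_email("magentic", stub_bullets, stub_draft, stub_revise)
print(result)  # Hello! magentic saves time; Free trial available
```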

BoorishBears2y ago

Why would you add a second function for the answer object when you can return an answer object in the same response as the chain of thought?

Overall your second approach makes for really terrible UX and dramatically weakens the performance at the task unless you go and repeat every single definition along the way: ensuring you now have X copies of the prompt spread across the code base and have blown up your token count.

Once you get to that level of granularity between calls, you've pretty much fallen back into doing a slower, more expensive version of NLP pre-ChatGPT.

quickthrower22y ago· 2 in thread

Does this do System vs. Assistant vs. User prompting?

jackmpcollinsOP2y ago

Right now we just pass a single user prompt to the chat model. Setting the system prompt could also be done in the `@prompt` decorator. I've added a github issue to track https://github.com/jackmpcollins/magentic/issues/31

jackmpcollinsOP2y ago

Update: I've added the ability to add chat messages using a new decorator `@chatprompt` in v0.7.0. See https://github.com/jackmpcollins/magentic/releases/tag/v0.7....

bl00p2y ago· 2 in thread

Are you familiar with https://github.com/PrefectHQ/marvin? This looks very similar

jackmpcollinsOP2y ago

Yes, similar ideas. Marvin [asks the LLM to mimic the python function](https://github.com/PrefectHQ/marvin/blob/f37ad5b15e2e77dd998...), whereas in magentic the function signature just represents the inputs/outputs to the prompt-template/LLM, so the LLM “doesn’t know” that it is pretending to be a python function - you specify all the prompts.

fredoliveira2y ago

(Completely off-topic, but oh how I wish HN supported markdown)

avindroth2y ago· 2 in thread

We need a new language/DSL. Python is a lost cause for strings as first-class.

conor_f2y ago

How so? What disadvantages does having strings as a first class Type have?

cc_ashby2y ago

I expressed myself too succinctly and without context, sorry.

I meant we need a new DSL better suited for prompt engineering, and a UI that better supports longer strings. Actually, this UI can be something compatible with Python.

But overall a reimagination of the dev experience is what I am getting at (like Jupyter for LLMs).

Dm me [redacted] on X for more.

ramraj072y ago· 1 in thread

Awesome job with the simplicity, gonna play with it. Have you tried using YAML as the format with the models instead of JSON? I feel like you'd use far fewer tokens to describe the same thing. Perhaps it's a bit more forgiving as well.

EDIT: Just tried using the decorator to output a fairly complex pydantic model and it failed with "magentic.chat_model.openai_chat_model.StructuredOutputError: Failed to parse model output. You may need to update your prompt to encourage the model to return a specific type."

I typically try to give examples in the pydantic Config class; perhaps those could be piped in for some few-shot methods. There could also be some iteration to correct the output syntax when the model output is not perfectly parseable.

jackmpcollinsOP2y ago

Yes, I'm working on allowing few-shot examples to be provided as part of defining the prompt-function, which should help in cases like this. Unfortunately from my testing just now it appears that OpenAI ignores examples added to the model config.

In the meantime, have a look at the ValidationError traceback which might highlight a specific field that is causing the issue. Some options to resolve the issue might be: the type for this field could be made more lenient (e.g. str); the `Annotated` type hint could be used to give the field a description to help correct the error [0]; the field could be removed. You could also try using gpt-4 by setting the env var MAGENTIC_OPENAI_MODEL [1].

If none of these help resolve it or it appears to be an issue with magentic itself please file a github issue with an example. Comments on how to improve error messages and debugging are also welcome! Thanks for trying it out.

[0] https://docs.pydantic.dev/latest/concepts/fields/#using-anno...

[1] https://github.com/jackmpcollins/magentic#configuration

Difwif2y ago· 1 in thread

Looks great! I don't normally like these LLM libraries but this one sparks joy. I'll try it out on my next experiment.

Could you highlight how you're parsing to structured objects and how it can fail? Ever since I discovered guidance's method of pattern guides I've been wanting this more and more (only works for local hugging face models though). Wish OpenAI offered a similar API.

jackmpcollinsOP2y ago

Thanks! Currently magentic just uses OpenAI function-calling; it provides it a function schema that matches the structure of the output object. So it fails in the same ways as function-calling - struggles to match complex schemas, occasionally returns empty arrays, ...
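As a rough illustration of "a function schema that matches the structure of the output object", the sketch below derives a JSON-Schema-style function definition from a dataclass and then parses a JSON arguments string back into that dataclass. It is not magentic's code: `function_schema`, `parse_arguments`, the `return_<name>` naming, and the `Superhero` fields are assumptions, and only flat objects with primitive fields are handled.

```python
import json
from dataclasses import dataclass
from typing import get_type_hints

# Mapping from Python primitives to JSON Schema type names.
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


@dataclass
class Superhero:
    name: str
    age: int
    power: str


def function_schema(cls) -> dict:
    """Describe `cls` as a function whose parameters are its fields."""
    hints = get_type_hints(cls)
    return {
        "name": f"return_{cls.__name__.lower()}",  # hypothetical naming scheme
        "parameters": {
            "type": "object",
            "properties": {
                field: {"type": JSON_TYPES[tp]} for field, tp in hints.items()
            },
            "required": list(hints),
        },
    }


def parse_arguments(cls, arguments_json: str):
    """Turn the model's JSON arguments into an instance, or raise ValueError."""
    data = json.loads(arguments_json)
    for field, tp in get_type_hints(cls).items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], tp):
            raise ValueError(f"wrong type for field: {field}")
    return cls(**data)


schema = function_schema(Superhero)
hero = parse_arguments(
    Superhero, '{"name": "Garden Man", "age": 30, "power": "Control over plants"}'
)
print(hero)
```

The failure modes mentioned above show up at the `parse_arguments` step: if the model omits a field or returns the wrong type, the only options are to raise, retry, or loosen the schema.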

conor_f2y ago· 1 in thread

Looks super cool! A few questions:

1) Can you get the actual code output or will this end up calling OpenAI each function call?

2) What latency does it add? What about token usage?

3) Is the functionality deterministic?

jackmpcollinsOP2y ago

1) The OpenAI API will be queried each time a "prompt-function" is called in python code. If you provide the `functions` argument in order to use function-calling then magentic will not execute the function the LLM has chosen, instead it returns a `FunctionCall` instance which you can validate before calling.

2) I haven't measured additional latency but it should be negligible in comparison to the speed of generation of the LLM. And since it makes it easy to use streaming and async functions you might be able to achieve much faster generation speeds overall - see the Async section in the README. Token usage should also be a negligible change from calling the OpenAI API directly - the only "prompting" magentic does currently is in naming the functions sent to OpenAI, all other input tokens are written by the user. A user switching from explicitly defining the output schema in the prompt to using function-calling via magentic might actually save a few tokens.

3) Functionality is not deterministic, even with `temperature=0`, but since we're working with python functions one option is to just add the `@cache` decorator. This would save you tokens and time when calling the same prompt-function with the same inputs.

---

1) https://github.com/jackmpcollins/magentic#usage

2) https://github.com/jackmpcollins/magentic#asyncio

3) https://docs.python.org/3/library/functools.html#functools.c...

czyhandsome2y ago· 1 in thread

Do you support custom LLMs?

jackmpcollinsOP2y ago

At the moment only those that support the OpenAI Chat API, with function calling for the structured outputs. For example you can use LocalAI[0][1] to run models locally.

[0] https://github.com/go-skynet/LocalAI

[1] https://localai.io/features/openai-functions/

visarga2y ago· 1 in thread

Write some tests for those functions. It will be worth it. No, I am not kidding, especially for AI we need tests, but we should report accuracy instead of a hard fail/pass.
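One way to read "report accuracy instead of a hard fail/pass" is to score a prompt-function against a small labelled set and assert on a threshold. The sketch below uses a hypothetical keyword-based stub, `stub_is_positive`, in place of an LLM-backed function, and the cases and the 0.7 threshold are invented for illustration.

```python
from typing import Callable, Sequence


def accuracy(
    predict: Callable[[str], bool], cases: Sequence[tuple[str, bool]]
) -> float:
    """Fraction of labelled cases where the prediction matches the label."""
    correct = sum(predict(text) == expected for text, expected in cases)
    return correct / len(cases)


def stub_is_positive(review: str) -> bool:
    """Stand-in for an LLM-backed classifier."""
    lowered = review.lower()
    return "good" in lowered or "great" in lowered


CASES = [
    ("Great library, very clean API", True),
    ("Good docs", True),
    ("Endless bugs", False),
    ("Not good at all", False),  # the stub gets this one wrong
]

score = accuracy(stub_is_positive, CASES)
print(f"accuracy: {score:.2f}")  # accuracy: 0.75

# The test passes as long as accuracy stays above the chosen threshold.
assert score >= 0.7
```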

hoosieree2y ago

No problem, boss.

    @prompt("Find out if {programs} are correct.")
    def do_they_work(programs: list) -> bool:
        ...

I just pushed it to production. Dashboard is all green. See you when I get back from vacation!

denysvitali2y ago

At first I was like: "okay, it's just a decorator to add a prompt when you have str as an input and str as an output."

Then I kept on reading, and I have to admit that the object creation with LLMs is really amazing!

jumploops2y ago

I built a similar package for Typescript[0], with the goal of having type-safe LLM responses automagically.

It's pretty fun, but I've found that having the LLM write code is often what I actually want.

[0] https://github.com/jumploops/magic

js982y ago

Very cool! At first the title reminded me of a project my colleague and I are working on called OpenAI-Functools [1], but your concept is quite the opposite, combining LLMs in your code rather seamlessly instead of the other way around. Quite cool, and interesting examples :)

I’ll definitely try to apply it in one of my pet projects. Good stuff

[1] https://github.com/Jakob-98/openai-functools

te_chris2y ago

This is great. I hacked a smaller version of this together when I built an LLM app with Elixir. Honestly, the async by default of Ex is so much better suited to this stuff, especially as it’s just api calls.

Tempted to have a go at porting these ideas. Should be v doable with the macro system.

bogtog2y ago

Really like how this is implemented with decorators. Everything just feels really smooth

fbnbr2y ago

The comments on great API design got me thinking of a world in which multiple of these frameworks can be orchestrated together. I could see use in adding this to a platform I’m currently building, to overcome some of the issues that LlamaIndex, for example, introduces.

ccmillion2y ago

See also: `antiscope`, an experiment in subjunctive programming

https://github.com/MillionConcepts/antiscope

zainhoda2y ago

Nice! I’m going to try it out and possibly integrate it into my Python package: https://vanna.ai

pphysch2y ago

This is neat. It makes it easy to prototype, and then you can just remove the decorator and write a specific implementation if you need to.

retrovrv2y ago

Super cool! Looks quite intuitive, especially for function calls.

pedrovhb2y ago

Just wanna say, that's pretty great API design :)

cosmonoot2y ago

Seems a lot like https://github.com/PrefectHQ/marvin?

The prompting you do seems awfully similar to:

https://www.askmarvin.ai/prompting/prompt_function/

bluecoconut2y ago

Pretty cool, I made something similar (lambdaprompt[1]), with the same ideal of functions being the best interface for LLMs.

Also, here's some discussion about this style of prompting and ways of working with LLMs from a while ago [2].

[1] https://github.com/approximatelabs/lambdaprompt/

[2] https://news.ycombinator.com/context?id=34422917

lachlan_gray2y ago

This is also similar in spirit to LMQL

https://github.com/eth-sri/lmql
