I built this to show that we can think about using LLMs more fluidly than just chains and chats, i.e. more interchangeably with regular code, and to make it easy to do that.
Please let me know what you think! Contributions welcome.
See for example Anyscale Endpoints https://app.endpoints.anyscale.com/landing and https://github.com/AmineDiro/cria
You could have more flexibility by abstracting out the underlying LLM APIs, but then you also have a bigger problem to deal with: different APIs support different features, the same conceptual feature is exposed through very different parameter structures, and so on.
This library is impressive, I appreciate it and I will apply it to my project.
E.g., if I gave them this:
def foo(x):
    ... #add your implementation here
def bar(x):
    pass #add your implementation here
I'd get back this:
def foo(x):
    return x+1
def bar(x):
    return x+1
    pass
Love the examples too. Low-effort humor is the best:
> create_superhero("Garden Man")
> # Superhero(name='Garden Man', age=30, power='Control over plants', enemies=['Pollution Man', 'Concrete Woman'])
I haven't downloaded 1.5 yet, but they released this last week: https://www.askmarvin.ai/prompting/prompt_function/
In other words, it happens to work and look neat, but pass is the correct way to do it.
A friend mentioned how terrible most cold email generators are at actually generating natural-feeling emails. It just took asking him questions about how actual people in marketing come up with emails to come up with a chain of thought that produces intentionally uncanny emails for a wide range of inputs: https://rentry.co/54hbz
It's not like you can't technically fit what I described into a bunch of comments (or an obnoxiously long multiline comment), but it'd be bulky and not conducive to the general happiness of anyone involved.
I much prefer repurposing Nunjucks templates to keep all of that in a separate document that's easy to manage with version control.
The approach I'm encouraging with this is to write many functions to achieve your goal. So in the case of your email writing example you might have some of the following prompt-functions:
- write key bullet points for email about xyz -> list[str]
- write email based on bullet points -> str
- generate feedback for email to meet criteria abc -> str
- update email based on feedback -> str
- does email meet all criteria abc -> bool
And between these you could have regular python code check things like blacklist/whitelist of keywords, length of paragraphs, and even add hardcoded strings to the feedback based on these checks.
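A minimal sketch of the regular-Python checks that could sit between those prompt-functions. The names `BLACKLIST`, `MAX_PARAGRAPH_WORDS`, and `rule_based_feedback` are hypothetical, the criteria are made up for illustration, and the LLM calls are left out so the snippet runs on its own:

```python
import re

# Hypothetical criteria; a real project would choose its own.
BLACKLIST = {"synergy", "guaranteed"}
MAX_PARAGRAPH_WORDS = 60


def rule_based_feedback(email: str) -> list[str]:
    """Return hardcoded feedback strings for rule violations in the draft."""
    feedback = []
    # Keyword check: compare the lowercase words in the draft to the blacklist.
    words = set(re.findall(r"[a-z']+", email.lower()))
    for banned in sorted(BLACKLIST & words):
        feedback.append(f"Remove the word '{banned}'.")
    # Length check: paragraphs are separated by blank lines.
    paragraphs = [p for p in email.split("\n\n") if p.strip()]
    for i, paragraph in enumerate(paragraphs, start=1):
        n = len(paragraph.split())
        if n > MAX_PARAGRAPH_WORDS:
            feedback.append(
                f"Paragraph {i} has {n} words; keep it to {MAX_PARAGRAPH_WORDS} or fewer."
            )
    return feedback


draft = "Hi Sam,\n\nOur tool offers guaranteed synergy for your team."
print(rule_based_feedback(draft))
```

The returned strings could be appended to the LLM-generated feedback before the "update email based on feedback" step, and an empty list could be combined with the bool check to decide when to stop.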
Overall your second approach makes for really terrible UX and dramatically weakens the performance at the task unless you go and repeat every single definition along the way: ensuring you now have X copies of the prompt spread across the code base and have blown up your token count.
Once you get to that level of granularity between calls, you've pretty much fallen back into doing a slower, more expensive version of pre-ChatGPT NLP.
I meant we need a new DSL better suited for prompt engineering, and a UI that better supports longer strings. Actually, this UI can be something compatible with Python.
But overall a reimagination of the dev experience is what I am getting at (like Jupyter for LLMs).
Dm me [redacted] on X for more.
EDIT: Just tried using the decorator to output a fairly complex pydantic model and it failed with "magentic.chat_model.openai_chat_model.StructuredOutputError: Failed to parse model output. You may need to update your prompt to encourage the model to return a specific type."
I typically try to give examples in the pydantic Config class; perhaps those could be piped in for some few-shot methods. It could also iterate when the model output is not perfectly parseable, to correct the output syntax.
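A rough sketch of what that iteration could look like. This is not how magentic behaves; `parse_with_retries` and `generate` are hypothetical names, `generate` stands in for any function that sends a prompt to a model and returns text, and plain JSON parsing is used in place of a pydantic model so the snippet needs only the standard library:

```python
import json
from typing import Callable


def parse_with_retries(
    generate: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> dict:
    """Call `generate` until its output parses as a JSON object, feeding errors back."""
    current_prompt = prompt
    for _ in range(max_attempts):
        raw = generate(current_prompt)
        try:
            parsed = json.loads(raw)
            if isinstance(parsed, dict):
                return parsed
            error = "expected a JSON object"
        except json.JSONDecodeError as exc:
            error = str(exc)
        # Include the failed output and the error so the model can correct itself.
        current_prompt = (
            f"{prompt}\n\nPrevious output:\n{raw}\n"
            f"Error: {error}\nReturn corrected JSON only."
        )
    raise ValueError(f"No parseable output after {max_attempts} attempts")


# Stand-in for a model: the first reply is truncated, the second is valid JSON.
replies = iter(['{"name": "Garden Man", ', '{"name": "Garden Man", "age": 30}'])
result = parse_with_retries(lambda _prompt: next(replies), "Create a superhero.")
print(result)
```

Each retry costs another model call, so a small `max_attempts` is probably sensible.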
In the meantime, have a look at the ValidationError traceback which might highlight a specific field that is causing the issue. Some options to resolve the issue might be: the type for this field could be made more lenient (e.g. str); the `Annotated` type hint could be used to give the field a description to help correct the error [0]; the field could be removed. You could also try using gpt-4 by setting the env var MAGENTIC_OPENAI_MODEL [1].
If none of these help resolve it or it appears to be an issue with magentic itself please file a github issue with an example. Comments on how to improve error messages and debugging are also welcome! Thanks for trying it out.
[0] https://docs.pydantic.dev/latest/concepts/fields/#using-anno...
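To illustrate the `Annotated` suggestion above, here is a small standard-library sketch of how a description rides along with a type hint. The `Superhero` dataclass is a hypothetical stand-in; with pydantic the metadata would typically be `Field(description=...)` on a `BaseModel` field rather than a bare string:

```python
from dataclasses import dataclass
from typing import Annotated, get_type_hints


@dataclass
class Superhero:
    name: str
    # The string metadata is a plain description attached to the type hint;
    # with pydantic it would typically be Field(description=...) instead.
    power: Annotated[str, "One short phrase, e.g. 'Control over plants'"]


# include_extras=True keeps the Annotated metadata instead of stripping it.
hints = get_type_hints(Superhero, include_extras=True)
power_description = hints["power"].__metadata__[0]
print(power_description)
```

The underlying type is still `str`, so the description adds guidance without changing what values are accepted.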
Could you highlight how you're parsing to structured objects and how it can fail? Ever since I discovered guidance's method of pattern guides I've been wanting this more and more (only works for local hugging face models though). Wish OpenAI offered a similar API.
1) Can you get the actual code output or will this end up calling OpenAI each function call? 2) What latency does it add? What about token usage? 3) Is the functionality deterministic?
2) I haven't measured additional latency but it should be negligible in comparison to the speed of generation of the LLM. And since it makes it easy to use streaming and async functions you might be able to achieve much faster generation speeds overall - see the Async section in the README. Token usage should also be a negligible change from calling the OpenAI API directly - the only "prompting" magentic does currently is in naming the functions sent to OpenAI, all other input tokens are written by the user. A user switching from explicitly defining the output schema in the prompt to using function-calling via magentic might actually save a few tokens.
3) Functionality is not deterministic, even with `temperature=0`, but since we're working with python functions one option is to just add the `@cache` decorator. This would save you tokens and time when calling the same prompt-function with the same inputs.
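A minimal sketch of the `@cache` idea, using a plain function as a stand-in for a prompt-function so it runs without an API key. `describe` and `call_count` are hypothetical names; the counter only exists to show when the function body actually runs:

```python
from functools import cache

call_count = 0


@cache
def describe(topic: str) -> str:
    """Stand-in for a prompt-function; a real one would call the LLM here."""
    global call_count
    call_count += 1
    return f"A sentence about {topic} (generation #{call_count})"


first = describe("gardening")
second = describe("gardening")  # served from the cache, body does not run again
third = describe("concrete")    # new input, so the body runs a second time
print(first == second, call_count)
```

Note that `functools.cache` requires hashable arguments, so a prompt-function taking a `list` would need a `tuple` instead, and the cache lives only as long as the process.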
---
1) https://github.com/jackmpcollins/magentic#usage
2) https://github.com/jackmpcollins/magentic#asyncio
3) https://docs.python.org/3/library/functools.html#functools.c...
from magentic import prompt
@prompt("Find out if {programs} are correct.")
def do_they_work(programs: list) -> bool:
    ...
I just pushed it to production. Dashboard is all green. See you when I get back from vacation!
Then I kept on reading, and I have to admit that the object creation with LLMs is really amazing!
It's pretty fun, but I've found that having the LLM write code is what I actually want most of the time.
I’ll definitely try to apply it in one of my pet projects. Good stuff
Tempted to have a go at porting these ideas. Should be v doable with the macro system.
The prompting you do seems awfully similar to lambdaprompt [1].
Also, here's some discussion about this style of prompting and ways of working with LLMs from a while ago [2].
[1] https://github.com/approximatelabs/lambdaprompt/
[2] https://news.ycombinator.com/context?id=34422917