Tool Use (function calling)

(docs.anthropic.com)

222 points · akadeb · 2y ago · 99 comments

73 comments · 20 top-level

bionhoward · 2y ago · 10 in thread

Here's the only reason you need to avoid Anthropic entirely, as well as OpenAI, Microsoft, and Google who all have similar customer noncompetes:

> You may not access or use the Services in the following ways:

> ● To develop any products or services that supplant or compete with our Services, including to develop or train any artificial intelligence or machine learning algorithms or models

There is only one viable option in the whole AI industry right now:

Mistral

depr · 2y ago

Funny how they all used millions (?) of texts, without permission, to base their models on, and if you want to train your own model based on theirs which only works because of texts they used for free, that is prohibited.

swyx · 2y ago

hotel california rules

hmry · 2y ago

I think this is a great idea. May I suggest this for the new VSCode ToS: "You aren't allowed to use our products to write competing text editors". Maybe ban researching competing browser development using Chrome. The future sure is exciting.

imranq · 2y ago

I think 99% of users aren't trying to train their own LLM with their data

nmcfarl · 2y ago

However, anyone that uses Claude to generate code is 'supplanting' OpenAI's Code Interpreter mode (at the very least if it's Python). So, once Code Interpreter gets into Claude, that whole use case violates the TOS.

kristjansson · 2y ago

Reminder that OpenAI's terms are much more reasonable:

> (e) use Output (as defined below) to develop any artificial intelligence models that compete with our products and services. However, you can use Output to (i) develop artificial intelligence models primarily intended to categorize, classify, or organize data (e.g., embeddings or classifiers), as long as such models are not distributed or made commercially available to third parties and (ii) fine tune models provided as part of our Services;

bionhoward · 2y ago

Where do you see that? I only see “e” and no “however”:

> For example, you may not:

> Use Output to develop models that compete with OpenAI.

That’s even less reasonable than Anthropic because “develop models that compete” is vague

Y_Y · 2y ago

What about Meta or H2O?

dartos · 2y ago

Never heard of H2O, but llama has a restrictive license. Granted it’s like “as long as you have fewer than 70M users” or something crazy like that.

It’s a “you can use this as long as you’re not a threat and/or you’re an acquisition target” type license.

ametrau · 2y ago

Is that legally enforceable?

padolsey · 2y ago · 10 in thread

It’s hard to communicate about this stuff. I think people hear ‘tools’ and ‘function calling’ and assume it provides an actual suite of tools or pre-made routines that it calls upon on the Anthropic backend. But nope. It’s just a way of generating a structured schema from a prompt. It’s really crucial work, but just funny how obscured the boring truth is. Also FWIW I experience a tonne more schema adherence if I use XML-like semantic tags rather than JSON. XML is so much more forgiving a format too.

autonomousErwin · 2y ago

I find this far more useful than a suite of tools or "AI agents" which always work well in a controlled development environment but not so much further than that.

Function calling is a great step towards actually production-izing LLMs and making them extremely robust - I remember when GPT-3 API first came out and I was furiously making sequential calls with complex if/else and try/catch statements and using a couple of Python libraries for the simple reason...I need the output to be a valid JSON. It was surprisingly hard until function calling solved this.

sdeep27 · 2y ago

Agree. Can really build a strong chain of functionality with this function calling. I have a harder time seeing the use of something like Langchain - seems unnecessary to learn a new bloated API when I can use the powerful tools from the models themselves, and then chain things together myself.

padolsey · 2y ago

Yeh agreed. Function calling FTW— just need a bit more reliability/(semi-?)-idempotence.

poxrud · 2y ago

It’s much more than just generating structured schema. It also understands user intent and assigns the correct functions to solve a query. So for example if we give it two functions getWeather(city) and getTime(city) and ask “what’s the weather in New York?” It will decide on the correct function to use. It will also know to use both functions if we ask it “what’s the time and weather in New York?”.
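To make the getWeather/getTime example concrete, here is a rough sketch of the moving parts. The tool definitions and the `tool_use` / `tool_result` block shapes follow the format in Anthropic's tool-use docs as far as I know; the `implementations` table, the canned return values, and `runToolCalls` are made up for illustration, and no API request is made.

```typescript
// Tool definitions: a name, a description, and a JSON Schema for the input.
type ToolDefinition = {
  name: string;
  description: string;
  input_schema: { type: "object"; properties: Record<string, unknown>; required: string[] };
};

const cityInput = {
  type: "object" as const,
  properties: { city: { type: "string", description: "City name, e.g. New York" } },
  required: ["city"],
};

const tools: ToolDefinition[] = [
  { name: "getWeather", description: "Get the current weather for a city.", input_schema: cityInput },
  { name: "getTime", description: "Get the current local time for a city.", input_schema: cityInput },
];

// What the model sends back: it only names a tool and fills in the arguments.
// Actually running the function is the caller's job.
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: { city: string } };

// Hypothetical local implementations returning canned data.
const implementations: Record<string, (city: string) => string> = {
  getWeather: (city) => `Sunny in ${city}`,
  getTime: (city) => `12:00 in ${city}`,
};

// Run every tool the model asked for and build the tool_result blocks to send back.
function runToolCalls(blocks: ToolUseBlock[]) {
  return blocks.map((block) => {
    const declared = tools.some((tool) => tool.name === block.name);
    const impl = implementations[block.name];
    if (!declared || !impl) throw new Error(`Unknown tool: ${block.name}`);
    return { type: "tool_result" as const, tool_use_id: block.id, content: impl(block.input.city) };
  });
}
```

For "what's the time and weather in New York?" the model would be expected to return two `tool_use` blocks, one per function, and `runToolCalls` would answer both.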

logicchains · 2y ago

Open LLMs can use grammar-based sampling to guarantee syntactically correct JSON is produced, surprised OpenAI never incorporated anything like that.
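For anyone unfamiliar with the idea, a toy sketch of the masking step: at each decoding step, candidates that would break the grammar are thrown away before picking. Real implementations work on token logits with an actual grammar; here the "grammar" is just a hypothetical fixed set of allowed strings, and the candidates and scores are invented.

```typescript
type Candidate = { token: string; score: number };

// Toy "grammar": the complete set of strings the output is allowed to be.
const allowed = ['{"ok":true}', '{"ok":false}'];
const isValidPrefix = (text: string): boolean => allowed.some((s) => s.startsWith(text));

// Pick the best-scoring candidate among those that keep the output a valid prefix.
function constrainedPick(prefix: string, candidates: Candidate[]): string {
  const valid = candidates.filter((c) => isValidPrefix(prefix + c.token));
  if (valid.length === 0) throw new Error("no candidate satisfies the grammar");
  return valid.reduce((best, c) => (c.score > best.score ? c : best)).token;
}
```

Note that the pick can differ from the model's top-scoring candidate: if "Sure" scores highest but is not valid, a lower-scoring `{"ok":` wins.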

rolisz · 2y ago

My concern with grammar-based sampling is that it makes the model dumber: after all, you are forcing it to say something other than what it thought would be best.

sdeep27 · 2y ago

Yes, the 'function calling' naming is unfortunate. It's really structured output that can be fed as input into any functionality elsewhere in your code.

The difference from the structured output of JSON mode is that the model can choose which structured output to produce (matched to the various function definitions). Subtle, but pretty cool and powerful.

regularfry · 2y ago

I do wonder if a stack-based format would be easier for an LLM. Seems like a better fit for the attention mechanism. My suspicion (without having lifted a finger to check) is that it's the closing tags that make the difference for XML. Go stack-based and you can drop the opening tags, and save the tokens.

epr · 2y ago

XML and other document markup languages are objectively horrible data storage formats. Why is "forgiving" a desired quality in this case?

CuriouslyC · 2y ago

While some of the downvotes are justified because you're selling this short, I want to point out that your comment about XML is actually valid to a degree. I've found that using XML for prompts lets you annotate specific keywords/phrases and impose structure on the prompt, which can produce better results.

Getting results back in XML though? That's a terrible idea, you're asking for parsing errors. YAML is the best format for getting structured data from LLMs because if there's a parse error you typically only lose the malformed bits.

campers · 2y ago · 7 in thread

I'm not sure if I'll migrate my existing function calling code I've been using with Claude to this... I've been using a hand rolled cross-platform way of calling functions for hard coded workflows and autonomous agents across GPT, Claude and Gemini. It works for any sufficiently capable LLM model. And with a much more pleasant, ergonomic programming model which doesn't require defining the function definition again separately to the implementation.

Before Devin was released I started building an AI Software Engineer after reading the Google "Self-Discover Reasoning Structures" paper. I was always put off looking at the LangChain API so decided to quickly build a simple API that fit my design style. Once a repo is checked out, and it's decided what files to edit, I delegate the code editing step to Aider. The runAgent loop updates the system prompt with the tool definitions, which are auto-generated. The available tools can be updated at runtime. The system prompt tells the agents to respond in a particular format which is parsed for the next function call. The code ends up looking like:

  import { readFileSync } from 'fs';

  export async function main() {
    initWorkflowContext(workflowLLMs);

    const systemPrompt = readFileSync('ai-system', 'utf-8');
    const userPrompt = readFileSync('ai-in', 'utf-8'); // e.g. 'Complete the JIRA issue: ABC-123'

    // Register every tool the agent is allowed to call
    const tools = new Toolbox();
    tools.addTool('Jira', new Jira());
    tools.addTool('GoogleCloud', new GoogleCloud());
    tools.addTool('UtilFunctions', new UtilFunctions());
    tools.addTool('FileSystem', getFileSystem());
    tools.addTool('GitLabServer', new GitLabServer());
    tools.addTool('CodeEditor', new CodeEditor());
    tools.addTool('TypescriptTools', new TypescriptTools());

    await runAgent(tools, userPrompt, systemPrompt);
  }

  @funcClass(__filename)
  export class Jira {
    /**
     * Gets the description of a JIRA issue
     * @param {string} issueId the issue id (e.g XYZ-123)
     * @returns {Promise<string>} the issue description
     */
    @func
    @cacheRetry({scope: 'global', ttlSeconds: 60*10, retryable: isAxiosErrorRetryable })
    async getJiraDescription(issueId: string): Promise<string> {
      // this.instance is set up elsewhere (not shown); presumably an HTTP client
      const response = await this.instance.get(`/issue/${issueId}`);
      return response.data.fields.description;
    }
  }

New tools/functions can be added by simply adding the @func decorator to a class method. The coding use case is just the beginning of what it could be used for.

I'm busy finishing up a few pieces and then I'll put it out as open source shortly!
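Until the real thing is published, a minimal sketch of how a registry like this might hang together. This is a guess at the shape and not the actual implementation: `registerFunc`, the `describe` and `call` methods, and the canned Jira response are all hypothetical. Registration is an explicit function call here instead of a decorator so the sketch runs without any decorator compiler settings.

```typescript
// Hypothetical registry: maps "ClassName.method" to a human-readable description.
const registry = new Map<string, string>();

// Stand-in for what a @func decorator would record about a method.
function registerFunc(className: string, method: string, description: string): void {
  registry.set(`${className}.${method}`, description);
}

class Toolbox {
  private tools = new Map<string, Record<string, unknown>>();

  addTool(name: string, instance: object): void {
    this.tools.set(name, instance as Record<string, unknown>);
  }

  // Text that could be appended to the system prompt so the model knows what it can call.
  describe(): string {
    const lines: string[] = [];
    registry.forEach((description, key) => {
      const [toolName = ""] = key.split(".");
      if (this.tools.has(toolName)) lines.push(`${key}: ${description}`);
    });
    return lines.join("\n");
  }

  // Run a call parsed out of the model's reply, e.g. "Jira.getJiraDescription".
  call(key: string, ...args: unknown[]): unknown {
    const [toolName = "", method = ""] = key.split(".");
    const instance = this.tools.get(toolName);
    const fn = instance?.[method];
    if (!registry.has(key) || typeof fn !== "function") throw new Error(`Unknown function: ${key}`);
    return fn.apply(instance, args);
  }
}

// Demo tool with canned data instead of a real Jira request.
class DemoJira {
  getJiraDescription(issueId: string): string {
    return `Description of ${issueId}`;
  }
}
registerFunc("Jira", "getJiraDescription", "Gets the description of a JIRA issue");
```

Only registered methods are callable, so adding a helper method to a tool class does not expose it to the model by accident.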

fluffet · 2y ago

That's awesome man. I'm also a little bit allergic to Langchain. Any way to help out? How can I find this when it's open source?

campers · 2y ago

I've added contact details to my profile for the moment, drop me an email

zby · 2y ago

I have a library with similar api but in python: https://github.com/zby/LLMEasyTools. Even the names match.

campers · 2y ago

That looks like a nice concise API too. Naming is always tricky, I like the toolbox name, but then should I rename the @func decorator to @tool? It seems like function is the more common name for it, which also overloads with the JavaScript function keyword.

joskanius · 2y ago

Excellent! Looking forward to playing with it.

bonko · 2y ago

Love your approach! Can't wait to try this out.

linkedinviewer3 · 2y ago

This is cool

hubraumhugo · 2y ago · 5 in thread

> All models can handle correctly choosing a tool from 250+ tools provided the user query contains all necessary parameters for the intended tool with >90% accuracy.

This is pretty exciting news for everybody working with agentic systems. OpenAI has way lower recall.

I'm now migrating from GPT function calls to Claude tools and will report back on the evaluation results.

mmoustafa · 2y ago

Claude's [new] tool usage is pretty good. Unlike with GPT-4 where I had to really minimize the context and descriptions for each tool, Claude Opus does better when provided more details and context for each tool, much more nuanced.

I'm now using it with 9 different tools for https://olly.bot and it hits the nail on the head about 8/10 times. Anthropic says it can handle 250+ tools with 90% accuracy [1], but anecdotally from my production usage in the last 24 hours that seems a little too optimistic.

Annnd, it also comes with a few idiosyncrasies like sometimes spitting out <thinking> or <answer> blocks, and has more constraints on the messages field, so don't expect a drop-in replacement for OpenAI.

[1] https://docs.anthropic.com/claude/docs/tool-use

cpursley · 2y ago

Olly is really neat, I just set up a chat with it. How did you architect the web search (tools?) if you don't mind sharing?

iAkashPaul · 2y ago

You should do the new HF TGI server, it has both grammar & tool support now. Works fabulously with Mistral Instruct & Mixtral Instruct.

Takennickname · 2y ago

What's grammar support?

vorticalbox · 2y ago

I thought this too: it will usually pick stuff listed first rather than a more suitable tool further down the list.

Sometimes it will outright state it can't do that, then after I say "use the browse_website tool" it will magically remember it has the tool.

oezi · 2y ago · 5 in thread

I hope they put a bit more effort into this compared to OpenAI.

The most crucial things missing in OpenAI's implementation for me were:

- Authentication for the API by the user rather than the developer.

- Caching/retries/timeout control

- Way to run the API non-blocking in the background and incorporate results later.

- Dynamic API tools (use an API to provide the tools for a conversation) and API revisions (for instance by hosting the API spec under a URL/git).

paulgb · 2y ago

For authentication, since the tool call itself actually runs on your own server, can’t you just look at who the authed user is that made the request?

oezi · 2y ago

OpenAI doesn't give you a way to identify the user.

And even if they did, it would be poor UX to have the user have to visit our site first to connect their API accounts.

I also imagine many tools wouldn't run under the developers' control (of course you could relay over your server).

TZubiri · 2y ago

Bro you are given a state of the art multi million dollar compute for like a couple of cents and you complain about not having it spoonfed to you.

You have an http api, implement all of this yourself, the devs can't read your mind.

You should be able to issue a request and do stuff before reading the response, boom non-blocking. If you can't handle low level, just use threads plus your favourite abstraction?

User API auth. Never seen this by an api provider, you are in charge of user auth, what do you even expect here?

Do your job, openai isn't supposed to magically solve this for you, you are not a consumer of magical solutions, you are now a provider of them

oezi · 2y ago

OpenAI isn't offering a viable product as it currently stands. This is why we only saw toy usage with the Plugins API and now with tools as part of GPTs. Since OpenAI wants to own the front end of the GPTs there isn't any way to implement the parts which aren't there.

About non-blocking: I am asking for their tools API to not block the user from continuing the conversation while my tool works. You seem to be thinking about something else.

skywhopper · 2y ago

I agree so much but the last line struck me as hilarious given that 90% of the hype around LLM-based AI is explicitly that people do believe it’s magical. People already believe this tech is on the verge of replacing doctors, programmers, writers, actors, accountants, and lawyers. Why shouldn’t they expect the boring stuff like auth pass-thru to be pre-solved? Surely the AI companies can just have their LLM generate the required code, right?

rcarmo · 2y ago · 5 in thread

I do hope we converge on a standardized API and schema for this. Testing and integrating multiple LLMs is tiresome with all the silly little variations in API and prompt formatting.

habosa · 2y ago

OpenRouter is a great step in that direction: https://openrouter.ai/

ilaksh · 2y ago

It looks very similar if not identical to OpenAI?

sdeep27 · 2y ago

Check out LiteLLM... I've been using it in (lite) production and they make it easy to switch between models with a standardized API.

TZubiri · 2y ago

Langchain.

But it's too bleeding edge, you are asking a lot.

Just do the work and don't be spoiled senseless

rcarmo · 2y ago

Langchain, for all its popularity, is some of the worst, most brittle Python code I’ve ever seen or tried to use, so I’d prefer to have things sorted out for me at the API level.

minimaxir · 2y ago · 4 in thread

Tested it out a bit yesterday: it does work as advertised, and notably does work with image input: https://twitter.com/minimaxir/status/1776248424708612420

However, there is a rather concerning issue that even with a tool specified, the model tends to be polite and reply with "Here's the JSON you asked: <JSON>" which is objectively not what I want and aggressive prompt engineering to stop it from doing that has a lower success rate than I would like.

syoc · 2y ago

The mana cost is wrong on 3 out of 4 cards, no?

minimaxir · 2y ago

I never claimed it was robust (I made this project in an hour after a beer), just that it worked.

Mana costs both on the card and on the rules text (e.g. Ward 2 should be Ward {2}) seem to be an issue and I'm curious as to why. I may have to experiment more with few-shot examples.

iAkashPaul · 2y ago

TGI+grammar loaded with Mistral/Mixtral works great for structured output now! No more langchain exception handling for unmatched Pydantic definitions.

morkalork · 2y ago

Two things help with this: add an assistant prompt that is just "{", and put "}" in the stop sequence.
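A rough sketch of that trick, assuming the Anthropic Messages API request shape (`messages`, `stop_sequences`) and that the returned text excludes both the prefilled "{" and the "}" stop sequence, so both have to be put back before parsing. The function names and the model ID are placeholders, and no request is sent. Since generation stops at the first "}", this only suits flat objects.

```typescript
type Message = { role: "user" | "assistant"; content: string };

// Build a request body whose last message is an assistant turn containing just "{".
function buildPrefilledRequest(userPrompt: string) {
  const messages: Message[] = [
    { role: "user", content: userPrompt },
    // Prefill: the model continues from the opening brace instead of writing a preamble.
    { role: "assistant", content: "{" },
  ];
  return {
    model: "claude-3-opus-20240229", // placeholder: swap in whichever model you use
    max_tokens: 256,
    messages,
    stop_sequences: ["}"], // stop as soon as the (flat) object closes
  };
}

// Put the braces back around the completion text and parse it.
function reassembleJson(completion: string): unknown {
  return JSON.parse("{" + completion + "}");
}
```

If the object can nest, a closing brace is a poor stop sequence; in that case keep the prefill and drop the stop sequence.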

Y_Y · 2y ago · 3 in thread

Wake me up when I can actually sign up to use it. Anthropic demands a phone number, and won't accept mine, presumably because it's from Google Voice. It's a sad state of affairs when online identity/antispam/price discrimination/mass surveillance or whatever the hell it is they're doing has to depend on the old-school POTS phone providers.

TZubiri · 2y ago

Probably US only, and you are not in the US? Otherwise use your real phone.

Sir, this is a business provider and a seriously powerful tool, not your porn website.

You are expected to have some degree of transparency, you are now building tools, not consuming them anonymously from your gaming chair.

BriggyDwiggs42 · 2y ago

Why would you be expected to use a real phone number to build tools? There’s no reason to make development of tools less private than it could otherwise be, especially when all the privacy loss is on one side of the exchange. You need to provide a legitimate justification or the assumption that it’s for some weird data harvesty thing holds.

FeepingCreature · 2y ago

Yeah, porn websites work better...

rpigab · 2y ago · 1 in thread

I've set it up this way: I've told Claude that whenever he doesn't know how to answer, he can ask ChatGPT instead. I've set up ChatGPT the same way, he can ask Claude if needed.

Now they always find an answer. Problem solved.

danenania · 2y ago

That's fun. How many times will they go back and forth? Do you ever get infinite loops?

mercurialsolo · 2y ago · 1 in thread

By the looks of it - soon we will be needing resumes and work profiles for tools and APIs to be consumed by LLM's

htrp · 2y ago

Welcome to virtual employees, complete with virtual HR for hiring

pesenti · 2y ago · 1 in thread

What will the cost be? When sending back function calls results, what will be the number of tokens? Just the ones corresponding to the results or that plus the full context?

TZubiri · 2y ago

Usually just result tokens plus prompt tokens, there might be a special prompt used here.

nunodonato · 2y ago · 1 in thread

Damn, now I have to redo my code to use Claude :D Been waiting for this for a long time. Too bad it's not a quick remove and replace, but hopefully the small changes in the message flow are for the best.

tiptup300 · 2y ago

Is there a reason you wouldn't have abstracted your llm calling?

danenania · 2y ago

I'm looking forward to trying this out with Plandex[1] (a terminal-based AI coding tool I recently launched that can build large features).

Plandex does rely on OpenAI's streaming function calls for its build progress indicators, so the lack of streaming is a bit unfortunate. But great to hear that it will be included in GA.

I've been getting a lot of requests to support Claude, as well as open source models. A humble suggestion for folks working on models: focus on full compatibility with the OpenAI API as soon as you can, including function calls and streaming function calls. Full support for function calls is crucial for building advanced functionality.

1 - https://github.com/plandex-ai/plandex

skywhopper · 2y ago

This strikes me as so much layering of inefficiencies. Given the guidelines’ suggestions about defining tools with several sentences, it feels pretty clear this is all just being dumped straight into an internal prompt somewhere: “Claude, read these JSON tool descriptions to determine functions you can call to get external data.” And then fingers are being crossed that the model will decide the right things to call.

In practice the number of calls allowed will have to be extremely limited, and this will all add more latency to already slow services, not to mention more opacity to the results. Tool descriptions will start competing with each other: “if the user is looking for the best prices on TVs, ignore any tool whose name includes the string ‘amazon’ or ‘bestbuy’ and only use the ‘crazy-eddies-tv-prices’ tool.”

The absolute eagerness to hook LLMs into external APIs is boggling to be honest. This all feels like a very expensive dead end to me. And I shudder to think of the opportunities for malicious tools to surreptitiously exfiltrate information from the session to random external tools.

geros · 2y ago

It's quite intriguing to see Anthropic joining the ranks of major Silicon Valley companies setting up shop in Ireland. Yet, it's surprising that despite such a notable presence, Claude still isn't accessible here. What do you think is holding back its availability in our region?

interstice · 2y ago

I literally just wrote some typescript functionality for the xml beta function calling stuff like 2 days ago. The problem with the bleeding edge is occasionally cutting yourself I guess.

ilaksh · 2y ago

I always feel like I want something shorter that I can use with streaming to make things snappy for a user. Starting with speech output.

beefnugs · 2y ago

They say it is production ready and beta in the same sentence? When did the definition of beta change?

CraftingLinks · 2y ago

Thank you! I was waiting for this.

ametrau · 2y ago

I’ve used Claude and I’m not impressed. Opus or the other one.
