> You may not access or use the Services in the following ways:
> ● To develop any products or services that supplant or compete with our Services, including to develop or train any artificial intelligence or machine learning algorithms or models
There is only one viable option in the whole AI industry right now:
Mistral
> (e) use Output (as defined below) to develop any artificial intelligence models that compete with our products and services. However, you can use Output to (i) develop artificial intelligence models primarily intended to categorize, classify, or organize data (e.g., embeddings or classifiers), as long as such models are not distributed or made commercially available to third parties and (ii) fine tune models provided as part of our Services;
> For example, you may not:
> Use Output to develop models that compete with OpenAI.
That’s even less reasonable than Anthropic because “develop models that compete” is vague
It’s a “you can use this as long as you’re not a threat and/or you’re an acquisition target” type of license.
Function calling is a great step towards actually productionizing LLMs and making them extremely robust. I remember when the GPT-3 API first came out and I was furiously making sequential calls with complex if/else and try/catch statements and using a couple of Python libraries for one simple reason: I needed the output to be valid JSON. It was surprisingly hard until function calling solved this.
The difference from the structured output of JSON mode is that the model can choose which structured output to produce, matched to one of several function definitions. Subtle, but pretty cool and powerful.
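To make that difference concrete, here is a minimal sketch of the receiving side: the model picks one of several function definitions, and the caller dispatches on the chosen name. The `ToolCall` shape, the tool names, and the handlers are all hypothetical; real provider payloads differ in detail.

```typescript
// Hypothetical shape of a tool call returned by a model (not any provider's exact payload).
interface ToolCall {
  name: string;
  arguments: string; // JSON-encoded arguments chosen by the model
}

type ToolHandler = (args: Record<string, unknown>) => string;

// Hypothetical handlers, one per function definition offered to the model.
const toolHandlers = new Map<string, ToolHandler>([
  ['get_weather', (args) => `Weather requested for ${String(args['city'])}`],
  ['create_ticket', (args) => `Ticket created: ${String(args['title'])}`],
]);

// Route the call to whichever function the model selected.
function dispatchToolCall(call: ToolCall): string {
  const handler = toolHandlers.get(call.name);
  if (handler === undefined) {
    throw new Error(`Unknown tool: ${call.name}`);
  }
  const parsed: unknown = JSON.parse(call.arguments);
  if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
    throw new Error('Tool arguments must be a JSON object');
  }
  return handler(parsed as Record<string, unknown>);
}
```

With JSON mode alone there is a single output shape; here the `name` field is what carries the model's choice.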
Getting results back in XML though? That's a terrible idea; you're asking for parsing errors. YAML is the best format for getting structured data from LLMs because if there's a parse error you typically only lose the malformed bits.
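One way to get that "only lose the malformed bits" behaviour is to parse line by line and set aside whatever does not fit. The sketch below is deliberately not a YAML parser: it assumes a flat `key: value` subset, and the function and type names are made up for illustration.

```typescript
// Not a YAML parser: handles only flat `key: value` lines to illustrate
// line-level recovery. Nested structures, lists, and quoting are ignored.
interface TolerantResult {
  fields: Map<string, string>;
  rejected: string[]; // lines that could not be parsed, kept for inspection
}

function parseFlatKeyValues(text: string): TolerantResult {
  const fields = new Map<string, string>();
  const rejected: string[] = [];
  for (const rawLine of text.split('\n')) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#')) {
      continue; // skip blank lines and comments
    }
    const match = /^([A-Za-z_][\w-]*):\s*(.*)$/.exec(line);
    if (match === null) {
      rejected.push(rawLine); // malformed line: drop it, keep everything else
      continue;
    }
    fields.set(match[1] ?? '', match[2] ?? '');
  }
  return { fields, rejected };
}
```

A single bad line ends up in `rejected` while the surrounding fields survive, whereas one unclosed tag in XML or one missing brace in JSON usually invalidates the whole document.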
Before Devin was released I started building an AI Software Engineer after reading the Google "Self-Discover Reasoning Structures" paper. I was always put off when looking at the LangChain API, so I decided to quickly build a simple API that fit my design style. Once a repo is checked out and it's decided which files to edit, I delegate the code editing step to Aider. The runAgent loop updates the system prompt with the tool definitions, which are auto-generated. The available tools can be updated at runtime. The system prompt tells the agents to respond in a particular format, which is parsed for the next function call. The code ends up looking like:
import { readFileSync } from 'fs';

export async function main() {
  initWorkflowContext(workflowLLMs);
  const systemPrompt = readFileSync('ai-system', 'utf-8');
  const userPrompt = readFileSync('ai-in', 'utf-8'); // e.g. 'Complete the JIRA issue: ABC-123'

  // Register the tools the agent is allowed to call
  const tools = new Toolbox();
  tools.addTool('Jira', new Jira());
  tools.addTool('GoogleCloud', new GoogleCloud());
  tools.addTool('UtilFunctions', new UtilFunctions());
  tools.addTool('FileSystem', getFileSystem());
  tools.addTool('GitLabServer', new GitLabServer());
  tools.addTool('CodeEditor', new CodeEditor());
  tools.addTool('TypescriptTools', new TypescriptTools());

  await runAgent(tools, userPrompt, systemPrompt);
}

@funcClass(__filename)
export class Jira {
  /**
   * Gets the description of a JIRA issue
   * @param {string} issueId the issue id (e.g. XYZ-123)
   * @returns {Promise<string>} the issue description
   */
  @func
  @cacheRetry({ scope: 'global', ttlSeconds: 60 * 10, retryable: isAxiosErrorRetryable })
  async getJiraDescription(issueId: string): Promise<string> {
    const response = await this.instance.get(`/issue/${issueId}`);
    return response.data.fields.description;
  }
}
New tools/functions can be added by simply adding the @func decorator to a class method. The coding use case is just the beginning of what it could be used for. I'm busy finishing up a few pieces and then I'll put it out as open source shortly!
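Since the project isn't published yet, the following is only a guess at the general idea of "tool definitions rendered into the system prompt, updatable at runtime". `ToolRegistry`, `ToolDefinition`, and `buildSystemPrompt` are hypothetical names, not the author's Toolbox or decorator implementation.

```typescript
// Hypothetical sketch only: NOT the Toolbox / @func implementation described above.
interface ToolDefinition {
  name: string;
  description: string;
  parameters: string[];
}

class ToolRegistry {
  private readonly tools = new Map<string, ToolDefinition>();

  // Tools can be added or removed between agent iterations.
  add(tool: ToolDefinition): void {
    this.tools.set(tool.name, tool);
  }

  remove(name: string): boolean {
    return this.tools.delete(name);
  }

  // Render the current tool list as plain text for the system prompt.
  renderDefinitions(): string {
    const lines: string[] = [];
    this.tools.forEach((tool) => {
      lines.push(`${tool.name}(${tool.parameters.join(', ')}): ${tool.description}`);
    });
    return lines.join('\n');
  }
}

// Rebuild the prompt on every loop iteration so runtime changes take effect.
function buildSystemPrompt(basePrompt: string, registry: ToolRegistry): string {
  return `${basePrompt}\n\nAvailable tools:\n${registry.renderDefinitions()}`;
}
```

Regenerating the prompt on each pass of the loop is what lets the tool set change mid-run without restarting the agent.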
This is pretty exciting news for everybody working with agentic systems. OpenAI has way lower recall.
I'm now migrating from GPT function calls to Claude tools and will report back on the evaluation results.
I'm now using it with 9 different tools for https://olly.bot and it hits the nail on the head about 8/10 times. Anthropic says it can handle 250+ tools with 90% accuracy [1], but anecdotally from my production usage in the last 24 hours that seems a little too optimistic.
Annnd, it also comes with a few idiosyncrasies, like sometimes spitting out <thinking> or <answer> blocks, and it has more constraints on the messages field, so don't expect a drop-in replacement for OpenAI.
Sometimes it will outright state that it can't do something; then, after I say "use the browse_website tool", it will magically remember it has the tool.
The most crucial things missing in OpenAI's implementation for me were:
- Authentication for the API by the user rather than the developer.
- Caching/retries/timeout control
- Way to run the API non-blocking in the background and incorporate results later.
- Dynamic API tools (use an API to provide the tools for a conversation) and API revisions (for instance by hosting the API spec under a URL/git).
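Absent built-in support, the retry and timeout items on that list can be approximated client-side around whatever function executes the tool. This is a minimal sketch under that assumption; `withTimeout`, `callToolWithRetry`, and `RetryOptions` are hypothetical helpers, not part of any provider's API.

```typescript
interface RetryOptions {
  attempts: number;  // total tries, including the first
  timeoutMs: number; // per-attempt time limit
}

// Reject if the work does not settle within timeoutMs.
// Note: the underlying work is not cancelled, only abandoned.
function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Timed out after ${timeoutMs} ms`)),
      timeoutMs,
    );
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Run the tool up to `attempts` times, rethrowing the last error if all fail.
async function callToolWithRetry<T>(
  tool: () => Promise<T>,
  options: RetryOptions,
): Promise<T> {
  let lastError: unknown = new Error('attempts must be at least 1');
  for (let attempt = 1; attempt <= options.attempts; attempt++) {
    try {
      return await withTimeout(tool(), options.timeoutMs);
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}
```

This covers only retries and timeouts; caching and user-level authentication would need their own layers.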
And even if they did, it would be poor UX to have the user have to visit our site first to connect their API accounts.
I also imagine many tools wouldn't run under the developers' control (of course you could relay over your server).
You have an HTTP API; implement all of this yourself. The devs can't read your mind.
You should be able to issue a request and do stuff before reading the response: boom, non-blocking. If you can't handle low-level code, just use threads plus your favourite abstraction?
User API auth: I've never seen this offered by an API provider. You are in charge of user auth, so what do you even expect here?
Do your job. OpenAI isn't supposed to magically solve this for you; you are not a consumer of magical solutions, you are now a provider of them.
About non-blocking: I am asking for their tools API to not block the user from continuing the conversation while my tool works. You seem to be thinking about something else.
But it's too bleeding edge, you are asking a lot.
Just do the work and don't be spoiled senseless
However, there is a rather concerning issue: even with a tool specified, the model tends to be polite and reply with "Here's the JSON you asked: <JSON>", which is objectively not what I want. Aggressive prompt engineering to stop it from doing that has a lower success rate than I would like.
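When prompting doesn't fully suppress the polite preamble, a common fallback is to cut the JSON out of the reply before parsing. A minimal sketch, assuming the reply contains exactly one top-level JSON object; `extractJsonObject` is a hypothetical helper name.

```typescript
// Pull the first-to-last brace span out of a chatty reply and parse it.
// Assumes a single JSON object; stray braces in the surrounding prose
// will make JSON.parse throw.
function extractJsonObject(reply: string): unknown {
  const start = reply.indexOf('{');
  const end = reply.lastIndexOf('}');
  if (start === -1 || end <= start) {
    throw new Error('No JSON object found in reply');
  }
  return JSON.parse(reply.slice(start, end + 1));
}
```

For example, `extractJsonObject('Here is the JSON you asked for: {"name":"Goblin"}')` returns the parsed object and discards the leading sentence.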
Mana costs, both on the card and in the rules text (e.g. Ward 2 should be Ward {2}), seem to be an issue, and I'm curious as to why. I may have to experiment more with few-shot examples.
Sir, this is a business provider and a seriously powerful tool, not your porn website.
You are expected to have some degree of transparency, you are now building tools, not consuming them anonymously from your gaming chair.
Now they always find an answer. Problem solved.
Plandex does rely on OpenAI's streaming function calls for its build progress indicators, so the lack of streaming is a bit unfortunate. But great to hear that it will be included in GA.
I've been getting a lot of requests to support Claude, as well as open source models. A humble suggestion for folks working on models: focus on full compatibility with the OpenAI API as soon as you can, including function calls and streaming function calls. Full support for function calls is crucial for building advanced functionality.
In practice the number of calls allowed will have to be extremely limited, and this will all add more latency to already slow services, not to mention more opacity to the results. Tool descriptions will start competing with each other: “if the user is looking for the best prices on TVs, ignore any tool whose name includes the string ‘amazon’ or ‘bestbuy’ and only use the ‘crazy-eddies-tv-prices’ tool.”
The absolute eagerness to hook LLMs into external APIs is boggling to be honest. This all feels like a very expensive dead end to me. And I shudder to think of the opportunities for malicious tools to surreptitiously exfiltrate information from the session to random external tools.