I've also noticed that LLMs are much better at writing code than at producing structured JSON (no real surprise, given the popularity of code assistants). When it makes sense for the situation, I now have the LLM generate code and parse it into the right structure, rather than requesting structured data directly:
`generate_event("I need to do X", new Date("1-1-2025"))` seems to be generated more reliably than `{ "description": "I need to do X", "when": "1-1-2025" }`
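A minimal sketch of the parsing side, assuming the model returns one or more calls in exactly the `generate_event("…", new Date("…"))` shape. `parseEvents` and the regex are hypothetical, not the commenter's code; matching the text instead of `eval`-ing it means model output is never executed.

```javascript
// Hypothetical parser for model output shaped like:
//   generate_event("I need to do X", new Date("1-1-2025"))
// Uses a regex instead of eval(), so the generated code is never run.
const CALL_RE =
  /generate_event\(\s*"((?:[^"\\]|\\.)*)"\s*,\s*new Date\(\s*"([^"]*)"\s*\)\s*\)/g;

function parseEvents(code) {
  const events = [];
  for (const match of code.matchAll(CALL_RE)) {
    // JSON.parse un-escapes the captured literal (e.g. \" or \n).
    // Assumes JSON-compatible escapes; JS-only ones like \x41 would throw.
    const description = JSON.parse(`"${match[1]}"`);
    // Keep the date as the raw string, matching the JSON shape above.
    events.push({ description, when: match[2] });
  }
  return events;
}
```

Any surrounding chatter from the model is simply ignored, since only text matching the call shape is picked up.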
I have a 7,000-token prompt that generates JSON chugging away in production, and at scale I'm seeing roughly 1 in 4,000 generations require a re-generation. Even that could probably be eliminated with some basic "healing" code.
OSS models are prone to outputting garbage in my experience, but OP mentions ChatGPT:
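What "basic healing" might look like, as a sketch only: `healJson` is a hypothetical helper that handles just two common failure modes, trailing commas and output cut off before the closing brackets or quote. Anything else still throws, so the caller can fall back to re-generation.

```javascript
// Hypothetical "healing" pass for JSON that almost parses.
// Handles trailing commas and truncated output; everything else still throws.
function healJson(text) {
  try {
    return JSON.parse(text);
  } catch {
    // fall through to the repair pass
  }
  let out = "";
  const stack = []; // expected closing brackets, innermost last
  let inString = false;
  let escaped = false;
  for (const ch of text) {
    if (inString) {
      out += ch;
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") stack.push("}");
    else if (ch === "[") stack.push("]");
    else if (ch === "}" || ch === "]") {
      stack.pop();
      out = out.replace(/,\s*$/, ""); // drop a trailing comma before the close
    }
    out += ch;
  }
  if (inString) out += '"'; // close a string that was cut off
  while (stack.length) {
    out = out.replace(/,\s*$/, "");
    out += stack.pop(); // close any brackets left open by truncation
  }
  return JSON.parse(out); // throws if the damage was something else
}
```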
How are you running into issues if you simply prefill the response with ```json and set ``` as your stop token?
Also, are people just not trying to find the opening and closing braces, and instead treating the output as broken if there's a preamble? The prefill gets rid of the preamble, but if you're not willing or able to prefill, how hard is it to get JSON out of a string?
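For the no-prefill case, a minimal sketch of "parse from the opening to the closing brace". `extractJson` is a hypothetical helper, and it assumes the response contains a single top-level JSON object.

```javascript
// Hypothetical helper: pull the JSON object out of a response that may have
// a preamble ("Sure, here's your JSON:"), a ```json fence, or trailing chatter.
// Takes everything from the first "{" to the last "}", so it assumes a single
// top-level object in the response.
function extractJson(response) {
  const start = response.indexOf("{");
  const end = response.lastIndexOf("}");
  if (start === -1 || end < start) {
    throw new Error("no JSON object found in response");
  }
  return JSON.parse(response.slice(start, end + 1));
}
```

With a ```json prefill and ``` as the stop sequence, the returned text is already just the JSON and this step isn't needed.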
I've been working on this problem for the past two years or so, and I went through many of the same struggles in the beginning, until we came up with a fairly complex solution for getting LLMs to output data in a form we can use.
The big problem is that 95% accuracy is not good enough for calendars: people lose confidence after one failed attempt. When you try to get LLMs to output JSON, you can hit a 1-in-1,000 invalid-JSON problem that is unrecoverable. What I wound up doing was training models for these tasks on tremendous amounts of data. I did not use OpenAI's models, as they were not right for the job. Would love feedback.
convoke.ai
Is this like Power Automate, but for G Suite products?
Also, I don't know why, but looking at the site makes me think it's a candidate for Google to kill off without warning: https://www.google.com/script/start/
It's one of the Google products I worry least about, mainly because there are 15+ years of existing Google Sheets documents that people have built using it at this point. I don't think even Google would lightly break THAT many of their existing (often paid) users.
Says everyone using GCP services that get deprecated.
And it has had significant updates as well. The newer runtime is based on V8 and uses some clever way of isolating the code. Previously they used Mozilla Rhino as the runtime because it was easier to sandbox, but it was also very frustrating to work with.
Now the DX is much better, with more modern, better-performing executions and a better UI.
If I sound like a sucker, it's because I am: building with Apps Script is fun, even with its limitations.
Power Platform, in comparison, might be 100x more powerful for all I care, and I still wouldn't touch it with a 10-foot pole. It was the most frustrating experience I've ever had.
But for ingesting unstructured data I didn't generate, I'm going to reach for an LLM now (provided latency isn't an issue).
In cases like this, the author could just set up a filter like:
"If the email contains a task for me" (or some variation)
Then add a Gmail label to it.
This way, the author would immediately find all of his actionable emails under one specific label, which is much faster to skim and makes it easier to keep track of them all.
Another option would be to have GabrielAI generate a draft, with an instruction like "reply acknowledging the task and put a to-do date in the email in 1 week"
This would allow Google to track the email and the deadline.
A bit classless, but do you mind if I reach out to you about your experience building GabrielAI?
Feel free to use the email address on the GabrielAI website!
I'd be happy to chat about it!
Maintaining multi-level priorities requires more decisions: evaluating the relative priorities of different tasks, and possibly re-evaluating priorities when new tasks arrive. Throw colleagues, friends, or others into the mix and agreement on those decisions becomes harder to reach.
Within the low-priority list, tasks are sorted by the date and time they're required. If you then choose to ignore the priorities or the sorting, that deviation takes you down the priority re-evaluation rabbit hole again. It's your choice whether to follow the process or not. Avoiding extra complexity in task scheduling and processes keeps me focused.
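A sketch of that ordering, assuming just two levels ("high" and "low"). The comment doesn't say how high-priority tasks are ordered, so keeping them in the order they were added is an assumption here, and `orderTasks` plus the task fields are hypothetical.

```javascript
// Hypothetical two-level scheme: "high" tasks first, in the order they were
// added (an assumption), then every other task sorted by when it's required.
function orderTasks(tasks) {
  const high = tasks.filter((t) => t.priority === "high");
  const low = tasks
    .filter((t) => t.priority !== "high")
    .sort((a, b) => a.due - b.due); // Date minus Date gives milliseconds
  return [...high, ...low];
}
```

The only decision per new task is which of the two lists it goes into; its position in the low list falls out of its due date.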
Of course, this will not be for everyone. Good luck with that LLM!