YakGPT is a simple, frontend-only ChatGPT UI you can use to either chat normally or, more excitingly, use your mic + OpenAI's Whisper API to chat hands-free.
Some features:
* A few fun characters pre-installed
* No tracking or analytics; OpenAI is the only thing it calls out to
* Optimized for mobile use via hands-free mode and cross-platform compressed audio recording
* Your API key and chat history are stored in browser local storage only
* Open-source; you can either use the deployed version on Vercel or run it locally
Planned features:
* Integrate Eleven Labs & other TTS services to enable full hands-free conversation
* Implement LangChain and/or plugins
* Integrate more ASR services that allow for streaming
Source code: https://github.com/yakGPT/yakGPT
I’d love for you to try it out and hear your feedback!
Most people can talk faster than they can type, but they can read faster than other people can talk. So an interface where I speak but read the response is an ideal way of interfacing with ChatGPT.
What would be nice is if I didn't have to press the mic button to speak -- if it could just tell when I was speaking (perhaps by saying "hey YakGPT"). But I see how that might be hard to implement.
Would love to hook this up to some smart glasses with a heads-up display where I could speak and read the response.
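For the "just tell when I'm speaking" idea: browser-side voice activity detection can start from something as crude as an RMS energy gate over mic samples. A hypothetical sketch (none of this exists in YakGPT, and a real wake word like "hey YakGPT" would need an actual keyword-spotting model):

```typescript
// Hypothetical helper: decide whether a chunk of mic samples contains speech
// by comparing its RMS energy to a fixed threshold. Real VAD is smarter than
// this (noise floors, hangover frames), but this is the basic shape.
function isSpeaking(samples: Float32Array, threshold = 0.02): boolean {
  let sumSquares = 0;
  for (const s of samples) sumSquares += s * s;
  const rms = Math.sqrt(sumSquares / samples.length);
  return rms > threshold;
}

// A silent buffer stays below the threshold; a loud one crosses it.
console.log(isSpeaking(new Float32Array(1024)));           // false
console.log(isSpeaking(new Float32Array(1024).fill(0.5))); // true
```

In a browser you would feed this from an `AudioWorklet` or `AnalyserNode` and only start recording once a few consecutive chunks come back true.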
Most people I know type faster than they can talk, and more accurately too. I find talking a horrible interface to a computer while sitting down. On the move it is another story entirely, of course.
By the way, ChatGPT is not very fast either, so usually I type something into the chat and continue working while it generates the response.
> smart glasses
I just tried that; it works quite well, however, pressing the mic button kind of messes up that experience.
I gave up at
Creating an optimized production build ...TypeError: Cannot read properties of null (reading 'useRef')
Failed to compile.
pages/index.tsx
`next/font` error:
Failed to fetch `Inter` from Google Fonts.
> Build failed because of webpack errors
Apparently because it can't fetch a font from Google. There should be assets that are critical to a yarn build (JS/TS code, templates, CSS) and assets that are not (freaking fonts).
edit: hacketyfixey, let's punch the thing in the face until it works:
./pages/index.tsx:
2: // import { Inter } from "next/font/google";
12: // const inter = Inter({ subsets: ["latin"] });
(I am sorry)
It uses 2 external calls to a JavaScript CDN for the microphone package and something else. It would probably be best if it were localhost calls only, since it handles an API key.
Any chance you could integrate the backend-api, and let me paste in my Bearer token from there?
* GPT-4 is decently faster when talking straight to the API
* The API is so stupidly cheap that it's basically a rounding error for me. Half an hour of chatting with GPT-3.5 costs me $0.02
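As a sanity check on that figure: at gpt-3.5-turbo's launch price of $0.002 per 1K tokens, $0.02 buys about 10,000 tokens (pricing may have changed since this was written):

```typescript
// Rough API cost check: tokens used times the per-1K-token rate.
const pricePer1kTokens = 0.002; // gpt-3.5-turbo launch pricing, USD
const tokens = 10_000;          // roughly half an hour of casual chatting
const cost = (tokens / 1000) * pricePer1kTokens;
console.log(cost.toFixed(2)); // "0.02"
```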
Would be curious what you mean by integrating the backend-api?
But the benefit of using the API is that you can change the model on the fly: you chat with 3.5 until you notice it's not responding properly, and then, with all the history you have (probably stored in your database), you can send a bigger request with GPT-4 as the selected model and likely get a better response.
I really wish the interface on chat.openai.com would let me switch between models within the same conversation, in order to 1) not use up your quota of GPT-4 interactions per 3 hours as quickly, and 2) not strain the backend unnecessarily when GPT-3.5 is efficient enough for the start of a conversation, switching models only once you notice it isn't.
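Mechanically this is easy against the API, since the model is just a per-request field sent alongside the full message history. A sketch, with a made-up `buildRequest` helper and history:

```typescript
// Chat history is just an array of messages; the model is chosen per request,
// so the same history can be re-sent to a stronger model at any point.
type Message = { role: "system" | "user" | "assistant"; content: string };

function buildRequest(history: Message[], model: string) {
  return {
    url: "https://api.openai.com/v1/chat/completions",
    body: { model, messages: history },
  };
}

const history: Message[] = [
  { role: "user", content: "Summarize this contract clause..." },
  { role: "assistant", content: "(unsatisfying answer)" },
];

// Start cheap, escalate with the full history when the answer isn't good enough.
const cheap = buildRequest(history, "gpt-3.5-turbo");
const better = buildRequest(history, "gpt-4");
console.log(cheap.body.model, better.body.model); // gpt-3.5-turbo gpt-4
```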
OpenAI already has this implemented in one direction: when you use up your quota of GPT-4 chats, it offers to drop you down to GPT-3.5 in that same conversation.
Maybe I’ll have to try this for a month and see if it ends up costing more than $20. Thanks for creating it!
https://help.openai.com/en/articles/5722486-how-your-data-is...
That's not what "run locally" means. This isn't any more "local" than talking to ChatGPT directly, which is never running locally.
I'm thinking: What would a GPT project manager do? What would a GPT money manager do? What would a GPT logistics manager do? GPT Data Analyst, Etc.
> Please enter your OpenAI key
...
Do people just not get it?
I would in fact rather give all my company secrets to this random dude than OpenAI.
It seems they are genuine, and they phrase it exactly as it is. The only thing I would maybe have wanted to see in the title is "open-source" or "free software".
One feature I am missing from all these frontends is the ability to edit your text and generate a new response from that point. The official ChatGPT UI is the only one that seems to do that.
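Under the hood that feature is mostly bookkeeping: replace the edited message, drop everything after it, and re-send the truncated history. A sketch with hypothetical types (not YakGPT's actual data model):

```typescript
type Msg = { role: "user" | "assistant"; content: string };

// Replace message i with the edited text, drop everything after it, and
// return the history that would be re-sent to the API for a fresh answer.
function editAndTruncate(history: Msg[], i: number, edited: string): Msg[] {
  const next = history.slice(0, i + 1);
  next[i] = { ...next[i], content: edited };
  return next;
}

const history: Msg[] = [
  { role: "user", content: "What is Rust?" },
  { role: "assistant", content: "A systems language..." },
  { role: "user", content: "Compare it to Go" },
  { role: "assistant", content: "..." },
];

// Edit the first question; the three later messages are discarded.
const regen = editAndTruncate(history, 0, "What is Zig?");
console.log(regen.length, regen[0].content); // 1 What is Zig?
```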
There are a few drawbacks to local, I've discovered. For example, I doubt the new plugins can be extended beyond ChatGPT's web UI. Also, it doesn't stream response tokens as they're generated, which is a pain. I haven't looked into whether the OpenAI API lets you do that, though.
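For what it's worth, the API does support streaming: pass `stream: true` and the response comes back as server-sent events, one `data: {...}` chunk per token delta, terminated by `data: [DONE]`. A minimal parsing sketch (the sample chunks are hand-written in the documented streaming shape, not captured from a real response):

```typescript
// Extract the content deltas from a raw SSE buffer as sent by the
// chat completions endpoint when stream: true is set.
function parseDeltas(sse: string): string[] {
  const out: string[] = [];
  for (const line of sse.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data: ")) continue;
    const payload = trimmed.slice("data: ".length);
    if (payload === "[DONE]") break;
    const delta = JSON.parse(payload).choices?.[0]?.delta?.content;
    if (delta) out.push(delta);
  }
  return out;
}

const sample = [
  'data: {"choices":[{"delta":{"content":"Hel"}}]}',
  'data: {"choices":[{"delta":{"content":"lo"}}]}',
  "data: [DONE]",
].join("\n\n");

console.log(parseDeltas(sample).join("")); // Hello
```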
Nice work!
Do you know their privacy policy for our voices? Do they train on them, listen to them, etc.?
If you're using the hosted Whisper, they can. However, they don't specifically talk about it.
I kind of want to throw this up on a server for my housemates to use. I am currently the only person with an OpenAI account, so I would like the ability to embed my API key. Minor feature request :-)
Otherwise, it really looks good.
As mentioned by others, it would be great to customize or write new personas/prompts.
Also, could you add a voice chatbot as well, using Vocode? It could be an alternative UI for each of the personas.
BTW I have a lot of these ChatGPT UI apps installed, mostly free and open-source. Perhaps this is really the era of going back to just talking to a chat interface like the old times.
Interesting note: I tried speaking Mandarin Chinese into the mic and it auto-translated what I said into English.
Well done.
You can also generate multiple keys, so if one app misbehaves, you don't need to rotate all the keys, just the one that misbehaves.
This is assuming the API keys can only do generation. If they can access billing details or something, it's very different, of course.
Because it's bad practice to provide sensitive information to untrusted sources, and if you are an ethical developer, it's an anti-pattern to write software that encourages bad practices.
Your credit card company will reverse any unauthorized charges. Will you email me all your credit card info?
I answer back to myself: I misunderstood, since the developer's idea is to run it locally (http://localhost:3000), while I got scared by the DEMO.
Congrats to the developer!
(now, I just need OpenAI to take me off the waitlist for GPT-4)
I haven't played with the OpenAI API yet. Are there examples of good prompts to use to get good responses?
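One pattern that reliably helps is a system message that pins down role and output format before the user message. A minimal chat completions payload sketch (the prompt text is just an example, not an official recommendation):

```typescript
// A system message steers tone and format for the whole conversation;
// concrete constraints ("answer in at most 3 bullet points") tend to
// work better than vague ones ("be helpful").
const payload = {
  model: "gpt-3.5-turbo",
  messages: [
    {
      role: "system",
      content:
        "You are a terse senior engineer. Answer in at most 3 bullet points.",
    },
    { role: "user", content: "When should I use a message queue?" },
  ],
};

console.log(payload.messages[0].role); // system
```

This object is what you would POST (JSON-encoded, with your API key in the Authorization header) to the chat completions endpoint.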
This seems to be a contradiction. Am I running it locally, or is it running on someone else's server?