Jarvis: A Voice Virtual Assistant in Python (OpenAI, ElevenLabs, Deepgram)

(github.com)

83 points · Alyx1337 · 2y ago · 58 comments

51 comments · 14 top-level

Spiwux · 2y ago · 10 in thread

I wonder if we're at a point where you could build a voice assistant like that, except almost-realtime and streamed end to end:

The user speaks, and speech-to-text starts streaming text while the user is still speaking. That text stream is piped into an LLM, which also streams its output text. That output text is streamed to text-to-speech, which also generates audio in a streaming manner.
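A minimal sketch of that shape, using Python generators so each stage consumes and produces a stream. Everything here is a hypothetical stand-in: `fake_stt`, `fake_llm`, and `fake_tts` are toy functions, not real speech or language model APIs, and the "audio" is just UTF-8 bytes.

```python
from typing import Iterable, Iterator


def fake_stt(audio_chunks: Iterable[bytes]) -> Iterator[str]:
    """Hypothetical streaming speech-to-text: yields one word per audio chunk."""
    for chunk in audio_chunks:
        yield chunk.decode("utf-8")


def fake_llm(words: Iterable[str]) -> Iterator[str]:
    """Hypothetical LLM stage: streams its reply token by token.

    Assumption: this toy stage waits for the full transcript before replying,
    so only its output side is streamed.
    """
    prompt = " ".join(words)
    for token in f"You said: {prompt}".split():
        yield token


def fake_tts(tokens: Iterable[str]) -> Iterator[bytes]:
    """Hypothetical streaming text-to-speech: emits one 'audio' chunk per token."""
    for token in tokens:
        yield token.upper().encode("utf-8")


def pipeline(audio_chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Chain the three stages; nothing runs until the caller pulls output."""
    return fake_tts(fake_llm(fake_stt(audio_chunks)))


audio = [b"turn", b"on", b"the", b"lights"]
out = list(pipeline(audio))
print(out)
```

Because the stages are lazy, the caller can start playing the first output chunk as soon as it is yielded instead of waiting for the whole reply.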

modeless · 2y ago

I implemented this! All local models. And I packaged it up so people can install it with one click: https://apps.microsoft.com/detail/9NC624PBFGB7

The speech recognition part needs work for sure, but when it works you can see the potential. It's very different from the way it feels to talk to Siri or even ChatGPT's voice mode. It won't be long before we are having real conversations with our computers.

bjelkeman-again · 2y ago

Could you record a demo of this?

3abiton · 2y ago

But how realtime is it?

evilantnie · 2y ago

TTS and STT models have decent support for streaming in chunks, but accuracy drops as the chunk size gets smaller. The current state of LLMs is pretty limited in its ability to handle streaming inputs due to attention window constraints. There is some emerging research into attention sinks and caching initial tokens that looks promising. I don't think we're quite there yet though.

everforward · 2y ago

You can do the "almost-realtime" part, all locally. I tinkered with a Python script for a few hours that used Whisper for speech-to-text, fed that into a local Mistral model (don't recall which), and then piped the output into text-to-speech.

It wasn't really streamed, though. Audio input was buffered, fully evaluated to a string, then fed into the LLM and the full text was converted back to audio.

The Whisper speech-to-text was pretty real-time, the LLM was not. I was barely scraping by on hardware specs, though.
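A rough sketch of that buffered flow, for contrast with a streamed one. The three stage functions are hypothetical placeholders (the original script is not shown, so these are not Whisper, Mistral, or any real TTS API); each one takes its whole input and returns its whole output.

```python
from typing import List


def transcribe(audio: bytes) -> str:
    """Placeholder for speech-to-text over the complete recording."""
    return audio.decode("utf-8")


def generate(prompt: str) -> str:
    """Placeholder for a local LLM that returns the full reply at once."""
    return f"Echo: {prompt}"


def synthesize(text: str) -> bytes:
    """Placeholder for text-to-speech over the complete reply."""
    return text.encode("utf-8")


def respond(audio_chunks: List[bytes]) -> bytes:
    """Buffer all input, then run each stage to completion in order."""
    audio = b"".join(audio_chunks)   # wait until the user has finished speaking
    text = transcribe(audio)         # full transcript as a single string
    reply = generate(text)           # full LLM reply before any audio exists
    return synthesize(reply)         # convert the whole reply back to audio


result = respond([b"hello ", b"world"])
print(result)
```

In this layout the latency before any audio plays is the sum of all three stages, which is why a slow LLM step dominates the experience.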

canadiantim · 2y ago

Did you try using an ESP box?

zaptrem · 2y ago

Available as a phone line API (https://www.vocode.dev) and as an open-source project (https://github.com/vocodedev/vocode-python).

adroitboss · 2y ago

This has happened already. It was maybe about 7 months ago, and I believe it was a Twitter link posted here. They took it further and streamed it to Twilio to create a live phone call.

fudged71 · 2y ago

The one I tried was called Vocode

WiSaGaN · 2y ago

Is there any existing streaming API for LLM input?

bitsandbooks · 2y ago · 8 in thread

"Jarvis" is a trademark of Marvel, so that name will definitely not work. https://trademarks.justia.com/862/94/jarvis-86294162.html

torstenvl · 2y ago

IANAIPL but I find it difficult to believe that trademark is valid, considering it's never been used in trade. There is no Jarvis digital assistant software sold either fictionally or IRL. Even if the trademark were somehow upheld, I don't see how there could be any damages.

bsenftner · 2y ago

Trademarks are industry-specific. If you made a fictional AI character named Jarvis and tried selling media based on that character, THEN Marvel would have a case. Creating a talking AI assistant named Jarvis would be an expensive court case, which Marvel/Disney has the cash to pursue, but it would be a legal stretch with a lot of moneyed interests willing to back the non-Marvel/Disney side.

djoldman · 2y ago

And just for those who felt the need to check:

> Computer application software that may be downloaded via global computer networks and electronic communication networks for use in connection with mobile computers, mobile phones, and tablet computers, namely, software for use as a voice controlled personal digital assistant

beardyw · 2y ago

But there are many apps called Jarvis, so I am not sure how that is supposed to work?

petemir · 2y ago

So many that it is actually quite counterproductive to call it that. I have honestly lost track of how many AI-based assistants named JARVIS I have encountered already =/.

racl101 · 2y ago

What about Jenkins? oh yeah nope.

Um what about Jeeves? oh yeah nope.

Ok we need more butler names.

What about Smithers? Or Jeffrey?