Some of the example responses are very long for the typical home-automation use case, which compounds the problem. There's ample room for GLaDOS to be sassy, but at 8 s she's just too tardy to be usable.
A different approach might be to have the LLM produce a set of GLaDOS-like responses up front and pick from them, instead of always letting the LLM generate something new. On top of that, add a cache that stores the .wav files after Piper has synthesized them the first time. A cache like that is how e.g. Mycroft AI does it. Not sure how easy it would be to bolt onto your setup though.
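A minimal sketch of that caching idea, keying cached .wav files by a hash of the response text (the cache directory and the Piper model filename here are just placeholders, not from your setup):

```python
import hashlib
import subprocess
from pathlib import Path

CACHE_DIR = Path("tts_cache")  # hypothetical cache location
CACHE_DIR.mkdir(exist_ok=True)

def cache_path(text: str) -> Path:
    """Map a response string to a stable .wav filename via SHA-256."""
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return CACHE_DIR / f"{key}.wav"

def speak(text: str) -> Path:
    """Return a cached .wav for `text`, invoking Piper only on a cache miss."""
    wav = cache_path(text)
    if not wav.exists():
        # Piper's CLI reads text from stdin and writes a wav via --output_file;
        # the model name below is a placeholder for whatever voice you use.
        subprocess.run(
            ["piper", "--model", "en_US-glados.onnx", "--output_file", str(wav)],
            input=text.encode("utf-8"),
            check=True,
        )
    return wav
```

Then every repeated canned response after the first is just a file read instead of a synthesis pass.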