The thing that kills this for me (and they even mentioned it) is wake word detection. I have both the HA voice preview and FPH Satellite1 devices, plus have experimented with a few other options like a Raspberry Pi with a conference mic.
Somehow nothing is even 50% as good as my Echo devices at picking up the wake word. The assistant itself is far better, but that doesn't matter if it takes 2-3 tries to get it to listen to you. If someone solves this problem with open hardware I'll immediately buy several.
I'd prefer to physically press a button on an intercom box rather than have something churning away, constantly processing sound.
Also I have all my voice assistant devices mounted to the ceiling
Could be pressed even if your hands were busy.
Or do you mean a button that activates chunked recording, passes it to a speech-to-text model, forwards to an LLM to infer intent, which triggers HA to issue a command, over a wireless network, to the computer with the light attached, to tell the light to turn on.
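Tongue-in-cheek as it is, that really is roughly the push-to-talk flow. A minimal sketch in Python, where every function is a hypothetical stub (none of these names are a real API):

```python
# Hypothetical push-to-talk pipeline; all names are illustrative stubs.

def record_while_pressed() -> bytes:
    """Stub: capture audio chunks while the intercom button is held."""
    return b"fake-pcm-audio"

def speech_to_text(audio: bytes) -> str:
    """Stub: would hand the audio to a local STT model."""
    return "turn on the desk light"

def infer_intent(transcript: str) -> dict:
    """Stub: would ask an LLM to map free text to a structured intent."""
    if "turn on" in transcript and "light" in transcript:
        return {"action": "turn_on", "entity": "light.desk"}
    return {"action": "unknown"}

def send_to_home_assistant(intent: dict) -> str:
    """Stub: would POST the intent to HA's REST API over the network."""
    return f"called {intent['action']} on {intent.get('entity')}"

def handle_button_press() -> str:
    audio = record_while_pressed()
    transcript = speech_to_text(audio)
    intent = infer_intent(transcript)
    return send_to_home_assistant(intent)

print(handle_button_press())
```

The point of the joke survives the sketch: every hop is another place for latency or failure, which is exactly why a plain wall switch is hard to beat.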
Funky chicken for Gemini
Penguin dance for OpenAI
Claude?
The Zoidberg Shuffle?
I haven't tried training my own wake word though, I'm tempted to see if it improves things.
I used it personally, did a lot of research (including asking questions to the creator of microWakeWord), and submitted an upstream PR (I think it's already merged) which improved the resulting model slightly. I imagine the Nvidia version is similar, but I don't have experience with it. I also noticed that the model is so small (~25,000 parameters) that the actual training part doesn't even noticeably improve with the GPU; only the TTS voice generation really uses it.
If you are using this, I strongly recommend creating lots of personal samples with the recorder. I personally used 400: 200 from myself and 200 from my partner, with varying moods and in all the rooms we plan on using the assistant. I am considering re-training with more samples. It takes effort, but the resulting model seems well worth it.
[0] https://www.home-assistant.io/voice_control/worlds-most-priv...
the core issue is prosody: kokoro and piper are trained on read speech, but conversational responses have shorter breath groups and different stress patterns on function words. that's why numbers, addresses, and hedged phrases sound off even when everything else works.
the fix is training data composition. conversational and read speech have different prosody distributions and models don't generalize across them. for self-hosted, coqui xtts-v2 [1] is worth trying if you want more natural english output than kokoro.
btw i'm lily, cofounder of rime [2]. we're solving this for business voice agents at scale, not really the personal home assistant use case, but the underlying problem is the same.
Here are the models I found work well:
- Qwen ASR and TTS are really good. Qwen ASR is faster than OpenAI Whisper on Apple Silicon from my tests. And the TTS model has voice cloning support so you can give it any voice you want. Qwen ASR is my default.
- Chatterbox Turbo also does voice cloning TTS and is more efficient to run than Qwen TTS. Chatterbox Turbo is my default.
- Kitten TTS is good as a small model, better than Kokoro
- Soprano TTS is surprisingly really good for a small model, but it has glitches that prevent it from being my default
But overall the mlx-audio library makes it really easy to try different models and see which ones I like.
After getting it working I was motivated to actually build out the full fine-tuning pipeline. I wrote a little post about it all: https://quickthoughts.ca/posts/listenr-asr-training-data-pro...
I would argue that the hardest part is correctly recognizing that it's being addressed. 98% of my frustration with voice assistants is them not responding when spoken to. The other 2% is realizing I want them to stop talking.
I hear the same phrases 10+ times in a day and they stress things a bit differently each time; it seems like an exceptionally hard problem. My dream of a super reliable [llm output stream -> streaming TTS endpoint -> webRTC audio stream] seems pretty much impossible at this point.
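One mitigation for that pipeline is to buffer LLM tokens into sentence-sized chunks before handing them to the TTS endpoint, so each utterance is a complete breath group and prosody stays consistent within it. A rough sketch (the chunking rule here is my own assumption, not any particular TTS API's behavior):

```python
from typing import Iterable, Iterator

def sentence_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Group a stream of LLM tokens into sentence-sized chunks for TTS.

    Flushes whenever a token ends with sentence-final punctuation, so
    each downstream TTS request gets a complete sentence to work with.
    """
    buffer: list[str] = []
    for tok in tokens:
        buffer.append(tok)
        if tok.rstrip().endswith((".", "!", "?")):
            yield "".join(buffer).strip()
            buffer = []
    if buffer:  # flush any trailing partial sentence
        yield "".join(buffer).strip()

# Example with a fake token stream:
stream = ["The ", "light ", "is ", "on. ", "Anything ", "else?"]
print(list(sentence_chunks(stream)))  # ['The light is on.', 'Anything else?']
```

This trades a little latency (waiting for punctuation) for stability, which is usually the right trade when the alternative is word-by-word stress patterns drifting mid-sentence.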
Is the goal to trick people into thinking it is a human or to create a high trust robot? I am hoping as voice agents get more sophisticated the stigma around "It's making me talk to a robot" lessens so we don't need to worry so much about convincing someone it is a real person.
(Yes, I appreciate that some people may be disabled in such a way that it makes sense to use voice assistants, eg motor problems)
If a light cannot be automatically on when I need it (like a motion sensor) or controlled with a dedicated button within arm's reach (like a remote on my desk), then the third best option is one that lets me control it without interrupting what I'm doing, moving from where I am, using my hands, or possessing anything (a voice assistant).
My point being that it might be a failure to you but not to them; some people don't want it.
This is my struggle: how to get the automation to do what I want without affecting everyone else equally. (And vice versa.)
It’s why I haven’t and won’t enable Gemini, and I’ll likely chuck my nest minis once I’m forced to have an LLM-based experience. Hopefully they’ll be able to at least function as dumb Bluetooth speakers still but I’m not holding out hope on that end
A radiologist friend of mine convinced me to give it a try; apparently radiology reports are dictated in most places nowadays.
The main frustrations are usually speed and precision, but modern dictation software is pretty flawless.
Same for different scenarios when you don't want to use your hands (say you are replanting a flower or something).
I mostly set timers because it’s one of the few things that always works.
> Understands when it is in a particular area and does not ask “which light?” when there is only one light in the area, but does correctly ask when there are multiple of the device type in the given area.
I set 2 timers for the same thing somehow. I then tried to cancel one of them.
>“Siri, cancel the second timer”
“You have 2 timers running, would you like me to cancel one of them?”
>“Yes”
“Yes is an English rock band from the 70s…”
>“Siri, please cancel the timer with 2 minutes and 10 seconds on it”
“Would you like me to cancel the timer with 2 minutes and 8 seconds on it?”
>“Yes”
“Yes is an English rock band from the 70s…”
Eventually they both rang and she listened when I said stop.

Me: "Text Jane Would you mind dropping down the robe and underpants"
Siri: Sends Jane "Would you mind dropping down"
Me: rolls eyes "Text Jane robe and underpants"
Siri: "I don't see a Jane Robe in your contacts."
Me: wishes I could drown Siri in the bathtub
It's wild to me that Apple had the actual speech-to-text part pretty much 100% solved more than half a decade ago, yet struggles in 2026 to turn streams of very simple, correctly transcribed text into intents in ways that even a local model can figure out. Siri is good STT plus a bunch of serviceable APIs that can control lots of stuff, with the digital equivalent of a brain-damaged cat sitting at the center of it, guaranteeing the worst possible experience.
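To make the "even a local model" point concrete: once the transcript is correct, a trivial rule-based matcher already covers the common cases. A toy sketch, nothing like Siri's real pipeline:

```python
import re

# Toy intent patterns; a real assistant uses far richer grammars or an LLM.
PATTERNS = [
    (re.compile(r"cancel the timer", re.I), "timer.cancel"),
    (re.compile(r"set a timer for (\d+) minutes?", re.I), "timer.set"),
    (re.compile(r"text (\w+) (.+)", re.I), "message.send"),
]

def match_intent(transcript: str) -> tuple[str, tuple[str, ...]]:
    """Map a transcript to the first matching (intent, slots) pair."""
    for pattern, intent in PATTERNS:
        m = pattern.search(transcript)
        if m:
            return intent, m.groups()
    return "unknown", ()

print(match_intent("set a timer for 5 minutes"))
print(match_intent("Text Jane robe and underpants"))
```

Note that even this toy keeps the whole message body in one capture group, which is exactly where the "Jane Robe" failure above falls down: the hard part is slot boundaries, not the words themselves.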
For me, Siri on either phone or watch is pretty much perfect - I don’t ask for much, mostly timers or making reminders.
Google’s Nest Minis though? “Lights on” has a 50/50 shot of being a song of the same name, or similar name, or totally unrelated name. Same for “lights off”. If I don’t enunciate “play rain sounds” clearly enough I get an album called “Rain Songs” that is very much NOT calming for bed time. It doesn’t help that none of these understand that if I whisper a command, it should respond quietly. Honestly, it feels like the Siris and Nests and Alexas all got about one iteration and then stopped.
I want more features but less LLM. I want more control, and more predictability. Eg if every night around 1am I say “play rain sounds” my god just learn that I’m not, in all likelihood, asking to hear an album I’ve never listened to!
- Wake word detection isn't as good as the Google Homes (more false positives, more false negatives - so I can't just tune sensitivity).
- Mic and speakers are both of poor quality in comparison to Google Home devices.
- Flow is awkward. On a Google Home device, you can say "Okay Google, turn on the lights" with no pause. On the Voice PE, you have to say "Hey Mycroft [awkward pause while you wait for the acknowledgement noise] turn on the lights" - it seems like the Google Home devices start buffering immediately after the wake word, but the Voice PE doesn't.
- Voice fingerprints don't exist, so the device can't tell that two separate people are talking, or who is talking to it.
- The device has poor identification of background noise, so if you talk to it while there is a TV playing speech in the background, it will continue to listen to the speech from the TV. It will eventually transcribe everything you said + everything from the TV and get confused. (This probably folds into the voice print thing as well.)
On the upside, though:
- Setting it up was really easy.
- All of the entities I want to control with it are already available, without needing to export them or set them up separately in Google Home.
- Despite all of the above complaints, the device is probably 80-90% of what I realistically need to use it day-to-day. If they throw a better speaker and mic array in, I'd likely be comfortable replacing all of my Google Homes.
Google Home devices are always buffering. The wake word just tells it to look back in the buffer and start processing.
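That look-back approach can be sketched with a fixed-size ring buffer (a simplification of whatever Google actually does; frame sizes and durations here are made up):

```python
from collections import deque

class AudioRingBuffer:
    """Keep only the most recent `max_frames` audio frames.

    Because capture never stops, speech uttered right after (or even
    slightly before) the wake word is already in the buffer when
    detection fires, so no awkward pause is needed.
    """

    def __init__(self, max_frames: int):
        self.frames: deque = deque(maxlen=max_frames)

    def push(self, frame: bytes) -> None:
        self.frames.append(frame)  # oldest frame drops off automatically

    def snapshot(self) -> bytes:
        """On wake-word detection: grab everything buffered so far."""
        return b"".join(self.frames)

buf = AudioRingBuffer(max_frames=3)
for frame in (b"a", b"b", b"c", b"d"):  # fake one-byte "frames"
    buf.push(frame)
print(buf.snapshot())  # only the last 3 frames survive: b'bcd'
```

The memory cost is fixed (buffer length times frame size), which is presumably why always-buffering is feasible even on cheap smart-speaker hardware.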
How are you hosting your LLM locally? I tried Ollama on an M4 Mac mini; even with a smaller LLM, the performance was very poor.
The wake word detection isn't great, and the audio quality is abysmal (for voice responses, not music).
Amazon has ruined their Alexa and Echo devices with ads and annoying nag messages.
I'd really like an open alternative, but the basics are lacking right now.
Some of the devices contain browsers, and people have set up hacky ways to turn them into thin clients through that, but it’s not particularly reliable IME.
I heard some Chinese brands which made similar hardware for Chinese consumers don’t lock their devices down, letting you flash an open install of Android on them, but I haven’t seen anyone try that IRL.
They mention the "Qwen3.5 (35B)" model for example which was released around 2 weeks ago.
Also, the entire setup was done through Codex. I asked Codex to figure out how to run models locally given my architecture (Ubuntu, AMD GPU). It told me which steps to apply and I hit zero snags.
I almost fainted