But one casual experiment is not proof. I assume the implications for battery life alone would make this impractical.
While the article describes a 'trigger to activate' scenario, I guess the DSP could also be configured to output data at an earlier stage in the chain for post-processing on a server. The volume of data would be orders of magnitude lower than sending raw waveforms. Of course, this would introduce the need for some memory for buffering, which eats into the power budget, but having read about this device, I'm pretty convinced that it's technically feasible.
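As a rough back-of-envelope for the data-volume claim (all figures below are my assumptions, not from the article): compare shipping raw 16 kHz 16-bit audio against shipping only compact keyword "events" emitted after the on-chip DSP stage.

```python
# Rough back-of-envelope: raw audio upload vs. keyword "events" after
# on-chip DSP processing. All figures are assumed, not measured.

SAMPLE_RATE_HZ = 16_000        # assumed telephony-quality capture
BYTES_PER_SAMPLE = 2           # 16-bit PCM
SECONDS_PER_DAY = 24 * 60 * 60

raw_bytes_per_day = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * SECONDS_PER_DAY

EVENTS_PER_DAY = 500           # assumed keyword detections per day
BYTES_PER_EVENT = 64           # keyword id + timestamp + confidence

event_bytes_per_day = EVENTS_PER_DAY * BYTES_PER_EVENT

ratio = raw_bytes_per_day / event_bytes_per_day
print(f"raw: {raw_bytes_per_day / 1e9:.2f} GB/day, "
      f"events: {event_bytes_per_day / 1e3:.0f} KB/day, "
      f"ratio ~{ratio:,.0f}x")
```

Even with generous assumptions, that's four to five orders of magnitude less data to buffer and transmit, which is what makes the idea feel feasible to me.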
Whether or not phones actually do this at a low level is another matter. It would be an interesting experiment to graph the current draw from a sleeping phone's battery while conversations were being spoken into the mic.
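If anyone wants to run that experiment, the analysis step is trivial; here's a minimal sketch with made-up placeholder numbers (a real run would pull samples from an inline power meter or similar):

```python
# Sketch of the analysis step for the proposed experiment: compare a
# sleeping phone's current draw while the room is silent vs. while
# someone is talking. All samples below are made-up placeholders.

from statistics import mean, stdev

silent_ma  = [4.9, 5.1, 5.0, 5.2, 4.8, 5.0]   # mA samples, mic hears silence
talking_ma = [5.0, 5.3, 4.9, 5.1, 5.2, 5.0]   # mA samples, mic hears speech

baseline = mean(silent_ma)
threshold = baseline + 3 * stdev(silent_ma)    # crude 3-sigma test

suspicious = mean(talking_ma) > threshold
print(f"baseline {baseline:.2f} mA, talking {mean(talking_ma):.2f} mA, "
      f"suspicious: {suspicious}")
```

A sustained rise in current draw whenever speech is present would be (weak) evidence that something beyond wake-word detection is running.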
[1] https://www.sensorsmag.com/components/mic-hears-all-all-time
I believe any app with microphone permission (e.g., Facebook) could do it, ideally while the phone sits idle on a table. "Buy", "need", and "wish" are good trigger words when followed by another word that sells easily: "shirt", "phone", "watch", ...
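A minimal sketch of that keyword-pair heuristic: flag a buying intent only when a trigger verb is immediately followed by a sellable noun. The word lists are illustrative, not from any real app.

```python
# Toy keyword-pair matcher: trigger word immediately followed by a
# "sellable" noun. Word lists are illustrative assumptions.

TRIGGERS = {"buy", "need", "wish"}
PRODUCTS = {"shirt", "phone", "watch"}

def ad_keywords(transcript: str) -> list[str]:
    """Return product words that directly follow a trigger word."""
    words = transcript.lower().split()
    return [w2 for w1, w2 in zip(words, words[1:])
            if w1 in TRIGGERS and w2 in PRODUCTS]

print(ad_keywords("i really need phone soon"))    # ['phone']
print(ad_keywords("watch this, i need nothing"))  # []
```

Something this crude would run cheaply on-device, though as the comments below argue, cheap and crude is exactly what kills ad relevance.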
I suggest trying out state-of-the-art voice assistants like Google Home and Amazon Echo. These devices have had enormous engineering and computer science resources thrown at them. They're able to do stuff like play a song from some artist or genre off YouTube or Spotify. Maybe. Sometimes. You'll find that even in this scenario, which they have explicitly optimized for, you still have to repeat yourself sometimes. I'm talking about a tiny domain here - playing music from artists - that the human is intentionally trying to hit.
The idea that the current state of Machine Learning can somehow take an arbitrary, open-ended sentence like "I'm thinking about going back to uni" and map it to appropriate ad targeting for enrolling in college courses just doesn't align with the reality of Machine Learning.
Maybe you think you don't actually need great Machine Learning. You could just go with very rough categorization, such as: the word "uni" appeared in the sentence, so bucket the speaker into the "uni" ad category. But then you would also bucket "I hated uni", "oh yeah, season 2 takes place in uni", and "I'm driving past uni" into that category. Ad-targeting relevance would be diluted dramatically. Where is the financial incentive to do such terrible ad targeting?
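To make the dilution concrete, here's a toy run of that single-keyword bucketing over the example sentences above: every mention of "uni" lands in the same ad bucket regardless of intent.

```python
# Toy demonstration: naive single-keyword bucketing catches every
# mention of "uni", whatever the intent. Sentences are the examples
# from the comment above.

sentences = [
    "I'm thinking about going back to uni",   # genuine intent
    "I hated uni",
    "oh yeah, season 2 takes place in uni",
    "I'm driving past uni",
]

bucket = [s for s in sentences if "uni" in s.lower().split()]
precision = 1 / len(bucket)   # only the first sentence is a real lead
print(f"bucketed {len(bucket)}/4 sentences, precision {precision:.0%}")
```

One genuine lead out of four bucketed sentences: an advertiser paying for that bucket is mostly paying for noise.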
tl;dr: ML just isn't there yet, and without monumental leaps in Machine Learning, Facebook has no financial incentive to do this; ad-targeting relevance would suffer.
The “security consultant” claims to know that there are clips of audio being sent back to servers, but not what that audio contains, since it’s encrypted. First off, if the content is encrypted, you don’t know that it’s audio. Secondly, if you had proof of a major app unexpectedly sending back recorded audio clips, it would be huge news, so I’d assume he doesn’t have that.
Contrary to what the article insinuates, apps do not get always-on microphone access by default. On Android, until P comes out, apps can access the mic in the background if you have granted them microphone permission. On iOS, app microphone access triggers the microphone symbol in the top left. These are apps explicitly asking for microphone access, though, not some sort of listener pattern where apps can attach to the OS’s always-on microphone behavior.
The experiment is clearly flawed for numerous reasons (no recorded starting state, no logging of behavior on FB/other sites that could have affected results during the experiment, etc.).
I've been expecting research from a security firm, university group, or similar to verify this. It shouldn't cost much to run; hoping something comes out soon.