Private Inference: https://confer.to/blog/2026/01/private-inference/
This is similar to the weaselly language Google has been using for the Magic Cue feature since Android 16 QPR1. When it launched, it was local-only -- now it's local and in the cloud "with attestation". I don't like this trend, and I don't think I'll be using such products.
I think that in 5 to 10 years, PCs that can run SoTA multi-modal LLMs (cf. the Mac Pro) will cost as much as cars do, and I reckon folks will buy them.
Lots of ifs there, though. I do trust Moxie in terms of execution; he doesn't seem like the type of person to take half measures.
This is the key question.
What makes it so strange is that such an execution environment would have clear applications outside of AI usage.
Sure, for e.g. E2E email, the expectation is that all the computation occurs on the client, and the server is a dumb store of opaque encrypted stuff.
In a traditional E2E chat app, on the other hand, you've still got a backend service acting as a dumb pipe that shouldn't have the keys to decrypt traffic flowing through it; but you've also got multiple clients — not just your own, which share your keybag, but the clients of other users you're communicating with. "E2E" in the context of a chat app means "messages are encrypted within your client; messages can then only be decrypted within the destination client(s) [i.e. the client(s) of the user(s) in the message thread with you]."
"E2E AI chat" would be E2E chat, with an LLM. The LLM is the other user in the chat thread with you; and this other user has its own distinct set of devices that it must interact through (because those devices are within the security boundary of its inference infrastructure.) So messages must decrypt on the LLM's side for it to read and reply to, just as they must decrypt on another human user's side for them to read and reply to. The LLM isn't the backend here; the chat servers acting as a "pipe" are the backend, while the LLM is on the same level of the network diagram as the user is.
Let's consider the trivial version of an "E2E AI chat" design, where you physically control and possess the inference infrastructure. The LLM infra is e.g. your home workstation with some beefy GPUs in it. In this version, you can just run Signal on the same workstation, and connect it to the locally-running inference model as an MCP server. Then all your other devices gain the ability to "E2E AI chat" with the agent that resides in your workstation.
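A minimal sketch of that wiring, assuming the Python mcp package and an OpenAI-compatible inference server (e.g. vLLM) on localhost; the names and port here are illustrative, not anything Confer-specific:

```python
# Hypothetical sketch: expose the workstation's local inference backend as an
# MCP tool that a locally running agent could call. Assumes the `mcp` and
# `httpx` packages and an OpenAI-compatible server (e.g. vLLM) on port 8000.
import httpx
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("local-llm")

@mcp.tool()
def ask_local_model(prompt: str) -> str:
    """Forward a prompt to the locally running inference backend."""
    resp = httpx.post(
        "http://localhost:8000/v1/chat/completions",  # assumed local endpoint
        json={"model": "local",
              "messages": [{"role": "user", "content": prompt}]},
        timeout=120.0,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```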
The design question, being addressed by Moxie here, is what happens in the non-trivial case, when you aren't in physical possession of any inference infrastructure.
Which is obviously the case to solve for: it applies to most people, 100% of the time, since most people don't own and won't ever own fancy GPU workstations.
But, perhaps more interesting for those of us tech-heads who do consider buying such hardware, and who would like to solve problems by designing architectures that make use of it... the same design question still pertains, at least somewhat, even when you do "own" the infra; just as long as you aren't in 100% continuous physical possession of it.
You would still want attestation (and whatever else is required here) even for an agent installed on your home workstation, so long as you're planning to ever communicate with it through your little chat gateway when you're not at home. (Which, I mean... why else would you bother with setting up an "E2E AI chat" in the first place, if not to be able to do that?)
Consider: your local flavor of state spooks could wait for you to leave your house; slip in and install a rootkit that directly reads from the inference backend's memory; and then disappear into the night before you get home. And, no matter how highly you presume your abilities to detect that your home has been intruded into / your computer has been modified / etc once you have physical access to those things again... you'd still want to be able to detect a compromise of your machine even before you get home, so that you'll know to avoid speaking to your agent (and thereby the nearby wiretap van) until then.
It's like, come on, you know exactly what you're doing; it's unambiguous how people will interpret this, so just stop it. Cue everyone arguing over the minutiae while hardly anyone points out how troubling it is that these people/entities have no qualms about being so misleading/dishonest...
Edit: I'm a little wary to find that there's convenient import functionality but no export. I manually copied the conversation into a markdown file: <https://gist.github.com/Gravifer/1051580562150ce7751146be0c9...>
Inevitably, the TEE hardware vendor must be trusted. I don't think this is a bad assumption in today's world, but this is still a fairly new domain, and longer term it becomes increasingly likely that TEE compromises (design flaws, microcode bugs, key compromises, etc.) will be discovered, if they haven't been already! Then we'd need to consider how Confer would handle these and what sort of "break glass" protocols are in place.
This also requires a non-trivial amount of client-side coordination to guard against supply chain attacks. Setting aside the details of how this is done, even with a transparency log, the client must trust something about “who is allowed to publish acceptable releases”. If the client trusts “anything in the log”, an attacker could publish their own signed artifacts. So the client must effectively trust a specific publisher identity/key, plus the log’s append-only/auditable property, to prevent silent targeted swaps.
The net result is a need to trust Confer's identity and published releases, at least in the short term, until third-party auditors can flag any issues in the reproducible builds. As I see it, game theory suggests Confer remains honest; Moxie's reputation plays a fairly large role in this.
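To make that concrete (a sketch of the general idea only, not Confer's actual verification code): the client pins a specific publisher key, and a release is acceptable only if it both verifies under that key and appears in the log.

```python
# Sketch, not Confer's code: "anything in the log" is not enough; an artifact
# must verify under a pinned publisher key AND appear in the append-only log
# (the inclusion proof is reduced to a boolean here for brevity).
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

publisher_priv = Ed25519PrivateKey.generate()   # demo only; a real client
pinned_publisher = publisher_priv.public_key()  # ships with the pinned pubkey

def acceptable_release(artifact: bytes, signature: bytes, in_log: bool) -> bool:
    if not in_log:          # must be publicly logged / auditable
        return False
    try:
        pinned_publisher.verify(signature, artifact)  # must be THIS publisher
        return True
    except InvalidSignature:
        return False

sig = publisher_priv.sign(b"confer-release-1.0")
assert acceptable_release(b"confer-release-1.0", sig, in_log=True)
assert not acceptable_release(b"confer-release-1.0", sig, in_log=False)
```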
"This application requires passkey with PRF extension support for secure encryption key storage. Your browser or device doesn't support these advanced features.Please use Chrome 116+, Firefox 139+, or Edge 141+ on a device with platform authentication (Face ID, Touch ID, Windows Hello, etc.)."
We are allowed into the blog though! https://confer.to/blog/
> Your authenticator doesn't support encryption keys. Please try again using 1Password — some password managers like Bitwarden don't work yet.
I mean, e2ee is great and welcome, of course. That's a wonderful thing. But I need more.
> LLMs are fundamentally stateless—input in, output out—which makes them ideal for this environment. For Confer, we run inference inside a confidential VM. Your prompts are encrypted from your device directly into the TEE using Noise Pipes, processed there, and responses are encrypted back. The host never sees plaintext.
I don’t know what model they’re using, but it looks like everything should be staying on their servers, not going back to, e.g., OpenAI or Anthropic.
Even so, you're still exposing your data to Confer, and so you have to trust them that they'll behave as you want. That's a security problem that Confer doesn't help with.
I'm not saying Confer isn't useful, though. e2ee is very useful. But it isn't enough to make me feel comfortable.
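For what it's worth, the "Noise Pipes" part quoted above is standard machinery. A minimal sketch with the Python noiseprotocol package (using the unauthenticated NN pattern for brevity; Noise Pipes proper builds on authenticated patterns like XX/IK):

```python
# Minimal Noise handshake + encrypted message via the `noiseprotocol` package.
# NN (no static keys) keeps the demo short; Noise Pipes, as quoted above, uses
# patterns with authenticated static keys. Illustrative only.
from noise.connection import NoiseConnection

client = NoiseConnection.from_name(b"Noise_NN_25519_ChaChaPoly_SHA256")
server = NoiseConnection.from_name(b"Noise_NN_25519_ChaChaPoly_SHA256")
client.set_as_initiator()
server.set_as_responder()
client.start_handshake()
server.start_handshake()

server.read_message(client.write_message())  # -> e
client.read_message(server.write_message())  # <- e, ee

ciphertext = client.encrypt(b"prompt goes here")  # the host sees only this
assert server.decrypt(ciphertext) == b"prompt goes here"
```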
(It got submitted a few times but did not get any comments - might as well consolidate these threads)
Using remote attestation in the browser to attest the server rather than the client is refreshing.
Using passkeys to encrypt data does limit browser/hardware combinations, though. My Firefox+Bitwarden setup doesn't work with this, unfortunately. Firefox on Android also seems to be broken, but Chrome on Android works well at least.
Few in this world have done as much for privacy as the people who built Signal. Yes, it’s not perfect, but building security systems with good UX is hard. There are all sorts of tradeoffs and sacrifices one needs to make.
For those interested in the underlying technology, they’re basically combining reproducible builds, remote attestation, and transparency logs. They’re doing the same thing that Apple Private Cloud Compute is doing, along with a few others. I call it system transparency, or runtime transparency. Here’s a lightning talk I did last year: https://youtu.be/Lo0gxBWwwQE
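The client-side gate those three pieces enable looks roughly like this (a toy sketch: I'm pretending the attested measurement is just a SHA-256 of the reproducibly built artifact, and eliding quote-signature and certificate-chain verification entirely):

```python
# Toy sketch of "runtime transparency": rebuild the release reproducibly,
# then only talk to the server if its attested measurement matches. This is
# the shape of the check, not any real attestation protocol.
import hashlib
import hmac

def expected_measurement(artifact_path: str) -> str:
    with open(artifact_path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def safe_to_talk(artifact_path: str, attested_measurement: str) -> bool:
    return hmac.compare_digest(expected_measurement(artifact_path),
                               attested_measurement)
```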
Signal's achievement is that it's very private while being extremely usable (it just works). Under that lens, I don't think it could be improved much.
Exactly. Plus it basically pioneered multi-device E2EE. E.g., Telegram claimed defaulting to E2EE would kill multi-client support:
"Unlike WhatsApp, we can allow our users to access their Telegram message history from several devices at once thanks to our built-in instant cloud sync"
https://web.archive.org/web/20200226124508/https://tgraph.io...
Signal just did it, and in a fantastic way given that there's no cross device key verification hassle or anything. And Telegram never caught up.
Perhaps manual, user-controlled updates are not part of the design.
If the source code is available^1 then surely someone has modified it to remove the phone number requirement, not to mention other improvements
1. https://github.com/signalapp/Signal-Server
It seems like Signal may be another example of "read-only" open source, where there is no expectation that anyone will actually try to _use_ the source code. Instead, the expectation is that everyone will use binaries distributed by a third party and allow remote installation and execution of software on their computers _at the third party's discretion_. In other words, all users cede control to a third party.
NB. This comment is not referring to the "Signal protocol". It pertains to _control_ over the software that implements it
It actually is the least bad option available, and decentralization is always worth it even if development is slower and more complex as a consequence.
>broken SGX metadata protections
Citation needed. Also, SGX is just there to try to verify what the server is doing, including that the server isn't collecting metadata. The real talking is done by the responses to warrants https://signal.org/bigbrother/ where they've been able to hand over only two timestamps of when the user created their account and when they were last seen. If that's not good enough for you, you're better off using Tor-p2p messengers that don't have servers collecting your metadata at all, such as Cwtch or Quiet.
>weak supply chain integrity
You can download the app as an .apk from their website if you don't trust Google Play Store.
>a mandate everyone supply their phone numbers
That's how you combat spam. It sucks, but there are very few options outside the corner of Zooko's triangle where your username looks like "4sci35xrhp2d45gbm3qpta7ogfedonuw2mucmc36jxemucd7fmgzj3ad" (see the sketch below for why such names look the way they do).
>and agree to Apple or Google terms of service to use it?
Yeah that's what happens when you create a phone app for the masses.
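For the curious, the reason those Zooko's-triangle usernames look like line noise is that the name essentially is the public key. A sketch following Tor's v3 onion address construction (per rend-spec-v3), as a concrete example:

```python
# Why self-authenticating usernames look like random base32: the name IS the
# key. This follows Tor's v3 onion address construction (rend-spec-v3).
import base64
import hashlib

def onion_v3(ed25519_pubkey: bytes) -> str:
    assert len(ed25519_pubkey) == 32
    version = b"\x03"
    checksum = hashlib.sha3_256(
        b".onion checksum" + ed25519_pubkey + version).digest()[:2]
    return (base64.b32encode(ed25519_pubkey + checksum + version)
            .decode().lower() + ".onion")
```

The 35 bytes (key + checksum + version) base32-encode to exactly the 56 characters in the example above; a name can't be both self-authenticating like this and human-meaningful.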
ChatGPT already knows more about me than Google did before LLMs, but would I switch to inferior models to preserve privacy? Hard tradeoff.
The entire point of E2EE is that both "ends" need to be fully under your control.
From Wikipedia: "End-to-end encryption (E2EE) is a method of implementing a secure communication system where only the sender and intended recipient can read the messages."
Both ends do not need to be under your control for E2EE.
"Confer - Truly private AI. Your space to think."
"Your Data Remains Yours, Never trained on. Never sold. Never shared. Nobody can access it but you."
"Continue With Google"
Make of that what you will.
Usually, in a context where a cypherpunk deploys E2EE, it means only the intended parties have access to plaintexts. And when it's you having a chat with a server, it's like cloud backups: the data must be encrypted by the time it leaves your device, and decrypted only once it has reached your device again. For remote computing, that would require that the LLM handles ciphertexts only: basically, fully homomorphic encryption (FHE). If it's that, then sure, shut up and take my money, but AFAIK the science of FHE isn't nearly there yet (toy sketch of the homomorphic idea below).
So the only alternative I can see here is SGX, where the client verifies what the server is doing with the data. That probably works against surveillance capitalism, hostile takeovers, etc., but it is also a US NOBUS backdoor. Intel is a PRISM partner after all, and who knows whether national security requests allow compelling SGX keys. The USG did go after Lavabit's RSA keys, after all.
So I'd really want to see this either explained, or conveyed in the product's threat model documentation, and see that threat model offered on the front page of the project. Security is about knowing the limits of the privacy design so that the user can make an informed decision.
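(The toy mentioned above: Paillier. It is only additively homomorphic, nowhere near the full FHE an encrypted LLM would need, and the parameters here are deliberately insecure, but it shows the core trick of computing on ciphertexts.)

```python
# Toy additively homomorphic encryption (Paillier). NOT FHE, NOT secure:
# tiny demo primes, for intuition only. The sum is computed under encryption.
import math
import random

p, q = 293, 433                      # toy primes (insecure)
n, n2 = p * q, (p * q) ** 2
g = n + 1
lam = (p - 1) * (q - 1)
mu = pow(lam, -1, n)                 # simple form, valid because g = n + 1

def enc(m: int) -> int:
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(2, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def dec(c: int) -> int:
    return ((pow(c, lam, n2) - 1) // n * mu) % n

a, b = enc(20), enc(22)
assert dec((a * b) % n2) == 42       # 20 + 22, without ever decrypting
```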
It won't be long until Google and Gemini can read this information and Google knows you are using Confer.
Wouldn't trust it regardless, even if email login were available.
The fact that Confer allows Google login shows that Confer doesn't care about users' privacy.
The web app itself feels poorly made—almost vibe-coded in places: nonsensical gradients, UI elements rendering in flashes of white, and subtly off margins and padding.
The model itself is unknown, but it speaks with a cadence reminiscent of GPT-4o.
I'm no expert, but calling this "end-to-end encrypted" is only accurate if one end is your client and the other is a very much interposable GPU (assuming vendor’s TEE actually works—something that, in light of tee.fail, feels rather optimistic).
Thank you! :)
> .. assuming vendor’s TEE actually works
For sure, TEEs have a rich history of vulnerabilities and nuanced limitations in their threat models. As a concept, however, it is really powerful, and implementers will likely get things more and more right.
As for GPUs, some of Nvidia’s hardware does support remote attestation.
I see references to vLLM in the GitHub repo, but not which actual model (Llama, Mistral, etc.), whether they have a custom fine-tune, or whether you supply your own Hugging Face link.
> This application requires passkey with PRF extension support for secure encryption key storage. Your browser or device doesn't support these advanced features.
> Please use Chrome 116+, Firefox 139+, or Edge 141+ on a device with platform authentication (Face ID, Touch ID, Windows Hello, etc.).
(Running Chrome 143)
So... does this just not support desktops without overpriced webcams, or am I missing something?
My usage of it would be quite different than ChatGPT. I’d be much freer in what I ask it.
I think there’s a real opportunity for something like this. I would have thought Apple would have created it, but they just announced they’ll use Gemini.
Awesome launch Moxie!
Also, FWIW, I think TEEs and remote attestation are a pretty pragmatic solution here that meaningfully improves on the current state of the art for LLM inference, and I'm happy to see it.
>Data and conversations originating from users and the resulting responses from the LLMs are encrypted in a trusted execution environment (TEE) that prevents even server administrators from peeking at or tampering with them.
I think what they meant to say is that data is decrypted only in a trusted execution environment, and otherwise is stored/transmitted in an encrypted format.
Now, of course, it is questionable whether my little graphics card can reasonably compare to a bigger cloud thing (and for me, presently, a very genuine question), but local really should be the gold standard here.
For example, when someone sends me a message, I have something that categorises it by urgency. If I used the cloud, that would mean they get a copy of all those messages. But locally there's no issue, and complexity-wise it's pretty low for an LLM (see the sketch below).
Things like research jobs I do run in the cloud, but they don't really contain any personal content; they just do research using sources they already have access to anyway. Same with programming; there's nothing really sensitive in there.
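A sketch of that kind of local classifier, assuming a local OpenAI-compatible server such as Ollama on its default port (the model name is whatever small local model you have installed):

```python
# Sketch of a local-only urgency classifier: the message never leaves the
# machine. Assumes an OpenAI-compatible local server (e.g. Ollama) and the
# `openai` client package; the model name is illustrative.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

def classify_urgency(message: str) -> str:
    resp = client.chat.completions.create(
        model="llama3.1",  # whatever local model is installed
        messages=[
            {"role": "system",
             "content": "Classify the user's message as exactly one of: "
                        "urgent, normal, low. Reply with the single word."},
            {"role": "user", "content": message},
        ],
    )
    return resp.choices[0].message.content.strip().lower()

print(classify_urgency("The server room is on fire"))  # -> "urgent", one hopes
```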
https://developer.nvidia.com/blog/confidential-computing-on-...
edit @ -4 points: please go ahead and explain why Signal needs your phone number and rejects third-party clients.
Same goes for Whatsapp, but the marketing is different there.
Or is your problem that your peer might run the app on an insecure device? How would you exclude decade-old Android devices with unpatched holes? I don't want to argue the nirvana fallacy here, but what is the solution you'd like to propose?
Also, while we would expect heavy promotion of a trapped app by some agency, heavy promotion is also a very reasonable outcome for a protocol/app that actually is secure.
You can of course never be sure, but the fact that it's heavily promoted/used by whistleblowers, large corporations, and multiple different national officials at the same time is probably the best trustworthiness signal we can ever get for something like this.
(If all of these can trust it somewhat, it has to be a ridiculously deep conspiracy for it not to have leaked to at least some national security agency and been forbidden for use.)
Kind of, because WhatsApp adopted Signal's E2EE... and not even that long ago!
To be fair, that is largely because WhatsApp partnered with Open Whisper Systems to bring the Signal protocol into WhatsApp. So effectively, you're saying "Signal-the-app is hardly more private than another app that shares Signal-the-protocol".
In practical terms, the only way for Signal to be significantly more private than WhatsApp is if WhatsApp were deliberately breaking privacy through some alternative channel (e.g. exfiltrating messages through a separate connection to Meta).