While it’s true that FHE schemes continue to get faster, they don’t really have hope of being comparable to plaintext speeds as long as they rely on bootstrapping. For deep, fundamental reasons, bootstrapping isn’t likely to ever be less than ~1000x overhead.
When folks realized they couldn't speed up bootstrapping much more, they started talking about hardware acceleration, but that's a tough sell at a time when every last drop of compute is going into LLMs. What $/token premium would folks pay for computation under FHE? Unless the answer is >1000x, it's really pretty grim.
For anything like private LLM inference, confidential computing approaches are really the only feasible option. I don’t like trusting hardware, but it’s the best we’ve got!
A critical example is database search: searching through a database of n elements is normally done in O(log n), but it becomes O(n) when the search key is encrypted. This means that fully homomorphic Google search is fundamentally impractical, although the same cannot be said of fully homomorphic DNN inference.
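To make the O(n) point concrete, here's a toy Python sketch (my own illustration; `eq` is a plaintext stand-in for what would be an encrypted equality test): under FHE the server can't branch on the encrypted key, so it has to touch every row and blend the results.

    # Toy illustration of why encrypted lookup degenerates to a linear scan.
    # eq() stands in for an encrypted equality test: in real FHE it returns
    # an encrypted 0/1 the server cannot inspect, so there is no early exit.
    def eq(a, b):
        return 1 if a == b else 0

    def private_lookup(db, key):
        # Skipping any row would leak what the key was NOT,
        # so every row must be folded into the result: O(n), always.
        result = 0
        for k, v in db:
            result += eq(k, key) * v
        return result

    db = [(1, 10), (2, 20), (3, 30)]
    print(private_lookup(db, 2))  # -> 20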
To pick one out of a dozen possible examples: I regularly read 500-word news articles from 8 MB web pages with autoplaying videos, analytics beacons, and JS sludge.
That’s about 3 orders of magnitude for data and 4-5 orders of magnitude for compute.
For the equivalent of $500 in credit you could self-host the entire thing!
Something that normally takes 30 seconds now takes over 8 hours (30 s × 1000 ≈ 8.3 hours).
The parent post is right: confidential compute is really what we've got.
This is a large part of why you have to convince people to hide things even if "they have nothing to hide."
1) If you are a registered broker-dealer, you will just incur a massive amount of additional regulatory burden if you want to host this stuff on any sort of "random server"
2) Whoever you are, you need the pipe from your server to the exchange to be trustworthy, so no-one can MITM your connection and front-run your (client's) orders.
3) This is an industry where when people host servers in something like an exchange data center it's reasonably common to put them in a locked cage to ensure physical security. No-one is going to host on a server that could be physically compromised. Remember that big money is at stake and data center staff typically aren't well paid (compared to someone working for an IB or hedge fund), so social engineering would be very effective if someone wanted to compromise your servers.
4) Even if you are able to overcome #1 and are very confident about #2 and #3, even for slow market participants you need to have predictable latency in your execution or you will be eaten for breakfast by the fast players [1]. You won't want to be on a random server controlled by anyone else in case they suddenly do something that affects your latency.
[1] For example, we used to have quite slow execution ability compared with HFTs and people who were co-located at exchanges, so we used to introduce delays when we routed orders to multiple exchanges so the orders would arrive at their destinations at precisely the same time. Even though our execution latency was high, this meant no-one who was colocated at the exchange could see the order at one exchange and arb us at another exchange.
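For the curious, that simultaneous-arrival trick in [1] is easy to sketch (made-up latencies, in microseconds):

    # Pad each route so orders arrive at every exchange simultaneously.
    latencies = {"NYSE": 3200, "NASDAQ": 1100, "BATS": 2000}  # us, made up
    slowest = max(latencies.values())
    delays = {ex: slowest - lat for ex, lat in latencies.items()}
    # Wait delays[ex] before sending to exchange ex; all orders then land
    # at ~the same instant, so a colocated player who sees one order
    # cannot race ahead of the others.
    print(delays)  # -> {'NYSE': 0, 'NASDAQ': 2100, 'BATS': 1200}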
Your GPU was, and we also used to have dedicated math coprocessor accelerators. Now most of that expansion-card tech is done by general-purpose hardware, which, while cheaper, will never be as good as custom dedicated silicon focused on just one task.
It's why I advocate for a separate ML/AI card instead of using GPUs. Sure, there is hardware architecture overlap, but you're sacrificing so much because your AI cards are built on GPU hardware.
I'd argue the only real AI accelerators are something like what goes into modern SXM sockets. That ditches the power issues and opens up more bandwidth. However, only servers have SXM sockets... and those are not cheap.
I think one reason they can be as good as or better than dedicated silicon is that they can be adjusted on the fly. If a hardware bug is found in your network chip, too bad. If one is found in your software emulation of a network chip, you can update it easily. What if a new network protocol comes along?
Don't forget the design, verification, mask production, and other one-time costs of making a new type of chip are immense ($millions at least).
> It's why I advocate for a separate ML/AI card instead of using GPUs. Sure, there is hardware architecture overlap, but you're sacrificing so much because your AI cards are built on GPU hardware.
I think you may have the wrong impression of what modern GPUs are like. They may be descended from graphics cards, but today they are designed fully with the AI market in mind. And they are designed to strike an optimal balance between fixed functionality for super-efficient calculations that we believe AI will always need, and programmability to allow innovation in algorithms. Anything more fixed would be unviable immediately because AI would have moved on by the time it could hit the market (and anything less fixed would be too slow).
- FHE for classic key-value stores and simple SQL database tables?
- the author's argument that FHE is experiencing an accelerated Moore's law, and will therefore close the 1000x gap quickly?
Thx!
I come from a values basis that privacy is a human right, and that governments should be extremely limited in their power to retaliate against just and democratic uses of power against them (things like voting, the arts, media, free speech, etc.).
So unless Google lets me encrypt their entire search index, they can still see my query at the time it interacts with the index, or else they cannot fulfill it.
The other point is incentives: outside of a very few high-trust, high-stakes applications, I don't see why companies would go through the trouble of offering FHE services.
So that basically means that if a company has data that my program might want to use, the entirety of that data needs to be loaded into my program. Not quite feasible for something like the Google search index, which (afaik) doesn't even fit onto a single machine.
Also, while Google is fine with us doing searches, making the whole search index available to a homomorphically encrypted program is probably quite a different beast.
What I don't think they necessarily appreciate is how expensive that would be, and consequently how few people would sign up.
I'm not even assuming that the compute cost would be higher than it is currently. Let's leave aside the expected multiples in compute cost, although they won't help.
Assume, for example, a privacy-first Google replacement. What does that cost? (Google's revenue is a good place to start that calculation.) Even if it were, say, $100 a year (hint: it's not), how many users would sign up? Some, sure, but a long, long way from a noticeable percentage.
Once we start adding zeros to that number (to cover the additional compute cost) it gets even lower.
While imperfect, things like Tor provide most of the benefit and cost nothing. As alternatives go, it's an option.
I'm not saying that HE is useless. I'm saying it'll need to be paid for, and the numbers that will pay to play will be tiny.
The key question, I think, is how much computing speed will improve in the future. If we assume FHE takes 1000x more time but hardware also becomes 1000x faster, then FHE performance will be similar to today's plaintext speed.
Predicting the future is impossible, but with software improving, hardware becoming faster and cheaper every year, and FHE providing the unique value of privacy, it's plausible that at some point it becomes the default (if not in 10 years, maybe in 50).
Today's hardware is many orders of magnitude faster than it was 50 years ago.
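Back-of-envelope, assuming hardware keeps doubling in speed roughly every 2 years (a big if these days):

    1000x ≈ 2^10  →  ~10 doublings  →  ~20 years of hardware gains
    just to absorb a constant 1000x FHE overhead.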
There are of course other issues too, like ciphertext size being much larger than plaintext, and the need to encrypt whole models or indexes per client on the server side.
FHE is not practical for most things yet, but its Venn diagram of feasible applications will only grow. And I believe there will come a time when that diagram covers search engines and LLMs.
I've been building and promoting digital signatures for years. It's bad for people and for market dynamics to have Hacker News or Facebook be the grand arbiter of everyone's identity in a community.
Yet here we are, because it's just that much simpler to build and use it this way, which gets them more users and money, which snowballs until alternatives don't matter.
In the same vein, the idea that FHE is a missing piece many people want is wrong. Everything still runs almost entirely on trust, and that works well enough that very few use cases will accept the complexity cost of FHE, regardless of the computational overhead.
I agree with this wholeheartedly, and yet I get the following question a lot: "What's all that nonsense at the end of your emails?" Any explanation is met with eye-rolls and 1000-yard stares. Have you managed to get laypeople on board with any kind of client-side cryptography? How?
FHE + AI might be the killer combination, the latter sharing the complexity burden.
The article is about getting an input encrypted with key k, processing it without decrypting it, and sending back an output that is also encrypted with key k. Now, it looks to me like the whole input must be encrypted with key k. But in the search example, the inputs include a query (which could be encrypted with key k) and a multi-terabyte database of pre-digested information that is Google's whole selling point, and there's no way this database could be encrypted with key k.
In other words, this technique can be used when you have complete control of all the inputs and are renting the compute power from a remote host.
Not saying it's not interesting, but the reference to Google can be misunderstood.
That’s not the understanding I got from Apple’s CallerID example[0][1]. They don’t seem to be making an encrypted copy of their entire database for each user.
[0]: https://machinelearning.apple.com/research/homomorphic-encry...
[1]: https://machinelearning.apple.com/research/wally-search
Moreover, even if the details were slightly different, a scheme that reveals absolutely no information about the query while interacting with a database always needs to do a full scan: if some parts remain unread depending on the query, that tells you what the query wasn't. If you're okay with revealing some information, you can instead hash the query and take a short prefix of the hash with many colliders, then only scan values with the same hash prefix. This is how browsers typically do safe-browsing lookups, but by downloading that subset of the database instead of doing the comparison homomorphically on the server.
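A minimal Python sketch of that prefix-bucketing idea (hypothetical names and data; real safe-browsing lists use truncated SHA-256 prefixes along these lines):

    import hashlib

    def prefix(value, nbytes=2):
        # Truncated hash: many values share each 2-byte prefix, so the
        # prefix reveals only a coarse bucket, never the exact query.
        return hashlib.sha256(value.encode()).digest()[:nbytes]

    # The server groups its database by prefix ahead of time.
    db = ["evil.example", "fine.example", "ok.example"]
    buckets = {}
    for entry in db:
        buckets.setdefault(prefix(entry), []).append(entry)

    # The client sends (or downloads) only its short prefix's bucket
    # and finishes the exact comparison locally.
    query = "evil.example"
    candidates = buckets.get(prefix(query), [])
    print(query in candidates)  # -> True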
Consider the following (very weak) encryption scheme:
m, k ∈ Z_p, E(m) = m * k mod p, D(c) = c * k⁻¹ mod p
With this, I can implement a service that receives two ciphertexts and computes their encrypted sum, without knowledge of the key k:
E(x) + E(y) = x * k + y * k mod p = (x + y) * k mod p = E(x + y)
Of course, such a service is not too interesting, but if you could devise an algebraic structure that supported sufficiently complex operations on ciphertexts (and with stronger encryption), then by composing these operations one could implement arbitrarily complex computations.
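A quick runnable check of that additive property in Python (toy numbers, nothing secure):

    # E(m) = m*k mod p, D(c) = c*k^-1 mod p, as above.
    p = 101                     # small prime, illustration only
    k = 37                      # the secret key
    k_inv = pow(k, -1, p)       # modular inverse (Python 3.8+)

    E = lambda m: (m * k) % p
    D = lambda c: (c * k_inv) % p

    x, y = 12, 30
    c = E(x) + E(y)             # the server adds ciphertexts, never sees k
    print(D(c) == (x + y) % p)  # -> True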
Why do people always assume that inventing a technology somehow changes the economics? I think the implications are very small. There is value in people's user data, and people are very eager to barter that value for cheaper services; we can tell because people continue to vote with their wallets and feet.
You could already offer encryption or zero-retention policies across large swaths of internet businesses, and every major company has competitors that do, but they exist on the margins because most people don't take that deal.
I don't think FHE is the solution to PIR but it might well form a part of it when combined with more practical approaches.
However, where FHE will already shine is in specific high-value, high-consequence, high-confidentiality applications with relatively low-complexity computations. Smart contracts, banking, and potentially medical have lots of these use cases. And the curve of Moore's law plus software optimizations is finally starting to bend into the zone of practicality for some of these.
See what Zama https://www.zama.ai/ is doing, both on the hardware as well as the devtools for FHE.
> An old car needs to go up and down a hill. In the first mile–the ascent–the car can only average 15 miles per hour (mph). The car then goes 1 mile down the hill. How fast must the car go down the hill in order to average 30 mph for the entire 2 mile trip?
Past improvement is no indicator of future possibility, given that each improvement was not a re-application of the same solution as before. These are algorithms, not a simple physical process that keeps shrinking.
To cover the full 2 miles at an average of 30 mph, we need to complete the entire journey in 4 minutes (240 seconds), so let's budget 225 seconds for the ascent and 15 for the descent.
We know that the old car was averaging 15 miles per hour, but the speedo on an old car is likely inaccurate, and we only need to assume a 6% margin of error for the car to show 15 mph while actually doing 16, covering the mile in 225 seconds. You probably couldn't even tell the difference between 15 and 16 on the speedo anyway, but let's say we also fitted the car with brand-new tyres (so the outer circumference is larger than on old, worn tyres), and it's entirely possible.
So, let's say 240 mph. That's the average speed for our one-mile freefall in 15 seconds.
To average 30mph over 2 miles, you need to complete those 2 miles in 4 minutes.
But travelling the first mile at 15mph means that took 4 minutes. So from that point the only way to do a second mile and bring your average to 30mph is to teleport it in 0 seconds.
(Doing the second mile at 41mph would give you an average speed of just under 22mph for the two miles.)
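Spelled out:

    target:  2 mi at 30 mph → 2/30 h = 4 min total budget
    ascent:  1 mi at 15 mph → 1/15 h = 4 min already spent
    descent: 0 min remaining → required speed is unbounded
    (sanity check on 41 mph: 3600/41 ≈ 88 s; 2 mi in 240+88 = 328 s ≈ 22 mph)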
This is fascinating. Could someone ELI5 how computation can work using encrypted data?
And does "computation" apply to ordinary internet transactions like when using a REST API, for example?
If we could find some kind of function "e" that preserves the underlying structure even when the data is encrypted, you'd have the outline of a homomorphic system. E.g. if the following holds:
e(2,k)*e(m,k) = e(2m,k)
Here we multiplied our message by 2 even in its encrypted form. The important thing is that every computation must produce something that looks random, but once decrypted, it should have preserved the actual computation that happened.
It's been a while since I did crypto, so Google might be your friend here; but there are situations where e.g. RSA preserves multiplication, making it partially homomorphic.
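For instance, textbook RSA is multiplicatively homomorphic, as this toy-sized (and completely insecure) Python check shows:

    # Textbook RSA: E(m) = m^e mod n, so E(a)*E(b) mod n = E(a*b mod n).
    p, q = 61, 53
    n = p * q                          # 3233
    e = 17
    d = pow(e, -1, (p - 1) * (q - 1))  # private exponent

    E = lambda m: pow(m, e, n)
    D = lambda c: pow(c, d, n)

    a, b = 7, 11
    print(D(E(a) * E(b) % n) == (a * b) % n)  # -> True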
But isn't such a function a weakened form of encryption? Properly encrypted data should be indistinguishable from noise. "Preserving underlying structure" seems to me to be in opposition to the goal of encryption.
Other ones, I imagine, behave kind of like translating, stretching, or skewing a polynomial or a donut/torus, such that the points/intercepts are still solvable, still unknown to an observer, and still represent the correct mathematical value of the operation.
It just means you treat the []byte value with special rules.
Let's assume they can train LLMs over encrypted data: what if a large number of users inject crappy data (as seen with the Tay chatbot story)? How can the companies still keep a way to clean the data?
Yes but then the model becomes encrypted.
IMO ML training is not a realistic application for FHE, but things like federated training would be the way to do that privately enough.
> If FHE is a possible option, people and institutions will demand it.
I don't think that privacy is a technical problem. To take the article's example, why would Google allow you to search without spying on you? Why would ChatGPT discard the training data it gets from you?
GPG has been around for decades. You can relatively easily add a plug-in to use it on top of Gmail. Surely the protocol is not perfect, but it could have been made better much more easily than FHE can be improved, since a lot of its clunkiness can be corrected by UX. But people never cared enough that everything they write is read by Google to encrypt it. And since Google loves reading what you write, they'll never introduce something like FHE without overwhelming adoption and demand from others.
I assume that doesn't happen? Can someone ELI5 please?
If your encryption scheme satisfies this, there are no patterns for the LLM to learn: if you only know the ciphertext but not the key, every continuation of the plaintext should be equally likely, so trying to learn the encryption scheme from examples is effectively trying to predict the next lottery numbers.
This is why FHE for ML schemes [1] don't try to make ML models work directly on encrypted data, but rather try to package ML models so they can run inside an FHE context.
[1] It's not for language models, but I like Microsoft's CryptoNets - https://www.microsoft.com/en-us/research/wp-content/uploads/... - as a more straightforward example of how FHE for ML looks in practice
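A toy Python illustration of that "every continuation is equally likely" property, using a one-time pad:

    import os

    # XOR a message with uniformly random key bytes: the ciphertext is
    # itself uniformly random, so there is no statistical structure left
    # for a model to learn. Without the key, P(plaintext | ciphertext)
    # equals P(plaintext): nothing to predict.
    msg = b"the quick brown fox jumps over the lazy dog"
    key = os.urandom(len(msg))
    ct = bytes(m ^ k for m, k in zip(msg, key))
    print(ct.hex())  # looks like lottery numbers on every run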
How do you train a model when the input has no apparent correlation to the output?
The problem is that the internet is, in practice, a centralized system even though it is decentralized by design, and some are fighting to keep it free.
Fight for decentralization instead, it will remove the need for unnecessary security and reduce the compute cost significantly.
This is far too optimistic. Just because you can build a system that doesn't harvest data, doesn't necessarily mean it's a profitable business model. I'm sure many of us here would be willing to pay for a FHE search engine, for example, but we're a minority.
It's idealistic to think this could solve data breaches, because businesses knowing who their customers are is such a fundamental concept.
I beg your unbelievable pardon, but no? This part of the equation is not addressed in the article, but it is by far and away the biggest missing piece for there to be any hope of FHE seeing widespread adoption.
Given that cryptography experts seem to be asserting otherwise, I assume that there's something important that I'm not understanding here.
Unless replacement services are offered and adopted en masse (they won't be; you can't market against companies who can throw billions at breaking you), those giants won't give away their main source of revenue...
So even if the technical challenges are overcome, there are human and political challenges which will likely be even harder to crack...
Since lots of functions behave in this way in relation to sums and products, you "just" need to find ones that are hard to reverse so they can be used for encryption as well.
Unfortunately, this turns out not to work so simply. In reality, they needed to find different functions, FHESum and FHEMultiply, that are actually much harder to compute (1000x more CPU than the equivalent "plaintext" operation is a low estimate of the overhead) but that guarantee the above.
read the article again
Anyway, making the computer do the calculation is one thing; getting it to spew out the correct data is another... But still, the article (which otherwise seems great) brushes it off a bit too quickly.
Imagine your device sending Google an encrypted query and getting back exactly the results it wanted, without Google having any way of knowing what the query was or what results it returned. The technique that makes this possible is called Fully Homomorphic Encryption (FHE).