"In use" implies that you have a need to process that data. It doesn't matter if the end client is submitting queries in plain text (protected in transit) or this fancy encryption, the client (or server) still needs to be authorized to query that data. Translating from plain-text to encryption does not add additional protections from a compliance perspective.
At an organizational level, it's extremely hard to control what information gets put into a SaaS. There are far too many ways in which data can be de-anonymized or inferred (e.g. the mere existence of a field can have privacy implications).
It's far safer to use a SaaS provider that meets general control requirements than to try to shoe-horn encrypted data into them.
It's not just the query that is encrypted in this case, but the data being queried. From MongoDB's description, the server never receives or stores plaintext data, and the query results can only be decrypted by a client who has the same key that was used to encrypt the data in the first place. From a compliance perspective, that's amazing if it works. It means the server is never storing or processing anything but ciphertext.
In the most extreme cases, the unencrypted values never leave the client. The database can concentrate on delivering storage and fast query answers without paying much attention to issues of security. Clients don't need to trust the database because they control the encryption.
Just as a dumb example: an auditor says passwords need to be hashed with bcrypt. They find a code sample that says "store(bcrypt(password))". Awesome; complied to a T. But true security goes beyond that: are we using a library for bcrypt, or an internal implementation? Is the internal implementation well-implemented? Is the library free of CVEs (maybe they check that)? Did we trace that call to ensure the data generated is what is inserted into the db, or was it intercepted by some middleware? Did we name that function 'bcrypt' when it's actually just MD5?
My point is not that auditing is pointless, but rather that it's fundamentally limited in what kind of attestations it can make.
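To make the last question concrete, here is a minimal sketch of the gap between "the code says bcrypt" and "the stored value is bcrypt". The function names and the format check are my own illustration, not anything an auditor or a particular library prescribes; the check only looks at the shape of the stored string (modular crypt prefix, cost, 53 characters of salt and hash), so it catches the mislabeled-MD5 case but says nothing about whether the implementation is sound.

```python
import hashlib
import re

def bcrypt(password):
    """Misleadingly named on purpose: this is plain MD5, not bcrypt."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()

# Shape of a bcrypt hash string: $2<variant>$<2-digit cost>$<22-char salt + 31-char hash>
BCRYPT_SHAPE = re.compile(r"\$2[abxy]\$\d{2}\$[./A-Za-z0-9]{53}")

def looks_like_bcrypt(stored):
    """Format-only check on what actually landed in the database."""
    return BCRYPT_SHAPE.fullmatch(stored) is not None

stored_value = bcrypt("hunter2")          # what "store(bcrypt(password))" would insert
print(looks_like_bcrypt(stored_value))    # the name said bcrypt; the stored data disagrees
```

Reading the code sample passes the audit; inspecting the stored value does not.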
One great example I can pull from a few recent audits I've been through: serverless tech like Fargate. This oftentimes blows auditors away (or, rather, it used to; nowadays they've seen it so often that they just know). It checks so many boxes. They'll present multi-page forms about data center colos and operating system security and operator SSH access and we'll say "We use Fargate". "Oh nice, ok we can check all of these and carve out with AWS's attestation for (ComplianceFrameworkX)". It saves hours, days, of time.
That's, I think, where homomorphic encryption can go. That isn't what this is, but it's a step toward that. It's not about meeting today's compliance frameworks; it's about evolving the framework. And, in the interim, as advanced R&D teams meet these auditors, they'll educate them: yeah, you've got a lot of questions here, but it's not that we do or don't meet them, it's that they're fundamentally the wrong questions to ask. But we understand the spirit, here's how we meet the spirit, and here's how we're actually better off than if we had just checked Yes on all of them.
Third example: years ago, our team was the first our auditor had ever seen using LetsEncrypt and k8s certificate-manager (then it was called kube-lego). He wanted an attestation that TLS certificates were current and not near-expiration. We countered: they can't be near-expiration, because we have automated systems which renew them. He'd never seen anything like it; he was used to expensive certificates and operations runbooks for renewal; and we nerded out for ten minutes showing it all off. Instead of documenting a runbook for renewing certificates, he documented our runbook for maintaining this automated service and ensuring uptime. Win-win.
It's a slow process, and it's made even slower because there are tons of people in the industry who treat the frameworks as gospel. But ultimately, we control the technology, not them. We decide what is secure; they just attest to it and double-check.
CipherStash works with any Database and also supports Range queries and sorting/ordering. We do it in the application layer. Only supports Ruby so far but C#, Java, Python, Rust are in the works.
It says it will support prefix search, substring search, and the like. Can anyone point me in the right direction on what the algorithm may be here? I don't get how you could do those things without making the encryption less secure and/or decrypting every record on the fly.
Another interesting use case I found that isn't mentioned here is sort. I've had customers ask me to be able to sort the results by PII and we tell them... no, we can't do that because the field is encrypted.
MongoDB is very short on details, and I suspect they do something worse than homomorphic encryption, that does indeed make some kind of compromise between privacy and convenience.
CryptDB: Processing Queries on an Encrypted Database
E.g. googling I found http://cs.brown.edu/people/seny/pubs/edb.pdf
[0]: https://en.wikipedia.org/wiki/Column_Level_Encryption
[1]: https://github.com/bincyber/go-sqlcrypter
[2]: https://www.vaultproject.io/docs/secrets/transit#convergent-...
"Queryable Encryption was designed by MongoDB’s Advanced Cryptography Research Group, headed by Seny Kamara and Tarik Moataz"
Some related papers with those two as authors:
Unfortunately I don't seem to be able to find this again, but a quick search turned up two papers that say that just encrypting your db isn't enough: [0], [1]. In particular, [1] doesn't seem to go into the details of how you could recover the data, but mentions that many operations as performed by "normal" databases leak information if performed over encrypted data. Maybe someone who is more familiar with Queryable Encryption can comment on this?
[0] https://www.cs.cornell.edu/~shmat/shmat_hotos17.pdf [1] https://www.microsoft.com/en-us/research/wp-content/uploads/...
This is obviously not that. They're encrypting locally. However, Simon Oya & Dr. Kerschbaum's paper, https://arxiv.org/abs/2010.03465, demonstrates a fantastic, efficient attack that recovers keywords on most constructions without a lot of queries. It remains to be seen how effective MongoDB's implementation will be.
This is a very interesting space, but structured encryption is the right way to put the theory to good use.
Most of the other encryption mechanisms such as homomorphic, partially homomorphic, etc. are just too impractical or require very specific niche use cases to be useful.
There are other misnamed technologies I've seen in marketing, such as "polymorphic encryption" or "vaultless", but most of these haven't had real research or cryptanalysis behind them.
[0] https://info.ionic.com/hubfs/IonicDotCom/Resources/Assets/Se... [1] https://eprint.iacr.org/2017/111.pdf
(their pagination is implemented just by increasing the limit parameter).
It's also difficult to see how this could work on the server side without exposing some information about the encrypted fields. For example, if all documents have a value that begins with "a", then there must exist a prefix query that matches all those documents. I would expect it to be possible to figure out whether such a query is possible or not, only given access to the encrypted data, but even if that's not possible, the simple fact that a prefix query was issued that matched all documents gives away that information.
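A toy version of that leak, purely as a sketch: assume the client indexes each value by uploading one keyed token per prefix (HMAC-SHA256 here), and the server answers prefix queries by matching tokens. This is my own illustrative construction with made-up names and data, not MongoDB's scheme, which hasn't been published in detail.

```python
import hashlib
import hmac

KEY = b"client-only-secret"  # hypothetical key; stays on the client

def token(prefix):
    """Keyed, deterministic token for one prefix."""
    return hmac.new(KEY, prefix.encode("utf-8"), hashlib.sha256).hexdigest()

def index_tokens(value):
    """Tokens uploaded next to the ciphertext: one per prefix of the value."""
    return {token(value[:i]) for i in range(1, len(value) + 1)}

# Illustrative plaintexts; the server only ever sees the token sets.
PLAINTEXT = {1: "alice", 2: "adam", 3: "anna"}
server_index = {doc_id: index_tokens(v) for doc_id, v in PLAINTEXT.items()}

def server_prefix_query(query_token):
    """Server-side matching is plain set membership; no key required."""
    return sorted(d for d, toks in server_index.items() if query_token in toks)

# Without seeing a single query, the server can tell that exactly one
# token is common to every document, i.e. all values share a prefix.
shared_by_all = set.intersection(*server_index.values())
print(len(shared_by_all))
print(server_prefix_query(token("a")))
```

In this toy scheme the server can't read the shared prefix, but it learns that one exists from the stored data alone, and learns it again whenever a query matches every document.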
For something like HIPAA, this adds very little value if fields are semi-known.
Is it really all client side? How could they do things like substring matching without sending the entire index back and forth to the client? The graphic seems to show the query being executed solely on the server (although graphics often lie).
Then it's just a matter of counting matching trigrams/chunks. The server doesn't need to know how to read the trigrams.
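A rough sketch of the trigram idea, assuming the client tokenizes every trigram with a keyed hash (HMAC-SHA256) before upload. The key, data, and function names are made up for illustration, and nothing here is confirmed to be what MongoDB does.

```python
import hashlib
import hmac

KEY = b"client-only-secret"  # hypothetical key; stays on the client

def trigram_tokens(text):
    """Client side: keyed token for every 3-character chunk of the text."""
    text = text.lower()
    return {
        hmac.new(KEY, text[i:i + 3].encode("utf-8"), hashlib.sha256).hexdigest()
        for i in range(len(text) - 2)
    }

# Illustrative plaintexts; the server stores only the opaque token sets.
PLAINTEXT = {1: "Jonathan Smith", 2: "John Smithers", 3: "Maria Garcia"}
server_index = {doc_id: trigram_tokens(v) for doc_id, v in PLAINTEXT.items()}

def server_rank(query_tokens):
    """Server side: count overlapping tokens per document, best match first."""
    scores = {d: len(query_tokens & toks) for d, toks in server_index.items()}
    hits = [d for d, s in scores.items() if s > 0]
    return sorted(hits, key=lambda d: (-scores[d], d))

print(server_rank(trigram_tokens("smith")))
print(server_rank(trigram_tokens("nathan")))
```

The server only compares opaque tokens. Matches are candidates rather than exact substring hits, so the client would still decrypt and filter the results, and the token overlap itself is information the server gets to see.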
So let me get this right: it's encrypted, but you can search by prefix and suffix?
So all the attacker has to do is do it one letter at a time, see if it starts with A, B, C, once they figure that out, go to the next letter and so on. (I presume that the DB is not supposed to be trusted since they make such a big fuss about only being decryptable on the client side)
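Sketch of that attack, assuming the attacker can issue prefix queries of their choosing and see which documents match. The oracle below is a hypothetical stand-in for the whole query path, and the stored value is made up.

```python
import string

# Stand-in for the encrypted field; the attacker never reads this directly.
SECRET_VALUES = {1: "carol"}

def prefix_oracle(prefix):
    """Hypothetical query interface: ids of documents whose value starts with prefix."""
    return [d for d, v in SECRET_VALUES.items() if v.startswith(prefix)]

def recover(doc_id, alphabet=string.ascii_lowercase, max_len=32):
    """Extend the known prefix one letter at a time until nothing matches."""
    known = ""
    for _ in range(max_len):
        for ch in alphabet:
            if doc_id in prefix_oracle(known + ch):
                known += ch
                break
        else:
            return known  # no letter extends the prefix: done
    return known

print(recover(1))
```

That is at most len(alphabet) queries per character. Whether it applies in practice depends on the threat model: a server that can only match tokens it is handed, and cannot mint new ones, cannot run this loop by itself.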
Also there doesn't seem to be a whitepaper detailing algorithms or their threat model. Bitcoin scams try harder than this.
The use case here is just "advanced encryption at rest". Encrypting at rest is one thing, but this means people are less likely to see PII by accident, for example.
"Queryable Encryption implements a fast, searchable scheme that allows the server to process queries on fully encrypted data, without knowing anything about the data. The data and the query itself remain encrypted at all times on the server."
They are strongly implying that someone with access to the database should not be able to decrypt the data. According to their blog post, that seems to be the entire value proposition compared to what they describe as traditional encryption at rest.
Designing secure cryptosystems is hard. Experts fail at it all the time. The lack of technical details is a major red flag.
Not to mention the distinct possibility that even if this group made a secure system, the mongodb marketing dept may very well be misrepresenting its security/limitations.
I don't think this is exactly homomorphic. I hope they put out a whitepaper so researchers can properly evaluate its security.
See the MuchPIR project (https://github.com/ReverseControl/MuchPIR) which implements Information-Theoretic Private Information Retrieval (IT-PIR) in Postgresql; In addition to the demo there is a high performance version available for commercial use.
If they are able to do this without decrypting the data, then I think you could describe this as a somewhat weak encryption that exposes some data attributes as queryable. You could not implement this with strong encryption without at least decrypting for indexing.
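As a sketch of what "exposes some data attributes" can mean: assume equality search is done with a deterministic keyed token (a blind index) stored next to the real ciphertext. That is an assumption for illustration with made-up data and names, not a description of MongoDB's design.

```python
import hashlib
import hmac
from collections import Counter

KEY = b"client-only-secret"  # hypothetical key; stays on the client

def eq_token(value):
    """Deterministic keyed token: equal plaintexts give equal tokens."""
    return hmac.new(KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

# Illustrative column of sensitive values, tokenized before upload.
rows = ["flu", "flu", "flu", "diabetes", "flu", "hiv"]
server_column = [eq_token(v) for v in rows]

def server_equals(query_token):
    """Server-side equality match on opaque tokens."""
    return [i for i, t in enumerate(server_column) if t == query_token]

# What the server learns with no key and no queries: the frequency histogram.
histogram = sorted(Counter(server_column).values(), reverse=True)
print(server_equals(eq_token("flu")))
print(histogram)
```

The server never sees "flu", but it does see that one value covers four of six rows. If the rough distribution of the field is public knowledge, that histogram can be enough to guess which token is which.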