MongoDB Releases Queryable Encryption Preview

(mongodb.com)

120 points · andrewbarba · 4y ago · 67 comments

52 comments · 13 top-level

SkyPuncher4y ago· 7 in thread

This is a really neat technology, but I don't understand its use case. I've worked in HealthTech and currently in the compliance space. I'm skeptical of Mongo's claims (and their familiarity with compliance laws). Kind of feels like a solution in search of a problem.

"In use" implies that you have a need to process that data. It doesn't matter if the end client is submitting queries in plain text (protected in transit) or this fancy encryption, the client (or server) still needs to be authorized to query that data. Translating from plain-text to encryption does not add additional protections from a compliance perspective.

snorkel4y ago

This seems more applicable for the SaaS hosting model where the database service is managed by a 3rd party. So the use case is "I trust your SaaS service is compliant with my legal obligations to protect my customer data, but it'd be easier for everyone involved if your database service also has no way of seeing sensitive data fields. That would make it easier for me to pass my compliance audits, otherwise I need to audit you." So the data is encrypted client-side before it's sent over to the database service, and the database service is not able to decrypt it, but it can still include the encrypted value in a query.

SkyPuncher4y ago

The problem is that situation doesn't really exist.

At an organizational level, it's extremely hard to control what information gets put into a SaaS. There are far too many ways in which data can be de-anonymized or inferred against (e.g. a field existing can have privacy implications).

It's far safer to use a SaaS provider that meets general control requirements than to try to shoe-horn encrypted data into them.

giaour4y ago

> It doesn't matter if the end client is submitting queries in plain text (protected in transit) or this fancy encryption

It's not just the query that is encrypted in this case, but the data being queried. From MongoDB's description, the server never receives or stores plaintext data, and the query results can only be decrypted by a client who has the same key that was used to encrypt the data in the first place. From a compliance perspective, that's amazing if it works. It means the server is never storing or processing anything but ciphertext.

gkop4y ago

Yes, and in the context of Mongo-as-a-Service, it's amazing both to the client and also the service provider (less liability).

xhkkffbf4y ago

In one of the books about the general idea, _Translucent Databases_, the idea is to save the costs of securing the raw data. Someone might break into the database server (or listen on the wire) and find only encrypted values. This can make many different architectural use cases easier to deliver.

In the most extreme cases, the unencrypted values never leave the client. The database can concentrate on delivering storage and fast query answers without paying much attention to issues of security. Clients don't need to trust the database because they control the encryption.

redwood4y ago

I see this as more about fundamental trust: confidentiality from the service providers, not compliance.

015a4y ago

To be fair: the Compliance Regime didn't invent any of the technologies they create frameworks around, and if we followed every compliance framework's recommendations to a T, the systems produced would by-and-large be insecure. They paint with an extremely large brush; and it's a toss-up whether the auditor has even been involved in anything technology-related beyond auditing for, well, decades. There's good ones and bad ones, but the integrity of many audit processes relies to a significant degree on the goodwill of the SMEs of the systems and processes being audited.

Just as a dumb example; an auditor says passwords need to be hashed with bcrypt. They find a code sample that says "store(bcrypt(password))". Awesome; complied to a T. But true security goes beyond that: are we using a library for bcrypt, or an internal implementation? Is the internal implementation well-implemented? Is the library free of CVEs (maybe they check that)? Did we trace that call to ensure the data generated is what is inserted to the db, or was it intercepted by some middleware? Did we name that function 'bcrypt' but it's actually just MD5?

My point is really not to assert that auditing is pointless, but rather that it's fundamentally limited in what kind of attestations it can make.

One great example I can pull from a few recent audits I've been through: serverless tech like Fargate. This oftentimes blows auditors away (or, rather, it used to; nowadays they've seen it so often that they just know). It checks so many boxes. They'll present multi-page forms about data center colos and operating system security and operator SSH access and we'll say "We use Fargate". "Oh nice, ok we can check all of these and carve out with AWS's attestation for (ComplianceFrameworkX)". It saves hours, days, of time.

That's, I think, where homomorphic encryption can go. That isn't what this is, but it's a step toward that. It's not about meeting today's compliance frameworks; it's about evolving the framework. And, in the interim, as advanced R&D teams meet these auditors, they'll educate-up how, yeah, you've got a lot of questions here, but it's not that we do or don't meet them: it's that they're fundamentally the wrong questions to ask; but we understand the spirit, here's how we meet the spirit, and here's how we're actually better than if we had just checked Yes on all of them.

Third example: years ago, our team was the first place our auditor had ever seen LetsEncrypt and k8s certificate-manager (then called kube-lego). He wanted an attestation that TLS certificates were current and not near-expiration. We countered: they can't be near-expiration, because we have automated systems which renew them. He'd never seen anything like it; he was used to expensive certificates and operations runbooks for renewal; and we nerded out for ten minutes showing it all off. Instead of documenting a runbook for renewing certificates, he documented our runbook for maintaining this automated service and ensuring uptime. Win-win.

It's a slow process, and it's made even slower because there are tons of people in the industry who treat the frameworks as gospel. But, ultimately, we control the technology, not them. We decide what is secure; they just attest to it and double-check.

uberdru4y ago· 7 in thread

seriously did not think we would see homomorphic encryption productized for a few more years. pretty impressive!

8jy89hui4y ago

> Some of the existing tools, such as homomorphic encryption or secure enclaves have performance unsuited to scalable encrypted search, require proprietary hardware, or have uncertain security properties.

I don't think this is exactly homomorphic. I hope they put out a whitepaper so researchers can properly evaluate its security.

uberdru4y ago

Nice catch, I was scanning for homomorphic encryption, but missed this. Have no idea how else they would implement this.

muchpir4y ago

Homomorphic Encryption is available at large scale today for limited use cases.

See the MuchPIR project (https://github.com/ReverseControl/MuchPIR) which implements Information-Theoretic Private Information Retrieval (IT-PIR) in PostgreSQL. In addition to the demo, there is a high-performance version available for commercial use.

dandraper4y ago

It's not homomorphic but "structured encryption". Less useful than HE but faster.

snorkel4y ago

Correct. It's not homomorphic encryption, but rather more like TDE (Transparent Data Encryption) except that the MongoDB service isn't decrypting the data. This is essentially client-side encryption (at the driver) without server-side decryption.

cvwright4y ago

Faster has a usefulness all its own

samwillis4y ago

Homomorphic encryption allows you to modify the encrypted data without decrypting it or even knowing the content. I don’t think this is homomorphic encryption.

If they are able to do this without decrypting the data then I think you could describe this as a somewhat weak encryption that exposes some data attributes as queryable. You could not implement this with strong encryption without at least decrypting for indexing.

throwaway2016a4y ago· 6 in thread

Help me understand this...

It says it will support prefix search, substring search, and the like. Can anyone point me in the right direction on what the algorithm may be here? I don't get how you could do those things without making the encryption less secure and/or decrypting every record on the fly.

Another interesting use case I found that isn't mentioned here is sort. I've had customers ask me to be able to sort the results by PII and we tell them... no, we can't do that because the field is encrypted.

blintz4y ago

These things are indeed possible while maintaining fully semantically secure encryption. Recent, mostly theoretical work shows that this is possible using fully homomorphic encryption. The basic idea is, the client can encrypt its query, the server can process the encrypted query and produce an encrypted result, and send this back to the client. It sounds impossible, but it isn’t! Very cool stuff. There are actually also some practical implementations that work… so it’s gradually exiting the “theoretical only” stage.

MongoDB is very short on details, and I suspect they do something worse than homomorphic encryption, that does indeed make some kind of compromise between privacy and convenience.

dweinus4y ago

Yeah, they contrast their method with homomorphic encryption, which makes me share your suspicion

hapiri4y ago

It is less secure than your standard symmetric encryption. I guess they would use deterministic encryption, in which 2 entries with the same email address will have the same record string (this leaks information to an attacker). Prefix search & sort can be achieved by using order-preserving encryption. Not really sure about substring though.
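To make the equality leak concrete, here is a minimal Python sketch. It is not MongoDB's scheme: it uses HMAC-SHA256 as a stand-in for a deterministic encryption function (HMAC is a keyed one-way function, not reversible encryption), and the key and email addresses are hypothetical.

```python
import hmac
import hashlib

def deterministic_token(key: bytes, value: str) -> str:
    """Keyed, deterministic token: the same key and value always give the same output."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

# Hypothetical client-side key and sample rows, for illustration only.
key = b"client-side secret key (hypothetical)"
rows = ["alice@example.com", "bob@example.com", "alice@example.com"]
tokens = [deterministic_token(key, email) for email in rows]

# The server can answer an equality query by comparing opaque tokens...
query = deterministic_token(key, "alice@example.com")
matches = [i for i, t in enumerate(tokens) if t == query]

# ...but anyone who sees the stored tokens learns that rows 0 and 2 are equal.
duplicates_visible = tokens[0] == tokens[2]
```

The same property that makes equality search possible is what exposes repeated values to anyone who can read the stored column.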

throwaway2016a4y ago

I've researched order preserving encryption before but the tradeoffs (mainly that the attacker can tell the order and use that to narrow the search space) always seemed like high risk.

jalcazar4y ago

Related video explaining encryption schemes to make encrypted data in a DB queryable:

CryptDB: Processing Queries on an Encrypted Database

https://youtu.be/xsaXMUelOEA?t=807

bawolff4y ago

I was under the impression that CryptDB "encryption" was thoroughly broken. Am I mistaken?

E.g. googling I found http://cs.brown.edu/people/seny/pubs/edb.pdf

api4y ago· 4 in thread

Is this actually possible? Couldn't you make many repeated queries and slowly decrypt the text by e.g. slowly narrowing the range?
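A rough Python sketch of the narrowing idea, under a strong assumption: the attacker is able to issue valid range queries (for example from a compromised client holding the keys) and only sees whether a row matched. The oracle function and the salary value are hypothetical.

```python
def recover_by_range_queries(matches_range, low, high):
    """Binary-search a hidden integer using only yes/no range queries.

    matches_range(a, b) must return True when the hidden value lies in [a, b].
    """
    queries = 0
    while low < high:
        mid = (low + high) // 2
        queries += 1
        if matches_range(low, mid):
            high = mid          # hidden value is in the lower half
        else:
            low = mid + 1       # hidden value is in the upper half
    return low, queries

# Hypothetical hidden field and an oracle standing in for "did the query match?".
secret_salary = 73_500
oracle = lambda a, b: a <= secret_salary <= b
recovered, used = recover_by_range_queries(oracle, 0, 1_000_000)
```

Each query halves the remaining interval, so a domain of about a million values needs at most 20 queries. Whether an attacker can issue such queries at all depends on who holds the keys.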

robmccoll4y ago

This is possible. The goal is that the server knows as little as possible, while the client has full information. It's order revealing encryption. The server side knows the ordering of the values, but doesn't know any specific value. When queried, it is always getting prefixes (or exact matches) following the same encryption scheme, so it can compare those to the corpus and select results since the query parameters fall into the same ordering. The server doesn't have access to the keys needed to generate query parameters, so in theory it would be difficult for the server to perform narrowing queries on its own. Over time the server could gather statistical results that may reveal more about the data it's holding. Also, these schemes may need to produce the same cipher text for the same input, so frequency distributions can be used to reveal information.
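The frequency-distribution point can be sketched in a few lines of Python. This assumes the scheme produces the same ciphertext for the same input and that the attacker has a rough prior on how common each plaintext is. The token names and the prior below are made-up illustrative values.

```python
from collections import Counter

def guess_by_frequency(tokens, known_distribution):
    """Pair the most common tokens with the most common expected plaintexts."""
    ranked_tokens = [t for t, _ in Counter(tokens).most_common()]
    ranked_plain = sorted(known_distribution, key=known_distribution.get, reverse=True)
    return dict(zip(ranked_tokens, ranked_plain))

# Hypothetical opaque tokens seen in one column, and a hypothetical public prior.
observed = ["t9", "t3", "t9", "t9", "t7", "t3"]
prior = {"O+": 0.5, "A+": 0.3, "B+": 0.2}
guesses = guess_by_frequency(observed, prior)
```

No key is involved: the guess comes purely from counting repeated ciphertexts, which is why low-cardinality fields are the risky ones.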

Diggsey4y ago

Yeah the article is very thin on technical details. To make this work as they describe, it must not be possible for any client to "forge" queries, or else they could trivially decode the content by sending prefix queries of increasing length.

It's also difficult to see how this could work on the server side without exposing some information about the encrypted fields. For example, if all documents have a value that begins with "a", then there must exist a prefix query that matches all those documents. I would expect it to be possible to figure out whether such a query is possible or not, only given access to the encrypted data, but even if that's not possible, the simple fact that a prefix query was issued that matched all documents gives away that information.

robmccoll4y ago

You could have a larger range than domain and throw in some noise. Exact match queries would need to become range queries that are de-noised at decryption.

SkyPuncher4y ago

Yes. This is the fundamental problem with this.

For something like HIPAA, this adds very little value if fields are semi-known.

rafaelturk4y ago· 4 in thread

This looks really cool, although it feels like a feature implemented in the driver (client side), so my initial impression is that it is not a meaningful innovation on the server side. This can be implemented with any Database, even with current MongoDBs.

gqewogpdqa4y ago

Nope it’s implemented on the server side. I think that they are going to talk more about it at a session and maybe even in a keynote

8jy89hui4y ago

> This can be implemented with any Database, even with current MongoDBs

Is it really all client side? How could they do things like substring matching without sending the entire index back and forth to the client? The graphic seems to show the query being executed solely on the server (although graphics often lie).

jayd164y ago

Perhaps encrypted trigrams (or some such thing) are sent during insert and search.

Then it's just a matter of counting matching trigrams/chunks. The server doesn't need to know how to read the trigrams.
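One way the trigram guess could look, as a Python sketch. This is speculation about a possible design, not MongoDB's actual algorithm. HMAC-SHA256 stands in for the token function, and the key and sample strings are hypothetical.

```python
import hmac
import hashlib

def trigram_tokens(key: bytes, text: str) -> set:
    """HMAC every 3-character window so the server only ever sees opaque tokens."""
    text = text.lower()
    return {
        hmac.new(key, text[i:i + 3].encode("utf-8"), hashlib.sha256).hexdigest()
        for i in range(len(text) - 2)
    }

def server_substring_match(stored: set, query: set) -> bool:
    """Server-side check: every query token must appear in the stored token set."""
    return bool(query) and query <= stored

# Hypothetical client-side key; tokens are computed on insert and on search.
key = b"client-side secret key (hypothetical)"
stored = trigram_tokens(key, "alice@example.com")
hit = server_substring_match(stored, trigram_tokens(key, "example"))
miss = server_substring_match(stored, trigram_tokens(key, "bob"))
```

A set match only yields candidates, since the trigrams could appear in a different order, so the client would still need to confirm after decrypting. The stored token sets also leak which documents share trigrams.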

rafaelturk4y ago

We use Mongoose; for sensitive data we have a wrapper around the .pre('save') hook that encrypts it before sending data to the downstream db. It feels like MongoDB implemented that in more elegant, structured code.

bawolff4y ago· 4 in thread

I call bullshit.

So let me get this right - it's encrypted but you can search prefix and suffix?

So all the attacker has to do is do it one letter at a time, see if it starts with A, B, C, once they figure that out, go to the next letter and so on. (I presume that the DB is not supposed to be trusted since they make such a big fuss about only being decryptable on the client side)

Also there doesn't seem to be a whitepaper detailing algorithms or their threat model. Bitcoin scams try harder than this.
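The letter-by-letter attack described above can be sketched in Python. It assumes the attacker can submit valid prefix queries and observe whether they match, which is itself an assumption about the threat model. The oracle and the secret value are hypothetical.

```python
import string

def recover_by_prefix_queries(prefix_matches, alphabet, max_len=64):
    """Extend a known prefix one character at a time using a yes/no prefix oracle."""
    known = ""
    for _ in range(max_len):
        for ch in alphabet:
            if prefix_matches(known + ch):
                known += ch
                break
        else:
            # No character extends the prefix, so the value is fully recovered.
            return known
    return known

# Hypothetical hidden field and an oracle standing in for "did the prefix query match?".
secret = "smith"
oracle = lambda p: secret.startswith(p)
recovered = recover_by_prefix_queries(oracle, string.ascii_lowercase)
```

The cost is at most the alphabet size times the value length, plus one final pass to detect the end, which is why the ability to forge queries matters so much here.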

winrid4y ago

The use case you're outlining is someone already has access to the database. They can just do a find() in that case and get everything, no query required. You're basically describing an lz77 SSL hack that's like 20 years old, I'm pretty sure they would think of this.

The use case here is just "advanced encryption at rest". Encrypting at rest is one thing, but this means people are less likely to see PII by accident, for example.

bawolff4y ago

That's not what their blog post says. To quote:

"Queryable Encryption implements a fast, searchable scheme that allows the server to process queries on fully encrypted data, without knowing anything about the data. The data and the query itself remain encrypted at all times on the server."

They are strongly implying that someone with access to the database should not be able to decrypt the data. According to their blog post that seems to be the entire value proposition compared to what they describe as traditional encryption at rest.

mushi4y ago

It’s already been mentioned that “Queryable Encryption was designed by MongoDB’s Advanced Cryptography Research Group, headed by Seny Kamara and Tarik Moataz" - are you calling bullshit on their work? What are your qualifications?

bawolff4y ago

So long as whatever system they designed has not been published and reviewed by independent experts, then yes. I don't have to be an expert in this space to recognize what the norms are for making new production-ready cryptosystems, and that this doesn't remotely meet them.

Designing secure cryptosystems is hard. Experts fail at it all the time. The lack of technical details is a major red flag.

Not to mention the distinct possibility that even if this group made a secure system, the mongodb marketing dept may very well be misrepresenting its security/limitations.

GTP4y ago· 3 in thread

The problem is: is the full query also encrypted, or just some values that are considered sensitive? I remember research from some years ago showing that an attacker who can still see the SQL code can recover the content of the database by looking at the queries and the responses and "putting the pieces together". Now, if the target was to get the exact values inside the database (think about employees' wages), it still required observing a very large number of queries, but if you were interested in getting a reasonable interval for each value, then the number of queries needed becomes small enough to be doable in practice.

Unfortunately I don't seem to be able to find this again, but a quick search turned up two papers that say that just encrypting your db isn't enough: [0], [1]. In particular, [1] doesn't seem to go into the details of how you could recover the data, but mentions that many operations as performed by "normal" databases leak information if performed over encrypted data. Maybe someone who is more familiar with Queryable Encryption can comment on this?

[0] https://www.cs.cornell.edu/~shmat/shmat_hotos17.pdf [1] https://www.microsoft.com/en-us/research/wp-content/uploads/...

mahmoudimus4y ago

You're on the right track. I work in the data security space and while this is a cool release, it's not novel[0] and has been around for a while[1]. As a general rule of thumb, the first thing to check is if the provider is asking you to pass in your query in plain text AND without a local client (very important, because if you're sending data in plaintext, the threat model is now transitioning to an honest-but-curious model).

This is obviously not that. They're encrypting locally. However, Simon Oya & Dr. Kerschbaum's paper, https://arxiv.org/abs/2010.03465, demonstrates a fantastically efficient attack to recover keywords on most constructions without a lot of queries. It remains to be seen how effective MongoDB's implementation will be.

This is a very interesting space but structured encryption is the right way to put the theory into good use.

Most of the other encryption mechanisms such as homomorphic, partially homomorphic, etc. are just too impractical or require very specific niche use cases to be useful.

There are other misnamed technologies I've seen in marketing such as "polymorphic encryption" or "vaultless" - but most of these haven't had real research or cryptanalysis behind them.

[0] https://info.ionic.com/hubfs/IonicDotCom/Resources/Assets/Se... [1] https://eprint.iacr.org/2017/111.pdf

GTP4y ago

Thank you for the information. Just one question: what do you mean by "without a local client"?

ihucos4y ago

I like it. There is always a way to hack something. This is an additional layer of security that, yes, can also be broken.

dandraper4y ago· 2 in thread

This feature is a result of MongoDB's acquisition of Aroki. It looks like a good product but we actually beat them to it with https://cipherstash.com/activestash

CipherStash works with any Database and also supports Range queries and sorting/ordering. We do it in the application layer. Only supports Ruby so far but C#, Java, Python, Rust are in the works.

metadat4y ago

What about Go, or even Tcl and OCaml? Do you have pointers to docs that'd help OSS efforts in this department?

dandraper4y ago

Not yet but that's a good suggestion! The core client code is Rust so additional languages are (mostly) just native bindings to Rust. We will be releasing the Rust SDK publicly soon and welcome contributions!

bincyber4y ago· 1 in thread

This is really neat. Recently I explored similar functionality for relational databases and only got as far as implementing column-level encryption [0] in this Go library [1], but without support for querying the encrypted data. HashiCorp Vault's transit secrets engine supports Convergent Encryption [2] which provides limited ability to query the encrypted data, but I haven't yet experimented with it. If anyone is doing something like this in production, would love to hear about your experience.

[0]: https://en.wikipedia.org/wiki/Column_Level_Encryption

[1]: https://github.com/bincyber/go-sqlcrypter

[2]: https://www.vaultproject.io/docs/secrets/transit#convergent-...

muchpir4y ago

The MuchPIR project (https://github.com/ReverseControl/MuchPIR) implements Information-Theoretic Private Information Retrieval (IT-PIR) in PostgreSQL. In addition to the demo, there is a high-performance version available for commercial use.

eknkc4y ago· 1 in thread

I didn't know this was a thing. The article mentions it can do equality, range, prefix, suffix and substring queries. Does this mean that the encryption scheme creates sortable 1:1 mapped results after encryption? Kind of like a shift cipher?

tyingq4y ago

They mention this:

"Queryable Encryption was designed by MongoDB’s Advanced Cryptography Research Group, headed by Seny Kamara and Tarik Moataz"

Some related papers with those two as authors:

https://eprint.iacr.org/2016/453.pdf

https://cs.brown.edu/people/seny/pubs/sgx.pdf

winrid4y ago

Neat. Did they fix their blog's pagination yet? If you hit next enough times you may or may not be able to take down the site, don't ask me how I know.

(their pagination is implemented just by increasing the limit parameter).

Redsquare4y ago

If it is going to the likes of AWS KMS every time, it will blow budgets.

claudiug4y ago

can this be done in postgres via client or via server? I found it really nice
