Tokenization, which CipherCloud does, could actually be done fairly securely if you had a decent amount of local storage. IIRC they use a FIPS HSM for local key storage in their local appliance (I talked to one of their founders at a security event a year or two ago and was initially suspicious of their claims, but it seemed adequate for certain use cases based on how they were using it -- maybe things have changed). It's fundamentally not too different from when Stripe gives you a user key vs. PCI information.
Basically, if you can correctly identify certain fields as sensitive and others as not, and force all your traffic through a proxy, you could put totally unrelated random tokens in the sensitive fields, and then do search locally on the appliance, rather than on the untrusted service. E.g. if you wanted to use Salesforce, but keep customer addresses secret (because they were super-confidential government sites or meth labs or something), you could still put names in Salesforce and do everything else, but just put a random string in for addresses; do address searches on the proxy, either going from a single record to its address or maybe even "give me all the records in Missouri". There is no magic here. Someone could do an open source implementation for any specific site (via scraping or a public API) easily. The difficulty is doing it for many sites, and keeping it updated, supporting it, and selling it to the Fortune 500.
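To make the idea concrete, here is a minimal sketch of the token-vault approach described above. It is not CipherCloud's implementation; `TokenVault`, the `tok_` prefix, and the sample records are all made up for illustration, and a real appliance would persist the mapping and protect it (e.g. with keys in an HSM).

```python
import secrets


class TokenVault:
    """Toy local token store: maps random tokens to sensitive values."""

    def __init__(self):
        # token -> plaintext; this mapping never leaves the local appliance
        self._by_token = {}

    def tokenize(self, value):
        # The token is random, so it carries no information about the value.
        token = "tok_" + secrets.token_hex(8)
        self._by_token[token] = value
        return token

    def detokenize(self, token):
        return self._by_token[token]

    def search(self, needle):
        # Search runs locally over plaintext; the remote service never sees it.
        needle = needle.lower()
        return [t for t, v in self._by_token.items() if needle in v.lower()]


vault = TokenVault()

# What the untrusted SaaS would store: real names, tokenized addresses.
remote_records = [
    {"name": "Alice", "address": vault.tokenize("12 Elm St, St. Louis, Missouri")},
    {"name": "Bob", "address": vault.tokenize("9 Oak Ave, Austin, Texas")},
]

# "Give me all the records in Missouri": resolve tokens locally, then
# match them against the records held by the remote service.
hits = set(vault.search("Missouri"))
in_missouri = [r["name"] for r in remote_records if r["address"] in hits]
```

The remote side only ever holds `tok_...` strings for the address field, so anything that needs the plaintext (search, display) has to go through the proxy.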
I don't know if they've been pushed to do stupid stuff, or if they just have horrible marketing/PR now (which is weird since they raised a fuckton of VC), or what.
If instead they had entered the discussion with a sliver of respect and honesty it would have been great. Instead, many people have been introduced to them in a negative and untrustworthy atmosphere, and this has certainly tarnished their reputation. Despite what they say, not all exposure is good exposure.
I'm not sure if it's that they have no PR experience in the company, or just don't consider StackExchange/HN/Reddit to be worthy of a serious effort.
IMO, this is the kind of thing founders should handle personally once it happens. Maybe guided by a PR person or an investor, but a founder giving an adequate response gets graded on a curve, and is thus a lot more effective than a completely polished PR/marketing person.
One issue is access patterns might leak information, so if you wanted maximum security you'd end up doing crazy things like heavily caching or accessing extra "chaff records" periodically. Well before that point you'd probably just give up on the SaaS app entirely.
And would be happy to have a full and frank discussion about the way our system works.... as long as you sign this NDA!
My BS meter is running high.
"Contributed to by our competitors" -- if that's the case, the competitors are giving informative SO answers about crypto. Whereas they are engaging in censorious shenanigans. I, for one, prefer the "competitors'" contributions.
That is not true; a private information retrieval protocol can be used to search encrypted data:
https://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=568331...
You could also use an oblivious RAM, although I do not think that is practical yet:
http://www.pdf-archive.com/2013/04/20/notice130419/preview/p...
[1] http://en.wikipedia.org/wiki/Deterministic_encryption
EDIT: Or maybe you know that and I missed the joke.
It would be hard for them to make a security guarantee that isn't bogus, if the screenshots from their demo are an accurate representation of their technology, but they don't appear to have made this particular fraudulent statement.
Graf 1, sentence 1: "a few board threads" -> Internet's current most important programming forum.
Graf 1, sentence 1: "contributed to by our competitors" -> Smoke screen, unsupported, irrelevant.
Graf 2, sentence 2: "basically admitted they really didn't know the facts" -> Because the facts weren't provided, the contributors set about reversing them from published material, the point of the thread.
Graf 3, sentence 4: "does use publicly available, well researched, and NIST validated cryptographic algorithms" -> Virtually all cryptography anywhere can make a similar claim, and most of that code is broken. NIST validates primitives and a few basic constructions, but tying those primitives into a functional cryptosystem is outside their purview.
Graf 4, sentence 1: "for any customer deployments" -> Leaves open the question of whether they implement semantically insecure constructions in any setting.
Graf 5, sentence 2: "fundamental security features (full field encryption, randomization through IVs) were disabled" -> Randomized encryption isn't a feature, it's a fundamental property of a cryptographic construction.
Graf 6, sentence 1: "currently in the process of obtaining our FIPS 140-2 certification" -> FIPS 140-2 doesn't involve a rigorous analysis of cryptographic primitives; the crypto-specific components focus on use of NIST-approved ciphers and block modes, but do not assure that those primitives are used securely. To illustrate that point: every vulnerable version of SSL3 and TLS1.0 and TLS1.1 has had a FIPS-compliant implementation somewhere.
They should just be honest about their desire to suppress the use of their copyrighted IP in critiques of their product. They're in a competitive space, they're a small company, hard to manage their online reputation and build product, &c. The Reddit/HN/Stack Overflow scene wouldn't like that response, but it's better than this one, which actually creates more questions about their product capabilities.
Which is a textbook case of fair use. They may want to do that, but legally, they almost certainly can't.
So, apparently they are going to be patenting padding/randomization in encryption and "full field encryption". Our patent system at work for obvious things.
https://webcache.googleusercontent.com/search?q=cache%3Ablog...
"A couple of recent discussions in a few board threads contributed to by our competitors have questioned CipherCloud’s small online payday loans. same day payday loans. easy online payday loan. direct lender payday loans online. approach to delivering cloud information protection."
we are unable to provide this patent pending document at this time.
Our legal department will be shortly dispatching a DMCA infringement notice to all parties "mirroring" our content as a sign of our ongoing commitment to protecting our valuable intellectual properties against these thieves and scoundrels.
If you should encounter any further difficulties with our information dissemination services, please sign the following non-disclosure agreement[1] and affix the supplied Fedex label to your firstborn. We aim to respond to all communications within 6 working months, as part of our Quality Commitment Assurance.
[1] Whilst blood is preferred, red ink will suffice.
> The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.
Everything they do is destroying their reputation. Unfortunately, only among a few people who would have been suspicious anyway.
http://webcache.googleusercontent.com/search?q=cache:http://...
Probably some auto highlighter running amok.
...DMCA takedown requests.
If you had to guess, how many of CipherCloud's customers do you think keep cryptographers on staff?
It is feasible to strongly encrypt all data but you have to make sure that you do not accidentally implement ECB mode or something similar when using a common block cipher like AES. So you definitely want a unique IV for every piece of data you encrypt. But now you have also broken all server-side functionality because (almost) no useful operation will produce the expected result when operating on encrypted data. Client-side functionality is no problem because it only sees decrypted data.
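A small sketch of that trade-off, under loud assumptions: the cipher below is a toy stream cipher built from HMAC-SHA256 in counter mode (standard library only, not AES and not for production use), and the keys and function names are hypothetical. The deterministic variant derives its nonce from the plaintext, so equal plaintexts give equal ciphertexts; the randomized variant uses a fresh nonce each time.

```python
import hashlib
import hmac
import secrets

ENC_KEY = b"demo-encryption-key"    # hypothetical keys, for illustration only
SIV_KEY = b"demo-nonce-derivation"


def keystream(nonce, length):
    # HMAC-SHA256 as a PRF in counter mode: a toy stream cipher, not AES.
    out = b""
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(4, "big")
        out += hmac.new(ENC_KEY, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt(plaintext, deterministic):
    if deterministic:
        # Nonce derived from the plaintext: same input, same ciphertext.
        nonce = hmac.new(SIV_KEY, plaintext, hashlib.sha256).digest()[:16]
    else:
        # Fresh random nonce for every message.
        nonce = secrets.token_bytes(16)
    stream = keystream(nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def decrypt(ciphertext):
    nonce, body = ciphertext[:16], ciphertext[16:]
    stream = keystream(nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


email = b"alice@example.com"
server_det = [encrypt(email, deterministic=True)]    # what the server stores
server_rand = [encrypt(email, deterministic=False)]

# A server-side equality lookup just compares stored bytes with the query.
det_lookup_works = encrypt(email, deterministic=True) in server_det
rand_lookup_works = encrypt(email, deterministic=False) in server_rand
```

With the deterministic variant the server can still do exact-match lookups, but it also learns which records share a value. With the random nonce that leak goes away, and so does the lookup.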
Therefore they (have to) make compromises. Actually the user has to make the compromise - keep some data unencrypted or lose the server-side functionality. This is most prominent in the demos with numeric data that needs to be aggregated, averaged and whatnot. Actually it would not be too easy to encrypt this numeric data because you have to preserve the format including limits and disallowed values or otherwise the server would reject some values.
What about the infamous text fields? They are probably the easiest to encrypt but you still have to be careful not to break validation rules, for example by making the encrypted text much longer or making an e-mail regular expression upset (but I bet most applications perform only client-side validation). But this again makes the third-party application a lot less useful because you lose the ability to search in your textual data. The problem to solve is the following one (with some minor details ignored).
text.contains(searchText) == encrypt(text).contains(doSomething(searchText))
I - not being a cryptography expert - cannot think of a way to get this working without leaking information, and CipherCloud's solution as discussed on Stack Exchange definitely leaks a lot of information. This is really a very tough problem. (Probably) not even homomorphic encryption would help because you have no control over the comparison method - it is plain old substring search, maybe case insensitive, and that's it. It is solvable using private information retrieval in the relaxed case when you have control over the comparison operation, but with substring search it is probably too hard (if you want to keep the cipher text length similar to the plain text length).
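For illustration, here is one naive way to get the whole-word version of that property: replace each word with a deterministic keyed token (HMAC), so the server's plain substring search still finds whole words and phrases. This is a sketch under my own assumptions, not a description of what any product does; `KEY`, `word_token` and `tokenize_text` are invented names, and since HMAC is one-way the proxy would also need a local token-to-word map to get the text back.

```python
import hashlib
import hmac

KEY = b"demo-key-held-by-the-proxy"  # hypothetical key, for illustration only


def word_token(word):
    # Deterministic keyed token: the same word always maps to the same token.
    digest = hmac.new(KEY, word.lower().encode(), hashlib.sha256).hexdigest()
    return digest[:12]


def tokenize_text(text):
    # Word boundaries are preserved so server-side substring search still works.
    return " ".join(word_token(w) for w in text.split())


text = "meet at the lab near the river"
stored = tokenize_text(text)  # what the untrusted server would see

# Whole words and phrases survive the server's plain substring search.
word_found = tokenize_text("lab") in stored
phrase_found = tokenize_text("the lab") in stored

# Partial words do not: "la" is a substring of the plaintext, but its
# token is unrelated to the token for "lab".
partial_found = tokenize_text("la") in stored

# The leak: repeated words produce repeated tokens, so the server sees
# word counts, word positions and frequencies without knowing the key.
tokens = stored.split()
repeated_tokens = len(tokens) - len(set(tokens))
```

So the property holds only for whole words, and the price is exactly the kind of leak discussed above: anyone looking at the stored text can do frequency analysis on the tokens.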