Anyway, I have been toying with the idea of creating an open-source web application. Open-source in the way that I would host the product on my own servers but the full source code, except, of course, authentication info, tokens and things which shouldn't be under source control anyway, would be available on Github. Anyone would also be able to set up a self-hosted version of the product, and if anyone wanted to contribute, I would accept pull requests, etc.
The idea behind that was that, since it would be targeted mostly at a tech-savvy crowd and would deal with personal information, I would like to introduce some level of trust that I'm not doing anything sneaky or unexpected behind the scenes (like storing information I shouldn't be).
So basically I started wondering if it is possible to implement a way people could verify that the same code they see on the Github repo is the code that's also running on the live hosted site? I will be working with node.js, but I don't think the tech stack is too relevant here.
The client-side part, as I imagined it, would be relatively trivial to verify - run the build part locally, compare fingerprint of the live code and the local version.
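A minimal sketch of that client-side comparison, assuming the build is reproducible and that the assets served by the live site have been downloaded into a directory. The function names and directory layout here are hypothetical, not part of any existing tool.

```python
import hashlib
from pathlib import Path


def fingerprint_dir(root: str) -> str:
    """Return a SHA-256 digest covering every file path and its contents under root."""
    digest = hashlib.sha256()
    base = Path(root)
    # Sort so the result does not depend on filesystem ordering.
    for path in sorted(p for p in base.rglob("*") if p.is_file()):
        rel = path.relative_to(base).as_posix().encode("utf-8")
        data = path.read_bytes()
        # Length-prefix each field so different file layouts cannot collide.
        digest.update(len(rel).to_bytes(8, "big") + rel)
        digest.update(len(data).to_bytes(8, "big") + data)
    return digest.hexdigest()


def same_build(local_dir: str, downloaded_dir: str) -> bool:
    """Compare a local build against assets fetched from the live site."""
    return fingerprint_dir(local_dir) == fingerprint_dir(downloaded_dir)
```

This only works if the build is byte-for-byte reproducible; timestamps or randomized chunk names in the bundle would make the fingerprints differ even for honest deployments.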
But my idea got stuck on the server side. Even if I made a server-side endpoint that returns the fingerprint of the live code, there is no way for someone to check what is actually going on; I could just as well return a static file with a hardcoded fingerprint.
I'm sure someone has dealt with or thought about similar problems before, and I would be happy to hear some insights. Feels like this might be a pipe dream, but at least some level of verification would be nice to achieve.
We've been here before: this is Trusted Computing. You need a Trusted Platform Module on your servers (thankfully you're picking the hardware, so you can make that a hard requirement). Your users can inspect and sign your code with their keys, that they generate and keep on the client side (you never see them). Or more likely, they sign that they trust a particular third-party auditor. Either way, their data is uploaded encrypted with their keys and only code they have signed will ever be allowed to decrypt it.
It won't be easy. You'll have to keep old versions of your code around in case users haven't signed the new versions. The TPM-handling libraries are immature, though they get better every day. But it's possible, particularly since you only need to make it work with one particular model of TPM.
Good luck!
Unfortunately, I think the reason most open source people have a knee-jerk aversion to trusted platforms is that they've historically been designed to only serve the interests with the most money (read: the government and/or content industry).
There's nothing inherently anti-open source about the schemes, and they would provide innumerable benefits to increasing security confidence in a networked world.
However, when you can rattle off enough failed or botched encryption initiatives involving a hardware component to fill one hand just off the top of your head (CSS, AACS, HDCP, UEFI/SecureBoot, FairPlay), confidence is not inspired...
Oh wait, I can't do that? Hmm, so who are you trusting against? Me, you say?
Nope nope nope.
Bank: "For your security, you may only access our website with an officially supported browser"
Wicked. :)
1. Site publishes its hardware public key, allows users to verify it can sign on behalf of an Intel processor.
2. Site publishes source and reproducible build, so everyone can agree on a hash of acceptable bits.
3. Users submit requests encrypted to that public key (there's also something missing, where the key is actually a combination of the public key plus the hash of the executable code. Maybe the processor signs another cert for a specific proc+code combo).
4. Server can only decrypt when it has access to the matching private key, which is only available after entering the secure enclave.
5. If the server could decrypt the request and sign a response, the user knows it was handled by the right bits.
This still has many problems, the main one being that users are not going to really verify anything anyways. Also the data storage and all important handling needs to be done with encryption, so an admin can't just change the data.
But in theory, assuming no one can break the secure enclave/trust chain, it's a pretty nifty solution.
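A toy model of the "key is a combination of the hardware key plus the hash of the executable code" idea from step 3. This is only a sketch: it uses a symmetric HMAC from the standard library as a stand-in for the public-key machinery, and `device_secret`, `measure`, `bound_key` and `respond` are hypothetical names, not any real enclave API. In a real scheme the user would verify with a public key instead of needing the key itself.

```python
import hashlib
import hmac


def measure(code: bytes) -> bytes:
    """Hash of the executable bits (the agreed-upon hash from step 2)."""
    return hashlib.sha256(code).digest()


def bound_key(device_secret: bytes, measurement: bytes) -> bytes:
    """Key that depends on both the device and the exact code running on it."""
    return hmac.new(device_secret, measurement, hashlib.sha256).digest()


def respond(key: bytes, request: bytes) -> bytes:
    """Tag a request; only a holder of the matching key can produce this tag."""
    return hmac.new(key, request, hashlib.sha256).digest()


# Assumption: the hardware keeps device_secret private and only hands out bound keys.
device_secret = b"never-leaves-the-chip"
published = bound_key(device_secret, measure(b"published build"))
tampered = bound_key(device_secret, measure(b"published build + extra logging"))

# Changing a single byte of the code changes the key, so the tags no longer match.
print(respond(published, b"request") == respond(tampered, b"request"))
```

The point of the sketch is only that a modified build cannot derive the same key as the published one; everything else (certificates, the trust chain back to the vendor) is left out.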
Are there any applications actually using Ethereum yet? Almost everything linked to from the home page consists of "roadmaps" or "coming soon" pages. Is there a public network that's up and running? How many nodes does it have?
Also, it's probably important to note that a contract is run by every miner (or, every miner on a certain part of the network, depending on how the scalability thing ends up being done?), and that as such, you will want to keep any computation done by a contract computationally cheap, so that it will be cheap to use. Instead of having the contract do the computation that needs to be done, it may in some cases work better to just have the contract verify the accuracy of computations that have been done elsewhere.
However, now that I think of it, I'm not sure that Ethereum would solve all of the problems OP gives, because one of the problems OP wanted to solve was that they wanted to demonstrate that they were not storing information that they shouldn't be. But with Ethereum, because the contracts are executed by "everyone", everyone has to have access to the data the contracts are using to run, and there is no way to ensure that they don't hold onto that data.
One way to solve this could maybe be doing computations on shared secrets (as talked about in one of the Ethereum blog posts, but which is not something Ethereum is going to have), but this might require more messages to be sent over the network than one is willing to use. Still much more practical than homomorphic encryption though, I think. (If one was using the shared secret thing, I'm not sure one would do it with Ethereum.)
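For a feel of what computing on shared secrets looks like, here is a minimal additive secret sharing sketch. The modulus and function names are my own choices for illustration, not taken from the blog post.

```python
import secrets

PRIME = 2**61 - 1  # modulus for the shares; an arbitrary choice for this sketch


def share(value: int, parties: int) -> list[int]:
    """Split value into additive shares; fewer than all of them reveal nothing about it."""
    shares = [secrets.randbelow(PRIME) for _ in range(parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def reconstruct(shares: list[int]) -> int:
    """Combine all shares to recover the value."""
    return sum(shares) % PRIME


def add_shared(a: list[int], b: list[int]) -> list[int]:
    """Each party adds its own two shares locally; addition needs no messages."""
    return [(x + y) % PRIME for x, y in zip(a, b)]


total = add_shared(share(20, 3), share(22, 3))
print(reconstruct(total))  # 42
```

Addition is free like this, but multiplying two shared values needs the parties to exchange messages, which is where the network cost comes from.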
Ethereum could/would solve the problem of ensuring that the program being run is exactly as claimed, but it might not keep certain information private. Depending on the specific problem at hand, there might be ways around that though?
EDIT: ok, so, something suggested that homomorphic encryption might have gotten an improvement to the point of practicality recently? I wasn't aware of that. May make this post slightly out of date.
It appears to me, however, to be theoretically impossible to achieve. The code is running on your hardware; there is simply at the moment no known way to give any kind of assurance about remotely running code.
What would work, of course, would be to implement the entire app as a JavaScript application that only asks the backend for information when necessary. The blockchain.info bitcoin wallet does something to this effect. So does the web version of the LastPass vault. There is, of course, still no assurance that you will not occasionally inject compromising code when it is difficult to spot, but this is the closest that I have seen.
Richard Stallman is also advocating something to this effect: https://www.gnu.org/philosophy/javascript-trap.html
Implementing as an API doesn't help much, and in fact might just make it easier to fake. This is because you have a reduced surface area that you need to check and modify.
It's theoretically impossible if you assume full control of hardware. But most people are not capable of controlling hardware, so secure enclaves and remote attestation are likely to be a legitimate win if feasible.
"Browser users also need a convenient facility to specify JavaScript code to use instead of the JavaScript in a certain page. (The specified code might be total replacement, or a modified version of the free JavaScript program in that page.) Greasemonkey comes close to being able to do this, but not quite, since it doesn't guarantee to modify the JavaScript code in a page before that program starts to execute."
As for the second claim, actually implementing as an API does indeed help, because most of the code is then running in the browser and can be audited.
As for the third, there are no known ways to implement secure enclaves and remote attestation; that is what the questioner is asking. If you know of any, do share them.
1) Anyone, not just nice people, can view source code on GitHub
2) Source code can be used to find vulnerabilities (which is of course one of the great values of using open source code - vulnerabilities are usually spotted more quickly by a larger group)
3) A single vulnerability that allows access to private data OR can lead to corruption or loss of data could put your company out of business
There are people on both sides of that fence, but you do need to be on one side or the other.
binary, black and white thinking.
That means that if someone hot-edits the files on the server, the resulting edits should be visible, and/or the site is clearly unverified. If you deploy from a branch someone doesn't know about, it should be clear. If you just don't document that you made a deployment, someone should be able to figure that out.
Of course that can be spoofed, but spoofing a solid claim on what is running is very different than not making any claim about the code that is running, so you've made yourself accountable.
So, if each branch's code was signed and contained an embedded key and chosen encryption algorithm, then if the app used those during processing and users received verifiable transmissions, that app's output could be verified by users as having come from that advertised branch.
* software "sign its own output"
* software "encrypt its own output"
* software "encrypt its output"
* software "sign its output"
Some interesting results:
* Computer scientists develop 'mathematical jigsaw puzzles' to encrypt software (UCLA) #comment by zblaxell http://lwn.net/Articles/562113/
* Cryptographic Verification of Test Coverage Claims http://www.cs.ucdavis.edu/~devanbu/doc.ps
* Study of Security in Multi-Agent Architectures §3.4 http://www.ecs.soton.ac.uk/~lavm/papers/sec.pdf
If Github wanted to get into the hosting business, they could offer this... you'd be trusting what they say when they tell users that the code is identical in both.
I can't think of any clever way to prove it otherwise. Though, if you could, it would have broader implications... imagine Microsoft handing the code to an app over (for viewing), and then being able to prove that the shipped version of the app was the same. They could verifiably claim that their software has no backdoors (save those that are also in the source code, but obfuscated... those are rare, but exist apparently).
This is an idea worth exploring. Good luck.
You'd need to go a step even further. The "application" code is only one thing - what about other applications, processes, DB logic, HTTP front-ends?
All of those can modify requests, data, copy data, etc - even if you could "100% prove" that the server is running that particular git revision, there's so many side-channels as to make it useless.
If the server needs to do stuff with the data, then what you want is probably not possible. (It depends on the exact thing that needs to be done, as there are things like homomorphic encryption.) Instead, you should focus on non-technical assurances that you are acting in good faith and that promote trust.
This would include things like having a privacy policy on the site with strong guarantees about what can be changed in the future and having a physical address. You could put funds in escrow that would pay out to the users if you violated the policy. You could have outside auditors come and verify your procedures.
Honestly, you can't guarantee that you won't have a security breach or that the government will give you a national security letter. I'd focus on building your service and making it useful enough that users deem the risk a worthwhile trade-off.
I believe that the best route to take that is most in-line with your goals would be to design it such that the server-side is untrusted from a security standpoint. Have the client process the data, and only give encrypted or sanitary data to the server side. Don't trust the server with anything other than availability.
You probably shouldn't even be trusting any one server even with that!
A video feed from a camera and another from the screen itself. Starting from a fresh system, install the dependencies, download the source, verify the hashes, and run the server. Then show the server response from another machine.
Sure, it doesn't guarantee 100%. The OS image could have been tampered with, or you could have mucked with the network and intercepted requests. But it's easy to record, easy to understand and much more secure than a black box server.
What I was thinking is allowing anonymous SSH access with a VERY locked down shell (/bin/rbash [1]) and let them view that everything is in order with their own eyes.
You still REALLY can't make a system that the user can conclusively see isn't just for show. You could be shoving them into a jail/chroot that provides the illusion of transparency, but be serving them elsewhere.
I think that this might be the closest you'll ever come to user software freedom on hardware that they don't own. There are a lot of security concerns, though, so I think it's out of the question for any type of production environment. I'd love to see a proof of concept from someone though.
[1]: https://www.gnu.org/software/bash/manual/html_node/The-Restr...
I realize this doesn't meet your requirement re hosting it yourself -- but since I don't know the full background, perhaps that doesn't matter to you.
Again, this requires a root of trust on the server, otherwise, anything returned from the server including any information you would need to verify code the server is running could be spoofed.
You can make a snapshot for free right now if you want to. This should solve your problem.
Let me know if you have any questions, but using terminal's snapshot feature you can distribute your web app at a known state.
Edit: it's like git versioning for machine state.
A solid example to keep in mind is MtGox. How can we run something and know no invalid trades are added, no fake password resets processed, etc etc.
which is open source: https://github.com/jlmucb/cloudproxy
It relies on TPMs (trusted platform modules, a hardware root of trust).
What confused me about the naming is that CloudProxy is an OS, not a proxy server. It's a distributed OS that provides attestation of the identity of remote code. To do this you need secure boot and key management.
If anyone dives further into it, let me know :) I'm curious how deployable it is from the Github repo. I guess you can run it on Linux, but I'm not sure how the kernel is involved in the chain of trust. I would have thought you needed your own OS.
The CloudProxy Tao (henceforth, “the Tao”) is a recipe for creating secure, distributed, cloud-based services by combining ingredients that are already available in many cloud data centers. The Tao is realized as an interface that can be implemented at any layer of a system. CloudProxy implements multiple layers of the Tao and provides means for
- protecting the confidentiality and integrity of information stored or transmitted by some hosted program,
- establishing that the code executed as a hosted program in a cloud is the expected code and is being run in the expected environment, and
- authenticating requests to the hosted program to check that they come from a client executing some expected program in an expected environment, either remotely or locally in the cloud.
CloudProxy is the first implemented, fully fleshed-out system providing these properties along with key management and an appropriate trust model for all principals.
Also, even this does not assure you that the blackbox isn't doing something bad via a side channel. For instance, if we trust the genuine blackbox not to transmit personal data somewhere, how do we know that the BUT (blackbox-under-test) isn't doing that? We have to isolate the blackbox, and then destroy it at the end of the test (so it can't store something and transmit later). You can't isolate a remote box. A remote box has inputs and outputs unknown to you that you cannot monitor.
Trust is a big problem with the SaaS model. When you're trusting a SaaS application, you're really trusting the provider who is hosting that application. There is no way around that.
I think, however, that there is an intermediate worthwhile goal: to have assurance that you're trusting only the host of the application!
That is to say, that the SaaS application doesn't contain malware which was not intentionally injected by the provider, but somehow got there via an upstream code source or through some exploit or whatever.
If you can provide a way that the SaaS provider itself can check that it's running the code that it thinks it is running, that is valuable. If the SaaS is rogue and customizes the code to do things that the users don't agree with, that's a separate concern.
By doing this, you are essentially telling the world that not only do they need to trust you, but that you are willing to make yourself legally culpable if you surreptitiously run different code on your servers (ie., you would be violating the copyright terms of your contributors whose code is licensed under AGPL and of which your application is a derivative).
It would be nice to have a technical solution as well, I'm not discounting any of the other suggestions here. Just saying that adding a legal dimension could help counterbalance the possibility that a technical protection could be defeated.
Don't do that. If you're targeting your product for the tinfoil hat crowd, that's simply not going to work. Instead, you create a build script that will generate your application from source, and (for example) generate a docker image. This image could be run on your servers, on a third-party server (AWS, DO, etc.) or on the user's own hardware, depending on the level of inconvenience/security tradeoff they are willing to endure.
I know you're probably looking for the consistent revenue streams of a SaaS, but unless user data can be completely encrypted during storage (e.g. email, backup, etc.), the truly paranoid don't want to trust their information to a 3rd party.
* the application isn't broken/still fulfills its API contract
* the application isn't compromised in a malicious way
As far as the first point, I wonder if it could be possible for users to run some sort of a test suite against the public API? Like a crowd-sourced test suite that verifies that production server behavior is still as advertised.
The second point I think can only be partially addressed by partial methods, since it's impossible to guarantee that some sneaky compromise hasn't happened. But you could allow outside auditing, let people have some form of read-only access to the directory tree that stores the code (if it's separate from the config), etc.
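For the first point, a rough sketch of what such a crowd-sourced check could look like. The `fetch` callable, the endpoint paths and the field names are all hypothetical; a real version would wrap an HTTP client and whatever the API actually promises.

```python
from typing import Callable

# Maps an endpoint path to the fields its JSON response must contain.
Contract = dict[str, set[str]]
# fetch(path) returns (HTTP status, parsed JSON body); supplied by the person running the checks.
Fetch = Callable[[str], tuple[int, dict]]


def run_contract_checks(fetch: Fetch, contract: Contract) -> list[str]:
    """Return human-readable failures; an empty list means the contract held."""
    failures = []
    for path, required in contract.items():
        status, body = fetch(path)
        if status != 200:
            failures.append(f"{path}: expected 200, got {status}")
            continue
        missing = required - set(body)
        if missing:
            failures.append(f"{path}: missing fields {sorted(missing)}")
    return failures
```

Anyone could run the same contract file against production and publish the failures, which at least makes silent API breakage visible. It says nothing about what the server does with the data behind the scenes.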
This is the scenario we are testing out with http://schemafreedb.com/
You do not have the private key, so you cannot see the data, but you can still offer the data portion as a service. Your client does need to host the application. Depending on your target client, node may be a good choice, or if you want to go mainstream, go with something like php.
Whenever you have an update to the code, you can provide a diff of the changes.
For example, if you want to store people's names and date of birth, you would encrypt those on the client-side and only ever send ciphertext to the server.
The encryption key could be derived via a passphrase composed of a user name and a password. Of course, this means that if someone loses their credentials, they lose their data forever.
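A minimal sketch of that key derivation using PBKDF2 from the Python standard library. Using the user name as the salt and the iteration count are assumptions on my part; the actual encryption step would need an authenticated cipher from a vetted library and is not shown.

```python
import hashlib


def derive_key(username: str, password: str, iterations: int = 600_000) -> bytes:
    """Derive a 32-byte key on the client; the server never sees it."""
    # Assumption: the user name doubles as the salt, so no server round trip is needed.
    salt = hashlib.sha256(username.encode("utf-8")).digest()
    # Assumption: the iteration count is a tunable; pick what your clients can afford.
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )
```

The same credentials always give the same key, which is exactly why losing them means losing the data: there is nothing on the server that could recover it.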
If you could create e.g. a publicly available AMI of your application and prevent further runtime modifications to it (e.g. disallow SSH access), then maybe Amazon could offer an interface to verify that your application was running based on the trusted AMI.
Essentially Amazon would issue a statement connecting a certain IP address to a certain application.
Substitute your choice of other technology for AMI and Amazon as appropriate (docker container and docker hosting provider - this sounds like a competitive advantage for hosting providers, hint hint).
The closest to actually trusted computing is, well, trusted computing. Remote attestation allows you to verify you're executing specific bits of code. But I don't know if the tech is exposed in a useful enough way. With DRM apps, the idea is that the hardware will be able to verify it's running known code. Then you provide encrypted input to that processor, knowing the only way to decrypt is to be in the trusted code. Still many remaining issues, such as extending the trusted code to cover all access to data storage and wallets.
You might look at other aspects, such as somehow creating community signatures. Like, I dunno, if main wallets needed sign off from 10% of the user population to do major changes. But that is rife with problems.
And really, users don't seem sophisticated enough to bother with this stuff anyways. You could just paste some JavaScript "verification" code and I'd guess you'd get loyal defenders that don't know the difference. The market that just went down did multisig, right? But since it's too difficult, people just ignored the option, no?
Forgive me if my main assumption about the motive is incorrect.
The drawback is that you can show that the site is not compromised directly after a reboot, but you need to call your friend to log in and give the password for his validity checker. Once his app runs, others can connect to it and use the public key of the app to check if your own app is ok.
The problem is to find someone who is independent of you, so your community trusts him, and you also need to trust him, as his code is running on your server.
I would build this idea on the Blockchain; every transaction is tracked and publicly available while details are secure
PTE: Publicly transparent entities (like a non profit, without the bullshit)
Also love the idea of having a computerized "gatekeeper"; a dangerous proposition, query-less tables that only computers can read; make it unreadable, and it is truly unhackable
> I would like to introduce some level of trust that I'm not doing anything sneaky or unexpected behind the scenes (like storing information I shouldn't be).
All you have to do is take something like OpenResty, use it to proxy the traffic and terminate SSL b/t the App and the rest of the internet, and you can do all the nefarious things you wanted.
The ability to use proxies in such a transparent manner guarantees that this isn't possible, regardless of whether or not you actually succeed in the stated aim of verifiable open source code.
Tbh, the closest viable solution is for a reliable 3rd party auditor with professional credentials to perform regular audits to match your production environment to what you tell the general public. Otherwise, you can simply circumvent whatever safeguards you create by simply using a separate application to proxy traffic.
At this point, you are in the realm of 3rd party software audits and that is an established field.
To know whether an open source program on a server is modified, we send a different customized executable copy every time, with one-time-use secrets. So when the program starts, it has to answer questions correctly and quickly (to protect against reverse engineering) to prove it's a genuine copy; then we can send it our encrypted access key. The access key will never be written to disk by a genuine copy, so a restarted program won't be able to access our data without asking for a key again, and then we will know something is wrong.
The copies we upload to the server function exactly like the open source one, but the user is responsible for adding secret parts to them so that they are closed source.
This would still require placing trust in a centralized entity though, and allow for administrators to manipulate or dump db info without the users knowing.
And from the answers so far, it's disheartening to hear that there doesn't seem to be any great way to guarantee fair play. But I suppose it generalizes - we don't have any way to guarantee that the people we interact with on a daily basis have our best interests at heart, and we do a lot of trusting just because it makes life easier.
"We certify that the code running on X machine came directly from Source Code Y at Version Z of Branch W."
I think it's a great idea.
Does anyone know what the state of executable signing for Linux is these days? I found some unmaintained DigSig project, and some noise about SecureBoot-related patches from a couple of years back. And that would be just a start; I haven't heard of anything that would allow enforcing code signing for "dynamic" code (like JS or Python).
If you -- or anyone in this thread -- want to build it together or to discuss ideas, please get in touch.
https://www.usenix.org/conference/osdi14/technical-sessions/...
Small bitcoin fee would be charged to the privacy nuts willing to confirm it.