I am planning on using a CDN (such as Amazon's S3) and my question is simple: How effective is obfuscating the names of publicly available files using a UUID for security purposes? For example, naming a file something like this 4b013ca21ba608373efb4717.jpg.
I would be fascinated to hear any thoughts from the HN users. Certain questions spring to mind, like:
- How easy would these be to guess? Can there be any guarantees of uniqueness?
- What is the best algorithm to create them?
- Should I just make them all private and use authenticated access controls?
I understand that this is an application-specific question that depends on the level of security I require, among other factors. But for the purposes of this discussion, let's just say that it needs to be high, but not Fort Knox: if a file were compromised, it would not be critical.
Thank you in advance for any help. :)
Security: I had a similar need for my website and figured that if I'm the only one who knows about the URL, then it's secure. I was dead wrong. Some browser plug-ins look at each URL you enter and spider it. I know this is true because I started to see Alexa hit unpublished admin URLs on my website.
Unpublished URLs != Security.
This has the very desirable property of not being able to guess the name of the file even if you know the name of the "object".
In this case, "extremely unlikely" just means never, unless you are being attacked by, say, Russians with secret technology and a room full of supercomputers. The world will almost certainly end before anyone guesses one of these numbers or before you generate two that are the same.
You can look at the probability table for the Birthday Attack in this Wikipedia article and decide how unlikely you would like this event to be: http://en.wikipedia.org/wiki/Birthday_problem
Also from the article: "In theory, MD5, 128 bits, should stay within that range until about 820 billion documents, even if its possible outputs are many more."
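To put a number on the quoted figure, here is a minimal Python sketch of the usual birthday approximation, p ≈ 1 - exp(-n(n-1) / 2N) with N = 2^bits. The function name is made up for illustration, and the formula assumes the values are uniformly random, which a real hash or UUID generator only approximates.

```python
import math

def collision_probability(n_items: int, bits: int) -> float:
    """Approximate chance that at least two of n_items random values collide.

    Birthday approximation: p ~= 1 - exp(-n(n-1) / (2N)), where N = 2**bits.
    Assumes every value is drawn uniformly at random.
    """
    space = 2 ** bits
    exponent = -n_items * (n_items - 1) / (2 * space)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)

# 820 billion random 128-bit values, the figure quoted from the article.
p = collision_probability(820_000_000_000, 128)
print(f"{p:.2e}")
```

For 820 billion 128-bit values this comes out on the order of 1e-15. Note that this is the chance of an accidental collision, not the chance of an attacker guessing one specific name.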
Thanks!
dsc-450.jpg > user/acbdddc1ab1b73536fabbd3f4ffeea4e
dsc-450.jpgsalt > f043ce111eb398cd280b67c80c6c8ca6
usersaltdsc-450.jpg > 272ed08a85fc1e47b97462fb053a7c29
You'd need at least a secret site salt, as well as a per-user salt. If you have only a site salt, I can figure out what file names would be by uploading my own. If you had only a per-user salt, I calculate it myself. The problem of your site salt getting out remains.
At the very least, you should randomly generate identifiers. This makes it less obvious which targets are worth looking at. What you really need, though, is something like per-folder access control.
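As a rough illustration of "randomly generate identifiers", the sketch below draws names from the operating system's cryptographic random source instead of deriving them from the filename. It is in Python, and `random_object_name` and the `.jpg` default are assumptions made for the example, not anything from this thread.

```python
import secrets
import uuid

def random_object_name(extension: str = ".jpg") -> str:
    """Return a name built from 128 bits of OS-level randomness (hypothetical helper)."""
    return secrets.token_hex(16) + extension

# 32 random hex characters plus the extension.
name = random_object_name()

# A version 4 (random) UUID is a similar option with 122 random bits.
also_random = f"{uuid.uuid4().hex}.jpg"

print(name)
print(also_random)
```

Because nothing about the name depends on the original filename or the user, uploading your own files tells you nothing about anyone else's names. It does not help once a URL leaks, which is why the access-control point still stands.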
I would also recommend adding a salt to stop an attacker from being able to guess URLs by hashing common filenames. Though using the time in the hash is likely to prevent this, it's low-cost and probably worth it.
So, I would recommend (in PHP):
$url = md5( 'my_salt_string' . $time . $file_name );
But as Michael noted above, none of this can ever guarantee you security...
Go read http://en.wikipedia.org/wiki/Salt_(cryptography) and come back once you understand what a salt is.
In general, relying on a "secret" URL is not a good way to keep things secret. Google has a nasty habit of finding URLs you thought had no links. Definitely tune your robots.txt to keep the images off legit search engines.
But if you put exact URLs in your robots.txt, then a normal person could pick them up from there and access your content.
Make sure you use a directory structure and forbid all files from the content directory :-)
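A directory-level rule might look like the sketch below, checked here with Python's standard `urllib.robotparser`. The `/content/` path and the `example.com` host are placeholders. Keep in mind that robots.txt is only a request that well-behaved crawlers honor; it is not access control.

```python
from urllib.robotparser import RobotFileParser

# Disallow the whole directory rather than listing individual secret URLs,
# so robots.txt itself reveals no file names. The path is a placeholder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /content/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

content_fetchable = parser.can_fetch(
    "*", "https://example.com/content/4b013ca21ba608373efb4717.jpg"
)
index_fetchable = parser.can_fetch("*", "https://example.com/index.html")

print(content_fetchable, index_fetchable)
```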
Some other crawler may walk in there, then create pages that link your files, and then Google could link those pages, and it's game over.
Thanks for the tip about Google. It would be unacceptable for Google to index these files.
How easy would these be to guess?
There is a 1:3e38 chance of two randomly generated UUIDs colliding.
What is the best algorithm to create them?
http://en.wikipedia.org/wiki/Universally_Unique_Identifier#I...
Should I just make them all private and use authenticated access controls?
If your users don't want other people peeking at their files (DropBox): yes, absolutely. If your users don't care (HotorNot), then no.
For random UUID algorithms that are smarter than brute force, the odds of collision are lower.
Whilst it seems fair to assume two UUIDs will almost certainly not collide, the way you pose that last question sums up my query:
If your users don't want other people peeking at their files (DropBox): yes, absolutely
The answer is yes. Ultimately, users should not be able to see each other's files, and from what you and other commenters are saying, UUIDs on published files are still not robust enough to achieve this.
Thanks for your information.
One often overlooked leak: the 'Referer' [sic] header. If your document is hypertext, and includes outlinks to elsewhere, and the authorized user(s) click those links, those outlink-target sites may receive your confidential URL as a 'Referer' header. As some sites then publish their referrers, in one way or another, the 'secret' URL could wind up in public.
Remember this before creating a 'Competitors' page with outlinks on a 'login-required' but plain-HTTP wiki!
It's a great point about the referrer header too. Thanks!
Version 1: MAC + timestamp => easy to predict
Version 2: MAC + some other static data + partial timestamp: => Also easy to predict
Version 3: MD5 of some file or random string => If an attacker has a file, he can generate the MD5 hash himself and see if you also have this file.
Version 4: Random data => slow
Version 5: Same as version 3, but with SHA1.
Your best bet IMHO is to use the HMAC of the file. This will defend you against all the flaws that using a unique ID would have.
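To make the difference from a plain hash concrete, here is a minimal Python sketch. The key value and the function names are placeholders for illustration, and SHA-256 is my choice of digest; the comment above does not specify one.

```python
import hashlib
import hmac

# Hypothetical server-side secret; in practice a long random value kept off the CDN.
SECRET_KEY = b"replace-with-a-long-random-server-side-key"

def plain_name(data: bytes) -> str:
    """Unkeyed MD5 of the content: anyone holding the same file gets the same name."""
    return hashlib.md5(data).hexdigest()

def keyed_name(data: bytes, key: bytes = SECRET_KEY) -> str:
    """HMAC-SHA256 of the content: reproducing it requires the secret key."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()

file_bytes = b"example file contents"
print(plain_name(file_bytes))
print(keyed_name(file_bytes))
```

An attacker who already has a copy of a file can compute `plain_name` and probe for it, but cannot compute `keyed_name` without the key. As with a salt, everything rests on that key staying secret.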
The second step is to ensure that your secret links don't leak. You can employ robots.txt to disallow robots and use dereferer.org or anonym.to to hide referers, but you still won't be secure, as someone can still copy and paste the link. If that is OK with you, then you can stop reading now.
You could of course add an EC2 machine between the user and your S3 storage that makes sure each link only works once, but this would be expensive and counterproductive. However, Amazon S3 allows you to create a request that can be made via HTTP GET and that is only valid until a specific time. (See the API documentation, chapter "Authentication and Access Control".) This will allow you to generate a new URL every time you want to serve a file to your client. The URL will only be valid for a specific time period. The downside is that this again is time-consuming and that all caching on the user side will be useless.
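The general idea behind a time-limited link can be sketched in Python as follows. This is explicitly not Amazon's signing scheme (see the S3 documentation for that); it is a generic illustration in which the key, the parameter names `expires` and `signature`, and all function names are assumptions.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import urlencode

SECRET_KEY = b"server-side-secret"  # hypothetical; never sent to clients

def make_signature(path: str, expires: int) -> str:
    """Sign the path together with its expiry so neither can be altered."""
    message = f"{path}\n{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()

def sign_url(path: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Return the path with an expiry timestamp and signature appended."""
    start = time.time() if now is None else now
    expires = int(start + ttl_seconds)
    query = urlencode({"expires": expires, "signature": make_signature(path, expires)})
    return f"{path}?{query}"

def is_valid(path: str, expires: int, signature: str, now: Optional[float] = None) -> bool:
    """Accept the request only before expiry and only with a matching signature."""
    current = time.time() if now is None else now
    if current > expires:
        return False
    return hmac.compare_digest(make_signature(path, expires), signature)

print(sign_url("/content/4b013ca21ba608373efb4717.jpg", ttl_seconds=300))
```

Each call yields a different URL once the expiry changes, which is what defeats client-side caching. A copied link also stops working after the window closes.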
Good luck!
There are two things: security and the perception of security. For argument's sake, even if we assume that simply creating UUIDs for file names is secure (which, as several people have said, is not a valid assumption), I would argue that it does not provide a good enough perception of security.
So, if your users are really concerned about security and afraid that others will look at their files, a simple solution like the one above would just not cut it. You will have to convince them that your system is really secure. Now, no matter what you use (UUID, MD5, SHA1/2/256, etc.), to them it won't make a difference, and that may mean that you will lose users.
Based on that, my suggestion would be to do what you said above, make them all private and use authenticated access control. This will provide security as well as perception of security and get you more satisfied users.