Ask HN: Is security by obfuscation sufficient? | Better HN

24 comments · 9 top-level

MichaelApproved · 16y ago · 9 in thread

Random file name: You can create an md5 hash of the original file name and use that for the public file name. From what I understand it's extremely unlikely to have two strings hash into the same value. It's also long and nearly impossible to guess.

Security: I had a similar need for my website and figured that if I'm the only one that knows about the URL then it's secure. I was dead wrong. Some browser plug-ins look at each url you enter and spider them. I know this is true because I started to see Alexa hit unpublished admin URLs on my website.

Unpublished URLs != Security.

antirez · 16y ago

Instead of using the MD5 of the file name, do this: create a random string and associate it with the file in your database. Simply grab 256 bits from /dev/urandom, for instance (and convert them to hex).

This has the very desirable property of not being able to guess the name of the file even if you know the name of the "object".
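As a minimal sketch of the random-identifier idea above, assuming Python's standard `secrets` module (which reads from the operating system's random source, such as /dev/urandom on Linux). The `public_ids` dictionary and the `assign_public_id` helper are hypothetical stand-ins for a real database table:

```python
import secrets

# Hypothetical in-memory stand-in for the database table that maps
# public identifiers to the real stored objects.
public_ids = {}

def assign_public_id(object_name: str) -> str:
    """Create a random 256-bit identifier (64 hex chars) for object_name."""
    # 32 random bytes = 256 bits, rendered as hexadecimal.
    token = secrets.token_hex(32)
    public_ids[token] = object_name
    return token

token = assign_public_id("dsc-450.jpg")
print(token, "->", public_ids[token])
```

Because the identifier is independent of the file name, knowing the name gives no help in guessing the public URL.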

andrewljohnson · 16y ago

If you compute a string that is say 32 characters long using just letters and numbers, then there are 62^32 possible combinations. This number is 58 digits long.

In this case, extremely unlikely just means never unless you are being attacked by say Russians with secret technology and a room full of supercomputers. The world will almost certainly end before anyone guesses one of these numbers or before you generate two that are the same.

You can look at the probability table for the Birthday Attack in this Wikipedia article and decide how unlikely you would like this event to be: http://en.wikipedia.org/wiki/Birthday_problem

Also from the article: "In theory, MD5, 128 bits, should stay within that range until about 820 billion documents, even if its possible outputs are many more."
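The arithmetic above can be checked with a short sketch. The attacker speed of 10^12 guesses per second is purely an assumption for illustration, and the estimate is for finding one specific identifier (an attacker hunting for any of many published files would need proportionally fewer guesses):

```python
ALPHABET_SIZE = 62  # a-z, A-Z, 0-9
LENGTH = 32

keyspace = ALPHABET_SIZE ** LENGTH
digits = len(str(keyspace))
print(digits)  # 58

# Assumed attacker speed, purely illustrative.
GUESSES_PER_SECOND = 10 ** 12
SECONDS_PER_YEAR = 365 * 24 * 3600

# On average, one specific identifier is found after trying half the keyspace.
expected_years = keyspace // 2 // GUESSES_PER_SECOND // SECONDS_PER_YEAR
print(f"{expected_years:.3e} years")
```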

tome · 16y ago

You're also relying on the unpredictability of MD5 here, which is not so watertight as it was once believed to be:

http://en.wikipedia.org/wiki/Md5#Vulnerability

milestinsley (OP) · 16y ago

"Some browser plug-ins look at each url you enter and spider them" ... wow! I didn't know this.

Thanks!

mkyc · 16y ago

Whoa, no, don't do this.

  dsc-450.jpg > user/acbdddc1ab1b73536fabbd3f4ffeea4e
  dsc-450.jpgsalt > f043ce111eb398cd280b67c80c6c8ca6
  usersaltdsc-450.jpg > 272ed08a85fc1e47b97462fb053a7c29

You'd need at least a secret site salt, as well as a per-user salt. If you have only a site salt, I can figure out what file names would be by uploading my own. If you had only a per-user salt, I could calculate it myself. The problem of your site salt getting out remains.

At the very least, you should randomly generate identifiers. This makes it less obvious which targets are worth looking at. What you really need, though, is something like per-folder access control.
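To illustrate why a plain hash of the file name is guessable, here is a small sketch of the attack. The `public_name` helper, the `published` set, and the candidate list are all hypothetical; the point is only that anyone can recompute an unsalted hash:

```python
import hashlib

def public_name(file_name: str) -> str:
    """Unsalted scheme: the public name is just MD5(file name)."""
    return hashlib.md5(file_name.encode("utf-8")).hexdigest()

# What the site has published (the attacker cannot list this directly).
published = {public_name("dsc-450.jpg"), public_name("holiday.jpg")}

# The attacker hashes likely camera file names and probes for hits.
candidates = [f"dsc-{n}.jpg" for n in range(1, 1000)]
hits = [name for name in candidates if public_name(name) in published]
print(hits)  # ['dsc-450.jpg']
```

In a real attack the membership test would be an HTTP request per candidate rather than a set lookup, but the search space is just as small.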

mechanical_fish · 16y ago

Precisely. URLs are not secrets. Browser plug-ins can leak them, they get logged in every proxy's log files, they show up on people's screens, they get emailed in the clear, they get bookmarked, they are easily sniffed...

aarongough · 16y ago

I would second the idea of using a hash as the secret file name, however I would also recommend salting the hash with the current unix time. While having a collision between 2 different strings is extremely unlikely, having 2 users upload files with the same name is going to happen at some point.

I would also recommend adding a salt to stop an attacker from being able to guess URLs by hashing common filenames. Though using the time in the hash is likely to prevent this, it's low-cost and probably worth it.

So, I would recommend (in PHP):

  // $file_name is the uploaded file's original name; time() is the current Unix time.
  $url = md5( 'my_salt_string' . time() . $file_name );

But as Michael noted above, none of this can ever guarantee you security...

mooism2 · 16y ago

Or create an md5 (or sha1) hash of the file. (Salted, unless you want people to use you as a rainbow table --- I would salt it with the content type.)

cperciva · 16y ago

Salted, unless you want people to use you as a rainbow table --- I would salt it with the content type.

Go read http://en.wikipedia.org/wiki/Salt_(cryptography) and come back once you understand what a salt is.

eli · 16y ago · 3 in thread

It's not clear what you're trying to secure against. Are you worried about securing a particular image so that only certain designated users can see it? Are you worried about the original name "leaking"? Are you worried about someone iterating through all of your images?

In general, relying on a "secret" URL is not a good way to keep things secret. Google has a nasty habit of finding URLs you thought had no links. Definitely tune your robots.txt to keep the images off legit search engines.

chris100 · 16y ago

Definitely tune your robots.txt

But if you put exact URLs in your robots.txt, then a normal person could pick them up from there and access your content.

Make sure you use a directory structure and forbid all files from the content directory :-)
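A quick way to check such a directory-wide rule is Python's standard `urllib.robotparser`. This is only a sketch: the `/content/` path and the example.com URLs are assumptions, and it models well-behaved crawlers only:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the whole content directory rather than
# listing individual secret URLs (which would advertise them).
ROBOTS_TXT = """\
User-agent: *
Disallow: /content/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

can_fetch_secret = parser.can_fetch("*", "https://example.com/content/3f9a.jpg")
can_fetch_index = parser.can_fetch("*", "https://example.com/index.html")
print(can_fetch_secret, can_fetch_index)  # False True
```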

andrewljohnson · 16y ago

It's also worth pointing out that only the likes of Google and Yahoo will even look at robots.txt.

Some other crawler may walk in there, then create pages that link your files, and then Google could link those pages, and it's game over.

milestinsley (OP) · 16y ago

My main concerns are securing a file so only certain designated users can see it. And, yes, someone iterating through them would be bad also!

Thanks for the tip about Google. It would be unacceptable for Google to index these files.

falsestprophet · 16y ago · 2 in thread

First of all, Amazon S3 is absolutely not a CDN.

How easy would these be to guess?

There is a 1:3e38 chance of two randomly generated UUIDs colliding.

What is the best algorithm to create them?

http://en.wikipedia.org/wiki/Universally_Unique_Identifier#I...

Should I just make them all private and use authenticated access controls?

If your users don't want other people peeking at their files (DropBox): yes absolutely. If your users don't care (HotorNot), then no.

dustingetz · 16y ago

"Randomly generated UUIDs ... To put these numbers into perspective, one's annual risk of being hit by a meteorite is estimated to be one chance in 17 billion [24], that means the probability is about 0.00000000006 (6 × 10−11), equivalent to the odds of creating a few tens of trillions of UUIDs in a year and having one duplicate. In other words, only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%. The probability of one duplicate would be about 50% if every person on earth owns 600 million UUIDs."

For UUID algorithms smarter than brute-force random generation, the odds of collision are lower.
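The quoted collision figures can be approximated with the standard birthday bound. The sketch below assumes 122 random bits per version 4 UUID (6 of the 128 bits are fixed by the format); under this approximation, a billion UUIDs per second for 100 years gives a probability of roughly 0.6, in the same ballpark as the quoted 50%:

```python
import math

RANDOM_BITS = 122  # a version 4 UUID has 122 random bits (6 are fixed)

def collision_probability(n: float, bits: int = RANDOM_BITS) -> float:
    """Birthday approximation: p ~= 1 - exp(-n^2 / (2 * 2^bits))."""
    return -math.expm1(-(n * n) / (2 * 2.0 ** bits))

# One billion UUIDs per second for 100 years.
n = 1e9 * 100 * 365.25 * 24 * 3600
p = collision_probability(n)
print(f"n = {n:.2e}, p = {p:.2f}")
```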

milestinsley (OP) · 16y ago

I guess Amazon Cloud Front is the CDN then, rather than S3 itself! :)

Whilst it seems fair to assume two UUIDs will almost certainly not collide, the way you pose that last question sums up my query:

If your users don't want other people peeking at their files (DropBox): yes absolutely

The answer is yes. Ultimately, users should not be able to see each other's files, and from what you and other commenters are saying, UUIDs on published files are still not robust enough to achieve this.

Thanks for your information.

gojomo · 16y ago · 1 in thread

Obfuscation via a suitably-long unpredictable URL can work, but as others have noted, there are a number of ways such a URL can leak -- toolbars/browser-plugins being one of them. Also, that a user can easily forward the URL to others to grant access may be a bug or a feature, depending on your preferences.

One often overlooked leak: the 'Referer' [sic] header. If your document is hypertext, and includes outlinks to elsewhere, and the authorized user(s) click those links, those outlink-target sites may receive your confidential URL as a 'Referer' header. As some sites then publish their referrers, in one way or another, the 'secret' URL could wind up in public.

Remember this before creating a 'Competitors' page with outlinks on a 'login-required' but plain-HTTP wiki!
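One possible mitigation for the Referer leak, sketched here as an assumption rather than something from the thread: browsers that support the `Referrer-Policy` response header can be asked to omit the Referer on outbound requests. The WSGI app and the `fake_start_response` helper below are hypothetical, and this does nothing for the other leak paths (toolbars, logs, forwarded links):

```python
def app(environ, start_response):
    """Tiny WSGI app that serves a page containing an outbound link."""
    body = b'<a href="https://example.org/">outbound link</a>'
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # Ask the browser to omit the Referer header on outbound requests.
        ("Referrer-Policy", "no-referrer"),
    ]
    start_response("200 OK", headers)
    return [body]

# Call the app directly with a stub to inspect the headers it sends.
captured = {}

def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)

page = b"".join(app({}, fake_start_response))
print(captured["headers"]["Referrer-Policy"])  # no-referrer
```

The same `app` could be served for local testing with `wsgiref.simple_server.make_server` from the standard library.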

milestinsley (OP) · 16y ago

Yes. This concept of 'leaking' the URL is becoming apparent to me. I naively assumed you could keep them semi-private (by simply not telling anyone), but this approach doesn't appear to hold up.

It's a great point about the referrer header too. Thanks!

soult · 16y ago

There are two things to consider: First, you need to make sure that your filename is a long, unique, hard-to-guess string, that is easy to generate. This rules out all 5 UUID specifications:

Version 1: MAC + timestamp => easy to predict

Version 2: MAC + some other static data + partial timestamp: => Also easy to predict

Version 3: MD5 of some file or random string => If an attacker has a file, he can generate the MD5 hash himself and see if you also have this file.

Version 4: Random data => slow

Version 5: Same as version 3, but with SHA1.
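The differences between the versions can be seen with Python's standard `uuid` module. This is a small sketch; the file name is hypothetical, and versions 3 and 5 actually hash a namespace plus a name rather than raw file contents:

```python
import uuid

# Version 1 embeds the generation time and a node ID (often the MAC address).
u1 = uuid.uuid1()
print(u1.version, u1.time, hex(u1.node))

# Version 3 (MD5) and version 5 (SHA-1) are deterministic: the same namespace
# and name always give the same UUID, so anyone who knows the inputs can
# recompute it.
name = "dsc-450.jpg"  # hypothetical file name
u3_site = uuid.uuid3(uuid.NAMESPACE_URL, name)
u3_attacker = uuid.uuid3(uuid.NAMESPACE_URL, name)
print(u3_site == u3_attacker)  # True

# Version 4 is drawn from the operating system's random source.
u4 = uuid.uuid4()
print(u4.version)  # 4
```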

Your best bet IMHO is to use the HMAC of the file. This will defend you against all the flaws that using a unique ID would have.
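A minimal sketch of the HMAC idea, assuming HMAC-SHA256 and a server-side secret key (both choices are assumptions; the comment does not name a hash or a key-management scheme). `SECRET_KEY` and `hmac_name` are hypothetical names:

```python
import hashlib
import hmac

# Hypothetical server-side secret; in practice load it from configuration.
SECRET_KEY = b"replace-with-a-long-random-key"

def hmac_name(file_bytes: bytes, key: bytes = SECRET_KEY) -> str:
    """Return a hex public name derived from the file contents and a secret key."""
    return hmac.new(key, file_bytes, hashlib.sha256).hexdigest()

data = b"example file contents"
print(hmac_name(data))
```

Without the key, an attacker who holds a copy of the file cannot compute its public name. Note that two users uploading identical files would still get the same name under this sketch.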

The second step is to ensure that your secret links don't leak. You can employ robots.txt to disallow robots, use dereferer.org or anonym.to to hide away referers, but you still won't be secure, as someone can still copy and paste the link. If that is ok with you, then you can stop reading now.

You could of course add an EC2 machine between the user and your S3 storage that makes sure each link only works once, but this would be expensive and counterproductive. However, Amazon S3 allows you to create a request that can be made via HTTP GET and that is only valid until a specific time. (See the API documentation, chapter "Authentication and Access Control".) This will allow you to generate a new URL every time you want to serve a file to your client. The URL will only be valid for a specific time period. The downside is that this again costs time, and that all caching on the user side will be useless.
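The general idea of a time-limited URL can be sketched with the standard library alone. This is not S3's actual signing scheme, only an illustration of the concept; the key, the query parameter names, and the helper functions are all hypothetical:

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET_KEY = b"replace-with-a-long-random-key"  # hypothetical server secret

def sign(path: str, expires: int, key: bytes = SECRET_KEY) -> str:
    """Sign the path together with its expiry timestamp."""
    msg = f"{path}\n{expires}".encode("utf-8")
    return hmac.new(key, msg, hashlib.sha256).hexdigest()

def make_url(path: str, ttl_seconds: int, now: Optional[int] = None) -> str:
    """Build a URL that is valid for ttl_seconds from now."""
    now = int(time.time()) if now is None else now
    expires = now + ttl_seconds
    return f"{path}?expires={expires}&sig={sign(path, expires)}"

def is_valid(path: str, expires: int, sig: str, now: Optional[int] = None) -> bool:
    """Accept the request only if it has not expired and the signature matches."""
    now = int(time.time()) if now is None else now
    if now > expires:
        return False
    return hmac.compare_digest(sig, sign(path, expires))

url = make_url("/files/abc.jpg", ttl_seconds=300, now=1000)
print(url)
```

A leaked link of this kind stops working after the expiry time, which limits the damage from logs, history, and Referer headers, though anyone who gets the link before it expires can still use it.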

Good luck!

yumraj · 16y ago

Let me try to answer your question in a different way.

There are two things, security and the perception of security. For argument's sake, even if we assume that the method of simply creating UUIDs for file names is secure (which as several people have said is not a valid assumption), I would argue that it does not provide a good enough perception of security.

So, if your users are really concerned about security and afraid that others will look at their files, a simple solution as above would just not cut it. You will have to really convince them that your system is really secure. Now, no matter what you use, UUID, MD5, SHA1/2/256 etc. to them it won't make a difference and that may mean that you will lose users.

Based on that, my suggestion would be to do what you said above, make them all private and use authenticated access control. This will provide security as well as perception of security and get you more satisfied users.

prodigal_erik · 16y ago

The secret filename is not only on the wire but will show up in history and logs, which makes it a step down from HTTP Basic authentication. I'd use at least Digest authentication for anything that matters in any way.

rbrcurtis · 16y ago

If you are suggesting that you leave these secret URLs open to the entire internet, and just rely on them being inaccessible, you're looking for trouble. Depending on the type of documents, at the least your users may not be happy about them being unauthenticated, and I personally would worry about legal issues. Add authentication.

clutchski · 16y ago

It depends on your risk profile. For a bank, no. For an lolcatz photo site, sure.
