Oh I see dropship is mentioned in the paper, great :)
In any case, it's interesting that they found some previously unknown security holes this way. This again proves that security through obscurity, at least for client software, doesn't work. When will people learn? You can't hide anything on the client from the user, at least not for long.
Presumably Dropbox, through its enormous distribution, is a very fat target, and I find it hard to believe that this published effort is the first instance of such an undertaking. Your average blackhat isn't going to publish his hack but will market it for all it's worth.
Then you get pages like these:
http://1337day.com/exploit/description/19604
(click 'ok')
I don't think the Dropbox team obfuscates their code as a security measure; more likely they do it to deepen their moat a little and to make it a bit harder to write third-party clients against their unpublished APIs.
"The contrast with my visitor the next day couldn't be greater. Through a former colleague I got an introduction to Drew Houston, co-founder and CEO of the vastly successful start-up company Dropbox.
Python plays an important role in Dropbox's success: the Dropbox client, which runs on Windows, Mac and Linux (!), is written in Python. This is key to the portability: everything except the UI is cross-platform. (The UI uses a Python-ObjC bridge on Mac, and wxPython on the other platforms.) Performance has never been a problem -- understanding that a small number of critical pieces were written in C, including a custom memory allocator used for a certain type of objects whose pattern of allocation involves allocating 100,000s of them and then releasing all but a few. Before you jump in to open up the Dropbox distro and learn all about how it works, beware that the source code is not included and the bytecode is obfuscated. Drew's no fool. And he laughs at the poor competitors who are using Java."
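The allocation pattern the quote describes (allocate hundreds of thousands of small objects, then release all but a few) is the classic free-list idea. A toy pure-Python sketch of that pattern; the names (`Node`, `NodePool`) are hypothetical and this is not Dropbox's actual allocator, which the quote says was written in C:

```python
# Toy free-list allocator sketch: recycle released objects instead of
# handing them back to the general-purpose allocator each time.
class Node:
    __slots__ = ("value", "next")  # keep per-object overhead small

class NodePool:
    """Reuses released Node objects via an intrusive free list."""
    def __init__(self):
        self._free = None  # head of the free list

    def alloc(self, value):
        node = self._free
        if node is None:
            node = Node()          # free list empty: really allocate
        else:
            self._free = node.next  # pop a recycled node
        node.value = value
        node.next = None
        return node

    def release(self, node):
        node.value = None
        node.next = self._free      # push onto the free list
        self._free = node

pool = NodePool()
nodes = [pool.alloc(i) for i in range(100_000)]
for n in nodes[5:]:     # release all but a few, as in the quote
    pool.release(n)
reused = pool.alloc("x")  # served from the free list, no new object
```

In CPython this kind of pooling hooks in at the C level via `tp_alloc`/`tp_free` on the type object, which is presumably what the custom allocator mentioned above does.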
Sometime after that, Drew poached Guido from Google. I remember this post. :)
[1] http://docs.python.org/2/c-api/typeobj.html#PyTypeObject.tp_alloc

This "weakness" is no different from the weakness of two-factor authentication in any scenario where login is persistent. I have two-factor authentication for Gmail with "remember me" set so I don't have to log in every day. If someone steals my laptop and gets my cookies, they can log in as me regardless of two-factor authentication, until the cookie authentication expires.
I did this precisely because the laptop is a single point of failure. Steal somebody's laptop and bam, you've got access to everything important to that person.
My Android phone is also encrypted (with a much weaker password) and I can also remotely delete everything on it through Google Apps.
I'd also think that authentication was (should be) a server-side thing: and that at that point you'd get some form of session/token/ticket.
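A minimal sketch of what such a server-side session token could look like, assuming an HMAC-signed, expiry-stamped design; this is purely illustrative and not Dropbox's actual scheme:

```python
# Hypothetical session-token scheme: after the client authenticates
# (password + second factor), the server issues a signed token and then
# verifies it on each request without re-running the full login.
import hashlib
import hmac
import time

SECRET = b"server-side-secret"  # never leaves the server

def issue_token(user_id: str, ttl: int = 3600) -> str:
    expires = str(int(time.time()) + ttl)
    payload = f"{user_id}:{expires}"
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}:{sig}"

def verify_token(token: str) -> bool:
    try:
        user_id, expires, sig = token.rsplit(":", 2)
    except ValueError:
        return False
    payload = f"{user_id}:{expires}"
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    # constant-time comparison, plus an expiry check
    return hmac.compare_digest(sig, expected) and time.time() < int(expires)
```

The point being: the token is the thing that persists on the client, so stealing the client's storage bypasses two-factor auth in any such design, exactly as with the Gmail cookie above.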
It's like when a new iPhone comes out and they throw the custom silicon under electron microscopes. It's entertaining, and I'm sure fun for the people doing it, but fighting information wars against ourselves just seems silly.
There are large problems humans don't have answers to, yet we're busy making things and then figuring out how the things we made work. Madness ensues.
Many technologies have been developed or accelerated through the need to reverse engineer something. I would argue the techniques developed to break the Enigma code during WWII had profound effects on computing generally.
Often reverse engineering a technology can also allow you to make improvements the other party has yet to realise, catalysing new ideas and research.
Not that all this means you are necessarily wrong, although perhaps it is a little too idealistic to hope for a world where information isn't a valuable currency?
Imagine a company where Team Database releases a binary-only library to the rest of the company. They won't tell you how it works and you can't talk to them, but it seems to work well enough. Then one day, Team Website wants to do something else with the database (a new type of query, new type of storage model, something non-trivial). In this backwards company, Team Website spends months reverse engineering the library and protocol to hack their own functionality into it. That's mad, right?
A broad view presents two categories of knowledge: things humans know, and things humans don't know. We keep circling around, rediscovering what other people have already done, while they sit there quite able to tell us what we want to know.
Now, adversarial conditions prevent such blanket sharing: capitalism, sovereign nations, war, etc.
Think of Intel. In some ways, they control the pinnacle of CPU design that humanity can surface at this point in time. We don't have anybody to ask "well, what comes next?" in the 10 year CPU roadmap—we have to discover the future along the way.
We should spend more time asking "well, what comes next?" and less time rediscovering what people already know how to do (modulo it making you better at actually discovering new things, or just for fun, or for cyberwar, etc).
Simply asking Dropbox how this stuff worked would've (probably) never uncovered these security issues.
Edit:
Just wanted to add one more benefit of this attempt at reverse engineering, from the whitepaper's introduction:
> Our work reveals the internal API used by Dropbox client and makes it straightforward to write a portable open-source Dropbox client
Is it not bad enough that the Microsoft and Adobe hegemony forces the entire world to have an attack surface wider than Jupiter, to be exploited at the whims of Eastern European teenagers?
Say I needed to write a custom GPU driver for some device, either to improve performance for a specific application or to work outside the dependency or API constraints of the binary blob (like porting it to another OS). Vendors usually provide no register-level documentation for graphics hardware, so the only way to do this is by reverse engineering.
Another reason for reverse engineering can be to find backdoors and security vulnerabilities (like these guys did) or even for legal reasons to find whether some copyrighted (or GPLed) code was used.
No madness needed at all. Or maybe just a bit.
So the attack only works if either:

- you feel like cracking a 256-bit random value remotely (you can't locally bruteforce it), or
- you have filesystem access.
I'd say both are irrelevant. You can't crack 256-bit values locally, let alone if you have to check the value remotely, and with filesystem access I imagine you can do a whole lot more than just uploading files to someone's Dropbox.
Bypassing two-factor authentication with either of the options is possible though, and I can see the issue, but this is by design. I don't think you want to have to enter your credentials (username, password, second factor) every single time you store a file or check for updates.
But I'm glad to hear that they found no "actual" weakness, that would enable a hacker with only my account name, or who is on my WiFi, to access my Dropbox.
To a _conservative, lean organization_, it's better to constrain customer use cases to known good clients than to handle fallout such as "I lost all my data!" "What rev of client were you running?" "zAxX0r'2 m0D51c|< ph3y3Ldr.0p 0.0.69r."
That said, I could hope for Dropbox to evolve to a more open (ssh-based?) model, though I'm not a security architect :)
> We found that two-factor authentication (as used by Dropbox) only protects against unauthorized access to the Dropbox’s website. The Dropbox internal client API does not support or use two-factor authentication!
Fun find from the source code: There's a module named "gandolf.py" which appears to have something to do with version control.