One of the implicit points of the article (which maybe shouldn't be implicit) is that these issues are not, in fact, orthogonal.
For example this:
Most of the computer languages used to write web applications such as DCMS systems contain a feature called eval, where programming instructions can be deliberately promoted from data to code at runtime.
In other words, proper input hygiene is a problem because you're dealing with a language that allows execution of data (i.e. a dynamic language).
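A minimal sketch of that data-to-code promotion, in Python rather than any language the article names. The function names and sample inputs are made up for illustration; `eval` and `ast.literal_eval` are standard Python.

```python
import ast

def parse_setting_unsafe(text):
    # eval() promotes the string from data to code: whatever it says gets run.
    return eval(text)

def parse_setting_safe(text):
    # ast.literal_eval() accepts only Python literals (numbers, strings,
    # lists, dicts, ...) and raises ValueError for anything else, such as
    # a function call.
    return ast.literal_eval(text)

benign = "[1, 2, 3]"
# Stands in for something far nastier; it only reads the working directory.
hostile = "__import__('os').getcwd()"

if __name__ == "__main__":
    print(parse_setting_safe(benign))
    try:
        parse_setting_safe(hostile)
    except ValueError:
        print("rejected: not a literal")
```

Both functions agree on the benign input. Only the `eval` version executes the hostile one, which is the hygiene problem in miniature.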
If you mean comparing type systems there isn't much of a debate!
The obvious flaw in your example: you can exec a program unsafely in both C and in Scala, but only in C can you do it accidentally simply by idiomatically copying a string from one place to another.
Yes, but is it probable? History says no.
I just wrote an article on a sensible metric by which you can do exactly that: http://www.jerf.org/iri/post/2942
A lot of people already knew that stuff on one level or another anyhow, but it's helpful to spell it out sometimes and bring subconscious feelings up to the conscious level.
> Even if [..] and there's nothing like bash installed on the same computer as the web server
Bash installed? Huh? Why Bash exactly? I feel mentioning jails or containers here would be more on point..
> This is because every DCMS page view involves running a few tiny bits of software on your web server, rather than just returning the contents of some files that were generated earlier.
Sure, but guess how those pages are returned? By running code on the server..
It seems like his real beef is with "dynamic" (vs. static) sites, but he keeps mentioning CMSs for some reason (like you can't have "dynamic" sites without a CMS).
> The web server executes no code on behalf of a viewer until that viewer has logged in..
1) Of course it does, 2) How does the site check your info without executing code? :)
etc etc..
There's a case for static sites, but this post just confuses things.
> Sure, but guess how those pages are returned? By running code on the server..
Yes, but by running static code.
> > The web server executes no code on behalf of a viewer until that viewer has logged in..
> 1) Of course it does, 2) How does the site check your info without executing code? :)
Again, static code.
So his point is not that we shouldn't run code at all, but that we shouldn't run code that heavily processes user inputs, or worse, evaluate generated code as DCMSs sometimes do.
Of course you can argue that no code is truly static, because it depends on the user input, but I don't think this is what you're arguing here.
What the hell is static code? Static has very specific meanings in different technical contexts (static pages, static allocation, static scoping, etc.), but I've never heard someone refer to static code.
Can you give me an example of code that is and isn't static by your definition?
I think he might know a little bit: http://en.wikipedia.org/wiki/Paul_Vixie
Either this article is filled with intentional hyperbole, or this guy doesn't know what he's talking about when it comes to serving web pages.
Maybe because of this? https://en.wikipedia.org/wiki/Shellshock_%28software_bug%29
(I'm not arguing against the notion that static sites can be more secure, just that the article is bad ;)
(By ossify, I mean to take something dynamic and make it static).
For example, that WordPress site you commissioned for a movie 3 years ago? It's a huge liability, but you don't have to take it offline - just ossify it. No one is updating that blog anymore!
Under the hood, it would basically be a crawler, and the deliverable would be a zip file containing a 1-to-1, static copy of their website with all URLs still working. I suspect most folks here could whip up a shitty proof of concept in 48 hours.
If someone does this, email me! I have a couple of potential clients for you (I'm a former consultant, with lots of WordPress sites in my history).
wget --mirror --convert-links http://site.example.com/
From the wget manual: --convert-links
After the download is complete, convert the links in the document
to make them suitable for local viewing. This affects not only the
visible hyperlinks, but any part of the document that links to
external content, such as embedded images, links to style sheets,
hyperlinks to non-HTML content, etc.
The links to files that have been downloaded by Wget will be
changed to refer to the file they point to as a relative link.
Example: if the downloaded file /foo/doc.html links to
/bar/img.gif, also downloaded, then the link in doc.html will
be modified to point to ../bar/img.gif. This kind of
transformation works reliably for arbitrary combinations of
directories.
I'd be happy to bill your clients to do it for them though!
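For anyone curious what the link conversion in the wget manual excerpt amounts to, here is a rough Python sketch. It is not wget's implementation. `to_relative_link` is a made-up helper that only handles site-absolute paths, using the standard `posixpath` module.

```python
import posixpath

def to_relative_link(page_path, target_path):
    """Rewrite a site-absolute link so it resolves from a local copy.

    page_path is the page containing the link, target_path is what it
    points at; both are absolute paths within the mirrored site.
    """
    page_dir = posixpath.dirname(page_path)
    return posixpath.relpath(target_path, start=page_dir)

if __name__ == "__main__":
    # The example from the wget manual excerpt.
    print(to_relative_link("/foo/doc.html", "/bar/img.gif"))
```

A real mirroring tool also has to find the links inside HTML and CSS, and decide what to do with links to files it did not download. This sketch skips all of that.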
They want to write a check and get back to business, not become a web developer!
(I think that tool is awesome though, and I appreciate the tip!)
1) The intention is that you replace your dynamic site with the static copy, but your visitors are none the wiser. All URLs are the same, as well as the content returned. Might require some .htaccess trickery.
2) It would have to preserve all the images, css, and other assets, some possibly hotlinked. (The Wayback Machine is not awesome at this, understandably)
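A rough sketch of the "all URLs stay the same" part of 1), assuming the web server is configured to serve `index.html` for directory paths. `static_file_for` is a hypothetical helper, and the example paths are made up.

```python
import posixpath
from urllib.parse import urlsplit

def static_file_for(url):
    """Pick the file inside the static copy that should answer this URL."""
    # The query string and fragment are ignored in this sketch.
    path = urlsplit(url).path or "/"
    if path.endswith("/"):
        # /2012/05/premiere/ -> 2012/05/premiere/index.html
        path += "index.html"
    elif not posixpath.splitext(path)[1]:
        # Extensionless pretty URL: store it as a directory index.
        # Many servers will redirect /about to /about/ in this layout.
        path += "/index.html"
    return path.lstrip("/")

if __name__ == "__main__":
    for url in ("http://site.example.com/",
                "http://site.example.com/about",
                "http://site.example.com/wp-content/logo.png"):
        print(url, "->", static_file_for(url))
```

URLs that only differ by query string (`?p=123` style permalinks) are where the .htaccess trickery would come in. Nothing here handles them.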
While I'm on the subject, can anyone tell me the reasoning behind loading blog content with JavaScript? I hate when I visit a blog with NoScript turned on, and get greeted by a blank template. Why does it need JavaScript to grab the page content?
As for loading content with javascript -- I totally agree, this bugs the shit out of me. Also, I browse with cookies disabled by default and it is always frustrating when a site that is only serving content doesn't load properly without cookies. wtf?!
You can make WordPress generate static pages too. They have plugins for that.
The more resources you can cache fully (like the template) the fewer round trips you need to make and/or the shorter those round trips become. It's always the latency that slows us down (by definition), which is why even modern processors have sophisticated cache layers and branch prediction. The further from the CPU the resource is, the slower the interaction will be.
This debate is interesting and a bit funny to me because it's very similar to the old dumb terminal vs personal computer debate. We've swung back to the mainframe, except now it's distributed and we call it the cloud.
My prediction (take it or leave it) is that we'll soon swing all the way back to dynamic client-side sites. And then eventually back to the quantum cloud (heh, "Electron Cloud"). Or something new and more powerful than whatever sits in our pocket or on our desk.
If your blog has a reusable template, wouldn't the images, fonts, and stylesheets that are part of the template get cached on the user's computer, regardless of whether it was dynamic or static?
Or are you talking about caching things serverside?
So a couple of years ago I decided to do static sites for ALL my clients (there may be dynamic components that I normally implement with a back-end data service). I still have a few dozen sites and web apps with DCMS but the goal is to migrate them as well.
1) Use a headless CMS. WordPress on the backend that provides an API which is consumed by a Node app, for example.
2) Shift any user-facing dynamic feature off the CMS. Commenting, login, subscription management, etc., can be handled by purpose-built apps that tie into the CMS-driven site via Javascript, preserving the security and cacheability of the CMS.
That's not to say that it's impossible to drive a large-scale news site with static files. I believe CNN does exactly that with their in-house CMS. But no open-source CMS that generates static files is powerful enough to use in a newsroom context, or popular enough to gain traction.
Just because we don't really have common, enterprise-grade authoring tools for non-technical people that publish static sites anymore doesn't mean that it's not the better way.
There is a minimal difference between a webserver serving static pages and a caching server serving static content. When you get down to it, caching is simply a more flexible approach to the classic (authoring tool) -> static webpage approach. In many ways the only difference is that the authoring tool is a website, not a standalone program.
My point being, sure you can get a more secure `something` by making it more and more static, but you'll probably cripple it somehow.
It's simply a balance you have to find for your use case.
I think the problem is that current dynamic websites are sort of crippled already. Right now even a simple shopping app requires a UI built on HTML and the web. There's no chance of using the command line, automated devices, etc. In the future we might see radically simplified protocols/web services for more universal access.
All of the web apps I make at work are single-page JavaScript apps. But before I start on any of that, I make a REST API. 100% of the interaction between the JavaScript and the web server goes through REST.
There are many reasons for this, but a key one is that it allows easy command-line or programmatic interaction. Much easier than with traditional, server-generated web apps.
This usually boils down to the shopping cart and checkout example... you can always attack the checkout process as it is a unique, dynamic part of the process that no web store wishes to ever be unavailable.
How does one "go static" with web applications that by necessity involve interactions with datastores?
In other cases, it's probably best to have each stateful system as an isolated component. For example, having a dynamic checkout doesn't require the news section to be dynamic. In fact, if you can isolate components like your checkout, you may be able to have someone else manage it for you. For example, at a previous job we used FoxyCart to deal with online checkouts; we just embedded specially-crafted URLs into our pages (although those pages were still running in Drupal!).
That's not the only option, but it's a boss-friendly company to name drop. Most organizations wouldn't blink at the price, especially if it means you can move off dynamic hosting to far cheaper static hosting.
The better solution for both dynamic and static sites is to set up a search appliance like elasticsearch, solr, or algolia. You can use JS to query it and still be static on the server.
If you do set up your own, remember to use a reverse proxy like nginx to avoid exposing elasticsearch directly to the internet.
I guess maybe we need people to use static sites, like trainer wheels on bikes, until they become more security concious.
Becoming "security concious"[sic] doesn't mean outgrowing best practices. If Bruce Schneier used "password" as his password, he wouldn't avoid getting attacked just because he knew it was a bad practice. Likewise, understanding the tradeoffs between static and dynamic Web sites doesn't make someone's dynamic site secure.
As the article points out, even a locked-down, well-tuned dynamic site with CAPTCHA-protected registration forms is orders of magnitude easier to bring down with DDoS attacks, since dynamic sites must perform more work per request, e.g. to render "Hello CaptchaFarmUser99999" at the top of the page. If they don't need to perform more work per request, since all pages are always fully cached, then you've just re-invented static sites :)
Take the classic case of SQL injection. You have string input into your system that turns into string input into a SQL query that turns into string input to a database. That is dangerous because if you don't check on what the input string contains, it might contain nothing, or a semicolon, or it might not be a string at all!
We understand that putting a string directly into a SQL statement is dangerous at this point, but we have yet to fix its root cause - nonexistent protocols or boundaries in most code we write.
What a static language changes in that regard is compiler-checked type signatures on your code. That generally stops you from, say, passing an Integer into something that needs a String. That solves a certain class of problems for sure, and the compiler does it for you every time you change your code, so there is a convenience there.
What static typing doesn't give you is actual data correctness. Things like buffer overflows or SQL injection can still happen with static typing. You could use a language like Scala or Haskell to have stronger/more complex types that would have a more distinct notion of value correctness, and at that point the compiler would be doing most of the work to ensure your program is correct.
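To make the SQL injection case concrete, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. The table, the function names and the hostile input are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", 0), ("root", 1)])

def find_user_unsafe(name):
    # The input is pasted into the query text, so it can change the
    # structure of the query, not just the value being compared.
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()

def find_user_safe(name):
    # The ? placeholder keeps the input on the data side of the boundary.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()

hostile = "nobody' OR '1'='1"

if __name__ == "__main__":
    print(find_user_unsafe(hostile))  # matches every row
    print(find_user_safe(hostile))    # matches nothing
```

The placeholder is a small example of the kind of boundary the comment is asking for: the database driver, not string concatenation, decides where data ends and query begins.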
Leaning on a type system in that regard is basically turning your types into the protocols that determine correctness in your system.
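A sketch of "types as protocols", written in Python with type hints rather than Scala or Haskell, so the compile-time half only applies if you run a checker such as mypy. `Username`, its validation rule and `greeting` are hypothetical names for illustration.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Username:
    """A string that has passed validation; construction enforces it."""
    value: str

    def __post_init__(self):
        # Assumed rule for this sketch: 1-32 lowercase letters, digits
        # or underscores.
        if not isinstance(self.value, str) or not re.fullmatch(
            r"[a-z0-9_]{1,32}", self.value
        ):
            raise ValueError(f"not a valid username: {self.value!r}")

def greeting(user: Username) -> str:
    # The signature is the protocol: callers must hand over a Username,
    # which means the validation has already happened.
    return f"Hello {user.value}"

if __name__ == "__main__":
    print(greeting(Username("alice")))
```

Python itself will not stop `greeting("raw string")` before the program runs; a static checker would flag it, which is the part a statically typed language does for you on every build.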
It is also possible to lean on stronger protocols that check messages in a dynamic language to achieve largely the same thing.
In the end, to write safe, high quality software, you need to define the communication protocols between methods/functions/routines/services and enforce them much as you would with an externally facing REST api.
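On the dynamic side, enforcing a protocol at a function boundary can be sketched like this. `check_message`, `ORDER_SCHEMA` and `place_order` are made-up names, and the schema format (key mapped to expected type) is an assumption for this example, not any particular library's.

```python
def check_message(message, schema):
    """Raise ValueError unless message is a dict matching schema exactly."""
    if not isinstance(message, dict):
        raise ValueError("message must be a dict")
    missing = schema.keys() - message.keys()
    extra = message.keys() - schema.keys()
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    for key, expected_type in schema.items():
        if not isinstance(message[key], expected_type):
            raise ValueError(f"{key} must be {expected_type.__name__}")
    return message

# Hypothetical message shape for an internal "place order" call.
ORDER_SCHEMA = {"sku": str, "quantity": int}

def place_order(message):
    # Check at the boundary, the same way an external REST API would.
    order = check_message(message, ORDER_SCHEMA)
    return f"{order['quantity']} x {order['sku']}"

if __name__ == "__main__":
    print(place_order({"sku": "A1", "quantity": 2}))
```

The check runs on every call rather than once at compile time, which is the tradeoff in enforcement mechanism the thread goes on to describe.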
The difference between a dynamic system with dynamic protocol checking vs a static system with compiler type checking is the mechanism you are using to enforce the protocol and how easy it is to interact with it.
Dynamic systems might be easier to interface with externally because you don't have to understand a complex type, just pass a Hash/Dictionary sort of like a JSON API, vs a static system where you need to use the right types and so on, similar to a SOAP/WSDL API.
Performance is also a consideration, but really when you compare static vs dynamic, it is important to understand that at the end of the day you can write Ruby/Python/PHP that is functionally equivalent to C/C++/Java. They are all ultimately going to be able to do the same kinds of things.
The tradeoff is in how they solve the problem and how well that fits with the team writing the software.