Small "let's play" video: https://www.youtube.com/watch?v=4VX6TZEfqf4
EDIT TO ELABORATE:
Problem 1 is the original problem: parts of your code do not agree on what a particular string represents. This is the "strings problem," the mother of XSS and injection vulnerabilities. [1]
Problem 2 is that the escape_once method papers over these problems, making them harder to detect and keeping you from hunting down the logic errors that cause them. (Because these errors often originate in upstream code, to be safe you need to catch them before the code runs, which is why compile-time methods [2] work best.)
[1] http://blog.moertel.com/posts/2007-08-15-a-bright-future-sec...
[2] http://blog.moertel.com/posts/2006-10-18-a-type-based-soluti...
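To make problem 2 concrete, here is a rough Ruby approximation of what an escape_once-style helper does: escape the HTML special characters, except for an "&" that already looks like the start of an entity. The name escape_once_sketch and its regex are my assumptions, not Rails' source. Running it on its own output changes nothing, so a double-escaping bug upstream becomes invisible, and a string that literally contains "&amp;" can no longer be told apart from one that was already escaped.

```ruby
# Hypothetical approximation of an escape_once-style helper (not Rails' code):
# escape <, >, " and any "&" that does not already start an entity
# such as &amp;, &#39; or &#x27;.
ESCAPE_ONCE_MAP = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze
UNESCAPED_SPECIAL = /[<>"]|&(?!(?:[a-zA-Z]+|#\d+|#[xX][\dA-Fa-f]+);)/

def escape_once_sketch(str)
  str.to_s.gsub(UNESCAPED_SPECIAL) { |ch| ESCAPE_ONCE_MAP.fetch(ch) }
end

raw     = "Tom & Jerry <3"
escaped = escape_once_sketch(raw)
p escaped                                 # => "Tom &amp; Jerry &lt;3"

# Escaping the output again is a no-op, so code that escapes twice
# looks exactly like code that escapes once: the bug is hidden.
p escape_once_sketch(escaped) == escaped  # => true

# A user who literally typed "&amp;" (say, in a post about HTML) looks
# just like pre-escaped input, so the browser will display "&".
p escape_once_sketch("Write &amp; for an ampersand")
# => "Write &amp; for an ampersand"
```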
Rails 3.x reworked escaping entirely and now avoids the re-escaping trap: strings must be flagged as "HTML safe", otherwise they are escaped when they are finally inserted into the document. It still has escape_once, but Rails itself doesn't use it.
You are of course right to blame 2.x for its flaws, but let's not blame entire projects for problems that have been fixed. (Not that Rails 3.x does not have other egregious inefficiencies, but that's another story.)
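The "HTML safe" flag idea can be sketched in plain Ruby as a wrapper type: every value inserted into a page is escaped unless it is explicitly marked safe, and rendered output is itself marked safe, so it is never escaped a second time. The SafeHtml class and the helper names below are hypothetical; Rails 3 builds this on ActiveSupport::SafeBuffer and String#html_safe instead. Using a separate wrapper class rather than a flag on String is closer in spirit to the type-based approach in [2], although here it is still only enforced at run time.

```ruby
# Hypothetical stand-in for Rails' "HTML safe" strings: a value wrapped in
# SafeHtml is trusted markup; any other value is untrusted text.
class SafeHtml
  def initialize(html)
    @html = html.to_s.dup.freeze
  end

  def to_s
    @html
  end
end

HTML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;", "'" => "&#39;" }.freeze

def escape_html(text)
  text.gsub(/[&<>"']/, HTML_ESCAPES)
end

# Untrusted values are escaped exactly once; SafeHtml passes through untouched.
def to_html(value)
  value.is_a?(SafeHtml) ? value.to_s : escape_html(value.to_s)
end

# Concatenate fragments into markup; the result is itself marked safe.
def render_fragments(*values)
  SafeHtml.new(values.map { |v| to_html(v) }.join)
end

title = "<script>alert(1)</script>"
page  = render_fragments(SafeHtml.new("<h1>"), title, SafeHtml.new("</h1>"))
puts page.to_s
# <h1>&lt;script&gt;alert(1)&lt;/script&gt;</h1>

# Feeding rendered output back in does not escape it a second time.
puts render_fragments(page).to_s == page.to_s  # true
```

The trade-off of escaping by default: forgetting to mark trusted markup as safe produces visibly double-escaped output, which is noisy but harmless, whereas forgetting to escape is silent and exploitable.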
I generally stay away from web development, so forgive me if this one is obvious, but why does so much text need HTML escaping in order to render the page? Also, is there a way to quantify how much text that is? A few K per page, or a few hundred K?
Consider what a project's homepage displays:
* Username in the header (probably validated against a specific format, so it likely doesn't need escaping).
* Project summary entered when the project is created.
* Drop-down with list of branch and tag names.
* Latest commit message.
* List of names for all folders and files tracked in git for the project.
* Readme text.
That's just the homepage for a project, and each of those things could contain malicious HTML or JS, so it should be escaped before it is rendered on the page.
Consider a simple Haml template:
  - posts.each do |post|
    %li
      = link_to post.title, post_url(post), class: "post_link"
Note that in Rails 2.3 (which the article describes), Rails doesn't know whether post.title, post_url() or "post_link" have already been escaped. That version dates from a time when the Rails people were fairly lax about, and ignorant of, encoding and sanitization in general.
For the post.title part, the link_to method just passes it through, assuming it's valid HTML, which makes it an injection point if the post was user-submitted. The URL and the {class: "post_link"} hash are passed through escape_once, described in the article.
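The 2.3 behavior described above can be mimicked with two hypothetical helpers: one that, as described for link_to, runs the URL and class through an escape_once-style function but emits the link text untouched, and one that escapes the text as well. The helper names and the simplified escape_once regex are assumptions for illustration, not Rails' implementation.

```ruby
# Approximation of escape_once (not Rails' code): leave "&" alone when it
# already starts an entity, escape every other special character.
ONCE_MAP  = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze
PLAIN_MAP = ONCE_MAP.merge("'" => "&#39;").freeze

def escape_once_sketch(str)
  str.to_s.gsub(/[<>"]|&(?!(?:[a-zA-Z]+|#\d+|#[xX][\dA-Fa-f]+);)/) { |ch| ONCE_MAP.fetch(ch) }
end

# Plain escaping for untrusted text: every special character is escaped.
def escape_html(str)
  str.to_s.gsub(/[&<>"']/, PLAIN_MAP)
end

# Hypothetical helper mimicking the 2.3-era behavior: attributes are escaped,
# but the link text is trusted as-is.
def link_to_23_style(text, url, html_class)
  %(<a href="#{escape_once_sketch(url)}" class="#{escape_once_sketch(html_class)}">#{text}</a>)
end

# Same helper, but the link text is treated as untrusted and escaped.
def link_to_escaping_text(text, url, html_class)
  %(<a href="#{escape_once_sketch(url)}" class="#{escape_once_sketch(html_class)}">#{escape_html(text)}</a>)
end

title = %(<img src=x onerror="alert(1)">)  # a user-submitted post title
puts link_to_23_style(title, "/posts/1?a=1&b=2", "post_link")
# <a href="/posts/1?a=1&amp;b=2" class="post_link"><img src=x onerror="alert(1)"></a>
puts link_to_escaping_text(title, "/posts/1?a=1&b=2", "post_link")
# <a href="/posts/1?a=1&amp;b=2" class="post_link">&lt;img src=x onerror=&quot;alert(1)&quot;&gt;</a>
```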
As you can see, a lot of data may need to be sanitized or encoded: many small pieces of text that add up to significant overhead.
Rails 3.x fixes this by assuming that no string contains sanitized HTML, so whenever you insert data into a template, it is escaped automatically. This is good. If you know that something is already sanitized, you can declare it as such:
  = link_to post.title.html_safe, post_url(post), class: "post_link"
This flags the string as safe, so link_to won't escape it.
The latter is the source of XSS attacks, but the former is a pain too. Say you run a stock photo website, and your database holds images and captions. You decide to use the captions as the alt attributes of your images, and one caption is Cat saying "meow!". If you blindly insert that caption into your HTML without escaping it, you end up with:
  <img src="cat.jpg" alt="Cat saying "meow!"">
And now you've got crappy HTML on your hands. It's the same principle as an XSS attack, but you do it to yourself, with less disastrous (though still unwanted) effects.
EDIT: Hm, it seems the comment I referenced is now gone? phillmv pointed out that cross-site scripting (XSS) is an important reason to use escaping appropriately (and often widely).
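For the alt-attribute example, escaping the caption before interpolating it is enough to keep the attribute intact. The escape_attr and img_tag helpers below are hypothetical, a minimal sketch rather than any framework's API.

```ruby
# Hypothetical helper: escape a value for use inside a double-quoted HTML attribute.
ATTR_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;", "'" => "&#39;" }.freeze

def escape_attr(value)
  value.to_s.gsub(/[&<>"']/, ATTR_ESCAPES)
end

def img_tag(src, alt)
  %(<img src="#{escape_attr(src)}" alt="#{escape_attr(alt)}">)
end

caption = 'Cat saying "meow!"'

# Unescaped: the alt attribute ends at the first quote inside the caption.
puts %(<img src="cat.jpg" alt="#{caption}">)
# <img src="cat.jpg" alt="Cat saying "meow!"">

# Escaped: the quotes become &quot; and the attribute stays well-formed.
puts img_tag("cat.jpg", caption)
# <img src="cat.jpg" alt="Cat saying &quot;meow!&quot;">
```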
Wow, that is... a lot. Is that mostly Rails, or is it user code?
http://meta.discourse.org/t/tuning-ruby-and-rails-for-discou...
Scroll down and you'll see that on an average request, Discourse eats up 230,000 strings (each of which is an object in Ruby).
The entire thread is about relaxing the garbage collector so that it won't run until the request is over.
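Both points can be poked at from plain Ruby: GC.stat(:total_allocated_objects) shows how quickly escaping many small fragments allocates objects (String#gsub returns a new string even when nothing needs escaping), and wrapping a unit of work in GC.disable/GC.enable is a crude way to keep the collector from running until that work is done. The fragment list and the allocations_during and with_gc_deferred helpers are hypothetical, and this is not the tuning the Discourse thread applies. Note that GC.disable is process-wide, so in a multi-threaded server it affects every thread and lets memory grow unchecked for the duration.

```ruby
# Rough illustration only; the helpers and fragment list are hypothetical.
ESC = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;", "'" => "&#39;" }.freeze

def escape_html(str)
  str.gsub(/[&<>"']/, ESC)  # always returns a new String
end

# Count objects allocated while the block runs.
def allocations_during
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Many small, mostly harmless strings, like file names on a project page.
fragments = Array.new(10_000) { |i| "file_#{i}.rb" }
count = allocations_during { fragments.each { |f| escape_html(f) } }
puts "objects allocated while escaping #{fragments.size} fragments: #{count}"

# Crude "GC after the request" pattern: keep the collector off while the
# work runs, then re-enable it and collect once the work is done.
def with_gc_deferred
  GC.disable
  yield
ensure
  GC.enable
  GC.start
end

html = with_gc_deferred { fragments.map { |f| escape_html(f) }.join("\n") }
puts html.lines.size
```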
As always, GH, thanks for sharing.