That's not really the problem. The problem is that there is no distinction between data and control, so everything comes to you in one binary stream. If the control aspect were out-of-band, the problem would really go away.
Length prefixes will just turn into one more thing to overwrite or intercept and change. That's much harder to do when you can't get at the control channel but just at the data channel. Many old school protocols worked like this.
This is the important takeaway here. Changing the encoding simply swaps out one set of vulnerabilities and attacks for another. Separating control flow and data is the actual silver bullet for this category of attacks.
Unfortunately, there’s rarely ever a totally clear logical separation between the two. Anything you want to bucket into “control”, someone else is going to want the client to be able to manipulate as data.
Granted, if you made that control channel stateful, you'd make a lot of problems go away. But you could do that with a combined control/data stream too.
What am I missing? How would an out-of-band control channel make things easier?
That said, I think many issues with the web could be solved by implementing new protocols as opposed to shoehorning everything into HTTP just to avoid a firewall...
So <html>abc</html> would go as
<html><datum 1></html>, where datum 1 refers to the first datum in the data stream, here 'abc'. No matter what trickery you pulled to put another tag, an executable bit or other such nonsense into the datum, it would never be interpreted. This blocks any and all attacks based on tricking the server, or the browser that eventually receives the two streams, into doing something active with the datum: it can only be passive data by definition.
For comparison, take DTMF, which is in-band signalling and so easily spoofed (and with a 'blue box' additional tones may be generated that unlock interesting capabilities in systems on the line), and compare it with GSM, which does all its signalling out-of-band and so is much harder to spoof.
The web is basically like DTMF: if you can enter data into a form and that data is spit back out in some web page to be rendered by the browser later on, you have a vector to inject something malicious, and it takes a very well thought out sanitization process to close off all the ways in which you might do that.
If the web were more like GSM, you could sit there and inject data into the data channel until the cows came home, but it would never ever lead to a security issue.
No amount of extra encoding and checks will ever close these holes completely as long as the data stays 'in band' with the control information.
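To make the <datum 1> idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `<datum N>` token syntax, the names `build_tree`, `Element` and `Text`, and the rule that the control stream may contain nothing but tags and datum references. It is not an existing protocol. The point is only that the data stream is never handed to the tag parser, so whatever it contains ends up as an inert text node.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Text:
    value: str  # inert payload copied from the data stream, never parsed


@dataclass
class Element:
    name: str
    children: list = field(default_factory=list)


# Hypothetical control-stream grammar: only tags and "<datum N>" references.
TOKEN = re.compile(r"<datum (\d+)>|<(/?)(\w+)>")


def build_tree(control: str, data: list) -> Element:
    """Parse the control stream; look up datum references in the data stream."""
    root = Element("#root")
    stack = [root]
    pos = 0
    for m in TOKEN.finditer(control):
        if m.start() != pos:
            raise ValueError("control stream may hold only tags and datum refs")
        pos = m.end()
        datum_index, closing, name = m.groups()
        if datum_index is not None:
            # Datum numbers are 1-based; the payload is attached as-is.
            stack[-1].children.append(Text(data[int(datum_index) - 1]))
        elif closing:
            if len(stack) == 1 or stack[-1].name != name:
                raise ValueError(f"unexpected </{name}>")
            stack.pop()
        else:
            node = Element(name)
            stack[-1].children.append(node)
            stack.append(node)
    if pos != len(control) or len(stack) != 1:
        raise ValueError("malformed control stream")
    return root


# The attacker controls only the data stream.
tree = build_tree("<html><datum 1></html>", ["<script>alert(1)</script>"])
html_node = tree.children[0]
```

Here `html_node.children` holds a single `Text` node whose value is the literal string `<script>alert(1)</script>`; no `script` element is ever created, because the tokenizer only ever sees the control stream.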
I could easily see making <script> and <link> resources required to be separately requested (like images are now -- ignoring data/base64 resources), but we're back to redefining HTML.
I'm not arguing against that...
It's really hard to have these types of debates though, because everyone focuses on different problems of the HTTP/HTML webapp request/response cycle. Like you said, adding separate control/data channels would help, but that doesn't solve SQL injection attacks (which is a whole other class, but that's not really an HTTP/HTML issue, it's a backend issue, and I don't see how you'd avoid that with a simple protocol change). Simply making HTTP stateful could potentially solve a different class of attacks (session hijacking, etc.)...
There are so many attack vectors that I think it does make sense to think about what a replacement for HTTP/HTML would look like. Most of these problems arise from trying to re-engineer a document format (HTML) to support interactive webapps. We should think about how to do this better... (without recreating ActiveX -- shudder).
(html "abc")
This guarantees that no matter what is inside "abc" it simply can't escape into the control stream: (html "This is not (malicious \"boo\")")
This is just a pretty display of what would actually be these bytes: (4:html29:This is not (malicious "boo"))
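A small sketch of that length-prefixed encoding in Python, for anyone who wants to poke at it. The function names `encode` and `decode` are made up for this example, and it handles only nested lists and atoms; treat it as an illustration of the byte layout shown above rather than a complete implementation of any particular S-expression spec.

```python
def encode(node) -> bytes:
    """Encode a nested list of atoms as '(len:bytes...)'."""
    if isinstance(node, (list, tuple)):
        return b"(" + b"".join(encode(child) for child in node) + b")"
    raw = node.encode("utf-8") if isinstance(node, str) else bytes(node)
    return str(len(raw)).encode("ascii") + b":" + raw


def _decode_at(buf: bytes, i: int):
    """Decode one node starting at offset i; return (node, next_offset)."""
    if buf[i:i + 1] == b"(":
        i += 1
        items = []
        while buf[i:i + 1] != b")":
            if i >= len(buf):
                raise ValueError("unterminated list")
            item, i = _decode_at(buf, i)
            items.append(item)
        return items, i + 1
    colon = buf.index(b":", i)  # raises ValueError if there is no prefix
    digits = buf[i:colon]
    if not digits.isdigit():
        raise ValueError("bad length prefix")
    start = colon + 1
    end = start + int(digits)
    if end > len(buf):
        raise ValueError("atom runs past end of input")
    # The atom's bytes are sliced out by length and never scanned.
    return buf[start:end], end


def decode(buf: bytes):
    node, end = _decode_at(buf, 0)
    if end != len(buf):
        raise ValueError("trailing bytes after expression")
    return node


wire = encode(["html", 'This is not (malicious "boo")'])
hostile = decode(encode(["html", ")(6:script"]))
```

`wire` comes out as `(4:html29:This is not (malicious "boo"))`, and `hostile` decodes back to a two-element list whose second atom is the literal bytes `)(6:script`: the decoder skips over the atom by its length, so parentheses inside it are never looked at.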
It doesn't matter what one puts in the atom: it can't escape and damage the control stream.
Or, run your data through stored procedures instead. It took me a while to figure out why stored procedures were so much more secure than regular queries. I finally figured out it was because a stored procedure does exactly what the grandparent post says: It treats all inputs as data with no possibility to run as code.
Perhaps the most naive example: https://pastebin.com/acQqhDvy
I think they're more useful for organization and abstraction than security. Then again, a well organized and smartly abstracted system can lead to better security!
But I think bind parameters are probably a better example of security.
Binding effectively separates the data from the logic. So you define two separate types of things, and then safely join those things together by binding them. It doesn't matter too much whether that happens in the application making a call to the database or in the database in a stored procedure. Obviously this same concept can be applied at many different points along the application stack. The analogous concept in the UI is templating. You define a template and then safely inject data into that template.
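For a concrete picture of binding, here is a minimal sketch using Python's built-in `sqlite3` module. The `users` table and the hostile input string are invented for the example; the `?` placeholder is sqlite3's standard parameter style.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

hostile = "alice' OR '1'='1"

# In-band: the input is spliced into the SQL text, so its quote characters
# are parsed as part of the query and the WHERE clause becomes always-true.
unsafe_sql = "SELECT name FROM users WHERE name = '" + hostile + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Bound: the SQL text is fixed and the input travels separately as a value,
# so it is only ever compared against the name column.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)
).fetchall()
```

The concatenated query returns both rows, while the bound query returns none, since no user is literally named `alice' OR '1'='1`.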
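This isn't well defined. Take this pseudocode stored procedure (OK, it's a Python function):
    # Stand-ins so the example runs; the original leaves these undefined.
    BACKING_STORE = {1: "row one", 5: "row five"}

    def perform_side_effects():
        print("side effect performed")

    def retrieve_relevant_data(user_input):
        if user_input == 1:
            return BACKING_STORE[5]
        elif user_input == 2:
            perform_side_effects()
            return BACKING_STORE[1]
        else:
            return "Go away."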
You can provide any input to that. You could think of this as a function which "treats all input as data with no possibility to run as code" (it never calls eval!). But you could also usefully think of this as defining a tiny virtual machine with opcodes 1 and 2. If you think of it that way, you'll be forced to conclude that it does run user input as code, but the difference is in how you're labeling the function, not in what the function does.
The security gain from a stored procedure, on this analysis, is not that it won't run user input as code. It will! The security gain comes from replacing the full capability of the database ("run code on your local machine") with the smaller, whitelisted set of capabilities defined in the stored procedure.
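One way to picture that reduced capability set, as a rough sketch: the caller's input only selects an entry from a fixed table of operations, and anything outside the table is refused. The `notes` table, the `ALLOWED` mapping and the `run_allowed` helper are all hypothetical names for this example, with `sqlite3` standing in for a real database.

```python
import sqlite3


def make_store():
    """Build a throwaway in-memory database for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO notes (body) VALUES (?)", [("first",), ("second",)]
    )
    return conn


# The whole "instruction set" exposed to callers: two fixed queries.
ALLOWED = {
    "count_notes": "SELECT COUNT(*) FROM notes",
    "latest_note": "SELECT body FROM notes ORDER BY id DESC LIMIT 1",
}


def run_allowed(conn, opcode):
    """Run the query the opcode selects; user input never becomes SQL text."""
    try:
        sql = ALLOWED[opcode]
    except KeyError:
        raise PermissionError(f"operation not permitted: {opcode!r}") from None
    return conn.execute(sql).fetchone()[0]


conn = make_store()
count = run_allowed(conn, "count_notes")
latest = run_allowed(conn, "latest_note")
```

The input still steers what runs, which is the "tiny virtual machine" view, but passing `"DROP TABLE notes"` as the opcode just raises `PermissionError`, because that string is looked up in the table and never executed.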
String escaping SQL? How is anyone thinking that is still a thing in 2017? The problem has been solved for two decades.
* At least with .NET/Entity Framework/LINQ you mock out your DbContext and test your queries with an in-memory List<>
https://msdn.microsoft.com/en-us/library/dn314429(v=vs.113)....