Better HN

jacquesm 8y ago

> if html used length-prefixed tags rather than open/close tags most injection attacks would go away immediately.

That's not really the problem. The problem is that there is no distinction between data and control, so everything comes to you in one binary stream. If the control aspect were out-of-band, the problem would really go away.

Length prefixes will just turn into one more thing to overwrite or intercept and change. That's much harder to do when you can reach only the data channel and not the control channel. Many old school protocols worked like this.


stouset 8y ago

Thank you.

This is the important takeaway here. Changing the encoding simply swaps out one set of vulnerabilities and attacks for another. Separating control flow and data is the actual silver bullet for this category of attacks.

Unfortunately, there’s rarely ever a totally clear logical separation between the two. Anything you want to bucket into “control”, someone else is going to want the client to be able to manipulate as data.

mbreese 8y ago

I'm having a hard time seeing how having separate control and data streams would have an effect here. Using FTP to retrieve a document isn't more secure than HTTP... the problem is in how the document itself is parsed. If you added a separate side channel for requesting data (a la FTP), you'd still have the issue of parsing the HTML on the other side.

Granted, if you made that control channel stateful, you'd make a lot of problems go away. But you could do that with a combined control/data stream too.

What am I missing? How would an out-of-band control channel make things easier?

That said, I think many issues with the web could be solved by implementing new protocols as opposed to shoehorning everything into HTTP just to avoid a firewall...

jacquesm (OP) 8y ago

It makes sure that all your code is yours and that no matter what stuff makes it into the data stream it will never be able to do anything because it is just meant to be rendered.

So <html>abc</html> would go as

<html><datum 1></html>, where datum 1 refers to the first datum in the data stream, here 'abc'. No matter what trickery you pull to put another tag, executable bit or other such nonsense in the datum, it would never be interpreted. This blocks any attack that depends on tricking the server, or the browser that eventually receives the two streams, into doing something active with the datum: it can only be passive data by definition.

For comparison, take DTMF, which is in-band signalling and so easily spoofed (with a 'blue box', additional tones can be generated that unlock interesting capabilities in systems on the line), and compare it with GSM, which does all its signalling out-of-band and so is much harder to spoof.

The web is basically like DTMF: if you can enter data into a form and that data is spit back out in some web page to be rendered by the browser later on, you have a vector to inject something malicious, and it will take a very well thought out sanitization process to get rid of all the ways in which you might do that.

If the web were more like GSM you could sit there and inject data into the data channel until the cows came home, but it would never ever lead to a security issue.

No amount of extra encoding and checks will ever close these holes completely as long as the data stays 'in band' with the control information.
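A minimal Python sketch of the <datum 1> idea above. The names `TEMPLATE` and `render` are hypothetical, and slots are zero-indexed here. Because today's browsers only accept a single in-band stream, this sketch has to approximate "never interpreted" by escaping each datum with the standard library's `html.escape` when the two streams are joined; a real out-of-band design would ship the two streams separately and need no escaping at all.

```python
import html

# Control stream: markup authored by the site. An integer is a slot
# reference into the data stream (the "<datum 1>" idea, zero-indexed).
TEMPLATE = ["<html>", 0, "</html>"]


def render(template, data):
    """Join trusted markup with untrusted data, escaping every datum."""
    parts = []
    for node in template:
        if isinstance(node, int):
            # A datum is only ever emitted as text, never as markup.
            parts.append(html.escape(data[node]))
        else:
            parts.append(node)
    return "".join(parts)


page = render(TEMPLATE, ["abc"])
attack = render(TEMPLATE, ["<script>alert(1)</script>"])
```

Here `page` is the plain document, while the tag in `attack` comes out as inert text because the data list can never add nodes to the template.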

mbreese 8y ago

I guess what I'm getting at is that it isn't HTTP that's the issue -- it's HTML. I'm all for a control channel in HTTP. But you're still stuck parsing <html><datum_1></html>, and it is difficult to think about reorganizing each tag as a separate datum. At what level do you stop converting the data into separately requestable bits? How would you even code it? And making the tags themselves length-prefixed (like csexp's) wouldn't entirely solve the problem.

I could easily see making <script> and <link> resources required to be separately requested (like images are now -- ignoring data/base64 resources), but we're back to redefining HTML.

I'm not arguing against that...

It's really hard to have these types of debates though, because everyone focuses on different problems of the HTTP/HTML webapp request/response cycle. Like you said, adding separate control/data channels would help, but that doesn't solve SQL injection attacks (which is a whole other class, but that's not really an HTTP/HTML issue, it's a backend issue and I don't see how you'd avoid that with a simple protocol change). Simply making HTTP stateful could potentially solve a different class of problems, such as session hijacking.

There are so many attack vectors that I think it does make sense to think about what a replacement for HTTP/HTML would look like. Most of these problems arise from trying to re-engineer a document format (HTML) to support interactive webapps. We should think about how to do this better... (without recreating ActiveX -- shudder).


eadmund 8y ago

Or, e.g. my preferred encoding of HTML:

    (html "abc")

This guarantees that no matter what is inside "abc" it simply can't escape into the control stream:

    (html "This is not (malicious \"boo\")")

This is just a pretty display of what would actually be these bytes:

    (4:html29:This is not (malicious "boo"))

It doesn't matter what one puts in the atom: it can't escape and damage the control stream.
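As a rough illustration of the byte encoding above, here is a small Python sketch of a length-prefixed encoder and decoder. The function names `encode` and `decode` are hypothetical, atoms are assumed to be UTF-8 strings, and lists stand in for parenthesised expressions. It shows only the parsing property and has no protection against someone who can rewrite the length prefix itself.

```python
def encode(node):
    """Encode a str atom or a list of nodes as a length-prefixed expression."""
    if isinstance(node, str):
        raw = node.encode("utf-8")
        return str(len(raw)).encode("ascii") + b":" + raw
    return b"(" + b"".join(encode(child) for child in node) + b")"


def decode(buf, pos=0):
    """Parse one node starting at pos; return (node, next_pos)."""
    if buf[pos:pos + 1] == b"(":
        pos += 1
        items = []
        while buf[pos:pos + 1] != b")":
            item, pos = decode(buf, pos)
            items.append(item)
        return items, pos + 1
    colon = buf.index(b":", pos)
    length = int(buf[pos:colon])
    start = colon + 1
    # The atom is skipped by length, so its bytes are never scanned
    # for parentheses or quotes.
    return buf[start:start + length].decode("utf-8"), start + length


wire = encode(["html", 'This is not (malicious "boo")'])
tree, _ = decode(wire)
```

The decoder never looks inside an atom, which is why the parentheses and quotes in the second atom cannot close the list early.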


51stpage 8y ago

SQL injection attacks are an excellent example where code and data are mixed. One solution is to do a lot of clever escaping of 'attackable' characters that instruct the DBMS to stop treating a character string as data and start executing things [1]. Escaping attackable characters attempts to partition data from code. This usually works but not perfectly.

Or, run your data through stored procedures instead. It took me a while to figure out why stored procedures were so much more secure than regular queries. I finally figured out it was because a stored procedure does exactly what the grandparent post says: It treats all inputs as data with no possibility to run as code.

[1] https://xkcd.com/327/

stdgy 8y ago

Hmm. I'm going to have to disagree about Stored Procedures providing security. You can do all sorts of bad things using stored procedures that may result in unintended code execution!

Perhaps the most naive example: https://pastebin.com/acQqhDvy

I think they're more useful for organization and abstraction than security. Then again, a well organized and smartly abstracted system can lead to better security!

But I think bind parameters are probably a better example of security.

Binding effectively separates the data from the logic. So you define two separate types of things, and then safely join those things together by binding them. It doesn't matter too much whether that happens in the application making a call to the database or in the database in a stored procedure. Obviously this same concept can be applied at many different points along the application stack. The analogous concept in the UI is templating. You define a template and then safely inject data into that template.
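To make the contrast concrete, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. The table name, the row and the hostile input are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")
conn.execute("INSERT INTO students VALUES ('Alice')")

hostile = "x' OR '1'='1"

# Concatenation: the input is spliced into the SQL text and changes its logic.
unsafe_sql = "SELECT name FROM students WHERE name = '" + hostile + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Binding: the SQL text is fixed; the input travels separately as a value.
safe_rows = conn.execute(
    "SELECT name FROM students WHERE name = ?", (hostile,)
).fetchall()
```

The concatenated query returns every row because the quote in the input rewrites the WHERE clause, while the bound query simply looks for a student with that odd name and finds none.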

thaumasiotes 8y ago

> I finally figured out it was because a stored procedure does exactly what the grandparent post says: It treats all inputs as data with no possibility to run as code.

This isn't well defined. Take this pseudocode stored procedure (OK, it's a python function):

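    # Placeholder stand-ins so the function runs; the contents are arbitrary.
    BACKING_STORE = {1: "record one", 5: "record five"}

    def perform_side_effects():
        pass  # imagine a write, a delete, an outgoing email...

    def retrieve_relevant_data(user_input):
        if user_input == 1:
            return BACKING_STORE[5]
        elif user_input == 2:
            perform_side_effects()
            return BACKING_STORE[1]
        else:
            return "Go away."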

You can provide any input to that. You could think of this as a function which "treats all input as data with no possibility to run as code" (it never calls eval!). But you could also usefully think of this as defining a tiny virtual machine with opcodes 1 and 2. If you think of it that way, you'll be forced to conclude that it does run user input as code, but the difference is in how you're labeling the function, not in what the function does.

The security gain from a stored procedure, on this analysis, is not that it won't run user input as code. It will! The security gain comes from replacing the full capability of the database ("run code on your local machine") with the smaller, whitelisted set of capabilities defined in the stored procedure.
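One way to picture that reduced capability set is the sketch below. It uses Python's `sqlite3` module, and a dictionary named `ALLOWED` (a hypothetical name, as is `call`) stands in for the stored procedures, since SQLite itself has none. Callers still choose what runs, but only from two fixed operations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO notes (body) VALUES ('hello')")

# The whole "instruction set" exposed to callers: two named operations.
ALLOWED = {
    "get_note": "SELECT body FROM notes WHERE id = ?",
    "add_note": "INSERT INTO notes (body) VALUES (?)",
}


def call(op, arg):
    """Run one whitelisted operation; anything else is refused."""
    if op not in ALLOWED:
        raise PermissionError(f"unknown operation: {op!r}")
    return conn.execute(ALLOWED[op], (arg,)).fetchall()


first = call("get_note", 1)
call("add_note", "DROP TABLE notes")  # stored as text, not executed
second = call("get_note", 2)
```

The operation name plays the role of the opcode: user input selects which statement runs, but it cannot introduce a statement that is not already in the table.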


edoceo 8y ago

Parameterized query builders are available in every SQL library.

String-escaping SQL? How is anyone thinking that is still a thing in 2017? The problem has been solved for two decades.


scarface74 8y ago

Stored procedures are bad in so many ways: they are harder to deploy and revert than code, harder to unit test*, and harder to refactor, and every implementation I have ever seen that keeps business logic in stored procedures instead of microservices/packages/modules has been a nightmare to maintain.

* At least with .NET/Entity Framework/LINQ you can mock out your DbContext and test your queries against an in-memory List<>

https://msdn.microsoft.com/en-us/library/dn314429(v=vs.113)....


mdpopescu 8y ago

Yeah, I thought the same thing until I found a colleague who was very fond of calling exec_sql in stored procedures, with the argument being a concatenation of the sp arguments.

mike_hearn 8y ago

I think you mean parameterised queries. Stored procedures are a slightly different thing.
