The ups and downs of the HTTP header

(blog.keithcirkel.co.uk)

58 points · Keithamus · 12y ago · 33 comments

26 comments · 8 top-level

teddyh · 12y ago · 8 in thread

A number of errors in this article make me wary:

1. The "request" line in HTTP is not a header - it is the request, which can have associated headers. The headers are all “about” the request. The request itself is not a header, and does not follow the header syntax. (The historical reason for this is that the request line was defined in HTTP 0.9, which did not have headers.)

2. ISO-8859-1 is not “a crappy Windows character set”. It is an international standard, distinct from what Microsoft was using at the time (code page 437 was the standard for MS-DOS in the US). Later, Windows switched to code page 1252, which is a copy of ISO-8859-1 except for some extra glyphs in the bytes the ISO standard defined as control characters.
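The difference teddyh describes can be checked against Python's built-in codecs. This is a small sketch (not from the article) that decodes every byte value as both `latin-1` and `cp1252` and records where they disagree; it assumes CPython's codec tables, which leave five bytes in the 0x80 to 0x9F range undefined for `cp1252`, and `codec_differences` is a hypothetical helper name.

```python
def codec_differences():
    """Map each byte value to (latin-1 char, cp1252 char) wherever the two codecs disagree."""
    diffs = {}
    for b in range(256):
        latin = bytes([b]).decode("latin-1")  # ISO-8859-1: byte b is always U+00bb
        try:
            windows = bytes([b]).decode("cp1252")
        except UnicodeDecodeError:
            # CPython leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined in cp1252.
            windows = None
        if latin != windows:
            diffs[b] = (latin, windows)
    return diffs


diffs = codec_differences()
# Only the C1 control range (0x80 to 0x9F) differs; printable Latin-1 is shared.
print(sorted(diffs) == list(range(0x80, 0xA0)))  # True
print(diffs[0x93][1], diffs[0x94][1], diffs[0x80][1])  # curly quotes and the euro sign
```

In other words, CP1252 only reassigns the 32 bytes that ISO-8859-1 reserves for C1 control characters.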

Keithamus (OP) · 12y ago

Thanks for the clarification about the request line, I'll edit the article to point that out!

I mostly referred to it as a "crappy Windows character set" because A) it has a limited set of characters, mostly Western European, and B) it's pretty much only used by Windows these days. While the term "crappy Windows character set" is perhaps not entirely accurate, it is a short, tongue-in-cheek summary of ISO-8859-1.

wereHamster · 12y ago

Unicode also has a limited set of characters: mostly those that the Unicode Consortium has agreed to include in the standard.

teddyh · 12y ago

> Thanks for the clarification about the request line, I'll edit the article to point that out!

(Apparently you weren’t thankful enough to upvote. EDIT: never mind, I must have been mistaken.)

A more accurate description of ISO-8859-1 would be “a crappy 8-bit character set mostly only still relevant for Windows which uses its own embraced and extended version, CP1252.”

pornel · 12y ago

For compatibility reasons browsers don't use ISO-8859-1; they interpret it as Windows-1252 instead (that de facto requirement has now been codified in the WHATWG Encoding Standard <http://encoding.spec.whatwg.org/>).
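A minimal sketch of the behaviour pornel describes, assuming Python: a declared label such as `ISO-8859-1` is decoded as windows-1252. The `WINDOWS_1252_LABELS` set is a hand-picked subset of the labels the Encoding Standard maps to windows-1252 (consult the spec for the full list), and `decode_like_browser` is a hypothetical name. Bytes that CPython's `cp1252` codec leaves undefined fall back to the matching C1 control code point.

```python
# Hand-picked subset of labels that the WHATWG Encoding Standard treats as windows-1252.
WINDOWS_1252_LABELS = {
    "iso-8859-1", "iso8859-1", "latin1", "l1",
    "ascii", "us-ascii", "cp1252", "windows-1252",
}


def decode_like_browser(data: bytes, label: str) -> str:
    """Decode bytes the way a browser would for a declared charset label (sketch)."""
    if label.strip().lower() not in WINDOWS_1252_LABELS:
        return data.decode(label)  # other labels: defer to Python's own codec
    chars = []
    for b in data:  # windows-1252 is single-byte, so decoding per byte is safe
        try:
            chars.append(bytes([b]).decode("cp1252"))
        except UnicodeDecodeError:
            chars.append(chr(b))  # undefined in CPython's table: map to the C1 control
    return "".join(chars)


# A page labelled ISO-8859-1 but containing CP1252 smart quotes:
print(decode_like_browser(b"\x93quoted\x94", "ISO-8859-1"))
```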

donavanm · 12y ago

To quibble further, the request line typically won't have a "host" section. It's almost always a URI path, and an HTTP/1.1 client sends a separate Host header. The request line must also include the protocol and version, e.g. HTTP/1.0.

ethomson · 12y ago

To quibble further still: the request line includes the protocol and version only if the client is HTTP/1.0 or newer. HTTP/1.0 servers must "recognize the format of the Request-Line for HTTP/0.9 and HTTP/1.0 requests" (RFC 1945).
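Both quibbles show up in a small parser. This is a Python sketch under simplifying assumptions (single spaces as separators, no validation of the target), and `parse_request_line` and `effective_host` are hypothetical names. An HTTP/0.9 simple request has no version, and in the usual origin-form the host comes from the `Host` header; only the absolute-form (typically sent to proxies) carries the host in the request line itself.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlsplit


def parse_request_line(line: str) -> Tuple[str, str, str]:
    """Return (method, target, version), accepting HTTP/0.9 simple requests."""
    parts = line.rstrip("\r\n").split(" ")
    if len(parts) == 2:
        method, target = parts
        if method != "GET":
            raise ValueError("HTTP/0.9 only defines GET")
        return method, target, "HTTP/0.9"  # no version token at all
    if len(parts) == 3 and parts[2].startswith("HTTP/"):
        return parts[0], parts[1], parts[2]
    raise ValueError(f"malformed request line: {line!r}")


def effective_host(target: str, headers: Dict[str, str]) -> Optional[str]:
    """Absolute-form targets carry the host; otherwise use the Host header."""
    if "://" in target:
        return urlsplit(target).netloc
    for name, value in headers.items():
        if name.lower() == "host":  # header names are case-insensitive
            return value.strip()
    return None


method, target, version = parse_request_line("GET /index.html HTTP/1.1\r\n")
print(version, effective_host(target, {"Host": "example.com"}))
```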

ethomson · 12y ago

Indeed. The claim "Deflate sucks compared to Gzip" jumped out at me. A more thorough discussion here would be helpful, something along the lines of "While deflate would be the superior choice (though narrowly), it has historically been poorly implemented in servers and user-agents and should therefore be avoided for compatibility".

kayfox · 12y ago

It jumped out at me as well... because I'm under the impression that there is little difference between the two and that they both use the same compression algorithm.
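kayfox's impression matches the formats: both codings use the DEFLATE algorithm and differ only in the wrapper. HTTP's `deflate` means the zlib format (RFC 1950), `gzip` means the gzip format (RFC 1952), and much of the historical trouble came from servers sending raw DEFLATE (RFC 1951) under the `deflate` label. This Python sketch, using only the standard `zlib` module, shows that the three formats share one compressed body. `decode_http_deflate` is a hypothetical example of the lenient decoder clients ended up needing.

```python
import zlib


def compress(data: bytes, wbits: int) -> bytes:
    """Compress with identical settings; wbits picks the wrapper (-15 raw, 15 zlib, 31 gzip)."""
    c = zlib.compressobj(6, zlib.DEFLATED, wbits)
    return c.compress(data) + c.flush()


def decode_http_deflate(body: bytes) -> bytes:
    """Lenient decoder for Content-Encoding: deflate (sketch)."""
    try:
        return zlib.decompress(body)  # zlib-wrapped, as the spec intends
    except zlib.error:
        return zlib.decompress(body, -15)  # raw DEFLATE, as some servers sent it


payload = b"The ups and downs of the HTTP header. " * 50
raw = compress(payload, -15)
zlib_wrapped = compress(payload, 15)
gzip_wrapped = compress(payload, 31)

# Same DEFLATE body: zlib adds a 2-byte header and a 4-byte Adler-32 trailer,
# gzip adds a 10-byte header and an 8-byte CRC-32 + length trailer.
assert zlib_wrapped[2:-4] == raw
assert gzip_wrapped[10:-8] == raw
print(len(raw), len(zlib_wrapped), len(gzip_wrapped))
```

So on the wire, `deflate` is actually the smaller of the two by a dozen bytes; the practical argument for `gzip` was interoperability, not compression.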

IgorPartola · 12y ago · 3 in thread

Why is the UA header so screwed up, aside from the historical issues with it? Isn't it time that we replace it with something a bit more sane and structured? It seems the idea of detecting the browser vs detecting browser features goes back and forth. Sure, on the client side, where you have access to the DOM and the JavaScript runtime, it's great to know whether you can use the placeholder attribute in a text input, but server-side you need to decide which video file to serve to the client, and this gets tricky.

Instead, why don't we have something like this?

    OS: Windows
    OS-Version: 8.1
    Browser: Chrome
    Browser-Version: 18.5

(Not suggesting the format, just the type of data.)

That way we can ditch the stupid stuff such as "like Gecko", which means nothing, and focus on actually useful things.

_greim_ · 12y ago

Web developers have historically tended to write shitty UA detection logic en masse, which has in turn incentivized browser makers to carefully craft UAs to break as few of them as possible. Basically, avoiding the all-too-common "this site requires IE6 or higher" message when you visit a site in IE11. The same situation would likely develop with the proposal above, which is essentially no different from the original intent for UA strings. The most viable option would be for all browser makers to just simultaneously disable them. Like a band-aid; right off!

Keithamus (OP) · 12y ago

Given your scenario of serving the right video to browsers, you shouldn't need to do UA sniffing, because the browser should send the right Accept headers so you can do proper content negotiation.

UAs these days should only be good for one thing, which is analytics. The browser should provide all the necessary information for other stuff through its other headers, such as Accept. Of course, I emphasise should because there's a wee bit of fantasy in that statement.
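Keithamus's alternative to UA sniffing is server-driven content negotiation on `Accept`. The Python sketch below is one way to do it, assuming the server holds a list of available video types in its own order of preference. `parse_accept` and `choose_media_type` are hypothetical names, media-type parameters other than `q` are ignored, and the most specific matching range decides a type's weight. Whether browsers actually send a useful `Accept` for video requests is the "fantasy" part and is not addressed here.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media range, q) pairs."""
    ranges = []
    for item in header.split(","):
        media, *params = [part.strip() for part in item.split(";")]
        if not media:
            continue
        q = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        ranges.append((media.lower(), q))
    return ranges


def specificity(media_range: str) -> int:
    """Rank ranges: */* < type/* < type/subtype."""
    if media_range == "*/*":
        return 0
    return 1 if media_range.endswith("/*") else 2


def matches(media_range: str, media_type: str) -> bool:
    if media_range == "*/*":
        return True
    range_type, _, range_sub = media_range.partition("/")
    main, _, sub = media_type.lower().partition("/")
    return range_type == main and range_sub in ("*", sub)


def choose_media_type(accept: str, available: List[str]) -> Optional[str]:
    """Pick the available type with the highest q; ties keep the server's order."""
    ranges = parse_accept(accept)
    best, best_q = None, 0.0
    for candidate in available:
        hits = [(specificity(r), q) for r, q in ranges if matches(r, candidate)]
        if not hits:
            continue
        _, q = max(hits)  # the most specific matching range wins
        if q > best_q:
            best, best_q = candidate, q
    return best


print(choose_media_type("video/webm, video/*;q=0.5", ["video/mp4", "video/webm"]))
```

Returning `None` leaves the server free to answer with 406 Not Acceptable or a default format.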

noselasd · 12y ago

Your suggestion will work for a few years perhaps, and then it'll deteriorate to the current state over time.

julien_c · 12y ago · 3 in thread

Slightly off topic, but this is the first post I've read on a Ghost-powered blog – I think it looks great.

nly · 12y ago

Plain black text on a white background is great now?

peter_l_downs · 12y ago

Yes, but it's #3A4145 on white.

/pedantry

noblethrasher · 12y ago

Yes.

MichaelGG · 12y ago · 2 in thread

Well as part of a rant, I'll point out two bizarro-world features of HTTP headers: Line folding and comments.

You can add arbitrary CRLFs to any header, so long as you start the next line with whitespace. Conforming implementations need to treat every continuation line as part of the same header. It's very annoying to implement (and implementations of similar protocols do not all agree!), and there's no benefit, unless you're composing HTTP headers to read on an 80-column layout. And that kind of thing has no place in a computer protocol.
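To make the folding rule concrete, here is a minimal Python sketch of unfolding a header block, assuming CRLF line endings; `unfold_headers` is a hypothetical name. It joins each continuation line to the previous value with a single space. RFC 7230 later deprecated line folding ("obs-fold"), and replacing each fold with spaces is one of the responses it permits, alongside rejecting the message.

```python
from typing import List, Tuple


def unfold_headers(block: str) -> List[Tuple[str, str]]:
    """Parse a CRLF-delimited header block, joining folded continuation lines."""
    fields: List[Tuple[str, str]] = []
    for line in block.split("\r\n"):
        if line == "":
            break  # an empty line ends the header section
        if line[0] in " \t":
            # Continuation line: it belongs to the previous field's value.
            if not fields:
                raise ValueError("continuation line before any header field")
            name, value = fields[-1]
            fields[-1] = (name, (value + " " + line.strip(" \t")).strip(" "))
        else:
            name, colon, value = line.partition(":")
            if not colon or name != name.strip():
                # No colon, or whitespace before the colon: reject rather than guess.
                raise ValueError(f"malformed header line: {line!r}")
            fields.append((name, value.strip(" \t")))
    return fields


print(unfold_headers("X-Long: first part\r\n  second part\r\nHost: example.com\r\n\r\n"))
```

Two parsers that disagree on even one of these branches will see different header values for the same bytes, which is the root of the proxy problems described further down the thread.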

Comments. Seriously read this from the spec:

  Comments can be included in some HTTP header fields by surrounding
  the comment text with parentheses. Comments are only allowed in
  fields containing "comment" as part of their field value definition.
  In all other fields, parentheses are considered part of the field
  value.

That's even more bizarre. It also means the parser needs to know which header it is operating on. It just adds room for misimplementation and security issues (confused deputy), and it hurts performance. It's only useful if you're writing HTTP headers by hand and feel the need to comment them for... I can't think of a legit case.
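The point that the parser must know which field it is handling becomes obvious once you write the code. A hedged Python sketch: `COMMENT_FIELDS` is an assumed set (User-Agent, Server and Via, whose grammars include comments in the HTTP/1.1 specs), and `strip_comments` and `clean_field_value` are hypothetical names. Comments can nest and contain backslash escapes, while parentheses inside quoted strings, or in any other field, are ordinary data.

```python
COMMENT_FIELDS = {"user-agent", "server", "via"}  # assumed set; check the spec per field


def strip_comments(value: str) -> str:
    """Remove (possibly nested) parenthesised comments, respecting quotes and escapes."""
    out = []
    depth = 0
    in_quotes = False
    i = 0
    while i < len(value):
        ch = value[i]
        if in_quotes:  # only reachable outside comments
            out.append(ch)
            if ch == "\\" and i + 1 < len(value):
                out.append(value[i + 1])  # quoted-pair: keep the escaped character
                i += 2
                continue
            if ch == '"':
                in_quotes = False
        elif depth > 0 and ch == "\\":
            i += 2  # quoted-pair inside a comment: skip it, even if it is "(" or ")"
            continue
        elif ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            if ch in " \t" and out and out[-1] in " \t":
                pass  # collapse whitespace left behind by a removed comment
            else:
                if ch == '"':
                    in_quotes = True
                out.append(ch)
        i += 1
    if depth:
        raise ValueError("unbalanced comment")
    return "".join(out).strip(" \t")


def clean_field_value(name: str, value: str) -> str:
    """Only fields whose grammar allows comments get them stripped."""
    if name.lower() in COMMENT_FIELDS:
        return strip_comments(value)
    return value  # elsewhere, parentheses are part of the value


print(clean_field_value("User-Agent", "Mozilla/5.0 (Windows NT 6.1; (nested)) Gecko/20100101"))
print(clean_field_value("X-Note", "a (b)"))
```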

"Human readable" computer protocols are debatable (parsing rules always seem to become more difficult, which is very bad), but "human writable" is just silly.

ChickeNES · 12y ago

This tripped me up to no end when I had to implement a web proxy in one of my intro to CS classes. I couldn't find this mentioned in the standard anywhere and different browsers treated it differently.

MichaelGG · 12y ago

I've discovered exploitable holes "in-the-wild" due to SIP using the same inane parsing rules. Proxy A asserts security and billing. Server B processes the message but instead of reading Proxy A's assertions, it reads "cutely formed" data directly from the client.

Fixing is a royal pain, because some systems require the behaviour to be one way or another.

"Fortunately" security in VoIP is such a joke that tricks like this aren't the biggest issue and so far, I've not seen any such attempts in any attacks.

rplnt · 12y ago · 1 in thread

A bit of trivia on why Opera claims to be 9.80: they used 10.00 in the beta of Opera 10 and found out that many sites' sniffers couldn't process a two-digit version number. So with the final release (and from then until the death of the browser) they used Opera/9.80 and put the actual version elsewhere in the string.
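The failure rplnt describes is easy to reproduce. Below is a Python sketch of one plausible kind of naive sniffer, next to a more tolerant parse; both functions are hypothetical, and the sample UA string is illustrative of the `Opera/9.80 ... Version/10.00` pattern rather than copied from a specific build.

```python
import re
from typing import Optional, Tuple

# Illustrative UA in the style Opera used from version 10 onwards.
OPERA_10_UA = "Opera/9.80 (Windows NT 6.1; U; en) Presto/2.2.15 Version/10.00"


def naive_major_version(ua: str) -> Optional[int]:
    """The kind of sniffer that broke: it assumes a single-digit major version."""
    match = re.search(r"Opera/(\d)\.", ua)
    return int(match.group(1)) if match else None


def opera_version(ua: str) -> Optional[Tuple[int, int]]:
    """Prefer the Version/ token, then fall back to Opera/x.y or 'Opera x.y'."""
    match = re.search(r"Version/(\d+)\.(\d+)", ua) or re.search(r"Opera[/ ](\d+)\.(\d+)", ua)
    return (int(match.group(1)), int(match.group(2))) if match else None


# A beta-style "Opera/10.00" string defeats the naive sniffer entirely...
assert naive_major_version("Opera/10.00 (Windows NT 6.1; U; en)") is None
# ...and comparing versions as strings puts 10 before 9.
assert "10.00" < "9.80"
print(opera_version(OPERA_10_UA))
```

Freezing the first token at 9.80 kept sniffers like `naive_major_version` working, at the cost of making the UA string lie.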

That being said, people who sniff UA string to serve different content (or even block the user) should end up in hell. I'd start with Google.

webignition · 12y ago

> That being said, people who sniff UA string to serve different content (or even block the user) should end up in hell.

Goodness me how I could rant endlessly on this subject.

I operate an automated web frontend testing service, and much of that centres around retrieving an HTML document and running some tests against it.

I have tried very hard to be nice and fair and to set appropriate UA strings, such as ones featuring only the product name and relevant version numbers. Unfortunately, because some services alter their responses based on the UA string, this is not possible.

My product features the word 'test' in its name. Some server-side services return a 404 or a 500 if the UA string contains 'test' in any form. Because of this, I can't include the full product name in the UA string and expect all tests for all end users to work in cases where they really should. Some others respond similarly if the UA string is only 'agent'.

The number of services that respond differently to a blank UA string is significant. The same goes for UA strings that don't resemble those of common browsers.

On a related subject, I'd love it if everyone supported the simple HEAD method consistently.

Some services respond as expected and return only the response headers. Some services respond fairly with either a '405 Method Not Allowed' or '501 Not Implemented', giving me the option to try again with an equivalent GET request. Some services send a 404 or 500 in response to a HEAD in cases where the equivalent GET request works just fine.

And lastly, https://myspace.com/ responds with nothing to a HEAD request, so you have to wait for the request to time out, even though an equivalent GET works just fine.
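For the HEAD problems webignition lists, a client can fall back to GET. This Python sketch uses only the standard `http.client` module; `fetch_headers`, `send` and the choice of which statuses trigger a fallback are assumptions, and the explicit timeout means a server that never answers HEAD costs a bounded wait rather than an indefinite one. The `sender` parameter allows a fake to be swapped in for testing.

```python
import http.client
from typing import Callable, Dict, Tuple
from urllib.parse import urlsplit

Response = Tuple[int, Dict[str, str]]

# "Not supported" answers; add 404 and 500 if you distrust HEAD handling even more.
FALLBACK_STATUSES = {405, 501}


def send(method: str, url: str, timeout: float = 10.0) -> Response:
    """Issue one request and return (status, headers) without reading the body."""
    parts = urlsplit(url)
    conn_cls = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request(method, target)
        response = conn.getresponse()
        return response.status, dict(response.getheaders())
    finally:
        conn.close()


def fetch_headers(url: str, sender: Callable[[str, str], Response] = send) -> Response:
    """Try HEAD first; fall back to GET on timeouts, dropped connections or 'not supported'."""
    try:
        status, headers = sender("HEAD", url)
    except (OSError, http.client.HTTPException):  # socket timeouts are OSErrors too
        return sender("GET", url)
    if status in FALLBACK_STATUSES:
        return sender("GET", url)
    return status, headers
```

Usage is just `status, headers = fetch_headers("https://example.com/")`. Falling back on 404 and 500 as well would cover the servers that mishandle HEAD, at the price of a second request whenever a resource really is missing.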

crazygringo · 12y ago · 1 in thread

> Opera 12 then just gets weird on us. It says "Generic English please, or U.S. English, if not then uh... Arabic! If not then perhaps Catalan? If not then Danish, or if not that then Dutch. Ok perhaps Greek? Finnish?... Go home Opera, you're drunk."

Most amusing part. Seriously, I can't imagine why Opera sends all these languages in its request. Bizarre.

throwaway0094 · 12y ago

Not only that, but ... prioritized!
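The prioritization comes from q-values in `Accept-Language`. A minimal Python sketch of reading them, with `parse_accept_language` as a hypothetical name: entries are sorted by weight, ties keep the order the browser sent, and `q=0` entries are dropped because they mean "not acceptable". The sample header is made up for illustration and is not Opera's actual one.

```python
from typing import List, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Return (language tag, q) pairs, most preferred first."""
    entries = []
    for item in header.split(","):
        tag, _, params = item.partition(";")
        tag = tag.strip()
        if not tag:
            continue
        q = 1.0
        params = params.strip()
        if params.lower().startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # malformed weight: treat as not acceptable
        if q > 0:
            entries.append((tag, q))
    # sorted() is stable, so equal weights stay in the order the client sent them.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


print(parse_accept_language("en,en-US;q=0.9,ar;q=0.8,ca;q=0.7,da;q=0.6"))
# [('en', 1.0), ('en-US', 0.9), ('ar', 0.8), ('ca', 0.7), ('da', 0.6)]
```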

nmc · 12y ago

Interesting article, but for the part about the User-Agent header, I really liked the history lesson by Aaron Andersen [1] from 2008.

[1] http://webaim.org/blog/user-agent-string-history/

yukkurishite · 12y ago

Can't say I like the design of the page, but a good read nonetheless. Though after all those warnings, I expected it to be much longer. Is it really that long an article?
