Edit: As mentioned below, this isn't correct: escaping a URL should still return the same resource, and on S3 an escaped + differs from an unescaped +.
> My point is that the spec requires + to be escaped only inside the querystring.
So what? What the standard mandates for query strings is irrelevant here. It's up to the server how to interpret and map the URLs. "Unconventional and unfortunate" - yes, but breaking the HTTP spec? No.
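To make the disagreement concrete, here is a rough Python sketch. It assumes, purely for illustration, that the server decodes the path the way a form decoder would (`unquote_plus`). That is a stand-in for the behaviour described in this thread, not S3's actual code, and the key name is made up.

```python
from urllib.parse import unquote, unquote_plus

# Two spellings of the same path: a literal "+" and its percent-escaped form.
literal = "/my+file.txt"
escaped = "/my%2Bfile.txt"

# Plain percent-decoding: both spellings map to the same key.
path_style = (unquote(literal), unquote(escaped))

# Form-style decoding (assumed stand-in for the behaviour discussed here):
# the literal "+" becomes a space, while the escaped one stays a "+".
form_style = (unquote_plus(literal), unquote_plus(escaped))

print(path_style)
print(form_style)
```

With plain decoding the two spellings land on the same key; with form-style decoding they don't, which is the complaint.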
They had implemented their own HTTP client, but forgot to add the "Host" header to requests, which is required by HTTP/1.1.
Interestingly, this client sent requests only to their own services, which means that they either released it without testing it or the backend once accepted faulty requests.
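For anyone who hasn't hand-rolled a request before, a minimal sketch of what a well-formed HTTP/1.1 GET looks like on the wire. The function name and the example host are made up for illustration; this is not the client from the linked issue.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a bare-bones HTTP/1.1 GET request, including the Host header."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",       # required in every HTTP/1.1 request
        "Connection: close",
        "",                    # blank line terminates the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


request = build_get_request("example.com", "/index.html")
print(request.decode("ascii"))
```

Drop the `Host:` line and you have the kind of request being described here, which a strict HTTP/1.1 server is allowed to reject.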
[1]: https://github.com/awslabs/amazon-kinesis-producer/issues/61
As an anecdote, about 15 or 20 years ago I wrote my own web browser. It was highly rudimentary, obviously, though browsers were much easier to implement back then anyway. I was too lazy to read the HTTP spec (it was a hobby project and I was young and impatient), so a lot of what I did was trial and error. I wasn't sending a Host header either, but it took a long while before I ran into any sites that rejected my HTTP requests. The web landscape was very different back then and IPs were plentiful, but it just goes to show how servers have coded around bad clients for years.
This would still be a red flag, as the service in question is their instance metadata service that provides authentication tokens.
Something that important should be integration-tested with the actual service.
Perhaps I don't understand the issue you're discussing, but how would the client working on third-party services be a red flag when that is the desired behavior?
@dang can you please add (2010) to the title?
Little bit of hyperbole in the title imo. S3 has generally been very good at embracing the fundamental principles of HTTP and REST, leaving aside corner cases like this.
URLs and URIs have separate standards from HTTP and they have changed over time (been replaced by newer ones).
Many years ago it was common to encode a space as a + sign. For example, the PHP function urlencode[1] does the same thing with a + sign. If you're a PHP user, don't use this function unless you know you need to. There are better functions now.
As far as I can tell, this traces its history back to encoding for forms[4]. It's been used far beyond the encoding for forms and maybe someone can explain why.
It's also not just PHP whose function works that way. In Python, urlencode encodes a space as a + (at least in 2.7.x).
I remember working on the web many years ago where "+" is what was used. This may have been a spec misinterpretation or something else. In any case, it was common enough.
Note, I'm not saying it was right. Just not uncommon.
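A quick sketch of the difference on the encoding side, using Python 3's `urllib.parse` (in 2.7 the same functions live in `urllib`). The sample string is arbitrary. `quote` with `safe=""` is roughly the counterpart of PHP's rawurlencode, and `quote_plus` of PHP's urlencode.

```python
from urllib.parse import quote, quote_plus, urlencode

text = "a b+c"

# Percent-encoding as used in paths: space becomes %20, "+" becomes %2B.
path_encoded = quote(text, safe="")

# Form-style encoding: space becomes "+", a real "+" becomes %2B.
form_encoded = quote_plus(text)

# urlencode builds a query string and uses the form-style rules by default.
query = urlencode({"q": text})

print(path_encoded)  # a%20b%2Bc
print(form_encoded)  # a+b%2Bc
print(query)         # q=a+b%2Bc
```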
[1] https://www.ietf.org/rfc/rfc1738.txt
[2] https://www.w3.org/TR/html401/
[3] https://www.ietf.org/rfc/rfc2396.txt
[4] https://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.1
Don't leave me hanging! What are the better functions now?
And here is where you'd ask that question, a coding forum https://stackoverflow.com/questions/996139/urlencode-vs-rawu...
And here is where you'd answer that question, a coding forum https://stackoverflow.com/questions/996139/urlencode-vs-rawu...
;)
Kidding aside, IIRC "rawurlencode" is the RFC compliant one.
That way you could opt in to the standard conformant behaviour if you require it, but they can still keep backward compatibility.
Immediately mark this behaviour as deprecated and switch over to proper '+' == '+' behaviour later.
edit: LiquidFire's idea is better.
Otherwise, if S3 does not use HTTP, we would need to see the S3 specification to determine whether the implementation Amazon uses is in violation.
In earlier times, we would have both the ability and the balls to treat that unwillingness to uphold the rules we all set out with as damage to the Internet, and route around it. But sadly, AWS has become too big to fail, so the engineers introduce special cases into their products and deploy them.
AWS support is explicitly acknowledging that it's an issue, while giving a rational reason why it probably won't be fixed (even if you disagree with the reason). The back-compat concern is unfortunate, but a good argument can be made that changing the behaviour is not in users' interests either (beyond just being a cost to AWS to implement the change).
This is the opposite of that.