This means that if you have an endpoint that returns HTML or JSON depending on the requested content type, and you try to serve it from behind Cloudflare, you risk serving cached JSON to HTML user agents or vice versa.
I dropped the idea of supporting content negotiation from my Datasette project because of this.
(And because I personally don't like that kind of endpoint - I want to be able to know if a specific URL is going to return JSON or HTML).
E.g.
/fred.html - returns html
/fred.md - returns markdown
/fred.json - returns json
(I’m guessing that /fred also defaults to html.)
Though he was a strict fundamentalist RESTafarian in other ways, he held that this had the benefits of
1. Working
2. Being readily understood
I’ve seen worse ideas. Only downside is that the urls look a bit ugly.
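A minimal sketch of the extension scheme described above, assuming a hypothetical `split_format` helper and an illustrative suffix table (neither comes from any particular framework). It splits a request path into the resource name and the media type implied by its extension, falling back to HTML when there is none:

```python
import posixpath

# Hypothetical mapping from URL suffix to media type; purely illustrative.
EXTENSION_TYPES = {
    ".html": "text/html",
    ".md": "text/markdown",
    ".json": "application/json",
}


def split_format(path, default="text/html"):
    """Return (resource, media_type) for a request path such as /fred.json."""
    root, ext = posixpath.splitext(path)
    if ext in EXTENSION_TYPES:
        return root, EXTENSION_TYPES[ext]
    # No recognised extension: the whole path names the resource.
    return path, default
```

Because the format is part of the URL, a cache keyed only on the URL never mixes up representations.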
If /fred returns HTML, then the URL is perfectly fine, since the only real consumer of fred.md or fred.json are automated systems/API clients, and they couldn't care less what the URL looks like, only that it's predictable.
That's not true. They do, but you have to add it to the Vary header: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Va...
This way user-agents know that the Accept header is involved in the final form of the response. (As another example, Firefox also doesn't take Accept into consideration when caching locally by default, but it does with Vary: Accept.)
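To make the difference concrete, here is a toy simulation (not how any real CDN or browser is implemented). `TinyCache` and `origin` are hypothetical names; the cache either keys on the URL alone or also on the request headers the origin listed in Vary:

```python
def origin(url, headers):
    """Hypothetical origin: JSON or HTML depending on Accept, always varies on it."""
    if "application/json" in headers.get("accept", ""):
        body = '{"name": "fred"}'
    else:
        body = "<h1>fred</h1>"
    return body, ["accept"]  # body, header names the origin lists in Vary


class TinyCache:
    """Toy shared cache; honor_vary=False mimics a cache that ignores Vary."""

    def __init__(self, honor_vary):
        self.honor_vary = honor_vary
        self.vary_for = {}  # url -> header names from the last Vary seen
        self.store = {}     # cache key -> body

    def _key(self, url, headers):
        names = self.vary_for.get(url, []) if self.honor_vary else []
        return (url,) + tuple(headers.get(n, "") for n in names)

    def fetch(self, url, headers):
        key = self._key(url, headers)
        if key in self.store:
            return self.store[key]
        body, vary = origin(url, headers)
        self.vary_for[url] = vary
        # Recompute the key now that the Vary header names are known.
        self.store[self._key(url, headers)] = body
        return body
```

With `honor_vary=False`, a JSON request that follows an HTML request for the same URL gets the cached HTML back, which is the failure mode described at the top of the thread.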
Their documentation specifically mentions Vary as non-functional[1] outside a paid plan, and even then only for images[2].
[1] https://developers.cloudflare.com/cache/concepts/cache-contr...
[2] https://developers.cloudflare.com/cache/advanced-configurati...
> vary — Cloudflare does not consider vary values in caching decisions. Nevertheless, vary values are respected when Vary for images is configured and when the vary header is vary: accept-encoding.
Interestingly, one of the big CDN providers did have controls in their UI for explicitly allowing/disallowing Vary header entries, but they disabled it for us at some point (i.e. it was still in the UI but greyed out). I assumed that once we hit a certain level of traffic it was too computationally expensive? Ever since, I've avoided any kind of fancy header/response variance in APIs just in case I end up in the same situation. It is rarely a necessity. IIRC, the only thing they continued to support variance-wise was gzip (i.e. content-encoding).
It's also worth noting they were extremely conservative with query parameters too. Also to reiterate, this was very high traffic and high volume with expectations of low latency, so probably not applicable to most people using CDNs for static website assets.
The accept header is passed along, so your server can respond however it wants for dynamic content/not cached by CDN.
When building a public scientific database I really want the URL identifying the item in the database to return the page for that item when I enter it in a browser but to return the appropriately structured data for that item when requested with "Accept: application/ld+json" or "application/rdf+xml" by a linked data library.
So it's unfortunate that there's no good way to support this with common CDNs.
Of course I always make it so that appending "?type=json" or "?type=xml" gets you the appropriate document.
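A rough sketch of that fallback, assuming a hypothetical `choose_type` function and an assumed mapping of `?type=json` to JSON-LD and `?type=xml` to RDF/XML (the comment does not say which exact documents are returned). The query parameter wins when present; otherwise the Accept header is consulted with a deliberately naive substring check, and HTML is the default:

```python
from urllib.parse import parse_qs, urlsplit

# Assumed mapping; the original comment only names the parameter values.
TYPE_PARAM = {
    "json": "application/ld+json",
    "xml": "application/rdf+xml",
}


def choose_type(url, accept=""):
    """Pick a response media type from ?type=..., then Accept, then HTML."""
    query = parse_qs(urlsplit(url).query)
    requested = query.get("type", [""])[0]
    if requested in TYPE_PARAM:
        return TYPE_PARAM[requested]
    # Naive check: ignores q-values and wildcards for brevity.
    for media_type in TYPE_PARAM.values():
        if media_type in accept:
            return media_type
    return "text/html"
```

Since the parameter is part of the URL, the `?type=` variants remain cacheable even behind a cache that ignores Vary.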
Vary: Cookie, Authorization, Accept, Accept-Language
It's a documented decision that they've made: https://developers.cloudflare.com/cache/concepts/cache-contr...
Some effort to clean up the useless and duplicate features would be good.
I don't think anyone's complaint about server side rendering has ever been that it was too complex.
They added real-time search results to our Django SSR site in 2 lines of HTMX.
It’s the opposite of added complexity.
IMO, it's a great option for rapid prototyping and for projects that can be done with whatever backend stack you already have.
Lots of use cases could benefit from a hypermedia flow.
I do tend to prefer actual file extensions though. Friendlier for humans (curl https://endpoint/item.json vs curl -H "accept: application/json" https://endpoint/item), and it is visible in logging infrastructure, sharable, etc.
http://endpoint/item?format=json or ?type=json
I never ever had a problem with that. The only reason to use a file extension would be if the request would take no query parameters.
It would never occur to me to use header information for this.
What I don’t necessarily like about query params for this, is that their usage implies filtering. It’s a cultural assumption but it’s there.
Another point: you might actually have a literal file at a similar path that you serve, typically when you cache responses (managed cache or straight from your app server). Using file extensions gives you a bit more natural affordances. It’s just overall less clunky.
I sometimes prefer file extensions, but then consuming code can often get filled with appending or removing or changing formats.
Broadly, I like content negotiation as a concept because it describes what you actually want and is much more typed (as far as it can be). Adding file extensions feels very stringly typed processing.
Both have problems, you pick your poison and look wistfully at the greener grass on the other side.
Say, an RSS feed being served as a formatted and styled page to a browser, or a client that accepts the usual XML.
Hypermedia APIs should be rate limited as well, because otherwise people will just go and screen scrape (like many HN apps do, because HN doesn't offer an API). All a "data" API does is make the scraper's job easier.
> Data APIs typically use some sort of token-based authentication / Hypermedia APIs typically use some sort of session-cookie based authentication
So what. Any web framework worth its salt can support multiple authentication / credential mechanisms - the only "benefit" I can see from limiting cookie authentication is to make life for bad actors with cookie-stealer malware harder (like GitLab does, IIRC).
Announcement https://www.ycombinator.com/blog/hacker-news-api
Then proxies or other IPs will be used anyway.
Maybe in the trivial case saying "I'd prefer a JPG to a PNG" can be an asymmetrical choice. But in all the interesting use cases I can think of, e.g. where there are competing representation formats, you'd want the server to be able to respond to a HEAD with the choices.
That's the kind of thing you can put in Swagger, but that might lead to hoisting the client's choice into the API, away from Content Negotiation.
The whole point of content negotiation is that the client tells the server which types it wants the content in, with weights to determine preference.
The server then works out the best match for what's requested.
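As a simplified illustration of that matching step (it does not implement the full precedence rules of the HTTP specification, such as more specific media ranges overriding wildcards), the hypothetical helpers `parse_accept` and `best_match` below sort the client's media ranges by weight and return the first type the server can offer:

```python
def parse_accept(header):
    """Return [(media_range, q)] sorted by descending q; ties keep header order."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        entries.append((media_range, q))
    return sorted(entries, key=lambda e: -e[1])  # sorted() is stable


def best_match(header, available):
    """Return the first offered type matching the client's preferences, or None."""
    for media_range, q in parse_accept(header):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        for offered in available:
            if media_range in (offered, "*/*") or (
                media_range.endswith("/*")
                and offered.startswith(media_range[:-1])
            ):
                return offered
    return None
```

A `None` result corresponds to the case where the server has nothing the client asked for, which it could answer with 406 Not Acceptable or with a default representation.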
Why do you need to ask about what's available? Just ask for it if you support it, and then handle the response based on its type.
As a client, you do know what to ask for: everything you support, listed in order of preference.
Why would you, as a client, care whether the server supports some format that you don't understand?
It's ironic he wrote an article about how the industry uses the term "REST API" incorrectly, because he himself keeps using the term "API" incorrectly. If an "API" is tightly coupled to a single application, it's not an "Application Programming Interface"... it's just a part of your application.
An API is supposed to be an interface on top of which multiple applications may rest. Particularly without a specific frontend in mind - so web, desktop app, mobile app, as a component of other services and so on. Obviously if it serves site-specific HTML snippets, that's not the case. The only reason he advocates this whole thing is because without it HTMX won't work, and in this way I find it quite myopic as a position. But if I was pushing HTMX I'd also be compelled to figure out reasons to make it sound good.
So from that PoV, talking about "Content Negotiation in HTML APIs" loses meaning, as what he has is not an API in the first place, it's just his HTML website, but with some partial requests in the mix. And of course you wouldn't mix your API and your site. But this does not imply you can't and shouldn't use Content Negotiation either on your site, or in your API. You simply shouldn't use them to mix two things that never made sense to mix.
A lot of his blog posts would become completely unnecessary if he just said "don't mix your website and your API, and the HTMX partial requests are part of your website, not your API". Alas, he's stuck on this odd formulation of "hypermedia API", constantly having to clarify himself and making things as clear as mud.
Quoting Roy Fielding:
"The design of the Web had to shift from the development of a reference protocol library to the development of a network-based API, extending the desired semantics of the Web across multiple platforms and implementations. A network-based API is an on-the-wire syntax, with defined semantics, for application interactions. A network-based API does not place any restrictions on the application code aside from the need to read/write to the network, but does place restrictions on the set of semantics that can be effectively communicated across the interface."
Specifically a hypermedia API is something browsers (and the implicit backbone of the internet) understand very well. In fact you have to go out of your way to serve an application that your browser doesn’t inherently understand.
The clarity that seems to be lacking here is not necessarily a fault of the author. We re-purposed some of these terms (REST, API etc.) to serve specific needs. But then kind of lost the understanding of what we had before.
I think that’s not our fault though. Standardization didn’t move fast enough and the quality we needed in the mobile context wasn’t there.
It seems like building your application around a single API that is also used to provide data externally saves you time, but you end up polluting that API with presentation concerns needed to drive the application's reports/grids/views. It's not worth the mental energy to consider how changes you need to make for presentation might affect the 'purity' of your public API. Returning hypermedia from the 'internal' API just forces that separation: there's no expectation that this 'data' is being returned for consumption by anything except the app that uses it.
Even worse if the json api is public, as it may need to restrict capabilities.
I have basically never seen a nice user-facing API when it’s been split out. Sometimes that’s fine, but at least for enterprise use cases having a “real” API just feels like table stakes in so many domains for getting bigger clients onboard.
Originally I thought they meant 2 JSON APIs. One that's tightly coupled with the HTML to handle all the "ajax" requests, and the other for 3rd parties to fetch arbitrary bits of data.
Otherwise, I know what you mean. My company has our internal RPCs and then our customer-facing API and the customer one hasn't been updated in ages and it's just a thin layer over some old internal RPCs we used to have and now we have to maintain backwards compatibility but keep breaking it anyway.
I really think it’s so valuable to take what you use internally seriously, exposing and documenting it to end users is a great way to avoid hackiness, and just leads to more regular designs IMO. More work of course but … not that much in the vast majority of cases
Side note: I guess a better title for this blog post would be "Don't share endpoints between machines and humans". If the consumer of an endpoint is a human, it's probably a bad idea to make machines use the same endpoint. Content negotiation is fine, for example if I want to have an API return binary data instead of JSON (if, for example, I'm on an IoT device with sparse resources). This is fine because it's exactly the same API, with just the data format being different. And in both cases it's consumed by a machine/computer.
In the case of the frontend/HTML, the consumer is a human (and not a machine), which — as the article mentions — adds a bunch of constraints that are not applicable to machines: a need for pagination (because scrolling through a thousand results can be annoying), a need for ordering (because I want the most relevant results first), displaying related content (as the article explains). The machine API doesn't necessarily need any of these features.
Can't we handle this situation by regarding each version of the JSON output as a separate content type (which is arguably the semantically correct thing anyway) and then letting the server pick the most recent output version that the client supports?
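One way this could look, sketched with a made-up vendor media type (`application/vnd.example.vN+json`) and a hypothetical `pick_version` function; q-values are ignored for brevity. The server intersects the versions named in the Accept header with the ones it supports and takes the newest:

```python
import re

# Assumption: the server can still produce versions 1 to 3 of its JSON output.
SUPPORTED_VERSIONS = {1, 2, 3}

# Hypothetical vendor media type with the version embedded in the name.
PATTERN = re.compile(r"application/vnd\.example\.v(\d+)\+json")


def pick_version(accept):
    """Return the newest version both sides support, or None if there is none."""
    requested = {int(v) for v in PATTERN.findall(accept)}
    usable = requested & SUPPORTED_VERSIONS
    return max(usable) if usable else None
```

An older client that only lists v1 keeps getting v1, while a newer client listing v1 and v2 is served v2.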
I might consider using it for an API whose primary purpose is to support a specific client app that I also control, so users running older versions of the client app still get the desired results, but I don't think the additional elegance is a suitable trade-off for a general-use-by-others API sadly.
Little company-specific binary file formats are just the worst.
I feel like a good API is limited in what it accepts, and this alone is enough to say one should not do content negotiation unless forced to.
No, I don't think so. Why do you think so?
> I feel like a good API is limited in what it accepts, and this alone is enough to say one should not do content negotiation unless forced to.
Content negotiation does not change what the API accepts. It changes the format of the response.
The author also suggests:
> The alternative is to ... splitting your APIs. This means providing different paths (or sub-domains, or whatever) for your JSON API and your hypermedia (HTML) API.
I believe the alternative has actually been the norm. For example, many front-end frameworks encode UI states in the URL, and it's not sustainable to keep UI states and data APIs aligned in the long term.
In other words content negotiation is useful to be able to respond intelligibly. If a client asks you for json but not html, it might not make sense to return html.
For data, Json is the absolute king. For content, html is king. There is very little to negotiate.
The only case where I needed that feature was when data scientists wanted to download data from my API and needed a bunch of formats (parquet, CSV, TSV). But then they did not really grok content negotiation and asked for a query param. So finally I think this is like a lot of html features: half baked and from a different time. Html would do well to drop it.
And isn't "Hypermedia Application Programming Interface" wrong too, since it is generally not for an application at all, but rather an interface (non-programming, "for humans" as the author says) for displaying multimedia documents? (I guess you get inevitable, if not necessarily good, feature creep as soon as you start including something like forms; see also forms in PDFs.)
Personally I prefer sticking to the standards. At least that way when you move between projects you know what you’re getting into.
But everyone has their own conventions these days. It’s all fragmenting.
Nah I think he's right and it's coherent to avoid HTTP subtleties in that web architecture.