FWIW, the document's own URI is terrible: 'https://www.w3.org/Provider/Style/URI' - who could have any idea what the page is about from that? And what if the meaning of the word 'Provider' or 'Style' changes in x years from now? :) You could argue that the meaning/usage of 'URI' has already changed, because practically no-one uses that term any more. Everyone knows about URLs, not URIs. Not many people could tell you what the difference was. So the article's URI has already failed by its own rules.
No, a URL doesn't necessarily have to give you the title of the article, even if having some related words in it might be good for SEO value. If you paste it in plain text or similar, add a description to it. Here's how:
Cool URIs Don't Change: https://www.w3.org/Provider/Style/URI
There, now the reader will know what this is about.
I'd still get far more value out of this:
Additionally, widespread use of web search engines has made URI stability less relevant for humans. Bookmarks are not the only way to find a leaf page by topic again. A dedicated person might find that archival websites have preserved content at its old URI.
Some of this is allowed to happen because the content is ultimately disposable, expires, or has limited relevance outside a narrow audience. Some company websites are little more than brochures. Documents and applications that are relevant within organizations can be communicated out of band. Ordinary people and ordinary companies don't want to be consciously running identifier authorities forever.
The reason for the eventual demise of the URL will simply be the fact that the concept of "resource" will just not be sufficient enough to describe every future class of application or abstract behavior that the web will enable.
It depends on how you define a "resource" and which value you attribute to that resource. And this is exactly the crux: it is out of the scope of the specification. It's entirely left to those who implement URIs within a specific knowledge domain or problem domain to define what a resource is.
Far more important than "resource" is the "identifier" part. URIs are above all a convention that allows for minting globally unique identifiers that can be used to reference and dereference "resources", whatever those might be.
It's perfectly valid to use URIs that reference perishable resources that only have a limited use. The big difficulty is in appraising resources and judging how much need there is to focus on persistence and longevity. Cool URIs are excellent for referencing research (papers, articles,...), identifying core concepts in domain-specific taxonomies, identifying natural/cultural objects, or endorsing information as an authority,...
The fallacy, then, is reducing URIs to the general understanding of how the Web works: the simple URL you type in the address bar that allows you to retrieve and display a particular page. If Google et al. end up stripping URLs from user interfaces and making people believe that you don't need URIs, inevitably a different identifier scheme and a new conceptual framework will need to be developed just to be able to do what the Web is all about today: naming and referencing discrete pieces of information.
Ironically, you will find that such a framework and naming scheme bears a big resemblance to, and solves the same basic problems as, what the Web has been doing for the past 30 years. And down the line, you will discover the same basic problem Cool URIs are solving today: that names and identifiers can change or become deprecated as our understanding and appreciation of information changes.
In the late 90's - early 2000's, HTML started being pushed into fields that, in my opinion, were unrelated (remember Active Desktop?). Before you had time to react, HTML was being used to pass data between applications. At the time I was already doing embedded stuff, and I remember being astonished to learn that I had to code an HTML parser/server/stack on my small 16-bit micro because some jerk thought it was a good idea to pass an integer using HTML (SOAP, for example).
In the meantime, HTML was being dynamically generated, and then dynamically modified in the browser, and then modified back in the server using the same thing you use to modify it in the browser. It's a snowball that will implode, sooner or later.
(1) some operators only care about a handful of the URLs under their domain;
(2) hardly anyone uses link relations, so most links are devoid of semantic metadata and are essentially context-free, requiring a human to read the page and try to guess the purpose of the link;
(3) so many 'resources' are now entire applications, and the operators of these applications sometimes find it undesirable to encode application state into the URI, so for these you can only get to the entry point -- everything else is ephemeral state inside the browser's script context.
But I disagree with the statement that "the reason for the eventual demise of the URL will simply be the fact that the concept of 'resource' will just not be sufficient enough to describe every future class of application or abstract behavior that the web will enable."
URIs are a sufficient abstraction to accommodate any future use-case. It's a string where the part before the first colon tells you how to interpret the rest of it. It'd be hard to get more generic, yet more expressive.
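That genericity is easy to see with any standard URI parser: the scheme is whatever precedes the first colon, and interpretation of the remainder is delegated to that scheme. A quick sketch with Python's standard library (the example URIs, including the made-up future scheme, are purely illustrative):

```python
from urllib.parse import urlsplit

# The scheme is everything before the first colon; the rest is
# scheme-specific. Any future scheme fits the same grammar.
for uri in ["https://example.com/page",
            "mailto:alice@example.com",
            "urn:isbn:0451450523",
            "somefuturescheme:opaque-payload"]:
    parts = urlsplit(uri)
    print(parts.scheme, "->", uri[len(parts.scheme) + 1:])
```

Note that the parser doesn't need to know anything about `somefuturescheme` to split it correctly; that is the whole point of the design.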
The demise of URLs, if it ever comes to pass, will be due to politics or fashion: e.g. browser vendors not implementing support for certain schemes, lack of interoperability around length limits, concerns about readability and gleanability, and vertical integration around content discovery.
The article states early on, “Except insolvency, nothing prevents the domain name owner from keeping the name.” As it turns out, insolvency is a pretty significant source of URL rot, but so is non-renewal of domains by choice or by apathy, whether for financial reasons or mere personal energy (“who is my registrar again? Where do I go to renew?”), especially by individuals. You start a project and ten years later your interest has waned.
Domains are an increasingly abundant resource as TLDs proliferate. Why not default to a model where you pay once up front for the domain, and thereafter continued control is contingent on maintaining a certain percentage of previously published resources, and if you fail at that some revocable mechanism kicks in that serves mirrored versions of your old URLs. Funding of these mirrors comes from the up-front domain fees. Design of the mechanism is left as an exercise for the reader :-)
- The UK leaving the EU means British companies can't keep their .eu domains, unless they have a subsidiary in the EU.
- A trademark dispute can mean someone loses a domain.
If limited per customer it'd still be a similar situation, probably involving lots of 'fake' accounts and registrant details.
Years ago .info domains were being sold very cheaply. Their registrations skyrocketed and the quality of the average .info domain clearly went down.
Namespace pollution. What if my great-great grandson wants my user name on Google? I took it. Similarly, I took the .net domain with my last name.
Spam, squatting, maybe.
2016: https://news.ycombinator.com/item?id=11712449
2012: https://news.ycombinator.com/item?id=4154927
2011: https://news.ycombinator.com/item?id=2492566
2008 ("I just noticed that this classic piece of advice has never been directly posted to HN."): https://news.ycombinator.com/item?id=175199
also one comment from 7 months ago: https://news.ycombinator.com/item?id=21720496
http://www.pathfinder.com/money/moneydaily/1998/981212.moneyonline.html
This consists of:
0. Access protocol
1. Hostname/DNS name
2. Arbitrarily chosen path hierarchy
3. File extension
This is really a description of where to find a document ("locator", not "identifier"). So, if you are:
- re-organizing / cleaning up your file structure
- changing or hiding the file extension
- enabling HTTPS
- migrating files to a different domain name
then the URL WILL change. What are you going to do? Not clean up your space anymore? Stick to HTTP? So URLs DO change. That's just the reality.
If you want something that does not change, don't link to a location but link to the content directly, e.g.:
- git hashes do not change
- torrent/magnet Links don't change
- IPFS links do not change.
Or use a central authority, that stewards the identifier:
- DOI numbers don't change
- ISBN numbers don't change
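As a concrete illustration of the content-addressing idea, a git blob ID is just the SHA-1 of the content with a small header prepended, so identical bytes always yield the identical identifier:

```python
import hashlib

def git_blob_id(data: bytes) -> str:
    # Git's object ID for a blob: sha1("blob <size>\0" + content).
    # Same bytes in, same identifier out -- the name can never drift
    # away from the content it denotes.
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()

# Matches `git hash-object` on a file containing "hello\n".
print(git_blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

The flip side, of course, is that a content address can only ever name one immutable version; any edit mints a new identifier.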
The article addresses this by reminding you that though URIs often look like paths, they can be arbitrarily mapped.
By all means move the resource, but put a redirect under the old URI. This means old links continue to work, which is the key point of the article.
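One way to keep the old URI alive is a tiny mapping layer in front of the content. Here is a minimal WSGI sketch of that idea (the old path and new target are invented for illustration):

```python
# Minimal WSGI app that keeps old URIs alive via 301 redirects.
# The old->new mapping below is hypothetical example data.
REDIRECTS = {
    "/old/report.html": "/reports/2020/annual",
}

def app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    if path in REDIRECTS:
        # Old link still works: the client is sent to the new home.
        start_response("301 Moved Permanently",
                       [("Location", REDIRECTS[path])])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"content for " + path.encode()]
```

In practice the same table is usually expressed as `Redirect permanent` lines in Apache or `return 301` blocks in nginx; the point is that the old-to-new mapping is data, not page structure.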
I have tried to do it a few times, and eventually just gave up. Carrying forward bad naming decisions from the past is a tremendous effort. When cleaning up the house, I also don't leave sticky notes at the places I removed documents from.
On top of this:
- When using static site generators, it's not even possible to do 301 redirects (you would have to use an ugly, slow JS version).
- It does not help if you don't own the old DNS name anymore.
The arbitrary path hierarchy is not so bad. Better than every URI just being https://domainname.com/meaninglesshash. You can also stick a short prefix in front, like https://domainname.com/v1/money/1998/etc, so that all documents created after a reorg can use a different prefix. If your reorg is so severe that there's no way to keep access to old documents under their old URI, even if it has its own prefix, it seems unlikely they'll be made available in any other location. In that context you can imagine the article is imploring you "please don't delete access to old documents".
Your remaining objections, for host name and access, boil down to "don't use URIs at all, and don't bother to avoid changing them". As I type this comment I'm starting to realise that was your whole point, but it was a bit buried alongside minor objections to this particular example. It's also perhaps a bit of an extreme point of view. Referencing a git hash alongside a URI is sensible, but on its own it's pretty useless, and many web pages won't have anything analogous.
Hostname, well perhaps if a company has been merged/sold.
Path/query is really down to information architecture and planning that early on can go a long way, e.g. contact, faq belonging in a /site subdirectory.
File extension doesn't really matter nowadays
Main thing is there's no technical reason for the change. I recently saw someone wanting to change the URLs of their entire site because they now use PHP instead of ASP. They could configure their webserver to serve those pages with PHP and save the outside world a redirect and twice as many URLs to think about.
I really wish HTTPS hadn't changed the URL scheme so you could host both HTTPS and fallback HTTP under the same URL. However most HTTPS sites will redirect http://domain/(.*) to https://domain/$1 (or at least they should) so this doesn't need to break URLs.
This is excellent. I wish more people would make your distinction between URL and URI. URIs really are supposed to be IDs. When put in that parlance, it's hard to say that IDs should change willy-nilly on the web. That said, I think that does deprioritize a global hierarchy / taxonomy for a fundamentally graph-like data structure.
> If you want something that does not change, don't link to a location but link to content directly
I see motivation for this, but I've personally found this to be equally as problematic as blending the distinction between URIs and URLs. Most "depth" and hierarchy that's in URLs is stuff that ideally would be in the domain part of the URL. For instance:
http://company.com/blog/2019/02/10-cool-tips-you-wouldnt-bel...
would really map to:
http://blog.company.com/2019/02/10-cool-tips-you-wouldnt-bel...
and the "blog" subdomain would be owned by a team. You could imagine "payments", "orders", or whatever combo of relevant subdomains (or sub-subdomains). In my experience this hierarchical federation within an organization is not only natural, it's inevitable: Conway's Law.
So I do very much believe that the hierarchy of content and data is possible without needing a flat keyspace of ids. Just off the top of my head, issues with the flat keyspace are things like ownership of namespaces, authorization, resource assignment, different types of formats/content for the same underlying resources etc. Hierarchies really do scale and there's reason for them.
That said, most sites (the effective 'www' part of the domain) are really materialized _views_ of the underlying structure of the site/org. The web is fundamentally built to do this mashup of different views. Having your "location" be considered a reference "view" to the underlying "identity" "data" would go a long way to fixing stuff like this.
DOI and ISBN are as much locations as URL.
Content based URN are the only option.
> Historical note: At the end of the 20th century when this was written, "cool" was an epithet of approval particularly among young, indicating trendiness, quality, or appropriateness. In the rush to stake our DNS territory involved the choice of domain name and URI path were sometimes directed more toward apparent "coolness" than toward usefulness or longevity. This note is an attempt to redirect the energy behind the quest for coolness.
It's 2020 and "cool" still has that same meaning, as an informal positive epithet. I believe "cool" is the longest surviving informal positive epithet in the English language.
"Cool" has been cool since the 1920s, and it's still cool today. "Cool" has outlived "hip," "happening," "groovy," "fresh," "dope," "swell," "funky," "bad," "clutch," "epic," "fat," "primo," "radical," "bodacious," "sweet," "ace," "bitchin'," "smooth," and "fly."
My daughter says things are "cool." I predict that her children will say "cool," too.
Isn't that cool?
> Slang meaning "superior, classy, clever" is attested from 1893. Sense of "stylish" is from 1922.
> A 1599 dictionary has smoothboots "a flatterer, a faire spoken man, a cunning tongued fellow."
It may be time to bring that one back. "Did you see Keith chatting up that girl at the bar? Total smoothboots."
I enjoyed reading your list, it was like a trip down memory lane.
I don't have my hard copy here and Google is failing me but this is addressed by Terry Pratchett in (I think) Only You Can Save Mankind.
The context is some teen-agers talking about how it's not cool to say Yo, or Crucial, or Well Wicked, but Cool is always cool.
Would appreciate the full quote if somebody can find!
'It's not cool to say Yo any more,' said Wobbler.
'Is it rad to say cool?' said Johnny.
'Cool's always cool. And no-one says rad any more, either.'
Wobbler looked around conspiratorially and then fished a package from his bag.
'This is cool. Have a go at this.'
'What is it?' said Johnny.
...
'Yes. We call him Yo-less because he's not cool.'
'Anti-cool's quite cool too.'
'Is it? I didn't know that. Is it still cool to say "well wicked"?'
'Johnny! It was never cool to say "well wicked".'
'How about "vode"?'
'Vode's cool.'
'I just made it up.'
The capsule drifted onwards.
'No reason why it can't be cool, though.'
I have spoken.
Example from a book: https://books.google.com/books?id=yLj8m3K0kNoC&pg=PA224&dq=h...
A message from a W3C staff member on a W3C mailing list on 1999-06-21 mentions [1] that w3c.org should redirect to the corresponding page at w3.org, and the latter is considered the 'correct' domain.
[1] https://lists.w3.org/Archives/Public/www-rdf-comments/1999Ap...
One of my pet peeves with OneDrive is that if I move a file, it changes the URI. So any time someone moves a file, it breaks all the links that point to it. Or if they change the name from foo-v1 to foo-v2. I wish they'd adopt Google Docs' model.
[1] https://www.nayuki.io/page/designing-better-file-organizatio...
I think it's for reasons like this that many mac users strongly prefer native apps over Electron or web apps.
Does make updating more awkward, and you still need some system of mapping the addresses to friendly names.
The migration to TLS for the majority of sites would have won him the bet, but I see this one is still serving up non-TLS.
Most URN schemes I have seen look something like an authority ID followed by either a date and a string you choose, or just a string you choose. This looks very like an HTTP URI. In other words, if you think your organization will be capable of creating URNs which will last, then prove it by doing it now and using them for your HTTP URIs. There is nothing about HTTP which makes your URIs unstable. It is your organization. Make a database which maps document URN to current filename, and let the web server use that to actually retrieve files.
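The indirection the quote describes is tiny in practice: one lookup table between stable identifiers and whatever the current on-disk reality is. A sketch (all URNs and file paths here are invented):

```python
# Stable URN-style identifiers mapped to the file that currently
# backs each one. Every identifier and path here is hypothetical.
URN_TO_FILE = {
    "urn:example:doc:1998:annual-report": "/srv/docs/annual-report-1998.pdf",
}

def resolve(urn: str) -> str:
    # Renaming or moving a file means editing one row in this table,
    # not breaking every link that was ever published.
    try:
        return URN_TO_FILE[urn]
    except KeyError:
        raise LookupError(f"unknown identifier: {urn}")
```

A web server would consult `resolve()` (or a real database behind it) on each request, so the public identifiers never need to track the filesystem layout.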
Did this fail as a concept? Are there any active live examples of URNs?
One well-known example is the ISBN namespace [2], where the namespace-specific string is an ISBN [3].
The term 'URI' emerged as somewhat of an abstraction over URLs and URNs [4]. People were also catching onto the fact that URNs are conceptually useful, but you can't click on them in a mainstream browser, making its out-of-the-box usability poor.
DOI is an example of a newer scheme that considered these factors extensively [5] and ultimately chose locatable URIs (=URLs) as their identifiers.
[1] https://www.iana.org/assignments/urn-namespaces/urn-namespac... [2] https://www.iana.org/assignments/urn-formal/isbn [3] https://en.wikipedia.org/wiki/International_Standard_Book_Nu... [4] https://en.wikipedia.org/wiki/Uniform_Resource_Identifier#Hi... [5] https://www.doi.org/factsheets/DOIIdentifierSpecs.html
When a protocol ID is a URI it is common to use a URL rather than a URN so that the ID can serve as a link to its own documentation.
There is a bonkers DNS record called NAPTR https://en.wikipedia.org/wiki/NAPTR_record which was designed to be used to make the URN mapping database mentioned towards the end of your quote, using a combination of regex rewriting and chasing around the DNS. I get the impression NAPTR was never really used for resolving URNs but it has a second life for mapping phone numbers to network services.
There are too many moving parts to trust that even domain names will stay the same. See GeoCities and Tumblr for recent examples. If you want a document, you should have archived it.
(Or maybe your point was deeper, that one not only can't trust that the resource location won't change but even that the resource itself will still be available somewhere? That is true, too! But saying that archive.org is the solution is just making one massively centralised point of failure. That doesn't mean that we shouldn't have or use archive.org, but that we should regard it as just the best solution we have now rather than the best solution, full stop.)
And then there are the URIs that aren't even made for human consumption: ridiculously long, impossible to parse or pass around. Another class is those that get destroyed on purpose. Your favorite search engine should just link to the content. Instead it links to a script that then forwards you to the content. This has all kinds of privacy implications, and it also makes it impossible to pass on, for instance, the link to a PDF document you have found to a colleague: the link is unusable before you click it, and after you click it you end up in a viewer.
For Firefox, I recommend the extension https://addons.mozilla.org/en-US/firefox/addon/google-direct.... The extension’s source code: https://github.com/chocolateboy/google-direct.
I can copy a Google link just fine.
Here is a sample:
https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&c...
Obtained by right clicking the link to the pdf and then 'copy link location'. What you see is not what is sent to your clipboard.
For instance, `https://example.com/foo` tells you that the resource can be accessed via the HTTPS protocol, at the server with the hostname example.com (on port 443), by asking it for the path `/foo`. It is hence a URL. On the other hand, `isbn:123456789012` precisely identifies a specific book, but gives you no information about how to locate it. Thus, it is just a URI, not a URL. (Every URL is also a URI, though.)
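The difference shows up directly when you parse the two with Python's standard library (`isbn:` here follows the comment's illustrative example rather than a registered scheme):

```python
from urllib.parse import urlsplit

url = urlsplit("https://example.com/foo")
urn = urlsplit("isbn:123456789012")

# The URL tells you *where*: scheme + host (port 443 implied) + path.
print(url.scheme, url.hostname, url.path)  # https example.com /foo

# The URN-style URI only tells you *what*:
# scheme "isbn", empty netloc, path "123456789012" -- no location at all.
print(urn.scheme, urn.netloc, urn.path)
```

The empty network-location component is exactly what makes the second form an identifier without being a locator.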
At the end of the day, there is no clarity, so just use the term that will be best understood by the person you are talking to. URL is a good default, probably even for "about:".
Look for example at this link:
https://www.amazon.com/Fundamentals-Software-Architecture-Engineering-Approach-ebook/dp/B0849MPK73/ref=sr_1_1?dchild=1&keywords=software+architecture&qid=1594966348&sr=8-1
Maybe each part has a solid reason to exist, but the result is a monster. I would prefer something like this:
https://amazon.com/dp/B0849MPK73
And guess what, the above short link actually works! But Amazon doesn't use this kind of link as the standard. https://amazon.com/Fundamentals-Software-Architecture/dp/B0849MPK73/
This includes the main title of the book + ID (this variant also works).
handle.net (technically it's like a URL shortener, but there's an escrow agreement you need to sign first to make sure that the URLs stay available). PURL and w3id.org (which allow for easy moving of whole sites to a new domain name). And of course https://robustlinks.mementoweb.org/spec/
* Simplicity: Short, mnemonic URIs will not break as easily when sent in emails and are in general easier to remember.
* Stability: Once you set up a URI to identify a certain resource, it should remain this way as long as possible ("the next 10/20 years"). Keep implementation-specific bits and pieces such as .php out, you may want to change technologies later.
* Manageability: Issue your URIs in a way that you can manage. One good practice is to include the current year in the URI path, so that you can change the URI schema each year without breaking older URIs.
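The year-prefix practice is easy to mechanize; a toy minting helper (the base domain and slug are placeholders):

```python
from datetime import date

def mint_uri(slug: str, base: str = "https://example.org") -> str:
    # Embedding the issue year lets next year's naming scheme change
    # freely without colliding with, or breaking, this year's URIs.
    return f"{base}/{date.today().year}/{slug}"

print(mint_uri("cool-uris-dont-change"))
```

Old years' paths are then frozen namespaces: whatever rules minted them, they never need to be re-minted under the new scheme.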
This is what the 301 HTTP status (permanent redirect) is for... [1] So it seems to me that if you use 301 you should be good to go.
Also from a quick search it seems the recommended thing to do is remove the old URLs from your sitemap.
1: https://en.wikipedia.org/wiki/URL_redirection#HTTP_status_co...
e.g.: https://news.ycombinator.com/item?id=8454570 https://news.ycombinator.com/item?id=10086156 https://news.ycombinator.com/item?id=803901
In this one https://news.ycombinator.com/item?id=1472611 the URI is actually broken - not sure if it changed or if it just was a mistake of OP back then.
True. Yet this submission will have dramatically greater visibility than it otherwise would have because the HN facebook bot linked it 5 minutes ago[1]. As a web archivist, I've dealt a lot with the erosion of URI stability at the hands of platform-centric traffic behavior and I don't see it letting up any time soon.
Sidenote: The fb botpage with a far larger audience, @hnbot[2], stopped posting some months ago.
[1]: https://facebook.com/hn.hiren.news/posts/2716971055212806
Here's some selected quotes:
6.2.1 "(...) The definition of resource in REST is based on a simple premise: identifiers should change as infrequently as possible. Because the Web uses embedded identifiers rather than link servers, authors need an identifier that closely matches the semantics they intend by a hypermedia reference, allowing the reference to remain static even though the result of accessing that reference may change over time. REST accomplishes this by defining a resource to be the semantics of what the author intends to identify, rather than the value corresponding to those semantics at the time the reference is created. It is then left to the author to ensure that the identifier chosen for a reference does indeed identify the intended semantics."
6.2.2 "Defining resource such that a URI identifies a concept rather than a document leaves us with another question: how does a user access, manipulate, or transfer a concept such that they can get something useful when a hypertext link is selected? REST answers that question by defining the things that are manipulated to be representations of the identified resource, rather than the resource itself. An origin server maintains a mapping from resource identifiers to the set of representations corresponding to each resource. A resource is therefore manipulated by transferring representations through the generic interface defined by the resource identifier."
[1] https://www.ics.uci.edu/~fielding/pubs/dissertation/fielding...
Is it a bias I've developed, or has anyone else noticed just how many dangling links there are on microsoft.com? Redistributables, small tools, patches, support pages, documentation pages. I've recently found that when a link's domain is microsoft.com, I subconsciously expect it to 404 with about 50% probability.
Is there a benefit to this? I am mostly just frustrated.
I think archive.org is the better long term plan. Not only does it preserve urls forever, it also preserves the content on them.