This is what XHTML was, and it was a complete disaster. There's a reason almost nobody serves XHTML with the application/xhtml+xml MIME type, and that reason is that getting a “parser error” (this is what browsers still do! try it!) is always worse than getting a page that 99% works.[0] I strongly believe that rejecting the robustness principle is a fatal mistake for a web-replacement project. The fact that horribly broken old sites can stay online and stay readable is a huge part of the web's value. Without that, it's not really “the web”, spiritually or otherwise.
[0] It's particularly “cool” how they simply do not work in the Internet Archive's Wayback Machine. The page can be retrieved, but nobody can read it.
We could do the same with sites that are not 100% correct. Users are already used to having to click "Open anyway" for older, non-HTTPS sites.
Today, when writers are using visual editors (or Markdown), few are writing their own HTML any more. A web standard requiring strict compliance would play out differently today.
I'd say it was a minority of writers who were handcrafting XHTML. And back then everyone, whether handcrafting or using tools, could validate their compliance in a browser, which made it very easy to adjust your tools or your handcrafted code. We are now in a situation where there is no schema for HTML.
I, for one, am very much in favor of forking the web with a document format with a schema. It really seems like a small and simple change to me.
If it did somehow happen that a good deal of interesting content was published using the standard, the most popular client would probably be nonconforming, ignoring the rule not to render ambiguous content.
Protocols used to be limited by technology, now they're defined by ideology.
However, I'm not convinced this cannot be done from the start, so that expectations are set right from the beginning. For example, I don't see the same problem in other formats like JPEG or PNG, where you expect the image either to work perfectly or to fail with a decoding error.
Other than implementing it and seeing how it goes, can you propose a feasible experiment to see how a new strict spec will measurably fail?
Tried it right now: took a PNG and a JPEG, opened them in a text editor, literally deleted the second half of each file, saved, and dragged them into both Firefox and Chrome. They are displayed instead of erroring out.
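If you want to repeat that experiment reproducibly, here is a rough sketch in Python that truncates copies of the files instead of editing the originals in a text editor; the file names are just placeholders for whatever images you have lying around, not anything from the discussion above.

    # Rough sketch: keep only the first half of each image and write it
    # to a new file, then open the truncated copies in a browser.
    from pathlib import Path

    for name in ("photo.jpg", "diagram.png"):   # placeholder file names
        data = Path(name).read_bytes()
        truncated = data[: len(data) // 2]      # drop the second half
        out = Path(f"truncated_{name}")
        out.write_bytes(truncated)
        print(f"{name}: {len(data)} bytes -> {out}: {len(truncated)} bytes")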
There is a classic article on why a minimal version of the web with features removed will fail: you removed the 80% of features that YOU think are not important. That's a classic fatal mistake.
Search the web for the various proposals for a minimal web and you will understand: each will have removed some feature its author thinks is bloat but which you kept in your proposal because you consider it critical. Which is why you created a new proposal: their minimal proposal is not the right one for you.
https://www.joelonsoftware.com/2001/03/23/strategy-letter-iv...
I think what is lost on many people, ironically even the ones who want to retvrn the web to its former glory, is that the browser tries to display broken, half-transmitted content because that happened so frequently, due to circumstances completely outside the website operator's or the user's control. And in most cases showing a half-transmitted web page with half of the closing tags missing is almost certainly better than outright refusing to show anything.
That... is not how anything happened.
> I don't see the same problem in other formats like JPEG or PNG where you expect the image to work perfectly or fail with a decoding error.
Browsers absolutely decode as much as they can, and if the file is corrupted halfway through you generally get garbling, not the entire image being replaced by "fuck off". The only case where that happens is if the browser can't parse anything at all, or can't retrieve the file.
> Other than implementing it and see how it goes, can you propose a feasible experiment to see how an new strict spec will measurably fail?
We already did that and saw where it went.
What I meant is that you don't expect PNG or JPEG images to be created in a way that requires the parser to run a complex process to reconstruct the broken bits and interpret what you meant to say. Like this one:
https://html.spec.whatwg.org/multipage/parsing.html#adoption...
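For a feel of what that algorithm does, here is a minimal sketch using the html5lib package (a Python implementation of the WHATWG parsing algorithm, including the adoption agency step, assuming it is installed): feed it a misnested fragment and print the tree the browser would silently reconstruct.

    # Minimal sketch: misnested formatting tags are repaired, not rejected.
    import html5lib
    from xml.etree import ElementTree as ET

    broken = "<b>1<i>2</b>3</i>"          # </b> closes before </i> does
    tree = html5lib.parse(broken, namespaceHTMLElements=False)
    print(ET.tostring(tree, encoding="unicode"))
    # Roughly: <html><head /><body><b>1<i>2</i></b><i>3</i></body></html>
    # The parser invented a second <i> element rather than reporting an error.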
Perhaps a better example is a C program being compiled into an executable. You don't expect the compiler to guess what you meant while parsing.
The current expectation is that a web browser must load any broken HTML and still display what it can, and it is this expectation that I would like to change.
I don't propose that humans write this format directly (although it should be human readable), but rather that they compile it from something that is easy to write, like Markdown or a similar language. The objective is to require tools that perform the transformation and produce a strictly conformant document.
Having a context-free grammar allows simple and fast parsing tools that can process your document, similar to the way you can query or manipulate a JSON file with tools like jq because the grammar is simple and strict.
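As a toy version of that pipeline (the libraries and the <article> wrapper are my own assumptions for the sketch, not part of any proposal): compile Markdown with the third-party markdown package, refuse to publish anything that is not well-formed under a strict parse, and then querying the result is trivial because there is no error recovery to second-guess.

    # Toy sketch: author in Markdown, compile, publish only if the output
    # passes a strict (non-recovering) parse.
    import markdown
    from xml.etree import ElementTree as ET

    source = "# Hello\n\nSome *emphasised* text."
    body = markdown.markdown(source)
    document = f"<article>{body}</article>"

    try:
        tree = ET.fromstring(document)       # strict parse: no error recovery
    except ET.ParseError as err:
        raise SystemExit(f"refusing to publish non-conformant output: {err}")

    # Because the grammar is strict, querying is simple and unambiguous.
    print([el.text for el in tree.iter("em")])   # ['emphasised']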
What the heck are you talking about? User agent devs and users did indeed always gravitate toward "it mostly works".
And it's still unambiguous. You can cringe at what some people do, but that would be strictly a taste issue rather than a technical one, as the parse would still be unambiguous. And if you think you can fix taste issues with a technical specification, well, you've already lost anyhow.
Would you like to have a law that forbids you, under penalty of fine, from reading any book you buy or borrow that has missing or damaged pages?
Why are we okay with formats like PDF that have similarly catastrophic error handling?
In this brave new world we can try again. This time, though, when a parser error occurs we can spin up an Agent in the background to fix the document, looping until it passes the parser's validation, then display that! We can then have the browser automatically submit a PR or bug report to the website operator with the fix.
That way we can achieve well-defined wire formats with deterministic rendering behavior!
Having an LLM hallucinate a new page in case of errors isn't a better solution, it's qualitatively worse. If you want web documents to render with errors, just use HTML.
That’s not the reason almost nobody serves XHTML.
The real reason is Internet Explorer. Okay, it’s a little more nuanced than that, but I think it’s accurate enough. Microsoft killed XHTML by inaction.
It’s 2004. XHTML is now a few years old, and all the rage. You decide to use it for the new project you’re developing. At the start, you serve pages as application/xhtml+xml, and that works well in Firefox; but you know that won’t work in practice, because Internet Explorer still doesn’t support XHTML, and 90% of your viewers will be using it. So, a little frustrated, you serve your nice XHTML as text/html. You still validate it manually for a while, but then that habit disappears. Eventually you make one or two small mistakes that would have been caught easily if it were parsed as XML—but it’s not, because of Internet Explorer. Over time this disparity grows.
People have been complaining of the inefficacy of XHTML for this exact reason for two or three years by this point.
It’s 2006. XHTML is acknowledged to have failed. Everything else supports it, but as long as IE doesn’t, you can’t serve as application/xhtml+xml, and so you can’t get the advantages of XML syntax.
Seriously, early failure is good—so long as you’re working with it from the start. The problems only occur when you try to add strictness later.
Just look at typing in code bases. Adding strictness to existing JavaScript or Python or Ruby? Nightmare. Starting with static types? Somewhere between fine and extremely desirable.
(I might be overselling strictness’s popularity at the time—people don’t always like what’s good for them. We’ve largely realised now that unfettered dynamic typing is a bad idea, but ten years ago that was not settled. People get used to things. If IE had permitted XHTML early on, people would have got used to the idea of XHTML’s strictness and, I think, got to mostly like it.)
XHTML did not fail because of XML’s catastrophic parse failure mode. It failed because HTML already worked, and Internet Explorer took way too long to accept XHTML. If you’re forking the web and compatibility with existing documents is not a goal, you can’t use XHTML’s failure as an argument: it failed because of compatibility issues.
Well, Internet Explorer did eventually support application/xhtml+xml: in 2011, with IE9. Way too late to matter. And so only by around 2015 or 2016 could you finally serve with XML syntax. And now why would you? Your system is big and has tiny errors here and there, your CMS just drops markup in and never got round to validating it, and so on and so on. By that time, HTML had given up on the XML path, and although serving XHTML worked, the momentum was entirely gone, so you’d run into difficulties due to inadequate documentation, inferior tooling (ironically), and more besides.
Hard errors up front are great when you control the full content pipeline. It's very rare that that's the case, and was rare even in 2004. As soon as including someone else's broken content in your page prevents users from seeing your content, and that someone else can break the content at any time and you can't control it... few people will want hard errors.
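A concrete illustration of that failure mode, as a hypothetical sketch rather than anyone's real pipeline: one bare ampersand in syndicated content, and a strict parse rejects the whole page, even though everything you actually control is fine.

    # Sketch: your page is fine, but an embedded snippet (an ad, a comment,
    # a syndicated article) contains a bare "&", and a strict parse rejects
    # the entire document instead of just that fragment.
    from xml.etree import ElementTree as ET

    third_party_snippet = "<p>Tom & Jerry</p>"   # not under your control
    page = f"<html><body><h1>My site</h1>{third_party_snippet}</body></html>"

    try:
        ET.fromstring(page)
    except ET.ParseError as err:
        print(f"entire page unreadable: {err}")  # lenient HTML would still render it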