I guess since HTML is so common it doesn't really matter, but really? We need 5 different types of markup, when one would have been fine?
There are a lot of good ideas in HTML5 but why did there need to be _another_ way of parsing HTML-like documents?
Apparently because it's the one HTML-parser to surpass and replace all other HTML-parsers out there. <sarcasm>Yeah, I totally believe that.</sarcasm>
It is plain wrong to make a standard easier for machine-parsing at the expense of humans who are typing it in.
EDIT: Another example. I write some HTML in a text-editor/textarea and send it across to someone. If I missed a </LI>, should the parser reject it? If not, the standard should be accommodating enough so that this is valid.
Have it support <img>alt text</img> and <meta>content</meta>, as well as the old way, and then the developers can decide if they want to support legacy browsers. (They probably do, but at least we're looking at a future where html will be just a tad cleaner and more consistent.)
...and we already had SGML in the first place... :)
For example, take an English class. Any. English. Class.
That made it impossible for a browser to implement both XHTML2 and XHTML1 at once (which was in fact the goal of some of the committee members). And when browsers were faced with the choice of implementing XHTML2 (no content at all out there) or XHTML1+HTML (lots of content out there) but not both, they picked the one you'd expect them to pick...
I tried to use it but then completely reverted to HTML4. Thank god we have HTML5 now.
Currently:
<ul>
<li>Hello!</li>
</ul>
By your suggestion: {
"tagName": "ul",
"children": [
{
"tagName": "li",
"children": [
"Hello!"
]
}
]
}
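For what it's worth, a tree in that shape maps back to markup mechanically. A minimal sketch, assuming nodes look exactly like the object above (a `tagName` plus a `children` array of nodes or strings); `toHtml` and `escapeText` are hypothetical helper names, and attributes and void elements are deliberately not handled:

```javascript
// Escape the characters that would otherwise be read as markup in text content.
function escapeText(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Recursively serialize a { tagName, children } tree to an HTML string.
function toHtml(node) {
  if (typeof node === "string") return escapeText(node);
  const inner = (node.children || []).map(toHtml).join("");
  return `<${node.tagName}>${inner}</${node.tagName}>`;
}

const tree = {
  tagName: "ul",
  children: [{ tagName: "li", children: ["Hello!"] }],
};

console.log(toHtml(tree)); // <ul><li>Hello!</li></ul>
```

The round trip works, but the JSON form is several times longer than the markup it produces.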
(and, yes, it has been attempted... JSON.stringify(document.body))
One of two things will happen, depending on your browser.
If your browser is following the WebIDL spec, so all the accessors are on the prototype, this will produce "{}".
If your browser is WebKit-based, this will throw an exception, because body.firstChild.parentNode == body and JSON.stringify throws on object graphs with loops.
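Both outcomes can be reproduced without a browser. A rough sketch using plain objects as stand-ins for DOM nodes (`FakeNode` and `tryStringify` are made-up names for illustration, not anything from the DOM):

```javascript
// Run JSON.stringify and report the error name instead of crashing.
function tryStringify(value) {
  try {
    return JSON.stringify(value);
  } catch (err) {
    return err.name;
  }
}

// Case 1: own properties that form a loop, like body.firstChild.parentNode == body.
const body = { tagName: "body", firstChild: null };
const child = { tagName: "p", parentNode: body };
body.firstChild = child;
const cyclicResult = tryStringify(body);

// Case 2: everything is exposed through accessors on the prototype,
// so the instance has no own enumerable properties to serialize.
class FakeNode {
  get tagName() {
    return "body";
  }
}
const accessorResult = tryStringify(new FakeNode());

console.log(cyclicResult); // TypeError
console.log(accessorResult); // {}
```

Either way you don't get a usable serialization of the document out of it.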
{
'ul': {
'li': 'hello!'
}
}
I would think it would depend upon the parser.
Regardless, I'd still rather write out HTML instead of JSON for markup.
(:ul (:li Hello))
It would be much easier as a Lisp S-EXP.
And yes, it has been attempted.
HTML5 parsing is clearly defined and in most cases quite sensible. I think it was an excellent compromise.
XML is simple. Sure, it's pedantic in the sense that it breaks, but HTML5 breaks too, only subtly.
It's like the difference between Java and JavaScript. Java isn't more "pedantic" than JS in ANY way, it just breaks in a more understandable way (break loudly, early and understandably is in my view "better").
If html5 fails on some seemingly valid input (e.g. makes a strange layout when you self-close a div-tag) then it isn't lenient, it's still pedantic. It's just as pedantic as an xml standard is about closing tags, only that the specification for closing tags is dozens of pages instead of three words.
In fact, I think most developers agree that an error message would be preferable to a corrupt layout in the case of the self-closed div.
>There is absolutely no difference between <br> and <br />.
>Actually, one might argue that adding / to a void tag is an ignored syntax error.
>every browser and parser should not handle <br> and <br /> any differently
If it's optional and has absolutely no effect and makes no difference, how exactly would one argue that it's an error?
To me, this is like saying `print ${SHELL}` is erroneous because the braces don't do anything and `print $SHELL` does exactly the same thing. It may be superfluous, but it's not erroneous.
“It is not, and has never been, valid HTML to write `<br></br>`.”
Sure, but note that it is perfectly valid XHTML (which is a form of HTML).
Oh, and `<script src="foo" />` actually works the way you’d expect it to in XHTML.
Don’t use XHTML though.
Unless you want to combine it with SVG for a hybrid site [1].
[1] view source on http://emacsformacosx.com
I still think empty elements make more sense and a proper reformulation of XHTML(5) is the way it should have been done since the beginning.
Google's styleguide on that subject is also very clear that you should indeed not close void tags.
"Regarding your original suggestion: based on the arguments presented by the various people taking part in this discussion, I’ve now updated the specification to allow “/” characters at the end of void elements."
To which Sam Ruby responded:
"This is big. PHP’s nl2br function is now HTML5 compliant. WordPress won’t have to completely convert to HTML4 before people who wish to author documents targeting HTML5 can do so using this software. Such efforts can now afford to proceed much more incrementally. This is much more sensible and practical possibility."
http://www.intertwingly.net/blog/2006/12/01/The-White-Pebble
Remember that both men played fundamental roles in shaping HTML5. And I think this one sentence sums up the mindset that shaped HTML5:
"The truth is that most HTML is authored by pagans."
and this was Sam Ruby's view at the time:
"When all the religion was stripped away from the trailing slash in always-empty HTML elements discussion, only one question remained: I think basically the argument is “it would help people” and the counter argument is “it would confuse people”. This is a eminently sane way to approach discussions such as these. I would argue that it would both help people and reduce confusion if a void <a/> element continued to be invalid HTML5 and, by implication, be invalid in XHTML5. By invalid, I simply mean that a parse error would be reported by a conformance checker whenever such constructs are found in a document. Non-draconian user agents can, of course, chose to recover from this error."
People with real lives have perhaps missed the sad slow way that the argument for XML on the Web, and therefore XHTML, has imploded. But the sad souls (such as me) who have followed this story are aware that the case against XHTML has developed slowly over the years.
The first salvo against XML on the web was launched by Mark Pilgrim way back in 2004. This is when the mania for XML was at its peak (before JSON had appeared), a time when people felt XML/XPATH would eventually replace SQL and RDBMS (an idea promoted by no less an authority than Sir Timothy Berners-Lee, who, at that time, could make a believable case that RDF was the future of the Web).
This is Pilgrim's article "XML on the Web has Failed":
http://www.xml.com/pub/a/2004/07/21/dive.html
an excerpt:
"There are things called "transcoding proxies," used by ISPs and large organizations in Japan and Russia and other countries. A transcoding proxy will automatically convert text documents from one character encoding to another. If a feed is served as text/xml, the proxy treats it like any other text document, and transcodes it. It does this strictly at the HTTP level: it gets the current encoding from the HTTP headers, transcodes the document byte for byte, sets the charset parameter in the HTTP headers, and sends the document on its way. It never looks inside the document, so it doesn't know anything about this secret place inside the document where XML just happens to store encoding information. So there's a good reason, but this means that in some cases -- such as feeds served as text/xml -- the encoding attribute in the XML document is completely ignored."
The article we are talking about, "To close or not to close", states:
"XHTML is basically the same as HTML but based on XML."
This is stated as a fact, but in fact many people have made the argument that XHTML never fully functioned as XML, partly for the reasons that Pilgrim talks about, but also because only the strict versions of XHTML ever triggered the strict draconian error handling that has always been part of XML. However, there are other ways where XHTML was difficult to treat the same as XML. For instance:
No more "XML parsing failed" errors
http://intertwingly.net/blog/2011/10/03/No-more-XML-parsing-...
an excerpt:
"Note that the reason to do this is to deal with bad browser sniffing where sites send HTML/XHTML markup meant to be served as text/html as application/xhtml+xml, application/xml or text/xml only to Opera, which causes Opera to encounter an XML parse error that breaks the site for Opera."
Sam Ruby is a co-chair of the W3C's HTML Working Group, and if you've read his blog over the years, you are aware of the many problems that arise when treating XHTML as XML.
Some of the debates that have happened over the years simply reveal how much reality differs from the specs:
"HTML charset vs XML encoding"
http://www.intertwingly.net/blog/2004/02/13/HTML-charset-vs-...
If it was easy to develop a version of HTML that truly acted as a form of XML, would such debates have been necessary?
Please understand me: I am not criticizing all of the intelligent people who worked very hard on the specs for HTML and XML and XHTML. I am pointing out that after 15 years of effort, no one has found an easy way to treat XHTML as a form of XML under all circumstances. Surely if the brightest minds in the tech industry fail to make this work after 15 years, this is a circle that can not be squared?
Consider the fact that companies like Google felt they had no choice but to ignore the mime type "application/xhtml+xml":
Google Hates XHTML?
http://www.intertwingly.net/blog/2007/03/15/Confirmed-Google...
Sam Ruby also makes clear that the concessions to an XML style, including closing void elements, were thought of as an effort to ease the transition:
"I believe that if those that had created XHTML had the courage of their convictions, both Google and Microsoft would have had no choice. I also believe that there should have been a maintenance release or two of HTML4. In HTML5, the root element MAY have an xmlns attribute, but only if it matches the one defined by XHTML; and void elements may have terminating slash characters in their start element. It is these small touches that make transition easier."
Also, in another blog post Sam Ruby makes the point that the draconian error checking that is mandatory for XML also makes it impossible to develop those technologies that supporters of XML were excited about. He gave the example of sending an SVG image to his daughter, and her wanting to post it to her MySpace page: but SVG is XML, and so it should not render on a malformed page, and MySpace was permanently malformed. Sam Ruby could send a gif or a jpeg to his daughter, and she could post that, without a problem, to MySpace, but SVG was limited to well-formed, correctly served pages -- in a world where few pages are well-formed and correctly served. See the comments here:
http://www.intertwingly.net/blog/2006/11/24/Feedback-on-XHTM...
Also, if you have the time, see the debate here between Sam Ruby and Henri Sivonen:
http://intertwingly.net/blog/2012/11/09/In-defence-of-Polygl...
I feel that debate reveals much of the thinking that led to HTML5 being so much more accepting than XHTML was.
Also, if you have a lot of time, this post from 2009, and the debate in the comments, will teach you a lot about the thinking that shaped HTML5:
http://intertwingly.net/blog/2009/04/08/HTML-Reunification
Finally, in a post I can not find, Sam Ruby makes the point that, for some strange reason, people seemed to very much want something called XHTML, even though it would not be able to act like real XML, for all the reasons that had been discussed in thousands of blog posts and chat rooms. He seemed puzzled by it.
Anyone who advocates for XHTML needs to think long and hard about what it is, exactly, that they are advocating for. If you want an HTML that has an XML style, can you say why?
Because I think that section 12.2 of the current HTML specification is outrageous. (The section is "Parsing HTML documents", if anyone is not familiar with it make sure to look at the subsections "Tokenization", "Tree Construction", etc.)
(That said, I appreciate your detailed comment; this is important history that too few people are aware of.)
(Also overenthusiasm for all things XML had nothing to do with RDF. RDF is not XML.)
There's perhaps no strong logical argument either way, but from a style perspective, I prefer to use closing slashes to make it absolutely clear what's going on.
The really evil one is to not make <div /> be exactly equivalent to <div></div>, which is just batshit crazy. When I want a placeholder tag (to be populated later) I have to write <div></div>, which feels completely unnatural.
But I admit, I suppose it could only be simple to me.
I don't see any good reason to use "</br>", but there's some other cases that could be useful, like not requiring spaces between quoted attributes (name1='value1'name2="value2"). I see a parallel with this and the evolution of natural languages: words and syntax that used to be incorrect gradually become accepted as part of the language and attain a normative meaning, because everyone still understands.
P.S. The article is very well written.
No you don't. You don't close void tags because it does absolutely nothing.
It's like adding HTML comments around javascript code. You haven't needed that in a decade, yet some people still do it.
The attribute value can remain unquoted if it doesn't contain space characters or any of " ' ` = < >
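That rule is small enough to check mechanically. A minimal sketch, not the spec's own algorithm: `canBeUnquoted` and `renderAttribute` are hypothetical helpers, the empty-string check is an addition (an unquoted value can't be empty), and `\s` is slightly broader than the spec's whitespace set, so this errs toward quoting:

```javascript
// Any of these characters forces quoting: whitespace, " ' ` = < >
const NEEDS_QUOTES = /[\s"'`=<>]/;

// True when the value may appear without surrounding quotes.
function canBeUnquoted(value) {
  return value.length > 0 && !NEEDS_QUOTES.test(value);
}

// Emit name=value, falling back to double quotes (with escaping) when required.
function renderAttribute(name, value) {
  if (canBeUnquoted(value)) return `${name}=${value}`;
  const escaped = value.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
  return `${name}="${escaped}"`;
}

console.log(renderAttribute("class", "nav")); // class=nav
console.log(renderAttribute("title", "a b")); // title="a b"
console.log(renderAttribute("href", "?q=1")); // href="?q=1"
```

In practice always quoting is simpler, and it sidesteps the question of which characters are safe.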
[1] http://www.w3.org/TR/html5/introduction.html#a-quick-introdu...
To close or not to close...
Why wouldn't you do it?
If you don't want to accidentally break it, you shouldn't be writing XML by hand or gluing it together from strings (https://hsivonen.fi/producing-xml/), so you need to produce your output only through a polyglot-compatible XML+HTML serializer.
That's a lot of work just for the case where somebody might parse your markup as XML. All bots support HTML.
Only because it results in smaller files. For example it also recommends omitting optional tags for the same reason. I'm really skeptical that omitting these things helps readability (if that's what the guide is referring to when it says "scannability"). If size is at such a premium why not simply preprocess and minify HTML? Recently I tried briefly omitting "/>" from <br> and friends and I wasn't impressed as far as legibility goes. Maybe I just didn't try hard enough... :)
Note: “valid” here is defined as “theoretically valid as per the relevant spec” and doesn’t reflect what browsers actually support(ed).
All of them are optional to close, and everyone seems to differ on whether you should close them.
Thanks!
One quibble: the article conflates tag and element, which makes it hard to understand just what was meant.
For example, what is "tag content"?
WHTDOES ITEVN MEAN
Reminds me of PHP's T_PAAMAYIM_NEKUDOTAYIM