To close or not to close – Void HTML elements

(colorglare.com)

358 points · enyo · 12y ago · 158 comments

91 comments · 34 top-level

jrockway12y ago· 12 in thread

HTML loves its special cases. XML is overly complex, but at least your editor doesn't need to know anything special about what document type you're writing in order to indent it properly. Throw in HTML's special cases, and now it needs to know that <br> is different from <foo>.

I guess since HTML is so common it doesn't really matter, but really? We need 5 different types of markup, when one would have been fine?

https://xkcd.com/927/

bhaak12y ago

This. I wish they didn't do a HTML5 but instead only did a XHTML5.

There are a lot of good ideas in HTML5 but why did there need to be _another_ way of parsing HTML-like documents?

Apparently because it's the one HTML-parser to surpass and replace all other HTML-parsers out there. <sarcasm>Yeah, I totally believe that.</sarcasm>

jeswin12y ago

I prefer HTML over XHTML, because it is easier to write. I don't get the reasoning behind closing tags. LIs close before the next LI, or the UL. <BR> saves two characters over <BR /> and causes no harm. XHTML feels like trying too hard to make the machine overlord happy.

It is plain wrong to make a standard easier for machine-parsing at the expense of humans who are typing it in.

EDIT: Another example. I write some HTML in a text-editor/textarea and send it across to someone. If I missed a </LI>, should the parser reject it? If not, the standard should be accommodating enough so that this is valid.
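
The implied-end-tag behaviour described here can be sketched with Python's standard `html.parser` module. This is a toy normalizer, not the HTML5 tree-construction algorithm: the names `LiCloser` and `normalize` are made up for illustration, attributes are dropped, and only the rule "a new LI closes an open LI, and closing the UL closes whatever is still open inside it" is modelled.

```python
from html.parser import HTMLParser


class LiCloser(HTMLParser):
    """Toy normalizer (hypothetical): re-emits markup with explicit </li> tags."""

    def __init__(self):
        super().__init__()
        self.stack = []  # names of currently open elements
        self.out = []    # pieces of the normalized markup

    def handle_starttag(self, tag, attrs):
        # A new <li> implicitly closes a still-open <li>.
        if tag == "li" and self.stack and self.stack[-1] == "li":
            self._close()
        self.stack.append(tag)
        self.out.append(f"<{tag}>")  # attributes are ignored in this sketch

    def handle_endtag(self, tag):
        # Close everything up to and including the matching open element;
        # an end tag with no matching open element is ignored.
        if tag in self.stack:
            while self._close() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.out.append(text)

    def _close(self):
        tag = self.stack.pop()
        self.out.append(f"</{tag}>")
        return tag


def normalize(markup):
    parser = LiCloser()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

Under these assumptions, `normalize("<ul><li>one<li>two</ul>")` yields the same string as normalizing the fully closed version, which is the sense in which the missing `</LI>` does no harm.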

userbinator12y ago

I think it's excellent that HTML5 completely specifies the parsing in a very clear, and most backwards-compatible way; judging by what the big browser vendors have been doing, they seem to be following it. (It also gives a nice starting point that makes it easier for anyone to write their own parser, and have it behave the same as any other mainstream browser - and having the possibility of making more browsers available, with the same standard parsing behaviour, is a good thing.)

keeperofdakeys12y ago

XHTML lost; we decided that we preferred tag soup and keeping our past documents readable. Besides that, most XHTML was served with an HTML MIME type, which meant it wasn't being read as XHTML. So the best bits, lighter parsers and XML embedding, were never usable.

vesinisa12y ago

Apparently, you can still write polyglot documents that are both valid XHTML and HTML5.[1] But this requires closing void tags with <tag />.

[1] http://blog.whatwg.org/xhtml5-in-a-nutshell
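
A minimal sketch of what a polyglot-friendly serializer has to do for childless elements, assuming a hypothetical helper `serialize_empty` and a deliberately partial set of void elements (the real list in the HTML spec is longer): void elements get the `<tag />` form, everything else gets an explicit end tag.

```python
from html import escape

# Deliberately partial set of void elements, for illustration only.
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


def serialize_empty(tag, attrs=None):
    """Serialize an element that has no children (hypothetical helper)."""
    attr_text = "".join(
        f' {name}="{escape(value, quote=True)}"'
        for name, value in (attrs or {}).items()
    )
    if tag in VOID_ELEMENTS:
        # Void element: the trailing slash keeps the output well-formed XML.
        return f"<{tag}{attr_text} />"
    # Non-void element: never self-close, because an HTML parser would
    # treat "<div />" as an opening tag only.
    return f"<{tag}{attr_text}></{tag}>"
```

For example, `serialize_empty("br")` gives `<br />`, while `serialize_empty("div")` gives `<div></div>`.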

icebraining12y ago

I'm sure many at the W3C would have loved to develop only a new version of XHTML, but the problem is that it breaks retro-compatibility, and that's almost impossible to impose. Any browser to try that would see its lunch get eaten by its more permissive competition.

mcv12y ago

And if they were really going to break the standard like that, at least they could have broken it in such a way that it fixed all the stupid legacy decisions.

Have it support <img>alt text</img> and <meta>content</meta>, as well as the old way, and then the developers can decide if they want to support legacy browsers. (They probably do, but at least we're looking at a future where html will be just a tad cleaner and more consistent.)

praptak12y ago

Out of curiosity - what is the correct way to write the BR tag in XHTML5?

eponeponepon12y ago

> We need 5 different types of markup, when one would have been fine?

...and we already had SGML in the first place... :)

protonfish12y ago

Indenting HTML is a terrible practice. HTML is not a programming language - it is document markup. Source files should read like a line-wrapped text document sprinkled with embedded tags. Let your editor keep track of open/close pairs with highlighting (the way most already do.)

recursive12y ago

Have you ever maintained HTML? It doesn't sound like it.

saraid21612y ago

People indent text files for reasons other than "I am writing in a programming language."

For example, take an English class. Any. English. Class.

pavpanchekha12y ago· 6 in thread

I think the author's recommendations at the end, on making <meta> and <img> and <script> more sane, are good examples of where the "implement then standardize" process that the W3C uses falls down. In fact, XHTML2 (which was never implemented) had some good ideas. On the other hand, as we've seen so many times, implement then standardize reduces foot-dragging and needless bike-shedding. You take the good with the bad, I guess.

ghayes12y ago

I've been burned before by using <script src="..." /> and assuming it would work in all browsers. Instead, it subsumed later tags in a horrible way. I've never used empty-elements in HTML since.

mathias12y ago

`<script src="foo" />` only works the way you’d expect it to in XHTML. Proper XHTML, that is — served with the correct `Content-Type` header. http://mathiasbynens.be/notes/xhtml5

bsimpson12y ago

Chrome explicitly stopped supporting void syntax on the script tag to encourage people to use </script> so IE wouldn't die when it saw a <script />.

stormbrew12y ago

XHTML2 was largely fantastic, imo, and would have been an excellent successor to html. If it had been what started out the XHTML process I think it would have been more successful, but XHTML1 was such a foot-in-both-worlds mess that it needed to be put out of its misery.

bzbarsky12y ago

The big problem with XHTML2 was that it was designed by people who hated HTML. So they went and made it purposefully incompatible with HTML and XHTML1 in various ways (e.g. tags with the same localName and in the same namespace were supposed to have different behavior).

That made it impossible for a browser to implement both XHTML2 and XHTML1 at once (which was in fact the goal of some of the committee members). And when browsers were faced with the choice of implementing XHTML2 (no content at all out there) or XHTML1+HTML (lots of content out there), but not both, they picked the one you'd expect them to pick...

iSnow12y ago

Actually hardly anyone wanted XHTML2, because it was a purely academic exercise in making established things harder (<a href="..." target="_blank">) without compelling features.

I tried to use it but then completely reverted to HTML4. Thank god we have HTML5 now.

rvkennedy12y ago· 6 in thread

HTML would be much easier to write if it were based on JSON. Less bandwidth too. Has this been attempted?

TazeTSchnitzel12y ago

Are you kidding me?

Currently:

  <ul>
    <li>Hello!</li>
  </ul>

By your suggestion:

  {
    "tagName": "ul",
    "children": [
      {
        "tagName": "li",
        "children": [
          "Hello!"
        ]
      }
    ]
  }

(and, yes, it has been attempted... JSON.stringify(document.body))
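
To make the comparison concrete, here is a minimal sketch of turning the verbose tree above back into markup. The `tagName`/`children` keys come from the example; the function name `render` is an assumption, and attributes are not handled.

```python
from html import escape


def render(node):
    """Render a {"tagName", "children"} tree as markup (illustrative only)."""
    if isinstance(node, str):
        # Text nodes are plain strings; escape &, < and > for safety.
        return escape(node)
    children = "".join(render(child) for child in node.get("children", []))
    return f"<{node['tagName']}>{children}</{node['tagName']}>"


tree = {
    "tagName": "ul",
    "children": [
        {"tagName": "li", "children": ["Hello!"]},
    ],
}
```

Calling `render(tree)` produces the same list as the three-line HTML version, which illustrates how much extra structure the JSON form carries for identical output.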

bzbarsky12y ago

> JSON.stringify(document.body)

One of two things will happen, depending on your browser.

If your browser is following the WebIDL spec, so all the accessors are on the prototype, this will produce "{}".

If your browser is WebKit-based, this will throw an exception, because body.firstChild.parentNode == body and JSON.stringify throws on object graphs with loops.

talmand12y ago

I don't think your example needs to be quite so verbose.

  {
    'ul': {
      'li': 'hello!'
    }
  }

I would think it would depend upon the parser.

Regardless, I'd still rather write out HTML instead of JSON for markup.

Shorel12y ago

LISP:

  (:ul (:li Hello))

Shorel12y ago

Nah.

It would be much easier as a Lisp S-EXP.

And yes, it has been attempted.

__david__12y ago

http://jsml.org

alkonaut12y ago· 5 in thread

When xhtml came to replace html4 it was such a huge relief for all OCD developers, and I thought I had seen the last non-xml compliant web page. Now I'm encouraged to write tag soup again because void elements? Humbug.

andybak12y ago

How does 'Tag soup' follow from 'let's not force pedantic XML parsing rules on people'?

HTML5 parsing is clearly defined and in most cases quite sensible. I think it was an excellent compromise.

alkonaut12y ago

It's the html5 standard that is complex and pedantic; it breaks silently when you violate one of hundreds of rules (e.g. lists of void elements that can't be closed).

XML is simple. Sure, it's pedantic in the sense that it breaks, but html5 breaks too, only subtly.

It's like the difference between java and JavaScript. Java isn't more "pedantic" than JS in ANY way, it just breaks in a more understandable way (break loudly, early and understandably is in my view "better").

mattmanser12y ago

You say pedantic, while others might say simple, easy to remember, easy to error check and impossible to get wrong.

iSnow12y ago

Which probably goes to show that most developers are not afflicted with OCD but would rather have a more lenient spec. After all, XHTML2, which is even more strict, sold like hot cakes...

alkonaut12y ago

Don't confuse the term lenient to mean "pedantic but with very silent failures". The failures caused by forgetting to close tags in html5 are often catastrophic, which is why it isn't "lenient".

If html5 fails on some seemingly valid input (e.g. makes a strange layout when you self-close a div-tag) then it isn't lenient, it's still pedantic. It's just as pedantic as an xml standard is about closing tags, only that the specification for closing tags is dozens of pages instead of three words.

In fact, I think most developers agree that an error message would be preferable to a corrupt layout in the case of the self-closed div.

dbbolton12y ago· 4 in thread

>Optionally, a "/" character, which may be present only if the element is a void element.

>There is absolutely no difference between <br> and <br />.

>Actually, one might argue that adding / to a void tag is an ignored syntax error.

>every browser and parser should not handle <br> and <br /> any differently

If it's optional and has absolutely no effect and makes no difference, how exactly would one argue that it's an error?

To me, this is like saying `print ${SHELL}` is erroneous because the braces don't do anything and `print $SHELL` does exactly the same thing. It may be superfluous, but it's not erroneous.

rimantas12y ago

It is erroneous. It only makes no difference because the error is ignored (or rather rendering is wrong). In HTML properly rendered <br/> would produce extra ">" on each occurrence. IIRC there was a browser (some reference implementation) that did this correctly. Also, I remember Gecko used to flag these slashes in source view too.

Ideka12y ago

I'm sure the parent was talking about HTML5. It isn't erroneous in HTML5.

userbinator12y ago

I noticed the same thing and made a more general observation about "errors" here: https://news.ycombinator.com/item?id=7311197

ivanca12y ago

It is erroneous because one of the stated goals of HTML5 is semantic value, and under any line of logic you cannot close something that isn't open; therefore it is an error, albeit not (yet?) a technical one.

mathias12y ago· 3 in thread

From the article:

“It is not, and has never been, valid HTML to write `<br></br>`.”

Sure, but note that it is perfectly valid XHTML (which is a form of HTML).

Oh, and `<script src="foo" />` actually works the way you’d expect it to in XHTML.

Don’t use XHTML though.

__david__12y ago

> Don’t use XHTML though.

Unless you want to combine it with SVG for a hybrid site [1].

[1] view source on http://emacsformacosx.com

TheZenPsycho12y ago

not necessary. html5 permits inline svg and in <img src=>. And unlike the XML hybrid document stuff, it actually works in real browsers.

stan_rogers12y ago

XHTML is not a form of HTML; it's a dialect of XML that bears a strong surface resemblance (but only a surface resemblance) to HTML.

vixen9912y ago· 3 in thread

"has its disadvantages as well". Its!

evincarofautumn12y ago

What are you saying? “Its” is the possessive of “it”. “It’s” is a contraction of “it is” or “it has”.

enyoOP12y ago

It was "it's" before.

enyoOP12y ago

Thanks. Corrected.

muyuu12y ago· 2 in thread

I'm sorry but saying a discussion is over because "Google says so in their style guide" is contemptible.

I still think empty elements make more sense and a proper reformulation of XHTML(5) is the way it should have been done since the beginning.

enyoOP12y ago

Yes it is! Who said that? (Or was that all you got out of the article?)

muyuu12y ago

"Well, for those of you who are really addicted to X(HT)ML, you might think, «yeah, it's optional, but <br /> is still 'more correct'», but I have to tell you: it is not. Actually, one might argue that adding / to a void tag is an ignored syntax error. The possibility to write it has mostly been added for compatibility reasons and every browser and parser should not handle <br> and <br /> any differently.

Google's styleguide on that subject is also very clear that you should indeed not close void tags."

robin_reala12y ago· 2 in thread

Know why everyone writes <br /> instead of <br/>? IE5 on the Mac’s parser broke if it found an empty tag without a space before the closing slash. Funny how software can vanish into the mists of time yet still have an effect on current coding.

skakri12y ago

HTML and XHTML 'compatibility' for older versions of IE. IE6 couldn't parse the application/xhtml+xml MIME type and broke layouts. <br /> fixed that issue.

robin_reala12y ago

Add IE7 and IE8 to the list of browsers that don’t understand XHTML.

lkrubner12y ago· 2 in thread

This is from Ian Hickson in 2006, regarding the emergence of HTML5:

"Regarding your original suggestion: based on the arguments presented by the various people taking part in this discussion, I’ve now updated the specification to allow “/” characters at the end of void elements."

To which Sam Ruby responded:

"This is big. PHP’s nl2br function is now HTML5 compliant. WordPress won’t have to completely convert to HTML4 before people who wish to author documents targeting HTML5 can do so using this software. Such efforts can now afford to proceed much more incrementally. This is much more sensible and practical possibility."

http://www.intertwingly.net/blog/2006/12/01/The-White-Pebble

Remember that both men played fundamental roles in shaping HTML5. And I think this one sentence sums up the mindset that shaped HTML5:

"The truth is that most HTML is authored by pagans."

and this was Sam Ruby's view at the time:

"When all the religion was stripped away from the trailing slash in always-empty HTML elements discussion, only one question remained: I think basically the argument is “it would help people” and the counter argument is “it would confuse people”. This is a eminently sane way to approach discussions such as these. I would argue that it would both help people and reduce confusion if a void <a/> element continued to be invalid HTML5 and, by implication, be invalid in XHTML5. By invalid, I simply mean that a parse error would be reported by a conformance checker whenever such constructs are found in a document. Non-draconian user agents can, of course, chose to recover from this error."

People with real lives have perhaps missed the sad slow way that the argument for XML on the Web, and therefore XHTML, has imploded. But the sad souls (such as me) who have followed this story are aware that the case against XHTML has developed slowly over the years.

The first salvo against XML on the web was launched by Mark Pilgrim way back in 2004. This is when the mania for XML was at its peak (before JSON had appeared), a time when people felt XML/XPATH would eventually replace SQL and RDBMS (an idea promoted by no less an authority than Sir Timothy Berners-Lee, who, at that time, could make a believable case that RDF was the future of the Web).

This is Pilgrim's article "XML on the Web has Failed":

http://www.xml.com/pub/a/2004/07/21/dive.html

an excerpt:

"There are things called "transcoding proxies," used by ISPs and large organizations in Japan and Russia and other countries. A transcoding proxy will automatically convert text documents from one character encoding to another. If a feed is served as text/xml, the proxy treats it like any other text document, and transcodes it. It does this strictly at the HTTP level: it gets the current encoding from the HTTP headers, transcodes the document byte for byte, sets the charset parameter in the HTTP headers, and sends the document on its way. It never looks inside the document, so it doesn't know anything about this secret place inside the document where XML just happens to store encoding information. So there's a good reason, but this means that in some cases -- such as feeds served as text/xml -- the encoding attribute in the XML document is completely ignored."

The article we are talking about "To close or not to close" states:

"XHTML is basically the same as HTML but based on XML."

This is stated as a fact, but in fact many people have made the argument that XHTML never fully functioned as XML, partly for the reasons that Pilgrim talks about, but also because only the strict versions of XHTML ever triggered the strict draconian error handling that has always been part of XML. However, there are other ways where XHTML was difficult to treat the same as XML. For instance:

No more "XML parsing failed" errors

http://intertwingly.net/blog/2011/10/03/No-more-XML-parsing-...

an excerpt:

"Note that the reason to do this is to deal with bad browser sniffing where sites send HTML/XHTML markup meant to be served as text/html as application/xhtml+xml, application/xml or text/xml only to Opera, which causes Opera to encounter an XML parse error that breaks the site for Opera."

Sam Ruby is a co-chair of the W3C's HTML Working Group, and if you've read his blog over the years, you are aware of the many problems that arise when treating XHTML as XML.

Some of the debates that have happened over the years simply reveal how much reality differs from the specs:

"HTML charset vs XML encoding"

http://www.intertwingly.net/blog/2004/02/13/HTML-charset-vs-...

If it was easy to develop a version of HTML that truly acted as a form of XML, would such debates have been necessary?

Please understand me: I am not criticizing all of the intelligent people who worked very hard on the specs for HTML and XML and XHTML. I am pointing out that after 15 years of effort, no one has found an easy way to treat XHTML as a form of XML under all circumstances. Surely if the brightest minds in the tech industry fail to make this work after 15 years, this is a circle that can not be squared?

Consider the fact that companies like Google felt they had no choice but to ignore the mime type "application/xhtml+xml":

Google Hates XHTML?

http://www.intertwingly.net/blog/2007/03/15/Confirmed-Google...

Sam Ruby also makes clear that the concessions to an XML style, including closing void elements, were thought of as an effort to ease the transition:

"I believe that if those that had created XHTML had the courage of their convictions, both Google and Microsoft would have had no choice. I also believe that there should have been a maintenance release or two of HTML4. In HTML5, the root element MAY have an xmlns attribute, but only if it matches the one defined by XHTML; and void elements may have terminating slash characters in their start element. It is these small touches that make transition easier."

Also, in another blog post Sam Ruby makes the point that the draconian error checking that is mandatory for XML also makes it impossible to develop those technologies that supporters of XML were excited about. He gave the example of sending an SVG image to his daughter, and her wanting to post it to her MySpace page: but SVG is XML, and so it should not render on a malformed page, and MySpace was permanently malformed. Sam Ruby could send a gif or a jpeg to his daughter, and she could post that, without a problem, to MySpace, but SVG was limited to well-formed, correctly served pages -- in a world where few pages are well-formed and correctly served. See the comments here:

http://www.intertwingly.net/blog/2006/11/24/Feedback-on-XHTM...

Also, if you have the time, see the debate here between Sam Ruby and Henri Sivonen:

http://intertwingly.net/blog/2012/11/09/In-defence-of-Polygl...

I feel that debate reveals much of the thinking that led to HTML5 being so much more accepting than XHTML was.

Also, if you have a lot of time, this post from 2009, and the debate in the comments, will teach you a lot about the thinking that shaped HTML5:

http://intertwingly.net/blog/2009/04/08/HTML-Reunification

Finally, in a post I can not find, Sam Ruby makes the point that, for some strange reason, people seemed to very much want something called XHTML, even though it would not be able to act like real XML, for all the reasons that had been discussed in thousands of blog posts and chat rooms. He seemed puzzled by it.

Anyone who advocates for XHTML needs to think long and hard about what it is, exactly, that they are advocating for. If you want an HTML that has an XML style, can you say why?

bct12y ago

> If you want an HTML that has an XML style, can you say why?

Because I think that section 12.2 of the current HTML specification is outrageous. (The section is "Parsing HTML documents", if anyone is not familiar with it make sure to look at the subsections "Tokenization", "Tree Construction", etc.)

(That said, I appreciate your detailed comment; this is important history that too few people are aware of.)

(Also overenthusiasm for all things XML had nothing to do with RDF. RDF is not XML.)

rjd12y ago

I do it for a simple reason: layout cleanup with auto-indent. I've found HTML layout cleanup to be unreliable in most editors, whereas XML layout works 99% of the time.

skywhopper12y ago· 2 in thread

HTML5 is a huge improvement over the HTML4.01/XHTML madness that was going on back in the day. And it's fine with me to allow non-closed singleton tags.

There's perhaps no strong logical argument either way, but from a style perspective, I prefer to use closing slashes to make it absolutely clear what's going on.

alkonaut12y ago

Allowing the void tags to be unclosed is the lesser evil of the two, I can even accept the argument behind it (they can't have content) even though it complicates the syntax.

The really evil one is to not make <div /> be exactly equivalent to <div></div>, which is just batshit crazy. When I want a placeholder tag (to be populated later) I have to write <div></div>, which feels completely unnatural.

talmand12y ago

Seems simple to me, a container element should always have opening and closing tags. An element that will never contain anything is self-closing.

But I admit, I suppose it could only be simple to me.

userbinator12y ago· 2 in thread

I think that now with HTML5 standardising the parsing behaviour ( http://www.w3.org/TR/html5/syntax.html ), looking at that is very useful too - it shows that void elements get closed automatically by the parser whether or not "/" is included, some other extraneous end tags get ignored completely, and also shows that "</br>" gets parsed as "<br>". So the example given in the article, "<br>Hello!</br>", does have a defined meaning in HTML5 - equivalent to "<br>Hello!<br>".

enyoOP12y ago

I'm talking about the specs in the article (not how browsers interpret errors). So </br> may be interpreted as <br> but is actually a syntax error. I quoted the HTML5 specification in the VALIDITY section of the article.

userbinator12y ago

The fact that HTML5 basically completely specified the parsing for any string of input, even "syntax error" cases, raises an interesting point: if these errors still result in some DOM, and all browsers that choose to implement the (also standardised) error handling behave the same way, are they really true "errors" anymore? We usually think of error cases (e.g. in a programming language) as ones which have no meaning or could cause implementation-defined/undefined behaviour, but these have been completely defined by the standard.

I don't see any good reason to use "</br>", but there's some other cases that could be useful, like not requiring spaces between quoted attributes (name1='value1'name2="value2"). I see a parallel with this and the evolution of natural languages: words and syntax that used to be incorrect gradually become accepted as part of the language and attain a normative meaning, because everyone still understands.

enscr12y ago· 2 in thread

TL;DR : "Google's styleguide on that subject is also very clear that you should indeed not close void tags"

P.S. The article is very well written.

btbuildem12y ago

Thanks, that should have been the first line after the introductory paragraph..

garethadams12y ago

The take-away from the article should be "…because now I understand the issues", and not "…because Google says so"

huhtenberg12y ago· 2 in thread

I appreciate the amount of research that went into it, but in reality this all falls squarely into the domain of pedantry, because you close void tags either way and move on to more important matters.

enyoOP12y ago

How to close void tags is more of a leitmotif to learn more about the whole subject, and the reason for investigating it. If you're not interested in understanding the core features of the markup language you're using, then this article is definitely not for you.

ars12y ago

> because you close void tags either way

No you don't. You don't close void tags because it does absolutely nothing.

It's like adding HTML comments around javascript code. You haven't needed that in a decade, yet some people still do it.

eik3_de12y ago· 1 in thread

If you like to keep it terse, it's perfectly fine not to quote attribute values. The HTML5 spec[1] says:

The attribute value can remain unquoted if it doesn't contain space characters or any of " ' ` = < >

[1] http://www.w3.org/TR/html5/introduction.html#a-quick-introdu...
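
The quoted rule is easy to express as a check. The sketch below is an illustration under stated assumptions: the function name `can_stay_unquoted` is hypothetical, "space characters" is read as space, tab, line feed, form feed and carriage return, and the extra requirement that the value be non-empty is added here because an empty value has nothing to write without quotes.

```python
# Characters that force an attribute value to be quoted, per the rule above.
# Whitespace is interpreted as: space, tab, LF, FF, CR (an assumption).
FORBIDDEN = set(" \t\n\f\r\"'`=<>")


def can_stay_unquoted(value):
    """Return True if `value` could be written as an unquoted attribute value."""
    return value != "" and not any(ch in FORBIDDEN for ch in value)
```

So `class=intro` passes the check, while a value such as `intro lead` or `a=b` would need quotes.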

rimantas12y ago

This is also true for previous versions of HTML, as well as omitting some optional end tags.

yashg12y ago· 1 in thread

Now these are the kind of articles that I like to read on HN. A very detailed analysis about a single aspect of programming.

billmalarky12y ago

Not to mention it's a style choice I've struggled with back and forth for years. So simple, yet I can't decide which one to use.

To close or not to close...

dools12y ago· 1 in thread

There is an advantage to writing your HTML as well formed XML, and that's being able to parse it as XML if you want to. There's no disadvantage to writing your HTML as well formed XML.

Why wouldn't you do it?
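
The advantage claimed here can be shown with Python's standard `xml.etree.ElementTree`. This is only a sketch: the helper name `parses_as_xml` is made up, and it tests well-formedness of a fragment, not validity against any XHTML schema.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(markup):
    """Return True if `markup` is well-formed enough for an XML parser."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True
```

With this helper, `<p>one<br />two</p>` is accepted, while `<p>one<br>two</p>` and `<ul><li>a<li>b</ul>` are rejected even though an HTML parser handles all three.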

pornel12y ago

The polyglot syntax gets weird in CDATA elements and you have to add a bunch of talismans to the code.

If you don't want to accidentally break it you shouldn't be writing XML by hand or gluing it from strings (https://hsivonen.fi/producing-xml/), so you need to output only using polyglot-compatible XML+HTML serializer.

That's a lot of work for case when maybe somebody will parse your markup as XML? All bots support HTML.

brunnsbe12y ago· 1 in thread

Nice write-up! I have never thought about shrinking the closing tags to </>, if it were supported it would shrink large HTML-pages quite nicely. Has there been a proposal at W3C to use that kind of a format back in the good old days of HTML 1.0?

userbinator12y ago

I don't know about the historical aspect but I do know that the HTML5 parsing spec explicitly ignores the "</>" sequence. More interestingly, "</ >" (with an extra space) is parsed as a "bogus comment", which means it basically adds a comment node.

nzp12y ago

> Google's styleguide on that subject is also very clear that you should indeed not close void tags.

Only because it results in smaller files. For example it also recommends omitting optional tags for the same reason. I'm really skeptical that omitting these things helps readability (if that's what the guide is referring to when it says "scannability"). If size is at such a premium why not simply preprocess and minify HTML? Recently I tried briefly omitting "/>" from <br> and friends and I wasn't impressed as far as legibility goes. Maybe I just didn't try hard enough... :)

mathias12y ago

I used the SGML NET trick a few years back in an attempt to create the shortest possible valid HTML documents for different versions of HTML: http://mathiasbynens.be/notes/minimal-html

Note: “valid” here is defined as “theoretically valid as per the relevant spec” and doesn’t reflect what browsers actually support(ed).

NKCSS12y ago

Very good article. Been doing most of my web development in the .NET area, starting with ASP.NET and the strict XHTML, I've picked up the habit to always write the /> variant, so it's nice to read about which one to use in the HTML5 age :)

Destitute12y ago

I feel so much better that I don't have to bother typing out <br /> anymore after running my HTML through a validator when I first began serious HTML coding (self-taught). It was a habit that stuck with me and is by far one of the most difficult, finger-stretching pieces of code to write. Nowadays I don't even have too much use for breaks, but it's going to be a relief to just throw a <br> ... Ahhh that was so easy to type.

granttimmerman12y ago

Always Be Closing. Otherwise, say hello HTML preprocessors.

ars12y ago

I thought this was going to be about closing LI, TR, TD, TH, and OPTION.

All of them are optional to close, and everyone seems to differ on if you should close them.

bsimpson12y ago

I've often wondered about why /> syntax doesn't work on some elements. Now I know.

Thanks!

jamesbritt12y ago

Quite interesting, especially the stuff about the SGML arcania.

One quibble: The conflation of tag and element in the article, making it hard to understand just what was meant.

For example, what is "tag content"?

ed_blackburn12y ago

I'm looking into polyglot markup for an API at the moment. This makes lots of sense and explains the idiosyncrasies that easily confuse me.

Siecje12y ago

If you can use the text portion of meta and other tags, which void tags are actually required?

rocky512y ago

Well researched article.

kirbyk12y ago

Fantastic article. And god I love this website's design.

steffex12y ago

nice article, i totally agree with the suggestions.

sippeangelo12y ago

SHORTTAG NETENABL IMMEDNET

WHTDOES ITEVN MEAN

Reminds me of PHP's T_PAAMAYIM_NEKUDOTAYIM

icantthinkofone12y ago

All of this has been clearly outlined in the spec for decades and many articles have been written over the years talking about this same issue. Why this is a problem for any professional developer, I just don't have a clue.

goggles9912y ago

TL,DR... Do not close.

j / k navigate · click thread line to collapse

158 comments

91 comments · 34 top-level

jrockway12y ago· 12 in thread

I guess since HTML is so common it doesn't really matter, but really? We need 5 differnt types of markup, when one would have been fine?

https://xkcd.com/927/

bhaak12y ago

This. I wish they didn't do a HTML5 but instead only did a XHTML5.

There are a lot of good ideas in HTML5 but why did there need to be _another_ way of parsing HTML-like documents?

Apparently because it's the one HTML-parser to surpass and replace all other HTML-parsers out there. <sarcasm>Yeah, I totally believe that.</sarcasm>

jeswin12y ago

It is plain wrong to make a standard easier for machine-parsing at the expense of humans who are typing it in.

10 more replies

userbinator12y ago

keeperofdakeys12y ago

1 more reply

vesinisa12y ago

Apparently, you can still write polyglot documents that are both valid XHTML and HTML5.[1] But this requires closing void tags with <tag />.

[1] http://blog.whatwg.org/xhtml5-in-a-nutshell
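To illustrate the polyglot point, here is a minimal sketch using only Python's standard library. The two sample strings are made up for illustration; `xml.etree.ElementTree` stands in for a strict XML parser, and `html.parser` for a lenient HTML tokenizer (it is not a full HTML5 tree builder, so treat this as an approximation of browser behaviour).

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records the name of every start tag the HTML tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # Also reached for <br />, because the default
        # handle_startendtag() delegates to handle_starttag().
        self.tags.append(tag)


def html_tags(markup):
    """Return the start tags seen when reading markup as HTML."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def is_well_formed_xml(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample fragments, not taken from the article.
POLYGLOT = "<p>line one<br />line two</p>"
HTML_ONLY = "<p>line one<br>line two</p>"

for sample in (POLYGLOT, HTML_ONLY):
    print(sample, html_tags(sample), is_well_formed_xml(sample))
```

Both fragments tokenize to the same tags as HTML, but only the one with `<br />` survives the XML parser, which is why polyglot documents need the trailing slash.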

icebraining12y ago

mcv12y ago

And if they were really going to break the standard like that, at least they could have broken it in such a way that it fixed all the stupid legacy decisions.

praptak12y ago

Out of curiosity - what is the correct way to write the BR tag in XHTML5?

eponeponepon12y ago

> We need 5 different types of markup, when one would have been fine?

...and we already had SGML in the first place... :)

protonfish12y ago

recursive12y ago

Have you ever maintained HTML? It doesn't sound like it.

saraid21612y ago

People indent text files for reasons other than "I am writing in a programming language."

For example, take an English class. Any. English. Class.

pavpanchekha12y ago· 6 in thread

ghayes12y ago

I've been burned before by using <script src="..." /> and assuming it would work in all browsers. Instead, it subsumed later tags in a horrible way. I've never used empty-elements in HTML since.

mathias12y ago

`<script src="foo" />` only works the way you’d expect it to in XHTML. Proper XHTML, that is — served with the correct `Content-Type` header. http://mathiasbynens.be/notes/xhtml5

bsimpson12y ago

Chrome explicitly stopped supporting void syntax on the script tag to encourage people to use </script> so IE wouldn't die when it saw a <script />.

stormbrew12y ago

bzbarsky12y ago

iSnow12y ago

Actually, hardly anyone wanted XHTML2, because it was a purely academic exercise in making established things harder (<a href="..." target="_blank">) without compelling features.

I tried to use it but then completely reverted to HTML4. Thank god we have HTML5 now.

rvkennedy12y ago· 6 in thread

HTML would be much easier to write if it were based on JSON. Less bandwidth too. Has this been attempted?

TazeTSchnitzel12y ago

Are you kidding me?

Currently:

  <ul>
    <li>Hello!</li>
  </ul>

By your suggestion:

  {
    "tagName": "ul",
    "children": [
      {
        "tagName": "li",
        "children": [
          "Hello!"
        ]
      }
    ]
  }

(and, yes, it has been attempted... JSON.stringify(document.body))
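For comparison, a rough sketch of how the `tagName`/`children` structure above could be produced from real markup, using Python's standard `html.parser`. The class and function names are hypothetical, attributes are dropped for brevity, and the void-element set is an assumption based on the current HTML Standard; this is not a spec-compliant tree builder.

```python
import json
from html.parser import HTMLParser

# Void elements as listed in the current HTML Standard (assumed list).
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}


class JsonTreeBuilder(HTMLParser):
    """Builds nested {"tagName", "children"} dicts; attributes are ignored."""

    def __init__(self):
        super().__init__()
        self.root = {"tagName": "#root", "children": []}
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = {"tagName": tag, "children": []}
        self.stack[-1]["children"].append(node)
        # Void elements never stay open, so they are not pushed.
        if tag not in VOID_ELEMENTS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # End tags for void elements and mismatched end tags are ignored.
        if tag in VOID_ELEMENTS:
            return
        if len(self.stack) > 1 and self.stack[-1]["tagName"] == tag:
            self.stack.pop()

    def handle_data(self, data):
        # Skip whitespace-only text nodes to keep the output small.
        if data.strip():
            self.stack[-1]["children"].append(data)


def html_to_tree(markup):
    """Return the list of top-level nodes parsed from markup."""
    builder = JsonTreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root["children"]


print(json.dumps(html_to_tree("<ul><li>Hello!</li></ul>")[0], indent=2))
```

Note that the builder has to know the void-element list to handle `<br>` at all, which is the special-casing the thread is complaining about.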

bzbarsky12y ago

> JSON.stringify(document.body)

One of two things will happen, depending on your browser.

If your browser is following the WebIDL spec, so all the accessors are on the prototype, this will produce "{}".

If your browser is WebKit-based, this will throw an exception, because body.firstChild.parentNode == body and JSON.stringify throws on object graphs with loops.

talmand12y ago

I don't think your example needs to be quite so verbose.

  {
    "ul": {
      "li": "hello!"
    }
  }

I would think it would depend upon the parser.

Regardless, I'd still rather write out HTML instead of JSON for markup.

Shorel12y ago

LISP:

  (:ul (:li Hello))

Shorel12y ago

Nah.

It would be much easier as a Lisp S-EXP.

And yes, it has been attempted.

__david__12y ago

http://jsml.org

alkonaut12y ago· 5 in thread

andybak12y ago

How does 'Tag soup' follow from 'let's not force pedantic XML parsing rules on people'?

HTML5 parsing is clearly defined and in most cases quite sensible. I think it was an excellent compromise.

alkonaut12y ago

It's the HTML5 standard that is complex and pedantic; it breaks silently when you violate one of hundreds of rules (e.g., lists of void elements that can't be closed).

XML is simple. Sure, it's pedantic in the sense that it breaks, but HTML5 breaks too, only subtly.

mattmanser12y ago

You say pedantic, while others might say simple, easy to remember, easy to error check and impossible to get wrong.

iSnow12y ago

Which probably goes to show that most developers are not afflicted with OCD but would rather have a more lenient spec. After all, XHTML2, which is even more strict, sold like hot cakes...

alkonaut12y ago

Don't confuse "lenient" with "pedantic but with very silent failures". The failures caused by forgetting to close tags in HTML5 are often catastrophic, which is why it isn't "lenient".

In fact, I think most developers agree that an error message would be preferable to a corrupt layout in the case of the self-closed div.
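The self-closed div failure can be sketched with a deliberately simplified tree builder. This is an illustration of the rule (a trailing slash is ignored on non-void elements in HTML syntax), not an HTML5 parser: the regex tokenizer, the `honor_self_closing` switch, and the sample markup are all assumptions made for the example.

```python
import re

# Void elements as listed in the current HTML Standard (assumed list).
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}

# Very rough tokenizer: a tag (with optional trailing slash) or a text run.
TOKEN = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(/?)>|([^<]+)")


def parse(markup, honor_self_closing=False):
    """Build a toy tree.

    honor_self_closing=False mimics HTML syntax, where "/>" only matters
    on void elements; True mimics XML-style handling of "/>".
    """
    root = {"tag": "#root", "children": []}
    stack = [root]
    for match in TOKEN.finditer(markup):
        closing, name, slash, text = match.groups()
        if text is not None:
            stack[-1]["children"].append(text)
            continue
        name = name.lower()
        if closing:
            # Close the nearest matching open element, if there is one.
            for depth in range(len(stack) - 1, 0, -1):
                if stack[depth]["tag"] == name:
                    del stack[depth:]
                    break
            continue
        node = {"tag": name, "children": []}
        stack[-1]["children"].append(node)
        self_closed = name in VOID_ELEMENTS or (honor_self_closing and bool(slash))
        if not self_closed:
            stack.append(node)
    return root


def outline(node):
    """Render the toy tree as a compact nested string."""
    if isinstance(node, str):
        return repr(node)
    inner = ", ".join(outline(child) for child in node["children"])
    return f"{node['tag']}({inner})"


SAMPLE = '<div class="clear" /><p>text</p>'  # hypothetical markup
print(outline(parse(SAMPLE)))                           # HTML-style reading
print(outline(parse(SAMPLE, honor_self_closing=True)))  # XML-style reading
```

Under the HTML-style reading the paragraph ends up inside the div, silently; under the XML-style reading it is a sibling. `<br>` and `<br />` produce the same tree either way.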

dbbolton12y ago· 4 in thread

>Optionally, a "/" character, which may be present only if the element is a void element.

>There is absolutely no difference between <br> and <br />.

>Actually, one might argue that adding / to a void tag is an ignored syntax error.

>every browser and parser should not handle <br> and <br /> any differently

If it's optional and has absolutely no effect and makes no difference, how exactly would one argue that it's an error?

To me, this is like saying `print ${SHELL}` is erroneous because the braces don't do anything and `print $SHELL` does exactly the same thing. It may be superfluous, but it's not erroneous.

rimantas12y ago

Ideka12y ago

I'm sure the parent was talking about HTML5. It isn't erroneous in HTML5.

userbinator12y ago

I noticed the same thing and made a more general observation about "errors" here: https://news.ycombinator.com/item?id=7311197

ivanca12y ago

mathias12y ago· 3 in thread

From the article:

“It is not, and has never been, valid HTML to write `<br></br>`.”

Sure, but note that it is perfectly valid XHTML (which is a form of HTML).

Oh, and `<script src="foo" />` actually works the way you’d expect it to in XHTML.

Don’t use XHTML though.

__david__12y ago

> Don’t use XHTML though.

Unless you want to combine it with SVG for a hybrid site [1].

[1] view source on http://emacsformacosx.com

TheZenPsycho12y ago

Not necessary. HTML5 permits SVG both inline and in <img src=>. And unlike the XML hybrid document stuff, it actually works in real browsers.

stan_rogers12y ago

XHTML is not a form of HTML; it's a dialect of XML that bears a strong surface resemblance (but only a surface resemblance) to HTML.

vixen9912y ago· 3 in thread

"has its disadvantages as well". Its!

evincarofautumn12y ago

What are you saying? “Its” is the possessive of “it”. “It’s” is a contraction of “it is” or “it has”.

enyoOP12y ago

It was "it's" before.

enyoOP12y ago

Thanks. Corrected.

muyuu12y ago· 2 in thread

I'm sorry but saying a discussion is over because "Google says so in their style guide" is contemptible.

I still think empty elements make more sense and a proper reformulation of XHTML(5) is the way it should have been done since the beginning.

enyoOP12y ago

Yes it is! Who said that? (Or was that all you got out of the article?)

muyuu12y ago

"Google's styleguide on that subject is also very clear that you should indeed not close void tags."

robin_reala12y ago· 2 in thread

skakri12y ago

HTML and XHTML 'compatibility' for older versions of IE. IE6 couldn't parse the application/xhtml+xml MIME type and broke layouts. <br /> fixed that issue.

robin_reala12y ago

Add IE7 and IE8 to the list of browsers that don’t understand XHTML.

lkrubner12y ago· 2 in thread

This is from Ian Hickson in 2006, regarding the emergence of HTML5:

To which Sam Ruby responded:

http://www.intertwingly.net/blog/2006/12/01/The-White-Pebble

Remember that both men played fundamental roles in shaping HTML5. And I think this one sentence sums up the mindset that shaped HTML5:

"The truth is that most HTML is authored by pagans."

and this was Sam Ruby's view at the time:

This is Pilgrim's article "XML on the Web has Failed":

http://www.xml.com/pub/a/2004/07/21/dive.html

an excerpt:

The article we are talking about "To close or not to close" states:

"XHTML is basically the same as HTML but based on XML."

No more "XML parsing failed" errors

http://intertwingly.net/blog/2011/10/03/No-more-XML-parsing-...

an excerpt:

Sam Ruby is a co-chair of the W3C's HTML Working Group, and if you've read his blog over the years, you are aware of the many problems that arise when treating XHTML as XML.

Some of the debates that have happened over the years simply reveal how much reality differs from the specs:

"HTML charset vs XML encoding"

http://www.intertwingly.net/blog/2004/02/13/HTML-charset-vs-...

If it was easy to develop a version of HTML that truly acted as a form of XML, would such debates have been necessary?

Consider the fact that companies like Google felt they had no choice but to ignore the mime type "application/xhtml+xml":

Google Hates XHTML?

http://www.intertwingly.net/blog/2007/03/15/Confirmed-Google...

Sam Ruby also makes clear that the concessions to an XML style, including closing void elements, were thought of as an effort to ease the transition:

http://www.intertwingly.net/blog/2006/11/24/Feedback-on-XHTM...

Also, if you have the time, see the debate here between Sam Ruby and Henri Sivonen:

http://intertwingly.net/blog/2012/11/09/In-defence-of-Polygl...

I feel that debate reveals much of the thinking that led to HTML5 being so much more accepting than XHTML was.

Also, if you have a lot of time, this post from 2009, and the debate in the comments, will teach you a lot about the thinking that shaped HTML5:

http://intertwingly.net/blog/2009/04/08/HTML-Reunification

Anyone who advocates for XHTML needs to think long and hard about what it is, exactly, that they are advocating for. If you want an HTML that has an XML style, can you say why?

bct12y ago

> If you want an HTML that has an XML style, can you say why?

(That said, I appreciate your detailed comment; this is important history that too few people are aware of.)

(Also overenthusiasm for all things XML had nothing to do with RDF. RDF is not XML.)

rjd12y ago

I do it for a simple reason: layout cleanup with auto-indent. I've found HTML layout cleanup to be unreliable in most editors, whereas XML layout works 99% of the time.

skywhopper12y ago· 2 in thread

HTML5 is a huge improvement over the HTML4.01/XHTML madness that was going on back in the day. And it's fine with me to allow non-closed singleton tags.

There's perhaps no strong logical argument either way, but from a style perspective, I prefer to use closing slashes to make it absolutely clear what's going on.

alkonaut12y ago

Allowing the void tags to be unclosed is the lesser evil of the two, I can even accept the argument behind it (they can't have content) even though it complicates the syntax.

talmand12y ago

Seems simple to me, a container element should always have opening and closing tags. An element that will never contain anything is self-closing.

But I admit, I suppose it could only be simple to me.

userbinator12y ago· 2 in thread

enyoOP12y ago

userbinator12y ago

enscr12y ago· 2 in thread

TL;DR : "Google's styleguide on that subject is also very clear that you should indeed not close void tags"

P.S. The article is very well written.

btbuildem12y ago

Thanks, that should have been the first line after the introductory paragraph..

garethadams12y ago

The take-away from the article should be "…because now I understand the issues", and not "…because Google says so"

huhtenberg12y ago· 2 in thread

I appreciate the amount of research that went into it, but in reality this all falls squarely into domain of pedantry, because you close void tags either way and move on to more important matters.

enyoOP12y ago

ars12y ago

> because you close void tags either way

No you don't. You don't close void tags because it does absolutely nothing.

It's like adding HTML comments around javascript code. You haven't needed that in a decade, yet some people still do it.

eik3_de12y ago· 1 in thread

If you like to keep it terse, it's perfectly fine not to quote attribute values. The HTML5 spec[1] says:

The attribute value can remain unquoted if it doesn't contain space characters or any of " ' ` = < >

[1] http://www.w3.org/TR/html5/introduction.html#a-quick-introdu...
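The quoted rule is easy to turn into a check. Below is a small sketch in Python; the function names are hypothetical, and `\s` matches slightly more than the spec's ASCII whitespace, so the check errs on the side of quoting.

```python
import re

# Characters that force quoting: whitespace and any of " ' ` = < >
_NEEDS_QUOTES = re.compile(r"[\s\"'`=<>]")


def can_be_unquoted(value):
    """True if value may appear as an unquoted HTML attribute value."""
    # The empty string must always be quoted.
    return value != "" and _NEEDS_QUOTES.search(value) is None


def render_attribute(name, value):
    """Render name=value, adding double quotes only when required."""
    escaped = value.replace("&", "&amp;")
    if can_be_unquoted(value):
        return f"{name}={escaped}"
    escaped = escaped.replace('"', "&quot;")
    return f'{name}="{escaped}"'


print(render_attribute("class", "nav"))
print(render_attribute("class", "main nav"))
print(render_attribute("title", 'say "hi"'))
```

One practical caveat: an unquoted value placed directly before a trailing slash, as in `<a href=foo/>`, makes the slash part of the value, which is another reason to leave the slash off when attribute values are unquoted.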

rimantas12y ago

This is also true for previous versions of HTML, as well as omitting some optional end tags.

yashg12y ago· 1 in thread

Now these are the kind of articles that I like to read on HN. A very detailed analysis about a single aspect of programming.

billmalarky12y ago

Not to mention it's a style choice I've struggled with back and forth for years. So simple, yet I can't decide which one to use.

To close or not to close...

dools12y ago· 1 in thread

There is an advantage to writing your HTML as well formed XML, and that's being able to parse it as XML if you want to. There's no disadvantage to writing your HTML as well formed XML.

Why wouldn't you do it?

pornel12y ago

The polyglot syntax gets weird in CDATA elements and you have to add a bunch of talismans to the code.

That's a lot of work for the case where somebody might parse your markup as XML. All bots support HTML.

brunnsbe12y ago· 1 in thread

userbinator12y ago

nzp12y ago

> Google's styleguide on that subject is also very clear that you should indeed not close void tags.

mathias12y ago

I used the SGML NET trick a few years back in an attempt to create the shortest possible valid HTML documents for different versions of HTML: http://mathiasbynens.be/notes/minimal-html

Note: “valid” here is defined as “theoretically valid as per the relevant spec” and doesn’t reflect what browsers actually support(ed).

NKCSS12y ago

Destitute12y ago

granttimmerman12y ago

Always Be Closing. Otherwise, say hello to HTML preprocessors.

ars12y ago

I thought this was going to be about closing LI, TR, TD, TH, and OPTION.

All of them are optional to close, and everyone seems to differ on if you should close them.

bsimpson12y ago

I've often wondered about why /> syntax doesn't work on some elements. Now I know.

Thanks!
