- a browser must reject any invalid HTML in order to force the developers to fix their HTML
- a browser must try hard to make sense of messed up HTML, otherwise users will switch to a competing browser that renders the mess for them
Theoretically all browser vendors could coordinate so that everyone rejects invalid HTML, but there is probably no good way to avoid defectors.

Why did this not happen for other technologies? My first thought was that there is no compilation step that would allow forcing the developer to fix things without giving the end user any power through their choice of browser. But that seems not quite right: why do Bash, Python, or your C++ compiler not make a best guess at what your code is supposed to do? Because there is or was only one dominant implementation and therefore no competition? Because document markup is much more robust against small errors and probably remains readable, while your code likely just crashes? That last one is probably among the most important, I think. What role did browser-specific features, evolving standards, and incomplete implementations play?

What is the end result? Nothing for the end user; they do not care whether the browser has to deal with nice HTML or a mess. Developers writing HTML get to be more sloppy, at the price of a lot of additional complexity and pain wherever code has to deal with HTML. This might actually have some negative impact on end users because of bugs or security issues stemming from the additional complexity. Maybe it made HTML somewhat more accessible to the casual user, as they could get away with some mistakes. But was this worth it? Could better tooling not have achieved the same, with good error messages helping to fix errors?
At the end of the day, HTML's flexibility as a markup language is what made it popular and usable by anyone, and ambiguity is the price we pay for it.
These days the DOM semantics are even less important, as everything is done in JS anyway for all but the simplest documents.
<div>Hello <b>world</b>!</div>
why ever you would want this. But despite all the flexibility offered by SGML, and probably because nobody implemented HTML parsing by using an SGML parser, we ended up with malformed HTML documents whose interpretation was essentially defined by whatever the ad hoc HTML parser implementations in browsers did.
HTML5 finally abandoned the idea of basing HTML on SGML and a DTD and instead essentially formalized the status quo of HTML parsing and put it into the specification. At least this is my understanding as a non-web developer who gets to work with HTML only occasionally.
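To see what "tolerating the mess" looks like in code, here is a minimal sketch using Python's stdlib `html.parser`. Note this module is lenient in the same spirit as browsers but does not implement the full HTML5 tree-construction (error recovery) algorithm, so it only illustrates that malformed markup is tokenized without any error being raised.

```python
from html.parser import HTMLParser

# A minimal event logger: records every start/end tag the parser reports.
class TagLogger(HTMLParser):
    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

# Misnested "tag soup": </b> and </i> close in the wrong order,
# and <p> is never closed at all -- yet no exception is raised.
logger = TagLogger()
logger.feed("<p><b><i>tag soup</b></i>")
print(logger.events)
```

A strict XML parser would reject the same input at the first misnested end tag; here the parser just reports the tags in the order it saw them and leaves making sense of them to the consumer.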
> Some authors find it helpful to be in the practice of always quoting all attributes and always including all optional tags, preferring the consistency derived from such custom over the minor benefits of terseness afforded by making use of the flexibility of the HTML syntax. To aid such authors, conformance checkers can provide modes of operation wherein such conventions are enforced.
In other words, it recognizes the benefit of explicit tags, but also recognizes the benefit of optional tags. So both styles are equally conforming.
Exactly. The parser is designed to parse documents, not code. The document has a structure (like sections, paragraphs, tables, etc). When the structure doesn’t quite make sense, the parser still displays the content (the blog, story, words, etc).
> But was this worth it, could better tooling not have achieved the same with good error messages helping to fix errors?
You need to think about compatibility, especially backwards compatibility. If the standard were so strict that any error resulted in the browser refusing to parse the document, then as the specification evolved every website would need to be updated. The lack of constraints around standards also means that different browsers can evolve and implement different features at different cadences.
Developers?
In the early days of the web it wasn't developers writing HTML. It was anybody who wanted to publish anything on the web. Real programmers didn't touch it. That is why browsers had to be tolerant of bad code.
Now that HTML5 parsing is well specified, I've come to think that either you want to be strict and have the browser tell you something is wrong, and you use XHTML for this, or all these optional tags are just useless.
I want to optimize readability first, then file size. I believe closing all tags you opened and quoting all attributes helps readability, and also that all these <head>, <body>, <html> tags just get in the way: they make your eyes go through useless boilerplate and make your fingers type useless things too if you don't use templates.
You still need to specify the charset so characters are interpreted correctly, so for me, if you are not going to use application/xhtml+xml anyway, this works well:
<!DOCTYPE html>
<meta charset="utf-8" />
<title> My title </title>
<p> Lorem ipsum... </p>
Both quicker to read and write, while not raising maintenance costs.

Though just yesterday I edited my resume written in XHTML and the browser actually spotted a dumb mistake, so I still like the strictness of the XML parsing mode.
One counterpoint to dropping the optional tags is for pedagogy: if I had to teach HTML to someone, I would make them use all the tags, or the result of having html and body in the DOM and CSS working on them will be very confusing. Only when they understand the DOM, what nodes are in an HTML page, I'd make them drop the tags if they want. Which is an important step so they can understand that nodes that are present in the DOM are not necessarily in the source code.
<ahahah>mmm... <ohoho>nope.</ahaha></ohoho>
Fully joking :-P

I use optional -- therefore abbreviated -- HTML syntax as an alternative to Markdown for writing.
We need to go deeper. •`_´•
For example, emitters need to know what void elements are, because `<br></br>` is actually equivalent to `<br><br>`. But `<script src=foo.js/>` is only an opening tag, so the rest of your document will be executed as JavaScript. So you can't just write an emitter for arbitrary elements; you need to emit different things for `br` and `script`. Plus `script` has special escaping rules that are often forgotten about. Plus you had better keep that list up to date!
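A toy emitter makes the maintenance burden concrete. The sketch below hard-codes the void-element list (the hypothetical `VOID_ELEMENTS` set mirrors the HTML spec's list but must be kept in sync with it by hand, which is exactly the complaint):

```python
# Void elements take no end tag; this set must track the HTML spec.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}

def emit(tag, children=""):
    """Serialize one element. Void elements get no end tag at all:
    emitting "</br>" would be parsed as a second <br>, and the XML
    self-closing shorthand ("<script .../>" ) is just an opening tag
    in HTML, so neither escape hatch is available."""
    if tag in VOID_ELEMENTS:
        return f"<{tag}>"
    return f"<{tag}>{children}</{tag}>"

print(emit("br"))      # no end tag
print(emit("script"))  # explicit end tag, even when empty
```

(Real emitters additionally need the special raw-text escaping rules for `script` and `style`, which this sketch ignores.)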
With XHTML it is very easy to write a parser that will construct a tree and can reserialize it with no issues. I have no issue with consistent changes such as empty attributes and unquoted attribute values, but I think that implied element insertion, auto-closing, void elements, and non-replaceable character data were a mistake, because you need to maintain an up-to-date dataset of these custom rules or you get an incorrect result.
(Is still a thing, but W3C recommends against it)