57

So, HTML5 is the Big Step Forward, I'm told. The last step forward we took that I'm aware of was the introduction of XHTML. The advantages were obvious: simplicity, strictness, the ability to use standard XML parsers and generators to work with web pages, and so on.

How strange and frustrating, then, that HTML5 rolls all that back: once again we're working with a non-standard syntax; once again, we have to deal with historical baggage and parsing complexity; once again we can't use our standard XML libraries, parsers, generators, or transformers; and all the advantages introduced by XML (extensibility, namespaces, standardization, and so on), that the W3C spent a decade pushing for good reasons, are lost.

Fine, we have XHTML5, but it does not seem to have gained anything like the popularity of the HTML syntax of HTML5. See this SO question, for example. Even the HTML5 specification says that HTML5, not XHTML5, "is the format suggested for most authors."

Do I have my facts wrong? If not, why am I the only one who feels this way? Why are people choosing HTML5 over XHTML5?

5 Answers

25

I would recommend reading How Did We Get Here?. Mark Pilgrim gives an excellent and brief history of HTML up to HTML5.

Essentially, though, my understanding is that many web pages don't even take advantage of the "X" in XHTML, because they aren't served with the proper MIME type for it (application/xhtml+xml), so browsers treat them as ordinary HTML.

pthesis
6

If you produce XML-compatible HTML5 and serve it with an XML MIME type, then the XML parser will be used and all that good jazz comes back ;)

EDIT: see this for some more information: http://wiki.whatwg.org/wiki/HTML_vs._XHTML
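
To make the point above concrete, here is a minimal sketch, assuming the page is written so that it is well-formed XML (every element closed, XHTML namespace declared, no HTML-only named entities such as &nbsp;). The sample page and the helper name paragraph_texts are made up for illustration. Under those assumptions an ordinary XML library, here Python's xml.etree.ElementTree, can read the markup directly:

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample page: HTML5 written so that it is also well-formed XML.
POLYGLOT_PAGE = """<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body>
    <p>First paragraph<br/>with a break.</p>
    <p>Second paragraph.</p>
  </body>
</html>"""


def paragraph_texts(markup):
    """Parse the markup with a plain XML parser and return the text of each <p>."""
    root = ET.fromstring(markup)
    # Elements live in the XHTML namespace, so the tag name must be qualified.
    return ["".join(p.itertext()) for p in root.iter(f"{{{XHTML_NS}}}p")]


if __name__ == "__main__":
    for text in paragraph_texts(POLYGLOT_PAGE):
        print(text)
```

Whether a browser uses its XML parser on the same bytes still depends on the MIME type the server sends, not on the markup itself.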

deadalnix
5

HTML5 is the logical and inevitable conclusion of browsers adopting Postel's law ("Be liberal in what you accept").

Once one browser with sufficient market share adopts this principle, others are forced to follow suit, not only in being liberal by accepting non-conforming content, but also in rendering it the same way as their competitors do. HTML5 is the logical result of that situation: the browser vendors have decided that since they're not going to reject any content as invalid (at least, not at the HTML level - JavaScript is another matter!) they might as well sit round the table and agree on an interpretation for anything the content author might throw at them. In this environment, they haven't reacted kindly to standards-geeks telling them that if only they had rejected ill-formed content from the word go, they wouldn't have got into this mess.

So you and I can shout from the sidelines and tell the browser vendors and their users that the world would have been a better place if they hadn't believed Jon Postel, but the damage is done and it's very hard to undo it.

Michael Kay
3

You will never get the benefits of a simpler parser or standard XML tools on the client side anyway.

There are billions of pages on the web in HTML, some of them written by people long dead, so they are never going to be updated to XML. So if you want to create a generally useful user agent, you have to be able to parse old-fashioned HTML anyway. Arguably, XHTML only introduces additional complexity, since it requires a new mode of parsing in addition to the HTML parsing you already have to support.

On the server side you can still take advantage of XML tools, e.g., by generating XHTML using XSLT. But if you are not specifically using an XML toolchain, there is no benefit in using XML syntax rather than just HTML.
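
As a rough illustration of the server-side case, here is a sketch that swaps XSLT for a plain XML tree builder (Python's xml.etree.ElementTree), since the idea is the same: the XML tool produces the markup, so well-formedness and escaping come for free. The function name build_page and the sample content are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_page(title, paragraphs):
    """Build a small XHTML document as a tree and serialize it to a string."""
    # The namespace is written as a literal xmlns attribute on the root element.
    html = ET.Element("html", {"xmlns": XHTML_NS})
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(html, "body")
    for text in paragraphs:
        ET.SubElement(body, "p").text = text
    # The serializer escapes &, < and > in text, so the output stays well-formed.
    return ET.tostring(html, encoding="unicode")


if __name__ == "__main__":
    print(build_page("Tom & Jerry", ["1 < 2", "Plain text"]))
```

The output can be fed straight back into any XML parser, which is the kind of round trip an XML toolchain relies on.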

(You are not correct that HTML is a "non-standard" syntax. The syntax of HTML is specified in painstaking detail in the HTML5 spec, so it is just as much a standard as XML syntax.)

JacquesB
1

The HTML5 specification is a great improvement over the HTML4 specification. In particular, the handling of error conditions and invalid markup is now standardized, meaning all browsers that correctly implement the standard will handle invalid markup in the same way.
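
The difference in error handling can be sketched roughly like this. Note the assumptions: Python's html.parser is a lenient tokenizer, not an implementation of the HTML5 parsing algorithm, so it only stands in for "a parser that does not reject bad markup". The sample string and helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical ill-formed fragment: the <b> element is never closed.
BROKEN = "<p>Unclosed <b>bold text</p>"


class TagCollector(HTMLParser):
    """Records every start tag the lenient HTML tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


def xml_accepts(markup):
    """Return True if a strict XML parser accepts the markup, else False."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def html_start_tags(markup):
    """Return the start tags seen by the lenient HTML tokenizer."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.start_tags


if __name__ == "__main__":
    print("XML parser accepts it:", xml_accepts(BROKEN))
    print("HTML tokenizer saw:", html_start_tags(BROKEN))
```

The XML parser stops at the first well-formedness error, while the HTML side carries on; what HTML5 adds is a specification of exactly how to carry on, so that every conforming parser builds the same tree.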

HTML is written by humans more often than not (usually in conjunction with some kind of templating language), and humans make mistakes. As long as all browsers handle syntax errors in the same way, then the "be liberal in what you accept" rule is perfectly acceptable.

There is really little advantage in producing valid XML, since tools and libraries to handle HTML are (nearly) just as readily available, and HTML is easier for humans to write than XML.

Dean Harding