XHTML 2 vs. HTML 5


Overview of XHTML 2.0

XHTML 2.0 is based solely on XML, forgoing the SGML heritage and syntax peculiarities present in current web markup. XHTML 2.0 is supposed to be a “general-purpose language,” with a minimal default feature set that is easy to extend using CSS and other technologies (XForms, XML Events, etc.). It’s a modular approach that allows the XHTML2 group to focus on generic document markup, while others develop mechanisms for presentation, interactivity, document construction, and so on.

Priority one for the XHTML2 working group is to further separate document content and structure from document presentation. Other goals include increased usability and accessibility, improved internationalization, more device independence, less scripting, and better integration with the Semantic Web. The group has been less concerned with backward compatibility than their predecessors (and the HTML working group), which has led them to drop some of the syntactic baggage present in earlier incarnations of HTML. The result is a cleaner, more concise language that corrects many of Web markup’s past indiscretions.

Overview of HTML 5

While XHTML 2.0 aims to be revolutionary, the HTML working group has taken a more pragmatic approach and designed HTML 5 as an evolutionary technology. That is to say, HTML 5 is an incremental step forward that remains mostly compatible with the current HTML 4/XHTML 1 standards. However, HTML 5 offers a host of changes and extensions to HTML 4/XHTML 1 that address many of the faults in these earlier specifications.

HTML 5 is about moving HTML away from document markup, and turning it into a language for web applications. To that end, much of the specification focuses on creating a more robust, feature-rich client-side environment for web application development by providing a variety of APIs. Among other things, the spec stipulates that conforming implementations must provide client-side persistent storage (both key/value and SQL storage engines), audio and video playback APIs, 2D drawing through the canvas element, cross-document messaging, server-sent events, and a networking API.
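To make two of these APIs concrete, here is a minimal sketch that combines key/value storage with canvas drawing. The element id "chart" and the storage key "visits" are illustrative names, not anything the specification defines, and some browsers restrict storage for pages opened straight from the file system.

```html
<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>Canvas and storage sketch</title>
  </head>
  <body>
    <!-- The id "chart" is a made-up name for this example -->
    <canvas id="chart" width="200" height="100">
      Fallback text for browsers without canvas support.
    </canvas>
    <script>
      // Key/value persistent storage: count visits across page loads.
      // "visits" is a hypothetical key chosen for this sketch.
      var visits = Number(localStorage.getItem("visits") || 0) + 1;
      localStorage.setItem("visits", String(visits));

      // 2D drawing: a bar that grows with the visit count, capped at 180px.
      var ctx = document.getElementById("chart").getContext("2d");
      ctx.fillStyle = "steelblue";
      ctx.fillRect(10, 10, Math.min(visits * 10, 180), 30);
    </script>
  </body>
</html>
```

Reloading the page should lengthen the bar, since the counter survives between visits without any server round trip.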

The HTML 5 specification maintains an SGML-like syntax that is compatible with the current HTML specifications (though some of the more esoteric features of SGML are no longer supported). Also included in the specification is a second “XML serialization,” which allows developers to serve valid XML documents as well. Again, by maintaining an SGML-like serialization the HTML 5 working group has struck a balance between pragmatism and progress. Developers can choose to mark up content using either the HTML serialization (which looks more like HTML 4.x) or the XML serialization (which looks more like XHTML 1.x).
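As a rough illustration of the difference, the sketch below shows the same small document in both serializations. The content is invented for the example, and the two listings are meant to be read as two separate files: the first served as text/html, the second as application/xhtml+xml.

```html
<!-- File 1: HTML serialization (served as text/html).
     End tags for p may be omitted and br takes no closing slash. -->
<!DOCTYPE html>
<html>
  <head>
    <title>Example</title>
  </head>
  <body>
    <p>First paragraph
    <p>Second paragraph<br>with a line break
  </body>
</html>

<!-- File 2: XML serialization (served as application/xhtml+xml).
     The document must be well-formed XML: every element is closed
     and the XHTML namespace is declared on the root element. -->
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Example</title>
  </head>
  <body>
    <p>First paragraph</p>
    <p>Second paragraph<br/>with a line break</p>
  </body>
</html>
```

Both files describe the same document tree. The trade-off is that an XML parser will reject the second file outright if a single tag is left unclosed, while an HTML parser recovers from that kind of mistake.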

Similar Features

It shouldn’t be too surprising that both working groups are proposing a number of similar features. These features address familiar pain points for web developers, and should be welcome additions to the next generation of markup languages.

Removal of Presentational Elements

A number of elements have been removed from both XHTML 2.0 and HTML 5 because they are considered purely presentational. The consensus is that presentation should be handled using style sheets.

HTML 5 and XHTML 2.0 documents cannot contain these elements: basefont, big, font, s, strike, tt, and u. XHTML 2.0 also removes the small, b, i, and hr elements, while HTML 5 redefines them with non-presentational meanings. In XHTML 2.0, the hr element has been replaced with separator in an attempt to reduce confusion (since the hr element, which stands for horizontal rule, is not necessarily either of those things).

Navigation Lists

Navigation lists have been introduced in both XHTML 2.0 and HTML 5. In XHTML 2.0, navigation is marked up using the new nl element. Navigation lists must start with a child label element that defines the list title. Following the title, one or more li elements are used to mark up links. Also new in XHTML 2.0 is the ability to create a hyperlink from any element using the href attribute. Combining these features produces simple, lightweight navigation markup:

<nl>
  <label>Browse</label> <!-- label text is illustrative -->
  <li href="/">All</li>
  <li href="/news">News</li>
  <li href="/videos">Videos</li>
  <li href="/images">Images</li>
</nl>

In HTML 5, the new nav element has been introduced for this purpose. Unfortunately, nav is not a list element, so it cannot contain child li elements to logically organize links (perhaps a new idiom will develop). And since anchor tags are still required to create hyperlinks in HTML 5, navigation markup is not quite as elegant:

<nav>
  <ul>
    <li><a href="/">All</a></li>
    <li><a href="/news">News</a></li>
    <li><a href="/videos">Videos</a></li>
    <li><a href="/images">Images</a></li>
  </ul>
</nav>

Enhanced Forms

Both specifications have new features to create more robust, consistent forms with less scripting. In XHTML 2.0, standard HTML forms are dropped completely in favor of the more comprehensive XForms standard. The XHTML2 working group does not control this standard, but references it from the XHTML 2.0 specification. To facilitate reuse, XForms separates the data being collected from the markup of the controls. It’s a robust and powerful language, but a full description is way beyond the scope of this post. Suffice it to say, there will be a bit of a learning curve for web developers trying to get up to speed with this technology.

HTML 5 retains the familiar HTML forms, but adds several new data types to simplify development and improve usability. Several new types of input elements have been introduced for email addresses, URLs, dates and times, and numeric data. This will allow user agents to provide more sophisticated user interfaces (e.g., calendar date pickers), integrate with other applications (e.g., pulling addresses from Outlook or Address Book), and validate user input before posting data to the server (which means less client-side JavaScript validation).
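A small form using the new input types might look something like the sketch below. The action URL and the field names are placeholders invented for the example.

```html
<!-- "/signup" and the name attributes are hypothetical -->
<form action="/signup" method="post">
  <!-- The user agent can check that this looks like an email address -->
  <label>Email <input type="email" name="email" required></label>

  <!-- Accepts only an absolute URL -->
  <label>Website <input type="url" name="website"></label>

  <!-- A user agent may offer a calendar date picker here -->
  <label>Birthday <input type="date" name="birthday"></label>

  <!-- Numeric input restricted to the range 1 to 10 -->
  <label>Tickets <input type="number" name="tickets" min="1" max="10" value="1"></label>

  <button type="submit">Sign up</button>
</form>
```

User agents that don't recognize a given type fall back to a plain text field, so a form like this still degrades gracefully. Server-side validation remains necessary either way.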

Semantic Markup

Both working groups have embraced the coming Semantic Web by allowing developers to embed richer metadata in their documents. As with forms, the XHTML2 working group has opted for a more sophisticated technology, while the HTML working group has kept things simple.

In XHTML 2.0, metadata can be embedded by using several new global attributes from the Metainformation Attributes Module. In particular, the new global role attribute is intended to describe the meaning of a given element in the context of the document. The technical term is embedding structured data in web pages (the approach known as RDFa). Again, the group leverages an existing standard by referencing RDF. The technology is extremely powerful, but it’s also complicated.

The HTML working group has taken an approach that feels more like microformats by overloading the class attribute with a predefined set of reserved classes to represent various types of data. The specification currently lists seven reserved classes: copyright, error, example, issue, note, search, and warning. While overloading the class attribute like this might be confusing, it’s unlikely that user agents will render elements with these classes differently. And the class names are specific enough that there’s little worry: if an element has its class set to copyright, it’s probably a copyright whether the developer knew about the reserved classes or not.
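In practice, markup that uses the reserved classes from the draft described above looks no different from everyday HTML. The text content and the company name here are made up for illustration.

```html
<!-- Class names come from the draft's reserved list; the content is invented -->
<p class="warning">Back up your data before upgrading.</p>
<p class="note">Upgrades usually take a few minutes.</p>
<p class="copyright">Copyright 2008 Example Corp.</p>
```

To a parser these are ordinary class values, so existing style sheets and scripts that already use the same names keep working unchanged.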
