A Horse of a Different Color: XHTML is not HTML

Kindly stop reading and let that sink in for a minute. If you are using XHTML on your site, you are not using valid HTML. Try changing only the DOCTYPE to an HTML one, leaving the rest of the document alone, and then validating it. You’ll see what I mean. The extra slashes that close empty elements in the head of the document, such as those in <meta /> and <link /> tags, will be reported as errors, because XHTML is “a horse of a different color.”

Every file served over the Web has a MIME type, which tells the receiving computer what kind of file it is and therefore which application or plug-in should be used to open and read it. Normal, “old-fashioned” HTML carries the MIME type text/html. That’s pretty basic: it’s simply text (hypertext, to be specific), and more specifically, it’s HTML.

XHTML, properly served, carries the MIME type application/xhtml+xml. What’s the difference? While HTML is merely text used as a markup language, XHTML is an application of XML, the Extensible Markup Language. (By the way, the parent language of HTML is called SGML [Standard Generalized Markup Language] and has been around since the 1980s.)

The problem with MIME types for Web pages is that most of today’s browsers do not support application/xhtml+xml. Among these “old-fashioned” browsers is the popular Internet Explorer 6.0 for Windows. Our statistics for October 2003 indicate that 70% of our visitors are using this browser, and another 20% are using older versions of Internet Explorer. Newly emerging technologies such as PDAs, mobile devices and Internet-capable cell phones generally don’t support it either; they support something closer to HTML 3.2.

Our recent solution to this problem was to serve XHTML as application/xhtml+xml to browsers that will accept it (currently Netscape 6/7 and Mozilla) and as text/html to everyone else (Opera 6 and 7 will also accept application/xhtml+xml, but scripting is an issue with them; more on that later). This works, but to me it is fundamentally wrong. The same Web page cannot be of two different MIME types any more than a JPEG image can be played as a video, or an ordinary donut can be a cream-filled Bismarck, or a cup of black coffee can pass as a cappuccino. Yet that is precisely what we are trying to do if we serve XHTML as text/html, the MIME type that most (even modern) browsers can understand.
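A minimal sketch of that kind of negotiation, for illustration only. It assumes a server-side handler written in JavaScript, which is not necessarily what any real site uses, and the function name chooseMimeType is hypothetical. It handles only the plain comma-separated form of the Accept request header and ignores wildcards such as */*, so a browser gets application/xhtml+xml only if it asks for that type by name.

```javascript
// Hypothetical helper: pick the MIME type for an XHTML page from the
// Accept header the browser sent. Anything that does not explicitly
// list application/xhtml+xml falls back to text/html.
function chooseMimeType(acceptHeader) {
  const XHTML = "application/xhtml+xml";
  const HTML = "text/html";
  if (!acceptHeader) return HTML;

  for (const part of acceptHeader.split(",")) {
    // Each entry looks like "type/subtype" or "type/subtype;q=0.9".
    const [type, ...params] = part.trim().split(";");
    if (type.trim().toLowerCase() !== XHTML) continue;

    // A quality value of 0 means "explicitly not acceptable".
    let q = 1;
    for (const p of params) {
      const [key, value] = p.trim().split("=");
      if (key.trim().toLowerCase() === "q") q = parseFloat(value);
    }
    return q > 0 ? XHTML : HTML;
  }
  return HTML;
}
```

The result would go into the Content-Type response header. Withholding application/xhtml+xml from Opera, as described above, would need an additional check (for example on the User-Agent header), which this sketch leaves out.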

Some get around this by dynamically changing the DOCTYPE and removing the extra slashes, or by using content negotiation to serve a separate HTML page to non-compliant browsers. The first approach requires that all HTML files be saved with .php extensions, which for most sites means changing all of the URIs, and that is not a good thing. The second approach means having to maintain two separate pages, which can be a pain if you update frequently. It would also divide your traffic between the two pages, which is not good for search engine ratings. Neither is a tidy solution. I prefer the “one page fits all” approach; after all, it is the World Wide Web, right?

The Scripting Problem

Most websites make extensive use of client-side scripting, particularly JavaScript. To complicate matters for prospective XHTML users, JavaScript works differently in an XML environment than it does with normal HTML. Mark Pilgrim goes into some depth on this in his article for xml.com. One major problem is that often-used methods like document.write simply don’t work in XML, so other methods must be used instead. I tried this on our site, but it meant actually using both methods, since the newer methods aren’t supported by many existing browsers (including Opera 6; that’s why I don’t serve application/xhtml+xml to Opera).
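To make the “both methods” idea concrete, here is a rough sketch, not the script actually used on this site. The function name showGreeting, the element id and the returned labels are all hypothetical. It feature-tests for the DOM methods and falls back to document.write only when they are missing. The document object is passed in as a parameter so the function can be exercised outside a browser.

```javascript
// Hypothetical helper: add a paragraph of text to the element with the
// given id, using DOM methods where available and document.write otherwise.
function showGreeting(doc, targetId, text) {
  var XHTML_NS = "http://www.w3.org/1999/xhtml";

  // Feature-test the DOM methods rather than sniffing the browser name.
  if (doc.getElementById && doc.createTextNode) {
    var target = doc.getElementById(targetId);
    if (!target) return "none";

    // In a page parsed as XML, elements need the XHTML namespace,
    // so prefer the namespace-aware call when it exists.
    var p = doc.createElementNS
      ? doc.createElementNS(XHTML_NS, "p")
      : doc.createElement("p");
    p.appendChild(doc.createTextNode(text));
    target.appendChild(p);
    return "dom";
  }

  // Fallback for browsers without DOM support. This only makes sense in a
  // text/html page while it is still loading, and it assumes `text`
  // contains no markup characters such as < or &.
  doc.write("<p>" + text + "</p>");
  return "write";
}
```

In a page this might be called as showGreeting(document, "greeting", "Welcome back"), with an empty element carrying that id already present in the markup.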

Then another problem surfaced: HTML entity codes for special text characters appeared literally, as plain text, when inserted with the newer methods, so I had to key the characters into the script directly. This meant using characters which are not allowed in HTML. They work fine on Windows, but I wonder what weird Latin characters are showing up on other platforms.

Update, January 31, 2004: I later learned the syntax for presenting the characters in XML. We are using the greeting script on our Home page now.
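The update does not say which syntax that was. One option, assumed here purely for illustration, is to write the characters as JavaScript \u escapes, which keeps the script file plain ASCII regardless of platform. The sketch below uses a hypothetical helper, decodeEntities, and a deliberately tiny lookup table to turn entity codes into real characters before the text is handed to a method like createTextNode.

```javascript
// Hypothetical lookup table: a few entity names mapped to their characters,
// written as \u escapes so this source file stays pure ASCII.
const ENTITY_CHARS = {
  eacute: "\u00e9", // e with acute accent
  copy: "\u00a9",   // copyright sign
  nbsp: "\u00a0",   // no-break space
  amp: "&"
};

// Replace named and numeric entity codes in `text` with actual characters.
// Names missing from the table are left untouched.
function decodeEntities(text) {
  return text.replace(/&(#x[0-9a-fA-F]+|#[0-9]+|[A-Za-z]+);/g, function (match, body) {
    if (body.charAt(0) === "#") {
      const code = body.charAt(1) === "x"
        ? parseInt(body.slice(2), 16)
        : parseInt(body.slice(1), 10);
      // Leave out-of-range numbers alone instead of throwing.
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    return Object.prototype.hasOwnProperty.call(ENTITY_CHARS, body)
      ? ENTITY_CHARS[body]
      : match;
  });
}
```

A real page would need a fuller table, or could simply write the \u escapes directly in its string literals.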

Redundant Code

If there’s one thing I hate about designing websites, it’s using redundant code. Specifically I am referring to redundant name and id attributes in anchor tags, and redundant style sheet linking methods.

Redundant name and id attributes

XHTML requires the id attribute for anchor tags used as link targets. XHTML 1.1 does not allow the name attribute in anchors; XHTML 1.0 allows both for backwards compatibility with older browsers (such as Netscape 4) that don’t support linking to id attributes. But regular HTML does not require the id attribute, and everyone supports name, so why use both?

Redundant style sheet linking methods

The <link> tag is used in HTML for linking to external style sheets. Generally this works in XHTML also, but the preferred method is to use the <?xml-stylesheet?> processing instruction. To satisfy standards, both methods should be used in “backwards-compatible XHTML.” Hardly anyone actually does this, and I consider it a waste of time and bandwidth. The <link> tag works fine for everyone in regular HTML.

What is “backwards-compatible XHTML,” anyway? The only reason XHTML works in older browsers at all is that they are forgiving of HTML errors. They have to be, because 95% of all existing websites are written in invalid HTML. This is mainly due to how the Web evolved: browser manufacturers invented their own proprietary tags, and most people still use them. Examples of these include <marquee>, <embed>, <bgsound> and <font face…>. Enough said.

The Bottom Line

The bottom line for me (and probably for most) is this: if you don’t need the extensibility of XML, XHTML is unnecessary. “What about forward compatibility?” you may ask. HTML is not going away any time soon. Too many expensive corporate websites are built with it. Remember when they said the United States would be entirely on the metric system by 1978? It’s now 2003, and everything is still measured in feet and inches here, because the public resisted the change so strongly. So I (and others) predict it will be with HTML.

Unfortunately, this means the 21st Anniversary Edition of The Oo Kingdom will be short-lived. It’s back to the drawing board, this time for a Greatly Simplified Version in HTML 4.01 Strict.