A Horse of a Different Color: XHTML is not HTML
Kindly stop reading and let that sink in for a minute. If you
are using XHTML on your site, you are not using valid
HTML. Try changing only the DOCTYPE
and leaving the rest of the document alone, and then try validating
it. You’ll see what I mean. Any extra slashes in the
head of the document will return errors, because
XHTML is “a horse of a different color.”
Every file served on the Web carries a MIME type;
a computer must know what type of file it has received so that it knows which
application or plug-in should be used to open and read the file.
Normal, “old-fashioned” HTML carries the
MIME type text/html. That’s
pretty basic; it’s simply text—hypertext, to be
specific—and more specifically, it’s HTML.
XHTML, properly
served, carries the MIME type
application/xhtml+xml. What’s the difference?
While HTML is merely text used as a markup
language, XHTML is an application of XML, the Extensible
Markup Language. (By the way, the parent language of HTML
is called SGML
[Standard Generalized Markup Language] and has been around since the
1980s.)
The problem with MIME types for Web pages is
that most of today’s browsers do not support
application/xhtml+xml. Among these “old-fashioned”
browsers is the popular Internet Explorer 6.0 for Windows. Our
statistics for October 2003 indicate that 70% of our visitors are
using this browser, and another 20% are using older versions of
Internet Explorer. Newly emerging technologies such as PDAs, mobile devices and
Internet-capable cell phones generally don’t support it either;
they support something closer to HTML 3.2.
Our recent solution to this problem was to serve XHTML
as application/xhtml+xml to browsers which will accept it
(currently Netscape 6/7 and Mozilla) and as text/html
to everyone else (Opera 6 and 7 will accept application/xhtml+xml,
but scripting is an issue with them; more on that later). This works, but to me
it is fundamentally wrong.
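A minimal sketch of this kind of negotiation, written in JavaScript for illustration. It is not the script actually used on this site. The function name chooseContentType and its two parameters are hypothetical, and it assumes the server hands over the raw Accept and User-Agent request headers. The Opera exception mirrors the choice described above.

```javascript
// Hypothetical helper: decide which MIME type to send with an XHTML page.
// acceptHeader is the raw HTTP Accept header, userAgent the raw User-Agent.
function chooseContentType(acceptHeader, userAgent) {
  const accept = (acceptHeader || "").toLowerCase();
  const agent = (userAgent || "").toLowerCase();

  // Look for an explicit application/xhtml+xml entry. A bare */* does not
  // count, because browsers that send only */* may not handle XML at all.
  const acceptsXhtml = accept.split(",").some(function (part) {
    const fields = part.split(";").map(function (s) { return s.trim(); });
    if (fields[0] !== "application/xhtml+xml") {
      return false;
    }
    // A quality value of q=0 means "explicitly not acceptable".
    const q = fields.find(function (f) { return f.indexOf("q=") === 0; });
    return !(q && parseFloat(q.slice(2)) === 0);
  });

  // Assumption taken from the text: Opera is sent text/html even though it
  // advertises XHTML support, because of the scripting issues.
  const isOpera = agent.indexOf("opera") !== -1;

  return acceptsXhtml && !isOpera ? "application/xhtml+xml" : "text/html";
}
```

A browser that lists application/xhtml+xml in its Accept header gets the XML MIME type; a browser that sends only image types and */* falls through to text/html.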
The same Web page cannot be of
two different MIME types any more than a
JPEG image
can be played as a video, or an ordinary donut can be a cream-filled
Bismarck, or a cup of black coffee can pass as a cappuccino.
Yet that is precisely what we are trying to do, if we serve
XHTML as text/html, the MIME
type that most (even modern) browsers can understand.
Some get around this by dynamically
changing the DOCTYPE and removing the extra slashes,
or by using content negotiation to serve a separate HTML
page to non-compliant browsers. The first approach requires that all
HTML files be saved with .php extensions, which for most
means changing all of the URIs—not a good
thing. The second approach means having to maintain two separate
pages, which can be a pain if you update frequently. It would also
divide your traffic between the two pages, which is not good for search
engine ratings. Neither is a tidy solution. I prefer the “one page
fits all” approach; after all, it is the World Wide Web,
right?
The Scripting Problem
Most websites make extensive use of client-side scripting,
particularly JavaScript. To complicate matters for prospective
XHTML users, JavaScript works differently in an
XML environment than it does with normal HTML.
Mark Pilgrim goes into some depth on this in his article
for xml.com. One major problem is that often-used methods like
document.write simply don’t work in
XML, so other methods must be used instead. I tried
this on our site, but it meant actually using both methods,
since the newer methods aren’t supported by many existing
browsers (including Opera 6; that’s why I don’t serve
application/xhtml+xml to Opera).
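As a rough sketch of what the “other methods” look like, the function below builds a paragraph with DOM calls instead of document.write. The name appendGreeting and its parameters are hypothetical and this is not the script used on this site. It assumes the caller passes in the document object and the element that should receive the new paragraph.

```javascript
// Hypothetical helper: add a paragraph of text without document.write.
// doc is the document object, parent the node that receives the paragraph.
function appendGreeting(doc, parent, text) {
  const XHTML_NS = "http://www.w3.org/1999/xhtml";

  // Namespace-aware creation is what an XML document expects; fall back to
  // plain createElement where createElementNS is not available.
  const paragraph = typeof doc.createElementNS === "function"
    ? doc.createElementNS(XHTML_NS, "p")
    : doc.createElement("p");

  // Text goes in as a text node, so nothing in it is parsed as markup.
  paragraph.appendChild(doc.createTextNode(text));
  parent.appendChild(paragraph);
  return paragraph;
}
```

In a page this might be called as appendGreeting(document, document.getElementById("greeting"), "Good morning!"), where the id greeting is likewise made up for the example. Browsers with no DOM support at all would still need the document.write version, which is the redundancy complained about above.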
Then another problem surfaced: with the newer methods, HTML entity codes for special text characters appeared as plain text, so I had to key the characters into the script directly. This meant using characters which are not allowed in HTML. They work fine on Windows, but I wonder what weird Latin characters are showing up on other platforms.
Update, January 31, 2004: I later learned the syntax for presenting the characters in XML. We are using the greeting script on our Home page now.
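The update above does not say which syntax was adopted, so the following is only an assumption about one common way around the problem: a JavaScript Unicode escape, which keeps the script file in plain ASCII while still producing the real character. The variable names are made up for the example.

```javascript
// An HTML entity inside a script string is just eight ordinary characters
// (& e a c u t e ;). Handed to createTextNode, it is displayed as written.
const withEntity = "caf&eacute;";

// A Unicode escape yields the actual character U+00E9 (e with acute accent),
// so the source file never contains a non-ASCII byte.
const withEscape = "caf\u00e9";

// The same character built from its code point, for comparison.
const fromCode = "caf" + String.fromCharCode(0xe9);
```

Here withEscape and fromCode hold the same four-character string, while withEntity still contains the literal ampersand sequence.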
Redundant Code
If there’s one thing I hate about designing websites,
it’s using redundant code. Specifically I am referring to
redundant name and id attributes in
anchor tags, and redundant style sheet linking methods.
Redundant name and id attributes
XHTML requires the id attribute
for anchor tags used as link targets. XHTML 1.1
does not allow the name attribute in anchors;
XHTML 1.0 allows both for backwards compatibility
with older browsers (such as Netscape 4) that don’t support
linking to id attributes. But regular HTML
does not require the id attribute, and everyone supports
name, so why use both?
Redundant style sheet linking methods
The <link> tag is used in HTML
for linking to external style sheets. Generally this works in
XHTML also, but the preferred method is to use the
<?xml-stylesheet?> processing instruction. To satisfy standards,
both methods should be used in
“backwards-compatible XHTML.” Hardly anyone
actually does this, and I consider it a waste of time and bandwidth.
The <link> tag works fine for everyone in regular
HTML.
What is “backwards-compatible XHTML,” anyway?
The only reason XHTML works on older browsers at all
is that they are forgiving of HTML errors. They
have to be, because 95% of all existing websites are written in
invalid HTML. This is mainly due to how the Web evolved;
browser manufacturers invented their own proprietary tags, and
now most people still use them. Examples of these include
<marquee>, <embed>,
<bgsound> and <font face…>.
Enough said.
The Bottom Line
The bottom line for me (and probably for most) is this: if you don’t need the extensibility of XML, XHTML is unnecessary. “What about forward compatibility?” you may ask. HTML is not going away any time soon. Too many expensive corporate websites are built with it. Remember when they said the United States would be entirely on the metric system by 1978? It’s now 2003, and everything is still measured in feet and inches here, because the public resisted the change so strongly. So I (and others) predict it will be with HTML.
Unfortunately, this means the 21st Anniversary Edition of The Oo Kingdom will be short-lived. It’s back to the drawing board, this time for a Greatly Simplified Version in HTML 4.01 Strict.