Silly Me

Posted January 29, 2022

For years I’ve been using HTML entity codes to “escape” special characters on this website: curly quotation marks and apostrophes, long dashes and other punctuation marks, letters with accents or other marks, and so on. In the early days of HTML, this was standard practice, as most sites used either ASCII or a Latin character encoding which didn’t include those characters. But in HTML5, the declared encoding is usually UTF-8, which allows many characters to be typed directly into the page source. I wasn’t fully aware of that when I converted this site to HTML5 in 2009, so I continued using the cumbersome entity codes to “escape” the characters. None of this affects how the text renders on the pages, but the codes are messy and hard to read in the source code, and they make the site somewhat harder to manage.

I am now in the process of switching out the entity codes on our Web pages for the real thing—that is, the actual characters. Look at the source code for this page. You won’t find a single “escaped” character in it except for this one (&), which is used because that ampersand character is the actual mechanism for doing the escapes: each character entity code begins with the ampersand and ends with a semicolon. In rare instances, you may find a code for the less-than symbol (<) which normally is used for starting an HTML tag, so it must be escaped as well. Oops, now there are two entity codes on this page! Silly me.

In all fairness to myself, I only recently learned that the document itself must be encoded in UTF-8 in order for this to work. Until recently, Windows Notepad had ANSI as its default encoding, so most of the special characters didn’t work in it. I would type them into the source code, save it as HTML, then be puzzled when weird boxes or question marks appeared on the Web page instead of the special characters. To solve the problem at that time, I “escaped” the characters by using the cumbersome entity codes.
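The mismatch behind those weird boxes can be reproduced in a few lines of Python. This is only a sketch under assumptions: it treats Notepad’s “ANSI” as the Windows-1252 code page (cp1252 in Python), which is the usual meaning on US and Western European systems but varies by locale, and the function name as_browser_sees is made up for illustration.

```python
def as_browser_sees(text: str, saved_as: str, declared: str = "utf-8") -> str:
    """Encode text the way the editor saved it, then decode it the way
    the page's declared encoding tells the browser to read it."""
    raw_bytes = text.encode(saved_as)
    # errors="replace" substitutes U+FFFD (the replacement character)
    # for bytes that are not valid in the declared encoding.
    return raw_bytes.decode(declared, errors="replace")


text = "“Silly me” café"

# Saved as ANSI (assumed cp1252) but declared as UTF-8: the special
# characters are lost, while the plain ASCII letters survive.
mismatched = as_browser_sees(text, saved_as="cp1252")

# Saved as UTF-8 and declared as UTF-8: everything survives.
matched = as_browser_sees(text, saved_as="utf-8")

print(mismatched)
print(matched)
```

The plain letters come through either way because ASCII characters have the same byte values in both encodings; only the characters beyond ASCII are affected.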

You may ask why I’m switching out the character codes for the actual characters, since either will work. My reasoning is twofold: 1) just to be consistent; and 2) to demonstrate to other people building websites that with UTF-8, many of those special characters don’t need to be “escaped” anymore.

At any rate, it’s a lot easier now for me to write a Web page, not having to use those cumbersome entity codes.