July 2005

Making the HTML 4 to XHTML 1 Leap

Recently, this site went through a validation pass over the documents hosted here. Validation is done periodically (once or twice a year) to make sure as many documents as possible conform to a W3C specification.

Use Cascading Style Sheets . . . NOW

Many writers who use the online medium have, in the past at least, completely and totally (brutally) abused how markup works versus how different rendering engines handle it. The advent (unmitigated disaster?) of inline formatting was the first step in the wrong direction for markup languages, for a lot of reasons - but the number one reason inline formatting can be a problem is quite simple:

It breaks scalability

It really does not matter who you are; once you read that phrase, you know you are in for a world of hurt.

Not only do Cascading Style Sheets give the author an immense amount of power over formatting controls, but they make scalability simpler. In a programmer's terms, they are the header file that goes everywhere - wouldn't we all love that? :)

Once CSS is incorporated and working, all inline formatting has to be removed, and that means all of it, including presentational attributes on the body tag and so on. Luckily, the site already used CSS and was HTML 4 compliant.
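
As a rough sketch of what that removal looks like (the file name site.css, the colors and the font are all made up for illustration and are not taken from this site):

```html
<!-- Before: HTML 4 Transitional, formatting repeated inline on every page -->
<body bgcolor="#ffffff" text="#000000">
<p><font face="Verdana" size="2">Some text</font></p>
</body>

<!-- After: the markup carries structure only; the look lives in one file -->
<head>
  <title>Example page</title>
  <link rel="stylesheet" type="text/css" href="site.css" />
</head>
<body>
<p>Some text</p>
</body>

<!-- site.css (hypothetical), shared by every page:
     body { background-color: #ffffff; color: #000000; }
     p    { font-family: Verdana, sans-serif; font-size: small; }
-->
```

Changing the font for the whole site then means editing one rule in one file instead of touching every page, which is the scalability point made above.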

Basic Formatting Hurdles

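The first step that was taken here was to remove markup known to cause trouble in the move to XHTML - tables for starters. The site has few tables - for a reason - they are friendly neither to browsers nor to validators. (Tables are legal in every XHTML 1.0 DTD; it is the unclosed rows and cells and the presentational attributes that tend to fail.) The next one was to remove the old hr tags. They are not needed at the site (scan the page, you will see why), and although hr is still legal in XHTML 1.0 it has to be written as <hr />, and its presentational attributes are rejected by the strict DTD.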

Next came the real stuff that, well, just was not foreseen. Following is a quick list of violations found:

  • Use <br /> not <br>
  • Close image and meta tags with /> not just > (a table needs a real closing </table> tag)
  • Your img tag had better have an alt attribute.
  • Form field items either need a closing tag or a />.
  • Close all other tags . . . period.

So what does all of that mean? Basically, every element has to be closed. Empty elements such as br or img either get a closing tag per the XML standard or the XHTML /> shorthand, which marks the tag as an empty (short) one.
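
A small before-and-after fragment may make the list concrete. The file name and field name below are placeholders, not markup from this site:

```html
<!-- HTML 4 habits that fail XHTML validation -->
<p>First line<br>
second line
<img src="logo.png">
<input type="text" name="q">

<!-- XHTML 1.0 equivalents: everything closed, img given an alt attribute -->
<p>First line<br />
second line</p>
<p><img src="logo.png" alt="Site logo" /></p>
<p><input type="text" name="q" /></p>
```

Writing <br></br> is also well-formed XML, but the compatibility guidelines in the XHTML 1.0 specification recommend the <br /> form (with the space before the slash) because older HTML browsers handle it better.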

What About Strict

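Strict can be achieved, however that was not the goal since there is a certain amount of breakage involved. For starters, the strict DTD drops the presentational elements and attributes that transitional still tolerates (font, center, the target attribute and so on). Ordinary forms are still allowed in XHTML 1.0 Strict; it is the XHTML 2.0 drafts that replace them with XForms, a technology that has not yet matured. Actually, it has been the experience of the author that using transitional is the only way to keep sanity unless one is dealing with a very limited number of raw data types.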

Admitted Failures

The site has problems with formatting. The best example is the overuse and abuse of the br tag. That, in and of itself, is likely the worst offender. It also - still - uses tables (allowed even in the strict DTD, though best kept to tabular data), not to mention some hackneyed spacing tricks such as the old classic <p>&nbsp;</p> - there are others to be sure, but like the text says, go not gently.. which is probably your best bet too. In addition to all of that, a lot of bold, italic, quotes and other holdovers (which will likely be supported forever) are in these docs as well.
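
For what it is worth, the spacer paragraph is the easiest of these to retire. One possible approach, sketched with a made-up class name and margin value rather than anything this site actually uses:

```html
<!-- In the head: one rule takes the place of every spacer paragraph -->
<style type="text/css">
  div.section { margin-bottom: 2em; }
</style>

<!-- In the body: spacing comes from the class, not from empty markup -->
<div class="section">
  <p>First block of text.</p>
</div>
<div class="section">
  <p>Second block of text.</p>
</div>
```

The same idea, margins or padding on a block, usually covers most stacked br tags as well.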

Surviving the Validation

To say that the site's author was pushed to the edge of his patience is like saying Darth Vader occasionally lost his temper. The first thing to do is to start small: find the smallest corner of a site and change the DOCTYPE declaration and html tag to the following:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

Next, redo all short tags such as (but not limited to) the following:

br
img
hr
input
meta (lowercase, XHTML is case sensitive)

Close all other tags, each and every one - no exceptions.
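
Put together, a small test page along these lines should pass as XHTML 1.0 Transitional. It is only a sketch: the title, image name, form action and charset are placeholder assumptions, so substitute whatever the real page uses.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <title>Validation test page</title>
</head>
<body>
  <p>A line of text<br />
  followed by another.</p>
  <hr />
  <p><img src="example.png" alt="Placeholder image" /></p>
  <form action="search.cgi" method="get">
    <p><input type="text" name="q" /> <input type="submit" value="Search" /></p>
  </form>
</body>
</html>
```

Each of the short tags from the list appears once, in its closed form, so the page doubles as a quick reference while converting the rest of a site.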

Finally, pay a visit to the W3C HTML Validator. Yes, copy and paste is your friend (thanks to X mouse properties...). If something in the validator output does not make sense, try out the advanced options (these should be selectable in the menu above the results if a page fails), such as showing the output. The validator's line numbers and the line numbers in your file will not always match, so using its output listing will go a long way in tracking down the problem area.

Why Validate?

The best, saved for last. Why bother? What's the use? No one else does. Yes they do; actually a great many web authors pride themselves on their validation - although being validated is not, in and of itself, the makings of a good author. This site has neither the XHTML 1 nor the CSS validation images or links to go with it [1]. There are several good reasons to get validated other than saying you are.

Browser Compatibility

Believe it or not, a validated site is compatible with most web browsers - as shocking as that may sound. If CSS is being employed to its fullest, then the site - generally - looks the same or at least the layout is comparable. That said, it is still a good idea to break CSS every now and again to see how the site renders. A real cheap way to break CSS is to use Firefox or Mozilla and disable stylesheets (View > Page Style > No Style in Firefox).

Readability

Clients that are not visually based, such as aural ones, tend to rely on standards to the hilt. By using standards you give disabled readers an edge when they come to your site. Yes, there is more you can do, such as aural stylesheets, but the least amount of work that makes your site usable for the disabled is to comply with the lowest contemporary standard of the time.

Bug Fixing and Eye Opening

Earlier in the article it was mentioned that the site still has a lot of problems, even within the scope of XHTML 1 Transitional. Those faults most likely never would have bubbled to the surface unless a validation check had been run. In addition, several syntax errors were discovered.

Summary

Validation of anything, whether it be POSIX code or environmental sensors, is an essential part of the computing profession, and of many others. Making web pages validate as XHTML is a step in the right direction for any online author. It makes the material scalable, importable and frankly - more useful.

Footnotes

  1. The author keeps a private validation index.

 
