Russ Weakley
15-Feb-03
How do you go about validating approximately 10,000
pages? The Australian Museum web team decided it was
possible, and worthwhile.
This article explains the steps we took, but
first, a little about validation:
What is valid code?
Validation is a process of checking your documents
against a formal standard, like those published by
the W3C. A document that has been checked and passed
is considered valid.
Why use valid code?
Valid code will render faster than code with errors
Valid code will render better than invalid code
Browsers are becoming more standards compliant, and
it is becoming increasingly necessary to write valid
and standards compliant HTML
Global changes
The first step we undertook was a wide range
of global changes across the entire site. This meant
that we were sometimes changing over 10,000 pages at a
time. Changes included:
Global change 1: Adding a Doctype
Many of our HTML pages had the old HTML 3.2 and
HTML 4.0 Doctypes. The HTML 4.0 Doctype threw errors
on the "name" attribute in image tags, which
is required for JavaScript image rollovers, so we
replaced this:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
With this:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
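The article does not say which tool was used for the global changes. As a minimal sketch, assuming each page has been read into a string, the Doctype swap could be scripted in Python like this (the names `NEW_DOCTYPE`, `DOCTYPE_RE` and `set_doctype` are hypothetical):

```python
import re

# Target Doctype: the HTML 4.01 Transitional declaration quoted above.
NEW_DOCTYPE = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"\n'
    '"http://www.w3.org/TR/html4/loose.dtd">'
)

# Matches any existing Doctype, whatever its version or line wrapping.
DOCTYPE_RE = re.compile(r"<!DOCTYPE[^>]*>", re.IGNORECASE)


def set_doctype(html: str) -> str:
    """Replace the first Doctype found, or prepend one if the page has none."""
    if DOCTYPE_RE.search(html):
        return DOCTYPE_RE.sub(lambda m: NEW_DOCTYPE, html, count=1)
    return NEW_DOCTYPE + "\n" + html
```

Running the function twice gives the same result as running it once, which makes it safer to re-apply across a large set of files.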
Global change 2: Adding Character encoding
It is very important that the character encoding of
any XML or (X)HTML document is clearly labelled. If
a user agent (e.g. a browser) is unable to detect the
character encoding used in a Web document, the user
may be presented with unreadable text. So, we added
the character set below to all files:
<meta http-equiv="content-type" content="text/html; charset=utf-8">
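One way this change might be automated is sketched below in Python. It assumes the page is held in a string and has a `<head>` element. The names `META`, `HEAD_RE`, `HAS_META_RE` and `add_charset` are hypothetical.

```python
import re

# The meta element quoted above.
META = '<meta http-equiv="content-type" content="text/html; charset=utf-8">'

# Opening <head> tag, with or without attributes.
HEAD_RE = re.compile(r"<head(?:\s[^>]*)?>", re.IGNORECASE)

# Detects an existing content-type declaration so it is not added twice.
HAS_META_RE = re.compile(
    r'<meta\b[^>]*http-equiv\s*=\s*["\']?content-type', re.IGNORECASE
)


def add_charset(html: str) -> str:
    """Insert the meta element directly after <head> unless one already exists."""
    if HAS_META_RE.search(html):
        return html
    return HEAD_RE.sub(lambda m: m.group(0) + "\n" + META, html, count=1)
```

Note that the label only helps if the files really are saved in the declared encoding. The sketch adds the declaration and does not convert the file contents.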
Global change 3: JavaScript Tag
Our <script> elements had no "type" attribute
and used the deprecated "language" attribute
instead. The type attribute specifies the scripting
language of the element's contents. So, we changed
this:
<script language="JavaScript">
To this:
<script type="text/javascript">
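A rough Python sketch of this replacement follows. It assumes every script on the site is JavaScript, so any "language" attribute can be dropped and a "type" added where missing. The function name `fix_script_tags` is hypothetical.

```python
import re

# Opening <script> tag; group 1 is the tag name, group 2 its attributes.
SCRIPT_RE = re.compile(r"(<script\b)([^>]*)>", re.IGNORECASE)

# A language attribute with a quoted or unquoted value.
LANGUAGE_RE = re.compile(
    r'''\s+language\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)''', re.IGNORECASE
)

TYPE_RE = re.compile(r"\btype\s*=", re.IGNORECASE)


def fix_script_tags(html: str) -> str:
    """Drop the language attribute and add type="text/javascript" if absent."""
    def repl(m):
        attrs = LANGUAGE_RE.sub("", m.group(2))
        if not TYPE_RE.search(attrs):
            # Assumption: all scripts are JavaScript.
            attrs = ' type="text/javascript"' + attrs
        return m.group(1) + attrs + ">"
    return SCRIPT_RE.sub(repl, html)
```

Other attributes such as "src" are left in place, and tags that already carry a "type" attribute are not given a second one.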
Global change 4: Invalid <body> attributes
We removed the invalid body attributes that are used
to force page content into the top left corner of
the page. These attributes:
leftmargin="0" topmargin="0"
marginwidth="0" marginheight="0"
were removed and replaced with an additional rule
in the CSS file:
body
{
padding: 0;
margin: 0;
}
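Removing the four attributes from every page could be scripted along the following lines. This is only a sketch, assuming the page is held in a string. The name `strip_body_margins` is hypothetical, and the CSS rule above still has to be added to the stylesheet separately.

```python
import re

# Opening <body> tag with whatever attributes it carries.
BODY_RE = re.compile(r"<body\b[^>]*>", re.IGNORECASE)

# The four invalid margin attributes, with quoted or unquoted values.
MARGIN_ATTR_RE = re.compile(
    r'''\s+(?:leftmargin|topmargin|marginwidth|marginheight)'''
    r'''\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)''',
    re.IGNORECASE,
)


def strip_body_margins(html: str) -> str:
    """Remove the margin attributes from <body>, leaving other attributes alone."""
    return BODY_RE.sub(lambda m: MARGIN_ATTR_RE.sub("", m.group(0)), html)
```

Restricting the attribute pattern to the inside of the `<body>` tag avoids touching elements such as frames, where "marginwidth" and "marginheight" are valid.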
Global change 5: Bold and Italic tags
Bold and italic tags are purely presentational (although
not formally deprecated in HTML 4.01), so we made global
changes across all of our sites and replaced this:
<b></b>
with this:
<strong></strong>
and this:
<i></i>
with this:
<em></em>
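As an illustration, the tag swap might look like this in Python. The sketch assumes every bold or italic tag on the site is meant as emphasis, which may not hold for all content. The names `TAG_MAP` and `replace_presentational` are hypothetical.

```python
import re

TAG_MAP = {"b": "strong", "i": "em"}

# Matches <b>, </b>, <i>, </i> (any case, optional attributes),
# but not longer names such as <br>, <body> or <img>.
TAG_RE = re.compile(r"<(/?)(b|i)(\s[^>]*)?>", re.IGNORECASE)


def replace_presentational(html: str) -> str:
    """Swap <b> for <strong> and <i> for <em>, keeping any attributes."""
    def repl(m):
        name = TAG_MAP[m.group(2).lower()]
        return "<" + m.group(1) + name + (m.group(3) or "") + ">"
    return TAG_RE.sub(repl, html)
```

A plain text search for "<b" would also hit `<br>` and `<body>`, which is why the pattern requires white space or ">" straight after the tag name.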
Section-by-section changes
There were many changes that could not be made globally.
These were generally done by downloading one section of
the site at a time and making smaller global changes
within it. Site-section changes included:
Section change 1: Image-based submit buttons
Many of our image-based submit buttons had width,
height and border attributes, which are not valid on
the input element. So, we removed all these attributes
from the submit buttons.
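A possible Python sketch of this clean-up follows. It assumes the buttons are written as `<input type="image" ...>` and that the page is held in a string. The name `clean_image_buttons` is hypothetical.

```python
import re

INPUT_RE = re.compile(r"<input\b[^>]*>", re.IGNORECASE)

# True when the input is an image button.
IS_IMAGE_RE = re.compile(r'\btype\s*=\s*["\']?image\b', re.IGNORECASE)

# width, height and border attributes with quoted or unquoted values.
SIZE_ATTR_RE = re.compile(
    r'''\s+(?:width|height|border)\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)''',
    re.IGNORECASE,
)


def clean_image_buttons(html: str) -> str:
    """Strip width, height and border from image-type inputs only."""
    def repl(m):
        tag = m.group(0)
        if IS_IMAGE_RE.search(tag):
            return SIZE_ATTR_RE.sub("", tag)
        return tag
    return INPUT_RE.sub(repl, html)
```

Ordinary `<img>` elements are untouched, since those attributes are allowed there under the Transitional Doctype.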
Section change 2: Invalid characters
As we often pulled content from MS Word into HTML editors,
we regularly found invalid characters that needed to be
replaced or removed (as well as hundreds of horrid
local styles). Some of the more common invalid characters
include:
† - replace with space
í - replace with ' (single quote mark)
& - replace with &amp; (only in non-HTML text)
ñ - replace with - (dash)
ë - replace with ' (single quote mark)
… - replace (three dots within single character) with ... (three dots)
‘ - replace with ' (single quote mark)
’ - replace with ' (single quote mark)
– - replace with - (dash)
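The replacements above can be sketched as a translation table in Python. Only the unambiguous entries are included here. Characters such as í, ñ and ë are legitimate letters in many languages, so replacing them blindly is assumed to be unsafe without checking the content first. The names `REPLACEMENTS` and `clean_word_characters` are hypothetical.

```python
# Word "smart" punctuation mapped to plain ASCII equivalents.
REPLACEMENTS = str.maketrans({
    "\u2026": "...",  # single-character ellipsis
    "\u2018": "'",    # left single quote mark
    "\u2019": "'",    # right single quote mark
    "\u2013": "-",    # en dash
})


def clean_word_characters(text: str) -> str:
    """Replace common Word punctuation with ASCII equivalents."""
    return text.translate(REPLACEMENTS)
```

The ampersand is left out deliberately: as noted in the list, it should only be escaped in text content, not inside existing entities or markup.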
Section change 3: CSS changes
Finally, there were many CSS files that needed
to be edited by hand. The changes included:
We added "background-color" whenever "color"
was specified and vice versa. Often this meant setting
the background colour to "transparent".
This is recommended by the W3C.
We added quotes around font names containing white space.
Without quote marks, white space before and after the
name is ignored and any run of white space inside it
is collapsed to a single space.
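These CSS edits were made by hand, but the font-name quoting could in principle be scripted. The Python sketch below is one hypothetical approach, assuming simple "font-family" declarations with comma-separated names and no "!important" flag. The name `quote_font_names` is hypothetical.

```python
import re

# Captures the property prefix and the value up to the next ";" or "}".
FONT_FAMILY_RE = re.compile(r"(font-family\s*:\s*)([^;}]+)", re.IGNORECASE)


def quote_font_names(css: str) -> str:
    """Wrap unquoted font names containing white space in double quotes."""
    def quote(name):
        name = name.strip()
        if re.search(r"\s", name) and name[0] not in "\"'":
            return '"' + name + '"'
        return name

    def repl(m):
        names = [quote(n) for n in m.group(2).split(",")]
        return m.group(1) + ", ".join(names)

    return FONT_FAMILY_RE.sub(repl, css)
```

Names that are already quoted, and single-word names such as generic families, are passed through unchanged.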
Final results
There were other minor adjustments we made throughout
our site during this process. However, it should be
mentioned that our files were reasonably close to
valid when we began. We always used quote marks around
attribute values and all images had "alt" attributes,
so our task was not as large as it could have been.
The bottom line is that making a large site 100%
valid can be done. Apart from a few sections that
we are still working on, every page on our site
currently validates.
How do you check if your code is valid?