Technical level: Basic/Beginner || Date: 21st February 2004 || Author: Nigel Peck
This article has been published on SitePoint and
linked from the W3C.
Article Index
Introduction,
Hello World,
XHTML Building Blocks,
Text That Says Something,
Advanced XHTML Building Blocks,
Text That Says Something 2.
Advanced XHTML Building Blocks
Before we look at any more elements there are a few
more basic building blocks of XHTML that we need to
cover in order for you to understand the topics we
will examine. Hopefully you now have an understanding
of elements, start tags, end tags, the basic structure
of an XHTML document and the text elements we looked
at in the previous section.
In this section we will be looking at the topics
listed below. Don't worry if the topic titles look
a bit scary; they'll make sense when you get to them,
and the titles will make it easier to check back
later.
Character References and Entity References,
White Space and
Comments.
Character References and Entity References
Character references aren't as scary as they sound
(no need to sweat). Let's find out why they exist,
and then we can look at how you code them and use
them.
Take a look at your keyboard. Can you type a copyright
symbol © or an inverted exclamation mark ¡?
Unless you're using a pretty strange keyboard,
the answer is no.
Imagine you are a Web browser (User Agent) reading
a Web page file and you come across a left angle bracket
<. How do you know if it is the start of a tag
or an angle bracket used in the content of the document?
Answer: you don't.
The solution to these two problems? Entity references
and character references (funny, that's also the title
of this section).
Entity references and character references are extremely
similar in XHTML, and people often confuse the two
names. Basically they tell a Web browser (User Agent)
that it should insert a certain character in their
place.
If you don't know what a character is, it's a catch-all
word for a letter, number, punctuation mark and so on.
A is one character, AB is two characters, N!P 3 is
five characters (four? you forgot to count the space).
You get the idea.
A character reference or entity reference represents
one character in XHTML. Entity references can represent
more than one character in SGML or XML, but that's
another story that you don't need to worry about right
now.
The difference between a character reference and
an entity reference is this. Character references
use numbers while entity references use names. Let's
look at the copyright symbol we saw above. To insert
a copyright symbol into your document you would use
either of the following:
&copy;
Try the &copy; entity reference
&#169;
Try the &#169; character reference
If you try the examples above (and your Web browser
(User Agent) isn't broken) you will see a copyright
symbol for both examples. As I said before, the entity
reference uses names (copy), the character reference
uses numbers (169). Observant readers will notice
that the character reference also has a sharp symbol
#. Let's take a closer look.
An entity reference begins with an ampersand. This
is then followed by the name of the entity reference,
which is followed by a semi-colon, in much the same
way that you use a left angle bracket and a right angle
bracket to denote (delimit) the start and finish of
a tag.
Character references begin with an ampersand followed
by a sharp symbol. This is then followed by the number
of the character reference, which is again followed
by a semi-colon.
Whether you use an entity reference or a character
reference is up to you. I tend to use entity references
because I find names easier to remember than numbers
but the choice is yours. Just don't forget that you
need the sharp symbol with the character reference
and not with the entity reference.
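To make the two forms concrete, here is a small sketch using the inverted exclamation mark we saw earlier. It assumes the entity name iexcl and the character number 161, which are the Latin-1 name and number for that character. The sentence is made up purely for illustration. The lines that begin with <!-- are comments, which are explained later in this section.

```html
<!-- Entity reference: ampersand, then the name, then a semi-colon -->
<p>&iexcl;Hola! This line uses an entity reference.</p>

<!-- Character reference: ampersand, sharp symbol, number, semi-colon -->
<p>&#161;Hola! This line uses a character reference.</p>
```

Both paragraphs should begin with the same ¡ character when viewed in a Web browser (User Agent).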
I will be explaining some of the entity and character
references available to you in later sections, but
I will not be showing you all of them individually
as there are too many (approximately two hundred and
fifty). For your reference I have prepared three articles
detailing the three sets available to you. These are
at the following locations.
Latin-1 Character References
Special Character References
Symbol Character References
Not all of them work in all browsers so be sure to
test the ones you choose to use.
Ampersands and Left Angle Brackets
Although it is possible to enter ampersands &
and left angle brackets < with most keyboards,
you should always use an entity or character reference
when they appear in your content. This is for the
reason that I have already mentioned: a computer has
no way to tell an ampersand in your content from the
start of an entity or character reference, or a left
angle bracket in your content from the start of a tag.
Using character or entity references for those characters
avoids this problem.
The following code contains an ampersand
and a left angle bracket:
<p>Never use a < or an & directly in
your content.</p>
The above code is wrong and should be written in
one of the two following ways, firstly with entity
references and then with character references:
<p>Never use a &lt; or an &amp; directly
in your content.</p>
View example 2
<p>Never use a &#60; or an &#38; directly
in your content.</p>
View example 3
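As a further sketch, the same rule applies to everyday content such as a shop name or a simple comparison. The wording below is invented for illustration. The first paragraph uses the entity references amp and lt, and the second uses the equivalent character references 38 and 60.

```html
<!-- Entity references for the ampersand and the left angle bracket -->
<p>Fish &amp; chips is cheap here: the price &lt; 5 pounds.</p>

<!-- The same sentence written with character references -->
<p>Fish &#38; chips is cheap here: the price &#60; 5 pounds.</p>
```

Both paragraphs should display as "Fish & chips is cheap here: the price < 5 pounds."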
White Space
White space means any characters in your document
that do not serve any purpose other than creating
space. This includes spaces, tabs, line breaks and
zero width spaces. A line break is the character (or
two characters) at the end of each line that tells the computer
to start a new line. A zero width space is used to
separate words in languages such as Thai.
There are two issues relating to white space that
you need to be aware of.
White Space Between Words
No matter how much space you use between your words,
Web browsers (User Agents) will always reduce it to
a single space character. There is one exception to
this that we will cover in the next section. When
I say words I mean any characters that are not white
space and have no white space between them.
That might sound a bit complicated, but it's not;
it just sounds complicated when you try to describe
it. An example should help you to understand.
<p>This      content
has     a     lot
of    white       space
between        the
words.</p>
View example 4
If you view the above example in a visual Web browser
(User Agent) you will see that all of the content
is on a single line with a single space between each
word. That's all there is to it.
This feature comes in handy. It means that you can
use tabs, spaces and new lines to make your code easier
to read and not worry about your document looking
funny in a visual Web browser (User Agent).
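Here is a minimal sketch of that idea. It assumes the body and p elements from the earlier sections, and the text is made up for illustration. The indentation and line breaks exist only to make the code easier to read.

```html
<body>
    <!-- Indented with spaces and split over several lines -->
    <p>
        This paragraph is indented and spread over
        several lines to keep the code tidy.
    </p>
</body>
```

A visual Web browser (User Agent) should display the paragraph as ordinary running text with a single space between each word, because the extra spaces and line breaks are reduced as described above.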
Space Around Tags
You need to be careful about putting white space around
your tags. Once you get used to the following rule it will
become second nature.
If you want a space before or after a word that is
contained by an element you should put that space
outside the element. By this I mean before the start
tag and after the end tag. If you put it inside you
might not get any white space between your words.
<p>Always leave white space <strong>outside</strong>
your elements when you want it and not<strong>
inside </strong>.</p>
In the example above the strong element containing
the word outside has white space outside the tags,
which is the way it should be. The strong element
containing the word inside has white space inside
the tags and not outside. On some Web browsers (User
Agents) there may not be any space displayed between
the words not and inside.
I have not linked to an example for this because
most Web browsers will display the content without
problems, but they don't have to, so it's better to
get into the habit of doing it right.
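For comparison, here is one way the same sentence could be written so that both strong elements follow the rule. This is a sketch of the corrected form rather than a linked example. The line break before the second start tag counts as white space outside the element.

```html
<!-- White space sits outside both strong elements -->
<p>Always leave white space <strong>outside</strong>
your elements when you want it and not
<strong>inside</strong>.</p>
```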
Comments
When you are creating your documents you may want
to leave information for yourself or for others viewing
the document code but not viewing the document in
a Web browser (User Agent). To do this you use what
we call a comment. A comment has the following syntax:
<!-- comment text goes here -->
You should be careful not to use two dashes together
within your comments as this could be thought to be
the end of the comment (even without the right angle
bracket).
Here's an example:
<!-- This is the first Web page I ever created.
-->
<p>My first Web page.</p>
<!-- This is a comment
spread over two lines. -->
View example 5
As you will see if you view the above example, the
text in the comments is ignored. Comments are useful
for leaving yourself reminders for later such as what
still needs doing to a document.
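As a sketch of the reminder idea, the code below leaves a note about unfinished work. The wording of the notes is invented for illustration. Notice that neither comment contains two dashes together.

```html
<!-- To do: add a copyright notice to the bottom of this page. -->
<p>My first Web page.</p>

<!-- Avoid drawing a line of dashes to separate your notes.
     Use a different character such as = instead. -->
```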
Summary
In this section we have completed our look at the
basic building blocks of XHTML. We've seen how to
use special characters in our pages with character
references and entity references, we've looked at
the way white space is handled and we've also seen
how you can add comments to your code.
In the next section we're going to continue our coverage
of the elements you can use that relate to text including,
amongst others, headings, line breaks and pre-formatted
text.