XHTML
XHTML is a reformulation of HTML as an XML-compliant language, designed for maximum support across a large variety of clients, from regular web browsers to web-enabled devices such as cell phones, smart watches, pagers, vending machines, and PDAs. The core of XHTML contains only those elements from HTML which could be supported on any of those clients. This means that display- or memory-intensive features such as frames or JavaScript are not included in it, as a cell phone or smart watch would not have sufficient memory, processing power, or display space to support them properly.
Here is a summary of the rules you should follow to make sure that your web pages are valid XHTML:
Since XHTML is designed as an XML-compliant language, all XHTML tags must be opened and closed properly, and in the proper order. IMG, BR, and HR tags, which have no separate closing tag, are closed with a space and a slash just before the final bracket, as in <br />.
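A minimal sketch of the self-closing form, assuming a hypothetical image file named photo.gif (the file name and alt text are placeholders for illustration only):

```html
<!-- Empty elements close themselves with a space and a slash -->
<img src="photo.gif" alt="A sample photo" />
<br />
<hr />
```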
Note that ONLY tags which normally do NOT contain other elements should be closed in this manner - a P tag, for example, is designed to enclose paragraphs of text, and should always start with a <p> and end with a </p>. Thus, <p /> should NOT be used, even for an empty paragraph!
You must also take careful note of the space between the name of the tag and the slash which is used to close it - this space is very important, especially for compatibility with older web browsers, since without it, they would regard the slash as part of the name of the tag. Since there is no such thing as a BR/ tag, the browser would then ignore it, and the tag would not have its intended effect on your page.
In regular HTML, most browsers are forgiving of minor errors such as tags being closed out of proper order. As an XML-based language, however, XHTML is much more strict about this. This can be illustrated with a sequence of code in which an I tag is opened inside a B tag but closed only after the B tag has ended.
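One possible form of such a sequence, written as an illustration to match the description that follows (the exact placement of the tags within the sentence is an assumption):

```html
<!-- Incorrect: the i tag opens inside the b tag but closes outside of it -->
Here is some text which <b>becomes bold <i>and italicized</b> in the middle</i> of it.
```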
In the above example, in a regular HTML browser, you would see "Here is some text which becomes bold and italicized in the middle of it." However, since the I tag is opened inside of the area enclosed by the B tag, in XHTML (and HTML actually) it MUST also be closed inside of the B tag. If the text outside of the B tag is still supposed to be italicized, then a new I tag should be opened for it, like this:
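As an illustrative sketch (the placement of the tags within the sentence is assumed), a properly nested version could be written as:

```html
<!-- Correct: the first i tag closes inside the b tag, and a new i tag opens after it -->
Here is some text which <b>becomes bold <i>and italicized</i></b><i> in the middle</i> of it.
```

Every tag now closes inside the tag in which it was opened, so none of the "circles" cross.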
While this does make the file seem much longer and more complex at first, in reality it is simplifying the structure of the document, making it easier and faster for the computer to parse and determine what should be done with each bit of text. One way of making it easier to decide what tag should go where is to imagine each tag as a circle that has been drawn around whatever is inside of it - the circles can have other circles inside of them, but none of the lines of the circles can cross each other.
Since XHTML requires exact syntax, it is important to note that all tags in XHTML should be written in lower case letters. Thus, to create a bold tag, you would use <b>, and definitely not <B>. The reason for this is that in the ASCII character set, and the more modern Unicode system, each character converts to a number, and the numbers for capital and lower case letters are in fact different - a capital 'C' translates to the number 67, and a lower case 'c' translates to the number 99. Since the purpose of XHTML was in part to make it easier for clients to parse, forcing all of the characters to be in one case makes sense, as it saves the client from having to constantly make calculations to convert the case of letters back and forth to decide what tag is what.
This rule applies to the names of the tags themselves (such as the img tag), and the names of any attributes on those tags (such as the alt= and src= attributes which are required on all img tags). It does not, however, apply to the values of attributes, so a tag whose attribute values contain capital letters is just fine.
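For instance, a tag along these lines would be acceptable (the file name and alt text are hypothetical placeholders):

```html
<!-- Tag and attribute names are lower case; the attribute values may mix case -->
<img src="MyPhoto.GIF" alt="A Photo of My Dog" />
```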
In HTML 4, there were some attributes on tags which did not require a value. For example, in an image map, you could specify that an area tag did not have an href linked to it with the nohref attribute, which had no value, or that a checkbox in a form was checked with the checked attribute. In XHTML, you must ALWAYS provide a value for any attribute you use. To compensate for this, the values of attributes that had no value in HTML 4 are simply the name of the attribute itself, so nohref becomes nohref="nohref" and checked becomes checked="checked".
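A short sketch of this rule in use, with the HTML 4 form shown in a comment above each XHTML form (the coordinates, field names, and option text are made up for illustration):

```html
<!-- HTML 4 form: <area shape="rect" coords="0,0,50,50" nohref> -->
<area shape="rect" coords="0,0,50,50" alt="Inactive region" nohref="nohref" />

<!-- HTML 4 form: <input type="checkbox" name="subscribe" checked> -->
<input type="checkbox" name="subscribe" checked="checked" />

<!-- HTML 4 form: <option value="ca" selected> -->
<option value="ca" selected="selected">Canada</option>
```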
When specifying an attribute on a tag in XHTML, you must remember that all attribute values must be quoted, even if they are numbers. Thus, in the following example image tags, only the third tag is correct, while the first two are incorrect because they leave out the quotation marks - in the first case around everything, and in the second case just around the numbers, but in both cases wrong!
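Three tags of the kind described might look like this (the file name, alt text, and dimensions are assumed for illustration):

```html
<!-- Incorrect: no attribute values are quoted -->
<img src=logo.gif alt=Logo width=100 height=50 />

<!-- Incorrect: the numeric values are still unquoted -->
<img src="logo.gif" alt="Logo" width=100 height=50 />

<!-- Correct: every attribute value is quoted -->
<img src="logo.gif" alt="Logo" width="100" height="50" />
```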
Not only will this improve the readability of your code, it will also prevent browsers from having errors and not noticing your attributes. For example, in the incorrect tag given below:
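Going by the description that follows, the problem tag has no space between the closing quote of the alt value and the src attribute, roughly like this:

```html
<!-- Incorrect: no space between the alt value and the src attribute -->
<img alt="blahblahblah"src="file.gif" />
```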
There is a major problem - because the first attribute is the alt= attribute, some browsers will read that, and then consider everything after the = to be part of the alternate text for the image, and because there is no space after the closing quote, the alternate text will be assumed to be blahblahblah"src="file.gif, and the image file itself will not be loaded. The tag above will work fine if instead done like this:
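A corrected form, assuming the same alt text and file name, simply separates the two attributes with a space:

```html
<!-- Correct: a space separates the alt and src attributes -->
<img alt="blahblahblah" src="file.gif" />
```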
To formally let the browser know what kind of document it is about to receive, you should add two tags to the beginning of each page. The first is an XML tag, which declares that your document is XML compliant, and can also be used to define what character set is used in your document. The second is a !DOCTYPE tag, which declares the specific version of XHTML you are using, and points to the relevant DTD file for it. (The DTD file contains the specification for an XML language.) This can be useful, as most modern browsers have several rendering modes, depending on what type of document they are about to receive. In the default mode, used when the browser does not know what kind of document it is getting, it must first scan through the document and guess what kind of file it is (i.e. HTML 4, XHTML, or something else), and then, as all of the tags are loading, it must be able to deal with any quirks, errors, or generally bad things that it finds. All of this extra checking slows down the browser and makes your page show up more slowly, to the point that on an older system, large pages will load noticeably slower if they do not have the <?xml and DOCTYPE tags. Providing those tags tells the browser what to expect, and turns off most of the error checking/correcting features, as you are essentially telling the browser "This is what you will be getting, and there aren't any errors, so don't worry about them!".
IMPORTANT NOTE: The <?xml tag must be the absolute first thing in your page for it to work - you cannot place any XHTML comments, blank lines, or even spaces before it! The <!DOCTYPE tag must immediately follow the <?xml tag, with no other tags between them. For an example of this, use "view source" on any of the class website pages.
The <?xml tag provides two pieces of information to the browser - first, it informs the browser that it is about to receive a proper XML type file (i.e. one that follows all seven of these rules), and second, it tells the browser what encoding, or Character Set, to display your page in.
Character Sets can be very useful for providing international content in your web pages - since the standard character set for computers in North America does not provide for the letters used by some other languages, you can specify instead that you are using another, to allow your page to contain text in Greek, Hebrew, or even Chinese characters. Here are a few examples of the starting XML tag, along with an explanation of what character set they are describing:
| XML tag | Character set |
| --- | --- |
| <?xml version="1.0" encoding="iso-8859-1"?> | The North American (and Western European) standard character set. Use this if you don't need anything special |
| <?xml version="1.0" encoding="us-ascii"?> | The older, outdated North American standard character set. |
| <?xml version="1.0" encoding="iso-8859-7"?> | Greek |
| <?xml version="1.0" encoding="gb18030"?> | Chinese, Simplified |
| <?xml version="1.0" encoding="koi8-u"?> | Ukrainian |
A considerable number of additional character sets also exist which can be used - for a complete listing and explanation, you can go to http://www.iana.org/assignments/character-sets. Remember, however, that if you use a non-standard character set (i.e. anything but utf-8 or iso-8859-1) there is a chance that people will not be able to properly read your document, if their own computer has not been set up to view that type of text. The alternate character sets should also only be used if you have software which actually allows you to produce text in that format.
In some cases (such as with our class server), your web server can be set up to provide the character set information as part of the HTTP headers from the server, making the <?xml line's encoding unnecessary, assuming you wish to provide your files in the character set the server is set to. Our class server electron.cs.uwindsor.ca is set to automatically provide a character set of "ISO-8859-1" (which contains the entire "us-ascii" set, plus additional accented letters and symbols) - if you wish to provide text that needs a different character set (such as Chinese or Greek text) on a page from electron, you must have the encoding set inside of your page, or the browser will not be able to read your page properly.
The !DOCTYPE tag on your XHTML document declares which specific type of XHTML you are using in your file. You can find a complete list of all of the possible DOCTYPE tags at http://www.w3.org/QA/2002/04/valid-dtd-list.html, but for the sake of this course, you only need to know about two of them. The first, and most common one is this:
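For reference, the XHTML 1.0 Transitional declaration, as published in the W3C list mentioned above, reads as follows (the line break and indentation are only for readability):

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
```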
It declares that your page uses the XHTML 1.0 Transitional version of XHTML. (This is what your projects are expected to use.) The other type which you may end up needing applies only to an HTML file which contains the frameset tag - it declares that this particular page uses the variation of XHTML which allows for the use of frames, and it looks like this:
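For reference, the XHTML 1.0 Frameset declaration published by the W3C differs from the Transitional one only in the version name and the DTD file it points to (the line break and indentation are only for readability):

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
```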
In the case of both the XML and the !DOCTYPE tag, you should only place the one tag at the top of the file - there is no "closing" part to either of them, as they are both special XML tags.
A validator is a program which can look at your source code in a language, (XHTML, CSS, C, or anything else with a specific syntax) and evaluate it for correctness. An XHTML validator is available to check your code for errors at the URL http://validator.w3.org/. Note that for the purposes of your project in this course, your project will need to pass the validator check on this site for XHTML 1.0 Transitional.
The XML and !DOCTYPE tags described in section 2.7 will actually make the validator easier for you to use if you include them in your XHTML files from the beginning, as they tell the validator what sort of file it is supposed to be looking at. If your file has these tags already, validating your document involves simply typing the URL into the form, and clicking on the validate button, and it will take you to a page which lists off any errors in your page, and explains (as well as it is able) what is wrong about them. If your page lacks the XML and !DOCTYPE tags, you must go through a second form, where you also must select the character set used, as well as the Doctype.
One additional thing the XHTML validator has come to require recently is a set of attributes on the <html> tag specifying the language of the page. This means that every one of your HTML files will begin with the XML tag, the !DOCTYPE tag, and an <html> tag carrying these attributes.
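A minimal sketch of those opening lines, assuming an English-language page in the iso-8859-1 character set using XHTML 1.0 Transitional (adjust the encoding and the language code to suit your own page):

```html
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
```

Here xml:lang and lang both give the language of the page, while xmlns names the XHTML namespace. Only the opening lines are shown; the matching </html> still goes at the very end of the file.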
The XHTML standard was designed as a series of modules, containing different parts of the language, which were then grouped together as either core or optional modules. This means that to be XHTML compliant, a browser only needs to support the core XHTML tags, and if the device the browser is designed to run on is not capable of supporting one of the optional modules, then it is not required to do so. Tags for things like images, forms, tables, and JavaScript are all part of optional modules, so some browsers may not fully support them, and this should be kept in mind if you wish to design a page for the maximum number of viewers.
In some ways, the core of XHTML can be seen as less advanced than the older HTML standards, as it lacks many of the advanced features of HTML 4.0 in order to allow support on small clients. However, this smaller set of features makes the job of the web designer much simpler, as they need only create their pages in XHTML, and they will know for certain that they will display properly and entirely on any web access device. Nonetheless, despite not containing any of those features, XHTML does contain a method for integrating them into a web site, as it was designed in small sub-sections called modules - each module contains tags which can be used to accommodate a specific type of task. The core of XHTML, called XHTML Basic, is composed of 11 modules, which contain tags to handle tasks such as hyperlinking, displaying images, lists, and denoting text structures such as headings and paragraphs. A full description of the XHTML Basic standard can be found in the W3C's XHTML Basic specification, and a listing of the modules, and which tags they contain, is found in Section 3 of that page.
In addition to the modules in XHTML Basic, other modules can be added to support further features and tags, such as scripting and frames - these currently are featured in the two XHTML sub-languages XHTML Transitional and XHTML Frameset. Further information about XHTML and the XHTML standard can be found at http://www.w3.org/TR/xhtml1/.
The web is a continually evolving medium, and as technology progresses, the language web pages are written in needs to adapt to support this. Initially, there was some competition between XHTML 2 and HTML 5, though in the end the HTML 5 standard is the one that was chosen by the industry and standards bodies. While the XHTML 2 specification has not been further developed since 2006, HTML 5 is still under active development, and is planned to become a final recommendation some time in 2012.
It should be noted, however, that despite being selected as the next language for web page markup, HTML 5 is not yet a completed specification - this means that not only are some parts of it not yet determined, but also that some parts which are already outlined may be subject to change! While most of the current browsers have already implemented partial HTML 5 support for some of the expected features, the implementation of those features varies, sometimes only slightly, and sometimes greatly, from one browser to another. One example of such a tag is the canvas tag from HTML 5 (for drawing graphics in your page which can be controlled and animated in JavaScript, replacing the functionality of most Flash and Silverlight applets), which is supported by Firefox and Apple's Safari browsers, but not by Internet Explorer. Similarly, the HTML 5 video tag is implemented differently in all major browsers, supporting only the video formats favoured by each particular browser developer rather than a common standard format, since the specification does not yet give a required format to be supported.