The main idea is to give a quick example of how an XML file relates to its XSLT files, its DTD and its XML schema. Following the introduction, short programs to access an XML file from a Perl CGI script and a Java stand-alone program are presented.
Element | Contents |
---|---|
name | name of the program |
description | description of its purpose |
language | language in which it is written |
url | URL for more information or to run the application |
date_of_origin | date it was written |
You can see a larger version of this file at
For example, when used to translate XML to HTML for display on the Web, an XSL "template" will include HTML statements along with embedded XSLT commands for selecting and displaying data from the XML file that invokes it.
Here is an example that is referenced by the XML file above.
Program | Language | Description |
---|---|---|
Eliminate the cruft and this file reduces to a for loop embedded in HTML. Here are the major loop statements:
For example, the XSLT statement:
Note that XSLT templates identify XML elements using XPath, which is a separate W3C standard. In the example above the XML structure is so simple that the power of XPath is not well demonstrated.
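To make the "for loop embedded in HTML" idea concrete, here is a rough analogue written in Python rather than XSLT. It is only a sketch of what a looping template does: visit each record and emit one HTML table row. The element names (`list-of-programs`, `applications`, `name`, `language`, `description`) are taken from the descriptions in this document, and the sample data is invented for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical sample data; element names follow this document's descriptions.
SAMPLE_XML = """<list-of-programs>
  <applications>
    <name>Hello</name>
    <language>Perl</language>
    <description>Prints a greeting</description>
  </applications>
  <applications>
    <name>Counter</name>
    <language>Java</language>
    <description>Counts page hits</description>
  </applications>
</list-of-programs>"""


def render_table(xml_text):
    """Emit one HTML table row per <applications> element."""
    root = ET.fromstring(xml_text)
    rows = [
        "<table>",
        "<tr><th>Program</th><th>Language</th><th>Description</th></tr>",
    ]
    # The loop: one pass per record, selecting three child values each time.
    for app in root.findall("applications"):
        cells = [app.findtext(tag, default="")
                 for tag in ("name", "language", "description")]
        rows.append(
            "<tr>" + "".join(f"<td>{escape(c)}</td>" for c in cells) + "</tr>"
        )
    rows.append("</table>")
    return "\n".join(rows)


print(render_table(SAMPLE_XML))
```

In an XSLT template the loop and the value selection are written as markup inside the HTML, whereas here the HTML is embedded in the program. The structure is otherwise the same.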
The language may seem somewhat awkward unless you are used to writing ASP/PHP/ColdFusion scripts, but it also seems reasonably powerful. At least one author (Erik Ray) describes XSLT as powerful enough to do 90% of what users are likely to want, while the rest is either impossible or tortuously difficult.
Overall, the XSLT approach seems to offer several general advantages. First, it further separates a document's content from its format, allowing multiple agents to work with the same data in the same format and obviating the need to keep or export data in multiple formats.
Second, it moves some processing from the server to the client. This may be an advantage in some data delivery situations where, for example, high hit rates severely overload a server. On the other hand, large data files will probably not be processed efficiently using this approach.
Third, it "democratizes" database access. Site developers can build database-driven pages even if they have no server to work with.
Note that there are additional XSL commands. For example, xsl:if is a Boolean conditional; xsl:choose is a switch construct that relies on xsl:when clauses; and xsl:variable gives some ability to construct variables (though they have limited functionality). There are many more features.
Note also that the commands described earlier have additional functionality. For example, select takes Boolean conditionals that can incorporate a broad set of functions and give detailed control over element selection. These conditionals are defined within XPath.
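As a small illustration of a Boolean conditional in a selection, the sketch below uses Python's standard `xml.etree.ElementTree`, which implements only a limited subset of XPath. It is not XSLT, and the element names and data are assumptions made up for the example, but the bracketed predicate has the same form that XPath uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; element names are assumed from this document.
SAMPLE_XML = """<list-of-programs>
  <applications><name>Hello</name><language>Perl</language></applications>
  <applications><name>Counter</name><language>Java</language></applications>
  <applications><name>Mailer</name><language>Perl</language></applications>
</list-of-programs>"""

root = ET.fromstring(SAMPLE_XML)

# The predicate in square brackets keeps only those <applications> elements
# that have a <language> child whose text is exactly "Perl".
perl_apps = root.findall("applications[language='Perl']")
perl_names = [app.findtext("name") for app in perl_apps]

print(perl_names)
```

A full XPath implementation accepts much richer conditions (functions, comparisons, positions) than this subset does.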
You can see this file at
Here is the DTD for the XML document above:
It is simple and to the point, at least once you get used to reading DTDs. This one defines the "list-of-programs" element in two ways. First, it defines list-of-programs as a collection of one or more "applications" elements (as indicated by the "+" sign). Then it defines the attributes that can appear within the list-of-programs element declaration. In this case the list-of-programs element may contain one attribute, "xmlns:HTML" (an XML namespace declaration), to which a value may be assigned. In this version a "fixed" value is assigned within the DTD. (As of this writing #IMPLIED did not work in my usual browser, so I was forced to use #FIXED.)
Next, the "applications" element is defined as a collection of up to five elements, as listed earlier in this document.
You can see this file at
Enter "schemas", or DTDs on steroids. Here is one for the XML document above:
As of this writing (2002) browsers are not using schemas to validate XML documents. In fact, it is difficult to find a free tool or web site that will validate against schemas. (They will come.)
Note also that the schema is itself defined in XML. (Hmmm...it would be good to include the schema schema here.)
In particular, one might want to use some CGI script to deliver a document that would be tortuously difficult to deliver using XSLT.
There seem to be roughly three approaches to accessing XML from programming languages:
The Document Object Model (DOM) approach copies a complete XML file into an in-memory data structure (in Perl, a Perl data structure). XML::Simple appears to belong to this class, and XML::LibXML is another example.
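The following sketch shows the DOM style using Python's standard `xml.dom.minidom` rather than a Perl module. The element names and sample data are assumptions for illustration, and `text_of` is a hypothetical helper. The point is that the whole document is loaded first and can then be visited in any order.

```python
from xml.dom.minidom import parseString

# Hypothetical sample data; element names are assumed from this document.
SAMPLE_XML = """<list-of-programs>
  <applications><name>Hello</name><language>Perl</language></applications>
  <applications><name>Counter</name><language>Java</language></applications>
</list-of-programs>"""


def text_of(parent, tag):
    """Return the text of the first <tag> descendant of parent, or ''."""
    nodes = parent.getElementsByTagName(tag)
    if not nodes or nodes[0].firstChild is None:
        return ""
    return nodes[0].firstChild.data


# The entire document is parsed into a tree held in memory.
doc = parseString(SAMPLE_XML)
apps = doc.getElementsByTagName("applications")

# Because the tree is complete, records can be read in any order.
names = [text_of(app, "name") for app in apps]
last_language = text_of(apps[-1], "language")

print(names, last_language)
```

The cost of this convenience is memory: the tree grows with the size of the file.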
The Simple API for XML (SAX) is an event-oriented approach where programmers define routines for handling each element, etc. as it arrives within an XML stream.
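A minimal sketch of the event-oriented style, using Python's standard `xml.sax` module in place of a Perl SAX package. The handler class `NameCollector`, the element names, and the sample data are all assumptions made for illustration. The parser calls the handler's methods as each element starts, delivers text, and ends.

```python
import xml.sax

# Hypothetical sample data; element names are assumed from this document.
SAMPLE_XML = """<list-of-programs>
  <applications><name>Hello</name><language>Perl</language></applications>
  <applications><name>Counter</name><language>Java</language></applications>
</list-of-programs>"""


class NameCollector(xml.sax.ContentHandler):
    """Collect the text of every <name> element as events arrive."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_name = False
        self._buffer = []

    def startElement(self, tag, attrs):
        if tag == "name":
            self._in_name = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_name:
            self._buffer.append(content)

    def endElement(self, tag):
        if tag == "name":
            self.names.append("".join(self._buffer))
            self._in_name = False


handler = NameCollector()
xml.sax.parseString(SAMPLE_XML.encode("utf-8"), handler)
print(handler.names)
```

Nothing is kept in memory except what the handler chooses to store, which is why this style suits large files or streams.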
The third category collects eccentric approaches. The Perl RAX and PYX packages appear to fit in such a category. RAX deals with XML files meant to be used like record-oriented relational databases. With RAX, you simply set up a while loop to read each "record"; RAX parses each record and returns the values of the requested elements as they come through the input stream. PYX converts XML files to a simple text stream that can be handled by Unix (and possibly Windows) filters.
The granddaddy of approaches to reading XML in Perl is XML::Parser::Expat, a low-level interface to the expat C library that underlies many other Perl packages, such as XML::Parser, which can deliver element streams and/or document objects. XML::Parser is used under the covers by XML::Simple to implement its DOM approach.
Here is an example using the XML::Simple module to access the XML example above. It uses XMLin to read the XML file and build an internal hash called "$programs", containing all the data in the file. The program then prints an HTML table containing only program names and descriptions (ignoring other information in the file):
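The record-at-a-time idea can be sketched in Python with the standard library's `ElementTree.iterparse`. This is not RAX and does not reproduce its API. The function `read_records`, the element names, and the sample data are hypothetical, chosen only to show a loop that receives one "record" at a time with the requested fields already extracted.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data; element names are assumed from this document.
SAMPLE_XML = """<list-of-programs>
  <applications><name>Hello</name><language>Perl</language></applications>
  <applications><name>Counter</name><language>Java</language></applications>
</list-of-programs>"""


def read_records(stream, record_tag, fields):
    """Yield one dict per record element, holding only the requested fields."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == record_tag:
            yield {field: elem.findtext(field, default="") for field in fields}
            # Drop the record's children once it has been consumed.
            elem.clear()


stream = io.BytesIO(SAMPLE_XML.encode("utf-8"))
records = list(read_records(stream, "applications", ("name", "language")))

for record in records:
    print(record["name"], record["language"])
```

The caller sees a flat sequence of records, much as it would when reading rows from a database table.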
Program | Description |
---|---|
$program->{'app_name'}->[0] | \n"; print "$program->{'description'}->[0] | \n"; print "
\n";
# if ( $max > 0 )
# {
# for ($item=0; $item < $max; $item=$item+1)
# {
# print "<a href=\"$program->{'url'}->[$item]\">
# $program->{'url'}->[$item]</a>\n";
# if ( $item < ($max - 1) )
# {
# print " ";
# }
# }
# }
# print "\n | \n";
# print "\n