XSL was developed by the W3C to deal with the styling and presentation of XML. Structured information may need to be rendered in different ways in different applications. For example, an employee record may be displayed differently in a Human Resources (HR) management system than in a contact directory on a staff intranet – not many employees would want their salary details or home address published on the intranet.
XSL consists of three parts:
XSLT (XSL Transformations) is the most important of the three and is used for transforming XML documents into other XML documents, HTML and, occasionally, plain text files.
XPath (XML Path Language) is an expression language for addressing parts of an XML document and is used in conjunction with XSLT to transform and render XML documents.
XSL-FO (XSL Formatting Objects) is a language for outputting XML documents in print form.
All three will be covered in more detail in the rest of this section.
XSLT is the technology that breathes life into XML-based applications or web publishing frameworks. We have talked about structured documents and structured information but this is useless if we can’t present information to users or other applications in a meaningful way. This is what XSLT enables us to do. The ability to separate presentation from content is one of the key benefits of XML, and XSLT makes this possible.
The fundamental concept behind XSLT is the definition of a stylesheet for processing XML documents. An XSLT processor is used to apply the stylesheet to an existing XML file and transform it into another XML file for presentation – this might be an XHTML file for publication on a website. This process is known as XSL Transformation and is shown in the following diagram:
Figure 2.1: XSL Transformation process for web publishing
Example XSLT Stylesheet
To transform Employee.xml into a web page for publication on a staff intranet, we might use the following XSLT stylesheet:
Example 2.1: Employee.xsl
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"
      doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
      doctype-system="DTD/xhtml-strict.dtd"/>
  <xsl:template match="/">
    <html>
      <head>
        <title>Contact Directory</title>
      </head>
      <body>
        <xsl:apply-templates select="Employee"/>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="Employee">
    <h1>Contact Directory</h1>
    <h2><xsl:value-of select="Name"/></h2>
    <h3><xsl:value-of select="JobTitle"/></h3>
    <p><xsl:value-of select="JobDescription"/></p>
    <ul>
      <li><xsl:value-of select="Email"/></li>
      <li><xsl:value-of select="Phone"/></li>
    </ul>
  </xsl:template>
</xsl:stylesheet>
XSLT Stylesheet Walkthrough
The XML declaration tells us that an XSLT stylesheet is itself an XML document. This is helpful for those already familiar with XML: there is no new syntax to learn.
The root element is the <xsl:stylesheet> element:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
There is also a namespace declaration indicated by the xmlns:xsl attribute. We will look at namespaces in more detail in section 5.4.3.3, XML Namespaces. For now it is enough to say that every element prefixed by xsl: is part of the XSLT language.
The <xsl:output> element indicates that we wish to produce an HTML document by specifying method="html".
<xsl:output method="html"...
Note: We would specify method="xml" if we wished to produce another XML file, or method="text" to produce a plain text file such as a Java program or a SQL query. The latter technique is often used in computer programming but is beyond the scope of this document.
Note: The doctype-public and doctype-system attributes of the xsl:output element ensure that the generated HTML has a Strict XHTML document type declaration. This is covered in XML on the World Wide Web.
Next we have our first <xsl:template>. XSLT stylesheets typically consist of a series of templates that match elements in the source document and process them in some way for output in the destination file.
<xsl:template match="/">
  <html>
    <head>
      <title>Contact Directory</title>
    </head>
    <body>
      <xsl:apply-templates select="Employee"/>
    </body>
  </html>
</xsl:template>
The match attribute contains an XPath expression. The value "/" matches the root node of the source document, so this is the first template to be applied. It writes the outline of the HTML page to the output:
<html>
  <head>
    <title>Contact Directory</title>
  </head>
  <body>
    ...
  </body>
</html>
Note: XPath will be covered in more detail in the next section.
Inside the <body> element we have an <xsl:apply-templates> element which runs any templates that match the Employee element.
<xsl:apply-templates select="Employee"/>
Processing now jumps to the second of the templates in our stylesheet:
<xsl:template match="Employee">
  <h1>Contact Directory</h1>
  <h2><xsl:value-of select="Name"/></h2>
  <h3><xsl:value-of select="JobTitle"/></h3>
  <p><xsl:value-of select="JobDescription"/></p>
  <ul>
    <li><xsl:value-of select="Email"/></li>
    <li><xsl:value-of select="Phone"/></li>
  </ul>
</xsl:template>
The XSLT processor looks inside the Employee element and generates content based on the elements contained within: Name, JobTitle, JobDescription and so on. The output HTML is populated with values from the source XML via the <xsl:value-of> element. Hence,
<h3><xsl:value-of select="JobTitle"/></h3>
becomes
<h3>Executive Officer</h3>
in the generated HTML file. When the processor finishes this template, it stops, as no further nodes in the source document have been selected for processing.
Note: In a typical XSLT stylesheet there would be more than two templates, and a template can itself contain <xsl:apply-templates>, so processing continues recursively down the source tree until every selected node has been handled. For this reason XSLT is usually described as a declarative, rule-based language, as opposed to procedural languages such as C or Visual Basic.
The final XHTML document will be similar to the following:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "DTD/xhtml-strict.dtd">
<html>
  <head>
    <title>Contact Directory</title>
  </head>
  <body>
    <h1>Contact Directory</h1>
    <h2>Lindsey Brown</h2>
    <h3>Executive Officer</h3>
    <p>Supervise a small team of staff and day to day
    management of the Accounts Payable section.</p>
    <ul>
      <li>lbrown@culture.gov.uk</li>
      <li>020 7421 3423</li>
    </ul>
  </body>
</html>
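To make the flow of data through the two templates concrete, here is a minimal Python sketch that imitates them by hand. It is not an XSLT processor and is not how a real processor is implemented: the function names are made up for illustration, the Employee.xml fragment is an assumed, cut-down reconstruction from the values quoted above, and the Pay element is a hypothetical name added only to show that data the template never selects does not reach the page.

```python
import xml.etree.ElementTree as ET
from html import escape

# Assumed, cut-down Employee.xml rebuilt from values quoted in this section.
# <Pay> is a hypothetical element name used only for illustration.
EMPLOYEE_XML = """<Employee id="91710">
  <Name>Lindsey Brown</Name>
  <JobTitle>Executive Officer</JobTitle>
  <JobDescription>Supervise a small team of staff.</JobDescription>
  <Email>lbrown@culture.gov.uk</Email>
  <Phone>020 7421 3423</Phone>
  <Pay>30000</Pay>
</Employee>"""


def value_of(context, select):
    """Rough stand-in for <xsl:value-of>: escaped text of the first match, or ''."""
    node = context.find(select)
    return "" if node is None else escape("".join(node.itertext()))


def employee_template(employee):
    """Hand-written counterpart of <xsl:template match="Employee">."""
    return (
        "<h1>Contact Directory</h1>"
        f"<h2>{value_of(employee, 'Name')}</h2>"
        f"<h3>{value_of(employee, 'JobTitle')}</h3>"
        f"<p>{value_of(employee, 'JobDescription')}</p>"
        "<ul>"
        f"<li>{value_of(employee, 'Email')}</li>"
        f"<li>{value_of(employee, 'Phone')}</li>"
        "</ul>"
    )


def root_template(document_element):
    """Counterpart of <xsl:template match="/">: page skeleton around the Employee output."""
    is_employee = document_element.tag == "Employee"
    body = employee_template(document_element) if is_employee else ""
    return (
        "<html><head><title>Contact Directory</title></head>"
        f"<body>{body}</body></html>"
    )


page = root_template(ET.fromstring(EMPLOYEE_XML))
print(page)
```

Because nothing in employee_template asks for Pay, the salary never appears in the output, which mirrors the way the stylesheet publishes only the fields suitable for the intranet.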
Associating Stylesheets with XML Documents
In order to see how the stylesheet renders Employee.xml, we need some way of linking it to the XML document. This can be done via a processing instruction: an instruction sent to the application reading an XML document. We use the xml-stylesheet processing instruction:
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet href="Employee.xsl" type="text/xsl"?>
<Employee id="91710">
...
Note: The processing instruction is inserted into Employee.xml after the XML declaration but before the root element and assumes that Employee.xsl is in the same directory.
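If the processing instruction has to be added to many files, it can be inserted programmatically rather than by hand. The following is a minimal sketch using Python's standard xml.dom.minidom module; the one-element Employee document is an assumed stand-in for the real Employee.xml.

```python
from xml.dom import minidom

# Assumed stand-in for Employee.xml; the real file has more content.
doc = minidom.parseString(
    '<Employee id="91710"><Name>Lindsey Brown</Name></Employee>'
)

# Build the xml-stylesheet processing instruction (target, then data).
pi = doc.createProcessingInstruction(
    "xml-stylesheet", 'href="Employee.xsl" type="text/xsl"'
)

# Place it before the root element, so it follows the XML declaration.
doc.insertBefore(pi, doc.documentElement)

linked = doc.toxml(encoding="utf-8").decode("utf-8")
print(linked)
```

The serialised document starts with the XML declaration, then the processing instruction, then the Employee element, which is the order described in the note above.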
This is how Employee.xml would now be rendered in Internet Explorer 6.0:
Figure 2.2: XML document rendered as an intranet page
The web page was generated by the browser's built-in XSLT processor.
Note: XSLT processors are covered in Appendix A, Tools and Processors.
The important point here is that we have taken a piece of employee data and presented it in a way that was suitable for the intended audience. The web user interface is completely independent of the underlying data model – true separation of presentation and content.
For more information on the xml-stylesheet processing instruction, see Associating Stylesheets with XML Documents www.w3.org/TR/xml-stylesheet/ [External website]
XPath is a non-XML expression language for addressing parts of an XML document. Its path syntax is very similar to the way we reference files in a file system. If we consider again Example 1.1 in section 5.4.1.1, the following shows how it may be represented as a tree diagram:
Figure 2.3: Tree representation of Employee.xml
To reference the root node of this tree we use the XPath expression / (forward slash) – similar to the root of a Unix file system.
Note: We talk about nodes rather than elements to specify a position in a tree. XPath location paths return node-sets, which are collections of nodes in the XML tree.
The root node contains everything in the XML document: the root element and any comments or processing instructions that appear outside it.
The expression /Employee/Address evaluates as follows:
<Address>
  <Line>Grenada House</Line>
  <Line>150 Beaconsfield Road</Line>
  <Town>London</Town>
  <Postcode>SW1V 1LQ</Postcode>
</Address>
The expression /Employee/Address/Line returns the following:
<Line>Grenada House</Line>
<Line>150 Beaconsfield Road</Line>
We may wish to refine this to return only the first line of the address. This is achieved by using a predicate to filter the nodeset further. For example, /Employee/Address/Line[position()=1] returns the first line of the address only:
<Line>Grenada House</Line>
There is a great deal more that can be done with XPath but its main use is in XSLT stylesheets as a way of referencing parts of a document that need to have templates applied to them. For more information, refer to the XPath specification at www.w3.org/TR/xpath [External website]
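The expressions above can be tried out without an XSLT processor. The sketch below uses Python's standard xml.etree.ElementTree module, which supports only a subset of XPath: paths are written relative to the root element (so /Employee/Address/Line becomes ./Address/Line), and the predicate has to use the short form [1] rather than [position()=1]. The Employee.xml fragment is an assumption, reconstructed from the address shown above.

```python
import xml.etree.ElementTree as ET

# Assumed fragment of Employee.xml, rebuilt from the results shown above.
EMPLOYEE_XML = """<Employee id="91710">
  <Name>Lindsey Brown</Name>
  <Address>
    <Line>Grenada House</Line>
    <Line>150 Beaconsfield Road</Line>
    <Town>London</Town>
    <Postcode>SW1V 1LQ</Postcode>
  </Address>
</Employee>"""

employee = ET.fromstring(EMPLOYEE_XML)  # the <Employee> root element

# /Employee/Address: the Address element and everything inside it
address = employee.find("./Address")
address_children = [child.tag for child in address]

# /Employee/Address/Line: both Line elements
lines = [line.text for line in employee.findall("./Address/Line")]

# /Employee/Address/Line[position()=1], in ElementTree's short form
first_line = [line.text for line in employee.findall("./Address/Line[1]")]

print(address_children)
print(lines)
print(first_line)
```

Each call returns a list of matching elements, which corresponds to the node-set that the full XPath expression would select.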
XSL-FO is a language for presenting XML documents in print form. It allows the author to control the way content is formatted on the page down to a minute level of detail. Similar to a word-processing package, XSL-FO allows you to control the layout of the page, the borders and margins, the fonts, the character spacing, the colours, the styles and so on. However, it is an XML vocabulary – not a graphical tool – and so all of this is specified through elements and attributes.
Note: An XML vocabulary is a set of element and attribute names associated with a particular XML-based markup language.
Formatting Objects (FO) files are rendered into the appropriate visual medium. This is most commonly PDF, but there are tools which can generate PostScript, plain text (TXT) or even Scalable Vector Graphics (SVG) from FO files. The normal process for rendering an XML document is to first transform it into an FO file using XSLT and then further process it into an output format such as PDF. This process is shown in the following diagram:
Figure 2.4: XSL Formatting Objects process for PDF rendering
Looking again at Employee.xml, let us assume we have transformed it into the following FO file using XSLT:
Example 2.2: Employee.fo
<?xml version="1.0" encoding="utf-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="simple"
        page-height="29.7cm"
        page-width="21cm"
        margin-top="1cm"
        margin-bottom="2cm"
        margin-left="2.5cm"
        margin-right="2.5cm">
      <fo:region-body margin-top="3cm" margin-bottom="1.5cm"/>
      <fo:region-before extent="3cm"/>
      <fo:region-after extent="1.5cm"/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="simple">
    <fo:flow flow-name="xsl-region-body">
      <fo:block font-size="18pt"
          font-family="sans-serif"
          line-height="24pt"
          space-after.optimum="15pt"
          background-color="black"
          color="white"
          text-align="center"
          padding-top="3pt">
        Employee Record - Lindsey Brown
      </fo:block>
      <fo:block>Name: Lindsey Brown</fo:block>
      <fo:block>Job Title: Executive Officer</fo:block>
      <fo:block>Department: Culture, Media and Sport</fo:block>
      <fo:block>Pay: £30,000</fo:block>
      <fo:block>Extension: 3423</fo:block>
      <fo:block>Email: lbrown@culture.gov.uk</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
This is possible because Employee.fo is itself an XML file and we can easily transform Employee.xml into Employee.fo using XSLT (described above).
Formatting Objects File Walkthrough
Firstly we have the root element and namespace declaration:
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
This is followed by the <fo:layout-master-set> which in turn contains (at least) one <fo:simple-page-master> which defines the layout of the page, the margins and the running headers and footers. The headers and footers are represented by region-before and region-after, respectively. The following diagram illustrates the regions of a page in XSL-FO:
Figure 2.5: Regions of a page in XSL-FO
Once we have defined our page layouts, we begin defining the content inside a <fo:page-sequence> element:
<fo:page-sequence master-reference="simple">
Note: The page-sequence refers to the master-name attribute of the simple-page-master element by setting master-reference="simple".
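A mistyped master-reference is an easy mistake to make, so it can be worth checking the link between the two attributes before rendering. The following is a minimal sketch in Python; the function name is made up for illustration, the sample FO document is a trimmed version of Example 2.2, and the check assumes that only fo:simple-page-master elements define page masters.

```python
import xml.etree.ElementTree as ET

FO = {"fo": "http://www.w3.org/1999/XSL/Format"}


def unresolved_master_references(fo_text):
    """Return master-reference values that name no fo:simple-page-master."""
    root = ET.fromstring(fo_text)
    masters = {
        master.get("master-name")
        for master in root.findall(
            "fo:layout-master-set/fo:simple-page-master", FO
        )
    }
    return [
        sequence.get("master-reference")
        for sequence in root.findall("fo:page-sequence", FO)
        if sequence.get("master-reference") not in masters
    ]


# Trimmed version of Employee.fo, kept just large enough for the check.
SAMPLE_FO = """<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="simple">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="simple">
    <fo:flow flow-name="xsl-region-body">
      <fo:block>Name: Lindsey Brown</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>"""

# A deliberately broken copy with the reference misspelt.
BROKEN_FO = SAMPLE_FO.replace(
    'master-reference="simple"', 'master-reference="simpel"'
)

print(unresolved_master_references(SAMPLE_FO))   # → []
print(unresolved_master_references(BROKEN_FO))   # → ['simpel']
```

An empty list means every page-sequence points at a page master that exists.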
Inside a page-sequence, content is directed at a specific region of the page, as shown in the diagram above. The main content goes in an <fo:flow> element; in this case the flow targets the body of the page:
<fo:flow flow-name="xsl-region-body">
Within a flow we can define a series of <fo:block> elements which contain the text and styling we wish to display.
<fo:block font-size="18pt"
    font-family="sans-serif"
    line-height="24pt"
    space-after.optimum="15pt"
    background-color="black"
    color="white"
    text-align="center"
    padding-top="3pt">
  Employee Record - Lindsey Brown
</fo:block>
The various attributes give fine control over the details of presentation. If we use an XSL-FO processor to render the output as a PDF, this is how it would appear in Adobe Acrobat Reader:
Figure 2.6: XML document rendered as a Human Resources (HR) file
Note: PDFs generated in this way may not be accessible to screen readers and other assistive technologies. For more information on creating accessible PDFs, please refer to Section 2.4, Building in Universal Accessibility.
This simple example demonstrates how a single XML document can be manipulated using XSL for different output channels. XSLT was used to create an intranet page (see Figure 2.2) and XSL-FO was used to create an HR file which serves a different purpose altogether. In this way, XML technologies can reduce duplication and deliver services based on the needs of the user.
For more information on XSL-FO processors, see Appendix A.
For more information on the XSL-FO specification, see the W3C XSL 1.0 Recommendation listed in the resources below.
For your assistance – resources:
W3C XSLT 1.0 Recommendation
www.w3.org/TR/xslt [External website]
W3C XPath 1.0 Recommendation
www.w3.org/TR/xpath [External website]
W3C XSL 1.0 Recommendation
www.w3.org/TR/xsl/ [External website]
W3C Associating Stylesheets with XML Documents 1.0 Recommendation
www.w3.org/TR/xml-stylesheet/ [External website]