3.2. Paperboy and XSLT

The Paperboy base program does no actual manipulation of XML itself (sorry, we cheated :). Instead, Paperboy uses XSLT, through the libxslt library, to transform XML into another form. If you want the greatest power and flexibility from Paperboy, you'll need to know XSLT. If you're less interested in being a power user, there are templates available online that you can use, though they may not do precisely what you would like. If you don't know XSLT, don't worry: it is pretty easy to learn (I, Kris, did not know much when I started the project). I learned XSLT from W3Schools. Wren Argetlahm (a fellow Paperboy team member) also developed a few tutorials that will prove quite helpful (and are great for hacking up). For the sake of keeping things simple, let's look at tutorial1.xsl:


<?xml version="1.0" encoding="UTF-8" ?>
<!--
	An example RSS to XHTML template for Paperboy
		Copyright (C) 2005 wren argetlahm
	
 *	This program is free software; you can redistribute it and/or modify
 *  it under the terms of the GNU General Public License as published by
 *  the Free Software Foundation; either version 2 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU General Public License for more details.
 *
 *  You should have received a copy of the GNU General Public License
 *  along with this program; if not, write to the Free Software
 *  Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

This template is a simple example of what an XSLT template for Paperboy should look like. It is
provided more as a tutorial on how to build an XSLT stylesheet than as a production model (though
it should suffice as one).

As it stands, this template will work fine for RSS 0.91. It will probably work for other RSS
0.9x versions and 2.0 as well, though it's untested with these and won't use many of their features.
It will *not* work for RDF feeds; that support will come later (in a more complex example stylesheet).

Some prime examples of feeds to test this on are:
	http://collab.freegeek.org/~wren/rss/updates.rss (for a simple RSS 0.91 feed)
	http://collab.freegeek.org/~wren/rss/blog.rss (for a somewhat Unicode-heavy RSS 0.91 feed)
-->

<xsl:stylesheet version="1.0"
		xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
		xmlns="http://www.w3.org/1999/xhtml"
		xmlns:paperboy="http://sourceforge.net/projects/paperboy/"
		extension-element-prefixes="paperboy">
	<!--
	Note first the <?xml?> declaration at the very top of the page. It is there because an XSLT
	stylesheet is itself an XML document. After the <?xml?> declaration we need the root element
	<xsl:stylesheet>, which should have a version number (the version of XSLT, not the version of
	your document) and the xsl namespace defined as above.
	
	We have the default namespace defined as XHTML here because that's what we want the output to be.
	If you're working on a template to produce output other than (X)HTML, you may want to look at
	some of the other examples in this directory. This one will teach you the basics of XSLT, but
	may not answer the specific questions you have.
	
	Finally, if we want to use the XSLT functions that Paperboy registers, we need to bind a
	namespace prefix to Paperboy's URI (here we're using "paperboy:"), and we need to list that
	prefix in extension-element-prefixes.
	
	If you're looking for a basic primer on XSLT, check out <http://www.w3schools.com/xsl/>.
	That site is an excellent reference tool, and I may assume some knowledge from it in this
	tutorial.
	-->
	
	<xsl:output method="xml" version="1.0" encoding="UTF-8" omit-xml-declaration="no" indent="no"
	doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
	doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" />
	<!--
	The <xsl:output/> tag describes how to write out the result of the XSL
	transformation. Here we're saying: make it XML (XHTML 1.0 Transitional, to be exact); the output
	should be Unicode(tm) UTF-8 (just like this page); don't omit the <?xml?> declaration (as XML
	documents should have one); and don't do any indenting or other pretty formatting with whitespace.
	That last one just saves bandwidth, since turning indentation off will frequently reduce file
	sizes by 25-40%; we won't be reading the output by hand anyway, so who cares if it's
	pretty.
	-->
	
	<xsl:template match="/">
		<!--
		Inside the <xsl:stylesheet> you'll have one or more <xsl:template> elements, one
		of which should match "/", the root node of the source document. Here I'm telling it to apply
		the other templates to the children of the selected node (i.e. the root); this is so that joined
		feeds still work properly.
		-->
		<xsl:apply-templates />
	</xsl:template>
	
	<xsl:template match="rss">
		<!--
		Here's another <xsl:template>, this one matching the <rss> element of RSS feeds. Below is
		just some standard XHTML header stuff.
		-->
		<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US" xml:lang="en-US">
			<head>
				<title>Paperboy : <xsl:value-of select="channel/title" /></title>
				<link rel="stylesheet" type="text/css" href="my_stylesheet.css" />
				<!--
				If you want to use CSS, include an element like the one above, but replace
				`my_stylesheet.css` with the URL of the CSS file you want to use. It's best to keep the
				CSS in an external file. This keeps the template clean and lets multiple generated pages
				all link to the same CSS, which minimizes maintenance. It also saves bandwidth on
				repeated views, because browsers cache the CSS file rather than fetching it over and
				over.
				-->
			</head>
			<body>
				<a>
					<xsl:attribute name="href">
						<xsl:value-of select="channel/image/link" />
					</xsl:attribute>
					<img>
						<xsl:attribute name="src">
							<xsl:value-of select="channel/image/url" />
						</xsl:attribute>
						<xsl:attribute name="alt">
							<xsl:value-of select="channel/image/title" />
						</xsl:attribute>
					</img>
				</a>
				<!--
				To get information from the original file and print it in the output you'll usually
				use an <xsl:value-of/> element. If you want to use the information for
				attributes in your output file (such as taking the <link> element and making
				an actual link out of it), you can use the <xsl:attribute> element as is done
				above both for the anchor and for the image.
				-->
				<h1>
					<a>
						<xsl:attribute name="href">
							<xsl:value-of select="channel/link" />
						</xsl:attribute>
						<xsl:value-of select="channel/title" />
					</a>
				</h1>
				<p>
					<xsl:value-of select="channel/description" />
				</p>
				<p>
					<xsl:value-of select="channel/copyright" />
				</p>
				<xsl:for-each select=".//item">
					<!--
					Now that we've done all the header stuff, we want something that will iterate
					over all the <item> elements in the feed, so we use an
					<xsl:for-each>. Make sure to have the "." before the "//" here so it only
					searches recursively down from the current node rather than from "/".
					-->
					<h2>
						<a>
							<xsl:attribute name="href">
								<xsl:value-of select="link" />
								<!--
								Once we've entered the <xsl:for-each>, the current node is set in
								turn to each of the nodes selected by its select expression.
								Hence we can just write "link" instead of "item/link" or anything like
								that.
								-->
							</xsl:attribute>
							<xsl:value-of select="title" />
						</a>
					</h2>
					<xsl:if test="pubDate">
						<h3>Published <xsl:value-of select="pubDate" /></h3>
						<!--
						To make truly flexible templates you can't assume that the
						feed has every optional element. But if those elements are there, you usually
						want to see them. For situations like that we can use an <xsl:if> element,
						which works like the familiar "if" statement in programming (there is no
						else branch; use <xsl:choose> when you need one).
						-->
					</xsl:if>
					<p>
						<xsl:value-of disable-output-escaping="yes" select="description" />
						<!--
						Here we're disabling output escaping so that RSS 2.0 feeds that have
						embedded HTML in escaped form (i.e. using &lt; and &gt; instead of
						literal angle brackets) will come out as HTML rather than as escaped text.
						-->
					</p>
				</xsl:for-each>
				<p>This page generated by <a
				href="http://sourceforge.net/projects/paperboy/">Paperboy</a>
				v<xsl:value-of select="paperboy:dotted_version()"/>.</p>
				<!--
				And after you've ended the for-each loop, you'd make your footer, hopefully one that
				lets everyone know what a nifty program you used to generate these pages.
				
				The paperboy:dotted_version() function is one of the extra XSLT functions Paperboy
				provides.
				-->
			</body>
		</html>
	</xsl:template>
	<!--
	After you close out the <body> and <html> tags, you need to close your
	<xsl:template>. If you have more than one template you'd put the others here. And then
	after the last template, don't forget to close your root <xsl:stylesheet> element.
	-->
</xsl:stylesheet>
			

This will not produce a pretty result, but it is sufficient for learning purposes (and you can easily link a CSS file to make it much more eye-pleasing). For now, don't worry about the special XSLT functions; we'll talk about those later. Just copy the stylesheet into a text editor and save it so you have something to use.
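The closing comment in tutorial1.xsl notes that a stylesheet can hold more than one template. As a rough sketch of what that looks like (our own variation for illustration, not one of Wren's tutorials), the stylesheet below moves the per-item markup out of the <xsl:for-each> and into a separate <xsl:template match="item">, which the rss template invokes with <xsl:apply-templates select=".//item"/>. It also uses attribute value templates such as href="{link}", a standard XSLT 1.0 shorthand for the <xsl:attribute>/<xsl:value-of> pairs in tutorial1.xsl. To keep it short, the channel image, copyright, CSS link and Paperboy footer are left out, and like tutorial1.xsl it assumes an RSS 0.9x or 2.0 feed.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<!-- Sketch only: tutorial1.xsl's item loop split into its own template.
     Trimmed: no channel image, copyright, CSS link or Paperboy footer. -->
<xsl:stylesheet version="1.0"
		xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
		xmlns="http://www.w3.org/1999/xhtml">
	<xsl:output method="xml" version="1.0" encoding="UTF-8" omit-xml-declaration="no" indent="no"
	doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
	doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" />

	<!-- Same entry point as tutorial1.xsl: process the children of the root. -->
	<xsl:template match="/">
		<xsl:apply-templates />
	</xsl:template>

	<xsl:template match="rss">
		<html lang="en-US" xml:lang="en-US">
			<head>
				<title>Paperboy : <xsl:value-of select="channel/title" /></title>
			</head>
			<body>
				<!-- Attribute value template: the XPath inside {} is evaluated and
				     its string value becomes the href attribute. -->
				<h1><a href="{channel/link}"><xsl:value-of select="channel/title" /></a></h1>
				<p><xsl:value-of select="channel/description" /></p>
				<!-- Instead of xsl:for-each, hand every item below this rss
				     element to the match="item" template further down. -->
				<xsl:apply-templates select=".//item" />
			</body>
		</html>
	</xsl:template>

	<!-- Runs once per item; inside it the current node is that item,
	     so "link", "title" and so on are relative to it. -->
	<xsl:template match="item">
		<h2><a href="{link}"><xsl:value-of select="title" /></a></h2>
		<xsl:if test="pubDate">
			<h3>Published <xsl:value-of select="pubDate" /></h3>
		</xsl:if>
		<p><xsl:value-of disable-output-escaping="yes" select="description" /></p>
	</xsl:template>
</xsl:stylesheet>
```

The per-item output matches what the <xsl:for-each> version produces; splitting it into its own template mainly keeps each template short and puts the item markup in one easy-to-find place.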