Note: Keep an eye out for a new post with an updated sitemaps class that includes index creation. – April 17th
I’ll do this from time to time if I think it will help a lot of people: release some of my own code for your use. All I ask is that you keep my author tag in place.
Creating a sitemap based on the sitemaps.org protocol is a very smart thing to do SEO-wise. The protocol itself is backed by all the major search engines and it’s very easy to let the search engines know about the sitemap. You can also inform Google through webmaster tools and see reports on when it was downloaded, if there are any errors, etc.
Now, I know there is a PEAR package that does this already, and I actually tried to use it before making my own class. However, I was dealing with a very large number of URLs and the PEAR package was not up to par: it was VERY slow and hogged a lot of memory. My class, on the other hand, is very fast and uses a more manageable amount of memory, mostly because it writes each URL entry to the file on the fly. I’ve used it with the maximum allowed number of URLs (50,000), and the memory usage was around 20 MB, I believe.
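To illustrate the write-as-you-go idea, here is a minimal sketch that is separate from the class itself. The names writeUrlsOnTheFly() and exampleUrlSource() and the example.com URLs are made up for this example, and the generator assumes PHP 5.5 or newer. Each entry goes to the file handle as soon as it is built, so only one entry's worth of XML is held in memory at a time.

```php
<?php
/**
 * Hypothetical helper (not part of the sitemaps class): writes one <url>
 * entry per URL straight to an open file handle instead of building the
 * whole document in a string first.
 *
 * @param resource $handle An open, writable file handle
 * @param mixed    $urls   Any array or generator of URL strings
 * @return int The number of entries written
 */
function writeUrlsOnTheFly($handle, $urls)
{
    $count = 0;
    foreach ($urls as $url) {
        // escape &, <, > and quotes so the entry stays valid XML
        $entry = '<url><loc>' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . "</loc></url>\n";
        if (fwrite($handle, $entry) === false) {
            break;
        }
        $count++;
    }
    return $count;
}

/**
 * Made-up URL source: a generator yields URLs one at a time, so the full
 * list never has to exist as an array either.
 */
function exampleUrlSource($howMany)
{
    for ($i = 1; $i <= $howMany; $i++) {
        yield 'http://www.example.com/article.php?id=' . $i;
    }
}

$handle = tmpfile(); // temporary file, removed automatically when closed
$written = writeUrlsOnTheFly($handle, exampleUrlSource(1000));
fclose($handle);
echo "Wrote $written entries\n";
```

The same pattern is what keeps the class light: the cost per URL is one small string and one fwrite() call, no matter how long the list gets.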
First, I’ll go through the basic usage of my sitemaps class.
Usage
<?php
/**
* Example usage of the sitemaps class made by George Gonzalez
*/
class sitemap_example extends sitemaps
{
var $urlList;
function __construct()
{
$filename = '/full/path/to/your/sitemap.xml';
parent::__construct($filename);
}
/**
* could get this information from a database. could also be a multi-dimensional
* array with priority values (0.0 - 1.0) and frequency of change (always, hourly,
* daily, weekly, monthly, yearly, or never)
*/
public function getURLs()
{
// the protocol requires full URLs, including the http:// part
$this->urlList = array('http://www.google.com/', 'http://www.yahoo.com/', 'http://www.cnn.com/', 'http://www.lakers.com/');
}
public function cycleURLs()
{
foreach($this->urlList as $url) {
$this->addUrl($url, '0.5', 'weekly');
}
}
}
// make an instance of this
$sm = new sitemap_example();
// get our URL list
$sm->getURLs();
// iterate through our URLs and use addUrl() to add them to the sitemap
$sm->cycleURLs();
// important. close out our sitemap file and see any error notices
$sm->finish();
?>
You can customize getURLs() to retrieve your URLs from a database. For example, if you have URLs in the form http://www.domain.com/article.php?id=45, run a MySQL query to grab all the article IDs and append each one to the string ‘http://www.domain.com/article.php?id=’.
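As a rough sketch of that idea, the snippet below turns a list of article IDs into full URLs. The function name buildArticleUrls() is hypothetical, and the IDs are hard-coded where a real getURLs() would use the rows from its database query.

```php
<?php
/**
 * Hypothetical helper: turns a list of article IDs into full article URLs.
 * In a real getURLs() the IDs would come from your database query.
 *
 * @param string $baseUrl The part of the URL that comes before the ID
 * @param array  $ids     Article IDs, e.g. the rows returned by a query
 * @return array The full URLs
 */
function buildArticleUrls($baseUrl, array $ids)
{
    $urls = array();
    foreach ($ids as $id) {
        // cast to int so a stray non-numeric value cannot end up in the URL
        $urls[] = $baseUrl . (int) $id;
    }
    return $urls;
}

// IDs are hard-coded here; they stand in for a database result
$urls = buildArticleUrls('http://www.domain.com/article.php?id=', array(45, 46, '47'));
print_r($urls);
```

Inside the example class, the returned array could simply be assigned to $this->urlList.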
Another recommended customization is the change frequency and priority of each URL. If you have a quality/importance metric to go by, grade each URL and give it a priority between 0.0 and 1.0. This doesn’t guarantee better rankings, but it hints to the search engine that a 1.0 page is more important or relevant than a 0.1 page (and bots will be more likely to visit the higher priority pages more often).
Of course if you customize getURLs to make a multi-dimensional array, you’ll also have to customize cycleURLs to use that array properly.
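Here is one way the multi-dimensional version could look. This is only a sketch: the scoreToPriority() helper, the 0 to 100 score scale and the example.com URLs are all assumptions made for illustration.

```php
<?php
/**
 * Hypothetical helper: converts a quality score into a priority string
 * between 0.0 and 1.0. The 0-100 score scale is an assumption made for
 * this example.
 */
function scoreToPriority($score, $maxScore = 100)
{
    $priority = $score / $maxScore;
    // clamp to the range the protocol allows
    $priority = max(0.0, min(1.0, $priority));
    return number_format($priority, 1, '.', '');
}

// each entry carries its own URL, score and change frequency
$urlList = array(
    array('url' => 'http://www.example.com/',          'score' => 100, 'freq' => 'daily'),
    array('url' => 'http://www.example.com/about.php', 'score' => 30,  'freq' => 'yearly'),
    array('url' => 'http://www.example.com/old.php',   'score' => -5,  'freq' => 'never'),
);

// a customized cycleURLs() would loop like this and call
// $this->addUrl($entry['url'], $priority, $entry['freq']) instead of echo
foreach ($urlList as $entry) {
    $priority = scoreToPriority($entry['score']);
    echo $entry['url'] . ' ' . $priority . ' ' . $entry['freq'] . "\n";
}
```

Clamping the value means a bad score in the data can never produce a priority outside the allowed range.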
Don’t forget to call the finish() function! It will give an error notice if the XML file is over 10 MB or has over 50,000 URLs. Both of those are limits set by the sitemaps.org protocol.
Over 50,000 URLs?
If you have over 50,000 URLs, you will need to make several sitemap files. Create sitemaps that each hold up to 50,000 URLs, and then make a sitemap index file that lists them. The index file takes this form:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>http://www.example.com/sitemap1.xml.gz</loc>
<lastmod>2004-10-01T18:23:17+00:00</lastmod>
</sitemap>
<sitemap>
<loc>http://www.example.com/sitemap2.xml.gz</loc>
<lastmod>2005-01-01</lastmod>
</sitemap>
</sitemapindex>
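Until the class can do this itself, the splitting and the index file could be sketched roughly like this. The buildSitemapIndex() function, the sitemapN.xml.gz naming and the example.com host are assumptions for this example, not part of the class.

```php
<?php
/**
 * Hypothetical helper: builds the XML for a sitemap index file.
 *
 * @param array $sitemapUrls Full URLs of the individual sitemap files
 * @param int   $timestamp   Unix timestamp used for every <lastmod>
 * @return string The sitemap index XML
 */
function buildSitemapIndex(array $sitemapUrls, $timestamp)
{
    $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach ($sitemapUrls as $url) {
        $xml .= '<sitemap>' . "\n";
        $xml .= '<loc>' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '</loc>' . "\n";
        // gmdate() keeps the result independent of the server timezone
        $xml .= '<lastmod>' . gmdate('Y-m-d\TH:i:s', $timestamp) . '+00:00</lastmod>' . "\n";
        $xml .= '</sitemap>' . "\n";
    }
    $xml .= '</sitemapindex>' . "\n";
    return $xml;
}

// 120,000 made-up article IDs split into groups of at most 50,000
$allIds = range(1, 120000);
$chunks = array_chunk($allIds, 50000);

// one sitemap file per chunk: sitemap1.xml.gz, sitemap2.xml.gz, ...
$sitemapUrls = array();
foreach ($chunks as $n => $chunk) {
    $sitemapUrls[] = 'http://www.example.com/sitemap' . ($n + 1) . '.xml.gz';
}

echo buildSitemapIndex($sitemapUrls, time());
```

Each chunk would be written with its own instance of the sitemaps class, and the index file is the one you then tell the search engines about.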
Future Versions
In future versions of my sitemaps class I’ll also include a way to ping the search engines about the new sitemap file. In addition, I want to add a built-in way to make sitemap index files.
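In the meantime, the sitemaps.org protocol describes a ping request of the form <searchengine_URL>/ping?sitemap=sitemap_url, with the sitemap URL being URL-encoded. A minimal sketch of building that request URL follows. The function name buildPingUrl() is hypothetical and the search engine address is only a placeholder: not every search engine offers a ping URL, so check their own documentation first.

```php
<?php
/**
 * Hypothetical helper: builds the ping URL for one search engine.
 *
 * @param string $searchEngineUrl Base URL published by the search engine
 *                                (placeholder, check their documentation)
 * @param string $sitemapUrl      Full URL of your sitemap or sitemap index
 * @return string The URL to request
 */
function buildPingUrl($searchEngineUrl, $sitemapUrl)
{
    // everything after "sitemap=" has to be URL-encoded
    return rtrim($searchEngineUrl, '/') . '/ping?sitemap=' . urlencode($sitemapUrl);
}

$pingUrl = buildPingUrl('http://searchengine.example', 'http://www.example.com/sitemap.xml.gz');
echo $pingUrl . "\n";
```

Requesting the resulting URL (with file_get_contents() or cURL, for example) is all a ping amounts to.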
The Class Itself
Please note that this requires PHP5.
<?php
/**
* Tool to help build xml sitemaps according to sitemaps.org protocol.
* DO NOT TAKE OFF THE AUTHOR TAG!
* @author George Gonzalez <webmaster@socaltrailriders.org>
* @version 1.0
*/
error_reporting(E_ALL);
/**
* XML Sitemaps class that allows us to make sitemaps based on sitemaps.org
* protocol which all the major search engines adhere to.
* @link http://www.sitemaps.org Sitemaps Protocol
* @author George Gonzalez <webmaster@socaltrailriders.org>
*/
class sitemaps
{
/**
* @var mixed The file handler for the sitemap file (before it gets gzipped)
**/
var $f;
/**
* @var string The file, with full path, to the XML file we will create
* before it gets gzipped
**/
var $xmlFile;
/**
* @var integer Keeps a count of the URLs added - should be <= 50,000 for it to be within
* sitemaps.org spec
*/
var $urlCount = 0;
/**
* Initialize this with the filename (including full path) of the file we
* want to create. This doesn't include the .gz but can include the .xml. Also
* checks to make sure the directory is writable
* @param string $file The filename
*/
function __construct($file)
{
if (empty($file) || (strlen($file)) < 5) {
trigger_error('The proposed filename doesnt appear to be valid.', E_USER_NOTICE);
}
// the directory part of the path, e.g. /full/path/to/your
$dir = dirname($file);
$result = is_writable($dir);
if ($result === false) {
trigger_error('The directory '.$dir.' is not writeable.', E_USER_ERROR);
}
$this->xmlFile = $file;
$result = $this->_startXML();
}
/**
* Adds a URL to the sitemap. This adds it to the file on the fly
* @param string $url The URL to add to the sitemap
* @param string $priority The priority of the URL, relative to other pages on
* your site. From 0 - 1.
* @param string $freq The frequency that the content on the given URL
* changes. Can be: always, hourly, daily, weekly, monthly, yearly, or never
* @return boolean Returns true if a write went well, false if not
*/
public function addUrl($url, $priority, $freq)
{
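// appends XML tags to the end of the file. the URL is entity-escaped (&, <, >, quotes)
// as the protocol requires; the last argument avoids escaping an already escaped URL twice
$xml = '<url><loc>'.htmlspecialchars($url, ENT_QUOTES, 'UTF-8', false).'</loc><changefreq>'.$freq.'</changefreq>
<priority>'.$priority.'</priority></url>';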
$bytes = fwrite($this->f, $xml);
if ($bytes === false) {
trigger_error('Couldnt add URL '.$url.' to the file.', E_USER_NOTICE);
}
else {
$this->urlCount++;
}
return ($bytes === false) ? false : true;
}
/**
* Once we're done adding URLs, finish off the xml with the closing urlset tag
* and possibly gzip the xml contents + delete the original xml file
* @param boolean $doGz Whether or not to gzip our xml file (and delete the xml file)
* defaults to yes (true)
* @return void
*/
public function finish($doGz = true)
{
$this->_finishXML();
if ($doGz == true) {
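// only delete the xml file if the gzip actually worked
if ($this->_gzipFile() === false) {
trigger_error('Couldnt gzip '.$this->xmlFile.' so it was not deleted.', E_USER_NOTICE);
}
elseif (unlink($this->xmlFile) === false) {
trigger_error('Couldnt delete '.$this->xmlFile, E_USER_NOTICE);
}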
}
}
/**
* Open the xml file (or create it) with the first couple lines that
* follow the Sitemaps protocol
* @return boolean Returns true if all went well
*/
private function _startXML()
{
// creates a file and starts it
$xml = '<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">';
$this->f = fopen($this->xmlFile, 'w');
if ($this->f === false) {
trigger_error('Cant open file '.$this->xmlFile.' for writing.', E_USER_ERROR);
return false;
}
else {
$bytes = fwrite($this->f, $xml);
return ($bytes === false) ? false : true;
}
}
/**
* Finish the XML file by adding a closing </urlset> and close the file
* @return boolean Returns true if the file was closed successfully
*/
private function _finishXML()
{
// adds the ending/closing tags on the XML file
$xml = '</urlset>';
$bytes = fwrite($this->f, $xml);
if ($bytes === false) {
trigger_error('couldnt write the closing tag to the xml file', E_USER_NOTICE);
}
$close = fclose($this->f);
if (filesize($this->xmlFile) > 10485760) {
trigger_error($this->xmlFile.' is bigger than 10mb! This is not within sitemaps.org spec.', E_USER_NOTICE);
}
if ($this->urlCount > 50000) {
trigger_error('You have over 50,000 URL\'s! This is not within sitemaps.org spec.', E_USER_NOTICE);
}
return ($close === false) ? false : true;
}
/**
* Reads in the XML sitemap, then gzips that content using the
* same filename with '.gz' added onto it to designate the gzip
* @return mixed The gzip filename if the gzip operation worked, false if not
*/
private function _gzipFile()
{
$fh = fopen($this->xmlFile, 'r');
if ($fh === false) {
trigger_error('Couldnt open '.$this->xmlFile.' to take contents for gzip.', E_USER_ERROR);
}
$xml = fread($fh, filesize($this->xmlFile));
if ($xml === false) {
trigger_error('Couldnt read '.$this->xmlFile.' to take contents for gzip.', E_USER_ERROR);
}
if (fclose($fh) === false) {
trigger_error('Couldnt close '.$this->xmlFile.' after reading for gzip.', E_USER_NOTICE);
}
$gzipFile = $this->xmlFile.'.gz';
$gz = gzopen($gzipFile, 'w');
if ($gz === false) {
trigger_error('Couldnt open '.$gzipFile, E_USER_ERROR);
}
$bytes = gzwrite($gz, $xml);
// always close the gzip handle, even if the write failed
gzclose($gz);
if ($bytes > 0) {
echo "Wrote $gzipFile \n";
return $gzipFile;
}
else {
return false;
}
}
}
?>
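One thing to be aware of: _gzipFile() reads the whole XML file into memory before compressing it. If that ever becomes a problem, a chunked version along these lines might help. This is a sketch only; gzipFileInChunks() is a hypothetical standalone function, not part of the class, and it needs the zlib extension just like the class does.

```php
<?php
/**
 * Hypothetical alternative to _gzipFile(): copies the file into a .gz file
 * in small chunks so the whole sitemap never sits in memory at once.
 *
 * @param string $sourceFile Full path of the file to compress
 * @param int    $chunkSize  Number of bytes to read per step
 * @return mixed The gzip filename on success, false on failure
 */
function gzipFileInChunks($sourceFile, $chunkSize = 8192)
{
    $in = fopen($sourceFile, 'rb');
    if ($in === false) {
        return false;
    }
    $gzipFile = $sourceFile . '.gz';
    $out = gzopen($gzipFile, 'wb9');
    if ($out === false) {
        fclose($in);
        return false;
    }
    $ok = true;
    while (!feof($in)) {
        $chunk = fread($in, $chunkSize);
        if ($chunk === false) {
            $ok = false;
            break;
        }
        // a short write means something went wrong
        if ($chunk !== '' && gzwrite($out, $chunk) !== strlen($chunk)) {
            $ok = false;
            break;
        }
    }
    // close both handles whether or not the copy worked
    fclose($in);
    gzclose($out);
    return $ok ? $gzipFile : false;
}

// try it on a small temporary file
$source = tempnam(sys_get_temp_dir(), 'sm');
file_put_contents($source, str_repeat('<url><loc>http://www.example.com/</loc></url>', 1000));
$gzipFile = gzipFileInChunks($source);
```

With an 8 KB chunk size the memory needed for compression stays roughly constant, regardless of how big the sitemap is.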
Please comment if you have a question or suggestion.