Search Engine Sitemaps

Table of Contents
About Sitemaps
Overview
Metacat Implementation
Submit Your Sitemap
Register With Google
Re-Register With Google
About Sitemaps
Overview

Sitemaps are XML files that tell search engines which pages on your site you would like to be available in web searches. In Metacat this is especially useful for making individual metadata entries searchable. Without a sitemap, search engines do not find these entries, because Metacat lacks a web-accessible browse hierarchy of metadata that a crawler could follow.

The sitemap file contains metadata about the available pages (URLs) on your server. For information on the sitemap protocol, please refer to the Google page on using the sitemap protocol. The sitemap file must be accessible via the web on your server.
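To make the format concrete, here is a minimal sketch of what a sitemap file contains: a `<urlset>` element in the sitemaps.org namespace, with one `<url><loc>` entry per page. Metacat writes these files itself, so this is illustration only; the function name `build_sitemap` and the example URLs are hypothetical placeholders, not Metacat code or real Metacat document URLs.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol (sitemaps.org, version 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap document as UTF-8 bytes with one <url><loc> per URL.

    Hypothetical helper for illustration; not part of Metacat.
    """
    # Declare the sitemap namespace as the default namespace on <urlset>.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_el = ET.SubElement(urlset, "url")
        loc = ET.SubElement(url_el, "loc")
        # ElementTree escapes characters such as "&" automatically,
        # which the sitemap protocol requires inside <loc>.
        loc.text = url
    # xml_declaration=True (Python 3.8+) prepends <?xml ... encoding='utf-8'?>.
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # Placeholder URLs; real entries point at your public Metacat documents.
    print(build_sitemap([
        "https://your.server.org/knb/example-entry-1",
        "https://your.server.org/knb/example-entry-2",
    ]).decode("utf-8"))
```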

Metacat generates the sitemap file automatically on a daily basis (more on this in the next section). You will need to manually register the sitemap file with Google in order for it to take effect. We discuss that in the Submit Your Sitemap section.

This discussion covers Google web searches only.

Metacat Implementation

Metacat automatically generates sitemap files for all public documents in your catalog. You can find the sitemap files in:

<webapps_dir>/sitemaps

You should see one or more files in this directory named like:

metacat<X>.xml

Metacat limits each sitemap file to 25,000 entries (Google's limit is 50,000). So for every group of up to 25,000 public documents in Metacat, you will see an additional sitemap file, with <X> incrementing for each file.
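The file-splitting rule can be sketched as simple arithmetic: the URLs are cut into consecutive groups of at most 25,000 and each group gets the next file number. This only illustrates the numbering and is not Metacat's own code; the helper name `split_into_sitemaps` is hypothetical, and numbering from 1 is an assumption based on the metacat1.xml example in this section.

```python
def split_into_sitemaps(urls, max_entries=25_000):
    """Group URLs into metacat1.xml, metacat2.xml, ... with at most max_entries each.

    Hypothetical helper; assumes file numbering starts at 1.
    Returns a dict mapping file name -> list of URLs in that file.
    """
    if max_entries <= 0:
        raise ValueError("max_entries must be positive")
    urls = list(urls)  # accept any iterable; slicing needs a list
    files = {}
    for start in range(0, len(urls), max_entries):
        index = start // max_entries + 1
        files[f"metacat{index}.xml"] = urls[start:start + max_entries]
    return files
```

For example, 60,001 public documents would produce three files holding 25,000, 25,000 and 10,001 entries.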

You should verify that your sitemap files are available on the web by browsing to

<your_web_context>/sitemaps/metacat<X>.xml

for instance:

your.server.org/knb/sitemaps/metacat1.xml
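Besides checking in a browser, the same verification can be scripted. The sketch below, using only the Python standard library, fetches a sitemap URL and counts its `<url>` entries, failing if the server returns an error status or the file is not a sitemap `<urlset>`. The function names are hypothetical, and the https:// scheme in the usage line is an assumption about your server.

```python
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def count_sitemap_entries(xml_bytes):
    """Return the number of <url> entries; raise ValueError if not a sitemap."""
    root = ET.fromstring(xml_bytes)
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        raise ValueError(f"unexpected root element: {root.tag}")
    return len(root.findall(f"{{{SITEMAP_NS}}}url"))


def check_sitemap(url, timeout=10):
    """Fetch a sitemap over HTTP(S) and return its entry count.

    urlopen raises urllib.error.HTTPError for error responses (e.g. 404),
    so reaching the parse step means the server actually served the file.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return count_sitemap_entries(response.read())


if __name__ == "__main__":
    # Placeholder URL taken from the example above, with an assumed scheme.
    print(check_sitemap("https://your.server.org/knb/sitemaps/metacat1.xml"))
```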

Submit Your Sitemap
Register With Google

You will need to register for a Google Webmaster Tools account in order to register your sitemaps. A good description of how to do this is on the Google help site. Follow the instructions there to get your sitemaps registered.

One note: you should register the full URL of each sitemap file, including the http:// (or https://) prefix.
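If you have several sitemap files, a small helper can produce the full URLs to paste into Google Webmaster Tools and reject a base address that is missing its http:// or https:// prefix. This is a hypothetical convenience sketch; `sitemap_urls_to_submit` is not part of Metacat, and the base URL in the example is the placeholder from this page with an assumed https:// scheme.

```python
from urllib.parse import urlparse


def sitemap_urls_to_submit(base_url, file_count):
    """Return full sitemap URLs such as https://host/context/sitemaps/metacat1.xml.

    Hypothetical helper; base_url is your web context, e.g. https://your.server.org/knb
    """
    parsed = urlparse(base_url)
    # Without a scheme, urlparse leaves netloc empty, so both checks catch it.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("base_url must start with http:// or https:// and include a host")
    base = base_url.rstrip("/")
    return [f"{base}/sitemaps/metacat{i}.xml" for i in range(1, file_count + 1)]


if __name__ == "__main__":
    for url in sitemap_urls_to_submit("https://your.server.org/knb", 2):
        print(url)
```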

Re-Register With Google

You will need to periodically revisit the Google Webmaster Tools utility to refresh your sitemap registration as you add more publicly accessible data to Metacat.

A future enhancement to Metacat will automatically refresh the sitemap registration when necessary, although you will still need to do the initial registration manually.
