Enabling Web Searches: Sitemaps
===============================

Sitemaps are XML files that tell search engines (such as Google, which is
discussed in this section) which URLs on your website are available for
crawling. Currently, the only way for a search engine to crawl and index
Metacat so that individual metadata entries are available via Web searches
is with a sitemap. Metacat automatically creates sitemaps for all public
documents in the repository. However, you must register the sitemaps with
the search engine before they take effect.

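To make the format concrete, the following Python sketch parses a small
sitemap document and lists the URLs it advertises. The sample XML and its
URLs are placeholders invented for illustration, not real Metacat output.
The ``urlset``, ``url``, and ``loc`` elements and the namespace are those of
the public sitemap protocol; the files Metacat writes may contain additional
elements.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the public sitemap protocol (sitemaps.org).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample document; the URLs are placeholders only.
SAMPLE_SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://your.server.org/knb/metacat/doc.1.1</loc></url>
  <url><loc>https://your.server.org/knb/metacat/doc.2.1</loc></url>
</urlset>
"""


def list_sitemap_urls(xml_text):
    """Return the <loc> value of every <url> entry in a sitemap document."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


if __name__ == "__main__":
    for url in list_sitemap_urls(SAMPLE_SITEMAP):
        print(url)
```

Each ``loc`` value is one page that the search engine is invited to crawl.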

Creating a Sitemap
------------------

Metacat automatically generates sitemap files for all public documents in
the repository once a day. The sitemap files must be available via the Web
on your server, and must be registered with Google before they take effect.
For information on the sitemap protocol, refer to Google's documentation on
using the sitemap protocol. You can find Metacat's sitemap files at::

  <webapps_dir>/sitemaps

The directory contains one or more XML files named::

  metacat<X>.xml

where ``<X>`` is a number (e.g., 1 or 2) that distinguishes one sitemap file
from the next. Because Metacat limits each sitemap file to 25,000 entries,
the servlet creates an additional sitemap file for each further group of up
to 25,000 entries.

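As a rough illustration of this arithmetic, the sketch below estimates which
sitemap files to expect for a given number of public documents. The function
name is hypothetical, and numbering from 1 is an assumption based on the
``metacat1.xml`` example; check the actual directory listing on your server
rather than relying on this estimate.

```python
MAX_ENTRIES_PER_FILE = 25_000  # per-file limit described above


def sitemap_file_names(public_doc_count, first_index=1):
    """Estimate the sitemap file names for a number of public documents.

    first_index=1 is an assumption taken from the metacat1.xml example.
    """
    if public_doc_count < 0:
        raise ValueError("public_doc_count must not be negative")
    # Ceiling division: one file per started group of 25,000 entries.
    file_count = (public_doc_count + MAX_ENTRIES_PER_FILE - 1) // MAX_ENTRIES_PER_FILE
    return [f"metacat{i}.xml" for i in range(first_index, first_index + file_count)]


if __name__ == "__main__":
    print(sitemap_file_names(60_000))
```

Under these assumptions, a repository with 60,000 public documents would be
expected to have three sitemap files.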

Verify that your sitemap files are available to the Web by browsing to::

  <your_web_context>/sitemaps/metacat<X>.xml
  (e.g., your.server.org/knb/sitemaps/metacat1.xml)

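The same check can be scripted. The following sketch fetches a sitemap URL
and counts its entries. The function names are hypothetical, and the check
assumes only that the file is well-formed XML with a ``urlset`` root and
``url`` children, as in the public sitemap protocol. It ignores the XML
namespace on purpose, because the namespace Metacat writes is not stated
here.

```python
import urllib.request
import xml.etree.ElementTree as ET


def _local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def count_sitemap_entries(xml_bytes):
    """Parse sitemap XML and return the number of <url> entries.

    Raises ValueError if the root element is not <urlset>.
    """
    root = ET.fromstring(xml_bytes)
    if _local_name(root.tag) != "urlset":
        raise ValueError(f"unexpected root element: {root.tag}")
    return sum(1 for child in root if _local_name(child.tag) == "url")


def check_sitemap(url, timeout=10):
    """Fetch a sitemap over HTTP(S) and return its entry count."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return count_sitemap_entries(response.read())


# Example call (requires network access; substitute your own server and context):
# check_sitemap("http://your.server.org/knb/sitemaps/metacat1.xml")
```

An HTTP error or an XML parse error from ``check_sitemap`` suggests that the
file is not yet reachable from the Web.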

Registering a Sitemap
---------------------

Before Google will begin indexing the public documents in your Metacat, you
must register the sitemaps. To register your sitemaps and ensure that they
are up to date:

1. Register for a Google Webmaster Tools account (the service has since been
   renamed Google Search Console), and add your Metacat site to the
   Dashboard.
2. From your Google Webmaster Tools site account, register your sitemaps.
   See the Google help site for more information about how to register
   sitemaps. Note: Register the full URL of each sitemap file, including
   the ``http://`` (or ``https://``) prefix.

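Before submitting a sitemap address, you may want to confirm that it is a
full URL. The helper below is a minimal sketch with a hypothetical name; it
checks only for an ``http`` or ``https`` scheme and a host name, not that
the file exists.

```python
from urllib.parse import urlsplit


def is_full_sitemap_url(url):
    """Return True if url has an http/https scheme and a host name."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


if __name__ == "__main__":
    for candidate in (
        "your.server.org/knb/sitemaps/metacat1.xml",
        "https://your.server.org/knb/sitemaps/metacat1.xml",
    ):
        print(candidate, is_full_sitemap_url(candidate))
```

The first candidate lacks a scheme and is rejected; the second is accepted.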

Once the sitemaps are registered, Google will begin to index the public
documents in your Metacat repository.

NOTE: As you add more publicly accessible data to Metacat, you will need to
periodically revisit the Google Webmaster Tools utility to refresh your
sitemap registration.