Enabling Web Searches: Sitemaps
===============================

Sitemaps are XML files that tell search engines (such as Google, which is 
discussed in this section) which URLs on your website are available for 
crawling. Currently, the only way for a search engine to crawl and index 
Metacat so that individual metadata entries are available via Web searches 
is with a sitemap. Metacat automatically creates sitemaps for all public 
documents in the repository. However, you must register the sitemaps with 
the search engine before they take effect.

    
Creating a Sitemap
------------------

Metacat automatically generates a sitemap file for all public documents in 
the repository on a daily basis. The sitemap files must be available via 
the Web on your server, and must be registered with Google before they take 
effect. For information on the sitemap protocol, please refer to the Google 
page on using the sitemap protocol. You can view Metacat's sitemap files at::

    
  <webapps_dir>/sitemaps

    
The directory contains one or more XML files named::

    
  metacat<X>.xml

    
where ``<X>`` is a number (e.g., 1 or 2) that is incremented for each sitemap 
file. Because Metacat limits each sitemap file to 25,000 entries, the servlet 
creates an additional sitemap file for each group of 25,000 entries.
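
The file-splitting rule can be illustrated with a short Python sketch. This 
is not Metacat's own code: the function name ``sitemap_file_names`` is 
hypothetical, and the sketch assumes that numbering starts at 1 (as the 
``metacat1.xml`` example suggests) and that no file is written when there 
are no public documents.

.. code-block:: python

  import math

  # Per-file entry limit stated in this section.
  MAX_ENTRIES_PER_FILE = 25_000


  def sitemap_file_names(public_doc_count: int) -> list[str]:
      """Return the sitemap file names expected for a number of public documents.

      Hypothetical helper: assumes numbering starts at 1 and that an empty
      repository produces no sitemap files.
      """
      if public_doc_count <= 0:
          return []
      # One file per group of up to 25,000 entries, rounding up.
      file_count = math.ceil(public_doc_count / MAX_ENTRIES_PER_FILE)
      return [f"metacat{i}.xml" for i in range(1, file_count + 1)]

Under these assumptions, a repository with 60,000 public documents would be 
expected to produce ``metacat1.xml``, ``metacat2.xml``, and ``metacat3.xml``.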

    
Verify that your sitemap files are available to the Web by browsing to::

    
  <your_web_context>/sitemaps/metacat<X>.xml 
  (e.g., your.server.org/knb/sitemaps/metacat1.xml)
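
The same check can be scripted. The following is a minimal Python sketch, 
not part of Metacat: the function names are hypothetical, and it assumes that 
the generated files use the standard sitemaps.org 0.9 XML namespace. It 
downloads one sitemap file and lists the URLs it advertises.

.. code-block:: python

  import urllib.request
  import xml.etree.ElementTree as ET

  # Namespace defined by the sitemap protocol; assumed to be what Metacat emits.
  SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


  def fetch_sitemap(url: str, timeout: float = 10.0) -> bytes:
      """Download a sitemap file; urlopen raises HTTPError on a 4xx/5xx reply."""
      with urllib.request.urlopen(url, timeout=timeout) as response:
          return response.read()


  def extract_locations(sitemap_xml: bytes) -> list[str]:
      """Return the text of every <loc> element found in a sitemap document."""
      root = ET.fromstring(sitemap_xml)
      return [
          loc.text.strip()
          for loc in root.iter(f"{SITEMAP_NS}loc")
          if loc.text
      ]

For example, ``extract_locations(fetch_sitemap("https://your.server.org/knb/sitemaps/metacat1.xml"))`` 
returns the listed URLs if the file is reachable and well formed, and raises 
an exception otherwise. The server name is the placeholder used above.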

    
Registering a Sitemap
---------------------

Before Google will begin indexing the public files in your Metacat, you must 
register the sitemaps. To register your sitemaps and ensure that they are up 
to date:

    
1. Register for a Google Webmaster Tools account, and add your Metacat 
   site to the Dashboard.
2. From your Google Webmaster Tools site account, register your sitemaps. 
   See the Google help site for more information about how to register sitemaps. 
   Note: Register the full URL path to your sitemap files, including 
   the ``http://`` (or ``https://``) prefix.
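
As a convenience, the full URLs to register can be assembled with a short 
Python sketch. The helper ``sitemap_urls_to_register`` is hypothetical and 
not part of Metacat; it assumes the ``<your_web_context>/sitemaps/metacat<X>.xml`` 
layout shown earlier, with numbering starting at 1, and rejects a context URL 
that lacks the ``http://`` or ``https://`` prefix.

.. code-block:: python

  from urllib.parse import urlsplit


  def sitemap_urls_to_register(context_url: str, file_count: int) -> list[str]:
      """Build the full sitemap URLs for a Metacat web context.

      Hypothetical helper: assumes files are named metacat1.xml, metacat2.xml,
      and so on, under <context_url>/sitemaps/.
      """
      parts = urlsplit(context_url)
      # The registration needs a full URL, so insist on a scheme and host.
      if parts.scheme not in ("http", "https") or not parts.netloc:
          raise ValueError(
              f"expected a full http:// or https:// URL, got {context_url!r}"
          )
      base = context_url.rstrip("/")
      return [f"{base}/sitemaps/metacat{i}.xml" for i in range(1, file_count + 1)]

Calling ``sitemap_urls_to_register("https://your.server.org/knb", 2)`` yields 
the two URLs ending in ``/sitemaps/metacat1.xml`` and ``/sitemaps/metacat2.xml``.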

    
Once the sitemaps are registered, Google will begin to index the public 
documents in your Metacat repository.

    
NOTE: As you add more publicly accessible data to Metacat, you will need to 
periodically revisit the Google Webmaster Tools utility to refresh your 
sitemap registration.