Enabling Web Searches: Sitemaps
===============================

Sitemaps are XML files that tell search engines, such as Google (the search
engine discussed in this section), which URLs on your website are available
for crawling. Currently, the only way for a search engine to crawl and index
Metacat so that individual metadata entries are available via Web searches
is with a sitemap. Metacat automatically creates sitemaps for all public
documents in the repository. However, you must register the sitemaps with
the search engine before they take effect.
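
For reference, the following Python sketch shows the shape of a minimal
sitemap document as defined by the sitemap protocol: a ``urlset`` root element
in the ``http://www.sitemaps.org/schemas/sitemap/0.9`` namespace, with one
``url``/``loc`` entry per page. It is an illustration only, not the code
Metacat uses to generate its sitemaps; the function name ``build_sitemap`` is
hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML document (as a str) listing the given absolute URLs.

    Hypothetical helper for illustration; Metacat generates its own files.
    """
    # Map the protocol namespace to the default prefix so the output reads
    # <urlset xmlns="..."> instead of <ns0:urlset xmlns:ns0="...">.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # <loc> is the only required child of <url>. ElementTree escapes
        # characters such as "&" automatically when serializing.
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

For example, ``build_sitemap(["https://your.server.org/knb/page1.html"])``
returns a one-entry document; that URL is a made-up placeholder.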

Creating a Sitemap
------------------

Metacat automatically generates sitemap files for all public documents in
the repository on a daily basis. The sitemap file(s) must be available via
the Web on your server and must be registered with Google before they take
effect. For information on the sitemap protocol, please refer to the Google
page on using the sitemap protocol. You can find Metacat's sitemap files in::

    <webapps_dir>/sitemaps

The directory contains one or more XML files named::

    metacat<X>.xml

where ``<X>`` is a sequence number (e.g., 1 or 2) that distinguishes each
sitemap file. Because Metacat limits each sitemap file to 25,000 entries, the
servlet creates an additional sitemap file for each group of 25,000 entries.
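
The splitting rule above amounts to ceiling division: ``n`` public documents
produce ``ceil(n / 25000)`` sitemap files. The Python sketch below reproduces
that grouping and naming under the assumption that numbering starts at 1; the
helper names are hypothetical, and Metacat's servlet may order entries
differently.

```python
import math

# Per-file limit described above for Metacat's sitemap files.
MAX_ENTRIES_PER_FILE = 25_000


def split_into_sitemap_files(urls, max_entries=MAX_ENTRIES_PER_FILE):
    """Group URLs into batches keyed metacat1.xml, metacat2.xml, ...

    Hypothetical helper mirroring the naming scheme described above; it is
    not Metacat's actual code.
    """
    files = {}
    for start in range(0, len(urls), max_entries):
        index = start // max_entries + 1  # assumed: file numbers start at 1
        files[f"metacat{index}.xml"] = urls[start:start + max_entries]
    return files


def expected_file_count(num_entries, max_entries=MAX_ENTRIES_PER_FILE):
    """Number of sitemap files needed: ceil(num_entries / max_entries)."""
    return math.ceil(num_entries / max_entries)
```

With 60,001 public documents, for example, the sketch yields ``metacat1.xml``
and ``metacat2.xml`` with 25,000 entries each and ``metacat3.xml`` with the
remaining 10,001.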

Verify that your sitemap files are available to the Web by browsing to::

    <your_web_context>/sitemaps/metacat<X>.xml

(e.g., ``your.server.org/knb/sitemaps/metacat1.xml``)
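
Beyond loading the file in a browser, you can check it from a script. The
sketch below, which uses only the Python standard library, downloads a
sitemap, confirms that its root element is the protocol's ``urlset``, and
counts the ``url`` entries. The helper names are hypothetical, and the
``https://`` scheme in the usage example is an assumption about your server.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def count_sitemap_entries(xml_bytes):
    """Parse sitemap XML and return how many <url> entries it contains.

    Raises xml.etree.ElementTree.ParseError if the data is not well-formed
    XML, and ValueError if the root element is not a sitemap <urlset>.
    """
    root = ET.fromstring(xml_bytes)
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        raise ValueError(f"unexpected root element: {root.tag}")
    return len(root.findall(f"{{{SITEMAP_NS}}}url"))


def check_sitemap_url(url, timeout=30):
    """Fetch a sitemap over HTTP(S) and return its entry count."""
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return count_sitemap_entries(response.read())
```

For instance,
``check_sitemap_url("https://your.server.org/knb/sitemaps/metacat1.xml")``
should return the number of entries in your first sitemap file; an HTTP error
or ``ParseError`` suggests the file is not being served correctly.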

Registering a Sitemap
---------------------

Before Google will begin indexing the public files in your Metacat, you must
register the sitemaps. To register your sitemaps and ensure that they are up
to date:

1. Register for a Google Webmaster Tools account (the service is now called
   Google Search Console), and add your Metacat site to the Dashboard.
2. From your Google Webmaster Tools account, register your sitemaps. See the
   Google help site for more information about how to register sitemaps.

   Note: Register the full URL of each sitemap file, including the
   ``http://`` (or ``https://``) prefix.
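
Because each sitemap must be registered by its full URL, it can help to
generate the list of URLs to submit rather than typing them by hand. This
Python sketch assumes your files follow the
``<your_web_context>/sitemaps/metacat<X>.xml`` pattern shown earlier and are
numbered from 1; the function name and the example context URL are
hypothetical. It rejects a context that lacks the ``http://`` or ``https://``
prefix.

```python
from urllib.parse import urlparse


def sitemap_urls_to_register(web_context, file_count):
    """Return the full sitemap URLs to submit, one per metacat<X>.xml file.

    web_context is your Metacat web context, for example the hypothetical
    "https://your.server.org/knb". It must include the scheme, because each
    sitemap is registered by its full URL.
    """
    parsed = urlparse(web_context)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(
            f"expected a URL starting with http:// or https://, got {web_context!r}"
        )
    base = web_context.rstrip("/")
    # Assumed numbering: metacat1.xml .. metacat<file_count>.xml
    return [f"{base}/sitemaps/metacat{i}.xml" for i in range(1, file_count + 1)]
```

For example, ``sitemap_urls_to_register("https://your.server.org/knb", 2)``
returns the URLs of ``metacat1.xml`` and ``metacat2.xml`` under that context,
while passing ``"your.server.org/knb"`` raises ``ValueError``.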

Once the sitemaps are registered, Google will begin to index the public
documents in your Metacat repository.
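
NOTE: As you add more publicly accessible data to Metacat, you will need to
periodically revisit the Google Webmaster Tools utility to refresh your
sitemap registration. In particular, each time Metacat creates a new
``metacat<X>.xml`` file (after another 25,000 entries), that file must be
registered as well.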
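
One way to keep track of what still needs registering is to compare the
``metacat<X>.xml`` files currently in ``<webapps_dir>/sitemaps`` against the
file names you have already submitted. The sketch below does that comparison
in Python; keeping your own record of registered file names is an assumption
of this example, and the function names are hypothetical.

```python
import re

# Matches Metacat sitemap file names and captures the number <X>.
SITEMAP_NAME = re.compile(r"^metacat(\d+)\.xml$")


def sitemap_files(file_names):
    """Return the Metacat sitemap file names, sorted by their number <X>."""
    numbered = []
    for name in file_names:
        match = SITEMAP_NAME.match(name)
        if match:
            numbered.append((int(match.group(1)), name))
    # Sort numerically so metacat10.xml comes after metacat9.xml.
    return [name for _, name in sorted(numbered)]


def unregistered_sitemaps(file_names, registered_names):
    """Return sitemap files not yet in registered_names, in numeric order."""
    return [n for n in sitemap_files(file_names) if n not in registered_names]
```

For example,
``unregistered_sitemaps(os.listdir("<webapps_dir>/sitemaps"), {"metacat1.xml"})``
lists every other sitemap file in that directory, with ``metacat10.xml``
sorted after ``metacat9.xml`` rather than after ``metacat1.xml``.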