Enabling Web Searches: Sitemaps
===============================

Sitemaps are XML files that tell search engines, such as Google (the search
engine discussed in this section), which URLs on your website are available
for crawling. Currently, a sitemap is the only way for a search engine to
crawl and index Metacat so that individual metadata entries are available
via Web searches. Metacat automatically creates sitemaps for all public
documents in the repository. However, you must register the sitemaps with
the search engine before they take effect.

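For reference, a sitemap that follows the sitemap protocol (version 0.9) is a ``<urlset>`` element containing one ``<url>`` entry, with a ``<loc>`` child, per page. The Python sketch below builds such a document with the standard library. It only illustrates the file format and is not how Metacat generates its sitemaps; the function name ``build_sitemap`` and the example URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol, version 0.9 (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap <urlset> document with one <url><loc> entry per URL.

    ElementTree escapes special characters such as "&" in the URLs, as the
    protocol requires.
    """
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical URLs standing in for two public Metacat documents.
    print(build_sitemap([
        "https://your.server.org/metacat/example-document-1",
        "https://your.server.org/metacat/example-document-2",
    ]))
```

Each public document contributes one ``<url>`` entry; the protocol also allows optional child elements such as ``<lastmod>``, which this sketch omits.
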
Creating a Sitemap
------------------

Metacat automatically generates a sitemap file for all public documents in
the repository on a daily basis. The sitemap files must be available via
the Web on your server, and must be registered with Google before they take
effect. For information on the sitemap protocol, please refer to the Google
page on using the sitemap protocol. You can view Metacat's sitemap files at::

    <webapps_dir>/sitemaps

The directory contains one or more XML files named::

    metacat<X>.xml

where ``<X>`` is a sequence number (e.g., 1 or 2) that distinguishes the
sitemap files. Metacat writes at most 25,000 entries to each sitemap file,
so the servlet creates an additional file for each further group of up to
25,000 entries.

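The per-file limit amounts to splitting the list of public URLs into chunks of at most 25,000 and numbering the files ``metacat1.xml``, ``metacat2.xml``, and so on. The following is a hypothetical Python sketch of that splitting and naming scheme, assuming a flat list of URLs as input. Metacat's own servlet is a separate implementation and may differ in its details; the function name ``write_sitemaps`` is illustrative only.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_ENTRIES = 25000  # per-file limit described above for Metacat


def write_sitemaps(urls, out_dir, max_entries=MAX_ENTRIES):
    """Write urls into metacat1.xml, metacat2.xml, ... in out_dir, with at
    most max_entries <url> elements per file. Return the file names."""
    os.makedirs(out_dir, exist_ok=True)
    names = []
    for start in range(0, len(urls), max_entries):
        urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
        for url in urls[start:start + max_entries]:
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
        name = f"metacat{len(names) + 1}.xml"  # numbering starts at 1
        ET.ElementTree(urlset).write(
            os.path.join(out_dir, name), encoding="utf-8", xml_declaration=True
        )
        names.append(name)
    return names


if __name__ == "__main__":
    # 60,001 hypothetical URLs: files of 25,000, 25,000 and 10,001 entries.
    demo = [f"https://your.server.org/metacat/example/{i}" for i in range(60001)]
    with tempfile.TemporaryDirectory() as tmp:
        print(write_sitemaps(demo, tmp))
```

Under these assumptions, the number of files is the entry count divided by 25,000, rounded up.
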
Verify that your sitemap files are available to the Web by browsing to::

    <your_web_context>/sitemaps/metacat<X>.xml

(e.g., ``your.server.org/metacat/sitemaps/metacat1.xml``)

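Instead of a browser, you can also check a sitemap URL from a script. This hypothetical sketch fetches the file with Python's ``urllib.request`` and confirms that the root element is a sitemap ``<urlset>`` before counting its ``<url>`` entries; the function names are illustrative assumptions, not part of Metacat.

```python
import urllib.request
import xml.etree.ElementTree as ET

# ElementTree reports namespaced tags as "{namespace}name".
NS_PREFIX = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def count_sitemap_entries(xml_bytes):
    """Parse a sitemap document and return how many <url> entries it lists.
    Raise ValueError if the root element is not a sitemap <urlset>."""
    root = ET.fromstring(xml_bytes)
    if root.tag != NS_PREFIX + "urlset":
        raise ValueError(f"unexpected root element: {root.tag}")
    return len(root.findall(NS_PREFIX + "url"))


def check_sitemap(url, timeout=10):
    """Fetch a sitemap over HTTP(S) and return its entry count.
    urlopen raises urllib.error.URLError (or its subclass HTTPError)
    if the file cannot be retrieved."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return count_sitemap_entries(response.read())
```

For example, ``check_sitemap("https://your.server.org/metacat/sitemaps/metacat1.xml")`` (hypothetical host) should return a count no greater than 25,000 for a file generated as described above.
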
Registering a Sitemap
---------------------

Before Google will begin indexing the public files in your Metacat, you must
register the sitemaps. To register your sitemaps and ensure that they are up
to date:

1. Register for a Google Webmaster Tools account and add your Metacat
   site to the Dashboard.
2. From your Google Webmaster Tools site account, register your sitemaps.
   See the Google help site for more information about how to register
   sitemaps.

   Note: Register the full URL path to your sitemap files, including
   the ``http://`` (or ``https://``) prefix.

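Because registration needs the full URL of each sitemap file, it can help to generate the list from your web context. This is a hypothetical sketch: the function name ``sitemap_urls`` and the example web context are assumptions, and the ``/sitemaps/metacat<X>.xml`` path follows the layout described in the previous section.

```python
from urllib.parse import urlsplit


def sitemap_urls(web_context, count):
    """Return absolute URLs for metacat1.xml .. metacat<count>.xml.

    web_context is the public base URL of the Metacat web application,
    for example "https://your.server.org/metacat" (hypothetical).
    """
    parts = urlsplit(web_context)
    # The registration note asks for the full URL, including the scheme,
    # so reject values without http:// or https:// and a host name.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("web_context must start with http:// or https://")
    base = web_context.rstrip("/")
    return [f"{base}/sitemaps/metacat{i}.xml" for i in range(1, count + 1)]


if __name__ == "__main__":
    for url in sitemap_urls("https://your.server.org/metacat", 2):
        print(url)
```

Each URL this returns is one you would register in Google Webmaster Tools; a value such as ``your.server.org/metacat`` without a scheme is rejected.
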
Once the sitemaps are registered, Google will begin to index the public
documents in your Metacat repository.

Note: As you add more publicly accessible data to Metacat, you will need to
periodically revisit the Google Webmaster Tools utility to refresh your
sitemap registration.