Revision 6846
Added by Matt Jones about 13 years ago
sitemaps.rst | ||
---|---|---|
1 | 1 |
Enabling Web Searches: Sitemaps |
2 | 2 |
=============================== |
3 | 3 |
|
4 |
Under construction! |
|
4 |
Sitemaps are XML files that tell search engines (such as Google, which is |
|
5 |
discussed in this section) which URLs on your website are available for |
|
6 |
crawling. Currently, the only way for a search engine to crawl and index |
|
7 |
Metacat so that individual metadata entries are available via Web searches |
|
8 |
is with a sitemap. Metacat automatically creates sitemaps for all public |
|
9 |
documents in the repository. However, you must register the sitemaps with |
|
10 |
the search engine before they take effect. |
|
5 | 11 |
|
6 |
Heading 1 |
|
7 |
------------ |
|
8 | 12 |
|
9 |
Heading 2 |
|
10 |
------------ |
|
13 |
Creating a Sitemap |
|
14 |
------------------ |
|
11 | 15 |
|
16 |
Metacat automatically generates a sitemap file for all public documents in |
|
17 |
the repository on a daily basis. The sitemap file(s) must be available via |
|
18 |
the Web on your server, and must be registered with Google before they take |
|
19 |
effect. For information on the sitemap protocol, please refer to the Google |
|
20 |
page on using the sitemap protocol. You can view Metacat's sitemap files at:: |
|
21 |
|
|
22 |
<webapps_dir>/sitemaps |
|
23 |
|
|
24 |
The directory contains one or more XML files named:: |
|
25 |
|
|
26 |
metacat<X>.xml |
|
27 |
|
|
28 |
where ``<X>`` is a sequence number (e.g., 1 or 2) that distinguishes each sitemap file. |
|
29 |
Because Metacat limits the number of sitemap entries in each sitemap file to |
|
30 |
25,000, the servlet creates an additional sitemap file for each group of |
|
31 |
25,000 entries. |
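
The 25,000-entry limit described above determines how many sitemap files to expect in the directory. The following is a minimal sketch of that arithmetic, not Metacat's own code: the function name ``sitemap_file_names`` is hypothetical, and it assumes numbering starts at 1 (as the ``metacat1.xml`` example suggests) and that no file is produced when there are no public documents.

```python
ENTRIES_PER_FILE = 25_000  # per-file limit stated in the text above


def sitemap_file_names(public_doc_count, first_index=1):
    """Return the sitemap file names expected for a given number of public documents.

    Assumptions (not confirmed by the Metacat source): numbering starts at
    ``first_index`` and zero documents yields zero files.
    """
    if public_doc_count < 0:
        raise ValueError("public_doc_count must not be negative")
    # Integer ceiling division: one file per started group of 25,000 entries.
    file_count = (public_doc_count + ENTRIES_PER_FILE - 1) // ENTRIES_PER_FILE
    return [f"metacat{i}.xml" for i in range(first_index, first_index + file_count)]
```

For example, a repository with 60,000 public documents would be expected to produce three files under these assumptions, the last one only partly filled.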
|
32 |
|
|
33 |
Verify that your sitemap files are available to the Web by browsing to:: |
|
34 |
|
|
35 |
<your_web_context>/sitemaps/metacat<X>.xml |
|
36 |
(e.g., your.server.org/knb/sitemaps/metacat1.xml) |
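
If you prefer to script this check rather than use a browser, a sketch along the following lines may help. It uses only the Python standard library. The function names are hypothetical, and the code assumes each sitemap entry carries its URL in a ``loc`` element, as the sitemap protocol specifies; it ignores XML namespaces so that it does not depend on the exact namespace Metacat writes.

```python
import urllib.request
import xml.etree.ElementTree as ET


def extract_locations(sitemap_xml):
    """Return the text of every ``loc`` element in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    locations = []
    for element in root.iter():
        # Drop any "{namespace}" prefix so the check is namespace-agnostic.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "loc" and element.text:
            locations.append(element.text.strip())
    return locations


def check_sitemap(url, timeout=30):
    """Fetch a sitemap over HTTP(S) and return the URLs it lists.

    urlopen raises an exception if the file cannot be retrieved, and
    ET.fromstring raises ParseError if the body is not well-formed XML.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read()
    return extract_locations(body)
```

Calling ``check_sitemap("https://your.server.org/knb/sitemaps/metacat1.xml")`` (with your own server and context substituted) should return a non-empty list if the file is reachable and contains entries.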
|
37 |
|
|
38 |
Registering a Sitemap |
|
39 |
--------------------- |
|
40 |
Before Google will begin indexing the public files in your Metacat, you must |
|
41 |
register the sitemaps. To register your sitemaps and ensure that they are up |
|
42 |
to date: |
|
43 |
|
|
44 |
1. Register for a Google Webmaster Tools account, and add your Metacat |
|
45 |
site to the Dashboard. |
|
46 |
2. From your Google Webmaster Tools site account, register your sitemaps. |
|
47 |
See the Google help site for more information about how to register sitemaps. |
|
48 |
Note: Register the full URL path to your sitemap files, including |
|
49 |
the http:// (or https://) prefix. |
|
50 |
|
|
51 |
Once the sitemaps are registered, Google will begin to index the public |
|
52 |
documents in your Metacat repository. |
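
Because registration expects the full URL of each sitemap file, it can be worth checking the strings before submitting them. The helper below is an illustrative sketch only: ``is_full_sitemap_url`` is a hypothetical name, and the ``.xml`` suffix check simply reflects the ``metacat<X>.xml`` naming described earlier.

```python
from urllib.parse import urlsplit


def is_full_sitemap_url(candidate):
    """Return True if ``candidate`` is an absolute http(s) URL ending in .xml."""
    parts = urlsplit(candidate)
    return (
        parts.scheme in ("http", "https")   # scheme must be present
        and bool(parts.netloc)               # host name must be present
        and parts.path.endswith(".xml")      # sitemap files are XML files
    )
```

A value such as ``your.server.org/knb/sitemaps/metacat1.xml`` fails this check because it lacks the scheme, while the same string prefixed with ``https://`` passes.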
|
53 |
|
|
54 |
NOTE: As you add more publicly accessible data to Metacat, you will need to |
|
55 |
periodically revisit the Google Webmaster Tools utility to refresh your |
|
56 |
sitemap registration. |
Converted Event Logging and Sitemaps chapters to RST.