Revision 6846
Added by Matt Jones almost 13 years ago
docs/dev/metacat/source/sitemaps.rst | ||
---|---|---|
1 | 1 |
Enabling Web Searches: Sitemaps |
2 | 2 |
=============================== |
3 | 3 |
|
4 |
Under construction! |
|
4 |
Sitemaps are XML files that tell search engines (such as Google, which is |
|
5 |
discussed in this section) which URLs on your websites are available for |
|
6 |
crawling. Currently, the only way for a search engine to crawl and index |
|
7 |
Metacat so that individual metadata entries are available via Web searches |
|
8 |
is with a sitemap. Metacat automatically creates sitemaps for all public |
|
9 |
documents in the repository. However, you must register the sitemaps with |
|
10 |
the search engine before they will take effect. |
|
5 | 11 |
|
6 |
Heading 1 |
|
7 |
------------ |
|
8 | 12 |
|
9 |
Heading 2
|
|
10 |
------------ |
|
13 |
Creating a Sitemap
|
|
14 |
------------------
|
|
11 | 15 |
|
16 |
Metacat automatically generates a sitemap file for all public documents in |
|
17 |
the repository on a daily basis. The sitemap file(s) must be available via |
|
18 |
the Web on your server, and must be registered with Google before they take |
|
19 |
effect. For information on the sitemap protocol, please refer to the Google |
|
20 |
page on using the sitemap protocol. You can view Metacat's sitemap files at:: |
|
21 |
|
|
22 |
<webapps_dir>/sitemaps |
|
23 |
|
|
24 |
The directory contains one or more XML files named:: |
|
25 |
|
|
26 |
metacat<X>.xml |
|
27 |
|
|
28 |
where ``<X>`` is a number (e.g., 1 or 2) used to increment each sitemap file. |
|
29 |
Because Metacat limits the number of sitemap entries in each sitemap file to |
|
30 |
25,000, the servlet creates an additional sitemap file for each group of |
|
31 |
25,000 entries. |
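
The 25,000-entry limit described above determines how many sitemap files to expect. The following is a minimal Python sketch of that arithmetic, not Metacat's own code; the function name is hypothetical, and it assumes that numbering starts at 1 (as the ``metacat1.xml`` example suggests) and that no file is written when there are no public documents.

```python
import math

# Limit stated in the text: at most 25,000 entries per sitemap file.
ENTRIES_PER_FILE = 25000


def sitemap_file_names(public_doc_count, entries_per_file=ENTRIES_PER_FILE):
    """Return the sitemap file names expected for a number of public documents.

    Hypothetical helper: assumes files are numbered from 1 and that
    zero documents produce zero files.
    """
    file_count = math.ceil(public_doc_count / entries_per_file)
    return [f"metacat{i}.xml" for i in range(1, file_count + 1)]


if __name__ == "__main__":
    # 60,000 public documents need three files (25,000 + 25,000 + 10,000).
    print(sitemap_file_names(60000))
```

Comparing this list with the contents of ``<webapps_dir>/sitemaps`` is one quick way to check that sitemap generation has kept up with the repository.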
|
32 |
|
|
33 |
Verify that your sitemap files are available to the Web by browsing to:: |
|
34 |
|
|
35 |
<your_web_context>/sitemaps/metacat<X>.xml |
|
36 |
(e.g., your.server.org/knb/sitemaps/metacat1.xml) |
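
The availability check can also be scripted rather than done in a browser. This is a rough Python sketch using only the standard library; the helper names are hypothetical, and the context URL in the usage line is the placeholder from the example above, so substitute your own server.

```python
from urllib.request import urlopen


def sitemap_url(context_url, index):
    """Build the URL of sitemap file number `index` under a Metacat web context."""
    return f"{context_url.rstrip('/')}/sitemaps/metacat{index}.xml"


def is_available(url, timeout=10):
    """Return True if the URL answers with HTTP 200, False on any error."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError):
        # OSError covers HTTP and network errors and timeouts;
        # ValueError covers malformed URLs (for example, a missing scheme).
        return False


if __name__ == "__main__":
    url = sitemap_url("http://your.server.org/knb", 1)
    print(url, "reachable" if is_available(url) else "not reachable")
```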
|
37 |
|
|
38 |
Registering a Sitemap |
|
39 |
--------------------- |
|
40 |
Before Google will begin indexing the public files in your Metacat, you must |
|
41 |
register the sitemaps. To register your sitemaps and ensure that they are up |
|
42 |
to date: |
|
43 |
|
|
44 |
1. Register for a Google Webmaster Tools account, and add your Metacat |
|
45 |
site to the Dashboard. |
|
46 |
2. From your Google Webmaster Tools site account, register your sitemaps. |
|
47 |
See the Google help site for more information about how to register sitemaps. |
|
48 |
Note: Register the full URL path to your sitemap files, including |
|
49 |
the http:// (or https://) prefix. |
|
50 |
|
|
51 |
Once the sitemaps are registered, Google will begin to index the public |
|
52 |
documents in your Metacat repository. |
|
53 |
|
|
54 |
NOTE: As you add more publicly accessible data to Metacat, you will need to |
|
55 |
periodically revisit the Google Webmaster Tools utility to refresh your |
|
56 |
sitemap registration. |
docs/dev/metacat/source/event-logging.rst | ||
---|---|---|
1 | 1 |
Event Logging |
2 | 2 |
============= |
3 | 3 |
|
4 |
Under construction! |
|
4 |
Metacat keeps an internal log of events (such as insertions, updates, deletes, |
|
5 |
and reads) that can be accessed with the getlog action. Using the getlog action, |
|
6 |
event reports can be output from Metacat in XML format, and/or customized to |
|
7 |
include only certain events: events from a particular IP address, user, event |
|
8 |
type, or that occurred after a specified start date or before an end date. |
|
5 | 9 |
|
6 |
Heading 1
|
|
7 |
------------
|
|
10 |
The following URL is used to return the basic log—an XML-formatted log of all
|
|
11 |
events since the log was initiated::
|
|
8 | 12 |
|
9 |
Heading 2 |
|
10 |
------------ |
|
13 |
http://some.metacat.host/context/metacat?action=getlog |
|
11 | 14 |
|
15 |
Note that you must be logged in to Metacat using the HTTP interface or you |
|
16 |
will get an error message. For more information about logging in, please see |
|
17 |
Logging In with the HTTP Interface. |
|
18 |
|
|
19 |
:: |
|
20 |
|
|
21 |
<?xml version="1.0"?> |
|
22 |
<!-- Example of XML Log --> |
|
23 |
<log> |
|
24 |
<logEntry><entryid>44</entryid><ipAddress>34.237.20.142</ipAddress><principal>uid=jones, |
|
25 |
o=NCEAS,dc=ecoinformatics,dc=org</principal><docid>esa.2.1</docid><event>insert</event> |
|
26 |
<dateLogged>2004-09-08 19:08:18.16</dateLogged></logEntry> |
|
27 |
<logEntry><entryid>47</entryid><ipAddress>34.237.20.142</ipAddress><principal>uid=jones,o=NCEAS, |
|
28 |
dc=ecoinformatics,dc=org</principal><docid>esa.3.1</docid><event>insert</event><dateLogged>2004- |
|
29 |
09-14 19:50:40.61</dateLogged></logEntry> |
|
30 |
</log> |
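
A log in this shape is straightforward to process with a standard XML parser. The sketch below, in Python, turns each ``logEntry`` into a dictionary keyed by element name. It is an illustration only: the sample string copies the two entries shown above (with each ``principal`` joined onto one line), and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample copied from the example log above; a real report comes from the getlog action.
SAMPLE_LOG = """<?xml version="1.0"?>
<log>
  <logEntry><entryid>44</entryid><ipAddress>34.237.20.142</ipAddress>
    <principal>uid=jones,o=NCEAS,dc=ecoinformatics,dc=org</principal>
    <docid>esa.2.1</docid><event>insert</event>
    <dateLogged>2004-09-08 19:08:18.16</dateLogged></logEntry>
  <logEntry><entryid>47</entryid><ipAddress>34.237.20.142</ipAddress>
    <principal>uid=jones,o=NCEAS,dc=ecoinformatics,dc=org</principal>
    <docid>esa.3.1</docid><event>insert</event>
    <dateLogged>2004-09-14 19:50:40.61</dateLogged></logEntry>
</log>"""


def parse_log(xml_text):
    """Return one dict per logEntry, mapping child element names to their text."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: (child.text or "").strip() for child in entry}
        for entry in root.iter("logEntry")
    ]


if __name__ == "__main__":
    for entry in parse_log(SAMPLE_LOG):
        print(entry["dateLogged"], entry["event"], entry["docid"])
```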
|
31 |
|
|
32 |
The basic log can be quite extensive. To subset the report, restrict the |
|
33 |
matching events using parameters. Query parameters can be combined to further |
|
34 |
restrict the report. |
|
35 |
|
|
36 |
+-----------+-----------------------------------------------------+ |
|
37 |
| Parameter | Description and Values | |
|
38 |
+===========+=====================================================+ |
|
39 |
| ipAddress | Restrict the report to this IP Address (repeatable) | |
|
40 |
+-----------+-----------------------------------------------------+ |
|
41 |
| principal | Restrict the report to this user (repeatable) | |
|
42 |
+-----------+-----------------------------------------------------+ |
|
43 |
| docid | Restrict the report to this docid (repeatable) | |
|
44 |
+-----------+-----------------------------------------------------+ |
|
45 |
| event | Restrict the report to this event type (repeatable) | |
|
46 |
| | Values: insert, update, delete, read | |
|
47 |
+-----------+-----------------------------------------------------+ |
|
48 |
| start | Restrict the report to events after this date | |
|
49 |
| | Value: YYYY-MM-DD+hh:mm:ss | |
|
50 |
+-----------+-----------------------------------------------------+ |
|
51 |
| end | Restrict the report to events before this date. | |
|
52 |
| | Value: YYYY-MM-DD+hh:mm:ss | |
|
53 |
+-----------+-----------------------------------------------------+ |
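
Building these query strings by hand is error-prone once several parameters are combined, so a small helper can assemble them. The following Python sketch assumes the parameter names from the table above; the function name is hypothetical, and the base URL in the usage line is the placeholder host used throughout this chapter. Passing a list as a value repeats that parameter, and a space in a date is encoded as ``+``, which matches the ``YYYY-MM-DD+hh:mm:ss`` form.

```python
from urllib.parse import urlencode


def build_getlog_url(base_url, **filters):
    """Return a getlog URL for the given filters.

    Each filter value may be a single string or a list of strings;
    a list repeats the parameter once per value.
    """
    params = [("action", "getlog")]
    for name, value in filters.items():
        values = [value] if isinstance(value, str) else list(value)
        params.extend((name, v) for v in values)
    # safe=":" keeps the colons in timestamps readable; spaces become "+".
    return base_url + "?" + urlencode(params, safe=":")


if __name__ == "__main__":
    print(build_getlog_url(
        "http://some.metacat.host/context/metacat",
        event=["insert", "update"],
        start="2004-09-01 00:00:00",
    ))
```

Note that ``urlencode`` percent-encodes characters such as ``=`` and ``,`` in a ``principal`` value; a server that decodes query strings in the usual way sees the original text.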
|
54 |
|
|
55 |
To view only the 'read' events, use a URL like:: |
|
56 |
|
|
57 |
http://some.metacat.host/context/metacat?action=getlog&event=read |
|
58 |
|
|
59 |
|
|
60 |
To view only the events for a particular IP address, use a URL like:: |
|
61 |
|
|
62 |
http://some.metacat.host/context/metacat?action=getlog&ipaddress=107.9.1.31 |
|
63 |
|
|
64 |
|
|
65 |
To view only the events for a given user, use a URL like:: |
|
66 |
|
|
67 |
http://some.metacat.host/context/metacat?action=getlog&principal=uid=johndoe,o=NCEAS,dc=ecoinformatics,dc=org |
|
68 |
|
|
69 |
|
|
70 |
To view only the events for a particular document, use a URL like:: |
|
71 |
|
|
72 |
http://some.metacat.host/context/metacat?action=getlog&docid=knb.5.1 |
|
73 |
|
|
74 |
|
|
75 |
To view only the events after a given date, use a URL like:: |
|
76 |
|
|
77 |
http://some.metacat.host/context/metacat?action=getlog&start=2004-09-15+12:00:00 |
|
78 |
|
|
79 |
|
|
80 |
To view only the events before a given date, use a URL like:: |
|
81 |
|
|
82 |
http://some.metacat.host/context/metacat?action=getlog&end=2004-09-15+12:00:00 |
|
83 |
|
|
84 |
|
|
85 |
To view the 'insert' events for September 2004 (i.e., to combine parameters), use a URL like:: |
|
86 |
|
|
87 |
http://some.metacat.host/context/metacat?action=getlog&event=insert&start=2004-09-01+00:00:00&end=2004-09-30+23:59:59 |
|
88 |
|
Converted Event Logging and Sitemaps chapters to RST.