<!--
* harvester.html
*
* Authors: Duane Costa
* Copyright: 2004 Regents of the University of California and the
* National Center for Ecological Analysis and Synthesis,
* and the University of New Mexico.
* For Details: http://www.nceas.ucsb.edu/
* Created: 2004 April 9
* Version:
* File Info: '$ '
*
*
-->
<HTML>
<HEAD>
<TITLE>Metacat Harvester</TITLE>
<link rel="stylesheet" type="text/css" href="@docrooturl@default.css">
</HEAD>
<BODY>
<table width="100%">
<tr>
<td class="tablehead" colspan="2">
<p class="label">Metacat Harvester</p>
</td>
<td class="tablehead" colspan="2" align="right">
<a href="./properties.html">Back</a> |
<a href="./metacattour.html">Home</a> |
<a href="./unimplem.html">Next</a>
</td>
</tr>
</table>
<h4>Introduction</h4>
The Metacat Harvester (henceforth referred to as "Harvester") is a
program that automates the retrieval of EML documents from one or more sites
and their subsequent upload (insert or update) to Metacat. Harvester uses pull
technology: on a regularly scheduled basis, it retrieves documents from each
site and uploads them to Metacat.
<P>
Although Harvester is included with a Metacat installation (beginning with
Metacat version 1.4.0), it is an optional extension to Metacat's
functionality.
</P>
<h4>Definitions</h4>
The following table defines a number of terms that are useful in discussing
Harvester and its features.
<br><br>
<table border="1">
<tr>
<td><b>Term</b></td>
<td><b>Definition</b></td>
</tr>
<tr>
<td>Harvester</td>
<td>The Harvester program, a Java application that is bundled with the
Metacat distribution. When a user installs Metacat on a system,
the Harvester program is automatically included in the
installation.
</td>
</tr>
<tr>
<td>Harvester Administrator</td>
<td>The individual who installs and manages Harvester. Typically, this
would be the same individual who installs and manages Metacat at a
given installation.
</td>
</tr>
<tr>
<td>Harvest Site</td>
<td>A location from which Harvester can retrieve EML documents. A given
Harvester can retrieve documents from any number of different
Harvest Sites.
</td>
</tr>
<tr>
<td>Harvest</td>
<td>The act (by Harvester) of visiting a Harvest Site, retrieving a
number of EML documents, and inserting them into, or updating them in,
Metacat.
</td>
</tr>
<tr>
<td>Harvest List</td>
<td>An XML document that lists a set of EML documents to be harvested. The
Harvest List must conform to an XML Schema,
<a href="../../lib/harvester/harvestList.xsd">harvestList.xsd</a>.
</td>
</tr>
<tr>
<td>Site Contact</td>
<td>The individual at a particular Harvest Site who registers with
Harvester, composes a Harvest List, and periodically prepares
the site's EML documents for retrieval and upload to Metacat.
</td>
</tr>
<tr>
<td>Harvest List URL</td>
<td>A URL to the Harvest List, as specified by the Site Contact.
Each Harvest Site corresponds to a Harvest List URL. Harvester
uses the URL to locate and read a site's Harvest List.
</td>
</tr>
<tr>
<td>Document URL</td>
<td>A URL to an EML document, as specified in the Harvest List.
The Harvest List may contain any number of Document URLs. Each
Document URL provides a locator to a document to be harvested.
</td>
</tr>
<tr>
<td>Harvester Registration Page</td>
<td>A web page that provides a means for a Site Contact
to register with Harvester to schedule regular harvests from the
site. Registration involves logging in and then specifying various
settings for the Harvest Site, such as the Harvest List URL, the
harvest frequency, and the email address of the Site Contact.
</td>
</tr>
</table>
<h4>Managing Harvester</h4>
Harvester is managed by the Harvester Administrator. Typically, the same
individual who manages a Metacat server would also act as the Harvester
Administrator. The responsibilities of the Harvester Administrator include:
<ul>
<li><a href="#Configuring Harvester">Configuring Harvester</a></li>
<li><a href="#Running Harvester">Running Harvester</a></li>
<li><a href="#Reviewing Harvester">Reviewing Harvester reports to
the Harvester Administrator</a></li>
</ul>
<h5><a name="Configuring Harvester">Configuring Harvester</a></h5>
<p>Harvester must be configured to interact with a working Metacat
installation. Thus, a Metacat installation that has been properly
configured and installed is a prerequisite to running Harvester.
Additionally, Harvester has a number of settable properties that
control its behavior. All Harvester configuration information is managed
in a single file,
<a href="../../lib/harvester/harvester.properties">harvester.properties</a>,
located at:
<pre> METACAT_HOME/lib/harvester/harvester.properties</pre>
where METACAT_HOME is the top-level directory that Metacat is
installed in.
</p>
<p>The Harvester Administrator should edit
<a href="../../lib/harvester/harvester.properties">harvester.properties</a>,
setting appropriate values for the Metacat URL, database driver,
database connection, and other settings. The
following table summarizes each property and its function.
</p>
<table border="1">
<tr>
<td><b>Property</b></td>
<td><b>Description</b></td>
<td><b>Possible or default value</b></td>
</tr>
<tr>
<td>connectToMetacat</td>
<td>This property determines whether Harvester should connect to
Metacat to upload documents. It should be set to <code>true</code>
under most circumstances. Setting this property to <code>false</code>
can be useful for testing whether Harvester is able to retrieve
documents from a site without actually connecting to Metacat to
upload the documents.</td>
<td><code>true</code> | <code>false</code><br>
Default: <code>true</code>
</td>
</tr>
<tr>
<td>dbDriver</td>
<td>The JDBC driver to be used to access the backend database. This
setting should match the value of the dbDriver property as set
in the <a href="../../build.xml">build.xml</a> file as appropriate
to the database being used (Oracle, PostgreSQL, or SQL Server).
</td>
<td>Examples:<br>
<code>oracle.jdbc.driver.OracleDriver</code><br>
<code>org.postgresql.Driver</code><br>
<code>com.microsoft.jdbc.sqlserver.SQLServerDriver</code>
</td>
</tr>
<tr>
<td>defaultDB</td>
<td>The JDBC connection string that Metacat uses to connect to the
backend database. This setting should match the value of
the <code>jdbc-connect</code> property as set in the
<a href="../../build.properties">build.properties</a>
file in the associated Metacat installation.</td>
<td>Example:<br>
<code>jdbc:oracle:thin:@server.domain.com:1521:Metacat</code></td>
</tr>
<tr>
<td>delay</td>
<td>The number of hours that Harvester will wait before beginning its
first harvest. For example, if Harvester is run at 1:00 p.m., and
the delay is set to 12, Harvester will begin its first harvest at
1:00 a.m. the following day.</td>
<td>Default: 0</td>
</tr>
<tr>
<td>harvesterAdministrator</td>
<td>The email address of the Harvester Administrator. Harvester will
send email reports to this address after every harvest.
</td>
<td>An email address</td>
</tr>
<tr>
<td>logPeriod</td>
<td>The number of days that Harvester should retain log entries of harvest
operations in the database. Harvester log entries record information
such as which documents were harvested, from which sites, and
whether any errors were encountered during the harvest. Log entries
older than <code>logPeriod</code> days are purged from the
database at the end of each harvest.</td>
<td>Default: 90</td>
</tr>
<tr>
<td>maxHarvests</td>
<td>The maximum number of harvests that Harvester should execute before
shutting down. When the Harvester program is executed, it will
continue running until it has executed <code>maxHarvests</code>
harvests, and then the program will terminate.</td>
<td>Default: 30</td>
</tr>
<tr>
<td>metacatURL</td>
<td>The URL of the Metacat servlet to which Harvester should connect
for uploading documents.</td>
<td>Example:<br>
http://somehost.institution.edu:8080/knb/servlet/metacat</td>
</tr>
<tr>
<td>password</td>
<td>The password that Harvester uses to access the backend database.
This setting should match the value of the <code>password</code>
property as set in the
<a href="../../build.properties">build.properties</a>
file in the associated Metacat installation.
</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>period</td>
<td>The number of hours between harvests. Harvester will run a new
harvest every <code>period</code> hours, until
<code>maxHarvests</code> harvests have been run.</td>
<td>Default: 24</td>
</tr>
<tr>
<td>smtpServer</td>
<td>The SMTP server that Harvester uses for sending email messages
to the Harvester Administrator and to Site Contacts.</td>
<td>A host name, for example: <code>somehost.institution.edu</code>
<br><br>
Default: <code>localhost</code>
<br><br>
Note that the default value will only work if the Harvester
host machine has been configured as an SMTP server.
</td>
</tr>
<tr>
<td>user</td>
<td>The username that Metacat uses to access the backend database.
This setting should match the <code>user</code> value as set in the
<a href="../../build.properties">build.properties</a>
file in the associated Metacat installation.
</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>Harvester Operation Properties (GetDocError, GetDocSuccess, etc.)</td>
<td>This group of properties is used by Harvester to report information
about the operations it performs for inclusion in log
entries and email messages. Under most circumstances the values
of these properties should not be modified.</td>
<td>&nbsp;</td>
</tr>
</table>
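<p>As an illustration of how these settings fit together, the following
Python sketch parses text in the simple <code>key=value</code> form and
reports which of the settings from the table above are missing or empty.
It is not part of Harvester: the function names, the <code>REQUIRED</code>
list, and the sample values are assumptions made for this example, and the
parser ignores the line continuations and escape sequences that real Java
properties files may use.</p>

```python
# Hypothetical helper, not part of the Metacat distribution.
# REQUIRED lists the individually named settings from the table above.
REQUIRED = [
    "connectToMetacat", "dbDriver", "defaultDB", "delay",
    "harvesterAdministrator", "logPeriod", "maxHarvests",
    "metacatURL", "password", "period", "smtpServer", "user",
]


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_settings(props):
    """Return the names from REQUIRED that are absent or empty."""
    return [name for name in REQUIRED if not props.get(name)]


# Sample values only; adjust for the local installation.
SAMPLE = """\
# Incomplete on purpose, to show what the check reports.
connectToMetacat=true
dbDriver=org.postgresql.Driver
delay=0
logPeriod=90
maxHarvests=30
period=24
smtpServer=localhost
"""

if __name__ == "__main__":
    print(missing_settings(parse_properties(SAMPLE)))
```

<p>Run against the incomplete sample, the check reports the database
connection, administrator address, Metacat URL, and database credentials
as still unset, which are the values that must be copied from the
associated Metacat installation.</p>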
<br>
<h5><a name="Running Harvester">Running Harvester</a></h5>
After Harvester has been appropriately
<a href="#Configuring Harvester">configured</a>,
it can be run as follows:
<ol>
<li>Open a system command window or terminal window.</li>
<li>Set the METACAT_HOME environment variable to the value of the Metacat
installation directory. Some examples follow:
<ul>
<li>On Windows:
<pre>set METACAT_HOME=C:\somePath\metacat</pre></li>
<li>On Linux/Unix (bash shell):
<pre>export METACAT_HOME=/home/somePath/metacat</pre></li>
</ul>
</li>
<li>Change to the following directory:
<ul>
<li>On Windows:
<pre>cd %METACAT_HOME%\lib\harvester</pre></li>
<li>On Linux/Unix:
<pre>cd $METACAT_HOME/lib/harvester</pre></li>
</ul>
</li>
<li>Run the appropriate Harvester script, as determined by the
operating system:
<ul>
<li>On Windows:
<pre>runHarvester.bat</pre></li>
<li>On Linux/Unix:
<pre>sh runHarvester.sh</pre></li>
</ul>
</li>
</ol>
<p>The Harvester application will start executing. It will begin its first
harvest after <code><b>delay</b></code> hours (as specified in the
<a href="../../lib/harvester/harvester.properties">harvester.properties</a>
file). The application will continue running a new harvest every
<code><b>period</b></code> hours until <code><b>maxHarvests</b></code>
harvests have been completed, at which point it terminates.
</p>
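<p>The interaction of <code>delay</code>, <code>period</code>, and
<code>maxHarvests</code> can be worked through with a short Python sketch.
It is only an illustration of the arithmetic described above, not Harvester
code; the function name is hypothetical, and it assumes that harvests start
at fixed intervals regardless of how long each harvest takes.</p>

```python
from datetime import datetime, timedelta


def harvest_times(start, delay, period, max_harvests):
    """Return the planned start time of each harvest.

    start        -- when the Harvester program is launched
    delay        -- hours to wait before the first harvest
    period       -- hours between consecutive harvests
    max_harvests -- number of harvests to run before terminating
    """
    first = start + timedelta(hours=delay)
    return [first + timedelta(hours=period * i) for i in range(max_harvests)]


if __name__ == "__main__":
    launched = datetime(2004, 4, 9, 13, 0)  # launched at 1:00 p.m.
    for when in harvest_times(launched, delay=12, period=24, max_harvests=3):
        print(when.isoformat(sep=" ", timespec="minutes"))
```

<p>With a launch at 1:00 p.m., a delay of 12, and a period of 24, the first
harvest falls at 1:00 a.m. the following day and each later harvest follows
24 hours after the previous one.</p>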
<h5><a name="Reviewing Harvester">
Reviewing Harvester Reports to the Harvester Administrator</a></h5>
<P>
After every harvest, Harvester will send an email report to the Harvester
Administrator detailing the operations that were performed during the
harvest. The report will contain information about each Harvest Site
that was harvested, such as which EML documents were
harvested and whether any errors were encountered.
</P>
<p>
The harvest report will contain a list of log entries, where each log entry
describes an operation that was performed by Harvester. Log entries that
show a status value of 1 indicate that an error occurred during the
operation, while those that show a status value of 0 indicate that the
operation was completed successfully.
</p>
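<p>The status convention can be illustrated with a small Python sketch that
groups error entries by site. The layout of a log entry is not specified
here, so the dictionary fields (<code>site</code>, <code>operation</code>,
<code>status</code>, <code>message</code>), the site names, and the
messages are all hypothetical; only the meaning of status 0 and status 1
comes from the description above.</p>

```python
# Hypothetical log entries; the field names are assumptions for this sketch.
ENTRIES = [
    {"site": "siteA", "operation": "GetDocSuccess", "status": 0, "message": ""},
    {"site": "siteA", "operation": "GetDocError", "status": 1,
     "message": "document could not be retrieved"},
    {"site": "siteB", "operation": "GetDocSuccess", "status": 0, "message": ""},
]


def errors_by_site(entries):
    """Group entries with status 1 (error) by the site they belong to."""
    grouped = {}
    for entry in entries:
        if entry["status"] == 1:
            grouped.setdefault(entry["site"], []).append(entry)
    return grouped


if __name__ == "__main__":
    for site, errors in errors_by_site(ENTRIES).items():
        for error in errors:
            print(site, error["operation"], error["message"])
```

<p>Grouping by site in this way mirrors the review process: site-specific
errors point to a Site Contact to follow up with, while entries with
status 0 need no action.</p>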
<P>The Harvester Administrator should review the report, paying particularly
close attention to any errors that are reported and to the accompanying error
messages that are displayed. When errors are reported at
a particular site, the Harvester Administrator should contact the Site
Contact to determine the source of the error and its resolution. See
<a href="#Reviewing">Reviewing Harvester Reports to the Site Contact</a> for a
description of common sources of errors at a Harvest Site.
</P>
<p>Errors that are independent of a particular site may indicate a problem
with Harvester itself, Metacat, or the database connection. Refer to the
error message to determine the source of the error and its resolution.
</p>
<h4>Managing a Harvest Site</h4>
A Harvest Site is managed by a Site Contact.
The responsibilities of a Site Contact fall into the following categories:
<ul>
<li><a href="#Registering">Registering with Harvester</a></li>
<li><a href="#Composing">Composing a Harvest List</a></li>
<li><a href="#Preparing">Preparing EML Documents for harvest</a></li>
<li><a href="#Reviewing">Reviewing Harvester reports to the Site Contact</a></li>
</ul>
<h5><a name="Registering">Registering with Harvester</a></h5>
<p>
A Site Contact registers a site with Harvester by logging in to the
Harvester Registration page and entering several items of information
that Harvester needs to know about the site.
</p>
<ol>
<li>Logging in to the Harvester Registration Page
<p>
The Harvester Registration page is accessed from Metacat. For example, if
the Metacat server that you wish to register with resides at the following
URL:
<pre> http://somehost.somelocation.edu:8080/knb/index.jsp</pre>
then the Harvester Registration page would be accessed at:
<pre> http://somehost.somelocation.edu:8080/knb/style/skins/dev/harvesterRegistrationLogin.html</pre>
</p>
<p>
After bringing up this page in your browser, log in to your Metacat account
by entering your username and password.
The username should include the full LDAP specification, for example:
<pre>
Username: uid=jdoe,o=lter,dc=ecoinformatics,dc=org
Password: *******
</pre>
In some cases, a Site Contact may need to log in to an anonymous account
rather than his or her personal account. For example, an LTER Information
Manager may need to log in to a dedicated account, named with a three-letter
acronym, that has been set up for the LTER site. The following
<pre>
Username: uid=GCE,o=lter,dc=ecoinformatics,dc=org
Password: *******
</pre>
is the account login that would be used by the LTER Information Manager
at the GCE (Georgia Coastal Ecosystems) site.
</p>
</li>
<li>Registering with Harvester
<p>
After logging in, you will be presented with a web form that prompts you
to enter information about your site and how often you want to schedule
harvests at your site. For example:
</p>
<pre>
Email address: myname@institution.edu
Harvest List URL: http://somehost.institution.edu/~myname/harvestList.xml
Harvest Frequency (1-99): 2
Unit: ( ) day(s) (*) week(s) ( ) month(s)
</pre>
<p>
After values have been entered for each of these fields, click the Register
button to register your site with Harvester.
</p>
<P>
In the example shown above, Harvester will attempt to harvest documents from
the site once every 2 weeks, it will access the site's Harvest List at URL
"http://somehost.institution.edu/~myname/harvestList.xml", and it will send
email reports to the Site Contact at email address "myname@institution.edu".
</P>
</li>
<li>Unregistering with Harvester
<p>
At any time after you have registered with Harvester, you may discontinue
harvests at your site by unregistering. Simply log in as described above and
then click the Unregister button. After you do so, Harvester will discontinue
harvests at the site.
</p>
</li>
</ol>
<h5><a name="Composing">Composing a Harvest List</a></h5>
<p>
A Harvest List is an XML file that holds a list of EML documents to be
harvested. For each EML document in the list, the following information
must be specified:
<ul>
<li><code>docid</code>, which consists of the:
<ul>
<li><code>scope</code>, e.g. "demoDocument". The scope is an identifier
that indicates which group of documents this document belongs to.
</li>
<li><code>identifier</code>, e.g. "1". The identifier is a number that
uniquely identifies this document within the scope.
</li>
<li><code>revision</code>, e.g. "5". The revision is a number that
indicates the current revision of this document.
</li>
</ul>
</li>
<li><code>documentType</code>, e.g. "eml://ecoinformatics.org/eml-2.0.0".
The documentType identifies the document as an EML document.</li>
<li><code>documentURL</code>, e.g. "http://www.lternet.edu/~dcosta/document1.xml".
The documentURL specifies a place where Harvester can locate
and retrieve the document via HTTP.</li>
</ul>
</p>
<p>
The contents of a Harvest List XML file must conform to a particular
XML Schema, as defined in file <a href="../../lib/harvester/harvestList.xsd">
harvestList.xsd</a>. The contents of a valid Harvest List
can best be illustrated by example. The sample Harvest List
below contains two &lt;<code>document</code>&gt; elements that specify the
information that Harvester needs to retrieve a pair of EML documents and
upload them to Metacat:
</p>
<pre>
&lt;?xml version="1.0" encoding="UTF-8" ?&gt;
&lt;hrv:harvestList xmlns:hrv="eml://ecoinformatics.org/harvestList" &gt;
  &lt;document&gt;
    &lt;docid&gt;
      &lt;scope&gt;demoDocument&lt;/scope&gt;
      &lt;identifier&gt;1&lt;/identifier&gt;
      &lt;revision&gt;5&lt;/revision&gt;
    &lt;/docid&gt;
    &lt;documentType&gt;eml://ecoinformatics.org/eml-2.0.0&lt;/documentType&gt;
    &lt;documentURL&gt;http://www.lternet.edu/~dcosta/document1.xml&lt;/documentURL&gt;
  &lt;/document&gt;
  &lt;document&gt;
    &lt;docid&gt;
      &lt;scope&gt;demoDocument&lt;/scope&gt;
      &lt;identifier&gt;2&lt;/identifier&gt;
      &lt;revision&gt;1&lt;/revision&gt;
    &lt;/docid&gt;
    &lt;documentType&gt;eml://ecoinformatics.org/eml-2.0.0&lt;/documentType&gt;
    &lt;documentURL&gt;http://www.lternet.edu/~dcosta/document2.xml&lt;/documentURL&gt;
  &lt;/document&gt;
&lt;/hrv:harvestList&gt;
</pre>
<p>
After editing the Harvest List, ensure that the Harvest List XML file resides
at the appropriate location on disk as specified by the URL that was entered
during the <a href="#Registering">registration</a> process.
</p>
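<p>Before publishing a Harvest List, a Site Contact may find it useful to
read it back programmatically. The Python sketch below uses the standard
library's <code>xml.etree.ElementTree</code> to list the documents in a
Harvest List shaped like the sample above. It is an illustration only: the
function name is hypothetical, joining scope, identifier, and revision with
dots is an assumption of this sketch, and the code checks neither the
<code>harvestList.xsd</code> schema nor the reachability of any URL.</p>

```python
import xml.etree.ElementTree as ET

# A one-document Harvest List in the same shape as the sample above.
SAMPLE = """<?xml version="1.0" encoding="UTF-8" ?>
<hrv:harvestList xmlns:hrv="eml://ecoinformatics.org/harvestList" >
  <document>
    <docid>
      <scope>demoDocument</scope>
      <identifier>1</identifier>
      <revision>5</revision>
    </docid>
    <documentType>eml://ecoinformatics.org/eml-2.0.0</documentType>
    <documentURL>http://www.lternet.edu/~dcosta/document1.xml</documentURL>
  </document>
</hrv:harvestList>
"""


def read_harvest_list(xml_text):
    """Return one dictionary per <document> element in the Harvest List."""
    root = ET.fromstring(xml_text)
    documents = []
    # The <document> children are unqualified, so no namespace prefix is needed.
    for doc in root.findall("document"):
        docid = doc.find("docid")
        parts = [docid.findtext(tag).strip()
                 for tag in ("scope", "identifier", "revision")]
        documents.append({
            "docid": ".".join(parts),  # dotted form is an assumption
            "documentType": doc.findtext("documentType").strip(),
            "documentURL": doc.findtext("documentURL").strip(),
        })
    return documents


if __name__ == "__main__":
    for entry in read_harvest_list(SAMPLE):
        print(entry["docid"], entry["documentURL"])
```

<p>A listing like this makes it easy to spot a revision number that was not
incremented or a document URL that was mistyped before Harvester visits
the site.</p>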
<h5><a name="Preparing">Preparing EML Documents for harvest</a></h5>
<p>
To prepare a set of EML documents for harvest, ensure that the following are
true for each document:
<ul>
<li>The document contains valid EML</li>
<li>The document is specified in a &lt;document&gt; element in the
site's Harvest List, as described above</li>
<li>The file resides at the appropriate location on disk as specified
by its URL in the Harvest List</li>
</ul>
</p>
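<p>A quick first check on a document can be automated. The Python sketch
below only tests whether a piece of text is well-formed XML, which is a
necessary condition for valid EML but not a sufficient one; it does not
validate against the EML schema. The function name is hypothetical.</p>

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise.

    This checks well-formedness only; it does not check EML validity.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed("<dataset><title>Demo</title></dataset>"))
    print(is_well_formed("<dataset><title>Demo</dataset>"))
```

<p>A document that fails this check cannot be valid EML, so it is worth
correcting before the next scheduled harvest.</p>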
<h5><a name="Reviewing">Reviewing Harvester Reports to the Site Contact</a></h5>
<P>
After every scheduled harvest that takes place at a particular Harvest
Site, Harvester will send an email report to the Site Contact detailing the
operations that were performed by Harvester at that site, such as
which EML documents were harvested and whether any errors were encountered.
</P>
<P>
The Site Contact should review the report, paying particularly
close attention to any errors that are reported. Errors are indicated
by operations that display a status value of 1, while operations that
display a status value of 0 indicate that the operation completed
successfully.
</P>
<p>
When errors are reported,
the Site Contact should try to determine whether the source of the error
is something that can be corrected at the site. Common causes of errors
might be:
<ul>
<li>A document URL specified in the Harvest List does not match
the location of the actual EML file on the disk</li>
<li>The Harvest List does not contain valid XML as specified in
the <a href="../../lib/harvester/harvestList.xsd">harvestList.xsd</a> schema</li>
<li>The URL to the Harvest List that was specified during
registration with Harvester does not match the actual location of
the Harvest List on the disk</li>
<li>An EML document that Harvester attempted to upload to Metacat does
not contain valid EML</li>
</ul>
</P>
<p>
If the Site Contact is unable to determine the cause of the error and its
resolution, he or she should contact the Harvester Administrator for assistance.
</p>
<a href="./properties.html">Back</a> |
<a href="./metacattour.html">Home</a> |
<a href="./unimplem.html">Next</a>
</BODY>
</HTML>