Revision 3075
Added by perry about 18 years ago
spatial_option.html | ||
---|---|---|
255 | 255 |
|
256 | 256 |
|
257 | 257 |
|
258 |
|
|
259 |
|
|
260 |
|
|
261 |
|
|
262 |
|
|
263 |
|
|
264 |
|
|
265 |
|
|
266 |
<br/><br/> |
|
267 |
<hr/><hr/> |
|
268 |
<br/><br/> |
|
269 |
Harvester is managed by the Harvester Administrator. Typically, the same |
|
270 |
individual who manages a Metacat server would also act as the Harvester |
|
271 |
Administrator. The responsibilities of the Harvester Administrator include: |
|
272 |
<ul> |
|
273 |
<li><a href="#Configuring Harvester">Configuring Harvester</a></li> |
|
274 |
<li><a href="#Running Harvester">Running Harvester</a></li> |
|
275 |
<li><a href="#Reviewing Harvester">Reviewing Harvester reports to |
|
276 |
the Harvester Administrator</a></li> |
|
277 |
</ul> |
|
278 |
<h5><a name="Configuring Harvester">Configuring Harvester</a></h5> |
|
279 |
<p>Harvester must be configured to interact with a working Metacat |
|
280 |
installation. Thus, a Metacat installation that has been properly |
|
281 |
configured and installed is a prerequisite to running Harvester. |
|
282 |
Additionally, Harvester has a number of settable properties that |
|
283 |
control its behavior. All Harvester configuration information is managed |
|
284 |
in a single file, |
|
285 |
<a href=../../lib/metacat.properties>metacat.properties</a>, |
|
286 |
located at: |
|
287 |
<pre> METACAT_HOME/lib/metacat.properties</pre> |
|
288 |
where METACAT_HOME is the top-level directory that Metacat is |
|
289 |
installed in. |
|
290 |
</p> |
|
291 |
<p>Harvester properties are grouped together in |
|
292 |
<a href=../../lib/metacat.properties>metacat.properties</a>, beginning |
|
293 |
after the comment line: |
|
294 |
<pre><code> # Harvester properties</code></pre> |
|
295 |
</p> |
|
296 |
<p>The Harvester Administrator should edit |
|
297 |
<a href=../../lib/metacat.properties>metacat.properties</a>, |
|
298 |
setting appropriate values for the <code><b>harvesterAdministrator</b></code> |
|
299 |
property, the <code><b>smtpServer</b></code> property, and possibly other |
|
300 |
properties. The following table is a summary of each property and its function. |
|
301 |
</p> |
|
302 |
<table border="1"> |
|
303 |
<tr> |
|
304 |
<td><b>Property</b></td> |
|
305 |
<td><b>Description</b></td> |
|
306 |
<td><b>Possible or default value</b></td> |
|
307 |
</tr> |
|
308 |
<tr> |
|
309 |
<td>connectToMetacat</td> |
|
310 |
<td>This property determines whether Harvester should connect to |
|
311 |
Metacat to upload documents. It should be set to <code>true</code> |
|
312 |
under most circumstances. Setting this property to <code>false</code> |
|
313 |
can be useful for testing whether Harvester is able to retrieve |
|
314 |
documents from a site without actually connecting to Metacat to |
|
315 |
upload the documents.</td> |
|
316 |
<td><code>true</code> | <code>false</code><br> |
|
317 |
Default: <code>true</code></td> |
|
318 |
</tr> |
|
319 |
<tr> |
|
320 |
<td>delay</td> |
|
321 |
<td>The number of hours that Harvester will wait before beginning its |
|
322 |
first harvest. For example, if Harvester is run at 1:00 p.m., and |
|
323 |
the delay is set to 12, Harvester will begin its first harvest at |
|
324 |
1:00 a.m. the next day.</td> |
|
325 |
<td>Default: 0</td> |
|
326 |
</tr> |
|
327 |
<tr> |
|
328 |
<td>harvesterAdministrator</td> |
|
329 |
<td>The email address of the Harvester Administrator. Harvester will |
|
330 |
send email reports to this address after every harvest. You may |
|
331 |
enter multiple email addresses by separating each address with |
|
332 |
a comma or semicolon, for example, "name1@abc.edu,name2@abc.edu". |
|
333 |
</td> |
|
334 |
<td>An email address, or multiple email addresses separated by commas |
|
335 |
or semicolons</td> |
|
336 |
</tr> |
|
337 |
<tr> |
|
338 |
<td>logPeriod</td> |
|
339 |
<td>The number of days that Harvester should retain log entries of harvest |
|
340 |
operations in the database. Harvester log entries record information |
|
341 |
such as which documents were harvested, from which sites, and |
|
342 |
whether any errors were encountered during the harvest. Log entries |
|
343 |
older than <code>logPeriod</code> number of days are purged from the |
|
344 |
database at the end of each harvest.</td> |
|
345 |
<td>Default: 90</td> |
|
346 |
</tr> |
|
347 |
<tr> |
|
348 |
<td>maxHarvests</td> |
|
349 |
<td>The maximum number of harvests that Harvester should execute before |
|
350 |
shutting down. When the Harvester program is executed, it will |
|
351 |
continue running until it has executed <code>maxHarvests</code> |
|
352 |
number of harvests and then the program will terminate. If |
|
353 |
the value of <code>maxHarvests</code> is set to 0 or a negative |
|
354 |
number, it will be ignored and Harvester will execute indefinitely. |
|
355 |
</td> |
|
356 |
<td>Default: 0</td> |
|
357 |
</tr> |
|
358 |
<tr> |
|
359 |
<td>period</td> |
|
360 |
<td>The number of hours between harvests. Harvester will run a new |
|
361 |
harvest every <code>period</code> number of hours, until the |
|
362 |
<code>maxHarvests</code> number of harvests have been run, or |
|
363 |
indefinitely if <code>maxHarvests</code> is set to a value of |
|
364 |
0 or a negative number.</td> |
|
365 |
<td>Default: 24</td> |
|
366 |
</tr> |
|
367 |
<tr> |
|
368 |
<td>smtpServer</td> |
|
369 |
<td>The SMTP server that Harvester uses for sending email messages |
|
370 |
to the Harvester Administrator and to Site Contacts.</td> |
|
371 |
<td>A host name, for example: <code>somehost.institution.edu</code> |
|
372 |
<br><br> |
|
373 |
Default: <code>localhost</code> |
|
374 |
<br><br> |
|
375 |
Note that the default value will only work if the Harvester |
|
376 |
host machine has been configured as an SMTP server. |
|
377 |
</td> |
|
378 |
</tr> |
|
379 |
<tr> |
|
380 |
<td>Harvester Operation Properties (GetDocError, GetDocSuccess, etc.)</td> |
|
381 |
<td>This group of properties is used by Harvester to report information |
|
382 |
about the operations it performs for inclusion in log |
|
383 |
entries and email messages. Under most circumstances the values |
|
384 |
of these properties should not be modified.</td> |
|
385 |
<td> </td> |
|
386 |
</tr> |
|
387 |
</table> |
|
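<p>The property table above can be illustrated with a small sketch. It is not Harvester's own code (Harvester is a Java program); it is a simplified Python reading of <code>key=value</code> lines, and it assumes the properties are written one per line in that form. The helper names <code>parse_properties</code> and <code>administrator_addresses</code> are hypothetical. The defaults are the ones listed in the table.</p>

```python
import re

# Defaults copied from the property table; harvesterAdministrator has no default.
DEFAULTS = {
    "connectToMetacat": "true",
    "delay": "0",
    "logPeriod": "90",
    "maxHarvests": "0",
    "period": "24",
    "smtpServer": "localhost",
}


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments.

    Simplified on purpose: it does not handle every feature of the
    Java properties file format (escapes, continuation lines, ':' separators).
    """
    props = dict(DEFAULTS)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def administrator_addresses(props):
    """Split harvesterAdministrator on commas or semicolons."""
    raw = props.get("harvesterAdministrator", "")
    return [addr.strip() for addr in re.split(r"[;,]", raw) if addr.strip()]


# Hypothetical excerpt of the Harvester section of metacat.properties.
sample = """# Harvester properties
connectToMetacat=true
delay=12
harvesterAdministrator=name1@abc.edu;name2@abc.edu
smtpServer=somehost.institution.edu
"""

props = parse_properties(sample)
print(administrator_addresses(props))
print(props["period"])  # not set in the sample, so the table default applies
```

<p>Properties that are absent from the sample, such as <code>period</code> and <code>logPeriod</code>, fall back to the defaults from the table.</p>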
388 |
<br> |
|
389 |
<h5><a name="Running Harvester">Running Harvester</a></h5> |
|
390 |
After Harvester has been appropriately |
|
391 |
<a href="#Configuring Harvester">configured</a>, |
|
392 |
it can be run in either of two ways: (A) in a command window, or, (B) |
|
393 |
as a servlet. If you wish only to test that Harvester is functioning, |
|
394 |
or if you expect to use Harvester infrequently, it may be desirable to run it from a |
|
395 |
command window. However, under most circumstances you will want Harvester to |
|
396 |
run continuously as a background servlet process. This eliminates the |
|
397 |
need to keep a command window continuously open while Harvester is running. |
|
398 |
Both of these procedures are described below. |
|
399 |
<ul> |
|
400 |
<li> (A) Running Harvester in a Command Window |
|
401 |
<ol> |
|
402 |
<li>Open a system command window or terminal window.</li> |
|
403 |
<li>Set the METACAT_HOME environment variable to the value of the Metacat |
|
404 |
installation directory. Some examples follow: |
|
405 |
<ul> |
|
406 |
<li>On Windows: |
|
407 |
<pre>set METACAT_HOME=C:\somePath\metacat</pre></li> |
|
408 |
<li>On Linux/Unix (bash shell): |
|
409 |
<pre>export METACAT_HOME=/home/somePath/metacat</pre></li> |
|
410 |
</ul></li> |
|
411 |
<li>cd to the following directory: |
|
412 |
<ul> |
|
413 |
<li>On Windows: |
|
414 |
<pre>cd %METACAT_HOME%\lib\harvester</pre></li> |
|
415 |
<li>On Linux/Unix: |
|
416 |
<pre>cd $METACAT_HOME/lib/harvester</pre></li> |
|
417 |
</ul></li> |
|
418 |
<li>Run the appropriate Harvester shell script, as determined by the |
|
419 |
operating system: |
|
420 |
<ul> |
|
421 |
<li>On Windows: |
|
422 |
<pre>runHarvester.bat</pre></li> |
|
423 |
<li>On Linux/Unix: |
|
424 |
<pre>sh runHarvester.sh</pre></li> |
|
425 |
</ul> |
|
426 |
</li> |
|
427 |
</ol> |
|
428 |
<p>The Harvester application will start executing. It will begin its first |
|
429 |
harvest after <code><b>delay</b></code> number of hours (as specified in the |
|
430 |
<a href=../../lib/metacat.properties>metacat.properties</a> |
|
431 |
file). The application will continue running a new harvest every |
|
432 |
<code><b>period</b></code> number of hours until a <code><b>maxHarvests</b></code> |
|
433 |
number of harvests have been completed (if <code><b>maxHarvests</b></code> is set |
|
434 |
to a value greater than 0), or until you interrupt the process by pressing Ctrl+C |
|
435 |
in the command window. |
|
436 |
</p> |
|
437 |
</li> |
|
438 |
<li> (B) Running Harvester as a Servlet |
|
439 |
<ol> |
|
440 |
<li>Edit the file in your Metacat installation, <code>lib/web.xml.<em>tomcatN</em></code>, where <em>tomcatN</em> corresponds to the |
|
441 |
version of Tomcat you are running. For example, if you are running Tomcat 5, |
|
442 |
edit file <code>lib/web.xml.tomcat5</code>.</li> |
|
443 |
<li>Remove the comment symbols around the HarvesterServlet entry, so that: |
|
444 |
<pre><code> |
|
445 |
<!-- |
|
446 |
<servlet> |
|
447 |
<servlet-name>HarvesterServlet</servlet-name> |
|
448 |
<servlet-class>edu.ucsb.nceas.metacat.harvesterClient.HarvesterServlet</servlet-class> |
|
449 |
<init-param> |
|
450 |
<param-name>debug</param-name> |
|
451 |
<param-value>1</param-value> |
|
452 |
</init-param> |
|
453 |
<init-param> |
|
454 |
<param-name>listings</param-name> |
|
455 |
<param-value>true</param-value> |
|
456 |
</init-param> |
|
457 |
<load-on-startup>1</load-on-startup> |
|
458 |
</servlet> |
|
459 |
--> |
|
460 |
</code></pre> |
|
461 |
is changed to: |
|
462 |
<pre><code> |
|
463 |
<servlet> |
|
464 |
<servlet-name>HarvesterServlet</servlet-name> |
|
465 |
<servlet-class>edu.ucsb.nceas.metacat.harvesterClient.HarvesterServlet</servlet-class> |
|
466 |
<init-param> |
|
467 |
<param-name>debug</param-name> |
|
468 |
<param-value>1</param-value> |
|
469 |
</init-param> |
|
470 |
<init-param> |
|
471 |
<param-name>listings</param-name> |
|
472 |
<param-value>true</param-value> |
|
473 |
</init-param> |
|
474 |
<load-on-startup>1</load-on-startup> |
|
475 |
</servlet> |
|
476 |
</code></pre> |
|
477 |
Save the edited file. |
|
478 |
</li> |
|
479 |
<li>Shut down Tomcat.</li> |
|
480 |
<li>Redeploy Metacat by running the following two ant commands from the top-level |
|
481 |
directory of your Metacat installation: |
|
482 |
<pre><code> |
|
483 |
ant cleanweb |
|
484 |
ant install</code></pre> |
|
485 |
</li> |
|
486 |
<li>Restart Tomcat.</li> |
|
487 |
</ol> |
|
488 |
<p>About thirty seconds after you restart Tomcat, the Harvester servlet will |
|
489 |
start executing. It will begin its first |
|
490 |
harvest after <code><b>delay</b></code> number of hours (as specified in the |
|
491 |
<a href=../../lib/metacat.properties>metacat.properties</a> |
|
492 |
file). The servlet will continue running a new harvest every |
|
493 |
<code><b>period</b></code> number of hours until a <code><b>maxHarvests</b></code> |
|
494 |
number of harvests have been completed (if <code><b>maxHarvests</b></code> is set |
|
495 |
to a value greater than 0), or until Tomcat shuts down. |
|
496 |
</p> |
|
497 |
</li></ul> |
|
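<p>The interaction of <code>delay</code>, <code>period</code>, and <code>maxHarvests</code> described for both run modes can be sketched as follows. This is only an illustration of the documented behavior, not Harvester's scheduling code. The function name <code>harvest_times</code> is hypothetical, and the sketch assumes that <code>period</code> is measured from the start of one harvest to the start of the next, which the text does not state explicitly.</p>

```python
from datetime import datetime, timedelta
from itertools import count, islice


def harvest_times(start, delay_hours, period_hours, max_harvests):
    """Yield the start time of each harvest.

    The first harvest begins `delay_hours` after `start`; later harvests
    follow every `period_hours`. A `max_harvests` of 0 or less means the
    sequence never ends, matching the documented "execute indefinitely".
    """
    first = start + timedelta(hours=delay_hours)
    indices = count() if max_harvests <= 0 else range(max_harvests)
    for i in indices:
        yield first + timedelta(hours=i * period_hours)


# Example from the property table: started at 1:00 p.m. with a delay of 12.
start = datetime(2024, 1, 1, 13, 0)
times = list(harvest_times(start, delay_hours=12, period_hours=24, max_harvests=3))
for t in times:
    print(t.isoformat())

# With maxHarvests=0 the generator is unbounded, so take only a few values.
first_five = list(islice(harvest_times(start, 0, 24, 0), 5))
print(len(first_five))
```

<p>In the bounded case the generator stops after <code>maxHarvests</code> values. In the unbounded case the caller decides when to stop, much as Ctrl+C or a Tomcat shutdown ends a real run.</p>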
498 |
<h5><a name="Reviewing Harvester"> |
|
499 |
Reviewing Harvester Reports to the Harvester Administrator</a></h5> |
|
500 |
<P> |
|
501 |
After every harvest, Harvester will send an email report to the Harvester |
|
502 |
Administrator detailing the operations that were performed during the |
|
503 |
harvest. The report will contain information about each of the Harvest Sites |
|
504 |
that were harvested from, such as which EML documents were |
|
505 |
harvested and whether any errors were encountered. |
|
506 |
</P> |
|
507 |
<p> |
|
508 |
The harvest report will contain a list of log entries, where each log entry |
|
509 |
describes an operation that was performed by Harvester. Log entries that |
|
510 |
show a status value of 1 indicate that an error occurred during the |
|
511 |
operation, while those that show a status value of 0 indicate that the |
|
512 |
operation was completed successfully. |
|
513 |
</p> |
|
514 |
<P>The Harvester Administrator should review the report, paying particularly |
|
515 |
close attention to any errors that are reported and to the accompanying error |
|
516 |
messages that are displayed. When errors are reported at |
|
517 |
a particular site, the Harvester Administrator should contact the Site |
|
518 |
Contact to determine the source of the error and its resolution. See |
|
519 |
<a href=#Reviewing>Reviewing Harvester Reports to the Site Contact</a> for a |
|
520 |
description of common sources of errors at a Harvest Site. |
|
521 |
</P> |
|
522 |
<p>Errors that are independent of a particular site may indicate a problem |
|
523 |
with Harvester itself, Metacat, or the database connection. Refer to the |
|
524 |
error message to determine the source of the error and its resolution. |
|
525 |
</p> |
|
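<p>The review procedure above amounts to separating failed log entries (status 1) by site. A minimal sketch follows. The dictionary layout of a log entry, the <code>triage</code> helper, and the operation name <code>MetacatConnection</code> are assumptions made for illustration; the actual report format may differ.</p>

```python
from collections import defaultdict


def triage(entries):
    """Group failed entries (status 1) by site.

    A site of None marks an error that is independent of any Harvest Site,
    which may point at Harvester itself, Metacat, or the database connection.
    """
    by_site = defaultdict(list)
    for entry in entries:
        if entry["status"] == 1:
            by_site[entry.get("site")].append(entry)
    return dict(by_site)


# Hypothetical log entries; status 0 is success, status 1 is an error.
entries = [
    {"site": "GCE", "operation": "GetDocSuccess", "status": 0, "message": ""},
    {"site": "GCE", "operation": "GetDocError", "status": 1,
     "message": "document URL could not be retrieved"},
    {"site": None, "operation": "MetacatConnection", "status": 1,  # made-up name
     "message": "could not connect to Metacat"},
]

errors = triage(entries)
for site, failed in errors.items():
    label = site if site is not None else "(site-independent)"
    print(label, [e["operation"] for e in failed])
```

<p>Errors grouped under a site name are the ones to raise with that site's Site Contact. Errors grouped under no site are the ones for the Harvester Administrator to investigate.</p>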
526 |
<h4>Managing a Harvest Site</h4> |
|
527 |
A Harvest Site is managed by a Site Contact. |
|
528 |
The responsibilities of a Site Contact fall into the following categories: |
|
529 |
<ul> |
|
530 |
<li><a href=#Registering>Registering with Harvester</a></li> |
|
531 |
<li><a href=#Composing>Composing a Harvest List</a></li> |
|
532 |
<li><a href=#Preparing>Preparing EML Documents for harvest</a></li> |
|
533 |
<li><a href=#Reviewing>Reviewing Harvester reports to the Site Contact</a></li> |
|
534 |
</ul> |
|
535 |
<h5><a name="Registering">Registering with Harvester</a></h5> |
|
536 |
<p> |
|
537 |
A Site Contact registers a site with Harvester by logging in to the |
|
538 |
Harvester Registration page and entering several items of information |
|
539 |
that Harvester needs to know about the site. |
|
540 |
</p> |
|
541 |
<ol> |
|
542 |
<li>Logging in to the Harvester Registration Page |
|
543 |
<p> |
|
544 |
The Harvester Registration page is accessed from Metacat. For example, if |
|
545 |
the Metacat server that you wish to register with resides at the following |
|
546 |
URL: |
|
547 |
<pre> http://somehost.somelocation.edu:8080/knb/index.jsp</pre> |
|
548 |
then the Harvester Registration page would be accessed at: |
|
549 |
<pre> http://somehost.somelocation.edu:8080/knb/style/skins/knb/harvesterRegistrationLogin.html</pre> |
|
550 |
</p> |
|
551 |
<p> |
|
552 |
After bringing up this page in your browser, log in to your Metacat account |
|
553 |
by entering your username, organization, and password. For example: |
|
554 |
<table bgcolor="#ffffff" border="0" cellpadding="2" width='100%' > |
|
555 |
<tr > |
|
556 |
<td colspan=3 align=center > </td> |
|
557 |
</tr> |
|
558 |
<tr > |
|
559 |
<td colspan=3 align=center > |
|
560 |
<font face=verdana size=1%> |
|
561 |
<b>Please Enter Username, Organization, and Password </b> |
|
562 |
</font> |
|
563 |
</td> |
|
564 |
</tr> |
|
565 |
<tr> |
|
566 |
<td width='10%'> </td> |
|
567 |
<td width="25%" bgcolor="#4682b4"> |
|
568 |
<p align="center"> |
|
569 |
<font color="white" face=verdana size=2%> |
|
570 |
<b>Username</b> |
|
571 |
</font> |
|
572 |
</td> |
|
573 |
<td><p><input type="text" name="uid" value="jdoe" maxlength="100" size="28"></td> |
|
574 |
</tr> |
|
575 |
<tr> |
|
576 |
<td width='10%'> </td> |
|
577 |
<td width="25%" bgcolor="#4682b4"> |
|
578 |
<p align="center"> |
|
579 |
<font color="white" face=verdana size=2%> |
|
580 |
<b>Organization</b> |
|
581 |
</font> |
|
582 |
</td> |
|
583 |
<td> |
|
584 |
<input type="radio" name="o" value="NCEAS" checked>NCEAS |
|
585 |
<input type="radio" name="o" value="LTER">LTER |
|
586 |
<input type="radio" name="o" value="NRS">NRS |
|
587 |
<br> |
|
588 |
<input type="radio" name="o" value="PISCO">PISCO |
|
589 |
<input type="radio" name="o" value="OBFS">OBFS |
|
590 |
<input type="radio" name="o" value="Unaffiliated">Unaffiliated |
|
591 |
</tr> |
|
592 |
<tr> |
|
593 |
<td width='10%'> </td> |
|
594 |
<td bgcolor="#4682b4"> |
|
595 |
<p align="center"> |
|
596 |
<font color="white" face=verdana size=2%> |
|
597 |
<b>Password</b> |
|
598 |
</font> |
|
599 |
</td> |
|
600 |
<td><p><input type="password" name="passwd" value="*******" maxlength="60" size="28"> |
|
601 |
</td> |
|
602 |
</tr> |
|
603 |
<tr> |
|
604 |
<td colspan=3 align=center > </td> |
|
605 |
</tr> |
|
606 |
</table> |
|
607 |
In some cases, a Site Contact may need to log in to an anonymous account |
|
608 |
rather than his or her personal account. For example, an LTER Information |
|
609 |
Manager may need to log in to a dedicated account, named with a three-letter |
|
610 |
acronym, that has been set up for the LTER site. The username |
|
611 |
"GCE" would be used by the LTER Information Manager at the GCE (Georgia |
|
612 |
Coastal Ecosystems) site. |
|
613 |
</p> |
|
614 |
</li> |
|
615 |
<li>Registering with Harvester |
|
616 |
<p> |
|
617 |
After logging in, you will be presented with a web form that prompts you |
|
618 |
to enter information about your site and how often you want to schedule |
|
619 |
harvests at your site. For example: |
|
620 |
<table bgcolor="#ffffff" border="0" cellpadding="2" width='100%' > |
|
621 |
<tr > |
|
622 |
<td colspan=3 align=center > </td> |
|
623 |
</tr> |
|
624 |
<tr > |
|
625 |
<td colspan=3 align=center > |
|
626 |
<font face=verdana size=1%> |
|
627 |
<b>Metacat Harvester Registration </b> |
|
628 |
</font> |
|
629 |
</td> |
|
630 |
</tr> |
|
631 |
<tr> |
|
632 |
<td width='10%'> </td> |
|
633 |
<td width="25%" bgcolor="#4682b4"> |
|
634 |
<p align="center"> |
|
635 |
<font color="white" face=verdana size=2%> |
|
636 |
<b>Email address:</b> |
|
637 |
</font> |
|
638 |
</td> |
|
639 |
<td><p><input type="text" size="55" name="uid" value="myname@institution.edu" maxlength="100"></td> |
|
640 |
</tr> |
|
641 |
<tr> |
|
642 |
<td width='10%'> </td> |
|
643 |
<td bgcolor="#4682b4"> |
|
644 |
<p align="center"> |
|
645 |
<font color="white" face=verdana size=2%> |
|
646 |
<b>Harvest List URL:</b> |
|
647 |
</font> |
|
648 |
</td> |
|
649 |
<td><p><input type="text" size="55" name="passwd" value="http://somehost.institution.edu/~myname/harvestList.xml" maxlength="60"> |
|
650 |
</td> |
|
651 |
</tr> |
|
652 |
<tr> |
|
653 |
<td colspan=3 align=center > </td> |
|
654 |
</tr> |
|
655 |
<tr> |
|
656 |
<td width='10%'> </td> |
|
657 |
<td bgcolor="#4682b4"> |
|
658 |
<p align="center"> |
|
659 |
<font color="white" face=verdana size=2%> |
|
660 |
<b>Harvest Frequency (1-99):</b> |
|
661 |
</font> |
|
662 |
</td> |
|
663 |
<td><p><input type="text" size="3" name="passwd" value="2" maxlength="60"> |
|
664 |
</td> |
|
665 |
</tr> |
|
666 |
<tr> |
|
667 |
<td colspan=3 align=center > </td> |
|
668 |
</tr> |
|
669 |
<tr> |
|
670 |
<td width='10%'> </td> |
|
671 |
<td width="25%" bgcolor="#4682b4"> |
|
672 |
<p align="center"> |
|
673 |
<font color="white" face=verdana size=2%> |
|
674 |
<b>Unit:</b> |
|
675 |
</font> |
|
676 |
</td> |
|
677 |
<td> |
|
678 |
<input type="radio" name="o" value="days" >day(s) |
|
679 |
<input type="radio" name="o" value="weeks" checked>week(s) |
|
680 |
<input type="radio" name="o" value="months">month(s) |
|
681 |
</tr> |
|
682 |
</table> |
|
683 |
<p> |
|
684 |
After values have been entered for each of these fields, click the Register |
|
685 |
button to register your site with Harvester. |
|
686 |
</p> |
|
687 |
<P> |
|
688 |
In the example shown above, Harvester will attempt to harvest documents from |
|
689 |
the site once every 2 weeks, it will access the site's Harvest List at URL |
|
690 |
"http://somehost.institution.edu/~myname/harvestList.xml", and it will send |
|
691 |
email reports to the Site Contact at email address "myname@institution.edu". |
|
692 |
</P> |
|
693 |
<P> |
|
694 |
Note that you may enter multiple email addresses by separating each |
|
695 |
address with a comma or a semicolon. For example, |
|
696 |
"myname@institution.edu,anothername@institution.edu". |
|
697 |
</P> |
|
698 |
</li> |
|
699 |
<li>Unregistering with Harvester |
|
700 |
<p> |
|
701 |
At any time after you have registered with Harvester, you may discontinue |
|
702 |
harvests at your site by unregistering. Simply log in as described above and |
|
703 |
then click the Unregister button. After you do so, Harvester will discontinue |
|
704 |
harvests at the site. |
|
705 |
</p> |
|
706 |
</li> |
|
707 |
</ol> |
|
708 |
<h5><a name="Composing">Composing a Harvest List</a></h5> |
|
709 |
<p> |
|
710 |
A Harvest List is an XML file that holds a list of EML documents to be |
|
711 |
harvested. For each EML document in the list, the following information |
|
712 |
must be specified: |
|
713 |
<ul> |
|
714 |
<li><code>docid</code>, which consists of the: |
|
715 |
<ul> |
|
716 |
<li><code>scope</code>, e.g. "demoDocument". The scope is an identifier |
|
717 |
that indicates which group of documents this document belongs to. |
|
718 |
</li> |
|
719 |
<li><code>identifier</code>, e.g. "1". The identifier is a number that |
|
720 |
uniquely identifies this document within the scope. |
|
721 |
</li> |
|
722 |
<li><code>revision</code>, e.g. "5". The revision is a number that |
|
723 |
indicates the current revision of this document. |
|
724 |
</li> |
|
725 |
</ul> |
|
726 |
</li> |
|
727 |
<li><code>documentType</code>, e.g. "eml://ecoinformatics.org/eml-2.0.0". |
|
728 |
The documentType identifies the document as an EML document.</li> |
|
729 |
<li><code>documentURL</code>, e.g. "http://www.lternet.edu/~dcosta/document1.xml". |
|
730 |
The documentURL specifies a place where Harvester can locate |
|
731 |
and retrieve the document via HTTP.</li> |
|
732 |
</ul> |
|
733 |
</p> |
|
734 |
<p> |
|
735 |
The contents of a Harvest List XML file must conform to a particular |
|
736 |
XML Schema, as defined in file <a href="../../lib/harvester/harvestList.xsd"> |
|
737 |
harvestList.xsd</a>. The contents of a valid Harvest List |
|
738 |
can best be illustrated by example. The sample Harvest List |
|
739 |
below contains two &lt;<code>document</code>&gt; elements that specify the |
|
740 |
information that Harvester needs to retrieve a pair of EML documents and |
|
741 |
upload them to Metacat: |
|
742 |
<pre> |
|
743 |
<?xml version="1.0" encoding="UTF-8" ?> |
|
744 |
<hrv:harvestList xmlns:hrv="eml://ecoinformatics.org/harvestList" > |
|
745 |
<document> |
|
746 |
<docid> |
|
747 |
<scope>demoDocument</scope> |
|
748 |
<identifier>1</identifier> |
|
749 |
<revision>5</revision> |
|
750 |
</docid> |
|
751 |
<documentType>eml://ecoinformatics.org/eml-2.0.0</documentType> |
|
752 |
<documentURL>http://www.lternet.edu/~dcosta/document1.xml</documentURL> |
|
753 |
</document> |
|
754 |
<document> |
|
755 |
<docid> |
|
756 |
<scope>demoDocument</scope> |
|
757 |
<identifier>2</identifier> |
|
758 |
<revision>1</revision> |
|
759 |
</docid> |
|
760 |
<documentType>eml://ecoinformatics.org/eml-2.0.0</documentType> |
|
761 |
<documentURL>http://www.lternet.edu/~dcosta/document2.xml</documentURL> |
|
762 |
</document> |
|
763 |
</hrv:harvestList> |
|
764 |
</pre> |
|
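<p>To show how the fields of the sample Harvest List fit together, here is a short sketch that reads the same sample with Python's standard <code>xml.etree.ElementTree</code> module. It is not Harvester's parser and it does not validate against <code>harvestList.xsd</code>; the function name <code>read_harvest_list</code> is hypothetical.</p>

```python
import xml.etree.ElementTree as ET

# The sample Harvest List shown above, as bytes so the encoding
# declaration is honored by the parser.
HARVEST_LIST = b"""<?xml version="1.0" encoding="UTF-8" ?>
<hrv:harvestList xmlns:hrv="eml://ecoinformatics.org/harvestList" >
  <document>
    <docid>
      <scope>demoDocument</scope>
      <identifier>1</identifier>
      <revision>5</revision>
    </docid>
    <documentType>eml://ecoinformatics.org/eml-2.0.0</documentType>
    <documentURL>http://www.lternet.edu/~dcosta/document1.xml</documentURL>
  </document>
  <document>
    <docid>
      <scope>demoDocument</scope>
      <identifier>2</identifier>
      <revision>1</revision>
    </docid>
    <documentType>eml://ecoinformatics.org/eml-2.0.0</documentType>
    <documentURL>http://www.lternet.edu/~dcosta/document2.xml</documentURL>
  </document>
</hrv:harvestList>
"""


def read_harvest_list(xml_bytes):
    """Return one dict per <document> element in a Harvest List."""
    root = ET.fromstring(xml_bytes)
    documents = []
    # Only the root element carries the hrv: prefix; its children are
    # unqualified, so plain tag names are used here.
    for doc in root.findall("document"):
        docid = doc.find("docid")
        documents.append({
            "scope": docid.findtext("scope"),
            "identifier": int(docid.findtext("identifier")),
            "revision": int(docid.findtext("revision")),
            "documentType": doc.findtext("documentType"),
            "documentURL": doc.findtext("documentURL"),
        })
    return documents


documents = read_harvest_list(HARVEST_LIST)
for d in documents:
    print(d["scope"], d["identifier"], d["revision"], d["documentURL"])
```

<p>Converting <code>identifier</code> and <code>revision</code> to integers makes a non-numeric value fail early with a <code>ValueError</code>, which is one quick way to catch a malformed entry before publishing the list.</p>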
765 |
<p> |
|
766 |
After editing the Harvest List, ensure that the Harvest List XML file resides |
|
767 |
at the appropriate location on disk as specified by the URL that was entered |
|
768 |
during the <a href=#Registering>registration</a> process. |
|
769 |
</p> |
|
770 |
<p> |
|
771 |
The <a href=./harvestListEditor.html>Harvest List Editor</a> is a tool that |
|
772 |
assists in composing and editing a Harvest List. (Click |
|
773 |
<a href=./harvestListEditor.html>here</a> for additional details.) |
|
774 |
</p> |
|
775 |
<h5><a name="Preparing">Preparing EML Documents for harvest</a></h5> |
|
776 |
<p> |
|
777 |
To prepare a set of EML documents for harvest, ensure that the following is |
|
778 |
true for each document: |
|
779 |
<ul> |
|
780 |
<li>The document contains valid EML</li> |
|
781 |
<li>The document is specified in a &lt;document&gt; element in the |
|
782 |
site's Harvest List, as described above</li> |
|
783 |
<li>The file resides at the appropriate location on disk as specified |
|
784 |
by its URL in the Harvest List</li> |
|
785 |
</ul> |
|
786 |
</p> |
|
787 |
<h5><a name="Reviewing" >Reviewing Harvester Reports to the Site Contact</a></h5> |
|
788 |
<P> |
|
789 |
After every scheduled harvest that takes place at a particular Harvest |
|
790 |
Site, Harvester will send an email report to the Site Contact detailing the |
|
791 |
operations that were performed during the harvest. |
|
792 |
The report will contain information about the operations that were |
|
793 |
performed by Harvester at that site, such as |
|
794 |
which EML documents were harvested and whether any errors were encountered. |
|
795 |
</P> |
|
796 |
<P> |
|
797 |
The Site Contact should review the report, paying particularly |
|
798 |
close attention to any errors that are reported. Errors are indicated |
|
799 |
by operations that display a status value of 1, while operations that |
|
800 |
display a status value of 0 indicate that the operation completed |
|
801 |
successfully. |
|
802 |
</P> |
|
803 |
<p> |
|
804 |
When errors are reported, |
|
805 |
the Site Contact should try to determine whether the source of the error |
|
806 |
is something that can be corrected at the site. Common causes of errors |
|
807 |
might be: |
|
808 |
<ul> |
|
809 |
<li>A document URL specified in the Harvest List does not match |
|
810 |
the location of the actual EML file on the disk</li> |
|
811 |
<li>The Harvest List does not contain valid XML as specified in |
|
812 |
the <a href=../../lib/harvester/harvestList.xsd>harvestList.xsd</a> schema</li> |
|
813 |
<li>The URL to the Harvest List that was specified during |
|
814 |
registration with Harvester does not match the actual location of |
|
815 |
the Harvest List on the disk</li> |
|
816 |
<li>An EML document that Harvester attempted to upload to Metacat does |
|
817 |
not contain valid EML</li> |
|
818 |
</ul> |
|
819 |
</P> |
|
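<p>Two of the common causes listed above are URLs that do not resolve to a file. A Site Contact could check for these before a scheduled harvest with a sketch like the one below. It is an assumption-laden illustration, not part of Harvester: the helper names <code>http_ok</code> and <code>unreachable</code> are hypothetical, and a successful HTTP response says nothing about whether the file contains valid EML.</p>

```python
import urllib.error
import urllib.request


def http_ok(url, timeout=10):
    """Return True if an HTTP GET of `url` succeeds with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, ValueError, OSError):
        # URLError covers HTTP and network failures; ValueError covers
        # strings that are not usable URLs.
        return False


def unreachable(urls, fetch=http_ok):
    """Return the URLs for which `fetch` reports failure.

    `fetch` is injectable so the check can be exercised without a network.
    """
    return [url for url in urls if not fetch(url)]


# Offline demonstration with a stand-in fetch function: only the first
# URL is treated as present.
urls = [
    "http://somehost.institution.edu/~myname/harvestList.xml",
    "http://somehost.institution.edu/~myname/missing.xml",
]
present = {urls[0]}
print(unreachable(urls, fetch=lambda url: url in present))
```

<p>In real use, call <code>unreachable</code> with the Harvest List URL and every <code>documentURL</code> it contains, leaving <code>fetch</code> at its default.</p>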
820 |
<p> |
|
821 |
If the Site Contact is unable to determine the cause of the error and its |
|
822 |
resolution, he or she should contact the Harvester Administrator for assistance. |
|
823 |
</p> |
|
824 |
<a href="./properties.html">Back</a> | |
|
825 |
<a href="./metacattour.html">Home</a> | |
|
826 |
<a href="./unimplem.html">Next</a> |
|
827 | 258 |
</BODY> |
828 | 259 |
</HTML> |
initial import of html versions of spatial option documentation