Revision 6877
Added by Matt Jones about 13 years ago
docs/user/metacat/source/oaipmh.rst

OAI Protocol for Metadata Harvesting
====================================

The Open Archives Initiative Protocol for Metadata Harvesting (`OAI-PMH`_) was first
developed in the late 1990s as a standard for harvesting metadata from
distributed metadata/data repositories. The current version of the OAI-PMH
standard is 2.0 as of June 2002, with minor updates in December 2008.

.. _OAI-PMH: http://www.openarchives.org/pmh/

The OAI-PMH standard uses the Hypertext Transfer Protocol (HTTP) as a
transport layer and specifies six query methods (called verbs) that must be
supported by an OAI-PMH compliant data provider (also referred to as a
repository). These methods are:

1. ``GetRecord`` – retrieves zero or one complete metadata record from a repository;
2. ``Identify`` – retrieves information about a repository;
3. ``ListIdentifiers`` – retrieves zero or more metadata record “headers” (not the complete metadata record) from a repository;
4. ``ListMetadataFormats`` – retrieves a list of available metadata record formats supported by a repository;
5. ``ListRecords`` – retrieves zero or more complete metadata records from a repository; and
6. ``ListSets`` – retrieves the set structure from a repository.
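
As a rough illustration of how the six verbs translate into HTTP requests, the following Python sketch builds GET request URLs with the standard library. The base URL and the ``build_request_url`` helper are hypothetical and exist only for this example; the argument names (``metadataPrefix``, ``from``) follow the OAI-PMH specification.

```python
from urllib.parse import urlencode

# The six verbs defined by OAI-PMH 2.0.
OAI_VERBS = (
    "GetRecord",
    "Identify",
    "ListIdentifiers",
    "ListMetadataFormats",
    "ListRecords",
    "ListSets",
)


def build_request_url(base_url, verb, **arguments):
    """Return a GET URL for one OAI-PMH request (hypothetical helper)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"not an OAI-PMH verb: {verb}")
    query = {"verb": verb}
    for name, value in arguments.items():
        # 'from' is a Python keyword, so callers pass it as 'from_'.
        query[name.rstrip("_")] = value
    return f"{base_url}?{urlencode(query)}"


# Hypothetical base URL, used only for illustration.
BASE_URL = "http://example.org/knb/dataProvider"

identify_url = build_request_url(BASE_URL, "Identify")
list_url = build_request_url(
    BASE_URL, "ListIdentifiers", metadataPrefix="oai_dc", from_="2006-01-01"
)
print(identify_url)
print(list_url)
```

Rejecting unknown verbs on the client side mirrors the ``badVerb`` error that a data provider would otherwise return.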
|
An OAI-PMH compliant data provider must accept requests made with both the
HTTP GET and HTTP POST request methods. Responses from the data provider must be
returned as an XML-encoded (version 1.0) stream. The data provider must also
support error handling and return the correct error response code
to the harvester. Detailed specifications and examples of all six verbs
may be viewed in Section 4 of the `OAI-PMH standards document`_.

.. _OAI-PMH standards document: http://www.openarchives.org/OAI/openarchivesprotocol.html
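
Because a data provider must accept both request methods, a harvester can send the same arguments either way. The Python sketch below only constructs the two request objects and does not send them. The base URL is a hypothetical placeholder, and the form-encoded content type for POST follows the OAI-PMH specification.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical base URL, used only for illustration.
BASE_URL = "http://example.org/knb/dataProvider"

arguments = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
encoded = urlencode(arguments)

# GET: the arguments travel in the query string.
get_request = Request(f"{BASE_URL}?{encoded}", method="GET")

# POST: the same arguments travel in a form-encoded request body.
post_request = Request(
    BASE_URL,
    data=encoded.encode("ascii"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.full_url, post_request.data)
```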
|
EML and Dublin Core
-------------------
The OAI-PMH requires that unqualified Dublin Core metadata be supported as a
minimum. Although EML generally provides more fine-grained metadata than Dublin
Core, the two metadata standards do share many of the same (or similar) content
elements. Transformations from EML to Dublin Core performed by Metacat OAI-PMH
produce *simple* or *unqualified* Dublin Core, which is associated with the reserved
metadataPrefix symbol ``oai_dc`` in the OAI-PMH.

The following table summarizes the element mappings of the EML to Dublin Core
crosswalk performed by Metacat OAI-PMH, including notes specific to each
element mapping.
|
+---------------------------------------+-------------+-----------------------------------------------------------------+
| EML Element                           | DC Element  | Notes                                                           |
+=======================================+=============+=================================================================+
| title                                 | title       |                                                                 |
+---------------------------------------+-------------+-----------------------------------------------------------------+
| creator                               | creator     | Use only the creator's name (givenName and surName elements);   |
|                                       |             | could be an organization name                                   |
+---------------------------------------+-------------+-----------------------------------------------------------------+
| keyword                               | subject     | One subject element per keyword element                         |
+---------------------------------------+-------------+-----------------------------------------------------------------+
| abstract                              | description | Must strip text formatting tags                                 |
+---------------------------------------+-------------+-----------------------------------------------------------------+
| publisher                             | publisher   | Use only the publisher's name (givenName and surName elements); |
|                                       |             | could be an organization name                                   |
+---------------------------------------+-------------+-----------------------------------------------------------------+
| associatedParty                       | contributor | Use only the party's name (givenName and surName);              |
|                                       |             | could be an organization name                                   |
+---------------------------------------+-------------+-----------------------------------------------------------------+
| pubDate                               | date        | One-to-one mapping                                              |
+---------------------------------------+-------------+-----------------------------------------------------------------+
| dataset, citation, protocol, software | type        | Type value is determined by the type of EML document rather     |
|                                       |             | than by a specific field value                                  |
+---------------------------------------+-------------+-----------------------------------------------------------------+
| physical                              | format      | Use a MIME type as the Format value? For example, if EML has    |
|                                       |             | <textFormat> element within <physical>, then use                |
|                                       |             | 'text/plain' as the Format value?                               |
+---------------------------------------+-------------+-----------------------------------------------------------------+
| (1) packageId;                        | identifier  | packageId can be used as the value of one identifier element;   |
| (2) URL to the EML document           |             | a second identifier element can hold a URL to the EML document  |
+---------------------------------------+-------------+-----------------------------------------------------------------+
| dataSource                            | source      | Use the document URL of the referenced data source?             |
+---------------------------------------+-------------+-----------------------------------------------------------------+
| citation                              | relation    | Use the document URL of the referenced citation?                |
+---------------------------------------+-------------+-----------------------------------------------------------------+
| geographicCoverage                    | coverage    | Add separate coverage elements for geographic description       |
|                                       |             | and geographic bounding coordinates.                            |
|                                       |             | For bounding coordinates, use minimal labeling, for example:    |
|                                       |             | 81.505000 W, 81.495000 W,                                       |
|                                       |             | 31.170000 N, 31.163000 N                                        |
+---------------------------------------+-------------+-----------------------------------------------------------------+
| taxonomicCoverage                     | coverage    | Use only genus/species binomials; place each binomial in a      |
|                                       |             | separate coverage element                                       |
+---------------------------------------+-------------+-----------------------------------------------------------------+
| temporalCoverage                      | coverage    | Include begin date and end date when available. For example:    |
|                                       |             | 1915-01-01 to 2004-12-31                                        |
+---------------------------------------+-------------+-----------------------------------------------------------------+
| intellectualRights                    | rights      | Must strip text formatting tags                                 |
+---------------------------------------+-------------+-----------------------------------------------------------------+
|
Metacat OAI-PMH includes a set of XSLT stylesheets used for converting specific
versions of EML to their Dublin Core equivalents.
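
The crosswalk can be pictured as a lookup plus a few formatting rules. The Python sketch below is an illustration only and is not Metacat's XSLT implementation: the dictionary covers only the simpler rows of the crosswalk table, the helper names are hypothetical, and the sample record is made up for the example.

```python
# EML element -> Dublin Core element, taken from the crosswalk table.
EML_TO_DC = {
    "title": "title",
    "creator": "creator",
    "keyword": "subject",
    "abstract": "description",
    "publisher": "publisher",
    "associatedParty": "contributor",
    "pubDate": "date",
    "intellectualRights": "rights",
    "geographicCoverage": "coverage",
    "taxonomicCoverage": "coverage",
    "temporalCoverage": "coverage",
}


def temporal_coverage(begin_date, end_date=None):
    """Format a temporal coverage value such as '1915-01-01 to 2004-12-31'."""
    if end_date is None:
        return begin_date
    return f"{begin_date} to {end_date}"


def crosswalk(eml_fields):
    """Map (EML element, text) pairs to (DC element, text) pairs.

    Elements without an entry in EML_TO_DC are skipped.
    """
    return [
        (EML_TO_DC[name], text)
        for name, text in eml_fields
        if name in EML_TO_DC
    ]


# Made-up sample record, for illustration only.
record = [
    ("title", "Sample data set"),
    ("keyword", "salinity"),
    ("keyword", "temperature"),
    ("temporalCoverage", temporal_coverage("1915-01-01", "2004-12-31")),
]
dc_record = crosswalk(record)
print(dc_record)
```

Each ``keyword`` yields its own ``subject`` pair, matching the "one subject element per keyword element" note in the table.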
|
Metacat OAI-PMH Service Interfaces
----------------------------------
Metacat includes support for two OAI-PMH service interfaces: a data provider
(or repository) service interface and a harvester service interface.

Data Provider
~~~~~~~~~~~~~
The Metacat OAI-PMH Data Provider service interface supports all six OAI-PMH
methods (GetRecord, Identify, ListIdentifiers, ListMetadataFormats, ListRecords,
and ListSets) as defined in the OAI-PMH Version 2 Specification through a
standard HTTP URL that accepts both HTTP GET and HTTP POST requests.

The Metacat OAI-PMH Data Provider service is based on the Online
Computer Library Center (OCLC) OAICat open source software, with
customizations added to facilitate integration with Metacat.

Users of the Metacat OAI-PMH Data Provider should be aware of the following issues:

* 'Deleted' Status – OAI-PMH repositories can optionally flag records with
  a 'deleted' status, indicating that a record in the metadata format
  specified by the metadataPrefix is no longer available. Since Metacat does
  not provide a mechanism for retrieving a list of deleted documents, the use
  of the 'deleted' status is not supported in this implementation of the
  OAI-PMH Data Provider. This represents a possible future enhancement.
* Sets – OAI-PMH repositories can optionally support set hierarchies. Since it
  has not been determined how set hierarchies should be structured in
  Metacat, this implementation of the OAI-PMH repository does not support
  set hierarchies. This represents a possible future enhancement.
* Datestamp Granularity – When expressing datestamps for repository documents,
  OAI-PMH allows two levels of granularity: day granularity and seconds
  granularity. Since the Metacat database stores the value of its
  ``xml_documents.date_updated`` field in day granularity, that is the level
  supported by the Metacat OAI-PMH Data Provider.
|
Metacat OAI-PMH Harvester
~~~~~~~~~~~~~~~~~~~~~~~~~
The Metacat OAI-PMH Harvester service interface uses OAI-PMH methods to
request metadata or related information from an OAI-PMH-compliant data provider
using a standard HTTP URL in either an HTTP GET or HTTP POST request.

The Metacat OAI-PMH Harvester client is based on OCLC's
OAIHarvester2 open source code, with customizations
as needed to support integration with Metacat.

Users of the Metacat OAI-PMH Harvester should be aware of the following issues:

* Handling of 'Deleted' status – The Metacat OAI-PMH Harvester program
  checks whether a 'deleted' status is flagged for a harvested document,
  and if it is, deletes the corresponding document from the Metacat repository.
* Datestamp Granularity – When expressing datestamps for repository documents,
  OAI-PMH allows two levels of granularity: day granularity and seconds
  granularity. Since the Metacat database stores the value of its
  ``xml_documents.date_updated`` field in day granularity, that is also the
  level supported by both the Metacat OAI-PMH Data Provider and the
  Metacat OAI-PMH Harvester. This has implications when the Metacat OAI-PMH
  Harvester (MOH) interacts with data providers such as the Dryad repository,
  which stores its documents with seconds granularity. For example, consider
  the following sequence of events:

  1. On January 1, 2010, MOH harvests a document from the Dryad repository
     with datestamp '2010-01-01T10:00:00Z', and stores its local copy with
     datestamp '2010-01-01'.
  2. Later that same day, the Dryad repository updates the document to a
     newer revision, with a new datestamp such as '2010-01-01T20:00:00Z'.
  3. On the following day, MOH runs another harvest. It determines that it
     has a local copy of the document with datestamp '2010-01-01' and does
     not re-harvest the document, despite the fact that its local copy is not
     the latest revision.
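
The sequence of events above can be reproduced with a few lines of Python. This is a minimal sketch of the granularity mismatch only, not the harvester's actual comparison logic; both helper functions are hypothetical.

```python
def to_day_granularity(datestamp):
    """Truncate an OAI-PMH datestamp to day granularity (YYYY-MM-DD)."""
    return datestamp[:10]


def needs_reharvest(local_datestamp, remote_datestamp):
    """Hypothetical check that compares datestamps at day granularity."""
    return to_day_granularity(remote_datestamp) > local_datestamp


# Step 1: the first harvest stores the local copy with a day-level datestamp.
local_copy = to_day_granularity("2010-01-01T10:00:00Z")

# Step 2: the remote repository updates the document later the same day.
remote_datestamp = "2010-01-01T20:00:00Z"

# Step 3: the next harvest compares at day granularity and sees no change.
missed = not needs_reharvest(local_copy, remote_datestamp)
print(local_copy, missed)
```

An update made on a later calendar day is still detected; only same-day revisions fall through the gap.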
|
Configuring and Running Metacat OAI-PMH
---------------------------------------

Metacat OAI-PMH Data Provider Servlet
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To configure and enable the Data Provider servlet:

1. Stop Tomcat and edit the Metacat properties (``metacat.properties``) file in
   the Metacat context directory inside the Tomcat application directory.
   The Metacat context directory is the name of the application (usually ``knb``):

   ::

     <tomcat_app_dir>/<context_dir>/WEB-INF/metacat.properties

2. Change the following properties appropriately:

   * ``oaipmh.repositoryIdentifier`` – A string that identifies this repository
   * ``Identify.adminEmail`` – The email address of the repository administrator

3. Edit the deployment descriptor (``web.xml``) file, also in the WEB-INF
   directory. Uncomment the servlet and servlet-mapping entries for the
   DataProvider servlet by removing the surrounding ``<!--`` and ``-->`` strings:

   ::

     <servlet>
       <servlet-name>DataProvider</servlet-name>
       <description>Processes OAI verbs for Metacat OAI-PMH Data Provider (MODP)</description>
       <servlet-class>edu.ucsb.nceas.metacat.oaipmh.provider.server.OAIHandler</servlet-class>
       <load-on-startup>4</load-on-startup>
     </servlet>
     <servlet-mapping>
       <servlet-name>DataProvider</servlet-name>
       <url-pattern>/dataProvider</url-pattern>
     </servlet-mapping>

4. Save the ``metacat.properties`` and ``web.xml`` files and start Tomcat.

The following table describes the complete set of ``metacat.properties``
settings that are used by the DataProvider servlet.
|
+----------------------------------------+----------------------------------------------------------------------------+----------------------------------------------------------------+
| Property Name                          | Sample Value                                                               | Description                                                    |
+========================================+============================================================================+================================================================+
| oaipmh.maxListSize                     | 5                                                                          | Maximum number of records returned by each call to the         |
|                                        |                                                                            | ListIdentifiers and ListRecords verbs.                         |
+----------------------------------------+----------------------------------------------------------------------------+----------------------------------------------------------------+
| oaipmh.repositoryIdentifier            | metacat.lternet.edu                                                        | An identifier string for the repository.                       |
+----------------------------------------+----------------------------------------------------------------------------+----------------------------------------------------------------+
| AbstractCatalog.oaiCatalogClassName    | edu.ucsb.nceas.metacat.oaipmh.provider.server.catalog.MetacatCatalog       | The Java class that implements the AbstractCatalog interface.  |
|                                        |                                                                            | This class determines which records exist in the repository    |
|                                        |                                                                            | and their datestamps.                                          |
+----------------------------------------+----------------------------------------------------------------------------+----------------------------------------------------------------+
| AbstractCatalog.recordFactoryClassName | edu.ucsb.nceas.metacat.oaipmh.provider.server.catalog.MetacatRecordFactory | The Java class that extends the RecordFactory class.           |
|                                        |                                                                            | This class creates OAI-PMH metadata records.                   |
+----------------------------------------+----------------------------------------------------------------------------+----------------------------------------------------------------+
| AbstractCatalog.secondsToLive          | 3600                                                                       | The lifetime, in seconds, of the resumptionToken.              |
+----------------------------------------+----------------------------------------------------------------------------+----------------------------------------------------------------+
| AbstractCatalog.granularity            | YYYY-MM-DD or                                                              | Granularity of datestamps. Either 'days granularity' or        |
|                                        | YYYY-MM-DDThh:mm:ssZ                                                       | 'seconds granularity' values can be used.                      |
+----------------------------------------+----------------------------------------------------------------------------+----------------------------------------------------------------+
| Identify.repositoryName                | Metacat OAI-PMH Data Provider                                              | A name for the repository.                                     |
+----------------------------------------+----------------------------------------------------------------------------+----------------------------------------------------------------+
| Identify.earliestDatestamp             | 2000-01-01T00:00:00Z                                                       | Earliest datestamp supported by this repository.               |
+----------------------------------------+----------------------------------------------------------------------------+----------------------------------------------------------------+
| Identify.deletedRecord                 | yes or no                                                                  | Use 'yes' if the repository indicates the status of deleted    |
|                                        |                                                                            | records; use 'no' if it does not.                              |
+----------------------------------------+----------------------------------------------------------------------------+----------------------------------------------------------------+
| Identify.adminEmail                    | mailto:tech_support@someplace.org                                          | Email address of the repository administrator.                 |
+----------------------------------------+----------------------------------------------------------------------------+----------------------------------------------------------------+
| Crosswalks.oai_dc                      | edu.ucsb.nceas.metacat.oaipmh.provider.server.crosswalk.Eml2oai_dc         | Java class that controls the EML 2.x.y to oai_dc (Dublin Core) |
|                                        |                                                                            | crosswalk.                                                     |
+----------------------------------------+----------------------------------------------------------------------------+----------------------------------------------------------------+
| Crosswalks.eml2.0.0                    | edu.ucsb.nceas.metacat.oaipmh.provider.server.crosswalk.Eml200             | Java class that furnishes EML 2.0.0 metadata.                  |
+----------------------------------------+----------------------------------------------------------------------------+----------------------------------------------------------------+
| Crosswalks.eml2.0.1                    | edu.ucsb.nceas.metacat.oaipmh.provider.server.crosswalk.Eml201             | Java class that furnishes EML 2.0.1 metadata.                  |
+----------------------------------------+----------------------------------------------------------------------------+----------------------------------------------------------------+
| Crosswalks.eml2.1.0                    | edu.ucsb.nceas.metacat.oaipmh.provider.server.crosswalk.Eml210             | Java class that furnishes EML 2.1.0 metadata.                  |
+----------------------------------------+----------------------------------------------------------------------------+----------------------------------------------------------------+
|
Sample URLs
...........
Sample URLs that demonstrate use of the Metacat OAI-PMH Data Provider follow:
|
+---------------------+--------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+
| OAI-PMH Verb        | Description                                                  | URL                                                                                                                                       |
+=====================+==============================================================+===========================================================================================================================================+
| GetRecord           | Get an EML 2.0.1 record using its LSID identifier            | http://<your_context_url>/dataProvider?verb=GetRecord&metadataPrefix=eml-2.0.1&identifier=urn:lsid:knb.ecoinformatics.org:knb-lter-gce:26 |
+---------------------+--------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+
| GetRecord           | Get an oai_dc (Dublin Core) record using its LSID identifier | http://<your_context_url>/dataProvider?verb=GetRecord&metadataPrefix=oai_dc&identifier=urn:lsid:knb.ecoinformatics.org:knb-lter-gce:26    |
+---------------------+--------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+
| Identify            | Identify this data provider                                  | http://<your_context_url>/dataProvider?verb=Identify                                                                                      |
+---------------------+--------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+
| ListIdentifiers     | List all EML 2.1.0 identifiers in the repository             | http://<your_context_url>/dataProvider?verb=ListIdentifiers&metadataPrefix=eml-2.1.0                                                      |
+---------------------+--------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+
| ListIdentifiers     | List all oai_dc (Dublin Core) identifiers in the             | http://<your_context_url>/dataProvider?verb=ListIdentifiers&metadataPrefix=oai_dc&from=2006-01-01&until=2010-01-01                        |
|                     | repository within a date range                               |                                                                                                                                           |
+---------------------+--------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+
| ListMetadataFormats | List metadata formats supported by this repository           | http://<your_context_url>/dataProvider?verb=ListMetadataFormats                                                                           |
+---------------------+--------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+
| ListRecords         | List all EML 2.0.0 records in the repository                 | http://<your_context_url>/dataProvider?verb=ListRecords&metadataPrefix=eml-2.0.0                                                          |
+---------------------+--------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+
| ListRecords         | List all oai_dc (Dublin Core) records in the repository      | http://<your_context_url>/dataProvider?verb=ListRecords&metadataPrefix=oai_dc                                                             |
+---------------------+--------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+
| ListSets            | List sets supported by this repository                       | http://<your_context_url>/dataProvider?verb=ListSets                                                                                      |
+---------------------+--------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+
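
Because ``oaipmh.maxListSize`` caps the number of records in each ``ListIdentifiers`` or ``ListRecords`` response, a client usually has to follow the ``resumptionToken`` element until the list is exhausted. The Python sketch below shows that loop under stated assumptions: ``fetch`` is a hypothetical callable standing in for the HTTP request, and the two canned XML pages and their identifiers are invented for illustration.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_all_identifiers(fetch, metadata_prefix):
    """Collect identifiers across pages by following resumptionTokens.

    `fetch` is any callable that takes a dict of OAI-PMH arguments and
    returns the response body as a string (hypothetical interface).
    """
    identifiers = []
    arguments = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(arguments))
        for element in root.iterfind(".//oai:header/oai:identifier", OAI_NS):
            identifiers.append(element.text)
        token = root.find(".//oai:resumptionToken", OAI_NS)
        if token is None or not (token.text or "").strip():
            return identifiers
        # A resumptionToken is exclusive: it is sent with the verb only.
        arguments = {
            "verb": "ListIdentifiers",
            "resumptionToken": token.text.strip(),
        }


# Canned two-page response standing in for a real data provider.
PAGES = {
    None: (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>'
        "<header><identifier>doc.1</identifier>"
        "<datestamp>2010-01-01</datestamp></header>"
        "<resumptionToken>page-2</resumptionToken>"
        "</ListIdentifiers></OAI-PMH>"
    ),
    "page-2": (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>'
        "<header><identifier>doc.2</identifier>"
        "<datestamp>2010-01-02</datestamp></header>"
        "<resumptionToken/>"
        "</ListIdentifiers></OAI-PMH>"
    ),
}


def fake_fetch(arguments):
    """Return the canned page that matches the resumptionToken, if any."""
    return PAGES[arguments.get("resumptionToken")]


all_ids = list_all_identifiers(fake_fetch, "oai_dc")
print(all_ids)
```

Passing the transport in as a callable keeps the paging logic testable without a running data provider.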
|
Metacat OAI-PMH Harvester
~~~~~~~~~~~~~~~~~~~~~~~~~
The Metacat OAI-PMH Harvester (MOH) is executed as a command-line program::

  sh runHarvester.sh -dn <distinguishedName> \
      -password <password> \
      -metadataPrefix <prefix> \
      [-from <fromDate>] \
      [-until <untilDate>] \
      [-setSpec <setName>] \
      <baseURL>

The following example illustrates how the Metacat OAI-PMH Harvester is run from the command line:

1. Open a system command window or terminal window.
2. Set the METACAT_HOME environment variable to the value of the Metacat
   installation directory. Some examples follow:

   On Windows:

   ::

     set METACAT_HOME=C:\somePath\metacat

   On Linux/Unix (bash shell):

   ::

     export METACAT_HOME=/home/somePath/metacat

3. Change to the following directory:

   On Windows:

   ::

     cd %METACAT_HOME%\lib\oaipmh

   On Linux/Unix:

   ::

     cd $METACAT_HOME/lib/oaipmh

4. Run the appropriate Metacat OAI-PMH Harvester script, as determined by the operating system:

   On Windows:

   ::

     runHarvester.bat ^
       -dn uid=jdoe,o=myorg,dc=ecoinformatics,dc=org ^
       -password some_password ^
       -metadataPrefix oai_dc ^
       http://baseurl.repository.org/knb/dataProvider

   On Linux/Unix:

   ::

     sh runHarvester.sh \
       -dn uid=jdoe,o=myorg,dc=ecoinformatics,dc=org \
       -password some_password \
       -metadataPrefix oai_dc \
       http://baseurl.repository.org/knb/dataProvider

Command line options and parameters are described in the following table:
|
+-----------------------------+----------------------------------------------------+-------------------------------------------------------------+
| Command Option or Parameter | Example                                            | Description                                                 |
+=============================+====================================================+=============================================================+
| ``-dn``                     | ``-dn uid=dryad,o=LTER,dc=ecoinformatics,dc=org``  | Full distinguished name of the LDAP account used when       |
|                             |                                                    | harvesting documents into Metacat. (Required)               |
+-----------------------------+----------------------------------------------------+-------------------------------------------------------------+
| ``-password``               | ``-password some_password``                        | Password of the LDAP account used when harvesting documents |
|                             |                                                    | into Metacat. (Required)                                    |
+-----------------------------+----------------------------------------------------+-------------------------------------------------------------+
| ``-metadataPrefix``         | ``-metadataPrefix oai_dc``                         | The type of documents being harvested from the remote       |
|                             |                                                    | repository. (Required)                                      |
+-----------------------------+----------------------------------------------------+-------------------------------------------------------------+
| ``-from``                   | ``-from 2000-01-01``                               | The lower limit of the datestamp for harvested documents.   |
|                             |                                                    | (Optional)                                                  |
+-----------------------------+----------------------------------------------------+-------------------------------------------------------------+
| ``-until``                  | ``-until 2010-12-31``                              | The upper limit of the datestamp for harvested documents.   |
|                             |                                                    | (Optional)                                                  |
+-----------------------------+----------------------------------------------------+-------------------------------------------------------------+
| ``-setSpec``                | ``-setSpec someSet``                               | Harvest documents belonging to this set. (Optional)         |
+-----------------------------+----------------------------------------------------+-------------------------------------------------------------+
| ``<baseURL>``               | ``http://baseurl.repository.org/knb/dataProvider`` | Base URL of the remote repository. (Required)               |
+-----------------------------+----------------------------------------------------+-------------------------------------------------------------+
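
When the harvester is launched from another program, for example a scheduled job, the options in the table can be assembled programmatically. The Python sketch below only builds the argument list and does not run the script. The ``build_harvester_command`` helper is hypothetical, and the sample values are the placeholders used elsewhere in this chapter.

```python
def build_harvester_command(dn, password, metadata_prefix, base_url,
                            from_date=None, until_date=None, set_spec=None):
    """Assemble the runHarvester.sh argument list (hypothetical helper)."""
    command = [
        "sh", "runHarvester.sh",
        "-dn", dn,
        "-password", password,
        "-metadataPrefix", metadata_prefix,
    ]
    # Optional arguments are appended only when a value is supplied.
    if from_date:
        command += ["-from", from_date]
    if until_date:
        command += ["-until", until_date]
    if set_spec:
        command += ["-setSpec", set_spec]
    # The base URL of the remote repository always comes last.
    command.append(base_url)
    return command


command = build_harvester_command(
    dn="uid=jdoe,o=myorg,dc=ecoinformatics,dc=org",
    password="some_password",
    metadata_prefix="oai_dc",
    base_url="http://baseurl.repository.org/knb/dataProvider",
    from_date="2000-01-01",
)
print(" ".join(command))
```

A list of this shape could be handed to ``subprocess.run`` with ``cwd`` set to the ``lib/oaipmh`` directory; passing a list rather than a single string avoids shell quoting problems with the commas and equals signs in the distinguished name.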
|
OAI-PMH Error Codes
-------------------
|
+-------------------------+--------------------------------------------------------------------------------+---------------------+
| Error Code              | Description                                                                    | Applicable Verbs    |
+=========================+================================================================================+=====================+
| badArgument             | The request includes illegal arguments, is missing required arguments,         | all verbs           |
|                         | includes a repeated argument, or values for arguments have an illegal syntax.  |                     |
+-------------------------+--------------------------------------------------------------------------------+---------------------+
| badResumptionToken      | The value of the resumptionToken argument is invalid or expired.               | ListIdentifiers     |
|                         |                                                                                | ListRecords         |
|                         |                                                                                | ListSets            |
+-------------------------+--------------------------------------------------------------------------------+---------------------+
| badVerb                 | Value of the verb argument is not a legal OAI-PMH verb, the verb argument is   | N/A                 |
|                         | missing, or the verb argument is repeated.                                     |                     |
+-------------------------+--------------------------------------------------------------------------------+---------------------+
| cannotDisseminateFormat | The metadata format identified by the value given for the metadataPrefix       | GetRecord           |
|                         | argument is not supported by the item or by the repository.                    | ListIdentifiers     |
|                         |                                                                                | ListRecords         |
+-------------------------+--------------------------------------------------------------------------------+---------------------+
| idDoesNotExist          | The value of the identifier argument is unknown or illegal in this repository. | GetRecord           |
|                         |                                                                                | ListMetadataFormats |
+-------------------------+--------------------------------------------------------------------------------+---------------------+
| noRecordsMatch          | The combination of the values of the from, until, set and metadataPrefix       | ListIdentifiers     |
|                         | arguments results in an empty list.                                            | ListRecords         |
+-------------------------+--------------------------------------------------------------------------------+---------------------+
| noMetadataFormats       | There are no metadata formats available for the specified item.                | ListMetadataFormats |
+-------------------------+--------------------------------------------------------------------------------+---------------------+
| noSetHierarchy          | The repository does not support sets.                                          | ListSets            |
|                         |                                                                                | ListIdentifiers     |
|                         |                                                                                | ListRecords         |
+-------------------------+--------------------------------------------------------------------------------+---------------------+
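
A harvester typically inspects each response for ``error`` elements before processing any records. The Python sketch below shows one way to extract the error codes from a response. The sample response is hand-written for illustration, loosely following the response layout in the OAI-PMH specification, and the ``extract_errors`` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written sample response, for illustration only.
SAMPLE_ERROR_RESPONSE = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
    "<responseDate>2010-01-01T00:00:00Z</responseDate>"
    "<request>http://example.org/knb/dataProvider</request>"
    '<error code="badVerb">Illegal OAI verb</error>'
    "</OAI-PMH>"
)


def extract_errors(response_text):
    """Return (code, message) pairs for every error element in a response."""
    root = ET.fromstring(response_text)
    return [
        (error.get("code"), (error.text or "").strip())
        for error in root.findall("oai:error", OAI_NS)
    ]


errors = extract_errors(SAMPLE_ERROR_RESPONSE)
print(errors)
```

An empty list means the response carried no errors, so the caller can go on to read the records.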
|
Added OAI-PMH chapter that was contributed by Duane Costa from LTER.