OAI Protocol for Metadata Harvesting
====================================

The Open Archives Initiative Protocol for Metadata Harvesting (`OAI-PMH`_) was
first developed in the late 1990s as a standard for harvesting metadata from
distributed metadata/data repositories. The current version of the standard,
2.0, was released in June 2002 and received minor updates in December 2008.

.. _OAI-PMH: http://www.openarchives.org/pmh/

The OAI-PMH standard uses the Hypertext Transfer Protocol (HTTP) as its
transport layer and specifies six query methods (called verbs) that an
OAI-PMH compliant data provider (also referred to as a repository) must
support:

1. ``GetRecord`` – retrieves zero or one complete metadata record from a repository;
2. ``Identify`` – retrieves information about a repository;
3. ``ListIdentifiers`` – retrieves zero or more metadata record “headers” (not the complete metadata records) from a repository;
4. ``ListMetadataFormats`` – retrieves the list of metadata record formats supported by a repository;
5. ``ListRecords`` – retrieves zero or more complete metadata records from a repository; and
6. ``ListSets`` – retrieves the set structure of a repository.

An OAI-PMH compliant data provider must accept both HTTP GET and HTTP POST
requests and must return its responses as XML (version 1.0). The data provider
must also handle errors, returning the appropriate error code to the harvester.
Detailed specifications and examples of all six verbs are given in Section 4 of
the `OAI-PMH standards document`_.

.. _OAI-PMH standards document: http://www.openarchives.org/OAI/openarchivesprotocol.html

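
To illustrate the request format, the following Python sketch builds an
OAI-PMH request either as a GET URL or as a form-encoded POST body, using only
the standard library. The helper name ``build_request`` and the base URL in the
example are hypothetical, and the sketch validates only the verb name, not the
verb-specific arguments::

    from urllib.parse import urlencode

    # The six verbs defined by OAI-PMH 2.0.
    OAI_VERBS = {
        "GetRecord", "Identify", "ListIdentifiers",
        "ListMetadataFormats", "ListRecords", "ListSets",
    }


    def build_request(base_url, verb, method="GET", **args):
        """Return (url, body) for an OAI-PMH request (hypothetical helper).

        GET puts the arguments in the query string and returns body None;
        POST sends them as an application/x-www-form-urlencoded body.
        """
        if verb not in OAI_VERBS:
            # A data provider would answer an unknown verb with badVerb.
            raise ValueError(f"not an OAI-PMH verb: {verb}")
        # "verb" first, then the other arguments in the order given; colons are
        # left unescaped so identifiers such as urn:lsid:... stay readable.
        query = urlencode({"verb": verb, **args}, safe=":")
        if method == "GET":
            return f"{base_url}?{query}", None
        if method == "POST":
            return base_url, query.encode("ascii")
        raise ValueError(f"unsupported HTTP method: {method}")


    url, _ = build_request("http://example.org/knb/dataProvider",
                           "ListRecords", metadataPrefix="oai_dc")
    print(url)
    # http://example.org/knb/dataProvider?verb=ListRecords&metadataPrefix=oai_dc
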

EML and Dublin Core
-------------------

OAI-PMH requires that repositories support unqualified Dublin Core metadata at
a minimum. Although EML generally provides more fine-grained metadata than
Dublin Core, the two metadata standards share many of the same (or similar)
content elements. The EML to Dublin Core transformations performed by Metacat
OAI-PMH produce *simple* (or *unqualified*) Dublin Core, which OAI-PMH
associates with the reserved metadataPrefix ``oai_dc``.

The following table summarizes the element mappings of the EML to Dublin Core
crosswalk performed by Metacat OAI-PMH, including notes specific to each
element mapping.

+---------------------------------------+-------------+--------------------------------------------------------------+
| EML Element                           | DC Element  | Notes                                                        |
+=======================================+=============+==============================================================+
| title                                 | title       |                                                              |
+---------------------------------------+-------------+--------------------------------------------------------------+
| creator                               | creator     | Use only the creator's name (givenName and surName elements) |
|                                       |             | or organization name                                         |
+---------------------------------------+-------------+--------------------------------------------------------------+
| keyword                               | subject     | One subject element per keyword element                      |
+---------------------------------------+-------------+--------------------------------------------------------------+
| abstract                              | description | Strip text formatting tags, keeping only the text            |
+---------------------------------------+-------------+--------------------------------------------------------------+
| publisher                             | publisher   | Use only the publisher's name (givenName and surName         |
|                                       |             | elements) or organization name                               |
+---------------------------------------+-------------+--------------------------------------------------------------+
| associatedParty                       | contributor | Use only the party's name (givenName and surName elements)   |
|                                       |             | or organization name                                         |
+---------------------------------------+-------------+--------------------------------------------------------------+
| pubDate                               | date        | One-to-one mapping                                           |
+---------------------------------------+-------------+--------------------------------------------------------------+
| dataset, citation, protocol, software | type        | The type value is determined by the type of EML document     |
|                                       |             | rather than by a specific field value                        |
+---------------------------------------+-------------+--------------------------------------------------------------+
| physical                              | format      | Use a MIME type as the format value? For example, if the     |
|                                       |             | EML has a <textFormat> element within <physical>, use        |
|                                       |             | 'text/plain' as the format value?                            |
+---------------------------------------+-------------+--------------------------------------------------------------+
| (1) packageId;                        | identifier  | packageId can be used as the value of one identifier         |
| (2) URL to the EML document           |             | element; a second identifier element can hold a URL to the   |
|                                       |             | EML document                                                 |
+---------------------------------------+-------------+--------------------------------------------------------------+
| dataSource                            | source      | Use the document URL of the referenced data source?          |
+---------------------------------------+-------------+--------------------------------------------------------------+
| citation                              | relation    | Use the document URL of the referenced citation?             |
+---------------------------------------+-------------+--------------------------------------------------------------+
| geographicCoverage                    | coverage    | Add separate coverage elements for the geographic            |
|                                       |             | description and the geographic bounding coordinates. For     |
|                                       |             | bounding coordinates, use minimal labeling, for example:     |
|                                       |             | 81.505000 W, 81.495000 W, 31.170000 N, 31.163000 N           |
+---------------------------------------+-------------+--------------------------------------------------------------+
| taxonomicCoverage                     | coverage    | Use only genus/species binomials; place each binomial in a   |
|                                       |             | separate coverage element                                    |
+---------------------------------------+-------------+--------------------------------------------------------------+
| temporalCoverage                      | coverage    | Include the begin date and end date when available, for      |
|                                       |             | example: 1915-01-01 to 2004-12-31                            |
+---------------------------------------+-------------+--------------------------------------------------------------+
| intellectualRights                    | rights      | Strip text formatting tags, keeping only the text            |
+---------------------------------------+-------------+--------------------------------------------------------------+

Metacat OAI-PMH includes a set of XSLT stylesheets for converting specific
versions of EML to their Dublin Core equivalents.

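
The Metacat stylesheets are written in XSLT. Purely to illustrate several rows
of the crosswalk table, the following Python sketch applies the same mappings
to an EML document with ``xml.etree.ElementTree``. It is a simplified,
hypothetical stand-in, not the Metacat implementation: it assumes EML 2.x,
where the children of the root element carry no namespace, covers only some of
the rows, and uses the EML element name (for example ``dataset``) as the type
value, which may differ from what the stylesheets emit::

    import xml.etree.ElementTree as ET

    RESOURCE_TYPES = ("dataset", "citation", "protocol", "software")


    def _text(elem):
        """Text content of elem with markup tags dropped and whitespace collapsed."""
        return " ".join("".join(elem.itertext()).split())


    def _party_name(party):
        """givenName + surName of an EML party, or else its organizationName."""
        given = party.findtext("individualName/givenName", "").strip()
        sur = party.findtext("individualName/surName", "").strip()
        name = " ".join(part for part in (given, sur) if part)
        return name or party.findtext("organizationName", "").strip()


    def _coord(value, positive, negative):
        """Minimal labeling of a decimal degree, e.g. -81.505 -> '81.505000 W'."""
        number = float(value)
        return f"{abs(number):.6f} {positive if number >= 0 else negative}"


    def eml_to_dc(eml_xml):
        """Return (dc_element, value) pairs for an EML 2.x document (sketch only)."""
        root = ET.fromstring(eml_xml)
        resource = next(child for child in root if child.tag in RESOURCE_TYPES)
        pairs = []
        title = resource.find("title")
        if title is not None:
            pairs.append(("title", _text(title)))
        for creator in resource.findall("creator"):
            pairs.append(("creator", _party_name(creator)))
        for keyword in resource.iter("keyword"):       # one subject per keyword
            pairs.append(("subject", _text(keyword)))
        abstract = resource.find("abstract")
        if abstract is not None:
            pairs.append(("description", _text(abstract)))
        for geo in resource.iter("geographicCoverage"):
            description = geo.findtext("geographicDescription")
            if description:
                pairs.append(("coverage", " ".join(description.split())))
            bounds = geo.find("boundingCoordinates")
            if bounds is not None:                      # west, east, north, south
                pairs.append(("coverage", ", ".join([
                    _coord(bounds.findtext("westBoundingCoordinate"), "E", "W"),
                    _coord(bounds.findtext("eastBoundingCoordinate"), "E", "W"),
                    _coord(bounds.findtext("northBoundingCoordinate"), "N", "S"),
                    _coord(bounds.findtext("southBoundingCoordinate"), "N", "S"),
                ])))
        for temporal in resource.iter("temporalCoverage"):
            begin = temporal.findtext("rangeOfDates/beginDate/calendarDate")
            end = temporal.findtext("rangeOfDates/endDate/calendarDate")
            if begin and end:
                pairs.append(("coverage", f"{begin} to {end}"))
        if root.get("packageId"):
            pairs.append(("identifier", root.get("packageId")))
        # The type comes from the kind of EML resource, not from a field value.
        pairs.append(("type", resource.tag))
        return pairs
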

Metacat OAI-PMH Service Interfaces
----------------------------------

Metacat supports two OAI-PMH service interfaces: a data provider (or
repository) service interface and a harvester service interface.

Data Provider
~~~~~~~~~~~~~

The Metacat OAI-PMH Data Provider service interface supports all six OAI-PMH
verbs (GetRecord, Identify, ListIdentifiers, ListMetadataFormats, ListRecords,
and ListSets), as defined in the OAI-PMH Version 2.0 specification, through a
standard HTTP URL that accepts both HTTP GET and HTTP POST requests.

The Metacat OAI-PMH Data Provider service is based on the Online Computer
Library Center (OCLC) OAICat open source software, with customizations added
to integrate it with Metacat.

Users of the Metacat OAI-PMH Data Provider should be aware of the following issues:

* 'Deleted' Status – OAI-PMH repositories can optionally flag records with
  a 'deleted' status, indicating that a record in the metadata format
  specified by the metadataPrefix is no longer available. Because Metacat does
  not provide a mechanism for retrieving a list of deleted documents, this
  implementation of the OAI-PMH Data Provider does not support the 'deleted'
  status. This is a possible future enhancement.
* Sets – OAI-PMH repositories can optionally support set hierarchies. Because
  it has not yet been determined how set hierarchies should be structured in
  Metacat, this implementation of the OAI-PMH Data Provider does not support
  set hierarchies. This is a possible future enhancement.
* Datestamp Granularity – When expressing datestamps for repository documents,
  OAI-PMH allows two levels of granularity: day granularity and seconds
  granularity. Because the Metacat database stores the value of its
  ``xml_documents.date_updated`` field with day granularity, day granularity
  is the level supported by the Metacat OAI-PMH Data Provider.

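
As an illustration of what day granularity means for selective harvesting,
the following Python sketch checks the ``from`` and ``until`` arguments a
day-granularity provider might receive. It assumes that the provider accepts
only ``YYYY-MM-DD`` values and answers anything else, or a ``from`` date later
than the ``until`` date, with ``badArgument``; the function names are
hypothetical and the code is not taken from Metacat::

    import re
    from datetime import date

    # Day-granularity datestamp: exactly YYYY-MM-DD.
    DAY_FORMAT = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


    def parse_day(value):
        """Parse a YYYY-MM-DD datestamp; raise ValueError for anything else."""
        match = DAY_FORMAT.fullmatch(value)
        if match is None:
            raise ValueError(f"not a day-granularity datestamp: {value}")
        return date(*map(int, match.groups()))   # also rejects e.g. 2010-02-30


    def check_day_granularity(from_arg=None, until_arg=None):
        """Return an OAI-PMH error code for bad from/until arguments, else None."""
        try:
            start = parse_day(from_arg) if from_arg is not None else None
            end = parse_day(until_arg) if until_arg is not None else None
        except ValueError:
            return "badArgument"     # e.g. a seconds-granularity value
        if start is not None and end is not None and start > end:
            return "badArgument"     # from is later than until
        return None
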

Metacat OAI-PMH Harvester
~~~~~~~~~~~~~~~~~~~~~~~~~

The Metacat OAI-PMH Harvester service interface uses OAI-PMH verbs to request
metadata or related information from an OAI-PMH compliant data provider,
sending HTTP GET or HTTP POST requests to a standard HTTP URL.

The Metacat OAI-PMH Harvester client is based on OCLC's OAIHarvester2 open
source code, with customizations added as needed to integrate it with Metacat.

Users of the Metacat OAI-PMH Harvester should be aware of the following issues:

* Handling of 'Deleted' Status – The Metacat OAI-PMH Harvester checks whether
  a harvested document is flagged with a 'deleted' status and, if it is,
  deletes the corresponding document from the Metacat repository.
* Datestamp Granularity – When expressing datestamps for repository documents,
  OAI-PMH allows two levels of granularity: day granularity and seconds
  granularity. Because the Metacat database stores the value of its
  ``xml_documents.date_updated`` field with day granularity, day granularity
  is the level supported by both the Metacat OAI-PMH Data Provider and the
  Metacat OAI-PMH Harvester (MOH). This has implications when MOH interacts
  with data providers such as the Dryad repository, which stores its documents
  with seconds granularity. For example, consider the following sequence of
  events:

  1. On January 1, 2010, MOH harvests a document from the Dryad repository
     with datestamp '2010-01-01T10:00:00Z' and stores its local copy with
     datestamp '2010-01-01'.
  2. Later that same day, the Dryad repository updates the document to a
     newer revision with a new datestamp, such as '2010-01-01T20:00:00Z'.
  3. On the following day, MOH runs another harvest. It determines that it
     already has a local copy of the document with datestamp '2010-01-01' and
     does not re-harvest the document, even though its local copy is not the
     latest revision.

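
The sequence of events in the Datestamp Granularity issue can be reproduced in
a few lines of Python. This is a hypothetical model of a day-granularity
comparison, not MOH's actual code: the local copy keeps only the day of the
original datestamp, so the later revision from the same day does not look
newer, whereas a comparison at seconds granularity would detect it::

    from datetime import datetime


    def to_day(datestamp):
        """Truncate a YYYY-MM-DDThh:mm:ssZ datestamp to YYYY-MM-DD."""
        return datetime.strptime(datestamp, "%Y-%m-%dT%H:%M:%SZ").date().isoformat()


    def needs_reharvest(local_day, remote_datestamp):
        """Day-granularity check: re-harvest only if the remote day is later."""
        # ISO 8601 dates in the same format compare correctly as strings.
        return to_day(remote_datestamp) > local_day


    local_copy = to_day("2010-01-01T10:00:00Z")   # step 1: stored as 2010-01-01
    revised = "2010-01-01T20:00:00Z"              # step 2: newer revision at Dryad
    print(local_copy, needs_reharvest(local_copy, revised))   # 2010-01-01 False
    print(revised > "2010-01-01T10:00:00Z")       # True: seconds granularity sees it
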

Configuring and Running Metacat OAI-PMH
---------------------------------------

Metacat OAI-PMH Data Provider Servlet
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To configure and enable the Data Provider servlet:

1. Stop Tomcat and edit the Metacat properties file (``metacat.properties``)
   in the Metacat context directory inside the Tomcat application directory.
   The Metacat context directory has the name of the application (usually
   ``knb``):

   ::

      <tomcat_app_dir>/<context_dir>/WEB-INF/metacat.properties

2. Set the following properties to values appropriate for your repository:

   * ``oaipmh.repositoryIdentifier`` – a string that identifies this repository
   * ``Identify.adminEmail`` – the email address of the repository administrator

3. Edit the deployment descriptor file (``web.xml``), which is also in the
   ``WEB-INF`` directory. Uncomment the ``<servlet>`` and ``<servlet-mapping>``
   entries for the DataProvider servlet by removing the surrounding ``<!--``
   and ``-->`` strings:

   ::

      <servlet>
        <servlet-name>DataProvider</servlet-name>
        <description>Processes OAI verbs for Metacat OAI-PMH Data Provider (MODP)</description>
        <servlet-class>edu.ucsb.nceas.metacat.oaipmh.provider.server.OAIHandler</servlet-class>
        <load-on-startup>4</load-on-startup>
      </servlet>
      <servlet-mapping>
        <servlet-name>DataProvider</servlet-name>
        <url-pattern>/dataProvider</url-pattern>
      </servlet-mapping>

4. Save the ``metacat.properties`` and ``web.xml`` files and start Tomcat.

The following table describes the complete set of ``metacat.properties``
settings used by the DataProvider servlet.

+----------------------------------------+----------------------------------------------------------------------------+----------------------------------------------------+
| Property Name                          | Sample Value                                                               | Description                                        |
+========================================+============================================================================+====================================================+
| oaipmh.maxListSize                     | 5                                                                          | Maximum number of records returned by each call to |
|                                        |                                                                            | the ListIdentifiers and ListRecords verbs.         |
+----------------------------------------+----------------------------------------------------------------------------+----------------------------------------------------+
| oaipmh.repositoryIdentifier            | metacat.lternet.edu                                                        | An identifier string for the repository.           |
+----------------------------------------+----------------------------------------------------------------------------+----------------------------------------------------+
| AbstractCatalog.oaiCatalogClassName    | edu.ucsb.nceas.metacat.oaipmh.provider.server.catalog.MetacatCatalog       | The Java class that extends the AbstractCatalog    |
|                                        |                                                                            | class. This class determines which records exist   |
|                                        |                                                                            | in the repository and their datestamps.            |
+----------------------------------------+----------------------------------------------------------------------------+----------------------------------------------------+
| AbstractCatalog.recordFactoryClassName | edu.ucsb.nceas.metacat.oaipmh.provider.server.catalog.MetacatRecordFactory | The Java class that extends the RecordFactory      |
|                                        |                                                                            | class. This class creates OAI-PMH metadata         |
|                                        |                                                                            | records.                                           |
+----------------------------------------+----------------------------------------------------------------------------+----------------------------------------------------+
| AbstractCatalog.secondsToLive          | 3600                                                                       | The lifetime, in seconds, of the resumptionToken.  |
+----------------------------------------+----------------------------------------------------------------------------+----------------------------------------------------+
| AbstractCatalog.granularity            | YYYY-MM-DD or                                                              | Granularity of datestamps: YYYY-MM-DD for day      |
|                                        | YYYY-MM-DDThh:mm:ssZ                                                       | granularity, or YYYY-MM-DDThh:mm:ssZ for seconds   |
|                                        |                                                                            | granularity.                                       |
+----------------------------------------+----------------------------------------------------------------------------+----------------------------------------------------+
| Identify.repositoryName                | Metacat OAI-PMH Data Provider                                              | A name for the repository.                         |
+----------------------------------------+----------------------------------------------------------------------------+----------------------------------------------------+
| Identify.earliestDatestamp             | 2000-01-01T00:00:00Z                                                       | Earliest datestamp supported by this repository.   |
+----------------------------------------+----------------------------------------------------------------------------+----------------------------------------------------+
| Identify.deletedRecord                 | yes or no                                                                  | Use 'yes' if the repository indicates the status   |
|                                        |                                                                            | of deleted records; use 'no' if it does not.       |
+----------------------------------------+----------------------------------------------------------------------------+----------------------------------------------------+
| Identify.adminEmail                    | mailto:tech_support@someplace.org                                          | Email address of the repository administrator.     |
+----------------------------------------+----------------------------------------------------------------------------+----------------------------------------------------+
| Crosswalks.oai_dc                      | edu.ucsb.nceas.metacat.oaipmh.provider.server.crosswalk.Eml2oai_dc         | Java class that controls the EML 2.x.y to oai_dc   |
|                                        |                                                                            | (Dublin Core) crosswalk.                           |
+----------------------------------------+----------------------------------------------------------------------------+----------------------------------------------------+
| Crosswalks.eml2.0.0                    | edu.ucsb.nceas.metacat.oaipmh.provider.server.crosswalk.Eml200             | Java class that furnishes EML 2.0.0 metadata.      |
+----------------------------------------+----------------------------------------------------------------------------+----------------------------------------------------+
| Crosswalks.eml2.0.1                    | edu.ucsb.nceas.metacat.oaipmh.provider.server.crosswalk.Eml201             | Java class that furnishes EML 2.0.1 metadata.      |
+----------------------------------------+----------------------------------------------------------------------------+----------------------------------------------------+
| Crosswalks.eml2.1.0                    | edu.ucsb.nceas.metacat.oaipmh.provider.server.crosswalk.Eml210             | Java class that furnishes EML 2.1.0 metadata.      |
+----------------------------------------+----------------------------------------------------------------------------+----------------------------------------------------+

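
Before restarting Tomcat, it can help to confirm that the DataProvider settings
are present. The following Python sketch is a hypothetical helper, not part of
Metacat: it reads the text of a ``metacat.properties`` file, reports any
property from the settings table that is missing or empty (treating the EML
crosswalk entries as optional), and flags an ``AbstractCatalog.granularity``
value other than the two formats listed. It understands only simple
``key=value`` lines, not the full Java properties syntax (``key: value``
separators, line continuations, or escapes)::

    # Properties from the settings table; EML crosswalk entries are treated as optional.
    REQUIRED = [
        "oaipmh.maxListSize", "oaipmh.repositoryIdentifier",
        "AbstractCatalog.oaiCatalogClassName", "AbstractCatalog.recordFactoryClassName",
        "AbstractCatalog.secondsToLive", "AbstractCatalog.granularity",
        "Identify.repositoryName", "Identify.earliestDatestamp",
        "Identify.deletedRecord", "Identify.adminEmail", "Crosswalks.oai_dc",
    ]
    GRANULARITIES = {"YYYY-MM-DD", "YYYY-MM-DDThh:mm:ssZ"}


    def read_properties(text):
        """Parse simple key=value lines, skipping blank lines and # or ! comments."""
        props = {}
        for line in text.splitlines():
            line = line.strip()
            if not line or line[0] in "#!":
                continue
            key, sep, value = line.partition("=")
            if sep:
                props[key.strip()] = value.strip()
        return props


    def check_provider_settings(text):
        """Return a list of problems found in metacat.properties text (empty if none)."""
        props = read_properties(text)
        problems = [f"missing or empty: {key}" for key in REQUIRED if not props.get(key)]
        granularity = props.get("AbstractCatalog.granularity")
        if granularity and granularity not in GRANULARITIES:
            problems.append(f"unexpected granularity: {granularity}")
        return problems
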

Sample URLs
...........

The following sample URLs demonstrate the use of the Metacat OAI-PMH Data
Provider. Replace ``<your_context_url>`` with the URL of your Metacat context.

+---------------------+--------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+
| OAI-PMH Verb        | Description                                                  | URL                                                                                                                                       |
+=====================+==============================================================+===========================================================================================================================================+
| GetRecord           | Get an EML 2.0.1 record using its LSID identifier            | http://<your_context_url>/dataProvider?verb=GetRecord&metadataPrefix=eml-2.0.1&identifier=urn:lsid:knb.ecoinformatics.org:knb-lter-gce:26 |
+---------------------+--------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+
| GetRecord           | Get an oai_dc (Dublin Core) record using its LSID identifier | http://<your_context_url>/dataProvider?verb=GetRecord&metadataPrefix=oai_dc&identifier=urn:lsid:knb.ecoinformatics.org:knb-lter-gce:26    |
+---------------------+--------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+
| Identify            | Identify this data provider                                  | http://<your_context_url>/dataProvider?verb=Identify                                                                                      |
+---------------------+--------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+
| ListIdentifiers     | List all EML 2.1.0 identifiers in the repository             | http://<your_context_url>/dataProvider?verb=ListIdentifiers&metadataPrefix=eml-2.1.0                                                      |
+---------------------+--------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+
| ListIdentifiers     | List all oai_dc (Dublin Core) identifiers in the             | http://<your_context_url>/dataProvider?verb=ListIdentifiers&metadataPrefix=oai_dc&from=2006-01-01&until=2010-01-01                        |
|                     | repository within a date range                               |                                                                                                                                           |
+---------------------+--------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+
| ListMetadataFormats | List metadata formats supported by this repository           | http://<your_context_url>/dataProvider?verb=ListMetadataFormats                                                                           |
+---------------------+--------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+
| ListRecords         | List all EML 2.0.0 records in the repository                 | http://<your_context_url>/dataProvider?verb=ListRecords&metadataPrefix=eml-2.0.0                                                          |
+---------------------+--------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+
| ListRecords         | List all oai_dc (Dublin Core) records in the repository      | http://<your_context_url>/dataProvider?verb=ListRecords&metadataPrefix=oai_dc                                                             |
+---------------------+--------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+
| ListSets            | List sets supported by this repository                       | http://<your_context_url>/dataProvider?verb=ListSets                                                                                      |
+---------------------+--------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+

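
Because ``oaipmh.maxListSize`` caps the number of records in each
ListIdentifiers or ListRecords response, a client retrieving a longer list
must repeat the request with the ``resumptionToken`` returned in each partial
response. The following Python sketch shows that loop for ListIdentifiers and
also reports whether each header carries the 'deleted' status. The function
name and the ``fetch`` parameter are hypothetical; ``fetch`` stands for
whatever sends the HTTP GET request (for example
``urllib.request.urlopen(url).read()``), which keeps the sketch testable
without a live provider::

    import xml.etree.ElementTree as ET
    from urllib.parse import urlencode

    OAI = "{http://www.openarchives.org/OAI/2.0/}"   # OAI-PMH 2.0 namespace


    def list_identifiers(base_url, metadata_prefix, fetch):
        """Yield (identifier, datestamp, deleted) for every header, page by page."""
        query = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
        while True:
            root = ET.fromstring(fetch(f"{base_url}?{urlencode(query)}"))
            error = root.find(f"{OAI}error")
            if error is not None:
                if error.get("code") == "noRecordsMatch":
                    return                   # an empty list is not a failure
                raise RuntimeError(f"{error.get('code')}: {error.text}")
            listing = root.find(f"{OAI}ListIdentifiers")
            for header in listing.findall(f"{OAI}header"):
                yield (header.findtext(f"{OAI}identifier"),
                       header.findtext(f"{OAI}datestamp"),
                       header.get("status") == "deleted")
            token = listing.findtext(f"{OAI}resumptionToken")
            if not token:                    # missing or empty token: last page
                return
            # resumptionToken is an exclusive argument: send it with the verb only.
            query = {"verb": "ListIdentifiers", "resumptionToken": token}
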

Metacat OAI-PMH Harvester
~~~~~~~~~~~~~~~~~~~~~~~~~

The Metacat OAI-PMH Harvester (MOH) is run as a command-line program::

    sh runHarvester.sh -dn <distinguishedName> \
                       -password <password> \
                       -metadataPrefix <prefix> \
                       [-from <fromDate>] \
                       [-until <untilDate>] \
                       [-setSpec <setName>] \
                       <baseURL>

The following example illustrates how to run the Metacat OAI-PMH Harvester
from the command line:

1. Open a command prompt or terminal window.
2. Set the ``METACAT_HOME`` environment variable to the Metacat installation
   directory. For example:

   On Windows:

   ::

      set METACAT_HOME=C:\somePath\metacat

   On Linux/Unix (bash shell):

   ::

      export METACAT_HOME=/home/somePath/metacat

3. Change to the following directory:

   On Windows:

   ::

      cd %METACAT_HOME%\lib\oaipmh

   On Linux/Unix:

   ::

      cd $METACAT_HOME/lib/oaipmh

4. Run the Metacat OAI-PMH Harvester script for your operating system:

   On Windows (``^`` continues a command on the next line):

   ::

      runHarvester.bat ^
        -dn uid=jdoe,o=myorg,dc=ecoinformatics,dc=org ^
        -password some_password ^
        -metadataPrefix oai_dc ^
        http://baseurl.repository.org/knb/dataProvider

   On Linux/Unix:

   ::

      sh runHarvester.sh \
        -dn uid=jdoe,o=myorg,dc=ecoinformatics,dc=org \
        -password some_password \
        -metadataPrefix oai_dc \
        http://baseurl.repository.org/knb/dataProvider

Command-line options and parameters are described in the following table:

+-----------------------------+----------------------------------------------------+--------------------------------------------------------------+
| Command Option or Parameter | Example                                            | Description                                                  |
+=============================+====================================================+==============================================================+
| -dn                         | ``-dn uid=dryad,o=LTER,dc=ecoinformatics,dc=org``  | Full distinguished name of the LDAP account used when        |
|                             |                                                    | harvesting documents into Metacat. (Required)                |
+-----------------------------+----------------------------------------------------+--------------------------------------------------------------+
| -password                   | ``-password some_password``                        | Password of the LDAP account used when harvesting documents  |
|                             |                                                    | into Metacat. (Required)                                     |
+-----------------------------+----------------------------------------------------+--------------------------------------------------------------+
| -metadataPrefix             | ``-metadataPrefix oai_dc``                         | The metadata format of the documents to harvest from the     |
|                             |                                                    | remote repository. (Required)                                |
+-----------------------------+----------------------------------------------------+--------------------------------------------------------------+
| -from                       | ``-from 2000-01-01``                               | The lower limit of the datestamp for harvested documents.    |
|                             |                                                    | (Optional)                                                   |
+-----------------------------+----------------------------------------------------+--------------------------------------------------------------+
| -until                      | ``-until 2010-12-31``                              | The upper limit of the datestamp for harvested documents.    |
|                             |                                                    | (Optional)                                                   |
+-----------------------------+----------------------------------------------------+--------------------------------------------------------------+
| -setSpec                    | ``-setSpec someSet``                               | Harvest documents belonging to this set. (Optional)          |
+-----------------------------+----------------------------------------------------+--------------------------------------------------------------+
| <baseURL>                   | ``http://baseurl.repository.org/knb/dataProvider`` | Base URL of the remote repository. (Required)                |
+-----------------------------+----------------------------------------------------+--------------------------------------------------------------+

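
For scheduled harvests, the command line can also be assembled
programmatically. The following Python sketch, a hypothetical helper rather
than part of Metacat, builds the argument list for ``runHarvester.sh`` from
the options in the table above and leaves out optional options that are not
given::

    def harvester_args(dn, password, metadata_prefix, base_url,
                       from_date=None, until_date=None, set_spec=None):
        """Return the argument list for running runHarvester.sh on Linux/Unix."""
        if not (dn and password and metadata_prefix and base_url):
            raise ValueError("-dn, -password, -metadataPrefix and the base URL are required")
        args = ["sh", "runHarvester.sh", "-dn", dn, "-password", password,
                "-metadataPrefix", metadata_prefix]
        # Optional selective-harvesting options, in the order of the synopsis.
        for option, value in (("-from", from_date), ("-until", until_date),
                              ("-setSpec", set_spec)):
            if value:
                args += [option, value]
        args.append(base_url)                # the base URL comes last
        return args

The list can then be run with ``subprocess.run(args, cwd=oaipmh_dir,
check=True)``, where ``oaipmh_dir`` is the ``lib/oaipmh`` directory from step
3 of the example. Passing a list rather than a single command string avoids
shell quoting problems with the commas and equals signs in the distinguished
name.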

OAI-PMH Error Codes
-------------------

The following table lists the error codes defined by OAI-PMH and the verbs to
which each one applies.

+-------------------------+--------------------------------------------------------------+---------------------+
| Error Code              | Description                                                  | Applicable Verbs    |
+=========================+==============================================================+=====================+
| badArgument             | The request includes illegal arguments, is missing required  | all verbs           |
|                         | arguments, includes a repeated argument, or values for       |                     |
|                         | arguments have an illegal syntax.                            |                     |
+-------------------------+--------------------------------------------------------------+---------------------+
| badResumptionToken      | The value of the resumptionToken argument is invalid or      | ListIdentifiers,    |
|                         | expired.                                                     | ListRecords,        |
|                         |                                                              | ListSets            |
+-------------------------+--------------------------------------------------------------+---------------------+
| badVerb                 | The value of the verb argument is not a legal OAI-PMH verb,  | N/A                 |
|                         | the verb argument is missing, or the verb argument is        |                     |
|                         | repeated.                                                    |                     |
+-------------------------+--------------------------------------------------------------+---------------------+
| cannotDisseminateFormat | The metadata format identified by the value given for the    | GetRecord,          |
|                         | metadataPrefix argument is not supported by the item or by   | ListIdentifiers,    |
|                         | the repository.                                              | ListRecords         |
+-------------------------+--------------------------------------------------------------+---------------------+
| idDoesNotExist          | The value of the identifier argument is unknown or illegal   | GetRecord,          |
|                         | in this repository.                                          | ListMetadataFormats |
+-------------------------+--------------------------------------------------------------+---------------------+
| noRecordsMatch          | The combination of the values of the from, until, set, and   | ListIdentifiers,    |
|                         | metadataPrefix arguments results in an empty list.           | ListRecords         |
+-------------------------+--------------------------------------------------------------+---------------------+
| noMetadataFormats       | There are no metadata formats available for the specified    | ListMetadataFormats |
|                         | item.                                                        |                     |
+-------------------------+--------------------------------------------------------------+---------------------+
| noSetHierarchy          | The repository does not support sets.                        | ListSets,           |
|                         |                                                              | ListIdentifiers,    |
|                         |                                                              | ListRecords         |
+-------------------------+--------------------------------------------------------------+---------------------+

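
In a response, each error appears as an ``error`` element in the OAI-PMH
namespace whose ``code`` attribute holds one of the codes in the table. The
following Python sketch extracts those codes with ``xml.etree.ElementTree``;
the function name and the sample response are illustrative only::

    import xml.etree.ElementTree as ET

    OAI_NS = "http://www.openarchives.org/OAI/2.0/"
    ERROR_CODES = {
        "badArgument", "badResumptionToken", "badVerb", "cannotDisseminateFormat",
        "idDoesNotExist", "noRecordsMatch", "noMetadataFormats", "noSetHierarchy",
    }


    def oai_errors(response_xml):
        """Return (code, message) pairs for every <error> in an OAI-PMH response."""
        root = ET.fromstring(response_xml)
        errors = []
        for error in root.findall(f"{{{OAI_NS}}}error"):
            code = error.get("code")
            if code not in ERROR_CODES:
                raise ValueError(f"unknown OAI-PMH error code: {code}")
            errors.append((code, (error.text or "").strip()))
        return errors


    SAMPLE = f"""<OAI-PMH xmlns="{OAI_NS}">
      <responseDate>2010-01-01T00:00:00Z</responseDate>
      <request>http://example.org/knb/dataProvider</request>
      <error code="badVerb">Illegal OAI verb</error>
    </OAI-PMH>"""
    print(oai_errors(SAMPLE))   # [('badVerb', 'Illegal OAI verb')]
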