Revision 6871

Added by Matt Jones over 9 years ago

Moving Metacat Sphinx RST documentation from docs/dev to docs/user directory.

View differences:

docs/dev/metacat/Makefile
# Makefile for Sphinx documentation
#

# You can set these variables from the command line.
SPHINXOPTS    =
SPHINXBUILD   = sphinx-build
PAPER         =
BUILDDIR      = build
# Path to the Graphviz "dot" binary used by the plantuml target (MacPorts location shown).
GRAPHVIZ      = /opt/local/bin/dot

# Internal variables.
PAPEROPT_a4     = -D latex_paper_size=a4
PAPEROPT_letter = -D latex_paper_size=letter
ALLSPHINXOPTS   = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) source

.PHONY: help clean plantuml html dirhtml pickle json htmlhelp qthelp latex changes linkcheck doctest pdf

help:
	@echo "Please use \`make <target>' where <target> is one of"
	@echo "  html      to make standalone HTML files"
	@echo "  dirhtml   to make HTML files named index.html in directories"
	@echo "  pickle    to make pickle files"
	@echo "  json      to make JSON files"
	@echo "  htmlhelp  to make HTML files and an HTML help project"
	@echo "  qthelp    to make HTML files and a qthelp project"
	@echo "  latex     to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
	@echo "  changes   to make an overview of all changed/added/deprecated items"
	@echo "  linkcheck to check all external links for integrity"
	@echo "  doctest   to run all doctests embedded in the documentation (if enabled)"
	@echo "  pdf       to make PDF files"
	@echo "  plantuml  to generate PlantUML diagrams from the files in source/"

clean:
	-rm -rf $(BUILDDIR)/*

# Render PlantUML diagrams in source/ using the Graphviz binary set in GRAPHVIZ.
plantuml:
	GRAPHVIZ_DOT=$(GRAPHVIZ) plantuml source

html:
	$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."

dirhtml:
	$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."

pickle:
	$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
	@echo
	@echo "Build finished; now you can process the pickle files."

json:
	$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
	@echo
	@echo "Build finished; now you can process the JSON files."

# The "pdf" builder is provided by the rst2pdf Sphinx extension, which must be enabled in conf.py.
pdf:
	$(SPHINXBUILD) -b pdf $(ALLSPHINXOPTS) $(BUILDDIR)/pdf
	@echo
	@echo "Build finished. The PDF files are in $(BUILDDIR)/pdf."

htmlhelp:
	$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
	@echo
	@echo "Build finished; now you can run HTML Help Workshop with the" \
	      ".hhp project file in $(BUILDDIR)/htmlhelp."

qthelp:
	$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
	@echo
	@echo "Build finished; now you can run 'qcollectiongenerator' with the" \
	      ".qhcp project file in $(BUILDDIR)/qthelp, like this:"
	@echo "# qcollectiongenerator $(BUILDDIR)/qthelp/DataONEArchitecture.qhcp"
	@echo "To view the help file:"
	@echo "# assistant -collectionFile $(BUILDDIR)/qthelp/DataONEArchitecture.qhc"

latex:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
	@echo
	@echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
	@echo "Run \`make all-pdf' or \`make all-ps' in that directory to" \
	      "run these through (pdf)latex."

changes:
	$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
	@echo
	@echo "The overview file is in $(BUILDDIR)/changes."

linkcheck:
	$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
	@echo
	@echo "Link check complete; look for any errors in the above output " \
	      "or in $(BUILDDIR)/linkcheck/output.txt."

doctest:
	$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
	@echo "Testing of doctests in the sources finished; look at the " \
	      "results in $(BUILDDIR)/doctest/output.txt."

docs/dev/metacat/source/geoserver.rst
Metacat's Use of GeoServer
==========================

GeoServer 2.0.2, an open source Web Map Service (WMS) server written in Java, is
bundled with Metacat and can be used to render spatial data as web-deliverable
maps. Metacat uses OpenLayers (http://openlayers.org/) to provide a web-based
user interface for interacting with the generated maps, but you can use any
WMS-compatible client (e.g., ArcGIS, QGIS, JUMP, uDig, OpenLayers, Mapbender,
Map Builder).

IMPORTANT: Regardless of whether you plan to use the mapping functionality,
you should, for security purposes, configure GeoServer so that it does not
use the default password. For instructions, please see the GeoServer
Configuration section.

.. figure:: images/screenshots/image051.jpg
   :align: center

   A map generated by Metacat's GeoServer. Points and "bounding boxes"
   represent the geographic extent of datasets stored in the KNB Metacat repository.

GeoServer supports a wide variety of vector GIS data sources, which can be
styled using Styled Layer Descriptors (SLDs) and output as images (the default)
or raw vector data (GML or KML).

Currently, GeoServer can be used with the following limitation:

* GeoServer will only map documents that are publicly available. This is
  because the mapping server's support for permissions control is not as
  fine-grained as Metacat's.

Metacat developers plan to continue extending and improving Metacat's mapping
capabilities. If you are interested in contributing to those efforts, or if
you are interested in learning more about the architecture and future plans for
the mapping software, please contact the Metacat development
team (metacat-dev@ecoinformatics.org).

Installing and Configuring
--------------------------
The GeoServer webapp should be installed as a sibling of Metacat (i.e., deployed
as a separate webapp in the same servlet container). If you do NOT wish to run
GeoServer, the deployment can be skipped, but any skins that use maps will not
render correctly. (NOTE: GeoServer recommends a PermGen space setting of at
least 128MB, e.g., the JVM option ``-XX:MaxPermSize=128m``.)

Metacat comes with a pre-configured data directory to be used by GeoServer.
This includes a world-countries base layer and a default configuration that
is already aware of Metacat's spatial cache. The Metacat configuration interface
is used to configure GeoServer to use this shared data directory. To further
configure GeoServer, use the Web-based configuration utility,
which is available at http://your.server.com/context/geoserver.jsp
(e.g., http://knb.ecoinformatics.org/knb/geoserver.jsp).

Common configuration tasks include:

* Adding a Map to a Web Page or Skin
* Configuring the Size and Initial Extent of the Map
* Configuring the Layout of the HTML Mapping Interface
* Configuring the "Select Location" Drop-down Menu
* Configuring the Visual Portrayal of Geospatial Data (e.g., symbology and color)
* Adding Other Spatial Datasets to the Web Map

.. figure:: images/screenshots/image053.png
   :align: center

   GeoServer's Web-based administrative interface.

Note: Some settings may also need to be changed directly in GeoServer's XML
configuration files.

OpenLayers, which Metacat uses as the front-end for GeoServer's WMS service,
provides interface components or "widgets" (e.g., the map, a box zoom, layer
list, "Select Location" drop-down menu, scale bar, lat/long coordinates, and
a query form) that make it easy to deploy web-based mapping applications with
minimal coding.

Metacat uses three main files to customize the OpenLayers map interface.
The default versions are in::

  $METACAT/lib/style/common/spatial/

+----------------------------------+---------------+-------------------------------------------------------------+
| Document                         | File          | Description                                                 |
+==================================+===============+=============================================================+
| The named location file          | locations.jsp | The list of pre-defined locations (name and lat/lon bounds) |
+----------------------------------+---------------+-------------------------------------------------------------+
| Main map rendering functions     | map.js        | Defines the map, widgets and their behavior                 |
+----------------------------------+---------------+-------------------------------------------------------------+
| The rendered map and page layout | map.jsp       | Loads the map and controls the HTML layout of the widgets.  |
+----------------------------------+---------------+-------------------------------------------------------------+

NOTE: By default, the first time Metacat is restarted, it generates a
"spatial cache" containing geographic information about documents in its
repository. This default behavior is specified in ``lib/metacat.properties``,
where the ``regenerateCacheOnRestart`` parameter is set to true. The information
in the spatial cache is stored in a GIS-compatible format (the ESRI Shapefile)
and consists of the document name and its geographic coverage. When documents
are inserted, deleted, or updated in the Metacat repository, Metacat
automatically syncs the spatial cache to reflect the changes. Because
generating the cache can take a considerable amount of time (several minutes
in the case of a few thousand documents), Metacat resets the
``regenerateCacheOnRestart`` property to false after the spatial cache has been
generated. Note that if you upgrade or reinstall Metacat, the spatial cache
will be regenerated.
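To see whether the next restart will rebuild the spatial cache, you can check the flag in ``lib/metacat.properties``. The sketch below is a minimal illustration and is not part of Metacat. It assumes a plain ``key=value`` layout and ignores the continuation lines and escapes that full Java properties files allow. This section does not say whether the key carries a prefix, so the hypothetical helper ``will_regenerate_spatial_cache`` matches only the last dotted component of each key.

```python
def parse_properties(text):
    """Read simple key=value lines, skipping blank lines and #/! comments.

    Simplification: full Java .properties syntax (line continuations,
    escapes, ':' or whitespace separators) is not handled.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def will_regenerate_spatial_cache(props):
    """True if the regenerateCacheOnRestart flag is set to 'true'.

    Matches on the last dotted component of each key, because whether the
    property carries a prefix in metacat.properties is an assumption here.
    """
    for key, value in props.items():
        if key.rsplit(".", 1)[-1] == "regenerateCacheOnRestart":
            return value.lower() == "true"
    return False


if __name__ == "__main__":
    # Hypothetical excerpt; read the real lib/metacat.properties instead.
    sample = "# spatial settings\nregenerateCacheOnRestart=true\n"
    print(will_regenerate_spatial_cache(parse_properties(sample)))  # True
```

In practice, pass it the text of the deployed properties file. A result of ``False`` after the first restart is expected, because Metacat resets the flag once the cache is built.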

Adding a Map to a Web Page or Skin
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To add a map to a Web page, include the map interface using an iframe::

  <iframe scrolling="no" frameborder="0" width="780" height="420"
          src="/knb/style/common/spatial/map.jsp">
  </iframe>

The map URL, ``/knb/style/common/spatial/map.jsp``, is the default map
interface (``/knb`` is the Metacat context in these examples). If you plan to
customize the map interface, copy the ``map.jsp`` file into your skin's
directory (either the default or a customized skin directory)::

  cp style/common/spatial/map.jsp style/skins/<myskin>/spatial/

You can then access the customized map at ``/knb/style/skins/<myskin>/spatial/map.jsp``.

Configuring the Size and Initial Extent of the Map
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before you configure the size and initial extent of the map, make sure that you
have copied the map layout page into your skin's directory (see
:doc:`configuration` for directions). Once the file has been copied, you can
modify the map's initial extent in ``${skin.dir}/spatial/map.jsp``.

To change the map's initial extent, edit the bounding box passed to
``OpenLayers.Bounds`` (left, bottom, right, top). The default is to show the
entire globe. The ``initMap()`` function should also be given the skin name
(its third argument) so that spatial search results can be correctly styled.

::

  <script type="text/javascript">
      function init() {
          // initial extent: left, bottom, right, top (the whole globe by default)
          var bounds = new OpenLayers.Bounds(-180, -90, 180, 90);
          // make the map for this skin ("default" is the skin name)
          initMap("<%=GEOSERVER_URL%>", "<%=SERVLET_URL%>", "default", bounds);
      }
  </script>

The size (height/width) of the map can be controlled by the ``#map`` CSS entry
included in the ``map.jsp`` page.
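If your sites cluster in one region, you may want an initial extent smaller than the whole globe. The sketch below uses two hypothetical helpers that are not part of Metacat. ``union_bounds`` computes a box enclosing several site bounding boxes, and ``bounds_js`` prints the matching ``var bounds = ...`` line for ``init()``. It assumes the map uses longitude/latitude coordinates, so left, bottom, right and top mean west, south, east and north. It does not handle boxes that cross the 180th meridian.

```python
def union_bounds(boxes, margin=0.0):
    """Smallest (west, south, east, north) box that encloses every box.

    The result is padded by `margin` degrees and clamped to the globe.
    Boxes that cross the 180th meridian are not handled.
    """
    if not boxes:
        return (-180.0, -90.0, 180.0, 90.0)  # same as the default extent
    west = max(min(b[0] for b in boxes) - margin, -180.0)
    south = max(min(b[1] for b in boxes) - margin, -90.0)
    east = min(max(b[2] for b in boxes) + margin, 180.0)
    north = min(max(b[3] for b in boxes) + margin, 90.0)
    return (west, south, east, north)


def bounds_js(box):
    """Format a box as the `var bounds = ...` line used in init()."""
    west, south, east, north = box
    return f"var bounds = new OpenLayers.Bounds({west}, {south}, {east}, {north});"


if __name__ == "__main__":
    # Site boxes as (west, south, east, north); the second one is made up.
    sites = [(-149.725, 68.475, -149.3254, 68.725), (-120.0, 34.0, -119.0, 35.0)]
    print(bounds_js(union_bounds(sites, margin=1.0)))
```

The margin keeps sites on the edge of the box from sitting right against the map border.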

Configuring the Layout of the HTML Mapping Interface
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The layout of the HTML mapping interface can be edited in ``${skin.dir}/spatial/map.jsp``.

``map.jsp`` is a simple container that can be included in other, more complex
pages if desired. It contains the map, the widgets, and the location drop-down list.

Configuring the "Select Location" Drop-down Menu
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The locations that appear in the "Select Location" drop-down menu are specified
in the ``locations.jsp`` file, which can be copied from the common spatial
template into your skin directory. Each location is defined as an HTML
``<option>`` tag whose value holds the location's bounding box and whose text is
the label shown in the menu. Edit the value and label to change a location, or
add new ``<option>`` tags to add locations.

::

  <option value="-149.725,68.475 -149.3254,68.725">Arctic LTER (ARC)</option>
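The example value appears to hold two longitude,latitude corners: west,south followed by east,north. That reading is inferred from the Arctic LTER example and is not stated explicitly. Under that assumption, the sketch below uses two hypothetical helpers to generate ``<option>`` lines for ``locations.jsp`` from named boxes and to parse a value back. Labels are escaped so that characters such as ``&`` remain valid HTML.

```python
from html import escape


def location_option(label, west, south, east, north):
    """Build an <option> line in the format shown for locations.jsp.

    Assumes the value is 'west,south east,north' (two lon,lat corners).
    """
    value = f"{west},{south} {east},{north}"
    return f'<option value="{value}">{escape(label)}</option>'


def parse_location_value(value):
    """Inverse of location_option's value: returns (west, south, east, north)."""
    sw_corner, ne_corner = value.split()
    west, south = (float(v) for v in sw_corner.split(","))
    east, north = (float(v) for v in ne_corner.split(","))
    if west > east or south > north:
        raise ValueError(f"corners out of order: {value!r}")
    return west, south, east, north


if __name__ == "__main__":
    sites = {"Arctic LTER (ARC)": (-149.725, 68.475, -149.3254, 68.725)}
    for name, box in sites.items():
        print(location_option(name, *box))
```

Generating the list from a single table of sites keeps the labels and coordinates consistent when many locations are maintained.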

Configuring the Visual Portrayal of Geospatial Data (e.g., symbology and color)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Geospatial data sets are styled through the use of Styled Layer Descriptors
(SLDs). The default SLDs used for the data points and data bounding boxes are in
``/lib/spatial/geoserver/data/styles/`` and are named ``data_points_style.sld`` and
``data_bounds_style.sld``, respectively.

You can find a more detailed tutorial on using SLD with GeoServer in the GeoServer
documentation (http://docs.geoserver.org/).

Adding Other Spatial Datasets to the Web Map
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you have vector GIS data sets, such as weather or topographical information,
on your server that you'd like to include in the interactive map, you must
first register the data set with GeoServer. After the data set has been
registered, you can add the layer to the map. You can also add spatial layers
that have been made publicly available through WMS; instructions are given in
the next section.

To register the data set and add it to the map:

1. Point your browser to ``http://your.server/geoserver``, log in to GeoServer,
   and navigate to the "Data Stores" configuration page under ``Data > Stores``.
2. Create a new vector data source from a Shapefile in the "metacat" workspace.

   .. figure:: images/screenshots/image055.png
      :align: center

      Creating a new Shapefile data source using GeoServer's web-based administrative interface.

3. Optionally enter a Description, which is mostly used internally to give other
   administrators information about the data store, and then click Submit.
4. Navigate to the "Layers" configuration page under ``Data > Layers``.
   Add a new layer from your new data source.
5. Define a spatial reference system (SRS) for the new layer. Most
   latitude/longitude data uses EPSG:4326 (WGS 84). If your data is in another
   projection, determine its spatial reference system using the help links provided.

   .. figure:: images/screenshots/image057.png
      :align: center

      GeoServer's FeatureType configuration. The SRS settings discussed in step 5 are highlighted.

6. Style the layer using a style from the drop-down menu on the Publishing tab,
   or create a new SLD to define a new style object
   (this option provides more control over the style).
7. Try out the styled data set as a WMS layer using the Layer Preview.

   .. figure:: images/screenshots/image058.png
      :align: center

      GeoServer's Layer Preview allows you to see an OpenLayers rendering of the new layer.

8. Copy the default ``map.js`` file that assembles the map in OpenLayers
   (``style/common/spatial/map.js``) to your skin's spatial directory.
9. Edit the ``init()`` method to include your new layer in the map, either as an
   overlay or as a base layer.
10. Point your browser to the map interface. Your new layer should appear with
    the existing ones.

Adding External Spatial Data Made Publicly Available through WMS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
There are hundreds of sources of spatial data made publicly available
through WMS (check out http://wms-sites.com for a good catalog). To add these
data sources to your map, add the layers in your skin's ``spatial/map.js`` file.
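Before adding an external layer to ``map.js``, it can help to confirm that the remote server answers a plain WMS GetMap request. The sketch below builds such a URL with the standard WMS 1.1.1 parameters. The server address ``http://example.org/wms`` and the layer name ``topp:states`` are placeholders, not Metacat settings, and ``wms_getmap_url`` is a hypothetical helper. Version 1.1.1 is used because in WMS 1.3.0 the axis order for EPSG:4326 is latitude first, which is easy to get wrong.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url, layer, bbox, width=512, height=256,
                   srs="EPSG:4326", image_format="image/png"):
    """Build a WMS 1.1.1 GetMap URL for previewing one layer.

    bbox is (west, south, east, north) in the units of `srs`.
    Assumes base_url has no query string of its own.
    """
    west, south, east, north = bbox
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty = the server's default style
        "SRS": srs,
        "BBOX": f"{west},{south},{east},{north}",
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder server and layer name; substitute the real ones.
    print(wms_getmap_url("http://example.org/wms", "topp:states", (-180, -90, 180, 90)))
```

Open the printed URL in a browser. If an image comes back, the layer name and SRS are usable. An XML service exception usually names the parameter the server rejected.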

Spatial Queries
---------------
To find out which documents in the Metacat repository lie in a specified
geographic region, query the spatial cache using Metacat's ``spatial_query`` action.
Metacat can perform any query supported by the WFS/WMS standards.

An example of a spatial query string is::

  http://localhost/knb/metacat?action=spatial_query&xmin=-117.5&xmax=-64&ymin=3&ymax=46&skin=default

where ``xmin``, ``xmax``, ``ymin`` and ``ymax`` represent the western, eastern,
southern and northern bounding coordinates (the "bounding box"), respectively.
The spatial query action returns all documents that overlap or are
contained inside the specified bounding box. The result set is returned
as HTML using the style of the specified skin (in this example, ``default``).
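The query URL can also be built programmatically. The sketch below URL-encodes the parameters described above and assumes a Metacat servlet at the example address. The hypothetical ``overlaps`` helper illustrates the documented matching rule: a document matches if its bounding box overlaps the query box or lies inside it. It only mirrors that rule for illustration, since the real matching happens inside Metacat against the spatial cache. In this sketch a point location is a box whose west equals east and whose south equals north, and boxes that merely touch count as overlapping.

```python
from urllib.parse import urlencode


def spatial_query_url(metacat_url, west, east, south, north, skin="default"):
    """Build a spatial_query URL using the xmin/xmax/ymin/ymax parameters."""
    params = {
        "action": "spatial_query",
        "xmin": west,
        "xmax": east,
        "ymin": south,
        "ymax": north,
        "skin": skin,
    }
    return f"{metacat_url}?{urlencode(params)}"


def overlaps(box, query):
    """True if box (w, s, e, n) overlaps or lies inside query (w, s, e, n)."""
    w1, s1, e1, n1 = box
    w2, s2, e2, n2 = query
    return w1 <= e2 and w2 <= e1 and s1 <= n2 and s2 <= n1


if __name__ == "__main__":
    print(spatial_query_url("http://localhost/knb/metacat", -117.5, -64, 3, 46))
```

Run as a script, it reproduces the example query string shown above.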
docs/dev/metacat/source/harvester.rst
Harvester and Harvest List Editor
=================================

Metacat's Harvester is an optional feature that can be used to automatically
retrieve EML documents from one or more custom data management systems (e.g.,
SRB or PostgreSQL) and to insert (or update) those documents in the home
repository. The harvest sites control when they are harvested and which
documents are harvested.

For example, the Long Term Ecological Research Network (LTER) uses the Metacat
Harvester to create a centralized repository of data stored at twenty-six
different sites that store EML metadata but use different data management
systems. Once the data have been harvested and placed into a centralized
repository, they are replicated to the KNB network, exposing the information
to an even larger scientific community.

Once the Harvester is properly configured, listed documents are retrieved and
uploaded on a regularly scheduled basis. You must configure both the home
Metacat and the remote sites (aka the "harvest sites") before using this
feature. Each harvest site must also provide the Metacat server with a list of
documents that should be harvested.

Configuring Harvester
---------------------
Before you can use the Harvester to retrieve documents, you must configure the
feature using the settings in the metacat.properties file. Note that you must
also configure each site that the Harvester will connect to and retrieve
documents from (see "Configuring a Harvest Site" below for details).

The Harvester configuration information is managed in the metacat.properties
file, which is located at::

  <CONTEXT_DIR>/WEB-INF/metacat.properties

The Harvester properties are grouped together and begin after the comment line::

  # Harvester properties

To configure Harvester, edit metacat.properties and set appropriate values
for the harvesterAdministrator and smtpServer properties. You may also wish to
customize the other Harvester parameters, each of which is described in the table below.
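To illustrate how these settings fit together, the sketch below reads Harvester entries from a ``metacat.properties`` excerpt. It fills in the defaults listed in the properties table and splits ``harvesterAdministrator`` on commas or semicolons. It is a minimal sketch, not Metacat's own loader, and ``load_harvester_settings`` is a hypothetical name. It assumes every property uses ``=`` as the separator. It also wraps the text in a dummy ``[metacat]`` section so that Python's ``configparser`` will accept it.

```python
import configparser
import re

# Defaults as listed in the Harvester properties table.
HARVESTER_DEFAULTS = {
    "connectToMetacat": "true",
    "delay": "0",
    "logPeriod": "90",
    "maxHarvests": "0",
    "period": "24",
    "smtpServer": "localhost",
    "harvesterAdministrator": "",
}


def load_harvester_settings(text):
    """Parse Harvester key=value lines and apply the documented defaults.

    Assumes '=' separators only; Java .properties files also allow ':' and
    whitespace, which this sketch does not handle.
    """
    parser = configparser.ConfigParser(
        delimiters=("=",), interpolation=None, strict=False
    )
    parser.optionxform = str  # keep camelCase property names as written
    parser.read_string("[metacat]\n" + text)
    raw = dict(HARVESTER_DEFAULTS)
    raw.update(parser["metacat"])
    return {
        "connectToMetacat": raw["connectToMetacat"].strip().lower() == "true",
        "delay": int(raw["delay"]),
        "logPeriod": int(raw["logPeriod"]),
        "maxHarvests": int(raw["maxHarvests"]),
        "period": int(raw["period"]),
        "smtpServer": raw["smtpServer"],
        # Several administrators may be separated by commas or semicolons.
        "harvesterAdministrator": [
            addr.strip()
            for addr in re.split(r"[,;]", raw["harvesterAdministrator"])
            if addr.strip()
        ],
    }


if __name__ == "__main__":
    excerpt = "# Harvester properties\nharvesterAdministrator=name1@abc.edu,name2@abc.edu\n"
    print(load_harvester_settings(excerpt))
```

Properties missing from the excerpt fall back to the table's defaults. An empty ``harvesterAdministrator`` yields an empty list, which is a reminder that this property must be set.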

Harvester Properties and their Functions
----------------------------------------

+------------------------------------+-------------------------------------------------------------------------------------------------+
| Property                           | Description and Values                                                                          |
+====================================+=================================================================================================+
| connectToMetacat                   | Determines whether Harvester should connect to Metacat to upload retrieved documents.           |
|                                    | Set to true (the default) under most circumstances. To test whether Harvester can               |
|                                    | retrieve documents from a site without actually connecting to Metacat                           |
|                                    | to upload the documents, set the value to false.                                                |
|                                    |                                                                                                 |
|                                    | Values: true/false                                                                              |
+------------------------------------+-------------------------------------------------------------------------------------------------+
| delay                              | The number of hours that Harvester will wait before beginning its first harvest.                |
|                                    | For example, if Harvester is run at 1:00 p.m., and the delay is set to 12,                      |
|                                    | Harvester will begin its first harvest at 1:00 a.m.                                             |
|                                    |                                                                                                 |
|                                    | Default: 0                                                                                      |
+------------------------------------+-------------------------------------------------------------------------------------------------+
| harvesterAdministrator             | The email address of the Harvester Administrator. Harvester will send                           |
|                                    | email reports to this address after every harvest. Enter multiple email addresses by separating |
|                                    | each address with a comma or semicolon (e.g., name1@abc.edu,name2@abc.edu).                     |
|                                    |                                                                                                 |
|                                    | Values: An email address, or multiple email addresses separated by commas or semicolons         |
+------------------------------------+-------------------------------------------------------------------------------------------------+
| logPeriod                          | The number of days to retain Harvester log entries. Harvester log entries                       |
|                                    | record information such as which documents were harvested, from which sites,                    |
|                                    | and whether any errors were encountered during the harvest. Log entries older                   |
|                                    | than logPeriod number of days are purged from the database at the end of each harvest.          |
|                                    |                                                                                                 |
|                                    | Default: 90                                                                                     |
+------------------------------------+-------------------------------------------------------------------------------------------------+
| maxHarvests                        | The maximum number of harvests that Harvester should execute before                             |
|                                    | shutting down. If the value of maxHarvests is set to 0 or a                                     |
|                                    | negative number, Harvester will execute indefinitely.                                           |
|                                    |                                                                                                 |
|                                    | Default: 0                                                                                      |
+------------------------------------+-------------------------------------------------------------------------------------------------+
| period                             | The number of hours between harvests. Harvester will run a new harvest                          |
|                                    | every specified period of hours (either indefinitely or until the maximum                       |
|                                    | number of harvests has run, depending on the value of maxHarvests).                             |
|                                    |                                                                                                 |
|                                    | Default: 24                                                                                     |
+------------------------------------+-------------------------------------------------------------------------------------------------+
| smtpServer                         | The SMTP server that Harvester uses for sending email messages to the                           |
|                                    | Harvester Administrator and Site Contacts                                                       |
|                                    | (e.g., somehost.institution.edu). Note that the default value only works                        |
|                                    | if the Harvester host machine is configured as an SMTP server.                                  |
|                                    |                                                                                                 |
|                                    | Default: localhost                                                                              |
+------------------------------------+-------------------------------------------------------------------------------------------------+
| Harvester Operation Properties     | The Harvester Operation properties are used by Harvester to report information                  |
| (GetDocError, GetDocSuccess, etc.) | about performed operations for inclusion in log entries and email messages.                     |
|                                    | Under most circumstances the values of these properties should not be modified.                 |
+------------------------------------+-------------------------------------------------------------------------------------------------+
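The interaction of ``delay``, ``period``, ``maxHarvests`` and ``logPeriod`` can be modeled in a few lines. This sketch reproduces the semantics described in the table. The first harvest starts ``delay`` hours after startup and a new one follows every ``period`` hours, with no limit when ``maxHarvests`` is 0 or negative. Log entries older than ``logPeriod`` days become eligible for purging. The helper names are hypothetical, and the code is an illustration only, not the Harvester's actual scheduler.

```python
from datetime import datetime, timedelta
from itertools import islice


def harvest_times(start, delay_hours=0, period_hours=24, max_harvests=0):
    """Yield the start time of each scheduled harvest.

    The first harvest begins delay_hours after start; each later harvest
    begins period_hours after the previous one. A max_harvests value of 0
    or less means "run indefinitely", so the generator never ends.
    """
    when = start + timedelta(hours=delay_hours)
    count = 0
    while max_harvests <= 0 or count < max_harvests:
        yield when
        count += 1
        when += timedelta(hours=period_hours)


def purge_cutoff(now, log_period_days=90):
    """Log entries older than this moment would be purged (logPeriod)."""
    return now - timedelta(days=log_period_days)


if __name__ == "__main__":
    # The table's example: Harvester started at 1:00 p.m. with delay=12.
    started = datetime(2010, 6, 1, 13, 0)
    for t in islice(harvest_times(started, delay_hours=12), 3):
        print(t)
```

For the table's example, the first harvest printed is ``2010-06-02 01:00:00``, 1:00 a.m. on the following day. Because the default ``maxHarvests`` of 0 makes the generator infinite, the usage wraps it in ``islice``.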

Configuring a Harvest Site (Instructions for Site Contact)
----------------------------------------------------------

After Metacat's Harvester has been configured, remote sites can register and
send information about which files should be retrieved. Each remote site must
have a site contact who is responsible for registering the site and creating a
list of EML files to harvest (the "Harvest List"), as well as for reviewing
harvest reports. The site contact can unregister the site from the Harvester
at any time.

To use Harvester:

1. Register with Harvester
2. Compose a Harvest List (you will likely wish to use the Harvest List Editor)
3. Prepare your EML Documents for Harvest
4. Review the Harvester Reports

Register with Harvester
~~~~~~~~~~~~~~~~~~~~~~~

To register a remote site with Harvester, the Site Contact should log in to
Metacat's Harvester Registration page and enter information about the site and
how it should be harvested.

1. Using a Web browser, log in to Metacat's Harvester Registration page.
   The Harvester Registration page is inside the skins directory. For example,
   if the Metacat server that you wish to register with resides at the following URL:

   ::

     http://somehost.somelocation.edu:8080/knb/index.jsp

   then the Harvester Registration page would be accessed at:

   ::

     http://somehost.somelocation.edu:8080/knb/style/skins/knb/harvesterRegistrationLogin.jsp

   .. figure:: images/screenshots/image065.jpg
      :align: center

      Metacat's Harvester Registration page.

2. Enter your Metacat account information and click Submit to log in to your
   Metacat from the Harvester Registration page.

   Note: In some cases, you may need to log in to an anonymous "site" account
   rather than your personal account so that the registered data will not appear
   to have been registered by a single user. For example, an information
   manager (jones) who is registering data created by a team of scientists
   (jones, smith, and barney) from the Georgia Coastal Ecosystems site might
   log in to a dedicated account (named with the site's acronym, "GCE") to
   indicate that the registered data is from the entire site rather than "jones".

3. Enter information about your site and how often you want to schedule harvests,
   and then click the Register button. The Harvest List URL should point to the
   location of the Harvest List, which is an XML file that lists the documents
   to harvest. If you do not yet have a Harvest List, please see the next
   section for more information about creating one.

   .. figure:: images/screenshots/image067.jpg
      :align: center

      Enter information about your site and how often you want to schedule harvests.

The example settings in the previous figure instruct Harvester to harvest
documents from the site once every two weeks. The Harvester will access the
site's Harvest List at ``http://somehost.institution.edu/~myname/harvestList.xml``
and will send email reports to the Site Contact at
``myname@institution.edu``. Note that you can enter multiple email addresses by
separating each address with a comma or a semicolon, for example,
``myname@institution.edu,anothername@institution.edu``.

Compose a Harvest List (The Harvest List Editor)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Harvest List is an XML file that contains a list of documents to be harvested.
The list is created by the site contact and stored on the site contact's site
at the location specified during the Harvester registration process (see the
previous section for details). The list can be written by hand, or you can
use Metacat's Harvest List Editor to automatically generate and structure the
list to conform to the required XML schema (displayed in the figure at the end of
this section). This section describes the information required to build a
Harvest List and how to configure and use the Harvest List Editor.
Note that you must have a source distribution of Metacat in order to use the
Harvest List Editor.

The Harvest List contains information that helps Metacat identify and retrieve
each specified EML file. Each document in the list must be described with a
docid, documentType, and documentURL (see the table below).

189
Table: Information that must be included in the Harvest List about each EML file

+--------------+-------------------------------------------------------------------------------------------------+
| Item         | Description                                                                                     |
+==============+=================================================================================================+
| docid        | The docid uniquely identifies each EML document. Each docid consists of three elements:         |
|              |                                                                                                 |
|              | * ``scope``: The document group to which the document belongs.                                  |
|              | * ``identifier``: A number that uniquely identifies the document within the scope.              |
|              | * ``revision``: A number that indicates the current revision.                                   |
|              |                                                                                                 |
|              | For example, a valid docid could be: demoDocument.1.5, where demoDocument represents            |
|              | the scope, 1 the identifier, and 5 the revision number.                                         |
+--------------+-------------------------------------------------------------------------------------------------+
| documentType | The documentType identifies the type of document as EML,                                        |
|              | e.g., "eml://ecoinformatics.org/eml-2.0.0".                                                     |
+--------------+-------------------------------------------------------------------------------------------------+
| documentURL  | The documentURL specifies a place where Harvester can locate and retrieve the                   |
|              | document via HTTP. The Metacat Harvester must be given read access to the contents at this URL. |
|              | e.g. "http://www.lternet.edu/~dcosta/document1.xml".                                            |
+--------------+-------------------------------------------------------------------------------------------------+
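
The docid structure can be illustrated with a short Python sketch that splits a docid into its three parts and reassembles it. The names ``DocId``, ``parse_docid`` and ``format_docid`` are hypothetical. Splitting only on the last two dots is our own defensive assumption, not a documented Metacat rule.

```python
from typing import NamedTuple


class DocId(NamedTuple):
    scope: str
    identifier: int
    revision: int


def parse_docid(docid: str) -> DocId:
    """Split a docid such as 'demoDocument.1.5' into scope, identifier, revision.

    rsplit treats only the last two dots as separators.
    """
    parts = docid.rsplit(".", 2)
    if len(parts) != 3 or not parts[0]:
        raise ValueError(f"expected scope.identifier.revision, got {docid!r}")
    scope, identifier, revision = parts
    # int() raises ValueError if identifier or revision is not numeric.
    return DocId(scope, int(identifier), int(revision))


def format_docid(doc: DocId) -> str:
    return f"{doc.scope}.{doc.identifier}.{doc.revision}"


print(parse_docid("demoDocument.1.5"))
# DocId(scope='demoDocument', identifier=1, revision=5)
```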

The example Harvest List below contains two ``<document>`` elements that specify the
information Harvester needs to retrieve a pair of EML documents and
upload them to Metacat.

::

  <?xml version="1.0" encoding="UTF-8" ?>
  <!-- Example Harvest List -->
  <hrv:harvestList xmlns:hrv="eml://ecoinformatics.org/harvestList" >
    <document>
        <docid>
            <scope>demoDocument</scope>
            <identifier>1</identifier>
            <revision>5</revision>
        </docid>
        <documentType>eml://ecoinformatics.org/eml-2.0.0</documentType>
        <documentURL>http://www.lternet.edu/~dcosta/document1.xml</documentURL>
    </document>
    <document>
        <docid>
            <scope>demoDocument</scope>
            <identifier>2</identifier>
            <revision>1</revision>
        </docid>
        <documentType>eml://ecoinformatics.org/eml-2.0.0</documentType>
        <documentURL>http://www.lternet.edu/~dcosta/document2.xml</documentURL>
    </document>
  </hrv:harvestList>
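
To check what Harvester will see in a Harvest List, you can use a short Python sketch based on the standard library's ``xml.etree.ElementTree``. It lists each document's docid, type and URL. The function name ``read_harvest_list`` is hypothetical. The sketch relies only on the namespace shown in the example and on the ``<document>`` children being unqualified, as the schema at the end of this section specifies.

```python
import xml.etree.ElementTree as ET

HARVEST_NS = "eml://ecoinformatics.org/harvestList"


def read_harvest_list(xml_text: str) -> list[dict]:
    """Return one dict per <document> element of a Harvest List."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{HARVEST_NS}}}harvestList":
        raise ValueError(f"unexpected root element: {root.tag}")
    entries = []
    # <document> and its children carry no namespace prefix,
    # so they are found with plain (unqualified) paths.
    for doc in root.findall("document"):
        scope = doc.findtext("docid/scope")
        identifier = doc.findtext("docid/identifier")
        revision = doc.findtext("docid/revision")
        entries.append({
            "docid": f"{scope}.{identifier}.{revision}",
            "documentType": doc.findtext("documentType"),
            "documentURL": doc.findtext("documentURL"),
        })
    return entries
```

Like any XML parser, ElementTree rejects a file in which anything, including a comment, comes before the XML declaration.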

Rather than formatting the list by hand, you may wish to use Metacat's Harvest
List Editor to compose and edit it. The Harvest List Editor displays a Harvest
List as a table of rows and fields. Each table row corresponds to
a single ``<document>`` element in the Harvest List file (i.e., one
EML document). The row numbers are for visual reference only and cannot be edited.

To add a new document to the Harvest List, enter values for all five editable
fields (every field except "Row #"). A partially filled-in row causes errors
and results in an invalid Harvest List.
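
If you generate a Harvest List from a script instead of the Editor, you can enforce the same all-or-nothing rule for rows before anything is written. The Python sketch below is an illustration, not part of Metacat. The function name ``build_harvest_list`` and the row dictionary keys are hypothetical; they were chosen to mirror the five editable fields and the element names of the Harvest List schema.

```python
import xml.etree.ElementTree as ET

HARVEST_NS = "eml://ecoinformatics.org/harvestList"
# The five editable fields of an Editor row (every field except "Row #").
FIELDS = ("scope", "identifier", "revision", "documentType", "documentURL")

ET.register_namespace("hrv", HARVEST_NS)


def build_harvest_list(rows: list[dict]) -> bytes:
    """Serialize editor-style rows into Harvest List XML (UTF-8 bytes).

    A row with any empty or missing field raises ValueError rather than
    producing an invalid Harvest List.
    """
    root = ET.Element(f"{{{HARVEST_NS}}}harvestList")
    for number, row in enumerate(rows, start=1):
        missing = [f for f in FIELDS if row.get(f) is None or not str(row[f]).strip()]
        if missing:
            raise ValueError(f"row {number} is missing: {', '.join(missing)}")
        # <document> and its children are unqualified, as in the schema.
        doc = ET.SubElement(root, "document")
        docid = ET.SubElement(doc, "docid")
        for part in ("scope", "identifier", "revision"):
            ET.SubElement(docid, part).text = str(row[part])
        ET.SubElement(doc, "documentType").text = str(row["documentType"])
        ET.SubElement(doc, "documentURL").text = str(row["documentURL"])
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
```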

You can use the Cut, Copy, and Paste buttons at the bottom of the Editor to move
rows from one location to another: select a row and click the desired button.
To paste the default values (which are specified in the Editor's configuration
file, discussed later in this section) into the currently selected row, click
the Paste Defaults button. Note: Only one row can be selected at a time, so all
cut, copy, and paste operations work on a single row rather than on a range
of rows.

To run the Harvest List Editor on the machine where the Metacat
source code is installed:

1. Open a system command window or terminal window.
2. Set the ``METACAT_HOME`` environment variable to the Metacat
   installation directory. For example:

   On Windows:

   ::

     set METACAT_HOME=C:\somePath\knb

   On Linux/Unix (bash shell):

   ::

     export METACAT_HOME=/home/somePath/metacat

3. Change to the following directory:

   On Windows:

   ::

     cd %METACAT_HOME%\lib\harvester

   On Linux/Unix:

   ::

     cd $METACAT_HOME/lib/harvester

4. Run the Harvest List Editor script for your operating system:

   On Windows:

   ::

     runHarvestListEditor.bat

   On Linux/Unix:

   ::

     sh runHarvestListEditor.sh

   The Harvest List Editor will open.
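
You can also script steps 2 through 4. The Python sketch below is a hypothetical helper and is not part of the Metacat distribution. It assumes the directory layout and script names given in the steps above, and it runs the Windows batch file through ``cmd /c``.

```python
import os
import subprocess
from pathlib import Path


def editor_command(metacat_home: str, windows: bool) -> tuple[list[str], Path]:
    """Return the command and working directory used to start the editor."""
    workdir = Path(metacat_home) / "lib" / "harvester"
    if windows:
        return ["cmd", "/c", "runHarvestListEditor.bat"], workdir
    return ["sh", "runHarvestListEditor.sh"], workdir


def launch_editor(metacat_home: str) -> subprocess.Popen:
    """Start the editor with METACAT_HOME set in its environment."""
    command, workdir = editor_command(metacat_home, windows=(os.name == "nt"))
    env = dict(os.environ, METACAT_HOME=metacat_home)
    return subprocess.Popen(command, cwd=workdir, env=env)
```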

If you would like to customize the Harvest List Editor (e.g., specify a
default list to open automatically whenever the Editor starts, or
default field values), create a file called ``.harvestListEditor`` (note the
leading dot character). Use a plain text editor to create the file and place
it in the Site Contact's home directory. To determine the home directory, open
a system command window or terminal window and type the following:

On Windows:

::

  echo %USERPROFILE%

On Linux/Unix:

::

  echo $HOME

The configuration file contains a number of optional properties that can make
the Editor more convenient to use. A sample configuration file is shown below,
and each configuration property is described in the table that follows.

A sample .harvestListEditor configuration file

::

  defaultHarvestList=C:/temp/harvestList.xml
  defaultScope=demo_document
  defaultIdentifier=1
  defaultRevision=1
  defaultDocumentURL=http://www.lternet.edu/~dcosta/
  defaultDocumentType=eml://ecoinformatics.org/eml-2.0.0

Harvest List Editor Configuration Properties

+---------------------+----------------------------------------------------------------------------------------------+
| Property            | Description                                                                                  |
+=====================+==============================================================================================+
| defaultHarvestList  | The location of a Harvest List file that the Editor will                                     |
|                     | automatically open for editing on startup. Set this property                                 |
|                     | to the path to the Harvest List file that you expect to edit most frequently.                |
|                     |                                                                                              |
|                     | Examples:                                                                                    |
|                     | ``/home/jdoe/public_html/harvestList.xml``                                                   |
|                     | ``C:/temp/harvestList.xml``                                                                  |
+---------------------+----------------------------------------------------------------------------------------------+
| defaultScope        | The value pasted into the Editor's Scope field when the Paste                                |
|                     | Defaults button is clicked. The Scope field should contain                                   |
|                     | a symbolic identifier that indicates the family of documents                                 |
|                     | to which the EML document belongs.                                                           |
|                     |                                                                                              |
|                     | Example:   xyz_dataset                                                                       |
|                     | Default:    dataset                                                                          |
+---------------------+----------------------------------------------------------------------------------------------+
| defaultIdentifier   | The value pasted into the Editor's Identifier field when the                                 |
|                     | Paste Defaults button is clicked. The Identifier field should contain                        |
|                     | a numeric value indicating the identifier for this particular EML document within the Scope. |
+---------------------+----------------------------------------------------------------------------------------------+
| defaultRevision     | The value pasted into the Editor's Revision field when the Paste Defaults button             |
|                     | is clicked. The Revision field should contain a numeric value indicating the                 |
|                     | revision number of this EML document within the Scope and Identifier.                        |
|                     |                                                                                              |
|                     | Example:   2                                                                                 |
|                     | Default:    1                                                                                |
+---------------------+----------------------------------------------------------------------------------------------+
| defaultDocumentType | The document type specification pasted into the                                              |
|                     | Editor's DocumentType field when the Paste Defaults button is clicked.                       |
|                     |                                                                                              |
|                     | Default: ``eml://ecoinformatics.org/eml-2.0.0``                                              |
+---------------------+----------------------------------------------------------------------------------------------+
| defaultDocumentURL  | The URL or partial URL pasted into the Editor's URL field                                    |
|                     | when the Paste Defaults button is clicked. Typically, this                                   |
|                     | value is set to the portion of the URL shared by all harvested EML documents.                |
|                     |                                                                                              |
|                     | Example:                                                                                     |
|                     | ``http://somehost.institution.edu/somepath/``                                                |
|                     | Default: ``http://``                                                                         |
+---------------------+----------------------------------------------------------------------------------------------+
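
The Python sketch below reads ``.harvestListEditor`` from the home directory and fills in the defaults listed in the table. ``Path.home()`` normally resolves to the same location as ``%USERPROFILE%`` or ``$HOME``. The sketch is an illustration only, and the function names are hypothetical. It assumes plain ``key=value`` lines like the sample above and does not implement the full Java properties syntax (escapes, line continuations).

```python
from pathlib import Path

# Defaults listed in the configuration property table; properties without a
# documented default (defaultHarvestList, defaultIdentifier) are left unset.
DOCUMENTED_DEFAULTS = {
    "defaultScope": "dataset",
    "defaultRevision": "1",
    "defaultDocumentType": "eml://ecoinformatics.org/eml-2.0.0",
    "defaultDocumentURL": "http://",
}


def parse_editor_config(text: str) -> dict[str, str]:
    """Parse key=value lines, skipping blank lines and '#' or '!' comments."""
    settings = dict(DOCUMENTED_DEFAULTS)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")  # split on the first '=' only
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def load_editor_config() -> dict[str, str]:
    """Read ~/.harvestListEditor if it exists, otherwise return the defaults."""
    path = Path.home() / ".harvestListEditor"
    if path.is_file():
        return parse_editor_config(path.read_text(encoding="utf-8"))
    return dict(DOCUMENTED_DEFAULTS)
```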

XML Schema for Harvest Lists

::

  <?xml version="1.0" encoding="UTF-8"?>
  <!-- edited with XMLSPY v5 rel. 4 U (http://www.xmlspy.com) by Matt Jones (NCEAS) -->
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:hrv="eml://ecoinformatics.org/harvestList" xmlns="eml://ecoinformatics.org/harvestList" targetNamespace="eml://ecoinformatics.org/harvestList" elementFormDefault="unqualified" attributeFormDefault="unqualified">
  <xs:annotation>
    <xs:documentation>This module defines the required information for the harvester to collect documents from the local site. The local system containing this document must give the Metacat Harvester read access to this document.</xs:documentation>
  </xs:annotation>
  <xs:annotation>
    <xs:appinfo>
      <tooltip/>
      <summary/>
      <description/>
    </xs:appinfo>
  </xs:annotation>
  <xs:element name="harvestList">
    <xs:annotation>
      <xs:documentation>This represents the local document information that is used to inform the Harvester of the docid, document type, and location of the document to be harvested.</xs:documentation>
    </xs:annotation>
    <xs:complexType>
      <xs:sequence>
        <xs:element name="document" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="docid">
                <xs:annotation>
                  <xs:documentation>The complete document identifier to be used by metacat.  The docid is a compound element that gives a scope for the identifier, an integer local identifier that is unique within that scope, and a revision.  Each revision is assumed to specify a unique, non-changing document, so once a particular revision is harvested, there is no need for it to be harvested again.  To trigger a harvest of a document that has been updated, increment the revision number for that identifier.</xs:documentation>
                </xs:annotation>
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="scope" type="xs:string">
                      <xs:annotation>
                        <xs:documentation>The system prefix of a metacat docid that defines the scope within which the identifier is unique.</xs:documentation>
                      </xs:annotation>
                    </xs:element>
                    <xs:element name="identifier" type="xs:long">
                      <xs:annotation>
                        <xs:documentation>The local (site specific) portion of the identifier (docid) that is unique within the context of the scope.</xs:documentation>
                      </xs:annotation>
                    </xs:element>
                    <xs:element name="revision" type="xs:long">
                      <xs:annotation>
                        <xs:documentation>The revision identifier for this document, indicating a unique document version.</xs:documentation>
                      </xs:annotation>
                    </xs:element>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
              <xs:element name="documentType" type="xs:string">
                <xs:annotation>
                  <xs:documentation>The type of document to be harvested, indicated by a namespace string, formal public identifier, mime type, or other type indicator.</xs:documentation>
                </xs:annotation>
              </xs:element>
              <xs:element name="documentURL" type="xs:anyURI">
                <xs:annotation>
                  <xs:documentation>The documentURL field contains the URL of the document to be harvested. The Metacat Harvester must be given read access to the contents at this URL.</xs:documentation>
                </xs:annotation>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  </xs:schema>
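
The ``docid`` documentation in the schema notes that a revision is not harvested again once it has been harvested, so an updated document needs a higher revision number. The Python sketch below increments the ``<revision>`` of one entry in a Harvest List. The function name ``bump_revision`` is hypothetical. Note that ``xml.etree.ElementTree`` discards XML comments when it parses, so any comments in the list are not preserved.

```python
import xml.etree.ElementTree as ET

HARVEST_NS = "eml://ecoinformatics.org/harvestList"
ET.register_namespace("hrv", HARVEST_NS)  # keep the hrv: prefix on output


def bump_revision(xml_text: str, scope: str, identifier: int) -> bytes:
    """Increment <revision> for one scope/identifier pair in a Harvest List."""
    root = ET.fromstring(xml_text)
    for docid in root.iter("docid"):
        if (docid.findtext("scope") == scope
                and int(docid.findtext("identifier")) == identifier):
            revision = docid.find("revision")
            revision.text = str(int(revision.text) + 1)
            return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
    raise KeyError(f"{scope}.{identifier} is not in the Harvest List")
```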

Prepare EML Documents for Harvest
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To prepare a set of EML documents for harvest, ensure that the following is true for each document:

* The document contains valid EML.
* The document is specified in a ``<document>`` element in the site's Harvest List.
* The file resides at the location specified by its URL in the Harvest List.
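
You can spot-check the URL-related conditions before a scheduled harvest. The Python sketch below fetches each URL and confirms that the content is well-formed XML. It does not validate against the EML schema, so it cannot confirm the first condition. The function names and the injectable ``fetcher`` parameter are hypothetical conveniences for testing.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download a document over HTTP(S)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def check_documents(urls: list[str],
                    fetcher: Callable[[str], bytes] = fetch) -> dict[str, str]:
    """Map each problematic URL to a short description of the problem.

    Only retrievability and XML well-formedness are checked; EML schema
    validity is not.
    """
    problems: dict[str, str] = {}
    for url in urls:
        try:
            content = fetcher(url)
        except OSError as exc:  # urllib.error.URLError subclasses OSError
            problems[url] = f"cannot retrieve: {exc}"
            continue
        try:
            ET.fromstring(content)
        except ET.ParseError as exc:
            problems[url] = f"not well-formed XML: {exc}"
    return problems
```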

Review Harvester Reports
~~~~~~~~~~~~~~~~~~~~~~~~
Harvester sends an email report to the Site Contact after every scheduled site
harvest. The report describes the operations performed, such
as which EML documents were harvested and whether any errors were encountered.
An operation with a status value of 1 encountered an error; a status
value of 0 indicates that the operation completed successfully.

When errors are reported, the Site Contact should try to determine whether the
source of the error is something that can be corrected at the site. Common
causes of errors include:

* a document URL specified in the Harvest List does not match the location of the actual EML file on the disk
* the Harvest List does not contain valid XML as specified in the harvestList.xsd schema
* the URL to the Harvest List (specified during registration) does not match the actual location of the Harvest List on the disk
* an EML document that Harvester attempted to upload to Metacat does not contain valid EML

If the Site Contact cannot determine the cause of an error and how to
resolve it, he or she should contact the Harvester Administrator for assistance.

Unregister with Harvester
~~~~~~~~~~~~~~~~~~~~~~~~~
To discontinue harvests, the Site Contact must unregister with Harvester.
To unregister:

1. Using a Web browser, log in to Metacat's Harvester Registration page.
   The Harvester Registration page is inside the skins directory. For example,
   if the Metacat server that you registered with resides at the
   following URL:

   ::

     http://somehost.somelocation.edu:8080/knb/index.jsp

   then the Harvester Registration page would be accessed at:

   ::

     http://somehost.somelocation.edu:8080/knb/style/skins/knb/harvesterRegistrationLogin.html

2. Enter and submit your Metacat account information. On the subsequent screen,
   click Unregister to remove your site and discontinue harvests.
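
The registration page address follows from the Metacat URL by a relative-path substitution. The Python sketch below derives it with ``urllib.parse.urljoin``. The ``skin`` parameter and the function name are hypothetical. The sketch assumes that the Metacat URL ends in a page name (such as ``index.jsp``) or a trailing slash, because ``urljoin`` replaces the last path segment of the base URL.

```python
from urllib.parse import urljoin


def registration_login_url(metacat_url: str, skin: str = "knb") -> str:
    """Derive the Harvester Registration login page from a Metacat page URL."""
    return urljoin(metacat_url, f"style/skins/{skin}/harvesterRegistrationLogin.html")


print(registration_login_url("http://somehost.somelocation.edu:8080/knb/index.jsp"))
# http://somehost.somelocation.edu:8080/knb/style/skins/knb/harvesterRegistrationLogin.html
```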

Running Harvester
-----------------
Harvester can be run as a servlet or in a command window. Under most
circumstances, Harvester is best run continuously as a background servlet
process. However, if you expect to use Harvester infrequently, or if you only
want to test that Harvester is functioning, it may be more convenient to run it
from a command window.

Running Harvester as a Servlet
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To run Harvester as a servlet (from a source code installation):

1. Remove the comment symbols around the HarvesterServlet entry in the source
   code. The HarvesterServlet entry is located in the ``lib/web.xml.tomcatN``
   file, where N corresponds to the version of Tomcat you are running.
   For example, if you are running Tomcat 6, edit the file ``lib/web.xml.tomcat6``.

   ::

     <!--
     <servlet>
       <servlet-name>HarvesterServlet</servlet-name>
       <servlet-class>edu.ucsb.nceas.metacat.harvesterClient.HarvesterServlet</servlet-class>
       <init-param>
         <param-name>debug</param-name>
         <param-value>1</param-value>
       </init-param>
       <init-param>
         <param-name>listings</param-name>
         <param-value>true</param-value>
       </init-param>
       <load-on-startup>1</load-on-startup>
     </servlet>
     -->

2. Save the edited file.
3. Shut down Tomcat.
4. Redeploy Metacat by running the following two Ant commands from the
   top-level directory of your Metacat installation:

   ::

     ant cleanweb
     ant install

5. Restart Tomcat. Note that you will have to edit the ``metacat.properties``
   file to specify Harvester settings.
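
You can also do step 1 with a script, for example when preparing several installations. The Python sketch below removes the comment markers only when they directly wrap the HarvesterServlet ``<servlet>`` entry, as in the snippet shown in step 1. It is a hypothetical helper and is not part of Metacat. Editing the file by hand works just as well.

```python
import re
from pathlib import Path

# One XML comment whose body starts with the HarvesterServlet <servlet> entry.
_COMMENTED_SERVLET = re.compile(
    r"<!--(\s*<servlet>\s*<servlet-name>HarvesterServlet</servlet-name>.*?</servlet>\s*)-->",
    re.DOTALL,
)


def enable_harvester_servlet(web_xml: str) -> str:
    """Remove the comment markers around the HarvesterServlet entry."""
    new_text, count = _COMMENTED_SERVLET.subn(r"\1", web_xml, count=1)
    if count == 0:
        raise ValueError("no commented-out HarvesterServlet entry found")
    return new_text


def enable_in_file(path: str) -> None:
    """Apply enable_harvester_servlet to a file such as lib/web.xml.tomcat6."""
    file = Path(path)
    file.write_text(enable_harvester_servlet(file.read_text(encoding="utf-8")),
                    encoding="utf-8")
```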

About thirty seconds after you restart Tomcat, the Harvester servlet will
start executing. The first harvest will occur after the number of hours
specified in the ``metacat.properties`` file. The servlet will continue running
new harvests until the maximum number of harvests has been completed or until
Tomcat shuts down (the harvest frequency and maximum number of harvests are also
set in the Harvester properties).
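
The timing described above can be modeled as a simple schedule. In the Python sketch below, the delay, the period and the maximum number of harvests stand in for the corresponding Harvester settings in ``metacat.properties``. The exact property names are not given here, and the function name is hypothetical. The sketch ignores the startup delay and the time each harvest takes.

```python
from datetime import datetime, timedelta


def harvest_times(start: datetime, delay_hours: float, period_hours: float,
                  max_harvests: int) -> list[datetime]:
    """Return the times at which harvests would run.

    The first harvest runs delay_hours after start; each later harvest runs
    period_hours after the previous one, until max_harvests have been run.
    """
    first = start + timedelta(hours=delay_hours)
    return [first + timedelta(hours=period_hours * n) for n in range(max_harvests)]


# A two-week frequency, starting one hour after 09:00 on 1 March 2010.
for when in harvest_times(datetime(2010, 3, 1, 9, 0), delay_hours=1,
                          period_hours=24 * 14, max_harvests=3):
    print(when.isoformat())
# 2010-03-01T10:00:00
# 2010-03-15T10:00:00
# 2010-03-29T10:00:00
```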

Running Harvester in a Command Window
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To run Harvester in a command window:

1. Open a system command window or terminal window.
2. Set the ``METACAT_HOME`` environment variable to the
   Metacat installation directory.

   On Windows:

   ::

     set METACAT_HOME=C:\somePath\metacat

   On Linux/Unix (bash shell):

   ::

     export METACAT_HOME=/home/somePath/metacat

3. Change to the following directory:

   On Windows:

   ::

     cd %METACAT_HOME%\lib\harvester

   On Linux/Unix:

   ::

     cd $METACAT_HOME/lib/harvester

4. Run the Harvester script for your operating system:

   On Windows:

   ::

     runHarvester.bat

   On Linux/Unix:

   ::

     sh runHarvester.sh

The Harvester application will start executing. The first harvest will occur
after the number of hours specified in the ``metacat.properties`` file. The
application will continue running new harvests until the maximum number of
harvests has been completed or until you interrupt the process by pressing
Ctrl+C in the command window (the harvest frequency and maximum number of
harvests are also set in the Harvester properties).

Reviewing Harvest Reports
-------------------------
Harvester sends an email report to the Harvester Administrator after every
harvest. The report describes the operations performed, such as which sites
and which EML documents were harvested, and whether any errors were
encountered. An operation with a status value of 1 encountered an error; a
status value of 0 indicates that the operation completed successfully.

The Harvester Administrator should review the report, paying particularly
close attention to any reported errors and accompanying error messages. When
errors are reported at a particular site, the Harvester Administrator should
contact the Site Contact to determine the source of the error and its
resolution. Common causes of errors include:

* a document URL specified in the Harvest List does not match the location of the actual EML file on the disk
* the Harvest List does not contain valid XML as specified in the harvestList.xsd schema
* the URL to the Harvest List (specified during registration) does not match the actual location of the Harvest List on the disk
* an EML document that Harvester attempted to upload to Metacat does not contain valid EML

Errors that are independent of a particular site may indicate a problem with
Harvester itself, Metacat, or the database connection. Refer to the error
message to determine the source of the error and its resolution.

docs/dev/metacat/source/identifiers.rst
.. raw:: latex

  \newpage

Identifier Management
=====================

.. index:: Identifiers

Author
  Matthew B. Jones

Date
  - 20100301 [MBJ] Initial draft of Identifier documentation

Goal
  Extend Metacat to support identifiers with arbitrary syntax

Summary
  Metacat currently supports identifier strings called 'docids' that have
  the syntax 'scope.object.revision', such as 'foo.34.1' (we will refer to
  these as 'LocalIDs'). We now want Metacat to support identifiers that are
  arbitrary strings, but still enforce uniqueness and proper revision
  handling (we will refer to these as GUIDs). Metacat must be able to accept
  these strings as identifiers for all CRUD operations and reference them
  in search results.

  
Identifier Resolution
---------------------
Because Metacat uses LocalIDs throughout the code to refer to objects,
and because a LocalID has a constrained structure that encodes revision
semantics in the identifier, it is difficult to replace it wholesale with
less-constrained string identifiers without rewriting much of Metacat.
Instead, our strategy is to wrap the Metacat APIs with an
identifier resolution layer that keeps track of the unconstrained GUIDs and
maps them to the constrained local identifiers used internally within
Metacat. Figure 1 shows the basic identifier table model. Figure 2 shows the
basic strategy for retrieving an object, Figure 3 for creating an object,
Figure 4 for updating an object, and Figure 5 for deleting an object.


Identifier Table Structure
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. figure:: images/identifiers.png

   Figure 1. Table structure for identifiers.

..
  This block defines the table structure diagram referenced above.
  @startuml images/identifiers.png

  identifiers "*" -- "1" xml_documents

  identifiers : String identifier
  identifiers : String docid
  identifiers : Integer rev

  xml_documents : String docid
  xml_documents : String rev

  note right of identifiers
    "identifiers.(docid,rev) is a foreign key into xml_documents"
  end note
  @enduml
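
You can try out the table model in Figure 1 with SQLite from Python. In the sketch below, ``identifier`` is the unconstrained GUID and ``(docid, rev)`` references a row in ``xml_documents``, so several GUIDs may point to one document version. Using ``INTEGER`` for ``rev`` in both tables is our own assumption (the diagram lists ``xml_documents.rev`` as a String). The schema and the ``open_db`` and ``local_id`` names are a minimal illustration, not Metacat's actual DDL.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE xml_documents (
    docid TEXT NOT NULL,
    rev   INTEGER NOT NULL,
    PRIMARY KEY (docid, rev)
);
CREATE TABLE identifiers (
    identifier TEXT PRIMARY KEY,   -- the unconstrained GUID
    docid      TEXT NOT NULL,
    rev        INTEGER NOT NULL,
    FOREIGN KEY (docid, rev) REFERENCES xml_documents (docid, rev)
);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def local_id(conn: sqlite3.Connection, guid: str) -> Optional[str]:
    """Resolve a GUID to its 'docid.rev' LocalID, or None if it is unknown."""
    row = conn.execute(
        "SELECT docid, rev FROM identifiers WHERE identifier = ?", (guid,)
    ).fetchone()
    return None if row is None else f"{row[0]}.{row[1]}"
```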

.. raw:: latex

  \newpage

.. raw:: pdf

  PageBreak


Handling document read operations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

An overview of the process needed to read an object using a GUID.


.. figure:: images/guid_read.png

   Figure 2. Basic handling for string identifiers (GUIDs) as mapped to
   docids (LocalIDs) to retrieve an object.

..
  @startuml images/guid_read.png
  !include plantuml.conf
  actor User
  participant "Client" as app_client << Application >>
  participant "CRUD API" as c_crud << MetacatRestServlet >>
  participant "Identifier Manager" as ident_man << IdentifierManager >>
  participant "Handler" as handler << MetacatHandler >>
  User -> app_client
  app_client -> c_crud: get(token, GUID)
  c_crud -> ident_man: getLocalID(GUID)
  c_crud <-- ident_man: localID
  c_crud -> handler: handleReadAction(localID)
  c_crud <-- handler: object
  c_crud --> app_client: object

  @enduml

Handling document create operations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

An overview of the process needed to create an object using a GUID.

.. figure:: images/guid_insert.png

   Figure 3. Basic handling for string identifiers (GUIDs) as mapped to
   docids (LocalIDs) to create an object.

..
  @startuml images/guid_insert.png
  !include plantuml.conf
  actor User
  participant "Client" as app_client << Application >>
  participant "CRUD API" as c_crud << MetacatRestServlet >>
  participant "Identifier Manager" as ident_man << IdentifierManager >>
  participant "Handler" as handler << MetacatHandler >>
  User -> app_client
  app_client -> c_crud: create(token, GUID, object, sysmeta)
  c_crud -> ident_man: identifierExists(GUID)
  c_crud <-- ident_man: T or F
  alt identifierExists == "F"
      c_crud -> ident_man: mapToLocalId(GUID)
      c_crud <-- ident_man: localID
      c_crud -> handler: handleInsertAction(localID)
      c_crud <-- handler: success
      note right of c_crud
        "Also need to address how to handle the sysmeta information wrt insertion methods"
      end note
      app_client <-- c_crud: success
  else identifierExists == "T"
      app_client <-- c_crud: IdentifierNotUnique
  end
  @enduml

Handling document update operations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

An overview of the process needed to update an object using a GUID.

.. figure:: images/guid_update.png

   Figure 4. Basic handling for string identifiers (GUIDs) as mapped to
   docids (LocalIDs) to update an object.

..
  @startuml images/guid_update.png
  !include plantuml.conf
  actor User
  participant "Client" as app_client << Application >>
  participant "CRUD API" as c_crud << MetacatRestServlet >>
  participant "Identifier Manager" as ident_man << IdentifierManager >>
  participant "Handler" as handler << MetacatHandler >>
  User -> app_client
  app_client -> c_crud: update(token, GUID, object, obsoletedGUID, sysmeta)

  c_crud -> ident_man: identifierExists(obsoletedGUID)
  c_crud <-- ident_man: T or F
  alt identifierExists == "T"

      c_crud -> ident_man: identifierExists(GUID)
      c_crud <-- ident_man: T or F
      alt identifierExists == "F"
          c_crud -> ident_man: mapToLocalId(GUID, obsoletedGUID)
          c_crud <-- ident_man: localID
          c_crud -> handler: handleUpdateAction(localID)
          c_crud <-- handler: success
          note right of c_crud
            "Also need to address how to handle the sysmeta information wrt update methods"
          end note
          app_client <-- c_crud: success
      else identifierExists == "T"
          app_client <-- c_crud: IdentifierNotUnique
      end
  else identifierExists == "F"
      app_client <-- c_crud: NotFound
  end
  @enduml

Handling document delete operations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

An overview of the process needed to delete an object using a GUID.

.. figure:: images/guid_delete.png

   Figure 5. Basic handling for string identifiers (GUIDs) as mapped to
   docids (LocalIDs) to delete an object.

..
  @startuml images/guid_delete.png
  !include plantuml.conf
  actor User
  participant "Client" as app_client << Application >>
  participant "CRUD API" as c_crud << MetacatRestServlet >>
  participant "Identifier Manager" as ident_man << IdentifierManager >>
  participant "Handler" as handler << MetacatHandler >>
  User -> app_client
  app_client -> c_crud: delete(token, GUID)
  c_crud -> ident_man: identifierExists(GUID)
  c_crud <-- ident_man: T or F
  alt identifierExists == "T"
      c_crud -> ident_man: mapToLocalId(GUID)
      c_crud <-- ident_man: localID
      c_crud -> handler: handleDeleteAction(localID)
      c_crud <-- handler: success
      app_client <-- c_crud: success
  else identifierExists == "F"
      app_client <-- c_crud: NotFound
  end
  @enduml
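
The read, create and update flows in Figures 2 through 4 can be summarized by an in-memory identifier manager. The Python sketch below mirrors the diagram operations (``identifierExists``, ``getLocalID``, ``mapToLocalId``) in snake_case. It is not Metacat's Java ``IdentifierManager``. The ``autogen`` scope for new objects, and the rule that an update takes the obsoleted object's docid with the next revision number, are assumptions made for illustration. As in Figure 4, the update path checks the obsoleted GUID (``NotFound``) before it checks that the new GUID is unique (``IdentifierNotUnique``).

```python
from itertools import count
from typing import Dict, Optional


class NotFound(Exception):
    """Raised when a GUID has no mapping."""


class IdentifierNotUnique(Exception):
    """Raised when a GUID is already mapped."""


class IdentifierManager:
    """In-memory sketch of the GUID-to-LocalID mapping."""

    def __init__(self, scope: str = "autogen") -> None:
        self._scope = scope          # hypothetical scope for new LocalIDs
        self._next_id = count(1)     # next free identifier within the scope
        self._local: Dict[str, str] = {}

    def identifier_exists(self, guid: str) -> bool:
        return guid in self._local

    def get_local_id(self, guid: str) -> str:
        """Resolve an existing GUID to its LocalID (read flow)."""
        if guid not in self._local:
            raise NotFound(guid)
        return self._local[guid]

    def map_to_local_id(self, guid: str, obsoleted_guid: Optional[str] = None) -> str:
        """Create flow (no obsoleted_guid) or update flow (with obsoleted_guid)."""
        new_local = None
        if obsoleted_guid is not None:
            # Update: the obsoleted object must exist, otherwise NotFound.
            docid, rev = self.get_local_id(obsoleted_guid).rsplit(".", 1)
            new_local = f"{docid}.{int(rev) + 1}"
        if self.identifier_exists(guid):
            raise IdentifierNotUnique(guid)
        if new_local is None:
            # Create: allocate a fresh identifier at revision 1.
            new_local = f"{self._scope}.{next(self._next_id)}.1"
        self._local[guid] = new_local
        return new_local


manager = IdentifierManager()
print(manager.map_to_local_id("urn:uuid:a"))                               # autogen.1.1
print(manager.map_to_local_id("urn:uuid:b", obsoleted_guid="urn:uuid:a"))  # autogen.1.2
print(manager.get_local_id("urn:uuid:b"))                                  # autogen.1.2
```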
219

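The two branches of the delete diagram reduce to a lookup followed by a dispatch. The Python sketch below models them with hypothetical in-memory stand-ins (`IdentifierManager`, `Handler`, `delete`, `NotFound`); these names are illustrative and are not Metacat's Java classes. The `token` argument is accepted only to mirror `delete(token, GUID)`; authorization is not modeled.

```python
class NotFound(Exception):
    """The GUID is not registered with the Identifier Manager."""


class IdentifierManager:
    """Hypothetical in-memory stand-in for the Identifier Manager."""

    def __init__(self, mappings=None):
        # GUID -> LocalID (docid)
        self._guid_to_local = dict(mappings or {})

    def identifier_exists(self, guid):
        return guid in self._guid_to_local

    def map_to_local_id(self, guid):
        return self._guid_to_local[guid]


class Handler:
    """Hypothetical stand-in for MetacatHandler; tracks deleted docids."""

    def __init__(self):
        self.deleted = []

    def handle_delete_action(self, local_id):
        self.deleted.append(local_id)
        return "success"


def delete(ident_man, handler, token, guid):
    """Follow the two branches of the delete sequence diagram.

    `token` is not checked here; authorization is outside this sketch.
    """
    if not ident_man.identifier_exists(guid):
        raise NotFound(guid)  # identifierExists == "F"
    local_id = ident_man.map_to_local_id(guid)
    return handler.handle_delete_action(local_id)
```

Resolving the GUID to a docid before calling the handler keeps the handler working purely in terms of LocalIDs, which is the separation the diagram draws between the CRUD API and MetacatHandler.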
  

..
  This block defines the interaction diagram referenced above.
  startuml images/01_interaction.png
    !include plantuml.conf
    actor User
    participant "Client" as app_client << Application >>
    User -> app_client

    participant "CRUD API" as c_crud << Coordinating Node >>
    activate c_crud
    app_client -> c_crud: resolve(GUID, auth_token)
    participant "Authorization API" as c_authorize << Coordinating Node >>
    c_crud -> c_authorize: isAuth(auth_token, GUID)
    participant "Verify API" as c_ver << Coordinating Node >>
    c_authorize -> c_ver: isValidToken (token)
    c_authorize <-- c_ver: T or F
    c_crud <-- c_authorize: T or F
    app_client <-- c_crud: handle_list
    deactivate c_crud

    participant "CRUD API" as m_crud << Member Node >>
    activate m_crud
    app_client -> m_crud: get(auth_token, handle)
    participant "Server Authentication API" as m_authenticate << Member Node >>
    m_crud -> m_authenticate: isAuth(auth_token, GUID)
    m_crud <-- m_authenticate: T or F
    m_crud -> m_crud: log(get, UserID, GUID)
    app_client <-- m_crud: object or unauth or doesNotExist
    deactivate m_crud
  enduml

  
docs/dev/metacat/source/conf.py
# -*- coding: utf-8 -*-
#
# Metacat documentation build configuration file, created by
# sphinx-quickstart on Mon Mar  1 14:16:16 2010.
#
# This file is execfile()d with the current directory set to its containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.

import sys, os

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#sys.path.append(os.path.abspath('.'))

# -- General configuration -----------------------------------------------------

# Add any Sphinx extension module names here, as strings. They can be extensions
# coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
extensions = []

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

# The suffix of source filenames.
source_suffix = '.rst'

# The encoding of source files.
#source_encoding = 'utf-8'

# The master toctree document.
master_doc = 'index'

# General information about the project.
project = u'Metacat'
copyright = u'2012, Regents of the University of California'

# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
version = '2.0'
# The full version, including alpha/beta/rc tags.
release = '2.0.0'

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#language = None

# There are two options for replacing |today|: either, you set today to some
# non-false value, then it is used:
#today = ''
# Else, today_fmt is used as the format for a strftime call.
#today_fmt = '%B %d, %Y'

# List of documents that shouldn't be included in the build.
#unused_docs = []

# List of directories, relative to source directory, that shouldn't be searched
# for source files.
exclude_trees = []

# The reST default role (used for this markup: `text`) to use for all documents.
#default_role = None

# If true, '()' will be appended to :func: etc. cross-reference text.
#add_function_parentheses = True

# If true, the current module name will be prepended to all description
# unit titles (such as .. function::).
#add_module_names = True

# If true, sectionauthor and moduleauthor directives will be shown in the
# output. They are ignored by default.
#show_authors = False

# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'

# A list of ignored prefixes for module index sorting.
#modindex_common_prefix = []


# -- Options for HTML output ---------------------------------------------------

# The theme to use for HTML and HTML Help pages.  Major themes that come with
# Sphinx are currently 'default' and 'sphinxdoc'.
#html_theme = 'default'
html_theme = 'readable'

# Theme options are theme-specific and customize the look and feel of a theme
# further.  For a list of options available for each theme, see the
# documentation.
#html_theme_options = {}

# Add any paths that contain custom themes here, relative to this directory.
html_theme_path = ['themes',]

# The name for this set of Sphinx documents.  If None, it defaults to
# "<project> v<release> documentation".
#html_title = None

# A shorter title for the navigation bar.  Default is the same as html_title.
#html_short_title = None

# The name of an image file (relative to this directory) to place at the top
# of the sidebar.
#html_logo = None

# The name of an image file (within the static path) to use as favicon of the
# docs.  This file should be a Windows icon file (.ico) being 16x16 or 32x32
# pixels large.
#html_favicon = None

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']

# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
# using the given strftime format.
#html_last_updated_fmt = '%b %d, %Y'

# If true, SmartyPants will be used to convert quotes and dashes to
# typographically correct entities.
#html_use_smartypants = True

# Custom sidebar templates, maps document names to template names.
#html_sidebars = {}

# Additional templates that should be rendered to pages, maps page names to
# template names.
#html_additional_pages = {}

# If false, no module index is generated.
html_use_modindex = False
html_domain_indices = False

# If false, no index is generated.
#html_use_index = True

# If true, the index is split into individual pages for each letter.
#html_split_index = False

# If true, links to the reST sources are added to the pages.
#html_show_sourcelink = True

# If true, an OpenSearch description file will be output, and all pages will
# contain a <link> tag referring to it.  The value of this option must be the
# base URL from which the finished HTML is served.
#html_use_opensearch = ''

# If nonempty, this is the file name suffix for HTML files (e.g. ".xhtml").
#html_file_suffix = ''

# Output file base name for HTML help builder.
htmlhelp_basename = 'Metacatdoc'


# -- Options for LaTeX output --------------------------------------------------

# The paper size ('letter' or 'a4').
#latex_paper_size = 'letter'

# The font size ('10pt', '11pt' or '12pt').
#latex_font_size = '10pt'

# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title, author, documentclass [howto/manual]).
latex_documents = [
  ('index', 'Metacat.tex', u'Metacat Documentation',
   u'Matthew B. Jones', 'manual'),
]

# The name of an image file (relative to this directory) to place at the top of
# the title page.
#latex_logo = None

# For "manual" documents, if this is true, then toplevel headings are parts,
# not chapters.
#latex_use_parts = False

# Additional stuff for the LaTeX preamble.
#latex_preamble = ''

# Documents to append as an appendix to all manuals.
#latex_appendices = []

# If false, no module index is generated.
#latex_use_modindex = True
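The header of conf.py notes that the file is "execfile()d with the current directory set to its containing dir", which is why its top-level assignments become build settings. The Python sketch below is a simplified, hypothetical model of that loading step (`load_conf` is an illustrative name, not a Sphinx API); Sphinx's own loader does more, for example injecting a `tags` object into the namespace.

```python
import os
import types


def load_conf(conf_path):
    """Execute a Sphinx-style conf.py and return its configuration values.

    Simplified model: the file is run as Python with the working directory
    temporarily switched to the directory that contains it, and every
    top-level name it defines becomes a setting.
    """
    conf_path = os.path.abspath(conf_path)
    namespace = {"__file__": conf_path}
    saved_cwd = os.getcwd()
    os.chdir(os.path.dirname(conf_path))
    try:
        with open(conf_path, encoding="utf-8") as f:
            code = compile(f.read(), conf_path, "exec")
        exec(code, namespace)
    finally:
        # Always restore the caller's working directory.
        os.chdir(saved_cwd)
    # Keep plain settings; drop dunder names and imported modules (sys, os).
    return {
        name: value
        for name, value in namespace.items()
        if not name.startswith("__") and not isinstance(value, types.ModuleType)
    }
```

Run against this conf.py, such a loader would yield entries like `project`, `version`, `html_theme`, and `latex_documents`, while the `import sys, os` line contributes nothing because module objects are filtered out.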
docs/dev/metacat/source/event-logging.rst
Event Logging
=============

Metacat keeps an internal log of events (such as insertions, updates, deletes,
and reads) that can be accessed with the getlog action. Using getlog, event
reports can be output from Metacat in XML format and optionally restricted to
certain events: those from a particular IP address, user, or event type, or
those that occurred after a specified start date or before an end date.

The following URL returns the basic log, an XML-formatted log of all events
since the log was initiated::

  http://some.metacat.host/context/metacat?action=getlog

Note that you must be logged in to Metacat through the HTTP interface;
otherwise, the request returns an error message. For more information about
logging in, see Logging In with the HTTP Interface.

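Each `logEntry` in a getlog response carries the child elements shown in the example log in this section (`entryid`, `ipAddress`, `principal`, `docid`, `event`, `dateLogged`). The Python sketch below parses such a response with the standard library's `xml.etree.ElementTree` and filters the entries on the client. The helpers `parse_event_log` and `filter_entries` are hypothetical, not part of Metacat: the server-side filters described above exist, but their query parameter names are not given here, so the sketch does not build filtered URLs, and fetching the log (which requires a logged-in session) is left out.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Child element names taken from the example log in this section.
FIELDS = ("entryid", "ipAddress", "principal", "docid", "event", "dateLogged")


def parse_event_log(xml_text):
    """Turn a getlog XML response into a list of dicts, one per logEntry."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.iter("logEntry"):
        record = {name: (entry.findtext(name) or "").strip() for name in FIELDS}
        # Assumes every entry has a dateLogged such as "2004-09-08 19:08:18.16";
        # strptime's %f accepts the 1 to 6 fractional digits.
        record["dateLogged"] = datetime.strptime(
            record["dateLogged"], "%Y-%m-%d %H:%M:%S.%f"
        )
        entries.append(record)
    return entries


def filter_entries(entries, event=None, principal=None, start=None, end=None):
    """Client-side filtering by event type, user, and date range."""
    result = []
    for e in entries:
        if event is not None and e["event"] != event:
            continue
        if principal is not None and e["principal"] != principal:
            continue
        if start is not None and e["dateLogged"] < start:
            continue
        if end is not None and e["dateLogged"] > end:
            continue
        result.append(e)
    return result
```

Applied to the two entries in the example log, `filter_entries(entries, event="insert", start=datetime(2004, 9, 10))` keeps only entry 47, the insert of `esa.3.1` on 2004-09-14.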
  
::

  <?xml version="1.0"?>
  <!-- Example of XML Log -->
  <log>
    <logEntry>
      <entryid>44</entryid>
      <ipAddress>34.237.20.142</ipAddress>
      <principal>uid=jones,o=NCEAS,dc=ecoinformatics,dc=org</principal>
      <docid>esa.2.1</docid>
      <event>insert</event>
      <dateLogged>2004-09-08 19:08:18.16</dateLogged>
    </logEntry>
    <logEntry>
      <entryid>47</entryid>
      <ipAddress>34.237.20.142</ipAddress>
      <principal>uid=jones,o=NCEAS,dc=ecoinformatics,dc=org</principal>
      <docid>esa.3.1</docid>
      <event>insert</event>
      <dateLogged>2004-09-14 19:50:40.61</dateLogged>
    </logEntry>
... This diff was truncated because it exceeds the maximum size that can be displayed.
