Metacat Spatial Option
Although the spatial option is included with a Metacat installation (beginning with Metacat version 1.7.0), it is an extension to Metacat's functionality and its use is optional.
Term | Definition |
Spatial Cache | A cached version of the Metacat documents representing their geographic coverages in a GIS-compatible data format, the ESRI Shapefile. |
Web Mapping Service (WMS) | A standard interface specification for requesting spatial data as a web-deliverable map image. WMS servers accept a common set of parameters via http, render the spatial dataset into an appropriate image and deliver it back to the client. The WMS spec was developed by the Open Geospatial Consortium. |
Bounding Box (BBOX) | Two pairs of geographic coordinates representing the full geographic extent of an entity: the minimum lat/long (the lower-left corner) and the maximum lat/long (the upper-right corner). |
Spatial Dataset | A collection of spatial features in a common datastore. |
Spatial Features | Analogous to a "row" in a tabular dataset, a feature is an entity composed of both tabular attributes and a spatial geometry. |
Spatial Geometry | The geometry is a vector representation of an entity's geographic location. This can be a point (a single vertex), a line (a series of vertices) or a polygon (a series of vertices forming a closed area). |
Multi-Geometry | A spatial geometry represented by one or more geometry primitives (points, lines and polygons). For example, a single species census (the spatial feature) might have multiple sample sites and could be represented as a multi-point geometry. |
The Spatial Harvester component syncs the metacat database with the spatial cache (an ESRI shapefile which contains the geographic coverages of the documents).
The Spatial Harvester is implemented entirely in Java using the Geotools library, which allows manipulation of spatial datasets. In rough terms, a spatial dataset is a collection of Features, each of which comprises a geometry (i.e. the geographic coverage) and associated attributes (e.g. the document's title).
There are a number of Java classes which, collectively, make up the spatial harvester functionality. They are found in the edu.ucsb.nceas.metacat.spatial package:
The spatial cache currently represents the geographic coverage of XML documents based on a bounding box. The four bounding coordinates (either latitudes or longitudes) can be specified in the metacat.properties file by their xpaths. For example, the geographic coverage of EML documents is defined as:
westBoundingCoordinatePath=geographicCoverage/boundingCoordinates/westBoundingCoordinate
eastBoundingCoordinatePath=geographicCoverage/boundingCoordinates/eastBoundingCoordinate
southBoundingCoordinatePath=geographicCoverage/boundingCoordinates/southBoundingCoordinate
northBoundingCoordinatePath=geographicCoverage/boundingCoordinates/northBoundingCoordinate
It is important to note that, at the moment, only one set of xpaths is defined in metacat.properties, meaning that only documents of the chosen schema can be accessed by the spatial harvester. Also note that, for performance reasons, the xpaths to the bounding coordinates must also appear in your indexPath (defined in build.properties).
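As a rough illustration of how these paths locate the coordinates, the following Python sketch applies them to a hand-written, heavily abbreviated EML-like fragment. The fragment, the extract_bbox helper and the use of ElementTree are assumptions made for illustration only; Metacat itself resolves the paths in Java. The sketch reads only the first geographic coverage it finds.

```python
import xml.etree.ElementTree as ET

# Paths as configured in metacat.properties for EML documents.
BBOX_PATHS = {
    "west": "geographicCoverage/boundingCoordinates/westBoundingCoordinate",
    "east": "geographicCoverage/boundingCoordinates/eastBoundingCoordinate",
    "south": "geographicCoverage/boundingCoordinates/southBoundingCoordinate",
    "north": "geographicCoverage/boundingCoordinates/northBoundingCoordinate",
}

# Hypothetical, abbreviated document; real EML contains many more elements.
SAMPLE = """<dataset><coverage><geographicCoverage><boundingCoordinates>
  <westBoundingCoordinate>-120.5</westBoundingCoordinate>
  <eastBoundingCoordinate>-119.0</eastBoundingCoordinate>
  <northBoundingCoordinate>35.5</northBoundingCoordinate>
  <southBoundingCoordinate>34.0</southBoundingCoordinate>
</boundingCoordinates></geographicCoverage></coverage></dataset>"""


def extract_bbox(xml_text):
    """Return the first bounding box as a dict, or None if any side is missing."""
    root = ET.fromstring(xml_text)
    bbox = {}
    for side, path in BBOX_PATHS.items():
        node = root.find(".//" + path)  # search at any depth below the root
        if node is None or node.text is None:
            return None
        bbox[side] = float(node.text)
    return bbox


sample_bbox = extract_bbox(SAMPLE)
```

A document without bounding coordinates yields None here, mirroring the rule that such documents are left out of the spatial cache.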
The bounding coordinates are spatially cached in two ways: as the centroid(s) of the bounding box(es) and as the actual bounding box(es). These are stored as two separate shapefiles with multi-point and multi-polygon geometry types respectively. By default, ${tomcat_webapps_directory}/${context}/data/metacat_shps/data_points.shp is the storage location of the point cache, while data_bounds.shp represents the polygon cache.
The bounding polygon is not relevant to every document, as bounding coordinates are allowed to be of zero area (i.e. west == east and north == south). In this case they are represented only as a point. In cases where no bounding coordinates are defined, the document is not represented at all in the spatial cache. Note that special care has been taken to account for cases where the bounding box crosses the international dateline or polar regions (at which point Cartesian calculations are invalid).
Because documents may have more than one geographic coverage, it is necessary to define the two spatial caches as multi-point and multi-polygon geometry types. This means that each feature's geometry field can contain a collection of one or more primitive geometries.
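The relationship between coverages, centroids and polygons can be sketched in a few lines of Python. This is a simplified illustration rather than the harvester's Java/Geotools code: the build_cache_geometries name is hypothetical, coordinates are plain tuples, and bounding boxes that cross the dateline or the poles are deliberately not handled.

```python
def build_cache_geometries(coverages):
    """Turn a document's coverages into (multi_point, multi_polygon).

    Each coverage is a (west, east, south, north) tuple. Every coverage
    contributes a centroid; only coverages with a non-zero extent
    contribute a polygon ring.
    """
    points, polygons = [], []
    for west, east, south, north in coverages:
        points.append(((west + east) / 2.0, (south + north) / 2.0))
        if west == east and south == north:
            continue  # zero-area coverage: represented as a point only
        # Closed ring: the first and last vertices are identical.
        polygons.append([(west, south), (west, north), (east, north),
                         (east, south), (west, south)])
    return points, polygons


# One document with a real box and a zero-area coverage (made-up values).
points, polygons = build_cache_geometries(
    [(-120.0, -118.0, 34.0, 36.0), (10.0, 10.0, 5.0, 5.0)]
)
```

Here the document ends up with two centroids in the point cache but only one ring in the polygon cache.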
With the spatial option properly installed, the default metacat.properties setting is to set regenerateCacheOnRestart=true. This is very useful the first time you install metacat since it will generate the spatial cache from scratch when your servlet container is restarted. Depending on how many documents you have in your metacat database, this can take a considerable amount of time; several minutes in the case of a few thousand documents. For this reason, Metacat sets this property to false after the spatial cache has been generated the first time. This prevents the regeneration of the spatial cache every time you restart your servlet container. Note that if you upgrade or reinstall metacat, the spatial cache will be regenerated again.
Once the spatial cache has been generated, the Metacat servlet will keep the spatial cache in sync with the metacat database by triggering the spatial harvester on every insert, update or delete. This does not regenerate the whole spatial cache; instead, it simply updates features in the cache as needed. It is fairly quick and should not add more than half a second to any given transaction. As mentioned earlier, all high-level interactions with the spatial cache are handled through the SpatialHarvester class.
There is one very important note about document authentication. While metacat provides very fine-grained permissions control at the document level, the Web Mapping Server component does not. For this reason, only documents that are publicly readable (i.e. documents which match the following SQL query: select distinct docid from xml_access where principal_name = 'public' and perm_type = 'allow') will be added to the spatial cache. In the Future Directions section of this document, the potential for adding feature-level permissions to the WMS server is discussed.
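To make the effect of that query concrete, the sketch below runs it against a small in-memory SQLite table. The three-column table and its rows are assumptions for illustration; the real xml_access table has more columns and lives in Metacat's own database, not in SQLite.

```python
import sqlite3

# The query quoted in the text above, unchanged.
PUBLIC_DOCS_SQL = (
    "select distinct docid from xml_access "
    "where principal_name = 'public' and perm_type = 'allow'"
)

conn = sqlite3.connect(":memory:")
# Reduced, hypothetical stand-in for Metacat's xml_access table.
conn.execute(
    "create table xml_access (docid text, principal_name text, perm_type text)"
)
conn.executemany(
    "insert into xml_access values (?, ?, ?)",
    [
        ("demo.1", "public", "allow"),
        ("demo.1", "public", "allow"),     # duplicate rule, collapsed by DISTINCT
        ("demo.2", "uid=jones", "allow"),  # readable by one user only
        ("demo.3", "public", "deny"),      # explicitly denied to the public
    ],
)

public_docids = sorted(row[0] for row in conn.execute(PUBLIC_DOCS_SQL))
conn.close()
```

Only demo.1 qualifies in this made-up data, so only that document would be eligible for the spatial cache.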
The primary function of the Web Mapping Server component is to render the spatial cache as a web-deliverable map image. It is also responsible for rendering other geographic data to provide base maps or other auxiliary map layers.
The Open Geospatial Consortium (OGC) has defined a standard for requesting maps, the Web Mapping Service or WMS standard. WMS servers accept a common set of parameters via HTTP, render the spatial dataset into an appropriate image and deliver it back to the client. For Metacat, we chose to go with GeoServer, a WMS-capable application written in Java.
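For readers unfamiliar with WMS, the following Python sketch assembles a GetMap request from the standard WMS 1.1.1 parameters. The endpoint URL and layer name are placeholders, not values taken from a Metacat installation, and the getmap_url helper is hypothetical.

```python
from urllib.parse import urlencode


def getmap_url(base_url, layer, bbox, width=600, height=400):
    """Build a WMS 1.1.1 GetMap URL; bbox is (minx, miny, maxx, maxy)."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",          # empty value requests the layer's default style
        "SRS": "EPSG:4326",    # plain longitude/latitude coordinates
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return base_url + "?" + urlencode(params)


# Placeholder endpoint and layer name, for illustration only.
example_map_url = getmap_url(
    "http://localhost/geoserver/wms", "example_layer", (-117.5, 3, -64, 46)
)
```

Note that urlencode percent-encodes the commas in BBOX; WMS servers decode these as part of normal query-string handling.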
Integration with Metacat Context
Configuration
Web Interface
Supports many vector data formats
Outputs Images (mainly) but can be used to output GML and KML.
Known issues (rasters, startup procedure, difficult for novice to configure, maven build, size)
Displaying the spatial cache as a map is important but users also need to query the spatial cache in order to answer the question "What documents lie in this geographic region?". The functionality is invoked through the metacat servlet itself; there is a spatial_query action for this purpose. An example spatial query would be:
http://localhost/knb/metacat?action=spatial_query&xmin=-117.5&xmax=-64&ymin=3&ymax=46&skin=default
Here xmin, xmax, ymin and ymax represent the western, eastern, southern and northern bounding coordinates respectively. The request returns an HTML document listing (in the style of the specified skin) all documents whose geographic coverage intersects the given bounding box.
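A client can assemble such a request programmatically. The sketch below rebuilds the example URL in Python; the spatial_query_url helper is hypothetical, while the parameter names are those shown in the example above.

```python
from urllib.parse import urlencode


def spatial_query_url(metacat_url, west, east, south, north, skin="default"):
    """Build a spatial_query request for the given bounding coordinates."""
    params = {
        "action": "spatial_query",
        "xmin": west,
        "xmax": east,
        "ymin": south,
        "ymax": north,
        "skin": skin,
    }
    return metacat_url + "?" + urlencode(params)


query_url = spatial_query_url("http://localhost/knb/metacat", -117.5, -64, 3, 46)
```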
The core functionality of the spatial query mechanism is found in the edu.ucsb.nceas.metacat.spatial.SpatialQuery class and, like the spatial harvester, relies heavily on the Geotools library. This class has a single method, filterByBbox(), which compares the bounding box to both the point and polygon cache. For each shapefile, the process requires two steps: first, filter the spatial cache for features whose bounding box overlaps the specified bounding coordinates; second, iterate through the remaining features and perform an actual geometric intersection. The second step, though more costly than comparing the bbox, is necessary because the feature's geometry may be a multi-geometry whose bounding box is large but whose component primitive geometries are scattered over that area. The end result is a vector of docids matching the spatial query.
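The two-step idea can be sketched independently of Geotools. The Python below is a simplified stand-in covering only a point cache: the filter_by_bbox function, the dictionary layout and the sample docids are assumptions, not the SpatialQuery implementation.

```python
def boxes_overlap(a, b):
    """True if boxes a and b, each (xmin, ymin, xmax, ymax), overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def envelope(points):
    """Bounding box of a multi-point geometry."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def filter_by_bbox(features, query):
    """Return docids whose multi-point geometry has a point inside query.

    features maps docid -> list of (x, y); query is (xmin, ymin, xmax, ymax).
    """
    hits = []
    for docid, points in features.items():
        # Step 1: cheap rejection using the envelope of the whole geometry.
        if not boxes_overlap(envelope(points), query):
            continue
        # Step 2: exact test against each primitive in the multi-geometry.
        if any(query[0] <= x <= query[2] and query[1] <= y <= query[3]
               for x, y in points):
            hits.append(docid)
    return hits


# Made-up cache contents: a.1 has a large envelope but scattered points.
cache = {"a.1": [(0, 0), (10, 10)], "b.1": [(5, 5)], "c.1": [(50, 50)]}
matches = filter_by_bbox(cache, (4, 4, 6, 6))
```

Document a.1 passes the envelope test but is rejected in the second step, which is exactly the case the exact intersection exists to catch.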
This docid list is then sent to DBQuery. Using a special constructor that takes a vector of docids, the DBQuery class is able to use the Docid override mechanism to perform an optimized query (for cases where the list of docids is already known).
To provide a web-based user interface to the WMS and the spatial query functionality, Metacat relies on Community Mapbuilder. Mapbuilder is a pure HTML/JavaScript application which uses AJAX and XSLT on the client side to create a desktop-GIS-like environment for interacting with geographic data through a web browser.
Consumes WMS services, defined through WMC
Map interface components (map, box zooms, layer list, "select location" dropdown, scalebar, coordinates, info query)
Widget Architecture, Metacat config files, html divs
AOIMetacat Query
Integration with skins
When first installing a version of metacat with the spatial option, you'll want to ensure that install.spatial is true in build.properties. You'll also want to ensure that runSpatialOption is true in metacat.properties. Both of these values are the default so, unless you explicitly set them to be false, the spatial option should install and run automatically.
How do I configure the layout of the html mapping interface?
How can I configure the initial extent of the map?
How do I configure the "select location" dropdown to contain different predefined locations?
Can I use a different web mapping interface?
How do I configure the styling and classification of the data?
How can I upgrade/change the version of geoserver?
What versions of tomcat are supported?
The spatial functionality has only been tested on tomcat 5.
How do I add the spatial functionality to a metacat skin?
When users put spatial data into the Morpho system, it would be nice if we could automatically pull all the available metadata from the spatial dataset itself.
On the metacat side, it might be worth trying to auto-detect spatial datasets and add them to the WMS service so that they could be displayed along with the metadata coverages. This is tricky since the styling of spatial data is intentionally separated from the data itself; we'd have to have some sort of easy way to prompt the user for the classification and styling info and construct the appropriate SLDs.
It's worth noting that, currently, one could do this manually. There is nothing, aside from editing a few configuration files, to prevent any Geotools-supported dataset from being displayed through the WMS map interface.
For vector datasets, it would be possible to store the data directly in the database itself (this is a logical extension of the future work to put tabular data directly in a relational database). PostgreSQL has the PostGIS extensions to handle this, so we would have to require PostgreSQL if we went this route.
Filter which spatial cache features are displayed by access constraints, skin constraints and the current non-spatial query set. This would involve intercepting incoming WMS requests and appending a styled layer descriptor (SLD) with an OGC filter to prevent/allow certain docids.
Closely related to the WMS bypass implementation, the SLD factory would be in charge of constructing the filter based on the constraints mentioned above. Because it would have to generate this list of docids on every WMS request, performance is a big concern. Likely we'll need to cache docid lists as session variables.
Geoserver currently offers a nice web-based configuration but it is lacking a few key features and may be difficult for a novice GIS user. We may want to reinvent a custom geoserver configuration interface to
Ideally we could pull as much information as possible from the metadata and make the UI very intuitive. This does bring up issues of web-based admin access constraints and developing a subsystem to handle who has edit access to the map configuration.
Harvester must be configured to interact with a working Metacat installation. Thus, a Metacat installation that has been properly configured and installed is a pre-requisite to running Harvester. Additionally, Harvester has a number of settable properties that control its behavior. All Harvester configuration information is managed in a single file, metacat.properties, located at:
METACAT_HOME/lib/metacat.properties
where METACAT_HOME is the top-level directory that Metacat is installed in.
Harvester properties are grouped together in metacat.properties, beginning after the comment line:
# Harvester properties
The Harvester Administrator should edit metacat.properties, setting appropriate values for the harvesterAdministrator property, the smtpServer property, and possibly other properties. The following table is a summary of each property and its function.
Property | Description | Possible or default value |
connectToMetacat | This property determines whether Harvester should connect to Metacat to upload documents. It should be set to true under most circumstances. Setting this property to false can be useful for testing whether Harvester is able to retrieve documents from a site without actually connecting to Metacat to upload the documents. | true or false. Default: true |
delay | The number of hours that Harvester will wait before beginning its first harvest. For example, if Harvester is run at 1:00 p.m., and the delay is set to 12, Harvester will begin its first harvest at 1:00 a.m. | Default: 0 |
harvesterAdministrator | The email address of the Harvester Administrator. Harvester will send email reports to this address after every harvest. You may enter multiple email addresses by separating each address with a comma or semicolon, for example, "name1@abc.edu,name2@abc.edu". | An email address, or multiple email addresses separated by commas or semi-colons |
logPeriod | The number of days that Harvester should retain log entries of harvest operations in the database. Harvester log entries record information such as which documents were harvested, from which sites, and whether any errors were encountered during the harvest. Log entries older than logPeriod number of days are purged from the database at the end of each harvest. | Default: 90 |
maxHarvests | The maximum number of harvests that Harvester should execute before shutting down. When the Harvester program is executed, it will continue running until it has executed maxHarvests number of harvests and then the program will terminate. If the value of maxHarvests is set to 0 or a negative number, it will be ignored and Harvester will execute indefinitely. | Default: 0 |
period | The number of hours between harvests. Harvester will run a new harvest every period number of hours, until the maxHarvests number of harvests have been run, or indefinitely if maxHarvests is set to a value of 0 or a negative number. | Default: 24 |
smtpServer | The SMTP server that Harvester uses for sending email messages to the Harvester Administrator and to Site Contacts. | A host name, for example: somehost.institution.edu. Default: localhost. Note that the default value will only work if the Harvester host machine has been configured as an SMTP server. |
Harvester Operation Properties (GetDocError, GetDocSuccess, etc.) | This group of properties is used by Harvester to report information about the operations it performs for inclusion in log entries and email messages. Under most circumstances the values of these properties should not be modified. |
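The defaults listed in the table can be summarized in a short Python sketch that reads Harvester settings from properties-style text and falls back to the documented defaults. This is only an illustration: Harvester itself reads metacat.properties in Java, the read_harvester_properties helper is hypothetical, and the parser handles only simple key=value lines (no line continuations or other Java properties syntax).

```python
# Defaults as documented in the table above; harvesterAdministrator has none.
HARVESTER_DEFAULTS = {
    "connectToMetacat": "true",
    "delay": "0",
    "logPeriod": "90",
    "maxHarvests": "0",
    "period": "24",
    "smtpServer": "localhost",
}
KNOWN_KEYS = set(HARVESTER_DEFAULTS) | {"harvesterAdministrator"}


def read_harvester_properties(text):
    """Return the Harvester settings found in text, with defaults filled in."""
    props = dict(HARVESTER_DEFAULTS)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments and anything malformed
        key, value = line.split("=", 1)
        if key.strip() in KNOWN_KEYS:
            props[key.strip()] = value.strip()
    return props


# Hypothetical excerpt of a metacat.properties file.
SAMPLE_PROPERTIES = """# Harvester properties
delay=12
harvesterAdministrator=name1@abc.edu,name2@abc.edu
"""
settings = read_harvester_properties(SAMPLE_PROPERTIES)
```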
On Windows: set METACAT_HOME=C:\somePath\metacat
On Linux/Unix: export METACAT_HOME=/home/somePath/metacat
On Windows: cd %METACAT_HOME%\lib\harvester
On Linux/Unix: cd $METACAT_HOME/lib/harvester
On Windows: runHarvester.bat
On Linux/Unix: sh runHarvester.sh
The Harvester application will start executing. It will begin its first harvest after delay number of hours (as specified in the metacat.properties file). The application will continue running a new harvest every period number of hours until maxHarvests harvests have been completed (if maxHarvests is set to a value greater than 0), or until you interrupt the process by pressing Ctrl+C in the command window.
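The interplay of delay, period and maxHarvests can be sketched as a small Python generator. The harvest_times function and the start date are hypothetical; the sketch only mirrors the scheduling rules described above and is not Harvester's own timer code.

```python
from datetime import datetime, timedelta
from itertools import count, islice


def harvest_times(started, delay, period, max_harvests):
    """Yield the scheduled start time of each harvest.

    delay and period are in hours. A max_harvests of 0 or less means
    the schedule never ends, so callers must limit how much they consume.
    """
    first = started + timedelta(hours=delay)
    numbers = count() if max_harvests <= 0 else range(max_harvests)
    for n in numbers:
        yield first + timedelta(hours=n * period)


# Harvester started at 1:00 p.m. with delay=12, period=24, maxHarvests=0.
launch = datetime(2024, 1, 1, 13, 0)
first_three = list(islice(harvest_times(launch, 12, 24, 0), 3))
```

With these values the first harvest falls at 1:00 a.m. the next day, matching the example given for the delay property.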
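Edit the file lib/web.xml.tomcatN, where tomcatN corresponds to the version of Tomcat you are running. For example, if you are running Tomcat 5, edit the file lib/web.xml.tomcat5. In that file, remove the comment markers around the HarvesterServlet entry, so that: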
<!--
  <servlet>
    <servlet-name>HarvesterServlet</servlet-name>
    <servlet-class>edu.ucsb.nceas.metacat.harvesterClient.HarvesterServlet</servlet-class>
    <init-param>
      <param-name>debug</param-name>
      <param-value>1</param-value>
    </init-param>
    <init-param>
      <param-name>listings</param-name>
      <param-value>true</param-value>
    </init-param>
    <load-on-startup>1</load-on-startup>
  </servlet>
-->
is changed to:
  <servlet>
    <servlet-name>HarvesterServlet</servlet-name>
    <servlet-class>edu.ucsb.nceas.metacat.harvesterClient.HarvesterServlet</servlet-class>
    <init-param>
      <param-name>debug</param-name>
      <param-value>1</param-value>
    </init-param>
    <init-param>
      <param-name>listings</param-name>
      <param-value>true</param-value>
    </init-param>
    <load-on-startup>1</load-on-startup>
  </servlet>
Save the edited file.
ant cleanweb
ant install
About thirty seconds after you restart Tomcat, the Harvester servlet will start executing. It will begin its first harvest after delay number of hours (as specified in the metacat.properties file). The servlet will continue running a new harvest every period number of hours until maxHarvests harvests have been completed (if maxHarvests is set to a value greater than 0), or until Tomcat shuts down.
After every harvest, Harvester will send an email report to the Harvester Administrator detailing the operations that were performed during the harvest. The report will contain information about each of the Harvest Sites that were harvested from, such as which EML documents were harvested and whether any errors were encountered.
The harvest report will contain a list of log entries, where each log entry describes an operation that was performed by Harvester. Log entries that show a status value of 1 indicate that an error occurred during the operation, while those that show a status value of 0 indicate that the operation was completed successfully.
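When a report is long, the status convention makes it easy to pick out the failures. The sketch below assumes log entries have already been parsed into dictionaries; the field names, operation labels and messages are hypothetical, and only the meaning of the status values (1 for error, 0 for success) comes from the text above.

```python
def failed_operations(log_entries):
    """Return only the log entries whose status marks an error (status 1)."""
    return [entry for entry in log_entries if entry["status"] == 1]


# Hypothetical, already-parsed log entries from one harvest report.
report = [
    {"operation": "retrieve document", "status": 0, "message": "ok"},
    {"operation": "retrieve document", "status": 1, "message": "HTTP 404"},
    {"operation": "upload document", "status": 0, "message": "ok"},
]
errors = failed_operations(report)
```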
The Harvester Administrator should review the report, paying particularly close attention to any errors that are reported and to the accompanying error messages that are displayed. When errors are reported at a particular site, the Harvester Administrator should contact the Site Contact to determine the source of the error and its resolution. See Reviewing Harvester Reports to the Site Contact for a description of common sources of errors at a Harvest Site.
Errors that are independent of a particular site may indicate a problem with Harvester itself, Metacat, or the database connection. Refer to the error message to determine the source of the error and its resolution.
A Site Contact registers a site with Harvester by logging in to the Harvester Registration page and entering several items of information that Harvester needs to know about the site.
The Harvester Registration page is accessed from Metacat. For example, if the Metacat server that you wish to register with resides at the following URL:
http://somehost.somelocation.edu:8080/knb/index.jsp
then the Harvester Registration page would be accessed at:
http://somehost.somelocation.edu:8080/knb/style/skins/knb/harvesterRegistrationLogin.html
After bringing up this page in your browser, log in to your Metacat account by entering your username, organization, and password. For example:
Please Enter Username, Organization, and Password |
Username | (text entry) |
Organization | One of: NCEAS, LTER, NRS, PISCO, OBFS, Unaffiliated |
Password | (text entry) |
After logging in, you will be presented with a web form that prompts you to enter information about your site and how often you want to schedule harvests at your site. For example:
Metacat Harvester Registration |
Email address: | myname@institution.edu |
Harvest List URL: | http://somehost.institution.edu/~myname/harvestList.xml |
Harvest Frequency (1-99): | 2 |
Unit: | week(s), chosen from day(s), week(s), month(s) |
After values have been entered for each of these fields, click the Register button to register your site with Harvester.
In the example shown above, Harvester will attempt to harvest documents from the site once every 2 weeks. It will access the site's Harvest List at URL "http://somehost.institution.edu/~myname/harvestList.xml", and it will send email reports to the Site Contact at email address "myname@institution.edu".
Note that you may enter multiple email addresses by separating each address with a comma or a semi-colon. For example, "myname@institution.edu,anothername@institution.edu".
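The separator rule is simple enough to show in a few lines of Python. The split_addresses helper is hypothetical and performs no validation of the addresses themselves; it only shows how a comma- or semicolon-separated value breaks down into individual recipients.

```python
import re


def split_addresses(value):
    """Split a comma- or semicolon-separated address list into clean entries."""
    return [part.strip() for part in re.split(r"[,;]", value) if part.strip()]


recipients = split_addresses("myname@institution.edu; anothername@institution.edu")
```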
At any time after you have registered with Harvester, you may discontinue harvests at your site by unregistering. Simply log in as described above and then click the Unregister button. After you do so, Harvester will discontinue harvests at the site.
A Harvest List is an XML file that holds a list of EML documents to be harvested. For each EML document in the list, the following information must be specified:
docid, which consists of the:
    scope, e.g. "demoDocument". The scope is an identifier that indicates which group of documents this document belongs to.
    identifier, e.g. "1". The identifier is a number that uniquely identifies this document within the scope.
    revision, e.g. "5". The revision is a number that indicates the current revision of this document.
documentType, e.g. "eml://ecoinformatics.org/eml-2.0.0". The documentType identifies the document as an EML document.
documentURL, e.g. "http://www.lternet.edu/~dcosta/document1.xml". The documentURL specifies a place where Harvester can locate and retrieve the document via HTTP.
The contents of a Harvest List XML file must conform to a particular XML Schema, as defined in file harvestList.xsd. The contents of a valid Harvest List can best be illustrated by example. The sample Harvest List below contains two <document> elements that specify the information that Harvester needs to retrieve a pair of EML documents and upload them to Metacat:
<?xml version="1.0" encoding="UTF-8" ?>
<hrv:harvestList xmlns:hrv="eml://ecoinformatics.org/harvestList" >
  <document>
    <docid>
      <scope>demoDocument</scope>
      <identifier>1</identifier>
      <revision>5</revision>
    </docid>
    <documentType>eml://ecoinformatics.org/eml-2.0.0</documentType>
    <documentURL>http://www.lternet.edu/~dcosta/document1.xml</documentURL>
  </document>
  <document>
    <docid>
      <scope>demoDocument</scope>
      <identifier>2</identifier>
      <revision>1</revision>
    </docid>
    <documentType>eml://ecoinformatics.org/eml-2.0.0</documentType>
    <documentURL>http://www.lternet.edu/~dcosta/document2.xml</documentURL>
  </document>
</hrv:harvestList>
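Before publishing a Harvest List, it can be useful to check that it parses and contains what you expect. The Python sketch below reads a one-document list with the standard library; the read_harvest_list helper is hypothetical, and the sketch does not validate against harvestList.xsd.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the sample Harvest List (one document only).
HARVEST_LIST = b"""<?xml version="1.0" encoding="UTF-8" ?>
<hrv:harvestList xmlns:hrv="eml://ecoinformatics.org/harvestList">
  <document>
    <docid>
      <scope>demoDocument</scope>
      <identifier>1</identifier>
      <revision>5</revision>
    </docid>
    <documentType>eml://ecoinformatics.org/eml-2.0.0</documentType>
    <documentURL>http://www.lternet.edu/~dcosta/document1.xml</documentURL>
  </document>
</hrv:harvestList>"""


def read_harvest_list(xml_bytes):
    """Return one dict per <document> element in a Harvest List."""
    root = ET.fromstring(xml_bytes)
    entries = []
    # Only the root element is namespaced; <document> children are not.
    for doc in root.findall("document"):
        docid = doc.find("docid")
        entries.append({
            "scope": docid.findtext("scope"),
            "identifier": int(docid.findtext("identifier")),
            "revision": int(docid.findtext("revision")),
            "documentType": doc.findtext("documentType"),
            "documentURL": doc.findtext("documentURL"),
        })
    return entries


harvest_entries = read_harvest_list(HARVEST_LIST)
```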
After editing the Harvest List, ensure that the Harvest List XML file resides at the appropriate location on disk as specified by the URL that was entered during the registration process.
The Harvest List Editor is a tool that assists in composing and editing a Harvest List.
To prepare a set of EML documents for harvest, ensure that the following is true for each document:
After every scheduled harvest that takes place at a particular Harvest Site, Harvester will send an email report to the Site Contact detailing the operations that were performed during the harvest. The report will contain information about the operations that were performed by Harvester at that site, such as which EML documents were harvested and whether any errors were encountered.
The Site Contact should review the report, paying particularly close attention to any errors that are reported. Errors are indicated by operations that display a status value of 1, while operations that display a status value of 0 indicate that the operation completed successfully.
When errors are reported, the Site Contact should try to determine whether the source of the error is something that can be corrected at the site. Common causes of errors might be:
If the Site Contact is unable to determine the cause of the error and its resolution, he or she should contact the Harvester Administrator for assistance.