Metacat Spatial Option


Introduction

The Metacat spatial option enables you to query and visualize the geographic coverage of metacat documents. This document is intended to provide a high-level overview of the Metacat spatial functionality. It is primarily a resource for users and developers who want to understand the architecture before digging into the code to extend the existing functionality.

Although the spatial option is included with every Metacat installation beginning with version 1.7.0, it is an optional extension to Metacat's functionality.


Definitions

The following table defines a number of terms that are useful in discussing the spatial option and its features.

Term: Definition
Spatial Cache: A cached version of the Metacat documents representing their geographic coverages in a GIS-compatible data format, the ESRI Shapefile.
Web Mapping Service (WMS): A standard interface specification for requesting spatial data as a web-deliverable map image. WMS servers accept a common set of parameters via HTTP, render the spatial dataset into an appropriate image and deliver it back to the client. The WMS spec was developed by the Open Geospatial Consortium (OGC).
Bounding Box (BBOX): Two sets of geographic coordinates representing the full geographic extent of an entity: the minimum lat/long (the lower-left) and the maximum lat/long (the upper-right).
Spatial Dataset: A collection of spatial features in a common datastore.
Spatial Features: Analogous to a "row" in a tabular dataset, a feature is an entity composed of both tabular attributes and a spatial geometry.
Spatial Geometry: A vector representation of an entity's geographic location. This can be a point (a single vertex), a line (a series of vertices) or a polygon (a series of vertices forming a closed area).
Multi-Geometry: A spatial geometry represented by one or more geometry primitives (points, lines and polygons). For example, a single species census (the spatial feature) might have multiple sample sites and could be represented as a multi-point geometry.
Styled Layer Descriptor (SLD): Styled Layer Descriptors are an OGC standard for defining the filtering, classification and styling of datasets. They are essentially the configuration file which describes how to convert your raw spatial dataset into a cartographic product.
Web Map Context (WMC): Web Map Context documents are an OGC standard for combining various WMS layers into a coherent map. A WMC describes which WMS servers and layers comprise the map, the layer order, the initial map extent, the requested image formats, etc. This is used by the HTML web map client to construct the interactive map.

Overview Of the Major Components

Spatial Harvester

The Spatial Harvester component syncs the metacat database with the spatial cache (an ESRI shapefile which contains the geographic coverages of the documents).

The Spatial Harvester is implemented entirely in Java using the Geotools library, which allows manipulation of spatial datasets. In rough terms, a spatial dataset is a collection of Features, each of which is composed of a geometry (i.e. the geographic coverage) and associated attributes (e.g. the document's title).

There are a number of Java classes which, collectively, make up the spatial harvester functionality. They are found in the edu.ucsb.nceas.metacat.spatial package.

The spatial cache currently represents the geographic coverage of XML documents based on a bounding box. The four bounding coordinates (either latitudes or longitudes) can be specified in the metacat.properties file by their xpaths. For example, the geographic coverage of EML documents is defined as:

westBoundingCoordinatePath=geographicCoverage/boundingCoordinates/westBoundingCoordinate
eastBoundingCoordinatePath=geographicCoverage/boundingCoordinates/eastBoundingCoordinate
southBoundingCoordinatePath=geographicCoverage/boundingCoordinates/southBoundingCoordinate
northBoundingCoordinatePath=geographicCoverage/boundingCoordinates/northBoundingCoordinate

It is important to note that, at the moment, only one set of xpaths is defined in metacat.properties, meaning that only documents of the chosen schema can be accessed by the spatial harvester. Also note that, for performance reasons, the xpaths to the bounding coordinates must also appear in your indexPath (defined in build.properties).
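As a rough illustration of how the four xpath settings could be consumed, the following sketch reads them with java.util.Properties and fails fast if one is missing. The class name BoundingPathConfig and the load method are hypothetical and are not part of the Metacat code base; only the four property keys come from the excerpt above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper, not part of Metacat: reads the four bounding-coordinate xpaths.
public class BoundingPathConfig {
    // Property keys as they appear in the metacat.properties excerpt.
    static final String[] KEYS = {
        "westBoundingCoordinatePath", "eastBoundingCoordinatePath",
        "southBoundingCoordinatePath", "northBoundingCoordinatePath"
    };

    /** Parses properties text and returns the four xpaths keyed by property name. */
    static Map<String, String> load(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> paths = new LinkedHashMap<>();
        for (String key : KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                throw new IllegalStateException("Missing property: " + key);
            }
            paths.put(key, value.trim());
        }
        return paths;
    }

    public static void main(String[] args) {
        String base = "geographicCoverage/boundingCoordinates/";
        String text = String.join("\n",
            "westBoundingCoordinatePath=" + base + "westBoundingCoordinate",
            "eastBoundingCoordinatePath=" + base + "eastBoundingCoordinate",
            "southBoundingCoordinatePath=" + base + "southBoundingCoordinate",
            "northBoundingCoordinatePath=" + base + "northBoundingCoordinate");
        load(text).forEach((key, xpath) -> System.out.println(key + " -> " + xpath));
    }
}
```

Because only one set of xpaths is read, a sketch like this can describe the coverage of one schema at a time, which mirrors the limitation noted above.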

The bounding coordinates are spatially cached in two ways: the centroid(s) of the bounding box(es) and the actual bounding box(es). These are stored as two separate shapefiles with multi-point and multi-polygon geometry types respectively. By default, ${tomcat_webapps_directory}/${context}/data/metacat_shps/data_points.shp is the storage location of the point cache, while data_bounds.shp in the same directory holds the polygon cache.

The bounding polygon is not relevant to every document, since bounding coordinates are allowed to describe a zero-area box (i.e. west == east and north == south). In this case the coverage is represented only as a point. In cases where no bounding coordinates are defined, the document is not represented at all in the spatial cache. Note that special care has been taken to account for cases where the bounding box crosses the international dateline or polar regions (at which point Cartesian calculations are invalid).
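The zero-area and dateline cases can be sketched in a few lines of plain Java. This is an illustration under stated assumptions, not the Geotools-based harvester code: the CoverageBox class is hypothetical, and it assumes that a box whose west edge is numerically greater than its east edge wraps across the dateline. Polar regions are not handled here.

```java
// Hypothetical illustration, not the Metacat implementation.
public class CoverageBox {
    final double west, east, south, north;

    CoverageBox(double west, double east, double south, double north) {
        this.west = west;
        this.east = east;
        this.south = south;
        this.north = north;
    }

    /** Zero-area coverage: such a document would appear only in the point cache. */
    boolean isPoint() {
        return west == east && south == north;
    }

    /** Assumption: west > east means the box wraps across the international dateline. */
    boolean crossesDateline() {
        return west > east;
    }

    /** Centroid as {longitude, latitude}; longitude is wrapped back into [-180, 180]. */
    double[] centroid() {
        double width = crossesDateline() ? (east + 360.0) - west : east - west;
        double lon = west + width / 2.0;
        if (lon > 180.0) {
            lon -= 360.0;
        }
        return new double[] { lon, (south + north) / 2.0 };
    }

    public static void main(String[] args) {
        CoverageBox pacific = new CoverageBox(175.0, -165.0, -10.0, 10.0);
        double[] c = pacific.centroid();
        // A naive average of 175 and -165 would give longitude 5, on the wrong side of the globe.
        System.out.println("centroid lon=" + c[0] + " lat=" + c[1]);
        System.out.println("point? " + new CoverageBox(-120.0, -120.0, 34.0, 34.0).isPoint());
    }
}
```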

Because documents may have more than one geographic coverage, it is necessary to define the two spatial caches as multi-point and multi-polygon geometry types. This means that each feature's geometry field can contain a collection of one or more primitive geometries.

When the spatial option is installed, metacat.properties sets regenerateCacheOnRestart=true by default. This is very useful the first time you install Metacat, since the spatial cache is generated from scratch when your servlet container is restarted. Depending on how many documents you have in your Metacat database, this can take a considerable amount of time: several minutes in the case of a few thousand documents. For this reason, Metacat sets this property to false after the spatial cache has been generated the first time, which prevents the cache from being regenerated every time you restart your servlet container. Note that if you upgrade or reinstall Metacat, the spatial cache will be regenerated again.

Once the spatial cache has been generated, the Metacat servlet keeps the spatial cache in sync with the Metacat database by triggering the spatial harvester on every insert, update or delete. This does not regenerate the whole spatial cache; it simply updates features in the cache as needed. It is fairly quick and should not add more than half a second to any given transaction. All high-level interactions with the spatial cache are handled through the SpatialHarvester class.

There is one very important note about document authentication. While Metacat provides very fine-grained permissions control at the document level, the Web Mapping Server component does not. For this reason, only documents that are publicly readable (i.e. documents which match the following SQL query: select distinct docid from xml_access where principal_name = 'public' and perm_type = 'allow') will be added to the spatial cache. The potential for adding feature-level permissions to the WMS server is discussed in the Future Directions section of this document.

Web Mapping Server

The primary function of the Web Mapping Server component is to render the spatial cache as a web-deliverable map image. It is also responsible for rendering other geographic data to provide base maps or other auxiliary map layers.

The Open Geospatial Consortium (OGC) has defined a standard for requesting maps, the Web Mapping Service or WMS standard. WMS servers accept a common set of parameters via HTTP, render the spatial dataset into an appropriate image and deliver it back to the client.

For Metacat, we chose to go with GeoServer, a WMS-capable application written in Java. GeoServer (and the Geotools library upon which it depends) is seamlessly integrated into the Metacat servlet context. This makes all of the GeoServer/Geotools functionality accessible to all of Metacat and allows easier deployment than if GeoServer were distributed separately. The downside to this approach is that upgrading GeoServer later becomes slightly more complicated.

GeoServer comes with a default configuration that is already aware of the spatial cache and a world countries base layer. In order to configure existing data and add new data sources, GeoServer comes with a web-based configuration utility. It is available at http://your.server/context/geoserver.jsp

GeoServer has a lot of built-in functionality and can support a wide variety of vector GIS data sources. These data sources can be styled using SLDs and can be made available via open distribution standards such as WMS and WFS. For our purposes, it mainly outputs images (via WMS), but it can also be used to output raw vector data in the form of GML or KML.

There are several issues with GeoServer that users should be aware of. The version of GeoServer (1.4) used by Metacat does not support raster input datasets (i.e. satellite imagery or digital elevation models). For distributing rasters, we recommend that UMN Mapserver be set up independently. GeoServer, though it offers a web configuration interface, is lacking in several key areas, and you may still have to hand-edit some XML files in order to customize your WMS server.

Spatial Query

Displaying the spatial cache as a map is important but users also need to query the spatial cache in order to answer the question "What documents lie in this geographic region?". The functionality is invoked through the metacat servlet itself; there is a spatial_query action for this purpose. An example spatial query would be:

http://localhost/knb/metacat?action=spatial_query&xmin=-117.5&xmax=-64&ymin=3&ymax=46&skin=default

Here xmin, xmax, ymin and ymax represent the west, east, south and north bounding coordinates respectively. The request returns an HTML document listing (in the style of the specified skin) all documents whose geographic coverage intersects the given bounding box.
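A client could assemble such a request as in the sketch below. The SpatialQueryUrl class and its methods are hypothetical helpers written for illustration; only the action name and the parameter names (xmin, xmax, ymin, ymax, skin) come from the example request above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical client-side helper; not part of Metacat.
public class SpatialQueryUrl {
    /** Prints whole numbers without a trailing ".0" so the URL matches the documented style. */
    static String fmt(double v) {
        return v == Math.rint(v) ? String.valueOf((long) v) : String.valueOf(v);
    }

    /** Builds a spatial_query request against the given Metacat servlet URL. */
    static String build(String servletUrl, double xmin, double xmax,
                        double ymin, double ymax, String skin) {
        if (ymin > ymax) {
            throw new IllegalArgumentException("ymin must not exceed ymax");
        }
        return servletUrl
            + "?action=spatial_query"
            + "&xmin=" + fmt(xmin)
            + "&xmax=" + fmt(xmax)
            + "&ymin=" + fmt(ymin)
            + "&ymax=" + fmt(ymax)
            + "&skin=" + URLEncoder.encode(skin, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost/knb/metacat", -117.5, -64, 3, 46, "default"));
    }
}
```

The sketch deliberately does not reject xmin greater than xmax, since this document does not say how the servlet treats query boxes that cross the dateline.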

The core functionality of the spatial query mechanism is found in the edu.ucsb.nceas.metacat.spatial.SpatialQuery class and, like the spatial harvester, relies heavily on the Geotools library. This class has a single method, filterByBbox(), which compares the bounding box to both the point and polygon cache. For each shapefile, the process requires two steps: first, filter the spatial cache for features whose bounding box overlaps the specified bounding coordinates; second, iterate through the remaining features and perform an actual geometric intersection. The second step, though more costly than comparing the bbox, is necessary because the feature's geometry may be a multi-geometry whose bounding box is large but whose component primitive geometries are scattered over that area. The end result is a vector of docids matching the spatial query.
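The two-step idea can be illustrated without Geotools. The sketch below is a simplified stand-in, not the SpatialQuery implementation: the Rect and Feature classes are hypothetical, primitives are reduced to axis-aligned rectangles (a point being a rectangle of zero size), and the cache is an in-memory list rather than a shapefile.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified, hypothetical illustration of the envelope prefilter plus exact test.
public class TwoStepBboxFilter {
    /** Axis-aligned rectangle; a point is a rectangle with zero width and height. */
    static final class Rect {
        final double xmin, ymin, xmax, ymax;

        Rect(double xmin, double ymin, double xmax, double ymax) {
            this.xmin = xmin;
            this.ymin = ymin;
            this.xmax = xmax;
            this.ymax = ymax;
        }

        boolean intersects(Rect o) {
            return xmin <= o.xmax && o.xmin <= xmax && ymin <= o.ymax && o.ymin <= ymax;
        }
    }

    /** One cached feature: a docid plus the primitives of its multi-geometry. */
    static final class Feature {
        final String docid;
        final List<Rect> parts;

        Feature(String docid, Rect... parts) {
            this.docid = docid;
            this.parts = Arrays.asList(parts);
        }

        /** Bounding box enclosing all primitives of the feature. */
        Rect envelope() {
            double x0 = Double.POSITIVE_INFINITY, y0 = Double.POSITIVE_INFINITY;
            double x1 = Double.NEGATIVE_INFINITY, y1 = Double.NEGATIVE_INFINITY;
            for (Rect p : parts) {
                x0 = Math.min(x0, p.xmin);
                y0 = Math.min(y0, p.ymin);
                x1 = Math.max(x1, p.xmax);
                y1 = Math.max(y1, p.ymax);
            }
            return new Rect(x0, y0, x1, y1);
        }
    }

    static List<String> filterByBbox(List<Feature> cache, Rect query) {
        List<String> docids = new ArrayList<>();
        for (Feature f : cache) {
            if (!f.envelope().intersects(query)) {
                continue; // step 1: cheap envelope test rejects most features
            }
            for (Rect part : f.parts) { // step 2: test each primitive individually
                if (part.intersects(query)) {
                    docids.add(f.docid);
                    break;
                }
            }
        }
        return docids;
    }

    public static void main(String[] args) {
        List<Feature> cache = Arrays.asList(
            new Feature("census.1", new Rect(-120, 30, -120, 30), new Rect(-60, 50, -60, 50)),
            new Feature("site.2", new Rect(-90, 40, -90, 40)));
        System.out.println(filterByBbox(cache, new Rect(-100, 35, -80, 45)));
    }
}
```

In the sample data, census.1 passes the envelope test because its two points span the query box, but neither point lies inside it, so only the second step correctly excludes it.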

This docid list is then sent to DBQuery. Using a special constructor that takes a vector of docids, the DBQuery class is able to use the Docid override mechanism to perform an optimized query (for cases where the list of docids is already known).

HTML Mapping Client

In order to provide a web-based user interface to the WMS and the spatial query functionality, Metacat relies on Community Mapbuilder. Mapbuilder is a pure HTML/JavaScript application which uses AJAX and XSLT on the client side to create a desktop-GIS-like environment for interacting with geographic data through a web browser.

The main function of Mapbuilder is to act as a front-end to WMS services. The WMS layers are configured through a Web Map Context document (typically found in the skin directory under spatial/context.xml). This context document (or WMC) can define the initial extent of the map, the ordering and visibility of layers and, of course, the source and name of each WMS layer.

Mapbuilder provides several interface components or widgets (map, box zooms, layer list, "select location" dropdown, scalebar, coordinates, info query) that make it easy to deploy highly functional web mapping applications with minimal coding. There are three main configuration files you need to work with in order to customize the map interface: the context document discussed above, the Mapbuilder configuration file (typically under spatial/config.xml), which defines the widgets and their behavior, and finally the HTML file (typically spatial/map.html), which loads the Mapbuilder JavaScript library and places the widgets in your HTML layout.

For integration with Metacat, we built a custom Mapbuilder widget, the AOIMetacat Query, which allows you to query the map by clicking, either by box or by point, and calls the Metacat spatial_query action.

The cleanest way to integrate a mapbuilder interactive map with any page in your application is to simply create an iframe element with the src pointing to a standalone map.html.

Installing and Configuring the Spatial Option

Initial Installation

To install the spatial option, choose a version of Metacat >= 1.7.0. You'll want to ensure that runSpatialOption is set to true in lib/metacat.properties before you build. This property is true by default so, unless you explicitly set it to false, the spatial option will install and run automatically when you install Metacat.

How do I configure the layout of the html mapping interface?
The layout of the map components is defined in the spatial/map.html file within the skin's directory. It is a simple tabular layout and the map components are abstracted into "widgets", blocks with a specific id, which can be reorganized within the table. For deeper customization you can modify the web map context document (spatial/context.xml) and the mapbuilder config file (spatial/config.xml).
How can I change the lat/long display to degree-minutes-seconds ?
By default, the map display shows the cursor's position in decimal degrees, since this is the preferred format for many GPS/GIS applications. However, there may be cases where you need to report coordinates as degrees-minutes-seconds. To do so, go into your skin's spatial configuration file (usually ${skin.dir}/spatial/config.xml) and edit the CursorTrack widget as shown below:
 
        <CursorTrack id="cursorTrack">
          <mouseHandler>mainMap</mouseHandler>
          <showDMS>true</showDMS>
          <showLatLong>true</showLatLong>
        </CursorTrack>
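For reference, the arithmetic behind a degrees-minutes-seconds display can be sketched as follows. This is a generic conversion written for illustration, rounded to whole seconds; it is not taken from Mapbuilder, whose exact output format is not described here, and the DmsFormat class is hypothetical.

```java
// Hypothetical illustration of the decimal-degree to DMS conversion.
public class DmsFormat {
    /** Converts decimal degrees to a string such as 30° 15' 45" N, rounded to whole seconds. */
    static String toDms(double decimalDegrees, boolean isLatitude) {
        // Work in whole seconds so that rounding can never produce 60 seconds or 60 minutes.
        long totalSeconds = Math.round(Math.abs(decimalDegrees) * 3600.0);
        long degrees = totalSeconds / 3600;
        long minutes = (totalSeconds % 3600) / 60;
        long seconds = totalSeconds % 60;
        char hemisphere = isLatitude
            ? (decimalDegrees < 0 ? 'S' : 'N')
            : (decimalDegrees < 0 ? 'W' : 'E');
        return degrees + "\u00B0 " + minutes + "' " + seconds + "\" " + hemisphere;
    }

    public static void main(String[] args) {
        System.out.println(toDms(30.2625, true));
        System.out.println(toDms(-117.5, false));
    }
}
```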
       
How can I configure the size and initial extent of the map?
The map's initial extent is defined in the web map context document for each skin. To change the map size and/or initial extent, edit the following lines:
    <Window width="720" height="360" />
    <BoundingBox SRS="EPSG:4326" minx="-180" miny="-90" maxx="180" maxy="90" />
Here the width and height are the image size in pixels, minx/maxx represent the range of longitudes and miny/maxy represent the range of latitudes.
How do I configure the "select location" dropdown to contain different predefined locations?
The locations data are held in a file called "named_locations.xml" in your skin's spatial directory. This defines each location as a gml:featureMember. Within each featureMember, you can edit the gml:name and gml:coordinates fields to edit or add new locations.
Can I use a different web mapping interface?

Certainly. Since our mapping server conforms to the WMS standard, you can develop a map interface using any WMS client application. There are many WMS clients, ranging from desktop GIS applications (ArcGIS, QGIS, JUMP, uDig) to JavaScript web mapping frameworks (OpenLayers, Mapbender, Mapbuilder), and any one of them could be used to build a novel spatial application on top of the Metacat WMS.

There is an experimental version of OpenLayers (a tiled, google-maps-esque web interface) included with Metacat spatial. This can be accessed at http://your.server/context/style/common/spatial_templates/openlayers1/map.html. Keep in mind that this is experimental and not supported as an official interface.

How do I configure the styling and classification of the data?
The datasets are styled through the use of Styled Layer Descriptors (SLD). The default SLDs used for the data points and data bounding boxes are in {context}/data/styles and are named data_points_style.sld and data_bounds_style.sld respectively. You can find a more detailed tutorial on using SLD with GeoServer at http://docs.codehaus.org/display/GEOSDOC/SLD+Intro+Tutorial.
What versions of tomcat are supported?

The spatial functionality has only been tested on Tomcat 5. The web.xml.tomcat4 file distributed with Metacat is no longer supported after Metacat 1.6.0. If you need to use Tomcat 4, you might be able to update this file to reflect the incorporation of GeoServer (see web.xml.tomcat5), though we have no intention of supporting its use.

How do I add the map to another page or metacat skin?
The map interface is held in a separate HTML document and can be easily included in any HTML page through the use of an iframe:
  <iframe scrolling="no" frameborder="0" width="736" height="520" 
             src="/knb/style/common/spatial_templates/spatial1/map.html">
  </iframe>
The map URL referenced here is the default, common spatial template for use in any skin. If you plan on doing any customization of the map interface, you should copy that spatial template into your skin's directory:
 cp -r style/common/spatial_templates/spatial1 style/skins/myskin/spatial
and access it with the url "/knb/style/skins/myskin/spatial/map.html".

Adding Other Spatial Datasets to the Web Map

Adding your own spatial datasets
If you have other vector GIS datasets on your server that you'd like to include in the interactive map, there are two main tasks:
Registering your dataset with geoserver
There is a geoserver tutorial which covers the precise steps of adding a new layer via the web interface. The actual process will differ slightly depending on your configuration so we'll clarify and summarize the steps below:
  1. Creating a datastore (ie registering the raw data)
  2. Creating the feature type (ie registering the data as a recognizable layer)
  3. Testing
Adding your layer to the web map context
  1. Locate the web map context document (usually {skin}/spatial/context.xml) and open in a text editor
  2. Locate the Layer entry for an existing layer next to which you want your layer stacked (the first layers in the context are rendered at the bottom).
  3. Create a new Layer entry, copying an existing entry for the metacat data_points layer to use as an example.
  4. Edit the layer Name to reflect the name of your new feature type (ie metacat:newLayer )
  5. Edit the Title; this will be displayed in the map legend.
  6. Special note about the image format: image/gif is the only option if you want transparency (since IE pre-7 has trouble with PNG transparency). image/jpeg is a good option for base layers.
  7. Point your browser to the map interface using this context document and your layer should show up stacked with the others
External WMS data sources
There are hundreds of sources of spatial data made publicly available through WMS. Check out wms-sites.com for a good catalog. In order to add these data sources to your map, locate your skin's context document and add a Layer as appropriate. Using an existing Layer as an example, modify the OnlineResourceURL, Name, Title and Style to match the WMS layer you're after. See the Mapbuilder Add WMS Tutorial for further details.
Raster Images
The version of geoserver currently shipping with metacat (geoserver 1.4) does not support raster images as input data sources. We suggest setting up UMN Mapserver if you aim to serve raster data as a WMS. Note that once a WMS service is set up, the process of adding it to your map context is the same regardless of what WMS software is serving it. (ahhh the beauty of open standards).

Developer's Notes

web.xml
The process of integrating geoserver with metacat involved merging the two web.xml documents. Special care must be taken to preserve the order of loading for various components in the geoserver stack. The web.xml.tomcat5 document is commented in the relevant places to indicate its purpose.
Upgrading geoserver
Great care has been taken not to modify GeoServer so heavily as to "fork" it. Any small changes made should be submitted to the GeoServer development team or maintained as patches against released versions. Still, upgrading GeoServer is not as seamless as one might hope. Most notably, any changes in the GeoServer web.xml will have to be integrated by hand into our Metacat/GeoServer hybrid web.xml.tomcat5 document. In addition, GeoServer distributes some unnecessary files that have been cut from the Metacat version. When upgrading GeoServer, make sure any old files get deleted and that new files get added to cvs (if they are needed) or removed (if superfluous).

Future Directions

Automatically handle spatial datasets

When users put spatial data into the Morpho system, it would be nice if we could automatically pull all the available metadata from the spatial dataset itself.

On the Metacat side, it might be worth trying to auto-detect spatial datasets and add them to the WMS service so that they could be displayed along with the metadata coverages. This is tricky since the styling of spatial data is intentionally separated from the data itself; we'd need some easy way to prompt the user for the classification and styling info and construct the appropriate SLDs.

It's worth noting that, currently, one could do this manually. There is nothing, aside from editing a few configuration files, to prevent any Geotools-supported dataset from being displayed through the WMS map interface.

For vector datasets, it would be possible to store the data directly in the database itself (This is a logical extension of the future work to put tabular data directly in a relational database). Postgresql has the PostGIS extensions to handle this so we would have to require postgresql if we went this route.

WMS bypass

Filter which spatial cache features are displayed by access constraints, skin constraints and the current non-spatial query set. This would involve intercepting incoming WMS requests and appending a styled layer descriptor (SLD) with an OGC filter to prevent/allow certain docids.

SLD factory

Closely related to the WMS bypass implementation, the SLD factory would be in charge of constructing the filter based on the constraints mentioned above. In other words, it would construct a document specifying which docids were to appear in the map. Because it would have to generate this list of docids on every WMS request, performance is a big concern. We will likely need to cache docid lists as session variables.
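To make the idea concrete, the sketch below builds an OGC filter fragment that admits only a given list of docids. It is an assumption-laden illustration, not the SldFactory code: the DocidFilterBuilder class is hypothetical, the attribute name "docid" is assumed to be the field that carries the document id in the spatial cache, and the sketch requires at least one docid rather than deciding how an empty list should be filtered.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of generating an OGC filter for a list of allowed docids.
public class DocidFilterBuilder {
    /** Escapes the characters that are significant in XML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** One equality test: the named attribute must equal the given docid. */
    static String clause(String attribute, String docid) {
        return "<ogc:PropertyIsEqualTo>"
            + "<ogc:PropertyName>" + escape(attribute) + "</ogc:PropertyName>"
            + "<ogc:Literal>" + escape(docid) + "</ogc:Literal>"
            + "</ogc:PropertyIsEqualTo>";
    }

    /** Builds a filter matching any of the docids; ogc:Or is only used for two or more. */
    static String build(String attribute, List<String> docids) {
        if (docids.isEmpty()) {
            throw new IllegalArgumentException("at least one docid is required");
        }
        StringBuilder sb = new StringBuilder("<ogc:Filter xmlns:ogc=\"http://www.opengis.net/ogc\">");
        if (docids.size() == 1) {
            sb.append(clause(attribute, docids.get(0)));
        } else {
            sb.append("<ogc:Or>");
            for (String docid : docids) {
                sb.append(clause(attribute, docid));
            }
            sb.append("</ogc:Or>");
        }
        return sb.append("</ogc:Filter>").toString();
    }

    public static void main(String[] args) {
        System.out.println(build("docid", Arrays.asList("knb.1.1", "knb.2.1")));
    }
}
```

A filter with one clause per docid grows linearly with the list, which is one reason the performance concern above matters for large result sets.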

There is currently a stub implementation of the SLD Factory servlet in src/edu/ucsb/nceas/metacat/spatial/SldFactory.java. It is functional except that it doesn't generate a dynamic list of allowable docids. Assuming we can modify the SldFactory servlet to quickly generate a list of allowable docids based on stored session variables, applying this SLD to a WMS request is fairly easy and simply requires appending the URL of the sldfactory as an "SLD" parameter to the WMS GetMap request:

 http://indus/knb/wms?REQUEST=GetMap&SERVICE=WMS....&SLD=http://indus.msi.ucsb.edu/knb/sldfactory?originalSld=data_points_style.sld

where data_points_style.sld is the original style document existing in {context}/data/styles/. The sldfactory servlet will construct a list of allowable docids, append those to the original SLD as an OGC filter, and return a (modified) SLD document. There are two possibilities for implementing this:

  1. Mapbuilder (the WMS client in charge of constructing the WMS request) can be told to append this SLD parameter through the use of the WMC config document. This would work for the skins but, alone, would not ensure that every WMS request is filtered, since other clients could simply omit the SLD parameter.
  2. An alternative, one that would ensure that EVERY wms request was filtered, would be to handle it all server side with a WMS bypass.
Map configuration interface

Geoserver currently offers a nice web-based configuration but it is lacking a few key features and may be difficult for a novice GIS user. We may want to reinvent a custom geoserver configuration interface to

Ideally we could pull as much information as possible from the metadata and make the UI very intuitive. This does bring up issues of web-based admin access constraints and of developing a subsystem to handle who has edit access to the map configuration.
