Packages and Relationships

Metacat allows a user to create virtual links between documents stored in the system. These links are called relationships, and they are defined by triples in eml-dataset-2.0 files. A relationship can be defined between any two files, whether or not they are XML. The following example eml-dataset-2.0 file holds several triples near its end:

<?xml version="1.0"?>
<!DOCTYPE dataset PUBLIC "-//NCEAS//eml-dataset-2.0//EN" "eml-dataset-2.0.dtd">
<dataset>
  <identifier system="null">berkley.5.3</identifier>
  <shortName>allsp</shortName>
  <title>MARINE sampling data collected between spring 1992 and fall 1996</title>
  <originator>
    <individualName>
      <salutation>Dr.</salutation>
      <givenName>Peter</givenName>
      <surName>Raimondi</surName>
    </individualName>
    <organizationName>UCSC</organizationName>
    <positionName> </positionName>
    <address>
      <deliveryPoint>Biology Dept.</deliveryPoint>
      <deliveryPoint>A309 Earth and Marine Science Building</deliveryPoint>
      <city>Santa Cruz</city>
      <administrativeArea>CA</administrativeArea>
      <postalCode>95060</postalCode>
      <country>USA</country>
    </address>
    <phone phonetype="voice">831-459-1234 x5674</phone>
    <electronicMailAddress>raimondi@biology.ucsc.edu</electronicMailAddress>
    <onlineLink> </onlineLink>
    <role>Originator</role>
  </originator>
  <pubdate> </pubdate>
  <pubplace> </pubplace>
  <series> </series>
  <abstract>
    <paragraph> </paragraph>
  </abstract>
  <keywordSet>
    <keyword keywordType="null">intertidal</keyword>
    <keyword keywordType="null">santa barbara</keyword>
    <keyword keywordType="null">photoplot</keyword>
    <keyword keywordType="null">quadrat</keyword>
    <keywordThesaurus> </keywordThesaurus>
  </keywordSet>
  <additionalInfo>
    <paragraph> </paragraph>
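  </additionalInfo>
  <!-- Package triples: each one reads "subject relationship object",
       e.g. berkley.6.1 isRelatedTo berkley.5.3 -->
  <triple>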
    <subject>berkley.6.1</subject>
    <relationship>isRelatedTo</relationship>
    <object>berkley.5.3</object>
  </triple>
  <triple>
    <subject>berkley.7.1</subject>
    <relationship>isRelatedTo</relationship>
    <object>berkley.6.1</object>
  </triple>
  <triple>
    <subject>berkley.8.1</subject>
    <relationship>isRelatedTo</relationship>
    <object>berkley.5.3</object>
  </triple>
  <triple>
    <subject>berkley.8.1</subject>
    <relationship>isRelatedTo</relationship>
    <object>berkley.6.1</object>
  </triple>
  <triple>
    <subject>berkley.8.1</subject>
    <relationship>isRelatedTo</relationship>
    <object>berkley.7.1</object>
  </triple>
  <triple>
    <subject>berkley.14.1</subject>
    <relationship>isRelatedTo</relationship>
    <object>berkley.6.1</object>
  </triple> 
  <temporalCoverage> 1992 to 1996</temporalCoverage>
  <geographicCoverage> </geographicCoverage>
  <taxonomicCoverage> </taxonomicCoverage>
</dataset>
  
Description of the Package File

Note that the doctype of this document is an unregistered, NCEAS-specific DTD (-//NCEAS//eml-dataset-2.0//EN). The doctype that Metacat treats as a package is set by an application property; setting this property (and others) is described in Setting Metacat Properties. A package file may contain any number of triples. Each triple has a subject, a relationship, and an object, and can be read as: <subject> has <relationship> to <object>. Each triple is a logical link between the subject and the object, with the relationship describing that link.
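
To make the subject/relationship/object reading concrete, here is a minimal Python sketch that pulls the triples out of a package file with the standard library's xml.etree.ElementTree. It is an illustration only, not Metacat's own parser; the names Triple, extract_triples, and SAMPLE are hypothetical, and the sample string keeps just the identifier and the first two triples from the example above (the DOCTYPE and the other elements are omitted for brevity).

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple


class Triple(NamedTuple):
    """Hypothetical container for one <triple> element."""
    subject: str
    relationship: str
    obj: str  # named "obj" so it does not shadow Python's built-in "object"


def extract_triples(package_xml: str) -> list[Triple]:
    """Return every <triple> in a package file, in document order."""
    root = ET.fromstring(package_xml)
    return [
        Triple(
            subject=t.findtext("subject", default="").strip(),
            relationship=t.findtext("relationship", default="").strip(),
            obj=t.findtext("object", default="").strip(),
        )
        for t in root.iter("triple")
    ]


# Trimmed copy of the example package file (DOCTYPE and most elements omitted).
SAMPLE = """<dataset>
  <identifier system="null">berkley.5.3</identifier>
  <triple>
    <subject>berkley.6.1</subject>
    <relationship>isRelatedTo</relationship>
    <object>berkley.5.3</object>
  </triple>
  <triple>
    <subject>berkley.7.1</subject>
    <relationship>isRelatedTo</relationship>
    <object>berkley.6.1</object>
  </triple>
</dataset>"""

for t in extract_triples(SAMPLE):
    print(t.subject, t.relationship, t.obj)
# berkley.6.1 isRelatedTo berkley.5.3
# berkley.7.1 isRelatedTo berkley.6.1
```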

The Utility of Relations

Relations are useful because many XML data schemas are split across multiple DTDs, so there may be many different XML files that are all related to each other yet stored separately within the system. In addition, because we at NCEAS are developing Metacat as a metadata repository for ecological data, we need a way to link our metadata to the data files it describes. Packages are how we do this.

Post-Processed Relations

A package file is inserted into Metacat like any other file. Its doctype is then checked against the packagedoctype property in the Metacat.properties file. If the doctypes match, the file is sent to a postprocessor, which analyzes it and inserts its triples into the xml_relation table, one row per triple. For the example package file above, the table looks like this:

relationid  docid      packagetype                    subject       subjectdoctype  relationship  object       objectdoctype
1           berkley.5  -//NCEAS//eml-dataset-2.0//EN  berkley.6.1   null            isRelatedTo   berkley.5.3  null
2           berkley.5  -//NCEAS//eml-dataset-2.0//EN  berkley.7.1   null            isRelatedTo   berkley.6.1  null
3           berkley.5  -//NCEAS//eml-dataset-2.0//EN  berkley.8.1   null            isRelatedTo   berkley.5.3  null
4           berkley.5  -//NCEAS//eml-dataset-2.0//EN  berkley.8.1   null            isRelatedTo   berkley.6.1  null
5           berkley.5  -//NCEAS//eml-dataset-2.0//EN  berkley.8.1   null            isRelatedTo   berkley.7.1  null
6           berkley.5  -//NCEAS//eml-dataset-2.0//EN  berkley.14.1  null            isRelatedTo   berkley.6.1  null
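
The insertion step can be sketched in Python as follows. This is a hypothetical model, not Metacat's implementation: PACKAGE_DOCTYPE stands in for the packagedoctype property, post_process and RelationRow are illustrative names, relation IDs are simply numbered from first_id rather than drawn from a database sequence, and the subject and object doctypes are left as None to mirror the null values in the table. Run on the example's six triples, it produces rows matching the table above.

```python
from typing import NamedTuple, Optional

# Hypothetical stand-in for the packagedoctype property in Metacat.properties.
PACKAGE_DOCTYPE = "-//NCEAS//eml-dataset-2.0//EN"


class RelationRow(NamedTuple):
    """One row shaped like the xml_relation columns."""
    relationid: int
    docid: str
    packagetype: str
    subject: str
    subjectdoctype: Optional[str]
    relationship: str
    obj: str
    objectdoctype: Optional[str]


def post_process(identifier: str, doctype: str,
                 triples: list[tuple[str, str, str]],
                 package_doctype: str = PACKAGE_DOCTYPE,
                 first_id: int = 1) -> list[RelationRow]:
    """Turn a package file's triples into xml_relation-style rows.

    Returns an empty list when the doctype is not the package doctype.
    """
    if doctype != package_doctype:
        return []  # ordinary documents skip the postprocessor
    # The docid column holds the identifier without its revision:
    # "berkley.5.3" -> "berkley.5".
    docid = identifier.rsplit(".", 1)[0]
    return [
        # Subject/object doctypes stay None, like the null columns in xml_relation.
        RelationRow(rid, docid, doctype, subj, None, rel, obj, None)
        for rid, (subj, rel, obj) in enumerate(triples, start=first_id)
    ]


TRIPLES = [
    ("berkley.6.1", "isRelatedTo", "berkley.5.3"),
    ("berkley.7.1", "isRelatedTo", "berkley.6.1"),
    ("berkley.8.1", "isRelatedTo", "berkley.5.3"),
    ("berkley.8.1", "isRelatedTo", "berkley.6.1"),
    ("berkley.8.1", "isRelatedTo", "berkley.7.1"),
    ("berkley.14.1", "isRelatedTo", "berkley.6.1"),
]

for row in post_process("berkley.5.3", PACKAGE_DOCTYPE, TRIPLES):
    print(row)
```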

Once the system has processed the package file and inserted its relations into the xml_relation table, a file's relations are always returned with it in the result set of a query.

Package Views (formerly known as 'backtracking')

Package views are a feature that was intentionally left out of the Queries and Results section. A package view works by sending a doctype (called a returndoctype) along with a query request. When the query produces a hit, the system compares the doctype of the hit document with the returndoctype. If the doctypes do not match, the system checks the xml_relation table to see whether that document has been packaged by a document of that doctype. If such a package document exists, it is returned instead of the document that was originally hit; otherwise, the original hit is returned. This lets a display system (such as a web browser) receive results, where possible, as a particular type of document that it knows how to display.

For example, take the package file above. Suppose a query for "intertidal" returns the document berkley.6, of type -//NCEAS//eml-entity-2.0//EN, and returndoctype is set to "-//NCEAS//eml-dataset-2.0//EN". When berkley.6 is hit, the system checks whether any package document containing it is of type -//NCEAS//eml-dataset-2.0//EN. Since berkley.5 is such a package (relationids 1, 2, 4 and 6 all involve berkley.6.1), berkley.5 is returned instead of berkley.6.
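
The package view decision can be sketched as a small Python function. This is a simplified, hypothetical model rather than Metacat's code: package_view, Relation, and strip_rev are illustrative names, the function scans an in-memory list of relation rows instead of querying xml_relation, and it assumes a document belongs to a package if its docid (with the revision stripped) appears as either the subject or the object of one of that package's triples. The sample rows are those from the xml_relation table above.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    """The subset of xml_relation columns this sketch needs."""
    relationid: int
    docid: str        # package document, without revision
    packagetype: str  # doctype of the package document
    subject: str
    obj: str


def strip_rev(identifier: str) -> str:
    """Drop the trailing revision: 'berkley.6.1' -> 'berkley.6'."""
    return identifier.rsplit(".", 1)[0]


def package_view(hit_docid: str, hit_doctype: str, returndoctype: str,
                 relations: list[Relation]) -> str:
    """Return the docid to send back for one query hit."""
    if hit_doctype == returndoctype:
        return hit_docid  # already the requested type
    for rel in relations:
        # Assumption: appearing on either side of a triple means "in the package".
        in_package = hit_docid in (strip_rev(rel.subject), strip_rev(rel.obj))
        if in_package and rel.packagetype == returndoctype:
            return rel.docid  # substitute the package document
    return hit_docid  # no package of that type: keep the original hit


DATASET = "-//NCEAS//eml-dataset-2.0//EN"
ENTITY = "-//NCEAS//eml-entity-2.0//EN"
RELATIONS = [
    Relation(1, "berkley.5", DATASET, "berkley.6.1", "berkley.5.3"),
    Relation(2, "berkley.5", DATASET, "berkley.7.1", "berkley.6.1"),
    Relation(3, "berkley.5", DATASET, "berkley.8.1", "berkley.5.3"),
    Relation(4, "berkley.5", DATASET, "berkley.8.1", "berkley.6.1"),
    Relation(5, "berkley.5", DATASET, "berkley.8.1", "berkley.7.1"),
    Relation(6, "berkley.5", DATASET, "berkley.14.1", "berkley.6.1"),
]

print(package_view("berkley.6", ENTITY, DATASET, RELATIONS))  # berkley.5
```

A hit that already has the requested doctype, or that belongs to no package of that type, is returned unchanged.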

From a client, returndoctype is passed as an ordinary servlet parameter. A URL with a returndoctype would look something like:

http://server.domain.com/Metacat?action=query&anyfield=%&qformat=html&returndoctype=-//NCEAS//eml-dataset-2.0//EN
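
A client can also build this URL programmatically. The Python sketch below uses only the standard library's urllib.parse; the host and servlet path are the placeholders from the example, and BASE_URL and params are illustrative names. Note that urlencode percent-encodes the values, so the literal % in the sample URL is sent as %25 and each / in the doctype as %2F; the servlet decodes them back to the original strings.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder host and servlet path copied from the example URL.
BASE_URL = "http://server.domain.com/Metacat"

params = {
    "action": "query",
    "anyfield": "%",  # search value used in the example
    "qformat": "html",
    "returndoctype": "-//NCEAS//eml-dataset-2.0//EN",
}

# urlencode escapes reserved characters: "%" -> "%25", "/" -> "%2F".
url = f"{BASE_URL}?{urlencode(params)}"
print(url)

# Decoding the query string recovers the original doctype, as the servlet would see it.
decoded = parse_qs(urlsplit(url).query)
print(decoded["returndoctype"][0])  # -//NCEAS//eml-dataset-2.0//EN
```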

The system then inserts the returndoctype parameter value into a pathquery document as illustrated in Queries and Results.
