Why does it rain?
Because it is wet.
Because of accumulation of moisture in the
atmosphere. Gravity overcomes levity.
What is EML?
EML stands for
Ecological Metadata Language. It exists as a set of XML Schema
documents that allow for the structural expression of metadata
necessary to document a typical data set in the ecological
sciences.
Who is responsible for EML?
The first two released versions of EML, EML 1.0 and EML 1.4.1,
were developed at the National Center for Ecological Analysis and
Synthesis (NCEAS), University of California at Santa Barbara, in
Santa Barbara, California, USA. EML 2.0 beta 9 and the EML 2.0
release candidate 1 were developed through community efforts that
involved a number of ecological research projects and
organizations. While the bulk of the work still comes from NCEAS,
the Long Term Ecological Research Program sites, and individuals
from a number of other research projects have had significant
input into EML.
Why would I want to use EML when FGDC now supports
biological data through the CSDGM?
Modularity and extensible structures.
First, the CSDGM is one huge monolithic standard, and so it is
difficult to mix and match parts of it with other standards --
mainly because of all of the spatial requirements. So, we built
EML as a series of modules that can be linked together and can be
linked to other metadata standards. This gives us the most
flexibility, and given that we can easily translate into FGDC
compliant documents, there is little cost. Second, we're building
advanced data processing tools that can automatically parse data
sets and analyze them based on the EML metadata descriptions. Due
to various shortcomings in the FGDC standard, mostly oriented
around its tight focus on spatial data, we have found that the
CSDGM isn't adequate for these needs. As a research project, we
are constantly trying to expand the suite of services that metadata
enables, and the FGDC spec isn't accommodating in that regard
(e.g., how can one add machine-parsable, semantically oriented
attribute tags to FGDC? Answer: you can't, because it is
monolithic and doesn't permit dynamic ties to other metadata specs
-- the only extension method is a huge administrative task of
actually creating a superset of the FGDC -- not very maintainable).
In addition, the level of granularity for metadata in FGDC is very
patchy -- it goes into tremendous detail for spatial projections,
etc., but is incredibly terse with respect to describing methods and
non-standard data formats. This is appropriate in the spatial
world, where there are so few data formats (< 100, many sensor-derived
streams), but not so good in ecology, where there is no
standardization of data formats (>>> 5000, very few
sensor-derived).
Is there documentation for EML in English?
Yes, there is a formal specification of EML describing its
development history, architecture, and modules. The intent of each
module is described in narrative and there is a technical
description of each module in XML notation. Included as part of the
technical description is an element-by-element description of the
module. We will eventually provide examples of usage.
Why is EML such an important development?
The last decade has witnessed a tremendous explosion of
ecological and environmental data, catalyzed by societal concerns
and facilitated by advancing technologies. These data have the
potential to greatly enhance understanding of the complexity of
the biosphere. However, broad-scale or synthetic research is
stymied because data are largely unorganized and inaccessible as a
consequence of their tremendous heterogeneity, complexity, and
spatial dispersion in many separate repositories. EML is the first
content standard designed specifically to address these issues for
ecological data. Wide adoption and use of EML will create exciting
new opportunities for data discovery, access, integration and
synthesis.
How do I get EML?
All the documents associated with the EML development effort are
available via the project web server at www.ecoinformatics.org. These
projects are licensed under the GNU General Public License (GPL) and
can be freely distributed and modified.
The EML Schema document is quite complex. An average
ecologist probably cannot and more likely does not want to mark up
content in an XML editor. How then do you get content into
EML?
The Knowledge Network for Biocomplexity
project has developed a software client specifically to address this
need. Morpho (after the butterfly genus) is written in Java (making it
portable across computer platforms) and combines an easy-to-use interface
to EML with a number of tools to make it easier for ecologists to
document data. These include a reverse-engineering wizard. Morpho is
available from www.ecoinformatics.org.
EML contains provisions for communication. Is it
possible to document in EML dynamic online data resources?
Yes, there are provisions in the eml-physical module for
descriptions of online data resources. The eml-physical module
describes the structural characteristics of data formats as delivered
over the wire or as found in a file system. One physical object (which
can be a bytestream or an object in a file system) might contain
multiple entities (for example, this would be typical in a MS Access
file that contained multiple tables of data). However, it is typically
used to describe a file or stream that is in some text-based format
such as ASCII or UTF-8, and includes the information needed to parse
the data stream to extract the entity and its attributes from the
stream. There are three distribution types: online, offline, and
inline. To describe an online dataset in EML, you would populate the
online element with the distribution information.
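As a rough illustration of populating the online element, the Python sketch below builds a small physical description with the standard library. The element names (physical, objectName, distribution, online, url), the file name, and the URL are assumptions made for illustration; check them against the eml-physical specification before relying on them.

```python
import xml.etree.ElementTree as ET


def build_online_physical(object_name, url):
    """Build a hypothetical <physical> fragment that points at an online resource."""
    physical = ET.Element("physical")
    ET.SubElement(physical, "objectName").text = object_name
    # The distribution element holds one of the three distribution types;
    # here only the online one is populated.
    distribution = ET.SubElement(physical, "distribution")
    online = ET.SubElement(distribution, "online")
    ET.SubElement(online, "url").text = url
    return physical


# Hypothetical file name and URL, for illustration only.
fragment = build_online_physical("plots.csv", "http://example.org/data/plots.csv")
xml_text = ET.tostring(fragment, encoding="unicode")
print(xml_text)
```

The fragment produced this way is not a complete EML document; it only shows where the distribution information would sit.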
Do I need to download special client software to use
EML?
No, but there is software available to work with EML. See FAQ 8.
How can I get my existing metadata into EML?
There are several approaches that can be used to convert
existing metadata into EML depending on what form your existing
metadata take.
CASE 1: Metadata is currently in a text format (not stored in a database).
CONVERSION METHODS:
1. Write a script (Perl, PHP, Java, etc.) to convert the text into EML-compliant XML.
2. Convert the text metadata into XHTML (HTML that is XML compliant). Write an XSLT script to transform the XHTML file into EML-compliant XML.
3. Use a special-purpose XML editor that generates EML (Morpho or Xylographa) and manually retype the metadata.
4. Use a general purpose XML development tool such as XML Spy that can create a sample document from an XML Schema and retype the metadata manually.
5. Use a simple text editor and do everything from scratch.
6. Use specialized data transformation software such as the Data Junction suite to extract text data and then map it into an EML structure.
CASE 2: Metadata is stored in a relational database
CONVERSION METHODS:
1. Both Microsoft SQL Server and Oracle have utilities to generate XML from their database. If you use a tool like that, then you will have to write an XSLT script to transform the generated XML into EML.
2. Use a vendor neutral Database-to-XML generator such as Cocoon (an Apache open source free tool). Cocoon can query the database, generate XML, and has a tool for creating the XSL Transformation scripts to convert the first stage XML output into EML format.
3. Use a specialized tool such as Xanthoria (like Cocoon in many respects, but easier to use) to generate XML from the database. Then use a tool such as XML Spy or Stylus Studio to develop the XSLT script to convert the generated XML into EML-compliant XML.
4. Use specialized data transformation software such as Data Junction to query the database and map it into an EML structure.
CASE 3: Metadata is already in XML but in some other form such as NBII or FGDC
CONVERSION METHOD:
1. Write an XSLT script to convert from e.g. FGDC to EML.
NOTE: In each of the cases it may be necessary to add some additional
metadata in order to produce EML compliant metadata. Morpho will
automatically create EML compliant metadata either by adding it for
you or indicating that certain fields are mandatory.
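The scripting approach in Case 1 can be sketched in a few lines of Python. This is a minimal sketch, assuming the text metadata is stored as "Label: value" lines; the label-to-element mapping, the sample text, and the flat output structure are all hypothetical. Real EML requires a richer structure (and possibly additional mandatory fields, as the note above says), so the output here is a starting point rather than schema-valid EML.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from labels in a text metadata file to element names.
FIELD_TO_ELEMENT = {"Title": "title", "Abstract": "abstract", "Owner": "creator"}


def text_to_xml(text):
    """Convert 'Label: value' lines into child elements of a <dataset> element."""
    dataset = ET.Element("dataset")
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        element_name = FIELD_TO_ELEMENT.get(label.strip())
        # Lines without a colon or without a known label are skipped.
        if sep and element_name:
            ET.SubElement(dataset, element_name).text = value.strip()
    return dataset


# Hypothetical sample input, for illustration only.
sample = "Title: Soil moisture 1999\nOwner: Jones\nNotes: ignored, no mapping"
dataset = text_to_xml(sample)
print(ET.tostring(dataset, encoding="unicode"))
```

Skipping unmapped labels silently keeps the sketch short; a production script would more likely report them so that no metadata is lost.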
The challenge of getting my data into EML is not
insurmountable. My question is what do I do with it when I get it
there? If I am storing all my metadata in text-based EML files, how am
I supposed to query them or use them for data management?
For a site that has no current electronic data management
system and no immediate intention of developing one, there
are a number of solutions, including the Morpho-Metacat solution. If
you store your metadata in a relational database management system
(RDBMS), or plan to, there are also solutions. Cocoon and Xanthoria are
examples of programs that can get EML out of an RDBMS. Both
are Java applications that use Java database connection
hooks and style sheets to retrieve and format data. Xanthoria has a
smaller code base, and the XSLT stylesheets for EML 2.0 have already been
written. This solution lets a site stick with the RDBMS that
it probably has integrated with its site management activities,
yet also have its metadata exposed via EML.
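The general pattern of querying a database and emitting first-stage XML can be sketched as follows. This is not how Cocoon or Xanthoria are implemented; it is a stand-in using Python's sqlite3 module, and the table name, column names, and output element names are all hypothetical. In the workflow described above, a stylesheet would then transform this intermediate XML into EML.

```python
import sqlite3
import xml.etree.ElementTree as ET


def datasets_to_xml(conn):
    """Query a hypothetical 'dataset' table and emit one element per row."""
    root = ET.Element("datasets")
    rows = conn.execute("SELECT id, title, owner FROM dataset ORDER BY id")
    for dataset_id, title, owner in rows:
        dataset = ET.SubElement(root, "dataset", id=str(dataset_id))
        ET.SubElement(dataset, "title").text = title
        ET.SubElement(dataset, "creator").text = owner
    return root


# An in-memory database stands in for the site's existing RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset (id INTEGER PRIMARY KEY, title TEXT, owner TEXT)")
conn.execute(
    "INSERT INTO dataset (title, owner) VALUES (?, ?)",
    ("Soil moisture 1999", "Jones"),
)
exported = datasets_to_xml(conn)
print(ET.tostring(exported, encoding="unicode"))
```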
Does the modularity of EML mean that one description
can be shared by many documents?
In a previous version, EML packages (via RDF-style triples)
supported linking across packages, so you could re-use the same
document in multiple packages. In EML 2.0 release candidate 1 we
redesigned the packaging structure to only allow linking within a
single package. Thus, one could reuse a party description or
attribute list within a package, but not across several. This is a
compromise that keeps some reusability but has fewer management
problems. Along with this change is an ability to put all
metadata and data in a single document for transport -- while
still not limiting ourselves to a monolithic structure. This has
benefits (akin to database normalization) and costs (access control,
ownership, and multiple update problems abound).
How are EML modules linked together?
With "id" attributes and "references" elements in each module.
Our general approach in EML has been to create
ComplexTypes (CT) when we wanted a particular block to be
reusable. This concept was extended for linking modules together
by adding an optional attribute named "id" of type "xs:ID" for
each ComplexType. This allows us to uniquely address each block
defined by a CT, and any XML 1.0 parser will validate that all of
the "id" values are in fact locally unique. For the
"ResourceBase" CT, this id attribute replaces the "identifier"
element and acts as the overall identifier for the package.
The content model for each CT is a choice between the existing content
model and a new element named "references" of type "xs:string". This
element is used to hold a reference to an existing subtree identified
by its id. We use this element instead of an IDREF to surmount
validation issues. This relationship between the "references" element
and the "id" identifiers is enforced by defining an XML Schema "key"
for the "id" attributes and a "keyref" for the "references" elements.
Thus, any XML parser that supports XML Schema validation will be able
to validate the correspondence between each "id" and "references"
field (e.g., Xerces 2.0 supports this). Here's a fragment of an
example XML doc to illustrate:
...
Jones
p1
lackey
p1
...
This even works for types that extend other types as long as the
subclass is the one that does the referencing (e.g., associatedParty
can reference creator, but not vice versa). This rule will actually
be enforced by validating parsers.
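The correspondence between "id" and "references" values can also be checked outside a schema validator. The Python sketch below is only an approximation of what the key/keyref constraints enforce: it checks that ids are unique and that every "references" value matches some id, but it does not check the type rule about which element may reference which. The fragment is hypothetical; its tag names and nesting are assumptions chosen to match the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: a creator with id "p1" and an associatedParty
# that reuses the same party description through a references element.
FRAGMENT = """
<dataset>
  <creator id="p1"><surName>Jones</surName></creator>
  <associatedParty><role>lackey</role><references>p1</references></associatedParty>
</dataset>
"""


def unresolved_references(root):
    """Return the 'references' values that do not match any 'id' in the tree."""
    ids = [el.get("id") for el in root.iter() if el.get("id") is not None]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate id values in package")
    known = set(ids)
    missing = []
    for ref in root.iter("references"):
        target = (ref.text or "").strip()
        if target not in known:
            missing.append(target)
    return missing


package = ET.fromstring(FRAGMENT)
print(unresolved_references(package))
```

An empty list means every reference resolves within the package; any value returned points at an id that does not exist.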
The key and keyref are defined in the eml.xsd module. A package is
defined by all of the content included in the tag, including the
nested modules like attribute in entity. The nature of the
association is implied by the types of the document (i.e.,
role/predicate/property/relationship is not specified directly). The
reference/id linkage is enforced by defining another "keyref"
constraint. So, this lets us add arbitrary metadata documents and
point them at existing ids in the tree. Thus, the id serves as both
ends of the link (subject and object in RDF terms) depending on
whether it is referred to in a "references" element or in a
"describes" attribute.
Can I put data into EML as well as metadata?
Yes, there are provisions in the eml-physical module for
inclusion of data. The module describes the structural
characteristics of data formats as delivered over the wire or as
found in a file system. One physical object (which can be a
bytestream or an object in a file system) might contain multiple
entities (for example, this would be typical in a MS Access file
that contained multiple tables of data). However, it is typically
used to describe a file or stream that is in some text-based
format such as ASCII or UTF-8, and includes the information needed
to parse the data stream to extract the entity and its attributes
from the stream. There are three distribution types: online, offline,
and inline. To include data in EML, you would populate the inline
element with the data file described in the data format
element.
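A minimal sketch of inline inclusion, using Python's standard library: a small comma-separated table is placed in an inline element and read back to confirm it survives serialization. The element names (physical, objectName, distribution, inline) and the sample data are assumptions for illustration, and the data format description that would accompany the data is omitted here.

```python
import xml.etree.ElementTree as ET


def build_inline_physical(object_name, data_text):
    """Build a hypothetical <physical> fragment carrying the data itself."""
    physical = ET.Element("physical")
    ET.SubElement(physical, "objectName").text = object_name
    distribution = ET.SubElement(physical, "distribution")
    # ElementTree escapes characters such as & and < when serializing.
    ET.SubElement(distribution, "inline").text = data_text
    return physical


# Hypothetical data, for illustration only.
csv_text = "site,temp_c\nA,12.5\nB,13.1\n"
inline_fragment = build_inline_physical("temps.csv", csv_text)
inline_xml = ET.tostring(inline_fragment, encoding="unicode")

# Parse the serialized fragment again and pull the data back out.
round_trip = ET.fromstring(inline_xml).find("distribution/inline").text
print(round_trip == csv_text)
```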
What can I do with my EML structured metadata?
Be very proud that you are limiting data entropy
worldwide.
Can I validate my EML documents against the
DTD?
Yes and no.
EML is implemented in an Extensible Markup Language (XML)
vocabulary known as XML Schema, which is a language that defines the rules
that govern the EML syntax. XML Schema is an internet
recommendation from the World Wide Web Consortium
(http://www.w3.org), and so a metadata document that is said to
comply with the syntax of EML will structurally meet the criteria
defined in the XML Schema documents for EML. Over and above the
structure (what elements can be nested within others, how many,
etc.), XML Schema provides the ability to use strong data typing
within elements. This allows for finer validation of the contents
of the element, not just its structure. For instance, an element
may be of type 'date', and so the value that is inserted in the
field will be checked against XML Schema's definition of a
date. Traditionally, XML documents have been validated against
Document Type Definitions (DTDs), which do not provide a means to
employ strong validation on field values through typing. EML is
also distributed with DTDs that are generated from the XML Schema
documents to provide some backward compatibility.
Are there required elements in EML?
Yes. Although we've made every attempt to limit required
elements in the cause of flexibility, there are a number of pieces
of information required to make sense of the metadata document. To
make the metadata more useful we do have recommended usages on the
modules. See the specification for details about required fields and
recommended usage. In the future we may provide usage compliance
information such that if you want your data and metadata to be
useful in a particular analytical context you will be provided
with those elements of EML that are required for that
purpose.
There appear to be multiple places to put some types of metadata
in EML. How do I know which of these places is the right place for
my information?
Call or email Peter McCartney.