<eml-faq version="0.1">
  <faq-item id="0">
    <question>Why does it rain?</question>
    <answer>Because it is wet.</answer>
    <long-answer>Because of the accumulation of moisture in the
    atmosphere. Gravity overcomes levity.</long-answer>
  </faq-item>
  <faq-item id="1">
    <question>What is EML?</question> 
    <answer>EML stands for
    Ecological Metadata Language. It is implemented as a set of XML
    Schema documents that allow for the structural expression of the
    metadata necessary to document a typical data set in the
    ecological sciences.</answer>
  </faq-item>
  <faq-item id="2">
    <question>Who is responsible for EML?</question>
    <answer>The first two released versions of EML, EML 1.0 and EML 1.4.1,
    were developed at the National Center for Ecological Analysis and
    Synthesis (NCEAS), University of California at Santa Barbara, in
    Santa Barbara, California, USA. EML 2.0 beta 9 and the EML 2.0
    release candidate 1 were developed through community efforts that
    involved a number of ecological research projects and
    organizations. While the bulk of the work still comes from NCEAS,
    the Long Term Ecological Research Program sites and individuals
    from a number of other research projects have had significant
    input into EML.</answer>
  </faq-item>
  <faq-item id="3">
    <question> Why would I want to use EML when FGDC now supports
   biological data through the CSDGM?</question> 
    <answer>Modularity and extensible structures.</answer>
    <long-answer> The CSDGM is one huge monolithic standard, and so it is
   difficult to mix and match parts of it with other standards --
   mainly because of all of the spatial requirements.  So, we built
   EML as a series of modules that can be linked together and can be
   linked to other metadata standards.  This gives us the most
   flexibility, and given that we can easily translate into FGDC
   compliant documents, there is little cost.  Second, we're building
   advanced data processing tools that can automatically parse data
   sets and analyze them based on the EML metadata descriptions.  Due
   to various shortcomings in the FGDC standard, mostly oriented
   around its tight focus on spatial data, we have found that the
   CSDGM isn't adequate for these needs.  As a research project, we
   are constantly trying to expand the suite of services that metadata
   enables, and the FGDC spec isn't accommodating in that regard
   (e.g., how can one add machine parsable, semantically oriented
   attribute tags to FGDC?  Answer: you can't, because it is
   monolithic and doesn't permit dynamic ties to other metadata specs
   -- the only extension method is a huge administrative task of
   actually creating a superset of the FGDC -- not very maintainable).
   In addition, the level of granularity for metadata in FGDC is very
   patchy -- it goes into tremendous detail for spatial projections,
   etc., but is incredibly terse with respect to describing methods and
   non-standard data formats.  This is appropriate in the spatial
   world where there are so few data formats (&lt; 100, many sensor
   derived streams), but not so good in ecology where there is no
   standardization of data formats (&gt;&gt;&gt;5000, very few sensor
   derived).</long-answer>
  </faq-item>
  <faq-item id="4">
    <question> Is there documentation for EML in English?</question>
    <answer>Yes, there is a formal specification of EML describing its
  development history, architecture, and modules. The intent of each
  module is described in narrative and there is a technical
  description of each module in XML notation. Included as part of the
  technical description is an element-by-element description of the
  module. We will eventually provide usage examples.</answer>
  </faq-item>
  <faq-item id="5">
    <question> Why is EML such an important development?</question>
    <answer> The last decade has witnessed a tremendous explosion of
    ecological and environmental data, catalyzed by societal concerns
    and facilitated by advancing technologies. These data have the
    potential to greatly enhance understanding of the complexity of
    the biosphere. However, broad-scale or synthetic research is
    stymied because data are largely unorganized and inaccessible as a
    consequence of their tremendous heterogeneity, complexity, and
    spatial dispersion in many separate repositories. EML is the first
    content standard designed specifically to address these issues for
    ecological data. Wide adoption and use of EML will create exciting
    new opportunities for data discovery, access, integration and
    synthesis.</answer>
  </faq-item>
  <faq-item id="6">
    <question> How do I get EML?</question>
    <answer>All the documents associated with the EML development effort are
available via the project web server at www.ecoinformatics.org. They
are licensed under the GNU General Public License (GPL) and
can be freely distributed and modified.</answer>
  </faq-item>
  <faq-item id="7">
    <question> The EML Schema document is quite complex. An average
ecologist probably cannot and more likely does not want to mark up
content in an XML editor. How then do you get content into
EML?</question> 
    <answer>The Knowledge Network for Biocomplexity
project has developed a software client specifically to address this
need. Morpho (named after the butterfly genus) is written in Java, which
makes it portable across computer platforms, and combines an easy-to-use
interface to EML with a number of tools that make it easier for
ecologists to document data. These include a reverse-engineering
wizard.  Morpho is available from www.ecoinformatics.org.</answer>
  </faq-item>
  <faq-item id="8">
    <question> EML contains provisions for communication. Is it
possible to document in EML dynamic online data resources?</question>
<answer>Yes, there are provisions in the eml-physical module for
descriptions of online data resources. The eml-physical module
describes the structural characteristics of data formats as delivered
over the wire or as found in a file system. One physical object (which
can be a bytestream or an object in a file system) might contain
multiple entities (for example, this would be typical in a MS Access
file that contained multiple tables of data). However, it is typically
used to describe a file or stream that is in some text-based format
such as ASCII or UTF-8, and includes the information needed to parse
the data stream to extract the entity and its attributes from the
stream. There are three distribution types: online, offline, and
inline. To describe an online dataset in EML you would populate the
online element with the distribution information.</answer>
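    <long-answer><![CDATA[
As a minimal sketch of the description above, an online resource might
be documented roughly as follows. The element names are assumed from
the eml-physical module of EML 2.0 and should be checked against the
schema; the object name and URL are hypothetical placeholders.

```xml
<physical>
  <!-- hypothetical file name, for illustration only -->
  <objectName>stream-temperature.csv</objectName>
  <dataFormat>
    <textFormat>
      <attributeOrientation>column</attributeOrientation>
      <simpleDelimited>
        <fieldDelimiter>,</fieldDelimiter>
      </simpleDelimited>
    </textFormat>
  </dataFormat>
  <distribution>
    <online>
      <!-- hypothetical URL where the data can be retrieved -->
      <url>http://example.org/data/stream-temperature.csv</url>
    </online>
  </distribution>
</physical>
```

The dataFormat block carries the parsing information, while the online
element only says where the bytes can be fetched.
]]></long-answer>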
  </faq-item>
  <faq-item id="9">
    <question> Do I need to download special client software to use
EML?</question> 
    <answer>No, but there is software available to work with EML. See the entry on Morpho (faq-item id 7).</answer>
  </faq-item>
  <faq-item id="10">
    <question> How can I get my existing metadata into EML?</question>
    <answer>There are several approaches that can be used to convert
    existing metadata into EML depending on what form your existing
    metadata take.</answer>
    <long-answer>
CASE 1: Metadata is currently in a text format (not stored in a database).
CONVERSION METHODS:
             1. Write a script (Perl, PHP, Java, etc.) to convert the text into EML-compliant XML.
             2. Convert the text metadata into XHTML (HTML that is XML compliant). Write an XSLT script to transform the XHTML file into EML-compliant XML.
             3. Use a special-purpose XML editor that generates EML (Morpho or Xylographa) and manually retype the metadata.
             4. Use a general-purpose XML development tool such as XML Spy that can create a sample document from an XML Schema, and retype the metadata manually.
             5. Use a simple text editor and do everything from scratch.
             6. Use specialized data transformation software such as the Data Junction suite to extract text data and then map it into an EML structure.

CASE 2: Metadata is stored in a relational database.
CONVERSION METHODS:
             1. Both Microsoft SQL Server and Oracle have utilities to generate XML from their databases. If you use a tool like that, you will have to write an XSLT script to transform the generated XML into EML.
             2. Use a vendor-neutral database-to-XML generator such as Cocoon (a free, open source Apache tool). Cocoon can query the database, generate XML, and has a tool for creating the XSL Transformation scripts that convert the first-stage XML output into EML format.
             3. Use a specialized tool such as Xanthoria (like Cocoon in many respects, but easier to use) to generate XML from the database. Then use a tool such as XML Spy or Stylus Studio to develop the XSLT script that converts the generated XML into EML-compliant XML.
             4. Use specialized data transformation software such as the Data Junction suite to query the database and map it into an EML structure.

CASE 3: Metadata is already in XML but in some other form such as NBII or FGDC
CONVERSION METHOD:
           1. Write an XSLT script to convert from e.g. FGDC to EML.

NOTE: In each of the cases it may be necessary to add some additional
metadata in order to produce EML compliant metadata. Morpho will
automatically create EML compliant metadata either by adding it for
you or indicating that certain fields are mandatory.
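
<![CDATA[
For CASE 3, the XSLT approach can be sketched as below. This is only an
illustration under stated assumptions, not a complete converter: it
assumes FGDC CSDGM input with the usual metadata/idinfo/citation/citeinfo
layout, the EML namespace URI shown is an assumption to verify against
your EML release, the packageId and system values are hypothetical
placeholders, and every FGDC originator is treated as an organization
for simplicity. The output would still need the additional required
metadata mentioned in the NOTE above.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:eml="eml://ecoinformatics.org/eml-2.0.0">
  <xsl:output method="xml" indent="yes"/>

  <!-- Match the FGDC root element and emit an EML wrapper. -->
  <xsl:template match="/metadata">
    <!-- packageId and system are hypothetical placeholder values -->
    <eml:eml packageId="example.1.1" system="example">
      <dataset>
        <!-- FGDC citation title becomes the dataset title -->
        <title>
          <xsl:value-of select="idinfo/citation/citeinfo/title"/>
        </title>
        <!-- Each FGDC originator becomes one creator (simplification) -->
        <xsl:for-each select="idinfo/citation/citeinfo/origin">
          <creator>
            <organizationName>
              <xsl:value-of select="."/>
            </organizationName>
          </creator>
        </xsl:for-each>
      </dataset>
    </eml:eml>
  </xsl:template>
</xsl:stylesheet>
```
]]>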
    </long-answer>
  </faq-item>
  <faq-item id="11">
    <question>The challenge of getting my data into EML is not
insurmountable.  My question is what do I do with it when I get it
there? If I am storing all my metadata in text-based EML files, how am
I supposed to query them or use them for data management?</question>
<answer>For a site that has no current electronic data management
system and has no immediate intention of developing one, there
are a number of solutions, including the Morpho-Metacat solution. If
you store your metadata in a relational database management system
(RDBMS), or plan to, there are also solutions. Cocoon and Xanthoria are
examples of programs that can get EML out of an RDBMS. Both are Java
applications that use Java database connection
hooks and style sheets to retrieve and format data. Xanthoria has a
smaller code base, and the XSLT stylesheets for EML 2.0 have already been
written. This solution lets a site stick with the RDBMS that
it has probably integrated with its site management activities,
yet also have its metadata exposed via EML.</answer>
  </faq-item>
  <faq-item id="12">
    <question>Does the modularity of EML mean that one description
    can be shared by many documents?</question>
    <answer>In a previous version, EML packages (via RDF-style triples)
    supported linking across packages, so you could re-use the same
    document in multiple packages. In EML 2.0 release candidate 1 we
    redesigned the packaging structure to only allow linking within a
    single package.  Thus, one could reuse a party description or
    attribute list within a package, but not across several. This is a
    compromise that keeps some reusability but has fewer management
    problems.  Along with this change is an ability to put all
    metadata and data in a single document for transport -- while
    still not limiting ourselves to a monolithic structure. This has
    benefits (akin to database normalization) and costs (access control,
    ownership, and multiple update problems abound).</answer>
  </faq-item>
  <faq-item id="13">
    <question>How are EML modules linked together?</question>
    <answer>With "id" attributes and "references" elements in each
    module.</answer>
    <long-answer> Our general approach in EML has been to create
    ComplexTypes (CT) when we wanted a particular block to be
    reusable. This concept was extended for linking modules together
    by adding an optional attribute named "id" of type "xs:ID" for
    each ComplexType.  This allows us to uniquely address each block
    defined by a CT, and any XML 1.0 parser will validate that all of
    the "id" values are in fact locally unique.  For the
    "ResourceBase" CT, this id attribute replaces the "identifier"
    element and acts as the overall identifier for the package.

The content model for each CT is a choice between the existing content
model and a new element named "references" of type "xs:string".  This
element is used to hold a reference to an existing subtree identified
by its id.  We use this element instead of an IDREF to surmount
validation issues. This relationship between the "references" element
and the "id" identifiers is enforced by defining an XML Schema "key"
for the "id" elements and a "keyref" for the "references" elements.
Thus, any XML parser that supports XML Schema validation will be able
to validate the correspondence between each "id" and "references"
field (e.g., Xerces 2.0 supports this).  Here's a fragment of an
example XML document to illustrate:


    ... 
    <creator id="p1"> 
      <individualName><surName>Jones</surName></individualName> 
    </creator> 
    <associatedParty> 
      <references>p1</references> 
      <role>lackey</role> 
    </associatedParty> 
    <contact> 
      <references>p1</references> 
    </contact> 
    ... 

This even works for types that extend other types as long as the
subclass is the one that does the referencing (e.g., associatedParty
can reference creator, but not vice versa).  This rule will actually
be enforced by validating parsers.
  
The key and keyref are defined in the eml.xsd module. A package is
defined by all of the content included in the &lt;eml&gt; tag, including the
nested modules like attribute in entity.  The nature of the
association is implied by the types of the document (i.e.,
role/predicate/property/relationship is not specified directly).  The
reference/id linkage is enforced by defining another "keyref"
constraint.  So, this lets us add arbitrary metadata documents and
point them at existing ids in the tree. Thus, the id serves as both
ends of the link (subject and object in RDF terms) depending on
whether it is referred to in a "references" element or in a
"describes" attribute.</long-answer>
  </faq-item>
  <faq-item id="14">
    <question> Can I put data into EML as well as metadata?</question>
    <answer> Yes, there are provisions in the eml-physical module for
    inclusion of data. The module describes the structural
    characteristics of data formats as delivered over the wire or as
    found in a file system. One physical object (which can be a
    bytestream or an object in a file system) might contain multiple
    entities (for example, this would be typical in a MS Access file
    that contained multiple tables of data). However, it is typically
    used to describe a file or stream that is in some text-based
    format such as ASCII or UTF-8, and includes the information needed
    to parse the data stream to extract the entity and its attributes
    from the stream. There are three distribution types: online,
    offline, and inline. To include data in EML you would populate the
    inline element with the data file described in the data format
    element.</answer>
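    <long-answer><![CDATA[
As a minimal sketch of inline inclusion, a small comma-delimited table
might be carried inside the metadata roughly as follows. The element
names are assumed from the eml-physical module of EML 2.0 and should be
checked against the schema; the object name and the two data rows are
hypothetical and exist only to illustrate the layout.

```xml
<physical>
  <!-- hypothetical object name, for illustration only -->
  <objectName>plots.csv</objectName>
  <dataFormat>
    <textFormat>
      <!-- one header row precedes the data rows -->
      <numHeaderLines>1</numHeaderLines>
      <attributeOrientation>column</attributeOrientation>
      <simpleDelimited>
        <fieldDelimiter>,</fieldDelimiter>
      </simpleDelimited>
    </textFormat>
  </dataFormat>
  <distribution>
    <!-- the data themselves travel inside the inline element -->
    <inline>plot,species_count
A,12
B,7
</inline>
  </distribution>
</physical>
```

The dataFormat block tells a parser how to read what sits in the inline
element, so metadata and data can be transported as one document.
]]></long-answer>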
  </faq-item>
  <faq-item id="15">
    <question> What can I do with my EML structured metadata?</question>
    <answer>Be very proud that you are limiting data entropy
    worldwide.</answer>
  </faq-item>
  <faq-item id="16">
    <question> Can I validate my EML documents against the
    DTD?</question> 
    <answer>Yes and no.</answer>
    <long-answer>EML is implemented in XML Schema, a schema language
    that is itself written in Extensible Markup Language (XML) and is
    used to define the rules
    that govern the EML syntax. XML Schema is an internet
    recommendation from the World Wide Web Consortium
    (http://www.w3.org), and so a metadata document that is said to
    comply with the syntax of EML will structurally meet the criteria
    defined in the XML Schema documents for EML. Over and above the
    structure (what elements can be nested within others, how many,
    etc.), XML Schema provides the ability to use strong data typing
    within elements. This allows for finer validation of the contents
    of the element, not just its structure. For instance, an element
    may be of type 'date', and so the value that is inserted in the
    field will be checked against XML Schema's definition of a
    date. Traditionally, XML documents have been validated against
    Document Type Definitions (DTDs), which do not provide a means to
    employ strong validation on field values through typing. EML is
    also distributed with DTDs that are generated from the XML Schema
    documents to provide some backward compatibility.</long-answer>
  </faq-item>
  <faq-item id="17">
    <question> Are there required elements in EML?</question>
    <answer>Yes. Although we've made every attempt to limit required
    elements in the cause of flexibility, there are a number of pieces
    of information required to make sense of the metadata document. To
    make the metadata more useful we do have recommended usages on the
    modules. See the specification for details about required fields and
    recommended usage. In the future we may provide usage compliance
    information such that if you want your data and metadata to be
    useful in a particular analytical context you will be provided
    with those elements of EML that are required for that
    purpose.</answer>
  </faq-item>
  <faq-item id="18">
    <question> There appear to be multiple places to put some types of metadata
    in EML. How do I know which of these places is the right place for
    my information?</question>
    <answer> Call or email Peter McCartney.</answer>
  </faq-item>
</eml-faq>