eml-faq.xml - EML - Ecoinformatics Redmine

Bug #495 » eml-faq.xml

James Brunt, 08/30/2002 10:55 AM

    
      <eml-faq version="0.1">

        <faq-item id="0">

          <question>Why does it rain?</question>

          <answer>Because it is wet.</answer>

          <long-answer>Because of the accumulation of moisture in the

          atmosphere. Gravity overcomes levity.</long-answer>

        </faq-item>

        <faq-item id="1">

          <question>What is EML?</question> 

          <answer>EML stands for

          Ecological Metadata Language. It exists as a set of XML Schema

          documents (with DTDs generated from them) that allow for the structural

          expression of the metadata necessary to document a typical data set

          in the ecological sciences.</answer>

        </faq-item>

        <faq-item id="2">

          <question>Who is responsible for EML?</question>

          <answer>The first two released versions of EML, EML 1.0 and EML 1.4.1,

          were developed at the National Center for Ecological Analysis and

          Synthesis (NCEAS), University of California at Santa Barbara, in

          Santa Barbara, California, USA. EML 2.0 beta 9 and EML 2.0

          release candidate 1 were developed through community efforts that

          involved a number of ecological research projects and

          organizations. While the bulk of the work still comes from NCEAS,

          the Long Term Ecological Research Program sites and individuals

          from a number of other research projects have had significant

          input into EML.</answer>

        </faq-item>

        <faq-item id="3">

          <question>Why would I want to use EML when FGDC now supports

         biological data through the CSDGM?</question>

          <answer>Modularity and extensible structures.</answer>

          <long-answer>The CSDGM is one huge monolithic standard, so it is

         difficult to mix and match parts of it with other standards,

         mainly because of all of its spatial requirements. First, we built

         EML as a series of modules that can be linked together and can be

         linked to other metadata standards. This gives us the most

         flexibility, and given that we can easily translate into

         FGDC-compliant documents, there is little cost. Second, we're building

         advanced data processing tools that can automatically parse data

         sets and analyze them based on the EML metadata descriptions. Due

         to various shortcomings in the FGDC standard, mostly stemming from

         its tight focus on spatial data, we have found that the

         CSDGM isn't adequate for these needs. As a research project, we

         are constantly trying to expand the suite of services that metadata

         enables, and the FGDC spec isn't accommodating in that regard.

         For example, how can one add machine-parsable, semantically oriented

         attribute tags to FGDC? You can't, because it is monolithic

         and doesn't permit dynamic ties to other metadata specs; the only

         extension method is the huge administrative task of actually

         creating a superset of the FGDC, which is not very maintainable.

         In addition, the level of granularity for metadata in FGDC is very

         patchy: it goes into tremendous detail for spatial projections,

         etc., but is incredibly terse with respect to describing methods and

         non-standard data formats. This is appropriate in the spatial

         world, where there are so few data formats (fewer than 100, many of

         them sensor-derived streams), but not so good in ecology, where there

         is no standardization of data formats (far more than 5000, very few

         of them sensor-derived).</long-answer>

        </faq-item>

        <faq-item id="4">

          <question>Is there documentation for EML in English?</question>

          <answer>Yes, there is a formal specification of EML describing its

        development history, architecture, and modules. The intent of each

        module is described in narrative form, and there is a technical

        description of each module in XML notation. Included as part of the

        technical description is an element-by-element description of the

        module. We will eventually provide usage examples.</answer>

        </faq-item>

        <faq-item id="5">

          <question> Why is EML such an important development?</question>

          <answer> The last decade has witnessed a tremendous explosion of

          ecological and environmental data, catalyzed by societal concerns

          and facilitated by advancing technologies. These data have the

          potential to greatly enhance understanding of the complexity of

          the biosphere. However, broad-scale or synthetic research is

          stymied because data are largely unorganized and inaccessible as a

          consequence of their tremendous heterogeneity, complexity, and

          spatial dispersion in many separate repositories. EML is the first

          content standard designed specifically to address these issues for

          ecological data. Wide adoption and use of EML will create exciting

          new opportunities for data discovery, access, integration and

          synthesis.</answer>

        </faq-item>

        <faq-item id="6">

          <question>How do I get EML?</question>

          <answer>All the documents associated with the EML development effort are

      available via the project web server at www.ecoinformatics.org. They

      are licensed under the GNU General Public License (GPL) and

      can be freely distributed and modified.</answer>

        </faq-item>

        <faq-item id="7">

          <question>The EML Schema document is quite complex. An average

      ecologist probably cannot, and more likely does not want to, mark up

      content in an XML editor. How then do you get content into

      EML?</question>

          <answer>The Knowledge Network for Biocomplexity

      project has developed a software client specifically to address this

      need. Morpho (named after the butterfly genus) is written in Java, which

      makes it portable across computer platforms, and it combines an

      easy-to-use interface to EML with a number of tools that make it easier

      for ecologists to document data, including a reverse-engineering

      wizard. Morpho is available from www.ecoinformatics.org.</answer>

        </faq-item>

        <faq-item id="8">

          <question>EML contains provisions for communication. Is it

      possible to document dynamic online data resources in EML?</question>

      <answer>Yes, there are provisions in the eml-physical module for

      descriptions of online data resources. The eml-physical module

      describes the structural characteristics of data formats as delivered

      over the wire or as found in a file system. One physical object (which

      can be a bytestream or an object in a file system) might contain

      multiple entities (for example, this would be typical in an MS Access

      file that contained multiple tables of data). However, it is typically

      used to describe a file or stream that is in some text-based format

      such as ASCII or UTF-8, and includes the information needed to parse

      the data stream to extract the entity and its attributes from the

      stream. There are three distribution types: online, offline, and

      inline. To describe an online dataset in EML, you would populate the

      online element with the distribution information.</answer>

        </faq-item>

        <faq-item id="9">

          <question>Do I need to download special client software to use

      EML?</question>

          <answer>No, but there is software available to work with EML, such as the Morpho client described above.</answer>

        </faq-item>

        <faq-item id="10">

          <question> How can I get my existing metadata into EML?</question>

          <answer>There are several approaches that can be used to convert

          existing metadata into EML depending on what form your existing

          metadata take.</answer>

          <long-answer>

      CASE 1: Metadata is currently in a text format (not stored in a database).

      CONVERSION METHODS:

                   1. Write a script (Perl, PHP, Java, etc.) to convert the text into EML-compliant XML.

                   2. Convert the text metadata into XHTML (HTML that is XML-compliant). Write an XSLT script to transform the XHTML file into EML-compliant XML.

                   3. Use a special-purpose XML editor that generates EML (Morpho or Xylographa) and manually retype the metadata.

                   4. Use a general-purpose XML development tool, such as XML Spy, that can create a sample document from an XML Schema, and retype the metadata manually.

                   5. Use a simple text editor and do everything from scratch.

                   6. Use specialized data transformation software such as the Data Junction suite to extract text data and then map it into an EML structure.
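
      A minimal Python sketch of method 1, using only the standard library, follows. Everything in it is an assumption for illustration: the input layout (one "field: value" pair per line), the hypothetical function name text_to_eml_fragment, and the output element names dataset and title. Only creator, individualName, and surName come from the linking example elsewhere in this FAQ. A real conversion must also supply the required elements described in the EML specification.

      <![CDATA[
```python
import xml.etree.ElementTree as ET

# Hypothetical plain-text metadata: one "field: value" pair per line.
TEXT_METADATA = """\
Title: Soil moisture at plot A
Creator: Jones
"""


def text_to_eml_fragment(text):
    """Map simple 'field: value' lines onto an EML-style XML fragment.

    Only two fields are handled. The element names (dataset, title,
    creator/individualName/surName) are assumptions to check against
    the EML schema; the result is not a complete EML document.
    """
    fields = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip().lower()] = value.strip()

    dataset = ET.Element("dataset")
    if "title" in fields:
        ET.SubElement(dataset, "title").text = fields["title"]
    if "creator" in fields:
        creator = ET.SubElement(dataset, "creator")
        name = ET.SubElement(creator, "individualName")
        ET.SubElement(name, "surName").text = fields["creator"]
    # ElementTree escapes characters such as & and < in text content.
    return ET.tostring(dataset, encoding="unicode")


print(text_to_eml_fragment(TEXT_METADATA))
```
      ]]>

      Letting ElementTree build and serialize the tree, rather than concatenating strings, keeps the output well-formed even when the source text contains reserved XML characters.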

      CASE 2: Metadata is stored in a relational database.

      CONVERSION METHODS:

                  1. Both Microsoft SQL Server and Oracle have utilities to generate XML from their databases. If you use a tool like that, you will then have to write an XSLT script to transform the generated XML into EML.

                  2. Use a vendor-neutral database-to-XML generator such as Cocoon (a free, open-source Apache tool). Cocoon can query the database, generate XML, and has a tool for creating the XSL Transformation scripts that convert the first-stage XML output into EML format.

                  3. Use a specialized tool such as Xanthoria (like Cocoon in many respects, but easier to use) to generate XML from the database. Then use a tool such as XML Spy or Stylus Studio to develop the XSLT script that converts the generated XML into EML-compliant XML.

                  4. Use specialized data transformation software such as the Data Junction suite to query the database and map it into an EML structure.

      CASE 3: Metadata is already in XML but in some other form such as NBII or FGDC

      CONVERSION METHOD:

                 1. Write an XSLT script to convert from e.g. FGDC to EML.

      NOTE: In each of these cases it may be necessary to add some additional

      metadata in order to produce EML-compliant metadata. Morpho will

      automatically create EML-compliant metadata, either by adding the

      missing metadata for you or by indicating that certain fields are mandatory.

          </long-answer>

        </faq-item>

        <faq-item id="11">

          <question>The challenge of getting my data into EML is not

      insurmountable. My question is, what do I do with it when I get it

      there? If I am storing all my metadata in text-based EML files, how am

      I supposed to query them or use them for data management?</question>

      <answer>For a site that has no current electronic data management

      system and has no immediate intention of developing one, there

      are a number of solutions, including the Morpho/Metacat solution. If

      you store your metadata in a relational database management system

      (RDBMS), or plan to, there are also solutions. Cocoon and Xanthoria are

      examples of programs that can get EML out of an RDBMS. Cocoon and

      Xanthoria are both Java applications that use Java Database

      Connectivity (JDBC) hooks and style sheets to retrieve and format data.

      Xanthoria has a smaller code base, and the XSLT stylesheets for EML 2.0

      have already been written. This solution lets a site stick with the

      RDBMS it has probably integrated with its site management activities,

      yet still have its metadata exposed via EML.</answer>

        </faq-item>

        <faq-item id="12">

          <question>Does the modularity of EML mean that one description

          can be shared by many documents?</question>

          <answer>In a previous version, EML packages (via RDF-style triples)

          supported linking across packages, so you could reuse the same

          document in multiple packages. In EML 2.0 release candidate 1, we

          redesigned the packaging structure to allow linking only within a

          single package. Thus, one could reuse a party description or

          attribute list within a package, but not across several. This is a

          compromise that keeps some reusability but has fewer management

          problems. Along with this change comes the ability to put all

          metadata and data in a single document for transport, while

          still not limiting ourselves to a monolithic structure. This has

          benefits (akin to database normalization) and costs (access control,

          ownership, and multiple-update problems abound).</answer>

        </faq-item>

        <faq-item id="13">

          <question>How are EML modules linked together?</question>

          <answer>With an "id" attribute on each reusable block and a "references" element that points to it.</answer>

          <long-answer>Our general approach in EML has been to create

          ComplexTypes (CTs) when we wanted a particular block to be

          reusable. This concept was extended for linking modules together

          by adding an optional attribute named "id" of type "xs:ID" to

          each ComplexType. This allows us to uniquely address each block

          defined by a CT, and any XML 1.0 parser will validate that all of

          the "id" values are in fact locally unique. For the

          "ResourceBase" CT, this id attribute replaces the "identifier"

          element and acts as the overall identifier for the package.

      The content model for each CT is a choice between the existing content

      model and a new element named "references" of type "xs:string". This

      element is used to hold a reference to an existing subtree identified

      by its id. We use this element instead of an IDREF to get around

      validation issues. The relationship between the "references" elements

      and the "id" attributes is enforced by defining an XML Schema "key"

      for the "id" attributes and a "keyref" for the "references" elements.

      Thus, any XML parser that supports XML Schema validation will be able

      to validate the correspondence between each "id" and "references"

      value (e.g., Xerces 2.0 supports this). Here's a fragment of an

      example XML document to illustrate:

          <![CDATA[
          ...
          <creator id="p1">
            <individualName><surName>Jones</surName></individualName>
          </creator>
          <associatedParty>
            <references>p1</references>
            <role>lackey</role>
          </associatedParty>
          <contact>
            <references>p1</references>
          </contact>
          ...
          ]]>

      This even works for types that extend other types as long as the

      subclass is the one that does the referencing (e.g., associatedParty

      can reference creator, but not vice versa).  This rule will actually

      be enforced by validating parsers.
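
      A schema-aware parser such as Xerces performs the key/keyref check itself. To make concrete what that check asks for, here is a minimal sketch in Python using only the standard library's xml.etree.ElementTree; it is not part of EML or its tools. The function name dangling_references is hypothetical, and the sketch assumes the elements carry no namespace, as in the fragment above. It collects every id and rejects duplicates (the key side), then reports each references value that names no id (the keyref side).

      <![CDATA[
```python
import xml.etree.ElementTree as ET

# Hypothetical package fragment, modeled on the FAQ example above.
PACKAGE = """<eml>
  <creator id="p1">
    <individualName><surName>Jones</surName></individualName>
  </creator>
  <associatedParty>
    <references>p1</references>
    <role>lackey</role>
  </associatedParty>
  <contact>
    <references>p1</references>
  </contact>
</eml>"""


def dangling_references(xml_text):
    """Return 'references' values that match no 'id' in the package.

    Approximates the key/keyref check; it does not replace XML Schema
    validation. Assumes un-namespaced elements, as in PACKAGE.
    """
    root = ET.fromstring(xml_text)

    # Key side: every id attribute must be unique within the package.
    ids = set()
    for elem in root.iter():
        value = elem.get("id")
        if value is None:
            continue
        if value in ids:
            raise ValueError(f"duplicate id: {value}")
        ids.add(value)

    # Keyref side: every references element must name an existing id.
    missing = []
    for ref in root.iter("references"):
        target = (ref.text or "").strip()
        if target not in ids:
            missing.append(target)
    return missing


print(dangling_references(PACKAGE))  # prints []
```
      ]]>

      Changing the creator's id to anything other than p1 would make both references come back as dangling, which is the situation a keyref constraint reports as a validation error.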

      The key and keyref are defined in the eml.xsd module. A package is

      defined by all of the content included in the &lt;eml&gt; element, including

      nested modules such as attribute within entity. The nature of the

      association is implied by the types of the document (i.e.,

      role/predicate/property/relationship is not specified directly). The

      reference/id linkage is enforced by defining another "keyref"

      constraint. This lets us add arbitrary metadata documents and

      point them at existing ids in the tree. Thus, the id serves as both

      ends of the link (subject and object in RDF terms), depending on

      whether it is referred to in a "references" element or in a

      "describes" attribute.</long-answer>

        </faq-item>

        <faq-item id="14">

          <question> Can I put data into EML as well as metadata?</question>

          <answer> Yes, there are provisions in the eml-physical module for

          inclusion of data. The module describes the structural

          characteristics of data formats as delivered over the wire or as

          found in a file system. One physical object (which can be a

          bytestream or an object in a file system) might contain multiple

          entities (for example, this would be typical in an MS Access file

          that contained multiple tables of data). However, it is typically

          used to describe a file or stream that is in some text-based

          format such as ASCII or UTF-8, and includes the information needed

          to parse the data stream to extract the entity and its attributes

          from the stream. There are three distribution types: online, offline,

          and inline. To include data in EML, you would populate the inline

          element with the data file described in the data format

          element.</answer>

        </faq-item>

        <faq-item id="15">

          <question> What can I do with my EML structured metadata?</question>

          <answer>Be very proud that you are limiting data entropy

          worldwide.</answer>

        </faq-item>

        <faq-item id="16">

          <question> Can I validate my EML documents against the

          DTD?</question> 

          <answer>Yes and no.</answer>

          <long-answer>EML is implemented in XML Schema, an XML-based

          language that defines the rules that govern the EML syntax. XML

          Schema is a Recommendation of the World Wide Web Consortium

          (http://www.w3.org), and so a metadata document that is said to

          comply with the syntax of EML will structurally meet the criteria

          defined in the XML Schema documents for EML. Over and above the

          structure (which elements can be nested within others, how many,

          etc.), XML Schema provides the ability to use strong data typing

          within elements. This allows for finer validation of the contents

          of an element, not just its structure. For instance, an element

          may be of type 'date', and so the value that is inserted in the

          field will be checked against XML Schema's definition of a

          date. Traditionally, XML documents have been validated against

          Document Type Definitions (DTDs), which do not provide a means to

          employ strong validation on field values through typing. EML is

          also distributed with DTDs that are generated from the XML Schema

          documents to provide some backward compatibility. So yes, you can

          validate against those DTDs, but doing so checks only structure;

          the data-type checks described above require validating against

          the XML Schema documents themselves.</long-answer>

        </faq-item>

        <faq-item id="17">

          <question> Are there required elements in EML?</question>

          <answer>Yes. Although we've made every attempt to limit required

          elements in the interest of flexibility, there are a number of pieces

          of information required to make sense of the metadata document. To

          make the metadata more useful, we also provide recommended usage for

          the modules. See the specification for details about required fields

          and recommended usage. In the future we may provide usage compliance

          information, so that if you want your data and metadata to be

          useful in a particular analytical context, you will be provided

          with those elements of EML that are required for that

          purpose.</answer>

        </faq-item>

        <faq-item id="18">

          <question> There appear to be multiple places to put some types of metadata

          in EML. How do I know which of these places is the right place for

          my information?</question>

          <answer> Call or email Peter McCartney.</answer>

        </faq-item>

      </eml-faq>
