Why does it rain?
Because it is wet. Because of the accumulation of moisture in the atmosphere. Gravity overcomes levity.

What is EML?
EML stands for Ecological Metadata Language. It exists as a set of XML Schema documents that allow for the structural expression of the metadata necessary to document a typical data set in the ecological sciences.

Who is responsible for EML?
The first two released versions of EML, EML 1.0 and EML 1.4.1, were developed at the National Center for Ecological Analysis and Synthesis (NCEAS), University of California at Santa Barbara, in Santa Barbara, California, USA. EML 2.0 beta 9 and EML 2.0 release candidate 1 were developed through community efforts involving a number of ecological research projects and organizations. While the bulk of the work still comes from NCEAS, the Long Term Ecological Research Program sites and individuals from a number of other research projects have had significant input into EML.

Why would I want to use EML when FGDC now supports biological data through the CSDGM?
First, modularity and extensible structures. The CSDGM is one huge monolithic standard, so it is difficult to mix and match parts of it with other standards, mainly because of all of its spatial requirements. So we built EML as a series of modules that can be linked together and linked to other metadata standards. This gives us the most flexibility, and given that we can easily translate EML into FGDC-compliant documents, there is little cost. Second, we're building advanced data processing tools that can automatically parse data sets and analyze them based on their EML metadata descriptions. Because of various shortcomings in the FGDC standard, mostly stemming from its tight focus on spatial data, we have found that the CSDGM isn't adequate for these needs.
As a research project, we are constantly trying to expand the suite of services that metadata enables, and the FGDC specification doesn't accommodate this well. For example, how can one add machine-parsable, semantically oriented attribute tags to FGDC? You can't, because the standard is monolithic and doesn't permit dynamic ties to other metadata specifications; the only extension method is the huge administrative task of creating a superset of the FGDC, which is not very maintainable. In addition, the granularity of metadata in FGDC is very patchy: it goes into tremendous detail for spatial projections and the like, but is incredibly terse when describing methods and non-standard data formats. This is appropriate in the spatial world, where there are few data formats (fewer than 100, many of them sensor-derived streams), but not in ecology, where data formats are not standardized at all (far more than 5000, very few of them sensor-derived).

Is there documentation for EML in English?
Yes. There is a formal specification of EML describing its development history, architecture, and modules. The intent of each module is described in narrative, and each module also has a technical description in XML notation that includes an element-by-element description of the module. We will eventually provide usage examples.

Why is EML such an important development?
The last decade has witnessed a tremendous explosion of ecological and environmental data, catalyzed by societal concerns and facilitated by advancing technologies. These data have the potential to greatly enhance our understanding of the complexity of the biosphere. However, broad-scale or synthetic research is stymied because the data are largely unorganized and inaccessible, a consequence of their tremendous heterogeneity, their complexity, and their spatial dispersion across many separate repositories. EML is the first content standard designed specifically to address these issues for ecological data.
Wide adoption and use of EML will create exciting new opportunities for data discovery, access, integration, and synthesis.

How do I get EML?
All the documents associated with the EML development effort are available from the project web server at www.ecoinformatics.org. These projects are licensed under the GNU General Public License (GPL) and can be freely distributed and modified.

The EML Schema document is quite complex. An average ecologist probably cannot, and more likely does not want to, mark up content in an XML editor. How then do you get content into EML?
The Knowledge Network for Biocomplexity project has developed a software client specifically to address this need. Morpho (named after the butterfly genus) is written in Java, which makes it portable across computer platforms, and combines an easy-to-use interface to EML with a number of tools, including a reverse-engineering wizard, that make it easier for ecologists to document data. Morpho is available from www.ecoinformatics.org.

EML contains provisions for communication. Is it possible to document dynamic online data resources in EML?
Yes, the eml-physical module has provisions for describing online data resources. The eml-physical module describes the structural characteristics of data formats as delivered over the wire or as found in a file system. One physical object (which can be a bytestream or an object in a file system) might contain multiple entities; this would be typical, for example, of an MS Access file containing several tables of data. However, the module is typically used to describe a file or stream in a text-based format such as ASCII or UTF-8, and it includes the information needed to parse the data stream and extract the entity and its attributes from it. There are three distribution types: online, offline, and inline. To describe an online data set in EML, you would populate the online element with the distribution information.
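To make the online case concrete, here is a minimal Python sketch (standard library only) that builds an eml-physical fragment whose distribution is online. The element names physical, objectName, distribution, online, and url are modeled on later EML 2.x releases and are assumptions here; check them against the schema version you target. The file name and URL are hypothetical, and a complete physical description would also need a data format description, which the sketch omits.

```python
import xml.etree.ElementTree as ET


def online_distribution(object_name: str, url: str) -> ET.Element:
    """Build a minimal physical fragment whose distribution is online.

    Element names are assumptions modeled on later EML 2.x releases.
    The required data format description is omitted to keep this short.
    """
    physical = ET.Element("physical")
    ET.SubElement(physical, "objectName").text = object_name
    distribution = ET.SubElement(physical, "distribution")
    # A distribution holds one of the three types: online, offline, or inline.
    online = ET.SubElement(distribution, "online")
    ET.SubElement(online, "url").text = url
    return physical


if __name__ == "__main__":
    # Hypothetical file name and URL, for illustration only.
    fragment = online_distribution("plots.csv", "http://example.org/data/plots.csv")
    print(ET.tostring(fragment, encoding="unicode"))
```

The same function shape would work for the inline case by swapping the online element for an inline element that carries the data itself.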
Do I need to download special client software to use EML?
No, but there is software available to work with EML. See FAQ 8.

How can I get my existing metadata into EML?
There are several approaches to converting existing metadata into EML, depending on what form your existing metadata take.

CASE 1: Metadata is currently in a text format (not stored in a database).
CONVERSION METHODS:
1. Write a script (Perl, PHP, Java, etc.) to convert the text into EML-compliant XML.
2. Convert the text metadata into XHTML (HTML that is XML compliant), then write an XSLT script to transform the XHTML file into EML-compliant XML.
3. Use a special-purpose XML editor that generates EML (Morpho or Xylographa) and manually retype the metadata.
4. Use a general-purpose XML development tool such as XML Spy, which can create a sample document from an XML Schema, and retype the metadata manually.
5. Use a simple text editor and do everything from scratch.
6. Use specialized data transformation software such as the Data Junction suite to extract the text data and then map it into an EML structure.

CASE 2: Metadata is stored in a relational database.
CONVERSION METHODS:
1. Both Microsoft SQL Server and Oracle have utilities to generate XML from their databases. If you use such a tool, you will then have to write an XSLT script to transform the generated XML into EML.
2. Use a vendor-neutral database-to-XML generator such as Cocoon (a free, open source Apache tool). Cocoon can query the database, generate XML, and has a tool for creating the XSLT scripts that convert the first-stage XML output into EML format.
3. Use a specialized tool such as Xanthoria (similar to Cocoon in many respects, but easier to use) to generate XML from the database. Then use a tool such as XML Spy or Stylus Studio to develop the XSLT script that converts the generated XML into EML-compliant XML.
4.
Use specialized data transformation software such as Data Junction to query the database and map the results into an EML structure.

CASE 3: Metadata is already in XML, but in some other form such as NBII or FGDC.
CONVERSION METHOD:
1. Write an XSLT script to convert from, e.g., FGDC to EML.

NOTE: In each of these cases it may be necessary to add some metadata in order to produce EML-compliant metadata. Morpho will automatically create EML-compliant metadata, either by adding the missing content for you or by indicating that certain fields are mandatory.

The challenge of getting my data into EML is not insurmountable. My question is: what do I do with it once it is there? If I am storing all my metadata in text-based EML files, how am I supposed to query them or use them for data management?
For a site that has no current electronic data management system and no immediate intention of developing one, there are a number of solutions, including the Morpho-Metacat solution. If you store your metadata in a relational database management system (RDBMS), or plan to, there are also solutions. Cocoon and Xanthoria are examples of programs that can get EML out of an RDBMS. Both are Java applications that use JDBC (Java Database Connectivity) hooks and stylesheets to retrieve and format data. Xanthoria has a smaller code base, and the XSLT stylesheets for EML 2.0 have already been written. This solution lets a site keep the RDBMS it has probably integrated with its site management activities while also exposing its metadata as EML.

Does the modularity of EML mean that one description can be shared by many documents?
In a previous version, EML packages (via RDF-style triples) supported linking across packages, so you could reuse the same document in multiple packages. In EML 2.0 release candidate 1 we redesigned the packaging structure to allow linking only within a single package.
Thus, one can reuse a party description or attribute list within a package, but not across several packages. This is a compromise that keeps some reusability but has fewer management problems. Along with this change comes the ability to put all metadata and data in a single document for transport, while still not limiting ourselves to a monolithic structure. Linking of this kind has benefits (akin to database normalization) and costs (access control, ownership, and multiple-update problems abound).

How are EML modules linked together?
With "id" attributes and "references" elements in each module. Our general approach in EML has been to create complex types (CTs) wherever we wanted a particular block to be reusable. We extended this concept to link modules together by adding an optional attribute named "id" of type "xs:ID" to each CT. This allows us to uniquely address each block defined by a CT, and a validating parser will check that all of the "id" values are in fact unique within the document. For the "ResourceBase" CT, this id attribute replaces the "identifier" element and acts as the overall identifier for the package. The content model for each CT is a choice between the existing content model and a new element named "references" of type "xs:string". This element holds a reference to an existing subtree identified by its id. We use this element instead of an IDREF to get around validation issues. The relationship between the "references" elements and the "id" values is enforced by defining an XML Schema "key" for the "id" attributes and a "keyref" for the "references" elements. Thus, any XML parser that supports XML Schema validation (Xerces 2.0, for example) can validate the correspondence between each "id" and "references" value. Here's a fragment of an example XML document to illustrate: ... Jones p1 lackey p1 ...
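The key/keyref rule described above is normally enforced by a schema-validating parser reading eml.xsd. As an illustration only, the following Python sketch applies the same two checks by hand to a small hypothetical document: every id value must be unique, and every references element must name an existing id. The element names in the sample document and the helper name check_id_references are illustrative assumptions, not part of EML.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one party defined with an id, two blocks referring to it.
# The second reference (p9) deliberately points at an id that does not exist.
DOC = """<dataset>
  <creator id="p1">
    <individualName><surName>Jones</surName></individualName>
  </creator>
  <associatedParty>
    <references>p1</references>
  </associatedParty>
  <contact>
    <references>p9</references>
  </contact>
</dataset>"""


def check_id_references(root: ET.Element) -> list[str]:
    """Mimic the key/keyref rule: ids are unique and every reference resolves.

    Returns a list of problem descriptions; an empty list means the check passed.
    """
    problems = []
    seen = set()
    # Pass 1 (the "key"): collect id attributes and flag duplicates.
    for elem in root.iter():
        ident = elem.get("id")
        if ident is None:
            continue
        if ident in seen:
            problems.append(f"duplicate id: {ident}")
        seen.add(ident)
    # Pass 2 (the "keyref"): every references element must name a known id.
    for ref in root.iter("references"):
        target = (ref.text or "").strip()
        if target not in seen:
            problems.append(f"unresolved reference: {target}")
    return problems


if __name__ == "__main__":
    print(check_id_references(ET.fromstring(DOC)))
```

In practice a schema-aware parser such as Xerces 2.0 performs this check as part of validation; a hand-rolled version like this is mainly useful for quick diagnostics.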
This even works for types that extend other types, as long as the subtype is the one doing the referencing (e.g., associatedParty can reference creator, but not vice versa). Validating parsers will actually enforce this rule. The key and keyref are defined in the eml.xsd module. A package is defined by all of the content included in the tag, including nested modules such as attribute within entity. The nature of the association is implied by the types of the documents (i.e., the role, predicate, property, or relationship is not specified directly). The reference/id linkage is enforced by defining another "keyref" constraint. This lets us add arbitrary metadata documents and point them at existing ids in the tree. Thus, the id serves as either end of the link (subject or object, in RDF terms), depending on whether it is referred to in a "references" element or in a "describes" attribute.

Can I put data into EML as well as metadata?
Yes, the eml-physical module has provisions for including data. The module describes the structural characteristics of data formats as delivered over the wire or as found in a file system. One physical object (which can be a bytestream or an object in a file system) might contain multiple entities; this would be typical, for example, of an MS Access file containing several tables of data. However, the module is typically used to describe a file or stream in a text-based format such as ASCII or UTF-8, and it includes the information needed to parse the data stream and extract the entity and its attributes from it. There are three distribution types: online, offline, and inline. To include data in EML, you would populate the inline element with the data file described in the data format element.

What can I do with my EML-structured metadata?
Be very proud that you are limiting data entropy worldwide.

Can I validate my EML documents against the DTD?
Yes and no. EML is implemented in XML Schema, a language written in the Extensible Markup Language (XML) that defines the rules governing the EML syntax. XML Schema is a recommendation of the World Wide Web Consortium (http://www.w3.org), so a metadata document that complies with the EML syntax structurally meets the criteria defined in the XML Schema documents for EML. Beyond structure (which elements can be nested within others, how many, etc.), XML Schema provides strong data typing within elements. This allows finer validation of an element's contents, not just its structure. For instance, an element may be of type 'date', in which case the value entered in the field is checked against XML Schema's definition of a date. Traditionally, XML documents have been validated against Document Type Definitions (DTDs), which provide no means of strongly validating field values through typing. EML is also distributed with DTDs generated from the XML Schema documents to provide some backward compatibility; validating against these DTDs checks a document's structure, but not the typed content of its fields.

Are there required elements in EML?
Yes. Although we've made every attempt to limit required elements for the sake of flexibility, a number of pieces of information are required to make sense of a metadata document. To make the metadata more useful, we also provide recommended usages for the modules. See the specification for details about required fields and recommended usage. In the future we may provide usage-compliance information, so that if you want your data and metadata to be useful in a particular analytical context, you will be told which EML elements are required for that purpose.

There appear to be multiple places to put some types of metadata in EML. How do I know which of these places is the right place for my information?
Call or email Peter McCartney.
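Returning to the validation question: the Python sketch below approximates what a schema validator checks for an element of type 'date' (xs:date). A DTD can only declare such an element as character data, so it would accept every test string shown. This is a simplified, hypothetical helper (the name is_xs_date is an assumption, not an EML or library function): it accepts only four-digit years with an optional timezone and does not range-check the timezone, so use a real schema-validating parser for actual EML documents.

```python
import re
from datetime import date

# Simplified lexical pattern for xs:date: YYYY-MM-DD plus an optional
# timezone (Z, +hh:mm or -hh:mm). The full xs:date type also allows negative
# and 5+ digit years; this sketch deliberately ignores those cases.
_XS_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})(Z|[+-]\d{2}:\d{2})?")


def is_xs_date(value: str) -> bool:
    """Approximate the check a schema validator applies to an xs:date field."""
    match = _XS_DATE.fullmatch(value)
    if match is None:
        return False  # wrong shape, e.g. free-text dates
    year, month, day = (int(g) for g in match.groups()[:3])
    try:
        date(year, month, day)  # rejects impossible dates such as 2001-02-30
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    for candidate in ["2001-06-15", "2001-06-15Z", "2001-02-30", "June 15, 2001"]:
        print(candidate, is_xs_date(candidate))
```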