Bug #495 » eml-faq.xml

James Brunt, 08/30/2002 10:55 AM

 
<eml-faq version="0.1">
  <faq-item id="0">
    <question>Why does it rain?</question>
    <answer>Because it is wet.</answer>
    <long-answer>Because of the accumulation of moisture in the
    atmosphere. Gravity overcomes levity.</long-answer>
  </faq-item>
  <faq-item id="1">
    <question>What is EML?</question>
    <answer>EML stands for Ecological Metadata Language. It exists as a
    set of XML Schema documents (with DTDs generated from them) that
    allow for the structural expression of the metadata necessary to
    document a typical data set in the ecological sciences.</answer>
  </faq-item>
  <faq-item id="2">
    <question>Who is responsible for EML?</question>
    <answer>The first two released versions of EML, EML 1.0 and EML
    1.4.1, were developed at the National Center for Ecological Analysis
    and Synthesis (NCEAS), University of California at Santa Barbara, in
    Santa Barbara, California, USA. EML 2.0 beta 9 and EML 2.0 release
    candidate 1 were developed through community efforts that involved a
    number of ecological research projects and organizations. While the
    bulk of the work still comes from NCEAS, the Long Term Ecological
    Research Program sites and individuals from a number of other
    research projects have had significant input into EML.</answer>
  </faq-item>
  <faq-item id="3">
    <question>Why would I want to use EML when FGDC now supports
    biological data through the CSDGM?</question>
    <answer>Modularity and extensible structures.</answer>
    <long-answer>The CSDGM is one huge monolithic standard, so it is
    difficult to mix and match parts of it with other standards, mainly
    because of all of its spatial requirements. So we built EML as a
    series of modules that can be linked together and can be linked to
    other metadata standards. This gives us the most flexibility, and
    given that we can easily translate EML into FGDC-compliant
    documents, there is little cost. Second, we are building advanced
    data processing tools that can automatically parse data sets and
    analyze them based on their EML metadata descriptions. Because of
    various shortcomings in the FGDC standard, mostly stemming from its
    tight focus on spatial data, we have found that the CSDGM is not
    adequate for these needs. As a research project, we are constantly
    trying to expand the suite of services that metadata enables, and
    the FGDC specification is not accommodating in that regard. For
    example, how can one add machine-parsable, semantically oriented
    attribute tags to FGDC? The answer is that you can't, because it is
    monolithic and doesn't permit dynamic ties to other metadata
    specifications; the only extension method is the huge
    administrative task of actually creating a superset of the FGDC,
    which is not very maintainable. In addition, the level of
    granularity for metadata in FGDC is very patchy: it goes into
    tremendous detail for spatial projections and the like, but is
    incredibly terse with respect to describing methods and
    non-standard data formats. This is appropriate in the spatial world,
    where there are so few data formats (fewer than 100, many of them
    sensor-derived streams), but not so good in ecology, where there is
    no standardization of data formats (far more than 5000, very few of
    them sensor-derived).</long-answer>
  </faq-item>
  <faq-item id="4">
    <question>Is there documentation for EML in English?</question>
    <answer>Yes, there is a formal specification of EML describing its
    development history, architecture, and modules. The intent of each
    module is described in narrative form, and there is a technical
    description of each module in XML notation. Included as part of the
    technical description is an element-by-element description of the
    module. We will eventually provide examples of usage.</answer>
  </faq-item>
  <faq-item id="5">
    <question>Why is EML such an important development?</question>
    <answer>The last decade has witnessed a tremendous explosion of
    ecological and environmental data, catalyzed by societal concerns
    and facilitated by advancing technologies. These data have the
    potential to greatly enhance understanding of the complexity of
    the biosphere. However, broad-scale or synthetic research is
    stymied because data are largely unorganized and inaccessible as a
    consequence of their tremendous heterogeneity, complexity, and
    spatial dispersion in many separate repositories. EML is the first
    content standard designed specifically to address these issues for
    ecological data. Wide adoption and use of EML will create exciting
    new opportunities for data discovery, access, integration and
    synthesis.</answer>
  </faq-item>
  <faq-item id="6">
    <question>How do I get EML?</question>
    <answer>All the documents associated with the EML development effort
    are available via the project web server at www.ecoinformatics.org.
    They are licensed under the GNU General Public License (GPL) and can
    be freely distributed and modified.</answer>
  </faq-item>
  <faq-item id="7">
    <question>The EML Schema document is quite complex. An average
    ecologist probably cannot, and more likely does not want to, mark up
    content in an XML editor. How then do you get content into
    EML?</question>
    <answer>The Knowledge Network for Biocomplexity project has developed
    a software client specifically to address this need. Morpho (named
    after the butterfly genus) is written in Java, which makes it
    portable across computer platforms, and it combines an easy-to-use
    interface to EML with a number of tools that make it easier for
    ecologists to document data. These include a reverse-engineering
    wizard. Morpho is available from www.ecoinformatics.org.</answer>
  </faq-item>
  <faq-item id="8">
    <question>EML contains provisions for communication. Is it possible
    to document dynamic online data resources in EML?</question>
    <answer>Yes, there are provisions in the eml-physical module for
    descriptions of online data resources. The eml-physical module
    describes the structural characteristics of data formats as
    delivered over the wire or as found in a file system. One physical
    object (which can be a bytestream or an object in a file system)
    might contain multiple entities (for example, this would be typical
    of an MS Access file that contained multiple tables of data).
    However, it is typically used to describe a file or stream that is
    in some text-based format such as ASCII or UTF-8, and it includes
    the information needed to parse the data stream to extract the
    entity and its attributes from the stream. There are three
    distribution types: online, offline, and inline. To describe an
    online dataset in EML, you would populate the online element with
    the distribution information.</answer>
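    <long-answer>A minimal sketch of an online distribution inside an
    EML 2.0 physical description follows. The object name, the URL, and
    the elided dataFormat details are hypothetical placeholders, and the
    element names are an assumption to check against the eml-physical
    schema for the EML version you are using:

    <physical>
      <!-- hypothetical object name and URL -->
      <objectName>site_temperature.csv</objectName>
      <dataFormat>
        ...
      </dataFormat>
      <distribution>
        <online>
          <url>http://www.example.org/data/site_temperature.csv</url>
        </online>
      </distribution>
    </physical>

Because the distribution points at a URL rather than embedding the
data, the same description can keep describing the resource as its
contents change, provided the format described in dataFormat stays
the same.</long-answer>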
  </faq-item>
  <faq-item id="9">
    <question>Do I need to download special client software to use
    EML?</question>
    <answer>No, but there is software available to work with EML. See
    the question above on getting content into EML, which describes
    Morpho.</answer>
  </faq-item>
  <faq-item id="10">
    <question>How can I get my existing metadata into EML?</question>
    <answer>There are several approaches that can be used to convert
    existing metadata into EML, depending on what form your existing
    metadata take.</answer>
    <long-answer>
CASE 1: Metadata is currently in a text format (not stored in a database).
CONVERSION METHODS:
        1. Write a script (Perl, PHP, Java, etc.) to convert the text into EML-compliant XML.
        2. Convert the text metadata into XHTML (HTML that is XML-compliant). Write an XSLT script to transform the XHTML file into EML-compliant XML.
        3. Use a special-purpose XML editor that generates EML (Morpho or Xylographa) and manually retype the metadata.
        4. Use a general-purpose XML development tool such as XML Spy that can create a sample document from an XML Schema, and retype the metadata manually.
        5. Use a simple text editor and do everything from scratch.
        6. Use specialized data transformation software such as the Data Junction suite to extract text data and then map it into an EML structure.

CASE 2: Metadata is stored in a relational database.
CONVERSION METHODS:
        1. Both Microsoft SQL Server and Oracle have utilities to generate XML from their databases. If you use a tool like that, you will then have to write an XSLT script to transform the generated XML into EML.
        2. Use a vendor-neutral database-to-XML generator such as Cocoon (a free, open source Apache tool). Cocoon can query the database and generate XML, and it has a tool for creating the XSLT scripts that convert this first-stage XML output into EML format.
        3. Use a specialized tool such as Xanthoria (similar to Cocoon in many respects, but easier to use) to generate XML from the database. Then use a tool such as XML Spy or Stylus Studio to develop the XSLT script that converts the generated XML into EML-compliant XML.
        4. Use specialized data transformation software such as the Data Junction suite to query the database and map the results into an EML structure.

CASE 3: Metadata is already in XML, but in some other form such as NBII or FGDC.
CONVERSION METHOD:
        1. Write an XSLT script to convert from, e.g., FGDC to EML.
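
A minimal XSLT 1.0 sketch of the CASE 3 approach follows. It copies
only the title, originators, and abstract of an FGDC CSDGM record into
an EML 2.0 dataset. The FGDC paths (metadata/idinfo/citation/citeinfo,
metadata/idinfo/descript/abstract), the EML namespace URI, and the
packageId and system values are assumptions to check against the
versions you use. Mapping every FGDC originator to an EML
organizationName is a simplification, and the output still lacks other
metadata (a contact, for example), so it would not yet be complete EML:

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:eml="eml://ecoinformatics.org/eml-2.0.0">
      <xsl:output method="xml" indent="yes"/>
      <!-- assumed FGDC root element: metadata -->
      <xsl:template match="/metadata">
        <!-- hypothetical packageId and system values -->
        <eml:eml packageId="example.1.1" system="example">
          <dataset>
            <title><xsl:value-of select="idinfo/citation/citeinfo/title"/></title>
            <!-- one EML creator per FGDC originator -->
            <xsl:for-each select="idinfo/citation/citeinfo/origin">
              <creator>
                <organizationName><xsl:value-of select="."/></organizationName>
              </creator>
            </xsl:for-each>
            <abstract>
              <para><xsl:value-of select="idinfo/descript/abstract"/></para>
            </abstract>
          </dataset>
        </eml:eml>
      </xsl:template>
    </xsl:stylesheet>

The same pattern (match the source root, emit EML literal result
elements, pull values with xsl:value-of) applies to the XSLT steps
listed under CASE 1 and CASE 2; only the source paths change.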
152

    
153
NOTE: In each of the cases it may be necessary to add some additional
154
metadata in order to produce EML compliant metadata. Morpho will
155
automatically create EML compliant metadata either by adding it for
156
you or indicating that certain fields are mandatory.
157
    <long-answer>
158
  </faq-item>
159
  <faq-item id="11">
    <question>The challenge of getting my data into EML is not
    insurmountable. My question is what do I do with it when I get it
    there? If I am storing all my metadata in text-based EML files, how am
    I supposed to query them or use them for data management?</question>
    <answer>For a site that has no current electronic data management
    system and no immediate intention of developing one, there are a
    number of solutions, including the Morpho-Metacat solution. If you
    store your metadata in a relational database management system
    (RDBMS), or plan to, there are also solutions. Cocoon and Xanthoria
    are examples of programs that can get EML out of an RDBMS. Both are
    Java applications that use Java Database Connectivity (JDBC) hooks
    and stylesheets to retrieve and format data. Xanthoria has a smaller
    code base, and the XSLT stylesheets for EML 2.0 have already been
    written. This solution lets a site stick with the RDBMS that it has
    probably integrated with its site management activities, yet also
    have its metadata exposed via EML.</answer>
  </faq-item>
  <faq-item id="12">
    <question>Does the modularity of EML mean that one description
    can be shared by many documents?</question>
    <answer>In a previous version, EML packages (via RDF-style triples)
    supported linking across packages, so you could reuse the same
    document in multiple packages. In EML 2.0 release candidate 1 we
    redesigned the packaging structure to allow linking only within a
    single package. Thus, one could reuse a party description or
    attribute list within a package, but not across several. This is a
    compromise that keeps some reusability but has fewer management
    problems. Along with this change comes the ability to put all
    metadata and data in a single document for transport, while still
    not limiting ourselves to a monolithic structure. This has benefits
    (akin to database normalization) and costs (access control,
    ownership, and multiple-update problems abound).</answer>
  </faq-item>
  <faq-item id="13">
    <question>How are EML modules linked together?</question>
    <answer>With "id" attributes and "references" elements in each
    module.</answer>
    <long-answer>Our general approach in EML has been to create
    ComplexTypes (CTs) when we wanted a particular block to be
    reusable. This concept was extended for linking modules together
    by adding an optional attribute named "id" of type "xs:ID" to
    each ComplexType. This allows us to uniquely address each block
    defined by a CT, and a validating parser will check that all of
    the "id" values are in fact locally unique. For the
    "ResourceBase" CT, this id attribute replaces the "identifier"
    element and acts as the overall identifier for the package.

The content model for each CT is a choice between the existing content
model and a new element named "references" of type "xs:string". This
element is used to hold a reference to an existing subtree identified
by its id. We use this element instead of an IDREF to get around
validation issues. The relationship between the "references" elements
and the "id" identifiers is enforced by defining an XML Schema "key"
for the "id" attributes and a "keyref" for the "references" elements.
Thus, any XML parser that supports XML Schema validation (e.g., Xerces
2.0) will be able to validate the correspondence between each "id" and
"references" field. Here is a fragment of an example XML document to
illustrate:

    ...
    <creator id="p1">
      <individualName><surName>Jones</surName></individualName>
    </creator>
    <associatedParty>
      <references>p1</references>
      <role>lackey</role>
    </associatedParty>
    <contact>
      <references>p1</references>
    </contact>
    ...

This even works for types that extend other types, as long as the
subclass is the one that does the referencing (e.g., associatedParty
can reference creator, but not vice versa). This rule will actually
be enforced by validating parsers.
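
The key and keyref described above could be sketched in XML Schema
roughly as follows. This is an illustrative fragment, not a copy of
eml.xsd, and the constraint names are hypothetical. It uses xs:unique
rather than xs:key for the "id" values, because xs:key requires every
selected element to carry an "id", whereas xs:unique simply skips
elements that have none; a keyref may refer to either. It also assumes
the unprefixed refer value resolves to the schema's own constraint
name (for instance, when the target namespace is the default
namespace in scope):

    <xs:element name="eml" xmlns:xs="http://www.w3.org/2001/XMLSchema">
      ...
      <!-- hypothetical name: every "id" attribute in the package must be unique -->
      <xs:unique name="idValues">
        <xs:selector xpath=".//*"/>
        <xs:field xpath="@id"/>
      </xs:unique>
      <!-- hypothetical name: every "references" value must match some "id" -->
      <xs:keyref name="referencesToIds" refer="idValues">
        <xs:selector xpath=".//references"/>
        <xs:field xpath="."/>
      </xs:keyref>
    </xs:element>

With constraints like these, a "references" element containing p2 in
the fragment above would be rejected by a schema-validating parser,
because no element in the package carries id="p2".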

The key and keyref are defined in the eml.xsd module. A package is
defined by all of the content included in the &lt;eml&gt; element,
including nested modules such as attribute within entity. The nature
of the association is implied by the types of the documents (i.e.,
the role/predicate/property/relationship is not specified directly).
The reference/id linkage is enforced by defining another "keyref"
constraint. This lets us add arbitrary metadata documents and point
them at existing ids in the tree. Thus, the id serves as either end
of the link (subject or object in RDF terms), depending on whether it
is referred to in a "references" element or in a "describes"
attribute.</long-answer>
  </faq-item>
  <faq-item id="14">
    <question>Can I put data into EML as well as metadata?</question>
    <answer>Yes, there are provisions in the eml-physical module for
    inclusion of data. The module describes the structural
    characteristics of data formats as delivered over the wire or as
    found in a file system. One physical object (which can be a
    bytestream or an object in a file system) might contain multiple
    entities (for example, this would be typical of an MS Access file
    that contained multiple tables of data). However, it is typically
    used to describe a file or stream that is in some text-based
    format such as ASCII or UTF-8, and it includes the information
    needed to parse the data stream to extract the entity and its
    attributes from the stream. There are three distribution types:
    online, offline, and inline. To include data in EML, you would
    populate the inline element with the data file described in the
    data format element.</answer>
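    <long-answer>A minimal sketch of inline data inside an EML 2.0
    physical description follows. The object name, column names, and
    values are hypothetical, and the elided dataFormat element is where
    the comma-separated layout would be described so that the inline
    text can be parsed; check the element names against the
    eml-physical schema for your EML version:

    <physical>
      <!-- hypothetical object name and data -->
      <objectName>plot_counts.csv</objectName>
      <dataFormat>
        ...
      </dataFormat>
      <distribution>
        <inline>plot,count
1,12
2,7</inline>
      </distribution>
    </physical>

Whitespace inside the inline element is part of the data, which is
why the records are not indented.</long-answer>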
  </faq-item>
  <faq-item id="15">
    <question>What can I do with my EML-structured metadata?</question>
    <answer>Be very proud that you are limiting data entropy
    worldwide.</answer>
  </faq-item>
  <faq-item id="16">
    <question>Can I validate my EML documents against the
    DTD?</question>
    <answer>Yes and no.</answer>
    <long-answer>EML is defined using XML Schema, an XML-based language
    for defining the rules that govern the syntax of a class of
    documents, in this case the EML syntax. XML Schema is a
    Recommendation of the World Wide Web Consortium (http://www.w3.org),
    so a metadata document that is said to comply with the syntax of
    EML will structurally meet the criteria defined in the XML Schema
    documents for EML. Over and above the structure (which elements can
    be nested within others, how many, etc.), XML Schema provides the
    ability to use strong data typing within elements. This allows for
    finer validation of the contents of an element, not just its
    structure. For instance, an element may be of type 'date', and so
    the value that is inserted in the field will be checked against XML
    Schema's definition of a date. Traditionally, XML documents have
    been validated against Document Type Definitions (DTDs), which do
    not provide a means to apply strong validation to field values
    through typing. EML is also distributed with DTDs that are generated
    from the XML Schema documents to provide some backward
    compatibility, so validating against a DTD checks a document's
    structure but not these typed values.</long-answer>
  </faq-item>
  <faq-item id="17">
    <question>Are there required elements in EML?</question>
    <answer>Yes. Although we've made every attempt to limit required
    elements in the interest of flexibility, there are a number of
    pieces of information required to make sense of a metadata
    document. To make the metadata more useful, we also provide
    recommended usages for the modules. See the specification for
    details about required fields and recommended usage. In the future
    we may provide usage compliance information, so that if you want
    your data and metadata to be useful in a particular analytical
    context, you will be given the elements of EML that are required
    for that purpose.</answer>
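    <long-answer>As a rough illustration only, here is a sketch of a
    small EML 2.0 dataset description. It assumes that a dataset needs
    at least a title, a creator, and a contact, and that the root
    element carries packageId and system attributes; the specification,
    not this sketch, is the authority on which fields are required. The
    namespace URI, identifiers, and names are hypothetical placeholders:

    <eml:eml packageId="example.1.1" system="example"
        xmlns:eml="eml://ecoinformatics.org/eml-2.0.0">
      <!-- hypothetical identifiers, title, and names -->
      <dataset>
        <title>Example plot counts</title>
        <creator id="p1">
          <individualName><surName>Jones</surName></individualName>
        </creator>
        <contact>
          <references>p1</references>
        </contact>
      </dataset>
    </eml:eml>

Here the contact reuses the creator through the id/references linking
mechanism rather than repeating the party description.</long-answer>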
306
  </faq-item>
307
  <faq-item id="18">
308
    <question> There appear to be multiple places to put some types of metadata
309
    in EML. How do I know which of these places is the right place for
310
    my information?</question>
311
    <answer> Call or email Peter McCartney.</answer>
312
  </faq-item>
313
</faq>
(3-3/3)