<eml-faq version="0.1">
<faq-item id="0">
<question>Why does it rain?</question>
<answer>Because it is wet.</answer>
<long-answer>Because of accumulation of moisture in the
atmosphere. Gravity overcomes levity.</long-answer>
</faq-item>
<faq-item id="1">
<question>What is EML?</question>
<answer>EML stands for
Ecological Metadata Language. It exists as a set of XML Schema
documents that allow for the structural expression of metadata
necessary to document a typical data set in the ecological
sciences.</answer>
</faq-item>
<faq-item id="2">
<question>Who is responsible for EML?</question>
<answer>The first two released versions of EML, EML 1.0 and EML 1.4.1,
were developed at the National Center for Ecological Analysis and
Synthesis (NCEAS), University of California at Santa Barbara, in
Santa Barbara, California, USA. EML 2.0 beta 9 and the EML 2.0
release candidate 1 were developed through community efforts that
involved a number of ecological research projects and
organizations. While the bulk of the work still comes from NCEAS,
the Long Term Ecological Research Program sites and individuals
from a number of other research projects have had significant
input into EML.</answer>
</faq-item>
<faq-item id="3">
<question>Why would I want to use EML when FGDC now supports
biological data through the CSDGM?</question>
<answer>Modularity and extensible structures.</answer>
<long-answer>First, the CSDGM is one huge monolithic standard, and so it is
difficult to mix and match parts of it with other standards --
mainly because of all of the spatial requirements. So, we built
EML as a series of modules that can be linked together and can be
linked to other metadata standards. This gives us the most
flexibility, and given that we can easily translate into
FGDC-compliant documents, there is little cost. Second, we're building
advanced data processing tools that can automatically parse data
sets and analyze them based on the EML metadata descriptions. Due
to various shortcomings in the FGDC standard, mostly oriented
around its tight focus on spatial data, we have found that the
CSDGM isn't adequate for these needs. As a research project, we
are constantly trying to expand the suite of services that metadata
enables, and the FGDC spec isn't accommodating in that regard
(e.g., how can one add machine-parsable, semantically oriented
attribute tags to FGDC? Answer: you can't, because it is
monolithic and doesn't permit dynamic ties to other metadata specs
-- the only extension method is the huge administrative task of
actually creating a superset of the FGDC -- not very maintainable).
In addition, the level of granularity for metadata in FGDC is very
patchy -- it goes into tremendous detail for spatial projections,
etc., but is incredibly terse with respect to describing methods and
non-standard data formats. This is appropriate in the spatial
world, where there are so few data formats (&lt; 100, many
sensor-derived streams), but not so good in ecology, where there is no
standardization of data formats (&gt;&gt;&gt; 5000, very few
sensor-derived).</long-answer>
</faq-item>
61
|
<faq-item id="4">
|
62
|
<question> Is there documentation for EML in English?</question>
|
63
|
<answer>Yes, there is a formal specification of EML describing its
|
64
|
development history, architecture, and modules. The intent of each
|
65
|
module is described in narrative and there is a technical
|
66
|
description of each module in XML notation. Included as part of the
|
67
|
technical description is an element-by-element description of the
|
68
|
module. We will eventually provide examples on usage.</answer>
|
69
|
</faq-item>
|
70
|
<faq-item id="5">
|
71
|
<question> Why is EML such an important development?</question>
|
72
|
<answer> The last decade has witnessed a tremendous explosion of
|
73
|
ecological and environmental data, catalyzed by societal concerns
|
74
|
and facilitated by advancing technologies. These data have the
|
75
|
potential to greatly enhance understanding of the complexity of
|
76
|
the biosphere. However, broad-scale or synthetic research is
|
77
|
stymied because data are largely unorganized and inaccessible as a
|
78
|
consequence of their tremendous heterogeneity, complexity, and
|
79
|
spatial dispersion in many separate repositories. EML is the first
|
80
|
content standard designed specifically to address these issues for
|
81
|
ecological data. Wide adoption and use of EML will create exciting
|
82
|
new opportunities for data discovery, access, integration and
|
83
|
synthesis.</answer>
|
84
|
</faq-item>
|
85
|
<faq-item id="6">
|
86
|
<question> How do I get EML?</question>
|
87
|
<answer> All the documents associated with the EML development effort are
|
88
|
available via the project web server at www.ecoinformatics.org. These
|
89
|
projects are licensed under the GPL (Gnu Public License) agreement and
|
90
|
can be freely distributed and modified. </answer>
|
91
|
</faq-item>
|
92
|
<faq-item id="7">
|
93
|
<question> The EML Schema document is quite complex. An average
|
94
|
ecologist probably cannot and more likely does not want to mark up
|
95
|
content in an XML editor. How then do you get content into
|
96
|
EML?</question>
|
97
|
<answer>The Knowledge Network for Biocomplexity
|
98
|
project has developed a software client specifically to address this
|
99
|
need. Morpho (after the butterfly genus) is written in java (making
|
100
|
portable across computer platforms) combines an easy to use interface
|
101
|
to EML with a number of tools to make it easier for ecologists to
|
102
|
document data. These include a reverse-engineering wizard. Morpho is
|
103
|
available from www.ecoinformatics.org. </answer>
|
104
|
</faq-item>
|
105
|
<faq-item id="8">
|
106
|
<question> EML contains provisions for communication. Is it
|
107
|
possible to document in EML dynamic online data resources?</question>
|
108
|
<answer>Yes, there are provisions in the eml-physical module for
|
109
|
descriptions of online data resources.. The eml-physical module
|
110
|
describes the structural characteristics of data formats as delivered
|
111
|
over the wire or as found in a file system. One physical object (which
|
112
|
can be a bytestream or an object in a file system) might contain
|
113
|
multiple entities (for example, this would be typical in a MS Access
|
114
|
file that contained multiple tables of data). However, it is typically
|
115
|
used to describe a file or stream that is in some text-based format
|
116
|
such as ASCII or UTF-8, and includes the information needed to parse
|
117
|
the data stream to extract the entity and its attributes from the
|
118
|
stream. There are 3 distribution types, online, offline, and
|
119
|
inline. To describe an online dataset in EML you would populate the
|
120
|
online element with the distribution information. </answer>
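<long-answer>As a minimal sketch, an online data file might be described
with a fragment like the following. The element names reflect our
understanding of the eml-physical module and should be checked against
the schema for the EML version in use; the object name and URL are
hypothetical placeholders.

<![CDATA[
<physical>
  <!-- Hypothetical file name, for illustration only -->
  <objectName>streamflow.csv</objectName>
  <dataFormat>
    <textFormat>
      <attributeOrientation>column</attributeOrientation>
      <simpleDelimited>
        <fieldDelimiter>,</fieldDelimiter>
      </simpleDelimited>
    </textFormat>
  </dataFormat>
  <distribution>
    <online>
      <!-- Hypothetical URL, for illustration only -->
      <url>http://example.org/data/streamflow.csv</url>
    </online>
  </distribution>
</physical>
]]>

In this sketch the dataFormat block carries the information needed to
parse the stream, and the online element inside distribution says where
the bytestream can be retrieved.</long-answer>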
</faq-item>
<faq-item id="9">
<question>Do I need to download special client software to use
EML?</question>
<answer>No, but there is software available to work with EML. See FAQ 8.</answer>
</faq-item>
<faq-item id="10">
<question>How can I get my existing metadata into EML?</question>
<answer>There are several approaches that can be used to convert
existing metadata into EML, depending on what form your existing
metadata take.</answer>
<long-answer>
CASE 1: Metadata is currently in a text format (not stored in a database).
CONVERSION METHODS:
1. Write a script (Perl, PHP, Java, etc.) to convert the text into EML-compliant XML.
2. Convert the text metadata into XHTML (HTML that is XML compliant). Write an XSLT script to transform the XHTML file into EML-compliant XML.
3. Use a special-purpose XML editor that generates EML (Morpho or Xylographa) and manually retype the metadata.
4. Use a general-purpose XML development tool such as XML Spy that can create a sample document from an XML Schema, and retype the metadata manually.
5. Use a simple text editor and do everything from scratch.
6. Use specialized data transformation software such as the Data Junction suite to extract text data and then map it into an EML structure.

CASE 2: Metadata is stored in a relational database.
CONVERSION METHODS:
1. Both Microsoft SQL Server and Oracle have utilities to generate XML from their databases. If you use a tool like that, you will have to write an XSLT script to transform the generated XML into EML.
2. Use a vendor-neutral database-to-XML generator such as Cocoon (a free, open source Apache tool). Cocoon can query the database, generate XML, and has a tool for creating the XSL Transformation scripts to convert the first-stage XML output into EML format.
3. Use a specialized tool such as Xanthoria (like Cocoon in many respects, but easier to use) to generate XML from the database. Then use a tool such as XML Spy or Stylus Studio to develop the XSLT script to convert the generated XML into EML-compliant XML.
4. Use specialized data transformation software such as Data Junction to query the database and map it into an EML structure.

CASE 3: Metadata is already in XML but in some other form such as NBII or FGDC.
CONVERSION METHOD:
1. Write an XSLT script to convert from, e.g., FGDC to EML.

NOTE: In each of these cases it may be necessary to add some additional
metadata in order to produce EML-compliant metadata. Morpho will
automatically create EML-compliant metadata either by adding it for
you or by indicating that certain fields are mandatory.
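
EXAMPLE (CASE 3): As a minimal sketch of the XSLT approach, the
stylesheet below copies the title and originators of an FGDC record
into an EML dataset fragment. It assumes the usual FGDC XML layout
(metadata/idinfo/citation/citeinfo), treats every originator as an
organization for simplicity, and omits the EML root element,
namespaces, and all other required fields, so its output is a starting
point rather than valid EML.

<![CDATA[
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Assumed FGDC root element: metadata -->
  <xsl:template match="/metadata">
    <dataset>
      <title>
        <xsl:value-of select="idinfo/citation/citeinfo/title"/>
      </title>
      <!-- Simplification: every FGDC originator becomes an organization -->
      <xsl:for-each select="idinfo/citation/citeinfo/origin">
        <creator>
          <organizationName>
            <xsl:value-of select="."/>
          </organizationName>
        </creator>
      </xsl:for-each>
    </dataset>
  </xsl:template>
</xsl:stylesheet>
]]>

The same pattern, one template per source element mapped onto its EML
counterpart, extends to the remaining fields.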
</long-answer>
</faq-item>
<faq-item id="11">
<question>The challenge of getting my data into EML is not
insurmountable. My question is what do I do with it when I get it
there? If I am storing all my metadata in text-based EML files, how am
I supposed to query them or use them for data management?</question>
<answer>For a site that has no current electronic data management
system and has no immediate intention of developing one, there
are a number of solutions, including the Morpho-Metacat solution. If
you store your metadata in a relational database management system
(RDBMS), or plan to, then there are also solutions. Cocoon and Xanthoria are
examples of programs that can get EML out of an RDBMS. Both are
Java applications that use Java database connection
hooks and style sheets to retrieve and format data. Xanthoria has a
smaller code base, and the XSLT stylesheets for EML 2.0 have already been
written. This solution lets a site stick with the RDBMS that
they probably have integrated with their site management activities,
yet also have their metadata exposed via EML.</answer>
</faq-item>
<faq-item id="12">
<question>Does the modularity of EML mean that one description
can be shared by many documents?</question>
<answer>In a previous version, EML packages (via RDF-style triples)
supported linking across packages, so you could reuse the same
document in multiple packages. In EML 2.0 release candidate 1 we
redesigned the packaging structure to only allow linking within a
single package. Thus, one could reuse a party description or
attribute list within a package, but not across several. This is a
compromise that keeps some reusability but has fewer management
problems. Along with this change comes the ability to put all
metadata and data in a single document for transport -- while
still not limiting ourselves to a monolithic structure. This has
benefits (akin to database normalization) and costs (access control,
ownership, and multiple update problems abound).</answer>
</faq-item>
<faq-item id="13">
<question>How are EML modules linked together?</question>
<answer>With "id" attributes and "references" elements in each module.</answer>
<long-answer>Our general approach in EML has been to create
ComplexTypes (CT) when we wanted a particular block to be
reusable. This concept was extended for linking modules together
by adding an optional attribute named "id" of type "xs:ID" for
each ComplexType. This allows us to uniquely address each block
defined by a CT, and any validating XML parser will check that all of
the "id" values are in fact locally unique. For the
"ResourceBase" CT, this id attribute replaces the "identifier"
element and acts as the overall identifier for the package.

The content model for each CT is a choice between the existing content
model and a new element named "references" of type "xs:string". This
element is used to hold a reference to an existing subtree identified
by its id. We use this element instead of an IDREF to surmount
validation issues. This relationship between the "references" element
and the "id" identifiers is enforced by defining an XML Schema "key"
for the "id" attributes and a "keyref" for the "references" elements.
Thus, any XML parser that supports XML Schema validation will be able
to validate the correspondence between each "id" and "references"
field (e.g., Xerces 2.0 supports this). Here's a fragment of an
example XML doc to illustrate:

<![CDATA[
...
<creator id="p1">
  <individualName><surName>Jones</surName></individualName>
</creator>
<associatedParty>
  <references>p1</references>
  <role>lackey</role>
</associatedParty>
<contact>
  <references>p1</references>
</contact>
...
]]>

This even works for types that extend other types as long as the
subclass is the one that does the referencing (e.g., associatedParty
can reference creator, but not vice versa). This rule will actually
be enforced by validating parsers.

The key and keyref are defined in the eml.xsd module. A package is
defined by all of the content included in the &lt;eml&gt; tag, including the
nested modules like attribute in entity. The nature of the
association is implied by the types of the document (i.e.,
role/predicate/property/relationship is not specified directly). The
reference/id linkage is enforced by defining another "keyref"
constraint. So, this lets us add arbitrary metadata documents and
point them at existing ids in the tree. Thus, the id serves as both
ends of the link (subject and object in RDF terms) depending on
whether it is referred to in a "references" element or in a
"describes" attribute.</long-answer>
</faq-item>
<faq-item id="14">
<question>Can I put data into EML as well as metadata?</question>
<answer>Yes, there are provisions in the eml-physical module for
inclusion of data. The module describes the structural
characteristics of data formats as delivered over the wire or as
found in a file system. One physical object (which can be a
bytestream or an object in a file system) might contain multiple
entities (for example, this would be typical in an MS Access file
that contained multiple tables of data). However, it is typically
used to describe a file or stream that is in some text-based
format such as ASCII or UTF-8, and includes the information needed
to parse the data stream to extract the entity and its attributes
from the stream. There are three distribution types: online, offline,
and inline. To include data in EML you would populate the inline
element with the data file described in the data format
element.</answer>
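<long-answer>As a minimal sketch, a small comma-delimited table might be
embedded as shown below. The element names reflect our understanding of
the eml-physical module and should be checked against the schema for the
EML version in use; the object name and the data values are invented for
illustration.

<![CDATA[
<physical>
  <!-- Hypothetical object name, for illustration only -->
  <objectName>site-temperature.csv</objectName>
  <dataFormat>
    <textFormat>
      <numHeaderLines>1</numHeaderLines>
      <attributeOrientation>column</attributeOrientation>
      <simpleDelimited>
        <fieldDelimiter>,</fieldDelimiter>
      </simpleDelimited>
    </textFormat>
  </dataFormat>
  <distribution>
    <!-- Invented sample data, carried inside the metadata document itself -->
    <inline>site,temp_c
A,12.5
B,14.1</inline>
  </distribution>
</physical>
]]>

In this sketch the dataFormat block tells a parser how to read the text
held in the inline element: one header line, then comma-separated
columns.</long-answer>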
</faq-item>
<faq-item id="15">
<question>What can I do with my EML structured metadata?</question>
<answer>Be very proud that you are limiting data entropy
worldwide.</answer>
</faq-item>
<faq-item id="16">
<question>Can I validate my EML documents against the
DTD?</question>
<answer>Yes and no.</answer>
<long-answer>EML is implemented in an Extensible Markup Language (XML)
vocabulary known as XML Schema, which is a language that defines the rules
that govern the EML syntax. XML Schema is an internet
recommendation from the World Wide Web Consortium
(http://www.w3.org), and so a metadata document that is said to
comply with the syntax of EML will structurally meet the criteria
defined in the XML Schema documents for EML. Over and above the
structure (what elements can be nested within others, how many,
etc.), XML Schema provides the ability to use strong data typing
within elements. This allows for finer validation of the contents
of the element, not just its structure. For instance, an element
may be of type 'date', and so the value that is inserted in the
field will be checked against XML Schema's definition of a
date. Traditionally, XML documents have been validated against
Document Type Definitions (DTDs), which do not provide a means to
employ strong validation on field values through typing. EML is
also distributed with DTDs that are generated from the XML Schema
documents to provide some backward compatibility. Validating against
a DTD therefore checks structure only; checking field values requires
the XML Schema documents.</long-answer>
</faq-item>
<faq-item id="17">
<question>Are there required elements in EML?</question>
<answer>Yes. Although we've made every attempt to limit required
elements in the cause of flexibility, there are a number of pieces
of information required to make sense of the metadata document. To
make the metadata more useful we do have recommended usages on the
modules. See the specification for details about required fields and
recommended usage. In the future we may provide usage compliance
information such that if you want your data and metadata to be
useful in a particular analytical context you will be provided
with those elements of EML that are required for that
purpose.</answer>
</faq-item>
<faq-item id="18">
<question>There appear to be multiple places to put some types of metadata
in EML. How do I know which of these places is the right place for
my information?</question>
<answer>Call or email Peter McCartney.</answer>
</faq-item>
</eml-faq>