<!--
* Ecological Metadata Language (EML) - Data set variable descriptors
*
* Authors: Matt Jones, Zheng Wang, and Noah Goldstein
* Organization: National Center for Ecological Analysis and Synthesis
* For Details: http://www.nceas.ucsb.edu/
* Created: 1997 August 19
* Modified: 1999 June 23
* Version: 1.4
* File Info: '$Id$'
*
* Ecological Metadata Language is a general purpose metadata content
* specification for documenting ecological data. The specification
* consists of a series of modular document type definitions (DTD) that
* provide metadata content descriptors. It describes the owner and
* contents of the dataset (eml-dataset.dtd), the research context in
* which it was created (eml-context.dtd), the structural
* characteristics of data files (eml-file.dtd), the
* characteristics of variables in a file (eml-variable.dtd), current
* status of data and metadata files (eml-status.dtd), access control
* rules regarding the data and metadata (eml-access.dtd), software
* information (eml-software.dtd) and a variety of miscellaneous
* supplemental descriptors (eml-supplement.dtd).
*
* Files generated under the structural constraints of EML are
* plain-text files and therefore are editable in ordinary
* text-processors. However, these DTDs are intended for use within
* general purpose metadata editors, and within a more specific
* metadata editor being developed at NCEAS for the ecological
* community. This metadata editor will provide facilities for
* version control and efficient metadata entry.
* The purpose of this specification was to formalize the
* Michener et al. work in a structured language to examine its
* application to ecological data in a controlled manner.
*
* This specification was based on the work of the Ecological Society
* of America's Committee on the Future of Long Term Data, and more
* specifically on a related paper, Michener et al., 1997. See:
* Michener, William K., et al., 1997. Ecological Applications,
* "Nongeospatial metadata for the ecological sciences"
* Vol 7(1). pp. 330-342.
*
* Where appropriate, we have used elements of the ISO/TC 211 draft
* standard - the ISO Geographic information/Geomatics standard,
* which includes XML code, as well as ISO 8601 schema. Some elements
* in the ISO/TC 211 were expanded to allow for greater
* resolution.
*
* For an explanation of the classes of metadata and elements defined
* below, see Michener et al. 1997. In particular, the numbered comment
* labels found below refer to Table 1 (pp. 336-337) of Michener
* et al. 1997. In addition, each of the principal elements in the
* specification is accompanied by a FIXED attribute called "description"
* that provides a brief description of the content of the element. These
* descriptions are derived from Michener et al. 1997.
*
-->

<!-- * * * *
CLASS IV B - VARIABLE DESCRIPTORS
* * * *
-->

<!-- Class 4 B -->
<!ELEMENT eml-variable (meta_file_id, variable*)>
<!ATTLIST eml-variable description CDATA #FIXED "Variable description for a file">

<!ELEMENT meta_file_id (#PCDATA)>
<!ATTLIST meta_file_id description CDATA #FIXED "Unique identifier of this metadata record">

<!ELEMENT variable (variable_name, variable_definition, unit?, storage_type?,
code_definition* , numeric_range* , missing_value_code*,
precision?, field_format?)>
<!ATTLIST variable description CDATA #FIXED "Variable information">
<!ELEMENT unit (#PCDATA) >
<!ATTLIST unit description CDATA #FIXED "Unit">

<!-- Class 4.B.1 -->
<!ELEMENT variable_name (#PCDATA) >
<!ATTLIST variable_name description CDATA #FIXED "Unique variable name or code">

<!-- Class 4.B.2 -->
<!ELEMENT variable_definition (#PCDATA)>
<!ATTLIST variable_definition description CDATA #FIXED "Precise definition of variables in data set">

<!-- Class 4.B.3 - see 4.A.2 -->

<!-- Class 4.B.4.a -->
<!ELEMENT storage_type (#PCDATA) >
<!ATTLIST storage_type description CDATA #FIXED "Storage type; Integer, floating point, character, string">

<!-- Class 4.B.4.b -->
<!ELEMENT code_definition (code, definition) >
<!ATTLIST code_definition description CDATA #FIXED "Description of any codes associated with variables">
<!ELEMENT code (#PCDATA) >
<!ATTLIST code description CDATA #FIXED "Code">
<!ELEMENT definition (#PCDATA) >
<!ATTLIST definition description CDATA #FIXED "List and definition of variable codes">

<!-- Class 4.B.4.c -->
<!ELEMENT numeric_range (minimum?,maximum?) >
<!ATTLIST numeric_range description CDATA #FIXED "Range for numeric values">
<!ELEMENT minimum (#PCDATA) >
<!ATTLIST minimum description CDATA #FIXED "Minimum value">
<!ELEMENT maximum (#PCDATA) >
<!ATTLIST maximum description CDATA #FIXED "Maximum value">

<!-- Class 4.B.4.d -->
<!ELEMENT missing_value_code (#PCDATA) >
<!ATTLIST missing_value_code description CDATA #FIXED "Character used to represent missing data">

<!-- Class 4.B.4.e -->
<!ELEMENT precision (#PCDATA) >
<!ATTLIST precision description CDATA #FIXED "Precision; number of significant digits">

<!-- Class 4.B.5 -->
<!ELEMENT field_format (variable_width|fixed_width)>
<!ATTLIST field_format description CDATA #FIXED "Data format">

<!--
Data sets are generally classified as fixed_width format or
variable_width format, but we have determined that this is actually a
per-field classification because one may encounter fixed_width fields
mixed together in the same data file with variable_width fields.

In our encoding scheme, the start of each field is assumed to be the
column after the last column of the previous field, or the first column
if this is the first field in the dataset. The end column for each
field is classified using a field_format and some information specific to
each field_format type that indicates in which column the field ends. The
two types of field formats are variable_width and fixed_width.
Variable_width fields can vary in their field length, and the end of the
field is delimited by a special character called a field delimiter,
usually a comma or a tab character. Fixed_width fields have a set
length, and so the end of the field can always be determined by adding
the field_width to the starting column number. Here is an example:

Assume we have the following data in a data set:

May,100aaa,1.2,
April,200aaa,3.4,
June,300bbb,4.6,

The metadata for the 4 fields would include the following:
<variable><variable_name>month</variable_name>
<field_format><variable_width><delimiter>,</delimiter>
</variable_width></field_format></variable>

<variable><variable_name>sitecode</variable_name>
<field_format><fixed_width><field_width>3</field_width>
</fixed_width></field_format></variable>

<variable><variable_name>subsitecode</variable_name>
<field_format><fixed_width><field_width>3</field_width>
</fixed_width></field_format></variable>

<variable><variable_name>response</variable_name>
<field_format><variable_width><delimiter>,</delimiter>
</variable_width></field_format></variable>
-->
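<!--
A minimal sketch of a complete instance document that conforms to the
declarations in this file, reusing the four fields from the example
above. The meta_file_id value, the variable_definition text, and the
storage_type, numeric_range and precision values are illustrative
assumptions and do not come from a real data set. Child elements
appear in the order required by the variable content model.

<eml-variable>
  <meta_file_id>example.1</meta_file_id>
  <variable>
    <variable_name>month</variable_name>
    <variable_definition>Month of observation</variable_definition>
    <storage_type>string</storage_type>
    <field_format><variable_width><delimiter>,</delimiter>
    </variable_width></field_format>
  </variable>
  <variable>
    <variable_name>sitecode</variable_name>
    <variable_definition>Numeric site code</variable_definition>
    <storage_type>integer</storage_type>
    <field_format><fixed_width><field_width>3</field_width>
    </fixed_width></field_format>
  </variable>
  <variable>
    <variable_name>subsitecode</variable_name>
    <variable_definition>Subsite code within a site</variable_definition>
    <storage_type>string</storage_type>
    <field_format><fixed_width><field_width>3</field_width>
    </fixed_width></field_format>
  </variable>
  <variable>
    <variable_name>response</variable_name>
    <variable_definition>Measured response value</variable_definition>
    <storage_type>floating point</storage_type>
    <numeric_range><minimum>1.2</minimum><maximum>4.6</maximum></numeric_range>
    <precision>2</precision>
    <field_format><variable_width><delimiter>,</delimiter>
    </variable_width></field_format>
  </variable>
</eml-variable>
-->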
<!ELEMENT variable_width (delimiter+)>
<!ATTLIST variable_width description CDATA #FIXED "Variable width field">
<!ELEMENT delimiter (#PCDATA)>
<!ATTLIST delimiter description CDATA #FIXED "Character used to delimit end of field">
<!ELEMENT fixed_width (field_width)>
<!ATTLIST fixed_width description CDATA #FIXED "Fixed width field">
<!ELEMENT field_width (#PCDATA)>
<!ATTLIST field_width description CDATA #FIXED "Width of field in characters">

<!-- Class 4.B.5.a - see Class 4.B.5 -->

<!-- Class 4.B.5.b - see Class 4.B.5 -->

<!-- Class 4.B.5.c -->
<!-- This section was removed as we were unsure of its usefulness -->

<!-- End of file -->