<!--
 * Ecological Metadata Language (EML) - Data set variable descriptors
 *
 * Authors: Matt Jones, Zheng Wang, and Noah Goldstein
 * Organization: National Center for Ecological Analysis and Synthesis
 * For Details: http://www.nceas.ucsb.edu/
 * Created: 1997 August 19
 * Modified: 1999 June 23
 * Version: 1.4
 * File Info: '$Id: eml-variable.dtd 808 2001-07-25 15:57:31Z berkley $'
 *
 * Ecological Metadata Language is a general purpose metadata content
 * specification for documenting ecological data. The specification
 * consists of a series of modular document type definitions (DTD) that
 * provide metadata content descriptors. It describes the owner and
 * contents of the dataset (eml-dataset.dtd), the research context in
 * which it was created (eml-context.dtd), the structural
 * characteristics of data files (eml-file.dtd), the
 * characteristics of variables in a file (eml-variable.dtd), current
 * status of data and metadata files (eml-status.dtd), access control
 * rules regarding the data and metadata (eml-access.dtd), software
 * information (eml-software.dtd) and a variety of miscellaneous
 * supplemental descriptors (eml-supplement.dtd).
 *
 * Files generated under the structural constraints of EML are
 * plain-text files and therefore are editable in ordinary
 * text-processors. However, these DTDs are intended for use within
 * general purpose metadata editors, and within a more specific
 * metadata editor being developed at NCEAS for the ecological
 * community. This metadata editor will provide facilities for
 * version control and efficient metadata entry.
 * The purpose of this specification was to formalize the
 * Michener et al. work in a structured language to examine its
 * application to ecological data in a controlled manner.
 *
 * This specification was based on the work of the Ecological Society
 * of America's Committee on the Future of Long Term Data, and more
 * specifically on a related paper, Michener et al., 1997. See:
 * Michener, William K., et al., 1997. Ecological Applications,
 * "Nongeospatial metadata for the ecological sciences"
 * Vol 7(1). pp. 330-342.
 *
 * Where appropriate, we have used elements of the ISO/TC 211 draft
 * standard - the ISO Geographic information/Geomatics standard,
 * which includes XML code, as well as ISO 8601 schema. Some elements
 * in the ISO/TC 211 were expanded to allow for greater
 * resolution.
 *
 * For an explanation of the classes of metadata and elements defined
 * below, see Michener et al. 1997. In particular, the numbered comment
 * labels found below refer to Table 1 (pp. 336-337) of Michener
 * et al. 1997. In addition, each of the principal elements in the
 * specification is accompanied by a FIXED attribute called "description"
 * that provides a brief description of the content of the element. These
 * descriptions are derived from Michener et al. 1997.
 *
-->

<!-- * * * *
CLASS IV B - VARIABLE DESCRIPTORS
* * * *
-->

<!-- Class 4 B -->
<!ELEMENT eml-variable (meta_file_id, variable*)>
<!ATTLIST eml-variable description CDATA #FIXED "Variable description for a file">

<!ELEMENT meta_file_id (#PCDATA)>
<!ATTLIST meta_file_id description CDATA #FIXED "Unique identifier of this metadata record">

<!ELEMENT variable (variable_name, variable_definition, unit?, storage_type?,
                    code_definition*, numeric_range*, missing_value_code*,
                    precision?, field_format?)>
<!ATTLIST variable description CDATA #FIXED "Variable information">
<!ELEMENT unit (#PCDATA) >
<!ATTLIST unit description CDATA #FIXED "Unit">
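<!--
  Illustrative sketch, not part of the original specification: a minimal
  variable entry that follows the content model above. All values are
  hypothetical. Only variable_name and variable_definition are required;
  the optional children, when present, must appear in the declared order.
  The FIXED "description" attributes are supplied by the DTD and need not
  be written in the instance.

  <variable>
    <variable_name>response</variable_name>
    <variable_definition>Hypothetical measured response at each site</variable_definition>
    <unit>grams</unit>
    <storage_type>floating point</storage_type>
  </variable>
-->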

<!-- Class 4.B.1 -->
<!ELEMENT variable_name (#PCDATA) >
<!ATTLIST variable_name description CDATA #FIXED "Unique variable name or code">

<!-- Class 4.B.2 -->
<!ELEMENT variable_definition (#PCDATA)>
<!ATTLIST variable_definition description CDATA #FIXED "Precise definition of variables in data set">

<!-- Class 4.B.3 - see 4.A.2 -->

<!-- Class 4.B.4.a -->
<!ELEMENT storage_type (#PCDATA) >
<!ATTLIST storage_type description CDATA #FIXED "Storage type; Integer, floating point, character, string">

<!-- Class 4.B.4.b -->
<!ELEMENT code_definition (code, definition) >
<!ATTLIST code_definition description CDATA #FIXED "Description of any codes associated with variables">
<!ELEMENT code (#PCDATA) >
<!ATTLIST code description CDATA #FIXED "Code">
<!ELEMENT definition (#PCDATA) >
<!ATTLIST definition description CDATA #FIXED "List and definition of variable codes">
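<!--
  Illustrative sketch, with hypothetical codes and definitions: each
  code_definition pairs exactly one code with one definition, so a
  variable that uses several codes repeats the element once per code.

  <code_definition>
    <code>aaa</code>
    <definition>Hypothetical subsite A</definition>
  </code_definition>
  <code_definition>
    <code>bbb</code>
    <definition>Hypothetical subsite B</definition>
  </code_definition>
-->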

<!-- Class 4.B.4.c -->
<!ELEMENT numeric_range (minimum?,maximum?) >
<!ATTLIST numeric_range description CDATA #FIXED "Range for numeric values">
<!ELEMENT minimum (#PCDATA) >
<!ATTLIST minimum description CDATA #FIXED "Minimum value">
<!ELEMENT maximum (#PCDATA) >
<!ATTLIST maximum description CDATA #FIXED "Maximum value">
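<!--
  Illustrative sketch, with hypothetical values: because minimum and
  maximum are both optional, a range may be bounded on both sides or on
  one side only. If both are given, minimum must come first.

  <numeric_range>
    <minimum>0</minimum>
    <maximum>10</maximum>
  </numeric_range>

  <numeric_range>
    <minimum>0</minimum>
  </numeric_range>
-->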

<!-- Class 4.B.4.d -->
<!ELEMENT missing_value_code (#PCDATA) >
<!ATTLIST missing_value_code description CDATA #FIXED "Character used to represent missing data">

<!-- Class 4.B.4.e -->
<!ELEMENT precision (#PCDATA) >
<!ATTLIST precision description CDATA #FIXED "Precision; number of significant digits">

<!-- Class 4.B.5 -->
<!ELEMENT field_format (variable_width|fixed_width)>
<!ATTLIST field_format description CDATA #FIXED "Data format">
<!--
Data sets are generally classified as fixed_width format or
variable_width format, but we have determined that this is actually a
per-field classification because one may encounter fixed_width fields
mixed together in the same data file with variable_width fields.

In our encoding scheme, the start of each field is assumed to be the
column after the last column of the previous field, or the first column
if this is the first field in the dataset. The end column for each
field is classified using a field_format and some information specific to
each field_format type that indicates in which column the field ends. The
two types of field formats are variable_width and fixed_width.
Variable_width fields can vary in their field length, and the end of the
field is delimited by a special character called a field delimiter,
usually a comma or a tab character. Fixed_width fields have a set
length, and so the end of the field can always be determined from the
starting column number and the field_width (the last column of the field
is the starting column plus field_width minus 1). Here is an example:

Assume we have the following data in a data set:

May,100aaa,1.2,
April,200aaa,3.4,
June,300bbb,4.6,

The metadata for the 4 fields would include the following:
<variable><variable_name>month</variable_name>
  <field_format><variable_width><delimiter>,</delimiter>
  </variable_width></field_format></variable>

<variable><variable_name>sitecode</variable_name>
  <field_format><fixed_width><field_width>3</field_width>
  </fixed_width></field_format></variable>

<variable><variable_name>subsitecode</variable_name>
  <field_format><fixed_width><field_width>3</field_width>
  </fixed_width></field_format></variable>

<variable><variable_name>response</variable_name>
  <field_format><variable_width><delimiter>,</delimiter>
  </variable_width></field_format></variable>

-->

<!ELEMENT variable_width (delimiter+)>
<!ATTLIST variable_width description CDATA #FIXED "Variable width field">
<!ELEMENT delimiter (#PCDATA)>
<!ATTLIST delimiter description CDATA #FIXED "Character used to delimit end of field">
<!ELEMENT fixed_width (field_width)>
<!ATTLIST fixed_width description CDATA #FIXED "Fixed width field">
<!ELEMENT field_width (#PCDATA)>
<!ATTLIST field_width description CDATA #FIXED "Width of field in characters">
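<!--
  Illustrative sketch, not part of the original specification: a complete
  instance document that should validate against the declarations in this
  file, describing one variable_width field and one fixed_width field.
  The meta_file_id value, the definitions, and the relative path to the
  DTD are hypothetical.

  <?xml version="1.0"?>
  <!DOCTYPE eml-variable SYSTEM "eml-variable.dtd">
  <eml-variable>
    <meta_file_id>example.1</meta_file_id>
    <variable>
      <variable_name>month</variable_name>
      <variable_definition>Month in which the sample was taken</variable_definition>
      <storage_type>string</storage_type>
      <field_format>
        <variable_width><delimiter>,</delimiter></variable_width>
      </field_format>
    </variable>
    <variable>
      <variable_name>sitecode</variable_name>
      <variable_definition>Three-character site code</variable_definition>
      <storage_type>string</storage_type>
      <field_format>
        <fixed_width><field_width>3</field_width></fixed_width>
      </field_format>
    </variable>
  </eml-variable>
-->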

<!-- Class 4.B.5.a - see Class 4.B.5 -->

<!-- Class 4.B.5.b - see Class 4.B.5 -->

<!-- Class 4.B.5.c -->
<!-- This section was removed as we were unsure of its usefulness -->

<!-- End of file -->