* Ecological Metadata Language (EML) - Data set variable descriptors

* Authors: Matt Jones, Zheng Wang, and Noah Goldstein
* Organization: National Center for Ecological Analysis and Synthesis
* For Details: http://www.nceas.ucsb.edu/
* Created: 1997 August 19
* Modified: 1999 June 23
* Version: 1.4
* File Info: '$Id$'

* Ecological Metadata Language is a general-purpose metadata content
* specification for documenting ecological data. The specification
* consists of a series of modular document type definitions (DTDs) that
* provide metadata content descriptors. It describes the owner and
* contents of the dataset (eml-dataset.dtd), the research context in
* which it was created (eml-context.dtd), the structural
* characteristics of data files (eml-file.dtd), the
* characteristics of variables in a file (eml-variable.dtd), the current
* status of data and metadata files (eml-status.dtd), access control
* rules for the data and metadata (eml-access.dtd), software
* information (eml-software.dtd), and a variety of miscellaneous
* supplemental descriptors (eml-supplement.dtd).

* Files generated under the structural constraints of EML are
* plain-text files and can therefore be edited in ordinary
* text editors. However, these DTDs are intended for use within
* general-purpose metadata editors, and within a more specific
* metadata editor being developed at NCEAS for the ecological
* community. This metadata editor will provide facilities for
* version control and efficient metadata entry.

* The purpose of this specification was to formalize the work of
* Michener et al. in a structured language, in order to examine its
* application to ecological data in a controlled manner.

* This specification was based on the work of the Ecological Society
* of America's Committee on the Future of Long Term Data, and more
* specifically on a related paper, Michener et al. 1997. See:
* Michener, William K., et al. 1997. "Nongeospatial metadata for the
* ecological sciences." Ecological Applications 7(1): 330-342.

* Where appropriate, we have used elements of the ISO/TC 211 draft
* standard (the ISO Geographic information/Geomatics standard,
* which includes XML code), as well as ISO 8601 (the standard for
* date and time representation). Some ISO/TC 211 elements were
* expanded to allow for greater resolution.

* For an explanation of the classes of metadata and elements defined
* below, see Michener et al. 1997. In particular, the numbered comment
* labels found below refer to Table 1 (pp. 336-337) of Michener
* et al. 1997. In addition, each of the principal elements in the
* specification is accompanied by a FIXED attribute called "description"
* that provides a brief description of the content of the element. These
* descriptions are derived from Michener et al. 1997.

<!-- * * * *
* * * * -->

<!-- Class 4 B -->
<!ELEMENT eml-variable (meta_file_id, variable*)>
<!ATTLIST eml-variable description CDATA #FIXED "Variable description for a file">

<!ELEMENT meta_file_id (#PCDATA)>
<!ATTLIST meta_file_id description CDATA #FIXED "Unique identifier of this metadata record">

<!ELEMENT variable (variable_name, variable_definition, unit?, storage_type?,
                    code_definition*, numeric_range*, missing_value_code*,
                    precision?, field_format?)>
<!ATTLIST variable description CDATA #FIXED "Variable information">
<!ELEMENT unit (#PCDATA)>
<!ATTLIST unit description CDATA #FIXED "Unit">

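As a minimal sketch (not part of EML itself), the following Python builds an instance document whose <variable> children appear in the order fixed by the content model above, using the standard library's xml.etree.ElementTree. The function names build_variable and build_eml_variable and all example values are hypothetical. Elements that have children of their own (code_definition, numeric_range, field_format) are left out for brevity.

```python
import xml.etree.ElementTree as ET

# Child order required by the <variable> content model in this DTD:
# (variable_name, variable_definition, unit?, storage_type?,
#  code_definition*, numeric_range*, missing_value_code*,
#  precision?, field_format?)
VARIABLE_CHILD_ORDER = [
    "variable_name", "variable_definition", "unit", "storage_type",
    "code_definition", "numeric_range", "missing_value_code",
    "precision", "field_format",
]


def build_variable(name, definition, unit=None, storage_type=None,
                   missing_value_codes=(), precision=None):
    """Build a <variable> element from simple (#PCDATA) children only.

    Optional children given as None are omitted; missing_value_code is
    repeatable, so it takes a sequence. Hypothetical helper for illustration.
    """
    var = ET.Element("variable")
    values = {
        "variable_name": [name],
        "variable_definition": [definition],
        "unit": [] if unit is None else [unit],
        "storage_type": [] if storage_type is None else [storage_type],
        "missing_value_code": list(missing_value_codes),
        "precision": [] if precision is None else [str(precision)],
    }
    # Walk the content-model order so the output is always correctly ordered.
    for tag in VARIABLE_CHILD_ORDER:
        for text in values.get(tag, []):
            ET.SubElement(var, tag).text = text
    return var


def build_eml_variable(meta_file_id, variables):
    """Wrap <variable> elements in the <eml-variable> root element."""
    root = ET.Element("eml-variable")
    ET.SubElement(root, "meta_file_id").text = meta_file_id
    root.extend(variables)
    return root


# Hypothetical example values, for illustration only.
doc = build_eml_variable("hypothetical.1", [
    build_variable("temp", "Air temperature at 2 m", unit="degrees Celsius",
                   storage_type="floating point",
                   missing_value_codes=["-999"], precision=3),
])
print(ET.tostring(doc, encoding="unicode"))
```

The #FIXED description attributes are not written, because a parser that reads this DTD supplies their fixed values when they are absent. ElementTree itself does not validate the result against the DTD.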
<!-- Class 4.B.1 -->
<!ELEMENT variable_name (#PCDATA)>
<!ATTLIST variable_name description CDATA #FIXED "Unique variable name or code">

<!-- Class 4.B.2 -->
<!ELEMENT variable_definition (#PCDATA)>
<!ATTLIST variable_definition description CDATA #FIXED "Precise definition of variables in data set">

<!-- Class 4.B.3 - see 4.A.2 -->

<!-- Class 4.B.4.a -->
<!ELEMENT storage_type (#PCDATA)>
<!ATTLIST storage_type description CDATA #FIXED "Storage type; Integer, floating point, character, string">

<!-- Class 4.B.4.b -->
<!ELEMENT code_definition (code, definition)>
<!ATTLIST code_definition description CDATA #FIXED "Description of any codes associated with variables">
<!ELEMENT code (#PCDATA)>
<!ATTLIST code description CDATA #FIXED "Code">
<!ELEMENT definition (#PCDATA)>
<!ATTLIST definition description CDATA #FIXED "List and definition of variable codes">

<!-- Class 4.B.4.c -->
<!ELEMENT numeric_range (minimum?, maximum?)>
<!ATTLIST numeric_range description CDATA #FIXED "Range for numeric values">
<!ELEMENT minimum (#PCDATA)>
<!ATTLIST minimum description CDATA #FIXED "Minimum value">
<!ELEMENT maximum (#PCDATA)>
<!ATTLIST maximum description CDATA #FIXED "Maximum value">

<!-- Class 4.B.4.d -->
<!ELEMENT missing_value_code (#PCDATA)>
<!ATTLIST missing_value_code description CDATA #FIXED "Character used to represent missing data">

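The code_definition, numeric_range, and missing_value_code elements together describe how a raw value of a variable should be read. One possible interpretation is sketched below in Python, as an illustration only: it checks missing_value_code first, then code_definition, and finally parses the value as a number and checks it against the numeric_range bounds. The VariableRules class, the interpret function, the order of the checks, the stripping of surrounding whitespace, and the rule that a value must fall inside at least one range when several numeric_range elements are given are all assumptions of this sketch, not behavior defined by the DTD.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class VariableRules:
    """Hypothetical in-memory summary of one <variable>'s value descriptors."""
    codes: Dict[str, str] = field(default_factory=dict)  # <code> -> <definition>
    # One (<minimum>, <maximum>) pair per numeric_range; None means absent.
    ranges: List[Tuple[Optional[float], Optional[float]]] = field(default_factory=list)
    missing_value_codes: List[str] = field(default_factory=list)


def interpret(raw: str, rules: VariableRules) -> Tuple[str, object]:
    """Classify a raw value as ("missing", None), ("code", definition) or ("value", number)."""
    value = raw.strip()  # assumption: surrounding whitespace is only padding
    if value in rules.missing_value_codes:
        return ("missing", None)
    if value in rules.codes:
        return ("code", rules.codes[value])
    number = float(value)  # raises ValueError for text that is not numeric
    # Assumption: with several numeric_range elements the value must fall in at
    # least one of them; a missing minimum or maximum leaves that side open.
    in_some_range = any(
        (lo is None or number >= lo) and (hi is None or number <= hi)
        for lo, hi in rules.ranges
    )
    if rules.ranges and not in_some_range:
        raise ValueError(f"{number} lies outside every numeric_range")
    return ("value", number)


# Hypothetical precipitation variable, for illustration only.
rules = VariableRules(codes={"T": "trace amount"},
                      ranges=[(0.0, 500.0)],
                      missing_value_codes=["-999"])
print(interpret("-999", rules))  # ('missing', None)
print(interpret("T", rules))     # ('code', 'trace amount')
print(interpret("12.5", rules))  # ('value', 12.5)
```

Checking missing_value_code before parsing the value as a number keeps a sentinel such as -999 from being rejected by the range check.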
<!-- Class 4.B.4.e -->
<!ELEMENT precision (#PCDATA)>
<!ATTLIST precision description CDATA #FIXED "Precision; number of significant digits">

<!-- Class 4.B.5 -->
<!ELEMENT field_format (variable_width|fixed_width)>
<!ATTLIST field_format description CDATA #FIXED "Data format">

Data sets are generally classified as either fixed_width or
variable_width format, but we have determined that this is actually a
per-field classification, because fixed_width fields may be mixed
with variable_width fields in the same data file.

In our encoding scheme, each field is assumed to start in the column
after the last column of the previous field, or in the first column
if it is the first field of the record. The end column of each
field is determined by its field_format, together with information
specific to that field_format type that indicates in which column the
field ends. The two types of field formats are variable_width and
fixed_width. Variable_width fields can vary in length, and the end of
the field is marked by a special character called a field delimiter,
usually a comma or a tab character. Fixed_width fields have a set
length, so the end of the field can always be computed from the
starting column: a field that starts in column s and has a field_width
of w ends in column s + w - 1, and the next field starts in column s + w.
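The column arithmetic described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not part of EML: the names FixedWidth, VariableWidth, fixed_columns, and split_record are hypothetical; each delimiter is assumed to be a single character that is consumed as the last column of its field; and a final variable_width field with no trailing delimiter is assumed to run to the end of the line.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union


@dataclass
class FixedWidth:
    """Hypothetical stand-in for <fixed_width><field_width>w</field_width>."""
    field_width: int


@dataclass
class VariableWidth:
    """Hypothetical stand-in for <variable_width> with one or more <delimiter>s."""
    delimiters: Sequence[str]


FieldFormat = Union[FixedWidth, VariableWidth]


def fixed_columns(widths: List[int]) -> List[Tuple[int, int]]:
    """1-based, inclusive (start, end) columns for consecutive fixed_width fields."""
    spans = []
    start = 1                       # the first field starts in column 1
    for width in widths:
        end = start + width - 1     # last column occupied by this field
        spans.append((start, end))
        start = end + 1             # next field starts in the following column
    return spans


def split_record(line: str, formats: List[FieldFormat]) -> List[str]:
    """Split one record into field values, one field_format per field.

    Uses 0-based Python indices internally.
    """
    values = []
    start = 0
    for fmt in formats:
        if isinstance(fmt, FixedWidth):
            end = start + fmt.field_width       # exclusive end index
            values.append(line[start:end])
            start = end                         # next field begins right after
        else:
            # The field ends at the first of its delimiters found at or after start.
            hits = [line.find(d, start) for d in fmt.delimiters]
            hits = [i for i in hits if i != -1]
            end = min(hits) if hits else len(line)
            values.append(line[start:end])
            start = end + 1                     # skip over the delimiter
    return values


print(fixed_columns([4, 5, 2]))  # [(1, 4), (5, 9), (10, 11)]
# Hypothetical record mixing one fixed_width field with three variable_width fields.
formats = [FixedWidth(4), VariableWidth([","]), VariableWidth(["\t"]), VariableWidth(["\t"])]
print(split_record("P00112.5,wet\tY", formats))  # ['P001', '12.5', 'wet', 'Y']
```

Because the format is attached to each field rather than to the whole file, split_record simply dispatches on the type of each field's format as it walks along the line.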

Assume we have the following data in a data set:

The metadata for the 4 fields would include the following:

<!ELEMENT variable_width (delimiter+)>
<!ATTLIST variable_width description CDATA #FIXED "Variable width field">
<!ELEMENT delimiter (#PCDATA)>
<!ATTLIST delimiter description CDATA #FIXED "Character used to delimit end of field">

<!ELEMENT fixed_width (field_width)>
<!ATTLIST fixed_width description CDATA #FIXED "Fixed width field">
<!ELEMENT field_width (#PCDATA)>
<!ATTLIST field_width description CDATA #FIXED "Width of field in characters">

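The field_format structure declared here can be read back from an instance document with a few lines of Python. The sketch below is illustrative only. read_field_formats and the sample instance are hypothetical. It uses the standard library's xml.etree.ElementTree, which does not validate against this DTD, so the instance is assumed to be valid already. For each variable it returns ("fixed", width), ("variable", delimiters), or None when the optional field_format is absent.

```python
import xml.etree.ElementTree as ET


def read_field_formats(xml_text):
    """Return (variable_name, format) pairs from an <eml-variable> instance.

    format is ("fixed", width), ("variable", [delimiter, ...]), or None when
    the optional <field_format> element is absent. No DTD validation is done.
    """
    root = ET.fromstring(xml_text)
    result = []
    for var in root.findall("variable"):
        name = var.findtext("variable_name")
        fmt_el = var.find("field_format")
        if fmt_el is None:
            fmt = None
        elif fmt_el.find("fixed_width") is not None:
            width = int(fmt_el.findtext("fixed_width/field_width"))
            fmt = ("fixed", width)
        else:
            # variable_width contains one or more <delimiter> elements.
            delims = [d.text for d in fmt_el.findall("variable_width/delimiter")]
            fmt = ("variable", delims)
        result.append((name, fmt))
    return result


# Hypothetical instance document, for illustration only.
SAMPLE = """<eml-variable>
  <meta_file_id>hypothetical.1</meta_file_id>
  <variable>
    <variable_name>plot</variable_name>
    <variable_definition>Plot identifier</variable_definition>
    <field_format><fixed_width><field_width>4</field_width></fixed_width></field_format>
  </variable>
  <variable>
    <variable_name>rain</variable_name>
    <variable_definition>Daily precipitation</variable_definition>
    <field_format><variable_width><delimiter>,</delimiter></variable_width></field_format>
  </variable>
</eml-variable>"""

print(read_field_formats(SAMPLE))
# [('plot', ('fixed', 4)), ('rain', ('variable', [',']))]
```

Checking an instance against this DTD requires a validating parser. For example, lxml provides an lxml.etree.DTD class for that purpose.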
<!-- Class 4.B.5.a - see Class 4.B.5 -->

<!-- Class 4.B.5.b - see Class 4.B.5 -->

<!-- Class 4.B.5.c -->
<!-- This section was removed as we were unsure of its usefulness -->

<!-- End of file -->