'$RCSfile: eml-physical.xsd,v $'

Copyright: 1997-2002 Regents of the University of California, University of New Mexico, and Arizona State University
Sponsors: National Center for Ecological Analysis and Synthesis and Partnership for Interdisciplinary Studies of Coastal Oceans, University of California Santa Barbara
          Long-Term Ecological Research Network Office, University of New Mexico
          Center for Environmental Studies, Arizona State University
Other funding: National Science Foundation (see README for details)
               The David and Lucile Packard Foundation
For Details: http://knb.ecoinformatics.org/

'$Author: cjones $'
'$Date: 2002/09/16 23:40:58 $'
'$Revision: 1.43 $'

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

eml-physical
The eml-physical module describes the external and internal physical characteristics of a data object as well as the information required for its distribution. Examples of the external physical characteristics of a data object would be the filename, size, compression, encoding methods, and authentication of a file (or byte stream) that resides on a filesystem or the name of a database table if the data object resides in a relational database. Internal physical characteristics describe the format of the data object being described. Examples are Microsoft Access 2000, ASCII, or UTF-8.
It also includes the information needed to parse the data object and extract the entity and its attributes from it. Distribution information describes how to retrieve the data object. The retrieval information can be either online with connection information (a URL, for example) or offline, with the data object residing on an archival tape.

Any data object that is described by EML needs this information so that the entities and attributes that reside within the data object can be extracted.

Stand-alone: yes

Physical structure.
Physical structure of an entity or entities.
The content model for physical is a CHOICE between "references" and all of the elements that let you describe the internal/external characteristics and distribution of a data object (e.g., dataObject, dataFormat, distribution). A physical element can contain a reference to a physical element defined elsewhere. Using a reference means that the referenced physical is identical, not just in name but in its complete description.
The eml-physical module was introduced into EML 1.4 as eml-file.

Data Object.
External characteristics of a data object.
The dataObject element is the parent element for several elements (e.g., objectName, size, authentication, compressionMethod, encodingMethod, characterEncoding) which describe the external characteristics of the data object.
Introduced in EML 2.0.

Object name
A name for the physical object.
The objectName is a name (i.e., an identifier) for the object being considered. In many cases, it may just be the file name on the file system where it is stored. If the object is a table in an RDBMS, it may be the table name.
Introduced in EML 2.0.

Data object size
Describes the physical size of the data object.
This element contains information on the physical size of the entity, typically in bytes.
Example: 13
The entitySize was introduced into EML 1.4.
Unit of measurement
Unit of measurement for the entity size, typically bytes.
This element gives the unit of measurement for the size of the entity, and is typically bytes.
Example: 13
The unit was introduced into EML 1.4.

Authentication method
A value, typically a checksum, used to authenticate that the bitstream delivered to the user is identical to the original.
This element describes authentication procedures or techniques, typically by giving a checksum method (e.g., MD5) and checksum value for the bytestream.
Example: f5b2177ea03aea73de12da81f896fe40
The authentication element was introduced into EML 1.4.

Authentication method
The method used to calculate an authentication checksum.
This element names the method used to calculate an authentication checksum that can be used to validate a bytestream. Typical checksum methods include MD5 and CRC.
Example: f5b2177ea03aea73de12da81f896fe40
The authentication element was introduced into EML 1.4.

Entity's compression method
Name of the entity's compression method.
This element describes any compression methods used to compress the entity, such as zip, compress, etc.
The compressed element was introduced into EML 1.4.

Encoding Method
Method used for encoding the entity.
This element describes the entity's encoding method, such as MIME base64 encoding or binhex encoding.
The encoded element was introduced into EML 1.4.

Character Encoding
Contains the name of the character encoding used for the data.
This element contains the name of the character encoding. This is typically ASCII or UTF-8, or one of the other common encodings.
Example: UTF-8
Introduced in EML 2.0.

Data format
Describes the internal physical format of a data object.
This element is the parent of a CHOICE between four possible internal physical formats, which describe the internal physical characteristics of the data object. Using this information, the user should be able to construct the entity and attributes described in those modules.
Note that this is the format of the physical file itself.
The format element was introduced into EML 1.4.

Generic binary format
Generic binary format.
Documentation for a generic binary format.
Introduced in EML 2.0.

Number of physical lines
The number of physical lines in the file spanned by a single logical data record.
A single logical data record may be written over several physical lines in a file, with no special marker to indicate the end of a record. In such cases, it is necessary to know the number of lines per record in order to correctly read them.
Example: 3
Introduced into EML 2.0.

Physical Line Number
The line on which the data field is found, when the data record is written over more than one physical line in the file.
A single logical data record may be written over several physical lines in a file, with no special marker to indicate the end of a record. In such cases, the relative location of a data field must be indicated by both relative row and column number.
Example: 3
Introduced into EML 2.0.

ASCII fixed delimited
Describes physical format of entities and attributes delimited by special characters like commas and spaces.
Describes physical format of entities and attributes delimited by special characters like commas and spaces.
Introduced in EML 2.0.

Start column
The starting column number for a fixed format attribute.
FixedWidth fields have a set length, thus the end of the field can always be determined by adding the fieldWidth to the starting column number.
Example: any positive integer; see the example in the "delimiter" description.
Introduced into EML 2.0.

Field width
FieldWidth specification for fixed field length.
FixedWidth fields have a set length, thus the end of the field can always be determined by adding the fieldWidth to the starting column number.
Example: any positive integer; see the example in the "delimiter" description.
The fieldWidth element was introduced into EML 1.4. Semantics changed to work identically to the NBII DTD.
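The fixed-width rules above (a known number of physical lines per record, a physical line number for each field, and an end column equal to start column plus field width) can be sketched in Python as follows. This is only an illustration under stated assumptions: the `FixedField` class, the `read_records` function, and the sample data are hypothetical and not part of EML, and columns and line numbers are assumed to be 1-based.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FixedField:
    """Hypothetical holder for one fixed-width field's layout metadata."""
    name: str
    line_number: int   # physical line within the logical record, 1-based
    start_column: int  # starting column, 1-based
    width: int         # field width in characters


def read_records(lines: List[str], lines_per_record: int,
                 fields: List[FixedField]) -> List[Dict[str, str]]:
    """Group physical lines into logical records, then slice out each field."""
    records = []
    last_start = len(lines) - lines_per_record
    for i in range(0, last_start + 1, lines_per_record):
        chunk = lines[i:i + lines_per_record]
        record = {}
        for f in fields:
            line = chunk[f.line_number - 1]
            start = f.start_column - 1   # convert to a 0-based index
            end = start + f.width        # end = start column + field width
            record[f.name] = line[start:end]
        records.append(record)
    return records


# Made-up data: each logical record spans two physical lines.
sample = ["May100", "aaaa1.2", "Apr200", "aaaa3.4"]
layout = [
    FixedField("month", 1, 1, 3),
    FixedField("count", 1, 4, 3),
    FixedField("code", 2, 1, 4),
    FixedField("value", 2, 5, 3),
]
records = read_records(sample, 2, layout)
```

Trailing lines that do not fill a whole record are ignored in this sketch; a real reader would probably report them as an error.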
ASCII field delimited
Describes physical format of entities and attributes delimited by special characters like commas and spaces.
Describes physical format of entities and attributes delimited by special characters like commas and spaces.
Introduced in EML 2.0.

Attribute delimiter
The end of the attribute (field) is delimited by a special character called a field delimiter.
Variable width format fields (attributes) can vary in their field length, thus the end of the field is delimited by a special character called a field delimiter (typically a comma or a space). Data sets are generally classified as fixedWidth format or variableWidth format, but we have determined that this is actually a per-field classification because one may encounter fixedWidth fields mixed together in the same data file with variableWidth fields. In our encoding scheme, the start of each field is assumed to be the column after the last column of the previous field, or the first column if this is the first field in the dataset, unless the starting column is explicitly enumerated using the "fieldStartColumn" element. The end column for each field is specified using either a special character delimiter indicated using the "fieldDelimiter" element, or a fixed field length indicated by using the "fieldWidth" element. The delimiter for the last field in the data set can be omitted. variableWidth fields can vary in their field length, and the end of the field is delimited by a special character called a field delimiter, usually a comma or a tab character. fixedWidth fields have a set length, and so the end of the field can always be determined by adding the fieldWidth to the starting column number.
Here is an example. Assume we have the following data in a data set:

May,100aaaa,1.2,
April,200aaaa,3.4,
June,300bbbb,4.6,

The metadata indicating the physical layout of the 4 fields would include the following values, in order: a delimiter of ",", a fieldWidth of 3, a fieldWidth of 3, and a delimiter of ",".

In a strictly fixed format file, the metadata would be slightly different:

May100aaaa1.2
Apr200aaaa3.4
Jun300bbbb4.6

Here the four fields would have fieldWidth values of 3, 3, 4, and 3. Alternatively, one could explicitly describe the starting columns, giving (fieldStartColumn, fieldWidth) pairs of (1, 3), (4, 3), (7, 4), and (11, 3).
Example: comma, tab, white space, etc.
The delimiter element was introduced into EML 1.4. Semantics changed to work identically to the NBII DTD, and then modified to fit more cases.

Quote character
Character used to quote values for delimiter escaping.
This element specifies a character to be used in the entity for quoting values so that field delimiters can be used within the value. This basically allows delimiter "escaping". The quoteCharacter is typically a " or '.
Example: "
The quoteCharacter element was taken from the NBII standard.

Record delimiter character
Character used to delimit records.
This element specifies the record delimiter character when the format is text. The record delimiter is usually a newline (\n) on UNIX, a carriage return (\r) on MacOS, or both (\r\n) on Windows/DOS. Multiline records are usually delimited with two line-ending characters; for example, on UNIX it would be two newline characters (\n\n).
Example: \n\r
The recordDelimiter element was introduced into EML 1.4.

Literal character
Character used to escape other characters.
This element specifies a character to be used for escaping character values so that the following character is treated as its literal value. This allows "escaping" for special characters like quotes, commas, and spaces when they aren't intended as a delimiter value. The literalCharacter is typically a \.
Example: \
Introduced in EML 2.0.
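One possible reading of how the field delimiter, quoteCharacter, and literalCharacter interact can be sketched in Python as follows. The `split_record` function is hypothetical, not part of EML. It assumes that the literal character applies both inside and outside quoted values, that quote characters are not kept in the extracted value, and that a trailing delimiter after the last field is optional, as the delimiter description allows.

```python
from typing import List


def split_record(record: str, delimiter: str = ",",
                 quote_char: str = '"',
                 literal_char: str = "\\") -> List[str]:
    """Split one record into fields, honoring quoting and literal escapes."""
    fields: List[str] = []
    current: List[str] = []
    in_quotes = False
    i = 0
    while i < len(record):
        ch = record[i]
        if ch == literal_char and i + 1 < len(record):
            # The character after the literal character is taken as-is.
            current.append(record[i + 1])
            i += 2
            continue
        if ch == quote_char:
            # Quote characters toggle quoting and are dropped from the value.
            in_quotes = not in_quotes
        elif ch == delimiter and not in_quotes:
            # An unquoted delimiter ends the current field.
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    if current:
        # The delimiter after the last field may be omitted.
        fields.append("".join(current))
    return fields


quoted = split_record('May,"Santa Barbara, CA",1.2,')
escaped = split_record("a\\,b,c")
```

Because a trailing delimiter is treated as optional, this sketch cannot distinguish an empty final field from an omitted one; a stricter reader would need the expected field count from the attribute metadata.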
Format Name
Name of the internal format of the data object.
Name of the internal format of the data object.
Example: Microsoft Excel
The formatName element was introduced into EML 2.0.

Format Version
Version of the internal format of the data object.
Version of the internal format of the data object.
Example: 2000 (9.0.2720)
The formatVersion element was introduced into EML 2.0.

citation
Data object is an eml-literature document.
Data object conforms to the EML standard for citation as defined in the XML schema for eml-literature.
Example: eml-literature.xml
The citation element was introduced into EML 2.0.

raster image parameters
Contains binary raster data header parameters.
The binaryRasterInfo element is a container for various parameters used to describe the contents of binary raster image files. In this case, it is based on a white paper on the ESRI site that describes the header information used for BIP and BIL files ("Extendable Image Formats for ArcView GIS 3.1 and 3.2").
Introduced in EML 2.0.

Number of rows
The number of rows in the image.
The number of rows in the image. Rows are parallel to the x-axis of the map coordinate system. There is no default.
Example: 400
Introduced in EML 2.0.

Number of columns
The number of columns in the image.
The number of columns in the image. Columns are parallel to the y-axis of the map coordinate system. There is no default.
Example: 600
Introduced in EML 2.0.

Entity's record orientation
Specification of the binary raster entity's record orientation.
This element specifies the binary raster entity's record orientation through its "columnorrow" attribute. The binary raster is column major if the raster is to be displayed column by column from the byte stream, or row major if it is to be displayed row by row from the byte stream. The valid attribute values are "columnmajor" or "rowmajor". If the attribute is not specified, "columnmajor" is used.
The orientation element was introduced into EML 2.0.

Attribute of orientation element
Specification of the entity's record orientation.
This attribute specifies the entity's record orientation. The valid attribute values are "columnmajor" or "rowmajor". If the attribute is not specified, "columnmajor" is used.
The columnorrow attribute was introduced into EML 1.4.

Number of Bands
The number of spectral bands in the image.
The number of spectral bands in the image. The default is 1.
Example: 1
Introduced in EML 2.0.

Number of Bits
The number of bits per pixel per band.
The number of bits per pixel per band. Acceptable values are 1, 4, 8, 16, and 32. The default value is eight bits per pixel per band. For a true color image with three bands (R, G, B) stored using eight bits for each pixel in each band, nbits equals eight and nbands equals three, for a total of twenty-four bits per pixel. For an image with nbits equal to one, nbands must also equal one.
Example: 8
Introduced in EML 2.0.

Byte Order
The byte order in which image pixel values are stored.
The byte order in which image pixel values are stored. The byte order is important for sixteen-bit images, with two bytes per pixel. Acceptable values are:
I - Intel byte order (Silicon Graphics, DEC Alpha, PC). Also known as little-endian.
M - Motorola byte order (Sun, HP, etc.). Also known as big-endian.
The default byte order is the same as that of the host machine executing the software.
Example: I or M
Introduced in EML 2.0.

Layout
The organization of the bands in the image file.
The organization of the bands in the image file. Acceptable values are:
bil - Band interleaved by line.
bip - Band interleaved by pixel.
bsq - Band sequential.
The default layout is bil.
Example: bil, bip, bsq
Introduced in EML 2.0.

Skip Bytes
The number of bytes of data in the image file to skip in order to reach the start of the image data.
The number of bytes of data in the image file to skip in order to reach the start of the image data.
This keyword allows you to bypass any existing image header information in the file. The default value is zero bytes.
Example: 0
Introduced in EML 2.0.

upper left X map coordinate
The x-axis map coordinate of the center of the upper-left pixel.
The x-axis map coordinate of the center of the upper-left pixel. If this parameter is specified, ulymap must also be set; otherwise a default value is used.
Example: 340000
Introduced in EML 2.0.

upper left Y map coordinate
The y-axis map coordinate of the center of the upper-left pixel.
The y-axis map coordinate of the center of the upper-left pixel. If this parameter is specified, ulxmap must also be set; otherwise a default value is used.
Example: 6486666
Introduced in EML 2.0.

X dimension
The x-dimension of a pixel in map units.
The x-dimension of a pixel in map units. If this parameter is specified, ydim must also be set; otherwise a default value is used.
Example: 16.665
Introduced in EML 2.0.

Y dimension
The y-dimension of a pixel in map units.
The y-dimension of a pixel in map units. If this parameter is specified, xdim must also be set; otherwise a default value is used.
Example: 16.665
Introduced in EML 2.0.

Bytes per band per row
The number of bytes per band per row.
The number of bytes per band per row. This must be an integer. This keyword is used only with BIL files when there are extra bits at the end of each band within a row that must be skipped.
Example: 3
Introduced in EML 2.0.

Total bytes of data per row
The total number of bytes of data per row.
The total number of bytes of data per row. Use totalrowbytes when there are extra trailing bits at the end of each row.
Example: 8
Introduced in EML 2.0.

Bytes between bands
The number of bytes between bands in a BSQ format image.
The number of bytes between bands in a BSQ format image. The default is zero.
Example: 1
Introduced in EML 2.0.

Distribution Information
Information on how the resource is distributed online and offline.
This element provides information on how the resource is distributed online and offline.
Connections to online systems can be described as URLs and as a list of relevant connection parameters.
Derived from distribution elements in the FGDC standard.

Online Distribution Information
Distribution information for accessing the resource online.
Distribution information for accessing the resource online, represented either as a URL or as a series of named parameters that are needed in order to connect. The URL field is provided for the simple cases where a file is available for download directly from a web server or other similar server and a complex connection protocol is not needed. The connection field provides an alternative where a complex protocol needs to be named and described, along with the necessary parameters needed for the connection.

Download site URL
A URL (Uniform Resource Locator) from which this resource can be downloaded or information can be obtained about downloading it.
A URL (Uniform Resource Locator) from which this resource can be downloaded or additional information can be obtained. If accessing the URL would directly return the data stream, then the "function" attribute should be set to "download". If the URL provides further information about downloading the object but does not directly return the data stream, then the "function" attribute should be set to "information". If the "function" attribute is omitted, then "download" is implied for the URL function. In more complex cases where a non-standard connection must be established that complies with application-specific procedures beyond what can be described in the simple URL, then the "connection" element should be used instead of the URL element.
Example: http://data.org/getdata?id=98332
ISO CD 19115.3, Geographic information - Metadata

Connection
A description of the information needed to make an application connection to a data service.
A description of the information needed to make an application connection to a data service.
The connection starts with a connectionDefinition which lists all of the parameters needed for the connection and possible default values for each. It then includes a list of parameter values, one for each parameter, that override the defaults for this particular connection. One parameter element should exist for every parameterDefinition that is present in the connectionDefinition, except that parameters that were defined with a defaultValue in their parameterDefinition can be omitted from the connection and the default will be used. All information about how to use the parameters to establish a session and extract data is present in the connectionDefinition, possibly implicitly by naming a connection schemeName that is well-known.

Connection Definition
Definition of the connection protocol to be used for this connection.
Definition of the connection protocol to be used for this connection. The definition has a "scheme" which identifies the protocol by name, and a detailed description of the scheme and its required parameters.

Parameter
A parameter to be used to make this connection.
A parameter to be used to make this connection. This value overrides any default value that may have been provided in the connection definition.

Parameter Name
Name of the parameter to be used to make this connection.
The name of the parameter to be used to make this connection.
Example: hostname

Parameter Value
The value of the parameter to be used to make this connection.
The value of the parameter to be used to make this connection. This value overrides any default value that may have been provided in the connection definition.
Example: nceas.ucsb.edu

References
The id of another connection in this EML document to be used to provide the connection information.
The id of another connection in this EML document to be used to provide the connection information. This is used instead of duplicating connection information when an identical connection needs to be used multiple times in an EML document.
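The override rule for connection parameters described above (a parameter value replaces the default, and a parameter may be omitted only if its definition supplies a default) can be sketched in Python as follows. The `resolve_parameters` function and the dictionary representation are hypothetical, and the "port" parameter with its default of "80" is made up for illustration; only "hostname" and "nceas.ucsb.edu" come from the examples above.

```python
from typing import Dict, Optional


def resolve_parameters(definitions: Dict[str, Optional[str]],
                       parameters: Dict[str, str]) -> Dict[str, str]:
    """Combine parameter definitions with the values given for a connection.

    definitions maps each defined parameter name to its default value,
    or to None when the definition supplies no default.
    parameters maps parameter names to the values given in the connection.
    """
    resolved = {}
    for name, default in definitions.items():
        if name in parameters:
            # An explicit parameter value overrides any default.
            resolved[name] = parameters[name]
        elif default is not None:
            # Omitted parameter: fall back to the default from its definition.
            resolved[name] = default
        else:
            raise ValueError(f"missing required parameter: {name}")
    return resolved


definitions = {"hostname": None, "port": "80"}  # "port" is a made-up parameter
resolved = resolve_parameters(definitions, {"hostname": "nceas.ucsb.edu"})
```

Parameters that are supplied but not defined are silently ignored here; the text above does not say how they should be handled.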
medium of the resource
The medium on which this resource is distributed, either digitally or as hardcopy.
The medium on which this resource is distributed digitally, such as 3.5" floppy disk, or various tape media types, or 'hardcopy'.
Example: CD-ROM, 3.5 in. floppy disk, Zip disk
ISO CD 19115.3, Geographic information - Metadata

Medium name
Name of the medium for this resource distribution.
Name of the medium on which this resource is distributed. Can be various digital media such as tapes and disks, or printed media which can collectively be termed 'hardcopy'.
Example: Tape, 3.5 inch Floppy Disk, hardcopy
ISO CD 19115.3, Geographic information - Metadata

density of the digital medium
The density of the digital medium, if this is relevant.
The density of the digital medium, if this is relevant. Used mainly for floppy disks or tape.
Example: High Density (HD), Double Density (DD)
ISO CD 19115.3, Geographic information - Metadata

units of a numerical density
A numerical density's units.
If a density is given numerically, the units should be given here.
Example: B/cm
ISO CD 19115.3, Geographic information - Metadata

storage volume
Total volume of the storage medium.
The total volume of the storage medium on which this resource is shipped.
Example: 650 MB
ISO CD 19115.3, Geographic information - Metadata

medium format
Format of the medium on which the resource is shipped.
The file system format of the medium on which the resource is shipped.
Example: NTFS, FAT32, EXT2, QIK80
ISO CD 19115.3, Geographic information - Metadata

note about the media
Note about the media.
Any additional pertinent information about the media.
ISO CD 19115.3, Geographic information - Metadata
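As a closing illustration of the binaryRasterInfo parameters described earlier, the following Python sketch shows how the number of rows and columns, nbands, nbits, the layout, and the skip bytes could combine to locate one sample in the byte stream. The `pixel_offset` function and its parameter names are hypothetical. The sketch assumes whole-byte samples (nbits of 8, 16, or 32), row-by-row storage, and no padding, so bandrowbytes, totalrowbytes, and the bytes between bands are taken to be at their unpadded values.

```python
def pixel_offset(row: int, col: int, band: int, *, nrows: int, ncols: int,
                 nbands: int = 1, nbits: int = 8, layout: str = "bil",
                 skipbytes: int = 0) -> int:
    """Byte offset of one sample; row, col, and band are 0-based."""
    if nbits % 8 != 0:
        raise ValueError("this sketch only handles whole-byte samples")
    size = nbits // 8  # bytes per sample
    if layout == "bil":
        # Band interleaved by line: row, then band, then column.
        index = (row * nbands + band) * ncols + col
    elif layout == "bip":
        # Band interleaved by pixel: row, then column, then band.
        index = (row * ncols + col) * nbands + band
    elif layout == "bsq":
        # Band sequential: band, then row, then column.
        index = (band * nrows + row) * ncols + col
    else:
        raise ValueError(f"unknown layout: {layout}")
    return skipbytes + index * size


# A 400 x 600 image (the row and column examples above) with three 8-bit bands.
dims = dict(nrows=400, ncols=600, nbands=3, nbits=8)
bil = pixel_offset(1, 2, 1, layout="bil", **dims)
bip = pixel_offset(1, 2, 1, layout="bip", **dims)
bsq = pixel_offset(1, 2, 1, layout="bsq", **dims)
```

The defaults in the signature mirror the documented defaults (one band, eight bits, bil layout, zero skip bytes). Byte order only matters once the bytes at the computed offset are decoded into a number, so it is left out of this sketch.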