Bug #1197


dictionary needed for externallyDefinedFormat

Added by Peter McCartney about 19 years ago. Updated over 18 years ago.

eml - general bugs

Externally defined format is useless for automatic processing unless you have
some idea what to look for. This is a step backwards from FGDC which at least
provided enumerations for the common file formats at the time.


formatDictionary2.xml (1.15 KB) Peter McCartney, 12/17/2003 10:36 AM
Actions #1

Updated by Peter McCartney about 19 years ago

Here is a possible format for a dictionary file to provide an authority and
reference for data formats (and archive formats).

Actions #2

Updated by Peter McCartney about 19 years ago

<externallyDefinedFormat name="Shapefile" description="ESRI shapefile">
  <part extension="shp" mime="application/octet-stream"/>
  <part extension="dbf" mime="application/octet-stream"/>
  <part extension="shx" mime="application/octet-stream"/>
  <part extension="prj" mime="application/octet-stream"/>
  <part extension="idx" mime="application/octet-stream"/>
</externallyDefinedFormat>
<externallyDefinedFormat name="dBase4" description="dBase file format">
  <part extension="dbf" mime="application/octet-stream"/>
  <part extension="idx" mime="application/octet-stream"/>
</externallyDefinedFormat>
<externallyDefinedFormat name="MSSQLServer7.0" description="MS SQL server version 7.0"/>
<archiveFormat name="zip" description="pkzip compressed archive format">
  <part extension="zip" mime="application/zip"/>
</archiveFormat>


Actions #3

Updated by Peter McCartney about 19 years ago

OK, the issues seem to be:

1) We need a controlled enumeration for externallyDefinedFormat that is both
recognized by users and parsable by applications.

2) MIME types were created to serve this purpose. Project Alexandria
investigated this and decided that a combination of both format name and MIME
type was needed, since the MIME type alone is not always adequate. Read to see
their discussion. Basically, they provide three elements in their metadata
schema for downloads: format, mime, and encoding.

3) Vendors are slowly adding MIME types, but very few scientific data formats
have been added. If we define MIME types for these formats ourselves, we could
register them only by putting an x- in front of the name, and of course these
definitions would be deprecated when the owner puts in a definition.

4) dataFormat is required, so if the data are in Oracle, we need to have
SOMETHING to put here, even if the information is superfluous once
connectionDefinition is filled out. It's not clear to me whether MIME types even
apply to connections; perhaps these are all octet-streams?

5) Going beyond the enumeration issue, if we were to adopt a dictionary, we have
the option of storing other metadata on a format that could be useful. The
example I show here lists each part of a multipart format and its MIME type. We
use a file similar to this in our Xylopia data service to determine which parts
of a file format need to be gathered up into the zip package. My example lists
extensions, which works fine for shapefiles, dbf, MapInfo, GeoTIFF, and so on.
The only multipart type that does not use extensions to identify its parts is
the ArcInfo coverage. In that case the rules rely on folder names, and on the
file names under those folders, to identify the different parts. Because
coverages within one folder share a common metadata folder, you cannot move a
coverage by zipping up its files; you must open it and save it in some other
format for transport.
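
The packaging step described in item 5 could be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the Xylopia code: `FORMAT_PARTS` and `package_dataset` are hypothetical names, and the extension lists simply mirror the example dictionary posted earlier in this bug.

```python
import zipfile
from pathlib import Path

# Hypothetical in-memory form of the dictionary: format name -> part extensions.
FORMAT_PARTS = {
    "Shapefile": ["shp", "dbf", "shx", "prj", "idx"],
    "dBase4": ["dbf", "idx"],
}


def package_dataset(base_path, format_name, zip_path):
    """Zip every existing part of a multipart dataset.

    base_path is the dataset path without an extension, e.g. "data/roads".
    Returns the file names written to the archive, in dictionary order.
    """
    base = Path(base_path)
    written = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for ext in FORMAT_PARTS[format_name]:
            part = base.parent / f"{base.name}.{ext}"
            if part.is_file():  # some parts (e.g. prj) may be absent
                archive.write(part, arcname=part.name)
                written.append(part.name)
    return written
```

A coverage would need a different kind of rule, since its parts are identified by folder and file names rather than by extension; this sketch does not attempt that case.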

There was some debate about the utility of this multipart info, so I'm willing
to table that part of the issue and continue to do it internally ourselves. But
it would be really nice if we could agree on how to ensure that shapefile will
always be shapefile and not Shapefile, shape file, shape, esrishapefile, etc.
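
One way an application might enforce a single spelling is to map free-text names onto a controlled vocabulary before use. The sketch below is only an illustration: `CANONICAL_FORMATS` and `canonical_format` are hypothetical, the variants listed are the ones mentioned above, and the choice of "Shapefile" as the canonical form just follows the example dictionary.

```python
import re

# Hypothetical controlled vocabulary: canonical name -> known variants.
CANONICAL_FORMATS = {
    "Shapefile": ["shapefile", "shape file", "shape", "esrishapefile"],
    "dBase4": [],
}


def _key(name):
    """Lowercase the name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


# Reverse index from normalized variant to canonical name.
_ALIASES = {}
for canonical, variants in CANONICAL_FORMATS.items():
    for variant in [canonical, *variants]:
        _ALIASES[_key(variant)] = canonical


def canonical_format(name):
    """Return the canonical format name, or None if it is not in the vocabulary."""
    return _ALIASES.get(_key(name))
```

Returning None for unknown names, rather than guessing, leaves it to the caller to reject the value or ask for a dictionary entry.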

The attachment I put in (and edited) was an example of such a dictionary,
showing how multipart, single-part, and service formats could all be handled
using a strategy similar to ADA, where we define format types and then list the
MIME types for each of the parts. Matt felt this was inappropriate, as there is
in fact a multipart MIME type. So a variant on this would be to put the mime
attribute in the externallyDefinedFormat tag rather than in the part tag (or
both). The nice thing about this is that, like stmml.xsd, it shields users from
complicated terminology yet still enables machine processing through MIME types
when they exist. If we leave the mime element out of EML, then the dictionary
can hold the most up-to-date MIME type for any given format, and we don't have
to edit EML files when new MIME types appear.
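
To make the variant concrete, here is a rough Python sketch of resolving MIME types from the dictionary at processing time, so that an EML file only has to carry the format name. The `formatDictionary` root element, the `mime` attribute on the format element, and the `lookup_mimes` helper are all assumptions made for illustration; none of them is confirmed by the attachment.

```python
import xml.etree.ElementTree as ET

# Assumed layout: a <formatDictionary> root and an optional mime attribute
# on the format element itself. Neither is confirmed by the attachment.
DICTIONARY_XML = """
<formatDictionary>
  <externallyDefinedFormat name="Shapefile" description="ESRI shapefile">
    <part extension="shp" mime="application/octet-stream"/>
    <part extension="dbf" mime="application/octet-stream"/>
  </externallyDefinedFormat>
  <archiveFormat name="zip" description="pkzip compressed archive format"
                 mime="application/zip"/>
</formatDictionary>
"""


def lookup_mimes(dictionary_xml, format_name):
    """Return the MIME types recorded for a format name, or [] if unknown.

    A mime attribute on the format element wins; otherwise the distinct
    MIME types of its parts are returned in document order.
    """
    root = ET.fromstring(dictionary_xml)
    for entry in root:
        if entry.get("name") != format_name:
            continue
        if entry.get("mime"):
            return [entry.get("mime")]
        mimes = []
        for part in entry.findall("part"):
            mime = part.get("mime")
            if mime and mime not in mimes:
                mimes.append(mime)
        return mimes
    return []
```

Because the lookup reads only the dictionary, changing a MIME type there changes the result for every EML file that names the format, without touching those files.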

Actions #4

Updated by Matt Jones over 18 years ago

Changing QA contact to the list for all current EML bugs so that people can
track what is happening.

Actions #5

Updated by Redmine Admin almost 10 years ago

Original Bugzilla ID was 1197

