EML: Issues
https://projects.ecoinformatics.org/ecoinfo/ (Ecoinformatics Redmine, updated 2002-04-16T17:09:12Z)
Bug #472 (Resolved): establish consistent namespaces for schemas
https://projects.ecoinformatics.org/ecoinfo/issues/472 (2002-04-16T17:09:12Z, Matt Jones, jones@nceas.ucsb.edu)
<p>EML currently uses namespaces of the form "eml:modulename" for each of the EML modules (e.g., eml:dataset). In contrast, we use version-specific public identifiers for the EML DTDs (e.g., "-//ecoinformatics.org//eml-dataset-2.0.0beta6//EN"). The formal public identifiers will need to be updated with each revision of the standard. Their benefit is that they let a document state exactly which version of a module it uses. This is important in systems where we need to validate documents reliably.</p>
<p>So, I think we need to change the public namespaces for eml to be versioned like<br />the public identifiers are. A format like this would do:<br /> "eml:eml-dataset-2.0.0beta7"</p>
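<p>The Python sketch below shows how a processor might read the module name and version out of the proposed namespace form. The regular expression, the function names and the sample document are illustrative assumptions, not part of any released EML schema. Only the standard library (re, xml.etree.ElementTree) is used.</p>

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern for the proposed "eml:<module>-<version>" form,
# e.g. "eml:eml-dataset-2.0.0beta7". Not taken from a published EML schema.
NS_PATTERN = re.compile(
    r"^eml:(?P<module>eml-[a-z]+)-(?P<version>\d+\.\d+\.\d+(?:beta\d+)?)$"
)

def parse_eml_namespace(uri):
    """Split a versioned EML namespace into (module, version), or raise ValueError."""
    match = NS_PATTERN.match(uri)
    if match is None:
        raise ValueError(f"not a versioned EML namespace: {uri!r}")
    return match.group("module"), match.group("version")

def root_module_version(xml_text):
    """Return (module, version) declared by the root element's namespace."""
    root = ET.fromstring(xml_text)
    # ElementTree reports a qualified tag as "{namespace}localname".
    if not root.tag.startswith("{"):
        raise ValueError("root element has no namespace")
    namespace = root.tag[1:].split("}", 1)[0]
    return parse_eml_namespace(namespace)

doc = '<dataset xmlns="eml:eml-dataset-2.0.0beta7"><title>Example</title></dataset>'
print(root_module_version(doc))  # ('eml-dataset', '2.0.0beta7')
```

Because the version is part of the namespace itself, a validator can refuse a document whose module version it does not support before it does any further schema processing.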
<p>Note that I deliberately chose not to use an http URI for this namespace, because of the intense controversy over whether namespace URIs should be resolvable, and because of later specs such as RDDL. The namespace spec explicitly states that processors should NOT expect a schema to reside at the namespace URI, nor even expect the namespace URI to be resolvable as an address. The "eml" scheme in the URI therefore makes it clear that it is not a resolvable URL. To locate schemas, we should rely on schemaLocation or handle the lookup in each schema processor.</p>
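<p>The sketch below relies on the xsi:schemaLocation hint rather than on dereferencing the namespace. It reads the namespace/location pairs from a document's root element. The schema file name "eml-dataset.xsd" and the function name are hypothetical. The attribute name and its whitespace-separated pair format come from the W3C XML Schema instance namespace.</p>

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

def schema_locations(xml_text):
    """Map each namespace to the schema location hinted by xsi:schemaLocation on the root."""
    root = ET.fromstring(xml_text)
    hint = root.get(f"{{{XSI}}}schemaLocation", "")
    tokens = hint.split()
    if len(tokens) % 2:
        raise ValueError("xsi:schemaLocation must contain namespace/location pairs")
    # Tokens alternate: namespace URI, then the location of its schema document.
    return dict(zip(tokens[0::2], tokens[1::2]))

doc = """<dataset xmlns="eml:eml-dataset-2.0.0beta7"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="eml:eml-dataset-2.0.0beta7 eml-dataset.xsd"/>"""
print(schema_locations(doc))  # {'eml:eml-dataset-2.0.0beta7': 'eml-dataset.xsd'}
```

A processor could fall back to its own catalog for any namespace that is missing from this map, which corresponds to the "handle it in each schema processor" option.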
<p>This will need to be changed throughout the schema docs.</p>
<p>We also need to document in each DTD which public identifier should be used with it.</p>
Bug #470 (Resolved): need to be able to inline data in EML
https://projects.ecoinformatics.org/ecoinfo/issues/470 (2002-04-15T22:47:19Z, Matt Jones, jones@nceas.ucsb.edu)
<p>Request to be able to inline data in the same file as the EML metadata. Binary data could be Base64-encoded, and text data could be included directly in the stream. We should probably build on an existing standard approach, such as XSIL.</p>
Bug #450 (Resolved): entityType and format
https://projects.ecoinformatics.org/ecoinfo/issues/450 (2002-03-27T18:27:55Z, Matt Jones, jones@nceas.ucsb.edu)
<p>The eml-entity module has a field called "entityType" that is supposed to contain the type of the entity for "other" entities. The eml-physical module has a field called "format" that is supposed to contain the name of the data format of the physical file. We need to clarify the difference between these fields.</p>
<p>If one uses a MIME type to indicate the format (e.g., image/gif), where should it go? My guess is eml-physical/format.</p>
Bug #430 (Resolved): need DTDs to correspond to XSD files
https://projects.ecoinformatics.org/ecoinfo/issues/430 (2002-02-15T02:14:28Z, Matt Jones, jones@nceas.ucsb.edu)
<p>The DTD files currently checked into the eml module do not correspond 1:1 with the XSD files. In particular: 1) parameter entities were resolved (e.g., eml-dataset includes eml-resource) and should not have been; and 2) multiple global elements in a schema should be represented as possible root elements in the DTD, but were eliminated instead. For example, in eml-entity, both "table-entity" and "other-entity" should be root elements in eml-entity.dtd. Only "table-entity" is present, because having both caused problems with the software we were using to parse DTDs. This needs to be fixed so that all appropriate elements are available.</p>
Bug #429 (Resolved): add additional entity types to EML
https://projects.ecoinformatics.org/ecoinfo/issues/429 (2002-02-15T02:08:42Z, Matt Jones, jones@nceas.ucsb.edu)
<p>The current eml-entity module describes two types of entities: table-entities<br />and other-entities. Ultimately I think we need to be able to describe several<br />other specific types of entities, particularly spatial images and various GIS<br />objects.</p>
<p>General image support may also be useful (e.g., for jpg, gif, etc.) so that photo quadrats and other images used as data and metadata can easily be included. We may be able to accommodate many of these generic entity types by using a MIME-type label (e.g., image/gif) in the entityType field, although these entity types may also need additional metadata.</p>
Bug #428 (Resolved): eml-constraint overlaps with packaging concepts
https://projects.ecoinformatics.org/ecoinfo/issues/428 (2002-02-15T01:15:06Z, Matt Jones, jones@nceas.ucsb.edu)
<p>The current incarnation of eml-constraint allows the enumeration and definition<br />of integrity constraints that apply to entities. These are currently drawn from<br />the relational model, including UNIQUE, PRIMARY KEY, FOREIGN KEY, and CHECK<br />constraints. It may also be extended to include other types of relationships<br />between entities that are not part of the relational model.</p>
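<p>To make the relational constraint types above concrete, here is a minimal Python sketch that checks them against in-memory tables represented as lists of dicts. The table names, column names and function names are hypothetical. As a simplification, NULLs are compared like ordinary values, which differs from SQL semantics.</p>

```python
def violates_unique(rows, columns):
    """True if two rows share the same values in `columns` (None compared like any value)."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return True
        seen.add(key)
    return False

def violates_primary_key(rows, columns):
    # PRIMARY KEY = UNIQUE plus NOT NULL on every key column.
    has_null = any(row[c] is None for row in rows for c in columns)
    return has_null or violates_unique(rows, columns)

def violates_foreign_key(child, child_cols, parent, parent_cols):
    """True if some child key has no matching parent key."""
    parent_keys = {tuple(r[c] for c in parent_cols) for r in parent}
    return any(tuple(r[c] for c in child_cols) not in parent_keys for r in child)

def violates_check(rows, predicate):
    """True if any row fails the CHECK predicate."""
    return not all(predicate(row) for row in rows)

sites = [{"site_id": 1, "name": "A"}, {"site_id": 2, "name": "B"}]
samples = [{"sample_id": 10, "site_id": 1, "depth_m": 0.5},
           {"sample_id": 11, "site_id": 3, "depth_m": -1.0}]

print(violates_primary_key(sites, ["site_id"]))                       # False
print(violates_foreign_key(samples, ["site_id"], sites, ["site_id"]))  # True: site 3 is missing
print(violates_check(samples, lambda r: r["depth_m"] >= 0))            # True
```

Only the FOREIGN KEY check needs both entities, which is relevant to where such constraints should be described.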
<p>The "triple" element allows us to create arbitrary relationships between identifiable objects in EML. It is used to associate data with metadata and to group metadata and data objects together as a "package". This usage is very similar to the relational model, in that it allows us to define 3-valued tuples in a graph structure. Constraints between entities could conceivably be modeled using this infrastructure, probably with some modifications to the concept of a "relationship".</p>
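<p>The Python sketch below stores package membership and a foreign-key-style constraint side by side as 3-tuples in one graph. The identifiers and relationship names ("isDataFileOf", "isMetadataFor", "references") are hypothetical and are not EML vocabulary.</p>

```python
from collections import defaultdict

# Each triple is (subject, relationship, object). All values are illustrative.
triples = [
    ("jones.1.1", "isDataFileOf", "jones.2.1"),
    ("jones.3.1", "isMetadataFor", "jones.2.1"),
    # A foreign-key-style constraint expressed in the same structure:
    ("samples.site_id", "references", "sites.site_id"),
]

def build_graph(triples):
    """Index triples by subject: subject -> list of (relationship, object)."""
    graph = defaultdict(list)
    for subject, relationship, obj in triples:
        graph[subject].append((relationship, obj))
    return graph

def related(graph, subject, relationship):
    """Objects linked to `subject` by `relationship`."""
    return [obj for rel, obj in graph.get(subject, []) if rel == relationship]

graph = build_graph(triples)
print(related(graph, "samples.site_id", "references"))  # ['sites.site_id']
```

A single-column reference fits the triple form directly. A composite key or a CHECK predicate does not fit as neatly, which suggests why the concept of a "relationship" would probably need to be modified.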
<p>So the question arises: should we try to develop a unified approach to specifying both constraints and packages? It might be more elegant, but possibly at the cost of simplicity and ease of use. My gut feeling is that we should not pursue this, but I would like to hear other people's reasons for or against it.</p>
Bug #427 (Resolved): eml-constraint use of identifiers
https://projects.ecoinformatics.org/ecoinfo/issues/427 (2002-02-15T01:06:32Z, Matt Jones, jones@nceas.ucsb.edu)
<p>The current eml-constraint module is designed to reference table and attribute identifiers so that the relationships between two particular entities can be established. However, we do not currently indicate how the values for these identifiers should be obtained or constrained. Are they EML identifiers (which don't work for attributes), or are they names (entityName, attributeName), which may run into problems because names are not guaranteed to be unique? We need an easy, consistent approach that we recommend or require as part of the semantics of this module.</p>
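<p>The sketch below follows the name-based option. It assumes that constraints refer to entities and attributes by entityName and attributeName, and that names must be unique within a document. The dict layout and function names are hypothetical and are not an EML structure. The point is that name references only resolve unambiguously if uniqueness is enforced first.</p>

```python
def build_name_index(entities):
    """Map entityName -> set of attributeNames, rejecting duplicate names."""
    index = {}
    for entity in entities:
        name = entity["entityName"]
        if name in index:
            raise ValueError(f"duplicate entityName: {name!r}")
        attrs = entity["attributeNames"]
        if len(set(attrs)) != len(attrs):
            raise ValueError(f"duplicate attributeName in {name!r}")
        index[name] = set(attrs)
    return index

def resolve(index, entity_name, attribute_name):
    """True if an (entity, attribute) reference used in a constraint resolves."""
    return attribute_name in index.get(entity_name, set())

entities = [
    {"entityName": "sites", "attributeNames": ["site_id", "name"]},
    {"entityName": "samples", "attributeNames": ["sample_id", "site_id"]},
]
index = build_name_index(entities)
print(resolve(index, "samples", "site_id"))  # True
print(resolve(index, "plots", "plot_id"))    # False
```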
<p>In addition, constraints always apply to one or more entities, so it is reasonable to consider merging the entire eml-constraint module into eml-entity. However, doing this means that a constraint affecting one table might only be described in the description of a different table, which could make the information hard to locate. By keeping eml-constraint as an independent module, we create a single, identifiable location where both participants in a constraint can be enumerated. This will make it far easier for applications to identify both sides of a constraint, at the cost of having to specify both sides in the constraint description. Of course, this does not apply to constraints that involve only a single entity, such as UNIQUE constraints.</p>
Bug #373 (Resolved): Incorrect Citation reference (citeinfo) in eml-coverage, temporalCov and tax...
https://projects.ecoinformatics.org/ecoinfo/issues/373 (2001-12-12T06:43:19Z, Chris Jones, cjones@nceas.ucsb.edu)
<p>The geolcit, classcit, and idref elements in eml-coverage.xsd use a complex type consisting of a single element reference in a sequence. The reference is to the citeinfo element, which is a single element defined in eml-coverage.xsd (with no documentation). This element reference needs to be changed to point to an EML literature citation field. We would most likely have to import the citation module into eml-coverage for this to work.</p>
Bug #339 (Resolved): bounding box vs point data in eml-coverage
https://projects.ecoinformatics.org/ecoinfo/issues/339 (2001-11-30T02:41:18Z, Matt Jones, jones@nceas.ucsb.edu)
<p>The current eml-coverage module requires a bounding box described by two points. Many ecological data sets are collected at a site with a point location but no known bounding box. How can we accommodate point coverage? There are two possibilities: 1) change the content model to make one of the points in the bounding box optional, or 2) change the documentation to tell users to enter identical points for both bounding box coordinates when the coverage is a point.</p>
Bug #338 (Resolved): need coverage documentation, review all documentation
https://projects.ecoinformatics.org/ecoinfo/issues/338 (2001-11-30T02:38:06Z, Matt Jones, jones@nceas.ucsb.edu)
<p>Many of the fields in eml-coverage are inadequately documented. We need to fill out the documentation in the annotation tags of the schema files thoroughly.</p>
Bug #337 (Resolved): storage type issues
https://projects.ecoinformatics.org/ecoinfo/issues/337 (2001-11-30T02:33:19Z, Matt Jones, jones@nceas.ucsb.edu)
<p>KNB scientists wanted to classify the storage type for attributes as "nominal", "ordinal", or "interval", rather than use the physical storage types we had considered (e.g., text, integer, floating point). We need to clarify what this field should contain and possibly define a domain for its value space.</p>
Bug #336 (Resolved): need formal defs for responsible party roles
https://projects.ecoinformatics.org/ecoinfo/issues/336 (2001-11-30T02:30:51Z, Matt Jones, jones@nceas.ucsb.edu)
<p>We need formal definitions for the responsible party roles that applications like Morpho can use to guide users in choosing roles. There is a lot of confusion about several roles, such as originator/owner/principalInvestigator.</p>
Bug #335 (Resolved): decompose eml identifiers into familyid and revision
https://projects.ecoinformatics.org/ecoinfo/issues/335 (2001-11-30T02:02:27Z, Matt Jones, jones@nceas.ucsb.edu)
<p>Current EML identifiers are strings that symbolize a unique revision of an object (e.g., jones.14.1). The same identifier should always be associated with the same stream of bytes (i.e., checksums would match).</p>
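<p>The invariant above can be enforced with a checksum check. The sketch below uses a hypothetical in-memory registry that binds each identifier to the SHA-256 digest of the bytes first registered under it. The choice of SHA-256 is an assumption, since the issue only says that checksums would match.</p>

```python
import hashlib

# Hypothetical registry: identifier -> SHA-256 hex digest of the bytes first
# registered under it. The hash algorithm is an assumption for illustration.
registry = {}

def register(identifier, data):
    """Bind identifier to data's checksum; refuse to rebind it to different bytes."""
    digest = hashlib.sha256(data).hexdigest()
    existing = registry.setdefault(identifier, digest)
    if existing != digest:
        raise ValueError(f"{identifier} already names different bytes")
    return digest

register("jones.14.1", b"version one")
register("jones.14.1", b"version one")  # identical bytes: accepted
try:
    register("jones.14.1", b"version two")  # changed content needs a new identifier
except ValueError as err:
    print(err)  # jones.14.1 already names different bytes
```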
<p>Suggestion that eml identifiers should be decomposed into two parts. The first<br />part is a "family" id (string) that represents a group of related objects. The<br />second is a revision # (integer) that indicates the revision number of one of<br />the objects in the family. The combination of the familyid and revisionnum<br />would always be unique, and would be usable as an accession number. In XML,<br />this could look something like:</p>
<p>&lt;identifier system="knb"&gt;<br />  &lt;familyid&gt;jones.43&lt;/familyid&gt;<br />  &lt;revision&gt;13&lt;/revision&gt;<br />&lt;/identifier&gt;</p>
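<p>The sketch below reads the proposed identifier element and joins its parts into a single accession string. It makes two assumptions, neither of which is a decision recorded in this issue: the parts are joined with "." (as in the existing jones.14.1 form), and revision is required. The function names are hypothetical. Because the revision is an integer, splitting on the last separator recovers both parts unambiguously, even when the familyid itself contains dots.</p>

```python
import xml.etree.ElementTree as ET

SEPARATOR = "."  # assumption: reuse "." as in the existing "jones.14.1" form

def parse_identifier(xml_text):
    """Return (system, familyid, revision) from the proposed identifier element."""
    elem = ET.fromstring(xml_text)
    family = elem.findtext("familyid")
    revision = elem.findtext("revision")
    if family is None:
        raise ValueError("identifier has no familyid")
    return elem.get("system"), family.strip(), int(revision) if revision is not None else None

def to_accession(family, revision):
    """Concatenate familyid and revision into one citable string."""
    if revision is None:
        raise ValueError("revision required for a unique accession number")
    return f"{family}{SEPARATOR}{revision}"

def from_accession(accession):
    """Split on the LAST separator; the revision part must be an integer."""
    family, _, revision = accession.rpartition(SEPARATOR)
    if not family or not revision.isdigit():
        raise ValueError(f"not a family/revision accession: {accession!r}")
    return family, int(revision)

xml_text = """<identifier system="knb">
  <familyid>jones.43</familyid>
  <revision>13</revision>
</identifier>"""
system, family, revision = parse_identifier(xml_text)
accession = to_accession(family, revision)
print(system, accession)          # knb jones.43.13
print(from_accession(accession))  # ('jones.43', 13)
```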
<p>Questions remain: <br />1) Would revision be required in EML, or optional? If optional, then EML would allow the description of objects that are not unique. Is this something we want to encourage or allow as a community?<br />2) For citation in print publications or other non-XML environments, how would one refer to the combination of familyid and revision? Previously we could use the whole string; how do we combine the parts now? Can we still concatenate them with a separator character?</p>
Bug #269 (Resolved): resolve packaging issues
https://projects.ecoinformatics.org/ecoinfo/issues/269 (2001-08-31T16:39:12Z, Matt Jones, jones@nceas.ucsb.edu)
<p>There are some contentious issues surrounding the use of packaging (i.e., the triple element) in EML. Some would prefer direct inclusion via namespaces, to make the schema more explicit. However, using triples to associate data and metadata files is more flexible and allows new types of metadata to be added over time without changing the original structure.</p>
<p>One complaint is that the current structure requires multiple files to deliver<br />all of the metadata. One possible solution is to include an element 'metadata'<br />with content model 'ANY' as the root element, which can contain all of the other<br />modules, and they in turn can use namespaces to indicate how validation can be<br />performed.</p>
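<p>The sketch below shows how a processor could handle the proposed 'metadata' wrapper. It finds which module namespaces appear inside the wrapper so that each child can be validated against the matching schema. It assumes an unqualified 'metadata' root, as named in the proposal. The versioned namespaces in the sample follow the form proposed in Bug #472 and are hypothetical here.</p>

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def modules_by_namespace(xml_text):
    """Group the children of a 'metadata' wrapper root by namespace URI."""
    root = ET.fromstring(xml_text)
    if root.tag != "metadata":
        raise ValueError("expected an unqualified 'metadata' root element")
    groups = defaultdict(list)
    for child in root:
        # Qualified tags look like "{namespace}localname"; None marks no namespace.
        if child.tag.startswith("{"):
            namespace, local = child.tag[1:].split("}", 1)
        else:
            namespace, local = None, child.tag
        groups[namespace].append(local)
    return dict(groups)

doc = """<metadata>
  <ds:dataset xmlns:ds="eml:eml-dataset-2.0.0beta7"/>
  <en:table-entity xmlns:en="eml:eml-entity-2.0.0beta7"/>
</metadata>"""
print(modules_by_namespace(doc))
```

Since the wrapper's content model is ANY, the wrapper itself constrains nothing. All validation would come from the per-namespace schemas a processor selects from this map.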