Bug #335


decompose eml identifiers into familyid and revision

Added by Matt Jones over 22 years ago. Updated about 22 years ago.

eml - general bugs

Current EML identifiers are strings that each denote a unique revision of an
object (e.g., jones.14.1). The same identifier should always be associated with
the same stream of bytes (i.e., checksums would match).

The suggestion is that EML identifiers be decomposed into two parts. The first
part is a "family" id (string) that represents a group of related objects. The
second is a revision number (integer) that identifies one particular revision
among the objects in the family. The combination of the familyid and
revisionnum would always be unique, and would be usable as an accession number.
In XML, this could look something like:

<identifier system="knb">

Questions remain.
1) Would revision be required in eml, or optional?
If optional, then EML would allow description of objects that are not unique.
Is this a good thing that we want to encourage/allow as a community?
2) For citation in print publications or other non-xml environments, how would
one refer to the combination of familyid and revisionid?
Previously we were able to use the whole string -- how do we combine the parts
together now? Can we still concatenate them with a separator character?
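
On question 2, one option is to keep concatenating the two parts with a
separator character. The following is a minimal Python sketch of that idea, not
an agreed design. It assumes the revision is always the last
separator-delimited component and is a non-negative integer. The helper names
split_identifier and join_identifier are hypothetical and are not part of EML
or metacat.

```python
from typing import Tuple


def split_identifier(identifier: str, sep: str = ".") -> Tuple[str, int]:
    """Split e.g. 'jones.14.1' into the family id 'jones.14' and revision 1.

    Assumption: the revision is the last component after the final separator.
    """
    family, found_sep, revision = identifier.rpartition(sep)
    if not found_sep or not family:
        raise ValueError(f"no family part in identifier: {identifier!r}")
    if not (revision.isascii() and revision.isdigit()):
        raise ValueError(f"revision is not an integer: {identifier!r}")
    return family, int(revision)


def join_identifier(family: str, revision: int, sep: str = ".") -> str:
    """Recombine the two parts into a single citable string."""
    return f"{family}{sep}{revision}"


family, revision = split_identifier("jones.14.1")
print(family, revision)                    # jones.14 1
print(join_identifier(family, revision))   # jones.14.1
```

Under these assumptions the round trip is lossless, so the concatenated string
could still serve as the accession number in print.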

Actions #1

Updated by Matt Jones over 22 years ago

Changing target milestone for the major EML bugs to Beta7, which is scheduled
for release in early to mid March. There are likely other bugs that need to be
entered and resolved for this Beta7 release as well, so let's generate a complete

Actions #2

Updated by Matt Jones about 22 years ago

After putting a lot of thought into this issue, my conclusion is that we should
NOT make these changes to the "identifier" fields in EML. Rather, the
identifiers need to be atomic and contained within a string so that they can be
referenced in print publications and other venues as accession numbers. We gain
very little by breaking it apart.

Do we have consensus on this? If so, we can close the bug as WONT FIX.

One possibility that we have discussed in the past is a mechanism for describing
the internal structure of identifiers. It seems like an identifier has three
significant types of components: alphanumeric strings, serial numbers, and
separator characters. These can be combined in various ways to create a unique
identifier. If we can describe these, then we could indicate which parts of the
identifier correspond to a unique revision, and which parts correspond to a
"family" of versions of an object.

In the metacat/morpho case, we use an identifier like knb.XXX.YYY where knb.XXX
indicates a family of revisions, part of which (XXX) is a serial number, and
where YYY is the revision serial number. The separators are "." chars. Being
able to optionally describe these parts of the identifier could be useful.

One way to do this would be to use a format string in an attribute, written so
that its sub-parts can be identified. Consider this:
<identifier format="(A+.N+).(N+)" family="1" revision="2" sep="."
Here "A" denotes alphanumeric characters and "N" numeric digits, the + means one
or more (* and ? could also be used), and the parentheses mark the groupings of
parts of the identifier, which are labeled sequentially 1,2,3 and can be
referenced in the "family" and "revision" attributes. With this information the
identifier can be decomposed for use in automated systems that want to reason
about version lineages, auto-increment revision numbers, etc.
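
To make the format-string idea concrete, here is a rough Python sketch of how
an automated system might interpret it. It is an illustration under
assumptions, not a proposed implementation: the functions format_to_regex and
decompose are hypothetical names, "A" is taken to mean ASCII letters and
digits, "sep" is taken to be a single character, and the group referenced by
"revision" is assumed to be numeric.

```python
import re

# Assumed meaning of the format letters: A = alphanumeric, N = numeric digit.
CHAR_CLASSES = {"A": "[A-Za-z0-9]", "N": "[0-9]"}


def format_to_regex(fmt: str, sep: str = ".") -> re.Pattern:
    """Translate a format string such as '(A+.N+).(N+)' into a regex."""
    parts = []
    for ch in fmt:
        if ch == sep:
            parts.append(re.escape(ch))      # separator is matched literally
        elif ch in CHAR_CLASSES:
            parts.append(CHAR_CLASSES[ch])
        elif ch in "()+*?":
            parts.append(ch)                 # grouping and repetition pass through
        else:
            raise ValueError(f"unsupported format character: {ch!r}")
    return re.compile("".join(parts))


def decompose(identifier, fmt, family=1, revision=2, sep="."):
    """Return (family id, revision number) using 1-based group numbers."""
    match = format_to_regex(fmt, sep).fullmatch(identifier)
    if match is None:
        raise ValueError(f"{identifier!r} does not match format {fmt!r}")
    return match.group(family), int(match.group(revision))


family_id, rev = decompose("knb.14.1", "(A+.N+).(N+)")
print(family_id, rev)            # knb.14 1
# Auto-increment, assuming the revision directly follows the family part:
print(f"{family_id}.{rev + 1}")  # knb.14.2
```

The 1-based group numbers line up with the "family" and "revision" attribute
values in the example above, which is why the sketch passes them straight to
the regex match.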

This is a feature that could be added later without disrupting current metadata
if format, family, revision, and sep were optional attributes. So, I think for
now we should not include this as part of EML2.0.0, but consider it for a later
release. It has the problem that it is a lot of extra information if it is
repeated in every metadata document in a package.

Actions #3

Updated by Matt Jones about 22 years ago

Resolving this bug as WONTFIX based on the comments I made previously in the
bug. At a minimum this request would not be incorporated into the beta7 release.

Actions #4

Updated by Redmine Admin about 11 years ago

Original Bugzilla ID was 335

