Bug #602
closed
eml-physical
Description
Matt Jones pointed out that asciiFixed and asciiDelimited should
be renamed to something less misleading, such as textFixed and
textDelimited, because other encoding schemes, Unicode for example,
are possible. Calling these elements asciiFixed or asciiDelimited
implies that the data can only be ASCII. The encoding scheme
can be set in <physical><dataObject><characterEncoding>.
Data objects whose format is a mixture of fixed and delimited fields
are not supported by eml-physical as it is currently structured. For
example, data objects whose physical structure looks like this cannot
be represented:
May,100aaaa,1.2,
April,200aaaa,3.4,
June,300bbbb,4.6,
The second field is a composite of two attributes that are of fixed
length (3 and 4 characters) but have no fixed fieldStartColumn,
because the delimited field before them varies in length.
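To make the mixed layout concrete, here is a minimal Python sketch of how a reader might parse such records: split on the delimiter first, then cut the composite field by width, counting from the start of that field rather than from an absolute column. The function name parse_mixed_line and its parameters are hypothetical and are not part of eml-physical; the widths 3 and 4 are the ones used for this example data.

```python
# Hypothetical helper, not part of EML: parse one record whose fields are
# comma-delimited, where one field is itself a run of fixed-width pieces.
def parse_mixed_line(line, delimiter=",", composite_index=1, widths=(3, 4)):
    fields = line.rstrip("\n").split(delimiter)
    # A trailing delimiter leaves an empty last field; drop it.
    if fields and fields[-1] == "":
        fields.pop()
    composite = fields[composite_index]
    pieces = []
    offset = 0
    # Offsets are relative to the composite field, so no absolute
    # start column is needed.
    for width in widths:
        pieces.append(composite[offset:offset + width])
        offset += width
    return fields[:composite_index] + pieces + fields[composite_index + 1:]


sample = ("May,100aaaa,1.2,", "April,200aaaa,3.4,", "June,300bbbb,4.6,")
rows = [parse_mixed_line(record) for record in sample]
```

Because the cut positions are relative, the differing lengths of "May" and "April" do not matter.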
I recommend the following changes to eml-physical to support mixed
data formats.
Both textDelimited and textFixed would be placed as repeatable
choices under a new element called textFormat. numHeaderLines and
numPhysicalLines would be made optional subelements of textFormat
because they are global to the data object being described.
The only actual content change would be moving numPhysicalLines
from being a subelement of textFixed (currently asciiFixed) to
being a subelement of textFormat. The instance document chunk that
would describe the above data object would look like this:
<physical>
<dataObject>
.
.
.
</dataObject>
<dataFormat>
<textFormat>
<textDelimited>
<fieldDelimiter>,</fieldDelimiter>
</textDelimited>
<textFixed>
<fieldBounds>
<fieldStartColumn>-1</fieldStartColumn>
<fieldWidth>3</fieldWidth>
</fieldBounds>
<fieldBounds>
<fieldStartColumn>-1</fieldStartColumn>
<fieldWidth>4</fieldWidth>
</fieldBounds>
</textFixed>
<textFixed>
</textFixed>
<textDelimited>
<fieldDelimiter>,</fieldDelimiter>
</textDelimited>
</textFormat>
</dataFormat>
</physical>
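As a rough illustration of how a consumer might read such a description, the Python sketch below walks the children of textFormat in document order and records the delimiter or the field widths for each. The fragment in it is a simplified, well-formed version of the proposal written for this example; it is not taken from a released eml-physical schema, and read_text_format is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Simplified fragment modeled on the proposed textFormat element
# (an assumption for illustration, not a released schema).
FRAGMENT = """
<textFormat>
  <textDelimited>
    <fieldDelimiter>,</fieldDelimiter>
  </textDelimited>
  <textFixed>
    <fieldBounds>
      <fieldStartColumn>-1</fieldStartColumn>
      <fieldWidth>3</fieldWidth>
    </fieldBounds>
    <fieldBounds>
      <fieldStartColumn>-1</fieldStartColumn>
      <fieldWidth>4</fieldWidth>
    </fieldBounds>
  </textFixed>
</textFormat>
"""


def read_text_format(xml_text):
    """Return textFormat children as (kind, detail) pairs in document order."""
    root = ET.fromstring(xml_text)
    layout = []
    for child in root:
        if child.tag == "textDelimited":
            layout.append(("delimited", child.findtext("fieldDelimiter")))
        elif child.tag == "textFixed":
            widths = [int(bounds.findtext("fieldWidth"))
                      for bounds in child.findall("fieldBounds")]
            layout.append(("fixed", widths))
    return layout


layout = read_text_format(FRAGMENT)
```

The order of the pairs matters here, since the proposal relies on textDelimited and textFixed being repeatable choices that appear in the same sequence as the fields they describe.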
Note that fieldStartColumn is set to -1. Because a start column
does not make sense in a mixed-format context, we could either set
this value to -1 or make the element optional. Currently
fieldStartColumn is an unsignedInt, so we would have to make it
an integer or long to support negative numbers.
See file eml-physical.xsd sent to eml-dev.
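The type problem with -1 can be shown with a small Python check against the value range of the XML Schema unsignedInt datatype (0 to 4294967295). This is only a sketch of the range test, not a full lexical validator, and fits_unsigned_int is a hypothetical name.

```python
# Value range of the XML Schema unsignedInt datatype.
UNSIGNED_INT_MIN = 0
UNSIGNED_INT_MAX = 4294967295  # 2**32 - 1


def fits_unsigned_int(text):
    """Rough range check only; does not enforce the full XSD lexical rules."""
    try:
        value = int(text)
    except ValueError:
        return False
    return UNSIGNED_INT_MIN <= value <= UNSIGNED_INT_MAX


sentinel_ok = fits_unsigned_int("-1")
```

With fieldStartColumn typed as unsignedInt, a sentinel of -1 falls outside the range, which is why the element would need to become an integer or long, or be made optional instead.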
The above solution for mixed data formats also solves the problem
of <asciiFixed><fieldBounds> not being repeatable. If folks want
eml-physical to stay as is, this element needs to be made repeatable,
or you will be limited to only one attribute per dataObject. Clearly,
this was an oversight.