Bug #2219
closed
EML documents from Andrews LTER are modified by Metacat during insertion and converted into invalid EML
0%
Description
Below are the emails exchanged on metacat-dev describing this bug.
The bug is most probably in the SAX parser library that we are using in
Metacat. The SAX parser inserts default attributes such as scope and
constantToSI, and while adding these attributes it turns the document into
invalid EML by dropping the closing ">" of the start tag.
Matt Jones wrote:
Sid,
This sounds like a serious metacat bug. Can you enter it as such so that we
are sure to track it down? They should not have to put in arbitrary CRs to
make it work :) Thanks.
Matt
Saurabh Garg wrote:
Duane:
I looked around more. The behaviour is highly unusual. But the fix is very
simple. If a carriage return is inserted between the line which has year and
others, it works fine. So change this:
<additionalMetadata><stmml:unitList><stmml:unit name="decimal degrees
latitude" unitType="latitudeLongitude" id="decimal degrees latitude"
parentSI="unknown" multiplierToSI="1"></stmml:unit><stmml:unit name="minutes of
a degree" unitType="latitudeLongitude" id="minutes of a degree"
parentSI="unknown" multiplierToSI="1"></stmml:unit><stmml:unit name="day of
month" unitType="datetime" id="day of month" parentSI="YYYY-MM-DDThh:mm:ss"
multiplierToSI="1"></stmml:unit><stmml:unit name="month of year"
unitType="datetime" id="month of year" parentSI="YYYY-MM-DDThh:mm:ss"
multiplierToSI="1"></stmml:unit><stmml:unit name="year (yyyy)"
unitType="datetime" id="year (yyyy)" parentSI="YYYY-MM-DDThh:mm:ss"
multiplierToSI="1"></stmml:unit></stmml:unitList></additionalMetadata>
to:
<additionalMetadata><stmml:unitList><stmml:unit name="decimal degrees
latitude" unitType="latitudeLongitude" id="decimal degrees latitude"
parentSI="unknown" multiplierToSI="1"></stmml:unit><stmml:unit name="minutes of
a degree" unitType="latitudeLongitude" id="minutes of a degree"
parentSI="unknown" multiplierToSI="1"></stmml:unit><stmml:unit name="day of
month" unitType="datetime" id="day of month" parentSI="YYYY-MM-DDThh:mm:ss"
multiplierToSI="1"></stmml:unit><stmml:unit name="month of year"
unitType="datetime" id="month of year" parentSI="YYYY-MM-DDThh:mm:ss"
multiplierToSI="1"></stmml:unit>
<stmml:unit name="year (yyyy)" unitType="datetime" id="year (yyyy)"
parentSI="YYYY-MM-DDThh:mm:ss" multiplierToSI="1"></stmml:unit>
</stmml:unitList></additionalMetadata>
Why this is happening - I have no idea. Maybe it has something to do
with '(' and ')' in the year line. But that's just a random guess.
-Sid
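Whitespace between elements should not change what a parser sees, which is why the carriage-return workaround above points at a bug rather than at the EML. A minimal sketch of that check, using Python's standard ElementTree rather than Metacat's own parser; the namespace URI and the helper names are hypothetical and exist only for this illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, used only so the stmml prefix resolves.
NS = "http://example.org/stmml"

# One unit element in the style of the fragments quoted above.
UNIT = ('<stmml:unit name="{0}" unitType="datetime" id="{0}" '
        'parentSI="YYYY-MM-DDThh:mm:ss" multiplierToSI="1"></stmml:unit>')


def unit_list(separator):
    """Build a unitList whose last two units are joined by `separator`."""
    units = UNIT.format("month of year") + separator + UNIT.format("year (yyyy)")
    return ('<additionalMetadata xmlns:stmml="{}"><stmml:unitList>{}'
            '</stmml:unitList></additionalMetadata>').format(NS, units)


def unit_ids(xml_text):
    """Return the id attribute of every stmml:unit element, in document order."""
    root = ET.fromstring(xml_text)
    return [u.get("id") for u in root.iter("{%s}unit" % NS)]


original = unit_ids(unit_list(""))      # units on one line, as submitted
workaround = unit_ids(unit_list("\n"))  # carriage return inserted before the year unit
print(original == workaround)
```

Both forms yield the same list of unit ids, so a conforming parser treats them as equivalent.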
Duane Costa wrote:
On 9/29/05, Don Henshaw wrote:
Duane,
Any of these are fine. The EML is available on our public webpage.
From looking at the example, I'm wondering if the problem resides in
having units for date type attributes. We manage date display types as
units in our database but it looks as if these are getting
put into custom units, which is probably not right. Works great for our
webpage but not for EML. I'm sure we can easily fix once
we get some feedback on the problem.
don
-----
Here are two URLs to documents that represent the two problems described
below:
Docid: knb-lter-and.4041.4
http://wwwdata.forestry.oregonstate.edu/mdaccess/hjaemlharvester.aspx?dbcode=TD023
Docid: knb-lter-and.3114.4
http://wwwdata.forestry.oregonstate.edu/mdaccess/hjaemlharvester.aspx?dbcode=SP016
It seems to me there are really two aspects of these problems. First, there
may be a problem in the EML itself that needs to be
corrected. Second, the fact that Metacat inserted a number of documents
that it is now unable to read could indicate a bug in
Metacat.
Thanks,
Duane
-----Original Message-----
From: metacat-dev-bounces@ecoinformatics.org [mailto:metacat-dev-bounces@ecoinformatics.org] On Behalf Of Duane Costa
Sent: Thursday, September 29, 2005 3:56 PM
To: metacat-dev@ecoinformatics.org
Cc: isangil@lternet.edu; 'Henshaw, Don'
Subject: [metacat-dev] Problems reading harvested documents
We are having two different problems reading documents that were
successfully harvested from an LTER site. The first problem applies to a large
subset of the documents. The second problem has been found in only one of the
documents. I'll describe both problems
below:
(1) LTER has a number of documents (119) that were successfully harvested
from the Andrews LTER site. This can be confirmed by doing a simple search on
string 'knb-lter-and' at http://prairie.lternet.edu/query .
However, roughly two-thirds (we have yet to determine the exact number) of
the documents cannot be read by Metacat. For example:
http://prairie.lternet.edu:8080/knb/metacat?action=read&qformat=xml&docid=knb-lter-and.4041.4
The following error appears in my browser:
XML Parsing Error: not well-formed
Location: http://prairie.lternet.edu:8080/knb/metacat?action=read&qformat=xml&docid=knb-lter-and.4041.4
Line Number 290, Column 311491:
constantToSI="0.0"></stmml:unit><stmml:unit name="month of year"
unitType="datetime" id="month of year" parentSI="YYYY-MM-DDThh:mm:ss"
multiplierToSI="1" constantToSI="0.0"></stmml:unit><stmml:unit name="year (yyyy)"
unitType="datetime" id="year (yyyy)" parentSI="YYYY-MM-DDThh:mm:ss"
multiplierToSI="1"
constantToSI="0.0"</stmml:unit></stmml:unitList></additionalMetadata></eml:eml>
When I try to access the document in the Metacat search results, the
following error is issued:
http://prairie.lternet.edu:8080/knb/style/common/eml-2.0.0/eml.xsl
Error transforming document in DBTransform.transformXMLDocument:
Element type "stmml:unit" must be followed by either attribute
specifications, ">" or "/>".
I tried running the buildindex action on the document. It reported
success, but this didn't solve the problem.
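The missing ">" after constantToSI="0.0" in the fragment above can be reproduced outside Metacat. The following is a minimal sketch using Python's standard expat bindings, not the parser Metacat uses; the namespace URI, the shortened fragment, and the function name are assumptions made for illustration:

```python
import xml.parsers.expat

# Hypothetical namespace URI for the stmml prefix.
NS = "http://example.org/stmml"


def check_well_formed(xml_text):
    """Return None if xml_text is well-formed, else (line, column, message)."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(xml_text, True)
    except xml.parsers.expat.ExpatError as err:
        return (err.lineno, err.offset, xml.parsers.expat.ErrorString(err.code))
    return None


# Shortened stand-in for the unit list as the site submitted it.
good = ('<stmml:unitList xmlns:stmml="%s"><stmml:unit id="year (yyyy)" '
        'multiplierToSI="1" constantToSI="0.0"></stmml:unit></stmml:unitList>' % NS)

# Same fragment with the start tag's closing ">" dropped, as in the stored copy.
bad = good.replace('constantToSI="0.0">', 'constantToSI="0.0"')

print(check_well_formed(good))
print(check_well_formed(bad))
```

Running a check like this on the stored copy right after insertion would flag a document that was well-formed on submission but not after Metacat rewrote it.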
(2) There is a single document harvested from Andrews which exhibits a
completely different problem. The document id is knb-lter-and.3114. The
document appears in the search results with an empty title, contacts,
organization, and keywords. When clicking on the ">>" link in the search
results, I get:
<error>
Error reading document: knb-lter-and.3114 </error>
We tried re-harvesting the document with an incremented revision number,
but we still get the same error.
Usually, the fix for this is to run the buildindex action:
http://knb.lternet.edu:8088/knb/metacat?action=buildindex&docid=knb-lter-and.3114.4
However, when I do so, I get the following:
java.lang.StringIndexOutOfBoundsException: String index out of range: 4000
java.lang.String.charAt(String.java:444)
edu.ucsb.nceas.metacat.MetaCatUtil.normalize(MetaCatUtil.java:271)
edu.ucsb.nceas.metacat.DocumentImpl.getNodeRecordList(DocumentImpl.java:1685)
edu.ucsb.nceas.metacat.DocumentImpl.buildIndex(DocumentImpl.java:1168)
edu.ucsb.nceas.metacat.MetaCatServlet.buildDocumentIndex(MetaCatServlet.java:2263)
edu.ucsb.nceas.metacat.MetaCatServlet.handleBuildIndexAction(MetaCatServlet.java:2240)
edu.ucsb.nceas.metacat.MetaCatServlet.handleGetOrPost(MetaCatServlet.java:514)
edu.ucsb.nceas.metacat.MetaCatServlet.doGet(MetaCatServlet.java:239)
javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
I'll try to find out from Don whether I have his permission to email the
direct URLs to a couple of these documents to metacat-dev so that the source
documents can be inspected directly, since they can't be read from Metacat.
I took a quick look in Bugzilla, but I didn't see anything recorded that
looked like it was related to either of these problems.
Thanks,
Duane
_______________________________________
Metacat-dev mailing list
Metacat-dev@ecoinformatics.org
http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/metacat-dev
Updated by Michael Daigle over 15 years ago
Metacat now stores and retrieves the original metadata from disk, ensuring that the doc is not modified.
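The idea behind this fix, keeping the submitted bytes untouched and serving them back as-is, can be sketched as follows. This is not Metacat's implementation; the function names, the file layout, and the use of a SHA-256 digest to confirm the round trip are all assumptions made for illustration:

```python
import hashlib
import tempfile
from pathlib import Path


def store_original(directory, docid, data):
    """Write the submitted bytes unchanged and return (path, sha256 digest)."""
    path = Path(directory) / (docid + ".xml")
    path.write_bytes(data)
    return path, hashlib.sha256(data).hexdigest()


def read_original(path, expected_digest):
    """Read the stored bytes back and fail loudly if they differ from what was stored."""
    data = path.read_bytes()
    if hashlib.sha256(data).hexdigest() != expected_digest:
        raise ValueError("stored document was modified: %s" % path)
    return data


with tempfile.TemporaryDirectory() as tmp:
    # Shortened stand-in for a harvested EML document.
    submitted = b'<stmml:unit id="year (yyyy)" multiplierToSI="1"></stmml:unit>'
    path, digest = store_original(tmp, "knb-lter-and.4041.4", submitted)
    retrieved = read_original(path, digest)
    print(retrieved == submitted)
```

Because the bytes never pass through a parse-and-serialize step, default attributes cannot be injected and no start tag can lose its ">".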