Bug #2219

EML documents from Andrews LTER are modified by Metacat during insertion and converted into invalid EML

Added by Saurabh Garg about 14 years ago. Updated over 10 years ago.

Status: Resolved
Priority: Normal
Category: metacat
Target version:
Start date: 09/30/2005
Due date:
% Done: 0%
Estimated time:
Bugzilla-Id: 2219

Description

Below are the emails exchanged on metacat-dev describing this bug.

The bug is most probably in the SAX parser library that Metacat uses. The SAX parser inserts default attributes such as scope and constantToSI, and while these attributes are being added, the document becomes invalid EML because the '>' that closes the start tag is removed.
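The mechanism described above can be illustrated with the JDK's built-in SAX parser. This is a minimal sketch under stated assumptions, not Metacat's code: it uses a DTD internal subset to declare a default for `constantToSI` (EML actually gets its defaults from XML Schema), and the class `DefaultedAttributeDemo` is hypothetical. It shows that the parser passes defaulted attributes to `startElement` alongside the ones present in the source, that `Attributes2.isSpecified` tells the two apart, and that any code rebuilding the start tag must still write the closing '>' after the last attribute, defaulted or not.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.ext.Attributes2;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of Metacat.
public class DefaultedAttributeDemo {

    /** Result of one parse: the rebuilt XML and the names of defaulted attributes. */
    public static final class Result {
        public final String xml;
        public final List<String> defaulted;

        Result(String xml, List<String> defaulted) {
            this.xml = xml;
            this.defaulted = defaulted;
        }
    }

    public static Result reserialize(String input) {
        StringBuilder out = new StringBuilder();
        List<String> defaulted = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                out.append('<').append(qName);
                for (int i = 0; i < attrs.getLength(); i++) {
                    out.append(' ').append(attrs.getQName(i))
                       .append("=\"").append(escape(attrs.getValue(i))).append('"');
                    // Attributes2 reports whether a value came from the document or from a declared default.
                    if (attrs instanceof Attributes2 && !((Attributes2) attrs).isSpecified(i)) {
                        defaulted.add(attrs.getQName(i));
                    }
                }
                // Close the start tag unconditionally, including after appended default attributes.
                out.append('>');
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                out.append("</").append(qName).append('>');
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                out.append(escape(new String(ch, start, length)));
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(input)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed: " + e.getMessage(), e);
        }
        return new Result(out.toString(), defaulted);
    }

    /** Escapes characters that are unsafe in XML text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        String doc = "<!DOCTYPE unit [<!ATTLIST unit constantToSI CDATA \"0.0\">]>"
                   + "<unit name=\"year (yyyy)\" multiplierToSI=\"1\"></unit>";
        Result r = reserialize(doc);
        System.out.println(r.xml);
        System.out.println("defaulted: " + r.defaulted);
    }
}
```

The input names only `name` and `multiplierToSI`, yet the rebuilt tag also carries `constantToSI="0.0"`, mirroring the extra attribute that appears in the document Metacat returned.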

Matt Jones wrote:

Sid,

This sounds like a serious Metacat bug. Can you enter it as such so that we are sure to track it down? They should not have to put in arbitrary CRs to make it work :) Thanks.

Matt

Saurabh Garg wrote:

Duane:

I looked around more. The behaviour is highly unusual, but the fix is very simple. If a carriage return is inserted between the year unit and the units before it, it works fine. So change this:

<additionalMetadata><stmml:unitList><stmml:unit name="decimal degrees
latitude" unitType="latitudeLongitude" id="decimal degrees latitude"
parentSI="unknown" multiplierToSI="1"></stmml:unit><stmml:unit name="minutes of
a degree" unitType="latitudeLongitude" id="minutes of a degree"
parentSI="unknown" multiplierToSI="1"></stmml:unit><stmml:unit name="day of
month" unitType="datetime" id="day of month" parentSI="YYYY-MM-DDThh:mm:ss"
multiplierToSI="1"></stmml:unit><stmml:unit name="month of year"
unitType="datetime" id="month of year" parentSI="YYYY-MM-DDThh:mm:ss"
multiplierToSI="1"></stmml:unit><stmml:unit name="year (yyyy)"
unitType="datetime" id="year (yyyy)" parentSI="YYYY-MM-DDThh:mm:ss"
multiplierToSI="1"></stmml:unit></stmml:unitList></additionalMetadata>

to:

<additionalMetadata><stmml:unitList><stmml:unit name="decimal degrees
latitude" unitType="latitudeLongitude" id="decimal degrees latitude"
parentSI="unknown" multiplierToSI="1"></stmml:unit><stmml:unit name="minutes of
a degree" unitType="latitudeLongitude" id="minutes of a degree"
parentSI="unknown" multiplierToSI="1"></stmml:unit><stmml:unit name="day of
month" unitType="datetime" id="day of month" parentSI="YYYY-MM-DDThh:mm:ss"
multiplierToSI="1"></stmml:unit><stmml:unit name="month of year"
unitType="datetime" id="month of year" parentSI="YYYY-MM-DDThh:mm:ss"
multiplierToSI="1"></stmml:unit>
<stmml:unit name="year (yyyy)" unitType="datetime" id="year (yyyy)"
parentSI="YYYY-MM-DDThh:mm:ss" multiplierToSI="1"></stmml:unit>
</stmml:unitList></additionalMetadata>

I have no idea why this is happening. Maybe it has something to do with the '(' and ')' in the year line, but that's just a random guess.
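The parenthesis guess can be checked independently of Metacat: '(' and ')' are ordinary characters inside an XML attribute value, so a conforming parser should accept the submitted fragment and reject only the corrupted output, where the `constantToSI="0.0"</stmml:unit>` sequence lacks the '>'. The helper below is a hypothetical sketch (`WellFormedCheck.firstError` is an illustrative name, not a Metacat method). It uses the JDK SAX parser, which is not namespace-aware by default, so the undeclared `stmml:` prefix is accepted; it returns `null` for well-formed input or a `line:column: message` string for the first fatal error. The two fragments are shortened from the ones quoted in this report.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper; not part of Metacat.
public class WellFormedCheck {

    /** Returns null if xml is well-formed, otherwise "line:column: message" for the first fatal error. */
    public static String firstError(String xml) {
        try {
            // DefaultHandler.fatalError rethrows the exception, so parsing stops at the first error.
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return e.getLineNumber() + ":" + e.getColumnNumber() + ": " + e.getMessage();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Parentheses inside an attribute value, as in the submitted EML.
        String submitted = "<stmml:unitList><stmml:unit name=\"year (yyyy)\" multiplierToSI=\"1\">"
                + "</stmml:unit></stmml:unitList>";
        // The same element as Metacat returned it: the '>' after the last attribute is gone.
        String returned = "<stmml:unitList><stmml:unit name=\"year (yyyy)\" multiplierToSI=\"1\""
                + " constantToSI=\"0.0\"</stmml:unit></stmml:unitList>";
        System.out.println("submitted: " + firstError(submitted));
        System.out.println("returned:  " + firstError(returned));
    }
}
```

If the first call returns `null`, the parentheses are not the problem; the second call should fail at the position where the start tag was left unclosed.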

-Sid

Duane Costa wrote:

On 9/29/05, Don Henshaw wrote:

Duane,

Any of these are fine. The EML is available on our public webpage.

From looking at the example, I'm wondering if the problem resides in having units for date type attributes. We manage date display types as units in our database, but it looks as if these are getting put into custom units, which is probably not right. It works great for our webpage but not for EML. I'm sure we can fix it easily once we get some feedback on the problem.

don
-----

Here are two URLs to documents that represent the two problems described below:

Docid: knb-lter-and.4041.4
http://wwwdata.forestry.oregonstate.edu/mdaccess/hjaemlharvester.aspx?dbcode=TD023

Docid: knb-lter-and.3114.4
http://wwwdata.forestry.oregonstate.edu/mdaccess/hjaemlharvester.aspx?dbcode=SP016

It seems to me there are really two aspects of these problems. First, there may be a problem in the EML itself that needs to be corrected. Second, the fact that Metacat inserted a number of documents that it is now unable to read could indicate a bug in Metacat.

Thanks,
Duane

-----Original Message-----
From: [mailto:metacat-dev-] On Behalf Of Duane Costa

Sent: Thursday, September 29, 2005 3:56 PM
To:
Cc: ; 'Henshaw, Don'
Subject: [metacat-dev] Problems reading harvested documents

We are having two different problems reading documents that were successfully harvested from an LTER site. The first problem applies to a large subset of the documents. The second problem has been found in only one of the documents. I'll describe both problems below:

(1) LTER has a number of documents (119) that were successfully harvested from the Andrews LTER site. This can be confirmed by doing a simple search for the string 'knb-lter-and' at http://prairie.lternet.edu/query .

However, roughly two-thirds (we have yet to determine the exact number) of the documents cannot be read by Metacat. For example:

http://prairie.lternet.edu:8080/knb/metacat?action=read&qformat=xml&docid=knb-lter-and.4041.4

The following error appears in my browser:

XML Parsing Error: not well-formed
Location: http://prairie.lternet.edu:8080/knb/metacat?action=read&qformat=xml&docid=knb-lter-and.4041.4
Line Number 290, Column 311491:

constantToSI="0.0"></stmml:unit><stmml:unit name="month of year"
unitType="datetime" id="month of year" parentSI="YYYY-MM-DDThh:mm:ss"
multiplierToSI="1" constantToSI="0.0"></stmml:unit><stmml:unit name="year
(yyyy)" unitType="datetime" id="year (yyyy)" parentSI="YYYY-MM-DDThh:mm:ss"
multiplierToSI="1" constantToSI="0.0"</stmml:unit></stmml:unitList></additionalMetadata></eml:eml>

When I try to access the document in the Metacat search results, the following error is issued:

http://prairie.lternet.edu:8080/knb/style/common/eml-2.0.0/eml.xsl
Error transforming document in DBTransform.transformXMLDocument: Element type "stmml:unit" must be followed by either attribute specifications, ">" or "/>".

I tried running the buildindex action on the document. It reported success, but this didn't solve the problem.

(2) There is a single document harvested from Andrews which exhibits a completely different problem. The document id is knb-lter-and.3114. The document appears in the search results with an empty title, contacts, organization, and keywords. When clicking on the ">>" link in the search results, I get:

<error>
Error reading document: knb-lter-and.3114 </error>

We tried re-harvesting the document with an incremented revision number, but we still get the same error.

Usually, the fix for this is to run the buildindex action:

http://knb.lternet.edu:8088/knb/metacat?action=buildindex&docid=knb-lter-and.3114.4
However, when I do so, I get the following:
java.lang.StringIndexOutOfBoundsException: String index out of range: 4000
java.lang.String.charAt(String.java:444)
edu.ucsb.nceas.metacat.MetaCatUtil.normalize(MetaCatUtil.java:271)
edu.ucsb.nceas.metacat.DocumentImpl.getNodeRecordList(DocumentImpl.java:1685)
edu.ucsb.nceas.metacat.DocumentImpl.buildIndex(DocumentImpl.java:1168)
edu.ucsb.nceas.metacat.MetaCatServlet.buildDocumentIndex(MetaCatServlet.java:2263)
edu.ucsb.nceas.metacat.MetaCatServlet.handleBuildIndexAction(MetaCatServlet.java:2240)
edu.ucsb.nceas.metacat.MetaCatServlet.handleGetOrPost(MetaCatServlet.java:514)
edu.ucsb.nceas.metacat.MetaCatServlet.doGet(MetaCatServlet.java:239)
javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
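The trace shows only that `String.charAt` was called with index 4000 on a shorter string inside `MetaCatUtil.normalize`; the source of that method is not part of this report. The number suggests, as an assumption only, that long text is processed in fixed 4000-character pieces. The hypothetical `Chunker.chunks` helper below sketches the bounds-safe form of such a loop: each piece ends at `Math.min(start + size, s.length())`, so the final, shorter piece never indexes past the end of the string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating bounds-safe fixed-size chunking; not Metacat's code.
public class Chunker {

    /** Splits s into pieces of at most size characters; the last piece may be shorter. */
    public static List<String> chunks(String s, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<String> pieces = new ArrayList<>();
        for (int start = 0; start < s.length(); start += size) {
            // Clamp the end to the string length instead of assuming a full-size piece.
            int end = Math.min(start + size, s.length());
            pieces.add(s.substring(start, end));
        }
        return pieces;
    }

    public static void main(String[] args) {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < 9001; i++) {
            text.append((char) ('a' + i % 26));
        }
        List<String> pieces = chunks(text.toString(), 4000);
        // Three pieces of 4000, 4000 and 1001 characters; rejoining them restores the input.
        System.out.println(pieces.size() + " pieces, last has " + pieces.get(pieces.size() - 1).length());
        System.out.println("round trip ok: " + String.join("", pieces).equals(text.toString()));
    }
}
```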

I'll try to find out from Don whether I have his permission to email the direct URLs to a couple of these documents to metacat-dev so that the source documents can be inspected directly, since they can't be read from Metacat.

I took a quick look in Bugzilla, but I didn't see anything recorded that looked like it was related to either of these problems.

Thanks,
Duane

_______________________________________
Metacat-dev mailing list
http://mercury.nceas.ucsb.edu/ecoinformatics/mailman/listinfo/metacat-dev

History

#1 Updated by Michael Daigle over 10 years ago

Metacat now stores and retrieves the original metadata from disk, ensuring that the doc is not modified.
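The resolution amounts to returning the exact bytes that were submitted instead of a document rebuilt from parsed nodes, so defaulted attributes and serialization errors can no longer creep into what readers receive. The sketch below is a hypothetical illustration of that idea, not Metacat's implementation: `RawStore`, its one-file-per-docid layout, and the SHA-256 check are assumptions, and docids such as knb-lter-and.4041.4 are used directly as file names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical store that keeps submitted metadata byte-for-byte; not Metacat's code.
public class RawStore {
    private final Path dir;

    public RawStore(Path dir) {
        this.dir = dir;
    }

    /** Writes the submitted bytes unchanged and returns their SHA-256 digest. */
    public byte[] put(String docid, byte[] original) throws IOException {
        Files.write(dir.resolve(docid), original);
        return sha256(original);
    }

    /** Reads the stored bytes back exactly as written; no parsing or re-serialization. */
    public byte[] get(String docid) throws IOException {
        return Files.readAllBytes(dir.resolve(docid));
    }

    static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // Java platforms are required to support SHA-256, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        RawStore store = new RawStore(Files.createTempDirectory("metadata"));
        byte[] eml = "<stmml:unit name=\"year (yyyy)\" multiplierToSI=\"1\"></stmml:unit>"
                .getBytes(StandardCharsets.UTF_8);
        byte[] digest = store.put("knb-lter-and.4041.4", eml);
        byte[] back = store.get("knb-lter-and.4041.4");
        System.out.println("identical: " + Arrays.equals(digest, sha256(back)));
    }
}
```

Comparing digests of what was submitted and what is served gives a cheap check that a document was not altered between insertion and retrieval.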

#2 Updated by Redmine Admin over 6 years ago

Original Bugzilla ID was 2219
