Bug #5618

Odd characters cause HTML display of EML to fail

Added by gastil gastil about 7 years ago. Updated about 7 years ago.

Status: Resolved
Priority: Normal
Category: utilities
Target version:
Start date: 06/05/2012
Due date:
% Done: 0%
Estimated time:
Bugzilla-Id: 5618

Description

While examining the HTML skins for EML, we found that this test document caused the display to fail with an error (a blank page):

https://demo2.test.dataone.org/knb/metacat/knb-lter-knz.2.4/default

Once the odd characters are removed, it displays normally:
https://demo2.test.dataone.org/knb/metacat/knb-lter-knz.2.5/default

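These odd characters were found by deleting the ASCII bytes \000, \011 and \013 through \177 (everything except newline and a few rare control codes) and listing the unique leftovers:
tr -d '\000\011\013-\177' < knb-lter-knz.2.4_mgb.xml > foo
sort foo | uniq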

â
°
µ
They occur on lines 39, 48, 577, 680, and 682 of knb-lter-knz.2.4 in the attached EML file. (This is not likely to match the document with the same packageId in the LTER Metacat, as that one has probably been cleaned up.)
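To get line numbers like these directly, one option is a byte-level grep. This is a sketch, not the procedure used in this report: the function names `list_odd_lines` and `odd_line_numbers` are hypothetical, and it assumes a POSIX shell with `grep`, `cut` and `paste`. With `LC_ALL=C`, grep compares raw bytes, so the bracket expression built by `printf` matches any byte from 0x80 to 0xFF.

```shell
# list_odd_lines FILE  (hypothetical helper)
# Prints "LINENO:LINE" for every line of FILE containing at least one byte
# outside 7-bit ASCII (0x80-0xFF). LC_ALL=C makes grep match raw bytes
# instead of decoding them as characters in the current locale.
list_odd_lines() {
    LC_ALL=C grep -n "$(printf '[\200-\377]')" "$1"
}

# odd_line_numbers FILE  (hypothetical helper)
# Prints only the affected line numbers, comma-separated.
odd_line_numbers() {
    list_odd_lines "$1" | cut -d: -f1 | paste -s -d, -
}
```

Usage: `list_odd_lines knb-lter-knz.2.4_mgb.xml` prints each affected line prefixed by its number, and `odd_line_numbers` reduces that to a single comma-separated list.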

Note that showDataset can display that EML without removing the odd characters; they just appear as garbled characters. For example:
http://mcr-dev.lternet.edu/cgi-bin/showDataset.cgi?docid=knb-lter-knz.2.3
where mcr-dev currently points to lava, but that is likely to change.
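The garbled characters are consistent with UTF-8 bytes being shown as if they were Windows-1252 (or Latin-1), although this report does not confirm which encoding showDataset assumes. The hypothetical helper `show_as_cp1252` reproduces that effect with `iconv`: the two UTF-8 bytes of `°` come out as `Â°`, and the three bytes of `’` come out as `â€™`, which is also consistent with the stray `â` in the list of odd characters above. It assumes an `iconv` that knows the `WINDOWS-1252` encoding name, as GNU and macOS iconv do.

```shell
# show_as_cp1252 STRING  (hypothetical helper)
# Reinterprets the UTF-8 bytes of STRING as Windows-1252 and re-encodes the
# result as UTF-8, reproducing the usual "mojibake" seen when a UTF-8
# document is displayed with the wrong encoding.
show_as_cp1252() {
    printf '%s' "$1" | iconv -f WINDOWS-1252 -t UTF-8
}
```

For example, `show_as_cp1252 '°'` prints `Â°` in a UTF-8 terminal.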

Attachment: knb-lter-knz.2.4_mgb.xml (32.8 KB), added by gastil gastil, 06/05/2012 03:33 PM

History

#2 Updated by gastil gastil about 7 years ago

To determine whether this bug is important, I used the pathQuery results from 30 April 2012 covering all LTER EML abstracts. Of the 6826 EML docs, 240 contain some odd characters, counted as the non-empty lines left after deleting the ASCII bytes:

tr -d '\000\011\013-\177' < resultSet | LC_ALL=C grep -v '^$' | wc -l
240

So this affects roughly 3.5 percent (240 of 6826) of the LTER EML docs in the LTER Metacat.
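The pipeline above counts lines of `resultSet` that still contain something after the ASCII bytes are deleted, which equals the number of documents only if each abstract sits on a single line. If the abstracts were instead saved one file per document (a hypothetical layout, not how the pathQuery results are described here), a per-file count and the percentage could be computed as below; `count_odd_docs` and `percent` are illustrative names, not existing tools.

```shell
# count_odd_docs DIR  (hypothetical helper)
# Assumes one EML/abstract file per document, named *.xml, in DIR.
# grep -l lists each matching file once, so a document with many odd
# characters is still counted only once.
count_odd_docs() {
    LC_ALL=C grep -l "$(printf '[\200-\377]')" "$1"/*.xml | wc -l | tr -d ' '
}

# percent PART WHOLE  (hypothetical helper)
# Prints 100 * PART / WHOLE with one decimal place.
percent() {
    LC_ALL=C awk -v p="$1" -v w="$2" 'BEGIN { printf "%.1f\n", 100 * p / w }'
}
```

With the figures from this comment, `percent 240 6826` prints `3.5`.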

#3 Updated by gastil gastil about 7 years ago

To clarify this bug, see knb-lter-bug.4103.2 versus knb-lter-bug.4103.3 in demo2.

The only difference is the presence of non-ASCII (high-order) characters in the abstract.
In revision 3 they are commented out.

#4 Updated by gastil gastil about 7 years ago

Another example of this is the difference between
https://demo2.test.dataone.org/knb/metacat/knb-test-nrs.569.3/default
which does display
versus
https://demo2.test.dataone.org/knb/metacat/knb-test-nrs.569.2/default
which does not display but instead fails with a "white page".

The ONLY difference between revisions 2 and 3 is the packageId and
Bancroft's office
vs
Bancroft ’s office

in the abstract.

(Revision 1 had denyFirst.)
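The two spellings differ by a character that looks harmless: the ASCII apostrophe is the single byte 0x27, while the typographic right single quotation mark ’ (U+2019) is three bytes in UTF-8, e2 80 99. The hypothetical helper `hex_bytes` prints the bytes of a string with `od`, which makes such differences visible when comparing revisions.

```shell
# hex_bytes STRING  (hypothetical helper)
# Prints the bytes of STRING as hex pairs with no separators, e.g. "27" for
# an ASCII apostrophe. od -An drops the offset column; tr removes the
# spaces and line breaks od inserts.
hex_bytes() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}
```

For example, `hex_bytes "'"` prints `27`, and the same call on `’` prints `e28099`.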

#5 Updated by ben leinfelder about 7 years ago

These are unfortunate errors, most likely due to copying and pasting from an MS Word document into the metadata file.
I've added better error reporting during the transform, so there is now an indication that rendering was not possible when such a character is encountered.
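The report does not say exactly why the transform fails. One plausible cause, stated here as an assumption, is that text pasted from Word arrives as Windows-1252 bytes (for example 0x92 for ’) inside a file that declares UTF-8; the XML 1.0 specification treats byte sequences that are illegal in the declared encoding as a fatal error. The sketch below uses `iconv` to separate that case from legitimate, well-formed UTF-8 such as `°`. The function names are hypothetical, and it assumes an `iconv` that exits non-zero on invalid input, as GNU and BSD iconv do.

```shell
# is_valid_utf8 FILE  (hypothetical helper)
# Exit status 0 if FILE decodes cleanly as UTF-8; iconv stops with a
# non-zero status at the first invalid byte sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# invalid_utf8_lines FILE  (hypothetical helper)
# Prints the number of every line that is not valid UTF-8 on its own.
# Lines with correctly encoded characters such as "°" are not reported.
invalid_utf8_lines() {
    n=0
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
        printf '%s\n' "$line" | iconv -f UTF-8 -t UTF-8 > /dev/null 2>&1 \
            || printf '%s\n' "$n"
    done < "$1"
}
```

Under that assumption, `invalid_utf8_lines` would report a subset of the lines found by a plain non-ASCII search: only those a strict UTF-8 parser would reject.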

#6 Updated by Redmine Admin over 6 years ago

Original Bugzilla ID was 5618
