Bug #5618
closed
odd characters cause html display of eml to fail
0%
Description
While examining the HTML skins for EML, we found this example of a test doc caused the display to have an error (blank page):
https://demo2.test.dataone.org/knb/metacat/knb-lter-knz.2.4/default
and once the odd characters are removed, it displays normally:
https://demo2.test.dataone.org/knb/metacat/knb-lter-knz.2.5/default
These odd characters were found by
cat knb-lter-knz.2.4_mgb.xml | tr -d '\000-\011\013-\177' > foo
sort foo | uniq
â
°
µ
They are found on lines 39, 48, 577, 680, and 682 of knb-lter-knz.2.4 in the attached eml file. (This is not likely to match the doc with the same package ID in the LTER Metacat, as that one has probably been cleaned up.)
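The line numbers can also be recovered by script rather than by eye. The following is a minimal Python sketch, assuming the file is read as raw bytes so that no encoding has to be guessed. The function name `non_ascii_lines` and the sample data are hypothetical and not part of Metacat.

```python
def non_ascii_lines(data: bytes):
    """Return (line_number, offending_bytes) for each line that has bytes >= 0x80."""
    hits = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        # Keep only the bytes outside the 7-bit ASCII range.
        odd = bytes(b for b in line if b >= 0x80)
        if odd:
            hits.append((lineno, odd))
    return hits


# Hypothetical three-line sample; the second line holds a Latin-1 degree sign (0xB0).
sample = b"<abstract>\nmean temp 12\xb0C\n</abstract>\n"
for lineno, odd in non_ascii_lines(sample):
    print(lineno, odd)
```

To run it against a real document, pass the result of `open(path, "rb").read()` for the file in question.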
Note that showDataset can display that EML without removing the odd characters; they just appear as weird characters.
For example:
http://mcr-dev.lternet.edu/cgi-bin/showDataset.cgi?docid=knb-lter-knz.2.3
where mcr-dev currently points to lava but that is likely to change.
Updated by gastil gastil over 12 years ago
To determine whether this bug is important, I used the pathQuery results from 30 April 2012 of all LTER eml abstracts. Of the 6826 eml docs, 240 of them contain some odd characters.
cat resultSet | tr -d '\000-\011\013-\177' | grep -v "^$" odd_chars | wc -l
240
So this affects about 3.5 percent (240 of 6826) of the LTER eml docs in the LTER Metacat.
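The same count and percentage can be computed directly. This is a rough Python sketch, assuming each abstract is available as a separate byte string. The helper `affected_share` and the four-document sample are hypothetical; only the figures 240 and 6826 come from the query results above.

```python
def affected_share(docs):
    """Return (count, percent) of byte strings that contain any byte >= 0x80."""
    affected = sum(1 for doc in docs if any(b >= 0x80 for b in doc))
    return affected, 100.0 * affected / len(docs)


# Hypothetical four-document sample: only the second contains a non-ASCII byte.
docs = [b"plain abstract", b"5 \xb5m filter", b"another", b"last one"]
print(affected_share(docs))          # (1, 25.0)

# The figures reported above: 240 of 6826 documents.
print(round(100 * 240 / 6826, 2))    # 3.52
```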
Updated by gastil gastil over 12 years ago
To clarify this bug, see knb-lter-bug.4103.2 versus knb-lter-bug.4103.3 in demo2.
The only difference is the presence of non-ASCII (high-bit) characters in the abstract.
In revision 3 they are commented-out.
Updated by gastil gastil over 12 years ago
Another example of this is the difference between
https://demo2.test.dataone.org/knb/metacat/knb-test-nrs.569.3/default
which does display
versus
https://demo2.test.dataone.org/knb/metacat/knb-test-nrs.569.2/default
which does not display, but instead fails with a "white-page".
The ONLY diff between revisions 2 and 3 is the packageId and
Bancroft's office
vs
Bancroft âs office
in the abstract.
(revision 1 had denyFirst.)
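One plausible explanation for the stray `â` is a typographic apostrophe (U+2019) stored as UTF-8 and then displayed as if it were a single-byte encoding. This is an assumption; the report does not confirm the actual bytes in the file. The Python sketch below only shows what that mismatch would look like.

```python
apostrophe = "\u2019"                  # right single quotation mark
raw = apostrophe.encode("utf-8")       # three bytes: e2 80 99

# Read back as Latin-1: 'â' followed by two invisible C1 control characters.
as_latin1 = raw.decode("latin-1")

# Read back as Windows-1252: 'â' followed by a euro sign and a trademark sign.
as_cp1252 = raw.decode("cp1252")

print(raw)
print(repr(as_latin1))
print(as_cp1252)
```

Under the Latin-1 reading, only the `â` is visible, which would match the appearance of "Bancroft âs office".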
Updated by ben leinfelder over 12 years ago
These are unfortunate errors, but are most likely due to copy-and-paste from an MS Word document into the metadata file.
I've added better error reporting during the transform so there's now an indication that rendering was not possible when the character is encountered.
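The actual change to Metacat is not shown in this report. As an illustration of the general idea only (report the failure instead of returning a blank page), here is a minimal Python sketch of a pre-render check. It assumes the document is expected to be UTF-8, and the function `check_utf8` and its message wording are hypothetical. Note that it catches only invalid byte sequences; whether that was the cause of the failures above is not stated.

```python
def check_utf8(data: bytes):
    """Return None if data is valid UTF-8, else a message locating the first bad byte."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # Line number = newlines before the failing offset, plus one.
        line = data.count(b"\n", 0, err.start) + 1
        bad = data[err.start]
        return f"cannot render: invalid UTF-8 byte 0x{bad:02x} on line {line}"
    return None


# Hypothetical sample: a bare Latin-1 degree sign (0xB0) is not valid UTF-8.
print(check_utf8(b"ok\n"))
print(check_utf8(b"a\nb\xb0c\n"))
```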