odd characters cause html display of eml to fail
While examining the HTML skins for EML, we found that this test doc causes the display to fail with a blank page:
and once the odd characters are removed, it displays normally:
These odd characters were found by
cat knb-lter-knz.2.4_mgb.xml | tr -d '\000-\011\013-\177' > foo
sort foo | uniq
They are found on lines 39, 48, 577, 680, and 682 of knb-lter-knz.2.4 in the attached eml file. (This is not likely to match the doc of the same package Id in the LTER Metacat, as that one has probably been cleaned up.)
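Line numbers like these can also be listed directly. A minimal sketch, assuming a grep that accepts -a (GNU and BSD grep do); the function name odd_char_lines is made up for illustration and is not part of Metacat:

```shell
# Hypothetical helper: print the numbers of the lines in file $1 that
# contain at least one byte outside 7-bit ASCII (octal 200-377).
odd_char_lines() {
    # LC_ALL=C makes grep match single bytes; -a treats the file as text.
    LC_ALL=C grep -an "$(printf '[\200-\377]')" "$1" | cut -d: -f1
}
```

Running `odd_char_lines knb-lter-knz.2.4_mgb.xml` should print one line number per affected line.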
Note that showDataset can display that eml w/o removing the odd characters. They just appear as weird characters.
where mcr-dev currently points to lava but that is likely to change.
#2 Updated by gastil gastil over 9 years ago
To determine whether this bug is important, I used the pathQuery results from 30 April 2012 of all LTER eml abstracts. Of the 6826 eml docs, 240 of them contain some odd characters.
cat resultSet | tr -d '\000-\011\013-\177' > odd_chars
grep -v "^$" odd_chars | wc -l
So this affects about 3.5 percent (240 of 6826) of the LTER eml docs in the LTER Metacat.
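If each eml doc were saved as its own file (an assumption; the resultSet used here is a single file), affected docs could be counted per file rather than per line. A rough sketch, with the hypothetical function name count_odd_docs, again assuming a grep that accepts -a:

```shell
# Hypothetical helper: count how many of the given files contain at
# least one byte outside 7-bit ASCII (octal 200-377).
count_odd_docs() {
    # -l prints each matching file name once; wc -l counts those names.
    LC_ALL=C grep -al "$(printf '[\200-\377]')" "$@" | wc -l | tr -d ' '
}
```

For example, `count_odd_docs docs/*.xml` would report the count for a hypothetical docs/ directory of exported eml files.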
#4 Updated by gastil gastil over 9 years ago
Another example of this is the difference between
which does display
which does not display, but instead fails with a "white-page".
The ONLY diff between revisions 2 and 3 is the packageId and
Bancroft âs office
in the abstract.
(revision 1 had denyFirst.)
#5 Updated by ben leinfelder over 9 years ago
These are unfortunate errors, but are most likely due to copy-and-paste from a [MS Word] document into the metadata file.
I've added better error reporting during the transform so there's now an indication that rendering was not possible when the character is encountered.