Absence of line feeds in EML causes pathQuery to not find some elements
The presence of line feeds seems to be needed for an EML doc to get loaded properly so that pathQuery can find attributeList or attribute. A single line feed at the end of the file is not enough.
We detected this on Metacat 1.9.5 at metacat.lternet and also reproduced it on Metacat 2.0 (lava.lternet).
knb-lter-kbs.10.19 has no line feeds at all in the document.
Revision 20 is the same as 19 except that stmml-1.1 is spelled right.
Revision 21 is the same as 20 except that it has one line feed at the end of the file.
(so revision 21 has one line)
Revisions 19 through 21, while each was the latest revision, did not have their attributeList found by pathQuery.
Revision 22, with 165 line feeds, DOES have its attributeList seen by pathQuery.
wc -l knb-lter-kbs.10.*
pathQuery result snippets from two separate queries (when two different revisions were the last revision):
#1 Updated by gastil gastil about 7 years ago
To diagnose this further:
I ran pathQuery for returnfield dataset/dataTable/attributeList/, that is, the whole xpath, not just the attributeList element name by itself.
The result is the same whether attributeList is given by itself or as part of a more complete xpath.
I ran pathQuery for elements under attributeList such as
Lots of KBS eml docs (the ones with no line feeds) DO return formatString under a path including attributeList, but zero returns for attributeList itself.
We are guessing maybe the bug relates to the word "attribute" because of its special meaning in xml. I would not think "attributeList" is a reserved keyword though.
#3 Updated by ben leinfelder about 7 years ago
While this is indeed odd, my hunch is that we get a placeholder leaf node for the line feed that separates <attributeList> from its first child <attribute> element. In the query results you're getting a bit of a quirky response, with this blank returned as returnfield data.
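To illustrate the hunch, here is a minimal sketch of the general XML behavior, using Python's standard xml.dom.minidom rather than Metacat's own loader. The two sample fragments and the child_summary helper are made up for this comment. It only shows that a line feed between <attributeList> and <attribute> becomes a text node of its own, while a document without line feeds has no such node. Whether Metacat's indexing treats that node the way I suspect is still an assumption.

```python
from xml.dom import minidom

# Hypothetical fragments: same elements, with and without line feeds.
COMPACT = (
    "<attributeList><attribute><attributeName>year</attributeName>"
    "</attribute></attributeList>"
)
PRETTY = """<attributeList>
  <attribute>
    <attributeName>year</attributeName>
  </attribute>
</attributeList>"""


def child_summary(xml_text):
    """Return (kind, value) for each direct child of the root element."""
    root = minidom.parseString(xml_text).documentElement
    summary = []
    for node in root.childNodes:
        if node.nodeType == node.TEXT_NODE:
            # Whitespace between tags is kept as a text node.
            summary.append(("text", node.data))
        elif node.nodeType == node.ELEMENT_NODE:
            summary.append(("element", node.tagName))
    return summary


if __name__ == "__main__":
    print("compact:", child_summary(COMPACT))
    print("pretty: ", child_summary(PRETTY))
```

In the compact fragment the only child of attributeList is the attribute element. In the pretty-printed one, whitespace-only text nodes sit before and after it, and those would be the candidates for the "blank" returnfield data.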
My suggestion - no matter what becomes of this bug - is to use a more definitive xpath to a true leaf node like "attributeList/attribute/attributeName", since this is a required element for any attributeList. This way you will be guaranteed positive/negative results no matter what the whitespace in the document looks like.
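A rough sketch of why the leaf-node path is the more robust choice, again in plain Python (xml.etree.ElementTree) and not against Metacat itself. The sample documents are simplified and un-namespaced, and text_at is a hypothetical helper. The point is that the text found at the container path depends on whitespace, while the text at the leaf path does not.

```python
import xml.etree.ElementTree as ET

# Simplified, un-namespaced stand-ins for an EML document (assumption).
COMPACT = (
    "<eml><dataset><dataTable><attributeList><attribute>"
    "<attributeName>year</attributeName></attribute>"
    "</attributeList></dataTable></dataset></eml>"
)
PRETTY = """<eml>
  <dataset>
    <dataTable>
      <attributeList>
        <attribute>
          <attributeName>year</attributeName>
        </attribute>
      </attributeList>
    </dataTable>
  </dataset>
</eml>"""

CONTAINER_PATH = "dataset/dataTable/attributeList"
LEAF_PATH = "dataset/dataTable/attributeList/attribute/attributeName"


def text_at(xml_text, path):
    """Return the direct text of every element matching path."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.findall(path)]


if __name__ == "__main__":
    for label, doc in (("compact", COMPACT), ("pretty", PRETTY)):
        print(label, "container:", text_at(doc, CONTAINER_PATH))
        print(label, "leaf:     ", text_at(doc, LEAF_PATH))
```

The container path yields no text for the compact document and a whitespace-only string for the pretty-printed one, whereas the leaf path yields the attribute name in both cases.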