Bug #2469
Status: closed
DocumentImpl.buildIndex() does not index XPaths with attributes correctly
0%
Description
A Metacat 1.6.x installation that indexes paths from the xml_nodes table into the xml_path_index table sets the xml_path_index.path column correctly, but sets xml_path_index.nodedata incorrectly for ATTRIBUTE nodes. As a result, searches return an incorrect subset of documents, because xml_path_index does not reflect the true values in xml_nodes.
For example, an EML 2.0.1 document with a packageId attribute:
<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.0.1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
packageId="ALEXXX_015MTBD003R00_19990906.50.1"
scope="system" system="knb"
xsi:schemaLocation="eml://ecoinformatics.org/eml-2.0.1 eml.xsd">
<dataset scope="document">
<shortName>PISCO moored temperature, ALE</shortName>
... etc ...
</dataset>
</eml:eml>
will produce an indexed record in xml_path_index with the following columns:
docid: ALEXXX_015MTBD003R00_19990906.50
path: /eml/@packageId
nodedata: PISCO moored temperature, ALE
rather than:
docid: ALEXXX_015MTBD003R00_19990906.50
path: /eml/@packageId
nodedata: ALEXXX_015MTBD003R00_19990906.50.1
It seems that the nodedata for the attribute is set to the node value of the next leaf node, in this case the /eml:eml/dataset/shortName field.
This also occurs for other indexed attributes in the document, such as /eml:eml/dataset/coverage/geographicCoverage/@id, whose value is 'ALE'.
Instead of 'ALE', the indexed nodedata for this @id is set to the value of /eml:eml/dataset/coverage/geographicCoverage/geographicDescription.
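A minimal Python sketch of how the mismatch described above could be checked outside Metacat. It is not Metacat code: it parses a trimmed, hypothetical stand-in for the reported document, derives the value each attribute path should carry, and flags index rows whose nodedata is not a real value of that attribute. The function names, the sample's geographicDescription text, and the assumption that index paths use unprefixed element names (as in /eml/@packageId) are illustrative only.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Trimmed, hypothetical stand-in for the document in this report.
# The geographicDescription text is a placeholder, not the real value.
SAMPLE = """<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.0.1"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    packageId="ALEXXX_015MTBD003R00_19990906.50.1"
    scope="system" system="knb"
    xsi:schemaLocation="eml://ecoinformatics.org/eml-2.0.1 eml.xsd">
  <dataset scope="document">
    <shortName>PISCO moored temperature, ALE</shortName>
    <coverage>
      <geographicCoverage id="ALE">
        <geographicDescription>placeholder description</geographicDescription>
      </geographicCoverage>
    </coverage>
  </dataset>
</eml:eml>"""


def local_name(tag):
    """Drop the '{namespace-uri}' prefix ElementTree puts on qualified names."""
    return tag.rsplit("}", 1)[-1]


def expected_attribute_values(xml_text):
    """Map each attribute path (e.g. '/eml/@packageId') to the set of values it takes."""
    expected = defaultdict(set)

    def walk(elem, parent_path):
        path = f"{parent_path}/{local_name(elem.tag)}"
        # An attribute's indexed value should be its own value,
        # not the text of a following leaf element.
        for name, value in elem.attrib.items():
            expected[f"{path}/@{local_name(name)}"].add(value)
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return dict(expected)


def find_mismatches(xml_text, indexed_rows):
    """Return (path, nodedata) rows whose nodedata is not a real value of that attribute."""
    expected = expected_attribute_values(xml_text)
    return [
        (path, nodedata)
        for path, nodedata in indexed_rows
        if path in expected and nodedata not in expected[path]
    ]


# Rows as they might be read from xml_path_index for this docid.
indexed_rows = [
    ("/eml/@packageId", "PISCO moored temperature, ALE"),  # value from this report
    ("/eml/dataset/@scope", "document"),                   # hypothetical correct row
]

print(find_mismatches(SAMPLE, indexed_rows))
```

Run against these rows, the check flags only the /eml/@packageId row, because its nodedata matches the shortName text rather than the attribute value. Applying the same check to real xml_path_index rows would require exporting them first; that step is not shown here.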