Task #6040
open
Metacat-index does not handle <references>
Added by ben leinfelder over 11 years ago.
Updated about 11 years ago.
Description
I indexed a document from EVOS that uses a reference for a creator rather than the details of the person:
<creator><references>1359152217358</references></creator>
But in the index it shows up as "||" instead of following the reference back to the id where it was declared:
<associatedParty id="1359152217358">...
http://evos.nceas.ucsb.edu/evos/metacat/df35c.9.14/default
Here is a bit of the bean definition used by indexing to pick out the content from EML:
<bean id="eml.origin" class="org.dataone.cn.indexer.parser.CommonRootSolrField"
      p:multivalue="true"
      p:root-ref="originRoot">
    <constructor-arg name="name" value="origin" />
</bean>
<bean id="originRoot" class="org.dataone.cn.indexer.parser.utility.RootElement"
      p:name="origin"
      p:xPath="//dataset/creator"
      p:template="[individualName]||[organizationName]">
    <property name="leafs"><list><ref bean="organizationNameLeaf"/></list></property>
    <property name="subRoots"><list><ref bean="individualNameRoot" /></list></property>
</bean>
- Target version changed from 2.1.0 to 2.1.1
- Target version changed from 2.1.1 to 2.2.0
- Target version changed from 2.2.0 to 2.2.1
- Priority changed from Normal to High
- Target version changed from 2.2.1 to 2.2.0
Apparently this is fixed in cn-index-processor v1.2.0 -- so we will need to pull in this newer dependency in metacat-index and adjust the code accordingly.
- Target version changed from 2.2.0 to 2.2.1
This is included in the 1.2.0 d1 index release. It will not include || but instead will use blanks. Not a very great "solution" but better.
Spaces aren't really sufficient as a solution, and there are a lot of references fields in EML. We probably need to contribute a fix for this if Skye is not going to fix it for DataONE.
Skye said that a SAX parser is used to parse that information. This change may require switching to a DOM parser, which would be a big change.
Even with a SAX parser, the implementation could keep track of all elements with "id" attributes and anytime a "references" element is encountered, substitute with that node. The tricky part would be when we encounter a references element before the actual element that declares the id -- would have to track the references that are unfulfilled and fill them in when we actually get to the id elements.
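To make the SAX idea above concrete, here is a minimal sketch of what such a handler might look like. It is not code from cn-index-processor or metacat-index: the class name ReferenceResolvingHandler and the method extractOrigins are hypothetical, and it makes two simplifying assumptions. First, it substitutes the flattened text of the referenced element rather than applying the [individualName]||[organizationName] template. Second, instead of filling in each unfulfilled reference as soon as its id element arrives, it records them and resolves all of them in endDocument, which covers references that appear before or after the declaring element.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration; not part of any DataONE or Metacat library.
final class ReferenceResolvingHandler extends DefaultHandler {

    // One frame per currently open element.
    private static final class Frame {
        final String name;
        final String id;                 // value of the id attribute, or null
        final StringBuilder text = new StringBuilder();
        String referencedId;             // set when a child <references> is seen
        Frame(String name, String id) { this.name = name; this.id = id; }
    }

    private final Deque<Frame> open = new ArrayDeque<>();
    // Flattened text of every element that declared an id.
    private final Map<String, String> textById = new HashMap<>();
    private final List<String> origins = new ArrayList<>();
    // Index into origins -> id referenced by the creator at that index.
    private final Map<Integer, String> pending = new HashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        open.push(new Frame(qName, atts.getValue("id")));
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (!open.isEmpty()) open.peek().text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        Frame done = open.pop();
        Frame parent = open.peek();      // null when done was the root element
        String text = done.text.toString().trim().replaceAll("\\s+", " ");
        if ("references".equals(qName)) {
            // Remember the target id on the enclosing element; emit no text.
            if (parent != null) parent.referencedId = text;
            return;
        }
        if (done.id != null) textById.put(done.id, text);
        if ("creator".equals(qName) && parent != null && "dataset".equals(parent.name)) {
            if (done.referencedId != null) pending.put(origins.size(), done.referencedId);
            origins.add(text);           // placeholder "" when only a reference was given
        }
        if (parent != null && !text.isEmpty()) parent.text.append(' ').append(text);
    }

    @Override
    public void endDocument() {
        // Every id in the document is known by now, so fill in the references.
        for (Map.Entry<Integer, String> e : pending.entrySet()) {
            origins.set(e.getKey(), textById.getOrDefault(e.getValue(), ""));
        }
    }

    // Parses the XML string and returns one value per //dataset/creator.
    static List<String> extractOrigins(String xml) {
        ReferenceResolvingHandler handler = new ReferenceResolvingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
        return handler.origins;
    }

    public static void main(String[] args) {
        // The first creator refers to an id that is declared later in the document.
        String xml = "<dataset>"
                + "<creator><references>1359152217358</references></creator>"
                + "<creator><organizationName>NCEAS</organizationName></creator>"
                + "<associatedParty id=\"1359152217358\"><individualName>"
                + "<givenName>Jane</givenName><surName>Doe</surName>"
                + "</individualName></associatedParty>"
                + "</dataset>";
        System.out.println(extractOrigins(xml)); // prints [Jane Doe, NCEAS]
    }
}
```

The sample document and the name "Jane Doe" are made up. A reference whose id never appears is left as an empty string here; a real fix would have to decide how to report that case, and would need to apply the existing templates to the referenced node rather than its raw text.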
- Parent task deleted (#6114)