Task #6040

Metacat-index does not handle <references>

Added by ben leinfelder almost 6 years ago. Updated over 5 years ago.

Status: New
Priority: High
Assignee:
Category: index
Target version:
Start date: 07/25/2013
Due date:
% Done: 0%
Estimated time:

Description

I indexed a document from EVOS that uses a reference for a creator rather than the details of the person:

<creator><references>1359152217358</references></creator>

But in the index it shows up as "||" instead of following the reference back to the id where it was declared:
<associatedParty id="1359152217358">...

http://evos.nceas.ucsb.edu/evos/metacat/df35c.9.14/default
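For comparison, resolving such a reference is straightforward once the whole document is in memory. The sketch below is hypothetical and is not metacat-index code. It uses Python's standard xml.etree.ElementTree, and the resolve_references name and the "Jane Doe" sample record are made up for illustration. It indexes every element that carries an id attribute, then replaces a <references> child with copies of the referenced element's children. It assumes ids are unique and that references are not chained.

```python
import copy
import xml.etree.ElementTree as ET


def resolve_references(root: ET.Element) -> ET.Element:
    """Inline <references> targets in place (hypothetical helper).

    Assumes ids are unique and that a referenced element does not itself
    contain a <references>; dangling references are left untouched.
    """
    # Index every element that declares an id attribute.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}
    # Snapshot the element list so the tree can be modified while looping.
    for el in list(root.iter()):
        ref = el.find("references")
        if ref is None or not (ref.text or "").strip():
            continue
        target = by_id.get(ref.text.strip())
        if target is None:
            continue
        el.remove(ref)
        # Copy the referenced element's children, e.g. <individualName>.
        for child in target:
            el.append(copy.deepcopy(child))
    return root


doc = ET.fromstring(
    '<eml><dataset>'
    '<creator><references>1359152217358</references></creator>'
    '<associatedParty id="1359152217358">'
    '<individualName><givenName>Jane</givenName><surName>Doe</surName></individualName>'
    '</associatedParty>'
    '</dataset></eml>'
)
resolve_references(doc)
print(doc.find("dataset/creator/individualName/surName").text)  # Doe
```

This approach needs the full tree in memory, which is why it is not a drop-in fix for a SAX-based indexer.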

History

#1 Updated by ben leinfelder almost 6 years ago

Here is part of the bean definition that the indexer uses to extract content from EML:

<bean id="eml.origin" class="org.dataone.cn.indexer.parser.CommonRootSolrField"
    p:multivalue="true"
    p:root-ref="originRoot">
    <constructor-arg name="name" value="origin" />
</bean>

<bean id="originRoot" class="org.dataone.cn.indexer.parser.utility.RootElement"
    p:name="origin"
    p:xPath="//dataset/creator"
    p:template="[individualName]||[organizationName]">
    <property name="leafs"><list><ref bean="organizationNameLeaf"/></list></property>
    <property name="subRoots"><list><ref bean="individualNameRoot" /></list></property>
</bean>
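The "||" in the index is consistent with how the template above would render an empty creator. The following is a simplified, hypothetical rendering, not the actual RootElement code; fill_template is an illustrative name. It replaces each [name] placeholder with the matching leaf value, or with an empty string when that leaf is missing. A creator containing only <references> has neither an individualName nor an organizationName, so only the literal separator survives.

```python
import re


def fill_template(template: str, values: dict) -> str:
    """Replace each [name] placeholder with values[name], or "" if the leaf is absent.

    Hypothetical helper approximating how a RootElement-style template might be
    rendered; the real cn-index-processor logic may differ.
    """
    return re.sub(r"\[([^\]]+)\]", lambda m: values.get(m.group(1), ""), template)


TEMPLATE = "[individualName]||[organizationName]"

print(fill_template(TEMPLATE, {"individualName": "Jane Doe"}))  # Jane Doe||
print(fill_template(TEMPLATE, {"organizationName": "NCEAS"}))   # ||NCEAS
# A creator holding only <references> supplies neither leaf:
print(fill_template(TEMPLATE, {}))                               # ||
```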

#2 Updated by ben leinfelder almost 6 years ago

  • Target version changed from 2.1.0 to 2.1.1

#3 Updated by ben leinfelder almost 6 years ago

  • Target version changed from 2.1.1 to 2.2.0

#4 Updated by ben leinfelder almost 6 years ago

  • Target version changed from 2.2.0 to 2.2.1

#5 Updated by ben leinfelder over 5 years ago

  • Target version changed from 2.2.1 to 2.2.0
  • Priority changed from Normal to High

Apparently this is fixed in cn-index-processor v1.2.0, so we will need to pull in this newer dependency in metacat-index and adjust the code accordingly.

#6 Updated by ben leinfelder over 5 years ago

  • Target version changed from 2.2.0 to 2.2.1

#7 Updated by ben leinfelder over 5 years ago

  • Parent task set to #6114

This is included in the 1.2.0 d1 index release. Instead of "||" it will produce blanks. Not a great "solution", but better.

#8 Updated by Matt Jones over 5 years ago

Spaces aren't really sufficient as a solution, and there are many <references> fields in EML. We probably need to contribute a fix for this if Skye is not going to fix it for DataONE.

#9 Updated by Jing Tao over 5 years ago

Skye said that a SAX parser is used to parse this information. Fixing this may require switching to a DOM parser, which would be a big change.

#10 Updated by ben leinfelder over 5 years ago

Even with a SAX parser, the implementation could keep track of all elements with "id" attributes and, whenever a "references" element is encountered, substitute the referenced node. The tricky part is when a references element is encountered before the element that declares the id. In that case the parser would have to track the unfulfilled references and fill them in once it reaches the elements with those ids.
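The approach above could look roughly like the following, using Python's standard xml.sax module. This is a hypothetical sketch, not cn-index-processor code. ReferenceResolvingHandler and its fields are illustrative names, and the handler only handles <references> placed directly inside <creator>. It works in four steps:

- It remembers the text of every element that declares an id.
- It records a creator holding <references> as a pending id.
- It resolves all pending ids in endDocument.
- Because resolution waits until the end of the document, references that appear before their target are covered.

```python
import xml.sax


def _normalize(parts):
    """Join collected character chunks and collapse runs of whitespace."""
    return " ".join("".join(parts).split())


class ReferenceResolvingHandler(xml.sax.ContentHandler):
    """Collect the text of each <creator>, following <references> links (sketch)."""

    def __init__(self):
        super().__init__()
        self.text_by_id = {}       # id -> normalized text of the declaring element
        self.open_ids = []         # stack of (id, text chunks) for open id elements
        self.has_id = []           # per open element: did it declare an id?
        self.creators = []         # ("text", value) or ("ref", id), in document order
        self.creator_parts = None  # text chunks of the <creator> being read
        self.creator_ref = None    # id found in that creator's <references>
        self.in_references = False
        self.ref_parts = []
        self.resolved = []         # final creator strings, filled in endDocument

    def _separate(self):
        # Treat element boundaries as word breaks so that
        # <givenName>Jane</givenName><surName>Doe</surName> becomes "Jane Doe".
        for _, parts in self.open_ids:
            parts.append(" ")
        if self.creator_parts is not None:
            self.creator_parts.append(" ")

    def startElement(self, name, attrs):
        self._separate()
        element_id = attrs.get("id")
        self.has_id.append(element_id is not None)
        if element_id is not None:
            self.open_ids.append((element_id, []))
        if name == "creator":
            self.creator_parts, self.creator_ref = [], None
        elif name == "references" and self.creator_parts is not None:
            self.in_references, self.ref_parts = True, []

    def characters(self, content):
        if self.in_references:
            self.ref_parts.append(content)  # the id itself is not display text
            return
        for _, parts in self.open_ids:
            parts.append(content)
        if self.creator_parts is not None:
            self.creator_parts.append(content)

    def endElement(self, name):
        if name == "references" and self.in_references:
            self.in_references = False
            self.creator_ref = "".join(self.ref_parts).strip()
        elif name == "creator" and self.creator_parts is not None:
            if self.creator_ref is not None:
                self.creators.append(("ref", self.creator_ref))
            else:
                self.creators.append(("text", _normalize(self.creator_parts)))
            self.creator_parts = None
        if self.has_id.pop():
            element_id, parts = self.open_ids.pop()
            self.text_by_id[element_id] = _normalize(parts)
        self._separate()

    def endDocument(self):
        # Every id is known now, so forward references can be filled in;
        # dangling references become "".
        self.resolved = [
            self.text_by_id.get(value, "") if kind == "ref" else value
            for kind, value in self.creators
        ]


handler = ReferenceResolvingHandler()
xml.sax.parseString(
    b'<eml><dataset>'
    b'<creator><references>1359152217358</references></creator>'
    b'<associatedParty id="1359152217358">'
    b'<individualName><givenName>Jane</givenName><surName>Doe</surName></individualName>'
    b'</associatedParty>'
    b'</dataset></eml>',
    handler,
)
print(handler.resolved)  # ['Jane Doe']
```

The trade-off is that the handler keeps the text of every id-bearing element until the end of the document. That is typically far less memory than a full DOM tree.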

#11 Updated by ben leinfelder over 5 years ago

  • Parent task deleted (#6114)
