Ecoinformatics Redmine: Peter Slaughter (https://projects.ecoinformatics.org/ecoinfo/, updated 2017-10-12T17:23:33Z)
Metacat - Story #6437: Upgrade to SOLR 4 or 5 (https://projects.ecoinformatics.org/ecoinfo/issues/6437#change-23190, 2017-10-12T17:23:33Z, Peter Slaughter, slaughter@nceas.ucsb.edu)
<p>The main reason for upgrading to at least Solr 4.1 is support for pivot queries. In addition, Solr 5.3 is needed for the stats component update that allows percentiles (quantiles) to be calculated. This capability is needed for the box/whisker plots of metadata quality information that will be added to the MetacatUI profile pages.</p>
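<p>For reference, the five values a box/whisker plot needs (minimum, lower quartile, median, upper quartile, maximum) can be computed exactly from a list of quality scores, as in the Java sketch below. It only illustrates the statistic and assumes the scores are plain doubles. It is not how Solr's stats component works, since Solr returns approximate percentiles based on the t-digest algorithm. The class and method names are hypothetical.</p>

```java
import java.util.Arrays;

/** Hypothetical helper: exact percentiles for a box/whisker summary. */
final class BoxPlotStats {
    private BoxPlotStats() {}

    /**
     * Percentile with linear interpolation between closest ranks
     * (the same definition as NumPy's default and Excel's PERCENTILE.INC).
     * p must be in [0, 100].
     */
    static double percentile(double[] values, double p) {
        if (values.length == 0) throw new IllegalArgumentException("no values");
        if (p < 0 || p > 100) throw new IllegalArgumentException("p out of range: " + p);
        double[] sorted = values.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        double rank = p / 100.0 * (sorted.length - 1);
        int lo = (int) Math.floor(rank);
        int hi = (int) Math.ceil(rank);
        double frac = rank - lo;
        return sorted[lo] + frac * (sorted[hi] - sorted[lo]);
    }

    /** Returns {min, Q1, median, Q3, max}. */
    static double[] fiveNumberSummary(double[] values) {
        return new double[] {
            percentile(values, 0), percentile(values, 25), percentile(values, 50),
            percentile(values, 75), percentile(values, 100)
        };
    }
}
```

<p>For scores {0.2, 0.9, 0.5, 0.7, 0.4}, fiveNumberSummary returns {0.2, 0.4, 0.5, 0.7, 0.9}.</p>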
<p>Recreating the Solr index may be necessary when upgrading, because the Lucene index format changed from v3 to v4, as described here: <a class="external" href="https://cwiki.apache.org/confluence/display/solr/Major+Changes+from+Solr+3+to+Solr+4">https://cwiki.apache.org/confluence/display/solr/Major+Changes+from+Solr+3+to+Solr+4</a>. Solr v4 appears to be able to run with a v3 index, but not all v4 features may be available.</p>
<p>Solr v5 cannot run with a v3 index at all, so upgrading the index is mandatory, using the v4.1 'IndexUpgrader' described here: <a class="external" href="https://cwiki.apache.org/confluence/display/solr/Major+Changes+from+Solr+4+to+Solr+5">https://cwiki.apache.org/confluence/display/solr/Major+Changes+from+Solr+4+to+Solr+5</a>.</p>
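<p>As a sketch of the conversion step, the Java snippet below only assembles the command line for Lucene's org.apache.lucene.index.IndexUpgrader tool, which takes the index directory as its argument and accepts the -delete-prior-commits and -verbose options. The jar path and index directory are placeholders, not Metacat's actual layout. The jar would presumably come from the Lucene 4.x release matching the upgrader named above, and since the tool rewrites the index in place, Solr should be stopped and the index backed up first.</p>

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that builds (but does not run) an IndexUpgrader command line. */
final class IndexUpgradeCommand {
    private IndexUpgradeCommand() {}

    static List<String> build(String luceneCoreJar, String indexDir,
                              boolean deletePriorCommits, boolean verbose) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(luceneCoreJar);                    // placeholder path to a Lucene 4.x core jar
        cmd.add("org.apache.lucene.index.IndexUpgrader");
        if (deletePriorCommits) {
            cmd.add("-delete-prior-commits");      // keep only the latest commit point
        }
        if (verbose) {
            cmd.add("-verbose");                   // print progress information
        }
        cmd.add(indexDir);                         // directory holding the Lucene index files
        return cmd;
    }
}
```

<p>The resulting list could then be run with new ProcessBuilder(cmd).inheritIO().start().</p>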
<p>One upgrade path, then, is to upgrade from v3 to v5 and convert the index with 'IndexUpgrader'.</p>
Metacat - Bug #7217 (New): Report on metadata creation date in metadata quality summaries (https://projects.ecoinformatics.org/ecoinfo/issues/7217, 2017-10-11T18:42:38Z, Peter Slaughter, slaughter@nceas.ucsb.edu)
<p>The index fields for metadata quality reports do not include the upload date of the metadata being reported on. Therefore, summaries built from them, e.g. a user's mean score over time, currently use the creation time of the quality report rather than that of the metadata.</p>
<p>Add the field 'mdq.metadata.timestamp' to application-context-mdq.xml to hold the metadata creation or update time.</p>
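<p>To show why the metadata timestamp matters, the sketch below computes a mean score per month keyed on a per-report metadata timestamp, the value 'mdq.metadata.timestamp' is meant to hold, rather than on the report's own creation time. The QualityRun class and its fields are hypothetical stand-ins for indexed report fields, not the actual mdq index schema.</p>

```java
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneOffset;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical view of one indexed quality report. */
final class QualityRun {
    final double score;
    final Instant reportCreated;     // when the quality report was generated (not used for grouping)
    final Instant metadataTimestamp; // proposed 'mdq.metadata.timestamp' value

    QualityRun(double score, Instant reportCreated, Instant metadataTimestamp) {
        this.score = score;
        this.reportCreated = reportCreated;
        this.metadataTimestamp = metadataTimestamp;
    }
}

final class QualitySummary {
    private QualitySummary() {}

    /** Mean score per month (UTC), keyed on the metadata's timestamp, in month order. */
    static Map<YearMonth, Double> meanScoreByMetadataMonth(List<QualityRun> runs) {
        return runs.stream().collect(Collectors.groupingBy(
                (QualityRun r) -> YearMonth.from(r.metadataTimestamp.atZone(ZoneOffset.UTC)),
                TreeMap::new,
                Collectors.averagingDouble((QualityRun r) -> r.score)));
    }
}
```

<p>Grouping on reportCreated instead would put every report generated in the same batch into a single month, which is the behaviour this issue describes.</p>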
<p>Each quality suite will be responsible for making this information available in the quality report, so that MDQClient.saveRun can record it.</p>
Metacat - Bug #7216 (New): MDQClient.saveRun doesn't obsolete existing quality documents (https://projects.ecoinformatics.org/ecoinfo/issues/7216, 2017-10-11T17:30:14Z, Peter Slaughter, slaughter@nceas.ucsb.edu)
<p>MDQClient.saveRun is called to upload a newly created quality document in response to a metadata document being uploaded or updated.</p>
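<p>The obsolete-on-save behaviour this issue asks for could look roughly like the sketch below. An in-memory map stands in for looking up the current quality report for a metadata PID, and the class, field and method names are hypothetical, not MDQClient's actual API. In Metacat the link would presumably be recorded through the obsoletes/obsoletedBy fields of the quality documents' system metadata.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical registry of quality reports, keyed by the metadata they describe. */
final class QualityReportStore {
    static final class Report {
        final String pid;
        final String metadataPid;
        final double score;
        String obsoletes;    // pid of the report this one replaces, if any
        String obsoletedBy;  // pid of the report that replaced this one, if any

        Report(String pid, String metadataPid, double score) {
            this.pid = pid;
            this.metadataPid = metadataPid;
            this.score = score;
        }
    }

    private final Map<String, Report> byPid = new HashMap<>();
    private final Map<String, String> currentForMetadata = new HashMap<>();

    /** Saves a new report, obsoleting the current report for the same metadata, if one exists. */
    void saveRun(Report report) {
        String previousPid = currentForMetadata.get(report.metadataPid);
        if (previousPid != null) {
            byPid.get(previousPid).obsoletedBy = report.pid;
            report.obsoletes = previousPid;
        }
        byPid.put(report.pid, report);
        currentForMetadata.put(report.metadataPid, report.pid);
    }

    /** Only reports that have not been obsoleted should feed into quality statistics. */
    List<Report> currentReports() {
        List<Report> current = new ArrayList<>();
        for (Report r : byPid.values()) {
            if (r.obsoletedBy == null) {
                current.add(r);
            }
        }
        return current;
    }
}
```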
<p>When MDQClient.saveRun is called by MNodeService.update, it does not check whether a quality document has already been created for the metadata document. saveRun should check for a previous quality document for that metadata and obsolete it with the new quality document. This keeps quality statistics accurate: obsoleted quality reports are essentially duplicates and will be excluded from statistical calculations.</p>
Metacat - Bug #7212 (New): metacat-index missing metadata quality fields (https://projects.ecoinformatics.org/ecoinfo/issues/7212, 2017-10-03T18:59:38Z, Peter Slaughter, slaughter@nceas.ucsb.edu)
<p>The Spring context file ./metacat-index/src/main/resources/application-context-mdq.xml doesn't contain bean definitions for the quality check types 'congruency' or 'dataFormats', although we should record results for these check types. There is a bean for the check type 'other', but it isn't sufficient.</p>
Metacat - Bug #7201 (Closed): Some DataONE service packages not being reported (https://projects.ecoinformatics.org/ecoinfo/issues/7201, 2017-06-27T21:11:47Z, Peter Slaughter, slaughter@nceas.ucsb.edu)
<p>The DataONE 'node' service returns the list of service packages that a member node supports. For example, <a class="external" href="https://knb.ecoinformatics.org/knb/d1/mn/v2/node">https://knb.ecoinformatics.org/knb/d1/mn/v2/node</a> lists the implemented packages:</p>
<pre><code>...
&lt;services&gt;
  &lt;service name="MNCore" version="v1" available="true"/&gt;
  &lt;service name="MNCore" version="v2" available="true"/&gt;
  &lt;service name="MNRead" version="v1" available="true"/&gt;
  &lt;service name="MNRead" version="v2" available="true"/&gt;
  &lt;service name="MNAuthorization" version="v1" available="true"/&gt;
  &lt;service name="MNAuthorization" version="v2" available="true"/&gt;
  &lt;service name="MNStorage" version="v1" available="true"/&gt;
  &lt;service name="MNStorage" version="v2" available="true"/&gt;
  &lt;service name="MNReplication" version="v1" available="true"/&gt;
  &lt;service name="MNReplication" version="v2" available="true"/&gt;
&lt;/services&gt;
...</code></pre>
<p>However, not all service packages appear to be reported: KNB supports the 'getPackage' service, which is part of "MNPackage", yet MNPackage is not listed. "MNView" and "MNQuery" are also missing.</p>
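<p>One way to spot such gaps is to parse the 'services' element of the node document and compare the advertised names with the packages the node is expected to implement. The Java sketch below uses only the JDK's DOM parser. The expected set is an input chosen by the caller (for example MNPackage, MNView and MNQuery), the class name is hypothetical, and the sketch ignores XML namespaces and service versions.</p>

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical check of which service packages a node document advertises. */
final class NodeServiceCheck {
    private NodeServiceCheck() {}

    /** Names of all service elements marked available="true". */
    static Set<String> advertisedServices(String nodeXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(nodeXml.getBytes(StandardCharsets.UTF_8)));
            NodeList services = doc.getElementsByTagName("service");
            Set<String> names = new TreeSet<>();
            for (int i = 0; i < services.getLength(); i++) {
                Element service = (Element) services.item(i);
                if ("true".equals(service.getAttribute("available"))) {
                    names.add(service.getAttribute("name"));
                }
            }
            return names;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse node document", e);
        }
    }

    /** Expected packages that are not advertised as available, in sorted order. */
    static Set<String> missing(String nodeXml, Set<String> expected) {
        Set<String> result = new TreeSet<>(expected);
        result.removeAll(advertisedServices(nodeXml));
        return result;
    }
}
```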
<p>The full list of member node API packages is here: <a class="external" href="https://purl.dataone.org/architecture-dev/apis/MN_APIs.html">https://purl.dataone.org/architecture-dev/apis/MN_APIs.html</a></p>
Metacat - Feature #7198 (New): Format solr engine description output (https://projects.ecoinformatics.org/ecoinfo/issues/7198, 2017-06-02T17:30:11Z, Peter Slaughter, slaughter@nceas.ucsb.edu)
<p>The Solr query engine description returned by DataONE is formatted with XSLT and includes a description of each field: <a class="external" href="https://cn.dataone.org/cn/v2/query/solr">https://cn.dataone.org/cn/v2/query/solr</a>. The corresponding Metacat index output does not: <a class="external" href="https://knb.ecoinformatics.org/knb/d1/mn/v2/query/solr">https://knb.ecoinformatics.org/knb/d1/mn/v2/query/solr</a>. Providing these descriptions would help users learn Solr and use our index.<br />I wasn't able to find the .xsl file in the Metacat or DataONE repositories, so I am not sure how to include it in metacat-index.</p>
Metacat - Revision 10225 (metacat): Disable indexing of 'prov_hasSources' field (https://projects.ecoinformatics.org/ecoinfo/projects/metacat-5/repository/metacat/revisions/10225, 2017-04-13T23:06:24Z, Peter Slaughter, slaughter@nceas.ucsb.edu)
Metacat - Revision 10224 (metacat): Fix problem with prov_hasSources not being indexed (https://projects.ecoinformatics.org/ecoinfo/projects/metacat-5/repository/metacat/revisions/10224, 2017-04-13T22:15:37Z, Peter Slaughter, slaughter@nceas.ucsb.edu)
Metacat - Bug #7181 (New): Verify completeness of unit test MetacatRdfXmlSubprocessorTest (https://projects.ecoinformatics.org/ecoinfo/issues/7181, 2017-04-11T23:41:43Z, Peter Slaughter, slaughter@nceas.ucsb.edu)
<p>Verify that every prov relationship indexed via src/main/resources/application-context-prov-base.xml is checked by the unit test MetacatRdfXmlSubprocessorTest.java, which reads src/test/resources/rdfxml-example.xml.</p>
Metacat - Revision 10221 (metacat): Add check in index unit test for 'prov_hasDerivations' field (https://projects.ecoinformatics.org/ecoinfo/projects/metacat-5/repository/metacat/revisions/10221, 2017-04-10T22:05:14Z, Peter Slaughter, slaughter@nceas.ucsb.edu)
Metacat - Revision 10220 (metacat): Fix problem where 'prov_hasDerivations' field not being index... (https://projects.ecoinformatics.org/ecoinfo/projects/metacat-5/repository/metacat/revisions/10220, 2017-04-10T21:31:00Z, Peter Slaughter, slaughter@nceas.ucsb.edu)
Metacat - Bug #7176 (Closed): Metacat-index RDF/XML subprocessor not populating prov_hasDerivatio... (https://projects.ecoinformatics.org/ecoinfo/issues/7176, 2017-03-23T22:09:14Z, Peter Slaughter, slaughter@nceas.ucsb.edu)
<p>The package <a class="external" href="https://dev.nceas.ucsb.edu/#view/urn:uuid:c7cda366-5658-4350-ba5a-8d2b84829f5d">https://dev.nceas.ucsb.edu/#view/urn:uuid:c7cda366-5658-4350-ba5a-8d2b84829f5d</a> has one prov relationship 'urn:uuid:94cb9677-be83-4873-aa7c-6691e32229a3 <a class="external" href="http://www.w3.org/ns/prov#wasDerivedFrom">http://www.w3.org/ns/prov#wasDerivedFrom</a> urn:uuid:146239cd-2f41-4312-8f90-75c8cad09a48'</p>
<p>Given this relationship, the Solr 'prov_hasDerivations' field for urn:uuid:146239cd-2f41-4312-8f90-75c8cad09a48 should be set to urn:uuid:94cb9677-be83-4873-aa7c-6691e32229a3, but it is not. See <a class="external" href="https://dev.nceas.ucsb.edu/knb/d1/mn/v1/query/solr/?q=id:%22urn:uuid:146239cd-2f41-4312-8f90-75c8cad09a48%22">https://dev.nceas.ucsb.edu/knb/d1/mn/v1/query/solr/?q=id:%22urn:uuid:146239cd-2f41-4312-8f90-75c8cad09a48%22</a></p>
<p>However, the 'prov_wasDerivedFrom' field (the reciprocal relationship) is set on the derived object: <a class="external" href="https://dev.nceas.ucsb.edu/knb/d1/mn/v1/query/solr/?q=id:%22urn:uuid:94cb9677-be83-4873-aa7c-6691e32229a3%22">https://dev.nceas.ucsb.edu/knb/d1/mn/v1/query/solr/?q=id:%22urn:uuid:94cb9677-be83-4873-aa7c-6691e32229a3%22</a></p>
<p>The problem may be related to the 'prov_hasDerivations' SPARQL query in metacat-index/src/main/resources/application-context-prov-base.xml:</p>
<pre><code>&lt;bean id="prov20150115.hasDerivations" class="org.dataone.cn.indexer.annotation.SparqlField"&gt;
...
 SELECT (str(?pidValue) as ?pid) (str(?derivedDataPidValue) as ?prov_hasDerivations)
 FROM &lt;$GRAPH_NAME&gt;
 WHERE {
   ?derived_data prov:wasDerivedFrom ?source_data .
   ?source_data cito:documentedBy ?source_metadata .
   ?source_metadata dcterms:identifier ?pidValue .
   ?derived_data dcterms:identifier ?derivedDataPidValue .
 }</code></pre>
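<p>It is not clear why 'source_metadata' is included in this query. As written, ?pid is bound to the identifier of the metadata that documents the source data (via cito:documentedBy), so the field would be attached to that metadata object rather than to the source data itself, and a source with no cito:documentedBy statement produces no result. This query is also not the reciprocal of the 'prov_wasDerivedFrom' query:</p>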
<pre><code>&lt;bean id="prov20150115.wasDerivedFrom" class="org.dataone.cn.indexer.annotation.SparqlField"&gt;<br />...<br /> SELECT (str(?pidValue) as ?pid) (str(?wasDerivedFromValue) as ?prov_wasDerivedFrom)<br /> FROM <$GRAPH_NAME><br /> WHERE { <br /> ?derived_data prov:wasDerivedFrom ?primary_data .<br /> ?derived_data dcterms:identifier ?pidValue . <br /> ?primary_data dcterms:identifier ?wasDerivedFromValue .<br /> }</code></pre>
<p>Note that this also affects the DataONE CN component d1_cn_index_processor.</p>
<p>The resource map for this package is attached.</p>
Metacat - Bug #7161 (Resolved): Uploading a resource map with provenance data causes an NPE durin... (https://projects.ecoinformatics.org/ecoinfo/issues/7161, 2016-11-22T23:21:58Z, Peter Slaughter, slaughter@nceas.ucsb.edu)
<p>When a resource map that includes provenance relationships is uploaded, indexing fails with an NPE while RdfXmlSubprocessor processes it:</p>
<pre><code>java.lang.NullPointerException
    at org.dataone.cn.indexer.annotation.RdfXmlSubprocessor.getSolrDocs(RdfXmlSubprocessor.java:278)
    at org.dataone.cn.indexer.annotation.RdfXmlSubprocessor.process(RdfXmlSubprocessor.java:265)
    at org.dataone.cn.indexer.annotation.RdfXmlSubprocessor.processDocument(RdfXmlSubprocessor.java:119)
...</code></pre>
<p>The NPE appears to be caused by RdfXmlSubprocessor.getSolrDocs() calling httpService: Metacat member node instances do not run an HTTP Solr server and rely on the embedded Solr server instead.</p>
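<p>A defensive pattern for this case is to resolve the Solr lookup once and fall back to the embedded server when no HTTP service is configured, instead of dereferencing a null httpService. In the Java sketch below, SolrLookup, HttpLookup, EmbeddedLookup and SubprocessorSketch are hypothetical stand-ins, not the indexer's actual classes or interfaces.</p>

```java
import java.util.Objects;

/** Hypothetical abstraction over "fetch the existing Solr document for a pid". */
interface SolrLookup {
    String fetchDoc(String pid);
}

/** Stand-in for an HTTP Solr service, as used where a Solr server runs over HTTP. */
final class HttpLookup implements SolrLookup {
    @Override
    public String fetchDoc(String pid) {
        return "http:" + pid;
    }
}

/** Stand-in for Metacat's embedded Solr server. */
final class EmbeddedLookup implements SolrLookup {
    @Override
    public String fetchDoc(String pid) {
        return "embedded:" + pid;
    }
}

final class SubprocessorSketch {
    private final SolrLookup lookup;

    /** httpService may be null on Metacat member nodes; embedded must then be non-null. */
    SubprocessorSketch(SolrLookup httpService, SolrLookup embedded) {
        this.lookup = (httpService != null)
                ? httpService
                : Objects.requireNonNull(embedded, "no Solr lookup configured");
    }

    String getSolrDoc(String pid) {
        return lookup.fetchDoc(pid);
    }
}
```

<p>Choosing the lookup in the constructor means a misconfiguration fails fast with a descriptive message at startup rather than as an NPE deep inside document processing.</p>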
<p>The attached files are the uploaded resource map that caused the NPE and an excerpt of the Tomcat log containing the indexer TRACE output and the NPE.</p>
Metacat - Task #6995 (New): Error message doesn't provide cause of error (https://projects.ecoinformatics.org/ecoinfo/issues/6995, 2016-03-23T21:43:52Z, Peter Slaughter, slaughter@nceas.ucsb.edu)
<p>Several services that require authentication return error descriptions that don't indicate the cause of the error. For example, MNStorage.generateIdentifier() returns the following when the token is expired, corrupted, etc.:</p>
<pre><code>&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;error detailCode="2190" errorCode="401" name="InvalidToken"&gt;
  &lt;description&gt;Session is required to generate an Identifier at this Node.&lt;/description&gt;
&lt;/error&gt;</code></pre>
<p>The InvalidToken description in the DataONE API documentation does give the user enough information to troubleshoot the problem:</p>
<p>"The supplied authentication token is not a proper certificate, or missing required fields, or otherwise proves invalid."</p>
<p>It would be helpful if this description were used instead of the "Session is required" message.</p>
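<p>As an illustration, the service could choose the description according to why authentication failed, rather than always reporting that a session is required. The TokenProblem enum, the class name and the message texts in the Java sketch below are hypothetical; only the InvalidToken name, errorCode 401 and detailCode 2190 come from the response above, and the malformed-token text borrows from the API documentation quoted above.</p>

```java
/** Hypothetical reasons a token can fail validation. */
enum TokenProblem {
    MISSING, EXPIRED, MALFORMED
}

/** Hypothetical builder for a more specific InvalidToken error document. */
final class InvalidTokenMessage {
    private InvalidTokenMessage() {}

    static String description(TokenProblem problem) {
        switch (problem) {
            case MISSING:
                return "No authentication token was supplied; a session is required at this Node.";
            case EXPIRED:
                return "The supplied authentication token has expired.";
            case MALFORMED:
                return "The supplied authentication token is not a proper certificate, or is missing required fields.";
            default:
                throw new IllegalArgumentException("unknown problem: " + problem);
        }
    }

    /** The fixed messages contain no XML special characters, so no escaping is done here. */
    static String toXml(TokenProblem problem, String detailCode) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<error detailCode=\"" + detailCode + "\" errorCode=\"401\" name=\"InvalidToken\">"
                + "<description>" + description(problem) + "</description></error>";
    }
}
```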
<p>Other services, such as MNStorage.create() and MNStorage.update(), return similar "Session..." messages.</p>
MetacatUI - Task #6992: Replace 'authentication_token' text depending on deployment environment (https://projects.ecoinformatics.org/ecoinfo/issues/6992#change-22733, 2016-03-23T16:17:37Z, Peter Slaughter, slaughter@nceas.ucsb.edu)
<p>The separate production and test tokens support the use case where a user is working in both environments concurrently, which has been seen in several instances recently. Switching the token definition between two values is cumbersome and error prone, partly because the client cannot inspect the token to determine which environment it is valid for. This feature is really just for user convenience. The GitHub issue has more details showing a user's perspective (an expert user, BTW).</p>