Metacat: Issues (Ecoinformatics Redmine, https://projects.ecoinformatics.org/ecoinfo/, updated 2017-03-23T22:09:14Z)
Bug #7176 (Closed): Metacat-index RDF/XML subprocessor not populating prov_hasDerivations field (https://projects.ecoinformatics.org/ecoinfo/issues/7176, 2017-03-23T22:09:14Z, Peter Slaughter, slaughter@nceas.ucsb.edu)
<p>The package <a class="external" href="https://dev.nceas.ucsb.edu/#view/urn:uuid:c7cda366-5658-4350-ba5a-8d2b84829f5d">https://dev.nceas.ucsb.edu/#view/urn:uuid:c7cda366-5658-4350-ba5a-8d2b84829f5d</a> has one prov relationship 'urn:uuid:94cb9677-be83-4873-aa7c-6691e32229a3 <a class="external" href="http://www.w3.org/ns/prov#wasDerivedFrom">http://www.w3.org/ns/prov#wasDerivedFrom</a> urn:uuid:146239cd-2f41-4312-8f90-75c8cad09a48'</p>
<p>From this prov relationship, the Solr index 'prov_hasDerivations' field for urn:uuid:146239cd-2f41-4312-8f90-75c8cad09a48 should be set to urn:uuid:94cb9677-be83-4873-aa7c-6691e32229a3, but it is not populated.<br />See <a class="external" href="https://dev.nceas.ucsb.edu/knb/d1/mn/v1/query/solr/?q=id:%22urn:uuid:146239cd-2f41-4312-8f90-75c8cad09a48%22">https://dev.nceas.ucsb.edu/knb/d1/mn/v1/query/solr/?q=id:%22urn:uuid:146239cd-2f41-4312-8f90-75c8cad09a48%22</a></p>
<p>However, the prov_wasDerivedFrom field (the reciprocal relationship) is set on the derived object: <a class="external" href="https://dev.nceas.ucsb.edu/knb/d1/mn/v1/query/solr/?q=id:%22urn:uuid:94cb9677-be83-4873-aa7c-6691e32229a3%22">https://dev.nceas.ucsb.edu/knb/d1/mn/v1/query/solr/?q=id:%22urn:uuid:94cb9677-be83-4873-aa7c-6691e32229a3%22</a></p>
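<p>The two fields are expected to mirror each other: if document D lists S in prov_wasDerivedFrom, then document S should list D in prov_hasDerivations. A minimal Python sketch of that consistency check follows. It assumes the Solr documents have already been fetched and parsed into dictionaries (the fetch is not shown), with multi-valued fields as lists; the function name and the hand-written sample documents are illustrative only and simply mirror the state described in this report.</p>

```python
"""Sketch: check that prov_wasDerivedFrom and prov_hasDerivations are reciprocal.

Assumes each parsed Solr document is a dict with an "id" key and optional
multi-valued "prov_wasDerivedFrom" / "prov_hasDerivations" lists.
"""


def missing_reciprocals(docs):
    """Return (source_id, derived_id) pairs where the source document does
    not list the derived document in its prov_hasDerivations field."""
    by_id = {doc["id"]: doc for doc in docs}
    missing = []
    for doc in docs:
        for source_id in doc.get("prov_wasDerivedFrom", []):
            source = by_id.get(source_id)
            if source is None:
                continue  # source not in this result set, so it cannot be checked
            if doc["id"] not in source.get("prov_hasDerivations", []):
                missing.append((source_id, doc["id"]))
    return missing


# Hand-written sample documents mirroring the state described in this report.
DERIVED = "urn:uuid:94cb9677-be83-4873-aa7c-6691e32229a3"
SOURCE = "urn:uuid:146239cd-2f41-4312-8f90-75c8cad09a48"
docs = [
    {"id": DERIVED, "prov_wasDerivedFrom": [SOURCE]},
    {"id": SOURCE},  # prov_hasDerivations is not populated
]

print(missing_reciprocals(docs))
```

<p>Running the same check over a whole package's documents would list every derivation whose reciprocal field is missing, not just this one pair.</p>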
<p>The problem may be related to the 'prov_hasDerivations' SPARQL query in metacat-index/src/main/resources/application-context-prov-base.xml:</p>
<pre><code>&lt;bean id="prov20150115.hasDerivations" class="org.dataone.cn.indexer.annotation.SparqlField"&gt;
...
  SELECT (str(?pidValue) as ?pid) (str(?derivedDataPidValue) as ?prov_hasDerivations)
  FROM &lt;$GRAPH_NAME&gt;
  WHERE {
    ?derived_data prov:wasDerivedFrom ?source_data .
    ?source_data cito:documentedBy ?source_metadata .
    ?source_metadata dcterms:identifier ?pidValue .
    ?derived_data dcterms:identifier ?derivedDataPidValue .
  }</code></pre>
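<p>Not sure why 'source_metadata' is included in this query: as written, it assigns prov_hasDerivations to the metadata object that documents the source data (via cito:documentedBy) rather than to the source data object itself, and it matches nothing when the source data has no cito:documentedBy statement. Also, this query is not the reciprocal of the 'prov_wasDerivedFrom' query:</p>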
<pre><code>&lt;bean id="prov20150115.wasDerivedFrom" class="org.dataone.cn.indexer.annotation.SparqlField"&gt;
...
  SELECT (str(?pidValue) as ?pid) (str(?wasDerivedFromValue) as ?prov_wasDerivedFrom)
  FROM &lt;$GRAPH_NAME&gt;
  WHERE {
    ?derived_data prov:wasDerivedFrom ?primary_data .
    ?derived_data dcterms:identifier ?pidValue .
    ?primary_data dcterms:identifier ?wasDerivedFromValue .
  }</code></pre>
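<p>To make the difference between the two queries concrete, here is a minimal Python sketch that evaluates the same graph patterns by hand over a tiny in-memory list of triples. It is not the SparqlField implementation. The node names, the metadata PID, and the cito:documentedBy triple are hypothetical and exist only for illustration. With these triples, the hasDerivations pattern as quoted attaches the value to the metadata PID. A reciprocal pattern, which takes ?pidValue from ?source_data instead of ?source_metadata, attaches it to the source data PID. That reciprocal form is one possible shape, not necessarily the fix adopted for this issue.</p>

```python
"""Sketch: evaluate the hasDerivations / wasDerivedFrom graph patterns by hand.

The triples are a hypothetical, simplified stand-in for a resource map;
node names and the metadata PID are made up for illustration.
"""

WAS_DERIVED_FROM = "prov:wasDerivedFrom"
DOCUMENTED_BY = "cito:documentedBy"
IDENTIFIER = "dcterms:identifier"

DERIVED_PID = "urn:uuid:94cb9677-be83-4873-aa7c-6691e32229a3"
SOURCE_PID = "urn:uuid:146239cd-2f41-4312-8f90-75c8cad09a48"
METADATA_PID = "urn:uuid:hypothetical-metadata"  # hypothetical PID

TRIPLES = [
    ("node:derived", WAS_DERIVED_FROM, "node:source"),
    ("node:source", DOCUMENTED_BY, "node:metadata"),  # assumed for illustration
    ("node:derived", IDENTIFIER, DERIVED_PID),
    ("node:source", IDENTIFIER, SOURCE_PID),
    ("node:metadata", IDENTIFIER, METADATA_PID),
]


def objects(triples, subject, predicate):
    """All o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def derivation_pairs(triples):
    """All (?derived_data, ?source_data) bindings of prov:wasDerivedFrom."""
    return [(s, o) for s, p, o in triples if p == WAS_DERIVED_FROM]


def was_derived_from(triples):
    """wasDerivedFrom pattern: (pid of derived data, pid of primary data)."""
    return [
        (pid, value)
        for derived, primary in derivation_pairs(triples)
        for pid in objects(triples, derived, IDENTIFIER)
        for value in objects(triples, primary, IDENTIFIER)
    ]


def has_derivations_as_quoted(triples):
    """hasDerivations pattern as quoted: ?pidValue comes from ?source_metadata."""
    return [
        (pid, value)
        for derived, source in derivation_pairs(triples)
        for metadata in objects(triples, source, DOCUMENTED_BY)
        for pid in objects(triples, metadata, IDENTIFIER)
        for value in objects(triples, derived, IDENTIFIER)
    ]


def has_derivations_reciprocal(triples):
    """Reciprocal pattern: ?pidValue comes from ?source_data itself."""
    return [
        (pid, value)
        for derived, source in derivation_pairs(triples)
        for pid in objects(triples, source, IDENTIFIER)
        for value in objects(triples, derived, IDENTIFIER)
    ]


print(has_derivations_as_quoted(TRIPLES))   # row keyed by the metadata PID
print(has_derivations_reciprocal(TRIPLES))  # row keyed by the source data PID
```

<p>Dropping the cito:documentedBy triple makes the quoted pattern return no rows at all, while the reciprocal pattern still returns the source data row.</p>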
<p>Note that this will also be a problem for the CN DataONE d1_cn_index_processor component.</p>
<p>The resource map for this package has been included.</p>
Bug #7083 (Closed): Metadata/data objects which have obsoletedBy field ignore the resource map in... (https://projects.ecoinformatics.org/ecoinfo/issues/7083, 2016-08-08T21:17:39Z, Jing Tao, tao@nceas.ucsb.edu)
<p>Hi Bryce:</p>
<p>I looked at the Solr index entries for the 16 objects and found that 5 of them do not have resource_map_urn:uuid:2e3c8c4c-e606-4710-b321-8edc4d506b0a in their resourceMap field:</p>
<p>urn%3Auuid%3A0f64673d-d270-411f-a5ed-98351d3d9450<br />urn%3Auuid%3A12c0ab6a-5eb3-43de-a16c-e71acaeb9817<br />urn%3Auuid%3A45ee065f-746e-4780-872b-d98cabeb0ad7<br />urn%3Auuid%3Aae90efa8-3cf5-4ff9-9637-c7be28b06541<br />urn%3Auuid%3Accebed0b-6bdb-4853-ba2a-6e88321ea4d5</p>
<p>That is why your query on this resource map value returns only 11 documents.</p>
<p>All five of these objects have an "obsoletedBy" field, and none of the other 11 objects do.</p>
<p>I checked the "obsoletedBy" field because I recently found a bug in the d1_cn_index_processor component: when a resource map is indexed, any member object that has an "obsoletedBy" field ignores the resource map. This issue looks like another instance of that bug.</p>
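<p>The check described above can be sketched in Python: decode the member PIDs from the ore:aggregates resolve URLs in the resource map, then, using Solr documents that have already been fetched and parsed (the fetch is not shown), list the members whose resourceMap field lacks the resource map ID and whether each of them has obsoletedBy set. The helper names and the sample documents are hypothetical; the sample only reuses two PIDs from this thread, and the obsoletedBy value is a placeholder.</p>

```python
"""Sketch: find package members whose Solr doc is missing the resource map."""

from urllib.parse import unquote


def pid_from_resolve_url(url):
    """Decode the PID from a .../cn/v2/resolve/{url-encoded PID} URL."""
    return unquote(url.rsplit("/", 1)[1])


def members_missing_resource_map(member_pids, docs, resource_map_id):
    """Return {pid: has_obsoleted_by} for members not linked to the resource map.

    docs maps PID -> parsed Solr document (dict); resourceMap is multi-valued.
    """
    missing = {}
    for pid in member_pids:
        doc = docs.get(pid, {})
        if resource_map_id not in doc.get("resourceMap", []):
            missing[pid] = "obsoletedBy" in doc
    return missing


RESOURCE_MAP = "resource_map_urn:uuid:2e3c8c4c-e606-4710-b321-8edc4d506b0a"
urls = [
    "https://cn.dataone.org/cn/v2/resolve/urn%3Auuid%3A0f64673d-d270-411f-a5ed-98351d3d9450",
    "https://cn.dataone.org/cn/v2/resolve/urn%3Auuid%3A1584c53e-3d5c-4b70-9bf6-1033de8e2fd1",
]
members = [pid_from_resolve_url(u) for u in urls]

# Hypothetical parsed Solr documents, shaped like the ones described above.
docs = {
    members[0]: {"id": members[0], "obsoletedBy": "urn:uuid:hypothetical-newer-version"},
    members[1]: {"id": members[1], "resourceMap": [RESOURCE_MAP]},
}

print(members_missing_resource_map(members, docs, RESOURCE_MAP))
```

<p>If every value in the returned mapping is True, the missing members are exactly the obsoleted ones, which is the pattern reported here.</p>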
<p>I will look at the metacat index code to make sure.</p>
<p>Thanks,</p>
<p>Jing</p>
<p>On 8/8/16 12:13 PM, Bryce Mecum wrote:</p>
<blockquote>
<p>So @scng got hold of me to ask about strange behavior where the package table on two dataset pages is not showing the right number of files. This is a write-up of what she told me and what I found, so that someone else (<a class="user active" href="https://projects.ecoinformatics.org/ecoinfo/users/293">Jessica Couture</a> or <a class="user active" href="https://projects.ecoinformatics.org/ecoinfo/users/8">Chris Jones</a>) can look into it. This is a blocker on Bill Simpson's ticket RT12930.</p>
<p>This applies to two packages:</p>
<p>O-Buoy 8 (needs link)<br />O-Buoy 15</p>
<p>These two packages were recently updated by @scng, using the R package, to make them editable (by adding otherEntity elements to the EML).</p>
<p>If you look at O-Buoy 15, you'll see ten data objects in the package. However, the R code @scng wrote was intended to add 15 data objects to the package. If you look at the resource map, resource_map_urn:uuid:2e3c8c4c-e606-4710-b321-8edc4d506b0a, you'll see it aggregates and documents 16 PIDs (metadata + 15 data objects):</p>
<p>Here's an invalid and abridged section from the resource map, converted to Turtle format before pasting here:<br />...<br />ore:aggregates <<a class="external" href="https://cn.dataone.org/cn/v2/resolve/urn%3Auuid%3A0f64673d-d270-411f-a5ed-98351d3d9450">https://cn.dataone.org/cn/v2/resolve/urn%3Auuid%3A0f64673d-d270-411f-a5ed-98351d3d9450</a>><br /><<a class="external" href="https://cn.dataone.org/cn/v2/resolve/urn%3Auuid%3A12c0ab6a-5eb3-43de-a16c-e71acaeb9817">https://cn.dataone.org/cn/v2/resolve/urn%3Auuid%3A12c0ab6a-5eb3-43de-a16c-e71acaeb9817</a>><br /><<a class="external" href="https://cn.dataone.org/cn/v2/resolve/urn%3Auuid%3A1584c53e-3d5c-4b70-9bf6-1033de8e2fd1">https://cn.dataone.org/cn/v2/resolve/urn%3Auuid%3A1584c53e-3d5c-4b70-9bf6-1033de8e2fd1</a>><br /><<a class="external" href="https://cn.dataone.org/cn/v2/resolve/urn%3Auuid%3A1c2d1c50-4d79-4fe5-b650-024e63818336">https://cn.dataone.org/cn/v2/resolve/urn%3Auuid%3A1c2d1c50-4d79-4fe5-b650-024e63818336</a>><br /><<a class="external" href="https://cn.dataone.org/cn/v2/resolve/urn%3Auuid%3A2e3c8c4c-e606-4710-b321-8edc4d506b0a">https://cn.dataone.org/cn/v2/resolve/urn%3Auuid%3A2e3c8c4c-e606-4710-b321-8edc4d506b0a</a>><br /><<a class="external" href="https://cn.dataone.org/cn/v2/resolve/urn%3Auuid%3A30a3a76c-c965-4594-8cfd-c652d46ebbe5">https://cn.dataone.org/cn/v2/resolve/urn%3Auuid%3A30a3a76c-c965-4594-8cfd-c652d46ebbe5</a>><br /><<a class="external" href="https://cn.dataone.org/cn/v2/resolve/urn%3Auuid%3A40d6e8e4-83eb-4579-8b00-90bf28282769">https://cn.dataone.org/cn/v2/resolve/urn%3Auuid%3A40d6e8e4-83eb-4579-8b00-90bf28282769</a>><br /><<a class="external" href="https://cn.dataone.org/cn/v2/resolve/urn%3Auuid%3A45ee065f-746e-4780-872b-d98cabeb0ad7">https://cn.dataone.org/cn/v2/resolve/urn%3Auuid%3A45ee065f-746e-4780-872b-d98cabeb0ad7</a>><br /><<a class="external" 
href="https://cn.dataone.org/cn/v2/resolve/urn%3Auuid%3A4eb92d77-19f4-4a3a-8468-4022926ea4e2">https://cn.dataone.org/cn/v2/resolve/urn%3Auuid%3A4eb92d77-19f4-4a3a-8468-4022926ea4e2</a>><br /><<a class="external" href="https://cn.dataone.org/cn/v2/resolve/urn%3Auuid%3A6d57e765-32a0-4a3e-ba12-5e681f92b7e5">https://cn.dataone.org/cn/v2/resolve/urn%3Auuid%3A6d57e765-32a0-4a3e-ba12-5e681f92b7e5</a>><br /><<a class="external" href="https://cn.dataone.org/cn/v2/resolve/urn%3Auuid%3A73926857-7d7c-4a6e-bce3-1556bd98df01">https://cn.dataone.org/cn/v2/resolve/urn%3Auuid%3A73926857-7d7c-4a6e-bce3-1556bd98df01</a>><br /><<a class="external" href="https://cn.dataone.org/cn/v2/resolve/urn%3Auuid%3A770eb22d-88bb-4c6f-9016-283f4ff7a518">https://cn.dataone.org/cn/v2/resolve/urn%3Auuid%3A770eb22d-88bb-4c6f-9016-283f4ff7a518</a>><br /><<a class="external" href="https://cn.dataone.org/cn/v2/resolve/urn%3Auuid%3A8539eac4-21f5-4a3a-8c0a-5ad7249cf38c">https://cn.dataone.org/cn/v2/resolve/urn%3Auuid%3A8539eac4-21f5-4a3a-8c0a-5ad7249cf38c</a>><br /><<a class="external" href="https://cn.dataone.org/cn/v2/resolve/urn%3Auuid%3Aae90efa8-3cf5-4ff9-9637-c7be28b06541">https://cn.dataone.org/cn/v2/resolve/urn%3Auuid%3Aae90efa8-3cf5-4ff9-9637-c7be28b06541</a>><br /><<a class="external" href="https://cn.dataone.org/cn/v2/resolve/urn%3Auuid%3Accebed0b-6bdb-4853-ba2a-6e88321ea4d5">https://cn.dataone.org/cn/v2/resolve/urn%3Auuid%3Accebed0b-6bdb-4853-ba2a-6e88321ea4d5</a>><br /><<a class="external" href="https://cn.dataone.org/cn/v2/resolve/urn%3Auuid%3Ad54c9d42-99ce-415b-ac7c-a2b3498eb7af">https://cn.dataone.org/cn/v2/resolve/urn%3Auuid%3Ad54c9d42-99ce-415b-ac7c-a2b3498eb7af</a>> ;<br />...</p>
<p>So the resource map looks correct, which makes sense because it was generated using the R package.</p>
<p>The package view uses the Solr query resourceMap:{RESOURCE_MAP} to fill in the table. If you run this query you see the 11 objects, not 16. This explains the table view not showing all the files.</p>
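<p>Because resource map identifiers contain colons, a fielded Solr query on resourceMap needs the value phrase-quoted (or its special characters escaped) for the standard query parser. A small Python sketch of building such a query and a URL-encoded request follows. The base URL is a placeholder rather than the endpoint MetacatUI actually calls, the helper names are hypothetical, and rows=100 is an arbitrary choice so that more than Solr's default of 10 rows can come back.</p>

```python
"""Sketch: build a phrase-quoted Solr query on the resourceMap field."""

from urllib.parse import quote, urlencode


def resource_map_query(resource_map_id):
    """Phrase-quote the ID so its colons are not read as field separators."""
    escaped = resource_map_id.replace("\\", "\\\\").replace('"', '\\"')
    return f'resourceMap:"{escaped}"'


def solr_url(base_url, query, rows=100):
    """URL-encode q and rows onto a placeholder Solr endpoint."""
    params = urlencode({"q": query, "rows": rows}, quote_via=quote)
    return f"{base_url}?{params}"


rm = "resource_map_urn:uuid:2e3c8c4c-e606-4710-b321-8edc4d506b0a"
print(resource_map_query(rm))
print(solr_url("https://example.org/solr/select", resource_map_query(rm)))
```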
<p>If you look at the documents field of the metadata object's Solr doc, you'll see the 16 objects it documents (itself + 15 data objects).</p>
<p>So what's going on here? Am I wrong to think that it's just the index that is showing the wrong information?</p>
<p>I have forced a reindex with no change<br />I have not checked the arctica logs for any errors</p>
</blockquote>
Bug #6424 (Closed): Obsoleted objects not marked in index (https://projects.ecoinformatics.org/ecoinfo/issues/6424, 2014-02-25T23:59:12Z, ben leinfelder, leinfelder@nceas.ucsb.edu)
<p>Looks like between 2.3.1 and 2.4.0 a lot of indexing code was commented out or removed, and I believe some of it was responsible for updating indexed documents when newer versions are added.<br />I don't see anything that will update the Solr index for the object identified by an entry in SM.obsoletes, so we end up with two versions of the same object in the index (and the UI).</p>
Task #5923 (Closed): Discover missed documents and queue them for indexing (https://projects.ecoinformatics.org/ecoinfo/issues/5923, 2013-04-25T18:05:51Z, Jing Tao, tao@nceas.ucsb.edu)
<p>Metacat-index needs a mechanism to regenerate the Solr index entries for documents that were missed during insert or update. The same mechanism will also be used to generate the index when upgrading Metacat from 2.0.6 or an earlier version to 2.1.0.</p>
Feature #5910 (Closed): Build index from scratch (https://projects.ecoinformatics.org/ecoinfo/issues/5910, 2013-04-12T16:24:18Z, ben leinfelder, leinfelder@nceas.ucsb.edu)
<p>The index will need to be completely built when:<br />- Initial upgrade to a version that supports Solr indexing<br />- Unexpected loss of the index<br />- Major change to the Solr schema</p>