Bug #7188

Updated by Chris Jones almost 7 years ago

Laura Moyers reported that she is seeing many failed replication attempts in the Coordinating Node index. In particular, KNB, GOA, UIC, ARCTIC, mnUCSB1, and mnORC1 are all affected, and all of them are running Metacat.

 After looking at catalina.out on the MNs, we're seeing errors in @MNodeService.replicate()@: 
 <pre> 
 20170508-06:59:14: [ERROR]: Error computing checksum on replica: mark/reset not supported [edu.ucsb.nceas.metacat.dataone.MNodeService] 
 </pre> 

 Here are the request and failure counts per host:
 <pre> 
 host          requests          failures        failures_since 
 ----------------------------------------------------------- 
 mn-orc-1      145               25              20170511-01:23:48 
 mn-ucsb-1     105               56              20170508-06:59:14 
 mn-unm-1      0                 0               - 
 knb           71                28              20170509-16:57:53 
 uic           no log access 
 </pre> 

 I'm pretty sure that, on each host, every request has failed since the failures began, but we'd need to confirm this. Basically, MN replication looks to be entirely broken in Metacat.

 The error reported above comes from line 866 of @MNodeService.java@, where the checksum of the bytes of the object from the source MN (to be replicated) is calculated. Once the checksum is calculated, we call @object.reset()@ on the input stream so it can be read again when writing to disk. This is throwing the exception above.
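 For reference, here is a minimal sketch of the checksum-then-rewind pattern described above. It is not the Metacat code: the class @ChecksumThenReset@ and the helper @checksumAndRewind()@ are hypothetical names, and the sketch assumes the object is small enough to buffer in memory.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class ChecksumThenReset {

    // Hypothetical helper (not from MNodeService): digest the whole stream,
    // then rewind it so the same bytes can be read again, e.g. to write to disk.
    static String checksumAndRewind(InputStream object, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        // Guard first: not every InputStream can be rewound.
        if (!object.markSupported()) {
            throw new IOException("stream cannot be rewound; buffer it first");
        }
        object.mark(Integer.MAX_VALUE); // remember the start of the stream

        MessageDigest digest = MessageDigest.getInstance(algorithm);
        byte[] buf = new byte[4096];
        int n;
        while ((n = object.read(buf)) != -1) {
            digest.update(buf, 0, n);
        }

        object.reset(); // rewind to the mark for the second read

        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        InputStream object = new BufferedInputStream(
                new ByteArrayInputStream("replica bytes".getBytes("UTF-8")));
        System.out.println(checksumAndRewind(object, "MD5"));
        // The stream is positioned at the start again after the rewind.
        System.out.println((char) object.read()); // prints "r"
    }
}
```

 With a plain @BufferedInputStream@ the rewind works, so the pattern itself is sound as long as the stream still honours @reset()@ after it has been read to the end.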

 So what's changed? The last change involving the @InputStream@ was that Jing wrapped the calls in a @try{ } finally { }@ block to ensure the input stream gets closed after use and to prevent memory leaks. This doesn't seem like an issue at all, although the @finally{ }@ block could have been added to the existing @try { }@ block instead of creating three levels of @try@ nesting. This seems inconsequential though.

 The other change is that @d1_libclient_java@ is now using the Apache Commons IO @AutoCloseInputStream@. Looking at the documentation there, it seems to delegate to the underlying input stream implementation. We know that not all input streams support the @mark()@ method and therefore can't be @reset()@, which is why we call @markSupported()@ before attempting to calculate the checksum. So why is @markSupported()@ succeeding, but then @reset()@ is failing after reading the input stream? It seems like we need to track this down in the interaction between @MNodeService@ and @MultipartMNode.getReplica()@.
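 One way this combination could arise, sketched below as a hypothesis rather than a confirmed diagnosis: a wrapper that delegates @markSupported()@ and @reset()@ to its underlying stream, but swaps that stream for a closed placeholder once it hits end-of-stream. @CloseAtEofStream@ is a hypothetical stand-in written for this example; it is not the Commons IO class, and whether @AutoCloseInputStream@ behaves this way in the version we ship still needs to be checked.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class MarkResetDemo {

    // Hypothetical stand-in for an auto-closing wrapper (assumption, not Commons IO).
    static class CloseAtEofStream extends FilterInputStream {
        CloseAtEofStream(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = in.read(buf, off, len);
            if (n == -1) {
                in.close();
                // Replace the delegate with an empty stream that only inherits
                // java.io.InputStream's default mark/reset behaviour.
                in = new InputStream() {
                    @Override
                    public int read() {
                        return -1;
                    }
                };
            }
            return n;
        }
    }

    // Check markSupported() up front, read to the end, then try to rewind.
    static String probe(InputStream object) throws IOException {
        boolean supportedBefore = object.markSupported();
        object.mark(Integer.MAX_VALUE);
        byte[] buf = new byte[8];
        while (object.read(buf, 0, buf.length) != -1) {
            // drain the stream, as a checksum calculation would
        }
        try {
            object.reset();
            return supportedBefore + " / reset ok";
        } catch (IOException e) {
            return supportedBefore + " / " + e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "replica bytes".getBytes();
        System.out.println(probe(new ByteArrayInputStream(data)));
        System.out.println(probe(new CloseAtEofStream(new ByteArrayInputStream(data))));
    }
}
```

 In this sketch the plain stream rewinds fine, while the wrapped one reports @markSupported()@ as true before the read and then fails on @reset()@ with the default @java.io.InputStream@ message, "mark/reset not supported". If the real wrapper does something similar, checking @markSupported()@ before the read would not protect us.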
