Bug #7188
Updated by Chris Jones over 7 years ago
Laura Moyers reported that she is seeing many failed replication attempts in the Coordinating Node index. In particular, KNB, GOA, UIC, ARCTIC, mnUCSB1, and mnORC1 are all affected, and are all running Metacat. After looking at catalina.out on the MNs, we're seeing errors in @MNodeService.replicate()@:

<pre>
20170508-06:59:14: [ERROR]: Error computing checksum on replica: mark/reset not supported [edu.ucsb.nceas.metacat.dataone.MNodeService]
</pre>

Here's the number of requests and failures:

<pre>
host        requests  failures  failures_since
-----------------------------------------------------------
mn-orc-1    145       25        20170511-01:23:48
mn-ucsb-1   105       56        20170508-06:59:14
mn-unm-1    0         0         -
knb         71        28        20170509-16:57:53
uic         no log access
</pre>

I'm pretty sure the failures represent 100% of the requests since the failures began, but we'd need to confirm this. Basically, MN replication looks to be entirely broken in Metacat.

The error reported above comes from line 866 of @MNodeService.java@, where the checksum of the bytes of the object from the source MN (to be replicated) is calculated. Once the checksum is calculated, we call @object.reset()@ on the input stream so it can be read again when writing to disk. This is throwing the exception above.

So what's changed? The most recent change involving the @InputStream@ was that Jing wrapped the calls in a @try{ } finally { }@ block to ensure the input stream gets closed after use and prevent memory leaks. This doesn't seem like an issue at all, although the @finally{ }@ block could have been added to the existing @try { }@ block instead of introducing three levels of @try@ nesting. That seems inconsequential, though. The other change is that @d1_libclient_java@ is now using the Apache Commons IO @AutoCloseInputStream@. Looking at the documentation there, it seems to delegate to the underlying input stream implementation.
We know that not all input streams support the @mark()@ method and therefore can't be @reset()@, which is why we call @markSupported()@ before attempting to calculate the checksum. So why is @markSupported()@ succeeding, but then @reset()@ is failing after reading the input stream? It seems like we need to track this down in the interaction between @MNodeService@ and @MultipartMNode.getReplica()@.
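One way this combination could arise, offered only as a hypothesis to test and not as a confirmed diagnosis: if a delegating wrapper swaps out its underlying stream once end-of-stream is reached, @markSupported()@ would answer for the original stream before the read and @reset()@ would hit the replacement afterwards. The sketch below uses only the JDK. @AutoClosingStream@ and @probe()@ are hypothetical names invented for illustration, and the swap-on-EOF behavior is an assumption that has not been checked against the real @AutoCloseInputStream@ source. The one thing that is certain is that the default @java.io.InputStream.reset()@ throws an @IOException@ with the message "mark/reset not supported", which matches the text in our log.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class MarkResetSketch {

    // Hypothetical stand-in for an auto-closing wrapper.
    // ASSUMPTION: at end of stream it closes the wrapped stream and swaps in
    // a placeholder that inherits InputStream's default (unsupported) mark/reset.
    static class AutoClosingStream extends FilterInputStream {
        AutoClosingStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b = in.read();
            if (b == -1) {
                swapInClosedPlaceholder();
            }
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = in.read(buf, off, len);
            if (n == -1) {
                swapInClosedPlaceholder();
            }
            return n;
        }

        private void swapInClosedPlaceholder() throws IOException {
            in.close();
            // The placeholder always reports EOF and does not override
            // markSupported() or reset(), so the InputStream defaults apply.
            in = new InputStream() {
                @Override
                public int read() {
                    return -1;
                }
            };
        }
    }

    // Mirrors the sequence in the ticket: check markSupported(), mark,
    // read to EOF (as the checksum loop would), then attempt reset().
    static String probe(InputStream object) throws IOException {
        boolean before = object.markSupported();
        object.mark(Integer.MAX_VALUE);
        byte[] buf = new byte[1024];
        while (object.read(buf) != -1) {
            // a real caller would feed these bytes to the checksum here
        }
        boolean after = object.markSupported();
        try {
            object.reset();
            return "before=" + before + " after=" + after + " reset=ok";
        } catch (IOException e) {
            return "before=" + before + " after=" + after
                    + " reset failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3};
        System.out.println(probe(new ByteArrayInputStream(data)));
        System.out.println(probe(new AutoClosingStream(new ByteArrayInputStream(data))));
    }
}
```

If the real stream returned by @MultipartMNode.getReplica()@ behaves like the hypothetical wrapper, logging @markSupported()@ both before and after the checksum read in @MNodeService@ should show it flipping from true to false, which would be a cheap first check.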