Bug #4637
closed
Metacat Harvester fails to catch some insert and update failures
0%
Description
Metacat Harvester is not catching all insert and update errors.
Recently at LTER, a handful of documents have been reported by Metacat Harvester as successful inserts or updates when in fact they were not inserted or updated in Metacat.
The logic error is in the method HarvesterDocument.putMetacatDocument(). Harvester treats the absence of an exception as success, when it should instead require explicit confirmation from the Metacat client that the insert or update operation succeeded:
if (harvester.connectToMetacat()) {
  try {
    if (insert) {
      metacatReturnString = metacat.insert(docidFull, stringReader, null);
      inserted = true;
      harvester.addLogEntry(0, docidFull + " : " + metacatReturnString,
                            "harvester.InsertDocSuccess",
                            harvestSiteSchedule.siteScheduleID,
                            null, "");
    }
    else if (update) {
      metacatReturnString = metacat.update(docidFull, stringReader, null);
      updated = true;
      harvester.addLogEntry(0, docidFull + " : " + metacatReturnString,
                            "harvester.UpdateDocSuccess",
                            harvestSiteSchedule.siteScheduleID,
                            null, "");
    }
  }
  catch (MetacatInaccessibleException e) {
    logMetacatError(insert, metacatReturnString,
                    "MetacatInaccessibleException", e);
  }
  catch (InsufficientKarmaException e) {
    logMetacatError(insert, metacatReturnString,
                    "InsufficientKarmaException", e);
  }
  catch (MetacatException e) {
    logMetacatError(insert, metacatReturnString, "MetacatException", e);
  }
  catch (IOException e) {
    logMetacatError(insert, metacatReturnString, "IOException", e);
  }
Harvester does not check the value of the string returned by Metacat ('metacatReturnString' in the above code). In the cases where the insert/update operations have been failing, the return string is empty or null. Harvester should examine the return string to confirm that it contains the substring "<success>" or something similar.
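One possible shape for that check is sketched below. It is only an illustration: the class MetacatResponseCheck, the method isConfirmedSuccess(), and the exact marker string are hypothetical names chosen for this example and are not taken from the Harvester source. The marker "<success>" follows the suggestion above and would need to be confirmed against what Metacat actually returns.

```java
// Hypothetical helper, not part of the existing Harvester code.
final class MetacatResponseCheck {

    // Assumed success marker, following the "<success>" suggestion in this report.
    private static final String SUCCESS_MARKER = "<success>";

    private MetacatResponseCheck() {
        // Static utility class; not meant to be instantiated.
    }

    /**
     * Returns true only when the Metacat return string is non-null and
     * contains the success marker. A null or empty string, which is what
     * the failing inserts/updates produce, is treated as a failure.
     */
    static boolean isConfirmedSuccess(String metacatReturnString) {
        return metacatReturnString != null
            && metacatReturnString.contains(SUCCESS_MARKER);
    }
}
```

With a check like this, putMetacatDocument() would set inserted or updated to true and write the InsertDocSuccess or UpdateDocSuccess log entry only when isConfirmedSuccess(metacatReturnString) returns true, and would otherwise log the operation as an error even though no exception was thrown.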
The fact that no exception is thrown by Metacat could point to an additional problem in Metacat, since the insert/update operation completes without raising an exception even though the document is not inserted or updated. The documents that appear to trigger this condition are unusually large EML documents (currently there are three documents from CDR and one document from LUQ that trigger this bug).
After the Harvester bug is resolved, or as part of resolving it, further investigation should determine whether a Metacat bug is also involved here. If there is one, a separate bug should be filed for it.