Oops, previous commit suffered from a happy trigger finger. During deleteReplicationMetadata(), don't delete the replica on the replica Member Node. Call CN.delete() for that functionality. This call just updates sytem metadata (according to the API description).
In setReplicationStatus() and UpdateReplicationMetadata(), don't allow a status state change from COMPLETED to anything other than INVALIDATED. This prevents the completed status from being overwritten due to race conditions.
Throw an exception when NOT allowed, not when allowed =).
Add a few logging statemnts for round trip replication metrics.
remove exception from method decl - was not matching the interface def and not compiling.
implement MN and CN.archive() method -- really just the existing delete() methods.https://redmine.dataone.org/issues/2674https://redmine.dataone.org/issues/2675
call MN.delete() for each replica when CN.delete() is calledhttps://redmine.dataone.org/issues/2676
include Session-less interface methods and updated jars that define them.
remove extraneous pid and permission parameters from isAdminAuthorized() method and make public so that it can be called in other locations - namely before our asynchronous replicate() implementation on the MN.
check for empty null (missing) node.subjectList. This should probably be a required element in the D1 schema, but it appears not. (ORNL entry was missing subjects in cn-dev environment)
just use the e.getMessage() as e.getCause() may be null (seeing NPE when testing via the MN IT tester)
Also allow MNs to set the FAILED status in setReplicationStatus(). this was an oversight on my part, trying to keep MNs that truly did succeed from overriding the COMPLETED status with FAILED.
use isAdminAuthorized() to check access to CN.create(). Note this method takes a pid and permission parameter and neither is used. Also removed the NotFound exception because it would never come up.
check that caller is CN/admin for CN.delete()https://redmine.dataone.org/issues/2506
include CN.delete()https://redmine.dataone.org/issues/2506
Notify each replica MN when critical portions of system metadata change so the MN can pull the latest copy into its store. AccessPolicy and RightsHolder changes are the most critical for the MN to keep updated on.
Modify CNodeService.setReplicationStatus() slightly to restrict MN-based calls to only set the status to COMPLETED. The CNs should be setting failures or invalidations, or the status can remain at QUEUED or REQUESTED, and the MNAuditTask can revisit those replicas as needed.
Add a notifyReplicaNodes() method that calls MNStorage.systemMetadataChanged() on MN replica nodes for a given object identifier. This will be called when there are changes to AccessPolicy and rights holder since these are critical access metadata for an MN, but they can only be changed on the CN.
In setReplicationStatus(), first check for a replica target MN subject match with the session subject. If this fails, look to see if CN admin access is allowed. Otherwise throw NotAuthorized. Addresses https://redmine.dataone.org/issues/2494
Remove individual calls to isAdminAuthorized() in favor of the centralized isAuthorized() call that handles it now.
check for null Session before continuing with setReplicationStatus()https://redmine.dataone.org/issues/2476#note-3
throw not authorized when attempting to getReplica as an invalid/non-existent node
throw InvalidToken when an invalid Permission is passed in. THis requires that internal calls to the method also check for this exception.https://redmine.dataone.org/issues/2388
CNodeService.listChecksumAlgorithms() was returning null rather than the list. Fixed.
use RC-1 Dataone jars
For MNs that haven't set the archived flag to false on create(), set it here. Also, ensure that the CN sync code sets the authoritative and origin member node fields.
include new methods needed for replication (in new d1 jars)https://redmine.dataone.org/issues/2203
add method: setObsoletedBy (https://redmine.dataone.org/issues/2185)augement new method: deleteReplicationMetadata
add method: deleteReplicationMetadataremove method: assertRelationupdate the D1 jarshttps://redmine.dataone.org/issues/2187https://redmine.dataone.org/issues/2158
Simplify setReplicationStatus() to not call updateReplicationMetadata() if a replica doesn't exist. Just create it and update the system metadata, which we already have a lock for.
Update the CN methods to throw a VersionMismatch where the API changed (where serialVersion is a required parameter). These were previously throwing an InvalidRequest exception.Change the exception handling for calls to Hazelcast to catch a RuntimeException (not Exception) so we don't catch exceptions that we purposefully throw....
Don't lock() on the map.get() in isNodeAuthorized() (this assumes that the CN has queued the task already). Add more lock/unlock debug statements, and fix setReplicationStatus() - I missed a finally statement to unlock the pid.
Modify CNReplication methods setReplicationStatus(), updateReplicationMetadata() and setReplicationPolicy() to allow administrative access from a Coordinating Node by calling isAdminAuthorized().
In registerSystemMetadata(), lock the pid prior to calling map.containsKey(pid) since a put to the map could occur between the check and the subsequent put().
Use Lock instead of ILock to be consistent across classes.
After reviewing CNodeService and D1NodeService prompted by Robert comparing the Hazelcast locking with the d1_synchronization locking, I've made a number of changes that will prevent locking problems:
1) Multiple methods contained try/catch blocks that would:...
only attempt to unlock a lock if it was created (in the finally block)
new jars with many changes -- including new CN methods: ping, describe, listChecksumAlgorithm. Removed MN.setAccessPolicy. Refactored CN.setOwner() to CN.setRightsHolder().
Change setReplicationStatus() to drop serialVersion and report the failure exception message in the CN log.
updated D1 API -- removed Permission.REPLICATE and associated parameters
If a member node cannot be found in the node list matching the targetNodeSubject given in isNodeAuthorized(), throw a ServiceFailure exception.
Add log statements for each call to ILock.unlock() for debugging.
When using ILock.lock(), get a lock on the string value of the Identifier, not the Identifier object itself. Hazelcast locking won't work otherwise.
Use the Hazelcast ILock mechanism to lock the system metadata identifier rather than using IMap.lock(pid).
when comparing D1 Subject objects, use the equals() method not direct string comparisonhttps://redmine.dataone.org/issues/2050
access nodeList list correctlyhttps://redmine.dataone.org/issues/2049
Use Subject.equals() when comparing DNs rather than CertificateManager.equalsDN(). Don't lock the pid in isNodeAuthorized() to debug for timeout issues. Minor debugging changes.
Minor logging for isNodeAuthorized(), and compare subjects properly. Change this to Subject.compareTo() when it is vetted.
Catch RuntimeExceptions thrown by Hazelcast as opposed to general Exceptions to we don't catch exceptions we're trying to throw.
generalize exception handling -- add cause detail
Changes to setReplicationStatus and isNodeAuthorized(), working out minor bugs in replication.
include exception cause when throwing new exception (combine RuntimeException in Exception handling -- they are almst identical)
Calls to setReplicationStatus() can only be made by a CN or the MN that is the target replica node. Implement this service restriction in CNodeService using CertificateManager's equalsDN() method.
Added stack trace debugging for CNodeService.isNodeAuthorized() for tracking down replication issues.
Fix cast to List<Node> in isNodeAuthorized().
upgrade to 1.0.1-SNAPSHOT DataONE jars
Update CNodeService to use the serialVersion parameter and compare it to the current serialVersion of the system metadata found in the hzSystemMetadata map. Throw an InvalidRequest exception if they are not equal. This affects updateReplicationMetadata(), setReplicationStatus(), setReplicationPolicy(), setAccessPolicy(), and setOwner().
Add updateReplicationMetadata() to the CN service implementation. This was missing from the API, and likely never called. It fully replaces the given replica item in the list of replicas in system metadata.
Minor indentation cleanup.
Add setAccessPolicy() to CNodeService since the CN should only make changes to access policies for objects registered with the D1 system. Increment the serial version after locling and getting the most up to fdate system metadata. Note: CCIT meeting decision says the serial version of the system metadata (during the change) should equal the current serial version, but setAccessPolicy() does not pass in the entire system metadata object, so there's no way to check. For now, increment the latest system metadata from the hzSystemMetadata map.
In CNodeService, separate the CN.create() functionality from the MN.create() functionality while still using the superclass to call create(). Deal with Hazelcast locks and setting serial versions only in the CN implementation.
Change updateSystemMetadata() to evaluate the incoming system metadata serial version against that found in the hzSystemMetadata map. If they are the same, do the update. If not, throw an InvalidRequest explaining that they need the most current version.
Modify CNodeService's registerSystemMetadata() with support for SystemMetadata's serialVersion field. Also, use the hzSystemMetadata map for all system metadata reads using a lock on the pid in order to get the very latest version. This affected isNodeAuthorized(), getChecksum(), and assertRelation(). Since we're using Hazelcast, exceptions are masked as RuntimeException, so throw a ServiceFailure with the underlying message.
Modify CNodeService's updateSystemMetadata(), setReplicationStatus(), setReplicationPolicy(), and setOwner() with support for SystemMetadata's serialVersion field. Other methods still pending an update. Use the hzSystemMetadata map for all system metadata reads using a lock on the pid in order to get the very latest version.
add User-Agent logging to support D1 requirements
log errors on create() and registerSM
Don't use the hzNodes map yet (as a hazelcast client). Use D1Client instead to get the node list in isNodeAuthorized().
Reverting previous @Overrides chanrge from r6470, as that is the desiredbehavior under Java 1.6 -- previous versions of Java (e.g., 1.5) will notcomile with this usage of the @Overrides annotation, but the currentlysupported version will. So reverting to the 1.6 convention.
Removing incorrect @Override annotations that were preventing compilation. The methods marked did not actually override a method in the superclass, so they were not compiling. I think @Overrides was being mistaken for methods that implement an interface but aren't actually in the superclass.
catch runtime exceptions that arise from hazelcast storage errors in the system metadata map
Lock the system metadata entry in hzSystemMetadata when calling setReplicationPolicy().
Lock the system metadata entry in hzSystemMetadata when calling registerSystemMetadata().
Remove references to CNReplicationTask.
Change isNodeAuthorized() to query the hzSystemMetadata map rather than the hzPendingreplicationTasks map. The latter isn't needed for authorization since the ReplicationStatus for each Replica in SystemMetadata lists the status of the replica and can be queried.
move CNReplicationTask to the hazelcast package
only "save" to the shared system metadata map - not directly to the table store.
rely on Hazelcast to store the SystemMetadata locally for the node. Entry event listeners store the shared system metadata on their local node when alerted. TODO: remove old replication code that included system metadata xml when replicating scimeta and data
move bulk of the Hazelcast code into HazelcastService from CNodeService so that it is centrall located - easier to manage and configure
initialize Hazelcast from the custom configuration when initializing the Metacat service.
handle entryAdded and entryUpdated the same - update the entry if it exists, otherwise create it
handle entryAdded (to hzSystemMetadata) to store sysmeta to local store when it is not already present
add code to handle new entry when it is not on the local member of the sysmeta cluster
comment out processCluster connections that use hzClient until that is finalized
use pending replication task queue to check if node is authorized for replication. moved from old ReplicationService code
Add stub methods in CNodeService that implement the Hazelcast EntryListener interface: entryAdded(), entryRemoved(), entryUpdated(), and entryEvicted(). Add a listener to the hzSystemMetadata map so the CNodeService can respond to those events and create appropriate CNReplicationTask objects for distributed execution across the CN cluster. Again, stubs only so far.
Minor cleanup - tabs to spaces.
Enable CNodeService to access 1) the hzNodes map defined in the DataONE process cluster by becoming a Hazelcast client (hzClient) to that cluster and 2) the hzSystemMetadata map defined in the DataONE storage cluster by becoming a member to that cluster (using direct Hazelcast calls). Added fields for maintaining the DataONE cluster properties.
changes for schema update (d1_common)
Update classes to use the DataONE 0.6.4 schema and types. Major changes involve using BigInteger vs long in SystemMetadata.size, and using ObjectFormatIdentifier rather than Object format.
latest D1 jars - changes include:updateSystemMetadata() impl for CNnew identifier methods (generate is its own method)removal of the resourceMap pointer from system metadata
use new "v1" types from DataONE
add hasReservation() method (NotImplemented, however)
use objectFormatIdentifier for listObjects()remove provisional system metadata indicator - Metacat will not implement reserveIdentifier()
allow very minimal system metadata for provisional entries (CN.reserveIdentifier)
remove resolve() test -- not implemented in Metacat
catch exceptions from system meta data query and throw service failure rather than swallowing them with an error msg
use super class' create() methoduse string comparison for assertRelation method
return all public objects for the search() method [for now]
add the old ecogrid query code (still commented out) from the old Rest handler