use the revision provided in the docid when looking up the guid. We had been using the latest revision, which I think reports incorrectly on the log history. Noticed this when looking at: https://redmine.dataone.org/issues/2444
A minor change to isAuthorized() - compare each Person in the SubjectInfo (not just the primary Subject) since each person could have an equivalent identity mapped to the primary Subject. Add debug logging for the comparison.
added debug logging. https://redmine.dataone.org/issues/2429
check if the verified flag is null before evaluating it (NPE during MN Auth test). https://redmine.dataone.org/issues/2429
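A minimal sketch of the null-safe check. It assumes the flag is exposed as a boxed `Boolean`; the class and method names are hypothetical, not Metacat's. Unboxing a null `Boolean` inside an `if` throws a NullPointerException, but `Boolean.TRUE.equals(...)` just returns false.

```java
/** Hypothetical helper: treat a missing (null) "verified" flag as not verified. */
final class VerifiedCheck {
    private VerifiedCheck() {}

    static boolean isVerified(Boolean verified) {
        // Boolean.TRUE.equals(null) is false, so no auto-unboxing NPE can occur
        return Boolean.TRUE.equals(verified);
    }
}
```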
Globally change the property 'dataone.memberNodeId' to 'dataone.nodeId'. This is more useful for both MNs and CNs implemented in Metacat. Also, change D1NodeService.getLogRecords() to return log entries with the actual node id rather than the IP address (looks like a cut/paste error)....
throw InvalidToken when an invalid Permission is passed in. This requires that internal calls to the method also check for this exception. https://redmine.dataone.org/issues/2388
do not allow blank node references to be used. https://redmine.dataone.org/issues/2362
only generate system metadata when the call comes from the legacy Metacat API, not the D1 API. https://redmine.dataone.org/issues/2362 (I think this was the culprit)
Get ReplicationPolicy correctly generated: tweak the regular expression for getting the preferred/blocked node list for the default replication policy; set the blocked list (there had mistakenly been two calls to set the preferred list).
actually, let's set the serialVersion during the MN.create() call so that the HZ map and the backing store have the same information immediately. Also, this is how the docs specify it. http://mule1.dataone.org/ArchitectureDocs-current/design/SystemMetadata.html
remove the ID mapping when a create()/"insert" call fails so that subsequent calls do not return an IdentifierNotUnique error. In this case the failure was due to invalid XML. https://redmine.dataone.org/issues/2341
use RC-3 DataONE jars and fix compilation error that arose. https://redmine.dataone.org/issues/2351
CNodeService.listChecksumAlgorithms() was returning null rather than the list. Fixed.
Update D1NodeService to reflect new ObjectFormatCache signature.
1. look up and use the guid when processing obsoletes/obsoletedBy entries -- we had previously been assuming localId==guid, but now that we have introduced DOIs as part of the Metacat upgrade process, we may have DOI guids that map to localIds. 2. base ORE guids on the localId of the data package they describe and not on its DOI -- otherwise we might mangle the DOI prefix (or another id scheme that we are unaware of). Using resourceMapPrefix + localId ensures a valid localId and guid for the ORE map we create and add to the system.
use updated authorization policies as discussed in: https://redmine.dataone.org/issues/2277 and http://epad.dataone.org/20120131-authn-authz-questions
remove the createAndInsertSystemMetadata() method that acts on a single localId -- incorporated this into the localId-list-based method.
refactor IdentifierManager.createSystemMetadata(sm) to be insertSystemMetadata(sm) so that it is clear that this method inserts the SM object into the backing store. This differentiates it from the "generation" methods we use when we need to create SM for pre-existing objects or objects we get from non-D1 API calls.
generate SystemMetadata during D1 registration (not 2.0.0 upgrade). This process runs in a thread and updates a metacat.properties value when it is complete.
match changes to MN service methods (return type as boolean)
Added new methods to generate a default replication policy based on properties from the metacat configuration. This is called during system metadata creation for objects that lack any system metadata.
Clean up warnings in class.
handle "BIN" objects so as to avoid repeated calls to lookup the non-existent ObjectFormat
catch cases where the previous/next revision of objects have not had system metadata generated yet
create system metadata object if it wasn't found in HZ
multithreaded implementation for processing docids for system metadata generation. Need to investigate why running under ant/junit deadlocks Hazelcast (config?)
additional logging of the config file being used - there seems to be thread locking on the xmlConfig use when running under ant/junit
calculate object size using the size on the file system rather than re-reading it as an input stream. Now only EML document bytes will be read twice: once for the checksum and again for parsing out data package details
system metadata generation optionally skips entries that have already been generated (data size, checksum) but allows the latest EML that describes them to have the last word on object format
remove DML for parsing -- the D1 EML parser still uses DOM, so this may not be much of a performance improvement
fix a bug in MNodeService.replicate() where the checksum value was being compared to the computed checksum object, not its value.
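A sketch of the bug and the fix, using a hypothetical `SimpleChecksum` stand-in for the DataONE checksum type (assumed to expose `getValue()`). Calling `String.equals()` with the checksum object compiles, but it can never return true.

```java
/** Hypothetical stand-in for a checksum type that pairs an algorithm name with a digest value. */
final class SimpleChecksum {
    private final String algorithm;
    private final String value;

    SimpleChecksum(String algorithm, String value) {
        this.algorithm = algorithm;
        this.value = value;
    }

    String getAlgorithm() { return algorithm; }
    String getValue() { return value; }
}

final class ChecksumCompare {
    private ChecksumCompare() {}

    /** The bug: a String is never equal to a SimpleChecksum object, so this is always false. */
    static boolean buggyMatch(String expectedValue, SimpleChecksum computed) {
        return expectedValue.equals(computed);
    }

    /** The fix: compare the expected value with the computed checksum's value. */
    static boolean fixedMatch(String expectedValue, SimpleChecksum computed) {
        return expectedValue.equals(computed.getValue());
    }
}
```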
use UTC serialization for log entries so that the timestamp, not just the date, is preserved. https://redmine.dataone.org/issues/2257
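The difference can be illustrated with `java.time`. This is only an illustration and not necessarily how Metacat serializes log entries. Formatting only the date discards the time of day, while an ISO-8601 instant in UTC keeps it.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Illustrative formatters; the class name is hypothetical. */
final class LogTimestamp {
    private LogTimestamp() {}

    /** Date-only output: the time of day is lost. */
    static String dateOnly(Instant t) {
        return DateTimeFormatter.ISO_LOCAL_DATE.withZone(ZoneOffset.UTC).format(t);
    }

    /** Full UTC timestamp, e.g. 2012-02-01T13:45:30Z. */
    static String utcTimestamp(Instant t) {
        return DateTimeFormatter.ISO_INSTANT.format(t);
    }
}
```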
In MN.getCapabilities(), the required contact subject was not being added to the node instance from the dataone properties. Add it in.
use RC-1 Dataone jars
try to read the local document before making the localid->guid mapping (to handle cases where we fail to read the data locally, e.g. when it is referenced in an EML file but does not exist on this Metacat instance)
For MNs that haven't set the archived flag to false on create(), set it here. Also, ensure that the CN sync code sets the authoritative and origin member node fields.
On MN.create(), set the archived flag to be false. This field isn't required in the schema, but is needed by the DataONE indexer once objects are sync'd.
refactor the system metadata generation loop into the factory class -- to be reused in sysmeta and ORE generation. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5522
When managing obsoletes/obsoletedBy system metadata fields, set the archived flag to false initially, and set it to true on system metadata for objects that a revision obsoletes.
check that the resourceMap (based on Id only) does not currently exist in the local metacat when generating OREs
In IdentifierManager.updateSystemMetadata(), add a check for invalid system metadata (fields that throw a NullPointerException on access) to ensure that system metadata is populated correctly. Updated calling classes to handle the exception.
Handle SQLExceptions when trying to save system metadata locally.
Convert SQLExceptions to RuntimeExceptions for Hazelcast MapStore operations.
Keep the hzIdentifiers set in sync with the Metacat systemmetadata table. If entries are added/updated in the hzSystemMetadata map, make sure the identifier is in the set. If (for some administrative reason) the entry is removed, remove the identifier from the set. This usually doesn't happen.
When loading all keys from Metacat into the hzSystemMetadata map, also load identifiers into the hzIdentifiers set if they are not already there. Although entries may be evicted from the map, the list of identifiers will remain. The list will have a fairly small memory footprint since it's just identifiers.
Add support for the distributed Set of unique identifiers in the storage cluster called 'hzIdentifiers'. This set is a persistent total list of all identifiers (even when entries in the hzSystemMetadata map are evicted). It reflects the state of the identifiers in the postgresql systemmetadata table, but is distributed across the cluster. Add the getIdentifiers() method, which returns the ISet of identifiers.
include new methods needed for replication (in new d1 jars). https://redmine.dataone.org/issues/2203
add method: setObsoletedBy (https://redmine.dataone.org/issues/2185); augment new method: deleteReplicationMetadata
add method: deleteReplicationMetadata; remove method: assertRelation; update the D1 jars. https://redmine.dataone.org/issues/2187 https://redmine.dataone.org/issues/2158
Simplify setReplicationStatus() to not call updateReplicationMetadata() if a replica doesn't exist. Just create it and update the system metadata, which we already have a lock for.
Minor null checks to avoid NPEs when calling replicate()
Don't throw a NotAuthorized exception in isAdminAuthorized() - just return false.
do not download and save remote data resources that are HTML but are not expected to be (login or info/splash pages shown before the data content). http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5522
Update the CN methods to throw a VersionMismatch where the API changed (where serialVersion is a required parameter). These were previously throwing an InvalidRequest exception. Change the exception handling for calls to Hazelcast to catch a RuntimeException (not Exception) so we don't catch exceptions that we purposefully throw....
Use a Logger instead of System.out for SystemMetadataMap.
Don't lock() on the map.get() in isNodeAuthorized() (this assumes that the CN has queued the task already). Add more lock/unlock debug statements, and fix setReplicationStatus() - I missed a finally statement to unlock the pid.
Modify CNReplication methods setReplicationStatus(), updateReplicationMetadata() and setReplicationPolicy() to allow administrative access from a Coordinating Node by calling isAdminAuthorized().
Add isAdminAuthorized() to D1NodeService to check if the operation is being requested from a CN. Consult the NodeList from the CN and test the NodeType of the given node and the X509 certificate Subject. Perhaps we should expand this to also check for service-level access in the future.
In registerSystemMetadata(), lock the pid prior to calling map.containsKey(pid) since a put to the map could occur between the check and the subsequent put().
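A minimal sketch of the check-then-act fix using a plain `java.util.concurrent` lock. It assumes a local `HashMap` and a single `ReentrantLock` standing in for the Hazelcast map and per-pid cluster lock; `SysMetaRegistry` and `register()` are hypothetical names. The lock is taken before `containsKey()`, so no other `put()` can land between the check and the write.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for the system metadata map; one lock stands in for a per-pid cluster lock. */
final class SysMetaRegistry {
    private final Map<String, String> sysMetaByPid = new HashMap<>();
    private final Lock lock = new ReentrantLock();

    /** Returns true if the pid was newly registered, false if it was already present. */
    boolean register(String pid, String sysMeta) {
        // Acquire the lock BEFORE checking, so the check and the put() are atomic together
        lock.lock();
        try {
            if (sysMetaByPid.containsKey(pid)) {
                return false;
            }
            sysMetaByPid.put(pid, sysMeta);
            return true;
        } finally {
            lock.unlock();
        }
    }
}
```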
Use Lock instead of ILock to be consistent across classes.
After reviewing CNodeService and D1NodeService prompted by Robert comparing the Hazelcast locking with the d1_synchronization locking, I've made a number of changes that will prevent locking problems:
1) Multiple methods contained try/catch blocks that would:...
use inherited access control from EML for the data file we download from a remote source. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5522
download remote data and save it locally when it is referenced by an EML package, then include it in the ORE map. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5522
expand permissions on the existing access rule, not on the permission being checked (hierarchical permissions).
Make sure the local id isn't null when we try to get the object from the local instance.
Simplify the error handling, and throw the exception once the CN is updated with the new status.
Set the replica status to failed (not invalidated) when we get exceptions trying to read the object bytes. Not much of a difference, but only the CN, in theory, is supposed to be able to set the invalidated status.
Set the replication status to invalidated when we have a localId, but getting the object bytes fails for any reason.
Only call super.create() if there's no localId found on the MN (i.e. skip it when a replica is already there from an out-of-band process).
Get the object inputstream from the local metacat instance using MetacatHandler.get() rather than MN.getReplica() so we don't throw an InvalidToken exception when passing in a null Session. The D1Client object is never used for this local call.
interpret permissions as hierarchical. https://redmine.dataone.org/issues/2150
process the current revision, not the latest! Use direct object/system metadata insertion for ORE maps.
allow other Metacat processes (system metadata and ORE generation) to directly insert objects and system metadata without having to go through the MN/CN methods.
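A sketch of hierarchical permission checking. It assumes the DataONE ordering read < write < changePermission; the `Perm` enum and `AccessCheck` class are hypothetical stand-ins, not the DataONE or Metacat types. As the earlier commit notes, the *granted* rule is expanded, so a rule granting WRITE satisfies a READ request, but not the reverse.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical permission levels, declared from weakest to strongest. */
enum Perm {
    READ, WRITE, CHANGE_PERMISSION;

    /** Permissions implied by holding this one: itself plus every weaker level. */
    Set<Perm> expand() {
        Set<Perm> implied = EnumSet.noneOf(Perm.class);
        for (Perm p : values()) {
            if (p.ordinal() <= ordinal()) {
                implied.add(p);
            }
        }
        return implied;
    }
}

final class AccessCheck {
    private AccessCheck() {}

    /** Expand the granted rule, not the requested permission. */
    static boolean allows(Perm granted, Perm requested) {
        return granted.expand().contains(requested);
    }
}
```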
only attempt to unlock a lock if it was created (in the finally block)
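A sketch of the guarded unlock. It assumes the lock is obtained through a factory that may fail before any lock exists; `GuardedUnlock`, `withLock()` and the factory parameter are hypothetical names. The `finally` block unlocks only when the lock was actually acquired, so a failure during lock creation doesn't trigger a spurious NullPointerException or `IllegalMonitorStateException` that would mask the original error.

```java
import java.util.concurrent.locks.Lock;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical helper: run an action under a per-pid lock, unlocking only if it was acquired. */
final class GuardedUnlock {
    private GuardedUnlock() {}

    static <T> T withLock(Function<String, Lock> lockFactory, String pid, Supplier<T> action) {
        Lock lock = null;
        boolean acquired = false;
        try {
            // Creating the lock may fail (e.g. the cluster is unavailable)
            lock = lockFactory.apply(pid);
            lock.lock();
            acquired = true;
            return action.get();
        } finally {
            // Only unlock a lock that exists and that this thread actually holds
            if (acquired) {
                lock.unlock();
            }
        }
    }
}
```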
new jars with many changes -- including new CN methods: ping, describe, listChecksumAlgorithm. Removed MN.setAccessPolicy. Refactored CN.setOwner() to CN.setRightsHolder().
add revision history to the generated ORE objects -- we use the revision history of the EML package as a basis because each ORE revision mirrors a revision of the EML package. Add a placeholder for checking if an equivalent ORE map exists in the DataONE infrastructure - this will be a call to CN.search() that looks at the solr index for OREs based on the EML package ID.
In the call to MNReplication.replicate(), call back to CNReplication.setReplicationStatus() and set the status to failed when we get local exceptions or exceptions from the source MN when calling getReplica(). Send back an exception with a description when setting the status. Add a private setReplicationStatus() method to refactor these calls out.
Change setReplicationStatus() to drop serialVersion and report the failure exception message in the CN log.
set SystemMetadata.archived=true on MN.delete. There is ongoing discussion on what the exact behavior should be here, but this mimics Metacat's delete-as-archive action. http://redmine.dataone.org/issues/882
In MNodeService.replicate(), check to see if we have a replica (via an out of band channel) before we call sourceMN.getReplica().
updated D1 API -- removed Permission.REPLICATE and associated parameters
include SerialVersion in the describe response. https://redmine.dataone.org/issues/2135 NOTE: the d1 jars should be replaced once all schema changes are finalized and the generated d1_common code is committed to svn
If a member node cannot be found in the node list matching the targetNodeSubject given in isNodeAuthorized(), throw a ServiceFailure exception.
update with latest d1_common/d1_lib (includes latest schema changes)
for now, look up SystemMetadata directly from the table, otherwise we won't have the latest access information. We need to refresh the in-memory copy every time we edit the access policy via Metacat (including the EML parser).
refactor Metacat access handling to be on a per-revision basis so that it more closely aligns with the DataONE approach. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5560
ensure that the revision list is ordered ascending in case someone changes the sql query without realizing that it matters...
set the byte size of the ORE map before adding it
set/update the obsoletes/obsoletedBy fields in system metadata so that we always have a complete revision history for each object. Note: ORE maps do not have revision history...yet(?)
generating ORE maps and creating/updating system metadata now. There are some Permission conversion issues to be worked out yet
make exception/error reporting clearer -- was getting lock messages when perhaps that was not the correct exception.
Add log statements for each call to ILock.unlock() for debugging.
evict the Hazelcast SystemMetadata entry if we update the access control rules via Metacat's legacy API, otherwise stale SystemMetadata stays in memory instead of being looked up from the backing table store.
optionally include ORE generation/insertion into Metacat when generating SystemMetadata. https://redmine.dataone.org/issues/2056
Set a default HazelcastInstance after init() is called, and use this instance in getLock() to acquire a lock in the cluster.
no need to cast docInfo entries to String -- they are all strings
set revision history, the create/update dates and the owner/submitter (correctly)
use shared method for looking up "docInfo" map -- both in Metacat replication and in D1 system metadata generation
make default formatting a little bit easier to read