Add locking to the itemAdded() method so ideally only one CN will respond to the request for a 'wanted' pid from the cluster. The lock is on a string, not the pid, and so won't conflict with system metadata locking. The string is based on the pid, with "missing-" as a prefix.
only publish to the missing pid "wanted list" when resynching system metadata. we were seeing redundant entry added/updated events when looking up the shared systemmetadata first.
print the missing pid count, not the total shared pid count so we know how many will be processed.
change the system metadata resynch approach: nodes will publish PIDs that they are missing after inspecting the shared identifier set. other nodes will be listening for the "wanted" pids and will put their local copy of SystemMetadata on the shared SM map. This should dramatically decrease the hazelcast chatter during a resynch and targets only the pids that are missing from any of the various nodes.
logging for processing identifier set on restart.
remove possibility for infinite loop in case data replication is not configured for the server and a data file is encountered (yikes!)
added logging debug statements to see where the replication timeout might be occurring.
check for null archived flag in ORE SMhttps://redmine.dataone.org/issues/3046
check if the caller is the Node admin (the member node calling itself) as well as the existing check for the CN calling the service. Both of those callers should be given full admin rights.
use local Set processing to determine which pids (if any) should be contributed to the shared set by this node during the resync. Should save time rather than checking each and every pid against the shared set.
move the hzIdentifiers initialization into the resync thread so that it does not affect start up time. cleaned up unused methods and superfluous code.
only load local pids into hzIdentifiers if t hey do not already exist in the shared set. increase logging severity and detail of messages emitted during this process to get a better sense of what is taking so long.
utility methods to update/reserialize existing ORE maps that were generated with older foresite (and included bad dateTime strings).https://redmine.dataone.org/issues/3046
On the coordinating Nodes, we often get McdbDocNotFoundExceptions for data (doctype == 'BIN') documents because they are not synchronized to the CNs. Change the logging to only print the stack trace during load() and loadAll() when log debug is enabled.
check for invalid (!) pids. thanks, M. Reyes for catching thishttps://redmine.dataone.org/issues/3047
only look up the client timeout property once, not every time we make a callhttps://redmine.dataone.org/issues/3078
improve content type handling during the get() callshttps://redmine.dataone.org/issues/3070
check for whitespace in identifiers during create() and update()https://redmine.dataone.org/issues/3047
configurable replication client timeouthttps://redmine.dataone.org/issues/3078
order the listObjects() results by identifier to mitigate random paged resultshttps://redmine.dataone.org/issues/3065
correct the parameter/value setting in the prepared statements for retrieving log information.
use docid, not the guid when returning the accesscontrol block
include dataone.ore.downloaddata as a configurable property in case MNs (like LTER) want to have the process download externally-stored data files described in an EML data package.
set date SM modified when we are setting obsoletes/obsoletedBy/archived values. This way the CN can actualy pick up the changes in revision history.
log error when looking up non-existent local SM rather than completely bombing out of the resynch thread.
look up docid using mapped guid when checking permission on described data fileAddresses: http://support.nceas.ucsb.edu/rt/Ticket/Display.html?id=7490
use docid (not guid) when instantiating the PermissionController. Was getting an error with DOI-ified identifier and the metacat getaccesscontrol action:https://knb.ecoinformatics.org/knb/metacat?action=getaccesscontrol&docid=Collinge.3.28<error>AccessControlForSingleFile.getACL() - MCDB error when getting ACL: No guid registered for docid doi:10.5063/AA/Collinge.3.28...
make sure we have non-null values where jibx serialization expects them for LogEntry
use secure Metacat context URL for D1 registrationhttps://redmine.dataone.org/issues/3030
first pass: DataONE-specific log retrieval to avoid java-based post-processing.
set archived flag (true) when we set the obsoletedBy value in the ORE system metadata
use the localId for obsoletes/obsoletedBy ORE system metadata (https://redmine.dataone.org/issues/2964)
Print the stack trace when the MMP cannot be resolved.
report errors during XML->HTML transformhttp://bugzilla.ecoinformatics.org/show_bug.cgi?id=5618
Oops, previous commit suffered from a happy trigger finger. During deleteReplicationMetadata(), don't delete the replica on the replica Member Node. Call CN.delete() for that functionality. This call just updates sytem metadata (according to the API description).
Minor logging change.
Add debug logging to delete() to understand why we're getting InsufficientKarmaException.
Since we already have determined access via isAuthorized() and isAdminAuthorized(), act as the Metacat administrator during calls to DocumentImpl.delete() in archive(), passing in null username and group.
restrict getLogRecrods (both MN and CN) to be called only by admin users (the CN)https://redmine.dataone.org/issues/2855
In setReplicationStatus() and UpdateReplicationMetadata(), don't allow a status state change from COMPLETED to anything other than INVALIDATED. This prevents the completed status from being overwritten due to race conditions.
use metacat.properties to specify the default checksum algorithm to use -- this way it will be easy for us to switch to whatever DataONE decrees. https://redmine.dataone.org/issues/2834
put(sm) for every pid we have a SM value for so that all members receive the entry event and can save locally.
Throw an exception when NOT allowed, not when allowed =).
ignore partition owner -- always attempt to look up form local store if we were unable to get the SM from the shared map.
do not check if this CN has a "perfect" copy of the SM identifiers -- we need any CN coming online to contribute the records that they have locally so that in the event that all three CNs have a partial view of things they all eventually share each others' SM entries.
Also get the list size, which may throw an NPE.
Only add an AccessPolicy to SystemMetadata during generation when the AccessPolicy is not empty. We've had some scenarios where IdentifierManager.getaccessPolicy() is returning an empty policy because of an empty permission list coming from the db. This was causing InvalidSystemMetadata exceptions during MN to MN replication.
push SystemMetadata entries from the CN that has them all to the shared map where other nodes may not have all entries. The CN with the complete copy only pushes SM entries that it does not own and that return as null because those are the ones that are missing on the other, non-complete CNs....
trace level log for looping over EVERY pid in the system.
meant to log the guids (source) not the pids (target)
logging for each step of shared identifiers loading.
remove pause/resume - seemed to make metacat just hang on SM retrieval. Add more logging when returned SM is null -- want to make sure it is becuase the local node "owns" the pid key even though there is no value for it.
due to hudson build issue, did not actually end up testing pause/resume -- trying that again
pause/resume was not enough. trying shutdown/restart
experiment with lifecycle pause/resume. hopefully it prevents our node from taking ownership of any keys before we are sure we have them all.
increase logging and add back in the call to saveLocally() in case the SM object has already been loaded into the shared map but before this node came back online.
no need to call saveLocally explicitly since loading from the shared store triggers that behavior locally because of the configured listeners.use an iterator over the shared identifiers in case this set is constantly changing.
make only one DB call to look up local pids - no need to do a pstmt for every single shared pid.
on init (start up) launch a synchronization thread that ensures all shared identifier entries have a corresponding local System Metadata entry.
fix NPE (logMetacat object was not initialized) that was occurring during store()
stack trace the HZ put exception during CN-CN replication
additional debugging statements for CONCURRENT_MAP_PUT error during CN-CN replication.
Don't set the replication status to failed for an object when it is called by a public user. Just throw the NotAuthorized exception. This prevents this node from being de-prioritized because of public calls to the method.
share the same dbConnection when inserting and then updating SystemMetadata objects in the backing store.any errors encountered during the update will rollback the entire transaction and the SM record will not exist, even in part.
Do not loadAllKeys() for SystemMetadataMap when Metacat first starts up. hzIdentifiers will be populated with a simple SQL statement rather than the serial loading of every single SystemMetadata object. It will remain in synch using the usual entryXXX() methods as before....
include pidFilter handling - only matches the complete pid. Issues a warning in the Metacat logs when pidFilter cannot be applied but allows the call to getLogs() to return as though there was no pidFilter given.https://redmine.dataone.org/issues/2798
use at least one thread on single-processor machines.https://redmine.dataone.org/issues/2800
Add a few logging statemnts for round trip replication metrics.
add trace statements for measuring time to complete SM generation.
instead of generating SM and ORE maps during dataone configuration/MN registration, moved this all to the replication admin screen where we can target generation for specific nodes. That way it's more controlled as to when and where we generate DataONE required content....
include all EML versions (had been only eml 2.1 for testing)
Append more information such as user name and group to the validating session response.
remove exception from method decl - was not matching the interface def and not compiling.
add "Generate System Metadata" button to the replication server list display. When clicked, we generate SM for records belonging to that source server. This is only enabled when DataONE has been configured.https://redmine.dataone.org/issues/2762
expose serverLocation parameter to run GenerateSystemMetadata for different replication parters as needed.https://redmine.dataone.org/issues/2740
only generate system metadata for original objects.https://redmine.dataone.org/issues/2721
handle authorization for delete() differently for CN vs MN.On the CN, only the CN (or tbd admin user) can call it.On the MN, both the CN (or admin user) and the same MN can call it.
add Session-less archive() method
only admin users can call MN/CN.delete(). This is limited to any CN and only the MN that is calling itself
update the sysmeta data modified when setting archived=truehttps://redmine.dataone.org/issues/882
handle CN.archive() rest call: PUT /archive/{pid}https://redmine.dataone.org/issues/2678
correct log about 'archive' being called
handle 'archive' rest callshttps://redmine.dataone.org/issues/2678
[optionally] do not archive the xml_documents and xml_nodes to *_revisions when 'deleting' a document. This will effectively guarantee that the document/data cannot be retrieved after delete.NOTE: D1 system metadata will persist (for now) so that the ID cannot be reused with the DataONE API but Metacat calls may allow the ID to be reused -- may need to reconsider this behavior....
optionally remove the document/data file from the filesystem completely when 'deleting' it.https://redmine.dataone.org/issues/2677
newer d1 jars that include shared AuthUtilsmethod for isAuthorized() consistencyhttps://redmine.dataone.org/issues/2661
implement MN and CN.archive() method -- really just the existing delete() methods.https://redmine.dataone.org/issues/2674https://redmine.dataone.org/issues/2675
call MN.delete() for each replica when CN.delete() is calledhttps://redmine.dataone.org/issues/2676
defer to AuthUtils for flattening out the equivIdent subject list.https://redmine.dataone.org/issues/2661
check normal access control rules for getSystemMetadata before deferring to MN replica information that may grant MNs additional access to the SM.https://redmine.dataone.org/issues/2656
include Session-less interface methods and updated jars that define them.
use a shared ExecutorService for replicate() calls.https://redmine.dataone.org/issues/2623
remove extraneous pid and permission parameters from isAdminAuthorized() method and make public so that it can be called in other locations - namely before our asynchronous replicate() implementation on the MN.
check for empty null (missing) node.subjectList. This should probably be a required element in the D1 schema, but it appears not. (ORNL entry was missing subjects in cn-dev environment)
just use the e.getMessage() as e.getCause() may be null (seeing NPE when testing via the MN IT tester)
no not record EML access rules that use the "denyFirst" permOrder.https://redmine.dataone.org/issues/2614
needed to initialize the nodeList that stores matching nodes (by subject) -- this was the source of a NPE when we had a matching node subject.