MN.getPackage() - test with ORE that includes 2 data files and a "metadata" file (still should be using EML for that test). https://projects.ecoinformatics.org/ecoinfo/issues/6026
First pass at MN.getPackage() implementation using Bagit library from LOC. https://projects.ecoinformatics.org/ecoinfo/issues/6026
add method for publishing existing object (usually assumed to be scimeta) with a DOI. https://projects.ecoinformatics.org/ecoinfo/issues/6014
only use MapStore/MapLoader for saving/loading IndexEvent objects. No need to use a listener since there is only the single node -- all entries are persisted to DB using the hazelcast.xml config we have for the map. https://projects.ecoinformatics.org/ecoinfo/issues/5944
add MapStore/Loader test for the IndexEvents -- adding and removing events in the DB table through hazelcast. https://projects.ecoinformatics.org/ecoinfo/issues/5944
stub for storing IndexEvent objects in Metacat (from metacat-index processing). https://projects.ecoinformatics.org/ecoinfo/issues/5944
do not force a get() during refresh (causing EML-defined data access rules to be lost when inserting EML docs about data files). note that this reverses a change that was meant to trigger indexing, but now we are using a new queue to share index events with metacat-index and so should not be necessary.
correct regex for whitespace in D1 identifier.
use an independent ISet<SystemMetadata> structure to communicate objects that should be indexed by metacat-index. https://projects.ecoinformatics.org/ecoinfo/issues/5943
Solr will be enabled if it is in the db.enabledEngines.
Use some contants from the EnabledQueryEngines.
use ContentTypeInputStream interface (and ByteArray implementation) to specify the desired content-type of the InputStream returned by MN.query().
load the evicted SM back into the map on a "Refresh" so that listeners hear the update. (metacat-index, for example)
Use the set of subjects to replace the user and groups for the solr query.
Use a new class to handle the solr query engine description request.
Add the access query filter.
Add code to handle the solr index information. we still need to figure out how to get the information.
Add the solr engine to the engine list.
use maven to manage most jar dependencies in Metacat.Exceptions include: LSID, Datamamager (EML),
Add code to handle solr query.
Merging the METACAT_2_0_6_BRANCH changes for [M|C]NodeService into the trunk.
allow verification date to be updated for replicas (patch from Skye). https://redmine.dataone.org/issues/3699
include size and format datcite elements (optional) and use more general resourceType without formatId in them (Dataset/metadata and Dataset/data). http://schema.datacite.org/meta/kernel-2.2/doc/DataCite-MetadataKernel_v2.2.pdf
lookup the title for EML files when registering DOIs.lookup the creator from DataONE CN (if available).add EML-based test. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5513
Set the session to null so that the call uses the CN certificate when calling MN.systemMetadataChanged();
To keep all nodes up to date with regard to system metadata changes, add the broadcastSystemMetadataChange() method that finds replica MNs in the node list and calls systemMetadataChanged(). Modify setReplicationStatus() and updateReplicationMetadata() to fire this off when a replica status changes to completed. We may decide to inform MNs at other times too, but this is a conservative amount of calls going to the MNs for now.
refactor DOI registration into separate class. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5513
refactor using ezid-client changes that split field names and values into separate enums. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5513
Correctly mint and register DOIs in teh MN API implementation. Add tests to exercise minting and creating. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5513
register DOIs with minimal DataCite metadata. still need to determine which details to include and when, but the plumbing is in place as we refine those rules. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5513
disable EZID/DOI minting by default since we do not yet have a means of tracking minted DOIs and augmenting metadata for them when we actually receive the object in a subsequent create() or update() call. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5753
generate ID from UUID. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5840
make sure serial version is included or set on MN.update().http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5793
make sure to call lock() on the SM when updating rightsholder (like every other method that gets a lock object from HZ).
CN.search() id not implemented by metacat -- making that explicit and also testing for it.
first pass at DOI minting using the EZID service in mn.generateIdentifier()http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5755
for MN.update() we needed to pass the original pid, not the new pid
do not reject any schemes -- all handled the same at the moment.
simple autogen-based implementation of MN.generateIdentifier(). does not support DOIs, ARKs, etc. It does support including a fragment, returning an identifier like "<fragment>.2012113010215298206"
limit /log and /object calls to configurable maximum count for paging. defaults to existing Metacat value of 7000
no need to mark SM as archived now that DocumentImpl.delete() does it.https://redmine.dataone.org/issues/3406
In migrating to Hazelcast 2.4.x, replace deprecated methods. Use Hazelcast.newHazelcastInstance() rather than Hazelcast.init(). For other deprecated static methods, use the HazelcastInstance equivalent calls.
In CNodeService.updateReplicationMetadata(), we are setting the replicaVerifiedDate() when we update or wholesale add a new replica. However, in setReplicationStatus(), we only do so when there's a new entry. Change setReplicationStatus() to also update the replicaVerifiedDate on updates of existing entries to be more consistent with other changes. This affects node prioritization based on this date timestamp. Thanks to Skye for pointing this out.
To attempt to address performance and stability WRT Hazelcast communication, we're upgrading to the 2.x series of Hazelcast. remove the 1.9.x jar files, and add the 2.4.1-SNAPSHOT jars. Modify HazelcastService to handle the minor change in the ItemListener interface (now passes ItemEvent<Identifier> as an argument)....
implement query description for pathquery -- only tells callers about the pre-indexed paths we have in Metacat since there are an infinite number of "fields" when storing arbitrary XML, but we really don't want people using non-indexed paths for performance reasons anyway. I've typed all the fields as String, even though some are not just strings and can be used for numeric or data comparisons.
Implement MNQuery for "pathquery" engine. Optionally include guid in the pathquery results (https://redmine.dataone.org/issues/3083)
add count for the total processed pids (from ISet iterator)
Update d1_common_java and d1_libclient_java to the newest jar files. Add methods to CNodeService to throw NotImplemented exceptions for query(), listQueryEngines(), and getQueryEngineDescription() since these API calls are handled outside of metacat.
do not allow updates to orphan another branch of revision history. https://redmine.dataone.org/issues/3338
include the subjects we are testing for authentication.https://redmine.dataone.org/issues/2778
make sure data objects correctly use force replicate with action "insert" https://redmine.dataone.org/issues/3138
allow SM resynch to be executed any time, not just during start up.https://redmine.dataone.org/issues/3116
change to debug log level when processing shared/local pids)
only lock the missing pid event if we know we have it locally to contribute.https://redmine.dataone.org/issues/3117
Add locking to the itemAdded() method so ideally only one CN will respond to the request for a 'wanted' pid from the cluster. The lock is on a string, not the pid, and so won't conflict with system metadata locking. The string is based on the pid, with "missing-" as a prefix.
only publish to the missing pid "wanted list" when resynching system metadata. we were seeing redundant entry added/updated events when looking up the shared systemmetadata first.
print the missing pid count, not the total shared pid count so we know how many will be processed.
change the system metadata resynch approach: nodes will publish PIDs that they are missing after inspecting the shared identifier set. other nodes will be listening for the "wanted" pids and will put their local copy of SystemMetadata on the shared SM map. This should dramatically decrease the hazelcast chatter during a resynch and targets only the pids that are missing from any of the various nodes.
logging for processing identifier set on restart.
check if the caller is the Node admin (the member node calling itself) as well as the existing check for the CN calling the service. Both of those callers should be given full admin rights.
use local Set processing to determine which pids (if any) should be contributed to the shared set by this node during the resync. Should save time rather than checking each and every pid against the shared set.
move the hzIdentifiers initialization into the resync thread so that it does not affect start up time. cleaned up unused methods and superfluous code.
only load local pids into hzIdentifiers if t hey do not already exist in the shared set. increase logging severity and detail of messages emitted during this process to get a better sense of what is taking so long.
utility methods to update/reserialize existing ORE maps that were generated with older foresite (and included bad dateTime strings).https://redmine.dataone.org/issues/3046
On the coordinating Nodes, we often get McdbDocNotFoundExceptions for data (doctype == 'BIN') documents because they are not synchronized to the CNs. Change the logging to only print the stack trace during load() and loadAll() when log debug is enabled.
check for invalid (!) pids. thanks, M. Reyes for catching thishttps://redmine.dataone.org/issues/3047
check for whitespace in identifiers during create() and update()https://redmine.dataone.org/issues/3047
set date SM modified when we are setting obsoletes/obsoletedBy/archived values. This way the CN can actualy pick up the changes in revision history.
log error when looking up non-existent local SM rather than completely bombing out of the resynch thread.
use secure Metacat context URL for D1 registrationhttps://redmine.dataone.org/issues/3030
first pass: DataONE-specific log retrieval to avoid java-based post-processing.
set archived flag (true) when we set the obsoletedBy value in the ORE system metadata
use the localId for obsoletes/obsoletedBy ORE system metadata (https://redmine.dataone.org/issues/2964)
Oops, previous commit suffered from a happy trigger finger. During deleteReplicationMetadata(), don't delete the replica on the replica Member Node. Call CN.delete() for that functionality. This call just updates sytem metadata (according to the API description).
Minor logging change.
Add debug logging to delete() to understand why we're getting InsufficientKarmaException.
Since we already have determined access via isAuthorized() and isAdminAuthorized(), act as the Metacat administrator during calls to DocumentImpl.delete() in archive(), passing in null username and group.
restrict getLogRecrods (both MN and CN) to be called only by admin users (the CN)https://redmine.dataone.org/issues/2855
In setReplicationStatus() and UpdateReplicationMetadata(), don't allow a status state change from COMPLETED to anything other than INVALIDATED. This prevents the completed status from being overwritten due to race conditions.
use metacat.properties to specify the default checksum algorithm to use -- this way it will be easy for us to switch to whatever DataONE decrees. https://redmine.dataone.org/issues/2834
put(sm) for every pid we have a SM value for so that all members receive the entry event and can save locally.
Throw an exception when NOT allowed, not when allowed =).
ignore partition owner -- always attempt to look up form local store if we were unable to get the SM from the shared map.
do not check if this CN has a "perfect" copy of the SM identifiers -- we need any CN coming online to contribute the records that they have locally so that in the event that all three CNs have a partial view of things they all eventually share each others' SM entries.
Also get the list size, which may throw an NPE.
Only add an AccessPolicy to SystemMetadata during generation when the AccessPolicy is not empty. We've had some scenarios where IdentifierManager.getaccessPolicy() is returning an empty policy because of an empty permission list coming from the db. This was causing InvalidSystemMetadata exceptions during MN to MN replication.
push SystemMetadata entries from the CN that has them all to the shared map where other nodes may not have all entries. The CN with the complete copy only pushes SM entries that it does not own and that return as null because those are the ones that are missing on the other, non-complete CNs....
trace level log for looping over EVERY pid in the system.
meant to log the guids (source) not the pids (target)
logging for each step of shared identifiers loading.
remove pause/resume - seemed to make metacat just hang on SM retrieval. Add more logging when returned SM is null -- want to make sure it is becuase the local node "owns" the pid key even though there is no value for it.
due to hudson build issue, did not actually end up testing pause/resume -- trying that again
pause/resume was not enough. trying shutdown/restart
experiment with lifecycle pause/resume. hopefully it prevents our node from taking ownership of any keys before we are sure we have them all.
increase logging and add back in the call to saveLocally() in case the SM object has already been loaded into the shared map but before this node came back online.
no need to call saveLocally explicitly since loading from the shared store triggers that behavior locally because of the configured listeners.use an iterator over the shared identifiers in case this set is constantly changing.
make only one DB call to look up local pids - no need to do a pstmt for every single shared pid.
on init (start up) launch a synchronization thread that ensures all shared identifier entries have a corresponding local System Metadata entry.