Replaced the JiBXException by MarshallingException.
add warning when exception encountered loading SM into map.
Add the timestamp for the printing out the start and end of hazelcast synchronization.
Add the log messages to indicate the hazelcast synchronization starting and ending.
Add the code to delete systemmetadata.
add support for v2 DataONE API.
Unify solr indexing with an IndexTask that is added to the queue -- allows us to send more than just the systemMetadata to the indexer. Initially this is for READ event counts for each document. https://projects.ecoinformatics.org/ecoinfo/issues/6346
only use MapStore/MapLoader for saving/loading IndexEvent objects. No need to use a listener since there is only the single node -- all entries are persisted to DB using the hazelcast.xml config we have for the map. https://projects.ecoinformatics.org/ecoinfo/issues/5944
add MapStore/Loader test for the IndexEvents -- adding and removing events in the DB table through hazelcast. https://projects.ecoinformatics.org/ecoinfo/issues/5944
stub for storing IndexEvent objects in Metacat (from metacat-index processing). https://projects.ecoinformatics.org/ecoinfo/issues/5944
do not force a get() during refresh (causing EML-defined data access rules to be lost when inserting EML docs about data files). note that this reverses a change that was meant to trigger indexing, but now we are using a new queue to share index events with metacat-index and so should not be necessary.
use an independent ISet<SystemMetadata> structure to communicate objects that should be indexed by metacat-index. https://projects.ecoinformatics.org/ecoinfo/issues/5943
load the evicted SM back into the map on a "Refresh" so that listeners hear the update. (metacat-index, for example)
In migrating to Hazelcast 2.4.x, replace deprecated methods. Use Hazelcast.newHazelcastInstance() rather than Hazelcast.init(). For other deprecated static methods, use the HazelcastInstance equivalent calls.
To attempt to address performance and stability WRT Hazelcast communication, we're upgrading to the 2.x series of Hazelcast. remove the 1.9.x jar files, and add the 2.4.1-SNAPSHOT jars. Modify HazelcastService to handle the minor change in the ItemListener interface (now passes ItemEvent<Identifier> as an argument)....
add count for the total processed pids (from ISet iterator)
allow SM resynch to be executed any time, not just during start up.https://redmine.dataone.org/issues/3116
change to debug log level when processing shared/local pids)
only lock the missing pid event if we know we have it locally to contribute.https://redmine.dataone.org/issues/3117
Add locking to the itemAdded() method so ideally only one CN will respond to the request for a 'wanted' pid from the cluster. The lock is on a string, not the pid, and so won't conflict with system metadata locking. The string is based on the pid, with "missing-" as a prefix.
only publish to the missing pid "wanted list" when resynching system metadata. we were seeing redundant entry added/updated events when looking up the shared systemmetadata first.
print the missing pid count, not the total shared pid count so we know how many will be processed.
change the system metadata resynch approach: nodes will publish PIDs that they are missing after inspecting the shared identifier set. other nodes will be listening for the "wanted" pids and will put their local copy of SystemMetadata on the shared SM map. This should dramatically decrease the hazelcast chatter during a resynch and targets only the pids that are missing from any of the various nodes.
logging for processing identifier set on restart.
use local Set processing to determine which pids (if any) should be contributed to the shared set by this node during the resync. Should save time rather than checking each and every pid against the shared set.
move the hzIdentifiers initialization into the resync thread so that it does not affect start up time. cleaned up unused methods and superfluous code.
only load local pids into hzIdentifiers if t hey do not already exist in the shared set. increase logging severity and detail of messages emitted during this process to get a better sense of what is taking so long.
On the coordinating Nodes, we often get McdbDocNotFoundExceptions for data (doctype == 'BIN') documents because they are not synchronized to the CNs. Change the logging to only print the stack trace during load() and loadAll() when log debug is enabled.
log error when looking up non-existent local SM rather than completely bombing out of the resynch thread.
put(sm) for every pid we have a SM value for so that all members receive the entry event and can save locally.
ignore partition owner -- always attempt to look up form local store if we were unable to get the SM from the shared map.
do not check if this CN has a "perfect" copy of the SM identifiers -- we need any CN coming online to contribute the records that they have locally so that in the event that all three CNs have a partial view of things they all eventually share each others' SM entries.
push SystemMetadata entries from the CN that has them all to the shared map where other nodes may not have all entries. The CN with the complete copy only pushes SM entries that it does not own and that return as null because those are the ones that are missing on the other, non-complete CNs....
trace level log for looping over EVERY pid in the system.
meant to log the guids (source) not the pids (target)
logging for each step of shared identifiers loading.
remove pause/resume - seemed to make metacat just hang on SM retrieval. Add more logging when returned SM is null -- want to make sure it is becuase the local node "owns" the pid key even though there is no value for it.
due to hudson build issue, did not actually end up testing pause/resume -- trying that again
pause/resume was not enough. trying shutdown/restart
experiment with lifecycle pause/resume. hopefully it prevents our node from taking ownership of any keys before we are sure we have them all.
increase logging and add back in the call to saveLocally() in case the SM object has already been loaded into the shared map but before this node came back online.
no need to call saveLocally explicitly since loading from the shared store triggers that behavior locally because of the configured listeners.use an iterator over the shared identifiers in case this set is constantly changing.
make only one DB call to look up local pids - no need to do a pstmt for every single shared pid.
on init (start up) launch a synchronization thread that ensures all shared identifier entries have a corresponding local System Metadata entry.
fix NPE (logMetacat object was not initialized) that was occurring during store()
share the same dbConnection when inserting and then updating SystemMetadata objects in the backing store.any errors encountered during the update will rollback the entire transaction and the SM record will not exist, even in part.
Do not loadAllKeys() for SystemMetadataMap when Metacat first starts up. hzIdentifiers will be populated with a simple SQL statement rather than the serial loading of every single SystemMetadata object. It will remain in synch using the usual entryXXX() methods as before....
only generate system metadata for original objects.https://redmine.dataone.org/issues/2721
add comment about returning early when no system metadata can be found.removed extraneous check on the content type of the SM -- was unused.formatted indenting
for SystemMetadata events we first check the event for the SM value. If it returns null, we look it up from the shared map. It seems as if we don't always get a value with our events.
comment out: synchronize local system metadata on cn restart
synchronize local system metadata on cn restart
log calls to store() system metadata to the backing store
Add the listener for LifecycleEvent state changes
synchronizeLocalStore() when the cluster has a LifecycleEvent state change to RESUMED.
refactor memberAdded code to separate method - synchronizeLocalStore for possible reuse
-use MembershipListener to keep new members' backing store for system metadata synchronized with the shared system metadata map.-remove the unused InstanceListener interface
Add a few more debugging statements to HazelcastService for troubleshooting hazelcast map concurrency.
add an alternative method for loading system metadata identifiers but leave it commented out. We may find that using the ObjectList method is too much overhead, but it will always be consistent with what metacat reports for listObjects().
add note about long-running load for shared system metadata map
refactor IdentityManager.createSystemMetadata(sm) to be insertSystemMetadata(sm) so that it is clear that this method inserts the SM object into the backing store. This differentiates it from the "generation" methods we use when we need to create SM about pre-existing objects or objects we get from non-D1 api calls.
additional logging of the config file being used - seem to have thread locking on the xmlConfig use when running under ant/junit
In IdentifierManager.updateSystemMetadata(), add a check for invalid system metadata (fields that throw a NullPointerException on access) to ensure that system metadata is populated correctly. Updated calling classes to handle the exception.
Handle SQLExceptions when trying to save system metadata locally.
Convert SQLExceptions to RuntimeExceptions for Hazelcast MapStore operations.
Keep the hzIdentifiers set in sync with the Metacat systemmetadata table. If entries are added/updated in the hzSystemMetadata map, make sure the identifier is in the set. If (for some administrative reason) the entry is removed, remove the identifier from the set. This usually doesn't happen.
When loading all keys from Metacat into the hzSystemMetadata map, also load identifiers into the hzIdentifiers set if they are not already there. Although entries may be evicted from the map, the list of identifiers will remain. The list will have a fairly small memory footprint since it's just identifiers.
Add support for the distributed Set of unique identifiers in the storage cluster called 'hzIdentifiers'. This set is a persistent total list of all identifiers (even when entries in the hzSystemMetadata map are evicted). It reflects the state of the identifiers in the postgresql systemmetadata table, but is distributed across the cluster. Add the getIdentifiers() method, which returns the ISet of identifiers.
Use a Logger instead of System.out for SystemMetadataMap.
Use Lock instead of ILock to be consistent across classes.
evict the HazelCast SystemMetadata entry if we update the access control rules via Metacat's legacy API, otherwise stale SystemMetadata stays in memory instead of being looked up from the backing table store.
Set a default HazelcastInstance after init() is called, and use this instance in getLock() to acquire a lock in the cluster.
use shared method for looking up "docInfo" map -- both in Metacat replication and in D1 system metadata generation
When using ILock.lock(), get a lock on the string value of the Identifier, not the Identifier object itself. Hazelcast locking won't work otherwise.
Use the Hazelcast ILock mechanism to lock the system metadata identifier rather than using IMap.lock(pid).
delete system metadata when MN.delete() is called.
make MNodeServiceTest pass JUnit testing
return null instead of throwing an exception when pid is not found in store
comment out resynch() method until errors are resolved
use default hazelcast config when not configured to use an external one
For now, remove the hzClient code connecting to the DataONE process cluster to get the hzNodes map. This will be moved into the storage cluster, but use D1Client to get the node list for now.
going back to using IDentifier as the key for the ObjectPAthMap.
Reverting previous @Overrides chanrge from r6470, as that is the desiredbehavior under Java 1.6 -- previous versions of Java (e.g., 1.5) will notcomile with this usage of the @Overrides annotation, but the currentlysupported version will. So reverting to the 1.6 convention.
Removing incorrect @Override annotations that were preventing compilation. The methods marked did not actually override a method in the superclass, so they were not compiling. I think @Overrides was being mistaken for methods that implement an interface but aren't actually in the superclass.
Remove references to CNReplicationTask.
Remove the CNReplicationTask (for now). We will be using Metacat's ForceReplicationHandler to replicate science metadata across CNs, and may explore the use of a 100% evicted hzScienceMetadata map. Either way, the distributed task design won't be needed. When a dropped CN comes back online, we'll catch it up based on last modified dates for PIDs in the hzSystemMetadata map.
Add getNodesMap() to return the hzNodes map from the process cluster. Remove getPendingReplicationTasks since that structure is being removed. Add minor documentation.
lookup latest system metadata update date for use in synchronizing CN-CN when an offline nodes comes back online
changed the key type from Identifier to String for ObjectPathMap. (need a Comparable key).
rework this to be MN->MN replication. Should be fleshed out more.
throw RuntimeExceptions when store() methods throw declared exceptions -- we want callers to put() to be alerted if there are errors.
move CNReplicationTask to the hazelcast package
check if system metadata exists rather than just the id mapping (before creating the entry)
move bulk of the Hazelcast code into HazelcastService from CNodeService so that it is centrall located - easier to manage and configure
removing unneeded class (never used)
cleaned up mock tests on hzObjectPathMap. split out code for mocking a datastore into MockObjectPathMap.
initialize Hazelcast from the custom configuration when initializing the Metacat service.
handle entryAdded (to hzSystemMetadata) to store sysmeta to local store when it is not already present
use HashMap, HashSet instead of the Tree* classes that require Identifier objects implement Comparable