create docid-guid mapping during replication if it does not exist. we were [incorrectly] assuming that there would be SM coming with the document info that would fill this information in, but for traditional non-MN Metacat deployments there is no SM to provide a mapping. In this case we use the docid as the guid.
include certificate export SSL options as an example (used heavily for DataONE and Metacat Replication)
stream the replication "update" response rather than building up a complete list in a stringbuffer. prompted by findings on the CN: https://redmine.dataone.org/issues/3141
make sure data objects correctly use force replicate with action "insert" https://redmine.dataone.org/issues/3138
correct the update statement for setting archived flag on SM where document revision does not exist in the xml_documents table
sleep before updating and deleting test documents - otherwise their index entries may not be fully written and this causes errors (update and delete first attempt to remove index references, but if they are not in the DB yet then they are not removed but then they do get added and the FK constraints make the delete fail). Since we know indexing occurs in a separate thread with a configured delay, we just use this same delay in our testing.
when updating a document on a remote server, we still need to use the previous docid to check that the user has permissions to do so (rather than the new id that is obsoleting the old id). This was discovered by M Servilla at LTER.
remove unused "dataonelogger"
prep for 2.0.3 release
allow SM resynch to be executed any time, not just during start up. https://redmine.dataone.org/issues/3116
change to debug log level when processing shared/local pids
only lock the missing pid event if we know we have it locally to contribute. https://redmine.dataone.org/issues/3117
Add locking to the itemAdded() method so ideally only one CN will respond to the request for a 'wanted' pid from the cluster. The lock is on a string, not the pid, and so won't conflict with system metadata locking. The string is based on the pid, with "missing-" as a prefix.
only publish to the missing pid "wanted list" when resynching system metadata. we were seeing redundant entry added/updated events when looking up the shared systemmetadata first.
print the missing pid count, not the total shared pid count so we know how many will be processed.
change the system metadata resynch approach: nodes will publish PIDs that they are missing after inspecting the shared identifier set. other nodes will be listening for the "wanted" pids and will put their local copy of SystemMetadata on the shared SM map. This should dramatically decrease the hazelcast chatter during a resynch and targets only the pids that are missing from any of the various nodes.
logging for processing identifier set on restart.
remove possibility for infinite loop in case data replication is not configured for the server and a data file is encountered (yikes!)
added logging debug statements to see where the replication timeout might be occurring.
use correct EZID account names for the three different nodes. https://redmine.dataone.org/issues/2815
align the final column headers with the datacite schema, as applicable. https://redmine.dataone.org/issues/2815
add block for finding and updating records that should be marked as archived. https://redmine.dataone.org/issues/3109
use DataCite isNewVersionOf/isPreviousVersionOf for revision history
include JCS jar as it is a runtime dependency for d1_libclient's object caching.
check for null archived flag in ORE SM. https://redmine.dataone.org/issues/3046
check if the caller is the Node admin (the member node calling itself) as well as the existing check for the CN calling the service. Both of those callers should be given full admin rights.
add note about DataONE CA chain file when configuring MNs at Tier 2+
not every EML file has an ORE datapackage descriptor -- join only to those when setting the resourceMapId
correctly use document revision for object format and resource map joins.
use local Set processing to determine which pids (if any) should be contributed to the shared set by this node during the resync. Should save time rather than checking each and every pid against the shared set.
move the hzIdentifiers initialization into the resync thread so that it does not affect start up time. cleaned up unused methods and superfluous code.
use correct children of 'publisher' element
only load local pids into hzIdentifiers if they do not already exist in the shared set. increase logging severity and detail of messages emitted during this process to get a better sense of what is taking so long.
utility methods to update/reserialize existing ORE maps that were generated with older foresite (and included bad dateTime strings). https://redmine.dataone.org/issues/3046
include the resourceMapId for the metadata objects, not just the data files.
updated LDAP dump and corrected missing entries that had been removed from LDAP.
On the coordinating Nodes, we often get McdbDocNotFoundExceptions for data (doctype == 'BIN') documents because they are not synchronized to the CNs. Change the logging to only print the stack trace during load() and loadAll() when log debug is enabled.
check for invalid (!) pids. thanks to M. Reyes for catching this. https://redmine.dataone.org/issues/3047
only look up the client timeout property once, not every time we make a call. https://redmine.dataone.org/issues/3078
improve content type handling during the get() calls. https://redmine.dataone.org/issues/3070
check for whitespace in identifiers during create() and update(). https://redmine.dataone.org/issues/3047
remove semtools skin as a configured skin -- will need to add that if we ever get back to deploying a semtools instance.
configurable replication client timeout. https://redmine.dataone.org/issues/3078
order the listObjects() results by identifier to mitigate random paged results. https://redmine.dataone.org/issues/3065
correct the parameter/value setting in the prepared statements for retrieving log information.
use docid, not the guid when returning the accesscontrol block
handle null givenNames from the LDAP dump.
make sure we only get the publisher text content (not attribute value)
DOI registration: include more revision history based on the identifier table, not just the generated SM metadata; include ecogrid data urls for revisions (long query in xml_nodes_revisions table)
include new libclient jar that uses encoded pids in the resolve URLs. https://redmine.dataone.org/issues/3035
update D1 jars in preparation for 2.0.2 release. NOTE: still need libclient jar that includes ORE changes for encoding PIDs in resolve URLs
prep for 2.0.2 release by updating the version numbers.
include dataone.ore.downloaddata as a configurable property in case MNs (like LTER) want to have the process download externally-stored data files described in an EML data package.
updated foresite (snapshot) to include dateTime serialization fix. https://redmine.dataone.org/issues/3035
set date SM modified when we are setting obsoletes/obsoletedBy/archived values. This way the CN can actually pick up the changes in revision history.
update creator and publisher using LDAP dump. unfortunately LDAP has shifted over the years and not all identities are still active in LDAP...but we did get quite a few creator names updated! https://redmine.dataone.org/issues/2815
log error when looking up non-existent local SM rather than completely bombing out of the resynch thread.
include parameter for deleting system metadata records (or not). Intending to also use this for https://redmine.dataone.org/issues/3055
look up docid using mapped guid when checking permission on described data file. Addresses: http://support.nceas.ucsb.edu/rt/Ticket/Display.html?id=7490
function/procedure for removing all content related to a PID from the DB. https://redmine.dataone.org/issues/3037
use docid (not guid) when instantiating the PermissionController. Was getting an error with DOI-ified identifier and the metacat getaccesscontrol action: https://knb.ecoinformatics.org/knb/metacat?action=getaccesscontrol&docid=Collinge.3.28 returned <error>AccessControlForSingleFile.getACL() - MCDB error when getting ACL: No guid registered for docid doi:10.5063/AA/Collinge.3.28...
save point - adding more columns for access, data packaging, revision history. https://redmine.dataone.org/issues/2815
script to find and update missing SystemMetadata revision history. https://redmine.dataone.org/issues/2938
update the table to indicate which DOI account we are targeting. https://redmine.dataone.org/issues/2815
make sure we have non-null values where jibx serialization expects them for LogEntry
use secure Metacat context URL for D1 registration. https://redmine.dataone.org/issues/3030
first pass: DataONE-specific log retrieval to avoid java-based post-processing.
use production cn url for the resolve url
remove the non-doi identifiers before updating the LTER - should save time on the update. https://redmine.dataone.org/issues/2858
use eml stylesheet tag (1.0.3)
use 1.0.2 d1_libclient jar (built by hudson)
set archived flag (true) when we set the obsoletedBy value in the ORE system metadata
update for 2.0.1 upgrade -- scripts, docs, readme
remove saxon jar (XSLT 2.0 support) with plans to re-add when we understand how to make it more lenient WRT invalid character content in source XML
move to 1.0.2 d1_libclient jar for ORE generation change (URI for aggregation)
use the localId for obsoletes/obsoletedBy ORE system metadata (https://redmine.dataone.org/issues/2964)
use correct esa email list
include xml-apis*.jar when building the LSID authority.war -- without this file the authority webapp has a fatal error (no class def found) on init.
correct block formatting for tomcat config changes
add tomcat config options for DataONE identifiers with slashes
excluded the HzObjectPathMapTest$1 and HzObjectPathMapTest$2 classes, which are not test classes, in the test target.
Print the stack trace when the MMP cannot be resolved.
Use a final static string to replace the hard-coded value. Search document title rather than id in testReplicateEML_AtoB method.
use 1.0.2 tag for EML stylesheets
update MN registration screen shot and amend instructions that say a nodeId will be assigned during registration. add section about generating SM for a new Member Node that has existing Metacat data.
use CN session when testing getLogRecords() and getOperationStatistics() because they are now protecting "sensitive" information
use RC for EML stylesheets before going to yet another minor revision number.
report errors during XML->HTML transform. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5618
add 'fulldist' target to combine building of src and bin distributions
include gastil's changes re: pathquery and 'delete'. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5516
Add details for configuring Apache to use client certificate authentication.
use production CN url as the default (instead of cn-dev)
Oops, previous commit suffered from a happy trigger finger. During deleteReplicationMetadata(), don't delete the replica on the replica Member Node. Call CN.delete() for that functionality. This call just updates system metadata (according to the API description).
remove distribution tar.gz and zip files on fullclean.
use iframe id for the login anchor since the anchor inside the iframe does not work. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5619
get utilities source from the correct checkout location
clarify release notes for 2.0.0 (minor)
correct Javadoc link http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5516
Minor logging change.