allow verification date to be updated for replicas (patch from Skye). https://redmine.dataone.org/issues/3699
select only distinct guids (synch may have failed more than once for any given guid). https://redmine.dataone.org/issues/3539
include the xml_revisions table. Do not allow removal of server_location = 1 documents (these are not replicas). https://redmine.dataone.org/issues/3539
include size and format DataCite elements (optional) and use the more general resourceType without formatId in them (Dataset/metadata and Dataset/data). http://schema.datacite.org/meta/kernel-2.2/doc/DataCite-MetadataKernel_v2.2.pdf
look up the title for EML files when registering DOIs. Look up the creator from the DataONE CN (if available). Add an EML-based test. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5513
Set the session to null so that the call uses the CN certificate when calling MN.systemMetadataChanged();
To keep all nodes up to date with regard to system metadata changes, add the broadcastSystemMetadataChange() method that finds replica MNs in the node list and calls systemMetadataChanged(). Modify setReplicationStatus() and updateReplicationMetadata() to fire this off when a replica status changes to completed. We may decide to inform MNs at other times too, but this is a conservative amount of calls going to the MNs for now.
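A minimal sketch of what this broadcast might look like, using the v1 libclient calls named in the entry; iterating the replica list directly, the surrounding error handling, and the exact method body are assumptions:

    // sketch only: notify each member node holding a replica;
    // filtering out CN replicas and real error handling are omitted
    private void broadcastSystemMetadataChange(SystemMetadata sysMeta) throws Exception {
        for (Replica replica : sysMeta.getReplicaList()) {
            NodeReference nodeRef = replica.getReplicaMemberNode();
            MNode mn = D1Client.getMN(nodeRef);
            // session is null so the call goes out under the CN certificate (see above)
            mn.systemMetadataChanged(null, sysMeta.getIdentifier(),
                    sysMeta.getSerialVersion().longValue(),
                    sysMeta.getDateSysMetadataModified());
        }
    }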
refactor DOI registration into separate class. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5513
refactor using ezid-client changes that split field names and values into separate enums. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5513
Correctly mint and register DOIs in the MN API implementation. Add tests to exercise minting and creating. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5513
register DOIs with minimal DataCite metadata. still need to determine which details to include and when, but the plumbing is in place as we refine those rules. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5513
class for removing failed/invalid replicas from target nodes that previously held replicated content (KNB/LTER/PISCO/etc). https://redmine.dataone.org/issues/3539
disable EZID/DOI minting by default since we do not yet have a means of tracking minted DOIs and augmenting metadata for them when we actually receive the object in a subsequent create() or update() call. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5753
tweak to pathquery/generic xpath handling
ready Metacat for 2.0.6 release (docs, db version, build files etc).
group user_owner clause as "AND (... OR .... OR ....)" to handle multiple pathquery <owner> elements. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5880
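For illustration, a hedged sketch of the grouping this fix produces; the surrounding query text and parameter handling are simplified assumptions:

    // illustrative only: build one parenthesized group for N <owner> elements
    StringBuilder where = new StringBuilder("... AND (");
    for (int i = 0; i < owners.size(); i++) {
        if (i > 0) where.append(" OR ");
        where.append("user_owner = ?");
    }
    where.append(")"); // without the parentheses, later OR terms would escape the AND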
accidentally added
typo
Search and indexing with Lucene/SOLR. Requires a manually configured SOLR installation. Not currently used by the rest of Metacat.
generate ID from UUID. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5840
make sure the serial version is included or set on MN.update(). http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5793
Quick fix for bad handling of non-default data/backup directories.
remove indexing task from the queue when we are updating the document
move DocInfo parsing into utilities project so that it can be used by Morpho as well as Metacat. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5737
use default count = 1000 for CN.listObjects rather than -1 (because now -1 will cause an SQL error)
default replicaStatus to true for the CN.listObjects call
make sure to call lock() on the SM when updating rightsholder (like every other method that gets a lock object from HZ).
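A hedged sketch of the lock pattern referenced here, assuming the Hazelcast map name and the surrounding code shape:

    IMap<Identifier, SystemMetadata> sysMetaMap = hzInstance.getMap("hzSystemMetadata");
    sysMetaMap.lock(pid);        // lock this entry, as the other mutators do
    try {
        SystemMetadata sysMeta = sysMetaMap.get(pid);
        sysMeta.setRightsHolder(newRightsHolder);
        sysMetaMap.put(pid, sysMeta);
    } finally {
        sysMetaMap.unlock(pid);  // always release, even on failure
    }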
CN.search() is not implemented by metacat -- making that explicit and also testing for it.
default replicaStatus (aka "show replicas in results") to true rather than false
add debug statements for listObjects slice debugging
additional db indexes for pathquery performance. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5696
Do not set headers until response is ready to send (5756)
use dual query for query slicing - one for count, another for the actual records when requested. https://redmine.dataone.org/issues/3065
get total (or subtotal when non-slicing params are present) count as a separate query from the field selection query.
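A rough sketch of the dual-query approach; runCountQuery/runSliceQuery are hypothetical helpers, and the table/column names are assumptions:

    // the point: the count query never pays for field selection or paging
    String countSql = "SELECT count(*) FROM systemmetadata WHERE " + conditions;
    String sliceSql = "SELECT guid, object_format, date_modified FROM systemmetadata"
            + " WHERE " + conditions + " ORDER BY guid LIMIT ? OFFSET ?";
    int total = runCountQuery(countSql);
    List<ObjectInfo> objects = (count > 0)
            ? runSliceQuery(sliceSql, count, start)   // fetch rows only when asked
            : Collections.<ObjectInfo>emptyList();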
include Skye's suggestions about correctly limiting by D1 Event types
first pass at DOI minting using the EZID service in MN.generateIdentifier(). http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5755
Fix a minor bug in listObjects() where total was set incorrectly when count=0. The definition of total in the D1 architecture docs says 'The total number of entries in the source list from which the slice was extracted.' With count=0, we assume the total is the total count from the entire object store. Needs testing.
remove empty package
rollback the delete() when there is an error performing part of it -- don't want to end up with partial delete.
use Identifier object not String when retrieving SM from the HZ map to set archived during delete()
for MN.update() we needed to pass the original pid, not the new pid
do not reject any schemes -- all handled the same at the moment.
simple autogen-based implementation of MN.generateIdentifier(). does not support DOIs, ARKs, etc. It does support including a fragment, returning an identifier like "<fragment>.2012113010215298206"
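A minimal sketch of that autogen behavior; the exact timestamp format is an assumption:

    public Identifier generateIdentifier(Session session, String scheme, String fragment) {
        String timestamp = new SimpleDateFormat("yyyyMMddHHmmssSSS").format(new Date());
        Identifier pid = new Identifier();
        // e.g. "myprefix.20121130102152982" when fragment = "myprefix"
        pid.setValue(fragment == null ? timestamp : fragment + "." + timestamp);
        return pid;
    }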
add link for reference on how to do record limits in oracle
limit /log and /object calls to configurable maximum count for paging. defaults to existing Metacat value of 7000
use RDBMS-specific features to limit the resultset for paging the object list -- postgres and oracle have implementations. we don't really support mssql so I skipped that one.
use RDBMS-specific features to limit the resultset for paging -- postgres and oracle have implementations. we don't really support mssql so I skipped that one.
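For reference, the two vendor-specific limit idioms look roughly like this (table/column names assumed):

    // PostgreSQL: LIMIT/OFFSET directly on the statement
    String pgSql = "SELECT guid FROM systemmetadata ORDER BY date_uploaded"
            + " LIMIT " + count + " OFFSET " + start;
    // Oracle (pre-12c): nested ROWNUM window, since ROWNUM is assigned before ORDER BY
    String oraSql = "SELECT guid FROM ("
            + "  SELECT guid, ROWNUM rnum FROM ("
            + "    SELECT guid FROM systemmetadata ORDER BY date_uploaded"
            + "  ) WHERE ROWNUM <= " + (start + count)
            + ") WHERE rnum > " + start;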
include debug msg about removing docid from index queue. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5750
remove document from the indexing queue when delete is called. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5750
clean up index queue code before tackling index/delete race condition. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5750
no need to mark SM as archived now that DocumentImpl.delete() does it. https://redmine.dataone.org/issues/3406
mark documents as archived=true when they are deleted using the Metacat API. https://redmine.dataone.org/issues/3406
look up the archived value when retrieving the SystemMetadata record. https://redmine.dataone.org/issues/3405
surround returned query in CDATA to prevent parsing of xml within xml
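For example (variable names assumed; note that a CDATA section itself cannot contain the sequence "]]>"):

    // wrap the echoed query so its markup is not parsed as part of the response
    out.write("<query><![CDATA[" + queryXml + "]]></query>");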
In migrating to Hazelcast 2.4.x, replace deprecated methods. Use Hazelcast.newHazelcastInstance() rather than Hazelcast.init(). For other deprecated static methods, use the HazelcastInstance equivalent calls.
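The migration in miniature (map name assumed):

    // before (1.9.x, deprecated):
    //   HazelcastInstance hz = Hazelcast.init(config);
    //   IMap map = Hazelcast.getMap("hzSystemMetadata");
    // after (2.4.x): instance methods replace the deprecated statics
    HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
    IMap<Identifier, SystemMetadata> map = hz.getMap("hzSystemMetadata");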
In CNodeService.updateReplicationMetadata(), we set the replicaVerifiedDate when we update or wholesale add a new replica. However, in setReplicationStatus(), we only do so when there's a new entry. Change setReplicationStatus() to also update the replicaVerifiedDate on updates of existing entries, to be more consistent with other changes. This affects node prioritization based on this timestamp. Thanks to Skye for pointing this out.
To attempt to address performance and stability WRT Hazelcast communication, we're upgrading to the 2.x series of Hazelcast. Remove the 1.9.x jar files and add the 2.4.1-SNAPSHOT jars. Modify HazelcastService to handle the minor change in the ItemListener interface (it now passes ItemEvent<Identifier> as an argument)...
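A sketch of the updated listener shape under the 2.x interface:

    public class HazelcastService implements ItemListener<Identifier> {
        public void itemAdded(ItemEvent<Identifier> event) {
            Identifier pid = event.getItem(); // 1.9.x delivered the Identifier directly
            // ... handle the new pid ...
        }
        public void itemRemoved(ItemEvent<Identifier> event) {
            // ... handle removal ...
        }
    }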
implement query description for pathquery -- only tells callers about the pre-indexed paths we have in Metacat since there are an infinite number of "fields" when storing arbitrary XML, but we really don't want people using non-indexed paths for performance reasons anyway. I've typed all the fields as String, even though some are not just strings and can be used for numeric or date comparisons.
Implement MNQuery for "pathquery" engine. Optionally include guid in the pathquery results (https://redmine.dataone.org/issues/3083)
update pub_date when the length of that field is != 4 (use date_created in this scenario). There were 2 entries that had "193" as the pub_date.
replace new lines in creator with spaces. set blank " " titles and creators to "unknown". use "Baltimore Ecosystem Study LTER" for publisher on all BES objects.
include John Kunze's latest suggestions for improved metadata -- a lot of clean-up, especially on characters in the file. Note UTF-8 encoding of the script.
use ObjectFormatInfo libclient utility to look up mimeType and filename extension during get() calls. Configurable mapping file is deployed by default to /var/metacat/dataone where it can then be augmented as needed. This location is controlled in the metacat.properties file (which is injected into the DataONE Settings values during webapp initialization)...
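A hedged sketch of the lookup during get(); the exact ObjectFormatInfo method names are assumptions based on this entry:

    ObjectFormatInfo formatInfo = ObjectFormatInfo.instance();
    String mimeType  = formatInfo.getMimeType(sysMeta.getFormatId().getValue());
    String extension = formatInfo.getExtension(sysMeta.getFormatId().getValue());
    response.setContentType(mimeType);
    response.setHeader("Content-Disposition",
            "inline; filename=" + pid.getValue() + extension);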
add count for the total processed pids (from ISet iterator)
handle /object?count=0 queries using simpler (quicker) sql https://redmine.dataone.org/issues/3065
allow getlog action to use docid parameters that do not include revision. In these cases, the latest revision will be used.
handle case where we do not have a pathexpr to check. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5696
simplify the xml_access query, and instead use guid to check for permission. Now the docid/rev join (to get most recent version for search results) happens "higher up" in the query. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5696
pass parameters to the getLog action for rendering in xslt
remove morpho.jar -- moved needed classes into shared utilities project. (currently building from utilities trunk -- be sure to 'ant fullclean' to get the latest utilities.jar built)
Update d1_common_java and d1_libclient_java to the newest jar files. Add methods to CNodeService to throw NotImplemented exceptions for query(), listQueryEngines(), and getQueryEngineDescription() since these API calls are handled outside of metacat.
do not allow updates to orphan another branch of revision history. https://redmine.dataone.org/issues/3338
Change the set and get methods for the replication verified date to use java.sql.Timestamp rather than java.util.Date via setTimestamp(), not setDate(). The hh:mm:ss.sss was previously getting truncated.
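The truncation in a nutshell (column name assumed):

    java.util.Date verified = replica.getReplicaVerified();
    // before (java.sql.Date keeps only the date portion):
    //   pstmt.setDate(1, new java.sql.Date(verified.getTime()));
    // after (full hh:mm:ss.SSS precision survives the round trip):
    pstmt.setTimestamp(1, new java.sql.Timestamp(verified.getTime()));
    // and read back symmetrically:
    java.sql.Timestamp ts = rs.getTimestamp("date_verified");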
include the subjects we are testing for authentication. https://redmine.dataone.org/issues/2778
remove the max(rev) clause in favor of a more straight-forward join to xml_documents (that will have the max rev). http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5696
include inverted sendParameters() method that uses the keys as values, and the values as keys so that multiple docid parameters can be specified for the zip download. This was a regression when moving to standard httpclient rather than the roll-your-own version we had been using. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5718
use version 2.0.5
shorten the systemmetadata* table names for Oracle's 30 character limit. move version to 2.0.5. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5717
use alternative syntax for xml_access table update since oracle does not use joins in an update statement the same way as postgres. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5717
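Roughly, the divergence looks like this (table/column names assumed):

    // PostgreSQL: UPDATE may join via FROM
    String pg = "UPDATE xml_access SET guid = id.guid"
            + " FROM identifier id WHERE xml_access.docid = id.docid";
    // Oracle: no UPDATE ... FROM; use a correlated subquery instead
    String ora = "UPDATE xml_access a SET a.guid ="
            + " (SELECT id.guid FROM identifier id WHERE id.docid = a.docid)";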
correct Oracle syntax. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5717
use correct column (guid) for temporary index
use correct docid format when checking for existing mappings.
use CDATA for the docname field in docInfo so that the XML parser ignores content that can contain characters like "&".
include missing identifier mappings during 2.0.4 upgrade (mappings may be missing due to previous replication between servers that do not house SystemMetadata)
use SchemaLocationResolver to fetch remote entries for the xml_catalog -- we want to be able to fetch included xsd files as well as use any error handling it provides for checking the schemas.
prep for 2.0.4 release
when performing query, make sure we are using the access rules of the latest revision of a given docid, otherwise we may include documents that used to be public but have been made private in subsequent revisions. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5696
correct the number of prepared statement parameters when inserting to xml_revisions table. Errors like the following were showing in the replication log file: knb 20120831-19:42:38: [ERROR]: DocumentImpl.writeReplication - Failed to create access rule for package: john.15950.1 because The column index is out of range: 12, number of columns: 11. [ReplicationLogging]
include the WHERE keyword in the SQL where clause - encountered by SAEON's node admin, Alex Niehaus.
use resourceMapLocation (resolve url for the ore map) as the datacite_relatedIdentifier_isPartOf property
use lowercase 'metadata' and 'data' for the resourceType
set publisher to the source system when publisher == creator (we want them to be different, even if just for appearances)
only include public (readable) DOIs in the final output
use "lastname, firstname" convention throughout
include more descriptive data file name for title of data records
include publisher given name correctly
create docid-guid mapping during replication if it does not exist. we were [incorrectly] assuming that there would be SM coming with the document info that would fill this information in, but for traditional non-MN Metacat deployments there is no SM to provide a mapping. In this case we use the docid as the guid.
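A hedged sketch of the fallback; the IdentifierManager method names are assumptions modeled on this entry:

    IdentifierManager im = IdentifierManager.getInstance();
    if (!im.mappingExists(docid)) {
        // no SM arrived to supply a guid (traditional non-MN replication),
        // so fall back to using the docid itself as the guid
        im.createMapping(docid, docid);
    }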
include certificate export SSL options as an example (used heavily for DataONE and Metacat Replication)
stream the replication "update" response rather than building up a complete list in a StringBuffer. prompted by findings on the CN: https://redmine.dataone.org/issues/3141
make sure data objects correctly use force replicate with action "insert" https://redmine.dataone.org/issues/3138