Add a new class SolrFieldParser and remove some classes like GenericIndex.
include class diagram for components in the cn-index-processor package (dataone), metacat, and solr library.
switch to non-snapshot EZID client jar.
use new, dedicated, LTER test account for LDAP referral test.
added more classes to the index diagram to reflect current state of the code - needs to be updated to include plan for implementation (e.g., DocType object is not what we want). http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5884
enable plantuml generation when building the sphinx documentation. note that you do need to have graphviz installed, but hopefully that is all.
select only distinct guids (synch may have failed more than once for any given guid). https://redmine.dataone.org/issues/3539
include xml_revisions. do not allow removal of server_location = 1 documents (these are not replicas). https://redmine.dataone.org/issues/3539
add README note about DOI support http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5513
include size and format DataCite elements (optional) and use more general resourceType without formatId in them (Dataset/metadata and Dataset/data). http://schema.datacite.org/meta/kernel-2.2/doc/DataCite-MetadataKernel_v2.2.pdf
look up the title for EML files when registering DOIs. look up the creator from DataONE CN (if available). add EML-based test. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5513
Set the session to null so that the call uses the CN certificate when calling MN.systemMetadataChanged();
To keep all nodes up to date with regard to system metadata changes, add the broadcastSystemMetadataChange() method that finds replica MNs in the node list and calls systemMetadataChanged(). Modify setReplicationStatus() and updateReplicationMetadata() to fire this off when a replica status changes to completed. We may decide to inform MNs at other times too, but this is a conservative number of calls going to the MNs for now.
include the create test in the suite
refactor DOI registration into separate class. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5513
refactor using ezid-client changes that split field names and values into separate enums. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5513
Correctly mint and register DOIs in the MN API implementation. Add tests to exercise minting and creating. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5513
reference the correct metacat.properties entry for "guid.ezid.enabled"
use correct default ezid service baseURL
register DOIs with minimal DataCite metadata. still need to determine which details to include and when, but the plumbing is in place as we refine those rules. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5513
class for removing failed/invalid replicas from target nodes that previously held replicated content (KNB/LTER/PISCO/etc). https://redmine.dataone.org/issues/3539
add section about behavior for deprecated Metacat API. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5513
add DOI development page. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5513
disable EZID/DOI minting by default since we do not yet have a means of tracking minted DOIs and augmenting metadata for them when we actually receive the object in a subsequent create() or update() call. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5753
use utilities 1.3.0 tag
add solr index documentation outline. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5884
wordsmith the identity mapping page. Not fundamentally different, but hopefully more concise. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5814
use d1_libclient v1.2.1 (temp file creation fix)
tweak to pathquery/generic xpath handling
use utilities and eml style tag as we prep for release.
ready Metacat for 2.0.6 release (docs, db version, build files etc).
group user_owner clause as "AND (... OR .... OR ....)" to handle multiple pathquery <owner> elements. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5880
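The grouping described above can be sketched as follows. This is a minimal, language-neutral illustration in Python, not Metacat's Java code: the function name `owner_clause` and the `?` placeholder style are hypothetical, and only the `user_owner` column name and the `AND (... OR ...)` shape come from the commit message.

```python
def owner_clause(owners):
    """Return (sql_fragment, params) matching any of the given owners.

    Hypothetical helper: wrapping the ORs in one parenthesised group keeps
    them from binding to unrelated conditions elsewhere in the WHERE clause.
    """
    if not owners:
        return "", []
    ors = " OR ".join("user_owner = ?" for _ in owners)
    return f" AND ({ors})", list(owners)


# Two <owner> elements in a pathquery become one grouped clause.
fragment, params = owner_clause(["uid=alice,o=unaffiliated", "uid=bob,o=unaffiliated"])
```

Without the parentheses, a query such as `... WHERE a AND user_owner = ? OR user_owner = ?` would let the second owner bypass condition `a`, because AND binds more tightly than OR.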
accidentally added
typo
remove older lucene library and include ORE test to make sure that change does not prevent us from generating OREs. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5874
Search and indexing with Lucene/SOLR. Requires a manually configured SOLR installation. Not currently used by the rest of metacat.
PARC, OBFS, NRS: use only the paths that are indexed by default in metacat.properties. If deployments want to customize these, they are free to do so, but we should ship skins that match the paths we index with a vanilla installation.
generate ID from UUID. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5840
make sure serial version is included or set on MN.update(). http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5793
remove duplicate cgi-bin part in path to create account
Quick fix for bad handling of non-default data/backup directories.
Also add the 2.4.1 hazelcast jars to the trunk.
remove indexing task from the queue when we are updating the document
move DocInfo parsing into utilities project so that it can be used by Morpho as well as Metacat. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5737
use utilities tag to build (remember to 'fullclean' after this update!)
use default count = 1000 for CN.listObjects rather than -1 (because now -1 will cause an SQL error)
default replicaStatus to true for the CN.listObjects call
make sure to call lock() on the SM when updating rightsholder (like every other method that gets a lock object from HZ).
return from test when we encounter the NotImplemented exception for CN.search()
include identifier.guid in the test SQL clause.
CN.search() is not implemented by metacat -- making that explicit and also testing for it.
default replicaStatus (aka "show replicas in results") to true rather than false
add debug statements for listObject slice debugging
Add the non-snapshot jars for the D1 libraries.
use utilities and eml RC tags for building Metacat.
include dataone.contactSubject in backup properties so it will be "remembered" during upgrades.
update release date to December
additional db indexes for pathquery performance. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5696
Do not set headers until response is ready to send (5756)
use jar generated from the git repo source (just in case it was different from svn). http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5755
use dual query for query slicing - one for count, another for the actual records when requested. https://redmine.dataone.org/issues/3065
get total (or subtotal when non-slicing params are present) count as a separate query from the field selection query.
include Skye's suggestions about correctly limiting by D1 Event types
use test doi shoulder as the default for local server, at least during testing phase. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5755
first pass at DOI minting using the EZID service in mn.generateIdentifier(). http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5755
Fix a minor bug in listObjects() where total was set incorrectly when count=0. The definition of total in the D1 architecture docs says 'The total number of entries in the source list from which the slice was extracted.' With count=0, we assume the total is the total count from the entire object store. Needs testing.
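A minimal sketch of the intended count=0 behaviour, using Python and an in-memory SQLite database for illustration only. The `objects` table, the `list_objects` function, and the choice to compute the total with a separate COUNT query are assumptions made for this example, not Metacat's actual schema or code.

```python
import sqlite3


def list_objects(conn, start=0, count=1000):
    """Return (total, guids) for a slice of a hypothetical 'objects' table."""
    # Query 1: size of the whole source list, independent of the slice.
    total = conn.execute("SELECT COUNT(*) FROM objects").fetchone()[0]
    rows = []
    if count > 0:
        # Query 2: only the requested slice of records.
        rows = conn.execute(
            "SELECT guid FROM objects ORDER BY guid LIMIT ? OFFSET ?",
            (count, start),
        ).fetchall()
    return total, [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (guid TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO objects VALUES (?)", [(f"doc.{i}",) for i in range(5)])

# count=0 returns no entries, but total still reflects the whole store.
total, guids = list_objects(conn, start=0, count=0)
```

Because the total comes from its own query, it cannot be confused with the number of entries actually returned in the slice.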
remove empty package
rollback the delete() when there is an error performing part of it -- don't want to end up with partial delete.
use Identifier object not String when retrieving SM from the HZ map to set archived during delete()
for MN.update() we needed to pass the original pid, not the new pid
do not reject any schemes -- all handled the same at the moment.
simple autogen-based implementation of MN.generateIdentifier(). does not support DOIs, ARKs, etc. It does support including a fragment, returning an identifier like "<fragment>.2012113010215298206"
add link for reference on how to do record limits in oracle
limit /log and /object calls to configurable maximum count for paging. defaults to existing Metacat value of 7000
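One way to picture the cap, as a hedged Python sketch: the function name `effective_count` is hypothetical, and treating a missing or negative count as "use the maximum" is an assumption of this example. Only the default of 7000 comes from the commit message.

```python
DEFAULT_MAX_COUNT = 7000  # existing Metacat value named in the commit message


def effective_count(requested, max_count=DEFAULT_MAX_COUNT):
    """Clamp a caller-supplied page size to the configured maximum.

    Assumption: a missing or negative count falls back to the maximum.
    """
    if requested is None or requested < 0 or requested > max_count:
        return max_count
    return requested


# A request for 10000 records is served as a page of at most 7000.
page_size = effective_count(10000)
```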
use RDBMS-specific features to limit the resultset for paging the object list -- postgres and oracle have implementations. we don't really support mssql so I skipped that one.
use RDBMS-specific features to limit the resultset for paging -- postgres and oracle have implementations. we don't really support mssql so I skipped that one.
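The two dialect-specific forms can be sketched like this. It is an illustrative Python helper, not the Metacat implementation: `page_query` and its parameter names are hypothetical, and the SQL shown is the commonly used LIMIT/OFFSET form for PostgreSQL and the nested ROWNUM form for Oracle.

```python
def page_query(sql, start, count, dbms):
    """Wrap a SELECT so that only rows start+1 .. start+count are returned."""
    start, count = int(start), int(count)
    if dbms == "postgres":
        return f"{sql} LIMIT {count} OFFSET {start}"
    if dbms == "oracle":
        # ROWNUM is assigned before the outer filter runs, so the upper
        # bound goes on the inner query and the lower bound on the outer one.
        end = start + count
        return (
            "SELECT * FROM ("
            f"SELECT q.*, ROWNUM rnum FROM ({sql}) q WHERE ROWNUM <= {end}"
            f") WHERE rnum > {start}"
        )
    raise ValueError(f"no paging implementation for {dbms!r}")


base = "SELECT guid FROM systemmetadata ORDER BY guid"  # hypothetical base query
pg_sql = page_query(base, 20, 10, "postgres")
ora_sql = page_query(base, 20, 10, "oracle")
```

Both forms only page deterministically when the base query has a stable ORDER BY.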
Add the latest SNAPSHOT build of the hazelcast jars built by robert at:
http://dev-testing.dataone.org/maven/com/hazelcast/hazelcast/2.4.1-SNAPSHOT/hazelcast-2.4.1-SNAPSHOT.jar
http://dev-testing.dataone.org/maven/com/hazelcast/hazelcast-client/2.4.1-SNAPSHOT/hazelcast-client-2.4.1-SNAPSHOT.jar
...
Update the hazelcast libraries based on the most recent build from the hazelcast trunk using patches that robert submitted via git pull requests.
include debug msg about removing docid from index queue. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5750
remove document from the indexing queue when delete is called. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5750
clean up index queue code before tackling index/delete race condition. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5750
additional release notes about archive/delete behavior and HZ upgrade
no need to mark SM as archived now that DocumentImpl.delete() does it. https://redmine.dataone.org/issues/3406
mark documents as archived=true when they are deleted using the Metacat API. https://redmine.dataone.org/issues/3406
look up the archived value when retrieving SystemMetadata record. https://redmine.dataone.org/issues/3405
surround returned query in CDATA to prevent parsing of xml within xml
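A small Python sketch of the idea, with a hypothetical `wrap_cdata` helper and a made-up `<result>` wrapper element. It assumes the returned query may itself contain `]]>`, which would otherwise close the CDATA section early, so that sequence is split across two sections.

```python
import xml.etree.ElementTree as ET


def wrap_cdata(text):
    """Wrap text in CDATA so embedded markup is not parsed as XML."""
    # "]]>" would terminate the section early, so split it across two sections.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


query = "<pathquery><owner>a]]>b</owner></pathquery>"
document = "<result>" + wrap_cdata(query) + "</result>"

# The parser sees the query as plain character data, not as child elements.
parsed = ET.fromstring(document)
```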
Update the two hazelcast jars to 2.4.1-SNAPSHOT versions that Robert generated after fixing certain hazelcast build problems.
correct the metacat.properties help anchors.
use sleeker "?" icon for the admin help links
correct the "?" links in the admin pages to the docs pages that are deployed as part of metacat.
In migrating to Hazelcast 2.4.x, replace deprecated methods.
In migrating to Hazelcast 2.4.x, replace deprecated methods. Use Hazelcast.newHazelcastInstance() rather than Hazelcast.init(). For other deprecated static methods, use the HazelcastInstance equivalent calls.
In CNodeService.updateReplicationMetadata(), we are setting the replicaVerifiedDate() when we update or wholesale add a new replica. However, in setReplicationStatus(), we only do so when there's a new entry. Change setReplicationStatus() to also update the replicaVerifiedDate on updates of existing entries to be more consistent with other changes. This affects node prioritization based on this date timestamp. Thanks to Skye for pointing this out.
To attempt to address performance and stability WRT Hazelcast communication, we're upgrading to the 2.x series of Hazelcast. Remove the 1.9.x jar files, and add the 2.4.1-SNAPSHOT jars. Modify HazelcastService to handle the minor change in the ItemListener interface (now passes ItemEvent<Identifier> as an argument)....
implement query description for pathquery -- only tells callers about the pre-indexed paths we have in Metacat since there are an infinite number of "fields" when storing arbitrary XML, but we really don't want people using non-indexed paths for performance reasons anyway. I've typed all the fields as String, even though some are not just strings and can be used for numeric or date comparisons.
Implement MNQuery for "pathquery" engine. Optionally include guid in the pathquery results (https://redmine.dataone.org/issues/3083)
update pub_date when the length of that field is != 4 (use date_created in this scenario). There were 2 entries that had "193" as the pub_date.
replace new lines in creator with spaces. set blank " " titles and creators to "unknown". use "Baltimore Ecosystem Study LTER" for publisher on all BES objects.
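The clean-up rules for titles and creators could look roughly like this. It is a Python sketch under stated assumptions: `clean_field` is a hypothetical name, and the original script's language and exact whitespace handling are not shown in this log.

```python
def clean_field(value):
    """Replace new lines with spaces; map blank values to "unknown"."""
    cleaned = " ".join(value.splitlines()).strip() if value else ""
    return cleaned or "unknown"


creator = clean_field("Smith,\nJohn")  # multi-line creator becomes one line
title = clean_field(" ")               # blank title falls back to "unknown"
```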
include John Kunze's latest suggestions for improved metadata -- a lot of clean-up, especially on characters in the file. Note UTF-8 encoding of the script.