getChecksum() is intentionally different on the CN vs MN, so I'm removing it as a common method.
Initial check in of the D1NodeService class that provides methods common to both CNodeService and MNodeService implementation classes. The common API methods are:
Methods common to CNCore and MNCore APIs: getLogRecords()
Methods common to CNRead and MNRead APIs...
placeholder for setting up certificate manager
remove AuthToken (use session). Remove login() call. Use AccessPolicy object to set public read permission
take getLogRecords impl from CrudService and use in CNCoreImpl
escape quotes when processing returnfield with predicates. example: <returnfield>dataset/dataTable/physical/distribution/online/url[@function='download']</returnfield>
Updated MetacatPopulator to now use ObjectFormatCache.getInstance(). Note: problems remain with the authentication API changes - calls to mn.login(), etc. need to be addressed.
Removed ObjectFormatService in favor of CNCoreImpl
Added support in ResourceHandler for the /formats collection. Added listFormats() and getFormat() methods, both of which call CNCoreImpl methods to handle the call.
Updated CNCoreImpl to implement listFormats() and getFormat(), and changed calls to ObjectFormatCache in IdentifierManager, MetacatHandler to call getInstance(). Removed the ObjectFormatService registration from MetaCatServlet since it is replaced by CNCoreImpl.
add option for replicating system metadata (dataone) https://redmine.dataone.org/issues/1626
use Data Manager Library to parse EML when needed in DataONE classes. (augmented DML to parse data format elements in EML to estimate MIME type) https://redmine.dataone.org/issues/1634
When calling SystemMetadata.getObjectFormat(), return the string value of the ObjectFormatIdentifier rather than ObjectFormat.toString() (which no longer returns the fmtid string).
When calling SystemMetadata.getObjectFormat(), return the value of the ObjectFormatIdentifier rather than ObjectFormat.toString() (which no longer returns the fmtid string).
organize imports so that it is clearer what dependencies exist on the D1 jars
include create() and reserveIdentifier() methods
include override annotation for register method
use Date not joda's DateTime
expose spatial cache regeneration option in the admin interface
force replication for newly-registered system metadata
Merged in the D1_0_6_2_BRANCH changes that include the transition from ObjectFormat calls to ObjectFormatCache calls.
check system metadata for the id as well (in cases when we only have system metadata)
include GUID column for xml_access and related methods for storing/retrieving access rules
implement the old interface for now (until 0.6.2)
include CNCore implementation - only registerSystemMetadata is implemented at the moment. also - updated d1 jar (0.6.2) should be used since that is where the method is defined. would like to consider making ResourceHandler more modular - seems like it does A LOT of different things
include System Metadata forced replication - just need to figure out when to call it!
handle timed replication of system metadata. there are still a few outstanding issues: track server location of system metadata-only entries; replication policy flag for system metadata-only entries?; locking for replicated entries?; forced replication of entries
read and write D1 access policy rules from metacat xml_access table. still TBD: which mechanism takes precedence when there are systemMetadata access rules and EML access rules and other access rules?
persist system metadata replication policy and status using db tables
rework SystemMetadata creation when inserting documents via the Metacat servlet api (in which case there was no client-supplied system metadata)
do not look in systemMetadata for a docid->guid mapping
transfer full System Metadata (as XML) during document and data replication
remove system metadata guid -> local id mapping (there is no document for system metadata now); include system metadata elements when replicating data objects (TODO: transfer all system metadata structures with the docinfo request). TODO: remove docid+rev from the systemMetadata table definition
do not use XML files for storing SystemMetadata - use DB tables only.
Modified Metacat to build against the D1_SCHEMA_0_6_1 branch of the dataone schemas by incorporating the 0.6.1-SNAPSHOT version of d1_common and d1_libclient libraries, and refactoring Metacat code references to the d1 schema changed types.
In order to sync up with DataONE 0.6.1 changes, I'm backing out ObjectFormatService changes temporarily in Metacat. Most functionality will be rolled back in using the DataONE 0.6.2 tag, but some methods in ObjectFormatService (such as getListFromDisk()) will be moved into d1_libclient_java.
Changes in the DataONE ObjectFormat class deprecate the convert() method, and we're now using Metacat's ObjectFormatService to look up object format attributes. The following changes replace ObjectFormat.convert() with ObjectFormatService.getFormat() in several classes....
use update method to update the mapping between local id and guid (d1) when we get a force replication request that is an "update"
generateMissingSystemMetadata was swallowing Exceptions instead of throwing. Refactored so that specific exceptions are thrown, affecting [create/update]SystemMetadata methods, too.
committing changes related to the new restservice update specification (newPid vs. obsoletedGuid)
replace whitespace in generated docid scope (sanparks patch from 1.9.4 branch)
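A minimal sketch of the whitespace replacement described in the entry above. The patch itself is not shown in this log, so the class name, the method name, and the choice of underscore as the replacement character are all assumptions made for illustration.

```java
final class DocidScopeSketch {
    // Hypothetical helper: collapses each run of whitespace in a scope
    // to a single underscore so the generated docid contains no spaces.
    // The underscore is an assumed replacement character.
    static String sanitizeScope(String scope) {
        return scope.trim().replaceAll("\\s+", "_");
    }
}
```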
use outputstream as an object, not a string. relax the Map typing to allow for mixed values. (sanparks patch)
use "object_format" element consistently so that it is replicated across instances https://redmine.dataone.org/issues/1514
remove very old "metacat webservice" code - as far as i can tell it is never referenced or used. plus we have eocgrid and the new D1 rest services covering this territory now
zero padded date string in DocumentUtil.generateDocumentId() for readability
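One way to get a zero-padded date string, shown only as a sketch: the pattern, the class name, and the method name are assumptions, not the actual contents of DocumentUtil.generateDocumentId().

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

final class DocumentIdStampSketch {
    // Hypothetical helper. Every field in the pattern has a fixed width,
    // so single-digit months, days, hours, etc. are padded with zeros
    // (March 5 becomes "0305" rather than "35").
    static String zeroPaddedStamp(Date when) {
        return new SimpleDateFormat("yyyyMMddHHmmss", Locale.US).format(when);
    }
}
```

Fixed-width fields keep generated ids the same length and make them sort chronologically as plain strings.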
Use SystemUtil.getContextURL() in ResourceHandler to construct the DataONE service URL (rather than direct calls to PropertyService). This handles http and https URLs, and strips the :80 or :443 for the well known ports.
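The port handling described above could look roughly like the following. This is a sketch under assumptions, not the SystemUtil code: the class, method, and parameter names are hypothetical, and it assumes the port is dropped only when it is the default for the given scheme.

```java
final class ContextUrlSketch {
    // Hypothetical helper: builds scheme://host[:port]/context and omits
    // the port when it is the well-known default for the scheme
    // (80 for http, 443 for https).
    static String contextUrl(String scheme, String host, int port, String context) {
        boolean defaultPort = ("http".equals(scheme) && port == 80)
                || ("https".equals(scheme) && port == 443);
        StringBuilder url = new StringBuilder(scheme).append("://").append(host);
        if (!defaultPort) {
            url.append(':').append(port);
        }
        return url.append('/').append(context).toString();
    }
}
```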
Minor changes to MetacatHandler:- Improved logging where MetaCatServlet.class was used in getLogger() rather than MetacatHandler.class (holdover from the refactor)- Minor formatting changes, and replacement of 'MetaCatServlet' with 'MetacatHandler' in the logging output as needed.
improved multipart handling (improved logging messages, code, and error checking). Added exception classname to error output when the generic Exception is thrown. Added error check for cases of null value for file parts 'sysmeta' and 'object.'
added a few debugging lines in createSystemMetadata() related to contents of identifier strings
Modified IdentifierManager.getDocumentInfo() to include the docid in the returned hash map, since it is useful to be able to obtain the docid and rev separately from a given fullDocidWithRev (e.g. test.1.1).
fixing annoying error message inaccuracy
Changed AuthLDAP to deal with cases where getAttributes encounters non-string attributes (which used to cause a ClassCastException). Now, if an attribute value cannot be cast to string, we catch the class cast exception and just skip this value. This typically only occurs when an LDAP server is set to send...
Modified MetacatHandler to catch cases where ObjectFormat is not being set properly on data files when generating SystemMetadata. When the EML document contains a format for an entity that maps to a null type in ObjectFormat.convert(), then the type ends up being null and an error is generated on insertion...
allow "docid override" queries to include the results of a "normal" query - if the operator is left null, it acts as the usual override, otherwise UNION and INTERSECT modes can be used to either augment or refine the results. this is for incorporating semantic+spatial+keyword queries into one query operation/result
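The skip-on-ClassCastException behaviour described in the AuthLDAP entry can be sketched as below. This is not the AuthLDAP code: the class and method names are hypothetical, and a plain List stands in for the values that would come from a JNDI attribute enumeration.

```java
import java.util.ArrayList;
import java.util.List;

final class LdapAttributeSketch {
    // Hypothetical helper: keeps the values that are strings and skips
    // anything else (for example binary values returned as byte[]).
    static List<String> stringValues(List<?> rawValues) {
        List<String> values = new ArrayList<String>();
        for (Object raw : rawValues) {
            try {
                values.add((String) raw);
            } catch (ClassCastException e) {
                // Non-string value: skip it instead of failing the whole lookup.
            }
        }
        return values;
    }
}
```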
remove System.out statements in favor of logging
Removed hardcoded D1 node type in ResourceHandler and added in a new 'dataone.nodeType' property. Also added 'dataone.coordinatingNodeBaseURL' property which points to the CN that stores the authoritative object format list. If this instance of Metacat is a CN, it may point to itself.
ResourceHandler in Metacat was set to return the KNB site URL as the MN base URL rather than the node Id. Fixed. https://redmine.dataone.org/issues/1390
initialize the HandlerPluginManager
allow the addition of properties via code
add event notification for insert/update/delete on documents (for semtools plugin)
do not attempt to check permissions when reading documents for systemMetadata generation (unless I completely do not understand this feature - please verify!).
do each table separately with its own connection - running into memory issues on dev.nceas running this.
This is the start of the ObjectFormatService, which manages the list of object formats registered within Metacat. This includes schema types, mime types, and other information related to a particular format. The service provides functionality for the DataONE MemberNode and CoordinatingNode components, with CoordinatingNodes providing the authoritative list of object formats. See https://redmine.dataone.org/issues/1378....
Bug 3835 - design and implement OAI-PMH compliant harvest subsystem Minor bug fix to handle irregular Metacat docids containing two or more dot ('.') characters. In the LTER Metacat, the following docids (scope and identifier, excluding the revision value) have that characteristic:...
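A sketch of one way to handle docids whose scope itself contains dots: split at the last dot rather than the first. The actual fix is not shown in this log, so the class name, the method name, and the sample docid used below are assumptions.

```java
final class DocidPartsSketch {
    // Hypothetical helper: takes a docid without its revision
    // (scope.identifier) and returns {scope, identifier}. Splitting at
    // the LAST dot keeps any dots that are part of the scope.
    static String[] splitScopeAndIdentifier(String docid) {
        int lastDot = docid.lastIndexOf('.');
        if (lastDot <= 0 || lastDot == docid.length() - 1) {
            throw new IllegalArgumentException("Not a scope.identifier docid: " + docid);
        }
        return new String[] { docid.substring(0, lastDot), docid.substring(lastDot + 1) };
    }
}
```

With a made-up docid such as some.scope.42, this yields the scope some.scope and the identifier 42.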
Bug 3835 - design and implement OAI-PMH compliant harvest subsystem Return a 'badVerb' response when the 'verb' request parameter is missing from the request. Previously this generated a NullPointerException.
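The null check described above might look like this in outline. It is a sketch only: the class and method names and the response strings are hypothetical. The verb list and the badVerb error code are the ones defined by OAI-PMH 2.0.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

final class OaiVerbSketch {
    // The six verbs defined by OAI-PMH 2.0.
    private static final List<String> VERBS = Arrays.asList(
            "GetRecord", "Identify", "ListIdentifiers",
            "ListMetadataFormats", "ListRecords", "ListSets");

    // Hypothetical dispatcher: checks for a missing verb before using it,
    // so the request yields a badVerb error instead of a NullPointerException.
    static String handle(Map<String, String> params) {
        String verb = params.get("verb");
        if (verb == null || !VERBS.contains(verb)) {
            return "<error code=\"badVerb\">Missing or illegal verb</error>";
        }
        return "<!-- dispatch to " + verb + " handler -->";
    }
}
```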
use the jaxb date parser for ISO 8601 formats. the numeric and date node values are now calculated after the document has been successfully inserted in the db so any sql exceptions do not prevent the raw node data from being saved. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=2084
rollback the accessDAO changes - leaving well enough alone.
only include accessfileid when it is not toplevel
include accessfileid and subtreeid when inserting xml_access values
use access control dao for setting access in EML parser. send additional xml_access info in replication request
insert/update documents with null user and null group to circumvent access control restrictions then update the user_owner and user_updated values to reflect what exists on the originating server (pisco)
use 'user_updated' field when writing the replicated document - allows most recent ownership/permissions to be used (in case LDAP groups have shifted) and is more accurate for both updates and initial inserts (hopefully addresses the replication issue we are having with pisco)
add support for temporal element query in pathquery http://bugzilla.ecoinformatics.org/show_bug.cgi?id=2084
DocumentImpl.delete() now throws finer grained exceptions (not a general exception). Consequently, the classes that call it have been updated to handle the thrown exceptions, including CrudService, ReplicationHandler, and ReplicationService.
refactor the names of these Data Manager implementation classes so that it's easier to use them with the default/local versions of similar classes. These classes utilize Metacat-specific configuration values rather than relying solely on the bundles that are used in the stand-alone DM lib.
To support GUIDs in MetacatHandler.handleDeleteAction(), I've added in a new method: deleteFromMetacat() - deletes a document based on the docid. This factors the deletion code out of handleDeleteAction(). handleDeleteAction() now does a docid lookup based on GUID, and if it is not found, reverts to the deletion based on docid instead.
These are fairly significant changes to MetacatHandler.handleInsertOrUpdateAction() that add in support for creation or update of GUIDs and SystemMetadata. Upon insertion or update of DataPackages from non-DataONE aware clients (such as Morpho), the identifier table is updated by creating a GUID, and the systemmetadata table is updated with fields after the EML document is parsed for distribution information and entity typing. System Metadata documents are also generated and inserted into Metacat. The list of data entities is iterated over and System Metadata is generated for each data file as well.
In MetacatHandler I've removed updateSystemMetadata() in favor of additions to insertOrUpdateSystemMetadata(). Modified createSystemMetadata() to reflect the changes as well.
Modified MetacatHandler.createSystemMetadata() to take a localId, not a guid, as an argument since there are times when the guid has not yet been created, in which case it is created in this method. Also, put the read() call to get the InputStream of the data/metadata document into its own try/catch statement.
Somehow missed adding in javadoc for read(). Here it is.
For now, getSystemMetadata() will be private like the other *SystemMetadata() methods.
Modified MetacatHandler, updated the getSystemMetadata() method to now use read() and deserializeSystemMetadata() to produce the SystemMetadata object. Exceptions are pushed up the stack, and so accordingly, modified createSystemMetadata() to reflect the changes.
Modified MetacatHandler, added createSystemMetadata() - generates SystemMetadata objects for newly inserted data or documents. This is intended to be used from handleInsertOrUpdateAction(), and only for documents being inserted from clients that don't support the DataONE interface. The method parses EML documents to discover data entities, and updates the system metadata for those entries, with support for describes and describedBy metadata. Currently doesn't handle FGDC, etc. documents....
Modified MetacatHandler, added three methods: getSystemMetadata() - returns a SystemMetadata object from the systemmetadata table using the given GUID. Stub only. updateSystemMetadata() - updates the systemmetadata table using the given SystemMetadata object....
Modified MetacatHandler and added two methods: serializeSystemMetadata() - Serialize a SystemMetadata object to XML string; deserializeSystemMetadata() - Deserialize a SystemMetadata object from an XML string
Modified MetacatHandler, added read() - Read a document from metacat and return an InputStream. The XML or data document should be on disk, but if not, read from the metacat database. This method should be optimized, along with others, to not write stream data to disk for performance reasons.
To support generation of SystemMetadata and GUIDs, added a number of methods to MetacatHandler that are also in CrudService. CrudService should eventually be refactored to use the handler methods. Added: readFromFilesystem() - Read a file from Metacat's configured file system data directory, and return a FileInputStream. This code has been factored out of handleInsertOrUpdateAction()....
fixed bug where the wrong checksum alg got written to the db
added file extension for txt or csv files
To support the generatemissingsystemmetadata REST call, modified CrudService.createSystemMetadata() to use DataoneEMLParser and further determine object formats from EML metadata. Formats currently supported are text/plain, text/csv, image/[jpg|jp2|bmp|tiff|png], and only for EML documents with 'ecogrid://' defined entity urls....
adding more debugging and fixing bug with systemmetadata
Add code to download the included schema.
fixed replication bug where systemmetadata was not getting processed correctly
think I fixed the connection problem. one connection in IdentifierManager was being leaked. added more debug info in case it happens again
Add a static method to get the base url based on a schema url.
A SAX handler class can now get the included schema path.
added some debug info to DBConnectionPool
fixed typo that prevented replication
fixed node response bug
fixed update problem