rework SystemMetadata creation when inserting documents via the Metacat servlet api (in which case there was no client-supplied system metadata)
do not look in systemMetadata for a docid->guid mapping
transfer full System Metadata (as XML) during document and data replication
remove docid and rev from systemMetadata table
remove system metadata guid -> local id mapping (there is no document for system metadata now); include system metadata elements when replicating data objects (TODO: transfer all system metadata structures with the docinfo request). TODO: remove docid+rev from the systemMetadata table definition
add systemMetadataProvenance table for tracking those relationships
do not use XML files for storing SystemMetadata - use DB tables only.
Modified Metacat to build against the D1_SCHEMA_0_6_1 branch of the dataone schemas by incorporating the 0.6.1-SNAPSHOT version of d1_common and d1_libclient libraries, and refactoring Metacat code references to the d1 schema changed types.
In order to sync up with DataONE 0.6.1 changes, I'm backing out ObjectFormatService changes temporarily in Metacat. Most functionality will be rolled back in using the DataONE 0.6.2 tag, but some methods in ObjectFormatService (such as getListFromDisk()) will be moved into d1_libclient_java.
Changes in the DataONE ObjectFormat class deprecate the convert() method, and we're now using Metacat's ObjectFormatService to look up object format attributes. The following changes replace ObjectFormat.convert() with ObjectFormatService.getFormat() in several classes....
Include the DataONE 0.6.0 type schema in the SQL schema and DTD loader script.
use update method to update the mapping between local and guid (d1) when we get a force replication request that is an "update"
generateMissingSystemMetadata was swallowing Exceptions instead of throwing. Refactored so that specific exceptions are thrown, affecting [create/update]SystemMetadata methods, too.
committing changes related to the new restservice update specification (newPid vs. obsoletedGuid)
replace whitespace in generated docid scope (sanparks patch from 1.9.4 branch)
use outputstream as an object, not a string. relax the Map typing to allow for mixed values. (sanparks patch)
use "object_format" element consistently so that it is replicated across instances. https://redmine.dataone.org/issues/1514
remove very old "metacat webservice" code - as far as i can tell it is never referenced or used. plus we have eocgrid and the new D1 rest services covering this territory now
zero padded date string in DocumentUtil.generateDocumentId() for readability
Use SystemUtil.getContextURL() in ResourceHandler to construct the DataONE service URL (rather than direct calls to PropertyService). This handles http and https URLs, and strips the :80 or :443 for the well known ports.
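A minimal sketch of the port-stripping behaviour described in this commit. The class and method names are hypothetical and this is not Metacat's SystemUtil code; it only illustrates omitting :80 for http and :443 for https when assembling a context URL.

```java
// Hypothetical helper for illustration only; not the SystemUtil implementation.
final class ContextUrlSketch {

    /** Builds scheme://host[:port]/context, leaving the port out when it is the scheme default. */
    static String buildContextUrl(String scheme, String host, int port, String context) {
        boolean defaultPort = ("http".equals(scheme) && port == 80)
                || ("https".equals(scheme) && port == 443);
        StringBuilder url = new StringBuilder(scheme).append("://").append(host);
        if (!defaultPort) {
            url.append(':').append(port);
        }
        return url.append('/').append(context).toString();
    }

    public static void main(String[] args) {
        // example.org and the "knb" context are placeholder values
        System.out.println(buildContextUrl("https", "example.org", 443, "knb"));
        System.out.println(buildContextUrl("http", "example.org", 8080, "knb"));
    }
}
```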
Minor changes to MetacatHandler: - Improved logging where MetaCatServlet.class was used in getLogger() rather than MetacatHandler.class (holdover from the refactor) - Minor formatting changes, and replacement of 'MetaCatServlet' with 'MetacatHandler' in the logging output as needed.
improved multipart handling (improved logging messages, code, and error checking). Added exception classname to error output when the generic Exception is thrown. Added error check for cases of null value for file parts 'sysmeta' and 'object'.
added a few debugging lines in createSystemMetadata() related to contents of identifier strings
Modified IdentifierManager.getDocumentInfo() to include the docid in the returned hash map, since it is useful to be able to obtain the docid and rev separately from a given fullDocidWithRev (e.g. test.1.1).
fixing annoying error message inaccuracy
Changed AuthLDAP to deal with cases where getAttributes encounters non-string attributes (which used to cause a ClassCastException). Now, if an attribute value cannot be cast to string, we catch the class cast exception and just skip this value. This only typically occurs when an LDAP server is set to send...
Modified MetacatHandler to catch cases where ObjectFormat is not being set properly on data files when generating SystemMetadata. When the EML document contains a format for an entity that maps to a null type in ObjectFormat.convert(), then the type ends up being null and an error is generated on insertion...
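The skip-on-ClassCastException approach from the AuthLDAP commit can be sketched roughly as follows. This is a simplified illustration over a plain list of raw values, not the AuthLDAP code itself; the class and method names are hypothetical and the JNDI attribute enumeration is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; AuthLDAP reads values from JNDI attributes instead of a List.
final class AttributeValueSketch {

    /** Collects the values that can be cast to String and skips the rest (e.g. byte[] values). */
    static List<String> stringValues(List<?> rawValues) {
        List<String> result = new ArrayList<>();
        for (Object value : rawValues) {
            try {
                result.add((String) value);
            } catch (ClassCastException e) {
                // non-string attribute value: skip it rather than failing the whole lookup
            }
        }
        return result;
    }
}
```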
allow "docid override" queries to include the results of a "normal" query - if the operator is left null, it acts as the usual override, otherwise UNION and INTERSECT modes can be used to either augment or refine the results. This is for incorporating semantic+spatial+keyword queries into one query operation/result
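One way to picture the three modes of the docid override, as a sketch over in-memory sets. The enum, class, and method names are hypothetical; the actual change presumably operates on query results rather than Java sets.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration of the null / UNION / INTERSECT semantics described above.
final class DocidOverrideSketch {

    enum Operator { UNION, INTERSECT }

    /** A null operator returns the override docids alone; UNION augments and INTERSECT refines them. */
    static Set<String> combine(Set<String> overrideDocids, Set<String> queryDocids, Operator operator) {
        Set<String> result = new LinkedHashSet<>(overrideDocids);
        if (operator == null) {
            return result; // plain override: the normal query results are ignored
        }
        if (operator == Operator.UNION) {
            result.addAll(queryDocids);
        } else {
            result.retainAll(queryDocids);
        }
        return result;
    }
}
```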
remove System.out statements in favor of logging
Removed hardcoded D1 node type in ResourceHandler and added in a new 'dataone.nodeType' property. Also added 'dataone.coordinatingNodeBaseURL' property which points to the CN that stores the authoritative object format list. If this instance of Metacat is a CN, it may point to itself.
ResourceHandler in Metacat was set to return the KNB site URL as the MN base URL rather than the node Id. Fixed. https://redmine.dataone.org/issues/1390
initialize the HandlerPluginManager
allow the addition of properties via code
add event notification for insert/update/delete on documents (for semtools plugin)
do not attempt to check permissions when reading documents for systemMetadata generation (unless I completely do not understand this feature - please verify!).
do each table separately with its own connection - running into memory issues on dev.nceas running this.
do not drop nonexistent table (identifier is not in 1.9.3)
This is the start of the ObjectFormatService, which manages the list of object formats registered within Metacat. This includes schema types, mime types, and other information related to a particular format. The service provides functionality for the DataONE MemberNode and CoordinatingNode components, with CoordinatingNodes providing the authoritative list of object formats. See https://redmine.dataone.org/issues/1378....
Bug 3835 - design and implement OAI-PMH compliant harvest subsystem. Minor bug fix to handle irregular Metacat docids containing two or more dot ('.') characters. In the LTER Metacat, the following docids (scope and identifier, excluding the revision value) have that characteristic:...
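A sketch of one way to tolerate extra dots when taking a full docid apart. It assumes the final component is the revision, the one before it is the identifier, and everything to the left is the scope; that layout is an assumption for illustration, and the class name is hypothetical rather than part of the harvest subsystem.

```java
// Hypothetical parser; assumes the layout scope.identifier.revision, split from the right.
final class DocidSketch {

    /** Returns {scope, identifier, revision}; dots inside the scope are preserved. */
    static String[] split(String fullDocid) {
        int revDot = fullDocid.lastIndexOf('.');
        int idDot = revDot > 0 ? fullDocid.lastIndexOf('.', revDot - 1) : -1;
        boolean malformed = idDot <= 0                // no scope or no identifier separator
                || idDot + 1 == revDot                // empty identifier
                || revDot == fullDocid.length() - 1;  // empty revision
        if (malformed) {
            throw new IllegalArgumentException("not a scope.identifier.revision docid: " + fullDocid);
        }
        return new String[] {
            fullDocid.substring(0, idDot),
            fullDocid.substring(idDot + 1, revDot),
            fullDocid.substring(revDot + 1)
        };
    }
}
```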
Bug 3835 - design and implement OAI-PMH compliant harvest subsystem. Return a 'badVerb' response when the 'verb' request parameter is missing from the request. Previously this generated a NullPointerException.
use the jaxb date parser for ISO 8601 formats. the numeric and date node values are now calculated after the document has been successfully inserted in the db so any sql exceptions do not prevent the raw node data from being saved. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=2084
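The null-safe verb check behind this fix might look roughly like the sketch below. The six verbs and the badVerb error code come from the OAI-PMH protocol; the class and method names are hypothetical and do not reflect the harvest subsystem's actual structure.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of a null-safe verb check; not the harvester's real code.
final class VerbCheckSketch {

    private static final Set<String> VERBS = new HashSet<>(Arrays.asList(
            "Identify", "ListMetadataFormats", "ListSets",
            "ListIdentifiers", "ListRecords", "GetRecord"));

    /** Returns "badVerb" for a missing or unknown verb, or null when the verb is acceptable. */
    static String errorCodeFor(String verb) {
        if (verb == null || !VERBS.contains(verb)) {
            return "badVerb"; // checked before any dereference, so no NullPointerException
        }
        return null;
    }
}
```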
rollback the accessDAO changes - leaving well enough alone.
only include accessfileid when it is not toplevel
include accessfileid and subtreeid when inserting xml_access values
use access control dao for setting access in EML parser. send additional xml_access info in replication request
insert/update documents with null user and null group to circumvent access control restrictions then update the user_owner and user_updated values to reflect what exists on the originating server (pisco)
use 'user_updated' field when writing the replicated document - allows most recent ownership/permissions to be used (in case LDAP groups have shifted) and is more accurate for both updates and initial inserts (hopefully addresses the replication issue we are having with pisco)
add support for temporal element query in pathquery. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=2084
DocumentImpl.delete() now throws finer grained exceptions (not a general exception). Consequently, the classes that call it have been updated to handle the thrown exceptions, including CrudService, ReplicationHandler, and ReplicationService.
refactor the names of these Data Manager implementation classes so that it's easier to use them with the default/local versions of similar classes. These classes utilize Metacat-specific configuration values rather than relying solely on the bundles that are used in the stand-alone DM lib.
To support GUIDs in MetacatHandler.handleDeleteAction(), I've added in a new method: deleteFromMetacat() - deletes a document based on the docid. This factors the deletion code out of handleDeleteAction(). handleDeleteAction() now does a docid lookup based on GUID, and if it is not found, reverts to the deletion based on docid instead.
These are fairly significant changes to MetacatHandler.handleInsertOrUpdateAction() that add in support for creation or update of GUIDs and SystemMetadata. Upon insertion or update of DataPackages from non-DataONE aware clients (such as Morpho), the identifier table is updated by creating a GUID, and the systemmetadata table is updated with fields after the EML document is parsed for distribution information and entity typing. System Metadata documents are also generated and inserted into Metacat. The list of data entities is iterated over and System Metadata is generated for each data file as well.
In MetacatHandler I've removed updateSystemMetadata() in favor of additions to insertOrUpdateSystemMetadata(). Modified createSystemMetadata() to reflect the changes as well.
Modified MetacatHandler.createSystemMetadata() to take a localId, not a guid, as an argument since there are times when the guid has not yet been created, and it is created in this method if so. Also, put the read() call to get the InputStream of the data/metadata document into its own try/catch statement.
Somehow missed adding in javadoc for read(). Here it is.
For now, getSystemMetadata() will be private like the other *SystemMetadata() methods.
Modified MetacatHandler, updated the getSystemMetadata() method to now use read() and deserializeSystemMetadata() to produce the SystemMetadata object. Exceptions are pushed up the stack, and so accordingly, modified createSystemMetadata() to reflect the changes.
Modified MetacatHandler, added createSystemMetadata() - generates SystemMetadata objects for newly inserted data or documents. This is intended to be used from handleInsertOrUpdateAction(), and only for documents being inserted from clients that don't support the DataONE interface. The method parses EML documents to discover data entities, and updates the system metadata for those entries, with support for describes and describedBy metadata. Currently doesn't handle FGDC, etc. documents....
Modified MetacatHandler, added three methods: getSystemMetadata() - returns a SystemMetadata object from the systemmetadata table using the given GUID. Stub only. updateSystemMetadata() - updates the systemmetadata table using the given SystemMetadata object....
Modified MetacatHandler and added two methods: serializeSystemMetadata() - Serialize a SystemMetadata object to XML string. deserializeSystemMetadata() - Deserialize a SystemMetadata object from an XML string
Modified MetacatHandler, added read() - Read a document from metacat and return an InputStream. The XML or data document should be on disk, but if not, read from the metacat database. This method should be optimized, along with others, to not write stream data to disk for performance reasons.
To support generation of SystemMetadata and GUIDs, added a number of methods to MetacatHandler that are also in CrudService(). CrudService should eventually be refactored to use the handler methods. Added: readFromFilesystem() - Read a file from Metacat's configured file system data directory, and return a FileInputStream. This code has been factored out of handleInsertOrUpdateAction()....
fixed bug where the wrong checksum alg got written to the db
Updated DROP script that was missing tables and sequences.
added file extension for txt or csv files
To support the generatemissingsystemmetadata REST call, modified CrudService.createSystemMetadata() to use DataoneEMLParser and further determine object formats from EML metadata. Formats currently supported are text/plain, text/csv, image/[jpg|jp2|bmp|tiff|png], and only for EML documents with 'ecogrid://' defined entity urls....
adding more debugging and fixing bug with systemmetadata
Add code to download the included schema.
fixed replication bug where systemmetadata was not getting processed correctly
think I fixed the connection problem. one connection in IdentifierManager was being leaked. added more debug info in case it happens again
Add a static method to get the base url based on a schema url.
A sax handler class can now get the included schema path.
added some debug info to DBConnectionPool
fixed typo that prevented replication
fixed node response bug
fixed update problem
put the pid in the info section of the url
fixed content type problem where csv files were set as text/xml
fixed problem with count in listObjects()
Cleaned up warnings, removed dead code.
Updated to most recent DataONE libraries. Updated CrudService to set the correct origin MN and auth MN in system metadata. Refactored exception passing. More work to come in generating SystemMetadata.
removed debug statements
fixed bugs in listObjects
Add in correct node references in system metadata.
Cleanup harvester exceptions and generics.
remove httpclient 3.1 and custom-built httpclient.jar; rework MetacatClient (and other classes) to use httpclient 4; updated build to not create httpclient.jar; encoding tests now pass.
blank configuration value should be treated the same as null
few bug fixes for listObjects
added code to do database query for listObjects
Cleaned up unused imports.
pass the root exception message up the call chain so that it can effectively be reported as a helpful error message. also, the JUnit test expects the specific error message (SchemaRegistryTest)
adding fields for additional system metadata info
use the read() method instead of manually calling with parameters
some new code for debugging mmp
updated the populator
added code to run an squery for listObjects instead of an anyfield query
always re-write web.xml in case geoserver has been redeployed. http://bugzilla.ecoinformatics.org/show_bug.cgi?id=4307
Modified MetacatPopulator to deal with change in D1Client static methods.
added more code for new mmp requests
bug fixes