catch errors for each localid we are processing so that they do do prevent other ids from having ORE content generated
double check "ecogrid" data urls for valid docid.rev - namely integer rev numbers - when parsing EML and also generating system metadata when necessary. Log the errors as warnings.
Use Jjava.util.Calendar rather than com.ibm ...
use current datetime (at system metadata generation) as the date last modified
Globally change the property 'dataone.memberNodeId' to 'dataone.nodeId'. This is more useful for both MNs and CNs implemented in Metacat. Also, change D1NodeService.getLogRecords() to return log entries with the actual node id rather than the IP address (looks like a cut/paste error)....
do not allow blank node references to be used.https://redmine.dataone.org/issues/2362
Get ReplicationPolicy correctly generated:-tweak the regular expression for getting the pref/blocked node list for default replication policy.-set blocked list (had mistakenly been two calls to set pref list)
1. lookup and use the guid when processing obsoletes/obsoletedBy entries -- had previously been assuming localId==guid but now that we have introduced DOIs as part of the Metacat upgrade process, we may have DOIs for the guid that map to localIds.2. base ORE guids on the localid of the data package they are describing and not on their DOI -- otherwise we might mash up the DOI prefix (or other id scheme that we are unaware of). By using resourceMapPreix + localId we are sure to have a valid localid and guid for the ORE map we create and add to the system
remove createAndInsertSystemMetadat() method that acts on a single localId -- incorporated this into the localId-list-based method.
refactor IdentityManager.createSystemMetadata(sm) to be insertSystemMetadata(sm) so that it is clear that this method inserts the SM object into the backing store. This differentiates it from the "generation" methods we use when we need to create SM about pre-existing objects or objects we get from non-D1 api calls.
generate SystemMetadata during D1 registration (not 2.0.0 upgrade). This process runs in a thread and updates a metacat.properties value when it is complete.
Added new methods to generate a default replication policy based on properties from the metacat configuration. This is called during system metadata creation for objects that lack any system metadata.
handle "BIN" objects so as to avoid repeated calls to lookup the non-existent ObjectFormat
catch cases where the previous/next revision of objects have not had system metadata generated yet
create system metadata object if it wasn't found in HZ
multithreaded implementation for processing docids for system metadata generation.need to investigate ant/junit running that deadlocks hazelcast (config?)
calculate object size using the size on the file system rather than re-reading as an input stream.Now only EML document bytes will be read twice: once for the checksum and again for parsing out datapackage details
system metadata generation optionally skips entries that have already been generated (data size, checksum) but allows the latest EML that describes them to have the last word on object format
remove DML for parsing -- the D1 EML parser still uses DOM, so this may not be too big of a perfromance improvement
try to read the local document before making the localid->guid mapping (in cases where we fail to read the data locally like if it is referenced in an EML file but does not exist on this Metacat instance)
refactor generate system meta loop to the factory class -- to be reused in sysmeta and ORE generationhttp://bugzilla.ecoinformatics.org/show_bug.cgi?id=5522
When managing obsoletes/obsoletedBy system metadata fields, set the archived flag to false initially, and set it to true on system metadata for objects that a revision obsoletes.
check that the resourceMap (based on Id only) does not currently exist in the local metacat when generating OREs
do not download and save remote data resources which are HTML but are not expected to be such (login or info/splash pages before data content).http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5522
use inherited access control from EML for the data file we download from a remote sourcehttp://bugzilla.ecoinformatics.org/show_bug.cgi?id=5522
download remote data and save locally when it is referenced by an EML package, then include it in the ORE map.http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5522
process the current revision, not the latest!use direct object/system metadata insertion for ORE maps.
add revision history to the generated ORE objects -- we use the revision history of the EML package as a basis because the each ORE revision mirrors the revision of the EML package. Add a placeholder for checking if an equivalent ORE map exists in the DataONE infrastructure - this will be a call to CN.search() that looks at the solr index for OREs based on the EML package ID.
for now, look up SystemMetadata directly from the table otherwise we won't have the latest access information. Need to refresh the in-memory copy everytime we edit the access policy via Metacat (includes EML parser)
refactor Metacat access handling to be on a per-revision basis so that it more closely aligns with the DataONE approachhttp://bugzilla.ecoinformatics.org/show_bug.cgi?id=5560
ensure that the revision list is ordered ascending in case someone changes the sql query without realizing that it matters...
set the byte size of the ORE map before adding it
set/update the obsoletes/obsoletedBy fields in system metadata so that we always have a complete revision history for each object.Note: ORE maps do not have revision history...yet(?)
generating ORE maps and creating/updating system metadata now. There are some Permission conversion issues to be worked out yet
optionally include ORE generation/insertion into Metacat when generating SystemMetadatahttps://redmine.dataone.org/issues/2056
no need to cast docInfo entries to String -- they are all strings
set revision history, the create/update dates and the owner/submitter (correctly)
use shared method for looking up "docInfo" map -- both in Metacat replication and in D1 system metadata generation
make default formatting a little bit easier to read
reformat code -- no changes
refactor SystemMetadata creation into separate class from the MetacatHandler -- this will be shared by upgrade code and normal metacat api.