all full-text queries for ORCID, but it isn't that great because we might have a "PISCO" creator that shows up in many different orcid profiles...false matches.
correct glaring errors -- still needs to be honed, but at least it runs without NPE and Jena/foresite errors
stub for testing ORE augmentation - this generates an ORE, adds a "wasDerivedFrom" triple and saves to Metacat MN for indexing. https://projects.ecoinformatics.org/ecoinfo/issues/6548
include ORE formatId as handled by the RDF subprocessor and index prov:wasDerivedFrom field where it exists in the RDF model. https://projects.ecoinformatics.org/ecoinfo/issues/6548
expand the sparql queries to include dcterms:identifier
Text changes to ORE docs
Add documentation for the ORE model expansion for derived data
use HttpClient to query orcid so I can easily set headers and such -- getting 503s from their production server when I test on dev.nceas...odd
adjust tests for production service -- more "real" information shows additional return values from the query.
switch to the production ORCID server for looking up orcid matches for our creators. add test to summarize how many creator matches we can actually find. https://projects.ecoinformatics.org/ecoinfo/issues/6423
use a non-public rightsHolder for both EML and Annotation test documents now that the RDF subprocessor checks each annotation to see that it came from a user that has write permission for the object being annotated.
test for update using the updated EML file, not the original. Also add the SM to the shared map so that the indexing process can consult SM.accessPolicy when indexing annotations that assert things about those test documents.
ignore the metacat/solr comparator tests - they are one-offs.
change the hazelcast group name to be the default "metacat" instance so that the metacat-index tests pass without additional local configuration, at least when running a default metacat deployment.
do not set archived=false for all CN.create calls. The CN will use create() even when harvesting content that is new to it and needs to handle already-archived content. https://projects.ecoinformatics.org/ecoinfo/issues/6475
cache the imported models to avoid timeouts from remote hosts (or being locked out for too many requests in a given time period).
process all the returned annotation suggestions until we find one that is appropriately located in the subclass hierarchy for the given superclass.
use in-memory TDB dataset for querying annotations for indexing -- this comes with the same reasoning capabilities as the directory-based one, but has the benefit of not filling the directory with triples that will not be used again. prepping for d1 AHM
pass around the object file path rather than the data stream so that multiple subprocessors can index the same object and not consume the stream before it gets to the next one. In preparation for extending the assertions stored in OREs. https://projects.ecoinformatics.org/ecoinfo/issues/6548
when indexing annotations directly, just use an in-memory triple store rather than TDB since we remove each graph as it is processed (and my TDB instance would get into the multi-GB range with a few runs, even if I removed the old models)
redirect "short form" metacat read URIs to the new Metacat UI using the configured UI context. This translates the docid -> pid to use the correct identifier for the correct service. https://projects.ecoinformatics.org/ecoinfo/issues/6546
simplify lookup for classes and orcid. remove the "random" annotation code branches -- just too confusing to look at those bogus classes especially now that we have "real" generated annotations.
add 'test' for indexing annotations without actually storing the RDF of the generated annotation.
only allow multiple values for multi-valued fields....
Add admin service to update DOI registrations by specifying a list of formatIds or DOIs, or update all.
use new method to override the CN URL when constructing a CNode instance. see https://redmine.dataone.org/issues/5142
use newer httpclient library so that Jena's dependency is met - this goes all the way back to d1_common/libclient needing to pull in the newer library.
first pass at direct EML->semantic index method. Still produces an RDF model, but does not persist it in Metacat, only in the triplestore. Allows us to re-run without adding stale RDF to the MN store.
allow multivalued fields to be indexed using the "fields" pass through.
Remove the attribute disable from the update button if it has been submitted.
The image which has the default values.
Add a new screen shot which contains the cn url.
Localized the file which doesn't have the bean for dataUrl.
Add the cn url.
Add the text field for the cn url.
Store the cn url in the backup.
switch to use FileUpload instead of O'Reilly COS library for handling chunked file uploads. https://projects.ecoinformatics.org/ecoinfo/issues/6517
forgot to check in the actual class: first pass at allowing admins to update DOI registration. This only acts on EML objects at the moment and is meant to illustrate one mechanism for updating the DOIs. https://projects.ecoinformatics.org/ecoinfo/issues/6530
first pass at allowing admins to update DOI registration. This only acts on EML objects at the moment and is meant to illustrate one mechanism for updating the DOIs. https://projects.ecoinformatics.org/ecoinfo/issues/6530
correct the ORE lookup query syntax and add junit assertion to check that it continues to function as expected. https://projects.ecoinformatics.org/ecoinfo/issues/6529
index the ORE after we submit the metadata for indexing. https://projects.ecoinformatics.org/ecoinfo/issues/6520
include BioPortal lookup for Entity matches using the data table description. TODO: only associate measurements to the entity observation if they apply.
recompile with java 1.6 for compatibility with our servers.
Remove the reference to the bean eml.fileID.
use 1.5.1 tag for hudson to build metacat ui (for KNB deployment)
Index the document after it has been inserted.
Index the document after document is written to the db.
Use the ecogrid-1.2.3 branch which will be the next release.
Remove the bean named eml.fileID which used the ResolveSolrField class.
calculate geohash_3 to three places (typo)
use NSEW for the bounding box geohash calculation from EML - all versions
up the field count to 111 to include the 9 geohash fields.
Using 1.3.0-SNAPSHOT from d1_cn_index_processor
Add fields for geohashes
Add beans to support geohashes
Check for undefined and null elements to avoid errors in IE 8 and earlier in the registry entry form JS
Close a <span> HTML tag in the entry form MetacatUI template to avoid errors in older browsers
The package libdigest-sha1-perl was removed from ubuntu 12.04. We have to install it from cpan.
add "test" for generating annotations based on the entity/attribute details of a datapackage. This iterates through all current EML revisions and either updates or creates annotations based on what it finds. It does add content to your metacat deployment (RDF files) but it can be safely re-run each time we change our annotation algorithm.
check for null entities and/or attributes (typically when otherEntity is being used in EML).
remove extra space in log message
handle null Boolean in SM.archived field
include sample data package for generating annotations. This is the classic Datos Meteorologicos set, but with Matthew Jones as the creator so that we can look up his ORCID in their sandbox environment. https://projects.ecoinformatics.org/ecoinfo/issues/6267
use Matthew Jones for test creator since he has an ORCID in their staging environment.
augment annotation indexing test/sample to include orcid annotation. https://projects.ecoinformatics.org/ecoinfo/issues/6267 https://projects.ecoinformatics.org/ecoinfo/issues/6423
attribute the datapackage to the creator (using orcid if we can find it). https://projects.ecoinformatics.org/ecoinfo/issues/6267 https://projects.ecoinformatics.org/ecoinfo/issues/6423
add test for BioPortal annotator service.
refactor web service calls to bioportal and orcid outside of the annotator class. test with orcid sandbox server. include orcid uri for the annotations being generated (we can index these and drive our searches on these values down the road). related to this: https://projects.ecoinformatics.org/ecoinfo/issues/6423 and also some semtools tasks.
remove leading '?' in the query parameter for MN.query() implementation. We want it to match CN behavior/expectations and comply with the DataONE specification for the interface. https://projects.ecoinformatics.org/ecoinfo/issues/6488
Use OBOE-SBC ontology for looking up concepts (it contains subclasses of our OBOE Characteristic and Standard superclasses). Restrict annotations to only subclasses that fit the OBOE model. Correct the xpointer and individual naming conventions so they are unique, but express the exact entity/attribute being annotated.
remove my api key. oops
add comment/pointer to BioPortal annotation service.
Include method to look up annotation classes from BioPortal. We still have OBOE-SBC in there, and they have the SWEET ontology. The suggestions returned are not perfect, but they can be better than nothing. Ideally, we'd only query a few ontologies so we don't end up using terms from medical ontologies that aren't really appropriate for our domain. https://projects.ecoinformatics.org/ecoinfo/issues/6256
Add xpointer FragmentSelectors to each annotation. Split attribute label into tokens to attempt matching to OBOE concepts.
include code to generate random annotations for UI testing. Effective, but can be confusing to see so many unrelated concepts on duplicate EML packages.
include characteristic_sm field with SPARQL query
include SSLVerify* directives for client certificates and a pointer for getting the DataONE chain files.
Added an explanation of "metacat context" to the Metacat Themes docs based on questions asked by an actual user following our instructions in the docs.
Edited the docs to include more details about creating a custom theme
Remove the code to lookup alias dn in the getGroups method.
Rather than modifying the env directly, we use context.addToEnv. This fixes a bug in non-TLS environments where the alias log-in doesn't work.
first pass at generating annotations from EML attribute information. uses the OpenAnnotation model that the metacat-index tests assume which allows us to populate dynamic index fields for the annotation class[es]. There is still much to be done with finding appropriate concepts for each attribute. https://projects.ecoinformatics.org/ecoinfo/issues/6256
switch to index standard since it is more likely we will be able to determine this from our existing EML attribute information. https://projects.ecoinformatics.org/ecoinfo/issues/6253
Edited the replicaPolicies script to print out a list of IDs that have a different authoritative member node, the number of successes, and failures at the end.
Add comments to bash script to explain its function and dependencies
Added a bash script to call /replicaPolicies/{pid} via the DataONE API for all objects in a MN or a list of ids.
Add the test class for the pisco account.
Remove the test method for the pisco account since it may fail because of the firewall issue.
Add the login test of the pisco account.
Add the pisco account.
Do a more thorough check that the characteristic annotation was successfully indexed as expected (semtools)
switch to the OpenAnnotation (OA) model for annotating datapackages with measurements/characteristics (semtools)
support content from all serverLocations when summarizing entity info (semtools)
bump the poms to 2.4.2
merge from trunk: these open layers resources were not committed!
merge from branch: more notes on 2.4.1 release in the readme
Add a pisco test account.