remove classes annotator classes that have moved to a different project under dataone's github.
add Annotator Store implementation -- pass through to D1 API for the AnnotatorJS API
remove AnnotatorService completely - was moved to cn-index-processor
only index non-empty comment text
index both tags and text from annotateit.org
query by consumerKey until the pid search facets are fully supported on annotateit.org
look up annotations when reindexing a given pid. still very much a prototype in that we are looking up annotations from an external annotator-store. TODO: add pid filtering to query when annotateit.org supports it (pending upgrade on their site).
use http://tools.ietf.org/rfc/rfc3023 spec for conformsTo property. use the full xpath for EML dataTable and attribute selectors
add support for v2 DataONE API.
use separate surName and givenNames to lookup ORCIDs.
all full-text queries for ORCID, but it isn't that great because we might have a"PISCO" creator that shows us in may different orcid profiles...false matches.
use HttpClient to query orcid so I can easily set headers and such -- getting 503s from their production server when I test on dev.nceas...odd
adjust tests for production service -- more "real" information shows additional return values from the query.
switch to the production ORCID server for looking up orcid matches for our creators.add test to summarize how many creator matches we can actually find. https://projects.ecoinformatics.org/ecoinfo/issues/6423
cache the imported models to avoid timeouts from remote hosts (or being locked out for too many requests in a given time period).
process all the returned annotation suggestions until we find one that is appropriately located in the subclass hierarchy for the given superclass.
use in-memory TDB dataset for querying annotations for indexing -- this comes with the same reasoning capabilities as the directory-based one, but has the benefit of not filling the directory with triples that will not be used again. prepping for d1 AHM
when indexing annotations directly, just use an in-memory triple store rather than TDB since we remove each graph as it is processed (and my TDB instance would get into the multi-GB range with a few runs, even if I removed the old models)
simplify lookup for classes and orcid. remove the "random" annotation code branches -- just too confusing to look at those bogus classes especially now that we have "real" generated annotations.
first pass at direct EML->semantic index method. Still produces an RDF model, but does not persist it in Metacat, only in the triplestore. Allows us to re-run without adding stale RDF to the MN store.
include BioPortal lookup for Entity matches using the data table description. TODO: only associate measurements to the entity observation if they apply.
check for null entities and/or attributes (typically when otherEntity is being used in EML).
remove extra space in log message
attribute the datapackage to the creator (using orcid if we can find it). https://projects.ecoinformatics.org/ecoinfo/issues/6267https://projects.ecoinformatics.org/ecoinfo/issues/6423
add test for BioPortal annotator service.
refactor web service calls to bioportal and orcid outside of the annotator class. test with orcid sandbox server. include orcid uri for the annotations being generated (we can index these and drive our searches on these values down the road). related to this: https://projects.ecoinformatics.org/ecoinfo/issues/6423 and also some semtools tasks.
Use OBOE-SBC ontology for looking up concepts (it contains subclasses of our OBOE Characteristic and Standard superclasses). Restrict annotations to only subclasses that fit the OBOE model. Correct the xpointer and individual naming conventions so they are unique, but express the exact entity/attribute being annotated.
remove my api key. oops
add comment/pointer to BioPortal annotation service.
Include method to look up annotation classes from BioPortal. We still have OBOE-SBC in there, and theyhave the SWEET ontology. The suggestions returned are not perfect, but they can be better than nothing. Ideally, we'd only query a few ontologies so we don't end up using terms from medical ontologies that aren't really appropriate for our domain. https://projects.ecoinformatics.org/ecoinfo/issues/6256
Add xpointer FragmentSelectors to each annotation.Split attribute label into tokens to attempt matching to OBOE concepts.
include code to generate random annotations for UI testing. Effective, but can be confusing to see so many unrelated concepts on duplicate EML packages.
first pass at generating annotations from EML attribute information. uses the OpenAnnotation model that the metacat-index tests assume which allows us to populate dynamic index fields for the annotation class[es]. There is still much to be done with finding appropriate concepts for each attribute. https://projects.ecoinformatics.org/ecoinfo/issues/6256
support content from all serverLocations when summarizing entity info (semtools)
First pass at a class for summarizing attribute information for analysis. (semtools) https://projects.ecoinformatics.org/ecoinfo/issues/6256