# Date Author Comment
9463 12/18/2015 03:14 PM Jing Tao

Close some prepared SQL statements in the summarize method.

9153 03/18/2015 02:26 PM ben leinfelder

remove annotator classes that have moved to a different project under dataone's github.

9014 12/09/2014 01:10 PM ben leinfelder

add Annotator Store implementation -- pass through to D1 API for the AnnotatorJS API

8977 11/19/2014 12:17 PM ben leinfelder

remove AnnotatorService completely - was moved to cn-index-processor

8964 11/14/2014 11:20 AM ben leinfelder

only index non-empty comment text

8963 11/14/2014 11:13 AM ben leinfelder

index both tags and text from annotateit.org

8961 11/13/2014 05:59 PM ben leinfelder

query by consumerKey until the pid search facets are fully supported on annotateit.org

8946 10/31/2014 05:09 PM ben leinfelder

look up annotations when reindexing a given pid. still very much a prototype in that we are looking up annotations from an external annotator-store. TODO: add pid filtering to query when annotateit.org supports it (pending upgrade on their site).

8910 10/17/2014 05:02 PM ben leinfelder

use http://tools.ietf.org/rfc/rfc3023 spec for conformsTo property. use the full xpath for EML dataTable and attribute selectors

8810 07/23/2014 04:19 PM ben leinfelder

add support for v2 DataONE API.

8788 05/19/2014 02:21 PM ben leinfelder

use separate surName and givenNames to lookup ORCIDs.

8784 05/15/2014 03:17 PM ben leinfelder

allow full-text queries for ORCID, but it isn't that great because we might have a "PISCO" creator that shows up in many different orcid profiles...false matches.

8777 05/14/2014 12:04 PM ben leinfelder

use HttpClient to query orcid so I can easily set headers and such -- getting 503s from their production server when I test on dev.nceas...odd

8776 05/14/2014 11:43 AM ben leinfelder

adjust tests for production service -- more "real" information shows additional return values from the query.

8775 05/14/2014 09:18 AM ben leinfelder

switch to the production ORCID server for looking up orcid matches for our creators.
add test to summarize how many creator matches we can actually find. https://projects.ecoinformatics.org/ecoinfo/issues/6423

8769 05/09/2014 01:48 PM ben leinfelder

cache the imported models to avoid timeouts from remote hosts (or being locked out for too many requests in a given time period).

8768 05/08/2014 04:25 PM ben leinfelder

process all the returned annotation suggestions until we find one that is appropriately located in the subclass hierarchy for the given superclass.

8767 05/08/2014 04:23 PM ben leinfelder

use in-memory TDB dataset for querying annotations for indexing -- this comes with the same reasoning capabilities as the directory-based one, but has the benefit of not filling the directory with triples that will not be used again. prepping for d1 AHM

8765 05/07/2014 11:12 PM ben leinfelder

when indexing annotations directly, just use an in-memory triple store rather than TDB since we remove each graph as it is processed (and my TDB instance would get into the multi-GB range with a few runs, even if I removed the old models)

8763 05/02/2014 04:39 PM ben leinfelder

simplify lookup for classes and orcid. remove the "random" annotation code branches -- just too confusing to look at those bogus classes especially now that we have "real" generated annotations.

8757 04/29/2014 04:54 PM ben leinfelder

first pass at direct EML->semantic index method. Still produces an RDF model, but does not persist it in Metacat, only in the triplestore. Allows us to re-run without adding stale RDF to the MN store.

8743 04/22/2014 11:43 AM ben leinfelder

include BioPortal lookup for Entity matches using the data table description. TODO: only associate measurements to the entity observation if they apply.

8724 04/02/2014 03:36 PM ben leinfelder

check for null entities and/or attributes (typically when otherEntity is being used in EML).

8723 04/02/2014 03:35 PM ben leinfelder

remove extra space in log message

8718 03/31/2014 11:11 AM ben leinfelder

attribute the datapackage to the creator (using orcid if we can find it). https://projects.ecoinformatics.org/ecoinfo/issues/6267
https://projects.ecoinformatics.org/ecoinfo/issues/6423

8717 03/31/2014 10:31 AM ben leinfelder

add test for BioPortal annotator service.

8716 03/28/2014 03:51 PM ben leinfelder

refactor web service calls to bioportal and orcid outside of the annotator class. test with orcid sandbox server. include orcid uri for the annotations being generated (we can index these and drive our searches on these values down the road). related to this: https://projects.ecoinformatics.org/ecoinfo/issues/6423 and also some semtools tasks.

8714 03/26/2014 04:05 PM ben leinfelder

Use OBOE-SBC ontology for looking up concepts (it contains subclasses of our OBOE Characteristic and Standard superclasses). Restrict annotations to only subclasses that fit the OBOE model. Correct the xpointer and individual naming conventions so they are unique, but express the exact entity/attribute being annotated.

8713 03/26/2014 03:10 PM ben leinfelder

remove my api key. oops

8712 03/26/2014 03:02 PM ben leinfelder

add comment/pointer to BioPortal annotation service.

8711 03/26/2014 03:00 PM ben leinfelder

Include method to look up annotation classes from BioPortal. We still have OBOE-SBC in there, and they have the SWEET ontology. The suggestions returned are not perfect, but they can be better than nothing. Ideally, we'd only query a few ontologies so we don't end up using terms from medical ontologies that aren't really appropriate for our domain. https://projects.ecoinformatics.org/ecoinfo/issues/6256

8710 03/24/2014 04:17 PM ben leinfelder

Add xpointer FragmentSelectors to each annotation.
Split attribute label into tokens to attempt matching to OBOE concepts.

8709 03/23/2014 12:11 AM ben leinfelder

include code to generate random annotations for UI testing. Effective, but can be confusing to see so many unrelated concepts on duplicate EML packages.

8702 03/14/2014 10:59 AM ben leinfelder

first pass at generating annotations from EML attribute information. uses the OpenAnnotation model that the metacat-index tests assume which allows us to populate dynamic index fields for the annotation class[es]. There is still much to be done with finding appropriate concepts for each attribute. https://projects.ecoinformatics.org/ecoinfo/issues/6256

8689 03/03/2014 03:41 PM ben leinfelder

support content from all serverLocations when summarizing entity info (semtools)

8646 02/24/2014 04:30 PM ben leinfelder

First pass at a class for summarizing attribute information for analysis. (semtools) https://projects.ecoinformatics.org/ecoinfo/issues/6256