Bug #4922: DB-driven Annotation Manager - Semtools - Ecoinformatics Redmine

Bug #4922

closed

DB-driven Annotation Manager

Added by ben leinfelder almost 15 years ago. Updated almost 15 years ago.

Status: Resolved
Priority: Normal
Assignee: ben leinfelder
Category: SMS API
Target version: Unspecified
Start date: 03/31/2010
Due date:
% Done:
Estimated time:
Bugzilla-Id: 4922

Description

The current AnnotationManager implementation reads XML serializations into Annotation objects in memory. All operations (search, edit) run against the in-memory version, which is then serialized back to XML on save. This doesn't scale, and it makes complex searches hard to issue.
I'd still keep the XML serializations as a transport mechanism and archive method (<cough> Metacat).

Pros:
-allows us to index annotations by various criteria
-efficient querying
-won't run out of memory
Cons:
-DB overhead/configuration
-not binding to actual data (or are we?)

Updated by ben leinfelder almost 15 years ago

A draft of the basic schema needed is here:
https://code.ecoinformatics.org/code/semtools/trunk/docs/design/AnnotationManagerERD.pdf

This seems reasonable enough to implement and I believe it satisfies our query needs.
When an annotation is updated in an application, it would be persisted as follows:
Java object -> DB tables -> XML.
When the AnnotationManager is initialized, it would retrieve the XML representations and update/create entries in the DB for the Annotations it found, but not keep any Annotation objects in memory.
When an application wants to use a given Annotation, it would be looked up in the DB, then read into a Java object from its XML form.

Updated by Matt Jones almost 15 years ago

I'm not sure the cardinality is right on that ERD model. In particular, shouldn't the Measurement--Characteristic cardinality be 1:1? For each measurement, there is only one characteristic afaik.

Updated by ben leinfelder almost 15 years ago

Making good progress on this implementation. Really it's more of a hybrid - extending the in-memory version to only hold a small number of "active annotations" (i.e. Morpho is editing them) while the DB stores all the rest (for searching). This allows us to search effectively, but also use the "manager" to manage annotations throughout client applications.
The persistence layer is an embedded Apache Derby database (think HSQLDB) with Apache Cayenne mediating between the objects and the tables. It's all up and running already, and my first prototype queries were a cinch to write:

Expression.fromString("observations.measurements.standard like $param ");

creates an expression for an annotation query - the object-oriented dot notation is used for the "joins" from annotation->observations->measurements. (How cool is that!?)
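
To make the dot-notation path concrete, here is a minimal plain-Java sketch of what such a query is presumably asking for: walk annotation -> observations -> measurements and keep every annotation where at least one measurement's standard matches the LIKE pattern. This is not Cayenne code. The nested classes, the `findByStandardLike` helper, and the "any related measurement matches" semantics are all assumptions made for illustration; the real object model lives in the SMS API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class PathQuerySketch {
    // Hypothetical, heavily simplified stand-ins for the real model classes.
    static class Measurement {
        final String standard;
        Measurement(String standard) { this.standard = standard; }
    }

    static class Observation {
        final List<Measurement> measurements = new ArrayList<>();
        Observation(Measurement... ms) { measurements.addAll(Arrays.asList(ms)); }
    }

    static class Annotation {
        final String id;
        final List<Observation> observations = new ArrayList<>();
        Annotation(String id, Observation... os) {
            this.id = id;
            observations.addAll(Arrays.asList(os));
        }
    }

    // Translate a SQL LIKE pattern ('%' = any run, '_' = one char) into a regex.
    static Pattern likeToRegex(String like) {
        StringBuilder sb = new StringBuilder();
        for (char c : like.toCharArray()) {
            if (c == '%') sb.append(".*");
            else if (c == '_') sb.append('.');
            else sb.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(sb.toString(), Pattern.DOTALL);
    }

    // Hand-written equivalent of following the path
    // observations.measurements.standard and applying "like param".
    // Returns the ids of annotations with at least one matching measurement.
    static List<String> findByStandardLike(List<Annotation> all, String param) {
        Pattern pattern = likeToRegex(param);
        List<String> ids = new ArrayList<>();
        for (Annotation a : all) {
            boolean hit = false;
            for (Observation o : a.observations) {
                for (Measurement m : o.measurements) {
                    if (pattern.matcher(m.standard).matches()) hit = true;
                }
            }
            if (hit) ids.add(a.id);
        }
        return ids;
    }
}
```

The nested loops are exactly the "joins" that the path expression spares us from writing by hand. Note that this sketch matches case-sensitively, while a database's LIKE may behave differently depending on its configuration.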

Implementing all the existing search options and any more we can think up should be painless now that the wiring is all hooked up. (That's the one remaining task - re-implement all the search methods to use the db).

Updated by ben leinfelder almost 15 years ago

Status:
The existing query methods have been re-implemented to use the db model.
If there's a "working" copy of the annotation in memory, we use that one; otherwise we look up the annotation in the index db, which holds a pointer to the serialized version, and read that into memory.
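
The lookup order described above can be sketched roughly as follows. Everything here is hypothetical: a `HashMap` stands in for the index db, the "pointer" is just a string, and `xmlReader` stands in for whatever actually reads and parses the serialized XML. It only illustrates the order of checks, not the real AnnotationManager code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class AnnotationLookupSketch {
    // Hypothetical stand-in for a parsed Annotation object.
    static class Annotation {
        final String id;
        final String xml;
        Annotation(String id, String xml) { this.id = id; this.xml = xml; }
    }

    // "Working" copies currently being edited by a client application.
    private final Map<String, Annotation> workingCopies = new HashMap<>();
    // Stand-in for the index db: annotation id -> pointer to the serialized XML.
    private final Map<String, String> index = new HashMap<>();
    // Stand-in for the XML reader: pointer -> serialized content.
    private final Function<String, String> xmlReader;

    AnnotationLookupSketch(Function<String, String> xmlReader) {
        this.xmlReader = xmlReader;
    }

    void addToIndex(String id, String pointer) { index.put(id, pointer); }

    void checkOut(Annotation a) { workingCopies.put(a.id, a); }

    // Prefer the in-memory working copy; otherwise follow the index pointer
    // and read the serialized version. Returns null for unknown ids.
    Annotation get(String id) {
        Annotation working = workingCopies.get(id);
        if (working != null) return working;
        String pointer = index.get(id);
        if (pointer == null) return null;
        return new Annotation(id, xmlReader.apply(pointer));
    }
}
```

In this sketch an annotation read from its serialized form is not automatically kept as a working copy; whether it should be is a design decision the sketch leaves open.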

Updated by ben leinfelder almost 15 years ago

Contexts are now processed recursively so that their transitive nature can be searched:
A rel B
B rel C
-----
A rel C (synthetic)
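
A minimal sketch of how the synthetic relationships could be derived recursively, assuming a single relationship type and a simple map from each source to its directly related targets. The class and method names are made up for illustration and the actual implementation may differ.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class TransitiveContextSketch {
    // Everything reachable from start by following "rel" one or more times.
    static Set<String> reachable(Map<String, List<String>> direct, String start) {
        Set<String> seen = new LinkedHashSet<>();
        seen.add(start);            // guards against cycles leading back to start
        collect(direct, start, seen);
        seen.remove(start);         // do not report "start rel start"
        return seen;
    }

    // Recursive step: visit each direct target once, then its targets in turn.
    private static void collect(Map<String, List<String>> direct, String node, Set<String> seen) {
        for (String next : direct.getOrDefault(node, Collections.<String>emptyList())) {
            if (seen.add(next)) {
                collect(direct, next, seen);
            }
        }
    }

    // Only the inferred relationships: reachable targets minus the direct ones.
    static Set<String> synthetic(Map<String, List<String>> direct, String start) {
        Set<String> result = reachable(direct, start);
        result.removeAll(direct.getOrDefault(start, Collections.<String>emptyList()));
        return result;
    }
}
```

With the direct relationships A rel B and B rel C from the example, `synthetic` for A contains only C, which corresponds to the synthetic "A rel C" row.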

Updated by ben leinfelder almost 15 years ago

Implemented the "is not" operator for criteria. I'm not sure how useful it will be at this point given that searches are done across the entire Annotation.
If I have two Observations in my Annotation and one has Entity A, the other Entity B, then expressing the query "Entity is not B" still means that Entity A gives us a match for the annotation (A!=B, right?). I guess it will be useful in more complex/compound query scenarios, but it's still something we should keep in mind as we test the searching capabilities.
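
The two possible readings of "Entity is not B" can be spelled out in a small sketch. The first method mirrors the behavior described above (a match as soon as any observation's entity differs from B); the second is a stricter reading that is not part of the implementation and is shown only for comparison. An annotation is reduced here to the list of entity names of its observations, which is a simplification made for illustration.

```java
import java.util.List;

class IsNotSketch {
    // Annotation-wide reading: matches if ANY observation's entity is not the target.
    static boolean anyEntityIsNot(List<String> entities, String target) {
        for (String e : entities) {
            if (!e.equals(target)) return true;
        }
        return false;
    }

    // Stricter reading (hypothetical): matches only if NO observation has the target entity.
    static boolean noEntityIs(List<String> entities, String target) {
        return !entities.contains(target);
    }
}
```

For an annotation with entities A and B, `anyEntityIsNot` with target B is true (because of A), while `noEntityIs` with target B is false. That difference is the case worth covering when testing the search capabilities.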
