Bug #4922
closed
DB-driven Annotation Manager
Added by ben leinfelder over 14 years ago.
Updated over 14 years ago.
Description
The current AnnotationManager implementation reads XML serializations into Annotation objects in memory. All operations (search, edit) run against the in-memory version, which is then serialized back to XML (save). This doesn't scale, and it makes complex searches hard to issue.
I'd still keep the XML serializations as a transport mechanism and archive method (<cough> Metacat)
Pros:
-allows us to index annotations by various criteria
-efficient querying
-won't run out of memory
Cons:
-DB overhead/configuration
-not binding to actual data (or are we?)
A draft of the basic schema needed is here:
https://code.ecoinformatics.org/code/semtools/trunk/docs/design/AnnotationManagerERD.pdf
This seems reasonable enough to implement and I believe it satisfies our query needs.
When an annotation is updated in an application, it would be persisted as follows:
Java object -> DB tables -> XML.
When the AnnotationManager is initialized, it would retrieve the XML representations and create or update DB entries for the Annotations it finds, without keeping any Annotation objects in memory.
When an application wants to use a given Annotation, it would be looked up in the DB and then read into a Java object from its XML form.
I'm not sure the cardinality is right on that ERD model. In particular, shouldn't the Measurement--Characteristic cardinality be 1:1? For each measurement, there is only one characteristic afaik.
Making good progress on this implementation. Really it's more of a hybrid - extending the in-memory version to only hold a small number of "active annotations" (i.e. Morpho is editing them) while the DB stores all the rest (for searching). This allows us to search effectively, but also use the "manager" to manage annotations throughout client applications.
The persistence layer is an embedded Apache Derby database (think HSQLDB) with Apache Cayenne mediating between the objects and the tables. It's all up and running already, and my first prototype queries were a cinch to write:
Expression.fromString("observations.measurements.standard like $param ");
creates an expression for an annotation query - the object-oriented dot notation is used for the "joins" from annotation->observations->measurements. (How cool is that!?)
Implementing all the existing search options and any more we can think up should be painless now that the wiring is all hooked up. (That's the one remaining task - re-implement all the search methods to use the db).
Status:
The existing query methods have been re-implemented to use the db model.
If there's a "working" copy of the annotation in memory, we use that one, otherwise we look up the annotation in the index db where there is a pointer to the serialized version which is then read into memory.
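The lookup order described above can be sketched roughly as follows. This is a minimal illustration, not the actual AnnotationManager code: the class name AnnotationLookup, its methods, and the use of plain maps in place of the Derby index and the XML reader are all assumptions made for the example. Whether a freshly loaded annotation then becomes a "working" copy is not stated here, so the sketch does not cache it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical sketch of "working copy first, then index db pointer". */
class AnnotationLookup<A> {
    // Working copies currently held in memory (e.g. being edited by a client).
    private final Map<String, A> active = new HashMap<>();
    // Stand-in for the index db: annotation id -> location of the serialized form.
    private final Map<String, String> index = new HashMap<>();
    // Stand-in for the XML reader: location -> deserialized annotation.
    private final Function<String, A> reader;

    AnnotationLookup(Function<String, A> reader) {
        this.reader = reader;
    }

    /** Records where the serialized version of an annotation lives. */
    void indexLocation(String id, String location) {
        index.put(id, location);
    }

    /** Registers an in-memory working copy for the given id. */
    void checkOut(String id, A workingCopy) {
        active.put(id, workingCopy);
    }

    /** Returns the working copy if there is one, else loads via the index pointer. */
    Optional<A> get(String id) {
        A working = active.get(id);
        if (working != null) {
            return Optional.of(working);
        }
        String location = index.get(id);
        if (location == null) {
            return Optional.empty(); // not indexed at all
        }
        return Optional.ofNullable(reader.apply(location));
    }
}
```

With a reader that simply tags the location, get("a1") returns the parsed form until checkOut("a1", ...) registers a working copy, after which the working copy wins.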
Contexts are now processed recursively so that their transitive nature can be searched:
A rel B
B rel C
-----
A rel C (synthetic)
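As an illustration of the inference above, here is a small sketch that derives the synthetic relationships from the direct ones. It is an assumption-laden example: the class ContextClosure and the map-of-sets representation are hypothetical, and it uses an iterative worklist rather than the recursive processing mentioned above, which should yield the same set of pairs.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: transitive closure over context relationships. */
class ContextClosure {
    /**
     * @param direct source -> targets it is directly related to
     * @return source -> every target reachable through one or more relationships
     */
    static Map<String, Set<String>> close(Map<String, Set<String>> direct) {
        Map<String, Set<String>> closed = new HashMap<>();
        for (String start : direct.keySet()) {
            Set<String> reached = new LinkedHashSet<>();
            Deque<String> pending = new ArrayDeque<>(direct.get(start));
            while (!pending.isEmpty()) {
                String next = pending.pop();
                // add() returns false for targets already seen, which also
                // keeps cyclic relationships from looping forever.
                if (reached.add(next)) {
                    pending.addAll(direct.getOrDefault(next, Collections.emptySet()));
                }
            }
            closed.put(start, reached);
        }
        return closed;
    }
}
```

Given A rel B and B rel C, the result for A contains both B (direct) and C (synthetic). Note that in a cycle a source ends up related to itself, which may or may not be wanted.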
Implemented the "is not" operator for criteria. I'm not sure how useful it will be at this point given that searches are done across the entire Annotation.
If I have two Observations in my Annotation and one has Entity A, the other Entity B, then expressing the query Entity is not B still means that Entity A gives us a match for the annotation (A!=B, right?). I guess it will be useful in more complex/compound query scenarios, but still something we should keep in mind as we test the searching capabilities.
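To make the two readings concrete, here is a small sketch. It assumes an annotation can be reduced to the list of Entity names on its Observations; the class IsNotSemantics and both method names are hypothetical and not part of the implementation.

```java
import java.util.List;

/** Hypothetical sketch contrasting two readings of "Entity is not X". */
class IsNotSemantics {
    /**
     * Annotation-wide reading, as described above: the annotation matches
     * if ANY of its observations has an entity other than the excluded one.
     */
    static boolean anyObservationIsNot(List<String> observationEntities, String excluded) {
        return observationEntities.stream().anyMatch(e -> !e.equals(excluded));
    }

    /**
     * Stricter reading: the annotation matches only if NONE of its
     * observations has the excluded entity.
     */
    static boolean noObservationIs(List<String> observationEntities, String excluded) {
        return observationEntities.stream().noneMatch(e -> e.equals(excluded));
    }
}
```

For an annotation with entities A and B and the criterion "Entity is not B", the first method reports a match (because of A) while the second does not. Which one a user expects is the thing to watch when testing.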
This is now the implementation in the packaged installer.
The bulk of this is complete - I'll add individual tasks to refine its use as they are identified.
Original Bugzilla ID was 4922