Kepler: Issues (Ecoinformatics Redmine)
https://projects.ecoinformatics.org/ecoinfo/
2009-01-10T18:09:44Z
Bug #3739 (Resolved): Issues in inferring data lineage via the provenance recorder
https://projects.ecoinformatics.org/ecoinfo/issues/3739
2009-01-10T18:09:44Z
Shawn Bowers <bowers@gonzaga.edu>
<p>I am trying to figure out how to infer data/token dependencies (more generally, data lineage information) via the current Provenance Recorder component/implementation in Kepler. It looks as though the recorder is providing the following high-level information (shown as relational tables):</p>
<p>Fire(Actor, StartTime, EndTime)<br />Write(Actor, Port, Channel, DataValue, WriteTime)<br />Read(Actor, Port, Channel, DataValue, ReadTime)</p>
<p>If we assume that actor invocations do not maintain state, and that all data read by an actor invocation prior to writing a data value were used to derive the written value, then one could derive simple data dependencies using the following high-level query (or view). Note that in general, these are severe restrictions, e.g., many Kepler/Ptolemy actors maintain state and do not use all input data to derive all output data.</p>
<p>DependsOn(DataValue2, DataValue1) :-<br /> Fire(Actor, StartTime, EndTime), <br /> Read(Actor, _, _, DataValue1, ReadTime), <br /> Write(Actor, _, _, DataValue2, WriteTime), <br /> StartTime <= ReadTime <= WriteTime <= EndTime.</p>
<p>This query says that if an actor fires from StartTime to EndTime, and during this time range it read DataValue1 before it wrote DataValue2, then implicitly DataValue2 depended on (or was derived from) DataValue1.</p>
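<p>For concreteness, the rule could be evaluated as a simple nested join over the recorded events. The sketch below is purely illustrative: the event classes, field names, and the use of long timestamps are my own assumptions, not existing Provenance Recorder types.</p>
<pre>
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory views of the Fire/Read/Write tables above.
class FireEvent  { String actor; long start, end; }
class ReadEvent  { String actor; String port; String channel; Object value; long time; }
class WriteEvent { String actor; String port; String channel; Object value; long time; }

class DependsOn {
    /** Returns (writtenValue, readValue) pairs: the written value depends on the read value. */
    static List<Object[]> infer(List<FireEvent> fires, List<ReadEvent> reads, List<WriteEvent> writes) {
        List<Object[]> deps = new ArrayList<Object[]>();
        for (FireEvent f : fires) {
            for (ReadEvent r : reads) {
                // The read must belong to this firing: same actor, StartTime <= ReadTime.
                if (!r.actor.equals(f.actor) || r.time < f.start) continue;
                for (WriteEvent w : writes) {
                    // The write must belong to the same firing and follow the read:
                    // ReadTime <= WriteTime <= EndTime.
                    if (w.actor.equals(f.actor) && r.time <= w.time && w.time <= f.end) {
                        deps.add(new Object[] { w.value, r.value });
                    }
                }
            }
        }
        return deps;
    }
}
</pre>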
<p>Here are three problems (there may be more) that seem to prevent these simple types of dependencies, and in general data lineage, from being inferred from the information provided by the Provenance Recorder.</p>
<p>1. The granularity of timestamps recorded for fire, read, and write events is too coarse (I believe at the granularity of seconds). For many Kepler/Ptolemy workflows this means that every such event has the same timestamp (thus, timestamps in this case become "meaningless"). A simple fix for this issue is for the Provenance Recorder to generate timestamps at a finer granularity.</p>
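<p>If the coarse timestamps simply come from a second-granularity clock (an assumption on my part; I have not checked the recorder code), the standard Java clocks already offer finer resolution, e.g.:</p>
<pre>
// Illustrative only: finer-grained timestamp sources available in Java.
long wallClockMillis = System.currentTimeMillis(); // millisecond wall-clock time
long monotonicNanos  = System.nanoTime();          // nanosecond monotonic counter;
                                                   // useful for ordering events within one JVM run
</pre>
<p>Even millisecond timestamps may not fully disambiguate event order in fast workflows, so a monotonic counter or per-event sequence number would be the safer choice.</p>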
<p>2. To compute data lineage information, i.e., the transitive closure of these simple dependencies (e.g., to infer which input data items the outputs of a workflow were derived from), we should be recording token ids rather than the data values above. This is because the same data value can be produced and consumed by multiple invocations in a workflow, which confuses the inference of data lineage information. A possible approach for obtaining an object id for Ptolemy tokens would be to use the System.identityHashCode(Object) method provided by Java. An alternative would be to add an "id" field to Token (which could be auto-generated on token creation), but this would present additional difficulties (e.g., it would make each token "larger", possibly decreasing efficiency).</p>
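<p>As a rough illustration of the identity-hash approach (the helper class below is hypothetical, not existing Kepler code):</p>
<pre>
import ptolemy.data.Token;

// Hypothetical helper: derive a per-object id for a token without modifying Token itself.
// Note that System.identityHashCode is not guaranteed to be unique across objects, so a
// real implementation might instead assign its own counter via a java.util.IdentityHashMap.
public final class TokenIds {
    private TokenIds() {}

    /** Returns an id tied to this token object (not to its data value). */
    public static int idOf(Token token) {
        return System.identityHashCode(token);
    }
}
</pre>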
<p>3. The Provenance Recorder does not record the read, write, and fire events of all actors. In particular, very basic actors like ArrayToSequence are not recorded. I'm not really sure why this is the case, but unless each invocation has its events recorded, in general, it will not be possible to infer data lineage via the Provenance Recorder.</p>
<p>While I am not sure specifically what information the new workflow execution reporting tools are proposing to track and present to users, basic data dependencies and lineage information (e.g., to determine which input data produced which output data) seem essential to me. So, figuring out how to record provenance events so that these dependencies can be inferred for all Kepler workflows in general seems crucial.</p>
<p>Thoughts?</p>
<p>Thanks,<br />Shawn</p>

Bug #3094 (Resolved): File keyboard shortcuts use ctrl key instead of apple key on Mac
https://projects.ecoinformatics.org/ecoinfo/issues/3094
2008-01-23T01:40:58Z
Shawn Bowers <bowers@gonzaga.edu>
<p>Unlike in View and Edit, File shortcuts use ^Key (ctrl+Key) instead of Apple^Key. These shortcuts include:</p>
<ul>
<li>Open File</li>
<li>Save</li>
<li>Print</li>
<li>Close</li>
</ul>
<p>Note also that on the actor right-click menu, Open Actor is Apple^L, whereas Configure Actor is Ctrl^E. In the original design, these were both supposed to be Apple^Key I believe.</p>

Bug #3092 (Resolved): sampling_occurrenceData_ R.xml demo workflow
https://projects.ecoinformatics.org/ecoinfo/issues/3092
2008-01-23T00:49:40Z
Shawn Bowers <bowers@gonzaga.edu>
<p>This demo workflow does not run properly because a hard-coded path is used for the file.</p>

Bug #3090 (Resolved): ASC2RAWTest demo workflow error
https://projects.ecoinformatics.org/ecoinfo/issues/3090
2008-01-23T00:18:29Z
Shawn Bowers <bowers@gonzaga.edu>
<p>When I try to run this workflow, I get the following Java exception:</p>
<p>Cannot open file or URL in .ASC2RAWTest.Layer List Reader.fileOrURL<br />Because: <br />.../demos/ENM/layerList.txt (No such file or directory)</p>

Bug #3089 (Resolved): Actor port properties lost after saving to library
https://projects.ecoinformatics.org/ecoinfo/issues/3089
2008-01-23T00:12:52Z
Shawn Bowers <bowers@gonzaga.edu>
<p>If I select an actor (e.g., Garp Prediction), add a semantic type to one of its input ports, and save it back to the library (e.g., as My Garp Prediction), then when I drag the actor back onto the canvas, the semantic type property is no longer available (it somehow was "lost in translation"). This happens with the latest version/update of Kepler and PTII (02/22/08).</p>

Bug #2978 (Resolved): saving to actor library from kepler no longer consistently works and port i...
https://projects.ecoinformatics.org/ecoinfo/issues/2978
2007-10-08T23:40:11Z
Shawn Bowers <bowers@gonzaga.edu>
<p>It appears as though some new issues have cropped up w.r.t. the kepler actor library, in particular:</p>
<p>- when actors are added to the library from the canvas, they sometimes do not appear in the library, sometimes require a change in another actor to appear, and sometimes require Kepler to be restarted to appear. I haven't nailed down exactly what sequence of events is causing this / is needed to make an actor show up.</p>
<p>- input/output port properties are no longer saved when an actor is saved to a library (and dragged back to the canvas). This includes port data types as well as port semantic types. The same holds (i.e., port information is lost) when an actor is built via buildkarlib as well.</p>
<p>- the actor library interface appears to have changed, e.g., calling entityList() returns a null pointer (it seems ...)</p>
<p>The above behavior occurs on a fresh cvs checkout ...</p>
<p>Thanks,<br />-shawn</p>

Bug #1923 (New): Develop ontologies; engage KR group
https://projects.ecoinformatics.org/ecoinfo/issues/1923
2005-01-25T19:11:27Z
Shawn Bowers <bowers@gonzaga.edu>
<p>Develop some guidelines for what makes an ontology useful for specific SMS<br />applications so that the KR group can develop appropriate ontologies.</p>
<p>Pick a few actors that could be useful to us and use those to create the ontologies used to annotate ports, rather than working from first principles. In particular, use the Eco Niche Modeling example and the Biodiversity Analysis example from SEEK.</p>
<p>Determine how to deal with actors that use files for passing data rather than passing the data itself.</p>

Bug #1922 (New): Develop strategies for GUI extensions for the semantics stuff
https://projects.ecoinformatics.org/ecoinfo/issues/1922
2005-01-25T19:07:45Z
Shawn Bowers <bowers@gonzaga.edu>
<p>Determine whether the "access points" at higher levels of the GUI can be modified to suit our needs, rather than changing their classes through their MoML extension mechanisms. Alternatively, generalize the Ptolemy source code for extended GUI customization.</p>

Bug #1921 (New): Define the Kepler GUI components required for semantic mediation in Kepler
https://projects.ecoinformatics.org/ecoinfo/issues/1921
2005-01-25T19:06:05Z
Shawn Bowers <bowers@gonzaga.edu>
<p>For example, a new button for "semantic type check", an easy mechanism to add semantic annotations to actors/ports and possibly datasets, and an explanation interface for describing why a particular item was found/used/merged, etc.</p>

Bug #1920 (New): Search based on semantic annotations of a dataset's attributes
https://projects.ecoinformatics.org/ecoinfo/issues/1920
2005-01-25T19:02:04Z
Shawn Bowers <bowers@gonzaga.edu>

Bug #1919 (New): choose language for semantic annotations in kepler archives
https://projects.ecoinformatics.org/ecoinfo/issues/1919
2005-01-25T19:00:13Z
Shawn Bowers <bowers@gonzaga.edu>

Bug #1918 (New): Search based on the semantic annotation of an actor's port
https://projects.ecoinformatics.org/ecoinfo/issues/1918
2005-01-25T18:59:11Z
Shawn Bowers <bowers@gonzaga.edu>

Bug #1917 (New): Design and implement workflow semantic type check.
https://projects.ecoinformatics.org/ecoinfo/issues/1917
2005-01-25T18:57:59Z
Shawn Bowers <bowers@gonzaga.edu>
<p>Design and implement a feature similar to the Kepler type-system check (which<br />occurs after a workflow is executed), but for semantic annotations.</p>
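<p>As a hypothetical sketch of what a callable check component might look like (none of these interfaces exist in Kepler today; the names are placeholders, and the reporting/suggestion requirements are spelled out in the next paragraph):</p>
<pre>
import java.util.List;
import ptolemy.kernel.CompositeEntity;

// Placeholder interfaces for a semantic type-check component callable from Kepler.
interface SemanticProblem {
    String description();          // what is semantically inconsistent, and where
    List<String> suggestedFixes(); // optional suggestions, a la the unit type system
}

interface SemanticTypeChecker {
    /** Checks the semantic annotations of a workflow model and returns any
     *  problems found; an empty list means the workflow passes the check. */
    List<SemanticProblem> check(CompositeEntity workflow);
}
</pre>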
<p>The semantic type-check should be a component (API) that can be called by Kepler. We need a mechanism to report (through an interface) the semantic problems of a workflow (if any exist), and possibly a mechanism to suggest ways to correct the problems (a la the unit type system).</p>

Bug #1916 (New): Implement semantic search for data and actors in local files
https://projects.ecoinformatics.org/ecoinfo/issues/1916
2005-01-25T18:54:03Z
Shawn Bowers <bowers@gonzaga.edu>
<p>The goal of this work is to decouple the semantic search as much as possible<br />from Kepler so we can shift the implementation to EcoGrid nodes and other remote<br />repositories.</p>
<p>Investigate whether it is feasible to combine the actor and data simple search interfaces into a single set of search interfaces that do not distinguish between different object types.</p>
<p>Also, the resulting implementation should provide actor and data "smart" search so that the local and remote implementations are the same.</p>

Bug #1915 (New): Define a published interface for a semantic search service
https://projects.ecoinformatics.org/ecoinfo/issues/1915
2005-01-25T18:49:32Z
Shawn Bowers <bowers@gonzaga.edu>
<p>The purpose of this work is to define a generic API for "smart" search, to which implementation(s) will conform. Each API operation will require both (1) a public interface that external services will call to invoke the desired function, and (2) a description of the backend information-storage operations necessary to execute the desired function. For example, this might include the operation 'getObjectLSIDList(ObjectType) : ObjectLSIDList' that, when given an object type (such as "actor" or "dataset"), will return all the LSIDs for objects stored in the target backend repository.</p>
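<p>A hedged sketch of what such a published interface could look like: only getObjectLSIDList comes from the description above, and even its signature here (String-based object types, a List of LSID strings) is an assumption rather than a settled design.</p>
<pre>
import java.util.List;

// Hypothetical published interface for the "smart" search service.
interface SemanticSearchService {
    /** Returns the LSIDs of all objects of the given type (e.g., "actor" or
     *  "dataset") stored in the target backend repository. */
    List<String> getObjectLSIDList(String objectType);

    /** Speculative companion operation: objects of a given type whose semantic
     *  annotations match the given ontology term. */
    List<String> searchByAnnotation(String objectType, String ontologyTermId);
}
</pre>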