Kepler: Issues (Ecoinformatics Redmine)
https://projects.ecoinformatics.org/ecoinfo/ (updated 2009-01-10T18:09:44Z)
Bug #3739 (Resolved): Issues in inferring data lineage via the provenance recorder
https://projects.ecoinformatics.org/ecoinfo/issues/3739 (2009-01-10T18:09:44Z, Shawn Bowers <bowers@gonzaga.edu>)
<p>I am trying to figure out how to infer data/token dependencies (more generally, data lineage information) via the current Provenance Recorder component/implementation in Kepler. It looks as though the recorder is providing the following high-level information (shown as relational tables):</p>
<p>Fire(Actor, StartTime, EndTime)<br />Write(Actor, Port, Channel, DataValue, WriteTime)<br />Read(Actor, Port, Channel, DataValue, ReadTime)</p>
<p>If we assume that actor invocations do not maintain state, and that all data read by an actor invocation prior to writing a data value were used to derive the written value, then one could derive simple data dependencies using the following high-level query (or view). Note that in general, these are severe restrictions, e.g., many Kepler/Ptolemy actors maintain state and do not use all input data to derive all output data.</p>
<p>DependsOn(DataValue2, DataValue1) :-<br /> Fire(Actor, StartTime, EndTime), <br /> Read(Actor, _, _, DataValue1, ReadTime), <br /> Write(Actor, _, _, DataValue2, WriteTime), <br /> StartTime <= ReadTime <= WriteTime <= EndTime.</p>
<p>This query says that if an actor fires from StartTime to EndTime, and during this time range it read DataValue1 before it wrote DataValue2, then implicitly DataValue2 depended on (or was derived from) DataValue1.</p>
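Under those same (strong) assumptions, the rule above can be sketched as a plain nested-loop join over the three event tables. The record types and field names below are hypothetical stand-ins for the recorder's schema, not its actual API:

```java
import java.util.ArrayList;
import java.util.List;

public class DependsOnSketch {
    // Hypothetical event records; timestamps are longs (e.g., ms since epoch).
    record Fire(String actor, long start, long end) {}
    record Read(String actor, String port, String channel, String value, long time) {}
    record Write(String actor, String port, String channel, String value, long time) {}
    record Dep(String derived, String source) {}

    // DependsOn(v2, v1) :- Fire(a, s, e), Read(a, _, _, v1, tr),
    //                      Write(a, _, _, v2, tw), s <= tr <= tw <= e.
    static List<Dep> dependsOn(List<Fire> fires, List<Read> reads, List<Write> writes) {
        List<Dep> deps = new ArrayList<>();
        for (Fire f : fires)
            for (Read r : reads)
                for (Write w : writes)
                    if (f.actor().equals(r.actor()) && f.actor().equals(w.actor())
                            && f.start() <= r.time() && r.time() <= w.time()
                            && w.time() <= f.end())
                        deps.add(new Dep(w.value(), r.value()));
        return deps;
    }

    public static void main(String[] args) {
        // One firing of a hypothetical "Add" actor: reads d1 at t=2, writes d2 at t=5.
        List<Dep> deps = dependsOn(
            List.of(new Fire("Add", 0, 10)),
            List.of(new Read("Add", "in", "0", "d1", 2)),
            List.of(new Write("Add", "out", "0", "d2", 5)));
        System.out.println(deps); // d2 depends on d1
    }
}
```

Note how the join depends entirely on the timestamp comparisons, which is exactly why the coarse timestamps discussed below break the inference.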
<p>Here are three problems (there may be more) that seem to prevent these simple types of dependencies, and in general data lineage, from being inferred from the information provided by the Provenance Recorder.</p>
<p>1. The granularity of timestamps recorded for fire, read, and write events is too coarse (I believe at the granularity of seconds). For many Kepler/Ptolemy workflows this means that every such event has the same timestamp (thus, timestamps in this case become "meaningless"). A simple fix for this issue is for the Provenance Recorder to generate timestamps at a finer granularity.</p>
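A finer-grained clock is readily available in Java. For instance, System.nanoTime() is a monotonic, nanosecond-resolution counter suitable for ordering events within one JVM; this is a general illustration, not the recorder's actual timestamping code:

```java
public class TimestampGranularityDemo {
    public static void main(String[] args) {
        // Second-granularity stamps: two consecutive events almost always collide.
        long s1 = System.currentTimeMillis() / 1000;
        long s2 = System.currentTimeMillis() / 1000;
        System.out.println("second stamps equal: " + (s1 == s2));

        // Nanosecond-granularity, monotonic stamps: ordered by construction,
        // so read/write events within one firing can be sequenced reliably.
        long n1 = System.nanoTime();
        long n2 = System.nanoTime();
        System.out.println("nano stamps ordered: " + (n1 <= n2)); // always true
    }
}
```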
<p>2. To compute data lineage information (the transitive closure of these simple dependencies, e.g., to infer which input data items a workflow's outputs were derived from), we should be recording token ids rather than data values. This is because the same data value can be produced and consumed by multiple invocations in a workflow, which confuses the inference of data lineage. A possible approach for obtaining an object id for Ptolemy tokens would be to use the System.identityHashCode(Object) method provided by Java. Another alternative would be to add an "id" field to Token (auto-generated on token creation), but this would present additional difficulties (e.g., it would make each token larger, possibly decreasing efficiency).</p>
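To illustrate why equal values are not the same token, here is a minimal demonstration of System.identityHashCode(Object): two tokens carrying the same value compare equal, yet remain distinguishable as objects. The Token class here is a stand-in, not Ptolemy's:

```java
public class TokenIdentityDemo {
    // Stand-in for a Ptolemy token: record equality is by carried value.
    record Token(int value) {}

    public static void main(String[] args) {
        Token a = new Token(42);
        Token b = new Token(42);
        // Same data value: a provenance log keyed by value cannot tell them apart.
        System.out.println("equal values: " + a.equals(b));   // true
        // Distinct objects: identity hash codes give practically unique ids
        // without modifying the Token class itself.
        System.out.println("same object:  " + (a == b));      // false
        System.out.println("id of a: " + System.identityHashCode(a));
        System.out.println("id of b: " + System.identityHashCode(b));
    }
}
```

(Strictly speaking, identityHashCode values are not guaranteed unique by the JVM spec; collisions are rare but possible, which is one argument for an explicit auto-generated id field instead.)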
<p>3. The Provenance Recorder does not record the read, write, and fire events of all actors. In particular, very basic actors like ArrayToSequence are not recorded. I'm not really sure why this is the case, but unless each invocation has its events recorded, in general, it will not be possible to infer data lineage via the Provenance Recorder.</p>
<p>While I am not sure what information specifically the new workflow execution reporting tools are proposing to track and present to users, basic data dependencies and lineage information (e.g., determining which input data produced which output data) seem to me like essential information. So, figuring out how to record provenance events so that these dependencies can be inferred for all Kepler workflows seems crucial.</p>
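Once per-invocation dependencies are recorded by token id, full lineage is just reachability over those edges. A minimal sketch, with a hypothetical edge representation (derived token id mapped to the set of source token ids):

```java
import java.util.*;

public class LineageSketch {
    // DependsOn edges keyed by token id: derived -> set of source token ids.
    static Set<Integer> lineage(Map<Integer, Set<Integer>> dependsOn, int output) {
        Set<Integer> seen = new HashSet<>();
        Deque<Integer> todo = new ArrayDeque<>(List.of(output));
        while (!todo.isEmpty()) {
            int t = todo.pop();
            for (int src : dependsOn.getOrDefault(t, Set.of()))
                if (seen.add(src)) todo.push(src); // transitive closure via graph search
        }
        return seen; // every token id the output was (transitively) derived from
    }

    public static void main(String[] args) {
        // Token 3 depends on token 2, which depends on token 1,
        // so the lineage of 3 is {1, 2}.
        Map<Integer, Set<Integer>> deps = Map.of(3, Set.of(2), 2, Set.of(1));
        System.out.println(lineage(deps, 3));
    }
}
```

If the recorder logged values instead of token ids, two distinct tokens carrying the same value would be merged into one node here, producing spurious lineage edges.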
<p>Thoughts?</p>
<p>Thanks,<br />Shawn</p>

Bug #3094 (Resolved): File keyboard shortcuts use ctrl key instead of apple key on Mac
https://projects.ecoinformatics.org/ecoinfo/issues/3094 (2008-01-23T01:40:58Z, Shawn Bowers <bowers@gonzaga.edu>)
<p>Unlike in View and Edit, File shortcuts use ^Key (ctrl+Key) instead of Apple^Key. These shortcuts include:</p>
<ul>
<li>Open File</li>
<li>Save</li>
<li>Print</li>
<li>Close</li>
</ul>
<p>Note also that on the actor right-click menu, Open Actor is Apple^L, whereas Configure Actor is Ctrl^E. In the original design, both were supposed to be Apple^Key, I believe.</p>

Bug #3092 (Resolved): sampling_occurrenceData_R.xml demo workflow
https://projects.ecoinformatics.org/ecoinfo/issues/3092 (2008-01-23T00:49:40Z, Shawn Bowers <bowers@gonzaga.edu>)
<p>This demo workflow does not run properly because a hard-coded path is used for the file.</p>

Bug #3090 (Resolved): ASC2RAWTest demo workflow error
https://projects.ecoinformatics.org/ecoinfo/issues/3090 (2008-01-23T00:18:29Z, Shawn Bowers <bowers@gonzaga.edu>)
<p>When I try to run this workflow I get the java exception:</p>
<p>Cannot open file or URL in .ASC2RAWTest.Layer List Reader.fileOrURL<br />Because:<br />.../demos/ENM/layerList.txt (No such file or directory)</p>

Bug #3089 (Resolved): Actor port properties lost after saving to library
https://projects.ecoinformatics.org/ecoinfo/issues/3089 (2008-01-23T00:12:52Z, Shawn Bowers <bowers@gonzaga.edu>)
<p>If I select an actor (e.g., Garp Prediction), add a semantic type to one of its input ports, and save it back to the library (e.g., as My Garp Prediction), then when I drag the actor back onto the canvas, the semantic type property is no longer available (it was somehow "lost in translation"). This happens with the latest version/update of Kepler and PTII (02/22/08).</p>

Bug #2978 (Resolved): saving to actor library from kepler no longer consistently works and port i...
https://projects.ecoinformatics.org/ecoinfo/issues/2978 (2007-10-08T23:40:11Z, Shawn Bowers <bowers@gonzaga.edu>)
<p>It appears as though some new issues have cropped up w.r.t. the kepler actor library, in particular:</p>
<p>- when actors are added to the library from the canvas, they sometimes do not appear in the library, sometimes require a change in another actor to appear, and sometimes require kepler to be restarted to appear. I haven't nailed down exactly what sequence of events is causing this / is needed to make an actor show up.</p>
<p>- input/output port properties are no longer saved when an actor is saved to a library (and dragged back to the canvas). This includes port data types as well as port semantic types. The same holds (i.e., port information is lost) when an actor is built via buildkarlib as well.</p>
<p>- the actor library interface appears to have changed, e.g., calling entityList() returns a null pointer (it seems ...)</p>
<p>The above behavior occurs on a fresh cvs checkout ...</p>
<p>Thanks,<br />-shawn</p>

Bug #1924 (Resolved): Refactor actor and data display and search code
https://projects.ecoinformatics.org/ecoinfo/issues/1924 (2005-01-25T19:13:03Z, Shawn Bowers <bowers@gonzaga.edu>)
<p>Refactor the current actor and data display to have a published interface for interacting with it (i.e., don't pass in a JTree, but rather provide accessor methods for adding and removing categories and items within categories), and make it work the same for the Data and Actors tabs.</p>