Kepler: Issues
https://projects.ecoinformatics.org/ecoinfo/
2009-08-07T23:27:12Z, Ecoinformatics Redmine
Bug #4300 (New): "Animate at Runtime" checkbox stays checked when director is replaced
https://projects.ecoinformatics.org/ecoinfo/issues/4300
2009-08-07T23:27:12Z, Timothy McPhillips (mcphillips@ecoinformatics.org)
<p>If you enable run-time animation of a workflow and then swap in a different director, the "Animate at Runtime" menu item remains checked. However, the next run of the workflow will not be animated; apparently the newly inserted director does not know about the animation setting?</p>

Bug #4046 (New): ComadTest should report more details when detecting an error
https://projects.ecoinformatics.org/ecoinfo/issues/4046
2009-05-01T20:05:58Z, Timothy McPhillips (mcphillips@ecoinformatics.org)
<p>The ComadTest actor is used to create automated tests of COMAD features. Because it is often useful to include several instances of ComadTest in the same test workflow, the ComadTest actor should report its name when it throws an exception. Ideally it would also indicate something about how the data stream it received during the current workflow run does not match what it received during training; perhaps the element name and line number of the first mismatch in the trace?</p>

Bug #3671 (New): Configurable workspace directory for holding workflows, data, and run products
https://projects.ecoinformatics.org/ecoinfo/issues/3671
2008-11-13T19:04:28Z, Timothy McPhillips (mcphillips@ecoinformatics.org)
<p>In bug 3558 I requested that a new directory be created on the user's system for each workflow run, and that outputs of the run, trace files, etc., be placed there. In bug 3585 I asked for an API that would make it easy for actors to write output files to this 'run' directory.</p>
<p>But where should these run directories themselves go? I believe we should allow users to specify a directory for holding their 'workspace' in a location of their choosing. In the workspace could go a directory for holding the workflows they develop and use for a particular project (we've done this before in the Kepler/ppod release, but the directory location was fixed), another directory for holding workflow runs, etc.</p>
<p>One alternative would be to hide all this somewhere inside .kepler in the user's home directory. However, I don't think this is the best approach for two reasons. First, the point is to make it easy for users to find their workflows, data, and workflow run products, and to load the latter into other tools for visualization and further analysis. The .kepler directory is hidden and should be used for things that would distract the user if made more prominent. Second, in practice the .kepler directory is frequently deleted (sometimes when installing a new version of Kepler, for example). A user's work should not be deleted at such times, so .kepler should be used only for things that can be regenerated as needed (e.g. data caches).</p>
<p>Another alternative would be to store everything discussed here in a database. However, (a) many workflows generate large numbers of large data files that would be awkward to place in a database, and (b) users often want immediate file-system access to these output files anyway, because the other tools they use to review and further analyze their results expect the data to be stored in files. In such cases there should not be an extra step of exporting workflow run products from a database to a directory of files after each workflow run.</p>
<p>I also think users should have the option of creating multiple workspaces, each with its own directories of workflows and runs. A workspace browser in Kepler could make it easy to view workflows or runs from a particular workspace or from all of them at once.</p>
<p>Note that all this has ramifications for distributed execution. Following execution on multiple nodes, the files expected to be found in a local run directory will need to be copied automatically from each compute node.</p>

Bug #3585 (New): Provide API for creating output files during a workflow run
https://projects.ecoinformatics.org/ecoinfo/issues/3585
2008-10-30T18:47:57Z, Timothy McPhillips (mcphillips@ecoinformatics.org)
<p>I suggested in bug 3558 that Kepler should create a directory on the user's system for each workflow run and store the trace(s) of the run there along with other files produced during the run.</p>
<p>To make use of this facility easy for actor developers, we could provide an API for easily creating output files (e.g., graphics files containing plots of data) in this location (or copying them from temporary directories elsewhere) and ensuring that such files are named uniquely in that directory.</p>
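Such an API might look like the following sketch. The RunDirectory class and newOutputFile method are hypothetical names chosen for illustration, not existing Kepler code.

```java
import java.io.File;

// Hypothetical sketch of the proposed actor-facing API; the class and
// method names are illustrative, not part of Kepler.
class RunDirectory {

    private final File root;

    RunDirectory(File root) {
        this.root = root;
        root.mkdirs();
    }

    // Return a File in the run directory, appending _1, _2, ... when the
    // requested name is already taken, so that multiple actors (or multiple
    // firings) never clobber each other's outputs.
    File newOutputFile(String baseName, String extension) {
        File candidate = new File(root, baseName + "." + extension);
        int suffix = 1;
        while (candidate.exists()) {
            candidate = new File(root, baseName + "_" + suffix + "." + extension);
            suffix++;
        }
        return candidate;
    }
}
```

An actor producing a plot could then simply ask for `newOutputFile("plot", "png")` without worrying about collisions, and an alternative implementation behind the same interface could redirect the files elsewhere.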
<p>Note that while the default implementation could leave these run directories on the user's machine, an alternative implementation could transfer data files created during a run from these directories to some other data store, load the trace files into a DBMS to make them queryable, etc.</p>

Bug #3576 (New): support for accessing cascading metadata from within CompositeCoactor
https://projects.ecoinformatics.org/ecoinfo/issues/3576
2008-10-27T22:04:24Z, Timothy McPhillips (mcphillips@ecoinformatics.org)
<p>The CompositeCoactor class extends TypedCompositeActor (and implements Coactor) to provide a mechanism for implementing coactors from SDF sub-workflows of conventional actors. Data is extracted from the read scope using input ports named according to the types of data to be extracted, e.g., a port named 'StringToken' will extract a single string token out of the current read scope and provide it as input to the subworkflow on each firing; a port named 'StringToken+' will provide an array token containing one or more string tokens extracted from the read scope on each firing, etc.</p>
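The port-naming convention might be parsed along these lines. The PortSpec class and its fields are purely illustrative, not actual CompositeCoactor code; the optional '[key=...]' suffix mentioned here is the metadata-extraction form described in the next paragraph.

```java
// Illustrative parser for the port-naming convention described above;
// PortSpec and parse() are hypothetical, not Kepler code. Assumes a
// well-formed port name.
final class PortSpec {
    final String tokenType;   // e.g. "StringToken"
    final boolean multiple;   // trailing '+' means extract one-or-more as an array
    final String metadataKey; // from an optional "[key=...]" suffix, else null

    private PortSpec(String tokenType, boolean multiple, String metadataKey) {
        this.tokenType = tokenType;
        this.multiple = multiple;
        this.metadataKey = metadataKey;
    }

    static PortSpec parse(String portName) {
        String name = portName.trim();
        String key = null;
        int bracket = name.indexOf('[');
        if (bracket >= 0) {
            String inside = name.substring(bracket + 1, name.indexOf(']', bracket));
            if (inside.startsWith("key=")) {
                key = inside.substring(4);
            }
            name = name.substring(0, bracket).trim();
        }
        boolean multiple = name.endsWith("+");
        if (multiple) {
            name = name.substring(0, name.length() - 1);
        }
        return new PortSpec(name, multiple, key);
    }
}
```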
<p>Currently, metadata or annotations applied to the top-level collection in a scope-match can also be extracted by specifying the key for the required metadata element (e.g., by naming a port 'StringToken [key=filename]'). What cannot currently be done is to access metadata applied to collections above the read scope that cascades down to it. This capability would be very useful for reusing information across multiple invocations of a composite coactor.</p>

Bug #3574 (New): Support for importing directory contents using CollectionSource
https://projects.ecoinformatics.org/ecoinfo/issues/3574
2008-10-27T18:49:37Z, Timothy McPhillips (mcphillips@ecoinformatics.org)
<p>A common workflow pattern is to take as input all of the files (or those of a particular type) in a directory on a researcher's computer system. For example, there are COMAD workflows that process all the FASTA files in a directory, creating a collection for each FASTA file and storing the contained DNA or protein sequences in the corresponding input collections.</p>
<p>Once the CollectionSource actor is able to automatically import the contents of files (see bug 3573), it will be extremely useful to refer to directories in the XML input to CollectionReader or CollectionComposer and have the actor import all of the files it finds there. Another useful feature would be the option of having CollectionSource descend into sub-directories, creating a nested collection for each and importing contained files into the corresponding subcollections. Whole directories of scientific data files could then easily serve as input to COMAD workflows.</p>
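The proposed descent into sub-directories could serialize each directory as a nested collection in the token stream. This sketch uses hypothetical delimiter strings in place of real COMAD tokens; the class and method names are illustrative only.

```java
import java.io.File;
import java.util.List;

// Sketch of the proposed recursive import: each directory becomes a nested
// collection, each contained file a data token. Names and the string-based
// "stream" are illustrative stand-ins for real COMAD tokens.
final class DirectoryImport {

    static void importDirectory(File dir, List<String> stream) {
        stream.add("<Collection name='" + dir.getName() + "'>");
        File[] entries = dir.listFiles();
        if (entries != null) {
            java.util.Arrays.sort(entries); // deterministic stream order
            for (File entry : entries) {
                if (entry.isDirectory()) {
                    importDirectory(entry, stream); // nested subcollection
                } else {
                    stream.add("<Data file='" + entry.getName() + "'/>");
                }
            }
        }
        stream.add("</Collection>");
    }
}
```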
<p>These features could eventually make it much easier to stage data for input to a workflow run without requiring modification of the workflow specification itself.</p>

Bug #3573 (New): Support for importing file contents automatically using CollectionSource
https://projects.ecoinformatics.org/ecoinfo/issues/3573
2008-10-27T18:32:05Z, Timothy McPhillips (mcphillips@ecoinformatics.org)
<p>The CollectionComposer and CollectionReader actors extend CollectionSource to read XML representations of the input to a COMAD workflow and translate them into data tokens, metadata tokens, collection delimiters, etc. Presently all data read in by CollectionComposer must be contained in the XML that is provided either as a parameter value to CollectionComposer or as an external file to CollectionReader. However, many workflows use data from other files and this data currently must be read and parsed by explicit actors elsewhere in the workflow. The input to a workflow would be clearer, and workflows simpler and more transparent, if files could be referred to in the XML processed by CollectionSource, and if CollectionSource were to automatically include the contents of these files in the workflow input.</p>
<p>A simple first step would be to enable CollectionComposer to read in text files either as a TextFile collection containing a single StringToken holding the contents of the file, or a TextFile collection containing one StringToken for each line of the text file. (Existing COMAD workflows demonstrate the usefulness of both approaches).</p>
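The two text-import modes described above could be captured by a simple switch. The TextImportMode enum and importText method are illustrative names only, not a proposed Kepler API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch of the two text-import modes described above: the whole file as a
// single string token, or one string token per line. Names are illustrative.
enum TextImportMode { WHOLE_FILE, LINE_BY_LINE }

final class TextImport {
    // Return the string-token payloads a TextFile collection would hold.
    static List<String> importText(String contents, TextImportMode mode) {
        if (mode == TextImportMode.WHOLE_FILE) {
            return Collections.singletonList(contents);
        }
        return Arrays.asList(contents.split("\n"));
    }
}
```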
<p>A second step would be to allow one to register format-specific parsers for CollectionSource to use when reading particular types of files. For example, a FASTA file parser could be plugged in that would create a FASTA collection filled with (e.g., DNA) Sequence tokens, and a Nexus file parser could create a Nexus collection containing CharacterMatrix, WeightVector, and phylogenetic Tree tokens.</p>

Bug #3568 (New): support for writing COMAD-style trace files from the Provenance Recorder
https://projects.ecoinformatics.org/ecoinfo/issues/3568
2008-10-24T23:48:40Z, Timothy McPhillips (mcphillips@ecoinformatics.org)
<p>I have heard rumors that there are plans to enable the general-purpose Provenance Recorder in Kepler to (optionally) write out its records of a run using the trace file format employed by the COMAD framework. This would be extremely helpful because one could then view the provenance captured during any workflow run via the provenance browser.</p>
<p>I expect that this also would highlight information that cannot be stored in a COMAD trace or represented in the provenance browser, and so lead to enhancements that make both more generally useful.</p>

Bug #3566 (New): order collection contents displayed in provenance browser?
https://projects.ecoinformatics.org/ecoinfo/issues/3566
2008-10-24T20:04:56Z, Timothy McPhillips (mcphillips@ecoinformatics.org)
<p>When the data elements inside COMAD collections are displayed in the provenance browser they seem to be arbitrarily ordered, both in the dependency history view and the collection history view. It would be helpful to have these data and collection elements ordered according to their sequence in the data stream (as recorded in the trace file).</p>
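Assuming each displayed element carries the sequence number recorded for it in the trace file, ordering the views amounts to a single sort. TraceElement and its tracePosition field are hypothetical stand-ins for the browser's internal model.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Minimal sketch of the requested ordering: sort displayed elements by the
// position recorded for them in the trace file. Names are illustrative.
final class TraceElement {
    final String label;
    final long tracePosition; // sequence number from the trace file

    TraceElement(String label, long tracePosition) {
        this.label = label;
        this.tracePosition = tracePosition;
    }
}

final class TraceOrder {
    // Return a copy of the elements in stream order rather than the
    // arbitrary order they were collected in.
    static List<TraceElement> sorted(List<TraceElement> elements) {
        List<TraceElement> copy = new ArrayList<>(elements);
        copy.sort(Comparator.comparingLong(e -> e.tracePosition));
        return copy;
    }
}
```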
<p>Consider an invocation of the Gblocks actor. It takes a collection of biological (e.g., protein) sequences and outputs another collection of sequences, where each output sequence corresponds to one input sequence (segments of the inputs that are not reliably aligned across the input sequences are removed from the outputs). If the sequences represented in the dependency history were ordered, then clicking on the third input protein sequence and then the third output sequence in the dependency history view would give a before-and-after view of that sequence. As it is, it is difficult to find the corresponding sequences in the inputs and outputs.</p>
<p>The same holds for the collection history view. In addition, if the collections themselves were ordered according to trace order, then the incremental buildup of the collections when stepping through the actor invocations would more closely represent the construction of the collection stream during workflow execution.</p>

Bug #3560 (New): Color-code contents of CollectionDisplay
https://projects.ecoinformatics.org/ecoinfo/issues/3560
2008-10-24T00:53:33Z, Timothy McPhillips (mcphillips@ecoinformatics.org)
<p>The CollectionDisplay actor provides a live, XML-formatted view of the data stream arriving at the actor in a COMAD workflow. The window contents would be easier to understand if they were color-coded to distinguish Collections, Data elements, metadata/annotations, and provenance records, as suggested for the Trace File view of the provenance browser in bug 3555 (the color-coding should be the same for both).</p>

Bug #3559 (New): month field in trace file name is off by one
https://projects.ecoinformatics.org/ecoinfo/issues/3559
2008-10-24T00:38:31Z, Timothy McPhillips (mcphillips@ecoinformatics.org)
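A common cause of exactly this kind of month off-by-one in Java code is that java.util.Calendar numbers months from zero. The sketch below contrasts the suspect hand-rolled style with a SimpleDateFormat-based fix; the class and method names are illustrative, not the actual trace-naming code.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;

// Calendar.MONTH is 0-11, so building a timestamp by hand from Calendar
// fields drops a month; SimpleDateFormat numbers months correctly.
// These names are illustrative, not Kepler's trace-naming code.
final class TraceTimestamp {

    // Suspect style: September (Calendar.MONTH == 8) prints as "08".
    static String handRolled(Calendar cal) {
        return String.format("%04d%02d%02d",
                cal.get(Calendar.YEAR),
                cal.get(Calendar.MONTH),        // off by one!
                cal.get(Calendar.DAY_OF_MONTH));
    }

    // Fixed style: let SimpleDateFormat handle month numbering.
    static String formatted(Calendar cal) {
        return new SimpleDateFormat("yyyyMMdd").format(cal.getTime());
    }
}
```

If the trace-naming code formats Calendar fields by hand, either adding 1 to the MONTH field or switching to SimpleDateFormat should fix the filename.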
<p>The name of a new run trace file comprises the name of the workflow, a timestamp, and a number distinguishing the trace in case multiple traces were created during the run, e.g. Clustal_20080923_172447_1.trace for a run of the Clustal workflow a few minutes ago. The month field of the date portion of the timestamp is off by one.</p>

Bug #3558 (New): Store each workflow run trace in its own directory
https://projects.ecoinformatics.org/ecoinfo/issues/3558
2008-10-23T23:33:32Z, Timothy McPhillips (mcphillips@ecoinformatics.org)
<p>The ppod-gui module includes enhancements to the Kepler GUI for browsing and viewing the traces of previously run workflows. Each trace is a single, XML-formatted file, and the traces are organized in folders named according to the workflows run. These folders correspond to actual directories on the user's machine. Double-clicking on a trace opens it in the provenance browser.</p>
<p>I suggest that instead of putting all the traces of runs of a particular workflow in the same directory, we create a new directory for each run (inside the directory named for the workflow) and place the trace there. This would support: (a) including multiple traces output from a single run (possible with COMAD); (b) storing a copy of the workflow in the run directory so that the specification of the executed workflow is not lost; (c) using this directory as the default location for other files produced during the run (and possibly for temporary directories holding intermediate files useful for resuming an aborted run); (d) storing summary reports generated for the run; and (e) keeping copies of input data.</p>
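The proposed layout, a per-run directory inside the per-workflow directory, might be created along these lines; the RunDirectoryLayout class and the exact naming scheme are illustrative assumptions, not existing Kepler behavior.

```java
import java.io.File;
import java.text.SimpleDateFormat;
import java.util.Date;

// Sketch of the proposed per-run directory layout:
//   workspace/<workflowName>/<workflowName>_<timestamp>/
// All names are illustrative.
final class RunDirectoryLayout {

    static File createRunDirectory(File workspace, String workflowName, Date start) {
        String stamp = new SimpleDateFormat("yyyyMMdd_HHmmss").format(start);
        File dir = new File(new File(workspace, workflowName),
                            workflowName + "_" + stamp);
        dir.mkdirs(); // creates the workflow directory on first run, too
        return dir;
    }
}
```

The trace, a copy of the workflow, summary reports, and other run products would then all land under the returned directory.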
<p>Adding another level of directories would add to the work of navigating to the latest trace, but this would not be a problem if we automatically opened the trace using the provenance browser at the end of each run (as suggested in Bug 3546), and if we provided an additional view of recent traces that hid this extra nesting.</p>

Bug #3557 (New): Provide data dependency graph view in provenance browser
https://projects.ecoinformatics.org/ecoinfo/issues/3557
2008-10-23T18:21:04Z, Timothy McPhillips (mcphillips@ecoinformatics.org)
<p>The provenance browser currently provides four views of a workflow run: the raw trace file, the "Collection History", an invocation dependency graph, and a "Dependency History" graph. The latter is a hybrid of the invocation dependencies and the data dependencies. What we don't have is a pure data dependency graph.</p>
<p>As an example, compare the data dependency graph (<a class="external" href="http://daks.ucdavis.edu/~sbowers/prov/pq1.gif">http://daks.ucdavis.edu/~sbowers/prov/pq1.gif</a>) for the COMAD implementation of the 1st provenance challenge (<a class="external" href="http://twiki.ipaw.info/bin/view/Challenge/DAKS">http://twiki.ipaw.info/bin/view/Challenge/DAKS</a>) with the dependency history view provided by the provenance browser (attached). I'd like to be able to view the first kind of graph in the provenance browser as well.</p>

Bug #3555 (New): enhancements to Trace File view in provenance browser
https://projects.ecoinformatics.org/ecoinfo/issues/3555
2008-10-23T00:25:07Z, Timothy McPhillips (mcphillips@ecoinformatics.org)
<p>The provenance browser allows one to view the raw trace file (in XML format) of the run currently loaded in the browser. It would be very nice if one could navigate the other provenance graphs by clicking on XML elements in this Trace File view, and have the details for the selected element appear in the left-hand panels, etc., similar to what happens when one clicks on the data, collection, and invocation items in the Dependency History and Collection History views. Clicking on these graphical views could highlight the corresponding lines in the trace file view as well. (Clicking on a token, object, or invocation id in the XML might also take you to the referenced item.)</p>
<p>A second very helpful feature would be color-coding of the XML in the Trace File view: not color-coding to emphasize XML syntax, but rather to highlight which XML elements comprise Collections, Data elements, metadata/annotations, and provenance records.</p>
<p>I think these enhancements would make it easier to make sense of the trace file and the provenance information stored in it.</p>

Bug #3546 (New): Automatically load trace for a completed run into the provenance browser
https://projects.ecoinformatics.org/ecoinfo/issues/3546
2008-10-22T17:40:53Z, Timothy McPhillips (mcphillips@ecoinformatics.org)
<p>At present, viewing the trace of a workflow run via the provenance browser (the one in the provenance-apps module) requires either running the provenance browser from the command line or navigating to the trace for the latest workflow run in the MyTraces subtree in the Traces panel of the Workspace pane in Kepler. I almost always want to see the trace immediately on running a workflow.</p>
<p>Could we provide an option to load the trace of the current run automatically when it completes?</p>