Kepler: Issues
https://projects.ecoinformatics.org/ecoinfo/
2008-10-27T22:04:24Z, Ecoinformatics Redmine
Bug #3576 (New): support for accessing cascading metadata from within CompositeCoactor
https://projects.ecoinformatics.org/ecoinfo/issues/3576
2008-10-27T22:04:24Z, Timothy McPhillips (mcphillips@ecoinformatics.org)
<p>The CompositeCoactor class extends TypedCompositeActor (and implements Coactor) to provide a mechanism for implementing coactors from SDF sub-workflows of conventional actors. Data is extracted from the read scope using input ports named according to the types of data to be extracted. For example, a port named 'StringToken' extracts a single string token from the current read scope and provides it as input to the sub-workflow on each firing, while a port named 'StringToken+' provides an array token containing one or more string tokens extracted from the read scope on each firing.</p>
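<p>A minimal sketch of how the port-naming convention above could be decomposed; the PortSpec class and its fields are hypothetical illustrations and not part of the actual CompositeCoactor implementation:</p>

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of the port-naming convention described above.
// PortSpec is not a Kepler class; it only shows how a name such as
// "StringToken+ [key=filename]" decomposes into its three parts.
public class PortSpec {
    private static final Pattern NAME =
        Pattern.compile("(\\w+?)(\\+?)(?:\\s*\\[key=([^\\]]+)\\])?");

    final String tokenType;   // kind of token to extract from the read scope
    final boolean asArray;    // '+' suffix: deliver all matches as one array token
    final String metadataKey; // non-null when the port extracts a metadata value

    PortSpec(String tokenType, boolean asArray, String metadataKey) {
        this.tokenType = tokenType;
        this.asArray = asArray;
        this.metadataKey = metadataKey;
    }

    static PortSpec parse(String portName) {
        Matcher m = NAME.matcher(portName.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("unrecognized port name: " + portName);
        }
        return new PortSpec(m.group(1), !m.group(2).isEmpty(), m.group(3));
    }
}
```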
<p>Currently, metadata or annotations applied to the top-level collection in a scope match can also be extracted by specifying the key of the required metadata element (e.g., by naming a port 'StringToken [key=filename]'). What cannot be done yet is accessing metadata that is applied to collections above the read scope and cascades down to it. This capability would be very useful for reusing information across multiple invocations of a composite coactor.</p>

Bug #3574 (New): Support for importing directory contents using CollectionSource
https://projects.ecoinformatics.org/ecoinfo/issues/3574
2008-10-27T18:49:37Z, Timothy McPhillips (mcphillips@ecoinformatics.org)
<p>A common workflow pattern is to take as input all of the files (or those of a particular type) in a directory on a researcher's computer system. For example, there are COMAD workflows that process all the FASTA files in a directory, creating a collection for each FASTA file and storing the contained DNA or protein sequences in the corresponding input collections.</p>
<p>Once the CollectionSource actor is able to automatically import the contents of files (see bug 3573), it will be extremely useful to refer to directories in the XML input to CollectionReader or CollectionComposer and have the actor import all of the files it finds there. Another useful feature would be the option of having CollectionSource descend into sub-directories, creating a nested collection for each and importing contained files into the corresponding subcollections. Whole directories of scientific data files could then easily serve as input to COMAD workflows.</p>
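<p>The nested-collection behavior proposed above could look roughly like the following sketch; the Node type is purely illustrative and nothing like it exists in CollectionSource today:</p>

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: each directory becomes a (sub)collection and
// each regular file becomes a data item to be imported into it. The Node
// type is hypothetical and not part of the CollectionSource actor.
class Node {
    final String name;
    final boolean isCollection;
    final List<Node> children = new ArrayList<>();

    Node(String name, boolean isCollection) {
        this.name = name;
        this.isCollection = isCollection;
    }

    // Recursively mirror a directory tree as nested collections.
    static Node importDirectory(File dir) {
        Node collection = new Node(dir.getName(), true);
        File[] entries = dir.listFiles();
        if (entries != null) {
            for (File entry : entries) {
                collection.children.add(entry.isDirectory()
                        ? importDirectory(entry)              // nested subcollection
                        : new Node(entry.getName(), false));  // contained file
            }
        }
        return collection;
    }
}
```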
<p>These features could eventually make it much easier to stage data for input to a workflow run without requiring modification of the workflow specification itself.</p>

Bug #3573 (New): Support for importing file contents automatically using CollectionSource
https://projects.ecoinformatics.org/ecoinfo/issues/3573
2008-10-27T18:32:05Z, Timothy McPhillips (mcphillips@ecoinformatics.org)
<p>The CollectionComposer and CollectionReader actors extend CollectionSource to read XML representations of the input to a COMAD workflow and translate them into data tokens, metadata tokens, collection delimiters, etc. Presently all data read in by CollectionComposer must be contained in the XML that is provided either as a parameter value to CollectionComposer or as an external file to CollectionReader. However, many workflows use data from other files and this data currently must be read and parsed by explicit actors elsewhere in the workflow. The input to a workflow would be clearer, and workflows simpler and more transparent, if files could be referred to in the XML processed by CollectionSource, and if CollectionSource were to automatically include the contents of these files in the workflow input.</p>
<p>A simple first step would be to enable CollectionComposer to read in text files either as a TextFile collection containing a single StringToken holding the contents of the file, or as a TextFile collection containing one StringToken for each line of the text file. (Existing COMAD workflows demonstrate the usefulness of both approaches.)</p>
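<p>The two proposed import modes can be sketched as follows; nothing here reflects CollectionSource internals, and the String values merely stand in for the StringToken payloads the actor would actually emit:</p>

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Sketch of the two proposed modes for importing a text file; neither
// method exists in CollectionSource. Strings stand in for StringToken payloads.
public class TextFileImport {
    // Mode 1: the whole file becomes the payload of a single StringToken.
    static String wholeFile(Path file) throws IOException {
        return new String(Files.readAllBytes(file));
    }

    // Mode 2: each line of the file becomes the payload of its own StringToken.
    static List<String> oneTokenPerLine(Path file) throws IOException {
        return Files.readAllLines(file);
    }
}
```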
<p>A second step would be to allow one to register format-specific parsers for CollectionSource to use when reading particular types of files. For example, a FASTA file parser could be plugged in that would create a FASTA collection filled with (e.g., DNA) Sequence tokens, and a Nexus file parser could create a Nexus collection containing CharacterMatrix, WeightVector, and phylogenetic Tree tokens.</p>

Bug #3568 (New): support for writing COMAD-style trace files from the Provenance Recorder
https://projects.ecoinformatics.org/ecoinfo/issues/3568
2008-10-24T23:48:40Z, Timothy McPhillips (mcphillips@ecoinformatics.org)
<p>I have heard rumors that there are plans to enable the general-purpose Provenance Recorder in Kepler to (optionally) write out its records of a run using the trace file format employed by the COMAD framework. This would be extremely helpful because one could then view the provenance captured during any workflow run via the provenance browser.</p>
<p>I expect that this would also highlight information that cannot be stored in a COMAD trace or represented in the provenance browser, and so lead to enhancements that make both more generally useful.</p>

Bug #3566 (New): order collection contents displayed in provenance browser?
https://projects.ecoinformatics.org/ecoinfo/issues/3566
2008-10-24T20:04:56Z, Timothy McPhillips (mcphillips@ecoinformatics.org)
<p>When the data elements inside COMAD collections are displayed in the provenance browser they seem to be arbitrarily ordered, both in the dependency history view and the collection history view. It would be helpful to have these data and collection elements ordered according to their sequence in the data stream (as recorded in the trace file).</p>
<p>Consider an invocation of the Gblocks actor. It takes a collection of biological (e.g., protein) sequences and outputs another collection of sequences, where each output sequence corresponds to one input sequence (segments that are not reliably aligned across the input sequences are removed from the outputs). If the sequences represented in the dependency history were ordered, then clicking on the third input sequence and then the third output sequence in the dependency history view would give a before-and-after view of that sequence. As it is, it is difficult to find the corresponding sequences in the inputs and outputs.</p>
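<p>A sketch of the proposed ordering, assuming each displayed element carries the position at which it appears in the trace; the TraceElement class and its streamPosition field are hypothetical, not classes in the provenance browser:</p>

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: order displayed elements by their recorded position
// in the data stream rather than arbitrarily. TraceElement is illustrative
// and not part of the provenance browser.
class TraceElement {
    final String label;
    final int streamPosition; // position of the token in the trace's data stream

    TraceElement(String label, int streamPosition) {
        this.label = label;
        this.streamPosition = streamPosition;
    }

    // Sort a view's elements into stream order before displaying them.
    static List<TraceElement> inStreamOrder(List<TraceElement> displayed) {
        List<TraceElement> ordered = new ArrayList<>(displayed);
        ordered.sort(Comparator.comparingInt(e -> e.streamPosition));
        return ordered;
    }
}
```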
<p>The same holds for the collection history view. In addition, if the collections themselves were ordered according to trace order, then the incremental buildup of the collections when stepping through the actor invocations would more closely represent the construction of the collection stream during workflow execution.</p>

Bug #3560 (New): Color-code contents of CollectionDisplay
https://projects.ecoinformatics.org/ecoinfo/issues/3560
2008-10-24T00:53:33Z, Timothy McPhillips (mcphillips@ecoinformatics.org)
<p>The CollectionDisplay actor provides a live, XML-formatted view of the data stream arriving at the actor in a COMAD workflow. The window contents would be easier to understand if they were color-coded to distinguish Collections, Data elements, metadata/annotations, and provenance records, as suggested for the Trace File view of the provenance browser in bug 3555 (the color-coding should be the same for both).</p>

Bug #3559 (New): month field in trace file name is off by one
https://projects.ecoinformatics.org/ecoinfo/issues/3559
2008-10-24T00:38:31Z, Timothy McPhillips (mcphillips@ecoinformatics.org)
<p>The name of a new run trace file comprises the name of the workflow, a timestamp, and a number that distinguishes the trace in case multiple traces were created during the run, e.g., Clustal_20080923_172447_1.trace for a run of the Clustal workflow a few minutes ago. The month field of the date portion of the timestamp is off by one. (This symptom is consistent with using the zero-based month value returned by java.util.Calendar without adding one.)</p>

Bug #3558 (New): Store each workflow run trace in its own directory
https://projects.ecoinformatics.org/ecoinfo/issues/3558
2008-10-23T23:33:32Z, Timothy McPhillips (mcphillips@ecoinformatics.org)
<p>The ppod-gui module includes enhancements to the Kepler GUI for browsing and viewing the traces of previously run workflows. Each trace is a single, XML-formatted file, and the traces are organized in folders named according to the workflows run. These folders correspond to actual directories on the user's machine. Double-clicking on a trace opens it in the provenance browser.</p>
<p>I suggest that instead of putting all the traces of runs of a particular workflow in the same directory, we create a new directory for each run (inside the directory named for the workflow) and place the trace there. This would support: (a) including multiple traces output from a single run (possible with COMAD); (b) storing a copy of the workflow in the run directory so that the specification of the executed workflow is not lost; (c) using this directory as the default location for other files produced during the run (and possibly for temporary directories holding intermediate files useful for resuming an aborted run); (d) placing summary reports generated for the run there; (e) keeping copies of input data; etc.</p>
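<p>The per-run layout suggested above might be created along these lines; the directory names and the helper are illustrative assumptions, not Kepler's actual scheme:</p>

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch of the proposed layout:
//   <tracesRoot>/<workflowName>/<runId>/
// with the trace file, a copy of the workflow, reports, etc. stored inside.
// All names here are assumptions for illustration.
public class RunDirectories {
    static Path createRunDirectory(Path tracesRoot, String workflowName, String runId)
            throws IOException {
        Path runDir = tracesRoot.resolve(workflowName).resolve(runId);
        Files.createDirectories(runDir); // one fresh directory per run
        return runDir;
    }
}
```

<p>A run of the Clustal workflow might then keep everything under a path such as MyTraces/Clustal/20080923_172447_1/.</p>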
<p>Adding another level of directories would add to the work of navigating to the latest trace, but this would not be a problem if we automatically opened the trace in the provenance browser at the end of each run (as suggested in bug 3546), and if we provided an additional view of recent traces that hid this extra nesting.</p>

Bug #3557 (New): Provide data dependency graph view in provenance browser
https://projects.ecoinformatics.org/ecoinfo/issues/3557
2008-10-23T18:21:04Z, Timothy McPhillips (mcphillips@ecoinformatics.org)
<p>The provenance browser currently provides four views of a workflow run: the raw trace file, the "Collection History", an invocation dependency graph, and a "Dependency History" graph. The latter is a hybrid of the invocation dependencies and the data dependencies. What we don't have is a pure data dependency graph.</p>
<p>As an example, compare the data dependency graph (<a class="external" href="http://daks.ucdavis.edu/~sbowers/prov/pq1.gif">http://daks.ucdavis.edu/~sbowers/prov/pq1.gif</a>) for the COMAD implementation of the first provenance challenge (<a class="external" href="http://twiki.ipaw.info/bin/view/Challenge/DAKS">http://twiki.ipaw.info/bin/view/Challenge/DAKS</a>) with the dependency history view provided by the provenance browser (attached). I'd like to be able to view the first kind of graph in the provenance browser as well.</p>

Bug #3555 (New): enhancements to Trace File view in provenance browser
https://projects.ecoinformatics.org/ecoinfo/issues/3555
2008-10-23T00:25:07Z, Timothy McPhillips (mcphillips@ecoinformatics.org)
<p>The provenance browser allows one to view the raw trace file (in XML format) of the run currently loaded in the browser. It would be very nice if one could navigate the other provenance graphs by clicking on XML elements in this Trace File view, and have the details for the selected element appear in the left-hand panels, similar to what happens when one clicks on the data, collection, and invocation items in the Dependency History and Collection History views. Clicking on those graphical views could likewise highlight the corresponding lines in the Trace File view. (Clicking on a token, object, or invocation id in the XML might also take you to the referenced item.)</p>
<p>A second very helpful feature would be color-coding of the XML in the Trace File view: not color-coding to emphasize the XML syntax, but rather to highlight which XML elements comprise Collections, Data elements, metadata/annotations, and provenance records.</p>
<p>I think these enhancements would make it easier to make sense of the trace and the provenance information stored in it.</p>

Bug #3552 (In Progress): Annotation elements in trace file do not appear in details pane of provenance browser
https://projects.ecoinformatics.org/ecoinfo/issues/3552
2008-10-22T20:07:51Z, Timothy McPhillips (mcphillips@ecoinformatics.org)
<p>The provenance browser shows the details for the selected data or collection element of a trace in the lower left-hand panel. When the element selected has been annotated with one or more Metadata elements, these appear as name-value pairs under the heading "Annotations" in that panel. However, if an <strong>Annotation</strong> element has been applied to the selected element, it is not displayed.</p>
<p>Annotation and Metadata elements should both appear in the details panel, probably under two distinct headings, 'Metadata' and 'Annotations'.</p>
<p>(Note that the distinction between metadata and annotations in COMAD is that the former are reserved for things that have always been true of the items they are associated with, while the latter can be used for any purpose. Consequently, Metadata elements cannot be deleted or replaced during a COMAD workflow run, while Annotation elements can be.)</p>

Bug #3546 (New): Automatically load trace for a completed run into the provenance browser
https://projects.ecoinformatics.org/ecoinfo/issues/3546
2008-10-22T17:40:53Z, Timothy McPhillips (mcphillips@ecoinformatics.org)
<p>At present, viewing the trace of a workflow run via the provenance browser (the one in the provenance-apps module) requires either running the provenance browser from the command line or navigating to the trace for the latest workflow run in the MyTraces subtree in the Traces panel of the Workspace pane in Kepler. I almost always want to see the trace immediately on running a workflow.</p>
<p>Could we provide an option to load the trace of the current run automatically when it completes?</p>

Bug #2251 (In Progress): Need to document how to use Kepler for phylogenetics
https://projects.ecoinformatics.org/ecoinfo/issues/2251
2005-11-07T02:23:51Z, Timothy McPhillips (mcphillips@ecoinformatics.org)
<p>We need to write documentation for users of the phylogenetics actors in Kepler.<br /> This documentation should include:</p>
<p>1. Instructions for installing programs wrapped by phylogenetics actors and<br />configuring Kepler to use these programs.</p>
<p>2. Sample workflows and data sets demonstrating the capabilities of the<br />phylogenetics actors.</p>
<p>3. Tutorials using the sample workflows as examples.</p>

Bug #2250 (In Progress): Need documentation for collection-oriented workflow approach
https://projects.ecoinformatics.org/ecoinfo/issues/2250
2005-11-07T02:11:42Z, Timothy McPhillips (mcphillips@ecoinformatics.org)
<p>We need to write documentation describing:</p>
<p>1. How the framework supporting collection-oriented workflows works.<br />2. How to compose, configure, and run collection-oriented workflows.<br />3. How to write collection-oriented actors.</p>

Bug #2249 (In Progress): Need to support a useful subset of PHYLIP (PHYlogeny Inference Package)
https://projects.ecoinformatics.org/ecoinfo/issues/2249
2005-11-07T01:59:59Z, Timothy McPhillips (mcphillips@ecoinformatics.org)
<p>At present, actors wrap the following PHYLIP programs (but do not yet support<br />all features of these programs):</p>
<pre><code>consense, dnaml, dnamlk, dnapars, dnapenny, drawgram, pars, penny, proml, promlk, protpars</code></pre>
<p>At the very least, the following programs need to be wrapped by new actors and<br />their basic capabilities supported:</p>
<pre><code>dnadist, protdist, seqboot, fitch, kitsch, neighbor, factor, drawtree</code></pre>