Kepler: Issues (Ecoinformatics Redmine)
https://projects.ecoinformatics.org/ecoinfo/
Last updated: 2010-11-30T22:50:17Z
Bug #5249 (In Progress): test kepler for memory leaks
https://projects.ecoinformatics.org/ecoinfo/issues/5249
2010-11-30T22:50:17Z, jianwu jianwu (jianwu@sdsc.edu)
A separate bug covering only the memory-leak fixes for the Kepler suite. Bug #5095 depends on it.

Bug #5095 (In Progress): test kepler and wrp for memory leaks
https://projects.ecoinformatics.org/ecoinfo/issues/5095
2010-07-14T22:56:35Z, Matt Jones (jones@nceas.ucsb.edu)
Oliver Soong reported difficulties with memory leaks. There are two specific bugs about this, which I have set to block this testing bug. Testing may also reveal further leaks, which should be fixed before 2.1 is released. Here's Oliver's synopsis of the issues:
I think this is limited to the wrp suite, but Kepler’s performance degrades significantly over time. Provenance recording can become prohibitively slow, and there is no native in-Kepler fix. There is a large memory leak somewhere, and many components are quite memory-intensive regardless. Given the intention to record executions and the large number of analyses scientists perform, I suspect any dedicated user of Kepler will quickly encounter data management problems. In my case, I stopped using local repositories and began closing Kepler after running any large workflows.

Bug #4310 (New): ValueListeners receive valueChanged events when values have not changed
https://projects.ecoinformatics.org/ecoinfo/issues/4310
2009-08-13T18:34:29Z, Daniel Crawl (danielcrawl@gmail.com)
A ValueListener sometimes receives events for a Settable when the Settable's value has not changed. This can lead to a stack overflow, since reading the value of the Settable may generate another valueChanged event.
To fix this, valueChanged should not be called unless the value has actually changed.
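A minimal sketch of the proposed guard, using invented stand-in classes rather than Ptolemy's actual Settable and ValueListener interfaces: remember the current value and notify listeners only when it actually changes.

```java
import java.util.List;
import java.util.Objects;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for a Settable; illustrates only the proposed guard.
class GuardedSettable {

    interface ValueListener {
        void valueChanged(GuardedSettable source);
    }

    private String value;
    private final List<ValueListener> listeners = new CopyOnWriteArrayList<>();

    void addValueListener(ValueListener listener) {
        listeners.add(listener);
    }

    void setExpression(String newValue) {
        // Guard: skip notification when the value is unchanged. This breaks
        // the read-value -> valueChanged -> read-value recursion that can
        // otherwise overflow the stack.
        if (Objects.equals(value, newValue)) {
            return;
        }
        value = newValue;
        for (ValueListener listener : listeners) {
            listener.valueChanged(this);
        }
    }
}
```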
Bug #4300 (New): "Animate at Runtime" checkbox stays checked when director is replaced
https://projects.ecoinformatics.org/ecoinfo/issues/4300
2009-08-07T23:27:12Z, Timothy McPhillips (mcphillips@ecoinformatics.org)

If you enable run-time animation of a workflow and then swap in a different director, the "Animate at Runtime" menu item remains checked. However, the next run of the workflow will not be animated; apparently the newly inserted director does not know about the animation setting.

Bug #4095 (New): Java Package Ontology (e.g. to facilitate Ptolemy component access)
https://projects.ecoinformatics.org/ecoinfo/issues/4095
2009-05-20T23:35:10Z, Bertram Ludaescher (ludaesch@ucdavis.edu)
Ptolemy has a number of neat components that are currently very hard to get to. For example, MonitorReceiverContents under ptolemy.vergil.actor.lib is nice for certain runtime monitoring and "debugging" purposes (thanks Edward! :)
If we created a virtual "Java Package Ontology" (JPO), i.e., one that simply treats package containment as concept subsumption, then this would allow us to browse and search (!) any and all otherwise unannotated components (Ptolemy or Kepler) easily.
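As a sketch of the "virtual" part: package containment can be read directly off fully qualified class names, so the ontology never needs to be authored by hand. The class and method names below are illustrative, not Kepler API.

```java
import java.util.Map;
import java.util.TreeMap;

// Derive a subsumption hierarchy purely from package containment.
public class PackageOntologySketch {
    public static void main(String[] args) {
        Map<String, String> subsumedBy = new TreeMap<>();
        String fqcn = "ptolemy.vergil.actor.lib.MonitorReceiverContents";

        // Each package is subsumed by its parent package, and the class
        // itself by its enclosing package.
        String child = fqcn;
        int dot;
        while ((dot = child.lastIndexOf('.')) >= 0) {
            String parent = child.substring(0, dot);
            subsumedBy.put(child, parent);
            child = parent;
        }
        subsumedBy.forEach((c, p) -> System.out.println(c + " isa " + p));
    }
}
```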
It seems this would make the good (bad) old habit of InstantiateComponent / InstantiateAttribute superfluous.
Bertram

Bug #4046 (New): ComadTest should report more details when detecting an error
https://projects.ecoinformatics.org/ecoinfo/issues/4046
2009-05-01T20:05:58Z, Timothy McPhillips (mcphillips@ecoinformatics.org)
The ComadTest actor is used to create automated tests of COMAD features. Because it is often useful to include several instances of ComadTest in the same test workflow, the ComadTest actor should report its name when it throws an exception. Ideally it would also indicate how the data stream it received during the current workflow run differs from what it received during training, perhaps giving the element name and line number of the first mismatch in the trace.
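A sketch of what the richer diagnostic could look like. The fields (actor name, element name, trace line) come from the wish above; the exception class itself is invented, not existing ComadTest code.

```java
// Hypothetical exception carrying the details requested above.
class ComadTestMismatchException extends RuntimeException {
    ComadTestMismatchException(String actorFullName, String elementName,
                               int traceLine, String expected, String actual) {
        super(String.format(
                "%s: data stream diverged from training trace at element '%s' "
                        + "(trace line %d): expected %s but received %s",
                actorFullName, elementName, traceLine, expected, actual));
    }
}
```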
Bug #3915 (New): The error dialogue won't go away
https://projects.ecoinformatics.org/ecoinfo/issues/3915
2009-03-24T00:11:16Z, jianwu jianwu (jianwu@sdsc.edu)

Workflow: there are two composite actors, CompositeActor1 at the top level and CompositeActor2 inside CompositeActor1. There are two string parameters: p1 at the top level, and p2, with value '$p1/l', in CompositeActor1. p2 is used by actors in CompositeActor1, such as Expression and file-open actors.
Steps:
1) Open the whole workflow.
2) Open CompositeActor1.
3) Open CompositeActor2.
4) Close CompositeActor2.
5) Delete CompositeActor2.
6) Change the value of p1.
An error dialogue appears saying: "Error evaluating expression: $p1/l in .CompositeActor2.p2 Because The ID p1 is undefined."

There is no way to close the error except force-quitting Kepler, which loses all unsaved modifications.

I found the bug with Kepler version 16865 and Ptolemy version 52661, but I think this bug has always been present.
I attached the workflow and the error dialogue.

Bug #3671 (New): Configurable workspace directory for holding workflows, data, and run products
https://projects.ecoinformatics.org/ecoinfo/issues/3671
2008-11-13T19:04:28Z, Timothy McPhillips (mcphillips@ecoinformatics.org)
In bug 3558 I requested that a new directory be created on the user's system for each workflow run, and that outputs of the run, trace files, etc., be placed there. In bug 3585 I asked for an API that would make it easy for actors to write output files to this 'run' directory.
But where should these run directories themselves go? I believe we should allow users to specify a directory, in a location of their choosing, to hold their 'workspace'. The workspace could contain a directory holding the workflows they develop and use for a particular project (we've done this before in the Kepler/ppod release, but the directory location was fixed), another directory holding workflow runs, and so on.
One alternative would be to hide all this somewhere inside .kepler in the user's home directory. However, I don't think this is the best approach, for two reasons. First, the point is to make it easy for users to find their workflows, data, and workflow run products, and to load the latter into other tools for visualization and further analysis. The .kepler directory is hidden and should be used for things that would distract the user if made more prominent. Second, in practice the .kepler directory is frequently deleted (when installing a new version of Kepler, for example). A user's work should not be deleted at such times, so .kepler should be used only for things that can be regenerated as needed (e.g., data caches).
Another alternative would be to store everything discussed here in a database. However, many workflows generate large numbers of large data files that would be awkward to place in a database, and users often want immediate file-system access to these output files anyway, because the other tools they use to review and further analyze their results expect the data to be stored in files. In such cases there shouldn't be an extra step of exporting workflow run products from a database to a directory of files after each workflow run.
I also think users should have the option of creating multiple workspaces, each with its own directories of workflows and runs. A workspace browser in Kepler could make it easy to view workflows or runs from a particular workspace, or all of them at once.
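A sketch of per-run directory creation under a user-configurable workspace root. The property name "kepler.workspace" and the workflows/runs layout are assumptions, not an existing Kepler convention.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class WorkspaceSketch {
    // Creates <workspace>/runs/<workflow>_<timestamp> and returns it.
    static Path createRunDirectory(String workflowName) throws IOException {
        String root = System.getProperty("kepler.workspace",
                System.getProperty("user.home") + "/KeplerWorkspace");
        String stamp = LocalDateTime.now()
                .format(DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss"));
        Path runDir = Paths.get(root, "runs", workflowName + "_" + stamp);
        Files.createDirectories(runDir);                    // also creates the root
        Files.createDirectories(Paths.get(root, "workflows"));
        return runDir;
    }
}
```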
Note that all this has ramifications for distributed execution. Following execution on multiple nodes, the files expected to be found in a local run directory will need to be copied back automatically from each compute node.

Bug #3585 (New): Provide API for creating output files during a workflow run
https://projects.ecoinformatics.org/ecoinfo/issues/3585
2008-10-30T18:47:57Z, Timothy McPhillips (mcphillips@ecoinformatics.org)
I suggested in bug 3558 that Kepler should create a directory on the user's system for each workflow run and store the trace(s) of the run there, along with other files produced during the run.
To make this facility easy for actor developers to use, we could provide an API for creating output files (e.g., graphics files containing plots of data) in this location (or copying them there from temporary directories elsewhere) and for ensuring that such files are named uniquely within that directory.
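One possible shape for such an API, with all names hypothetical: the actor is handed a run-scoped object that hands out uniquely named files inside the run directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical run-scoped output API; none of these names exist in Kepler.
class RunOutputDirectory {
    private final Path runDir;

    RunOutputDirectory(Path runDir) {
        this.runDir = runDir;
    }

    // Returns runDir/name, or runDir/name.1, name.2, ... if already taken,
    // so several actors can write outputs without clobbering each other.
    synchronized Path newOutputFile(String name) throws IOException {
        Path candidate = runDir.resolve(name);
        for (int i = 1; Files.exists(candidate); i++) {
            candidate = runDir.resolve(name + "." + i);
        }
        return Files.createFile(candidate);
    }

    // Copies a file produced in a temporary directory into the run directory.
    Path adoptFile(Path tempFile) throws IOException {
        Path target = newOutputFile(tempFile.getFileName().toString());
        return Files.copy(tempFile, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```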
Note that while the default implementation could leave these run directories on the user's machine, an alternative implementation could transfer data files created during a run from these directories to some other data store, load the trace files into a DBMS to make them queryable, and so on.

Bug #3576 (New): support for accessing cascading metadata from within CompositeCoactor
https://projects.ecoinformatics.org/ecoinfo/issues/3576
2008-10-27T22:04:24Z, Timothy McPhillips (mcphillips@ecoinformatics.org)
The CompositeCoactor class extends TypedCompositeActor (and implements Coactor) to provide a mechanism for implementing coactors as SDF sub-workflows of conventional actors. Data is extracted from the read scope using input ports named according to the types of data to be extracted: e.g., a port named 'StringToken' will extract a single string token from the current read scope and provide it as input to the subworkflow on each firing; a port named 'StringToken+' will provide an array token containing one or more string tokens extracted from the read scope on each firing; and so on.
Currently, metadata or annotations applied to the top-level collection in a scope match can also be extracted by specifying the key of the required metadata element (e.g., by naming a port 'StringToken [key=filename]'). What cannot be done is to access metadata applied to collections above the read scope that cascades down to it. That capability would be very useful for reusing information across multiple invocations of a composite coactor.
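The port-naming convention above lends itself to a small parser. A sketch, assuming an invented ExtractionSpec type; the regular expression covers the three forms 'StringToken', 'StringToken+', and 'StringToken [key=filename]'.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PortNameParser {
    // token type, optional '+' for array extraction, optional metadata key
    private static final Pattern PORT_NAME =
            Pattern.compile("(\\w+)(\\+?)(?:\\s*\\[key=([^\\]]+)\\])?");

    record ExtractionSpec(String tokenType, boolean asArray, String metadataKey) {}

    static ExtractionSpec parse(String portName) {
        Matcher m = PORT_NAME.matcher(portName.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized port name: " + portName);
        }
        return new ExtractionSpec(m.group(1), "+".equals(m.group(2)), m.group(3));
    }

    public static void main(String[] args) {
        // ExtractionSpec[tokenType=StringToken, asArray=true, metadataKey=filename]
        System.out.println(parse("StringToken+ [key=filename]"));
    }
}
```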
Bug #3574 (New): Support for importing directory contents using CollectionSource
https://projects.ecoinformatics.org/ecoinfo/issues/3574
2008-10-27T18:49:37Z, Timothy McPhillips (mcphillips@ecoinformatics.org)

A common workflow pattern is to take as input all of the files (or those of a particular type) in a directory on a researcher's computer. For example, there are COMAD workflows that process all the FASTA files in a directory, creating a collection for each FASTA file and storing the contained DNA or protein sequences in the corresponding input collections.
Once the CollectionSource actor is able to import the contents of files automatically (see bug 3573), it will be extremely useful to be able to refer to directories in the XML input to CollectionReader or CollectionComposer and have the actor import all of the files it finds there. Another useful feature would be the option of having CollectionSource descend into subdirectories, creating a nested collection for each and importing contained files into the corresponding subcollections, as sketched below. Whole directories of scientific data files could then easily serve as input to COMAD workflows.
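A sketch of the recursive descent, with an invented Emitter interface standing in for however CollectionSource actually emits collection delimiters and data tokens.

```java
import java.io.File;
import java.io.FileFilter;

public class DirectoryImportSketch {

    // Stand-in for the token-emission side of CollectionSource.
    interface Emitter {
        void openCollection(String name);
        void addFile(File file);
        void closeCollection();
    }

    static void importDirectory(File dir, FileFilter filter, Emitter out) {
        out.openCollection(dir.getName());   // one nested collection per directory
        File[] entries = dir.listFiles();
        if (entries != null) {
            for (File entry : entries) {
                if (entry.isDirectory()) {
                    importDirectory(entry, filter, out);     // descend
                } else if (filter == null || filter.accept(entry)) {
                    out.addFile(entry);                      // e.g. only *.fasta
                }
            }
        }
        out.closeCollection();
    }
}
```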
These features eventually could make it much easier to stage data for input to a workflow run without requiring modification of the workflow specification itself.

Bug #3573 (New): Support for importing file contents automatically using CollectionSource
https://projects.ecoinformatics.org/ecoinfo/issues/3573
2008-10-27T18:32:05Z, Timothy McPhillips (mcphillips@ecoinformatics.org)
The CollectionComposer and CollectionReader actors extend CollectionSource to read XML representations of the input to a COMAD workflow and translate them into data tokens, metadata tokens, collection delimiters, etc. Presently, all data read in by CollectionComposer must be contained in the XML that is provided either as a parameter value to CollectionComposer or as an external file to CollectionReader. However, many workflows use data from other files, and this data currently must be read and parsed by explicit actors elsewhere in the workflow. The input to a workflow would be clearer, and workflows simpler and more transparent, if files could be referred to in the XML processed by CollectionSource, and if CollectionSource automatically included the contents of these files in the workflow input.
A simple first step would be to enable CollectionComposer to read in text files either as a TextFile collection containing a single StringToken holding the contents of the file, or as a TextFile collection containing one StringToken for each line of the text file. (Existing COMAD workflows demonstrate the usefulness of both approaches.)
A second step would be to allow one to register format-specific parsers for CollectionSource to use when reading particular types of files. For example, a FASTA file parser could be plugged in that would create a FASTA collection filled with (e.g., DNA) Sequence tokens, and a Nexus file parser could create a Nexus collection containing CharacterMatrix, WeightVector, and phylogenetic Tree tokens.
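A sketch of the registry idea, assuming parsers are keyed by file extension and an invented FileParser interface; for brevity the "tokens" here are plain strings, with one-string-per-line as the default text behavior described above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ParserRegistrySketch {

    interface FileParser {
        List<String> parse(Path file) throws IOException;   // tokens the file expands into
    }

    private final Map<String, FileParser> parsersByExtension = new HashMap<>();

    void register(String extension, FileParser parser) {
        parsersByExtension.put(extension.toLowerCase(), parser);
    }

    List<String> expand(Path file) throws IOException {
        String name = file.getFileName().toString();
        String ext = name.substring(name.lastIndexOf('.') + 1).toLowerCase();
        // Default: treat the file as text, one token per line.
        FileParser parser = parsersByExtension.getOrDefault(ext, Files::readAllLines);
        return parser.parse(file);
    }
}
```

Registering a FASTA or Nexus parser would then be a single register(...) call.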
Bug #3568 (New): support for writing COMAD-style trace files from the Provenance Recorder
https://projects.ecoinformatics.org/ecoinfo/issues/3568
2008-10-24T23:48:40Z, Timothy McPhillips (mcphillips@ecoinformatics.org)

I have heard rumors that there are plans to enable the general-purpose Provenance Recorder in Kepler to (optionally) write out its records of a run using the trace file format employed by the COMAD framework. This would be extremely helpful, because one could then view the provenance captured during any workflow run in the provenance browser.
I expect that this would also highlight information that cannot be stored in a COMAD trace or represented in the provenance browser, and so lead to enhancements that make both more generally useful.

Bug #3566 (New): order collection contents displayed in provenance browser?
https://projects.ecoinformatics.org/ecoinfo/issues/3566
2008-10-24T20:04:56Z, Timothy McPhillips (mcphillips@ecoinformatics.org)
When the data elements inside COMAD collections are displayed in the provenance browser, they seem to be arbitrarily ordered, both in the dependency history view and in the collection history view. It would be helpful to have these data and collection elements ordered according to their sequence in the data stream (as recorded in the trace file).
Consider an invocation of the Gblocks actor. It takes a collection of biological (e.g., protein) sequences and outputs another collection of sequences, where each output sequence corresponds to one input (segments that are not reliably aligned across the input sequences having been removed from the outputs). If the sequences represented in the dependency history were ordered, then clicking on the third input protein sequence and then on the third output sequence in the dependency history view would give a before-and-after view of that sequence. As it is, it is difficult to find the corresponding sequences in the inputs and outputs.
The same holds for the collection history view. In addition, if the collections themselves were ordered according to trace order, then the incremental buildup of the collections when stepping through the actor invocations would more closely represent the construction of the collection stream during workflow execution.
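If the trace records a sequence number for each element (an assumption about the trace format), the display fix reduces to a sort before rendering. A minimal sketch:

```java
import java.util.Comparator;
import java.util.List;

public class TraceOrderSketch {
    // Hypothetical view-model element; sequenceNumber is its position
    // in the original data stream as recorded in the trace.
    record TraceElement(String label, long sequenceNumber) {}

    static void sortForDisplay(List<TraceElement> elements) {
        elements.sort(Comparator.comparingLong(TraceElement::sequenceNumber));
    }
}
```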
Bug #3560 (New): Color-code contents of CollectionDisplay
https://projects.ecoinformatics.org/ecoinfo/issues/3560
2008-10-24T00:53:33Z, Timothy McPhillips (mcphillips@ecoinformatics.org)

The CollectionDisplay actor provides a live, XML-formatted view of the data stream arriving at the actor in a COMAD workflow. The window contents would be easier to understand if they were color-coded to distinguish collections, data elements, metadata/annotations, and provenance records, as suggested for the Trace File view of the provenance browser in bug 3555 (the color-coding should be the same for both).
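A sketch of the shared-palette idea: the element kinds match the four categories above, while the specific colors are only placeholders.

```java
import java.awt.Color;
import java.util.EnumMap;
import java.util.Map;

// One palette shared by CollectionDisplay and the provenance browser's
// Trace File view, so the color-coding stays the same in both.
public class TokenPalette {
    enum Kind { COLLECTION, DATA, METADATA, PROVENANCE }

    static final Map<Kind, Color> COLORS = new EnumMap<>(Kind.class);
    static {
        COLORS.put(Kind.COLLECTION, new Color(0, 90, 160));   // blue
        COLORS.put(Kind.DATA,       Color.BLACK);
        COLORS.put(Kind.METADATA,   new Color(0, 128, 0));    // green
        COLORS.put(Kind.PROVENANCE, Color.GRAY);
    }
}
```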