Kepler: Issues (Ecoinformatics Redmine)
https://projects.ecoinformatics.org/ecoinfo/
Bug #5640 (New): associate timezones with all timestamps recorded in provenance tables
https://projects.ecoinformatics.org/ecoinfo/issues/5640
2012-07-18T20:12:02Z, Derik Barseghian (barseghian@nceas.ucsb.edu)
<p>Right now provenance records timestamps in local time without recording the timezone. This is lossy. One problem scenario: a user runs a workflow on their laptop in one timezone, moves to another timezone, and exports the run. The exported run now has the wrong timestamp recorded. Relatedly, WRM assumes the local timezone and, during run export, <strong>adds</strong> the local timezone to the recorded run (separate bug).</p>
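<p>To illustrate the fix: a timestamp stored with its UTC offset (ISO 8601) stays unambiguous no matter where the run is later exported. A minimal sketch using the standard java.time API; the helper name and zone choices are assumptions for illustration, not existing Kepler code:</p>

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class ProvenanceTimestamp {
    // Hypothetical helper (not Kepler code): render an instant as an
    // ISO 8601 string that carries its UTC offset, so the record stays
    // unambiguous even if the machine later moves to another timezone.
    static String toIsoWithOffset(Instant instant, ZoneId zone) {
        return ZonedDateTime.ofInstant(instant, zone)
                .format(DateTimeFormatter.ISO_OFFSET_DATE_TIME);
    }

    public static void main(String[] args) {
        Instant t = Instant.parse("2012-07-18T20:12:02Z");
        // Same instant recorded two ways; either converts back exactly.
        System.out.println(toIsoWithOffset(t, ZoneId.of("America/Los_Angeles")));
        // -> 2012-07-18T13:12:02-07:00
        System.out.println(toIsoWithOffset(t, ZoneId.of("UTC")));
        // -> 2012-07-18T20:12:02Z
    }
}
```

<p>Storing the offset (or, equivalently, storing UTC plus a zone column) also makes the schema-upgrade guess reversible: a timestamp later found to be mislabeled can be corrected without ambiguity.</p>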
<p>Part of this bug is also to deal with a user's existing timestamps. While it's not safe to assume all of their existing timestamps are in the local timezone, that's the best guess we can make, short of giving the user a way to change them. The user should at least be made aware that this is what will happen during the provenance schema upgrade.</p>
Bug #5429 (New): improve default provenance store performance
https://projects.ecoinformatics.org/ecoinfo/issues/5429
2011-06-24T20:13:24Z, Derik Barseghian (barseghian@nceas.ucsb.edu)
<p>Currently there can be some big performance penalties when using Kepler with provenance turned on (by default using HSQLDB). It would be great to improve these.</p>
<p>Unless noted, references to workflow execution times below refer to the REAP GDD workflow set to process 200 days of data:<br /><a class="external" href="https://code.ecoinformatics.org/code/reap/trunk/usecases/terrestrial/workflows/derivedMETProducts/growingDegreeDays.kar">https://code.ecoinformatics.org/code/reap/trunk/usecases/terrestrial/workflows/derivedMETProducts/growingDegreeDays.kar</a></p>
<p>I see/saw a few issues:</p>
<p>-1) at one point I mentioned Kepler shutdown was taking a very long time. This is no longer an issue; shutdown now seems near-instant.</p>
<p>0) the pre-initialize stage of workflow execution can take a very long time (e.g. up to 15 minutes) and grows longer with each subsequent execution when running against a large provenance store.<br />Dan has fixed this issue, I believe with r27746. Pre-init is now near-instant or takes just a few seconds.</p>
<p>1) execution of the workflow with provenance off takes a few seconds. With provenance on, it takes about 4 minutes to run the first time, with an empty provenance store.</p>
<p>2) subsequent executions of the same workflow take longer to run.<br />E.g., here are the execution times of 9 runs of the workflow on 2 different machines:<br />10.6 MacBook, 2.2 GHz Intel Core 2 Duo, 4 GB RAM:<br />4:01, 4:03, 3:57, 7:43, 8:07, 8:01, 8:33, 8:10, 8:33<br />Ubuntu 10.04, dual 3 GHz, 2 GB RAM:<br />4:03, 4:13, 4:32, 9:13, 12:32, 8:08, 9:54, 9:06, 11:53</p>
<p>3) startup can take a very long time when the prior Kepler invocation ran data/token-intensive workflows. I believe what's happening is that HSQLDB is incorporating the changes from the .log file into the .data file; I think something's happening with the .backup file too. The .data file slowly grows very large (by a lot more than 200mb), then the .log file drops to near 0, and finally the .data file shrinks back down, though to a size larger than where it started. With the default .log file max size of 200mb, startup can take on the order of 10 to 20 minutes. I've tested with a variety of .log file sizes. Making it dramatically smaller, e.g. 5mb, dramatically improves startup time, but comes at a huge workflow-execution-time penalty (~20 minutes to run the workflow), so this is an unacceptable fix. The execution penalty starts appearing when the .log file max size is set below about 100mb, and with a 100mb .log file, startup is still very slow.</p>
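<p>For reference, both HSQLDB knobs discussed in this report (the .log file size threshold that triggers the checkpoint merge, and the memory cache exponent) can be set as SQL statements against the embedded store. A hedged sketch using HSQLDB 1.8-era syntax; values are examples only, and whether cache_scale takes effect before the next database restart is version-dependent:</p>

```sql
-- Cap the .log file at 100 MB; when it exceeds this, HSQLDB checkpoints,
-- merging logged changes into the .data file (the slow-startup work above).
SET LOGSIZE 100;

-- Raise the memory cache exponent: up to 3 * 2^18 = 786432 cached-table
-- rows in memory instead of the default 3 * 2^14 = 49152. May only take
-- effect the next time the database is opened.
SET PROPERTY "hsqldb.cache_scale" 18;

-- Force a checkpoint now, so the merge cost is paid at a chosen moment
-- rather than at the next Kepler startup.
CHECKPOINT;
```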
<p>One thing I've found that improves execution-time performance is increasing the 'memory cache exponent' setting (hsqldb.cache_scale) from the default of 14 to the max of 18. This setting "Indicates the maximum number of rows of cached tables that are held in memory, calculated as 3*(2^value) (three multiplied by two to the power value). The default results in up to 3*16384 rows from all cached tables being held in memory at any time."<br />With a 200mb log file max size and cache_scale=18, the first run of the workflow takes about 2:17.</p>
Bug #4764 (New): ProvenanceRecorder.changeExecuted slow after workflow run
https://projects.ecoinformatics.org/ecoinfo/issues/4764
2010-02-06T02:19:48Z, Oliver Soong (soong@nceas.ucsb.edu)
<p>If I run any of the tpc workflows (e.g., tpc09), any subsequent change to Kepler (say, changing workflow parameters) causes Java to peg one of my CPU cores. This includes canceling changes to RExpression. I've seen this behavior on Windows XP and 7. While I haven't seen it under Linux or OS X, I haven't tested those as extensively. I have tried small test workflows and haven't seen a particularly noticeable slowdown, so it may be related to the size of the workflow run. I have to restart Kepler to get things back up to speed, and it's bad enough that I'm actually restarting Kepler after every run.</p>
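<p>One way to check whether the pegged core is garbage collection (rather than, say, provenance queries) is to sample the JVM's own GC counters before and after making a change. A sketch using the standard java.lang.management API; this is diagnostic code for reproducing the report, not part of Kepler:</p>

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcProbe {
    // Print cumulative GC counts and times for the running JVM; sampling
    // this before and after a workflow edit shows whether the CPU spin
    // is garbage collection or something else.
    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        Runtime rt = Runtime.getRuntime();
        System.out.printf("heap used: %d of %d bytes%n",
                rt.totalMemory() - rt.freeMemory(), rt.totalMemory());
    }
}
```

<p>If the collection counts barely move while the core stays pegged, the spin is in application code, which matches the jstat observation below.</p>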
<p>I'm not sure it's a memory thing. java.exe is about maxed out on memory (~0.5 GB) in the Task Manager, but the Check System Settings window says I have 46% free. I was watching jstat, and changes don't seem to trigger a flurry of garbage collection.</p>
Bug #3668 (New): Define contents of "Publication Ready Archive"
https://projects.ecoinformatics.org/ecoinfo/issues/3668
2008-11-13T17:41:49Z, ben leinfelder (leinfelder@nceas.ucsb.edu)
<p>It's still unclear exactly what will be included (user-specified?) and how it will be structured.<br />From past discussion these seem likely:<br />-Original input data<br />-Intermediate data products<br />-Various workflow outputs (images, tables, graphs, charts)<br />-Final data results<br />-Workflow file (MoML)<br />-Defining parameters for included execution[s]<br />-Report instance (XML and/or PDF?)</p>
<p>Also, will the "PRA" contain more than one execution? I believe that was indicated in meetings.</p>
Bug #3652 (New): Ensure all Provenance Recording Types record the same things
https://projects.ecoinformatics.org/ecoinfo/issues/3652
2008-11-13T00:41:57Z, Derik Barseghian (barseghian@nceas.ucsb.edu)
<p>Recording Type SQL-SPA-v8 records files referenced by string tokens, as well as the workflow MoML. The other recording types do not, and this will cause a problem for a user who wants to generate a report containing figures from output files but is using a non-SQL-SPA-v8 provenance store.</p>
<p>A user wanting to export a workflow run to a Publication Ready Archive while using a non-SQL-SPA-v8 provenance store would have the same problem.</p>