Bug #4325

closed

Workflow Run Manager - deleted runs sometimes reappear after Kepler relaunch

Added by Derik Barseghian over 14 years ago. Updated over 14 years ago.

Status: Resolved
Priority: Normal
Category: workflow run manager
Target version:
Start date: 08/20/2009
Due date:
% Done: 0%
Estimated time:
Bugzilla-Id: 4325

Description

To reproduce:
- Launch Kepler
- Run a workflow twice.
- Export the 2nd run.
- Run the workflow again.
- Delete the third run from the WRM.
- Close, then relaunch Kepler. Run #3 is back.

The problem is related to exporting a run.

Actions #1

Updated by Derik Barseghian over 14 years ago

Also: if, instead of deleting the third run, you delete the second run (the one you exported) and then close and relaunch Kepler, you get a FileNotFoundException from CacheManager's getCacheObjectIterator. This is because the file is still listed in cacheContentTable. Watching the DB after a delete operation shows that the row never actually gets deleted.

Actions #2

Updated by Derik Barseghian over 14 years ago

If you:
- Launch Kepler
- Run a workflow twice.
- Export the 2nd run.
- Delete the 2nd run.
- Close and relaunch Kepler.

then the 2nd run is gone and there are no errors.

Actions #3

Updated by Derik Barseghian over 14 years ago

Now I'm unable to reproduce the error from the procedure in comment #2, even though I'd done it a few times. Maybe something more insidious is happening here...

Actions #4

Updated by Derik Barseghian over 14 years ago

One way to sometimes trigger this seems to be quitting Kepler immediately after clicking OK in the delete runs dialog. You needn't have exported any of the runs. The preparedStatements return how many rows they've "deleted", and the delete method always seems to complete before Kepler shuts down, yet when you connect to the database after Kepler has shut down (connecting to the db file instead of the server), the run rows still exist in workflow_exec.

Actions #5

Updated by Daniel Crawl over 14 years ago

HSQL delays writing to the file system after updates occur to the database. According to the docs, http://hsqldb.org/doc/guide/ch09.html#set_write_delay-section, the default delay is 20 seconds, but my hsqldb.script says it's 10 seconds. We should probably decrease this.

Actions #6

Updated by Derik Barseghian over 14 years ago

Dan and I discussed this, and it seems like a likely culprit. I'm going to try changing the write delay to see whether it fixes this bug, and whether (and how badly) it hurts performance. We figure a better solution is probably to do a clean shutdown of the server when Kepler quits, if it's the last Kepler instance running. To know if it's the last Kepler running, we might e.g. store and check, on shutdown, a numberKeplersRunning variable in the database.
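
A minimal sketch of the "shut the server down when the last Kepler quits" idea, for illustration only. Everything here is hypothetical and not taken from the Kepler code base: the class name LastInstanceShutdown, the InstanceCounter interface, and the in-memory counter that stands in for a numberKeplersRunning value stored in the database. The only HSQLDB-specific piece is the SHUTDOWN statement, which asks the engine to write out pending changes and close the database files.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: tracks running instances and stops the DB server
// when the last one exits. Not the actual Kepler implementation.
final class LastInstanceShutdown {

    // Abstraction over wherever numberKeplersRunning would live (assumed: a DB row).
    interface InstanceCounter {
        int increment();
        int decrement();
    }

    // In-memory stand-in so the sketch runs without a database.
    static final class InMemoryCounter implements InstanceCounter {
        private final AtomicInteger running = new AtomicInteger();
        public int increment() { return running.incrementAndGet(); }
        public int decrement() { return running.decrementAndGet(); }
    }

    private final InstanceCounter counter;
    private final Runnable stopServer;

    LastInstanceShutdown(InstanceCounter counter, Runnable stopServer) {
        this.counter = counter;
        this.stopServer = stopServer;
    }

    // Call once when an instance starts.
    void onStartup() {
        counter.increment();
    }

    // Call once when an instance quits. Returns true if this was the
    // last running instance and the server was asked to stop.
    boolean onShutdown() {
        if (counter.decrement() <= 0) {
            stopServer.run();
            return true;
        }
        return false;
    }

    // Issues HSQLDB's SHUTDOWN statement over an open JDBC connection.
    static void shutdownDatabase(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute("SHUTDOWN");
        }
    }
}
```

In a real setup the stopServer callback would presumably call shutdownDatabase on a live connection, and onShutdown could be registered via Runtime.getRuntime().addShutdownHook. One caveat with any stored counter: if an instance crashes without decrementing it, the count stays too high and the server is never stopped cleanly, so a short write delay would still be useful as a fallback.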

Actions #7

Updated by Derik Barseghian over 14 years ago

Decreased write_delay to 100ms in r20931. Needs more testing to check for any performance hit, but so far so good. I can't get runs to reappear no matter how fast I quit after doing a delete (good deal). If this does impact performance, I suspect we still want to make write_delay much shorter than what it was (10s), so that the db loses less if Kepler crashes.
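
For reference, a hedged sketch of how a write delay like this can be set over JDBC. This is not the change made in r20931, which may well have set the property elsewhere (for example in the database's script or properties files). The class and method names are made up for illustration. The statement text assumes the HSQLDB 1.8-style syntax described in the guide linked above, SET WRITE_DELAY <n> MILLIS; newer HSQLDB releases use a different statement (SET FILES WRITE DELAY), so check the docs for the version in use.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper, not part of Kepler.
final class WriteDelayConfig {

    // Builds the HSQLDB 1.8-style statement, e.g. "SET WRITE_DELAY 100 MILLIS".
    static String writeDelaySql(int millis) {
        if (millis < 0) {
            throw new IllegalArgumentException("millis must be >= 0");
        }
        return "SET WRITE_DELAY " + millis + " MILLIS";
    }

    // Applies the delay on an already-open connection to the database.
    static void applyWriteDelay(Connection conn, int millis) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute(writeDelaySql(millis));
        }
    }
}
```

Calling WriteDelayConfig.applyWriteDelay(conn, 100) on a connection to the provenance database would correspond to the 100ms value mentioned above.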

Actions #8

Updated by Derik Barseghian over 14 years ago

This is fixed, but we need to test to see if the fix degraded performance (so far it seems fine) before closing.

Dan, do you have any particular workflows you use for testing provenance performance?

Actions #9

Updated by Derik Barseghian over 14 years ago

No reports of degraded performance, going to close this.

Actions #10

Updated by Redmine Admin about 11 years ago

Original Bugzilla ID was 4325
