Bug #2895

Distributed Execution Tracking Bug

Added by Chad Berkley over 11 years ago. Updated over 9 years ago.

Status: New
Priority: Immediate
Assignee:
Category: core
Target version:
Start date: 07/23/2007
Due date:
% Done: 0%
Estimated time:
Bugzilla-Id: 2895

Description

Chad and Lucas are developing the distributed execution system for Kepler. The system currently works only in a very simplified form. This bug consolidates bug 1891 and bug 1899.

The following items need to be added:

  • Make sure that the JNI libraries can be accessed on the slave and that the ENM actors work there.
  • We might have to solve the problem that Kepler cannot run multiple instances under the same user account. The cache uses an embedded database that allows only one connection at a time. The database is stored in the .kepler directory, so if you run Kepler twice at the same time, the second instance fails with an error saying the database is already in use. On a cluster where the slaves share a single home directory, this will be a problem.
  • Matt suggested using the EcoGrid registry for node discovery.
  • Get this to run on the NCEAS ROCKS cluster.
  • We need to handle transferring support files to the slave(s). This includes indirect transfers between slaves: instead of sending results back to the master and then on to the next slave, the slaves should be able to transfer data directly to each other.
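One possible way around the single-connection cache problem is to give every running instance its own cache directory instead of one shared directory under .kepler. The sketch below assumes that approach. It is not how Kepler currently works, and the class name InstanceCacheDir, the base/host/instance-N layout and the .lock file are all hypothetical. Each instance walks the numbered slots for its host and takes the first one whose lock file it can lock exclusively with java.nio's FileChannel.tryLock.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Kepler): gives each running instance its
 * own cache directory so that instances sharing one home directory never
 * open the same single-connection embedded database.
 */
class InstanceCacheDir {
    // Locks held by this JVM, keyed by directory. They stay reachable for the
    // life of the process; the OS releases them when the process exits.
    private static final Map<Path, FileLock> HELD = new HashMap<>();

    /**
     * Returns the first directory base/host/instance-N (N < maxInstances)
     * whose lock file this process can lock exclusively.
     *
     * @throws IOException if every slot is already taken
     */
    public static synchronized Path claim(Path base, String host, int maxInstances)
            throws IOException {
        for (int i = 0; i < maxInstances; i++) {
            Path dir = base.resolve(host).resolve("instance-" + i)
                    .toAbsolutePath().normalize();
            // Never reopen a slot this JVM already holds: on some platforms,
            // closing a second channel to the file would drop the first lock.
            if (HELD.containsKey(dir)) {
                continue;
            }
            Files.createDirectories(dir);
            FileChannel ch = FileChannel.open(dir.resolve(".lock"),
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE);
            FileLock lock;
            try {
                lock = ch.tryLock(); // null if another process holds the lock
            } catch (IOException | RuntimeException e) {
                ch.close();
                throw e;
            }
            if (lock != null) {
                HELD.put(dir, lock);
                return dir; // point this instance's cache database here
            }
            ch.close(); // slot busy in another process; try the next one
        }
        throw new IOException("no free cache slot under " + base.resolve(host));
    }
}
```

Keying the directory by host name means instances on different cluster nodes never compete for the same slot, so the lock only has to work between processes on one node. This matters because locking on an NFS-mounted home directory depends on the NFS lock service, and that would need to be checked on the actual cluster. A caller might pass a directory such as .kepler/cache (a hypothetical location) as base and the local host name as host, then configure the cache database to live in the returned directory.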

History

#1 Updated by Redmine Admin over 5 years ago

Original Bugzilla ID was 2895

Also available in: Atom PDF