Bug #3073

Heartbeat function for slaves

Added by Chad Berkley over 11 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: distributed execution
Target version:
Start date: 01/16/2008
Due date:
% Done: 0%
Estimated time:
Bugzilla-Id: 3073

Description

When a slave registers itself with the EarthGrid registry, there needs to be a way to tell whether that slave still exists; if it does not, its registry entry needs to be removed. If we don't do this, we will quickly end up with a lot of dead registry entries, because Kepler often does not terminate normally.

The GUI should also indicate whether a slave is alive when the user chooses it.

History

#1 Updated by jianwu jianwu over 9 years ago

Either the EcoGrid registry or the Master should check the heartbeats from slaves. However, the EcoGrid services are meant to respond to outside invocations, not to run periodically on their own to check heartbeats. In theory, Masters can check the heartbeats of slaves. But it is better to remove unavailable slaves when a user opens the ‘Distributed Computing Options’ menu item on the Master side to check for available slaves. Otherwise, registered but unavailable slaves are confusing and useless to Masters.

So I modified the code for the ‘Distributed Computing Options’ menu item in version 23397. Now the availability of all registered slaves is checked, and slaves that are not available are de-registered and do not show up.
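
For readers following along, the check-on-open behaviour described above can be sketched roughly as below. This is only an illustration, not the Kepler code from that change: the dictionary standing in for the registry, the function name `prune_unavailable_slaves`, and the `is_alive` probe are all hypothetical, and the real implementation talks to the registry service rather than to an in-memory map.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for the registry: slave name -> endpoint URL.
Registry = Dict[str, str]


def prune_unavailable_slaves(
    registry: Registry, is_alive: Callable[[str], bool]
) -> List[str]:
    """Probe every registered slave and de-register the ones that do not answer.

    Returns the names of the slaves that are still available, in
    registration order. `is_alive` is an assumed probe supplied by the
    caller (for example, a short-timeout request to the slave endpoint).
    """
    available: List[str] = []
    # Iterate over a copy because entries are removed during the loop.
    for name, endpoint in list(registry.items()):
        try:
            alive = is_alive(endpoint)
        except Exception:
            # A probe that fails outright is treated as "slave unavailable".
            alive = False
        if alive:
            available.append(name)
        else:
            del registry[name]  # de-register the dead entry
    return available
```

In this sketch the pruning runs only when the caller asks for the slave list, which mirrors the choice made above: the Master cleans up on demand instead of the registry polling slaves on a timer.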

#2 Updated by Redmine Admin over 6 years ago

Original Bugzilla ID was 3073
