M Raynal and F Tronel 1999 Distrib. Syst. Engng. 6 95 doi:10.1088/0967-1846/6/3/301
M Raynal and F Tronel
A group membership failure (in short, a group failure) occurs when one of the group members crashes. A group failure detection protocol has to inform all the non-crashed members of the group that this group entity has crashed. Ideally, such a protocol should be live (if a process crashes, then the group failure has to be detected) and safe (if a group failure is claimed, then at least one process has crashed).
Unreliable asynchronous distributed systems are characterized by the impossibility for a process to get an accurate view of the system state. Consequently, the design of a group failure detection protocol that is both safe and live is a problem that cannot be solved in all runs of an asynchronous distributed system.
This paper analyses a group failure detection protocol whose design naturally ensures its liveness. We show that by appropriately tuning some of its duration-related parameters, the safety property can be guaranteed with a probability as close to one as desired. This analysis shows that, in real distributed systems, it is possible to achieve failure detection with a negligible probability of wrong suspicions.
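The trade-off between a duration parameter and the probability of a wrong suspicion can be illustrated with a minimal sketch. It is not the protocol analysed in the paper: it assumes a simple timeout-based detector and exponentially distributed message delays, and the names `wrong_suspicion_probability`, `timeout_for_target` and `mean_delay` are hypothetical, introduced only for this illustration.

```python
import math


def wrong_suspicion_probability(timeout: float, mean_delay: float) -> float:
    """Probability that a message from a live process arrives after `timeout`.

    Assumption (not from the paper): delays are exponentially distributed
    with mean `mean_delay`, so P(delay > timeout) = exp(-timeout / mean_delay).
    """
    return math.exp(-timeout / mean_delay)


def timeout_for_target(epsilon: float, mean_delay: float) -> float:
    """Smallest timeout whose wrong-suspicion probability is at most `epsilon`."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must lie strictly between 0 and 1")
    return -mean_delay * math.log(epsilon)


if __name__ == "__main__":
    mean_delay = 0.05  # seconds; a hypothetical value chosen for illustration
    for epsilon in (1e-3, 1e-6, 1e-9):
        timeout = timeout_for_target(epsilon, mean_delay)
        print(f"target {epsilon:.0e}: timeout of about {timeout:.3f} s")
```

Under this assumed delay model the required timeout grows only logarithmically as the target probability shrinks, which is one way to see why a wrong suspicion can be made very unlikely at a moderate cost in detection time.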
07.05.Bx Computer systems: hardware, operating systems, computer languages, and utilities
84.40.Ua Telecommunications: signal transmission and processing; communication satellites
Issue 3 (September 1999)
Received 6 September 1999