Fault Tolerant Scheduling in Distributed Networks
dc.contributor.author | Weissman, Jon B.
dc.contributor.author | Womack, David
dc.date.accessioned | 2023-10-19T15:21:30Z
dc.date.available | 2023-10-19T15:21:30Z
dc.date.issued | 1996-09-25
dc.description.abstract | We present a model for application-level fault tolerance for parallel applications. The objective is to achieve high reliability with minimal impact on the application. Our approach is based on full replication of all parallel application components in a distributed wide-area environment in which each replica is independently scheduled at a different site. A system architecture for coordinating the replicas is described. The fault tolerance mechanism is being added to a wide-area scheduler prototype in the Legion parallel processing system. A performance evaluation of the fault tolerant scheduler and a comparison to the traditional means of fault tolerance, checkpoint-recovery, are planned.
dc.description.department | Computer Science
dc.description.sponsorship | This work was partially funded by NSF ASC-9625000.
dc.identifier.uri | https://hdl.handle.net/20.500.12588/2110
dc.language.iso | en_US
dc.publisher | UTSA Department of Computer Science
dc.relation.ispartofseries | Technical Report; CS-96-10
dc.title | Fault Tolerant Scheduling in Distributed Networks
dc.type | Technical Report |