Fault Tolerant Scheduling in Distributed Networks

dc.contributor.author: Weissman, Jon B.
dc.contributor.author: Womack, David
dc.date.accessioned: 2023-10-19T15:21:30Z
dc.date.available: 2023-10-19T15:21:30Z
dc.date.issued: 1996-09-25
dc.description.abstract: We present a model for application-level fault tolerance for parallel applications. The objective is to achieve high reliability with minimal impact on the application. Our approach is based on a full replication of all parallel application components in a distributed wide-area environment in which each replica is independently scheduled in a different site. A system architecture for coordinating the replicas is described. The fault tolerance mechanism is being added to a wide-area scheduler prototype in the Legion parallel processing system. A performance evaluation of the fault tolerant scheduler and a comparison to the traditional means of fault tolerance, checkpoint-recovery, is planned.
dc.description.department: Computer Science
dc.description.sponsorship: This work was partially funded by NSF ASC-9625000.
dc.identifier.uri: https://hdl.handle.net/20.500.12588/2110
dc.language.iso: en_US
dc.publisher: UTSA Department of Computer Science
dc.relation.ispartofseries: Technical Report; CS-96-10
dc.title: Fault Tolerant Scheduling in Distributed Networks
dc.type: Technical Report

Files

Original bundle

Name: Weissman_Womack_CS-96-10.pdf
Size: 93.6 KB
Format: Adobe Portable Document Format
