Fault tolerant computer architecture

For many years, most computer architects have pursued one primary goal: performance. Architects have translated the ever-increasing abundance of ever-faster transistors provided by Moore's law into remarkable increases in performance. Recently, however, the bounty provided by Moore's law h...

Bibliographic Details
Main Author: Sorin, Daniel J.
Format: Electronic
Language: English
Published: San Rafael, Calif. (1537 Fourth Street, San Rafael, CA 94901 USA) : Morgan & Claypool Publishers, c2009.
Series: Synthesis lectures on computer architecture (Online), # 5.
Subjects: Fault-tolerant computing; Self-stabilization (Computer science); Computer architecture
Online Access: View fulltext via EzAccess
LEADER 05823nam a2200589 a 4500
001 3419
005 20090605144156.0
006 m e d
007 cr cn |||m|||a
008 090604s2009 caua fsab 000 0 eng d
020 # # |a 9781598299540 (electronic bk.) 
020 # # |z 9781598299533 (pbk.) 
024 7 # |a 10.2200/S00192ED1V01Y200904CAC005  |2 doi 
035 # # |a (CaBNvSL)gtp00534711 
040 # # |a CaBNvSL  |c CaBNvSL  |d CaBNvSL 
050 # 4 |a QA76.9.F38  |b S674 2009 
082 0 4 |a 004.2  |2 22 
100 1 # |a Sorin, Daniel J. 
245 1 0 |a Fault tolerant computer architecture  |h [electronic resource] /  |c Daniel J. Sorin. 
260 # # |a San Rafael, Calif. (1537 Fourth Street, San Rafael, CA 94901 USA) :  |b Morgan & Claypool Publishers,  |c c2009. 
300 # # |a 1 electronic text (xii, 103 p. : ill.) :  |b digital file. 
490 1 # |a Synthesis lectures on computer architecture,  |x 1935-3243 ;  |v # 5 
500 # # |a Part of: Synthesis digital library of engineering and computer science. 
500 # # |a Title from PDF t.p. (viewed on June 4, 2009). 
500 # # |a Series from website. 
504 # # |a Includes bibliographical references. 
505 0 # |a Introduction -- Goals of this book -- Faults, errors, and failures -- Masking -- Duration of faults and errors -- Underlying physical phenomena -- Trends leading to increased fault rates -- Smaller devices and hotter chips -- More devices per processor -- More complicated designs -- Error models -- Error type -- Error duration -- Number of simultaneous errors -- Fault tolerance metrics -- Availability -- Reliability -- Mean time to failure -- Mean time between failures -- Failures in time -- Architectural vulnerability factor -- The rest of this book -- References -- Error detection -- General concepts -- Physical redundancy -- Temporal redundancy -- Information redundancy -- The end-to-end argument -- Microprocessor cores -- Functional units -- Register files -- Tightly lockstepped redundant cores -- Redundant multithreading without lockstepping -- Dynamic verification of invariants -- High-level anomaly detection -- Using software to detect hardware errors -- Error detection tailored to specific fault models -- Caches and memory -- Error code implementation -- Beyond EDCs -- Detecting errors in content addressable memories -- Detecting errors in addressing -- Multiprocessor memory systems -- Dynamic verification of cache coherence -- Dynamic verification of memory consistency -- Interconnection networks -- Conclusions -- References -- Error recovery -- General concepts -- Forward error recovery -- Backward error recovery -- Comparing the performance of FER and BER -- Microprocessor cores -- FER for cores -- BER for cores -- Single-core memory systems -- FER for caches and memory -- BER for caches and memory -- Issues unique to multiprocessors -- What state to save for the recovery point -- Which algorithm to use for saving the recovery point -- Where to save the recovery point -- How to restore the recovery point state -- Software-implemented BER -- Conclusions -- References -- Diagnosis -- General concepts -- The benefits of diagnosis -- System model implications -- Built-in self-test -- Microprocessor core -- Using periodic BIST -- Diagnosing during normal execution -- Caches and memory -- Multiprocessors -- Conclusions -- References -- Self-repair -- General concepts -- Microprocessor cores -- Superscalar cores -- Simple cores -- Caches and memory -- Multiprocessors -- Core replacement -- Using the scheduler to hide faulty functional units -- Sharing resources across cores -- Self-repair of noncore components -- Conclusions -- References -- The future -- Adoption by industry -- Future relationships between fault tolerance and other fields -- Power and temperature -- Security -- Static design verification -- Fault vulnerability reduction -- Tolerating software bugs -- References. 
506 # # |a Abstract freely available; full-text restricted to subscribers or individual document purchasers. 
510 0 # |a Compendex 
510 0 # |a INSPEC 
510 0 # |a Google scholar 
510 0 # |a Google book search 
520 3 # |a For many years, most computer architects have pursued one primary goal: performance. Architects have translated the ever-increasing abundance of ever-faster transistors provided by Moore's law into remarkable increases in performance. Recently, however, the bounty provided by Moore's law has been accompanied by several challenges that have arisen as devices have become smaller, including a decrease in dependability due to physical faults. In this book, we focus on the dependability challenge and the fault tolerance solutions that architects are developing to overcome it. The two main purposes of this book are to explore the key ideas in fault-tolerant computer architecture and to present the current state-of-the-art--over approximately the past 10 years--in academia and industry. 
530 # # |a Also available in print. 
538 # # |a Mode of access: World Wide Web. 
538 # # |a System requirements: Adobe Acrobat reader. 
650 # 0 |a Fault-tolerant computing. 
650 # 0 |a Self-stabilization (Computer science) 
650 # 0 |a Computer architecture. 
690 # # |a Fault tolerance (or fault tolerant) 
690 # # |a Reliability 
690 # # |a Dependability 
690 # # |a Computer architecture 
690 # # |a Error detection 
690 # # |a Error recovery 
690 # # |a Fault diagnosis 
690 # # |a Self-repair 
690 # # |a Autonomous 
690 # # |a Dynamic verification 
730 0 # |a Synthesis digital library of engineering and computer science. 
830 # 0 |a Synthesis lectures on computer architecture (Online),  |x 1935-3243 ;  |v # 5. 
856 4 2 |u https://ezaccess.library.uitm.edu.my/login?url=http://dx.doi.org/10.2200/S00192ED1V01Y200904CAC005  |z View fulltext via EzAccess