Key Concepts in Fault Tolerance vs. Disaster Recovery

Fault tolerance is the ability of a computer-based system to keep operating despite hardware or
software failures, through a combination of special hardware (with full error checking,
redundancy, and hot-swap support) and special software. Fault tolerance has several basic
requirements: no single point of failure, fault isolation and repair of failing components,
the availability of reversion modes, and fault containment to prevent failure propagation.
In addition, fault tolerance is usually measured at the application level, where both planned
and unplanned service outages are counted. Computer operations are performed on two or more
duplicate systems, so that whenever one system fails, the others can take over. Fault tolerance
may also cover the ability to resume operation in the event of a power failure. Basically, failures
start as physical faults and then progress to logical faults as a result of system errors.
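
The idea of duplicate systems taking over from a failed one can be illustrated with a minimal
Python sketch. It is only an illustration under stated assumptions: the names run_with_failover,
primary, standby, and AllReplicasFailed are hypothetical, and each "system" is modeled as a plain
function rather than real redundant hardware.

```python
from typing import Callable, List, Sequence


class AllReplicasFailed(Exception):
    """Raised when no duplicate system could complete the operation."""


def run_with_failover(replicas: Sequence[Callable[[str], str]], request: str) -> str:
    """Try each duplicate system in order and return the first successful result."""
    errors: List[str] = []
    for index, replica in enumerate(replicas):
        try:
            return replica(request)
        except Exception as exc:
            # Fault containment: the failure is recorded and the next replica
            # takes over, instead of the error propagating to the caller.
            errors.append(f"replica {index}: {exc}")
    raise AllReplicasFailed("; ".join(errors))


def primary(request: str) -> str:
    # Hypothetical primary system with a simulated failure.
    raise ConnectionError("primary is down")


def standby(request: str) -> str:
    # Hypothetical duplicate system that is still healthy.
    return f"handled {request!r} on standby"


if __name__ == "__main__":
    print(run_with_failover([primary, standby], "read account 42"))
```

The caller never sees the primary's failure as long as at least one duplicate system responds.
With only one replica in the list, that replica becomes a single point of failure, which is what
fault-tolerant designs try to avoid.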

Disaster recovery is the ability to resume operations after a disaster strikes, even one that
destroys an entire data center. A successful disaster recovery effort depends on the ability to
restore or recreate resources such as business data, systems, facilities, networks, and user
access. Disaster recovery methods used to resume system operations include periodic offsite tape
backups, a disaster recovery site, data vaulting, and a hot site. Backups of both software and
data should be stored in a separate location, independent of company facilities, to facilitate
recovery when a disaster occurs.
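
The backup-and-restore cycle described above can be sketched in a few lines of Python. This is a
simplified illustration, not a real disaster recovery tool: the functions back_up and restore and
the file name ledger.db are hypothetical, and a second local directory stands in for the
independent offsite location (in practice this would be tape, a data vault, or a remote site).

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def back_up(source: Path, offsite_dir: Path) -> Path:
    """Copy a file to the offsite location and verify that the copy is intact."""
    offsite_dir.mkdir(parents=True, exist_ok=True)
    copy = offsite_dir / source.name
    shutil.copy2(source, copy)
    if sha256_of(copy) != sha256_of(source):
        raise IOError(f"backup of {source} is corrupt")
    return copy


def restore(backup: Path, target: Path) -> None:
    """Recreate a lost file from its offsite backup."""
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(backup, target)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        data = root / "datacenter" / "ledger.db"  # hypothetical business data
        data.parent.mkdir()
        data.write_bytes(b"business data")

        copy = back_up(data, root / "offsite")    # stand-in for an offsite store
        shutil.rmtree(data.parent)                # simulated loss of the data center
        restore(copy, data)
        print(data.read_bytes() == b"business data")
```

Verifying the checksum at backup time matters because a backup that cannot be read back offers
no protection when the disaster actually happens.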

The main difference between fault tolerance and disaster recovery is that fault tolerance keeps
an environment continuously available for user access. It should be part of a disaster recovery
plan but cannot be considered a true disaster recovery effort on its own.
