ERROR RECOVERY PROCESS - DCU-5 DISK ERROR RECOVERY

3.5 DCU-5 DISK ERROR RECOVERY

3.5.2 ERROR RECOVERY PROCESS

Each major recovery process proceeds according to the following four rules:

• The clear faults and reset functions are retried a limited number of times on each call to D4RES. If the retry limit is reached with no success in performing the specified function, the error is considered unrecoverable and the entire recovery process is terminated.

• The function to obtain Drive General Status or a Selected Status is also retried a limited number of times in D4STAT before a bad status is returned to the caller. This is not considered a fatal condition, however, and the calling overlay may continue with the recovery process.

• A successful read or write process is considered to include the head select, LMA select, and read or write functions. If an error is detected on any of these functions, the process is retried starting with the head select.

• If the ON flag is not set when expected, or if the BZ and ON flags cannot be cleared with the channel clear function, the error is considered unrecoverable and the recovery process is terminated.
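The first three rules can be sketched as follows. This is only an illustration in Python, not the IOS overlay code: the function names, the callables passed in, and the retry limits are all assumptions, since the manual says only that retries are "limited".

```python
class UnrecoverableError(Exception):
    """Raised when the whole recovery process must be terminated."""


RESET_RETRY_LIMIT = 3   # assumed value; the manual gives no number
STATUS_RETRY_LIMIT = 3  # assumed value; the manual gives no number


def reset_with_retries(do_reset, limit=RESET_RETRY_LIMIT):
    """Rule 1: retry clear faults/reset; reaching the limit is fatal."""
    for _ in range(limit):
        if do_reset():
            return
    raise UnrecoverableError("clear faults/reset failed at retry limit")


def status_with_retries(read_status, limit=STATUS_RETRY_LIMIT):
    """Rule 2: retry the status function; reaching the limit is not fatal.

    Returns the status, or None to report a bad status to the caller,
    which may still continue with recovery.
    """
    for _ in range(limit):
        status = read_status()
        if status is not None:
            return status
    return None


def transfer_with_retries(head_select, lma_select, transfer, limit=3):
    """Rule 3: a failure in any step restarts the sequence at head select."""
    for _ in range(limit):
        if head_select() and lma_select() and transfer():
            return True
    return False
```

The difference between the first two helpers mirrors the text: a failed reset raises and ends recovery, while a failed status read simply hands a bad status back to the caller.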

The following subsections describe the recovery process for each major error type.

3.5.2.1 Unit select process

The following are conditions for unit select process error recovery.

3.5.2.1.1 Software detected errors: if a software time-out has occurred, call D4RES to reset the drive and retry the select.

3.5.2.1.2 Status register 0 errors: if the drive is not ready, delay and check repeatedly until the drive becomes ready or a retry limit is reached. If an input parity error is detected, retry the select.

3.5.2.1.3 DD-49 drive general status errors: if a sequence-operation-in-progress is detected, delay and check repeatedly until the sequence operation is complete or a retry limit is reached. If a catastrophic drive error is detected, inform the operator through D4MSG that manual intervention may be required (see subsection 3.5.3, Operator Messages, for more information). If the operator indicates a retry should be performed, retry the select.

3.5.2.1.4 RD-10, DD-39, and DD-40 drive general status errors: if a catastrophic drive error is detected, inform the operator through D3MSG that manual intervention may be required (see subsection 3.5.3, Operator Messages, for more information). If the operator indicates a retry should be performed, retry the select.

If none of the above conditions are found, call D4RES to reset the drive and retry the select.
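The "delay and check repeatedly" step used for the drive-not-ready and sequence-operation-in-progress conditions can be sketched as a bounded polling loop. The helper name, the default limit, and the delay are hypothetical; the manual does not give values.

```python
import time


def wait_until(condition, retry_limit=10, delay_s=0.0, sleep=time.sleep):
    """Delay and check until condition() is true or the retry limit is hit.

    Returns True if the condition became true, False if the limit was
    reached first. retry_limit and delay_s are assumed values.
    """
    for _ in range(retry_limit):
        if condition():
            return True
        sleep(delay_s)
    return False
```

Passing `sleep` as a parameter keeps the sketch testable without real delays; a caller would treat a False return as the retry limit having been reached.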

3.5.2.2 Cylinder select process

The following are conditions for cylinder select process error recovery:

3.5.2.2.1 Software detected errors: if a software time-out has occurred, call D4RES to reset the drive and retry the seek.

3.5.2.2.2 Status register 0 errors: if the drive is not ready, delay and check repeatedly until the drive becomes ready or a retry limit is reached. If an input parity error is detected, call D4RES to reset the drive and retry the seek.

3.5.2.2.3 DD-49 drive general status errors: if a sequence-operation-in-progress is detected, delay and check repeatedly until the sequence operation is complete or a retry limit is reached. If an invalid option, invalid command, function parity error, Bus-out parity error, or function lost is detected, call D4RES to clear faults. If a catastrophic drive error is detected, inform the operator through D4MSG that manual intervention may be required. If the operator indicates a retry should be performed, retry the seek.

3.5.2.2.4 RD-10, DD-39, and DD-40 drive general status errors: if a catastrophic drive error is detected, inform the operator through D3MSG that manual intervention may be required. If the operator indicates a retry should be performed, retry the seek. If it is a function parity error or Bus-out parity error, call D4RES to clear faults. If it is a command error or sequence parity error, call D4RES to reset the drive.

If none of the preceding conditions are found, call D4RES to reset the drive and retry the seek.
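As an illustration, the DD-49 cylinder select conditions above can be written as an ordered decision function. The condition and action names are invented labels, and checking the conditions in the order the manual lists them is an assumption; the manual does not state a priority.

```python
# Conditions that lead to "call D4RES to clear faults" for DD-49 drives.
CLEAR_FAULT_CONDITIONS = {
    "invalid_option",
    "invalid_command",
    "function_parity_error",
    "bus_out_parity_error",
    "function_lost",
}


def cylinder_select_action_dd49(errors):
    """Map detected condition names to a recovery action label (sketch)."""
    errors = set(errors)
    if "software_timeout" in errors:
        return "reset_and_retry_seek"
    if "drive_not_ready" in errors:
        return "wait_for_ready"
    if "input_parity_error" in errors:
        return "reset_and_retry_seek"
    if "sequence_operation_in_progress" in errors:
        return "wait_for_sequence_complete"
    if errors & CLEAR_FAULT_CONDITIONS:
        return "clear_faults"
    if "catastrophic_drive_error" in errors:
        return "ask_operator"
    # None of the listed conditions: reset the drive and retry the seek.
    return "reset_and_retry_seek"
```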

3.5.2.3 Head select-LMA select-read process

The following are conditions for head select-LMA select-read processing.

3.5.2.3.1 Software detected errors: if an initial LMA echo error has occurred, reload the Local Memory address into the LMA register in error until the load is successful or a retry limit is reached. If a final LMA echo error has occurred, the error is unrecoverable. If a software time-out has occurred, call D4RES to reset the drive and retry the read process.

3.5.2.3.2 Status register 0 errors: if the drive is not ready, delay and check repeatedly until the drive becomes ready or a retry limit is reached. If an input parity error is detected, retry the read process.

3.5.2.3.3 DD-49 drive general status errors: if a sequence-operation-in-progress is detected, delay and check repeatedly until the sequence operation is complete or a retry limit is reached. If an invalid option, invalid command, function parity error, Bus-out parity error, or function lost is detected, call D4RES to clear faults. If a seek error is detected, call D4SKR to perform seek error recovery. If an overflow is detected, call D4RES to clear faults. If an ID-not-found or Synchronization time-out is detected, execute retries according to the offset algorithm that follows. If a drive error is detected, call D4SKR to perform seek error recovery. If an ECC error is detected, attempt error correction according to the correction algorithm that follows.

3.5.2.3.4 RD-10, DD-39, and DD-40 drive general status errors: if Unit Ready is not set, call D3SKR to perform seek error recovery. If it is a function parity error or Bus-out parity error, call D4RES to clear faults. If it is a command error or sequence parity error, call D4RES to reset the drive. If a seek error is detected, call D3SKR to perform seek error recovery. If an overflow is detected, call D4RES to clear faults. If an ID-not-found or Synchronization time-out is detected, execute retries according to the offset algorithm below. If a drive fault is detected, call D3SKR to perform seek error recovery. If an interface logic fault is detected, call D3SKR to perform seek error recovery. If an ECC error is detected, attempt error correction according to the following correction algorithm.

The offset algorithm is as follows:

• Call D4RES to clear faults and retry the read until a limit is reached.

• Offset actuator or actuators in error toward spindle.

• Retry the read until a limit is reached.

• Offset actuator or actuators in error away from spindle.

• Retry the read until a limit is reached.

The correction algorithm is as follows:

This algorithm is superimposed on the offset algorithm. It is executed following each read retry yielding an ECC error. Compute and transfer the correction vectors for this read attempt. Compare the correction offsets from this read with those from the previous read.

If the error offsets are consistent (within 1 parcel) on all channels, call D4ECC (DD-49), D40ECC (RD-10 and DD-40), or D3ECC (DD-39) to correct the last read data.
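The consistency test between two successive reads can be sketched as below. Representing the correction offsets as one integer per channel, measured in parcels, is an assumption made for illustration; the function name is hypothetical.

```python
def offsets_consistent(previous, current, tolerance=1):
    """Return True if every channel's error offset moved by at most
    `tolerance` parcels between the previous read and the current read.

    previous/current: sequences of per-channel offsets (assumed layout).
    With no previous read to compare against, the result is False.
    """
    if previous is None or len(previous) != len(current):
        return False
    return all(abs(p - c) <= tolerance for p, c in zip(previous, current))
```

Only when this check passes on all channels would the appropriate correction routine (D4ECC, D40ECC, or D3ECC) be called on the last read data.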

If none of the preceding conditions are found, retry the read process.
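The offset algorithm described earlier in this subsection can be sketched as three retry phases: at the nominal position, offset toward the spindle, and offset away from the spindle. The callables, the phase labels, and the retry limit are hypothetical, and clearing faults before each nominal-position retry is one reading of the first step, not something the manual spells out.

```python
def read_with_offsets(read, clear_faults, set_offset, limit=3):
    """Retry a read through the three offset phases (sketch).

    read():          returns True on a successful read
    clear_faults():  stands in for the D4RES clear faults call
    set_offset(d):   offsets the actuator(s) in error in direction d
    limit:           assumed per-phase retry limit
    """
    for phase in (None, "toward_spindle", "away_from_spindle"):
        if phase is not None:
            set_offset(phase)
        for _ in range(limit):
            if phase is None:
                clear_faults()
            if read():
                return True
    return False
```

A False return means all three phases reached their limit without a good read.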

3.5.2.4 Head select-LMA select-write process

The following are conditions for head select-LMA select-write processing:

3.5.2.4.1 Software detected errors: if an initial LMA echo error has occurred, reload the Local Memory address into the LMA register in error until the load is successful or a retry limit is reached. If a final LMA echo error has occurred, retry the write process. If a software time-out has occurred, call D4RES to reset the drive and retry the write process.

3.5.2.4.2 Status register 0 errors: if the drive is not ready, delay and check repeatedly until the drive becomes ready or a retry limit is reached. If an input parity error is detected, retry the write process.

3.5.2.4.3 DD-49 drive general status errors: if a sequence-operation-in-progress is detected, delay and check repeatedly until the sequence operation is complete or a retry limit is reached. If an invalid option, invalid command, function parity error, Bus-out parity error, or function lost is detected, call D4RES to clear the faults. If a seek error is detected, call D4SKR to perform seek error recovery. If an underflow is detected, call D4RES to clear faults. If an ID-not-found or synchronization time-out is detected, call D4RES to clear faults and retry the write process. If the retry limit is reached for ID-not-found, call D4SKR to perform seek error recovery and retry the write process one more time. If a drive error is detected, call D4SKR to perform seek error recovery.

3.5.2.4.4 RD-10, DD-39, and DD-40 drive general status errors: if Unit Ready is not set, call D3SKR to perform seek error recovery. If it is a function parity error or Bus-out parity error, call D4RES to clear faults. If it is a command error or sequencer parity error, call D4RES to reset the drive. If a seek error is detected, call D4SKR to perform seek error recovery. If an underflow is detected, call D4RES to clear faults. If an ID-not-found or Synchronization time-out is detected, call D4RES to clear faults and retry the write process. If the retry limit is reached for ID-not-found, call D3SKR to perform seek error recovery and retry the write process one more time. If a drive error or interface logic error is detected, call D3SKR to perform seek error recovery.

If none of the preceding conditions are found, retry the write process.
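The ID-not-found handling for writes (clear faults and retry up to a limit, then one seek error recovery followed by a single final write attempt) can be sketched as follows. The function name, the callables, and the limit are assumptions for illustration.

```python
def recover_write_id_not_found(write, clear_faults, seek_recovery, limit=3):
    """Sketch of write recovery after an ID-not-found error.

    write():          returns True on a successful write
    clear_faults():   stands in for the D4RES clear faults call
    seek_recovery():  stands in for D4SKR (DD-49) or D3SKR (other drives)
    limit:            assumed retry limit
    """
    for _ in range(limit):
        clear_faults()
        if write():
            return True
    # Retry limit reached: perform seek error recovery, then try once more.
    seek_recovery()
    return write()
```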

3.5.2.5 Unit release process

The following are conditions for unit release processing.

3.5.2.5.1 Software detected errors: if a software time-out has occurred, call D4RES to reset the drive and retry the release.

3.5.2.5.2 Status register 0 errors: if the drive is not ready, delay and check repeatedly until the drive becomes ready or a retry limit is reached. If any other error is detected, call D4RES to reset the drive, reselect the unit, then retry the release.

3.5.2.5.3 DD-49 drive general status errors: if any error is detected in drive general status, call D4RES to reset the drive and end error recovery.

3.5.2.5.4 RD-10, DD-39, and DD-40 drive general status errors: if any error is detected in drive general status, end error recovery.

If none of the preceding conditions are found, reselect the unit and retry the release.
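The unit release rules differ by drive type only in the general status case, which a small lookup makes explicit. The condition labels and step names below are invented for this sketch.

```python
def unit_release_steps(drive_type, condition):
    """Return the ordered recovery steps for a unit release error (sketch).

    drive_type: "DD-49", "RD-10", "DD-39", or "DD-40"
    condition:  hypothetical label for the detected condition
    """
    if condition == "software_timeout":
        return ["reset_drive", "retry_release"]
    if condition == "drive_not_ready":
        return ["wait_for_ready"]
    if condition == "status_register_0_other":
        return ["reset_drive", "reselect_unit", "retry_release"]
    if condition == "general_status_error":
        if drive_type == "DD-49":
            return ["reset_drive", "end_recovery"]
        return ["end_recovery"]
    # None of the listed conditions were found.
    return ["reselect_unit", "retry_release"]
```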
