From the document: DMV11 Synchronous Controller (pages 96-99)

CHAPTER 4 PROGRAMMING TECHNIQUES

4.6 ERROR RECOVERY PROCEDURES

Within a DMV11-based network, there are three basic levels of error recovery involving the user program:

1. Procedural violations where only the user program is notified.

2. Recovery from errors requiring protocol shutdown initiated by the user program.

3. Fatal errors resulting in system shutdown with minimal notice to the user program.

As Table 3-7 shows, procedural errors with codes 100 through 140 are reported to the user program and require no recovery. The remaining two procedural errors (codes 300 and 302) involve error recovery levels two and three, respectively. All network errors require recovery through protocol shutdown, and the queue overflow control response could result in network shutdown.
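The classification above can be expressed as a small lookup. The following Python sketch is hypothetical: the helper name `recovery_level` and the `kind` strings are illustrative, not part of any DMV11 driver interface. It compares codes exactly as they are printed in Table 3-7, so if the table lists the codes in octal, convert a raw register value to that notation before calling it.

```python
from enum import IntEnum
from typing import Optional


class RecoveryLevel(IntEnum):
    NOTIFY_ONLY = 1        # level 1: user program is notified; no recovery needed
    PROTOCOL_SHUTDOWN = 2  # level 2: user program halts and restarts the protocol
    SYSTEM_SHUTDOWN = 3    # level 3: fatal; the DMV11 shuts itself down


def recovery_level(kind: str, code: Optional[int] = None) -> RecoveryLevel:
    """Classify a reported error (hypothetical helper).

    kind: "network" or "procedural"; code: procedural error code as printed in Table 3-7.
    """
    if kind == "network":
        # All network errors are recovered by halting and restarting the protocol.
        return RecoveryLevel.PROTOCOL_SHUTDOWN
    if kind == "procedural" and code is not None:
        if 100 <= code <= 140:
            return RecoveryLevel.NOTIFY_ONLY
        if code == 300:
            return RecoveryLevel.PROTOCOL_SHUTDOWN
        if code == 302:
            return RecoveryLevel.SYSTEM_SHUTDOWN
    raise ValueError(f"unrecognized error: kind={kind!r}, code={code!r}")
```

For example, `recovery_level("procedural", 302)` returns `RecoveryLevel.SYSTEM_SHUTDOWN`.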

4.6.1 Recovery from Network Errors

In all cases, recovery from network errors requires halting the protocol at the tributary or station recording the error. Two similar but separate procedures are recommended: one for threshold errors and one for babbling and streaming tributary errors. These procedures are described below.

4.6.1.1 Recovery from Threshold Errors - DMV11 threshold errors are detailed in Section 5.3.3. The recommended recovery procedure, to be initiated by the user program at the station recording the errors, is as follows:

1. Halt the protocol (see Table 3-6).

2. Read the error counters to determine the nature and cause of the threshold error condition. If the error results from a shortage of receive buffers, correct the condition. If the transmit or selection threshold is being exceeded, check the operational condition of the remote station.

3. When the conditions causing the errors have been eliminated, restart the protocol (see Section 4.3.3).
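The three steps can be sketched as a user-program routine. This Python sketch assumes a hypothetical station interface (`halt_protocol`, `read_error_counters`, `add_receive_buffers`, `remote_station_ok`, `restart_protocol`) and hypothetical counter names; the actual counters are described in Section 5.3.3. `RecordingStation` only logs calls so the control flow can be checked.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RecordingStation:
    """Hypothetical stand-in for a driver's station interface; it only logs calls."""
    counters: Dict[str, int]
    remote_ok: bool = True
    log: List[str] = field(default_factory=list)

    def halt_protocol(self) -> None:
        self.log.append("halt")

    def read_error_counters(self) -> Dict[str, int]:
        self.log.append("read counters")
        return dict(self.counters)

    def add_receive_buffers(self) -> None:
        self.log.append("add receive buffers")

    def remote_station_ok(self) -> bool:
        return self.remote_ok

    def restart_protocol(self) -> None:
        self.log.append("restart")


def recover_from_threshold_error(station: RecordingStation) -> bool:
    """Return True if the protocol was restarted, False if it was left halted."""
    station.halt_protocol()                          # step 1 (Table 3-6)
    counters = station.read_error_counters()         # step 2: find the cause
    if counters.get("receive_buffer_shortage", 0):
        station.add_receive_buffers()                # local cause: correct it here
    remote_suspect = (counters.get("transmit_threshold", 0) > 0
                      or counters.get("selection_threshold", 0) > 0)
    if remote_suspect and not station.remote_station_ok():
        return False                                 # fix the remote station before restarting
    station.restart_protocol()                       # step 3 (Section 4.3.3)
    return True
```

The routine deliberately leaves the protocol halted when the counters point at the remote station and that station is not yet operational, matching the rule that restart follows only after the cause has been eliminated.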

4.6.1.2 Recovery from Babbling and Streaming Tributary Errors - Babbling and streaming tributary errors are reported when their respective timers are exceeded. A timeout can therefore result either from an actual error condition or from a timer period that is too short for the type of message activity on the line (see Sections 4.4.2 and 4.4.3). A suggested recovery procedure for these conditions is:

1. Halt the protocol.

2. Check the values of the timer parameters, and increase them if they are not appropriate for the message activity on the line.

3. Restart the protocol (see Section 4.3.3).

4. If this error condition persists, reconfigure the station as specified by Section 4.3.1.

5. When the cause of the timeout originates at the remote station, action must be taken at the remote station to ascertain and correct the fault. The local station is at fault only if the values of the timer parameters are inappropriate.
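The procedure can be sketched as a function that plans the recovery actions. In this Python sketch, `required_value` (the smallest timer value judged appropriate for the line) and `persist_limit` (how many occurrences count as "persists") are hypothetical inputs; the manual does not define either number.

```python
from typing import List, Tuple


def recover_from_timer_error(timer_value: int, required_value: int,
                             occurrences: int, persist_limit: int = 2) -> Tuple[int, List[str]]:
    """Plan recovery from a babbling or streaming tributary timeout.

    timer_value and required_value use whatever units the timer parameter uses.
    Returns (new timer value, ordered list of actions).
    """
    steps = ["halt protocol"]                                    # step 1
    if timer_value < required_value:                             # step 2
        timer_value = required_value
        steps.append(f"increase timer to {timer_value}")
    else:
        # Timer is appropriate, so the local station is not at fault (step 5).
        steps.append("check remote station for the fault")
    if occurrences >= persist_limit:                             # step 4
        steps.append("reconfigure station (Section 4.3.1)")
    steps.append("restart protocol (Section 4.3.3)")            # step 3
    return timer_value, steps
```

Placing reconfiguration before the final restart is a design choice of this sketch: a repeated timeout is handled by reconfiguring and then restarting, rather than restarting twice.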

4.6.2 Recovery from Procedural Errors

The three procedural errors that require a recovery procedure are:

1. Nonexistent memory error.

2. Buffer too small error.

3. Queue overflow error.

The recovery procedure for each of these errors is detailed in Sections 4.6.2.1 through 4.6.2.3.

4.6.2.1 Recovery from a Nonexistent Memory Error - Nonexistent memory errors occur when the DMV11 tries to access an allocated receive or transmit buffer that has an invalid bus address. When this error is detected, the DMV11 posts a control response to the user program containing the invalid address (see Section 3.4.2). The user program must determine whether the nonexistent address belongs to a transmit buffer or a receive buffer.

NOTE

Depending on microcode processing circumstances, the nonexistent memory address returned to the user program could have been incremented to the next sequential location.
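Because the reported address may be one location past the faulting one, the user program can match it against its own records of outstanding buffers with a small allowance beyond each buffer's end. This Python sketch assumes the user program keeps such records (`OutstandingBuffer` is hypothetical) and uses a slack of 2 bytes, one word, which is an assumption about the size of the increment.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class OutstandingBuffer:
    kind: str      # "transmit" or "receive"
    address: int   # bus address handed to the DMV11
    length: int    # character count, in bytes


def candidate_buffers(bad_address: int, buffers: Iterable[OutstandingBuffer],
                      slack: int = 2) -> List[OutstandingBuffer]:
    """Return every outstanding buffer that could explain the reported address.

    The reported address may have been advanced past the faulting location, so a
    buffer matches if start <= bad_address <= end - 1 + slack (slack is assumed).
    More than one result means the match is ambiguous, e.g. adjacent buffers.
    """
    matches = []
    for buf in buffers:
        end = buf.address + buf.length          # one past the buffer's last byte
        if buf.address <= bad_address < end + slack:
            matches.append(buf)
    return matches
```

Returning all candidates rather than the first match keeps ambiguity visible: when two buffers are adjacent in memory, an address just past the first is also the start of the second.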

The suggested recovery procedure for this error is as follows:

1. Halt the protocol for the tributary or station recording this error to initiate return of all outstanding buffers.

2. If the error concerns a buffer from the common pool, the user program should issue the global halt command to initiate return of all outstanding receive buffers from the common pool.

3. Restart the protocol and reallocate buffers as necessary.

Persistent recurrence of this error indicates a possible main CPU or DMV11 malfunction.
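The recovery steps can be summarized as an ordered action list. In this Python sketch, the function name and `persist_limit` (the count treated as "persistent recurrence") are hypothetical; the manual gives no specific number.

```python
from typing import List


def nxm_recovery_steps(from_common_pool: bool, recent_nxm_errors: int,
                       persist_limit: int = 3) -> List[str]:
    """Ordered actions after a nonexistent memory control response.

    persist_limit is a hypothetical count for "persistent recurrence".
    """
    steps = ["halt protocol for the tributary or station (returns outstanding buffers)"]
    if from_common_pool:
        # Only the global halt returns receive buffers still held from the common pool.
        steps.append("issue global halt (returns outstanding common-pool receive buffers)")
    steps.append("restart protocol and reallocate buffers")
    if recent_nxm_errors >= persist_limit:
        steps.append("suspect a main CPU or DMV11 malfunction")
    return steps
```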

NOTE

If the network line speed is 56K b/s, the requests for retransmission generated by a nonexistent memory address can overflow the DMV11 response queue, causing a fatal system error (see Section 4.6.2.3).

4.6.2.2 Recovery from a Receive Buffer Too Small Error - When the DMV11 receives a message, it first checks whether a buffer is available from the common buffer pool linked list and, if one is, uses that buffer. If the common buffer pool is empty or not enabled, the DMV11 checks the private buffer linked list. If no private buffer is available, the receiving station NAKs the incoming message.

The steps taken by the DMV11 microcode in this process are listed below.

1. Is the message number in sequence? Yes, continue; No, ignore message.

2. Is the common buffer pool enabled? Yes, continue; No, go to Step 6.

3. Is the common buffer pool quota = 0? Yes, go to Step 6; No, continue.

4. Is a common pool buffer available? Yes, continue; No, go to Step 6.

5. Is the common pool buffer too small? Yes, go to Step 8; No, use this buffer.

6. Is a private buffer available? Yes, continue; No, send NAK - buffer temporarily unavailable.

7. Is private buffer too small? Yes, send NAK - buffer too small; No, use this buffer.

8. Is private buffer available? Yes, go to Step 7; No, send NAK - buffer too small.

NOTE

The DMV11 does not scan the common pool or private linked list structures looking for a buffer of sufficient size. Rather, it uses the next available buffer from the list.
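The eight-step decision can be traced with a model in which each buffer is represented only by its size and each linked list by a `deque`, so only the head of a list is ever examined, as the note states. This Python sketch models the documented flow; it is not the microcode itself, and the function name and result strings are illustrative.

```python
from collections import deque
from typing import Deque, Optional, Tuple


def select_receive_buffer(in_sequence: bool, msg_len: int,
                          common_enabled: bool, common_quota: int,
                          common_pool: Deque[int], private_list: Deque[int]
                          ) -> Tuple[str, Optional[int]]:
    """Return (action, buffer size). Buffers are modeled only by their sizes."""
    if not in_sequence:                                       # step 1
        return ("ignore", None)
    common_too_small = False
    if common_enabled and common_quota != 0 and common_pool:  # steps 2-4
        if common_pool[0] >= msg_len:                         # step 5: head only
            return ("use common", common_pool.popleft())
        common_too_small = True                               # step 5 -> step 8
    if not private_list:                                      # step 6 or step 8
        if common_too_small:
            return ("NAK: buffer too small", None)
        return ("NAK: buffer temporarily unavailable", None)
    if private_list[0] < msg_len:                             # step 7: head only
        return ("NAK: buffer too small", None)
    return ("use private", private_list.popleft())
```

For example, with `private_list = deque([64, 1024])` and a 128-byte message, the result is a buffer too small NAK even though a 1024-byte buffer sits second in the list.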

Buffer too small errors apply only to receive buffers. The procedure for recovery from this error depends on whether the allocated buffer is from the common pool or is a private buffer. The applicable recovery procedures are explained below.

A. Common pool buffer too small

1. Assign a private buffer of sufficient size to the receiving tributary through a buffer address/character count command (see Section 3.3.4).

B. Both private and common pool buffers too small

1. Halt the protocol for the offending tributary to initiate return of all outstanding private buffers.

2. Restart the protocol.

3. Assign a private buffer of sufficient size to the receiving tributary through a buffer address/character count command (see Section 3.3.4).

C. Private buffer too small, and common pool not enabled

1. If buffers from the common pool are available to other tributaries, and are of sufficient size, enable common pool buffers for this tributary (see Section 3.3.4).

2. If the common buffer pool is not in use for other tributaries, follow recovery procedure B above.
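Choosing among procedures A, B, and C can be sketched as follows. The Python function and the action strings are hypothetical. `pool_fits_elsewhere` stands for "the common pool is in use by other tributaries and its buffers are large enough". Treating an in-use but undersized pool like the not-in-use case (procedure B) is an assumption; the manual covers only the not-in-use case.

```python
from typing import List

HALT = "halt protocol for the tributary (returns outstanding private buffers)"
RESTART = "restart protocol"
ASSIGN = "assign a large enough private buffer (buffer address/character count, Section 3.3.4)"
ENABLE = "enable common pool buffers for this tributary (Section 3.3.4)"


def buffer_too_small_recovery(common_too_small: bool, private_too_small: bool,
                              common_enabled: bool, pool_fits_elsewhere: bool) -> List[str]:
    """Choose recovery procedure A, B, or C from Section 4.6.2.2."""
    if common_too_small and private_too_small:      # procedure B
        return [HALT, RESTART, ASSIGN]
    if common_too_small:                             # procedure A
        return [ASSIGN]
    if private_too_small and not common_enabled:    # procedure C
        if pool_fits_elsewhere:
            return [ENABLE]                          # C, step 1
        # C, step 2: follow B (also assumed when the pool is in use but too small)
        return [HALT, RESTART, ASSIGN]
    raise ValueError("combination not covered by Section 4.6.2.2")
```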

4.6.2.3 Recovery from a Queue Overflow Error - This error is always fatal to the DMV11 recording it, because it forces an automatic shutdown of the device. The basic cause of this error is the unavailability of link blocks from the free linked list (see Section 5.4.1.1). Typically, this error results when the internal response queue overflows because the DMV11 generates responses faster than the user program can retrieve them from the queue. This error can also occur if an inordinate number of receive buffers have been allocated. One cause of response queue overflow is the occurrence of repetitive nonexistent memory errors in high-speed networks (see Section 4.6.2.1).

When this error occurs, the DMV11 posts the most current entry in the response queue to the user program. The user program then has three seconds after being interrupted to retrieve the response. If the response is retrieved within this three-second window, the next response is posted. As long as the user program retrieves each response within the window, the process continues until the internal response queue is empty. The retrieved responses can then be analyzed to determine the cause of the queue overflow.

After the last response has been posted, or after the three-second response period has expired, the DMV11 shuts itself down. Returning the DMV11 to operational status then requires repeating the start-up procedure from the beginning (see Section 4.3).
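The post-overflow hand-off can be modeled to show which responses the user program manages to save. This Python sketch is a simulation of the behavior described above, not a driver: `delays_s` (the user program's retrieval delay after each interrupt) is a hypothetical input, and a delay of exactly three seconds is assumed to still count as within the window.

```python
from typing import List, Sequence, Tuple


def drain_after_overflow(queued: Sequence[str], delays_s: Sequence[float],
                         window_s: float = 3.0) -> Tuple[List[str], bool]:
    """Model the response hand-off that follows a queue overflow.

    queued: responses left in the internal queue, in the order the DMV11 posts them.
    delays_s: retrieval delay (seconds after the interrupt) for each posted response.
    Returns (responses retrieved, whether the queue was emptied). The DMV11 shuts
    itself down in both cases.
    """
    if len(delays_s) != len(queued):
        raise ValueError("need one retrieval delay per queued response")
    retrieved: List[str] = []
    for response, delay in zip(queued, delays_s):
        if delay > window_s:
            break                   # window expired: the DMV11 shuts down now
        retrieved.append(response)  # retrieved in time: the next response is posted
    return retrieved, len(retrieved) == len(queued)
```

Whatever the outcome, the saved responses are what remain for diagnosing the overflow, and the device must then be started again from the beginning of the start-up procedure.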
