
THE PRIMARY OPERAND UNIT

In the document and Roland (pages 73-87)

4 The Primary Instruction Pipeline

4.2 THE PRIMARY OPERAND UNIT

The provision of separate Primary and Secondary Operand Units in the MU5 Processor arose from the distinction made in the order code between different types of operand, particularly between named or literal operands and data structure elements.

The basic idea was that the Primary Operand Unit (PROP) would be concerned with accessing the operand specified directly by the instruction and routing the instruction, together with its primary operand, to the appropriate following unit for execution or further processing. PROP would therefore contain the Name Store, and if the primary operand was a named variable or literal, for example, the instruction would be ready for execution at the end of PROP. An instruction involving a descriptor would be sent to the Secondary Operand Unit (SEOP). As described in section 2.3, however, a Name Store was also incorporated into SEOP, and some instructions can therefore leave PROP without their primary operand.

Figure 4.5 shows the basic hardware in PROP and the various stages of operation involved in processing a typical instruction. Instructions are received from the IBU into registers DF (function) and DN (name). The first action is the decoding of the instruction to select the appropriate base (NB, XNB or SF) and the name part of the instruction. For access to a 32-bit variable, the name is shifted down one place relative to the base and the least significant digit is used later to select the appropriate half of the 64-bit word obtained from the Name Store.
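The 32-bit addressing rule just described can be sketched in Python. This is an illustration only: the function name is hypothetical, the base is assumed to address 64-bit words, and the half-select value simply mirrors the least significant digit of the name, since the text does not say which half it denotes.

```python
from typing import Tuple


def locate_32bit_name(base: int, name: int) -> Tuple[int, int]:
    """Return (word_address, half_select) for a 32-bit named variable.

    Assumptions (not stated in the text): `base` addresses 64-bit words,
    and register widths and wrap-around are not modelled.
    """
    half_select = name & 1        # least significant digit, used later
    word_offset = name >> 1       # name shifted down one place
    return base + word_offset, half_select
```

For example, under these assumptions names 4 and 5 relative to the same base fall in the same 64-bit word and differ only in the half selected.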

[...] software simulation (section 2.3) and on technological considerations. Thus, for example, the modules needed to make [...]

4.2.1 Design of the PROP Pipeline

The pipelining of the five stages in PROP is achieved by staticising the information obtained at the end of each stage in a buffer register (figure 4.6). An important aspect of the design of a pipeline is the timing of the strobes to the buffer registers, because some outputs from a stage (the function bits, for example) will derive directly from its inputs. With master/slave flip-flops the various registers can be strobed simultaneously, but these devices are inherently slower than the D-type flip-flops used in MU5 (section 3.1.1).

A different technique is therefore used in which the result obtained at the end of any one stage is only copied into its buffer register when the result of the following stage has itself been staticised. The strobes used to copy information into the buffer registers are therefore staggered, as shown in figure 4.7.
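The effect of staggering the strobes can be illustrated with a small software analogy. This is a sketch, not a model of the MU5 circuitry: the registers are a plain Python list, each stage's logic is reduced to passing its input through unchanged, and both function names are hypothetical.

```python
def staggered_beat(registers, new_input):
    """Advance the pipeline one beat, strobing the last buffer first.

    Each buffer takes its predecessor's content only after the buffer
    ahead of it has already captured its own new value.
    """
    for i in range(len(registers) - 1, 0, -1):
        registers[i] = registers[i - 1]
    registers[0] = new_input
    return registers


def front_to_back_beat(registers, new_input):
    """Counter-example: strobing from the front lets one value race through."""
    registers[0] = new_input
    for i in range(1, len(registers)):
        registers[i] = registers[i - 1]
    return registers
```

With the staggered order, two successive beats leave two distinct items in adjacent buffers; with the front-to-back order a single item overwrites every buffer, which is the hazard the staggering avoids.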

Figure 4.6 The Complete Primary Operand Unit

The shaded portions of figure 4.7 show the progress of one instruction through the PROP pipeline. It is first copied into DF and DN (function and name respectively) and the Decode 0 logic carries out the decoding of the instruction necessary to control the first stage. The decoding logic of figure 4.5 is spread out in the pipelined version into separate decoders for each stage. In many cases, however, the necessary decoding cannot be carried out in sufficient time to control the action of a given stage. In these cases it is carried out in the previous stage, and the various control signals appear as additional function digits, along with the original function, as inputs to the stage requiring them.

Figure 4.7 The Basic PROP Timing Diagram

The next pipeline strobe is timed to arrive no earlier than when the outputs of the first stage have settled and are ready to be strobed into the registers F1 (function), NM (name) and BS (base). The addition of name and base now takes place and, after the appropriate time has elapsed, the result is copied into IN, the Interrogate Register. The output of IN is concatenated with PN, the Process Number, to form the input to the associative field of the Name Store. The result of the association is then copied into the PROP Line Register (PLR), the output of which accesses the line in the Value Field containing the required operand. The Value Field output is copied into the Value Field register (VF), and thence, after assembly, into the Highway Input register (HI).
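The sequence of add, associate and read described above can be sketched as follows. This is a simplified illustration under stated assumptions: the class and function names are hypothetical, the associative field is modelled as a list of (process number, address) pairs searched sequentially rather than in parallel, and the case where no line matches is simply reported as `None` because its handling is not covered here.

```python
from typing import Any, List, Optional, Tuple


class NameStoreSketch:
    """Toy Name Store: an associative field paired with a value field."""

    def __init__(self) -> None:
        self.assoc: List[Tuple[int, int]] = []  # (PN, address) per line
        self.values: List[Any] = []             # Value Field, one entry per line

    def load(self, pn: int, address: int, value: Any) -> None:
        self.assoc.append((pn, address))
        self.values.append(value)

    def associate(self, pn: int, address: int) -> Optional[int]:
        """Return the matching line number (the PLR content), if any."""
        for line, key in enumerate(self.assoc):
            if key == (pn, address):
                return line
        return None


def access_operand(store: NameStoreSketch, base: int, name: int, pn: int) -> Any:
    interrogate = base + name                  # IN: name/base adder result
    plr = store.associate(pn, interrogate)     # IN concatenated with PN
    if plr is None:
        return None                            # no match: not modelled here
    return store.values[plr]                   # VF: Value Field output
```

Including the Process Number in the key means the same address belonging to a different process does not match.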

Once an instruction has reached HI, PROP must wait until it has been accepted by another unit before taking any further action. Instructions therefore proceed through PROP in series of 'beats', the rate at which these beats occur being [...] designing for only three levels of logic in paths which [...] contains one 16-bit order. However, circumstances arise which prevent full utilisation of each stage. For example, the order [...] Register updating. Other conditions can also arise which require hold-ups within the pipeline and hence also produce [...] preventing subsequent beats from propagating back beyond the stage from which they arise (figure 4.8), and by causing dummy [...]

4.2.2 Multi-length Instructions

Multi-length instructions are those using a 16-bit name or a 16, 32, or 64-bit literal operand. The actions required to implement these orders, and also the stack mechanism, are controlled by three digits in each of registers DF and F1 and the decoding logic between them. Thus following a DF strobe, the inputs to these three control digits in F1 take up states determined both by the function and the states of the control digits in DF. When the next beat pulse reaches F1, the F1 control digits take up their new states and are then copied back into the corresponding DF digits by the same beat pulse 10 ns later, so that the digits in DF and F1 act as a master/slave combination. For example, when a long name is used, the next beat copies a dummy order into F1 and the 16-bit name into DN, the DF strobe being inhibited. On the following beat the order enters F1 with the execute digit set to 1 and the 16-bit name in NM (figure 4.6).

16-bit literal operands are dealt with in the same way as 16-bit names, except that the content of DN is copied into L1 instead of NM, and thence into L2, L3 and VU as the order proceeds down the pipeline. 32-bit and 64-bit literal operands start off in the same way as 16-bit literals but require three and five phases respectively. Following the decoding of the literal, the next beat copies a dummy order into F1, and the first 16 bits of the literal into DN. The valid order itself proceeds to F1 on the subsequent beat and is then followed by dummy orders until the whole literal has been copied into the pipeline using registers DN, L1, L2 and L3.

For any size of literal the complete value is copied into VU from some or all of DN, L1, L2 and L3, as appropriate, when the order itself enters F4. For a 16-bit literal, sign extension, if specified, takes place to 32 bits between L3 and VU (a 6-bit literal is extended to 16 bits between DN and L1) and sign extension to 64 bits, again if specified, takes place between VU and HI. Figure 4.9 shows the pattern of the phases for a 32-bit literal superimposed on the pipeline timing diagram.
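The successive sign extensions mentioned above (6 to 16 bits, 16 to 32 bits, 32 to 64 bits) all follow the same two's-complement rule, which can be sketched as a single helper. The function name is hypothetical, and the sketch says nothing about how the hardware performs the operation.

```python
def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Two's-complement sign extension of a `from_bits`-wide field."""
    low_mask = (1 << from_bits) - 1
    value &= low_mask
    if value & (1 << (from_bits - 1)):          # sign digit set
        value |= ((1 << to_bits) - 1) & ~low_mask
    return value


# A negative 16-bit literal widened in the two steps described in the text.
vu = sign_extend(0x8000, 16, 32)   # between L3 and VU
hi = sign_extend(vu, 32, 64)       # between VU and HI
```

Extending in two steps gives the same result as a single extension from 16 to 64 bits, which is why the second step can be left until the order reaches HI.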

This technique for dealing with long literals was adopted because it avoided the need for the IBU-PROP interface to be extended beyond 16 data bits, and also the need to provide a 64-bit data buffer at each pipeline stage. The obvious penalty for this saving in hardware is the number of extra pipeline beats necessary whenever a long literal is used. However, for a parallel system to operate satisfactorily, a higher IBU data rate would also be necessary to ensure the availability of all parcels of a multi-length instruction.

[...] particularly following unpredicted control transfers. In these cases a 'data valid' signal accompanying the function copied is valid, since sufficient additional instruction parcels may not be available to complete the instruction. Additional 'data available' signals are therefore copied into DF, along with the 'data valid' signal, to indicate whether there are two, three, four, or five instruction parcels immediately available. If the required number of parcels is not available for the order involved, a dummy order is propagated forwards and the strobe to the IBU is inhibited. In addition, however, the strobe to the DF function digits is inhibited so that successive pipeline beat pulses only copy in the 'data available' digits until sufficient parcels are available for the order to proceed normally.

[...] partial results are stacked by the use of the '*=' (stack and load) function. They are later unstacked by use of the operand form STACK (section 2.2.1). Stacked operands are therefore contained in the MU5 Processor storage system in exactly the same way as names, their addresses being generated relative to the Stack Front register (SF), which points to the most recently stacked operand within the Name Segment. Thus SF is advanced by both the '*=' function and functions concerned with procedure entry (STACK and STACK LINK), and all these functions require two operand accesses to be made. Hence they are divided into two phases.

For the STACK functions an access is first made for the specified operand followed by an access to the stack, while for the '*=' order the first access is to the stack, in order to store the content of the specified register, and the second is for the operand. For the stack writes the name/base adder is used to create the address SF+2 and at the same time SF is updated to this new value. The two phases of these orders are distinguished by extra digits carried through the pipeline with the function. These digits override the normal operand accessing mechanisms when access to the stack is required and also prevent the incrementing of the Control Register when the first phase passes the Control Point.

For the unstacking operation the access to the stack must use the current value of SF as the address, and SF must then be decremented. Two passes through the name/base adder are therefore required, one to present the address SF and one to create address SF-2. Thus this type of order is also split into two phases, one of which is essentially a dummy order serving simply to decrement SF.
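The asymmetry between stacking and unstacking can be shown with a short sketch. The class name and the dictionary standing in for the store are assumptions made for illustration; the step of 2 is taken from the text, whose unit is not stated in this passage.

```python
class StackFrontSketch:
    """Toy model of SF: it points at the most recently stacked operand."""

    STEP = 2  # SF+2 / SF-2 as given in the text

    def __init__(self, sf: int) -> None:
        self.sf = sf
        self.store = {}  # hypothetical stand-in for the Name Segment

    def stack(self, value) -> None:
        # One adder pass: create SF+2, write there, and update SF.
        self.sf += self.STEP
        self.store[self.sf] = value

    def unstack(self):
        # First pass: present the current SF as the address.
        value = self.store[self.sf]
        # Second pass (the dummy phase): decrement SF.
        self.sf -= self.STEP
        return value
```

Stacking needs only one pass through the adder because the new address and the new SF are the same value, whereas unstacking needs the old SF for the access and a different value afterwards.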

The implementation of this stack mechanism within a pipeline gives rise to additional problems in relation to control transfer orders. An order implicitly changing SF does so while there are still several orders ahead of it, but not past the Control Point, and therefore not yet completed. Any one of these orders could be a control transfer order requiring that the partially processed orders behind it in the pipeline be abandoned. Should this situation occur, the SF Register may contain an incorrect value. The correct SF value could be maintained by preventing overlap in such situations, but this would seriously degrade the pipeline performance.

The alternative solution adopted is to allow the SF register to change as and when required and to carry along the pipeline with the order the new value of SF created by it (registers S3, S4 and S5 in figure 4.6). When the Control Register is updated for the order, the value in S5 is copied into S6. Therefore when a control transfer occurs, the value in S6 is correct and is used to restore SF.
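The recovery scheme can be sketched as follows. This is a behavioural illustration only: the class and method names are hypothetical, and a queue stands in for registers S3, S4 and S5 without modelling their individual timing.

```python
from collections import deque


class SFRecoverySketch:
    """SF changes early; S6 remembers the value for the last completed order."""

    def __init__(self, sf: int) -> None:
        self.sf = sf               # working Stack Front, updated early
        self.s6 = sf               # value valid at the Control Point
        self.in_flight = deque()   # stands in for S3, S4, S5

    def issue(self, sf_change: int = 0) -> None:
        """An order enters the pipeline, possibly changing SF."""
        self.sf += sf_change
        self.in_flight.append(self.sf)   # the order carries its SF value

    def complete_oldest(self) -> None:
        """The Control Register is updated for the oldest order: S5 -> S6."""
        self.s6 = self.in_flight.popleft()

    def control_transfer(self) -> None:
        """Abandon partially processed orders and restore SF from S6."""
        self.in_flight.clear()
        self.sf = self.s6
```

In this sketch, if the oldest order turns out to be a control transfer, the SF changes made by the orders behind it are undone simply by copying S6 back, with no need to prevent overlap.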

4.2.4 Register-Register Orders

Since all the central registers in MU5 serve dedicated purposes, the need for register-register transfers occurs far less frequently than in machines such as the PDP-11 and System/360. Indeed, transfers between these dedicated registers are not really compatible with a pipeline organisation. However, it is sometimes convenient to use orders such as

'X + B' and 'B => DO'

where B (the Modifier Register) and DO (the Origin Field of the Descriptor register) are specified as Internal Registers (section 2.2.3). A general scheme for organising these transfers was therefore implemented. It involves splitting the order into two phases, one to obtain the operand from the source register, and one to carry out the required operation on the destination register. Since the Control Register can only be incremented once the second phase is complete, it is convenient to split the order before the Control Point, and to use the hold-up and WAIT mechanisms to control the necessary actions. In retrospect it is doubtful whether the engineering complications required to implement these orders are justified by the advantage gained in the software, and a different solution to the overall problem would be sought in a re-designed system.

The first action in the pipeline for a register-register order is the setting of a hold-up digit in the Stage 3 function register, F3 (figure 4.6). When the next beat pulse occurs a control digit is set in F4, the output of which is fed back to F3 in order to act as a counter and to release the hold-up. Thus when the first phase of the order reaches Stage 5, the second phase enters Stage 4 and a new order enters Stage 3. The first phase of the order sets a WAIT condition (section 4.2.1) and no further action takes place in PROP until the appropriate operand is returned via the Central Highway to register HO. This operand is then copied into register VU to line up with the second phase of the order in F4. A beat is then generated, without the Control Register being incremented, to put the second phase of the order into Stage 5 where it behaves as a simple order.

Slightly different actions are required in each of the two phases for the two examples given above. In the first case, the first phase of the order is sent to the B-unit accompanied by a control digit which indicates that the '+' function should be ignored and the value in the B-Register simply routed on to the Central Highway. The second phase of the order proceeds normally to the A-unit with the operand being treated as a literal in SEOP. In the second case, the first phase of the order proceeds to the B-unit, where it is treated as a normal store order. The second phase is sent to the D-unit, again accompanied by a control digit which indicates that the '=>' function should be ignored and the operand simply loaded into the least significant half of the Descriptor Register.

4.2.5 Store Orders

An order of the type 'B => name' does not reach the B-unit until some time after the access has been made to the Name Store, so that the operand is not immediately available. Thus, in the absence of any additional technique, a hold-up equivalent to at least four pipeline stages would be needed to await the return of the operand from the B-unit. In the case of store orders involving registers within PROP (NB =>, etc.) or SEOP (DR =>, etc.) this delay is less important since these orders occur infrequently, and 'ACC => name' orders are dealt with separately by the Secondary Pipeline (section 5.2.6). For the 'B => name' order, however, special action is taken in order to avoid the hold-up.

When a 'B => name' order enters Stage 4 of the PROP pipeline, the content of the PROP Line Register (PLR in figure 4.6) is preserved, for later use, in an additional register, BW, and the order proceeds normally through the pipeline without impeding those following, except as described below.

When the function is executed by the B-unit, the Central Highway copies the result into register HO, and sets a WAIT condition (section 4.2.1), which stops the pipeline before the next beat is generated. The information held in register BW is then used to select the appropriate line in the Value Field of the Name Store and the content of register HO is written into it. The additional information needed to select one half of the line for over-writing is held in the F2 Function Register, together with a 'B => outstanding' digit which indicates that the BW Register is in use. When the action of writing into the store has been completed, the 'B => outstanding' digit is re-set and the pipeline is re-started.

While the 'B => outstanding' digit is set, two pipeline hold-ups can occur, one at Stage 2 and one at Stage 4. The hold-up at Stage 2 occurs if a second 'B => name' instruction enters that stage. This hold-up prevents subsequent beats from propagating back beyond the input registers to Stage 3 (F3, etc.) and causes dummy orders to be copied into Stage 3. The hold-up at Stage 4 occurs if any instruction tries to access the same line in the Name Store as that indicated by BW. This hold-up prevents subsequent beats from propagating back beyond the input registers to Stage 5 (F5, etc.) and causes dummy orders to be copied into Stage 5. Both these hold-ups are automatically released when the 'B => outstanding' digit is re-set, or if the contents of the pipeline are discarded by the action of a control transfer before the 'B =>' order has left the end of PROP. If a control transfer occurs after a 'B =>' order has left the end of PROP, then the store updating action must still be carried out since the Control Register will have been incremented for this order.
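The two hold-up conditions can be summarised as a small predicate. This is a sketch of the stated conditions only: the function and parameter names are hypothetical, and the release of the hold-ups by a control transfer is not modelled.

```python
from typing import Optional, Set


def b_store_holdups(outstanding: bool,
                    bw_line: Optional[int],
                    stage2_is_b_store: bool,
                    stage4_line: Optional[int]) -> Set[int]:
    """Return the stages at which a hold-up arises (a subset of {2, 4})."""
    held: Set[int] = set()
    if not outstanding:
        return held                      # BW not in use: no hold-ups
    if stage2_is_b_store:
        held.add(2)                      # a second 'B => name' order
    if stage4_line is not None and stage4_line == bw_line:
        held.add(4)                      # access to the line awaiting update
    return held
```

Orders that neither store from B nor touch the line recorded in BW return an empty set, matching the statement that the 'B => name' order does not normally impede those following it.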

4.2.6 Lock-outs

Two types of situation occur in which an order reaching the end of the PROP pipeline cannot be allowed to proceed until an action arising from a previously issued instruction has been completed. In such cases the earlier order will have set a lock-out digit as it left PROP, and the order which must be held up is copied into F5, but is not issued, until the lock-out digit has been re-set.

The first type of situation occurs when a B-function requires a secondary operand access. The order is sent from PROP to Dr and thence via OBS and Dop to the B-unit. Once an order has been accepted by Dr, PROP would normally be free to send orders using primary operands direct to the B-unit, but these would arrive ahead of the order proceeding through the SEOP. The B lock-out digit prevents this.

The second type of situation occurs when a comparison order is sent to the B-unit or A-unit (COMP or CINC, for example) or to the D-unit (SCMP). The final action of any of these orders is the setting of the Test Digits in the Machine Status Register in PROP, according to the result of the comparison (zero, negative or overflow). Subsequent orders which copy the Machine Status Register into store (STACK LINK, for example), must therefore be held up until the result is received. In addition, further comparison orders must also be held up. This is not a normal programming situation, but a faulty program could issue two comparison orders in succession before a conditional control transfer, and the control transfer following the second comparison order would proceed on the result of the first, leaving all subsequent pairs of

