
Processor Pipeline Overview .................................... 11
2.1. Pipeline Fundamentals ..................................... 11
     F0/F1 (Fetch) ............................................. 11
     D0 (Grouping) ............................................. 11
     D1 (Resource Allocation) .................................. 12
     D2 (Read Operands) ........................................ 12
     E0 (Execute first stage) .................................. 12
     E1 (Execute second stage) ................................. 12
     WB (Write Back Results) ................................... 13
2.2. Basic Pipeline Diagram .................................... 13
2.3. Pipeline Examples ......................................... 14
     Memory References ......................................... 15
          LD (Load operation) .................................. 15
          ST (Store operation) ................................. 17
     Floating Point Pipeline ................................... 18
          Floating Point Instructions .......................... 20
          Floating Point Queue ................................. 20
          Floating Point Execution Times ....................... 20
     Conditional Branch Pipeline ............................... 22
          Untaken Branch ....................................... 23
          Taken Branch ......................................... 24
          Branch Couple ........................................ 25
          JMPL ................................................. 26
     Procedure Call and Return ................................. 26
          CALL ................................................. 26
          CWP Pipeline ......................................... 26
          SAVE ................................................. 26
          RESTORE .............................................. 27
     Exceptions ................................................ 27
          Exception Pipeline ................................... 27
          Interrupts ........................................... 28
          RETT (Return From Trap) Pipeline ..................... 29


Processor Pipeline Overview

The Viking microprocessor pipeline is presented in this chapter. This information is used throughout the document to describe Viking's operation. The next chapter, Code Generation, suggests code generation strategies to attain maximum performance from Viking's SuperScalar pipeline.

2.1. Pipeline Fundamentals

Viking's pipeline comprises eight stages, which execute in four clock cycles. Each stage has a unique function, which varies depending on the group of instructions being executed. In general, they follow the standard Fetch, Decode, Execute, Write Back model. The Viking pipeline stages are F0, F1, D0, D1, D2, E0, E1, and WB; each stage is defined in detail below.

2.1.1. F0/F1 (Fetch)

All instructions must be fetched before they are executed. However, not all instructions are fetched in the cycle immediately preceding their execution. They may be prefetched and placed in the instruction queue. The Fetch stages (F0/F1) of the pipeline manage the instruction queue, including fetching and prefetching required instructions from memory. Not all instructions fetched are executed. Some may be discarded if a control transfer instruction (branch) changes the flow of execution. Up to 128 bits (four instructions) may be read from the instruction cache in every cycle. These instructions are entered into the instruction queue, and can be removed at a maximum rate of three instructions per cycle.

2.1.2. D0 (Grouping)

The D0 stage selects from 0 to 3 instructions from the instruction queue to form an in-order instruction group. This selection depends on the set of instruction candidates that are available at the head of the instruction queue prefetch buffer, as well as the current state of the processor pipeline. The grouping rules used to form this selection are described in Section 3.4. These instructions must be taken in order from the queue; Viking does not execute instructions out of order.

Once a group of instructions is selected, D0 identifies the first memory reference instruction in the group, and latches the corresponding register index. D0 forms extension words based on the immediate values for memory reference and control transfer instructions' displacements. D0 identifies cascade conditions (integer instruction data dependencies within and between instruction groups), and inserts pipeline bubbles when necessary. A bubble, or pipeline stall of dead cycles, is inserted if there is a pipeline hazard, or if required data is not available.
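As an illustrative sketch (the opcodes and registers are arbitrary, not taken from this manual), the first two instructions below are mutually independent and are candidates to share a group, while the third consumes the result of the second; D0 would flag that dependency as a cascade condition:

    add   %o0, %o1, %o2      ! independent of the instruction below
    sub   %o3, %o4, %o5      ! independent; these two may share a group
    xor   %o5, 0xff, %g1     ! uses %o5 from the previous instruction: a cascade condition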

2.1.3. D1 (Resource Allocation)

D1 assigns available resources within the integer unit to individual instructions in the group selected during D0. All cases of data forwarding (or bypass) are resolved in this stage. All operand register indexes are selected and assigned to individual register file ports. These resources stay constant throughout the execution of the instructions.

The two address registers selected during D0 are read via two dedicated register file ports during D1. This data is used in D2 to compute a LD/ST virtual address. The data for these may also be forwarded from currently executing instruction groups.

Branch target addresses are generated in D1, taken from the extension words selected in D0 and the Program Counter (PC) value of the branch instruction within the group. Next PC values are also generated.
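For example (an illustrative fragment; the registers and label are arbitrary), for a conditional branch such as the one below, D1 forms the target address from the branch's own PC and the sign-extended, word-aligned displacement held in the extension word built by D0:

    cmp   %o0, 0             ! set the integer condition codes
    be    done               ! target = branch PC + (sign-extended disp22 << 2), formed in D1
    nop                      ! delay slot
done: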

2.1.4. D2 (Read Operands)

Stage D2's primary function is to read the operand registers selected in the preceding D1 stage. In addition, the address operands read during D1 will be combined in the virtual address adder. The result is a 32-bit virtual address which will be used to reference the MMU and data cache in subsequent stages. During D2, any bypass paths required for execution will be set up to transfer data in cycles that follow.
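For instance (register and offset chosen arbitrarily), for the load below the contents of %i0 and the extension word built from the 0x20 immediate are combined in the D2 virtual address adder:

    ld    [%i0 + 0x20], %o1  ! virtual address = %i0 + 0x20, formed during D2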

2.1.5. E0 (Execute first stage)

Viking has two execution stages. E0 is the primary execution stage. Most arithmetic operations complete in E0. During E0, the data operands read from the register file during D2 are passed through one of two ALUs, or the shifter. Up to two integer results can be generated in E0. Only one may be generated by the shifter. These results are then presented as input to the E1 cascaded ALU, and sent into many forwarding paths.
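A sketch of a group that completes entirely in E0 (registers arbitrary): the two instructions below are independent, one using an ALU and one using the shifter, so both results are produced in E0:

    add   %o0, %o1, %o2      ! executes in one of the two E0 ALUs
    sll   %o3, 2, %o4        ! executes in the shifter; at most one shift result per group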

For memory references, the virtual address generated in D2 is used in E0 to begin the data cache and MMU access. Floating point operations are dispatched to the FPU during E0.

2.1.6. E1 (Execute second stage)

The second stage of execution can generate at most one additional integer ALU result. This result is generated in the cascaded ALU. The computed results from the E0 ALU or shifter are used as inputs to this ALU. All execution results from the current instruction group are available by the end of the E1 stage. This includes data returned from the data cache.
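As a sketch (registers arbitrary), a dependent pair such as the one below may still be issued as a single group, subject to the grouping rules of Section 3.4: the first instruction executes in E0, and its result feeds the cascaded ALU, which produces the second result in E1:

    add   %i0, %i1, %l0      ! executes in E0
    sub   %l0, %i2, %l1      ! uses the E0 result; executes in the cascaded ALU during E1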


Results generated in E1 are delayed a cycle before they can be used as address operands. Address dependencies for a load from memory result in one cycle of pipeline bubble. Condition codes generated in E1 are delayed a cycle before they can be used in resolving conditional branches.
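For example (registers arbitrary; the two loads are assumed to fall in consecutive instruction groups), the second load's address operand below is the data returned by the first load at the end of E1, so one cycle of pipeline bubble is inserted ahead of the second load:

    ld    [%i0], %o3         ! load data is available at the end of E1
    ld    [%o3 + 4], %o4     ! %o3 is needed as an address operand: one bubble is inserted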

2.1.7. WB (Write Back Results)

When stage E1 has completed, all results are guaranteed to be available. The primary action in the WB pipeline stage is to 'Write Back' these results into the register files, based on write enable signals generated earlier in the pipeline and potentially modified due to exceptions. The WB stage executes at the same time as the E0 stage of the next instruction group. Forwarding paths are used to transmit data between successive groups. The integer unit and FPU update the register files during WB, and normally the data cache updates its contents when a ST instruction has appeared in E0-E1.

Viking can operate in either CC mode or MBUS mode. The choice of mode has a major impact on the behavior of ST instructions. In CC mode, Viking assumes the existence of an external cache. In this mode, the Viking data cache behaves as a 'Write Through' cache, which means that all ST instructions that modify the internal cache also write their data through to the external cache.

In MBUS mode, the Viking data cache operates as a 'Copy Back' cache, which means that ST data is not written out to the external system until the line containing the data is replaced in the cache, or a snoop on the bus forces a copy back. Also, in MBUS mode the cache implements a 'Write Allocate' policy, which means that if a ST misses in the cache, the line containing that data is brought in from memory, and then the ST is performed locally (i.e. memory does not get updated, consistent with the 'Copy Back' strategy). MBUS mode does not assume the presence of an external cache. As in CC mode, there are conditions that will force a ST in MBUS mode to be treated as a synchronous ST.

2.2. Basic Pipeline Diagram

Throughout the document, pipeline diagrams are used to represent the flow of instructions and data through the processor. The most basic pipeline diagram is shown in the figure below. This diagram is generic, and is intended to show the relation of groups in the pipeline.


Figure 2-1  Basic Pipeline Description

    CLOCK
    Instruction Group One     F0 F1 D0 D1 D2 E0 E1 WB
    Instruction Group Two           F0 F1 D0 D1 D2 E0 E1 WB
    Instruction Group Three               F0 F1 D0 D1 D2 E0 E1 WB
    Instruction Group Four                      F0 F1 D0 D1 D2 E0 E1 WB

All pipeline stages are identified. The bold vertical lines indicate major (rising) clock edges. The shaded vertical lines are minor (falling) clock edges. In general, the contents of the instruction group will be indicated in the left-side heading. Significant operations and interactions are included in the boxes for individual stages.

2.3. Pipeline Examples

Viking's pipeline is straightforward for simple instruction sequences (e.g., integer arithmetic). The complexity rises quickly for memory reference and control transfer instructions. This section describes these cases in detail. Standard LD and ST sequences are presented first, followed by floating point operations (FPops). Then SAVE, RESTORE, and all forms of control transfers are described. The final section describes how the pipeline deals with exceptions.

This section describes only the simple cases of these sequences. In particular, pipeline stalls caused by a variety of sources are not considered here. In general, pipeline bubbles or idle cycles may be injected into the pipeline at any point for a variety of reasons.


2.3.1. Memory References

LOADs and STOREs are very frequent operations in SPARC code. In a typical program, as many as 30% of the instructions are loads or stores. Since Viking executes up to 3 instructions per cycle, it may be required to execute a memory reference nearly every cycle. This presents significant challenges to the processor design.

2.3.1.1. LD (Load operation)

To maximize performance, Viking has removed restrictions that have existed in prior RISC designs. In particular, many sources of interlocks on load instructions have been removed. This allows Viking to execute a LD instruction immediately followed by a dependent ALU instruction in the next instruction group (an ALUOP with a register dependency on the LD).
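For example (registers arbitrary; the grouping logic is assumed to place the two instructions in consecutive groups), the following pair executes without an interlock when the load hits in the data cache, even though the add in the next group consumes the load data:

    ld    [%o0 + 4], %o1     ! memory reference in one instruction group
    add   %o1, %o2, %o3      ! dependent ALUOP in the next group; no interlock on a cache hit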

All LD and ST instructions that hit in the internal data cache execute in a single cycle. This includes all byte, half-word, word, and double-word references. Up to two other instructions may be included in the instruction group with the memory reference. Stores are generally buffered. In CC mode, they take a single cycle to execute whether or not they hit in the cache.
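As an illustrative sketch (registers arbitrary), a single group might then contain one memory reference and two independent ALU operations:

    ld    [%i0 + 8], %o0     ! the group's memory reference
    add   %i1, %i2, %o1      ! independent ALU operation in the same group
    sub   %i3, 1, %o2        ! independent ALU operation in the same group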

The diagram below (load after ALUOP example) shows a LD instruction executing, surrounded by arithmetic operations. For simplicity, the sequence uses single-instruction groups, forced by the dependencies in the code. The code sequence being executed demonstrates the use of many data forwarding paths, for reference. The shift could have been grouped with the preceding add instruction and the two would be grouped together; this would have resulted in a pipeline bubble between the second add and the load, as shown in the example below, and the total execution time would have been identical. The sequence is:

    add   %l0, %l1, %l2
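For reference, a sequence of the general shape described with Figure 2-2 would look like the sketch below; the shift amount and the registers other than %l0, %l1, and %l2 are illustrative assumptions:

    add   %l0, %l1, %l2      ! add result is forwarded to the shifter
    sll   %l2, 2, %l2        ! shifter result is forwarded to the virtual address adder (amount assumed)
    ld    [%l2 + 0x10], %l3  ! load using the 0x10 offset extension word
    add   %l3, %l4, %l5      ! the last add consumes the load data (registers assumed)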

The execution of this code sequence through the pipeline is shown below:


Figure 2-2 Basic load pipeline sequence


The add and shift instructions execute, passing data through forwarding paths from the add result into the shifter, then from the shifter result into the virtual address adder for the load. The 0x10 offset is extended into a 32-bit value in the D1 stage. The offset extension word and the forwarded version of register %l2 are added, and the result is passed to both the data cache and the MMU, which are accessed in parallel. When a hit is identified in the TLB, the physical page number is extracted and passed on to the data cache. In the meantime, the cache has completed reading all data and tags for the four possible sources of the memory location (4-way set associative cache). The tags are compared with the physical page number from the MMU. When the proper set is identified, the data is routed back and forwarded into the next E0 execute stage for the last add instruction; it is also written into the register file.

Had a cache or MMU miss occurred, pipeline bubbles would have been inserted. The E0 and E1 stages of the load would be repeated until all the misses had been resolved.