[Figure: Clock Switching Circuit, showing the CCLK, DCLK, Enable, and MCLK signals]
Clock switching is only used during fills. Stores which miss in the cache and castouts are written to memory through the write buffer without switching the internal clock over to MCLK. The write buffer receives both DCLK and MCLK and passes the data for external stores across the DCLK/MCLK interface with proper attention to synchronization issues between the two clock regimes. One interesting characteristic of clock switching is that it gives the system designer another option to save power in situations for which the full performance of the chip is not required. By disabling clock switching on the fly, you can configure the chip to run off the bus clock. There is no limit on asymmetry or maximum pulse width of the bus clock, so the chip can be operated at very low frequencies if desired.
Conditional Clock Buffers
Conditional clock buffers are simple NAND/invert structures with an integral latch on the condition input. The buffers must be matched to their load to minimize skew. Since adding dummy clock loads is contrary to the low-power design philosophy, we created scaled clock buffers which would produce matched clocks for a wide range of loads and only needed to add dummy clock loads for a small number of very lightly loaded clock nodes. The task of matching the clock buffers to the load was greatly simplified by the fact that the clock load presented by our standard latches is largely data-independent.
While the use of conditional clock buffers is central to the design method used on the chip, it should be noted that the critical paths to generate the condition input to these buffers represent some of the most difficult design problems in the chip. In this case, we decided that the power saving associated with the conditional clocking was worth the additional design effort and possible performance reduction.
Digital Technical Journal Vol. 9 No. 1 1997
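The behavior of a conditional clock buffer can be illustrated with a small behavioral model. This is only a sketch: the class name, the waveforms, and in particular the latch phase (transparent while the clock is low, holding while it is high) are assumptions made for illustration and are not taken from the circuit described above. The NAND followed by an inverter is modeled as a logical AND of the clock and the latched condition.

```python
class ConditionalClockBuffer:
    """Behavioral model: gated_clk = clk AND latched_condition."""

    def __init__(self):
        self.latched = False  # value held by the integral latch

    def step(self, clk: bool, condition: bool) -> bool:
        # Assumption: the latch is transparent while clk is low and holds
        # while clk is high, so a late change on the condition input cannot
        # truncate or create a clock pulse in the middle of the high phase.
        if not clk:
            self.latched = condition
        # NAND followed by an inverter is a plain AND.
        return clk and self.latched


def count_rising_edges(samples):
    """Count low-to-high transitions in a sampled waveform."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if cur and not prev)


buf = ConditionalClockBuffer()
# Hypothetical sampled waveforms, one entry per half cycle.
clk_wave = [False, True, False, True, False, True, False, True]
cond_wave = [True, False, False, True, False, False, True, True]
gated = [buf.step(c, e) for c, e in zip(clk_wave, cond_wave)]

print("free-running edges:", count_rising_edges(clk_wave))
print("gated edges:       ", count_rising_edges(gated))
```

In this toy waveform the gated clock only pulses in cycles whose condition was true before the rising edge, which is the mechanism by which downstream latches avoid being clocked when they do not need to be.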
Latch Circuits
The standard latches used in the design are differential edge-triggered latches (Figure 11). The circuit structure is a precharged differential sense amp followed by a pair of cross-coupled NAND gates. The sense amp need not be particularly well balanced because the inputs to the latch are full CMOS levels. The NMOS shorting device between nodes L3 and L4 provides a dc path to ground for leakage currents on nodes L1 and L2 in case the inputs to the latch switch after the latch evaluates. At normal operating frequencies, this device is not particularly important but it is required for the latch to be static. Note that since the dc current flowing is due only to device leakage, the magnitude of the current is insignificant to the power in normal operation.
Testability
The chip supports IEEE 1149.1 boundary scan for continuity testing. In addition, it has two hardware features to aid in manufacturing testing. The first is a bypass to allow CCLK to be driven from a pin synchronous to MCLK. This allows the tester to control the timing between CCLK and MCLK to make the asynchronous sections appear to be deterministic. The second test feature provides a linear feedback shift register (LFSR) that can be loaded with instruction data from the Icache. Loading the LFSR can be conditioned based on the value of address bit 2 and the Icache hit signal. The LFSR is loaded after the Fetch stage to allow the instruction following a branch to be read from the Icache and loaded into the LFSR. This feature allows any random pattern to be loaded into the Icache and then read out by alternating branch instructions with data pattern words.
Figure 11 Latch Circuit (outputs OUT L and OUT H)
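The LFSR-based test feature described under Testability can be sketched in software as a signature register that folds in instruction words only when a load condition holds. The register width, the feedback polynomial (the CRC-32 polynomial is used here purely as a stand-in), the function names, and the sample fetch stream are all assumptions for illustration; the text does not specify the actual LFSR.

```python
MASK32 = 0xFFFFFFFF
TAPS = 0x04C11DB7  # hypothetical feedback polynomial, not taken from the paper


def lfsr_load(state: int, word: int) -> int:
    """Advance a 32-bit LFSR one step and fold in one instruction word."""
    feedback = TAPS if state & 0x80000000 else 0
    return (((state << 1) & MASK32) ^ feedback ^ word) & MASK32


def signature(fetches, require_bit2=None, require_hit=True):
    """Accumulate a signature over (address, word, icache_hit) tuples.

    require_bit2: None ignores address bit 2; 0 or 1 loads only when
    address bit 2 has that value.
    require_hit: when True, load only on an Icache hit.
    """
    state = 0
    for address, word, hit in fetches:
        if require_hit and not hit:
            continue
        if require_bit2 is not None and ((address >> 2) & 1) != require_bit2:
            continue
        state = lfsr_load(state, word)
    return state


# Hypothetical stream: branch words at even word addresses alternate with
# data pattern words at odd word addresses (address bit 2 set).
BRANCH = 0xEA000000
fetches = [
    (0x0, BRANCH, True),
    (0x4, 0xDEADBEEF, True),
    (0x8, BRANCH, True),
    (0xC, 0x12345678, True),
]
data_only = signature(fetches, require_bit2=1)
print(hex(data_only))
```

Conditioning the load on address bit 2 means only the data pattern words contribute to the signature, so the tester can compare one final value against the expected signature for the pattern it wrote.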
Power Dissipation Results
Measured Results
Power dissipation data was collected on an evaluation board running Dhrystone 2.1 with the bus clock running at one-third of the PLL clock frequency. Dhrystone fits entirely in the internal caches so, after the first pass through the loop, pin activity is limited. This is the highest power case because cache misses cause the internal clocks to run at the bus speed and result in a lower total power. For both sets of measurements, external Vdd is fixed at 3.3 V. For an internal Vdd of 1.5 V, the total power is 2.1 mW/MHz. If the internal supply is set to 2.0 V, the total power is 3.3 mW/MHz. Note that the ratio of the power at 1.5 and 2.0 V does not track Vdd² because it contains a component of external power and the external Vdd is fixed.
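The remark about the power ratio can be checked with a few lines of arithmetic. The sketch below uses only the two measured figures quoted above; the variable names are illustrative.

```python
# Measured total power in mW/MHz, keyed by internal Vdd in volts.
power_per_mhz = {1.5: 2.1, 2.0: 3.3}

measured_ratio = power_per_mhz[2.0] / power_per_mhz[1.5]
vdd_squared_ratio = (2.0 / 1.5) ** 2  # what pure Vdd^2 scaling would predict

print(f"measured ratio: {measured_ratio:.2f}")
print(f"Vdd^2 ratio:    {vdd_squared_ratio:.2f}")
```

The measured ratio comes out below the Vdd² ratio, which is consistent with part of the total being external power at a fixed 3.3 V that does not scale with the internal supply.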
Simulated Power Dissipation by Section
An analysis of node transitions based on simulation was performed to estimate the power dissipation associated with the various major sections of the chip (Table 3). Toggle information was collected based on 160,000 cycles of Dhrystone and combined with extracted node capacitances to estimate power dissipation by node and this data was further grouped by section. The clock power listed in Table 3 is due only to the global clock circuits.

Table 3
Simulated Power Dissipation by Section

Section               Share of Power
ICACHE                27%
IBOX                  18%
DCACHE                16%
CLOCK                 10%
IMMU                   9%
EBOX                   8%
DMMU                   8%
Write buffer           2%
Bus interface unit     2%
PLL                   <1%

A few points are worth noting.
• First, the power is dominated by the caches as you might expect given their size. This is despite our efforts to reduce their power through bank selection and other means. The Icache burns more power than the Dcache because it runs every cycle.
• Next, the PLL power is insignificant in normal operation. As was noted earlier, its low power characteristics are only important in Idle.
• Finally, since reduction in clock power was one of our explicit goals, it is interesting to consider the total clock power. If you extract the local clock power from the nonclock sections and sum it, you get a total clock power, including the global clock trees, the local clock buffers and the local clock loads. This power is 25% of the total chip power, significantly less than the 65% consumed by the clocks in the Alpha microprocessor used in our initial feasibility studies.
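The percentages in Table 3 and the clock figures in the last point can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted in the text; the dictionary layout and variable names are illustrative, and the PLL entry is left out of the sum because it is listed only as less than 1%.

```python
# Table 3 entries in percent of total chip power (PLL, listed as <1%, omitted).
table3 = {
    "ICACHE": 27,
    "IBOX": 18,
    "DCACHE": 16,
    "CLOCK": 10,  # global clock circuits only
    "IMMU": 9,
    "EBOX": 8,
    "DMMU": 8,
    "Write buffer": 2,
    "Bus interface unit": 2,
}

listed_total = sum(table3.values())
cache_share = table3["ICACHE"] + table3["DCACHE"]

total_clock_share = 25  # global trees + local buffers + local loads
local_clock_share = total_clock_share - table3["CLOCK"]

print("listed total (%):      ", listed_total)
print("Icache + Dcache (%):   ", cache_share)
print("local clock power (%): ", local_clock_share)
```

Under these figures the local clock buffers and loads, which Table 3 counts inside the nonclock sections, account for the difference between the 25% total and the 10% global clock entry.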
Conditional clocking was an integral part of the design method, so it is difficult to determine the power saving associated with it. However, the power associated with driving the conditional clocks is 15% of the chip power and if the conditions on all the conditional clock buffers were always true, this power would quadruple. This does not account for the additional power savings that have been achieved by blocking spurious data transitions.
CAD Tools
The CAD tools used on this chip were largely the same as those used on our Alpha designs.⁵ This is not surprising since the performance target of the chip roughly parallels that of the Alpha family as noted in the section Circuit Implementation. The most significant departure was in the area of static timing verification and race analysis where the adoption of edge-triggered latching required significant modifications to the tools used in the Alpha designs.
Project Organization
One of the challenging aspects of this project was geographical. The detailed design was performed at four sites across a nine-hour time zone range. The initial feasibility work and architectural definition was done at Digital Semiconductor's design center in Austin with on-site participation by personnel from Advanced RISC Machines Limited (ARM). The implementation was more widely distributed, with the caches, MMUs, write buffer, and bus interface unit at Digital Semiconductor's design center in Palo Alto, the instruction unit, execution unit, and clocks in Austin, the pad driver and ESD protection circuits at Digital Semiconductor's main facility in Hudson, MA, and the PLL at the CSEM design center in Neuchatel, Switzerland. In addition, we consulted with Hudson for CAD and process issues, with ARM in Cambridge, England, for all manner of architectural issues and implementation trade-offs associated with ARM designs and with T. Lee from Stanford on the PLL. The implementation phase of the project took less than nine months with about 20 design engineers.
Conclusion
The microprocessor described uses traditional high-performance custom circuit design, an intentionally simple architectural design, and advanced CMOS process technology to produce a 160 MHz microprocessor which dissipates less than 450 mW. The internal supplies can vary from 1.5 to 2.2 V while the pin interface runs at 3.3 V. The chip implements the ARM V4 instruction set and delivers 185 Dhrystone 2.1 MIPS at 160 MHz. The chip contains 2.5 million transistors and is fabricated in a 0.35-µm three-metal CMOS process. It measures 7.8 mm × 6.4 mm and is packaged in a 144-pin plastic thin quad flat pack (TQFP) package.
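The headline figures above imply a few derived quantities that are easy to compute. The sketch below uses only the numbers stated in the conclusion; because the power is given as an upper bound (less than 450 mW), the efficiency result is a lower bound. The variable names are illustrative.

```python
dhrystone_mips = 185.0   # Dhrystone 2.1 MIPS at 160 MHz
clock_mhz = 160.0
max_power_w = 0.450      # stated upper bound on dissipation
die_width_mm, die_height_mm = 7.8, 6.4

mips_per_watt_floor = dhrystone_mips / max_power_w  # at least this many MIPS/W
mips_per_mhz = dhrystone_mips / clock_mhz
die_area_mm2 = die_width_mm * die_height_mm

print(f"MIPS per watt (lower bound): {mips_per_watt_floor:.0f}")
print(f"MIPS per MHz:                {mips_per_mhz:.2f}")
print(f"die area (mm^2):             {die_area_mm2:.2f}")
```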
Acknowledgments
The authors would like to acknowledge the contributions of the following people: F. Aires, M. Bazor, G. Cheney, K. Chui, M. Culbert, T. Daum, K. Fielding, J. Gee, J. Grodstein, L. Hall, J. Hancock, H. Horovitz, C. Houghton, L. Howarth, D. Jaggar, G. Joe, R. Kaye, J. Kapp, I. Kim, Y. Lou, S. Lum, D. Noorlag, L. O'Donnell, K. Patton, J. Reinschmidt, S. Roberts, A. Silveria, P. Skerry, D. Souyadalay, E. Supnet, L. Tran, D. Zoehrer, and the PLL design team at CSEM. The support which they received on many aspects of the design from the people at Advanced RISC Machines, Ltd. was very important and keenly appreciated.
References
1. ARM Architecture Reference (Cambridge, England: Advanced RISC Machines, Ltd., 1995).
2. P. Gronowski et al., "A 433 MHz 64b Quad-Issue RISC Microprocessor," ISSCC Digest of Technical Papers (February 1996): 222-223.
3. D. Dobberpuhl et al., "A 200 MHz 64b Dual-Issue CMOS Microprocessor," IEEE Journal of Solid-State Circuits, vol. 27, no. 11 (1992).
4. V. von Kaenel et al., "A 320 MHz, 1.5 mW CMOS PLL for Microprocessor Clock Generation," ISSCC Digest of Technical Papers (February 1996): 132-133.
5. T. Fox, "The Design of High-Performance Microprocessors at Digital," 31st ACM/IEEE Design Automation Conference, San Diego, Calif. (June 1994): 586-591.
Biographies
James Montanaro
James Montanaro received the B.S.E.E. and M.S.E.E. degrees from the Massachusetts Institute of Technology, Cambridge, MA, in 1980. He joined Digital Equipment Corporation in 1982 and worked as a circuit designer on several RISC microprocessor chips including the first two Alpha designs. In 1992, he joined Apple Computer as a circuit designer on the PowerPC 603 chip. In 1993, he returned to Digital, working in the Austin Research and Design Center on the design of the first StrongARM microprocessor chip.
Richard T. Witek
Rich Witek received a B.S. in computer science from Aurora College, Aurora, IL, in 1976. He is the lead architect on the StrongARM microprocessors at Digital's Austin design center. He was co-architect of the Digital Alpha architecture and lead architect on the first Alpha microprocessor. Rich was one of the lead designers on the MicroVAX II microprocessor, the first single chip VAX. At Digital, Rich also worked on Phase 2 and Phase 3 DECnet architecture and implementation along with other PDP-11 and VAX software projects. Rich was part of the Apple PowerPC architecture team at Somerset in Austin. His current professional interests include processor architecture and implementations. Rich has numerous patents and technical publications on microprocessors and caches.
Krishna Anne
Krishna Anne received the B.E. degree in electronics engineering in 1991 from Andhra University, Vizag, India, and the M.S.E.E. degree from the University of Texas at Arlington in 1993. After a brief stay at Tensleep Design, Inc., Austin, TX, in 1994, he joined Austin Research and Design Center of Digital Equipment Corporation as a design engineer responsible for the full-custom design and development of high-performance low-power processors. He worked on the design and implementation of the multiplier on the StrongARM project and is currently working on another low-power chip.
Andrew J. Black
Andy Black received a B.S.E.E. from Pennsylvania State University and an M.S.E.E. from the University of Southern California. He joined Digital in 1992 after working for International Solar Electric Technology. He was a senior hardware engineer in Digital's Palo Alto Design Center, where he led the bus interface unit design for the StrongARM SA-110 microprocessor chip. During his work on the Alpha 21164 CPU, he was a member of the design team for the memory management unit and contributed to the chip's clock design. He is currently with Silicon Graphics Inc. as a member of the technical staff in the MIPS Technology Division where he is working on high-performance consumer-oriented products. Andy is a member of I.E.E.E., Tau Beta Pi, and Eta Kappa Nu.
Elizabeth M. Cooper
Elizabeth Cooper received the B.S. degrees (summa cum laude) in electrical engineering and computer science from Washington University in St. Louis in 1985. She received [...] Technical Director of the Low Power Microprocessor Group with Digital's Palo Alto Design Center. [...] He is the senior engineer at Digital Equipment Corporation's Austin Research and Design Center in Austin, TX, working most recently on the microarchitecture of the SA-110 StrongARM microprocessor. Before his employment with Digital, he was with the Somerset Design Center in Austin, working on the microarchitecture and design of the PowerPC 603 microprocessor. Previous to this, Jim was involved in ASIC design support and tool development at Compaq Computer Corporation. His research interests include low-power microprocessor design and the propagation of acoustic waves in various materials, enhanced by interaction with selected organic compounds.
Gregory W. Hoeppner
Gregory Hoeppner graduated with distinction from Purdue University, West Lafayette, IN, in 1979. In 1980 he worked at General Telephone and Electronics Research Laboratory, Waltham, MA, performing basic properties research on GaAs. From 1981 to 1992 he held a number of positions [...] to work on the implementation of the Instruction Memory Management Unit for the SA-110, the first StrongARM [...] Paper" award at the International Solid-State Circuits Conference. [...] and StrongARM microprocessors. He is currently working for C-Cube Microsystems, Milpitas, CA. He holds one [...]
Mark H. Pearce
Mark Pearce was born in Geneva, Switzerland, on June 12, 1969. He received the B.S.E.E. degree from University of Pennsylvania, Philadelphia, in 1992, and the M.S.E.E. degree from Stanford University, Stanford, CA, in 1994. In 1994 he joined Digital Equipment Corporation, at their Palo Alto Design Center, working initially on a low-power Alpha processor prototype. He designed the write buffer on SA-110, the StrongARM processor. He is currently working on another high-performance, low-power processor.
Sribalan Santhanam
Sribalan Santhanam received the M.S.E. degree in computer science and engineering from the University of Michigan, Ann Arbor, in 1989. He joined Digital Equipment Corporation, in Hudson, MA, where he worked on the design of the floating-point unit of the 21064 CPU and subsequently on the design of the cache control unit of the Alpha 21164 CPU. He then moved to Digital's Palo Alto Design Center where he was responsible for the design of the caches for the SA-110 StrongARM microprocessor. He is currently a principal hardware engineer working on the implementation of a follow-on StrongARM microprocessor.
Kathryn J. Snyder
Kathryn Snyder (formerly Hoover) received the B.S. and M.S. degrees from the University of Michigan, Ann Arbor, in 1990 and 1992, respectively. She is a circuit designer with Digital Equipment Corporation working on low-power microprocessor designs in Austin, TX. She designed a variety of custom circuits for the SA-110 StrongARM microprocessor. Prior to employment with Digital, she worked for IBM in Austin, doing custom array design for PowerPC microprocessors.
Ray Stephany
Ray Stephany received the B.S.E.E. from Rensselaer Polytechnic Institute, Troy, NY, and an M.B.A. from Worcester Polytechnic Institute, Worcester, MA. He joined Digital's Austin Research and Design Center in July, 1993. Since that time, he has been one of the project leads on the StrongARM line of microprocessors. He has contributed to the development of low power circuit design techniques, CAD tools, verification, and overall methodology. He is currently leading the implementation of a next-generation StrongARM CPU and looking at SOI as a potential lower power process for future generations of microprocessors.
Stephen C. Thierauf
Stephen Thierauf is a consulting hardware engineer at Digital Equipment Corporation's Digital Semiconductor Group, located in Hudson, MA, and is responsible for I/O circuit design, on- and off-chip signal integrity, and I/O modeling for Alpha microprocessors, PCI peripherals, and other ULSI/VLSI devices. His previous work includes system-level signal integrity analysis, micropackaging analysis and micropackaging design for numerous high-performance microprocessors and peripherals.