Digital Technical Journal



INTERNET PROTOCOL V.6

PRESERVATION OF HISTORICAL COMPUTER SYSTEMS

FORTRAN FOR PARALLEL COMPUTING

SERVER PERFORMANCE EVALUATION AND OPTIMIZATION

INTERNET COLLABORATION SOFTWARE

Volume 8 Number 3 1996


Editorial
Jane C. Blake, Managing Editor
Helen L. Patterson, Editor
Kathleen M. Stetson, Editor

Circulation
Catherine M. Phillips, Administrator
Dorothea R. Cassady, Secretary

Production
Terri Autieri, Production Editor
Anne S. Katzeff, Typographer
Peter R. Woodbury, Illustrator

Advisory Board
Samuel H. Fuller, Chairman
Richard W. Beane
Donald Z. Harbert
William R. Hawe
Richard J. Hollingsworth
William A. Laing
Richard F. Lary
Alan G. Nemeth
Pauline A. Nist
Robert M. Supnik

Cover Design

The function of the Internet is a simple one: connect individuals through computer networks worldwide for the purpose of communication. The graphic on our cover symbolizes this worldwide connection of innumerable people in "cyberspace." Inside the issue, two papers address aspects of the complex work needed to make the connections: first, at the protocol level, Internet Protocol version 6, and at the user level, AltaVista Forum software for collaboration on the Internet.

The cover image is based on a photograph taken by Chuck Gillette of sky divers who set a record in October 1996 for the number of people (104) in a single formation.

The cover design is by Lucinda O'Neill of Digital's Corporate Design Group.

The Digital Technical Journal is a refereed journal published quarterly by Digital Equipment Corporation, 50 Nagog Park, AK02-3/R3, Acton, MA 01720-9843.

Subscriptions can be ordered by sending a check in U.S. funds (made payable to Digital Equipment Corporation) to the published-by address. General subscription rates are $40.00 (non-U.S. $60) for four issues and $75.00 (non-U.S. $115) for eight issues. University and college professors and Ph.D. students in the electrical engineering and computer science fields receive complimentary subscriptions upon request. Digital's customers may qualify for gift subscriptions and are encouraged to contact their account representatives.

Single copies and back issues are available for $16.00 (non-U.S. $18) each and can be ordered by sending the requested issue's volume and number and a check to the published-by address. See the Further Readings section in the back of this issue for a complete listing. Recent issues are also available on the Internet at http://www.digital.com/info/dtj.

Digital employees may order subscriptions through Readers Choice at URL http://webrc.clas.dec.com or by entering VTX PROFILE at the system prompt.

Inquiries, address changes, and complimentary subscription orders can be sent to the Digital Technical Journal at the published-by address or the electronic mail address, dtj@digital.com. Inquiries can also be made by calling the Journal office at 508-486-2538.

Comments on the content of any paper are welcomed and may be sent to the managing editor at the published-by or electronic mail address.

Copyright © 1996 Digital Equipment Corporation. Copying without fee is permitted provided that such copies are made for use in educational institutions by faculty members and are not distributed for commercial advantage. Abstracting with credit of Digital Equipment Corporation's authorship is permitted.

The information in the Journal is subject to change without notice and should not be construed as a commitment by Digital Equipment Corporation or by the companies herein represented. Digital Equipment Corporation assumes no responsibility for any errors that may appear in the Journal.

ISSN 0898-901X

Documentation Number EC-N7285-18. Book production was done by Quantic Communications, Inc.

The following are trademarks of Digital Equipment Corporation: AlphaServer, AlphaStation, AltaVista, DEChub, DECmate, DEC Notes, DECsystem-10, DECtape, DECUS, DECwriter, Digital, the DIGITAL logo, GIGAswitch, GIGI, HSC, HSZ, J-11, KA10, KI, LA, LN03, LQP03, LSI-11, MicroVAX, MicroVMS, MINC, OpenVMS, PATHWORKS, PDP, PDP-11, POLYCENTER, Q-bus, RC, RC25, RK, RL, RM, RP, RSTS/E, RSX-11M, RT-11, RX01, RX02, RZ, TM, TruCluster, TS, TU, UNIBUS, VAX, VAXcluster, VAXmate, VAXstation, VMS, and VT.

AIX, DB2, IBM, Lotus Notes, PowerPC, and RISC System/6000 are registered trademarks and System/360 is a trademark of International Business Machines Corporation.

BASIC is a registered trademark of the trustees of Dartmouth College, D.B.A. Dartmouth College.

BSD is a trademark of the University of California at Berkeley.

CHALLENGE is a registered trademark of Silicon Graphics, Inc.

Hewlett-Packard, HP, and HP-UX are registered trademarks of Hewlett-Packard Company.

Himalaya and Tandem are registered trademarks of Tandem Computers, Inc.

INFORMIX and INFORMIX-OnLine are registered trademarks of Informix Software, Inc.

KAP is a trademark of Kuck & Associates, Inc.

MEMORY CHANNEL is a trademark of Encore Computer Corporation.

Microsoft and Visual C++ are registered trademarks and Windows and Windows NT are trademarks of Microsoft Corporation.

MIMIC is a trademark of Sierra Geophysics, Inc.

Mosaic is a trademark of Mosaic Communications Corporation.

Oracle7 is a trademark of Oracle Corporation.

Solaris and SPARCcenter are registered trademarks of Sun Microsystems, Inc.

SPECint is a trademark of the Standard Performance Evaluation Council.

Sybase is a registered trademark of Sybase, Inc.

TPC-C is a trademark of the Transaction Processing Performance Council.

Tuxedo is a registered trademark of BEA Systems, Inc.

UNIX is a registered trademark in the United States and in other countries, licensed exclusively through X/Open Company Ltd.


Contents

Foreword (page 3)
Alan G. Nemeth

INTERNET PROTOCOL V.6

Internet Protocol Version 6 and the Digital UNIX Implementation Experience (page 5)
Daniel T. Harrington, James P. Bound, John J. McCann, and Matt Thomas

PRESERVATION OF HISTORICAL COMPUTER SYSTEMS

Preserving Computing's Past: Restoration and Simulation (page 23)
Maxwell M. Burnet and Robert M. Supnik

FORTRAN FOR PARALLEL COMPUTING

Modern Fortran Revived as the Language of Scientific Parallel Computing (page 39)
William N. Celmaster

SERVER PERFORMANCE EVALUATION AND OPTIMIZATION

Performance Measurement of TruCluster Systems under the TPC-C Benchmark (page 46)
Judith A. Piantedosi, Archana S. Sathaye, and D. John Shakshober

Performance Analysis Using Very Large Memory on the 64-bit AlphaServer System (page 58)
Tareef Kawaf, D. John Shakshober, and David C. Stanley

INTERNET COLLABORATION SOFTWARE

Building Collaboration Software for the Internet (page 66)
Dah Ming Chiu and David M. Griffin

Digital Technical Journal Vol. 8 No. 3 1996

Editor's Introduction

This issue presents papers on diverse computing topics (the Internet, modern Fortran language extensions for parallel computing, and performance measurement of AlphaServer 64-bit RISC systems), each representing an area of engineering strength for Digital. Also in the issue is a thought-provoking paper on the preservation of historical computers.

The opening paper on the Internet Protocol version 6 examines the status of today's Internet and looks toward its future. Digital is one of several companies participating in the working groups of the Internet Engineering Task Force on the transition to a new protocol. Dan Harrington, Jim Bound, Jack McCann, and Matt Thomas report what they have learned from designing an IPv6 prototype, and compare and contrast the new version with the existing protocol, IPv4. The most important difference between the versions, and the one that will relieve the strain on the Internet, is the increase in address size in IPv6 from 32 bits to 128 bits. The authors conclude with a look at future work in such areas as security and data link interfaces for ATM.

Our next paper, an unusual one not only for the issue but for this Journal, temporarily moves the discussion from computing's future to its past. Max Burnet and Bob Supnik argue that an understanding of computing's past is vital to its future. The authors present two computer preservation techniques: restoration and simulation. To exemplify issues in restoration, they review the status of a project to restore a large UNIBUS-based PDP-11 system. The section on simulation describes the types and purposes of simulators and presents a case study of SIM, a simulator implemented in C for the study of historical computer architectures.

In a paper on modern Fortran, Bill Celmaster demonstrates that today's Fortran is a viable mainstream language for parallel computing. Since its development more than 40 years ago, Fortran has been extended by language designers to meet the needs of users, particularly the needs of scientific/technical users who require mathematical expressivity and code optimization. Bill reviews key features of Fortran 90, recent efforts to standardize parallel extensions to Fortran, and shared-memory parallelism. He includes three case studies that illustrate the data parallel and single-program-multiple-data styles of programming.

Two papers describe testing methodologies that resulted in leadership system performance under the TPC-C benchmark for a cluster system and for a single-node system.

The first paper presents the evaluation of an AlphaServer 8400 5/350 TruCluster configuration supporting the Oracle Parallel Server database. Judy Piantedosi, Archana Sathaye, and John Shakshober discuss the system tuning and the record-setting results of their work. The second paper, by Tareef Kawaf, John Shakshober, and Dave Stanley, looks at two optimization techniques, locking intrinsics and OM profile-based optimization, applied to a large database program running in the very large memory (VLM) environment on an AlphaServer 8400 system. The results of these optimizations are significant increases in throughput and database-cache hit ratios.

The development of AltaVista Forum is the subject of our final paper. Unlike other groupware products, AltaVista Forum uses the World Wide Web as an infrastructure to facilitate the rapid development of collaboration applications for NT and UNIX systems. Dah Ming Chiu and Dave Griffin explain this design decision and share their experiences with usability studies, an interpretive language (Tcl) for building the toolkit, and the inclusion of an indexing and search engine.

The next issue of the Journal will feature the new AlphaServer 4100 high-performance midrange server system, a new implementation of MEMORY CHANNEL, and large-database technologies in the VLM environment.


Jane C. Blake
Managing Editor


Foreword

Alan G. Nemeth
Corporate Consultant
UNIX Architecture and Technology

"The Internet is dying." I feel quite confident you will regularly see articles with this message in the industry and general press over the next few years.

The message won't be as new as the authors of the articles might believe, and the work to remove the most frequently identified problems was begun years ago within the Internet Engineering Task Force (IETF).

Internet Protocol version 6 (IPv6) is a large family of protocols that form the basis of the IETF response to a set of problems identified in the early 1990s, problems made more urgent by the explosion of Internet usage.

One of the major concerns about the current Internet is the limited amount of address space. The underlying address for IP endpoints is 32 bits wide, permitting a total of about 4 billion distinct addresses. Although this number seems large (and it seemed truly gigantic in the early 1970s when the width was selected), it is now a real, practical barrier to current deployment patterns. Large users of Internet addresses can no longer get the address space they need for assignments. Because the Internet has run as a decentralized organization over the years, there is no effective central administration to support competition for scarce resources such as address space. Instead, the response of the community is to provide resources sufficient to keep allocation a low-overhead activity. So IPv6 defines an address space of 128 bits. This currently seems like a gigantic number! But limited address space is hard to build into a persuasive case for change.
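The arithmetic behind these figures is easy to check. The short Python sketch below, an illustration added for this text rather than anything from the original, computes both address-space sizes and their ratio.

```python
# Sizes of the IPv4 (32-bit) and IPv6 (128-bit) address spaces.
IPV4_BITS = 32
IPV6_BITS = 128

ipv4_space = 2 ** IPV4_BITS        # about 4.3 billion distinct addresses
ipv6_space = 2 ** IPV6_BITS        # about 3.4 x 10^38 distinct addresses
growth = ipv6_space // ipv4_space  # 2**96 IPv6 addresses per IPv4 address

print(f"IPv4: {ipv4_space:,}")     # IPv4: 4,294,967,296
print(f"IPv6: {ipv6_space:.3e}")   # IPv6: 3.403e+38
print(f"IPv6 addresses per IPv4 address: 2**{growth.bit_length() - 1}")  # 2**96
```

Each additional address bit doubles the space, so widening the field by 96 bits multiplies it by 2**96.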

End users are much more likely to be concerned about the local problem of getting just "one more address," rather than the problems of keeping the Internet as a whole alive and functioning. So the IPv6 design deliberately incorporates a set of functionality improvements that provide attractive end-user capabilities. IPv6 includes much easier schemes for assigning addresses, which will reduce the administrative burden for users and their network managers. IPv6 provides a better framework for encryption and an expectation that it will be widely available and used. And IPv6 provides some systematic mechanisms for describing requests for specific quality levels in the service offered by the transport provider.

These capabilities will address some very real, practical problems that do afflict individual end users of the Internet.

However, no one expects the whole population of Internet users to switch to IPv6, either simultaneously or even over an extended time period. IPv6 must interoperate with the currently installed IPv4 protocols for an indefinite period. This implies services that translate between the different addresses (and address assignment approaches that ease mechanical derivation of IPv6 addresses from IPv4 addresses), dual protocol stacks to permit communication with both protocols depending on the capabilities of the participants in the conversation, and schemes to accommodate security mechanisms and quality of service requests.
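One concrete form of "mechanical derivation of IPv6 addresses from IPv4 addresses" is to embed the 32-bit IPv4 address in the low-order bits of a 128-bit address. The Python sketch below assumes the two embeddings defined in the IPv6 addressing architecture, IPv4-compatible (`::a.b.c.d`) and IPv4-mapped (`::ffff:a.b.c.d`) addresses; the function names are hypothetical, and the standard `ipaddress` module is used only for parsing and printing.

```python
import ipaddress


def ipv4_compatible(v4: str) -> ipaddress.IPv6Address:
    """::a.b.c.d -- 96 zero bits followed by the 32-bit IPv4 address."""
    return ipaddress.IPv6Address(int(ipaddress.IPv4Address(v4)))


def ipv4_mapped(v4: str) -> ipaddress.IPv6Address:
    """::ffff:a.b.c.d -- 80 zero bits, 16 one bits, then the IPv4 address."""
    return ipaddress.IPv6Address((0xFFFF << 32) | int(ipaddress.IPv4Address(v4)))


def embedded_ipv4(v6: ipaddress.IPv6Address) -> ipaddress.IPv4Address:
    """Recover the IPv4 address carried in the low-order 32 bits."""
    return ipaddress.IPv4Address(int(v6) & 0xFFFFFFFF)
```

Because the derivation is a fixed bit layout, a dual-stack host or translator can move between the two forms without consulting any administrative database.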

The entirety of IPv6 represents a large implementation effort to be undertaken by many different organizations. The Internet represents the largest example I know of a distributed computation that has survived for 27 years. (I date it from 1969, when the first ARPANET [Advanced Research Projects Agency Network] nodes were installed.) With a few notable exceptions, this computation has run continually, despite constant changes in hardware, software, implementers, and operators. It has survived explosive growth far beyond the designs of its originators. It has done so with a volunteer organization driving the development direction. The community spirit has been crucial to making this work. IPv6 is an example of that community at work; no one organization can implement it all, either at a product level or at a deployment level.

The IPv6 paper in this issue describes the technical design needed to build an IPv6 implementation for the core protocols under the Digital UNIX operating system. Digital has been one of the leading prototype builders of the design specifications as they have evolved in the industry debates. At the time the Internet Protocol Next Generation (IPng) Directorate officially adopted key elements of the protocol, Digital's implementation was the only one running to demonstrate that the design was indeed feasible.

But we don't believe that we can implement all the pieces of IPv6 as a single company. Therefore we choose to share the implementation experience through this paper to aid others in their efforts to deal with the implementation problems. We also don't claim completeness; the full suite of specifications for IPv6 is evolving, and the software to implement it is large.

We fully expect that portions of our ultimate product offerings will be developed by others in the industry.

The long-term evolution of the Internet captured in the IPv6 implementation paper is but one example in this issue of the extent to which computing now has a history that gives us much insight into the future.

Certainly the paper by Supnik and Burnet is an explicit trip through computing history. The re-creation, both physical and logical, of computing systems of the past can only help remind us that the artifacts we create have a longer life than we anticipate.

As our programmers write new code, or our hardware designers produce new architectural approaches, or our storage designers push the boundaries on new media technologies, do they consider the imponderables of running these systems 25 or more years in the future? The view of archivists trying to preserve this history reminds us of the difficulty of preservation after the fact and of the amazing duration of design decisions.

The paper on the evolution of Fortran is yet another example of the rich history of computing. Here we see clearly the evolution of a key language to accommodate the changing patterns of system architectural designs and parallel program concepts. The computer industry frequently develops commercially important programs by evolution: the 100,000-line program that 10 years later has become 10 million lines of code in an assortment of languages and computing styles.

Here the venerable Fortran (first introduced in 1954) adds support for some of the latest approaches to fast system interconnect, represented by MEMORY CHANNEL, and for the parallel architectures of clusters of SMP systems.

MEMORY CHANNEL reappears in the paper about TPC-C performance on TruCluster systems. This paper, one of a pair on the issues of tuning a commercially important benchmark, presents an attractive model for the benefits in performance that can be derived from a very fast interconnect and software structures to match.

The performance levels achieved shatter world records on a benchmark that has had extensive attention and work.

The other paper on TPC-C performance, with very large memory (VLM), illustrates the truth of an old design maxim, "If memory is getting cheaper, use more of it!" When Digital first built a 2-gigabyte (GB) memory board, it took more than a million dollars' worth of DRAM chips to populate the initial instance.

However, memory prices have continued to drop sharply, and today over 40 percent of the AlphaServer 8400 systems ship with 2 GB or more of memory. Memory prices will continue to come down, and the insights offered in this paper will help in understanding where additional memory can provide real benefits to customer workloads.

The final paper in the collection is on the AltaVista Forum approach to collaboration among groups exploiting the Internet and WWW technologies, and it brings us back around to the initial thoughts in this foreword. The ubiquitous nature of the Internet permits and encourages tools such as this that use computer systems in new ways. This approach builds on the fabric that we emphasized in the IPv6 paper but sees the Internet as a tool and a component of a larger solution, and it shows how to exploit these capabilities to allow new ways of working.

Using imagination and building on the work of others are characteristic of the approach taken by those who are catalysts in the industry. The paper demonstrates how easy it is to build a system that would have been a major project just five years ago.

This ease of construction is a benefit of the programming techniques and infrastructure investments and a spur to keep doing more of it.


Internet Protocol Version 6 and the Digital UNIX Implementation Experience

In the early 1990s, the Internet community recognized that the current TCP/IP architecture was not capable of sustaining the explosive growth of the Internet. In July 1994, the Internet Protocol next generation (IPng) directorate responded to the problem with the Internet Protocol version 6 (IPv6) as the replacement network layer protocol. Working groups of the Internet Engineering Task Force (IETF) then began to build specifications that would address the needs for an expanded Internet address space, the growth in router table size, and new technology features. As a contributor to these efforts, Digital has implemented IPv6 on the Digital UNIX platform. The primary goal of Digital's efforts has been to evaluate the technical feasibility of the proposed architecture and provide critical feedback to the standards development process in the IETF. The secondary goal has been to evaluate system design alternatives to gain the experience needed to allow Digital to incorporate this new architecture into existing products.


Daniel T. Harrington
James P. Bound
John J. McCann
Matt Thomas

As one of its ongoing advanced development efforts in networking technology, Digital has built an Internet Protocol version 6 (IPv6) prototype for the Digital UNIX operating system. In this paper, we describe the design of the Digital UNIX IPv6 prototype and its history relevant to the Internet Protocol next generation (IPng) effort in the Internet Engineering Task Force (IETF). We also describe its relationship to the existing Transmission Control Protocol/Internet Protocol (TCP/IP) suite. We emphasize techniques and technologies that were developed to accommodate particular aspects of the IPv6 architecture and issues that required further discussion in the IETF. In particular, we discuss the modifications to the transport layer modules to use two distinct network layer protocols, along with the implications for the UNIX socket layer and applications. In addition, we describe the new IPv6 and Internet Control Message Protocol (ICMP) network layer modules, including their interactions with both the data link layer and the IPv4 protocol. We review the new Neighbor Discovery Protocol and its algorithms and give details of its implementation.

To accommodate the dynamic nature of future networks, IPv6 includes mechanisms to do both stateless and stateful address configuration, as well as router discovery; we explain the design of a user-mode process that implements these functions. The paper includes a discussion of enhancements to well-known IPv4 services, such as dynamic updates to the Domain Name System (DNS), as well as general techniques to support the transition of existing applications. The paper concludes with an overview of what we have learned in this project and summarizes our current status and future work, including efforts in nonbroadcast multiple access (NBMA) data link technologies such as asynchronous transfer mode (ATM) and resource reservation protocols.

Internet Protocol Next Generation

In the early 1990s, the members of the Internet community realized that the address space and certain aspects of the current TCP/IP architecture were not capable of sustaining the explosive growth of the Internet. Within the IETF, several efforts were undertaken both to study and improve the use of the 32-bit Internet Protocol (IPv4) addresses and to identify and replace protocols and services that would limit growth. The 32-bit addressing architecture in the network layer was quickly determined to be the crux of the problem, with both hardware and human limits approaching fundamental boundaries.¹ IPv4 addresses are unevenly allocated in blocks that are often too large or too small; they are also difficult to change within any existing network.

When the IETF called for replacement proposals, Digital participated in this industry-wide effort by submitting white papers outlining issues and by developing and evaluating prototypes of the various proposals. Digital also participated in the IETF working groups and in the IPng directorate, which had the responsibility for making the ultimate decision. In July 1994, the IPng directorate selected the Internet Protocol version 6 (IPv6) as the replacement network layer protocol, and IETF working groups began to build specifications. "The Recommendation for the IP Next Generation Protocol" summarizes the candidates and explains the selection of this protocol.²

Digital UNIX Prototype

The current Digital UNIX IPv6 prototype project is Digital's most recent addition to an ongoing effort to develop and evaluate the competing IPng proposals. This began with the Simple Internet Protocol (SIP), which used eight-octet addresses. SIP was later melded with another early proposal and became known as Simple Internet Protocol Plus (SIPP), the direct antecedent of IPv6.³ The primary goal of Digital's efforts has been to evaluate the technical feasibility of the proposed architecture and provide feedback to the IETF working groups. This is critical to the standards development process in the IETF, which requires multiple independent and interoperable implementations of a specification before it may become an Internet standard. An additional goal has been to evaluate system design alternatives to gain the experience needed to allow Digital to incorporate this new architecture into existing products. Digital has made the prototype available to researchers within the company as a source code distribution and more recently has begun to supply binary kits for early adopters and evaluators in the Internet community. As the IPv6 protocol and architecture mature, we have begun to focus on how best to integrate the code into the Digital UNIX product.

IPv6 Overview

To understand the system-wide impact of IPv6, we review some of its new features and contrast them with the IPv4 model. IPv6 is both a completely new network layer protocol and a major revision of the Internet architecture. At both levels, it builds upon and incorporates experiences gained with IPv4.

Figure 1 shows the evolution of the packet format into the new IPv6 header. It retains some fields (version, source, and destination address), clarifies the role of others (for example, the Time To Live [TTL] field is renamed the Hop Limit), and introduces new ones (such as Flow ID) with as yet untapped potential. The next header field allows for modular construction of complex packets: different header types can be chained together to provide specialized functionality, including security and source routing. Finally, all headers are structured to allow 64-bit alignment, which should allow optimal processing both at source and destination systems, as well as in transit.⁴
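To make the header layout concrete, the following Python sketch packs and unpacks the fixed 40-octet IPv6 header. It assumes the field widths of the specification current when this paper was written (RFC 1883: 4-bit Version, 4-bit Priority, 24-bit Flow Label); later revisions (RFC 2460) replaced Priority with an 8-bit Traffic Class and shortened the Flow Label to 20 bits. The function names are illustrative only.

```python
import ipaddress
import struct

# Fixed IPv6 header in network byte order, assuming the RFC 1883 field widths:
#   first 32-bit word: Version (4) | Priority (4) | Flow Label (24)
#   Payload Length (16), Next Header (8), Hop Limit (8)
#   Source Address (128), Destination Address (128)
HEADER_FMT = "!IHBB16s16s"
HEADER_LEN = struct.calcsize(HEADER_FMT)  # 40 octets, a multiple of 8 (64-bit aligned)


def pack_header(priority, flow_label, payload_len, next_header, hop_limit, src, dst):
    """Build a fixed IPv6 header; src and dst are textual IPv6 addresses."""
    first_word = (6 << 28) | ((priority & 0xF) << 24) | (flow_label & 0xFFFFFF)
    return struct.pack(
        HEADER_FMT,
        first_word,
        payload_len,
        next_header,  # e.g., 6 = TCP, 17 = UDP, or an extension header type
        hop_limit,    # plays the role of the IPv4 Time To Live field
        ipaddress.IPv6Address(src).packed,
        ipaddress.IPv6Address(dst).packed,
    )


def unpack_header(data):
    """Decode the fixed header at the start of data into a dictionary."""
    first_word, payload_len, next_header, hop_limit, src, dst = struct.unpack(
        HEADER_FMT, data[:HEADER_LEN]
    )
    return {
        "version": first_word >> 28,
        "priority": (first_word >> 24) & 0xF,
        "flow_label": first_word & 0xFFFFFF,
        "payload_length": payload_len,
        "next_header": next_header,
        "hop_limit": hop_limit,
        "source": ipaddress.IPv6Address(src),
        "destination": ipaddress.IPv6Address(dst),
    }
```

Because the fixed header is exactly 40 octets, the first extension header, if any, also begins on a 64-bit boundary.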

The most striking departure from IPv4 is the address size: it has increased from 32 bits to 128 bits.

The IPv6 addressing architecture is rich, with prefixes for multicast addresses and predefined scopes for both unicast and multicast addresses. One special type of unicast address is the link-local address, which permits communications only with those systems directly connected on the same link. This allows a standard bootstrapping mechanism, so that systems can learn about neighbors and services before a routable address is assigned to an interface. Various address assignment options have been defined, including hierarchical models based upon regional registries and service provider identifiers.⁵,⁶ In each case, care has been taken to ensure proper route aggregation, which will help yield more efficient backbone router performance.
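The scoping rules and route aggregation described above can be illustrated with Python's standard `ipaddress` module. The sketch below assumes the multicast scope values of the 1995 addressing architecture (RFC 1884) and uses arbitrary example prefixes; `describe` and `aggregate` are illustrative names, not part of the prototype.

```python
import ipaddress

# Multicast scope values (low 4 bits of the second octet), per RFC 1884.
MULTICAST_SCOPES = {1: "node-local", 2: "link-local", 5: "site-local",
                    8: "organization-local", 14: "global"}


def describe(addr: str) -> str:
    """Classify an IPv6 address by type and scope."""
    a = ipaddress.IPv6Address(addr)
    if a.is_multicast:                    # prefix ff00::/8
        scope = a.packed[1] & 0x0F
        return "multicast, " + MULTICAST_SCOPES.get(scope, f"scope {scope}")
    if a.is_link_local:                   # prefix fe80::/10, valid on one link only
        return "unicast, link-local"
    return "unicast"


def aggregate(prefixes):
    """Collapse contiguous prefixes into the fewest covering routes."""
    return list(ipaddress.collapse_addresses(ipaddress.IPv6Network(p) for p in prefixes))
```

For example, two adjacent /33 prefixes collapse into a single /32 route, which is the effect hierarchical assignment aims for in backbone routing tables.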

Multiple means of acquiring addresses have been defined for IPv6 addressing, with the goals of allowing flexibility through different administrative policies and, perhaps more important, of demanding that network address reassignment be supported throughout the architecture. The two new addressing services are Stateless Address Autoconfiguration and the stateful, transaction-based Dynamic Host Configuration Protocol version 6 (DHCPv6).⁷,⁸ In the stateless model, address prefixes are learned by listening for router advertisement packets. Addresses are formed by combining the prefix with a link-specific token such as the 48-bit Ethernet hardware address. In the stateful procedure, hosts may request addresses, configuration information, and services from dedicated configuration servers, with routers potentially serving as relay stations during the initial phase. In both cases, the resulting addresses have associated lifetimes, and systems must be prepared both to learn new addresses and to release expired addresses. Combined with the ability to register updated address information with DNS servers, these mechanisms provide a path toward network renumbering, a goal that has proved difficult to achieve in the IPv4 world.

Figure 1 IPv6 Header (fields: Version, Priority, Flow Label, Payload Length, Next Header, Hop Limit, Source Address, Destination Address)
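The stateless procedure, which combines an advertised prefix with a link-specific token, can be sketched as follows. This sketch assumes an 80-bit prefix so that the 48-bit Ethernet hardware address exactly fills the remaining bits (later specifications instead derive a 64-bit interface identifier from the hardware address and use a 64-bit prefix). The function names and the example hardware address are illustrative.

```python
import ipaddress

TOKEN_BITS = 48  # interface token: the 48-bit Ethernet hardware address (assumed)


def mac_to_token(mac: str) -> int:
    """Convert a textual Ethernet address such as 08:00:2b:12:34:56 to an integer."""
    digits = mac.replace(":", "").replace("-", "")
    if len(digits) != 12:
        raise ValueError("expected a 48-bit Ethernet address")
    return int(digits, 16)


def autoconfigure(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Combine a prefix learned from a router advertisement with the link token."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 128 - TOKEN_BITS:
        raise ValueError("prefix must leave exactly 48 bits for the token")
    return ipaddress.IPv6Address(int(net.network_address) | mac_to_token(mac))
```

For instance, `autoconfigure("fe80::/80", "08:00:2b:12:34:56")` yields `fe80::800:2b12:3456`; when a router later advertises a new prefix, the host can form its new address the same way, which is what makes renumbering tractable.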

Finally, the Internet Control Message Protocol version 6 (ICMPv6) was developed.⁹ This specification aimed to merge the functions of two distinct IPv4 protocols for reporting errors and status: ICMP for unicast packet transmission and the Internet Group Management Protocol (IGMP) for multicast traffic.

The messages defined in this protocol are categorized as either error or informational, with a family of messages in the second group used to provide the Neighbor Discovery Protocol.¹⁰ Neighbor discovery serves multiple purposes with the overall theme of providing a system with topological and environmental hints. For example, link-layer address resolution, router discovery, destination address redirection, and address autoconfiguration mechanisms are all specified using neighbor discovery packet types.
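The error/informational split is encoded in the message type itself: ICMPv6 types 0 through 127 are error messages and types 128 through 255 are informational. The Python sketch below classifies a type value and names the neighbor discovery messages, using the type numbers assigned in the Neighbor Discovery specification (133 through 137); the function name is illustrative.

```python
from typing import Optional, Tuple

# Neighbor discovery messages, all in the informational range.
ND_MESSAGES = {
    133: "Router Solicitation",     # router discovery
    134: "Router Advertisement",    # router discovery, prefixes for autoconfiguration
    135: "Neighbor Solicitation",   # link-layer address resolution
    136: "Neighbor Advertisement",  # link-layer address resolution
    137: "Redirect",                # destination address redirection
}


def classify_icmpv6(msg_type: int) -> Tuple[str, Optional[str]]:
    """Return (category, neighbor discovery message name or None)."""
    if not 0 <= msg_type <= 255:
        raise ValueError("the ICMPv6 type is an 8-bit field")
    if msg_type < 128:  # high-order bit clear: error message
        return ("error", None)
    return ("informational", ND_MESSAGES.get(msg_type))
```

Testing a single bit lets a receiver decide, for example, never to send an error in response to an error message.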

Although the network layer did experience the largest amount of change, Figure 2 shows that the effects of this work touch nearly all aspects of the Digital UNIX system. We point out examples of decisions made due to our fundamental design philosophy, which is based upon integration with the UNIX system framework, modular and extensible software, support for multiple operational policies, and a desire to take advantage of the Alpha platform without compromising portability.

In the following sections, we study these topics in depth, beginning with the network layer, then covering the transport layer modifications and the new neighbor discovery algorithms. After that, we discuss address autoconfiguration mechanisms and their effects upon the system. We conclude with services that will be affected by the transition from IPv4 to IPv6, such as the socket application programming interface (API) and DNS.

Figure 2 Base Platform Changes (components shown: user/kernel boundary; IP-based applications; network commands and utilities; transition mechanisms; dynamic address; socket layer; security; network layer; IPv6/IPv4 tunnels; routing table and neighbor cache; neighbor discovery; link-layer modules)

Network Layer

In this section, we review the processing requirements of the IPv6 modules, including ICMPv6, extension header options, and fragmentation. An early design decision was made to base the networking subsystem on the Berkeley Software Distribution (BSD) 4.4 model and code base, which allows great flexibility in dealing with multiple network layers.11 This also has the advantage of providing support for variable-bit-length netmasks (also known as CIDR-style netmasks, from Classless Inter-Domain Routing), which are appropriate to both IPv4 and IPv6.12 We have also tried to take maximum advantage of the 64-bit Alpha architecture when implementing IPv6, while making certain that this implementation would run on 32-bit CPUs as well. For example, the checksum routines operate on 32-bit quantities (allowing the carry to overflow into the upper 32 bits of a 64-bit register).

The checksum routine is also designed so that its work can be issued to multiple Alpha execution units; exploiting this remains a topic for further investigation.
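
The 32-bit/64-bit checksum arrangement described above can be sketched in portable C. The helper inet_cksum64 is hypothetical, not the Digital UNIX routine: it adds 32-bit words into a 64-bit accumulator so that carries collect in the upper half and are folded back only once, at the end. Because the ones-complement sum is independent of byte order, the result can be stored into the packet with memcpy on a little-endian Alpha or a big-endian host alike.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Internet checksum over len bytes, summed as 32-bit words in a 64-bit
 * accumulator (hypothetical helper). The result is in host memory order:
 * store it into the packet with memcpy. */
uint16_t inet_cksum64(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint64_t sum = 0;

    while (len >= 4) {              /* main loop: one 32-bit word at a time */
        uint32_t w;
        memcpy(&w, p, 4);           /* memcpy avoids unaligned-access faults */
        sum += w;                   /* carries spill into bits 32..63 */
        p += 4;
        len -= 4;
    }
    if (len > 0) {                  /* last 1-3 bytes, zero-padded */
        unsigned char tail[4] = {0, 0, 0, 0};
        uint32_t w;
        memcpy(tail, p, len);
        memcpy(&w, tail, 4);
        sum += w;
    }

    sum = (sum & 0xffffffffu) + (sum >> 32);   /* fold 64 -> 33 bits */
    sum = (sum & 0xffffffffu) + (sum >> 32);   /* absorb the last carry */
    sum = (sum & 0xffffu) + (sum >> 16);       /* fold 32 -> 17 bits */
    sum = (sum & 0xffffu) + (sum >> 16);       /* absorb the last carry */
    return (uint16_t)~sum;
}
```

For the example bytes 00 01 f2 03 f4 f5 f6 f7 from RFC 1071, storing the result with memcpy gives the checksum bytes 22 0d on either byte order.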

Adaptations to Existing IP and ICMP Routines

The IPv6 and ICMPv6 routines are completely independent of the corresponding IPv4 and ICMPv4 routines, and the processing styles have distinct differences. In IPv6, the incoming packet is treated as being read-only, while the BSD IPv4 code manipulates fields within the IPv4 header. We also avoid unnecessary use of the m_pullup routine (which consolidates chained memory buffers into a single large buffer) because this could cause the packet to be needlessly lost. Finally, instead of passing numerous arguments when calling from function to function, a common data structure is used to store necessary data and pointers; for most function calls, it is only necessary to pass a pointer to this structure. This reduces the stack overhead and also yields modular and easily extensible subroutines.
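
A sketch of the pointer-to-context style, with a hypothetical structure and helper (neither is from the Digital UNIX sources): the routine receives only the context, reads the read-only packet through it, and advances the offset past one extension header.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-packet context: everything a header routine needs,
 * gathered so that each call passes one pointer instead of many arguments. */
struct ip6_ctx {
    const uint8_t *pkt;       /* received packet, treated as read-only */
    size_t         len;       /* total length of pkt */
    size_t         off;       /* offset of the header being processed */
    uint8_t        nxt;       /* Next Header value naming that header */
    int            ifindex;   /* receiving interface */
};

/* Advance past a hop-by-hop, destination options, or routing header.
 * Byte 0 holds its Next Header, byte 1 its length in 8-octet units not
 * counting the first 8 octets. Returns 0 on success, -1 if truncated. */
static int ip6_ctx_skip_ext(struct ip6_ctx *ctx)
{
    if (ctx->len - ctx->off < 2)
        return -1;
    size_t hlen = ((size_t)ctx->pkt[ctx->off + 1] + 1) * 8;
    if (ctx->len - ctx->off < hlen)
        return -1;
    ctx->nxt = ctx->pkt[ctx->off];   /* type of the header that follows */
    ctx->off += hlen;
    return 0;
}
```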

IPv6 has a dedicated interrupt processing thread, and received IPv6 packets are placed onto their own interface input queue (ifqueue). When an IPv6 packet is taken off the ifqueue, basic validity tests are done; only after passing them is the packet tested to see if it is directed to a unicast or a multicast address.

If the packet is to a multicast address, the destination is compared to the enabled IPv6 multicasts for the interface over which the packet was received. If the destination matches, the packet is passed up to normal packet processing; otherwise, a copy of the packet is passed to the multicast forwarder.

Similarly, unicast packets are checked to see that the destination matches one of the system's addresses. In the special case of the packet being targeted to a link-local address, only the link-local address for the receiving interface is compared. If there is an exact match, the packet is processed normally; otherwise, it is passed to the unicast packet forwarding routine.
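
The receive-side address checks can be summarized in a short C sketch. The address stand-in, interface structure, and function names are hypothetical, and the validity tests that precede this step are omitted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct in6_sketch { uint8_t b[16]; };   /* hypothetical stand-in for in6_addr */

static bool in6_is_multicast(const struct in6_sketch *a)   /* ff00::/8 */
{ return a->b[0] == 0xff; }

static bool in6_is_linklocal(const struct in6_sketch *a)   /* fe80::/10 */
{ return a->b[0] == 0xfe && (a->b[1] & 0xc0) == 0x80; }

static bool in6_in_list(const struct in6_sketch *a,
                        const struct in6_sketch *list, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (memcmp(a->b, list[i].b, 16) == 0)
            return true;
    return false;
}

struct rx_iface {                      /* hypothetical interface state */
    const struct in6_sketch *groups;   /* multicast groups enabled here */
    size_t n_groups;
    struct in6_sketch link_local;      /* this interface's link-local address */
};

enum rx_action { RX_LOCAL, RX_MCAST_FORWARDER, RX_UCAST_FORWARD };

static enum rx_action ip6_rx_dispatch(const struct in6_sketch *dst,
                                      const struct rx_iface *ifp,
                                      const struct in6_sketch *sys, size_t n_sys)
{
    if (in6_is_multicast(dst))
        return in6_in_list(dst, ifp->groups, ifp->n_groups)
               ? RX_LOCAL : RX_MCAST_FORWARDER;
    if (in6_is_linklocal(dst))   /* only the receiving interface's address counts */
        return memcmp(dst->b, ifp->link_local.b, 16) == 0
               ? RX_LOCAL : RX_UCAST_FORWARD;
    return in6_in_list(dst, sys, n_sys) ? RX_LOCAL : RX_UCAST_FORWARD;
}
```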

Header Processing

After a packet has been matched to a local address, the IPv6 headers must be processed, independently of whether the packet is multicast or unicast. This processing is done in a common routine that handles all types of IPv6 headers. A number of actions may result from the verification and analysis phase, including an ICMPv6 packet being sent back to the source, the packet being silently dropped, or the packet being forwarded to another node due to a source route. If none of these possibilities occurs, the next IPv6 header in the packet is processed.

If the header is a known IPv6 header type, the appropriate routine is called. If not, this packet is probably destined for another protocol module such as TCP, the User Datagram Protocol (UDP), or ICMPv6. The header type is looked up in the list of active protocols, and the packet is passed to the matching protocol input routine. If no entry is found, an ICMPv6 error may be sent back.
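
The next-header dispatch described above can be sketched as follows. Only a few header types are listed, and the table of active protocols and the function name are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Next Header values (same registry as IPv4 protocol numbers). */
enum {
    NH_HOPOPTS = 0, NH_TCP = 6, NH_UDP = 17, NH_ROUTING = 43,
    NH_FRAGMENT = 44, NH_ICMPV6 = 58, NH_DSTOPTS = 60
};

enum nh_kind { NH_IPV6_HEADER, NH_PROTOCOL, NH_UNKNOWN };

/* Hypothetical list of active upper-layer protocol modules. */
static const uint8_t active_protocols[] = { NH_TCP, NH_UDP, NH_ICMPV6 };

/* Known IPv6 headers get their own routine; an active protocol gets the
 * packet through its input routine; anything else may draw an ICMPv6 error. */
static enum nh_kind classify_next_header(uint8_t nh)
{
    switch (nh) {
    case NH_HOPOPTS: case NH_ROUTING: case NH_FRAGMENT: case NH_DSTOPTS:
        return NH_IPV6_HEADER;
    default:
        break;
    }
    for (size_t i = 0; i < sizeof active_protocols; i++)
        if (active_protocols[i] == nh)
            return NH_PROTOCOL;
    return NH_UNKNOWN;
}
```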

Header Options

Since the hop-by-hop options and destination options headers have the same format, a common routine processes both types. As the routine processes each option, it validates the option. If this fails, it checks whether an ICMPv6 parameter problem error should be sent, whether the packet should be discarded, or whether the option should be ignored.
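
The text does not say how the routine chooses among those three outcomes. In the IPv6 specification of the time (RFC 1883), the choice for an unrecognized option is encoded in the two high-order bits of its option type; the sketch below shows that rule together with a walk over the type-length-value options. The function names, and treating a truncated option as a parameter problem, are assumptions rather than Digital UNIX code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum opt_action {
    OPT_SKIP,          /* 00: skip the option, keep processing */
    OPT_DISCARD,       /* 01 (and 11 to a multicast address): drop silently */
    OPT_DISCARD_ICMP   /* 10 (and 11 to a unicast address): drop, send Parameter Problem */
};

static enum opt_action unknown_option_action(uint8_t type, bool dst_multicast)
{
    switch (type >> 6) {
    case 0:  return OPT_SKIP;
    case 1:  return OPT_DISCARD;
    case 2:  return OPT_DISCARD_ICMP;
    default: return dst_multicast ? OPT_DISCARD : OPT_DISCARD_ICMP;
    }
}

/* Walk the options area; in this sketch only Pad1 (0) and PadN (1) are
 * recognized. Returns the first action that is not OPT_SKIP. */
static enum opt_action walk_options(const uint8_t *opts, size_t len, bool dst_multicast)
{
    size_t i = 0;
    while (i < len) {
        uint8_t type = opts[i];
        if (type == 0) {                    /* Pad1: a single octet, no length */
            i++;
            continue;
        }
        if (i + 2 > len || i + 2 + (size_t)opts[i + 1] > len)
            return OPT_DISCARD_ICMP;        /* truncated option (assumption) */
        if (type != 1) {                    /* not PadN: unrecognized here */
            enum opt_action a = unknown_option_action(type, dst_multicast);
            if (a != OPT_SKIP)
                return a;
        }
        i += 2 + (size_t)opts[i + 1];
    }
    return OPT_SKIP;
}
```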

ICMPv6 Processing and Checksums

Upon receipt of an ICMPv6 packet from a node in the network reporting an error or other information, it is first validated for correct packet format and checksum. The packet is then further processed based upon its ICMPv6 type value. If it has an ICMPv6 error type (i.e., a type value less than 128), the appropriate notifications are sent to the affected protocol. Neighbor discovery packets, which are all informational, have a number of additional consistency checks, and the packet is dropped if it fails them. After the ICMPv6 packet has been processed, it is also sent to any ICMPv6 raw sockets that have requested reception of that type. The exception to this rule is an ICMPv6 echo request packet, which is not copied to the raw sockets.
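
The type-based dispatch and the raw-socket rule can be sketched in C. Types below 128 are errors, and neighbor discovery uses types 133 through 137 (Router Solicitation, Router Advertisement, Neighbor Solicitation, Neighbor Advertisement, Redirect). The per-socket bitmap filter and all function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define ICMP6_ECHO_REQUEST 128

/* ICMPv6 types below 128 are errors; 128-255 are informational. */
static bool icmp6_is_error(uint8_t type) { return type < 128; }

/* Neighbor discovery messages use types 133-137. */
static bool icmp6_is_nd(uint8_t type) { return type >= 133 && type <= 137; }

/* Hypothetical per-raw-socket filter: one bit per ICMPv6 type. */
struct icmp6_raw_filter { uint32_t want[8]; };

static void raw_filter_set(struct icmp6_raw_filter *f, uint8_t type)
{
    f->want[type >> 5] |= UINT32_C(1) << (type & 31);
}

/* Should a processed message be copied to this raw socket?
 * Echo requests never are, whatever the filter says. */
static bool icmp6_deliver_raw(const struct icmp6_raw_filter *f, uint8_t type)
{
    if (type == ICMP6_ECHO_REQUEST)
        return false;
    return ((f->want[type >> 5] >> (type & 31)) & 1u) != 0;
}
```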

When an ICMPv6 echo request is received and validated, the ICMPv6 echo response packet is prepared. In the typical case, it is identical to the echo request except for the ICMPv6 type and checksum value (the source and destination addresses are exchanged, but this does not change the checksum, because the pseudoheader sum does not depend on their order). The exception would be an echo request sent to a multicast address, in which case a source address must also be selected. Rather than computing the checksum on the new packet, the received checksum is simply adjusted down by 1, since the only difference between the two packets that affects the checksum is the value of the ICMPv6 type field, and the ICMPv6 echo request and echo response types differ by 1.
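
A sketch of the incremental update, using the standard technique from RFC 1624, HC' = ~(~HC + ~m + m'), where m and m' are the old and new values of the changed 16-bit word. All values here are network-order 16-bit words assembled byte by byte; in those terms the type/code word rises from 0x8000 to 0x8100, so the checksum word falls by 0x0100 in ones-complement arithmetic. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones-complement 16-bit addition with end-around carry. */
static uint16_t oc_add(uint16_t a, uint16_t b)
{
    uint32_t s = (uint32_t)a + b;
    return (uint16_t)((s & 0xffff) + (s >> 16));
}

/* RFC 1624, eqn. 3: new checksum after one word changes from m_old to m_new. */
static uint16_t cksum_adjust(uint16_t hc, uint16_t m_old, uint16_t m_new)
{
    return (uint16_t)~oc_add(oc_add((uint16_t)~hc, (uint16_t)~m_old), m_new);
}

/* Full checksum over network-order bytes, used here only for comparison. */
static uint16_t cksum_full(const uint8_t *p, size_t n)
{
    uint16_t s = 0;
    for (size_t i = 0; i + 1 < n; i += 2)
        s = oc_add(s, (uint16_t)(p[i] << 8 | p[i + 1]));
    if (n & 1)
        s = oc_add(s, (uint16_t)(p[n - 1] << 8));
    return (uint16_t)~s;
}

/* Turn an echo request into an echo reply in place. msg points at the
 * ICMPv6 header: type in byte 0, code in byte 1, checksum in bytes 2-3. */
static void icmp6_echo_to_reply(uint8_t *msg)
{
    uint16_t old_word = (uint16_t)(msg[0] << 8 | msg[1]);
    uint16_t hc = (uint16_t)(msg[2] << 8 | msg[3]);
    msg[0] = 129;                                   /* echo reply */
    uint16_t new_word = (uint16_t)(msg[0] << 8 | msg[1]);
    hc = cksum_adjust(hc, old_word, new_word);
    msg[2] = (uint8_t)(hc >> 8);
    msg[3] = (uint8_t)hc;
}
```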

IPv6 requires all nodes to support multicasting, specifically level 2 as defined in "Host Extensions for IP Multicasting."13 Although this was written for IPv4, the same general algorithms are used for IPv6. One notable exception is that the multicast addresses used for neighbor solicitations and the predefined link-local multicasts such as all-nodes and all-routers do not require periodic status reports.

Path Maximum Transmission Unit Discovery

One of the significant differences between IPv4 and IPv6 concerns fragmentation. In IPv6, fragmentation may be done only by the node from which a packet originates. Forwarders, which may be routers or hosts acting upon source routing headers, are not permitted to fragment packets. The burden is on the originating node to send packets that are small enough to fit through all the links along the paths to their destinations, where each link type may have a different maximum transmission unit (MTU). To ease this burden, IPv6 defines a minimum link MTU of 576 bytes. A node may use this as the upper limit on packet size and be assured that its packets are sufficiently small to reach their destinations.

The minimum MTU of all the links in a path between two nodes is referred to as the path MTU.14 In many cases, the path MTU will exceed 576 bytes, and it is desirable to send the largest possible packets. IPv6 provides a mechanism by which a node may discover a path's MTU.15 When a forwarder cannot forward a packet because the packet is too large for the next hop's link MTU, it sends an ICMPv6 Packet Too Big (PTB) message back to the source of the packet. The PTB message contains the MTU of the constricting link. The source node adjusts its packet size to fit through this link.

Path MTU information is kept on a per-destination basis and is stored in the routing table entry for a given destination. Packets sent on that route will be sized according to the path MTU value. When a PTB message is received, the appropriate route is updated to contain the new path MTU value as reported in the PTB message, and a timer is started. When the timer expires, the path MTU value is increased to the (known) MTU of the first-hop link. This allows the node to detect increases in the path MTU.

Switches are provided to disable path MTU discovery system-wide, on a per-destination basis, and on a per-socket basis. When path MTU discovery is disabled, packets are limited to 576 bytes.
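
A minimal sketch of the per-destination state described above, assuming a hypothetical route structure and timeout value (the text gives neither). Clamping a reported MTU to the 576-byte minimum and ignoring reports that would raise the estimate are further assumptions of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define IPV6_MIN_MTU 576u   /* minimum link MTU in the IPv6 specification of the time */
#define PMTU_TIMEOUT 600    /* seconds; hypothetical value */

struct rt_pmtu {                /* hypothetical slice of a route entry */
    uint32_t link_mtu;          /* known MTU of the first-hop link */
    uint32_t pmtu;              /* current path MTU estimate */
    time_t   expire;            /* when to retry the larger MTU; 0 = no timer */
    bool     pmtud_disabled;    /* discovery switched off for this destination */
};

/* PTB received: record the reported MTU and (re)start the timer. */
static void pmtu_on_ptb(struct rt_pmtu *rt, uint32_t reported, time_t now)
{
    if (rt->pmtud_disabled)
        return;
    if (reported < IPV6_MIN_MTU)
        reported = IPV6_MIN_MTU;     /* assumption: never below the minimum */
    if (reported < rt->pmtu)
        rt->pmtu = reported;         /* assumption: a PTB never raises the estimate */
    rt->expire = now + PMTU_TIMEOUT;
}

/* Timer check: on expiry, return to the first-hop MTU to detect increases. */
static void pmtu_on_tick(struct rt_pmtu *rt, time_t now)
{
    if (rt->expire != 0 && now >= rt->expire) {
        rt->pmtu = rt->link_mtu;
        rt->expire = 0;
    }
}

/* Largest packet to send on this route. */
static uint32_t pmtu_send_limit(const struct rt_pmtu *rt)
{
    return rt->pmtud_disabled ? IPV6_MIN_MTU : rt->pmtu;
}
```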

Fragmentation

A packet that is larger than the MTU of the path on which it is to be sent must be fragmented. Unlike IPv4, the IPv6 header contains no fields to carry fragmentation information. Instead, this information is carried in a specialized extension header, called the fragment header. As shown in Figure 3, the fields in the fragment header include an offset, in eight-octet units, and an identifier common to all fragments of the original packet. The M (more fragments) flag is used to indicate intermediate fragments; the terminal fragment has the bit cleared. Note that the amount of data in a fragment packet is derived from the total packet length.

Figure 3
Fragment Header (fields: Next Header, Reserved, Fragment Offset, Reserved, M, Identification)

Figure 4
Fragmentation (the original packet consists of an unfragmentable part and a fragmentable part; each fragment packet begins with a copy of the unfragmentable part)
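
The fragment header layout in Figure 3 can be encoded and decoded byte by byte, as in this sketch; the struct and function names are hypothetical. Writing one octet at a time keeps the wire format independent of host byte order.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of the 8-octet IPv6 fragment header:
 *   octet 0     Next Header
 *   octet 1     Reserved
 *   octets 2-3  Fragment Offset (13 bits, 8-octet units), 2 reserved bits, M
 *   octets 4-7  Identification                                             */
struct frag_fields {
    uint8_t  next_header;
    uint16_t offset_units;   /* 0..8191 */
    bool     more;           /* M: more fragments follow */
    uint32_t ident;
};

static void frag_encode(uint8_t out[8], const struct frag_fields *f)
{
    uint16_t w = (uint16_t)(((f->offset_units & 0x1fffu) << 3) | (f->more ? 1u : 0u));
    out[0] = f->next_header;
    out[1] = 0;                          /* reserved */
    out[2] = (uint8_t)(w >> 8);
    out[3] = (uint8_t)w;
    out[4] = (uint8_t)(f->ident >> 24);
    out[5] = (uint8_t)(f->ident >> 16);
    out[6] = (uint8_t)(f->ident >> 8);
    out[7] = (uint8_t)f->ident;
}

static void frag_decode(const uint8_t in[8], struct frag_fields *f)
{
    uint16_t w = (uint16_t)(in[2] << 8 | in[3]);
    f->next_header = in[0];
    f->offset_units = (uint16_t)(w >> 3);
    f->more = (w & 1u) != 0;
    f->ident = (uint32_t)in[4] << 24 | (uint32_t)in[5] << 16 |
               (uint32_t)in[6] << 8 | in[7];
}
```

For example, a fragment starting at byte 1,480 (185 eight-octet units) with M set encodes octets 2 and 3 as 0x05 0xC9.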

The first step in the fragmentation process is to identify the fragmentable and unfragmentable parts of the original packet (see Figure 4). The unfragmentable part of the packet consists of the IPv6 header and any extension headers that must be processed by each node traversed by the packet (e.g., hop-by-hop header, routing header). The fragment header is appended to the unfragmentable part. The rest of the packet is divided into fragments, and each fragment is appended to a copy of the unfragmentable part plus fragment header.
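
A sketch of how the fragmentable part might be cut up, assuming each fragment packet (unfragmentable part, 8-octet fragment header, and data) must fit the chosen MTU. Every fragment except the last carries a multiple of 8 octets, because the offset field counts 8-octet units. The function name and interface are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAG_HDR_LEN 8

/* Plan the pieces of the fragmentable part. Fills offsets[] and lengths[]
 * (in bytes) and returns the fragment count, or 0 if the MTU leaves no room
 * for 8 octets of data or more than max_frags pieces would be needed.
 * Each fragment header's offset field is offsets[i] / 8. */
static size_t frag_plan(size_t unfrag_len, size_t frag_len, size_t mtu,
                        size_t max_frags, size_t offsets[], size_t lengths[])
{
    if (mtu < unfrag_len + FRAG_HDR_LEN + 8)
        return 0;
    size_t chunk = (mtu - unfrag_len - FRAG_HDR_LEN) & ~(size_t)7;
    size_t n = 0;
    for (size_t off = 0; off < frag_len; off += chunk) {
        if (n == max_frags)
            return 0;
        offsets[n] = off;
        lengths[n] = (frag_len - off < chunk) ? frag_len - off : chunk;
        n++;
    }
    return n;
}
```

With a 40-byte unfragmentable part, a 3,000-byte fragmentable part, and a 1,500-byte MTU, this yields fragments of 1,448, 1,448, and 104 octets at byte offsets 0, 1,448, and 2,896.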

When the fragment header is appended to the unfragmentable part, two fields in the unfragmentable part must be updated. First, the payload length field in the IPv6 header must be updated to reflect the length of the fragment packet. Second, the next header field in the last header of the unfragmentable part must be changed to indicate that a fragment header follows.

A copy of the unfragmentable part is created for each fragment packet. As an optimization, Digital UNIX allows portions of a packet to be shared among copies of the packet, to avoid an actual data copy. As with IPv4, care must be taken to ensure that fields being updated are not contained in shared buffers. This is typically accomplished by copying the portions that must be updated into a private memory buffer (mbuf). Unlike IPv4, the unfragmentable part may not fit in a single mbuf, and the IPv6 fragmentation code must be capable of handling this case.

To reduce the possibility of fragment loss at the source node, all the fragment packets are built before any is passed to the data link for transmission.

A question that arises here is how big the fragment packets should be. Should they be sized according to the path MTU, or should they be limited to 576 bytes? The former yields the desirable larger packets, while the latter avoids undesirable fragment loss (due to the fragment packet being too big). The Digital UNIX IPv6 prototype supports either choice on a system-wide, per-destination, or per-socket basis. This is an example of separation of mechanism from policy, a basic guideline used across this project.

Reassembly

The reassembly process reconstructs the original packet from fragment packets. Fragments belonging to the same packet are identified by a combination of source IP address, next header type (first header of the fragmentable part), and fragment identifier. Individual fragments are queued within the network layer until the original packet can be completely reassembled, at which point it is passed to the appropriate protocol module.

When all fragments have arrived, the original packet can be reassembled. A single copy of the unfragmentable part is kept, and the data from each fragment packet is appended. The payload length field of the IPv6 header is updated to reflect the length of the reassembled packet, and the next header field of the last header of the unfragmentable part is restored to reflect the first header in the fragmentable part.

As with the fragmentation code, care must be taken so that fields being updated are not in buffers shared with other copies of the packet.

When the first fragment of a packet arrives, a timer is started. If the timer expires before that packet is complete, the fragments are discarded. If the offset-zero fragment has been received, an ICMPv6 error message is sent.
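
The reassembly bookkeeping can be sketched as follows, assuming fragments are kept sorted by offset. The key uses the three fields the text lists; the structures and function names are hypothetical. The comment naming Time Exceeded as the error comes from the IPv6 specification, not from the text.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reassembly key: the three fields named in the text. */
struct reass_key {
    uint8_t  src[16];   /* source address */
    uint8_t  nxt;       /* next header (first header of the fragmentable part) */
    uint32_t ident;     /* fragment identifier */
};

/* Compare field by field; memcmp on the whole struct would include padding. */
static bool reass_key_eq(const struct reass_key *a, const struct reass_key *b)
{
    return memcmp(a->src, b->src, sizeof a->src) == 0 &&
           a->nxt == b->nxt && a->ident == b->ident;
}

struct frag { size_t off; size_t len; bool more; };   /* off/len in bytes */

/* Fragments must be sorted by offset. Returns the length of the reassembled
 * fragmentable part, or 0 while a piece (or the last fragment) is missing. */
static size_t reass_complete(const struct frag *f, size_t n)
{
    size_t expect = 0;
    for (size_t i = 0; i < n; i++) {
        if (f[i].off != expect)
            return 0;               /* gap (or overlap) before this fragment */
        expect += f[i].len;
    }
    return (n > 0 && !f[n - 1].more) ? expect : 0;
}

/* On timer expiry all fragments are discarded; an ICMPv6 error (Time
 * Exceeded, fragment reassembly time exceeded) is sent only if the
 * offset-zero fragment arrived. */
static bool reass_timeout_sends_icmp(const struct frag *f, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (f[i].off == 0)
            return true;
    return false;
}
```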

Forwarding and Routing

If a received packet does not match one of the system's addresses and the system is not acting as a router, the packet is silently dropped. Otherwise, an attempt is made to forward the packet. The first step in forwarding is to do a lookup in the routing table; the type of lookup depends on whether the packet contains a nonzero flow label. If it does, the lookup is based on both the source address and the flow label; otherwise, the destination address is used. If the lookup succeeds and the length of the packet fits within the MTU of the resultant route and interface, the packet is transmitted to the next hop as indicated by the route. Otherwise, an appropriate ICMPv6 error is sent back to the originating node.
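
The lookup-key choice and the final forwarding decision can be sketched as below. The structure and names are hypothetical, and the routing-table lookup itself is left out; the caller is assumed to have already found that the packet is not addressed to this system.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical routing-table lookup key. */
struct fwd_key {
    bool     by_flow;    /* true: (source, flow label); false: destination */
    uint8_t  addr[16];
    uint32_t flow;
};

/* A nonzero flow label selects a (source address, flow label) lookup;
 * otherwise the destination address alone is used. */
static struct fwd_key fwd_make_key(const uint8_t src[16], const uint8_t dst[16],
                                   uint32_t flow_label)
{
    struct fwd_key k;
    k.by_flow = flow_label != 0;
    memcpy(k.addr, k.by_flow ? src : dst, 16);
    k.flow = k.by_flow ? flow_label : 0;
    return k;
}

enum fwd_action { FWD_DROP, FWD_TRANSMIT, FWD_ICMP_ERROR };

/* Decision for a packet not addressed to this system. */
static enum fwd_action fwd_decide(bool is_router, bool route_found,
                                  size_t pkt_len, size_t mtu)
{
    if (!is_router)
        return FWD_DROP;                      /* non-routers drop silently */
    if (!route_found || pkt_len > mtu)
        return FWD_ICMP_ERROR;                /* report to the originating node */
    return FWD_TRANSMIT;
}
```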

Tunnels

Tunneling is a mechanism that allows packets of one network type to be encapsulated and forwarded within a network layer packet of the same or a different type. IPv6 packets can be tunneled over either IPv4 or IPv6 networks, as may IPv4 packets.16,17 The tunneling routine takes as input a packet, prepends the appropriate IP header for the network over which the packet will be tunneled, and transmits the resultant packet over that network. Tunnels are unidirectional; there need not be a corresponding tunnel in the reverse direction.

Rather than having multiple tunnel interfaces (one for each possible combination of protocol Y over protocol X), the Digital UNIX implementation uses a single tunnel interface. This method was the suggestion of Keith Sklower of the University of California at Berkeley.18 When the interface is initialized, only automatic tunneling of IPv6 over IPv4 is enabled.19 To configure a static tunnel, where fixed end points are used, a static route is added to the routing tables with the proper destination and gateway (tunnel end point) addresses.

When a packet is presented to the tunnel interface, it looks up the route entry of the destination address. The route contents tell the tunneling routine how the packet is to be encapsulated and forwarded. The route's gateway address indicates what underlying network to use, and the route's destination address indicates what type of packet is being tunneled.

When a tunneled packet is received, the initial header is stripped and the resulting packet is placed on the appropriate IPv6 or IPv4 ifqueue.
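
How the single tunnel interface might read a route entry can be sketched as follows. Protocol number 41 identifies an encapsulated IPv6 packet and 4 an encapsulated IPv4 packet. The route structure and names are hypothetical, and so is the detail of automatic tunneling: this sketch assumes the IPv4 end point is taken from the low 32 bits of an IPv4-compatible destination (::a.b.c.d), which the text does not spell out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum tun_family { TUN_V4, TUN_V6 };   /* hypothetical */

/* Hypothetical slice of a route entry as seen by the tunnel interface. */
struct tun_route {
    enum tun_family dst_family;   /* type of packet being tunneled */
    enum tun_family gw_family;    /* underlying network that carries it */
};

/* Protocol / Next Header value in the outer header. */
static uint8_t tun_outer_proto(const struct tun_route *rt)
{ return rt->dst_family == TUN_V6 ? 41 : 4; }

/* Bytes prepended: an IPv6 header (40) or an option-less IPv4 header (20). */
static size_t tun_outer_hdr_len(const struct tun_route *rt)
{ return rt->gw_family == TUN_V6 ? 40 : 20; }

/* Automatic IPv6-over-IPv4 tunneling (assumed scheme): an IPv4-compatible
 * destination carries its IPv4 end point in the low 32 bits. The
 * unspecified (::) and loopback (::1) addresses are excluded. */
static bool auto_tunnel_endpoint(const uint8_t dst[16], uint8_t v4[4])
{
    static const uint8_t zero[12];
    if (memcmp(dst, zero, 12) != 0)
        return false;
    if (dst[12] == 0 && dst[13] == 0 && dst[14] == 0 && dst[15] <= 1)
        return false;
    memcpy(v4, dst + 12, 4);
    return true;
}
```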

Transports

One of the strengths of the IPng effort was the commitment to preserve the well-understood transports, TCP and UDP, upon which a wealth of applications have been built.

The IPv6 specification calls for three particular requirements of upper-layer protocols:

1. The pseudoheader checksum must accommodate larger addresses.

2. The maximum packet lifetime is no longer computed.

3. The larger IPv6 header(s) must be taken into account when computing the maximum payload size (e.g., TCP's maximum segment size [MSS]).

In addition to these mandated modifications, we had to make a fundamental design choice. With two different network layer protocols in the system, each using a different size address, our design choice could have been to use two independent transport modules, one for each network layer. Figures 5 and 6 show the independent versus the integrated transport design options. Although the independent model offers an element of design simplicity, it wastes memory by duplicating each transport layer function. In the Digital UNIX implementation, these modules are implemented in the kernel, and duplication would be expensive. Also, the design and use of a single programming interface to access both sets of services would be complicated.
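
For the third requirement in the list above, the MSS arithmetic can be sketched as follows, assuming option-less IPv4 and TCP headers; the helper name is hypothetical.

```c
#include <stdint.h>

#define IPV6_HDR_LEN 40u   /* fixed IPv6 header */
#define IPV4_HDR_LEN 20u   /* IPv4 header without options */
#define TCP_HDR_LEN  20u   /* TCP header without options */

/* Largest TCP payload that fits in one packet on a path with MTU path_mtu.
 * extra covers IPv6 extension headers (or IPv4 options) the sender adds.
 * Returns 0 if nothing fits. */
static uint32_t tcp_mss_for(uint32_t path_mtu, int ipv6, uint32_t extra)
{
    uint32_t overhead = (ipv6 ? IPV6_HDR_LEN : IPV4_HDR_LEN) + extra + TCP_HDR_LEN;
    return path_mtu > overhead ? path_mtu - overhead : 0;
}
```

On a 1,500-byte path this gives 1,460 bytes over IPv4 but 1,440 over IPv6; at the 576-byte IPv6 minimum it gives 516, or 508 if a fragment header must be added.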

Figure 5
Independent Transport Implementation (socket layer above the kernel; AF_INET uses a V4 transport and PCB over IPv4, and AF_INET6 uses a separate V6 transport and PCB over IPv6)

Figure 6
Integrated Transport Implementation (socket layer above the kernel; AF_INET and AF_INET6 share one V4-and-V6 transport and PCB over both IPv4 and IPv6)

The ability to maintain, let alone extend, the code base would also suffer. Fortunately, because IPv4 addresses are a well-defined subset of the entire IPv6 address space, it is relatively straightforward to implement the transports so that a single set of modules can be used over both network layers.20 To accomplish this, we increased the storage space allocated for addresses and separated those functions that are dependent upon a particular network layer. We discuss each of these issues in this section.

Storing Large Addresses

Two specific data structures must be modified to accommodate addresses larger than the 32-bit IPv4 type. The first of these is the sockaddr struct, which is used when dealing with the BSD socket layer and passed along to user applications. The second is the Internet Protocol Control Block (PCB) data structure, the in_pcb. In this section, we review the modifications to each structure.

A program that uses a transport does so by means of the BSD sockets interface and passes addressing information in a sockaddr structure. For IPv6, this is a sockaddr_in6. Internally, the structure is defined so that 64-bit alignment is preserved; however, it has the following public definition:

```c
struct sockaddr_in6 {
    u_char          sin6_len;         /* length of this structure */
    u_char          sin6_family;      /* address family: AF_INET6 */
    u_short         sin6_port;        /* transport-layer port number */
    u_int           sin6_flowlabel;   /* IPv6 flow label */
    struct in6_addr sin6_addr;        /* 128-bit IPv6 address */
};
```

Although the concept of a sockaddr is generic in the BSD architecture, the flow label and in6_addr members of this structure are unique to IPv6 and would be used only in the AF_INET6 address family. The details of this are specified in Reference 21.

The in_pcb data structure is created for each socket using TCP, UDP, or other clients of the network layer. In addition to storing the source and destination addresses, various other pieces of information required for proper communication are stored here, including the port numbers, options and flags, a pointer to the socket receiving the data, a header template, and a pointer to the routing entry for the given destination. For IPv6, this basic model has been retained, and additional information is stored. This information includes local and remote flow labels and indicators of which address family the application is using and which network layer the transport communication is using.

Finally, a partial checksum of the transport pseudoheader is stored here as well; its use is described in the following section.
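
One way such a cached partial sum could be used is sketched below; the section that describes its actual use falls outside this excerpt. The IPv6 pseudoheader consists of the source and destination addresses, a 32-bit upper-layer length, three zero octets, and the next header value. The sum over the two addresses is fixed for a connection, so it can be computed once. The structure and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Add network-order 16-bit words to a 32-bit accumulator; an odd final byte
 * is padded with zero. Safe for any IPv6 packet length (< 128 KB). */
static uint32_t sum_bytes(uint32_t sum, const uint8_t *p, size_t n)
{
    for (size_t i = 0; i + 1 < n; i += 2)
        sum += (uint32_t)(p[i] << 8 | p[i + 1]);
    if (n & 1)
        sum += (uint32_t)p[n - 1] << 8;
    return sum;
}

/* Fold a 32-bit accumulator into a 16-bit ones-complement sum. */
static uint16_t fold16(uint32_t sum)
{
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Hypothetical slice of a PCB holding the cached partial pseudoheader sum. */
struct pcb_sketch {
    uint8_t  laddr[16], faddr[16];
    uint16_t ph_partial;
};

/* Once per connection: sum the two addresses. */
static void pcb_cache_pseudo(struct pcb_sketch *pcb)
{
    uint32_t s = sum_bytes(0, pcb->laddr, 16);
    s = sum_bytes(s, pcb->faddr, 16);
    pcb->ph_partial = fold16(s);
}

/* Per packet: add the 32-bit length and the next header, then the segment. */
static uint16_t pcb_cksum(const struct pcb_sketch *pcb, uint8_t next_hdr,
                          const uint8_t *seg, size_t len)
{
    uint32_t s = pcb->ph_partial;
    s += (uint32_t)(len >> 16) + (uint32_t)(len & 0xffff);  /* length field */
    s += next_hdr;                       /* three zero octets + next header */
    s = sum_bytes(s, seg, len);
    return (uint16_t)~fold16(s);
}
```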

In addition to the explicit storage of the network layer and address family, the fundamental technique that facilitates the use of a common transport is the storage of IPv4 addresses in an IPv6 format. This is known as an IPv4-mapped address and is described in "IP Version 6 Addressing Architecture."20 This address format is explicitly reserved to store addresses of systems that are capable of using only the IPv4 protocol, and thus is an appropriate form of storage in the PCB for communications that will be sent using the IPv4 protocol, as opposed to IPv4-compatible addresses, which are sent using IPv6 packets. These mapped addresses are of the following form:

0000:0000:0000:0000:0000:FFFF:204.123.2.75

These addresses are manipulated within the IPv4 TCP and UDP protocols by means of macros that allow the IPv4 addresses to be inserted, extracted, or compared while in an IPv6 address structure (in6_addr). As an example, the code fragment in Figure 7 shows an address being extracted for use in evaluating a configurable IPv4 socket option.
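
The insert, extract, and compare operations might look like the following sketch. The text says macros are used; these are written as small functions for readability, and all names, including the in6_addr stand-in, are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for in6_addr: 16 bytes in network order. */
struct in6_addr_sketch { uint8_t s6_addr[16]; };

/* First 96 bits of an IPv4-mapped address: 80 zero bits, then 0xffff. */
static const uint8_t v4mapped_prefix[12] =
    {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff};

/* Insert: build ::ffff:a.b.c.d from a 4-byte IPv4 address. */
static void v4mapped_set(struct in6_addr_sketch *a, const uint8_t v4[4])
{
    memcpy(a->s6_addr, v4mapped_prefix, 12);
    memcpy(a->s6_addr + 12, v4, 4);
}

static bool v4mapped_is(const struct in6_addr_sketch *a)
{
    return memcmp(a->s6_addr, v4mapped_prefix, 12) == 0;
}

/* Extract: copy out the embedded IPv4 address. */
static void v4mapped_get(const struct in6_addr_sketch *a, uint8_t v4[4])
{
    memcpy(v4, a->s6_addr + 12, 4);
}

/* Compare: does this mapped address hold the given IPv4 address? */
static bool v4mapped_eq_v4(const struct in6_addr_sketch *a, const uint8_t v4[4])
{
    return v4mapped_is(a) && memcmp(a->s6_addr + 12, v4, 4) == 0;
}
```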

Special Transport/Network Layer Interactions

Within the integrated transport layers, the transport protocol is treated independently of the particular network layer being used, and network-layer-specific functions are used to interface to either IPv4 or IPv6.

There are two particular instances in which the transport layer has interactions with the IPv6 network layer over and above the exchange of data packets for input or output. These are the notification and update of path MTU, which is required in IPv6, and the potential to refresh the neighbor discovery cache based on forward progress; i.e., if the transport knows that data is reaching its destination, it can validate the
