
Digital Technical Journal

Digital Equipment Corporation

Volume 3 Number 1 Winter 1991


Cover Design

Transaction processing is the common theme for papers in this issue. The automatic teller machine on our cover represents one of the many businesses that rely on TP systems. If we could look behind the familiar machine, we would see the products and technologies - here symbolized by linked databases - that support reliable and speedy processing of transactions worldwide.

The cover was designed by Dave Bryant of Digital's Media Communications Group.

Editorial

Jane C. Blake, Editor

Kathleen M. Stetson, Associate Editor

Circulation

Catherine M. Phillips, Administrator
Suzanne J. Babineau, Secretary

Production

Helen L. Patterson, Production Editor
Nancy Jones, Typographer

Peter Woodbury, Illustrator

Advisory Board

Samuel H. Fuller, Chairman
Richard W. Beane
Robert M. Glorioso
Richard J. Hollingsworth
John W. McCredie
Alan G. Nemeth
Mahendra R. Patel
F. Grant Saviers
Robert K. Spitz
Victor A. Vyssotsky
Gayn B. Winters

The Digital Technical Journal is published quarterly by Digital Equipment Corporation, 146 Main Street MLO1-3/B68, Maynard, Massachusetts 01754-2571. Subscriptions to the Journal are $40.00 for four issues and must be prepaid in U.S. funds. University and college professors and Ph.D. students in the electrical engineering and computer science fields receive complimentary subscriptions upon request. Orders, inquiries, and address changes should be sent to The Digital Technical Journal at the published-by address.

Inquiries can also be sent electronically to DTJ@CRL.DEC.COM.

Single copies and back issues are available for $16.00 each from Digital Press of Digital Equipment Corporation, 12 Crosby Drive, Bedford, MA 01730-1493.

Digital employees may send subscription orders on the ENET to RDVAX::JOURNAL or by interoffice mail to mailstop MLO1-3/B68.

Orders should include badge number, cost center, site location code and address. All employees must advise of changes of address.

Comments on the content of any paper are welcomed and may be sent to the editor at the published-by or network address.

Copyright © 1991 Digital Equipment Corporation. Copying without fee is permitted provided that such copies are made for use in educational institutions by faculty members and are not distributed for commercial advantage. Abstracting with credit of Digital Equipment Corporation's authorship is permitted. All rights reserved.

The information in this journal is subject to change without notice and should not be construed as a commitment by Digital Equipment Corporation. Digital Equipment Corporation assumes no responsibility for any errors that may appear in this journal.

ISSN 0898-901X

Documentation Number EY-F588E-DP

The following are trademarks of Digital Equipment Corporation:

DEC, DECforms, DECintact, DECnet, DECserver, DECtp, Digital, the Digital logo, LAT, Rdb/VMS, TA, VAX ACMS, VAX CDD, VAX COBOL, VAX DBMS, VAX Performance Advisor, VAX RALLY, VAX Rdb/VMS, VAX RMS, VAX SPM, VAX SQL, VAX 6000, VAX 9000, VAXcluster, VAXft, VAXserver, VMS.

IBM is a registered trademark of International Business Machines Corporation.

TPC Benchmark is a trademark of the Transaction Processing Performance Council.

Book production was done by Digital's Educational Services Media Communications Group in Bedford, MA.


Contents

Foreword
Carlos G. Borgialli

Transaction Processing, Databases, and Fault-tolerant Systems

DECdta - Digital's Distributed Transaction Processing Architecture
Philip A. Bernstein, William T. Emberton, and Vijay Trehan

Digital's Transaction Processing Monitors
Thomas G. Speer and Mark W. Storm

Transaction Management Support in the VMS Operating System Kernel
William A. Laing, James E. Johnson, and Robert V. Landau

Performance Evaluation of Transaction Processing Systems
Walter H. Kohler, Yun-Ping Hsu, Thomas K. Rogers, and Wael H. Bahaa-El-Din

Tools and Techniques for Preliminary Sizing of Transaction Processing Applications
William Z. Zahavi, Frances A. Habib, and Kenneth J. Omahen

Database Availability for Transaction Processing
Ananth Raghavan and T. K. Rengarajan

Designing an Optimized Transaction Commit Protocol
Peter M. Spiro, Ashok M. Joshi, and T. K. Rengarajan

Verification of the First Fault-tolerant VAX System
William F. Bruckert, Carlos Alonso, and James M. Melvin

Editor's Introduction

Jane C. Blake, Editor

Digital's transaction processing systems are integrated hardware and software products that operate in a distributed environment to support commercial applications, such as bank cash withdrawals, credit card transactions, and global trading. For these applications, data integrity and continuous access to shared resources are necessary system characteristics; anything less would jeopardize the revenues of business operations that depend on these applications. Papers in this issue of the Journal look at some of Digital's technologies and products that provide these system characteristics in three areas: distributed transaction processing, database access, and system fault tolerance.

Opening the issue is a discussion of the architecture, DECdta, which ensures reliable interoperation in a distributed environment. Phil Bernstein, Bill Emberton, and Vijay Trehan define some transaction processing terminology and analyze a TP application to illustrate the need for separate architectural components. They then present overviews of each of the components and interfaces of the distributed transaction processing architecture, giving particular attention to transaction management.

Two products, the ACMS and DECintact monitors, implement several of the functions defined by the DECdta architecture and are the twin topics of a paper by Tom Speer and Mark Storm. Although based on different implementation strategies, both ACMS and DECintact provide TP-specific services for developing, executing, and managing TP applications. Tom and Mark discuss the two strategies and then highlight the functional similarities and differences of each monitor product.

The ACMS and DECintact monitors are layered on the VMS operating system, which provides base services for distributed transaction management. Described by Bill Laing, Jim Johnson, and Bob Landau, these VMS services, called DECdtm, are an addition to the operating system kernel and address the problem of integrating data from multiple systems and databases. The authors describe the three DECdtm components, an optimized implementation of the two-phase commit protocol, and some VAXcluster-specific optimizations.

The next two papers turn to the issues of measuring TP system performance and of sizing a system to ensure a TP application will run efficiently. Walt Kohler, Yun-Ping Hsu, Tom Rogers, and Wael Bahaa-El-Din discuss how Digital measures and models TP system performance. They present an overview of the industry-standard TPC Benchmark A and Digital's implementation, and then describe an alternative to benchmark measurement: a multilevel analytical model of TP system performance that simplifies the system's complex behavior to a manageable set of parameters. The discussion of performance continues but takes a different perspective in the paper on sizing TP systems. Bill Zahavi, Fran Habib, and Ken Omahen have written about a methodology for estimating the appropriate system size for a TP application. The tools, techniques, and algorithms they describe are used when an application is still in its early stages of development.

High performance must extend to the database system. In their paper on database availability, Ananth Raghavan and T. K. Rengarajan examine strategies and novel techniques that minimize the effects of downtime situations. The two databases referenced in their discussion are the VAX Rdb/VMS and VAX DBMS systems. Both systems use a database kernel called KODA, which provides transaction capabilities and commit processing. Peter Spiro, Ashok Joshi, and T. K. Rengarajan explain the importance of commit processing relative to throughput and describe new designs for improving the performance of group commit processing. These designs were tested, and the results of these tests and the authors' observations are presented.

As important in TP systems as database availability is system availability. The topic of the final paper in this issue is a system designed to be continuously available, the VAXft 3000 fault-tolerant system. Authors Bill Bruckert, Carlos Alonso, and Jim Melvin give an overview of the system and then focus on the four-phase verification strategy devised to ensure transparent system recovery from errors.

I thank Carlos Borgialli for his help in preparing this issue and for writing the issue's Foreword.

Biographies

Carlos Alonso

A principal software engineer, Carlos Alonso is a team leader for the project to port the System-V operating system to the VAXft 3000. Previously, he was the project leader for various VAXft 3000 system validation development efforts. As a member of the research group, Carlos developed the test bed for evaluating concurrency control algorithms using the VMS Distributed Lock Manager, and he designed the prototype alternate lock rebuild algorithm for cluster transitions. He holds a B.S.E.E. (1979) from Tulane University and an M.S.C.S. (1980) from Boston University.

Wael Hilal Bahaa-El-Din

Wael Bahaa-El-Din joined Digital in 1987 as a senior consultant to the Systems Performance Group, Database Systems. He has led a number of studies to evaluate the performance of database and transaction processing systems under response time constraints. After receiving his Ph.D. (1984) in computer and information science from Ohio State University, Wael spent three years as an assistant professor at the University of Houston. He is a member of ACM and IEEE, and he has written numerous articles for professional journals and conferences.

Philip A. Bernstein

As a senior consultant engineer, Philip Bernstein is both an architectural consultant in the Transaction Processing Systems Group and a researcher at the Cambridge Research Laboratory. Prior to joining Digital in 1987, he was a professor at Wang Institute of Graduate Studies and at Harvard University, a vice president at Sequoia Systems, and a researcher at the Computer Corporation of America. He has published over 60 papers and coauthored two books. Phil received a B.S. (1971) in engineering from Cornell University and a Ph.D. (1975) in computer science from the University of Toronto.

William F. Bruckert

William Bruckert is a consulting engineer who joined Digital in 1969 after receiving a B.S.E.E. degree from the University of Massachusetts. He received an M.S.E.E./C.E. degree from the same university in 1981. Beginning as a worldwide product support engineer, Bill later worked on a number of DECsystem-10/20 designs. He developed the cache, memory, and I/O subsystem for the VAX 8600 processor and was the system architect of the VAX 8650 processor. His most recent role was as the architect of the VAXft 3000 system. Bill currently holds seven patents.

William T. Emberton

As a principal software engineer, William Emberton is currently involved in the development of Queue Management Architecture. He is also involved in X/Open and POSIX TP Standards work and is a member of the team that is developing the overall DECtp product architecture. Previously, he worked on the initial versions of the DECdta architecture. Before coming to Digital in 1987, Bill held positions as Director of Software Development at National Semiconductor and Manager of Systems Development for International Retail Systems at NCR. He was educated at London University.

Frances A. Habib

Fran Habib is a principal software engineer involved with the development of transaction processing workload characterization and sizing tools. Previously, Fran worked at Data General and GTE Laboratories as a management science consultant. She holds an M.S. in operations research from MIT and a B.S. in engineering and applied science from Harvard. Fran is a full member of ORSA and belongs to ACM, IEEE, and the ACM SIGMETRICS special interest group on modeling and performance evaluation of computer systems.

Yun-Ping Hsu

Yun-Ping is currently a principal software engineer in the Transaction Processing Systems Performance and Characterization Group. He joined Digital in October 1987, after receiving his master's degree in electrical and computer engineering from the University of Massachusetts at Amherst. In his position, Yun-Ping is responsible for performance modeling and benchmark measurement of both ACMS- and DECintact-based TP systems. He also participated in the TPC Benchmark A standardization activity during 1989. He is a member of ACM and IEEE.

James E. Johnson

A consulting software engineer, Jim Johnson has worked for the VMS Engineering Group since joining Digital in 1984. He is currently a project leader for VMS Engineering in Europe. Prior to this work, Jim led the RMS project, and after relocating to the UK three years ago, he was responsible for much of the design and implementation of the DECdtm services. At the same time, Jim was an active participant in the transaction management architecture review group. He has applied for a patent pertaining to the two-phase commit protocol optimization currently used in DECdtm services.

Ashok M. Joshi

Ashok Joshi is a principal software engineer interested in database systems, transaction processing, and object-based programming. He is presently working on the KODA subsystem, which provides record storage for Rdb/VMS and DBMS software. For the Rdb/VMS project, he developed hash indexing and record placement features, and he worked on optimizing the lock protocols. Ashok came to Digital after receiving a bachelor's degree in electrical engineering from the Indian Institute of Technology, Bombay, and a master's degree in computer science from the University of Wisconsin, Madison.


TP benchmark standards activities. Before joining Digital in 1988, Walt was a visiting scientist and technical consultant to Digital and a professor of electrical and computer engineering at the University of Massachusetts at Amherst. He holds B.S., M.S., and Ph.D. degrees in electrical engineering, all from Princeton University. Walt recently received the IEEE/CS Meritorious Service Award, and he has published over 25 technical articles.

William A. Laing

William Laing is a senior consultant engineer based in Newbury, England. He is the technical leader for production systems support for the VMS operating system. During five years spent in the U.S., Bill was responsible for the design and initial development of symmetrical multiprocessing support in the VMS system. He joined Digital in 1981, after doing research on operating systems at Edinburgh University for nine years. Bill holds a B.Sc. (1972) in mathematics and computer science and an M.Phil. (1976) in computer science, both from Edinburgh University.

Robert V. Landau

Principal software engineer Robert Landau is a member of the VMS Engineering Group, based in Newbury, England. He is currently the project leader of a VMS advanced development team investigating a high-performance, transaction-based, flat file system. Before joining Digital in 1987, Bob worked for a variety of software houses specializing in database-related products. He studied botany at London University and, subsequently, obtained a teaching qualification from Hereford College.

James M. Melvin

As a principal design engineer, Jim was responsible for the specification of hardware error-handling mechanisms in the VAXft system and is presently an engineering project leader for future VAXft systems. He also specified and led the implementation of the hardware system simulation platform and the hardware design verification test plan. Jim joined Digital in 1984 and holds a B.S.E.E. (1984) and an M.S. (1989) in engineering management from Worcester Polytechnic Institute. He holds three patents on the VAXft 3000 system, all related to error handling in a fault-tolerant system.

Kenneth J. Omahen

A principal engineer, Kenneth Omahen is developing object-oriented queuing network solvers. He designed a variety of performance tools and performed design support studies which influenced a number of Digital products. Prior to joining Digital, Ken worked at Bell Telephone Laboratories, lectured at the University of Newcastle-Upon-Tyne, and was a faculty member at Purdue University. He received a B.S. degree in science engineering from Northwestern University and M.S. and Ph.D. degrees in information sciences from the University of Chicago.


Ananth Raghavan

Since joining Digital in 1988, Ananth Raghavan has been a software engineer who has led projects for the KODA/Rdb Group. Previous to this position, he was a teaching assistant in the computer science department of the University of Wisconsin. Ananth holds a B.S. (1985) degree in mechanical engineering from the Indian Institute of Technology, Madras, and an M.S. (1987) degree in computer science from the University of Wisconsin, Madison. He has two patent applications pending for his work on undo and undo/redo database algorithms.

T. K. Rengarajan

T. K. Rengarajan has been a member of the Database Systems Group since 1987 and works on the KODA software kernel for database management systems. He is involved in the support for WORM devices and global buffer management in the VAXcluster environment. His work in the areas of boundary element methods and database management systems is reported in several published papers and patent applications. Ranga holds an M.S. degree in computer-aided design from the University of Kentucky and an M.S. in computer science from the University of Wisconsin.

Thomas K. Rogers

Thomas Rogers is a project leader for the Transaction Processing Systems Performance and Characterization Group. He is responsible for testing the VAX 9000 Model 210 system using the TPC Benchmark A standard. Prior to joining Digital in January 1988, Tom worked for Sperry Corporation as a technical specialist for the Northeast region. He received a bachelor of science degree in mathematical sciences in 1979 from Johns Hopkins University.

Thomas G. Speer

As a principal software engineer in the DECtp/East Engineering Group, Thomas Speer is currently leading the DECintact V2.0 project. In this position, his major responsibility is defining the requirements for DECintact support of DECdtm services, client/server database access, and support for the DECforms product. Since joining Digital in 1981, Tom has worked on several development projects, including FORTRAN-10/20 and RMS-20. He holds degrees from Harvard University, Rutgers University, and Simmons College. He is a member of Phi Beta Kappa.

Peter M. Spiro

Peter Spiro, a principal software engineer, is currently involved in optimizing database technology for RISC machines. He has worked on database facilities such as access methods, journaling and recovery, transaction protocols, and buffer management. Peter joined Digital in 1985, after receiving M.S. degrees in forest science and computer science from the University of Wisconsin. He has a patent pending for a method of database journaling and recovery, and he authored a paper for an earlier issue of the Digital Technical Journal. In addition, Peter enjoys the game of Ping-Pong.


TP products for more than ten years. Currently, he is acting technical director for the East Coast Transaction Processing Engineering Group, as well as managing a small advanced development group. After joining Digital in 1976, Mark worked on COBOL compilers for the PDP-11 systems and developed the first native COBOL compiler for the VAX computer. He holds a B.S. (with honors) in computer science from the University of Southern Mississippi.

Vijay Trehan

Since joining Digital in 1978, Vijay Trehan has contributed to several architecture projects. He is the technical director responsible for DECtp architecture, design, and standards work. Prior to this assignment, Vijay was the architect for the DECdtm protocol, architect for the DDIS data interchange format, and initiator of work on the DDIF document interchange format and compound document strategy. He holds a B.S. (1972) in mechanical engineering from the Indian Institute of Technology and an M.S. (1974) in operations research from Syracuse University.

William Z. Zahavi

As an engineering manager, Bill is responsible for the design and development of predictive sizing tools for transaction processing applications. Before joining Digital in 1987, he was a technical consultant for Sperry Corporation, specializing in systems performance analysis and capacity planning. Bill received an M.B.A. from Northeastern University and a B.S. in mathematics from the University of Virginia. He is an active member of the Computer Measurement Group and frequently presents at CMG conferences.

Foreword

Carlos G. Borgialli

Senior Manager, DECtp Software Engineering

Transaction processing is one of the largest, most rapidly growing segments of the computer industry. Digital's strategy is to be a leader in transaction processing, and toward that end we are making technological advances and delivering products to meet the evolving needs of businesses that rely on transaction processing systems.

Because of the speed and reliability with which transaction processing systems capture and display up-to-date information, they enable businesses to make well-informed, timely decisions. Industries for which transaction processing systems are a significant asset include banking, laboratory automation, manufacturing, government, and insurance. For these industries and others, transaction processing is an information lifeline that supports the achievement of daily business objectives and in many instances provides a competitive advantage.

Many older transaction processing systems on which businesses rely are centralized and tied to a particular vendor. A great deal of money and time has been invested in these systems to keep pace with business expansion. As expansion continues beyond geographic boundaries, however, the centralized, single-vendor transaction processing systems are less and less likely to offer the flexibility needed for round-the-clock, reliable business operations conducted worldwide. Transaction processing technology therefore must evolve to respond to the new business environment and at the same time protect the investment made in existing systems.

Our research efforts and innovative products provide the transaction processing systems that businesses need today. The demand for distributed rather than centralized systems has focused attention on system management. Queuing services, highly available systems, heterogeneous environments, security services, and computer-aided software engineering (CASE) are a few examples of areas in which research and advanced development efforts have had and will continue to have a major impact on the capabilities of transaction processing systems.

Transaction processing solutions require the application of a wide range of technology and the integration of multiple software and hardware products: from desktop to mainframe; from presentation services and user interfaces to TP monitors, database systems, and computer-aided software engineering tools; from optimization of system performance to optimization of availability. Making all of this technology work well together is a great challenge, but a challenge Digital is uniquely positioned to meet.

Digital ensures broad application of its transaction processing technology by defining an architecture, the Digital Distributed Transaction Architecture (DECdta). DECdta, about which you will read in this issue, defines the major components of a Digital TP system and the way those components can form an integrated transaction processing system. The DECdta architecture describes how data and processing are easily distributed among multiple VAX processors, as well as how the components can interoperate in a heterogeneous environment.

The DECdta architecture is based on the client/server computing model, which allows Digital to apply its traditional strengths in networking and expandability to transaction processing system solutions. In the DECdta client/server computing model, the client portion interacts with the user to create processing requests, and the server portion performs the data manipulation and computation to execute the processing request. This computing model facilitates the division of a TP system into small components in three ways. It allows for distribution of functions among VAX processors; it partitions the work performed by one or more of the components to allow for parallel processing; or it replicates functions to achieve higher availability goals. These options permit the customer to purchase the configuration that meets present needs, confident that the system will allow smooth expansion in the future.

Further, the DECdta architecture sets a direction for its evolution through different products in a coordinated manner. It provides for the cooperation and interoperation of components implemented on different platforms, and it supports the expansion of customer applications to meet growth requirements. The DECdta architecture is designed to work with other Digital architectures such as the Digital Network Architecture (DNA), the network application services (NAS), and the Digital database architecture (DDA). Moreover, the DECdta architecture supports industry standards that enable the portability of applications and their interoperation in a heterogeneous environment, such as the standard application programming interfaces being developed by the X/Open Transaction Processing Working Group and the IEEE POSIX. Standard wire protocols that provide for systems interoperation in a multivendor, heterogeneous environment are being developed by the International Standards Organization as part of the Open System Interconnection activities.

Among the products Digital has developed specifically for TP systems are the TP monitors. These monitors provide the system integration "glue," if you will. Rather than act as their own systems integrators, customers who use Digital's TP monitors are able to spend more time on solving business problems and less time on solving software integration problems, such as how to make forms and database products work together smoothly.

Digital's TP monitors run on all types of hardware configurations, including local area networks (LANs), wide area networks (WANs), and VAXcluster systems. The DECdta client/server computing model provides the necessary flexibility to change hardware configurations, thus allowing reconfiguration without the need for any source code changes.

The two TP monitors, DECintact and VAX ACMS, integrate vital Digital technologies such as the Digital Distributed Transaction Manager (DECdtm) and products such as Digital's forms systems (DECforms) and our Rdb/VMS or VAX DBMS database products. DECdtm uses the two-phase commit protocol to solve the complex problem of coordinating updates to multiple data resources or databases.

Major developments in Digital's database products have enhanced the strengths of its overall product offerings. The two mainstream database products noted above, Rdb/VMS and VAX DBMS, layer on top of a database kernel called KODA, thus providing data access independent of any data model. The services made available by KODA, besides its high performance, allow Digital's database products to efficiently support TP applications as well as to provide rich functionality for general-purpose database applications.

For those TP systems that require user interfaces, DECforms provides a device-independent, easy-to-use human interface and permits the support of multiple devices and users within a single application.

TP systems that require high availability or continuous operations are supported by the VAX family of hardware and software. The introduction of the fault-tolerant VAXft 3000 system, added to the successful VAXcluster system, allows for a high level of system availability. Performance needs also are being met by a combination of hardware resources, including the VAX 9000 system.

This combination of architecture, software, and hardware technology, and support for emerging industry standards places Digital in an excellent position to become the industry leader for distributed, portable transaction processing systems.

The papers in this issue of the Journal provide a view of the key elements of Digital's distributed transaction processing technologies.

Many individuals, teams, organizations, and business partners are responsible for bringing Digital's TP vision to fruition. Their dedication, hard work, and creativity will continue to drive the development of new technologies that enhance our family of products and services.


Philip A. Bernstein
William T. Emberton
Vijay Trehan

DECdta - Digital's Distributed Transaction Processing Architecture

Digital's Distributed Transaction Processing Architecture (DECdta) describes the modules and interfaces that are common to Digital's transaction processing (DECtp) products. The architecture allows easy distribution of DECtp products. In particular, it supports client/server style applications. Distributed transaction management is the main function that ties DECdta modules together; it ensures that application programs, database systems, and other resource managers interoperate reliably in a distributed system.

Transaction processing (TP) is the activity of executing requests to access shared resources, typically databases. A computer system that is configured to execute TP applications is called a TP system.

A transaction is an execution of a set of operations on shared resources that has the following properties:

Atomicity. Either all of the transaction's operations execute, or the transaction has no effect at all.

Serializability. The set of all operations that execute on behalf of the transaction appears to execute serially with respect to the set of operations executed by every other transaction.

Durability. The effects of the transaction's operations are resistant to failures.

A transaction terminates by executing the commit or abort operation. Commit tells the system to install the effect of the transaction's operations permanently. Abort tells the system to undo the effects of the transaction's operations.
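The commit and abort operations can be illustrated with a minimal sketch. It is not DECtp code: it assumes Python's standard `sqlite3` module as a stand-in resource manager, and the `account` table and the `open_bank`, `balance`, and `transfer` helpers are hypothetical names chosen for the illustration. A transfer either commits both updates or aborts and leaves both balances untouched.

```python
import sqlite3

def open_bank():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
    conn.commit()
    return conn

def balance(conn, acct):
    row = conn.execute("SELECT balance FROM account WHERE id = ?", (acct,)).fetchone()
    return row[0]

def transfer(conn, src, dst, amount):
    """Run one transaction; return True if it committed, False if it aborted."""
    try:
        # Both updates belong to the same transaction.
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        if balance(conn, src) < 0:
            raise ValueError("insufficient funds")
        conn.commit()      # install the effects permanently
        return True
    except Exception:
        conn.rollback()    # abort: undo every operation of the transaction
        return False
```

Calling `transfer(conn, "A", "B", 500)` on the sample data aborts, so the credit already applied to account B is undone along with the debit.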

For enhanced reliability and availability, a TP application uses transactions to execute requests. That is, the application receives a request message (from a display, computer, or other device), executes one or more transactions to process the request, and possibly sends a reply to the originator of the request or to some other party specified by the originator.

TP applications are essential to the operation of many industries, such as finance, retail, health care, transportation, government, communications, and manufacturing. Given the broad range of applications of TP, Digital offers a wide variety of products with which to build TP systems.

DECtp is an umbrella term that refers to Digital's TP products. The goal of DECtp is to offer an integrated set of hardware and software products that supports the development, execution, and management of TP applications for enterprises of all sizes.

DECtp systems include software components that are specialized for TP, notably TP monitors such as the ACMS and DECintact TP monitors, and transaction managers such as the DECdtm transaction manager. DECtp systems also require the integration of general-purpose hardware products (processors, storage, communications, and terminals) and software products (operating systems, database systems, and communication gateways). These products are typically integrated as shown in Figure 1.

[Figure 1: Layering of Products to Support a TP Application. Layers, top to bottom: TP application; TP monitor, database systems, forms manager; operating system, communication system.]


Applications on DECtp systems can be designed using a client/server paradigm. This paradigm is especially useful for separating the work of preparing a request from that of running transactions. Request preparation can be done by a front-end system, that is, one that is close to the user, in which processor cycles are inexpensive and interactive feedback is easy to obtain. Transaction execution can be done by a larger back-end system, that is, one that manages large databases and may be far from the user. Back-end systems may themselves be distributed. Each back-end system manages a portion of the enterprise database and executes applications, usually ones that make heavy use of the database on that back end. DECtp products are modularized to allow easy distribution across front ends and back ends, which enables them to support client/server style applications. DECtp systems thereby simplify programming and reconfiguration in a distributed system.

Digital's Distributed Transaction Processing Architecture (DECdta) defines the modularization and distribution structure that is common to DECtp products. Distributed transaction management is the main function that ties this structure together. This paper describes the DECdta structure and explains how DECdta components are integrated by distributed transaction management.

Current versions of DECtp products implement most, but not all, modules and interfaces in the DECdta architecture. Gaps between the architecture and products will be filled over time. DECtp products that currently implement DECdta components are referenced throughout the paper.

TP Application Structure

By analyzing TP applications, we can see where the need arises for separate DECdta components. A typical TP application is structured as follows:

Step 1: The client application interacts with a user (a person or machine) to gather input, e.g., using a forms manager.

Step 2: The client maps the user's input into a request, that is, a message that asks the system to perform some work. The client sends the request to a server application to process the request. A request may be direct or queued. If direct, the client expects a server to process the request right away. If queued, the client deposits the request in a queue from which a server can dequeue the request later.


Step 3: A server processes the request by executing one or more transactions. Each transaction may

a. Access multiple resources

b. Call programs, some of which may be remote

c. Generate requests to execute other transactions

d. Interact with a user

e. Return a reply when the transaction finishes

Step 4: If the transaction produces a reply, then the client interacts with the user to display that reply, e.g., using a forms manager.
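The difference between direct and queued requests in Step 2 can be sketched as follows. This is a simplified, single-process illustration; the `Request` fields, the `debit_server` function, and the in-memory queue are assumptions made for the example and do not correspond to any DECtp interface.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    user: str
    program: str    # name of the server application to run (hypothetical)
    data: dict


def debit_server(request):
    """Hypothetical server application: processes one request, returns a reply."""
    return f"debited {request.data['amount']} for {request.user}"


SERVERS = {"debit": debit_server}
request_queue = deque()


def send_direct(request):
    # Direct request: the client expects a server to process it right away.
    return SERVERS[request.program](request)


def send_queued(request):
    # Queued request: the client deposits it; a server dequeues it later.
    request_queue.append(request)


def run_queued_server():
    replies = []
    while request_queue:
        req = request_queue.popleft()
        replies.append(SERVERS[req.program](req))
    return replies


reply = send_direct(Request("alice", "debit", {"amount": 20}))
send_queued(Request("bob", "debit", {"amount": 5}))
pending = len(request_queue)
later = run_queued_server()
```

In the queued case the client and server need not run at the same time, which is the property the paper later attributes to queue resource managers.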

Each of the above steps involves the interaction of two or more programs. In many cases, it is desirable that these programs be distributed. To distribute them conveniently, it is important that the programs be in separate components. For example, consider the following:

The presentation service that operates the display and the application that controls which form to display may be distributed.

One may want to off-load presentation services and related functions to front ends, while allowing programs on back ends to control which forms are displayed to users. This capability is useful in Steps 1, 3d, and 4 above to gather input and display output. To ensure that the presentation service and application can be distributed, the presentation service should correspond to a separate DECdta component.

The client application that sends a request and the server application that processes the request may be distributed. The applications may communicate through a network or a queue.

In Step 2, front-end applications may want to send requests directly to back-end applications or to place requests in queues that are managed on back ends. Similarly, in Step 3c, a transaction, T, may enqueue a request to run another transaction, where the queue resides on a different system than T. To maximize the flexibility of distributing request management, request management should correspond to a separate DECdta component.

Two transaction managers that want to run a commit protocol may be distributed.

For a transaction to be distributed across different systems, as in Step 3b, the transaction management

Transaction Processing, Databases, and Fault-tolerant Systems

services must be distributed. To ensure that each transaction is atomic, the transaction managers on these systems must control transaction commitment using a common commit protocol. To complicate matters, there is more than one widely used protocol for transaction commitment. To the extent possible, a system should allow interoperation of these protocols.

To ensure that transaction managers can be distributed, the transaction manager should be a component of DECdta. To ensure that they can interoperate, their transaction protocol should also be in DECdta. To ensure that different commit protocols can be supported, the part of transaction management that defines the protocol for interaction with remote transaction managers should be separated from the part that coordinates transaction execution across local resources. In the DECdta architecture, the former is called a communication manager, and the latter is called a transaction manager.

Interoperation of transaction managers and resource managers, such as database systems, also affects the modularization of DECdta components. A transaction may involve different types of resources, as in Step 3a. For example, it may update data that is managed by different database systems. To control transaction commitment, the transaction manager must interact with different resource managers, possibly supplied by different vendors. This requires that resource managers be separate components of DECdta.

The DECdta Architecture

Having seen where the need for DECdta components arises, we are now ready to describe the DECdta architecture as a whole, including the functions of and interfaces to each component.

Most DECdta interfaces are public. Some of the public interfaces are controlled by official standards bodies and industry consortia; i.e., they are "open" interfaces. Others are controlled solely by Digital. DECdta interfaces and protocols will be published and aligned with industry standards, as appropriate.

DECdta components are abstract entities. They do not necessarily map one-to-one to hardware components, software components (e.g., programs or products), or execution environments (e.g., a single-threaded process, a multithreaded process, or an operating system service). Rather, a DECdta component may be implemented as multiple software components, for example, as several processes. Alternatively, several DECdta components may be implemented as a single software component. For example, an operating system or TP monitor typically offers the facilities of more than one DECdta component.

The following are the components of DECdta:

An application program is any program that uses services of DECdta components.

A resource manager manages resources that support transaction semantics.

A transaction manager coordinates transaction termination (i.e., commit and abort).

A communication manager supports a transaction communication protocol between TP systems.

A presentation manager supports device-independent interactions with a presentation device.

A request manager facilitates the submission of requests to execute transactions.

DECdta components are layered on services that are provided by the underlying operating system and distributed system platform, and are not specific to TP, as shown in Figure 2.

Application Program

We use the term application program to mean a program that uses the services provided by other DECdta components. An application program could be a customer-written program, a layered product, or a DECdta component.

In the DECdta architecture, we distinguish two special types of application program: request initiators and transaction servers. A request initiator is a DECdta component that prepares and submits a request for the execution of a transaction. To create a request, the request initiator usually interacts with a presentation manager that provides an interface to a device, such as a terminal, a workstation, a digital private branch exchange, or an automated teller machine.

A transaction server can demarcate a transaction, interact with one or more resource managers to access recoverable resources on behalf of the transaction, invoke other transaction servers, and respond to calls from request initiators.

For a simple request, a transaction server receives the request, processes it, and optionally returns a reply to the request initiator. A conversational request is like a simple request, except that while processing the request, the transaction server exchanges one or more messages with the user, usually through the request initiator.

[Figure 2: DECdta Components and Interfaces. Application programs: request initiator, transaction server. TP services: request manager, presentation manager, resource manager, transaction manager, communication managers. Operating system and distributed system services: distributed name service, distributed time service, thread management service, UID service, authentication service, other.]

In principle, a request initiator could also execute transactions (not shown in Figure 2). That is, the distinction between request initiators and transaction servers is for clarity only, and does not restrict an application from performing request initiation functions in a transaction. Architecturally, this amounts to saying that request initiation functions can execute in a transaction server.

Resource Manager

A resource manager performs operations on shared resources. We are especially interested in recoverable resource managers, those that obey transaction semantics. In particular, a recoverable resource manager undoes a transaction's updates to the resources if the transaction aborts. Other recoverable resource manager activities in support of transactions are described in the next section. In the rest of this paper, we use "resource manager" to mean "recoverable resource manager."

In a TP system, the most common kind of resource manager is a database system. Some presentation managers and communication managers may also be resource managers. A resource manager may be written by a customer, a third party, or Digital.

Each resource manager type offers a resource-manager-specific interface that is used by application programs to access and modify recoverable resources managed by the resource manager. A description of these resource manager interfaces is outside the scope of DECdta. However, many of these resource manager interfaces have architectures defined by industry standards, such as SQL (e.g., the VAX Rdb/VMS product), CODASYL data manipulation language (e.g., the VAX DBMS product), and COBOL file operations (e.g., RMS in the VMS system).

One type of resource manager that plays a special role in TP systems is a queue resource manager. It manages recoverable queues, which are often used to store requests. It allows application programs to place elements into queues and retrieve them, so that application programs can communicate even though they execute independently and asynchronously. For example, an application program that sends elements can communicate with one that receives elements even if the two application programs are not operational simultaneously. This communication arrangement improves availability and facilitates batch input of elements.


A queue resource manager interface supports such operations as open-queue, close-queue, enqueue, dequeue, and read-element. The ACMS and DECintact TP monitors both have queue resource managers as components.
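A recoverable queue can be sketched in a few lines. The example below assumes a single process and one transaction at a time, and it makes enqueue and dequeue take effect only at commit; the class names and the method spellings (`enqueue`, `dequeue`, `read_element`) are illustrative and are not the ACMS or DECintact interfaces.

```python
class RecoverableQueue:
    """Toy queue whose changes take effect only when a transaction commits."""

    def __init__(self):
        self.elements = []

    def begin(self):
        return QueueTransaction(self)


class QueueTransaction:
    def __init__(self, queue):
        self.queue = queue
        self.added = []     # elements enqueued by this transaction
        self.taken = 0      # number of elements dequeued from the head

    def enqueue(self, element):
        self.added.append(element)

    def read_element(self):
        # Look at the next element without removing it.
        return self.queue.elements[self.taken]

    def dequeue(self):
        element = self.read_element()   # raises IndexError if nothing is left
        self.taken += 1
        return element

    def commit(self):
        q = self.queue
        q.elements = q.elements[self.taken:] + self.added

    def abort(self):
        # Forget everything; the queue keeps its prior contents.
        self.added, self.taken = [], 0


q = RecoverableQueue()

producer = q.begin()
producer.enqueue("request-1")
producer.commit()

failed_server = q.begin()
got = failed_server.dequeue()
failed_server.abort()               # the request goes back on the queue
still_there = list(q.elements)

server = q.begin()
got_again = server.dequeue()
server.commit()
```

Because the aborted dequeue is undone, a request is not lost if the server fails before committing, which is the reason queues of this kind are useful for storing requests.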

Transaction Manager

A transaction manager supports the transaction abstraction. It is responsible for ensuring the atomicity of each transaction by telling each resource manager in a transaction when to commit. It uses a two-phase commit protocol to ensure that either all resource managers accessed by a transaction commit the transaction or they all abort the transaction. To support transaction atomicity, a transaction manager provides the following functions:

Transaction demarcation operations allow application programs or resource managers to start and commit or abort a transaction. (Resource managers sometimes start a transaction to execute a resource operation if the caller is not executing a transaction. The SQL standard requires this.)

Transaction execution operations allow resource managers and communication managers to declare themselves part of an existing transaction.

Two-phase commit operations allow resource managers and communication managers to change a transaction's state (to "prepared," "committed," or "aborted").

The serializability of transactions is primarily the responsibility of the resource managers. Usually, a resource manager ensures serializability by setting locks on resources accessed by each transaction, and by releasing the locks after the transaction manager tells the resource manager to commit. (The latter activity makes serializability partly the responsibility of the transaction manager.) If transactions become deadlocked, a resource manager may detect the deadlock and abort one of the deadlocked transactions.
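The locking discipline described above can be sketched with a small lock table. The sketch assumes exclusive locks only and no waiting or deadlock detection; `LockTable` and its methods are hypothetical names chosen for the example.

```python
class LockTable:
    """Toy exclusive-lock table kept by a resource manager."""

    def __init__(self):
        self.owner = {}     # resource -> id of the transaction holding the lock

    def acquire(self, txn, resource):
        # Grant the lock if it is free or already held by txn.
        holder = self.owner.setdefault(resource, txn)
        return holder == txn    # False means txn would have to wait

    def release_all(self, txn):
        # Called only after the transaction manager says commit (or abort).
        for resource in [r for r, t in self.owner.items() if t == txn]:
            del self.owner[resource]


locks = LockTable()
t1_first = locks.acquire("T1", "account-7")
t2_blocked = locks.acquire("T2", "account-7")
locks.release_all("T1")     # the transaction manager told the RM to commit T1
t2_retry = locks.acquire("T2", "account-7")
```

Holding every lock until the commit decision arrives is what ties the resource manager's locking to the transaction manager's protocol.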

The durability of transactions is a responsibility of transaction managers and resource managers. The transaction manager is responsible for the durability of the commit or abort decision. A resource manager is responsible for the durability of operations of committed transactions. Usually, it ensures durability by storing a description of each transaction's resource operations and state changes in a stable (e.g., disk-resident) log. It can later use the log to reconstruct transactions' states while recovering from a failure.

A detailed description of the DECdta transaction manager component appears in the Transaction Manager Architecture section.

Communication Manager

A communication manager provides services for communication between named objects in a TP system, such as application programs and transaction managers. Some communication managers participate in coordinating the termination of a transaction by propagating the transaction manager's two-phase commit operations as messages to remote communication managers. Other communication managers propagate application data and transaction context, such as a transaction identifier, from one node to another. Some do both.

A TP system can support multiple communication managers. These communication managers can interact with other nodes using different commit protocols or message-passing protocols, and may be part of different name spaces, security domains, system management domains, etc. Examples are an IBM SNA LU6.2 communication manager or an ISO-TP communication manager.

By supporting multiple communication managers, the DECdta architecture enhances the interoperability of TP systems. Different TP systems can interoperate by executing a transaction using different commit protocols.

A communication manager offers an interface for application programs to communicate with other application programs. Different communication managers may offer different communication paradigms, such as remote procedure call or peer-to-peer message passing.

A communication manager also has an interface to its local transaction manager. It uses this interface to tell the transaction manager when a transaction has spread to a new node and to obtain information about transaction commitment, which it exchanges with communication managers on remote nodes.

Presentation Manager

A presentation manager provides an application program with a record-oriented interface to a presentation device. Its services are used by application programs, usually request initiators. By using presentation manager services, instead of directly accessing a presentation device, application programs become device independent.


A forms manager is one type of presentation manager. Just as a database system supports operations to define, open, close, and access databases, a forms manager supports operations to define, enable, disable, and access forms. A form includes the definition of the fields (with different attributes) that make up the form. It also includes services to map the fields into device-independent application records, to perform data validation, and to perform data conversion to map fields onto device-specific frames.

One presentation manager is Digital's DECforms forms management product. The DECforms product is the first implementation of the ANSI/ISO Forms Interface Management Systems standard (CODASYL FIMS).

Request Manager

A request manager provides services to authenticate the source of requests (a user and/or a presentation device), to submit requests, and to receive replies from the execution of requests. It supports such operations as send-request and receive-reply. Send-request must provide the identity of the source device, the identity of the user who entered the request, the identity of the application program to be invoked, and the input data to the program.

A request manager can either pass the request directly to an application program, or it can store requests in a queue. In the latter case, another request manager can subsequently schedule the request by dequeuing the request and invoking an application program. The ACMS System Interface is an example of an existing request manager interface for direct requests. The ACMS Queued Transaction Initiator is an example of a request manager that schedules queued requests.

Transaction Manager Architecture

DECdta components are tied together by the transaction abstraction. Transactions allow application programs, resource managers, request managers (indirectly through queue resource managers), and communication managers to interoperate reliably. Since transactions play an especially important role in the DECdta architecture, we describe the transaction management functions in more detail.

The DECdta architecture includes interfaces between transaction managers and application programs, resource managers, and communication managers, as shown in Figure 3. It also includes a transaction manager protocol, whose messages are propagated by communication managers. This protocol is used by Digital's DECdtm distributed transaction manager.

[Figure 3: Transaction Manager Architecture. Recoverable labels: application program, other communication managers.]

From a transaction manager's viewpoint, a transaction consists of transaction demarcation operations, transaction execution operations, two-phase commit operations, and recovery operations.

The transaction demarcation operations are issued by an application program to a transaction manager and include operations to start and either end or abort a transaction.

Transaction execution operations are issued by resource managers and communication managers to a transaction manager. They include operations

- For a resource manager or communication manager to join an existing transaction

- For a communication manager to tell a transaction manager to start a new branch of a transaction that already exists at another node

Two-phase commit operations are issued by a transaction manager to resource managers, communication managers, and through communication managers to other transaction managers, and vice versa. They include operations

- For a transaction manager to ask a resource manager or communication manager to prepare, commit, or abort a transaction

- For a resource manager or communication manager to tell a transaction manager whether it has prepared, committed, or aborted a transaction

- For a communication manager to ask a transaction manager to prepare, commit, or abort a transaction

- For a transaction manager to tell a communication manager whether it has prepared, committed, or aborted a transaction

Recovery operations are issued by a resource manager to its transaction manager to determine the state of a transaction (i.e., committed or aborted).

In response to a start operation invoked by an application program, the transaction manager dispenses a unique transaction identifier for the transaction. The transaction manager that processes the start operation is that transaction's home transaction manager.

When an application program invokes an operation supported by a resource manager, the resource manager must find out the transaction identifier of the application program's transaction. This can happen in different ways. For example, the application program may tag the operation with the transaction identifier, or the resource manager may look up the transaction identifier in the application program's context. When a resource manager receives its first operation on behalf of a transaction, T, it must join T, meaning that it must tell a transaction manager that it is a subordinate for T. Alternatively, the DECdta architecture supports a model in which a resource manager may ask to be joined automatically to all transactions managed by its transaction manager, rather than asking to join each transaction separately.
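The start and join operations can be sketched as follows. The identifier format (node name plus a counter) and the method names `start`, `join`, and `join_all` are assumptions made for this illustration; the paper does not specify them.

```python
import itertools


class TransactionManager:
    """Toy home transaction manager that tracks who has joined each transaction."""

    def __init__(self, node):
        self.node = node
        self._counter = itertools.count(1)
        self.subordinates = {}      # transaction id -> set of joined participants
        self.auto_join = set()      # participants joined to every new transaction

    def start(self):
        # Dispense an identifier that is unique for this manager.
        tid = f"{self.node}:{next(self._counter)}"
        self.subordinates[tid] = set(self.auto_join)
        return tid

    def join(self, tid, participant):
        # A resource manager declares itself a subordinate for tid.
        self.subordinates[tid].add(participant)

    def join_all(self, participant):
        # Alternative model: join every transaction started from now on.
        self.auto_join.add(participant)


tm = TransactionManager("node1")
tm.join_all("queue-rm")

t1 = tm.start()
tm.join(t1, "rdb")      # first operation on behalf of t1 reaches the database
t2 = tm.start()
```

In this sketch automatic joining applies only to transactions started after the `join_all` call, a simplification chosen for brevity.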

A transaction, T, spreads from one node, Node 1, to another node, Node 2, by sending a message (through a communication manager) from an application program that is executing T at Node 1 to an application program at Node 2. When T sends a message from Node 1 to Node 2 for the first time, the communication managers at Node 1 and Node 2 must perform branch registration. This function may be performed automatically by the communication managers. Or, it may be done manually by the application program, which tells the communication managers at Node 1 and Node 2 that the transaction has spread to Node 2. In either case, the result is as follows: the communication manager at Node 1 becomes the subordinate of the transaction manager at Node 1 for T and the superior of the communication manager at Node 2 for T; and the communication manager at Node 2 becomes the superior of the transaction manager at Node 2 for T. This arrangement allows the commit protocol between transaction managers to be propagated properly by communication managers.
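The superior and subordinate relationships that result from branch registration can be written down as a small tree. The sketch below is only a model of those relationships; the `TM@` and `CM@` labels and the function names are made up for the example.

```python
from collections import defaultdict

# superior -> list of its subordinates for transaction T
subordinates = defaultdict(list)


def register_branch(src, dst):
    """Record the relationships created when T spreads from src to dst."""
    subordinates[f"TM@{src}"].append(f"CM@{src}")   # CM is subordinate of local TM
    subordinates[f"CM@{src}"].append(f"CM@{dst}")   # and superior of the remote CM
    subordinates[f"CM@{dst}"].append(f"TM@{dst}")   # remote CM is superior of remote TM


def commit_path(root):
    """Depth-first order in which a commit request would be propagated."""
    order = [root]
    for sub in subordinates[root]:
        order.extend(commit_path(sub))
    return order


register_branch("Node1", "Node2")
path = commit_path("TM@Node1")
```

Here `path` lists the home transaction manager first and the remote transaction manager last, matching the chain described in the text.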

After the transaction is done with its application work, the application program that started transaction T may invoke an "end" operation at the home transaction manager to commit T. This causes the home transaction manager to ask its subordinate resource managers and communication managers to try to commit T. The transaction manager does this by using a two-phase commit protocol. The protocol ensures that either all subordinate resource managers commit the transaction or they all abort the transaction.

In phase 1, the home transaction manager asks its subordinates for T to prepare T. A subordinate prepares T by doing what is necessary to guarantee that it can either commit T or abort T if asked to do so by its superior; this guarantee is valid even if it fails immediately after becoming prepared. To prepare T,

Each subordinate for T recursively propagates the prepare request to its subordinates for T

Each resource manager subordinate writes all of T's updates to stable storage

Each resource manager and transaction manager subordinate writes a prepare-record to stable storage

A subordinate for T replies with a "yes" vote if and when it has completed its stable writes and all of its subordinates for T have voted "yes"; otherwise, it votes "no." If any subordinate for T does not acknowledge the request to prepare within the timeout period, then the home transaction manager aborts T; the effect is the same as issuing an abort operation.

In phase 2, when the home transaction manager has received "yes" votes from all of its subordinates for T, it decides to commit T. It writes a commit record for T to stable storage and tells its subordinates for T to commit T. Each subordinate for T writes a commit record for T to stable storage and recursively propagates the commit request to its subordinates for T. A subordinate for T replies with an acknowledgment if and when it has committed the transaction (in the case of a resource manager subordinate) and has received acknowledgments from all subordinates for T. When the home transaction manager receives acknowledgments from all of its subordinates for T, the transaction commitment is complete.
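The two phases can be sketched in a single process. This is a simplified illustration, not the DECdtm protocol: it omits timeouts, acknowledgments, and real stable storage (a Python list stands in for the log), and `Participant`, `can_prepare`, and `end_transaction` are hypothetical names.

```python
class Participant:
    """Toy subordinate; can_prepare is a hypothetical switch for the demo."""

    def __init__(self, name, can_prepare=True, subordinates=()):
        self.name = name
        self.can_prepare = can_prepare
        self.subordinates = list(subordinates)
        self.log = []           # stands in for stable storage
        self.state = "active"

    def prepare(self):
        # Propagate recursively; vote yes only if everyone below voted yes.
        if not all(s.prepare() for s in self.subordinates):
            return False
        if not self.can_prepare:
            return False
        self.log.append("prepare")
        self.state = "prepared"
        return True

    def finish(self, decision):
        # Record the decision, then pass it down to our own subordinates.
        self.log.append(decision)
        self.state = "committed" if decision == "commit" else "aborted"
        for s in self.subordinates:
            s.finish(decision)


def end_transaction(home_log, subordinates):
    votes = [s.prepare() for s in subordinates]      # phase 1
    decision = "commit" if all(votes) else "abort"
    home_log.append(decision)                        # decision made durable
    for s in subordinates:                           # phase 2
        s.finish(decision)
    return decision


db1 = Participant("db1")
db2 = Participant("db2")
remote = Participant("cm", subordinates=[db2])
home_log = []
outcome = end_transaction(home_log, [db1, remote])

ok = Participant("db4")
bad = Participant("db3", can_prepare=False)
outcome2 = end_transaction([], [ok, bad])
```

In the second run a single "no" vote causes every participant to abort, including the one that had already prepared.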


To recover from a failure, all resource managers that participated in a transaction must examine their logs on stable storage to determine what to do. If the log contains a commit or abort record for T, then T completed. No action is required. If the log contains no prepare, commit, or abort record for T, then T was active. T must be aborted. If the log contains a prepare record for T, but no commit or abort record for T, T was between phases 1 and 2. The resource manager must ask its superior transaction manager whether to commit or abort the transaction.
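The three recovery cases translate directly into a decision function. The record names and return values below are illustrative labels, not a log format defined by the paper.

```python
def recovery_action(log_records):
    """Decide what a resource manager does for one transaction after a failure.

    log_records is the list of record types found in the log for the
    transaction, e.g. ["update", "prepare", "commit"].
    """
    if "commit" in log_records or "abort" in log_records:
        return "none"            # the transaction had completed
    if "prepare" in log_records:
        return "ask-superior"    # between phases 1 and 2: outcome unknown here
    return "abort"               # still active when the failure occurred


completed = recovery_action(["update", "prepare", "commit"])
in_doubt = recovery_action(["update", "prepare"])
active = recovery_action(["update"])
```

Only the middle case requires contact with the superior transaction manager, which is why a prepared resource manager can block if that manager is unavailable.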

An inherent problem in all two-phase commit protocols is that a resource manager is blocked between phases 1 and 2, that is, after voting "yes" and before receiving the commit or abort decision. It cannot commit or abort the transaction until the transaction manager tells it which to do. If its transaction manager fails, the resource manager may be blocked indefinitely, until either the transaction manager recovers or an external agent, such as a system manager, steps in to tell the resource manager whether to commit or abort.

A transaction T may spontaneously abort due to system errors at any time during its execution. Or, an application program (prior to completing its work) or a resource manager (prior to voting "yes") may tell its transaction manager to abort T. In either case, the transaction manager then tells all of its subordinates for T to undo the effects of T's resource manager operations. Subordinate resource managers abort T, and subordinate communication managers recursively propagate the abort request to their subordinates for T.

The two-phase commit protocol is optimized for those cases in which the number of messages exchanged can be reduced below that of the general case (e.g., if there is only one subordinate resource manager, if a resource manager did not modify resources, or if the presumed-abort protocol was used to save acknowledgments).

Summary

We have presented an overview of the DECdta architecture. As part of this overview, we introduced the components and explained the function of each interface. We also described the DECdta transaction management architecture in some detail. Over time, many interfaces of the DECdta model will be made public via product offerings or architecture publications.


Acknowledgments

This architecture grew from discussions with many colleagues. We thank them all for their help, especially Dieter Gawlick, Bill Laing, Dave Lomet, Bruce Mann, Barry Rubinson, Diogenes Torres, and the TP architecture group, including Edward Braginsky, Tony DellaFera, George Gajnak, Per Gyllstrom, and Yoav Raz.

References

1 . T. Speer and M . Storm, " D igital's Transaction Process ing Monitors," Digital Technical journal, vol . 3. no. I (Win ter 1991 , this issu e): 18-32.

2. W. Laing, J. Johnson, and R. Landau, "Transaction Management Support in the VMS Operating System Kernel," Digital Technical Journal, vol. 3, no. 1 (Winter 1991, this issue): 33-44.

3. P. Bernstein, V. Hadzilacos, and N. Goodman, Concurrency Control and Recovery in Database Systems (Reading, MA: Addison-Wesley, 1987).

4. P. Bernstein, M. Hsu, and B. Mann, "Implementing Recoverable Requests Using Queues," Proceedings 1990 ACM SIGMOD Conference on Management of Data (May 1990).

5. FIMS Journal of Development (Norfolk, VA: CODASYL FIMS Committee, July 1990).

6. C. Mohan, B. Lindsay, and R. Obermarck, "Transaction Management in the R* Distributed Database Management System," ACM Transactions on Database Systems, vol. 11, no. 4 (December 1986).
