Concurrency Control in Integrated Information Systems: A Survey of Problems


deposit_hagen

Publication server of the University Library

Mathematics and Computer Science

Informatik-Berichte 33 – 04/1983


U. Prädel, G. Schlageter

Fernuniversität

Praktische Informatik I, Postfach 940

D-5800 Hagen, West Germany

Abstract

A main stream of research in the area of database systems is the extension of these systems to support non-classical applications like CAD, information retrieval or office information systems. The usage of these systems differs in some essential aspects from that of classical database systems. It turns out that these different usage patterns have direct consequences for concurrency control. This paper gives a summary of the research problems to be solved for these integrated information systems with respect to concurrency control and recovery. It is not the intention of the paper to present special solutions, but to characterize the open issues.

This work was supported by IBM Germany.


1. Introduction

The extension of database systems to support non-conventional applications has been discussed for a few years.

Special points of interest in this general context are databases for scientific applications (e.g. CAD) /Kale82/, /GuSt82/, /HäRe83/, databases in office information systems /Tslo81/, and databases integrating information-retrieval functions /Sche80/, /Tagg82/. Work has been done from several points of view, and some projects are under way.

In this paper we discuss an area which is very open so far, namely concurrency control in integrated systems of the type outlined above. The focus of the discussion is on systems integrating database and information retrieval functions; however, the problems arising in other types of integrated systems are included.

Integrated systems are, to a large extent, used in a different way from database systems, especially with respect to interactivity, to the duration of transactions, to the type of data to be maintained, and to the expectations of the user in case of failures. These different usage patterns have direct influence on concurrency control and on recovery management: conventional mechanisms are not sufficient and, furthermore, are not easily extendable to satisfy the new requirements.

In this paper we wish to give a concise summary of the problems to be solved. The emphasis is to give a survey of the problems, not a discussion of possible solutions (though we indicate some possibilities). Considering the literature and the few projects in the area it seems necessary to survey the concurrency control aspects, as not even the problem is clearly seen and understood in its entirety.


2. Transaction Types

In a DBIRS (an integrated database and information retrieval system) we can distinguish three types of transactions:

- database transactions

- information retrieval transactions

- integrated transactions

Database transactions in today's database systems can be characterized as follows:

- the data are structured and have fixed or limited length.

- the number of objects (records) accessed by a transaction usually is small.

- transactions are short (in the order of seconds).

- transactions require level 3 consistency /EGLT76/; concurrency control has to guarantee serializability of parallel transactions.

- special types of transactions may only require level 1 consistency, e.g. "quick scans" /ABCG80/; however, these are exceptions and, in fact, this possibility is used rarely.

- the interaction of a user consists in starting a transaction and reading its results.
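The serializability requirement in the list above can be made concrete with a small check. The following sketch is not taken from the paper: it assumes conflict serializability as the formalization, and the function name and the schedule encoding are hypothetical choices made for illustration only.

```python
def is_conflict_serializable(schedule):
    """schedule: list of (transaction, "r" or "w", object) in execution order."""
    edges = set()
    for i, (t1, op1, obj1) in enumerate(schedule):
        for t2, op2, obj2 in schedule[i + 1:]:
            # Two operations conflict if they come from different
            # transactions, touch the same object, and one is a write.
            if t1 != t2 and obj1 == obj2 and "w" in (op1, op2):
                edges.add((t1, t2))            # t1 must precede t2

    nodes = {t for t, _, _ in schedule}
    # Repeatedly remove transactions without an incoming edge; if some
    # remain and none can be removed, the precedence graph has a cycle.
    while nodes:
        free = {n for n in nodes
                if not any(b == n and a in nodes for a, b in edges)}
        if not free:
            return False
        nodes -= free
    return True


# Assumed example schedules: an interleaved "lost update" and a serial run.
lost_update = [("T1", "r", "x"), ("T2", "r", "x"),
               ("T1", "w", "x"), ("T2", "w", "x")]
serial = [("T1", "r", "x"), ("T1", "w", "x"),
          ("T2", "r", "x"), ("T2", "w", "x")]
lost_update_ok = is_conflict_serializable(lost_update)
serial_ok = is_conflict_serializable(serial)
```

The interleaved schedule is rejected because each transaction reads the object before the other one writes it, which puts the two transactions into a cycle of precedence constraints.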

Information retrieval systems do not know the notion of a transaction. A typical application, comparable to a transaction in a database system, may be characterized as follows:

- information retrieval transactions are pure read transactions.

- the number of objects accessed by the transaction tends to be very high.

- transactions may be batch applications as well as online applications.

- transactions run on different levels of consistency. For batch retrieval, parallel insertions of new objects may not be a problem (level 1 consistency), whereas, e.g., in a medical information system level 3 consistency may be mandatory in some cases.

New types of Transactions: DBIRS-Transactions

In a DBIRS we have to envisage all types of hybrid forms of database and information retrieval transactions, some of which exhibit inherently new features, especially from the point of view of the concurrency control.

As a first example consider a travel agency application. A user may "navigate" through the data about hotels, looking for some interesting place. After this lengthy phase of decision making he will try to book a certain hotel. In contrast to the browsing phase, the update phase of this type of transaction is very short. The set of data accessed during the browsing phase is large, but only a small fraction of these data is updated.

Another form of new transactions results directly from the integration of text and data in DBIRS: the user may perform lengthy update operations in an interactive way on a small set of database objects, as for example in CAD or office information systems. This type of transaction will be considered in more detail in section 4.

What are the additional features of integrated transactions, in the following called DBIRS-transactions?

- a user transaction may consist of several subtransactions, which are in general independent of each other.

- the user works with the system in an interactive way.

- the user not only sees tuples and relations, but also higher structures ("complex objects").

- work with objects of a DBIRS may last several terminal sessions or "transactions" before these objects reach a consistent state.

Proposition 1:

DBIRS-transactions are not sufficiently supported by today's concurrency control mechanisms.

Though concurrency control is well understood in database systems, the mechanisms are not easily generalizable to support information retrieval transactions, let alone DBIRS-transactions.

For information retrieval transactions the obvious problems are that

- retrieval may take place on different levels of consistency.

- the set of data accessed by a transaction is large.

- the transactions have a long duration (as compared to database transactions).

Conventional concurrency control mechanisms are expected to show performance problems for this type of transaction /Gray81/. However, quantitative data about the behaviour of concurrency control mechanisms in the presence of information retrieval transactions are not available so far. Experimental and simulation work is in progress; first results indicate that the performance problems can become quite severe /PrSc83/. Investigations of this issue must consider optimistic concurrency control mechanisms, which were supposed to be superior to locking in retrieval-dominant systems /KuRo79/, /UnPS82/.
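To illustrate why long retrieval transactions are a stress case for optimistic mechanisms, the following minimal sketch shows serial backward validation in the spirit of /KuRo79/. It is a simplification under stated assumptions, not the scheme studied in /PrSc83/ or /UnPS82/: the class names, the commit counter and the in-memory bookkeeping are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    """Hypothetical transaction record used only for this sketch."""
    name: str
    start_tn: int                      # last commit number seen at start
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)


class OptimisticScheduler:
    """Transactions run without locks and are validated at commit time."""

    def __init__(self):
        self.committed = []            # list of (commit number, write set)
        self.tn = 0                    # last assigned commit number

    def begin(self, name):
        return Txn(name, start_tn=self.tn)

    def validate_and_commit(self, txn):
        # Conflict: a transaction that committed after txn started
        # wrote an object that txn has read.
        for commit_tn, write_set in self.committed:
            if commit_tn > txn.start_tn and write_set & txn.read_set:
                return False           # txn has to be backed up and restarted
        self.tn += 1
        self.committed.append((self.tn, frozenset(txn.write_set)))
        return True


sched = OptimisticScheduler()
retrieval = sched.begin("retrieval")          # long read-only transaction
retrieval.read_set.update(f"doc{i}" for i in range(1000))

update = sched.begin("update")                # short database transaction
update.read_set.add("doc7")
update.write_set.add("doc7")

update_ok = sched.validate_and_commit(update)
retrieval_ok = sched.validate_and_commit(retrieval)
```

In this run the short update commits, while the retrieval transaction fails validation because one of its 1000 read objects was overwritten in the meantime. The larger the read set and the longer the transaction, the more likely such a restart becomes.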

Proposition 2:

DBIRS-transactions cannot be subject to strict two-phase locking.

As information retrieval oriented transactions already pose severe problems, it is clear that general DBIRS-transactions require modified or new approaches to concurrency control.

For instance, locking all objects accessed during the browsing phase would be prohibitively costly and would degrade parallelism to an unacceptable level. The question is what a departure from two-phase locking might look like, and how consistency can then be preserved.

Proposition 3:

DBIRS-transactions may not be transactions in the classical sense. The notion of a transaction is too restrictive in DBIRS.

This is a conclusion of the above discussion. "Transactions" do not need to be atomic, as the examples show, and therefore are not necessarily the unit of recovery. These points will be addressed again in the next sections of the paper.

3. Nested Transactions

A nested transaction is a transaction which consists of a series of logically meaningful valid state transitions, called subtransactions. Though each subtransaction is self-contained, the user only considers the whole transaction as his unit of work. The notion of a nested transaction was first discussed by Gray /Gray81/, though in a narrower sense than here.

A typical example of a nested transaction can be taken from a travel agency. Booking a journey may happen as follows:

1. The user searches for a suitable flight with available places.

2. He books the flight.

3. The user considers the hotels at his destination.

4. He books a hotel.

5. The user considers additional offers relevant to him, e.g. car rentals, excursions, sightseeing trips etc.

6. He books additional offers.

7. The user terminates his work.

Proposition 4:

Nested Transactions require an extension of the concept of a transaction.

This proposition is obvious from the properties of nested transactions as seen by the user:

- the transaction consists of subtransactions (which form transactions in the classical sense).

- the result of a subtransaction may require adjusting or cancelling a former subtransaction.

- the characteristics of a transaction, i.e. consistency, atomicity, durability and isolation, are dispersed upon the whole (global) transaction and the subtransactions. A subtransaction is atomic and locally consistent; the global transaction is durable and globally consistent. Locally consistent means that the underlying state transition is consistent with respect to all integrity assertions. Global consistency derives from the user's view only, in the example: the whole series of bookings.

Concurrency control might, of course, treat a nested transaction like a conventional transaction; the effect would be:

- long blocking of the data accessed by the transaction.

- easy undo of subtransactions.

- easy repeat of subtransactions.

In general, locking of objects for periods of this length will not be acceptable (as is obvious in the above example).

As another extreme solution one might consider a subtransaction as a transaction for concurrency control. This, however, does not correspond to the user's view of what the real transaction is, and serious problems may result from the fact that each subtransaction commits irrespective of the ongoing work within the same (global) transaction.

It is not clear what a general solution should look like. A not unusual situation is that subtransactions within global transactions do not intersect in the following sense: the read sets of the subtransactions do not intersect with the write sets of other subtransactions, and the write sets do not intersect. In this case a possible solution might be:

- the (global) transaction is the unit of consistency.

- the termination of a subtransaction results in a transaction-checkpoint:

. the read-locks of the subtransaction are released

. provisions for a fast undo/repeat of the subtransaction are established

- any subtransaction may be undone (and repeated) by the user. He may also repeat the transaction from some selected subtransaction.

This approach has the advantage of keeping only those locks of subtransactions which are necessary for consistency.
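The transaction-checkpoint idea described above can be sketched as follows. This is only an illustration under simplifying assumptions: a plain dictionary stands in for the database, lock sets are tracked but not enforced against other transactions, and all class and method names are hypothetical.

```python
class GlobalTransaction:
    """Hypothetical global transaction made of subtransactions."""

    def __init__(self, db):
        self.db = db                   # plain dict standing in for the database
        self.read_locks = set()
        self.write_locks = set()
        self.checkpoints = []          # one undo log per finished subtransaction
        self._undo = {}                # before-images of the running subtransaction

    def read(self, key):
        self.read_locks.add(key)
        return self.db.get(key)

    def write(self, key, value):
        self.write_locks.add(key)
        # Remember only the first before-image per subtransaction.
        self._undo.setdefault(key, self.db.get(key))
        self.db[key] = value

    def end_subtransaction(self):
        """Transaction checkpoint: drop read locks, keep undo information."""
        self.read_locks.clear()
        self.checkpoints.append(self._undo)
        self._undo = {}

    def undo_last_subtransaction(self):
        # Simplification: a before-image of None means "key did not exist".
        for key, old in self.checkpoints.pop().items():
            if old is None:
                self.db.pop(key, None)
            else:
                self.db[key] = old


db = {"flight_seats": 5, "hotel_rooms": 2}
t = GlobalTransaction(db)
if t.read("flight_seats") > 0:                 # subtransaction: book a flight
    t.write("flight_seats", db["flight_seats"] - 1)
t.end_subtransaction()
if t.read("hotel_rooms") > 0:                  # subtransaction: book a hotel
    t.write("hotel_rooms", db["hotel_rooms"] - 1)
t.end_subtransaction()
t.undo_last_subtransaction()                   # the user cancels the hotel
```

After the run the flight booking is still in place, the hotel booking has been undone, no read locks remain, and the write locks of both subtransactions are still held by the global transaction.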

We believe that the issue of nested transactions and its influence on concurrency control is only poorly understood and that more research has to be performed.

4. Complex objects

In a DBIRS the user will deal with various "complex objects" (the notion seems to have been used first in /Halo82/, in a restricted way). A typical example for a complex object is a book:

- it contains formatted data like author-names, year of edition, etc. and non-formatted data like text, glossary etc.

- several users may work simultaneously with the same book (writing, reading, correcting).

- the users accessing the book work interactively and in long sessions.

There has been some discussion in the literature about how to represent complex objects in database systems. For relational databases two approaches have been proposed:

- representation within the frame of normalized relations /Halo82/, /Macl81/.

- representation as non-normalized relations /Sche80/, /FrWW82/.

In the normalized representation complex objects are mapped to first-normal-form relations by the use of artificial attributes like paragraph numbers, sentence numbers etc. Standard database query languages can be used to process the data. The main disadvantages are the limited and rigid structuring possibilities and the unnatural way of expressing queries.

In the non-normalized approach first normal form is abandoned. For instance, an attribute of type text can again consist of sets or vectors of attributes, like chapters. Thus, a complex object can be represented as one single tuple. However, new operations for the manipulation of non-first-normal-form relations must be offered /JaSc82/, /ScPi82/.
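The difference between the two representations can be shown with a toy book. The sketch below is an assumption-laden illustration, not the data model of any cited system: the attribute names, the sample sentences and the `nest` helper are hypothetical.

```python
# Normalized (first-normal-form) representation: one row per sentence,
# addressed by artificial attributes such as chapter and sentence numbers.
sentence_rows = [
    # (book_id, chapter_no, sentence_no, text)
    (1, 1, 1, "Databases store structured data."),
    (1, 1, 2, "Retrieval systems store text."),
    (1, 2, 1, "Integrated systems do both."),
]

# Non-normalized representation: the whole book is one tuple whose
# "text" attribute is itself a list of chapters (lists of sentences).
book_tuple = {
    "book_id": 1,
    "author": "N.N.",
    "text": [
        ["Databases store structured data.", "Retrieval systems store text."],
        ["Integrated systems do both."],
    ],
}


def nest(rows, book_id):
    """Rebuild the nested chapter structure from the flat rows."""
    chapters = {}
    for b, chapter_no, sentence_no, text in sorted(rows):
        if b == book_id:
            chapters.setdefault(chapter_no, []).append(text)
    return [chapters[c] for c in sorted(chapters)]


rebuilt = nest(sentence_rows, 1)
```

Both forms carry the same information; the flat form needs the artificial numbering and a regrouping step, while the nested form needs operations that can reach inside an attribute.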

Proposition 5:

Complex objects require new concepts for transaction management. Communication between the concurrency control and the user seems to be necessary.

As to concurrency control the following observations concerning complex objects are important:

- a complex object consists of structured and unstructured data.

- several users may wish to work simultaneously with a complex object.

- the conventional notion of a transaction /Gray79/ is not applicable. For instance, the creation of a complex object may require a series of terminal sessions, i.e. consistency of the object is not related to the concept of a transaction.

- the user in general knows the object which he wishes to work with. Long retrieval phases are the exception.

- work with a complex object often is interactive.

- some users may be interested in the object as a whole, whereas others only require access to parts of it.

Proposition 6:

Consistency is not related to "transactions" or terminal sessions. Therefore locks on complex objects are not related to transactions but to users.

The user usually knows which object he wishes to process. Thus, on this global level a preclaim strategy is applicable. However, the global lock on a complex object must not be related to a transaction, but must be set and released explicitly by the user (hence, must survive "transactions" or terminal sessions). The reason for this is the above-mentioned fact that a consistent state of a complex object may be reached after a series of terminal sessions only. The user himself (perhaps with the assistance of special software) decides when an object is consistent.

The following aspects are essential for a preclaim strategy for complex objects:

- no deadlocks are possible.

- waiting mechanisms are not applicable. The user must be informed if an object is not available for him.

- there must be provisions to cope with locks users forgot to release.

- there must be lock modes which allow concurrent work on different parts of an object.

- locks must survive crashes of any kind.
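Several of the requirements listed above can be sketched together in a small user-oriented lock table. The code is a hedged illustration only: expiry after a maximum age is just one assumed policy for forgotten locks, the JSON file is a stand-in for crash-safe storage (the write is not atomic), and the class and parameter names are hypothetical. Lock modes for parts of an object are left out here.

```python
import json
import os
import tempfile
import time


class UserLockTable:
    """Hypothetical lock table whose entries belong to users, not transactions."""

    def __init__(self, path, max_age_seconds):
        self.path = path
        self.max_age = max_age_seconds
        self.locks = {}
        if os.path.exists(path):           # reload locks after a crash
            with open(path) as f:
                self.locks = json.load(f)

    def _save(self):
        with open(self.path, "w") as f:
            json.dump(self.locks, f)

    def claim(self, user, obj, now=None):
        """Try to lock obj for user; never wait, just report the outcome."""
        now = time.time() if now is None else now
        entry = self.locks.get(obj)
        if entry and entry["user"] != user and now - entry["since"] < self.max_age:
            return False, entry["user"]    # tell the caller who holds the lock
        self.locks[obj] = {"user": user, "since": now}
        self._save()
        return True, user

    def release(self, user, obj):
        if self.locks.get(obj, {}).get("user") == user:
            del self.locks[obj]
            self._save()


path = os.path.join(tempfile.mkdtemp(), "locks.json")
table = UserLockTable(path, max_age_seconds=3600)
first = table.claim("alice", "book-42", now=0)
second = table.claim("bob", "book-42", now=10)
restarted = UserLockTable(path, max_age_seconds=3600)   # simulated restart
third = restarted.claim("bob", "book-42", now=20)
late = restarted.claim("bob", "book-42", now=4000)      # lock has expired
```

Because a refused request returns immediately instead of queueing, no user ever waits for another, so no waiting cycle can form at this level.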

Proposition 7:

Complex objects may necessitate dynamic lock hierarchies.

Concurrency control must allow users to work simultaneously on different parts of a complex object. As a complex object can be considered as a hierarchical structure, it seems obvious to use hierarchical locking protocols. However, the situation can become more complex than in the situations considered for hierarchical locking so far: objects of the same type may have different structure; in particular, the height of the hierarchy may be dynamic. Hierarchical locking protocols must be extended or reformulated to cope with this feature of complex objects.

Of course, the above problem depends to some extent on the way complex objects are presented to the user: by normalized or by non-normalized relations, for example.
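As an illustration of hierarchical locking over a hierarchy of varying height, the following sketch applies the usual intention-lock modes (IS, IX, S, X) to paths of arbitrary length. It is a simplified assumption, not a protocol proposed in the paper: the SIX mode, lock upgrades and lock release are omitted, and the class name and path encoding are hypothetical.

```python
COMPATIBLE = {
    ("IS", "IS"), ("IS", "IX"), ("IS", "S"),
    ("IX", "IS"), ("IX", "IX"),
    ("S", "IS"), ("S", "S"),
}   # X is compatible with nothing


class HierarchyLocks:
    """Locks on paths such as ("book", "ch2", "sec1"); depth is not fixed."""

    def __init__(self):
        self.held = {}                 # path -> list of (user, mode)

    def _free(self, path, user, mode):
        # Locks held by the same user are ignored (no upgrade handling).
        return all(u == user or (m, mode) in COMPATIBLE
                   for u, m in self.held.get(path, []))

    def lock(self, user, path, mode):
        """mode is "S" or "X"; ancestors get the matching intention lock."""
        intent = "IS" if mode == "S" else "IX"
        wanted = [(path[:i], intent) for i in range(1, len(path))]
        wanted.append((path, mode))
        if not all(self._free(p, user, m) for p, m in wanted):
            return False               # report the refusal instead of waiting
        for p, m in wanted:
            self.held.setdefault(p, []).append((user, m))
        return True


locks = HierarchyLocks()
a = locks.lock("alice", ("book", "ch1"), "X")            # depth 2
b = locks.lock("bob", ("book", "ch2", "sec1"), "X")      # depth 3
c = locks.lock("carol", ("book",), "S")                  # whole book
d = locks.lock("dave", ("book", "ch1", "sec3"), "S")     # below alice's lock
```

Alice and Bob can work on different parts at different depths, while the request for the whole book and the request below Alice's exclusive lock are refused.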

5. Versions

Proposition 8:

Conventional transaction management is not prepared to handle versions visible to the user.

It is clear that in a DBIRS the notion of versions is essential. Consider three examples:

- in CAD databases the concept of version is vital, and there are very special requirements in this respect /Kale82/, /LeHo82/.

- in juridical information systems different versions of laws, decrees etc. must be maintained.

- in the commercial sector versions are a standard requirement.

The above concept of version is explicit from the point of view of the user. Two other concepts of versions have been discussed in the database literature:

- versions related to time /Bjor75/, /Reed78/. Each object is assigned a "validity interval". Whenever an object is updated, the interval of the current object is terminated and the interval of the "new" object is started. This mechanism is proposed for concurrency control (Reed) and for recovery (Bjork). According to Reed the user should have the possibility to start a transaction with a certain "time of validity".

- implicit versions /Selo76/, /Lori77/, /BaHR80/. In this case versions do not exist in the world of the user, but are maintained and used by the system for internal purposes. Implicit versions are used for concurrency control and for recovery.
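The validity-interval mechanism for time-related versions can be sketched in a few lines. This is a minimal illustration under assumptions, not the design of /Reed78/ or /Bjor75/: logical integer times, a single object and the class name are all hypothetical.

```python
import math


class VersionedObject:
    """Hypothetical object whose versions carry validity intervals."""

    def __init__(self, value, time):
        # Each version is [valid_from, valid_to, value]; the current
        # version has an open interval (valid_to is infinity).
        self.versions = [[time, math.inf, value]]

    def update(self, value, time):
        current = self.versions[-1]
        if time <= current[0]:
            raise ValueError("updates must move forward in time")
        current[1] = time              # terminate the interval of the old version
        self.versions.append([time, math.inf, value])

    def read(self, time):
        """Return the version valid at the given time of validity."""
        for valid_from, valid_to, value in self.versions:
            if valid_from <= time < valid_to:
                return value
        raise LookupError("object did not exist at that time")


price = VersionedObject(100, time=1)
price.update(120, time=5)
price.update(90, time=9)
at_3 = price.read(3)
at_5 = price.read(5)
at_100 = price.read(100)
```

A reader that starts with time of validity 3 keeps seeing the first version no matter how many updates follow, which is what makes the mechanism attractive for concurrency control.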

These different concepts of versions offer different functions to the user:

- application-controlled version mechanism:

. a new version is created by the user.

. updates to versions are possible or not.

. an application may access different versions of the same object.

- time-controlled version mechanism:

. every update produces a new version.

. an application may read different versions of the same object.

. a version cannot be modified.

- implicit version mechanism:

. this version mechanism is invisible to the user.

Proposition 9:

It seems to be hard to define (the kernel of) a general version manager.

The concept of a version is not exactly the same in different application areas, and it is therefore not obvious to what extent generalized versions can or should be supported by a DBIRS (see also /Halo82/). Consider three examples:

- the version concept in CAD is rather specialized and quite complex; for instance, it includes alternative versions (see, e.g. /Kale82/).

- the version concept in project documentation is much simpler. An old version of a report is possibly used as a source of additional information and for purposes of control.

- in office automation systems a document (or a form) passes through a series of states, where each new version may be visible for different user groups. Consider, for instance, business reports or papers concerning strategic planning. Here, version management has strong connections to access control.

We believe that the issue of version management is essential in the DBIRS discussion, but that we are far from a general understanding of the problem.

Proposition 10:

It is an open question how version management and concurrency control are interrelated.

In the time-controlled version concept there is a complete integration of version management and concurrency control /Reed78/. However, the approach has several drawbacks for the application level. For instance, a transaction cannot process several versions of an object. Also, assigning timestamps to records may raise problems if complex objects are introduced: any small update would create a new version of the whole object.

For application controlled versions the issue is more complicated. There are different possibilities:

- concurrency control does not know about versions, it knows objects only. The versions (and all related consistency constraints for version generation etc.) are maintained at the application level.

- the DBIRS has a version manager. This module may have to interact with the concurrency control, for instance in order to guarantee a consistent view on versions of a complex object. The type of interaction required is not clear at present; it is not clear, for example, whether the concurrency control may be kept devoid of any knowledge about versions (this would result in a highly flexible system).

- concurrency control and version management are integrated, i.e. concurrency control also controls versions. With respect to relationships between versions (hierarchies of versions) and complex objects, new techniques are required to maintain certain types of consistency constraints (see also /LeHo82/).

It has been proposed to use versions for concurrency control in order to improve the degree of parallelism in the system or to facilitate recovery. While these possibilities may be useful for time-controlled version mechanisms, we believe that in many applications application-controlled versions do not lend themselves to this purpose. Some reasons for this are:

- to reach a uniform concurrency control, versions have to be maintained for all objects. Not every application needs this.

- the actual realization of the version management differs from application to application. This will affect concurrency control, too.

- there are many applications where old versions cannot substitute for current versions, even if the view on the old versions is perfectly consistent. Presenting a consistent but "old" view to the user might be fatal in some situations.

6. Effect of integration on recovery

Proposition 11:

Current recovery techniques are not sufficient for some applications of DBIRS.

Recovery techniques are an essential part of database technology /Verh78/. In information retrieval systems recovery is not a central issue because of the usage of these systems (no parallel update and read).

As to DBIRS, the following characteristics have direct consequences for recovery:

- "In-transaction" abort must be supported (undo of operations on behalf of the user during interactive sessions).

- data objects (records) may become much larger than in database systems. Standard recovery techniques like before- and after-images are not directly applicable in this case.

- the large portion of long transactions may complicate the maintenance of checkpoints/recovery points.

- application-controlled versions must be considered by the recovery system.

- a completely new requirement arises due to interactive transactions: conventional rollback of a transaction in case of failures is not always suitable. An example: a user who has inserted large parts of a complex object into the database is certainly not willing to accept a rollback of his transaction in the case of a system crash. He wishes to find his object just as it was before the crash occurred, i.e. recovery should enable the user to resume the work within his transaction at a point of time as near as possible to the crash point.

It seems clear that the above problems cannot be solved in the framework of current recovery techniques in database systems. Especially the new restart requirement for interactive transactions and logging in the presence of large objects are open problems.
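To make the restart requirement more tangible, the following sketch journals every interactive step of a session so that uncommitted work can be rebuilt after a crash instead of being rolled back. It only illustrates the requirement and is not offered as a solution to the open problems: the journal format, the file handling and the class name are hypothetical, and large objects or torn writes are not handled.

```python
import json
import os
import tempfile


class SessionJournal:
    """Hypothetical append-only journal of a user's interactive steps."""

    def __init__(self, path):
        self.path = path

    def record(self, key, value):
        # Force each step to stable storage before acknowledging it.
        with open(self.path, "a") as f:
            f.write(json.dumps([key, value]) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def resume(self):
        """Rebuild the uncommitted working state after a crash."""
        state = {}
        if os.path.exists(self.path):
            with open(self.path) as f:
                for line in f:
                    key, value = json.loads(line)
                    state[key] = value     # later steps override earlier ones
        return state


path = os.path.join(tempfile.mkdtemp(), "session.log")
journal = SessionJournal(path)
journal.record("chapter1", "draft text")
journal.record("chapter2", "more text")
journal.record("chapter1", "revised text")
# Simulated crash: a fresh object sees only what reached the file.
recovered = SessionJournal(path).resume()
```

The point of the sketch is the direction of recovery: the journal is replayed forward to the last recorded step, rather than used to undo the transaction.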


7. Text indexes

Proposition 12:

Secondary data may become bottlenecks in DBIRS.

In DBIRS secondary data may comprise large indexes for text. An update, insertion or deletion of a record may result in extensive modifications of these text indexes. This may have a very negative impact on the degree of parallelism of transactions. In /Sche80/ this problem is discussed, and a solution based on deferred update is proposed. However, it is not clear whether this type of solution would not produce a new bottleneck, namely the queue of secondary data which are still to be modified or inserted.
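The deferred-update idea and the queue it creates can be sketched as follows. This is a generic illustration, not the mechanism of /Sche80/: it covers insertions only, uses a naive whitespace tokenizer, and all names are hypothetical.

```python
from collections import defaultdict, deque


class DeferredTextIndex:
    """Hypothetical inverted index whose maintenance is deferred."""

    def __init__(self):
        self.records = {}                      # record id -> text
        self.index = defaultdict(set)          # word -> record ids
        self.pending = deque()                 # record ids awaiting indexing

    def insert(self, record_id, text):
        """Fast path used inside a transaction: no index work here."""
        self.records[record_id] = text
        self.pending.append(record_id)

    def apply_pending(self, limit=None):
        """Background step: move up to `limit` records into the index."""
        done = 0
        while self.pending and (limit is None or done < limit):
            record_id = self.pending.popleft()
            for word in self.records[record_id].lower().split():
                self.index[word].add(record_id)
            done += 1
        return done

    def search(self, word):
        # Records still in the queue are invisible to index lookups.
        return set(self.index.get(word.lower(), set()))


idx = DeferredTextIndex()
idx.insert(1, "Concurrency control survey")
idx.insert(2, "Recovery and concurrency")
before = idx.search("concurrency")
idx.apply_pending(limit=1)
partial = idx.search("concurrency")
idx.apply_pending()
after = idx.search("concurrency")
```

The length of `pending` is the backlog in question: insertions stay cheap, but searches lag behind until the queue has been drained.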

The necessary modifications of the secondary data may influence the choice of the concurrency control mechanism. For instance, optimistic concurrency control mechanisms /KuRo79/ may show serious weaknesses due to text indexes, as these may lengthen a transaction and may increase its risk of being backed up and restarted.

8. Conclusion

The paper has surveyed the open problems with respect to concurrency control in integrated information systems. It turns out that some new aspects have great impact on concurrency control and recovery, such that not only are extensions of current mechanisms required, but some problems require basic research to be started. Some of the problems addressed (e.g. versions) are hardly understood so far.

9. References

/Adli80/ Adiba, M.E., Lindsay, B.G., "Database Snapshots", Proc. 6th VLDB, Montreal 1980

/ABCG80/ Astrahan, M.M., Blasgen, M.W., Chamberlain, D.D., Gray, J.N., "A History and Evaluation of System R", IBM Research Report RJ 2843, Dec. 1980

/BaHR80/ Bayer, R., Heller, H., Reiser, A., "Parallelism and Recovery in Database Systems", ACM ToDS, Vol. 5, No. 2, June 1980

/Bjor75/ Bjork, L.A., "Generalized audit trail requirements and concepts for data base applications", IBM Systems Journal, No. 3, 1975

/EGLT76/ Eswaran, K.P., Gray, J.N., Lorie, R.A., Traiger, I.L., "The Notions of Consistency and Predicate Locks in a Database System", Comm. of the ACM, Vol. 19, No. 11, Nov. 1976

/FrWW82/ Freitag, J., Werner, H., Wilkes, W., "Strukturierte Attribute in Relationen zur Unterstützung von IR-Anwendungen", Proc. Annual Conference of GI, 1982 (in German)

/Gray79/ Gray, J., "Notes on Data Base Operating Systems", in: R. Bayer et al. (eds.), "Operating Systems: An Advanced Course", Springer Verlag, Berlin, 1979

/Gray81/ Gray, J., "The Transaction Concept: Virtues and Limitations", Proc. 7th VLDB, 1981

/GuSt82/ Guttman, A., Stonebraker, M., "Using a Relational Database System for Computer Aided Design Data", IEEE Database Engineering Bulletin, Vol. 5, No. 2, June 1982

/Halo82/ Haskin, R.L., Lorie, R.A., "On Extending the Functions of a Relational Database System", Proc. SIGMOD 82, Orlando 1982

/HäRe83/ Härder, T., Reuter, A., "Database Systems for Non-Standard Applications", Proc. Int. Computing Symposium 1983

/JaSc82/ Jaeschke, G., Schek, H.-J., "Remarks on the Algebra of Non First Normal Form Relations", Proc. ACM SIGACT-SIGMOD Symposium on Principles of Data Base Systems, March 1982

/Kale82/ Katz, R.H., Lehman, T.J., "Storage Structures for Versions and Alternatives", Techn. Report No. 479, University of Wisconsin, Madison, Computer Science Department, July 1982

/KaWe83/ Katz, R.H., Weiss, S., "Transaction Management for Design Databases", Techn. Report No. 496, University of Wisconsin, Madison, Computer Science Department, Febr. 1983

/KuRo79/ Kung, H.T., Robinson, J.T., "On Optimistic Methods for Concurrency Control", Proc. 5th VLDB, Rio de Janeiro 1979

/LeYe79/ Lee, S., Yeh, R.T., "Structural Locking for Concurrency Control in Data Base Systems", Proc. COMPSAC 79

/LeHo82/ Lehmann, C., Hornung, T., "Consistency and Transactions in CAD Database", Proc. 8th VLDB, Mexico City 1982

/Lori77/ Lorie, R.A., "Physical Integrity in a Large Segmented Database", ACM ToDS, Vol. 2, No. 1, 1977

/Macl81/ Macleod, I.A., "The Relational Model as a Basis for Document Retrieval Systems Design", Information Systems, Vol. 24, No. 4, 1981

/PrSc83/ Prädel, U., Schlageter, G., "The Behaviour of Database Concurrency Control Mechanisms in Integrated Information Systems", Research Report, University of Hagen, 1983 (to be submitted for publication)

/Reed78/ Reed, D.P., "Naming and Synchronisation in a decentralized Computer System", Ph.D. Thesis, M.I.T. 1979

/Sche80/ Schek, H.-J., "Methods for the Administration of Textual Data in Data Base Systems", Proc. Research and Development in Information Retrieval, Cambridge, 1980

/ScPi82/ Schek, H.-J., Pistor, P., "Data Structures for an Integrated Data Base Management and Information Retrieval System", Proc. VLDB 82, Mexico City 1982

/Selo76/ Severance, D.G., Lohman, G.M., "Differential Files: Their Application to the Maintenance of Large Databases", ACM ToDS, Vol. 1, No. 3, 1976

/Tagg82/ Tagg, R.M., "Bibliographic and commercial databases - contrasting approaches for data management with special respect to DBMS", Program, Vol. 16, No. 4, Oct. 1982

/Tslo81/ Tsichritzis, D., Lochovsky, F., "Office Information Systems: Challenge for the 80's", in: Tsichritzis, D., "Omega Alpha", Techn. Rep. CSRG-127, Univ. of Toronto, March 1981

/UnPS82/ Unland, R., Prädel, U., Schlageter, G., "Design Alternatives for optimistic Concurrency Control Schemes", Research Report No. 30, University of Hagen, Jan. 1983 (submitted for publication)

/Verh78/ Verhofstad, J.S.M., "Recovery Techniques for Database Systems", ACM Computing Surveys, Vol. 10, No. 2, 1978
