
The application of cryptography for data base security


by EHUD GUDES and HARVEY S. KOCH

The Ohio State University Columbus, Ohio

and

FRED A. STAHL

Columbia University New York, New York

ABSTRACT

The application of cryptographic transformations for the purpose of enhancing security in data base systems is discussed. These transformations have been recognized in the past as a valuable protection mechanism, but their relation to data base security has not been identified. The major reason is the lack of a suitable data base model for investigating the questions of security and cryptography. A multi-level model of a data base is presented in this paper. This model helps to clarify the connection between the data base structure and the cryptographic transformations applied to the data base. It is shown that cryptographic transformations can be applied between the different levels of the data base. Several types of these transformations are identified, and the possible ways of using and controlling them are also discussed. The multi-level model can provide a useful framework for further research in the area of cryptography and data base security.

INTRODUCTION

The technical problems associated with providing protection for information in a shared computer environment have received considerable attention in recent years. Petersen and Turn discussed a broad spectrum of these problems in their article on "System Implications of Information Privacy."14 Subsequently, there have been numerous endeavors attempting to amplify and find reasonable solutions to these problems.6,7,9,11,12

Concurrent with the developments in the area of protection has been the very important research in data base systems. This is due principally to the growing size and complexity of existing data bases. Research in data security and research in data base systems have only been combined recently. Most data security models7,11 use a unified approach in which all


components of the computer system (i.e., objects) are viewed as being on the same level. A data base system is viewed as much more complex than the traditional file system, and therefore its security problems are more complicated. In a data base, protection may be required from the file and record level down to the field level. User protection requirements of a data base are more complex than protection of a full file (or segment), and complex protection specifications based on boolean expressions may be needed. A unified approach is, therefore, not suitable for dealing with the security problems in data base systems. Our view is that the security problems of data base systems have to be dealt with separately from the security problems of operating systems. (A similar view is held by Minsky.) However, since a data base system uses the operating system services, the two problems cannot be completely disconnected.

Since we are interested in data base systems, our assumption is that the hardware and the operating system are correct and secure. This is not as strong an assumption as it may seem at first glance. Since we have removed the problems of files and data bases from the operating system domain, the operating system becomes smaller and easier to verify. Clearly, the complexity of the protection problem in the data base system increases correspondingly.

In this paper we concentrate on the security problems of data base systems. We develop a model of a data base which allows the incorporation of several protection mechanisms. The goal is to define a structural model of a data base in which known protection mechanisms can be applied, and their dependency on the structure of the data base can be understood. In this paper we are interested in one protection mechanism, called cryptographic transformations. The value of these transformations as a means of protection has been well established,14 and some research has been done on their application in file systems.19 This model will help us to understand where and how to apply these transformations, and how these transformations can be used in conjunction with other protection mechanisms, in a data base system.

CRYPTOGRAPHY AND DATA BASES

Cryptographic transformations have long been recognized as an effective protection mechanism in communication systems. In the past they have been used mainly to protect information transferred through communication lines. However, as Petersen and Turn14 pointed out, they can be used as an effective countermeasure against some of the threats to security which exist in computer systems. Among them are: wiretapping, between-lines entry, browsing of files, and physical acquisition of removable files.

Although several other articles mention their existence and suggest their application in computer systems (Skatrud,18 Van Tassel20), not much (public) research has been done in this area. In particular, there is no research on the problems of how to apply cryptographic transformations to permanent sharable data bases, as opposed to their application to information transferred through communication lines. We call the first form of cryptography data base cryptography, as opposed to the more popular form, communication cryptography.

There are large differences between the constraints placed on communication cryptography (see Shannon15) and those placed on data base cryptography. Turn19 enumerated some of these differences. The major ones are:

(a) The problem of selective retrieval: because files are usually organized so that selective retrieval of records can be achieved, it is very desirable that the enciphering (deciphering) of record i not depend on another record j. This constraint prevents the use of the popular Vernam cipher with a pseudo-random number generator of very large period. Such a generator would usually be used for enciphering large quantities of data (e.g., the whole file) and would have to include more than one record in the enciphering process.*

(b) The long "life" of the data: data in data bases usually resides there for relatively long periods. Therefore, the very popular method of changing the cryptographic keys very often cannot be [...] process the "clear" data. The reasons for this are that the system is more secure if only ciphered data is processed, and that the overhead of enciphering/deciphering every time we access the data is saved. We would like therefore to design "processable" ciphers. Examples of such ciphers will be given later. It can also be shown that, in the case of a "processable" cipher, applying the cryptographic transformation on the data item level only is not secure enough.

* Of course, the Vernam cipher can be applied to each record separately, probably using different "seeds" for the pseudo-random number generator, but then its security is far from that of a "one time" cipher.

Given the constraints above, it is clear that the subject of data base cryptography is strongly connected to the subject of data base organization, representation and accessing. None of the current models of data base systems addresses this problem directly. Furthermore, current data base models are not suitable for answering questions related to data base cryptography.
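The selective-retrieval constraint can be illustrated with a minimal sketch of a per-record stream cipher, in which the keystream for each record is derived from its own seed (here, the key combined with a record number). The use of SHA-256 as the pseudo-random generator, the function names, and the sample key and records are all assumptions of this sketch and do not come from the paper.

```python
import hashlib

def keystream(key: bytes, record_id: int, length: int) -> bytes:
    """Derive `length` pseudo-random bytes seeded by the key and record number.

    SHA-256 in counter mode stands in for the pseudo-random number
    generator; this is an assumption of the sketch.
    """
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hashlib.sha256(
            key + record_id.to_bytes(8, "big") + counter.to_bytes(8, "big")
        ).digest()
        out.extend(block)
        counter += 1
    return bytes(out[:length])

def xor_record(key: bytes, record_id: int, data: bytes) -> bytes:
    """Encipher or decipher one record; XOR with the keystream is its own inverse."""
    stream = keystream(key, record_id, len(data))
    return bytes(a ^ b for a, b in zip(data, stream))

key = b"hypothetical-key"
records = [b"SMITH 100", b"JONES 250", b"BROWN 075"]
enciphered = [xor_record(key, i, r) for i, r in enumerate(records)]

# Selective retrieval: record 1 is deciphered without touching records 0 or 2.
recovered = xor_record(key, 1, enciphered[1])
```

Because each record has a fixed seed, re-enciphering an updated record reuses the same keystream, which is one way to see why such a scheme falls short of a "one time" cipher.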

Very few models of security in data bases mention the use of cryptographic transformations. The CODASYL3 model provides the ENCODING/DECODING clause on the data item level. However, application on the data item level only may not be secure enough, and the connection to other protection mechanisms is not clear. Hoffman mentions the SCRAMBLE/UNSCRAMBLE procedures, but does not give any details of their use.

We are then faced with the following questions:

(a) To which level should the cryptographic transformations belong? To the physical structure, to the logical structure, or to the mapping between them?

(b) Should the Data Base Administrator (DBA) have complete control over the cryptographic transformations? Similarly, should the keys for these transformations be part of the system, e.g., in its Data Definition Language (DDL), or should only the appropriate user have some of the cryptographic keys?

(c) Should cryptographic transformations preserve or destroy the structure of the data base, and what are the advantages and disadvantages of each case?

(d) What is the relation with other protection mechanisms? Should they complement each other, and how should it be done?

The main goal of this paper is to answer the questions above. However, in order to answer them, we need a framework: a data base model in which the security problems and their relation to the data base structure are clearly identified. Such a model will be developed.

Before we develop this model, we want to review some of the basic concepts in data security.

BASIC CONCEPTS

Looking at the literature on data security, we find some degree of confusion about the basic concepts. We will use the terms security, protection and access control interchangeably and assign them the following meaning, given by McCauley:12 "The process of determining the authorized users of the data base and of determining which access may be permitted and which would be denied." Graham and Denning7 made a very important distinction between the protection specification and the protection mechanism in computer systems. The protection specifications are the translation of management privacy views into exact specifications. The protection mechanism is the mechanism that correctly executes the protection specifications and assures that any violation of these specifications will be detected.

In different systems there exist different protection mechanisms. Most of these mechanisms are composed of two parts: the protection procedure and the protection data. The protection data is not to be confused with the protection specification. The protection data is data which is internal to the protection mechanism, for example, passwords in the case of the password protection mechanism or tags in Friedman's model.5 The protection procedure is analogous to the program or procedure in programming systems, and it is the coded form of the protection mechanism algorithm.

An important example of such a two-part protection mechanism is cryptographic transformations. In this case, the transformation algorithm is the protection procedure while the cryptographic keys are the protection data. The analogy to programming systems can be carried further as follows: [...] of OPEN. The protection data is the list of passwords.

The protection specification is the distribution of passwords between users according to the privacy decisions: which user has access to which file. So passwords here have a double role: as the protection data and as the way to express the protection specifications.
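The split between protection procedure, protection data and protection specification can be sketched for the password case as follows. The class, the function names, and the sample files, passwords and users are hypothetical and serve only to illustrate the three roles.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class ProtectionMechanism:
    # Protection procedure: the coded form of the mechanism's algorithm.
    procedure: Callable[[Dict[str, str], str, str], bool]
    # Protection data: internal to the mechanism (here, the password list).
    data: Dict[str, str]

    def check(self, file_name: str, offered: str) -> bool:
        return self.procedure(self.data, file_name, offered)

def password_procedure(passwords: Dict[str, str], file_name: str, offered: str) -> bool:
    """Hypothetical check performed when a file is opened."""
    expected = passwords.get(file_name)
    return expected is not None and expected == offered

mechanism = ProtectionMechanism(
    procedure=password_procedure,
    data={"payroll": "s3cret", "inventory": "w1dget"},
)

# Protection specification: which user has access to which file.
specification = {"alice": ["payroll", "inventory"], "bob": ["inventory"]}

# The specification is expressed by distributing passwords to users.
handed_out = {
    user: {name: mechanism.data[name] for name in files}
    for user, files in specification.items()
}
```

For a cryptographic transformation the same shape applies, with the transformation algorithm as the procedure and the keys as the data.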

In the next section, we will use the concepts defined [...] discussion of these levels and the mapping between them is given in Sibley and Taylor. In the CODASYL model we can distinguish three levels: the Sub-Schema and Schema levels, which define the logical structure, and the storage level, which defines the physical structure.

A four-level model known as the entity-set model was suggested by Senko et al. Another four-level model, similar in concept to the one we suggest here, was suggested by Sibley.16 We could have developed most of our notions in the framework of one of these models; however, we preferred to develop our own terminology and structure for two major reasons. First, it allows us to stress the security point of view in which we are interested. Secondly, our model differs from other models by its recognition of the existence of more than one physical level in a data base. In most data base models only one physical level is recognized, and this is usually the secondary storage structure. However, in most conventional systems (even with virtual memory), data exists physically in more than one medium, and this fact is very important from the security and cryptographic points of view. The main idea in this model is that a data base is composed of several logical or abstract levels which describe data that physically resides in one or more physical media and therefore has one or more physical structures.

The existence of these physical media is recognized [...]

1. User-logical level
2. System-logical level
3. Access level
4. Storage level (also called structured storage level)

Each of these levels can be composed of several sub-levels. Corresponding to each logical level there is a physical level which is connected to a physical medium.

We will now concentrate on the description of the logical levels and their relation to security.

The user-logical level corresponds to the way a user or a user group sees the data base. It is very similar to CODASYL's Sub-Schema, with the exception that we do not have the constraint that the user-logical level must be a subset of the system-logical level. On the contrary, it might be useful to have complex transformations between the two levels. Usually there are several user-logical level structures in a data base. The system-logical level describes the whole logical structure of the data base. It may correspond to CODASYL's Schema, with the difference that indexes, directories, and access paths are not part of the system-logical level (they are part of the Schema in CODASYL). The access level describes the directories, indexes and all possible access paths in the data base. The storage level is the result of applying the access level to one or more particular physical secondary storage devices, and describes characteristics which are special to these devices. To each logical level correspond one or more physical levels, according to the number of physical media. An example is shown in Figure 1.

In this example the user-logical level describes the data as it appears at the user site. The system-logical level gives the interpretation of data which appears in memory. The access and storage levels give the right interpretation of the data which resides on the secondary storage devices. It is conceivable that in the future the secondary storage hardware will do all the access calculations, and therefore no access information will have to be in memory or described in the system-logical level. The main idea is the existence of more than one physical level, corresponding to more than one physical medium.

The relation of this model to security is discussed in Gudes.8 The model is shown to be general and to include other known data base security models5,12 as special cases. The main idea is the decentralization of protection mechanisms by the "spreading" of protection specifications and mechanisms through the different levels of the data base. In this paper we are interested in only one protection mechanism: cryptographic transformations. The model will be used as a framework for the application of cryptographic transformations in a data base. As is shown in the following sections, cryptographic transformations are a subset of the transformations between the physical levels of a data base, namely the subset which is used for protection.
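As a rough illustration of the level structure, the sketch below lists the four logical levels with the physical medium each one is associated with in the example, and finds the places where data crosses from one medium to another. The numbering of the first three levels and all identifiers are assumptions of this sketch; only the storage level is explicitly numbered (4) in the text.

```python
from enum import Enum

class Level(Enum):
    USER_LOGICAL = 1     # numbering of levels 1-3 is assumed
    SYSTEM_LOGICAL = 2
    ACCESS = 3
    STORAGE = 4

# Physical medium associated with each logical level in the example.
MEDIUM = {
    Level.USER_LOGICAL: "terminal or satellite computer",
    Level.SYSTEM_LOGICAL: "main memory",
    Level.ACCESS: "secondary storage",
    Level.STORAGE: "secondary storage",
}

def media_boundaries():
    """Return pairs of adjacent levels whose data resides on different media."""
    levels = list(Level)  # Enum members iterate in definition order
    return [(a, b) for a, b in zip(levels, levels[1:]) if MEDIUM[a] != MEDIUM[b]]

boundaries = media_boundaries()
```

Each boundary is a point where a transformation between physical levels takes place, and so a candidate location for a cryptographic transformation.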

FORMALISM

In order to understand the relationship between the logical levels of a data base and their corresponding physical levels, we need some notation. When one looks at a data base as it is represented on secondary storage, one sees a sequence of 0's and 1's. These binary digits make sense only when one knows the right structure, coding and interpretation of the data. One starts with the simple division into data items and then starts to build the more complex blocks of the structure (repeating groups, records, files). The basic concept, then, for describing a data base is the concept of the data item. A data item has the following properties:

interpretation (attribute): what the data item is and what it is used for
length
value
representation (coding)
address

We denote data items as $d_i$.

Figure 1. The four levels of a data base. Physical media and their logical levels: terminal or satellite computer (user-logical), main memory (system-logical), secondary storage (access and storage).

A common case in data bases is that one data item contains one or more properties of other data items. We give two examples:

Example 1

                   $d_1$              $d_2$
interpretation:    length of $d_2$    name
value:             5                  SMITH

In this case $d_1$ contains a property of $d_2$: its length.

Example 2

                   $d_1$     $d_2$
interpretation:    units     distance
value:             0 or 1    100

If $d_1 = 0$ then $d_2$ is in miles.
If $d_1 = 1$ then $d_2$ is in kilometers.

Hence $d_1$ contains the interpretation of $d_2$. However, we also need the interpretation of $d_1$ in order to know what $d_2$ is. The interpretation of $d_1$ is probably documented in some manual which describes the system.
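The two examples can be sketched in code. The byte layout (a one-byte length followed by ASCII text), the function names, and the `UNITS` table standing in for the manual are assumptions made for illustration only.

```python
# Interpretation of d1 in Example 2; it plays the role of the "manual".
UNITS = {0: "miles", 1: "kilometers"}

def read_name(raw: bytes) -> str:
    """Example 1: d1 (one byte, assumed) holds the length of d2 (the name)."""
    d1 = raw[0]
    d2 = raw[1:1 + d1]
    return d2.decode("ascii")

def interpret_distance(d1: int, d2: int) -> str:
    """Example 2: d1 selects the unit in which d2 is expressed."""
    return f"{d2} {UNITS[d1]}"

name = read_name(b"\x05SMITHXYZ")     # trailing bytes belong to later items
in_miles = interpret_distance(0, 100)
in_km = interpret_distance(1, 100)
```

Without the `UNITS` table, the value of $d_1$ is itself uninterpretable, which mirrors the point that some levels of interpretation live outside the data base.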

We see then that a set of data items may have several levels of interpretation. Some of them are in the data base itself and some of them are only implicit. Each physical level of the data is just a set of data items, whose interpretation is either in some of the data items themselves, or in the logical description of this level, or documented in some manual. More formally, a physical record $j$ on level $i$ is denoted as $PR^i_j$, where this physical record has the address $A_j$. This physical record is an ordered tuple of data items

$$PR^i_j = (d^{ij}_1, d^{ij}_2, \ldots, d^{ij}_n)$$

The address of data item $d^{ij}_k$ is determined by its relative position and the lengths of all data items before it. These lengths can themselves be other data items or part of the logical description of this physical record. The order of data items within a physical record is therefore important for finding their addresses. (A physical record here is "continuous" by definition.) The physical data base on level $i$ is the set of physical records on this level

$$PB(i) = \{PR^i_1, PR^i_2, \ldots, PR^i_m\}$$

In reality only part of a physical data base on some level will exist at any time (except on the fourth level, the storage level, on which the whole physical data base exists).
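The address rule for data items inside a contiguous physical record can be sketched as a running sum: the address of the k-th item is the record address plus the lengths of all items before it. The function name and the sample numbers are hypothetical.

```python
from typing import List, Sequence

def item_addresses(record_address: int, lengths: Sequence[int]) -> List[int]:
    """Return the address of each data item in a contiguous physical record.

    `lengths` holds the item lengths in record order; in a real data base
    these could come from other data items or from the logical description.
    """
    addresses = []
    offset = record_address
    for length in lengths:
        addresses.append(offset)
        offset += length
    return addresses

# A record at address 1000 holding a 1-unit length item, a 5-unit name
# and a 3-unit distance.
addresses = item_addresses(1000, [1, 5, 3])
```

Reordering the items changes every later address, which is why the order of data items within a physical record matters.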

The definition of physical records is very simple because we believe that the complex structure of the data base is connected to the interpretation we give to these data items. Most of this interpretation is in the logical levels of the data base. In order to describe these logical levels, we need to define more concepts.


Two data items are called similar if they have the same interpretation. A field is an abstract concept representing a set of similar data items. A field has no value but usually has a unique name or identifier. We denote field $j$ on level $i$ as $F^i_j$.

The notation $d_j \in F_k$ means that $d_j$ is a data item occurrence of the field $F_k$. A logical record is a set of fields with a unique name or identifier. The order of fields in a logical record is immaterial because each field can be identified by its name. Logical records on level $i$ are denoted as $LR^i_j$.

What is the connection between logical records and physical records? It is very simple: a physical record is an occurrence of a logical record, and a data item is an occurrence of a field.
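The occurrence relation can be sketched with a few small types: a field carries a name and an interpretation but no value, a data item is a valued occurrence of a field, a logical record is an unordered set of fields, and a physical record is an ordered tuple of data items. All class and function names here are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

@dataclass(frozen=True)
class Field:
    name: str             # unique identifier; a field has no value
    interpretation: str

@dataclass(frozen=True)
class DataItem:
    field: Field          # the field this item is an occurrence of
    value: object

def similar(a: DataItem, b: DataItem) -> bool:
    """Two data items are similar if they have the same interpretation."""
    return a.field.interpretation == b.field.interpretation

def is_occurrence(physical: Tuple[DataItem, ...], logical: FrozenSet[Field]) -> bool:
    """A physical record is an occurrence of a logical record when its
    data items are occurrences of exactly the record's fields."""
    return {item.field for item in physical} == set(logical)

name = Field("name", "employee surname")
distance = Field("distance", "distance travelled")
logical_record = frozenset({name, distance})           # order is immaterial
physical_record = (DataItem(name, "SMITH"), DataItem(distance, 100))
```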

The logical data base on level i is a set of logical records plus their interpretation which is contained
