

10.11. Random File Access

Every datum written to or read from a file has a particular position in the file. File positions are designated by long integers; the first position in a file is 0L. In a text file, one character is stored at each position; in a data file, one "storage unit" is stored at each position (except for files opened for PDF I/O, explained below). The size of storage units varies from system to system, but every MAINSAIL data type occupies an integral number of storage units. The number of bits in a storage unit is available to a program as the integer constant "$bitsPerStorageUnit".

For every data type in MAINSAIL, a corresponding integer "type code" is predefined. The names of the type codes appear in Table 10.11-1. The number of storage units occupied by a value of a given data type may be found by calling the procedure "size"; for example, the number of storage units occupied by an integer is given by "size(integerCode)".

Storage units and data type sizes are explained in detail in Chapter 18.
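As a rough illustration of how a size in storage units relates to a size in bits, here is a Python analogy (not MAINSAIL code). It assumes that a storage unit is an 8-bit byte and uses struct.calcsize with standard sizes as a stand-in for "size"; the names BITS_PER_STORAGE_UNIT, TYPE_FORMATS, size_in_units, and size_in_bits are hypothetical.

```python
import struct

# Assumption: one storage unit is an 8-bit byte. MAINSAIL reports the real
# value through $bitsPerStorageUnit, and it varies from system to system.
BITS_PER_STORAGE_UNIT = 8

# Hypothetical stand-ins for two type codes, using struct's standard sizes.
TYPE_FORMATS = {"integerCode": "=i", "longIntegerCode": "=q"}

def size_in_units(type_code):
    """Analog of size(typeCode): storage units occupied by one value."""
    return struct.calcsize(TYPE_FORMATS[type_code])

def size_in_bits(type_code):
    """Bits occupied by one value: units times bits per unit."""
    return size_in_units(type_code) * BITS_PER_STORAGE_UNIT

int_units = size_in_units("integerCode")
long_bits = size_in_bits("longIntegerCode")
```

On a real MAINSAIL system both numbers come from the system itself, so a program should always ask "size" rather than hard-code them.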

The position on each "read" from or "write" to a file is incremented by the number of characters or storage units read or written. When a file is opened for sequential access, this is the only control you have over file positions. When a file is opened for random access, however, you may explicitly record or change the current file position, so that the next "read" or "write" occurs at a specified place in the file.

    booleanCode     integerCode     longIntegerCode     realCode
    longRealCode    bitsCode        longBitsCode        stringCode
    addressCode     charadrCode     pointerCode

Table 10.11-1. Names of the MAINSAIL Type Codes

The procedures "setPos", "getPos", and "relPos" are used to manipulate file positions. Their declarations are shown in Table 10.11-2. setPos sets the current position of the file f to be n. If, for some reason (e.g., the position specified is beyond the end of the file), it is not possible to set the position to be n, setPos returns false; otherwise it returns true. In addition, if it returns false and the bit errorOK is not set in the ctrlBits parameter, an error message is issued to logFile. getPos returns the current file position associated with f. "relPos(f,n)" is equivalent to "setPos(f,getPos(f) + cvli(n))".

BOOLEAN PROCEDURE setPos (POINTER(file) f;
                          OPTIONAL LONG INTEGER n;
                          OPTIONAL BITS ctrlBits);

LONG INTEGER PROCEDURE getPos (POINTER(file) f);

BOOLEAN PROCEDURE relPos (POINTER(file) f; INTEGER n;
                          OPTIONAL BITS ctrlBits);

Table 10.11-2. Declarations of setPos, relPos, and getPos
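Readers who know byte-oriented file APIs may find a comparison helpful. The Python sketch below mimics the behavior described for setPos, getPos, and relPos; it is an analogy only, not MAINSAIL code. The helper names set_pos, get_pos, and rel_pos are hypothetical, positions are counted in bytes rather than storage units, and treating every position past the end of the file as an error is a simplifying assumption.

```python
import io

def get_pos(f):
    """Analog of getPos: return the current file position."""
    return f.tell()

def set_pos(f, n, error_ok=False):
    """Analog of setPos: move to absolute position n.

    Returns False, leaving the position unchanged, when n lies outside
    the file. A message is printed only if error_ok is not set.
    """
    here = f.tell()
    size = f.seek(0, io.SEEK_END)   # seek returns the new position
    if n < 0 or n > size:
        f.seek(here)                # restore the original position
        if not error_ok:
            print(f"cannot set file position to {n}")
        return False
    f.seek(n)
    return True

def rel_pos(f, n, error_ok=False):
    """Analog of relPos(f,n): setPos(f, getPos(f) + n)."""
    return set_pos(f, get_pos(f) + n, error_ok)

f = io.BytesIO(b"abcdefgh")
set_pos(f, 2)
first = f.read(3)     # reading advances the position by 3
rel_pos(f, -3)        # step back over what was just read
again = f.read(3)     # the same three bytes are read again
```

The pair "read, then relPos by a negative amount" is the same idiom the example program later uses to overwrite a value it has just read.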

Example 10.11-4 shows a program that maintains a symbol table or primitive database as a data file. Each record in the database has a string name and consists of a string that represents the data in the record (also called the "value" of the record). The records may be created or examined by the user with commands read from cmdFile.

In order to speed lookup of the records within the file, each record name has associated with it an integer "hash code". Records with similar hash codes are stored on the same list (or in the same "hash bucket"); the whole data structure is called a "hash table". The purpose of a hash table is to shorten searches for a given item through a data structure representing a set of items. Instead of maintaining a single list through which to search, N (the "number of hash buckets") lists are maintained, so that (if items are distributed evenly among the buckets) searches take about 1/N as long as through a single list. The hash bucket in which to store or search for a given item is computed from some characteristic of the item (the item's "key") by a function called a "hash function" (the procedure "hash" in Example 10.11-4). The best hash functions produce values distributed evenly among the hash buckets when given a typical mixture of keys.
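The bucket idea can be tried out in a few lines of Python. The function below is a hedged transliteration of the procedure "hash" from Example 10.11-4 as we read the listing: start from the length of the name, then add the codes of the first four characters weighted by 3, 5, 7, and 9, and reduce modulo the number of buckets. The names hash_name, names, and buckets are hypothetical, and the reading assumes that the dotted assignment operator yields the updated value.

```python
from collections import defaultdict

def hash_name(s, num_buckets):
    """Return a bucket number in the range 0 to num_buckets - 1."""
    h = len(s)
    j = 1
    for ch in s[:4]:          # only the first four characters take part
        j += 2                # weights 3, 5, 7, 9
        h += ord(ch) * j
    return h % num_buckets

names = ["alpha", "beta", "gamma", "delta", "epsilon", "zeta"]

# One list per bucket; a search only has to scan the list for one bucket.
buckets = defaultdict(list)
for name in names:
    buckets[hash_name(name, 131)].append(name)

def find(name):
    """Search only the bucket that the name hashes to."""
    return name in buckets[hash_name(name, 131)]
```

With N buckets and evenly spread keys, each list holds about 1/N of the items, which is where the speedup comes from.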

The format of the file is shown in Figure 10.11-3.

The system procedure "confirm" writes its argument to logFile as a prompt and accepts a "yes" or "no" answer from cmdFile. If the user types something other than "yes" or "no" (or an abbreviation thereof), "confirm" reprompts until an acceptable answer is given. The procedure "errMsg" writes an error message to logFile, then prompts with the standard "Error response:" prompt. It is used by most MAINSAIL utilities to indicate an error condition. More detailed descriptions of these procedures may be found in the "MAINSAIL Language Manual".

The first thing in the file is an integer, which is the number of hash buckets in the file.

The next thing in the file is the null record used to terminate hash lists. It consists of a single long integer, 0L. An algorithm that traverses a hash list may therefore terminate by checking whether the nextRec field (which is the first field) of the record is 0L; if so, the current record is the null record and contains no data.

The next thing in the file is the current end-of-file position. New records are created at this position, i.e., they are added at the end of the file.

The next thing in the file is a series of N long integers, where N is the number of hash buckets. Each long integer is the file position of the first record in the corresponding bucket. If a given hash bucket is empty, the long integer is the position of the null record.

The next thing in the file is the data records themselves.

The information in each record is stored as the following sequence of values:

+---------+---------+------------+---------+------------+
| nextRec | nameLen | name chars | dataLen | data chars |
+---------+---------+------------+---------+------------+

nextRec is a long integer, representing the file position of the next record in this hash bucket. It points at the null record if this is the last record in the list.

nameLen is the number of characters in the record name; it is an integer.

name chars are the characters in the name of the record, stored as individual integers.

dataLen and data chars are the length and characters of the data string, respectively.
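The file layout described above can be exercised with a small Python model. This is a sketch under stated assumptions, not the MAINSAIL program: integers are taken to be 4 bytes and long integers 8 bytes (little-endian), the file is an in-memory io.BytesIO, and bucket_of is a hypothetical stand-in for the real hash function. All helper names are invented for the illustration.

```python
import io
import struct

INT = struct.Struct("<i")    # assumed size of an integer
LONG = struct.Struct("<q")   # assumed size of a long integer

NUM_BUCKETS_POS = 0
NULL_RECORD_POS = NUM_BUCKETS_POS + INT.size
EOF_POS_POS = NULL_RECORD_POS + LONG.size
FIRST_BUCKET_POS = EOF_POS_POS + LONG.size

def read_val(f, fmt):
    return fmt.unpack(f.read(fmt.size))[0]

def bucket_pos(code):
    return FIRST_BUCKET_POS + code * LONG.size

def bucket_of(name, n):
    return sum(name.encode()) % n          # hypothetical stand-in hash

def create(f, num_buckets):
    f.seek(NUM_BUCKETS_POS)
    f.write(INT.pack(num_buckets))         # number of hash buckets
    f.write(LONG.pack(0))                  # null record: a single 0
    f.write(LONG.pack(0))                  # end-of-file slot, filled below
    for _ in range(num_buckets):
        f.write(LONG.pack(NULL_RECORD_POS))  # every bucket starts empty
    eof = f.tell()
    f.seek(EOF_POS_POS)
    f.write(LONG.pack(eof))

def write_str(f, s):
    f.write(INT.pack(len(s)))
    for ch in s:
        f.write(INT.pack(ord(ch)))         # one integer per character

def read_str(f):
    n = read_val(f, INT)
    return "".join(chr(read_val(f, INT)) for _ in range(n))

def write_record(f, name, value):
    f.seek(NUM_BUCKETS_POS)
    n = read_val(f, INT)
    f.seek(EOF_POS_POS)
    eof = read_val(f, LONG)
    head = bucket_pos(bucket_of(name, n))
    f.seek(head)
    old_first = read_val(f, LONG)
    f.seek(head)
    f.write(LONG.pack(eof))                # new record becomes list head
    f.seek(eof)
    f.write(LONG.pack(old_first))          # nextRec
    write_str(f, name)                     # nameLen, name chars
    write_str(f, value)                    # dataLen, data chars
    eof = f.tell()
    f.seek(EOF_POS_POS)
    f.write(LONG.pack(eof))

def lookup(f, name):
    f.seek(NUM_BUCKETS_POS)
    n = read_val(f, INT)
    f.seek(bucket_pos(bucket_of(name, n)))
    pos = read_val(f, LONG)
    while True:
        f.seek(pos)
        nxt = read_val(f, LONG)
        if nxt == 0:                       # reached the null record
            return None
        if read_str(f) == name:
            return read_str(f)
        pos = nxt

db = io.BytesIO()
create(db, 7)
write_record(db, "pi", "3.14159")
write_record(db, "e", "2.71828")
found = lookup(db, "pi")
```

Because the last real record in a list points at the null record, whose own first field is 0, a traversal needs no separate end-of-list flag.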

BEGIN "symTab"

# Maintains a symbol table or primitive database in the
# form of a random-access data file.

POINTER(dataFile) f;            # The database file
INTEGER numBuckets;             # How many buckets in the file

DEFINE numBucketsPos = 0L;
DEFINE nullRecordPos =
    numBucketsPos + cvli(size(integerCode));
DEFINE eofPosPos =
    nullRecordPos + cvli(size(longIntegerCode));
DEFINE firstBucketPos =
    eofPosPos + cvli(size(longIntegerCode));

BOOLEAN PROCEDURE createNewDataBase (STRING name);
# Return true if successful creation.
BEGIN
INTEGER i;
LONG INTEGER eofPos;
STRING s;

Example 10.11-4. The Use of a Random-Access Data File (continued)

IF NOT confirm("Create new database file " & name) THEN
    RETURN(FALSE);
IF NOT open(f,name,create!input!output!random!errorOK) THENB
    errMsg("Couldn't create",name); RETURN(FALSE) END;
setPos(f,nullRecordPos); write(f,0L);   # Create null rec.
write(logFile,"Number of hash buckets to use in file ",
    name," (<eol> for 131): ");         # 131 is a good number
read(cmdFile,s);
numBuckets := IF NOT s THEN 131 EL cvi(s);
IF numBuckets < 1 OR numBuckets > 1000 THENB
    # Sensible numBuckets?
    errMsg("Bad number of buckets" & s,eol &
        "Should be 1 - 1000"); RETURN(FALSE) END;
setPos(f,numBucketsPos); write(f,numBuckets);
# Now initialize all the buckets to be empty:
setPos(f,firstBucketPos);
FOR i := 1 UPTO numBuckets DO write(f,nullRecordPos);
eofPos := getPos(f); setPos(f,eofPosPos);
write(f,eofPos);    # eofPos is current end-of-file position
RETURN(TRUE);
END;

INTEGER PROCEDURE hash (STRING s);
# Returns a value in the range 0 to numBuckets - 1
BEGIN
INTEGER h,i,j;
i := (h := length(s)) MIN 4; j := 1;
WHILE (i .- 1) GEQ 0 DO h .+ cRead(s) * (j .+ 2);
RETURN(h MOD numBuckets) END;

STRING PROCEDURE getString (INTEGER numChars);
# Read the next numChars integers from the file into a
# string.
BEGIN
INTEGER ch; STRING s;
s := "";
WHILE (numChars .- 1) GEQ 0 DOB read(f,ch); cWrite(s,ch) END;
RETURN(s);
END;

LONG INTEGER PROCEDURE bucketPos (INTEGER hashCode);
# Return the position of the start of the hash list with
# hash code hashCode.
RETURN(firstBucketPos +
    cvli(hashCode * size(longIntegerCode)));

BOOLEAN PROCEDURE lookup
    (STRING recName; PRODUCES OPTIONAL STRING recVal);
# Return true if record recName is found, or if recName
# is "" (illegal record name)
BEGIN
INTEGER nameLen,valLen;
LONG INTEGER nextPos;
IF NOT recName THENB
    errMsg("Null record name"); recVal := "";
    RETURN(TRUE) END;       # Act as if we found it
# Position to hash list for this record name:
setPos(f,bucketPos(hash(recName))); read(f,nextPos);
setPos(f,nextPos);          # Pos of first record in list
DOB read(f,nextPos);
    IF NOT nextPos THEN RETURN(FALSE);  # End of this list
    read(f,nameLen);
    IF getString(nameLen) NEQ recName THENB
        setPos(f,nextPos); CONTINUE END;
    read(f,valLen); recVal := getString(valLen);
    RETURN(TRUE) END;
END;


PROCEDURE writeRecord (STRING recName,recVal);
# Write the new record at the current end-of-file
# position.
BEGIN
LONG INTEGER eofPos,listPos;
setPos(f,eofPosPos); read(f,eofPos);
# Insert the record at the head of the hash list:
setPos(f,bucketPos(hash(recName)));
# Overwrite the head of the list position:
read(f,listPos); relPos(f,- size(longIntegerCode));
write(f,eofPos); setPos(f,eofPos);
write(f,listPos,length(recName));
WHILE recName DO write(f,cRead(recName));
write(f,length(recVal));
WHILE recVal DO write(f,cRead(recVal));
eofPos := getPos(f); setPos(f,eofPosPos); write(f,eofPos);
END;

PROCEDURE createRecord (STRING s);
BEGIN
STRING recVal,t;
scan(s," " & tab,proceed!omit);     # Remove leading blanks
IF lookup(s) THENB
    errMsg("Record already exists:",s); RETURN END;
# Now read the record value from cmdFile:
write(logFile,
    "Enter record value; end with blank line" & eol);
recVal := "";
DOB read(cmdFile,t); IF NOT t THEN DONE;
    write(recVal,t,eol);    # Same as "recVal .& (t & eol)"
    END;
writeRecord(s,recVal);      # Now write it into the file
END;


PROCEDURE lookupRecord (STRING s);
BEGIN
STRING recVal;
scan(s," " & tab,proceed!omit);     # Remove leading blanks
IF NOT lookup(s,recVal) THEN errMsg("No such record:",s)
EL write(logFile,recVal);
END;

PROCEDURE showRecords;
BEGIN
INTEGER i,nameLen; LONG INTEGER nextPos;
FOR i := 0 UPTO numBuckets - 1 DOB
    setPos(f,bucketPos(i)); read(f,nextPos);
    setPos(f,nextPos);      # Pos of first record in list
    DOB read(f,nextPos); IF NOT nextPos THEN DONE;
        read(f,nameLen);
        write(logFile,getString(nameLen),eol);
        setPos(f,nextPos) END END;
END;


BOOLEAN PROCEDURE processCommand (STRING s);

UNTIL open(f,s,random!input!output!errorOK) OR createNewDataBase(s);
# Note use of short-circuit evaluation:
# createNewDataBase is called only if open fails
setPos(f,numBucketsPos);
read(f,numBuckets);         # Get the number of buckets
DOB write(logFile,"Command: "); read(cmdFile,s) END
UNTIL NOT processCommand(s);
close(f); END;

END "symTab"

Example 10.11-4. The Use of a Random-Access Data File (end)

10.12. PDF I/O

By preceding a file name with "PDF" and the device module prefix character (defined as $devModBrk, '>' on most systems), or by including the $pdfbit in the call to open, a file can be opened for PDF, or "Portable Data Format", I/O. This format is used for interchange of data among different processors. When a file is open for PDF I/O, the file positions are in terms of character units instead of storage units. To allow a program to handle data files opened for either normal I/O or PDF I/O, the procedure $ioSize should be used instead of size to position within a data file.

$ioSize's procedure header looks like:

INTEGER PROCEDURE $ioSize (POINTER(file) f; INTEGER typ);

(Actually, $ioSize is a macro, not a procedure, but it acts as if it were a procedure declared with the above header.) $ioSize returns the number of storage or character units, as appropriate, occupied by the data type with the type code typ in the file f, based on whether or not f is open for PDF I/O.

Since the program of Example 10.11-4 uses size instead of $ioSize, it will not work if the "PDF" device prefix is given in the database file name. To allow for this possibility, all occurrences of "size" would have to be replaced in the module with appropriate calls to $ioSize; for example, the body of the procedure bucketPos would be changed to:

RETURN(firstBucketPos +
    cvli(hashCode * $ioSize(f,longIntegerCode)));
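The reason the file must be passed to $ioSize can be seen in a small Python model. This is an analogy, not MAINSAIL code: the class DataFile, the function io_size, and both unit tables are hypothetical, and the unit counts are made-up numbers chosen only to show that the same type can occupy a different number of positions depending on how the file was opened.

```python
# Hypothetical unit counts. Real values depend on the system (storage
# units) and on the PDF encoding (character units).
STORAGE_UNITS = {"integerCode": 2, "longIntegerCode": 4}
PDF_CHAR_UNITS = {"integerCode": 4, "longIntegerCode": 8}

class DataFile:
    """Minimal stand-in for a file that knows if it is open for PDF I/O."""
    def __init__(self, pdf):
        self.pdf = pdf

def io_size(f, type_code):
    """Analog of $ioSize(f,typ): file positions occupied by one value."""
    table = PDF_CHAR_UNITS if f.pdf else STORAGE_UNITS
    return table[type_code]

def bucket_pos(f, first_bucket_pos, hash_code):
    """Position of a bucket head, valid for either kind of file."""
    return first_bucket_pos + hash_code * io_size(f, "longIntegerCode")

normal = bucket_pos(DataFile(pdf=False), 10, 3)
pdf = bucket_pos(DataFile(pdf=True), 20, 3)
```

A position computed for one kind of file is meaningless in the other, which is why every offset in the module, including the header positions, would need the same treatment.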