
Batch, conversational, and incremental compilers

In the document VOLUME 34 (pages 61-71)

by HARRY KATZAN, JR.

Pratt Institute Brooklyn, New York

INTRODUCTION

Compiler-writing techniques have received a great deal of pragmatic and academic attention and are now fairly well-defined.*

It was and still is generally felt that the compiler is independent of the operating system in which it resides, if it resides in one at all. The invention of time-sharing systems with conversational capability, however, has required that compiler experts re-evaluate existing concepts to make better use of external facilities. This was done, and conversational and incremental compilers have evolved. A generalized and consolidated discussion of these relatively new concepts is the subject of this paper. First, a model of a batch compiler is introduced. The concepts are then modified and extended for a conversational programming environment. Finally, a recent development termed "incremental" compilation, which satisfies the needs of both batch and conversational compiling as well as interactive computing, is presented. Some introductory material is required first.

Basic concepts

In the classical data processing environment,** the "compile phase" or "source language processing phase" is of prime importance, as are definitions of source program and object program. The latter are redefined in light of the time-sharing or interactive environment.

Extraneous items, such as where the object program is stored or whether or not the compiler should produce assembler language coding, are practically ignored.

* Two books devoted entirely to the subject are worth mentioning: Lee, J. A. N., The Anatomy of a Compiler,¹ and Randell, B. and L. J. Russell, Algol 60 Implementation.²

** See Lee,¹ p. 9.

The source program is the program as written by the programmer. It is coded in symbolic form and punched on cards or typed in at the terminal. The object program is the program after being transformed by the compiler into a machine-oriented form which can be read into the computer and executed with very few (if any) modifications. Also of interest is the information vector, which gives initial conditions for compilation and denotes the types of output desired. A sample of specifications which might be found in an information vector follows: (1) location of the source program; (2) name of the program; (3) the extent of compiler processing, i.e., syntax check only, optimize, etc.; (4) computer system parameters; (5) compiler output desired; and (6) disposition of the object module. The form of the source program is sometimes required, although in most cases this information is known implicitly. This pertains to different BCD codes and file types which may range from sequential or indexed files on conventional systems to list-structured files in virtual machines.
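As a rough illustration, the six specifications listed above could be grouped into a single record handed to the compiler. The sketch below is only an assumption about how such a record might look: the class name `InformationVector`, the field names, the `COMPILE` setting, the output names, and the default values are all hypothetical and are not taken from any particular compiler.

```python
from dataclasses import dataclass, field
from enum import Enum


class Extent(Enum):
    """Extent of compiler processing (item 3). COMPILE is an assumed middle setting."""
    SYNTAX_CHECK_ONLY = "syntax check only"
    COMPILE = "compile"
    OPTIMIZE = "optimize"


@dataclass
class InformationVector:
    source_location: str                       # (1) location of the source program
    program_name: str                          # (2) name of the program
    extent: Extent = Extent.COMPILE            # (3) extent of compiler processing
    system_parameters: dict = field(default_factory=dict)  # (4) computer system parameters
    outputs: set = field(default_factory=lambda: {"program module"})  # (5) output desired
    disposition: str = "keep"                  # (6) disposition of the object module

    def wants(self, output: str) -> bool:
        """Return True if the user asked for the named compiler output."""
        return output in self.outputs


# A request for a syntax check that produces only a source listing.
vector = InformationVector(
    source_location="SYSIN",
    program_name="MAIN",
    extent=Extent.SYNTAX_CHECK_ONLY,
    outputs={"source listing"},
)
```

Keeping these options in one record lets the executive routine consult them once and pass the same initial conditions to every phase.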

Similarly for output, the user can request a specialized form of object module or none at all, source or object program listing, and cross-reference listings. The object module is known as a Program Module which contains the machine language text and relocation information. Additionally, it may contain an Internal Symbol Dictionary for use during execution-time debugging. The Internal Symbol Dictionary is especially useful in conversational time-sharing systems where execution can be stopped on a conditional basis and the values of internal variables can be displayed or modified.
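The role of the Internal Symbol Dictionary in debugging can be pictured with a toy model in which the dictionary maps symbolic names to storage locations. This is a minimal sketch under that assumption; the class `DebugSession`, its methods, and the flat list used for storage are hypothetical and say nothing about the actual layout of a Program Module.

```python
class DebugSession:
    """Toy debugger: looks variables up by name through an assumed ISD mapping."""

    def __init__(self, isd, storage):
        self.isd = isd          # symbolic name -> index into storage
        self.storage = storage  # stand-in for the program's data area

    def display(self, name):
        """Return the current value of a variable addressed symbolically."""
        return self.storage[self.isd[name]]

    def set(self, name, value):
        """Store a new value into a variable addressed symbolically."""
        self.storage[self.isd[name]] = value


session = DebugSession(isd={"MAIN.I": 0, "MAIN.A": 1}, storage=[3, 2.5])
session.set("MAIN.A", 1.0)
```

Without the dictionary, the same operations would require the user to know the storage location of each variable.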

Batch compilation

Batch compilation methods are required, quite naturally, in a batch processing environment. The term "batch processing" stems from the days when the programmer submitted his job to the computer center and subsequently received his results. A collection of different jobs was accumulated by operations personnel and the batch was then presented to the computer system on an input tape. The important point is that the programmer has no contact with his job between the time it is submitted to operations and when he receives his output. The concept has been extended to cover Multiprogramming Systems, Remote Job Entry (RJE), and the trivial case where no operating system exists and the programmer runs the compiler to completion.

The generalized batch environment

The most significant aspect of the batch processing environment is that the entire source program is available to the compiler initially and that all compiler output can be postponed until a later phase. The compiler writer, therefore, is provided with a liberal amount of flexibility in designing his language processor. For example, specification (i.e., declarative) statements can be recognized and processed in an initial phase and storage allocated immediately. In the same pass, statement labels are recognized and entered in a table; then in a later phase, validity decisions for statements that use statement labels can be made immediately rather than making a later analysis on the basis of table entries. If desired, source program error diagnostics can be postponed. Moreover, the designer may specify his compiler so that the source program is passed by the compiler or so that the compiler is passed over the source program, which resides semi-permanently in memory.

This inherent flexibility is not exploited in the compiler model which follows. Instead, an attempt has been made to present the material in a conceptually straightforward manner.

A generalized batch compiler

By itself, a model of a generalized batch compiler is of limited interest. The concept is useful, however, for comparison with those designed to operate in time-shared computer systems. Therefore, the presentation is pedagogical in nature, as compared to one which might present a step-by-step procedure for building a compiler.

Processing by the compiler is rather naturally divided into several phases which tend to be more logical than physical. Each phase has one or more specific tasks to perform. In so doing, it operates on tables and lists, possibly modifying them and producing new ones. One phase, of course, works on the source program from the system input device or external storage and another produces the required output. The entire compiler is described, therefore, by listing the tasks each phase is to perform; ordinarily, the description would also denote which tables and lists each phase uses and what tables and lists it creates or modifies. The specific tables and lists which are required, however, tend to be language dependent and are beyond the scope of this treatment.

The compiler is composed of five phases and an executive routine, as follows:

The Compiler Executive (EXEC). The various phases run under the control of a compiler executive routine (EXEC) which is the only communication with the outside world. It establishes initial conditions and calls the different phases as required. It can be assumed that EXEC performs all system input/output services, upon demand from the phase modules. More specifically, the EXEC has five major and distinct functions:

1. to interface with the compiler's environment;

2. to prepare the source statements for processing by phase one;

3. to control and order the operation of the phases;

4. to prepare edited lines for output; and

5. to provide compiler diagnostic information.

Phase 1. Phase 1 performs the source program syntactic analysis, error analysis, and translation of the program into a tabular representation. Each variable or constant is given an entry in the symbol table, with formal arguments being flagged as such. Initial values and array dimensions are stored in a table of preset data. Lastly, information from specification statements is stored in the specification table. The most significant processing, however, occurs with respect to the Program Reference File and the Expression Reference File.

Each executable statement and statement label is placed in the Program Reference File in skeletal form. In addition to standard Program Reference File entries, the Program Reference File contains pointers to the Expression Reference File for statements involving arithmetic or logical expressions.

The Expression Reference File stores expressions in an internal notation using pointers to the symbol table when necessary. As with the Expression Reference File, the Program Reference File also contains pointers to the symbol table.
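The pointer relationships among the three tables can be sketched as follows. The paper does not specify the internal notation, so everything here is an assumption made for illustration: expressions are stored as `(operator, left, right)` triples, "pointers" are list indices, and the names `PrfEntry`, `PhaseOneTables`, and `add_assignment` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PrfEntry:
    """Skeletal Program Reference File entry for one statement."""
    kind: str
    label: Optional[str] = None
    target: Optional[int] = None  # pointer (index) into the symbol table
    expr: Optional[int] = None    # pointer (index) into the Expression Reference File


class PhaseOneTables:
    def __init__(self):
        self.symbols: List[str] = []              # symbol table
        self.erf: List[Tuple[str, int, int]] = [] # Expression Reference File
        self.prf: List[PrfEntry] = []             # Program Reference File

    def symbol(self, name: str) -> int:
        """Return the symbol table index for name, adding an entry if needed."""
        if name not in self.symbols:
            self.symbols.append(name)
        return self.symbols.index(name)

    def add_assignment(self, target, op, left, right, label=None):
        """Record 'target = left op right' in skeletal form."""
        target_index = self.symbol(target)
        self.erf.append((op, self.symbol(left), self.symbol(right)))
        self.prf.append(PrfEntry("assign", label, target_index, len(self.erf) - 1))


tables = PhaseOneTables()
tables.add_assignment("X", "+", "A", "B", label="10")  # 10 X = A + B
```

The statement entry stays small because the expression and the names live in their own tables and are reached only through indices.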

Phase 2. In general, phase 2 performs analyses that cannot be performed in phase 1. It makes storage assignments in the Program Module for all variables that are not formal parameters. It detects illegal flow in loops and recognizes early exits therefrom. It also determines blocks of a program with no path of control to them; and lastly, it detects statement labels which are referenced but not defined.
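Two of the phase 2 analyses lend themselves to a short sketch: finding labels that are referenced but not defined, and finding blocks with no path of control to them. The representation below is an assumption (labels as strings, and a dictionary that lists every block as a key with its successor blocks as the value), and the function names are hypothetical.

```python
def undefined_labels(defined, referenced):
    """Return the labels that are referenced somewhere but never defined."""
    return sorted(set(referenced) - set(defined))


def unreachable_blocks(successors, entry):
    """Return the blocks that cannot be reached from the entry block.

    successors maps each block name to the blocks control may pass to next;
    every block is assumed to appear as a key.
    """
    reached = set()
    pending = [entry]
    while pending:
        block = pending.pop()
        if block in reached:
            continue
        reached.add(block)
        pending.extend(successors.get(block, ()))
    return sorted(set(successors) - reached)


flow = {"B1": ["B2"], "B2": ["B4"], "B3": ["B4"], "B4": []}
dead = unreachable_blocks(flow, "B1")
missing = undefined_labels(defined={"10", "20"}, referenced=["10", "30", "30"])
```

Both checks need the whole program, which is why they cannot be made statement by statement in phase 1.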

Phase 3. The object of phase 3 is to perform the global optimizations used during object code generation, which is accomplished in phase 4.

The first major function of phase 3 is the recognition and processing of common sub-expressions. Phase 3 determines which arithmetic expressions need be computed only once and then saved for later use. In addition, it determines the range of statements over which expressions are not redefined by the definition of one or more of their constituents. If the occurrence of an expression in that range is contained in one or more DO* loops which are also entirely contained in that range, Phase 3 determines the outermost such loop outside which such an expression may be computed, and physically moves the expression to the front of that DO loop. Only the evaluation process is removed from the loop; any statement label or replacement operation is retained in its original position. The moved expression is linked to a place reserved for that purpose in the Program Reference File entries corresponding to the beginning of the respective DO loops.
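The idea of a range over which an expression is not redefined can be illustrated on straight-line code. The sketch below is a simplified stand-in, not the paper's algorithm: it assumes each statement has the form `target = left op right`, ignores loops entirely, and uses a hypothetical function name.

```python
def find_common_subexpressions(statements):
    """Return (reuse_index, first_index) pairs for repeated expressions.

    statements is a list of (target, (op, left, right)). An expression stays
    available until one of its operands is assigned a new value.
    """
    available = {}  # expression -> index of the statement that computed it
    reuses = []
    for index, (target, expr) in enumerate(statements):
        if expr in available:
            reuses.append((index, available[expr]))
        else:
            available[expr] = index
        # Assigning to target ends the range of every expression that uses it.
        available = {e: i for e, i in available.items() if target not in e[1:]}
    return reuses


program = [
    ("X", ("+", "A", "B")),
    ("Y", ("+", "A", "B")),  # same value as statement 0: need not be recomputed
    ("A", ("*", "C", "D")),  # redefines A, so A + B is no longer available
    ("Z", ("+", "A", "B")),  # must be computed again
]
reuses = find_common_subexpressions(program)
```

Here only statement 1 can reuse the earlier result; the assignment to A in statement 2 closes the range for `A + B`.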

The second major function of phase 3 is the recognition and processing of removable statements. A "removable statement" is one whose individual operands do not have "definition points" inside the loop; obviously, the execution of this statement for each iteration would be unnecessary. A definition point is a statement in which the variable has, or may have, a new value stored in it (e.g., appears on the left-hand side of an equal sign). Removed statements are usually placed before the DO statement.
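The definition of a removable statement translates almost directly into a test on the loop body. The following is a minimal sketch that applies only the criterion stated above; it assumes assignments of the form `target = f(operands)`, treats the loop variable as defined inside the loop, and uses a hypothetical function name. A production compiler would need further safety checks before actually moving a statement.

```python
def removable_statements(loop_body, loop_variable):
    """Return indices of statements whose operands have no definition point in the loop.

    loop_body is a list of (target, operands) pairs, one per assignment.
    """
    # Every assignment target, plus the loop variable, is defined inside the loop.
    defined_in_loop = {target for target, _ in loop_body} | {loop_variable}
    return [
        index
        for index, (_, operands) in enumerate(loop_body)
        if not any(operand in defined_in_loop for operand in operands)
    ]


body = [
    ("T", ("A", "B")),  # T = A * B   : A and B are not defined in the loop
    ("S", ("S", "T")),  # S = S + T   : both operands are defined in the loop
    ("U", ("I", "C")),  # U = I + C   : depends on the loop variable I
]
removable = removable_statements(body, loop_variable="I")
```

Only the first statement qualifies, so it alone would be placed before the DO statement.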

Phase 3 also processes formal parameters and develops the prologue to the program; it optimizes the use of registers; and it merges the Program Reference File and the Expression Reference File to form a Complete Program File in preparation for phase 4.

Phase 4. Phase 4 performs the code generation function. Its input consists of the symbol table and the Complete Program File and its output is the Code File, which represents completed machine instructions and control information.

Phase 5. Phase 5 is the output phase and generates the Program Module, the source and object listings, and the cross reference listing. Upon request, an Internal Symbol Dictionary is also included in the Program Module.
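The division of labor between EXEC and the five phases can be pictured as a driver that calls each phase in turn on a shared set of tables. This is a sketch only: the phase bodies are placeholders, the table names are loosely borrowed from the description above, and the `stop_after` parameter is a hypothetical way of modeling a request such as "syntax check only".

```python
def run_phases(source_lines, phases, stop_after=None):
    """EXEC-style driver: call the phases in order on shared tables."""
    tables = {"source": list(source_lines), "completed": []}
    for number, phase in enumerate(phases, start=1):
        phase(tables)
        tables["completed"].append(number)
        if stop_after == number:
            break
    return tables


def phase1(tables):
    # Placeholder for syntactic analysis and translation to tabular form.
    tables["program_reference_file"] = [line.strip() for line in tables["source"]]


def phase2(tables):
    # Placeholder for storage assignment and global error detection.
    tables["storage_assigned"] = True


def phase3(tables):
    # Placeholder for global optimization and merging of the reference files.
    tables["complete_program_file"] = list(tables["program_reference_file"])


def phase4(tables):
    # Placeholder for code generation.
    tables["code_file"] = ["code for " + s for s in tables["complete_program_file"]]


def phase5(tables):
    # Placeholder for building the Program Module and listings.
    tables["program_module"] = {"text": tables["code_file"]}


PHASES = [phase1, phase2, phase3, phase4, phase5]
full = run_phases(["X = A + B"], PHASES)
syntax_only = run_phases(["X = A + B"], PHASES, stop_after=1)
```

Because the phases communicate only through the tables, the driver can stop between any two of them, which is the property the conversational compiler relies on later.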

* Although the DO keyword is a constituent part of several programming languages, it should be interpreted as representing the class of statements from different languages which effectively enable the programmer to write program loops in a straightforward manner.

Any compiler model of this type is clearly an abstraction; moreover, there is almost as much variation between different compilers for the same programming language as there is between compilers for different languages. The model does serve a useful purpose, which is to present a conceptual foundation from which conversational and incremental compilers can be introduced.

Conversational compilation

Compared with the "batch" environment, in which the user has no contact with his job once it is submitted, the conversational environment provides the exact opposite. A general-purpose time-sharing system of one kind or another is assumed,* with users having access to the computer system via terminal devices.

In the batch environment, the user was required to make successive runs on the system to eliminate syntax and setup errors, with the intervening time ranging from minutes to days. Excluding execution-time "bugs", it often took weeks to get a program running. In the conversational mode, syntactical and setup errors can be eliminated in one terminal session. Similarly, execution-time debugging is also possible in a time-sharing system, on a dynamic basis.

Conversational programming places a heavy load on a compiler and an operating system; the magnitude of the load is reflected in the basic additions necessary to support the conversational environment.

The time-sharing environment

The time-sharing environment is characterized by versatility. Tasks can exist in the "batch" or "conversational" mode. Furthermore, source program input can reside on the system input device or be prestored. The time-sharing operating system is able to distinguish between batch and conversational tasks; therefore, batch tasks are recognized as such and processed as in any operating system. The ensuing discussion will concern conversational tasks. It is assumed, also, that the user resides at a terminal and is able to respond to requests by the system.

During the compile phase, the source program may be entered on a statement-by-statement basis or be prestored. In either case, the compiler responds immediately to the terminal with local syntactic errors. The user, therefore, is able to make changes to the source program immediately. Changes to the source program other than in response to immediate diagnostics cause a restart of the compilation process. Obviously, the system must keep a fresh copy of the source program for the restart case. To satisfy this need, a copy of the current up-to-date source program is maintained on external storage; if the source was prestored, the original version is updated with change requests; if the source program is not prestored, the compiler saves all source (and changes) as they are entered line-by-line.

* Two typical general-purpose time-sharing systems are TSS/360³,⁴ and MULTICS.⁵

With the user at a terminal, the compiler is also able to stop midway during compilation (usually after the global statement analysis and before optimization) to inquire whether or not the user wants to continue. Under error conditions, the user may abort the compilation ... command structure of the operating system. In preparation for execution-time debugging, the user would probably request an Internal Symbol Dictionary during compilation so that internal variables can be addressed symbolically. Since execution-time debugging is a relatively new concept, it is discussed briefly.

Debugging commands usually fall into three categories: (1) program control; (2) program modification; and (3) debugging output. Debugging commands may be imbedded in the program itself or the program can be stopped (either asynchronously or with an AT command) and the actions performed immediately. Examples of typical program control commands are:

AT symbolic-location ... STOP
RUN
RUN symbolic-location

Examples of program modification commands are:

SET A = 1.0

IF A < 0, SET A = 0

Examples of debugging output commands are:

DISPLAY MAIN.I MAIN.A

... the compiler's effort is devoted to producing an efficient object program. As a result, the instructions to perform certain computations are sometimes not located where one would expect to find them. In fact, this is a direct consequence of common sub-expressions and removable statements, which were discussed previously. Although these processes contribute to efficiency, they have a side effect which hinders the debugging effort. Therefore, when expecting to use dynamic debugging, the user should request an Internal Symbol Dictionary and select the option which does not produce optimized code.

The conversational compiler and the time-sharing operating system must support several aspects of the conversational environment. These are summarized as follows: (1) the ability to change or forget the effects of the preceding statement; (2) restart logic; (3) maintenance of the entire source program, in up-to-date form, on external storage; (4) the ability to scan statements and produce diagnostics on an individual statement basis; and (5) the option to produce optimized or unoptimized code.

The conversational compiler

Basically, the conversational compiler is a conventional batch-processor containing special features making it suitable for conversational, terminal-oriented operation.

Structurally, the major addition over a batch compiler is the Compiler Control Program (CCP), which in effect controls compilation. CCP is cognizant of whether the mode of operation is batch or conversational and is able to fetch source records and dispose of output print lines accordingly. CCP is the facility which maintains the source program on external storage and is able to call the compiler at its initial entry for the restart case.

The function to fetch a source record is termed GETLINE and is summarized in Table 1. An overview of the CCP is given in Figure 1.

The overall logic of the conversational compiler is shown in Figure 2. Clearly, it differs very little from the batch version. The differences in the compiler itself are found in phase one and at the end of phase two. In phase one, as shown in Figure 3, the compiler uses CCP as its external interface. Moreover, the compiler always compiles a statement conditionally; later it uses the "forget" flag to freeze or delete the compiled information.

Figure 1-Overview of the compiler control program (CCP)

Table 1-GETLINE Function of the Compiler Control Program (CCP)

             Conversational             Batch
             Prestored    Not Prest.    Prestored    Not Prest.
GETLINE      A            B             A            C

A. Fetches the next source record from external storage and returns it to compiler EXEC.

B. Fetches another source record from the terminal input device and updates the source file on external storage. If it is the next source record, the line is returned to the compiler with the "forget" flag off. If the given source record is to replace the previous one, the "forget" flag is turned on and the line is again returned. Otherwise, a previous line has been modified and the compiler is entered at the "initial" entry point for the restart case.

C. Fetches the next source record from the system input device and updates the source file on external storage; the line is returned to EXEC with the "forget" flag off.

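The GETLINE rules and the use of the "forget" flag can be sketched in a few lines. The selection of variant A, B, or C follows Table 1; the rest is an assumption made for illustration. In particular, the sketch supposes that terminal lines carry line numbers so that "next", "replace the previous one", and "earlier line modified" can be told apart, and the names `Outcome`, `classify_terminal_line`, and `ConditionalCompiler` are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    NEXT = "line returned, forget flag off"
    REPLACE = "line returned, forget flag on"
    RESTART = "compiler entered at initial entry"


def getline_variant(conversational: bool, prestored: bool) -> str:
    """Select variant A, B, or C of GETLINE as laid out in Table 1."""
    if prestored:
        return "A"
    return "B" if conversational else "C"


def classify_terminal_line(line_number: int, last_line_number: int) -> Outcome:
    """Variant B decision, assuming each terminal line carries a line number."""
    if line_number > last_line_number:
        return Outcome.NEXT
    if line_number == last_line_number:
        return Outcome.REPLACE
    return Outcome.RESTART


class ConditionalCompiler:
    """Holds the latest statement conditionally until the next line arrives."""

    def __init__(self):
        self.frozen = []     # statements whose compiled information is kept
        self.pending = None  # statement compiled conditionally

    def accept(self, statement: str, forget: bool) -> None:
        if self.pending is not None and not forget:
            self.frozen.append(self.pending)  # freeze the previous statement
        self.pending = statement              # with forget on, the old one is dropped


compiler = ConditionalCompiler()
compiler.accept("A = 1", forget=False)
compiler.accept("B = 2", forget=False)
compiler.accept("B = 3", forget=True)  # replaces the previous statement
```

Holding only one statement conditionally keeps the cost of a correction small; anything older than the previous line falls back on the restart case.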
Figure 2-Logic of the conversational compiler. From the initial entry, the compiler is initialized and runs Phase 1 (translate source; find syntax errors) and Phase 2 (assign storage; find global errors), then exits to CCP. From the continue entry, it runs Phase 3 (global optimization), Phase 4 (generate object code), and Phase 5 (build PM and ISD; prepare listings), then exits to CCP.

After phase two, as shown in Figures 1 and 2, the conversational compiler again exits to CCP. In the batch mode, of course, CCP simply returns to the compiler. In the conversational mode, as shown in Figure 1, the user is asked for changes and whether he wants to continue. At the user's request, CCP can change the source program, still residing on external storage, and restart the compiler at the "initial" entry. If the user desires to continue, the compiler is entered at the "continue" entry. Otherwise, CCP exits to the command system and the remainder of the compilation is aborted.
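The choice CCP makes when the compiler exits after phase two reduces to a three-way decision. The function below is a minimal sketch of that decision only; its name, the reply strings "change" and "continue", and the returned entry names are hypothetical conventions chosen for the example.

```python
def ccp_after_phase_two(conversational: bool, reply: str = "") -> str:
    """Return the action CCP takes when the compiler exits after phase two."""
    if not conversational:
        return "continue"   # batch mode: simply return to the compiler
    if reply == "change":
        return "initial"    # source was changed: restart at the initial entry
    if reply == "continue":
        return "continue"   # proceed with phases 3 to 5
    return "abort"          # exit to the command system


batch_action = ccp_after_phase_two(conversational=False)
restart_action = ccp_after_phase_two(conversational=True, reply="change")
```

In batch mode the reply is never consulted, which matches the statement that CCP simply returns to the compiler.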
