[Figure 8. Diagram; recoverable labels: order translation (order edit, product details, methods and time standards, quality procedures, product costs; man, machine and material timing; vendors supply materials); translation logic (product design structure, manufacturing operation structure, quality control structure, cost structure, manufacturing control structure); machines (automatic, operator run); parts, assemblies, shipment, audit; program tape, instruct tape.]

THEORY OF FILES *

Lionello Lombardi

University of California at Los Angeles

Summary

The theory of files is a tool for the logico-mathematical treatment of automatic non-numerical data processing problems, such as machine accounting, information retrieval and mechanical translation of languages. The main result which has been obtained so far from the application of the theory of files is the formulation of a simple pattern to which the data flow of any information processing procedure conforms, regardless of how many files are involved. The flow of each file can be controlled and coordinated with the flow of the other files by means of five boolean parameters, called 'indicators'.

A specially designed Algebraic Business Language exploits this result for the purpose of programming digital data processing systems.

This paper also probes into the impact of the theory of files upon the logical design of digital information processing systems.

Introduction

[...] one specific language - the mathematical language - has been successful in several areas where the need for a scientific investigation existed.

That aspect of human activity which seems to be growing the fastest (in such a way that it sometimes threatens to minimize the importance of all the others) is the control of [...] jeopardize this progress itself. Paperwork is generally carried out by machines; however, the work of organizing, coordinating, defining and describing it is still performed by humans. The proportion of the total available manpower that it absorbs grows dangerously with the wealth and sophistication of our society. The phenomenon of paperwork control has reached the stage where it should be investigated scientifically, hoping to repeat the success that comparable scientific approaches yielded when they were applied to other fields, in terms of promoting the knowledge, suggesting the development of suitable techniques, allowing the adoption of reliable procedures and collapsing the amount of work necessary for carrying them out. We believe that this can be done, and that the appropriate language for analyzing coordinated paperwork can only be mathematics.

The purpose of the theory of files is to [...] automatic data processing systems. The specific problem of non-numerical data processing has been emphasized, the main reason for this being the widely spread prejudice that this field cannot possibly be approached scientifically.

Available Computer Languages

A wide selection of computer languages designed with the aim of providing tools for automatizing the programmer's work is available today; however, it seems that none of these languages can help the systems analyst. We think that the main shortcoming of such languages (which range from simple assemblers to autocoders able to handle macros, and to such languages as the IBM Commercial Translator, COBOL, FACT, Flow-Matic or AIMACO) is that they are all designed according to the pattern that we called 'v. Neumann Language' in [...]: a v. Neumann language [4] is a language in which a phenomenon is described by means of a sequence of statements divided in two categories - 'executable' statements and 'descriptive' statements - in such a way that the statements of the first category can be put into a one-to-one correspondence with a flowchart [5] of the procedure involved by the phenomenon represented.

Such a language can be used successfully for describing layouts of information supports and sequences of actions, namely procedures [6].

Unfortunately such languages cannot possibly provide for a synthetical and compact definition of the compound of logical conditions to which any action is subjected, nor for the synthesis of a coordinated flow of information.

For instance, if we consider a language such as COBOL and we try to use it for representing integrated data processing procedures, the following shortcomings come to light:


(1) Each statement has the form "IF condition THEN action", where the action denotes a sequence of steps and the condition denotes a boolean expression. Nevertheless the execution of the action is only apparently [...] circumstances under which a certain action is to be executed, one should carefully [...] possibly contain any control statements, and consequently could not possibly be a v. Neumann language.

(2) Most documents are self-explaining, as far as their path between different procedures is concerned. However, one of the big problems in systems analysis is the determination of when and under which circumstances a document is entered into or issued from a procedure. This well-known problem of efficiently coordinating the data flow becomes one of the main issues of systems [...] machine control through the procedure description. In such easy applications as payroll, where the degree of parallelism cannot possibly be high, COBOL can be used successfully. On the contrary, consider applications in which, for the sake of saving computer time, several functionally independent procedures that relate only by the fact that they operate on sets of files which are sorted with respect to the same key (i.e., they are equiordered) are run in parallel: then the use of such a language leads to long and exceedingly involved descriptions. Even in some comparatively simple applications, where for example billing and accounts receivable [...] (or autocoders) as a programming language. However, it does not seem to be an appropriate analysis language for such applications.

In cases where the degree of parallelism is high and the data flow is complex, such a language should be discarded.

(3) Last and least, it appears that the use of some type of kindergarten English, whose adoption seems to be due to the objectionable assumption that it is more readily understood by top executives than any more appropriate technical notation, is an obstacle to the use of COBOL even as a programming language, because it yields comparatively long procedure descriptions. However, this last shortcoming is really irrelevant, primarily because it affects COBOL only as a programming language. Further, this shortcoming can be removed easily from the language without any major change in the logic of its translators. It has also been a common experience of individuals programming with COBOL that after a few statements one drops the English of the language and uses abbreviations, especially for such phrases as "IS GREATER THAN".

[...] describing non-arithmetic data processing procedures.

The first philosophy can be summarized as follows: 'We must algebrize the non-numerical procedures, in order to be able to apply to them [6] successful algebraic languages such as Fortran.'

The other can be expressed in this way: 'The major problem in non-arithmetic data processing is the one of defining and coordinating the circumstances, [...] sets of equiordered files can be reduced to boolean algebrae of files. In such an algebra the most common file handling operations can be defined by simple algorithms: for example, a k-way sorting-by-merging procedure (either fixed or variable length-sequence) is represented as a recurrent summation of k files.
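
The idea of a k-way sort-by-merge as a repeated summation of k files can be sketched as follows. This is only an illustration in Python, not the notation of the theory of files: it assumes that the "sum" of equiordered files is their order-preserving merge, and the names `file_sum` and `k_way_merge_sort` are hypothetical.

```python
import heapq


def file_sum(*files):
    """Assumed meaning of the 'sum' of equiordered files: an ordered merge."""
    return list(heapq.merge(*files))


def k_way_merge_sort(strings, k):
    """Replace each group of k ordered strings by its sum until one remains."""
    if k < 2:
        raise ValueError("k must be at least 2, or the passes never shrink")
    strings = [list(s) for s in strings]
    while len(strings) > 1:
        # One merging pass: sum the strings k at a time.
        strings = [file_sum(*strings[i:i + k])
                   for i in range(0, len(strings), k)]
    return strings[0] if strings else []


# Five ordered strings of keys, merged three at a time.
runs = [[3, 9], [1, 7], [2, 8], [4, 6], [0, 5]]
sorted_file = k_way_merge_sort(runs, k=3)
print(sorted_file)
```

With k = 3 the five strings are reduced to two in the first pass and to one in the second.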

From a file-theoretical standpoint, a procedure is broken down into a sequence of PULSES, at whose beginning new records are (logically) entered, during which calculations are performed, and at whose end all the completely processed records are (logically) filed. Only records whose keys all have the same value are [...] describing procedures is the Algebraic Business Language (ABL). It is described in [...], where the basic concepts of the theory of files are defined mathematically. In its simplest version, an ABL procedure description consists of a sequence of 'conditional expressions', namely of sets of executive orders ('actions') subject to boolean expressions ('conditions'). There are no control statements, and the conditional expressions are to be considered sequentially [8], i.e., from the first to the last.
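
A rough sketch of these two ideas, under stated assumptions: a pulse is modelled here as the set of records, drawn from several equiordered files, that share the current key value, and a procedure is a list of (condition, action) pairs considered from the first to the last within each pulse. The code is Python rather than ABL, and the file names `orders` and `payments` and the helper names `pulses` and `run_procedure` are hypothetical.

```python
import heapq
from itertools import groupby


def pulses(files):
    """Yield (key, {file_name: [payloads]}) for each key value, in key order.

    `files` maps a file name to a list of (key, payload) records that is
    already sorted by key, i.e. the files are assumed to be equiordered.
    """
    tagged = [[(key, name, payload) for key, payload in records]
              for name, records in files.items()]
    merged = heapq.merge(*tagged, key=lambda record: record[0])
    for key, group in groupby(merged, key=lambda record: record[0]):
        current = {name: [] for name in files}
        for _, name, payload in group:
            current[name].append(payload)
        yield key, current


def run_procedure(files, expressions):
    """Apply every conditional expression, first to last, in each pulse."""
    issued = []
    for key, current in pulses(files):
        for condition, action in expressions:
            if condition(current):
                issued.append(action(key, current))
    return issued


# Hypothetical equiordered files: (key, amount) records sorted by key.
orders = [(101, 30.0), (103, 12.5)]
payments = [(101, 30.0), (102, 8.0)]

# No control statements: only conditions and the actions subject to them.
expressions = [
    (lambda c: bool(c["orders"]) and not c["payments"],
     lambda key, c: (key, "unpaid order")),
    (lambda c: bool(c["payments"]) and not c["orders"],
     lambda key, c: (key, "payment without order")),
]

result = run_procedure({"orders": orders, "payments": payments}, expressions)
print(result)
```

Note that the description contains no explicit reading, matching or looping over records: in this sketch that work is left to `pulses`, which stands in for the automatic input mechanism.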

Optimization. [...]

Simplification. In each procedure, a LOGICAL ORDER of the files involved must be given [...]

a) Maximizes the parallelism of logical input-output

b) Minimizes the amount of internal processing".

From an applicative standpoint, only DD-optimized procedures should be considered: notice that the maximization of the parallelism of the physical input-output flow can be obtained only on the basis of a logical one whose parallelism is maximized.

Now two pulses are always independent as far as internal processing is concerned, and two phases are always independent as far as input-output is concerned: consequently, in order to DD-optimize a procedure, it is sufficient to

I) Maximize the parallelism of the logical input-output within the pulse.

II) Minimize the amount of internal processing within the phase.

Since ABL is a sequential language, point II can be accomplished simply by performing a precedence analysis and a simplification of the procedure description. (Notice that this would not be easy in a v. Neumann language).

In order to discuss point I, let us consider input and output separately.

Input. The pattern which maximizes the input-parallelism is unique for any phenomenon of the kind we are considering. More precisely, [...] during which the last record pertaining to it is entered.

[...] standardized and automatic, output is entirely and directly controlled by the analyst. In fact, the conditions under which documents are to be issued always depend upon the particular phenomenon considered. Furthermore, the determination of these conditions can often be considered as the major single factor in the representation of this phenomenon. Since it is important to have these conditions [...]

Indicators. The boolean variables used for writing a procedure description (which are compounded into boolean expressions in ABL, while they consist of sets of parts of different statements in any v. Neumann language) may have three origins:

a) They may be generated by comparison between numeric, alphameric or boolean entities.

[...] such references are generated by means of careful constructions and comparisons of keys. The theory of files suggests that the layouts of the keys of the files are given as part of the data descriptions, and that the keys of the records are constructed and related automatically to each other as part of the I-O operations: consequently, these operations are not under the control of the analyst. Since references to the current status of the data flow are often necessary for making decisions which condition the phenomenon considered, ABL must have a provision for giving to the analyst complete information about it. The configuration of the data flow never changes during any pulse, and from a file-theoretical standpoint it can be fully characterized by stating the occurrence or omission of [...] DERIVATIVE' of F characterize those records of F whose key has a value [...] characterizes those records of F which are incomplete or non-conforming.

Four further indicators, which are common to all files, are available in each representation of a phenomenon for denoting its initiation and closure. No other information regarding the data flow is needed in any DD-optimized procedure, in whose description the indicators can be used without any distinction from the other boolean variables.

The setting and resetting of the indicators (i.e., the 'indicator logic') is performed automatically according to the rules stated in [1] (section 3), where the laws of the automatic data flow control - in particular of the input mechanism - are stated in terms of [...] anywhere in the procedure description: in particular in the Flow Control Expressions.

Hardware and Implementation

Sequential Languages

Just as a mathematical synthesis of a physical phenomenon can be stated by means of a sequence of equations, so the theory of files allows one to express a mathematical synthesis of a data processing phenomenon in ABL by means of a sequence of conditional expressions. In both cases the sequence is considered from the first equation (or conditional expression, respectively) to the last one. The flow control expressions are conditional expressions where the action consists of issuing a record. Neither the equations of an algorithm nor the conditional expressions of a non-arithmetic procedure description are in a one-to-one correspondence with the steps of any path that a machine control would follow in order to carry them out. Unlike any v. Neumann language, ABL is 'sequential' and asynchronous with respect to the way the procedures described are implemented. Our study shows that languages having this structure are generally more suitable than v. Neumann languages for approaching data processing phenomena scientifically.

If one wants to utilize the theory of files [...] transform the sequential representation of any data processing phenomenon into a DD-optimized flow chart. Apparently this transformation can quite easily be made because of the standard input scheme, and of the fact that each conditional expression completely determines one specific issue of the procedure, like the presence of a record in an output file or the value of a certain field of an output record, etc. A difficulty arises when we consider the interrelationship between the indicators of the various files; for example, the condition-part of a certain conditional expression, say EA, may depend upon the setting of an indicator of an output file whose records are filed under the control of another Flow Control Expression, say EB, which comes after EA in the sequential description. Consequently, the sequence of operations must be properly arranged in order to avoid unnecessary look-aheads. Such rearrangements should not be performed by the analyst, who should only be concerned with the statement of the information processing effect of the phenomenon, rather than with procedural considerations or with any simplification of the correlation among expressions. This simplification should be carried out by the machine, together with the entry and removal of auxiliary conditional expressions and with the optimization of the arithmetic formulae. This last operation should not be limited to the optimization of each single formula within itself, but should consist of analyzing the relations between different expressions in order to avoid unneeded repetitions. The study of Semapraxis codifies the efforts of analysts toward the intelligent utilization of computers for such machine simplifications. In particular, [10] by Feldstein enunciates such details.
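
One way a machine could carry out the rearrangement described above is a precedence (topological) ordering of the conditional expressions. The sketch below is an assumption-laden illustration, not the method of the paper: each expression is reduced to the set of items its condition reads and the set of items its action sets, and the representation, the item names and the function name `arrange` are hypothetical. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def arrange(expressions):
    """Order expression names so that whatever sets an item precedes its readers.

    `expressions` maps a name to a pair (reads, writes) of sets of item names,
    for example indicators of files.
    """
    writers = {}
    for name, (_, writes) in expressions.items():
        for item in writes:
            writers.setdefault(item, set()).add(name)

    # graph[name] is the set of expressions that must be carried out first.
    graph = {name: set() for name in expressions}
    for name, (reads, _) in expressions.items():
        for item in reads:
            graph[name] |= writers.get(item, set()) - {name}

    return list(TopologicalSorter(graph).static_order())


# EA is written first by the analyst, but its condition depends on an
# indicator of an output file whose records are filed under control of EB.
expressions = {
    "EA": ({"OUT.indicator"}, {"REPORT"}),
    "EB": ({"IN.indicator"}, {"OUT.indicator"}),
}
order = arrange(expressions)
print(order)
```

If two expressions depend on each other, `static_order` raises `graphlib.CycleError`; such a case would have to be reported back to the analyst rather than rearranged.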

The ABL representation of a phenomenon can also be mechanically checked against tautologies and contradictions which may depend on an [...]
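
A mechanical check of this kind can be sketched by brute force: evaluate a condition for every combination of truth values of its boolean variables. This is a minimal Python illustration under the assumption that a condition is a function of named boolean variables; it is not the checking procedure of the paper, and the name `classify` is hypothetical.

```python
from itertools import product


def classify(condition, variables):
    """Return 'tautology', 'contradiction' or 'contingent' for a condition.

    `condition` takes a dict mapping each variable name to True or False.
    The check enumerates all 2**n assignments, so it suits small n only.
    """
    outcomes = [
        bool(condition(dict(zip(variables, values))))
        for values in product([False, True], repeat=len(variables))
    ]
    if all(outcomes):
        return "tautology"
    if not any(outcomes):
        return "contradiction"
    return "contingent"


always = classify(lambda v: v["a"] or not v["a"], ["a"])
never = classify(lambda v: v["a"] and not v["a"], ["a"])
sometimes = classify(lambda v: v["a"] and v["b"], ["a", "b"])
print(always, never, sometimes)
```

An action subject to a contradiction can never be executed, and one subject to a tautology is unconditional; either finding is worth reporting to the analyst.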

Most stored program data processors are provided with an operating system which includes efficient buffered input-output subroutines. Some data processors - the IBM 7070-74, for example - have specific features (scatter-read-gather-write, highly parallel memory bus, block transmission with rearrangement, etc.) which allow the programming of very efficient I-O routines, including the necessary key logic.

In accordance with the adoption of some new ideas in the design of machinery (consider for instance the non-arithmetic processor of the IBM Harvest, or the systems with a Fixed+Variable structure [9]), it is sometimes convenient to wire such routines, which become parts or modules of the hardware.

When a system has to carry out procedures represented in ABL, a similar alternative arises for the indicator logic, which will be programmed for standard systems and built for more advanced and specialized ones.

A third case where the issue of a comparison between wired and programmed 'giant commands' varies with the modernity and specialization of design of the basic hardware is related to the handling of the compact and flexible 'table operations' with whose use ABL provides the analyst (see [1], section 2).

The implementation of ABL is significantly conditioned by the hardware considered: it appears more difficult to carry it out for standard, strictly stored-program computers than for more advanced ones. The generation of a logical [...] operations written by programmers using COBOL or symbolic machine languages should be considered as a clumsy, approximative and only partially satisfactory replacement for a clean, universal and fully automatic indicator logic.

Let us conclude by pointing out that the advantages of adopting sequential languages do not seem to be limited to the use of large scale data processing systems. On the contrary, such languages appear to be intimately related to the nature of non-numerical data processing phenomena, regardless of their implementation; for example, a sequential language quite similar to ABL proved to be well suited for representing procedures to be carried out by very simple, externally programmed data processors [3].

* The preparation of this paper was [...]

1. L. Lombardi, "[...] Non-Arithmetic Data Processing Procedures" (Forthcoming in J.A.C.M.).
2. L. Lombardi, "System Handling of Functional Operators" (Forthcoming in J.A.C.M.).
3. L. Lombardi, "Inexpensive Punched Card Equipment" (Forthcoming).
4. H. Goldstine, J. v. Neumann, "Planning and Coding for an Electronic Digital Computer" (I.A.S., 1948).
5. IBM Staff, "Flow Charting and Block Diagramming Techniques" (C20-8008).
6. Notice that Fortran is also a "v. Neumann language".
7. See [1], section 4.
8. Another 'sequential language' was conceived independently of us by C. B. Tompkins with the collaboration of M. A. Melkanoff and J. D. Swift.
9. G. Estrin, "Organization of Computer Systems - The Fixed plus Variable Structure Computer" (Proc. of the 1960 WJCC).
10. M. A. Feldstein, "Semapraxis" (Forthcoming).

POLYPHASE MERGE SORTING -- AN ADVANCED TECHNIQUE

R. L. Gilstad

Minneapolis-Honeywell Regulator Company
Electronic Data Processing Division
Wellesley Hills, Massachusetts

The Challenge

Designers of generalized library sort packages for the current and future generations of computers are faced with the challenge of developing new techniques that provide more effective use of these computers. The major concern in developing efficient sorting routines in the past has been the internal sorting [...] are, in computer terms, generations.

Emphasis is now being given toward more [...] technique, originally called the "N-1" technique, now a proven method better known as the Cascade sorting technique.

The intention of this paper is to introduce a new merging technique, polyphase sorting. The following section describing the application of the Cascade sorting technique is included to aid in the understanding of the evolution of the polyphase sorting technique and to prepare for certain comparisons later in this paper between the various sorting techniques. A complete study of the changes in merge sorting would, of course, [...] techniques, this paper assumes a general understanding of them by those interested in this subject.

Cascade Sorting

The Cascade Merge Sort, available exclusively in the Honeywell 800 automatic programming packages, is a two-segment program, the first part of which is an internal sort that creates strings of ordered items. The internal sorting method that has proven to be most advantageous to Honeywell for generalized sort generators uses the "tag bin" concept, which transfers internally only a tag representing each item stored in memory, instead of transferring the entire item. Further, the "replacement" sorting method is added, which creates strings of ordered items substantially longer than the number of items stored in memory. This method, in fact, provides strings averaging twice the number of items stored in memory for randomly ordered input data and longer strings if any pre-ordering exists in the input file.
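
The "replacement" method can be illustrated with a small sketch. It is written in Python with a binary heap, which is an assumption of this illustration and not a description of the Honeywell routine; the tag bin concept is not modelled, and the name `replacement_selection` is hypothetical. Each item written out is replaced in memory by the next input item, which joins the current string if it is not smaller than the item just written and is otherwise held for the next string.

```python
import heapq
from itertools import islice

_END = object()  # sentinel marking exhausted input


def replacement_selection(items, memory_size):
    """Split `items` into ordered strings, holding at most memory_size items."""
    if memory_size < 1:
        raise ValueError("memory_size must be at least 1")
    source = iter(items)
    # Heap entries are (string number, item); lower string numbers leave first.
    heap = [(0, item) for item in islice(source, memory_size)]
    heapq.heapify(heap)
    strings = []
    while heap:
        number, item = heapq.heappop(heap)
        if number == len(strings):
            strings.append([])  # first item of a new string
        strings[number].append(item)
        incoming = next(source, _END)
        if incoming is not _END:
            # An item smaller than the one just written must wait for the
            # next string, or the current string would lose its order.
            target = number if incoming >= item else number + 1
            heapq.heappush(heap, (target, incoming))
    return strings


strings = replacement_selection([5, 1, 9, 2, 8, 3, 7], memory_size=3)
print(strings)  # [[1, 2, 5, 8, 9], [3, 7]]
```

In this small case, three items of memory yield a first string of five items, which shows how strings can come out longer than the number of items held in memory.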

The only real difference between the internal sort, hereafter called the pre-sort, for Cascade sorting and normal sorting is the manner [...]
