
Firmware sort processor with LSI components

Figure 1-System configuration


Figure 2-Memory allocation and data flow (main memory: work area, string list, initial parameters table)

than the length of the machine word and this is the case for the majority of sorting jobs.

The analysis of the benchmark problems on sorting reveals that up to 40 percent of the total sorting time (see later section of this paper) is for CPU operations.

The functional description

The Sort Processor (SP) is an internally programmed, firmware special-purpose processor dedicated to performing the sort routine outside of the computer's CPU. The SP shares the common main memory (MM) with the CPU on a lower priority basis and has the simplest interface with the CPU (Figure 1).

The START signal informs the SP that a Control Word (CW) is available on the MM bus. The CW consists of function and address fields. The function code indicates the type of operation to be performed by the SP, e.g., sort (ascending or descending), transfer status, resume, terminate. The address field specifies the starting address ($\alpha_0$) of the initial parameters and boundary conditions table required for the sorting.
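As an illustration only, the control word can be pictured as a function code packed next to an address. The paper names the two fields but gives neither their widths nor the code values, so the 4-bit function field, the 16-bit address field, the numeric codes, and all identifiers below are assumptions made for this sketch.

```python
from enum import IntEnum

# Hypothetical layout: the paper names the fields but not their widths.
ADDRESS_BITS = 16   # assumed width of the address field
FUNCTION_BITS = 4   # assumed width of the function field


class Function(IntEnum):
    """Operations named in the text; the numeric values are invented."""
    SORT_ASCENDING = 0
    SORT_DESCENDING = 1
    TRANSFER_STATUS = 2
    RESUME = 3
    TERMINATE = 4


def pack_control_word(function: Function, table_address: int) -> int:
    """Pack a function code and the parameter-table address into one word."""
    if not 0 <= table_address < (1 << ADDRESS_BITS):
        raise ValueError("address does not fit in the assumed address field")
    return (int(function) << ADDRESS_BITS) | table_address


def unpack_control_word(word: int) -> tuple[Function, int]:
    """Split a control word back into (function, table address)."""
    address = word & ((1 << ADDRESS_BITS) - 1)
    code = (word >> ADDRESS_BITS) & ((1 << FUNCTION_BITS) - 1)
    return Function(code), address
```

Under these assumed widths, a descending sort whose parameter table starts at address 0x1234 would be issued as `pack_control_word(Function.SORT_DESCENDING, 0x1234)`.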

This table, set by the sort-merge control program, contains the following parameters (Figure 2):

$\beta_0$ and $\beta_n$ are the initial addresses of the first and the last records in the work area.

$\phi_0$ and $\phi_m$ are the initial and final addresses of the string list buffer.

(Instead of $\beta_n$ and $\phi_m$, the number of records in the work area (n) and the size of the string list (m) can be given.)

186 Spring Joint Computer Conference, 1970

Figure 3-System configurations with sort processor

$\gamma_0$ = address of the key word of the first record

l = length of the key

r = length of the record (considering fixed length of records)

The operating sequence of the SP, after receiving the CW, typically proceeds in the following manner:

1. Reads the initial parameters table at the address $\alpha_0$ and stores it in its Register File (RF).

2. Generates the effective addresses of the consecutive keys by computing $\gamma_i = \gamma_0 + ir$, starting with i = 0, subsequently i = 1, 2, 3, ..., n.

3. Reads the key ($K_i$). Meanwhile generates the initial address of each record ($\beta_i = \beta_0 + ir$), and stores the codes $K_i\beta_i$ in consecutive locations of its Search Memory (SM). The capacity of the SM can be smaller than the number of records in the work area, so that in the initial load the SM will be filled at i < n.

4. Locates the first desired (the highest or lowest) key ($K_0$) from the SM and stores the corresponding address ($\beta_0$) in the $\phi_0$ location.

5. Replaces the vacancy in the SM with the next $K_{i+1}\beta_{i+1}$ code available from the work area.

6. Searches for the next desired key using the key $K_{j-1}$ located at the previous (j - 1) cycle as the base key for the comparison, stores the address $\beta_j$ corresponding to the newly located $K_j$ into the location $\phi_j$, and returns to step 5.

The iteration of steps 5 and 6 continues until the end of the string. This occurs when one of the following conditions arises:

a. $\beta_i = \beta_n$ (or i = n), indicating that the work area is exhausted.

b. $\phi_j = \phi_m$ (or j = m), indicating that the string list is exhausted.

c. No more successful searches in the SM are possible. The $K_j$ was the last desired key, i.e., the SM does not contain any more keys that are greater than (or less than) the current base key.

Now the SP interrupts the CPU and stays idle until a new control word is received.
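The operating sequence above can be sketched in software as follows. This is a minimal model, not the SP microprogram: the main memory is a plain dictionary, the search memory is a Python list, and the function and parameter names are assumptions. Two interpretive choices are also assumptions: keys equal to the base key are accepted so that duplicates are not skipped, and the string ends as soon as any of conditions a, b, or c holds, with the codes still held in the SM returned to the caller.

```python
def generate_string(memory, gamma0, beta0, r, n, m, capacity, ascending=True):
    """Build one sorted string of record addresses, following steps 1-6.

    memory   -- dict mapping a key address to the key stored there
                (a stand-in for the main memory)
    gamma0   -- address of the key of the first record
    beta0    -- address of the first record
    r        -- record length
    n        -- number of records in the work area
    m        -- size of the string list
    capacity -- number of key/address codes the search memory (SM) holds
    Returns (string_list, leftover_sm).
    """
    def code(i):
        # Steps 2-3: gamma_i = gamma0 + i*r locates the key,
        # beta_i = beta0 + i*r is the address of the record itself.
        return (memory[gamma0 + i * r], beta0 + i * r)

    pick = min if ascending else max

    # Initial load of the SM (it may be smaller than the work area).
    i = min(capacity, n)
    sm = [code(k) for k in range(i)]
    if not sm or m <= 0:
        return [], sm

    # Step 4: locate the first desired key and record its address.
    entry = pick(sm)
    sm.remove(entry)
    string_list = [entry[1]]
    base = entry[0]

    # Conditions b (string list full) and a (work area exhausted).
    while len(string_list) < m and i < n:
        # Step 5: fill the vacancy with the next code from the work area.
        sm.append(code(i))
        i += 1
        # Step 6: search relative to the previous key (assumption: ties allowed).
        if ascending:
            candidates = [e for e in sm if e[0] >= base]
        else:
            candidates = [e for e in sm if e[0] <= base]
        if not candidates:
            break  # Condition c: no further successful search is possible.
        entry = pick(candidates)
        sm.remove(entry)
        string_list.append(entry[1])
        base = entry[0]

    return string_list, sm


# Seven 10-word records starting at address 100, key at offset 4.
keys = [5, 2, 9, 1, 7, 3, 8]
memory = {104 + 10 * i: k for i, k in enumerate(keys)}
string_list, leftover = generate_string(
    memory, gamma0=104, beta0=100, r=10, n=7, m=10, capacity=3)
# string_list is [110, 100, 140, 120], the addresses of keys 2, 5, 7, 9.
```

In this run the string ends under condition c: the three codes left in the SM (keys 1, 3, 8) are all smaller than the last base key 9, so they would seed the next string.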

The SP continues functioning if it is preassigned to control the peripheral file (Figure 3, Systems C and D), and may:

a. transfer the sorted string (by the string list) from the MM into the peripheral file, and load new raw data into the MM;

b. reorganize the memory map, move the buffer areas;

c. exchange status information with the Systems Supervisor, and resume sorting operations.

The degree of complexity of these control functions depends upon the computer system's architecture and the preestablished functional duties of the SP.

Although the algorithm described appears to be optimum for the proposed systems organization, other sorting methods can be implemented.

The hardware structure

The SP consists of three major functional blocks, all designed with MOS LSI components: search memory, register file and microprogram storage. The block diagram of the SP is illustrated in Figure 4.

Search Memory (SM)

The SM is divided into two sectors. The KEY sector, designated for storing and searching key words, includes logic for comparing keys. The ADDRESS sector stores the initial addresses of the records. The number of bits per word in this sector equals $\log_2 M$, M being the size of the computer's MM in words directly accessible to the SP. Both these sectors are independently expandable in their bit directions, and the whole SM is expandable in the word direction in modules. These capabilities allow the SP to meet various sorting applications and to be integrated in computer systems that have different MM sizes.
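A small sketch of the sizing rule: the ADDRESS sector needs $\log_2 M$ bits per word, rounded up to a whole bit when M is not a power of two. The function names and the sample memory size and key length are illustrative assumptions.

```python
def address_bits(mm_words: int) -> int:
    """Bits needed to address any one of mm_words main-memory words."""
    if mm_words < 1:
        raise ValueError("memory size must be positive")
    # (M - 1).bit_length() equals ceil(log2 M) for M >= 2, without float error.
    return max(1, (mm_words - 1).bit_length())


def sm_word_bits(key_bits: int, mm_words: int) -> int:
    """Width of one SM word: KEY sector plus ADDRESS sector."""
    return key_bits + address_bits(mm_words)


# Example: 64-bit keys and a 32,768-word MM give 64 + 15 = 79 bits per SM word.
width = sm_word_bits(64, 32768)
```

Because the two sectors are sized independently, a longer key or a larger MM changes only its own term in the sum.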

Two types of MOS LSI memories for the SM are considered:

a. Associative Memory (AM). A modular AM with LSI components can be organized using monolithic or hybrid technology. The logic to perform the "next greater" and "next smaller" functions is integrated into the AM chips. This could allow the SM to locate the next desired key in one interrogation cycle. For current MOS technology, this is in the range of a few microseconds. However, the integration of these functions, because of their complexity, would result in a low yield of the AM chips. Also, because of the specific nature of these functions, such an AM chip might have a limited market. For these reasons, the integration of only the "equality" function appeared to be a more reasonable approach. To locate the next desired key in an "equality" search, the current key is modified (incremented or decremented) and compared continuously until an "equality" response is detected. The average number of these comparisons is equal to one-half of the key length in binary bits. Such a sacrifice in the searching speed seems to be justified by the economic reasons mentioned.

b. Recirculating Memory (RM). The RM is organized with recirculating MOS dynamic shift registers. The initial key is compared with the contents of the RM for "equality." After the response is detected, the initial key is modified and compared continuously until the next desired key is detected. The search is performed by comparing sequentially each word in the RM with the base key using a single comparator for the whole RM, as opposed to the AM, which contains a comparator for each word. The search time for the RM depends not only on the length of the key word, but also on the length of the shift registers composing the RM.

Figure 4-Sort processor block diagram

TABLE I-Comparison table for search memories

Type of SM   MOS Components Per Cell   I/O Pins Per Cell Ratio   Relative Search Time
AM           10                        1:4                       1
RM-1         6                         1:13                      5
RM-2         6                         1:18                      20

Based on the state of the art of MOS technology, a comparison is made between the AM utilizing the "equality" function only and two types of RM: the RM-1 with dual 64 bit, and the RM-2 with 256 bit dynamic shift registers. The frequency range (4 MC) and the other electrical characteristics of the chips are identical for both RM-1 and RM-2. The comparison is made for a 256 word SM, with keys eight bytes (characters) long. The RM-1 has four external comparators functioning in parallel, one for each 64 word module, while the RM-2 has only one external comparator for the whole 256 word module.

The results of this comparison are summarized in Table I. It is evident from this comparison that RMs offer slower performance at lower cost. The worst case search time of the RMs can be estimated by the following formula:

$$T_{SR} = \frac{W \cdot b}{f} \times 10^{-6}\ \text{sec}$$

where: W = number of words in the RM module (or the length of the shift register) having a single comparator

b = length of the key in binary bits

f = frequency of the shift register in megacycles
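As a worked example, the formula can be evaluated for the two RM variants compared in the text: 64 words per comparator for the RM-1, 256 words for the RM-2, and a 4 MC shift frequency. The function name is hypothetical, and the key length of 64 bits assumes that the eight-byte keys use 8-bit bytes.

```python
def worst_case_search_time_us(words: int, key_bits: int, freq_mc: float) -> float:
    """Worst-case RM search time in microseconds: W * b / f."""
    if freq_mc <= 0:
        raise ValueError("frequency must be positive")
    return words * key_bits / freq_mc


# Assumption: eight-byte keys with 8-bit bytes, so b = 64.
rm1_us = worst_case_search_time_us(words=64, key_bits=64, freq_mc=4.0)    # 1024.0
rm2_us = worst_case_search_time_us(words=256, key_bits=64, freq_mc=4.0)   # 4096.0
ratio = rm2_us / rm1_us                                                   # 4.0
```

The resulting 4:1 ratio between the RM-2 and the RM-1 agrees with the relative search times of 20 and 5 listed in Table I.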

The search time can be decreased by using shorter shift registers, if the cost of the additional hardware is justified. More dramatic improvements are achievable through an increase of the frequency (f). Current MOS technology already reaches the 20 MC range for shift registers. Further improvements are expected in the characteristics of MOS associative memories and shift registers.

SMs of various searching speeds can be organized for given cost-performance criteria. However, for the SP as a low priority background processor in a computer system, the critical issue seems to be the economic factor, not the inherent speed. Accordingly, the use of dynamic shift registers for the SM presently appears to be more reasonable.

Register File (RF)

This is a scratch pad memory used to store the initial parameters and boundary conditions described earlier. Several temporary storage registers for indexing and counting are also included in the RF.

To make the RF more uniform and functionally flexible, all registers have the same length of $\log_2 M$ bits.

Sixteen registers in the RF appear to be sufficient.

Microprogram Storage (μS)

This random access memory stores the microprogram of the SP. Either read-only (ROM) or read-write (RWM) memories can be used for the μS. In comparing the ROM with the RWM, the following factors must be considered:

a. Because of the nondestructive read-out of semiconductor memories, the ROM does not offer significant speed or economic advantages.

b. A higher yield can be achieved in a ROM on a substrate of a given size. However, this may not result in a decisive advantage because the production of ROM requires a certain degree of customization, while the RWM is an established off-the-shelf product.

c. The RWM allows greater flexibility and simplicity in microprogram alterations, debugging and maintenance.

Thus, the LSI RWM for the μS appears to be more desirable. With it, the SP can perform not only various sort algorithms but also complementary functions such as table look-up, file maintenance, and list processing, simply by reloading the μS with the desired microprogram.

Three basic microroutines reside in the μS:

a. the search microroutine, which controls the SM and generates the MM addresses of the sorted records,

b. the MM interface control routine, which performs all the communications between the MM and the SP,

c. the peripheral file interface control routine for controlling the I/O operations if the SP needs to communicate directly with the peripheral file (see Figure 3, System Configurations C and D).

Each of the microroutines is stored in an individual LSI memory unit and monitored by a common synchronizer. Such a partitioning of the control medium allows:

• Simultaneous execution of the search and interface control microroutines.

• More efficient use of LSI technology.

• Easy integration of the SP with almost any computer system by simply altering the microroutines.

• Easier maintenance and diagnostics.

From the economic standpoint, this approach does not cause any cost increase. Unlike magnetic memories, the size of an LSI semiconductor memory does not affect the bit price. Roughly 4096 bits of RWM, organized as 128 × 32, are required for the μS.

SYSTEM CHARACTERISTICS AND CONFIGURATIONS

The SP as a "black box" can be integrated with practically any computer system and relieves the computer's CPU of the burden of sorting. It is applicable to computing systems operating in different processing modes.

In the conventional batch processing systems, the SP functions as a stand-alone, low priority processor. In real-time or time-sharing systems, the SP functions as a background in-house processor. Substituting for the sort routine only, the SP does not cause any structural changes in the computer system architecture.

The system characteristics of the SP are summarized as follows:

a. The SP is easily connected to the MM channel of the computer and does not require any specific and/or additional hardware provisions from the computer (it behaves as any peripheral controller).

b. In a multiprocessing environment, the SP shares the common MM with the other processing units on a preestablished priority basis.

c. The interface between the SP and the MM is asynchronous and operates on a request-acknowledgment basis.

d. The SP requires the simplest software support. Statements like SORT ASCENDING, SORT DESCENDING, RESUME, TERMINATE, TRANSFER STATUS on the systems language level must be compiled into a single control word format which sets the SP to the appropriate operational state. From then on, the SP performs the specified function autonomously.

e. The sort-merge control program performs overall supervision and interaction of the SP with the system. Data preparation and MM allocation required for sorting also can be performed.

The SP can be integrated with the computer system in several configurations. Four typical system configurations are illustrated in Figure 3, and are described as follows:

SYSTEM A is the simplest configuration where the SP shares the MM with the CPU on a lower priority basis. The sorting time for this configuration is relatively long.

SYSTEM B allows the SP more freedom in accessing the appropriate MM bank. Although the SP remains a low priority processor, this configuration results in higher sorting speed.

In both system configurations A and B, data transfer between the MM and the peripheral file for sorting is accomplished through the conventional I/O channel and is controlled by the appropriate software routine.

SYSTEM C has the same MM sharing scheme as System A; in addition, the SP shares the peripheral file with the CPU. The control of the data exchange between the MM and the peripheral file, required for the sort-merge operations, is performed by the appropriate microroutine of the SP.

SYSTEM D combines the MM sharing scheme of System B and the peripheral file sharing scheme of System C. System D comprises fully parallel processing capabilities and offers the highest sorting efficiency.

In all of these configurations, the logic structure and the basic functional blocks of the SP remain practically unchanged. The specific interface characteristics of Systems B, C and D are easily programmed into the microprogram storage. The choice of a configuration depends upon the applications spectrum of the given computer system. Configurations C and D seem to be more applicable to business computer systems, where large amounts of data are to be processed and the I/O portion of the sort-merge operations is of significant magnitude. Configurations A and B can be used in scientific-engineering applications where a relatively small number of files are to be sorted. The trade-off between the desired degree of sorting efficiency and the cost of the features for sharing the MM and/or the peripheral file should be decided at the user's level.

EFFICIENCY AND PERFORMANCE ANALYSIS

The efficiency of the SP depends upon the applications orientation, the size and the basic functional characteristics of the computer system. It is very difficult to derive generalized analytical expressions that correlate computer system parameters and sorting because of the diversity and inconsistency of the numerous variables involved.

Figure 5-Statistical curves of sorting parameters

The diagrams of Figure 5 illustrate the correlation between the main sorting parameters: the sorting time (in minutes), the main memory capacity C (in kilocharacters), and the transfer rate R of the peripheral file (in kilocharacters per second).

These diagrams are derived by analyzing and combining the statistical data for two typical models of computer systems performing a sort (References 3, 6, 7, 8).

The following conclusions can be derived:

a. The CPU time ($\theta$) spent for sorting generally does not depend upon the size of the MM.

b. An increase of the transfer rate R causes the total sorting time T to decline hyperbolically, while $\theta$ remains unchanged.

c. For R = 60 KC, the ratio between the I/O time (T - $\theta$) and $\theta$ is 75 percent to 25 percent. This ratio is 60 percent to 40 percent for R ≥ 120 KC. This is the prevailing range of the transfer rates for the present magnetic files used in the small-to-medium and larger computer systems.
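Conclusions b and c can be checked against each other with a simple model, offered here only as one reading of the curves: the total time is a constant CPU time plus an I/O time equal to a fixed data volume divided by the transfer rate R. The function names, the fixed-volume assumption, and the normalization of the CPU time to 1 are assumptions of this sketch.

```python
def io_volume_from_split(cpu_time: float, rate: float, io_share: float) -> float:
    """I/O volume implied by a given I/O share of the total time at one rate."""
    io_time = cpu_time * io_share / (1.0 - io_share)
    return io_time * rate


def cpu_share(cpu_time: float, volume: float, rate: float) -> float:
    """Fraction of the total sorting time spent in the CPU at a given rate."""
    total = cpu_time + volume / rate   # hyperbolic dependence on the rate
    return cpu_time / total


cpu_time = 1.0   # normalized CPU time, constant per conclusions a and b
volume = io_volume_from_split(cpu_time, rate=60.0, io_share=0.75)
share_at_120 = cpu_share(cpu_time, volume, rate=120.0)
```

Doubling R from 60 KC to 120 KC halves the I/O time in this model, which turns the 75 to 25 split into 60 to 40, the figures quoted in conclusion c.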

Considering that at least 25 percent of the computer's workload in a business oriented system involves sorting, and that up to 40 percent of the sorting time is the burden of the CPU, it is evident that the Sort Processor can release up to 10 percent of the CPU's overall working time.
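The 10 percent figure is the product of the two fractions quoted in the text, taking the sorting share of the workload at 25 percent and the CPU share of the sorting time at its upper value of 40 percent:

$$\underbrace{0.25}_{\text{sorting share of workload}} \times \underbrace{0.40}_{\text{CPU share of sorting time}} = 0.10$$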

The modular logic structure of the SP is highly adaptable to the further advances in LSI technology.

Larger, faster and cheaper LSI chips (MOS or bipolar) can be easily utilized in the SP, improving the cost-performance index and increasing its overall efficiency in the computer systems.

The estimates show that the SP, designed with today's off-the-shelf MOS LSI components, can save considerable amounts of user money.

SUMMARY

Semiconductor technology presently offers LSI components (specifically MOS memory chips) that have a very attractive price-performance index (less than ten cents per bit and around 100 nanoseconds access time). During the coming years this index will be subject to continuous and dramatic improvement, thus setting up broader technical and economic grounds for hardware-software trade-offs. The purpose of these trade-offs is the simplification of the software sector of computer systems and the increase of the overall systems productivity for the user.

The Sort Processor designed with LSI components relieves the CPU of the burden of performing the tedious and time consuming sorting operations. It behaves like a low priority peripheral processor and does not cause any structural changes in the architecture of the computer system. For the small-to-medium and larger computer systems the Sort Processor can release up to 10 percent of the CPU's total workload.

The techniques of the search memory and dynamic microprogramming allow use of the Sort Processor for algorithmic functions other than sorting.

ACKNOWLEDGMENT

The author expresses his appreciation to the NCR-ED Research Department for encouraging the work on this project, and his gratitude to his colleagues: to Mr. A. G. Hanlon for stimulating discussions and advice, to Mr. D. W. Rork for his constructive engineering work in designing the breadboard of the Sort Processor, and to Mr. F. Sherwood for early discussions on sorting algorithms.

Special thanks are due to Mrs. Ann Peralta who performed the tedious job of typing and retyping this paper.

REFERENCES

1 The Diebold Research Program Technology Series, September 1968

2 I FLORES, Computer sorting, Prentice-Hall Inc 1969

3 Computer characteristics digest, Auerbach April 1969

4 D A BELL, The principles of sorting, Computer Journal Vol 1 No 2 June 1958

5 R R SEEBER, Associative self-sorting memory, Proceedings of EJCC Vol 18 pp 179-187 1960

6 IBM system/360 disc and tape operating system. Sort/merge program specification, File No S360-33 Form C24/3444

7 The National 315 electronic data processing system sorting tables, magnetic tapes, The National Cash Register Company Dayton Ohio

8 Magnetic tape sort generator, Reference manual, The National Cash Register Company Dayton Ohio
