THE FORMALIZATION OF INPUT

While sampling provides the method for reducing the dimensional complexity of natural phenomena, formalization is the method for reducing the intellectual complexity. I wish to propose a quantitative measure of the degree of formalization in a set of records. To do so, consider a record of $N$ bits. The question asked is, How many different things can be represented by such a record? The answer, of course, is simply $2^N$ as a maximum. But now, suppose we format that record (structure it and formalize implicit relations). To be specific, we will divide it into $f$ separate fields of $n$ bits each, so that $N = fn$. Can we now describe, and quantify, a measure of formalization? To answer this question, we still use as our criterion the number $M$ of different things which can be represented by such a record and then measure the degree of formalization by

$$C = \frac{\log_2 M}{fn}$$

For example, a fixed format allows the same $n$-bit configuration to represent a different code when used in each of the $f$ fields. Hence, $M_0 = f \cdot 2^n$ and

$$C_0 = \frac{\log_2 f + n}{fn} = \frac{1}{f}\left(1 + \frac{\log_2 f}{n}\right)$$
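As a rough check on the count $M_0 = f \cdot 2^n$, the sketch below enumerates every (field, configuration) pair for a small record and compares the result with the number of distinct bit patterns the record can hold. The function name and the sample values $f = 3$, $n = 2$ are illustrative assumptions, and the degree of formalization is taken to be $\log_2 M / (fn)$, as the expression for $C_0$ implies.

```python
from itertools import product
from math import log2


def fixed_format_codes(f: int, n: int) -> list[tuple[int, tuple[int, ...]]]:
    """List every distinct code in a fixed format of f fields, n bits each.

    A code is identified by the field it occupies together with its n-bit
    configuration, so the same bit pattern counts once per field.
    """
    configurations = list(product((0, 1), repeat=n))
    return [(field, bits) for field in range(f) for bits in configurations]


f, n = 3, 2  # illustrative values only
codes = fixed_format_codes(f, n)
m0 = len(codes)                   # number of distinct codes, f * 2**n
c0 = log2(m0) / (f * n)           # degree of formalization of the fixed format
distinct_records = 2 ** (f * n)   # number of distinct bit patterns in the record

print(m0, distinct_records, c0)
```

The count of codes grows only linearly with the number of fields, while the count of distinct records grows exponentially, which is why $C_0$ is small for a fixed format with many fields.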

If we reorganize the record into one field of $fg$ bits and $f$ fields of $n - g$ bits, the first can allow the specification of the format, from a set of $2^{fg}$ formats, for the particular record; then within each format each $(n - g)$-bit configuration can represent a different code when used in each of the $f$ remaining fields. Hence $M_g = 2^{fg} \cdot f \cdot 2^{n-g}$ and

$$C_g = \frac{fg + \log_2 f + (n - g)}{fn} = \frac{1}{f}\left(\frac{n - g}{n} + \frac{fg}{n} + \frac{\log_2 f}{n}\right)$$
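One way to picture this layout is as bit packing. The following is a minimal sketch, assuming the format selector occupies the most significant $fg$ bits and is followed by the $f$ code fields of $n - g$ bits each; the ordering, the function names, and the sample values are assumptions made for illustration, not something the text specifies.

```python
def pack_record(format_id: int, fields: list[int], n: int, g: int) -> int:
    """Pack a format selector and f code fields into one fn-bit integer.

    Assumed layout (most significant first): one selector of f*g bits,
    then f fields of (n - g) bits each.
    """
    f = len(fields)
    width = n - g
    if not 0 <= format_id < 2 ** (f * g):
        raise ValueError("format_id does not fit in f*g bits")
    record = format_id
    for code in fields:
        if not 0 <= code < 2 ** width:
            raise ValueError("code does not fit in n-g bits")
        record = (record << width) | code
    return record


def unpack_record(record: int, f: int, n: int, g: int) -> tuple[int, list[int]]:
    """Recover the format selector and the f code fields from a packed record."""
    width = n - g
    mask = (1 << width) - 1
    fields = []
    for _ in range(f):
        fields.append(record & mask)
        record >>= width
    fields.reverse()  # fields were peeled off last-to-first
    return record, fields


f, n, g = 3, 8, 2  # illustrative sizes: 6-bit selector, three 6-bit fields
packed = pack_record(format_id=5, fields=[17, 0, 63], n=n, g=g)
```

The packed record still occupies exactly $fn$ bits; only the division of those bits between format selection and code content has changed.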

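A different approach is to allow a set of role indicators, say $2^g$; then, since each of the $f$ fields carries its own $g$-bit indicator, the number of possible formats is again $2^{fg}$. Each field will then have $n - g$ bits left for definition of a code within the format and within the role described by its role indicator. The total number of different codes is then $M_g = 2^{fg} \cdot f \cdot 2^{n-g}$, and $C_g$ has the same value as under the format-definition approach:

$$C_g = \frac{fg + \log_2 f + (n - g)}{fn}$$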

The power of the format-definition and role-indication approaches is therefore effectively the same. The difference in practice is solely one of processing convenience.

Normally, of course, we think of the number of formats, $2^{fg}$, or the number of classes of codes, $2^g$, which the role indicators define, as relatively limited; but as $g$ gets large and equals $n$, each configuration becomes a class unto itself. The result is the concept of "implicit" formats, where each $n$-bit configuration defines a table describing the formats in which it can occur, in terms of its occurrence in a given field. The actual format for a given record is then the logical intersection of the allowable formats for the configurations in each field. Then $M_n = 2^{fn} \cdot f$ and

$$C_n = \frac{\log_2 (2^{fn} \cdot f)}{fn} = 1 + \frac{\log_2 f}{fn}$$
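The "logical intersection of the allowable formats" can be sketched as a set intersection. In the example below, the table contents, the format names, and the use of short bit strings as configurations are all hypothetical; the sketch only illustrates the lookup-and-intersect step.

```python
# Hypothetical tables: for each field position, the formats in which a given
# configuration (written here as a short bit string) may occur.
ALLOWED_FORMATS = [
    {"00": {"invoice", "order"}, "01": {"order"}, "10": {"receipt"}},             # field 0
    {"00": {"invoice"}, "01": {"invoice", "order"}, "11": {"order", "receipt"}},  # field 1
    {"10": {"order", "invoice"}, "11": {"order"}},                                 # field 2
]


def implicit_formats(record: list[str]) -> set[str]:
    """Return the formats consistent with every field of the record.

    Each field's configuration contributes its own set of allowable formats;
    the record's format is the intersection of those sets. A configuration
    missing from a field's table allows no format at all.
    """
    allowed = None
    for table, configuration in zip(ALLOWED_FORMATS, record):
        candidates = table.get(configuration, set())
        allowed = set(candidates) if allowed is None else allowed & candidates
    return allowed if allowed is not None else set()


print(implicit_formats(["01", "01", "11"]))
```

An empty result means that no format accommodates the record; a result with more than one member means the record remains ambiguous and further context would be needed to resolve it.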

(Parenthetically, it might be asked how $fn$ bits are able to allow definition of more than $2^{fn}$ different things. The point, of course, is that a record describes a relation among the $f$ different fields, and although the number of relations among them cannot be more than $2^{fn}$, the number of different codes being related certainly can be. Another parenthetical comment is that the number of codes in each case is a maximum. In practice, the actual number of codes will be very much less.)

The result is clear: given a record of $fn$ bits, the degree of formalization of it is measured in terms of a single parameter, $g$ (which can be interpreted either as defining the number of formats or the number of classes of terms), by the function

$$C_g = \frac{\log_2 (f \cdot 2^{fg} \cdot 2^{n-g})}{fn} = \frac{g}{n} + \frac{n - g}{fn} + \frac{\log_2 f}{fn}$$

Graphically:

[Figure: plot of $C_g$ as a function of $g$]
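To see how the measure behaves between the fixed-format case ($g = 0$) and the implicit-format case ($g = n$), it can be tabulated directly. The sketch below evaluates $g/n + (n - g)/(fn) + \log_2 f/(fn)$ with base-2 logarithms; the function name and the record size $f = 4$, $n = 8$ are illustrative assumptions.

```python
from math import log2


def degree_of_formalization(f: int, n: int, g: int) -> float:
    """Evaluate C_g = g/n + (n - g)/(f*n) + log2(f)/(f*n) for 0 <= g <= n."""
    if f < 1 or n < 1 or not 0 <= g <= n:
        raise ValueError("require f >= 1, n >= 1 and 0 <= g <= n")
    return g / n + (n - g) / (f * n) + log2(f) / (f * n)


f, n = 4, 8  # illustrative record: four fields of eight bits
values = [degree_of_formalization(f, n, g) for g in range(n + 1)]
for g, c in enumerate(values):
    print(f"g = {g}: C_g = {c:.4f}")
```

For these sample values the measure rises linearly in $g$, from 0.3125 at $g = 0$ to 1.0625 at $g = n$, matching the expressions for the fixed-format and implicit-format cases.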

From a practical standpoint, the significance of $g$ is that it represents the number of different tables which must be stored and referenced in order to determine the meaning of codes within the record, and thus of the record as a whole.

Incidentally, this entire line of argument can be generalized, in very obvious ways, to include the effects of variable-length fields and variable-length formats. On the other hand, it should be recognized that the programming problems in such generalized formats are enormously greater.

Now, turning to the relationship between input and format, it seems evident that complex phenomena occur at high levels in the spectrum of formalization which I have defined. A sentence, a photograph, a signal: each is at least at the level of an implicit format (in the sense I have defined it), depending heavily on context for both form and meaning. It therefore is difficult, if not impossible, for a computer to handle them without introduction of formalization, either through dictionaries of allowable forms or through external processing into a standard form.

To implement each of the stages in format formalization therefore requires the introduction of a dictionary: of the codes, of the formats, of the role indicators, of the terms themselves. In fact, the concept of format dictionaries may well be a fundamental one in the formalization not only of format but even of meaning. In particular, any format can lead to a nesting of formats: the terms appearing at the one level can imply formats which themselves consist of terms implying further formats, etc.

Such a cascading of formats leads to further generalization of the format concept to even higher levels of complexity.
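The nesting of formats can be sketched as a recursive dictionary lookup. The dictionary contents and the function name below are hypothetical; the sketch assumes that a field whose name appears in the dictionary implies a further format, and that any other field is a terminal term.

```python
# Hypothetical format dictionary: each format lists its fields in order.
# A field naming another format stands for a nested record; any other
# field is treated as a terminal term.
FORMAT_DICTIONARY = {
    "shipment": ["order", "carrier"],
    "order": ["customer", "item", "quantity"],
    "customer": ["name", "address"],
}


def expand_format(name: str, _active: tuple[str, ...] = ()) -> list[str]:
    """Expand a format into the flat list of terminal terms it implies."""
    if name not in FORMAT_DICTIONARY:
        return [name]  # terminal term: no further format implied
    if name in _active:
        raise ValueError(f"format {name!r} is defined in terms of itself")
    terms = []
    for field in FORMAT_DICTIONARY[name]:
        terms.extend(expand_format(field, _active + (name,)))
    return terms


print(expand_format("shipment"))
```

The guard against self-reference is a design choice of this sketch: without it, a format defined in terms of itself would cascade without end.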

A final question should be discussed: How do we create a formalization? I think that the method of formatting provides one useful picture, but it's not the only one. Several approaches to different aspects have been proposed, each representing a variation of the mathematical concept of decomposition, or analysis into fundamental, critical components. For example, methods for file organization (classification) based on decomposition of the association matrix have been suggested.⁶⁸,⁶⁹ At least one concept for decomposing item structure based on combinatorial assignment has been suggested.⁷⁰,⁷¹ The usual lattice model for vocabulary structure implies the possibility of lattice decomposition for creating a facet analysis.⁷²

INTERNAL PROCESSES OF SYNTHESIS

Although internal processing as such falls outside the scope of this talk, there is such an intimate relationship between it and the basic input that I want to comment on that relationship. For example, data identification is trivially simple if the input is well formalized (formatted), and can be extremely complex otherwise. File organization, similarly, is almost self-evident with formatted data and not at all self-evident with essentially free text. Therefore, the extent to which the input is formalized will directly affect the complexity of the internal processes. Now, this may be self-evident, but it is not at all evident how we choose the proper balance between formalization of input and complexity of internal processing. In the field of information retrieval, for example, investigation has tended to concentrate either on the highly formalized end of the spectrum (characterized by the several existing file management programs) or on the essentially implicit formats represented by language translation. Although much work has gone into definition of role indicators of various kinds, little has been done on the definition of flexible formats. I suggest that, because of the problem of balancing external formalization and internal complexity, serious consideration be given to the format approach.

With respect to the other factor in input (the dimensional one and the necessity for sampling), similar comments can be made. Many of the difficulties in character reading and pattern recognition are a direct result of sampling. It seems important therefore to develop an adequate theory for this area. One exists for signals, but for two-dimensional images it is a different matter. Again, the significance of the relationship between sampling and internal processing may be self-evident, but the mathematics of it, at least for images, is not at all self-evident.
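For reference, the one-dimensional theory alluded to is presumably the sampling theorem (an assumption; the text does not name it at this point). In its usual statement, a signal $x(t)$ with no frequency components at or above $W$ is fully determined by samples taken at intervals $T \le 1/(2W)$ and can be rebuilt by interpolation. The symbols $x$, $T$, $W$, and $k$ are introduced here only for illustration.

```latex
x(t) = \sum_{k=-\infty}^{\infty} x(kT)\,
       \frac{\sin\bigl(\pi (t - kT)/T\bigr)}{\pi (t - kT)/T},
\qquad T \le \frac{1}{2W}
```

No comparably simple statement is offered here for two-dimensional images, which is the gap the paragraph above points to.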

SUMMARY

In summary, input, as I have considered it, is a process of transforming the physical and intellectual complexity of physical phenomena into simple forms suitable for processing by a computer. The methods for accomplishing the transformation are, respectively, sampling and formalization. Their characterization in mathematical terms is an essential first step to the understanding and solution of basic problems in the handling of information. My intent here today has been to describe these two aspects and indicate some directions in which the mathematical characterizations may develop. I wish particularly to emphasize the importance which image processing will play in the years to come and the value of formatting as a picture of formalization.

REFERENCES

1. Gibbons, James, "How Input/Output Devices Affect Data Processor Performance," Control Eng. (July 4, 1957), pp. 97-102.

2. Blumenthal, E., and F. Lopez, "Punched Card to Magnetic Tape Converter for UNIVAC," Review of Input and Output Equipment Used in Computing Systems, Joint AIEE-IRE-ACM Computer Conference, December 1952.

3. Leiner, Alan L., "Buffering Between Input-Output and the Computer," Review of Input and Output Equipment Used in Computing Systems, Joint AIEE-IRE-ACM Computer Conference, December 1952.

4. Frank, W. L., et al., "Programming On-Line Systems," Datamation, vol. 9, no. 5 (June 1963), pp. 28-32.

5. Lee, R. C., and F. B. Cox, "High-Speed Analog-Digital Computer for Simulation," IRE Trans. PGEC (June 1959).

6. Proceedings of the Combined Analog-Digital Computer Systems Symposium, Philadelphia, December 16-17, 1960.

7. Logetronic, for example, a type of electronic dodging equipment.

8. Aronson, Milton H., Data Storage Handbook (Instruments Publishing Company, Inc., 1962).

9. "All about Paper Tape," Datamation, vol. 5 (May-June and July-August 1959).

10. Perlman, Justin A., "Data Collection for Business Information Processing," Datamation, vol. 9, no. 2 (February 1963), pp. 54-58.

11. Gruenberger, Fred, Computing Manual, University of Wisconsin, 1953.

12. Kaufman, E. N., "Digital Signal Conversion," Inst. and Cont. Sys., vol. 37, no. 2 (February 1964), pp. 117-119.

13. Wright, R. E., "How to Make Computer Compatible Tapes," Control Eng., vol. 9, no. 5 (May 1962), pp. 127-129.

14. Recordak, The Miracode System of Automated Information Retrieval, Eastman-Kodak, 1964.

15. FMA, Users Guide, 1962.

16. Kuipers, J. W., et al., "A Minicard System for Documentary Information," American Doc., vol. 8, no. 4 (October 1957), pp. 246-268.

17. Shaw, Ralph R., "The Rapid Selector," J. Doc., vol. 5 (1949), pp. 164-171.

18. Becker, J., and R. M. Hayes, Information Storage and Retrieval: Tools, Elements, Theories (Wiley, 1963), pp. 308-316.

19. Fischer, George L., et al. (eds.), Proceedings of the Symposium on Optical Character Recognition, Washington, D.C., January 15-17, 1962 (Spartan, 1962).

20. Bower, C. G., "Survey of Analogue to Digital Converters," National Bureau of Standards Report #2755 (July 1953).

21. Burke, H. E., "Survey of Analogue-to-Digital Data Converters," Review of Input and Output Equipment Used in Computing Systems, Joint AIEE-IRE-ACM Computer Conference, Dec. 10-12, 1952, pp. 98-105.

22. Fischer, P. P., "Analogue-to-Digital Converters," Electro-Technician, vol. 69, no. 3 (March 1962), pp. 165-168.

23. Fleischer, A. A., and E. Johnson, "Analogue-to-Digital Converter Capable of Nanosecond Resolution," IEEE Trans. Nuclear Science, NS-10 (1) (January 1963), pp. 31-35.

24. Klein, Martin J., "Analog-Digital Converters: An Evaluation," Datamation, vol. 4 (May-June 1958).

25. Suskind, Alfred (ed.), Analog-Digital Conversion Techniques (M.I.T. Press, 1957).

26. Huskey, H. D., and Granino A. Korn, Computer Handbook (McGraw-Hill, 1962), pp. 1829ff.

27. Beers, Yardley, Introduction to the Theory of Error (Addison-Wesley, 1957).

28. Barnes, John E., Jr., "Sampled Data Systems and Periodic Controllers," in Handbook of Automation, Computation, and Control (Wiley, 1958).

29. Blanger, C. G., "Sampled-Data Techniques in Complex Simulation," Eastern Simulation Council, Jan. 5, 1959.

30. Cherry, Colin, On Human Communication (Wiley, 1957), pp. 121ff.

31. Linvill, William K., and John M. Salzer, "Analysis of Control Systems Involving Digital Computers," Proc. IRE, vol. 41 (July 1954).

32. Ragazzini, J. F., and G. F. Franklin, Sampled-Data Control Systems (McGraw-Hill, 1958).

33. Salzer, John M., "Frequency Analysis of Digital Computers Operating in Real Time," Proc. IRE, vol. 42 (February 1954).

34. Hakimoglu, A., and R. D. Kulvin, "Sampling Ten Million Words per Second," Electronics, vol. 37, no. 6 (Feb. 7, 1964), pp. 52-54.

35. Hollitch, Robert S., and Albert K. Hawkes, Automatic Data Reduction, A Catalogue of Devices Useful in Automatic Data Reduction, WADC Technical Report #54-519, Part II, Armour Research Foundation (November 1954).

36. Walli, Charles R., "Quantizing and Sampling Errors in Hybrid Computation," Proc. FJCC (Oct. 27-29, 1964).

37. Greanias, Hoppel, Kloomok, and Osborne, "Design of Logic for Recognition of Printed Characters," IBM Journal of Research and Development.

38. Johnson, J. R., and N. Rochester, "A Simulated Machine for Recognizing Printed Numerical Characters by a Method of Lakes and Inlets," IBM Journal of Research and Development, Poughkeepsie, Nov. 30, 1953.

39. Clapp, L. C., "Optical Information Processing," Int. Sci. and Tech. (July 19, 1963), pp. 34-41.

40. David, E. E., Jr., and O. E. Selfridge, "Eyes and Ears for Computers," Proc. IRE, vol. 50, no. 5 (May 1962), pp. 1093-1101.

41. Davis, Malcolm R., and T. O. Ellis, "The RAND Tablet: A Man-Machine Communication Device," Proc. FJCC (Oct. 27-29, 1964).

42. Duncan, A. H., "Automatic Picture Digitizer," Brit. Comm. and Elec., vol. 9, no. 9 (September 1962), pp. 676-679.

43. Fulton, Roger L., "Visual Input to Computers," Bus. Datamation, vol. 9, no. 7 (August 1963), pp. 37-40.

44. Hargreaves, Barrett, et al., "Image Processing Hardware for a Man-Machine Graphical Communication System," Proc. FJCC (Oct. 27-29, 1964).

45. Holmes, W. S., and H. M. Maynard, "Input-Output Equipment for Research Applications," Nat. Electr. Conf. Proc., vol. 18 (1962), paper 1539, pp. 509-517.

46. Jacks, Edwin L., "A Laboratory for the Study of Graphical Man-Machine Communication," Proc. FJCC (Oct. 27-29, 1964).

FORMS OF INPUT 33

47. Julesz, Bela, "Binocular Depth Perception and Pattern Recognition," in Cherry, Colin (ed.), Information Theory (Butterworth, 1961).

48. Kirsch, R. A., et al., "Experiments in Processing Pictorial Information with a Digital Computer," Proc. FJCC (December 1957).

49. Krull, Fred N., and James E. Foote, "A Line-Scanning System Controlled from an On-Line Console," Proc. FJCC (Oct. 27-29, 1964).

50. Larsen, K., "Automatic Readings and Interpolation of Strip Charts and Film Records," Proc. ISA, Reprint 63 (March 1963).

51. Maffi, C., and E. Marchesini, "Semiautomatic Equipment for Statistical Analysis of Air Photo Linears," Photogramm. Eng., vol. 30, no. 1 (January 1964), pp. 139-141.

52. McFee, R. H., "Information Processing in Infrared Systems," J. Soc. Photog. Inst. Engrs., vol. 2, no. 1 (Oct.-Nov. 1963), pp. 14-16.

53. Shaler, D., "Ultrahigh-Speed Microfilm Facsimile System," IEEE Trans. Comm. Electr., vol. 82, no. 66 (May 1963), pp. 201-207.

54. Shonnard, J. R., "High-Speed Communication of Graphic Intelligence with Hard Copy Readout," AIEE Trans. Comm. Elect., vol. 81, pt. 1 (61) (July 1962), pp. 176-178.

55. "Eye for Computers Is Quick as a Wink," Business Week (Sept. 19, 1964), p. 80.

56. Stein, Edward S., et al., Factors Influencing the Design of Original-Document Scanners for Input to Computers, National Bureau of Standards, August 19, 1964.

57. Stereomat, Benson-Lehner, Inc. (no longer manufactured, but some devices were delivered).

58. Carslaw, H. S., Fourier Series and Integrals (Dover, pp. 326ff.).

59. Whittaker, J. M., Interpolatory Function Theory (Cambridge, 1935).

60. Blackman, R. B., and John W. Tukey, The Measurement of Power Spectra (Dover, 1958), pp. 50-52.

61. Tukey, John W., "Estimation of Power Spectra," in Rudolph E. Langer (ed.), On Numerical Approximation (Wisconsin, 1959).

62. Blackman, R. B., and John W. Tukey, op. cit.

63. Schade, Otto, "Evaluation of Photographic Image Quality and Resolving Power," J. SMPTE, vol. 73, no. 2 (February 1964), pp. 81-119.

64. Shaw, R., "The Application of Fourier Techniques and Information Theory to the Assessment of Photographic Image Quality," Photo. Sci. Eng., vol. 7, no. 5 (Sept.-Oct. 1962), pp. 281ff.

65. Hu, M. K., "Visual Pattern Recognition by Moment Invariants," IRE Trans. Inform. Theory, IT-8, vol. 2 (February 1962), pp. 180-187; F. L. Alt, "Digital Pattern Recognition by Moments," J. Assoc. Comp. Mach., vol. 9, no. 2 (April 1962), pp. 240-258; M. K. Hu, Theory of Adaptive Mechanisms, Syracuse University (December 1963), pp. 16-65.

66. Courant, R., and D. Hilbert, Methods of Mathematical Physics, vol. I and II (Interscience, 1962), esp. Chapter V, and Appendix to v. II.

67. Morrey, Charles B., Jr., Multiple Integral Problems in the Calculus of Variations and Related Topics (U.C. Press, 1943).

68. ADI-NBS, Symposium on Statistical Association Techniques (Washington, D.C., March 1964).

69. Borko, H., and M. D. Bernick, "Automatic Document Classification," J. Assoc. Comp. Mach.

70. O'Conner-Schultz, "Scan Column Index."

71. Prywes, Multi-List Processing (U. of Pennsylvania, 1963).

72. Hillman, Donald J., Study of Theories and Models of Information Storage and Retrieval (Lehigh, 1963).
