
Communication Between Computer and User in Information Searching

In the document RETRIEVAL INFORMATION (pages 84-93)


11. Communication Between Computer and User in Information Searching

THIS IS A REPORT of an educational experiment at the University of Pittsburgh Computing Center. I believe it may offer several suggestions to large organizations which have a centralized large-scale computing center, on how to take advantage of the availability of such a facility.

We have a tape-oriented IBM 7070/1401 system at the center. The tape-oriented configuration determines the flow of activity, as shown in Exhibit 1.

The IBM 1401 computer is used only to transfer information from cards to magnetic tape and, at the completion of the task, from tape to printer. Not even editing is done on the 1401. All computation is carried out on the IBM 7070 system proper. Such a system provides very high efficiency of operation since the 7070, which serves as the central processor, is never "waiting" to read cards or to print results.

Such an operation is typical of most large computing centers. The question that immediately arises is, how does the user communicate with such a system? This problem is solved in general by the use of a monitor system computer program. We refer to our monitor, somewhat facetiously, as PEST, Pitt Executive System for Tapes. It controls the sequencing of programs, one after the other, automatically, allowing a mixture of student homework, graduate theses, faculty research, and production-type runs without interruption. Some of the runs use a compiler language

Exhibit 1. Flow of Data in Tape Oriented Configuration

[Diagram labels: data on IBM cards; printer or punch; IBM 1401 computer; 4 switchable magnetic tape drives; 6 magnetic tape drives; IBM 7070 computer]

like Fortran, some are in machine code or an assembly language, and some are just the execution of a program in our computer program library. Approximately 150 different runs are executed each day.

One expects a scientist or an engineer to adapt to such a technique of operation. But we also wanted our computer to be used by schools other than the natural sciences. We wanted to allow the professor of education to be able to analyze the vocabulary of first grade readers, the student of Middle English to use computer aids, and the lawyer to make similar use of a computer.

It was in this latter area that our programming received the greatest impetus. We have transferred to magnetic tape over 10,000,000 words of statutory materials from selected states. This material was prepared by Mr. John Horty, Director of the Health Law Center at the university. Free or natural text was typed using Friden Flexowriters to prepare paper tape, which was then converted to punched cards. There are about 10 words on each card. There are no deletions. The sentences average approximately 18 words in length.

All material was punched as complete sentences. Each statutory section was treated as

CREAM, BUTTERMILK, AND ALL OTHER FLUID DERIVATIVES OF MILK.

"MILK PRODUCTS" MEANS ICE CREAM, ICE CREAM MIX, CUSTARD ICE CREAM, FRENCH ICE CREAM, FROZEN CUSTARD, AND OTHER SIMILAR FROZEN PRODUCTS, AND ALL DAIRY PRODUCTS USED IN THE MANUFACTURE THEREOF.

"CANNED MILK" MEANS CONDENSED, EVAPORATED OR CONCENTRATED MILK IN HERMETICALLY SEALED CONTAINERS OR FOR MANUFACTURING PURPOSES.

"CERTIFIED MILK" MEANS MILK FROM DAIRY FARMS OPERATED IN ACCORDANCE WITH THE "METHODS AND STANDARDS FOR THE PRODUCTION AND DISTRIBUTION OF "CERTIFIED MILK" LAST ADOPTED BY THE AMERICAN ASSOCIATION OF MEDICAL MILK COMMISSIONS INCORPORATED", AND THE PRODUCTION AND HANDLING OF WHICH SHALL BE CERTIFIED TO BY A COMMISSION INSTITUTED IN COMPLIANCE THEREWITH.

"SECRETARY" MEANS THE SECRETARY OF AGRICULTURE OF THIS COMMONWEALTH, OR HIS AUTHORIZED REPRESENTATIVE.

"PERSON" INCLUDES SINGULAR AND PLURAL, MASCULINE AND FEMININE, AND ANY INDIVIDUAL, FIRM, COPARTNERSHIP, INSTITUTION, ASSOCIATION, OR CORPORATION THEREOF.

Exhibit 2 Sample Input Text

Exhibit 3 Vocabulary

ACCESSIBLE 28.3.024,9.18

ACCOMMODATIONS 112.3.014,5.13

ACCOMPANIED 1.3.043,14.26 1.3.047,16.18 53.3.009,5.5 78.3.024,8.5 78.3.031,10.7 79.3.023,7.5 79.3.030,9.9

ACCORDANCE 1.3.015,7.10 2.3.005,3.29 4.3.007,3.46 9.3.010,4.52 11.3.007,4.22 17.3.010,5.12 71.3.007,3.48 102.3.033,14.35 118.3.006,4.24

ACCORDING 61.3.014,5.70 64.3.011,4.49 64.3.026,6.75 69.3.015,3.134 79.3.008,3.62 91.3.011,5.24 91.3.016,6.56

77.3.021,6.5 9.3.006,4.21 88.3.080,19.23 65.3.011,4.13

a separate document, and the documents were assigned numerically increasing document numbers. The documents average 200 words in length.
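The vocabulary listing of Exhibit 3 is in effect a concordance: each word is followed by the places where it occurs. As a modern illustration only, the following Python sketch builds such a listing from card-image lines. It is not the center's program, which ran on the IBM 7070. The function name `build_concordance`, the rule that a token ending in a period closes a sentence, and the omission of the constant middle field "3" of the Exhibit 3 references (whose meaning the text does not state) are all assumptions.

```python
import re
from collections import defaultdict


def build_concordance(documents):
    """Map each word to a list of (document, line, sentence, word) positions.

    `documents` maps a document number to its text lines (one per card).
    Lines and sentences are counted within the document, and words are
    counted within their sentence. All numbering starts at 1.
    """
    index = defaultdict(list)
    for doc_no in sorted(documents):
        sentence_no, word_no = 1, 0
        for line_no, line in enumerate(documents[doc_no], start=1):
            for token in line.split():
                # Keep letters only, so that '"MILK' and 'MILK,' both index as MILK.
                word = re.sub(r"[^A-Z]", "", token.upper())
                if word:
                    word_no += 1
                    index[word].append((doc_no, line_no, sentence_no, word_no))
                # Assumption: a trailing period (ignoring quotes) ends the sentence.
                if token.strip('"').endswith("."):
                    sentence_no += 1
                    word_no = 0
    return dict(index)


# Hypothetical two-sentence document spread over three cards.
sample = {1: ["MILK MEANS CREAM.", "SECRETARY MEANS THE", "SECRETARY OF AGRICULTURE."]}
concordance = build_concordance(sample)
```

Here `concordance["SECRETARY"]` is `[(1, 2, 2, 1), (1, 3, 2, 4)]`: document 1, lines 2 and 3, sentence 2, words 1 and 4. Abbreviations with internal periods would need a better sentence rule than this sketch uses.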

Since we knew that the user might wish to identify a word within a sentence within a document, we assigned a numerical identification to each word. Exhibit 2 shows the text of a typical document as stored on magnetic tape. This document was entered as document number 1. Exhibit 3 shows that the word ACCORDANCE appears in document number 1, line 15, sentence 7, word 10, matching the reference in Exhibit 2. A computer program to prepare a vocabulary tape containing such a concordance listing was written by the center. Thus, for each of approximately 20 states there is a text tape and a vocabulary tape.

How does an English teacher, lawyer, or professor of education communicate with the computer to use such text data? Suppose he wants to find the documents dealing with "food poisoning". Does he have to learn the intricacies of computer coding?

The usual answer to this is that he has to depend on some expert. He may hire a computer programmer, or he may even go further and seek the help of a professional library scientist skilled in all the devices of indexing.

In the spirit of a university atmosphere, we felt there should be another answer for him.

He should be given aids so that he could communicate with the computer directly himself. In this way he would be able to do his own study and not be dependent on "experts".

Exhibit 4

For this reason, a User Oriented Search Language was developed. One example will serve to illustrate its use. Suppose I were interested in the regulations concerning food poisoning. Several concepts come to mind. These can be identified by actual words that appear in the texts.

AA The concept of food or fish or meat or seafood, represented when any one of these words appears in a document.

BB The concept of poison or poisonous.

EE The concept of spoil or decayed.

GG The concept of a hotel or restaurant.

JJ The concept of regulations or penalties or suspension (of a license).

MM The combination of the concept labeled AA, above, and the concept labeled EE, above, in the same document would ensure that we were interested in spoilage of food. To provide this, we require that a document to be retrieved contain at least one word in group AA and at least one word in group EE. We let MM serve as a label for this group of documents. Now if there were fewer than, say, 10 documents dealing with the concept MM of spoiled food, we would like to print them out. If there were more than 10 such documents, we might wish to insist that there be a reference to concept CC (sickness) in the same document, and we would like a citation to these.

Exhibit 4 shows how such a search request is prepared by a user - namely, the lawyer, himself.

+ is used for "or".

D is used for "and in the same document".

IF D+0+10+6 means, "if the number of documents in the list collected up to this point is between 0 and 10, then skip 6 lines to the line which has the word PRINT and execute that command".
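The semantics of these operators can be sketched in a few lines of Python. This is only an illustration of the logic described above, not the card syntax of Exhibit 4 and not the center's program. The function names, the word lists chosen for groups EE and CC, the toy vocabulary, and the reading of "between 0 and 10" as inclusive are all assumptions.

```python
def any_of(vocabulary, words):
    """'+' (or): documents that contain at least one of the given words."""
    docs = set()
    for word in words:
        docs |= vocabulary.get(word, set())
    return docs


def same_document(left, right):
    """'D' (and in the same document): documents present in both lists."""
    return left & right


def food_spoilage_search(vocabulary, limit=10):
    """Print a short MM list; otherwise narrow by CC and return citations."""
    aa = any_of(vocabulary, ["FOOD", "FISH", "MEAT", "SEAFOOD"])
    ee = any_of(vocabulary, ["SPOIL", "DECAYED"])   # assumed word list for EE
    mm = same_document(aa, ee)
    # Counterpart of the IF card: a short enough list goes straight to PRINT.
    if 0 <= len(mm) <= limit:
        return "PRINT", sorted(mm)
    cc = any_of(vocabulary, ["SICKNESS"])           # assumed word list for CC
    return "CITE", sorted(same_document(mm, cc))


# Hypothetical vocabulary: word -> set of document numbers containing it.
toy_vocabulary = {
    "FOOD": {1, 2, 3},
    "MEAT": {4},
    "SPOIL": {2, 4},
    "DECAYED": {3},
    "SICKNESS": {3},
}
```

With the default limit, `food_spoilage_search(toy_vocabulary)` returns `("PRINT", [2, 3, 4])`. With `limit=2` the MM list is too long, so the search is narrowed by CC and returns `("CITE", [3])`.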

Cards are punched just as shown in Exhibit 4, one card per line. A header card identifying the program to be used is also added, and this small batch of 35 cards is fed into the system shown in the flow chart in Exhibit 1. Preceding this problem in the card input stack may be an engineering problem, and following it may be a business school inventory model study.

The output is shown in Exhibits 5a through 5e.

A few comments will complete our report:

1. Notice that the IF statement provides a means for the user to control his answers, conditionally dependent on some intermediate results. This is a necessary and really a key factor in communication between the user and the computer. It avoids the necessity of his returning many different times, dissatisfied either with getting too many irrelevant documents or with having restricted his search by the excessive use of the D or "and" command and getting no output.

2. Because the search language is trivial to learn, every user may phrase his own search. His satisfaction is infinitely higher. Personal participation means much to him compared with dependency on an "expert".

3. His learning curve on his own work goes up very fast. He is quick to perceive his own successes and failures. He comes to understand his own use of language and the language of his subject much more quickly.

Many other computer programs for analysis of natural text have been developed by our center. Some of these are described in the September 1961 issue of the Communications of the Association for Computing Machinery. But the one idea I feel is an original contribution to the area of information retrieval is the introduction of a User Oriented Search Language and the communication link it provides between the user and the computer.

The programs described in this paper were written by Charles R. T. Bacon of the computing center staff, and are available through the GUIDE library by contacting the IBM Program Librarian, IBM, White Plains, New York. Mr. Bacon's work has been a significant contribution to the use of our center in new areas beyond the usual scientific applications.

The legal retrieval applications research under the direction of Mr. John Horty is supported by the Council on Library Resources, the National Institutes of Health, the U. S. Office of Education, and the Ford Foundation.

Exhibits 5a through 5e. Output of the Search Request shown in Exhibit 4

By H. P. Luhn Consultant
