
In the document Contents of Volume 2 datopro (pages 169-172)


"External program" refers to those units that are connected on-line and can be instructed to perform field selection or editing and formatting. "Internal program" refers to those units that include processing logic (often a minicomputer) within the system defined. A less flexible method is to use plugboards to control functions.

Recognition

There is not too much to say about recognition of marks by a mark reader. Typically, they are diagonal slashes made in a preprinted box or outline. Care must be taken when erasing because if the paper is roughened too much, it will have a low reflectance and make the reader conclude that the roughened area is a mark.

Bar codes are well illustrated by one of the accompanying figures (page 70D-010-78i).

There are many specialized fonts that are used today in addition to traditional printing type styles. A few of the more sophisticated readers recognize printing type styles in addition to or in place of the OCR fonts.

The more commonly used OCR fonts include:

• OCR A - A group of three fonts having similar shapes but different sizes and proportions. Developed by the American National Standards Institute, these are now the most widely used OCR fonts. For perspective, the sizes are roughly equivalent to 10-point type (Size I), which is the size of the printing you are now reading, 14-point type (Size III), and 16-point type (Size IV; also called OCR C).

Size I has a pitch of about 10 characters per inch and is typically used on typewriters and computer printers. Size III also has a pitch of about 10 characters per inch and is typically used on cash registers and accounting machines. Size IV has a pitch of about 7 characters per inch and is typically used on embossed plastic cards (credit cards) and metal plates.

The newest revision of the ANSI standard (proposed) includes the definition of a lower case alphabetic group for Size I, alternate shapes for some punctuation marks, and expanded specifications for spectral bands, paper and print characteristics, and positioning.

• OCR B - Popular in Europe, this font is now under consideration for standardization in the United States. It differs considerably from the ANS standard and is characterized by being closer to conventional type faces than the ANS fonts. Exponents of the OCR B font are concerned about readability by people, even though it is more difficult to build the machine recognition logic to handle it. This font is also called ISO B, and occasionally ECMA-11. There are also three sizes defined for OCR B: I, III, and IV. These are essentially equivalent in size to OCR A, except that OCR B Size IV characters are a little shorter and broader than OCR A Size IV. OCR B Sizes I and IV include numerics, upper and lower case alphabetics, punctuation marks, special symbols, and several foreign language symbols. Size III includes numerics, 7 upper case alphabetics, and 3 special symbols.

• Farrington 7B, 12F, and 12L - Another popular group of OCR fonts, due to Farrington's early appearance on the OCR scene. The three codes have somewhat similar shapes but differ in size and character set. The 7B and 12F are numeric fonts, while the 12L is alphabetic only. The 7B is much larger than the 12F/L. Imprinting is normally done by a special typewriter or a credit card embosser.

• IBM 1428 - This is an alphanumeric font associated with the IBM 1428 Optical Reader and imprinted by an IBM 1403 Line Printer or IBM Selectric Typewriter.

• IBM 407 - A font produced by the widely used IBM 407 Accounting Machine.

• NCR NOF (Numeric Optical Font) - This is a numeric font, usually imprinted by an adding machine or cash register. It is widely used in retail applications.


70D-010-78p Peripherals

All About Optical Readers

• E-13B - This is not properly an OCR font. It was developed by and for banks prior to the development of OCR. It is a highly stylized numeric font intended for printing in magnetic ink to facilitate the sorting and processing of bank checks. It has not caught on anywhere else, but most banks use it. Some optical readers can read this font, which enhances their suitability for banks converting to OCR.

• Handprint - If the ANSI standard that has been in the proposed stage for over a year finally jells, there will be an official handprint font. It may include numeric and alphabetic symbols, punctuation marks, deletion provisions, symbols oriented toward programming, and special provisions for international usage.

All fonts, both numeric and alphanumeric, usually include a few special symbols for control purposes. The OCR A and B fonts include a full array of punctuation symbols as well. Some of these fonts are illustrated in the accompanying figures.

The scanning technique is the method for optically converting the printed images to electrical signals. Some sort of photosensitive device, photocell or phototransistor, is used to sense the light reflected from the document. For bar-code and character readers, additional components are required to scan portions of the code or character in proper order so that the features, and thus the character, can be identified. The scanning components can be an array of photo devices, a mechanical disc, or a flying spot (CRT). The photo-device array is used by most bar-code readers. Size normally interferes with using it for scanning characters, but note that REI does quite nicely with it.

The mechanical disc technique employs a rotating disc with a slit in it to project a beam of light over the character in a predetermined order. The flying spot scanner uses an electron beam that is moved within a CRT to generate a spot of light, thus providing, potentially anyway, a much faster scan rate. The flying spot scanner is very adaptable to reading multiple lines, while the mechanical disc scanning technique requires either an incremental document transport or an elaborate system of mirrors to scan multiple lines.

Once the printed image has been translated into electrical signals, the recognition logic interprets these signals as a particular character.

Three principal recognition techniques are used for character interpretation: matrix matching, stroke analysis, and curve tracing. Matrix matching involves comparing the matrix of signals caused by the reflecting (paper) and non-reflecting (printed character) portions of the character with a set of signals for each character until a match is found. Typically, the closest match within prescribed limits is identified, because a perfect set of signals is most unusual due to variations in print and paper quality. The stroke analysis method is somewhat similar on a much simpler basis; readers employing this technique usually are reading a highly stylized font specifically designed for the technique. Curve tracing logic actually traces out the outline of the character to derive a set of signals for analysis. The curve tracing technique is adaptable to variations in character size and orientation, making it a good choice for interpreting hand-printed characters. However, breaks in the printed character tend to affect this method more than the matrix matching method.
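The matrix matching idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the logic of any particular reader: the 5-by-3 reference patterns, the names TEMPLATES, mismatches, and recognize, and the limit of two mismatched cells are all hypothetical.

```python
# Hypothetical 5x3 reference patterns: '#' is ink (non-reflecting),
# '.' is paper (reflecting). Real readers use far finer matrices.
TEMPLATES = {
    "0": ["###", "#.#", "#.#", "#.#", "###"],
    "1": [".#.", "##.", ".#.", ".#.", "###"],
    "7": ["###", "..#", ".#.", ".#.", ".#."],
}


def mismatches(scanned, template):
    """Count the cells where the scanned matrix differs from a template."""
    return sum(
        s != t
        for s_row, t_row in zip(scanned, template)
        for s, t in zip(s_row, t_row)
    )


def recognize(scanned, max_mismatches=2):
    """Return the closest template's character, or None (reject) if even
    the closest match falls outside the prescribed limit."""
    best_char, best_score = min(
        ((ch, mismatches(scanned, tpl)) for ch, tpl in TEMPLATES.items()),
        key=lambda pair: pair[1],
    )
    return best_char if best_score <= max_mismatches else None


# A "7" with one cell lost to a break in the print still matches.
broken_seven = ["###", "..#", ".#.", "...", ".#."]
print(recognize(broken_seven))
```

A solid block of ink, such as a smudge, differs from every template by more than the limit, so recognize returns None and the character would be treated as unreadable.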

Output

The purpose of the optical reader is to generate data in a computer-readable form. Magnetic tape, punched cards, and punched tape are conventional computer-readable forms. De facto standards, now official, have been established by IBM for magnetic tape and punched cards. Teletype has done essentially the same for punched tape.

Exceptions that are relatively new on the market are magnetic tape cassettes and 96-column cards, but neither of these has had much impact on the optical reader market, though they unquestionably will in the future.

Communications lines enable the transmission of data from remote locations to a central site for processing. Optical reading terminals are principally mark and bar-code readers, but a few character readers are available for use as remote terminals.

On-line optical readers are usually produced by a computer manufacturer for a particular line of computers, but a few independents also sell to this market. Frequently, the independents will design an interface for your computer, but it may cost you extra.

Performance

Readers that handle one size of documents are easy to rate for performance because it is predictable. In a similar manner, a journal tape reader typically transports the tape at a fixed rate, with predictable performance. Readers that handle different sizes of documents and variable-size data fields are not as conducive to having their performance stated in simple terms.

Three ways of measuring the performance of optical readers are documents per minute, lines per minute, and characters per second. The documents-per-minute rating is usually most applicable to mark and bar-code readers, as well as to character readers that read only one or two lines. The lines-per-minute rating is usually most meaningful for journal tape readers. The instantaneous character scanning rate in characters per second is probably the most meaningful single measure for character readers that read whole pages of text.


© 1974 DATAPRO RESEARCH CORPORATION, DELRAN, N.J. 08075 REPRODUCTION PROHIBITED

MARCH 1974


Careful evaluation of timing information, which often becomes quite complex, is necessary to accurately predict the performance of the more sophisticated character readers. The size of the document, the amount and location of data on the document, and processing of the data read can all affect the rate at which documents proceed through the reader. On-line units can be affected by other activities of the computer, if running in a multiprogramming environment, or by poor programming of input/output functions.

Error Control

Discussion of errors and controls always leads to dissension no matter what area of data processing is being discussed. Advertised (and verified) reject rates of 0.25 to 0.5 percent cause some users to become startled and disillusioned when they see 30 or 50 percent of the documents going into the reject pocket.
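One plausible way to reconcile the two figures, offered here as an assumption rather than something the report states, is that the advertised rate applies to each character while the user counts whole documents. If every character is rejected independently at that rate, and one rejected character sends the whole document to the reject pocket, the arithmetic can be sketched as follows. The function name and the document lengths of 100 and 150 characters are hypothetical.

```python
def document_reject_rate(char_reject_rate, chars_per_document):
    """Probability that at least one character on a document is rejected.

    Assumes each character is rejected independently and that a single
    rejected character rejects the whole document (both are assumptions).
    """
    return 1.0 - (1.0 - char_reject_rate) ** chars_per_document


# Advertised per-character rates of 0.25 and 0.5 percent.
for rate in (0.0025, 0.005):
    for n_chars in (100, 150):
        share = document_reject_rate(rate, n_chars)
        print(f"{rate:.2%} per character, {n_chars} characters: "
              f"{share:.0%} of documents rejected")
```

Under these assumptions a 0.5 percent character rate on a 100-character document rejects about 39 percent of documents, and a 0.25 percent rate on 150 characters rejects about 31 percent, which is in line with the 30 to 50 percent that startles users.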

There are three principal types of errors of concern to users of optical character readers: ambiguous characters, invalid data, and documents in poor condition.

Ambiguous characters are those for which the reader cannot make a decision about what character each should be. There can be many reasons. Typical ones include broken or poorly formed characters and dirt or other marks that are picked up by the reader. Handling of this situation varies with the reader and with programming.

Many readers automatically rescan an ambiguous character. Some substitute a standard character for all unreadable characters and continue. Others display the character on a CRT screen for operator determination; sometimes adjacent data is also displayed to give the operator more context for making the decision. Printing quality and paper quality can drastically affect the incidence of this type of error.

OCR users quickly learned that the inclusion of checks in the data was extremely useful for ensuring the maintenance of an adequate throughput level by reducing the number of rejected documents. This technique most frequently takes the form of repeated data fields, particularly for numeric entries. The technique is applicable only if the data can be processed and the actions of the reader controlled on the basis of the result.
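How a repeated data field might rescue a document can be sketched as below. This is a hypothetical illustration, not a description of any reader's logic: it assumes the reader substitutes "?" for each unreadable character and that the same field is printed twice on the document. The names UNREADABLE and reconcile are invented for the example.

```python
UNREADABLE = "?"  # hypothetical substitute for an unreadable character


def reconcile(first, second):
    """Merge two reads of the same field.

    Returns the merged field, or None when the reads cannot be
    reconciled and the document should be rejected.
    """
    if len(first) != len(second):
        return None
    merged = []
    for a, b in zip(first, second):
        if a == b and a != UNREADABLE:
            merged.append(a)   # both reads agree
        elif a == UNREADABLE and b != UNREADABLE:
            merged.append(b)   # second copy fills the gap
        elif b == UNREADABLE and a != UNREADABLE:
            merged.append(a)   # first copy fills the gap
        else:
            return None        # both unreadable, or the reads disagree
    return "".join(merged)


print(reconcile("12?45", "1234?"))  # each copy supplies what the other lacks
```

A document is rejected only when both copies fail at the same position or the two reads contradict each other, which is why repetition reduces the number of rejects.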

Another commonly employed check is the check digit. The digits of a numeric field are manipulated according to one of several standard formulas to generate a check digit. This digit is included in the input. The reader or associated processor generates another check digit while reading and compares it to the one read in. Failure of this check normally causes the document to be rejected.
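The check-digit comparison can be sketched as follows. The report does not say which formulas it has in mind; the Luhn (modulus 10) formula is used here only as one well-known example, and the function names are hypothetical.

```python
def luhn_check_digit(payload):
    """Compute the Luhn (modulus 10) check digit for a string of digits."""
    total = 0
    # Walk right to left, doubling every second digit starting with the
    # rightmost; a doubled value above 9 has its two digits added together.
    for position, char in enumerate(reversed(payload)):
        digit = int(char)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return str((10 - total % 10) % 10)


def accept_field(field_as_read):
    """Recompute the check digit from the digits read and compare it with
    the check digit read from the document (the last character)."""
    payload, check = field_as_read[:-1], field_as_read[-1]
    return payload.isdigit() and luhn_check_digit(payload) == check


print(accept_field("79927398713"))  # True: check digit agrees
print(accept_field("79927398813"))  # False: one digit was misread
```

A single misread digit changes the recomputed check digit, so the comparison fails and the document would go to the reject pocket.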

Documents that have extraneous items on them (such as stamps) or that have been badly mutilated can cause misfeeding and/or jams. The typical character reader is far less susceptible to this kind of jam than the average card reader, but people have been conditioned not to fold, spindle, or mutilate punched cards.

Pricing, First Delivery, and Number Installed

The prices in the comparison charts were furnished by the manufacturers in November and December 1973. Where a range is shown, the variations indicate the cost of optional features. Where only one price is shown, it is generally a base price for a working but not deluxe system. If a price is not shown, NA means the manufacturer declined to publish the price for that type of acquisition (typically, rentals that are negotiated).

First delivery indicates the date of the first successful installation of that model, or the anticipated date of first installation.

The number installed to date shows the number in use as of November or December 1973. When a manufacturer declined to provide this information, NA is inserted.

The Control Data 955 Page and Document Reading System, the top of the Control Data line, is one of the commercially successful optical character readers. Accompanying the reader at left are a magnetic tape transport, a magnetic tape controller, a teletypewriter, and the SC 1700 computer system controller.
