
LEXICAL CONVENTION

In the document UNIX™ SYSTEM V . . (pages 128-135)

All awk programs are made up of lexical units called tokens. In awk there are eight token types:

1. numeric constants
2. string constants
3. keywords
4. identifiers
5. operators
6. record and field tokens
7. comments
8. separators.

Numeric Constants

A numeric constant is either a decimal constant or a floating constant. A decimal constant is a nonnull sequence of digits containing at most one decimal point, as in 12, 12., 1.2, and .12. A floating constant is a decimal constant followed by e or E, followed by an optional + or - sign, followed by a nonnull sequence of digits, as in 12e3, 1.2e3, 1.2e-3, and 1.2E+3. The maximum size and precision of a numeric constant are machine dependent.
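As a small illustration of the constant forms described above, the following sketch prints a few of them with an explicit %g format. The shell variable name out and the choice of constants are only for illustration, and LC_ALL=C is set on the assumption that some awk versions are sensitive to the locale's decimal point.

```shell
# Decimal and floating constants, printed with %g so the output format is fixed.
out=$(LC_ALL=C awk 'BEGIN { printf "%g %g %g %g %g\n", 12., .12, 12e3, 1.2e-3, 1.2E+3 }')
echo "$out"
```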

String Constants

A string constant is a sequence of zero or more characters surrounded by double quotes, as in "", "a", "ab", and "12". A double quote is put in a string by preceding it with \, as in "He said, \"Sit!\"". A newline is put in a string by using \n in its place. No other characters need to be escaped. Strings can be (almost) any length.
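The two escapes mentioned above can be tried with a short sketch like the one below. The awk program is wrapped in single quotes so the shell leaves the backslashes alone; the variable name out is hypothetical.

```shell
# \" places a double quote in the string, \n places a newline.
out=$(awk 'BEGIN { print "He said, \"Sit!\""; print "one\ntwo" }')
echo "$out"
```

The first print writes one line containing the quoted word, and the second writes two lines.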

Keywords

Strings used as keywords are shown in Figure 6-1.

Keywords

BEGIN     break     length
END       close     log
FILENAME  continue  next
FS        close     number
NF        exit      print
NR        exp       printf
OFS       for       split
ORS       getline   sprintf
OFMT      if        sqrt
RS        in        string
          index     substr
          int       while

Figure 6-1. Strings Used as Keywords

Identifiers

Identifiers in awk serve to denote variables and arrays. An identifier is a sequence of letters, digits, and underscores, beginning with a letter or an underscore. Uppercase and lowercase letters are different.
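A minimal sketch of the identifier rules, assuming the hypothetical variable names total, Total, and _tmp9: the first two differ only in case and are therefore distinct variables, and the third begins with an underscore and contains a digit.

```shell
# total and Total are different identifiers; _tmp9 is also a legal name.
out=$(awk 'BEGIN { total = 1; Total = 2; _tmp9 = 3; print total, Total, _tmp9 }')
echo "$out"    # 1 2 3
```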

Operators

awk has assignment, arithmetic, relational, and logical operators similar to those in the C programming language, and regular expression pattern matching operators similar to those in the UNIX operating system programs egrep and lex.


Assignment operators are shown in Figure 6-2.

Assignment Operators

Symbol   Usage                 Description
=        assignment
+=       plus-equals           X += Y is similar to X = X+Y
-=       minus-equals          X -= Y is similar to X = X-Y
*=       times-equals          X *= Y is similar to X = X*Y
/=       divide-equals         X /= Y is similar to X = X/Y
%=       mod-equals            X %= Y is similar to X = X%Y
++       prefix and postfix    ++X and X++ are similar to X = X+1
         increments
--       prefix and postfix    --X and X-- are similar to X = X-1
         decrements

Figure 6-2. Symbols and Descriptions for Assignment Operators
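The assignment operators can be traced with a short sketch such as the following, where the variable x and its starting value are chosen only for illustration. The awk comments give the value of x after each statement.

```shell
out=$(awk 'BEGIN {
    x = 10
    x += 5    # x is 15
    x -= 3    # x is 12
    x *= 2    # x is 24
    x /= 4    # x is 6
    x %= 4    # x is 2
    x++       # x is 3
    --x       # x is 2
    print x
}')
echo "$out"
```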

Arithmetic operators are shown in Figure 6-3.

Arithmetic Operators

Symbol     Description
+          unary and binary plus
-          unary and binary minus
*          multiplication
/          division
%          modulus
( ... )    grouping

Figure 6-3. Symbols and Descriptions for Arithmetic Operators


Relational operators are shown in Figure 6-4.

Relational Operators

Symbol   Description
<        less than
<=       less than or equal to
==       equal to
!=       not equal to
>=       greater than or equal to
>        greater than

Figure 6-4. Symbols and Descriptions for Relational Operators

Logical operators are shown in Figure 6-5.

Logical Operators

Symbol   Description
&&       and
||       or
!        not

Figure 6-5. Symbols and Descriptions for Logical Operators

Regular expression matching operators are shown in Figure 6-6.

Regular Expression Pattern Matching Operators

Symbol   Description
~        matches
!~       does not match

Figure 6-6. Symbols and Descriptions for Regular Expression Pattern Matching Operators
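A relational, a logical, and the two matching operators can be combined in patterns, as in the sketch below. The three input lines and the message strings are made up for the example.

```shell
# First rule: the first field matches the regular expression /an/.
# Second rule: the first field does not match it AND the second field exceeds 5.
out=$(printf 'apple 3\nbanana 12\ncherry 7\n' | awk '
$1 ~ /an/             { print $1, "matches" }
$1 !~ /an/ && $2 > 5  { print $1, "no match, big" }')
echo "$out"
```

Only banana satisfies the first rule and only cherry satisfies the second; apple fails both because its second field is not greater than 5.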

Record and Field Tokens

$0 is a special variable whose value is that of the current input record. $1, $2, ... are special variables whose values are those of the first field, the second field, ..., respectively, of the current input record. The keyword NF (Number of Fields) is a special variable whose value is the number of fields in the current input record. Thus $NF has, as its value, the value of the last field of the current input record. Notice that the first field of each record is numbered 1 and that the number of fields can vary from record to record. None of these variables is defined in the action associated with a BEGIN or END pattern, where there is no current input record.

The keyword NR (Number of Records) is a variable whose value is the number of input records read so far. The first input record read is numbered 1.
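The record and field variables can be observed with a sketch like this one, which assumes two made-up input records of different lengths and prints NR, NF, the first field, and the last field for each.

```shell
# One output line per input record: record number, field count, first field, last field.
out=$(printf 'a b c\nd e\n' | awk '{ print NR, NF, $1, $NF }')
echo "$out"
```

The first record has three fields and the second has two, so $NF refers to c and then to e.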


The record separator can be changed to the character c with the assignment statement RS = "c" in an action.

Field Separator

On the command line, -Ft makes tab the field separator.

If the field separator is not a blank, then there is a field in the record on each side of the separator. For instance, if the field separator is 1, the record 1XXX1 has three fields. The first and last are null. If the field separator is blank, then fields are separated by white space, and none of the NF fields are null.
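The difference between a non-blank and a blank separator can be checked with the sketch below. It assumes a colon as the non-blank separator (set with -F:) and made-up input records; the square brackets are added only to make null fields visible.

```shell
# Non-blank separator: a field exists on each side of every separator,
# so the first and last fields here are null.
out=$(printf ':XXX:\n' | awk -F: '{ print NF, "[" $1 "]", "[" $2 "]", "[" $3 "]" }')
echo "$out"      # 3 [] [XXX] []

# Blank (default) separator: runs of white space separate fields,
# and leading or trailing white space produces no null fields.
blank=$(printf '  a   b  \n' | awk '{ print NF, "[" $1 "]", "[" $2 "]" }')
echo "$blank"    # 2 [a] [b]
```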

Multiline Records

The assignment RS = "" makes an empty line the record separator and makes a nonnull sequence (consisting of blanks, tabs, and possibly a newline) the field separator. With this setting, none of the first NF fields of any record are null.
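A minimal sketch of multiline records, assuming two made-up address entries separated by an empty line. With RS set to the empty string, each entry is one record, and the newline inside an entry acts as a field separator along with blanks.

```shell
# Each blank-line-separated block is one record; fields span its lines.
out=$(printf 'Ann\n12 Elm St\n\nBob\n9 Oak Ave\n' |
      awk 'BEGIN { RS = "" } { print NR ": " $1 " has " NF " fields" }')
echo "$out"
```

Each record here contains a name line and a three-word address line, giving four fields per record.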

Output Record and Field Separators

The value of OFS (Output Field Separator) is the output field separator. It is put between fields by print. The value of ORS (Output Record Separator) is put after each record by print. Initially ORS is set to a newline and OFS to a space. These values may be changed to any string by assignments such as ORS = "abc" and OFS = "xyz".
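The effect of the two output separators can be seen in a sketch like the following, where the separator strings "-" and ";" and the input records are chosen only for illustration.

```shell
# OFS goes between the comma-separated items of print; ORS goes after each record.
out=$(printf 'a b\nc d\n' | awk 'BEGIN { OFS = "-"; ORS = ";" } { print $1, $2 }')
echo "$out"    # a-b;c-d;
```

Because ORS no longer contains a newline, both records appear on a single output line.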

Comments

A comment is introduced by a # and terminated by a newline. For example:

# part of the line is a comment

A comment can be appended to the end of any line of an awk program.

Separators and Brackets

Tokens in awk are usually separated by nonnull sequences of blanks, tabs, and newlines, or by other punctuation symbols such as commas and semicolons. Braces { ... } surround actions, slashes / ... / surround regular expression patterns, and double quotes " ... " surround strings.
