Effective and Efficient Shell Programming

In the document The Chapter (pages 41-45)

7.13 Supporting Commands and Features

7.14 Effective and Efficient Shell Programming

This section outlines strategies for writing efficient shell procedures, ones that do not waste resources in accomplishing their purposes. The primary reason for choosing a shell procedure to perform a specific function is to achieve a desired result at a minimum human cost. Emphasis should always be placed on simplicity, clarity, and readability, but efficiency can also be gained through awareness of a few design strategies. In many cases, an effective redesign of an existing procedure improves its efficiency by reducing its size, and often increases its comprehensibility. In any case, you should not worry about optimizing shell procedures unless they are intolerably slow or are known to consume an inordinate amount of a system's resources.

The same kind of iteration cycle should be applied to shell procedures as to other programs: write code, measure it, and optimize only the few important parts. The user should become familiar with the time command, which can be used to measure both entire procedures and parts thereof. Its use is strongly recommended; human intuition is notoriously unreliable when used to estimate timings of programs, even when the style of programming is a familiar one.

Each timing test should be run several times, because the results are easily disturbed by variations in system load.
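As a rough illustration of repeated measurement, the sketch below runs one command several times under time. The helper name time_runs and the choice of three runs are assumptions made for this example. Note also that time is a shell keyword in some shells and a separate utility on other systems, so the format of its report varies.

```shell
#!/bin/sh
# time_runs N command [args...]
# Hypothetical helper (not from the original text): run a command N times
# under `time` so that run-to-run variation caused by system load shows up.
time_runs() {
    runs=$1
    shift
    i=0
    while [ "$i" -lt "$runs" ]
    do
        i=$((i + 1))
        echo "run $i:" >&2
        # The timing report goes to standard error; the command's own
        # standard output is discarded.
        time "$@" > /dev/null
    done
}

time_runs 3 ls /
```

Comparing the reports from the individual runs gives a feel for how much of a measured difference is noise.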

7.14.1 Number of Processes Generated

When large numbers of short commands are executed, the actual execution time of the commands may well be dominated by the overhead of creating processes. The procedures that incur significant amounts of such overhead are those that perform much looping and those that generate command sequences to be interpreted by another shell.

If you are worried about efficiency, it is important to know which commands are currently built into the shell, and which are not. Here is the alphabetical list of those that are built in:

    break     case      cd        continue  eval
    exec      exit      export    for       if
    newgrp    read      readonly  set       shift
    test      times     trap      umask     until
    wait      while     {}

Parentheses, (), are built into the shell, but commands enclosed within them are executed as a child process, i.e., the shell does a fork, but no exec. Any command not in the above list requires both fork and exec.
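The difference between the two grouping forms can be observed directly. The following minimal sketch uses a variable name, count, chosen only for illustration: an assignment made inside braces happens in the current shell, while one made inside parentheses happens in the child and is lost when the child exits.

```shell
#!/bin/sh
# Braces group commands in the current shell: the assignment persists.
count=1
{ count=2; }
echo "after braces: $count"

# Parentheses run the group in a child process: the assignment is made
# in the child only and is gone once the child exits.
( count=3 )
echo "after parentheses: $count"
```

Both echo lines report the value 2, because the assignment of 3 never reaches the parent shell.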

The user should always have at least a vague idea of the number of processes generated by a shell procedure. In the bulk of observed procedures, the number of processes created (not necessarily simultaneously) can be described by:

    processes = (k * n) + c

where k and c are constants, and n may be the number of procedure arguments, the number of lines in some input file, the number of entries in some directory, or some other obvious quantity. Efficiency improvements are most commonly gained by reducing the value of k, sometimes to zero.

Any procedure whose complexity measure includes n^2 terms or higher powers of n is likely to be intolerably expensive.
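To make the idea of reducing k concrete, here is a small sketch with two hypothetical functions, names_slow and names_fast, that print the last component of each pathname argument. It assumes a POSIX shell, where the ${f##*/} expansion and printf are handled by the shell itself; neither appears in the list of built-ins above, so treat this as an illustration for current shells rather than the shell this chapter describes.

```shell
#!/bin/sh
# Version 1: one basename process per argument,
# so processes = (1 * n) + 0.
names_slow() {
    for f in "$@"
    do
        basename "$f"
    done
}

# Version 2: the shell strips everything up to the last slash itself,
# so no process is created per argument (k = 0).
names_fast() {
    for f in "$@"
    do
        printf '%s\n' "${f##*/}"
    done
}

names_slow /usr/bin/sort /bin/ls
names_fast /usr/bin/sort /bin/ls
```

The two versions agree for ordinary pathnames like these; they differ for arguments that end in a slash, which basename handles and the expansion does not.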

As an example, here is an analysis of a procedure named split, whose text is given below:

    #   split
    trap 'rm temp$$; trap 0; exit' 0 1 2 3 15
    start1=0 start2=0
    b='[A-Za-z]'
    cat > temp$$
                    # read stdin into temp file
                    # save original lengths of $1, $2
    if test -s "$1"
    then start1=`wc -l < $1`
    fi
    if test -s "$2"
    then start2=`wc -l < $2`
    fi
    grep "$b" temp$$ >> $1
                    # lines with letters onto $1
    grep -v "$b" temp$$ | grep '[0-9]' >> $2
                    # lines with only numbers onto $2
    total="`wc -l < temp$$`"
    end1="`wc -l < $1`"
    end2="`wc -l < $2`"
    lost="`expr $total - \($end1 - $start1\) \
            - \($end2 - $start2\)`"
    echo "$total read, $lost thrown away"

For each iteration of the loop, there is one expr plus either an echo or another expr. One additional echo is executed at the end. If n is the number of lines of input, the number of processes is 2*n + 1.

Some types of procedures should not be written using the shell. For example, if one or more processes are generated for each character in some file, it is a good indication that the procedure should be rewritten in C. Shell procedures should not be used to scan or build files a character at a time.

7.14.2 Number of Data Bytes Accessed

It is worthwhile considering any action that reduces the number of bytes read or written. This may be important for those procedures whose time is spent passing data around among a few processes, rather than in creating large numbers of short processes. Some filters shrink their output, others usually increase it. It always pays to put the shrinkers first when the order is irrelevant. For instance, the second of the following examples is likely to be faster because the input to sort will be much smaller:

    sort file | grep pattern
    grep pattern file | sort
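The two orderings can be checked against each other on a small file. In this sketch the file name, its contents, and the pattern are all made up for illustration; the point is only that both pipelines produce the same lines, while the second hands sort far less input.

```shell
#!/bin/sh
# Build a small sample file (hypothetical data for illustration).
tmp=${TMPDIR:-/tmp}/shrink.$$
printf '%s\n' pear apple fig apricot plum > "$tmp"

# Sort everything, then discard most of it.
sort_first=$(sort "$tmp" | grep '^a')

# Discard first, so that sort sees only the matching lines.
grep_first=$(grep '^a' "$tmp" | sort)

echo "$grep_first"
rm -f "$tmp"
```

On a five-line file the difference in running time is not measurable; the saving grows with the fraction of input that the filter throws away.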

7.14.3 Shortening Directory Searches

Directory searching can consume a great deal of time, especially in those applications that utilize deep directory structures and long pathnames.

Judicious use of cd, the change directory command, can help shorten long pathnames and thus reduce the number of directory searches needed. As an exercise, try the following commands:

    ls -l /usr/bin/* >/dev/null
    cd /usr/bin; ls -l * >/dev/null

The second command will run faster because it requires fewer directory searches.

7.14.4 Directory-Search Order and the PATH Variable

The PATH variable is a convenient mechanism for allowing organization and sharing of procedures. However, it must be used in a sensible fashion, or the result may be a great increase in system overhead.

The process of finding a command involves reading every directory included in every pathname that precedes the needed pathname in the current PATH variable. As an example, consider the effect of invoking nroff (i.e., /usr/bin/nroff) when the value of PATH is ":/bin:/usr/bin". The sequence of directories read is:

    .
    /
    /bin
    /
    /usr
    /usr/bin

This is a total of six directories. A long path list assigned to PATH can increase this number significantly.

The vast majority of command executions are of commands found in /bin and, to a somewhat lesser extent, in /usr/bin. Careless PATH setup may lead to a great deal of unnecessary searching. The following four examples are ordered from worst to best with respect to the efficiency of command searches:

    :/usr/john/bin:/usr/localbin:/bin:/usr/bin
    :/bin:/usr/john/bin:/usr/localbin:/usr/bin
    :/bin:/usr/bin:/usr/john/bin:/usr/localbin
    /bin::/usr/bin:/usr/john/bin:/usr/localbin

The first one above should be avoided. The others are acceptable and the choice among them is dictated by the rate of change in the set of commands kept in /bin and /usr/bin.

A procedure that is expensive because it invokes many short-lived commands may often be speeded up by setting the PATH variable inside the procedure so that the fewest possible directories are searched in an optimum order.
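A minimal sketch of this technique follows. The directory list /bin:/usr/bin and the loop body are assumptions chosen for illustration; a real procedure would list whichever directories actually hold the commands it invokes.

```shell
#!/bin/sh
# Restrict the search path for the duration of this procedure.
# The directory list is an assumption; adjust it to where the commands
# this procedure uses actually live on your system.
PATH=/bin:/usr/bin
export PATH

# Every command below is now looked up in at most two directories.
for f in "$@"
do
    wc -l < "$f"
done
```

Because the assignment is made inside the procedure, it affects only that procedure and the commands it starts, not the PATH of the invoking shell.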

7.14.5 Good Ways to Set Up Directories

It is wise to avoid directories that are larger than necessary. You should be aware of several special sizes. A directory that contains entries for up to 30 files (plus the required . and ..) fits in a single disk block and can be searched very efficiently. One that has up to 286 entries is still a small directory; anything larger is usually a disaster when used as a working directory. It is especially important to keep login directories small, preferably one block at most. Note that, as a rule, directories never shrink. This is very important to understand, because if your directory ever exceeds either the 30-entry or the 286-entry threshold, searches will be inefficient; furthermore, even if you delete files so that the number of files is less than either threshold, the system will still continue to treat the directory inefficiently.
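One way to keep an eye on directory size is to count entries and compare the count with the two figures quoted above. The helpers count_entries and check_dir below are hypothetical names introduced for this sketch, and the thresholds are simply the ones given in the text for the file system it describes; other file systems lay out directories differently.

```shell
#!/bin/sh
# count_entries DIR
# Hypothetical helper: print the number of entries in DIR, excluding
# . and .. (file names containing newlines would be miscounted).
count_entries() {
    total=$(ls -a "$1" | wc -l)
    echo $((total - 2))
}

# check_dir DIR
# Classify DIR against the two thresholds quoted in the text.
check_dir() {
    n=$(count_entries "$1")
    if [ "$n" -le 30 ]
    then echo "$1: $n entries, within the one-block size"
    elif [ "$n" -le 286 ]
    then echo "$1: $n entries, still small"
    else echo "$1: $n entries, larger than recommended"
    fi
}

check_dir .
```

Since the text notes that directories never shrink, a check like this is most useful before a directory grows past a threshold rather than after.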