
deposit_hagen
Publikationsserver der Universitätsbibliothek
Mathematik und Informatik

Informatik-Berichte 10 – 11/1980

Design and Analysis of an Optimal Control Algorithm for the Output of a Stochastic Automaton

Ernst-Erich Doberkat

Key Words: Stochastic automata, optimal control, linear programming, analysis of algorithms, average complexity.

CR Categories: 5.22, 5.25, 5.41

AMS Classification: 68 A 20

Abstract: Given the input-output behavior of a stochastic automaton and a map which compares output words, optimal controls are introduced, and an explicit formula for such a control is derived by means of linear programming techniques. The second part of this paper deals with the situation that an input and an output word are given and the probability with which the optimal control puts out the word is requested. An algorithm for computing this probability is presented and analysed with respect to its behavior in the worst as well as in the average case.

The latter analysis develops some new techniques, which are necessary since an uncountable variety of possible inputs has to be considered. These techniques come mainly from real analysis, in particular from measure theory.


1. Introduction

Consider a finite stochastic automaton which has to be controlled by another stochastic automaton in such a way that some objective functional is to be minimized; the problem considered here is to find the optimal controlling automaton. To obtain a more precise formulation of the problem, let F be a function which compares output words: F(w,w') is the measure for the similarity of w and w'; examples of F are furnished by various metrics which can be defined from a metric on the output alphabet. Assume that the automaton to be controlled

puts out the word w after input of the word v with probability T(v)(w), and the automaton which serves as a control puts out w' after input v with probability L(v)(w'). The deviation F(w,w') in this situation, weighted by the respective probabilities, yields

F(w,w') T(v)(w) L(v)(w')

as a kind of expectation for this situation. Hence the expected value of the deviation after input v using the control L is

r(v,L) := \sum_{w} \sum_{w'} F(w,w') \, T(v)(w) \, L(v)(w') .

A control L' is said to be better than L iff r(v,L') ≤ r(v,L) holds for every input word v. Consequently, an optimal control will minimize r(v,·) for all input words v, subject to the condition that it is the behavior of a stochastic automaton.


For the sake of unifying notions, stochastic and nondeterministic automata are recalled in Section 2. In Section 3 it is shown that an optimal control can be characterized explicitly by some peak points of a function which is generated from F and T. This characterization is derived by means of the Duality Theorem of Linear Programming. Moreover, in Section 3 F is specialized, viz., to be the sum metric and the maximum metric on the set of output words, respectively, when the output letters are endowed with the discrete metric, and the optimal control is seen to have a particularly simple form.

The latter case is considered in greater detail in Section 4, in which an input and an output word are given and the value of the optimal control in this situation is to be computed. For this computation an algorithm is presented and analysed with respect to its worst and average behavior. The average case considered here is mathematically handled by assuming that the respective words and the automaton are given by a stochastic process; here measure theoretic ideas are necessary since counting (the usual device in such an average case analysis) is impossible by the uncountability of the set of all stochastic automata.

2. Stochastic Automata

Let for this paper X and Y be fixed and finite sets of input and output letters, respectively. A stochastic automaton is a mathematical machine which works in the following way: given an input letter and a state, the

automaton changes its state and puts an output letter out, and for any state and any output letter there is defined a probability that this state and this letter is the new state and the actual output, respectively. Moreover it is assumed that some initial state is adopted with a certain probability. Formally defined, (Z;K,p) is said to be a stochastic automaton iff, given x ∈ X and z ∈ Z, K(x,z) is a probability on Z × Y, and if p is a probability distribution over Z, the finite set of internal states.

Hence given (x,z,z',y) ∈ X × Z × Z × Y, K(x,z)(z',y) is the probability that the automaton puts out y and adopts z' as its new state, provided the input in state z has been x; p(z) is the probability that the automaton has z as its initial state. Let for later use Prob(Z × Y) be the set of all stochastic vectors on Z × Y, i.e. θ ∈ Prob(Z × Y) holds iff θ maps Z × Y into [0,1] such that

\sum \{ \theta(z,y) ;\ z \in Z,\ y \in Y \} = 1

holds. Then a stochastic automaton can be identified with a map K: X × Z → Prob(Z × Y) and p ∈ Prob(Z), provided the set Z of states is given.

Let as usual X* be the set of words over X, with e as the empty word and |v| as the length of v ∈ X*. Define

K(e,z)(z',e) := 1 \ \text{ if } z' = z ,\quad \text{and } 0 \text{ otherwise} ,

and inductively

K(vx,z)(z',wy) := \sum_{\bar z \in Z} K(x,\bar z)(z',y) \, K(v,z)(\bar z,w) ;

then K(v,z)(z',w) is the probability that z' is the new state and w the output word after input of v in state z.

Taking the distribution of the initial states into account, define

K^*_p(v)(w) := \sum_{z \in Z} \left[ \sum_{z' \in Z} K(v,z)(z',w) \right] p(z) ;

then K^*_p(v)(w) is the probability that the automaton answers the input v with w. K^*_p governs the input/output behavior of the automaton and is a stochastic transformation in the following sense: a map

T : X^* \to Prob(Y^*)

is said to be a stochastic transformation iff T(v)(w) = 0 whenever |v| ≠ |w|, and if

\sum_{y \in Y} T(vx)(wy) = T(v)(w)

holds for any v ∈ X*, w ∈ Y*, x ∈ X. It is an easy exercise to demonstrate that K^*_p has these properties.

A Theorem due to P. Starke shows that for any stochastic transformation T there exists a stochastic automaton (Z;K,p) such that K^*_p equals T ([10], Satz 18). Hence from an external observer's point of view, stochastic automata and stochastic transformations need not be distinguished.

A similar construction is possible for nondeterministic automata. Call a map R with domain X* and range {A; A ⊆ Y*} an automaton transformation iff R(v) ⊆ Y^{|v|} holds, and if R(v) equals the projection of R(vx) to Y^{|v|} for any v ∈ X*, x ∈ X. R(v) is interpreted as the set of possible outputs of a nondeterministic automaton, and the second defining condition says that if wy is a possible output after input of vx, then w is a possible output after input of v. If T is a stochastic transformation, let

R(v) := \{ w \in Y^* ;\ T(v)(w) > 0 \} ;

then R is an automaton transformation which supports T, that means w ∈ R(v) iff T(v)(w) > 0. Conversely, there is a canonic construction which yields a stochastic transformation supporting a given automaton transformation R. Indeed, define inductively

T(e)(e) := 1 ,

T(vx)(wy) := \frac{T(v)(w)}{\mathrm{card}(\{ y' ;\ wy' \in R(vx) \})} \ \text{ if } wy \in R(vx) ,\quad \text{and } 0 \text{ otherwise} ;

then T supports R. We will have occasion in the sequel to make much use of this construction, which will be referred to as canonic.
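As an aside, a minimal sketch of the canonic construction (not from the paper; it assumes R is given as a function from input words to sets of admissible output words):

```python
# Canonic stochastic transformation T supporting an automaton
# transformation R, following the inductive definition above.

def canonic_T(R, v, w):
    if len(v) != len(w):
        return 0.0
    if not v:                       # T(e)(e) = 1
        return 1.0
    if w not in R(v):               # T vanishes outside R(v)
        return 0.0
    # card({y'; w[:-1]y' in R(vx)}): admissible continuations of the prefix
    fanout = sum(1 for u in R(v) if u[:-1] == w[:-1])
    return canonic_T(R, v[:-1], w[:-1]) / fanout
```

By construction the values T(vx)(wy') over the admissible continuations y' sum to T(v)(w), so T is indeed a stochastic transformation.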

The construction of an optimal control will require a Theorem from Linear Programming. A linear program (P) is determined by a linear map A from R^m to R^n for some m and n, a vector b ∈ R^n of constraints, and a vector d ∈ R^m which serves as a reward function. The program (P) then reads: determine under the constraints

c \ge 0 ,\qquad Ac = b

a vector c such that

\langle d,c \rangle \to \mathrm{Min}! ,

⟨·,·⟩ denoting the inner product on R^m. This primal program (P) determines a dual program (D), in which under the constraints

A^t e \le d

a vector e ∈ R^n is to be determined such that

\langle e,b \rangle \to \mathrm{Max}! ,

where A^t is the transpose of A. Vectors which satisfy the respective constraints are said to be feasible solutions for the respective programs, and feasible vectors satisfying the extremal properties are said to be optimal solutions for the programs. From [2], Satz 6-3.1 the following is now inferred:

Duality Theorem of Linear Programming:
Given a linear program (P) with adjoint dual program (D), let c* and e* be feasible solutions for (P) and (D), respectively. Then c* is an optimal solution for (P), provided ⟨d,c*⟩ equals ⟨e*,b⟩.

This Theorem will be helpful in the next section.

3. Constructing an Optimal Control

Fix for this section a stochastic transformation T for which an optimal control will be obtained. The control will be a stochastic transformation, too, such that a functional which serves for comparison will be minimized.

Assume that we are given a symmetric map F with domain {(w,w'); w,w' ∈ Y* with |w| = |w'|} and range R₊, the nonnegative reals, where symmetry means that F(w,w') = F(w',w) holds always. If the stochastic transformation L is used as a control, then the probability that T puts out w is T(v)(w), and that L puts out w' is L(v)(w'), provided both devices are in this situation fed with the input word v. The expected error in this situation is

F(w,w') \, T(v)(w) \, L(v)(w') ,

when F is interpreted as a measure for distinctness (e.g., if F is a metric); hence the expected error using the control L after input v is

r(v,L) = \sum \{ F(w,w') \, T(v)(w) \, L(v)(w') ;\ w,w' \in Y^*,\ |w| = |v| = |w'| \} .

Consequently, an optimal control will minimize r.

Definition: A stochastic transformation L* is said to be an optimal control for T with respect to F iff r(v,L*) ≤ r(v,L) holds for any v ∈ X* and any stochastic transformation L.
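To make the definition concrete, a brute-force evaluation of r(v,L) (an illustration under the assumption that T and L are available as functions on word pairs; none of these names occur in the paper):

```python
from itertools import product

# Expected error r(v,L) of the control L against the behavior T after
# input word v; Y is the output alphabet, F compares output words.

def r(v, T, L, F, Y):
    words = ["".join(p) for p in product(Y, repeat=len(v))]
    return sum(F(w, w2) * T(v, w) * L(v, w2)
               for w in words for w2 in words)

# Example F: the discrete metric on output words (cf. Corollary 3.3).
F = lambda w, w2: 0.0 if w == w2 else 1.0
```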

Now let g be a stochastic source, i.e. g: X* → [0,1] is a map such that

\sum_{|v| = n} g(v) = 1

holds, and moreover

\sum_{x \in X} g(vx) = g(v)

is true for any v ∈ X*. g might be considered as a device which governs the input of T and L. In this

case let

r_n(L) := \sum_{|v| = n} r(v,L) \, g(v)

be the expected error in the control for input words of length n. A. Schmitt ([9]) calls a stochastic transformation L_opt an optimal prediction for T iff

A_\infty(L_{opt}) := \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} r_i(L_{opt})

exists, and if

A_\infty(L_{opt}) \le A_\infty(L)

holds for any stochastic transformation L for which A_∞(L) exists. It will be shown below that in a special case L_opt can be used as an optimal control. In [3]

an attempt was made to generalize the prediction problem to the case that the alphabets in question are no longer finite, but that at least Y is a compact metric space, and the control problem considered here might be considered as a continuation of [3] for finite alphabets.

After some computational experience, an optimal control can be constructed as follows: define a set valued map R inductively by

R(e) := \{e\} ,

R(vx) := \{ wy ;\ w \in R(v),\ \sum_{w' \in Y^{|vx|}} F(w',wy) \, T(vx)(w') = \min_{\bar y \in Y} \sum_{w' \in Y^{|vx|}} F(w',w\bar y) \, T(vx)(w') \} ;

then R is an automaton transformation. This is not difficult to prove.
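For small alphabets the defining minimization can be carried out mechanically; the following sketch (illustrative, not from the paper; T and F are assumed given as functions, and ties are tested with a tolerance) computes R(v) recursively:

```python
from itertools import product

# Set valued map R: for each w in R(v), keep the continuations wy whose
# expected F-distance to the output of T after input vx is minimal.

def R(v, T, F, Y, tol=1e-12):
    if not v:
        return {""}                       # R(e) = {e}
    words = ["".join(p) for p in product(Y, repeat=len(v))]
    result = set()
    for w in R(v[:-1], T, F, Y, tol):
        cost = {y: sum(F(w2, w + y) * T(v, w2) for w2 in words)
                for y in Y}
        m = min(cost.values())
        result |= {w + y for y, c in cost.items() if c <= m + tol}
    return result
```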

In the sequel F will be varied. It would be formally exact to note the dependency of R on F, but this would make notation clumsy. It will be clear in the sequel which F is in question. Now define L* by the canonic construction above; then L* takes part in the competition to predict T optimally, and will now be shown to be the winner.

3.1 Theorem:

L* predicts T optimally.

Before proving 3.1, some technical preparations are made in order to make the proof not too involved. X and Y are assumed to be linearly ordered, and Cartesian products carry the lexicographic order. Thus every Cartesian product can be mapped order isomorphically onto an initial section of the natural numbers. This map is always assumed to be applied if vectors or matrices are written down, as e.g.

A = (a_{xy,y'})_{xy \in X \times Y,\ y' \in Y} .

The proof of 3.1 is done now by induction on the number m of steps already performed. Hence, if m = 0, there is nothing to show. Assume all is demonstrated for m = n. If m = n + 1, a linear program (P) is formulated. We are looking for a nonnegative real vector

(c_{vxwy})_{vxwy \in X^n \times X \times Y^n \times Y}

with the following properties:

(i) \forall v \in X^n\ \forall x \in X\ \forall w \in Y^n :\quad \sum_{y \in Y} c_{vxwy} = b_{vxw} := L^*(v)(w) ,

(ii) ⟨d,c*⟩ → Min!, where

d_{vxwy} := \sum_{w' \in Y^{n+1}} F(w',wy) \, T(vx)(w') \, q(vx) ,

⟨·,·⟩ denotes the inner product, and q is an arbitrary stochastic source, which is taken into consideration for technical reasons only.

Condition (i) takes care of the properties of a stochastic transformation; condition (ii) guarantees optimality later. In (i) there are card(X^n × Y^n × X) constraints; thus a matrix

A = (a_{vxw,\,v'x'w'y'})

is defined by

a_{vxw,\,v'x'w'y'} := 1 \ \text{ if } vxw = v'x'w' ,\quad \text{and } 0 \text{ otherwise} .

Hence (P) reads as follows:

(P)\qquad c^* \ge 0 ,\quad A c^* = b ,\quad \langle d,c^* \rangle \to \mathrm{Min}!

This primal program leads to the following dual program (D): determine a vector

\pi = (\pi_{vxw})_{vxw \in X^n \times X \times Y^n}

with the properties

(D1)\ A^t \pi \le d ,\qquad (D2)\ \langle b,\pi \rangle \to \mathrm{Max}!

(A^t is the transpose of A). From the construction of A it is seen that this yields

the following inequalities:

\forall v \in X^n\ \forall x \in X\ \forall w \in Y^n\ \forall y \in Y :\quad \pi_{vxw} \le d_{vxwy} .

Since b is nonnegative, an optimal solution π* of (D) is given by

\pi^*_{vxw} = q(vx) \, \min_{y \in Y} \sum_{w' \in Y^{n+1}} F(w',wy) \, T(vx)(w') ;

the value of (D) at π* is

\langle b,\pi^* \rangle = \sum \{ L^*(v)(w) \, q(vx) \, \min_{y \in Y} \sum_{w' \in Y^{n+1}} F(w',wy) \, T(vx)(w') ;\ v \in X^n,\ x \in X,\ w \in Y^n \} .

Hence, defining

c^*_{vxwy} := L^*(vx)(wy) ,

c* is a feasible solution of (P), and ⟨d,c*⟩ equals ⟨b,π*⟩. By the Duality Theorem of Linear Programming, c* is an optimal solution of (P). In particular,

r(vx,L^*) = \inf \{ r(vx,L) ;\ L \text{ a stochastic transformation} \}

holds for any v ∈ X^n, x ∈ X. This can be seen by considering a special source q which puts out any word with the same probability.

In order to connect Theorem 3.1 with a result due to A. Schmitt ([9], Satz 3.1), let ρ be the discrete metric on Y, and let

F(w,w') := \sum_{i=1}^{n} \rho(y_i,y'_i) = \mathrm{card}(\{ i ;\ y_i \ne y'_i \})

for w = y_1 ... y_n, w' = y'_1 ... y'_n; hence F(w,w') counts the mismatches in the words w and w'. It is convenient to write, for a probability μ on Y^{n+1},

\mu(Y^n \cdot y) := \sum_{w \in Y^n} \mu(wy) .

3.2 Corollary:
Let F be defined as above, and define inductively

S(e) := \{e\} ,

S(vx) := \{ wy ;\ w \in S(v),\ T(vx)(Y^{|v|} \cdot y) = \max_{\bar y \in Y} T(vx)(Y^{|v|} \cdot \bar y) \} .

Then S equals R.

Proof:

The equality R(v) = S(v) is shown by induction on |v|. Assume it is established for v ∈ X*, |v| ≤ n; let v ∈ X^n, x ∈ X, and wy ∈ S(vx). If w' ∈ Y^n and y' ∈ Y are arbitrary, we have

\sum_{\bar w \in Y^{n+1}} F(\bar w,wy) \, T(vx)(\bar w)
  = \sum_{w^* \in Y^n} F(w^*,w) \, T(v)(w^*) + [\,1 - T(vx)(Y^n \cdot y)\,]
  \le \sum_{w^* \in Y^n} F(w^*,w') \, T(v)(w^*) + [\,1 - T(vx)(Y^n \cdot y)\,]
  \le \sum_{w^* \in Y^n} F(w^*,w') \, T(v)(w^*) + [\,1 - T(vx)(Y^n \cdot y')\,]
  = \sum_{\bar w \in Y^{n+1}} F(\bar w,w'y') \, T(vx)(\bar w) ,

hence wy ∈ R(vx). An analogous argument demonstrates that wy ∈ R(vx) implies for an arbitrary y' ∈ Y the inequality

1 - T(vx)(Y^n \cdot y) \le 1 - T(vx)(Y^n \cdot y') ,

hence wy ∈ S(vx).

In [9] it is shown that the canonic stochastic transformation for S is indeed an optimal control for the given stochastic transformation under the prediction error which is induced by the match counting F, where optimality is measured by A_∞. The main difficulty in establishing that result is to demonstrate that the limit in question exists. This is done in a sophisticated way by a compactness argument, using the well known fact that lim A^k exists whenever A is a stochastic matrix, with A^k as its k-th product with itself.

Assume that F is generated by the discrete metric ρ upon setting

F(w,w') := \max_{1 \le i \le n} \rho(y_i,y'_i) ;

hence F is itself the discrete metric on the set of output words of equal length. This situation will be of interest in Section 4, and L* can be obtained in a particularly simple form.

3.3 Corollary:
Let F be the discrete metric on Y* and define

Q(e) := \{e\} ,

Q(vx) := \{ wy ;\ w \in Q(v),\ T(vx)(wy) = \max_{\bar y \in Y} T(vx)(w\bar y) \} .

Then Q equals R.

Proof: The equality in question follows from the observation that

\sum_{\bar w \in Y^{|vx|}} F(\bar w,wy) \, T(vx)(\bar w)

coincides with 1 - T(vx)(wy) for any w ∈ Y^{|v|}, y ∈ Y. □

4. Analysis of an Algorithm for Computing the Optimal Control

Given a positive integer t, an input word x_1 ... x_t, and an output word y_1 ... y_t of respective lengths t, we want to know which probability is assigned to y_1 ... y_t by the optimal control after input of x_1 ... x_t. In terms of 3.3 this can be accomplished in the following way: compute Q(x_1); if y_1 ∈ Q(x_1), then

L^*(x_1)(y_1) = \frac{1}{\mathrm{card}(Q(x_1))} ;

otherwise L*(x_1)(y_1) = 0, and L*(x_1 ... x_t)(y_1 ... y_t) = 0 is returned. In the former case, Q(x_1 x_2) is computed, and if y_1 y_2 is a member of it, L*(x_1 x_2)(y_1 y_2) is computed as indicated by the canonic construction; otherwise the computation stops, etc. From the construction of L* it is evident that L*(x_1 ... x_t)(y_1 ... y_t) > 0 holds if and only if L*(x_1 ... x_i)(y_1 ... y_i) > 0 holds for every i, 1 ≤ i < t, and y_1 ... y_t ∈ Q(x_1 ... x_t).

This yields an algorithm for the computation of L*(v)(w) for any pair v and w of input and output words.

Before this algorithm is presented, some preparations are needed. The stochastic automaton is assumed to be finite, say Z = {1,...,ℓ}, and it is assumed that it starts with probability one in a fixed state z ∈ Z. Thus p is the probability concentrated at z, and T(v)(w) equals

\sum_{z' \in Z} K(v,z)(z',w)

for any v ∈ X*, w ∈ Y* with |v| = |w|. We assume that Y = {1,...,n}, and that K is given by the set of ℓ × ℓ matrices

\{ A(x,y) ;\ x \in X,\ y \in Y \} ,\qquad A(x,y)_{i,j} = K(x,i)(j,y) .

Then K(x_1 ... x_m, i)(j, y_1 ... y_m) is the (i,j)-entry of the product matrix A(x_1,y_1) · ... · A(x_m,y_m).

4.1 Algorithm:

Input: K, t, x_1 ... x_t ∈ X^t, y_1 ... y_t ∈ Y^t.

Output: L*(x_1 ... x_t)(y_1 ... y_t), the optimal control for T(x_1 ... x_t)(y_1 ... y_t).

Method:

0. B is an n-dimensional array of ℓ × ℓ matrices, the n-dimensional real array w holds the current values of T, and c holds the current value of the prediction.

1. for y ← 1 to n do B(y) ← A(x_1,y); time ← 1; c ← 1.

2. repeat the steps 3 - 6 while time ≤ t and c > 0.

3. for y ← 1 to n do w(y) ← Σ_{z'∈Z} B(y)_{z,z'} (z the fixed initial state).

4. Q ← {y ∈ Y; w(y) = max_{ȳ∈Y} w(ȳ)}; if y_time ∈ Q then c ← c/card(Q) else c ← 0.

5. time ← time + 1.

6. for y ← 1 to n do B(y) ← matrix product of B(y_{time-1}) and A(x_time, y).

7. L*(x_1 ... x_t)(y_1 ... y_t) ← c.
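A direct transcription of Algorithm 4.1 into Python (a sketch, not the paper's formulation: it assumes the matrices A(x,y) are numpy arrays held in a dictionary, states and outputs are 0-based, the automaton starts in state init, and ties in step 4 are tested with a tolerance):

```python
import numpy as np

# Algorithm 4.1: compute L*(x_1...x_t)(y_1...y_t) for an automaton
# given by the l x l matrices A[(x, y)], started in state `init`.

def optimal_control_prob(A, x, y, n, init=0, tol=1e-12):
    t = len(x)
    B = [A[(x[0], out)].copy() for out in range(n)]       # step 1
    c = 1.0
    for time in range(t):                                 # steps 2-6
        w = [B[out][init, :].sum() for out in range(n)]   # step 3
        m = max(w)
        Q = [out for out in range(n) if w[out] >= m - tol]  # step 4
        if y[time] in Q:
            c /= len(Q)
        else:
            return 0.0
        if time + 1 < t:                                  # step 6
            B = [B[y[time]] @ A[(x[time + 1], out)] for out in range(n)]
    return c                                              # step 7
```

For t = 1 this returns 1/card(Q(x_1)) whenever y_1 is a peak of T(x_1)(·), in accordance with the discussion above; steps 3 and 6 dominate the cost, in agreement with the worst case bound derived below.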

4.2 Proposition:

Algorithm 4.1 is correct.

Proof:

Denote by M the number of times step 6 is performed, and by c_m (Q_m, B_m(y), respectively) the value of c (Q, B(y), respectively) after step 4 has been performed when time has the value m ≤ M. By induction on m, the following has to be shown:

(a) ∀y ∈ Y: y ∈ Q_m iff y_1 ... y_{m-1} y ∈ Q(x_1 ... x_m),

(b) c_m = L*(x_1 ... x_m)(y_1 ... y_m),

(c) ∀y ∈ Y: B_m(y)_{i,j} = K(x_1 ... x_m, i)(j, y_1 ... y_{m-1} y).

If m = 1, then B_1(y)_{i,j} = K(x_1,i)(j,y), thus w(y) = T(x_1)(y), and y ∈ Q_1 if and only if

T(x_1)(y) = \max_{\bar y \in Y} T(x_1)(\bar y) .

Consequently, y ∈ Q_1 iff y ∈ Q(x_1); in particular

c_1 = L^*(x_1)(y_1)

holds in this case. This proves (a) - (c) for m = 1, and correctness has been established in case M = 1.

Assume that M > 1, and that (a) - (c) hold for some m < M. Then m < t and c_m > 0, thus control reaches step 6 with time = m + 1. Since by (c) B_m(y_m)_{i,j} coincides with K(x_1 ... x_m, i)(j, y_1 ... y_m), we have

B_{m+1}(y)_{i,j} = K(x_1 \ldots x_{m+1}, i)(j, y_1 \ldots y_m y) ,

and furthermore

w(y) = T(x_1 \ldots x_{m+1})(y_1 \ldots y_m y) .

Since c_m > 0, y_1 ... y_m ∈ Q(x_1 ... x_m) holds by (b); thus y ∈ Q_{m+1} iff y_1 ... y_m y ∈ Q(x_1 ... x_{m+1}). If y_{m+1} ∈ Q_{m+1}, then

c_{m+1} = c_m / \mathrm{card}(Q_{m+1}) = L^*(x_1 \ldots x_{m+1})(y_1 \ldots y_{m+1}) .

If however y_{m+1} ∉ Q_{m+1}, then

c_{m+1} = 0 = L^*(x_1 \ldots x_{m+1})(y_1 \ldots y_{m+1}) .

This proves (a) - (c) for m + 1. □

We want to investigate both the worst and the average case behavior of Algorithm 4.1 with respect to arithmetic operations under the uniform cost criterion ([1], ch. 1). This criterion assumes that the cost of performing an operation is independent of the particular kind of operation - an addition costs as much as a division - as well as of the particular size of the operands - operations involving integers cost as much as operations with floating point numbers. Although this measure is somewhat unrealistic, it provides a rough estimate of the complexity of an algorithm. For the analysis an implementation of step 4 has to be considered, and the following refinement of this step is fixed, where a subset of Y is thought of as an n-dimensional boolean array. This leads to

4'. max ← max_{y∈Y} w(y); card ← 0;
    for y ← 1 to n do
        if max = w(y) then a(y) ← 1, card ← card + 1,
        else a(y) ← 0;
    if a(y_time) = 1 then c ← c/card else c ← 0.

This requires at most O(n) arithmetic operations. Since the steps 3 - 6 are performed at most t times, and step 3 requires O(n · ℓ) and step 6 O(n · ℓ^{log_2 7}) operations in the worst case, provided the Strassen algorithm ([1], 6.2) is used for matrix multiplication, in the worst case

O(t \cdot n \cdot \ell^{\log_2 7})

arithmetic operations are needed to compute L*(x_1 ... x_t)(y_1 ... y_t). Note that this is independent of the cardinality of X.

The average case analysis is considerably more difficult. Usually the input for an algorithm which is to be analyzed is taken from an at most countably infinite set of possible inputs. On the other hand there are uncountably many stochastic automata, and hence another approach for the average case analysis must be found. It is assumed in the sequel that the stochastic automaton as well as the input and the output words are realizations of a stochastic process. Roughly speaking, the approach chosen transforms the distributions of these processes by means of the steps 2 - 6 of Algorithm 4.1. Since each instance of an iteration in these steps constitutes a measurable map, the images of the distributions in question under these maps are investigated, because these image measures are the distributions of the process transformed by the respective steps of the algorithm. This approach has turned out to be similar to the approach to the semantics of probabilistic programs due to D. Kozen [6], in which a probabilistic program is shown to be a device which transforms measures (note, however, that Kozen's probabilistic programs require random choices within the program, and that our algorithm is strictly deterministic).

Let (Ω,A,ℙ) be a probability space. A stochastic automaton can be identified with a map from X × Z to Prob(Z × Y); hence it is assumed that the automaton which serves as an input to 4.1 is realized by a stochastic process {k(x,z); x ∈ X, z ∈ Z} of measurable maps

k(x,z) : \Omega \to Prob(Z \times Y) ,

where the latter set is endowed with the restriction of the usual Borel σ-field on R^{Z×Y} to Prob(Z × Y). The input and output words are specified by a stochastic process {j_m; m ∈ N} of measurable maps j_m with values in X × Y, this set being endowed with the discrete σ-field. The following assumptions are imposed:

a) {j_m; m ∈ N} is identically distributed,

b) {j_m; m ∈ N} ∪ {k(x,z); x ∈ X, z ∈ Z} is stochastically independent.

Denote the projections of j_m to the input and the output alphabet by i_m and o_m, respectively; then i_1(ω) ... i_m(ω) is the input, and o_1(ω) ... o_m(ω) is the output word in 4.1, when the automaton is chosen according to ω ∈ Ω. Note that according to b) the automaton is chosen independently from the input and the output word, as one would expect intuitively. In order to fix an assumption on the distribution of k(·,·), some preparations are needed.

For this, let M be a finite set with m := card(M). Denote by Σ_s(M) the set of all substochastic vectors on M, i.e. the set of all maps V: M → [0,1] which satisfy

\sum_{a \in M} V(a) \le 1 ,

and let Σ_s^m denote Σ_s({1,...,m}); similarly, Prob_m is Prob({1,...,m}). If a ∉ M is a distinguished element, then Prob(M ∪ {a}) is homeomorphic to Σ_s(M) upon defining a map

\alpha : \Sigma_s(M) \to Prob(M \cup \{a\})

by

\alpha(p)(x) := p(x) \ \text{ if } x \ne a ,\qquad \alpha(p)(a) := 1 - \sum_{\bar x \in M} p(\bar x) .

The equidistribution on Σ_s^m (as a subset of R^m) is λ^m, where λ^m is m-dimensional volume, viz., Lebesgue measure. Now an appropriate measure on Σ_s(M) should be found: let β: M → {1,...,m} be a bijection, and define β_1: Σ_s^m → Σ_s(M) upon setting

\beta_1(a_1,...,a_m)(x) := a_{\beta(x)} .

By means of β_1, Lebesgue measure now is transported in the following way: let B ⊆ Σ_s(M) be a Borel set; then

\beta_1^{-1}[B] := \{ b ;\ \beta_1(b) \in B \}

is a Borel set in Σ_s^m, and

\lambda^m(\beta_1^{-1}[B])

is defined to be the transported Lebesgue measure on Σ_s(M). Denote the measure constructed in this manner by λ_M; hence λ_M = β_1(λ^m).
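For intuition, the normalized version of λ^m on Σ_s^m can be simulated (an illustration, not from the paper): a uniform point of the m-dimensional probability simplex in R^{m+1}, with its last coordinate dropped, is uniformly distributed on Σ_s^m.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_substochastic(m, size):
    """Uniform samples from Sigma_s^m = {a in R^m; a_i >= 0, sum <= 1}."""
    # Dirichlet(1,...,1) is the uniform distribution on the m-simplex
    # in R^(m+1); dropping the last coordinate maps the simplex
    # affinely onto Sigma_s^m, so the image is again uniform.
    return rng.dirichlet(np.ones(m + 1), size=size)[:, :m]

a = sample_substochastic(3, 10000)
print((a >= 0).all(), a.sum(axis=1).max() <= 1.0)   # True True
```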

Since Prob_m spans algebraically an (m-1)-dimensional hyperplane in R^m,

\lambda_M(Prob(M)) = \lambda^m(Prob_m) = 0

holds; hence another way to construct a canonic measure on Prob(M) must be found. For this the homeomorphism α is brought into play: interconnecting α with β_1, Σ_s^{m-1} is mapped homeomorphically onto Prob(M); hence α ∘ β_1 transports λ^{m-1} to Prob(M). Denote by λ_0 the measure on Prob(Z × Y) constructed in this manner.

It is assumed that the distribution of the process k above is independent of x and z, and that with

K(A) := \mathbb{P}(k(x,z) \in A)

the relation

K \ll \lambda_0

holds. This means that K is absolutely continuous with respect to λ_0, i.e., that K(A) = 0 for every Borel set A ⊆ Prob(Z × Y) for which λ_0(A) = 0. K is the image of ℙ under k(x,z), just as λ_M is the image of λ^m under β_1; this is symbolized by

K := k(x,z)(\mathbb{P}) .

Another property of K is symmetry: K is assumed to be symmetric in the sense that it is insensitive to permutations of Z × Y. If π is an arbitrary permutation of the latter set, let π̃ be the associated bijection on Prob(Z × Y), which is defined by π̃(s) = s ∘ π; symmetry of K now reads

\tilde\pi(K) = K .

This can be interpreted in the sense that no special arrangement of states and outputs is preferred by K, and the assumption K << λ_0 in the sense that an event which is observed in Prob(Z × Y) under the equidistribution Γ(n·ℓ) · λ_0 with probability 0 will be observed under K with this probability. Finally we need the Change of Variable Formula for manipulations of integrals: let A and B be open subsets of R^m for some m such that A is the image of B under some differentiable homeomorphism g. Then the integral over A of every integrable map h: A → R can be evaluated as follows:

\int_A h \, d\lambda^m = \int_A h(x_1,...,x_m) \, dx_1 \cdots dx_m = \int_B h(g(y_1,...,y_m)) \, |\det g'(y_1,...,y_m)| \, dy_1 \cdots dy_m = \int_B h \circ g \cdot |\det g'| \, d\lambda^m ,

see [5], Article 675 (or any other book on Calculus which treats higher dimensional integrals). This Formula will be a convenient tool when calculating an integral, e.g. of the type

\int_{\Sigma_s^m} f \, d\lambda^m .

This is done by means of Jacobi's substitution h: (x_1,...,x_m) ↦ (y_1,...,y_m), where

y_1 := x_1 (1 - x_2) ,\quad y_2 := x_1 x_2 (1 - x_3) ,\quad \ldots ,\quad y_{m-1} := x_1 x_2 \cdots x_{m-1} (1 - x_m) ,\quad y_m := x_1 x_2 \cdots x_m .

Then the equalities

\sum_{i=r}^{m} y_i = x_1 x_2 \cdots x_r \qquad (1 \le r \le m)

hold, and

|\det h'(x_1,...,x_m)| = x_1^{m-1} \, x_2^{m-2} \cdots x_{m-1} .

Hence the integral in question reads

\int_0^1 \cdots \int_0^1 f(h(x_1,...,x_m)) \, x_1^{m-1} \, x_2^{m-2} \cdots x_{m-1} \, dx_1 \cdots dx_m ,

which is sometimes easier to evaluate; see [5], Article 676, No 13.
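As a quick numerical sanity check of the determinant formula (an illustration, not part of the paper), one can compare numpy's determinant of the explicit Jacobian with x_1^2 x_2 for m = 3:

```python
import numpy as np

# Jacobi's substitution for m = 3:
#   y1 = x1(1-x2), y2 = x1 x2 (1-x3), y3 = x1 x2 x3.
# Its Jacobian determinant should equal x1^2 * x2.

def jacobian(x1, x2, x3):
    return np.array([
        [1 - x2,        -x1,          0.0],
        [x2 * (1 - x3),  x1 * (1 - x3), -x1 * x2],
        [x2 * x3,        x1 * x3,       x1 * x2],
    ])

x1, x2, x3 = 0.7, 0.4, 0.9
print(np.linalg.det(jacobian(x1, x2, x3)))   # ~0.196
print(x1**2 * x2)                            # 0.196
```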

To begin with the average case analysis of Algorithm 4.1, the expected running time is computed. In order to do this, consider the following auxiliary processes:

s_m(\omega)(y) := \sum_{z' \in Z} k^*(i_1(\omega) \ldots i_m(\omega), z)(\omega)(z', o_1(\omega) \ldots o_{m-1}(\omega) y) ,

\sigma_m(\omega) := s_m(\omega)(o_m(\omega)) ,\qquad \tau_m(\omega) := \max_{y \in Y} s_m(\omega)(y) .

s_m(ω) describes the output behaviour at time m, provided the inputs at times 1,...,m are given by the input process and the output at times 1,...,m-1 by the output process, if moreover the automaton is chosen according to ω. If now the output at time m is o_m(ω), then σ_m(ω) and τ_m(ω) serve as an indicator for the maximality of this output.

Now let H_t(ω) be the number of times step 4 is executed when all the inputs to the algorithm are chosen according to the observation of ω, and the respective words are of length t. Moreover let

H := \inf \{ m ;\ \sigma_m < \tau_m \} ;

then we have H_t = inf{H,t}, since in case m ≤ t, H_t(ω) = m holds if and only if

\sigma_i(\omega) = \tau_i(\omega) \ \text{ for } 1 \le i \le m-1 ,\quad \text{and}\quad \sigma_m(\omega) < \tau_m(\omega) .

This implies, too, that H_t and H are random variables.

Computing the expectation of H_t requires some preparations. For this, consider the additional process t_m, which is defined by

t_m(\omega)(z') := k^*(i_1(\omega) \ldots i_m(\omega), z)(\omega)(z', o_1(\omega) \ldots o_m(\omega))

and which describes the state transition behavior of the randomly chosen automaton with random input and output.

Now let

\mu_m := k^*(i_1 \ldots i_m, z)(\mathbb{P})

be the distribution of the Prob(Z × Y^m)-valued random variable

\omega \mapsto k^*(i_1(\omega) \ldots i_m(\omega), z)(\omega) .

The next Lemma will list some properties of μ_m. Before stating it, remember the product of a family of measures: if γ_i is a measure on the σ-field A_i for i ∈ I (I finite), then the product measure ⊗_{i∈I} γ_i is the measure on the σ-field generated by

\{ \prod_{i \in I} B_i ;\ \forall i \in I : B_i \in A_i \} ,

which is uniquely defined by

(\bigotimes_{i \in I} \gamma_i)(\prod_{i \in I} B_i) = \prod_{i \in I} \gamma_i(B_i) .

4.3 Lemma:

a) j_{m+1} is independent of {k*(i_1 ... i_m, z'); z' ∈ Z} and of {t_m(·)(z'); z' ∈ Z} for any m ∈ N,

b) μ_1 equals K,

c) given s ∈ Prob(Z × Y^m) and s' ∈ (Prob(Z × Y))^Z, define

\psi_m(s,s')(z',wy) := \sum_{z \in Z} s(z,w) \cdot s'(z)(z',y) ;

then ψ_m(s,s') ∈ Prob(Z × Y^{m+1}), and

\mu_{m+1} = \psi_m(\mu_m \otimes \bigotimes_{z \in Z} \mu_1) .

Proof: a) follows from the independence assumption on k(·,·) and (j_n)_{n∈N}, and from the fact that j_{m+1} is involved neither in the construction of k*(i_1 ... i_m, z') nor in the construction of t_m(·)(z') for any z' ∈ Z.

b) Let B ⊆ Prob(Z × Y) be a Borel set; then the following chain of equalities holds:

\mu_1(B) = \mathbb{P}(k(i_1,z) \in B)
        = \sum_{x \in X} \mathbb{P}(i_1 = x,\ k(x,z) \in B)
        = \sum_{x \in X} \mathbb{P}(i_1 = x) \, \mathbb{P}(k(x,z) \in B)
        = K(B) \cdot \sum_{x \in X} \mathbb{P}(i_1 = x)
        = K(B) .

Note that the particular form of the distribution of i_1 does not come into consideration.

c) {k(i_{m+1},z'); z' ∈ Z} is independent; hence the distribution of the vector k(i_{m+1},·) with

k(i_{m+1}(\omega),\cdot)(\omega) \in (Prob(Z \times Y))^Z

is given by ⊗_{z∈Z} μ_1. Consequently the distribution of

(k^*(i_1 \ldots i_m, z),\ k(i_{m+1},\cdot))

is described by

\mu_m \otimes \bigotimes_{z \in Z} \mu_1

because of part a). But since, according to the construction of the map ψ_m, we have

k^*(i_1 \ldots i_{m+1}, z) = \psi_m(k^*(i_1 \ldots i_m, z),\ k(i_{m+1},\cdot)) ,

the asserted equality follows. □

The next Lemma lists a particularly important property of the distribution of t_m, viz., that it is absolutely continuous with respect to λ_Z. This will turn out to be the key tool of the subsequent analysis.

4.4 Lemma:
For every m ∈ N the following properties hold:

a) t_m(ℙ) << λ_Z ,

b) \sum_{z' \in Z} \mathbb{P}(k^*(i_1 \ldots i_m, z)(z', o_1 \ldots o_m) = 0) = 0 .

Proof: 1. t_{m+1} can be represented as follows:

t_{m+1}(\omega) = \psi(t_m(\omega),\ k(i_{m+1}(\omega),\cdot)(\omega)(\cdot,\ o_{m+1}(\omega))) ,

where ψ: Σ_s(Z) × (Σ_s(Z))^Z → Σ_s(Z) is defined by

\psi(s,s')(z) := \sum_{z' \in Z} s(z') \cdot s'(z')(z) .

Now t_m is independent of the vector k(i_{m+1},·)(·, o_{m+1}), the distribution of which is just ⊗_{z∈Z} t_1(ℙ) because of the independence of {k(i_{m+1},z')(·, o_{m+1}); z' ∈ Z}. Consequently

t_{m+1}(\mathbb{P}) = \psi(t_m(\mathbb{P}) \otimes \bigotimes_{z \in Z} t_1(\mathbb{P}))

holds.

2. The assertion t_m(ℙ) << λ_Z now is proved by induction on m.

(i) m = 1: Let B ⊆ Σ_s(Z) be a Borel set with λ_Z(B) = 0, and fix for the moment x ∈ X and y ∈ Y; then

\mathbb{P}(k(x,z)(\cdot,y) \in B) = K(\{ s \in Prob(Z \times Y) ;\ s(\cdot,y) \in B \}) = 0 ,

since the construction of λ_0 implies

\lambda_0(\{ s \in Prob(Z \times Y) ;\ s(\cdot,y) \in B \}) = 0 ;

indeed, for a Borel set B' ⊆ Σ_s^ℓ with λ^ℓ(B') = 0,

\lambda_0(\{ s ;\ s(\cdot,y) \in B' \}) = ((n-1) \cdot \ell) \int_{B'} \left(1 - \sum_{i=1}^{\ell} a_i\right)^{(n-1) \cdot \ell - 1} da_1 \ldots da_\ell = 0 .

If x and y are no longer fixed, the independence of j_1 and k(·,·) implies

t_1(\mathbb{P})(B) = \sum_{x \in X} \sum_{y \in Y} \mathbb{P}(j_1 = xy,\ k(x,z)(\cdot,y) \in B)
                  = \sum_{x \in X} \sum_{y \in Y} \mathbb{P}(j_1 = xy) \, \mathbb{P}(k(x,z)(\cdot,y) \in B)
                  = 0 .

(ii) m → m + 1: Let a) be proved for m; then the induction hypothesis implies

t_{m+1}(\mathbb{P}) = \psi(t_m(\mathbb{P}) \otimes \bigotimes_{z \in Z} t_1(\mathbb{P})) \ll \psi(\lambda_Z \otimes \bigotimes_{z \in Z} \lambda_Z) .

Hence by the transitivity of << it suffices to demonstrate that

\psi(\lambda_Z \otimes \bigotimes_{z \in Z} \lambda_Z) \ll \lambda_Z

holds, and remembering the construction of λ_Z, it is enough to show that

(*)\qquad \bar\psi(\lambda^{\ell} \otimes \lambda^{\ell \cdot \ell}) \ll \lambda^{\ell} ,

where ψ̄ is the map from Σ_s^ℓ × (Σ_s^ℓ)^ℓ to Σ_s^ℓ which is defined by

\bar\psi(a_0, a_1, \ldots, a_\ell) := \left( \sum_{r=1}^{\ell} a_{0,r} \, a_{r,1},\ \ldots,\ \sum_{r=1}^{\ell} a_{0,r} \, a_{r,\ell} \right) .

Hence let C ⊆ Σ_s^ℓ be a Borel set with λ^ℓ(C) = 0, and define, given a_0 ∈ Σ_s^ℓ,

(a_1, \ldots, a_\ell) \in A(a_0) \quad\text{if and only if}\quad \bar\psi(a_0, a_1, \ldots, a_\ell) \in C ;

then it must be demonstrated that

(**)\qquad \int_{\Sigma_s^{\ell}} \lambda^{\ell \cdot \ell}(A(a_0)) \, da_0 = 0 .

Then an application of Fubini's Theorem ([8], Satz 3.2.2) will assure us that (*) is correct.

To begin with, fix a_0 ∈ Σ_s^ℓ. Since the set of all a_0 such that a_{0,i} = 0 for some i can be neglected with respect to λ^ℓ, it can be assumed w.l.o.g. that a_{0,i} ≠ 0 holds for any i. Hence

A(a_0) = f[E] ,

where

E := \{ (b_1, \ldots, b_\ell) \in \mathbb{R}^{\ell \cdot \ell} ;\ \sum_{r=1}^{\ell} (b_{r,1}, \ldots, b_{r,\ell}) \in C \}

and

f(b_1, \ldots, b_\ell) := (a_{0,1}^{-1} b_1, \ldots, a_{0,\ell}^{-1} b_\ell) .

By the Change of Variable Formula,

\lambda^{\ell \cdot \ell}(A(a_0)) \le |\det f'| \cdot \lambda^{\ell \cdot \ell}(E)

holds. Interchanging b_{i,j} and b_{j,i}, one obtains

\lambda^{\ell \cdot \ell}(E) = \lambda^{\ell \cdot \ell}(E') ,

where

E' := \{ (b_1, \ldots, b_\ell) \in \mathbb{R}^{\ell \cdot \ell} ;\ \sum_{r=1}^{\ell} (b_{1,r}, \ldots, b_{\ell,r}) \in C \} .

Now let h be the substitution due to Jacobi, which is described above, and let

H : \mathbb{R}^{\ell \cdot \ell} \to \mathbb{R}^{\ell \cdot \ell} ,\qquad H(v_1, \ldots, v_\ell) := (h(v_1), \ldots, h(v_\ell)) ;

then we have E' = H[F], where

F := \{ (v_1, \ldots, v_\ell) \in \mathbb{R}^{\ell \cdot \ell} ;\ (v_{1,1}, \ldots, v_{\ell,1}) \in C \} .

But now we are done:

\lambda^{\ell \cdot \ell}(A(a_0)) \le |\det f'| \cdot \lambda^{\ell \cdot \ell}(E) = |\det f'| \cdot \lambda^{\ell \cdot \ell}(E') \le |\det f'| \cdot \int_F |\det H'| \, d\lambda^{\ell \cdot \ell} = 0 ,

since evidently λ^{ℓ·ℓ}(F) = 0.

3. Fix z' ∈ Z; then

k^*(i_1 \ldots i_m, z)(z', o_1 \ldots o_m) = 0

holds if and only if

t_m \in C := \{ s \in \Sigma_s(Z) ;\ s(z') = 0 \} .

Since λ_Z(C) = 0, part a) implies

\mathbb{P}(k^*(i_1 \ldots i_m, z)(z', o_1 \ldots o_m) = 0) = 0 . □

Having a look back at this very technical proof, let us describe its method. The idea of this proof consisted in tracing back the origin of the measure λ_Z: the asserted property has been shown to hold in the domain of the map which transported Lebesgue measure, and via this map the property in question is seen to be carried over, too. This method of transported measures will be applied later again.

From 4.4 a conclusion can be drawn which is a first step towards the computation of the expected running time.

4.5 Corollary:

\mathbb{P}(H = m) = \mathbb{P}(\sigma_m < \tau_m) \cdot \prod_{j=1}^{m-1} \mathbb{P}(\sigma_j = \tau_j) .

Proof: For a fixed m we have

\{H = m\} = \{\sigma_m < \tau_m\} \cap \bigcap_{j=1}^{m-1} \{\sigma_j = \tau_j\} ,

hence the independence of the events at the right of the equation above has to be demonstrated. Let

\Omega_0 := \{ \exists m \in \mathbb{N}\ \exists z' \in Z :\ t_m(\cdot)(z') = 0 \} ;

then 4.4 b) implies that it is no loss of generality to assume that t_m is strictly positive on all of Ω, since ℙ(Ω_0) = 0. Adopting this assumption, and defining

k(x,z')(Z \cdot y) := \sum_{z'' \in Z} k(x,z')(z'',y) ,

we see that

\{\sigma_m = \tau_m\} = \bigcup_{y \in Y} \bigcap_{z' \in Z} \{ k(i_m,z')(Z \cdot y) = k(i_m,z')(Z \cdot o_m) \} .

This is implied by

s_m(\omega)(y) - \sigma_m(\omega) = \sum_{z' \in Z} t_{m-1}(\omega)(z') \cdot [\, k(i_m(\omega),z')(\omega)(Z \cdot y) - k(i_m(\omega),z')(\omega)(Z \cdot o_m(\omega)) \,] .

Since {j_r; 1 ≤ r ≤ m} is independent, it is seen that the asserted equality holds. □

Because of this Corollary, ℙ(σ_i = τ_i) has to be computed. The idea in doing so is to show that y ↦ s_m(ω)(y) has almost surely exactly one peak point, and that the probability that y is such a peak does not depend on y. This will of course imply that

\mathbb{P}(\sigma_m = \tau_m) = \frac{1}{n}

holds.

4.6 Lemma:
The following properties hold for s_m for any m ∈ N:

a) s_m(ℙ) is symmetric,

b) ℙ(s_m(·)(y) = s_m(·)(y')) = 0, provided y ≠ y'.

Proof: We will again transport measures and try to preserve the properties of interest during transportation.

1. ω ↦ k(i_{m+1}(ω),·)(ω) ∈ (Prob(Z × Y))^Z is distributed according to ⊗_{z∈Z} μ_1. Hence, let the map φ be defined as follows:

\varphi : \Sigma_s(Z) \times (Prob(Z \times Y))^Z \to \Sigma_s(Y) ,\qquad \varphi(s,s')(y) := \sum_{z' \in Z} s(z') \cdot s'(z')(Z \cdot y) ;

then s_{m+1}(ℙ) can be written as

s_{m+1}(\mathbb{P}) = \varphi(t_m(\mathbb{P}) \otimes \bigotimes_{z \in Z} \mu_1) .

Since μ_1 = K, and K is symmetric, the symmetry of s_{m+1}(ℙ) is deduced easily. Note that in case m = 1 symmetry of s_1(ℙ) can be deduced directly from the symmetry of K.

2. In order to prove part b), put for y ≠ y'

B_{y,y'} := \{ s \in \Sigma_s(Y) ;\ s(y) = s(y') \} ;

then s_m(ℙ)(B_{y,y'}) = 0 is to be proved. Consider the cases m = 1 and m > 1.

(i) m = 1: It is sufficient to show that

\lambda_0(\{ s \in Prob(Z \times Y) ;\ \sum_{z \in Z} s(z,y) = \sum_{z \in Z} s(z,y') \}) = 0 ,

and because of part a) and the construction of λ_0 it will be enough to show that

\lambda^{n \cdot \ell - 1}(A) = 0 ,\qquad A := \{ (a_1, \ldots, a_{n \cdot \ell - 1}) \in \Sigma_s^{n \cdot \ell - 1} ;\ \sum_{r=1}^{\ell} a_r = \sum_{r=1}^{\ell} a_{\ell + r} \} .

This is not difficult to prove by means of the Change of Variable Formula.

(ii) In case m > 1 we have

s_m(\mathbb{P}) \ll \varphi(\lambda_Z \otimes \bigotimes_{z \in Z} \mu_1)

because of part a) in 4.4 and part b) in 4.3. Thus it is enough to show that

\varphi(\lambda_Z \otimes \bigotimes_{z \in Z} \mu_1)(B_{y,y'}) = 0

holds, and the construction of λ_Z and λ_0 now yields that it is enough to show that

(\lambda^{\ell} \otimes \lambda^{(n \cdot \ell - 1) \cdot \ell})(A) = 0 ,

where (a_0, ..., a_ℓ) ∈ A if and only if the following defining conditions hold:

a_0 \in \Sigma_s^{\ell} ,\qquad a_i \in \Sigma_s^{n \cdot \ell - 1}\ (1 \le i \le \ell) ,

\sum_{i=1}^{\ell} \sum_{j=1}^{\ell} a_{0,i} \, a_{i,j} = \sum_{i=1}^{\ell} \sum_{j=1}^{\ell} a_{0,i} \, a_{i,\ell+j} .

The Theorem of Fubini then reduces this to showing that λ^{(n·ℓ-1)·ℓ}(A(a_0)) = 0 for λ^ℓ-almost all a_0, where A(a_0) denotes the section of A at a_0. Now let

a^*_i := a_{0,\ (i-1) \bmod \ell + 1} \qquad (1 \le i \le \ell^2) ;

then (since a_{0,i} ≠ 0 holds for any i except on a set which is negligible with respect to λ^ℓ) we get the following inequality:

\lambda^{(n \cdot \ell - 1) \cdot \ell}(A(a_0)) \le \lambda^{(n \cdot \ell - 1) \cdot \ell}(\{ (b', b'', b^*) ;\ b', b'' \in [0,1]^{\ell \cdot \ell},\ b^* \in [0,1]^{(n \cdot \ell - 1 - 2\ell) \cdot \ell},\ \langle a^*, b' \rangle = \langle a^*, b'' \rangle \}) = 0 ,

and this implies s_m(ℙ)(B_{y,y'}) = 0. □

Now all preparations are done, and we are in a position which allows us to compute the required expectations.

4.7 Corollary:
If the words which serve as an input to the Algorithm are of length t, the expectation and the variance of the running time H_t amount to

\mathbb{E}(H_t) = \frac{n}{n-1} - \frac{n + t(n-1)}{n^t (n-1)} + \frac{t}{n^t}

and

\mathrm{var}(H_t) = \frac{n}{(n-1)^2} + O\!\left(\frac{t^2}{n^t}\right) .

Proof: Let

B_y := \{ s \in \Sigma_s(Y) ;\ s(y) = \max_{\bar y \in Y} s(\bar y) \}

be the set of all substochastic vectors on Y that have a peak in y. Then {B_y; y ∈ Y} constitutes a cover of Σ_s(Y), and

s_m(\mathbb{P})(B_y \cap B_{y'}) \le s_m(\mathbb{P})(B_{y,y'}) = 0

by 4.6, provided y ≠ y'. By the symmetry of s_m(ℙ), s_m(ℙ)(B_y) = s_m(ℙ)(B_{y'}), thus

s_m(\mathbb{P})(B_y) = \frac{1}{n} .

Consequently

\mathbb{P}(\sigma_m = \tau_m) = \sum_{y \in Y} \mathbb{P}(o_m = y) \, \mathbb{P}(s_m \in B_y) = \frac{1}{n} ,

and hence

\mathbb{P}(\sigma_m < \tau_m) = \frac{n-1}{n} .

By 4.5 we have

\mathbb{P}(H = m) = \frac{n-1}{n^m} \qquad \text{for } 1 \le m \le t-1 ,

and since

\{H_t = t\} = \{H = t\} \cup \bigcap_{i=1}^{t} \{\sigma_i = \tau_i\} ,

we have

\mathbb{P}(H_t = t) = \frac{n-1}{n^t} + \frac{1}{n^t} .

Hence

\mathbb{E}(H_t) = \sum_{m=1}^{t} m \, \mathbb{P}(H_t = m) = \frac{t}{n^t} + \frac{n}{n-1} - \frac{n + t(n-1)}{n^t (n-1)} .

Similarly,

\mathrm{var}(H_t) = \mathbb{E}(H_t^2) - (\mathbb{E}(H_t))^2 = \frac{n}{(n-1)^2} + O\!\left(\frac{t^2}{n^t}\right)

is obtained. □
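The formula can be checked empirically (an illustration under the model assumptions of this section; the uniform choices below are one admissible instance of a symmetric K << λ_0 and identically distributed j_m, and the code names are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_Ht(n_out, n_states, t):
    """One realization of H_t: each K(x,z) is drawn uniformly from
    Prob(Z x Y), the output word is i.i.d. uniform; returns the
    number of executions of step 4 of Algorithm 4.1."""
    # A single input letter suffices: the distribution of k(x,z) does
    # not depend on x, so the input word does not influence H_t.
    flat = rng.dirichlet(np.ones(n_states * n_out), size=n_states)
    A = flat.reshape(n_states, n_states, n_out).transpose(2, 0, 1)
    o = rng.integers(n_out, size=t)            # uniform output word
    B = [A[y].copy() for y in range(n_out)]
    for m in range(t):
        w = [B[y][0, :].sum() for y in range(n_out)]
        if w[o[m]] < max(w):                   # sigma_m < tau_m: stop
            return m + 1
        if m + 1 < t:
            B = [B[o[m]] @ A[y] for y in range(n_out)]
    return t

n, l, t = 3, 2, 8
H = [sample_Ht(n, l, t) for _ in range(20000)]
expected = n/(n-1) - (n + t*(n-1))/(n**t * (n-1)) + t/n**t
print(np.mean(H), expected)    # both close to n/(n-1) = 1.5
```

Note that most runs terminate after a single iteration (ℙ(H = 1) = (n-1)/n), which is what makes the expected number of matrix multiplications, and hence E(A) in the following Theorem, independent of n.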

This result permits us to calculate the expected number of arithmetic operations which are necessary in order to execute the algorithm in question for words of length t. Rather than doing so, it is assumed that the words are arbitrarily long, and the corresponding quantity is calculated for this case. Denote by A the number of arithmetic operations in this case; then A: Ω → R is a random variable, and we have

4.8 Theorem:
For the expectation and the standard deviation of A the equations

\mathbb{E}(A) = O(\ell^{\log_2 7}) ,\qquad \sigma(A) = O(\sqrt{n} \cdot \ell^{\log_2 7})

hold.

Proof: 1. The running time H of the algorithm for arbi-
