
Working Paper

Structural Minimization of Risk in Estimation of Heterogeneity Distributions

Anatoli Michalski, Anatoli Yashin

December 1986, WP-86-76

International Institute for Applied Systems Analysis
A-2361 Laxenburg, Austria


NOT FOR QUOTATION WITHOUT THE PERMISSION OF THE AUTHORS


Working Papers are interim reports on work of the International Institute for Applied Systems Analysis and have received only limited review. Views or opinions expressed herein do not necessarily represent those of the Institute or of its National Member Organizations.


Foreword

Population heterogeneity dynamics is one of the research directions in IIASA's Population Program. One typical and practical problem related to hidden heterogeneity is the estimation of the heterogeneity distribution.

This paper describes an approach to such an estimation which is based on the method of structural minimization of mean risk. It is shown how this method can be applied to some real data. The main ideas of the method are also described.

Anatoli Yashin
Deputy Leader
Population Program

Contents

1. Introduction

2. Estimation of Hidden Heterogeneity as an Ill-Posed Problem

3. Estimation of Hidden Heterogeneity

4. Experiments with Real Data

Appendix: Structural Minimization of Mean Risk in Small Sample Cases

References

Structural Minimization of Risk in Estimation of Heterogeneity Distributions

Anatoli Michalski*, Anatoli Yashin**

1. Introduction

Assume that there are two random variables $z$ and $T$, and that both the marginal distribution of $T$ and the conditional distribution of $T$ given $z$ are known. What can one say about the distribution density of $z$?

A version of this problem is known in econometrics and demography: $T$ is interpreted as a random duration or death time, and $z$ is the latent (heterogeneity) variable which characterizes individual differences in susceptibility to transitions or death [1,2,3].

Denote by $f(z)$ and $U(t)$ the probability density functions of the random variables $z$ and $T$ respectively, and by $k(t \mid z)$ the conditional density function of $T$ given $z$. We assume that all these densities exist.

It is easy to see that the functions $U(t)$, $k(t \mid z)$ and $f(z)$ are related as follows:

$$U(t) = \int k(t \mid z)\, f(z)\, dz. \qquad (1)$$

Formula (1) is a Fredholm integral equation of the first kind with respect to the function $f(z)$, with kernel function $k(t \mid z)$. To find $f(z)$ when $U(t)$ and $k(t \mid z)$ are given means to solve the integral equation (1) with respect to $f(z)$. It turns out that the solution of this equation is unstable: small disturbances in the kernel function can produce big changes in $f(z)$. Moreover, if in addition the kernel function is also unknown, then equation (1) can have a non-unique solution.
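The instability can be seen numerically. The sketch below is not taken from the paper: it assumes a constant unit hazard, so that $k(t \mid z) = z e^{-zt}$, discretizes equation (1) on hypothetical grids for $z$ and $t$, and computes the condition number of the resulting matrix together with the error of a naive least-squares inversion after a tiny perturbation of $U(t)$.

```python
import numpy as np

def kernel_matrix(t, z, dz):
    """Discretized kernel k(t|z) = z*exp(-z*t) (assumed unit hazard), times dz."""
    return z[None, :] * np.exp(-np.outer(t, z)) * dz

n = 20
z = (np.arange(n) + 0.5) / n            # midpoints of the interval [0, 1]
t = np.linspace(0.1, 10.0, n)           # hypothetical observation times
A = kernel_matrix(t, z, 1.0 / n)

f_true = np.ones(n)                     # uniform density on [0, 1]
u = A @ f_true                          # discretized right-hand side U(t)

rng = np.random.default_rng(0)
u_noisy = u * (1.0 + 1e-6 * rng.standard_normal(n))    # tiny relative noise

f_rec = np.linalg.lstsq(A, u_noisy, rcond=None)[0]     # naive inversion
cond = np.linalg.cond(A)                               # 2-norm condition number
err = float(np.max(np.abs(f_rec - f_true)))
print(f"condition number: {cond:.3e}, max error in f: {err:.3e}")
```

A very large condition number means that perturbations of $U(t)$ far below realistic sampling noise can be enough to change the recovered $f(z)$ substantially, which is why some form of regularization or complexity control is needed.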

The last property has important consequences for applications. It means, for instance, that one should use the maximum ancillary information to specify the kernel function $k(t \mid z)$ as precisely as possible before the data processing.

*Anatoli Michalski, Institute of Control Sciences, Profsojusnaja 65, Moscow, USSR

**Anatoli Yashin, Population Program, IIASA, A-2361 Laxenburg, Austria

Another important remark is that in applications one usually does not have precise knowledge of the distribution density $U(t)$. The typical information which comes out of, say, clinical studies is the observed death times for a sample of $n$ individuals. It is clear that such circumstances can only complicate the estimation problem for $f(z)$.

Recently many publications have been devoted to the problems of modeling and estimation of heterogeneity in population analysis using other approaches. Shepard and Zeckhauser [4] showed that heterogeneity could be responsible for overestimates of the results of medical improvements. Keyfitz and Littman [5] demonstrated that ignoring heterogeneity leads to incorrect calculations of life expectancy. Vaupel and Yashin [2,3] described many paradoxes and puzzles which can be explained using the heterogeneity concept. Heckman and Singer [1] considered the identification problem in econometric models for duration data in both the parametric and nonparametric cases. They found in particular that the estimates of the model for duration data are sensitive to the assumptions about heterogeneity models. Manton et al. [6] came to a similar conclusion.

One idea which is discussed in our paper deals with the nature of such sensitivity. It turns out that very often identification in the presence of hidden heterogeneity is an ill-posed problem, related to the solution of equation (1).

Some properties of this equation which are relevant to our study are discussed in chapter 2. In chapters 3 and 4 we describe the approach to the solution of equation (1) given the information about $n$ death times. Chapter 3 focuses on the analysis of artificial data which were generated by models of heterogeneous mortality. Chapter 4 demonstrates the results of applying the developed approach to real data. In both chapters the data processing algorithms were based on the so-called structural minimization of mean risk approach. The main ideas and results of this approach are given in the Appendix.

2. Estimation of Hidden Heterogeneity as an Ill-Posed Problem

Three major mathematical problems are related to equation (1). The first is when the solution of this equation exists. The second is whether the solution is unique. The third is how sensitive the solution is to disturbances of the function $U(t)$.

In this paper we will not analyze the first problem, referring those who are interested in a deeper understanding of the existence conditions to publication [7]. The non-uniqueness problem will be demonstrated in a particular case. Since the sensitivity problem is very important for data analysis, we will focus our main attention in this paper on this problem.

Let us consider an example of an ill-posed problem which can arise in demographic applications. Assume that the conditional density (kernel function) can be represented in the form

$$k(t \mid z) = z\,\lambda(t)\,\exp\Bigl(-z \int_0^t \lambda(s)\, ds\Bigr). \qquad (2)$$

It is well known that if the kernel function is smooth, then slight variations of $U(t)$ can produce big changes in $f(z)$ [8]. One can see that if $\lambda(t)$ in (2) is smooth, then one can expect instability in the solution of equation (1).

The conditional density function $k(t \mid z)$ given by (2) corresponds to the well-known proportional hazard model of mortality, where $z$ is a heterogeneity variable and $\lambda(t)$ is the underlying hazard. Assume that $\lambda(t) = a \lambda_0(t)$, where $a$ is some scale parameter. Let us show that for different values of $a$ one can find different solutions of the integral equation (1).

Equation (1) now becomes

$$U(t) = \int a z\,\lambda_0(t)\,\exp\Bigl(-a z \int_0^t \lambda_0(s)\, ds\Bigr) f(z)\, dz, \qquad (3)$$

where $U(\cdot)$ is a density function for observed survival times.

Denote by $f_1(z)$ the solution of (3) for the case $a = 1$. For any other value of $a$ one can write

$$U(t) = \int u\,\lambda_0(t)\,\exp\Bigl(-u \int_0^t \lambda_0(s)\, ds\Bigr) \frac{1}{a} f\Bigl(\frac{u}{a}\Bigr) du,$$

where $u = az$. Since for any fixed $a$ equation (3) has a unique solution, one may write

$$f_1(u) = \frac{1}{a} f\Bigl(\frac{u}{a}\Bigr).$$

The relation between the solution of (3) and the function $f_1(\cdot)$ follows from the next expression:

$$f(z) = a\, f_1(a z).$$

The last statement shows that using different values for the parameter $a$ we obtain different shapes for the density of the hidden heterogeneity variable, i.e., the solution of (3) is not unique when $a$ is unknown.
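The scaling argument can be checked numerically. The following sketch is an illustration only: it assumes the baseline hazard $\lambda_0(t) = 1$ and takes the standard exponential density as a hypothetical solution $f_1$ for $a = 1$, then verifies that the rescaled density $a f_1(az)$ with $a = 2$ reproduces the same observed density $U(t)$.

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule on the grid x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def observed_density(t, a, f, z):
    """U(t) = integral of a*z*exp(-a*z*t)*f(z) dz, assuming lambda_0(t) = 1."""
    return trapezoid(a * z * np.exp(-a * z * t) * f(z), z)

z = np.linspace(0.0, 60.0, 200_001)        # integration grid, tail is negligible

def f1(u):
    """Hypothetical heterogeneity density for a = 1 (standard exponential)."""
    return np.exp(-u)

a = 2.0

def f_a(x):
    """Rescaled density a*f1(a*x), the solution for scale parameter a."""
    return a * f1(a * x)

times = [0.0, 0.5, 1.0, 3.0]
u1 = [observed_density(t, 1.0, f1, z) for t in times]
u2 = [observed_density(t, a, f_a, z) for t in times]
print(u1)
print(u2)
```

For this choice the integral has the closed form $U(t) = 1/(1+t)^2$ for both pairs $(a, f)$, so the observed survival times alone cannot distinguish between them.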

3. Estimation of Hidden Heterogeneity

In this chapter a new approach to the solution of equation (3) is considered. The approach takes into account the instability of the solution of equation (3) and the lack of information about the distribution density $U(t)$. The method is based on the structural minimization of mean risk. The ideas of this approach are outlined in the Appendix. To implement these ideas consider the family of functions $\{Q(x)\}$ where

and $f(z)$ is some distribution density function of $z$, with $\lambda(t) > 0$. Let us take the mean risk functional in the form

The general theory of structural minimization of mean risk considers mean risk functionals with a nonnegative loss function $Q(x)$. In our case it is not so. However, assuming that the distribution of $z$ is concentrated on a finite interval, one can always add some positive constant to all functions from this family and make them positive without changing the optimal point of the functional.

The functional $G$ with such $Q(x)$ is a particular case of the so-called mixed entropy functional. It takes its minimal value on the solution of equation (1). The empirical risk functional will be as follows

which coincides with the minus likelihood functional.
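The displayed formulas (4) and (5) are not legible in this passage. One reading that is consistent with the surrounding statements (the functional attains its minimum on the solution of (1), and its empirical version is the negative likelihood) is sketched below. It is an assumption, not the authors' verified notation: the logarithm, the $1/L$ normalization and the symbols $x_1, \ldots, x_L$ for the observed survival times are all supplied here for illustration.

```latex
Q(x) = -\ln \int z\,\lambda(x)\,\exp\Bigl(-z\int_0^x \lambda(s)\,ds\Bigr) f(z)\,dz ,
\qquad
G = \int Q(x)\,U(x)\,dx ,
\qquad
G_L = -\frac{1}{L}\sum_{j=1}^{L} \ln \int k(x_j \mid z)\, f(z)\,dz .
```

Under this reading, the difference between $G$ evaluated at a candidate $f$ and at the true density equals the Kullback-Leibler divergence between $U$ and the mixture density induced by the candidate, which is nonnegative and vanishes when the candidate solves (1).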

As a first example let us consider the families of functions $\{Q_i\}$ in the form of (4), where the functions $f_i(z)$ are supposed to be histograms

$$f_i(z) = \sum_{k=1}^{i} a_{k,i}\, H_{k,i}(z), \qquad (6)$$

where $a_{k,i} \ge 0$, $\sum_{k=1}^{i} a_{k,i} = 1$, and $H_{k,i}(z)$ are the step functions equal to $1/(z_{k+1,i} - z_{k,i})$ when $z_{k,i} \le z < z_{k+1,i}$ and equal to 0 otherwise. Here $z_{k,i}$, $k = 1, 2, \ldots, i$, are fixed points with $z_{1,i} = 0$ and $z_{i+1,i} = 1$, $a_{k,i}$ are the parameters of the histogram, and $i$ is the number of parameters.

We used the values $z_{k,i} = (k-1)/i$ for creating the histogram. One can use any other set of $z_{k,i}$ if there is information on subintervals inside $[0,1]$ where the density function $f(z)$ changes fast. If there is no such preliminary information, then one should use equidistant points $z_{k,i}$.

The histogram approximation of densities is widely used in statistical practice. It presupposes that the possible values of $z$ lie in a finite interval. The number of intervals of the histogram will be determined during the structural minimization of risk procedure. We assume that the distributions $f(z)$ are all defined on the interval $[0,1]$. This interval can be changed if one has preliminary information on where the distribution of $z$ is concentrated.

It is important to emphasize that we do not assume the real distribution of the heterogeneity parameter to be of the form (6). Expression (6) gives only an approximation of the real distribution, and to implement the structural risk minimization method we do not need to know the precise form of this distribution.

Now it is easy to construct functional families $\{Q_i\}$ by changing the number of parameters $i$ in (6). So family $\{Q_1\}$ will be given by functions

family $\{Q_2\}$ will be given by expression (8)

and so on. We will use the uniform grid $z_{1,i}, z_{2,i}, \ldots, z_{i,i}$ for which $z_{k+1,i} - z_{k,i} = 1/i$. If one has more information on the heterogeneity distribution, one can use other special grids with different knots. The only important thing is that the grid is to be fixed before one starts to implement the structural risk minimization method, because the inequality (A5) in the Appendix is valid only in this case. If one tries to fit the grid to the experimental data, then one can get a wrong result.

Substituting (6) into (5), one can see that in every family $\{Q_i\}$ one is to minimize the functional

where

where $z_{k,i}$ are the knots in the grid for (6).

Following the structural minimization of mean risk approach one should minimize the functional of empirical risk (5), then compare the values of the functionals

for different $i$ and choose the minimal value of $B_i$. Here $f_i^*(z)$ denotes the histogram constructed by minimizing functional (5) in the family of histograms with $i$ parameters.
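The inner minimization over the histogram parameters can be sketched in code. Everything below is illustrative and rests on stated assumptions: a constant unit hazard, so that $k(x \mid z) = z e^{-zx}$; the negative mean log-likelihood as the empirical risk; and the EM iteration for mixture weights as the optimizer, which is a substitute chosen here because the paper does not say how the minimum was computed. The function names and the synthetic sample are hypothetical.

```python
import numpy as np

def bin_kernels(x, i, total=400):
    """c[j, k]: average of z*exp(-z*x_j) over the k-th of i equal bins of [0, 1].

    A fixed number of quadrature midpoints (total) is shared by all i that
    divide it, so that the histogram families stay exactly nested.
    """
    m = total // i
    zs = (np.arange(i * m) + 0.5) / (i * m)
    dens = zs[None, :] * np.exp(-np.outer(x, zs))       # shape (L, i*m)
    return dens.reshape(len(x), i, m).mean(axis=2)      # shape (L, i)

def fit_histogram(x, i, iters=500):
    """EM iterations for the bin probabilities; returns (a, empirical risk)."""
    c = bin_kernels(x, i)
    a = np.full(i, 1.0 / i)                             # start from the uniform histogram
    for _ in range(iters):
        mix = c @ a                                     # mixture density at each x_j
        a = a * (c / mix[:, None]).mean(axis=0)         # update preserves sum(a) == 1
    return a, float(-np.mean(np.log(c @ a)))

rng = np.random.default_rng(1)
z_true = rng.uniform(0.2, 1.0, size=400)                # hypothetical hidden heterogeneity
x = rng.exponential(1.0 / z_true)                       # survival times given z
risks = {i: fit_histogram(x, i)[1] for i in (1, 2, 4, 8)}
print(risks)
```

The minimum empirical risk can only decrease as the number of bins grows within a nested system, so it cannot be used alone to choose $i$; the choice has to be made by comparing the bounds $B_i$ described in the Appendix, which this sketch does not compute.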

As a second example let us consider the situation when preliminary information is available on the heterogeneity distribution. Assume that the heterogeneity variable $z$ can take a finite number of known values. One needs to estimate the respective probabilities by observing a sample of survival times $x_1, x_2, \ldots, x_L$. This approach corresponds to the case when the population under investigation consists of a finite number of homogeneous subgroups and we know the values of the heterogeneity variable for each of these subgroups. This situation is simpler than the one above, but it is relevant for many practical situations. In real life we can have information about survival in, say, genetic subgroups and we may be interested in the proportions of these subgroups in the total population.

To use our method for this case we rewrite expression (4) in the form

where $P_k = P(z = z_k)$. As a matter of fact, we now estimate not a function but some numbers, and instead of the functional family $\{Q_i\}$ one can use just an $i$-dimensional vector space, where $i$ is the number of fixed groups minus 1, because the $P_k$ must sum to 1.

Now one can check different hypotheses about subgroups in the total population. When we consider different numbers of groups we have different families, and minimizing the expression

over the proportions $P$ and the number of groups $i$, we will find the most suitable number of subgroups and the proportions for them.

To demonstrate the power of the method, we performed calculations with samples generated from known probability distributions. We considered the continuous distribution of the heterogeneity variable with density function

$$f(z) = \frac{1}{\beta z}, \qquad e^{-\beta} \le z \le 1,$$

where $\beta$ is some known parameter. This density function corresponds to the case when the heterogeneity variable can be expressed in the form used in Cox's model [9]

$$z = e^{-\beta U},$$

and $U$ is a random variable with uniform distribution on the interval $[0,1]$. For both examples numerical calculations were carried out.
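A sample of the heterogeneity variable of this kind is easy to generate. The sketch below is illustrative and uses hypothetical names: it draws $z = e^{-\beta U}$, and compares the observed frequencies in four equal subintervals of $[0,1]$ with the probabilities implied by the change of variables, $P(lo \le z < hi) = (\ln hi - \ln lo)/\beta$ for $e^{-\beta} \le lo < hi \le 1$. The value $\beta = 2$ is an arbitrary choice, not one reported in the paper.

```python
import numpy as np

def sample_heterogeneity(beta, size, rng):
    """Draw z = exp(-beta * U) with U uniform on [0, 1]."""
    return np.exp(-beta * rng.uniform(0.0, 1.0, size))

def interval_probability(lo, hi, beta):
    """P(lo <= z < hi) for z = exp(-beta * U), clipped to [exp(-beta), 1]."""
    lo = max(lo, float(np.exp(-beta)))
    hi = min(hi, 1.0)
    if hi <= lo:
        return 0.0
    return float((np.log(hi) - np.log(lo)) / beta)

beta = 2.0                                   # arbitrary illustrative value
rng = np.random.default_rng(0)
z = sample_heterogeneity(beta, 100_000, rng)

edges = np.linspace(0.0, 1.0, 5)             # four equal subintervals of [0, 1]
empirical = np.histogram(z, bins=edges)[0] / z.size
exact = np.array([interval_probability(lo, hi, beta)
                  for lo, hi in zip(edges[:-1], edges[1:])])
print(empirical)
print(exact)
```

The subinterval probabilities computed this way play the role of the reference values against which histogram estimates are compared.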

In the first case we estimated the continuous density $f(z)$ by a histogram. The number of parameters in the histogram was determined on a given sample by the method described above. A typical estimate of the continuous distribution $f(z)$ is shown in Chart 1. In Table 1 we put the value of the parameter $\beta$, the sample size $L$, the determined number of parameters in the histogram $i$, the probability $P$ of every subinterval in correspondence with $f(z)$, and the estimated probability $\hat{P}$ of every subinterval in correspondence with the histogram.

Table 1. Columns: $\beta$, $L$, $i$, $P$, $\hat{P}$.

Table 2. Columns: $N$, $L$, $P$, $\hat{P}$.

From Table 1 one can see that the larger the sample size, the better the estimation, but even in the case of a small sample one still has a good estimate.

Chart 1.

In the case of a mixed distribution, when the heterogeneity variable may take only fixed values, we estimated the probabilities of these values, or the proportions between different states of the heterogeneity variable. In Table 2 we put the number of subgroups in the population $N$, the sample size $L$, the real proportions $P$, and the estimates $\hat{P}$.

Here again one can see that the larger the sample size, the better the estimation, but in the small sample case the estimate is good as well.

4. Experiments with Real Data

In this chapter we present the results obtained by treatment of real data. The data file was extracted from the Umea Data Base with the kind help of Gun Stenflo (Umea University, Sweden). The file included records of survival time for children born in one parish to mothers not older than 26 years in 1818-1895. That file was separated into two subfiles in accordance with the parents' occupation. The first subfile included records for children of farmers, workers, rural proletarians and cases with no occupational reference. The second subfile included the rest, and in fact it consisted of records with unknown occupation. We had 196 records in the first subfile and 579 in the second one. It was found that the survival of children in these two files is different. For children of farmers, workers, rural proletarians and cases with no occupational reference the mean value of survival time was 1180 days. 80% of this group survived more than 200 days, 50% survived more than 540 days and 20% survived more than 2000 days. For children of parents with unknown occupation the mean value of survival time was 427 days. 80% of this group survived more than 90 days, 50% survived more than 200 days and 20% survived more than 500 days. Histograms of survival time based on these two files are presented in Charts 2 and 3.

Chart 2. Survivalship Proportions for Children of Physical Workers (in percents)

It is worth mentioning that the percentage of dead children in the first subgroup is three times less than in the second one: 18.5% for the first subgroup and 53.0% for the second one. Such a situation could happen, for instance, if the subgroup with unknown occupation had more cases of bad feeding of the children, so that only "strong chaps" survive.

To demonstrate the use of the method we put the records from the two subgroups back together. Information about survival in those two subgroups, which we obtained in the preliminary investigation, was used as a priori information. We set the hypothesis that the general sample consists of two homogeneous sets. The value of the hazard rate for the first set was assumed to be equal to the estimate of the hazard rate calculated from the survival times in the records for children of physical workers. For the second set we put the hazard rate equal to its estimate calculated from the survival times in the records with unknown occupation of parents. The values were 0.000847 and 0.00234 for the first and the second sets, respectively. For estimation of the hazard rates we used the maximum likelihood estimate in the form
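The displayed estimator is not legible here, but the two reported values agree with the reciprocals of the mean survival times quoted earlier (1180 and 427 days). That is what the maximum likelihood estimate of a constant hazard reduces to when every record is treated as an observed death, which is an assumption of the small check below rather than a statement from the paper.

```python
def constant_hazard_mle(total_time, n_events):
    """MLE of a constant hazard: number of events divided by total observed time."""
    return n_events / total_time

# Total time is reconstructed as (mean survival time) * (number of records),
# and every record is assumed to end in an event.
rate_first = constant_hazard_mle(1180.0 * 196, 196)     # first subfile
rate_second = constant_hazard_mle(427.0 * 579, 579)     # second subfile
print(rate_first, rate_second)
```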

Then we applied our method to estimate the proportion between the two mentioned sets in the general sample. By calculations on an IBM PC we estimated the proportion between the first and second sets as 5/13. In our data file the ratio of records with occupation greater than four to records with occupation zero was 5/14.

So the estimate is rather close to the original value. It means that the method can be successfully used for estimation of hidden heterogeneity.
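The two-group estimation can be imitated on synthetic data. The sketch below is not the authors' program: it assumes exponential survival times with the two hazard rates reported above, generates a hypothetical pooled sample with a 5:14 ratio between the groups, and finds the share of the first group by a plain grid search over the negative mean log-likelihood.

```python
import numpy as np

def neg_log_likelihood(p, x, rate1, rate2):
    """Empirical risk of a two-component exponential mixture with known hazards."""
    dens = p * rate1 * np.exp(-rate1 * x) + (1.0 - p) * rate2 * np.exp(-rate2 * x)
    return float(-np.mean(np.log(dens)))

def estimate_share(x, rate1, rate2, grid=2001):
    """Grid search for the share p of the first group minimizing the empirical risk."""
    ps = np.linspace(0.0, 1.0, grid)
    risks = [neg_log_likelihood(p, x, rate1, rate2) for p in ps]
    return float(ps[int(np.argmin(risks))])

rate1, rate2 = 0.000847, 0.00234            # hazard rates reported in the text
rng = np.random.default_rng(0)
n1, n2 = 5000, 14000                        # synthetic sample sizes, ratio 5:14
x = np.concatenate([rng.exponential(1.0 / rate1, n1),
                    rng.exponential(1.0 / rate2, n2)])

p_hat = estimate_share(x, rate1, rate2)     # estimated share of the first group
ratio_hat = p_hat / (1.0 - p_hat)           # estimated ratio first : second
print(p_hat, ratio_hat)
```

With only one free parameter a grid search is sufficient; for more groups an iterative scheme over the vector of proportions would be the natural replacement.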

Chart 3. Survivalship Proportions for Children of Parents with Unknown Occupation (in percents)

Appendix

Structural Minimization of Mean Risk in Small Sample Cases

Equation (1) can be solved using special probabilistic techniques. The approach is based on the methods of structural minimization of mean risk. A comprehensive analysis of this problem was developed by Vapnik [10]. A more detailed consideration of the problems of solving integral equations related to mean risk minimization was given by Michalski [11].

The idea of the mean risk minimization method is as follows. Let $X$ be a random variable with distribution function $F(x)$. Let $\{Q: Q(x) \ge 0\}$ denote the class of all nonnegative functions such that for each function $Q(x)$ the functional

$$G(Q) = \int Q(x)\, dF(x) \qquad (A1)$$

exists. The functional $G$ is called the mean risk functional. To minimize the mean risk means to find the function $Q^*$ from the family of functions $\{Q\}$ such that the mean risk takes its minimal value on $Q^*$. Note that if the distribution function $F(x)$ is known, the approach to minimization of the mean risk is straightforward.

In many practical problems, however, the distribution function of $X$ is unknown, but a sample of independent realizations of $X$ is often available. If the sample is large enough, the problem is equivalent to mean risk minimization with a known distribution function. If the sample is small, then one should use another approach to minimize the mean risk. Such an approach is called the structural minimization of mean risk [10].

It turns out that whether a sample is "small" or "large" depends on its size $L$ and on the properties of the functional family $\{Q\}$. This crucial property of the functional family is called the "complexity" of this family.

The main idea of the structural minimization of mean risk method is to substitute the unknown mean risk functional (A1) by the empirical risk functional $G_L$, which is completely defined by the sample $x_1, \ldots, x_L$ of the random variable $X$:

$$G_L(Q) = \frac{1}{L} \sum_{j=1}^{L} Q(x_j),$$

and to structurize the functional family $\{Q\}$ by selecting several classes $\{Q_1\}, \{Q_2\}, \ldots, \{Q_n\}$ and carrying out the minimization within each class.

The first step in this procedure seems natural, since the sample of $X$ is the only information about the unknown distribution. The next step deserves special explanation.

When minimizing the empirical risk within the class $\{Q\}$, one should be sure that its minimizing function is close enough to the function that minimizes the mean risk. The guarantee of this closeness is the uniform convergence of the empirical risk functional to the mean risk functional when the size of the sample $L$ tends to infinity.

The uniform convergence of the empirical risk means that for any fixed $\varepsilon$ the probability $P_d$

goes to zero when the size $L$ of the sample tends to infinity. It turns out that the probability $P_d$ depends on a property of the functional class $\{Q\}$. This property is represented by the notion of "complexity" of the class $\{Q\}$. The precise mathematical definition of the measure of complexity $K$ of a functional class can be found in [10]. Later we will give the measure of complexity for some particular functional classes.

If the uniform convergence exists, then the probability $P_d$ can be estimated as follows

where $K$ is the complexity index. One can see from this inequality that the smaller $K$ is, the better the approximation of the mean risk by the empirical one. It means that in the "simple" classes of functions one can find a more precise estimate of the mean risk.

To implement t h i s r e s u l t t o t h e p r o b l e m of mean r i s k minimization using t h e s a m p l e of v a l u e s of r a n d o m v a r i a b l e X , l e t u s c o n s i d e r t h e s y s t e m of f u n c t i o n a l c l a s s e s f Q l ]

c

1Q2{ C . . . lQnl with t h e i n c r e a s i n g i n d i c e s of complexity. L c t u s show how in t h i s c a s e t h e i n e q u a l i t y (A3) c a n b e u s e d . Taking i n t o a c c o u n t (A4) we h a v e

w h e r e Kt i s t h e c o m p l e x i t y i n d e x of

lQt 1.

Denoting b y q t h e r i g h t - h a n d s i d e of i n e q u a l i t y ( A 5 ) o n e c a n e a s i l y find t h e formula f o r E when q , L , a n d Kf a r e g i v e n

Using t h i s e x p r e s s i o n o n e c a n e s t i m a t e t h e mean r i s k v a l u e b y t.he e m p i r i c a l r i s k using f o r m u l a

This formula makes sense for all functions from the class $\{Q_i\}$ if the denominator on the right-hand side is positive. Note that the reached value of the mean risk in the class $\{Q_i\}$ has an upper bound $B_i$

$B_i = \min_{Q \in \{Q_i\}} G_L$ [...]

Thus for each functional class $\{Q_i\}$ and given $L$ and $q$ one can calculate three variables, $\varepsilon_i$, $G_i^*$, and $B_i$, which correspond to the relative uniform approximation error, the minimum value of the empirical risk in the class $\{Q_i\}$, and the upper bound of the reached value of the mean risk at the minimum point of the empirical risk in the class $\{Q_i\}$.

In the classes with small $K_i$ the value of $\varepsilon_i$ is small and the empirical risk gives a good approximation of the mean risk. However, the minimum value of the empirical risk $G_i^*$ can be high, and consequently the upper bound $B_i$ of the reached value of the mean risk can also be high.
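The dependence of the approximation error on the complexity index and the sample size can be illustrated numerically. The sketch below assumes a Vapnik-type bound of the form q = 12 (2L)^K / K! · exp(−ε²L/4), solved for ε. This particular form is an assumption chosen for illustration and is not necessarily the bound (A5) used in this appendix; the function name `relative_error` is hypothetical.

```python
import math

def relative_error(K: int, L: int, q: float) -> float:
    """Solve q = 12 * (2L)^K / K! * exp(-eps^2 * L / 4) for eps.

    The bound is an ASSUMED illustrative form, not the paper's (A5).
    """
    # Logarithm of the assumed growth term 12 * (2L)^K / K!
    log_growth = math.log(12.0) + K * math.log(2 * L) - math.lgamma(K + 1)
    inner = (log_growth - math.log(q)) / L
    return 2.0 * math.sqrt(max(inner, 0.0))

# The error grows with the complexity index K for a fixed sample size.
for K in (1, 2, 4, 8):
    print(K, round(relative_error(K, L=200, q=0.05), 3))
```

Under this assumed bound the error increases with K and decreases with L, which is the qualitative behaviour described above.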

As the complexity of the class $\{Q_i\}$ increases, the approximation of the mean risk by the empirical risk becomes worse: the value of $\varepsilon_{i+1}$ becomes larger, but the minimum value of the empirical risk $G_i^*$ decreases, since $\{Q_i\} \subset \{Q_{i+1}\}$. As a result, the upper bound $B_i$ also decreases. Starting from some level of complexity of the class $\{Q_i\}$, say $K_{i^*}$, the growth of the error $\varepsilon_i$ is no longer compensated by the decrease of the empirical risk, and the upper bound of the reached value of the mean risk starts to grow. This means that $\{Q_{i^*}\}$ can be chosen as the proper class, in which the minimization of the empirical risk guarantees the minimal value of the upper bound for the reached mean risk with given probability $1 - q$.
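The selection rule described above can be sketched in a few lines. The form B_i = G_i / (1 − ε_i) used here is an assumption made only for illustration (the text states that the bound has a denominator that must be positive, but its exact form is not reproduced here); `select_class` and the numerical values are hypothetical.

```python
import math
from typing import List, Tuple

def select_class(G: List[float], eps: List[float]) -> Tuple[int, List[float]]:
    """Return the index minimizing the ASSUMED bound B_i = G_i / (1 - eps_i)."""
    # The bound is treated as infinite when the denominator is not positive.
    B = [g / (1.0 - e) if e < 1.0 else math.inf for g, e in zip(G, eps)]
    i_star = min(range(len(B)), key=B.__getitem__)
    return i_star, B

# Hypothetical values: the empirical risk falls while the error grows.
G = [4.0, 2.0, 1.2, 1.0, 0.95]
eps = [0.10, 0.20, 0.35, 0.55, 0.80]
i_star, B = select_class(G, eps)
print(i_star, [round(b, 2) for b in B])
```

With these values the bound first decreases and then increases, so the selected class lies in the interior of the sequence.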

One example of the system of classes $\{Q_i\}$ is given by algebraic polynomials of different degrees:

where $a_j$ are arbitrary parameters. If a sample of the pair $(x, y)$ is given, then one can calculate the value of the empirical risk and the value of $B_i$, which we will identify with the estimate of the mean risk.

By the solution of the mean risk minimization problem using a finite sample of the pair $(x, y)$ we will understand the function $Q^*$ which gives the minimum of the empirical risk in the class $\{Q_{i^*}\}$. This solution depends on the sample size $L$, the sample values, and the confidence level $1 - q$ of the uniform approximation of the mean risk by the empirical one. In practical calculations this level is often taken as 0.95.
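For the polynomial classes, the minimum empirical risk in each class can be computed by least squares. The sketch below uses the quadratic loss $(y - \sum_j a_j x^j)^2$ and solves the normal equations directly, which is adequate only for low degrees. The data and the function names `solve` and `min_empirical_risk` are hypothetical.

```python
from typing import List

def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A a = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    a = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * a[c] for c in range(r + 1, n))
        a[r] = (M[r][n] - s) / M[r][r]
    return a

def min_empirical_risk(x: List[float], y: List[float], n_params: int) -> float:
    """Minimum mean squared residual over polynomials with n_params coefficients."""
    # Normal equations for the coefficients a_0, ..., a_{n_params - 1}.
    A = [[sum(xi ** (j + k) for xi in x) for k in range(n_params)]
         for j in range(n_params)]
    b = [sum(yi * xi ** j for xi, yi in zip(x, y)) for j in range(n_params)]
    a = solve(A, b)
    resid = [yi - sum(aj * xi ** j for j, aj in enumerate(a))
             for xi, yi in zip(x, y)]
    return sum(r * r for r in resid) / len(x)

# Hypothetical sample, roughly quadratic in x.
x = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
y = [1.1, 0.2, 0.05, 0.3, 0.9, 2.4]
risks = [min_empirical_risk(x, y, n) for n in range(1, 5)]
print([round(g, 4) for g in risks])
```

Because the classes are nested, the minimum empirical risk cannot increase as the number of coefficients grows; the choice of class therefore has to rest on the upper bound rather than on the empirical risk alone.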

The typical situation is represented in Figure 1.

An important property of structural mean risk minimization is that it does not require the minimizing function to belong to the functional family $\{Q\}$. The method makes it possible to obtain the best guaranteed approximation based on the finite size of the experimental sample and the set of classes $\{Q_1\}, \{Q_2\}, \dots$. Moreover, it turns out that in the case of finite samples one should sometimes exclude the minimum point from the functional class [10].

Figure 1. [Plot of the $\varepsilon_i$ values, $G_i^*$ values, and $B_i$ values.]

Let us explain the notion of the complexity index $K$ for a functional family $\{Q\}$. Assume that one has a sample $T = \{X_1, \dots, X_L\}$ of the random variable $X$. For any given number $C > 0$ and function $Q(x)$ one can divide the sample $T$ into two subsamples $T'$ and $T''$ using the rule: the number $X_j$ belongs to subsample $T'$ if $Q(X_j) > C$ and to subsample $T''$ if $Q(X_j) \le C$. Changing the number $C$ and taking all possible functions $Q(x)$ from $\{Q\}$ one gets different subsamples. The maximal number of different divisions over all possible samples of size $L$ is called the complexity function of the class $\{Q\}$ on samples of size $L$. This function depends on the sample size and the functional family. We will use the notation $m^Q(L)$ for this function. It is clear that $m^Q(L) \le 2^L$. It turns out that the complexity function either equals $2^L$ or, starting from some number $K$, satisfies the inequality


where $K$ is the critical sample size. The variable $K$ depends only on the properties of the functional family $\{Q\}$ and is called its complexity index.
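The complexity function can be explored by brute force on small samples. The sketch below counts the distinct divisions of a sample induced by the family $Q_a(x) = (x - a)^2$, for which the set where $Q \le C$ is an interval centred at $a$. The family, the integer-valued samples, and the function name `divisions` are assumptions chosen for illustration; the enumeration of candidates relies on the sample points being distinct integers.

```python
from typing import List, Set, Tuple

def divisions(sample: List[int]) -> Set[Tuple[bool, ...]]:
    """Distinct splits (Q(x) > C for each point) for Q_a(x) = (x - a)^2, C > 0."""
    pts = sorted(sample)
    found: Set[Tuple[bool, ...]] = set()
    # Candidate (a, C) pairs: intervals covering pts[i..j], widened by 0.25
    # so that no sample point lies exactly on the boundary Q = C.
    candidates = [((pts[i] + pts[j]) / 2, ((pts[j] - pts[i]) / 2 + 0.25) ** 2)
                  for i in range(len(pts)) for j in range(i, len(pts))]
    # One more candidate whose interval contains no sample point at all.
    candidates.append((pts[0] - 10.0, 1.0))
    for a, C in candidates:
        found.add(tuple((x - a) ** 2 > C for x in sample))
    return found

# Compare the number of divisions with the trivial bound 2^L.
for L in range(1, 6):
    sample = list(range(L))
    print(L, len(divisions(sample)), 2 ** L)
```

For this family the count equals $2^L$ for samples of one or two points and falls below $2^L$ from three points on, which illustrates the second of the two alternatives stated above.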

The value of $K$ can in some cases be easily calculated. If, for instance,

$Q(x, y) = \left( y - \sum_{j=0}^{N-1} a_j x^j \right)^2$

then $K = N$. Another example corresponds to the case when the function $Q(x)$ has not more than $N$ extrema and $x$ is scalar. In this case $K = N + 1$ [10].

Note that everywhere in this chapter the explanation of mean risk optimization was conducted in terms of functions of one or two random variables $X$ and $Y$. One can easily see that the approach is appropriate for an arbitrary number of random variables.


REFERENCES

[1] Heckman, J.J. and B. Singer (1985) The Identification Problem in Econometric Models for Duration Data. In Advances in Econometrics, edited by Werner Hildenbrand. Cambridge University Press.

[2] Vaupel, J.W. and A.I. Yashin (1985) The Deviant Dynamics of Death in Heterogeneous Populations. Pages 179-211 in Sociological Methodology 1985, edited by Nancy B. Tuma. San Francisco: Jossey-Bass.

[3] Vaupel, J.W. and A.I. Yashin (1986) Heterogeneity's Ruses: Some Surprising Effects of Selection on Population Dynamics. The American Statistician 39(3):176-185.

[4] Shepard, D. and R. Zeckhauser (1977) Interventions in Mixed Populations: Concepts and Applications. Discussion Paper Series. Harvard University, JFK School of Government.

[5] Keyfitz, N. and G. Littman (1979) Mortality in a Heterogeneous Population. Population Studies 33:333-342.

[6] Manton, K.G., E. Stallard, and J.W. Vaupel (1986) Alternative Models for the Heterogeneity of Mortality Risks Among the Aged. Journal of the American Statistical Association 81(385):635-644.

[7] Riesz, F. and B. Nagy (1955) Functional Analysis. Ungar, New York.

[8] Tichonov, A.N. and V.A. Arsenin (1974) Method for Ill-Posed Problems Solution. Manuscript.

[9] Cox, D.R. (1972) Regression Models and Life Tables. Journal of the Royal Statistical Society, Series B 34:187-202.

[10] Vapnik, V.N. (1982) Dependencies Restoration on Base of Small Samples. Springer-Verlag.

[11] Michalski, A.I. (1984) Algorithms for Dependencies Reconstruction. Moscow, Nauka.
