The Cocktail-Party Problem
[Figure: Faith Ringgold, "The Cocktail Party" (1964)]
ICA: Independent Component Analysis
ICA: the task
Given n signals X that are a linear mixture of unknown source signals S, can we estimate the source signals? This is the "BSS: Blind Source Separation" or "Cocktail-Party" problem:

[Diagram: unknown source signals S pass through an unknown mixing matrix A to give the observed signals X = A·S (a "linear mixture of sources"); a separating matrix W, to be estimated, produces the estimated separated sources Y = W·X.]
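To make the model concrete, here is a minimal numpy sketch of the mixing step X = A·S (the sources and the matrix A are invented for the demo; only the linear-mixture structure comes from the slide):

import numpy as np

rng = np.random.default_rng(0)
T = 10_000                                        # samples per signal
S = np.vstack([
    np.sign(rng.standard_normal(T)),              # a binary source
    rng.uniform(-np.sqrt(3), np.sqrt(3), T),      # a unit-variance uniform source
])                                                # unknown source signals, shape (2, T)
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])                        # unknown mixing matrix
X = A @ S                                         # observed signals: each row mixes both sources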
ICA: the problem
Given: X = { xi(t) | 1 ≤ i ≤ n, 1 ≤ t ≤ T }, an n × T data matrix.
Problem: How to decompose the matrix into
X = A·S
with an unknown mixing matrix A and unknown source signals Si●, the rows S1●, S2●, …, Sm● of S.
So far only linearity is assumed. -> Many solutions & ambiguities!
Ambiguities:
1. The variance (scaling) of Si● cannot be determined: any scalar factor can be absorbed either into A or into S (see the quick check below).
2. The order of the sources is arbitrary.
-> we should normalize the sources.
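Continuing the numpy sketch from above, a quick check of ambiguity 1 (the scaling D is an arbitrary choice for the demo):

import numpy as np

rng = np.random.default_rng(0)
S = rng.uniform(-np.sqrt(3), np.sqrt(3), (2, 1000))   # two unit-variance sources
A = np.array([[1.0, 0.5], [0.3, 1.0]])                # some mixing matrix
D = np.diag([2.0, 0.5])                               # arbitrary invertible scaling

X1 = A @ S
X2 = (A @ D) @ (np.linalg.inv(D) @ S)                 # rescaled A, inversely rescaled S
print(np.allclose(X1, X2))                            # True: X cannot tell them apart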
ICA: the data model
[Matrix picture as before: X = A·S, with the source signals S1●, S2●, …, Sm● as the rows of S.]
Question: What could be an assumption on the sources S that helps to decompose X into A and S?
Assumption:
1) All source signals Si, the rows of S, are statistically independent.
2) Since we cannot estimate the magnitude of Si, we fix it to E[Si●Si●ᵀ] = 1 ==> E[S·Sᵀ] = I.
ICA: statistical independence
Definition: Random variables (vectors) yi are statistically independent if
P(y1, y2, …, yn) = ∏ P(yi)
==> for any functions gi:
E[g1(y1)·g2(y2)···gn(yn)] = ∏ E[gi(yi)]
ICA: what about PCA?
Uncorrelated versus independent: an example.
p(s): uniform distribution on [−√3, +√3] ==> E[s] = 0, E[s²] = 1
[Plot: the uniform density p(s), height 1/(2√3) on the interval [−√3, +√3].]
Define t = 2s² ==> E[s·t] = 2E[s³] = 0, and also E[s]·E[t] = 0
==> s, t are uncorrelated. Are s, t statistically independent??
Define f(x) = x² and g(x) = x. Then
E[f(s)·g(t)] = 2E[s⁴] = 18/5, but E[f(s)]·E[g(t)] = E[s²]·2E[s²] = 2
==> s, t are statistically dependent.
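A Monte Carlo check of this example (sample size and seed are my choices):

import numpy as np

rng = np.random.default_rng(0)
s = rng.uniform(-np.sqrt(3), np.sqrt(3), 1_000_000)   # E[s] = 0, E[s²] = 1
t = 2 * s**2

print(np.mean(s * t))                  # ≈ 0: s and t are uncorrelated
print(np.mean(s**2 * t))               # ≈ 18/5 = 3.6 = E[f(s)·g(t)] ...
print(np.mean(s**2) * np.mean(t))      # ... but E[f(s)]·E[g(t)] ≈ 2 -> dependent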
ICA: Gaussian signals are of no use
If more than one source signal is Gaussian, we cannot separate the sources with ICA.
Let us assume all source signals are Gaussian, uncorrelated and of unit variance.
Then an orthogonal mixing matrix would generate signals Xi with a completely symmetric Gaussian joint density function.
A completely symmetric joint density function contains no information on the structure of the mixing matrix A: rotating a spherically symmetric Gaussian leaves its density unchanged, so A is identifiable at best up to a rotation.
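A quick numerical illustration of this non-identifiability (the rotation angle is arbitrary):

import numpy as np

rng = np.random.default_rng(0)
S = rng.standard_normal((2, 100_000))             # two unit-variance Gaussian sources
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # an orthogonal mixing matrix
X = R @ S
print(np.cov(X).round(2))                         # ≈ identity: X is statistically
                                                  # indistinguishable from S, so R is lost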
ICA: the data model continued
Assumptions:
1) Non-Gaussian source signals Si (except possibly one).
2) All source signals Si, the rows of S, are statistically independent.
3) Since we cannot estimate the magnitude of Si, we fix it to E[Si●Si●ᵀ] = 1 ==> E[S·Sᵀ] = I.
ICA: approach
We search for a matrix W (ideally W = A⁻¹) so that the rows yi of
Y = W·X
1. are maximally statistically independent,
2. are maximally non-Gaussian,
3. and of variance E[yiyiᵀ] = 1.
ICA: PCA as preprocessing
PCA
• computes the axes of maximum variance
• these axes are uncorrelated (but statistically independent only for Gaussian data).
Now we can normalize the axes by their variance ==> Z = V·X = V·A·S with E[ziziᵀ] = 1.
PCA plus normalization by variance = "whitening the data" (here applied to non-Gaussian data).
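A sketch of this whitening step via the eigendecomposition of the covariance (the function name is mine; any PCA routine would do):

import numpy as np

def whiten(X):
    # Center, rotate onto the principal axes, and rescale each axis
    # to unit variance, so that E[z·zᵀ] = I.
    Xc = X - X.mean(axis=1, keepdims=True)
    C = np.cov(Xc)                          # n x n covariance matrix
    eigval, E = np.linalg.eigh(C)           # C = E · diag(eigval) · Eᵀ
    V = np.diag(eigval ** -0.5) @ E.T       # whitening matrix V
    return V @ Xc, V                        # Z = V·X with cov(Z) ≈ I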
ICA: the procedure
[Diagram: source signals S → mixture X = A·S → whitened signals Z = V·X = V·A·S → separated outputs Y = W·X.]
ICA: the problem reformulated twice
1.) Non-Gaussian approach:
By the central limit theorem, the pdf of a sum of n independent random variables tends to a Gaussian, so any mixture of the sources is "more Gaussian" than the sources themselves.
1. Find a measure of non-Gaussianity.
2. Find W such that the outputs' pdfs are as different as possible from the Gaussian.
2.) Independence approach:
1. Measure the independence between the signals.
2. Find the signals that maximize this independence.
ICA: measure of non-Gaussianity
There exist several approaches to measure whether a pdf is Gaussian or not.
However, we do not know the full pdf!
It is therefore more reasonable to use global measures of the distribution such as mean, variance, …
Remember: moments and cumulants (semi-invariants) are easy to compute!
ith moment:  mi = E[xⁱ] = Σ_{I=1..N} x_Iⁱ · P(x_I),   i = 1, 2, …
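Its empirical counterpart is a one-liner (the helper name is mine; the sample average replaces the expectation):

import numpy as np

def moment(x, i):
    # Empirical i-th raw moment: sample average of x^i, estimating E[x^i].
    return np.mean(np.asarray(x) ** i)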
ICA: measure of non-Gaussianity
The cumulants of a distribution are:
κ1(xi) = E[xi] = m1
κ2(xi, xj) = E[xi·xj] − E[xi]·E[xj]
…
κ4(xi, xj, xk, xl) = E[xi·xj·xk·xl] − E[xi·xj]·E[xk·xl] − E[xi·xk]·E[xj·xl] − E[xi·xl]·E[xj·xk]   (for zero-mean variables)
The kurtosis is then defined:
kurt(xi) = E[xi⁴] − 3·(E[xi²])²
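A small numerical check of this definition (the three distributions are chosen so that each has unit variance):

import numpy as np

def kurtosis(x):
    # kurt(x) = E[x⁴] − 3·(E[x²])² for centered x; equals 0 for a Gaussian.
    x = x - x.mean()
    return np.mean(x**4) - 3 * np.mean(x**2)**2

rng = np.random.default_rng(0)
print(kurtosis(rng.standard_normal(1_000_000)))                   # ≈ 0    (Gaussian)
print(kurtosis(rng.uniform(-np.sqrt(3), np.sqrt(3), 1_000_000)))  # ≈ -1.2 (sub-Gaussian)
print(kurtosis(rng.laplace(0, 1 / np.sqrt(2), 1_000_000)))        # ≈ 3    (super-Gaussian)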
ICA: measure of non-Gaussianity
Both cumulants and kurtosis are good measures of the deviation of a distribution from a Gaussian: for a Gaussian, all cumulants of order > 2 vanish, and kurt = 0.
For ICA the kurtosis is commonly applied.
For finding the independent components: optimize W so that Σ_{j=1..n} |kurt(Yj)| is maximal.
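One way such an optimization can be implemented is the classic fixed-point (FastICA-style) update for the kurtosis contrast on whitened data; this sketch (function name mine) estimates one row of W at a time with deflation:

import numpy as np

def ica_kurtosis(Z, n_iter=100, seed=0):
    # One-unit fixed-point ICA with the kurtosis nonlinearity g(u) = u³,
    # applied to whitened data Z of shape (n, T); returns W with Y = W·Z.
    n, _ = Z.shape
    rng = np.random.default_rng(seed)
    W = np.zeros((n, n))
    for j in range(n):
        w = rng.standard_normal(n)
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            u = w @ Z                                  # current output y = wᵀz
            w_new = (Z * u**3).mean(axis=1) - 3 * w    # fixed-point step E[z·u³] − 3w
            w_new -= W[:j].T @ (W[:j] @ w_new)         # deflation: stay orthogonal to found rows
            w = w_new / np.linalg.norm(w_new)
        W[j] = w
    return W

Combined with the whitening sketch above: Z, V = whiten(X); W = ica_kurtosis(Z); Y = W @ Z recovers the sources up to order and sign.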
ICA: 2nd approach: independence
Remember: the Kullback-Leibler divergence measures a distance between two pdfs (! not symmetric):
L(p_a, p_b) = ∫ p_a(x) · ln( p_a(x) / p_b(x) ) dx
Definition of statistical independence:
p(y1, y2, …, yn) = ∏ p(yi)
Now we measure L( p(y1, y2, …, yn), ∏ p(yi) ):
L( p(Y), ∏_{i=1..n} p_i(yi) ) = ∫ p(Y) · ln( p(Y) / ∏_{i=1..n} p_i(yi) ) dY
ICA: measure of independence
Now we measure L( p(y1, y2, …, yn), ∏ p(yi) ):
L( p(Y), ∏_{i=1..n} p_i(yi) ) = ∫ p(Y) · ln( p(Y) / ∏_{i=1..n} p_i(yi) ) dY
= ∫ p(Y) · ln p(Y) dY − ∫ p(Y) · ln ∏_{i=1..n} p_i(yi) dY
= −H(Y) − ∫ p(Y) · ln ∏_{i=1..n} p_i(yi) dY
= −H(Y) + Σ_{i=1..n} H(yi)
This is exactly the mutual information of the outputs: it is ≥ 0, and it is 0 precisely when the yi are independent.
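For two outputs this quantity can be estimated directly from a 2-D histogram; a rough sketch (the estimator and bin count are my choices; practical ICA algorithms optimize approximations of it instead):

import numpy as np

def mutual_information(y1, y2, bins=50):
    # Histogram estimate of I(y1; y2) = Σi H(yi) − H(Y) in nats;
    # it is ≈ 0 when y1 and y2 are independent.
    pxy, _, _ = np.histogram2d(y1, y2, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1)                    # marginal of y1
    py = pxy.sum(axis=0)                    # marginal of y2
    nz = pxy > 0                            # avoid log(0) on empty bins
    return np.sum(pxy[nz] * np.log(pxy[nz] / (px[:, None] * py[None, :])[nz]))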