Munich Personal RePEc Archive
A note on matrix differentiation
Kowal, Pawel
December 2006
Online at https://mpra.ub.uni-muenchen.de/3917/
MPRA Paper No. 3917, posted 09 Jul 2007 UTC
A note on matrix differentiation
Paweł Kowal

July 9, 2007
Abstract
This paper presents a set of rules for matrix differentiation with respect to a vector of parameters, using the flattened representation of derivatives, i.e. derivatives in the form of a matrix. We also introduce a new set of generalized Kronecker products of matrices. Finally, we consider the problem of differentiating the matrix determinant, trace and inverse.
JEL classification: C00
Keywords: matrix differentiation, generalized Kronecker products
1 Introduction
Derivatives of matrices with respect to a vector of parameters can be expressed as a concatenation of derivatives with respect to the individual scalar parameters. However, such a representation of derivatives is very inconvenient in some applications, e.g. when higher order derivatives are considered, and is not even applicable when matrix functions (like the determinant or the inverse) are present. For example, finding an explicit expression for the derivative of $\det(X(\theta))$ in such a representation would be a quite complicated task. Such problems arise naturally in many applications, e.g. in the maximum likelihood approach to estimating model parameters.
The same problem emerges in the case of a tensor representation of derivatives. Additionally, in this case extra effort is required to find the flattened representation of the resulting tensors, which is needed because numerical computations can be run efficiently only on two-dimensional data structures.
In this paper we derive formulas for differentiating matrices with respect to a vector of parameters when the flattened form of the resulting derivatives is required, i.e. a representation of derivatives in the form of matrices. To do this we introduce a new set of Kronecker matrix products as well as a generalized matrix transposition. First order and higher order derivatives of functions that are compositions of primitive functions, built using elementary matrix operations like summation, multiplication, transposition and the Kronecker product, can then be expressed in closed form in terms of the primitive matrix functions and their derivatives, using these elementary operations, the generalized Kronecker products and the generalized transpositions.

We also consider more general expressions containing matrix functions (inverse, trace and determinant). Defining a generalized trace function, we are able to express derivatives of such expressions in closed form.
2 Matrix differentiation rules
Let us consider smooth functions $\Omega \ni \theta \mapsto X(\theta) \in \mathbb{R}^{m\times n}$, $\Omega \ni \theta \mapsto Y(\theta) \in \mathbb{R}^{p\times q}$, where $\Omega \subset \mathbb{R}^k$ is an open set. The functions $X$, $Y$ associate an $m\times n$ and a $p\times q$ matrix with a given vector of parameters $\theta = \mathrm{col}(\theta_1, \theta_2, \dots, \theta_k)$. The derivative of the function $X$ with respect to $\theta$ is defined as

$$\frac{\partial X}{\partial\theta} \doteq \begin{bmatrix} \dfrac{\partial X}{\partial\theta_1} & \dfrac{\partial X}{\partial\theta_2} & \cdots & \dfrac{\partial X}{\partial\theta_k} \end{bmatrix}$$

where $\partial X/\partial\theta_i \in \mathbb{R}^{m\times n}$, $i = 1, 2, \dots, k$, so that $\partial X/\partial\theta \in \mathbb{R}^{m\times nk}$.
Proposition 2.1. The following equations hold:

1. $\dfrac{\partial}{\partial\theta}(\alpha X) = \alpha\dfrac{\partial X}{\partial\theta}$

2. $\dfrac{\partial}{\partial\theta}(X+Y) = \dfrac{\partial X}{\partial\theta} + \dfrac{\partial Y}{\partial\theta}$

3. $\dfrac{\partial}{\partial\theta}(X\times Y) = \dfrac{\partial X}{\partial\theta}\times(I_k\otimes Y) + X\times\dfrac{\partial Y}{\partial\theta}$

where $\alpha\in\mathbb{R}$ and $I_k$ is the $k\times k$ identity matrix, assuming that the derivatives exist and the matrix dimensions coincide.
Proof. The first two cases are obvious. For the third we have

$$\frac{\partial}{\partial\theta}(X\times Y) = \begin{bmatrix} \frac{\partial X}{\partial\theta_1}\times Y + X\times\frac{\partial Y}{\partial\theta_1} & \cdots & \frac{\partial X}{\partial\theta_k}\times Y + X\times\frac{\partial Y}{\partial\theta_k} \end{bmatrix}$$
$$= \begin{bmatrix} \frac{\partial X}{\partial\theta_1} & \cdots & \frac{\partial X}{\partial\theta_k} \end{bmatrix}\times\begin{bmatrix} Y & \cdots & 0\\ \vdots & \ddots & \vdots\\ 0 & \cdots & Y \end{bmatrix} + X\times\begin{bmatrix} \frac{\partial Y}{\partial\theta_1} & \cdots & \frac{\partial Y}{\partial\theta_k} \end{bmatrix}$$
$$= \frac{\partial X}{\partial\theta}\times(I_k\otimes Y) + X\times\frac{\partial Y}{\partial\theta}$$
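Since the flattened derivative is just a horizontal concatenation of partial derivatives, the product rule can be verified numerically; below is a minimal numpy sketch with linear test functions (our own illustrative choice, not from the paper):

```python
import numpy as np

# Sanity check of the product rule (Proposition 2.1.3) in the flattened
# representation, for k = 2 parameters and linear matrix functions whose
# exact derivatives are known: X(t) = A + t1*B + t2*C, Y(t) = D + t1*E + t2*F.
rng = np.random.default_rng(0)
A, B, C = rng.standard_normal((3, 3, 4))   # X(t) is 3x4
D, E, F = rng.standard_normal((3, 4, 5))   # Y(t) is 4x5
t = np.array([0.3, -0.7])
k = t.size
Xt, Yt = A + t[0]*B + t[1]*C, D + t[0]*E + t[1]*F
dX, dY = np.hstack([B, C]), np.hstack([E, F])   # flattened derivatives

# d(XY)/dtheta computed parameter by parameter ...
lhs = np.hstack([B @ Yt + Xt @ E, C @ Yt + Xt @ F])
# ... equals the closed form dX (I_k kron Y) + X dY.
rhs = dX @ np.kron(np.eye(k), Yt) + Xt @ dY
assert np.allclose(lhs, rhs)
```

The resulting derivative is a $3\times 10$ matrix: one $3\times 5$ block per parameter.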
Differentiating the matrix transposition is a little more complicated. Let us define a generalized matrix transposition.

Definition 2.2. Let $X = [X_1, X_2, \dots, X_n]$, where each $X_i\in\mathbb{R}^{p\times q}$, $i = 1, 2, \dots, n$, be a partition of the $p\times nq$ dimensional matrix $X$. Then

$$\mathrm{T}_n(X) \doteq \begin{bmatrix} X_1' & X_2' & \cdots & X_n' \end{bmatrix}$$

Proposition 2.3. The following equations hold:

1. $\dfrac{\partial}{\partial\theta}(X') = \mathrm{T}_k\!\left(\dfrac{\partial X}{\partial\theta}\right)$

2. $\dfrac{\partial}{\partial\theta}\big(\mathrm{T}_n(X)\big) = \mathrm{T}_{k\times n}\!\left(\dfrac{\partial X}{\partial\theta}\right)$
Proof. The first condition is a special case of the second for $n = 1$. We have

$$\frac{\partial}{\partial\theta}\big(\mathrm{T}_n(X)\big) = \begin{bmatrix} \mathrm{T}_n\!\left(\frac{\partial X}{\partial\theta_1}\right) & \cdots & \mathrm{T}_n\!\left(\frac{\partial X}{\partial\theta_k}\right) \end{bmatrix}$$
$$= \begin{bmatrix} \frac{\partial X_1'}{\partial\theta_1}, \dots, \frac{\partial X_n'}{\partial\theta_1} & \cdots & \frac{\partial X_1'}{\partial\theta_k}, \dots, \frac{\partial X_n'}{\partial\theta_k} \end{bmatrix} = \mathrm{T}_{k\times n}\!\left(\frac{\partial X}{\partial\theta}\right)$$

since

$$\frac{\partial X}{\partial\theta} = \begin{bmatrix} \frac{\partial X_1}{\partial\theta_1}, \dots, \frac{\partial X_n}{\partial\theta_1} & \cdots & \frac{\partial X_1}{\partial\theta_k}, \dots, \frac{\partial X_n}{\partial\theta_k} \end{bmatrix}$$
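The generalized transposition amounts to transposing each column block in place, which makes Proposition 2.3 easy to check numerically; a short numpy sketch (the test matrices are illustrative):

```python
import numpy as np

def T(n, X):
    """Generalized transposition T_n (Definition 2.2): split X column-wise
    into n equal blocks and transpose each block in place."""
    return np.hstack([Xi.T for Xi in np.hsplit(X, n)])

# Proposition 2.3.1 for k = 2: with X(t) = A + t1*B + t2*C (square blocks),
# d(X')/dtheta = [B'  C'] must equal T_2 applied to dX/dtheta = [B  C].
rng = np.random.default_rng(1)
A, B, C = rng.standard_normal((3, 4, 4))
dX  = np.hstack([B, C])
dXT = np.hstack([B.T, C.T])
assert np.allclose(dXT, T(2, dX))
```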
Let us now turn to differentiating tensor products of matrices. Let $X\in\mathbb{R}^{p\times q}$ be a matrix with elements $x_{ij}\in\mathbb{R}$ for $i = 1, 2, \dots, p$, $j = 1, 2, \dots, q$. For any matrix $Y$ the Kronecker product $X\otimes Y$ is defined as

$$X\otimes Y \doteq \begin{bmatrix} x_{11}Y & \cdots & x_{1q}Y\\ \vdots & \ddots & \vdots\\ x_{p1}Y & \cdots & x_{pq}Y \end{bmatrix}$$
As in the case of differentiating the matrix transposition, we need to introduce a generalized Kronecker product.

Definition 2.4. Let $X = [X_1, X_2, \dots, X_m]$, where each $X_i\in\mathbb{R}^{p\times q}$, $i = 1, 2, \dots, m$, be a partition of the $p\times mq$ dimensional matrix $X$, and let $Y = [Y_1, Y_2, \dots, Y_n]$, where each $Y_i\in\mathbb{R}^{r\times s}$, $i = 1, 2, \dots, n$, be a partition of the $r\times ns$ dimensional matrix $Y$. Then

$$X\otimes^1_n Y \doteq [X\otimes Y_1,\dots,X\otimes Y_n]$$
$$X\otimes^m_n Y \doteq [X_1\otimes^1_n Y,\dots,X_m\otimes^1_n Y]$$
$$X\otimes^{1,m_2,\dots,m_s}_{n_1,n_2,\dots,n_s}Y \doteq [X\otimes^{m_2,\dots,m_s}_{n_2,\dots,n_s}Y_1,\dots,X\otimes^{m_2,\dots,m_s}_{n_2,\dots,n_s}Y_{n_1}]$$
$$X\otimes^{m_1,m_2,\dots,m_s}_{n_1,n_2,\dots,n_s}Y \doteq [X_1\otimes^{1,m_2,\dots,m_s}_{n_1,n_2,\dots,n_s}Y,\dots,X_{m_1}\otimes^{1,m_2,\dots,m_s}_{n_1,n_2,\dots,n_s}Y]$$

assuming that the appropriate matrix partitions exist.
Proposition 2.5. The following equations hold:

1. $\dfrac{\partial}{\partial\theta}(X\otimes Y) = \dfrac{\partial X}{\partial\theta}\otimes Y + X\otimes^1_k\dfrac{\partial Y}{\partial\theta}$

2. $\dfrac{\partial}{\partial\theta}\left(X\otimes^{m_1,\dots,m_s}_{n_1,\dots,n_s}Y\right) = \dfrac{\partial X}{\partial\theta}\otimes^{k,m_1,\dots,m_s}_{1,n_1,\dots,n_s}Y + X\otimes^{1,m_1,\dots,m_s}_{k,n_1,\dots,n_s}\dfrac{\partial Y}{\partial\theta}$

Proof. We have

$$\frac{\partial}{\partial\theta}\left(X\otimes^{m_1,\dots,m_s}_{n_1,\dots,n_s}Y\right) = \begin{bmatrix}\frac{\partial}{\partial\theta_1}\left(X\otimes^{m_1,\dots,m_s}_{n_1,\dots,n_s}Y\right) & \cdots & \frac{\partial}{\partial\theta_k}\left(X\otimes^{m_1,\dots,m_s}_{n_1,\dots,n_s}Y\right)\end{bmatrix}$$
$$= \begin{bmatrix}\frac{\partial X}{\partial\theta_1}\otimes^{m_1,\dots,m_s}_{n_1,\dots,n_s}Y & \cdots & \frac{\partial X}{\partial\theta_k}\otimes^{m_1,\dots,m_s}_{n_1,\dots,n_s}Y\end{bmatrix} + \begin{bmatrix}X\otimes^{m_1,\dots,m_s}_{n_1,\dots,n_s}\frac{\partial Y}{\partial\theta_1} & \cdots & X\otimes^{m_1,\dots,m_s}_{n_1,\dots,n_s}\frac{\partial Y}{\partial\theta_k}\end{bmatrix}$$
$$= \frac{\partial X}{\partial\theta}\otimes^{k,m_1,\dots,m_s}_{1,n_1,\dots,n_s}Y + X\otimes^{1,m_1,\dots,m_s}_{k,n_1,\dots,n_s}\frac{\partial Y}{\partial\theta}$$
Since $X\otimes Y = X\otimes^1_1 Y$, in the case of the standard Kronecker product we obtain

$$\frac{\partial}{\partial\theta}(X\otimes Y) = \frac{\partial X}{\partial\theta}\otimes^k_1 Y + X\otimes^1_k\frac{\partial Y}{\partial\theta} = \frac{\partial X}{\partial\theta}\otimes Y + X\otimes^1_k\frac{\partial Y}{\partial\theta}$$
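Definition 2.4 translates directly into a short recursive implementation; the following numpy sketch (our own illustrative code, with linear test functions) also checks Proposition 2.5.1 numerically:

```python
import numpy as np

def gkron(X, Y, ms, ns):
    """Generalized Kronecker product X (x)^{m1,...,ms}_{n1,...,ns} Y of
    Definition 2.4, implemented by recursion on the index lists: split X
    into ms[0] column blocks, Y into ns[0] column blocks, and concatenate
    the recursive products block by block (X blocks in the outer loop)."""
    if not ms:                       # empty index lists: plain Kronecker product
        return np.kron(X, Y)
    return np.hstack([gkron(Xi, Yj, ms[1:], ns[1:])
                      for Xi in np.hsplit(X, ms[0])
                      for Yj in np.hsplit(Y, ns[0])])

# Check Proposition 2.5.1 for k = 2 parameters with linear matrix functions
# X(t) = A + t1*B + t2*C and Y(t) = D + t1*E + t2*F, whose flattened
# derivatives are simply [B  C] and [E  F].
rng = np.random.default_rng(2)
A, B, C = rng.standard_normal((3, 2, 3))
D, E, F = rng.standard_normal((3, 2, 2))
t = np.array([0.5, 1.5])
Xt, Yt = A + t[0]*B + t[1]*C, D + t[0]*E + t[1]*F
dX, dY = np.hstack([B, C]), np.hstack([E, F])

# d(X kron Y)/dtheta computed term by term ...
lhs = np.hstack([np.kron(B, Yt) + np.kron(Xt, E),
                 np.kron(C, Yt) + np.kron(Xt, F)])
# ... against the closed form dX (x) Y + X (x)^1_k dY of Proposition 2.5.1.
rhs = np.kron(dX, Yt) + gkron(Xt, dY, [1], [2])
assert np.allclose(lhs, rhs)
```

Splitting a matrix into single columns recovers the plain Kronecker product, in line with Proposition 5.1.1 below.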
Proposition 2.1 omitted the case of multiplication of a matrix by a scalar function; using Proposition 2.5 we obtain

Proposition 2.6. Let $\alpha$ be a scalar valued function of $\theta$ and let $X$ be a matrix valued function of $\theta$, $X(\theta)\in\mathbb{R}^{p\times q}$. Then

$$\frac{\partial}{\partial\theta}(\alpha X) = \alpha\times\frac{\partial X}{\partial\theta} + \frac{\partial\alpha}{\partial\theta}\otimes X$$
Proof. The expression $\alpha X$ can be represented as $\alpha X = (\alpha\otimes I_p)\times X$, where $I_p$ is the $p\times p$ identity matrix. Hence

$$\frac{\partial}{\partial\theta}(\alpha X) = \frac{\partial}{\partial\theta}\big((\alpha\otimes I_p)\times X\big) = \frac{\partial(\alpha\otimes I_p)}{\partial\theta}\times(I_k\otimes X) + (\alpha\otimes I_p)\times\frac{\partial X}{\partial\theta}$$
$$= \left(\frac{\partial\alpha}{\partial\theta}\otimes I_p\right)\times(I_k\otimes X) + \alpha\times\frac{\partial X}{\partial\theta} = \frac{\partial\alpha}{\partial\theta}\otimes X + \alpha\times\frac{\partial X}{\partial\theta}$$
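Proposition 2.6 can likewise be checked numerically; in the sketch below the scalar function $\alpha$ and the matrix function $X$ are illustrative choices, not from the paper:

```python
import numpy as np

# Check of Proposition 2.6 for k = 2: alpha(t) = t1^2 + 3*t2 and a linear
# matrix function X(t) = B0 + t1*B1 + t2*B2 (both illustrative choices).
rng = np.random.default_rng(4)
B0, B1, B2 = rng.standard_normal((3, 2, 3))
X  = lambda t: B0 + t[0]*B1 + t[1]*B2
t  = np.array([1.2, -0.4])
a  = t[0]**2 + 3*t[1]                 # alpha(t)
da = np.array([[2*t[0], 3.0]])        # d alpha / d theta, a 1 x k row
dX = np.hstack([B1, B2])

# d(alpha X)/dtheta computed parameter by parameter ...
lhs = np.hstack([2*t[0]*X(t) + a*B1, 3.0*X(t) + a*B2])
# ... equals alpha dX + (d alpha/d theta) kron X (Proposition 2.6).
rhs = a*dX + np.kron(da, X(t))
assert np.allclose(lhs, rhs)
```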
Let $\mathcal{S}$ be a set of smooth matrix valued functions $\Omega\ni\theta\mapsto X(\theta)\in\mathbb{R}^{p\times q}$, where $\Omega\subset\mathbb{R}^k$ is an open set, for any integers $p, q\geq 1$, not necessarily the same for all functions in $\mathcal{S}$. Let $\mathrm{dif}(\mathcal{S}) \doteq \{\partial X/\partial\theta : X\in\mathcal{S}\}$. The set $\mathcal{S}$ may contain scalars and matrices, which are interpreted as constant functions.

Let $\mathrm{ext}(\mathcal{S})$ be the set of functions obtained by applying elementary matrix operations to the set $\mathcal{S}$, i.e. $\mathrm{ext}(\mathcal{S})$ is the smallest set such that if $X, Y\in\mathrm{ext}(\mathcal{S})$, then the matrix valued functions $X+Y$, $X\times Y$, $\mathrm{T}_n(X)$, $X\otimes^{m_1,\dots,m_s}_{n_1,\dots,n_s}Y$, if they exist, belong to $\mathrm{ext}(\mathcal{S})$, where $n, n_1,\dots,n_s, m_1,\dots,m_s$ are any positive integers.
Theorem 2.7. $\mathrm{dif}(\mathrm{ext}(\mathcal{S})) = \mathrm{ext}(\mathcal{S}\cup\mathrm{dif}(\mathcal{S}))$.
Proof. By induction using propositions 2.1, 2.3, 2.5, 2.6.
Theorem 2.7 states that derivatives of matrix valued functions obtained by applying elementary operations (summation, matrix multiplication, generalized transposition and generalized Kronecker tensor product) can be expressed as a combination of these functions and their derivatives using the same elementary operations. Applying Theorem 2.7 to the set $\mathcal{T} = \mathrm{dif}(\mathrm{ext}(\mathcal{S}))$ we see that higher order derivatives can also be expressed, using these elementary operations, as combinations of the elementary functions in $\mathcal{S}$ and their higher order derivatives.
3 Derivatives of matrix determinant, trace and inverse
Let us consider derivatives of the matrix inverse, determinant and trace. We need to introduce a generalized trace, defined analogously to the generalized transposition.
Definition 3.1. Let $X = [X_1, X_2, \dots, X_n]$, where each $X_i\in\mathbb{R}^{p\times p}$, $i = 1, 2, \dots, n$, is a square $p\times p$ matrix, be a partition of the $p\times np$ dimensional matrix $X$. Then

$$\mathrm{tr}_n(X) \doteq \begin{bmatrix} \mathrm{tr}\,X_1 & \mathrm{tr}\,X_2 & \cdots & \mathrm{tr}\,X_n \end{bmatrix}$$
Proposition 3.2. The following equations hold:

1. $\dfrac{\partial\det(X)}{\partial\theta} = \det(X)\times\mathrm{tr}_k\!\left(X^{-1}\times\dfrac{\partial X}{\partial\theta}\right)$

2. $\dfrac{\partial\,\mathrm{tr}_n(X)}{\partial\theta} = \mathrm{tr}_{k\times n}\!\left(\dfrac{\partial X}{\partial\theta}\right)$

3. $\dfrac{\partial X^{-1}}{\partial\theta} = -X^{-1}\times\dfrac{\partial X}{\partial\theta}\times(I_k\otimes X^{-1})$
Proof. We have

$$\frac{\partial\det(X)}{\partial\theta} = \begin{bmatrix}\frac{\partial\det(X)}{\partial\theta_1} & \cdots & \frac{\partial\det(X)}{\partial\theta_k}\end{bmatrix} = \begin{bmatrix}\det(X)\,\mathrm{tr}\!\left(X^{-1}\times\frac{\partial X}{\partial\theta_1}\right) & \cdots & \det(X)\,\mathrm{tr}\!\left(X^{-1}\times\frac{\partial X}{\partial\theta_k}\right)\end{bmatrix}$$
$$= \det(X)\times\mathrm{tr}_k\!\left(X^{-1}\times\frac{\partial X}{\partial\theta}\right)$$

and

$$\frac{\partial\,\mathrm{tr}_n(X)}{\partial\theta} = \begin{bmatrix}\frac{\partial\,\mathrm{tr}_n(X)}{\partial\theta_1} & \cdots & \frac{\partial\,\mathrm{tr}_n(X)}{\partial\theta_k}\end{bmatrix} = \begin{bmatrix}\mathrm{tr}_n\!\left(\frac{\partial X}{\partial\theta_1}\right) & \cdots & \mathrm{tr}_n\!\left(\frac{\partial X}{\partial\theta_k}\right)\end{bmatrix} = \mathrm{tr}_{k\times n}\!\left(\frac{\partial X}{\partial\theta}\right)$$

Similarly

$$\frac{\partial X^{-1}}{\partial\theta} = \begin{bmatrix}\frac{\partial X^{-1}}{\partial\theta_1} & \cdots & \frac{\partial X^{-1}}{\partial\theta_k}\end{bmatrix} = -\begin{bmatrix}X^{-1}\frac{\partial X}{\partial\theta_1}X^{-1} & \cdots & X^{-1}\frac{\partial X}{\partial\theta_k}X^{-1}\end{bmatrix}$$
$$= -X^{-1}\times\begin{bmatrix}\frac{\partial X}{\partial\theta_1} & \cdots & \frac{\partial X}{\partial\theta_k}\end{bmatrix}\times(I_k\otimes X^{-1}) = -X^{-1}\frac{\partial X}{\partial\theta}(I_k\otimes X^{-1})$$

since in the case of a scalar parameter $\theta\in\mathbb{R}$ we have $\partial\det(X)/\partial\theta = \det(X)\,\mathrm{tr}(X^{-1}\partial X/\partial\theta)$, $\partial\,\mathrm{tr}(X)/\partial\theta = \mathrm{tr}(\partial X/\partial\theta)$, and $\partial X^{-1}/\partial\theta = -X^{-1}(\partial X/\partial\theta)X^{-1}$ (see for example Petersen and Pedersen (2006)).
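These formulas can be verified against finite differences; a numpy sketch follows (the test matrices and step size are our own choices):

```python
import numpy as np

def tr_n(X, n):
    """Generalized trace of Definition 3.1: the traces of the n square
    column blocks of X, returned as a row."""
    return np.array([np.trace(Xi) for Xi in np.hsplit(X, n)])

# Finite-difference check of Proposition 3.2 for k = 2 parameters and
# X(t) = A + t1*B + t2*C, so the flattened derivative is dX = [B  C].
rng = np.random.default_rng(3)
A, B, C = rng.standard_normal((3, 4, 4))
A = A + 4.0 * np.eye(4)                 # keep X(t) comfortably nonsingular
X  = lambda t: A + t[0]*B + t[1]*C
t  = np.array([0.1, -0.2])
dX = np.hstack([B, C])
h  = 1e-6
Xi = np.linalg.inv(X(t))

# 3.2.1: d det(X) = det(X) tr_k(X^{-1} dX)
det_formula = np.linalg.det(X(t)) * tr_n(Xi @ dX, 2)
det_numeric = np.array([(np.linalg.det(X(t + h*e)) - np.linalg.det(X(t - h*e)))
                        / (2*h) for e in np.eye(2)])
assert np.allclose(det_formula, det_numeric, rtol=1e-4, atol=1e-6)

# 3.2.3: d X^{-1} = -X^{-1} dX (I_k kron X^{-1})
inv_formula = -Xi @ dX @ np.kron(np.eye(2), Xi)
inv_numeric = np.hstack([(np.linalg.inv(X(t + h*e)) - np.linalg.inv(X(t - h*e)))
                         / (2*h) for e in np.eye(2)])
assert np.allclose(inv_formula, inv_numeric, rtol=1e-4, atol=1e-6)
```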
Let the set $\mathcal{S}$ and the operation $\mathrm{dif}$ be defined as in the previous section. Let $\mathrm{ext}_2(\mathcal{S})$ be the set of functions obtained by applying the elementary matrix operations and the matrix determinant, trace and inverse to the set $\mathcal{S}$, i.e. $\mathrm{ext}_2(\mathcal{S})$ is the smallest set such that if $X, Y\in\mathrm{ext}_2(\mathcal{S})$, then the matrix valued functions $X+Y$, $X\times Y$, $\mathrm{T}_n(X)$, $X\otimes^{m_1,\dots,m_s}_{n_1,\dots,n_s}Y$, $\det(X)$, $\mathrm{tr}_n(X)$, $X^{-1}$, if they exist, belong to $\mathrm{ext}_2(\mathcal{S})$, where $n, n_1,\dots,n_s, m_1,\dots,m_s$ are any positive integers.
Theorem 3.3. $\mathrm{dif}(\mathrm{ext}_2(\mathcal{S})) = \mathrm{ext}_2(\mathcal{S}\cup\mathrm{dif}(\mathcal{S}))$.
Proof. By induction using propositions 2.1, 2.3, 2.5, 2.6, 3.2.
4 Derivatives of function composition
Let $f$ be a matrix valued function given by $\mathbb{R}^p\ni x\mapsto f(x)\in\mathbb{R}^{m\times n}$ and let $g$ be a vector valued function $\Omega\ni\theta\mapsto g(\theta)\in\mathbb{R}^p$. We can define the function composition $\Omega\ni\theta\mapsto f(g(\theta))\in\mathbb{R}^{m\times n}$.
Proposition 4.1. The following condition holds

$$\frac{\partial}{\partial\theta}f(g(\theta)) = \frac{\partial f(g(\theta))}{\partial x}\times\left(\frac{\partial g(\theta)}{\partial\theta}\otimes I_n\right)$$
Proof. Let

$$f(x) = \begin{bmatrix} f_{11}(x) & \cdots & f_{1n}(x)\\ \vdots & \ddots & \vdots\\ f_{m1}(x) & \cdots & f_{mn}(x) \end{bmatrix}$$

where the $f_{ij}(x)$ are scalar valued functions. Then for $s = 1,\dots,k$

$$\frac{\partial f_{ij}(x)}{\partial\theta_s} = \sum_{l=1}^{p}\frac{\partial f_{ij}(x)}{\partial x_l}\times\frac{\partial x_l}{\partial\theta_s} = \frac{\partial f_{ij}(x)}{\partial x}\times\frac{\partial x}{\partial\theta_s}$$

since $\partial x/\partial\theta_s$ is a column vector. Further

$$\frac{\partial f(x)}{\partial\theta_s} = \sum_{l=1}^{p}\begin{bmatrix}\frac{\partial f_{11}(x)}{\partial x_l} & \cdots & \frac{\partial f_{1n}(x)}{\partial x_l}\\ \vdots & \ddots & \vdots\\ \frac{\partial f_{m1}(x)}{\partial x_l} & \cdots & \frac{\partial f_{mn}(x)}{\partial x_l}\end{bmatrix}\times\frac{\partial x_l}{\partial\theta_s} = \sum_{l=1}^{p}\frac{\partial f(x)}{\partial x_l}\times\frac{\partial x_l}{\partial\theta_s}$$
$$= \begin{bmatrix}\frac{\partial f(x)}{\partial x_1} & \cdots & \frac{\partial f(x)}{\partial x_p}\end{bmatrix}\times\begin{bmatrix}I_n\times\frac{\partial x_1}{\partial\theta_s}\\ \vdots\\ I_n\times\frac{\partial x_p}{\partial\theta_s}\end{bmatrix} = \frac{\partial f(x)}{\partial x}\times\left(\frac{\partial x}{\partial\theta_s}\otimes I_n\right)$$

Finally

$$\frac{\partial f(x)}{\partial\theta} = \frac{\partial f(x)}{\partial x}\times\begin{bmatrix}\frac{\partial x}{\partial\theta_1}\otimes I_n & \cdots & \frac{\partial x}{\partial\theta_k}\otimes I_n\end{bmatrix} = \frac{\partial f(x)}{\partial x}\times\left(\frac{\partial x}{\partial\theta}\otimes I_n\right)$$

where $x = g(\theta)$.
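Proposition 4.1 can be checked on a concrete composition; in the sketch below $f$ and $g$ are illustrative choices, not from the paper:

```python
import numpy as np

# Finite-difference check of Proposition 4.1 with m = n = p = k = 2:
# f(x) = [[x1*x2, x1], [x2, x1+x2]] and g(t) = [t1*t2, t1 - t2].
f  = lambda x: np.array([[x[0]*x[1], x[0]], [x[1], x[0] + x[1]]])
df = lambda x: np.hstack([np.array([[x[1], 1.0], [0.0, 1.0]]),    # df/dx1
                          np.array([[x[0], 0.0], [1.0, 1.0]])])   # df/dx2
g  = lambda t: np.array([t[0]*t[1], t[0] - t[1]])
dg = lambda t: np.array([[t[1], t[0]], [1.0, -1.0]])              # p x k

t, n, h = np.array([0.4, 1.3]), 2, 1e-6
# Closed form of Proposition 4.1: df/dx evaluated at g(t), times dg kron I_n.
formula = df(g(t)) @ np.kron(dg(t), np.eye(n))
# Central finite differences of the composition, parameter by parameter.
numeric = np.hstack([(f(g(t + h*e)) - f(g(t - h*e))) / (2*h)
                     for e in np.eye(2)])
assert np.allclose(formula, numeric, rtol=1e-5, atol=1e-8)
```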
5 Properties of the generalized Kronecker product

Proposition 5.1. For any matrices $A$, $B$:

1. $A\otimes^k_1 B = A\otimes B$

2. $A\otimes^{\dots,m_k,1,\dots}_{\dots,n_k,1,\dots}B = A\otimes^{\dots,m_k,\dots}_{\dots,n_k,\dots}B$

3. $A\otimes^{\dots,1,1,\dots}_{\dots,n_k,n_{k+1},\dots}B = A\otimes^{\dots,1,\dots}_{\dots,n_k\times n_{k+1},\dots}B$

4. $A\otimes^{\dots,m_k,m_{k+1},\dots}_{\dots,1,n_{k+1},\dots}B = A\otimes^{\dots,m_k\times m_{k+1},\dots}_{\dots,n_{k+1},\dots}B$

assuming that the Kronecker products exist.
Proposition 5.2. For any matrices $A$, $B$, $C$:

1. $A\otimes^{m_1,\dots,m_k}_{n_1,\dots,n_k}(B+C) = A\otimes^{m_1,\dots,m_k}_{n_1,\dots,n_k}B + A\otimes^{m_1,\dots,m_k}_{n_1,\dots,n_k}C$

2. $(A+B)\otimes^{m_1,\dots,m_k}_{n_1,\dots,n_k}C = A\otimes^{m_1,\dots,m_k}_{n_1,\dots,n_k}C + B\otimes^{m_1,\dots,m_k}_{n_1,\dots,n_k}C$

assuming that the Kronecker products exist and the matrix dimensions coincide.
Proposition 5.3. For any matrices $A$, $B$, $C$, $D$

$$(AB)\otimes^{m_1,\dots,m_s}_{n_1,\dots,n_s}(CD) = (A\otimes C)\times(B\otimes^{m_1,\dots,m_s}_{n_1,\dots,n_s}D)$$

assuming that the products $AB$ and $CD$, as well as the Kronecker products, exist.
Proof. Observe that $X\otimes^{m_1,\dots,m_s}_{n_1,\dots,n_s}Y = X\otimes^{m_1,\dots,m_s,1}_{n_1,\dots,n_s,1}Y$, and $(AB)\otimes^1_1(CD) = (A\otimes C)\times(B\otimes^1_1 D)$, since $(AB)\otimes(CD) = (A\otimes C)\times(B\otimes D)$. Assume $(AB)\otimes^{m_k,\dots,m_1,1}_{n_k,\dots,n_1,1}(CD) = (A\otimes C)\times(B\otimes^{m_k,\dots,m_1,1}_{n_k,\dots,n_1,1}D)$ for some $k\geq 0$. Then

$$(AB)\otimes^{1,m_k,\dots,m_1,1}_{n_{k+1},n_k,\dots,n_1,1}(CD)$$
$$= \begin{bmatrix}(AB)\otimes^{m_k,\dots,m_1,1}_{n_k,\dots,n_1,1}(CD_1) & \cdots & (AB)\otimes^{m_k,\dots,m_1,1}_{n_k,\dots,n_1,1}(CD_{n_{k+1}})\end{bmatrix}$$
$$= \begin{bmatrix}(A\otimes C)(B\otimes^{m_k,\dots,m_1,1}_{n_k,\dots,n_1,1}D_1) & \cdots & (A\otimes C)(B\otimes^{m_k,\dots,m_1,1}_{n_k,\dots,n_1,1}D_{n_{k+1}})\end{bmatrix}$$
$$= (A\otimes C)\times(B\otimes^{1,m_k,\dots,m_1,1}_{n_{k+1},n_k,\dots,n_1,1}D)$$

Similarly

$$(AB)\otimes^{m_{k+1},m_k,\dots,m_1,1}_{n_{k+1},n_k,\dots,n_1,1}(CD)$$
$$= \begin{bmatrix}(AB_1)\otimes^{1,m_k,\dots,m_1,1}_{n_{k+1},n_k,\dots,n_1,1}(CD) & \cdots & (AB_{m_{k+1}})\otimes^{1,m_k,\dots,m_1,1}_{n_{k+1},n_k,\dots,n_1,1}(CD)\end{bmatrix}$$
$$= \begin{bmatrix}(A\otimes C)(B_1\otimes^{1,m_k,\dots,m_1,1}_{n_{k+1},n_k,\dots,n_1,1}D) & \cdots & (A\otimes C)(B_{m_{k+1}}\otimes^{1,m_k,\dots,m_1,1}_{n_{k+1},n_k,\dots,n_1,1}D)\end{bmatrix}$$
$$= (A\otimes C)\times(B\otimes^{m_{k+1},m_k,\dots,m_1,1}_{n_{k+1},n_k,\dots,n_1,1}D)$$
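Proposition 5.3 can be spot-checked numerically with the same recursive implementation of Definition 2.4 (repeated here so the sketch is self-contained; the block sizes are illustrative):

```python
import numpy as np

def gkron(X, Y, ms, ns):
    """Generalized Kronecker product of Definition 2.4: recurse over the
    index lists, splitting X and Y into column blocks."""
    if not ms:
        return np.kron(X, Y)
    return np.hstack([gkron(Xi, Yj, ms[1:], ns[1:])
                      for Xi in np.hsplit(X, ms[0])
                      for Yj in np.hsplit(Y, ns[0])])

# Proposition 5.3, a generalized mixed-product rule:
# (AB) (x)^{m..}_{n..} (CD) = (A kron C) x (B (x)^{m..}_{n..} D).
rng = np.random.default_rng(6)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))   # AB is 2x4, split into m1 = 2 blocks
C = rng.standard_normal((2, 2))
D = rng.standard_normal((2, 6))   # CD is 2x6, split into n1 = 3 blocks
lhs = gkron(A @ B, C @ D, [2], [3])
rhs = np.kron(A, C) @ gkron(B, D, [2], [3])
assert np.allclose(lhs, rhs)
```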
Proposition 5.4. For any matrices $A$, $B$ of size $p_1\times q_1$ and $p_2\times q_2$

$$A\otimes^{m_1,\dots,m_s}_{n_1,\dots,n_s}B = (A\otimes B)\times(I_{q_1}\otimes^{m_1,\dots,m_s}_{n_1,\dots,n_s}I_{q_2})$$

assuming that the Kronecker products exist.
Proposition 5.5. For any matrices $A$, $B$, $C$

$$A\otimes^{m_1,\dots,m_s}_{n_1,\dots,n_s}(B\otimes C) = (A\otimes^{m_1,\dots,m_s}_{n_1,\dots,n_s}B)\otimes C$$

assuming that the Kronecker products exist.
Proof. Observe that $X\otimes^{m_1,\dots,m_s}_{n_1,\dots,n_s}Y = X\otimes^{m_1,\dots,m_s,1}_{n_1,\dots,n_s,1}Y$, and $A\otimes^1_1(B\otimes C) = (A\otimes^1_1 B)\otimes C$, since $A\otimes(B\otimes C) = (A\otimes B)\otimes C$. Assume $A\otimes^{m_k,\dots,m_1,1}_{n_k,\dots,n_1,1}(B\otimes C) = (A\otimes^{m_k,\dots,m_1,1}_{n_k,\dots,n_1,1}B)\otimes C$ for some $k\geq 0$. Then

$$A\otimes^{1,m_k,\dots,m_1,1}_{n_{k+1},n_k,\dots,n_1,1}(B\otimes C)$$
$$= \begin{bmatrix}A\otimes^{m_k,\dots,m_1,1}_{n_k,\dots,n_1,1}(B_1\otimes C) & \cdots & A\otimes^{m_k,\dots,m_1,1}_{n_k,\dots,n_1,1}(B_{n_{k+1}}\otimes C)\end{bmatrix}$$
$$= \begin{bmatrix}(A\otimes^{m_k,\dots,m_1,1}_{n_k,\dots,n_1,1}B_1)\otimes C & \cdots & (A\otimes^{m_k,\dots,m_1,1}_{n_k,\dots,n_1,1}B_{n_{k+1}})\otimes C\end{bmatrix}$$
$$= (A\otimes^{1,m_k,\dots,m_1,1}_{n_{k+1},n_k,\dots,n_1,1}B)\otimes C$$

Similarly

$$A\otimes^{m_{k+1},m_k,\dots,m_1,1}_{n_{k+1},n_k,\dots,n_1,1}(B\otimes C)$$
$$= \begin{bmatrix}A_1\otimes^{1,m_k,\dots,m_1,1}_{n_{k+1},n_k,\dots,n_1,1}(B\otimes C) & \cdots & A_{m_{k+1}}\otimes^{1,m_k,\dots,m_1,1}_{n_{k+1},n_k,\dots,n_1,1}(B\otimes C)\end{bmatrix}$$
$$= \begin{bmatrix}(A_1\otimes^{1,m_k,\dots,m_1,1}_{n_{k+1},n_k,\dots,n_1,1}B)\otimes C & \cdots & (A_{m_{k+1}}\otimes^{1,m_k,\dots,m_1,1}_{n_{k+1},n_k,\dots,n_1,1}B)\otimes C\end{bmatrix}$$
$$= (A\otimes^{m_{k+1},m_k,\dots,m_1,1}_{n_{k+1},n_k,\dots,n_1,1}B)\otimes C$$
Proposition 5.6. For any matrices $A$, $B$, $C$

$$A\otimes(B\otimes^{m_1,\dots,m_s}_{n_1,\dots,n_s}C) = (A\otimes B)\otimes^{q,m_1,\dots,m_s}_{1,n_1,\dots,n_s}C$$

where $q$ is the number of columns of the matrix $A$, assuming that the Kronecker products exist.
Proof. Observe that $X\otimes^{m_1,\dots,m_s}_{n_1,\dots,n_s}Y = X\otimes^{m_1,\dots,m_s,1}_{n_1,\dots,n_s,1}Y$, and $A\otimes(B\otimes^1_1 C) = (A\otimes B)\otimes^1_1 C = (A\otimes B)\otimes^{q,1}_{1,1}C$, since $A\otimes(B\otimes C) = (A\otimes B)\otimes C$ and $A\otimes^q_1 C = A\otimes C$ if the Kronecker product exists. Assume $A\otimes(B\otimes^{m_k,\dots,m_1,1}_{n_k,\dots,n_1,1}C) = (A\otimes B)\otimes^{q,m_k,\dots,m_1,1}_{1,n_k,\dots,n_1,1}C$ for some $k\geq 0$. Let $A = [A_1,\dots,A_q]$ be the partition of $A$ into columns. Then

$$A_i\otimes(B\otimes^{1,m_k,\dots,m_1,1}_{n_{k+1},n_k,\dots,n_1,1}C)$$
$$= \begin{bmatrix}A_i\otimes(B\otimes^{m_k,\dots,m_1,1}_{n_k,\dots,n_1,1}C_1) & \cdots & A_i\otimes(B\otimes^{m_k,\dots,m_1,1}_{n_k,\dots,n_1,1}C_{n_{k+1}})\end{bmatrix}$$
$$= \begin{bmatrix}(A_i\otimes B)\otimes^{m_k,\dots,m_1,1}_{n_k,\dots,n_1,1}C_1 & \cdots & (A_i\otimes B)\otimes^{m_k,\dots,m_1,1}_{n_k,\dots,n_1,1}C_{n_{k+1}}\end{bmatrix}$$
$$= (A_i\otimes B)\otimes^{1,m_k,\dots,m_1,1}_{n_{k+1},n_k,\dots,n_1,1}C$$

Similarly

$$A_i\otimes(B\otimes^{m_{k+1},m_k,\dots,m_1,1}_{n_{k+1},n_k,\dots,n_1,1}C)$$
$$= \begin{bmatrix}A_i\otimes(B_1\otimes^{1,m_k,\dots,m_1,1}_{n_{k+1},n_k,\dots,n_1,1}C) & \cdots & A_i\otimes(B_{m_{k+1}}\otimes^{1,m_k,\dots,m_1,1}_{n_{k+1},n_k,\dots,n_1,1}C)\end{bmatrix}$$
$$= \begin{bmatrix}(A_i\otimes B_1)\otimes^{1,m_k,\dots,m_1,1}_{n_{k+1},n_k,\dots,n_1,1}C & \cdots & (A_i\otimes B_{m_{k+1}})\otimes^{1,m_k,\dots,m_1,1}_{n_{k+1},n_k,\dots,n_1,1}C\end{bmatrix}$$
$$= (A_i\otimes B)\otimes^{m_{k+1},m_k,\dots,m_1,1}_{n_{k+1},n_k,\dots,n_1,1}C$$

Finally

$$A\otimes(B\otimes^{m_{k+1},m_k,\dots,m_1,1}_{n_{k+1},n_k,\dots,n_1,1}C)$$
$$= \begin{bmatrix}A_1\otimes(B\otimes^{m_{k+1},\dots,m_1,1}_{n_{k+1},\dots,n_1,1}C) & \cdots & A_q\otimes(B\otimes^{m_{k+1},\dots,m_1,1}_{n_{k+1},\dots,n_1,1}C)\end{bmatrix}$$
$$= \begin{bmatrix}(A_1\otimes B)\otimes^{m_{k+1},\dots,m_1,1}_{n_{k+1},\dots,n_1,1}C & \cdots & (A_q\otimes B)\otimes^{m_{k+1},\dots,m_1,1}_{n_{k+1},\dots,n_1,1}C\end{bmatrix}$$
$$= (A\otimes B)\otimes^{q,m_{k+1},m_k,\dots,m_1,1}_{1,n_{k+1},n_k,\dots,n_1,1}C$$
Proposition 5.7. Let $A$ be an $m\times n$ matrix and let $B$ be a $p\times q$ matrix. Then

$$A\otimes^1_q B = (I_m\otimes^1_p I_p)\times(B\otimes A)$$
Proof. Let $A_i$ denote the $i$-th column of $A$ and $B_j$ the $j$-th column of $B$. Let $I_p^r$ denote the $r$-th column of the $p\times p$ identity matrix and let $B_{rj}$ denote the element of $B$ in the $r$-th row and $j$-th column. Then

$$(I_m\otimes^1_p I_p)\times(B_j\otimes A_i) = \begin{bmatrix}I_m\otimes I_p^1 & \cdots & I_m\otimes I_p^p\end{bmatrix}\times\begin{bmatrix}B_{1j}A_i\\ \vdots\\ B_{pj}A_i\end{bmatrix}$$
$$= \sum_{r=1}^{p}(I_m\otimes I_p^r)\times(A_i\otimes B_{rj}) = \sum_{r=1}^{p}A_i\otimes(I_p^r\times B_{rj}) = A_i\otimes B_j$$

Further

$$(I_m\otimes^1_p I_p)\times(B_j\otimes A) = (I_m\otimes^1_p I_p)\times\begin{bmatrix}B_j\otimes A_1 & \cdots & B_j\otimes A_n\end{bmatrix} = \begin{bmatrix}A_1\otimes B_j & \cdots & A_n\otimes B_j\end{bmatrix} = A\otimes B_j$$

and

$$(I_m\otimes^1_p I_p)\times(B\otimes A) = (I_m\otimes^1_p I_p)\times\begin{bmatrix}B_1\otimes A & \cdots & B_q\otimes A\end{bmatrix} = \begin{bmatrix}A\otimes B_1 & \cdots & A\otimes B_q\end{bmatrix} = A\otimes^1_q B$$
Proposition 5.8. Let $A$ be an $m\times n$ matrix and let $B$ be a $p\times q$ matrix. Then

$$A\otimes^{1,m_1,1,m_2,\dots,1,m_s,1}_{n_1,1,n_2,1,\dots,n_s,1,q/\bar m}B = (I_m\otimes^1_p I_p)\times(B\otimes^{m_1,\dots,m_s}_{n_1,\dots,n_s}A)$$

where $\bar m = m_1\times\cdots\times m_s$, assuming that the Kronecker products exist.
Proof. The proposition holds for $s = 0$ by Proposition 5.7. Assume that for some $s\geq 0$

$$A\otimes^{1,m_s,1,\dots,1,m_1,1}_{n_s,1,\dots,n_1,1,q/\bar m_s}B = (I_m\otimes^1_p I_p)\times(B\otimes^{m_s,\dots,m_1}_{n_s,\dots,n_1}A)$$

where $\bar m_s = m_1\times\cdots\times m_s$. Then

$$(I_m\otimes^1_p I_p)\times(B\otimes^{1,m_s,\dots,m_1}_{n_{s+1},n_s,\dots,n_1}A)$$
$$= (I_m\otimes^1_p I_p)\times\begin{bmatrix}B\otimes^{m_s,\dots,m_1}_{n_s,\dots,n_1}A_1 & \cdots & B\otimes^{m_s,\dots,m_1}_{n_s,\dots,n_1}A_{n_{s+1}}\end{bmatrix}$$
$$= \begin{bmatrix}A_1\otimes^{1,m_s,1,\dots,1,m_1,1}_{n_s,1,\dots,n_1,1,q/\bar m_s}B & \cdots & A_{n_{s+1}}\otimes^{1,m_s,1,\dots,1,m_1,1}_{n_s,1,\dots,n_1,1,q/\bar m_s}B\end{bmatrix}$$
$$= A\otimes^{1,1,m_s,1,\dots,1,m_1,1}_{n_{s+1},n_s,1,\dots,n_1,1,q/\bar m_s}B$$

and

$$(I_m\otimes^1_p I_p)\times(B\otimes^{m_{s+1},m_s,\dots,m_1}_{n_{s+1},n_s,\dots,n_1}A)$$
$$= (I_m\otimes^1_p I_p)\times\begin{bmatrix}B_1\otimes^{1,m_s,\dots,m_1}_{n_{s+1},n_s,\dots,n_1}A & \cdots & B_{m_{s+1}}\otimes^{1,m_s,\dots,m_1}_{n_{s+1},n_s,\dots,n_1}A\end{bmatrix}$$
$$= \begin{bmatrix}A\otimes^{1,1,m_s,1,\dots,1,m_1,1}_{n_{s+1},n_s,1,\dots,n_1,1,q/\bar m_{s+1}}B_1 & \cdots & A\otimes^{1,1,m_s,1,\dots,1,m_1,1}_{n_{s+1},n_s,1,\dots,n_1,1,q/\bar m_{s+1}}B_{m_{s+1}}\end{bmatrix}$$
$$= A\otimes^{1,m_{s+1},1,m_s,1,\dots,1,m_1,1}_{n_{s+1},1,n_s,1,\dots,n_1,1,q/\bar m_{s+1}}B$$
Proposition 5.9.

$$(I_m\otimes^1_q I_q)^{-1} = I_q\otimes^1_m I_m$$

Proof. Observe that $I_m\otimes^1_q I_q$ is an orthogonal matrix, since it can be obtained by permuting the columns of the matrix $I_{mq}$. Hence $(I_m\otimes^1_q I_q)^{-1} = (I_m\otimes^1_q I_q)^T$. Further

$$(I_m\otimes^1_q I_q)^T = \begin{bmatrix}I_m\otimes(I_q^1)^T\\ \vdots\\ I_m\otimes(I_q^q)^T\end{bmatrix} = \begin{bmatrix}(I_q^1)^T\otimes^1_m I_m\\ \vdots\\ (I_q^q)^T\otimes^1_m I_m\end{bmatrix} = I_q\otimes^1_m I_m$$

The second equality can be shown using, for example, Proposition 5.7.
Proposition 5.10.

$$(I_m\otimes^k_n I_q)^{-1} = (I_{nm}\otimes^{nk}_{q/n}I_{q/n})\times(I_{kq}\otimes^k_{m/k}I_{m/k})$$

assuming that the Kronecker products exist.

Proof. Observe that $I_m\otimes^1_n I_q$ is an orthogonal matrix, since it can be obtained by permuting the columns of the matrix $I_{mq}$. Hence $(I_m\otimes^1_n I_q)^{-1} = (I_m\otimes^1_n I_q)^T$. Further

$$\left(I_n\otimes(I_{q/n}\otimes^1_m I_m)\right)\times(I_m\otimes^1_n I_q)^T = \begin{bmatrix}(I_{q/n}\otimes^1_m I_m)\times\left(I_m\otimes(I_q^1)^T\right)\\ \vdots\\ (I_{q/n}\otimes^1_m I_m)\times\left(I_m\otimes(I_q^n)^T\right)\end{bmatrix} = \begin{bmatrix}(I_q^1)^T\otimes^1_m I_m\\ \vdots\\ (I_q^n)^T\otimes^1_m I_m\end{bmatrix} = I_q\otimes^1_m I_m$$

where $I_q^j$ denotes the $j$-th block of the partition of $I_q$ into $n$ column blocks. The second equality can be shown using, for example, Proposition 5.7. Hence

$$(I_m\otimes^1_n I_q)^{-1} = \left(I_n\otimes(I_{q/n}\otimes^1_m I_m)^{-1}\right)\times(I_q\otimes^1_m I_m) = \left(I_n\otimes(I_m\otimes^1_{q/n}I_{q/n})\right)\times(I_q\otimes^1_m I_m)$$
$$= (I_{nm}\otimes^n_{q/n}I_{q/n})\times(I_q\otimes^1_m I_m)$$

Further

$$\left(I_m\otimes^k_n I_q\right)^{-1} = \left((I_k\otimes I_{m/k})\otimes^{k,1}_{1,n}I_q\right)^{-1} = \left(I_k\otimes(I_{m/k}\otimes^1_n I_q)\right)^{-1}$$
$$= I_k\otimes\left((I_{nm/k}\otimes^n_{q/n}I_{q/n})\times(I_q\otimes^1_{m/k}I_{m/k})\right)$$
$$= \left(I_k\otimes(I_{nm/k}\otimes^n_{q/n}I_{q/n})\right)\times\left(I_k\otimes(I_q\otimes^1_{m/k}I_{m/k})\right)$$
$$= (I_{nm}\otimes^{nk}_{q/n}I_{q/n})\times(I_{kq}\otimes^k_{m/k}I_{m/k})$$
Proposition 5.11. Let $A$ be an $m\times n$ matrix and let $B$ be a $p\times q$ matrix. Then

$$A\otimes B = (I_m\otimes^1_p I_p)\times(B\otimes A)\times(I_q\otimes^1_n I_n)$$

Proof. By Propositions 5.7, 5.4 and 5.9,

$$(I_m\otimes^1_p I_p)\times(B\otimes A) = A\otimes^1_q B = (A\otimes B)\times(I_n\otimes^1_q I_q) = (A\otimes B)\times(I_q\otimes^1_n I_n)^{-1}$$

and multiplying both sides on the right by $I_q\otimes^1_n I_n$ gives the result.
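Propositions 5.7, 5.9 and 5.11 lend themselves to direct numerical verification; in the following sketch the helper `perm` builds $I_m\otimes^1_p I_p$ from Definition 2.4 (the dimensions are illustrative):

```python
import numpy as np

def perm(m, p):
    """The permutation matrix I_m (x)^1_p I_p of Proposition 5.7, built
    directly from Definition 2.4: I_m is Kronecker-multiplied with each
    column of I_p in turn and the results are concatenated."""
    Ip = np.eye(p)
    return np.hstack([np.kron(np.eye(m), Ip[:, [j]]) for j in range(p)])

rng = np.random.default_rng(5)
m, n, p, q = 3, 4, 2, 5
A = rng.standard_normal((m, n))
B = rng.standard_normal((p, q))

# Proposition 5.7: A (x)^1_q B = (I_m (x)^1_p I_p)(B kron A), where the
# left-hand side is [A kron B_1, ..., A kron B_q] over the columns of B.
lhs = np.hstack([np.kron(A, B[:, [j]]) for j in range(q)])
assert np.allclose(lhs, perm(m, p) @ np.kron(B, A))

# Proposition 5.9: inverting the permutation swaps its two indices.
assert np.allclose(np.linalg.inv(perm(m, p)), perm(p, m))

# Proposition 5.11: the commutation identity for the plain Kronecker product.
assert np.allclose(np.kron(A, B), perm(m, p) @ np.kron(B, A) @ perm(q, n))
```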
6 Concluding remarks
The derived formulas require matrix tensor products, which are absent when derivatives are represented as a concatenation of derivatives with respect to scalar parameters; hence this approach may decrease numerical efficiency. This problem, however, can be resolved using appropriate data structures.
References
[1] T. P. Minka. Old and new matrix algebra useful for statistics. Notes, December 2000.

[2] K. B. Petersen and M. S. Pedersen. The matrix cookbook. Notes, 2006.