
SCIENCE CHINA Technological Sciences

September 2021, Vol. 64, No. 9: 1893–1906
https://doi.org/10.1007/s11431-020-1840-4

© Science China Press and Springer-Verlag GmbH Germany, part of Springer Nature 2021
tech.scichina.com | link.springer.com

Article | Special Topic: Tensor Methods in Machine Learning

An inexact alternating proximal gradient algorithm for nonnegative CP tensor decomposition

WANG DeQing 1,2* & CONG FengYu 1,2,3,4*

1 School of Biomedical Engineering, Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian 116024, China;
2 Faculty of Information Technology, University of Jyväskylä, Jyväskylä 40100, Finland;
3 School of Artificial Intelligence, Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian 116024, China;
4 Key Laboratory of Integrated Circuit and Biomedical Electronic System, Liaoning Province, Dalian University of Technology, Dalian 116024, China

Received December 7, 2020; accepted April 21, 2021; published online July 20, 2021

Nonnegative tensor decomposition has become increasingly important for multiway data analysis in recent years. The alternating proximal gradient (APG) is a popular optimization method for nonnegative tensor decomposition in the block coordinate descent framework. In this study, we propose an inexact version of the APG algorithm for nonnegative CANDECOMP/PARAFAC decomposition, wherein each factor matrix is updated by only a finite number of inner iterations. We also propose a parameter warm-start method that avoids the frequent parameter resetting of conventional APG methods and improves convergence performance. In experimental tests, we find that limiting the number of inner iterations to around 10 to 20 accelerates convergence significantly without sacrificing the low relative error. We evaluate our method on both synthetic and real-world tensors. The results demonstrate that the proposed inexact APG algorithm exhibits outstanding performance in both convergence speed and computational precision compared with existing popular algorithms.

tensor decomposition, nonnegative CANDECOMP/PARAFAC, block coordinate descent, alternating proximal gradient, inexact scheme

Citation: Wang D Q, Cong F Y. An inexact alternating proximal gradient algorithm for nonnegative CP tensor decomposition. Sci China Tech Sci, 2021, 64: 1893–1906, https://doi.org/10.1007/s11431-020-1840-4

1 Introduction

Tensor decomposition has become very popular for multiway data analysis [1, 2]. In many real applications, tensor data are naturally nonnegative, as are the intrinsic components. Therefore, nonnegative tensor decomposition, especially the nonnegative CANDECOMP/PARAFAC (NCP) type, has become increasingly important for processing nonnegative high-order tensor data [1, 2]. The NCP decomposition can be seen as a constrained non-convex optimization problem. The block coordinate descent (BCD) framework provides an efficient way to solve such non-convex problems [3, 4], wherein each factor matrix is updated alternately as a subproblem. In the BCD framework, many optimization algorithms have been proposed for constrained tensor decomposition, such as the alternating proximal gradient (APG) [3, 5], hierarchical alternating least squares (HALS) [6, 7], and the alternating optimization-based alternating direction method of multipliers (AO-ADMM) [8]. The APG algorithm has been particularly favored in recent years.

*Corresponding authors (email: deqing.wang@foxmail.com; cong@dlut.edu.cn)

The essence of APG is based on Nesterov's optimal gradient method [9], also known as the accelerated proximal gradient method [10], which has been adapted to solve many nonconvex and nonsmooth problems [11, 12]. Nesterov's accelerated method has been extended to nonnegative matrix factorization (NMF) and NCP problems [3, 5, 13]. The APG method in [3] updates each factor matrix only once per subproblem, whereas another version in [5, 13] updates each factor matrix using multiple inner iterations in the subproblem. Recently, several APG variants for tensor decomposition have been proposed. Liavas et al. [14] proposed an NCP algorithm using Nesterov's method under an L-smooth, µ-strongly convex condition. Le et al. [15] introduced an NCP algorithm using an improved Nesterov's method with two different extrapolation points. Nesterov's accelerated method has also been used to solve the all-at-once version of CANDECOMP/PARAFAC (CP) decomposition [16, 17]. Some of the above APG-based methods use multiple inner iterations with a stopping criterion [5, 13, 18]. However, they did not analyze the influence of the number of inner iterations of APG on NCP.

The number of inner iterations in the APG subproblem significantly affects the performance of NCP. For the APG in [3], one inner iteration was inadequate for minimizing the subproblem, which rendered the NCP too inefficient to decompose large-scale tensor data. The APG method with multiple inner iterations [5, 13] can deeply minimize the subproblem, using a stopping criterion such as the projected gradient norm or the relative change of the objective function. However, the subproblem sometimes requires hundreds of inner iterations to exactly reach the stopping criterion, which might still render the NCP problem inefficient. Deep minimization of the subproblem does not necessarily result in efficient minimization of the NCP problem. For example, because all factor matrices are initialized by random numbers in the BCD framework, it does not make sense to compute an accurate solution to the subproblem in the first several outer iterations. In fact, the subproblem of tensor decomposition can be iterated only a limited number of times with an inexact solution [19], which is the main idea of the inexact scheme. In recent years, the inexact scheme that utilizes a finite number of inner iterations in the subproblem has become increasingly popular for BCD-based problems [20, 21]. The inexact scheme is also known as the early stopping method [22] or the inaccurate subproblem method [23]. It has been reported that the inexact scheme can achieve faster computational speeds and better local minima than the exact scheme [22, 24]. Moreover, the inexact scheme has been employed with APG for many problems in broad areas, such as quadratic semidefinite programming [25], maximal entropy [26], and tensor recovery [27]. Therefore, the inexact scheme and APG can naturally be extended to the NCP problem.

In this work, we propose an inexact APG (iAPG) algorithm for NCP tensor decomposition. First, we update each factor matrix in the subproblem using the same finite number of inner iterations, which is designed as a structure of multiple layers. Second, we propose a parameter warm-start strategy that saves some key parameters after updating one factor matrix and initializes these parameters when updating the same factor matrix in the next loop. This strategy avoids the frequent parameter resetting of conventional methods [5, 13] and accelerates their convergence. Moreover, we observe that the proposed iAPG shows the best performance when the number of inner iterations is limited to around 10 to 20. For matrix or tensor decomposition, the ratio of the number of data entries to the number of degrees of freedom is an important indicator of the difficulty level of the problem [28]. Therefore, based on the ratios of all subproblems, we automatically select the appropriate number of inner iterations between 10 and 20. We use both synthetic and real-world tensor data to evaluate the proposed iAPG method and compare it with conventional APG algorithms and other popular NCP methods.

2 Preliminaries

In this paper, we denote a vector by a boldface lowercase letter, such as $\mathbf{x}$; a matrix by a boldface uppercase letter, such as $\mathbf{X}$; and a tensor by a boldface Euler script letter, such as $\mathcal{X}$. The operator $\circ$ represents the outer product of vectors, $*$ represents the Hadamard product, $\odot$ represents the Khatri-Rao product, $\langle \cdot,\cdot \rangle$ denotes the inner product, $\llbracket \cdot \rrbracket$ denotes the Kruskal operator, and $\|\cdot\|_F$ denotes the Frobenius norm.

We introduce the NCP decomposition as follows. Given a nonnegative $N$th-order tensor $\mathcal{X} \in \mathbb{R}_+^{I_1 \times I_2 \times \cdots \times I_N}$, the NCP can be represented by the following minimization problem:
$$\min_{A^{(1)},\ldots,A^{(N)}} \frac{1}{2}\left\| \mathcal{X} - \llbracket A^{(1)},\ldots,A^{(N)} \rrbracket \right\|_F^2 \quad \text{s.t. } A^{(n)} \geq 0 \text{ for } n = 1,\ldots,N, \tag{1}$$
where $A^{(n)} \in \mathbb{R}_+^{I_n \times R}$ for $n = 1,\ldots,N$ are the estimated factor matrices in different modes, $I_n$ is the size of mode $n$, and $R$ is the predefined number of components, which can be seen as the selected rank. We denote the objective function of eq. (1) by $F_{\text{tensor}}(A)$.

Let $X_{(n)} \in \mathbb{R}^{I_n \times \prod_{\tilde{n}=1,\tilde{n}\neq n}^{N} I_{\tilde{n}}}$ represent the mode-$n$ unfolding of the original tensor $\mathcal{X}$. The mode-$n$ unfolding of $\llbracket A^{(1)},\ldots,A^{(N)} \rrbracket$ can be written as $A^{(n)}\big(B^{(n)}\big)^T$, where
$$B^{(n)} = A^{(N)} \odot \cdots \odot A^{(n+1)} \odot A^{(n-1)} \odot \cdots \odot A^{(1)} \in \mathbb{R}^{\prod_{\tilde{n}=1,\tilde{n}\neq n}^{N} I_{\tilde{n}} \times R}.$$
In the block coordinate descent framework [3, 4], each factor matrix $A^{(n)}$ in the $k$th iteration is updated alternately by the following subproblem:
$$A_k^{(n)} = \arg\min_{A^{(n)} \geq 0} \frac{1}{2}\left\| X_{(n)} - A^{(n)}\big(B_{k-1}^{(n)}\big)^T \right\|_F^2. \tag{2}$$
We denote the objective function of eq. (2) by $F\big(A^{(n)}\big)$. The subproblem (2) must be further solved using an optimization algorithm.

The partial derivative of $F\big(A^{(n)}\big)$ with respect to $A^{(n)}$ is
$$\frac{\partial F\big(A^{(n)}\big)}{\partial A^{(n)}} = A^{(n)}\left[\big(B^{(n)}\big)^T B^{(n)}\right] - X_{(n)} B^{(n)}. \tag{3}$$
The term $X_{(n)} B^{(n)}$ is called the matricized tensor times Khatri-Rao product (MTTKRP) [2], and the term $\big(B^{(n)}\big)^T B^{(n)}$ can be computed efficiently by
$$\big(B^{(n)}\big)^T B^{(n)} = \left[\big(A^{(N)}\big)^T A^{(N)}\right] * \cdots * \left[\big(A^{(n+1)}\big)^T A^{(n+1)}\right] * \left[\big(A^{(n-1)}\big)^T A^{(n-1)}\right] * \cdots * \left[\big(A^{(1)}\big)^T A^{(1)}\right]. \tag{4}$$
Both $X_{(n)} B^{(n)}$ and $\big(B^{(n)}\big)^T B^{(n)}$ will be used frequently in the computation.

3 Alternating proximal gradient algorithms

Nesterov's optimal gradient method [9] can solve the subproblem (2) very efficiently [3, 5, 13]; this is the key idea of APG.

3.1 APG with one inner iteration

Xu et al. [3, 29] proposed APG-based algorithms for both nonnegative matrix factorization and nonnegative tensor decomposition. In their computations, the subproblem of eq. (2) was updated only once by Nesterov's optimal gradient method. In other words, there was only one inner iteration for the subproblem. We denote this algorithm as APG-1. The update of $A^{(n)}$ in eq. (2) is computed by the following steps.

Let
$$L_{k-1}^{(n)} = \left\| \big(B_{k-1}^{(n)}\big)^T B_{k-1}^{(n)} \right\|_2 \tag{5}$$
represent the Lipschitz constant, where $\|\cdot\|_2$ is the spectral norm of a matrix.

An extrapolation weight is computed by
$$\omega_{k-1}^{(n)} = \min\left( \hat{\omega}_{k-1},\ \delta_\omega \sqrt{\frac{L_{k-2}^{(n)}}{L_{k-1}^{(n)}}} \right), \tag{6}$$
where $\delta_\omega < 1$ is predefined, and $\hat{\omega}_{k-1} = \frac{t_{k-1}-1}{t_k}$ with $t_0 = 1$ and $t_k = \frac{1}{2}\left(1 + \sqrt{1 + 4t_{k-1}^2}\right)$. Using the extrapolation weight, let
$$\widehat{A}_{k-1}^{(n)} = A_{k-1}^{(n)} + \omega_{k-1}^{(n)}\big(A_{k-1}^{(n)} - A_{k-2}^{(n)}\big) \tag{7}$$
denote an extrapolated point, and let
$$\widehat{G}_{k-1}^{(n)} = \widehat{A}_{k-1}^{(n)} \big(B_{k-1}^{(n)}\big)^T B_{k-1}^{(n)} - X_{(n)} B_{k-1}^{(n)} \tag{8}$$
represent the gradient at $\widehat{A}_{k-1}^{(n)}$. Factor $A^{(n)}$ at iteration $k$ is updated by
$$A_k^{(n)} = \arg\min_{A^{(n)} \geq 0} \left\langle \widehat{G}_{k-1}^{(n)},\ A^{(n)} - \widehat{A}_{k-1}^{(n)} \right\rangle + \frac{L_{k-1}^{(n)}}{2} \left\| A^{(n)} - \widehat{A}_{k-1}^{(n)} \right\|_F^2 = \operatorname{prox}_{\frac{1}{L_{k-1}^{(n)}}}\left( \widehat{A}_{k-1}^{(n)} - \frac{\widehat{G}_{k-1}^{(n)}}{L_{k-1}^{(n)}} \right). \tag{9}$$
The closed form of eq. (9) can be written as
$$A_k^{(n)} = \max\left( 0,\ \widehat{A}_{k-1}^{(n)} - \frac{\widehat{G}_{k-1}^{(n)}}{L_{k-1}^{(n)}} \right). \tag{10}$$

The APG-1 algorithm is not monotone [12]; hence, a restart strategy is used to enhance the decrease of the objective function. Specifically, if $F_{\text{tensor}}(A_k) > F_{\text{tensor}}(A_{k-1})$, we reset $\widehat{A}_{k-1}^{(n)}$ to $A_{k-1}^{(n)}$ and recompute $A_k^{(n)}$, $n = 1,\ldots,N$. The implementation of APG-1 is shown in Algorithm 1.

3.2 APG with multiple inner iterations

Guan et al. [13] employed Nesterov's optimal gradient with multiple inner iterations to solve the subproblem of nonnegative matrix factorization. Later, this method was extended to nonnegative tensor decomposition [5]. Specifically, in the subproblem (2), the factor $A^{(n)}$ was updated multiple times until a stopping criterion was exactly reached. We denote this algorithm as APG-m. Guan et al. [13, 30] proposed an inner stopping criterion based on the projected gradient. The computation of the projected gradient for NMF naturally extends to the NCP.

Algorithm 1: APG with one inner iteration (APG-1)

Input: $\mathcal{X}$, $R$
Output: $A^{(n)}$, $n = 1, \ldots, N$

1. Initialize $A^{(n)} \in \mathbb{R}_+^{I_n \times R}$, $n = 1, \ldots, N$, using nonnegative random numbers;
2. for $k = 1, 2, \ldots$ do
3.   for $n = 1$ to $N$ do
4.     Make the mode-$n$ unfolding of $\mathcal{X}$ as $X_{(n)}$ and compute the MTTKRP $X_{(n)} B_{k-1}^{(n)}$;
5.     Compute $\big(B_{k-1}^{(n)}\big)^T B_{k-1}^{(n)}$ and $L_{k-1}^{(n)}$ based on eqs. (4) and (5);
6.     Compute $\omega_{k-1}^{(n)}$, $\widehat{A}_{k-1}^{(n)}$ and $\widehat{G}_{k-1}^{(n)}$ according to eqs. (6)–(8);
7.     Update $A_k^{(n)}$ according to eq. (10);
8.   end
9.   if $F_{\text{tensor}}(A_k) > F_{\text{tensor}}(A_{k-1})$ then
10.    Update $A_k^{(n)}$ again according to eq. (10) with $\widehat{A}_{k-1}^{(n)} = A_{k-1}^{(n)}$, $n = 1, \ldots, N$;
11.  end
12.  if the outer termination criterion is reached then
13.    return $A_k^{(n)}$ for $n = 1, \ldots, N$;
14.  end
15. end
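For concreteness, one APG-1 factor update (steps 5–7 of Algorithm 1, i.e., eqs. (5)–(10)) can be sketched in MATLAB as follows; the caller carries the previous iterate, the previous Lipschitz constant, and the $t$-sequence, and the names are ours:

```matlab
% Sketch of eqs. (5)-(10): one APG-1 update of a single factor.
% M = X_(n)*B^(n), BtB = (B^(n))'*B^(n); A and Aprev are the current and
% previous iterates; dw is the predefined delta_omega < 1. Names are ours.
function [Anew, L, t] = apg1_step(M, BtB, A, Aprev, Lprev, tprev, dw)
    L = norm(BtB, 2);                                % Lipschitz constant, eq. (5)
    t = 0.5 * (1 + sqrt(1 + 4 * tprev^2));           % Nesterov's t-sequence
    w = min((tprev - 1) / t, dw * sqrt(Lprev / L));  % extrapolation weight, eq. (6)
    Ahat = A + w * (A - Aprev);                      % extrapolated point, eq. (7)
    Ghat = Ahat * BtB - M;                           % gradient at Ahat, eq. (8)
    Anew = max(0, Ahat - Ghat / L);                  % proximal step, eqs. (9)-(10)
end
```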

In the subproblem (2), the projected gradient of the nonnegative factor $A^{(n)}$ is defined by
$$\left[\nabla_{A^{(n)}}^P F\big(A^{(n)}\big)\right]_{ir} = \begin{cases} \left[\nabla_{A^{(n)}} F\big(A^{(n)}\big)\right]_{ir}, & \big(A^{(n)}\big)_{ir} > 0, \\ \min\left\{0,\ \left[\nabla_{A^{(n)}} F\big(A^{(n)}\big)\right]_{ir}\right\}, & \big(A^{(n)}\big)_{ir} = 0, \end{cases} \tag{11}$$
where $i = 1,\ldots,I_n$ and $r = 1,\ldots,R$. The inner iterations stop when the following condition is satisfied exactly:
$$\left\| \nabla_{A^{(n)}}^P F\big(A^{(n)}\big) \right\|_F \leq \epsilon_{A^{(n)}}, \tag{12}$$
where $\epsilon_{A^{(n)}}$ is the local tolerance of the subproblem. At the beginning of the tensor decomposition, $\epsilon_{A^{(n)}}$ can be set to
$$\epsilon_{A^{(n)}} = \max\big(10^{-3}, \epsilon\big) \times \left\| \left[ \nabla_{A_0^{(1)}}^P F\big(A_0^{(1)}\big), \ldots, \nabla_{A_0^{(N)}}^P F\big(A_0^{(N)}\big) \right] \right\|_F, \tag{13}$$
where $\epsilon$ is a global tolerance (e.g., $10^{-6}$ or $10^{-8}$). The tolerance $\epsilon_{A^{(n)}}$ is dynamically adjustable: when the number of inner iterations falls below 10, $\epsilon_{A^{(n)}}$ is updated by $\epsilon_{A^{(n)}} \leftarrow \epsilon_{A^{(n)}}/10$.
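A minimal sketch of the projected-gradient test of eqs. (11) and (12) for one factor, where G is the plain gradient from eq. (3); the names are ours:

```matlab
% Sketch of eqs. (11)-(12): projected gradient norm as the inner
% stopping test for a nonnegative factor A. Names are ours.
function stop = pg_stop(A, G, eps_local)
    PG = G;                                % keep the gradient where A > 0
    z = (A == 0);
    PG(z) = min(0, G(z));                  % eq. (11): only descent directions at zero
    stop = norm(PG, 'fro') <= eps_local;   % eq. (12)
end
```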

The implementation details of the subproblem in APG-m are shown in Algorithm 2. The whole APG-m algorithm is shown in Algorithm 3.

4 Inexact alternating proximal gradient algorithm

In this section, we introduce the proposed iAPG algorithm.

4.1 Inexact scheme

As mentioned, to solve (2), Xu et al. [3, 29] updated $A^{(n)}$ only once in the inner loop of the APG. However, one iteration is not enough to minimize the subproblem, which makes the NCP inefficient, especially for large-scale tensor data. On the other hand, for the APG with multiple inner iterations, the projected gradient norm is employed as the inner stopping criterion [13], which might require tens or hundreds of inner iterations. Although the inner termination method using an exact threshold on the projected gradient norm will fully minimize the subproblem, too many inner iterations might not efficiently minimize the NCP problem. The number of inner iterations, or equivalently the accuracy of the subproblem, is a key parameter determining the performance of the APG for NCP.

In fact, the update of $A^{(n)}$ in the subproblem using Nesterov's optimal gradient method can be iterated only a limited number of times without obtaining an exact solution, which is the main idea of the inexact scheme. It has been reported that the inexact scheme can save computation time and achieve a better minimum of the entire non-convex optimization problem [22].

Algorithm 2: Subproblem of APG-m (APG_SUB)

Input: $XB$, $B^T B$, $A_0$
Output: $A$

1. $L = \|B^T B\|_2$; $\widehat{A}_0 = A_0$; $t_0 = 1$; $j = 0$;
2. repeat
3.   $j \leftarrow j + 1$;
4.   $\widehat{G}_{j-1} = \widehat{A}_{j-1} B^T B - XB$;
5.   $A_j = \max\left(0,\ \widehat{A}_{j-1} - \widehat{G}_{j-1}/L\right)$;
6.   $t_j = \frac{1}{2}\left(1 + \sqrt{1 + 4 t_{j-1}^2}\right)$;
7.   $\omega_j = \frac{t_{j-1} - 1}{t_j}$;
8.   $\widehat{A}_j = A_j + \omega_j (A_j - A_{j-1})$;
9. until the inner stopping criterion (12) is reached;
10. $A = A_j$.

Algorithm 3: APG with multiple inner iterations (APG-m)

Input: $\mathcal{X}$, $R$
Output: $A^{(n)}$, $n = 1, \ldots, N$

1. Initialize $A_0^{(n)} \in \mathbb{R}_+^{I_n \times R}$, $n = 1, \ldots, N$, using nonnegative random numbers;
   % The outer loop starts here.
2. for $k = 1, 2, \ldots$ do
3.   for $n = 1$ to $N$ do
4.     Make the mode-$n$ unfolding of $\mathcal{X}$ as $X_{(n)}$ and compute the MTTKRP $X_{(n)} B_{k-1}^{(n)}$;
5.     Compute $\big(B_{k-1}^{(n)}\big)^T B_{k-1}^{(n)}$;
       % The inner loop starts here.
6.     $A_k^{(n)} = \text{APG\_SUB}\left(X_{(n)} B_{k-1}^{(n)},\ \big(B_{k-1}^{(n)}\big)^T B_{k-1}^{(n)},\ A_{k-1}^{(n)}\right)$;
       % The inner loop ends here.
7.   end
8.   if the outer termination criterion is reached then
9.     return $A_k^{(n)}$ for $n = 1, \ldots, N$;
10.  end
11. end
    % The outer loop ends here.

The NCP tensor decomposition can also be seen as a non-convex optimization problem. In this study, we propose a straightforward and efficient way to terminate the inner loop of the APG subproblem by a limited maximum iteration number $J$, which can be a small integer, such as $J = 20$.

4.2 Warm-start parameters

By carefully observing the APG-1 algorithm in Sect. 3.1, we find that the extrapolation weights $\frac{t_{k-1}-1}{t_k}$ increase gradually with the outer iteration $k$ under Nesterov's rules, as shown in Figure 1. After adequate iterations, the factor difference $A_{k-1}^{(n)} - A_{k-2}^{(n)}$ in eq. (7) is very small. Hence, it is reasonable to increase the extrapolation weights for computing the extrapolated points. However, if we turn to the APG-m subproblem in Algorithm 2, we find that, at the beginning of each inner loop, the extrapolation weight $\omega_j$ is reset to zero by setting $t_0 = 1$.

[Figure 1] (Color online) The change of extrapolation weights $(t_{k-1}-1)/t_k$ over the outer iteration index $k$ in Nesterov's optimal gradient method, where $t_0 = 1$ and $t_k = \frac{1}{2}\left(1 + \sqrt{1 + 4t_{k-1}^2}\right)$.

Algorithm 4: Subproblem of iAPG (IAPG_SUB)

Input: $XB$, $B^T B$, $\widehat{A}_0$, $t_0$, $J$
Output: $A$, $\widehat{A}$, $t$

1. $L = \|B^T B\|_2$;
2. for $j = 1$ to $J$ do
3.   $\widehat{G}_{j-1} = \widehat{A}_{j-1} B^T B - XB$;
4.   $A_j = \max\left(0,\ \widehat{A}_{j-1} - \widehat{G}_{j-1}/L\right)$;
5.   $t_j = \frac{1}{2}\left(1 + \sqrt{1 + 4 t_{j-1}^2}\right)$;
6.   $\omega_j = \frac{t_{j-1} - 1}{t_j}$;
7.   $\widehat{A}_j = A_j + \omega_j (A_j - A_{j-1})$;
8. end
9. $A = A_j$, $\widehat{A} = \widehat{A}_j$, $t = t_j$.
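A minimal MATLAB sketch of Algorithm 4 follows. One detail the pseudocode leaves implicit is the previous iterate needed at $j = 1$; here we initialize it to the warm-started extrapolated point, which is our assumption, and the names are ours.

```matlab
% Sketch of Algorithm 4 (IAPG_SUB): J inner iterations with warm-started
% extrapolation state. M = X_(n)*B^(n), BtB = (B^(n))'*B^(n); Ahat and t
% carry over between outer iterations instead of being reset.
function [A, Ahat, t] = iapg_sub(M, BtB, Ahat, t, J)
    L = norm(BtB, 2);
    A = Ahat;                          % previous iterate at j = 1 (our assumption)
    for j = 1:J
        Aprev = A;
        G = Ahat * BtB - M;            % gradient at the extrapolated point
        A = max(0, Ahat - G / L);      % projected proximal step
        tprev = t;
        t = 0.5 * (1 + sqrt(1 + 4 * tprev^2));
        w = (tprev - 1) / t;
        Ahat = A + w * (A - Aprev);    % extrapolation, not reset to zero
    end
end
```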

This resetting of $\omega_j$ to zero is not reasonable when the factor matrix $A_k^{(n)}$ and its previous version $A_{k-1}^{(n)}$ are very close to each other. Therefore, in eq. (7), it is more acceptable to use the $\omega_k^{(n)}$ obtained after the previous inner loop as the warm-start value for the new inner loop.

On the other hand, in Algorithm 2, it can also be observed that at the beginning of each inner loop, the extrapolated point $\widehat{A}_k^{(n)}$ is initialized by the value of the factor $A_{k-1}^{(n)}$. Thus, for the first run of Step 4 in Algorithm 2, $\widehat{G}_0$ is not a true gradient at an extrapolated point. The extrapolated point $\widehat{A}_k^{(n)}$ should therefore also be set to a warm-start value.

The warm-start strategies avoid the resetting of parameters and are beneficial to the performance of NCP. The inner loop of the inexact update for $A^{(n)}$ with warm-start parameters is shown in Algorithm 4, which is denoted as the function IAPG_SUB. The proposed iAPG algorithm for NCP is shown in Algorithm 5. We display the structure of iAPG with a finite number of inner iterations in Figure 2. The finite number of inner iterations can be interpreted as a structure of multiple layers for the subproblem. The parameters $A^{(n)}$, $\widehat{A}^{(n)}$, and $t^{(n)}$ are dynamically updated across the layers.

Algorithm 5: Inexact APG for NCP (iAPG)

Input: $\mathcal{X}$, $R$
Output: $A^{(n)}$, $n = 1, \ldots, N$

1. Initialize $A_0^{(n)} \in \mathbb{R}_+^{I_n \times R}$, $n = 1, \ldots, N$, using nonnegative random numbers;
2. Select $J$ as the number of inner iterations;
3. Set $t_0^{(n)} = 1$, $\widehat{A}_0^{(n)} = A_0^{(n)}$, $n = 1, \ldots, N$;
   % The outer loop starts here.
4. for $k = 1, 2, \ldots$ do
5.   for $n = 1$ to $N$ do
6.     Make the mode-$n$ unfolding of $\mathcal{X}$ as $X_{(n)}$ and compute the MTTKRP $X_{(n)} B_{k-1}^{(n)}$;
7.     Compute $\big(B_{k-1}^{(n)}\big)^T B_{k-1}^{(n)}$;
       % The inner loop starts here.
8.     $\left(A_k^{(n)},\ \widehat{A}_k^{(n)},\ t_k^{(n)}\right) = \text{IAPG\_SUB}\left(X_{(n)} B_{k-1}^{(n)},\ \big(B_{k-1}^{(n)}\big)^T B_{k-1}^{(n)},\ \widehat{A}_{k-1}^{(n)},\ t_{k-1}^{(n)},\ J\right)$;
       % The inner loop ends here.
9.   end
10.  if $F_{\text{tensor}}(A_k) > F_{\text{tensor}}(A_{k-1})$ then
11.    Update $A_k^{(n)}$ again with $\widehat{A}_{k-1}^{(n)} = A_{k-1}^{(n)}$, $n = 1, \ldots, N$;
12.  end
13.  if the outer termination criterion is reached then
14.    return $A_k^{(n)}$ for $n = 1, \ldots, N$;
15.  end
16. end
    % The outer loop ends here.

Our inexact scheme with a finite number of inner iterations/layers can be seen as a type of algorithm unrolling or unfolding [31] in tensor decomposition.

4.3 Selecting the number of inner iterations

After designing the algorithm with the inexact scheme and the parameter warm-start strategy, there remains an essential question: how to select the best number of inner iterations $J$ for the subproblem.

We synthesized several tensors with balanced mode sizes and selected different initial numbers of components (ranks) for the NCP. The synthesizing steps are introduced in Sect. 5.1. One group of tensors had a size of $100 \times 100 \times 100$ and ranks of 8, 20, and 50, respectively, and another group had a size of $300 \times 300 \times 300$ and ranks of 20, 50, and 100, respectively. Afterwards, we tested different inner iteration numbers using $J = [1, 5, 10, 15, \ldots, 100]$. For each inner iteration value, we ran iAPG 30 times and recorded the average relative error (RelErr) and running time. All results are shown in Figure 3. We find that in most cases, the iAPG algorithm has the fastest running speed and the lowest relative error when the number of inner iterations $J$ is around 20. Moreover, when the rank is very small compared with the mode size (e.g., rank $= 8$ for size $100 \times 100 \times 100$ and rank $= 20$ for size $300 \times 300 \times 300$), the algorithm with the smaller value $J = 10$ shows good performance.

[Figure 2] (Color online) The structure of the proposed inexact APG with a finite number of inner iterations/layers: multiple inner iterations with a fixed number of layers within the $k$th (single) outer iteration. The parameters $A^{(n)}$, $\widehat{A}^{(n)}$ and $t^{(n)}$ are dynamically updated across the layers.

[Figure 3] (Color online) Performance of iAPG with different inner iterations on synthetic tensors (relative error and running time versus inner iteration number). (a) Tensor size $100 \times 100 \times 100$, rank $= 8$, 20 and 50; (b) tensor size $300 \times 300 \times 300$, rank $= 20$, 50, and 100.

When the rank increases under a fixed mode size, more inner iterations are required to achieve better performance. We considered only balanced mode sizes for the synthetic tensors in this section, because this made it convenient to investigate the influence of the rank and the tensor size on the NCP decomposition.

Furthermore, we employed two real-world tensors to test the influence of the number of inner iterations $J$ on the performance. The first dataset was a fluorescence excitation-emission matrix (EEM) tensor of size $299 \times 301 \times 41$, and the second dataset was a video image tensor of size $158 \times 238 \times 400$. The details of the real-world tensors are introduced in Sect. 5.2. For the video image tensor in this section, we selected only 400 images. The appropriate rank or number of components of a real-world tensor can be estimated by various techniques, such as the DIFFIT method [32]. In this section, we selected three rank values in order to test the performance of iAPG with different ranks: rank $= [8, 20, 50]$ for the EEM tensor and rank $= [20, 50, 100]$ for the video tensor. The test methods were the same as those for the synthetic tensors. All results are shown in Figure 4. Once again, we find that iAPG performs best when $J$ is around 10 or 20. When the rank is small, $J$ around 10 is good; when the rank is large, a bigger $J$ around 20 is more suitable.

[Figure 4] (Color online) Performance of iAPG with different inner iterations on real-world tensors (relative error and running time versus inner iteration number). (a) EEM tensor of size $299 \times 301 \times 41$, rank $= 8$, 20 and 50; (b) video tensor of size $158 \times 238 \times 400$, rank $= 20$, 50, and 100.

Based on the findings from the above tests, we infer that the selection of the number of inner iterations $J$ is closely related to the rank and mode sizes. For a tensor decomposition problem, the rank is related to the number of degrees of freedom. Moreover, the ratio of the number of data entries to the number of degrees of freedom is an important indicator of the difficulty level of the problem [28]. Therefore, we dynamically select $J = 10$ or $J = 20$ for the iAPG algorithm according to an empirical rule based on the rank and mode sizes. We specify the empirical rule in Appendix A.
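The paper's empirical rule is specified in its Appendix A, which is not reproduced here. Purely as a hypothetical illustration of what a ratio-based choice could look like, with a placeholder threshold of our own:

```matlab
% Hypothetical illustration only: a ratio-based choice of J. The paper's
% actual empirical rule is in its Appendix A; the threshold 50 and the
% function name are our assumptions.
function J = select_inner_iters(I, R)    % I: vector of mode sizes, R: rank
    ratio = prod(I) / (R * sum(I));      % data entries vs. degrees of freedom [28]
    if ratio > 50                        % "easy" case: rank small relative to size
        J = 10;
    else
        J = 20;
    end
end
```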

4.4 Convergence and computation analysis

In the block coordinate descent framework, the APG method has shown excellent convergence properties [3, 5]. Moreover, by iterating the subproblem several times, the inexact scheme further decreases the objective function of eq. (2), which is beneficial to convergence to a stationary point [33]. A more detailed convergence analysis of the APG can be found in refs. [3, 34].

By observing Algorithm 5, we find that updating $A^{(n)}$ mainly consists of three parts: the MTTKRP $X_{(n)} B_{k-1}^{(n)}$, the Gram product $\big(B_{k-1}^{(n)}\big)^T B_{k-1}^{(n)}$, and the inexact update IAPG_SUB. The computational complexities of these three parts are $O\big(R \prod_{\tilde{n}=1,\tilde{n}\neq n}^{N} I_{\tilde{n}} + R \prod_{\tilde{n}=1}^{N} I_{\tilde{n}}\big)$, $O\big(R^2 \prod_{\tilde{n}=1,\tilde{n}\neq n}^{N} I_{\tilde{n}}\big)$, and $O\big(J I_n R^2\big)$, respectively. Clearly, with a small integer $J$, the third part is very small compared with the other two. Therefore, the inexact scheme does not increase the computational burden. However, for an exact method that requires hundreds of inner iterations, the computational complexity of the inner loop becomes large and affects the whole NCP.

5 Experiments and results

We applied the proposed iAPG method to third-order synthetic tensors and real-world dense tensors. We compared our method with the following popular methods for NCP.

(1) APG-1. This is the algorithm introduced in Sect. 3.1, which contains only one inner iteration for the subproblem.

(2) APG-m. This is the algorithm introduced in Sect. 3.2, which uses multiple inner iterations and employs the projected gradient norm as the stopping criterion for the subproblem.

(3) AO-ADMM. This is the alternating optimization-based alternating direction method of multipliers proposed in ref. [8].

(4) FastHALS. This is the classical fast hierarchical alternating least squares method proposed in refs. [6, 7].

For all algorithms, the factor matrices were initialized with nonnegative random numbers by the command max(0, randn(In, R)). We define the relative error of the NCP in the $k$th outer iteration by
$$\text{RelErr}_k = \frac{\left\| \mathcal{X} - \llbracket A_k^{(1)}, \ldots, A_k^{(N)} \rrbracket \right\|_F}{\|\mathcal{X}\|_F}. \tag{14}$$
The stopping condition of the outer loop was based on the change of the relative error [32]:
$$\left| \text{RelErr}_{k-1} - \text{RelErr}_k \right| < \varepsilon, \tag{15}$$
where the tolerance $\varepsilon$ was set to $1 \times 10^{-6}$.

All the following experiments were carried out on a computer with an Intel Core i7-9750H 2.60 GHz CPU, 32 GB memory, 64-bit Windows 10, and MATLAB R2016b. The fundamental tensor computation was based on Tensor Toolbox 2.6 [35]. The code is available on the author's website 1).

5.1 Synthetic tensor

5.1.1 Different sizes and ranks

We synthesized several third-order tensors inspired by the methods in ref. [3]. The sizes of the synthetic tensors were $1000 \times 200 \times 100$, $600 \times 400 \times 200$ and $240 \times 240 \times 240$. Each size was tested with two rank values: $R = 20$ and $R = 50$. For each tensor, we generated the first factor matrix by max(0, rand(I1,R)*2-1) and the others by rand(In,R) for $n = 2$ and 3. Next, nonnegative noise generated by max(0, rand(size(X))) was added to the tensor at a signal-to-noise ratio (SNR) of 40 dB [3].

For the proposed iAPG algorithm, the automatically selected number of inner iterations $J$ for each synthetic tensor is shown in Table B1 in Appendix B. For all NCP methods, the initial number of components was set to $R = 20$ or $R = 50$ according to the true rank of the synthetic tensors. We ran all algorithms 30 times and recorded the average relative error and running time in Table 1.

It is clear from Table 1 that the proposed iAPG algorithm has the fastest running speed of all the compared algorithms. The iAPG also attains a very low relative error. In some tensor cases, for example, the one of size $1000 \times 200 \times 100$ with rank $= 50$ and the one of size $600 \times 400 \times 200$ with rank $= 50$, iAPG exhibits the lowest relative error among all algorithms.

5.1.2 Different noise levels

Furthermore, we evaluated the performance of the NCP algorithms in recovering the true components at different noise levels. We synthesized a tensor of size $1000 \times 200 \times 100$ and rank 50 using the same factor matrices as those introduced in Sect. 5.1.1. These factor matrices were retained as the true components, denoted by $A_0^{(n)}$, $n = 1, \ldots, N$. Afterwards, we added nonnegative noise to the tensor at SNR levels of 5, 10, 20, 30, and 40 dB, respectively. We employed the factor correlation coefficient (FCC) as a measure in $[0, 1]$ of how close the recovered components $A^{(n)}$ were to the true components $A_0^{(n)}$. The FCC is computed as follows.

(1) Compute the correlation matrix $M_\rho = \text{corr}\big(A^{(n)}, A_0^{(n)}\big)$, which contains the pairwise correlation coefficients between each pair of columns in $A^{(n)}$ and $A_0^{(n)}$, using the MATLAB command corr.

(2) Compute the FCC by finding the maximum value in each column of $M_\rho$ and then taking the average. This can be implemented by the MATLAB command FCC = mean(max(Mρ, [], 1)).
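Putting the two steps together, a minimal sketch (corr requires the Statistics and Machine Learning Toolbox):

```matlab
% Sketch of the FCC measure: match each true column of A0 to its most
% correlated recovered column of A, then average the best correlations.
Mrho = corr(A, A0);                % pairwise column correlations, step (1)
FCC  = mean(max(Mrho, [], 1));     % best match per true column, averaged, step (2)
```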

We ran all algorithms 30 times and recorded the average FCC for each mode in Table 2.

It can be observed from Table 2 that the FCC values of iAPG at different noise levels were very similar to those of the APG-1, APG-m, AO-ADMM, and FastHALS algorithms. For the low-SNR cases, such as 5 and 10 dB, the iAPG algorithm performed slightly better than the other methods.

5.2 Real-world tensors

5.2.1 Datasets

We employed four real-world tensors to evaluate the algo- rithms.

(1) Fluorescence EEM tensor

The first is a third-order fluorescence EEM tensor [36] collected from measurements on human blood plasma samples 2). The size of this dataset was $299 \times 301 \times 41$, where 299 indicates the number of samples, 301 represents the emission wavelengths from 300 to 600 nm with 1-nm increments, and 41 represents the excitation wavelengths from 250 to 450 nm with 5-nm increments. About 17% of the elements were missing, marked as not-a-number ("NaN") in the data. We conducted tensor completion [3] and then replaced the NaNs with the recovered elements.

(2) Event-related potential tensor

1) http://deqing.me/
2) http://www.models.life.ku.dk/anders-cancer
