
Constrained polynomial optimization problems with noncommuting variables

Kristijan Cafuta, Igor Klep, Janez Povh

Konstanzer Schriften in Mathematik Nr. 285, August 2011

ISSN 1430-3558

© Fachbereich Mathematik und Statistik Universität Konstanz

Fach D 197, 78457 Konstanz, Germany

Konstanzer Online-Publikations-System (KOPS) URL: http://nbn-resolving.de/urn:nbn:de:bsz:352-152835


CONSTRAINED POLYNOMIAL OPTIMIZATION PROBLEMS WITH NONCOMMUTING VARIABLES

KRISTIJAN CAFUTA, IGOR KLEP1, AND JANEZ POVH2

Abstract. In this paper we study constrained eigenvalue optimization of noncommutative (nc) polynomials, focusing on the polydisc and the ball. Our three main results are as follows:

(1) an nc polynomial is nonnegative if and only if it admits a weighted sum of hermitian squares decomposition;
(2) (eigenvalue) optima for nc polynomials can be computed using a single semidefinite program (SDP) – this sharply contrasts the commutative case, where sequences of SDPs are needed;
(3) the dual solution to this "single" SDP can be exploited to extract eigenvalue optimizers with an algorithm based on two ingredients:

• solution of a truncated nc moment problem via flat extensions;
• the Gelfand–Naimark–Segal (GNS) construction.

The implementation of these procedures in our computer algebra system NCSOStools is presented, and several examples pertaining to matrix inequalities are given to illustrate our results.

1. Introduction

Starting with Helton's seminal paper [Hel02], free real algebraic geometry is being established. Unlike classical real algebraic geometry, where real polynomial rings in commuting variables are the objects of study, free real algebraic geometry deals with real polynomials in noncommuting (nc) variables and their finite-dimensional representations. Of interest are notions of positivity induced by these, for instance positivity via positive semidefiniteness, which can be reformulated and studied using sums of hermitian squares and semidefinite programming. In the sequel we will use SDP to abbreviate semidefinite programming as the subarea of nonlinear optimization, as well as to refer to an instance of a semidefinite programming problem.

1.1. Motivation. Among the things that make this area exciting are its many facets of applications. Let us mention just a few. A nice survey on applications to control theory, systems engineering and optimization is given by Helton, McCullough, Oliveira, Putinar [HMdOP08]; applications to quantum physics are explained by Pironio, Navascués, Acín [PNA10], who also consider computational aspects related to noncommutative sums of squares. For instance, optimization of nc polynomials has direct applications in quantum information science (to compute upper bounds on the maximal violation of a generic Bell inequality [PV09]), and also in quantum chemistry (e.g. to compute the ground-state electronic energy of atoms or molecules, cf. [Maz04]). Certificates of positivity via sums of squares are often used in the theoretical

Date: 11 April 2011.

2010 Mathematics Subject Classification. Primary 90C22, 14P10; Secondary 13J30, 47A57.

Key words and phrases. noncommutative polynomial, optimization, sum of squares, semidefinite programming, moment problem, Hankel matrix, flat extension, Matlab toolbox, real algebraic geometry, free positivity.

1 Supported by the Slovenian Research Agency (project no. J1-3608 and program no. P1-0222). Part of this research was done while the author held a visiting professorship at the University of Konstanz supported by the program "free spaces for creativity".

2Supported by the Slovenian Research Agency - program no. P1-0297(B).



physics literature to place very general bounds on quantum correlations (cf. [Gla63]). Furthermore, the important Bessis–Moussa–Villani (BMV) conjecture from quantum statistical mechanics is tackled in [KS08b] and by the authors in [CKP10]. How this pertains to operator algebras is discussed by Schweighofer and the second author in [KS08a]; Doherty, Liang, Toner, Wehner [DLTW08] employ free real algebraic geometry (or free positivity) to consider the quantum moment problem and multi-prover games.

We developed NCSOStools [CKP] as a consequence of this recent interest in free positivity and sums of (hermitian) squares (sohs). NCSOStools is an open source Matlab toolbox for solving sohs problems using semidefinite programming (SDP). As a side product our toolbox implements symbolic computation with noncommuting variables in Matlab. Hence there is a small overlap in features with Helton's NCAlgebra package for Mathematica [HMdOS].

However, NCSOStools performs only basic manipulations with noncommuting variables, while NCAlgebra is a fully-fledged add-on for symbolic computation with polynomials, matrices and rational functions in noncommuting variables.

Readers interested in solving sums of squares problems for commuting polynomials are referred to one of the many great existing packages, such as GloptiPoly [HLL09], SOSTOOLS [PPSP05], SparsePOP [WKK+09], or YALMIP [Löf04].

1.2. Contribution. This article adds to the list of properties that are much cleaner in the noncommutative setting than their commutative counterparts. For example: a positive semidefinite nc polynomial is a sum of squares [Hel02], a convex nc semialgebraic set has an LMI representation [HM], proper nc maps are one-to-one [HKM11], etc. More precisely, the purpose of this article is threefold.

First, we shall show that every noncommutative (nc) polynomial that is merely positive semidefinite on a ball or a polydisc admits a sum of hermitian squares representation with weights and tight degree bounds (Nichtnegativstellensatz 3.4). Note that this contrasts sharply with the commutative case, where strict positivity is needed and nevertheless there do not exist degree bounds, cf. [Sch09].

Second, we show how the existence of sharp degree bounds can be used to compute (eigenvalue) optima for nc polynomials on a ball or a polydisc by solving a single semidefinite programming problem (SDP). Again, this is much cleaner than the corresponding situation in the commutative setting, where sequences of SDPs are needed, cf. Lasserre's relaxations [Las01, Las09].

Third, the dual solution of the SDP constructed above can be exploited to extract eigenvalue optimizers. The algorithm is based on 1-step flat extensions of noncommutative Hankel matrices and the Gelfand–Naimark–Segal (GNS) construction, and always works – again contrasting the classical commutative case.

1.3. Reader's guide. The paper starts with a preliminary section fixing notation, introducing terminology and stating some well-known classical results on positive nc polynomials (§2).

We then proceed in §3 to establish our Nichtnegativstellensatz. The last two sections present computational aspects, including the construction and properties of the SDP computing the minimum of an nc polynomial in §4, and the extraction of optimizers in §5. We have implemented our algorithms in our open source Matlab toolbox NCSOStools, freely available at http://ncsostools.fis.unm.si/. Throughout the paper examples are given to illustrate our results and the use of our computer algebra package.


2. Notation and Preliminaries

2.1. Words, free algebras and nc polynomials. Fix n ∈ ℕ and let ⟨X⟩ be the monoid freely generated by X := (X_1, ..., X_n), i.e., ⟨X⟩ consists of words in the n noncommuting letters X_1, ..., X_n (including the empty word, denoted by 1). We consider the free algebra ℝ⟨X⟩. The elements of ℝ⟨X⟩ are linear combinations of words in the n letters X and are called noncommutative (nc) polynomials. An element of the form aw, where a ∈ ℝ \ {0} and w ∈ ⟨X⟩, is called a monomial and a its coefficient. Words are monomials with coefficient 1.

The length of the longest word in an nc polynomial f ∈ ℝ⟨X⟩ is the degree of f and is denoted by deg f. The set of all words of degree ≤ d will be denoted by ⟨X⟩_d, and the set of all nc polynomials of degree ≤ d by ℝ⟨X⟩_d. If we are dealing with only two variables, we shall use X, Y instead of X_1, X_2.

By S_k we denote the set of all symmetric k×k real matrices, and by S_k^+ the set of all positive semidefinite k×k real matrices. Moreover, S := ⋃_{k∈ℕ} S_k and S^+ := ⋃_{k∈ℕ} S_k^+. If A is positive semidefinite we denote this by A ⪰ 0.

2.1.1. Sums of hermitian squares. We equip ℝ⟨X⟩ with the involution * that fixes ℝ ∪ {X} pointwise and thus reverses words, e.g. (X_1X_2²X_3 − 2X_3³)* = X_3X_2²X_1 − 2X_3³. Hence ℝ⟨X⟩ is the *-algebra freely generated by n symmetric letters. Let Sym ℝ⟨X⟩ denote the set of all symmetric polynomials,

Sym ℝ⟨X⟩ := {f ∈ ℝ⟨X⟩ | f = f*}.

An nc polynomial of the form g*g is called a hermitian square and the set of all sums of hermitian squares will be denoted by Σ². Clearly, Σ² ⊊ Sym ℝ⟨X⟩. The involution * extends naturally to matrices (in particular, to vectors) over ℝ⟨X⟩. For instance, if V = (v_i) is a (column) vector of nc polynomials v_i ∈ ℝ⟨X⟩, then V* is the row vector with components v_i*. We use Vᵗ to denote the row vector with components v_i.
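In code, the involution is simply word reversal. A minimal sketch (encoding words as tuples of letter indices is an assumption of this illustration, not the data structure NCSOStools uses), together with the fact that w*(A) = w(A)ᵗ on tuples of symmetric matrices:

```python
import numpy as np

def star(word):
    """The involution * on a word in noncommuting letters: reverse the word.
    A word is encoded as a tuple of letter indices; () is the empty word 1."""
    return tuple(reversed(word))

def eval_word(word, A):
    """Evaluate a word at a tuple A of matrices (letters are 1-indexed)."""
    M = np.eye(A[0].shape[0])
    for letter in word:
        M = M @ A[letter - 1]
    return M

w = (1, 2, 2, 3)                      # the word X1 X2 X2 X3
assert star(w) == (3, 2, 2, 1)        # (X1 X2^2 X3)* = X3 X2^2 X1

# On a tuple of symmetric matrices, w*(A) = w(A)^t:
rng = np.random.default_rng(0)
A = [(B + B.T) / 2 for B in rng.standard_normal((3, 4, 4))]
assert np.allclose(eval_word(star(w), A), eval_word(w, A).T)
```

The last assertion is the reason symmetric matrix tuples are the natural test points for symmetric nc polynomials.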

We can stack all words from ⟨X⟩_d using the graded lexicographic order into a column vector W_d. The size of this vector will be denoted by σ(d), hence

σ(d) := |W_d| = Σ_{k=0}^{d} n^k = (n^{d+1} − 1)/(n − 1).  (1)

Every f ∈ ℝ⟨X⟩_{2d} can be written (possibly nonuniquely) as f = W_d* G_f W_d, where G_f = G_f* is called a Gram matrix for f.
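Formula (1) is easy to check by enumeration; a short illustrative sketch (words again encoded as tuples, an assumption of this example):

```python
from itertools import product

def sigma(n, d):
    """sigma(d) = |W_d| = sum_{k=0}^d n^k; equals (n^{d+1} - 1)/(n - 1) for n > 1."""
    return sum(n ** k for k in range(d + 1))

def W(n, d):
    """All words of degree <= d in n letters, listed degree by degree
    (each degree in lexicographic order): the index set of the vector W_d."""
    return [w for k in range(d + 1) for w in product(range(1, n + 1), repeat=k)]

# n = 2, d = 2: 1 + 2 + 4 = 7 words, matching (2^3 - 1)/(2 - 1) = 7.
assert len(W(2, 2)) == sigma(2, 2) == (2 ** 3 - 1) // (2 - 1) == 7
```

For n = 2 and d = 2 this reproduces the 7 words of the vector W_2 used in Example 2.1 below.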

Example 2.1. Consider f = 2 + XYXY + YXYX ∈ Sym ℝ⟨X⟩. Let W_2 = (1  X  Y  X²  XY  YX  Y²)ᵗ. Then there are many G_f ∈ S_7 satisfying f = W_2* G_f W_2; for instance

G_f(u, v) = 2 if u*v = 1, 1 if u*v = XYXY or u*v = YXYX, and 0 otherwise.

Obviously f ∉ Σ², but we have

f = g_1*g_1 + g_2*g_2 + g_3*g_3 + g_4*g_4 + X(1 − X² − Y²)X + Y(1 − X² − Y²)Y,  (2)

where

g_1 = √(3/2),  g_2 = (√2/2)(X² − Y²),  g_3 = (√2/2)(1 − X² − Y²),  g_4 = XY + YX.
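Decomposition (2) is an identity in the free algebra, so it can be spot-checked by evaluating both sides at random symmetric matrices (an illustrative numerical check, not a proof):

```python
import numpy as np

rng = np.random.default_rng(1)
X, Y = [(B + B.T) / 2 for B in rng.standard_normal((2, 4, 4))]
I = np.eye(4)

lhs = 2 * I + X @ Y @ X @ Y + Y @ X @ Y @ X      # f = 2 + XYXY + YXYX

g2 = np.sqrt(2) / 2 * (X @ X - Y @ Y)
g3 = np.sqrt(2) / 2 * (I - X @ X - Y @ Y)
g4 = X @ Y + Y @ X
rhs = ((3 / 2) * I                               # g1*g1 with g1 = sqrt(3/2)
       + g2.T @ g2 + g3.T @ g3 + g4.T @ g4
       + X @ (I - X @ X - Y @ Y) @ X
       + Y @ (I - X @ X - Y @ Y) @ Y)

assert np.allclose(lhs, rhs)                     # both sides agree
```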


Alternately,

f = (XY + YX)*(XY + YX) + (1 − X²) + Y(1 − X²)Y + (1 − Y²) + X(1 − Y²)X.  (3)

2.2. Nc semialgebraic sets and quadratic modules.

2.2.1. Nc semialgebraic sets.

Definition 2.2. Fix a subset S ⊆ Sym ℝ⟨X⟩. The (operator) semialgebraic set D_S^∞ associated to S is the class of tuples A = (A_1, ..., A_n) of bounded self-adjoint operators on a Hilbert space making s(A) a positive semidefinite operator for every s ∈ S. In case we are considering only tuples of symmetric matrices A ∈ Sⁿ satisfying s(A) ⪰ 0, we write D_S. When considering symmetric matrices of a fixed size k ∈ ℕ, we shall use D_S(k) := D_S ∩ S_kⁿ.

We will focus on the two most important examples of nc semialgebraic sets:

Example 2.3.

(a) Let S = {1 − Σ_{i=1}^{n} X_i²}. Then

B := ⋃_{k∈ℕ} { A = (A_1, ..., A_n) ∈ S_kⁿ | 1 − Σ_{i=1}^{n} A_i² ⪰ 0 } = D_S  (4)

is the nc ball. Note B is the set of all row contractions of self-adjoint operators on finite-dimensional Hilbert spaces.

(b) Let S = {1 − X_1², ..., 1 − X_n²}. Then

D := ⋃_{k∈ℕ} { A = (A_1, ..., A_n) ∈ S_kⁿ | 1 − A_1² ⪰ 0, ..., 1 − A_n² ⪰ 0 } = D_S  (5)

is the nc polydisc. It consists of all n-tuples of self-adjoint contractions on finite-dimensional Hilbert spaces.
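Membership of a concrete tuple of symmetric matrices in B or D reduces to eigenvalue checks. A sketch (the helper names are hypothetical, not NCSOStools' API):

```python
import numpy as np

def in_nc_ball(A, tol=1e-10):
    """A = (A_1, ..., A_n) lies in B iff 1 - sum_i A_i^2 is positive semidefinite."""
    M = np.eye(A[0].shape[0]) - sum(Ai @ Ai for Ai in A)
    return np.linalg.eigvalsh(M).min() >= -tol

def in_nc_polydisc(A, tol=1e-10):
    """A lies in D iff every 1 - A_i^2 is positive semidefinite, i.e. ||A_i|| <= 1."""
    I = np.eye(A[0].shape[0])
    return all(np.linalg.eigvalsh(I - Ai @ Ai).min() >= -tol for Ai in A)

X = np.array([[0.6, 0.0], [0.0, -0.6]])
Y = np.array([[0.0, 0.5], [0.5, 0.0]])
assert in_nc_ball((X, Y)) and in_nc_polydisc((X, Y))

# B is contained in D, but not conversely:
Z = 0.9 * np.eye(2)
assert in_nc_polydisc((Z, Z)) and not in_nc_ball((Z, Z))
```

The last two lines illustrate that the ball constraint 1 − ΣA_i² ⪰ 0 is strictly stronger than requiring each A_i to be a contraction.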

In the rest of the paper we will

(§3) establish which nc polynomials f are positive semidefinite on B and D;
(§4) construct a single SDP which yields the smallest eigenvalue f attains on B and D;
(§5) use the solution of the dual SDP to compute an eigenvalue minimizer for f on B and D.

2.2.2. Archimedean quadratic modules. The main existing result in the literature concerning nc polynomials (strictly) positive on B and D is due to Helton and McCullough [HM04]. For a precise statement we recall (archimedean) quadratic modules.

Definition 2.4. A subset M ⊆ Sym ℝ⟨X⟩ is called a quadratic module if 1 ∈ M, M + M ⊆ M and a*Ma ⊆ M for all a ∈ ℝ⟨X⟩.

Given a subset S ⊆ Sym ℝ⟨X⟩, the quadratic module M_S generated by S is the smallest subset of Sym ℝ⟨X⟩ containing all a*sa for s ∈ S ∪ {1}, a ∈ ℝ⟨X⟩, and closed under addition:

M_S = { Σ_{i=1}^{N} a_i* s_i a_i | N ∈ ℕ, s_i ∈ S ∪ {1}, a_i ∈ ℝ⟨X⟩ }.

The following is an obvious but important observation:

Proposition 2.5. Let S ⊆ Sym ℝ⟨X⟩. If f ∈ M_S, then f|_{D_S^∞} ⪰ 0.


The converse of Proposition 2.5 is false in general, i.e., nonnegativity on an nc semialgebraic set does not imply the existence of a weighted sum of squares certificate, cf. [KS07, Example 3.1]. A weak converse holds for positive nc polynomials under a strong boundedness assumption, see Theorem 2.7 below.

Definition 2.6. A quadratic module M is archimedean if

∀a ∈ ℝ⟨X⟩ ∃N ∈ ℕ: N − a*a ∈ M.  (6)

Note if a quadratic module M_S is archimedean, then D_S is bounded, i.e., there is an N ∈ ℕ such that for every A ∈ D_S we have ‖A‖ ≤ N. Examples of archimedean quadratic modules are obtained by generating them from defining sets for the nc ball and the nc polydisc.

2.2.3. A Positivstellensatz. The main result in the literature concerning archimedean quadratic modules is a theorem of Helton and McCullough. It is a perfect generalization of Putinar’s Positivstellensatz [Put93] for commutative polynomials.

Theorem 2.7 (Helton & McCullough [HM04, Theorem 1.2]). Let S ∪ {f} ⊆ Sym ℝ⟨X⟩ and suppose that M_S is archimedean. If f(A) ≻ 0 for all A ∈ D_S^∞, then f ∈ M_S.

We remark that if D_S^∞ is nc convex [HM04, §2], then it suffices to check the positivity of f in Theorem 2.7 on D_S, see [HM04, Proposition 2.3]. Our Nichtnegativstellensatz 3.4 will show that for B and D, positive semidefiniteness of f is enough to establish the conclusion of Theorem 2.7. In the absence of archimedeanity the conclusion of Theorem 2.7 may fail, cf. [KS07].

3. A Nichtnegativstellensatz

The main result in this section is the Nichtnegativstellensatz 3.4. For a precise formulation we introduce truncated quadratic modules.

3.1. Truncated quadratic modules. Given a subset S ⊆ Sym ℝ⟨X⟩, we introduce

Σ²S := { Σ_i h_i* s_i h_i | h_i ∈ ℝ⟨X⟩, s_i ∈ S },
Σ²_{S,d} := { Σ_i h_i* s_i h_i | h_i ∈ ℝ⟨X⟩, s_i ∈ S, deg(h_i* s_i h_i) ≤ 2d },
M_{S,d} := { Σ_i h_i* s_i h_i | h_i ∈ ℝ⟨X⟩, s_i ∈ S ∪ {1}, deg(h_i* s_i h_i) ≤ 2d },  (7)

and call M_{S,d} the truncated quadratic module generated by S. Note M_{S,d} = Σ²_d + Σ²_{S,d} ⊆ ℝ⟨X⟩_{2d}, where Σ²_d := Σ²_{∅,d} denotes the set of all sums of hermitian squares of polynomials of degree at most d. Furthermore, M_{S,d} is a convex cone in the ℝ-vector space Sym ℝ⟨X⟩_{2d}. For example, if S = {1 − Σ_j X_j²} then M_{S,d} contains exactly the polynomials f which have a sum of hermitian squares (sohs) decomposition over the ball, i.e., can be written as

f = Σ_i g_i* g_i + Σ_i h_i* (1 − Σ_{j=1}^{n} X_j²) h_i,  where  (8)
deg(g_i) ≤ d, deg(h_i) ≤ d − 1 for all i.


Similarly, for S = {1 − X_1², 1 − X_2², ..., 1 − X_n²}, M_{S,d} contains exactly the polynomials f which have a sohs decomposition over the polydisc, i.e., can be written as

f = Σ_i g_i* g_i + Σ_{j=1}^{n} Σ_i h_{i,j}* (1 − X_j²) h_{i,j},  where  (9)
deg(g_i) ≤ d, deg(h_{i,j}) ≤ d − 1 for all i, j.

We also call a decomposition of the form (8) or (9) a sohs decomposition with weights.

Example 3.1. Note that the polynomial f from Example 2.1 has a sohs decomposition over the ball, as follows from (2). Moreover, (3) implies that f also has a sohs decomposition over the polydisc.

Let us consider another example.

Example 3.2. Let f = 2 − X² + XY²X − Y² ∈ Sym ℝ⟨X⟩. Obviously f ∉ Σ², but

f = (YX)*(YX) + (1 − X²) + (1 − Y²),  (10)

i.e., f has a sohs decomposition over the polydisc, as well as over the ball, since

f = 1 + (YX)*(YX) + (1 − X² − Y²).  (11)

Notation 3.3. For notational convenience, the truncated quadratic module generated by the generator of the nc ball B will be denoted by M_{B,d}, i.e.,

M_{B,d} := { Σ_i h_i* s_i h_i | h_i ∈ ℝ⟨X⟩, s_i ∈ {1 − Σ_j X_j², 1}, deg(h_i* s_i h_i) ≤ 2d } ⊆ Sym ℝ⟨X⟩_{2d}.  (12)

Likewise, with s_0 := 1 and s_i := 1 − X_i²,

M_{D,d} := { Σ_j Σ_{i=0}^{n} h_{i,j}* s_i h_{i,j} | h_{i,j} ∈ ℝ⟨X⟩, deg(h_{i,j}* s_i h_{i,j}) ≤ 2d } ⊆ Sym ℝ⟨X⟩_{2d}.  (13)
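Like (2), the certificates (10) and (11) from Example 3.2 are free-algebra identities, so they can be spot-checked by evaluation at random symmetric matrices (illustration only):

```python
import numpy as np

rng = np.random.default_rng(2)
X, Y = [(B + B.T) / 2 for B in rng.standard_normal((2, 3, 3))]
I = np.eye(3)

f = 2 * I - X @ X + X @ Y @ Y @ X - Y @ Y      # f = 2 - X^2 + XY^2X - Y^2
YX = Y @ X                                     # (YX)*(YX) evaluates to (YX)^t (YX)

assert np.allclose(f, YX.T @ YX + (I - X @ X) + (I - Y @ Y))        # checks (10)
assert np.allclose(f, I + YX.T @ YX + (I - X @ X - Y @ Y))          # checks (11)
```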

3.2. Main result. Here is our main result. The rest of the section is devoted to its proof.

Theorem 3.4 (Nichtnegativstellensatz). Let f ∈ ℝ⟨X⟩_{2d}.

(1) f|_B ⪰ 0 if and only if f ∈ M_{B,d+1}.
(2) f|_D ⪰ 0 if and only if f ∈ M_{D,d+1}.

By [HM04, §2], f|_B ⪰ 0 if and only if f|_{B(σ(d))} ⪰ 0. A similar statement holds for positive semidefiniteness on D. These results will be reproved in the course of proving Theorem 3.4.

3.3. Proof of Theorem 3.4. To facilitate considering the two cases (the ball B and the polydisc D) simultaneously, we note they both contain an ε-neighborhood N_ε of 0 for a small ε > 0. Here

N_ε := ⋃_{k∈ℕ} { A = (A_1, ..., A_n) ∈ S_kⁿ | ε² − Σ_{i=1}^{n} A_i² ⪰ 0 }.  (14)


3.3.1. A glance at polynomial identities. The following lemma is a standard result on polynomial identities, cf. [Row80]. It is well known that there are no nonzero polynomial identities that hold for all sizes of (symmetric) matrices. In fact, it is enough to test on an ε-neighborhood of 0: an nc polynomial of degree < 2d that vanishes on all n-tuples of symmetric matrices A ∈ N_ε(N), for some N ≥ d, is zero (this uses the standard multilinearization trick together with, e.g., [Row80, §2.5, §1.4]).

Lemma 3.5. If f ∈ ℝ⟨X⟩ is zero on N_ε for some ε > 0, then f = 0.
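The "no nonzero identities" phenomenon behind Lemma 3.5 is easy to observe numerically: XY − YX vanishes identically on 1×1 matrices, but a single random evaluation on 2×2 symmetric matrices near 0 already witnesses that it is a nonzero nc polynomial (an illustration, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(3)

def commutator_norm(k):
    """||(XY - YX)(A)|| at a random pair of symmetric k x k matrices,
    scaled into a small neighborhood of 0."""
    X, Y = [0.01 * (B + B.T) / 2 for B in rng.standard_normal((2, k, k))]
    return np.linalg.norm(X @ Y - Y @ X)

assert commutator_norm(1) < 1e-12     # an identity for 1x1 matrices...
assert commutator_norm(2) > 1e-12     # ...but generically nonzero already at size 2
```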

A variant of this lemma which we shall employ is as follows:

Proposition 3.6.

(1) Suppose f = Σ_i g_i* g_i + Σ_i h_i* (1 − Σ_j X_j²) h_i ∈ M_{B,d}. Then

f|_B = 0 ⇔ g_i = h_i = 0 for all i.

(2) Suppose f = Σ_i g_i* g_i + Σ_{i,j} h_{i,j}* (1 − X_j²) h_{i,j} ∈ M_{D,d}. Then

f|_D = 0 ⇔ g_i = h_{i,j} = 0 for all i, j.

Proof. We only need to prove the (⇒) implication, since (⇐) is obvious. We give the proof of (1); the proof of (2) is a verbatim copy.

Consider f = Σ_i g_i* g_i + Σ_i h_i* (1 − Σ_j X_j²) h_i ∈ M_{B,d} satisfying f(A) = 0 for all A ∈ B. Let us choose N > d and A ∈ B(N). Obviously we have

g_i(A)ᵗ g_i(A) ⪰ 0  and  h_i(A)ᵗ (1 − Σ_j A_j²) h_i(A) ⪰ 0.

Since f(A) = 0, this yields

g_i(A) = 0  and  h_i(A)ᵗ (1 − Σ_j A_j²) h_i(A) = 0  for all i.

By Lemma 3.5, g_i = 0 for all i. Likewise, h_i* (1 − Σ_j X_j²) h_i = 0 for all i. As there are no zero divisors in the free algebra ℝ⟨X⟩, the latter implies h_i = 0.

3.3.2. Hankel matrices.

Definition 3.7. To each linear functional L : ℝ⟨X⟩_{2d} → ℝ we associate a matrix H_L (called an nc Hankel matrix) indexed by words u, v ∈ ⟨X⟩_d, with

(H_L)_{u,v} = L(u*v).  (15)

If L is positive, i.e., L(p*p) ≥ 0 for all p ∈ ℝ⟨X⟩_d, then H_L ⪰ 0.

Given g ∈ Sym ℝ⟨X⟩, we associate to L the localizing matrix H^shift_{L,g} indexed by words u, v ∈ ⟨X⟩_{d−deg(g)/2}, with

(H^shift_{L,g})_{u,v} = L(u* g v).  (16)

If L(h* g h) ≥ 0 for all h with h* g h ∈ ℝ⟨X⟩_{2d}, then H^shift_{L,g} ⪰ 0.

We say that L is unital if L(1) = 1.

Remark 3.8. Note that a matrix H indexed by words of length ≤ d satisfying the nc Hankel condition H_{u_1,v_1} = H_{u_2,v_2} whenever u_1* v_1 = u_2* v_2, gives rise to a linear functional L on ℝ⟨X⟩_{2d} as in (15). If H ⪰ 0, then L is positive.
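A positive functional of the form L(p) = ⟨p(A)ξ, ξ⟩ gives a concrete nc Hankel matrix, and building it from (15) takes only a few lines (words encoded as tuples is an assumption of this sketch, which is not NCSOStools code):

```python
import numpy as np
from itertools import product

def eval_word(word, A):
    """Evaluate a word (tuple of 0-indexed letters) at a matrix tuple A."""
    M = np.eye(A[0].shape[0])
    for i in word:
        M = M @ A[i]
    return M

rng = np.random.default_rng(4)
A = [(B + B.T) / 2 for B in rng.standard_normal((2, 3, 3))]
xi = rng.standard_normal(3)
L = lambda w: xi @ eval_word(w, A) @ xi          # L(w) = <w(A)xi, xi>

d = 2
words = [w for k in range(d + 1) for w in product(range(2), repeat=k)]
# (H_L)_{u,v} = L(u* v); since u* is u reversed, the word u* v is built below.
H = np.array([[L(tuple(reversed(u)) + v) for v in words] for u in words])

assert np.allclose(H, H.T)                       # the Hankel condition forces symmetry
assert np.linalg.eigvalsh(H).min() > -1e-9       # L positive  =>  H_L PSD
```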


Definition 3.9. Let A ∈ ℝ^{s×s} be a symmetric matrix. A (symmetric) extension of A is a symmetric matrix Ã ∈ ℝ^{(s+ℓ)×(s+ℓ)} of the form

Ã = [ A  B ; Bᵗ  C ]

for some B ∈ ℝ^{s×ℓ} and C ∈ ℝ^{ℓ×ℓ}. Such an extension is flat if rank A = rank Ã, or, equivalently, if B = AZ and C = ZᵗAZ for some matrix Z.

For later reference we record the following easy linear algebra fact.

Lemma 3.10. [ A  B ; Bᵗ  C ] ⪰ 0 if and only if A ⪰ 0 and there is some Z with B = AZ and C ⪰ ZᵗAZ.
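The equivalence in Definition 3.9 suggests a recipe that reappears in Lemma 3.13 below: starting from a PSD matrix A, any Z yields a flat PSD extension by setting B = AZ and C = ZᵗAZ. A numerical sketch:

```python
import numpy as np

rng = np.random.default_rng(5)

V = rng.standard_normal((4, 2))
A = V @ V.T                                  # a PSD matrix of rank 2
Z = rng.standard_normal((4, 3))
B = A @ Z
C = Z.T @ A @ Z                              # the flat choice for the new block
Atilde = np.block([[A, B], [B.T, C]])        # the extension [A B; B^t C]

assert np.linalg.matrix_rank(A, tol=1e-9) == 2
assert np.linalg.matrix_rank(Atilde, tol=1e-9) == 2      # rank A = rank Atilde: flat
assert np.linalg.eigvalsh(Atilde).min() > -1e-9          # Atilde is still PSD
```

Flatness holds because Ã = [I; Zᵗ] A [I  Z] factors through A, so no new directions are added to the range.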

3.3.3. GNS construction. Suppose L : ℝ⟨X⟩_{2d+2} → ℝ is a linear functional and let Ľ : ℝ⟨X⟩_{2d} → ℝ denote its restriction. As in Definition 3.7 we associate to L and Ľ the Hankel matrices H_L and H_Ľ, respectively. In block form,

H_L = [ H_Ľ  B ; Bᵗ  C ].  (17)

If H_L is flat over H_Ľ, we call L (1-step) flat.

Proposition 3.11. Suppose L : ℝ⟨X⟩_{2d+2} → ℝ is positive and flat. Then there is an n-tuple A of symmetric matrices of size s ≤ σ(d) = dim ℝ⟨X⟩_d and a vector ξ ∈ ℝ^s such that

L(p*q) = ⟨p(A)ξ, q(A)ξ⟩  (18)

for all p, q ∈ ℝ⟨X⟩ with deg p + deg q ≤ 2d.

Proof. For this we use the Gelfand–Naimark–Segal (GNS) construction. Let H_L, Ľ, H_Ľ be as above. Note H_L (and hence H_Ľ) is positive semidefinite. Since H_L is flat over H_Ľ, there exist s linearly independent columns of H_Ľ labeled by words w ∈ ⟨X⟩ with deg w ≤ d which form a basis B of E := Ran H_L. Now L (or, more precisely, H_L) induces a positive definite bilinear form (i.e., a scalar product) ⟨ , ⟩_E on E.

Let A_i be the left multiplication by X_i on E, i.e., if w̄ denotes the column of H_L labeled by w ∈ ⟨X⟩_{d+1}, then A_i : ū ↦ X_i u for u ∈ ⟨X⟩_d. The operator A_i is well defined and symmetric:

⟨A_i p̄, q̄⟩_E = L(p* X_i q) = ⟨p̄, A_i q̄⟩_E.

Let ξ := 1̄, and A = (A_1, ..., A_n). Note it suffices to prove (18) for words u, w ∈ ⟨X⟩ with deg u + deg w ≤ 2d. Since the A_i are symmetric, there is no harm in assuming deg u, deg w ≤ d. Now compute

L(u*w) = ⟨ū, w̄⟩_E = ⟨u(A)1̄, w(A)1̄⟩_E = ⟨u(A)ξ, w(A)ξ⟩_E.

3.3.4. Separation argument. The following technical proposition is a variant of a Powers–Scheiderer result [PS01, §2].

Proposition 3.12. M_{B,d} and M_{D,d} are closed convex cones in the finite-dimensional real vector space Sym ℝ⟨X⟩_{2d}.


Proof. We shall consider the case of the nc ball, whence let S = {1 − Σ_i X_i²}; the proof for the polydisc is similar. By Carathéodory's theorem on convex hulls, each element of M_{S,d} can be written as the sum of at most m := σ(d) + 1 terms of the form g*g and h*(1 − Σ_{i=1}^{n} X_i²)h, where g ∈ ℝ⟨X⟩_d and h ∈ ℝ⟨X⟩_{d−1}. Hence M_{S,d} is the image of the map

Φ : ℝ⟨X⟩_d^{m+1} × ℝ⟨X⟩_{d−1}^{m+1} → Sym ℝ⟨X⟩_{2d},
(g_1, ..., g_{m+1}, h_1, ..., h_{m+1}) ↦ Σ_{j=1}^{m+1} g_j* g_j + Σ_{j=1}^{m+1} h_j* (1 − Σ_{i=1}^{n} X_i²) h_j.

We claim that Φ⁻¹(0) = {0}. If f = Σ_{j=1}^{m+1} g_j* g_j + Σ_{j=1}^{m+1} h_j* (1 − Σ_{i=1}^{n} X_i²) h_j = 0, then Proposition 3.6 shows g_j = 0 = h_j for all j. This proves that Φ⁻¹(0) = {0}. Together with the fact that Φ is homogeneous [PS01, Lemma 2.7], this implies that Φ is a proper and therefore a closed map. In particular, its image M_{S,d} is closed in Sym ℝ⟨X⟩_{2d}.

3.3.5. Concluding the proof of Theorem 3.4. We now have all the tools needed to prove the Nichtnegativstellensatz 3.4. We prove (1) and leave (2) as an exercise for the reader. The implication (⇐) is trivial (cf. Proposition 2.5), so we only consider the converse.

Assume f ∉ M_{B,d+1}. By the Hahn–Banach separation theorem and Proposition 3.12, there is a linear functional

L : ℝ⟨X⟩_{2d+2} → ℝ  (19)

satisfying

L(M_{B,d+1}) ⊆ [0, ∞),  L(f) < 0.  (20)

Let Ľ := L|_{ℝ⟨X⟩_{2d}}.

Lemma 3.13. There is a positive flat linear functional L̂ : ℝ⟨X⟩_{2d+2} → ℝ extending Ľ.

Proof. Consider the Hankel matrix H_L presented in block form

H_L = [ H_Ľ  B ; Bᵗ  C ].

The top left block H_Ľ is indexed by words of degree ≤ d, and the bottom right block C is indexed by words of degree d + 1.

We shall modify C to make the new matrix flat over H_Ľ. By Lemma 3.10, there is some Z with B = H_Ľ Z and C ⪰ Zᵗ H_Ľ Z. Let us form

H = [ H_Ľ  B ; Bᵗ  Zᵗ H_Ľ Z ].

Then H ⪰ 0 and H is flat over H_Ľ by construction. It also satisfies the Hankel constraints (cf. Remark 3.8), since there are no constraints in the bottom right block. (Note: this uses the noncommutativity and the fact that we are considering only extensions of one degree.) Thus H is the Hankel matrix of a positive linear functional L̂ : ℝ⟨X⟩_{2d+2} → ℝ which is flat.

The linear functional L̂ satisfies the assumptions of Proposition 3.11. Hence there is an n-tuple A of symmetric matrices of size s ≤ σ(d) and a vector ξ ∈ ℝ^s such that

L̂(p*q) = ⟨p(A)ξ, q(A)ξ⟩

for all p, q ∈ ℝ⟨X⟩ with deg p + deg q ≤ 2d. By linearity,

⟨f(A)ξ, ξ⟩ = L̂(f) = L(f) < 0.  (21)

It remains to be seen that A is a row contraction, i.e., 1 − Σ_j A_j² ⪰ 0. For this we need to recall the construction of the A_j from the proof of Proposition 3.11.


Let E = Ran H_L̂. There exist s linearly independent columns of H_Ľ labeled by words w ∈ ⟨X⟩ with deg w ≤ d which form a basis B of E. The scalar product on E is induced by L̂, and A_i is the left multiplication by X_i on E, i.e., A_i : ū ↦ X_i u for u ∈ ⟨X⟩_d.

Let ū ∈ E be arbitrary. Then there are α_v ∈ ℝ for v ∈ ⟨X⟩_d with

ū = Σ_{v∈⟨X⟩_d} α_v v̄.

Write u = Σ_v α_v v ∈ ℝ⟨X⟩_d. Now compute

⟨(1 − Σ_j A_j²)ū, ū⟩
= Σ_{v,v′∈⟨X⟩_d} α_v α_{v′} ⟨(1 − Σ_j A_j²)v̄, v̄′⟩
= Σ_{v,v′} α_v α_{v′} ⟨v̄, v̄′⟩ − Σ_{v,v′} α_v α_{v′} Σ_j ⟨A_j v̄, A_j v̄′⟩
= Σ_{v,v′} α_v α_{v′} L̂(v′* v) − Σ_{v,v′} α_v α_{v′} Σ_j L̂(v′* X_j² v)
= L̂(u*u) − Σ_j L̂(u* X_j² u) = L(u*u) − Σ_j L̂(u* X_j² u).  (22)

Here, the last equality follows from the fact that L̂|_{ℝ⟨X⟩_{2d}} = Ľ = L|_{ℝ⟨X⟩_{2d}}. We now estimate the summands L̂(u* X_j² u):

L̂(u* X_j² u) = H_L̂(X_j u, X_j u) ≤ H_L(X_j u, X_j u) = L(u* X_j² u).  (23)

Using (23) in (22) yields

⟨(1 − Σ_j A_j²)ū, ū⟩ = L(u*u) − Σ_j L̂(u* X_j² u)
≥ L(u*u) − Σ_j L(u* X_j² u) = L(u*(1 − Σ_j X_j²)u) ≥ 0,

where the last inequality is a consequence of (20).

All this shows that A is a row contraction, that is, A ∈ B. As in (21), ⟨f(A)ξ, ξ⟩ = L(f) < 0, contradicting our assumption f|_B ⪰ 0 and finishing the proof of Theorem 3.4.

4. Optimization of nc polynomials is a single SDP

In this section we thoroughly explain how eigenvalue optimization of an nc polynomial over the ball or polydisc is a single SDP.

4.1. Semidefinite programming (SDP). Semidefinite programming is a subfield of convex optimization concerned with the optimization of a linear objective function over the intersection of the cone of positive semidefinite matrices with an affine space [Nem07, BTN01, VB96]. The importance of semidefinite programming was spurred by the development of efficient (e.g. interior point) methods which can find an ε-optimal solution in time polynomial in s, m and log(1/ε), where s is the order of the matrix variables and m is the number of linear constraints. There exist several open source packages which find such solutions in practice. If the problem is of medium size (i.e., s ≤ 1000 and m ≤ 10,000), these packages are based on interior point methods (see e.g. [dK02, NT08]), while packages for larger semidefinite programs use some variant of first order methods (cf. [MPRW09, WGY10]). For a comprehensive list of state-of-the-art SDP solvers see [Mit03].

4.1.1. SDP and nc polynomials. Let S ⊆ Sym ℝ⟨X⟩ be finite and let f ∈ Sym ℝ⟨X⟩_{2d}. We are interested in the smallest eigenvalue f⋆ ∈ ℝ the polynomial f can attain on D_S, i.e.,

f⋆ := inf { ⟨f(A)ξ, ξ⟩ | A ∈ D_S, ξ a unit vector }.  (24)

Hence f⋆ is the greatest lower bound on the eigenvalues of f(A) for tuples of symmetric matrices A ∈ D_S, i.e., (f − f⋆)(A) ⪰ 0 for all A ∈ D_S, and f⋆ is the largest real number with this property.
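Definition (24) can be probed by sampling: every A ∈ D_S and unit ξ gives an upper bound on f⋆. A sketch for f from Example 3.2 over the polydisc, where certificate (10) guarantees f⋆ ≥ 0 (a sampling illustration only; the point of this section is that f⋆ itself is computable by a single SDP):

```python
import numpy as np

rng = np.random.default_rng(6)

def f(X, Y):
    """f = 2 - X^2 + X Y^2 X - Y^2 from Example 3.2."""
    return 2 * np.eye(X.shape[0]) - X @ X + X @ Y @ Y @ X - Y @ Y

def random_contraction(k):
    """A random symmetric k x k matrix with ||A|| <= 1, i.e. 1 - A^2 PSD."""
    B = rng.standard_normal((k, k))
    A = (B + B.T) / 2
    return A / max(1.0, np.linalg.norm(A, 2))

# Smallest sampled eigenvalue of f(A) over tuples A in the polydisc D:
best = min(np.linalg.eigvalsh(f(random_contraction(3), random_contraction(3))).min()
           for _ in range(200))
assert best >= -1e-9      # consistent with f* >= 0, certified by (10)
```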

From Proposition 2.5 it follows that we can bound f⋆ from below as follows:

f⋆ ≥ f_sohs(s) := sup λ  s.t.  f − λ ∈ M_{S,s},  (SPSDP_eig-min)

for s ≥ d. For each fixed s this is an SDP and leads to the noncommutative version of the Lasserre relaxation scheme, cf. [PNA10]. However, as a consequence of the Nichtnegativstellensatz 3.4, if D_S is the ball B or the polydisc D, then we do not need sequences of SDPs: a single SDP suffices.

4.2. Optimization of nc polynomials over the ball. In this subsection we consider S = {1 − Σ_{i=1}^{n} X_i²} and the corresponding nc semialgebraic set B = D_S, the so-called nc ball.

From Theorem 3.4 it follows that we can rephrase f⋆, the greatest lower bound on the eigenvalues of f ∈ ℝ⟨X⟩_{2d} over the ball B, as follows:

f⋆ = f_sohs = sup λ  s.t.  f − λ ∈ M_{S,d+1}.  (PSDP_eig-min)

Remark 4.1. We note that f⋆ > −∞, since positive semidefiniteness of a polynomial f ∈ ℝ⟨X⟩_{2d} on B only needs to be tested on the compact set B(N) for some N ≥ σ(d).

Verifying whether f ∈ M_{B,d} is a semidefinite programming feasibility problem:

Proposition 4.2. Let f = Σ_{w∈⟨X⟩_{2d}} f_w w. Then f ∈ M_{B,d} if and only if there exist positive semidefinite matrices H and G of order σ(d) and σ(d−1), respectively, such that for all w ∈ ⟨X⟩_{2d},

f_w = Σ_{u,v∈⟨X⟩_d, u*v=w} H(u, v) + Σ_{u,v∈⟨X⟩_{d−1}, u*v=w} G(u, v) − Σ_{j=1}^{n} Σ_{u,v∈⟨X⟩_{d−1}, u*X_j²v=w} G(u, v).  (25)

Proof. By definition M_{S,d} contains only nc polynomials of the form

Σ_i h_i* h_i + Σ_i g_i* (1 − Σ_j X_j²) g_i,  deg h_i ≤ d, deg g_i ≤ d − 1.

If f ∈ M_{S,d}, then from the h_i and g_i we obtain column vectors H_i and G_i of length σ(d) and σ(d−1), respectively, such that h_i = H_iᵗ W_d and g_i = G_iᵗ W_{d−1}. Let us define H := Σ_i H_i H_iᵗ and G := Σ_i G_i G_iᵗ. It follows that

f = Σ_i W_d* H_i H_iᵗ W_d + Σ_i W_{d−1}* G_i (1 − Σ_j X_j²) G_iᵗ W_{d−1}
  = W_d* (Σ_i H_i H_iᵗ) W_d + W_{d−1}* (Σ_i G_i G_iᵗ − Σ_j X_j (Σ_i G_i G_iᵗ) X_j) W_{d−1}
  = W_d* H W_d [=: S_1] + W_{d−1}* G W_{d−1} [=: S_2] − W_d* Σ_{i,j} G_i^j (G_i^j)ᵗ W_d [=: S_3],  (26)

where the column vectors G_i^j (of length σ(d)) are defined by

G_i^j(u) = G_i(v) if u = X_j v, and 0 otherwise.

We have to show that (26) is exactly (25), i.e., that G and H are feasible for (25). Let us consider G̃ := Σ_{i,j} G_i^j (G_i^j)ᵗ. Suppose w = u*v for some u, v ∈ ⟨X⟩_d. Equation (26) implies that f_w is the sum of all coefficients corresponding to w in the sums S_1, S_2 and S_3. The coefficient corresponding to w in S_1 is

Σ_{u,v∈⟨X⟩_d, u*v=w} H(u, v).

If in addition w ∈ ⟨X⟩_{2d−2}, then w appears also in the summand S_2, with coefficient

Σ_{u,v∈⟨X⟩_{d−1}, u*v=w} G(u, v).

In the third summand S_3 appear exactly the words w which can be decomposed as w = u*v = u_1* X_j² v_1 for some 1 ≤ j ≤ n and some u_1, v_1 ∈ ⟨X⟩_{d−1}. Such w have coefficients

−Σ_{j=1}^{n} Σ_{u_1,v_1∈⟨X⟩_{d−1}, u_1*X_j²v_1=w} G̃(X_j u_1, X_j v_1)
= −Σ_{j=1}^{n} Σ_{u_1,v_1} Σ_i G_i^j(X_j u_1) G_i^j(X_j v_1)
= −Σ_{j=1}^{n} Σ_{u_1,v_1} Σ_i G_i(u_1) G_i(v_1)
= −Σ_{j=1}^{n} Σ_{u_1,v_1∈⟨X⟩_{d−1}, u_1*X_j²v_1=w} G(u_1, v_1).

Therefore the matrices H and G are feasible for (25).

To prove the converse we start with rank one decompositions H = Σ_i H_i H_iᵗ and G = Σ_i G_i G_iᵗ. If we define h_i = H_iᵗ W_d and g_i = G_iᵗ W_{d−1}, then feasibility of H and G for (25) implies

Σ_i h_i* h_i + Σ_i g_i* (1 − Σ_j X_j²) g_i
= Σ_i Σ_{u,v∈⟨X⟩_d} H_i(u) H_i(v) u*v + Σ_i Σ_{u,v∈⟨X⟩_{d−1}} ( G_i(u) G_i(v) u*v − Σ_j G_i(u) G_i(v) u* X_j² v )
= Σ_{w∈⟨X⟩_{2d}} Σ_{u,v∈⟨X⟩_d, u*v=w} H(u, v) w + Σ_{w∈⟨X⟩_{2d−2}} Σ_{u,v∈⟨X⟩_{d−1}, u*v=w} G(u, v) w − Σ_{w∈⟨X⟩_{2d}} Σ_j Σ_{u,v∈⟨X⟩_{d−1}, u*X_j²v=w} G(u, v) w
= Σ_{w∈⟨X⟩_{2d}} f_w w = f,

concluding the proof.


Remark 4.3. The last part of the proof of Proposition 4.2 explains how to construct the sohs decomposition with weights (8) for f ∈ M_{B,d}. First we solve the semidefinite feasibility problem in the variables H ∈ S_{σ(d)}^+, G ∈ S_{σ(d−1)}^+ subject to the constraints (25). Then we compute, by Cholesky or eigenvalue decomposition, vectors H_i ∈ ℝ^{σ(d)} and G_i ∈ ℝ^{σ(d−1)} such that H = Σ_i H_i H_iᵗ and G = Σ_i G_i G_iᵗ. The polynomials h_i and g_i from (8) are computed as h_i = H_iᵗ W_d and g_i = G_iᵗ W_{d−1}.

By Proposition 4.2, the problem (PSDP_eig-min) is an SDP; it can be reformulated as

f_sohs = sup f_1 − ⟨E_{1,1}, H⟩ − ⟨E_{1,1}, G⟩
s.t.  f_w = Σ_{u,v∈⟨X⟩_{d+1}, u*v=w} H(u, v) + Σ_{u,v∈⟨X⟩_d, u*v=w} G(u, v) − Σ_{j=1}^{n} Σ_{u,v∈⟨X⟩_d, u*X_j²v=w} G(u, v), for all 1 ≠ w ∈ ⟨X⟩_{2d+2},
H ∈ S_{σ(d+1)}^+,  G ∈ S_{σ(d)}^+.  (PSDP'_eig-min)

The dual semidefinite program to (PSDP_eig-min) and (PSDP'_eig-min) is:

L_sohs = inf L(f)
s.t.  L : Sym ℝ⟨X⟩_{2d+2} → ℝ is linear,
L(1) = 1,
L(q*q) ≥ 0 for all q ∈ ℝ⟨X⟩_{d+1},
L(h*(1 − Σ_j X_j²)h) ≥ 0 for all h ∈ ℝ⟨X⟩_d.  (DSDP_eig-min)_{d+1}
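The extraction step in Remark 4.3 – factoring the PSD matrices H and G into rank-one terms – can be done by eigenvalue (or Cholesky) decomposition. A sketch with a hypothetical helper name:

```python
import numpy as np

def rank_one_terms(H, tol=1e-10):
    """Vectors H_i with H = sum_i H_i H_i^t, obtained from the eigendecomposition
    of a PSD matrix H (Remark 4.3; a Cholesky factor would serve equally well)."""
    lam, U = np.linalg.eigh(H)
    return [np.sqrt(l) * U[:, j] for j, l in enumerate(lam) if l > tol]

rng = np.random.default_rng(7)
V = rng.standard_normal((5, 3))
H = V @ V.T                                   # a PSD matrix of rank 3
terms = rank_one_terms(H)

assert len(terms) == 3                        # one term per positive eigenvalue
assert np.allclose(sum(np.outer(h, h) for h in terms), H)
```

Each vector H_i then corresponds to one polynomial h_i = H_iᵗ W_d of the sohs certificate.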

Proposition 4.4. (DSDP_eig-min)_{d+1} admits Slater points.

Proof. For this it suffices to find a linear map L : Sym ℝ⟨X⟩_{2d+2} → ℝ satisfying L(p*p) > 0 for all nonzero p ∈ ℝ⟨X⟩_{d+1}, and L(h*(1 − Σ_j X_j²)h) > 0 for all nonzero h ∈ ℝ⟨X⟩_d. We again exploit the fact that there are no nonzero polynomial identities that hold for all sizes of matrices, which was used already in Proposition 3.6.

Let us choose N > d + 1 and enumerate a dense subset U of N×N matrix tuples from B (for instance, take all N×N matrix tuples from B with entries in ℚ), that is,

U = { A^(k) := (A_1^(k), ..., A_n^(k)) | k ∈ ℕ, A^(k) ∈ B(N) }.

To each B ∈ U we associate the linear map

L_B : Sym ℝ⟨X⟩_{2d+2} → ℝ,  f ↦ tr f(B).

Form

L := Σ_{k=1}^{∞} 2^{−k} L_{A^(k)} / ‖L_{A^(k)}‖.

We claim that L is the desired linear functional.

Obviously, L(p*p) ≥ 0 for all p ∈ ℝ⟨X⟩_{d+1}. Suppose L(p*p) = 0 for some p ∈ ℝ⟨X⟩_{d+1}. Then L_{A^(k)}(p*p) = 0 for all k ∈ ℕ, i.e., for all k we have tr(p(A^(k))ᵗ p(A^(k))) = 0, hence p(A^(k)) = 0. Since U was dense in B(N), by continuity it follows that p*p vanishes on all n-tuples from B(N). Proposition 3.6 implies that p = 0. Similarly, L(h*(1 − Σ_j X_j²)h) = 0 implies h = 0 for h ∈ ℝ⟨X⟩_d.


Remark 4.5. Having Slater points for (DSDP_eig-min)_{d+1} is important for the clean duality theory of SDP to kick in [VB96, dK02]. In particular, there is no duality gap, so L_sohs = f_sohs (= f⋆). Since also the optimal value f_sohs > −∞ (cf. Remark 4.1), f_sohs is attained. More important for us and the extraction of optimizers is the fact that L_sohs is attained, as we shall explain in §5.

4.3. Optimization of nc polynomials over the polydisc. In this section we consider

S = {1 − X_1², ..., 1 − X_n²}  (27)

and the corresponding nc semialgebraic set

D = D_S = ⋃_{k∈ℕ} { A = (A_1, ..., A_n) ∈ S_kⁿ | 1 − A_1² ⪰ 0, ..., 1 − A_n² ⪰ 0 },

the so-called nc polydisc. Many of the considerations here resemble those from the previous subsection, so we shall be sketchy at times.

The truncated quadratic module tailored for this S is

M_{D,d} = { Σ_i h_i* s_i h_i | h_i ∈ ℝ⟨X⟩, s_i ∈ S ∪ {1}, deg(h_i* s_i h_i) ≤ 2d }.

Theorem 3.4 implies that the problem (PSDP_eig-min), where S is from (27), also yields the greatest lower bound on the eigenvalues of an nc polynomial f over the polydisc.

Similarly to Proposition 4.2 we can prove:

Proposition 4.6. Let $f = \sum_{w \in \langle X\rangle_{2d}} f_w w$. Then $f \in M_{\mathcal{D},d}$ if and only if there exist a positive semidefinite matrix $H$ of order $\sigma(d)$ and positive semidefinite matrices $G_i$, $1 \leq i \leq n$, of order $\sigma(d-1)$ such that for all $w \in \langle X\rangle_{2d}$,
$$f_w = \sum_{\substack{u,v \in \langle X\rangle_d \\ u^*v = w}} H(u,v) + \sum_i \sum_{\substack{u,v \in \langle X\rangle_{d-1} \\ u^*v = w}} G_i(u,v) - \sum_{i=1}^n \sum_{\substack{u,v \in \langle X\rangle_{d-1} \\ u^* X_i^2 v = w}} G_i(u,v). \tag{28}$$

Proof. If $f \in M_{\mathcal{D},d}$ then we can find $h_i \in \mathbb{R}\langle X\rangle_d$ and $g_{i,j} \in \mathbb{R}\langle X\rangle_{d-1}$ such that
$$f = \sum_i h_i^* h_i + \sum_{i,j} g_{i,j}^* (1 - X_j^2) g_{i,j}.$$

These polynomials yield column vectors $H_i$ and $G_{i,j}$ of length $\sigma(d)$ and $\sigma(d-1)$, respectively, such that $h_i = H_i^t W_d$ and $g_{i,j} = G_{i,j}^t W_{d-1}$. Let us define $H := \sum_i H_i H_i^t$, $G_j := \sum_i G_{i,j} G_{i,j}^t$ and $G := \sum_j G_j$. It follows that
$$\begin{aligned}
f &= \sum_i W_d^* H_i H_i^t W_d + \sum_{i,j} W_{d-1}^*\, G_{i,j} (1 - X_j^2) G_{i,j}^t\, W_{d-1} \\
  &= W_d^* \Big(\sum_i H_i H_i^t\Big) W_d + W_{d-1}^* \Big( \sum_{i,j} G_{i,j} G_{i,j}^t - \sum_j X_j \Big(\sum_i G_{i,j} G_{i,j}^t\Big) X_j \Big) W_{d-1} \\
  &= \underbrace{W_d^* H W_d}_{=:S_1} + \underbrace{W_{d-1}^* G W_{d-1}}_{=:S_2} - \underbrace{W_d^* \sum_{i,j} G_i^j (G_i^j)^t W_d}_{=:S_3},
\end{aligned}$$
where the column vectors $G_i^j$ are defined by
$$G_i^j(u) = \begin{cases} G_{i,j}(v), & \text{if } u = X_j v, \\ 0, & \text{otherwise.} \end{cases}$$

Let us consider $\tilde{G} := \sum_{i,j} G_i^j (G_i^j)^t$. Suppose $w = u^*v$ for some $u,v \in \langle X\rangle_d$. We can find $w$ in $S_1$; the corresponding coefficient is exactly $\sum_{\substack{u,v \in \langle X\rangle_d \\ u^*v = w}} H(u,v)$. If we additionally have $w \in \langle X\rangle_{2d-2}$, then $w$ appears also in the summand $S_2$ with coefficient $\sum_{\substack{u,v \in \langle X\rangle_{d-1} \\ u^*v = w}} G(u,v)$. In the third summand $S_3$ there appear exactly the words $w$ which can be decomposed as $w = u_1^* X_j^2 v_1$ for some $1 \leq j \leq n$ and some $u_1, v_1 \in \langle X\rangle_{d-1}$. Such $w$ carry (taking into account the minus sign in front of $S_3$) the coefficients
$$-\sum_{j=1}^n \sum_{\substack{u_1,v_1 \in \langle X\rangle_{d-1} \\ u_1^* X_j^2 v_1 = w}} \tilde{G}(X_j u_1, X_j v_1)
= -\sum_{j=1}^n \sum_{\substack{u_1,v_1 \in \langle X\rangle_{d-1} \\ u_1^* X_j^2 v_1 = w}} \sum_i G_i^j(X_j u_1)\, G_i^j(X_j v_1)
= -\sum_{j=1}^n \sum_{\substack{u_1,v_1 \in \langle X\rangle_{d-1} \\ u_1^* X_j^2 v_1 = w}} \sum_i G_{i,j}(u_1)\, G_{i,j}(v_1)
= -\sum_{j=1}^n \sum_{\substack{u_1,v_1 \in \langle X\rangle_{d-1} \\ u_1^* X_j^2 v_1 = w}} G_j(u_1, v_1).$$
Therefore the matrices $H$ and $G_i$ are feasible for (28).

To prove the converse we start with rank-one decompositions $H = \sum_i H_i H_i^t$ and $G_j = \sum_i G_{i,j} G_{i,j}^t$. If we define $h_i = H_i^t W_d$ and $g_{i,j} = G_{i,j}^t W_{d-1}$, then feasibility of $H$ and $G_j$ for (28) implies
$$\begin{aligned}
\sum_i h_i^* h_i + \sum_{i,j} g_{i,j}^* (1 - X_j^2) g_{i,j}
&= \sum_i \sum_{u,v \in \langle X\rangle_d} H_i(u) H_i(v)\, u^*v + \sum_{i,j} \sum_{u,v \in \langle X\rangle_{d-1}} \big( G_{i,j}(u) G_{i,j}(v)\, u^*v - G_{i,j}(u) G_{i,j}(v)\, u^* X_j^2 v \big) \\
&= \sum_{w \in \langle X\rangle_{2d}} \sum_{\substack{u,v \in \langle X\rangle_d \\ u^*v = w}} H(u,v)\, w + \sum_{w \in \langle X\rangle_{2d-2}} \sum_{\substack{u,v \in \langle X\rangle_{d-1} \\ u^*v = w}} \sum_j G_j(u,v)\, w - \sum_{w \in \langle X\rangle_{2d}} \sum_j \sum_{\substack{u,v \in \langle X\rangle_{d-1} \\ u^* X_j^2 v = w}} G_j(u,v)\, w \\
&= \sum_{w \in \langle X\rangle_{2d}} f_w w = f.
\end{aligned}$$

Remark 4.7. Similarly to Remark 4.3, the proof of Proposition 4.6 shows how to construct an sohs decomposition with weights (9) for $f \in M_{\mathcal{D},d}$.
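Numerically, this construction amounts to factoring the positive semidefinite Gram matrices. A minimal sketch (our own, assuming numpy; an eigendecomposition replaces the rank-one decomposition used in the proof):

```python
import numpy as np

def sohs_from_gram(H, tol=1e-9):
    """Given a positive semidefinite Gram matrix H (indexed by the word
    vector W_d), return vectors H_i with H = sum_i H_i H_i^t; the
    polynomials h_i = H_i^t W_d then give the sohs part sum_i h_i^* h_i."""
    lam, V = np.linalg.eigh(H)
    return [np.sqrt(l) * V[:, k] for k, l in enumerate(lam) if l > tol]

# Gram matrix w.r.t. W_1 = (1, X1, X2) of f = 1 + 2 X1^2 - X1X2 - X2X1 + X2^2
H = np.array([[1.,  0.,  0.],
              [0.,  2., -1.],
              [0., -1.,  1.]])
vecs = sohs_from_gram(H)
# the rank-one pieces reassemble H exactly
print(np.allclose(sum(np.outer(v, v) for v in vecs), H))  # True
```

The same factorization applied to each $G_j$ yields the weighted terms $g_{i,j}^*(1-X_j^2)g_{i,j}$.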

By Proposition 4.6, the problem of computing $f^\star$ over the polydisc is an SDP. Its dual semidefinite program is:
$$\begin{array}{rl}
L_{\mathrm{sohs}} \;=\; \inf & L(f) \\
\text{s.\,t.} & L \colon \operatorname{Sym}\mathbb{R}\langle X\rangle_{2d+2} \to \mathbb{R} \text{ is linear} \\
& L(1) = 1 \\
& L(q^*q) \geq 0 \text{ for all } q \in \mathbb{R}\langle X\rangle_{d+1} \\
& L(h^*(1 - X_j^2)h) \geq 0 \text{ for all } h \in \mathbb{R}\langle X\rangle_d,\ 1 \leq j \leq n.
\end{array} \qquad (\mathrm{DSDP}_{\mathrm{eig\text{-}min}})_{d+1}$$


For implementational purposes, problem $(\mathrm{DSDP}_{\mathrm{eig\text{-}min}})_{d+1}$ is more conveniently given as
$$\begin{array}{rl}
L_{\mathrm{sohs}} \;=\; \inf & \langle H_L, G_f \rangle \\
\text{s.\,t.} & H_L(u,v) = H_L(w,z), \text{ if } u^*v = w^*z, \text{ where } u,v,w,z \in \langle X\rangle_{d+1} \\
& H_L(1,1) = 1, \quad H_L \in \mathbb{S}^+_{\sigma(d+1)}, \quad H_L^j \in \mathbb{S}^+_{\sigma(d)} \ \forall j \\
& H_L^j(u,v) = H_L(u,v) - H_L(X_j u, X_j v), \text{ for all } u,v \in \langle X\rangle_d,\ 1 \leq j \leq n,
\end{array} \qquad (\mathrm{DSDP'}_{\mathrm{eig\text{-}min}})_{d+1}$$
where $G_f$ is a Gram matrix for $f$, and $H_L^j$ represents $L$ acting on nc polynomials of the form $u^*(1 - X_j^2)v$, i.e., $H_L^j$ is the localizing matrix for $1 - X_j^2$.
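The first constraint of $(\mathrm{DSDP'}_{\mathrm{eig\text{-}min}})_{d+1}$ says that the entry of $H_L$ at position $(u,v)$ depends only on the word $u^*v$. A pure-Python sketch (our own word encoding, not NCSOStools) computes these identification classes of moment-matrix entries:

```python
from itertools import product

def words(n, d):
    """All words of degree at most d in n letters, as tuples of indices."""
    return [w for k in range(d + 1) for w in product(range(n), repeat=k)]

def moment_classes(n, d):
    """Group the entries (u, v) of the order-sigma(d) moment matrix H_L by
    the word u^* v, where u^* is the reverse of u (the variables are
    symmetric): all entries in one class must carry the same value of L."""
    classes = {}
    W = words(n, d)
    for i, u in enumerate(W):
        for j, v in enumerate(W):
            word = tuple(reversed(u)) + v   # the word u^* v
            classes.setdefault(word, []).append((i, j))
    return classes

cls = moment_classes(2, 1)   # words 1, X1, X2 -> a 3x3 moment matrix
# positions whose entry equals L evaluated at the word X1*X2:
print(cls[(0, 1)])
```

Each class contributes one linear equality constraint to the SDP; the localizing matrices $H_L^j$ are then linear images of $H_L$ via the last constraint.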

Proposition 4.8. $(\mathrm{DSDP}_{\mathrm{eig\text{-}min}})_{d+1}$ admits Slater points.

Proof. We omit the proof as it is the same as that of Proposition 4.4.

Like above, by Proposition 4.8, $L_{\mathrm{sohs}} = f_{\mathrm{sohs}}\,(= f^\star)$ and the optimal value $f_{\mathrm{sohs}}$ is attained. Corollary 5.2 from the next section shows that $L_{\mathrm{sohs}}$ is also attained.

4.4. Examples. We have implemented the construction of the above SDPs in our open source toolbox NCSOStools. Using a standard SDP solver (such as SDPA [YFK03], SDPT3 [TTT99] or SeDuMi [Stu99]) the constructed SDPs can be solved. We demonstrate the software on the polynomials from Examples 2.1 and 3.2.

>> NCvars x y

>> f1 = 2 + x*y*x*y + y*x*y*x;

>> f2 = 2 - x^2 + x*y^2*x - y^2;

We compute the optimal value $f^\star$ on the ball by solving $(\mathrm{DSDP}_{\mathrm{eig\text{-}min}})_{d+1}$.

>> NCminBall(f1)
ans = 1.5000
>> NCminBall(f2)
ans = 1.0000

Similarly we compute $f^\star$ on the polydisc by solving $(\mathrm{DSDP'}_{\mathrm{eig\text{-}min}})_{d+1}$.

>> NCminCube(f1)
ans = 4.0234e-013
>> NCminCube(f2)
ans = 1.0872e-011

Note: the minimum of the commutative collapse $\check{f}_1$ of $f_1$ over the ball $B(1) = \{(x,y) \in \mathbb{R}^2 \mid x^2 + y^2 \leq 1\}$ and over the polydisc $D(1) = \{(x,y) \in \mathbb{R}^2 \mid |x| \leq 1,\ |y| \leq 1\}$ is equal to 2, and both minima for $\check{f}_2$ are equal to 1.
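These commutative minima are easy to sanity-check by brute force. A quick grid scan (our own check, not part of the toolbox; $\check{f}_1 = 2 + 2x^2y^2$ and $\check{f}_2 = 2 - x^2 + x^2y^2 - y^2$):

```python
# Commutative collapses of f1 = 2 + xyxy + yxyx and f2 = 2 - x^2 + xy^2x - y^2
def f1c(x, y):
    return 2 + 2 * x**2 * y**2

def f2c(x, y):
    return 2 - x**2 + x**2 * y**2 - y**2

steps = 201
grid = [-1 + 2 * i / (steps - 1) for i in range(steps)]

# minima over the polydisc D(1) and over the ball B(1)
min_f1_disc = min(f1c(x, y) for x in grid for y in grid)
min_f2_disc = min(f2c(x, y) for x in grid for y in grid)
min_f1_ball = min(f1c(x, y) for x in grid for y in grid if x*x + y*y <= 1)
min_f2_ball = min(f2c(x, y) for x in grid for y in grid if x*x + y*y <= 1)

print(min_f1_ball, min_f1_disc)   # both 2.0
print(min_f2_ball, min_f2_disc)   # both 1.0
```

This matches the values quoted above, and contrasts with the nc optima computed by NCminBall and NCminCube.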

Together with the optimal value $f^\star$ our software can also return a certificate for positivity of $f - f^\star$, i.e., a sohs decomposition with weights for $f - f^\star$ as presented in (8) and (9). For example:

>> params.precision = 1e-6;
>> [opt,g,decom_sohs,decom_ball] = NCminBall(f2,params)
opt = 1.0000
g = 1-x^2-y^2
decom_sohs = 0
0
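For $f_2$ one can verify by hand that $f_2 - 1 = (1 - x^2 - y^2) + (yx)^*(yx)$, since $(yx)^*(yx) = xy^2x$ for symmetric variables; this exhibits the weight $g = 1 - x^2 - y^2$ together with a single hermitian square. A numerical spot check of this identity on random symmetric matrices (our own check, assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_sym(n):
    """A random symmetric n x n matrix."""
    M = rng.standard_normal((n, n))
    return (M + M.T) / 2

# check f2 - 1 == (1 - X^2 - Y^2) + (YX)^T (YX) on random symmetric tuples
for n in (2, 3, 5):
    X, Y = rand_sym(n), rand_sym(n)
    I = np.eye(n)
    f2_val = 2 * I - X @ X + X @ Y @ Y @ X - Y @ Y
    cert = (I - X @ X - Y @ Y) + (Y @ X).T @ (Y @ X)
    assert np.allclose(f2_val - I, cert)
print("identity holds on random symmetric matrices")
```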
