Algebraic Statistics


Karl-Heinz Zimmermann


Hamburg University of Technology


21071 Hamburg Germany

© 2009, 2015 Karl-Heinz Zimmermann, author

urn:nbn:de:gbv:830-88213556


For my Teachers

Thomas Beth †

Adalbert Kerber

Sun-Yuan Kung

Horst Müller

Preface

Algebraic statistics brings together ideas from algebraic geometry, commutative algebra, and combinatorics to address problems in statistics and its applications. Computer algebra provides powerful tools for the study of algorithms and software. However, these tools are rarely prepared to address statistical challenges, and therefore new algebraic results often need to be developed. This interplay between algebra and statistics fertilizes both disciplines.

Algebraic statistics is a relatively new branch of mathematics that has developed and changed rapidly over the last ten years. The seminal work in this field was the paper of Diaconis and Sturmfels (1998), which introduced the notion of Markov bases for toric statistical models and showed the connection to commutative algebra. Later on, the connection between algebra and statistics spread to a number of different areas including parametric inference, phylogenetic invariants, and algebraic tools for maximum likelihood estimation. These connections were highlighted in the celebrated book Algebraic Statistics for Computational Biology of Pachter and Sturmfels (2005) and subsequent publications.

In this report, statistical models for discrete data are viewed as solutions of systems of polynomial equations. This makes it possible to treat statistical models for sequence alignment, hidden Markov models, and phylogenetic tree models. These models are connected in the sense that if they are interpreted in the tropical algebra, the famous dynamic programming algorithms (Needleman-Wunsch, Viterbi, and Felsenstein) occur in a natural manner. More generally, if the models are interpreted in a higher-dimensional analogue of the tropical algebra, the polytope algebra, parametric versions of these dynamic programming algorithms can be established.

Markov bases make it possible to sample data in a given fibre using Markov chain Monte Carlo algorithms. In this way, Markov bases provide a means to increase the sample size and make statistical tests in inferential statistics more reliable. We will calculate Markov bases using Groebner bases in commutative polynomial rings.

The manuscript grew out of lectures on algebraic statistics held for Master's students of Computer Science at the Hamburg University of Technology. It appears that the first lecture, held in the summer term 2008, was the first course of this kind in Germany. The current manuscript is the basis of a four-hour introductory course. The use of computer algebra systems is at the heart of the course. Maple is employed for symbolic computations, Singular for algebraic computations, and R for statistical computations. The second edition at hand is just a streamlined version of the first one.

Hamburg, Nov. 2015 Karl-Heinz Zimmermann


Part I

Algebraic and Combinatorial Methods


1 Commutative Algebra

Commutative algebra is a branch of abstract algebra that studies commutative rings and their ideals. Both algebraic geometry and algebraic number theory are built on commutative algebra. Ideals in polynomial rings are usually studied by their Groebner bases. The latter can be used to tackle important problems like testing the membership in ideals and solving polynomial equations.

1.1 Polynomial Rings

Let $K$ be a field. A monomial in a collection of variables or unknowns $X_1, \ldots, X_n$ over $K$ is a product

\[ X^\alpha = X_1^{\alpha_1} \cdots X_n^{\alpha_n}, \quad \alpha_1, \ldots, \alpha_n \in \mathbb{N}_0. \tag{1.1} \]

The total degree of a monomial $X^\alpha$ is the sum of the exponents $|\alpha| = \alpha_1 + \ldots + \alpha_n$. For instance, $X_1^2 X_3^3 X_4$ is a monomial of total degree 6 in the variables $X_1, X_2, X_3, X_4$, since $\alpha = (2,0,3,1)$ and $|\alpha| = 6$.

We can form linear combinations of monomials with coefficients in $K$. The resulting objects are polynomials in $X_1, \ldots, X_n$ over $K$. A general polynomial $f$ in $X_1, \ldots, X_n$ with coefficients in $K$ has the form

\[ f = \sum_{\alpha} c_\alpha X^\alpha, \quad c_\alpha \in K, \tag{1.2} \]

where the sum is over a finite number of elements $\alpha \in \mathbb{N}_0^n$. A nonzero product $c_\alpha X^\alpha$ involved in a polynomial is called a term and the scalar $c_\alpha$ is called the coefficient of the term. For instance, taking $K$ to be the field $\mathbb{Q}$ of rational numbers and using the variables $X, Y, Z$ instead of subscripts, $f = X^2 + YZ - 1$ is a polynomial containing three terms.

The set of all polynomials in $X_1, \ldots, X_n$ with coefficients in $K$ is denoted by $K[X_1, \ldots, X_n]$. The polynomials in $K[X_1, \ldots, X_n]$ can be added and multiplied as usual,

\[ \Big( \sum_{\alpha} c_\alpha X^\alpha \Big) + \Big( \sum_{\beta} d_\beta X^\beta \Big) = \sum_{\alpha} (c_\alpha + d_\alpha) X^\alpha, \tag{1.3} \]

\[ \Big( \sum_{\alpha} c_\alpha X^\alpha \Big) \cdot \Big( \sum_{\beta} d_\beta X^\beta \Big) = \sum_{\alpha, \beta} (c_\alpha d_\beta) X^{\alpha + \beta}. \tag{1.4} \]

Thus $K[X_1, \ldots, X_n]$ forms a commutative ring with identity, called the polynomial ring in $X_1, \ldots, X_n$ over $K$. Moreover, with the addition of polynomials and the multiplication by constants, $K[X_1, \ldots, X_n]$ forms an infinite-dimensional $K$-vector space with the monomials as a $K$-basis.

Each nonzero polynomial $f$ in $K[X_1, \ldots, X_n]$ has a degree, denoted by $\deg(f)$. This is the largest total degree of a monomial occurring in $f$ with a nonzero coefficient. For instance, $f = 4X^3 + 3Y^5Z - Z^4$ is a polynomial of degree 6 in $\mathbb{Q}[X, Y, Z]$. The nonzero elements of $K$ are the polynomials of degree 0. For any nonzero polynomials $f$ and $g$ in $K[X_1, \ldots, X_n]$, we have $\deg(fg) = \deg(f) + \deg(g)$ by comparing monomials of largest degree. Thus the polynomial ring $K[X_1, \ldots, X_n]$ is an integral domain (i.e., it has no zero divisors) and only nonzero constant polynomials have multiplicative inverses in $K[X_1, \ldots, X_n]$. Hence, $K[X_1, \ldots, X_n]$ is not a field.

A polynomial $f$ in $K[X_1, \ldots, X_n]$ is called homogeneous if all involved monomials have the same total degree. For instance, $f = 3X^4 + 5YZ^3 - X^2Z^2$ is a homogeneous polynomial of total degree 4 in $\mathbb{Q}[X, Y, Z]$. It is clear that each polynomial $f$ in $K[X_1, \ldots, X_n]$ can be written as a sum of homogeneous polynomials called the homogeneous components of $f$. For instance, $f = 3X^4 + 5YZ^3 + X^2 - Y^2 - 1$ in $\mathbb{Q}[X, Y, Z]$ is a sum of the homogeneous components $f^{(4)} = 3X^4 + 5YZ^3$, $f^{(2)} = X^2 - Y^2$, and $f^{(0)} = -1$.

Example 1.1 (Singular). Polynomial rings can be generated over different fields. The polynomial ring $\mathbb{Q}[X, Y, Z]$ is defined as

> ring r1 = 0, (x,y,z), dp;

> poly f = x2y-z2;

> f*f-f;

x4y2-2x2yz2+z4-x2y+z2

Polynomials can be written in short (e.g., x2y-z2) or long (e.g., x^2*y-z^2) notation. The definition of polynomial rings over other fields follows the same pattern, such as the polynomial ring over the finite field $\mathbb{Z}_5$,

> ring r2 = 5, (x,y,z), dp;

the polynomial ring over the finite Galois field GF(8),

> ring r3 = (2^3,a), (x,y,z), dp; // primitive element a

> number n = a2+1; // element of GF(8)

> n*n;

a5

and the polynomial ring over the extension field $\mathbb{Q}(a, b)$,

> ring r4 = (0,a,b), (x,y,z), dp;

> number n = 2a+1/b2; // element of Q(a,b)

> n*n;

(4a2b4+4ab2+1)/(b4)

♦
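The ring operations (1.3) and (1.4) and the degree can also be sketched outside a computer algebra system. The following is a minimal illustration, assuming a polynomial is stored as a Python dictionary from exponent tuples to coefficients; the function names `poly_add`, `poly_mul`, and `degree` are hypothetical helpers chosen for this sketch and are not part of Singular.

```python
from fractions import Fraction
from itertools import product

# A polynomial in n variables is stored as {exponent tuple: coefficient};
# terms with coefficient zero are never stored.

def poly_add(f, g):
    """Sum of two polynomials, as in (1.3)."""
    h = dict(f)
    for alpha, c in g.items():
        h[alpha] = h.get(alpha, 0) + c
    return {a: c for a, c in h.items() if c != 0}

def poly_mul(f, g):
    """Product of two polynomials, as in (1.4)."""
    h = {}
    for (alpha, c), (beta, d) in product(f.items(), g.items()):
        gamma = tuple(a + b for a, b in zip(alpha, beta))  # X^alpha * X^beta
        h[gamma] = h.get(gamma, 0) + c * d
    return {a: c for a, c in h.items() if c != 0}

def degree(f):
    """Largest total degree of a monomial of a nonzero polynomial."""
    return max(sum(alpha) for alpha in f)

# f = X^2*Y - Z^2 in Q[X, Y, Z], the polynomial of Example 1.1
f = {(2, 1, 0): Fraction(1), (0, 0, 2): Fraction(-1)}
minus_f = {alpha: -c for alpha, c in f.items()}
g = poly_add(poly_mul(f, f), minus_f)      # f*f - f

print(sorted(g.items(), reverse=True))
print(degree(poly_mul(f, f)) == 2 * degree(f))  # deg(fg) = deg(f) + deg(g)
```

The dictionary `g` holds the same five terms as the Singular output x4y2-2x2yz2+z4-x2y+z2 of Example 1.1. Exact rational coefficients are used so that the sketch stays within the field $\mathbb{Q}$.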


1.2 Ideals

Ideals are the most prominent structures studied in polynomial rings.

A nonempty subset $I$ of the polynomial ring $K[X_1, \ldots, X_n]$ is an ideal if

• for each $f, g \in I$, we have $-f \in I$ and $f + g \in I$, and

• for each $f \in I$ and $g \in K[X_1, \ldots, X_n]$, we have $f \cdot g \in I$.

The first condition ensures that $I$ is an additive subgroup of $K[X_1, \ldots, X_n]$. It is equivalent to the subgroup criterion, which says that for each $f, g \in I$, we have $f - g \in I$.

Lemma 1.2. Let $f_1, \ldots, f_s$ be polynomials in $K[X_1, \ldots, X_n]$. Then the set

\[ \langle f_1, \ldots, f_s \rangle = \Big\{ \sum_{i=1}^{s} h_i f_i \;\Big|\; h_1, \ldots, h_s \in K[X_1, \ldots, X_n] \Big\} \tag{1.5} \]

is an ideal of $K[X_1, \ldots, X_n]$, the smallest ideal of $K[X_1, \ldots, X_n]$ containing $f_1, \ldots, f_s$.

Proof. Let $f, g \in \langle f_1, \ldots, f_s \rangle$. Write $f = h_1 f_1 + \ldots + h_s f_s$ and $g = h'_1 f_1 + \ldots + h'_s f_s$, where $h_i, h'_i \in K[X_1, \ldots, X_n]$, $1 \le i \le s$. Then $f - g = (h_1 - h'_1) f_1 + \ldots + (h_s - h'_s) f_s$ and thus $f - g \in \langle f_1, \ldots, f_s \rangle$. Moreover, if $h \in K[X_1, \ldots, X_n]$ then $f \cdot h = (h_1 h) f_1 + \ldots + (h_s h) f_s$ and thus $f \cdot h \in \langle f_1, \ldots, f_s \rangle$. In view of the last assertion, note that each ideal of $K[X_1, \ldots, X_n]$ that contains $f_1, \ldots, f_s$ must also contain $\langle f_1, \ldots, f_s \rangle$. □

The ideal $\langle f_1, \ldots, f_s \rangle$ is called the ideal generated by $f_1, \ldots, f_s$. The set $\{f_1, \ldots, f_s\}$ is sometimes called a basis of the ideal. In particular, the sets $\langle \emptyset \rangle = \{0\}$ and $\langle 1 \rangle = K[X_1, \ldots, X_n]$ are the trivial ideals.

There are several ways to construct new ideals from given ones.

Proposition 1.3. Let $I$ and $J$ be ideals of $K[X_1, \ldots, X_n]$. The sum of $I$ and $J$ is the set

\[ I + J = \{ f + g \mid f \in I,\ g \in J \}. \tag{1.6} \]

• The sum $I + J$ is an ideal of $K[X_1, \ldots, X_n]$.

• The sum $I + J$ is the smallest ideal containing $I \cup J$.

• If $I = \langle f_1, \ldots, f_r \rangle$ and $J = \langle g_1, \ldots, g_s \rangle$, then

\[ I + J = \langle f_1, \ldots, f_r, g_1, \ldots, g_s \rangle. \tag{1.7} \]

Proof. Let $f, f' \in I$ and $g, g' \in J$. Then $(f + g) - (f' + g') = (f - f') + (g - g')$ lies in $I + J$. Moreover, let $h \in K[X_1, \ldots, X_n]$. Then $(f + g) \cdot h = (f \cdot h) + (g \cdot h) \in I + J$. Hence, $I + J$ is an ideal.

Let $L$ be an ideal of $K[X_1, \ldots, X_n]$ containing $I \cup J$. If $f \in I$ and $g \in J$ then $f + g \in L$ and thus $L$ contains $I + J$.

Let $h \in \langle f_1, \ldots, f_r, g_1, \ldots, g_s \rangle$. Then $h = h_1 f_1 + \ldots + h_r f_r + h'_1 g_1 + \ldots + h'_s g_s$, where $h_i, h'_j \in K[X_1, \ldots, X_n]$, $1 \le i \le r$, $1 \le j \le s$. Thus $h$ is of the form $f + g$, where $f \in I$ and $g \in J$, and hence $h \in I + J$. Conversely, the ideal $\langle f_1, \ldots, f_r, g_1, \ldots, g_s \rangle$ contains $I \cup J$ and thus, by the second assertion, contains $I + J$. Hence both ideals are equal. □

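Proposition 1.4. Let $I$ and $J$ be ideals of $K[X_1, \ldots, X_n]$. The product of $I$ and $J$ is the ideal

\[ I \cdot J = \langle f \cdot g \mid f \in I,\ g \in J \rangle. \tag{1.8} \]

• The intersection $I \cap J$ is an ideal in $K[X_1, \ldots, X_n]$.

• The product $I \cdot J$ is contained in the intersection $I \cap J$.

• If $I = \langle f_1, \ldots, f_r \rangle$ and $J = \langle g_1, \ldots, g_s \rangle$, then

\[ I \cdot J = \langle f_i \cdot g_j \mid 1 \le i \le r,\ 1 \le j \le s \rangle. \tag{1.9} \]

Proof. Let $f, g \in I \cap J$. Then $f - g \in I$ and $f - g \in J$ and so $f - g \in I \cap J$. Let $f \in I \cap J$ and $h \in K[X_1, \ldots, X_n]$. Since $I$ and $J$ are ideals, $f \cdot h \in I$ and $f \cdot h \in J$. Thus $f \cdot h \in I \cap J$.

Let $f \in I$ and $g \in J$. Then $f \cdot g$ is contained in both $I$ and $J$ and thus belongs to $I \cap J$. Since $I \cap J$ is an ideal containing all these generators, $I \cdot J \subseteq I \cap J$.

Since $f_i \cdot g_j$ belongs to $I \cdot J$, it follows that $I \cdot J$ contains $\langle f_i \cdot g_j \mid 1 \le i \le r,\ 1 \le j \le s \rangle$. Conversely, let $h \in I \cdot J$. Then $h$ can be written in terms of generators $f \cdot g$, where $f \in I$ and $g \in J$. But the constituents $f$ and $g$ of these generators can be written with respect to the bases $f_1, \ldots, f_r$ and $g_1, \ldots, g_s$, respectively. Thus the polynomial $h$ belongs to the ideal $\langle f_i \cdot g_j \mid 1 \le i \le r,\ 1 \le j \le s \rangle$. □

Example 1.5 (Singular). The above ideal operations in $\mathbb{Q}[X, Y, Z]$ can be defined as follows,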

> ring r = 0, (x,y,z), dp;

> ideal i = xyz, x2-y2;

> ideal j = x2-1, y2-z2;

> i+j;

_[1]=xyz
_[2]=x2-y2
_[3]=x2-1
_[4]=y2-z2

> i*j;

_[1]=x3yz-xyz
_[2]=xy3z-xyz3
_[3]=x4-x2y2-x2+y2
_[4]=x2y2-y4-x2z2+y2z2

♦

Proposition 1.6. Let $I$ be an ideal of $K[X_1, \ldots, X_n]$. The set

\[ \sqrt{I} = \{ f \in K[X_1, \ldots, X_n] \mid f^m \in I \text{ for some integer } m \ge 1 \} \tag{1.10} \]

is an ideal of $K[X_1, \ldots, X_n]$ containing $I$, called the radical of $I$, with $\sqrt{\sqrt{I}} = \sqrt{I}$.

Proof. We have $I \subseteq \sqrt{I}$, since $f \in I$, i.e., $f^1 \in I$, implies $f \in \sqrt{I}$.

Claim that $\sqrt{I}$ is an ideal. Indeed, let $f, g \in \sqrt{I}$. By definition, there are positive integers $k$ and $l$ such that $f^k, g^l \in I$. Expanding $(f + g)^{k+l-1}$ by the binomial theorem shows that each term is a multiple of some $f^m g^{m'}$ with $m + m' = k + l - 1$. Thus either $m \ge k$ or $m' \ge l$ (otherwise $m + m' \le k + l - 2$), and so $f^m$ or $g^{m'}$ is in $I$. Thus all terms in $(f + g)^{k+l-1}$ belong to $I$ and hence $f + g$ lies in $\sqrt{I}$.

Let $f \in \sqrt{I}$ and $g \in K[X_1, \ldots, X_n]$. By definition, $f^m \in I$ for some $m \ge 1$. Thus $(fg)^m = f^m g^m$ belongs to $I$ and hence $fg$ lies in $\sqrt{I}$. It follows that $\sqrt{I}$ is an ideal.

Claim that $\sqrt{\sqrt{I}} = \sqrt{I}$. Indeed, we have already shown that $\sqrt{I}$ lies in $\sqrt{\sqrt{I}}$. Conversely, let $f \in \sqrt{\sqrt{I}}$. Then $f^m \in \sqrt{I}$ for some positive integer $m$ and thus $(f^m)^l = f^{ml} \in I$ for some positive integer $l$. Thus $f \in \sqrt{I}$ and hence the claim follows. □

An ideal $I$ is called radical if $\sqrt{I} = I$. For instance, the above assertion shows that $\sqrt{I}$ is radical.

Example 1.7 (Singular). The computation of the radical of an ideal requires the loading of a library.

> LIB "primdec.lib"; // load library for radical

> ring r = 0, (x,y,z), dp;

> ideal i = xy, x2, y3-y5;

> radical(i);

_[1]=x
_[2]=y3-y

♦

An ideal $I$ of $K[X_1, \ldots, X_n]$ is prime if $I \ne K[X_1, \ldots, X_n]$ and for every pair of elements $f, g \in K[X_1, \ldots, X_n]$, $fg \in I$ implies $f \in I$ or $g \in I$.

An ideal $I$ of $K[X_1, \ldots, X_n]$ is maximal if $I \ne K[X_1, \ldots, X_n]$ and $I$ is maximal with respect to set inclusion among the ideals different from $K[X_1, \ldots, X_n]$.

Lemma 1.8. Each maximal ideal $\mathfrak{m}$ of $K[X_1, \ldots, X_n]$ is prime.

Proof. Let $f, g \in K[X_1, \ldots, X_n]$ with $fg \in \mathfrak{m}$. Suppose $f \notin \mathfrak{m}$. Then $\mathfrak{m} + \langle f \rangle = K[X_1, \ldots, X_n]$, since $\mathfrak{m}$ is maximal. Then $m + af = 1$ for some $m \in \mathfrak{m}$ and $a \in K[X_1, \ldots, X_n]$. Thus $mg + afg = g$ and hence $g \in \mathfrak{m}$. □

Example 1.9. For any field $K$, every maximal ideal $\mathfrak{m}$ of $K[X_1, \ldots, X_n]$ is given as follows: take a finite algebraic extension field $L$ of $K$ and a point $(a_1, \ldots, a_n) \in L^n$, consider the ideal $\langle X_1 - a_1, \ldots, X_n - a_n \rangle$ of $L[X_1, \ldots, X_n]$, and put

\[ \mathfrak{m} = \langle X_1 - a_1, \ldots, X_n - a_n \rangle \cap K[X_1, \ldots, X_n]. \]

In particular, if $K$ is algebraically closed, every maximal ideal of $K[X_1, \ldots, X_n]$ has the form

\[ \mathfrak{m} = \langle X_1 - a_1, \ldots, X_n - a_n \rangle \]

for some $a_1, \ldots, a_n \in K$. ♦

Example 1.10. In the polynomial ring $K[X_1, \ldots, X_n]$, every ideal $\langle S \rangle$ generated by a subset $S$ of the set of variables $\{X_1, \ldots, X_n\}$ is prime; in particular, if $S = \emptyset$, then $\langle S \rangle = \{0\}$. The only maximal ideal among these prime ideals is $\langle X_1, \ldots, X_n \rangle$. ♦

1.3 Monomial Orders

We study several ways to order the terms of a polynomial. For this, we first consider orders on the set $\mathbb{N}_0^n$ of $n$-tuples of natural numbers. The set $\mathbb{N}_0^n$ forms a monoid with the component-wise addition

\[ (\alpha_1, \ldots, \alpha_n) + (\beta_1, \ldots, \beta_n) = (\alpha_1 + \beta_1, \ldots, \alpha_n + \beta_n), \]

and the zero vector $0 = (0, \ldots, 0)$ is the identity element.

A monomial ordering on $\mathbb{N}_0^n$ is a total ordering $>$ on $\mathbb{N}_0^n$ satisfying the following properties:

1. If $\alpha, \beta \in \mathbb{N}_0^n$ with $\alpha > \beta$ and $\gamma \in \mathbb{N}_0^n$, then $\alpha + \gamma > \beta + \gamma$.

2. If $\alpha \in \mathbb{N}_0^n$ and $\alpha \ne 0$, then $\alpha > 0$.

The first condition shows that the ordering is compatible with the addition in $\mathbb{N}_0^n$ and the second condition means that 0 is the smallest element of the ordering. Both conditions imply that if $\alpha, \beta \in \mathbb{N}_0^n$ and $\beta \ne 0$, then $\alpha + \beta > \alpha$.

For the monoid $\mathbb{N}_0$, there is only one monomial ordering,

\[ 0 < 1 < 2 < 3 < \ldots, \]

but in monoids $\mathbb{N}_0^n$ with $n \ge 2$ there are infinitely many monomial orderings.

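Example 1.11. The following orderings depend on the ordering of the variables $X_1, \ldots, X_n$.

• Lexicographical ordering (lp):

\[ \alpha >_{lp} \beta \;:\Longleftrightarrow\; \exists\, 1 \le i \le n:\ \alpha_1 = \beta_1, \ldots, \alpha_{i-1} = \beta_{i-1},\ \alpha_i > \beta_i. \]

• Degree lexicographical ordering (Dp):

\[ \alpha >_{Dp} \beta \;:\Longleftrightarrow\; |\alpha| > |\beta| \ \vee\ (|\alpha| = |\beta| \wedge \alpha >_{lp} \beta). \]

• Degree reverse lexicographical ordering (dp):

\[ \alpha >_{dp} \beta \;:\Longleftrightarrow\; |\alpha| > |\beta| \ \vee\ (|\alpha| = |\beta| \wedge \exists\, 1 \le i \le n:\ \alpha_n = \beta_n, \ldots, \alpha_{i+1} = \beta_{i+1},\ \alpha_i < \beta_i). \]

In all three orderings, $(1, 0, \ldots, 0) > \ldots > (0, \ldots, 0, 1) > 0$. For instance, $(3,0,0) >_{lp} (2,2,0)$ but $(2,2,0) >_{Dp} (3,0,0)$ and $(2,2,0) >_{dp} (3,0,0)$. Moreover, $(2,1,2) >_{Dp} (1,3,1)$ but $(1,3,1) >_{dp} (2,1,2)$.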

♦
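The three orderings of Example 1.11 can be checked mechanically. The following is a minimal sketch that transcribes the definitions literally into Python comparison functions; the names `gt_lp`, `gt_Dp`, `gt_dp`, and `is_monomial_ordering_on_grid` are hypothetical, and the brute-force check of the two axioms covers only a small finite grid, so it is a sanity check and not a proof.

```python
from itertools import product

def gt_lp(a, b):
    """a >_lp b: the first differing entry from the left is larger in a."""
    for x, y in zip(a, b):
        if x != y:
            return x > y
    return False

def gt_Dp(a, b):
    """a >_Dp b: larger total degree, ties broken by lp."""
    return sum(a) > sum(b) or (sum(a) == sum(b) and gt_lp(a, b))

def gt_dp(a, b):
    """a >_dp b: larger total degree; on ties the first differing
    entry from the right must be SMALLER in a."""
    if sum(a) != sum(b):
        return sum(a) > sum(b)
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            return x < y
    return False

def is_monomial_ordering_on_grid(gt, n=3, bound=3):
    """Check both axioms of a monomial ordering on {0,...,bound-1}^n."""
    grid = list(product(range(bound), repeat=n))
    zero = (0,) * n
    add = lambda a, c: tuple(x + y for x, y in zip(a, c))
    positive = all(gt(a, zero) for a in grid if a != zero)        # axiom 2
    compatible = all(gt(add(a, c), add(b, c))                     # axiom 1
                     for a in grid for b in grid if gt(a, b)
                     for c in grid)
    return positive and compatible

# The comparisons stated in Example 1.11
print(gt_lp((3, 0, 0), (2, 2, 0)), gt_Dp((2, 2, 0), (3, 0, 0)),
      gt_dp((2, 2, 0), (3, 0, 0)))
print(gt_Dp((2, 1, 2), (1, 3, 1)), gt_dp((1, 3, 1), (2, 1, 2)))
print(all(is_monomial_ordering_on_grid(gt) for gt in (gt_lp, gt_Dp, gt_dp)))
```

All printed values are True. Writing the orderings as explicit comparisons keeps the code close to the formulas, at the cost of being slower than a sort key.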

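In the following, we require the natural component-wise ordering on $\mathbb{N}_0^n$ given by

\[ (\alpha_1, \ldots, \alpha_n) \le_{nat} (\beta_1, \ldots, \beta_n) \;:\Longleftrightarrow\; \alpha_1 \le \beta_1, \ldots, \alpha_n \le \beta_n. \]

For instance, $(1,1,2) \le_{nat} (2,1,2) \le_{nat} (2,1,4)$.

Theorem 1.12. (Dickson's Lemma) Let $A$ be a subset of $\mathbb{N}_0^n$. There is a finite subset $B$ of $A$ such that for each $\alpha \in A$ there is a $\beta \in B$ with $\beta \le_{nat} \alpha$.

The set $B$ is called a Dickson basis of $A$ (Fig. 1.1).

Proof. For $n = 1$, take the smallest element of $A \subseteq \mathbb{N}_0$ as the only element of $B$ (and $B = \emptyset$ if $A$ is empty).

For $n \ge 1$, $A \subseteq \mathbb{N}_0^{n+1}$, and $i \in \mathbb{N}_0$ define

\[ A_i = \{ \alpha' \in \mathbb{N}_0^n \mid (\alpha', i) \in A \} \subseteq \mathbb{N}_0^n. \]

By induction, $A_i$ has a Dickson basis $B_i$. Furthermore, by induction, $\bigcup_{i \in \mathbb{N}_0} B_i$ has a Dickson basis $B'$. Since $B'$ is finite, there is an index $j$ such that $B' \subseteq B_0 \cup \ldots \cup B_j$.

Claim that a Dickson basis of $A$ is given by

\[ B = \{ (\beta', i) \in \mathbb{N}_0^{n+1} \mid 0 \le i \le j,\ \beta' \in B_i \}. \]

Indeed, let $(\alpha', k) \in A$. Then $\alpha' \in A_k$. Since $B_k$ is a Dickson basis of $A_k$, there is an element $\beta' \in B_k$ such that $\beta' \le_{nat} \alpha'$. If $k \le j$, then $(\beta', k) \in B$ and $(\beta', k) \le_{nat} (\alpha', k)$. Otherwise, there are $\gamma' \in B'$ and $i \le j$ such that $\gamma' \le_{nat} \beta'$ and $\gamma' \in B_i$. Then $(\gamma', i) \in B$ and, since $i \le j < k$, $(\gamma', i) \le_{nat} (\alpha', k)$. □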

Fig. 1.1. A subset $A$ of $\mathbb{N}_0^2$ and a Dickson set of $A$ (encircled points). [Plot omitted.]
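For a finite set $A$, a Dickson basis can be read off directly: the minimal elements of $A$ with respect to $\le_{nat}$ form one. The following minimal sketch illustrates this special case only (Dickson's lemma itself also covers infinite sets); the function names and the sample set are assumptions made for the illustration.

```python
def leq_nat(b, a):
    """Component-wise ordering: b <=_nat a."""
    return all(x <= y for x, y in zip(b, a))

def dickson_basis(A):
    """Minimal elements of a FINITE set A of exponent vectors w.r.t. <=_nat."""
    A = set(A)
    return {b for b in A
            if not any(c != b and leq_nat(c, b) for c in A)}

# A hypothetical finite subset of N_0^2
A = {(0, 4), (1, 2), (1, 3), (2, 2), (3, 1), (4, 1), (5, 0), (5, 3)}
B = dickson_basis(A)
print(sorted(B))  # [(0, 4), (1, 2), (3, 1), (5, 0)]

# Defining property: every element of A dominates some element of B
print(all(any(leq_nat(b, a) for b in B) for a in A))  # True
```

The quadratic scan over all pairs is adequate for small examples; it is not meant as an efficient algorithm.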

Corollary 1.13. Each monomial ordering on $\mathbb{N}_0^n$ is a well-ordering.

Proof. Let $>$ be a monomial ordering on $\mathbb{N}_0^n$ and $A$ be a nonempty subset of $\mathbb{N}_0^n$. By Dickson's lemma, the set $A$ has a Dickson basis $B$. Let $\alpha \in A$. Then there is an element $\beta \in B$ with $\beta \le_{nat} \alpha$. Thus there is an element $\gamma \in \mathbb{N}_0^n$ with $\alpha = \beta + \gamma$. Since $0 \le \gamma$, it follows that $\beta \le \beta + \gamma = \alpha$. Hence, the smallest element of the Dickson basis $B$ with respect to the monomial ordering is the smallest element of $A$. Therefore, the monomial ordering is a well-ordering. □

Corollary 1.14. For any monomial ordering $>$ on $\mathbb{N}_0^n$, each decreasing chain of elements of $\mathbb{N}_0^n$

\[ \alpha^{(1)} \ge \alpha^{(2)} \ge \ldots \ge \alpha^{(k)} \ge \ldots \]

becomes stationary (i.e., there is some $j_0$ such that $\alpha^{(j)} = \alpha^{(j_0)}$ for all $j \ge j_0$).

Proof. Put $A = \{ \alpha^{(i)} \mid i \in \mathbb{N} \}$. By Corollary 1.13, $A$ has a smallest element and hence the sequence must become stationary. □

A monomial ordering $>$ on $\mathbb{N}_0^n$ carries forward to a monomial ordering on the set of monomials of the polynomial ring $K[X_1, \ldots, X_n]$. For this, define for all $\alpha, \beta \in \mathbb{N}_0^n$,

\[ X^\alpha > X^\beta \;:\Longleftrightarrow\; \alpha > \beta. \]

Since any monomial ordering is total, the terms that are involved in a polynomial of $K[X_1, \ldots, X_n]$ can be uniquely written in increasing or decreasing order. A polynomial $f$ in $K[X_1, \ldots, X_n]$ whose terms are written in decreasing order is in canonical form, i.e.,

\[ f = c_0 X^{\alpha^{(0)}} + \ldots + c_m X^{\alpha^{(m)}}, \quad c_i \in K^*, \]

where $\alpha^{(0)} > \ldots > \alpha^{(m)}$. Note that polynomials stored in canonical form can be efficiently tested for equality.

For polynomials in a polynomial ring $K[X]$ with one unknown, there is only one monomial ordering,

\[ 1 < X < X^2 < X^3 < \ldots, \]

but in polynomial rings with several unknowns there are infinitely many monomial orderings.

Example 1.15 (Singular). Polynomials are stored and printed in canonical form.

> ring r1 = 0, (x,y,z), lp;

> poly f = x3yz+y5; f;

x3yz+y5

> ring r2 = 0, (x,y,z), Dp;

> poly f = imap(r1,f); f;

x3yz+y5

> ring r3 = 0, (x,y,z), dp;

> poly f = imap(r1,f); f;

y5+x3yz

♦

The leading data of a polynomial $f$ in $K[X_1, \ldots, X_n]$, written in canonical form as above, are defined as follows:

• leading term $\mathrm{lt}_>(f) = c_0 X^{\alpha^{(0)}}$,

• leading coefficient $\mathrm{lc}_>(f) = c_0$, and

• leading monomial $\mathrm{lm}_>(f) = X^{\alpha^{(0)}}$.

A polynomial $f$ is called monic if its leading coefficient is equal to 1.

Example 1.16. Consider the polynomial $f = 4XY^2Z + 4Z^2 - 5X^3 + 7X^2Z^2$ in $\mathbb{Q}[X, Y, Z]$, where $X$ corresponds to $X^{(1,0,0)}$, $Y$ to $X^{(0,1,0)}$, and $Z$ to $X^{(0,0,1)}$. Thus

\[ f = 4X^{(1,2,1)} + 4X^{(0,0,2)} - 5X^{(3,0,0)} + 7X^{(2,0,2)}. \]

In the lp ordering, $(3,0,0) > (2,0,2) > (1,2,1) > (0,0,2)$ and the canonical form is

\[ f = -5X^3 + 7X^2Z^2 + 4XY^2Z + 4Z^2. \]

In the Dp ordering, $(2,0,2) > (1,2,1) > (3,0,0) > (0,0,2)$ and the canonical form is

\[ f = 7X^2Z^2 + 4XY^2Z - 5X^3 + 4Z^2. \]

In the dp ordering, $(1,2,1) > (2,0,2) > (3,0,0) > (0,0,2)$ and the canonical form is

\[ f = 4XY^2Z + 7X^2Z^2 - 5X^3 + 4Z^2. \]

♦

Example 1.17 (Singular). The leading data of a polynomial can be obtained as follows.

> ring r = 0, (x,y,z), lp;

> poly f = (xy-z)*(x2-yz);

> f;

x3y-x2z-xy2z+yz2

> leadmonom(f);

x3y

> leadexp(f);

3,1,0

> leadcoef(f);

1

> lead(f);

x3y

> f-lead(f); // tail

-x2z-xy2z+yz2

♦
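Canonical forms and leading data can also be obtained by sorting exponent vectors with a suitable key. The following minimal sketch does this for the polynomial of Example 1.16 under the orderings lp, Dp, and dp; the dictionary representation and the names `ORDER_KEYS`, `canonical_form`, and `leading_data` are assumptions of the sketch, not Singular functionality.

```python
# Sort keys: a larger key means a larger exponent vector in the ordering.
ORDER_KEYS = {
    "lp": lambda a: tuple(a),
    "Dp": lambda a: (sum(a),) + tuple(a),
    # dp: total degree first; ties are broken from the right, where the
    # SMALLER exponent wins, hence the negated and reversed vector
    "dp": lambda a: (sum(a),) + tuple(-e for e in reversed(a)),
}

def canonical_form(f, order):
    """Terms (exponent, coefficient) of f in decreasing order."""
    key = ORDER_KEYS[order]
    return sorted(f.items(), key=lambda term: key(term[0]), reverse=True)

def leading_data(f, order):
    """Leading exponent and leading coefficient of a nonzero polynomial."""
    return canonical_form(f, order)[0]

# f = 4XY^2Z + 4Z^2 - 5X^3 + 7X^2Z^2 as {exponent tuple: coefficient}
f = {(1, 2, 1): 4, (0, 0, 2): 4, (3, 0, 0): -5, (2, 0, 2): 7}

for order in ("lp", "Dp", "dp"):
    print(order, leading_data(f, order))
# lp ((3, 0, 0), -5)
# Dp ((2, 0, 2), 7)
# dp ((1, 2, 1), 4)
```

The leading term is the first entry of the sorted list, so the leading monomial and the leading coefficient are its two components.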

1.4 Division Algorithm

The ordinary division algorithm for polynomials in one variable carries forward to the multivariate case by making use of a monomial ordering.

Theorem 1.18. Let $>$ be a monomial ordering on $\mathbb{N}_0^n$. Let $f$ be a nonzero polynomial in $K[X_1, \ldots, X_n]$ and let $F = (f_1, \ldots, f_m)$ be a sequence of nonzero polynomials in $K[X_1, \ldots, X_n]$. There are polynomials $h_1, \ldots, h_m$ and $r$ in $K[X_1, \ldots, X_n]$ such that

\[ f = h_1 f_1 + \cdots + h_m f_m + r \tag{1.11} \]

and either $r = 0$ or none of the terms in $r$ is divisible by any of $\mathrm{lt}_>(f_1), \ldots, \mathrm{lt}_>(f_m)$. Moreover, if $h_i f_i \ne 0$, then $\mathrm{lt}_>(h_i f_i) \le \mathrm{lt}_>(f)$, $1 \le i \le m$.

The proof is constructive and mimics the division algorithm (Alg. 1.1).

Proof. First, put $h_1 = \ldots = h_m = 0$, $r = 0$, and $s = f$. Then we have

\[ f = h_1 f_1 + \cdots + h_m f_m + (r + s). \tag{1.12} \]

This equation serves as an invariant throughout the algorithm that proceeds in iterative steps. If $s = 0$, the algorithm terminates. Otherwise, there are two cases:

• Reduction step: If $\mathrm{lt}_>(s)$ is divisible by some $\mathrm{lt}_>(f_i)$, $1 \le i \le m$, then take the smallest index $i$ with this property and put

\[ s = s - \frac{\mathrm{lt}_>(s)}{\mathrm{lt}_>(f_i)} f_i \quad\text{and}\quad h_i = h_i + \frac{\mathrm{lt}_>(s)}{\mathrm{lt}_>(f_i)}, \tag{1.13} \]

where both assignments use the leading term of $s$ before the update.

• Shifting step: If $\mathrm{lt}_>(s)$ is not divisible by any of the $\mathrm{lt}_>(f_i)$, $1 \le i \le m$, then put

\[ r = r + \mathrm{lt}_>(s) \quad\text{and}\quad s = s - \mathrm{lt}_>(s). \tag{1.14} \]

In both cases, the equation (1.12) still holds. Moreover, if $r \ne 0$, then the assertion that no term of $r$ is divisible by $\mathrm{lt}_>(f_i)$, $1 \le i \le m$, inductively holds. The leading term of the polynomial $s$ is strictly decreasing with respect to the monomial ordering after each of the assignments (1.13) and (1.14). Thus the sequence formed by the leading terms of $s$ in successive steps is strictly decreasing. By Corollary 1.13, the monomial ordering is a well-ordering and hence this sequence must be finite. Therefore, the division algorithm terminates with $s = 0$.

In view of the inequalities, the leading term of $s$ decreases in each step and is either added to some product $h_i f_i$ (reduction step) or to the remainder $r$ (shifting step). Moreover, in the reduction step, the leading term of $s$ added to the product $h_i f_i$ is the largest term added. Since $\mathrm{lt}_>(s) = \mathrm{lt}_>(f)$ at the start of the computation, the inequalities follow. □

The remainder on the division of $f$ by $F$ is often denoted by $r = f^F$.

Algorithm 1.1 Division algorithm.

Require: nonzero polynomials f and f1, ..., fm in K[X1, ..., Xn]
Ensure: polynomials h1, ..., hm and r in K[X1, ..., Xn] as in Thm. 1.18

  h1 ← 0, ..., hm ← 0
  r ← 0
  s ← f
  while s ≠ 0 do
    i ← 1
    division_occurred ← false
    while i ≤ m and division_occurred = false do
      if lt(fi) divides lt(s) then
        hi ← hi + lt(s)/lt(fi)
        s ← s − (lt(s)/lt(fi)) ∗ fi
        division_occurred ← true
      else
        i ← i + 1
      end if
    end while
    if division_occurred = false then
      r ← r + lt(s)
      s ← s − lt(s)
    end if
  end while
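The division algorithm can be turned into a short program. The following is a minimal sketch, assuming polynomials are dictionaries from exponent tuples to coefficients and that the lp ordering is used by default (tuples compare lexicographically in Python); the names `lead`, `add_term`, and `divide` are hypothetical. The data are the polynomials $f = X^2Y + XY^2 + Y^2$, $f_1 = Y^2 - 1$, and $f_2 = X - Y$ of Example 1.19.

```python
from fractions import Fraction

def lp_key(alpha):
    return tuple(alpha)

def lead(f, key):
    """Leading (exponent, coefficient) of a nonzero polynomial."""
    alpha = max(f, key=key)
    return alpha, f[alpha]

def add_term(f, alpha, c):
    """Add the term c*X^alpha to f in place, dropping zero coefficients."""
    c = f.get(alpha, 0) + c
    if c == 0:
        f.pop(alpha, None)
    else:
        f[alpha] = c

def divide(f, F, key=lp_key):
    """Return (h, r) with f = sum(h[i]*F[i]) + r as in Thm. 1.18."""
    h = [dict() for _ in F]
    r, s = {}, dict(f)
    while s:
        alpha, c = lead(s, key)
        for i, fi in enumerate(F):          # smallest index i first
            beta, d = lead(fi, key)
            if all(a >= b for a, b in zip(alpha, beta)):  # lt(fi) | lt(s)
                gamma = tuple(a - b for a, b in zip(alpha, beta))
                q = Fraction(c) / d         # lt(s)/lt(fi) = q * X^gamma
                add_term(h[i], gamma, q)    # reduction step
                for eps, e in fi.items():
                    shifted = tuple(g + x for g, x in zip(gamma, eps))
                    add_term(s, shifted, -q * e)
                break
        else:                               # no lt(fi) divides lt(s)
            add_term(r, alpha, c)           # shifting step
            add_term(s, alpha, -c)
    return h, r

# Exponent tuples are (deg in X, deg in Y)
f = {(2, 1): 1, (1, 2): 1, (0, 2): 1}
f1 = {(0, 2): 1, (0, 0): -1}
f2 = {(1, 0): 1, (0, 1): -1}
h, r = divide(f, [f1, f2])
print(h[0], h[1], r)
```

The result encodes $h_1 = 2X + 1$, $h_2 = XY + 2$, and $r = 2Y + 1$. The quotient is computed before $s$ is changed, so both updates of the reduction step use the same leading term.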

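Example 1.19. Consider the polynomials $f = X^2Y + XY^2 + Y^2$, $f_1 = Y^2 - 1$, and $f_2 = X - Y$ in $\mathbb{Q}[X, Y]$ using the lp ordering with $X > Y$. Initially, we have $h_1 = h_2 = 0$, $r = 0$, and $s = f$. First, $\mathrm{lt}_>(s) = X^2Y$ is not divisible by $\mathrm{lt}_>(f_1) = Y^2$ but by $\mathrm{lt}_>(f_2) = X$ and so

\[ s = s - \frac{X^2Y}{X}(X - Y) = 2XY^2 + Y^2 \quad\text{and}\quad h_2 = h_2 + \frac{X^2Y}{X} = XY. \]

Second, $\mathrm{lt}_>(s) = 2XY^2$ is divisible by $\mathrm{lt}_>(f_1) = Y^2$. Thus

\[ s = s - \frac{2XY^2}{Y^2}(Y^2 - 1) = 2X + Y^2 \quad\text{and}\quad h_1 = h_1 + \frac{2XY^2}{Y^2} = 2X. \]

Third, $\mathrm{lt}_>(s) = 2X$ is divisible by $\mathrm{lt}_>(f_2) = X$. So

\[ s = s - \frac{2X}{X}(X - Y) = Y^2 + 2Y \quad\text{and}\quad h_2 = h_2 + \frac{2X}{X} = XY + 2. \]

Fourth, $\mathrm{lt}_>(s) = Y^2$ is divisible by $\mathrm{lt}_>(f_1) = Y^2$. Thus

\[ s = s - \frac{Y^2}{Y^2}(Y^2 - 1) = 2Y + 1 \quad\text{and}\quad h_1 = 2X + 1. \]

Fifth, $\mathrm{lt}_>(s) = 2Y$ is not divisible by $\mathrm{lt}_>(f_1) = Y^2$ or $\mathrm{lt}_>(f_2) = X$. It follows that

\[ r = 2Y \quad\text{and}\quad s = 1. \]

Sixth, $\mathrm{lt}_>(s) = 1$ is not divisible by $\mathrm{lt}_>(f_1) = Y^2$ or $\mathrm{lt}_>(f_2) = X$. Consequently,

\[ r = 2Y + 1 \quad\text{and}\quad s = 0. \]

Therefore,

\[ f = (2X + 1) \cdot (Y^2 - 1) + (XY + 2) \cdot (X - Y) + (2Y + 1) \quad\text{and}\quad f^F = 2Y + 1. \]

♦

Example 1.20 (Singular). The expression of a polynomial as a linear combination with remainder according to the division theorem is provided by the command division, while the command reduce only yields the remainder upon division.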

> ring r = 0, (x,y), lp;

> ideal i = y2-1, x-y;

> poly f = x2y+xy2+y2;

> reduce(f,std(i)); // reduction by standard basis of i

2y+1

> division(f,i); // division with remainder

[1]:
   _[1,1]=2x+1
   _[2,1]=xy+2
[2]:
   _[1]=2y+1
[3]:
   _[1,1]=1

♦

1.5 Groebner Bases

Groebner bases are specific generating sets of polynomial ideals.

Example 1.21. Take the polynomials $f = XY^2 - X$, $f_1 = Y^2 - 1$, and $f_2 = XY + 1$ in $\mathbb{Q}[X, Y]$ using the lp ordering with $X > Y$. First, the division of $f$ by $F = (f_1, f_2)$ yields

\[ f = X \cdot (Y^2 - 1) + 0 \cdot (XY + 1). \]

Second, the division of $f$ by $F' = (f_2, f_1)$ gives

\[ f = Y \cdot (XY + 1) + 0 \cdot (Y^2 - 1) + (-X - Y). \]

Thus the division depends on the ordering of the polynomials in the sequence $F$. Moreover, the first representation shows that the polynomial $f$ lies in the ideal $I = \langle f_1, f_2 \rangle$, while this cannot be deduced from the second representation. ♦

Let $>$ be a monomial ordering on $K[X_1, \ldots, X_n]$ and let $I$ be an ideal of $K[X_1, \ldots, X_n]$. A Groebner basis of $I$ with respect to $>$ is a finite set of polynomials $G = \{g_1, \ldots, g_s\}$ in $I$ such that for each nonzero polynomial $f \in I$, $\mathrm{lt}_>(f)$ is divisible by $\mathrm{lt}_>(g_i)$ for some $1 \le i \le s$. Groebner bases were invented by Bruno Buchberger in the 1960s and named after his advisor Wolfgang Groebner (1899-1980).

Example 1.22 (Singular). Consider the ideal $I = \langle Y^2 - 1, XY + 1 \rangle$ in $\mathbb{Q}[X, Y]$ using the lp ordering with $X > Y$. A Groebner basis of the ideal $I$ can be computed by using the command groebner or std. The latter command is more general and can be applied to calculate standard bases of polynomial ideals.

> ring r = 0, (x,y), lp;

> ideal i = y2-1, xy+1;

> ideal j = std(i);

> j;

j[1]=y2-1
j[2]=x+y

The computed Groebner basis of $I$ is $\{Y^2 - 1, X + Y\}$. ♦

Theorem 1.23. Each ideal $I$ of $K[X_1, \ldots, X_n]$ has a Groebner basis with respect to any monomial ordering.

Proof. Let $>$ be a monomial ordering on $K[X_1, \ldots, X_n]$. Consider the set

\[ A = \{ \alpha \in \mathbb{N}_0^n \mid X^\alpha = \mathrm{lm}_>(f) \text{ for some nonzero } f \in I \} \]

of exponents of all leading monomials of the polynomials in the ideal $I$. By Dickson's lemma, the set $A$ has a Dickson basis $B = \{\beta_1, \ldots, \beta_s\}$, where $X^{\beta_i} = \mathrm{lm}_>(g_i)$ for some $g_i \in I$, $1 \le i \le s$. Let $f$ be a nonzero polynomial of $I$ with $\mathrm{lm}_>(f) = X^\alpha$. Then $\alpha = \beta_i + \gamma$ for some $1 \le i \le s$ and $\gamma \in \mathbb{N}_0^n$. Thus $X^\alpha = X^{\beta_i} X^\gamma$ and hence the leading term of $f$ is divisible by the leading term of $g_i$. It follows that $\{g_1, \ldots, g_s\}$ is a Groebner basis of $I$. □

Proposition 1.24 (Ideal Membership Test). Let $>$ be a monomial ordering on $K[X_1, \ldots, X_n]$ and let $I = \langle g_1, \ldots, g_s \rangle$ be an ideal of $K[X_1, \ldots, X_n]$. If $G = \{g_1, \ldots, g_s\}$ is a Groebner basis of $I$, then for each polynomial $f$ in $K[X_1, \ldots, X_n]$, we have $f \in I$ if and only if $f^G = 0$.

Proof. Let $f \in K[X_1, \ldots, X_n]$, whose division by $G$ yields $f = h_1 g_1 + \ldots + h_s g_s + f^G$. Let $f^G = 0$. Then $f \in I$ by definition of $I$.

Conversely, let $f \in I$. Then $f^G = f - (h_1 g_1 + \ldots + h_s g_s)$ belongs to $I$. Assume that $f^G \ne 0$. Then $\mathrm{lt}_>(f^G)$ is divisible by some $\mathrm{lt}_>(g_i)$, since $G$ is a Groebner basis of $I$. But this contradicts the division algorithm, since none of the terms in the remainder $f^G$ is divisible by any of the terms $\mathrm{lt}_>(g_i)$. □

Corollary 1.25. Let $>$ be a monomial ordering on $K[X_1, \ldots, X_n]$ and let $I$ be an ideal of $K[X_1, \ldots, X_n]$. If $G = \{g_1, \ldots, g_s\}$ is a Groebner basis of $I$ with respect to $>$, then $I = \langle g_1, \ldots, g_s \rangle$.

Proof. By definition, $g_1, \ldots, g_s \in I$ and thus $\langle g_1, \ldots, g_s \rangle \subseteq I$. Conversely, let $f \in I$. The division of $f$ by $G$ yields $f = h_1 g_1 + \ldots + h_s g_s + f^G$. Thus, as in the proof of Prop. 1.24, $f^G = 0$ and hence $f \in \langle g_1, \ldots, g_s \rangle$. □

Proposition 1.26. Let $G = \{g_1, \ldots, g_s\}$ be a Groebner basis in $K[X_1, \ldots, X_n]$ with respect to a monomial ordering $>$. For each polynomial $f$ in $K[X_1, \ldots, X_n]$, the remainder $f^G$ is uniquely determined and independent of the order of the elements in $G$.

Proof. Let $f$ be a polynomial in $K[X_1, \ldots, X_n]$ and let $I = \langle g_1, \ldots, g_s \rangle$. First, assume that there are two expressions $f = h_1 g_1 + \ldots + h_s g_s + r$ and $f = h'_1 g_1 + \ldots + h'_s g_s + r'$ as given by the division theorem. Then $r' - r = (h_1 - h'_1) g_1 + \ldots + (h_s - h'_s) g_s$ lies in $I$. Suppose that $r' - r \ne 0$. Since $G$ is a Groebner basis of $I$, the leading term of $r' - r$ is divisible by the leading term of some $g_i$, $1 \le i \le s$. But this contradicts the fact that $r$ and $r'$ are remainders and so none of their terms is divisible by any of the leading terms $\mathrm{lt}_>(g_i)$, $1 \le i \le s$.

Second, let $G'$ be a permutation of the Groebner basis $G$. Then the division algorithm yields $f = h'_1 g_1 + \ldots + h'_s g_s + f^{G'}$. But the remainder is uniquely determined and therefore $f^G = f^{G'}$. □

The remainder on division of a polynomial $f$ by a Groebner basis of an ideal $I$ is a uniquely determined normal form of $f$ modulo $I$ depending only on the monomial ordering and not on how the division is performed.

Theorem 1.27. (Hilbert Basis Theorem) Each ideal $I$ of $K[X_1, \ldots, X_n]$ is finitely generated.

The proof follows directly from Thm. 1.23 and Cor. 1.25.

A ring $R$ is Noetherian if each ideal of $R$ is finitely generated.

Theorem 1.28. The following conditions on a ring $R$ are equivalent:

1. Each ideal of $R$ is finitely generated (that is, $R$ is Noetherian).

2. Each ascending chain of ideals $I_1 \subseteq I_2 \subseteq \cdots$ in $R$ becomes stationary (that is, there is an index $j_0$ such that $I_j = I_{j_0}$ for all $j \ge j_0$).

3. Each nonempty set of ideals in $R$ contains a maximal element (with respect to inclusion).

Proof. Suppose each ideal of $R$ is finitely generated. Assume that $I_1 \subseteq I_2 \subseteq \cdots$ is an ascending chain of ideals in $R$. Then $I = \bigcup_j I_j$ is an ideal in $R$ and by hypothesis has a finite generating set $G$. Since $G$ is finite, $G \subseteq I_{j_0}$ for some index $j_0$, and then $I_j = I_{j_0}$ for all $j \ge j_0$.

Suppose that each ascending chain of ideals in $R$ becomes stationary. Assume that $S$ is a nonempty set of ideals in $R$. If $I_1 \in S$ is not maximal in $S$, then there exists an ideal $I_2$ in $S$ that properly contains $I_1$. Continuing like this gives an ascending chain of ideals in $S$ that will become stationary. Then $I_j = I_{j_0}$ for all $j \ge j_0$ and $I_{j_0}$ is maximal in $S$.

Suppose that each nonempty set of ideals in $R$ contains a maximal element. Let $I$ be an ideal of $R$, and let $S$ be the set of all finitely generated ideals $J \subseteq I$ of $R$. Then $S$ is nonempty, since it contains the zero ideal, and by hypothesis contains a maximal element $J_0 = \langle f_1, \ldots, f_s \rangle$. Assume that $I \ne J_0$. Then there is an element $f \in I \setminus J_0$ and so $\langle f, f_1, \ldots, f_s \rangle$ will be a finitely generated ideal in $I$ that properly contains $J_0$. This contradicts the maximality of $J_0$. Hence, $I = J_0$ is finitely generated. □

By the Hilbert basis theorem and the above result, we obtain the following.

Corollary 1.29. The polynomial ring $K[X_1, \ldots, X_n]$ is Noetherian.

1.6 Computation of Groebner Bases

The basic algorithm for the computation of a Groebner basis of an ideal in K[X1, . . . , Xn] is due to Buchberger.

Let $>$ be a monomial ordering on $K[X_1, \ldots, X_n]$ and let $f, g \in K[X_1, \ldots, X_n] \setminus \{0\}$ with $\mathrm{lm}_>(f) = X^\alpha$ and $\mathrm{lm}_>(g) = X^\beta$, respectively. The least common multiple of $\alpha$ and $\beta$ w.r.t. the natural ordering on $\mathbb{N}_0^n$ is

\[ \gamma = \mathrm{lcm}(\alpha, \beta) = (\max\{\alpha_1, \beta_1\}, \ldots, \max\{\alpha_n, \beta_n\}). \]

Then the least common multiple of $X^\alpha$ and $X^\beta$ w.r.t. the relation of divisibility is

\[ X^\gamma = \mathrm{lcm}(X^\alpha, X^\beta). \]

Define the S-polynomial of $f$ and $g$ as

\[ S(f, g) = \frac{X^\gamma}{\mathrm{lt}_>(f)} \cdot f - \frac{X^\gamma}{\mathrm{lt}_>(g)} \cdot g. \tag{1.15} \]

Note that $S(f, g)$ lies in the ideal generated by $f$ and $g$. Moreover, in $S(f, g)$ the leading terms of $f$ and $g$ cancel and thus $S(f, g)$ exhibits a new leading term.
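The definition (1.15) translates into a few lines of code. The following minimal sketch assumes the dictionary representation of polynomials (exponent tuples as keys) and the lp ordering, for which the built-in comparison of tuples can serve as the sort key; `lead` and `s_polynomial` are hypothetical names. As data it uses $f = 2Y^2 + Z^2$ and $g = 3X^2Y + YZ$ in $\mathbb{Q}[X, Y, Z]$.

```python
from fractions import Fraction

def lead(f, key=tuple):
    """Leading (exponent, coefficient); key=tuple gives the lp ordering."""
    alpha = max(f, key=key)
    return alpha, f[alpha]

def s_polynomial(f, g, key=tuple):
    """S(f, g) = X^gamma/lt(f) * f - X^gamma/lt(g) * g, gamma = lcm(alpha, beta)."""
    (alpha, c), (beta, d) = lead(f, key), lead(g, key)
    gamma = tuple(max(a, b) for a, b in zip(alpha, beta))
    s = {}
    for poly, exp, coef, sign in ((f, alpha, c, 1), (g, beta, d, -1)):
        shift = tuple(x - y for x, y in zip(gamma, exp))   # X^gamma / X^exp
        for eps, e in poly.items():
            mono = tuple(x + y for x, y in zip(shift, eps))
            s[mono] = s.get(mono, 0) + sign * Fraction(e) / coef
    return {m: v for m, v in s.items() if v != 0}          # leading terms cancel

# Exponent tuples are (deg in X, deg in Y, deg in Z)
f = {(0, 2, 0): 2, (0, 0, 2): 1}      # 2Y^2 + Z^2
g = {(2, 1, 0): 3, (0, 1, 1): 1}      # 3X^2Y + YZ
print(s_polynomial(f, g))
# {(2, 0, 2): Fraction(1, 2), (0, 2, 1): Fraction(-1, 3)}
```

The output encodes $\frac{1}{2} X^2Z^2 - \frac{1}{3} Y^2Z$. The monomial $X^2Y^2$, the least common multiple of the two leading monomials, no longer occurs.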

Example 1.30. Consider the polynomials $f = 2Y^2 + Z^2$ and $g = 3X^2Y + YZ$ in $\mathbb{Q}[X, Y, Z]$ with respect to the lp ordering with $X > Y > Z$. Then $\mathrm{lm}_>(f) = Y^2$, $\mathrm{lm}_>(g) = X^2Y$, and $\mathrm{lcm}(\mathrm{lm}_>(f), \mathrm{lm}_>(g)) = X^2Y^2$. Thus

\[ S(f, g) = \frac{X^2Y^2}{2Y^2} \cdot f - \frac{X^2Y^2}{3X^2Y} \cdot g = \frac{1}{2} X^2Z^2 - \frac{1}{3} Y^2Z. \]

♦

Theorem 1.31. (Buchberger's S-Criterion) Let $>$ be a monomial ordering on $K[X_1, \ldots, X_n]$. A set $G = \{g_1, \ldots, g_s\}$ of polynomials in $K[X_1, \ldots, X_n]$ is a Groebner basis of the ideal $I = \langle g_1, \ldots, g_s \rangle$ if and only if $S(g_i, g_j)^G = 0$ for all pairs $i \ne j$.

Proof. LetGbe a Groebner basis ofI. Since each S-polynomialS(gi, gj) belongs toI, it follows from Prop. 1.24 thatS(gi, gj)^G= 0.

Conversely, assume thatS(gi, gj)^G= 0 for all pairsi6=j. Letf ∈I. Write f =h1g1+. . .+hsgs,

whereh1, . . . , hs∈K[X1, . . . , Xn]. Letlt_>(f) =cX^α,lt_>(gi) =ciX^αⁱ, andlt_>(hi) =diX^βⁱ, 1≤i≤ s. Define δ = max>{αi+βi |1 ≤i ≤s}. The above equation shows that the leading term of f is a K-linear combination of the leading termslt_>(higi),higi6= 0, and thereforeα≤δ.

Ifδ=α, we can assume thatδ=α1+β1=. . .=αr+βr, where r≤sandhigi6= 0 for 1≤i≤r.

Then

cX^α= (c1d1+. . .+crdr)X^δ.

Thus lt_>(g1) = c1X^α¹ divides lt_>(f) = cX^α. If all nonzero polynomials f ofI have this property, thenGis a Groebner basis ofI.

Ifα < δ, the maximal leading terms on the right-hand side of the representation off must cancel.

By the above notation, we obtain

c1d1+. . .+crdr= 0. (1.16)

Write the polynomial f in the form

f =C+ (h1−lt_>(h1))g1+. . .+ (hr−lt_>(hr))gr+hr+1gr+1. . .+hsgs, whereC=lt_>(h1)g1+. . .+lt_>(hr)gr. By puttingki=X^βⁱgi/ci, 1≤i≤r, we obtain

C=c1d1k1+. . .+crdrkr (1.17)

=c1d1(k1−k2) + (c1d1+c2d2)(k2−k3) + (c1d1+c2d2+c3d3)(k3−k4) +. . . . . .+ (c1d1+. . .+cr−1dr−1)(kr−1−kr) + (c1d1+. . .+crdr)kr.

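Thus $C$ is a linear combination of $k_i - k_j$, $1 \le i < j \le r$. Define $X^{\alpha_{i,j}}$ as the least common multiple of $X^{\alpha_i}$ and $X^{\alpha_j}$. Then there exists $\xi \in \mathbb{N}_0^n$ so that $\xi + \alpha_{i,j} = \alpha_i + \beta_i = \alpha_j + \beta_j$, $1 \le i < j \le r$. We have

$$k_i - k_j = \frac{X^{\beta_i} g_i}{c_i} - \frac{X^{\beta_j} g_j}{c_j} = X^{\xi} \left( \frac{X^{\alpha_{i,j}} g_i}{c_i X^{\alpha_i}} - \frac{X^{\alpha_{i,j}} g_j}{c_j X^{\alpha_j}} \right) = X^{\xi} S(g_i, g_j)$$

and $\mathrm{lt}_>(k_i - k_j) < \delta$, $1 \le i < j \le r$. It follows from (1.16) and (1.17) that

$$C = c'_1 X^{\xi_1} S(g_1, g_2) + \ldots + c'_{r-1} X^{\xi_{r-1}} S(g_{r-1}, g_r),$$

where $c'_1, \ldots, c'_{r-1} \in K$ and $\xi_1, \ldots, \xi_{r-1} \in \mathbb{N}_0^n$. By hypothesis,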

$$S(g_i, g_j) = h^{ij}_1 g_1 + \ldots + h^{ij}_s g_s$$

for some polynomials $h^{ij}_1, \ldots, h^{ij}_s$ with $\mathrm{lt}_>(h^{ij}_l g_l) \le \mathrm{lt}_>(S(g_i, g_j))$, $1 \le i < j \le s$ and $1 \le l \le s$. It follows that the polynomial $C$ can be written as a linear combination of the polynomials $g_1, \ldots, g_s$. Thus by (1.17), the polynomial $f$ can be expressed as a linear combination of the polynomials $g_1, \ldots, g_s$,

$$f = h'_1 g_1 + \ldots + h'_s g_s,$$

where $\max_>\{\mathrm{lt}_>(h'_i g_i) \mid h'_i g_i \neq 0, 1 \le i \le s\} < \delta$. Since each monomial ordering is a well-ordering, we obtain by continuing in this way an expression

$$f = h''_1 g_1 + \ldots + h''_s g_s,$$

where the leading monomial $X^{\delta}$ on the right-hand side equals $\mathrm{lt}_>(f)$. Then the case $\alpha = \delta$ will establish the result. □
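The criterion turns the Groebner basis property into a finite test: compute the S-polynomial of every pair and check that its remainder modulo $G$ vanishes. The Python sketch below illustrates this under stated assumptions and is not the author's implementation. Polynomials are dictionaries from exponent tuples (a, b, c) for $X^aY^bZ^c$ to rational coefficients, the ordering is lex with $X > Y > Z$, and all function names are hypothetical. The test polynomials anticipate Example 1.33.

```python
from fractions import Fraction
from itertools import combinations

def lead(p):
    """Leading (exponent, coefficient) of p; tuple order is lex with X > Y > Z."""
    e = max(p)
    return e, p[e]

def add_scaled(p, q, c, shift):
    """Return p + c * X^shift * q, dropping terms whose coefficient becomes zero."""
    r = dict(p)
    for e, a in q.items():
        e2 = tuple(x + y for x, y in zip(e, shift))
        v = r.get(e2, 0) + c * a
        if v:
            r[e2] = v
        else:
            r.pop(e2, None)
    return r

def s_poly(f, g):
    """S(f, g) = (m / lt(f)) * f - (m / lt(g)) * g with m = lcm(lm(f), lm(g))."""
    (ef, cf), (eg, cg) = lead(f), lead(g)
    m = tuple(max(a, b) for a, b in zip(ef, eg))
    left = add_scaled({}, f, 1 / Fraction(cf), tuple(a - b for a, b in zip(m, ef)))
    return add_scaled(left, g, -1 / Fraction(cg), tuple(a - b for a, b in zip(m, eg)))

def remainder(f, G):
    """Remainder of f on division by the list G (multivariate division)."""
    p, r = dict(f), {}
    while p:
        e, c = lead(p)
        for g in G:
            eg, cg = lead(g)
            if all(a >= b for a, b in zip(e, eg)):   # lm(g) divides lm(p)
                p = add_scaled(p, g, -Fraction(c) / cg,
                               tuple(a - b for a, b in zip(e, eg)))
                break
        else:                                        # no divisor: move term to r
            r[e] = c
            del p[e]
    return r

def is_groebner(G):
    """S-criterion: every S-polynomial of a pair in G has remainder 0 modulo G."""
    return all(not remainder(s_poly(f, g), G) for f, g in combinations(G, 2))

f = {(0, 2, 0): Fraction(1), (0, 0, 2): Fraction(1)}   # Y^2 + Z^2
g = {(2, 1, 0): Fraction(1), (0, 1, 1): Fraction(1)}   # X^2 Y + Y Z
h = {(2, 0, 2): Fraction(1), (0, 0, 3): Fraction(1)}   # X^2 Z^2 + Z^3
print(is_groebner([f, g]), is_groebner([f, g, h]))     # False True
```

Only pairs with $i < j$ are tested, since $S(g_j, g_i) = -S(g_i, g_j)$.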

Buchberger’s S-criterion can be used to calculate a Groebner basis of a given ideal (Alg. 1.2).

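Algorithm 1.2 Buchberger’s algorithm.

Require: $I = \langle f_1, \ldots, f_m \rangle$ ideal of $K[X_1, \ldots, X_n]$, $F = \{f_1, \ldots, f_m\}$
Ensure: Groebner basis $G$ of $I$ with $F \subseteq G$

$G \leftarrow F$
repeat
    $G' \leftarrow G$
    for each pair $f \neq g$ in $G'$ do
        $S \leftarrow S(f, g)^{G'}$
        if $S \neq 0$ then
            $G \leftarrow G \cup \{S\}$
        end if
    end for
until $G = G'$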
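The repeat-until loop of Alg. 1.2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not an efficient or authoritative implementation. Polynomials are dictionaries from exponent tuples (a, b, c) for $X^aY^bZ^c$ to rational coefficients, the ordering is lex with $X > Y > Z$, and every function name is hypothetical. The sketch visits only pairs with $i < j$, which suffices because $S(g, f) = -S(f, g)$.

```python
from fractions import Fraction

def lead(p):
    """Leading (exponent, coefficient) of p; tuple order is lex with X > Y > Z."""
    e = max(p)
    return e, p[e]

def add_scaled(p, q, c, shift):
    """Return p + c * X^shift * q, dropping terms whose coefficient becomes zero."""
    r = dict(p)
    for e, a in q.items():
        e2 = tuple(x + y for x, y in zip(e, shift))
        v = r.get(e2, 0) + c * a
        if v:
            r[e2] = v
        else:
            r.pop(e2, None)
    return r

def s_poly(f, g):
    """S(f, g) = (m / lt(f)) * f - (m / lt(g)) * g with m = lcm(lm(f), lm(g))."""
    (ef, cf), (eg, cg) = lead(f), lead(g)
    m = tuple(max(a, b) for a, b in zip(ef, eg))
    left = add_scaled({}, f, 1 / Fraction(cf), tuple(a - b for a, b in zip(m, ef)))
    return add_scaled(left, g, -1 / Fraction(cg), tuple(a - b for a, b in zip(m, eg)))

def remainder(f, G):
    """Remainder of f on division by the list G (multivariate division)."""
    p, r = dict(f), {}
    while p:
        e, c = lead(p)
        for g in G:
            eg, cg = lead(g)
            if all(a >= b for a, b in zip(e, eg)):   # lm(g) divides lm(p)
                p = add_scaled(p, g, -Fraction(c) / cg,
                               tuple(a - b for a, b in zip(e, eg)))
                break
        else:                                        # no divisor: move term to r
            r[e] = c
            del p[e]
    return r

def buchberger(F):
    """Extend the list F of nonzero polynomials to a Groebner basis (Alg. 1.2)."""
    G = [dict(f) for f in F]
    while True:
        G_old = list(G)                              # G' <- G
        for i in range(len(G_old)):
            for j in range(i + 1, len(G_old)):
                S = remainder(s_poly(G_old[i], G_old[j]), G_old)
                if S and S not in G:
                    G.append(S)                      # G <- G u {S}
        if len(G) == len(G_old):                     # until G = G'
            return G

f = {(0, 2, 0): Fraction(1), (0, 0, 2): Fraction(1)}   # Y^2 + Z^2
g = {(2, 1, 0): Fraction(1), (0, 1, 1): Fraction(1)}   # X^2 Y + Y Z
print(buchberger([f, g]))
```

For these two generators the loop adds the single polynomial $X^2Z^2 + Z^3$ in the first pass and stops after the second.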

Theorem 1.32. Buchberger’s algorithm terminates and the output is a Groebner basis.

Proof. First, we prove correctness. Claim that at each step $G \subseteq I$. Indeed, this is true at the start of the algorithm. Suppose $G \subseteq I$ holds at the beginning of some pass and put $G = \{g_1, \ldots, g_s\}$. Then for all $f, g \in G$, $S(f, g) \in \langle f, g \rangle \subseteq I$. Moreover, the division algorithm gives $S(f, g) = h_1 g_1 + \ldots + h_s g_s + S(f, g)^G$. Thus $S(f, g)^G \in I$ and hence $G \subseteq I$ after each pass. Upon termination, the remainders of the S-polynomials divided by the current set $G$ are 0. In this case, Buchberger’s S-criterion shows that the set $G$ is a Groebner basis.


Second, we show termination. For this, consider the ideal of leading terms of $G = \{g_1, \ldots, g_s\}$ given by

$$\langle \mathrm{LT}(G) \rangle = \langle \mathrm{lt}(g_1), \ldots, \mathrm{lt}(g_s) \rangle.$$

In each pass, the set $G$ is replaced by a new set $G'$. If $G \neq G'$, there is at least one remainder $r = S(f, g)^G$ with $f, g \in G$ which is added to $G'$. Since no term of $r$ is divisible by any of the leading terms of the polynomials in $G$, we have

$$\langle \mathrm{LT}(G) \rangle \subset \langle \mathrm{LT}(G') \rangle.$$

This gives an ascending chain of ideals of $K[X_1, \ldots, X_n]$. But the polynomial ring $K[X_1, \ldots, X_n]$ is Noetherian and so the chain becomes stationary; that is, at some pass

$$\langle \mathrm{LT}(G) \rangle = \langle \mathrm{LT}(G') \rangle.$$

Thus $G = G'$ by the way $G'$ is constructed from $G$ and hence the algorithm stops. □

Example 1.33. Consider the ideal $I = \langle Y^2 + Z^2, X^2Y + YZ \rangle$ in $\mathbb{Q}[X, Y, Z]$ with respect to the lp ordering with $X > Y > Z$. The following session provides a Groebner basis of $I$ according to Buchberger’s algorithm:

> LIB "teachstd.lib"; // library for command spoly

> ring r = 0, (x,y,z), lp;

> ideal i = y2+z2, x2y+yz;

> reduce(spoly(y2+z2, x2y+yz), i);

x2z2+z3

> ideal j = y2+z2, x2y+yz, x2z2+z3;

> reduce(spoly(y2+z2, x2y+yz), j);

0

> reduce(spoly(y2+z2, x2z2+z3), j);

0

> reduce(spoly(x2y+yz, x2z2+z3), j);

0

It follows that $\{Y^2 + Z^2, X^2Y + YZ, X^2Z^2 + Z^3\}$ is a Groebner basis of $I$. ♦
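The first reduction of the session can be retraced by hand. The following is a sketch that uses only the definition of the S-polynomial and the division algorithm. Put $f = Y^2 + Z^2$, $g = X^2Y + YZ$, and $h = X^2Z^2 + Z^3$. The least common multiple of the leading monomials $Y^2$ and $X^2Y$ is $X^2Y^2$, so

$$\begin{aligned}
S(f, g) &= X^2 \cdot f - Y \cdot g = X^2Z^2 - Y^2Z, \\
S(f, g) + Z \cdot f &= X^2Z^2 + Z^3 = h.
\end{aligned}$$

The term $X^2Z^2$ is divisible by neither $Y^2$ nor $X^2Y$ and passes to the remainder, the term $-Y^2Z$ is cancelled by adding $Z \cdot f$, and the resulting term $Z^3$ is again divisible by neither leading monomial. Hence the remainder is $X^2Z^2 + Z^3$, in agreement with the first output of the session. In the same way one finds, for instance,

$$S(g, h) = Z^2 \cdot g - Y \cdot h = (X^2YZ^2 + YZ^3) - (X^2YZ^2 + YZ^3) = 0.$$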

1.7 Reduced Groebner Bases

Groebner bases are not unique since a Groebner basis of an ideal remains a Groebner basis if an arbitrary polynomial of the ideal is added. It will be shown that reduced Groebner bases are unique.

Let $>$ be a monomial ordering on $K[X_1, \ldots, X_n]$. A Groebner basis $G = \{g_1, \ldots, g_s\}$ in $K[X_1, \ldots, X_n]$ is minimal if the polynomials $g_1, \ldots, g_s$ are monic and $\mathrm{lt}_>(g_i)$ is not divisible by $\mathrm{lt}_>(g_j)$ for any pair $i \neq j$.

Proposition 1.34. Each nonzero ideal $I$ of $K[X_1, \ldots, X_n]$ has a minimal Groebner basis with respect to any monomial ordering.


Proof. Let $G = \{g_1, \ldots, g_s\}$ be a Groebner basis of the ideal $I$ with respect to the monomial ordering $>$. We may assume that each generator $g_i$ is monic by multiplying $g_i$ with the inverse of its leading coefficient, $1 \le i \le s$.

Suppose $G$ is not minimal. We may assume that $\mathrm{lt}_>(g_1)$ is divisible by $\mathrm{lt}_>(g_i)$ for some $2 \le i \le s$. By reduction, the polynomial

$$h = g_1 - \frac{\mathrm{lt}_>(g_1)}{\mathrm{lt}_>(g_i)} g_i \tag{1.18}$$

belongs to $I$ and, by Prop. 1.24, its division by $G$ yields $h^G = 0$. But the leading term of $g_1$ cancels in (1.18) and is larger than the leading term of $h$. Thus the polynomial $g_1$ cannot be used during the division of $h$ by the basis $G$. Hence, the polynomial $h$ is a linear combination of $g_2, \ldots, g_s$. It follows by (1.18) that the generator $g_1$ is also a linear combination of $g_2, \ldots, g_s$. Therefore, $G' = \{g_2, \ldots, g_s\}$ also generates the ideal $I$. Moreover, $G'$ is a Groebner basis since if the leading term of a polynomial $f \in I$ is divisible by $\mathrm{lt}_>(g_1)$, then it is also divisible by $\mathrm{lt}_>(g_i)$.

Repeating the above argument leads to a minimal Groebner basis in a finite number of steps. □

Let $>$ be a monomial ordering on $K[X_1, \ldots, X_n]$. A Groebner basis $G = \{g_1, \ldots, g_s\}$ in $K[X_1, \ldots, X_n]$ is reduced if $G$ is a minimal Groebner basis and no term in $g_i$ is divisible by $\mathrm{lt}_>(g_j)$ for any pair $i \neq j$.

Proposition 1.35. Each nonzero ideal $I$ of $K[X_1, \ldots, X_n]$ has a unique reduced Groebner basis with respect to any monomial ordering.

Proof. Let $\{f_1, \ldots, f_r\}$ and $\{g_1, \ldots, g_s\}$ be reduced Groebner bases of $I$ with respect to the monomial ordering $>$.

Claim that $r = s$ and after reordering $\mathrm{lt}_>(f_1) = \mathrm{lt}_>(g_1), \ldots, \mathrm{lt}_>(f_s) = \mathrm{lt}_>(g_s)$. Indeed, by definition of Groebner bases, $\mathrm{lt}_>(g_1)$ is divisible by some $\mathrm{lt}_>(f_i)$, $1 \le i \le r$. We may assume that $i = 1$. Moreover, $\mathrm{lt}_>(f_1)$ is divisible by some $\mathrm{lt}_>(g_j)$, $1 \le j \le s$. Then $\mathrm{lt}_>(g_j)$ divides $\mathrm{lt}_>(g_1)$. By minimality, we have $j = 1$. Since $f_1$ and $g_1$ are monic, it follows that $\mathrm{lt}_>(f_1) = \mathrm{lt}_>(g_1)$. The same argument applies to the other generators. In this way, we obtain the desired result.

Claim that $f_1 = g_1, \ldots, f_s = g_s$. Indeed, consider the polynomial $f_1 - g_1$. The first assertion shows that the leading terms in $f_1$ and $g_1$ cancel. From this and the definition of reduced Groebner bases it follows that no term in $f_1 - g_1$ is divisible by $\mathrm{lt}_>(f_1) = \mathrm{lt}_>(g_1)$, $\mathrm{lt}_>(f_2) = \mathrm{lt}_>(g_2)$, $\ldots$, $\mathrm{lt}_>(f_s) = \mathrm{lt}_>(g_s)$. Thus if $f_1 - g_1$ is divided by $(f_1, \ldots, f_s)$, it is already the remainder. But $f_1 - g_1 \in I$ and so it follows from Prop. 1.24 that $(f_1 - g_1)^G = 0$. Hence, $f_1 = g_1$. The same procedure applies to the other generators and the claim follows.

Finally, claim that the ideal $I$ has a reduced Groebner basis. Indeed, the ideal $I$ has a minimal Groebner basis $\{g_1, \ldots, g_s\}$ by Prop. 1.34. First, replace $g_1$ by the remainder of $g_1$ modulo $(g_2, \ldots, g_s)$. By the division algorithm, none of the terms of the new $g_1$ is divisible by $\mathrm{lt}_>(g_2), \ldots, \mathrm{lt}_>(g_s)$. Moreover, by minimality, the leading term of the original $g_1$ carries over to the new $g_1$. Second, substitute $g_2$ by the remainder of $g_2$ modulo $(g_1, g_3, \ldots, g_s)$. This procedure is continued until $g_s$ is replaced by the remainder of $g_s$ modulo $(g_1, \ldots, g_{s-1})$. Then the leading terms of the original generators $g_1, \ldots, g_s$ survive and thus the new generators $g_1, \ldots, g_s$ still form a Groebner basis. Furthermore, by construction, none of the terms in $g_i$ is divisible by $\mathrm{lt}_>(g_j)$ for any pair $i \neq j$. Hence, we end up with a reduced Groebner basis as claimed. □
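The constructions in the proofs of Prop. 1.34 and Prop. 1.35 (normalize, discard redundant generators, then reduce each generator modulo the others) can be sketched in Python. The code is an illustration under stated assumptions and not taken from the text. Polynomials are dictionaries from exponent tuples (a, b, c) for $X^aY^bZ^c$ to rational coefficients, the ordering is lex with $X > Y > Z$, the input is assumed to be a Groebner basis already, and all names are hypothetical.

```python
from fractions import Fraction

def lead(p):
    """Leading (exponent, coefficient) of p; tuple order is lex with X > Y > Z."""
    e = max(p)
    return e, p[e]

def divides(d, e):
    """True if the monomial X^d divides the monomial X^e."""
    return all(a <= b for a, b in zip(d, e))

def add_scaled(p, q, c, shift):
    """Return p + c * X^shift * q, dropping terms whose coefficient becomes zero."""
    r = dict(p)
    for e, a in q.items():
        e2 = tuple(x + y for x, y in zip(e, shift))
        v = r.get(e2, 0) + c * a
        if v:
            r[e2] = v
        else:
            r.pop(e2, None)
    return r

def remainder(f, G):
    """Remainder of f on division by the list G (multivariate division)."""
    p, r = dict(f), {}
    while p:
        e, c = lead(p)
        for g in G:
            eg, cg = lead(g)
            if divides(eg, e):
                p = add_scaled(p, g, -Fraction(c) / cg,
                               tuple(a - b for a, b in zip(e, eg)))
                break
        else:                                   # no divisor: move term to r
            r[e] = c
            del p[e]
    return r

def reduced_basis(G):
    """Turn a Groebner basis G into a reduced one (Props. 1.34 and 1.35)."""
    # Make every generator monic.
    H = [{e: Fraction(c) / lead(g)[1] for e, c in g.items()} for g in G]
    # Minimality: discard a generator whose leading monomial is divisible
    # by the leading monomial of another remaining generator.
    i = 0
    while i < len(H):
        others = H[:i] + H[i + 1:]
        if any(divides(lead(h)[0], lead(H[i])[0]) for h in others):
            del H[i]
        else:
            i += 1
    # Reduction: replace each generator by its remainder modulo the others.
    for i in range(len(H)):
        H[i] = remainder(H[i], H[:i] + H[i + 1:])
    return H

f = {(0, 2, 0): Fraction(1), (0, 0, 2): Fraction(1)}    # Y^2 + Z^2
g = {(2, 1, 0): Fraction(1), (0, 1, 1): Fraction(1)}    # X^2 Y + Y Z
h = {(2, 0, 2): Fraction(1), (0, 0, 3): Fraction(1)}    # X^2 Z^2 + Z^3
g2 = add_scaled(g, f, Fraction(1), (0, 0, 0))           # g + f, not tail-reduced
k = add_scaled({}, f, Fraction(2), (0, 1, 0))           # 2Y * f, redundant
print(reduced_basis([f, g2, h, k]))
```

Here the redundant generator $2Y \cdot f$ is discarded because $Y^2$ divides $Y^3$, and $g + f$ is reduced back to $g$, so the sketch returns the basis $\{Y^2 + Z^2, X^2Y + YZ, X^2Z^2 + Z^3\}$ of Example 1.33.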

Example 1.36 (Singular). The commands groebner and std compute reduced Groebner bases with respect to (global) monomial orderings.

