Polynomials: a precise definition - A version without solutions

As I have already mentioned in the above list of prerequisites, the notion of polynomials (in one and in several indeterminates) will be occasionally used in these notes. Most likely, the reader already has at least a vague understanding of this notion (e.g., from high school); this vague understanding is probably sufficient for reading most of these notes. But polynomials are one of the most important notions in algebra (if not to say in mathematics), and the reader will likely encounter them over and over; sooner or later, it will happen that the vague understanding is not sufficient and some subtleties do matter. For that reason, anyone serious about doing abstract algebra should know a complete and correct definition of polynomials and have some experience working with it. I shall not give a complete definition of the most general notion of polynomials in these notes, but I will comment on some of the subtleties and define an important special case (that of polynomials in one variable with rational coefficients) in the present section. A reader is probably best advised to skip this section on their first read.

It is not easy to find a good (formal and sufficiently general) treatment of polynomials in textbooks. Various authors tend to skimp on subtleties and technical points such as the notion of an “indeterminate”, or the precise meaning of “formal expression” in the slogan “a polynomial is a formal expression” (the best texts do not use this vague slogan at all), or the definition of the degree of the zero polynomial, or the difference between regarding polynomials as sequences (which is the classical viewpoint and particularly useful for polynomials in one variable) and regarding polynomials as elements of a monoid ring (which is important in the case of several variables, since it allows us to regard the polynomial rings $\mathbb{Q}[X]$ and $\mathbb{Q}[Y]$ as two distinct subrings of $\mathbb{Q}[X, Y]$). They also tend to take some questionable shortcuts, such as defining polynomials in $n$ variables (by induction over $n$) as one-variable polynomials over the ring of $(n-1)$-variable polynomials (this shortcut has several shortcomings, such as making the symmetric role of the $n$ variables opaque, and functioning only for finitely many variables).

More often than not, the polynomials we will be using will be polynomials in one variable. These are usually handled well in good books on abstract algebra, e.g., in [Walker87, §4.5], in [Hunger14, Appendix G], in [Hunger03, Chapter III, §5], in [Rotman15, Chapter A-3], in [HofKun71, §4.1, §4.2] (although [HofKun71, §4.1, §4.2] studies only polynomials over fields, the definition applies to commutative rings mutatis mutandis), in [AmaEsc05, §8], and in [BirMac99, Chapter III, §6]. Most of these treatments rely on the notion of a commutative ring, which is not difficult but somewhat abstract (I shall introduce it below in Section 6.1).

Let me give a brief survey of the notion of univariate polynomials (i.e., polynomials in one variable). I shall define them as sequences. For the sake of simplicity, I shall only talk of polynomials with rational coefficients. Similarly, one can define polynomials with integer coefficients, with real coefficients, or with complex coefficients; of course, one then has to replace each “$\mathbb{Q}$” by a “$\mathbb{Z}$”, an “$\mathbb{R}$” or a “$\mathbb{C}$”.

The rough idea behind the definition of a polynomial is that a polynomial with rational coefficients should be a “formal expression” which is built out of rational numbers, an “indeterminate” $X$ as well as addition, subtraction and multiplication signs, such as $X^4 - 27X + \frac{3}{2}$ or $-X^3 + 2X + 1$ or $\frac{1}{3}(X - 3) \cdot X^2$ or $X^4 + 7X^3 (X - 2)$ or $-15$. We have not explicitly allowed powers, but we understand $X^n$ to mean the product $\underbrace{X X \cdots X}_{n \text{ times}}$ (which is $1$ when $n = 0$). Notice that division is not allowed, so we cannot get $\frac{X}{X+1}$ (but we can get $\frac{3}{2} X$, because $\frac{3}{2}$ is a rational number). Notice also that a polynomial can be a single rational number, since we never said that $X$ must necessarily be used; for instance, $-15$ and $0$ are polynomials.

This is, of course, not a valid definition. One problem with it is that it does not explain what a “formal expression” is. For starters, we want an expression that is well-defined, i.e., one into which we can substitute a rational number for $X$ and obtain a valid term. For example, $X - + \cdot 5$ is not well-defined, so it does not fit our bill; neither is the “empty expression”. Furthermore, when do we want two “formal expressions” to be viewed as one and the same polynomial? Do we want to equate $X(X+2)$ with $X^2 + 2X$? Do we want to equate $0X^3 + 2X + 1$ with $2X + 1$? The answer is “yes” both times, but a general rule is not easy to give if we keep talking of “formal expressions”.

We could define two polynomials $p(X)$ and $q(X)$ to be equal if and only if, for every number $\alpha \in \mathbb{Q}$, the values $p(\alpha)$ and $q(\alpha)$ (obtained by substituting $\alpha$ for $X$ in $p$ and in $q$, respectively) are equal. This would be tantamount to treating polynomials as functions: it would mean that we identify a polynomial $p(X)$ with the function $\mathbb{Q} \to \mathbb{Q},\ \alpha \mapsto p(\alpha)$. Such a definition would work well as long as we did only rather basic things with it²⁵, but as soon as we tried to go deeper, we would encounter technical issues which would make it inadequate and painful²⁶. Also, if we equated polynomials with the functions they describe, then we would waste the word “polynomial” on a concept (a function described by a polynomial) that already has a word for it (namely, polynomial function).

²⁵ And some authors, such as Axler in [Axler15, Chapter 4], do use this definition.

²⁶ Here are three of these issues:

• One of the strengths of polynomials is that we can evaluate them not only at numbers, but also at many other things, e.g., at square matrices: Evaluating the polynomial $X^2 - 3X$ at the square matrix $\begin{pmatrix} 1 & 3 \\ -1 & 2 \end{pmatrix}$ gives $\begin{pmatrix} 1 & 3 \\ -1 & 2 \end{pmatrix}^2 - 3 \begin{pmatrix} 1 & 3 \\ -1 & 2 \end{pmatrix} = \begin{pmatrix} -5 & 0 \\ 0 & -5 \end{pmatrix}$. However, a function must have a well-defined domain, and does not make sense outside of this domain. So, if the polynomial $X^2 - 3X$ is regarded as the function $\mathbb{Q} \to \mathbb{Q},\ \alpha \mapsto \alpha^2 - 3\alpha$, then it makes no sense to evaluate this polynomial at the matrix $\begin{pmatrix} 1 & 3 \\ -1 & 2 \end{pmatrix}$, just because this matrix does not lie in the domain $\mathbb{Q}$ of the function. We could, of course, extend the domain of the function to (say) the set of square matrices over $\mathbb{Q}$, but then we would still have the same problem with other things that we want to evaluate polynomials at. At some point we want to be able to evaluate polynomials at functions and at other polynomials, and if we tried to achieve this by extending the domain, we would have to do this over and over, because each time we extend the domain, we get even more polynomials to evaluate our polynomials at; thus, the definition would be eternally “hunting its own tail”! (We could resolve this difficulty by defining polynomials as natural transformations in the sense of category theory. I do not want to even go into this definition here, as it would take several pages to properly introduce. At this point, it is not worth the hassle.)

• Let $p(X)$ be a polynomial with real coefficients. Then, it should be obvious that $p(X)$ can also be viewed as a polynomial with complex coefficients: For instance, if $p(X)$ was defined as $3X + \frac{7}{2} X (X - 1)$, then we can view the numbers $3$, $\frac{7}{2}$ and $-1$ appearing in its definition as complex numbers, and thus get a polynomial with complex coefficients. But wait! What if two polynomials $p(X)$ and $q(X)$ are equal when viewed as polynomials with real coefficients, but become distinct when viewed as polynomials with complex coefficients (because when we view them as polynomials with complex coefficients, their domains grow larger to include complex numbers, and a new complex $\alpha$ might perhaps no longer satisfy $p(\alpha) = q(\alpha)$)? This does not actually happen, but ruling this out is not obvious if you regard polynomials as functions.

• (This requires some familiarity with finite fields:) Treating polynomials as functions works reasonably well for polynomials with integer, rational, real and complex coefficients (as long as one is not too demanding). But we will eventually want to consider polynomials with coefficients in any arbitrary commutative ring $\mathbb{K}$. An example of a commutative ring $\mathbb{K}$ is the finite field $\mathbb{F}_p$ with $p$ elements, where $p$ is a prime. (This finite field $\mathbb{F}_p$ is better known as the ring of integers modulo $p$.) If we define polynomials with coefficients in $\mathbb{F}_p$ as functions $\mathbb{F}_p \to \mathbb{F}_p$, then we really run into problems; for example, the polynomials $X$ and $X^p$ over this field become identical as functions!
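The last of these issues can be checked by brute force for a small prime. The following is a minimal Python sketch, not part of the notes' formal development; the prime 5, the list encoding of coefficient sequences and the helper name `evaluate_mod` are assumptions made purely for illustration.

```python
# Sketch: over F_p, the polynomials X and X^p differ as coefficient
# sequences but describe the same function F_p -> F_p.
p = 5  # an arbitrary small prime, chosen only for illustration

# Coefficient sequences with the trailing zeroes dropped.
X = [0, 1]              # the polynomial X
X_to_p = [0] * p + [1]  # the polynomial X^p

def evaluate_mod(coeffs, alpha, modulus):
    """Evaluate c_0 + c_1*alpha + c_2*alpha^2 + ... modulo `modulus`."""
    return sum(c * pow(alpha, i, modulus) for i, c in enumerate(coeffs)) % modulus

# Tabulate both polynomial functions on all residues 0, 1, ..., p-1.
values_X = [evaluate_mod(X, a, p) for a in range(p)]
values_X_to_p = [evaluate_mod(X_to_p, a, p) for a in range(p)]

print(X == X_to_p)                # False: the sequences differ
print(values_X == values_X_to_p)  # True: the functions agree
```

So a definition of polynomials as functions would be forced to call $X$ and $X^p$ equal over $\mathbb{F}_p$, whereas a definition in terms of coefficient sequences keeps them apart.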

The preceding paragraphs indicate that it is worth defining “polynomials” in a way that, on the one hand, conveys the idea that they are more “formal expressions” than “functions”, but on the other hand, is less nebulous than “formal expression”. Here is one such definition:

Definition 1.7. (a) A univariate polynomial with rational coefficients means a sequence $(p_0, p_1, p_2, \ldots) \in \mathbb{Q}^\infty$ of elements of $\mathbb{Q}$ such that

\[ \text{all but finitely many } k \in \mathbb{N} \text{ satisfy } p_k = 0. \tag{41} \]

Here, the phrase “all but finitely many $k \in \mathbb{N}$ satisfy $p_k = 0$” means “there exists some finite subset $J$ of $\mathbb{N}$ such that every $k \in \mathbb{N} \setminus J$ satisfies $p_k = 0$”. (See Definition 5.17 for the general definition of “all but finitely many”, and Section 5.4 for some practice with this concept.) More concretely, the condition (41) can be rewritten as follows: The sequence $(p_0, p_1, p_2, \ldots)$ contains only zeroes from some point on (i.e., there exists some $N \in \mathbb{N}$ such that $p_N = p_{N+1} = p_{N+2} = \cdots = 0$).

For the remainder of this definition, “univariate polynomial with rational coefficients” will be abbreviated as “polynomial”.

For example, the sequences $(0, 0, 0, \ldots)$, $(1, 3, 5, 0, 0, 0, \ldots)$, $\left(4, 0, -\frac{2}{3}, 5, 0, 0, 0, \ldots\right)$ and $\left(0, -1, \frac{1}{2}, 0, 0, 0, \ldots\right)$ (where the “$\ldots$” stand for infinitely many zeroes) are polynomials, but the sequence $(1, 1, 1, \ldots)$ (where the “$\ldots$” stands for infinitely many $1$’s) is not (since it does not satisfy (41)).

So we have defined a polynomial as an infinite sequence of rational numbers with a certain property. So far, this does not seem to reflect any intuition of polynomials as “formal expressions”. However, we shall soon (namely, in Definition 1.7 (j)) identify the polynomial $(p_0, p_1, p_2, \ldots) \in \mathbb{Q}^\infty$ with the “formal expression” $p_0 + p_1 X + p_2 X^2 + \cdots$ (this is an infinite sum, but due to (41) all but its first few terms are $0$ and thus can be neglected). For instance, the polynomial $(1, 3, 5, 0, 0, 0, \ldots)$ will be identified with the “formal expression” $1 + 3X + 5X^2 + 0X^3 + 0X^4 + 0X^5 + \cdots = 1 + 3X + 5X^2$. Of course, we cannot do this identification right now, since we do not have a reasonable definition of $X$.
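Condition (41) is also what makes such a sequence storable in finite memory: only the entries before the final run of zeroes need to be kept. Here is a minimal Python sketch of this idea; the names `make_poly` and `coefficient`, and the choice of a tuple of `Fraction` objects as the representation, are assumptions of this sketch and not part of the definition.

```python
from fractions import Fraction

def make_poly(*coeffs):
    """Store (p_0, p_1, p_2, ...) as a tuple of Fractions without trailing zeroes.

    The infinitely many zeroes at the end of the sequence are left implicit,
    which is possible exactly because of condition (41).
    """
    seq = [Fraction(c) for c in coeffs]
    while seq and seq[-1] == 0:
        seq.pop()
    return tuple(seq)

def coefficient(p, i):
    """Return the i-th entry of the infinite sequence represented by p."""
    return p[i] if i < len(p) else Fraction(0)

zero = make_poly()                       # (0, 0, 0, ...)
p = make_poly(1, 3, 5, 0, 0, 0)          # (1, 3, 5, 0, 0, 0, ...)
q = make_poly(4, 0, Fraction(-2, 3), 5)  # (4, 0, -2/3, 5, 0, 0, 0, ...)

print(p == make_poly(1, 3, 5))  # True: written-out trailing zeroes do not matter
print(coefficient(q, 2))        # -2/3
print(coefficient(q, 100))      # 0
```

A sequence such as $(1, 1, 1, \ldots)$ has no representation of this kind, which matches the fact that it is not a polynomial.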

(b) We let $\mathbb{Q}[X]$ denote the set of all univariate polynomials with rational coefficients. Given a polynomial $p = (p_0, p_1, p_2, \ldots) \in \mathbb{Q}[X]$, we refer to the numbers $p_0, p_1, p_2, \ldots$ as the coefficients of $p$. More precisely, for every $i \in \mathbb{N}$, we shall refer to $p_i$ as the $i$-th coefficient of $p$. (Do not forget that we are counting from $0$ here: any polynomial “begins” with its $0$-th coefficient.) The $0$-th coefficient of $p$ is also known as the constant term of $p$.

Instead of “the $i$-th coefficient of $p$”, we often also say “the coefficient before $X^i$ of $p$” or “the coefficient of $X^i$ in $p$”.

Thus, any polynomial $p \in \mathbb{Q}[X]$ is the sequence of its coefficients.

(c) We denote the polynomial $(0, 0, 0, \ldots) \in \mathbb{Q}[X]$ by $\mathbf{0}$. We will also write $0$ for it when no confusion with the number $0$ is possible. The polynomial $\mathbf{0}$ is called the zero polynomial. A polynomial $p \in \mathbb{Q}[X]$ is said to be nonzero if $p \neq \mathbf{0}$.

(d) We denote the polynomial $(1, 0, 0, 0, \ldots) \in \mathbb{Q}[X]$ by $\mathbf{1}$. We will also write $1$ for it when no confusion with the number $1$ is possible.

(e) For any $\lambda \in \mathbb{Q}$, we denote the polynomial $(\lambda, 0, 0, 0, \ldots) \in \mathbb{Q}[X]$ by $\operatorname{const} \lambda$. We call it the constant polynomial with value $\lambda$. It is often useful to identify $\lambda \in \mathbb{Q}$ with $\operatorname{const} \lambda \in \mathbb{Q}[X]$. Notice that $\mathbf{0} = \operatorname{const} 0$ and $\mathbf{1} = \operatorname{const} 1$.

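(f) Now, let us define the sum, the difference and the product of two polynomials. Indeed, let $a = (a_0, a_1, a_2, \ldots) \in \mathbb{Q}[X]$ and $b = (b_0, b_1, b_2, \ldots) \in \mathbb{Q}[X]$ be two polynomials. Then, we define three polynomials $a + b$, $a - b$ and $a \cdot b$ in $\mathbb{Q}[X]$ by

\begin{align*}
a + b &= (a_0 + b_0, a_1 + b_1, a_2 + b_2, \ldots); \\
a - b &= (a_0 - b_0, a_1 - b_1, a_2 - b_2, \ldots); \\
a \cdot b &= (c_0, c_1, c_2, \ldots),
\end{align*}

where

\[ c_k = \sum_{i=0}^{k} a_i b_{k-i} \qquad \text{for every } k \in \mathbb{N}. \]

We call $a + b$ the sum of $a$ and $b$; we call $a - b$ the difference of $a$ and $b$; we call $a \cdot b$ the product of $a$ and $b$. We abbreviate $a \cdot b$ by $ab$, and we abbreviate $\mathbf{0} - a$ by $-a$.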

For example,

\begin{align*}
(1, 2, 2, 0, 0, \ldots) + (3, 0, -1, 0, 0, 0, \ldots) &= (4, 2, 1, 0, 0, 0, \ldots); \\
(1, 2, 2, 0, 0, \ldots) - (3, 0, -1, 0, 0, 0, \ldots) &= (-2, 2, 3, 0, 0, 0, \ldots); \\
(1, 2, 2, 0, 0, \ldots) \cdot (3, 0, -1, 0, 0, 0, \ldots) &= (3, 6, 5, -2, -2, 0, 0, 0, \ldots).
\end{align*}

The definition of $a + b$ essentially says that “polynomials are added coefficientwise” (i.e., in order to obtain the sum of two polynomials $a$ and $b$, it suffices to add each coefficient of $a$ to the corresponding coefficient of $b$). Similarly, the definition of $a - b$ says the same thing about subtraction. The definition of $a \cdot b$ is more surprising. However, it loses its mystique when we identify the polynomials $a$ and $b$ with the “formal expressions” $a_0 + a_1 X + a_2 X^2 + \cdots$ and $b_0 + b_1 X + b_2 X^2 + \cdots$ (although, at this point, we do not know what these expressions really mean); indeed, it simply says that

\[ \left(a_0 + a_1 X + a_2 X^2 + \cdots\right) \left(b_0 + b_1 X + b_2 X^2 + \cdots\right) = c_0 + c_1 X + c_2 X^2 + \cdots, \]

where $c_k = \sum_{i=0}^{k} a_i b_{k-i}$ for every $k \in \mathbb{N}$. This is precisely what one would expect, because if you expand $\left(a_0 + a_1 X + a_2 X^2 + \cdots\right) \left(b_0 + b_1 X + b_2 X^2 + \cdots\right)$ using the distributive law and collect equal powers of $X$, then you get precisely $c_0 + c_1 X + c_2 X^2 + \cdots$. Thus, the definition of $a \cdot b$ has been tailored to make the distributive law hold.

(By the way, why is a·b a polynomial? That is, why does it satisfy (41) ? The proof is easy, but we omit it.)
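For readers who would like to see how such a proof might go, here is one possible sketch. The bounds $N$ and $M$ are symbols introduced here for illustration; they exist because $a$ and $b$ satisfy (41).

```latex
Choose $N, M \in \mathbb{N}$ such that $a_i = 0$ for all $i \geq N$
and $b_j = 0$ for all $j \geq M$. Let $k \geq N + M$. In the sum
\[
  c_k = \sum_{i=0}^{k} a_i b_{k-i},
\]
every addend with $i \geq N$ vanishes because $a_i = 0$, and every
addend with $i < N$ vanishes because $k - i > k - N \geq M$ forces
$b_{k-i} = 0$. Hence $c_k = 0$ for all $k \geq N + M$, so the sequence
$(c_0, c_1, c_2, \ldots)$ satisfies (41).
```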

Addition, subtraction and multiplication of polynomials satisfy some of the same rules as addition, subtraction and multiplication of numbers. For example, the commutative laws $a + b = b + a$ and $ab = ba$ are valid for polynomials just as they are for numbers; the same holds for the associative laws $(a + b) + c = a + (b + c)$ and $(ab)c = a(bc)$ and the distributive laws $(a + b)c = ac + bc$ and $a(b + c) = ab + ac$. Moreover, each polynomial $a$ satisfies $a + \mathbf{0} = \mathbf{0} + a = a$ and $a \cdot \mathbf{0} = \mathbf{0} \cdot a = \mathbf{0}$ and $a \cdot \mathbf{1} = \mathbf{1} \cdot a = a$ and $a + (-a) = (-a) + a = \mathbf{0}$.

Using the notations of Definition 6.2, we can summarize this as follows: The set $\mathbb{Q}[X]$, endowed with the operations $+$ and $\cdot$ just defined, and with the elements $\mathbf{0}$ and $\mathbf{1}$, is a commutative ring. It is called the (univariate) polynomial ring over $\mathbb{Q}$.
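The three operations of part (f) translate almost literally into code. The following Python sketch assumes that a polynomial is stored as a list of its coefficients with the trailing zeroes dropped; the function names `trim`, `add`, `sub` and `mul` are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction
from itertools import zip_longest

def trim(seq):
    """Drop trailing zeroes, so that equal polynomials get equal lists."""
    seq = list(seq)
    while seq and seq[-1] == 0:
        seq.pop()
    return seq

def add(a, b):
    """Coefficientwise sum; the shorter list is padded with zeroes."""
    return trim(x + y for x, y in zip_longest(a, b, fillvalue=Fraction(0)))

def sub(a, b):
    """Coefficientwise difference; the shorter list is padded with zeroes."""
    return trim(x - y for x, y in zip_longest(a, b, fillvalue=Fraction(0)))

def mul(a, b):
    """Product with c_k = sum of a_i * b_(k-i) over i = 0, ..., k."""
    if not a or not b:
        return []  # a product with the zero polynomial is zero
    c = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            c[i + j] += a_i * b_j  # the addend a_i * b_(k-i) of c_k, where k = i + j
    return trim(c)

a = [Fraction(1), Fraction(2), Fraction(2)]   # (1, 2, 2, 0, 0, ...)
b = [Fraction(3), Fraction(0), Fraction(-1)]  # (3, 0, -1, 0, 0, ...)

print(add(a, b) == [4, 2, 1])          # True
print(sub(a, b) == [-2, 2, 3])         # True
print(mul(a, b) == [3, 6, 5, -2, -2])  # True
print(mul(a, b) == mul(b, a))          # True: one instance of ab = ba
```

The three printed comparisons reproduce the example computed after the definition of the product.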

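(g) Let $a = (a_0, a_1, a_2, \ldots) \in \mathbb{Q}[X]$ and $\lambda \in \mathbb{Q}$. Then, $\lambda a$ denotes the polynomial $(\lambda a_0, \lambda a_1, \lambda a_2, \ldots) \in \mathbb{Q}[X]$. (This equals the polynomial $(\operatorname{const} \lambda) \cdot a$; thus, identifying $\lambda$ with $\operatorname{const} \lambda$ does not cause any inconsistencies here.)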

(h) If $p = (p_0, p_1, p_2, \ldots) \in \mathbb{Q}[X]$ is a nonzero polynomial, then the degree of $p$ is defined to be the maximum $i \in \mathbb{N}$ satisfying $p_i \neq 0$. If $p \in \mathbb{Q}[X]$ is the zero polynomial, then the degree of $p$ is defined to be $-\infty$. (Here, $-\infty$ is just a fancy symbol, not a number.) The degree of a polynomial $p \in \mathbb{Q}[X]$ is denoted $\deg p$.

For example, $\deg (0, 4, 0, -1, 0, 0, 0, \ldots) = 3$.
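As a small illustration, the degree can be computed as follows. This is only a sketch: the function name `degree` is hypothetical, and using Python's floating-point `-math.inf` as a stand-in for the symbol $-\infty$ is an assumption of this sketch, not something the definition prescribes.

```python
import math

def degree(p):
    """Degree of a polynomial given as a list of coefficients (p_0, p_1, ...)."""
    nonzero_indices = [i for i, c in enumerate(p) if c != 0]
    if not nonzero_indices:
        # The zero polynomial: -math.inf merely stands in for the symbol -∞.
        return -math.inf
    return max(nonzero_indices)

print(degree([0, 4, 0, -1, 0, 0, 0]))  # 3
print(degree([0, 0, 0]))               # -inf
```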

(i) If $a = (a_0, a_1, a_2, \ldots) \in \mathbb{Q}[X]$ and $n \in \mathbb{N}$, then a polynomial $a^n \in \mathbb{Q}[X]$ is defined to be the product $\underbrace{a a \cdots a}_{n \text{ times}}$. (This is understood to be $\mathbf{1}$ when $n = 0$. In general, an empty product of polynomials is always understood to be $\mathbf{1}$.)

(j) We let $X$ denote the polynomial $(0, 1, 0, 0, 0, \ldots) \in \mathbb{Q}[X]$. (This is the polynomial whose $1$-st coefficient is $1$ and whose other coefficients are $0$.) This polynomial is called the indeterminate of $\mathbb{Q}[X]$. It is easy to see that, for any $n \in \mathbb{N}$, we have

\[ X^n = \Bigl( \underbrace{0, 0, \ldots, 0}_{n \text{ zeroes}}, 1, 0, 0, 0, \ldots \Bigr). \]

This polynomial $X$ finally provides an answer to the questions “what is an indeterminate” and “what is a formal expression”. Namely, let $(p_0, p_1, p_2, \ldots) \in \mathbb{Q}[X]$ be any polynomial. Then, the sum $p_0 + p_1 X + p_2 X^2 + \cdots$ is well-defined (it is an infinite sum, but due to (41) it has only finitely many nonzero addends), and it is easy to see that this sum equals $(p_0, p_1, p_2, \ldots)$. Thus,

\[ (p_0, p_1, p_2, \ldots) = p_0 + p_1 X + p_2 X^2 + \cdots \qquad \text{for every } (p_0, p_1, p_2, \ldots) \in \mathbb{Q}[X]. \]

This finally allows us to write a polynomial $(p_0, p_1, p_2, \ldots)$ as a sum $p_0 + p_1 X + p_2 X^2 + \cdots$ while remaining honest; the sum $p_0 + p_1 X + p_2 X^2 + \cdots$ is no longer a “formal expression” of unclear meaning, nor a function, but it is just an alternative way to write the sequence $(p_0, p_1, p_2, \ldots)$. So, at last, our notion of a polynomial resembles the intuitive notion of a polynomial!

Of course, we can write polynomials as finite sums as well. Indeed, if $(p_0, p_1, p_2, \ldots) \in \mathbb{Q}[X]$ is a polynomial and $N$ is a nonnegative integer such that every $n > N$ satisfies $p_n = 0$, then

\[ (p_0, p_1, p_2, \ldots) = p_0 + p_1 X + p_2 X^2 + \cdots = p_0 + p_1 X + \cdots + p_N X^N \]

(because addends can be discarded when they are $0$). For example,

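\[ (4, 1, 0, 0, 0, \ldots) = 4 + 1X = 4 + X \qquad \text{and} \qquad \left(\frac{1}{2}, 0, \frac{1}{3}, 0, 0, 0, \ldots\right) = \frac{1}{2} + 0X + \frac{1}{3} X^2 = \frac{1}{2} + \frac{1}{3} X^2. \]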
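The two claims of part (j), namely the formula for $X^n$ and the equality $(p_0, p_1, p_2, \ldots) = p_0 + p_1 X + p_2 X^2 + \cdots$, can be tested on examples. The Python sketch below again stores polynomials as coefficient lists without trailing zeroes; all helper names (`trim`, `add`, `mul`, `scale`, `power`) are hypothetical and exist only for this illustration.

```python
from fractions import Fraction

def trim(seq):
    """Drop trailing zeroes from a coefficient list."""
    seq = list(seq)
    while seq and seq[-1] == 0:
        seq.pop()
    return seq

def add(a, b):
    """Coefficientwise sum of two coefficient lists."""
    n = max(len(a), len(b))
    a = a + [Fraction(0)] * (n - len(a))
    b = b + [Fraction(0)] * (n - len(b))
    return trim([x + y for x, y in zip(a, b)])

def mul(a, b):
    """Product of two coefficient lists, following part (f)."""
    if not a or not b:
        return []
    c = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            c[i + j] += a_i * b_j
    return trim(c)

def scale(lam, a):
    """The polynomial lam * a of part (g)."""
    return trim([lam * x for x in a])

def power(a, n):
    """The n-th power of part (i); the empty product is the polynomial 1."""
    result = [Fraction(1)]
    for _ in range(n):
        result = mul(result, a)
    return result

X = [Fraction(0), Fraction(1)]  # the indeterminate (0, 1, 0, 0, 0, ...)

print(power(X, 3) == [0, 0, 0, 1])  # True: three zeroes, then a 1

# Rebuild p = (1/2, 0, 1/3, 0, 0, ...) as the sum p_0 + p_1 X + p_2 X^2.
p = [Fraction(1, 2), Fraction(0), Fraction(1, 3)]
total = []
for i, p_i in enumerate(p):
    total = add(total, scale(p_i, power(X, i)))
print(total == p)  # True
```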

(k) For our definition of polynomials to be fully compatible with our intuition, we are missing only one more thing: a way to evaluate a polynomial at a number, or some other object (e.g., another polynomial or a function). This is easy: Let $p = (p_0, p_1, p_2, \ldots) \in \mathbb{Q}[X]$ be a polynomial, and let $\alpha \in \mathbb{Q}$. Then, $p(\alpha)$ means the number $p_0 + p_1 \alpha + p_2 \alpha^2 + \cdots \in \mathbb{Q}$. (Again, the infinite sum $p_0 + p_1 \alpha + p_2 \alpha^2 + \cdots$ makes sense because of (41).) Similarly, we can define $p(\alpha)$ when $\alpha \in \mathbb{R}$ (but in this case, $p(\alpha)$ will be an element of $\mathbb{R}$) or when $\alpha \in \mathbb{C}$ (in this case, $p(\alpha) \in \mathbb{C}$) or when $\alpha$ is a square matrix with rational entries (in this case, $p(\alpha)$ will also be such a matrix) or when $\alpha$ is another polynomial (in this case, $p(\alpha)$ is such a polynomial as well).

For example, if $p = (1, -2, 0, 3, 0, 0, 0, \ldots) = 1 - 2X + 3X^3$, then $p(\alpha) = 1 - 2\alpha + 3\alpha^3$ for every $\alpha$.

The map $\mathbb{Q} \to \mathbb{Q},\ \alpha \mapsto p(\alpha)$ is called the polynomial function described by $p$. As we said above, this function is not $p$, and it is not a good idea to equate it with $p$.

If $\alpha$ is a number (or a square matrix, or another polynomial), then $p(\alpha)$ is called the result of evaluating $p$ at $X = \alpha$ (or, simply, evaluating $p$ at $\alpha$), or the result of substituting $\alpha$ for $X$ in $p$. This notation is, of course, reminiscent of functions; nevertheless (as we already said a few times), $p$ is not a function.

Probably the simplest three cases of evaluation are the following ones:

• We have $p(0) = p_0 + p_1 0^1 + p_2 0^2 + \cdots = p_0$. In other words, evaluating $p$ at $X = 0$ yields the constant term of $p$.

• We have $p(1) = p_0 + p_1 1^1 + p_2 1^2 + \cdots = p_0 + p_1 + p_2 + \cdots$. In other words, evaluating $p$ at $X = 1$ yields the sum of all coefficients of $p$.

• We have $p(X) = p_0 + p_1 X^1 + p_2 X^2 + \cdots = p_0 + p_1 X + p_2 X^2 + \cdots = p$. In other words, evaluating $p$ at $X = X$ yields $p$ itself. This allows us to write $p(X)$ for $p$. Many authors do so, just in order to stress that $p$ is a polynomial and that the indeterminate is called $X$. It should be kept in mind that $X$ is not a variable (just as $p$ is not a function); it is the (fixed!) sequence $(0, 1, 0, 0, 0, \ldots) \in \mathbb{Q}[X]$ which serves as the indeterminate for polynomials in $\mathbb{Q}[X]$.
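Since a polynomial is just its list of coefficients, one and the same evaluation routine can accept very different kinds of $\alpha$. The Python sketch below illustrates this with rational numbers and with $2 \times 2$ matrices. The function `evaluate`, its extra argument `one` (the unity of whatever structure $\alpha$ lives in), the use of Horner's scheme instead of the literal sum $p_0 + p_1 \alpha + p_2 \alpha^2 + \cdots$, and the tiny class `Mat2` are all assumptions of this sketch.

```python
from fractions import Fraction

class Mat2:
    """A minimal 2x2 matrix type, just enough for this illustration."""
    def __init__(self, a, b, c, d):
        self.entries = (a, b, c, d)  # rows (a, b) and (c, d)
    def __add__(self, other):
        return Mat2(*(x + y for x, y in zip(self.entries, other.entries)))
    def __mul__(self, other):        # matrix product
        a, b, c, d = self.entries
        e, f, g, h = other.entries
        return Mat2(a*e + b*g, a*f + b*h, c*e + d*g, c*f + d*h)
    def __rmul__(self, scalar):      # scalar * matrix
        return Mat2(*(scalar * x for x in self.entries))
    def __eq__(self, other):
        return self.entries == other.entries

def evaluate(p, alpha, one):
    """Compute p_0*one + p_1*alpha + p_2*alpha^2 + ... by Horner's scheme."""
    result = 0 * one
    for c in reversed(p):
        result = result * alpha + c * one
    return result

p = [1, -2, 0, 3]  # the polynomial 1 - 2X + 3X^3
print(evaluate(p, Fraction(0), Fraction(1)) == p[0])    # True: p(0) is the constant term
print(evaluate(p, Fraction(1), Fraction(1)) == sum(p))  # True: p(1) is the sum of the coefficients

q = [0, -3, 1]     # the polynomial X^2 - 3X
A = Mat2(1, 3, -1, 2)
identity = Mat2(1, 0, 0, 1)
print(evaluate(q, A, identity) == Mat2(-5, 0, 0, -5))   # True
```

Nothing about `p` or `q` had to change in order to plug in a matrix; only the arithmetic used for $\alpha$ differs. This is the flexibility that would be lost if polynomials were defined as functions with a fixed domain.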

(l) Often, one wants (or is required) to give an indeterminate a name other than $X$. (For instance, instead of polynomials with rational coefficients, we could be considering polynomials whose coefficients themselves are polynomials in $\mathbb{Q}[X]$; and then, we would not be allowed to use the letter $X$ for the “new” indeterminate anymore, as it already means the indeterminate of $\mathbb{Q}[X]$!) This can be done, and the rules are the following: Any letter (that does not already have a meaning) can be used to denote the indeterminate; but then, the set of all polynomials has to be renamed as $\mathbb{Q}[\eta]$, where $\eta$ is this letter. For instance, if we want to denote the indeterminate as $x$, then we have to denote the set by $\mathbb{Q}[x]$.

It is furthermore convenient to regard the sets $\mathbb{Q}[\eta]$ for different letters $\eta$ as distinct. Thus, for example, the polynomial $3X^2 + 1$ is not the same as the polynomial $3Y^2 + 1$. (The reason for doing so is that one sometimes wishes to view both of these polynomials as polynomials in the two variables $X$ and $Y$.) Formally speaking, this means that we should define a polynomial in $\mathbb{Q}[\eta]$ to be not just a sequence $(p_0, p_1, p_2, \ldots)$ of rational numbers, but actually a pair $((p_0, p_1, p_2, \ldots), \text{“}\eta\text{”})$ of a sequence of rational numbers and the letter $\eta$. (Here, “$\eta$” really means the letter $\eta$, not the sequence $(0, 1, 0, 0, 0, \ldots)$.) This is, of course, a very technical point which is of little relevance to most of mathematics; it becomes important when one tries to implement polynomials in a programming language.
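To make the last remark concrete, here is one way the pair of a coefficient sequence and a letter might look in Python. This is a sketch only; the class name `Poly`, its field names and the helper `poly` are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class Poly:
    coeffs: tuple  # (p_0, p_1, ..., p_N) with the trailing zeroes dropped
    name: str      # the letter naming the indeterminate, e.g. "X"

def poly(name, *coeffs):
    """Build a polynomial in Q[name] from its first few coefficients."""
    seq = [Fraction(c) for c in coeffs]
    while seq and seq[-1] == 0:
        seq.pop()
    return Poly(tuple(seq), name)

f = poly("X", 1, 0, 3)  # 3X^2 + 1 in Q[X]
g = poly("Y", 1, 0, 3)  # 3Y^2 + 1 in Q[Y]

print(f == g)                      # False: different indeterminates
print(f.coeffs == g.coeffs)        # True: same coefficient sequence
print(f == poly("X", 1, 0, 3, 0))  # True: trailing zeroes do not matter
```

The equality generated by `dataclass` compares both fields, so two polynomials are equal only if their coefficients and their indeterminates agree.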

(m) As already explained, we can replace $\mathbb{Q}$ by $\mathbb{Z}$, $\mathbb{R}$, $\mathbb{C}$ or any other commutative ring $\mathbb{K}$ in the above definition. (See Definition 6.2 for the definition of a commutative ring.) When $\mathbb{Q}$ is replaced by a commutative ring $\mathbb{K}$, the notion of “univariate polynomials with rational coefficients” becomes “univariate polynomials with coefficients in $\mathbb{K}$” (also known as “univariate polynomials over $\mathbb{K}$”), and the set of such polynomials is denoted by $\mathbb{K}[X]$ rather than $\mathbb{Q}[X]$.

So much for univariate polynomials.

Polynomials in multiple variables are (in my opinion) treated the best in [Lang02, Chapter II, §3], where they are introduced as elements of a monoid ring. However, this treatment is rather abstract and uses a good deal of algebraic language²⁷. The treatments in [Walker87, §4.5], in [Rotman15, Chapter A-3] and in [BirMac99, Chapter IV, §4] use the above-mentioned recursive shortcut that makes them inferior (in my opinion). A neat (and rather elementary) treatment of polynomials in $n$ variables (for finite $n$) can be found in [Hunger03, Chapter III, §5], in [Loehr11, §7.16], in [GalQua20, §30.2] and in [AmaEsc05, §I.8]; it generalizes the viewpoint we used in Definition 1.7 for univariate polynomials above²⁸.

²⁷ Also, the book [Lang02] is notorious for its unpolished writing; it is best read with Bergman’s companion [Bergma15] at hand.

²⁸ You are reading right: The analysis textbook [AmaEsc05] is one of the few sources I am aware of to define the (algebraic!) notion of polynomials precisely and well.
