Initial Cases of Transfinite Induction


The goal here is to study what are, in a sense, the “most complex” proofs in first-order arithmetic. The main tool for proving theorems in arithmetic is clearly the induction schema

A(0) → ∀_x(A(x) → A(Sx)) → ∀_x A(x).

An equivalent form of this schema is “course-of-values” or cumulative induction

∀_x(∀_{y<x} A(y) → A(x)) → ∀_x A(x).

Both schemes refer to the standard ordering of N. It is tempting to try to strengthen arithmetic by allowing more general induction schemes, e.g., w.r.t. the lexicographical ordering of N×N. Even more generally, let ≺ be a well-ordering of N and use transfinite induction:

∀_x(∀_{y≺x} A(y) → A(x)) → ∀_x A(x).

It can be understood as

Suppose the property A(x) is “progressive”, i.e., from the validity of A(y) for all y ≺ x we can conclude that A(x) holds. Then A(x) holds for all x.

For which well-orderings is this schema derivable in arithmetic? We will prove a classic result of Gentzen (1943) which in a sense answers this question completely. However, to state the result we have to be more explicit about the well-orderings used.

4.1. Ordinals below ε₀

We need some knowledge and notations for ordinals. However, we do not want to assume set theory here. We rather introduce an initial segment of the ordinals (the ones < ε₀) in a formal, combinatorial way, i.e., via ordinal notations based on “Cantor normal form”. From now on “ordinal” means “ordinal notation”.

Definition. We define

(a) α is an ordinal,

(b) α < β for ordinals α, β

simultaneously by induction, as follows.

(a) If α_m, ..., α_0 are ordinals, m ≥ −1 and α_m ≥ ··· ≥ α_0 (where α ≥ β means α > β or α = β), then

ω^{α_m} + ··· + ω^{α_0}

is an ordinal. The empty sum (denoted by 0) is allowed.

(b) If ω^{α_m} + ··· + ω^{α_0} and ω^{β_n} + ··· + ω^{β_0} are ordinals, then

ω^{α_m} + ··· + ω^{α_0} < ω^{β_n} + ··· + ω^{β_0}

iff there is an i ≥ 0 such that α_{m−i} < β_{n−i}, α_{m−i+1} = β_{n−i+1}, ..., α_m = β_n, or else m < n and α_m = β_n, ..., α_0 = β_{n−m}.

We shall use the notation:

1 := ω^0,

k := ω^0 + ··· + ω^0 with k copies of ω^0,

ω^α k := ω^α + ··· + ω^α with k copies of ω^α.

The level of an ordinal is defined by lev(0) := 0, lev(ω^{α_m} + ··· + ω^{α_0}) := lev(α_m) + 1. For ordinals of level k + 1 we have ω_k ≤ α < ω_{k+1}, where ω_0 := 0, ω_1 := ω^1, ω_{k+1} := ω^{ω_k}.

Lemma 4.1.1. < is a linear order with 0 the least element.

Proof. By induction on the levels.

Example.

0 < 1 < 2 ··· < ω < ω + 1 ··· < ω2 < ω2 + 1 ··· < ω3 ··· < ω^2 < ω^2 + 1 ··· < ω^2 + ω ··· < ω^3 ··· < ω^ω = ω_2 ··· < ω_3 ···

Definition (Addition of ordinals).

(ω^{α_m} + ··· + ω^{α_0}) + (ω^{β_n} + ··· + ω^{β_0}) := ω^{α_m} + ··· + ω^{α_i} + ω^{β_n} + ··· + ω^{β_0},

where i is minimal such that α_i ≥ β_n.

Lemma 4.1.2. + is an associative operation which is strictly monotonic in the second argument and weakly monotonic in the first argument.

Proof. Exercise.

Remark. + is not commutative:

1 + ω = ω ≠ ω + 1.


There is also a commutative version of addition, the natural sum (or Hessenberg sum). It is defined by

(ω^{α_m} + ··· + ω^{α_0}) # (ω^{β_n} + ··· + ω^{β_0}) := ω^{γ_{m+n+1}} + ··· + ω^{γ_0},

where γ_{m+n+1}, ..., γ_0 is a (weakly) decreasing permutation of α_m, ..., α_0, β_n, ..., β_0. It is easy to see that # is associative, commutative and strictly monotonic in both arguments.
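The natural sum amounts to merging the two lists of exponents and sorting them. A minimal sketch, again under the illustrative assumption that a notation is the tuple of its exponents with the leading exponent first (the names `compare` and `natural_sum` are hypothetical):

```python
from functools import cmp_to_key

# Assumed encoding: tuple of exponents, leading exponent first; 0 is ().
ZERO = ()
ONE = (ZERO,)          # ω^0
OMEGA = (ONE,)         # ω^1


def compare(a, b):
    """Three-way comparison of ordinal notations: -1, 0 or 1."""
    for x, y in zip(a, b):
        c = compare(x, y)
        if c != 0:
            return c
    return (len(a) > len(b)) - (len(a) < len(b))


def natural_sum(a, b):
    """a # b: all exponents of a and b, sorted in decreasing order."""
    merged = sorted(a + b, key=cmp_to_key(compare), reverse=True)
    return tuple(merged)


left = natural_sum(ONE, OMEGA)
right = natural_sum(OMEGA, ONE)
```

Both `left` and `right` come out as `(ONE, ZERO)`, i.e. ω + 1, whereas the ordinary sum 1 + ω collapses to ω.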

How can ordinals of the form β + ω^α be approximated from below? First note that

δ < α → β + ω^δ k < β + ω^α.

For any γ < β + ω^α we can find a δ < α and a k such that γ < β + ω^δ k.

To work with ordinals in an arithmetical system we set up an effective bijection between our ordinals < ε₀ and the non-negative integers (i.e., a Gödel numbering). For its definition it is useful to refer to ordinals in the form

ω^{α_m} k_m + ··· + ω^{α_0} k_0 with α_m > ··· > α_0 and k_i ≠ 0 (m ≥ −1).

(By convention, m = −1 corresponds to the empty sum.)

Definition.

(a) For every ordinal α we define its Gödel number ⌜α⌝ by

⌜ω^{α_m} k_m + ··· + ω^{α_0} k_0⌝ := (∏_{i≤m} p_{⌜α_i⌝}^{k_i}) − 1,

where p_n is the n-th prime (starting with p_0 := 2).

(b) For every integer x ≥ 0 we define its ordinal o(x) by

o((∏_{i≤l} p_i^{q_i}) − 1) := Σ_{i≤l} ω^{o(i)} q_i,

where the sum is understood as the natural sum.

Lemma 4.1.3 (Bijection between ordinals and non-negative integers).

(a) o(⌜α⌝) = α,

(b) ⌜o(x)⌝ = x.

We can transfer relations and operations on ordinals to computable relations and operations on non-negative integers.


Abbreviations:

x ≺ y := o(x) < o(y),
ω^x := ⌜ω^{o(x)}⌝,
x ⊕ y := ⌜o(x) + o(y)⌝,
xk := ⌜o(x)k⌝,
ω_k := ⌜ω_k⌝.

4.2. Provability of initial cases of transfinite induction

We will derive initial cases of transfinite induction in arithmetic:

∀_x(∀_{y≺x} P y → P x) → ∀_{x≺a} P x

for some number a and a predicate symbol P, where ≺ is the standard order of order type ε₀ defined before.

Remark. Gentzen (1943) proved that this result is optimal in the sense that for the full system of ordinals < ε₀ the principle

∀_x(∀_{y≺x} P y → P x) → ∀_x P x

of transfinite induction is underivable. However, we will not present a proof in these notes.

4.2.1. Arithmetical systems. By an arithmetical system Z we mean a theory based on minimal logic in the ∀→-language (including equality axioms) such that

(a) The language of Z consists of a fixed supply of function and relation constants assumed to denote computable functions and relations on the non-negative integers.

(b) Among the function constants there is a constant S for the successor function and 0 for (the 0-place function) zero.

(c) Among the relation constants we have =, P and also ≺ for the ordering of type ε₀ of N, as introduced before.

(d) Terms are built up from object variables x, y, z by f(t_1, ..., t_m), where f is a function constant.

(e) We identify closed terms which have the same value; this expresses that each function constant is computable.

(f) Terms of the form S(S(... S0 ...)) are called numerals. Notation: S^n 0 or n̄ or just n.

(g) Formulas are built up from atomic formulas R(t_1, ..., t_m), with R a relation constant, by A → B and ∀_x A.

The axioms of Z are


• Compatibility of equality

x = y → A(x) → A(y),

• the Peano axioms, i.e., the universal closures of

(42) Sx = Sy → x = y,
(43) Sx = 0 → A,
(44) A(0) → ∀_x(A(x) → A(Sx)) → ∀_x A(x),

with A(x) an arbitrary formula.

• R n⃗ whenever R n⃗ is true (to express that R is computable).

• Irreflexivity and transitivity for ≺

x ≺ x → A,
x ≺ y → y ≺ z → x ≺ z.

Further axioms, following Schütte, are the universal closures of

(45) x ≺ 0 → A,
(46) z ≺ y ⊕ ω^0 → (z ≺ y → A) → (z = y → A) → A,
(47) x ⊕ 0 = x,
(48) x ⊕ (y ⊕ z) = (x ⊕ y) ⊕ z,
(49) 0 ⊕ x = x,
(50) ω^x 0 = 0,
(51) ω^x (Sy) = ω^x y ⊕ ω^x,
(52) z ≺ y ⊕ ω^{Sx} → z ≺ y ⊕ ω^{e(x,y,z)} m(x, y, z),
(53) z ≺ y ⊕ ω^{Sx} → e(x, y, z) ≺ Sx,

where ⊕, λ_{x,y}(ω^x y), e and m denote function constants and A is any formula. These axioms are formal counterparts to the properties of the ordinal notations observed above.

Theorem 4.2.1 (Provable initial cases of transfinite induction in Z). Transfinite induction up to ω_n, i.e., for arbitrary A(x) the formula

∀_x(∀_{y≺x} A(y) → A(x)) → ∀_{x≺ω_n} A(x),

is derivable in Z.

Proof. To every formula A(x) we assign a formula A⁺(x) (with respect to a fixed variable x) by

A⁺(x) := ∀_y(∀_{z≺y} A(z) → ∀_{z≺y⊕ω^x} A(z)).


We first show

If A(x) is progressive, then A⁺(x) is progressive,

where “B(x) is progressive” means ∀_x(∀_{y≺x} B(y) → B(x)). Assume that A(x) is progressive and

(54) ∀_{y≺x} A⁺(y).

Our goal is A⁺(x), i.e., ∀_y(∀_{z≺y} A(z) → ∀_{z≺y⊕ω^x} A(z)). Assume

(55) ∀_{z≺y} A(z)

and z ≺ y ⊕ ω^x. We have to show A(z).

Case x = 0. Then z ≺ y ⊕ ω^0. By (46):

z ≺ y ⊕ ω^0 → (z ≺ y → A) → (z = y → A) → A

it suffices to derive A(z) from z ≺ y as well as from z = y. If z ≺ y, then A(z) follows from (55), and if z = y, then A(z) follows from (55) and the progressiveness of A(x).

Case Sx. From z ≺ y ⊕ ω^{Sx} we obtain z ≺ y ⊕ ω^{e(x,y,z)} m(x, y, z) by (52) and e(x, y, z) ≺ Sx by (53). By (54) we have A⁺(e(x, y, z)), which, instantiated with y ⊕ ω^{e(x,y,z)} v in place of y, gives

∀_{u ≺ y⊕ω^{e(x,y,z)} v} A(u) → ∀_{u ≺ (y⊕ω^{e(x,y,z)} v)⊕ω^{e(x,y,z)}} A(u)

and hence, using (48) and (51)

∀_{u ≺ y⊕ω^{e(x,y,z)} v} A(u) → ∀_{u ≺ y⊕ω^{e(x,y,z)} (Sv)} A(u).

Also from (55) and (50), (47) we obtain

∀_{u ≺ y⊕ω^{e(x,y,z)} 0} A(u).

By induction on v:

∀_{u ≺ y⊕ω^{e(x,y,z)} m(x,y,z)} A(u)

and hence A(z).

Next we show, by induction on n, how to derive

∀_x(∀_{y≺x} A(y) → A(x)) → ∀_{x≺ω_n} A(x)

for arbitrary A(x). Assume the left hand side, i.e., that A(x) is progressive.

Case 0. Then x ≺ ω^0 and hence x ≺ 0 ⊕ ω^0 by (49). By (46) it suffices to derive A(x) from x ≺ 0 as well as from x = 0. Now x ≺ 0 → A(x) holds by (45), and A(0) then follows from the progressiveness of A(x).

Case n + 1. Since A(x) is progressive, by the above also A⁺(x) is. By IH: ∀_{x≺ω_n} A⁺(x), hence A⁺(ω_n) since A⁺(x) is progressive. By definition of A⁺(x) (with (45): x ≺ 0 → A and (49): 0 ⊕ x = x) we obtain ∀_{z≺ω^{ω_n}} A(z).


Remark. In the induction step we derived transfinite induction up to ω_{n+1} for A(x) from transfinite induction up to ω_n for A⁺(x). Define the level of a formula by

lev(R t⃗) := 0,
lev(A → B) := max(lev(A) + 1, lev(B)),
lev(∀_x A) := max(1, lev(A)).

Then lev(A⁺(x)) = lev(A(x)) + 1. Hence to prove transfinite induction up to ω_n, the induction scheme in Z is used for formulas of level n.
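The level computation can be replayed mechanically. The sketch below is an illustration under two assumptions that are not stated in the text: formulas are stored as nested Python tuples, and a bounded quantifier ∀_{z≺y} A(z) is unfolded as ∀_z(z ≺ y → A(z)). All names (`atom`, `imp`, `forall`, `lev`, `plus`) are hypothetical, and terms and substitution are not modelled since they do not affect the level.

```python
def atom(name):
    return ("atom", name)


def imp(a, b):
    return ("imp", a, b)


def forall(var, a):
    return ("all", var, a)


def lev(f):
    """Level of a formula, following the three defining clauses."""
    tag = f[0]
    if tag == "atom":
        return 0
    if tag == "imp":
        return max(lev(f[1]) + 1, lev(f[2]))
    return max(1, lev(f[2]))        # universal quantifier


def bounded_forall(var, bound, a):
    # assumed unfolding: ∀_{var ≺ bound} a  is  ∀var (var ≺ bound → a)
    return forall(var, imp(atom(f"{var} ≺ {bound}"), a))


def plus(a):
    # A⁺(x) := ∀y (∀_{z≺y} A(z) → ∀_{z≺y⊕ω^x} A(z))
    return forall("y", imp(bounded_forall("z", "y", a),
                           bounded_forall("z", "y ⊕ ω^x", a)))


A1 = forall("u", atom("P u"))       # a formula of level 1
A2 = imp(A1, atom("P 0"))           # a formula of level 2
levels = [lev(A1), lev(plus(A1)), lev(A2), lev(plus(A2))]
```

With this unfolding, `levels` is `[1, 2, 2, 3]`, so the step from A to A⁺ raises the level by one for these examples. For an atomic A the unfolded bounded quantifier already has level 1, so the sketch only confirms the identity for formulas of level at least 1.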

4.3. Iteration operators of higher types

We have just seen that the strength of the induction scheme increases with the level of the formula proved by induction. A similar phenomenon occurs when one considers types instead of formulas, and iteration (a special case of recursion) instead of induction. Such operators have a similar relation to ordinals < ε₀.

4.3.1. The extended Grzegorczyk hierarchy. We define the fast growing functions (F_α)_{α<ε₀} by

F_0(x) := 2^x,
F_{α+1}(x) := F_α^{(x)}(x)   (F_α^{(x)} the x-th iterate of F_α),
F_λ(x) := F_{λ[x]}(x).

Here the fundamental sequence λ[x] for a limit number λ < ε₀ and x ∈ N is defined in a natural way:

(a) Any such limit number can be written uniquely in the form λ = ω^{α_n} + ... + ω^{α_0} with λ > α_n ≥ ... ≥ α_0 > 0.

(b) Then let

λ[x] := ω^{α_n} + ... + ω^{α_1} + ω^{α_0 − 1}·x   if α_0 is a successor,
λ[x] := ω^{α_n} + ... + ω^{α_1} + ω^{α_0[x]}   if α_0 is a limit.

The extended Grzegorczyk hierarchy is defined as follows. Let E_α be the elementary closure of F_α, i.e., the least class of functions containing F_α and the initial functions

U_n^i := λ_{x_1,...,x_n} x_i (for 1 ≤ i ≤ n),
C_n^i := λ_{x_1,...,x_n} i (for n ≥ 0, i ≥ 0),
λ_{x,y}(x + y),
λ_{x,y}(x ∸ y),

which is closed under (simultaneous) substitution and bounded sums and products. The initial part for ordinals < ω was (essentially) defined and studied by Grzegorczyk (1953).

There are many characterizations of E_α, for instance

E_α consists of all register machine computable functions with time (i.e. number of computation steps) bounded by a finite iteration of the function F_α.

Let

E_{ε₀} := ⋃_{α<ε₀} E_α   (“ε₀-recursive functions”).

Theorem 4.3.1 (Ackermann (1940); Kreisel (1952)). E_ε₀ consists of the functions “provably recursive” in arithmetic.

The proof of this theorem is beyond the scope of these notes. In fact, one can define natural subsystems of arithmetic whose provably recursive functions are exactly the ones in E_α.

4.3.2. Characterization of (F_α)_{α<ε₀} by higher type iteration. We extend the definition of the functions F_α into higher types. It is convenient here to introduce integer types ρ_n:

ρ_0 := N, ρ_{n+1} := ρ_n → ρ_n.

If x_0, ..., x_{n+1} are of integer types ρ_0, ..., ρ_{n+1}, then we can form x_{n+1}(x_n) (of type ρ_n) and so on, finally x_{n+1}(x_n)...(x_0), or shortly x_{n+1}(x_n, ..., x_0).

Note that lev(ρ_n) = n.

We define F_α^{n+1} of type ρ_{n+1} for α < ε₀:

F_0^{n+1}(x_n, ..., x_0) := 2^{x_0} if n = 0, and x_n^{(x_0)}(x_{n−1}, ..., x_0) otherwise,
F_{α+1}^{n+1}(x_n, ..., x_0) := (F_α^{n+1})^{(x_0)}(x_n, ..., x_0),
F_λ^{n+1}(x_n, ..., x_0) := F_{λ[x_0]}^{n+1}(x_n, ..., x_0).

Here x_n^{(y)}(x_{n−1}, ..., x_0) denotes I(y, x_n, ..., x_0) with an iteration functional I of type N → ρ_n → ρ_{n−1} → ... → ρ_0 → ρ_0 defined by

I(0, y, z) :=z,

I(x+ 1, y, z) :=y(I(x, y, z)).
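The iteration functional translates directly into a loop. The sketch below is an illustration only: Python functions stand in for objects of the integer types, higher-type arguments are passed in curried form, and the names `iterate`, `F0_1`, `F0_2` and `F0_3` are hypothetical.

```python
def iterate(x, y, z):
    """I(x, y, z): apply y to z exactly x times."""
    result = z
    for _ in range(x):
        result = y(result)
    return result


def F0_1(x0):
    """F_0^1 of type ρ_1: the function 2^x."""
    return 2 ** x0


def F0_2(x1):
    """F_0^2 of type ρ_2 (curried): F_0^2(x1)(x0) = x1^(x0)(x0)."""
    return lambda x0: iterate(x0, x1, x0)


def F0_3(x2):
    """F_0^3 of type ρ_3 (curried): F_0^3(x2)(x1)(x0) = x2^(x0)(x1)(x0)."""
    return lambda x1: lambda x0: iterate(x0, x2, x1)(x0)


# Iterating 2^x twice on the argument 2 gives 2^(2^2) = 16.
value = F0_2(F0_1)(2)
```

In `F0_3` the iteration acts on a function of type ρ_1 rather than on a number, which is the only place where the higher type is used; the same loop serves at every type.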

Lemma 4.3.2.

F_α^{n+1}(F_β^n) = F_{β+ω^α}^n

provided β + ω^α = β # ω^α, i.e., in the Cantor normal form of β the last summand ω^{β_0} (if it exists) has an exponent β_0 ≥ α.

Proof. By induction on α. Case α = 0.

F_0^{n+1}(F_β^n, x_{n−1}, ..., x_0) = (F_β^n)^{(x_0)}(x_{n−1}, ..., x_0)
  = F_{β+1}^n(x_{n−1}, ..., x_0).

Case α successor.

F_α^{n+1}(F_β^n, x_{n−1}, ..., x_0) = (F_{α−1}^{n+1})^{(x_0)}(F_β^n, x_{n−1}, ..., x_0)
  = F_{β+ω^{α−1}·x_0}^n(x_{n−1}, ..., x_0)   by IH
  = F_{(β+ω^α)[x_0]}^n(x_{n−1}, ..., x_0)   by definition of the fundamental sequence
  = F_{β+ω^α}^n(x_{n−1}, ..., x_0)   by definition of F_λ^n.

Case α limit.

F_α^{n+1}(F_β^n, x_{n−1}, ..., x_0) = F_{α[x_0]}^{n+1}(F_β^n, x_{n−1}, ..., x_0)
  = F_{β+ω^{α[x_0]}}^n(x_{n−1}, ..., x_0)   by IH
  = F_{(β+ω^α)[x_0]}^n(x_{n−1}, ..., x_0)
  = F_{β+ω^α}^n(x_{n−1}, ..., x_0).

The result just proved indicates the computational complexity involved in the use of finite types. (F_α^{n+1})_{α<ε₀} and in particular (F_α^1)_{α<ε₀} can be built from iteration functionals (and F_0(x) = 2^x) by application alone. In the resulting representation of the functions (F_α)_{α<ε₀} we do not need the fundamental sequences λ[x]. The application pattern for F_α corresponds to the Cantor normal form of α.