
Substitutional Pre-Theories

In the document On the metatheory of linguistics (pages 69-74)

4.3. SUBSTITUTIONAL PRE-THEORIES 69

generally assumed to have underlying structures, and these structures often seem to be a grammar-theoretic pendant of the language-theoretic notion of substitution (but, as we will see and most people know, this is true only under some additional assumptions). We will stick with this, as it is our goal to find a formal foundation for linguistics that linguists would judge to be adequate. But it is important to keep in mind that many things which at the level of linguistics seem to have the status of observations have, at the level of metalinguistics, the status of mere assumptions; and so does the presumably structural nature of natural languages. So from a metalinguistic point of view, this choice is by no means without alternatives, and later on we will in fact consider very interesting alternatives (or rather: extensions).

In the treatment of pre-theories, whether based on strings or not, it is much easier for presentation to start from a given alphabet; therefore, we will adopt this convention. Note, however, that pre-theories are defined independently of alphabets. So take a (finite) alphabet Σ and a language L ⊆ Σ*. We now introduce two relations over Σ* × Σ*, which will be fundamental for what follows.

1. Write ~v ⊑ ~w if and only if for some ~x, ~y ∈ Σ*, ~x~v~y = ~w;
2. ~w ≤⁰_L ~v if and only if for all ~x, ~y ∈ Σ*, ~x~v~y ∈ L ⇒ ~x~w~y ∈ L.

The first one is reflexive, transitive and anti-symmetric, as can easily be checked. It is quite self-explanatory, and will be referred to as the (contingent) substring relation. To illustrate the second one: for I := {ab, a}, we have a ≤_I ab, but not ab ≤_I a. The second relation is a pre-order, that is, it is reflexive and transitive, but not anti-symmetric. Reflexivity is obvious; transitivity follows from the transitivity of the logical implication by which it is defined; it is, however, anti-symmetric modulo the distributional equivalence ∼_L. Given a language L, we call a string ~w trivial in L if there is no ~v ∈ L such that ~w ⊑ ~v; triviality means the string has no occurrence in any word of L. Denote the set of substrings of ~w by fact(~w); we extend this notion to sets in the natural way and write fact[L]. ~w is then trivial in L if and only if ~w ∉ fact[L].
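The two relations are decidable for finite languages and can be made concrete in a short Python sketch; the helper names (`is_substring`, `contexts`, `leq0`) are my own, not from the text, and the example language is the I = {ab, a} used above:

```python
def is_substring(v, w):
    # ~v ⊑ ~w  iff  ~w = ~x~v~y for some ~x, ~y in Sigma*
    return v in w

def contexts(v, language):
    # all pairs (~x, ~y) such that ~x~v~y is a word of the language
    return {(w[:i], w[i + len(v):])
            for w in language
            for i in range(len(w) - len(v) + 1)
            if w[i:i + len(v)] == v}

def leq0(w, v, language):
    # ~w <=0_L ~v  iff  every context accepting ~v also accepts ~w
    return all(x + w + y in language for (x, y) in contexts(v, language))

I = {"ab", "a"}
print(is_substring("a", "ab"))  # True:  a ⊑ ab
print(leq0("a", "ab", I))       # True:  a <=0_I ab
print(leq0("ab", "a", I))       # False: abb would have to be in I
```

Note that a string trivial in I has no contexts at all, which is exactly why `leq0` accepts it vacuously as a right-hand argument.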

As ≤⁰_L is defined by an implication, any trivial string on its right-hand side makes the implication vacuously true, so every string is ≤⁰_L-below every trivial string. To avoid complications arising from this, we define ≤_L as the restriction of ≤⁰_L to strings which are non-trivial in L, that is, ≤_L = ≤⁰_L ∩ (fact[L] × fact[L]). Now to continue the example: if we have non-trivial strings ~w, ~v with ~w ≠ ~v, ~w ⊑ ~v and ~v ≤_L ~w, then we can deduce that L is necessarily infinite, as can easily be shown by iterated substitution. Showing that a language is necessarily infinite is a type of argument which we will encounter quite a few times in what follows. To make these arguments work, we always need the restriction to non-trivial strings, so this is another good reason for excluding trivial strings.
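The iterated substitution behind this infinity argument can be sketched as follows. Since ~w ⊑ ~v, write ~v = ~p~w~q; since ~w is non-trivial, some ~x~w~y ∈ L; then ~v ≤_L ~w forces ~x~p~w~q~y ∈ L, and iterating forces ~x~p^k~w~q^k~y ∈ L for every k. The Python sketch below (the function name and the concrete strings are mine, purely for illustration) lists the forced members:

```python
def forced_members(x, y, w, p, q, n):
    # the first n strings forced into L: each arises from the previous one
    # by substituting ~v = p + w + q for the displayed occurrence of ~w
    return [x + p * k + w + q * k + y for k in range(n)]

# hypothetical instance: ~w = a, ~v = ba, one occurrence of ~w in context (c, d)
print(forced_members("c", "d", "a", "b", "", 4))
# ['cad', 'cbad', 'cbbad', 'cbbbad']
```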

Now, given a finite language I and non-empty strings ~w, ~v, we put ~w ≈^{P1,0}_I ~v iff

1. ~w ⊑ ~v, and

2. ~w ≤_I ~v;

≈^{P1}_I is the symmetric closure of ≈^{P1,0}_I.
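For a finite language this similarity relation can be computed by brute force. The following Python sketch (helper names are mine; pairs with ~w = ~v are omitted, since the inference rules below require distinctness anyway) computes the set of similar pairs, which is exactly the map P1(I) defined next, for the language I = {ab, a} used in the example further below:

```python
def substrings(language):
    # fact[L]: all non-empty substrings of words of the language
    return {w[i:j] for w in language
            for i in range(len(w)) for j in range(i + 1, len(w) + 1)}

def contexts(v, language):
    return {(w[:i], w[i + len(v):]) for w in language
            for i in range(len(w) - len(v) + 1) if w[i:i + len(v)] == v}

def P1(I):
    facts = substrings(I)  # non-trivial, non-empty strings
    base = {(w, v) for w in facts for v in facts
            if w != v and w in v  # ~w ⊑ ~v
            and all(x + w + y in I for (x, y) in contexts(v, I))}  # ~w <=_I ~v
    return base | {(v, w) for (w, v) in base}  # symmetric closure

print(sorted(P1({"ab", "a"})))  # [('a', 'ab'), ('ab', 'a')]
```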

This defines our first, simple analogical map P1, by putting P1(I) = {(~w, ~v) : ~w ≈^{P1}_I ~v}, provided I is finite. We define f1 as a set of inference rules over the relational language defined by the structure ⟨Σ*, ≠⟩, so that all statements of the relational language have the form ~x ≠ ~y. There are two rule schemata, which, to enhance readability, we immediately write in the form of trees:

    ⊢ ~w~y~v ∈ f1^P(I)     ~y ⇐^P_I ~x
    ──────────────────────────────────   (4.7)
           ⊢ ~w~x~v ∈ f1^P(I)

       ~x ≈^P_I ~y     ~x ≠ ~y
    ──────────────────────────────────   (4.8)
              ~x ⇐^P_I ~y

Note that the P in the trees is a variable for analogical maps: the rules can be used with arbitrary maps (as with arbitrary languages); the rules just make sure the identities are preserved. The same holds for I and the string symbols; so these two schemata actually serve as a shorthand for infinitely many rule instances. We will have a short look at an example to see this pre-theory at work:

Example 14 Take I := {ab, a}. Clearly, we have P1(I) = {(a, ab), (ab, a)}. Therefore, we have f1^{P1}(I) = a(b)*. To show how our calculus works, we will show this result in some detail. For example, consider the following derivation:

    1. ⊢ ab ∈ f1^{P1}(I)     (since ab ∈ I)
    2. a ⇐^{P1}_I ab          (by (4.8): a ≈^{P1}_I ab and a ≠ ab)
    3. ⊢ abb ∈ f1^{P1}(I)     (by (4.7) from 1. and 2.)
    4. ⊢ abbb ∈ f1^{P1}(I)    (by (4.7) from 3. and 2.)     (4.9)

By this example it is easy to see that a(b)* ⊆ f1^{P1}(I); to see the converse inclusion, consider that we cannot derive anything outside a(b)* by means of the analogy a ⇐^{P1}_I ab; and moreover, we cannot derive anything new by the inverse analogy ab ⇐^{P1}_I a either: all strings derivable with it are already in the language. This can easily be seen from the following example:

    1. ⊢ ab ∈ f1^{P1}(I)     (since ab ∈ I)
    2. a ⇐^{P1}_I ab          (by (4.8): a ≈^{P1}_I ab and a ≠ ab)
    3. ⊢ abb ∈ f1^{P1}(I)     (by (4.7) from 1. and 2.)
    4. ⊢ abbb ∈ f1^{P1}(I)    (by (4.7) from 3. and 2.)
    5. ab ⇐^{P1}_I a          (by (4.8): ab ≈^{P1}_I a and ab ≠ a)
    6. ⊢ abb ∈ f1^{P1}(I)     (by (4.7) from 4. and 5.)     (4.10)

This pre-theory, though very elementary, has considerable complexity, which we will now show, partly for pedagogical reasons, so to speak.
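The whole calculus can be simulated for small cases by a bounded closure computation. The Python sketch below (my own naming; the length bound is needed because f1^{P1}(I) is infinite) repeatedly applies rule (4.7) with the analogies from P1(I) and recovers a(b)* up to length 6:

```python
def close(I, analogies, max_len=6):
    # rule (4.7): if ~y ⇐ ~x is an analogy, any occurrence of ~y in a
    # derived string may be replaced by ~x; iterate to a fixed point
    derived = set(I)
    changed = True
    while changed:
        changed = False
        for s in list(derived):
            for (y, x) in analogies:
                i = s.find(y)
                while i != -1:
                    t = s[:i] + x + s[i + len(y):]
                    if len(t) <= max_len and t not in derived:
                        derived.add(t)
                        changed = True
                    i = s.find(y, i + 1)
    return derived

I = {"ab", "a"}
analogies = {("a", "ab"), ("ab", "a")}  # P1(I), read as pairs ~y ⇐ ~x
print(sorted(close(I, analogies)))
# ['a', 'ab', 'abb', 'abbb', 'abbbb', 'abbbbb']
```

For this I the bounded closure is harmless; the theorem below shows that in general no such procedure can decide membership in f1^{P1}(I).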

Say a pre-theory (f, P) is undecidable if the following problem is undecidable: given a word ~w and a finite language I, does ~w ∈ f^P(I) hold? If a pre-theory is undecidable, then its adequacy will be undecidable in general, so we would like our pre-theories to be decidable in any case. The following theorem might be surprising at first glance, but it is not surprising any more once we consider that in f1, our string substitutions are completely unrestricted:

Theorem 15 The simple pre-theory (f1, P1) is undecidable.

We show this by reduction from the word problem for semigroups, which is well known to be undecidable (see for example Kleene's classical [34]). We would get an almost immediate proof via Thue systems if we did not have the additional requirement of the substring relation ⊑ in order to allow for substitution; so there is some work to do.

A semigroup is a structure (M, ·), where M is a set and · is an associative binary operation under which M is closed. As · is associative, we omit brackets and write m1m2...mi as shorthand for m1 · m2 · ... · mi. A semigroup is free if for every m ∈ M there is a unique term denoting it; that is, all equalities are trivial. Every free semigroup has a unique smallest set of generators: if M has a neutral element 1, it is (M − {1}) − (M − {1})², otherwise it is just M − M². We denote the generator set of M by gen(M), so that we have (gen(M))⁺ = M, where (gen(M))⁺ is the closure of the generators under the operation.

A non-free semigroup (M, ·) has a presentation ((Σ⁺, ·), ES) by the free semigroup (Σ⁺, ·) over Σ and a set of equations ES of the form ~w = ~v, for ~w, ~v ∈ Σ⁺. We obtain =S, the set of equalities holding on terms over M in (Σ⁺, ES), as the smallest congruence over Σ⁺ containing all equations in ES. So (Σ⁺, ES) presents (M, ·) if (M, ·) ≅ [Σ⁺]_{=S}, the free semigroup modulo the congruence. A semigroup is finitely presented if both Σ and ES are finite. Now the word problem for (finitely presented) semigroups is as follows:

Given a (finite) presentation (Σ⁺, ES) and ~w, ~v ∈ Σ⁺, does ~w =S ~v hold?
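The word problem is only semi-decidable in general: one can enumerate the congruence class of ~w, but there is no bound at which the search may soundly stop. A naive bounded exploration, sketched in Python below (the toy presentation and all names are mine), shows what such a procedure looks like and why a length bound cannot make it complete:

```python
def reachable(w, ES, max_len=8):
    # bounded exploration of the congruence class of w: apply each
    # equation of ES in both directions, anywhere in the string
    rules = set(ES) | {(y, x) for (x, y) in ES}
    seen, frontier = {w}, [w]
    while frontier:
        s = frontier.pop()
        for (x, y) in rules:
            i = s.find(x)
            while i != -1:
                t = s[:i] + y + s[i + len(x):]
                if len(t) <= max_len and t not in seen:
                    seen.add(t)
                    frontier.append(t)
                i = s.find(x, i + 1)
    return seen

ES = {("ab", "ba")}  # a toy presentation in which a and b commute
print("ba" in reachable("ab", ES))  # True:  ab =S ba
print("bb" in reachable("ab", ES))  # False, at least within the bound
```

A negative answer within the bound proves nothing in general, since a derivation may pass through arbitrarily long intermediate strings; this is precisely the source of undecidability.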

As we said, this problem is undecidable in general, and it remains undecidable for finitely presented semigroups. We will now reduce this problem to the decision problem for (f1, P1). So assume we have a finitely presented semigroup (Σ⁺, ES), and we want to decide whether the equation ~w = ~v is valid in (Σ⁺, ES).

We show that for every finite presentation (Σ⁺, ES) and all ~w, ~v ∈ Σ⁺, we can construct a finite language I such that ~v ∈ f1^{P1}(I) if and only if ~w =S ~v. The core of the proof consists in the construction of an appropriate language I(~w, ~v).

So take an equation ~w =S ~v, the validity of which we want to decide. Recall that ~w, ~v will in the sequel always be used in this given sense. Before we construct the language I(~w, ~v), we have to construct its alphabet. Assume ~x = ~y ∈ ES, and ~x ⋢ ~y, ~y ⋢ ~x. We then take three letters a_{yx}, b_{yx}, c_{yx}, which are unique for any ~x, ~y, and which are not in Σ. Now assume ~x = ~y ∈ ES, and neither ~x nor ~y is a substring of ~w. Then in addition take a letter d_{xy}, which is also unique (regardless of whether ~x ⊑ ~y or not). Furthermore, for any string ~x ∈ fact[{~z : (~z = ~y) ∈ ES or (~y = ~z) ∈ ES or ~z = ~w}], we take a unique letter e_{xx}. Now we define I(~w, ~v) as the smallest language such that:

1. ~w ∈ I(~w, ~v);

2. if ~x ∈ fact[{~z : ~z = ~y ∈ ES or ~y = ~z ∈ ES or ~z = ~w}], then e_{xx} ~x e_{xx} ∈ I(~w, ~v);

3. if ~w = ~w1 ~x ~w2, ~x = ~y ∈ ES (or ~y = ~x ∈ ES), ~x ⊑ ~y (or ~y ⊑ ~x), then ~w1 ~y ~w2 ∈ I(~w, ~v);

4. if ~w = ~w1 ~x ~w2, ~x = ~y ∈ ES (or ~y = ~x ∈ ES), ~x ⋢ ~y and ~y ⋢ ~x, then ~w1 a_{yx} ~x b_{yx} ~y c_{yx} ~w2 ∈ I(~w, ~v), and ~w1 ~y ~w2 ∈ I(~w, ~v);

5. if ~x ∉ fact(~w), ~x = ~y ∈ ES (or ~y = ~x ∈ ES), ~x ⊑ ~y (or ~y ⊑ ~x), then d_{xy} ~x d_{xy}, d_{xy} ~y d_{xy} ∈ I(~w, ~v);

6. if ~x ∉ fact(~w), ~x = ~y ∈ ES (or ~y = ~x ∈ ES), and ~x ⋢ ~y and ~y ⋢ ~x, then d_{xy} ~x d_{xy}, d_{xy} ~y d_{xy}, d_{xy} a_{yx} ~x b_{yx} ~y c_{yx} d_{xy} ∈ I(~w, ~v).

This defines the language I(~w, ~v). We can easily check that I(~w, ~v) is finite, because each condition only adds finitely many strings; in particular, each condition has the form of an implication whose premise is not affected by whether the other conditions are satisfied or not.

The first important point concerns P1(I(~w, ~v)).

Lemma 16 We have (~s, ~t) ∈ P1(I(~w, ~v)) if and only if either

1. ~s = ~t ∈ ES (or ~t = ~s ∈ ES), and ~s ⊑ ~t (or vice versa), or

2. ~t = a_{yx} ~s b_{yx} ~u c_{yx} (or inversely), where ~s = ~u ∈ ES, or ~t = a_{yx} ~u b_{yx} ~s c_{yx}, where ~s = ~u ∈ ES, and ~s ⋢ ~u, ~u ⋢ ~s.

Proof. The if-direction is clear by the definition of I(~w, ~v). We show the only-if-direction. So suppose we have some (~s, ~t) ∈ P1(I(~w, ~v)) not satisfying the above conditions. By assumption, we must have either ~s ⊑ ~t or ~t ⊑ ~s; assume w.l.o.g. that ~s ⊑ ~t. But by assumption, ~t does not have the form in 2., and by assumption, ~s = ~t, ~t = ~s ∉ ES. So there are strings e_{ss} ~s e_{ss} and e_{tt} ~t e_{tt}, in which each of the two has a unique, distinct context; so we have ~s ≰_{I(~w,~v)} ~t and ~t ≰_{I(~w,~v)} ~s; contradiction.

From this it easily follows that:

Lemma 17 For ~w, ~v ∈ Σ⁺, if ~w =S ~v, then ~v ∈ f1^{P1}(I(~w, ~v)).

Proof. Obvious, because each substitution corresponding to one equation in ES can be simulated by at most two analogies.

We now have to show the other direction: for ~v ∈ Σ⁺, if ~v ∈ f1^{P1}(I(~w, ~v)), then ~w =S ~v. To see this, we first make sure of the following: if ~v ∈ f1^{P1}(I(~w, ~v)), then ~v ∈ f1^{P1(I(~w,~v))}(~w), that is, ~v is derivable from ~w as the only axiom. For all other strings in I(~w, ~v) are either themselves derivable from ~w by means of the analogies, or they do not allow us to derive ~v, because they contain the letters d_{xy}, e_{xx}, of which there is no way to get rid by means of any analogy, and which by assumption do not occur in ~v. So the statement we have to prove can, without loss of generality, be weakened to: if ~v ∈ f1^{P1(I(~w,~v))}(~w), then ~w =S ~v. So we prove:

Lemma 18 For any ~v ∈ Σ⁺, if ~v ∈ f1^{P1(I(~w,~v))}(~w), then ~v =S ~w.

Proof. This is clear for all analogies (~s, ~t) where ~s = ~t ∈ ES and moreover ~s ⊑ ~t (or vice versa). For all analogies of the other kind, which introduce symbols not in Σ, the argument is the following: each of these analogies introduces a substring a_{yx} ~x b_{yx} ~y c_{yx}. As a_{yx}, b_{yx}, c_{yx} are unique, we can only get rid of them by the two analogies which introduce them. This means in particular that we can only substitute this substring by ~x or by ~y, where ~x = ~y ∈ ES; there is no other way to get it out of a string. So we have to use two analogies, one to introduce it and one to get rid of it, and these exactly correspond to one equation in ES.

This completes the proof of the above theorem: we have ~w =S ~v if and only if, for ~v ∈ Σ⁺, ~v ∈ f1^{P1}(I(~w, ~v)). So if the latter were decidable, so would be the former; contradiction.

So we see that already for this very simple pre-theory, which uses nothing but substitution in contexts, we get an undecidability result. This means that the adequacy of (f1, P1) is undecidable. The main reason for this negative result is obvious: there is no notion of structure in the pre-theory; substrings which result from substitutions are neither marked nor recognized as such. This is not only the main reason for undecidability, it also goes strongly against our intuitions on the nature of language. The other main reason for undecidability is that analogies do not only make strings longer, but also shorter, as the symmetry of P1-similarity is transferred to the analogies. We will therefore now introduce new inference rules that do away with both problems, and to which we will refer as structural inference.
