

In the document On the metatheory of linguistics (pages 88-92)

4.6 Properties of Pre-Theories II

4.6.2 Upward Normality

CHAPTER 4. THE CLASSICAL METATHEORY OF LANGUAGE

This shows us what a downward normal pre-theory has to look like. The crucial properties we need to obtain this result are, firstly, monotonicity; secondly, the fact that analogies are restricted to contexts; and thirdly, another peculiarity about analogies: the analogies introducing certain brackets actually presuppose that exactly the strings which are being inferred are in the language in question.

I do not think that (g2, P2) is uninteresting as such; however, I think it is preferable not to have these properties, that is, it is more interesting to compute analogies globally, regardless of the context in which they occur. But of course this is also much more challenging, as we cannot easily make statements about gPr(I) without observing I as a whole. This is the main reason that we focus on (g, Pr) and (g, P1) in the sequel.

Case 2 and case 3, where we add a string derivable by the analogy (xy, xxyy) or (yz, yyzz), are completely parallel.

Consequently, there is no finite language J ⊋ I such that fPr(I) = fPr(J).

So for “restrictive” pre-theories such as Pr, weak upward normality is in fact a problem. We will now introduce an even stronger requirement, that of strong upward normality, or simply upward normality.

Definition 38 A pre-theory P is upward normal (in the strong sense) if for every infinite language L such that fP(I) = L for some finite I, the following holds: for every finite J ⊆ L, there is a finite J′ ⊇ J such that fP(J′) = L.

Obviously, strong upward normality entails weak upward normality, but it is much stronger: whereas for weak upward normality there only needs to be some arbitrarily large extension inducing the same language, for strong upward normality we must be able to extend in “every direction”, so to speak. Upward normal in this strong sense means: we cannot have too much data regarding a language; whatever fragment of L we observe, there is a larger fragment which convinces us that we are observing fragments of L. No finite fragment of L is convincing evidence against L, in the sense that it excludes it as a candidate “language”. This is obviously motivated by the following fact: given a presumable “language” L, we reasonably assume a priori that we can observe arbitrary and arbitrarily large fragments thereof (though this is not necessarily the case: compare our discussion of o-language and “language”). Now the fact that we can observe arbitrary fragments of “language” should make us exclude languages which are excluded by fragments we can observe, because this is a contradiction, and it seems to fundamentally contradict our sense of scientific positivism.

There are some points to note. Firstly, strong upward normality does not say anything about convergence, or about the idea that the larger the fragment of L we observe, the more “plausible” L becomes. On the contrary, it might happen that a fragment leads us onto the “wrong track” in the following sense: given a sequence of finite fragments Ii such that for all i ∈ N, Ii ⊆ L, i ≤ j ⇒ Ii ⊆ Ij, and |Ii| = i, with L a language induced by (f, P), it might be that for each Ii, the smallest Ij ⊇ Ii such that fP(Ij) = L has size at least k^i. That is, we need exponentially many strings to lead us back onto the right track. But this is not the kind of question or problem we are interested in at this point.

Corollary 39 (g, Pr) is not upward normal (in the strong sense).

This is because upward normality obviously entails weak upward normality, but not vice versa, as we have seen. Consequently, upward normality is also highly non-trivial, and we find plenty of counterexamples. Upward normality even fails to hold for much less restrictive pre-theories such as P1:

Lemma 40 (g, P1) is not (strongly) upward normal.

Proof. Take the finite language I := {ab, xxaxxb, ayybyy, xy, xxyy}. We have

P1(I) = {(a, xxaxx), (b, yybyy), (xy, xxyy)}.

If we however put I′ = I ∪ {xxaxxyybyy} ⊆ gP1(I) =: L, then the analogy fails: xy and xxyy are no longer related by P1 over I′, because xy ≰I′ xxyy. We cannot restore this analogy by adding the string xxaxybyy, as it is not in L. On the other hand, we can easily enrich the language so as to allow for additional analogies which work to the same effect, allowing us to derive {x^n y^n : n ∈ N}. For example, we can define I″ = I′ ∪ {xxxyyy}.

Now we have (xxyy, xxxyyy) ∈ P1(I″), and this is fine; but the problem is now the following: we can apply this analogy also to the string xxaxxyybyy, thereby deriving {xxax^n xyy^n byy : n ∈ N}, which contains strings not in L. So the problem is that our resulting language is too big rather than too small.

Is there a way out of this problem? The answer can be shown to be negative.

Assume we have a finite language J with I′ ⊆ J ⊆ gP1(I). In order to make sure that {x^n y^n : n ∈ N} ⊆ gP1(J), we need some strings of the form x^n y^n ∈ J; assume that x^k y^k ∈ J is the longest of these strings. Then there are two possibilities:

Case 1: we do not have x̄ax̄ȳbȳ ∈ J such that x^k y^k ⊑ x̄ȳ. In this case, (xy, x^k y^k) ∈ P1(J), and so we can derive strings from xxaxxyybyy which are not in L, as above.

Case 2: we do have x̄ax̄ȳbȳ ∈ J such that x^k y^k ⊑ x̄ȳ. In this case, whatever analogies we have that allow us to derive the strings {x^n y^n : n ∈ N}, they are also applicable to x̄ax̄ȳbȳ, thereby again deriving strings not in L.

So upward normality is quite problematic for pre-theories, though it is a most natural requirement. A reason for this is that so far we have used the relation ≤L. But this relation, as we have seen, is not preserved under projection, and in the worst case it might even become undecidable under projection. This matters for upward normality as well: assume that L = fP(I) for some P based on ≤I. The question whether for every finite J ⊆ L there is a finite J′ ⊇ J such that w̄ ≤J′ v̄ might turn out to be undecidable, just as it might be undecidable for L itself.

Lemma 41 Given a CFL L, the question whether for every finite fragment I ⊆ L there is a (finite) J with I ⊆ J ⊆ L such that w̄ ≤J v̄ is undecidable.

Proof. If for some finite I ⊆ L there is no finite J ⊇ I such that w̄ ≤J v̄, then w̄ ≰L v̄, for the following reason: assume w̄ ≤L v̄. Then for every x̄w̄ȳ ∈ I, there is x̄v̄ȳ ∈ L. Take the set of these strings, which is finite. This contradicts the assumption.

Conversely, assume that for every finite I there is a finite J with I ⊆ J ⊆ L such that w̄ ≤J v̄. Then we have w̄ ≤L v̄. For assume we do not have w̄ ≤L v̄. Then there is a context such that x̄w̄ȳ ∈ L and x̄v̄ȳ ∉ L. So for the finite set I = {x̄w̄ȳ}, there is no finite J with I ⊆ J ⊆ L such that w̄ ≤J v̄: otherwise we would have x̄v̄ȳ ∈ J and so J ⊄ L. Contradiction.

Thereby we see that, for a CFL L, we have w̄ ≤L v̄ exactly if for every finite set I ⊆ L there is a finite J with I ⊆ J ⊆ L such that w̄ ≤J v̄. This in turn means the decision problems are equivalent, and as the problem “for L a CFL, is w̄ ≤L v̄?” is undecidable, so is our current problem.
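On finite languages, by contrast, the relation ≤J is trivially computable, since only finitely many contexts occur. The following Python sketch (the function name leq and the toy languages are our own illustrations, not taken from the text) checks w̄ ≤J v̄ by enumerating every occurrence of w̄ as an infix of a string in J:

```python
def leq(w, v, lang):
    """Check w <=_J v on a finite language J (a set of strings):
    for every context (x, y) with x + w + y in J, x + v + y must be in J."""
    for s in lang:
        start = s.find(w)
        while start != -1:  # enumerate every occurrence of w in s
            x, y = s[:start], s[start + len(w):]
            if x + v + y not in lang:
                return False
            start = s.find(w, start + 1)
    return True

print(leq("a", "b", {"a", "b", "xa", "xb"}))  # True: every a-context has a b-counterpart
print(leq("a", "b", {"a", "xa", "xb"}))       # False: the context ("", "") has no b-counterpart
```

Note that this checks only one direction; a symmetric analogy relation would require the check in both directions.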

On the other hand, there are few meaningful alternatives to ≤L in substitutional pre-theories. So if we want upward normality, there seems to be only one reasonable way to go: when considering a finite language J, we first reduce it to some relevant fragment thereof, and only afterwards project it.

As some simplistic solutions for characteristicity (such as injective pre-theories) have shown us the right way to go, we might want to look at simplistic solutions for upward normality. A first upward normal pre-theory might be constructed on the basis of the following: take a pre-theory (f, P) and fix a constant k ≥ 0. Define the map pk, mapping stringsets to stringsets, by pk(I) = {w̄ ∈ I : |w̄| ≤ k}.

We can now define Pk := P ∘ pk. This does not ensure upward normality: for (g, Prk), take the language I := {aaa, aaaa, ab, aabb}. If we add aaabbb, aaaabbbb to the language, then even if they have no relevance for the analogies, they allow us to apply the analogy to new strings like a^n b^m for n > m, where we use the additional strings as premises for inferences, not analogies.

So in order to get upward normality, we must extend this map to the premises, for example by defining (fk, Pk) such that fPk := fP ∘ pk. But in this case we obviously fail a fundamental requirement, namely the requirement that our maps be increasing, that is, that we have I ⊆ fP(I). So this is highly unsatisfying, as it does not even define a projection.
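The length filter pk and the failure of increasingness are easy to see concretely; here is a minimal sketch (Python; the names are ours). Whatever fP does, composing it with pk first discards every string of I longer than k, so nothing guarantees I ⊆ fP(pk(I)):

```python
def p_k(lang, k):
    """The length filter p_k: keep only strings of length at most k."""
    return {w for w in lang if len(w) <= k}

I = {"aaa", "aaaa", "ab", "aabb"}
print(sorted(p_k(I, 3)))   # ['aaa', 'ab']
print(I <= p_k(I, 3))      # False: I is not contained in its own filtered image
```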

How can we proceed? There is another, more elegant approach. We mostly look at pre-theories based on ≤L. Take an integer k ≥ 0, and define the relation ≤kL as follows:

Definition 42 For L ⊆ Σ*, w̄ ≤kL v̄ if and only if for all (x̄, ȳ) ∈ Σ* × Σ*: if x̄w̄ȳ ∈ L and |x̄w̄ȳ| ≤ k, then x̄v̄ȳ ∈ L.

So ≤kL is some kind of restriction of ≤L, but not in a set-theoretic sense: in general, we can have both ≤L ⊄ ≤kL and ≤kL ⊄ ≤L for a given k and a given language L. So from a set-theoretic point of view, the relations are incomparable.
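For a finite sample, the bounded relation of Definition 42 is again directly computable. The following Python sketch (the name leq_k and the toy sample are our own) restricts the premises to strings of length at most k; on this sample the bounded relation holds where the unbounded check fails:

```python
def leq_k(w, v, lang, k):
    """Bounded substitutability w <=^k_L v on a finite set of strings:
    only contexts (x, y) with |x + w + y| <= k serve as premises."""
    for s in lang:
        if len(s) > k:
            continue  # every occurrence in s has |x + w + y| = |s| > k
        start = s.find(w)
        while start != -1:
            x, y = s[:start], s[start + len(w):]
            if x + v + y not in lang:
                return False
            start = s.find(w, start + 1)
    return True

sample = {"ab", "aabb", "aaabbb"}
print(leq_k("ab", "aabb", sample, 2))    # True: only "ab" itself is short enough to count
print(leq_k("ab", "aabb", sample, 100))  # False: "aaabbb" yields the context ("aa", "bb"),
                                         # and "aaaabbbb" is missing from the sample
```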

Now take a pre-theory PL which uses ≤L, and change all occurrences of ≤L in the conditions to ≤kL, thereby obtaining PkL. This alone does not give us upward normality, for the same reason as above: using the inference f, either we can derive additional strings with the new premises (though they do not play any role for the analogies), or there are strings we cannot recover.

We can solve this problem as follows: assume we have a pre-theory (f, P) and, for a given alphabet Σ, a finite set of (infinite) languages {Lt : t ∈ T}, with |T| ≤ k, which satisfies the following criteria:

1. ⋃t∈T Lt = Σ*;

2. for each t ∈ T, there is a finite I ⊆ Σ* such that fP(I) = Lt;

3. for all t, t′ ∈ T with t ≠ t′, Lt ∩ Lt′ is finite.

The motivation behind this definition is as follows: the first condition says that no observation is impossible; the second says that every one of these languages is finitely induced; and the third makes sure that each of the languages can be uniquely characterized by a finite set. We can also avoid making the alphabet explicit at this point, by defining a function L from any alphabet to a finite set of infinite languages over this alphabet, such that for any Σ, L(Σ) is a set of languages satisfying the above constraints. We now devise a projection as follows:

fLP(I) = Lt,  if I ⊆ Lt and I ⊄ Lt′ for all t′ ≠ t;
fLP(I) = I,   otherwise.    (4.18)
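A sketch of this projection (Python; the labels, the membership predicates, and the reading “I is contained in Lt and in no other member of the family” are our own illustrative choices, since Lt is infinite and can only be represented by a predicate here):

```python
def project(I, family):
    """Sketch of the projection in (4.18). `family` maps a label t to a
    membership predicate for L_t; return t's label if I is contained in
    L_t and in no other member of the family, otherwise return I."""
    candidates = [t for t, member in family.items()
                  if all(member(w) for w in I)]
    if len(candidates) == 1:
        return candidates[0]   # stands for the (infinite) language L_t
    return I                   # contained in no L_t, or in several: keep I

family = {
    "a*": lambda w: set(w) <= {"a"},   # all strings over {a}
    "b*": lambda w: set(w) <= {"b"},   # all strings over {b}
}
print(project({"aa", "aaa"}, family))  # 'a*'
print(project({"aa", "bb"}, family))   # returns I unchanged: no single L_t contains it
```

Note that the two toy languages overlap only in the empty string, so their intersection is finite, as criterion 3 requires.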

By this procedure, we can make any pre-theory upward normal. Note that we did not actually define the pre-theory here, only the projection to which it gives rise. The reason we can allow ourselves this is the following: we will show later on that every projection can be formalized as a pre-theory. So writing a projection as a pre-theory is in this sense rather an exercise, which can be tedious without being very instructive, and we therefore skip it at this point. That does of course not mean that it is useless to look at pre-theories at all: mostly, we define pre-theories for their plausibility, and then look at the projections to which they give rise.

Now, the price we pay for the upward normality of (fL, P) is obvious: we get a finite number of possible languages. Note that this is maybe not as bad as it seems: many linguistic schools of thought have tried to restrict the space of possible languages, and the mainstream generative school has even tried to cut it down to a finite number (modulo some considerable abstraction).

From our point of view, this assumption has some interesting consequences:

Lemma 43 Let fP be a projection such that there are only finitely many infinite languages induced by fP over a given alphabet Σ. Then there is a finite set J ⊆ Σ* such that for all I with J ⊆ I, either fP(I) is finite or fP(I) = Σ*.

Proof. Let Σfin denote the set of all finite subsets of Σ*; it is thus a subset of ℘(Σ*).

Case 1: ⋃I∈Σfin fP(I) ⊊ Σ*. Then there is a finite set J ⊆ Σ* which is contained in no infinite language induced by fP. Consequently, if J ⊆ I, I finite, then fP(I) is finite. Note that in this case J can be chosen to be a singleton {w̄}, thereby strengthening the result.

Case 2: ⋃I∈Σfin fP(I) = Σ*. Let {Lt : t ∈ T} be the infinite languages induced by fP over Σ. For each Lt with Lt ≠ Σ*, choose a w̄ ∉ Lt. This yields a finite set J. Assume (i) there is an Lt = Σ*. In this case, for I ⊇ J we must have fP(I) = Σ* or fP(I) finite, because there is no other possible image. Otherwise (ii), if there is no Lt = Σ*, then for any I ⊇ J, fP(I) must be finite.

The main point in the proof is that there are infinitely many finite languages over Σ. This result seems to be of some relevance to a view of language such as the one put forward by the principles and parameters program, which is somewhat in line with the approach of (fL, P) laid out above: it says that the number of possible languages (modulo the lexicon) is finite. This entails that we find sets of strings of the above kind, which is very implausible in the first place.

But of course, in applying this purely language-theoretic result to a linguistic theory one has to be very careful, and we will not work this out at this point.

Anyway, this result shows that pre-theories which induce only finitely many infinite languages have some properties we would judge unfavorable. So this road is not very appealing.

And so the question remains open: how can we devise an interesting pre-theory which is strongly upward normal?
