
4.6 Properties of Pre-Theories II

4.6.1 Characteristic and Downward Normal Pre-Theories

The same holds for $P_r$, and can be seen with the same example. Is this a bad thing?

There seems to be dispute on whether similarity as such should be transitive or not; there are good arguments both for and against. There seems to be a strict conception of similarity, which is intransitive, and a broad conception, which allows for transitivity and is thereby more liberal. We do not want to go into this, but just point out the following: in its asymmetric reading, $P_1$ is in fact transitive, that is:

Lemma 27 If $\vec{w} \approx^{P_1}_I \vec{v} \approx^{P_1}_I \vec{u}$ and $\vec{w} \sqsubseteq \vec{v} \sqsubseteq \vec{u}$, then $\vec{w} \approx^{P_1}_I \vec{u}$.

The proof is simple: as both $\leq_I$ and $\sqsubseteq$ are transitive, we know that $\vec{w} \leq_I \vec{u}$ and $\vec{w} \sqsubseteq \vec{u}$.

So for $P_1$, if we skip the symmetry, we get transitivity. For $P_r$, this does not obtain; in fact, the question of transitivity is meaningless:

Lemma 28 There is no finite language $I$ with distinct $\vec{w}, \vec{v}, \vec{u} \in \mathrm{fact}(I)$ such that $\vec{w} \approx^{P_r}_I \vec{v} \approx^{P_r}_I \vec{u}$ and $\vec{w} \sqsubseteq \vec{v} \sqsubseteq \vec{u}$.

Proof. Assume we have $\vec{w} = \vec{x}$, $\vec{v} = \vec{y}_1\vec{x}\vec{y}_2$, $\vec{u} = \vec{z}_1\vec{y}_1\vec{x}\vec{y}_2\vec{z}_2$. Assume we have $\vec{a}\vec{x}\vec{b} \in I$, where $(\vec{a}, \vec{b})$ is the shortest (in terms of string length) context of $\vec{x}$ in $I$. This entails that it is non-recursive for the analogies in question. If $\vec{w} \approx^{P_r}_I \vec{v}$, then $\vec{a}\vec{y}_1\vec{x}\vec{y}_2\vec{b} \in I$. As $\vec{v} \approx^{P_r}_I \vec{u}$, we also have $\vec{a}\vec{z}_1\vec{y}_1\vec{x}\vec{y}_2\vec{z}_2\vec{b} \in I$. But then we also need $\vec{a}\vec{z}_1\vec{x}\vec{z}_2\vec{b} \in I$ (downward $P_r$); and so we need $\vec{a}\vec{z}_1\vec{z}_1\vec{y}_1\vec{x}\vec{y}_2\vec{z}_2\vec{z}_2\vec{b} \in I$ (upward $P_r$); then we need $\vec{a}\vec{z}_1\vec{z}_1\vec{x}\vec{z}_2\vec{z}_2\vec{b} \in I$, etc., so that $I$ is necessarily infinite.
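To make the regress concrete, here is a minimal Python sketch; the placeholder strings are hypothetical, and `forced_strings` merely enumerates the strings that the alternation of downward and upward $P_r$ forces into $I$:

```python
# Sketch of the infinite regress in the proof of Lemma 28.
# All concrete strings below are hypothetical placeholders.
a, b = "a", "b"                      # shortest context of x in I
x, y1, y2, z1, z2 = "x", "p", "q", "r", "s"

def forced_strings(n_steps):
    """Strings that I is successively forced to contain:
    a z1^n x z2^n b (downward Pr) and a z1^n y1 x y2 z2^n b (upward Pr)."""
    for n in range(n_steps):
        yield a + z1 * n + x + z2 * n + b
        yield a + z1 * n + y1 + x + y2 + z2 * n + b

for s in forced_strings(4):
    print(len(s), s)  # string lengths grow without bound, so I is infinite
```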

So we might think of $P_1$ as representing the liberal, transitive notion of similarity, and $P_r$ as representing the restrictive, intransitive one.

This problem recurs in a similar form in metalinguistics. However, whereas in the linguistic view there is little we can do about it, in metalinguistics we can make it much more amenable by choosing, in the classical paradigm, an appropriate pre-theory. We will first discuss this with a variant of the problem of “private languages”.

A problematic fact for linguistic theory is the following: there is considerable incongruence in linguistic judgments, not only between different speakers, but also for the same speaker across different contexts and times. Whereas it might be plausible to attribute the former effect to the fact that different speakers know different languages, this does not sound plausible for the same speaker at different times, intervals which in the case of some priming effects might even be very short.

This question of “private languages” indeed poses fundamental problems to linguistics (see [42]). Whereas we cannot deal with these here, there is certainly a similar problem for the metalinguist: different linguists will surely not observe exactly the same fragment of a language; even worse, one linguist will find the same utterance one time to be acceptable, whereas another time it will not be judged acceptable. This is not as bad as one would think: if it is not judged to be acceptable, it does not follow that it is in the negative language. Still, this is problematic, and it should somehow be possible to reach agreement on the projected language despite differences in the observations. Note that a linguist, as concerned with the cognitive reality of speakers, might perhaps think differently. But keep in mind that we are doing metalinguistics here, and we have the goal of constructing the proper subject of linguistics, on which linguists should agree! In a word, we would be in a better situation if we could have agreement on “language” despite some disagreement on the observed language.

So it would be very favorable to have a property of pre-theories which up to a certain point can ensure this. This is done by the concept of characteristic pre-theories:

Definition 29 We say a pre-theory $(f, P)$ is characteristic if for every language $L$ and for all languages $I_1, \ldots, I_n$ such that $f_P(I_i) = L$ for $1 \leq i \leq n$, there is a unique smallest language $J$ such that $f_P(J) = L$ and $J \subseteq \bigcap_{1 \leq i \leq n} I_i$.

Note that this covers the special case where there is no finite language $I$ such that $f_P(I) = L$; for then the only language inducing $L$ is $L$ itself.

We can say that this unique smallest language is characteristic of $L$; if a pre-theory is characteristic, then for every language $L$ it induces there is a smallest characteristic language. As we said, this is motivated by the question of critical data: with characteristic pre-theories we know which part of an observed language $I$ is essential for the language it generates under $P$, and which is not.

A pre-theory $(f, P)$ is injective if for any $I, J$ with $I \neq J$, we have $f_P(I) \neq f_P(J)$.

Obviously, any pre-theory $(f, P)$ such that $f_P$ is injective is trivially characteristic, and in this case the concept of characteristicity is entirely meaningless. We will however not consider such pre-theories, because they would violate a fundamental principle of linguistics: as linguists, and equally as speakers, we are exposed to very different data (observed languages), yet still we agree broadly on what “language” is. Obviously, we do not need to, but if we cannot agree, then there is really no hope. We will later consider some other important properties of pre-theories which exclude injective pre-theories categorically, so that this simplistic solution is completely out of the question.

For the pre-theories considered so far, we have seen in some examples that different finite languages can give rise to the same infinite language under some pre-theory, and this is the case where things get interesting for us.

We have said that disagreement on whether certain strings belong to a language or not is a big problem for us. To what extent do characteristic pre-theories address this? What we want is a kind of downward monotonicity:

we want to be sure that by taking certain strings away from our database, we still obtain the same language. It is however unclear how we can make this desideratum precise: a general version of downward monotonicity of the form “if $f_P(I) = L$ and $J \subseteq I$, then $f_P(J) = L$” is out of the question, because not only is it much too strong to have any linguistic plausibility, it would also have a devastating effect from a purely mathematical point of view: it would trivialize the entire procedure of projection (contrary to a similar, upward version of monotonicity, which we will discuss later on). Being characteristic is in some sense a weak version of downward monotonicity, much weaker than the general one, and it seems reasonable to require it. Characteristicity requires that if we have $f_P(I_1) = f_P(I_2)$, then there is a $J \subseteq I_1 \cap I_2$ such that $f_P(I_1) = f_P(J)$. This says, among other things, that if two linguists agree on the “language” $L$, yet disagree on their observations $I_1$ on the one hand and $I_2$ on the other, they will find an $I_3$ on which they both agree and which still yields the language $L$.
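As a concrete reading of this requirement, here is a small brute-force sketch; the projection `f_P` is a hypothetical callable standing in for some decidable pre-theory, assumed to return a comparable finite representation of the projected language, and the sketch only searches a smallest common ground for one pair of observations (uniqueness is not checked):

```python
from itertools import combinations

def common_ground(I1, I2, f_P):
    """Given observations I1, I2 with f_P(I1) == f_P(I2) == L, search for
    a smallest J subset of I1 & I2 that still projects to L."""
    L = f_P(frozenset(I1))
    assert L == f_P(frozenset(I2)), "premise: both observations induce L"
    common = I1 & I2
    for k in range(len(common) + 1):          # smallest subsets first
        for J in combinations(sorted(common), k):
            if f_P(frozenset(J)) == L:
                return set(J)                 # the shared critical data
    return None  # no such J: the pre-theory is not characteristic here
```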

Contrary to what one might think, characteristicity is a highly non-trivial requirement. And in fact, one might argue that it is much too strong, because it presupposes that $I_1, I_2$ induce the same language. This however might already be undecidable for many pre-theories we consider! That in turn would make the strong requirement practically useless. We therefore formulate a more careful version of this property, which addresses the same problem in a more satisfying fashion:

Definition 30 A pre-theory $(f, P)$ is downward normal if for any $I_1, I_2$ such that $f_P(I_1) \supseteq I_2$ and $f_P(I_2) \supseteq I_1$, there exists a $J \subseteq I_1 \cap I_2$ such that $f_P(J) \supseteq I_1 \cup I_2$.

Note that downward normality and characteristicity do not imply each other in either direction. It is however clear that downward normality solves the above problem in a very satisfying manner: assume we disagree over the symmetric difference $I_1 \Delta I_2$ (we put $M \Delta N := (M \cup N) - (M \cap N)$). Then there is a solution with data we both agree on, such that still any string in $I_1 \Delta I_2$ is contained in the “language” resulting from projection. So we can reject arguable data for projection, but we always find a way to make sure that the strings are still part of “language”. Because this is fully effective – provided the pre-theory is decidable – we will prefer this property over characteristicity.
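Downward normality can be checked in the same brute-force manner; again `f_P` is a hypothetical decidable projection, here assumed to return a set supporting the superset test (e.g. the projected language truncated to the relevant string lengths):

```python
from itertools import combinations

def downward_normal_witness(I1, I2, f_P):
    """Search for J subset of I1 & I2 with f_P(J) >= I1 | I2, as required
    by Definition 30; the disputed strings I1 ^ I2 (symmetric difference)
    need not be in J, but must survive in the projection."""
    common, union = I1 & I2, I1 | I2
    for k in range(len(common) + 1):          # smallest subsets first
        for J in combinations(sorted(common), k):
            if union <= f_P(frozenset(J)):
                return set(J)   # data both linguists accept
    return None  # no witness: downward normality fails for this pair
```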

So the question is: what are the requirements for analogies and inferences to make sure that pre-theories are characteristic or downward normal? This is a difficult question, and all our criteria so far are insufficient.

Consider the pre-theory $(g, P_1)$, where we allow for an analogy only if $\vec{w} \leq_I \vec{v}$ (that is, the sets of contexts in which the two occur stand in the inclusion relation) and $\vec{w} \sqsubseteq \vec{v}$ (provided $\vec{w} \neq \vec{v}$). These conditions are not sufficient: take $I_1 := \{axb, cxd, aixjb\}$, $I_2 := \{axb, cxd, cixjd\}$. We can make the analogy $x \Leftarrow ixj$ in both languages; in $I_1 \cap I_2 = \{axb, cxd\}$ we cannot, as we have no occurrence of the substring $ixj$, nor can we in any still smaller language. So $P_1$ is neither characteristic nor downward normal, and we need stronger requirements.
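This failure can be verified mechanically. A minimal sketch (the helper `contexts` is ours, not from the text):

```python
def contexts(sub, language):
    """All contexts (left, right) of sub in a finite language, i.e.
    pairs such that left + sub + right is a string of the language."""
    return {(s[:i], s[i + len(sub):])
            for s in language
            for i in range(len(s) - len(sub) + 1)
            if s[i:i + len(sub)] == sub}

I1 = {"axb", "cxd", "aixjb"}
I2 = {"axb", "cxd", "cixjd"}
print(contexts("x", I1 & I2))    # {('a', 'b'), ('c', 'd')}
print(contexts("ixj", I1 & I2))  # set(): the analogy x <= ixj is gone
```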

The following is less obvious.

Lemma 31 $(g, P_r)$ is not characteristic and not downward normal.

Proof. For illustration purposes, we show this with two typical examples.

Counterexample 1 to characteristicity. Consider the following two languages: $I_1 := \{ab, aabb, aaabbb\} \cup \{cb, ccbb\} \cup \{bbb, dbbbd\}$; $I_2 := \{ab, aabb\} \cup \{cb, ccbb, cccbbb\} \cup \{bbb, dbbbd\}$. We have $I_1 \cap I_2 = \{ab, aabb, cb, ccbb, bbb, dbbbd\}$. In this case, we have $bbb \approx^{P_r}_{I_1 \cap I_2} dbbbd$, but we do not get this similarity in $I_1$ or $I_2$. Yet there is no $J \subseteq I_1 \cap I_2$ such that $bbb, dbbbd \in J$ but $bbb \not\approx^{P_r}_J dbbbd$, as can easily be seen (or checked by hand).

Counterexample 2 to characteristicity and downward normality. Put $I_1 = \{ab, aabb, aaabbb, xaaaxbbb\}$, $I_2 = \{ab, aabb, xaaaybbb, xxaaayybbb\}$. We have $g_{P_r}(I_1) = g_{P_r}(I_2)$, as can easily be checked, but for $I_1 \cap I_2 = \{ab, aabb, xaaaxbbb\}$, we have $g_{P_r}(I_1 \cap I_2) \subsetneq g_{P_r}(I_1)$.

So how can we ensure the two properties? The way to characteristicity is long and complicated: basically, we have to ensure that an infinite language uniquely encodes the smallest finite language that induces it. This is feasible with more or less reasonable methods; yet it is somehow counterintuitive from my point of view: linguistic metatheory is all about having finite objects and constructing infinite objects, whereas characteristicity is more about having infinite objects and constructing (or showing the existence of) finite objects. Even worse, we cannot even claim to have the infinite objects: maybe their relevant properties cannot simply be read off from our finite characterization. So it is the concern about our commitment to finitary procedures which speaks most strongly in favor of downward normality as opposed to characteristicity. We will therefore only describe an approach to obtain downward normality.

What is most problematic about downward normality is that analogies are permitted or prohibited by global properties of the language: we always have to consider all strings of a language, unless of course we have some additional information about them, such as that they do not contain a certain substring.

We will now present a pre-theory where analogies can be determined locally; that is, we can allow for a certain analogy in a certain context, though not in some other context, and we can compute analogies by looking only at a certain subset of the language.

We say a string $\vec{w}$ is elementary if it does not contain any substring of the form $\vec{x}_1\vec{x}_1\vec{x}\vec{x}_2\vec{x}_2$, where $\vec{x} \neq \epsilon \neq \vec{x}_1\vec{x}_2$. We define the analogical map $P_2$ as follows: $(\vec{w}\vec{x}\vec{v}, \vec{w}\vec{x}_1\vec{x}\vec{x}_2\vec{v}) \in P_2(I)$ if

1. $\vec{w}\vec{x}\vec{v}, \vec{w}\vec{x}_1\vec{x}\vec{x}_2\vec{v} \in I$,

2. $\vec{w}\vec{x}\vec{v}$ is elementary, and

3. there is no $\vec{z} \in I$ such that $\vec{w}\vec{x}\vec{v} \sqsubset \vec{z} \sqsubset \vec{w}\vec{x}_1\vec{x}\vec{x}_2\vec{v}$.
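Elementarity is decidable by brute force over all substrings and splittings. A sketch, unoptimized but faithful to the definition as reconstructed above (including the assumed reading $\vec{x} \neq \epsilon \neq \vec{x}_1\vec{x}_2$):

```python
def is_elementary(w: str) -> bool:
    """True iff w contains no substring of the form x1 x1 x x2 x2
    with x nonempty and x1 x2 nonempty."""
    n = len(w)
    for i in range(n):
        for j in range(i + 1, n + 1):
            s, m = w[i:j], j - i
            for l1 in range(m // 2 + 1):                 # |x1|
                for l2 in range((m - 2 * l1) // 2 + 1):  # |x2|
                    if l1 + l2 == 0 or m - 2 * l1 - 2 * l2 <= 0:
                        continue  # need x1 x2 nonempty and x nonempty
                    if s[:l1] == s[l1:2 * l1] and \
                       s[m - 2 * l2:m - l2] == s[m - l2:]:
                        return False
    return True

print(is_elementary("axb"))    # True
print(is_elementary("aaxbb"))  # False: a|a|x|b|b matches x1 x1 x x2 x2
```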

Next, we define the inference (meta-)rules $g_2$; note that here and in the sequel, we have the convention that $\vec{w}$ represents a possibly bracketed string; to refer to its unbracketed version, we write $h(\vec{w})$:


$$\frac{\vdash \vec{w}\vec{x}\vec{v} \in g_2^{P_2}(I) \qquad h(\vec{w}\vec{x}\vec{v}) \Leftarrow^{P_2}_I h(\vec{w}\vec{x}_1\vec{x}\vec{x}_2\vec{v}) \qquad \vec{x} \in \Sigma^*}{\vdash \vec{w}(\vec{x}_1(\vec{x})\vec{x}_2)\vec{v} \in g_2^{P_2}(I)} \tag{4.15}$$

where $(,) \notin \Sigma$, and $h$ is the usual homomorphism mapping $(,) \mapsto \epsilon$. As we can see, our relational language thus comprises statements of the form $\vec{x} \in \Sigma^*$; we thus need a unary relation $R = \Sigma^*$ in the language-theoretic structure. This scheme is complicated because we need the distinction between the bracketed string on the one side (for judgments) and the unbracketed string for analogies on the other; moreover, we want to make sure that the string $\vec{x}$ does not contain any brackets (the third premise). Say a pair of brackets $(,)$ in a string $\vec{w}(\vec{x})\vec{v}$ is simple if $\vec{x}$ does not contain any brackets. This first rule is necessary to introduce simple brackets, so to speak, because the next rule presupposes them:

$$\frac{\vdash \vec{w}(\vec{x}_1(\vec{x})\vec{x}_2)\vec{v} \in g_2^{P_2}(I) \qquad \vec{w}'\vec{x}\vec{v}' \Leftarrow^{P_2}_I \vec{w}'\vec{x}_1\vec{x}\vec{x}_2\vec{v}'}{\vdash \vec{w}(\vec{x}_1(\vec{x}_1(\vec{x})\vec{x}_2)\vec{x}_2)\vec{v} \in g_2^{P_2}(I)} \tag{4.16}$$

where $\vec{w}', \vec{v}'$ are arbitrary and unrelated to $\vec{w}, \vec{v}$. So once we have introduced a bracketing of the form $(\vec{x}_1(\vec{x})\vec{x}_2)$, we can expand it without contextual restrictions. This scheme thus only requires that there exists some context in which the analogy is legitimate. Note that the second premise makes sure that $\vec{x}, \vec{x}_1, \vec{x}_2$ do not contain any brackets, together with the third and last inference rule of $g_2$:

$$\frac{\vec{w}\vec{x}\vec{v} \approx^{P_2}_I \vec{w}\vec{x}_1\vec{x}\vec{x}_2\vec{v}}{\vec{w}\vec{x}\vec{v} \Leftarrow^{P_2}_I \vec{w}\vec{x}_1\vec{x}\vec{x}_2\vec{v}} \tag{4.17}$$

This is sufficient. Note that already $\approx^{P_2}$ is asymmetric; we could make it symmetric and only allow asymmetric analogies, but we skip this for reasons of simplicity. We first make the following observation:
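To see how the rules interact, consider the toy input $I = \{ab, aabb\}$: $P_2$ licenses the analogy $ab \Leftarrow^{P_2}_I a \cdot ab \cdot b$ (with $\vec{w} = \vec{v} = \epsilon$, $\vec{x} = ab$, $\vec{x}_1 = a$, $\vec{x}_2 = b$), rule (4.15) introduces the simple bracketing, and rule (4.16) pumps it. A rough sketch, with the driver loop purely illustrative and the rules' side conditions left unchecked:

```python
def h(w: str) -> str:
    """Bracket-erasing homomorphism (, ) -> epsilon."""
    return w.replace("(", "").replace(")", "")

x, x1, x2 = "ab", "a", "b"      # analogy licensed by P2 over {"ab", "aabb"}

s = f"({x1}({x}){x2})"          # rule (4.15): introduce simple brackets
print(s, "->", h(s))            # (a(ab)b) -> aabb

for _ in range(3):              # rule (4.16): expand the inner bracketing
    s = s.replace(f"({x1}({x}){x2})", f"({x1}({x1}({x}){x2}){x2})", 1)
    print(s, "->", h(s))        # h-images: aaabbb, aaaabbbb, aaaaabbbbb
```

The $h$-images suggest how these local rules generate $a^n b^n$ from just two observations; the real system would additionally check the analogy premise at each application of (4.16).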

Lemma 32 Let $I, J$ be two finite languages. If $I \subseteq J$, then $g_2^{P_2}(I) \subseteq g_2^{P_2}(J)$.

We will call this property monotonicity, and later on devote a proper subsection to it.

Proof. It is clear that $P_2(I) \subseteq P_2(J)$, because if $\vec{w}, \vec{v} \in I$, then $\vec{w}, \vec{v} \in J$. Also, the set of premises for $g_2^{P_2}(I)$ is a subset of the one for $g_2^{P_2}(J)$.

Lemma 33 Assume that $h(\vec{w}) \notin I$ and $\vec{w} \in g_2^{P_2}(I)$. Then $h(\vec{w})$ is not elementary.

Proof. Assume that $h(\vec{w})$ does not contain a substring of the form $\vec{x}_1\vec{x}_1\vec{x}\vec{x}_2\vec{x}_2$, with $\vec{x} \neq \epsilon \neq \vec{x}_1\vec{x}_2$. If $\vec{w} = h(\vec{w})$, then $\vec{w} \in I$ and the claim follows. Assume that $\vec{w}$ contains substrings of the form $(\vec{x}_1(\vec{x})\vec{x}_2)$. Then each of these substrings has been introduced by an inference which, by the last definitions, must have the form

$$\frac{\vdash \vec{w}_1\vec{x}\vec{w}_2 \in g_2^{P_2}(I) \qquad h(\vec{w}_1\vec{x}\vec{w}_2) \Leftarrow^{P_2}_I h(\vec{w}_1\vec{x}_1\vec{x}\vec{x}_2\vec{w}_2)}{\vdash \vec{w}_1(\vec{x}_1(\vec{x})\vec{x}_2)\vec{w}_2 \in g_2^{P_2}(I)},$$

which presupposes that $h(\vec{w}_1\vec{x}_1\vec{x}\vec{x}_2\vec{w}_2) \in I$ (check the definition of $P_2$). So if $h(\vec{w})$ does not contain a substring of the form $\vec{x}_1\vec{x}_1\vec{x}\vec{x}_2\vec{x}_2$, then $\vec{w} \in I$, and by contraposition the claim follows.

We can now show the following:

Theorem 34 $(g_2, P_2)$ is downward normal.

Proof. Assume we have $I_1, I_2$ with $h \circ g_2^{P_2}(I_1) \supseteq I_2$ and $h \circ g_2^{P_2}(I_2) \supseteq I_1$. We show that every string $\vec{w} \in I_1 \cup I_2$ can be derived by a subset of $I_1 \cap I_2$. By the above monotonicity result, the claim then follows.

We prove only one part, namely that $I_1 \subseteq g_2^{P_2}(I_1 \cap I_2)$; the proof for $I_2 \subseteq g_2^{P_2}(I_1 \cap I_2)$ is identical. We do this by an induction on the strings of $I_1$, using the partial order $\sqsubseteq_\omega$. By $\sqsubseteq_\omega$ we denote the scattered substring relation; that is, we have $\vec{w} \sqsubseteq_\omega \vec{v}$ iff $\vec{w} = \vec{w}_1 \ldots \vec{w}_i$ and $\vec{v} = \vec{v}_1\vec{w}_1\vec{v}_2 \ldots \vec{v}_i\vec{w}_i\vec{v}_{i+1}$. Importantly, we make an induction on the strong language containing brackets, where the crucial step is the following: we show that for every $\vec{w} \in I_1$ and $\vec{v} \in g_2^{P_2}(I_2)$ such that $h(\vec{v}) = \vec{w}$, we have $\vec{v} \in g_2^{P_2}(I_1 \cap I_2)$.
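At the letter level, the scattered substring relation coincides with the subsequence relation, so it is easy to test; a minimal sketch:

```python
def is_scattered_substring(w: str, v: str) -> bool:
    """w ⊑_ω v: w = w1...wi and v = v1 w1 v2 ... vi wi v(i+1),
    i.e. w is a subsequence of v."""
    it = iter(v)
    return all(c in it for c in w)  # 'in' consumes the iterator in order

print(is_scattered_substring("ab", "aabb"))  # True
print(is_scattered_substring("ba", "aabb"))  # False: no a after the first b
```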

The induction base is clear: every $\sqsubseteq_\omega$-minimal string of $I_1$ is in $I_1 \cap I_2$, because inferences strictly increase string length.

Induction hypothesis: take a $\vec{v} \in I_1$ and $\vec{w} \in g_2^{P_2}(I_2)$ such that $h(\vec{w}) = \vec{v}$, and assume the claim holds for all $\vec{w}' \in g_2^{P_2}(I_2)$ such that $\vec{w}' \sqsubset_\omega \vec{w}$ and $h(\vec{w}') \in I_1$.

Case 1: $\vec{w}$ contains no brackets – in this case, we have $\vec{w} \in I_2$, and as $\vec{w} \in I_1$, we have $\vec{w} \in I_1 \cap I_2$, and thus $\vec{w} \in g_2^{P_2}(I_1 \cap I_2)$.

Case 2: assume $\vec{w}$ can be derived in a derivation whose last step is

$$\frac{\vdash \vec{w}_1\vec{x}\vec{w}_2 \in g_2^{P_2}(I_2) \qquad h(\vec{w}_1\vec{x}\vec{w}_2) \Leftarrow^{P_2}_{I_2} h(\vec{w}_1\vec{x}_1\vec{x}\vec{x}_2\vec{w}_2) \qquad \vec{x} \in \Sigma^*}{\vdash \vec{w}_1(\vec{x}_1(\vec{x})\vec{x}_2)\vec{w}_2 \in g_2^{P_2}(I_2)},$$

and thus $\vec{w} = \vec{w}_1(\vec{x}_1(\vec{x})\vec{x}_2)\vec{w}_2$. In this case, we know that $h(\vec{w}_1\vec{x}\vec{w}_2), h(\vec{w}_1\vec{x}_1\vec{x}\vec{x}_2\vec{w}_2) \in I_2$, because otherwise there could not be an analogy as the one above. Moreover, by assumption, we know that $h(\vec{w}_1\vec{x}_1\vec{x}\vec{x}_2\vec{w}_2) \in I_2 \cap I_1$. But we also know that $h(\vec{w}_1\vec{x}\vec{w}_2) \in I_1$: as it is in $I_2$, it must by assumption be in $g_2^{P_2}(I_1)$, and as it is the left-hand side of an analogy, it has to be elementary. So if $h(\vec{w}_1\vec{x}\vec{w}_2) \notin I_1$, we contradict the last lemma. Furthermore, by induction hypothesis, we have $\vec{w}_1\vec{x}\vec{w}_2 \in g_2^{P_2}(I_1 \cap I_2)$. So the same analogy works with $I_1 \cap I_2$. So we also have $\vec{w}_1(\vec{x}_1(\vec{x})\vec{x}_2)\vec{w}_2 \in g_2^{P_2}(I_1 \cap I_2)$.

Case 3: there is a derivation of $\vec{w}$, the last step of which is

$$\frac{\vdash \vec{w}_1(\vec{x}_1(\vec{x})\vec{x}_2)\vec{w}_2 \in g_2^{P_2}(I_2) \qquad a}{\vdash \vec{w}_1(\vec{x}_1(\vec{x}_1(\vec{x})\vec{x}_2)\vec{x}_2)\vec{w}_2 \in g_2^{P_2}(I_2)},$$

where $a$ is an appropriate analogy. We know that $\vec{w}_1(\vec{x}_1(\vec{x})\vec{x}_2)\vec{w}_2 \in g_2^{P_2}(V)$ for some $V \subseteq I_1 \cap I_2$ by induction hypothesis; we also know that we have an appropriate analogy over $V$ at hand, because otherwise we could not have introduced the brackets. Consequently, we have $\vec{w}_1(\vec{x}_1(\vec{x}_1(\vec{x})\vec{x}_2)\vec{x}_2)\vec{w}_2 \in g_2^{P_2}(V)$.

Note, by the way, that the three cases are not mutually exclusive, but they cover all possibilities. This proves the induction step, and shows that any string $\vec{w} \in g_2^{P_2}(I_2)$ such that $h(\vec{w}) \in I_1$ is actually also in $g_2^{P_2}(I_1 \cap I_2)$.

This proof even gives us a stronger corollary:

Corollary 35 Assume we have $I_1, I_2$ with $h \circ g_2^{P_2}(I_1) \supseteq I_2$ and $h \circ g_2^{P_2}(I_2) \supseteq I_1$. Then for every bracketed string $\vec{w}$ such that $h(\vec{w}) \in I_1 \cap I_2$, we have $\vec{w} \in g_2^{P_2}(I_1)$ iff $\vec{w} \in g_2^{P_2}(I_2)$.

This shows us what a downward normal pre-theory has to look like. The crucial properties we need to obtain this result are firstly monotonicity, secondly the fact that analogies are restricted to contexts, and thirdly another peculiarity about analogies: the analogies introducing certain brackets actually presuppose that exactly the strings which are being inferred are in the language in question.

I do not think that $(g_2, P_2)$ is uninteresting as such; however, I think it is preferable not to have these properties, that is, it is more interesting to compute analogies globally, regardless of the context in which they occur. But of course this is also much more challenging, as we cannot easily make statements about $g_{P_r}(I)$ without observing $I$ as a whole. This is the main reason that we focus on $(g, P_r)$ and $(g, P_1)$ in the sequel.
