
4.5 Properties of Pre-Theories I



and their values very quickly become undecidable. This is what we will argue for in this intermezzo. As regards the conclusion of this section, we should have put it earlier in the order of contents; as regards the methods applied, however, we think it will be much easier for the reader to understand at this point. Our first result is the following:

Theorem 21. Let $G$ be a CFG such that $L(G) = L \subseteq \Sigma^*$. Then the relation $\leq_L \subseteq \Sigma^* \times \Sigma^*$ is in general undecidable.

Proof. As is well known, the universality problem for CFLs is undecidable; that is, there is no algorithm which, for any CFG $G$, tells us whether $L(G) = \Sigma^*$ or not. As is also well known, the emptiness problem for CFGs is decidable; that is, there is an algorithm which tells us whether $L(G) = \emptyset$ for any CFG $G$.

Now assume $\leq_{L(G)}$ is decidable. We show that under this assumption the universality problem is decidable, yielding a contradiction. To check universality, we first check whether $L(G) = \emptyset$, which is decidable. If it is, we can answer the universality question negatively. So assume $L(G) \neq \emptyset$. Then there is a word $\vec{w} \in L(G)$. We check whether $\epsilon \in L(G)$. Next we check whether $\epsilon \leq_{L(G)} a$ for all $a \in \Sigma$. If both are answered positively, then we have $L(G) = \Sigma^*$; if one is negative, then $L(G) \neq \Sigma^*$. This way, we can effectively decide whether $L(G) = \Sigma^*$ for any CFG $G$. This is a contradiction, as this is an undecidable problem. So $\leq_{L(G)}$ is undecidable.
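To make the shape of this reduction explicit, here is a minimal Python sketch. Everything in it is hypothetical: leq_L stands for the assumed decider for $\leq_{L(G)}$, while cfg_is_empty and cfg_accepts stand in for the well-known decidable emptiness and membership checks for CFGs; none of these names refer to an existing library.

def decide_universality(G, sigma, leq_L, cfg_is_empty, cfg_accepts):
    """Would decide whether L(G) = Sigma*, given a decider leq_L for <=_{L(G)}."""
    if cfg_is_empty(G):           # emptiness is decidable for CFGs
        return False              # the empty language is not Sigma*
    if not cfg_accepts(G, ""):    # membership of the empty word is decidable
        return False
    # if the empty word may be replaced by every single letter in every
    # context, then every string over sigma lies in L(G), i.e. L(G) = Sigma*
    return all(leq_L(G, "", a) for a in sigma)

Since universality of CFGs is undecidable, no total leq_L with this behaviour can exist, which is exactly the contradiction used above.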

A similar result is the following:

Theorem 22. Let $G$ be a CFG, $L(G) \subseteq \Sigma^*$. Furthermore, let $W \subseteq \Sigma^*$ be a set of strings. Then it is undecidable whether $W \in [\Sigma^*]_{L(G)}$, that is, whether $W$ is an equivalence class of strings. Furthermore, it is in general undecidable whether $\vec{w} \sim_{L(G)} \vec{v}$ for given $\vec{w}, \vec{v} \in \Sigma^*$.

Proof. The proof is very similar: assume that $\sim_{L(G)}$ is decidable. Then from the decidability of the emptiness problem we can easily deduce that universality is decidable, yielding a contradiction, because $|[\Sigma^*]_{L(G)}| = 1$ if and only if $L(G) = \Sigma^*$ or $L(G) = \emptyset$ (if $\emptyset \neq L(G) \neq \Sigma^*$, then any $\vec{w} \in L(G)$ and any $\vec{v} \notin L(G)$ are already separated by the empty context), and the latter is decidable.

So there is little we can say about interesting classes of infinite languages, and we have to decide on things in the finite. This last result also shows why equivalence classes are really of little use for us: in the infinite, they are undecidable, and in the finite, they do not contain interesting patterns, such as strings in a substring relation (this will change, however, in the sequel).

There are some further things to consider: for example, the relations $\leq_L$ and consequently $\sim_L$ are compact in the following sense:

Lemma 23. Given an infinite language $L \subseteq \Sigma^*$ and $\vec{w}, \vec{v} \in \Sigma^*$, we have $\vec{w} \leq_L \vec{v}$ ($\vec{w} \sim_L \vec{v}$) if and only if for every finite fragment $I \subseteq L$ there is a finite $J$ with $I \subseteq J \subseteq L$ such that $\vec{w} \leq_J \vec{v}$ ($\vec{w} \sim_J \vec{v}$).

We have shown an equivalent result above. So the properties in the infinite are determined by the finite fragments, even though, of course, there are infinitely many. However, there are other properties of $\leq_L$ which are peculiar to the infinite.

For example, take the property of well-foundedness. Well-foundedness here means: for $\leq \subseteq M \times M$ and each $m \in M$, the set $\{n : n \leq m\}$ is finite. Now obviously, for any finite language $L$, $\leq_L$ is well-founded. This does, however, not hold for infinite languages:

Lemma 24. There exist context-free languages $L$ such that $\leq_L$ is not well-founded.

Proof. Take $L := \{a^mb^n : m \geq n\}$. In this case, we have $a^m \leq_L a^n$ iff $m \geq n$. This is obviously not well-founded.
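By Lemma 23, such facts about $\leq_L$ are reflected in the finite fragments of $L$, and on a finite language the relevant context sets can simply be computed. The following Python sketch is only an illustration of that machinery: it collects the contexts under which a string occurs as a factor of some word in a finite $I$, and compares two strings by inclusion of their context sets; which direction of the inclusion corresponds to $\leq_I$ is fixed by the definition given earlier in the book and is merely assumed here.

def contexts(I, x):
    """All pairs (u, v) such that u + x + v is a word of the finite language I."""
    result = set()
    for w in I:
        for i in range(len(w) - len(x) + 1):
            if w[i:i + len(x)] == x:
                result.add((w[:i], w[i + len(x):]))
    return result

def leq(I, x, y):
    """x <=_I y, read here as: every I-context of x is also an I-context of y
    (assumed orientation, see the remark above)."""
    return contexts(I, x) <= contexts(I, y)

I = {"ab", "aabb"}
print(sorted(contexts(I, "ab")))   # [('', ''), ('a', 'b')]
print(leq(I, "aabb", "ab"))        # True: ('', '') is also a context of ab
print(leq(I, "ab", "aabb"))        # False: ('a', 'b') is not a context of aabb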

4.5.2 On Regular Projection

There is a good argument in favor of weaker forms of projection. For example, one might be one of the advocates of the regularity of natural language. A more reasonable position is the following: we want to allow inferences only if they “preserve acceptability”; that is, we want to make sure that if we infer a new judgment, it should be as acceptable as its premise. Whereas this sounds simple from a linguistic point of view, from a formal point of view it is obviously problematic: in the relevant case, we can derive infinitely many new judgments – we cannot simply check whether they preserve acceptability. So in translating this requirement into a formal theory, one has to do some work, and we will investigate two approaches.

The first approach is the following. One might say: the set of observable (acceptable) utterances is regular, so we have to take care that projected languages are regular; everything else will then be fine. Therefore, we allow pre-theories of the above type, but we restrict analogies to the scheme:

$\vec{x} \Leftarrow \vec{x}\vec{x}_1$ (4.13)

that is, we make analogies roughly correspond to regular rules. Obviously, this is a particular instance of our above scheme.

We can accordingly define the analogical maps $R_{Pr}, R_{P1}$ by $(\vec{x}, \vec{y}) \in R_{Pr}(I)$ (and $(\vec{x}, \vec{y}) \in R_{P1}(I)$) if and only if

1. $\vec{x} \approx^{Pr}_I \vec{y}$ (and $\vec{x} \approx^{P1}_I \vec{y}$, respectively), and

2. $\vec{x} = \vec{y}\vec{y}_1$ or $\vec{y} = \vec{x}\vec{x}_1$.

So what we do is: in addition to the P1- or Pr-requirements on contexts etc., we restrict the form analogies can have. What are formal properties that come with this scheme? One might conjecture that, given that the scheme does not allow for “center embedding”, for any finite language $I$, $g_{R_{P1}}(I)$ is a regular language. However, this is not true: just assume we have an analogy of the form $a \Leftarrow aab$, and a premise $ab$. Then we can make derivations of the form

$\vdash ab \in f_P(I) \qquad a \Leftarrow aab$
$\vdash (aab)b \in f_P(I) \qquad a \Leftarrow aab$
$\vdash (a(aab)b)b \in g_P(I)$ (4.14)

Of course, we can also derive strings which are not of this form; but for $I = \{ab\}$ and $P(I) = \{(a, aab)\}$, we obtain $g_P(I) \cap a^*b^* = \{a^nb^n : n \in \mathbb{N}\}$. From this we can conclude that we derive a non-regular language. So the restricted analogy scheme alone does not prevent us from deriving non-regular languages! But that does not show whether $(g, R_{P1}), (g, R_{Pr})$ actually do derive non-regular languages.
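The effect of this single analogy can also be checked mechanically. The following Python sketch is only an illustration of the closure idea, not the book's map $g_P$: it repeatedly replaces one occurrence of a by aab, starting from the seed {ab}, and then filters the result by the shape $a^*b^*$.

import re

def close_under(seed, lhs, rhs, max_len):
    """All strings derivable from `seed` by repeatedly replacing one
    occurrence of `lhs` with `rhs`, up to length `max_len`."""
    result, frontier = set(seed), set(seed)
    while frontier:
        new = set()
        for w in frontier:
            for i in range(len(w)):
                if w.startswith(lhs, i):
                    v = w[:i] + rhs + w[i + len(lhs):]
                    if len(v) <= max_len and v not in result:
                        new.add(v)
        result |= new
        frontier = new
    return result

derived = close_under({"ab"}, "a", "aab", max_len=12)
print(sorted((w for w in derived if re.fullmatch(r"a*b*", w)), key=len))
# ['ab', 'aabb', 'aaabbb', 'aaaabbbb', 'aaaaabbbbb', 'aaaaaabbbbbb']

Strings such as aababb also appear in the closure, which is why the intersection with $a^*b^*$ is taken; the point is only that this single analogy already forces the non-regular set $\{a^nb^n\}$ into the projection.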

Lemma 25. There are finite languages $I$ such that $g_{R_{P1}}(I)$ is not regular.

Proof. Just take the language $I = \{ab, aabb\}$. That gives exactly the above example.

For $R_{Pr}$, things are more complicated: the example $I = \{ab, aabb\}$ does not work, as $R_{Pr}(I) = \emptyset$. In general, in $R_{Pr}$ we cannot have any analogies of the form $(\vec{x}, \vec{x}\vec{y}\vec{x}\vec{z})$: because in this case, assume we have $\vec{w}\vec{x}\vec{v} \in I$. Then by the Pr conditions, we need $h(\vec{w}(\vec{x}\vec{y}\vec{x}\vec{z})\vec{v}) \in I$ (we write the brackets to make our reasoning more comprehensible; $h$ takes them out again so that we get a simple string). But then we also need $h(\vec{w}\vec{x}\vec{y}(\vec{x}\vec{y}\vec{x}\vec{z})\vec{z}\vec{v}) \in I$ etc., such that $I$ needs to be infinite. Actually, this argument gives us a stronger result, namely that if $(\vec{x}, \vec{x}\vec{y}) \in R_{Pr}(I)$, then there can be no analogy $(\vec{z}, \vec{z}\vec{u}) \in R_{Pr}(I)$ such that $\vec{z} \sqsubseteq \vec{x}\vec{y}$ or $\vec{x} \sqsubseteq \vec{z}\vec{u}$, unless $\vec{x} = \vec{z}$. Given this, we can conclude:

Lemma 26. For any finite language $I$, $g_{R_{Pr}}(I)$ is regular.

Proof. We show this by constructing a (non-deterministic) finite state automaton. Take a language $I$, and construct an FSA as follows. 1. For any distinct input, the automaton goes into a distinct state; a state is accepting if the path leading to it is labelled by some $\vec{w} \in I$, and all transitions not reachable by a prefix of a word in $I$ are undefined. This automaton recognizes $I$, and each state is uniquely characterized by a single input word; we therefore write $q_{\vec{w}}$ with the obvious meaning. 2. Now for every $(\vec{x}, \vec{x}\vec{y}) \in R_{Pr}(I)$, we add a (distinct, non-deterministic) transition from every $q_{\vec{u}\vec{x}}$ to itself, labelled by $\vec{y}$.

The important thing is that we cannot read a new $\vec{x}'$ occurring on the left-hand side of an analogy before we return to the state we were in after reading $\vec{u}\vec{x}$. Therefore, this automaton recognizes $g_{R_{Pr}}(I)$, and obviously, it is finite.
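A small Python simulation may make the construction more concrete. It is a sketch under explicit assumptions: states are identified with prefixes of words in $I$ (the $q_{\vec{w}}$ above), the analogies are handed over explicitly as pairs $(\vec{x}, \vec{x}\vec{y})$, and whether such a pair is actually licensed as an element of $R_{Pr}(I)$ is not checked here; the analogy in the example is purely hypothetical and only serves to show the automaton mechanics.

def recognizes(I, analogies, s):
    """Simulate the NFA from the proof: a prefix-tree acceptor for I, plus,
    for every analogy (x, xy), a self-loop labelled y at every state whose
    prefix ends in x."""
    prefixes = {w[:i] for w in I for i in range(len(w) + 1)}
    loops = {p: [xy[len(x):] for (x, xy) in analogies if x and p.endswith(x)]
             for p in prefixes}
    frontier = {("", 0)}                      # pairs (state prefix, position in s)
    seen = set()
    while frontier:
        p, i = frontier.pop()
        if (p, i) in seen:
            continue
        seen.add((p, i))
        if i == len(s) and p in I:
            return True
        if i < len(s) and p + s[i] in prefixes:   # ordinary tree transition
            frontier.add((p + s[i], i + 1))
        for y in loops[p]:                        # analogy self-loop, reads y
            if y and s.startswith(y, i):
                frontier.add((p, i + len(y)))
    return False

# hypothetical example: I = {ab} with the (assumed) analogy (ab, abab)
print([recognizes({"ab"}, {("ab", "abab")}, w)
       for w in ["ab", "abab", "ababab", "aba"]])
# [True, True, True, False]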

So we see that in this case, the induced languages are regular. That is what we desired; however, it does not directly follow from the restriction of the substring condition, but from properties of Pr. It is doubtful whether this argument is really related to the fact that inferences preserve acceptability. Of course, this has some advantages: as the projected languages are regular, they have much easier decision problems; in particular, the above undecidability results do not hold. This is however only a minor comfort if we consider that it runs counter to many intuitive arguments: we would a priori restrict our view to regular languages – and we do not even know whether this conforms at all with our intuitions on understanding!

We now come to the second approach to “regularity”. This approach is more clever in the following sense: whereas the first approach assumes a priori that “language” is regular, just by restricting possible substitutions, the second approach merely assumes stricter criteria for similarity: these do not necessarily result in regular languages, but are chosen in such a way that, under the presumed shape of the datasets we observe, we only obtain regular languages. This presumed shape is the following: we have observed that certain patterns are observable only up to a certain bounded depth, whereas for others we do not have this restriction. We again restrict ourselves to substitution of substrings.

We write $\vec{x} \approx^{Pr\text{-}k}_I \vec{x}_1\vec{x}\vec{x}_2$, where $k \in \mathbb{N}$, if the following hold:

1. $\vec{x} \approx^{Pr}_I \vec{x}_1\vec{x}\vec{x}_2$;

2. if $(\vec{w}, \vec{v})$ is not recursive for $(\vec{x}, \vec{x}_1\vec{x}\vec{x}_2)$ and $\vec{w}\vec{x}\vec{v} \in I$, then for every $i \leq k$, $\vec{w}(\vec{x}_1)^i\vec{x}(\vec{x}_2)^i\vec{v} \in I$.

The pre-theory Pr-k thus requires that the substitution already has $k$ instances in $I$. The underlying reasoning is the following: if we can do it $k$ times (rather than once), we can do it arbitrarily often. For P1, there does not seem to be a reasonable analogue. It is intuitively clear that in natural languages, by choosing $k$ large enough, we can exclude any analogy $(\vec{x}, \vec{x}_1\vec{x}\vec{x}_2)$ where both $\vec{x}_1 \neq \epsilon \neq \vec{x}_2$. Note that this argument, though obvious from a “performance-oriented” view, is very subtle from a metalinguistic point of view: even if there were no experimentally measurable limits on center embedding etc., we could still choose a $k$ which would exclude these analogies for any finite dataset $I$. We have no formal constraint to first fix the pre-theory and then begin to gather the data. However, an explicit part of our procedure is that, after fixing the pre-theory, we are still allowed to gather as much data as we want before we do the projection (though we are not allowed to discard data!). So we can choose a reasonably small $k$, and performance constraints will make sure that this actually does the job we want it to do.
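To see what the $k$-instances requirement amounts to on a concrete case, here is a small Python sketch. It only checks the bounded-instance condition of clause 2 for a context that is supplied explicitly; deciding which contexts count as non-recursive is left to the definition above, and the language and analogy are just the running $a^nb^n$ example.

def has_k_instances(I, x, x1, x2, context, k):
    """Check that, for the context (w, v) of x, all bounded substitution
    instances  w + x1*i + x + x2*i + v  for i = 0..k are already in I."""
    w, v = context
    return all(w + x1 * i + x + x2 * i + v in I for i in range(k + 1))

# the analogy (ab, a.ab.b) with the empty context of ab, checked for k = 4:
I = {"ab", "aabb", "aaabbb", "aaaabbbb", "aaaaabbbbb"}
print(has_k_instances(I, "ab", "a", "b", ("", ""), 4))               # True
print(has_k_instances({"ab", "aabb"}, "ab", "a", "b", ("", ""), 4))  # False

With $k = 4$ and the empty context, the strings $ab, aabb, \ldots, a^5b^5$ are exactly what the condition demands, which matches the set used in the example further below.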

In principle, there is nothing we can say against this approach to “language”; it can also easily be adapted to the other pre-theories we have presented so far and those we will present in the sequel. Note that the difference between Pr and Pr-k is not so much of mathematical as of linguistic relevance: it is the particular nature of the language we observe which makes a huge difference between the two.

As we focus on mathematical properties, we will not ponder this restriction very much. We mention, however, that Pr-k gives us a very good example of what a property of “language” modulo a pre-theory means, a notion we will scrutinize later on: the fact that, with sufficiently large $k$, “natural languages” are regular under $(g, Pr\text{-}k)$ is (or might be) an empirical property of natural language. Consequently, adopting $(g, Pr\text{-}k)$, natural languages being regular is still not an empirical property, but neither is it a truism on methodological grounds (we will call this a methodological universal), as it could be otherwise! So it is something in between, and we thus say that “languages” are regular modulo $(g, Pr\text{-}k)$, whereas they might not be modulo $(g, Pr)$! So we can speak of properties modulo pre-theories.

There is, however, also a mathematical difference between Pr and Pr-k. Put, for example, $k = 4$ and $I = \{ab, aabb, aaa, caaac\}$. We get $g_{Pr}(I) = \{a^nb^n : n \in \mathbb{N}\} \cup \{c^naaac^n : n \in \mathbb{N}\}$. For Pr-4 we get $g_{Pr\text{-}4}(I) = I$. This is clear; what is less clear is that there is not even an extension $J \supseteq I$ such that $g_{Pr}(I) = g_{Pr\text{-}4}(J)$! To get $\{a^nb^n : n \in \mathbb{N}\}$, we need $\{ab, aabb, aaabbb, aaaabbbb, aaaaabbbbb\}$. But if we have these strings in a language, there is no way of getting an analogy $(aaa, caaac)$ anymore! (Note how these examples relate to upward normality, a notion we consider later on.) So Pr-k is not only smaller in the sense of the results of projections of a given finite language, but also in the sense that the class of languages it can induce from any finite language seems to be quite restricted, and it seems to me that it is not restricted in a very favorable way (we will prove this claim later on).

4.5.3 On Similarity

We will take a short look at some properties of the similarity relations we have introduced so far. $\approx^{P1}_I$ and $\approx^{Pr}_I$ are symmetric by definition. They are, however, both intransitive. To see this for P1, just put $I = \{a, ab, b\}$, where $P1(I) = \{(a, ab), (ab, a), (b, ab), (ab, b)\}$. This is intransitive, because $(a, b) \notin P1(I)$. The same holds for Pr, and can be seen with the same example. Is this a bad thing? There seems to be dispute on whether similarity as such should be transitive or not; there are good arguments in favor and against. There seems to be a strict conception of similarity, which is intransitive, and a broad conception, which allows for transitivity and is thereby more liberal. We do not want to go into this, but just point out the following: in its asymmetric reading, P1 is in fact transitive, that is:

Lemma 27. If $\vec{w} \approx^{P1}_I \vec{v} \approx^{P1}_I \vec{u}$ and $\vec{w} \sqsubseteq \vec{v} \sqsubseteq \vec{u}$, then $\vec{w} \approx^{P1}_I \vec{u}$.

The proof is simple: as both $\leq_I$ and $\sqsubseteq$ are transitive, we know that $\vec{w} \leq_I \vec{u}$ and $\vec{w} \sqsubseteq \vec{u}$.

So for P1, if we skip the symmetry, we get transitivity. For Pr, this does not obtain, and in fact, the question of transitivity is meaningless:

Lemma 28. There is no finite language $I$ with distinct $\vec{w}, \vec{v}, \vec{u} \in \mathrm{fact}(I)$ such that $\vec{w} \approx^{Pr}_I \vec{v} \approx^{Pr}_I \vec{u}$ and $\vec{w} \sqsubseteq \vec{v} \sqsubseteq \vec{u}$.

Proof. Assume we have $\vec{w} = \vec{x}$, $\vec{v} = \vec{y}_1\vec{x}\vec{y}_2$, $\vec{u} = \vec{z}_1\vec{y}_1\vec{x}\vec{y}_2\vec{z}_2$. Assume we have $\vec{a}\vec{x}\vec{b} \in I$, where $(\vec{a}, \vec{b})$ is the shortest (in terms of string length) context of $\vec{x}$ in $I$. This entails that it is non-recursive for the analogies in question. If $\vec{w} \approx^{Pr}_I \vec{v}$, then $\vec{a}\vec{y}_1\vec{x}\vec{y}_2\vec{b} \in I$. As $\vec{v} \approx^{Pr}_I \vec{u}$, we also have $\vec{a}\vec{z}_1\vec{y}_1\vec{x}\vec{y}_2\vec{z}_2\vec{b} \in I$. But then we also need $\vec{a}\vec{z}_1\vec{x}\vec{z}_2\vec{b} \in I$ (downward Pr); and so we need $\vec{a}\vec{z}_1\vec{z}_1\vec{y}_1\vec{x}\vec{y}_2\vec{z}_2\vec{z}_2\vec{b} \in I$ (upward Pr); then we need $\vec{a}\vec{z}_1\vec{z}_1\vec{x}\vec{z}_2\vec{z}_2\vec{b} \in I$, etc., such that $I$ is necessarily infinite.

So we might think of P1 as representing the liberal, transitive notion of similarity, and Pr as representing the restrictive, intransitive one.
