

2.7.5 Weak and Strong Generative Capacity

In formal language/grammar theory, one distinguishes between weak and strong generative capacity. The former designates the language, as a set of strings, which a given grammar⁶ generates; the latter designates the set of derivations (in Chomsky's terminology: structural descriptions) together with the associated strings.

According to Chomsky, weak generative capacity is of little if any interest to theoretical linguistics (see for example [5], p. 16). This is somewhat surprising, given the evident epistemic priority of strings: strings are in fact all we can observe. No one has ever observed a derivation as a primary datum, nor will anyone ever do so, presumably; all we see are utterances and speakers' judgements on utterances. According to Chomsky, one reason for the neglect of weak generative capacity and the associated problems in generative theory⁷ is the following: our language use is restricted in such a way that what we actually observe and need to handle is only a trivial fragment of the language we "know" (we use "know" parallel to "language", to say that this "knowledge" is not a datum, but a theoretical construct). Therefore, matters of complexity in the sense of formal language theory, such as decidability, parsing efficiency etc., are trivialized: for the fragment we use, these matters are of no importance anyway. What in turn is important is to get the derivations right, in order to achieve explanatory adequacy (and get a semantics).

⁶ We ignore for the moment other generating/recognizing devices.

⁷ This only regards mainstream generativism; there is a tradition which is strongly concerned with both weak and strong generative capacity.

There is a good point in this; but from our perspective of the priority of epistemology, we would see things exactly the other way round: if something is empirically inaccessible (like the presumed complexity of language), there is no reason to assume its existence in the first place. The Chomskyan answer is of course: the reason for assuming its existence is precisely strong generative capacity and explanatory adequacy. Now from our point of view, accepting this argument, we can also go one step further. If we assume that questions of weak generative capacity are trivialized by language use, we might also ask: (i) is "language use" (in our terms: the observable language) restricted in a way such that it trivializes questions of weak generative capacity within a precise bound? And if this is the case, then (ii) might it not also trivialize questions of strong generative capacity? The first question is quite clear and has been addressed various times ([65], [20]). The second point needs some explanation.

There are two main reasons derivations are interesting to linguistic theory. The first one is: we usually want some kind of compositional semantics, and semantic representations canonically depend on derivations. The second one is less innocent: we usually formulate our theoretical requirements for explanatory adequacy on derivations, not on the associated strings. We have to explain what it means for a dataset/language to trivialize strong generative capacity. We start, however, by making the idea of trivialization precise for weak generative capacity.

Let $\mathbf{G}, \mathbf{G}'$ be classes of grammars; we write $\mathbf{G} \leq \mathbf{G}'$ iff the class of languages generated by grammars in $\mathbf{G}$ is a subclass of the class of languages generated by grammars in $\mathbf{G}'$.

Definition 2 Given two classes of grammars $\mathbf{G}, \mathbf{G}'$ with $\mathbf{G}' \leq \mathbf{G}$, and a class of languages $\mathcal{D}$, we say that $\mathcal{D}$ weakly trivializes $\mathbf{G}$ with respect to $\mathbf{G}'$ if for any $D \in \mathcal{D}$: if there is $G \in \mathbf{G}$ with $L(G) \supseteq D$, then there is $G' \in \mathbf{G}'$ with $L(G') \supseteq D$.

Now this is not a very strong notion: if for any alphabet $\Sigma$ there is a $G \in \mathbf{G}'$ such that $L(G) = \Sigma^*$, then any class of grammars is weakly trivialized with respect to $\mathbf{G}'$. A more adequate notion should consider both positive and negative data. As we will argue later on, the linguist is provided with both positive and negative data when performing his task. So assume we have a pair of sets $(D^+, D^-)$ such that $D^+ \cap D^- = \emptyset$ (here $D^+$ is the positive, $D^-$ the negative data). Now we say the following:

Definition 3 Given two classes of grammars $\mathbf{G}, \mathbf{G}'$ with $\mathbf{G}' \leq \mathbf{G}$, and a class of pairs of finite, disjoint languages $\mathcal{D}$, we say that $\mathcal{D}$ trivializes $\mathbf{G}$ with respect to $\mathbf{G}'$ if for any $(D^+, D^-) \in \mathcal{D}$: if there is $G \in \mathbf{G}$ with $L(G) \supseteq D^+$ and $L(G) \cap D^- = \emptyset$, then there is $G' \in \mathbf{G}'$ with $L(G') \supseteq D^+$ and $L(G') \cap D^- = \emptyset$.
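To make the quantifier structure of this definition concrete, the following minimal Python sketch may help. It rests on simplifying assumptions of our own: a grammar is represented by a mere membership predicate for its language, and the grammar classes are finite lists of such predicates, whereas the definition of course quantifies over arbitrary classes; the names `consistent` and `trivializes_on` are purely illustrative.

```python
# Illustrative sketch only: a "grammar" is represented by a membership
# predicate for its language L(G); grammar classes are finite lists of
# such predicates. Both are simplifying assumptions for this example.

def consistent(accepts, d_plus, d_minus):
    """True iff L(G) contains all of D+ and nothing from D-."""
    return all(accepts(x) for x in d_plus) and \
           not any(accepts(y) for y in d_minus)

def trivializes_on(data_pairs, class_G, class_G_prime):
    """Check the condition of Definition 3 over a finite set of data
    pairs: whenever some G in class_G fits (D+, D-), some G' in
    class_G_prime must fit it as well."""
    for d_plus, d_minus in data_pairs:
        if any(consistent(g, d_plus, d_minus) for g in class_G):
            if not any(consistent(g, d_plus, d_minus)
                       for g in class_G_prime):
                return False
    return True

# Tiny example: the finite language D+ itself fits the data pair.
d_plus, d_minus = {"ab", "aabb"}, {"ba"}
accepts_d_plus = lambda s: s in d_plus
print(consistent(accepts_d_plus, d_plus, d_minus))  # True
```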

This is a meaningful notion, which we have to explain briefly. By $\mathcal{D}$ we intend the class of observations we can make, and because observations are finite, we can assume that they always form a subset of the class of all pairs of disjoint finite languages. So we can take this class as an example, and it is in a sense the strongest case: for if $\mathcal{D}$ trivializes $\mathbf{G}$ wrt. $\mathbf{G}'$ and $\mathcal{D}' \subseteq \mathcal{D}$, then $\mathcal{D}'$ also trivializes $\mathbf{G}$ wrt. $\mathbf{G}'$. So assume $\mathcal{D}_{fin}$ is the class of all $(L, L')$ such that $L, L'$ are finite and $L \cap L' = \emptyset$. What trivialization results do we obtain? One obvious thing is the following: $\mathcal{D}_{fin}$ trivializes any class of grammars wrt. the class of regular grammars, for the simple reason that $D^+$ itself is a finite, hence regular, language which contains $D^+$ and is disjoint from $D^-$. Even smaller classes do the same: take the star-free languages, or even the co-finite languages (here $\Sigma^* \setminus D^-$ provides the witness). So if we go for trivialization, we finally end up with quite trivial grammars. Note, by the way, the connection with learning and Angluin's theorem (see [1]); we will elaborate on these notions in chapter 6.
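The two witness constructions just mentioned can be spelled out directly. The following sketch is again our own illustration; the co-finite language is represented lazily as a predicate, since it cannot be listed.

```python
# Witnesses for (D+, D-) from D_fin (illustrative sketch):
#   - the finite (hence regular, hence star-free) language D+ itself;
#   - the co-finite complement of D- in Sigma*.

def finite_witness(d_plus):
    """Regular witness: the finite language D+ itself."""
    return lambda s: s in d_plus

def cofinite_witness(d_minus):
    """Co-finite witness: the complement of D- in Sigma*."""
    return lambda s: s not in d_minus

d_plus, d_minus = {"ab", "aabb"}, {"ba", "bb"}
for accepts in (finite_witness(d_plus), cofinite_witness(d_minus)):
    assert all(accepts(x) for x in d_plus)        # L contains D+
    assert not any(accepts(y) for y in d_minus)   # L misses D-
print("both witnesses fit (D+, D-)")
```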

Weak trivialization is what, according to Chomsky, makes weak generative capacity uninteresting for linguistics, and one might follow him on this point.

But the same concept can be defined for strong generative capacity, though the definition requires a bit more work. When we start, we have to take some concept of structural description (SD) for granted (this notion is taken from LSLT; I will use it only here); and we denote by $SD(\vec{x})$ an SD which is associated with a string $\vec{x}$ (but note that there are possibly many SDs for a single string).

We now assume that grammars, rather than merely generating strings, generate strings associated with structural descriptions.

Definition 4 Given classes of grammars $\mathbf{G}, \mathbf{G}'$ with $\mathbf{G}' \leq \mathbf{G}$, and a class of pairs of disjoint languages $\mathcal{D}$, we say that $\mathcal{D}$ strongly trivializes $\mathbf{G}$ wrt. $\mathbf{G}'$ if for every $(D^+, D^-) \in \mathcal{D}$: if there is a $G \in \mathbf{G}$ such that $G$ assigns at least one SD to every $\vec{x} \in D^+$ and no SD to any $\vec{y} \in D^-$, then there is a grammar $G' \in \mathbf{G}'$ such that (i) $G'$ assigns an SD to every $\vec{x} \in D^+$ and no SD to any $\vec{y} \in D^-$, and (ii) there is a bijective map $\phi : S[D^+] \to S'[D^+]$, where $S[D^+]$ ($S'[D^+]$) is the set of structural descriptions which $G$ ($G'$) assigns to some $\vec{x} \in D^+$.
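As a minimal illustration of what conditions (i) and (ii) amount to for a single finite data pair, here is a Python sketch under simplifying assumptions of our own: a grammar is represented by a function from strings to the finite set of SDs it assigns, so that the existence of a bijection $\phi$ reduces to an equality of cardinalities.

```python
# Illustrative sketch of Definition 4 for one pair (D+, D-).
# Assumption for this example only: a grammar is a function mapping
# each string to the (finite) set of SDs it assigns to that string.

def sd_set(sds_of, data):
    """S[D+]: all SDs the grammar assigns to some string in the data."""
    return {sd for x in data for sd in sds_of(x)}

def strongly_consistent(sds_of, d_plus, d_minus):
    """(i): at least one SD for every x in D+, no SD for any y in D-."""
    return all(sds_of(x) for x in d_plus) and \
           not any(sds_of(y) for y in d_minus)

def bijection_exists(sds_of_G, sds_of_G_prime, d_plus):
    """(ii): for finite sets, a bijection phi exists iff the SD sets
    of G and G' over D+ have the same cardinality."""
    return len(sd_set(sds_of_G, d_plus)) == \
           len(sd_set(sds_of_G_prime, d_plus))

# G assigns one SD per grammatical string; G' assigns a primed copy.
G       = lambda s: {("SD", s)}  if s != "ba" else set()
G_prime = lambda s: {("SD'", s)} if s != "ba" else set()
d_plus, d_minus = {"ab", "aabb"}, {"ba"}
print(strongly_consistent(G, d_plus, d_minus),   # True
      bijection_exists(G, G_prime, d_plus))      # True
```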

Note that this definition is much more problematic than the first one, mainly because it is not always very clear what counts as a structural description. Take for example tree adjoining grammars (TAG). In most standard approaches to their semantics (see [28]), we do not interpret the derived tree, which usually counts as the structural description, but rather the derivation tree, which is a regular tree (contrary to the derived syntactic tree). Kobele ([35]) provided a semantics for minimalist grammars ([66], [50]) in a similar way. The reason this is possible is that the derivation tree encodes the syntactic tree. Therefore, $\phi$ just has to be an appropriate coding, which however might be difficult to find.
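To make the coding idea tangible in the simplest possible terms, consider the following toy construction of our own (far removed from an actual TAG or minimalist grammar): if the derived tree is obtained from the derivation tree by an evaluation function, and this function is injective on the derivations in question, then $\phi$ can be read off from that evaluation.

```python
# Toy illustration (our own, not an actual TAG): a derivation tree is a
# recipe built from "leaf" and "combine" operations; evaluating it
# yields the derived tree. When evaluation is injective on the relevant
# derivations, the coding phi is just its (partial) inverse.

def evaluate(deriv):
    """Map a derivation tree to the derived tree it encodes."""
    op, *args = deriv
    if op == "leaf":
        return args[0]                    # a lexical item
    left, right = args                    # op == "combine"
    return ("node", evaluate(left), evaluate(right))

deriv = ("combine", ("leaf", "a"),
                    ("combine", ("leaf", "b"), ("leaf", "c")))
print(evaluate(deriv))   # ('node', 'a', ('node', 'b', 'c'))
```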

Note, by the way, that we only need to encode the final SD, not all of its derivation steps, and that the coding needs to work only for strings in $D^+$ – the structural descriptions of other strings do not matter at this point.

A by now classical example of what we have here defined as strong trivialization was provided by GPSG: though GPSG grammars are context-free, they could handle phenomena of movement in much the same way as the much more powerful transformational grammars. They therefore trivialized the latter for what was considered to be "natural language" at that point. When this dataset was enlarged with data from Swiss German, GPSG grammars turned out to be inadequate to describe some regularities (in the sense of structural descriptions).

So the question of strong trivialization is actually a very interesting one. There has been notable work on the topic of coding and codability of properties within grammars, such as [37] and [57], which for reasons of space we can only mention.

The intuition behind the definition of trivialization is that, for the data we can observe, we trivialize a certain grammar if we can simulate the strong generative capacity of our desired grammar in terms of a weaker grammar. In this case, though we abandon the phrase structure hypothesis, all our intuitions can be captured and all our generalizations can be maintained (for the finite fragment we have observed!) within a weaker grammar. One might think of these arguments as essentially formal and non-linguistic; but if it is the case that our observable fragment of language trivializes the grammars which we take to be descriptively adequate, this would be a great insight into the structure of language. It would mean that the patterns of our language are used precisely in such a way that we can simulate complex patterns with a simple grammar.

This brings us back to our projection problem. The notion of trivialization allows us several choices. Either we go the Chomskyan way and say: we do not need to care about any complexity arguments, because our formalisms are trivialized by the data and by much weaker formalisms. But this choice has a somewhat unscientific flavor, after all. So we might rather want to go the other way and say: why should we need stronger formalisms at all, given the data we have and formalisms which (strongly) trivialize them, and which are therefore, empirically speaking, just as adequate? After all, scientific thought should lead us to choose the simplest hypothesis! The point is that the choice between the strong and the weak formalism is usually not based on the data we observe, but on the data we construct, which is, in the Chomskyan reasoning, exactly the more complex language. So this is another example where a question which is of presumably linguistic nature turns out to depend on projections and thus on linguistic metatheory. We will later on pursue the ideas we have sketched here in the setting of finitary linguistics.
