CATEGORY STRUCTURES

Gerald Gazdar
Cognitive Studies Programme, University of Sussex, Brighton BN1 9QN, U.K.

Geoffrey K. Pullum
Cowell College, University of California, Santa Cruz, Santa Cruz, California 95064, USA

Robert Carpenter, Ewan Klein
Centre for Cognitive Science, University of Edinburgh, Edinburgh EH8 9LW, U.K.

Thomas E. Hukari
Department of Linguistics, University of Victoria, Victoria, B.C., Canada V8W 2Y2

Robert D. Levine
Department of Linguistics, University of British Columbia, Vancouver, B.C., Canada V6T 1W5

This paper outlines a simple and general notion of syntactic category on a metatheoretical level, independent of the notations and substantive claims of any particular grammatical framework. We define a class of formal objects called "category structures" where each such object provides a constructive definition for a space of syntactic categories. A unification operation and subsumption and identity relations are defined for arbitrary syntactic categories. In addition, a formal language for the statement of constraints on categories is provided. By combining a category structure with a set of constraints, we show that one can define the category systems of several well-known grammatical frameworks: phrase structure grammar, tagmemics, augmented phrase structure grammar, relational grammar, transformational grammar, generalized phrase structure grammar, systemic grammar, categorial grammar, and indexed grammar. The problem of checking a category for conformity to constraints is shown to be solvable in linear time. This work provides in effect a unitary class of data structures for the representation of syntactic categories in a range of diverse grammatical frameworks. Using such data structures should make it possible for various pseudo-issues in natural language processing research to be avoided. We conclude by examining the questions posed by set-valued features and sharing of values between distinct feature specifications, both of which fall outside the scope of the formal system developed in this paper.

Copyright 1988 by the Association for Computational Linguistics. Permission to copy without fee all or part of this material is granted provided that the copies are not made for direct commercial advantage and the CL reference and this copyright notice are included on the first page. To copy otherwise, or to republish, requires a fee and/or specific permission. 0362-613X/88/010001-19$03.00 Computational Linguistics, Volume 14, Number 1, Winter 1988

The notion syntactic category is a central one in most grammatical frameworks. As Karttunen and Zwicky (1985) observe, traditional "parsing" as taught for languages like Latin involved little more than supplying a detailed description of the grammatical category of each word in the sentence to be parsed. Phrase structure grammars are entirely concerned with assigning terminal strings to categories and determining dominance and precedence between constituents on the basis of their categories. In a classical transformational grammar (TG), the objects transformations manipulate are primarily strings of syntactic categories (and, to a lesser extent, of terminal symbols). This is just as true of recent TG work. Although the use of syntactic categories is not a logical prerequisite of generative grammar (see Levy and Joshi (1978)), no linguistic approach known to us dispenses with them altogether. In view of this, it is perhaps surprising that linguists have not attempted to explicate the concept "syntactic category" in any general way, i.e., independently of particular systems of notation and the associated substantive assumptions about grammar.

In this paper we offer an explicit metatheoretical framework in which a notion of "syntactic category" receives a precise definition. The framework is intended to facilitate analysis and comparison of the underlying concepts of different theories, freed from the notational and sociological baggage that sometimes encumbers the original presentations in the literature. Viewed from the standpoint of implementation, it can be regarded as providing a unitary data structure for categories that can be used in the implementation of a number of superficially different grammatical frameworks.

We begin by defining in section 1 a space of categories broad enough to encompass the objects employed as syntactic categories in a range of diverse types of generative grammar. Then, in section 2, we present the syntax and semantics for $L_C$, a formal language for defining constraints on categories. In the succeeding section we provide illustrative definitions of the grammatical categories used in a number of frameworks. We cover simple phrase structure grammar in section 3.1; tagmemics in section 3.2; Harman's (1963) augmented phrase structure grammar in section 3.3; relational grammar and arc pair grammar in section 3.4; X-bar syntax, TG, and the government-binding (GB) framework in section 3.5; generalized phrase structure grammar (GPSG) in section 3.6; systemic grammar in section 3.7; categorial grammar in section 3.8; and Aho's (1968) indexed grammar in section 3.9. We then go on to consider some relevant computational complexity matters (section 4). Finally, we discuss two issues that do not arise in any of these approaches, and which fall outside the scope of the simple theory that we present, namely the use of sets as values of features (section 5) and values shared between distinct feature specifications (section 6).
These issues are important in the context of the category systems employed in functional unification grammar (FUG), lexical functional grammar (LFG), and the PATR II grammar formalism.

Our goal in this paper is not an empirical one, but rather one which is analogous to that of Montague's "Universal Grammar" (1970) (see Halvorsen and Ladusaw (1977) for a useful introduction) which attempts to give a general definition of the notion "possible language" in terms applicable to, but not limited to, the study of human languages. We have the much more modest goal of characterizing one rather simple and general notion of "possible syntactic category", and of exploring the range of linguistic approaches that it will generalize to, its formal properties, and its limitations. As will become evident below, our exercise is complementary in certain respects to that of Pereira and Shieber (1984) and Shieber (1987) and to recent work of Rounds and his associates on the development of a logic for the description of the notions of syntactic category that are embodied in functional unification grammar and PATR II (see Kasper and Rounds (1986), Moshier and Rounds (1987), Rounds and Kasper (1986)).

We do not concern ourselves with the appearance or representational details of a given theory of categories (or any of the other aspects of the linguistic framework in question, e.g., its rule system), but only with its underlying semantics: the issue of what set-theoretic (or other nonlinguistic) objects provide categories with their interpretation. We are content with being able to exhibit an isomorphism between one of the theories of categories permitted by our framework and the concrete example we are considering; we need not demonstrate identity. Hence we have deliberately refrained from specifying a formal language for representing categories and features. To the extent that we need to produce exemplificatory features or categories for inspection, we may use the conventional notation of the approach in question, or the ordinary notations of set theory, or an informal labeled graph notation introduced below, but we do not offer a representational formalism for categories that has a significance of its own.

In the framework we provide, it is possible to define the category systems of a wide variety of apparently very different approaches to natural language syntax simply by defining two primitive typing functions, and by varying the constraints stated on the categories that they induce. The exercise of expressing the content of various specific linguistic approaches in such terms immediately calls attention to certain interesting formal issues. For example, we reconstruct below the notion of a list-valued (or stack-valued) feature in terms of category-valued features, which automatically allows operations defined on categories such as unification to apply to lists without special redefinition.
An interesting fact that emerges from the view taken here is that on the matter of syntactic categories, there is somewhat more commonality among the diverse approaches currently being pursued than there appears to be when those approaches are viewed in the formalisms used by their practitioners. The various syntactic frameworks that we examine below can be seen to share a great deal of their underlying substantive claims about the information content of the category label of a constituent. Our explication of these underlying commonalities may make somewhat easier the task of the computational linguist attempting to implement a system on the basis of some grammatical framework, or attempting to decide which approach to implement in the first place.

In order to prepare for some of the definitions that follow, we will briefly and informally sketch some of our assumptions about features and categories and the terminology we shall use for talking about them. A category is a set of feature specifications meeting certain conditions to be defined below. A feature specification is an attribute-value pair $(f, v)$ where the attribute $f$ (the feature) is atomic (i.e., given by some finite list, and regarded as unanalyzable) and the value $v$ is either atomic or complex. Here we shall assume just one type of complex value, namely a category (but see below in section 5). An example of an atom-valued feature specification would be (SINGULAR, +) (which many grammarians would write as [+SINGULAR]); intuitively, it might mark singular number (though, of course, the interpretation it actually has depends on the role it plays in the grammar). An example of a complex feature specification, with a category as the value, would be (AGREEMENT, {(SINGULAR, +), (GENDER, FEM), (PERSON, 3)}); intuitively, it might be used to convey that the value of the AGREEMENT feature is a category representing the combination of singular number, feminine gender, and third person. In the following sections, we will always use SMALL CAPITALS for feature names, and we will generally replace "-" and "+", which are standard usage in the linguistic literature for the atomic values of a binary feature, by 0 and 1 respectively.

As we have said, a category is a set of feature specifications meeting certain conditions. We will now specify these. We do not require that every feature name be represented in each category, but we do require that each occurrence of a feature be paired with exactly one value in any set of specifications; thus {(SINGULAR, +), (SINGULAR, -)} could not be a category. Hence a category can be modeled as a partial function $C: F \to V$, where $F$ is a set of features and $V$ is the set of values. An equivalent alternative would be to treat categories as total functions into a range that includes an element $\bot$ that can stand as the value where the corresponding partial functions would fail to assign a value. Note that we use the term 'range' here, and subsequently, to refer to a set that includes all the values that a partial function or family of partial functions might take given appropriate domain elements, rather than just the set of values that it does take when we fix a particular domain for a function.
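
As a minimal illustration of the partial-function view, a category can be held in a nested mapping from feature names to values. The Python sketch below is not part of the formal system; the helper name `make_category` and the use of lists of pairs as input are assumptions made only for this example. It rejects any set of specifications that pairs one feature with two different values.

```python
# Hypothetical helper: build a category (a partial function from features
# to values) from a list of feature specifications (feature, value).
def make_category(specs):
    category = {}
    for feature, value in specs:
        # A value given as a list of specifications is itself a category.
        if isinstance(value, list):
            value = make_category(value)
        if feature in category and category[feature] != value:
            raise ValueError(f"feature {feature!r} paired with two values")
        category[feature] = value
    return category


# The AGREEMENT example from the text, with 1 standing in for "+".
agreement = make_category([
    ("AGREEMENT", [("SINGULAR", 1), ("GENDER", "FEM"), ("PERSON", 3)]),
])

# Features with no specification are simply absent from the mapping.
print(agreement["AGREEMENT"]["GENDER"])   # FEM
print("CASE" in agreement)                # False
```

Under this encoding the pair of specifications (SINGULAR, 1) and (SINGULAR, 0) raises an error rather than yielding a category.
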
It may be helpful to think of a category as having the structure of an unordered tree, and we will introduce a type of diagram below which exhibits this structure overtly. Often, however, the idea of categories as partial functions will be crucial, so it should be kept in mind throughout.

Since the set $V$ of values may include categories, the definition of the entire set of categories has to be given recursively. Moreover, it has to allow for the possibility that not all values are compatible with all features. Thus, for example, in a given feature system, (GENDER, 0) and (PERSON, plural) might be coherent objects but mnemonically perverse, whereas in another feature system, they might simply be ill-formed. We shall show how these issues can be resolved in the coming sections. We will not, however, give a constructive definition of the set of categories for each grammatical framework we consider. Instead, given our comparative and metatheoretical goals, it turns out to be more convenient to define a category system as a pair $(\Sigma, C)$ where $\Sigma$ is a category structure, which defines a set of potential categories (see section 1), and $C$ is a set of constraints expressed in $L_C$, a language for which the category structure defines the models (see section 2). The actual categories in the system are then to be construed as that subset of the potential categories defined in $\Sigma$, each member of which satisfies every constraint listed in $C$.

1 DEFINING CATEGORY STRUCTURES

In this section we define the notion of a category structure, which is basically a choice of primitives: a list of features, and a range of possible values for each. Here and throughout the paper we will frequently use "2" to denote the set $\{0, 1\}$ (the context will make it clear when "2" represents an integer and when it represents a set). We will write $A^B$ for the set of total functions from $B$ into $A$, $A^{(B)}$ for the set of partial functions from $B$ into $A$, $\mathcal{P}(A)$ for the power set of $A$, $|A|$ for the cardinality of $A$, and $\Delta(f)$ for the domain of a (partial) function $f$ (if $f$ is a partial function then $\Delta(f)$ is the set of items to which $f$ assigns a value).

A category structure $\Sigma$ is a quadruple $(F, A, \tau, \rho)$ where $F$ is a finite set of features, $A$ is a finite set of atoms, $\tau$ is a function in $2^F$, and $\rho$ is a function from $\{f \mid \tau(f) = 0\}$ into $\mathcal{P}(A)$. The function $\tau$ partitions $F$ into two sets: the set of type 0 features $F^0 = \{f \mid \tau(f) = 0\}$, and the set of type 1 features $F^1 = \{f \mid \tau(f) = 1\}$. We will write $\tau$ as $\tau_0$ when $F = F^0$. Type 0 features take atomic values and type 1 features take categories as values. The function $\rho$ assigns a range of atomic values to each feature of type 0.

The set of categories $K$ is recursively defined in terms of $(F, A, \tau, \rho)$, in a way very similar to that used in Pollard (1984, p. 299ff), though Pollard's assumptions differ on some important details. A relatively informal presentation will suffice here.
We will refer to the set of partial functions from $F^0$ into $A$ that are consistent with $\rho$ as the type 0 categories. We first define the set of pure type 0 categories of $\Sigma$ as those containing only type 0 feature specifications. Then we build up $K$ via a series of approximations we will refer to as levels, finally taking the infinite union of all the levels to obtain $K$ itself:

(1) a. $\emptyset$ is a category at level 0.
    b. If $\alpha$ is a type 0 category and $\beta$ is a category containing only type 1 features whose values are categories at level $n$, then $\alpha \cup \beta$ is a category at level $n + 1$.
    c. $K$ is the set of all categories at all levels $n \geq 0$.

Given the way $K$ is built up, the induction step in (1b) being restricted to union of finite partial functions, it should be clear that $K$ is a recursive set.

We can define certain relations and operations on the space $K$ of possible categories. Thus, we can give a constructive definition for unification (symbolized $\sqcup$) as a binary operation on categories.

(2) Definition: unification
    (i) if $(f, v) \in \alpha$ but $\beta(f)$ is undefined, then $(f, v) \in \alpha \sqcup \beta$;
    (ii) if $(f, v) \in \beta$ but $\alpha(f)$ is undefined, then $(f, v) \in \alpha \sqcup \beta$;
    (iii) if $(f, v_i) \in \alpha$ and $(f, v_j) \in \beta$ and $\tau(f) = 1$, then if $v_i \sqcup v_j$ is undefined, $\alpha \sqcup \beta$ is also undefined, else $(f, v_i \sqcup v_j) \in \alpha \sqcup \beta$;
    (iv) if $(f, v_i) \in \alpha$ and $(f, v_j) \in \beta$ and $\tau(f) = 0$, then if $v_i = v_j$, $(f, v_i) \in \alpha \sqcup \beta$, else $\alpha \sqcup \beta$ is undefined;
    (v) nothing else is in $\alpha \sqcup \beta$.

We can then use unification to define the subsumes relation between categories (where 'subsumes' means 'is more general/underspecified than', or 'is extended by'). We symbolize 'subsumes' with '$\sqsubseteq$', and define it as follows.

(3) Definition: subsumption
    $\alpha$ subsumes $\beta$ ($\alpha \sqsubseteq \beta$) if and only if $\beta = \alpha \sqcup \beta$.

Thus $\alpha$ subsumes $\beta$ if and only if $\beta$ is the unification of $\alpha$ and $\beta$. When $\alpha$ subsumes $\beta$ then we may refer to $\beta$ as an extension of $\alpha$. If $\alpha \sqcup \beta$ is undefined, then $\beta = \alpha \sqcup \beta$ fails, and $\alpha$ does not subsume $\beta$. From this it follows that, if $\alpha$ and $\beta$ are categories, then $\alpha = \beta$ if and only if $\alpha \sqsubseteq \beta$ and $\beta \sqsubseteq \alpha$. The following theorem is provable by induction on category levels.

(4) Theorem: $\alpha$ subsumes $\beta$ if and only if
    (i) $\forall f \in (\Delta(\alpha) \cap F^0)\,[\alpha(f) = \beta(f)]$ and
    (ii) $\forall f \in (\Delta(\alpha) \cap F^1)\,[\alpha(f) \sqsubseteq \beta(f)]$.

2 THE CONSTRAINT LANGUAGE $L_C$

We now provide an interpreted formal language, $L_C$, for expressing specific constraints on categories. Constraints are statements that can be true or false of a category. By requiring satisfaction of the constraint, a constraint can be used to delimit a subspace within the set $K$ induced by a given category structure $\Sigma$, to serve as the grammatical categories for a particular type of grammar. It should be noted that our goals in formulating $L_C$ are slightly different from those of Rounds and his associates: $L_C$ is a language for formulating constraints on well-formed categories, not a language whose expressions are intended for use in place of categories. To put it rather crudely, our language is for category definition whereas Rounds' is (in part) for category manipulation. However, the languages look rather similar syntactically, and where they overlap, the semantics is essentially the same.

We define two types of constraint: basic and complex. If $f$ is an element of $F$, and $a$ is an element of $A$, then there are just two distinct types of well-formed basic constraint:

(5) a. $f$
    b. $f{:}a$ (where $\tau(f) = 0$)

Informally, (5a) constrains a category to contain some specification for the feature $f$; thus, the constraint "BAR" says that every syntactic category satisfying it has as one of its elements a pair (BAR, $n$). This does not entail that every value of every category-valued feature contained in the category must contain BAR; a basic constraint applies to the "top level" of the tree-like structure of a category. Likewise, (5b) says of a category satisfying it that it has as one of its elements the pair $(f, a)$. Note that the only thing a basic constraint can require of a type 0 feature beyond saying that it must be present (defined) is that it have a particular atomic value, and that a basic constraint cannot require anything of a type 1 feature at all beyond demanding its presence.

Turning to complex constraints, we now continue the list (5), giving the syntax for each type of complex constraint together with an informal indication of its semantics. Assume that $f$ is an element of $F^1$, and $\phi$ and $\psi$ are themselves well-formed basic or complex constraints, and that we are considering the interpretation of the constraints with respect to some fixed category structure $\Sigma$ and some category $\alpha$.

(5) c. $f{:}\phi$   '$f$ is defined in $\alpha$ and its value satisfies $\phi$'
    d. $\neg\phi$   '$\alpha$ does not satisfy $\phi$'
    e. $\phi \lor \psi$   '$\alpha$ satisfies either $\phi$ or $\psi$'
    f. $\phi \land \psi$   '$\alpha$ satisfies both $\phi$ and $\psi$'
    g. $\phi \to \psi$   'either $\alpha$ does not satisfy $\phi$ or $\alpha$ does satisfy $\psi$'
    h. $\phi \leftrightarrow \psi$   '$\alpha$ satisfies either both or neither of $\phi$ and $\psi$'
    i. $\Box\phi$   '$\alpha$ satisfies $\phi$, and all values of type 1 features in $\alpha$ satisfy $\Box\phi$'
    j. $\Diamond\phi$   'either $\alpha$ satisfies $\phi$ or some value of a type 1 feature in $\alpha$ satisfies $\Diamond\phi$'

Constraints of the forms (5a) through (5h) are fairly straightforward, but constraints like those shown in (5i) and (5j) need a little more discussion. They introduce modality into our language. Their purpose is to allow for recursive constraints to be imposed on successively embedded layers of category values. As indicated, a category $\alpha$ satisfies $\Box\phi$ provided that, firstly, $\alpha$ satisfies $\phi$ and secondly, whenever $\alpha$ assigns a category $\beta$ to a type 1 feature $f$, $\beta$ satisfies $\Box\phi$. This may appear to introduce a circularity, but it does not: categories are finite, and within any category there will be a level so deeply embedded in the tree structure that there are no more category values within it; at that point $\Box\phi$ is true if $\phi$ is, thus ending the recursion. Our choice of notation in (5i) is quite deliberate: in effect, constraints of the form (5i) express universal quantification over embedded 'accessible' categories in the way that the familiar necessity operator $\Box$ of modal logic enforces universal quantification over accessible worlds in the standard semantics. The possibility operator in (5j) is, as usual, the dual of the necessity operator: $\Diamond\phi$ says of a category $\alpha$ satisfying it that either $\alpha$ satisfies $\phi$, or there exists a category-value $\beta$ assigned to a type 1 feature $f$ by $\alpha$ such that $\beta$ satisfies $\Diamond\phi$.

As a simple example of the sort of work a complex constraint in $L_C$ might do in a grammatical theory, consider the constraint that is known as the "Case filter" in recent TG (see Chomsky 1980, p. 25). Stated informally as "*N, where N has no Case", the constraint appears to require every occurrence of the feature complex characterizing the category N, i.e., every occurrence of [+N, -V], to co-occur with a feature called "Case". The constraint can be stated in $L_C$ as (6).

(6) $\Box(((\mathrm{N}{:}1) \land (\mathrm{V}{:}0)) \to \mathrm{CASE})$

Here and from now on, we use parentheses in the obvious way wherever it is necessary to prevent ambiguity in the statement of constraints. The account of $L_C$ given thus far will suffice for a reading of this paper, but those readers who would like to see the semantics given more formally may turn to the appendix.

To recapitulate, a theory of categories $\Theta$ in our sense is a pair $(\Sigma, C)$, where $\Sigma$ is a category structure and $C$ is a set of sentences of $L_C$. The set of categories determined by $\Theta$ is the maximal subset $K_C$ of $K$ determined by $\Sigma$ such that each member of $K_C$ satisfies every member of $C$.
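
The operations of section 1 and the constraint language of section 2 can be tried out on small examples. The Python sketch below assumes that categories are nested dictionaries and that a value is a category exactly when it is a dictionary; it does not check $\tau$ or $\rho$. The tuple encoding of constraints and the feature name ARG are hypothetical choices made for illustration, and an undefined unification is signalled by `None`.

```python
def unify(a, b):
    """Unification following (2); returns None where it is undefined."""
    result = dict(a)                          # clause (i): specs only in a
    for f, vb in b.items():
        if f not in a:
            result[f] = vb                    # clause (ii): specs only in b
        elif isinstance(a[f], dict) and isinstance(vb, dict):
            sub = unify(a[f], vb)             # clause (iii): category values
            if sub is None:
                return None
            result[f] = sub
        elif a[f] != vb:
            return None                       # clause (iv): atomic values clash
    return result


def subsumes(a, b):
    """Definition (3): a subsumes b iff b is the unification of a and b."""
    return unify(a, b) == b


def satisfies(cat, c):
    """Informal semantics of (5a)-(5j) for constraints encoded as tuples."""
    if isinstance(c, str):                    # (5a) f
        return c in cat
    op = c[0]
    if op == "is":                            # (5b) f:a
        return c[1] in cat and cat[c[1]] == c[2]
    if op == "in":                            # (5c) f:phi
        value = cat.get(c[1])
        return isinstance(value, dict) and satisfies(value, c[2])
    if op == "not":
        return not satisfies(cat, c[1])
    if op == "or":
        return satisfies(cat, c[1]) or satisfies(cat, c[2])
    if op == "and":
        return satisfies(cat, c[1]) and satisfies(cat, c[2])
    if op == "implies":
        return (not satisfies(cat, c[1])) or satisfies(cat, c[2])
    if op == "iff":
        return satisfies(cat, c[1]) == satisfies(cat, c[2])
    embedded = [v for v in cat.values() if isinstance(v, dict)]
    if op == "box":                           # (5i)
        return satisfies(cat, c[1]) and all(satisfies(v, c) for v in embedded)
    if op == "diamond":                       # (5j)
        return satisfies(cat, c[1]) or any(satisfies(v, c) for v in embedded)
    raise ValueError(f"unknown constraint {c!r}")


# The Case filter (6) in the tuple encoding assumed here.
CASE_FILTER = ("box", ("implies", ("and", ("is", "N", 1), ("is", "V", 0)), "CASE"))

with_case = {"N": 1, "V": 0, "CASE": "ACC"}
without_case = {"N": 1, "V": 0}
embedded_noun = {"N": 0, "V": 1, "ARG": {"N": 1, "V": 0}}   # ARG is hypothetical

print(unify({"N": 1}, {"V": 0}))              # {'N': 1, 'V': 0}
print(unify({"N": 1}, {"N": 0}))              # None
print(subsumes({"N": 1}, with_case))          # True
print(satisfies(with_case, CASE_FILTER))      # True
print(satisfies(without_case, CASE_FILTER))   # False
print(satisfies(embedded_noun, CASE_FILTER))  # False
```

The last call fails because the modal operator carries the implication down into the value of ARG, which is a caseless [+N, -V] category even though the top level is not.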

3 ILLUSTRATIVE APPLICATIONS

We will now illustrate the application of the apparatus developed thus far by reconstructing the category systems used in a number of well-known grammatical frameworks that linguists have developed, most of them frameworks that have been used in natural language processing systems at one time or another.

3.1 SIMPLE PHRASE STRUCTURE GRAMMAR

The case of simple phrase structure grammar is trivial, but will serve as an introduction to the form of later sections, and as a straightforward example of the use of a type 0 feature. The set of categories used in a simple phrase structure grammar is just some finite set of atomic categories $\{a_1, \ldots, a_n\}$, for example, {S, NP, VP, Det, N, V}. So we fix values for $F$, $A$, $\tau$, and $\rho$ as in (7):

(7) a. $F = \{\mathrm{LABEL}\}$
    b. $A = \{a_1, \ldots, a_n\}$
    c. $\tau_0$
    d. $\rho = \{(\mathrm{LABEL}, \{a_1, \ldots, a_n\})\}$

Thus, for example, we might have $A$ = {S, NP, VP, Det, N, V}, and thus have $\rho(\mathrm{LABEL})$ as the same set. In addition, we need the following constraint, to make sure that every category does indeed have a specification for the solitary type 0 feature LABEL, i.e., to exclude the empty set from counting as a category:

(8) LABEL
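
The effect of constraint (8) on the category structure (7) can be checked by brute force. The following Python sketch, in which the function name `potential_categories` and the `None` placeholder are assumptions made for illustration, enumerates every partial function licensed by $\rho$ and then keeps those that define LABEL.

```python
from itertools import product


def potential_categories(rho):
    """All partial functions from type 0 features into their rho ranges."""
    features = sorted(rho)
    # None marks a feature that is left undefined in a given category.
    choices = [[None] + sorted(rho[f]) for f in features]
    for combo in product(*choices):
        yield {f: v for f, v in zip(features, combo) if v is not None}


A = {"S", "NP", "VP", "Det", "N", "V"}
rho = {"LABEL": A}                                   # (7d), with A as in the example

potential = list(potential_categories(rho))
admitted = [c for c in potential if "LABEL" in c]    # constraint (8)

print(len(potential))   # 7: the six labelled categories plus the empty set
print(len(admitted))    # 6
```

Without (8) the empty set would count as a seventh category; with it, the admitted categories correspond one-to-one with the atoms in $A$.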

Obviously, we can now show that the category inventory for any simple phrase structure grammar is representable. We let $\theta$ be the bijection defined by $\theta(a_i) = \{(\mathrm{LABEL}, a_i)\}$, and the result is immediate. Thus there is a bijection from the set of simple phrase structure grammar categories to the categories admitted by the category structure (7) under the constraint (8). As is evident, the set of categories induced is finite, and of cardinality $n = |A|$.

3.2 TAGMEMICS

It may be that there are more published syntactic analyses of languages in the framework of tagmemics than in any other theoretical framework ever developed. Since the early 1960s, those who have followed the work of Kenneth Pike, including a very large number of field linguists working for the Summer Institute of Linguistics, have produced analyses of hundreds of languages, mostly non-Indo-European. Moreover, Postal (1964, p. 33) remarks that "these languages are, for the most part, exotic enough so that the tagmemic descriptions of them may very well be the only ones done."

Tagmemics describes syntactic structure in terms of TAGMEMES, which are notated in the form A:b, where A is said to represent a SLOT and b a FILLER. For example, Elson and Pickett (1962) represent (part of) the structure of English prepositional phrases and intransitive clauses with tagmemic formulae (i.e., rules) similar to the following (we simplify very slightly):

(9) a. LraPhr = +R:prep +A:mNc
    b. mNc = +Lim:ar ±M:aj +H:nc
    c. iCl = +S:mNc +iP:v3

The informal explication of these is: (9a) one type of location relater-axis phrase consists of an obligatory relater slot filled by a preposition followed by an obligatory axis slot filled by a modified count noun phrase; (9b) one type of modified count noun phrase consists of an obligatory limiter slot filled by an article followed by an optional modifier slot filled by an adjective followed by an obligatory head slot filled by a count noun; (9c) one type of intransitive clause consists of an obligatory subject slot filled by a modified count noun phrase followed by an obligatory intransitive predicate slot filled by a verb of class 3.

Thus the left hand side of a formula (before the equality sign) consists of an atomic label, and the right hand side is a string of tagmemes, which are ordered triples $(a, b, c)$ where $a$ is an indication of optional (±) or obligatory (+) status, $b$ is a slot or function name, and $c$ is a filler or category label.

One way of representing tagmemes in our terms is to employ a type 0 feature bearing the slot name, taking as value an atomic label identifying the filler. Thus we set up correspondences like the following:

(10) a. R:prep    {(R, prep)}
     b. A:mNc     {(A, mNc)}
     c. Lim:ar    {(LIM, ar)}
     d. M:aj      {(M, aj)}
     e. H:nc      {(H, nc)}
     f. S:mNc     {(S, mNc)}
     g. P:v3      {(P, v3)}

Left hand sides of formulae can be seen as implicit schematizations over slot names. For example, (9b) says that for any slot name $\sigma$, a constituent labelled $\{(\sigma, \mathrm{mNc})\}$ may have the immediate constituent analysis seen on the right hand side of the equation. A category structure representing a set of categories including all those seen in the above illustrative examples is given in (11).

(11) a. $F$ = {R, A, LIM, M, H, S, P}
     b. $A$ = {LraPhr, prep, mNc, ar, aj, nc, v1, v2, v3}
     c. $\tau_0$
     d. $\rho$ = {(R, {prep}), (A, {mNc}), (LIM, {ar}), (M, {aj}), (H, {nc}), (S, {mNc}), (P, {v1, v2, v3})}

This artificially tiny fragment does not show much of the structure that would be revealed in a larger fragment, with more word classes and phrase types, but it will suffice to show how we could set up a category structure that provided isomorphic correspondents to the categories employed in a tagmemic description. Moreover, there is an unclarity about whether there is more to a tagmemic formula than has been illustrated here; as discussed by Postal (1964), there are some remarks about the treatment of agreement in Elson and Pickett (1962) that imply either finite schematization or additional representational devices of an unclarified sort. We will not explore this topic here.

Postal (1964) is probably right in saying that tagmemics appears to be only notationally distinct from context-free phrase structure grammar. Longacre (1965) claims that "[b]y bringing together function and set in the tagmeme" tagmemics ensures that "function is at once kept in focus and made amenable to formal analysis." Under our reconstruction, "functions" like "subject" or "modifier" are "made amenable to formal analysis" simply by incorporating them into the feature structure of categories, making it clear that little was at stake in the debate between Postal and Longacre over the content of tagmemics. It is clear that the number of categories defined by a category structure for tagmemics will be bounded from above by $|F| \cdot |A|$, and thus finite. The question of whether tagmemics reduces to context-free grammar therefore turns on whether tagmemic formulae can in all cases be reduced to context-free rules. This seems likely, but such issues are not the focus of our attention in this paper.
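
The correspondences in (10) and the ranges in (11d) can be encoded directly. The Python sketch below is illustrative only: the function names are hypothetical, upper-casing the slot name to obtain the feature name is an assumption based on how (10) is written, and the final count assumes that each tagmemic category is a single slot-filler pair.

```python
# The ranges of (11d) and the atoms of (11b).
RHO = {
    "R": {"prep"}, "A": {"mNc"}, "LIM": {"ar"}, "M": {"aj"},
    "H": {"nc"}, "S": {"mNc"}, "P": {"v1", "v2", "v3"},
}
ATOMS = {"LraPhr", "prep", "mNc", "ar", "aj", "nc", "v1", "v2", "v3"}


def tagmeme_to_category(tagmeme):
    """Map 'Slot:filler' to the one-specification category shown in (10)."""
    slot, filler = tagmeme.split(":")
    feature = slot.upper()            # assumption: feature names are capitalized slots
    if filler not in RHO.get(feature, set()):
        raise ValueError(f"{filler!r} is not in the range of {feature!r}")
    return {feature: filler}


def parse_item(item):
    """Split '+R:prep' or '±M:aj' into (obligatory?, category)."""
    sign, tagmeme = item[0], item[1:]
    if sign not in ("+", "±"):
        raise ValueError(f"unknown status marker {sign!r}")
    return sign == "+", tagmeme_to_category(tagmeme)


rhs = "+Lim:ar ±M:aj +H:nc"           # right hand side of (9b)
print([parse_item(item) for item in rhs.split()])
# [(True, {'LIM': 'ar'}), (False, {'M': 'aj'}), (True, {'H': 'nc'})]

# Single-specification categories licensed by rho, against the bound |F| * |A|.
singletons = sum(len(atoms) for atoms in RHO.values())
print(singletons, len(RHO) * len(ATOMS))   # 9 63
```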

3.3 HARMAN'S AUGMENTED PHRASE STRUCTURE GRAMMAR

Harman (1963) presents a proposal that involves augmenting the ordinary category inventory (S, NP, VP, etc.) of simple phrase structure grammar by attaching "an unordered sequence of zero or more (up to N for some finite N) subscripts" to a category. Abbreviatory conventions are then used to manage large sets of rules over the resultant vocabulary. Note that the indices stand for the members of a set rather than a sequence, and that there is only a finite number of them. To formalize Harman's proposal in the present framework, we again use LABEL as the feature that identifies major syntactic categories in the traditional sense, and we set up a finite number of type 0 features $F_1, \ldots, F_n$ to correspond to the presence (value 1) or absence (value 0) of each of the $n$ different subscripts. The set of feature specifications for these features reconstructs the characteristic function of the set of indices. The category structure is as follows:

(12) a. $F = \{\mathrm{LABEL}, F_1, \ldots, F_n\}$
     b. $A = \{a_1, \ldots, a_m\} \cup 2$
     c. $\tau_0$
     d. $\rho = \{(\mathrm{LABEL}, \{a_1, \ldots, a_m\}), (F_1, 2), \ldots, (F_n, 2)\}$

We now have to guarantee that every category has a value for LABEL and a value for each $F_i$ in $F$. We therefore impose the following constraint:

(13) $\mathrm{LABEL} \land F_1 \land \ldots \land F_n$
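
A category admitted by (12) and (13) is fixed by a label together with a set of subscripts. The Python sketch below, with hypothetical function names and a toy inventory of three labels and two subscripts, builds the characteristic function of the subscript set and enumerates all admitted categories.

```python
from itertools import combinations


def harman_category(label, subscripts, all_subscripts):
    """Category with LABEL plus a 0/1 value for every subscript feature."""
    cat = {"LABEL": label}
    for s in all_subscripts:
        cat[s] = 1 if s in subscripts else 0   # characteristic function
    return cat


def all_harman_categories(labels, all_subscripts):
    """Every category satisfying (13): one label, any subset of subscripts."""
    cats = []
    for label in labels:
        for k in range(len(all_subscripts) + 1):
            for chosen in combinations(all_subscripts, k):
                cats.append(harman_category(label, set(chosen), all_subscripts))
    return cats


LABELS = ["S", "NP", "VP"]      # m = 3, illustrative only
SUBSCRIPTS = ["F1", "F2"]       # n = 2, illustrative only

print(harman_category("NP", {"F2"}, SUBSCRIPTS))
# {'LABEL': 'NP', 'F1': 0, 'F2': 1}

cats = all_harman_categories(LABELS, SUBSCRIPTS)
print(len(cats))                # 12, which is 3 * 2**2
```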

The resultant specification induces a finite set of categories, of cardinality $m \cdot 2^n$. Harman's system is more than just a historical curiosity: more recent work uses almost exactly the same sort of syntactic categories. For example, the use made of syntactic features in one influential variety of augmented phrase structure grammar, the Prolog-based definite clause grammar (DCG) formalism of Pereira and Warren (1980), closely resembles that of Harman. However, it is clear that the full power of the DCG formalism can, in principle, be used to exploit features with structured values and value-sharing (see section 6 on the latter).

3.4 RELATIONAL AND ARC PAIR GRAMMAR

Relational grammar (RG; Perlmutter and Postal (1977)) and arc pair grammar (APG; Johnson and Postal (1980), henceforth J&P) appear to make relatively little use of grammatical category information, expressing most grammatical rules as conditions on arcs representing grammatical relations between nodes (in RG) or as conditions on relations between such arcs (in APG) rather than on the labeling of nodes. Nonetheless, J&P make clear that nodes are assigned grammatical category labels in APG, and since APG is essentially a formalized elaboration of RG ideas, we will assume that much the same is true in RG, though the RG literature so far has not made such aspects of the approach explicit. Syntactic category labels are not entirely without utility in RG and APG, since, for example, agreement rules crucially make reference to categorial properties like number, gender, and person, and the proper formulation of agreement rules has been a topic of some interest in RG and APG research.

As defined in J&P, an arc is an ordered pair $(R(a, b), c_1 \ldots c_k)$ where $R(a, b)$ indicates that $b$ (the second or head node) bears the grammatical relation named by the "relational sign" $R$ to $a$ (the first or tail node), and $c_1$ through $c_k$ are the representational strata (Ladusaw (1985)) at which this holds. In APG, categories are assigned to nodes by means of arcs in which the relational sign is L; such arcs are referred to as L arcs. The head of an L arc is simply an atomic label from a set of "grammatical category nodes" (called GNo by J&P) that is given by listing. Two types of grammatical category are recognized in APG: major categories such as Cl (clause), Nom (nominal), and V (verb), and minor categories such as Feminine, Singular, Third-Person, etc. A general constraint (Pair Network Law 31, the Major Category Exclusiveness Law) prevents a node from being the tail of two distinct arcs with heads in the set Major (J&P, 202), i.e., the set of grammatical category nodes that represent major categories. We can obtain the effect of this law simply by assuming a type 0 feature LABEL which takes values in the set of Major categories.

In the case of minor categories, APG permits multiple atomic elements from GNo to be attached by L arcs to a single tail node (J&P). Thus a node might be the tail of L arcs whose head nodes are the atoms Nom, Feminine, Singular, and Third-Person, representing a third person singular feminine noun or noun phrase. It is easy to represent such sets of labels attached to a single node using type 0 features.
We can represent the set of elements of GNo assigned to a given tail node by including a category corresponding to the characteristic function of that set, as with the indices in Harman's system. So we fix values for F, A, τ, and ρ as shown in (14):

(14) a. F = {LABEL, F1, ..., Fn}
b. A = {a1, ..., am} ∪ 2
c. τ0
d. ρ = {⟨LABEL, {a1, ..., am}⟩, ⟨F1, 2⟩, ..., ⟨Fn, 2⟩}

Here Major = {a1, ..., am}, and GNo = {F1, ..., Fn} ∪ {a1, ..., am}. The constraint needed is the following:

(15) F1 ∧ ... ∧ Fn

This has the effect of requiring every category to include the characteristic function of a set (of minor categories, in the APG sense). However, we do not need to guarantee that every category has a specification for LABEL, as J&P specifically leaves it open whether there are nonterminal nodes with no associated grammatical categories; the absence of any grammatical category node will be reconstructed in our terms as the function that is undefined for LABEL and assigns 0 to each Fi ∈ F.

It can be shown that the category system just defined adequately represents category labelling in APG, in the sense that there exists a bijection θ between (a) nonterminal nodes together with their grammatical category L arcs in an admissible APG syntactic representation and (b) admissible categories induced by the category structure in (14) and the constraint in (15). From an arbitrary well-formed APG pair network we can extract the set X of arcs it contains (J&P), and the set N of nodes associated with X. Since we are not concerned with coordinates, we can discard the coordinate sequences and consider just the incomplete arcs to which the arcs in X correspond. By Theorem 1 (J&P), all and only the terminal nodes in N are heads of L arcs. Extracting just the arcs with terminal nodes as heads gives us the set of L arcs from X; and discarding those with heads not in GNo gives us just the L arcs with grammatical category labels as their heads. The members of this set can be partitioned into equivalence classes having the same tail node (since by definition no arc has more than one tail). For convenience of reference we can call these equivalence classes category-labelled nodes.

Theorem. There is a bijection from APG category-labelled nodes to categories admitted by (14) and (15).

Proof. Consider an arbitrary category-labelled node K with tail n. By PN Law 31, the Major Category Exclusiveness Law, at most one arc in K has a head which is in Major. Let θ1 be the bijection established by θ1(L(n, a)) = a, and let θ2 be the bijection established by θ2(a) = ⟨LABEL, a⟩ iff a ∈ Major and ⟨a, 1⟩ otherwise. The category corresponding to K will be the smallest set that contains θ1θ2(A) for all arcs A in K and contains ⟨Fi, 0⟩ for all Fi in F that are not in the range of θ1. Since θ1 and θ2 are bijections, their product θ1θ2 is a bijection. The correspondence in the opposite direction is obvious.

A node that is the tail of no L arcs will be mapped by θ1θ2 to the function, described above, that is undefined for LABEL and assigns 0 to each Fi, and other nodes will be mapped onto categories in which the values of the features record the details of the category-labelling L arcs in K together with (redundantly) information about which one is the major category, the mapping yielding a unique result in each case.11

The set of APG (and, we assume, RG) categories induced is finite, and ceteris paribus is of cardinality m · 2^n; it will be much smaller once further conditions on cooccurrence of minor categories are imposed (Masculine and Feminine presumably cannot both be mapped to 1 in a category, for example). It is of interest that despite the utterly different grammatical formalism and theoretical background associated with it, the APG notion of syntactic category can be seen to be almost identical to that of Harman's augmented phrase structure grammar, nodes without LABEL values contributing the only relevant difference.
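The mapping from a category-labelled node to a category can be sketched in Python as follows. This is only an illustration under stated assumptions: categories are modelled as dicts, the function names `apg_category` and `apg_labels` are hypothetical, and the small MAJOR and MINOR inventories merely reuse label names mentioned in the text rather than J&P's full GNo.

```python
# Illustrative inventories only; J&P's actual GNo is given by listing.
MAJOR = {"Cl", "Nom", "V"}
MINOR = {"Feminine", "Singular", "Third-Person"}


def apg_category(labels, major=MAJOR, minor=MINOR):
    """Map the set of GNo labels attached to one tail node to a category.

    Minor categories become a characteristic function (1 = present, 0 = absent);
    the single major category, if any, becomes the value of LABEL.
    """
    labels = set(labels)
    unknown = labels - major - minor
    if unknown:
        raise ValueError(f"labels outside GNo: {sorted(unknown)}")
    majors = labels & major
    if len(majors) > 1:
        raise ValueError("Major Category Exclusiveness Law violated")
    category = {f: int(f in labels) for f in minor}
    if majors:
        category["LABEL"] = next(iter(majors))
    return category


def apg_labels(category):
    """Inverse direction: recover the label set from a category."""
    labels = {f for f, v in category.items() if f != "LABEL" and v == 1}
    if "LABEL" in category:
        labels.add(category["LABEL"])
    return labels


node = {"Nom", "Feminine", "Singular", "Third-Person"}
print(apg_category(node))
print(apg_labels(apg_category(node)) == node)
```

A node with no L arcs maps to the category that leaves LABEL undefined and assigns 0 to every minor feature, which is why the round trip also works for the empty label set.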

3.5 X-BAR SYNTAX, TRANSFORMATIONAL GRAMMAR, GOVERNMENT-BINDING

In the great majority of contemporary works in transformational grammar (TG), including those representing what is known as "government-binding" (GB) (Chomsky 1981), the conception of grammatical categories follows what is called "the X-bar convention" (Jackendoff 1974; Hornstein 1977) or "X-bar syntax". ("X-bar" is often notated X̄ or X', or as X^1, X^2, etc., the superscript numeral denoting the number of bars or bar level.) The central idea of X-bar syntax is that phrasal categories are "projected" from lexical categories. Given a lexical category X, the related phrasal nodes are assumed to be X̄ (= X' = X^1), X̿ (= X'' = X^2), and so on. Representing phrasal categories as founded on lexical categories in this way amounts to treating categories as non-atomic, the distinction between lexical categories and the various levels of phrasal category being tantamount to a feature specification distinction.

Bar level is not treated in terms of features in most works using X-bar notation, probably because of the tradition in TG (and related work in segmental phonology) restricting features to the values {−, +}. Thus Bresnan (1975) treats categories as ordered pairs ⟨i, M⟩ where i is a natural number representing the bar level and M is a matrix of feature specifications, and the same formalization is used by Lasnik and Kupin (1977). Here we simply integrate bar level information with the rest of the feature system. Although the origins of the X-bar proposal (Harris 1951) do not take such a feature analysis of categories any further, but treat lexical categories as atomic, it is always assumed in current instantiations of X-bar syntax that lexical categories themselves have a feature analysis. In much TG, it is presupposed that the lexical categories N, A, V, and P are to be analyzed in terms of two binary features N and V.1 Lasnik and Kupin (1977) is a fairly explicit formulation of this type of category system. They assume a maximum bar level of three.
To characterize their system of categories, we fix our values for F, A, τ, and ρ as in (16), and impose the constraint in (17).

(16) a. F = {N, V, BAR}
b. A = {0, 1, 2, 3}
c. τ0
d. ρ = {⟨N, 2⟩, ⟨V, 2⟩, ⟨BAR, A⟩}

(17) N ∧ V ∧ BAR

This yields a system of 16 categories, four at each bar level.
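The count of 16 can be checked by enumerating the categories directly. The sketch below assumes categories are modelled as Python dicts and that 2 stands for the value set {0, 1}; because (17) requires all three features to be specified, every admissible category is a total function on F.

```python
from itertools import product

# Value ranges from (16d): N and V range over 2 = {0, 1}, BAR over A = {0, 1, 2, 3}.
RANGES = {"N": (0, 1), "V": (0, 1), "BAR": (0, 1, 2, 3)}


def admissible_categories(ranges):
    """Enumerate every category that specifies all features, as (17) demands."""
    names = list(ranges)
    return [dict(zip(names, values)) for values in product(*(ranges[f] for f in names))]


categories = admissible_categories(RANGES)
per_bar_level = {n: sum(1 for c in categories if c["BAR"] == n) for n in RANGES["BAR"]}
print(len(categories), per_bar_level)
```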


Jackendoff (1977) proposes a version of X-bar syntax in which lexical categories are distinguished from one another by means of the features [±SUBJ], [±OBJ], [±COMP], and [±DET] rather than by [±N] and [±V]. He does not provide an explicit definition of his full set of categories, but he gives enough detail for it to be deducible. To define Jackendoff's system of categories, we fix our values for F, A, τ, and ρ in the manner shown below:

(18) a. F = {SUBJ, COMP, DET, OBJ, BAR}
b. A = {0, 1, 2, 3}
c. τ0
d. ρ = {⟨SUBJ, 2⟩, ⟨COMP, 2⟩, ⟨DET, 2⟩, ⟨OBJ, 2⟩, ⟨BAR, A⟩}
To get the exact set of permissible categories, we need to make sure that SUBJ, OBJ, COMP, and BAR are defined in all categories, and that DET is only specified in [-COMP], [-OBJ] categories. The following set of L_C constraints will achieve this.

(19) a. SUBJ ∧ OBJ ∧ COMP ∧ BAR
b. DET → ((COMP:0) ∧ (OBJ:0))

We can now obtain a bijection between Jackendoff's X-bar categories and the admissible categories induced by F, A, and the constraints listed in (19). We define a mapping θ between Jackendoff's own category abbreviations and the admissible categories with respect to (19a) and (19b), as follows (we schematize by writing X with n bars as X^n, 0 ≤ n ≤ 3):

(20) a. θ(V^n) = {⟨SUBJ, 1⟩, ⟨OBJ, 1⟩, ⟨COMP, 1⟩, ⟨BAR, n⟩}
b. θ(M^n) = {⟨SUBJ, 1⟩, ⟨OBJ, 1⟩, ⟨COMP, 0⟩, ⟨BAR, n⟩}
c. θ(P^n) = {⟨SUBJ, 0⟩, ⟨OBJ, 1⟩, ⟨COMP, 1⟩, ⟨BAR, n⟩}
d. θ(Prt^n) = {⟨SUBJ, 0⟩, ⟨OBJ, 1⟩, [...] ⟨COMP, 1⟩, ⟨BAR, n⟩}
f. θ(Art^n) = {⟨SUBJ, 1⟩, ⟨OBJ, 0⟩, [...] ⟨BAR, n⟩}
h. θ(A^n) = {⟨SUBJ, 0⟩, ⟨OBJ, 0⟩, ⟨COMP, 1⟩, ⟨BAR, n⟩}
i. θ(Deg^n) = {⟨SUBJ, 0⟩, ⟨OBJ, 0⟩, ⟨COMP, 0⟩, ⟨DET, 1⟩, ⟨BAR, n⟩}
j. θ(Adv^n) = {⟨SUBJ, 0⟩, [...]

[...]

BAR     3
SUBJ    1
OBJ     0
COMP    1

As is evident, the set of categories induced by Jackendoff's system has a cardinality of 40, ten at each bar level.
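The figure of 40 can be reproduced by brute-force enumeration of partial functions over the features in (18). The sketch below is an illustration only: it models categories as dicts, takes BAR to range over {0, 1, 2, 3}, and reads the DET condition in two ways. On the assumption that DET is specified exactly in the [-COMP], [-OBJ] categories (a biconditional reading), the enumeration yields 40; on a purely conditional reading, under which DET may also be left unspecified in those categories, it yields 48.

```python
from itertools import product

RANGES = {
    "SUBJ": (0, 1), "OBJ": (0, 1), "COMP": (0, 1), "DET": (0, 1),
    "BAR": (0, 1, 2, 3),
}


def partial_categories(ranges):
    """Yield every partial function from features to values (None = unspecified)."""
    names = list(ranges)
    options = [(None,) + tuple(ranges[f]) for f in names]
    for values in product(*options):
        yield {f: v for f, v in zip(names, values) if v is not None}


def obeys_19a(c):
    """(19a): SUBJ, OBJ, COMP and BAR are all specified."""
    return all(f in c for f in ("SUBJ", "OBJ", "COMP", "BAR"))


def minus_comp_minus_obj(c):
    return c.get("COMP") == 0 and c.get("OBJ") == 0


def count(det_rule):
    return sum(1 for c in partial_categories(RANGES) if obeys_19a(c) and det_rule(c))


# Conditional reading: DET specified only if [-COMP], [-OBJ].
conditional = count(lambda c: "DET" not in c or minus_comp_minus_obj(c))
# Biconditional reading (an assumption): DET specified exactly in those categories.
biconditional = count(lambda c: ("DET" in c) == minus_comp_minus_obj(c))
print(conditional, biconditional)
```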


Sets of categories as small as this are clearly insufficient for the description of natural languages. All transformational grammarians seem to agree that references to distinctions of tense, mood, voice, person, number, gender, case, pronominality, definiteness, wh-ness, and many other morphological and syntactic distinctions are in fact needed in a grammar. As pointed out by Pullum (1985), some statements in the TG literature suggest that further features are provided for the expression of such distinctions but are restricted to lexical (
(22) [...]

Constraints are necessary to ensure that the value of SUCCESSOR does not contain anything but SUCCESSOR or OF specifications. To this end, we constrain each feature f ∈ F⁰ (except OF) as shown in (23a), and in addition we impose (23b) and (23c):

(23) a. □¬(SUCCESSOR:f)
b. □¬(SUCCESSOR ∧ OF)
c. □¬(SUCCESSOR:(¬OF ∧ ¬SUCCESSOR))

In some recent TG, more than one indexing system is employed. Thus Rouveret and Vergnaud (1980, p. 160) "postulate that each verbal complex in a structure is identified by some integer p and each [-N] element in the verbal complex p bears the superscript p." This superscripting system is distinct from the subscripting system maintained to indicate anaphoric linkage or binding, and neither places an upper bound on the number of indices. Hence it would not be sufficient to have a single type 1 feature. Two further type 1 features SUBSCRIPT and SUPERSCRIPT could be used, each taking category values representing indices with SUCCESSOR and OF. It may seem implausible to suppose that anyone would choose in practice to handle indexing via a feature system such as that just suggested. Nonetheless, it would clearly be possible, which shows that one can incorporate integer indices into the structure of categories in terms of a finite number of features and a finite number of atoms, which might not initially have been evident.

3.6 GENERALIZED PHRASE STRUCTURE GRAMMAR

The generalized phrase structure grammar framework (GPSG), as set out in Gazdar, Klein, Pullum, and Sag (1985) (henceforth GKPS), differs from the examples considered so far in that it makes extensive use of features that are permitted to have categories as their values.2 For concreteness, we suggest how the set of categories for the GKPS version of GPSG would be reconstructed in the framework presented here (see GKPS, pp. 245-6, for the complete lists where we abbreviate with "...").

(24) a. F = {SUBJ, N, COMP, BAR, ..., AGR, SLASH}
b. A = {0, 1, 2, ..., for, that, ...}
c. τ = {⟨SUBJ, 0⟩, [...]
d. ρ = {⟨SUBJ, 2⟩, [...]

[...] constraint:

(25) □¬(f: ◇f)

This prevents a category-valued feature f from being specified anywhere within the value of an occurrence of f. An example of a moderately complex category with more than one category-valued feature that nonetheless obeys (25) is shown in (26).
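The effect of (25) can be illustrated with a short recursive check. This is a sketch under stated assumptions: categories are modelled as nested Python dicts, a feature counts as category-valued when its value is itself a dict, and `obeys_25` is a hypothetical helper name. The feature names AGR and SLASH come from (24a); the particular values are invented for the example.

```python
def obeys_25(category, enclosing=frozenset()):
    """Return True if no category-valued feature recurs inside its own value.

    `enclosing` holds the category-valued features on the path from the root
    down to the category currently being inspected.
    """
    for feature, value in category.items():
        if isinstance(value, dict):
            if feature in enclosing:
                return False
            if not obeys_25(value, enclosing | {feature}):
                return False
    return True


# Two category-valued features, neither nested inside itself.
ok = {"N": 1, "BAR": 2,
      "SLASH": {"N": 1, "BAR": 2, "AGR": {"N": 1, "BAR": 2}},
      "AGR": {"BAR": 2}}
# SLASH occurs (indirectly) within the value of SLASH.
bad = {"SLASH": {"AGR": {"SLASH": {"BAR": 2}}}}

print(obeys_25(ok), obeys_25(bad))
```

Since each category-valued feature can occur at most once on any path from the root, nesting depth is bounded by the number of such features, which is what keeps the category set finite.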


The constraint (25) restricts us to exactly the set of legal GKPS categories.3 The total GKPS category set is finite, but naturally, it is extremely large (Ristad (1986) calculates that it is in excess of 10^774). It is clear that the set of GKPS categories is vastly too large to be precompiled and stored, and indeed, no implementation that we know of has attempted this.

3.7 SYSTEMIC GRAMMAR

Systemic grammar, originally known as "scale and category" grammar, has its origins in the work of Halliday (1961) and is widely known among computational linguists through Winograd (1972) and other works, and it has recently received rigorous formalization in the hands of Patten and Ritchie (1987). Tree structures in systemic grammar tend to be flat, more structural information being expressed through categories than in most other approaches (Hudson 1971). Categories in systemic grammar are simply bundles of feature specifications: there is "nothing in systemic theory corresponding to the distinction between 'features'--such as [+past]--and 'categories'--such as NP and S--in TG theory" (Hudson 1971, p. 48). A set of well-formed categories in a systemic grammar is defined by a system network, which "is in effect a body of rules, in symbolic form, which specify precisely how features can combine with each other: in other words, which features can appear together in the paradigmatic description of a single item, and which cannot" (Hudson 1971). We will not discuss rules for forming systemic networks (and hence categories) here, but will instead refer the reader to the presentation in Winograd (1983), where a system network expressing category information for the English pronominal form is provided as an example of the notational techniques used in systemic grammar for specifying a set of categories. We reproduce this in Figure 1.

[Figure 1: Systemic Network for English Pronouns. Node labels surviving in the source: Animate, Question, Personal, Demonstrative, Case, Subjective, Objective, Reflexive, Possessive, Possessive-Determiner, First, Second, Singular, Plural, Feminine, Neuter, Near, Far.]

The content of Figure 1 can be reconstructed straightforwardly as a category structure subject to a set of L_C constraints (for a closely related analysis of this example, developed independently, see Mellish (1986)). The following is the category structure that we need:

(27) a. F = {PRONOUN, CASE, PERSON, GENDER, NUMBER, ANIMACY, PROXIMITY}
b. A = {question, personal, demonstrative, subjective, objective, reflexive, possessive, possessive-determiner, first, second, third, feminine, masculine, neuter, singular, plural}
c. τ0
d. ρ = {[...]}

The constraints that must be imposed are the following:

(28) a. PRONOUN
b. (PRONOUN:question) ↔ (CASE ∧ ¬PERSON ∧ ¬NUMBER ∧ ANIMACY ∧ ¬PROXIMITY)
c. (PRONOUN:personal) ↔ (CASE ∧ PERSON ∧ NUMBER ∧ ¬ANIMACY ∧ ¬PROXIMITY)
d. (PRONOUN:demonstrative) ↔ (¬CASE ∧ ¬PERSON ∧ NUMBER ∧ ¬ANIMACY ∧ PROXIMITY)
e. GENDER ↔ (PRONOUN ∧ (PERSON:third) ∧ (NUMBER:singular))

Note that this description of the pronominal system of English is artificially complicated by its isolation from the rest of the grammar. If it were embedded in the context of a definition of a wider class of categories (for example, the English noun class network given by Winograd (1983)), it would be modified by the elimination of (28a) and the relaxation of (28b-d) to simple conditionals.


The structure seen in this example employs only type 0 features. For example, the category it defines for a pronoun like herself would be (29).

(29)
PRONOUN   personal
CASE      reflexive
PERSON    third
NUMBER    singular
GENDER    feminine
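As a check on the constraints in (28), the category in (29) can be tested mechanically. The sketch below assumes categories are Python dicts and that (28b-e) are all read as biconditionals (for (28e) this reading is an assumption); `satisfies_28` is a hypothetical helper name.

```python
def satisfies_28(c):
    """Check a pronoun category against constraints (28a-e)."""
    has = lambda f: f in c
    a = has("PRONOUN")
    b = (c.get("PRONOUN") == "question") == (
        has("CASE") and not has("PERSON") and not has("NUMBER")
        and has("ANIMACY") and not has("PROXIMITY"))
    cc = (c.get("PRONOUN") == "personal") == (
        has("CASE") and has("PERSON") and has("NUMBER")
        and not has("ANIMACY") and not has("PROXIMITY"))
    d = (c.get("PRONOUN") == "demonstrative") == (
        not has("CASE") and not has("PERSON") and has("NUMBER")
        and not has("ANIMACY") and has("PROXIMITY"))
    e = has("GENDER") == (
        has("PRONOUN") and c.get("PERSON") == "third"
        and c.get("NUMBER") == "singular")
    return a and b and cc and d and e


# The category (29) for "herself".
herself = {"PRONOUN": "personal", "CASE": "reflexive", "PERSON": "third",
           "NUMBER": "singular", "GENDER": "feminine"}
# A first-person pronoun wrongly carrying GENDER.
ill_formed = {"PRONOUN": "personal", "CASE": "subjective", "PERSON": "first",
              "NUMBER": "singular", "GENDER": "feminine"}

print(satisfies_28(herself), satisfies_28(ill_formed))
```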

Interestingly, however, systemic grammar, at least as formalized by Hudson (1971), is not limited to type 0 features. Hudson explicitly permits recursive growth of feature structures in order to count constituents (see pp. 60-62). This could be reconstructed here by using a type 1 feature in roughly the manner we employed SUCCESSOR, above. Such a use of type 1 features immediately makes the size of the category set infinite.

3.8 CATEGORIAL GRAMMAR

Categorial grammar originates with work by Leśniewski and Ajdukiewicz in the 1930s (see van Benthem, Buszkowski and Marciszewski (1986), Haddock, Klein and Morrill (1987) and Oehrle, Bach and Wheeler (1987) for recent work and references to the earlier literature). The set of categories used is infinite. It is often defined as the smallest set containing some set of basic categories {a1, ..., an}, and closed under the operation of forming from two categories α and β a new category α|β. To reconstruct the category system for categorial grammar, we define Σ as shown in (30).

(30) a. F = {LABEL, DOMAIN, RANGE}
b. A = {a1, ..., an}
c. τ = {⟨LABEL, 0⟩, ⟨DOMAIN, 1⟩, ⟨RANGE, 1⟩}
d. ρ = {⟨LABEL, A⟩}

We then add the following:

(31) a. □(DOMAIN ↔ ¬LABEL)
b. □(DOMAIN ↔ RANGE)

We can now represent any category allowed in the simple form of categorial grammar considered so far. For example, the category (S|NP)|(S|NP) can be represented as shown graphically in (32).

(32) [graphical representation not recoverable from the source; surviving label: DOMAIN]

To show formally that we have captured the content of the category system of categorial grammar, we can exhibit a bijection between the categorial grammar categories and the admissible categories induced by F, A, and the constraints defined above. We define a mapping θ between the categorial grammar categories and the admissible categories with respect to (31a) and (31b), as follows:

(33) a. θ(ai) = {⟨LABEL, ai⟩}
b. θ(α|β) = {[...]}

where α and β are categories. A simple structural induction argument suffices to show that θ is indeed bijective. The smallest category will be of the type ai, and corresponds to {⟨LABEL, ai⟩}. Each further step replaces ai or aj by a non-basic category and will clearly yield a unique result. It can be seen immediately that the mapping θ has an inverse.

The categories defined thus far are non-directional, in the sense that a complex category can combine with an argument either to its left or its right. However, most definitions assume directional categories (Bach 1984). This further specification can be easily incorporated by introducing a new feature name DIRECTION which takes values in 2. We then add a constraint that categories taking values for DOMAIN also take a value for DIRECTION, thus determining the directionality of the category.

(35) □(DOMAIN → DIRECTION)

The translation function is then:

(36) a. θ(ai) = {⟨LABEL, ai⟩}
b. θ(α/β) = {[...]}
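The non-directional translation θ and the constraints in (31) can be sketched in Python. Several choices here are assumptions made only for illustration: basic categories are strings, a complex category α|β is the tuple (α, β), admissible categories are nested dicts, and α is taken to supply the RANGE and β the DOMAIN (the source does not settle which way round the pairing goes). The names `theta`, `theta_inverse` and `admissible` are hypothetical.

```python
def theta(cat):
    """Translate a categorial grammar category into a feature-based category."""
    if isinstance(cat, str):                      # basic category a_i
        return {"LABEL": cat}
    alpha, beta = cat                             # complex category alpha|beta
    return {"RANGE": theta(alpha), "DOMAIN": theta(beta)}  # assumed orientation


def theta_inverse(c):
    """Recover the categorial grammar category from its translation."""
    if "LABEL" in c:
        return c["LABEL"]
    return (theta_inverse(c["RANGE"]), theta_inverse(c["DOMAIN"]))


def admissible(c):
    """Check (31a) DOMAIN <-> not LABEL and (31b) DOMAIN <-> RANGE at every level."""
    has_domain = "DOMAIN" in c
    if has_domain == ("LABEL" in c):
        return False
    if has_domain != ("RANGE" in c):
        return False
    return all(admissible(v) for v in c.values() if isinstance(v, dict))


# (S|NP)|(S|NP)
adverb = (("S", "NP"), ("S", "NP"))
translated = theta(adverb)
print(translated)
print(admissible(translated), theta_inverse(translated) == adverb)
```

The round trip through `theta` and `theta_inverse` mirrors the structural induction in the text: each basic category has exactly one image, and each complex category is determined by the images of its two parts.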

Indexed grammars are a generalization of phrase structure grammars due originally to Aho (1968). Like categorial grammar and some of the other frameworks previously mentioned, indexed grammar uses an infinite category set. In the formulation presented in Gazdar (1985), an indexed grammar category consists of an atomic label and a possibly empty list (or stack) of atomic indices drawn from a finite set. There is a familiar technique for encoding lists or stacks in a notation which relies on the fact that lists can be decomposed into an initial element and the residual list (see, for example, Shieber (1984)). Thus, we add new elements INDEX and LIST to the set F:

(37) a. F = {LABEL, INDEX, LIST}
b. A = {a1, ..., am} ∪ {0, i1, ..., in}
c. τ = {⟨LABEL, 0⟩, ⟨INDEX, 0⟩, ⟨LIST, 1⟩}
d. ρ = {⟨LABEL, {a1, ..., am}⟩, ⟨INDEX, {0, i1, ..., in}⟩}
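The decomposition of a list into an initial element and a residual list can be sketched in Python as follows. The sketch assumes categories are nested dicts and that the atom 0 from (37b) serves as the end-of-list marker, so that an empty list is encoded as the category {INDEX: 0}; the function names are hypothetical and the index names are placeholders.

```python
END = 0  # assumed end-of-list marker: the atom 0 in the range of INDEX


def encode_indices(indices):
    """Encode [j0, ..., jk] as a nested INDEX/LIST category (head + residue)."""
    value = {"INDEX": END}
    for j in reversed(indices):
        value = {"INDEX": j, "LIST": value}
    return value


def indexed_category(label, indices):
    """Build a full indexed-grammar category: an atomic label plus an index list."""
    return {"LABEL": label, "LIST": encode_indices(indices)}


def decode_indices(value):
    """Recover the Python list of indices from the nested encoding."""
    indices = []
    while value["INDEX"] != END:
        indices.append(value["INDEX"])
        value = value["LIST"]
    return indices


cat = indexed_category("a1", ["i2", "i1"])
print(cat)
print(decode_indices(cat["LIST"]))
```

Because the end marker is an atom rather than the empty category, two encoded lists of different lengths clash at the point where one ends, so a list does not unify with its own proper prefix.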

A list of indices of the form (38a) is represented as (38b).

(38) a. [j0, j1, ..., jk]
b. {⟨LIST, {⟨INDEX, j0⟩, ⟨LIST, {⟨INDEX, j1⟩, ... ⟨LIST, {⟨INDEX, jk⟩, ⟨LIST, {⟨INDEX, 0⟩}⟩}⟩ ...}⟩}⟩}

In addition, we need the following constraints:

(39) a. LABEL ∧ LIST
b. □¬(LABEL ∧ INDEX)
c. □¬(LIST:¬INDEX)
d. □¬(LIST ∧ (INDEX:0))

The first requires that at the top level, an indexed category has a label and a list of indices. The second disallows INDEX from co-occurring with LABEL, enforcing the constraint recursively downward. The third requires that if LIST is defined anywhere, then INDEX is defined in its value. And the last, also enforced recursively downward, requires that if INDEX has the value 0, LIST is not defined (so the end of the list of indices is unambiguously flagged by INDEX having the value 0). A category bearing an "empty" list of indices is thus one whose value for LIST is {⟨INDEX, 0⟩}. An example of a category allowed by these constraints is shown in (40).

(40) [diagram not recoverable from the source; surviving labels include LABEL and LIST]

Indexed grammar as originally formalized by Aho uses lists of atomic indices as part of the composition of categories. It is also possible in the framework we have defined to allow features to have lists of categories as their values. This is in fact proposed in the literature by Shieber (1984) and Pollard (1985). To extend an indexed grammar to permit GKPS-style categories in place of atomic indices, one can simply make INDEX a type 1 feature, add the GKPS category structure and constraints to the indexed grammar category structure and constraints, and then exempt LIST (but, crucially, not INDEX) from being subject to the constraint schema in (25). The resultant type of grammar, assuming that the limitations on rules in indexed grammars are maintained, is equivalent to indexed grammar as originally defined, since the distinction between atomic indices and indices taken from a finite set of categories has no language-theoretic implications.

Given the representability of list-valued features as category-valued features in the present framework, the definitions of subsumption and unification automatically apply to lists without the need for any redefinition. If the empty category is used as the end marker for lists then two lists of different lengths will unify if one is a prefix of the other. Depending upon the linguistic interpretation of lists, this may or may not be what one wants. In our illustration, we use an atomic end marker that will block prefix unification.

4 COMPUTATIONAL COMPLEXITY OF CATEGORY CHECKING

The checking problem for categories is the problem of determining whether a category is legal given a fixed set of constraints, or more precisely, of determining for an arbitrary category α and a fixed formula φ of L_C whether α satisfies φ. It is a special case of the problem of determining whether some arbitrary model satisfies some fixed formula of a logic.

Theorem. The checking problem for categories is solvable in linear time.

Proof. Assuming a category structure Σ = [...] for f ∈ F¹ and n ≥ 0 corresponds to a node labelled f with the first elements of σ1 through σk as its daughters [...] ∈ T(Σ). Let T be such a tree, and let φ be a fixed formula of L_C. We check T for satisfaction of φ by annotating each node of T with the complete list of all subexpressions of φ, and working from the frontier to the root recording at each node which subexpressions are satisfied by the subtree rooted there. At each point the checking is local: only the current node and its daughters (if any) need be examined. Even for a subformula like □ψ, all that must be verified at a node q as we work up the tree is that ψ is satisfied at q and □ψ is recorded as satisfied at each daughter node. The conclusion of the procedure will be to determine whether or not φ itself is true at the root of T, and thus whether T is well-formed. If φ has s subformulae and T has n nodes, the time taken is bounded by sn (the number of steps required if every subformula is evaluated at every node), and thus linear in n, the size of the input. ∎

Of somewhat less interest than the checking problem is the universal checking problem, that of determining for an arbitrary input pair ⟨φ, α⟩, φ a formula and α a category, whether α satisfies φ. The difference is that here φ is not held constant; the task is analogous not to checking the legality of a category within a selected grammatical framework, but rather to a kind of framework-design oversight role, switching frameworks with every input and evaluating the given category relative to the proffered constraint. We note, however, that the universal checking problem only calls for, at worst, quadratic time. To see this, simply note that we can use the algorithm sketched above, and take account of s as well as n as part of the size of the input. The worst case is where s and n contribute about equally to the size of the product sn, i.e., where s ≈ n. Then sn ≈ ((s + n)/2)^2 = (s + n)^2/4, which varies with the square of the input size s + n.

For some special cases, both the checking problem and the universal checking problem are of course much easier. For example, if only type 0 features are permitted, checking is decidable in real time by a simple inspection of the finite number of ⟨f, a⟩ pairs, regardless of whether φ is part of the input or not.

Note that the much harder satisfiability problem, that of determining for an arbitrary formula φ whether there exists a category α that satisfies it, is of even less interest in the present context. When a grammatical framework intended for practical use is devised, the constraints on its category system are formulated to delimit a particular set of categories already well understood and exemplified. There is no practical interest in questions about arbitrary formulae of L_C for which no one has ever considered what a satisfying category would be like.
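The checking procedure described in the proof of the theorem above can be sketched in Python. This is an illustration, not the authors' implementation: categories are nested dicts, formulae are nested tuples with operator tags chosen here for convenience ("feat", "val", "in", "not", "and", "iff", "box"), and the bottom-up annotation is realized as memoized recursion, so that each (node, subformula) pair is evaluated at most once, in line with the sn bound. The clause for "box" follows the local condition stated in the proof.

```python
def satisfies(category, formula, memo=None):
    """Return True if `category` satisfies `formula`.

    Results are cached per (node, subformula) pair, so the work is bounded by
    (number of subformulae) x (number of nodes).
    """
    if memo is None:
        memo = {}
    key = (id(category), formula)
    if key in memo:
        return memo[key]
    op = formula[0]
    if op == "feat":        # f: feature f is specified here
        result = formula[1] in category
    elif op == "val":       # (f:a): f has the atomic value a
        f, a = formula[1], formula[2]
        result = (f in category and not isinstance(category[f], dict)
                  and category[f] == a)
    elif op == "in":        # (f:phi): f's category value satisfies phi
        value = category.get(formula[1])
        result = isinstance(value, dict) and satisfies(value, formula[2], memo)
    elif op == "not":
        result = not satisfies(category, formula[1], memo)
    elif op == "and":
        result = (satisfies(category, formula[1], memo)
                  and satisfies(category, formula[2], memo))
    elif op == "iff":
        result = (satisfies(category, formula[1], memo)
                  == satisfies(category, formula[2], memo))
    elif op == "box":       # psi holds here, and box-psi holds at every daughter
        result = satisfies(category, formula[1], memo) and all(
            satisfies(v, formula, memo)
            for v in category.values() if isinstance(v, dict))
    else:
        raise ValueError(f"unknown operator: {op!r}")
    memo[key] = result
    return result


# Constraints (31a) and (31b) for categorial grammar, conjoined.
C31 = ("and",
       ("box", ("iff", ("feat", "DOMAIN"), ("not", ("feat", "LABEL")))),
       ("box", ("iff", ("feat", "DOMAIN"), ("feat", "RANGE"))))

good = {"DOMAIN": {"LABEL": "NP"}, "RANGE": {"LABEL": "S"}}
bad = {"DOMAIN": {"LABEL": "NP"}, "LABEL": "S"}
print(satisfies(good, C31), satisfies(bad, C31))
```

With the formula held fixed, the number of subformulae is a constant, so the cost grows only with the number of nodes in the category; passing the formula in as part of the input gives the universal checking problem discussed above.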
We would expect the satisfiability problem for L_C to be PSPACE-complete, like the satisfiability problem for most modal logics. Ristad (1986, pp. 33-4) proves a PSPACE-hardness result for what he calls "GPSG Category-Membership", specifically with respect to the GKPS framework, and this can immediately be seen to be extendable to the satisfiability result for L_C (as mentioned in footnote 3, L_C is in effect a language for the statement of feature cooccurrence restrictions, and can be used in the same way that Ristad uses the GKPS FCR formalism). The problem he considers, despite the misleading name he gives it, is the analog of satisfiability, not of checking; it asks whether there exists an extension of a given category that satisfies a given set of FCRs, and since the given category might be ∅, this is equivalent to satisfiability. Satisfiability is NP-complete even for simple propositional logic, so as soon as it is appreciated that a language for stating constraints on categories is in effect a logic with categories as its models, the complexity of satisfiability for category constraints comes as no surprise. Checking of GKPS categories, on the other hand, which Ristad does not consider, can be done very fast, as a corollary of the theorem above.

5 SETS AS VALUES

All the syntactic approaches that we have considered so far distinguish syntactic categories from structural descriptions of expressions in a fairly transparent fashion. In FUG (Kay 1979, 1985), LFG (Kaplan and Bresnan 1982), and work by Shieber and others on PATR II (Shieber 1984), this traditional distinction disappears almost entirely. Thus, in LFG, syntactic categories and the structural descriptions known as f-structures are exactly the same kind of object. In FUG, not only is there no formal distinction between categories and structural descriptions, but even the distinction between structural descriptions and grammars disappears. At first sight, LFG f-structures seem likely to be the trivial case of a set of categories observing no constraints on admissibility at all. We simply take F to be the LFG set of f-structure attribute names, and A to be the LFG set of atomic f-structure values (the "simple symbols" and "semantic forms"). So, following this reasoning, the set of LFG f-structures would be just K, modulo the appropriate typing. However, this is not the case, for reasons that will emerge below.

The first problem we consider is that at least two of the frameworks just mentioned permit sets as feature values. In one sense we already permit sets as values since type 1 features have categories as their values, and categories are sets. Categories are a rather special kind of set, however, namely partial functions from features to values. Suppose we merely wanted to have a model for a set of atoms. Then, as we saw in our discussion of APG, we can model such a set by constructing the set's characteristic function. But modelling a set that way, whilst perfectly adequate for APG categories, has a consequence that may not always be acceptable: two sets on the same domain will unify just in case they are exactly the same set. Given certain quite natural interpretations of a feature system making use of sets, this may not be what we want.
An alternative strategy then, and one which is also consistent with our framework, is to model sets as partial functions into a single value range (as opposed to total functions into a two value range). For example, the subset of the authors of this paper with British addresses could be represented as a partial function on the domain {Gazdar, Pullum, Carpenter, Klein, Hukari, Levine}, namely the function {⟨Carpenter, 1⟩, ⟨Gazdar, 1⟩, ⟨Klein, 1⟩}, instead of the following total (characteristic) function on the same domain: {⟨Carpenter, 1⟩, ⟨Gazdar, 1⟩, ⟨Hukari, 0⟩, ⟨Klein, 1⟩, ⟨Levine, 0⟩, ⟨Pullum, 0⟩}. Then unification of the partial functions amounts to union of the corresponding sets. This is fine if our intended interpretation of the set is conjunctive, i.e., if {a, b, c} means that a holds and b holds and c holds (Carpenter has a British address and Klein has a British address and Gazdar has a British address). But if our intended interpretation is disjunctive, then we want the unification operation to give us intersection, not union. FUG actually uses set-valued attributes with a disjunctive interpretation (Kay 1979). And, in a discussion of possible enhancements to the PATR II formalism, Karttunen (1984) provides a number of very relevant examples that illustrate the issues that arise when a unification-based formalism is augmented in order to encompass disjunction.

As Chris Barker has pointed out to us, a perverse variant of the approach to conjunctively interpreted sets outlined above serves to handle the disjunctive interpretation of sets of atoms. We map the set {Accusative, Dative} into the partial function {⟨NOMINATIVE, 0⟩, ⟨ABLATIVE, 0⟩, ⟨GENITIVE, 0⟩} on the domain {ACCUSATIVE, DATIVE, NOMINATIVE, ABLATIVE, GENITIVE}. Now unification (and hence union) of such complement-specifying partial functions gives us an operation equivalent to intersection applied to the original sets. Thus the unification of {⟨NOMINATIVE, 0⟩, ⟨ABLATIVE, 0⟩, ⟨GENITIVE, 0⟩} (standing for {Accusative, Dative}) with {⟨NOMINATIVE, 0⟩, ⟨ACCUSATIVE, 0⟩, ⟨GENITIVE, 0⟩} (standing for {Ablative, Dative}) gives us {⟨NOMINATIVE, 0⟩, ⟨ABLATIVE, 0⟩, ⟨GENITIVE, 0⟩, ⟨ACCUSATIVE, 0⟩}, which stands for {Dative}.

Clearly, the present approach could be generalized to directly allow a type of feature that would take sets of atoms as values. The price to be paid for this, in a metatheoretical exercise such as the one we are engaged in, would be that the definition of unification becomes dependent on the intended interpretation of such features: the relevant clause needs to use union if the interpretation is conjunction, and intersection if the interpretation is disjunction.

An altogether more serious issue arises when we consider the possibility of attributes taking sets of categories as values. We could represent such sets in a manner analogous to the treatment of lists, but with a special marking (given in terms of special attribute-value pairs) indicating that the list representation in question is to be interpreted as a set. The trouble with this is that the identity conditions for the resulting objects are no longer transparent. Two structurally distinct lists may or may not count as identical, depending on whether or not they are both representing sets, and that in turn will depend on whether particular attributes appear in certain relevant structural positions. Likewise, our existing definitions of unification and subsumption would simply fail to provide one with intuitively reasonable results, and it seems unlikely that they could be made to do so without further formal contortions. This whole strategy seems contrived and inelegant.
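The three encodings of atom sets discussed above can be compared in a few lines of Python. The sketch assumes atom-valued categories are dicts and uses a simplified `unify` that handles only atomic values; the helper names are hypothetical.

```python
def unify(f, g):
    """Unify two atom-valued partial functions; return None if they clash."""
    result = dict(f)
    for key, value in g.items():
        if key in result and result[key] != value:
            return None
        result[key] = value
    return result


def characteristic(members, domain):
    """Total function into {0, 1}: unifies only with an identical set."""
    return {d: int(d in members) for d in domain}


def conjunctive(members):
    """Partial function into {1}: unification corresponds to set union."""
    return {m: 1 for m in members}


def disjunctive(members, domain):
    """Partial function marking the complement with 0: unification = intersection."""
    return {d: 0 for d in domain if d not in members}


def decode_disjunctive(function, domain):
    return {d for d in domain if d not in function}


CASES = {"ACCUSATIVE", "DATIVE", "NOMINATIVE", "ABLATIVE", "GENITIVE"}
u = unify(disjunctive({"ACCUSATIVE", "DATIVE"}, CASES),
          disjunctive({"ABLATIVE", "DATIVE"}, CASES))
print(decode_disjunctive(u, CASES))
print(unify(conjunctive({"Carpenter", "Gazdar"}), conjunctive({"Klein"})))
print(unify(characteristic({"ACCUSATIVE"}, CASES), characteristic({"DATIVE"}, CASES)))
```

Note that the complement encoding never fails to unify: two disjoint sets simply yield the encoding of the empty set, so a separate check is needed if an empty disjunction is meant to count as failure.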
The alternative is, again, to introduce a new type of feature, one taking sets of categories as its values, and some recent works have done just this. Sabimana (1986) proposes a feature ARG which takes a set of categories as its value. The feature appears on elements that correspond semantically to predicates, and its value is the set containing the categories that correspond semantically to the arguments of that predicate. The Japanese Phrase Structure Grammar (JPSG) of Gunji (in press) goes further in that it restricts itself entirely to such features (together with atom-valued features, of course) and does not employ simple category-valued features at all. Both FUG and LFG also permit category-set values, in effect, though the interpretation they assign to the resulting objects is, once again, different. FUG's interpretation is, as with atom sets, disjunctive. On this interpretation, unification of two sets of categories can be defined as the set of categories each of whose members is the unification of a pair in their Cartesian product (again, see Karttunen (1984) for relevant discussion of this kind of approach). In LFG, sets of categories acting as values for single attributes are used in the analysis of adjuncts (and possibly coordination) and the interpretation is intendedly conjunctive (Kaplan and Bresnan (1982)). Under this interpretation, there is, in general, no unique unification to be had, although one can define an operation to provide one with a set of possible unifications. In Gunji (in press), where a conjunctive interpretation is assigned to category-set values, the non-uniqueness problem is sidestepped by defining unify as a predicate of category pairs, rather than as an operation. In view of all these considerations, we have opted for simplicity over generality and simply excluded set-valued features from our purview.
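The disjunctive (FUG-style) treatment of category sets mentioned above can be sketched as follows. This is an illustration under stated assumptions rather than a definition taken from any of the cited frameworks: categories are nested Python dictionaries, pairs that fail to unify are simply dropped, and the feature names `CASE` and `NUM` and the function names are hypothetical.

```python
from itertools import product


def unify(a, b):
    """Unify two categories (nested dicts with atoms as leaves).
    Returns the unified category, or None if the two clash."""
    result = dict(a)
    for feature, value in b.items():
        if feature not in result:
            result[feature] = value
        elif isinstance(result[feature], dict) and isinstance(value, dict):
            sub = unify(result[feature], value)
            if sub is None:
                return None
            result[feature] = sub
        elif result[feature] != value:
            return None
    return result


def unify_disjunctive(xs, ys):
    """Disjunctive unification of two collections of categories: unify every
    pair in the Cartesian product and keep the pairs that succeed
    (dropping failures and duplicates is an assumption of this sketch)."""
    out = []
    for a, b in product(xs, ys):
        u = unify(a, b)
        if u is not None and u not in out:
            out.append(u)
    return out


xs = [{"CASE": "acc"}, {"CASE": "dat"}]
ys = [{"CASE": "dat", "NUM": "sg"}, {"NUM": "pl"}]
alternatives = unify_disjunctive(xs, ys)
for category in alternatives:
    print(category)
```

Of the four pairs in the product, only the pair with conflicting `CASE` values is discarded, so three alternatives survive.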

6 SHARED VALUES

One property that FUG and PATR II have in common, which sets them apart from the simpler grammar types discussed earlier in this paper, is the option of letting two or more distinct features share the same value. Thus, FUG functional descriptions allow one instance of a value to simultaneously be the value of more than one (instance of an) attribute. Consequently, the implicit hierarchy, represented graphically, does not respect the single-mother requirement that is built deep into our definitions. Of course, two category-valued features within a category may contingently have identical values, but this is not the same as sharing the same value (except in common parlance, perhaps). Kasper and Rounds (1986) refer to the distinction as one of type identity versus token identity. If we take a category, containing two contingently identical category-values, and unify it with a second category, then the contingent identity may not be preserved in the result. Consider, for example, the result of unifying these two categories:

Computational Linguistics, Volume 14, Number 1, Winter 1988

[Figure: the two categories to be unified]

where the values of F and H are identical in the first but not in the second. The result is:

[Figure: the category resulting from the unification]

and here the values of F and H are no longer identical. If the original common value had been genuinely shared, then no unification would have been possible (see also Shieber (1985) where the term "reentrancy" is used in this connection). There is an alternative way of thinking about the problem of shared values, and that is to reconstruct it in terms of indexing: every value carries an index, and two structurally identical values are the very same thing if and only if they bear the same index. An integer indexing of this sort can be represented in the present framework as we have already seen in section 4.5 above. However, a coindexing reconstruction would not be a sensible way of thinking about shared values in the present context since such a use of indices makes nonsense of structurally defined unification, subsumption, and so on. For two intuitively identical structures to unify, it would not be sufficient for them to exhibit the same internal patterns of coindexed values. Rather, they would need in addition to manifest the very same choice of indices. Clearly, this is not what one wants, as choice of index is completely arbitrary, and structures differing only in identity of the integers selected as indices should be regarded as equivalent. To achieve a semantics for shared-value category formalisms, it is necessary to move beyond the partial function-based category structures that provide the basis for our semantics, and thus depart from the particular category constraint logic that it induces. Like set values, shared values are simply beyond the scope of the rather parsimonious theory of categories developed here (see note 5).

The reader interested in pursuing richer approaches should consult Pereira and Shieber (1984) for a domain-theoretic account of the semantics of categories in LFG, PATR II, and GPSG; Ait-Kaci and Nasr (1986), who capture shared values with a coreference relation on the nodes of the tree; Kasper and Rounds (1986), Moshier and Rounds (1987), and Rounds and Kasper (1986) for a finite state automaton-based logic and semantics for categories in FUG and PATR II; and van Benthem (1986a, b) for an interesting foundational discussion and application of such an automaton-based semantics.

7 CONCLUSION

We have developed and applied a general framework for defining syntactic categories, including categories in which features can have categories as their value, which latter possibility turns out to subsume the possibility of a feature taking as its value a list of indices or categories, drawn from either a finite or an infinite set. The unitary way in which we have characterized these diverse systems is intended to assist in the exploration and comparison of grammatical formalisms. Questions concerning whether particular rule types and operations on categories that are familiar from one approach to grammar can be carried over unproblematically to another approach, and questions concerning the implementation difficulties that arise when a given formalism is adopted, can in many cases be settled in a straightforward and familiar way, namely by reducing them to previously encountered types of question. The grammatical frameworks we have considered as examples fall into a five-class typology which we can now explicate. The first class contains the frameworks that use only atom-valued features (simple phrase structure grammar, Harman's augmented phrase structure grammar, RG, and APG); the second contains the special case of GKPS, which uses category-valued features but imposes a constraint which prevents them from having effects on expressive power that could not ultimately be simulated by atom-valued features; the third contains the frameworks that use just a single category-valued feature (our key example being indexed grammar); the fourth contains frameworks making use of more than one category-valued feature (an example being categorial grammar); and the fifth includes those frameworks that fall outside the scheme we have developed in that their categories are not representable as finite partial functions constrained by statements in $L_C$ (LFG, FUG, PATR II, etc.).
It is not at all clear which of these five classes of approaches will prove the most suitable for implementing natural language processing systems in the long term. In this paper, we hope to have made somewhat clearer the nature of the issues at stake. We hope also to have done something more: for the first four classes, we have provided what is in effect a unitary type of data structure for the representation of their syntactic categories. Thinking in terms of such data structures should make it possible for pseudo-issues in natural language processing research to be avoided in a large class of circumstances, to the point that even a decision in mid-project to change the grammatical framework from one linguistic approach to another need not entail any fundamental redesign of what are in most frameworks the basic objects of syntactic representation.

APPENDIX

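In this appendix we restate the semantic rules for $L_C$ more precisely. All well-formed expressions of $L_C$ have the same kind of denotation: they denote truth values (i.e., members of 2) relative to the category structure $\Sigma$ and a category $\alpha$ determined by $\Sigma$. If $\phi$ is a well-formed expression of $L_C$, then we use $[\![\phi]\!]_{\Sigma,\alpha}$ to stand for the denotation of $\phi$ with respect to the category structure $\Sigma$ and category $\alpha$. If $[\![\phi]\!]_{\Sigma,\alpha} = 1$ then we shall say that $\alpha$ SATISFIES $\phi$. The formal statement of our semantic rules is the following, where $a$, $f$, $\phi$, and $\psi$ are as above.

(A1)
a. $[\![f]\!]_{\Sigma,\alpha} = 1$ iff $\alpha(f)$ is defined.
b. $[\![f{:}a]\!]_{\Sigma,\alpha} = 1$ iff $\alpha(f) = a$.
c. $[\![f{:}\phi]\!]_{\Sigma,\alpha} = 1$ iff $[\![\phi]\!]_{\Sigma,\alpha(f)} = 1$.
d. $[\![\neg\phi]\!]_{\Sigma,\alpha} = 1$ iff $[\![\phi]\!]_{\Sigma,\alpha} = 0$.
e. $[\![\phi \lor \psi]\!]_{\Sigma,\alpha} = 1$ iff $[\![\phi]\!]_{\Sigma,\alpha} = 1$ or $[\![\psi]\!]_{\Sigma,\alpha} = 1$.
f. $[\![\phi \land \psi]\!]_{\Sigma,\alpha} = 1$ iff $[\![\phi]\!]_{\Sigma,\alpha} = 1$ and $[\![\psi]\!]_{\Sigma,\alpha} = 1$.
g. $[\![\phi \to \psi]\!]_{\Sigma,\alpha} = 1$ iff $[\![\phi]\!]_{\Sigma,\alpha} = 0$ or $[\![\psi]\!]_{\Sigma,\alpha} = 1$.
h. $[\![\phi \leftrightarrow \psi]\!]_{\Sigma,\alpha} = 1$ iff $[\![\phi]\!]_{\Sigma,\alpha} = [\![\psi]\!]_{\Sigma,\alpha}$.
i. $[\![\Box\phi]\!]_{\Sigma,\alpha} = 1$ iff $[\![\phi]\!]_{\Sigma,\alpha} = 1$ and for all $f$ in $F^1 \cap \delta(\alpha)$, $[\![\Box\phi]\!]_{\Sigma,\alpha(f)} = 1$.
j. $[\![\Diamond\phi]\!]_{\Sigma,\alpha} = 1$ iff $[\![\phi]\!]_{\Sigma,\alpha} = 1$ or for some $f$ in $F^1 \cap \delta(\alpha)$, $[\![\Diamond\phi]\!]_{\Sigma,\alpha(f)} = 1$.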
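The semantic rules in (A1) translate almost clause by clause into a recursive evaluator. The Python sketch below is offered only as an illustration and rests on several assumptions that are not in the text: categories are nested dictionaries, a feature counts as category-valued exactly when its value is a dictionary (rather than consulting the category structure), formulas are tagged tuples with hypothetical tags such as `"feat"` and `"box"`, and a formula $f{:}\phi$ is treated as false when the value of $f$ is undefined or atomic.

```python
def subcategories(cat):
    """Values of the category-valued features on which `cat` is defined."""
    return [v for v in cat.values() if isinstance(v, dict)]


def sat(cat, phi):
    """Return True iff category `cat` satisfies formula `phi` (rules A1a-j)."""
    op = phi[0]
    if op == "feat":   # (a) f
        return phi[1] in cat
    if op == "atom":   # (b) f:a
        return cat.get(phi[1]) == phi[2]
    if op == "val":    # (c) f:phi; false if the value is missing or atomic
        sub = cat.get(phi[1])
        return isinstance(sub, dict) and sat(sub, phi[2])
    if op == "not":    # (d)
        return not sat(cat, phi[1])
    if op == "or":     # (e)
        return sat(cat, phi[1]) or sat(cat, phi[2])
    if op == "and":    # (f)
        return sat(cat, phi[1]) and sat(cat, phi[2])
    if op == "imp":    # (g)
        return (not sat(cat, phi[1])) or sat(cat, phi[2])
    if op == "iff":    # (h)
        return sat(cat, phi[1]) == sat(cat, phi[2])
    if op == "box":    # (i) holds here and, recursively, in every embedded category
        return sat(cat, phi[1]) and all(sat(s, phi) for s in subcategories(cat))
    if op == "dia":    # (j) holds here or, recursively, in some embedded category
        return sat(cat, phi[1]) or any(sat(s, phi) for s in subcategories(cat))
    raise ValueError(f"unknown operator: {op!r}")


example = {"G": "a", "F": {"G": "b"}}
print(sat(example, ("box", ("feat", "G"))))        # G is defined at every level
print(sat(example, ("box", ("atom", "G", "a"))))   # fails in the embedded category
print(sat(example, ("dia", ("atom", "G", "b"))))   # holds in the embedded category
```

Each call visits a subformula at most once per embedded category, which is in keeping with the efficient checking of categories against constraints claimed for $L_C$.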

Note that if $\alpha \sqsubseteq \beta$ and $\alpha$ satisfies $\phi$, it does not follow in $L_C$ that $\beta$ satisfies $\phi$ (compare Rounds and Kasper (1986), Theorem 6). For example, we have $\emptyset \sqsubseteq \{(F, a)\}$ and $\emptyset$ satisfies $\neg F$, but $\{(F, a)\}$ does not. Likewise, the fact that both $\alpha$ and $\beta$ satisfy some constraint $\phi$ does not entail that $\alpha \sqcup \beta$ will satisfy $\phi$, even if $\alpha \sqcup \beta$ is defined. The desire to incorporate negation whilst maintaining an upward closure property led Moshier and Rounds (1987) to set aside a classical semantics for their feature description language and postulate an intuitionistic semantics that, in effect, quantifies over possible extensions. We will write $\models \phi$ to mean that for every category structure $\Sigma$ and category $\alpha$ in $\Sigma$, $\alpha$ satisfies $\phi$. Given this, we can list some valid formulae and valid formula schemata of the logic of category constraints.

(A2) a. $\models (f{:}a) \to f$ (for all $a \in \rho(f)$, $f \in F^0$)

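This simply says that if a feature has an atomic value, then it has a value. We also have all the valid formulae of the standard propositional calculus, which we will not list here. Furthermore, we have the following familiar valid modal formulae.

(A2)
b. $\models \Box\phi \leftrightarrow \neg\Diamond\neg\phi$
c. $\models \Box(\phi \to \psi) \to (\Box\phi \to \Box\psi)$
d. $\models \Box\phi \to \phi$
e. $\models \phi \to \Diamond\phi$
f. $\models \Box(\phi \land \psi) \leftrightarrow (\Box\phi \land \Box\psi)$
g. $\models \Diamond(\phi \lor \psi) \leftrightarrow (\Diamond\phi \lor \Diamond\psi)$
h. $\models \Box\phi \to \Box\Box\phi$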

Here, (A2h) shows us that our logic at least contains S4 (we follow the nomenclature of Hughes and Cresswell (1968) throughout). But we do not have $\models \Diamond\phi \to \Box\Diamond\phi$, and so our logic does not contain S5. To see this, consider the following category, assuming F is a category-valued feature: $\{(F, \emptyset)\}$. This category satisfies $\Diamond F$ but not $\Box\Diamond F$. The category $\{(F, \{(G, a)\}), (H, \{(G, b)\})\}$ (graphically represented in (50), below) provides us with an analogous falsifying instance for $\models \Diamond\Box\phi \to \Box\Diamond\phi$ when we set $\phi = (G{:}a)$.

(50) [Figure: graph of the category $\{(F, \{(G, a)\}), (H, \{(G, b)\})\}$]

This shows that our logic does not contain S4.2. Interestingly, the converse of this constraint is valid, hence:

(A2) i. $\models \Box\Diamond\phi \to \Diamond\Box\phi$

This is easy to demonstrate: if $\alpha$ satisfies $\Box\Diamond\phi$ then $\Diamond\phi$ must hold in all the categories that terminate $\alpha$, and if $\Diamond\phi$ holds in those categories, then $\phi$ and $\Box\phi$ hold in them as well. So $\Box\phi$ holds in at least one category in $\alpha$, and thus $\alpha$ must satisfy $\Diamond\Box\phi$. This shows that our logic at least contains K1 and, as a consequence, is not contained by S5. However, our logic cannot contain K2, since the latter contains S4.2. Nor does it contain K1.2, since the latter's characteristic axiom, namely $\models \phi \to \Box(\Diamond\phi \to \phi)$, is shown to be invalid by the category $\{(G, a), (F, \{(G, b), (F, \{(G, a)\})\})\}$ (shown in (51), below) when we set $\phi = (G{:}a)$.

(51) [Figure: graph of the category $\{(G, a), (F, \{(G, b), (F, \{(G, a)\})\})\}$]
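The three falsifying instances discussed above can be checked mechanically. The following Python sketch is illustrative only: it assumes categories are nested dictionaries, treats a feature as category-valued when its value is a dictionary, and builds formulas as Python predicates using hypothetical combinator names (`has`, `is_`, `imp`, `box`, `dia`).

```python
def subcats(c):
    """Embedded categories, i.e. values of category-valued features."""
    return [v for v in c.values() if isinstance(v, dict)]


def has(f):
    return lambda c: f in c                         # f

def is_(f, a):
    return lambda c: c.get(f) == a                  # f:a

def imp(p, q):
    return lambda c: (not p(c)) or q(c)             # p -> q

def box(p):
    return lambda c: p(c) and all(box(p)(s) for s in subcats(c))

def dia(p):
    return lambda c: p(c) or any(dia(p)(s) for s in subcats(c))


phi = is_("G", "a")

# S5 axiom, dia F -> box dia F, against {(F, {})}
c1 = {"F": {}}
s5_holds = imp(dia(has("F")), box(dia(has("F"))))(c1)

# S4.2 axiom, dia box phi -> box dia phi, against the category in (50)
c2 = {"F": {"G": "a"}, "H": {"G": "b"}}
s42_holds = imp(dia(box(phi)), box(dia(phi)))(c2)

# Converse (A2i), box dia phi -> dia box phi, on all three categories
c3 = {"G": "a", "F": {"G": "b", "F": {"G": "a"}}}
a2i_holds = all(imp(box(dia(phi)), dia(box(phi)))(c) for c in (c1, c2, c3))

# K1.2 axiom, phi -> box(dia phi -> phi), against the category in (51)
k12_holds = imp(phi, box(imp(dia(phi), phi)))(c3)

print(s5_holds, s42_holds, a2i_holds, k12_holds)    # False False True True? see note
```

The printed comment is deliberately hedged: the first two values and the last are `False` (the axioms fail on these categories) and the third is `True`, as the assertions accompanying this sketch confirm. Checking (A2i) on three categories does not, of course, establish its validity; the argument in the text does that.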

In fact, our logic does not merely contain K1, it also contains K1.1, whose characteristic axiom is:

(A2) j. $\models \Box(\Box(\phi \to \Box\phi) \to \phi) \to \phi$

Hughes and Cresswell note that K1.1 'is characterized by the class of all finite partial orderings, i.e., finite frames in which R [the accessibility relation] is reflexive, transitive, and antisymmetrical' (Hughes and Cresswell (1984), p. 162). So it should be no surprise, given the basis for our semantics, that our logic turns out to include K1.1. This logic, also known as S4Grz (after Grzegorczyk (1967)), 'is decidable, for every nontheorem of S4Grz is invalid in some finite weak partial ordering' (Boolos (1979), p. 167). Two further valid formula schemata of $L_C$ have some interest, before we conclude the list of valid formulae in (A2):

(A2)
k. $\models \Diamond\neg f$ (for all $f \in F^1$)
l. $\models (f{:}\phi) \to \Diamond\phi$ (for all $f \in F^1$)

The first of these follows from the fact that categories are finite in size and thus ultimately grounded in categories that contain no category-valued features: $f$ must be false of these terminating embedded categories, and hence $\Diamond\neg f$ must be true of the category as a whole.

The second states that if a category is defined for a category-valued feature whose value satisfies $\phi$, then the category as a whole satisfies $\Diamond\phi$.

(A2)
m. $\models (f{:}\phi) \to f$ (for all $f \in F^1$)
n. $\models ((f{:}\phi) \land (f{:}\psi)) \leftrightarrow (f{:}(\phi \land \psi))$ (for all $f \in F^1$)
o. $\models ((f{:}\phi) \lor (f{:}\psi)) \leftrightarrow (f{:}(\phi \lor \psi))$ (for all $f \in F^1$)

It is worth considering the valid formulae one would get in certain restricted classes of category structures. Suppose we consider category structures which contain only atom-valued features (i.e., $F = F^0$). In this case, as one would expect, the modal logic collapses into the propositional calculus and the relevant notion of validity (call it $\models_0$) gives us the following:

(A5) $\models_0 \phi \leftrightarrow \Box\phi$

The converse case, where we only permit category-valued features (i.e., $F = F^1$), is uninteresting, since it is not distinct from the general case. We can always encode atom-valued features as (sets of) category-valued features and subject the latter to appropriate constraints, as follows. For every feature specification $(f, a)$ such that $f \in F^0$ and $a \in \rho(f)$, we introduce a new type 1 feature $f_a$ and use the presence of $(f_a, \emptyset)$ to encode the presence of $(f, a)$ and likewise absence to encode absence. Then, for each pair of distinct atoms $a$ and $b$ in $\rho(f)$, we require the new features to satisfy $\Box\neg(f_a \land f_b)$. And to constrain each new feature $f_a$ to have the empty set as its value, we stipulate $\Box\neg(f_a{:}g)$ for every feature $g$.

However, consider validity in category structures containing at most one category-valued feature (call this kind of validity $\models_1$). With this restriction, the S4.2 axiom considered earlier becomes valid:

(A6) $\models_1 \Diamond\Box\phi \to \Box\Diamond\phi$

In addition, we get (A7).

(A7) $\models_1 \Box(\Box\phi \to \Box\psi) \lor \Box(\Box\psi \to \Box\phi)$

This means that this restricted logic at least contains K3, but it cannot contain K4, since $\models_1 \phi \to (\Diamond\Box\phi \to \Box\phi)$ is falsified by the category $\{(G, a), (F, \{(G, b), (F, \{(G, a)\})\})\}$ when we set $\phi = (G{:}a)$.

In fact it must also contain K3.1, in view of the validity of (A2j) above, and this logic, also known as S4.3Grz, is characterized by finite linear orderings (Hughes and Cresswell (1984)). This is the characterization we would expect given the character of the $\models_1$ restriction on the form of permissible categories, since with only one category-valued feature, there is at most one path through the structure of a category and so the partial order becomes a linear order. These observations concerning the logic induced by category structures where $|F^1| = 1$ are of some potential relevance to the study of indexed grammars whose categories can be construed as being restricted in just this way (see section 4.9, above).

ACKNOWLEDGMENTS

Chris Barker has contributed substantively to the research reported here, and we offer him our gratitude. We are also grateful to Edward Briscoe, Jeremy Carroll, Roger Evans, Joseph Halpern, David J. Israel, Ronald M. Kaplan, William Keller, James Kilbury, William A. Ladusaw, Christopher Mellish, Richard E. Otte, Fernando Pereira, P. Stanley Peters, Carl J. Pollard, Stephen Pulman, William Rounds, Stuart M. Shieber, Henry Thompson and Manfred Warmuth for their generous assistance during the research reported in this paper. Though in some respects they have contributed substantially, they should not be associated with any errors that the paper may contain. In addition, we thank Calvin J. Pullum, who is responsible for the diagrams, and we acknowledge partial research support from the following sources: the UCSC Syntax Research Center (Gazdar, Hukari, Levine, Pullum); grants from the (U.K.) SERC and ESRC (Gazdar); NSF Graduate Fellowship RCD-8651747 (Carpenter); NSF grants BNS-8511687 and BNS-8519708 (Pullum).

REFERENCES

Aho, Alfred V. 1968 Indexed Grammars. Journal of the Association for Computing Machinery 15: 647-671.
Ait-Kaci, Hassan; and Nasr, Roger. 1986 Proceedings of the 13th Annual ACM Conference on Principles of Programming Languages: 219-228. Association for Computing Machinery.
Bach, Emmon. 1984 Some Generalizations of Categorial Grammar. In Landman, Fred; and Veltman, Frank, Eds., Varieties of Formal Semantics: Proceedings of the 4th Amsterdam Colloquium, September 1982, Foris, Dordrecht, Holland: 1-23.
van Benthem, Johan. 1986a Semantic Automata. In Groenendijk, Jeroen; de Jongh, Dick; and Stokhof, Martin, Eds., Information, Interpretation and Inference. Foris, Dordrecht, Holland. Reprinted in van Benthem, Johan. 1986 Essays in Logical Semantics. D. Reidel, Dordrecht, Holland: 151-176. [Also published as CSLI Report 85-27, Center for the Study of Language and Information, Stanford, 1985]
van Benthem, Johan. 1986b Towards a Computational Semantics. In Cooper, Robin; Engdahl, Elisabet; and Gardenfors, P., Eds., Proceedings of a Workshop on Generalized Quantifiers, Lund 1985. D. Reidel, Dordrecht, Holland.

van Benthem, Johan. 1986c Categorial Grammar. In Johan van Benthem. 1986 Essays in Logical Semantics. D. Reidel, Dordrecht, Holland: 123-150.
van Benthem, Johan; Buszkowski, W.; and Marciszewski, W., Eds., Categorial Grammar. John Benjamins, Amsterdam, Holland.
Boolos, George. 1979 The Unprovability of Consistency. Cambridge University Press, Cambridge, England.
Bresnan, Joan W. 1975 Transformations and Categories in Syntax. In Butts, Ronald; and Hintikka, Jaakko, Eds., Basic Problems in Methodology and Linguistics. D. Reidel, Dordrecht, Holland: 283-304.
Chomsky, Noam. 1970 Remarks on Nominalization. In Jacobs, R.; and Rosenbaum, P., Eds., Readings in English Transformational Grammar. Ginn, Waltham, Massachusetts: 11-61.
Chomsky, Noam. 1980 On Binding. Linguistic Inquiry 11: 1-46.
Chomsky, Noam. 1981 Lectures on Government and Binding. Dordrecht: Foris.
Elson, Benjamin; and Pickett, Velma. 1962 An Introduction to Morphology and Syntax. Summer Institute of Linguistics, Santa Ana, California.
Gazdar, Gerald. 1985 Applicability of Indexed Grammars to Natural Languages. Center for the Study of Language and Information, Stanford, California: Report No. CSLI-85-34.
Gazdar, Gerald; Klein, Ewan; Pullum, Geoffrey K.; and Sag, Ivan A. 1985 Generalized Phrase Structure Grammar. Harvard University Press, Cambridge, Massachusetts.
Grzegorczyk, Andrzej. 1967 Some Relational Systems and the Associated Topological Spaces. Fundamenta Mathematicae 60: 223-231.
Haddock, Nicholas; Klein, Ewan; and Morrill, Glyn, Eds., 1987 Categorial Grammar, Unification Grammar and Parsing. Edinburgh Working Papers in Cognitive Science 1, Edinburgh, Scotland.
Halliday, Michael A. K. 1961 Categories of the Theory of Grammar. Word 17: 241-292.
Halvorsen, Per-Kristian; and Ladusaw, William A. 1979 Montague's 'Universal Grammar': an Introduction for the Linguist. Linguistics and Philosophy 3: 185-223.
Harman, Gilbert H. 1963 Generative Grammars without Transformation Rules: a Defense of Phrase Structure. Language 39: 597-616.
Harris, Zellig S. 1951 Methods in Structural Linguistics. University of Chicago Press, Chicago, Illinois.
Hendriks, Herman. 1986 Foundations of GPSG Syntax. Doctoraalscriptie Wijsbegeerte, University of Amsterdam, Amsterdam, Holland.
Hornstein, Norbert. 1977 S and X' Convention. Linguistic Analysis 3: 137-176.
Hudson, Richard A. 1971 English Complex Sentences. North Holland, Amsterdam, Holland.
Hughes, G. E.; and Cresswell, Max J. 1968 An Introduction to Modal Logic. Methuen, London, England.
Hughes, G. E.; and Cresswell, Max J. 1984 A Companion to Modal Logic. Methuen, London, England.
Jackendoff, Ray. 1974 Introduction to the X Convention. Indiana University Linguistics Club, Bloomington, Indiana.
Jackendoff, Ray. 1977 X Syntax: A Study of Phrase Structure. MIT Press, Cambridge, Massachusetts.
Johnson, David E.; and Postal, Paul M. 1980 Arc Pair Grammar. Princeton University Press, Princeton, New Jersey.
Kaplan, Ronald; and Bresnan, Joan. 1982 Lexical-Functional Grammar: a Formal System for Grammatical Representation. In J. W. Bresnan, Ed., The Mental Representation of Grammatical Relations. MIT Press, Cambridge, Massachusetts: 173-281.
Karttunen, Lauri. 1984 Features and Values. Proceedings of the 10th International Conference on Computational Linguistics and the 22nd Annual Meeting of the Association for Computational Linguistics. Stanford University, California: 28-33.
Karttunen, Lauri; and Zwicky, Arnold M. 1985 Introduction to Dowty, D.R.; Karttunen, L.; and Zwicky, A.M., Eds., Natural Language Parsing. Cambridge University Press, Cambridge, England: 1-25.
Kasper, Robert T.; and Rounds, William C. 1986 A Logical Semantics for Feature Structures. In Proceedings of the 24th Annual Meeting of the Association for Computational Linguistics: 257-266.
Kay, Martin. 1979 Functional Grammar. In Chiarrello, Christine et al., Eds., Proceedings of the 5th Annual Meeting of the Berkeley Linguistics Society: 142-158.
Kay, Martin. 1985 Parsing in Functional Unification Grammar. In Dowty, D.R.; Karttunen, L.; and Zwicky, A.M., Eds., Natural Language Parsing. Cambridge University Press, Cambridge, England: 251-278.

Ladusaw, William A. 1985 A Proposed Distinction Between Levels and Strata. Presented to the Annual Meeting of the Linguistic Society of America, Seattle, Washington. Memo no. SRC-85-04, Syntax Research Center, University of California, Santa Cruz, California.
Lasnik, Howard; and Kupin, Joseph J. 1977 A Restrictive Theory of Transformational Grammar. Theoretical Linguistics 4: 173-196.
Levy, Leon S.; and Joshi, Aravind K. 1978 Skeletal Structural Descriptions. Information and Control 39: 192-211.
Longacre, Robert E. 1965 Some Fundamental Insights of Tagmemics. Language 41: 65-76.
Mellish, Christopher. 1986 Implementing Systemic Classification by Unification. Manuscript, University of Sussex.
Montague, Richard. 1970 Universal Grammar. In Thomason, Richmond H., Ed., Formal Philosophy. Yale University Press, New Haven, Connecticut: 222-246.
Montague, Richard. 1973 The Proper Treatment of Quantification in Ordinary English. In Thomason, Richmond H., Ed., Formal Philosophy. Yale University Press, New Haven, Connecticut: 247-270.
Moshier, M. D.; and Rounds, William C. 1987 A Logic for Partially Specified Data Structures. Proceedings of the ACM Conference on Principles of Programming Languages, Munich.
Oehrle, Richard T.; Bach, Emmon; and Wheeler, Deirdre W., Eds., 1987 Categorial Grammars and Natural Language Structures, D. Reidel, Dordrecht, Holland.
Patten, Terry; and Ritchie, Graeme. 1987 A Formal Model of Systemic Grammar. In Kempen, Gerard, Ed., Natural Language Generation: Recent Advances in AI, Psychology and Linguistics. Kluwer, Amsterdam, Holland.
Pereira, Fernando C. N.; and Shieber, Stuart M. 1984 The Semantics of Grammar Formalisms Seen as Computer Languages. In Proceedings of the 10th International Conference on Computational Linguistics and the 22nd Annual Meeting of the Association for Computational Linguistics: 123-129.
Pereira, Fernando C. N.; and Warren, David H. D. 1980 Definite Clause Grammars for Language Analysis--a Survey of the Formalism and a Comparison with Augmented Transition Networks. Artificial Intelligence 13: 231-278.
Perlmutter, David M.; and Postal, Paul M. 1977 Toward a Universal Characterization of Passivization. In Whistler, Kenneth et al., Eds., Proceedings of the 3rd Annual Meeting of the Berkeley Linguistics Society: 394-417. Reprinted in: Perlmutter, David M., Ed., Studies in Relational Grammar 1. University of Chicago Press, Chicago, Illinois.
Pollard, Carl J. 1984 Generalized Phrase Structure Grammars, Head Grammars, and Natural Languages. Ph.D. dissertation, Stanford University.
Pollard, Carl. 1985 Phrase Structure Grammar Without Metarules. In Goldberg, Jeffrey; MacKaye, Susannah; and Wescoat, Michael, Eds., Proceedings of the West Coast Conference on Formal Linguistics, Volume Four. Stanford Linguistics Association, Stanford, California: 246-261.
Postal, Paul M. 1964 Constituent Structure: A Study of Contemporary Models of Syntactic Description. Publication 30 of the Indiana University Research Center in Anthropology, Folklore, and Linguistics, Bloomington, Indiana.
Pullum, Geoffrey K. 1985 Assuming Some Version of X-Bar Theory. In Eilfort, William D.; Kroeber, Paul D.; and Peterson, Karen L., Eds., CLS 21, Part 1: Papers from the General Session at the Twenty-First Regional Meeting. Chicago Linguistic Society, Chicago, Illinois: 323-353.
Ristad, Eric Sven. 1986 Computational Complexity of Current GPSG Theory. Proceedings of the 24th Annual Meeting of the Association for Computational Linguistics: 30-39.
Rounds, William C.; and Kasper, Robert T. 1986 A Complete Logical Calculus for Record Structures Representing Linguistic Information. Proceedings of the 15th Annual Symposium on Logic in Computer Science. Cambridge, Massachusetts.
Rouveret, Alain; and Vergnaud, Jean-Roger. 1980 Specifying Reference to the Subject: French Causatives and Conditions on Representations. Linguistic Inquiry 11: 97-202.
Sabimana, Firmard. 1986 The Relational Structure of the Kirundi Verb. D.Phil. dissertation, Indiana University, Bloomington, Indiana.
Shieber, Stuart. 1984 The Design of a Computer Language for Linguistic Information. In Proceedings of the 10th International Conference on Computational Linguistics and the 22nd Annual Meeting of the Association for Computational Linguistics: 362-366.
Shieber, Stuart. 1985 Criteria for Designing Computer Facilities for Linguistic Analysis. Linguistics 23: 189-211.
Shieber, Stuart. 1987 Separating Linguistic Analyses from Linguistic Theories. In Whitelock, Peter J. et al., Eds., Linguistic Theory and Computer Applications. Academic Press, London.
Stockwell, Robert P.; Schachter, Paul; and Partee, Barbara H. 1973 The Major Syntactic Structures of English. Holt, Rinehart and Winston, New York, New York.
Winograd, Terry. 1972 Understanding Natural Language. Academic Press, New York, New York.
Winograd, Terry. 1983 Language as a Cognitive Process: Volume 1 Syntax. Addison-Wesley, Reading, Massachusetts.

NOTES

1. Bresnan (1975) correctly attributes the [±N, ±V] feature system to lectures delivered by Chomsky at the 1974 Linguistic Institute in Amherst, Massachusetts. In some works, e.g., Jackendoff (1977) and Gazdar, Klein, Pullum, and Sag (1985), Chomsky (1970) is wrongly given as the source. The latter work does, however, contain the following relevant comment: "we might just as well eliminate the distinction of feature and category, and regard all symbols of the grammar as sets of features" (p. 208).

2. As Hendriks (1986) has noted, the definition of categories given in GKPS "is a bit of a mess from a formal point of view" (1986, p. 19). Definition 1 reads as follows: "$\rho^0$ is a function from F to POW(A) such that for all $f \in (F - \mathrm{Atom})$, $\rho^0(f) = \{\{\}\}$" (GKPS, p. 36). But $\{\{\}\}$ is not in the power set of A; "POW(A)" should be replaced by "POW(A) $\cup$ $\{\{\{\}\}\}$". Parts of the text and examples following Definition 1 assume correctly that it ends "$\rho^0(f) = \{\{\}\}$", but other parts assume incorrectly that it ends "$\rho^0(f) = \{\}$". If the latter version were adopted, Definition 4 would fail to add category-valued feature specifications in the desired way (since the condition "$\exists C' \in \rho^{n-1}(f)[C' \subseteq C]$" would never be satisfied where $n = 1$).

3. The "feature cooccurrence restrictions" (FCRs) of GKPS form part of the definition of admissible tree rather than being part of the definition of categories. However, every GKPS FCR can be expressed in $L_C$, and the translation is trivial.

4. We are indebted to Joseph Halpern for his help with the material in this section.

5. One of our referees has suggested that our semantics can be made to handle sharing by introducing an equality predicate into $L_C$, marking shared value situations with special nonce features, and then using conditional constraints triggered by these features to impose identical values on the relevant features. But we have been unable to get any scheme of this kind to work in the general case. There appears to be no upper bound to the number of nonce features that may be required, and moreover, unification ceases to behave in an intuitively reasonable manner.