STRUCTURES

Gerald Gazdar Cognitive Studies P r o g r a m m e , University of Sussex, Brighton BN1 9QN, U.K. Geoffrey K. Pullum Cowell College, University of California, Santa Cruz, Santa Cruz, California 95064, USA Robert Carpenter, Ewan Klein Centre for Cognitive Science, University of Edinburgh, E d i n b u r g h EH8 9 L W , U.K. T h o m a s E. H u k a r i Department of Linguistics, University of Victoria, Victoria, B.C., C a n a d a V8W 2Y2 R o b e r t D. L e v i n e Department of Linguistics, University of British Columbia, Vancouver, B.C., C a n a d a V6T l W 5

This paper outlines a simple and general notion of syntactic category on a metatheoretical level, independent of the notations and substantive claims of any particular grammatical framework. We define a class of formal objects called "category structures" where each such object provides a constructive definition for a space of syntactic categories. A unification operation and subsumption and identity relations are defined for arbitrary syntactic categories. In addition, a formal language for the statement of constraints on categories is provided. By combining a category structure with a set of constraints, we show that one can define the category systems of several well-known grammatical frameworks: phrase structure grammar, tagmemics, augmented phrase structure grammar, relational grammar, transformational grammar, generalized phrase structure grammar, systemic grammar, categorial grammar, and indexed grammar. The problem of checking a category for conformity to constraints is shown to be solvable in linear time. This work provides in effect a unitary class of data structures for the representation of syntactic categories in a range of diverse grammatical frameworks. Using such data structures should make it possible for various pseudo-issues in natural language processing research to be avoided. We conclude by examining the questions posed by set-valued features and sharing of values between distinct feature specifications, both of which fall outside the scope of the formal system developed in this paper.

The notion syntactic category is a central one in most grammatical frameworks. As Karttunen and Zwicky (1985) observe, traditional "parsing" as taught for languages like Latin involved little more than supplying a detailed description of the grammatical category of each word in the sentence to be parsed. Phrase structure grammars are entirely concerned with assigning terminal strings to categories and determining dominance and precedence between constituents on the basis of their categories. In a classical transformational grammar

(TG), the objects transformations manipulate are primarily strings of syntactic categories (and, to a lesser extent, of terminal symbols). This is just as true of recent TG work. Although the use of syntactic categories is not a logical prerequisite of generative grammar (see Levy and Joshi (1978)), no linguistic approach known to us dispenses with them altogether. In view of this, it is perhaps surprising that linguists have not attempted to explicate the concept "syntactic category" in any gen-

Copyright 1988by the Associationfor ComputationalLinguistics.Permissionto copy withoutfee all or part of this materialis granted provided that the copies are not madefor direct commercialadvantageand the CL referenceand this copyrightnotice are includedon the first page. To copy otherwise, or to republish, requires a fee and/or specificpermission. 0362-613X/88/010001-19503.00 Computational Linguistics, Volume 14, Number I, Winter 1988

1

Gerald Gazdar et al.

eral way, i.e., independently of particular systems of notation and the associated substantive assumptions about grammar. In this paper we offer an explicit metatheoretical framework in which a notion of "syntactic category" receives a precise definition. The framework is intended to facilitate analysis and comparison of the underlying concepts of different theories, freed from the notational and sociological baggage that sometimes encumbers the original presentations in the literature. Viewed from the standpoint of implementation, it can be regarded as providing a unitary data structure for categories that can be used in the implementation of a number of superficially different grammatical frameworks. We begin by defining in section 1 a space of categories broad enough to encompass the objects employed as syntactic categories in a range of diverse types of generative grammar. Then, in section 2, we present the syntax and semantics for L c, a formal language for defining constraints on categories. In the succeeding section we provide illustrative definitions of the grammatical categories used in a number of frameworks. We cover simple phrase structure grammar in section 3.1; tagmemics in section 3.2; Harman's (1963) augmented phrase structure grammar in section 3.3; relational grammar and arc pair grammar in section 3.4; X syntax, TG, and the government-binding (GB) framework in section 3.5; generalized phrase structure grammar (GPSG) in section 3.6; systemic grammar in section 3.7; categorial grammar in section 3.8; and Aho's (1968) indexed grammar in section 3.9. We then go on to consider some relevant computational complexity matters (section 4). Finally, we discuss two issues that do not arise in any of these approaches, and which fall outside the scope of the simple theory that we present, namely the use of sets as values of features (section 5) and values shared between distinct feature specifications (section 6). These issues are important in the context of the category systems employed in functional unification grammar (FUG), lexical functional grammar (LFG), and the PATR II grammar formalism. Our goal in this paper is not an empirical one, but rather one which is analogous to that of Montague's "Universal Grammar" (1970) (see Halvorsen and Ladusaw (1977) for a useful introduction) which attempts to give a general definition of the notion "possible language" in terms applicable to, but not limited to, the study of human languages. We have the much more modest goal of characterizing one rather simple and general notion of "possible syntactic category", and of exploring the range of linguistic approaches that it will generalize to, its formal properties, and its limitations. As will become evident below, our exercise is complementary in certain respects to that of Pereira and Shieber (1984) and Shieber (1987) and to recent work of Rounds and his associates on the development of a logic for the description of the notions of syntactic category that are embodied in functional unification grammar and 2

Category Structures

PATR II (see Kasper and Rounds (1986), Moshier and Rounds (1987), Rounds and Kasper (1986)). We do not concern ourselves with the appearance or representational details of a given theory of categories (or any of the other aspects of the linguistic framework in question, e.g., its rule system), but only with its underlying semantics--the issue of what set-theoretic (or other nonlinguistic) objects provide categories with their interpretation. We are content with being able to exhibit an isomorphism between one of the theories of categories permitted by our framework and the concrete example we are considering; we need not demonstrate identity. Hence we have deliberately refrained from specifying a formal language for representing categories and features. To the extent that we need to produce exemplificatory features or categories for inspection, we may use the conventional notation of the approach in question, or the ordinary notations of set theory, or an informal labeled graph notation introduced below, but we do not offer a representational formalism for categories that has a significance of its own. In the framework we provide, it is possible to define the category systems of a wide variety of apparently very different approaches to natural language syntax simply by defining two primitive typing functions, and by varying the constraints stated on the categories that they induce. The exercise of expressing the content of various specific linguistic approaches in such terms immediately calls attention to certain interesting formal issues. For example, we reconstruct below the notion of a list-valued (or stack-valued) feature in terms of category-valued features, which automatically allows operations defined on categories such as unification to apply to lists without special redefinition. An interesting fact that emerges from the view taken here is that on the matter of syntactic categories, there is somewhat more commonality among the diverse approaches currently being pursued than there appears to be when those approaches are viewed in the formalisms used by their practitioners. The various syntactic frameworks that we examine below can be seen to share a great deal of their underlying substantive claims about the information content of the category label of a constituent. Our explication of these underlying commonalities may make somewhat easier the task of the computational linguist attempting to implement a system on the basis of some grammatical framework, or attempting to decide which approach to implement in the first place. In order to prepare for some of the definitions that follow, we will briefly and informally sketch some of our assumptions about features and categories and the terminology we shall use for talking about them. A category is a set of feature specifieations meeting certain conditions to be defined below. A feature specification is an attribute-value pair (f, v) where the attributef(the feature) is atomic (i.e., given by some finite list, and regarded as unanalyzable) and the value v is either Computational Linguistics, Volume 14, Number 1, Winter 1988

Gerald Gazdar et al.

atomic or complex. H e r e we shall assume just one type of complex value, namely a category (but see below in section 5). An example of an atom-valued feature specification would be (SINGULAR,+) (which many grammarians would write as [+SINGULAR]); intuitively, it might mark singular number (though, of course, the interpretation it actually has depends on the role it plays in the grammar). An example of a complex feature specification, with a category as the value, would be (AGREEMENT, {(SINGULAR,+), (GENDER,FEM),(PERSON,3)}); intuitively, it might be used to c o n v e y that the value of the AGREEMENT feature is a category representing the combination of singular number, feminine gender, and third person. In the following sections, we will always use SMALLCAPITALS for feature names, and we will generally replace " - " and " + " , which are standard usage in the linguistic literature for the atomic values of a binary feature, by 0 and 1 respectively. As we have said, a category is a set of feature specifications meeting certain conditions. We will now specify these. We do not require that every feature name be represented in each category, but we do require that each occurrence of a feature be paired with exactly one value in any set of specifications; thus {(SINGULAR,+>, (SINGULAR,-->} could not be a category. Hence a category can be modeled as a partial function C:F--> V, where F is a set of features and V is the set of values. An equivalent alternative would be to treat categories as total functions into a range that includes an element ± that can stand as the value where the corresponding partial functions would fail to assign a value. Note that we use the term 'range' here, and subsequently, to refer to a set that includes all the values that a partial function or family of partial functions might take given appropriate domain elements, rather than just the set of values that it does take when we fix a particular domain for a function. It may be helpful to think of a category as having the structure of an unordered tree, and we will introduce a type of diagram below which exhibits this structure overtly. Often, however, the idea of categories as partial functions will be crucial, so it should be kept in mind throughout. Since the set V of values may include categories, the definition of the entire set of categories has to be given recursively. Moreover, it has to allow for the possibility that not all values are compatible with all features. Thus, for example, in a given feature system, (GENDER, 0 ) and (PERSON,plural) might be coherent objects but mnemonically perverse, whereas in another feature system, they might simply be ill-formed. We shall show how these issues can be resolved in the coming sections. We will not, however, give a constructive definition of the set of categories for each grammatical framework we consider. Instead, given our comparative and metatheoretical goals, it turns out to be more Computational Linguistics, Volume 14, Number I, Winter 1988

Category Structures

convenient to define a category system as a pair (~, C> where ~ is a category structure, which defines a set of potential categories (see section 8), and C is a set o f constraints expressed in L c, a language for which the category structure defines the models (see section 9). The actual categories in the system are then to be construed as that subset of the potential categories defined in ~, each m e m b e r of which satisfies every constraint listed in C. 1 DEFINING CATEGORY STRUCTURES In this section we define the notion of a category structure, which is basically a choice of primitives: a list of features, and a range of possible values for each. Here and throughout the paper we will frequently use " 2 " to denote the set {0, l} (the context will make it clear when " 2 " represents an integer and when it represents a set). We will write A B for the set o f total functions from B into A , A (m for the set of partial functions from B into A, @(A) for the power set of A,IAI for the cardinality of A, and Aft) for the domain of a (partial) function f ( i f f is a partial function than A(f) is the set of items to which f assigns a value). A category structure E is a quadruple (F, A, r, p) where F is a finite set of features, A is a finite set o f atoms, r is a function in 2F, and p is a function from {flW.D = 0} into ~(A). The function r partitions F into two sets: the set of type 0 features F ° = {fir(f) = 0}, and the set of type 1 features F l = {fir(f) = 1}. We will write r as r 0 when F = F °. T y p e 0 features take atomic values and type 1 features take categories as values. The function p assigns a range of atomic values to each feature of type 0. The set of categories K is recursively defined in terms of (F, A, r, p), in a way very similar to that used in Pollard (1984, p. 299ff), though Pollard's assumptions differ on some important details. A relatively informal presentation will suffice here. We will refer to the set o f partial functions from F ° into A that are consistent with p as the type 0 categories. We first define the set of pure type 0 categories of ~ as those containing only type 0 feature specifications. Then we build up K via a series of approximations we will refer to as levels, finally taking the infinite union of all the levels to obtain K itself: (1) a. O is a category at level 0 b. If a is a type 0 category and fl is a category containing only type 1 features whose values are categories at level n, then a U fl is a category at level n + 1. c. K is the set of all categories at all levels n -> 0. Given the way K is built up, the induction step in (Ib) being restricted to union of finite partial functions, it should be clear that K is a recursive set. We can define certain relations and operations on the space K of possible categories. Thus, we can give a

Gerald Gazdar et aL

Category Structures

constructive definition for unification (symbolized U) as a binary operation on categories.

there are just two distinct types of well-formed basic constraint:

(2) Definition: unification (i) if (f, v) E a but/300 is undefined, then (f, v) E aU/3; (ii) if (f, v) E/3 but a(f) is undefined, then (f, v) E ~U/3; (iii) if (f, v;) e a and (f, b) e /3 and ~-(]) = 1, then if vi U b is undefined, a U/3 is also undefined, else (f, v i U vj) @ a U/3; (iv) if (f, vi) E a and (f, 5) E /3 and T(j') = 0, then if vi = vj, (f, vi) E /3 U/3, else a U/3 is undefined. (v) nothing else is in a U/3.

(5) a. f b. 32a

We can then use unification to define the subsumes relation between categories (where 'subsumes' means 'is more general/underspecified than', or 'is extended by'). We symbolize 'subsumes' with 'E_', and define it as follows. (3) Definition: subsumption o~ subsumes/3 (a E_/3) if and only if/3 = a U/3. Thus a subsumes/3 if and only if/3 is the unification of a and/3. When a subsumes/3 then we may refer to/3 as an e x t e n s i o n of a. If a U/3 is undefined, then/3 = a U/3 fails, and a does not subsume/3. From this it follows that, if a and/3 are categories, then a = / 3 if and only if a E_/3 and/3 E_ a. The following theorem is provable by induction on category levels. (4) Theorem: a subsumes/3 if and only if (i) Vf E (A(a) fq F °) [a(f) = /3(f)] and (ii) Vf E (A(a) tq F 1) [a(f) F"/3(f)]. 2 THE CONSTRAINT LANGUAGE L c We now provide an interpreted formal language, L C, for expressing specific constraints on categories. Constraints are statements that can be true or false of a category. By requiring satisfaction of the constraint, a constraint can be used to delimit a subspace within the set K induced by a given category structure E, to serve as the grammatical categories for a particular type of grammar. It should be noted that our goals in formulating L c are slightly different from those of Rounds and his associates: L c is a language for formulating constraints on well-formed categories, not a language whose expressions are intended for use in place of categories. To put it rather crudely, our language is for category definition whereas Rounds' is (in part) for category manipulation. However, the languages look rather similar syntactically, and where they overlap, the semantics is essentially the same. We define two types of constraint: basic and complex. If f is an element of F, and a is an element of A, then 4

(where ~-(f) = 0)

Informally, (5a) constrains a category to contain some specification for the feature f; thus, the constraint "BAR" says that every syntactic category satisfying it has as one of its elements a pair (BAR, n). This does not entail that every value of every category-valued feature contained in the category must contain BAR; a basic constraint applies to the " t o p level" of the tree-like structure of a category. Likewise, (5b) says of a category satisfying it that it has as one of its elements the pair (f, a). Note that the only thing a basic constraint can require of a type 0 feature beyond saying that it must be present (defined) is that it have a particular atomic value, and that a basic constraint cannot require anything of a type 1 feature at all beyond demanding its presence. Turning to complex constraints, we now continue the list (5), giving the syntax for each type of complex constraint together with an informal indication of its semantics. Assume t h a t f i s an element of F 1, and 05 and are themselves well-formed basic or complex constraints, and that we are considering the interpretation of the constraints with respect to some fixed category structure Y and some category a. (5) c. d. e. f. g.

32 05 ' f is defined in a and its value satisfies 05' -7 05 ' a does not satisfy 05' 05 V ~O' a satisfies either 05 or ~O' 05 A @ ' a satisfies both 05 and ~0' 05 --> ~0 'either a does not satisfy 05 or a does satisfy ~0' h. 05 ~ ~0' a satisfies either both or neither of 05 and i. D05 ' a satisfies 05, and all values of type 1 features in a satisfy 1~05' j. © 05 'either a satisfies 05 or some value of a type 1 feature in a satisfies O 05'

Constraints of the forms (5a) through (5h) are fairly straightforward, but constraints like those shown in (5i) and (5j) need a little more discussion. They introduce modality into our language. Their purpose is to allow for recursive constraints to be imposed on successively e m b e d d e d layers of category values. As indicated, a category a satisfies r-]05provided that, firstly, a satisfies 05 and secondly, whenever a assigns a category/3 to a type 1 feature f, /3 satisfies D05. This may appear to introduce a circularity, but it does not: categories are finite, and within any category there will be a level so deeply embedded in the tree structure that there are no more category values within it; at that point [~05 is true if 05 is, thus ending the recursion. Our choice of notation in (5i) is quite deliberate: in effect, constraints of the form (50 express universal quantification over embedded accessible' categories in Computational Linguistics, Volume 14, Number 1, Winter 1988

Gerald Gazdar et al.

the way that the familiar necessity operator [] of modal logic enforces universal quantification over accessible worlds in the standard semantics. The possibility operator in (5j) is, as usual, the dual of the necessity operator: O 4, says of a category a satisfying it that either a satisfies tO, or there exists a category-value/3 assigned to a type 1 f e a t u r e f b y a such that/3 satisfies <>tO. As a simple example of the sort of work a complex constraint in Lc might do in a grammatical theory, consider the constraint that is known as the " C a s e filter" in recent T G (see Chomsky 1980, p. 25). Stated informally as " * N , where N has no C a s e " , the constraint appears to require every occurrence of the feature complex characterizing the category N, i.e., every occurrence o f [ + N , - V ] , to co-occur with a feature called " C a s e " . The constraint can be stated in Lc as (6). (6) [](((N: I) A (v: 0)) ~ CASE) Here and from now on, we use parentheses in the obvious way wherever it is necessary prevent ambiguity in the statement of constraints. The account of L c given thus far will suffice for a reading of this paper, but those readers who would like to see the semantics given more formally may turn to the appendix. To recapitulate, a theory of categories ® in our sense is a pair (E, C), where I£ is a category structure and C is a set of sentences of L o The set of categories determined by ® is the maximal subset Kc of K determined by E such that each m e m b e r of K c satisfies every member of C.

3 ILLUSTRATIVE APPLICATIONS We will now illustrate the application of the apparatus developed thus far by reconstructing the category systems used in a number of well-known grammatical frameworks that linguists have developed, most of them frameworks that have been used in natural language processing systems at one time or another. 3.1 SIMPLE PHRASE STRUCTURE GRAMMAR

The case of simple phrase structure grammar is trivial, but will serve as an introduction to the form of later sections, and as a straightforward example of the use of a type 0 feature. The set of categories used in a simple phrase structure grammar is just some finite set of atomic categories {al . . . . . a,}, for example, {S, NP, VP, Det, N, V}. So we fix values for F, A, z, and p as in (7): (7) a. b. c. d.

F = {LABEL} A = {a 1. . . . . a,} ~'o p = {