Sense-Linking in a Machine Readable Dictionary
Department of Computer Science
University of Massachusetts, Amherst, MA 01003
Dictionaries contain a rich set of relationships between their senses, but often these relationships are only implicit. We report on our experiments to automatically identify links between the senses in a machine-readable dictionary. In particular, we automatically identify instances of zero-affix morphology, and use that information to find specific linkages between senses. This work has provided insight into the performance of a stochastic tagger.
Machine-readable dictionaries contain a rich set of relationships between their senses, and indicate them in a variety of ways. Sometimes the relationship is provided explicitly, such as with a synonym or antonym reference. More commonly the relationship is only implicit, and needs to be uncovered through outside mechanisms. This paper describes our efforts at identifying these links.
The purpose of the research is to obtain a better understanding of the relationships between word meanings, and to provide data for our work on word-sense disambiguation and information retrieval. Our hypothesis is that retrieving documents on the basis of word senses (instead of words) will result in better performance. Our approach is to treat the information associated with dictionary senses (part of speech, subcategorization, subject area codes, etc.) as multiple sources of evidence (cf. Krovetz ). This process is fundamentally a divisive one, and each of the sources of evidence has exceptions (i.e., instances in which senses are related in spite of being separated by part of speech, subcategorization, or morphology). Identifying related senses will help us to test the hypothesis that unrelated meanings will be more effective at separating relevant from nonrelevant documents than meanings which are related.
We will first discuss some of the explicit indications of sense relationships as found in usage notes and deictic references. We will then describe our efforts at uncovering the implicit relationships via stochastic tagging and word collocation.
2 Explicit Sense Links
The dictionary we are using in our research, the Longman Dictionary of Contemporary English (LDOCE), is a dictionary for learners of English as a second language. As such, it provides a great deal of information about word meanings in the form of example sentences, usage notes, and grammar codes. The Longman dictionary is also unique among learner's dictionaries in that its definitions are generally written using a controlled vocabulary of approximately 2200 words. When exceptions occur they are indicated by means of a different font. For example, consider the definition of the word gravity:
• gravity n 1b. worrying importance: He doesn't understand the gravity of his illness - see grave²
• grave adj 2. important and needing attention and (often) worrying: This is grave news | The sick man's condition is grave
These definitions illustrate how words can be synonymous¹ even though they have different parts of speech. They also show that the Longman dictionary not only marks a word as a synonym, but sometimes specifies the relevant sense of that word (given in this example by the superscript following the word `grave'). This is extremely important because synonymy is not a relation that holds between words, but between the senses of words.
Unfortunately, these explicit sense indications are not provided consistently. For example, the definition of `marbled' provides an explicit indication of the appropriate sense of `marble' (the stone instead of the child's toy), but this is not done within the definition of `marbles'.
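To make the distinction between word-level and sense-level references concrete, the following is a minimal sketch of how such cross-references might be extracted from definition text. It is not the procedure used in this work. It assumes a hypothetical plain-text encoding in which the superscript sense number is flattened to a trailing digit (as in "see grave2"), and the names `SenseRef` and `find_sense_refs` are illustrative only.

```python
import re
from typing import List, NamedTuple, Optional


class SenseRef(NamedTuple):
    """A cross-reference found in a definition (hypothetical structure)."""
    headword: str
    sense: Optional[int]  # None when the definition does not say which sense


# Assumed encoding: "see grave2" refers to sense 2 of "grave";
# "see marble" refers to the word without specifying a sense.
# This toy pattern would also match ordinary uses of the verb "see".
_REF = re.compile(r"\bsee\s+([a-z]+)(\d+)?\b")


def find_sense_refs(definition: str) -> List[SenseRef]:
    """Return every 'see <word>[<sense number>]' reference in a definition."""
    refs = []
    for match in _REF.finditer(definition.lower()):
        word, number = match.group(1), match.group(2)
        refs.append(SenseRef(word, int(number) if number else None))
    return refs


if __name__ == "__main__":
    text = ("worrying importance: He doesn't understand "
            "the gravity of his illness - see grave2")
    print(find_sense_refs(text))
```

A reference whose `sense` field is `None` corresponds to the inconsistent case described above: the link to the word is explicit, but the link to a particular sense still has to be recovered by other means.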
LDOCE also provides explicit indications of sense relationships via usage notes. For example, the definition for argument mentions that it derives from both senses of argue - to quarrel (to have an argument), and to reason (to present an argument). The notes also provide advice regarding similar-looking variants (e.g., the difference between distinct and distinctive, or the fact that an attendant is not someone who attends a play, concert, or religious service). Usage notes can also specify information that is shared among some word meanings, but not others (e.g., the note for venture mentions that both verb and noun carry a connotation of risk, but this isn't necessarily true for adventure).
Finally, LDOCE provides explicit connections between senses via deictic reference (links created by
¹ We take two words to be synonymous if they have the same or closely related meanings.