Expanding Access to Science and Technology (UNU, 1994, 462 pages)
Session 4: Intelligent access to information: Part 2
The new world of computing: The sub-language paradigm
How are sub-languages implemented in the computer? Members of one class of sub-languages are already implemented in computers, namely programming languages. How are they currently implemented? A compiler is written that embodies both the syntax and the semantics of the language. The compiler accepts a sentence of the language and returns a single machine-language program; when used in interactive mode, this program is then executed. That is, the abstract computer that understands the programming language consists of a hardware computer with its own machine language plus the compiler, which translates the programming language into machine language. To change the programming language, one rewrites the compiler. There are compelling reasons why this is a bad technology for implementing sub-languages, including programming languages. Before discussing these reasons, we present here a radically different technology.
The manager enters: "Task G will be delayed 10 days." In response, the computer:
1. appropriately changes the database entry for the duration of task G;
2. recomputes the PERT chart for the entire project;
3. for those tasks that will experience a significant delay, generates e-mail to the affected managers, identifying the cause and potential consequences of the delay;
4. sends any changes in critical-path tasks to the Project Manager.
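The cycle above can be sketched in a few lines. The sketch below uses a hypothetical four-task project (the task names, durations, and dependencies are invented for illustration): delay one task, recompute the schedule with a forward pass, and report which tasks slip.

```python
# Hypothetical project data: task durations (days) and predecessors.
durations = {"A": 5, "G": 7, "H": 4, "Z": 3}
preds = {"A": [], "G": ["A"], "H": ["G"], "Z": ["A"]}

def finish_times(durations, preds):
    """Forward-pass earliest finish time for each task (PERT/CPM style)."""
    finish = {}
    def ef(t):
        if t not in finish:
            start = max((ef(p) for p in preds[t]), default=0)
            finish[t] = start + durations[t]
        return finish[t]
    for t in durations:
        ef(t)
    return finish

before = finish_times(durations, preds)
durations["G"] += 10                      # "Task G will be delayed 10 days."
after = finish_times(durations, preds)

# Tasks that will experience a delay, and by how much -- these are the
# ones whose managers would receive the generated e-mail.
delayed = {t: after[t] - before[t] for t in durations if after[t] > before[t]}
print(delayed)                            # {'G': 10, 'H': 10}
```

Here only G and its successor H slip; the independent task Z is unaffected, so its manager receives no mail.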
Let the computer be a Universal Language Processor (a hardware computer with a universal language processor replacing the compiler of a particular language, if you like). It operates in two modes:
(1) It accepts, one at a time, the rules of grammar and their associated semantic procedures that define the sub-language, building them into its internal grammar table.
(2) It accepts an input sentence, parses it according to the grammar, uses the resulting parsing graph to compose the associated semantic procedures, evaluates them, outputs the result, and cycles.
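The two modes might be sketched as follows. This is a toy, not the implementation described in the text: it loads binary rules (syntax plus a semantic procedure) into a grammar table, then chart-parses an input and composes the constituents' meanings. The lexicon and the "soviet ports" example are invented for illustration.

```python
class ULP:
    """Toy Universal Language Processor with a two-mode interface."""
    def __init__(self):
        self.rules = []                     # the internal grammar table

    def add_rule(self, lhs, rhs, sem):      # mode (1): one rule at a time
        self.rules.append((lhs, tuple(rhs), sem))

    def parse(self, words, lexicon):        # mode (2): CYK-style chart parse
        n = len(words)
        chart = {}
        for i, w in enumerate(words):       # lexical entries seed the chart
            cat, meaning = lexicon[w]
            chart.setdefault((i, i + 1), {})[cat] = meaning
        for span in range(2, n + 1):
            for i in range(n - span + 1):
                cell = chart.setdefault((i, i + span), {})
                for k in range(i + 1, i + span):
                    for lhs, rhs, sem in self.rules:
                        left = chart.get((i, k), {})
                        right = chart.get((k, i + span), {})
                        if rhs[0] in left and rhs[1] in right:
                            # compose the two constituents' meanings
                            cell[lhs] = sem(left[rhs[0]], right[rhs[1]])
        return chart[(0, n)]

ulp = ULP()
# One rule: an adjective's meaning is a predicate that filters the noun's set.
ulp.add_rule("NP", ["ADJ", "NP"], lambda adj, np: [x for x in np if adj(x)])
lexicon = {"soviet": ("ADJ", lambda port: port.startswith("Od")),
           "ports": ("NP", ["Odessa", "Boston"])}
print(ulp.parse(["soviet", "ports"], lexicon))  # {'NP': ['Odessa']}
```

Note that the grammar table is built incrementally at run time, with no compile step: adding a rule changes the language the machine accepts on the very next input.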
A typical Rule of Grammar (as understood by the computer):
  Syntax:    <noun_phrase> => <adjective> " " <noun_phrase>
  Semantics: POST adj_mod_proc
Thus it is a simple, straightforward implementation of compositional semantics.
A little insight into what is going on here will be useful in understanding the power of this paradigm. Sub-languages are defined to the computer in terms of grammar rules, consisting of a syntactic aspect and an associated semantic procedure. An example of such a rule is shown in figure 3. Given the constituents of a meaningful phrase - for example: "government" and "contracts" - the semantic procedure goes to the two associated data files and produces the "meaning" of the entire phrase: "government contracts." The role of syntax is to show how words and phrases can be combined into meaningful statements. Once the syntactic structure of a sentence is seen, the associated semantic procedures can be composed appropriately. The rules of grammar, along with the corresponding semantic procedures, constitute the building blocks. Each of these rules is implemented as a separate unit. The syntax of a sentence provides the plan for combining these building blocks into the complex meaning of the entire sentence. Thus the individual semantic procedures can be efficiently composed in innumerable ways to produce the needed answers to immediate user concerns.
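As a concrete sketch of the "government contracts" example, assume each noun points to a data file, here reduced to a Python set of record identifiers (the ids are hypothetical). The rule's semantic procedure combines the constituents' files into the meaning of the whole phrase:

```python
# Hypothetical data files for the two nouns (sets of record ids).
government = {"rec17", "rec23", "rec42"}
contracts = {"rec23", "rec42", "rec99"}

def adj_mod_proc(modifier, head):
    """Semantic procedure for the modifier rule: restrict the head
    noun's records to those also associated with the modifier."""
    return head & modifier

# Meaning of the phrase "government contracts": the records that are
# both about government and about contracts.
meaning = adj_mod_proc(government, contracts)
print(sorted(meaning))                     # ['rec23', 'rec42']
```

The same one-procedure building block composes with any other noun pair the grammar admits, which is the point of implementing each rule as a separate unit.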
(The first question that will come to the mind of a knowledgeable computer person is the effect of such an architecture on response time. Let us deal with this immediately. In our current implementation of this architecture, against a moderate-size database concerning ships and shipping (for computational linguists, this is the well-known DARPA "blue" file), and using a sizable grammar, the parsing time for the following sentence: "What is the cargo type and destination of each ship whose port of departure was some Soviet port?" is about a tenth of a second; the throughput time, including database access, is 8 seconds. The key to these response times lies in the fact that in such very high-level sub-languages, the object-class data structures and processes are highly optimized, so that in processing a sentence, one is composing a few highly optimized procedures.)
The first thing to notice is the implications of the independence of the grammar rules - syntax and associated semantic procedures. As noted above, in building a sub-language, rules are added one at a time. These same rule-adding utilities can obviously be used at any time to add an additional rule or, for example, a whole family of rules implementing a new object class. It is these same utilities that implement the user's ability to extend his own sub-language by definitions.
An "insider's" problem is to determine how the great number of highly complex procedures that may all be needed at some time or another can be retained in a form that makes them available for rapid response to a query. One way that has proved particularly effective is to use "pages" in peripheral memory that are organized on the basis of semantic content. In response to a particular query, only those pages that are required are brought into main memory - whether they be database record, procedure, text, image, digitized voice, or other pages. Pages holding all manner of material are brought into the same paging area. Obviously, procedure pages require a modicum of run-time binding, but since the number of paging slots is large, there is very little thrashing of pages between main and peripheral memories.
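A minimal sketch of this demand-paging scheme follows; the page names are hypothetical, and least-recently-used eviction stands in for whatever policy the real system uses. Pages of any kind - database records, procedures, images - share the same slots:

```python
from collections import OrderedDict

class PageCache:
    """Demand paging over semantically organized pages, with LRU eviction."""
    def __init__(self, slots):
        self.slots = slots
        self.pages = OrderedDict()         # page id -> contents, in LRU order
        self.faults = 0

    def fetch(self, page_id, load):
        if page_id in self.pages:          # already in main memory: a hit
            self.pages.move_to_end(page_id)
        else:
            self.faults += 1               # bring in from peripheral memory
            if len(self.pages) >= self.slots:
                self.pages.popitem(last=False)   # evict the LRU page
            self.pages[page_id] = load(page_id)
        return self.pages[page_id]

cache = PageCache(slots=2)
load = lambda pid: f"<contents of {pid}>"
# A query touches a procedure page, a database page, the procedure again,
# and an image page -- all through the same paging area.
for pid in ["proc:adj_mod", "db:ships", "proc:adj_mod", "img:nosecone"]:
    cache.fetch(pid, load)
print(cache.faults)                        # 3 (the repeated page was a hit)
```

With many real slots, as the text notes, repeat accesses within a query almost always hit, which is why thrashing stays low.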
The information available to the computer is organized into a network of "nodes" and "links." The "nouns" of a sub-language point to certain of the nodes in this semantic net. The syntax rules also have a geometric interpretation in terms of the semantic net; they indicate how to move from one set of nodes to another. Thus the parser composes the path from the words in the initial expression of a question to the nodes constituting the desired answer. The information about a node is kept on a database record on one or more pages of peripheral memory.
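The geometric interpretation can be sketched as follows, with an invented two-ship net (all node and link names are hypothetical): nouns select node sets, and each syntax rule's semantics moves from one node set to another along links, so the parse composes a path to the answer.

```python
# A toy semantic net: (node, link-name) -> set of target nodes.
links = {("ship1", "destination"): {"Odessa"},
         ("ship2", "destination"): {"Boston"},
         ("ship1", "cargo"): {"grain"}}
# Nouns point to sets of nodes in the net.
nouns = {"ships": {"ship1", "ship2"}}

def follow(nodes, link_name):
    """Move from a set of nodes to the set reachable over one link."""
    out = set()
    for n in nodes:
        out |= links.get((n, link_name), set())
    return out

# "the destination of each ship" composes one step of path:
# ships --destination--> answer nodes.
answer = follow(nouns["ships"], "destination")
print(sorted(answer))                      # ['Boston', 'Odessa']
```

A longer question simply composes more `follow` steps, one per syntactic relation the parser finds.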
Organizing information in this way provides a highly efficient and flexible method for maintaining a rather shallow level of information organization (essentially equivalent to an entity-attribute database or relational database, plus inheritance). By linking such "database" records to more complex forms of representation (e.g., texts, pixel files, postscript files, engineering drawings) and by providing sophisticated semantic procedures that can exploit the additional complexities of these structures, the computer can give wide-ranging responses to highly complex technical questions. In the terminology of object-oriented programming, these database records constitute the object representations for the single all-encompassing object class, "noun." Any hierarchy of subclasses of objects may be created, such as "image noun," "matrix noun," "co-variance matrix noun," etc., with their associated processing procedures.
A new object class can be easily implemented as a new subclass of the "noun" object class; when an instance of the new object class is created, first its record as an instance of "noun" is created, and then a link from this record to an instance of the data structure of the new class is added. As an example, suppose one were building a new sub-language to be used by the structural engineers in an aerospace company. Suppose the company already had a major investment in files of stress data and, say, FORTRAN procedures that processed these files. The new object class, a sub-object class of "noun," would be created whose associated data structure was that of the stress data files. Syntax rules for noun phrases that engineers commonly used in referring to the stress data would be added, their corresponding semantic procedures consisting largely of calls to the relevant FORTRAN routines. Such queries as: "Plot the stress against wing tip loading for both Model A12 and Model A14 wing aileron designs" would be immediately available.
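The two-step instance creation can be sketched like this. The class names, the stress-file layout, and the lookup routine are stand-ins for the company's actual files and FORTRAN code: first the "noun" record is created, then the link to the legacy data structure is added.

```python
class Noun:
    """The all-encompassing object class: every instance gets a noun record."""
    def __init__(self, name):
        self.name = name                   # the record as an instance of "noun"

class StressDataNoun(Noun):
    """A subclass of "noun" whose data structure is a legacy stress file."""
    def __init__(self, name, stress_file):
        super().__init__(name)             # first, the "noun" record...
        self.data = stress_file            # ...then the link to the stress data

    def stress_at(self, loading):
        # Stand-in for a call to the relevant legacy (FORTRAN) routine.
        return self.data[loading]

# Hypothetical stress file: wing-tip loading -> measured stress.
wing = StressDataNoun("Model A12 wing aileron", {0: 0.0, 10: 4.2, 20: 9.1})
print(wing.stress_at(10))                  # 4.2
```

The semantic procedures for the engineers' noun phrases would then be thin wrappers around methods like `stress_at`, leaving the existing investment in files and routines untouched.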
In today's highly visual world, sub-languages are seldom limited to written text. But how can this complete integration of media be implemented? Certainly the identification of the object class with its encapsulation of structure and process is a major step. Another step concerns the extended "alphabet" available to all sub-languages. All letters and characters of the usual alphabet as well as the entire extended ASCII character set; all graphic "events," such as clicks of the mouse and movements of the cursor; and all "interrupts" from internal and external sources (properly screened and identified) can be used in the input string that is fed to the language processor. (The computer, like human beings, has "fingers" for pointing and "intonation" and "gestures" it can use.) In this respect, all sub-languages have the same terminal vocabulary, namely this extended alphabet. Once this is established, grammar rules can supply the recursive, flexible link between the input string and the internal object classes. For example, one can at any time introduce a new icon, placing under it any sentence or phrase of the sub-language that is then evaluated in line whenever the icon is clicked during input of a query.
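The extended alphabet might be sketched as a single token stream, with all names invented for illustration: typed words, mouse clicks, and icon activations all become terminals, and an icon expands in line to the phrase defined under it.

```python
# Hypothetical icon definitions: an icon expands to a stored phrase.
icons = {"show-schematic": ["show", "the", "schematic", "of"]}

def tokenize(events):
    """Flatten mixed input events into one input string for the processor."""
    tokens = []
    for kind, value in events:
        if kind == "icon":                 # icon expands to its phrase in line
            tokens.extend(icons[value])
        elif kind == "click":              # a mouse click is a terminal too
            tokens.append(("POINT", value))
        else:                              # an ordinary typed word
            tokens.append(value)
    return tokens

# The user clicks the icon, then clicks a spot on the displayed image.
stream = tokenize([("icon", "show-schematic"), ("click", (312, 88))])
print(stream)   # ['show', 'the', 'schematic', 'of', ('POINT', (312, 88))]
```

Because every sub-language shares this terminal vocabulary, the same grammar machinery handles a click exactly as it handles a word.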
In figure 4, an airline mechanic is seen working on the radar nose-cone of a Boeing 747 aircraft. He turns to his computer for detailed technical support. He has already entered information identifying the particular aircraft he is working on and has called for a display of the nose-cone area. The computer-generated photo image of the relevant area (plus an invisible back-plane drawing outlining all significant parts) provides a highly efficient medium for communication. For example, he may type "leak" and click his mouse on the image of the place he suspects is leaking oil. The computer may respond with the spoken word: "tighten" and blink the bolt it identified in its diagnosis as the probable cause of the leak. In response to a sparsely stated but technically involved question, the mechanic receives an immediately useful response that reflects a high degree of built-in understanding.
In figure 5, a maintenance professional is entering instructions into his personal, completely mobile, telephone-computer. It eliminates any need for the usual truck-full of manuals. The professional's efficiency is greatly increased, since the computer tailors its responses to the specific installation. Astute use of hypermedia links from one data display to another quickly provides pathways to the details the professional really needs. References that establish context (e.g., "I am at . . ."), as well as pronouns and elliptic constructions (e.g., "What about the other connector?"), play important roles in effective dialogue. Note that pointing to and blinking significant areas in pictures and drawings constitute visual "pronouns" (e.g., "voltage 'there'?", "tighten 'that'," or "[show schematic icon] of 'that'").
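Resolving such visual "pronouns" can be sketched minimally, under the assumption (not stated in the text) that a click binds the most recently pointed-at part; the part name is hypothetical.

```python
class Dialogue:
    """Toy deictic-reference resolver for visual "pronouns"."""
    def __init__(self):
        self.last_pointed = None

    def point(self, part):
        # A mouse click on the image, identified via the back-plane drawing.
        self.last_pointed = part

    def resolve(self, word):
        """Resolve 'that'/'there' to the last pointed-at part, if any."""
        if word in ("that", "there") and self.last_pointed:
            return self.last_pointed
        return word                        # ordinary words pass through

d = Dialogue()
d.point("bolt-7")                          # the user clicks a bolt on screen
print(d.resolve("that"))                   # bolt-7  ("tighten 'that'")
```

A fuller system would keep a stack of recent referents so that "the other connector" and similar elliptic references also resolve against the dialogue history.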
"There is a leak here" (pointing at the monitor). "Is this shuttle cock tightly closed?" (valve on monitor blinks) "Yes." "Check the connection here" (arrow points to indicated point).
I am at 766 Oak Lane. Show me the electric panel wiring diagram.