J. C. Schlimmer and L. A. Hermens
Volume 1, 1993
Links to Full Text:Description of Online Appendix: People like to record information. Doing this on paper is initially efficient, but lacks flexibility. Recording information on a computer is less efficient but more powerful. In our new note taking softwre, the user records information directly on a computer. Behind the interface, an agent acts for the user. To help, it provides defaults and constructs a custom user interface. The demonstration is a QuickTime movie of the note taking agent in action. The file is a binhexed self-extracting archive. Macintosh utilities for binhex are available from mac.archive.umich.edu. QuickTime is available from ftp.apple.com in the dts/mac/sys.soft/quicktime.
FAQ | Best Paper | WWW AI Resources | Comments | Notify Me of New Articles
© Copyright 1993-2006 AI Access Foundation, Inc.
Journal of Artificial Intelligence Research 1 (1993) 61-89 Submitted 8/93;
published 11/93
(C)
1993 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved.
Software Agents: Completing Patterns and
Constructing User Interfaces
Jeffrey C. Schlimmer
SCHLIMMER@EECS.WSU.EDU
Leonard A. Hermens
LHERMENS@EECS.WSU.EDU
School of Electrical Engineering & Computer Science,Washington State University,
Pullman, WA 99164-2752, U.S.A.
Abstract
To support the goal of allowing users to record and retrieve information,
this paperdescribes an interactive note-taking system for pen-based computers
with two distinctive features. First, it actively predicts what the user
is going to write. Second, it automaticallyconstructs a custom, button-box
user interface on request. The system is an example of a learning-apprentice
software-agent. A machine learning component characterizes thesyntax and
semantics of the user's information. A performance system uses this learned
information to generate completion strings and construct a user interface.
1. Introduction and Motivation
People like to record information for later consultation. For many , the
media of choice ispaper. It is easy to use, inexpensive, and durable. To
its disadvantage, paper records do not scale well. As the amount of information
grows, retrieval becomes inef ficient, physical stor-age becomes excessive,
and duplication and distribution become expensive. Digital media offers better
scaling capabilities. With indexing and sub-linear algorithms, retrieval
is ef fi-cient; using high density devices, storage space is minimal; and
with electronic storage and high-speed networks, duplication and distribution
is fast and inexpensive. It is clear that ourcomputing environments are evolving
as several vendors are beginning to market inexpensive, hand-held, highly
portable computers that can convert handwriting into text. We viewthis as
the start of a new paradigm shift in how traditional digital information
will be gathered and used. One obvious change is that these computers embrace
the paper metaphor ,eliminating the
need
for typing. It is in this paradigm that our research is inspired, and one
ofour primary goals is to combine the best of both worlds by making digital
media as convenient as paper.This document describes an interactive note-taking
software system for computers with pen-based input devices. Our software
has two distinctive features: fi rst, it actively predictswhat the user is
going to write and provides a default that the user may select; second, the
software automatically constructs a graphical interface at the user 's request.
The purpose ofthese features is to speed up information entry and reduce
user errors. Viewed in a lar ger context, the interactive note-taking system
is a type of self-customizing software.To clarify this notion, consider a
pair of dimensions for characterizing software. As Figure 1 depicts, one
dimension is task specifi city. Software that addresses a generic task(e.g.,
a spreadsheet) lies between task independent software (e.g., a compiler)
and task specific software (e.g., a particular company' s accounting software).
Another dimension isthe amount of user customization required to make the
software useful. Task generic software lies between the two extremes, requiring
modest programming in a specialized
S
CHLIMMER
& H
ERMENS
62
language. Self-customizing software uses machine learning techniques to automatically
cus-tomize task generic software to a specific user. Because the software
learns to assist the user by watching them complete tasks, the software is
also a learning apprentice. Similarly ,because the user does not explicitly
program the defaults or the user interface for the note taking system, it
is a type of software agent. Agents are a new user interface paradigm thatfree
the user from having to explicitly command the computer. The user can record
information directly and in a free-form manner. Behind the interface, the
software is acting on behalfof the user, helping to capture and organize
the information.
Next we will introduce the performance component of the note-taking software
in moredetail, then describe the representations and algorithms used by the
learning methods. We also present empirical results, comparing the performance
of seven alternate methods onnine realistic note-taking domains, and fi nally,
we describe related research and identify some of the system's limitations.
Figure 1: Continuum of software development depicting the traditional trade-offbetween
the development cost per user and the amount of user customization required.
Self-customizing software eliminates the need for user customization by starting
withpartially-specified software and applying machine learning methods to
complete any remaining customization.
Generic Task Specificity of Product Specific Low Development Cost / User
High
High
Low User Customization Required
Visual BASIC
Custom Software Spreadsheets Self-Customizing
S
OFTWARE
A
GENTS
: C
OMPLETING
P
ATTERNS
& C
ONSTRUCTING
U
SER
I
NTERFACES
63
2. Performance Task
The primary function of the note-taking software is to improve the user's
speed and accuracyas they enter notes about various domains of interest.
A
note
is a short sequence of descrip-tive terms that describe a single object
of interest. Example 1 shows a note describing a particular personal computer
(recorded by the fi rst author from a Usenet newsgroup during1992):
4096K PowerBook 170, 1.4MB and 40MB Int. Drives, 2400/9600 Baud FAX Modem
(Example 1) Example 2 is a note describing a fabric pattern (recorded by
the first author's wife):
Butterick 3611 Size 10 dress, top
(Example 2) Tables 5 through 11 later in the paper list sample notes drawn
from seven other domains.The user may enter notes from different domains
at their convenience and may use whatever
syntactic style comes naturally.From the user 's point of view , the software
operates in one of two modes: a
contextual
prompting mode, and an interactive graphical interface mode. In the first
mode, the softwarecontinuously predicts a likely completion as the user writes
out a note. It of fers this as a default for the user . The location and
presentation of this default must balance confl ictingrequirements to be
convenient yet unobtrusive. For example, the hand should not hide the indicated
default while the user is writing. Our solution is to have a small, colored
comple-tion button follow to the left and below where the user is writing.
In this location, it is visible to either right- or left-handed people as
they write out notes. The user can reposition the but-ton to another location
if they prefer . The default text is displayed to the immediate right of
this button in a smaller font. The completion button is green; the text
is black. The comple-tion button saturation ranges from 1 (appearing green),
when the software is highly confident of the predicted value, to 0 (appearing
white), when the software lacks confi dence. The but-ton has a light gray
frame, so it is visible even when the software has no prediction. Figure
2
Figure 2: Screen snapshot of the note-taking software in contextual prompting
modefor a PowerBook note. The two triangles in the lower left are scroller
buttons.
S
CHLIMMER
& H
ERMENS
64
portrays a screen snapshot of the software operating in the contextual prompting
mode for aPowerBook note.
The software' s second mode presents an interactive graphical interface.
Instead ofrequiring the user to write out the text of a note, the software
presents a radio-button and check-box interface (what we call a
button-box interface
). With this, the user may selectfrom text fragments, portions of notes
called
descriptive terms
, by tapping on radio-buttonsor check-boxes with the pen interface device.
Each selection from the button-box interface is added to the current note.
Intuitively , check boxes are generated to depict optional descrip-tive terms,
whereas radio-button panels are generated to depict alternate, exclusive
descriptive terms. For user convenience, the radio-buttons are clustered
into panels and are sortedalphabetically in ascending order from top to bottom.
To allow the user to add new descriptive terms to a button-box panel, an
additional blank button is included at the bottom of each.When the user selects
a radio button item, the graphical interface is expanded to depict additional
choices corresponding to descriptive terms that follow syntactically . The
softwareindicates its predictions by preselecting the corresponding buttons
and highlighting them in green. The user may easily override the default
selection by tapping the desired button.Figure 3 portrays a screen snapshot
of the software operating in the interactive graphical interface mode for
a PowerBook note.The software is in prompting mode when a user begins to
write a note. If the learned syntax for the domain of the note is suf ficiently
mature (see Section 6, Constructing a Button-Box Interface), then the software
can switch into the button-box mode. To indicate this to the user, a mode
switch depicted as a radio button is presented for the user 's notice. A
conve-nient and unobtrusive location for this switch is just below the completion
button. In keeping with the color theme, the mode switch also has a green
hue. If the user taps this switch, thewritten text is removed , and the appropriate
radio buttons and check boxes are inserted. The system automatically selects
buttons that match the user -written text. As the user makesadditional selections,
the interface expands to include additional buttons. When the user finishes
a note, in either mode, the software returns to prompting mode in anticipation
ofanother note.
1
Because the interface is constructed from a learned syntax, as the software
Figure 3: Screen snapshot of the note-taking software in button-box mode
for aPowerBook note.
S
OFTWARE
A
GENTS
: C
OMPLETING
P
ATTERNS
& C
ONSTRUCTING
U
SER
I
NTERFACES
65
refines its representation of the domains of the notes, the button-box interface
also improves.On-line Appendix 1 is a demonstration of the system's operation
in each of its two modes.
3. Learning a Syntax
To implement the two modes of the note taking software, the system internally
learns twostructures. To characterize the syntax of user 's notes, it learns
finite-state machines (FSMs). To generate predictions, it learns decision
tree classifi ers situated at states within the FSMs.In order to construct
a graphical user interface, the system converts a FSM into a set of buttons.
This section describes the representation and method for learning FSMs.
The nextsection discusses learning of the embedded classifiers.
3.1 Tokenization
Prior to learning a fi nite-state machine, the user 's note must fi rst be
converted into asequence of tokens. Useful tokenizers can be domain independent.
However , handcrafted domain-specific tokenizers lead to more useful representations.
The generic tokenizer usedfor the results reported here uses normal punctuation,
whitespace, and alpha-numeric character boundaries as token delimiters. For
example, our generic tokenizer splits the samplePowerBook note in Example
1 into the following 16 tokens:
:NULL
"
4096
""
K
""
PowerBook
""
170
""
, 1.4
""
MB
""
and
"" 40
"" MB
""
Int.
""
Drives
""
, 2400/9600
"" Baud
""
FAX
""
Modem
" . The token
:NULL
is prepended by the tokenizer . This convention simplifi es the code
forconstructing a FSM.
3.2 Learning a Finite-State Machine
Deterministic fi nite-state machines (FSMs) are one candidate approach
for describing thesyntax of a user 's notes because they are well understood
and relatively expressive. Moreover, Angluin (1982) and Berwick and Pilato
(1987) present a straightforward algorithm forlearning a specific subclass
of FSMs called k-reversible FSMs. The algorithm is incremental
1. Of the functionality described here, our prototype implements all but
the transition from button-box to contextual prompting. The mechanism for
such a transition is machine dependent and is not germane to this research.
S
CHLIMMER
& H
ERMENS
66
and does not suffer from presentation order effects. Berwick and Pilato defi
ne a k-reversibleFSM as:
"A regular language is
k-reversible
, where
k
is a non-negative integer , if whenever twoprefixes
whose last k wor ds [tokens] match
have a tail in common, then the two prefi xeshave all tails in common. In
other words, a deterministic fi nite-state automaton (DF A) [FSM] is
k
-reversible if it is deterministic with lookahead
k
when its sets of initial andfinal states are swapped and all of its arcs
[transitions] are reversed."
Given a list of tokens, the k-reversible FSM algorithm first constructs a
prefix tree, whereall token sequences with common k-leaders share a k-length
path through the FSM. For example, Figure 4a depicts a simple FSM constructed
for a single fabric pattern note. Thetext of the user's note was converted
into a sequence of tokens. Then a transition was created for each token and
a sequence of states was created to link them together. One state serves
asthe initial state, and another indicates the completion of the sequence.
For convenience, this latter, terminal state is depicted with a double circle.
If the FSM is able to fi nd a transitionfor each token in the sequence, and
it arrives at the terminal state, then the FSM accepts the token sequence
as an instance of the language it defi nes. Figure 4b depicts the same FSMafter
another path has been added corresponding to a second fabric pattern note
(Example 2). Now the FSM will accept either note if expressed as a sequence
of tokens. This FSM is atrivial prefix tree b ecause only the first state
is shared between the two paths.
Figure 4: (a) Degenerate finite-state machine after processing a single
fabric patternnote, and (b) prefix tree finite-state machine after adding
a second fabric pattern note (cf. Example 2).
11/12
dress
3035 :NULL Butterick
start
terminal (a)
Size
11/12 10
dress dress
3035 3611 :NULL :NULL Butterick Butterick
start
terminal terminal
Size Size
top (b)
S
OFTWARE
A
GENTS
: C
OMPLETING
P
ATTERNS
& C
ONSTRUCTING
U
SER
I
NTERFACES
67
A prefix tree is minimal for observed token sequences, but it may not be
general enoughfor use in prediction. (The prefi x tree is, in essence, an
expensive method for memorizing token sequences--which is not the desired
result.) For the sake of prediction, it is desirableto have a FSM that can
accept new , previously unseen combinations of tokens. The prefi x tree
automaton can be converted into a more general FSM by mer ging some of its
states. Aparticular method for doing this converts a prefi x tree into a
k-reversible FSM via Angluin's (1982) algorithm. The algorithm mer ges states
that have similar transitions, and it creates aFSM that accepts all token
sequences in the prefix tree, as well as other candidate sequences. Table
1 lists the three rules for deciding when to merge a pair of states in a
prefix tree to forma k-reversible FSM. In the special case where k equals
zero, all states have a common kleader, and Rule 2a ensures that there will
be only one accepting state.Because the rules in Table 1 must be applied
to each pair of states in the FSM, and because each time a pair of states
is mer ged the process must be repeated, the asymptoticcomplexity of the
process is O( ), where is the number of states in the FSM.
Applying these rules to the prefix tree in Figure 4b with k equal to zero
results in a FSMdepicted in Figure 5a. Notice that the fi rst two states
have been mer ged to make the FSM deterministic (Rule 1). The accepting
states have also been mer ged in compliance withRule 2a. The resulting FSM
has fewer states but is not more general. It only accepts the two token sequences
originally seen. Extending this example, Figure 5b illustrates the addition
ofa third fabric pattern note as a prefix tree path to the FSM. Reapplying
the rules results in the FSM shown in Figure 6. The first two states have
been mer ged as before through the actionof the determinism Rule 1. Note
that a pair of latter states have also been mer ged because they share a
common zero-leader (true of all pairs of states) and because they transition
tothe common terminal state on the token "
dress
".Figure 7 depicts a more sophisticated result; it shows a learned zero-reversible
FSM for notes about PowerBook computers. This example shows that the model
number "
100
" isnever followed by a specifi cation for an internal fl oppy drive, but
that other model numbers are. Any model may have an external fl oppy drive.
Note that there is a single terminal state.Whitespace and punctuation have
been eliminated for clarity in the figure.
The rules listed in Table 1 are generalization operators that allow the
FSM to acceptpreviously unobserved sequences. Whenever two or more states
are mer ged into one, the FSM will accept more sequences than before if the
new state is at the tail end of more transi-tions than one of the previous
states and if the new state is at the head end of at least one transition.
For example, the state just after State 1 in Figure 7 was mer ged from
severalprevious states and generalizes memory sizes for PowerBook models.
These rules comprise a heuristic bias and may be too conservative. For example,
Figure 8 depicts a FSM for notes
A k-leader is defined as a path of length k that accepts in the given state.Merge
any two states if either of the following is true: 1. Another state transitions
to both states on the same token; or(This enforces determinism.) 2. Both
states have a common k-leader anda. Both states are accepting states, or
b. Both states transition to a common state via the same token.
Table 1: FSM state merging rules from (Angluin, 1982).
n
3
n
S
CHLIMMER
& H
ERMENS
68
about fabric patterns. Many of the states prior to the accepting state could
be usefullymerged, but using only the rules listed in Table 1, many more
notes will have to be processed before this happens. If the FSM in Figure
8 were rendered as a button-box interface, it wouldreflect little of the
true structure of the domain of fabric patterns. Table 2 lists specializations
of Rules 2a and 2b and an additional pair of rules we developed to make the
FSM generalizemore readily. Note that the parameter k has been set to zero
in Rule 2 and to one in Rule 3. Effectively, two states are mer ged by
Rules 3a or 2b' if they share an incoming or outgoingtransition. Rule 3b
is a Kleene rule that encourages the FSM to generalize the number of times
a token may appear in a sequence. If one state has a transition to another,
then mergingthem will result in a transition that loops from and to the newly
mer ged state. Figure 9 depicts a FSM for notes about fabric patterns learned
using all three generalization rules inTable 2. The resulting FSM accurately
captures the syntax of the user 's fabric pattern notes and correctly indicates
the syntactically optional tokens that may appear at the end of note.When
rendered as a button-box interface, it clearly depicts the user 's syntax
(as illustrated later by Figure 12). The added generalization rules may
have only mar ginal effects on thesystem's ability to accurately predict
a completion as the user writes out a note (as Table 14 below indicates).
Their purpose is to improve the quality of the custom interface.Cohen (1988)
uses an interesting alternative representation for learning a syntactic form.
The goal in his work is to guide the generation of proof structures. Intuitively
, the represen-tation is a finite-state machine that accepts a tree rather
than a sequence, and for this reason it is termed a tree automaton. Like
the rules in Tables 1 and 2, tree automatons are generalized
Figure 5: (a) Finite-state machine after processing two fabric pattern notes
andapplying state merging rules in Table 1, and (b) prefix tree finite-state
machine after adding a third fabric pattern note.
11/12 10
dress dress
3035 3611
:NULL Butterick
start
terminal Size Size
top (a)
11/12 10 10
dress dress dress
3035 3611 3674
:NULL :NULL Butterick Butterick
start
terminal terminal
Size Size Size
top (b)
S
OFTWARE
A
GENTS
: C
OMPLETING
P
ATTERNS
& C
ONSTRUCTING
U
SER
I
NTERFACES
69
by merging states that share similar transitions. Oddly enough, one motivation
for using treeautomatons is that they are less likely to introduce extraneous
loops, the opposite of the problem with the original FSM mer ging rules in
Table 1. It is not clear how to map thesequence of tokens in the user 's
notes into a tree structure, but the less sequential nature of the tree automaton
may help alleviate sequencing problems in rendering the custom userinterface
(see Section 9, Observations/Limitations).
3.3 Parsing
To use the fi nite-state machine for prediction, the software needs a strategy
for dealing withnovel tokens. For example, when the user takes a note about
a PowerBook computer with a
Figure 6: Sample finite-state machine after processing three fabric pattern
notes.
11/12 1010
dress dress
3035 36113674
:NULL Butterick
start
terminal Size SizeSize
top
Merge any two states if any of the following are true:1. Another state transitions
to both states on the same token; or
(This enforces determinism.)2'. Both states have a common 0-leader and a.
Both states are accepting states, orb. Both states transition to a common
state via the same token; or 3. Both states have a common 1-leader anda.
Both states transition to a common state via any token, or
b. One transitions to the other via any token. Table 2: Extended FSM state
merging rules.
S
CHLIMMER
& H
ERMENS
70
ModemFAX
bisBaud K 9.6
32K
v14.4
Ext Drives 1.4
Ext andDrive Drives
and 20 40 20 40 80 120 MB
1.4
Int
MB Int MB
and MB MB
1.480 100 140 145 160 170
PowerBook
K 2048 4096 6144 8192
:NULL
start
terminal 4800/96002400/96002400/48009600 2xBattery,Battery, Case,Charger,
FPU,Video Output
5
7
4 2 3
1
6
Figure 7: Zero-reversible FSM characterizing PowerBook notes (cf. Example
1).
S
OFTWARE
A
GENTS
: C
OMPLETING
P
ATTERNS
& C
ONSTRUCTING
U
SER
I
NTERFACES
71
new memory configuration, the FSM will not have a transition for the first
token. If the soft-ware is to prompt the user, then it must have a means
for deciding where novel tokens lie in a note' s syntax--which state to predict
from. Without such a mechanism, no meaningfulprediction can be generated
after novel tokens.
A state may not have a transition for the next token. In general, this is
a single symptomwith three possible causes: (1) a novel token has been inserted,
(2) a suitable token has been omitted and the next token would be accepted
by a subsequent state, or (3) a token has beensimply replaced by another
in the syntax. For example, in the sequence of tokens {
:NULL
,"
12288
", "
K
", "
PB
"}, "
12288
" is a novel token, a familiar memory size has been omitted, and"
PowerBook
" has been replaced by "
PB
".An optimal solution would identify the state requiring a minimum number
of insertions, omissions, and replacements necessary to parse the new sequence.
An ef ficient, heuristicapproximation does a greedy search using a special
marker. Each time the marked state in the FSM has a transition for the next
token written by the user, the marker is moved forward, anda prediction is
generated from that state. When there is no transition for the next token,
a greedy search is conducted for some state (including the marked one and
those reachablefrom it) that has a transition for some token (including the
next one and those following). If such a state is found, the marker is moved
forward to that state, tokens for the transitions ofskipped states are assumed
omitted, and novel tokens are assumed inserted. If no state past the marker
has a transition for any of the remaining tokens, the remaining tokens areassumed
to be replacements for the same number of the most likely transitions; the
marker is not moved. If the user writes a subsequent token for which some
state has a transition, the
TopSkirt Top Skirt Jumper Dress Jumper Dress DressDress Dress Jumper
11/12 10108-10-1212 1012 12 11/1212 11/12 Size SuzeSize SizeSize SizeSize
SizeSizeSizeSizes
3035 3611367437224198 4352 6171 4864 5377 59065057 5424 5465
SimplicityMcCall'sButterick :NULL
start
terminal
Figure 8: Zero-reversible finite-state machine characterizing fabric pattern
noteslearned using merging rules listed in Table 1.
S
CHLIMMER
& H
ERMENS
72
marker is moved as described above, and the syntax of the user 's note is
realigned with thelearned syntax. Continuing with the simple PowerBook example,
the marker is moved to State 1 of the FSM in Figure 7 because the initial
state had a transition for the fi rst token
:NULL
. Because State 1 doesn't have a transition for the next token "
12288
", a greedy searchis conducted to fi nd a nearby state that accepts either
"
12288
", "
K
", or "
PB
". The state justbefore State 2 accepts "
K
", so the marker is moved to that state. Another greedy search isstarted
to fi nd a state that accepts "
PB
". Because one cannot be found, the heuristic parsingassumes that it should
skip to the next transition. In this case the one labeled "
PowerBook
".Consequently, the system generates a prediction from State 2 to prompt
the user. 3.4 Multiple Finite-State Machines
If the user decides to take notes about multiple domains, it may be necessary
to learn a sepa-rate syntax for each domain. For example, a single syntax
generalized over both the Power Book and fabric pattern notes is likely to
yield confusing predictions and an unnatural userinterface. Maintenance of
multiple finite-state machines is an instance of the clustering problem--deciding
which notes should be clustered together to share a FSM. As Fisher (1987)discusses,
this involves a trade-off between maximizing similarity within a cluster
and minimizing similarity between clusters. Without the fi rst criteria,
all notes would be put into asingle cluster. Without the second criteria,
each note would be put into its own cluster.
One obvious approach would be to require the user to prepend each note with
a uniquetoken to identify each note' s domain. This simplifi es the clustering
computation. All notes sharing the fi rst token would share a FSM. However
, with this scheme, the user would have
TopSkirtJumper JumperDress
1210 11/128-10-12
3035 361136743722 4198 4352 6171 4864 5377 59065057 5424 5465
SimplicityMcCall'sButterick :NULL
start
terminal SizeSizes
Figure 9: Finite-state machine characterizing fabric pattern notes learned
usingextended rules in Table 2. Compare to zero-reversible finite-state
machine for the same domain in Figure 8.
S
OFTWARE
A
GENTS
: C
OMPLETING
P
ATTERNS
& C
ONSTRUCTING
U
SER
I
NTERFACES
73
to remember the identifying token or name for each domain. An interface
could provide apop-up list of all previously used domain identifi ers. This
is not satisfactory because it requires overhead not needed when taking notes
on paper.An alternative approach doesn' t require any extra ef fort on the
part of the user . A new note is grouped with the FSM that skips the fewest
of its tokens. This heuristic encourageswithin cluster similarity because
a FSM will accept new token sequences similar to those it summarizes. To
inhibit the formation of single-note FSMs, a new FSM is constructed only
ifall other FSMs skip more than half of the new note' s tokens. This is a
parametrized solution to encourage between-cluster dissimilarity.
4. Learning Embedded Classifiers
Finite-state machines are useful representations for capturing the syntax
of a user 's notes,and they are easy to learn. When predicting a note's completion,
it is essential that a prediction be made from the correct state in the FSM
(as discussed above). It is also necessary todecide whether to terminate
(indicating acceptance of the note) or continue prediction, and, in the later
case, which transition to predict. To facilitate these decisions, the FSM
can main-tain a count of how many times parsing terminated and how many times
each transition was taken. Prediction can then return the option with the
maximum frequency.Figure 10 depicts a FSM for which this method will prove
insufficient. There is only one state, an accepting state, and the transition
corresponding to the token "
X
" is optional. (Thiscorresponds to a check box interface item.) There are
two problems with a frequency-based prediction. First, the FSM does not indicate
that the transition is to be taken at most once, yetthis is quite clear from
the user interface. Second, simple frequency-based prediction would always
recommend termination and never the transition. The FSM accepts whether the
box ischecked or not, thus the frequency of termination is greater than or
equal to the frequency of the transition. This problem arises whenever there
is a loop.Embedding general classifiers in a FSM can alleviate some of the
FSM's representational shortcomings. For example, in the FSM depicted in
Figure 10, a decision tree embedded inthis state easily tests whether the
transition has already been taken and can advise against repeating it. Moreover,
a classifier can predict based on previous transitions rather than justthe
frequency of the current state' s transitions. Therefore, a decision tree
embedded in the state of Figure 10 can predict when the transition should
be taken as a function of other ,earlier tokens in the sequence. Table 3
lists sample decision trees embedded in states of the FSM depicted in Figure
7. The first tree tests which token was parsed by a distant state, ineffect
augmenting the FSM representation. It relates memory size to hard disk capacity
(small amounts of memory correlate with a small hard disk). The second tree
prevents anoptional loop from being taken a second time by testing to see
if the state has yet been visited during a parse of the note. After processing
additional notes, this second decision tree
X
start terminal
Figure 10: Simple finite-state machine with one state.
S
CHLIMMER
& H
ERMENS
74
becomes more complex as the system tries to predict which PowerBooks have
FAX modemsand which do not.
A classifier is trained for each state in the FSM which: (a) has more than
one transition,or (b) is marked as a terminal state but also has a transition.
The classifiers are updated incrementally after the user fi nishes each note.
The classifier 's training data are token sequencesparsed at this state.
The class value of the data is the transition taken from, or termination
at, this state by the token sequences. Only those classifi ers whose states
are used in a parse areupdated. The attributes of the data are the names
of states prior to this one, and the values of the attributes are the transitions
taken from those states. A distinct attribute is defi ned eachtime a state
is visited during a given parse, so when a loop transition is taken a specifi
c attribute reflects this fact. For any of the attributes, if the corresponding
state was not visitedwhile parsing the token sequence, the attribute has
a special, empty value.
Consider the PowerBook FSM shown in Figure 7. A classifi er would be embedded
atStates 1, 2, 3, 4, 5, 6, 7. A training example corresponding to the note
in Example 1 for the classifier at State 6 would be:
Attributes: Values:
S1 = "
4096
"S2 = "
170
"S3 = NIL S4 = "
40
"S5 = "
Drives
"S6 = "
, 2400/9600
"S7 = "
FAX
"S7-1 = "
Modem
"Class: =
:TERMINATE
. Note that there is no value for State 3, denoting that it wasn' t visited
during the parse ofExample 1. Also there are two attributes for State 7 denoting
that it has been visited twice.
The classifier gives informed advice about which transition to take or whether
to termi-nate. The FSM in turn gives the classifi er a specifi c context
for operation. If only a single classifier were used to predict the next
token, it would be hard pressed to represent the differ-ent predictions required.
The domain is naturally narrowed by the FSM and therefore reduces the representational
demands on the classifi er. Later , we present empirical results
Decision tree embedded in State 3
:If State 1 exited with
"2048
"Then predict
" 20"
Else if with
"4096"
Then predict
" 40"
Else if with
"6144"
Then predict
" 40"
Else if with
"8192"
Then predict
" 40"
.
Decision tree embedded in State 7
:If State 7 has not been visited Then predict
" FAX"
Else if State 7 exited with
" FAX"
Then predict
" Modem
" . Table 3: Sample decision trees embedded in the finite-state machine
depicted inFigure 7.
S
OFTWARE
A
GENTS
: C
OMPLETING
P
ATTERNS
& C
ONSTRUCTING
U
SER
I
NTERFACES
75
comparing a single classifi er to a set of classifi ers embedded in a FSM.
The findings thereshow that the latter outperforms the former , confi rming
the intuition that learning is more effective if situated within a narrow
context.From the classifier's point of view, the learning task is non-stationary.
The concept to be learned is changing over time because the structure of
the FSM is changing. When two statesare merged, one of the two classifiers
is discarded. The other is now embedded in a different position in the FSM,
and it sees dif ferent training data. Similarly , when other states aremerged,
the attributes of the training data also change. To help mitigate this ef
fect, the new state takes the oldest identifi er assigned to the two mer
ged states. Empirical results inTable 14 illustrate that the FSM does not
have to be fi xed before the classifi er can learn useful information.
5. Contextual Prompting
In the prompting mode, the software continuously predicts a likely completion
as the userwrites out a note. It presents this as a default next to the completion
button. The button's saturation ranges from white to green in proportion
to the confidence of the prediction. If theuser taps the completion button,
the prompt text is inserted at the end of the current note.
A completion is generated by parsing the tokens already written by the user
, finding thelast state visited in the FSM, and predicting the next most
likely transition (or termination). This process is repeated until a stopping
criterion is satisfi ed, which is discussed below . Ifthe last token written
by the user is incomplete, matching only a prefix of a state's transition,
then the remainder of that transition is predicted. If the last token matches
more than onetransition, a generalized string is predicted using special
characters to indicate the type and number of characters expected. If a digit
is expected, a "
#
" is included; if a letter , an "
a
" isincluded; if either are possible, a "
?
" is included; and if some transition' s tokens are longerthan others, a
"
...
" is appended to the end . For example, if the user has written"
4096K PowerBook 1
", the possible values for PowerBook models of "
100
", "
140
","
160C
", and "
170
" are generalized, and the prompt is "
#0...
".A simple calculation is used to compute the confi dence of the prediction
and set the button's color saturation. It is the simple ratio
where is the frequency of the predicted arc (or terminate) [i.e., the number
oftimes this choice was taken while parsing previously observed notes],
is the total frequency of all arcs (and terminate), and is the number of
tokens skipped duringheuristic parsing (cf. Section 3.3, Parsing). Confidence
is directly proportional to the simple likelihood of the prediction and is
degraded in proportion to the number of tokens the FSMhad to skip to get
to this point. This information is used in a simple way , so it is unclear
if more sophisticated measures are needed.The stopping criterion is used
to determine how much of a prompt to of fer the user. At one extreme, only
a single token can be predicted. This gives the user little context and maynot
provide much assistance. At the other extreme, a sequence of tokens that
completes the note can be predicted. This may be too lengthy, and the user
would have to edit the prompt ifselected. The stopping criterion in Table
4 balances these two extremes and attempts to limit prompts to a consistent
set of tokens. In particular , Condition 3 stops expanding the prompt
f pred iction
( )
f total
( )
1
skipped
+( )*
f prediction
( )
f total
( )
skipped
S
CHLIMMER
& H
ERMENS
76
upon reaching a syntactic boundary (leading punctuation) or upon reaching
a semanticboundary (falling confidence).
6. Constructing a Button-Box Interface
In the button-box mode, the software presents an interactive graphical interface.
Instead ofwriting out the note, the user may select note fragments by tapping
buttons. To switch from contextual mode to button-box mode, a green radio
button indicator is displayed below thecompletion button when the software
is confident about the user's syntax. If the user taps this indicator, the
existing text is removed, and the corresponding buttons in the button-box
inter-face are selected. As the user selects additional buttons, the interface
dynamically expands to reveal additional choices. Because the interface refl
ects an improving syntactic representa-tion, it also improves with successive
notes.
The button-box interface is a direct presentation of a fi nite-state machine.
After the userhas written out a token or so of the note, the software fi
nds the FSM that best parses these tokens. The mode switch is presented
if the syntax is suf ficiently mature--if the averagenumber of times each
state has been used to parse earlier notes is greater than 2. If the user
selects this indicator, the FSM is incrementally rendered as a set of radio
buttons and checkboxes.
The two user interface item types correspond to optional choices (check boxes)
andexclusive choices (radio buttons). Mapping a FSM into these two item types
proceeds one state at a time. Given a particular state to be rendered, any
transition that starts a path thatdoes not branch and eventually returns
back to the state is rendered as a check box (a loop). The loop corresponds
to syntactically optional information. The label for the check box con-sists
of each of the transition labels along the looping path. Other non-looping
transitions are rendered as buttons in a single radio button panel along
with an extra, unlabeled button. Theycorrespond to syntactically exclusive
information. The label for each radio button consists of each transition
label up to the point of a subsequent branch or termination. For example,compare
the FSM depicted in Figure 7 and the corresponding button-box interface
in Figure 3.Because the transitions for dif ferent radio buttons lead to
dif ferent parts of the FSM, it may confuse the user to render the entire
FSM at once. So, each branching state is renderedas it is visited. Initially
, the fi rst state in the FSM is rendered. Then, when a radio button is
selected, the branching state at the end of its transition path is rendered.
Note that checkboxes do not trigger additional rendering because the branching
state at the end of their loop
Stop expanding the prompt if any of the following are true:1. The next prediction
is to terminate; or 2. The next prediction is a generalized string; or3.
At least one token has already been predicted and
a. The prediction starts with punctuation, orb. The confidence of the prediction
is lower; or 4. The next prediction is the same as the last prediction; or5.
More than 10 tokens have already been predicted.
Table 4: Stopping criterion for contextual prompting.
S
OFTWARE
A
GENTS
: C
OMPLETING
P
ATTERNS
& C
ONSTRUCTING
U
SER
I
NTERFACES
77
has already been rendered. This interactive process is repeated as long
as the user selectsradio buttons that lead to branching states.
7. Empirical Results
We tested the interactive note taking software on notes drawn from a variety
of domains.Tables 5 through 11 list sample notes from seven domains (in addition
to the PowerBook and fabric pattern sample notes listed above).
CVA-62 8/6/63 to 3/4/64 Mediterranean A-5A AG 60X CVA-61 8/5/64 to 5/6/65
Vietnam RA-5C NG 10X
Table 5: Sample notes from the airwing domain. Listed above are 2 of the
78 notesabout airwing assignments aboard aircraft carriers collected from
(Grove & Miller, 1989).
B, 81, 5, 151 (2.5), Cyl. 4, 2-bbl., Pontiac C, 82, X, 173 (2.8), Cyl. 6,
2-bbl., Chevrolet
Table 6: Sample notes from the engine code domain. Listed above are 2 of
the 20 notesabout the meaning of engine codes stamped on automobile identification
plates collected from Chilton's Repair & Tune-Up Guide (1985).
90, Mazda MPV, 40K MI, 7 Pass, V6, Auto ABS, PL/PW, Cruise, Dual Air
87, Grand Caravan, 35K MI, 7 Pass, V6, Auto Cruise, Air, Tilt, Tinting
Table 7: Sample notes from the minivan domain. Listed above are 2 of the
22 notesabout minivan automobiles collected by the first author.
Lorus Disney Oversize Mickey Mouse Watch. Genuine leather strap.
Seiko Disney Ladies' Minnie Mouse Watch. Leather strap.
Table 8: Sample notes from the watch domain. Listed above are 2 of the 89
notesabout personal watches collected from the Best catalog (a department
store).
S
CHLIMMER
& H
ERMENS
78
azatadine maleate Blood: thrombocytopenia. CNS: disturbed coordination, dizziness,
drowsiness, sedation, vertigo. CV: palpitations, hypotension. GI: anorexia,
dry mouth and throat, nausea, vomiting. GU: Urinary retention. Skin: rash,
urticaria. Other: chills, thickening of bronchial secretions.
brompheniramine maleate Blood: aganulocytosis, thrombocytopenia. CNS: dizziness,
insomnia, irritability, tremors. CV: hypotension, palpitations. GI: anorexia,
dry mouth and throat, nausea, vomiting. GU: urinary retention. Skin: rash,
urticaria. After parenteral administration: local reaction, sweating, syncope
may occur.
Table 9: Sample notes from the antihistamine domain. Listed above are 2 of
the 17notes on the side effects of antihistamines collected from the Nurses
Guide to Drugs (1979).
Canon FD f/1.8, 6oz., f/22, 13in., good sharpness, poor freedom from flare,
better freedom from distortion, focal length marked on sides as well as on
front of lens
Chinon f/1.7, 6oz., f/22, 9in., poor sharpness, good freedom from flare,
good freedom from distortion, cannot be locked in program mode, which is
only a problem, of course, when lens is used on program-mode cameras
Table 10: Sample notes from the lens domain. Listed above are 2 of the 31
notes about35mm SLR camera normal lenses collected from the Consumer Reports
(1988).
S
OFTWARE
A
GENTS
: C
OMPLETING
P
ATTERNS
& C
ONSTRUCTING
U
SER
I
NTERFACES
79
Summary characteristics of the nine domains are listed in Table 12 together
with somesimple measures to indicate prediction dif ficulty. For instance,
Column 1 shows the number of notes in the domain. With a lar ger number
of notes, the easier it should be to accuratelytrain a predictive method.
Column 4 shows the standard deviation (STD) of the length of all notes in
each domain. It is more likely that a well-behaved FSM can be discovered
whenSTD is low . In this and successive tables, the domains are ranked by
STD. Column 5 presents the percentage of unique tokens in the notes. The
fewer novel tokens a note has, themore likely that successive tokens can
be predicted. This measure places an upper bound on predictive accuracy
. Column 6 shows the percentage of constant tokens, ones that alwaysappear
in a fi xed position.
It is easier to predict these constant tokens. Finally , Column 7indicates
the percentage of repeated tokens. When fewer tokens are repeated verbatim
within a note, the more likely that the predictive method will not become
confused about its localewithin a note during prediction.
The first six domains are natural for the interactive note taking task because
they exhibita regular syntax. The last three domains are included to test
the software' s ability on less suitable domains. Notes from the Antihistamine,
Lens, and Raptor domains contain highly-variable lists of terms or natural
language sentences. Learned FSMs for notes in these domains are unlikely
to conver ge, and, in the experiments reported here, only the FSM forthe
Lens data exceeded the maturity threshold (average state usage greater than
2).
7.1 Contextual Prediction Accuracy
Column 7 of Table 13 lists the accuracy of next-token predictions made by
the software inprompting mode. The fi rst nine rows list predictive accuracy
over all tokens as notes from each of the nine domains are independently
processed in the order they were collected. Thelast row lists predictive
accuracy over all tokens as notes from all nine domains are collectively
processed. This simulates a user taking notes about several domains simultaneously.
To put these results in context, the table also lists predictive accuracies
for several other methods. Column 1 lists the accuracy for a lower bound
method. It assumes that each noteshares a fixed sequence of tokens. Termed
common
, this method initializes its structure to the
22in. W. 48in. A very large falcon. Three color phases occur: blackish, white,
and gray-brown. All are more uniformly colored than the Peregrine Falcon,
which has dark mustaches and hood.
16-24in. W. 42in. Long-winged, long-tailed hawk with a white rump, usually
seen soaring unsteadily over marshes with its wings held in a shallow 'V'.
Male has a pale gray back, head, and breast. Female and young are brown above,
streaked below, young birds with a rusty tone.
Table 11: Sample notes from the raptor domain. Listed above are 2 of the
21 notesabout North American birds of prey collected from (Bull & Farrand,
1977).
S
CHLIMMER
& H
ERMENS
80
first note. It then removes each token in this sequential structure that
cannot be found inorder in other notes. At best, this method can only predict
the constant, delimiter-like tokens that may appear regularly in notes. Its
performance is limited by the percentage of constanttokens reported in Column
6 of Table 12. It performs best for the PowerBook notes where it learns the
following note syntax:
* :NULL * "K" * " PowerBook" * "MB" * "MB" * " Int." * .
(Example 3) (The asterisks indicate Kleene star notation.) This reads as
some sequence of zero or moretokens then the token
:NULL
, followed by zero or more tokens then "
K
", followed by zero ormore tokens then "
PowerBook
", and so on. It is less successful for the minivan notes where itlearns
a simpler syntax:
* :NULL * "K" * " MI" * " Pass" * .
(Example 4) Columns 2 and 3 of Table 13 list the accuracy of using a classifi
er to directly predict thenext token without explicitly learning a syntax.
In this paradigm, examples are prefi xes of
token sequences. Attributes are the last token in the sequence, the second
to last token, thethird to last token, and so on. Class values are the next
token in the sequence--the one to be predicted. Column 2 lists the performance
of a simple Bayes classifi er, and Column 3 liststhe performance of an incremental
variant of ID3 (Schlimmer & Fisher , 1986). Perhaps surprisingly, these methods
perform considerably worse than the simple conjunctive method.Without the
benefit of a narrow context provided by the FSM, these methods must implicitly
construct representations to detect dif ferences between similar situations
that arise within asingle note. For example, in the PowerBook notes, a classifi
er-only approach must learn to discriminate between the first and second
occurrence of the "
MB
" token.Column 4 of Table 13 lists the accuracy of a more viable prediction
mechanism. Based on simple ideas of memorization and termed
digram
, the method maintains a list of tokensthat have immediately followed each
observed token. For example, in the fabric pattern domain, this method retains
the list of tokens {"
8-10-12
", "
10
", "
11/12
", "
12
"} as thosethat follow the token "
Size
". Each list of
follow
tokens are kept in order from most to leastfrequent. To predict the next
token, the system looks for the last token written and predicts
1 2 3 4 5 6 7 Domain N Notes N Tokens Tokens/Note STD % Unique % Constant
% Repeated Airwing 78 936 12.0 0.3 18 8 0 Pattern 13 75 5.8 0.7 21 0 0 Engine
Code 20 222 11.1 0.8 0 0 0 Minivan 22 335 15.2 1.7 9 17 0 PowerBook 95 1238
13.0 2.6 1 31 15 Watch 89 832 9.3 5.1 13 0 1 Antihistamine 17 421 24.8 9.4
17 8 1 Lens 31 1066 34.4 9.6 1 26 19 Raptor 21 878 41.8 11.5 33 7 22
Table 12: Quantitative properties of the nine domains used to test alternative
methods.
S
OFTWARE
A
GENTS
: C
OMPLETING
P
ATTERNS
& C
ONSTRUCTING
U
SER
I
NTERFACES
81
the most frequent follow token. This method is nearly as ef fective as any
other in Table 13,especially on the combined task when notes from each domain
are entered in random order . Laird (1992) describes an efficient algorithm
for maintaining higher-dimensional n-grams, ineffect increasing the context
of each prediction and effectively memorizing longer sequences of tokens.
Laird's algorithm builds a Markov tree and incorporates heuristics that keep
thesize of the tree from growing excessively large. Regrettably, these methods
are unsuitable for the interactive note-taking software because of the dif
ficulty of using them to construct acustom user interface. It is plausible
to construct a panel of exclusive choices based directly on the set of follow
tokens, but it is unclear how to identify optional choices correspondingto
loops in fi nite-state machines. Moreover, if notes are drawn from dif ferent
domains, and those domains share even a single token, then some follow set
will include tokens fromdifferent domains. Using these follow sets to construct
a user interface will unnecessarily confuse the user by introducing options
from more than one domain at a time.Column 5 of Table 13 lists the accuracy
of prediction based solely on the learned FSMs. Without an embedded classifi
er, this method must rely on prediction of the most commontransition (or
termination) from each state. Because the prediction is based on simple
counts (as noted in Section 4, Learning Embedded Classifi ers), this method
never
predicts optionaltransitions. Columns 6 and 7 of Table 13 list the accuracy
of predicting using FSMs and embeddedclassifiers. The classifiers used are
simple Bayes and the incremental ID3, respectively . The latter outperforms
either the FSM alone or the FSM with embedded Bayes classifi ers. If thesystem
only makes predictions when its confi dence measure is greater than 0.25,
the accuracy is signifi cantly dif ferent for the Engine Code, Minivan, Lens,
and Raptor domains,ranging between 10 and 22 percentage points of improvement.
Column 8 of Table 13 lists an estimate of the upper -bound on predictive
accuracy . Thiswas calculated by assuming that prediction errors were only
made the first time each distinct token was written.
1 2 3 4 5 6 7 8 Domain Common Bayes ID4 Digram FSM FSM+Bayes FSM+ID4 Upper
Airwing 19 8 8 47 62 44 62 79 Pattern 25 15 16 34 43 43 51 68 Engine Code
18 8 8 59 64 63 69 87 Minivan 29 6 7 54 46 44 47 80 PowerBook 40 7 8 73 70
76 82 96 Watch 21 10 14 44 39 33 42 78 Antihistamine 11 4 6 40 24 22 24 68
Lens 22 3 3 68 63 60 63 91 Raptor 9 2 3 11 12 9 12 55 Combined -- -- -- 48
46 45 49 --
Table 13: Percentage of tokens correctly predicted as a function of the
learningmethod.
S
CHLIMMER
& H
ERMENS
82
7.2 Design Decisions
The note taking software embodies a number of design decisions. Table 14
lists the effects ofthese decisions on predictive accuracy by comparing versions
of the software with and without each design feature. The fi rst column
lists the predictive accuracy for the software' snominal confi guration.
Column 2 lists the accuracy data for a slightly dif ferent generic tokenizer.
Accuracy is higher for some domains, lower for others. A custom-built tokenizeris
one way to incorporate knowledge about the domain. Columns 3 and 4 show
the accuracy for the system using only the original two FSM mer ging rules
(cf. Table 1) and all but thelast mer ging rule (cf. Table 2), respectively
. The decreased structural generality tends to lower predictive accuracy
, but the embedded classifi ers help compensate for the reducedaccuracy.
Column 5 lists the accuracy for when the FSM does not heuristically continue
parsing upon encountering a token for which there is no immediate transition.
As expected,accuracy suffers considerably in some domains because a novel
token in a sequence completely foils any subsequent prediction. Columns
6 and 7 list accuracy for dif ferentvalues of the free parameter controlling
the clustering of notes together into a FSM. There is little ef fect on predictive
accuracy in this case. Column 8 shows the accuracy for whenembedded classifiers
do not use information about repeated states in the FSM.
Without thisinformation, the classifiers cannot predict that a loop transition
should be taken exactly once. Surprisingly, elimination of this feature has
little effect on accuracy. Column 9 lists the accu-racy for when the embedded
classifi ers associated with a pair of FSM states are discarded when the
states are merged. Finally, Column 10 lists the accuracy for when a new FSM
stateis assigned a unique ID rather than the ID of the oldest of the two
merged states.
7.3 Sample Button-Box Interfaces
In addition to Figure 3, Figures 11 through 15 depict button-box interfaces
for the fi ve otherwell-behaved note taking domains listed at the top of
Table 12. These interfaces are visual and of fer the user an or ganized view
of their notes, presenting options in a natural way .However, whenever unique
tokens are involved, the current software makes no attempt to explicitly
generalize tokens. This effect is reflected in the tour dates for the Airwing
notes inFigure 11. Note that the radio button panel consists of a long series
of dates, none of which is likely to be selected for a new note.
1 2 3 4 5 6 7 8 9 10 Domain Norm Diff Tokens Rules 2a,b Rules 2ab,3a No Restart
Accept= 1/4 Accept= 3/4 Repeat Atts Drop Class'r New IDs Airwing 62 62 63
62 62 62 62 62 61 63 Pattern 51 51 53 52 50 51 51 51 51 53 Engine Code 69
71 72 69 43 69 69 69 67 72 Minivan 47 48 48 47 28 47 47 52 45 48 PowerBook
82 80 83 83 77 82 82 81 80 82 Watch 42 42 43 43 28 42 42 42 41 43 Antihistamine
24 25 24 24 9 24 24 24 24 24 Lens 63 66 64 63 46 63 63 63 63 64 Raptor 12
11 12 12 11 12 12 12 12 12
Table 14: Percentage of tokens correctly predicted as a function of design
variations.
S
OFTWARE
A
GENTS
: C
OMPLETING
P
ATTERNS
& C
ONSTRUCTING
U
SER
I
NTERFACES
83
8. Related Work
Self-customizing so ftware agents have several subjective dimensions on
which they can beevaluated and compared:
*
Anticipation
--Does the system present alternatives without the user having torequest
them?
*
User interface
--Is the system graphical, or is it command-line oriented?*
User control
--Can the user override or choose to ignore predictive actions?*
Modality
--If the system has a number of working modes, can the user work in anymode
without explicitly selecting one of them?
*
Learning update
--Is learning incremental, continuous and/or real-time?
Figure 11: Screen snapshot of the note-taking software in button-box mode
for anairwing note.
Figure 12: Screen snapshot of the note-taking software in button-box mode
for afabric pattern note.
S
CHLIMMER
& H
ERMENS
84
*
User adjustable
--Can the user tune the system parameters manually?Here we describe related
systems that exhibit properties in each of these agent dimensions. Our note
taking softwar e utilizes the
anticipation
user interface technique pioneered byEager (Cypher, 1991). Eager is a non-intrusive
system that learns to perform iterative procedures by watching the user .
As such, it is a learning apprentice, a software agent , and anexample of
programming by example or demonstration. Situated within the HyperCard environment,
it continuously watches a user 's actions. When it detects the second cycle
of aniteration, it presents an execute icon for the user 's notice. It also
visually indicates the anticipated next action by highlighting the appropriate
button, menu item, or text selection ingreen. As the user performs their
task, they can verify that Eager has learned the correct
Figure 13: Screen snapshot of the note-taking software in button-box mode
for anengine code note.
Figure 14: Screen snapshot of the note-taking software in button-box mode
for aminivan note.
S
OFTWARE
A
GENTS
: C
OMPLETING
P
ATTERNS
& C
ONSTRUCTING
U
SER
I
NTERFACES
85
procedure by comparing its anticipations to their actions. When the user
is confident enough,they can click on the execution icon, and Eager will
run the iterative procedure to completion. Eager is highly anticipatory ,
uses a graphical interface, is non-obtrusive, non-modal,and learns in real-time,
but is not user adjustable.
CAP is an apprenticeship system that learns to predict default values (Dent,
et al., 1992).Its domain of operation is calendar management, and it learns
preferences as a knowledgable secretary might. For example, a professor may
prefer to hold a regular group meeting in aparticular room at a particular
time of day for a particular duration--information that a secretary would
know from experience. CAP collects information as the user manages theircalendar,
learns from previous meetings, and uses the regularities it learns to of
fer default values for meeting location, time, and duration. The learning
system is re-run each night onthe most recent meeting data, and the learned
rules are applied for prediction the following day. CAP is also designed
to utilize an extensible knowledge base that contains calendarinformation
and a database of personnel information. The system continues to be used
to manage individual faculty calendars. Though of fering some intelligence,
CAP' s user inter -face is line-oriented and is based on the Emacs editor
. Questions asked of the user about meetings are presented using a command-line
dialog, and the default predictions aredisplayed one-at-a-time. CAP can
be characterized as anticipatory , command-line oriented and modal with user
control (but not user adjustable), where learning is done in batch.Another
related system addresses the task of learning to fi ll out a form (Hermens
& Schlimmer, 1993). The system recreates a paper form as an on-screen facsimile,
allowing theuser to view all of the pertinent information at a glance. Input
typed by the user into the electronic form is processed by a central form-fi
lling module. When the user completes a formcopy, it is printed, and each
field value on the form is forwarded to a learning module (a decision tree
learning method). The learned representations predict default values for
each fi eldon the form by referring to values observed on other fi elds and
on the previous form copy . From the user 's point of view , it is as if
spreadsheet functions have been learned for eachfield of the form. Empirical
studies indicate that this system reduced the number of keystrokes required
of the user by 87% on 269 forms processed over the 8 month period in
Figure 15: Screen snapshot of the note-taking software in button-box mode
for awatch note.
S
CHLIMMER
& H
ERMENS
86
which it was actually used by of fice personnel . This system is unobtrusive,
non-modal andanticipatory, uses a graphical interface, and updates learning
in real-time.
Maes and Kozierok (1993) are addressing the problem of self-customizing software
at amuch more task-independent level. They identify three learning opportunities
for a software agent: observing the user 's actions and imitating them, receiving
user feedback upon error ,and incorporating explicit training by the user
. To illustrate the generality of their framework, they demonstrate simple
learning apprentices that help sort the user 's electronic mailand schedule
meetings. Their initial systems use an instance-based (case- or memory-based)
approach primarily because it allows ef ficient update and because it naturally
generates aconfidence in each of its predictions. User 's may set thresholds
on these predictions, corresponding to a minimum confi dence for when the
agent should prompt the user (a "tell-me"threshold) and a higher minimum
confi dence for the agent to act immediately on behalf of the user (a "do-it"
threshold). The framework for learning in this case is anticipatory, utilizesa
graphical user interface, is devoted to user control, is non-modal, learns
in real-time, and is user adjustable.A system developed for Macintosh Common
Lisp (MCL) provides a word-completion mechanism for word prefi xes typed
by the user in any window . J. Salem and A. Ruttenberg(unpublished) have
devised MCL methods to display a word completion in the status bar of the
each window. If the user desires to add the completion to the window , they
simply pressthe
CLEAR
key . This word completion mechanism is similar to fi le-name completion
in
EMACS
and the C-shell in
UNIX
systems, except that the word is displayed for the userbefore it is added.
This system is anticipatory (unlike the
UNIX
file completion), is commandline oriented (but displays the default completion
in a graphical window), can be fully controlled by the user, is non-modal,
learns in real time, is not intended to be user adjustable(though knowledgeable
MCL programmers could easily make changes to the code).
The interactive note taking software we have devised does not require any
user program-ming. It only receives implicit user feedback when the user
chooses to complete a note in a different way than prompted. It does not
have any mechanisms for direct user instruction orthreshold tuning. In a
system designed to be as easy to use as paper, such explicit adjustment may
be inappropriate. We characterize our system as anticipatory , graphically-oriented,
andmodal (due to the switching that takes place when a user wishes to display
the button-box interface). It allows the user to override default prompts
and predictions, and it learns inreal-time. We have not included features
that allow the user to confi gure the performance of the agent.
9. Observations/Limitations
The interactive note-taking software is designed to help users capture information
digitally ,both to speed entry and improve accuracy , and to support the
longer term goal of ef ficient retrieval. The software incorporates two
distinctive features. First, it actively predicts whatthe user is going to
write. Second, it automatically constructs a custom radio-button, checkbox
user interface.This research explores the extremes of FSM learning and prediction,
where the system has no explicit
a priori
knowledge of the note domains. We have tried to design the systemso that
it can learn quickly , yet adapt well to semantic and syntactic changes ,
all without a knowledge store from which to draw. It is clear that knowledge
in the form of a domain-spe-cific tokenizer would aid FSM learning by chunking
signifi cant phrases and relating similar notations and abbreviations. Some
preliminary work has shown that, after a few notes have
S
OFTWARE
A
GENTS
: C
OMPLETING
P
ATTERNS
& C
ONSTRUCTING
U
SER
I
NTERFACES
87
been written, users may create abbreviations instead of writing out whole
words. A domain-specific tokenizer would be able to relate an abbreviation
and a whole word as being in the same class, and therefore allow for more
fl exibility during note taking. For example, adomain-specific tokenizer
may recognize that "
Megabytes
", "
Meg
"
,
"
MB
", and "
M
" all repre-sent the same token for memory sizes. One could imagine a framework
that would allow for domain-specific tokenizers to be simply plugged in.The
prototype built to demonstrate these ideas was implemented on a conventional,
micro computer with keyboard input. As a consequence, it was impossible
to evaluate useracceptance of the new interface or the adaptive agent. With
newly available computing devices incorporating pen input and handwriting
recognition, it should be possible to re-engineer the user interface and
field test these ideas with actual users.
One aspect of note learning, related to tokenization and the button-box user
interfacedisplay, is the difficulty of generalizing numeric strings or unique
tokens. The cardinality of the range of model numbers, telephone numbers,
quantities, sizes, other numeric values, andeven proper names is very lar
ge in some note domains. The fi nite-state machine learning method presented
here is incapable of generalizing over transitions from a particular state,and,
as a consequence, the current system has the problem of displaying a very
lengthy button-box interface list. (A button is displayed for each value
encountered in the syntax ofnotes, and there may be many choices.) For example,
a large variety of pattern numbers may be available in the fabric pattern
note domain. An appropriate mechanism is desired to deter-mine when the list
of numeric choices is too large to be useful as a button-box interface. The
system can then generalize the expected number , indicating the number of
digits to promptthe user:
####
, for example. This may be helpful to remind the user that a number isexpected
without presenting an overbearing list of possibilities. Another limitation
of the current ef fort lies in the choice of fi nite-state machines torepresent
the syntax of the user 's notes. Notes may not be regular expressions with
the consequence that the FSMs may become too large as the learning method
attempts to acquirea syntax. This may place an unreasonable demand on memory
and lead to reduced prompting effectiveness.The choice of finite-state machines
also apparently constraints the custom user interface. Because FSMs branch
in unpredicable ways , button-box interface s must be rendered incre-mentally.
After the user indicates a particular transition (by selecting a button),
the system can render states reachable from that transition for the user
. Ideally, the user should be ableto select buttons corresponding to note
fragments in any order , allowing them to write down the size before the
pattern number , for example. To construct a non-modal user interface, amore
flexible syntactic representation is needed.
Several of the low-level design decisions employed in this system are crude
responses totechnical issues. For instance, the decision to render a syntax
as a button-box interface only if the average number of times each state
has been used to parse notes is greater than 2. Thisignores the fact that
some parts of the state machine have been used frequently for parsing notes
while other parts have rarely been used. Similarly , the particular measure
for estimat-ing prompting confidence (and setting the saturation of the completion
button) is simplistic and would benefit from a more sound statistical basis.
Acknowledgments
Anonymous reviewers suggested an additional example in Section 3, offered
some refi ne-ments to the user interface, graciously identifi ed some limitations
of the work listed in
S
CHLIMMER
& H
ERMENS
88
Section 9, and pointed out some additional related work. Mike Kibler , Karl
Hakimian, andthe EECS staff provided a consistent and reliable computing
environment. Apple Cambridge developed and supports the Macintosh Common
Lisp programming environment. AllenCypher provided the tokenizer code.
This work was supported in part by the National Science Foundation under
grant number 92-1290 and by a grant from Digital EquipmentCorporation.
References
Angluin, D. (1982). Inference of reversible languages.
Journal of the Association forComputing Machinery
,
29
, 741-765. Berwick, R. C., & Pilato, S. (1987). Learning syntax by automata
induction.
Machine Learn-ing
,
2
, 9-38. Bull, J., & Farrand, J., Jr. (1977). The Audubon Society Field Guide
to North American Birds(Eastern Edition). NY: Alfred A. Knopf (pp. 401-682).
Chilton's Repair & T une-Up Guide: GM X-Body 1980-1985
(1985). Randor , PA: ChiltonBook (p. 7). Cohen, W. W. (1988). Generalizing
number and learning from multiple examples in explana-tion based learning.
Proceedings of the Fifth International Confer ence on MachineLearning
(pp. 256-269). Ann Arbor, MI: Morgan Kaufmann. Consumer Reports (1988),
53
(12), 302-303. Mount Vernon, NY: Consumers Union. Cypher, A. (1991). Eager:
Programming repetitive tasks by example.
Proceedings of CHI
(pp. 33-39). New Orleans, LA: ACM. Dent, L ., Boticario, J., McDermott, J.,
Mitchell, T., & Zabowski, D. (1992). A personallearning apprentice.
Proceedings of the T enth National Confer ence on Artificial Intelli-gence
(pp. 96-103). San Jose, CA: AAAI Press. Fisher, D. H. (1987). Knowledge
acquisition via incremental conceptual clustering.
MachineLearning
,
2
, 139-172. Grove, M., & Miller, J. (1989). North American Rockwell A3J/A-5
Vigilante. Arlington, TX:Aerofax (pp. 13-15).
Hermens, L. A., & Schlimmer , J. C. (1993). A machine-learning apprentice
for the comple-tion of repetitive forms.
Proceedings of the Ninth IEEE Confer ence on Artificial Intelli-gence for
Applications
. Orlando, FL. Laird, P. (1992). Discrete sequence prediction and its applications.
Proceedings of the TenthNational Conference on Artificial Intelligence
(pp. 135-140). San Jose, CA: AAAI Press.
S
OFTWARE
A
GENTS
: C
OMPLETING
P
ATTERNS
& C
ONSTRUCTING
U
SER
I
NTERFACES
89
Maes, P., & Kozierok, R. (1993). Learning interface agents.
Proceedings of the EleventhNational Conference on Artificial Intelligence
(pp. 459-465) . Washington, D. C.: AAAIPress. Nurse's Guide to Drugs (1979).
Horsham, PA: Intermed Communications (pp. 454-462). Schlimmer, J. C., & Fisher
, D. H. (1986). A case study of incremental concept induction.
Proceedings of the Fifth National Confer ence on Artificial Intelligence
(pp. 496-501).Philadelphia, PA: AAAI Press.