A New Technique for Combining Multiple Classifiers using The Dempster-Shafer Theory of Evidence

A. Al-Ani and M. Deriche

Volume 17, 2002

Links to Full Text:

Abstract:
This paper presents a new classifier combination technique based on the Dempster-Shafer theory of evidence. The Dempster-Shafer theory of evidence is a powerful method for combining measures of evidence from different classifiers. However, since each of the available methods that estimates the evidence of classifiers has its own limitations, we propose here a new implementation which adapts to training data so that the overall mean square error is minimized. The proposed technique is shown to outperform most available classifier combination methods when tested on three different classification problems.

Extracted Text

Journal of Artificial Intelligence Research 17 {2002} 333-361 Submitted 2/2002;
published 11/2002
 A New Technique for Combining Multiple Classifiers using
 The Dempster-Shafer Theory of Evidence
 Ahmed Al-Ani a.alani@qut.edu.au
 Mohamed Deriche m.deriche@qut.edu.au
 Signal Processing Research Centre
 Queensland University of Technology
 GPO Box 2434, Brisbane, Q 4001, Australia Abstract
 This paper presents a new classifier combination technique based on the
Dempster-
 Shafer theory of evidence. The Dempster-Shafer theory of evidence is a powerful
method
 for combining measures of evidence from different classifiers. However,
since each of the available methods that estimates the evidence of classifiers
has its own limitations, we
 propose here a new implementation which adapts to training data so that
the overall mean
 square error is minimized. The proposed technique is shown to outperform
most available
 classifier combination methods when tested on three different classification
problems.
 1. Introduction
 In the field of pattern recognition, the main ob jective is to achieve the
highest possible clas- sification accuracy. To attain this ob jective, researchers,
throughout the past few decades,
 have developed numerous systems working with different features dependingupon
the ap-
 plication of interest. These features are extracted from data and can be
of different types like continuous variables, binary values, etc. As such,
a classification algorithm used with a
 specific set of features may not be appropriate with a different set of
features. In addition,
 classification algorithms are different in their theories, and hence achieve
different degrees
 ofsuccess for different applications. Even though, a specific feature set
used with a specific classifier might achieve better results than those obtained
using another feature set and/or classification scheme, we can not conclude
that this set and this classification scheme achieve
 the best possible classification results {Kittler, Hatef, Duin,& Matas,
1998}. As different
 classifiers may offer complementary information about the patterns to be
classified, combin- ing classifiers, in an efficient way, can achieve better
classification results than any single
 classifier {even the best one}.
 As explained by Xu et al. {1992}, the problem of combining multiple classifiers
consists
 of two parts. The first part, closely dependent on specific applications,
includes the problems of \How many and what type of classifiers should be
used for a specific application?, and
 for each classifier what type of features should we use?", as well as other
problems that are
 related to the construction of those individual and complementary classifiers.
The second
 part, which is common to various applications, includes the problems related
to the question \How to combine the results from different existing classifiers
so that a better result can be
 obtained?". In our work, we will be concentrating on problems related to
the second issue. c
 fl2002 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved.Al-Ani
& Deriche The output information from various classification algorithms can
be categorized into
 three levels: the abstract, the rank, and the measurement levels. In the
abstract level,
 a classifier only outputs a unique label, as in the case of syntactic classifiers.
For the
 rank level, a classifier ranks all labels or a subset of the labels in a
queue with the label at the top being the first choice. This type was discussed
by Ho et al. {1994}. For the
 measurement level, a classifier attributes to each class a measurement value
that reflects the degree of confidence that a specific input belongs to a
given class. Among the three levels,
 the measurement level contains the highest amount of information while the
abstract level
 contains the lowest. For this reason, we adopted, in this work, the measurement
level. Kittler et al. {1998} differentiated between two classifiercombination
scenarios. In the
 first scenario, all the classifiers use the same representation of the input
pattern. On the
 other hand, each classifier uses its own representation of the input pattern
in the second
 scenario. They illustrated that in the first case, each classifier can be
considered to pro- duce an estimate of the same a posteriori class probability.
However, in the second case
 it is no longer possible to consider the computed a posteriori probabilities
to be estimates
 of the same functional value, as the classification systems operate in different
measure-
 ment systems. Kittler et al. {1998} focused on the second scenario, and
they conducted
 a comparative study of the performance of several combination schemes namely;
product,
 sum, min, max, and median. By assuming the joint probability distributions
to be con- ditionally independent, they found that the sum rule gave the
best results. A well known
 approach that has been used in combining the results of different classifiers
is the weighted
 sum, where the weights are determined through a Bayesian decision rule {Lam
& Suen, 1995}. An alternative method was presented by Hashem & Schmeiser
{1995}, where a cost
 function was used to minimize the mean square error {MSE} in order to calculate
a linear
 combination of the corresponding outputs from a number of trained artificial
neural net-
 works {ANNs}. The expectation maximization algorithm was used by Chen &
Chi {1998}
 to perform the linear combination. The fuzzy integral has been used by Cho
& Kim {1995a,
 1995b} to combine multiple ANNs, while {Rogova, 1994; Mandler & Schurmann,
1988} have used the Dempster-Shafer theory of evidence to combine the result
of several ANNs. Many
 other combination methods have also been used to combine classifiers, such
as bagging and
 boosting {Dietterich, 1999}, which are powerful methods for diversifying
and combining classification results obtained using a single classification
algorithm and a specific feature set. In bagging, we get a family of classifiers
by training on different portions of the training
 set. The method works as follows. We first create N training bags. A single
training bag
 is obtained by taking a training set of size S and sampling this training
set S times with
 replacement. Some training instances will occur multiple times in a bag,
while others may not appear at all. Next, each bag is used to train a classifier.
These classifiers are then
 combined. Boosting, on the other hand, is based on multiple learning iterations.
At each iteration, instances that are incorrectly classified are given a
greater weight in the next it-
 eration. By doing so, in each iteration, the classifier is forced to concentrate
on instances it
 was unable to correctly classify in earlier iterations. In the end, all
of the trained classifiers are combined.
 In this paper, we will focus on combining classification results obtained
using N different
 feature sets, f
 1
 ; 001 001 001 ; f
 N
 . Each feature set willbe used to train a classifier, and hence there willbe
N different classifiers, c
 1
 ; 001 001 001 ; c N
 . For a specific input x, each classifier c n
 produces
 334A New Technique for Combining Multiple Classifiers
Ahmed al-AniFeatureExtraction1Classifierc1ClassifierNcxExtractionNFeaturefCombinationzyyfN11NFigure
1: A multi-classifier recognition system
 a real vector y
 n
 = [y
 n
 {1}; 001 001 001 y
 n
 {k}; 001 001 001 y
 n
 {K }] T
 , where K is the number of class labels and y
 n
 {k} corresponds to the degree that c n
 considers x has the label k. This degree could be a probability, as in the
Bayesian classifier, or any other scoring system. Fig. 1 shows the
 blockdiagram of a multi-classifier recognition system. Unlike statistical-based
combination techniques, the Dempster-Shafer theory of evidence has the ability
to represent uncertainties and lack of knowledge. This is quite important
for
 the problem of classifier combination, because there is usually a certain
level of uncertainty
 associated with the performance of each of the classifiers. Since available
classifier combi-
 nation methods based on this theory do not accurately estimate the evidence
of classifiers,
 this paper attempts to solve this issue by proposing a new technique based
on the gradient
 descent learning algorithm, which aims at minimizingthe MSE between the
combined out-
 put and the target output of a given training set. Aha {1995} gave the following
definition
 for learning:
 Learningdenotes changes in the system that are adaptive in the sense that
they
 enable the system to do the same task or tasks drawn from the same population
more effectively the next time. Basedon the above, we show that instead of
attempting to find an analytical formula which
 accurately measures evidence, one can obtain a very good estimate of evidence
by just using
 appropriate learning procedures, as will be discussed later. Some basic
concepts of the Dempster-Shafer theory of evidence are presented in the next
section. Section three discusses the existing methods for computing evidence.
The
 proposed combination technique is presented in section four. Section five
compares the
 proposed algorithm to other conventional methods used by Kittler et al.
{1998}, the fuzzy integral, and a previous implementation of the Dempster-Shafer
theory. Section six provides
 a conclusion to the paper.
 2. The Dempster-Shafer Theory of Evidence
 The Dempster-Shafer {D-S} theory of evidence {Shafer, 1976} is a powerful
tool for rep-
 resenting uncertain knowledge. This theory has inspired many researchers
to investigate
 335Al-Ani & Deriche different aspects related to uncertainty and lack of
knowledge and their applications to real
 life problem. Today , the D-S theory covers several different models, such
as the theory of
 hints {Kohlas & Monney, 1995} and the transferable belief model {TBM} {Smets,
1998}. The latter will be adopted in this paper as it represents a powerful
tool for combining
 measures of evidence.
 Let002 = f022
 1 ; :::::; 022
 K
 g be a finite set of possiblehypotheses. This set is referred to as the
frame of discernment, and its powerset denoted by 2 002
 . Following are the basic concepts of
 the theory: Basic belief assignment {BBA}. A basic belief assignment m is
a function that assigns
 a value in [0; 1] to every subset A of 002 and satisfies the following:
m{;}= 0; and
 X
 A022002 m{A} = 1 {1} It is worth mentioning that m{;} could be positive
when considering unnormalized combi-
 nation rule as will be explained later. While in probability theory a measure
of prob ability is
 assigned to atomic hypotheses 022
 i
 , m{A} is the part of belief that supports A, but does not
 supportanything more specific, i.e., strict subsets of A. For A 6= 022
i
 , m{A} reflects some ig-
 norancebecause it is a belief that we cannot subdivide into finer subsets.
m{A} is a measure
 of support we are willingto assign to a composite hypothesis A at the expense
of support m{022
 i
 } of atomic hypotheses 022 i
 . A subset A for which m{A} > 0 is called a foc al element.
 The partial ignorant associated with A leads to the following inequality:
m{A} + m{A} 024 1,
 whereA is the compliment of A. In other words, the D-S theory of evidence
allows us
 to represent only our actual know ledge without being forced to overcommit
when we are ignorant.
 Belief function. The belief function, bel{:}, associated with the BBA m{:}
is a function
 that assigns a value in [0; 1] to every nonempty subset B of 002. It is
called \degr e e of belief
 in B" and is defined by
 bel{B} =
 X
 A022B
 m{A} {2}
 We can consider a basic belief assignment as a generalization of a probability
density func-
 tion whereas a belief function is a generalization of a probability function.
 Combination rule. Consider two BBAs m
 1
 {:} and m 2
 {:} for belief functions bel
 1
 {:} and
 bel
 2 {:} respectively. Let A
 j
 and B
 k
 be focal elements of bel
 1
 and bel 2
 respectively. Then
 m
 1
 {:} and m
 2
 {:} can be combined to obtain the belief mass committed to C 032 002 according
 to the following combination or orthogonal sum formula {Shafer, 1976},
 m{C } = m
 1
 010 m
 2
 {C } =
 X
 j;k;A
 j
 \B
 k
 =C
 m
 1
 {A
 j }m
 2
 {B
 k
 }1 000
 X j;k;A
 j
 \B
 k
 =; m
 1
 {A j
 }m
 2 {B
 k
 } ; C 6= ; {3}
 336A New Technique for Combining Multiple Classifiers
 The denominator is a normalizing factor, whichintuitively measures how much
m
 1 {:} and
 m
 2
 {:} are conflicting. Smets {1990} proposedthe unnormalized combination rule:
 m
 1   flm
 2 {C } =
 X A
 j
 \B
 k
 =C
 m
 1
 {A
 j
 }m
 2
 {B k
 }; 8C 022 002 {4}
 This rule implies that m{;} could be positive, and in such case reflects
some kind of con- tradiction in the belief state. In this work we willconsider
that m{;} = 0 and use the normalized combination rule. A comparison between
normalized and unnormalized combi-
 nation rules for the problem of combining classifiers will be considered
in the future.
 Combining several belief functions. The combination rule can be easily extended
to
 several belief functions by repeating the rule for new belief functions.
Thus the pairwise
 orthogonal sum of n belief functions bel
 1
 ; bel 2
 ; 001 001 001 ; bel
 n
 , can be formed as
 {(bel 1
 010 bel
 2
 } 010 bel
 3
 } 001 001 001 010 bel
 n
 = n
 M
 i=1 bel
 i
 {5} Notation. According to Smets {2000}, the full notation for bel andits
related functions is:
 bel
 002<
 Y;t
 [E C
 Y;t
 ]{w 0
 2 A} = x where Y represents the agent, tthe time, 002 the frame of discernment,
< a boolean algebra
 of subsets of 002, w
 0
 the actual world, A a subset of 002, and E C
 Y;t
 all what agent Y knows
 at t. Thus, the above expression denotes that the degree of belief held
by Y at t that w
 0
 belongs to the set A of worlds is equal to x. The belief is based on the
evidential corpus
 E C
 Y;t
 held be Y at t. Inpractice, many indices can be omitted for simplicity sake.
Usually < is the power set
 of 002, which is 2
 002
 . When bel is defined on 2
 002 , < is not explicitly stated. 'w
 0
 2 A' is denoted
 as 'A'. Y and/or t are omitted when the values of the missing elements are
clearly defined
 from the context. Furthermore, E C is usually just a conditioning event.
So, bel{A} is one
 of the most often used notations {Smets, 2000}. In the proposed method,
we will adopt the
 following notation: bel
 n
 {022
 k
 }, where the agent isthe classifier, and the subsets of concern
 are the class labels.
 It is important to mention that the combination rule given by Eq. 3 assumes
that the
 belief functions to be combined are independent. Consider that we have certain
information and would like to measure its belief, then we can think of this
process as a mapping from
 the \original information level" to the \belief level". Liu & Bundy {1992}
explainedthat
 independence in the original information level would lead to independence
in the belief level. But, if two independentbelief functions are rooted to
the original information level, then
 their original information may or may not be independent. For the problem
of combining
 multiple classifiers, the original information level consists of outputs
of the classifiers to be
 combined, while the belief level consists of the evidence of these classifiers
{or their BBAs}.
 The assumption that these BBAs are independent, whether obtained from independent
or
 dependent original information, can hence justify the use of D-S theory.
In fact, many
 337Al-Ani & Deriche existing classifier combination methods assume the classification
results of different classi-
 fiers to be independent {Mandler & Schurmann, 1988; Hansen & Salamon, 1990;
Xu et al.,
 1992}. Since the classifiers' evidence plays a crucial role in the combination
performance,
 there is an increased interest in the proper estimation of such evidence.
In the next section,
 we discuss how a number of existing classifier combination methods estimate
evidence of
 classifiers, and in section 4 we present our proposed method. 3. Existing
Methods for Computing Evidence
 Mandler & Schurmann {1988} proposed a method that transforms distance measures
of the
 different classifiers into evidence. This was achieved by first calculating
a distance between
 learning data sets and a number of reference points in order to estimate
statistical distri- butions of intra- and interclass distances. For both,
the a posteriori probability function
 was estimated, indicating to which degree an input pattern belongs to a
certain reference
 point. Then, for each class label, the class conditional probabilities were
combined into evidence value ranging between 0 and 1, whichwas considered
as the BBA of that class.
 Finally, Dempster's combination rule was used to combine the BBAs of the
different classi-
 fiers to give the final result. As explained by Rogova {1994}, this method
brought forward questions about the choice of reference vectors and the distance
measure. Moreover, ap-
 proximations associated with estimation of parameters of statistical models
for intra- and
 interclassdistances can lead to inaccurate measure of the evidence. Xu et
al. {1992} used K + 1 classes to perform the classification task, where for
the
 {K + 1}
 th
 class denotes that the classifier has no idea about which class the input
comes
 from. For each classifier c
 n
 , n= 1::N , recognition, substitution, and rejection rates {
* 
 n
 r
 , 
* 
 n
 s ,
 and 1 000 
*  n
 r
 000 
*  n
 s
 } were used as a measure of BBA, m
 n
 , on 002 as follows:
 1. If the maximum output of a specific classifier belongs to K + 1, then
m
 n
 has only a focal element 002 with m n
 {002} = 1.
 2. When the maximum output belongs to one of the K classes, m
 n
 has two focal elements 022
 k
 and022
 k
 with m
 n
 {022 k
 } = 
* 
 n
 r
 , m
 n
 {022
 k
 }= 
* 
 n s
 . As the classifier says nothing about any
 other propositions, m
 n
 {002} = 1000 m
 n
 {022 k
 } 000 m
 n
 {022
 k
 }.
 The drawback of this method is again the way evidence is measured. There
are two problems
 associated with this method. Firstly, many classifiers do not produce binary
outputs, but rather probability like outputs. So, inthe first case, it is
inaccurate to assign 0 to both
 m
 n
 {022
 k
 } and m
 n
 {022
 k
 }. Secondly, this way of measuring evidence ignores the fact that classifiers
 normally do not have the same performance with different classes. This had
a clear impact
 on the performance of this combination method when compared with other conventional
 methods especially the Bayesian {Xu et al., 1992}.
 Rogova {1994} used several proximity measures between a reference vector
and a clas- sifier's output vector. The proximity measure that gives the
highest classification accuracy
 waslater transformed into evidences. The reference vector used was the mean
vector, 026
 n k
 , of
 the output set of each classifierc
 n and each class label k. A number of proximity measures, d
 n
 k
 , for 026
 n
 k
 and y
 n
 were considered. For each classifier, the proximity measure of each class
 338A New Technique for Combining Multiple Classifiers
 is transformed into the following BBAs:
 m
 k
 {022
 k } = d
 n
 k ; m
 k
 {002} = 1000 d
 n
 k mk
 {022
 k
 } = 1 000
 Y
 l6=k
 {1 000 d n
 l
 }; mk
 {002} = Y
 l6=k {1000 d
 n
 l
 }
 The evidence of classifier c
 n
 and class k isobtained by combining the knowledge about
 022
 k
 , thusm
 k
 010 mk
 . Finally, Dempster's combination rule was used to combine evidences
 for all classifiers to obtain a measure of confidence for each class label.
Note that the first
 combination was performed with respect to the class label {Rogova usedthe
notations k andk}, while in the second one the agent was n. This idea was
a promising one. However, the
 ma jor drawback is the way the reference vectors are calculated, where the
mean of output vectors may not be the best choice. Also, trying several proximity
measures and choosing
 the one that gives the highest classification accuracy is itself questionable.
 4. The Proposed Combination Technique
 In this section we willestimate the value of m n
 {022
 k }, whichrepresents the belief in class label
 k that is produced by classifier c
 n
 . In addition, we willalso estimate m
 n
 {002}, which reflects
 the ignorance associated with classifier c
 n
 . Since the ultimate ob jective is to minimize the MSE between the combined
classification results and the target output, m
 n
 {022
 k } and m
 n
 {002}
 will be estimated using an iterative procedure that aims at attaining this
ob jective. We will
 first compare y
 n
 , which is the output classification vector produced by classifier c
 n
 , to a
 reference vector, w
 n
 k , and the obtained distance will be used to estimate the BBAs. These
 BBAs will then be combined to obtain a new output vector, z, that represents
the combined
 confidence in each class label. w
 n
 k
 will be measured such that the MSE between z and the
 target vector, t, of a training dataset is minimized. Note that there are
two indices for w
 n
 k
 . Thus, for class label k, we don't only consider the value assigned to
it by classifier c
 n
 , but rather the whole output vector {values assigned to each class label}.
Let the frame of discernment 002 = f022
 1
 ; 001 001 001 022
 k ; 001 001 001 ; 022 K
 g, where 022 k
 is the hypothesis that the input x is of class k. Considering a BBA, m
 n
 , such that m n
 {022
 k } 025 0, m
 n
 {002} =
 1 000 P
 K
 k=1 m
 n
 {022 k
 }, and m
 n
 is 0 elsewhere. Let d
 n
 {022 k
 } be a distance measure and g
 n
 the
 unnormalized ignorance of classifier c
 n , then m
 n
 {022
 k
 } and m
 n
 {002} willbe estimated according
 to the following formulas:
 d
 n
 {022
 k
 }= exp{000kw
 n
 k
 000 y
 n
 k 2
 } {6}
 m
 n
 {022 k
 } =
 d n
 {022
 k }K X
 k=1
 d n
 {022
 k } + g
 n
 {7}
 m
 n
 {002} =
 g
 nK
 X
 k=1
 d
 n {022
 k
 } + g
 n
 {8} where m
 n
 {022
 k
 } and m n
 {002} are the normalized values of d
 n
 {022
 k
 } and g n
 respectively. Similar
 to w
 n k
 , the minimized MSE will be used to estimate g
 n
 . 339Al-Ani & Deriche Evidences of all classifiers are combined according
to the normalized combination rule toobtain a measure of confidence of each
class label. The k
 th element of the new combined vector is given by:
 z{k} = m{022 k
 } = m
 1 {022
 k
 }010 001 001 001 010 m N
 {022
 k } =
 M
 n2N
 m
 n
 {022
 k
 } {9}
 For a given classifier c
 n
 , let I = f1 001 001 001 N g n fng, m
 I
 =
 L
 i2I
 m
 i
 , then Eq. 9 can be written as:
 z{k}= m
 I
 {022 k
 } 010 m n
 {022
 k } {10}
 where according to Eq. 3, the combination of two BBAs is: m
 j
 {022 k
 } 010 m l
 {022
 k } =
 m
 j {022
 k
 }m
 l
 {022 k
 } + m
 j
 {022
 k }m
 l
 {002} + m
 j
 {002}m l
 {022
 k }1 000 X
 p
 X
 q
 q6=p
 m
 j
 {022 p
 }m
 l {022
 q
 } {11}
 w
 n
 k
 and g
 n
 will be initialized randomly, then their values will be adjusted according
to a
 training dataset so that the MSE of z is minimized.
 E rr = kz000 tk
 2
 {12}
 The values of w
 n
 k and g
 n
 are adjusted according to the formulas:
 w n
 k
 [new] = w
 n
 k [old] 000 ff @ E rr@ w
 n
 k
 [old]
 {13}
 g
 n
 [new] = g
 n
 [old] 000 fi
 @ E rr@ g
 n [old]
 {14} where ff and fi are the learning rates. The terms @ E rr=@ w
 n
 k and @ E rr=@ g
 n
 are derived as
 follows:
 @ E rr@ w
 n
 k
 =
 @ E rr@ z{k} @ z{k}@ m
 n
 {022
 k
 } @ m
 n
 {022
 k
 }@ w
 n
 k
 {15}
 @ E rrg
 n =
 @ E rr@ z{k} @ z{k}@ m
 n
 {022
 k
 } @ m
 n
 {022
 k
 }@ g
 n
 {16} 340A New Technique for Combining Multiple Classifiers
 where, @ E rr@ z{k}
 = 2[z{k} 000 t{k}] {17}
 @ z{k}@ m
 n {022
 k
 } =
 {"
 1 000 X
 p
 X
 q
 q6=p
 m
 n
 {022 p
 }m
 I {022
 q
 } #
 [m
 I {022
 k
 } + m
 I
 {002}] + [m
 n
 {022 k
 }m
 I {022
 k
 } + m
 n
 {022 k
 }m
 I {002} + m
 n
 {002}m
 I
 {022
 k
 }]
 "
 X
 p
 p6=k
 m
 I {022
 p
 } #},
 "
 1 000 X
 p
 X
 q
 q6=p
 m
 n
 {022 p
 }m
 I {022
 q
 } #
 2
 {18} @ m
 n
 {022
 k
 }@ w
 n
 k = 000
 2 exp{000kw
 n
 k
 000 y
 n
 k 2
 }[w
 n k
 000 y
 n ][
 X
 p
 p6=k
 d
 n
 {022
 p } + g
 n
 ][
 X p
 d
 n
 {022
 p
 } + g
 n
 ]
 2 {19}
 @ m
 n {022
 k
 }@ g
 n = 000
 d
 n
 {022
 k }[
 X
 p
 d
 n
 {022
 p } + g
 n
 ]
 2
 {20}
 Fig. 2 shows a flow chart of these learning procedures. It has been found
that adjusting
 the values of g n
 can be achieved during the first few iterations. By continuing the training
 to fine-tune the values of w
 n
 k
 until there is no further improvement on the training set, or
 wereach a pre-defined maximum number of epochs
 1
 , the result could be further enhanced.
 Note that the weight values are adjusted by each pattern {not batch training}.
We fix the value of fi = 10 0006
 , while ff is first initialized to 5 002 10
 0004
 , and is then changed according to
 the value of MSE, as described in the flow chart.
 Althoughthe computational cost involved in implementing our technique is
higher than
 that of other combination methods
 2
 , we only need to perform training once, which can be done off-line. Then,
with the optimal values of w
 n
 k
 and g
 n
 , we can perform the on-line combination, which is comparable to other combination
methods.
 Onthe other hand, as indicated in the beginning of this section, we consider
a reference
 vector, w n
 k
 , for each class. This leads to an increase in training time as the number
of classes
 and/or classifiers increases. An alternative is to consider only using a
reference value for
 each class, w
 n
 k
 . This will save more than 50045 of training time for the case of several
 classifiers and classes. Note that the same learning formulas are applicableby
replacing w
 n k
 with w
 n k
 and y
 n with y
 n
 k . We willrefer to these two alternative approaches as DS1 and DS2, respectively.
In the following section, we will compare DS1 and DS2 with other well-known
 combination methods.1. The maximum number of epochs is set to 50 in all
experiments described in this paper 2. Training time of most of the experiments
conducted in section 5 required less than 3 minutes on a
 conventional PC
 341Al-Ani & DericheAhmed al-AniStarta = a*0.7a = a*1.03EndFor each patternErr
= ErroldnewIt_no=It_no+1k{q }and{Q}mmnnComputenkwgnandAdjust-6b=10 a=5*10
,-4Initialize learningratesErr  -Err  >ThnewoldIFnkwgnandn k=1:N,=1:KzandErrnewComputeIt_no 10and-4IFErrnewErroldIt_noIt_nomax=   max. no. of iterationsTh=   current
no. of iterations=   previous MSE=   error threshold=   current MSENoYesYesNoRandomly
initializeFigure 2: Training procedure of the proposed technique It is worth
mentioning that although the training procedures of both the proposed method
and the backpropagation algorithm of ANN are based on minimizingthe MSE using
iterative approaches, the proposed method and ANN are not similar. The backpropagation
 training operates by passing the weighted sum of its input through an activation
function,
 usually in a multi-layer architecture known as multi-layer perceptron {MLP}.
Extracting
 rules from a trained MLP is a very challenging problem. On the other hand,
the training of the proposed method operates by measuring a distance between
a classification vector and
 a reference vector. This distance would later be used to measure the belief
of each class
 label for all classifiers. The final confidence of each class label is obtained
by combining thebeliefs of all classifiers. Unlike MLP, the belief of a given
class label for each classifier indicates its contribution towards the final
confidence. The reader may refer to {Denoeux,
 2000} for a description of an ANN classifier based on the D-S theory.
 342A New Technique for Combining Multiple Classifiers
 5. Performance Analysis of Different Combination Methods The following three
classification problems have been considered: texture classification,
 classification of speech segments according to their manner of articulation,
and speaker
 identification. ANNs are used to perform classification for the three problems.
For each case, classifiers will be sorted according to their performance,
such that the best classifier
 isreferred to as c 1
 , the 2
 nd best as c
 2
 , and the worst one as c N
 .
 For each problem, we willconsider different number of classes, and combine
the results of different number of classifiers, where combining results of
the best, the worst and mixtures
 of best and worst classifiers will be investigated. For example, ifwe have
five classifiers
 and would like to combine two of these, then we willconsider combining the
best two,
 fc
 1
 ; c
 2
 g, best one and worst one, fc
 1
 ; c
 5
 g, and worst two classifiers, fc
 4 ; c
 5
 g. The following
 combination methods were tested: the weighted sum {WS} 3
 , average {Av}, median {Md},
 maximum {Mx}, ma jority voting {MV}, fuzzy integral {FI} {Cho & Kim, 1995a}
 4
 , Rogova's
 D-S method {DS0} {Rogova, 1994}, and our proposed method with its two alternatives
{DS1
 & DS2}. The training set used to train the ANNs will be used to estimate
the confusion matrix for WS and FI, as well as to estimate the evidence of
DS0, DS1, and DS2. Two measures will be used to compare the performance of
the different combination methods, namely: overal l performanc e and error
re duction rate {ERR}. The overall perfor-
 mance is the mean of classification accuracy obtained by combining all considered
subsets
 of 2; 001 001 001 ; N classifiers. E RR is the percentage of error reduction
obtained by combining classifiers with reference to the best single classifier:
 E RR =
 E R
 BSC 000 E R
 C CE R
 BSC 002 100 {21}
 where E R
 BSC
 is the error rate of the best single classifier and E R
 C C
 is the error rate ob-
 tained by combining the considered classifiers. Unlike classification accuracy,
E RR clearly
 shows how the performance of the combined classifiers improves or deteriorates
compared
 to the best single classifier. In other words, it shows the merit of performing
the combi-
 nation. We willspecifically concentrate on the maximum ERR obtained by combining
all
 the considered subsets of 2; 001 001 001 ; N classifiers. In addition,
we will also investigate how the
 value of E RR gets affected by increasing the number of combined classifiers.
5.1 Texture Classification Several experiments have been carried out for
the classification of texture images. The
 textures considered here are: bark, brick, bubbles, leather, raffia, water,
weave, wood and
 wool {USC, 1981}. In order to obtain a better comparison between the different
combination
 methods, we considered classifying the first two textures, then the first
three, the first five
 and finally all the nine textures. Additive Gaussian noise, with different
signal-to-noise ratio,
 has been added to {1024 002 1024} pixels image of each texture class to
form the training and
 testing sets. 961 patterns were obtained from each image using {64 002
64} windows with an overlap of 32 pixels.3. The weights of each classifier
are determined according to the classification accuracy of each class label
using the training dataset
 4. The reader may refer to Appendix A for a brief description of this method
 343Al-Ani & DericheNo. of classesS DH 1S DH 2S DH 3S DH 4E n286.9685.7384.4485.4591.14384.5884.5283.9186.2489.72585.1084.6284.3483.4688.84980.9777.4477.5175.7283.65Table
1: Texture classification accuracy of the five original classifiers for different
number
 of class labels Four nine-feature vectors were calculated using statistics
of sum and difference histogram
 {S DH } of the co-occurrence matrix with different directions, vertical
{S DH 1
 }, horizontal {S DH
 2
 }, and the two diagonals {S DH
 3
 and S DH
 4
 } . For each direction, the features used
 were: mean, variance, energy, correlation, entropy, contrast, homogeneity,
cluster shade,
 and cluster prominence. The fractal dimension {FD} has also been used to
form the tenth
 feature of each vector. The energy contents of texture images {E n} has
been used to form another feature vector using 9 different masks. Again the
tenth feature was FD. Each of these five feature vectors has been used as
input to an ANN. The numbers of
 training and testing patterns depend upon number of classes considered,
i.e. for the case
 oftwo classes, 15376 patterns were used to train the networks and 5766 to
test them. The
 resultsobtained are shown in Table 1. Note that as the number of classes
increases the overall accuracy decreases. In addition, the performance of
the E n classifiers is found to be better than that of the other four.No.
of classesWSAvMdMxMVFIDS0DS1DS2289.1689.0487.6690.1288.0990.0888.7090.6690.72388.5288.3987.4188.8687.3088.7188.4090.2190.08589.6089.4187.9989.2387.8390.2889.5292.6991.50984.9684.5583.3782.9083.2386.7684.8789.8386.79Table
2: Overall performance of the various combination methods for different number
of
 class labels {texture classification}
 The overall performance of the tested combination methods for different
number of class
 labelsare shown in Table 2. For the case of 2 classes, it is clear that
the overall performances
 of DS1 and DS2 are better than that of the other combination methods. When
mixtures of
 good and bad classifiers are considered, the performance of combination
methods, except
 for DS1 and DS2, is closer to or worse than that of the best single classifier.
This is shown in Table 3 for the combination of fc
 1
 ; c 3
 ; c
 4 ; c
 5
 g, fc
 1
 ; c 4
 ; c
 5 g, fc
 1 ; c
 5
 g, etc
 5
 . When 3 and 5
 classes are considered, DS1 performs slightly better than DS2, andboth outperform
the
 other methods. The gap between DS1 and other methods gets wider when all
9 classes are
 considered. The superiority of DS1 reflects the advantage of using the whole
output vector
 in measuring evidences of classifiers.5. The reader may refer to Appendix
B for detailed results of other cases
 344A New Technique for Combining Multiple ClassifiersClassifiersWSAvMdMxMVFIDS0DS1DS2c
 1
 , c
 292.5692.5992.5992.5192.5192.6192.4092.4692.46c
 1
 , c
 591.1691.1291.1291.0991.0991.0091.3391.6291.61c
 4 , c
 585.0785.0785.0785.2285.2285.0785.1585.1085.12c
 1
 , c
 2 , c
 391.2191.2188.9291.6288.9291.6990.8192.4092.53c
 1
 , c
 2 , c
 591.0390.8188.6891.4888.7191.4890.4392.4792.39c
 1
 , c 4
 , c
 589.8089.5986.2191.2186.2191.2488.8891.6891.78c
 3
 , c 4
 , c
 585.3885.3885.4785.4085.4885.3385.2285.4385.40c
 1 , c
 2
 , c
 3
 , c
 489.9489.7087.8491.4789.1391.5989.1392.2192.33c
 1
 , c
 2
 , c
 3
 , c 589.7089.4287.5391.4289.0491.2988.9292.3292.42c
 1
 , c
 2
 , c
 4
 , c 589.7289.4987.3791.4888.9491.4089.0092.2592.26c
 1
 , c
 3
 , c
 4
 , c 588.5788.4586.3091.0987.0390.9887.8191.7891.87c
 2
 , c
 3
 , c
 4
 , c 586.0786.1185.8786.2586.2686.1385.9386.6686.80c
 1
 , c
 2
 , c
 3
 , c 4
 , c
 588.8188.5486.6391.3386.5991.2188.1092.2192.33Table 3: Classification accuracy
of texture images using different combination methods {2
 textures} The best E RR values of WS, FI, DS0, DS1 and DS2 are determined
according to Eq.
 21. Since WS has been widely used in the literature, and it outperforms
other conventional
 methods {Av, Md, Mx, and MV}, as observed in Table 2, then we will use it
as a represen-
 tative of the conventional methods when performing the comparison with FI,
DS0, DS1 and
 DS2. Figure 3a shows the E RR values when 2 classes are considered. It is
clear that the maximum E RR values of these five combination methods are
very close, ranging between
 14045 to 16045. They are obtained by combining the best two classifiers
for WS, FI and DS0, while DS1 and DS2 use three classifiers to obtain their
maximum E RR. As mentioned
 earlier, The performance of the first four individualclassifiers is weaker
than that of the
 E n. Notice that, for both DS1 and DS2, there is no significant degradation
in E RR as the
 number of combined classifiers increases.
 For the case of 3 classes, bothDS1 and DS2 outperform other combination
methods
 in terms of the maximum E RR. They achieve values of 17:3045 and 19:6045
respectively,
 compared to 11:4045 or less for other methods as shown in Figure 3b. In
addition, E RR of
 DS1 and DS2 are not affected as the number of combined classifiers increases.
For the case of 5 classes, the maximum E RR values sorted in a descending
order are:
 DS1 50:7045, DS2 40:2045, FI 31:6045, WS 28:1045, and DS0 23:8045,
as shown in Figure 3c. In
 addition, E RR values of DS1 improve as the number of combined classifiers
increases, DS2 is the second best, while E RR values of other methods degrade
as the number of combined
 classifiers increases. For the case of 9 classes, the superiority of DS1
becomes clearer, where
 as shown in Figure 3d, the maximum E RR value of DS1 is 54045 compared
to 37:5045 or less for other methods. It is worth mentioning that even though
the maximum E RR values of
 other methods degrade, they still perform better than the best single classifier.
This leads
 us to conclude that as the number of classes increases, the performance
of most classifier
 combination methods gets better overall.
 345Al-Ani & Deriche-30-20-1001020WSFIDS0DS1DS2-20-10010203022.533.544.5501020304050ERRNo.
of combined classifiers22.533.544.551020304050{a} 2 classes{b} 3 classes{c}
5 classes{d} 9 classesFigure 3: E RR of different classifier combination
methods obtained by considering different number of classifiers for the cases:
{a} 2 classes, {b} 3 classes, {c} 5 classes, and
 {d} 9 classes
 Taking all these facts into consideration, we can sort the methods in a
descending order asfollows: DS1, DS2, FI, WS, DS0, andthe other conventional
methods. Thus, in summary,
 for the problem of texture classification, our proposed technique with its
two alternatives {DS1and DS2} clearly outperforms other standard combination
methods with an increase in classification accuracy of about 2 000 7045.
For the cases of 2 and 3 classes, there is a little
 difference in performance between DS1 and DS2. This is because using reference
vectors of
 small size, 2 002 1 and 3 002 1, does not make a big impact upon the estimation
of evidence
 compared to that obtained using a single reference value. As the size of
the reference vector
 increases, 5 002 1 and 9 002 1 for the other two cases, its impact on
estimating the evidence
 becomesclearer, which leads to better results, but at the cost of increasing
computational
 load.
 5.2 Speech Segment Classification Six different input feature sets have
been used to classify speech segments according to their
 manner of articulation, these were: 13 mel-frequency cepstral coefficients
{MFC}, 16 log
 mel-filter bank {MFB}, 12 linear predictive cepstral coefficients {LPC},
12 linear predictive 346A New Technique for Combining Multiple ClassifiersNo.
of classesMFCMFBLPCLPRWVTARP388.2190.9881.6480.6990.6470.87683.1685.5074.7774.0684.3362.90978.4883.2471.6470.0381.3356.66Table
4: Speech segment classification accuracy of the six original classifiers
for different
 number of class labels
 reflectioncoefficients {LPR}, 10 wavelet energy bands {WVT}, and 12 autoregressive
model
 parameters {ARP}. For this experiment, speech was obtained from the TIMIT
database {MIT, SRI, & TI, 1990}. Segments of 152 speakers {56456 segments}
were used to train
 the ANNs, and 52 speakers {19228 segments} to test them. Three cases were
considered:
 3 classes {vowel, consonant, and silence}, 6 classes {vowel, nasal, fricative,
stop, glide, and silence}, andfinally 9 classes {vowel, semi-vowel, nasal,
fricative, stop, closure, lateral,
 rhotic, and silence}. The classification results for these three cases are
summarized in Table
 4.No. of classesWSAvMdMxMVFIDS0DS1DS2390.8090.4190.2086.1589.5190.6590.9091.5791.31685.5484.9184.6281.1684.0385.2985.1887.1886.37983.0582.3181.9375.6381.0082.7382.8685.2084.22Table
5: Overall performance of the various combination methods for different number
of class labels {speech segment classification}
 The two best individualclassifiers are MFB and WVT in all three cases, followedby
 MFC then other methods. Unlike texture classifiers that had one good classifier
and four,
 relatively, weak classifiers, we have here three good classifiers {MFB,
MFC and WVT} and
 three weak classifiers {LPC, LPR and ARP}.
 The overall performance values of the various combination methods are displayed
in
 Table 5. For the case of the 3 classes, it can be seen that the overall
performance of DS1 is
 better than that of DS2 and they both outperform the other methods. This
becomes even
 clearer as the number of classes increases {with more than 2045 increase
in accuracy}.
 The E RR values for the case of 3 classes are shown in Figure 4a. The maximum
E RR
 value of DS1 is 23:4045, whichis achieved by combining all six classifiers,
compared to 20:3045
 for DS2 and 19:6045 or less for the other methods. The gap between DS1
and the other methods gets wider when we consider 6 and 9 classes as shown
in Figures 4b and 4c. Because there are more good classifiers in this experiment
compared to that of the texture
 experiment, the variations of the E RR values when the number of classifiers
increases are found to be smaller. In addition, we can see that as the number
classes increases DS1 keeps
 its steady and superior performance in terms of E RR with more than 10045
increase.
 As a summary, DS1 outperforms other methods in terms of overall performance
and
 E RR measurements. It is followed by DS2, WS, and the rest of the methods.
347Al-Ani & Deriche234565101520252345610152025WSFIDS0DS1DS223456510152025No.
of combined classifiersERR{a} 3 classes{b} 6 classes{c} 9 classesFigure 4:
E RR of different classifier combination methods obtained by considering
different number of classifiers for the cases: {a} 3 classes, {b} 6 classes,
and {c} 9 classes
 5.3 Speaker Identification
 Three limited-scope experiments were carried out to perform speaker identification
using 2,
 3, and 4 speakers. Speech data from the TIMIT database was also used {MIT
et al., 1990}.
 The number of training patterns were 3232, 4481 and 5931 respectively, and
the number of testing patterns were 1358, 1921 and 2542 respectively. The
same features used to classify
 speech segments according to their manner of articulation were used to identify
speakers.
 Classification results of the six classifiers are shown in Table 6. The
performance of the individual classifiers are not quite similar to the speech
segment problem, where the three
 good classifiers are: MFB, MFC and LPC and the three weak classifiers are:
LPR, WVT
 andARP. The overall performance of the various combination methods are shown
in Table 7. For
 the case of 2 classes, it is clear that the overall performance of most
combination methods is very comparable. The superiority of DS1, and to a
lesser degree DS2, becomes clear as
 the number of classes increases {more patterns were includedto estimate
evidence}.
 Note that, becauseof the high performance of individual classifiers for
the case of 2
 classes, a small difference in the performance of combination methods will
have great impact
 on E RR, which explains the graphs' fluctuations, as shown in Figure 5a.
It can be seen 348A New Technique for Combining Multiple ClassifiersNo. of
classesMFCMFBLPCLPRWVTARP294.5896.1792.4989.6087.8084.55385.8487.2582.2081.0074.3973.03485.0185.9680.8477.9770.9364.59Table
6: Speaker identification accuracy of the six original classifiers with different
number of speakersNo. of classesWSAvMdMxMVFIDS0DS1DS2295.5395.5095.2695.3695.2195.2595.4695.4895.45390.8090.4190.2086.1589.5190.6590.9091.5791.31483.0582.3181.9375.6381.0082.7382.8685.2084.22Table
7: Overall performance of the various combination methods for different number
of
 class labels {speaker identification}
 that both maximum E RR and overall performance of most combination methods
are close. These results do not favor DS1 nor DS2, because they have an additional
computational cost. Let's now consider the case of 3 classes, Figure 5b shows
that the maximum E RR
 of DS2 is the highest followed by DS1, and they both outperform the other
methods. For
 the case of 4 classes, the maximum E RR of DS1 is 30045, compared to 27045
or less for other
 methods, as shown in Figure 5c. The figure also shows that E RR values of
DS2 and WS are close. However, as the overall performance of DS2 is better
than that of WS, DS2 can
 be considered as the second best method followed by WS, DS0 and finally
FI.
 The above results clearly show how the performance of DS1 and DS2 get affected
by the
 number of training patterns, which is crucial in achieving good estimation
of the evidence of
 each classifier. This is very clear for the case of 2 speakers. Their performance,
however, get
 better as the number of speakers and training patterns increase. In other
words, DS1 and
 DS2 require a larger number of patterns to work properly. Failing to provide
such number
 of patterns, other conventional methods, such as WS, can achieve similar
performance.
 The experiments of textures, speech segments and speaker classification
show that our
 proposed technique clearly outperforms the other methods in terms of overall
performance
 and E RR, providing that a sufficient number of patterns to estimate evidence
of classifiers exists. Also, among the different combination methods, DS1
and DS2 are the least effected
 by the inclusion of weak classifiers. The experiments also show that the
BBAs could be
 better estimated using reference vectors rather than reference values, especially
for large number of classes.
 It is worth mentioning that each one of the combination methods has its
own merit. For example, the MV is very useful combination method when dealing
with classifiers that
 produce results of the abstract level. When working in the measurement level,
other com-
 bination methods could have better performance.
 TheMx method can provide good results when the performance of the combined
classi-
 fiers are close. In such case, the classifier with higher confidenc e can
provide better results
 349Al-Ani & Deriche23456-50510152023456152025303540WSFIDS0DS1DS22345651015202530No.
of combined classifiersERR{a} 2 classes {b} 3 classes {c} 4 classes Figure
5: E RR of different classifier combination methods obtained by considering
different number of classifiers for the cases: {a} 2 classes, {b} 3 classes,
and {c} 4 classes
 than any individual classifier. This is shown in Tables 11-13 {refer to
Appendix B}, where
 good results are achieved when combining the best two or three classifiers
of the speech segment experiment compared to the best individual classifier.
However, if there is a clear
 difference in the performance of classifiers, as in the case when considering
mixtures of good and bad classifiers, then using Mx to combine the classification
results will not be a good
 choice. In case we don't have any information about the performance of the
classifiers, i.e.,
 there is no training dataset, the Av and Md methods could provide an attractive
choice. Similar to the findings of {Kittler et al., 1998; Alkoot & Kittler,
1999}, the performance of
 these two methods are found to be close with slight favor of the Av method.
If the clas-
 sification accuracy of the different classifiers are available, then the
WS method represents a good choice, where it outperforms Av in almost all
the conducted experiments. This is
 expected, as associating each classifier with a weight that reflects its
performance, would
 make the better classifier contributes more towards the final decision.
If the performance
 of the combined classifiers are very close, then combining their results
using both the Av and WS methods would lead to very similar performance,
as shown in Tables 11-13 for the
 cases of combining the best two and three speech segment classifiers.
 The FI and DS0 represent two non-linearcombination methods. According to
{Cho &
 Kim, 1995a}, the performance of FI was slightly better than the WS when
tested using an
 350A New Technique for Combining Multiple Classifiers
 optical character recognition database, whichis similar to the results we
obtained for the
 texture experiments. However, for speech segment classification and speaker
identification experiments,the performance of FI was not as good as that
of WS. On the other hand, the
 experiments conducted here show that WS slightly outperforms DS0. Note that
Rogova
 {1994} onlycompared DS0 to the original classifiers. The main problem with
both FI
 and DS0 is the appropriate estimation of their parameters. For example,
the desired sum of fuzzy densities affects the combination results of FI,
while the choice of the proximity
 measure and reference vector plays an important role in the performance
of DS0.
 DS1 and DS2 differ from DS0 by the appropriate measure of the reference
vectors, and hence the accurate estimation of the evidence of each classifier.
This will exploit the complementary information provided by the different
classifiers. In other words, the
 accurate estimation of evidence of each classifier will lead to minimizing
the MSE of the
 combined results, and hence resolving the conflicts between classifiers.
6. Conclusion
 We have developed in this work a new powerful classifier combination technique
based on the D-S theory of evidence. The technique, based on adjusting the
evidence of different
 classifiers by minimizingthe MSE of training data, gave very good results
in terms of overall
 performance and error reduction rate. To test the algorithm, three experiments
were carried
 out: texture classification, speech segments classification, and speaker
identification. All
 of the experiments showed the superiority of the proposed technique when
compared to
 conventional methods, fuzzy integral, and another D-S implementation that
uses a different measure of evidence. We have shown that accurate estimation
of the evidence from different
 classifiers based on the whole output vectors {DS1} gives the best performance,
especially
 for higher number of class labels. The only drawback of the algorithm is
that training
 can be computationally expensive {this is used to accurately estimate the
evidence of each classifier}. However, this can be executed off-line, and
as such, has no ma jor effect on the
 performance of the algorithm. We have also shown that the proposed algorithm
can easily
 achieve an increase in classification accuracy of the order of 2045 to
7045 compared to other combination methods. We believe that with more work
on enhancing the technique, the
 scheme can form a new framework for pattern classification in the future.
 Acknowledgment
 The authors wish to thank Dr. J. Chebil and Dr. M. Mesbah for their valuable
comments on the paper. The authors also acknowledge the support of Queensland
University of
 Technology for the work presented in this paper. Dr. Deriche acknowledges
the support of
 King Fahd University, Saudi Arabia, where he is currently on leave.
 Appendix A. Classifier Combination Based on the Fuzzy Integral
 Fuzzy integral is a non-linear combination method defined with respect to
a fuzzy measure. Detailed explanation of classifier combination based on
the g
 025
 fuzzy measure can be found
 in the work of Cho & Kim {1995a, 1995b}. 351Al-Ani & Deriche For a finite
set of elements, Z , the g
 025 fuzzy measure {Sugeno, 1977} is defined as the set
 function g: 2 Z
 ! [0; 1] that satisfies the following conditions: 1. g{;} = 0; g{Z } = 1,
2. g{A} 024 g{B } if A 032 B ,
 3. if fA i
 g
 1
 i=1
 is an increasing sequence of measurable sets, then lim
 i!1
 g{A
 i
 } = g{lim
 i!1
 A
 i
 },
 4. g{A [ B } = g{A}+ g{B } + 025g{A}g{B }
 for all A; B 032 Z and A \ B = ;, and for some 025 > 0001. Let h : Z
! [0; 1] be a fuzzy subset of Z . The fuzzy integral over Z of the function
h with respect to a fuzzy measure g
 is defined by
 h{z} ffi g{:} = max
 E022Z
 024
 min
 022
 min
 z2E
 h{z}:g{E }
 025 = max
 ff2[0;1]
 [min{ff; g{F
 ff
 })]; where
 F
 ff = fzjh{z} 025 ffg
 Let Z = fz
 1
 ; 001 001 001 z
 n g, and suppose that h{z
 1
 }025 h{z
 2
 } 025 001 001 001 025 h{z
 n
 }, {if not, Z is rearranged
 so that this relation holds}. Then a fuzzy integral e, with respect to a
fuzzy measure g over
 Z can be computed by
 e =
 n max
 i=1
 [min{h{z
 i
 }; g{A
 i
 })]; where
 A i
 = fz
 1
 ; 001 001 001 z
 i
 g
 g{A
 1
 } = g{fz
 1
 g} = g
 1
 g{A
 i
 } = g
 i
 + g{A
 i0001
 } + 025g
 i
 g{A
 i0001
 }; for 1 < i 024 n 025 is given by solving: 025 + 1 =
 Q n
 i=1
 {1 + 025g
 i
 }, where 0252 {0001; 1} and 025 6= 0. This can be
 calculated by solving an {n 000 1}
 st degree polynomial and finding the unique root greater
 than 0001. For the problem of combining classifiers, Z represents the set
of classifiers, A the ob ject
 under consideration for classification, and h
 k
 {z
 i
 } is the partial evaluation of the ob ject A for class !
 k
 . Corresponding to each classifier z
 i
 , the degree of importance, g
 i
 , that reflects how
 good is z
 i
 in the classification of class !
 k
 must be given. These densities can be induced
 from a training dataset.
 352A New Technique for Combining Multiple Classifiers
 Appendix B. Tables of Classification Accuracy for Different Combination
 MethodsClassifiersWSAvMdMxMVFIDS0DS1DS2c
 1
 , c
 290.8990.7990.7990.8390.1790.2290.8191.0990.81c
 1
 , c
 589.6989.5789.5789.6889.1088.9989.9290.4590.29c 4
 , c
 585.0585.0585.0584.6784.8185.1385.5185.4985.44c
 1
 , c
 2
 , c
 389.9289.8388.0490.4887.7789.8889.5991.2691.73c 1
 , c
 2 , c
 589.5589.3987.5490.2387.0989.5689.1891.1291.10c
 1
 , c 4
 , c
 589.0588.8086.9589.6286.6289.2688.8090.7790.73c
 3 , c
 4
 , c
 585.7685.7385.4984.8185.4985.6785.9187.2786.88c 1
 , c
 2 , c
 3
 , c
 489.2989.0787.9890.1787.9189.8588.8291.4491.51c 1
 , c
 2 , c
 3
 , c
 589.0788.8387.2490.0187.5089.5888.8191.4891.53c 1
 , c
 2 , c
 4
 , c
 588.8788.7887.3490.0287.6189.6488.6691.1691.26c 1
 , c
 3 , c
 4
 , c
 588.6088.3887.0589.3987.5289.4488.3991.3191.00c 2
 , c
 3 , c
 4
 , c
 586.4586.4486.1885.5286.3186.3686.4688.3787.20c 1
 , c
 2 , c
 3
 , c
 4
 , c
 588.5488.4787.0689.7387.0389.6588.2991.5091.61Table 8: Classification accuracy
of texture images using different combination methods {3 textures}ClassifiersWSAvMdMxMVFIDS0DS1DS2c
 1 , c
 291.9891.8791.8790.2889.8992.2191.4593.4092.81c
 1
 , c
 591.9591.7491.7490.7289.5891.3091.5593.2792.72c
 4 , c
 585.1485.1985.1984.7784.5084.9485.8286.2885.36c
 1
 , c 2
 , c
 391.4991.2988.7890.3988.6092.3591.0894.3793.19c
 1 , c
 2
 , c
 590.7790.5087.9290.3487.7491.5490.3293.6692.96c 1
 , c
 4 , c
 590.6190.3587.6790.2987.4391.4190.2593.6992.90c
 3
 , c 4
 , c
 586.2986.2285.4085.9985.6485.9186.9889.1687.29c
 1 , c
 2
 , c
 3
 , c
 490.2690.0388.3790.1288.9091.8689.9394.4893.19c
 1
 , c
 2
 , c
 3
 , c 590.1489.9188.1790.2589.0091.8390.0994.4193.26c
 1
 , c
 2
 , c
 4
 , c 589.7389.4387.6290.1988.4591.1189.4193.8292.56c
 1
 , c
 3
 , c
 4
 , c 590.3290.0687.9190.5088.5191.8390.2094.5093.33c 2
 , c
 3 , c
 5
 , c
 686.4186.3885.9985.9885.7986.1587.2189.4786.97c 1
 , c
 2 , c
 3
 , c
 4
 , c
 589.6589.3987.2790.1187.8191.2289.5194.4693.01Table 9: Classification accuracy
of texture images using different combination methods {5
 textures} 353Al-Ani & DericheClassifiersWSAvMdMxMVFIDS0DS1DS2c
 1
 , c
 288.4588.1388.1385.4485.9189.7889.2691.4289.20c
 1
 , c
 586.3985.9785.9782.2783.0088.1887.1090.0788.08c 4
 , c
 579.5579.4079.4078.7478.5379.3779.2881.2279.91c
 1
 , c 2
 , c
 387.1086.5384.2084.6584.6089.5187.0692.0088.59c
 1
 , c 2
 , c
 586.4885.8583.6684.0884.1788.7286.9292.1388.53c
 1
 , c 4
 , c
 585.9285.3283.2282.9783.6788.2585.9892.0188.30c
 3
 , c
 4
 , c
 580.2780.2179.6379.5479.3879.9779.8082.4880.89c
 1
 , c
 2
 , c
 3
 , c 486.4285.9584.5684.3385.1689.1885.9692.4588.70c
 1
 , c
 2
 , c
 3
 , c 585.7085.1183.3783.9184.2688.5285.4192.0288.30c
 1
 , c
 2
 , c
 4
 , c 585.7985.3984.2983.8384.9188.5285.9492.4888.66c 1
 , c
 3 , c
 4
 , c
 585.4284.9283.0683.4784.0888.3484.8192.3588.35c 2
 , c
 3 , c
 4
 , c
 581.6081.4481.1580.7581.0381.5481.0284.7082.17c 1
 , c
 2 , c
 3
 , c
 4
 , c
 585.3384.8883.1983.7483.3588.0284.8392.4388.65Table 10: Classification accuracy
of texture images using different combination methods {9
 textures}ClassifiersWSAvMdMxMVFIDS0DS1DS2c
 1
 , c 292.3492.3492.3492.3492.2291.8492.3892.4592.29c
 1
 , c
 689.9988.1188.1183.9582.6491.0690.9291.4891.42c
 5
 , c
 681.0980.6580.6576.1976.1082.0381.5482.2982.01c 1
 , c
 2 , c
 392.6392.6192.3792.3492.2792.3692.6392.7992.67c
 1
 , c
 2
 , c 692.1791.7991.8385.9791.6092.0392.2792.5392.23c
 1
 , c
 5
 , c
 689.8588.7988.2084.1787.5089.6690.3491.9291.79c
 4
 , c 5
 , c
 684.9884.9684.7679.3684.0884.8984.6285.6585.36c
 1 , c
 2
 , c
 3
 , c
 492.5992.5792.4291.6492.0492.3892.6192.7792.64c
 1
 , c
 2
 , c
 3
 , c 692.6292.4792.5086.9492.0292.4992.7392.7892.54c 1
 , c
 2 , c
 5
 , c
 692.2591.9391.8286.0191.9692.1492.3792.7792.49c 1
 , c
 4 , c
 5
 , c
 690.1589.5189.3484.6489.8489.4890.1291.9291.84c 3
 , c
 4 , c
 5
 , c
 688.7988.4288.3083.2588.4688.4688.7690.5189.94c 1
 , c
 2 , c
 3
 , c
 4
 , c
 592.7592.6492.2391.4292.1392.4992.7493.0792.81c 1
 , c
 2 , c
 3
 , c
 4
 , c
 692.5792.4892.3086.9691.9292.2592.5692.7792.56c
 1
 , c
 2
 , c
 3
 , c 5
 , c
 692.7392.5092.2086.9391.9192.5392.7693.0392.72c
 1
 , c
 2
 , c
 4
 , c 5
 , c
 692.1491.7091.0786.1990.9791.7192.2392.7892.46c
 1 , c
 3
 , c
 4
 , c
 5
 , c
 691.4191.0490.6385.7990.4190.9891.4692.4892.14c
 2
 , c
 3
 , c
 4
 , c
 5
 , c
 691.5190.9890.5285.7390.6091.1591.5092.6792.40c
 1
 , c 2
 , c
 3 , c
 4
 , c
 5
 , c
 692.6292.3692.2486.9591.9792.3892.6193.0992.64Table 11: Classification accuracy
of speech segments using different combination methods {3 classes}
 354A New Technique for Combining Multiple ClassifiersClassifiersWSAvMdMxMVFIDS0DS1DS2c
 1
 , c 286.9786.9586.9586.4686.3086.6786.6488.1287.59c
 1
 , c
 684.3081.9881.9877.9776.6685.0184.5286.7186.39c
 5
 , c
 674.6073.9173.9170.3870.2075.0074.2076.4775.68c 1
 , c
 2 , c
 388.0988.0887.7787.1187.7687.5187.4988.7388.10c
 1 , c
 2
 , c
 686.4885.7686.0180.3885.3186.4986.4588.1987.50c 1
 , c
 5 , c
 684.3382.8081.8878.6481.2483.8884.5687.1686.45c
 4
 , c
 5
 , c
 679.2778.7377.8474.0777.2678.4678.5180.1779.39c
 1
 , c
 2
 , c
 3
 , c 488.0387.9187.8186.5887.2987.6787.4888.9888.09c 1
 , c
 2 , c
 3
 , c
 687.6487.1787.2682.5986.9587.3787.2588.6188.08c 1
 , c
 2 , c
 5
 , c
 686.5185.8485.9080.5486.1086.6986.5988.4787.57c 1
 , c
 4 , c
 5
 , c
 684.6183.6483.2879.4783.8583.9784.2987.3886.68c 3
 , c
 4 , c
 5
 , c
 683.8483.1282.8479.2883.0583.2283.5686.0084.95c 1
 , c
 2 , c
 3
 , c
 4
 , c
 587.9087.8787.4486.0187.3587.7287.3489.1588.15c
 1
 , c
 2
 , c
 3
 , c
 4
 , c 687.5787.2486.9882.7686.8587.3687.0888.9588.11c
 1
 , c
 2
 , c
 3
 , c 5
 , c
 687.6987.1786.9182.6486.9687.5287.3288.9388.12c
 1 , c
 2
 , c
 4
 , c
 5
 , c
 686.7186.0085.5481.2185.6386.4686.3988.6487.56c
 1
 , c
 3
 , c
 4
 , c
 5
 , c
 686.3685.8585.2681.5085.5786.1085.9588.2687.38c
 2
 , c 3
 , c
 4 , c
 5
 , c
 686.7085.9585.1881.6385.2386.0685.8188.3787.28c 1
 , c
 2 , c
 3
 , c
 4
 , c
 5
 , c
 687.6787.3387.0282.7686.9587.3586.9889.1488.01Table 12: Classification accuracy
of speech segments using different combination methods
 {6 classes}
 355Al-Ani & DericheClassifiersWSAvMdMxMVFIDS0DS1DS2c
 1
 , c 284.5884.6184.6183.7783.6684.0684.1586.1185.60c
 1
 , c
 681.8578.5978.5971.4870.5082.5382.5184.4183.72c
 5
 , c
 670.4869.0469.0462.8762.6571.2269.9773.2471.94c 1
 , c
 2 , c
 385.4185.4285.0083.8185.0285.0085.3486.8586.15c
 1
 , c
 2
 , c 684.2083.4483.5374.1882.7084.1184.0186.0585.44c
 1
 , c
 5
 , c
 682.1880.2679.1972.3277.8681.2082.5385.1084.56c
 4
 , c 5
 , c
 675.9475.2374.4367.2373.4675.3075.5478.2176.92c
 1 , c
 2
 , c
 3
 , c
 485.4485.3085.2083.4184.9685.1185.1587.0586.13c 1
 , c
 2 , c
 3
 , c
 685.3484.9784.9976.4984.8285.1985.1486.7986.16c 1
 , c
 2 , c
 5
 , c
 684.4883.7783.7774.6183.9884.5884.5386.6785.86c 1
 , c
 4 , c
 5
 , c
 682.4181.4481.0373.6181.4981.6182.4485.6584.76c 3
 , c
 4 , c
 5
 , c
 681.0280.5180.1572.7479.8380.0680.4183.8682.25c 1
 , c
 2 , c
 3
 , c
 4
 , c
 585.5285.4384.9482.9384.7785.4585.4287.3286.32c 1
 , c
 2 , c
 3
 , c
 4
 , c
 685.5385.0184.4177.0484.5384.9685.0087.0986.09c
 1
 , c
 2
 , c
 3
 , c 5
 , c
 685.5485.1684.7676.6284.6685.3885.2587.1186.22c
 1
 , c
 2
 , c
 4
 , c 5
 , c
 684.6084.0383.3475.5183.6984.1784.4286.8285.78c
 1 , c
 3
 , c
 4
 , c
 5
 , c
 683.9783.3282.7475.8083.1683.6783.9186.5985.45c
 2
 , c
 3
 , c
 4
 , c
 5
 , c
 684.1483.4882.3275.4082.6683.2983.4486.5585.56c
 1
 , c 2
 , c
 3 , c
 4
 , c
 5
 , c
 685.3284.9384.7077.0884.5185.0785.1987.2685.23Table 13: Classification accuracy
of speech segments using different combination methods
 {9 classes} 356A New Technique for Combining Multiple ClassifiersClassifiersWSAvMdMxMVFIDS0DS1DS2c
 1
 , c 296.1096.1096.1096.1096.1095.1696.0295.5195.80c
 1
 , c
 695.3894.9594.9594.9594.9596.2595.3696.1796.17c
 5
 , c
 690.2590.6990.6990.8390.8387.5189.3288.5988.66c 1
 , c
 2 , c
 396.6896.6896.9096.9796.9096.6196.9896.9196.54c
 1 , c
 2
 , c
 695.8195.7495.2395.3895.2396.1795.7395.5195.88c 1
 , c
 5 , c
 694.6694.7394.3795.0994.5194.9594.6296.2496.17c
 4
 , c
 5
 , c
 692.2092.1391.7792.6491.7091.4891.7591.1691.16c
 1
 , c
 2
 , c
 3
 , c 496.9096.9096.9796.6896.6196.6197.0596.6996.69c
 1
 , c
 2
 , c
 3
 , c
 697.0496.8296.9796.1096.5396.9796.9196.7696.69c
 1
 , c
 2
 , c 5
 , c
 695.8895.8895.7495.1695.9696.1795.9595.8095.80c
 1 , c
 4
 , c
 5
 , c
 695.8895.8195.5295.5295.4594.9595.9595.8896.24c
 3
 , c
 4
 , c
 5
 , c 695.0294.9594.3094.0194.3094.3095.1494.5594.40c
 1
 , c
 2
 , c
 3
 , c 4
 , c
 596.8296.7596.2596.5396.2596.8296.6996.7696.61c
 1 , c
 2
 , c
 3
 , c
 4
 , c
 696.1096.1095.4596.3295.4596.2596.0296.2496.10c
 1
 , c
 2
 , c
 3
 , c
 5
 , c
 696.5396.4696.1795.8896.1096.7596.4796.7696.54c
 1
 , c 2
 , c
 4 , c
 5
 , c
 695.7495.6795.3895.6095.3195.8195.8896.0295.95c 1
 , c
 3 , c
 4
 , c
 5
 , c
 696.3996.3295.6096.3995.5295.9696.2496.3996.24c
 2
 , c
 3
 , c
 4
 , c 5
 , c
 695.1695.2395.0295.4594.9595.1695.1495.5895.58c
 1 , c
 2
 , c
 3
 , c
 4
 , c
 5
 , c
 696.5396.6196.5396.1796.2595.9696.5496.5496.39Table 14: Speaker identification
accuracy using different combination methods {2 speakers}
 357Al-Ani & DericheClassifiersWSAvMdMxMVFIDS0DS1DS2c
 1
 , c 289.2289.1789.1788.7088.9187.8289.2289.3389.17c
 1
 , c
 688.9188.5588.5587.6185.5889.3888.8689.9089.95c
 5
 , c
 680.3780.9080.9079.4477.9375.9079.4480.3779.85c 1
 , c
 2 , c
 390.5390.4790.3289.2890.0689.8090.4790.5891.20c
 1
 , c
 2
 , c
 690.3790.5889.9088.6590.0689.4390.5890.3290.21c
 1
 , c
 5
 , c
 688.7088.6087.3587.2586.9388.2988.9690.1190.58c
 4
 , c 5
 , c
 684.5484.1783.6083.2483.3483.0883.8684.6485.06c
 1 , c
 2
 , c
 3
 , c
 491.1091.1090.5889.5990.3290.9491.3191.7891.51c 1
 , c
 2 , c
 3
 , c
 691.2591.5191.3688.7090.0690.8991.6291.6292.04c 1
 , c
 2 , c
 5
 , c
 690.6390.4790.4788.8188.3990.1690.5890.9490.21c 1
 , c
 4 , c
 5
 , c
 690.3790.3289.4887.8788.5589.5490.4791.5791.46c 3
 , c
 4 , c
 5
 , c
 686.8886.7885.9484.6484.5485.7986.1587.4087.35c 1
 , c
 2 , c
 3
 , c
 4
 , c
 591.2091.2090.5889.5989.5990.8491.7291.9391.78c 1
 , c
 2 , c
 3
 , c
 4
 , c
 691.3691.3191.2588.9690.3290.8991.3691.9392.40c 1
 , c
 2 , c
 3
 , c
 5
 , c
 690.9991.1590.5888.8190.0690.7391.2091.6291.88c
 1
 , c
 2
 , c
 4
 , c 5
 , c
 691.4191.6790.9988.8690.4790.8991.4691.8392.09c
 1
 , c
 3
 , c
 4
 , c 5
 , c
 691.0590.9989.6988.3489.5989.8591.0491.8391.41c
 2 , c
 3
 , c
 4
 , c
 5
 , c
 690.3790.2789.2288.2488.8188.9689.3890.4290.32c
 1
 , c
 2
 , c
 3
 , c
 4
 , c
 5
 , c 691.5191.3691.7888.9190.8991.4691.4691.9391.98Table 15: Speaker identification
accuracy using different combination methods {3 speakers}
 358A New Technique for Combining Multiple ClassifiersClassifiersWSAvMdMxMVFIDS0DS1DS2c
 1
 , c 287.4587.5387.5387.2986.9886.5587.6987.7387.41c
 1
 , c
 684.7883.7983.7983.4081.5585.4883.8785.3784.78c
 5
 , c
 674.0074.4774.4772.0371.2869.8371.9973.2973.13c 1
 , c
 2 , c
 389.1889.1888.2488.1687.9288.3289.1089.3888.87c
 1
 , c
 2
 , c 687.9688.0887.5386.3587.3387.2587.6588.0887.41c
 1
 , c
 5
 , c
 685.6085.4183.3283.5283.1284.6684.6286.1585.13c
 4
 , c 5
 , c
 680.8480.6879.5877.1878.7680.2978.8081.5180.68c
 1 , c
 2
 , c
 3
 , c
 489.7789.8589.6588.0488.5989.6589.6190.1789.61c
 1
 , c
 2
 , c
 3
 , c
 689.2689.2288.7187.2988.2088.7588.9189.5088.71c
 1
 , c 2
 , c
 5 , c
 687.8487.3386.9086.3587.1087.3787.6588.3287.69c
 1
 , c
 4
 , c
 5
 , c
 687.0686.7486.2383.9985.4486.8286.6688.1687.14c
 3
 , c
 4
 , c
 5
 , c
 684.7484.3884.3080.7283.0184.0783.8385.6884.66c
 1
 , c
 2
 , c
 3
 , c
 4
 , c
 589.6189.6189.0288.0088.4789.1889.6190.0189.73c
 1
 , c
 2
 , c
 3
 , c
 4
 , c
 689.7789.7388.8387.4588.8789.0689.6189.9389.50c
 1
 , c
 2
 , c
 3
 , c
 5
 , c
 689.2689.1488.3687.2988.0488.5589.1889.7388.59c
 1
 , c 2
 , c
 4 , c
 5
 , c
 689.1089.0688.2086.7488.3688.3688.9589.6588.95c 1
 , c
 3 , c
 4
 , c
 5
 , c
 688.0087.8886.9885.2187.4587.9287.8889.3889.02c
 2
 , c
 3
 , c
 4
 , c 5
 , c
 688.9988.8787.2185.8087.0688.0087.9289.2688.75c
 1 , c
 2
 , c
 3
 , c
 4
 , c
 5
 , c
 689.4289.2688.9187.3788.2489.2289.5489.8989.73Table 16: Speaker identification
accuracy using different combination methods {4 speakers}
 359Al-Ani & Deriche References
 Aha, D. {1995}. Machine learning. Tutorial presented at The 1995 Artificial
Intelligence and Statistics Workshop.
 Alkoot, F., & Kittler, J. {1999}. Experimental evaluation of expert fusion
strategies. Pattern
 Re c o gnition Letters, 20, 1361{1369. Chen, K., & Chi, H. {1998}. A method
of combining multiple classifiers through soft com-
 petition on different feature sets. Neuro c omputing, 20, 227{252.
 Cho, S., & Kim, J. {1995a}. Combining multiple neural networks by fuzzy
integral for robust
 classification. IEEE Tr ansactions on Systems, Man and Cybernetics, 25,
380{384.
 Cho, S., & Kim, J. {1995b}. Multiple networks fusion using fuzzy logic.
IEEE Tr ansactions
 on Neural Networks, 6, 497{501. Denoeux, T. {2000}. A neural network classifier
based on Dempster{Shafer theory. IEEE
 Tr ansactions on Systems, Man and Cybernetics, 30, 131{150.
 Dietterich, T. {1999}. An experimental comparison of three methods for constructing
en-
 sembles of decision trees: Bagging boosting and randomization. Machine Le
arning,
 40, 139{158.
 Hansen, L., & Salamon, P. {1990}. Neural network ensembles. IEEE Tr ansactions
on Pattern Analysis and Machine Intel ligence, 12, 993{1001. Hashem, S.,
& Schmeiser, B. {1995}. Improving model accuracy using optimal linear com-
 binations of trained neural networks. IEEE Tr ansactions on Neural Networks,
6,
 792{794. Ho, T., Hull, J., & Srihari, S. {1994}. Decision combination in
multiple classifier system.
 IEEE Tr ansactions on Pattern Analysis and Machine Intel ligence, 16, 66{75.
Kittler, J., Hatef, M., Duin, R., & Matas, J. {1998}. On combining classifiers.
IEEE
 Tr ansactions on Pattern Analysis and Machine Intel ligence, 20, 226{239.
 Kohlas, J., & Monney, P. {1995}. A mathematical theory of hints. An appro
ach to the
 Dempster-Shafer theory of evidence. Berlin: Springer{Verlag.
 Lam, L., & Suen, C. {1995}. Optimal combinations of pattern classifiers.
Pattern Re c o gnition L etters, 16, 945{954. Liu, W., & Bundy, A. {1992}.
The combination of different pieces of evidence using incidence
 calculus. Tech. rep. RP 599, Dept. of Artificial Intelligence, Univ. of
Edinburgh.
 Mandler, E., & Schurmann, J. {1988}. Combining the classification results
of independent
 classifiers based on the dempster{shafer theory of evidence. In Gelsema,
E., & Kanal,
 L. {Eds.}, Pattern re c o gnition and artificial intel ligence, pp. 381{393.
North-Holland.
 MIT, SRI, & TI {1990}. DARPA TIMIT acoustic-phonetic continuous speech corpus..
 http://www.ldc.upenn.edu/doc/TIMIT.html.
 Rogova, G. {1994}. Combining the results of several neural network classifiers.
Neural
 Networks, 7, 777{781.
 Shafer, G. {1976}. A mathematical theory of evidence. Princeton University
Press.
 360A New Technique for Combining Multiple Classifiers
 Smets, P. {1990}. The combination of evidence in the transferable belief
model. IEEE
 Tr ansactions on Pattern Analysis and Machine Intel ligence, 12, 447{458.
 Smets, P. {1998}. The transferable belief model for quantified belief representation.
In
 Gabbay, D., & Smets, P. {Eds.}, Handbo ok of defeasible re asoning and uncertainty,
 pp. 267{301. Kluwer. Smets, P. {2000}. Data fusion in transferable belief
model. In 3rd Intl. Conf. Information
 Fusion, pp. 21{33.
 Sugeno,M. {1977}. Fuzzy measures and fuzzy integrals: a survey. In Gupta,
M., Saridis, G., & Gaines, B. {Eds.}, Fuzzy automata and decision pro c esses,
pp. 89{102. Amsterdam:
 North-Holland. USC {1981}. USC-SIPI image database.. http://sipi.usc.edu/services/database/.
 Xu, L., Krzyzak, A., & Suen, C. {1992}. Methods of combining multiple classifiers
and their
 applications to handwriting recognition. IEEE Tr ansactions on Systems,
Man and
 Cybernetics, 22, 418{435.
 361