| ![]() | |||||||||
weighting schemes [Wettschereck, Aha & Mohri, 1995] and other enhancements that each system provides.
Section 2 provides background information on distance functions used previously. Section 3 introduces a heterogeneous distance function that combines Euclidean distance and VDM. Sections 4 and 5 present two extensions of the Value Difference Metric which allow for direct use of continuous attributes. Section 4 introduces the Interpolated Value Difference Metric (IVDM), and Section 5 presents the Windowed Value Difference Metric (WVDM).
Section 6 presents empirical results comparing three commonly-used distance functions with the three new functions presented in this paper. The results are obtained from using each of the distance functions in an instance-based learning system on 48 applications. The results indicate that the new heterogeneous distance functions are more appropriate than previously used functions on applications with both nominal and linear attributes. Section 7 provides conclusions and future research directions.
2. Previous Distance Functions
As mentioned in the introduction, there are many learning systems that depend upon a good
distance function in order to be successful. A variety of distance functions are available for such uses,
including the Minkowsky [Batchelor, 1978], Mahalanobis [Nadler & Smith, 1993], Camberra,
Chebychev, Quadratic, Correlation, and Chi-square distance metrics [Michalski et. al, 1981; Diday,
1974]; the Context-Similarity measure [Biberman, 1994]; the Contrast Model [Tversky, 1977];
hyperrectangle distance functions [Salzberg, 1991; Domingos, 1995] and others. Several of these
functions are defined in Figure 1.
Although there have been many distance functions proposed, by far the most commonly used is the
Euclidean Distance function, which is defined as:
E(x, y) = (x
a
- y
a
)
2
a=1
m
?
(1)
where x and y are two input vectors (one typically being from a stored instance, and the other an input
vector to be classified) and m is the number of input variables (attributes) in the application. The
square root is often not computed in practice, because the closest instance(s) will still be the closest,
regardless of whether the square root is taken.
An alternative function, the city-block or Manhattan distance function, can require less
computation and is defined as:
M(x, y) = x
a
- y
a
a=1
m
?
(2)
The Euclidean and Manhattan distance functions are equivalent to the Minkowskian r-distance function [Batchelor, 1978] with r?=?2 and 1, respectively.
2.1. Normalization
One weakness of the basic Euclidean distance function is that if one of the input attributes has a
relatively large range, then it can overpower the other attributes. For example, if an application has
just two attributes, A and B, and A can have values from 1 to 1000, and B has values only from 1 to 10,
then B?s influence on the distance function will usually be overpowered by A?s influence. Therefore,
distances are often normalized by dividing the distance for each attribute by the range (i.e., maximumminimum)
of that attribute, so that the distance for each attribute is in the approximate range 0..1.
Normalization can also be achieved by dividing by the standard deviation for each attribute, or by
other statistical methods.
Related to the idea of normalization is that of using attribute weights and other weighting schemes.