page 1  (14 pages)
2to next section

University of Leeds

SCHOOL OF COMPUTER STUDIES

RESEARCH REPORT SERIES

Report 95.4

Generic 3-D Shape Model:

Acquisitions and Applications

by

X Shen & D C Hogg

Division of Artificial Intelligence

February 1995

Abstract

The paper describes a method for generating a generic deformable model from a training set of shapes depicted in a corpus of image sequences. Individual shapes in the training set are extracted automatically from the image sequences and represented by the control points of a B-spline surface. The generic model is derived by principal component analysis on the aligned training shapes. Using the acquired generic models, 3-D shape recovery, tracking and object identification are implemented within one procedure. Experimental results are presented both for generation and application of the model within the domain of vehicles appearing in traffic scenes.

1 Introduction

3-D shape models are extensively used for tracking and recognising known objects (e.g. [1, 2, 3]). In the work reported in [4, 5], a method is proposed for recovering 3-D shapes of objects from 2-D images. The model obtained in this work is specific to each presented object. However, a generic shape model being able to represent a class of objects is often more useful in many practical applications where the objects of the same class are not identical in shape (e.g. vehicles, people). We wish to produce such a model by generalisation from the specific shapes of objects in a training set.

A necessary property of such a generic model is that it should be capable of characterising any shape instance in the modelled class. Physically-based models [6, 7] achieve this by introducing dynamical deformations to accommodate variations in shape. They have been successfully used in the applications such as shape recovery and tracking [8, 5, 9]. A significant problem in utilising these models is that the scope of the generic model accommodates all shapes within the solution space of the dynamical simulation, but is not restricted to a specific target class of objects. As a result, the associated methods are often sensitive to image noise and background clutter, and therefore, may fail to detect and track a target shape correctly.

One solution is to add variable parameters to rigid geometric models. Solina and Bajcsy [10] use superquadrics as a generic shape model. A restricted solution space can be obtained by specifying the range of admissible shape parameters. This model accommodates several shape variations including tapering and bending.

An alternative approach is to constrain the deformation of a flexible shape model to a subspace of the total configuration space. Ideally, this subspace should be learned from examples. This is the approach adopted by Cootes et al. [11] who present a

1

method to produce a 2-D generic shape model (point distribution model") represented by a set of landmarks. Baumberg and Hogg [12] extend their method by (1) describing models as 2-D spline curves, and (2) acquiring training shape instances automatically from images and therefore automating the entire procedure of model construction.

In this paper, we present a method for automatically producing a generic 3-D shape model, representing a given class of objects, directly from sequences of 2- D images, and demonstrates its applications to 3-D shape recovery, tracking and recognition. For experimental purposes, the objects we are interested in are vehicles, although the method can also be applied to other classes of object. The generic model is composed of a mean shape and a orthonormal basis extracted by analysing the variances of shapes in a set of training shapes obtained using the method proposed in [5]. Each training shape is first represented by a shape vector which consists of the set of control points of a B-spline surface. The shape vectors are aligned by scaling and translation. Principle component analysis is used to obtain a set of unit eigenvectors describing the most significant variances in the shape. These eigenvectors then form a orthonormal basis defining the space of admissible shapes. The produced model is flexible in the variances of shape, and is also restricted to a specific class of objects. Applications in 3-D shape recovery, tracking and recognition can then be easily achieved within one procedure by fitting the model to the object in each image in order to estimate the pose parameters of the model, and the shape parameters which in turn are used to identify the object.

2 Acquisition of Generic Model

2.1 3-D shape recovery

In [5], a method for recovering the 3-D shapes of objects from 2-D image sequences is proposed, assuming the object is rigid and mirror symmetrical, and moves on a ground plane. With this method, instances of various shapes in a generic class of shapes can be obtained. Figure 1 shows some results from this work. The recovered shape model is a closed surface with two poles which is represented discretely by P ?Q sample points r(p; q) on the surface

r(p; q) =

2
6664

U(p; q)
V (p; q)
W (p; q)

3
7775 ;
r(p; 0) = r(p; Q)
r(0; 0) = r(0; q)
r(P ? 1; 0) = r(P ? 1; q)

; p = ; 1; : : : ; P ? 1
q = ; 1; : : : ; Q (1)

2

Figure 1: Recovery of 3-D vehicle shapes from images. Each target object is shown in a frame from the corresponding sequence. Below this frame is shown the obtained shape model.

The model is mirror symmetrical about a vertical plane through the sample points r(p; 0) and r(p;
hQ2
i
), p = ; 1; : : : ; P ? 1.

2.2 Extracting a shape vector

To concisely represent the shapes in chosen training set, a shape vector is extracted from each shape instance. The shape vector consists of a set of control points, X ij , <= i < M and <= j < N , which defines a bicubic B-spline surface S(u; v) =
fSx(u; v); Sy(u; v); Sz(u; v)gT approximating the model r(p; q), <= p < P and <=
q <= Q. The spline S(u; v) is expressed as [13]

S(u; v) =
M?1
X

i=0

N?1
X

j=0
X ij Bi(u)Bj(v) (2)

where, Bi(u) and Bj(v) are the 4th order basis functions defined on the following knot vectors respectively:

u = [0; ; ; ; 1; 2; ? ? ? ; M ? 3; M ? 3; M ? 3; M ? 3]
v = [0; 1; 2; ? ? ? ; N ? 1; N;N + 1; N + 2]

For the situation <= v < 3, the basis functions BN?k(v), 1 <= k < 3 ? [v], are simply
taken as BN?k(N + v) to ensure the surface is closed along the coordinates v. Also, we let X00 = X0j and XM?1;0 = XM?1;j, for j = ; 1; : : : ; N?1, to ensure the closed
surface has two poles.

3

Minimising the squared error between the generated surface r(p; q) and the spline approximation leads to the following equations:

r(p; q) =
M?1
X

i=0

N?1
X

j=0
X ij Bi(upq)Bj(vpq) (3)

The parameter values upq and vpq are computed as following:

upq =

8><
>:
for p = (M ? 3)
Ppi=1jr(i;q)?r(i?1;q)j
PP?1
i=0 jr(i;q)?r(i?1;q)j for p > (4)

vpq =

8><
>:
for q = (N ? 1)
Pqj=1jr(p;j)?r(p;j?1)j
PQ?1
j=0 jr(p;j)?r(p;j?1)j for q > (5)

Writing equation (3) in matrix form, we have

? = B? (6)

where, ? and ? are k?3 and l ?3 (k = (P ?2)Q+2, and l = (M ?2)N +2) matrices, of the following forms:

? =

2
6664

U00 U10 U11 ? ? ? U1;Q?1 ? ? ? UP?2;Q?1 UP?1;0
V00 V10 V11 ? ? ? V1;Q?1 ? ? ? VP?2;Q?1 VP?1;0
W00 W10 W11 ? ? ? W1;Q?1 ? ? ? WP?2;Q?1 WP?1;0

3
7775

T

(7)

? =

2
6664

X00 X10 X11 ? ? ? X1;N?1 ? ? ? XM?2;N?1 XM?1;0
Y00 Y10 Y11 ? ? ? Y1;N?1 ? ? ? YM?2;N?1 YM?1;0
Z00 Z10 Z11 ? ? ? Z1;N?1 ? ? ? ZM?2;N?1 ZM?1;0

3
7775

T

(8)

B is a k ? l matrix,

Bij(upq; vpq) =

8>>><
>>>:
B0(upq) for j = BM?1(upq) for j = l ? 1
Bm(upq)Bn(vpq) otherwise
(9)

where m =
h j?1
N
i
+ 1, and n = (j ? 1) mod N .
For each training shape, the control points X ij can be calculated by solving equation (6). However, the computation of matrix B and its inverse is expensive. Barsky and Greenberg [14] propose a method to solve the equation without computing the inverse of B, by investigating the property of the matrix B. Here, we use an alternative way based on that proposed by Baumberg and Hogg [12] for 2-D case.

4

For each training shape, we calculate P ?Q new data points r0(p; q) which correspond to the fixed uniformly spaced parameter values:

u0pq ? (M ? 3) p
P ? 1
v0pq ? (N ? 1) q
Q? 1

The data r0(p; q) can be computed by iteratively applying linear interpolations along p coordinates followed by linear interpolations along q coordinates using

U (i; q) ? U(i ? 1; q)
u0i;q ? ui?1;q = U(i; q) ? U(i ? 1; q)
uiq ? ui?1;q ; ui?1;q < u0iq <= uiq; < i < M

U (p; j) ? U(p; j ? 1)
v0pj ? vp;j?1 = U(p; j) ? U(p; j ? 1)
upj ? up;j?1 ; vp;j?1 < v0pj <= vpj; < j < N

with U (0; q) = U(0; q) and U (p; 0) = U(p; 0). Interpolations for V and W coordinates
are analogous. In our experiments, the above iterative interpolations always succeeded in producing the required new data points r0(p; q), although a formal proof of the convergence needs to be developed.

Using these new data points, the matrix B (and therefore, its inverse) is fixed for all of the training shapes, and only needs to be computed once. The computed control points of the spline consists of the shape vector G,

G =
h
X00 Y00 Z00 X10 : : : ZM?2;N?1 XM?1;0 YM?1;0 ZM?1;0
iT (10)

For the convenience of the following discussion, the coordinate triples in the shape vector are indexed as (Xi; Yi; Zi), i = 0, 1, : : :, K ? 1.

2.3 Aligning shape vectors

Having produced a set of shapes, the next step is to align each shape with a reference shape. In our implementation, the first training shape is used as the reference shape. Due to the mirror symmetry of the shape, alignment of each shape involves only scaling and translation. For a given shape, this transformation can be estimated using a linear least squares approximation.
Let ^G be the vector of the reference shape, and G0 be the shape vector obtained by scaling and translating the shape to be aligned, G, i.e.
2
6664

X i
Y i
Z i

3
7775 =

2
6664

?Xi + tx
?Yi + ty
?Zi + ty

3
7775 (11)

5

where, ? is the scale parameter, and tx, ty, and tz are the translation parameters. To align the shape vector G with the reference shape vector ^G, we choose the ?, tx, ty, and tz which minimise the following error:

Error = ( ^G ?G0)T ( ^G ?G0) (12)

Using the standard method, we obtain the following equations from which the solution is derived: 2
6666664

a1 a2 a3 a4

a2 a5 a3 a5 a4 a5

3
7777775

2
6666664

?
tx
ty
tz

3
7777775
=

2
6666664

b1
b2
b3
b4

3
7777775
(13)

where,

a1 =
K?1
X

i=0
(X2i + Y 2i + Z2i ); a2 =
K?1
X

i=0
Xi

a3 =
K?1
X

i=0
Yi; a4 =
K?1
X

i=0
Zi; a5 = K

b1 =
K?1
X

i=0
( ^XiXi + ^YiYi + ^ZiZi); b2 =
K?1
X

i=0
^Xi

b3 =
K?1
X

i=0
^Yi; b4 =
K?1
X

i=0
^Zi

2.4 Principle components analysis

The control points in the shape vector are treated as the landmark points in the point distribution model" (Cootes, et al [11]). For the aligned shapes in the training set, the deviations from the mean shape G = 1n
Pni=1 Gi are analysed using principle
component analysis [15]. n is the total number of shapes in the training set. For each aligned shape vector G, its deviation, dG, from the mean is computed as follows:
dG = G?G (14)

A 3K ? 3K covariance matrix C is then calculated using

C = E(dG dGT ) (15)

where, E(? ? ?) is the expected value.

The unit eigenvectors, ei (eTi ei = 1, i = 1; 2; : : : ; 3K) of the covariance matrix then correspond to the modes of variations of the training shapes. It has also been shown that the eigenvector corresponding to the largest eigenvalue describes the most

6

significant mode of variation, and that the proportion of the total variance explained by each eigenvector is equal to the corresponding eigenvalue [15]. To produce a concise, yet effective generic shape model, we choose s eigenvectors corresponding to the s most significant modes of variations. s is determined using following criterion: Psi=1 >=i
P3K
i=1 >=i > >=s (16)

where, >=i is the i'th eigenvalue of C, >=i >= >=i+1, and >=s is a pre-selected proportion (e.g. 95%) to the total variation. The generic shape model is then expressed as follows:

G = G+ ew (17)

where, e = [e1 e2 ? ? ? es] is the matrix consisting of the first s eigenvectors; w = [w1 w2 ? ? ? ws]T is a vector of weights (called shape parameters) for each eigenvector.

The s eigenvectors then form a orthonormal basis for the space of admissible shapes. Shape models of the objects in the same class as the training ones can then be generated by varying the parameters wi within suitable limits. Since the variance of wi over the training set is >=i, we choose wi using the following criterion:

?3
q
>=i <= wi <= 3
q
>=i (18)

which means that shapes distributed within three standard deviations of the mean are acceptable.

2.5 Experimental results

We have applied the above method to a set of 15 vehicles whose image sequences are captured by monitoring a parking space. Three of these objects are shown in Fig. 1. Each shape vector consists of 162 control points. We found the first 6 modes accounted for over 95% of the variance of the training shapes. Some of the modes of variations are shown in Fig. 2. The results show that the first mode characterises the variation from a van to a car, and the second mode characterises the variation from a hatchback to a saloon. Analysis on other modes is analogous.

3 Applications

Using a generic shape model, we can achieve 3-D shape recovery, tracking and recognition within one procedure which can be implemented in the following way. Beginning with the mean model for the first image in the sequence, the model is first transformed to a state which is most consistent with the object in 3-D. The shape of the

7

(a)

(b)

(c)

Figure 2: Effects of varying (a) the first shape parameter, (b) the second shape parameter, and (c) the fourth shape parameter by ?2:0 standard deviations.

object is then estimated by finding shape parameters wi which minimise a measure of the distance between the object profile and the profile of the model defined by these parameters. The model characterised by these estimated shape parameters is then the best approximation to the object shape to date, and is fed to the same operations applied to the next image. Shape parameters estimated at the end of the sequence are used to define the recovered shape of the object. Recognition of the object is achieved by comparing these shape parameters with those of the models in the database.

3.1 Estimating the transformation

Since the rigid object is assumed to move on a ground plane, the transformation from the model to the object is characterised by the scale ?, orientation ?, and translation (px; py; pz) (pz is fixed) in the following transformation matrix:
2
6666664

? cos ? ?? sin ? px
? sin ? ? cos ? py
? pz
1

3
7777775

The orientation of the object is again estimated by back-projecting the image motion trajectory to the ground plane [5]. Other transformation parameters are then estimated by minimising a measure of distance between the object profile and the implied model profile for that orientation.

Let Xj , j = 1; 2; : : : ; N be the N points in the shape vector which approximately project to the model profile. Define a merit function of the unknown transformation

8

x
xjx 'j
model profile
object region
xc

Figure 3: Determination of correspondence points between model profile and object profile.

parameters (?; px; py) as follows:

?1(?; px; py) =
NX

j=1
kx0j ? xjk (19)

where, x0j is the intersection with the object profile of the ray from x0, the projection of the model center, passing through xj, the projection of Xj (Fig. 3)1. k ? k is the standard Euclidean norm. If there is more than one intersection point, the farthest one from x0 is used. In the case when both the projection of the model center and xj are outside the object region, then xc, the centroid of the object region is substituted for x0.

The aim is to find the parameters (?; px; py) which minimise the function ?1. The perspective projection between the Xj and xj means ?1 is non-linear. A non-linear optimisation method [16] is then used to estimate (?; px; py). The initial estimate for the parameters are taken as the values for the previous image.

3.2 Estimating shape parameters

Once the transformation is determined, the object shape is estimated by finding the independent shape parameters wi which define the best fit to the object shape. Estimation of the wi is achieved, again by minimising a merit function of unknowns wi, similarly defined as follows:

?2(w1; w2; : : : ; ws) =
NX

j=1
kx0j ? xjk (20)

In order to ensure that the estimated shape is plausible, the shape parameters are constrained to lie within a hyper-ellipsoid [17]. For each image, the shape parameters are updated with the above estimate, and then globally constrained using the 1The calibration matrix which allows the projection from the world coordinates to image (pixel) coordinates is assumed to be known (see [5] for details)

9

following scheme:

d2 =
sX

i=1
w2i
>=i (21)

w0i =

8<
:
(dmax=d)wi d > dmax
wi otherwise (22)

This global constraint also prevents the shape recovery from being sensitive to errors in the extraction of the object profile.

3.3 Recognition

The estimated shape parameters can be used to form a measure for object recognition or classification. Pentland and Sclaroff [7] use correlation of two vectors describing two objects as a measure for comparison. In our case, the contribution of each shape parameter to the description of the object shape is not equal. Shape parameters corresponding to the most significant modes of variation describe the most common changes in the shape, and often have larger absolute values than other parameters. Simply using correlation may ignore the effect of the parameters corresponding to the less significant modes of variation which, on the contrary, are normally crucial in discriminating the different objects. We therefore use the following measure to compare two objects with shape parameters w1 and w2:

ffl =

s
Psi=1
?w1i>=i ? w2i>=i
?2

s
Psi=1
?w1i>=i
?2
+
s
Psi=1
?w2i>=i
?2 (23)

The eigenvalues in the measure are used to normalise the effect of each shape parameter. This measure is proportional to the distance between two vectors. The value of the ffl is in the interval [0; 1]. For two similar shapes, the ffl approaches to 0.

3.4 Experimental results

We applied the above method to several image sequences where the objects may or may not be within the training set (but belong to the same class). Fig. 4 shows the recovered object shapes from 6 sequences respectively. The objects shown in (a), (b) and (c) are also used in the training set. The recovered shape for the object (f) is not correct. This is because the training set does not include this kind of object. The recovered shape is the most similar shape to this object among the shapes in the training set.

10

(a) (b) (c)

(d) (e) (f)

Figure 4: Recovered object shapes using generic shape model.

The recovered shapes are compared each other by computing the measure (23). The results are given in table 1. It can be seen that object (a) is more similar to object (e) than others, and object (b) is more similar to object (d) than others. Analysis on other pairs of objects is analogous. Among all pairs of objects, (b) and (d) are the most similar.

4 Conclusion

We have proposed a method for producing a generic 3-D shape model by statistically analysing a set of training shapes recovered automatically from 2-D image sequences.

11

a b c d e f
a 0.00 0.74 0.82 0.77 0.51 0.88
b 0.74 0.00 0.68 0.45 0.89 0.54
c 0.82 0.68 0.00 0.58 0.68 0.61
d 0.77 0.45 0.58 0.00 0.82 0.61
e 0.51 0.89 0.68 0.82 0.00 0.97
f 0.88 0.54 0.61 0.61 0.97 0.00

Table 1: Results of recognition

The model is represented by a set of control points of a B-spline surface, and consists of a mean shape and a orthonormal basis defining the shape space. With a global constraint, the model can provide a sensible solution space which we believe supports robust model-based applications. Applications of utilising this model for 3-D shape recovery, tracking and recognition have been demonstrated. Experimental results have shown that the method gives very encouraging results.

References

[1] P. J. Besl and R. C. Jain. Three-dimensional Object Recognition. ACM Computing Surveys, 17(1):75{145, March 1985.

[2] R. T. Chin and C. R. Dyer. Model-based Recognition in Robot Vision. ACM Computing Surveys, 18(1):67{108, March 1986.

[3] D. G. Lowe. Robust Model-based Motion Tracking Through the Integration of Search and Estimation. International Journal of Computer Vision, 8(2):113{122, 1992.

[4] X. Shen and D. Hogg. Shape Models from Image Sequences. In Jan-Olof Eklundh, editor, Computer Vision { ECCV'94, volume I, pages 225{230. Springer-Verlag, 1994.

[5] X. Shen and D. Hogg. 3-D Shape Recovery Using A Deformable Model. Image and Vision Computing Journal, 1995. to appear.

[6] D. Terzopoulos, J. Platt, A. Barr, and K. Fleischer. Elastically Deformable Models. ACM Computer Graphics, 21(4):205{214, July 1987.

12

[7] A. Pentland and S. Sclaroff. Closed-form Solution for Physically Based Shape Modelling and Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-13(7):715{729, July 1991.

[8] D. Terzopoulos, A. Witkin, and M. Kass. Constraints on Deformable Models: Recovering 3D Shape and Nonrigid Motion. Artificial Intelligence, 36(1):91{123, 1988.

[9] D. Terzopoulos and D. Metaxas. Tracking Nonrigid 3D Objects. In A. Blake and A. Yuille, editors, Active Vision, pages 75{89. MIT Press, 1992.

[10] F. Solina and R. Bajcsy. Recovery of Parametric Models from Range Images: The Case for Superquadrics with Global Deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-12(2):131{147, February 1990.

[11] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham. Training Models of Shape from Sets of Examples. In David Hogg and Roger Boyle, editors, British Machine Vision Conference 1992, pages 9{18, Leeds, UK, September 1992. Springer-Verlag.

[12] A. Baumberg and D. Hogg. Learning Flexible Models from Image Sequences. In Jan-Olof Eklundh, editor, Computer Vision { ECCV'94, volume I, pages 299{ 308. Springer-Verlag, 1994.

[13] R. H. Bartels, J. C. Beatty, and B. A. Barsky. An Introduction to Splines for use in Computer Graphics and Geometric Modelling. Morgan Kaufmann, 1987.