
Exponentiated Gradient Versus
Gradient Descent for Linear
Predictors
Jyrki Kivinen
Manfred K. Warmuth
UCSC-CRL-94-16
June 21, 1994
Revised December 7, 1995
Baskin Center for
Computer Engineering & Information Sciences
University of California, Santa Cruz
Santa Cruz, CA 95064 USA
Abstract
We consider two algorithms for on-line prediction based on a linear model. The algorithms are the well-known gradient descent (GD) algorithm and a new algorithm, which we call EG±. Both maintain a weight vector using simple updates. For the GD algorithm, the update is based on subtracting the gradient of the squared error made on a prediction. The EG± algorithm uses the components of the gradient in the exponents of factors that are used in updating the weight vector multiplicatively. We present worst-case loss bounds for EG± and compare them to previously known bounds for the GD algorithm. The bounds suggest that the losses of the two algorithms are in general incomparable, but that EG± has a much smaller loss if only a few components of the input are relevant for the predictions. We have performed experiments which show that our worst-case upper bounds are quite tight already on simple artificial data.
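To make the two updates described above concrete, the following is a minimal Python sketch, not taken from the paper itself: the GD step subtracts the gradient of the squared loss, while the EG± step keeps paired positive and negative weight vectors, multiplies them by exponentiated-gradient factors, and renormalizes so the total weight stays fixed. The function names, the learning rate eta, and the total-weight parameter U here are illustrative assumptions.

import numpy as np

def gd_update(w, x, y, eta):
    """One GD step on the squared loss (y_hat - y)^2:
    subtract eta times the gradient from the weight vector."""
    y_hat = w @ x
    return w - eta * 2.0 * (y_hat - y) * x

def eg_pm_update(w_pos, w_neg, x, y, eta, U=1.0):
    """One EG± step: the prediction uses the difference w_pos - w_neg;
    each weight is multiplied by an exponentiated gradient factor,
    then all weights are renormalized to total weight U."""
    y_hat = (w_pos - w_neg) @ x
    g = 2.0 * (y_hat - y) * x            # gradient of the squared loss
    r_pos = w_pos * np.exp(-eta * g)     # positive weights move against the gradient
    r_neg = w_neg * np.exp(eta * g)      # negative weights move with it
    Z = (r_pos.sum() + r_neg.sum()) / U  # normalization keeps total weight at U
    return r_pos / Z, r_neg / Z

In this sketch the multiplicative form of the EG± update keeps all weights positive, and the normalization plays the role of the fixed total weight that appears in the algorithm's loss bounds.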