P. Stone, R. E. Schapire, M. L. Littman, J. A. Csirik and D. McAllester
Volume 19, 2003
Links to Full Text:Journal of ArtificialIntelligence Research 19 {2003} 209-242Submitted 12/02;
published 9/03
Decision-Theoretic Bidding Based on Learned Density Models in Simultaneous,
InteractingAuctions
Peter Stonepstone@cs.utexas.edu
Dept. of ComputerSciences, The University of Texasat Austin
1 University Station C0500, Austin, Texas 78712-1188 USA
Robert E. Schapire schapire@cs.princeton.edu Department of Computer Science,
Princeton University
35 Olden Street, Princeton, NJ 08544 USA
Michael L. Littmanmlittman@cs.rutgers.edu
Dept.of Computer Science, Rutgers University Piscataway, NJ 08854-8019 USA
Janos A. Csirik janos@pobox.com D. E. Shaw & Co.
120 W 45thSt, New York, NY 10036 USA
David McAllester mcallester@tti-chicago.edu Toyota TechnologicalInstituteat
Chicago
1427 East 60th Street, Chicago IL, 60637 USA
Abstract Auctions are becoming an increasinglypopular method for transacting
business, espe- cially over the Internet. Thisarticle presents a general
approach to buildingautonomous
bidding agents to bid in multiple simultaneous auctions for interacting
goods. A core
component of ourapproach learns a model of the empiricalpricedynamics based
on past
data and uses the model to analytically calculate, to the greatest extent
possible, optimal
bids. We introduce a new and general boosting-based algorithm for conditionaldensity
estimation problems of this kind,i.e., supervisedlearning problems in whichthe
goal is to
estimate the entireconditional distribution of the real-valuedlabel. This
approach is fully implemented as ATT ac-2001, a top-scoring agent in the
second Trading Agent Competition
{TAC-01}. We present experimentsdemonstrating the effectiveness of our boosting-based
price predictor relative to several reasonable alternatives.
1.Introduction
Auctions are anincreasingly popular method for transacting business, especially
over the
Internet.In an auction for a single good, it isstraightforward to create
automated bidding strategies|an agent could keep biddinguntil reaching a
target reserve price, or itcould
monitor the auction and place awinning bid just before the closing time
{knownas sniping}.
When bidding for multiple interacting goods in simultaneousauctions, on
the other
hand, agents mustbe able to reason about uncertainty and make complex value
assess-
ments. For example, an agent bidding on one's behalf in separate auctions
for a camera and flash may end up buying the flash andthen not being able
to find an affordablecamera.
Alternatively, if bidding forthe same good in several auctions, it maypurchase
two flashes
when only one was needed.
c
fl2003AI Access Foundation and Morgan KaufmannPublishers. All rights reserved.Stone,
Schapire, Littman, Csirik, & McAllester
This article makesthree main contributions. The first contribution is a
general ap-
proachto buildingautonomous biddingagentsto bid in multiple simultaneous
auctions for interacting goods. Westart with the observation that the key
challenge in auctions is the
prediction of eventual prices of goods: withcomplete knowledge of eventual
prices, there are direct methods for determining the optimalbids to place.
Our guiding principle is to havethe agent model itsuncertainty in eventual
prices and, tothe greatest extent possible,
analyticallycalculate optimal bids.
To attack theprice prediction problem, we propose a machine-learning approach:
gather
examplesof previous auctions and the prices paid in them,then use machine-learning
meth-
odsto predict these prices based on available features in the auction. Moreover,forour
strategy, weneeded to be able to model the uncertainty associatedwith predicted
prices;
in other words, we needed to be able to sample from a predicteddistribution
of prices
given the currentstate of the game. This can be viewed asa conditional density
estimation problem, that is, a supervised learningproblem in which the goal
is to estimate the entire
distribution of a real-valued label given a description of currentconditions,
typically in the
formof a feature vector. The second main contribution of this article is
a new algorithm for solving such general problems based onboosting {Freund
& Schapire, 1997; Schapire&
Singer, 1999}.
Thethird contribution of this article is a complete descriptionof a prototype
implemen-
tation of ourapproach in the form of ATT ac-2001, a top-scoring agent
1 in the second Trading
Agent Competition{TAC-01}that was held in Tampa Bay, FL on October 14, 2001
{Well- man, Greenwald, Stone, & Wurman, 2003a}. The TACdomain was the main
motivation for the innovations reported here.ATT ac-2001 builds on top ofATT
ac-2000 {Stone, Littman, Singh, & Kearns, 2001}, the top-scoring agentat
TAC-00, but introduces a fundamentally
new approach to creating autonomousbiddingagents.
We present detailsof ATT ac-2001 as an instantiation ofits underlying principles
that we believe have applications in a wide variety of bidding situations.
ATT ac-2001 uses a predic-
tive,data-driven approach to biddingbased on expected marginal values of
all available
goods. In this article, we present empirical results demonstrating the robustnessand
effec-
tiveness of ATT ac-2001's adaptive strategy. We also report on ATT ac-2001's
performance at
TAC-01 and TAC-02 and reflect on some of the key issues raised during the
competitions.
The remainder of the article isorganized as follows. In Section 2, wepresent
our
general approach to biddingfor multiple interacting goods in simultaneous
auctions. In
Section3, we summarize TAC, the substrate domain for our work. Section 4
describes our boosting-basedprice predictor. InSection 5, we give the details
ofATT ac-2001. In Section 6, we present empirical results including asummary
of ATT ac-2001's performancein TAC-
01, controlledexperiments isolating the successful aspects ofATT ac-2001,
and controlled experiments illustrating someof the lessons learned during
the competition.A discussion
andsummary of related workis provided in Sections 7 and 8. 2. General Approach
In a wide variety of decision-theoreticsettings, it is useful to be able
to evaluate hypothetical
situations.In computer chess, for example, a static board evaluator is used
to heuristically1. Top-scoringby one metric, and second place by another.
210Decision-TheoreticBidding with Learned Density Models
measure which player is ahead and by how much in a given board situation.The
scenario
is similar in auction domains,and our bidding agent ATT ac-2001uses a situation
evaluator,
analogous to the static board evaluator, whichestimates the agent's expectedprofit
in a
hypothetical futuresituation. This \profit predictor" has a widevariety
of uses in the agent. For example, to determine the value of an item, the
agent compares thepredicted profit
assuming the item is alreadyowned to the predicted profit assuming that
theitem is not
available.
Given prices for goods, one can oftencompute a set of purchases and an allocationthat
maximizes profit.
2 Similarly, if closing prices are known, they can be treated as fixed,
and optimal bids can be computed {bid high foranything you want to buy}.
So,one natural
profit predictor is simply tocalculate the profit of optimal purchases underfixed
predicted
prices. {Thepredicted prices can, of course, be differentin different situations,
e.g., previous closing prices can be relevantto predicting future closing
prices.} A more sophisticated approach to profitprediction is to construct
a model of the prob- ability distribution over possiblefuture prices and
to place bids that maximizeexpe cte d
profit. Anapproximate solution to this difficult optimizationproblem can
be created by
stochastically sampling possible prices and computing a profit prediction
as above for each sampled price. A sampling-based schemefor profit prediction
is important for modeling
uncertainty and the value of gaining information, i.e., reducingthe price
uncertainty.
Section2.1 formalizes this latter approach within asimplifiedsequential
auction model. This abstraction illustrates some of thedecision-making issues
in our full sampling-based approach presented in Section 2.2.The full setting
that our approach addresses isconsid-
erablymore complex than the abstractmodel, but our simplifying assumptions
allow usto
focus on a core challenge of thefull scenario. Our guiding principle is
to make decision-
theoretically optimal decisions given profit predictions for hypothetical
futuresituations.
3
2.1 SimplifiedAbstraction
In the simple model, thereare n items to be auctioned off insequence {first
item 0, then
item1, etc.}. The bidder must place a bidr
i
for each itemi, and after each bid, a closing pricey
i
is chosen for the corresponding item from a distribution specific to the
item. If the
bid matches or exceeds the closing price,r
i
025 y i
, the bidder holds itemi, h
i
= 1.Otherwise,
the bidder does not hold theitem, h
i
= 0.The bidder's utility v{H} is a function of its
final vector of holdings H = {h 0
; : : : ; h n0001
} and its costis a function of the holdings and
the vector of closing prices, H001Y . We willformalize the problem of optimal
bid selection
and develop a seriesof approximations to make the problem solvable.2. The
problem iscomputationally difficult in general, buthas beensolved effectively
in the non-trivial TAC setting {Greenwald & Boyan, 2001; Stone et al., 2001}.
3.An alternative approach would be toabstractly calculate the Bayes-Nash
equilibrium{Harsanyi, 1968}
for the game and playthe optimal strategy. We dismissed thisapproach because
of its intractability in realistically complex situations, including TAC.
Furthermore, even if we wereable to approximate the
equilibrium strategy, it is reasonable to assume that our opponents would
not play optimal strategies. Thus, we could gain additional advantageby tuning
our approach to our opponents'actual behavior as
observed in theearlier rounds, which is essentially the strategywe adopted.
211Stone, Schapire, Littman, Csirik, & McAllester
2.1.1 ExactValue
What is the valueof the auction, that is, the bidder'sexpected profit {utility
minus cost} for biddingoptimally for the rest of theauction? If a bidder
knows this value, it can make
its next bid to be one that maximizes its expected profit.The value is a
function of the bidder's current holdings Hand the current item to bid on,i.
It can be expressed as value{i; H } = max r
i
E
y
i
max
r
i+1
E
y
i+1
: : :max
r
n0001
E
y
n0001
{v{G+ H } 000 G 001Y }; {1}
wherethe components of G are the new holdingsas a result of additional winnings
g j
021
r
j
025 y
j . Note that H onlyhas non-zero entries for items that havealready been
sold
{8j025 i; H
j
= 0} and G only has non-zero entriesfor items that have yet to be sold {8j
< i; G
j
= 0}. Note also thatG and Y are fully specified whenthe g
j
and y j
variables
{forj 025 i} are boundsequentially by the expectation and maximization
operators. The
idea here is that the bidsr
i
through r n0001
are chosen tomaximize value in the context of the possibleclosing prices
y j
.
Equation 1 is closelyrelated to the equations defining the value of a finite-horizon
par-
tiallyobservable Markov decisionprocess {Papadimitriou & Tsitsiklis, 1987}
or astochastic
satisfiability expression{Littman, Ma jercik, & Pitassi, 2001}.Like these
other problems,
thesequential auction problem is computationally intractable for sufficiently
general repre- sentations of v{001} {specifically, linear functions of theholdings
are not expressive enough to achieve intractability while arbitrarynonlinear
functions are}.
2.1.2Approximate Value by Reordering There are three ma jor sources of intractability
in Equation 1|the alternation of themaxi-
mizationand expectation operators{allowing decisions to be conditioned on
an exponential
number of possible setsof holdings}, the large number of maximizations {forcing
an ex-
ponential numberof decisions to be considered}, and the large number of
expectations
{resultingin sums over an exponential numberof variable settings}.
We attack the problem of interleaved operators by moving all but the first
of themaxi-
mizations inside the expectations,resulting in an expression that approximates
the value:
v alue-est{i; H} = max
r
i E
y
i
E
y
i+1 : : : E
y
n0001
max
r
i+1
: : :max
r
n0001
{v{G +H } 000 G 001 Y}: {2}
Because the choicesfor bids r
i+1
through r
n0001 appear more deeply nested than the bindings for the closing prices
y
i
through y
n0001
, they cease to be bidsaltogether, and instead represent
decisions as to whether to purchase goods atgiven prices. Let G = opt{H;
i; Y } be a vector representing the optimal number of goods to purchase at
the prices specified bythe vector
Y giventhe current holdings H startingfrom auction i. Conceptually,this
can be computed
by evaluating
opt{H; i; Y} = argmax
g
i
;:::;g
n0001
{v{G +H } 000 H 001Y }: {3}
Thus,Equation 2 can be written:
value-est{i; H } = max r
i
E
y
i
;:::;y n0001
{v{opt{H
0
;i + 1; Y } + H 0
} 000 opt{H
0
; i + 1;Y } 001 Y } {4} 212Decision-TheoreticBidding with Learned Density
Models
where H
0
is identicalto H except the i-th componentreflects whether item i is won|r
i
025 y
i
.
Note that there is afurther approximation that can be made bycomputing the
expected
prices {as pointvalues} before solving the optimizationproblem. This approach
corresponds to further swapping the expectations towards the core of the
equation:
value-est{i; H }
ev = max
r
i {v{opt{H
0
; i + 1; E Y
} + H
0
} 000 opt{H 0
;i + 1;E
Y
} 001 E Y
} {5}
whereE
Y
= E[y
i+1
;: : : ; y
n0001 ], the vector of expected costs ofthe goods. In the remainder of
the article, we refer to methods thatuse this further approximation from
Equation 5 as expected value approaches for reasonsthat will be come apparent
shortly.
The technique of swapping maximization and expectation operators was previously
used by Hauskrecht {1997} to generate a bound for solving partially observableMarkov
decision
processes.The decrease of uncertainty when decisions aremade makes this
approximation
anupper bound on the true valueof the auction: value-est 025value. The
tightness of the ap- proximations in Equations 2 and 5 dependson the true
distributions of the expected prices. For example, if the prices were known
in advance with certainty, thenboth approximations
are exact.
2.1.3 Approximate Bidding
Given a vector of costsY , the optimization problem opt{H; i; Y } in Equation
4 is stillNP-
hard {assuming the representation of the utility function v{001} issufficiently
complex}. For
many representations of v{001}, the optimization problem can be castas
an integer linear pro-
gram and approximated by using the fractional relaxation insteadof the exact
optimization
problem.This is precisely the approach we haveadopted in ATT ac {Stone et
al., 2001}. 2.1.4 Approximation via Sampling Even assuming that opt{H;i;
Y } can be solved in unit time, aliteral interpretation of Equa-
tion 4 says we'll need to solve this optimization problemfor an exponential
number of cost vectors {or even more if the probabilitydistributions Pr{y
j } are continuous}. Kearns,Man-
sour, and Ng {1999} showed that values of partially observableMarkov decisionprocesses
could beestimated accurately by sampling tra jectoriesinstead of exactly
computing sums.
Littman et al. {2001} did the same for stochastic satisfiability expressions.
Applyingthis
idea to Equation 4 leads to the following algorithm.
1. Generate a setS of vectors of closing costsY according to the product
distribution Pr{y
i
}002 001 001 001 002 Pr{y
n0001
}. 2. For each of these samples,calculate opt{H
0
; i + 1; Y } as definedabove and average the
results, resulting in the approximation value-est
s
{i; H } = max
r i
X
Y 2S
{v{opt{H 0
; i + 1; Y} + H
0
}000 opt{H
0
; i + 1; Y }001Y }=jS j:{6}
This expression converges to value-est with increasing sample size. A remaining
challenge in evaluatingEquation 6 is computing the real-valuedbid r
i
that maximizes the value. Notethat we want to buy item iprecisely at those
closing prices for 213Stone, Schapire, Littman, Csirik, & McAllester
which the value of having the item {minus its cost}exceeds the value of
not having the item; this maximizes profit. Thus,to make a positive profit,
we arewilling to pay up to, but not more than, the difference in valueof
having the item and not having the item. Formally , let H bethe vector of
current holdings andH
w
be the holdings modified to
reflect winning itemi. Let G
w
{Y } = opt{H
w
; i+1; Y}, the optimal set of purchases assuming item i was won, and G{Y
} = opt{H;i+1; Y } the optimal set of purchases assuming otherwise
{exceptin cases of ambiguity, we write simply G
w
and G forG
w
{Y } andG{Y} respectively}. We want to select r i
to achieve the equivalence
r
i
025y
i
021
X
Y 2S
{v{G
w
+H } 000 G
w 001 Y }=jSj 000 y
i
025
X
Y 2S
{v{G +H } 000 G 001 Y}=jS j: {7} Setting
r
i
=
X
Y 2S {[v{G
w + H } 000 G w
001 Y ] 000[v{G + H }000 G 001 Y ]}=jS j: {8}
achievesthe equivalence desired in Equation 7,as can be verified by substitution,
and therefore biddingthe average differencebetween holding and not holding
the itemmaximizes
the value.
4
2.2 The FullApproach
Leveraging from the precedinganalysis, we define our sampling-based approachto
profit
prediction in general simultaneous, multi-unit auctions for interacting
goods. In this sce-
nario, let there be n simultaneous, multi-unitauctions for interacting goods
a
0 ; : : : ; a
n0001
.
The auctions might close at different times and these times arenot, in general,
known in
advance to the bidders. When an auction closes,let us assume that the m
units available
are distributed irrevocably tothe m highest bidders, who each need to pay
the price bid
bythe mth highest bidder. This scenario correspondsto an mth price ascending
auction. 5
Note that the same bidder mayplace multiple bids in an auction, and therebypotentially
win multiple units.We assume that after the auction closes, thebidders will
no longer have
anyopportunity to acquire additional copies of thegoods sold in that auction
{i.e., there
is no aftermarket}. Our approach is based upon fiveassumptions. For G =
{g
0
; : : : ; g n0001
} 2IN
n
, letv{G} 2 IR represent the value derived bythe agent if it owns g
i
units of the commodity beingsold in
auction a
i . Note that v isindependentof the costs of the commodities.Note furtherthat
this representation allows for interacting goods of all kinds,including
complementarity and
substitutability.
6
The assumptions of our approach are as follows:
1. Closing prices are somewhat, butonly somewhat, predictable. That is,
given aset
of input features X ,for each auction a
i , there exists a sampling rule that outputsa4. Note that thestrategy for
choosing r
i in Equation 8 does not exploit the factthat the sample S contains
only a finite set of possibilities fory
i
, which might make it more robust to inaccuracies in the sampling. 5. For
large enough mit is practically the same as the more efficient m + 1st auction.
Weuse the mth
price model because that is what is used in TAC'shotel auctions.
6. Goodsare considered complementary if their value as a package is greater
than the sum of theirindividual
values; goodsare consideredsubstitutable if their value as a package isless
than the sum of their
individualvalues.
214Decision-TheoreticBidding with Learned Density Models
closing price y
i
according to a probability distribution of predicted closing prices for
a
i .
2. Given a vector ofholdings H = {h
0 ; : : : ; h
n0001
} where h
i 2IN represents the quantity of the commodity being sold in auctiona
i
that are already owned by the agent, and
given a vector of fixed closing prices Y = {y
0
; : : : ; y n0001
}, thereexists a tractable
procedure opt{H; Y } to determine the optimal setof purchases {g
0
; : : : ; g
n0001
} where
g i
2 IN representsthe number of goods to be purchasedin auction i such that
v{opt{H; Y } + H} 000 opt{H; Y} 001 Y 025 v{G + H } 000 G001 Y
for all G2 IN
n
.This procedure corresponds to the optimizationproblem opt{H; i; Y } in
Equation 3.
3. Anindividual agent's bids do not have anappreciable effect on the economy
{large population assumption}.
4.The agent is free to change existing bidsin auctions that have not yet
closed. 5. Future decisions are made in thepresence of complete price information.
Thisas-
sumption corresponds to the operatorreordering approximation from the previous
section.
While these assumptions are notall true in general, they can be reasonable
enough approx-
imations to be the basis for an effective strategy.
By Assumption 3,the price predictor can generate predicted prices prior
to considering
one's bids. Thus,we can sample from these distributionsto produce complete
sets of closing prices of all goods.
For each good under consideration, weassume that it is the next one to close.If
a
different auction closes first, we can then revise our bids later {Assumption4}.
Thus, we
would like tobid exactly the good's expe cte dmarginal utility to us. Thatis,
we bid the
differencebetween the expected utilities attainable with and without the
good. Tocompute
these expectations, we simply average the utilities of having and not havingthe
good under
differentprice samples as in Equation 8. Thisstrategy rests on Assumption
5 in that we assume that biddingthe good's current expected marginal utility
cannot adversely affect our future actions, for instance by impacting our
future space of possible bids.Note that as
time proceeds, the pricedistributions change in response to the observed
price tra jectories,
thuscausing the agent to continually revise itsbids.
Table 1 shows pseudo-code for the entire algorithm. A fully detaileddescription
of an
instantiationof thisapproach is given in Section 5. 2.3 Example
Considera camera and a flash with interacting values to an agent as shown
in Table 2.
Further, consider that theagent estimates that the camera will sell for
$40with probability
25045, $70 with probability 50045, and $95 with probability 25045.Consider
the question of
what the agentshould bid for the flash {in auctiona
0
}. Thedecision pertaining to the
camerawould be made via a similar analysis. 215Stone, Schapire, Littman,
Csirik, & McAllester
* Let H = {h 0
; : : : ; h n0001
} be the agent's current holdings in each of then auctions.
* Fori= 0 to n 000 1{assume auction i is next to close}: { total-diff =
0
{ counter = 0
{As time permits:
003For each auction a
j
; j 6=i, generate a predicted price sampley
j
. LetY =
{y
0 ; : : : ; y
i0001
; 1; y i+1
; : : : ; y n0001
}.
003 Let H
w = {h
0
; : : : ; h
i0001
; h
i
+ 1; h
i+1 ; : : : ; h
n0001
}, the vector of holdings ifthe agent
wins a unit in auctiona
i
.
003Compute G
w
= opt{H
w
; Y },the optimalset of purchases if the agent wins a unit in
auction a
i
. Note that no additionalunitsof the good will be purchased, since the i-th
component of Yis 1.
003 ComputeG = opt{H; Y },the optimalset of purchases if the agent never
acquires
any additional units in theauction a
i
and prices areset to Y .
003 diff= [v{G
w
+ H } 000 G
w
001 Y ] 000[v{G + H }000 G 001 Y ]
003total-diff = total-diff + diff 003 counter = counter+ 1
{ r = total-diff=counter
{ Bidr in auction a
i
.Table 1:The decision-theoretic algorithm for bidding in simultaneous, multi-unit,
interact-
ing auctions.utilitycamer a alone$50flash alone10both100neither0T able 2:
Thetable of values for all combination ofcamera and flash in our example.
First,the agent samples from the distributionof possible camera prices.
When the price of the camera {sold in auctiona
1
} is $70 in thesample:
* H = {0; 0}; H
w
= {1; 0}; Y ={1; 70}
* G
w
= opt{H w
; Y } is the bestset of purchases the agent can make with the flash, and
assuming the camera costs $70.In this case, the only two options arebuying
the
camera or not. Buyingthe camera yields a profit of 100000 70 = 30. Not
buying the camera yields a profit of 10000 0 = 10. Thus, G w
= {0; 1},and [v{G
w + H } 000 G w
001 Y ] = v{1; 1} 000{0; 1} 001 {1; 70} = 100000 70.
* Similarly G = {0; 0} {since if the flash is not owned, buying the camera
yields a profit of 50 000 70 = 00020, and notbuying it yields a profit
of 0 0000 = 0} and [v{G+ H } 000 G 001 Y] = 0.
216Decision-TheoreticBidding with Learned Density Models
* val = 30000 0= 30.
Similarly, when the camera ispredicted to cost $40, val = 6000010 = 50;
andwhen the camera is predicted to cost $95, val= 10 000 0 = 10. Thus,we
expect that 50045 of the camera price samples will suggest a flash value
of $30, while 25045 willlead to a value of $50 and the other
25045 willlead to a value of $10. Thus,the agent willbid :5 00230 + :25
002 50 + :25 002 10 = $30
forthe flash.
Notice that in this analysisof what to bid for the flash, the actual closingprice
of
the flash is irrelevant. The proper bid depends only on the predicted price
of the camera.
To determine the proper bid for the camera, asimilar analysis would be done
using the predicted price distributionof the flash. 3. TAC
W einstantiated our approach as an entry in thesecond Trading Agent Competition
{TAC},
as described in this section.Building on the success of TAC-00held in July
2000 {Wellman,
W urman, O'Malley, Bangera, Lin, Reeves,& Walsh, 2001}, TAC-01 included19
agents from
9 countries {Wellman et al., 2003a}. A key feature ofTAC is that it required
autonomous bidding agents to buy and sellmultiple interacting goo dsinauctions
of different types.It
is designed as a benchmark problemin the complex and rapidly advancingdomain
of e-
marketplaces, motivating researchers to apply unique approaches to a common
task. By
providinga clear-cut ob jective function, TAC also allows the competitors
to focustheir
attention on the computational andgame-theoretic aspects of the problem
and leaveaside
the modeling and model validation issues that invariablyloom large in real
applications of automated agents to auctions {see Rothkopf& Harstad, 1994}.
Another feature of TAC
is that it provides an academicforum for open comparison of agent biddingstrategies
in
a complex scenario, as opposedto other complex scenarios, such as trading
inreal stock
markets, in whichpractitioners are {understandably} reluctant to sharetheir
technologies.
A TACgame instance pits eight autonomous biddingagentsagainst one another.
Each
TAC agent is a simulated travel agent witheight clients, each of whom wouldlike
to travel
from TACtown to Tampa and back againduring a 5-day period. Eachclient is
characterized
by a randomset of preferences for the possible arrival and departure dates,
hotel rooms, and entertainment tickets. To satisfy a client, an agent mustconstruct
a travel package for that client by purchasing airline ticketsto and from
TACtown and securing hotelreservations; it
is possible to obtain additional bonuses by providing entertainment tickets
as well. A TAC agent's score in a game instance is thedifference between
the sum of its clients'utilities for
the packagesthey receive and the agent's total expenditure.We provide selected
details about the game next; for full details onthe design and mechanisms
of the TAC server and
TACgame, see http://www.sics.se/tac. TAC agents buy flights, hotel rooms
and entertainment tickets through auctionsrun
from the TAC server at the University of Michigan. Eachgame instance lasts
12 minutes
andincludes a total of 28 auctions of 3different types.
Flights{8 auctions}: There is a separate auctionfor each type of airline
ticket:to Tampa
{inflights} on days 1{4 and from Tampa{outflights} on days 2{5.There is
an unlimited
supply of airlinetickets, and every 24{32 seconds their ask price changes
by from 000$10 217Stone, Schapire, Littman, Csirik, & McAllester
to $x. x increases linearly over thecourse of a game from 10 to y, wherey
2[10; 90]
ischosen uniformlyat random for each auction, and is unknown to the bidders.
In
all cases, tickets are priced between$150 and $800. When the server receives
abid at
or above the ask price, thetransaction is cleared immediately at the ask
priceand no
resale is allowed. Hotel Rooms {8}: Thereare two different types of hotel
rooms|the Tampa Towers {TT} and the Shoreline Shanties {SS}|each ofwhich
has 16 rooms availableon days 1{4.
The rooms are sold ina 16th-price ascending {English} auction,meaning that
for each
of the 8 typesof hotel rooms, the 16 highest bidders get therooms at the
16th highest
price.For example, if there are 15 bidsfor TTon day 2 at $300, 2 bids at
$150, and any number of lower bids, therooms are sold for $150 to the 15
high biddersplus
one of the $150 bidders {earliestreceived bid}. The ask price is the current
16th-
highest bid and transactions clearonly when the auction closes. Thus,agents
have
no knowledge of, forexample, the current highest bid. Newbids must be higher
than
the currentask price. No bid withdrawal or resale isallowed, though the
price of bids may be lowered provided the agentdoes not reduce the number
of rooms itwould win
were the auction to close.One randomly chosen hotelauction closes at minutes
4{11
of the 12-minute game. Ask prices are changed only on the minute.
Entertainment Tickets {12}: Alligator wrestling, amusement park, and museum
tickets are each sold for days 1{4 in continuous double auctions. Here, agents
canbuy and
sel l tickets, withtransactions clearing immediately when one agentplaces
a buy bid
at a price at least as high as another agent's sell price.Unlike the other
auction
typesin which the goods are sold from a centralized stock, each agent starts
with a {skewed} random endowment of entertainment tickets. The prices sent
toagents are
the bid-ask spre ads, i.e., the highest currentbid price and the lowest
current ask price {due to immediate clears, ask price is always greater than
bid price}. In this case,bid
withdrawal and ticket resale are both permitted. Eachagent gets blocks of
4 tickets of 2 types, 2 tickets of another 2types, and no tickets of the
other 8 types.
In addition to unpredictablemarketprices, other sources of variabilityfrom
game in-
stance to game instance arethe client profiles assigned to the agents andthe
random initial
allotment of entertainment tickets. Each TAC agent has eight clients with
randomlyas-
signed travel preferences.Clients have parameters for ideal arrival day,
IAD {1{4};ideal
departure day,IDD {2{5}; hotelpremium, HP{$50{$150}; and entertainment values,
EV
{$0{$200} foreach type of entertainment ticket.
The utility obtained by a client isdetermined by the travel package that
it is given in
combinationwith its preferences. To obtain a non-zeroutility, the client
must be assigned a feasible travel package consisting of an inflight on some
arrival day AD, an outflight on a departure day DD, and hotel rooms of the
same type {TT or SS} for the days in between
{days dsuch that AD 024d < DD}. At most one entertainment ticket of each
type can be
assigned, andno more than one on each day.Given a feasible package, the
client's utility
isdefined as 1000 000 travelPenalty+ hotelBonus + funBonus
where
218Decision-TheoreticBidding with Learned Density Models
* travelPenalty = 100{jAD000 IAD j +jDD 000 IDD j}
* hotelBonus = HPif the client is in the TT, 0 otherwise.
* funBonus = sum ofEVs for assigned entertainment tickets. A TAC agent's
scor e is the sum of its clients' utilitiesin the optimal allocation of
itsgoods {computed by the TACserver} minus its expenditures.The client preferences,
allocations,and resulting utilities from a sample game are shown in Tables
3 and 4.ClientIAD IDD HP AWAP MU1Day2 Day 5 73 175 34 242Day 1 Day 3 125
113 124573Day 4 Day 5 73157 12 1774Day 1 Day 2102 50 67 495Day 1 Day3 75
12 135 1106Day2 Day 4 86 197 8597Day 1 Day 5 9056 197 1628Day 1 Day3 50 79
92 136Table 3: ATT ac-2001's client preferences from anactual game. AW, AP,
and MU are EVs
foralligator wrestling, amusement park, and museumrespectively.ClientADDD
Hotel Ent'mentUtility1Day2 Day 5 SS AW411752Day 1 Day 2TT AW111383Day3 Day
5 SS MU3, AW412344Day 1Day 2 TT None11025Day 1 Day 2 TTAP111106Day 2Day 3
TT AW211837Day 1 Day 5SS AF2, AW3, MU414158Day 1 Day 2 TT MU11086Table 4:
ATT ac-2001's client allocations and utilities from thesame actual game as
that in
Table 3. Client 1's \4" under \Ent'ment" indicates on day 4.
Therules of TAC-01 are largely identical tothose of TAC-00, with three important
exceptions:
1. In TAC-00, flight prices did not tend toincrease;
2. In TAC-00,hotel auctions usually all closed at the end ofthe game;
3. In TAC-00,entertainment tickets were distributeduniformly to all agents
Whilerelatively minor on the surface, these changessignificantly enriched
the strategic complexity of the game. Stoneand Greenwald {2003} detail agent
strategiesfrom TAC-00.
219Stone, Schapire,Littman, Csirik, & McAllester
TAC-01 was organized as a series of fourcompetition phases, culminating
with the semifinals and finals on October 14, 2001at the EC-01 conference
in Tampa,Florida. First,
the qualifying round,consisting of about 270 games per agent,served to select
the 16
agentsthat would participate in the semifinals.Second, the seeding round,
consisting of about 315 games per agent, was usedto divide these agents into
two groups ofeight. After
the semifinals, on themorning of the 14th consisting of 11 games in each
group, four teams
from each group were selected to compete in the finals duringthat same afternoon.
The
finalsare summarized in Section 6.
TAC is not designed to be fully realistic in the sense that an agent from
TACis not
immediately deployable in thereal world. For one thing, it isunrealistic
to assume that an
agent wouldhave complete, reliableaccess to all clients'utility functions
{or even that the client would!}; typically, some sort of preference elicitation
procedurewould be required {e.g.
Boutilier,2002}. For another, theauction mechanisms are somewhat contrived
for the
purposes of creating an interesting, yetrelatively simple game. However,each
mechanism
is representativeof a class of auctions that is used in the realworld. And
it is not difficult to imagine a future in which agents doneed to bid in
decentralized, related, yet varying
auctions for similarly complex packages of goods.
4.Hotel Price Prediction
As discussedearlier, a central part of our strategy dependson the ability
to predict prices,
particularly hotel prices, at variouspoints in the game. To do this asaccurately
as possible,
we used machine-learning techniques that would examine thehotel prices actually
paid
in previous gamesto predict prices in future games. Thissection discusses
this part of
our strategyin detail, including a new boosting-basedalgorithm for conditional
density
estimation.
There is bound to beconsiderable uncertainty regarding hotel prices since
these depend
on many unknown factors, such as the time at which the hotel roomwill close,
who the
other agents are,what kind of clients have been assigned toeach agent, etc.
Thus, exactly predicting the price of a hotel room ishopeless. Instead, we
regard the closingprice as a
random variablethat we need to estimate, conditionalon ourcurrent state
of knowledge
{i.e., number of minutes remaining in the game, ask price of each hotel,
flightprices, etc.}.
We might then attemptto predict this variable's conditional expected value.
However,
ourstrategy requires that we not only predict expected value, but that we
also be ableto
estimate the entireconditional distribution so that we cansample hotel prices.
To set this up as a learning problem, wegathered a set of training examples
from previously played games. We defined a set of features for describing
each example that
together are meant tocomprise a snap-shot of all the relevant information
available at the time each prediction is made. Allof the features we used
are real valued; a couple of the
featurescan have a special value? indicating \value unknown."We used the
following
basicfeatures:
* The number ofminutes remaining in the game.
* The price of each hotel room,i.e., the current ask price for roomsthat
have not closed
or the actualselling price for rooms that have closed. 220Decision-TheoreticBidding
with Learned Density Models
* The closing time of each hotel room. Note that this feature is defined
evenfor rooms
that have not yet closed,as explained below.
* The prices of each of the flights. To this basic list, we added a number
of redundant variations, which wethought might help
the learning algorithm:
* The closing price of hotel rooms that have closed {or ? if the room has
not yet closed}.
* The current ask price of hotel rooms thathave not closed {or ? if the room
has already
closed}.
* The closing time of each hotel room minus the closing time of the room
whose price we are trying to predict.
* The number of minutes fromthe current time until each hotel roomcloses.
During the seeding rounds, it wasimpossibleto know during play who our opponents
were, although this information was available at the end of each game,and
therefore during
training. Duringthe semifinals and finals, we did know theidentities of
all our competitors. Therefore, in preparation for the semifinals andfinals,
we added the following features:
* The number of playersplaying {ordinarily eight, but sometimes fewer,for
instance if
one or more playerscrashed}.
* A bit for eachplayer indicating whether or not that playerparticipated
in this game.
Wetrained specialized predictors for predicting theprice of each type of
hotel room. In other words, one predictor was specialized for predicting
only the price of TT on day
1, anotherfor predicting SS on day2, etc. This would seem to require eightseparate
predictors. However,the tournament game is naturally symmetric aboutits
middle in the
sense that we cancreate an equivalent game by exchangingthe hotel rooms
on days 1 and 2 with those on days 4 and 3 {respectively}, and by exchanging
the inboundflights on
days1, 2, 3 and 4 withthe outbound flights on days 5, 4, 3 and2 {respectively}.
Thus,
withappropriate transformations, the outer days {1 and4} can be treated
equivalently ,and
likewise for the inner days {2 and 3}, reducing the number of specializedpredictors
by half.
We also createdspecialized predictors for predicting in the firstminute
after flight prices
hadbeen quoted but prior to receiving any hotelprice information. Thus,
a total of eight specialized predictors were built {for each combination
of TT versus SS, inner versusouter
day, and first minute versus not first minute}.
We trained our predictors to predict not theactual closing price of each
room per se, but rather how much the price wouldincrease, i.e., the difference
betweenthe closing price
and the current price.We thought that this might be aneasier quantity to
predict, and,
because our predictor never outputs a negative numberwhen trained on nonnegative
data, this approach also ensures that we neverpredict a closing price below
the current bid. From each of the previously played games, wewere able to
extract manyexamples.
Specifically, for eachminute of the game and for each room thathad not yet
closed, we
221Stone, Schapire,Littman, Csirik, & McAllester
extracted the values of all of the features described above at that moment
in the game, plus the actual closing price of the room {which we are trying
to predict}.
Note thatduringtraining, there is no problem extracting theclosing times
of all of the
rooms.During the actual play of a game, we donot know the closing times
of rooms that have not yet closed. However,we do know the exact probability
distributionfor closing
timesof all of the rooms that have not yet closed. Therefore,to sample a
vector of hotel
prices, wecan first sample according to this distribution over closing times,
and then use
ourpredictorto sample hotel prices using these sampled closingtimes.
4.1 The Learning Algorithm Having described how we set up thelearning problem,
we are now ready to describe the
learning algorithm that we used.Briefly, we solved this learning problemby
first reducing to
a multiclass, multi-label classification problem {or alternativelya multiple
logistic regression
problem},and then applying boosting techniques developed by Schapire and
Singer {1999, 2000} combined with a modification of boosting algorithms for
logistic regression proposed by Collins, Schapire and Singer {2002}.The result
is a new machine-learning algorithmfor
solving conditional density estimationproblems, described in detail in the
remainder ofthis
section. Table 5 showspseudo-code for the entire algorithm. Abstractly,
we are given pairs {x
1
; y
1
}; : : : ; {x
m
; y
m
} where each x i
belongs to a spaceX
and each y
i is in R . In our case, thex
i
's are the auction-specific feature vectors described
above; for some n, X022 {R [ f?g} n
. Each target quantity y
i
is the difference between closing
price and current price.Given a new x, our goal is toestimate the conditional
distribution
of y given x.
We proceed with the working assumption thatall training and test examples
{x;y} are
i.i.d. {i.e, drawn independently from identical distributions}.Although
this assumption is
false in ourcase {since the agents, including ours, are changing over time},
it seems like a reasonable approximation that greatly reduces thedifficulty
of the learning task.
Our first step is to reduce the estimationproblem to a classification problem
by breaking the range of the y
i 's into bins:
[b 0
; b
1 }; [b
1 ; b
2
}; : : : ; [b
k ; b
k+1
]
for some breakpoints b 0
< b
1 < 001 001 001 < b k
024 b
k+1
where for our problem, we chosek = 50.
7
Theendpoints b
0
andb
k+1
are chosen to be the smallest and largest y
i
values observed during training. We choose theremaining breakpoints b
1 ; : : : ; b
k so that roughly an equal
number of training labels y i
fall into each bin.{More technically, breakpoints are chosen so
that the entropy of thedistribution of bin frequencies is maximized.} For
each of the breakpointsb
j
{j =1; : : : ; k}, our learningalgorithm attempts to estimate
theprobability that a new y {givenx} willbe at least b j
. Given such estimatesp
j
for each b
j
, we can thenestimate the probability that y isin thebin [b
j
;b
j+1
} byp
j+1
000p
j
{and
we can then use a constant density withineach bin}. We thus have reducedthe
problem
to one of estimating multipleconditional Bernoulli variables corresponding
tothe event7.We did not experiment with varyingk, but expect that the algorithm
is notsensitive to it for sufficiently large values of k.
222Decision-TheoreticBidding with Learned Density ModelsInput:{x
1
; y
1
}; : : : ; {x
m
; y
m
}where x
i
2 X, y
i
2 R positive integers k andT
Compute breakpoints:b
0
< b
1
< 001 001 001< b
k+1
where
* b
0
=min
i
y
i
* b
k+1
= max
i
y
i
* b
1 ; : : : ; b
k chosen to minimize
P k
j=0
q j
ln q
j where q
0
;: : : ; q
k
arefraction of y
i
's in [b
0
; b 1
}; [b
1
; b
2
}; : : : ; [b
k ; b
k+1
]{using dynamic programing}
Boosting:
* for t= 1; : : : T :
000compute weights W
t {i; j } =
11 + e s
j
{y i
}f
t {x
i
;j}
where s
j {y} is as in Eq. {10} 000 use W
t to obtain base function h
t
:X 002 f1;: : : ; kg ! R minimizing m
X
i=1 k
X
j=1 W
t
{i;j }e
000s j
{y
i }h
t
{x
i
;j} over all decision rules h t
considered. The decision rules can take any form. Inour work, we use \decision
stumps," or simplethresholds on one
of the features. Output sampling rule:
* let f =
T X
t=1
h t
* let f 0
={f+ f}=2wheref {x;j } = maxff {x; j
0
} :j 024 j
0
024 kg
f{x; j } = minff {x; j
0
} : 1 024 j
0 024 j g
* tosample, given x 2 X
000 let p
j
=
11+ e
000f
0 {x;j}
000let p
0
=1; p
k+1
=0
000 choose j2 f0; : : : ; kgrandomly with probability p
j
000 p
j+1
000 choose yuniformly at random from [b j
; b
j+1
]
000 outputyTable5: The boosting-based algorithm forconditional density
estimation.
y025 b
j
, and for this,we use a logistic regression algorithm based on boosting
techniques as
described byCollinset al. {2002}.
Our learning algorithmconstructs a real-valued function f: X 002 f1; :
: : ; kg ! R with
the interpretation that
11 + exp{000f{x; j })
{9} is our estimate of the probability thaty 025b
j
,given x. The negative log likelihood of the
conditional Bernoulli variable corresponding to y
i
being above or belowb
j
is then
ln
020
1 + e 000s
j
{y
i
}f {x
i
;j} 021
223Stone, Schapire,Littman, Csirik, & McAllester
where s
j
{y} =
{
+1 ify 025 b
j
0001 if y < b
j
.
{10}
We attempt tominimize this quantity for all training examples{x
i
; y i
} and all break- points b
j
. Specifically, we try to find afunction f minimizing
m X
i=1
k X
j=1
ln 020
1 + e
000s
j
{y i
}f {x i
;j}
021 :
We use a boosting-like algorithm described by Collins et al. {2002}for minimizing
ob jective
functionsof exactly this form. Specifically, we build the function f inrounds.
On each
roundt, we add a new base functionh
t
: X 002f1;: : : ; kg !R . Let
f
t =
t0001
X
t
0
=1
h
t
0 be the accumulating sum.Following Collins, Schapire and Singer, toconstruct
each h
t
, we
first let
W t
{i; j }=
11+ e
s
j
{y
i
}f t
{x
i ;j}
be a set of weights on example-breakpoint pairs. We then choose h
t to minimize
m
X
i=1
k
X
j=1
W t
{i; j }e
000s
j {y
i
}h
t
{x
i
;j}
{11} over some space of \simple" basefunctions h
t
.For this work, we considered all\decision
stumps" h of the form h{x; j } = 8
>
<
> :
A
j
if 036{x}025022
B
j
if036{x}< 022
C
j
if 036{x}=?
where036{001} is one of the featuresdescribed above, and 022, A
j
, B j
and C
j are all real numbers.
In other words, such an h simplycompares one feature 036 to a threshold022
andreturns a
vectorof numbers h{x; 001} that depends only on whether036{x} is unknown
{?}, orabove
or below022. Schapire and Singer {2000} show how to efficiently search
for the bestsuch
h over all possible choices of 036, 022, A j
, B
j and C
j
.{We also employed their technique for \smoothing"A
j , B
j
andC
j
.}
Whencomputed by this sort of iterative procedure,Collins et al. {2002} prove
the asymptotic convergence of f t
to the minimum of the objective function in Equation {11} over all linear
combinations of the basefunctions. For this problem, we fixedthe number
of rounds toT = 300. Let f =f
T +1
bethe final predictor.
As noted above,given a new feature vector x, wecompute p
j
as in Equation{9} to be
our estimate for the probability that y 025 b
j , and we let p
0 = 1 and p
k+1 = 0. For this to
makesense, we need p
1
025 p
2 025 001 001 001 025 p k
, or equivalently ,f {x; 1} 025f {x; 2} 025 001001 001 025 f {x;k},
a
condition that may not holdfor the learned function f . To force this condition,
we replace f by a reasonable {albeitheuristic} approximation f
0 that is nonincreasing in j ,namely,
224Decision-TheoreticBidding with Learned Density Models
f
0
= {f+ f}=2 wheref {respectively, f} is the pointwise minimum {respectively,
maximum} of all nonincreasing functions gthateverywhere upper bound f{respectively,
lower bound f }.
With this modifiedfunction f
0
, we cancompute modified probabilities p j
. To sample a single point according to the estimateddistributionon R associated
withf
0
, we choose bin [b
j
;b
j+1
} with probabilityp
j
000 p j+1
, and then select a point from this bin uniformly at
random. Expected value according to this distributionis easily computed
as
k
X
j=0 {p
j
000p
j+1
} 022
b
j+1 + b
j2
: Although we present results using thisalgorithm in the trading agent context,
we did not test its performance on more generallearning problems, nor did
we compare to other methods for conditional density estimation, such as those
studied by Stone {1994}.This
clearly should be an area forfuture research.
5. ATT ac-2001 Having described hotel price predictionin detail, we now
present the remainingdetails of
ATT ac-2001'salgorithm. We begin with a briefdescription of the goods allocator,
which is used as a subroutine throughout the algorithm.We then present the
algorithm in a top-down fashion.
5.1Starting Point
A core subproblemforTAC agents is the allocation problem:finding the most
profitable
allocationof goods to clients, G
003
, given a set of owned goods and prices for all other goods. The allocation
problem corresponds tofindingopt{H; i; Y }in Equation 3. We denote the value
of G
003 {i.e., the score one would attainwith G
003
} asv{G
003
}.The general allocation
problemis NP-complete, as it is equivalentto the set-packing problem {Garey
& Johnson, 1979}. However itcan be solved tractably in TAC via integer linearprogramming
{Stone
et al., 2001}. The solution to the integer linear programis a value-maximizing
allocation of owned resources to clients along with a list ofresources that
need to be purchased.Using the linear
programming package \LPsolve", ATT ac-2001is usually able to find the globally
optimal solution in under 0.01 seconds on a 650 MHz Pentium II. However,
sinceinteger linear
programming is anNP-complete problem, some inputs can lead to a greatdeal
of search
over the integrality constraints, andtherefore significantly longersolution
times. When only
v{G
003
} is needed{as opposed to G
003 itself }, the upper bound producedby LPsolve prior to
the search over the integrality constraints, known as theLP relaxation,
can be used as an estimate. The LP relaxation can alwaysbe generated very
quickly. Note that this is not by any means the only possible formulation
of the allocation problem. Greenwald and Boyan {2001} studied a fast, heuristic
search variant and found
that it performedextremely well on a collection of large, randomallocation
problems. Stone
etal. {2001} used a randomized greedy strategy as afallback for the cases
in which the linear program took too long to solve. 225Stone, Schapire,Littman,
Csirik, & McAllester
5.2Overview
Table 6 shows ahigh-level overview of ATT ac-2001. The italicized portions
are described in the remainder of this section.When the first flight quotesare
posted:
* ComputeG
003
with current holdingsand expected prices
* Buy the flights in G
003
for which expectedcost of postponing commitment exceedsthe expected
benefitof postponing commitment
Starting 1minute before each hotel close:
* Compute G
003 with current holdings and expectedprices
* Buy the flights in G
003
for whichexpected cost of postponing commitmentexceeds expected
benefitof postponing commitment {30seconds}
* Bidhotel roomexpected marginal values givenholdings, new flights, and
expected hotel purchases {30 seconds} Last minute: Buy remaining flights
as needed by G
003 In parallel {continuously}:Buy/sell entertainment tickets based ontheir
expected valuesTable 6: ATT ac-2001's high-level algorithm. The italicized
portions are described in the
remainderof this section.
5.3 Costof Additional Rooms
Our hotel pricepredictor described in Section 4 assumes thatATT ac-2001's
bids do not affect the ultimate closing price {Assumption 3 fromSection 2}.
This assumption holds in a large economy. However in TAC, each hotel auction
involved 8 agents competing for 16 hotel
rooms.Therefore, the actions of each agent had anappreciable effect on the
clearing price: the more hotel rooms an agent attempted topurchase, the higher
the clearing price would be, all other things being equal.This effect needed
to be taken intoaccount when solving
the basic allocationproblem.
The simplifiedmodel used byATT ac-2001 assumed that thenth highest bid in
a hotel
auctionwas roughly proportional toc
000n
{over theappropriate range of n} for somec025 1.
Thus, if the predictor gave a price of p, ATT ac-2001only used this for
purchasing two hotel rooms {the \fair" share of a single agent of the 16
rooms}, and adjusted prices for other quantities of rooms by usingc.
For example, ATT ac-2001 would consider the cost ofobtaining 4 rooms to
be 4pc 2
. One or
two rooms each cost p, but 3 each cost pc, 4 each cost pc 2
, 5 each cost pc 3
, etc. So in total,2
rooms cost 2p, while 4 cost4pc
2
. Thereasoning behind this procedure is that ifATT ac-2001
buys two rooms - its fair share given that there are 16 rooms and 8 agents,
then the 16th
highest bid {ATT ac-2001's2 bids in addition to 14 others} sets the price.But
if ATT ac-2001
bids on an additional unit, the previous 15thhighest bid becomes the price-setting
bid:the
price for all rooms sold goes up from p to pc.
Theconstant c was calculated from the data of several hundred games during
the seeding round. In each hotel auction, the ratioof the 14th and 18th highest
bids {reflecting the 226Decision-TheoreticBidding with Learned Density Models
most relevant range of n} was taken as an estimate ofc
4
, and the {geometric}mean of the
resulting estimates was takento obtain c = 1:35. The LP allocator takes
these priceestimates into account when computingG
003
by as- signing higher costs to larger purchase volumes, thus tending to
spread out ATT ac-2001's
demand over thedifferent hotel auctions.
In ATT ac-2001, a few heuristics were appliedto the above procedure to improvestability
and to avoid pathological behavior: prices below $1 were replacedby $1 in
estimating c;
c= 1 was used for purchasing fewerthan two hotel rooms; hotel rooms weredivided
into
early closing and lateclosing {and cheap and expensive} ones, andthe c values
from the
corresponding subsets of auctions of the seedingrounds were used in each
case.
5.4 Hotel Expected Marginal Values
Using the hotel price prediction module as described in Section 4, coupled
with amodel of
its own effect on the economy, ATT ac-2001 is equipped todetermine its bids
for hotel rooms. Every minute, for each hotel auction thatis still open,
ATT ac-2001assumes that auction
will close next andcomputes the marginal value of that hotel room given
the predicted
closing prices ofthe other rooms. If the auction does notclose next, thenit
assumes
thatit will have a chance to revise its bids.Since these predicted prices
are represented as distributionsof possible future prices,ATT ac-2001 samples
from these distributionsand
averages the marginal values to obtain an expected marginal value. Using
the full minute
between closing times for computation {or30 seconds if there are stillflights
toconsider
too}, ATT ac-2001divides the available time among thedifferent open hotel
auctions and generates as many price samples as possiblefor each hotel room.
In the end,ATT ac-2001
bids the expectedmarginal values for each of the rooms. The algorithm is
described precisely and withexplanation in Table 7.
Oneadditional complication regarding hotel auctions isthat, contrary to
one of our
assumptions in Section 2.2 {Assumption 4},bids are not fully retractable:
theycan only
be changed to $1 abovethe current ask price. In the case thatthere are current
active
bidsfor goods that ATT ac-2001no longer wants that are less than $1 above
the current ask
price,it may be advantageous to refrain from changing the bid in the hopes
that the ask price will surpass them: that is, thecurrent bid may have a
higher expected value than
the best possible newbid. To address this issue, ATT ac-2001 samples from
the learned price distribution to find the average expected values of the
current and potential bids, and only
enters a new bid inthe case that the potential bid is better. 5.5 Expected
Cost/Benefit of Postponing Commitment
ATT ac-2001makes flight bidding decisions based on acost-benefit analysis:
in particular, ATT ac-2001 computes the incrementalcost of postponing bidding
for a particularflight versus
the valueof delaying commitment. In this section, we describe the determination
of the cost of postponing the bid. Due to difficulties that compounded withmore
sophisticated approaches, ATT ac-2001 used the following very simple modelfor
estimating the price of a flight ticketat a given
future time. Itis evident from the formulation that|giveny|the expected
price increase
from time 0 to time t was very nearlyof the form M t
2 for some M . It was alsoclear that
227Stone, Schapire,Littman, Csirik, & McAllester
* For each hotel {inorder of increasingexpected price}:
* Repeat until time bound 1. Generate a random hotel closing order{only
over open hotels}
2.Sample closing prices from predicted hotelprice distributions
3. Giventhese closing prices, compute V
0
; V
1
; : : : V
n
000V
i
021 v{G
003
} if owningi of the hotel
000 Estimatev{G
003
} with LPrelaxation
000 Assume that noadditional hotel rooms of this type can bebought
000 Forother hotels, assumeoutstanding bids abovesampled price are already
owned
{i.e., they cannot be withdrawn}. 000 Note that V
0
024 V
1 024 : : : 024 V n
: the valuesare monotonicallyincreasing since having more goods cannot be
worse in termsof possible allocations.
* The value of the ith copy ofthe room is the mean of V
i
000 V
i0001
over all the samples.
* Note further that V 1
000 V
0
025 V
2 000 V
1
: : : 025 V
n 000 V
n0001
: the value differencesare monotonically
decreasing since eachadditional room will be assigned to the client who
can derive the most
value from it.
* Bidfor one room at the value of theith copy of the room for alli such
that the value is at least as much as the current price.Due to the monotonicity
noted in the stepabove, no
matterwhat the closingprice, the desired number of rooms at thatprice will
be purchased.Table 7: The algorithm forgenerating bids for hotel rooms.
as long as the price did not hit the artificialboundaries at $150 and $800,
the constantM
must depend linearly ony 000 10. This linear dependence coefficient was
then estimated from several hundred flight price evolutionsduring the qualifying
round. Thus,for this constant
m,the expected price increase from timet to time T was m{T
2
000t
2
}{y00010}. When a
priceprediction was needed, this formula was first used for the first and
most recent actual price observations to obtain a guess fory, and then this
y was used in theformula again to
estimate the future price.No change was predicted if the formulayielded
a price decrease.
This approachsuffers from systemic biases of variouskinds {mainly due to
the fact that the variance of price changes getsrelatively smaller over longer
periods oftime}, but was
thought to be accurateenough for its use, which was to predictwhether or
not the ticket
can be expected to get significantlymore expensive over the next few minutes.
In practice,during TAC-01, ATT ac-2001started with the flight-lookahe adparameterset
to 3 {i.e., cost of postponing is the average ofthe predicted flight costs
1, 2, and 3 minutes
in the future}. However,thisparameter was changed to 2 by the endof the
finals in order
to causeATT ac-2001 to delay its flightcommitments further.
5.5.1 ExpectedBenefit of Postponing Commitment
Fundamentally , the benefit of postponingcommitments to flights is that
additional infor- mation about the eventual hotel pricesbecomes known. Thus,
the benefit of postponing
commitment is computed bysampling possible future price vectors anddetermining,
on
average, how much better the agent could do if it bought adifferent flight
instead of the one in question. If it is optimal to buythe flight in all
future scenarios, then thereis no
value in delaying thecommitment and the flight is purchasedimmediately.
However, if 228Decision-TheoreticBidding with Learned Density Models
there are many scenarios in which theflight is not the best one to get,
the purchase is more
likely to be delayed. The algorithm for determining the benefitof postponing
commitment is similar to that for determining the marginal valueof hotel
rooms. It is detailed, withexplanation, in
Table 8.
* Assume we'reconsidering buying n flights of a given type
* Repeat until time bound
1. Generate a random hotel closingorder {open hotels}
2. Sampleclosing prices from predicted price distributions{open hotels}
3. Giventhese closing prices compute V
0
; V
1 ; : : : V
n
000 V
i
=v{G
003
} if forced to buy i of the flight 000 Estimate v{G 003
} with LP relaxation 000 Assume more flights can bebought at the current
price
000 Note that V 0
025 V
1 025 : : : 025 V n
since it is never worse toretain extra flexibility.
* The value of waiting to buycopy i is the mean of V i
000 V
i0001
over all thesamples. If
all price samples lead tothe conclusion that the ith flight shouldbe bought,
then
V
i
= V
i0001
and there is no benefit to postponing commitment.Table 8: The algorithm
for generating value of postponing flight commitments. 5.6 Entertainment
Expected Values
The core of ATT ac-2001's entertainment-ticket-bidding strategy isagain
a calculation of the
expected marginalvalues of each ticket. For each ticket, ATT ac-2001computes
the expected
valueof having one more and one fewer of the ticket. These calculations
give bounds on the bid and ask prices it is willing to post. The actual bid
and ask prices are alinear function of
time remaining in thegame: ATT ac-2001 settles for a smaller and smaller
profit from ticket transactions as the game goes on.Details of the functions
of bid and ask priceas a function
of game time and ticket value remained unchanged from ATT ac-2000 {Stone
et al., 2001}. Details of the entertainment-ticket expected marginal utility
calculations are given in Table 9.
6. Results This section presents empirical resultsdemonstrating the effectiveness
of theATT ac-2001
strategy .First, we summarize its performance in the2001 and 2002 Trading
Agent Com- petitions {TACs}. Thesesummaries provide evidence of the strategy's
overall effectiveness,
but, due to the smallnumber of games in the competitions, areanecdotal rather
than sci-
entificallyconclusive. We then present controlledexperiments that provide
more conclusive evidence of the utility of our decisiontheoretic and learning
approaches embedded within ATT ac-2001.
229Stone, Schapire,Littman, Csirik, & McAllester
* Assume n of a giventicket type are currently owned
* Repeat until time bound 1. Generate a random hotel closing order{open
hotels}
2. Sampleclosing prices from predicted price distributions{open hotels}
3. Given these closingprices compute V
n0001
; V
n
; V
n+1
000V
i
= v{G
003
} if owni of the ticket
000Estimate v{G
003 } with LP relaxation
000Assume no other tickets can be bought or sold
000 Note thatV
n0001
024V
n
024 V n+1
since it is never worse to own extra tickets.
* The value of buying a ticket is the mean of V
n+1
000 V
n over all the samples; the value
of selling is the mean ofV
n
000 V n0001
.
* Since tickets are considered sequentially, if the determined buy or sell
bidleads to a
price that would clear according to the current quotes, assume the transaction
goes
throughbefore computing the values of buying and selling other ticket types.Table
9:The algorithm for generating valueof entertainment tickets.
6.1 TAC-01 Competition Of the 19 teams that entered thequalifying round,
ATT ac-2001 was oneof eight agents
to make it to thefinals on the afternoon of October 14th, 2001.The finals
consisted of
24 games amongthe same eight agents. Rightfrom the beginning, it became
clear that livingagents {Fritschi& Dorer, 2002} was the team to beat in thefinals.
They jumped to an
early lead in the first two games, and by eight games into the round, they
were morethan
135 points per game ahead oftheir closest competitor {SouthamptonTAC, He
& Jennings,
2002}. 16games into the round, they were more than 250points ahead of their
two closest competitors {ATT ac-2001and whitebear}.
From that point, ATT ac-2001, which was continually retraining itsprice
predictors based
on recent games, began making a comeback. By the time the last game was
to be played,
it was only an average of 22 pointsper game behind livingagents.It thus
needed to beat
livingagents by 514 pointsin thefinal game to overtake it, well withinthe
margins observed
in individual gameinstances. As the game completed, ATT ac-2001's score
of 3979 was one of the first scores to be posted bythe server. The other
agents' scores werereported one
by one, until only thelivingagents score was left. Afteragonizing seconds
{at least for us}, the TAC server posted a finalgame score of 4626, resultingin
a win forlivingagents.
After the competition, theTAC team at the University of Michigan conducted
a regres-
sion analysis of the effects of the client profiles on agentscores. Using
data from the seeding rounds, it was determined that agents did better when
their clients had:
1.fewer total preferred travel days; 2. higher total entertainment values;
and
3. a higher ratio ofouter days {1 and 4} to inner {2 and 3} in preferred
trip intervals.
230Decision-TheoreticBidding with Learned Density Models
Based on these significant measures, thegames in the finals could be handicapped
based on each agents' aggregate clientprofiles. Doing so indicated thatlivingagents'
clients were
much easier to satisfy than those ofATT ac-2001, giving ATT ac-2001the highest
handicapped
score.The final scores, as well as the handicapped scores, are shown in
Table10. Complete
results and affiliations areavailable from http://tac.eecs.umich.edu.AgentMeanHandicapped
scoreATT ac-200136224154livingagents36704094whitebear35133931Urlaub0134213909Retsina33523812CaiserSose30743766SouthamptonT
AC32533679T acsMan28593338T able10: Scores during the finals. Eachagent played
24 games. Southampton'sscore was
adversely affected by agame in which their agent crashed after buyingmany
flights but no hotels, leadingto a loss of over 3000 points.Discarding that
game
results in an average score of 3531.
6.2 TAC-02 Competition
A year afterthe TAC-01 competition, ATT ac-2001 was re-entered in the TAC-02
competition
using the modelstrained at the end of TAC-01.Specifically, the price predictors
wereleft
unchanged throughout {no learning}.The seeding round included 19 agents,
eachplaying
440 games over the course ofabout 2 weeks. ATT ac-2001was the top-scoring
agent in this round, as shown in Table11. Scores in the seeding round were
weighted so as to emphasize
later results over earlier results: scores on dayn of the seeding round
were given a weight
of n. Thispractice was designed to encourage experimentation early in the
round. The
official ranking in the competitions was based on the mean score after ignoring
each agent's worst 10 results so as to allow for occasional program crashes
and network problems. On the one hand, it is striking thatATT ac-2001 was
able to finish sostrongly in a field
of agents that hadpresumably improved over the course of theyear. On the
other hand,
most agents were being tuned, for better and for worse, whileATT ac-2001
was consistent throughout. In particular, we are toldthat SouthamptonTAC
experimented withits approach
during the later days ofthe round, perhaps causing it to fall out of the
lead {by weighted
score} in the end.During the 14-game semifinal heat,ATT ac-2001, which was
nowrestored
with its learning capability andretrained over the data from the 2002 seedinground,
finished
6th out of 8 therebyfailing to reach the finals.
Thereare a number of possible reasons for thissudden failure. One relatively
mun- dane explanation is that the agent had tochange computational environments
betweenthe
seeding rounds and the finals, andthere may have been a bug or computationalresource
constraint introduced.Another possibilityis that due to the smallnumber
of games in
231Stone, Schapire,Littman, Csirik, & McAllesterAgentMeanWeighted,droppedworst
10ATT ac-200130503131SouthamptonT AC31003129UMBCT AC29803118livingagents30183091cuhk29983055Thalis29523000whitebear29452966RoxyBot27382855T
able11: Top 8 scores duringthe seeding roundof TAC-02. Each agent played
440 games,
with its worst 10 games ignoredwhen computing the rankings.
thesemifinals, ATT ac-2001 simply got unlucky with respect to clients and
the interaction ofopponent strategies. However,it is also plausiblethat the
training data fromthe 2002
qualifying and seeding round datawas less representative of the 2002 finalsthan
the was the
training data from 2001;and/or that the competing agents improvedsignificantly
over the
seeding roundwhile ATT ac-2001 remained unchanged.The TAC team at the University
of Michigan has done a study of the pricepredictors of several 2002 TACagents
that suggests
that the bug hypothesis is most plausible: the ATT ac-2001 predictor from
2001 outperforms all other predictors from 2002 on the datafrom the 2002
semifinalsand finals; and oneother
agent that uses the 2002 data didproduce good predictions based on that
data{Wellman,
Reeves, Lochner, & Vorobeychik, 2003b}.
8 6.3 Controlled Experiments ATT ac-2001's success in the TAC-01 competition
demonstrates its effectiveness as a complete
system. However,since the competing agents differed along several dimensions,
the compe-
tition resultscannot isolate the successful approaches.In this section,
we report controlled experimentsdesigned to test the efficacy of ATT ac-2001's
machine-learning approach to price
prediction.
6.3.1Varying the Predictor
In the first set of experiments, weattempted to determine how the quality
ofATT ac-2001's
hotel pricepredictions affects its performance. To this end, we devised
seven price prediction schemes, varying considerably insophistication and
inspired by approaches takenby other
TAC competitors, andincorporated these schemes into our agent.We then played
these
seven agents against one another repeatedly, with regular retraining as
described below.
Following are the seven hotelprediction schemes that we used, in decreasingorder
of
sophistication:8. Indeed, in the TAC-03competition, ATT ac-2001 was enteredusing
the trained models from 2001, and it won the competition, suggesting further
thatthe failure in 2002 was due to a problem withthe learned
models that were used duringthe finals in 2002.
232Decision-TheoreticBidding with Learned Density Models
* ATT ac-2001
s : This is the \full-strength" agentbased on boosting that was used during
the tournament. {The sdenotes sampling.}
* Cond'lMean s
: This agent samples pricesfrom the empirical distribution of prices from
previously played games, conditionedonly onthe closing time of the hotel
room {a subset of of the features used byATT ac-2001
s
}.In other words, it collects all historical hotel prices and breaks them
down by thetime at which the hotel closed {as well as room type, as usual}.
Theprice predictor then simply samples from thecollection of
prices corresponding to the given closing time.
* SimpleMean s
: This agent samples prices from the empirical distribution of prices from
previously played games, withoutregard tothe closing time of the hotel room
{but still broken down by room type}. It uses a subset of the features used
by Cond'lMean
s
.
* ATT ac-2001
ev
, Cond'lMean
ev
, SimpleMean
ev : These agents predict in the same way as
their corresponding predictors above, but instead of returning a random
samplefrom
the estimated distributionof hotelprices, they deterministicallyreturn the
expected value of the distribution. {Theev denotes expected value,as introduced
in Sec-
tion2.1.}
* CurrentBid:This agent uses a very simple predictor thatalways predicts
that the hotel
room will close at its current price. In every case, whenever the price
predictorreturns a price that is below the currentprice,
we replace it with the currentprice {since prices cannot go down}. In our
experiments, we added as an eighth agent EarlyBidder, inspired by thelivingagents
agent. EarlyBidderused SimpleMean
ev
to predict closing prices, determined an optimalset of
purchases, and then placed bidsfor these goods at sufficiently high pricesto
ensure that
they would be purchased {$1001 for all hotel rooms, just aslivingagents
did in TAC-01}right
after the first flight quotes.It then never revised these bids. Each of
these agents require training,i.e., data from previously played games.However,
we are faced with a sortof \chicken and egg" problem: torun the agents,
we need to first train the agents using data fromgames in which they were
involved, butto get this
kind of data, we need tofirst run the agents. To get aroundthis problem,
we ran the
agents inphases. In Phase I, which consisted of 126games, we used training
data from
the seeding, semifinalsand finals rounds of TAC-01. In Phase II, lasting
157 games, we
retrained the agents once every sixhours using all of the data from the
seeding,semifinals
and finals rounds as wellas all of the games played in Phase II.Finally,
in Phase III, lasting 622 games, we continued to retrain theagents once every
six hours, but now usingonly
data from games played during PhasesI and II, and not including data from
theseeding,
semifinals and finals rounds. Table 12 shows how the agents performed in
each of these phases. Muchof what we
observe in this table isconsistent with our expectations. Themore sophisticated
boosting-
based agents{ATT ac-2001
s
and ATT ac-2001
ev } clearly dominated the agents based onsimpler
prediction schemes. Moreover,with continued training, these agents improved
markedly
relative toEarlyBidder. We also see the performance of the simplest agent,
CurrentBid, which
233Stone, Schapire,Littman, Csirik, & McAllesterAgentRelative ScorePhase
IPhase IIPhase IIIATT ac-2001 ev105:2006 49:5 {2}131:6 006 47:7 {2}166:2006
20:8 {1}ATT ac-2001
s27:8 006 42:1{3}86:1 00644:7 {3}122:3 006 19:4 {2}EarlyBidder140:3 006
38:6 {1}152:8 006 43:4 {1}117:0 006 18:0 {3}SimpleMean
ev00028:8 006 45:1 {5}00053:9 006 40:1 {5}00011:5 00621:7 {4}SimpleMean
s00072:0 006 47:5 {7}00071:6 00642:8 {6}00044:1 006 18:2{5}Cond'lMean
ev8:6 006 41:2 {4}3:5 006 37:5 {4}00060:1 00619:7 {6}Cond'lMean s000147:5
006 35:6 {8}00091:4006 41:9 {7}00091:1 00617:6 {7}CurrentBid00033:7006
52:4 {6}000157:1 00654:8 {8}000198:8 006 26:0{8}Table 12: The average
relative scores{006 standard deviation} for eight agents in the three
phases of our controlled experiment in which the hotel prediction algorithm
was
varied. The relative score ofan agent is its score minus the averagescore
of all
agents in that game.The agent's rank within each phase is shown in parentheses.
does not employany kind of training, significantly declinerelative to the
other data-driven agents.
On the other hand, there are some phenomena in this table that were verysurprising
to us. Most surprisingwasthe failure of sampling to help. Ourstrategy relies
heavily
not only onestimating hotel prices, but also taking samples fromthe distributionof
hotel
prices.Yet these results indicate that using expected hotel price, rather
than price samples, consistently performs better.We speculate that this may
be because an insufficient number of samples are being used {due tocomputational
limitations} so that the numbersderived
from these samples have too high a variance. Another possibilityis that
the method of using
samples toestimate scores consistently overestimates the expected score
because it assumes
the agent can behave with perfect knowledge for each individual sample|a
propertyof our
approximation scheme.Finally, as our algorithm uses sampling atseveral different
points
{computing hotel expected values,deciding when to buy flights, pricingentertainment
tick-
ets, etc.}, it is quite possible that sampling is beneficial for somedecisions
while detrimental
forothers. For example, when directly comparingversions of the algorithm
with sampling used at only subsets of the decision points, the data suggests
that sampling for the hotel decisions is most beneficial, while samplingfor
the flights and entertainment ticketsis neu-
tral at best, and possiblydetrimental. This result is not surprisinggiven
that the sampling
approach is motivated primarilyby the task of bidding forhotels.
We were also surprised thatCond'lMean
s
and Cond'lMean ev
eventually performed worse than the less sophisticated SimpleMean s
and SimpleMean
ev
. One possible explanation isthat
the simpler model happens to give predictions that are just as good as themore
complicated
model, perhaps becauseclosing time is not terribly informative,or perhaps
because the
adjustmentto price based on current price is moresignificant. Other things
being equal, the simpler model has the advantage that its statistics are
based on all of the price data,
regardless of closing time,whereas the conditional model makes eachpredictionbased
on
only an eighth ofthe data {since there are eight possible closing times,
each equally likely}.
In addition to agent performance, it is possible to measure the inaccuracy
of the eventual predictions, at least for the non-samplingagents. For these
agents, we measuredthe root
234Decision-TheoreticBidding with Learned Density Models
mean squared error of the predictions made inPhase III. These were: 56.0
forATT ac-2001
ev
, 66.6 for SimpleMean
ev , 69.8 for CurrentBid and 71.3 forCond'lMean
ev
. Thus,we see that
the lower the error ofthe predictions {according to this measure},thehigher
the score
{correlationR= 0000:88}. 6.3.2 ATT ac-2001 vs.EarlyBidder
In a sense, the twoagents that finished at the top of the standings in TAC-01
represented
oppositeends of a spectrum. The livingagentsagent uses a simple open-loop
strategy, com-
mitting to a set of desired goods right at the beginning of the game,while
ATT ac-2001 uses
a closed-loop, adaptive strategy. The open-loop strategy relies on the other
agents to stabilize the economy and create consistent final prices. In particular,
if all eight agents are open loop and placevery high
bids for the goods theywant, many of the prices will skyrocket, evaporating
any potential profit. Thus, a set of open-loopagents would tend to get negative
scores|theopen-loop
strategy is a parasite, in amanner of speaking. Table 13 shows theresults
of running 27
games with 7 copiesof the open-loop EarlyBidder and one ofATT ac-2001. Although
motivated
by livingagents, in actuality it is identical to ATT ac-2001except that
it uses SimpleMean ev
and
it places all ofits flight and hotel bids immediately after thefirst flight
quotes. It bids only for the hotels that appear inG
003
at that time.All hotel bids are for $1001. Inthe
experiments, one copy ofATT ac-2001
s
isincluded for comparison. The price predictors are all from Phase I in
the preceding experiments. EarlyBidder's high biddingstrategy backfires
and it ends up overpaying significantly for its goods.As our experiments
above indicate, ATT ac-2001 may improve even further if it is allowed to
train on thegames of the on-going
experiment as well.AgentScoreUtilityATT ac-20012431 006 4648909 006 264EarlyBidder{7}0004880006
3379870 00634Table 13: The results of runningATT ac-2001 against 7 copies
ofEarlyBidder over the course of 27 games. EarlyBidderachieves high utility,
but overpays significantly, resulting
in low scores.
The open-loop strategyhas the advantage of buying a minimal setof goods.
That is, it
never buysmore than it can use. On the other hand, itis susceptible to unexpected
prices
in that it can get stuck paying arbitrarilyhigh prices for the hotel rooms
it has decidedto
buy.
Notice in Table 13 that the average utilityof theEarlyBidder's clients is
significantly greater than that of ATT ac-2001's clients. Thus, the difference
inscore is accounted for
entirelyby the cost of the goods. EarlyBidderends up paying exorbitant prices,
whileATT ac-
2001 generally steers clearof the more expensive hotels. Itsclients' utility
suffers, but the cost-savings are well worth it. Compared to the open-loop
strategy, ATT ac-2001's strategyis relatively stable against
itself. Itsmain drawback is that as it changes itsdecision about what goods
it wants andas
235Stone, Schapire,Littman, Csirik, & McAllester
it may alsobuy goods to hedge against possible price changes, it can end
up getting stuck paying for some goods that are ultimatelyuseless to any
of its clients.
Table 14 shows the results of 7 copiesof ATT ac-2001 playing against eachother
and
one copy of theEarlyBidder. Again, training is from theseeding round and
finals of TAC- 01: the agents don't adapt during the experiment. Included
in this experiment arethree
variants of ATT ac-2001, each with a differentflight-lookahe ad parameter
{from thesection
on \cost of postponing flightcommitments"}. There were three copies each
of the agents
with flight-lookahe ad setto 2 and 3 {ATT ac-2001{2} and ATT ac-2001{3},
respectively}, and
oneATT ac-2001 agent withflight-lookahe ad set to 4 {ATT ac-2001{4}).AgentScoreUtilityEarlyBidder2869
0066910079 006 55ATT ac-2001{2}2614 006 389671 006 32ATT ac-2001{3}2570
006399641 006 32ATT ac-2001{4}2494 006 689613 006 55Table 14:The results
of runningthe EarlyBidderagainst 7 copies of ATT ac-2001over the
course of 197 games.The three different versions ofATT ac-2001 had slightly
different flight-lookahe ads.
F rom the results in Table 14 it is clear that ATT ac-2001does better when
committing
to its flight purchases later in the game {ATT ac-2001{2} as opposed toATT
ac-2001{4}). In
comparison with Table 13, the economyrepresented here does significantly
better overall.
That is, having many copies ofATT ac-2001 in the economy does notcause them
to suffer.
However, inthis economy, EarlyBidder is able toinvade. It gets a significantly
higherutility
for its clients and only pays slightly more than the ATT ac-2001agents {as
computed by
utilityminus score}.
9
Theresults in this section suggest that the variance of the closing prices
is the largest determining factor between the effectiveness of the two strategies
{assuming nobody else
is using the open-loop strategy}.We speculate that with large price variances,
the closed-
loop strategy {ATT ac-2001} should do better, butwith small price variances,
the open-loop strategy could do better.
7. Discussion
The open-loop andclosed-loop strategies of the previous sectiondiffer in
their handling of
pricefluctuation. A fundamental way of takingprice fluctuation into account
is to place \safe bids." A very high bid exposesan agent to the danger of
buying something at a ridiculously high price. Ifprices are in fact stable
then high bids are safe.But if prices
fluctuate, then highbids, such as the bids of the stable-pricestrategy,
are risky. In TAC,
hotel rooms are sold in a Vickrey-style nth price action. Thereis a separate
auction for
eachday of each hotel and these auctions are donesequentially. Although
the order of the auctions is randomized, and not known tothe agent, when
placing bids in one of these9. We suspectthat were the agents allowed to
retrain over the course of the experiments,ATT ac-2001
would end up improving, as we saw in Phase III of theprevious set of experiments.
Werethis to occur,
it is possible thatEarlyBidder would no longer be able to invade.
236Decision-TheoreticBidding with Learned Density Models
auctions the agent assumes that auction willclose next. We assumed in the
design ofour
agent that our bids in one auction donot affect prices in other auctions.This
assumption
is not strictly true, butin a large economy one expects that the bidsof
a single individual
havea limited effect on prices. Furthermore,the price most affected by a
bid is the price of the item being bid on; the effect on other auctions seems
less direct and perhaps more limited. Assuming bids in one auction donot
affect prices in another, the optimal bidding strategy is the standard strategy
for a Vickrey auction|the bid for an item should be equal to its utility
to the bidder.So, to place a Vickrey-optimal bid, one mustbe able to estimate
the utility of an item. The utility of owning an item issimply the expected
final score
assuming one owns the item minus the expected final score assuming one does
not ownthe
item. So, the problem of computing a Vickrey-optimal bid can be reduced
to theproblem
of predicting final scores for two alternative game situations. Weuse two
score prediction
procedures,which we call the stable-price score predictor{corresponding
to Equation 5}
andthe unstable-price score predictor {Equation 4}. The Stable-Price Score
Predictor.The stable-price score predictor first estimates the expected prices
in the rest of thegame using whatever information is available in the
givengame situation.It then computes the value achieved by optimal purchases
under the
estimatedprices. In an economy with stable prices,this estimate willbe quite
accurate| if we make the optimal purchases forthe expected price then, if
the prices are nearour
estimates, our performance will also be near the estimated value.
The Unstable-Price Score Predictor.Stable-price score prediction does not
take into account the ability of the agent to react to changes in price as
the gameprogresses. Sup-
pose a given roomis often cheap but is sometimes expensive.If the agent
can first determine the price of the room, and then plan forthat price, the
agent willdo better thanguessing
the price ahead of time and sticking to the purchases dictated by that price.The
unstable
price predictor uses a model of the distribution of possible prices.It repeatedly
samples
pricesfrom this distribution, computes the stable-price scoreprediction
under the sampled
price, and thentakes the average of these stable-price scoresover the various
price samples. This score prediction algorithm is similar to the algorithm
used in Ginsberg's {2001}quite
successful computer bridge program wherethe score is predicted by sampling
the possible hands of the opponent and, for each sample, computing the score
of optimal play inthe case
where all players havecomplete information {double dummy play}.While this
approach
has a simple intuitive motivation, it is clearly imperfect.The unstable-price
score predictor assumes both that future decisions are made in the presence
of complete price information, and that the agent is free to changeexisting
bids in auctions that have not yetclosed. Both
of these assumptions are onlyapproximately true at best. Waysof compensating
for the
imperfectionsin score prediction were described in Section 5. Buy Now or
Decide Later.The trading agent must decide what airlinetickets to buy
and when to buy them.In deciding whether to buy an airline ticket,the agent
can compare
the predicted score in the situation where it owns the airline ticket with
the predicted score
in thesituation where it does not own the airline ticket but may buy it
later. Airlinetickets
tend to increase in price, soif the agent knows that a certain ticketis
needed it should buy
it as soon aspossible. But whether or not a given ticket is desirable may
depend on the price of hotel rooms, which may becomeclearer as the game progresses.
If airline tickets
did not increase in price, as wasthe case in TAC-00, thenthey should bebought
at the
237Stone, Schapire,Littman, Csirik, & McAllester
lastpossiblemoment {Stone et al., 2001}. To determine whether an airline
ticket shouldbe
bought now or not, one cancompare the predicted score in the situation whereone
has just
bought the ticket atits current price with the predicted score in thesituation
where the
price of the ticketis somewhat higher but has not yet been bought. It is
interesting to note that if one uses the stable-price score predictorfor
both of these predictions, and the ticket is purchased in the optimal allocationunder
the current price estimate, then the predicted score for buying the ticket
now willalways be higher|increasing the price of theticket can
only reduce the score.However, the unstable-price score predictorcan yield
an advantage
fordelaying the purchase. This advantage comes from the fact that buying
the ticket may
be optimal under some pricesbut not optimal under others. If the ticket
has not yet been
bought, then thescore will be higher for those sampled priceswhere the ticket
shouldnot
be bought. This corresponds to the intuition thatin certain cases the purchase
should be delayed until more information is available.
Our guiding principle in the design of the agent was, to the greatest extent
possible, to
have the agentanalytically calculate optimal actions. Akey component of
these calculations is the score predictor, based either on asingle estimated
assignment of prices or on a model of
the probability distribution overassignments of prices. Both score predictors,though
clearly
imperfect, seem useful.Of these two predictors, only theunstable-price predictor
can be
usedto quantitatively estimate the valueof postponing a decision until more
information is
available. The accuracy ofprice estimation is clearly of central importance.Future
research
willundoubtedly focus on ways of improving both price modeling and score
prediction based on price modeling.
8.Related and Future Work Although there has been a good dealof research
on auction theory, especiallyfrom the
perspective of auction mechanisms {Klemperer, 1999}, studies of autonomousbiddingagents
and their interactions are relatively few and recent. TACis one example.
FM97.6 is
anotherauction test-bed, which is based on fishmarket auctions {Rodriguez-Aguilar,
Martin, Noriega, Garcia, & Sierra, 1998}.Automatic bidding agents have also
beencreated in this
domain {Gimenez-Funes, Godo, Rodriguez-Aguiolar, & Garcia-Calves, 1998}.
There have
beena number of studies of agents biddingfora single good in multiple auctions
{Ito, Fukuta, Shintani,& Sycara, 2000; Anthony, Hall, Dang, & Jennings, 2001;Preist,Bartolini,
& Phillips,2001}. A notable auction-based competition that was held prior
to TAC was the Santa Fe
Double Auction Tournament{Rust, Miller, & Palmer, 1992}. Thisauction involved
several
agentscompeting in a single continuous double auction similar to the TAC
entertainment ticket auctions. As analyzed by Tesauro and Das {2001}, this
tournament was won by a
parasite strategy that, like livingagents as described in Section 6.3,reliedon
other agents to
find a stableprice and then took advantageof it to gain an advantage. Inthat
case, the
advantagewas gained by waiting until the last minute to bid, a strategy
commonly known as sniping.
TAC-01was the second iteration of the Trading Agent Competition. Therules
of TAC-
01 are largely identical to those of TAC-00, withthree important exceptions:
1. In TAC-00, flight prices did not tend toincrease;
238Decision-TheoreticBidding with Learned Density Models
2. In TAC-00, hotel auctions usuallyall closed at the end of the game; 3.
In TAC-00, entertainment tickets were distributeduniformly to all agents
While minor on the surface, the differencessignificantly enriched the strategic
complexity of the game. In TAC-00,most of the designers discovered that a
dominant strategy was
to defer all serious biddingto the end of the game. A result, the focus
was on solving
the allocation problem,with most agents usinga greedy,heuristicapproach.
Since the
hotel auctions closed at the end of the game,timing issues were also important,
with significant advantages going toagents that were able to bidin response
tolast-second price
quotes {Stone & Greenwald, 2003}. Nonetheless, many techniques developed
in 2000 were
relevant to the 2001 competition: theagent strategies put forth in TAC-00were
important
precursors to the secondyear's field, for instance as pointed out in Section
5.1.
Predicting hotel clearing prices was perhaps the most interesting aspect
of TAC agent
strategies in TAC-01, especially in relation to TAC-00 where the last-minute
bidding created essentially a sealed-bid auction.As indicated by our experiments
describedin Section 6.3,
there are many possible approaches to this hotel price estimation problem,and
the approach
chosen can havea significant impact on the agent's performance. Among those
observed in TAC-01 are the following {Wellman, Greenwald, Stone, & Wurman,2002},
associated in
some cases with theprice-predictor variant in our experimentsthat was motivated
by it. 1. Just use the current price quotep
t
{CurrentBid}.
2. Adjust based on historic data.For example, if 001
t is the average historicaldifference between clearing price and price at
timet, then the predicted clearing price isp
t
+ 001 t
.
3. Predictby fitting a curve to the sequence of askprices seen in the current
game.
4. Predict based on closing price data forthat hotel in past games {SimpleMean
ev
, SimpleMean
s
}.
5. Same as above, but condition on hotel closing time, recognizingthat the
closing
sequence will influencethe relative prices.
6. Sameas above, butcondition on full ordering ofhotel closings, or which
hotels are open or closed at a particular point{Cond'lMean
ev
,Cond'lMean
s
}. 7. Learn a mapping from features of thecurrent game {includingcurrent
prices} toclosing
prices based on historic data {ATT ac-2001
s
,ATT ac-2001
ev
}.
8. Hand-construct rules based onobservations about associations between
abstract fea-
tures.
Havingdemonstrated ATT ac-2001's success atbidding in simultaneous auctions
for mul- tiple interacting goods in the TAC domain, we extended our approach
toapply it to the
U.S. FederalCommunications Commission {FCC} spectrum auctions domain {Weber,
1997}.
TheFCC holds spectrum auctions to sell radiobandwidth to telecommunications
compa- nies. Licenses entitle their owners touse a specified radio spectrum
band within aspecified
geographical area, ormarket. Typically several licenses areauctioned off
simultaneously
withbidders placing independent bids for eachlicense. The most recent auction
brought in 239Stone, Schapire,Littman, Csirik, & McAllester
over $16billion dollars. In a detailed simulation ofthis domain {Csirik,
Littman, Singh, & Stone, 2001}, we discovered a novel,successful bidding
strategy inthis domain that allows
the bidders to increase their profitssignificantly over a reasonable default
strategy {Reitsma,
Stone, Csirik, & Littman, 2002}. Our ongoing research agenda includes applying
our approach to other similar domains.
We particularlyexpect the boostingapproach to price prediction and thedecision-theoretic
reasoning over pricedistributionsto transfer to other domains.Other candidate
real-world
domainsinclude electricity auctions, supplychains, and perhaps even travel
booking on public e-commerce sites.
Acknowledgements
Thiswork was partiallysupported by the United States{Israel BinationalScience
Founda-
tion {BSF}, grant number 1999038. Thanks to the TAC team at the University
of Michigan for providing the infrastructure and support required to run
many of our experiments. Thanks to Ronggang Yu at the University of Texas
at Austin for running oneof the ex-
periments mentioned in thearticle. Most of this research was conductedwhile
all of the
authors were at AT&T Labs - Research.
References Anthony, P., Hall, W., Dang, V. D., &Jennings, N. R. {2001}.
Autonomousagents for
participating in multipleon-line auctions. In Pro c e e dingsof the IJCAI-2001
Workshop
on E-Business and the Intel ligent Web Seattle, WA.
Boutilier, C. {2002}. Apomdp formulation of preference elicitationproblems.
In Pro c e e dings of the Eighteenth National Conferenc e on Artificial Intel
ligence, pp. 239{246.
Collins, M., Schapire, R. E., & Singer, Y. {2002}.Logistic regression, AdaBoost
and Breg- man distances. Machine Le arning, 48 {1/2/3}.
Csirik,J. A., Littman, M. L., Singh, S., & Stone, P. {2001}. FAucS: An FCC
spectrum auction simulator for autonomous biddingagents. In Fiege, L., MÂȘuhl,
G., & Wilhelm,
U. {Eds.},Electr onic Commerc e: Pro c e e dings of the Sec ondInternational
Workshop,
pp. 139{151Heidelberg,Germany. SpringerVerlag.
F reund, Y., & Schapire, R. E.{1997}. A decision-theoretic generalization
ofon-line learning
and an application to boosting. Journal of Computer and System Sciences,
55 {1},
119{139. Fritschi, C., & Dorer, K. {2002}.Agent-oriented software engineering
forsuccessful TAC
participation.In First International Joint Conferenc e on Autonomous Agents
and Multi-Agent Systems Bologna. Garey, M. R., & Johnson, D. S. {1979}.Computers
and Intractability:A Guide to the
Theoryof NP-completeness. Freeman,San Francisco, CA.
240Decision-TheoreticBidding with Learned Density Models
Gimenez-Funes, E., Godo, L., Rodriguez-Aguiolar, J. A., & Garcia-Calves,
P. {1998}. De-
signing biddingstrategies for trading agents in electronic auctions.In Pro
c e e dings of the Third International Conferenc e on Multi-Agent Systems,
pp. 136{143.
Ginsberg, M. L. {2001}.GIB: Imperfect information in a computationallychallenging
game.
JAIR,14, 303{358.
Greenwald,A.,& Boyan, J. {2001}. Biddingalgorithms for simultaneous auctions.
In Pro c e e dings of Third ACM Conferenc e on E-Commerc e, pp. 115{124 Tampa,
FL. Harsanyi, J. {1967{1968}. Gameswith incomplete information played by
bayesian players.
Management Science, 14, 159{182,320{334,486{502. Hauskrecht, M. {1997}.
Incrementalmethods for computing bounds in partially observable
Markov decisionprocesses.In Pro c e e dings of the Fourte enth National
Conferenc eon
Artificial Intel ligence, pp. 734{739.
He, M., & Jennings,N. R. {2002}. SouthamptonTAC:Designing a successful trading
agent. In Fifteenth Europ e anConferenc e on Artificial Intelligence Lyon,
France. Ito, T., Fukuta, N., Shintani, T., & Sycara, K. {2000}. Biddingbot:
amultiagent support
systemfor cooperative bidding in multipleauctions. In Pro c e e dingsof
the Fourth
InternationalConferenc e on MultiAgent Systems, pp. 399{400.
Kearns, M., Mansour, Y., &Ng, A.Y. {1999}. A sparse sampling algorithmfor
near-
optimal planning in large Markov decision processes. In Pro c e edings of
the Sixteenth
InternationalJoint Conferenc e on Artificial Intelligence {IJCAI-99}, pp.
1324{1331. Klemperer, P. {1999}.Auction theory: A guide to the literature.Journal
of Economic
Surveys, 13 {3}, 227{86.
Littman, M. L., Ma jercik, S. M., & Pitassi, T. {2001}. Stochastic Booleansatisfiability.
Journalof Automated Re asoning,27 {3}, 251{296.
Papadimitriou, C. H., & Tsitsiklis, J. N. {1987}. Thecomplexity of Markov
decisionpro- cesses. Mathematics of Oper ationsRese ar ch, 12 {3},441{450.
Preist, C., Bartolini, C., & Phillips, I. {2001}. Algorithm design for agents
which partici-
pate in multiple simultaneousauctions. In Agent Mediate dElectr onic Commerc
e III {LNAI}, pp. 139{154. Springer-Verlag, Berlin.
Reitsma, P.S. A., Stone, P., Csirik, J. A., & Littman, M. L. {2002}. Self-enforcing
strategic demand reduction. In AgentMediate d Electr onic Commerc e IV: Designing
Mecha-
nisms and Systems, Vol. 2531 ofLe ctur e Notes in Artificial Intelligence,
pp. 289{306.
Springer Verlag.
Rodriguez-Aguilar, J. A., Martin, F. J., Noriega, P., Garcia, P.,& Sierra,
C. {1998}. Towards a test-bed for trading agents inelectronic auction markets.
AI Communications, 11 {1},
5{19. 241Stone, Schapire,Littman, Csirik, & McAllester
Rothkopf, M.H., & Harstad, R. M. {1994}. Modelingcompetitive bidding: A
critical essay.
Management Science,40 {3}, 364{384.
Rust, J., Miller,J., & Palmer, R. {1992}. Behaviorof trading automata in
a computerized double auction market. In Friedman, D., & Rust, J. {Eds.},
The Double Auction
Market: Institutions, Theories, and Evidence. Addison-Wesley , Redwood City,CA.
Schapire, R. E., & Singer, Y. {1999}.Improved boosting algorithms usingconfidence-rated
predictions. MachineLe arning, 37 {3},297{336.
Schapire, R. E., & Singer, Y.{2000}. BoosTexter: A boosting-basedsystem
for text cate-
gorization.Machine Le arning, 39{2/3}, 135{168.
Stone, C. J. {1994}.The use of polynomial splines and theirtensor products
in multivariate function estimation. The Annals ofStatistics, 22 {1}, 118{184.
Stone, P., & Greenwald, A.{2003}. The first international trading agent competition:
Autonomous biddingagents.Electr onic Commerc e Rese ar ch. To appear. Stone,
P., Littman, M. L., Singh, S.,&Kearns, M. {2001}. ATTac-2000:An adaptive
autonomous biddingagent.Journal of Artificial Intel ligence Rese ar ch,
15, 189{206. Tesauro, G., & Das, R. {2001}.High-performance bidding agents
for the continuous double
auction. In Third ACM Conferenc e on Electr onic Commerc e, pp. 206{209.
Weber, R. J. {1997}. Makingmore from less: Strategic demand reduction inthe
FCC
spectrum auctions.Journal of Economics and Management Strate gy, 6 {3},
529{548. Wellman, M. P., Greenwald, A.,Stone, P., & Wurman, P. R. {2002}.
The 2001 trading agent competition. In Pro c e e dingsof the Fourte enth
Innovative Applications of Artificial
Intelligence Conferenc e, pp. 935{941. Wellman, M. P., Greenwald, A.,Stone,
P., & Wurman, P. R. {2003a}. The 2001 trading agent competition. Electr onicMarkets,13
{1}, 4{12. Wellman, M. P., Reeves, D.M.,Lochner, K. M., & Vorobeychik,Y.
{2003b}. Price prediction
in a trading agent competition.Tech. rep., University of Michigan. Wellman,
M. P., Wurman,P. R., O'Malley, K., Bangera, R., Lin, S.-d., Reeves, D., &
Walsh,
W. E. {2001}. A trading agent competition. IEEE Internet Computing,5 {2},
43{51.
242