Decision-Theoretic Bidding Based on Learned Density Models in Simultaneous, Interacting Auctions

P. Stone, R. E. Schapire, M. L. Littman, J. A. Csirik and D. McAllester

Volume 19, 2003

Links to Full Text:

Abstract:
Auctions are becoming an increasingly popular method for transacting business, especially over the Internet. This article presents a general approach to building autonomous bidding agents to bid in multiple simultaneous auctions for interacting goods. A core component of our approach learns a model of the empirical price dynamics based on past data and uses the model to analytically calculate, to the greatest extent possible, optimal bids. We introduce a new and general boosting-based algorithm for conditional density estimation problems of this kind, i.e., supervised learning problems in which the goal is to estimate the entire conditional distribution of the real-valued label. This approach is fully implemented as ATTac-2001, a top-scoring agent in the second Trading Agent Competition (TAC-01). We present experiments demonstrating the effectiveness of our boosting-based price predictor relative to several reasonable alternatives.

Extracted Text

Journal of ArtificialIntelligence Research 19 {2003} 209-242Submitted 12/02;
published 9/03
Decision-Theoretic Bidding Based on Learned Density Models in Simultaneous,
InteractingAuctions
 Peter Stonepstone@cs.utexas.edu
 Dept. of ComputerSciences, The University of Texasat Austin
 1 University Station C0500, Austin, Texas 78712-1188 USA
Robert E. Schapire schapire@cs.princeton.edu Department of Computer Science,
Princeton University
 35 Olden Street, Princeton, NJ 08544 USA
 Michael L. Littmanmlittman@cs.rutgers.edu
 Dept.of Computer Science, Rutgers University Piscataway, NJ 08854-8019 USA
 Janos A. Csirik janos@pobox.com D. E. Shaw & Co.
 120 W 45thSt, New York, NY 10036 USA
 David McAllester mcallester@tti-chicago.edu Toyota TechnologicalInstituteat
Chicago
 1427 East 60th Street, Chicago IL, 60637 USA
 Abstract Auctions are becoming an increasinglypopular method for transacting
business, espe- cially over the Internet. Thisarticle presents a general
approach to buildingautonomous
 bidding agents to bid in multiple simultaneous auctions for interacting
goods. A core
 component of ourapproach learns a model of the empiricalpricedynamics based
on past
 data and uses the model to analytically calculate, to the greatest extent
possible, optimal
 bids. We introduce a new and general boosting-based algorithm for conditionaldensity
estimation problems of this kind,i.e., supervisedlearning problems in whichthe
goal is to
 estimate the entireconditional distribution of the real-valuedlabel. This
approach is fully implemented as ATT ac-2001, a top-scoring agent in the
second Trading Agent Competition
 {TAC-01}. We present experimentsdemonstrating the effectiveness of our boosting-based
 price predictor relative to several reasonable alternatives.
 1.Introduction
 Auctions are anincreasingly popular method for transacting business, especially
over the
 Internet.In an auction for a single good, it isstraightforward to create
automated bidding strategies|an agent could keep biddinguntil reaching a
target reserve price, or itcould
 monitor the auction and place awinning bid just before the closing time
{knownas sniping}.
 When bidding for multiple interacting goods in simultaneousauctions, on
the other
 hand, agents mustbe able to reason about uncertainty and make complex value
assess-
 ments. For example, an agent bidding on one's behalf in separate auctions
for a camera and flash may end up buying the flash andthen not being able
to find an affordablecamera.
 Alternatively, if bidding forthe same good in several auctions, it maypurchase
two flashes
 when only one was needed.
 c
 fl2003AI Access Foundation and Morgan KaufmannPublishers. All rights reserved.Stone,
Schapire, Littman, Csirik, & McAllester
 This article makesthree main contributions. The first contribution is a
general ap-
 proachto buildingautonomous biddingagentsto bid in multiple simultaneous
auctions for interacting goods. Westart with the observation that the key
challenge in auctions is the
 prediction of eventual prices of goods: withcomplete knowledge of eventual
prices, there are direct methods for determining the optimalbids to place.
Our guiding principle is to havethe agent model itsuncertainty in eventual
prices and, tothe greatest extent possible,
 analyticallycalculate optimal bids.
 To attack theprice prediction problem, we propose a machine-learning approach:
gather
 examplesof previous auctions and the prices paid in them,then use machine-learning
meth-
 odsto predict these prices based on available features in the auction. Moreover,forour
 strategy, weneeded to be able to model the uncertainty associatedwith predicted
prices;
 in other words, we needed to be able to sample from a predicteddistribution
of prices
 given the currentstate of the game. This can be viewed asa conditional density
estimation problem, that is, a supervised learningproblem in which the goal
is to estimate the entire
 distribution of a real-valued label given a description of currentconditions,
typically in the
 formof a feature vector. The second main contribution of this article is
a new algorithm for solving such general problems based onboosting {Freund
& Schapire, 1997; Schapire&
 Singer, 1999}.
 Thethird contribution of this article is a complete descriptionof a prototype
implemen-
 tation of ourapproach in the form of ATT ac-2001, a top-scoring agent
 1 in the second Trading
 Agent Competition{TAC-01}that was held in Tampa Bay, FL on October 14, 2001
{Well- man, Greenwald, Stone, & Wurman, 2003a}. The TACdomain was the main
motivation for the innovations reported here.ATT ac-2001 builds on top ofATT
ac-2000 {Stone, Littman, Singh, & Kearns, 2001}, the top-scoring agentat
TAC-00, but introduces a fundamentally
 new approach to creating autonomousbiddingagents.
 We present detailsof ATT ac-2001 as an instantiation ofits underlying principles
that we believe have applications in a wide variety of bidding situations.
ATT ac-2001 uses a predic-
 tive,data-driven approach to biddingbased on expected marginal values of
all available
 goods. In this article, we present empirical results demonstrating the robustnessand
effec-
 tiveness of ATT ac-2001's adaptive strategy. We also report on ATT ac-2001's
performance at
 TAC-01 and TAC-02 and reflect on some of the key issues raised during the
competitions.
 The remainder of the article isorganized as follows. In Section 2, wepresent
our
 general approach to biddingfor multiple interacting goods in simultaneous
auctions. In
 Section3, we summarize TAC, the substrate domain for our work. Section 4
describes our boosting-basedprice predictor. InSection 5, we give the details
ofATT ac-2001. In Section 6, we present empirical results including asummary
of ATT ac-2001's performancein TAC-
 01, controlledexperiments isolating the successful aspects ofATT ac-2001,
and controlled experiments illustrating someof the lessons learned during
the competition.A discussion
 andsummary of related workis provided in Sections 7 and 8. 2. General Approach
In a wide variety of decision-theoreticsettings, it is useful to be able
to evaluate hypothetical
 situations.In computer chess, for example, a static board evaluator is used
to heuristically1. Top-scoringby one metric, and second place by another.
210Decision-TheoreticBidding with Learned Density Models
 measure which player is ahead and by how much in a given board situation.The
scenario
 is similar in auction domains,and our bidding agent ATT ac-2001uses a situation
evaluator,
 analogous to the static board evaluator, whichestimates the agent's expectedprofit
in a
 hypothetical futuresituation. This \profit predictor" has a widevariety
of uses in the agent. For example, to determine the value of an item, the
agent compares thepredicted profit
 assuming the item is alreadyowned to the predicted profit assuming that
theitem is not
 available.
 Given prices for goods, one can oftencompute a set of purchases and an allocationthat
 maximizes profit.
 2 Similarly, if closing prices are known, they can be treated as fixed,
and optimal bids can be computed {bid high foranything you want to buy}.
So,one natural
 profit predictor is simply tocalculate the profit of optimal purchases underfixed
predicted
 prices. {Thepredicted prices can, of course, be differentin different situations,
e.g., previous closing prices can be relevantto predicting future closing
prices.} A more sophisticated approach to profitprediction is to construct
a model of the prob- ability distribution over possiblefuture prices and
to place bids that maximizeexpe cte d
 profit. Anapproximate solution to this difficult optimizationproblem can
be created by
 stochastically sampling possible prices and computing a profit prediction
as above for each sampled price. A sampling-based schemefor profit prediction
is important for modeling
 uncertainty and the value of gaining information, i.e., reducingthe price
uncertainty.
 Section2.1 formalizes this latter approach within asimplifiedsequential
auction model. This abstraction illustrates some of thedecision-making issues
in our full sampling-based approach presented in Section 2.2.The full setting
that our approach addresses isconsid-
 erablymore complex than the abstractmodel, but our simplifying assumptions
allow usto
 focus on a core challenge of thefull scenario. Our guiding principle is
to make decision-
 theoretically optimal decisions given profit predictions for hypothetical
futuresituations.
 3
 2.1 SimplifiedAbstraction
 In the simple model, thereare n items to be auctioned off insequence {first
item 0, then
 item1, etc.}. The bidder must place a bidr
 i
 for each itemi, and after each bid, a closing pricey
 i
 is chosen for the corresponding item from a distribution specific to the
item. If the
 bid matches or exceeds the closing price,r
 i
 025 y i
 , the bidder holds itemi, h
 i
 = 1.Otherwise,
 the bidder does not hold theitem, h
 i
 = 0.The bidder's utility v{H} is a function of its
 final vector of holdings H = {h 0
 ; : : : ; h n0001
 } and its costis a function of the holdings and
 the vector of closing prices, H001Y . We willformalize the problem of optimal
bid selection
 and develop a seriesof approximations to make the problem solvable.2. The
problem iscomputationally difficult in general, buthas beensolved effectively
in the non-trivial TAC setting {Greenwald & Boyan, 2001; Stone et al., 2001}.
 3.An alternative approach would be toabstractly calculate the Bayes-Nash
equilibrium{Harsanyi, 1968}
 for the game and playthe optimal strategy. We dismissed thisapproach because
of its intractability in realistically complex situations, including TAC.
Furthermore, even if we wereable to approximate the
 equilibrium strategy, it is reasonable to assume that our opponents would
not play optimal strategies. Thus, we could gain additional advantageby tuning
our approach to our opponents'actual behavior as
 observed in theearlier rounds, which is essentially the strategywe adopted.
 211Stone, Schapire, Littman, Csirik, & McAllester
 2.1.1 ExactValue
 What is the valueof the auction, that is, the bidder'sexpected profit {utility
minus cost} for biddingoptimally for the rest of theauction? If a bidder
knows this value, it can make
 its next bid to be one that maximizes its expected profit.The value is a
function of the bidder's current holdings Hand the current item to bid on,i.
It can be expressed as value{i; H } = max r
 i
 E
 y
 i
 max
r
 i+1
 E
 y
 i+1
 : : :max
 r
 n0001
 E
 y
 n0001
 {v{G+ H } 000 G 001Y }; {1}
 wherethe components of G are the new holdingsas a result of additional winnings
g j
 021
 r
 j
 025 y
 j . Note that H onlyhas non-zero entries for items that havealready been
sold
 {8j025 i; H
 j
 = 0} and G only has non-zero entriesfor items that have yet to be sold {8j
< i; G
 j
 = 0}. Note also thatG and Y are fully specified whenthe g
 j
 and y j
 variables
 {forj 025 i} are boundsequentially by the expectation and maximization
operators. The
 idea here is that the bidsr
 i
 through r n0001
 are chosen tomaximize value in the context of the possibleclosing prices
y j
 .
 Equation 1 is closelyrelated to the equations defining the value of a finite-horizon
par-
 tiallyobservable Markov decisionprocess {Papadimitriou & Tsitsiklis, 1987}
or astochastic
 satisfiability expression{Littman, Ma jercik, & Pitassi, 2001}.Like these
other problems,
 thesequential auction problem is computationally intractable for sufficiently
general repre- sentations of v{001} {specifically, linear functions of theholdings
are not expressive enough to achieve intractability while arbitrarynonlinear
functions are}.
 2.1.2Approximate Value by Reordering There are three ma jor sources of intractability
in Equation 1|the alternation of themaxi-
 mizationand expectation operators{allowing decisions to be conditioned on
an exponential
 number of possible setsof holdings}, the large number of maximizations {forcing
an ex-
 ponential numberof decisions to be considered}, and the large number of
expectations
 {resultingin sums over an exponential numberof variable settings}.
 We attack the problem of interleaved operators by moving all but the first
of themaxi-
 mizations inside the expectations,resulting in an expression that approximates
the value:
 v alue-est{i; H} = max
 r
 i E
 y
 i
 E
 y
 i+1 : : : E
 y
 n0001
 max
 r
 i+1
 : : :max
 r
 n0001
 {v{G +H } 000 G 001 Y}: {2}
 Because the choicesfor bids r
 i+1
 through r
 n0001 appear more deeply nested than the bindings for the closing prices
y
 i
 through y
 n0001
 , they cease to be bidsaltogether, and instead represent
 decisions as to whether to purchase goods atgiven prices. Let G = opt{H;
i; Y } be a vector representing the optimal number of goods to purchase at
the prices specified bythe vector
 Y giventhe current holdings H startingfrom auction i. Conceptually,this
can be computed
 by evaluating
 opt{H; i; Y} = argmax
 g
 i
 ;:::;g
 n0001
 {v{G +H } 000 H 001Y }: {3}
 Thus,Equation 2 can be written:
 value-est{i; H } = max r
 i
 E
 y
 i
 ;:::;y n0001
 {v{opt{H
 0
 ;i + 1; Y } + H 0
 } 000 opt{H
 0
 ; i + 1;Y } 001 Y } {4} 212Decision-TheoreticBidding with Learned Density
Models
 where H
 0
 is identicalto H except the i-th componentreflects whether item i is won|r
 i
 025 y
 i
 .
 Note that there is afurther approximation that can be made bycomputing the
expected
 prices {as pointvalues} before solving the optimizationproblem. This approach
corresponds to further swapping the expectations towards the core of the
equation:
 value-est{i; H }
 ev = max
 r
 i {v{opt{H
0
 ; i + 1; E Y
 } + H
0
 } 000 opt{H 0
 ;i + 1;E
 Y
 } 001 E Y
 } {5}
 whereE
 Y
 = E[y
 i+1
 ;: : : ; y
 n0001 ], the vector of expected costs ofthe goods. In the remainder of
the article, we refer to methods thatuse this further approximation from
Equation 5 as expected value approaches for reasonsthat will be come apparent
shortly.
 The technique of swapping maximization and expectation operators was previously
used by Hauskrecht {1997} to generate a bound for solving partially observableMarkov
decision
 processes.The decrease of uncertainty when decisions aremade makes this
approximation
 anupper bound on the true valueof the auction: value-est 025value. The
tightness of the ap- proximations in Equations 2 and 5 dependson the true
distributions of the expected prices. For example, if the prices were known
in advance with certainty, thenboth approximations
 are exact.
 2.1.3 Approximate Bidding
 Given a vector of costsY , the optimization problem opt{H; i; Y } in Equation
4 is stillNP-
 hard {assuming the representation of the utility function v{001} issufficiently
complex}. For
 many representations of v{001}, the optimization problem can be castas
an integer linear pro-
 gram and approximated by using the fractional relaxation insteadof the exact
optimization
 problem.This is precisely the approach we haveadopted in ATT ac {Stone et
al., 2001}. 2.1.4 Approximation via Sampling Even assuming that opt{H;i;
Y } can be solved in unit time, aliteral interpretation of Equa-
 tion 4 says we'll need to solve this optimization problemfor an exponential
number of cost vectors {or even more if the probabilitydistributions Pr{y
 j } are continuous}. Kearns,Man-
 sour, and Ng {1999} showed that values of partially observableMarkov decisionprocesses
 could beestimated accurately by sampling tra jectoriesinstead of exactly
computing sums.
 Littman et al. {2001} did the same for stochastic satisfiability expressions.
Applyingthis
 idea to Equation 4 leads to the following algorithm.
 1. Generate a setS of vectors of closing costsY according to the product
distribution Pr{y
 i
 }002 001 001 001 002 Pr{y
 n0001
 }. 2. For each of these samples,calculate opt{H
 0
 ; i + 1; Y } as definedabove and average the
 results, resulting in the approximation value-est
 s
 {i; H } = max
 r i
 X
 Y 2S
 {v{opt{H 0
 ; i + 1; Y} + H
 0
 }000 opt{H
 0
 ; i + 1; Y }001Y }=jS j:{6}
 This expression converges to value-est with increasing sample size. A remaining
challenge in evaluatingEquation 6 is computing the real-valuedbid r
 i
 that maximizes the value. Notethat we want to buy item iprecisely at those
closing prices for 213Stone, Schapire, Littman, Csirik, & McAllester
 which the value of having the item {minus its cost}exceeds the value of
not having the item; this maximizes profit. Thus,to make a positive profit,
we arewilling to pay up to, but not more than, the difference in valueof
having the item and not having the item. Formally , let H bethe vector of
current holdings andH
 w
 be the holdings modified to
 reflect winning itemi. Let G
 w
 {Y } = opt{H
 w
 ; i+1; Y}, the optimal set of purchases assuming item i was won, and G{Y
} = opt{H;i+1; Y } the optimal set of purchases assuming otherwise
 {exceptin cases of ambiguity, we write simply G
 w
 and G forG
 w
 {Y } andG{Y} respectively}. We want to select r i
 to achieve the equivalence
 r
 i
 025y
 i
 021
X
 Y 2S
 {v{G
 w
 +H } 000 G
 w 001 Y }=jSj 000 y
 i
025
 X
 Y 2S
 {v{G +H } 000 G 001 Y}=jS j: {7} Setting
 r
 i
 =
 X
 Y 2S {[v{G
 w + H } 000 G w
 001 Y ] 000[v{G + H }000 G 001 Y ]}=jS j: {8}
 achievesthe equivalence desired in Equation 7,as can be verified by substitution,
and therefore biddingthe average differencebetween holding and not holding
the itemmaximizes
 the value.
 4
 2.2 The FullApproach
 Leveraging from the precedinganalysis, we define our sampling-based approachto
profit
 prediction in general simultaneous, multi-unit auctions for interacting
goods. In this sce-
 nario, let there be n simultaneous, multi-unitauctions for interacting goods
a
 0 ; : : : ; a
 n0001
 .
 The auctions might close at different times and these times arenot, in general,
known in
 advance to the bidders. When an auction closes,let us assume that the m
units available
 are distributed irrevocably tothe m highest bidders, who each need to pay
the price bid
 bythe mth highest bidder. This scenario correspondsto an mth price ascending
auction. 5
 Note that the same bidder mayplace multiple bids in an auction, and therebypotentially
 win multiple units.We assume that after the auction closes, thebidders will
no longer have
 anyopportunity to acquire additional copies of thegoods sold in that auction
{i.e., there
 is no aftermarket}. Our approach is based upon fiveassumptions. For G =
{g
 0
 ; : : : ; g n0001
 } 2IN
 n
 , letv{G} 2 IR represent the value derived bythe agent if it owns g
 i
 units of the commodity beingsold in
 auction a
 i . Note that v isindependentof the costs of the commodities.Note furtherthat
 this representation allows for interacting goods of all kinds,including
complementarity and
 substitutability.
 6
 The assumptions of our approach are as follows:
 1. Closing prices are somewhat, butonly somewhat, predictable. That is,
given aset
 of input features X ,for each auction a
 i , there exists a sampling rule that outputsa4. Note that thestrategy for
choosing r
 i in Equation 8 does not exploit the factthat the sample S contains
 only a finite set of possibilities fory
 i
 , which might make it more robust to inaccuracies in the sampling. 5. For
large enough mit is practically the same as the more efficient m + 1st auction.
Weuse the mth
 price model because that is what is used in TAC'shotel auctions.
 6. Goodsare considered complementary if their value as a package is greater
than the sum of theirindividual
 values; goodsare consideredsubstitutable if their value as a package isless
than the sum of their
 individualvalues.
 214Decision-TheoreticBidding with Learned Density Models
 closing price y
 i
according to a probability distribution of predicted closing prices for
 a
 i .
 2. Given a vector ofholdings H = {h
 0 ; : : : ; h
 n0001
 } where h
 i 2IN represents the quantity of the commodity being sold in auctiona
 i
 that are already owned by the agent, and
 given a vector of fixed closing prices Y = {y
 0
 ; : : : ; y n0001
 }, thereexists a tractable
 procedure opt{H; Y } to determine the optimal setof purchases {g
 0
 ; : : : ; g
 n0001
 } where
 g i
 2 IN representsthe number of goods to be purchasedin auction i such that
 v{opt{H; Y } + H} 000 opt{H; Y} 001 Y 025 v{G + H } 000 G001 Y
 for all G2 IN
 n
 .This procedure corresponds to the optimizationproblem opt{H; i; Y } in
Equation 3.
 3. Anindividual agent's bids do not have anappreciable effect on the economy
{large population assumption}.
 4.The agent is free to change existing bidsin auctions that have not yet
closed. 5. Future decisions are made in thepresence of complete price information.
Thisas-
 sumption corresponds to the operatorreordering approximation from the previous
section.
 While these assumptions are notall true in general, they can be reasonable
enough approx-
 imations to be the basis for an effective strategy.
 By Assumption 3,the price predictor can generate predicted prices prior
to considering
 one's bids. Thus,we can sample from these distributionsto produce complete
sets of closing prices of all goods.
 For each good under consideration, weassume that it is the next one to close.If
a
 different auction closes first, we can then revise our bids later {Assumption4}.
Thus, we
 would like tobid exactly the good's expe cte dmarginal utility to us. Thatis,
we bid the
 differencebetween the expected utilities attainable with and without the
good. Tocompute
 these expectations, we simply average the utilities of having and not havingthe
good under
 differentprice samples as in Equation 8. Thisstrategy rests on Assumption
5 in that we assume that biddingthe good's current expected marginal utility
cannot adversely affect our future actions, for instance by impacting our
future space of possible bids.Note that as
 time proceeds, the pricedistributions change in response to the observed
price tra jectories,
 thuscausing the agent to continually revise itsbids.
 Table 1 shows pseudo-code for the entire algorithm. A fully detaileddescription
of an
 instantiationof thisapproach is given in Section 5. 2.3 Example
 Considera camera and a flash with interacting values to an agent as shown
in Table 2.
 Further, consider that theagent estimates that the camera will sell for
$40with probability
 25045, $70 with probability 50045, and $95 with probability 25045.Consider
the question of
 what the agentshould bid for the flash {in auctiona
 0
 }. Thedecision pertaining to the
 camerawould be made via a similar analysis. 215Stone, Schapire, Littman,
Csirik, & McAllester
*  Let H = {h 0
 ; : : : ; h n0001
 } be the agent's current holdings in each of then auctions.
 
*  Fori= 0 to n 000 1{assume auction i is next to close}: { total-diff =
0
{ counter = 0
 {As time permits:
 003For each auction a
 j
 ; j 6=i, generate a predicted price sampley
 j
 . LetY =
 {y
 0 ; : : : ; y
 i0001
 ; 1; y i+1
 ; : : : ; y n0001
 }.
 003 Let H
 w = {h
 0
; : : : ; h
 i0001
 ; h
 i
 + 1; h
 i+1 ; : : : ; h
 n0001
 }, the vector of holdings ifthe agent
 wins a unit in auctiona
 i
 .
 003Compute G
 w
 = opt{H
 w
 ; Y },the optimalset of purchases if the agent wins a unit in
 auction a
 i
 . Note that no additionalunitsof the good will be purchased, since the i-th
component of Yis 1.
 003 ComputeG = opt{H; Y },the optimalset of purchases if the agent never
acquires
 any additional units in theauction a
 i
 and prices areset to Y .
 003 diff= [v{G
 w
 + H } 000 G
 w
 001 Y ] 000[v{G + H }000 G 001 Y ]
 003total-diff = total-diff + diff 003 counter = counter+ 1
 { r = total-diff=counter
 { Bidr in auction a
 i
 .Table 1:The decision-theoretic algorithm for bidding in simultaneous, multi-unit,
interact-
 ing auctions.utilitycamer a alone$50flash alone10both100neither0T able 2:
Thetable of values for all combination ofcamera and flash in our example.
 First,the agent samples from the distributionof possible camera prices.
When the price of the camera {sold in auctiona
 1
 } is $70 in thesample:
 
*  H = {0; 0}; H
 w
 = {1; 0}; Y ={1; 70}
 
* G
 w
 = opt{H w
 ; Y } is the bestset of purchases the agent can make with the flash, and
 assuming the camera costs $70.In this case, the only two options arebuying
the
 camera or not. Buyingthe camera yields a profit of 100000 70 = 30. Not
buying the camera yields a profit of 10000 0 = 10. Thus, G w
 = {0; 1},and [v{G
 w + H } 000 G w
 001 Y ] = v{1; 1} 000{0; 1} 001 {1; 70} = 100000 70. 
*  Similarly G = {0; 0} {since if the flash is not owned, buying the camera
yields a profit of 50 000 70 = 00020, and notbuying it yields a profit
of 0 0000 = 0} and [v{G+ H } 000 G 001 Y] = 0.
 216Decision-TheoreticBidding with Learned Density Models
 
*  val = 30000 0= 30.
 Similarly, when the camera ispredicted to cost $40, val = 6000010 = 50;
andwhen the camera is predicted to cost $95, val= 10 000 0 = 10. Thus,we
expect that 50045 of the camera price samples will suggest a flash value
of $30, while 25045 willlead to a value of $50 and the other
 25045 willlead to a value of $10. Thus,the agent willbid :5 00230 + :25
002 50 + :25 002 10 = $30
 forthe flash.
 Notice that in this analysisof what to bid for the flash, the actual closingprice
of
 the flash is irrelevant. The proper bid depends only on the predicted price
of the camera.
 To determine the proper bid for the camera, asimilar analysis would be done
using the predicted price distributionof the flash. 3. TAC
 W einstantiated our approach as an entry in thesecond Trading Agent Competition
{TAC},
 as described in this section.Building on the success of TAC-00held in July
2000 {Wellman,
 W urman, O'Malley, Bangera, Lin, Reeves,& Walsh, 2001}, TAC-01 included19
agents from
 9 countries {Wellman et al., 2003a}. A key feature ofTAC is that it required
autonomous bidding agents to buy and sellmultiple interacting goo dsinauctions
of different types.It
 is designed as a benchmark problemin the complex and rapidly advancingdomain
of e-
 marketplaces, motivating researchers to apply unique approaches to a common
task. By
 providinga clear-cut ob jective function, TAC also allows the competitors
to focustheir
 attention on the computational andgame-theoretic aspects of the problem
and leaveaside
 the modeling and model validation issues that invariablyloom large in real
applications of automated agents to auctions {see Rothkopf& Harstad, 1994}.
Another feature of TAC
 is that it provides an academicforum for open comparison of agent biddingstrategies
in
 a complex scenario, as opposedto other complex scenarios, such as trading
inreal stock
 markets, in whichpractitioners are {understandably} reluctant to sharetheir
technologies.
 A TACgame instance pits eight autonomous biddingagentsagainst one another.
Each
 TAC agent is a simulated travel agent witheight clients, each of whom wouldlike
to travel
 from TACtown to Tampa and back againduring a 5-day period. Eachclient is
characterized
 by a randomset of preferences for the possible arrival and departure dates,
hotel rooms, and entertainment tickets. To satisfy a client, an agent mustconstruct
a travel package for that client by purchasing airline ticketsto and from
TACtown and securing hotelreservations; it
 is possible to obtain additional bonuses by providing entertainment tickets
as well. A TAC agent's score in a game instance is thedifference between
the sum of its clients'utilities for
 the packagesthey receive and the agent's total expenditure.We provide selected
details about the game next; for full details onthe design and mechanisms
of the TAC server and
 TACgame, see http://www.sics.se/tac. TAC agents buy flights, hotel rooms
and entertainment tickets through auctionsrun
 from the TAC server at the University of Michigan. Eachgame instance lasts
12 minutes
 andincludes a total of 28 auctions of 3different types.
 Flights{8 auctions}: There is a separate auctionfor each type of airline
ticket:to Tampa
 {inflights} on days 1{4 and from Tampa{outflights} on days 2{5.There is
an unlimited
 supply of airlinetickets, and every 24{32 seconds their ask price changes
by from 000$10 217Stone, Schapire, Littman, Csirik, & McAllester
 to $x. x increases linearly over thecourse of a game from 10 to y, wherey
2[10; 90]
 ischosen uniformlyat random for each auction, and is unknown to the bidders.
In
 all cases, tickets are priced between$150 and $800. When the server receives
abid at
 or above the ask price, thetransaction is cleared immediately at the ask
priceand no
 resale is allowed. Hotel Rooms {8}: Thereare two different types of hotel
rooms|the Tampa Towers {TT} and the Shoreline Shanties {SS}|each ofwhich
has 16 rooms availableon days 1{4.
 The rooms are sold ina 16th-price ascending {English} auction,meaning that
for each
 of the 8 typesof hotel rooms, the 16 highest bidders get therooms at the
16th highest
 price.For example, if there are 15 bidsfor TTon day 2 at $300, 2 bids at
$150, and any number of lower bids, therooms are sold for $150 to the 15
high biddersplus
 one of the $150 bidders {earliestreceived bid}. The ask price is the current
16th-
 highest bid and transactions clearonly when the auction closes. Thus,agents
have
 no knowledge of, forexample, the current highest bid. Newbids must be higher
than
 the currentask price. No bid withdrawal or resale isallowed, though the
price of bids may be lowered provided the agentdoes not reduce the number
of rooms itwould win
 were the auction to close.One randomly chosen hotelauction closes at minutes
4{11
 of the 12-minute game. Ask prices are changed only on the minute.
 Entertainment Tickets {12}: Alligator wrestling, amusement park, and museum
tickets are each sold for days 1{4 in continuous double auctions. Here, agents
canbuy and
 sel l tickets, withtransactions clearing immediately when one agentplaces
a buy bid
 at a price at least as high as another agent's sell price.Unlike the other
auction
 typesin which the goods are sold from a centralized stock, each agent starts
with a {skewed} random endowment of entertainment tickets. The prices sent
toagents are
 the bid-ask spre ads, i.e., the highest currentbid price and the lowest
current ask price {due to immediate clears, ask price is always greater than
bid price}. In this case,bid
 withdrawal and ticket resale are both permitted. Eachagent gets blocks of
4 tickets of 2 types, 2 tickets of another 2types, and no tickets of the
other 8 types.
 In addition to unpredictablemarketprices, other sources of variabilityfrom
game in-
 stance to game instance arethe client profiles assigned to the agents andthe
random initial
 allotment of entertainment tickets. Each TAC agent has eight clients with
randomlyas-
 signed travel preferences.Clients have parameters for ideal arrival day,
IAD {1{4};ideal
 departure day,IDD {2{5}; hotelpremium, HP{$50{$150}; and entertainment values,
EV
 {$0{$200} foreach type of entertainment ticket.
 The utility obtained by a client isdetermined by the travel package that
it is given in
 combinationwith its preferences. To obtain a non-zeroutility, the client
must be assigned a feasible travel package consisting of an inflight on some
arrival day AD, an outflight on a departure day DD, and hotel rooms of the
same type {TT or SS} for the days in between
 {days dsuch that AD 024d < DD}. At most one entertainment ticket of each
type can be
 assigned, andno more than one on each day.Given a feasible package, the
client's utility
 isdefined as 1000 000 travelPenalty+ hotelBonus + funBonus
 where
 218Decision-TheoreticBidding with Learned Density Models
 
*  travelPenalty = 100{jAD000 IAD j +jDD 000 IDD j} 
*  hotelBonus = HPif the client is in the TT, 0 otherwise. 
*  funBonus = sum ofEVs for assigned entertainment tickets. A TAC agent's
scor e is the sum of its clients' utilitiesin the optimal allocation of
 itsgoods {computed by the TACserver} minus its expenditures.The client preferences,
 allocations,and resulting utilities from a sample game are shown in Tables
3 and 4.ClientIAD IDD HP AWAP MU1Day2 Day 5 73 175 34 242Day 1 Day 3 125
113 124573Day 4 Day 5 73157 12 1774Day 1 Day 2102 50 67 495Day 1 Day3 75
12 135 1106Day2 Day 4 86 197 8597Day 1 Day 5 9056 197 1628Day 1 Day3 50 79
92 136Table 3: ATT ac-2001's client preferences from anactual game. AW, AP,
and MU are EVs
 foralligator wrestling, amusement park, and museumrespectively.ClientADDD
Hotel Ent'mentUtility1Day2 Day 5 SS AW411752Day 1 Day 2TT AW111383Day3 Day
5 SS MU3, AW412344Day 1Day 2 TT None11025Day 1 Day 2 TTAP111106Day 2Day 3
TT AW211837Day 1 Day 5SS AF2, AW3, MU414158Day 1 Day 2 TT MU11086Table 4:
ATT ac-2001's client allocations and utilities from thesame actual game as
that in
 Table 3. Client 1's \4" under \Ent'ment" indicates on day 4.
 Therules of TAC-01 are largely identical tothose of TAC-00, with three important
exceptions:
 1. In TAC-00, flight prices did not tend toincrease;
 2. In TAC-00,hotel auctions usually all closed at the end ofthe game;
 3. In TAC-00,entertainment tickets were distributeduniformly to all agents
 Whilerelatively minor on the surface, these changessignificantly enriched
the strategic complexity of the game. Stoneand Greenwald {2003} detail agent
strategiesfrom TAC-00.
 219Stone, Schapire,Littman, Csirik, & McAllester
 TAC-01 was organized as a series of fourcompetition phases, culminating
with the semifinals and finals on October 14, 2001at the EC-01 conference
in Tampa,Florida. First,
 the qualifying round,consisting of about 270 games per agent,served to select
the 16
 agentsthat would participate in the semifinals.Second, the seeding round,
consisting of about 315 games per agent, was usedto divide these agents into
two groups ofeight. After
 the semifinals, on themorning of the 14th consisting of 11 games in each
group, four teams
 from each group were selected to compete in the finals duringthat same afternoon.
The
 finalsare summarized in Section 6.
 TAC is not designed to be fully realistic in the sense that an agent from
TACis not
 immediately deployable in thereal world. For one thing, it isunrealistic
to assume that an
 agent wouldhave complete, reliableaccess to all clients'utility functions
{or even that the client would!}; typically, some sort of preference elicitation
procedurewould be required {e.g.
 Boutilier,2002}. For another, theauction mechanisms are somewhat contrived
for the
 purposes of creating an interesting, yetrelatively simple game. However,each
mechanism
 is representativeof a class of auctions that is used in the realworld. And
it is not difficult to imagine a future in which agents doneed to bid in
decentralized, related, yet varying
 auctions for similarly complex packages of goods.
 4.Hotel Price Prediction
 As discussedearlier, a central part of our strategy dependson the ability
to predict prices,
 particularly hotel prices, at variouspoints in the game. To do this asaccurately
as possible,
 we used machine-learning techniques that would examine thehotel prices actually
paid
 in previous gamesto predict prices in future games. Thissection discusses
this part of
 our strategyin detail, including a new boosting-basedalgorithm for conditional
density
 estimation.
 There is bound to beconsiderable uncertainty regarding hotel prices since
these depend
 on many unknown factors, such as the time at which the hotel roomwill close,
who the
 other agents are,what kind of clients have been assigned toeach agent, etc.
Thus, exactly predicting the price of a hotel room ishopeless. Instead, we
regard the closingprice as a
 random variablethat we need to estimate, conditionalon ourcurrent state
of knowledge
 {i.e., number of minutes remaining in the game, ask price of each hotel,
flightprices, etc.}.
 We might then attemptto predict this variable's conditional expected value.
However,
 ourstrategy requires that we not only predict expected value, but that we
also be ableto
 estimate the entireconditional distribution so that we cansample hotel prices.
 To set this up as a learning problem, wegathered a set of training examples
from previously played games. We defined a set of features for describing
each example that
 together are meant tocomprise a snap-shot of all the relevant information
available at the time each prediction is made. Allof the features we used
are real valued; a couple of the
 featurescan have a special value? indicating \value unknown."We used the
following
 basicfeatures:
 
*  The number ofminutes remaining in the game.

*  The price of each hotel room,i.e., the current ask price for roomsthat
have not closed
 or the actualselling price for rooms that have closed. 220Decision-TheoreticBidding
with Learned Density Models
 
*  The closing time of each hotel room. Note that this feature is defined
evenfor rooms
 that have not yet closed,as explained below.
 
* The prices of each of the flights. To this basic list, we added a number
of redundant variations, which wethought might help
 the learning algorithm: 
*  The closing price of hotel rooms that have closed {or ? if the room has
not yet closed}.
 
* The current ask price of hotel rooms thathave not closed {or ? if the room
has already
 closed}.
 
* The closing time of each hotel room minus the closing time of the room
whose price we are trying to predict.
 
*  The number of minutes fromthe current time until each hotel roomcloses.
 During the seeding rounds, it wasimpossibleto know during play who our opponents
 were, although this information was available at the end of each game,and
therefore during
 training. Duringthe semifinals and finals, we did know theidentities of
all our competitors. Therefore, in preparation for the semifinals andfinals,
we added the following features: 
*  The number of playersplaying {ordinarily eight, but sometimes fewer,for
instance if
 one or more playerscrashed}.
 
*  A bit for eachplayer indicating whether or not that playerparticipated
in this game.
 Wetrained specialized predictors for predicting theprice of each type of
hotel room. In other words, one predictor was specialized for predicting
only the price of TT on day
 1, anotherfor predicting SS on day2, etc. This would seem to require eightseparate
 predictors. However,the tournament game is naturally symmetric aboutits
middle in the
 sense that we cancreate an equivalent game by exchangingthe hotel rooms
on days 1 and 2 with those on days 4 and 3 {respectively}, and by exchanging
the inboundflights on
 days1, 2, 3 and 4 withthe outbound flights on days 5, 4, 3 and2 {respectively}.
Thus,
 withappropriate transformations, the outer days {1 and4} can be treated
equivalently ,and
 likewise for the inner days {2 and 3}, reducing the number of specializedpredictors
by half.
 We also createdspecialized predictors for predicting in the firstminute
after flight prices
 hadbeen quoted but prior to receiving any hotelprice information. Thus,
a total of eight specialized predictors were built {for each combination
of TT versus SS, inner versusouter
 day, and first minute versus not first minute}.
 We trained our predictors to predict not theactual closing price of each
room per se, but rather how much the price wouldincrease, i.e., the difference
betweenthe closing price
 and the current price.We thought that this might be aneasier quantity to
predict, and,
 because our predictor never outputs a negative numberwhen trained on nonnegative
data, this approach also ensures that we neverpredict a closing price below
the current bid. From each of the previously played games, wewere able to
extract manyexamples.
 Specifically, for eachminute of the game and for each room thathad not yet
closed, we
 221Stone, Schapire,Littman, Csirik, & McAllester
 extracted the values of all of the features described above at that moment
in the game, plus the actual closing price of the room {which we are trying
to predict}.
 Note thatduringtraining, there is no problem extracting theclosing times
of all of the
 rooms.During the actual play of a game, we donot know the closing times
of rooms that have not yet closed. However,we do know the exact probability
distributionfor closing
 timesof all of the rooms that have not yet closed. Therefore,to sample a
vector of hotel
 prices, wecan first sample according to this distribution over closing times,
and then use
 ourpredictorto sample hotel prices using these sampled closingtimes.
 4.1 The Learning Algorithm Having described how we set up thelearning problem,
we are now ready to describe the
 learning algorithm that we used.Briefly, we solved this learning problemby
first reducing to
 a multiclass, multi-label classification problem {or alternativelya multiple
logistic regression
 problem},and then applying boosting techniques developed by Schapire and
Singer {1999, 2000} combined with a modification of boosting algorithms for
logistic regression proposed by Collins, Schapire and Singer {2002}.The result
is a new machine-learning algorithmfor
 solving conditional density estimationproblems, described in detail in the
remainder ofthis
 section. Table 5 showspseudo-code for the entire algorithm. Abstractly,
we are given pairs {x
 1
 ; y
 1
 }; : : : ; {x
 m
 ; y
 m
 } where each x i
 belongs to a spaceX
 and each y
 i is in R . In our case, thex
 i
 's are the auction-specific feature vectors described
 above; for some n, X022 {R [ f?g} n
 . Each target quantity y
 i
 is the difference between closing
 price and current price.Given a new x, our goal is toestimate the conditional
distribution
 of y given x.
 We proceed with the working assumption thatall training and test examples
{x;y} are
 i.i.d. {i.e, drawn independently from identical distributions}.Although
this assumption is
 false in ourcase {since the agents, including ours, are changing over time},
it seems like a reasonable approximation that greatly reduces thedifficulty
of the learning task.
 Our first step is to reduce the estimationproblem to a classification problem
by breaking the range of the y
 i 's into bins:
 [b 0
 ; b
 1 }; [b
 1 ; b
 2
 }; : : : ; [b
 k ; b
 k+1
]
 for some breakpoints b 0
 < b
 1 < 001 001 001 < b k
 024 b
 k+1
 where for our problem, we chosek = 50.
 7
 Theendpoints b
 0
 andb
 k+1
 are chosen to be the smallest and largest y
 i
 values observed during training. We choose theremaining breakpoints b
 1 ; : : : ; b
 k so that roughly an equal
 number of training labels y i
 fall into each bin.{More technically, breakpoints are chosen so
 that the entropy of thedistribution of bin frequencies is maximized.} For
each of the breakpointsb
 j
 {j =1; : : : ; k}, our learningalgorithm attempts to estimate
 theprobability that a new y {givenx} willbe at least b j
 . Given such estimatesp
 j
 for each b
 j
 , we can thenestimate the probability that y isin thebin [b
 j
 ;b
 j+1
 } byp
 j+1
 000p
 j
 {and
 we can then use a constant density withineach bin}. We thus have reducedthe
problem
 to one of estimating multipleconditional Bernoulli variables corresponding
tothe event7.We did not experiment with varyingk, but expect that the algorithm
is notsensitive to it for sufficiently large values of k.
 222Decision-TheoreticBidding with Learned Density ModelsInput:{x
 1
 ; y
 1
}; : : : ; {x
m
 ; y
 m
 }where x
 i
 2 X, y
 i
 2 R positive integers k andT
 Compute breakpoints:b
 0
 < b
 1
 < 001 001 001< b
 k+1
 where 
*  b
 0
 =min
 i
 y
 i 
*  b
 k+1
 = max
 i
 y
 i
 
*  b
 1 ; : : : ; b
 k chosen to minimize
 P k
 j=0
 q j
 ln q
 j where q
 0
 ;: : : ; q
 k
 arefraction of y
 i
 's in [b
 0
 ; b 1
 }; [b
 1
 ; b
 2
 }; : : : ; [b
 k ; b
 k+1
 ]{using dynamic programing}
 Boosting:
 
*  for t= 1; : : : T :
 000compute weights W
 t {i; j } =
 11 + e s
 j
 {y i
 }f
 t {x
 i
 ;j}
 where s
 j {y} is as in Eq. {10} 000 use W
 t to obtain base function h
 t
 :X 002 f1;: : : ; kg ! R minimizing m
 X
 i=1 k
 X
 j=1 W
 t
 {i;j }e
 000s j
 {y
 i }h
 t
 {x
 i
 ;j} over all decision rules h t
 considered. The decision rules can take any form. Inour work, we use \decision
stumps," or simplethresholds on one
 of the features. Output sampling rule:

*  let f =
 T X
 t=1
 h t
 
*  let f 0
 ={f+ f}=2wheref {x;j } = maxff {x; j
 0
 } :j 024 j
 0
 024 kg
 f{x; j } = minff {x; j
 0
 } : 1 024 j
 0 024 j g
 
*  tosample, given x 2 X
000 let p
 j
 =
 11+ e
 000f
 0 {x;j}
 000let p
 0
 =1; p
 k+1
 =0
 000 choose j2 f0; : : : ; kgrandomly with probability p
 j
 000 p
 j+1
 000 choose yuniformly at random from [b j
 ; b
 j+1
 ]
 000 outputyTable5: The boosting-based algorithm forconditional density
estimation.
 y025 b
 j
 , and for this,we use a logistic regression algorithm based on boosting
techniques as
 described byCollinset al. {2002}.
 Our learning algorithmconstructs a real-valued function f: X 002 f1; :
: : ; kg ! R with
 the interpretation that
 11 + exp{000f{x; j })
 {9} is our estimate of the probability thaty 025b
 j
 ,given x. The negative log likelihood of the
 conditional Bernoulli variable corresponding to y
 i
 being above or belowb
 j
 is then
 ln
 020
 1 + e 000s
 j
 {y
 i
 }f {x
 i
 ;j} 021
 223Stone, Schapire,Littman, Csirik, & McAllester
 where s
 j
 {y} =
 {
 +1 ify 025 b
 j
 0001 if y < b
 j
 .
 {10}
 We attempt tominimize this quantity for all training examples{x
 i
 ; y i
 } and all break- points b
 j
. Specifically, we try to find afunction f minimizing
 m X
 i=1
 k X
 j=1
 ln 020
 1 + e
 000s
 j
 {y i
 }f {x i
 ;j}
 021 :
 We use a boosting-like algorithm described by Collins et al. {2002}for minimizing
ob jective
 functionsof exactly this form. Specifically, we build the function f inrounds.
On each
 roundt, we add a new base functionh
 t
 : X 002f1;: : : ; kg !R . Let
 f
 t =
 t0001
 X
 t
 0
=1
 h
 t
 0 be the accumulating sum.Following Collins, Schapire and Singer, toconstruct
each h
 t
 , we
 first let
 W t
 {i; j }=
 11+ e
 s
 j
 {y
 i
 }f t
 {x
 i ;j}
 be a set of weights on example-breakpoint pairs. We then choose h
 t to minimize
 m
X
 i=1
 k
 X
 j=1
 W t
 {i; j }e
 000s
 j {y
 i
 }h
 t
 {x
 i
 ;j}
 {11} over some space of \simple" basefunctions h
 t
 .For this work, we considered all\decision
 stumps" h of the form h{x; j } = 8
 >
 <
 > :
 A
 j
 if 036{x}025022
 B
 j
 if036{x}< 022
 C
 j
 if 036{x}=?
 where036{001} is one of the featuresdescribed above, and 022, A
 j
 , B j
 and C
 j are all real numbers.
 In other words, such an h simplycompares one feature 036 to a threshold022
andreturns a
 vectorof numbers h{x; 001} that depends only on whether036{x} is unknown
{?}, orabove
 or below022. Schapire and Singer {2000} show how to efficiently search
for the bestsuch
 h over all possible choices of 036, 022, A j
 , B
 j and C
 j
 .{We also employed their technique for \smoothing"A
 j , B
 j
 andC
 j
 .}
 Whencomputed by this sort of iterative procedure,Collins et al. {2002} prove
the asymptotic convergence of f t
 to the minimum of the objective function in Equation {11} over all linear
combinations of the basefunctions. For this problem, we fixedthe number
 of rounds toT = 300. Let f =f
 T +1
 bethe final predictor.
 As noted above,given a new feature vector x, wecompute p
 j
 as in Equation{9} to be
 our estimate for the probability that y 025 b
 j , and we let p
 0 = 1 and p
 k+1 = 0. For this to
 makesense, we need p
1
 025 p
 2 025 001 001 001 025 p k
 , or equivalently ,f {x; 1} 025f {x; 2} 025 001001 001 025 f {x;k},
a
 condition that may not holdfor the learned function f . To force this condition,
we replace f by a reasonable {albeitheuristic} approximation f
 0 that is nonincreasing in j ,namely,
 224Decision-TheoreticBidding with Learned Density Models
 f
 0
 = {f+ f}=2 wheref {respectively, f} is the pointwise minimum {respectively,
maximum} of all nonincreasing functions gthateverywhere upper bound f{respectively,
lower bound f }.
 With this modifiedfunction f
 0
 , we cancompute modified probabilities p j
 . To sample a single point according to the estimateddistributionon R associated
withf
 0
 , we choose bin [b
 j
 ;b
 j+1
 } with probabilityp
 j
 000 p j+1
 , and then select a point from this bin uniformly at
 random. Expected value according to this distributionis easily computed
as
 k
 X
 j=0 {p
 j
 000p
 j+1
 } 022
 b
 j+1 + b
 j2
 
 : Although we present results using thisalgorithm in the trading agent context,
we did not test its performance on more generallearning problems, nor did
we compare to other methods for conditional density estimation, such as those
studied by Stone {1994}.This
 clearly should be an area forfuture research.
 5. ATT ac-2001 Having described hotel price predictionin detail, we now
present the remainingdetails of
 ATT ac-2001'salgorithm. We begin with a briefdescription of the goods allocator,
which is used as a subroutine throughout the algorithm.We then present the
algorithm in a top-down fashion.
 5.1Starting Point
 A core subproblemforTAC agents is the allocation problem:finding the most
profitable
 allocationof goods to clients, G
 003
 , given a set of owned goods and prices for all other goods. The allocation
problem corresponds tofindingopt{H; i; Y }in Equation 3. We denote the value
of G
 003 {i.e., the score one would attainwith G
 003
 } asv{G
 003
 }.The general allocation
 problemis NP-complete, as it is equivalentto the set-packing problem {Garey
& Johnson, 1979}. However itcan be solved tractably in TAC via integer linearprogramming
{Stone
 et al., 2001}. The solution to the integer linear programis a value-maximizing
allocation of owned resources to clients along with a list ofresources that
need to be purchased.Using the linear
 programming package \LPsolve", ATT ac-2001is usually able to find the globally
optimal solution in under 0.01 seconds on a 650 MHz Pentium II. However,
sinceinteger linear
 programming is anNP-complete problem, some inputs can lead to a greatdeal
of search
 over the integrality constraints, andtherefore significantly longersolution
times. When only
 v{G
 003
 } is needed{as opposed to G
 003 itself }, the upper bound producedby LPsolve prior to
 the search over the integrality constraints, known as theLP relaxation,
can be used as an estimate. The LP relaxation can alwaysbe generated very
quickly. Note that this is not by any means the only possible formulation
of the allocation problem. Greenwald and Boyan {2001} studied a fast, heuristic
search variant and found
 that it performedextremely well on a collection of large, randomallocation
problems. Stone
 etal. {2001} used a randomized greedy strategy as afallback for the cases
in which the linear program took too long to solve. 225Stone, Schapire,Littman,
Csirik, & McAllester
 5.2Overview
 Table 6 shows ahigh-level overview of ATT ac-2001. The italicized portions
are described in the remainder of this section.When the first flight quotesare
posted:
 
*  ComputeG
 003
 with current holdingsand expected prices
 
* Buy the flights in G
 003
 for which expectedcost of postponing commitment exceedsthe expected
 benefitof postponing commitment
 Starting 1minute before each hotel close: 
*  Compute G
 003 with current holdings and expectedprices
 
*  Buy the flights in G
 003
 for whichexpected cost of postponing commitmentexceeds expected
 benefitof postponing commitment {30seconds}
 
*  Bidhotel roomexpected marginal values givenholdings, new flights, and
expected hotel purchases {30 seconds} Last minute: Buy remaining flights
as needed by G
 003 In parallel {continuously}:Buy/sell entertainment tickets based ontheir
expected valuesTable 6: ATT ac-2001's high-level algorithm. The italicized
portions are described in the
 remainderof this section.
 5.3 Costof Additional Rooms
 Our hotel pricepredictor described in Section 4 assumes thatATT ac-2001's
bids do not affect the ultimate closing price {Assumption 3 fromSection 2}.
This assumption holds in a large economy. However in TAC, each hotel auction
involved 8 agents competing for 16 hotel
 rooms.Therefore, the actions of each agent had anappreciable effect on the
clearing price: the more hotel rooms an agent attempted topurchase, the higher
the clearing price would be, all other things being equal.This effect needed
to be taken intoaccount when solving
 the basic allocationproblem.
 The simplifiedmodel used byATT ac-2001 assumed that thenth highest bid in
a hotel
 auctionwas roughly proportional toc
 000n
 {over theappropriate range of n} for somec025 1.
 Thus, if the predictor gave a price of p, ATT ac-2001only used this for
purchasing two hotel rooms {the \fair" share of a single agent of the 16
rooms}, and adjusted prices for other quantities of rooms by usingc.
 For example, ATT ac-2001 would consider the cost ofobtaining 4 rooms to
be 4pc 2
 . One or
 two rooms each cost p, but 3 each cost pc, 4 each cost pc 2
 , 5 each cost pc 3
 , etc. So in total,2
 rooms cost 2p, while 4 cost4pc
 2
 . Thereasoning behind this procedure is that ifATT ac-2001
 buys two rooms -  its fair share given that there are 16 rooms and 8 agents,
then the 16th
 highest bid {ATT ac-2001's2 bids in addition to 14 others} sets the price.But
if ATT ac-2001
bids on an additional unit, the previous 15thhighest bid becomes the price-setting
bid:the
 price for all rooms sold goes up from p to pc.
 Theconstant c was calculated from the data of several hundred games during
the seeding round. In each hotel auction, the ratioof the 14th and 18th highest
bids {reflecting the 226Decision-TheoreticBidding with Learned Density Models
 most relevant range of n} was taken as an estimate ofc
 4
 , and the {geometric}mean of the
 resulting estimates was takento obtain c = 1:35. The LP allocator takes
these priceestimates into account when computingG
 003
 by as- signing higher costs to larger purchase volumes, thus tending to
spread out ATT ac-2001's
 demand over thedifferent hotel auctions.
 In ATT ac-2001, a few heuristics were appliedto the above procedure to improvestability
 and to avoid pathological behavior: prices below $1 were replacedby $1 in
estimating c;
 c= 1 was used for purchasing fewerthan two hotel rooms; hotel rooms weredivided
into
 early closing and lateclosing {and cheap and expensive} ones, andthe c values
from the
 corresponding subsets of auctions of the seedingrounds were used in each
case.
 5.4 Hotel Expected Marginal Values
 Using the hotel price prediction module as described in Section 4, coupled
with amodel of
 its own effect on the economy, ATT ac-2001 is equipped todetermine its bids
for hotel rooms. Every minute, for each hotel auction thatis still open,
ATT ac-2001assumes that auction
 will close next andcomputes the marginal value of that hotel room given
the predicted
 closing prices ofthe other rooms. If the auction does notclose next, thenit
assumes
 thatit will have a chance to revise its bids.Since these predicted prices
are represented as distributionsof possible future prices,ATT ac-2001 samples
from these distributionsand
 averages the marginal values to obtain an expected marginal value. Using
the full minute
 between closing times for computation {or30 seconds if there are stillflights
toconsider
 too}, ATT ac-2001divides the available time among thedifferent open hotel
auctions and generates as many price samples as possiblefor each hotel room.
In the end,ATT ac-2001
 bids the expectedmarginal values for each of the rooms. The algorithm is
described precisely and withexplanation in Table 7.
One additional complication regarding hotel auctions is that, contrary to one of our assumptions in Section 2.2 (Assumption 4), bids are not fully retractable: they can only be changed to $1 above the current ask price. If ATTac-2001 has active bids for goods that it no longer wants, and those bids are less than $1 above the current ask price, it may be advantageous to refrain from changing them in the hope that the ask price will surpass them: that is, the current bid may have a higher expected value than the best possible new bid. To address this issue, ATTac-2001 samples from the learned price distribution to find the average expected values of the current and potential bids, and only enters a new bid in the case that the potential bid is better.

5.5 Expected Cost/Benefit of Postponing Commitment
ATTac-2001 makes flight bidding decisions based on a cost-benefit analysis: in particular, ATTac-2001 computes the incremental cost of postponing bidding for a particular flight versus the value of delaying commitment. In this section, we describe the determination of the cost of postponing the bid.

Due to difficulties that compounded with more sophisticated approaches, ATTac-2001 used the following very simple model for estimating the price of a flight ticket at a given future time.
* For each hotel (in order of increasing expected price):
* Repeat until time bound
  1. Generate a random hotel closing order (only over open hotels)
  2. Sample closing prices from predicted hotel price distributions
  3. Given these closing prices, compute V_0, V_1, ..., V_n
     - V_i ≡ v(G*) if owning i of the hotel
     - Estimate v(G*) with LP relaxation
     - Assume that no additional hotel rooms of this type can be bought
     - For other hotels, assume outstanding bids above the sampled price are already owned (i.e., they cannot be withdrawn).
     - Note that V_0 ≀ V_1 ≀ ... ≀ V_n: the values are monotonically increasing since having more goods cannot be worse in terms of possible allocations.
* The value of the ith copy of the room is the mean of V_i - V_{i-1} over all the samples.
* Note further that V_1 - V_0 ≄ V_2 - V_1 ≄ ... ≄ V_n - V_{n-1}: the value differences are monotonically decreasing since each additional room will be assigned to the client who can derive the most value from it.
* Bid for one room at the value of the ith copy of the room for all i such that the value is at least as much as the current price. Due to the monotonicity noted in the step above, no matter what the closing price, the desired number of rooms at that price will be purchased.

Table 7: The algorithm for generating bids for hotel rooms.
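To make Table 7's sampling loop concrete, here is a minimal sketch (hypothetical Python, not the authors' code; sample_scenario stands for drawing one closing order and closing-price vector from the learned distributions, and v_star(i, prices) stands for the LP-relaxation estimate of v(G*) when owning i rooms of this hotel):

    def hotel_room_bids(n, sample_scenario, v_star, current_price,
                        n_samples=100):
        # diffs[i] collects samples of V_{i+1} - V_i, the marginal value
        # of copy i+1 of this room under one sampled closing scenario.
        diffs = [[] for _ in range(n)]
        for _ in range(n_samples):
            prices = sample_scenario()  # closing order + sampled prices
            V = [v_star(i, prices) for i in range(n + 1)]
            for i in range(n):
                diffs[i].append(V[i + 1] - V[i])
        means = [sum(d) / len(d) for d in diffs]
        # Expected marginal values decrease in i, so bidding on every copy
        # whose value is at least the current price buys, at any closing
        # price, exactly the number of rooms worth having at that price.
        return [m for m in means if m >= current_price]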
It is evident from the formulation that, given y, the expected price increase from time 0 to time t was very nearly of the form Mt^2 for some M. It was also clear that, as long as the price did not hit the artificial boundaries at $150 and $800, the constant M must depend linearly on y - 10. This linear dependence coefficient was then estimated from several hundred flight price evolutions during the qualifying round. Thus, for this constant m, the expected price increase from time t to time T was m(T^2 - t^2)(y - 10). When a price prediction was needed, this formula was first applied to the first and most recent actual price observations to obtain a guess for y, and then this y was used in the formula again to estimate the future price. No change was predicted if the formula yielded a price decrease.
This approach suffers from systemic biases of various kinds (mainly due to the fact that the variance of price changes gets relatively smaller over longer periods of time), but was thought to be accurate enough for its use, which was to predict whether or not the ticket could be expected to get significantly more expensive over the next few minutes.
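A minimal sketch of this two-step use of the formula (hypothetical Python; the names are ours, and m is the coefficient fit from qualifying-round data):

    def predict_flight_price(p0, p_now, t_now, t_future, m):
        if t_now <= 0 or m == 0:
            return p_now
        # Invert the expected-increase formula on the increase observed
        # so far to obtain a guess for the hidden quantity (y - 10).
        y_minus_10 = (p_now - p0) / (m * t_now ** 2)
        # Expected increase from t_now to t_future: m (T^2 - t^2) (y - 10).
        increase = m * (t_future ** 2 - t_now ** 2) * y_minus_10
        return p_now + max(0.0, increase)  # never predict a decrease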
In practice, during TAC-01, ATTac-2001 started with the flight-lookahead parameter set to 3 (i.e., the cost of postponing is the average of the predicted flight costs 1, 2, and 3 minutes in the future). However, this parameter was changed to 2 by the end of the finals in order to cause ATTac-2001 to delay its flight commitments further.
5.5.1 Expected Benefit of Postponing Commitment

Fundamentally, the benefit of postponing commitments to flights is that additional information about the eventual hotel prices becomes known. Thus, the benefit of postponing commitment is computed by sampling possible future price vectors and determining, on average, how much better the agent could do if it bought a different flight instead of the one in question. If it is optimal to buy the flight in all future scenarios, then there is no value in delaying the commitment and the flight is purchased immediately. However, if there are many scenarios in which the flight is not the best one to get, the purchase is more likely to be delayed.

The algorithm for determining the benefit of postponing commitment is similar to that for determining the marginal value of hotel rooms. It is detailed, with explanation, in Table 8.
* Assume we're considering buying n flights of a given type
* Repeat until time bound
  1. Generate a random hotel closing order (open hotels)
  2. Sample closing prices from predicted price distributions (open hotels)
  3. Given these closing prices, compute V_0, V_1, ..., V_n
     - V_i = v(G*) if forced to buy i of the flight
     - Estimate v(G*) with LP relaxation
     - Assume more flights can be bought at the current price
     - Note that V_0 ≄ V_1 ≄ ... ≄ V_n since it is never worse to retain extra flexibility.
* The value of waiting to buy copy i is the mean of V_{i-1} - V_i over all the samples. If all price samples lead to the conclusion that the ith flight should be bought, then V_i = V_{i-1} and there is no benefit to postponing commitment.

Table 8: The algorithm for generating the value of postponing flight commitments.
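A sketch of Table 8 in the same style as the hotel sketch above (hypothetical Python; v_star_forced(i, prices) stands for the LP-relaxation estimate of v(G*) when forced to buy i flights of this type):

    def postponement_benefit(n, sample_scenario, v_star_forced,
                             n_samples=100):
        benefit = [0.0] * n
        for _ in range(n_samples):
            prices = sample_scenario()
            V = [v_star_forced(i, prices) for i in range(n + 1)]
            for i in range(1, n + 1):
                # V[i-1] >= V[i]: committing to more flights can only
                # remove flexibility; the gap is the value of waiting
                # on copy i under this price sample.
                benefit[i - 1] += (V[i - 1] - V[i]) / n_samples
        return benefit  # 0 for a copy means it should be bought now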
5.6 Entertainment Expected Values
The core of ATTac-2001's entertainment-ticket-bidding strategy is again a calculation of the expected marginal values of each ticket. For each ticket, ATTac-2001 computes the expected value of having one more and one fewer of the ticket. These calculations give bounds on the bid and ask prices it is willing to post. The actual bid and ask prices are a linear function of time remaining in the game: ATTac-2001 settles for a smaller and smaller profit from ticket transactions as the game goes on. Details of the functions of bid and ask price as a function of game time and ticket value remained unchanged from ATTac-2000 (Stone et al., 2001). Details of the entertainment-ticket expected marginal utility calculations are given in Table 9 below.
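Purely to illustrate the shape of such a shrinking-profit schedule, here is a sketch (hypothetical Python; the linear form is stated above, but the symmetric treatment of bid and ask and the max_margin parameter are our assumptions; the actual functions follow ATTac-2000, Stone et al., 2001):

    def entertainment_quotes(buy_value, sell_value, t_elapsed, t_total,
                             max_margin=20.0):
        # The demanded profit margin shrinks linearly to zero by game end.
        margin = max_margin * (1.0 - t_elapsed / t_total)
        return buy_value - margin, sell_value + margin  # (bid, ask)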
* Assume n of a given ticket type are currently owned
* Repeat until time bound
  1. Generate a random hotel closing order (open hotels)
  2. Sample closing prices from predicted price distributions (open hotels)
  3. Given these closing prices, compute V_{n-1}, V_n, V_{n+1}
     - V_i = v(G*) if owning i of the ticket
     - Estimate v(G*) with LP relaxation
     - Assume no other tickets can be bought or sold
     - Note that V_{n-1} ≀ V_n ≀ V_{n+1} since it is never worse to own extra tickets.
* The value of buying a ticket is the mean of V_{n+1} - V_n over all the samples; the value of selling is the mean of V_n - V_{n-1}.
* Since tickets are considered sequentially, if the determined buy or sell bid leads to a price that would clear according to the current quotes, assume the transaction goes through before computing the values of buying and selling other ticket types.

Table 9: The algorithm for generating the value of entertainment tickets.
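A sketch of Table 9's value computation (hypothetical Python; v_star_own(i, prices) stands for the LP-relaxation estimate of v(G*) when owning i tickets of this type, with no other entertainment trades allowed):

    def ticket_trade_values(n_owned, sample_scenario, v_star_own,
                            n_samples=100):
        buy_value = sell_value = 0.0
        for _ in range(n_samples):
            prices = sample_scenario()
            v_less, v_now, v_more = (v_star_own(n_owned + d, prices)
                                     for d in (-1, 0, 1))
            buy_value += (v_more - v_now) / n_samples   # one more ticket
            sell_value += (v_now - v_less) / n_samples  # one fewer ticket
        # Upper bound on a buy bid and lower bound on a sell (ask) price.
        return buy_value, sell_value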
6. Results

This section presents empirical results demonstrating the effectiveness of the ATTac-2001 strategy. First, we summarize its performance in the 2001 and 2002 Trading Agent Competitions (TACs). These summaries provide evidence of the strategy's overall effectiveness, but, due to the small number of games in the competitions, are anecdotal rather than scientifically conclusive. We then present controlled experiments that provide more conclusive evidence of the utility of our decision-theoretic and learning approaches embedded within ATTac-2001.
6.1 TAC-01 Competition

Of the 19 teams that entered the qualifying round, ATTac-2001 was one of eight agents to make it to the finals on the afternoon of October 14th, 2001. The finals consisted of 24 games among the same eight agents. Right from the beginning, it became clear that livingagents (Fritschi & Dorer, 2002) was the team to beat in the finals. They jumped to an early lead in the first two games, and by eight games into the round, they were more than 135 points per game ahead of their closest competitor (SouthamptonTAC, He & Jennings, 2002). Sixteen games into the round, they were more than 250 points ahead of their two closest competitors (ATTac-2001 and whitebear).

From that point, ATTac-2001, which was continually retraining its price predictors based on recent games, began making a comeback. By the time the last game was to be played, it was only an average of 22 points per game behind livingagents. It thus needed to beat livingagents by 514 points in the final game to overtake it, well within the margins observed in individual game instances. As the game completed, ATTac-2001's score of 3979 was one of the first to be posted by the server. The other agents' scores were reported one by one, until only the livingagents score was left. After agonizing seconds (at least for us), the TAC server posted a final game score of 4626, resulting in a win for livingagents.
After the competition, the TAC team at the University of Michigan conducted a regression analysis of the effects of the client profiles on agent scores. Using data from the seeding rounds, it was determined that agents did better when their clients had:

1. fewer total preferred travel days;
2. higher total entertainment values; and
3. a higher ratio of outer days (1 and 4) to inner days (2 and 3) in preferred trip intervals.

Based on these significant measures, the games in the finals could be handicapped according to each agent's aggregate client profile. Doing so indicated that livingagents' clients were much easier to satisfy than those of ATTac-2001, giving ATTac-2001 the highest handicapped score. The final scores, as well as the handicapped scores, are shown in Table 10. Complete results and affiliations are available from http://tac.eecs.umich.edu.

Agent            Mean   Handicapped score
ATTac-2001       3622   4154
livingagents     3670   4094
whitebear        3513   3931
Urlaub01         3421   3909
Retsina          3352   3812
CaiserSose       3074   3766
SouthamptonTAC   3253   3679
TacsMan          2859   3338

Table 10: Scores during the finals. Each agent played 24 games. Southampton's score was adversely affected by a game in which their agent crashed after buying many flights but no hotels, leading to a loss of over 3000 points. Discarding that game results in an average score of 3531.
6.2 TAC-02 Competition

A year after the TAC-01 competition, ATTac-2001 was re-entered in the TAC-02 competition using the models trained at the end of TAC-01. Specifically, the price predictors were left unchanged throughout (no learning). The seeding round included 19 agents, each playing 440 games over the course of about 2 weeks. ATTac-2001 was the top-scoring agent in this round, as shown in Table 11. Scores in the seeding round were weighted so as to emphasize later results over earlier results: scores on day n of the seeding round were given a weight of n. This practice was designed to encourage experimentation early in the round. The official ranking in the competition was based on the mean score after ignoring each agent's worst 10 results so as to allow for occasional program crashes and network problems.

Agent            Mean   Weighted, dropped worst 10
ATTac-2001       3050   3131
SouthamptonTAC   3100   3129
UMBCTAC          2980   3118
livingagents     3018   3091
cuhk             2998   3055
Thalis           2952   3000
whitebear        2945   2966
RoxyBot          2738   2855

Table 11: Top 8 scores during the seeding round of TAC-02. Each agent played 440 games, with its worst 10 games ignored when computing the rankings.

On the one hand, it is striking that ATTac-2001 was able to finish so strongly in a field of agents that had presumably improved over the course of the year. On the other hand, most agents were being tuned, for better and for worse, while ATTac-2001 was consistent throughout. In particular, we are told that SouthamptonTAC experimented with its approach during the later days of the round, perhaps causing it to fall out of the lead (by weighted score) in the end. During the 14-game semifinal heat, ATTac-2001, which was now restored with its learning capability and retrained over the data from the 2002 seeding round, finished 6th out of 8, thereby failing to reach the finals.
There are a number of possible reasons for this sudden failure. One relatively mundane explanation is that the agent had to change computational environments between the seeding rounds and the finals, and there may have been a bug or computational resource constraint introduced. Another possibility is that, due to the small number of games in the semifinals, ATTac-2001 simply got unlucky with respect to clients and the interaction of opponent strategies. However, it is also plausible that the training data from the 2002 qualifying and seeding rounds was less representative of the 2002 finals than was the training data from 2001; and/or that the competing agents improved significantly over the seeding round while ATTac-2001 remained unchanged. The TAC team at the University of Michigan has done a study of the price predictors of several 2002 TAC agents that suggests that the bug hypothesis is most plausible: the ATTac-2001 predictor from 2001 outperforms all other predictors from 2002 on the data from the 2002 semifinals and finals, and one other agent that used the 2002 data did produce good predictions based on that data (Wellman, Reeves, Lochner, & Vorobeychik, 2003b).^8

8. Indeed, in the TAC-03 competition, ATTac-2001 was entered using the trained models from 2001, and it won the competition, suggesting further that the failure in 2002 was due to a problem with the learned models that were used during the finals in 2002.

6.3 Controlled Experiments

ATTac-2001's success in the TAC-01 competition demonstrates its effectiveness as a complete system. However, since the competing agents differed along several dimensions, the competition results cannot isolate the successful approaches. In this section, we report controlled experiments designed to test the efficacy of ATTac-2001's machine-learning approach to price prediction.
6.3.1 Varying the Predictor

In the first set of experiments, we attempted to determine how the quality of ATTac-2001's hotel price predictions affects its performance. To this end, we devised seven price prediction schemes, varying considerably in sophistication and inspired by approaches taken by other TAC competitors, and incorporated these schemes into our agent. We then played these seven agents against one another repeatedly, with regular retraining as described below.

Following are the seven hotel prediction schemes that we used, in decreasing order of sophistication:
 
* ATTac-2001_s: This is the "full-strength" agent based on boosting that was used during the tournament. (The s denotes sampling.)

* Cond'lMean_s: This agent samples prices from the empirical distribution of prices from previously played games, conditioned only on the closing time of the hotel room (a subset of the features used by ATTac-2001_s). In other words, it collects all historical hotel prices and breaks them down by the time at which the hotel closed (as well as room type, as usual). The price predictor then simply samples from the collection of prices corresponding to the given closing time.

* SimpleMean_s: This agent samples prices from the empirical distribution of prices from previously played games, without regard to the closing time of the hotel room (but still broken down by room type). It uses a subset of the features used by Cond'lMean_s.

* ATTac-2001_ev, Cond'lMean_ev, SimpleMean_ev: These agents predict in the same way as their corresponding predictors above, but instead of returning a random sample from the estimated distribution of hotel prices, they deterministically return the expected value of the distribution. (The ev denotes expected value, as introduced in Section 2.1.)

* CurrentBid: This agent uses a very simple predictor that always predicts that the hotel room will close at its current price.

In every case, whenever the price predictor returns a price that is below the current price, we replace it with the current price (since prices cannot go down).

In our experiments, we added as an eighth agent EarlyBidder, inspired by the livingagents agent. EarlyBidder used SimpleMean_ev to predict closing prices, determined an optimal set of purchases, and then placed bids for these goods at sufficiently high prices to ensure that they would be purchased ($1001 for all hotel rooms, just as livingagents did in TAC-01) right after the first flight quotes. It then never revised these bids.
Each of these agents requires training, i.e., data from previously played games. However, we are faced with a sort of "chicken and egg" problem: to run the agents, we need to first train the agents using data from games in which they were involved, but to get this kind of data, we need to first run the agents. To get around this problem, we ran the agents in phases. In Phase I, which consisted of 126 games, we used training data from the seeding, semifinals and finals rounds of TAC-01. In Phase II, lasting 157 games, we retrained the agents once every six hours using all of the data from the seeding, semifinals and finals rounds as well as all of the games played in Phase II. Finally, in Phase III, lasting 622 games, we continued to retrain the agents once every six hours, but now using only data from games played during Phases I and II, and not including data from the seeding, semifinals and finals rounds.

Table 12 shows how the agents performed in each of these phases.

Agent           Phase I            Phase II           Phase III
ATTac-2001_ev   105.2 ± 49.5 (2)   131.6 ± 47.7 (2)   166.2 ± 20.8 (1)
ATTac-2001_s    27.8 ± 42.1 (3)    86.1 ± 44.7 (3)    122.3 ± 19.4 (2)
EarlyBidder     140.3 ± 38.6 (1)   152.8 ± 43.4 (1)   117.0 ± 18.0 (3)
SimpleMean_ev   -28.8 ± 45.1 (5)   -53.9 ± 40.1 (5)   -11.5 ± 21.7 (4)
SimpleMean_s    -72.0 ± 47.5 (7)   -71.6 ± 42.8 (6)   -44.1 ± 18.2 (5)
Cond'lMean_ev   8.6 ± 41.2 (4)     3.5 ± 37.5 (4)     -60.1 ± 19.7 (6)
Cond'lMean_s    -147.5 ± 35.6 (8)  -91.4 ± 41.9 (7)   -91.1 ± 17.6 (7)
CurrentBid      -33.7 ± 52.4 (6)   -157.1 ± 54.8 (8)  -198.8 ± 26.0 (8)

Table 12: The average relative scores (± standard deviation) for eight agents in the three phases of our controlled experiment in which the hotel prediction algorithm was varied. The relative score of an agent is its score minus the average score of all agents in that game. The agent's rank within each phase is shown in parentheses.

Much of what we observe in this table is consistent with our expectations. The more sophisticated boosting-based agents (ATTac-2001_s and ATTac-2001_ev) clearly dominated the agents based on simpler prediction schemes. Moreover, with continued training, these agents improved markedly relative to EarlyBidder. We also see the performance of the simplest agent, CurrentBid, which does not employ any kind of training, decline significantly relative to the other data-driven agents.
On the other hand, there are some phenomena in this table that were very surprising to us. Most surprising was the failure of sampling to help. Our strategy relies heavily not only on estimating hotel prices, but also on taking samples from the distribution of hotel prices. Yet these results indicate that using the expected hotel price, rather than price samples, consistently performs better. We speculate that this may be because an insufficient number of samples are being used (due to computational limitations), so that the numbers derived from these samples have too high a variance. Another possibility is that the method of using samples to estimate scores consistently overestimates the expected score because it assumes the agent can behave with perfect knowledge for each individual sample, a property of our approximation scheme. Finally, as our algorithm uses sampling at several different points (computing hotel expected values, deciding when to buy flights, pricing entertainment tickets, etc.), it is quite possible that sampling is beneficial for some decisions while detrimental for others. For example, when directly comparing versions of the algorithm with sampling used at only subsets of the decision points, the data suggest that sampling for the hotel decisions is most beneficial, while sampling for the flights and entertainment tickets is neutral at best, and possibly detrimental. This result is not surprising given that the sampling approach is motivated primarily by the task of bidding for hotels.
We were also surprised that Cond'lMean_s and Cond'lMean_ev eventually performed worse than the less sophisticated SimpleMean_s and SimpleMean_ev. One possible explanation is that the simpler model happens to give predictions that are just as good as the more complicated model, perhaps because closing time is not terribly informative, or perhaps because the adjustment to price based on current price is more significant. Other things being equal, the simpler model has the advantage that its statistics are based on all of the price data, regardless of closing time, whereas the conditional model makes each prediction based on only an eighth of the data (since there are eight possible closing times, each equally likely).

In addition to agent performance, it is possible to measure the inaccuracy of the eventual predictions, at least for the non-sampling agents. For these agents, we measured the root mean squared error of the predictions made in Phase III. These were: 56.0 for ATTac-2001_ev, 66.6 for SimpleMean_ev, 69.8 for CurrentBid, and 71.3 for Cond'lMean_ev. Thus, we see that the lower the error of the predictions (according to this measure), the higher the score (correlation R = -0.88).

6.3.2 ATTac-2001 vs. EarlyBidder

In a sense, the two agents that finished at the top of the standings in TAC-01 represented opposite ends of a spectrum. The livingagents agent uses a simple open-loop strategy, committing to a set of desired goods right at the beginning of the game, while ATTac-2001 uses a closed-loop, adaptive strategy.

The open-loop strategy relies on the other agents to stabilize the economy and create consistent final prices. In particular, if all eight agents are open-loop and place very high bids for the goods they want, many of the prices will skyrocket, evaporating any potential profit. Thus, a set of open-loop agents would tend to get negative scores: the open-loop strategy is a parasite, in a manner of speaking. Table 13 shows the results of running 27 games with 7 copies of the open-loop EarlyBidder and one copy of ATTac-2001. Although motivated by livingagents, EarlyBidder is in actuality identical to ATTac-2001 except that it uses SimpleMean_ev and it places all of its flight and hotel bids immediately after the first flight quotes. It bids only for the hotels that appear in G* at that time. All hotel bids are for $1001. In the experiments, one copy of ATTac-2001_s is included for comparison. The price predictors are all from Phase I of the preceding experiments. EarlyBidder's high-bidding strategy backfires and it ends up overpaying significantly for its goods. As our experiments above indicate, ATTac-2001 might improve even further if it were allowed to train on the games of the ongoing experiment as well.

Agent            Score         Utility
ATTac-2001       2431 ± 464    8909 ± 264
EarlyBidder (7)  -4880 ± 337   9870 ± 34

Table 13: The results of running ATTac-2001 against 7 copies of EarlyBidder over the course of 27 games. EarlyBidder achieves high utility, but overpays significantly, resulting in low scores.
The open-loop strategy has the advantage of buying a minimal set of goods. That is, it never buys more than it can use. On the other hand, it is susceptible to unexpected prices in that it can get stuck paying arbitrarily high prices for the hotel rooms it has decided to buy.

Notice in Table 13 that the average utility of EarlyBidder's clients is significantly greater than that of ATTac-2001's clients. Thus, the difference in score is accounted for entirely by the cost of the goods. EarlyBidder ends up paying exorbitant prices, while ATTac-2001 generally steers clear of the more expensive hotels. Its clients' utility suffers, but the cost savings are well worth it.

Compared to the open-loop strategy, ATTac-2001's strategy is relatively stable against itself. Its main drawback is that, as it changes its decisions about what goods it wants, and as it may also buy goods to hedge against possible price changes, it can end up stuck paying for some goods that are ultimately useless to any of its clients.

Table 14 shows the results of 7 copies of ATTac-2001 playing against each other and one copy of EarlyBidder. Again, training is from the seeding round and finals of TAC-01: the agents do not adapt during the experiment. Included in this experiment are three variants of ATTac-2001, each with a different flight-lookahead parameter (from the section on the cost of postponing flight commitments). There were three copies each of the agents with flight-lookahead set to 2 and 3 (ATTac-2001(2) and ATTac-2001(3), respectively), and one ATTac-2001 agent with flight-lookahead set to 4 (ATTac-2001(4)).

Agent           Score        Utility
EarlyBidder     2869 ± 69    10079 ± 55
ATTac-2001(2)   2614 ± 38    9671 ± 32
ATTac-2001(3)   2570 ± 39    9641 ± 32
ATTac-2001(4)   2494 ± 68    9613 ± 55

Table 14: The results of running EarlyBidder against 7 copies of ATTac-2001 over the course of 197 games. The three different versions of ATTac-2001 had slightly different flight-lookaheads.

From the results in Table 14 it is clear that ATTac-2001 does better when committing to its flight purchases later in the game (ATTac-2001(2) as opposed to ATTac-2001(4)). In comparison with Table 13, the economy represented here does significantly better overall. That is, having many copies of ATTac-2001 in the economy does not cause them to suffer. However, in this economy, EarlyBidder is able to invade. It gets a significantly higher utility for its clients and only pays slightly more than the ATTac-2001 agents (as computed by utility minus score).^9

9. We suspect that were the agents allowed to retrain over the course of the experiments, ATTac-2001 would end up improving, as we saw in Phase III of the previous set of experiments. Were this to occur, it is possible that EarlyBidder would no longer be able to invade.

The results in this section suggest that the variance of the closing prices is the largest factor determining the relative effectiveness of the two strategies (assuming nobody else is using the open-loop strategy). We speculate that with large price variances, the closed-loop strategy (ATTac-2001) should do better, but with small price variances, the open-loop strategy could do better.
7. Discussion

The open-loop and closed-loop strategies of the previous section differ in their handling of price fluctuation. A fundamental way of taking price fluctuation into account is to place "safe bids." A very high bid exposes an agent to the danger of buying something at a ridiculously high price. If prices are in fact stable, then high bids are safe. But if prices fluctuate, then high bids, such as the bids of the stable-price strategy, are risky.

In TAC, hotel rooms are sold in a Vickrey-style nth-price auction. There is a separate auction for each day of each hotel, and these auctions are closed sequentially. Although the order of the auctions is randomized, and not known to the agent, when placing bids in one of these auctions the agent assumes that auction will close next. We assumed in the design of our agent that our bids in one auction do not affect prices in other auctions. This assumption is not strictly true, but in a large economy one expects that the bids of a single individual have a limited effect on prices. Furthermore, the price most affected by a bid is the price of the item being bid on; the effect on other auctions seems less direct and perhaps more limited.

Assuming bids in one auction do not affect prices in another, the optimal bidding strategy is the standard strategy for a Vickrey auction: the bid for an item should be equal to its utility to the bidder. So, to place a Vickrey-optimal bid, one must be able to estimate the utility of an item. The utility of owning an item is simply the expected final score assuming one owns the item minus the expected final score assuming one does not own the item. So, the problem of computing a Vickrey-optimal bid can be reduced to the problem of predicting final scores for two alternative game situations. We use two score prediction procedures, which we call the stable-price score predictor (corresponding to Equation 5) and the unstable-price score predictor (Equation 4).

The Stable-Price Score Predictor. The stable-price score predictor first estimates the expected prices in the rest of the game using whatever information is available in the given game situation. It then computes the value achieved by optimal purchases under the estimated prices. In an economy with stable prices, this estimate will be quite accurate: if we make the optimal purchases for the expected prices then, if the actual prices are near our estimates, our performance will also be near the estimated value.
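Stated as code, the reduction from bidding to score prediction is a one-liner. A minimal sketch (hypothetical Python; predict_score stands for either of the two score predictors, and the two situation arguments are the same game state with and without the item):

    def vickrey_optimal_bid(predict_score, state_with_item,
                            state_without_item):
        # The Vickrey-optimal bid for an item equals its utility: the
        # predicted final score owning the item minus the predicted
        # final score not owning it.
        return predict_score(state_with_item) - predict_score(state_without_item)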
The Unstable-Price Score Predictor. Stable-price score prediction does not take into account the ability of the agent to react to changes in price as the game progresses. Suppose a given room is often cheap but is sometimes expensive. If the agent can first determine the price of the room, and then plan for that price, the agent will do better than guessing the price ahead of time and sticking to the purchases dictated by that price. The unstable-price score predictor uses a model of the distribution of possible prices. It repeatedly samples prices from this distribution, computes the stable-price score prediction under the sampled prices, and then takes the average of these stable-price scores over the various price samples. This score prediction algorithm is similar to the algorithm used in Ginsberg's (2001) quite successful computer bridge program, where the score is predicted by sampling the possible hands of the opponent and, for each sample, computing the score of optimal play in the case where all players have complete information (double dummy play). While this approach has a simple intuitive motivation, it is clearly imperfect. The unstable-price score predictor assumes both that future decisions are made in the presence of complete price information, and that the agent is free to change existing bids in auctions that have not yet closed. Both of these assumptions are only approximately true at best. Ways of compensating for the imperfections in score prediction were described in Section 5.
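The unstable-price predictor is just an average of stable-price predictions over sampled price vectors; a minimal sketch (hypothetical Python; sample_prices draws one full price vector from the learned model, and stable_score is the stable-price predictor above):

    def unstable_price_score(sample_prices, stable_score, n_samples=100):
        # Average the value of optimal purchases over price vectors
        # sampled from the learned distribution of possible prices.
        total = 0.0
        for _ in range(n_samples):
            total += stable_score(sample_prices())
        return total / n_samples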
Buy Now or Decide Later. The trading agent must decide what airline tickets to buy and when to buy them. In deciding whether to buy an airline ticket, the agent can compare the predicted score in the situation where it owns the airline ticket with the predicted score in the situation where it does not own the airline ticket but may buy it later. Airline tickets tend to increase in price, so if the agent knows that a certain ticket is needed it should buy it as soon as possible. But whether or not a given ticket is desirable may depend on the price of hotel rooms, which may become clearer as the game progresses. If airline tickets did not increase in price, as was the case in TAC-00, then they should be bought at the last possible moment (Stone et al., 2001).

To determine whether an airline ticket should be bought now or not, one can compare the predicted score in the situation where one has just bought the ticket at its current price with the predicted score in the situation where the price of the ticket is somewhat higher but it has not yet been bought. It is interesting to note that if one uses the stable-price score predictor for both of these predictions, and the ticket is purchased in the optimal allocation under the current price estimate, then the predicted score for buying the ticket now will always be higher, since increasing the price of the ticket can only reduce the score. However, the unstable-price score predictor can yield an advantage for delaying the purchase. This advantage comes from the fact that buying the ticket may be optimal under some prices but not optimal under others. If the ticket has not yet been bought, then the score will be higher for those sampled prices where the ticket should not be bought. This corresponds to the intuition that in certain cases the purchase should be delayed until more information is available.
Our guiding principle in the design of the agent was, to the greatest extent possible, to have the agent analytically calculate optimal actions. A key component of these calculations is the score predictor, based either on a single estimated assignment of prices or on a model of the probability distribution over assignments of prices. Both score predictors, though clearly imperfect, seem useful. Of these two predictors, only the unstable-price predictor can be used to quantitatively estimate the value of postponing a decision until more information is available. The accuracy of price estimation is clearly of central importance. Future research will undoubtedly focus on ways of improving both price modeling and score prediction based on price modeling.
8. Related and Future Work

Although there has been a good deal of research on auction theory, especially from the perspective of auction mechanisms (Klemperer, 1999), studies of autonomous bidding agents and their interactions are relatively few and recent. TAC is one example. FM97.6 is another auction test-bed, which is based on fishmarket auctions (Rodriguez-Aguilar, Martin, Noriega, Garcia, & Sierra, 1998). Automatic bidding agents have also been created in this domain (Gimenez-Funes, Godo, Rodriguez-Aguiolar, & Garcia-Calves, 1998). There have been a number of studies of agents bidding for a single good in multiple auctions (Ito, Fukuta, Shintani, & Sycara, 2000; Anthony, Hall, Dang, & Jennings, 2001; Preist, Bartolini, & Phillips, 2001).

A notable auction-based competition that was held prior to TAC was the Santa Fe Double Auction Tournament (Rust, Miller, & Palmer, 1992). This auction involved several agents competing in a single continuous double auction similar to the TAC entertainment ticket auctions. As analyzed by Tesauro and Das (2001), this tournament was won by a parasite strategy that, like livingagents as described in Section 6.3, relied on other agents to find a stable price and then took advantage of it. In that case, the advantage was gained by waiting until the last minute to bid, a strategy commonly known as sniping.
TAC-01 was the second iteration of the Trading Agent Competition. The rules of TAC-01 are largely identical to those of TAC-00, with three important exceptions:

1. In TAC-00, flight prices did not tend to increase;
2. In TAC-00, hotel auctions usually all closed at the end of the game;
3. In TAC-00, entertainment tickets were distributed uniformly to all agents.

While minor on the surface, these differences significantly enriched the strategic complexity of the game. In TAC-00, most of the designers discovered that a dominant strategy was to defer all serious bidding to the end of the game. As a result, the focus was on solving the allocation problem, with most agents using a greedy, heuristic approach. Since the hotel auctions closed at the end of the game, timing issues were also important, with significant advantages going to agents that were able to bid in response to last-second price quotes (Stone & Greenwald, 2003). Nonetheless, many techniques developed in 2000 were relevant to the 2001 competition: the agent strategies put forth in TAC-00 were important precursors to the second year's field, for instance as pointed out in Section 5.1.
Predicting hotel clearing prices was perhaps the most interesting aspect of TAC agent strategies in TAC-01, especially in relation to TAC-00, where the last-minute bidding created essentially a sealed-bid auction. As indicated by our experiments described in Section 6.3, there are many possible approaches to this hotel price estimation problem, and the approach chosen can have a significant impact on the agent's performance. Among those observed in TAC-01 are the following (Wellman, Greenwald, Stone, & Wurman, 2002), associated in some cases with the price-predictor variant in our experiments that was motivated by it.

1. Just use the current price quote p_t (CurrentBid).
2. Adjust based on historical data. For example, if Δ_t is the average historical difference between the clearing price and the price at time t, then the predicted clearing price is p_t + Δ_t (a minimal sketch of this adjustment follows the list).
3. Predict by fitting a curve to the sequence of ask prices seen in the current game.
4. Predict based on closing price data for that hotel in past games (SimpleMean_ev, SimpleMean_s).
5. Same as above, but condition on hotel closing time, recognizing that the closing sequence will influence the relative prices.
6. Same as above, but condition on the full ordering of hotel closings, or on which hotels are open or closed at a particular point (Cond'lMean_ev, Cond'lMean_s).
7. Learn a mapping from features of the current game (including current prices) to closing prices based on historical data (ATTac-2001_s, ATTac-2001_ev).
8. Hand-construct rules based on observations about associations between abstract features.
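The adjustment in approach 2 can be sketched as follows (hypothetical Python; the data structure is our assumption: history maps each time t to (clearing_price, quote_at_t) pairs drawn from past games):

    def adjusted_quote_prediction(p_t, t, history):
        # Average historical gap between the clearing price and the price
        # quoted at time t, added to the current quote.
        deltas = [clearing - quote for clearing, quote in history[t]]
        return p_t + sum(deltas) / len(deltas)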
Having demonstrated ATTac-2001's success at bidding in simultaneous auctions for multiple interacting goods in the TAC domain, we extended our approach to apply it to the U.S. Federal Communications Commission (FCC) spectrum auctions domain (Weber, 1997). The FCC holds spectrum auctions to sell radio bandwidth to telecommunications companies. Licenses entitle their owners to use a specified radio spectrum band within a specified geographical area, or market. Typically several licenses are auctioned off simultaneously, with bidders placing independent bids for each license. The most recent auction brought in over $16 billion. In a detailed simulation of this domain (Csirik, Littman, Singh, & Stone, 2001), we discovered a novel, successful bidding strategy that allows bidders to increase their profits significantly over a reasonable default strategy (Reitsma, Stone, Csirik, & Littman, 2002).

Our ongoing research agenda includes applying our approach to other similar domains. We particularly expect the boosting approach to price prediction and the decision-theoretic reasoning over price distributions to transfer to other domains. Other candidate real-world domains include electricity auctions, supply chains, and perhaps even travel booking on public e-commerce sites.
Acknowledgements

This work was partially supported by the United States-Israel Binational Science Foundation (BSF), grant number 1999038. Thanks to the TAC team at the University of Michigan for providing the infrastructure and support required to run many of our experiments. Thanks to Ronggang Yu at the University of Texas at Austin for running one of the experiments mentioned in the article. Most of this research was conducted while all of the authors were at AT&T Labs - Research.
References

Anthony, P., Hall, W., Dang, V. D., & Jennings, N. R. (2001). Autonomous agents for participating in multiple on-line auctions. In Proceedings of the IJCAI-2001 Workshop on E-Business and the Intelligent Web, Seattle, WA.

Boutilier, C. (2002). A POMDP formulation of preference elicitation problems. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pp. 239-246.

Collins, M., Schapire, R. E., & Singer, Y. (2002). Logistic regression, AdaBoost and Bregman distances. Machine Learning, 48(1/2/3).

Csirik, J. A., Littman, M. L., Singh, S., & Stone, P. (2001). FAucS: An FCC spectrum auction simulator for autonomous bidding agents. In Fiege, L., MĂŒhl, G., & Wilhelm, U. (Eds.), Electronic Commerce: Proceedings of the Second International Workshop, pp. 139-151, Heidelberg, Germany. Springer Verlag.

Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119-139.

Fritschi, C., & Dorer, K. (2002). Agent-oriented software engineering for successful TAC participation. In First International Joint Conference on Autonomous Agents and Multi-Agent Systems, Bologna.

Garey, M. R., & Johnson, D. S. (1979). Computers and Intractability: A Guide to the Theory of NP-completeness. Freeman, San Francisco, CA.

Gimenez-Funes, E., Godo, L., Rodriguez-Aguiolar, J. A., & Garcia-Calves, P. (1998). Designing bidding strategies for trading agents in electronic auctions. In Proceedings of the Third International Conference on Multi-Agent Systems, pp. 136-143.

Ginsberg, M. L. (2001). GIB: Imperfect information in a computationally challenging game. JAIR, 14, 303-358.

Greenwald, A., & Boyan, J. (2001). Bidding algorithms for simultaneous auctions. In Proceedings of the Third ACM Conference on E-Commerce, pp. 115-124, Tampa, FL.

Harsanyi, J. (1967-1968). Games with incomplete information played by Bayesian players. Management Science, 14, 159-182, 320-334, 486-502.

Hauskrecht, M. (1997). Incremental methods for computing bounds in partially observable Markov decision processes. In Proceedings of the Fourteenth National Conference on Artificial Intelligence, pp. 734-739.

He, M., & Jennings, N. R. (2002). SouthamptonTAC: Designing a successful trading agent. In Fifteenth European Conference on Artificial Intelligence, Lyon, France.

Ito, T., Fukuta, N., Shintani, T., & Sycara, K. (2000). BiddingBot: a multiagent support system for cooperative bidding in multiple auctions. In Proceedings of the Fourth International Conference on MultiAgent Systems, pp. 399-400.

Kearns, M., Mansour, Y., & Ng, A. Y. (1999). A sparse sampling algorithm for near-optimal planning in large Markov decision processes. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI-99), pp. 1324-1331.

Klemperer, P. (1999). Auction theory: A guide to the literature. Journal of Economic Surveys, 13(3), 227-286.

Littman, M. L., Majercik, S. M., & Pitassi, T. (2001). Stochastic Boolean satisfiability. Journal of Automated Reasoning, 27(3), 251-296.

Papadimitriou, C. H., & Tsitsiklis, J. N. (1987). The complexity of Markov decision processes. Mathematics of Operations Research, 12(3), 441-450.

Preist, C., Bartolini, C., & Phillips, I. (2001). Algorithm design for agents which participate in multiple simultaneous auctions. In Agent Mediated Electronic Commerce III (LNAI), pp. 139-154. Springer-Verlag, Berlin.

Reitsma, P. S. A., Stone, P., Csirik, J. A., & Littman, M. L. (2002). Self-enforcing strategic demand reduction. In Agent Mediated Electronic Commerce IV: Designing Mechanisms and Systems, Vol. 2531 of Lecture Notes in Artificial Intelligence, pp. 289-306. Springer Verlag.

Rodriguez-Aguilar, J. A., Martin, F. J., Noriega, P., Garcia, P., & Sierra, C. (1998). Towards a test-bed for trading agents in electronic auction markets. AI Communications, 11(1), 5-19.

Rothkopf, M. H., & Harstad, R. M. (1994). Modeling competitive bidding: A critical essay. Management Science, 40(3), 364-384.

Rust, J., Miller, J., & Palmer, R. (1992). Behavior of trading automata in a computerized double auction market. In Friedman, D., & Rust, J. (Eds.), The Double Auction Market: Institutions, Theories, and Evidence. Addison-Wesley, Redwood City, CA.

Schapire, R. E., & Singer, Y. (1999). Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37(3), 297-336.

Schapire, R. E., & Singer, Y. (2000). BoosTexter: A boosting-based system for text categorization. Machine Learning, 39(2/3), 135-168.

Stone, C. J. (1994). The use of polynomial splines and their tensor products in multivariate function estimation. The Annals of Statistics, 22(1), 118-184.

Stone, P., & Greenwald, A. (2003). The first international trading agent competition: Autonomous bidding agents. Electronic Commerce Research. To appear.

Stone, P., Littman, M. L., Singh, S., & Kearns, M. (2001). ATTac-2000: An adaptive autonomous bidding agent. Journal of Artificial Intelligence Research, 15, 189-206.

Tesauro, G., & Das, R. (2001). High-performance bidding agents for the continuous double auction. In Third ACM Conference on Electronic Commerce, pp. 206-209.

Weber, R. J. (1997). Making more from less: Strategic demand reduction in the FCC spectrum auctions. Journal of Economics and Management Strategy, 6(3), 529-548.

Wellman, M. P., Greenwald, A., Stone, P., & Wurman, P. R. (2002). The 2001 trading agent competition. In Proceedings of the Fourteenth Innovative Applications of Artificial Intelligence Conference, pp. 935-941.

Wellman, M. P., Greenwald, A., Stone, P., & Wurman, P. R. (2003a). The 2001 trading agent competition. Electronic Markets, 13(1), 4-12.

Wellman, M. P., Reeves, D. M., Lochner, K. M., & Vorobeychik, Y. (2003b). Price prediction in a trading agent competition. Tech. rep., University of Michigan.

Wellman, M. P., Wurman, P. R., O'Malley, K., Bangera, R., Lin, S.-d., Reeves, D., & Walsh, W. E. (2001). A trading agent competition. IEEE Internet Computing, 5(2), 43-51.