P. Stone, R. E. Schapire, M. L. Littman, J. A. Csirik and D. McAllester

Volume 19, 2003

**Links to Full Text:**

Auctions are becoming an increasingly popular method for transacting business, especially over the Internet. This article presents a general approach to building autonomous bidding agents to bid in multiple simultaneous auctions for interacting goods. A core component of our approach learns a model of the empirical price dynamics based on past data and uses the model to analytically calculate, to the greatest extent possible, optimal bids. We introduce a new and general boosting-based algorithm for conditional density estimation problems of this kind, i.e., supervised learning problems in which the goal is to estimate the entire conditional distribution of the real-valued label. This approach is fully implemented as ATTac-2001, a top-scoring agent in the second Trading Agent Competition (TAC-01). We present experiments demonstrating the effectiveness of our boosting-based price predictor relative to several reasonable alternatives.

Journal of ArtificialIntelligence Research 19 {2003} 209-242Submitted 12/02; published 9/03 Decision-Theoretic Bidding Based on Learned Density Models in Simultaneous, InteractingAuctions Peter Stonepstone@cs.utexas.edu Dept. of ComputerSciences, The University of Texasat Austin 1 University Station C0500, Austin, Texas 78712-1188 USA Robert E. Schapire schapire@cs.princeton.edu Department of Computer Science, Princeton University 35 Olden Street, Princeton, NJ 08544 USA Michael L. Littmanmlittman@cs.rutgers.edu Dept.of Computer Science, Rutgers University Piscataway, NJ 08854-8019 USA Janos A. Csirik janos@pobox.com D. E. Shaw & Co. 120 W 45thSt, New York, NY 10036 USA David McAllester mcallester@tti-chicago.edu Toyota TechnologicalInstituteat Chicago 1427 East 60th Street, Chicago IL, 60637 USA Abstract Auctions are becoming an increasinglypopular method for transacting business, espe- cially over the Internet. Thisarticle presents a general approach to buildingautonomous bidding agents to bid in multiple simultaneous auctions for interacting goods. A core component of ourapproach learns a model of the empiricalpricedynamics based on past data and uses the model to analytically calculate, to the greatest extent possible, optimal bids. We introduce a new and general boosting-based algorithm for conditionaldensity estimation problems of this kind,i.e., supervisedlearning problems in whichthe goal is to estimate the entireconditional distribution of the real-valuedlabel. This approach is fully implemented as ATT ac-2001, a top-scoring agent in the second Trading Agent Competition {TAC-01}. We present experimentsdemonstrating the effectiveness of our boosting-based price predictor relative to several reasonable alternatives. 1.Introduction Auctions are anincreasingly popular method for transacting business, especially over the Internet.In an auction for a single good, it isstraightforward to create automated bidding strategies|an agent could keep biddinguntil reaching a target reserve price, or itcould monitor the auction and place awinning bid just before the closing time {knownas sniping}. When bidding for multiple interacting goods in simultaneousauctions, on the other hand, agents mustbe able to reason about uncertainty and make complex value assess- ments. For example, an agent bidding on one's behalf in separate auctions for a camera and flash may end up buying the flash andthen not being able to find an affordablecamera. Alternatively, if bidding forthe same good in several auctions, it maypurchase two flashes when only one was needed. c fl2003AI Access Foundation and Morgan KaufmannPublishers. All rights reserved.Stone, Schapire, Littman, Csirik, & McAllester This article makesthree main contributions. The first contribution is a general ap- proachto buildingautonomous biddingagentsto bid in multiple simultaneous auctions for interacting goods. Westart with the observation that the key challenge in auctions is the prediction of eventual prices of goods: withcomplete knowledge of eventual prices, there are direct methods for determining the optimalbids to place. Our guiding principle is to havethe agent model itsuncertainty in eventual prices and, tothe greatest extent possible, analyticallycalculate optimal bids. To attack theprice prediction problem, we propose a machine-learning approach: gather examplesof previous auctions and the prices paid in them,then use machine-learning meth- odsto predict these prices based on available features in the auction. Moreover,forour strategy, weneeded to be able to model the uncertainty associatedwith predicted prices; in other words, we needed to be able to sample from a predicteddistribution of prices given the currentstate of the game. This can be viewed asa conditional density estimation problem, that is, a supervised learningproblem in which the goal is to estimate the entire distribution of a real-valued label given a description of currentconditions, typically in the formof a feature vector. The second main contribution of this article is a new algorithm for solving such general problems based onboosting {Freund & Schapire, 1997; Schapire& Singer, 1999}. Thethird contribution of this article is a complete descriptionof a prototype implemen- tation of ourapproach in the form of ATT ac-2001, a top-scoring agent 1 in the second Trading Agent Competition{TAC-01}that was held in Tampa Bay, FL on October 14, 2001 {Well- man, Greenwald, Stone, & Wurman, 2003a}. The TACdomain was the main motivation for the innovations reported here.ATT ac-2001 builds on top ofATT ac-2000 {Stone, Littman, Singh, & Kearns, 2001}, the top-scoring agentat TAC-00, but introduces a fundamentally new approach to creating autonomousbiddingagents. We present detailsof ATT ac-2001 as an instantiation ofits underlying principles that we believe have applications in a wide variety of bidding situations. ATT ac-2001 uses a predic- tive,data-driven approach to biddingbased on expected marginal values of all available goods. In this article, we present empirical results demonstrating the robustnessand effec- tiveness of ATT ac-2001's adaptive strategy. We also report on ATT ac-2001's performance at TAC-01 and TAC-02 and reflect on some of the key issues raised during the competitions. The remainder of the article isorganized as follows. In Section 2, wepresent our general approach to biddingfor multiple interacting goods in simultaneous auctions. In Section3, we summarize TAC, the substrate domain for our work. Section 4 describes our boosting-basedprice predictor. InSection 5, we give the details ofATT ac-2001. In Section 6, we present empirical results including asummary of ATT ac-2001's performancein TAC- 01, controlledexperiments isolating the successful aspects ofATT ac-2001, and controlled experiments illustrating someof the lessons learned during the competition.A discussion andsummary of related workis provided in Sections 7 and 8. 2. General Approach In a wide variety of decision-theoreticsettings, it is useful to be able to evaluate hypothetical situations.In computer chess, for example, a static board evaluator is used to heuristically1. Top-scoringby one metric, and second place by another. 210Decision-TheoreticBidding with Learned Density Models measure which player is ahead and by how much in a given board situation.The scenario is similar in auction domains,and our bidding agent ATT ac-2001uses a situation evaluator, analogous to the static board evaluator, whichestimates the agent's expectedprofit in a hypothetical futuresituation. This \profit predictor" has a widevariety of uses in the agent. For example, to determine the value of an item, the agent compares thepredicted profit assuming the item is alreadyowned to the predicted profit assuming that theitem is not available. Given prices for goods, one can oftencompute a set of purchases and an allocationthat maximizes profit. 2 Similarly, if closing prices are known, they can be treated as fixed, and optimal bids can be computed {bid high foranything you want to buy}. So,one natural profit predictor is simply tocalculate the profit of optimal purchases underfixed predicted prices. {Thepredicted prices can, of course, be differentin different situations, e.g., previous closing prices can be relevantto predicting future closing prices.} A more sophisticated approach to profitprediction is to construct a model of the prob- ability distribution over possiblefuture prices and to place bids that maximizeexpe cte d profit. Anapproximate solution to this difficult optimizationproblem can be created by stochastically sampling possible prices and computing a profit prediction as above for each sampled price. A sampling-based schemefor profit prediction is important for modeling uncertainty and the value of gaining information, i.e., reducingthe price uncertainty. Section2.1 formalizes this latter approach within asimplifiedsequential auction model. This abstraction illustrates some of thedecision-making issues in our full sampling-based approach presented in Section 2.2.The full setting that our approach addresses isconsid- erablymore complex than the abstractmodel, but our simplifying assumptions allow usto focus on a core challenge of thefull scenario. Our guiding principle is to make decision- theoretically optimal decisions given profit predictions for hypothetical futuresituations. 3 2.1 SimplifiedAbstraction In the simple model, thereare n items to be auctioned off insequence {first item 0, then item1, etc.}. The bidder must place a bidr i for each itemi, and after each bid, a closing pricey i is chosen for the corresponding item from a distribution specific to the item. If the bid matches or exceeds the closing price,r i 025 y i , the bidder holds itemi, h i = 1.Otherwise, the bidder does not hold theitem, h i = 0.The bidder's utility v{H} is a function of its final vector of holdings H = {h 0 ; : : : ; h n0001 } and its costis a function of the holdings and the vector of closing prices, H001Y . We willformalize the problem of optimal bid selection and develop a seriesof approximations to make the problem solvable.2. The problem iscomputationally difficult in general, buthas beensolved effectively in the non-trivial TAC setting {Greenwald & Boyan, 2001; Stone et al., 2001}. 3.An alternative approach would be toabstractly calculate the Bayes-Nash equilibrium{Harsanyi, 1968} for the game and playthe optimal strategy. We dismissed thisapproach because of its intractability in realistically complex situations, including TAC. Furthermore, even if we wereable to approximate the equilibrium strategy, it is reasonable to assume that our opponents would not play optimal strategies. Thus, we could gain additional advantageby tuning our approach to our opponents'actual behavior as observed in theearlier rounds, which is essentially the strategywe adopted. 211Stone, Schapire, Littman, Csirik, & McAllester 2.1.1 ExactValue What is the valueof the auction, that is, the bidder'sexpected profit {utility minus cost} for biddingoptimally for the rest of theauction? If a bidder knows this value, it can make its next bid to be one that maximizes its expected profit.The value is a function of the bidder's current holdings Hand the current item to bid on,i. It can be expressed as value{i; H } = max r i E y i max r i+1 E y i+1 : : :max r n0001 E y n0001 {v{G+ H } 000 G 001Y }; {1} wherethe components of G are the new holdingsas a result of additional winnings g j 021 r j 025 y j . Note that H onlyhas non-zero entries for items that havealready been sold {8j025 i; H j = 0} and G only has non-zero entriesfor items that have yet to be sold {8j < i; G j = 0}. Note also thatG and Y are fully specified whenthe g j and y j variables {forj 025 i} are boundsequentially by the expectation and maximization operators. The idea here is that the bidsr i through r n0001 are chosen tomaximize value in the context of the possibleclosing prices y j . Equation 1 is closelyrelated to the equations defining the value of a finite-horizon par- tiallyobservable Markov decisionprocess {Papadimitriou & Tsitsiklis, 1987} or astochastic satisfiability expression{Littman, Ma jercik, & Pitassi, 2001}.Like these other problems, thesequential auction problem is computationally intractable for sufficiently general repre- sentations of v{001} {specifically, linear functions of theholdings are not expressive enough to achieve intractability while arbitrarynonlinear functions are}. 2.1.2Approximate Value by Reordering There are three ma jor sources of intractability in Equation 1|the alternation of themaxi- mizationand expectation operators{allowing decisions to be conditioned on an exponential number of possible setsof holdings}, the large number of maximizations {forcing an ex- ponential numberof decisions to be considered}, and the large number of expectations {resultingin sums over an exponential numberof variable settings}. We attack the problem of interleaved operators by moving all but the first of themaxi- mizations inside the expectations,resulting in an expression that approximates the value: v alue-est{i; H} = max r i E y i E y i+1 : : : E y n0001 max r i+1 : : :max r n0001 {v{G +H } 000 G 001 Y}: {2} Because the choicesfor bids r i+1 through r n0001 appear more deeply nested than the bindings for the closing prices y i through y n0001 , they cease to be bidsaltogether, and instead represent decisions as to whether to purchase goods atgiven prices. Let G = opt{H; i; Y } be a vector representing the optimal number of goods to purchase at the prices specified bythe vector Y giventhe current holdings H startingfrom auction i. Conceptually,this can be computed by evaluating opt{H; i; Y} = argmax g i ;:::;g n0001 {v{G +H } 000 H 001Y }: {3} Thus,Equation 2 can be written: value-est{i; H } = max r i E y i ;:::;y n0001 {v{opt{H 0 ;i + 1; Y } + H 0 } 000 opt{H 0 ; i + 1;Y } 001 Y } {4} 212Decision-TheoreticBidding with Learned Density Models where H 0 is identicalto H except the i-th componentreflects whether item i is won|r i 025 y i . Note that there is afurther approximation that can be made bycomputing the expected prices {as pointvalues} before solving the optimizationproblem. This approach corresponds to further swapping the expectations towards the core of the equation: value-est{i; H } ev = max r i {v{opt{H 0 ; i + 1; E Y } + H 0 } 000 opt{H 0 ;i + 1;E Y } 001 E Y } {5} whereE Y = E[y i+1 ;: : : ; y n0001 ], the vector of expected costs ofthe goods. In the remainder of the article, we refer to methods thatuse this further approximation from Equation 5 as expected value approaches for reasonsthat will be come apparent shortly. The technique of swapping maximization and expectation operators was previously used by Hauskrecht {1997} to generate a bound for solving partially observableMarkov decision processes.The decrease of uncertainty when decisions aremade makes this approximation anupper bound on the true valueof the auction: value-est 025value. The tightness of the ap- proximations in Equations 2 and 5 dependson the true distributions of the expected prices. For example, if the prices were known in advance with certainty, thenboth approximations are exact. 2.1.3 Approximate Bidding Given a vector of costsY , the optimization problem opt{H; i; Y } in Equation 4 is stillNP- hard {assuming the representation of the utility function v{001} issufficiently complex}. For many representations of v{001}, the optimization problem can be castas an integer linear pro- gram and approximated by using the fractional relaxation insteadof the exact optimization problem.This is precisely the approach we haveadopted in ATT ac {Stone et al., 2001}. 2.1.4 Approximation via Sampling Even assuming that opt{H;i; Y } can be solved in unit time, aliteral interpretation of Equa- tion 4 says we'll need to solve this optimization problemfor an exponential number of cost vectors {or even more if the probabilitydistributions Pr{y j } are continuous}. Kearns,Man- sour, and Ng {1999} showed that values of partially observableMarkov decisionprocesses could beestimated accurately by sampling tra jectoriesinstead of exactly computing sums. Littman et al. {2001} did the same for stochastic satisfiability expressions. Applyingthis idea to Equation 4 leads to the following algorithm. 1. Generate a setS of vectors of closing costsY according to the product distribution Pr{y i }002 001 001 001 002 Pr{y n0001 }. 2. For each of these samples,calculate opt{H 0 ; i + 1; Y } as definedabove and average the results, resulting in the approximation value-est s {i; H } = max r i X Y 2S {v{opt{H 0 ; i + 1; Y} + H 0 }000 opt{H 0 ; i + 1; Y }001Y }=jS j:{6} This expression converges to value-est with increasing sample size. A remaining challenge in evaluatingEquation 6 is computing the real-valuedbid r i that maximizes the value. Notethat we want to buy item iprecisely at those closing prices for 213Stone, Schapire, Littman, Csirik, & McAllester which the value of having the item {minus its cost}exceeds the value of not having the item; this maximizes profit. Thus,to make a positive profit, we arewilling to pay up to, but not more than, the difference in valueof having the item and not having the item. Formally , let H bethe vector of current holdings andH w be the holdings modified to reflect winning itemi. Let G w {Y } = opt{H w ; i+1; Y}, the optimal set of purchases assuming item i was won, and G{Y } = opt{H;i+1; Y } the optimal set of purchases assuming otherwise {exceptin cases of ambiguity, we write simply G w and G forG w {Y } andG{Y} respectively}. We want to select r i to achieve the equivalence r i 025y i 021 X Y 2S {v{G w +H } 000 G w 001 Y }=jSj 000 y i 025 X Y 2S {v{G +H } 000 G 001 Y}=jS j: {7} Setting r i = X Y 2S {[v{G w + H } 000 G w 001 Y ] 000[v{G + H }000 G 001 Y ]}=jS j: {8} achievesthe equivalence desired in Equation 7,as can be verified by substitution, and therefore biddingthe average differencebetween holding and not holding the itemmaximizes the value. 4 2.2 The FullApproach Leveraging from the precedinganalysis, we define our sampling-based approachto profit prediction in general simultaneous, multi-unit auctions for interacting goods. In this sce- nario, let there be n simultaneous, multi-unitauctions for interacting goods a 0 ; : : : ; a n0001 . The auctions might close at different times and these times arenot, in general, known in advance to the bidders. When an auction closes,let us assume that the m units available are distributed irrevocably tothe m highest bidders, who each need to pay the price bid bythe mth highest bidder. This scenario correspondsto an mth price ascending auction. 5 Note that the same bidder mayplace multiple bids in an auction, and therebypotentially win multiple units.We assume that after the auction closes, thebidders will no longer have anyopportunity to acquire additional copies of thegoods sold in that auction {i.e., there is no aftermarket}. Our approach is based upon fiveassumptions. For G = {g 0 ; : : : ; g n0001 } 2IN n , letv{G} 2 IR represent the value derived bythe agent if it owns g i units of the commodity beingsold in auction a i . Note that v isindependentof the costs of the commodities.Note furtherthat this representation allows for interacting goods of all kinds,including complementarity and substitutability. 6 The assumptions of our approach are as follows: 1. Closing prices are somewhat, butonly somewhat, predictable. That is, given aset of input features X ,for each auction a i , there exists a sampling rule that outputsa4. Note that thestrategy for choosing r i in Equation 8 does not exploit the factthat the sample S contains only a finite set of possibilities fory i , which might make it more robust to inaccuracies in the sampling. 5. For large enough mit is practically the same as the more efficient m + 1st auction. Weuse the mth price model because that is what is used in TAC'shotel auctions. 6. Goodsare considered complementary if their value as a package is greater than the sum of theirindividual values; goodsare consideredsubstitutable if their value as a package isless than the sum of their individualvalues. 214Decision-TheoreticBidding with Learned Density Models closing price y i according to a probability distribution of predicted closing prices for a i . 2. Given a vector ofholdings H = {h 0 ; : : : ; h n0001 } where h i 2IN represents the quantity of the commodity being sold in auctiona i that are already owned by the agent, and given a vector of fixed closing prices Y = {y 0 ; : : : ; y n0001 }, thereexists a tractable procedure opt{H; Y } to determine the optimal setof purchases {g 0 ; : : : ; g n0001 } where g i 2 IN representsthe number of goods to be purchasedin auction i such that v{opt{H; Y } + H} 000 opt{H; Y} 001 Y 025 v{G + H } 000 G001 Y for all G2 IN n .This procedure corresponds to the optimizationproblem opt{H; i; Y } in Equation 3. 3. Anindividual agent's bids do not have anappreciable effect on the economy {large population assumption}. 4.The agent is free to change existing bidsin auctions that have not yet closed. 5. Future decisions are made in thepresence of complete price information. Thisas- sumption corresponds to the operatorreordering approximation from the previous section. While these assumptions are notall true in general, they can be reasonable enough approx- imations to be the basis for an effective strategy. By Assumption 3,the price predictor can generate predicted prices prior to considering one's bids. Thus,we can sample from these distributionsto produce complete sets of closing prices of all goods. For each good under consideration, weassume that it is the next one to close.If a different auction closes first, we can then revise our bids later {Assumption4}. Thus, we would like tobid exactly the good's expe cte dmarginal utility to us. Thatis, we bid the differencebetween the expected utilities attainable with and without the good. Tocompute these expectations, we simply average the utilities of having and not havingthe good under differentprice samples as in Equation 8. Thisstrategy rests on Assumption 5 in that we assume that biddingthe good's current expected marginal utility cannot adversely affect our future actions, for instance by impacting our future space of possible bids.Note that as time proceeds, the pricedistributions change in response to the observed price tra jectories, thuscausing the agent to continually revise itsbids. Table 1 shows pseudo-code for the entire algorithm. A fully detaileddescription of an instantiationof thisapproach is given in Section 5. 2.3 Example Considera camera and a flash with interacting values to an agent as shown in Table 2. Further, consider that theagent estimates that the camera will sell for $40with probability 25045, $70 with probability 50045, and $95 with probability 25045.Consider the question of what the agentshould bid for the flash {in auctiona 0 }. Thedecision pertaining to the camerawould be made via a similar analysis. 215Stone, Schapire, Littman, Csirik, & McAllester * Let H = {h 0 ; : : : ; h n0001 } be the agent's current holdings in each of then auctions. * Fori= 0 to n 000 1{assume auction i is next to close}: { total-diff = 0 { counter = 0 {As time permits: 003For each auction a j ; j 6=i, generate a predicted price sampley j . LetY = {y 0 ; : : : ; y i0001 ; 1; y i+1 ; : : : ; y n0001 }. 003 Let H w = {h 0 ; : : : ; h i0001 ; h i + 1; h i+1 ; : : : ; h n0001 }, the vector of holdings ifthe agent wins a unit in auctiona i . 003Compute G w = opt{H w ; Y },the optimalset of purchases if the agent wins a unit in auction a i . Note that no additionalunitsof the good will be purchased, since the i-th component of Yis 1. 003 ComputeG = opt{H; Y },the optimalset of purchases if the agent never acquires any additional units in theauction a i and prices areset to Y . 003 diff= [v{G w + H } 000 G w 001 Y ] 000[v{G + H }000 G 001 Y ] 003total-diff = total-diff + diff 003 counter = counter+ 1 { r = total-diff=counter { Bidr in auction a i .Table 1:The decision-theoretic algorithm for bidding in simultaneous, multi-unit, interact- ing auctions.utilitycamer a alone$50flash alone10both100neither0T able 2: Thetable of values for all combination ofcamera and flash in our example. First,the agent samples from the distributionof possible camera prices. When the price of the camera {sold in auctiona 1 } is $70 in thesample: * H = {0; 0}; H w = {1; 0}; Y ={1; 70} * G w = opt{H w ; Y } is the bestset of purchases the agent can make with the flash, and assuming the camera costs $70.In this case, the only two options arebuying the camera or not. Buyingthe camera yields a profit of 100000 70 = 30. Not buying the camera yields a profit of 10000 0 = 10. Thus, G w = {0; 1},and [v{G w + H } 000 G w 001 Y ] = v{1; 1} 000{0; 1} 001 {1; 70} = 100000 70. * Similarly G = {0; 0} {since if the flash is not owned, buying the camera yields a profit of 50 000 70 = 00020, and notbuying it yields a profit of 0 0000 = 0} and [v{G+ H } 000 G 001 Y] = 0. 216Decision-TheoreticBidding with Learned Density Models * val = 30000 0= 30. Similarly, when the camera ispredicted to cost $40, val = 6000010 = 50; andwhen the camera is predicted to cost $95, val= 10 000 0 = 10. Thus,we expect that 50045 of the camera price samples will suggest a flash value of $30, while 25045 willlead to a value of $50 and the other 25045 willlead to a value of $10. Thus,the agent willbid :5 00230 + :25 002 50 + :25 002 10 = $30 forthe flash. Notice that in this analysisof what to bid for the flash, the actual closingprice of the flash is irrelevant. The proper bid depends only on the predicted price of the camera. To determine the proper bid for the camera, asimilar analysis would be done using the predicted price distributionof the flash. 3. TAC W einstantiated our approach as an entry in thesecond Trading Agent Competition {TAC}, as described in this section.Building on the success of TAC-00held in July 2000 {Wellman, W urman, O'Malley, Bangera, Lin, Reeves,& Walsh, 2001}, TAC-01 included19 agents from 9 countries {Wellman et al., 2003a}. A key feature ofTAC is that it required autonomous bidding agents to buy and sellmultiple interacting goo dsinauctions of different types.It is designed as a benchmark problemin the complex and rapidly advancingdomain of e- marketplaces, motivating researchers to apply unique approaches to a common task. By providinga clear-cut ob jective function, TAC also allows the competitors to focustheir attention on the computational andgame-theoretic aspects of the problem and leaveaside the modeling and model validation issues that invariablyloom large in real applications of automated agents to auctions {see Rothkopf& Harstad, 1994}. Another feature of TAC is that it provides an academicforum for open comparison of agent biddingstrategies in a complex scenario, as opposedto other complex scenarios, such as trading inreal stock markets, in whichpractitioners are {understandably} reluctant to sharetheir technologies. A TACgame instance pits eight autonomous biddingagentsagainst one another. Each TAC agent is a simulated travel agent witheight clients, each of whom wouldlike to travel from TACtown to Tampa and back againduring a 5-day period. Eachclient is characterized by a randomset of preferences for the possible arrival and departure dates, hotel rooms, and entertainment tickets. To satisfy a client, an agent mustconstruct a travel package for that client by purchasing airline ticketsto and from TACtown and securing hotelreservations; it is possible to obtain additional bonuses by providing entertainment tickets as well. A TAC agent's score in a game instance is thedifference between the sum of its clients'utilities for the packagesthey receive and the agent's total expenditure.We provide selected details about the game next; for full details onthe design and mechanisms of the TAC server and TACgame, see http://www.sics.se/tac. TAC agents buy flights, hotel rooms and entertainment tickets through auctionsrun from the TAC server at the University of Michigan. Eachgame instance lasts 12 minutes andincludes a total of 28 auctions of 3different types. Flights{8 auctions}: There is a separate auctionfor each type of airline ticket:to Tampa {inflights} on days 1{4 and from Tampa{outflights} on days 2{5.There is an unlimited supply of airlinetickets, and every 24{32 seconds their ask price changes by from 000$10 217Stone, Schapire, Littman, Csirik, & McAllester to $x. x increases linearly over thecourse of a game from 10 to y, wherey 2[10; 90] ischosen uniformlyat random for each auction, and is unknown to the bidders. In all cases, tickets are priced between$150 and $800. When the server receives abid at or above the ask price, thetransaction is cleared immediately at the ask priceand no resale is allowed. Hotel Rooms {8}: Thereare two different types of hotel rooms|the Tampa Towers {TT} and the Shoreline Shanties {SS}|each ofwhich has 16 rooms availableon days 1{4. The rooms are sold ina 16th-price ascending {English} auction,meaning that for each of the 8 typesof hotel rooms, the 16 highest bidders get therooms at the 16th highest price.For example, if there are 15 bidsfor TTon day 2 at $300, 2 bids at $150, and any number of lower bids, therooms are sold for $150 to the 15 high biddersplus one of the $150 bidders {earliestreceived bid}. The ask price is the current 16th- highest bid and transactions clearonly when the auction closes. Thus,agents have no knowledge of, forexample, the current highest bid. Newbids must be higher than the currentask price. No bid withdrawal or resale isallowed, though the price of bids may be lowered provided the agentdoes not reduce the number of rooms itwould win were the auction to close.One randomly chosen hotelauction closes at minutes 4{11 of the 12-minute game. Ask prices are changed only on the minute. Entertainment Tickets {12}: Alligator wrestling, amusement park, and museum tickets are each sold for days 1{4 in continuous double auctions. Here, agents canbuy and sel l tickets, withtransactions clearing immediately when one agentplaces a buy bid at a price at least as high as another agent's sell price.Unlike the other auction typesin which the goods are sold from a centralized stock, each agent starts with a {skewed} random endowment of entertainment tickets. The prices sent toagents are the bid-ask spre ads, i.e., the highest currentbid price and the lowest current ask price {due to immediate clears, ask price is always greater than bid price}. In this case,bid withdrawal and ticket resale are both permitted. Eachagent gets blocks of 4 tickets of 2 types, 2 tickets of another 2types, and no tickets of the other 8 types. In addition to unpredictablemarketprices, other sources of variabilityfrom game in- stance to game instance arethe client profiles assigned to the agents andthe random initial allotment of entertainment tickets. Each TAC agent has eight clients with randomlyas- signed travel preferences.Clients have parameters for ideal arrival day, IAD {1{4};ideal departure day,IDD {2{5}; hotelpremium, HP{$50{$150}; and entertainment values, EV {$0{$200} foreach type of entertainment ticket. The utility obtained by a client isdetermined by the travel package that it is given in combinationwith its preferences. To obtain a non-zeroutility, the client must be assigned a feasible travel package consisting of an inflight on some arrival day AD, an outflight on a departure day DD, and hotel rooms of the same type {TT or SS} for the days in between {days dsuch that AD 024d < DD}. At most one entertainment ticket of each type can be assigned, andno more than one on each day.Given a feasible package, the client's utility isdefined as 1000 000 travelPenalty+ hotelBonus + funBonus where 218Decision-TheoreticBidding with Learned Density Models * travelPenalty = 100{jAD000 IAD j +jDD 000 IDD j} * hotelBonus = HPif the client is in the TT, 0 otherwise. * funBonus = sum ofEVs for assigned entertainment tickets. A TAC agent's scor e is the sum of its clients' utilitiesin the optimal allocation of itsgoods {computed by the TACserver} minus its expenditures.The client preferences, allocations,and resulting utilities from a sample game are shown in Tables 3 and 4.ClientIAD IDD HP AWAP MU1Day2 Day 5 73 175 34 242Day 1 Day 3 125 113 124573Day 4 Day 5 73157 12 1774Day 1 Day 2102 50 67 495Day 1 Day3 75 12 135 1106Day2 Day 4 86 197 8597Day 1 Day 5 9056 197 1628Day 1 Day3 50 79 92 136Table 3: ATT ac-2001's client preferences from anactual game. AW, AP, and MU are EVs foralligator wrestling, amusement park, and museumrespectively.ClientADDD Hotel Ent'mentUtility1Day2 Day 5 SS AW411752Day 1 Day 2TT AW111383Day3 Day 5 SS MU3, AW412344Day 1Day 2 TT None11025Day 1 Day 2 TTAP111106Day 2Day 3 TT AW211837Day 1 Day 5SS AF2, AW3, MU414158Day 1 Day 2 TT MU11086Table 4: ATT ac-2001's client allocations and utilities from thesame actual game as that in Table 3. Client 1's \4" under \Ent'ment" indicates on day 4. Therules of TAC-01 are largely identical tothose of TAC-00, with three important exceptions: 1. In TAC-00, flight prices did not tend toincrease; 2. In TAC-00,hotel auctions usually all closed at the end ofthe game; 3. In TAC-00,entertainment tickets were distributeduniformly to all agents Whilerelatively minor on the surface, these changessignificantly enriched the strategic complexity of the game. Stoneand Greenwald {2003} detail agent strategiesfrom TAC-00. 219Stone, Schapire,Littman, Csirik, & McAllester TAC-01 was organized as a series of fourcompetition phases, culminating with the semifinals and finals on October 14, 2001at the EC-01 conference in Tampa,Florida. First, the qualifying round,consisting of about 270 games per agent,served to select the 16 agentsthat would participate in the semifinals.Second, the seeding round, consisting of about 315 games per agent, was usedto divide these agents into two groups ofeight. After the semifinals, on themorning of the 14th consisting of 11 games in each group, four teams from each group were selected to compete in the finals duringthat same afternoon. The finalsare summarized in Section 6. TAC is not designed to be fully realistic in the sense that an agent from TACis not immediately deployable in thereal world. For one thing, it isunrealistic to assume that an agent wouldhave complete, reliableaccess to all clients'utility functions {or even that the client would!}; typically, some sort of preference elicitation procedurewould be required {e.g. Boutilier,2002}. For another, theauction mechanisms are somewhat contrived for the purposes of creating an interesting, yetrelatively simple game. However,each mechanism is representativeof a class of auctions that is used in the realworld. And it is not difficult to imagine a future in which agents doneed to bid in decentralized, related, yet varying auctions for similarly complex packages of goods. 4.Hotel Price Prediction As discussedearlier, a central part of our strategy dependson the ability to predict prices, particularly hotel prices, at variouspoints in the game. To do this asaccurately as possible, we used machine-learning techniques that would examine thehotel prices actually paid in previous gamesto predict prices in future games. Thissection discusses this part of our strategyin detail, including a new boosting-basedalgorithm for conditional density estimation. There is bound to beconsiderable uncertainty regarding hotel prices since these depend on many unknown factors, such as the time at which the hotel roomwill close, who the other agents are,what kind of clients have been assigned toeach agent, etc. Thus, exactly predicting the price of a hotel room ishopeless. Instead, we regard the closingprice as a random variablethat we need to estimate, conditionalon ourcurrent state of knowledge {i.e., number of minutes remaining in the game, ask price of each hotel, flightprices, etc.}. We might then attemptto predict this variable's conditional expected value. However, ourstrategy requires that we not only predict expected value, but that we also be ableto estimate the entireconditional distribution so that we cansample hotel prices. To set this up as a learning problem, wegathered a set of training examples from previously played games. We defined a set of features for describing each example that together are meant tocomprise a snap-shot of all the relevant information available at the time each prediction is made. Allof the features we used are real valued; a couple of the featurescan have a special value? indicating \value unknown."We used the following basicfeatures: * The number ofminutes remaining in the game. * The price of each hotel room,i.e., the current ask price for roomsthat have not closed or the actualselling price for rooms that have closed. 220Decision-TheoreticBidding with Learned Density Models * The closing time of each hotel room. Note that this feature is defined evenfor rooms that have not yet closed,as explained below. * The prices of each of the flights. To this basic list, we added a number of redundant variations, which wethought might help the learning algorithm: * The closing price of hotel rooms that have closed {or ? if the room has not yet closed}. * The current ask price of hotel rooms thathave not closed {or ? if the room has already closed}. * The closing time of each hotel room minus the closing time of the room whose price we are trying to predict. * The number of minutes fromthe current time until each hotel roomcloses. During the seeding rounds, it wasimpossibleto know during play who our opponents were, although this information was available at the end of each game,and therefore during training. Duringthe semifinals and finals, we did know theidentities of all our competitors. Therefore, in preparation for the semifinals andfinals, we added the following features: * The number of playersplaying {ordinarily eight, but sometimes fewer,for instance if one or more playerscrashed}. * A bit for eachplayer indicating whether or not that playerparticipated in this game. Wetrained specialized predictors for predicting theprice of each type of hotel room. In other words, one predictor was specialized for predicting only the price of TT on day 1, anotherfor predicting SS on day2, etc. This would seem to require eightseparate predictors. However,the tournament game is naturally symmetric aboutits middle in the sense that we cancreate an equivalent game by exchangingthe hotel rooms on days 1 and 2 with those on days 4 and 3 {respectively}, and by exchanging the inboundflights on days1, 2, 3 and 4 withthe outbound flights on days 5, 4, 3 and2 {respectively}. Thus, withappropriate transformations, the outer days {1 and4} can be treated equivalently ,and likewise for the inner days {2 and 3}, reducing the number of specializedpredictors by half. We also createdspecialized predictors for predicting in the firstminute after flight prices hadbeen quoted but prior to receiving any hotelprice information. Thus, a total of eight specialized predictors were built {for each combination of TT versus SS, inner versusouter day, and first minute versus not first minute}. We trained our predictors to predict not theactual closing price of each room per se, but rather how much the price wouldincrease, i.e., the difference betweenthe closing price and the current price.We thought that this might be aneasier quantity to predict, and, because our predictor never outputs a negative numberwhen trained on nonnegative data, this approach also ensures that we neverpredict a closing price below the current bid. From each of the previously played games, wewere able to extract manyexamples. Specifically, for eachminute of the game and for each room thathad not yet closed, we 221Stone, Schapire,Littman, Csirik, & McAllester extracted the values of all of the features described above at that moment in the game, plus the actual closing price of the room {which we are trying to predict}. Note thatduringtraining, there is no problem extracting theclosing times of all of the rooms.During the actual play of a game, we donot know the closing times of rooms that have not yet closed. However,we do know the exact probability distributionfor closing timesof all of the rooms that have not yet closed. Therefore,to sample a vector of hotel prices, wecan first sample according to this distribution over closing times, and then use ourpredictorto sample hotel prices using these sampled closingtimes. 4.1 The Learning Algorithm Having described how we set up thelearning problem, we are now ready to describe the learning algorithm that we used.Briefly, we solved this learning problemby first reducing to a multiclass, multi-label classification problem {or alternativelya multiple logistic regression problem},and then applying boosting techniques developed by Schapire and Singer {1999, 2000} combined with a modification of boosting algorithms for logistic regression proposed by Collins, Schapire and Singer {2002}.The result is a new machine-learning algorithmfor solving conditional density estimationproblems, described in detail in the remainder ofthis section. Table 5 showspseudo-code for the entire algorithm. Abstractly, we are given pairs {x 1 ; y 1 }; : : : ; {x m ; y m } where each x i belongs to a spaceX and each y i is in R . In our case, thex i 's are the auction-specific feature vectors described above; for some n, X022 {R [ f?g} n . Each target quantity y i is the difference between closing price and current price.Given a new x, our goal is toestimate the conditional distribution of y given x. We proceed with the working assumption thatall training and test examples {x;y} are i.i.d. {i.e, drawn independently from identical distributions}.Although this assumption is false in ourcase {since the agents, including ours, are changing over time}, it seems like a reasonable approximation that greatly reduces thedifficulty of the learning task. Our first step is to reduce the estimationproblem to a classification problem by breaking the range of the y i 's into bins: [b 0 ; b 1 }; [b 1 ; b 2 }; : : : ; [b k ; b k+1 ] for some breakpoints b 0 < b 1 < 001 001 001 < b k 024 b k+1 where for our problem, we chosek = 50. 7 Theendpoints b 0 andb k+1 are chosen to be the smallest and largest y i values observed during training. We choose theremaining breakpoints b 1 ; : : : ; b k so that roughly an equal number of training labels y i fall into each bin.{More technically, breakpoints are chosen so that the entropy of thedistribution of bin frequencies is maximized.} For each of the breakpointsb j {j =1; : : : ; k}, our learningalgorithm attempts to estimate theprobability that a new y {givenx} willbe at least b j . Given such estimatesp j for each b j , we can thenestimate the probability that y isin thebin [b j ;b j+1 } byp j+1 000p j {and we can then use a constant density withineach bin}. We thus have reducedthe problem to one of estimating multipleconditional Bernoulli variables corresponding tothe event7.We did not experiment with varyingk, but expect that the algorithm is notsensitive to it for sufficiently large values of k. 222Decision-TheoreticBidding with Learned Density ModelsInput:{x 1 ; y 1 }; : : : ; {x m ; y m }where x i 2 X, y i 2 R positive integers k andT Compute breakpoints:b 0 < b 1 < 001 001 001< b k+1 where * b 0 =min i y i * b k+1 = max i y i * b 1 ; : : : ; b k chosen to minimize P k j=0 q j ln q j where q 0 ;: : : ; q k arefraction of y i 's in [b 0 ; b 1 }; [b 1 ; b 2 }; : : : ; [b k ; b k+1 ]{using dynamic programing} Boosting: * for t= 1; : : : T : 000compute weights W t {i; j } = 11 + e s j {y i }f t {x i ;j} where s j {y} is as in Eq. {10} 000 use W t to obtain base function h t :X 002 f1;: : : ; kg ! R minimizing m X i=1 k X j=1 W t {i;j }e 000s j {y i }h t {x i ;j} over all decision rules h t considered. The decision rules can take any form. Inour work, we use \decision stumps," or simplethresholds on one of the features. Output sampling rule: * let f = T X t=1 h t * let f 0 ={f+ f}=2wheref {x;j } = maxff {x; j 0 } :j 024 j 0 024 kg f{x; j } = minff {x; j 0 } : 1 024 j 0 024 j g * tosample, given x 2 X 000 let p j = 11+ e 000f 0 {x;j} 000let p 0 =1; p k+1 =0 000 choose j2 f0; : : : ; kgrandomly with probability p j 000 p j+1 000 choose yuniformly at random from [b j ; b j+1 ] 000 outputyTable5: The boosting-based algorithm forconditional density estimation. y025 b j , and for this,we use a logistic regression algorithm based on boosting techniques as described byCollinset al. {2002}. Our learning algorithmconstructs a real-valued function f: X 002 f1; : : : ; kg ! R with the interpretation that 11 + exp{000f{x; j }) {9} is our estimate of the probability thaty 025b j ,given x. The negative log likelihood of the conditional Bernoulli variable corresponding to y i being above or belowb j is then ln 020 1 + e 000s j {y i }f {x i ;j} 021 223Stone, Schapire,Littman, Csirik, & McAllester where s j {y} = { +1 ify 025 b j 0001 if y < b j . {10} We attempt tominimize this quantity for all training examples{x i ; y i } and all break- points b j . Specifically, we try to find afunction f minimizing m X i=1 k X j=1 ln 020 1 + e 000s j {y i }f {x i ;j} 021 : We use a boosting-like algorithm described by Collins et al. {2002}for minimizing ob jective functionsof exactly this form. Specifically, we build the function f inrounds. On each roundt, we add a new base functionh t : X 002f1;: : : ; kg !R . Let f t = t0001 X t 0 =1 h t 0 be the accumulating sum.Following Collins, Schapire and Singer, toconstruct each h t , we first let W t {i; j }= 11+ e s j {y i }f t {x i ;j} be a set of weights on example-breakpoint pairs. We then choose h t to minimize m X i=1 k X j=1 W t {i; j }e 000s j {y i }h t {x i ;j} {11} over some space of \simple" basefunctions h t .For this work, we considered all\decision stumps" h of the form h{x; j } = 8 > < > : A j if 036{x}025022 B j if036{x}< 022 C j if 036{x}=? where036{001} is one of the featuresdescribed above, and 022, A j , B j and C j are all real numbers. In other words, such an h simplycompares one feature 036 to a threshold022 andreturns a vectorof numbers h{x; 001} that depends only on whether036{x} is unknown {?}, orabove or below022. Schapire and Singer {2000} show how to efficiently search for the bestsuch h over all possible choices of 036, 022, A j , B j and C j .{We also employed their technique for \smoothing"A j , B j andC j .} Whencomputed by this sort of iterative procedure,Collins et al. {2002} prove the asymptotic convergence of f t to the minimum of the objective function in Equation {11} over all linear combinations of the basefunctions. For this problem, we fixedthe number of rounds toT = 300. Let f =f T +1 bethe final predictor. As noted above,given a new feature vector x, wecompute p j as in Equation{9} to be our estimate for the probability that y 025 b j , and we let p 0 = 1 and p k+1 = 0. For this to makesense, we need p 1 025 p 2 025 001 001 001 025 p k , or equivalently ,f {x; 1} 025f {x; 2} 025 001001 001 025 f {x;k}, a condition that may not holdfor the learned function f . To force this condition, we replace f by a reasonable {albeitheuristic} approximation f 0 that is nonincreasing in j ,namely, 224Decision-TheoreticBidding with Learned Density Models f 0 = {f+ f}=2 wheref {respectively, f} is the pointwise minimum {respectively, maximum} of all nonincreasing functions gthateverywhere upper bound f{respectively, lower bound f }. With this modifiedfunction f 0 , we cancompute modified probabilities p j . To sample a single point according to the estimateddistributionon R associated withf 0 , we choose bin [b j ;b j+1 } with probabilityp j 000 p j+1 , and then select a point from this bin uniformly at random. Expected value according to this distributionis easily computed as k X j=0 {p j 000p j+1 } 022 b j+1 + b j2 : Although we present results using thisalgorithm in the trading agent context, we did not test its performance on more generallearning problems, nor did we compare to other methods for conditional density estimation, such as those studied by Stone {1994}.This clearly should be an area forfuture research. 5. ATT ac-2001 Having described hotel price predictionin detail, we now present the remainingdetails of ATT ac-2001'salgorithm. We begin with a briefdescription of the goods allocator, which is used as a subroutine throughout the algorithm.We then present the algorithm in a top-down fashion. 5.1Starting Point A core subproblemforTAC agents is the allocation problem:finding the most profitable allocationof goods to clients, G 003 , given a set of owned goods and prices for all other goods. The allocation problem corresponds tofindingopt{H; i; Y }in Equation 3. We denote the value of G 003 {i.e., the score one would attainwith G 003 } asv{G 003 }.The general allocation problemis NP-complete, as it is equivalentto the set-packing problem {Garey & Johnson, 1979}. However itcan be solved tractably in TAC via integer linearprogramming {Stone et al., 2001}. The solution to the integer linear programis a value-maximizing allocation of owned resources to clients along with a list ofresources that need to be purchased.Using the linear programming package \LPsolve", ATT ac-2001is usually able to find the globally optimal solution in under 0.01 seconds on a 650 MHz Pentium II. However, sinceinteger linear programming is anNP-complete problem, some inputs can lead to a greatdeal of search over the integrality constraints, andtherefore significantly longersolution times. When only v{G 003 } is needed{as opposed to G 003 itself }, the upper bound producedby LPsolve prior to the search over the integrality constraints, known as theLP relaxation, can be used as an estimate. The LP relaxation can alwaysbe generated very quickly. Note that this is not by any means the only possible formulation of the allocation problem. Greenwald and Boyan {2001} studied a fast, heuristic search variant and found that it performedextremely well on a collection of large, randomallocation problems. Stone etal. {2001} used a randomized greedy strategy as afallback for the cases in which the linear program took too long to solve. 225Stone, Schapire,Littman, Csirik, & McAllester 5.2Overview Table 6 shows ahigh-level overview of ATT ac-2001. The italicized portions are described in the remainder of this section.When the first flight quotesare posted: * ComputeG 003 with current holdingsand expected prices * Buy the flights in G 003 for which expectedcost of postponing commitment exceedsthe expected benefitof postponing commitment Starting 1minute before each hotel close: * Compute G 003 with current holdings and expectedprices * Buy the flights in G 003 for whichexpected cost of postponing commitmentexceeds expected benefitof postponing commitment {30seconds} * Bidhotel roomexpected marginal values givenholdings, new flights, and expected hotel purchases {30 seconds} Last minute: Buy remaining flights as needed by G 003 In parallel {continuously}:Buy/sell entertainment tickets based ontheir expected valuesTable 6: ATT ac-2001's high-level algorithm. The italicized portions are described in the remainderof this section. 5.3 Costof Additional Rooms Our hotel pricepredictor described in Section 4 assumes thatATT ac-2001's bids do not affect the ultimate closing price {Assumption 3 fromSection 2}. This assumption holds in a large economy. However in TAC, each hotel auction involved 8 agents competing for 16 hotel rooms.Therefore, the actions of each agent had anappreciable effect on the clearing price: the more hotel rooms an agent attempted topurchase, the higher the clearing price would be, all other things being equal.This effect needed to be taken intoaccount when solving the basic allocationproblem. The simplifiedmodel used byATT ac-2001 assumed that thenth highest bid in a hotel auctionwas roughly proportional toc 000n {over theappropriate range of n} for somec025 1. Thus, if the predictor gave a price of p, ATT ac-2001only used this for purchasing two hotel rooms {the \fair" share of a single agent of the 16 rooms}, and adjusted prices for other quantities of rooms by usingc. For example, ATT ac-2001 would consider the cost ofobtaining 4 rooms to be 4pc 2 . One or two rooms each cost p, but 3 each cost pc, 4 each cost pc 2 , 5 each cost pc 3 , etc. So in total,2 rooms cost 2p, while 4 cost4pc 2 . Thereasoning behind this procedure is that ifATT ac-2001 buys two rooms - its fair share given that there are 16 rooms and 8 agents, then the 16th highest bid {ATT ac-2001's2 bids in addition to 14 others} sets the price.But if ATT ac-2001 bids on an additional unit, the previous 15thhighest bid becomes the price-setting bid:the price for all rooms sold goes up from p to pc. Theconstant c was calculated from the data of several hundred games during the seeding round. In each hotel auction, the ratioof the 14th and 18th highest bids {reflecting the 226Decision-TheoreticBidding with Learned Density Models most relevant range of n} was taken as an estimate ofc 4 , and the {geometric}mean of the resulting estimates was takento obtain c = 1:35. The LP allocator takes these priceestimates into account when computingG 003 by as- signing higher costs to larger purchase volumes, thus tending to spread out ATT ac-2001's demand over thedifferent hotel auctions. In ATT ac-2001, a few heuristics were appliedto the above procedure to improvestability and to avoid pathological behavior: prices below $1 were replacedby $1 in estimating c; c= 1 was used for purchasing fewerthan two hotel rooms; hotel rooms weredivided into early closing and lateclosing {and cheap and expensive} ones, andthe c values from the corresponding subsets of auctions of the seedingrounds were used in each case. 5.4 Hotel Expected Marginal Values Using the hotel price prediction module as described in Section 4, coupled with amodel of its own effect on the economy, ATT ac-2001 is equipped todetermine its bids for hotel rooms. Every minute, for each hotel auction thatis still open, ATT ac-2001assumes that auction will close next andcomputes the marginal value of that hotel room given the predicted closing prices ofthe other rooms. If the auction does notclose next, thenit assumes thatit will have a chance to revise its bids.Since these predicted prices are represented as distributionsof possible future prices,ATT ac-2001 samples from these distributionsand averages the marginal values to obtain an expected marginal value. Using the full minute between closing times for computation {or30 seconds if there are stillflights toconsider too}, ATT ac-2001divides the available time among thedifferent open hotel auctions and generates as many price samples as possiblefor each hotel room. In the end,ATT ac-2001 bids the expectedmarginal values for each of the rooms. The algorithm is described precisely and withexplanation in Table 7. Oneadditional complication regarding hotel auctions isthat, contrary to one of our assumptions in Section 2.2 {Assumption 4},bids are not fully retractable: theycan only be changed to $1 abovethe current ask price. In the case thatthere are current active bidsfor goods that ATT ac-2001no longer wants that are less than $1 above the current ask price,it may be advantageous to refrain from changing the bid in the hopes that the ask price will surpass them: that is, thecurrent bid may have a higher expected value than the best possible newbid. To address this issue, ATT ac-2001 samples from the learned price distribution to find the average expected values of the current and potential bids, and only enters a new bid inthe case that the potential bid is better. 5.5 Expected Cost/Benefit of Postponing Commitment ATT ac-2001makes flight bidding decisions based on acost-benefit analysis: in particular, ATT ac-2001 computes the incrementalcost of postponing bidding for a particularflight versus the valueof delaying commitment. In this section, we describe the determination of the cost of postponing the bid. Due to difficulties that compounded withmore sophisticated approaches, ATT ac-2001 used the following very simple modelfor estimating the price of a flight ticketat a given future time. Itis evident from the formulation that|giveny|the expected price increase from time 0 to time t was very nearlyof the form M t 2 for some M . It was alsoclear that 227Stone, Schapire,Littman, Csirik, & McAllester * For each hotel {inorder of increasingexpected price}: * Repeat until time bound 1. Generate a random hotel closing order{only over open hotels} 2.Sample closing prices from predicted hotelprice distributions 3. Giventhese closing prices, compute V 0 ; V 1 ; : : : V n 000V i 021 v{G 003 } if owningi of the hotel 000 Estimatev{G 003 } with LPrelaxation 000 Assume that noadditional hotel rooms of this type can bebought 000 Forother hotels, assumeoutstanding bids abovesampled price are already owned {i.e., they cannot be withdrawn}. 000 Note that V 0 024 V 1 024 : : : 024 V n : the valuesare monotonicallyincreasing since having more goods cannot be worse in termsof possible allocations. * The value of the ith copy ofthe room is the mean of V i 000 V i0001 over all the samples. * Note further that V 1 000 V 0 025 V 2 000 V 1 : : : 025 V n 000 V n0001 : the value differencesare monotonically decreasing since eachadditional room will be assigned to the client who can derive the most value from it. * Bidfor one room at the value of theith copy of the room for alli such that the value is at least as much as the current price.Due to the monotonicity noted in the stepabove, no matterwhat the closingprice, the desired number of rooms at thatprice will be purchased.Table 7: The algorithm forgenerating bids for hotel rooms. as long as the price did not hit the artificialboundaries at $150 and $800, the constantM must depend linearly ony 000 10. This linear dependence coefficient was then estimated from several hundred flight price evolutionsduring the qualifying round. Thus,for this constant m,the expected price increase from timet to time T was m{T 2 000t 2 }{y00010}. When a priceprediction was needed, this formula was first used for the first and most recent actual price observations to obtain a guess fory, and then this y was used in theformula again to estimate the future price.No change was predicted if the formulayielded a price decrease. This approachsuffers from systemic biases of variouskinds {mainly due to the fact that the variance of price changes getsrelatively smaller over longer periods oftime}, but was thought to be accurateenough for its use, which was to predictwhether or not the ticket can be expected to get significantlymore expensive over the next few minutes. In practice,during TAC-01, ATT ac-2001started with the flight-lookahe adparameterset to 3 {i.e., cost of postponing is the average ofthe predicted flight costs 1, 2, and 3 minutes in the future}. However,thisparameter was changed to 2 by the endof the finals in order to causeATT ac-2001 to delay its flightcommitments further. 5.5.1 ExpectedBenefit of Postponing Commitment Fundamentally , the benefit of postponingcommitments to flights is that additional infor- mation about the eventual hotel pricesbecomes known. Thus, the benefit of postponing commitment is computed bysampling possible future price vectors anddetermining, on average, how much better the agent could do if it bought adifferent flight instead of the one in question. If it is optimal to buythe flight in all future scenarios, then thereis no value in delaying thecommitment and the flight is purchasedimmediately. However, if 228Decision-TheoreticBidding with Learned Density Models there are many scenarios in which theflight is not the best one to get, the purchase is more likely to be delayed. The algorithm for determining the benefitof postponing commitment is similar to that for determining the marginal valueof hotel rooms. It is detailed, withexplanation, in Table 8. * Assume we'reconsidering buying n flights of a given type * Repeat until time bound 1. Generate a random hotel closingorder {open hotels} 2. Sampleclosing prices from predicted price distributions{open hotels} 3. Giventhese closing prices compute V 0 ; V 1 ; : : : V n 000 V i =v{G 003 } if forced to buy i of the flight 000 Estimate v{G 003 } with LP relaxation 000 Assume more flights can bebought at the current price 000 Note that V 0 025 V 1 025 : : : 025 V n since it is never worse toretain extra flexibility. * The value of waiting to buycopy i is the mean of V i 000 V i0001 over all thesamples. If all price samples lead tothe conclusion that the ith flight shouldbe bought, then V i = V i0001 and there is no benefit to postponing commitment.Table 8: The algorithm for generating value of postponing flight commitments. 5.6 Entertainment Expected Values The core of ATT ac-2001's entertainment-ticket-bidding strategy isagain a calculation of the expected marginalvalues of each ticket. For each ticket, ATT ac-2001computes the expected valueof having one more and one fewer of the ticket. These calculations give bounds on the bid and ask prices it is willing to post. The actual bid and ask prices are alinear function of time remaining in thegame: ATT ac-2001 settles for a smaller and smaller profit from ticket transactions as the game goes on.Details of the functions of bid and ask priceas a function of game time and ticket value remained unchanged from ATT ac-2000 {Stone et al., 2001}. Details of the entertainment-ticket expected marginal utility calculations are given in Table 9. 6. Results This section presents empirical resultsdemonstrating the effectiveness of theATT ac-2001 strategy .First, we summarize its performance in the2001 and 2002 Trading Agent Com- petitions {TACs}. Thesesummaries provide evidence of the strategy's overall effectiveness, but, due to the smallnumber of games in the competitions, areanecdotal rather than sci- entificallyconclusive. We then present controlledexperiments that provide more conclusive evidence of the utility of our decisiontheoretic and learning approaches embedded within ATT ac-2001. 229Stone, Schapire,Littman, Csirik, & McAllester * Assume n of a giventicket type are currently owned * Repeat until time bound 1. Generate a random hotel closing order{open hotels} 2. Sampleclosing prices from predicted price distributions{open hotels} 3. Given these closingprices compute V n0001 ; V n ; V n+1 000V i = v{G 003 } if owni of the ticket 000Estimate v{G 003 } with LP relaxation 000Assume no other tickets can be bought or sold 000 Note thatV n0001 024V n 024 V n+1 since it is never worse to own extra tickets. * The value of buying a ticket is the mean of V n+1 000 V n over all the samples; the value of selling is the mean ofV n 000 V n0001 . * Since tickets are considered sequentially, if the determined buy or sell bidleads to a price that would clear according to the current quotes, assume the transaction goes throughbefore computing the values of buying and selling other ticket types.Table 9:The algorithm for generating valueof entertainment tickets. 6.1 TAC-01 Competition Of the 19 teams that entered thequalifying round, ATT ac-2001 was oneof eight agents to make it to thefinals on the afternoon of October 14th, 2001.The finals consisted of 24 games amongthe same eight agents. Rightfrom the beginning, it became clear that livingagents {Fritschi& Dorer, 2002} was the team to beat in thefinals. They jumped to an early lead in the first two games, and by eight games into the round, they were morethan 135 points per game ahead oftheir closest competitor {SouthamptonTAC, He & Jennings, 2002}. 16games into the round, they were more than 250points ahead of their two closest competitors {ATT ac-2001and whitebear}. From that point, ATT ac-2001, which was continually retraining itsprice predictors based on recent games, began making a comeback. By the time the last game was to be played, it was only an average of 22 pointsper game behind livingagents.It thus needed to beat livingagents by 514 pointsin thefinal game to overtake it, well withinthe margins observed in individual gameinstances. As the game completed, ATT ac-2001's score of 3979 was one of the first scores to be posted bythe server. The other agents' scores werereported one by one, until only thelivingagents score was left. Afteragonizing seconds {at least for us}, the TAC server posted a finalgame score of 4626, resultingin a win forlivingagents. After the competition, theTAC team at the University of Michigan conducted a regres- sion analysis of the effects of the client profiles on agentscores. Using data from the seeding rounds, it was determined that agents did better when their clients had: 1.fewer total preferred travel days; 2. higher total entertainment values; and 3. a higher ratio ofouter days {1 and 4} to inner {2 and 3} in preferred trip intervals. 230Decision-TheoreticBidding with Learned Density Models Based on these significant measures, thegames in the finals could be handicapped based on each agents' aggregate clientprofiles. Doing so indicated thatlivingagents' clients were much easier to satisfy than those ofATT ac-2001, giving ATT ac-2001the highest handicapped score.The final scores, as well as the handicapped scores, are shown in Table10. Complete results and affiliations areavailable from http://tac.eecs.umich.edu.AgentMeanHandicapped scoreATT ac-200136224154livingagents36704094whitebear35133931Urlaub0134213909Retsina33523812CaiserSose30743766SouthamptonT AC32533679T acsMan28593338T able10: Scores during the finals. Eachagent played 24 games. Southampton'sscore was adversely affected by agame in which their agent crashed after buyingmany flights but no hotels, leadingto a loss of over 3000 points.Discarding that game results in an average score of 3531. 6.2 TAC-02 Competition A year afterthe TAC-01 competition, ATT ac-2001 was re-entered in the TAC-02 competition using the modelstrained at the end of TAC-01.Specifically, the price predictors wereleft unchanged throughout {no learning}.The seeding round included 19 agents, eachplaying 440 games over the course ofabout 2 weeks. ATT ac-2001was the top-scoring agent in this round, as shown in Table11. Scores in the seeding round were weighted so as to emphasize later results over earlier results: scores on dayn of the seeding round were given a weight of n. Thispractice was designed to encourage experimentation early in the round. The official ranking in the competitions was based on the mean score after ignoring each agent's worst 10 results so as to allow for occasional program crashes and network problems. On the one hand, it is striking thatATT ac-2001 was able to finish sostrongly in a field of agents that hadpresumably improved over the course of theyear. On the other hand, most agents were being tuned, for better and for worse, whileATT ac-2001 was consistent throughout. In particular, we are toldthat SouthamptonTAC experimented withits approach during the later days ofthe round, perhaps causing it to fall out of the lead {by weighted score} in the end.During the 14-game semifinal heat,ATT ac-2001, which was nowrestored with its learning capability andretrained over the data from the 2002 seedinground, finished 6th out of 8 therebyfailing to reach the finals. Thereare a number of possible reasons for thissudden failure. One relatively mun- dane explanation is that the agent had tochange computational environments betweenthe seeding rounds and the finals, andthere may have been a bug or computationalresource constraint introduced.Another possibilityis that due to the smallnumber of games in 231Stone, Schapire,Littman, Csirik, & McAllesterAgentMeanWeighted,droppedworst 10ATT ac-200130503131SouthamptonT AC31003129UMBCT AC29803118livingagents30183091cuhk29983055Thalis29523000whitebear29452966RoxyBot27382855T able11: Top 8 scores duringthe seeding roundof TAC-02. Each agent played 440 games, with its worst 10 games ignoredwhen computing the rankings. thesemifinals, ATT ac-2001 simply got unlucky with respect to clients and the interaction ofopponent strategies. However,it is also plausiblethat the training data fromthe 2002 qualifying and seeding round datawas less representative of the 2002 finalsthan the was the training data from 2001;and/or that the competing agents improvedsignificantly over the seeding roundwhile ATT ac-2001 remained unchanged.The TAC team at the University of Michigan has done a study of the pricepredictors of several 2002 TACagents that suggests that the bug hypothesis is most plausible: the ATT ac-2001 predictor from 2001 outperforms all other predictors from 2002 on the datafrom the 2002 semifinalsand finals; and oneother agent that uses the 2002 data didproduce good predictions based on that data{Wellman, Reeves, Lochner, & Vorobeychik, 2003b}. 8 6.3 Controlled Experiments ATT ac-2001's success in the TAC-01 competition demonstrates its effectiveness as a complete system. However,since the competing agents differed along several dimensions, the compe- tition resultscannot isolate the successful approaches.In this section, we report controlled experimentsdesigned to test the efficacy of ATT ac-2001's machine-learning approach to price prediction. 6.3.1Varying the Predictor In the first set of experiments, weattempted to determine how the quality ofATT ac-2001's hotel pricepredictions affects its performance. To this end, we devised seven price prediction schemes, varying considerably insophistication and inspired by approaches takenby other TAC competitors, andincorporated these schemes into our agent.We then played these seven agents against one another repeatedly, with regular retraining as described below. Following are the seven hotelprediction schemes that we used, in decreasingorder of sophistication:8. Indeed, in the TAC-03competition, ATT ac-2001 was enteredusing the trained models from 2001, and it won the competition, suggesting further thatthe failure in 2002 was due to a problem withthe learned models that were used duringthe finals in 2002. 232Decision-TheoreticBidding with Learned Density Models * ATT ac-2001 s : This is the \full-strength" agentbased on boosting that was used during the tournament. {The sdenotes sampling.} * Cond'lMean s : This agent samples pricesfrom the empirical distribution of prices from previously played games, conditionedonly onthe closing time of the hotel room {a subset of of the features used byATT ac-2001 s }.In other words, it collects all historical hotel prices and breaks them down by thetime at which the hotel closed {as well as room type, as usual}. Theprice predictor then simply samples from thecollection of prices corresponding to the given closing time. * SimpleMean s : This agent samples prices from the empirical distribution of prices from previously played games, withoutregard tothe closing time of the hotel room {but still broken down by room type}. It uses a subset of the features used by Cond'lMean s . * ATT ac-2001 ev , Cond'lMean ev , SimpleMean ev : These agents predict in the same way as their corresponding predictors above, but instead of returning a random samplefrom the estimated distributionof hotelprices, they deterministicallyreturn the expected value of the distribution. {Theev denotes expected value,as introduced in Sec- tion2.1.} * CurrentBid:This agent uses a very simple predictor thatalways predicts that the hotel room will close at its current price. In every case, whenever the price predictorreturns a price that is below the currentprice, we replace it with the currentprice {since prices cannot go down}. In our experiments, we added as an eighth agent EarlyBidder, inspired by thelivingagents agent. EarlyBidderused SimpleMean ev to predict closing prices, determined an optimalset of purchases, and then placed bidsfor these goods at sufficiently high pricesto ensure that they would be purchased {$1001 for all hotel rooms, just aslivingagents did in TAC-01}right after the first flight quotes.It then never revised these bids. Each of these agents require training,i.e., data from previously played games.However, we are faced with a sortof \chicken and egg" problem: torun the agents, we need to first train the agents using data fromgames in which they were involved, butto get this kind of data, we need tofirst run the agents. To get aroundthis problem, we ran the agents inphases. In Phase I, which consisted of 126games, we used training data from the seeding, semifinalsand finals rounds of TAC-01. In Phase II, lasting 157 games, we retrained the agents once every sixhours using all of the data from the seeding,semifinals and finals rounds as wellas all of the games played in Phase II.Finally, in Phase III, lasting 622 games, we continued to retrain theagents once every six hours, but now usingonly data from games played during PhasesI and II, and not including data from theseeding, semifinals and finals rounds. Table 12 shows how the agents performed in each of these phases. Muchof what we observe in this table isconsistent with our expectations. Themore sophisticated boosting- based agents{ATT ac-2001 s and ATT ac-2001 ev } clearly dominated the agents based onsimpler prediction schemes. Moreover,with continued training, these agents improved markedly relative toEarlyBidder. We also see the performance of the simplest agent, CurrentBid, which 233Stone, Schapire,Littman, Csirik, & McAllesterAgentRelative ScorePhase IPhase IIPhase IIIATT ac-2001 ev105:2006 49:5 {2}131:6 006 47:7 {2}166:2006 20:8 {1}ATT ac-2001 s27:8 006 42:1{3}86:1 00644:7 {3}122:3 006 19:4 {2}EarlyBidder140:3 006 38:6 {1}152:8 006 43:4 {1}117:0 006 18:0 {3}SimpleMean ev00028:8 006 45:1 {5}00053:9 006 40:1 {5}00011:5 00621:7 {4}SimpleMean s00072:0 006 47:5 {7}00071:6 00642:8 {6}00044:1 006 18:2{5}Cond'lMean ev8:6 006 41:2 {4}3:5 006 37:5 {4}00060:1 00619:7 {6}Cond'lMean s000147:5 006 35:6 {8}00091:4006 41:9 {7}00091:1 00617:6 {7}CurrentBid00033:7006 52:4 {6}000157:1 00654:8 {8}000198:8 006 26:0{8}Table 12: The average relative scores{006 standard deviation} for eight agents in the three phases of our controlled experiment in which the hotel prediction algorithm was varied. The relative score ofan agent is its score minus the averagescore of all agents in that game.The agent's rank within each phase is shown in parentheses. does not employany kind of training, significantly declinerelative to the other data-driven agents. On the other hand, there are some phenomena in this table that were verysurprising to us. Most surprisingwasthe failure of sampling to help. Ourstrategy relies heavily not only onestimating hotel prices, but also taking samples fromthe distributionof hotel prices.Yet these results indicate that using expected hotel price, rather than price samples, consistently performs better.We speculate that this may be because an insufficient number of samples are being used {due tocomputational limitations} so that the numbersderived from these samples have too high a variance. Another possibilityis that the method of using samples toestimate scores consistently overestimates the expected score because it assumes the agent can behave with perfect knowledge for each individual sample|a propertyof our approximation scheme.Finally, as our algorithm uses sampling atseveral different points {computing hotel expected values,deciding when to buy flights, pricingentertainment tick- ets, etc.}, it is quite possible that sampling is beneficial for somedecisions while detrimental forothers. For example, when directly comparingversions of the algorithm with sampling used at only subsets of the decision points, the data suggests that sampling for the hotel decisions is most beneficial, while samplingfor the flights and entertainment ticketsis neu- tral at best, and possiblydetrimental. This result is not surprisinggiven that the sampling approach is motivated primarilyby the task of bidding forhotels. We were also surprised thatCond'lMean s and Cond'lMean ev eventually performed worse than the less sophisticated SimpleMean s and SimpleMean ev . One possible explanation isthat the simpler model happens to give predictions that are just as good as themore complicated model, perhaps becauseclosing time is not terribly informative,or perhaps because the adjustmentto price based on current price is moresignificant. Other things being equal, the simpler model has the advantage that its statistics are based on all of the price data, regardless of closing time,whereas the conditional model makes eachpredictionbased on only an eighth ofthe data {since there are eight possible closing times, each equally likely}. In addition to agent performance, it is possible to measure the inaccuracy of the eventual predictions, at least for the non-samplingagents. For these agents, we measuredthe root 234Decision-TheoreticBidding with Learned Density Models mean squared error of the predictions made inPhase III. These were: 56.0 forATT ac-2001 ev , 66.6 for SimpleMean ev , 69.8 for CurrentBid and 71.3 forCond'lMean ev . Thus,we see that the lower the error ofthe predictions {according to this measure},thehigher the score {correlationR= 0000:88}. 6.3.2 ATT ac-2001 vs.EarlyBidder In a sense, the twoagents that finished at the top of the standings in TAC-01 represented oppositeends of a spectrum. The livingagentsagent uses a simple open-loop strategy, com- mitting to a set of desired goods right at the beginning of the game,while ATT ac-2001 uses a closed-loop, adaptive strategy. The open-loop strategy relies on the other agents to stabilize the economy and create consistent final prices. In particular, if all eight agents are open loop and placevery high bids for the goods theywant, many of the prices will skyrocket, evaporating any potential profit. Thus, a set of open-loopagents would tend to get negative scores|theopen-loop strategy is a parasite, in amanner of speaking. Table 13 shows theresults of running 27 games with 7 copiesof the open-loop EarlyBidder and one ofATT ac-2001. Although motivated by livingagents, in actuality it is identical to ATT ac-2001except that it uses SimpleMean ev and it places all ofits flight and hotel bids immediately after thefirst flight quotes. It bids only for the hotels that appear inG 003 at that time.All hotel bids are for $1001. Inthe experiments, one copy ofATT ac-2001 s isincluded for comparison. The price predictors are all from Phase I in the preceding experiments. EarlyBidder's high biddingstrategy backfires and it ends up overpaying significantly for its goods.As our experiments above indicate, ATT ac-2001 may improve even further if it is allowed to train on thegames of the on-going experiment as well.AgentScoreUtilityATT ac-20012431 006 4648909 006 264EarlyBidder{7}0004880006 3379870 00634Table 13: The results of runningATT ac-2001 against 7 copies ofEarlyBidder over the course of 27 games. EarlyBidderachieves high utility, but overpays significantly, resulting in low scores. The open-loop strategyhas the advantage of buying a minimal setof goods. That is, it never buysmore than it can use. On the other hand, itis susceptible to unexpected prices in that it can get stuck paying arbitrarilyhigh prices for the hotel rooms it has decidedto buy. Notice in Table 13 that the average utilityof theEarlyBidder's clients is significantly greater than that of ATT ac-2001's clients. Thus, the difference inscore is accounted for entirelyby the cost of the goods. EarlyBidderends up paying exorbitant prices, whileATT ac- 2001 generally steers clearof the more expensive hotels. Itsclients' utility suffers, but the cost-savings are well worth it. Compared to the open-loop strategy, ATT ac-2001's strategyis relatively stable against itself. Itsmain drawback is that as it changes itsdecision about what goods it wants andas 235Stone, Schapire,Littman, Csirik, & McAllester it may alsobuy goods to hedge against possible price changes, it can end up getting stuck paying for some goods that are ultimatelyuseless to any of its clients. Table 14 shows the results of 7 copiesof ATT ac-2001 playing against eachother and one copy of theEarlyBidder. Again, training is from theseeding round and finals of TAC- 01: the agents don't adapt during the experiment. Included in this experiment arethree variants of ATT ac-2001, each with a differentflight-lookahe ad parameter {from thesection on \cost of postponing flightcommitments"}. There were three copies each of the agents with flight-lookahe ad setto 2 and 3 {ATT ac-2001{2} and ATT ac-2001{3}, respectively}, and oneATT ac-2001 agent withflight-lookahe ad set to 4 {ATT ac-2001{4}).AgentScoreUtilityEarlyBidder2869 0066910079 006 55ATT ac-2001{2}2614 006 389671 006 32ATT ac-2001{3}2570 006399641 006 32ATT ac-2001{4}2494 006 689613 006 55Table 14:The results of runningthe EarlyBidderagainst 7 copies of ATT ac-2001over the course of 197 games.The three different versions ofATT ac-2001 had slightly different flight-lookahe ads. F rom the results in Table 14 it is clear that ATT ac-2001does better when committing to its flight purchases later in the game {ATT ac-2001{2} as opposed toATT ac-2001{4}). In comparison with Table 13, the economyrepresented here does significantly better overall. That is, having many copies ofATT ac-2001 in the economy does notcause them to suffer. However, inthis economy, EarlyBidder is able toinvade. It gets a significantly higherutility for its clients and only pays slightly more than the ATT ac-2001agents {as computed by utilityminus score}. 9 Theresults in this section suggest that the variance of the closing prices is the largest determining factor between the effectiveness of the two strategies {assuming nobody else is using the open-loop strategy}.We speculate that with large price variances, the closed- loop strategy {ATT ac-2001} should do better, butwith small price variances, the open-loop strategy could do better. 7. Discussion The open-loop andclosed-loop strategies of the previous sectiondiffer in their handling of pricefluctuation. A fundamental way of takingprice fluctuation into account is to place \safe bids." A very high bid exposesan agent to the danger of buying something at a ridiculously high price. Ifprices are in fact stable then high bids are safe.But if prices fluctuate, then highbids, such as the bids of the stable-pricestrategy, are risky. In TAC, hotel rooms are sold in a Vickrey-style nth price action. Thereis a separate auction for eachday of each hotel and these auctions are donesequentially. Although the order of the auctions is randomized, and not known tothe agent, when placing bids in one of these9. We suspectthat were the agents allowed to retrain over the course of the experiments,ATT ac-2001 would end up improving, as we saw in Phase III of theprevious set of experiments. Werethis to occur, it is possible thatEarlyBidder would no longer be able to invade. 236Decision-TheoreticBidding with Learned Density Models auctions the agent assumes that auction willclose next. We assumed in the design ofour agent that our bids in one auction donot affect prices in other auctions.This assumption is not strictly true, butin a large economy one expects that the bidsof a single individual havea limited effect on prices. Furthermore,the price most affected by a bid is the price of the item being bid on; the effect on other auctions seems less direct and perhaps more limited. Assuming bids in one auction donot affect prices in another, the optimal bidding strategy is the standard strategy for a Vickrey auction|the bid for an item should be equal to its utility to the bidder.So, to place a Vickrey-optimal bid, one mustbe able to estimate the utility of an item. The utility of owning an item issimply the expected final score assuming one owns the item minus the expected final score assuming one does not ownthe item. So, the problem of computing a Vickrey-optimal bid can be reduced to theproblem of predicting final scores for two alternative game situations. Weuse two score prediction procedures,which we call the stable-price score predictor{corresponding to Equation 5} andthe unstable-price score predictor {Equation 4}. The Stable-Price Score Predictor.The stable-price score predictor first estimates the expected prices in the rest of thegame using whatever information is available in the givengame situation.It then computes the value achieved by optimal purchases under the estimatedprices. In an economy with stable prices,this estimate willbe quite accurate| if we make the optimal purchases forthe expected price then, if the prices are nearour estimates, our performance will also be near the estimated value. The Unstable-Price Score Predictor.Stable-price score prediction does not take into account the ability of the agent to react to changes in price as the gameprogresses. Sup- pose a given roomis often cheap but is sometimes expensive.If the agent can first determine the price of the room, and then plan forthat price, the agent willdo better thanguessing the price ahead of time and sticking to the purchases dictated by that price.The unstable price predictor uses a model of the distribution of possible prices.It repeatedly samples pricesfrom this distribution, computes the stable-price scoreprediction under the sampled price, and thentakes the average of these stable-price scoresover the various price samples. This score prediction algorithm is similar to the algorithm used in Ginsberg's {2001}quite successful computer bridge program wherethe score is predicted by sampling the possible hands of the opponent and, for each sample, computing the score of optimal play inthe case where all players havecomplete information {double dummy play}.While this approach has a simple intuitive motivation, it is clearly imperfect.The unstable-price score predictor assumes both that future decisions are made in the presence of complete price information, and that the agent is free to changeexisting bids in auctions that have not yetclosed. Both of these assumptions are onlyapproximately true at best. Waysof compensating for the imperfectionsin score prediction were described in Section 5. Buy Now or Decide Later.The trading agent must decide what airlinetickets to buy and when to buy them.In deciding whether to buy an airline ticket,the agent can compare the predicted score in the situation where it owns the airline ticket with the predicted score in thesituation where it does not own the airline ticket but may buy it later. Airlinetickets tend to increase in price, soif the agent knows that a certain ticketis needed it should buy it as soon aspossible. But whether or not a given ticket is desirable may depend on the price of hotel rooms, which may becomeclearer as the game progresses. If airline tickets did not increase in price, as wasthe case in TAC-00, thenthey should bebought at the 237Stone, Schapire,Littman, Csirik, & McAllester lastpossiblemoment {Stone et al., 2001}. To determine whether an airline ticket shouldbe bought now or not, one cancompare the predicted score in the situation whereone has just bought the ticket atits current price with the predicted score in thesituation where the price of the ticketis somewhat higher but has not yet been bought. It is interesting to note that if one uses the stable-price score predictorfor both of these predictions, and the ticket is purchased in the optimal allocationunder the current price estimate, then the predicted score for buying the ticket now willalways be higher|increasing the price of theticket can only reduce the score.However, the unstable-price score predictorcan yield an advantage fordelaying the purchase. This advantage comes from the fact that buying the ticket may be optimal under some pricesbut not optimal under others. If the ticket has not yet been bought, then thescore will be higher for those sampled priceswhere the ticket shouldnot be bought. This corresponds to the intuition thatin certain cases the purchase should be delayed until more information is available. Our guiding principle in the design of the agent was, to the greatest extent possible, to have the agentanalytically calculate optimal actions. Akey component of these calculations is the score predictor, based either on asingle estimated assignment of prices or on a model of the probability distribution overassignments of prices. Both score predictors,though clearly imperfect, seem useful.Of these two predictors, only theunstable-price predictor can be usedto quantitatively estimate the valueof postponing a decision until more information is available. The accuracy ofprice estimation is clearly of central importance.Future research willundoubtedly focus on ways of improving both price modeling and score prediction based on price modeling. 8.Related and Future Work Although there has been a good dealof research on auction theory, especiallyfrom the perspective of auction mechanisms {Klemperer, 1999}, studies of autonomousbiddingagents and their interactions are relatively few and recent. TACis one example. FM97.6 is anotherauction test-bed, which is based on fishmarket auctions {Rodriguez-Aguilar, Martin, Noriega, Garcia, & Sierra, 1998}.Automatic bidding agents have also beencreated in this domain {Gimenez-Funes, Godo, Rodriguez-Aguiolar, & Garcia-Calves, 1998}. There have beena number of studies of agents biddingfora single good in multiple auctions {Ito, Fukuta, Shintani,& Sycara, 2000; Anthony, Hall, Dang, & Jennings, 2001;Preist,Bartolini, & Phillips,2001}. A notable auction-based competition that was held prior to TAC was the Santa Fe Double Auction Tournament{Rust, Miller, & Palmer, 1992}. Thisauction involved several agentscompeting in a single continuous double auction similar to the TAC entertainment ticket auctions. As analyzed by Tesauro and Das {2001}, this tournament was won by a parasite strategy that, like livingagents as described in Section 6.3,reliedon other agents to find a stableprice and then took advantageof it to gain an advantage. Inthat case, the advantagewas gained by waiting until the last minute to bid, a strategy commonly known as sniping. TAC-01was the second iteration of the Trading Agent Competition. Therules of TAC- 01 are largely identical to those of TAC-00, withthree important exceptions: 1. In TAC-00, flight prices did not tend toincrease; 238Decision-TheoreticBidding with Learned Density Models 2. In TAC-00, hotel auctions usuallyall closed at the end of the game; 3. In TAC-00, entertainment tickets were distributeduniformly to all agents While minor on the surface, the differencessignificantly enriched the strategic complexity of the game. In TAC-00,most of the designers discovered that a dominant strategy was to defer all serious biddingto the end of the game. A result, the focus was on solving the allocation problem,with most agents usinga greedy,heuristicapproach. Since the hotel auctions closed at the end of the game,timing issues were also important, with significant advantages going toagents that were able to bidin response tolast-second price quotes {Stone & Greenwald, 2003}. Nonetheless, many techniques developed in 2000 were relevant to the 2001 competition: theagent strategies put forth in TAC-00were important precursors to the secondyear's field, for instance as pointed out in Section 5.1. Predicting hotel clearing prices was perhaps the most interesting aspect of TAC agent strategies in TAC-01, especially in relation to TAC-00 where the last-minute bidding created essentially a sealed-bid auction.As indicated by our experiments describedin Section 6.3, there are many possible approaches to this hotel price estimation problem,and the approach chosen can havea significant impact on the agent's performance. Among those observed in TAC-01 are the following {Wellman, Greenwald, Stone, & Wurman,2002}, associated in some cases with theprice-predictor variant in our experimentsthat was motivated by it. 1. Just use the current price quotep t {CurrentBid}. 2. Adjust based on historic data.For example, if 001 t is the average historicaldifference between clearing price and price at timet, then the predicted clearing price isp t + 001 t . 3. Predictby fitting a curve to the sequence of askprices seen in the current game. 4. Predict based on closing price data forthat hotel in past games {SimpleMean ev , SimpleMean s }. 5. Same as above, but condition on hotel closing time, recognizingthat the closing sequence will influencethe relative prices. 6. Sameas above, butcondition on full ordering ofhotel closings, or which hotels are open or closed at a particular point{Cond'lMean ev ,Cond'lMean s }. 7. Learn a mapping from features of thecurrent game {includingcurrent prices} toclosing prices based on historic data {ATT ac-2001 s ,ATT ac-2001 ev }. 8. Hand-construct rules based onobservations about associations between abstract fea- tures. Havingdemonstrated ATT ac-2001's success atbidding in simultaneous auctions for mul- tiple interacting goods in the TAC domain, we extended our approach toapply it to the U.S. FederalCommunications Commission {FCC} spectrum auctions domain {Weber, 1997}. TheFCC holds spectrum auctions to sell radiobandwidth to telecommunications compa- nies. Licenses entitle their owners touse a specified radio spectrum band within aspecified geographical area, ormarket. Typically several licenses areauctioned off simultaneously withbidders placing independent bids for eachlicense. The most recent auction brought in 239Stone, Schapire,Littman, Csirik, & McAllester over $16billion dollars. In a detailed simulation ofthis domain {Csirik, Littman, Singh, & Stone, 2001}, we discovered a novel,successful bidding strategy inthis domain that allows the bidders to increase their profitssignificantly over a reasonable default strategy {Reitsma, Stone, Csirik, & Littman, 2002}. Our ongoing research agenda includes applying our approach to other similar domains. We particularlyexpect the boostingapproach to price prediction and thedecision-theoretic reasoning over pricedistributionsto transfer to other domains.Other candidate real-world domainsinclude electricity auctions, supplychains, and perhaps even travel booking on public e-commerce sites. Acknowledgements Thiswork was partiallysupported by the United States{Israel BinationalScience Founda- tion {BSF}, grant number 1999038. Thanks to the TAC team at the University of Michigan for providing the infrastructure and support required to run many of our experiments. Thanks to Ronggang Yu at the University of Texas at Austin for running oneof the ex- periments mentioned in thearticle. Most of this research was conductedwhile all of the authors were at AT&T Labs - Research. References Anthony, P., Hall, W., Dang, V. D., &Jennings, N. R. {2001}. Autonomousagents for participating in multipleon-line auctions. In Pro c e e dingsof the IJCAI-2001 Workshop on E-Business and the Intel ligent Web Seattle, WA. Boutilier, C. {2002}. Apomdp formulation of preference elicitationproblems. In Pro c e e dings of the Eighteenth National Conferenc e on Artificial Intel ligence, pp. 239{246. Collins, M., Schapire, R. E., & Singer, Y. {2002}.Logistic regression, AdaBoost and Breg- man distances. Machine Le arning, 48 {1/2/3}. Csirik,J. A., Littman, M. L., Singh, S., & Stone, P. {2001}. FAucS: An FCC spectrum auction simulator for autonomous biddingagents. In Fiege, L., MÂȘuhl, G., & Wilhelm, U. {Eds.},Electr onic Commerc e: Pro c e e dings of the Sec ondInternational Workshop, pp. 139{151Heidelberg,Germany. SpringerVerlag. F reund, Y., & Schapire, R. E.{1997}. A decision-theoretic generalization ofon-line learning and an application to boosting. Journal of Computer and System Sciences, 55 {1}, 119{139. Fritschi, C., & Dorer, K. {2002}.Agent-oriented software engineering forsuccessful TAC participation.In First International Joint Conferenc e on Autonomous Agents and Multi-Agent Systems Bologna. Garey, M. R., & Johnson, D. S. {1979}.Computers and Intractability:A Guide to the Theoryof NP-completeness. Freeman,San Francisco, CA. 240Decision-TheoreticBidding with Learned Density Models Gimenez-Funes, E., Godo, L., Rodriguez-Aguiolar, J. A., & Garcia-Calves, P. {1998}. De- signing biddingstrategies for trading agents in electronic auctions.In Pro c e e dings of the Third International Conferenc e on Multi-Agent Systems, pp. 136{143. Ginsberg, M. L. {2001}.GIB: Imperfect information in a computationallychallenging game. JAIR,14, 303{358. Greenwald,A.,& Boyan, J. {2001}. Biddingalgorithms for simultaneous auctions. In Pro c e e dings of Third ACM Conferenc e on E-Commerc e, pp. 115{124 Tampa, FL. Harsanyi, J. {1967{1968}. Gameswith incomplete information played by bayesian players. Management Science, 14, 159{182,320{334,486{502. Hauskrecht, M. {1997}. Incrementalmethods for computing bounds in partially observable Markov decisionprocesses.In Pro c e e dings of the Fourte enth National Conferenc eon Artificial Intel ligence, pp. 734{739. He, M., & Jennings,N. R. {2002}. SouthamptonTAC:Designing a successful trading agent. In Fifteenth Europ e anConferenc e on Artificial Intelligence Lyon, France. Ito, T., Fukuta, N., Shintani, T., & Sycara, K. {2000}. Biddingbot: amultiagent support systemfor cooperative bidding in multipleauctions. In Pro c e e dingsof the Fourth InternationalConferenc e on MultiAgent Systems, pp. 399{400. Kearns, M., Mansour, Y., &Ng, A.Y. {1999}. A sparse sampling algorithmfor near- optimal planning in large Markov decision processes. In Pro c e edings of the Sixteenth InternationalJoint Conferenc e on Artificial Intelligence {IJCAI-99}, pp. 1324{1331. Klemperer, P. {1999}.Auction theory: A guide to the literature.Journal of Economic Surveys, 13 {3}, 227{86. Littman, M. L., Ma jercik, S. M., & Pitassi, T. {2001}. Stochastic Booleansatisfiability. Journalof Automated Re asoning,27 {3}, 251{296. Papadimitriou, C. H., & Tsitsiklis, J. N. {1987}. Thecomplexity of Markov decisionpro- cesses. Mathematics of Oper ationsRese ar ch, 12 {3},441{450. Preist, C., Bartolini, C., & Phillips, I. {2001}. Algorithm design for agents which partici- pate in multiple simultaneousauctions. In Agent Mediate dElectr onic Commerc e III {LNAI}, pp. 139{154. Springer-Verlag, Berlin. Reitsma, P.S. A., Stone, P., Csirik, J. A., & Littman, M. L. {2002}. Self-enforcing strategic demand reduction. In AgentMediate d Electr onic Commerc e IV: Designing Mecha- nisms and Systems, Vol. 2531 ofLe ctur e Notes in Artificial Intelligence, pp. 289{306. Springer Verlag. Rodriguez-Aguilar, J. A., Martin, F. J., Noriega, P., Garcia, P.,& Sierra, C. {1998}. Towards a test-bed for trading agents inelectronic auction markets. AI Communications, 11 {1}, 5{19. 241Stone, Schapire,Littman, Csirik, & McAllester Rothkopf, M.H., & Harstad, R. M. {1994}. Modelingcompetitive bidding: A critical essay. Management Science,40 {3}, 364{384. Rust, J., Miller,J., & Palmer, R. {1992}. Behaviorof trading automata in a computerized double auction market. In Friedman, D., & Rust, J. {Eds.}, The Double Auction Market: Institutions, Theories, and Evidence. Addison-Wesley , Redwood City,CA. Schapire, R. E., & Singer, Y. {1999}.Improved boosting algorithms usingconfidence-rated predictions. MachineLe arning, 37 {3},297{336. Schapire, R. E., & Singer, Y.{2000}. BoosTexter: A boosting-basedsystem for text cate- gorization.Machine Le arning, 39{2/3}, 135{168. Stone, C. J. {1994}.The use of polynomial splines and theirtensor products in multivariate function estimation. The Annals ofStatistics, 22 {1}, 118{184. Stone, P., & Greenwald, A.{2003}. The first international trading agent competition: Autonomous biddingagents.Electr onic Commerc e Rese ar ch. To appear. Stone, P., Littman, M. L., Singh, S.,&Kearns, M. {2001}. ATTac-2000:An adaptive autonomous biddingagent.Journal of Artificial Intel ligence Rese ar ch, 15, 189{206. Tesauro, G., & Das, R. {2001}.High-performance bidding agents for the continuous double auction. In Third ACM Conferenc e on Electr onic Commerc e, pp. 206{209. Weber, R. J. {1997}. Makingmore from less: Strategic demand reduction inthe FCC spectrum auctions.Journal of Economics and Management Strate gy, 6 {3}, 529{548. Wellman, M. P., Greenwald, A.,Stone, P., & Wurman, P. R. {2002}. The 2001 trading agent competition. In Pro c e e dingsof the Fourte enth Innovative Applications of Artificial Intelligence Conferenc e, pp. 935{941. Wellman, M. P., Greenwald, A.,Stone, P., & Wurman, P. R. {2003a}. The 2001 trading agent competition. Electr onicMarkets,13 {1}, 4{12. Wellman, M. P., Reeves, D.M.,Lochner, K. M., & Vorobeychik,Y. {2003b}. Price prediction in a trading agent competition.Tech. rep., University of Michigan. Wellman, M. P., Wurman,P. R., O'Malley, K., Bangera, R., Lin, S.-d., Reeves, D., & Walsh, W. E. {2001}. A trading agent competition. IEEE Internet Computing,5 {2}, 43{51. 242