M. P. Wellman, D. M. Reeves, K. M. Lochner and Y. Vorobeychik
Volume 21, 2004
Links to Full Text:
ppredict.dvi
Journal of Artiï¬cial Intelligence Research 21 (2004) 19â36 Submitted 7/03; published 1/04 Price Prediction in a Trading Agent Competition Michael P. Wellman WELLMAN@UMICH.EDU Daniel M. Reeves DREEVES@UMICH.EDU Kevin M. Lochner KLOCHNER@UMICH.EDU Yevgeniy Vorobeychik YVOROBEY@UMICH.EDU University of Michigan, Artiï¬cial Intelligence LaboratoryAnn Arbor, MI 48109-2110 USA Abstract The 2002 Trading Agent Competition (TAC) presented a challenging market game in the do- main of travel shopping. One of the pivotal issues in this domain is uncertainty about hotel prices,which have a signiï¬cant inï¬uence on the relative cost of alternative trip schedules. Thus, virtuallyall participants employ some method for predicting hotel prices. We survey approaches employedin the tournament, ï¬nding that agents apply an interesting diversity of techniques, taking into ac-count differing sources of evidence bearing on prices. Based on data provided by entrants on theiragentsâ actual predictions in the TAC-02 ï¬nals and semiï¬nals, we analyze the relative efï¬cacy ofthese approaches. The results show that taking into account game-speciï¬c information about ï¬ightprices is a major distinguishing factor. Machine learning methods effectively induce the relation-ship between ï¬ight and hotel prices from game data, and a purely analytical approach based oncompetitive equilibrium analysis achieves equal accuracy with no historical data. Employing a newmeasure of prediction quality, we relate absolute accuracy to bottom-line performance in the game. 1. Introduction Many market decision problems involve some anticipation or forecast of future prices. Price pre-diction is particularly important, for example, in committing to a binding offer to purchase a goodthat has complements to be purchased at a later date. This sort of scenario arises whenever there aresequential or overlapping auctions for related goods. Although market forecasting techniques are inwidespread use over a broad range of applications, we are unaware of studies exploring the problemin a context reminiscent of multi-auction environments. The annual Trading Agent Competition (TAC) (Wellman et al., 2003) provides a convenient medium for studying approaches to price prediction. As an open-invitation tournament, it has at-tracted a diverse community of researchers interested in many aspects of trading agent strategy.Price prediction has turned out to be a pivotal issue in the TAC market game, 1 and an interesting ar-ray of approaches has emerged from agent designersâ efforts over three years of competition. SinceTAC deï¬nes a controlled, regular, repeatable, and transparent environment for observing tradingagent behavior, it is also uncommonly amenable to analysis. 1. We refer to the original (âclassicâ) TAC game, a scenario in the domain of travel shopping. Sadeh et al. (2003) introduced a second game (âTAC/SCMâ), in the domain of supply chain management, and we expect that furtherdomains will be the subject of trading competitions in coming years. c 2004 AI Access Foundation. All rights reserved. Â
WELLMAN, REEVES, LOCHNER, & VOROBEYCHIK 2. TAC Travel-Shopping Game The TAC market game presents a travel-shopping task, where traders assemble ï¬ights, hotels, andentertainment into trips for a set of eight probabilistically generated clients. Clients are describedby their preferred arrival and departure days (pa and pd), the premium (hp) they are willing to payto stay at the âTowersâ (T) hotel rather than âShantiesâ (S), and their values for three different typesof entertainment events. The agentsâ objective is to maximize the value of trips for their clients, netof expenditures in the markets for travel goods. Flights. A feasible trip includes round-trip air, which consists of an inï¬ight day and outï¬ight day , . Flights in and out each day are sold independently, at prices determined by ½ a stochastic process. The initial price for each ï¬ight is , and follows a random walk à ¾ ¼ ¼¼ thereafter with an increasingly upward bias. Hotels. Feasible trips must also include a room in one of the two hotels for each night of the clientâs stay. There are 16 rooms available in each hotel each night, and these are sold throughascending 16th-price auctions. When the auction closes, the units are allocated to the 16 highestoffers, with all bidders paying the price of the lowest winning offer. Each minute, the hotel auctionsissue quotes, indicating the 16th- (ASK) and 17th-highest (BID) prices among the currently activeunit offers. Each minute, starting at 4:00, one of the hotel auctions is selected at random to close,with the others remaining active and open for bids. Entertainment. Agents receive an initial random allocation of entertainment tickets (indexed by type and day), which they may allocate to their own clients or sell to other agents through continuousdouble auctions. A feasible client trip is deï¬ned by an inï¬ight day in , outï¬ight day out , and hotel type ( , à à à à à which is 1 if T and 0 if S). Trips also specify entertainment allocations, but for purposes of this paperwe summarize expected entertainment surplus as a function of trip days. (The paper describing our agent (Cheng et al., 2004) provides details on as well as some other constructs glossed ´à µ here.) The value of this trip for client (with preferences pa, pd, hp) is then given by pa in pd out hp (1) à ´à µ ½¼¼¼  ½¼¼´  ·  µ · ¡ à · ´à µ à à à At the end of a game instance, the TAC server calculates the optimal allocation of trips to clients for each agent, given ï¬nal holdings of ï¬ights, hotels, and entertainment. The agentâs game score isits total client trip utility, minus net expenditures in the TAC auctions. 3. Price Prediction TAC participants recognized early on the importance of accurate price prediction in overall perfor-mance (Stone & Greenwald, 2004). The prices of hotels are highly variable from game to game,yet a hotelâs price is not ï¬nalized until its auction closesâsome minutes into the game, dependingon the random closing order. Because agents tend not to submit serious hotel bids until the ï¬rstclosing is imminent, no useful information is revealed by price quotes until the fourth minute ofplay. Complementarity among goods dictates that outcomes of early auctions signiï¬cantly affectthe value an agent places on a particular hotel later in the game, and conversely, the prices of hotelsavailable later dictate whether an agent had bid wisely early in the game. Anticipating hotel prices is a key element in several decisions facing a TAC agent, in particular: 20
TAC PRICE PREDICTION 1. Selecting trip itineraries. Because ï¬ight prices tend to increase, agents have a great incentive to commit to traveling on particular days early in the game. Yet the quality of day choicesdepends crucially on the hotel prices of the included travel days. 2. Bidding policy. The likelihood of obtaining a good with a given bid depends obviously on the resulting clearing price. Moreover, the value of any particular good is in general a function ofthe price of others. For example, the value of obtaining a room at hotel (S, ) is an increasingfunction of the projected cost of the alternative hotel on that day, (T, ), and a decreasingfunction of the projected cost of complementary hotel rooms on the adjacent days, (S, )  ½ and (S, ). · ½ Given the importance of price prediction, it is not surprising that TAC researchers have explored a variety of approaches. Stone et al. (2003) noted a diversity of price estimation methods amongTAC-01 agents. Our particular interest in the problem is certainly connected to our own TAC-02agent, Walverine, which introduced a method quite distinct from those reported previously. Thus,we were highly motivated to characterize performance on this price prediction task. Indeed, if competitions such as TAC are to be successful in facilitating research, it will be nec- essary to separately evaluate techniques developed for problem subtasks (Stone, 2002). Althoughinteresting subtasks tend not to be strictly separable in such a complex game, the price-predictioncomponent of trading strategy may be easier to isolate than most. In particular, the problem can beformulated in its own terms, with natural absolute accuracy measures. And in fact, as we see below,most agent developers independently chose to deï¬ne price prediction as a distinct task in their ownagent designs. We divide price prediction into two phases: initial and interim. Initial refers to the beginning of the game, before any hotel auctions close or provide quote information. Interim refers to themethod employed thereafter. Since the information available for initial prediction (ï¬ight prices,client preferencesâsee Section 5.2) is a strict subset of that available for interim (which adds trans-action and hotel price data), most agents treat initial prediction as just a (simpler) special case. Initial prediction is relevant to bidding policy for the ï¬rst hotel closing, and especially salient for trip choices as these are typically made early in the game. Interim prediction supports ongoing re-vision of bids as the hotel auctions start to close. Our analysis and report focus on initial prediction,mainly because it is the simpler of the two tasks, involving less potential information. Moreover,agents initially have relatively comparable information sets, thus providing for a cleaner analysis.Interim prediction is also quite important and interesting, and should be the focus of further work. 4. TAC-02 Agents The nineteen agents who participated in the TAC-02 tournament are listed in Table 1. The tablepresents raw average scores from the ï¬nals and semiï¬nals, and weighted averages for the seedinground. For overviews of most of these agents, see the survey edited by Greenwald (2003a). The seeding rounds were held during the period 1-12 July, each agent playing 440 games. Weights increased each day, so that later games counted more than earlier, and the lowest 10 scoresfor each agent were dropped. The top 16 agents advanced to the semiï¬nals, held on 28 July inEdmonton, Canada. There were two semiï¬nal heats: H1 comprising agents seeded 1-4 and 13-16,with the 5-12 seeds placed in heat H2. The top four teams from each heat (14 games, lowest score 21
WELLMAN, REEVES, LOCHNER, & VOROBEYCHIK Agent Afï¬liation Seeding Semiï¬nals Finals ATTac AT&T Research (et al.) 3131 H1: 3137 â BigRed McGill U 696 â â cuhk Chinese U Hong Kong 3055 H2: 3266 3069 harami Bogazici U 2064 â â kavayaH Oracle India 2549 H1: 3200 3099 livingagents Living Systems AG 3091 H1: 3310 3181 PackaTAC N Carolina State U 2835 H2: 3250 â PainInNEC NEC Research (et al.) 2319 H1: 2193 â RoxyBot Brown U 2855 H2: 3160 â sics Swedish Inst Comp Sci 2847 H2: 3146 â SouthamptonTAC U Southampton 3129 H1: 3397 3385 Thalis U Essex 3000 H2: 3199 3246 tniTac Poli Bucharest 2232 H1: 3108 â TOMAhack U Toronto 2809 H2: 2843 â tvad Technion 2618 H1: 2724 â UMBCTAC U Maryland Baltimore Cty 3118 H1: 3208 3236 Walverine U Michigan 2772 H2: 3287 3210 WhiteBear Cornell U 2966 H2: 3324 3413 zepp Poli Bucharest 2098 â â Table 1: TAC-02 tournament participants, and their scores by round. dropped) proceeded to the ï¬nals, which ran for 32 games the same day. Further details about theTAC-02 tournament are available at http://www.sics.se/tac. 5. Price Prediction Survey Shortly after the TAC-02 event, we distributed a survey to all the entrants eliciting descriptions anddata documenting their agentsâ price-prediction methods. Sixteen out of 19 teams responded to thesurvey, including 14 of 16 semiï¬nalists, and all eight ï¬nalists. The result provides a detailed pictureof the prediction techniques employed, and enables some comparison of their efï¬cacy with respectto a common experienceâthe TAC-02 ï¬nals and semiï¬nals. Thirteen out of the 16 respondents reported that their agents did indeed form explicit price pre- dictions for use in their trading strategies. These thirteen are listed in Table 2, along with somehigh-level descriptors of their approach to the initial prediction task. 2 In addition, tniTac and zeppresponded that price predictions were part of their agent designs, but were not developed sufï¬cientlyto be deployed in the tournament. TOMAhack reported an ambitious design (also not actually em-ployed) based on model-free policy learning, which does account for other agentsâ bidding behaviorbut without formulating explicit price predictions. 2. Though we address only initial prediction in this report, our survey also solicited descriptions about interim methods. 22
TAC PRICE PREDICTION Agent Approach Form Notes ATTac machine learning prob boosting cuhk historical priceline moving average harami historical prob kavayaH machine learning point neural net livingagents historical point PackaTAC historical prob RoxyBot historical prob sics historical priceline SouthamptonTAC historical point classiï¬cation to reference categories Thalis historical (?) point survey incomplete UMBCTAC historical point Walverine competitive point equilibrium analysis WhiteBear historical point Table 2: Agents reporting prediction of hotel prices in TAC-02. 5.1 Forms of Prediction One distinction observed in TAC-01 was that some agents explicitly formulated predictions in termsof probability distributions over prices, rather than point estimates. Predictions in this form enablethe agent to properly account for price uncertainty in decision making. Thus, we asked entrantsabout the form of their predictions in our survey. Although most agents generate point predictions, there are notable exceptions. ATTacâs boost- ing algorithm (Stone et al., 2003) expressly learns probability distributions associated with gamefeatures. RoxyBot tabulates game price statistics for a direct estimation of deciles for each hotelauction. PackaTAC and harami measure historical variance, combining this with historical averag-ing to deï¬ne a parametric distribution for each hotel price. Walverine predicts point prices, but itsâhedgingâ approach for some decisions amounts to forming an effective distribution around them. Given a prediction in the form of a distribution, agents may make decisions by sampling or through other decision-analytic techniques. The distribution may also facilitate the interim pre-diction task, enabling updates based on treating observations such as price quotes as evidence.However, the ï¬rst controlled experiment evaluating the distribution feature, in the context of ATTac(Stone et al., 2003), did not ï¬nd an overall advantage to decision-making based on distributionscompared to using mean values. The authors offered several possible explanations for the observedperformance, including (1) that their implementation employs insufï¬cient samples, and (2) that theiruse of distributions makes the unrealistic assumption that subsequent decisions can be made withfull knowledge of the actual price values. Greenwald (2003b) also found that bidding marginal util-ity based on means outperformed bidding expected marginal utility based on distributions, this timeimplemented in the context of RoxyBot. We have since performed analogous trials using Walver-ineâwhich generates and applies distributions in yet a third wayâand also found bidding basedon means to be superior to the distribution-based bidding the agent actually employed in TAC-02.Although the source of this deï¬ciency has not been conclusively established, we speculate that thesecond reason adduced by the ATTac designers is most plausible. 23
WELLMAN, REEVES, LOCHNER, & VOROBEYCHIK Despite this evidence, alternative ways of using the distributions may well prove beneï¬cial. The study by Greenwald (2003b) demonstrated an advantage to RoxyBotâs 2002 strategy of evaluatingcandidate bid sets with respect to distributions, compared to its 2000 strategy of evaluating themwith respect to means. Nevertheless, for agents that predict probability distributions, we take the mean of these dis- tributions as the subject of our analysis. This may discount potential advantages, but based onthe discussion above, we suspect thatâwith the possible exception of RoxyBotâagents did notactually beneï¬t from predicting distributions in TAC-02. Another variation in form is the prediction of prices as a function of quantity demanded. From the ï¬rst TAC, entrants recognized that purchasing additional units may cause the price to increase,and so introduced the concept of pricelines, which express estimated prices by unit (Boyan & Green-wald, 2001; Stone & Greenwald, 2004). Agents sics and cuhk reported predicting pricelines. 3 Inboth cases, the agent started with a baseline point prediction for the ï¬rst unit of each hotel, andderived the remainder of the priceline according to some rule. For example, sics predicted the pricefor the th unit (i.e., price given it demands units) to be འ, where is the baseline prediction à à Ãà à and is 1.15 for hotels on day 1 or 4, and 1.25 for hotels on day 2 or 3. à In the succeeding analysis, we evaluate predictions in terms of baseline prices only. As noted below, our accuracy measures applied to pricelines would not reï¬ect their actual value. 5.2 Information Employed The set of information potentially available at the beginning of the game includes all data from pastgames, the initial vector of ï¬ight prices, and the agentâs own client preferences. For TAC-02, allagents except Walverine reported using historical information in their predictions. Only ATTac,kavayaH, and Walverine employ ï¬ight prices. All agents that construct pricelines effectively takeaccount of own client preferences. Walverine does not construct pricelines but does factor in itsown client preferences as part of its equilibrium calculations. The identities of other agents participating in a game instance are not known during the TAC preliminary (qualifying and seeding) rounds, as agents are drawn randomly into a round-robin tour-nament. However, the semiï¬nal and ï¬nal rounds ï¬xed a set of eight agents for a series of games,and so the identity of other agents was effectively observable. ATTac is the only agent to exploit theavailability of this information. 6. Approaches to Price Prediction Based on survey responses, we divide TAC-02 prediction techniques into three categories. 6.1 Historical Averaging Most agents took a relatively straightforward approach to initial price prediction, estimating thehotel clearing prices according to observed historical averages. For example, harami calculates themean hotel prices for the preceding 200 games, and uses this as its initial prediction. The respectiveagents classiï¬ed as adopting the âhistoricalâ approach in Table 2 differ on what set of games they 3. WhiteBear also reported using pricelines for interim prediction (Vetsikas & Selman, 2003), but initial predictions were essentially points. 24
TAC PRICE PREDICTION include in the average, but most used games from the seeding round. Given a dataset, agents tendto use the sample mean or distribution itself as the estimate, 4 at least as the baseline. The majority of averaging agents ï¬xed a pool of prior games, and did not update the averages during the ï¬nals. An exception was cuhk, which employed a moving average of the previous tengames in the current round, or from previous rounds at the beginning of a new round. UMBCTAC reported employing mean prices as predictions with respect to decisions about trips of two or more days, but median prices (which tended to be lower) for decisions about one-daytrips. For the semiï¬nals they based their statistics on the last 100 seeding games. For the ï¬nalstheir dataset comprised the 14 games of their semiï¬nal heat. In our analysis below, we attributepredictions to UMBCTAC based on the mean values from these samples. The approach taken by SouthamptonTAC (He & Jennings, 2003) was unique among TAC agents. The SouthamptonTAC designers partitioned the seeding-round games into three cate-gories, âcompetitiveâ, ânon-competitiveâ, and âsemi-competitiveâ. They then speciï¬ed a referenceprice for each type and day of hotel in each game category. The agent then chooses a category forany game based on its monitoring of recent game history. In the actual tournament, Southampton-TAC began the semiï¬nals predicting the semi-competitive reference prices, maintaining this stanceuntil switching to non-competitive for the last eight games of the ï¬nals. 6.2 Machine Learning A couple of TAC-02 agents employed machine learning techniques to derive relationships betweenobservable parameters and resulting hotel prices. The premise of this approach is that game-speciï¬cfeatures provide potentially predictive information, enabling the agent to anticipate hotel price direc-tions before they are manifest in price quotes themselves. Not surprisingly, as noted in Section 5.2,the two learning agents employed more kinds of information than typical TAC-02 agents. ATTac predicts prices using a sophisticated boosting-based algorithm for conditional density estimation (Stone et al., 2003). Development of the technique was expressly motivated by the TACprice-prediction problem, though the resulting algorithm is quite general. ATTac learns a predictorfor each hotel type and day category (i.e., days 1 and 4 are treated symmetrically, as are 2 and 3).The predictor applied at the beginning of the game maps the following features into a predictedprice for that hotel: identity of agents participating in the game, ¯ initial ï¬ight prices, and ¯ closing time of each hotel room. ¯ Since the hotel closing times are unknown at game start, this predictor induces a distribution overprice predictions, based on the distribution of hotel closing sequences. This distribution constitutesATTacâs initial price prediction. kavayaH (Putchala et al., 2002) predicts initial hotel prices using neural networks trained via backpropagation. The agent has a separate network for each hotel. The output of each network 4. Several of these agents complement this simple initial prediction with a relatively sophisticated approach to interim prediction, using the evidence from price quotes to gradually override the initial estimate. All else equal, of course,straightforwardness is an advantage. Indeed, simplicity was likely a signiï¬cant ingredient of livingagentsâs successin TAC-01 (Fritschi & Dorer, 2002; Wellman et al., 2003). 25
WELLMAN, REEVES, LOCHNER, & VOROBEYCHIK is one of a discrete set of prices, where the choice set for each hotel (type, day) was speciï¬ed bykavayaHâs designers based on historical prices. The inputs for each network are based on initialï¬ight prices, speciï¬cally thresholded differences between ï¬ights on adjacent days. For example,hotel T1 might have a binary input that indicates whether the price difference between inï¬ights ondays 1 and 2 is greater than 50. Hotel S2 might have this input, as well as another based on thedifference in ï¬ight prices on days 2 and 3. kavayaHâs designers selected the most relevant inputsbased on experiments with the agent. 6.3 Competitive Analysis Walverineâs overall approach to TAC markets is to presume that they are well-approximated bya competitive economy (Cheng et al., 2004). Its method for predicting hotel prices is a literalapplication of this assumption. Speciï¬cally, it calculates the Walrasian competitive equilibrium ofthe TAC economy, deï¬ned as the set of prices at which all markets would clear, assuming otheragents behave as price takers (Katzner, 1989). Taking into account the exogenously determinedï¬ight prices, Walverine ï¬nds a set of hotel prices that support such an equilibrium, and returnsthese values as its prediction for the hotelsâ ï¬nal prices. Let be a vector of hotel prices, consisting of elements denoting the price of hotel type on à à day . Let denote agent âs demand for hotel day at these prices, and write the vector of à ´Ãµ such demands as . Aggregate demand is simply the sum of agent demands, à . à ´Ãµ ôõ à ´Ãµ Prices constitute a competitive equilibrium if aggregate demand equals aggregate supply for à all hotels. Since there are 16 rooms available for each hotel on each day, we have that in competitiveequilibrium, . ôõ ½ Starting from an initial guess ¼, Walverine searches for equilibrium prices using the taton- à nement protocol, an iterative price adjustment mechanism originally conceived by Walras (Arrow& Hahn, 1971). Given a speciï¬cation of aggregate demand, tatonnement iteratively computes arevised price vector according to the following difference equation: ÷½ à à à (2) à à · « Ã´à µ  ½ Although equilibrium prices are not guaranteed to exist given the discreteness and complementari-ties of the TAC environment, we have found that this procedure typically produces an approximateequilibrium well within the 300 iterations Walverine devotes to the prediction calculation. A critical step of competitive analysis is determining the aggregate demand function. Walver- ine estimates as the sum of (1) its own demand based on the eight clients it knows about, ôõ and (2) the expected demand for the other agents (56 clients), based on the speciï¬ed TAC distri-bution of client preferences. Our calculation of expected demand for the others is exact, modulo asummarization of entertainment effects, given an assumption that agent demands are separable byclient (Cheng et al., 2004). This assumption is true at the beginning of the game (hence for initialprediction), but is invalidated once agents accumulate holdings of ï¬ights and hotels. Although theanalytic expression of this expected demand is somewhat complicated, deriving it is not conceptu-ally or computationally difï¬cult. Note that the larger component of Walverineâs demand estimation is an expectation over the deï¬ned distribution of client preferences. Therefore, the prices it derives should properly be viewedas an equilibrium over the expectation, rather than the expected equilibrium prices. The latter mightactually be a more appropriate measure for price prediction. However, since expected equilibrium 26
TAC PRICE PREDICTION is much more computationally expensive than equilibrium of the expectation (and we suspect thedifference would be relatively small for 56 i.i.d. clients), we employ the simpler measure. 7. Predictions As part of the survey, entrants provided the predictions their agents actually employed in the TAC-02 ï¬nals and semiï¬nals: a total of 60 games. In many cases, predictions are constant (i.e., thesame for every game), so it is straightforward to evaluate them with respect to the full slate ofï¬nal and semiï¬nal games. In two of the cases where initial predictions change every game (ATTacand Walverine), entrants were able to construct what their agent would have predicted for eachof these games, whether or not they actually participated. In one case (kavayaH), we have partialdata. kavayaH reported its predictions for the 32 ï¬nal games, and for the semiï¬nal heat in which itparticipated (H1), except for one game in which its predictor crashed. We include two versions of ATTac, corresponding to predictors learned from the 2001 and 2002 preliminary rounds. ATTac01 and ATTac02, respectively, represent the prediction functionsemployed in the TAC-01 and TAC-02 ï¬nals. In applying the ATTac01 predictor to the TAC-02ï¬nals, its use of agent identity information was disabled. The price vectors supplied by entrants and employed in our analysis are presented in Table 3. Prices are rounded to the nearest integer for display, though our analysis employed whatever preci-sion was provided. Agents who condition on game-speciï¬c information produce distinct vectors ineach instance, so are not tabulated here. Agent S1 S2 S3 S4 T1 T2 T3 T4 harami 21 58 80 16 47 108 101 64 livingagents 27 118 124 41 73 163 164 105 PackaTAC 21 116 119 38 76 167 164 97 RoxyBot5 20 103 103 20 76 152 152 76 sics 30 100 100 40 95 160 155 110 WhiteBear 19 102 96 28 75 144 141 81 SouthamptonTAC âSâ 50 100 100 50 100 150 150 100 SouthamptonTAC âNâ 20 30 30 20 50 80 80 50 UMBCTAC semiï¬nals 20 133 124 45 83 192 158 110 UMBCTAC ï¬nals 37 75 87 29 113 141 95 71 Actual Mean 68 85 97 52 121 124 154 109 Actual Median 9 48 38 8 59 105 98 59 Best Euc Dist 18 73 57 15 71 111 95 69 Best EVPP 28 51 67 0 80 103 100 84 Walverine const 28 76 76 28 73 113 113 73 Table 3: Predicted price vectors: Shoreline Shanties, followed by Tampa Towers, each for days 1â 4. The ï¬rst ten rows represent predictions employed by agents in the tournament. The lastï¬ve represent various benchmarks, discussed below. 27
WELLMAN, REEVES, LOCHNER, & VOROBEYCHIK The ï¬rst six rows of Table 3 (harami through WhiteBear) correspond to constant predictions for their associated agents. As noted above, SouthamptonTAC switched between two predictionvectors: âSâ represents the reference prices for its âsemi-competitiveâ environment, and âNâ itsânon-competitiveâ prices. UMBCTAC as well switched prediction vectors within the 60 gamesâintheir case introducing for the ï¬nals a prediction based on average semiï¬nal (H1) prices. The rows labeled âActual Meanâ and âActual Medianâ, respectively, present the average and median hotel prices actually resulting from the 60 games of interest. Although clairvoyance is ob-viously not an admissible approach to prediction, we include them here as a benchmark. In a directsense, the actual central tendencies represent the best that agents taking the historical averagingapproach can hope to capture. The price vectors labeled âBest Euc Distâ, âBest EVPPâ, and âWalverine constâ are discussed in Section 8.2. 8. Evaluating Prediction Quality 8.1 Accuracy Measures It remains now to assess the efï¬cacy of the various prediction approaches, in terms of the agentsâprice predictions in the actual TAC-02 ï¬nal and semiï¬nal games. In order to do so, we require somemeasure characterizing the accuracy of a prediction given the actual prices of a given game. à à 8.1.1 EUCLIDEAN DISTANCE A natural measure of the closeness of two price vectors is their Euclidean distance: ¾ ¿ ½ ¾ ¾ ´à à µ ´à  à µ ´ µ where indexes the price of hotel S T on day . Clearly, lower values of ´ µ ¾ ¾ ½ ¾ ¿ are preferred, and for any , . à ´à à µ ¼ Calculating is straightforward, and we have done so for all of the reported predictions for all 60 games. Note that if is in the form of a distribution, the Euclidean distance of the mean à provides a lower bound on the average distance of the components of this distribution. Thus, atleast according to this measure, our evaluation of distribution predictions in terms of their meansprovides a bias in their favor. It is likewise the case that among all constant predictions, the actual mean for a set of games à minimizes the aggregate squared distance for those games. That is, if is the actual price vector à for game , , ½ à à à ½ ¾ à à à à à ´à à µ à à ½ ½ 5. RoxyBotâs prediction is based on statistics from the seeding rounds, expressed as cumulative price distributions for each hotel, discretized into deciles. RoxyBot reportedly based its decisions on samples from this distribution,taking each decile value to occur with probability 0.1. This tends to overestimate prices, however, as the decile valuescorrespond to upper limits of their respective ranges. The prediction vector presented in Table 3 (and analyzed below)corresponds to an adjusted value, obtained by dropping the top decile and averaging the remaining nine. 28
TAC PRICE PREDICTION There is no closed form for the prediction minimizing aggregate , but one can derive it numericallyfor a given set of games (Bose et al., 2002). 8.1.2 EXPECTED VALUE OF PERFECT PREDICTION Euclidean distance appears to be a reasonable measure of accuracy in an absolute sense. However,the purpose of prediction is not accuracy for its own sake, but rather to support decisions based onthese predictions. Thus, we seek a measure that relates in principle to expected TAC performance.By analogy with standard value-of-information measures, we introduce the concept of value ofperfect prediction (VPP). Suppose an agent could anticipate perfectly the eventual closing price of all hotels. Then, among other things, the agent would be able to purchase all ï¬ights immediately with conï¬dence that ithad selected optimal trips for all clients. 6 Since many agents apparently commit to trips at thebeginning of the game anyway, perfect prediction would translate directly to improved quality ofthese choices.7 We take this as the primary worth of predictions, and measure quality of a predictionin terms of how it supports trip choice in comparison with perfect anticipation. The idea is thatVPP will be particularly high for agents that otherwise have a poor estimate of prices. If we arealready predicting well, then the value of obtaining a perfect prediction will be relatively small.This corresponds to the use of standard value-of-information concepts for measuring uncertainty:for an agent with perfect knowledge, the value of additional information is nil. Speciï¬cally, consider a client with preferences pa pd hp . A tripâs surplus for client at ´ µ prices , is deï¬ned as value minus cost, à ´à à µ cost ´à à µ à ´à µ  ´à à µ where cost is simply the total price of ï¬ights and hotel rooms included in trip . Let ´à à µ à £ à ´Ãµ à à à ´à à µ à denote the trip that maximizes surplus for with respect to prices . The expression à £ ´à ´Ãµ õ then represents the surplus of the trip chosen based on prices , but evaluated with respect to prices à . From this we can deï¬ne value of perfect prediction, à VPP £ £ (3) ´à õ ´à ´Ãµ à µ  ´à ´Ãµ à µ Note that our VPP deï¬nition (3) is relative to client preferences, whereas we seek a measure applicable to a pair of price vectors outside the context of a particular client. To this end we deï¬ne 6. Modulo some residual uncertainty regarding availability of entertainment tickets, which we ignore in this analysis.7. We compiled statistics on the temporal proï¬le of ï¬ight purchases for the eight agents in the TAC-02 ï¬nals. Four of the agents purchased 16 ï¬ights (enough for round trips for all clients) within 45 seconds on average. All eight agentspurchased more than half their ï¬ights by that time, on average. Vetsikas and Selman (2003) veriï¬ed experimentallythat predicting prices beneï¬ts agents who commit to ï¬ights early to a greater extent than it does those who delayï¬ight purchases. 29
WELLMAN, REEVES, LOCHNER, & VOROBEYCHIK the expected value of perfect prediction, EVPP, as the expectation of VPP with respect to TACâsdistribution of client preferences: EVPP VPP ´ à à µ ´à õ £ £ (4) ´à ´Ãµ à µ  ´à ´Ãµ õ Note that as for , lower values of EVPP are preferred, and for any , EVPP . à ´à à µ ¼ From (4) we see that computing EVPP reduces to computing £ . We can derive ´à ´Ãµ à µ this latter value as follows. For each (pa,pd) pair, determine the best trip for hotel S and the best tripfor hotel T, respectively, at prices ignoring any contribution from hotel premium, hp. From this à we can determine the threshold value of hp (if any) at which the agent would switch from S to T.We then use that boundary to split the integration of surplus (based on prices ) for these trips, with à respect to the underlying distribution of hp. Note that this procedure is analogous to Walverineâsmethod for calculating expected client demand (Cheng et al., 2004) in its competitive equilibriumcomputation. 8.2 Results Figure 1 plots the thirteen agents for whom we have prediction data according to our two qualitymeasures. With one exception, the and EVPP values shown represent averages over the 60 gamesof TAC-02 ï¬nals and semiï¬nals. Since kavayaH predicted only 45 games, we normalized its aver-age and EVPP values to account for the relative difï¬culty of the games it omitted compared to thegames it predicted. The normalization multiplied its raw average by the ratio of prediction qualitiesfor these game sets by another representative agent (ATTac01, which was the most favorable choicefor kavayaH, though other normalizations would have produced similar results). The two dashed lines in Figure 1 represent best-achievable constant predictions with respect to the two accuracy measures. âBest Euc Distâ minimizes average Euclidean distance, as indicatedby the vertical line. For EVPP, we performed hill-climbing search from a few promising candi-date vectors to derive a local minimum on that measure, represented by the horizontal line. Bothreference vectors are provided in Table 3. Note that in principle, only agents that varied their pre-dictions across game instances (ATTac, kavayaH, cuhk, and Walverine, and to a coarser degree,SouthamptonTAC and UMBCTAC) have the potential to perform outside the upper-right quadrant. To assess the signiï¬cance of the accuracy rankings among agents, we performed paired-sample t-tests on all pairs of agents for both of our measures. The differences between Walverine andATTac01 do not reach a threshold of statistical signiï¬cance on either measure. Walverine beatsATTac01 for at while ATTac01 beats Walverine for EVPP at . Walverine à ½ à ½ signiï¬cantly ( ) outperforms all other agents on both measures. ATTac01 signiï¬cantly à ¼¿ ( ) outperforms all other agents for EVPP, but for it is not statistically distinguishable à ¼½ ( ) from kavayaH, harami, or cuhk. For EVPP, Walverine and ATTac01 are the only à ¼ agents that beat âBest EVPPâ ( and ), and âBest EVPPâ in turn beats all other à ¼½ à ¼ agents (all but cuhk signiï¬cantly). For , Walverine is the only agent to signiï¬cantly ( ) à ¼¼½ beat Best Euclidean Distance, which in turn beats every other agent but ATTac01 and kavayaH.No agent but Walverine does signiï¬cantly better than Actual Mean, with ATTac01, kavayaH, andharami statistically indistinguishable. The large discrepancy in performance between ATTac01 and ATTac02 is unexpected, given that their predictors are generated from the same boosting-based learning algorithm (Stone et al., 30
TAC PRICE PREDICTION . livingagents 65 PackaTAC .Dist Southampton Euc 60 RoxyBot whitebear UMBCTAC Best prediction ATTac02 55 SICS perfect 50 of harami value 45 kavayaH cuhk Best EVPP 40 Expected Walverine 35 ATTac01 190 200 210 220 230 240 Euclidean distance to actual prices Figure 1: Prediction quality for eleven TAC-02 agents. Dashed lines delimit the accuracy achiev- able with constant predictions: âbest Euclidean distanceâ and âbest EVPPâ for the tworespective measures. The diagonal line is a least-squares ï¬t to the points. Observe thatthe origin of this graph is at (190,32). 2003). This might be explained if the 2002 preliminary rounds were somehow less predictive of theTAC-02 ï¬nals than was the case in 2001. The relative success of another learning agent, kavayaH,is evidence against this, however. The more likely hypothesis is that the 2002 agent suffered from abug emerging from a last-minute change in computing environments. To directly evaluate a prediction in the form of pricelines, we would need to know the initial demand of this agent corresponding to the priceline. We did obtain such information from sics, butfound that the accuracy of the priceline prediction according to these measures was far worse thanthat of the baseline prediction. Our impression is that the pricelines may well be advantageous withrespect to the decisions the agents based on them, but do not improve basic accuracy. Note thatEVPP inherently is based on an interpretation of prices as linear, thus it may not provide a properevaluation of priceline predictions. 8.3 The Inï¬uence of Flight Prices Observe that the three best price predictorsâATTac01, Walverine, and kavayaHâare exactlythose agents that take ï¬ight prices into account. Initial ï¬ight prices potentially affect hotel pricesthrough their inï¬uence on agentsâ early trip choices. In theory, lower ï¬ight prices should increasethe tendency of agents to travel on those days, all else equal, thus increasing the prices of hotels on 31
WELLMAN, REEVES, LOCHNER, & VOROBEYCHIK the corresponding days of stay. Indeed, capturing this effect of ï¬ight prices was one of the mainmotivations for Walverineâs price-equilibrium approach. ATTac and kavayaH attempt to inducethe relationship from game data. kavayaHâs designers, in particular, explored neural network mod-els based on their hypotheses about which ï¬ights were likely to affect which hotel prices (Putchalaet al., 2002). To isolate and quantify the effect of ï¬ight prices, we investigated the contribution of different factors employed by Walverine in its predictions. We deï¬ned three additional versions of Walver-ineâs prediction model, each of which ignores some information that Walverine takes into account: Walv-no-cdata ignores its own client knowledge, effectively treating own demand as based ¯ on the same underlying client preference distribution assumed for the other agents. Walv-constF ignores the initial ï¬ight prices, assuming that they are set at the mean of the ¯ initial ï¬ight distribution (i.e., 325) in every game instance. Walverine const ignores its own client knowledge and takes ï¬ight prices at their mean rather ¯ than actual values. The result is a constant prediction vector, presented in Table 3. Figure 2 plots the prediction qualities of these agents. Ignoring client knowledge degraded pre-diction quality only slightly, increasing EVPP from 38.0 to 38.6. Neglecting initial ï¬ight prices,however, signiï¬cantly hurt predictions, increasing EVPP to 47.9. Ignoring both, Walverine constincurred an average EVPP of 49.1. 50 Walverine&95;Const Actual&95;Mean 48 Walv&95;constF 46 prediction Best&95;EucDist Actual&95;Median 44 perfectof 42 Best&95;EVPP value 40 Expected Walv&95;no&95;cdata 38 Walverine 197.5 200 202.5 205 207.5 210 212.5 215 Euclidean distance to actual prices Figure 2: Prediction quality for Walverine variants and various central tendency benchmarks. The results conï¬rm the predictive value of ï¬ight prices. On the EVPP measure, Walverine does not gain a signiï¬cant advantage from considering own client data, but cannot beat âBest EVPPâ 32
TAC PRICE PREDICTION without considering initial ï¬ight prices. For , client data does make a signiï¬cant ( ) dif- à ¼¿ ference when also considering ï¬ight data. Flight data signiï¬cantly ( ) affects Walverineâs à ¼¼½ prediction quality for the metric, regardless of client data. 8.4 Relating Accuracy to Performance As indicated by the scatter plot of Figure 1, our two accuracy measures are highly correlated( ). Given that EVPP is value-based, this does suggest that accuracy translates somewhat ¼ ¼ proportionally to performance. However, EVPP is a highly idealized proxy for actual scores, and sodoes not deï¬nitively establish the relation of prediction accuracy to overall TAC performance. Such a relation was observed in prior work by Stone et al. (2003), who evaluated the predictive accuracy and overall performance of four variations on ATTac01 in an experimental trial. Employ-ing a different measure of prediction quality than ours above, they found that average score wasrelated monotonically to average predictive accuracy. In an effort to more directly connect our accuracy measures to the bottom line, we regressed the actual TAC-02 game scores against the accuracy measures for all reported predictionsâone datapoint per agent per game. We controlled for favorability of random client preferences, employingthe same summary statistics used to construct the âclient preference adjustmentâ in our analysisof the TAC-01 tournament (Wellman et al., 2003). In two separate regressions, we found highlysigniï¬cant coefï¬cients (  ½¼ ) for both and EVPP. Predictive accuracy explained score à ½¼ variance quite poorly ( ¾ ), however, as might be expected given all the other unmodeled à ¼ ½¾ variation across agents. To reduce the variation, we undertook a series of controlled trials involving variants of Walver- ine (Cheng et al., 2004). Each trial comprised a set of games with a ï¬xed conï¬guration of agents.The agents were constant within trials, but varied across, as they were conducted weeks or monthsapart while Walverine was undergoing modiï¬cations. For each trial, we regressed the actual scoreof the ï¬rst agent on EVPP, controlling as above for favorability of random client preferences. Weconsidered only one agent per game, since the data points for the other agents would be dependentgiven their common game instance. The results of our linear regression are summarized in Table 4. Trial Mean EVPP EVPP Coeff ¾ à à 1 200 70.4 -8.89 0.57 2 151 32.2 -11.59 0.26 3 110 59.5 -10.26 0.65 Table 4: Regression of score on EVPP in three trials. The EVPP coefï¬cient was highly signiï¬cant in all cases (  ). Note that since EVPP is à ½¼ measured per-client in the same units as scores, a direct translation would equate reducing EVPPby one with an increase of eight score points. Our regressions yielded coefï¬cients ranging from-8.89 to -11.59, which we take as a rough conï¬rmation of the expected relationship. If anything,the results indicate that EVPP understates the value of predictionâwhich we might expect since itaddresses only initial trip choice. Interestingly, the regression model seems to provide a better ï¬t(as measured by ¾ ) for the trials involving worse price predictors (as measured by mean EVPP). à This suggests that as prediction is optimized, other unmodeled factors may have relatively greaterincremental inï¬uence on score. 33
WELLMAN, REEVES, LOCHNER, & VOROBEYCHIK It should be noted that these games appear to be quite unrepresentative of TAC tournament games. Since the agents are all versions of Walverine, they tend to make trip choices on the samebasis. 9. Limitations It is important to emphasize several limitations of this study, which must qualify any conclusionsdrawn about the efï¬cacy of prediction methods evaluated here. First, we have focused exclusively on initial price prediction, whereas many agents placed greater emphasis on the interim prediction task. Second, in many cases we have represented agentsâ predictions by an abstraction of the actual object produced by their prediction modules. In particular, we reduce probability distributions totheir means, and consider only the ï¬rst unit of a priceline prediction. More generally, we do notaccount for the very different ways that agents apply the predictions they generate. Without questioneven the measure we introduce, EVPP, is inspired by our thinking in terms of how Walverine usesits predictions. Perhaps measures tailored to the processes of other agents would (justiï¬ably) showtheir predictions in a more favorable light. Third, it should be recognized that despite the desirability of isolating focused components of an agent for analysis, complete separation is not possible in principle. Prediction evaluation is relativenot only to how the agent uses its prediction, but also how it makes other tradeoffs (e.g., when itcommits to ï¬ights), and ultimately its entire strategy, in all its complexity. Studies such as thismust strive to balance the beneï¬ts of decomposition with appreciation for the interconnections andsynergies among elements of a sophisticated agentâs behavior. 10. Conclusions We have presented a comprehensive survey of approaches to initial price prediction in TAC-02, withquantitative analysis of their relative accuracy. Our analysis introduces a new measure, expectedvalue of perfect prediction, which captures in an important sense the instrumental value of accuratepredictions, beyond their nominal correspondence to realized outcomes. We draw several conclusions about price prediction in TAC from this exercise. First, the results clearly demonstrate that instance-speciï¬c information can provide signiï¬cant leverage over purebackground history. Three agents use instance-speciï¬c information to perform better on at leastone measure than any constant prediction. In particular, the initial ï¬ight prices provide substantialpredictive value. This can be induced and veriï¬ed empirically, as seen through the success of themachine-learning agents. The predictive value ï¬ows from the inï¬uence of ï¬ight prices on demandfor hotels, as indicated by the success of competitive analysis in capturing this relationship. We believe it striking that a purely analytical approach, without any empirical tuning, could achieve accuracy comparable to the best available machine-learning method. Moreover, manywould surely have been skeptical that straight competitive analysis could prove so successful, giventhe manifest unreality of its assumptions as applied to TAC. Our analysis certainly does not showthat competitive equilibrium is the best possible model for price formation in TAC, but it doesdemonstrate that deriving the shape of a market from an idealized economic theory can be surpris-ingly effective. 34
TAC PRICE PREDICTION There are several advantages to model-based reasoning, most obviously the ability to perform with minimal or no empirical data. Even when historical information is available, it can be mislead-ing to rely on it in a nonstationary environment. A tournament setup like TAC naturally violatesstationarity, as the agent pool evolves over time, through selection as well as individual learningand development. Of course, dealing with time-variance, particularly in multiagent environments,is an active area of current research, and ultimately the best methods will combine elements ofmodel-based and data-based reasoning. Finally, we suggest that the present study represents evidence that research competitions can in the right circumstances produce knowledge and insights beyond what might emerge from indepen-dent research efforts. We hope that additional researchers will be inspired to bring their innovativeideas on trading strategy to the next TAC, and look forward to investigating the results of suchinterplay in future work. Acknowledgments This study would not have been possible without the generous cooperation of the TAC-02 entrants.The research was supported in part by NSF grant IIS-9988715, as well as a STIET fellowship to thethird author, under an NSF IGERT grant. References Arrow, K. J., & Hahn, F. H. (1971). General Competitive Analysis. Holden-Day, San Francisco. Bose, P., Maheshwari, A., & Morin, P. (2002). Fast approximations for sums of distances, clustering and the Fermat-Weber problem. Computational Geometry: Theory and Applications, 24,135â146. Boyan, J., & Greenwald, A. (2001). Bid determination in simultaneous auctions: An agent archi- tecture. In Third ACM Conference on Electronic Commerce, pp. 210â212, Tampa, FL. Cheng, S.-F., Leung, E., Lochner, K. M., OâMalley, K., Reeves, D. M., & Wellman, M. P. (2004). Walverine: A Walrasian trading agent. Decision Support Systems, To appear. Fritschi, C., & Dorer, K. (2002). Agent-oriented software engineering for successful TAC participa- tion. In First International Joint Conference on Autonomous Agents and Multi-Agent Systems,Bologna. Greenwald, A. (2003a). The 2002 trading agent competition: An overview of agent strategies. AI Magazine, 24(1), 83â91. Greenwald, A. (2003b). Bidding under uncertainty in simultaneous auctions. In IJCAI-03 Workshop on Trading Agent Design and Analysis, Acapulco. He, M., & Jennings, N. R. (2003). SouthamptonTAC: An adaptive autonomous trading agent. ACM Transactions on Internet Technology, 3, 218â235. Katzner, D. W. (1989). The Walrasian Vision of the Microeconomy. University of Michigan Press. Putchala, R. P., Morris, V. N., Kazhanchi, R., Raman, L., & Shekhar, S. (2002). kavayaH: A trading agent developed for TAC-02. Tech. rep., Oracle India. Sadeh, N., Arunachalam, R., Eriksson, J., Finne, N., & Janson, S. (2003). TAC-03: A supply-chain trading competition. AI Magazine, 24(1), 92â94. 35
WELLMAN, REEVES, LOCHNER, & VOROBEYCHIK Stone, P. (2002). Multiagent competitions and research: Lessons from RoboCup and TAC. In Sixth RoboCup International Symposium, Fukuoka, Japan. Stone, P., & Greenwald, A. (2004). The ï¬rst international trading agent competition: Autonomous bidding agents. Electronic Commerce Research, To appear. Stone, P., Schapire, R. E., Littman, M. L., Csirik, J. A., & McAllester, D. (2003). Decision-theoretic bidding based on learned density models in simultaneous, interacting auctions. Journal ofArtiï¬cial Intelligence Research, 19, 209â242. Vetsikas, I. A., & Selman, B. (2003). A principled study of the design tradeoffs for autonomous trading agents. In Second International Joint Conference on Autonomous Agents and Multi-Agent Systems, pp. 473â480, Melbourne. Wellman, M. P., Greenwald, A., Stone, P., & Wurman, P. R. (2003). The 2001 trading agent com- petition. Electronic Markets, 13, 4â12. 36