| ![]() | |||||||||
there are no experimental results, due to the fact that his algorithm reflects a special hardware environment that is not available yet.
Only a few experiments have been conducted yet with massive parallelism (> 20 processors) : In [OF88] Otto and Felten reported on a speedup of 101 using a 256 Hypercube processors, however this was achieved by parallelizing a suboptimal version of the sequential -algorithm. Another interesting result is the speedup of 12 achieved by Ferguson and Korf in [FK88] using 32 processors for an Othello playing program.
We presented a speedup of 25 with 32 processors and a speedup of 34 with 64 processors in [FMM90]. Our recent results indicate that our algorithm is scalable even beyond the 256 processors with expected good efficiency.
Furthermore we give a cost and gain study of our distributed transposition table presented in [FMM90]. This transposition table is a virtually global table that is physically distributed among all the processors in the network. Access to this table is by routing requests and entries from a demanding processor to the processor responsible for the entry. We show that in the processor range available the cost to access this table by routing is much smaller than the gain resulting from the huge table size. Moreover this approach is easily scalable without introducing bottlenecks (as when using a global transposition table located at a single processor) and without the loss of valuable informations (as when using tables with local access only).
In [MP85] Marsland and Popowich compared the use of local and global transposition tables. Schaeffer uses a hybrid version of these methods for his chess program Sun Phoenix ([MOS86, Sch89]). All these approaches suffer heavily from either the communication bottleneck or the information loss.
A short review of our algorithm is given in section 2., in the section 3. we describe the extension of the Young Brothers Wait Concept. The cost and gain analysis of the distributed transposition table in presented in section 4. In section 5. we give the results gained from the experiments concerning the performance of the transposition table and the overall performance of the algorithm.
2. REVIEW OF OUR DISTRIBUTED ALGORITHM
In this section we will briefly review our distributed algorithm described in [FMMV90, FMM90].
The general idea to use parallelism is to dynamically decompose the game tree that has to be searched. This is done in the following way:
ffl At the start of a game tree search all processors but one are idle. The busy one is responsible for the root and starts to search the game tree.
ffl Processors that search a subtree of the whole game tree produce subproblems that are queued for later evaluation.