Accurate Indirect Branch Prediction
Karel Driesen and Urs Hölzle
Department of Computer Science
University of California
Santa Barbara, CA 93106
Technical Report TRCS97-19
December 3, 1997
Abstract. Indirect branch prediction is likely to become increasingly important in the future because indirect branches occur more frequently in object-oriented programs. With misprediction rates of around 25% on current processors, indirect branches can incur a significant fraction of branch misprediction overhead even though they remain less frequent than the more predictable conditional branches. We investigate a wide range of two-level predictors dedicated exclusively to indirect branches. Starting with predictors that use full-precision addresses and unlimited tables, we progressively introduce hardware constraints and minimize the loss of predictor performance at each step. For programs from the SPECint95 suite as well as a suite of large C++ applications, a two-level predictor achieves a misprediction rate of 9.8% with a 1K-entry table and 7.3% with an 8K-entry table, representing more than a threefold improvement over an ideal BTB. A hybrid predictor further reduces the misprediction rates to 8.98% and 5.95%, respectively.
Current high-performance superscalar processors use branch prediction to speculatively execute instructions beyond an unresolved branch. If the branch is mispredicted, this work is lost, and execution must restart right after the branch instruction. Newer designs increase instruction issue width and pipeline depth, which raises the relative overhead of mispredicted branches and makes accurate branch prediction even more critical to performance.
Conditional direct branches, whose target is encoded in the instruction itself, can already be predicted with reported hit rates of up to 97% ([YP93]). In contrast, indirect branches, which transfer control to an address stored in a register, are harder to predict accurately. Unlike conditional branches, they can have more than two targets, so that prediction requires a full 32-bit or 64-bit address rather than just a "taken" or "not taken" bit. Current processors predict indirect branches with a branch target buffer (BTB), which caches the most recent target address of a branch. Unfortunately, BTBs typically have much lower prediction rates than the best predictors for conditional branches. For example, an ideal (unconstrained) BTB achieves an average prediction hit ratio of only 64% on the SPECint95 benchmarks.
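To make the BTB mechanism concrete, the following is a minimal sketch of an unconstrained BTB simulator. It is not the simulator used in this study. The class name, the trace format (a sequence of branch address and target pairs), and the choice to count a first-time lookup as a miss are all assumptions made for illustration, and the two traces are hypothetical.

```python
class BranchTargetBuffer:
    """Sketch of an unconstrained BTB: one entry per branch, no capacity limit."""

    def __init__(self):
        # Maps a branch address to the most recently observed target address.
        self.last_target = {}
        self.hits = 0
        self.lookups = 0

    def predict(self, branch_addr):
        # Returns the cached target, or None if the branch has not been seen.
        return self.last_target.get(branch_addr)

    def update(self, branch_addr, actual_target):
        # Score the prediction, then cache the target that was actually taken.
        # Assumption: a lookup with no cached entry counts as a miss.
        self.lookups += 1
        if self.predict(branch_addr) == actual_target:
            self.hits += 1
        self.last_target[branch_addr] = actual_target

    def hit_ratio(self):
        return self.hits / self.lookups if self.lookups else 0.0


def simulate(trace):
    """Run a trace of (branch_addr, target) pairs and return the hit ratio."""
    btb = BranchTargetBuffer()
    for branch_addr, target in trace:
        btb.update(branch_addr, target)
    return btb.hit_ratio()


# Hypothetical traces for a single indirect branch at address 0x400.
monomorphic = [(0x400, 0x1000)] * 8                  # always the same target
alternating = [(0x400, 0x1000), (0x400, 0x2000)] * 4  # target flips every time

print(simulate(monomorphic))  # 0.875: only the first (cold) lookup misses
print(simulate(alternating))  # 0.0: the last target is never the next target
```

The alternating trace illustrates why caching only the most recent target is a weak predictor for a branch with more than one target: the pattern is perfectly regular, yet the BTB mispredicts every execution.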
Though not as common as conditional branches, indirect branches occur frequently enough to cause substantial overhead. Chang et al. [CHP97] predict a reduction in execution time of 14% and 5% for the perl and gcc benchmarks on a wide-issue superscalar processor with an improved prediction mechanism for indirect branches (Target Cache).