Next: , Previous: , Up: Machine Desc   [Contents][Index]


16.13 Defining Looping Instruction Patterns

Some machines have special jump instructions that can be utilized to make loops more efficient. A common example is the 68000 ‘dbra’ instruction which performs a decrement of a register and a branch if the result was greater than zero. Other machines, in particular digital signal processors (DSPs), have special block repeat instructions to provide low-overhead loop support. For example, the TI TMS320C3x/C4x DSPs have a block repeat instruction that loads special registers to mark the top and end of a loop and to count the number of loop iterations. This avoids the need for fetching and executing a ‘dbra’-like instruction and avoids pipeline stalls associated with the jump.

GCC has three special named patterns to support low overhead looping. They are ‘decrement_and_branch_until_zero’, ‘doloop_begin’, and ‘doloop_end’. The first pattern, ‘decrement_and_branch_until_zero’, is not emitted during RTL generation but may be emitted during the instruction combination phase. This requires the assistance of the loop optimizer, using information collected during strength reduction, to reverse a loop to count down to zero. Some targets also require the loop optimizer to add a REG_NONNEG note to indicate that the iteration count is always positive. This is needed if the target performs a signed loop termination test. For example, the 68000 uses a pattern similar to the following for its dbra instruction:

(define_insn "decrement_and_branch_until_zero"
  [(set (pc)
        (if_then_else
          (ge (plus:SI (match_operand:SI 0 "general_operand" "+d*am")
                       (const_int -1))
              (const_int 0))
          (label_ref (match_operand 1 "" ""))
          (pc)))
   (set (match_dup 0)
        (plus:SI (match_dup 0)
                 (const_int -1)))]
  "find_reg_note (insn, REG_NONNEG, 0)"
  "…")

Note that since the insn is both a jump insn and has an output, it must deal with its own reloads, hence the ‘m’ constraints. Also note that since this insn is generated by the instruction combination phase combining two sequential insns together into an implicit parallel insn, the iteration counter needs to be biased by the same amount as the decrement operation, in this case -1. Note that the following similar pattern will not be matched by the combiner.

(define_insn "decrement_and_branch_until_zero"
  [(set (pc)
        (if_then_else
          (ge (match_operand:SI 0 "general_operand" "+d*am")
              (const_int 1))
          (label_ref (match_operand 1 "" ""))
          (pc)))
   (set (match_dup 0)
        (plus:SI (match_dup 0)
                 (const_int -1)))]
  "find_reg_note (insn, REG_NONNEG, 0)"
  "…")

The other two special looping patterns, ‘doloop_begin’ and ‘doloop_end’, are emitted by the loop optimizer for certain well-behaved loops with a finite number of loop iterations using information collected during strength reduction.

The ‘doloop_end’ pattern describes the actual looping instruction (or the implicit looping operation) and the ‘doloop_begin’ pattern is an optional companion pattern that can be used for initialization needed for some low-overhead looping instructions.

Note that some machines require the actual looping instruction to be emitted at the top of the loop (e.g., the TMS320C3x/C4x DSPs). Emitting the true RTL for a looping instruction at the top of the loop can cause problems with flow analysis. So instead, a dummy doloop insn is emitted at the end of the loop. The machine dependent reorg pass checks for the presence of this doloop insn and then searches back to the top of the loop, where it inserts the true looping insn (provided there are no instructions in the loop which would cause problems). Any additional labels can be emitted at this point. In addition, if the desired special iteration counter register was not allocated, this machine dependent reorg pass could emit a traditional compare and jump instruction pair.

The essential difference between the ‘decrement_and_branch_until_zero’ and the ‘doloop_end’ patterns is that the loop optimizer allocates an additional pseudo register for the latter as an iteration counter. This pseudo register cannot be used within the loop (i.e., general induction variables cannot be derived from it), however, in many cases the loop induction variable may become redundant and removed by the flow pass.


Next: , Previous: , Up: Machine Desc   [Contents][Index]