BookRags.com Literature Guides Literature
Guides
Criticism & Essays Criticism &
Essays
Questions & Answers Questions &
Answers
Lesson Plans Lesson
Plans
My Bibliography Periodic Table U.S. Presidents Shakespeare Sonnet Shake-Up
Research Anything:        
History | Encyclopedias | Films | News | Create a Bibliography | More... Login | Register | Help
Not What You Meant?  There are 19 definitions for Allocation.

Register allocation

Print-Friendly
About 5 pages (1,391 words)

Bookmark and Share Questions on this topic? Just ask!

In compiler optimization, register allocation is the process of multiplexing a large number of target program variables onto a small number of CPU registers. The goal is to keep as many operands as possible in registers to maximise the execution speed of software programs. Register allocation can happen over a basic block (local register allocation), over a whole function/procedure (global register allocation), or in-between functions as a calling convention (interprocedural register allocation). Most computer programs need to process large numbers of different data items. However, most CPUs can only perform operations on a small fixed number of "slots" called registers. Even on machines that support memory operands, register access is considerably faster than memory access. Variables not allocated to registers must be loaded in and out of RAM whenever they are used. Register spilling occurs where there are more live variables than the machine has registers. When a compiler is generating machine code and there are more live variables than the machine has registers, it has to transfer or "spill" some variables from registers to memory. This incurrs a certain cost, as access from memory is typically slower than access from a register. In compilers, register pressure occurs when there are more variables to allocate than there are registers available. This typically results in register spilling.

Contents

Challenges

Register allocation is an NP-complete problem[1][2]. The number of variables in a typical program is much larger than the number of available registers in a processor, so the contents of some variables have to be spilled (saved) into memory locations. The cost of such spilling is minimised by spilling the least frequently used variables first, but it is not easy to know which variables will be used the least. In addition to this the hardware and operating system may impose restrictions on the usage of some registers.

Global register allocation

Like most other compiler optimizations, register allocation is based on the result of some compiler analysis, mostly the result of live variable analysis from data flow analysis. Traditional allocators perform global register allocation using a graph coloring algorithm devised by Chaitin et al. It can be divided into two phases:

  1. Machine instructions are generated as if there are an infinite number of symbolic registers. So all variables suitable to being in registers will be assigned to numbered logical registers. The phase is sometimes called register variable recognition.
  2. symbolic registers are replaced by physical registers in a target machine, with the minimum cost of spills.

In phase two, an interference graph is constructed where nodes are program variables and an arc connects two nodes if they are alive at the same time. More precisely, if one variable is alive at the time the other is defined then they are said to interfere. If the graph can be colored with R colors then the variables can be stored in R registers. This insight was pointed out by John Cocke, "father of the RISC architecture". The problem is that coloring a graph is an NP-hard problem. The key insight to Chaitin’s algorithm is called the degree < R rule which is as follows. Given a graph G which contains a node N with degree less than R, G is R-colorable iff the graph G’, where G’ is G with node N removed, is R-colorable. The proof is obvious in one direction: if a graph G can be colored with R colors then the graph G’ can be created without changing the coloring. In the other direction, suppose we have an R-coloring of G’. Since N has a degree of less than R there must be at least one color that is not in use for a node adjacent to N. We can color N with this color.

While G cannot be R-colored
  While graph G has a node N with degree less than R
    Remove N and its associated edges from G and push N on a stack S
  End While 
  If the entire graph has been removed then the graph is R-colorable 
    While stack S contains a node N
      Add N to graph G and assign it a color from the R colors
    End While
  Else graph G cannot be colored with R colors
    Simplify the graph G by choosing an object to spill and remove its node N from G
    (spill nodes are chosen based on object’s number of definitions and references)
End While

This algorithm is O(n^2). This algorithm can be improved through subsumption which is the act of coalescing nodes which are the source and target of copy operations into a single node before running the algorithm. This reduces the number of nodes to color but can increase the degree of any coalesced node. This can only be done when the nodes do not interfere with each other, however, and aggressive coalescing can lead to uncolorable graphs. (Preston Briggs’ thesis work introduces safer methods to determine which nodes to coalesce and spill. Based on his improvements this algorithm is often called the Chaitin-Briggs algorithm.) The subsumption step is slow and is not done in fast register allocators.

Recent developments

Graph coloring allocators produce efficient code, but their allocation time is high. In cases of static compilation, allocation time is not a significant concern. In cases of dynamic compilation, such as just-in-time (JIT) compilers, fast register allocation is important. An efficient technique proposed by Poletto and Sarkar is linear scan allocation. This technique requires only a single pass over the list of variable live ranges. Ranges with short lifetimes are assigned to registers, whereas those with long lifetimes tend to be spilled, or reside in memory. The results are on average only 12% less efficient than graph coloring allocators. The linear scan algorithm follows:

  1. Perform dataflow analysis to gather liveness information. Keep track of all variables’ live intervals, the interval when a variable is live, in a list sorted in order of increasing start point (note that this ordering is free if the list is built when computing liveness.) We consider variables and their intervals to be interchangeable in this algorithm.
  2. Iterate through liveness start points and allocate a register from the available register pool to each live variable.
  3. At each step maintain a list of active intervals sorted by the end point of the live intervals. (Note that insertion sort into a balanced binary tree can be used to maintain this list at linear cost.) Remove any expired intervals from the active list and free the expired interval’s register to the available register pool.
  4. In the case where the active list is size R we cannot allocate a register. In this case add the current interval to the active pool without allocating a register. Spill the interval from the active list with the furthest end point. Assign the register from the spilled interval to the current interval or, if the current interval is the one spilled, do not change register assignments.

Cooper and Dasgupta recently developed a "lossy" Chaitin-Briggs graph coloring algorithm suitable for use in a JIT [3]. The "lossy" moniker refers to the imprecision the algorithm introduces into the interference graph. This optimization reduces the costly graph building step of Chaitin-Briggs making it suitable for runtime compilation. Experiments indicate that this lossy register allocator outperforms linear scan on the majority of tests used. "Optimal" register allocation algorithms based on Integer Programming have been developed by Goodwin and Wilken for regular architectures. These algorithms have been extended to irregular architectures by Kong and Wilken. While the worst case execution time is exponential, the experimental results show that the actual time is typically of order <math>O(n^{2.5})</math> of the number of constraints[4].

References

  1. ^ Gregory J. Chaitin, Mark A. Auslander, Ashok K. Chandra, John Cocke, Martin E. Hopkins, and Peter W. Markstein. "Register allocation via coloring." Computer Languages, 6:47-57, 1981.
  2. ^ Fernando Magno Quintão Pereira, Jens Palsberg, "Register Allocation after Classical SSA Elimination is NP-complete", http://www.cs.ucla.edu/~palsberg/paper/fossacs06.pdf
  3. ^ Cooper, Dasgupta, "Tailoring Graph-coloring Register Allocation For Runtime Compilation", http://llvm.org/pubs/2006-04-04-CGO-GraphColoring.html
  4. ^ Kong, Wilken, "Precise Register Allocation for Irregular Architectures", http://www.ece.ucdavis.edu/cerl/cerl_arch/irreg.pdf

View More Summaries on Register allocation
 
Ask any question on Register allocation and get it answered FAST!
Answer questions in BookRags Q&A and earn points toward
discounted or even FREE Study Guides and other BookRags products!
Learn more about BookRags Q&A
Copyrights
Register allocation from Wíkipedia. ©2006 by Wíkipedia. Licensed under the GNU Free Documentation License. View a list of authors or edit this article.

Article Navigation
Join BookRagslearn moreJoin BookRags




About BookRags | Customer Service | Report an Error | Terms of Use | Privacy Policy