Compilers: liveness analysis

by Andrew Chang-DeWitt, Wed Jun 04 2025

a variable is live when its value is needed, e.g.:

int f(int x) {
                 <---- x is live
  int a = x + 2;
                 <---- a & x are live
  int b = a * a;
                 <---- b & x are live
  int c = b + x;
                 <---- c is live
  return c;
}

observing this example, we see that each variable lives from the point after it is initialized to the last point where it is used.

[NOTE!] liveness !== scope

why does this matter?

  1. because if two variables share the same scope, but are not alive at the same time, then this means we can optimize by having these variables share registers
  2. some other analyses are possible when thinking in terms of liveness, such as defining constraints of non-lexical lifetimes in rustc's borrow checker implementation

a simple overview

to begin, we model the program being compiled as a control flow graph (CFG), composed of individual instructions as nodes & control flow "directions" as edges:

/->[ Move ]<-\
|     |      |
|     v      |
|  [ Binop ] |
|     |      |
|     v      |
|  [ If ]----/
|     |
|     v
|  [ Unop ]
|     |
|     v
\--[ Jump ]

liveness is then associated with edges in our graph. e.g., from a = b + 1, we can derive a CFG as:

     |
     |  live: b
     v
[ Mov a, b ]
     |
     |  live: a
     v
[ Add a, 1 ]
     |
     |  live: a (maybe)
     v

we can naively compile this to:

Move rax, rax
Add rax, 1

however, as line 1 should make painfully clear, it would be missing an important optimization. as seen by the liveness annotation for a & b above, neither variable is alive at the same time as the other, thus we can optimize by simply skipping the Move rax, rax instruction entirely.

use[s] & def[s]

liveness is based on uses & definitions, modeled as sets of variable names related to another variable.

e.g.:

∵ a = b + c

  use[a] = {b,c}
  def[a] = {a}

∵ a = a + 1

  use[a] = {a}
  def[a] = {a}

formal def of liveness

a variable v is live on edge e if:

there is:

a naive algorithm

for every variable v:
  for every n in use[v]:
    walk CFG backward from n until:
      a def[v] is found
      OR a previously visited node has been reached
      AND mark v as live on each edge traversed

this works, but is inefficient (time complexity of O(num edges _ |use[v]| _ num of variables) ~= O(n3)). how can we make it better?