Compilers: dataflow analysis
by Andrew Chang-DeWitt, Tue Jul 08 2025
a general framework for analysing control flow graphs by
computing the set of facts that hold going into & out of each node
this means that each node does the following (in the eyes of dataflow analysis):
- generates some set of facts
- kills some set of facts
the steps:
- define facts, gen & kill sets
- define constraints
- convert constraints to equations (for use in algorithm)
- initialize the fact sets for each node (consistent w/ whether the sets grow or shrink as the cfg is traversed)
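the gen/kill idea can be sketched in a few lines of OCaml. this is a minimal sketch, not code from any particular project: the `node_facts` record, the `transfer` function, and the use of integers as facts are all assumptions made for illustration. for a forward analysis `incoming` plays the role of in[n]; for a backward one it plays the role of out[n].

```ocaml
module IntSet = Set.Make (Int)

(* hypothetical per-node record: the facts a node generates and kills *)
type node_facts = { gen : IntSet.t; kill : IntSet.t }

(* facts leaving a node = the facts it generates,
   plus the incoming facts it does not kill *)
let transfer (n : node_facts) (incoming : IntSet.t) : IntSet.t =
  IntSet.union n.gen (IntSet.diff incoming n.kill)

(* a node that generates fact 3 and kills fact 1, fed the facts {1, 2} *)
let example =
  transfer
    { gen = IntSet.singleton 3; kill = IntSet.singleton 1 }
    (IntSet.of_list [1; 2])
```

here `example` holds the facts 2 and 3: fact 1 was killed, fact 2 passed through, and fact 3 was generated.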
some examples of dataflow analyses are:
- liveness
- reaching defs
- optimizations
let's review liveness in terms of dataflow analysis
ex: liveness
define facts & gen/kill sets:
    facts   := live variables
    gen[n]  := use[n]  // set of variables referenced (live) at node
    kill[n] := def[n]  // set of variables defined at node
constraints:
    in[n]  ⊇ gen[n]
    out[n] ⊇ in[n']  if n' ∈ succ[n]
    in[n]  ⊇ out[n] / kill[n]
equations:
    out[n] := ∪_{n' ∈ succ[n]} in[n']
    in[n]  := gen[n] ∪ (out[n] / kill[n])
initialize sets:
    out[n] := ∅
    in[n]  := ∅
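the liveness equations can be run to a fixed point with a simple loop. the following is a minimal sketch under stated assumptions: the `node` record, the array-based cfg, and the four-statement straight-line program are hypothetical and chosen only for illustration. nodes are indexed from 0, and the loop visits them in reverse since liveness flows backward.

```ocaml
module VarSet = Set.Make (String)

(* hypothetical node representation: use/def sets plus successor indices *)
type node = { use : VarSet.t; def : VarSet.t; succ : int list }

let vs = VarSet.of_list

(* 0: b = a + 2;  1: c = b * b;  2: b = c + 1;  3: ret b * a *)
let cfg = [|
  { use = vs ["a"];      def = vs ["b"]; succ = [1] };
  { use = vs ["b"];      def = vs ["c"]; succ = [2] };
  { use = vs ["c"];      def = vs ["b"]; succ = [3] };
  { use = vs ["a"; "b"]; def = vs [];    succ = []  }
|]

let liveness cfg =
  let n = Array.length cfg in
  (* initialize: every in/out set starts empty *)
  let live_in = Array.make n VarSet.empty in
  let live_out = Array.make n VarSet.empty in
  let changed = ref true in
  while !changed do
    changed := false;
    for i = n - 1 downto 0 do
      (* out[n] := union of in[n'] over successors n' *)
      let out =
        List.fold_left
          (fun acc s -> VarSet.union acc live_in.(s))
          VarSet.empty cfg.(i).succ
      in
      (* in[n] := gen[n] ∪ (out[n] / kill[n]) *)
      let inp = VarSet.union cfg.(i).use (VarSet.diff out cfg.(i).def) in
      if not (VarSet.equal out live_out.(i) && VarSet.equal inp live_in.(i))
      then begin
        live_out.(i) <- out;
        live_in.(i) <- inp;
        changed := true
      end
    done
  done;
  (live_in, live_out)
```

calling `liveness cfg` on this program gives in[0] = {a}, in[1] = {a, b}, in[2] = {a, c}, and in[3] = {a, b}: only `a` is live on entry, since every other variable is defined before it is used.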
ex: reaching defs
which definitions of a variable might reach the current node?
( 1: b = a + 2 ) // out[1] = {1}
| // in[2] = {1}
v
( 2: c = b * b ) // out[2] = {1,2}
| // in[3] = {1,2}
v
( 3: b = c + 1 ) // out[3] = {2,3}
| // in[4] = {2,3}, note 2 still reaches node 4,
| // even though c is no longer live
v
( 4: ret b * a )
generalizing this:
define facts & gen/kill sets:
    facts   := nodes whose defs might reach current node
    gen[n]  := current node, if it defines a variable
    kill[n] := all other nodes that define that variable, if current node defines a variable
constraints:
    out[n] ⊇ gen[n]
    in[n]  ⊇ out[n']  if n' ∈ pred[n]
    out[n] ⊇ in[n] / kill[n]
equations:
    in[n]  := ∪_{n' ∈ pred[n]} out[n']
    out[n] := gen[n] ∪ (in[n] / kill[n])
initialize sets:
    out[n] := ∅
    in[n]  := ∅
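to check the equations against the four-node diagram above, here is a minimal sketch that runs them to a fixed point. the `node` record, the `gen`/`kill` helpers, and the array encoding of the cfg are assumptions for illustration; index 0 is an unused placeholder so that array indices match the node numbers in the diagram.

```ocaml
module IntSet = Set.Make (Int)

(* hypothetical node: the variable it defines (if any) and its predecessors *)
type node = { def : string option; pred : int list }

let cfg = [|
  { def = None;     pred = []  };  (* unused placeholder *)
  { def = Some "b"; pred = []  };  (* 1: b = a + 2 *)
  { def = Some "c"; pred = [1] };  (* 2: c = b * b *)
  { def = Some "b"; pred = [2] };  (* 3: b = c + 1 *)
  { def = None;     pred = [3] }   (* 4: ret b * a *)
|]

(* gen[n]: the node itself, if it defines a variable *)
let gen i =
  match cfg.(i).def with
  | Some _ -> IntSet.singleton i
  | None -> IntSet.empty

(* kill[n]: all other nodes that define the same variable *)
let kill i =
  match cfg.(i).def with
  | None -> IntSet.empty
  | Some v ->
    let others = ref IntSet.empty in
    Array.iteri
      (fun j nd -> if j <> i && nd.def = Some v then others := IntSet.add j !others)
      cfg;
    !others

let reaching_defs () =
  let n = Array.length cfg in
  let rd_in = Array.make n IntSet.empty in
  let rd_out = Array.make n IntSet.empty in
  let changed = ref true in
  while !changed do
    changed := false;
    for i = 1 to n - 1 do
      (* in[n] := union of out[n'] over predecessors n' *)
      let inp =
        List.fold_left
          (fun acc p -> IntSet.union acc rd_out.(p))
          IntSet.empty cfg.(i).pred
      in
      (* out[n] := gen[n] ∪ (in[n] / kill[n]) *)
      let out = IntSet.union (gen i) (IntSet.diff inp (kill i)) in
      if not (IntSet.equal inp rd_in.(i) && IntSet.equal out rd_out.(i))
      then begin
        rd_in.(i) <- inp;
        rd_out.(i) <- out;
        changed := true
      end
    done
  done;
  (rd_in, rd_out)
```

the result agrees with the annotations in the diagram: node 3 kills definition 1 (both define `b`), so out[3] = {2, 3}.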
types of dataflow analyses
| | backward | forward |
|---|---|---|
| may | liveness: what variables may be needed from n? | reaching defs: what defs may reach n? |
| must | very busy exprs: what exprs will be evaluated on every path from n? | available exprs: what exprs must reach n? |
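the two axes of the table map onto two small choices in code: which neighbours a node reads its facts from, and how facts from several neighbours are combined. the sketch below is an illustration under assumptions, not part of any existing module: the `direction` and `kind` types and the `neighbours`, `meet`, and `initial` helpers are hypothetical names, and facts are modelled as integers.

```ocaml
module IntSet = Set.Make (Int)

type direction = Forward | Backward
type kind = May | Must

(* forward analyses pull facts from predecessors,
   backward analyses pull them from successors *)
let neighbours dir ~pred ~succ =
  match dir with
  | Forward -> pred
  | Backward -> succ

(* may: a fact holds if it arrives along any path, so combine w/ union.
   must: a fact holds only if it arrives along every path, so combine
   w/ intersection. with no neighbours (entry or exit node) no facts arrive. *)
let meet kind sets =
  match kind, sets with
  | _, [] -> IntSet.empty
  | May, s :: rest -> List.fold_left IntSet.union s rest
  | Must, s :: rest -> List.fold_left IntSet.inter s rest

(* may sets only grow during iteration, so they start empty;
   must sets only shrink, so they start as the full set of facts *)
let initial kind ~universe =
  match kind with
  | May -> IntSet.empty
  | Must -> universe
```

for example, meeting {1, 2} and {2, 3} gives {1, 2, 3} for a may analysis and {2} for a must analysis.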
generic dataflow analysis, in code
so how to translate the 4-step system to code? as an example here, we
dissect the generalized pattern given in project 5, in the file
src/dataflow.ml.
first, how is this module used? looking at src/opt.ml, the dataflow module is
instantiated by applying the functor Dataflow.Make to two modules:
    module ExpDataflow = Dataflow.Make
      (struct type t = var end)
      (struct
        type t = inst
        let compare a b =
          (* This may not do the right thing, but it'll
           * do something, which is good enough to just
           * treat the set like a list *)
          if a < b then -1 else
          if a = b then 0 else 1
      end)