[analyzer] An in-depth look at Liveness Analysis in the Static Analyzer

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

[analyzer] An in-depth look at Liveness Analysis in the Static Analyzer

Keane, Erich via cfe-dev
Hi!

This mail will take a look at LiveVariables, which in reality is a live values analysis implemented by clang. Written about 2 months after the clang/ directory itself was created, it plays a crucial rule in the Clang Static Analyzer to this day. Despite that, I managed to find only 2 functional changes to it in the last 8 years, so we generally consider it to be stable.

As I wanted to merge the implementations of Liveness and UninitializedVariables, I ended up struggling a lot to comprehend the former, so this serves as a documentation for myself but written for anyone, and I welcome anyone to find errors in my understanding.

---=== 1. What is liveness analysis? ===---

Suppose statement i assigns a value to x . If statement j has x as an operand, and control can flow from statement i to j along a path that has no intervening assignments to x , then we say statement j uses the value of x computed at statement i. We further say that x is live at statement i. [5]

This can be used to spot, for example, to spot dead stores (storing a value into a variable, but not using it anywhere).

In practice, similarly to other data flow algorithms, we calculate an in-state and out-state for each basic block in a control flow graph. The in-state of a block is the set of variables that are live at the start of the block. Its out-state is the set of variables that are live at the end of it. The out-state is the union of the in-states of the block's successors. [8]

We can extend the above described live variable analysis to include values, this is the so called live expressions analysis [2]. Expressions live only within the containing statement -- for instance, argument expressions in a call expressions are live until the end of the call expression itself. If the expression isn't immediately read, it does not live in any context.

---=== 2. Examples ===---

a.) The following example shows the LIVEin and LIVEout sets (after, not during calculation): [6]

// in: {}
b1: a = 3;
    b = 5;
    d = 4;
    x = 100; //x is never being used later thus not in the out set {a,b,d}
    if a > b then
// out: {a,b,d}    //union of all (in) successors of b1 => b2: {a,b}, and b3:{b,d}  

// in: {a,b}
b2: c = a + b;
    d = 2;
// out: {b,d}

// in: {b,d}
b3: endif
    c = 4;
    return b * d + c;
// out:{}

b.) Multiple kills of a variable in a single block:

int a = 5; // 'a' is written, can liveness tell whether 'a' is live?
a = 6; // 'a' is written
use(6); // 'a' is used

c.) Statement that spans multiple blocks:

x ? y : z

d.) Expression liveness example:

int getInt();
int getInt2();
void foo(int, int);
void test() {
  foo(getInt(), getInt2());
}

Contents of test()'s main CFGBlock:

1: foo
2: [B1.1] (ImplicitCastExpr, FunctionToPointerDecay, void (*)(int, int))
3: getInt
4: [B1.3] (ImplicitCastExpr, FunctionToPointerDecay, int (*)(void))
5: [B1.4]()
6: getInt2
7: [B1.6] (ImplicitCastExpr, FunctionToPointerDecay, int (*)(void))
8: [B1.7]()
9: [B1.2]([B1.5], [B1.8])

Note how, as one would expect, the order of evaluation can be clearly seen in the elements of the block on a rather fine level.

The relevant (and somewhat simplified) AST:

`-CallExpr 'void'
  |-ImplicitCastExpr 'void (*)(int, int)'
  | `-DeclRefExpr 'foo' 'void (int, int)'
  |-CallExpr 'int'
  | `-ImplicitCastExpr 'int (*)()'
  |   `-DeclRefExpr 'getInt' 'int ()'
  `-CallExpr 'int'
    `-ImplicitCastExpr 'int (*)()'
      `-DeclRefExpr 'getInt2' 'int ()'

---=== 3. How it works in Clang ===---

a.) General

Since liveness analysis is a backwards may analysis [8], each CFGBlock in the CFG is visited with a worklist that always enqueues predecessors. It runs a fix-point algorithm, calculating the set of in- and outgoing live expressions/variables until those sets change no more.

Calculating liveness sets for only blocks isn't granular enough, as a single block may contain multiple kills (example b.). This demands that these LIVEin and LIVEout sets are calculated for each statement. However, since a statement can span across multiple blocks, block-level liveness needs to be calculated as well (example c.).

Contrary to the Wikipedia algorithm [8] describes, there are no established GEN and KILL sets, nor similar constructs like a symbol table as described in the dragon book [5]. Instead, determining which statements constitutes as variable/expression kill (write), or GENeration (use/read) is done in the same step as the calculation of the LIVEin and LIVEout sets.

For each block in the worklist, we traverse its CFGElements backwards and create a TransferFunctions object, which is a statement visitor under the hood.  For each element in this block that is a statement (CFGStmt), we visit the statement with said TranserFunctions. We always copy the liveness set from the previous statement to the next. Wiki reads,

"The transfer function of a statement is applied by making the variables that are written dead, then making the variables that are read live."[8]

b.) Calculating expression liveness for a single CFGBlock

Since we also calculate live expressions however, we have to also account for them in addition. If the statement is an Expr, it must also be the kill of that Expr (it is the only write of that expression, after all), so we remove it from the live expressions set. Moving on, we branch out -- depending on the type of the statement, we mark things like the condition of for/if/while/do statements live. 

We also employ a heuristic to generalize this task -- we mark all child expressions live as well. Demonstrating that on the Example d.):

* [B1.9] (Block 1, Element 9): Remove [B1.9] (as it is the CallExpr to foo) from [B1.9]'s liveness set, add its children, the arguments, [B1.5] and [B1.8], and the function to be called, [B1.2] (mind that this could be a function pointer as well!).
* [B1.8]: Remove [B1.8] (as it is the CallExpr to getInt2) from [B1.8]'s liveness set, add its child, [B1.7].
* [B1.7]: Remove [B1.7] (as it is also an Expr, an ImplicitCastExpr) from [B1.7]'s liveness set, add its child, [B1.6].
* [B1.6]: Remove [B1.6] (as it is a DeclRefExpr to getInt2) from [B1.6]'s liveness set, and since it has no more children, we're done here.
* [B1.5] -- [B1.3] same as [B1.8] --[B1.6].
* [B1.2]: Remove [B1.2] (as it is also an ImplicitCastExpr of function foo) from [B1.7]'s liveness set, add its child, [B1.1].
* [B1.1]: Remove [B1.1] (as it is the DeclRefExpr to function foo) from [B1.1]'s liveness set, and we're done.

This leaves the following live expression sets (mind that due to the linearity of the CFG, this is the final result):
* [B1.9]: [B1.8], [B1.5], [B1.2]
* [B1.8]: [B1.5], [B1.2], [B1.7]
* [B1.7]: [B1.5], [B1.2], [B1.6]
* [B1.6]: [B1.5], [B1.2]
* [B1.5]: [B1.2], [B1.4]
* [B1.4]: [B1.2], [B1.3],
* [B1.3]: [B1.2]
* [B1.2]: [B1.1]
* [B1.1]: {}

Indeed, this accurately represents the liveness of these expressions -- the function to be called, and both arguments must live until the end of the call expression.

c.) Calculating variable liveness for a single CFGBlock

Reads or usages of variables can be found in many contexts, namely when the statement visitor encounters a DeclRefExpr, DeclStmt, BlockExpr, CFGAutomaticObjDtor, or ObjCMessageExpr. What is different compared to expression live set calculation, are kills, or in other words, the removal of variables from liveness sets -- writes happen on either assignments or their definitions. Mind that not all assignments are kills -- operators such as += also read the variable.

In general, understanding live variables calculation follows the literature more closely, and in my experience, demands far less time to understand.

d.) Transfer in between blocks

The LIVEin set of a CFGBlock will always be the liveness set (both for variables and expressions) of the corresponding CFGElement no1. Next, as the algorithm states, we enqueue all predecessor nodes to the worklist. Processing the next block in the worklist starts with calculating its LIVEout set with the union of all liveness sets of successor blocks.

e.) Relaxed liveness analysis

LiveVariables presents the option to conduct a relaxed analysis, no longer regarding assignments as kills. This is desirable when we're not interested in a liveness of a value held by a variable, but rather the lifetime of a variable. As such, the analyzer uses relaxed liveness analysis when inquiring about variables for the most part with the exception of DeadStoresChecker. Even debug.DumpLiveStms dumps relaxed sets, but debug.DumpLiveVars uses the non-relaxed version.

d.) Miscellaneous notes

Live bindings[9] work very similarly to variables, but have their own distinct liveness set.

There is currently a hack in the analyzer where we assign value to a statement, and thus Liveness calculates live statements, not expressions [2], https://reviews.llvm.org/D82598#2171312.

There is a corner case for CFGAutomaticObjDtor, which isn't a CFGStmt, but is still handled, as it marks its associated variable live.

A notable bug when we heuristic of adding all child expressions (or, as of writing this mail, still statements) didn't work is described in [3].

---=== 3. How the static analyzer uses it ===---

I'll detail this in a followup mail when and if I get to it. I guess the most interesting and non-trivial usage of liveness is found in DeadStoresChecker, which is definitely a good learning example of dataflow analysis being incorporated into symbolic execution.

If that mail doesn't happen soon enough, [1], [4], [7] provide great further reading on a related problem, zombie symbols, which is strongly related to liveness.

Have a great one!
Husi

[2] [analyzer][Liveness][NFC] Get rid of statement liveness, because such a thing doesn't exist https://reviews.llvm.org/D82598
[3] [analyzer] LiveVariables: Fix a zombie expression problem, add testing infrastructure. https://reviews.llvm.org/D55566
[4] [analyzer] Fix the "Zombie symbols" issue. https://reviews.llvm.org/D18860
[5] Aho, A., Lam, M., Sethi, R., & Ullman, J. (2007). Compilers: Principles, Techniques and Tools, 2nd Edition.
[6] Live variable analysis, Wikipedia, second example https://en.wikipedia.org/wiki/Live_variable_analysis#Second_example
[7] Dergachev, A. (2016). Clang Static Analyzer: A Checker Developer’s Guide.(2016).
[8] Live variable analysis, Wikipedia, second example https://en.wikipedia.org/wiki/Live_variable_analysis

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: [analyzer] An in-depth look at Liveness Analysis in the Static Analyzer

Keane, Erich via cfe-dev
Kristóf,

This is really useful for ClangSA developers. I am wondering if this should be placed somewhere in the official documentation?

> ---=== 3. How the static analyzer uses it ===---
> I'll detail this in a followup mail when and if I get to it.

Can't wait to read that! Perhaps that should be part of the above mentioned docs too :)

Gábor

On Tue, Jul 28, 2020 at 3:17 AM Kristóf Umann <[hidden email]> wrote:
Hi!

This mail will take a look at LiveVariables, which in reality is a live values analysis implemented by clang. Written about 2 months after the clang/ directory itself was created, it plays a crucial rule in the Clang Static Analyzer to this day. Despite that, I managed to find only 2 functional changes to it in the last 8 years, so we generally consider it to be stable.

As I wanted to merge the implementations of Liveness and UninitializedVariables, I ended up struggling a lot to comprehend the former, so this serves as a documentation for myself but written for anyone, and I welcome anyone to find errors in my understanding.

---=== 1. What is liveness analysis? ===---

Suppose statement i assigns a value to x . If statement j has x as an operand, and control can flow from statement i to j along a path that has no intervening assignments to x , then we say statement j uses the value of x computed at statement i. We further say that x is live at statement i. [5]

This can be used to spot, for example, to spot dead stores (storing a value into a variable, but not using it anywhere).

In practice, similarly to other data flow algorithms, we calculate an in-state and out-state for each basic block in a control flow graph. The in-state of a block is the set of variables that are live at the start of the block. Its out-state is the set of variables that are live at the end of it. The out-state is the union of the in-states of the block's successors. [8]

We can extend the above described live variable analysis to include values, this is the so called live expressions analysis [2]. Expressions live only within the containing statement -- for instance, argument expressions in a call expressions are live until the end of the call expression itself. If the expression isn't immediately read, it does not live in any context.

---=== 2. Examples ===---

a.) The following example shows the LIVEin and LIVEout sets (after, not during calculation): [6]

// in: {}
b1: a = 3;
    b = 5;
    d = 4;
    x = 100; //x is never being used later thus not in the out set {a,b,d}
    if a > b then
// out: {a,b,d}    //union of all (in) successors of b1 => b2: {a,b}, and b3:{b,d}  

// in: {a,b}
b2: c = a + b;
    d = 2;
// out: {b,d}

// in: {b,d}
b3: endif
    c = 4;
    return b * d + c;
// out:{}

b.) Multiple kills of a variable in a single block:

int a = 5; // 'a' is written, can liveness tell whether 'a' is live?
a = 6; // 'a' is written
use(6); // 'a' is used

c.) Statement that spans multiple blocks:

x ? y : z

d.) Expression liveness example:

int getInt();
int getInt2();
void foo(int, int);
void test() {
  foo(getInt(), getInt2());
}

Contents of test()'s main CFGBlock:

1: foo
2: [B1.1] (ImplicitCastExpr, FunctionToPointerDecay, void (*)(int, int))
3: getInt
4: [B1.3] (ImplicitCastExpr, FunctionToPointerDecay, int (*)(void))
5: [B1.4]()
6: getInt2
7: [B1.6] (ImplicitCastExpr, FunctionToPointerDecay, int (*)(void))
8: [B1.7]()
9: [B1.2]([B1.5], [B1.8])

Note how, as one would expect, the order of evaluation can be clearly seen in the elements of the block on a rather fine level.

The relevant (and somewhat simplified) AST:

`-CallExpr 'void'
  |-ImplicitCastExpr 'void (*)(int, int)'
  | `-DeclRefExpr 'foo' 'void (int, int)'
  |-CallExpr 'int'
  | `-ImplicitCastExpr 'int (*)()'
  |   `-DeclRefExpr 'getInt' 'int ()'
  `-CallExpr 'int'
    `-ImplicitCastExpr 'int (*)()'
      `-DeclRefExpr 'getInt2' 'int ()'

---=== 3. How it works in Clang ===---

a.) General

Since liveness analysis is a backwards may analysis [8], each CFGBlock in the CFG is visited with a worklist that always enqueues predecessors. It runs a fix-point algorithm, calculating the set of in- and outgoing live expressions/variables until those sets change no more.

Calculating liveness sets for only blocks isn't granular enough, as a single block may contain multiple kills (example b.). This demands that these LIVEin and LIVEout sets are calculated for each statement. However, since a statement can span across multiple blocks, block-level liveness needs to be calculated as well (example c.).

Contrary to the Wikipedia algorithm [8] describes, there are no established GEN and KILL sets, nor similar constructs like a symbol table as described in the dragon book [5]. Instead, determining which statements constitutes as variable/expression kill (write), or GENeration (use/read) is done in the same step as the calculation of the LIVEin and LIVEout sets.

For each block in the worklist, we traverse its CFGElements backwards and create a TransferFunctions object, which is a statement visitor under the hood.  For each element in this block that is a statement (CFGStmt), we visit the statement with said TranserFunctions. We always copy the liveness set from the previous statement to the next. Wiki reads,

"The transfer function of a statement is applied by making the variables that are written dead, then making the variables that are read live."[8]

b.) Calculating expression liveness for a single CFGBlock

Since we also calculate live expressions however, we have to also account for them in addition. If the statement is an Expr, it must also be the kill of that Expr (it is the only write of that expression, after all), so we remove it from the live expressions set. Moving on, we branch out -- depending on the type of the statement, we mark things like the condition of for/if/while/do statements live. 

We also employ a heuristic to generalize this task -- we mark all child expressions live as well. Demonstrating that on the Example d.):

* [B1.9] (Block 1, Element 9): Remove [B1.9] (as it is the CallExpr to foo) from [B1.9]'s liveness set, add its children, the arguments, [B1.5] and [B1.8], and the function to be called, [B1.2] (mind that this could be a function pointer as well!).
* [B1.8]: Remove [B1.8] (as it is the CallExpr to getInt2) from [B1.8]'s liveness set, add its child, [B1.7].
* [B1.7]: Remove [B1.7] (as it is also an Expr, an ImplicitCastExpr) from [B1.7]'s liveness set, add its child, [B1.6].
* [B1.6]: Remove [B1.6] (as it is a DeclRefExpr to getInt2) from [B1.6]'s liveness set, and since it has no more children, we're done here.
* [B1.5] -- [B1.3] same as [B1.8] --[B1.6].
* [B1.2]: Remove [B1.2] (as it is also an ImplicitCastExpr of function foo) from [B1.7]'s liveness set, add its child, [B1.1].
* [B1.1]: Remove [B1.1] (as it is the DeclRefExpr to function foo) from [B1.1]'s liveness set, and we're done.

This leaves the following live expression sets (mind that due to the linearity of the CFG, this is the final result):
* [B1.9]: [B1.8], [B1.5], [B1.2]
* [B1.8]: [B1.5], [B1.2], [B1.7]
* [B1.7]: [B1.5], [B1.2], [B1.6]
* [B1.6]: [B1.5], [B1.2]
* [B1.5]: [B1.2], [B1.4]
* [B1.4]: [B1.2], [B1.3],
* [B1.3]: [B1.2]
* [B1.2]: [B1.1]
* [B1.1]: {}

Indeed, this accurately represents the liveness of these expressions -- the function to be called, and both arguments must live until the end of the call expression.

c.) Calculating variable liveness for a single CFGBlock

Reads or usages of variables can be found in many contexts, namely when the statement visitor encounters a DeclRefExpr, DeclStmt, BlockExpr, CFGAutomaticObjDtor, or ObjCMessageExpr. What is different compared to expression live set calculation, are kills, or in other words, the removal of variables from liveness sets -- writes happen on either assignments or their definitions. Mind that not all assignments are kills -- operators such as += also read the variable.

In general, understanding live variables calculation follows the literature more closely, and in my experience, demands far less time to understand.

d.) Transfer in between blocks

The LIVEin set of a CFGBlock will always be the liveness set (both for variables and expressions) of the corresponding CFGElement no1. Next, as the algorithm states, we enqueue all predecessor nodes to the worklist. Processing the next block in the worklist starts with calculating its LIVEout set with the union of all liveness sets of successor blocks.

e.) Relaxed liveness analysis

LiveVariables presents the option to conduct a relaxed analysis, no longer regarding assignments as kills. This is desirable when we're not interested in a liveness of a value held by a variable, but rather the lifetime of a variable. As such, the analyzer uses relaxed liveness analysis when inquiring about variables for the most part with the exception of DeadStoresChecker. Even debug.DumpLiveStms dumps relaxed sets, but debug.DumpLiveVars uses the non-relaxed version.

d.) Miscellaneous notes

Live bindings[9] work very similarly to variables, but have their own distinct liveness set.

There is currently a hack in the analyzer where we assign value to a statement, and thus Liveness calculates live statements, not expressions [2], https://reviews.llvm.org/D82598#2171312.

There is a corner case for CFGAutomaticObjDtor, which isn't a CFGStmt, but is still handled, as it marks its associated variable live.

A notable bug when we heuristic of adding all child expressions (or, as of writing this mail, still statements) didn't work is described in [3].

---=== 3. How the static analyzer uses it ===---

I'll detail this in a followup mail when and if I get to it. I guess the most interesting and non-trivial usage of liveness is found in DeadStoresChecker, which is definitely a good learning example of dataflow analysis being incorporated into symbolic execution.

If that mail doesn't happen soon enough, [1], [4], [7] provide great further reading on a related problem, zombie symbols, which is strongly related to liveness.

Have a great one!
Husi

[2] [analyzer][Liveness][NFC] Get rid of statement liveness, because such a thing doesn't exist https://reviews.llvm.org/D82598
[3] [analyzer] LiveVariables: Fix a zombie expression problem, add testing infrastructure. https://reviews.llvm.org/D55566
[4] [analyzer] Fix the "Zombie symbols" issue. https://reviews.llvm.org/D18860
[5] Aho, A., Lam, M., Sethi, R., & Ullman, J. (2007). Compilers: Principles, Techniques and Tools, 2nd Edition.
[6] Live variable analysis, Wikipedia, second example https://en.wikipedia.org/wiki/Live_variable_analysis#Second_example
[7] Dergachev, A. (2016). Clang Static Analyzer: A Checker Developer’s Guide.(2016).
[8] Live variable analysis, Wikipedia, second example https://en.wikipedia.org/wiki/Live_variable_analysis

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev