

Hi!
This mail will take a look at LiveVariables, which in reality is a live values analysis implemented by clang. Written about 2 months after the clang/ directory itself was created, it plays a crucial rule in the Clang Static Analyzer to this day. Despite that, I managed to find only 2 functional changes to it in the last 8 years, so we generally consider it to be stable.
As I wanted to merge the implementations of Liveness and UninitializedVariables, I ended up struggling a lot to comprehend the former, so this serves as a documentation for myself but written for anyone, and I welcome anyone to find errors in my understanding.
=== 1. What is liveness analysis? ===
Suppose statement i assigns a value to x . If statement j has x as an operand, and control can flow from statement i to j along a path that has no intervening assignments to x , then we say statement j uses the value of x computed at statement i. We further say that x is live at statement i. [5]
This can be used to spot, for example, to spot dead stores (storing a value into a variable, but not using it anywhere).
In practice, similarly to other data flow algorithms, we calculate an instate and outstate for each basic block in a control flow graph. The instate of a block is the set of variables that are live at the start of the block. Its outstate is the set of variables that are live at the end of it. The outstate is the union of the instates of the block's successors. [8]
We can extend the above described live variable analysis to include values, this is the so called live expressions analysis [2]. Expressions live only within the containing statement  for instance, argument expressions in a call expressions are live until the end of the call expression itself. If the expression isn't immediately read, it does not live in any context.
=== 2. Examples ===
a.) The following example shows the LIVEin and LIVEout sets (after, not during calculation): [6]
// in: {} b1: a = 3; b = 5; d = 4; x = 100; //x is never being used later thus not in the out set {a,b,d} if a > b then // out: {a,b,d} //union of all (in) successors of b1 => b2: {a,b}, and b3:{b,d}
// in: {a,b} b2: c = a + b; d = 2; // out: {b,d}
// in: {b,d} b3: endif c = 4; return b * d + c; // out:{}
b.) Multiple kills of a variable in a single block:
int a = 5; // 'a' is written, can liveness tell whether 'a' is live? a = 6; // 'a' is written use(6); // 'a' is used
c.) Statement that spans multiple blocks:
x ? y : z
d.) Expression liveness example:
int getInt(); int getInt2(); void foo(int, int); void test() { foo(getInt(), getInt2()); }
Contents of test()'s main CFGBlock:
1: foo 2: [B1.1] (ImplicitCastExpr, FunctionToPointerDecay, void (*)(int, int)) 3: getInt 4: [B1.3] (ImplicitCastExpr, FunctionToPointerDecay, int (*)(void)) 5: [B1.4]() 6: getInt2 7: [B1.6] (ImplicitCastExpr, FunctionToPointerDecay, int (*)(void)) 8: [B1.7]() 9: [B1.2]([B1.5], [B1.8])
Note how, as one would expect, the order of evaluation can be clearly seen in the elements of the block on a rather fine level.
The relevant (and somewhat simplified) AST:
`CallExpr 'void' ImplicitCastExpr 'void (*)(int, int)'  `DeclRefExpr 'foo' 'void (int, int)' CallExpr 'int'  `ImplicitCastExpr 'int (*)()'  `DeclRefExpr 'getInt' 'int ()' `CallExpr 'int' `ImplicitCastExpr 'int (*)()' `DeclRefExpr 'getInt2' 'int ()'
=== 3. How it works in Clang ===
a.) General
Since liveness analysis is a backwards may analysis [8], each CFGBlock in the CFG is visited with a worklist that always enqueues predecessors. It runs a fixpoint algorithm, calculating the set of in and outgoing live expressions/variables until those sets change no more.
Calculating liveness sets for only blocks isn't granular enough, as a single block may contain multiple kills (example b.). This demands that these LIVEin and LIVEout sets are calculated for each statement. However, since a statement can span across multiple blocks, blocklevel liveness needs to be calculated as well (example c.).
Contrary to the Wikipedia algorithm [8] describes, there are no established GEN and KILL sets, nor similar constructs like a symbol table as described in the dragon book [5]. Instead, determining which statements constitutes as variable/expression kill (write), or GENeration (use/read) is done in the same step as the calculation of the LIVEin and LIVEout sets.
For each block in the worklist, we traverse its CFGElements backwards and create a TransferFunctions object, which is a statement visitor under the hood. For each element in this block that is a statement (CFGStmt), we visit the statement with said TranserFunctions. We always copy the liveness set from the previous statement to the next. Wiki reads,
"The transfer function of a statement is applied by making the variables that are written dead, then making the variables that are read live."[8]
b.) Calculating expression liveness for a single CFGBlock
Since we also calculate live expressions however, we have to also account for them in addition. If the statement is an Expr, it must also be the kill of that Expr (it is the only write of that expression, after all), so we remove it from the live expressions set. Moving on, we branch out  depending on the type of the statement, we mark things like the condition of for/if/while/do statements live.
We also employ a heuristic to generalize this task  we mark all child expressions live as well. Demonstrating that on the Example d.):
* [B1.9] (Block 1, Element 9): Remove [B1.9] (as it is the CallExpr to foo) from [B1.9]'s liveness set, add its children, the arguments, [B1.5] and [B1.8], and the function to be called, [B1.2] (mind that this could be a function pointer as well!). * [B1.8]: Remove [B1.8] (as it is the CallExpr to getInt2) from [B1.8]'s liveness set, add its child, [B1.7]. * [B1.7]: Remove [B1.7] (as it is also an Expr, an ImplicitCastExpr) from [B1.7]'s liveness set, add its child, [B1.6]. * [B1.6]: Remove [B1.6] (as it is a DeclRefExpr to getInt2) from [B1.6]'s liveness set, and since it has no more children, we're done here. * [B1.5]  [B1.3] same as [B1.8] [B1.6]. * [B1.2]: Remove [B1.2] (as it is also an ImplicitCastExpr of function foo) from [B1.7]'s liveness set, add its child, [B1.1]. * [B1.1]: Remove [B1.1] (as it is the DeclRefExpr to function foo) from [B1.1]'s liveness set, and we're done.
This leaves the following live expression sets (mind that due to the linearity of the CFG, this is the final result): * [B1.9]: [B1.8], [B1.5], [B1.2] * [B1.8]: [B1.5], [B1.2], [B1.7] * [B1.7]: [B1.5], [B1.2], [B1.6] * [B1.6]: [B1.5], [B1.2] * [B1.5]: [B1.2], [B1.4] * [B1.4]: [B1.2], [B1.3], * [B1.3]: [B1.2] * [B1.2]: [B1.1] * [B1.1]: {}
Indeed, this accurately represents the liveness of these expressions  the function to be called, and both arguments must live until the end of the call expression.
c.) Calculating variable liveness for a single CFGBlock
Reads or usages of variables can be found in many contexts, namely when the statement visitor encounters a DeclRefExpr, DeclStmt, BlockExpr, CFGAutomaticObjDtor, or ObjCMessageExpr. What is different compared to expression live set calculation, are kills, or in other words, the removal of variables from liveness sets  writes happen on either assignments or their definitions. Mind that not all assignments are kills  operators such as += also read the variable.
In general, understanding live variables calculation follows the literature more closely, and in my experience, demands far less time to understand.
d.) Transfer in between blocks
The LIVEin set of a CFGBlock will always be the liveness set (both for variables and expressions) of the corresponding CFGElement no1. Next, as the algorithm states, we enqueue all predecessor nodes to the worklist. Processing the next block in the worklist starts with calculating its LIVEout set with the union of all liveness sets of successor blocks.
e.) Relaxed liveness analysis
LiveVariables presents the option to conduct a relaxed analysis, no longer regarding assignments as kills. This is desirable when we're not interested in a liveness of a value held by a variable, but rather the lifetime of a variable. As such, the analyzer uses relaxed liveness analysis when inquiring about variables for the most part with the exception of DeadStoresChecker. Even debug.DumpLiveStms dumps relaxed sets, but debug.DumpLiveVars uses the nonrelaxed version.
d.) Miscellaneous notes
Live bindings[9] work very similarly to variables, but have their own distinct liveness set.
There is a corner case for CFGAutomaticObjDtor, which isn't a CFGStmt, but is still handled, as it marks its associated variable live.
A notable bug when we heuristic of adding all child expressions (or, as of writing this mail, still statements) didn't work is described in [3].
=== 3. How the static analyzer uses it ===
I'll detail this in a followup mail when and if I get to it. I guess the most interesting and nontrivial usage of liveness is found in DeadStoresChecker, which is definitely a good learning example of dataflow analysis being incorporated into symbolic execution.
If that mail doesn't happen soon enough, [1], [4], [7] provide great further reading on a related problem, zombie symbols, which is strongly related to liveness.
Have a great one! Husi
[5] Aho, A., Lam, M., Sethi, R., & Ullman, J. (2007). Compilers: Principles, Techniques and Tools, 2nd Edition. [7] Dergachev, A. (2016). Clang Static Analyzer: A Checker Developer’s Guide.(2016).
_______________________________________________
cfedev mailing list
[hidden email]
https://lists.llvm.org/cgibin/mailman/listinfo/cfedev


Kristóf,
This is really useful for ClangSA developers. I am wondering if this should be placed somewhere in the official documentation?
> === 3. How the static analyzer uses it === > I'll detail this in a followup mail when and if I get to it.
Can't wait to read that! Perhaps that should be part of the above mentioned docs too :)
Gábor Hi!
This mail will take a look at LiveVariables, which in reality is a live values analysis implemented by clang. Written about 2 months after the clang/ directory itself was created, it plays a crucial rule in the Clang Static Analyzer to this day. Despite that, I managed to find only 2 functional changes to it in the last 8 years, so we generally consider it to be stable.
As I wanted to merge the implementations of Liveness and UninitializedVariables, I ended up struggling a lot to comprehend the former, so this serves as a documentation for myself but written for anyone, and I welcome anyone to find errors in my understanding.
=== 1. What is liveness analysis? ===
Suppose statement i assigns a value to x . If statement j has x as an operand, and control can flow from statement i to j along a path that has no intervening assignments to x , then we say statement j uses the value of x computed at statement i. We further say that x is live at statement i. [5]
This can be used to spot, for example, to spot dead stores (storing a value into a variable, but not using it anywhere).
In practice, similarly to other data flow algorithms, we calculate an instate and outstate for each basic block in a control flow graph. The instate of a block is the set of variables that are live at the start of the block. Its outstate is the set of variables that are live at the end of it. The outstate is the union of the instates of the block's successors. [8]
We can extend the above described live variable analysis to include values, this is the so called live expressions analysis [2]. Expressions live only within the containing statement  for instance, argument expressions in a call expressions are live until the end of the call expression itself. If the expression isn't immediately read, it does not live in any context.
=== 2. Examples ===
a.) The following example shows the LIVEin and LIVEout sets (after, not during calculation): [6]
// in: {} b1: a = 3; b = 5; d = 4; x = 100; //x is never being used later thus not in the out set {a,b,d} if a > b then // out: {a,b,d} //union of all (in) successors of b1 => b2: {a,b}, and b3:{b,d}
// in: {a,b} b2: c = a + b; d = 2; // out: {b,d}
// in: {b,d} b3: endif c = 4; return b * d + c; // out:{}
b.) Multiple kills of a variable in a single block:
int a = 5; // 'a' is written, can liveness tell whether 'a' is live? a = 6; // 'a' is written use(6); // 'a' is used
c.) Statement that spans multiple blocks:
x ? y : z
d.) Expression liveness example:
int getInt(); int getInt2(); void foo(int, int); void test() { foo(getInt(), getInt2()); }
Contents of test()'s main CFGBlock:
1: foo 2: [B1.1] (ImplicitCastExpr, FunctionToPointerDecay, void (*)(int, int)) 3: getInt 4: [B1.3] (ImplicitCastExpr, FunctionToPointerDecay, int (*)(void)) 5: [B1.4]() 6: getInt2 7: [B1.6] (ImplicitCastExpr, FunctionToPointerDecay, int (*)(void)) 8: [B1.7]() 9: [B1.2]([B1.5], [B1.8])
Note how, as one would expect, the order of evaluation can be clearly seen in the elements of the block on a rather fine level.
The relevant (and somewhat simplified) AST:
`CallExpr 'void' ImplicitCastExpr 'void (*)(int, int)'  `DeclRefExpr 'foo' 'void (int, int)' CallExpr 'int'  `ImplicitCastExpr 'int (*)()'  `DeclRefExpr 'getInt' 'int ()' `CallExpr 'int' `ImplicitCastExpr 'int (*)()' `DeclRefExpr 'getInt2' 'int ()'
=== 3. How it works in Clang ===
a.) General
Since liveness analysis is a backwards may analysis [8], each CFGBlock in the CFG is visited with a worklist that always enqueues predecessors. It runs a fixpoint algorithm, calculating the set of in and outgoing live expressions/variables until those sets change no more.
Calculating liveness sets for only blocks isn't granular enough, as a single block may contain multiple kills (example b.). This demands that these LIVEin and LIVEout sets are calculated for each statement. However, since a statement can span across multiple blocks, blocklevel liveness needs to be calculated as well (example c.).
Contrary to the Wikipedia algorithm [8] describes, there are no established GEN and KILL sets, nor similar constructs like a symbol table as described in the dragon book [5]. Instead, determining which statements constitutes as variable/expression kill (write), or GENeration (use/read) is done in the same step as the calculation of the LIVEin and LIVEout sets.
For each block in the worklist, we traverse its CFGElements backwards and create a TransferFunctions object, which is a statement visitor under the hood. For each element in this block that is a statement (CFGStmt), we visit the statement with said TranserFunctions. We always copy the liveness set from the previous statement to the next. Wiki reads,
"The transfer function of a statement is applied by making the variables that are written dead, then making the variables that are read live."[8]
b.) Calculating expression liveness for a single CFGBlock
Since we also calculate live expressions however, we have to also account for them in addition. If the statement is an Expr, it must also be the kill of that Expr (it is the only write of that expression, after all), so we remove it from the live expressions set. Moving on, we branch out  depending on the type of the statement, we mark things like the condition of for/if/while/do statements live.
We also employ a heuristic to generalize this task  we mark all child expressions live as well. Demonstrating that on the Example d.):
* [B1.9] (Block 1, Element 9): Remove [B1.9] (as it is the CallExpr to foo) from [B1.9]'s liveness set, add its children, the arguments, [B1.5] and [B1.8], and the function to be called, [B1.2] (mind that this could be a function pointer as well!). * [B1.8]: Remove [B1.8] (as it is the CallExpr to getInt2) from [B1.8]'s liveness set, add its child, [B1.7]. * [B1.7]: Remove [B1.7] (as it is also an Expr, an ImplicitCastExpr) from [B1.7]'s liveness set, add its child, [B1.6]. * [B1.6]: Remove [B1.6] (as it is a DeclRefExpr to getInt2) from [B1.6]'s liveness set, and since it has no more children, we're done here. * [B1.5]  [B1.3] same as [B1.8] [B1.6]. * [B1.2]: Remove [B1.2] (as it is also an ImplicitCastExpr of function foo) from [B1.7]'s liveness set, add its child, [B1.1]. * [B1.1]: Remove [B1.1] (as it is the DeclRefExpr to function foo) from [B1.1]'s liveness set, and we're done.
This leaves the following live expression sets (mind that due to the linearity of the CFG, this is the final result): * [B1.9]: [B1.8], [B1.5], [B1.2] * [B1.8]: [B1.5], [B1.2], [B1.7] * [B1.7]: [B1.5], [B1.2], [B1.6] * [B1.6]: [B1.5], [B1.2] * [B1.5]: [B1.2], [B1.4] * [B1.4]: [B1.2], [B1.3], * [B1.3]: [B1.2] * [B1.2]: [B1.1] * [B1.1]: {}
Indeed, this accurately represents the liveness of these expressions  the function to be called, and both arguments must live until the end of the call expression.
c.) Calculating variable liveness for a single CFGBlock
Reads or usages of variables can be found in many contexts, namely when the statement visitor encounters a DeclRefExpr, DeclStmt, BlockExpr, CFGAutomaticObjDtor, or ObjCMessageExpr. What is different compared to expression live set calculation, are kills, or in other words, the removal of variables from liveness sets  writes happen on either assignments or their definitions. Mind that not all assignments are kills  operators such as += also read the variable.
In general, understanding live variables calculation follows the literature more closely, and in my experience, demands far less time to understand.
d.) Transfer in between blocks
The LIVEin set of a CFGBlock will always be the liveness set (both for variables and expressions) of the corresponding CFGElement no1. Next, as the algorithm states, we enqueue all predecessor nodes to the worklist. Processing the next block in the worklist starts with calculating its LIVEout set with the union of all liveness sets of successor blocks.
e.) Relaxed liveness analysis
LiveVariables presents the option to conduct a relaxed analysis, no longer regarding assignments as kills. This is desirable when we're not interested in a liveness of a value held by a variable, but rather the lifetime of a variable. As such, the analyzer uses relaxed liveness analysis when inquiring about variables for the most part with the exception of DeadStoresChecker. Even debug.DumpLiveStms dumps relaxed sets, but debug.DumpLiveVars uses the nonrelaxed version.
d.) Miscellaneous notes
Live bindings[9] work very similarly to variables, but have their own distinct liveness set.
There is a corner case for CFGAutomaticObjDtor, which isn't a CFGStmt, but is still handled, as it marks its associated variable live.
A notable bug when we heuristic of adding all child expressions (or, as of writing this mail, still statements) didn't work is described in [3].
=== 3. How the static analyzer uses it ===
I'll detail this in a followup mail when and if I get to it. I guess the most interesting and nontrivial usage of liveness is found in DeadStoresChecker, which is definitely a good learning example of dataflow analysis being incorporated into symbolic execution.
If that mail doesn't happen soon enough, [1], [4], [7] provide great further reading on a related problem, zombie symbols, which is strongly related to liveness.
Have a great one! Husi
[5] Aho, A., Lam, M., Sethi, R., & Ullman, J. (2007). Compilers: Principles, Techniques and Tools, 2nd Edition. [7] Dergachev, A. (2016). Clang Static Analyzer: A Checker Developer’s Guide.(2016).
_______________________________________________
cfedev mailing list
[hidden email]
https://lists.llvm.org/cgibin/mailman/listinfo/cfedev

