probable bug in clang source based coverage

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

probable bug in clang source based coverage

Boris Kolpackov via cfe-dev
All,

I am parsing llvm-cov exports (json files) and I was surprised by some
of the outputs.
I was able to narrow it to the below example:

the source file:

int main (int argc, char* argv[])
{
   if ((argc > 1) && (argc < 10))
   {
   }
}

the output:
https://gist.github.com/meahow/84f39ed625d138b42ee1b9cde7613b6a

The confusing part is in segments section:

"segments": [
   [2,1,1,1,1],
   [3,8,1,1,1],
   [3,22,0,1,1],
   [3,33,1,1,0],
   [4,4,0,1,1],
   [5,5,1,1,0],
   [6,2,0,0,0]
],


It looks like there is unclosed region produced.
There are two opening marks in line 3 (cols 8 and 22) and one closing.
I believe that there should be another closing one in col 17.

It looks like this behavior is easily reproducible with multi
statement conditions in if expression like this:

if ( (a > 1) && (a < 5))
{}

Can someone confirm that this is a bug or if I'm not getting it right ?


regards,
Michał Pszona
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: probable bug in clang source based coverage

Boris Kolpackov via cfe-dev
I have also found very similar output for switch..case and if..else if
expressions.
regards,
Michał Pszona


On Mon, Mar 20, 2017 at 1:37 PM, Michał Pszona <[hidden email]> wrote:

> All,
>
> I am parsing llvm-cov exports (json files) and I was surprised by some
> of the outputs.
> I was able to narrow it to the below example:
>
> the source file:
>
> int main (int argc, char* argv[])
> {
>    if ((argc > 1) && (argc < 10))
>    {
>    }
> }
>
> the output:
> https://gist.github.com/meahow/84f39ed625d138b42ee1b9cde7613b6a
>
> The confusing part is in segments section:
>
> "segments": [
>    [2,1,1,1,1],
>    [3,8,1,1,1],
>    [3,22,0,1,1],
>    [3,33,1,1,0],
>    [4,4,0,1,1],
>    [5,5,1,1,0],
>    [6,2,0,0,0]
> ],
>
>
> It looks like there is unclosed region produced.
> There are two opening marks in line 3 (cols 8 and 22) and one closing.
> I believe that there should be another closing one in col 17.
>
> It looks like this behavior is easily reproducible with multi
> statement conditions in if expression like this:
>
> if ( (a > 1) && (a < 5))
> {}
>
> Can someone confirm that this is a bug or if I'm not getting it right ?
>
>
> regards,
> Michał Pszona
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: probable bug in clang source based coverage

Boris Kolpackov via cfe-dev
In reply to this post by Boris Kolpackov via cfe-dev
The '&&' in your example belongs to the source region corresponding to the if condition. It is always 'reached' before '(argc > 1)' is reached, so the segment contains 'argc > 1) && ('. A new segment is pushed for 'argc < 10', because it isn't always reached when the '&&' is reached.

This is why the current implementation does not emit a new segment just for the ' && ' snippet: it would be redundant.

Also, there are no 'closing' segments, only new ones. Maybe this is just a terminology mix up.

Let me know if the segments should be made more precise in any other way. If you have access to bugs.llvm.org, I'd appreciate any bug reports there (please add llvm-cov to the title).

best,
vedant

> On Mar 20, 2017, at 5:37 AM, Michał Pszona via cfe-dev <[hidden email]> wrote:
>
> All,
>
> I am parsing llvm-cov exports (json files) and I was surprised by some
> of the outputs.
> I was able to narrow it to the below example:
>
> the source file:
>
> int main (int argc, char* argv[])
> {
>   if ((argc > 1) && (argc < 10))
>   {
>   }
> }
>
> the output:
> https://gist.github.com/meahow/84f39ed625d138b42ee1b9cde7613b6a
>
> The confusing part is in segments section:
>
> "segments": [
>   [2,1,1,1,1],
>   [3,8,1,1,1],
>   [3,22,0,1,1],
>   [3,33,1,1,0],
>   [4,4,0,1,1],
>   [5,5,1,1,0],
>   [6,2,0,0,0]
> ],
>
>
> It looks like there is unclosed region produced.
> There are two opening marks in line 3 (cols 8 and 22) and one closing.
> I believe that there should be another closing one in col 17.
>
> It looks like this behavior is easily reproducible with multi
> statement conditions in if expression like this:
>
> if ( (a > 1) && (a < 5))
> {}
>
> Can someone confirm that this is a bug or if I'm not getting it right ?
>
>
> regards,
> Michał Pszona
> _______________________________________________
> cfe-dev mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: probable bug in clang source based coverage

Boris Kolpackov via cfe-dev
Vedant,
Thanks for your reply.

I would submit a bug, if I would be sure this actually is a bug :)

So let me clear my understanding of the segments.
(This is based solely on this use of Segment type here
https://reviews.llvm.org/diffusion/L/browse/llvm/trunk/tools/llvm-cov/CoverageExporterJson.cpp;298288$285
and llvm mapping format doc:
http://llvm.org/docs/CoverageMappingFormat.html
If you could point me to a better/more precise documentation, it would
be much appreciated)

In the file above, there are 2 segments that are clear:
[start_line, start_col => end_line, end_col]
[2,1 => 6, 2] and [4, 4 => 5, 5]
covering both the function and the if's body scopes.
This leaves us with the following list:
   [3,8,1,1,1],
   [3,22,0,1,1],
   [3,33,1,1,0]

My understanding is, that every segment, that has IsRegionEntry set
should be a region opening, whereas if it's unset, it is closing the
last open most-inner region, so:
   [3,8,1,1,1] means we "open" new region at position 3,8 and
   [3,22,0,1,1] means we "open" a nested region at position 3,22.
Now
   [3,33,1,1,0] would mean that at position 3,33 we are closing the
last open region, that is, the one open at 3,22.
but in this case we are missing the "closing mark" for region open at
position 3,8.

If the above is incorrect, please let me know how could I parse the
segments from json file to have source code properly annotated with
regions.
So far I am using a recursive function keeping track of the last
"unclosed" segment at the current level and recursing whenever there
is another one opening with the last one not closed.


I hope this is clear enough :)
regards,
Michał Pszona


On Mon, Mar 20, 2017 at 7:11 PM, Vedant Kumar <[hidden email]> wrote:

> The '&&' in your example belongs to the source region corresponding to the if condition. It is always 'reached' before '(argc > 1)' is reached, so the segment contains 'argc > 1) && ('. A new segment is pushed for 'argc < 10', because it isn't always reached when the '&&' is reached.
>
> This is why the current implementation does not emit a new segment just for the ' && ' snippet: it would be redundant.
>
> Also, there are no 'closing' segments, only new ones. Maybe this is just a terminology mix up.
>
> Let me know if the segments should be made more precise in any other way. If you have access to bugs.llvm.org, I'd appreciate any bug reports there (please add llvm-cov to the title).
>
> best,
> vedant
>
>> On Mar 20, 2017, at 5:37 AM, Michał Pszona via cfe-dev <[hidden email]> wrote:
>>
>> All,
>>
>> I am parsing llvm-cov exports (json files) and I was surprised by some
>> of the outputs.
>> I was able to narrow it to the below example:
>>
>> the source file:
>>
>> int main (int argc, char* argv[])
>> {
>>   if ((argc > 1) && (argc < 10))
>>   {
>>   }
>> }
>>
>> the output:
>> https://gist.github.com/meahow/84f39ed625d138b42ee1b9cde7613b6a
>>
>> The confusing part is in segments section:
>>
>> "segments": [
>>   [2,1,1,1,1],
>>   [3,8,1,1,1],
>>   [3,22,0,1,1],
>>   [3,33,1,1,0],
>>   [4,4,0,1,1],
>>   [5,5,1,1,0],
>>   [6,2,0,0,0]
>> ],
>>
>>
>> It looks like there is unclosed region produced.
>> There are two opening marks in line 3 (cols 8 and 22) and one closing.
>> I believe that there should be another closing one in col 17.
>>
>> It looks like this behavior is easily reproducible with multi
>> statement conditions in if expression like this:
>>
>> if ( (a > 1) && (a < 5))
>> {}
>>
>> Can someone confirm that this is a bug or if I'm not getting it right ?
>>
>>
>> regards,
>> Michał Pszona
>> _______________________________________________
>> cfe-dev mailing list
>> [hidden email]
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: probable bug in clang source based coverage

Boris Kolpackov via cfe-dev
Hi,


> In the file above, there are 2 segments that are clear:
> [start_line, start_col => end_line, end_col]
> [2,1 => 6, 2] and [4, 4 => 5, 5]
> covering both the function and the if's body scopes.
> This leaves us with the following list:
>   [3,8,1,1,1],
>   [3,22,0,1,1],
>   [3,33,1,1,0]
>
> My understanding is, that every segment, that has IsRegionEntry set
> should be a region opening, whereas if it's unset, it is closing the
> last open most-inner region, so:
>   [3,8,1,1,1] means we "open" new region at position 3,8 and
>   [3,22,0,1,1] means we "open" a nested region at position 3,22.

The `IsRegionEntry` field is set when a segment corresponds to a source region with a fresh profiling counter. Otherwise it is not set. I don't think it will be helpful to think of segments as "closing" a region.

Some examples:

- We need a fresh profiling counter for "argc > 1", because its execution count cannot be determined any other way. So `IsRegionEntry` is set.
- The same for "argc < 10".
- The final segment on line 3 is different, and does not have `IsRegionEntry` set. That's because its execution count can be derived from existing counters.

The `IsRegionEntry` field is just used as a hint by llvm-cov to compute better line execution counts, and to determine when it is appropriate to create tooltips for regions within a line.


> Now
>   [3,33,1,1,0] would mean that at position 3,33 we are closing the
> last open region, that is, the one open at 3,22.
> but in this case we are missing the "closing mark" for region open at
> position 3,8.

If you choose to think of segments as 'closing' a region, consider that one may 'close' multiple regions.


> If the above is incorrect, please let me know how could I parse the
> segments from json file to have source code properly annotated with
> regions.

The design goal for the segment information was to "do all the work" of figuring out which execution counts correspond to which snippets of text. If you look at the SourceCoverageViewHTML::renderLine function in llvm-cov, you can see an example of how this is done (hopefully the comments help).

What kind of source annotations are you interested in? If it's similar to something llvm-cov does, I can provide more details about how it works.


> So far I am using a recursive function keeping track of the last
> "unclosed" segment at the current level and recursing whenever there
> is another one opening with the last one not closed.

It sounds like you may be running into issues with the lack of explicit "end-of-segment" information in the CoverageSegment data structure. That information is implicit in the order in which the segments are presented.

I suggest breaking each line into snippets of text, where each snippet corresponds to exactly one CoverageSegment. This is what SourceCoverageViewHTML::renderLine does.


> I hope this is clear enough :)

Thanks for your interest! Please keep us updated on how your tool develops.

vedant

> regards,
> Michał Pszona
>
>
> On Mon, Mar 20, 2017 at 7:11 PM, Vedant Kumar <[hidden email]> wrote:
>> The '&&' in your example belongs to the source region corresponding to the if condition. It is always 'reached' before '(argc > 1)' is reached, so the segment contains 'argc > 1) && ('. A new segment is pushed for 'argc < 10', because it isn't always reached when the '&&' is reached.
>>
>> This is why the current implementation does not emit a new segment just for the ' && ' snippet: it would be redundant.
>>
>> Also, there are no 'closing' segments, only new ones. Maybe this is just a terminology mix up.
>>
>> Let me know if the segments should be made more precise in any other way. If you have access to bugs.llvm.org, I'd appreciate any bug reports there (please add llvm-cov to the title).
>>
>> best,
>> vedant
>>
>>> On Mar 20, 2017, at 5:37 AM, Michał Pszona via cfe-dev <[hidden email]> wrote:
>>>
>>> All,
>>>
>>> I am parsing llvm-cov exports (json files) and I was surprised by some
>>> of the outputs.
>>> I was able to narrow it to the below example:
>>>
>>> the source file:
>>>
>>> int main (int argc, char* argv[])
>>> {
>>>  if ((argc > 1) && (argc < 10))
>>>  {
>>>  }
>>> }
>>>
>>> the output:
>>> https://gist.github.com/meahow/84f39ed625d138b42ee1b9cde7613b6a
>>>
>>> The confusing part is in segments section:
>>>
>>> "segments": [
>>>  [2,1,1,1,1],
>>>  [3,8,1,1,1],
>>>  [3,22,0,1,1],
>>>  [3,33,1,1,0],
>>>  [4,4,0,1,1],
>>>  [5,5,1,1,0],
>>>  [6,2,0,0,0]
>>> ],
>>>
>>>
>>> It looks like there is unclosed region produced.
>>> There are two opening marks in line 3 (cols 8 and 22) and one closing.
>>> I believe that there should be another closing one in col 17.
>>>
>>> It looks like this behavior is easily reproducible with multi
>>> statement conditions in if expression like this:
>>>
>>> if ( (a > 1) && (a < 5))
>>> {}
>>>
>>> Can someone confirm that this is a bug or if I'm not getting it right ?
>>>
>>>
>>> regards,
>>> Michał Pszona
>>> _______________________________________________
>>> cfe-dev mailing list
>>> [hidden email]
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: probable bug in clang source based coverage

Boris Kolpackov via cfe-dev
Vedant,

First of all, thank you so much for your help!
It took me some time to wrap my mind around this and actually
realized, that the information in the segments array is more verbose
than I was expecting :)
I was assuming this something similar to what regions array in the
function entry is.

>
> The `IsRegionEntry` field is just used as a hint by llvm-cov to compute better line execution counts, and to determine when it is appropriate to create tooltips for regions within a line.
>

This might be confusing as my tool can't figure out which region
should the counters be inherited from.

>
> If you choose to think of segments as 'closing' a region, consider that one may 'close' multiple regions.
>

Again, I can't be sure which segments are being closed and which
should still be open.

>
> What kind of source annotations are you interested in? If it's similar to something llvm-cov does, I can provide more details about how it works.
>

Yes, it is basically the same job as SourceCoverageViewHTML class
does, but it operates on the exported json data.
The tool I'm developing is aggregating the data from llvm-cov's export
command from different executables and runs.
Thus getting summary of all the files used in several binaries, but on
one codebase.
Maybe there is a better way to do this or you know any existing tool
that can do it for me ?
As far as I was able to figure out, it would be a bit problematic to
add it to llvm-cov because of the way the coverage data is defined.

So the way I understand it so far is as follows:
The positions in the segments array are actually markers for the code
"colorization" (based on hit count)
Wherever a new marker is set, it is valid for the rest of the line and
all the lines until a new marker is set.
The code can be marked as (based only on the current marker):
   - covered when the hasCount field is set and Count is greater than 0.
   - uncovered when hasCount field is set but count is 0
   - skipped/uninstrumented otherwise (hasCount should be 0)
As far as I can see, the IsRegionCount field is only used to determine
if you need to colorize multiple regions in one line and produce a
tooltip.


Can you confirm this is correct?


regards,
Michał Pszona
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: probable bug in clang source based coverage

Boris Kolpackov via cfe-dev

> On Mar 23, 2017, at 8:35 AM, Michał Pszona <[hidden email]> wrote:
>
> Vedant,
>
> First of all, thank you so much for your help!
> It took me some time to wrap my mind around this and actually
> realized, that the information in the segments array is more verbose
> than I was expecting :)
> I was assuming this something similar to what regions array in the
> function entry is.
>
>>
>> The `IsRegionEntry` field is just used as a hint by llvm-cov to compute better line execution counts, and to determine when it is appropriate to create tooltips for regions within a line.
>>
>
> This might be confusing as my tool can't figure out which region
> should the counters be inherited from.

llvm-cov doesn't try to evaluate counter expressions either.

Let me frame the problem: there may be multiple profiling regions that start on a given line, and we want to find the "most interesting one", that can be used to get the best line execution count.

`IsRegionEntry` provides a heuristic for deciding which region counts are interesting.


>> What kind of source annotations are you interested in? If it's similar to something llvm-cov does, I can provide more details about how it works.
>>
>
> Yes, it is basically the same job as SourceCoverageViewHTML class
> does, but it operates on the exported json data.
> The tool I'm developing is aggregating the data from llvm-cov's export
> command from different executables and runs.

If you want to aggregate coverage reports from multiple executables, you can use llvm-cov's "-object" flag to pass in multiple binaries.

You may be interested in this helper script, which has an example usage of the "-object" option:

  https://github.com/llvm-mirror/llvm/blob/master/utils/prepare-code-coverage-artifact.py


> So the way I understand it so far is as follows:
> The positions in the segments array are actually markers for the code
> "colorization" (based on hit count)
> Wherever a new marker is set, it is valid for the rest of the line and
> all the lines until a new marker is set.
> The code can be marked as (based only on the current marker):
>   - covered when the hasCount field is set and Count is greater than 0.
>   - uncovered when hasCount field is set but count is 0
>   - skipped/uninstrumented otherwise (hasCount should be 0)
> As far as I can see, the IsRegionCount field is only used to determine
> if you need to colorize multiple regions in one line and produce a
> tooltip.
>
> Can you confirm this is correct?

This looks correct.

vedant
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev