RFC: Remove uninteresting debug locations at -O0

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

RFC: Remove uninteresting debug locations at -O0

Phil King via cfe-dev
Getting source location information right is tricky and all about finding a balance.
Recently, I was wondering why stepping through this contrived example

   1       struct Foo {
   2         Foo *getFoo() { return this; }
   3       };
   4      
   5       int main(int argc, char **argv) {
   6         Foo *foo = new Foo();
   7         foo->getFoo()->getFoo();
   8         return 0;
   9       }

LLDB was showing the column marker as

   7         foo->getFoo()->getFoo();
             ^^^

focussing on foo instead of at the method call getFoo() that I was expecting.

In LLVM IR, this code looks like

  %1 = load %struct.Foo*, %struct.Foo** %foo, align 8, !dbg !30
  %call1 = call %struct.Foo* @_ZN3Foo6getFooEv(%struct.Foo* %1), !dbg !31
  %call2 = call %struct.Foo* @_ZN3Foo6getFooEv(%struct.Foo* %call1), !dbg !32

or, in x86_64 assembler:

  .loc 1 7 3 is_stmt 1 ## column_info.cpp:7:3
  movq -24(%rbp), %rdi
  .loc 1 7 8 is_stmt 0 ## column_info.cpp:7:8
  callq __ZN3Foo6getFooEv
  .loc 1 7 18 ## column_info.cpp:7:18

The spurious (7:3) location is attached to an instruction that is the load of the variable from the stack slot, fused with moving that value into the register the ABI defines for $arg0.

I’m postulating that the source location of the LLVM IR load is uninteresting and perhaps even harmful. It is uninteresting, because at -O0, the location does not refer to explicit code that the user wrote and thus causes unintuitive stepping, and with optimizations enabled, there is a high likelihood that the entire instruction is going to be eliminated because of mem2reg, so the effect on profiling should be minimal. Since the load is from an alloca, it also cannot crash under normal operation. The location is harmful, because loads (at least on a CISC instruction set) are often fused with other instructions and having conflicting locations will cause both locations to be dropped when merged.

Based on all this I would most like to assign a form of “weak” source location to loads from allocas generated by the Clang frontend, that looses against any other source location when merged. The closest thing we have to this in LLVM IR is attaching no debug location. An instruction without a debug location either belongs to the function prologue, or will inherit whatever debug location the instruction before it has. In this particular case I think that no debug location is preferable over line 0 (which is how we usually denote compiler-generated code) because we don’t want the load instruction’s source location to erase any source location it may get merged with. One thing I need to check is what happens when an instruction without a location is the first instruction in a basic block. We may need to make an exception for that case.

To summarize, I’m proposing to delete all debug locations from instructions generated in the Clang frontend for loads from allocas that are holding source variables to improve the debug experience at -O0. This will have little effect on optimized code.

Let me know what you think!
-- adrian
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: RFC: Remove uninteresting debug locations at -O0

Phil King via cfe-dev


On Tue, Apr 28, 2020 at 1:57 PM Adrian Prantl <[hidden email]> wrote:
Getting source location information right is tricky and all about finding a balance.
Recently, I was wondering why stepping through this contrived example

   1       struct Foo {
   2         Foo *getFoo() { return this; }
   3       };
   4       
   5       int main(int argc, char **argv) {
   6         Foo *foo = new Foo();
   7         foo->getFoo()->getFoo();
   8         return 0;
   9       }

LLDB was showing the column marker as

   7         foo->getFoo()->getFoo();
             ^^^

focussing on foo instead of at the method call getFoo() that I was expecting.

In LLVM IR, this code looks like

  %1 = load %struct.Foo*, %struct.Foo** %foo, align 8, !dbg !30
  %call1 = call %struct.Foo* @_ZN3Foo6getFooEv(%struct.Foo* %1), !dbg !31
  %call2 = call %struct.Foo* @_ZN3Foo6getFooEv(%struct.Foo* %call1), !dbg !32

or, in x86_64 assembler:

  .loc 1 7 3 is_stmt 1 ## column_info.cpp:7:3
  movq -24(%rbp), %rdi
  .loc 1 7 8 is_stmt 0 ## column_info.cpp:7:8
  callq __ZN3Foo6getFooEv
  .loc 1 7 18 ## column_info.cpp:7:18

The spurious (7:3) location is attached to an instruction that is the load of the variable from the stack slot, fused with moving that value into the register the ABI defines for $arg0.

I’m postulating that the source location of the LLVM IR load is uninteresting and perhaps even harmful. It is uninteresting, because at -O0, the location does not refer to explicit code that the user wrote and thus causes unintuitive stepping, and with optimizations enabled, there is a high likelihood that the entire instruction is going to be eliminated because of mem2reg, so the effect on profiling should be minimal. Since the load is from an alloca, it also cannot crash under normal operation. The location is harmful, because loads (at least on a CISC instruction set) are often fused with other instructions and having conflicting locations will cause both locations to be dropped when merged.

Based on all this I would most like to assign a form of “weak” source location to loads from allocas generated by the Clang frontend, that looses against any other source location when merged. The closest thing we have to this in LLVM IR is attaching no debug location. An instruction without a debug location either belongs to the function prologue, or will inherit whatever debug location the instruction before it has. In this particular case I think that no debug location is preferable over line 0 (which is how we usually denote compiler-generated code) because we don’t want the load instruction’s source location to erase any source location it may get merged with. One thing I need to check is what happens when an instruction without a location is the first instruction in a basic block. We may need to make an exception for that case.

To summarize, I’m proposing to delete all debug locations from instructions generated in the Clang frontend for loads from allocas that are holding source variables to improve the debug experience at -O0. This will have little effect on optimized code.

Let me know what you think!

In general, pretty apprehensive - in terms of practical effects that come to mind, seems like this might break sanitizers that want to describe specific loads/stores (even for local memory allocations - wonder how it looks for something like MSan). There is some work on value based profiling, I think, too? But I don't know much about it & whether this would impact that.

Could you show a more motivating example? I assume when you're actually inside the getFoo call, going up one frame would give you a line/column that points to the specific getFoo function call, yes? So this is only about stepping, but that step doesn't look particularly problematic to me - perhaps it gets worse/confusing/misleading in other places?

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: RFC: Remove uninteresting debug locations at -O0

Phil King via cfe-dev
In reply to this post by Phil King via cfe-dev
On Tue, 28 Apr 2020 at 22:58, Adrian Prantl via cfe-dev
<[hidden email]> wrote:
> or will inherit whatever debug location the instruction before it has

This part looks worrying to me. Would that mean, that if in your
example, I set a breakpoint on line 7, the program would stop directly
at the callq instruction? That seems like it would break the following
flow:
b 7
run
# program stops at line 7
p foo
# hmm... that value isn't right, lemme see what happens if I change it
p foo = other_foo
continue

In non-optimized builds I would expect to be able to make these kinds
of modifications to the program and things should generally "work". In
that sense, maybe these loads are not "uninteresting" as they are what
makes these modifications take effect.
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: RFC: Remove uninteresting debug locations at -O0

Phil King via cfe-dev
In reply to this post by Phil King via cfe-dev
Broadly, I would like to instead put effort into statement location tracking instead of going down this path. With statement markers, the debugger could step over the whole statement, if that's the desired stepping behavior.

I think that the source location of loads from allocas is often interesting in C++, where the majority of class local variables end up being address-taken and mem2reg / SROA do not fire. Consider llvm::SmallVectorBase::grow_pod, which makes every SmallVector of unknowable length address taken.

If I were to put a watchpoint on the variable, I would want precise source location information for the load that accesses it.

Instrumentation tools (PGO & ASan) will probably want fine-grained source location information for cases like this:
  int x;
  escape(&x);
  return someCondition ? x : barFunc();
The load from x is trivial, but it will be the only source location in its basic block.

My interest in tracking statement boundaries has been renewed because we need it to support the Visual Studio set next statement feature: https://crbug.com/1061084

I wouldn't strongly object to moving in the direction you are proposing if you go with it. I just wanted to raise an alternative for discussion.

On Tue, Apr 28, 2020 at 1:57 PM Adrian Prantl <[hidden email]> wrote:
Getting source location information right is tricky and all about finding a balance.
Recently, I was wondering why stepping through this contrived example

   1       struct Foo {
   2         Foo *getFoo() { return this; }
   3       };
   4       
   5       int main(int argc, char **argv) {
   6         Foo *foo = new Foo();
   7         foo->getFoo()->getFoo();
   8         return 0;
   9       }

LLDB was showing the column marker as

   7         foo->getFoo()->getFoo();
             ^^^

focussing on foo instead of at the method call getFoo() that I was expecting.

In LLVM IR, this code looks like

  %1 = load %struct.Foo*, %struct.Foo** %foo, align 8, !dbg !30
  %call1 = call %struct.Foo* @_ZN3Foo6getFooEv(%struct.Foo* %1), !dbg !31
  %call2 = call %struct.Foo* @_ZN3Foo6getFooEv(%struct.Foo* %call1), !dbg !32

or, in x86_64 assembler:

  .loc 1 7 3 is_stmt 1 ## column_info.cpp:7:3
  movq -24(%rbp), %rdi
  .loc 1 7 8 is_stmt 0 ## column_info.cpp:7:8
  callq __ZN3Foo6getFooEv
  .loc 1 7 18 ## column_info.cpp:7:18

The spurious (7:3) location is attached to an instruction that is the load of the variable from the stack slot, fused with moving that value into the register the ABI defines for $arg0.

I’m postulating that the source location of the LLVM IR load is uninteresting and perhaps even harmful. It is uninteresting, because at -O0, the location does not refer to explicit code that the user wrote and thus causes unintuitive stepping, and with optimizations enabled, there is a high likelihood that the entire instruction is going to be eliminated because of mem2reg, so the effect on profiling should be minimal. Since the load is from an alloca, it also cannot crash under normal operation. The location is harmful, because loads (at least on a CISC instruction set) are often fused with other instructions and having conflicting locations will cause both locations to be dropped when merged.

Based on all this I would most like to assign a form of “weak” source location to loads from allocas generated by the Clang frontend, that looses against any other source location when merged. The closest thing we have to this in LLVM IR is attaching no debug location. An instruction without a debug location either belongs to the function prologue, or will inherit whatever debug location the instruction before it has. In this particular case I think that no debug location is preferable over line 0 (which is how we usually denote compiler-generated code) because we don’t want the load instruction’s source location to erase any source location it may get merged with. One thing I need to check is what happens when an instruction without a location is the first instruction in a basic block. We may need to make an exception for that case.

To summarize, I’m proposing to delete all debug locations from instructions generated in the Clang frontend for loads from allocas that are holding source variables to improve the debug experience at -O0. This will have little effect on optimized code.

Let me know what you think!
-- adrian

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: RFC: Remove uninteresting debug locations at -O0

Phil King via cfe-dev
Thanks to all of you for sharing your perspective! You all brought up important aspects that I hadn't considered. The argument that really clicked for me was Pavel's that you might want to change the value before the load happens.

One of my worries here is that what I called "less interesting" locations might delete more interesting ones when instructions and their locations are merged. But if we indeed consider the stack slot loads to be interesting then that is really the best we can do. After all, storing more than source location per instruction would be a UI design nightmare for a debugger.


Reid, I would like to learn more about what you mean by statement tracking. I'm thinking of a statements as the expressions separated by semicolons, but since my whole example was a single statement I'm assuming you have something more fine-grained in mind. Perhaps you can post an example to illustrate what you have in mind?


thanks!
-- adrian
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: RFC: Remove uninteresting debug locations at -O0

Phil King via cfe-dev
Sure, I can try to elaborate, but I am assuming a bit about what the users desired stepping behavior is. I am assuming in this case that the user is doing the equivalent of `s` or `n` in gdb, and they want to get from one semicolon to the next. If that is not the case, if the user wants the debugger to stop at the location of both getFoo() calls in your example, my suggestion isn't helpful.

However, since the last dev meeting, I have been thinking about having IR intrinsics that mark the IR position where a statement begins. The intrinsic would carry a source location. The next instruction emitted with that source location would carry the DWARF is_stmt line table flag.

Making this idea work with optimizations is harder, since the instructions that make up a statement may all be removed or hoisted. To make this work, we would have to establish which instructions properly belong to the statement. This could be done by adding a level to the DIScope hierarchy. The statement markers would remain in place, and code motion would happen around them. Late in the codegen pipeline, the first instruction belonging to the most recently activated statement will be emitted with the is_stmt flag.

On Wed, Apr 29, 2020 at 9:36 AM Adrian Prantl <[hidden email]> wrote:
Thanks to all of you for sharing your perspective! You all brought up important aspects that I hadn't considered. The argument that really clicked for me was Pavel's that you might want to change the value before the load happens.

One of my worries here is that what I called "less interesting" locations might delete more interesting ones when instructions and their locations are merged. But if we indeed consider the stack slot loads to be interesting then that is really the best we can do. After all, storing more than source location per instruction would be a UI design nightmare for a debugger.


Reid, I would like to learn more about what you mean by statement tracking. I'm thinking of a statements as the expressions separated by semicolons, but since my whole example was a single statement I'm assuming you have something more fine-grained in mind. Perhaps you can post an example to illustrate what you have in mind?


thanks!
-- adrian

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: RFC: Remove uninteresting debug locations at -O0

Phil King via cfe-dev


On Wed, Apr 29, 2020 at 2:52 PM Reid Kleckner <[hidden email]> wrote:
Sure, I can try to elaborate, but I am assuming a bit about what the users desired stepping behavior is. I am assuming in this case that the user is doing the equivalent of `s` or `n` in gdb, and they want to get from one semicolon to the next. If that is not the case, if the user wants the debugger to stop at the location of both getFoo() calls in your example, my suggestion isn't helpful.

However, since the last dev meeting, I have been thinking about having IR intrinsics that mark the IR position where a statement begins. The intrinsic would carry a source location. The next instruction emitted with that source location would carry the DWARF is_stmt line table flag.

Making this idea work with optimizations is harder, since the instructions that make up a statement may all be removed or hoisted. To make this work, we would have to establish which instructions properly belong to the statement. This could be done by adding a level to the DIScope hierarchy. The statement markers would remain in place, and code motion would happen around them.

In this case there would be no intrinsic, just a new kind of "statement" DIScope? yeah, that's the best idea I've had/heard of/discussed for statement grouping so far. Certainly seems imminently prototype-able & then get a sense for just how expensive scoping everything like that would be.
 
Late in the codegen pipeline, the first instruction belonging to the most recently activated statement will be emitted with the is_stmt flag.

On Wed, Apr 29, 2020 at 9:36 AM Adrian Prantl <[hidden email]> wrote:
Thanks to all of you for sharing your perspective! You all brought up important aspects that I hadn't considered. The argument that really clicked for me was Pavel's that you might want to change the value before the load happens.

One of my worries here is that what I called "less interesting" locations might delete more interesting ones when instructions and their locations are merged. But if we indeed consider the stack slot loads to be interesting then that is really the best we can do. After all, storing more than source location per instruction would be a UI design nightmare for a debugger.

Actually I'd kind of love this - I think it'd be really great to describe source ranges instead of source locations - probably super size costly though (& intermediate compiler memory usage, etc). Being able to describe nesting, etc - the same way Clang does with expression/subexpression highlighting I think would be wonderful - imagine if every instruction weren't just attributed to one location, but to a source range with a preferred location (instead of just pointing to the "+" for an add, be able to highlight the LHS and RHS of that plus expression, etc).
 


Reid, I would like to learn more about what you mean by statement tracking. I'm thinking of a statements as the expressions separated by semicolons, but since my whole example was a single statement I'm assuming you have something more fine-grained in mind. Perhaps you can post an example to illustrate what you have in mind?


thanks!
-- adrian

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: RFC: Remove uninteresting debug locations at -O0

Phil King via cfe-dev
In reply to this post by Phil King via cfe-dev


> -----Original Message-----
> From: Adrian Prantl <[hidden email]>
> Sent: Wednesday, April 29, 2020 12:37 PM
> To: Reid Kleckner <[hidden email]>
> Cc: Clang Dev <[hidden email]>; Shafik Yaghmour
> <[hidden email]>; Vedant Kumar <[hidden email]>; Davidino Italiano
> <[hidden email]>; David Blaikie <[hidden email]>; Robinson, Paul
> <[hidden email]>; Jeremy Morse <[hidden email]>;
> Pavel Labath <[hidden email]>
> Subject: Re: RFC: Remove uninteresting debug locations at -O0
>
> Thanks to all of you for sharing your perspective! You all brought up
> important aspects that I hadn't considered. The argument that really
> clicked for me was Pavel's that you might want to change the value before
> the load happens.

That's definitely one thing a debugger user would like to do.
Another is something like a "do-over" which we've tried to support in our
debugger: you've stepped to line 27, had your "aha" moment, diddled some
values, now you drag your cursor back to line 24 and click "step".  This
can only work if all the instructions contributing to line 24 are actually
attributed to line 24.

Let me point out another aspect of what happens when you've done the load:
The value now exists in two places.  DWARF is able to describe this with
location lists (entry ranges are explicitly permitted to overlap) although
LLVM doesn't have a way to track or report that.  If we did, then Pavel's
scenario would work fine with Adrian's idea, because the debugger would be
able to update *both* places that had the value (assuming the debugger was
alert to the possibility of a value in multiple places).
--paulr

>
> One of my worries here is that what I called "less interesting" locations
> might delete more interesting ones when instructions and their locations
> are merged. But if we indeed consider the stack slot loads to be
> interesting then that is really the best we can do. After all, storing
> more than source location per instruction would be a UI design nightmare
> for a debugger.
>
>
> Reid, I would like to learn more about what you mean by statement
> tracking. I'm thinking of a statements as the expressions separated by
> semicolons, but since my whole example was a single statement I'm assuming
> you have something more fine-grained in mind. Perhaps you can post an
> example to illustrate what you have in mind?
>
>
> thanks!
> -- adrian
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: RFC: Remove uninteresting debug locations at -O0

Phil King via cfe-dev
In reply to this post by Phil King via cfe-dev


On Apr 29, 2020, at 2:52 PM, Reid Kleckner <[hidden email]> wrote:

Sure, I can try to elaborate, but I am assuming a bit about what the users desired stepping behavior is. I am assuming in this case that the user is doing the equivalent of `s` or `n` in gdb, and they want to get from one semicolon to the next. If that is not the case, if the user wants the debugger to stop at the location of both getFoo() calls in your example, my suggestion isn't helpful.

So instead of moving forward on a by-line basis, "n" would move forward to the next source-level statement?

Is the motivation to make larger steps, such as in:

  f(var1,
    var2,
    var3);
  g(var4)

having "n" jump from the f-call directlty to the g-call? Or are you concerned about code that has more than one statement per line? I guess the latter is very rare in LLVM code at least.

However, since the last dev meeting, I have been thinking about having IR intrinsics that mark the IR position where a statement begins. The intrinsic would carry a source location. The next instruction emitted with that source location would carry the DWARF is_stmt line table flag.

What I'm personally most interested in, is making smaller steps, such as expression-based stepping, i.e.:

f(a(), b(), c())

being able to "n" from the a-call, to the b-call. For that the current line info would probably be good enough because we already emit different column locations for all subexpressions.

Making this idea work with optimizations is harder, since the instructions that make up a statement may all be removed or hoisted. To make this work, we would have to establish which instructions properly belong to the statement. This could be done by adding a level to the DIScope hierarchy. The statement markers would remain in place, and code motion would happen around them. Late in the codegen pipeline, the first instruction belonging to the most recently activated statement will be emitted with the is_stmt flag.

The main problem I see with optimized code in this respect is when instructions get merged, hoisted, and otherwise moved or reused. I'd be curious to see some examples for how to handle loop-invariant code motion, DAGCombine patterns, in that approach.

-- adrian


On Wed, Apr 29, 2020 at 9:36 AM Adrian Prantl <[hidden email]> wrote:
Thanks to all of you for sharing your perspective! You all brought up important aspects that I hadn't considered. The argument that really clicked for me was Pavel's that you might want to change the value before the load happens.

One of my worries here is that what I called "less interesting" locations might delete more interesting ones when instructions and their locations are merged. But if we indeed consider the stack slot loads to be interesting then that is really the best we can do. After all, storing more than source location per instruction would be a UI design nightmare for a debugger.


Reid, I would like to learn more about what you mean by statement tracking. I'm thinking of a statements as the expressions separated by semicolons, but since my whole example was a single statement I'm assuming you have something more fine-grained in mind. Perhaps you can post an example to illustrate what you have in mind?


thanks!
-- adrian


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: RFC: Remove uninteresting debug locations at -O0

Phil King via cfe-dev
In reply to this post by Phil King via cfe-dev


On Apr 29, 2020, at 2:58 PM, David Blaikie <[hidden email]> wrote:



On Wed, Apr 29, 2020 at 2:52 PM Reid Kleckner <[hidden email]> wrote:
Sure, I can try to elaborate, but I am assuming a bit about what the users desired stepping behavior is. I am assuming in this case that the user is doing the equivalent of `s` or `n` in gdb, and they want to get from one semicolon to the next. If that is not the case, if the user wants the debugger to stop at the location of both getFoo() calls in your example, my suggestion isn't helpful.

However, since the last dev meeting, I have been thinking about having IR intrinsics that mark the IR position where a statement begins. The intrinsic would carry a source location. The next instruction emitted with that source location would carry the DWARF is_stmt line table flag.

Making this idea work with optimizations is harder, since the instructions that make up a statement may all be removed or hoisted. To make this work, we would have to establish which instructions properly belong to the statement. This could be done by adding a level to the DIScope hierarchy. The statement markers would remain in place, and code motion would happen around them.

In this case there would be no intrinsic, just a new kind of "statement" DIScope? yeah, that's the best idea I've had/heard of/discussed for statement grouping so far. Certainly seems imminently prototype-able & then get a sense for just how expensive scoping everything like that would be.
 
Late in the codegen pipeline, the first instruction belonging to the most recently activated statement will be emitted with the is_stmt flag.

On Wed, Apr 29, 2020 at 9:36 AM Adrian Prantl <[hidden email]> wrote:
Thanks to all of you for sharing your perspective! You all brought up important aspects that I hadn't considered. The argument that really clicked for me was Pavel's that you might want to change the value before the load happens.

One of my worries here is that what I called "less interesting" locations might delete more interesting ones when instructions and their locations are merged. But if we indeed consider the stack slot loads to be interesting then that is really the best we can do. After all, storing more than source location per instruction would be a UI design nightmare for a debugger.

Actually I'd kind of love this - I think it'd be really great to describe source ranges instead of source locations - probably super size costly though (& intermediate compiler memory usage, etc). Being able to describe nesting, etc - the same way Clang does with expression/subexpression highlighting I think would be wonderful - imagine if every instruction weren't just attributed to one location, but to a source range with a preferred location (instead of just pointing to the "+" for an add, be able to highlight the LHS and RHS of that plus expression, etc).

I've been thinking about this, too. An extreme point of view is that today's editors are so syntax-aware that they wouldn't need the debugger to tell it where an expression ends, as long as the start point is unambiguous. For example, LLDB already uses clang for syntax highlighting. But having it pre-encoded makes a lot things easier and unambiguous.

For the "+" infix operator example, you really need a third value in addition to the (start, length) that is otherwise sufficient. But in the normal case the length would be very short, so we could probably come up with a cheap encoding for it.

-- adrian 


 


Reid, I would like to learn more about what you mean by statement tracking. I'm thinking of a statements as the expressions separated by semicolons, but since my whole example was a single statement I'm assuming you have something more fine-grained in mind. Perhaps you can post an example to illustrate what you have in mind?


thanks!
-- adrian


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: RFC: Remove uninteresting debug locations at -O0

Phil King via cfe-dev
On Thu, Apr 30, 2020 at 8:42 PM Adrian Prantl <[hidden email]> wrote:


On Apr 29, 2020, at 2:58 PM, David Blaikie <[hidden email]> wrote:



On Wed, Apr 29, 2020 at 2:52 PM Reid Kleckner <[hidden email]> wrote:
Sure, I can try to elaborate, but I am assuming a bit about what the users desired stepping behavior is. I am assuming in this case that the user is doing the equivalent of `s` or `n` in gdb, and they want to get from one semicolon to the next. If that is not the case, if the user wants the debugger to stop at the location of both getFoo() calls in your example, my suggestion isn't helpful.

However, since the last dev meeting, I have been thinking about having IR intrinsics that mark the IR position where a statement begins. The intrinsic would carry a source location. The next instruction emitted with that source location would carry the DWARF is_stmt line table flag.

Making this idea work with optimizations is harder, since the instructions that make up a statement may all be removed or hoisted. To make this work, we would have to establish which instructions properly belong to the statement. This could be done by adding a level to the DIScope hierarchy. The statement markers would remain in place, and code motion would happen around them.

In this case there would be no intrinsic, just a new kind of "statement" DIScope? yeah, that's the best idea I've had/heard of/discussed for statement grouping so far. Certainly seems imminently prototype-able & then get a sense for just how expensive scoping everything like that would be.
 
Late in the codegen pipeline, the first instruction belonging to the most recently activated statement will be emitted with the is_stmt flag.

On Wed, Apr 29, 2020 at 9:36 AM Adrian Prantl <[hidden email]> wrote:
Thanks to all of you for sharing your perspective! You all brought up important aspects that I hadn't considered. The argument that really clicked for me was Pavel's that you might want to change the value before the load happens.

One of my worries here is that what I called "less interesting" locations might delete more interesting ones when instructions and their locations are merged. But if we indeed consider the stack slot loads to be interesting then that is really the best we can do. After all, storing more than source location per instruction would be a UI design nightmare for a debugger.

Actually I'd kind of love this - I think it'd be really great to describe source ranges instead of source locations - probably super size costly though (& intermediate compiler memory usage, etc). Being able to describe nesting, etc - the same way Clang does with expression/subexpression highlighting I think would be wonderful - imagine if every instruction weren't just attributed to one location, but to a source range with a preferred location (instead of just pointing to the "+" for an add, be able to highlight the LHS and RHS of that plus expression, etc).

I've been thinking about this, too. An extreme point of view is that today's editors are so syntax-aware that they wouldn't need the debugger to tell it where an expression ends, as long as the start point is unambiguous. For example, LLDB already uses clang for syntax highlighting. But having it pre-encoded makes a lot things easier and unambiguous.

For the "+" infix operator example, you really need a third value in addition to the (start, length) that is otherwise sufficient. But in the normal case the length would be very short, so we could probably come up with a cheap encoding for it.

Arguably if you are relying on the syntax-awareness of the debugger/editor, perhaps you could rely on that to know precedence, grouping, etc. So the '+' would unambiguously describe the range? I think there's probably a lot of cases where either you'd have to define some very subtle/specific contract between producer and consumer to say "this position refers to this conceptual entity, this position refers to this one, etc... " (I'm thinking around for loops - how to describe different parts of the loop/body/etc - the sort of thing that's been being discussed again on a few bugs lately)
 

-- adrian 


 


Reid, I would like to learn more about what you mean by statement tracking. I'm thinking of a statements as the expressions separated by semicolons, but since my whole example was a single statement I'm assuming you have something more fine-grained in mind. Perhaps you can post an example to illustrate what you have in mind?


thanks!
-- adrian


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: RFC: Remove uninteresting debug locations at -O0

Phil King via cfe-dev
In reply to this post by Phil King via cfe-dev
If you intend to manage source locations for an optimizing compiler, you need a much richer data structure than Clang & LLVM currently support. The compiler company I worked for went down this road ~25 years ago. It's actually quite straightforward to build, albeit potentially expensive in terms of memory and compilation time. But the result is that you can support accurate debugging of fully-optimized code. The model we used didn't even try to represent any source location smaller than a line of code, because we had waaay less memory available back then. The approach we used can reasonably apply to any granularity of source location you choose to implement. That said:
  • Any given instruction may "originate from" an arbitrary number of distinct source locations (call them LOCs). Consider inline expansion of a method, or macro expansion as two obvious examples; there are many more. The resulting instructions "originate from" the location of the call (or macro) and also from the particular line(s) in the method (macro) that generated them. So we need to be able to map one instruction/program-address to many LOCs.
  • Any given line of code may be implemented by a bunch of instructions that may or may not be contiguous. So we need to be able to map a single LOC to multiple instructions, ranges of instructions, multiple non-contiguous ranges of instructions, etc.
  • Depending on later optimization, there could legitimately be multiple distinct instructions each of which is "a beginning" of that LOC. More multiplicity.
  • Depending on later optimization, there could be multiple distinct instances of a single LOC. Consider a method that is inline-expanded at K call sites.  Each LOC within that method has (at least) K instances. Each maps back to the same LOC, but the instances aren't the same as each other. This is an important and possibly non-obvious part of the design. Loop unrolling is another example; hoisting an invariant out of a loop is yet another.
We found by experience that attempting to maintain beginnings (and endings) of LOC instances through the entire compiler was too hard; we always messed it up. So what we did instead was to tag each syntax tree (or Optimizer IR node, or instruction, etc.) with all the LOC instances that it's part of.  We did some quite simple LOC management (see below) throughout IR generation, optimization, and code generation, then computed the beginnings, ends, and ranges of LOC instances at the end of the compilation process via the obvious flow-based algorithms.

Tagging everything with all its LOC info and recovering beginnings/ends after all transformations are complete makes managing source location data inside the compiler essentially trivial. It devolves into five simple cases:
  1. When you delete code/IR/instructions (I'll call all of these code from now on), delete the LOC data along with them. With a reasonable implementation this comes for free when deleting the code.
  2. When moving code, move the LOC data along with it/them. Also a freebie.
  3. When combining code, union the LOC info. This sometimes happens in combination with the next case.
  4. When creating new code (simple case) copy the LOC info from the origin from which it was created onto the new code.
  5. When replicating code (more complex case) — inline expansion, loop unrolling, macro expansion... — create a new instance of each LOC for each replication of the origin. New instances are treated as distinct lines of code for the purpose of computing begin/end locations. When setting a breakpoint at the beginning of line N, for example, you would get breakpoint(s) at each beginning of each distinct instance of line N. For fancier debug support, one could set breakpoints at particular instances of line N without stopping at other instances, and so on.
The pain of implementing such an approach in the compiler is that somebody has to visit more or less the entire code base to ensure that each thing that transforms code applies the correct choice from the above 5 cases. The good news is that each such decision is both local and essentially trivial — we finished the first cut at the update in our compilers' code (~400KLOC at the time) in about two person-weeks of effort. Followed, of course, by testing & debugging time to smoke out the incorrect or missing decisions. 

The result of this approach is that you get quality source location information for minimum-to-low optimization levels at reasonable expense in memory & compile-time and with only modest expansion of the size of the debug info. At higher optimization levels you continue to get fully-accurate source location information (modulo bugs, as with all things). Depending on how agressive your optimizations are, that location information may be anything from somewhat non-obvious to utterly brain-bending. The increase in size for the debug info can be large, but is roughly proportional to the degree to which code has been transformed.

You can propagate LOCs through essentially arbitrary optimizations: loop unrolling, vectorization, loop pipelining, cache blocking, or whatever. You can even track source locations through software loop pipelined, unrolled & optimized VLIW code, if that happens to be the problem at hand. As an added benefit, it supports things like providing auditors looking at safety-critical code with the exact origin(s) of every instruction in the program. It even helps compiler implementers understand what their optimizer has done to the program.

Obviously, your debugger has to understand the more complicated model of source locations. For example, asking for a breakpoint at the beginning of a source line may require setting either one or many breakpoints in the program.  Similarly, if you stop at an arbitrary instruction and ask the debugger where you are in the source, you could wind up with something like "Stopped at the beginning of line-K-instance-X, somewhere during execution of line-L-instance-1, and just after the end of line-A". The most useful breakpoint operations were: "stop at the beginning of (each instance of) a line" and "stop just after the last instruction of (each instance of) a line", because those are the points where either nothing for that line has yet executed or where everything for that line is complete. Obviously, arbitrary other stuff could be in flight there as well, including prior or subsequent lines, or whatever.

We also implemented a way to accurately explain to the debugger how to find the value of variables post optimization, but that'd need another big wall of text so I'll skip it for now.

Sorry to go on so long. My primary message is that propagating rich source location information through a compiler is a problem that is conceptually easy, is tedious but feasible to implement, and that has known proven solutions. Further details on request, if anyone is interested.

Dean


On Wed, Apr 29, 2020 at 2:59 PM David Blaikie via cfe-dev <[hidden email]> wrote:


On Wed, Apr 29, 2020 at 2:52 PM Reid Kleckner <[hidden email]> wrote:
Sure, I can try to elaborate, but I am assuming a bit about what the users desired stepping behavior is. I am assuming in this case that the user is doing the equivalent of `s` or `n` in gdb, and they want to get from one semicolon to the next. If that is not the case, if the user wants the debugger to stop at the location of both getFoo() calls in your example, my suggestion isn't helpful.

However, since the last dev meeting, I have been thinking about having IR intrinsics that mark the IR position where a statement begins. The intrinsic would carry a source location. The next instruction emitted with that source location would carry the DWARF is_stmt line table flag.

Making this idea work with optimizations is harder, since the instructions that make up a statement may all be removed or hoisted. To make this work, we would have to establish which instructions properly belong to the statement. This could be done by adding a level to the DIScope hierarchy. The statement markers would remain in place, and code motion would happen around them.

In this case there would be no intrinsic, just a new kind of "statement" DIScope? yeah, that's the best idea I've had/heard of/discussed for statement grouping so far. Certainly seems imminently prototype-able & then get a sense for just how expensive scoping everything like that would be.
 
Late in the codegen pipeline, the first instruction belonging to the most recently activated statement will be emitted with the is_stmt flag.

On Wed, Apr 29, 2020 at 9:36 AM Adrian Prantl <[hidden email]> wrote:
Thanks to all of you for sharing your perspective! You all brought up important aspects that I hadn't considered. The argument that really clicked for me was Pavel's that you might want to change the value before the load happens.

One of my worries here is that what I called "less interesting" locations might delete more interesting ones when instructions and their locations are merged. But if we indeed consider the stack slot loads to be interesting then that is really the best we can do. After all, storing more than source location per instruction would be a UI design nightmare for a debugger.

Actually I'd kind of love this - I think it'd be really great to describe source ranges instead of source locations - probably super size costly though (& intermediate compiler memory usage, etc). Being able to describe nesting, etc - the same way Clang does with expression/subexpression highlighting I think would be wonderful - imagine if every instruction weren't just attributed to one location, but to a source range with a preferred location (instead of just pointing to the "+" for an add, be able to highlight the LHS and RHS of that plus expression, etc).
 


Reid, I would like to learn more about what you mean by statement tracking. I'm thinking of a statements as the expressions separated by semicolons, but since my whole example was a single statement I'm assuming you have something more fine-grained in mind. Perhaps you can post an example to illustrate what you have in mind?


thanks!
-- adrian
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: RFC: Remove uninteresting debug locations at -O0

Phil King via cfe-dev
Hi Dean,

Thank you for the detailed post - I agree that this is the right approach.  This is what we converged on in the MLIR IR as well for basically the same reasons you listed.

I think it would be great to consider porting the MLIR design over the LLVM IR.

-Chris

On May 22, 2020, at 4:13 PM, Dean Sutherland via cfe-dev <[hidden email]> wrote:

If you intend to manage source locations for an optimizing compiler, you need a much richer data structure than Clang & LLVM currently support. The compiler company I worked for went down this road ~25 years ago. It's actually quite straightforward to build, albeit potentially expensive in terms of memory and compilation time. But the result is that you can support accurate debugging of fully-optimized code. The model we used didn't even try to represent any source location smaller than a line of code, because we had waaay less memory available back then. The approach we used can reasonably apply to any granularity of source location you choose to implement. That said:
  • Any given instruction may "originate from" an arbitrary number of distinct source locations (call them LOCs). Consider inline expansion of a method, or macro expansion as two obvious examples; there are many more. The resulting instructions "originate from" the location of the call (or macro) and also from the particular line(s) in the method (macro) that generated them. So we need to be able to map one instruction/program-address to many LOCs.
  • Any given line of code may be implemented by a bunch of instructions that may or may not be contiguous. So we need to be able to map a single LOC to multiple instructions, ranges of instructions, multiple non-contiguous ranges of instructions, etc.
  • Depending on later optimization, there could legitimately be multiple distinct instructions each of which is "a beginning" of that LOC. More multiplicity.
  • Depending on later optimization, there could be multiple distinct instances of a single LOC. Consider a method that is inline-expanded at K call sites.  Each LOC within that method has (at least) K instances. Each maps back to the same LOC, but the instances aren't the same as each other. This is an important and possibly non-obvious part of the design. Loop unrolling is another example; hoisting an invariant out of a loop is yet another.
We found by experience that attempting to maintain beginnings (and endings) of LOC instances through the entire compiler was too hard; we always messed it up. So what we did instead was to tag each syntax tree (or Optimizer IR node, or instruction, etc.) with all the LOC instances that it's part of.  We did some quite simple LOC management (see below) throughout IR generation, optimization, and code generation, then computed the beginnings, ends, and ranges of LOC instances at the end of the compilation process via the obvious flow-based algorithms.

Tagging everything with all its LOC info and recovering beginnings/ends after all transformations are complete makes managing source location data inside the compiler essentially trivial. It devolves into five simple cases:
  1. When you delete code/IR/instructions (I'll call all of these code from now on), delete the LOC data along with them. With a reasonable implementation this comes for free when deleting the code.
  2. When moving code, move the LOC data along with it/them. Also a freebie.
  3. When combining code, union the LOC info. This sometimes happens in combination with the next case.
  4. When creating new code (simple case) copy the LOC info from the origin from which it was created onto the new code.
  5. When replicating code (more complex case) — inline expansion, loop unrolling, macro expansion... — create a new instance of each LOC for each replication of the origin. New instances are treated as distinct lines of code for the purpose of computing begin/end locations. When setting a breakpoint at the beginning of line N, for example, you would get breakpoint(s) at each beginning of each distinct instance of line N. For fancier debug support, one could set breakpoints at particular instances of line N without stopping at other instances, and so on.
The pain of implementing such an approach in the compiler is that somebody has to visit more or less the entire code base to ensure that each thing that transforms code applies the correct choice from the above 5 cases. The good news is that each such decision is both local and essentially trivial — we finished the first cut at the update in our compilers' code (~400KLOC at the time) in about two person-weeks of effort. Followed, of course, by testing & debugging time to smoke out the incorrect or missing decisions. 

The result of this approach is that you get quality source location information for minimum-to-low optimization levels at reasonable expense in memory & compile-time and with only modest expansion of the size of the debug info. At higher optimization levels you continue to get fully-accurate source location information (modulo bugs, as with all things). Depending on how agressive your optimizations are, that location information may be anything from somewhat non-obvious to utterly brain-bending. The increase in size for the debug info can be large, but is roughly proportional to the degree to which code has been transformed.

You can propagate LOCs through essentially arbitrary optimizations: loop unrolling, vectorization, loop pipelining, cache blocking, or whatever. You can even track source locations through software loop pipelined, unrolled & optimized VLIW code, if that happens to be the problem at hand. As an added benefit, it supports things like providing auditors looking at safety-critical code with the exact origin(s) of every instruction in the program. It even helps compiler implementers understand what their optimizer has done to the program.

Obviously, your debugger has to understand the more complicated model of source locations. For example, asking for a breakpoint at the beginning of a source line may require setting either one or many breakpoints in the program.  Similarly, if you stop at an arbitrary instruction and ask the debugger where you are in the source, you could wind up with something like "Stopped at the beginning of line-K-instance-X, somewhere during execution of line-L-instance-1, and just after the end of line-A". The most useful breakpoint operations were: "stop at the beginning of (each instance of) a line" and "stop just after the last instruction of (each instance of) a line", because those are the points where either nothing for that line has yet executed or where everything for that line is complete. Obviously, arbitrary other stuff could be in flight there as well, including prior or subsequent lines, or whatever.

We also implemented a way to accurately explain to the debugger how to find the value of variables post optimization, but that'd need another big wall of text so I'll skip it for now.

Sorry to go on so long. My primary message is that propagating rich source location information through a compiler is a problem that is conceptually easy, is tedious but feasible to implement, and that has known proven solutions. Further details on request, if anyone is interested.

Dean


On Wed, Apr 29, 2020 at 2:59 PM David Blaikie via cfe-dev <[hidden email]> wrote:


On Wed, Apr 29, 2020 at 2:52 PM Reid Kleckner <[hidden email]> wrote:
Sure, I can try to elaborate, but I am assuming a bit about what the users desired stepping behavior is. I am assuming in this case that the user is doing the equivalent of `s` or `n` in gdb, and they want to get from one semicolon to the next. If that is not the case, if the user wants the debugger to stop at the location of both getFoo() calls in your example, my suggestion isn't helpful.

However, since the last dev meeting, I have been thinking about having IR intrinsics that mark the IR position where a statement begins. The intrinsic would carry a source location. The next instruction emitted with that source location would carry the DWARF is_stmt line table flag.

Making this idea work with optimizations is harder, since the instructions that make up a statement may all be removed or hoisted. To make this work, we would have to establish which instructions properly belong to the statement. This could be done by adding a level to the DIScope hierarchy. The statement markers would remain in place, and code motion would happen around them.

In this case there would be no intrinsic, just a new kind of "statement" DIScope? yeah, that's the best idea I've had/heard of/discussed for statement grouping so far. Certainly seems imminently prototype-able & then get a sense for just how expensive scoping everything like that would be.
 
Late in the codegen pipeline, the first instruction belonging to the most recently activated statement will be emitted with the is_stmt flag.

On Wed, Apr 29, 2020 at 9:36 AM Adrian Prantl <[hidden email]> wrote:
Thanks to all of you for sharing your perspective! You all brought up important aspects that I hadn't considered. The argument that really clicked for me was Pavel's that you might want to change the value before the load happens.

One of my worries here is that what I called "less interesting" locations might delete more interesting ones when instructions and their locations are merged. But if we indeed consider the stack slot loads to be interesting then that is really the best we can do. After all, storing more than source location per instruction would be a UI design nightmare for a debugger.

Actually I'd kind of love this - I think it'd be really great to describe source ranges instead of source locations - probably super size costly though (& intermediate compiler memory usage, etc). Being able to describe nesting, etc - the same way Clang does with expression/subexpression highlighting I think would be wonderful - imagine if every instruction weren't just attributed to one location, but to a source range with a preferred location (instead of just pointing to the "+" for an add, be able to highlight the LHS and RHS of that plus expression, etc).
 


Reid, I would like to learn more about what you mean by statement tracking. I'm thinking of a statements as the expressions separated by semicolons, but since my whole example was a single statement I'm assuming you have something more fine-grained in mind. Perhaps you can post an example to illustrate what you have in mind?


thanks!
-- adrian
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev