compiling CUDA w/ -fdebug-default-version=5 generates invalid PTX

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

compiling CUDA w/ -fdebug-default-version=5 generates invalid PTX

shirley breuer via cfe-dev
Hi, Alexey!

I've ran into an odd case with debug info generation in NVPTX.

Reproduction:
------------------------
__device__ __attribute__((noinline)) void bar() { printf("Hi!"); }
__global__ void foo() { bar(); }
int main(){}
------------------------

$ clang++ -v --cuda-gpu-arch=sm_70 --cuda-device-only -fdebug-default-version=5 a.cu -gmlt -O1 -c

Compilation fails due to a syntax error reported by ptxas.
The reason for the error is that clang generates a label in the middle of a `call.uni` instuction. E.g:

        { // callseq 1, 0
        .reg .b32 temp_param_reg;
        call.uni
Ltmp14:
        _Z3barv,
        (
        );
        } // callseq 1

The odd part is that we're only generating line info and there is no DWARF in the generated PTX.
It appears that this behavior is triggered by `-dwarf-version=5` passed to cc1. 
Looks like another case where PTX syntax breaks DWARF generator assumptions.

It's possible to work around it with an additional `-Xarch_device -fdebug-default-version=2`, 
but I'd appreciate it if you could take a look and see if that could be fixed.

--
--Artem Belevich

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: compiling CUDA w/ -fdebug-default-version=5 generates invalid PTX

shirley breuer via cfe-dev
Hi Artem, will check this tomorrow.

Best regards,
Alexey Bataev

2 дек. 2020 г., в 18:00, Artem Belevich <[hidden email]> написал(а):


Hi, Alexey!

I've ran into an odd case with debug info generation in NVPTX.

Reproduction:
------------------------
__device__ __attribute__((noinline)) void bar() { printf("Hi!"); }
__global__ void foo() { bar(); }
int main(){}
------------------------

$ clang++ -v --cuda-gpu-arch=sm_70 --cuda-device-only -fdebug-default-version=5 a.cu -gmlt -O1 -c

Compilation fails due to a syntax error reported by ptxas.
The reason for the error is that clang generates a label in the middle of a `call.uni` instuction. E.g:

        { // callseq 1, 0
        .reg .b32 temp_param_reg;
        call.uni
Ltmp14:
        _Z3barv,
        (
        );
        } // callseq 1

The odd part is that we're only generating line info and there is no DWARF in the generated PTX.
It appears that this behavior is triggered by `-dwarf-version=5` passed to cc1. 
Looks like another case where PTX syntax breaks DWARF generator assumptions.

It's possible to work around it with an additional `-Xarch_device -fdebug-default-version=2`, 
but I'd appreciate it if you could take a look and see if that could be fixed.

--
--Artem Belevich

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: compiling CUDA w/ -fdebug-default-version=5 generates invalid PTX

shirley breuer via cfe-dev
In reply to this post by shirley breuer via cfe-dev

Hi Artem, here is what I found about this.

These labels are emitted only if DWARF 4 or 5 is used. They are required for emission of the DW_AT_call_site attribute. The info about callsites also emitted for lineinfo emission with debug info for profiling like in your example (-O1 -gmlt).

Call/CallUni instruction is treated as a separate instruction by the debug info generator and it treats it as a separate call and emits labels for it. You can try to mark the CallUni instructions as hasDelaySlot = 1 or stop treating it as a call instruction in NVPTXInstrInfo.td. Can't say which one is better/correct. Looks like the representation for call/call.uni is not quite compatible with the debug info

-------------
Best regards,
Alexey Bataev
12/2/2020 6:00 PM, Artem Belevich пишет:
Hi, Alexey!

I've ran into an odd case with debug info generation in NVPTX.

Reproduction:
------------------------
__device__ __attribute__((noinline)) void bar() { printf("Hi!"); }
__global__ void foo() { bar(); }
int main(){}
------------------------

$ clang++ -v --cuda-gpu-arch=sm_70 --cuda-device-only -fdebug-default-version=5 a.cu -gmlt -O1 -c

Compilation fails due to a syntax error reported by ptxas.
The reason for the error is that clang generates a label in the middle of a `call.uni` instuction. E.g:

        { // callseq 1, 0
        .reg .b32 temp_param_reg;
        call.uni
Ltmp14:
        _Z3barv,
        (
        );
        } // callseq 1

The odd part is that we're only generating line info and there is no DWARF in the generated PTX.
It appears that this behavior is triggered by `-dwarf-version=5` passed to cc1. 
Looks like another case where PTX syntax breaks DWARF generator assumptions.

It's possible to work around it with an additional `-Xarch_device -fdebug-default-version=2`, 
but I'd appreciate it if you could take a look and see if that could be fixed.

--
--Artem Belevich

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

OpenPGP_signature (855 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: compiling CUDA w/ -fdebug-default-version=5 generates invalid PTX

shirley breuer via cfe-dev


On Thu, Dec 3, 2020 at 11:57 AM Alexey.Bataev <[hidden email]> wrote:

Hi Artem, here is what I found about this.

These labels are emitted only if DWARF 4 or 5 is used. They are required for emission of the DW_AT_call_site attribute. The info about callsites also emitted for lineinfo emission with debug info for profiling like in your example (-O1 -gmlt).

Thank you for looking into this.
 

Call/CallUni instruction is treated as a separate instruction by the debug info generator and it treats it as a separate call and emits labels for it.

Would it help if we were to fold the whole call.uni into a single line? It appears that swarf injects the label after the call instruction line gets printed. 
What is the label supposed to point at? At the call instruction itself? Or at the return point?

You can try to mark the CallUni instructions as hasDelaySlot = 1 or stop treating it as a call instruction in NVPTXInstrInfo.td. Can't say which one is better/correct. Looks like the representation for call/call.uni is not quite compatible with the debug info

David suggested not allowing -fdebug-default-version=5 to override the DWARF version provided by NVPTX back-end. That would make sense, considering that we can't handle the newer DWARF versions anyways.

Let me see if I can fix the lineinfo generation first. 

--Artem

-------------
Best regards,
Alexey Bataev
12/2/2020 6:00 PM, Artem Belevich пишет:
Hi, Alexey!

I've ran into an odd case with debug info generation in NVPTX.

Reproduction:
------------------------
__device__ __attribute__((noinline)) void bar() { printf("Hi!"); }
__global__ void foo() { bar(); }
int main(){}
------------------------

$ clang++ -v --cuda-gpu-arch=sm_70 --cuda-device-only -fdebug-default-version=5 a.cu -gmlt -O1 -c

Compilation fails due to a syntax error reported by ptxas.
The reason for the error is that clang generates a label in the middle of a `call.uni` instuction. E.g:

        { // callseq 1, 0
        .reg .b32 temp_param_reg;
        call.uni
Ltmp14:
        _Z3barv,
        (
        );
        } // callseq 1

The odd part is that we're only generating line info and there is no DWARF in the generated PTX.
It appears that this behavior is triggered by `-dwarf-version=5` passed to cc1. 
Looks like another case where PTX syntax breaks DWARF generator assumptions.

It's possible to work around it with an additional `-Xarch_device -fdebug-default-version=2`, 
but I'd appreciate it if you could take a look and see if that could be fixed.

--
--Artem Belevich


--
--Artem Belevich

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: compiling CUDA w/ -fdebug-default-version=5 generates invalid PTX

shirley breuer via cfe-dev


-------------
Best regards,
Alexey Bataev
12/3/2020 3:28 PM, Artem Belevich пишет:


On Thu, Dec 3, 2020 at 11:57 AM Alexey.Bataev <[hidden email]> wrote:

Hi Artem, here is what I found about this.

These labels are emitted only if DWARF 4 or 5 is used. They are required for emission of the DW_AT_call_site attribute. The info about callsites also emitted for lineinfo emission with debug info for profiling like in your example (-O1 -gmlt).

Thank you for looking into this.
 

Call/CallUni instruction is treated as a separate instruction by the debug info generator and it treats it as a separate call and emits labels for it.

Would it help if we were to fold the whole call.uni into a single line?
If it would be a single instruction - yes, it should fix it.
It appears that swarf injects the label after the call instruction line gets printed.
Actually, it may insert the label before (for tail calls) and after a call.
What is the label supposed to point at? At the call instruction itself? Or at the return point?
The labels just point on the start and the end of function call, i.e. it must be inserted before call (for tail calls) and right after call.

You can try to mark the CallUni instructions as hasDelaySlot = 1 or stop treating it as a call instruction in NVPTXInstrInfo.td. Can't say which one is better/correct. Looks like the representation for call/call.uni is not quite compatible with the debug info

David suggested not allowing -fdebug-default-version=5 to override the DWARF version provided by NVPTX back-end. That would make sense, considering that we can't handle the newer DWARF versions anyways.
Yes, that's a good idea. If we need to change it we still can use LLVM option to override the DWARF version.

Let me see if I can fix the lineinfo generation first. 

--Artem

-------------
Best regards,
Alexey Bataev
12/2/2020 6:00 PM, Artem Belevich пишет:
Hi, Alexey!

I've ran into an odd case with debug info generation in NVPTX.

Reproduction:
------------------------
__device__ __attribute__((noinline)) void bar() { printf("Hi!"); }
__global__ void foo() { bar(); }
int main(){}
------------------------

$ clang++ -v --cuda-gpu-arch=sm_70 --cuda-device-only -fdebug-default-version=5 a.cu -gmlt -O1 -c

Compilation fails due to a syntax error reported by ptxas.
The reason for the error is that clang generates a label in the middle of a `call.uni` instuction. E.g:

        { // callseq 1, 0
        .reg .b32 temp_param_reg;
        call.uni
Ltmp14:
        _Z3barv,
        (
        );
        } // callseq 1

The odd part is that we're only generating line info and there is no DWARF in the generated PTX.
It appears that this behavior is triggered by `-dwarf-version=5` passed to cc1. 
Looks like another case where PTX syntax breaks DWARF generator assumptions.

It's possible to work around it with an additional `-Xarch_device -fdebug-default-version=2`, 
but I'd appreciate it if you could take a look and see if that could be fixed.

--
--Artem Belevich


--
--Artem Belevich

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

OpenPGP_signature (855 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: compiling CUDA w/ -fdebug-default-version=5 generates invalid PTX

shirley breuer via cfe-dev
In reply to this post by shirley breuer via cfe-dev


On Thu, Dec 3, 2020 at 12:29 PM Artem Belevich via cfe-dev <[hidden email]> wrote:


On Thu, Dec 3, 2020 at 11:57 AM Alexey.Bataev <[hidden email]> wrote:

Hi Artem, here is what I found about this.

These labels are emitted only if DWARF 4 or 5 is used. They are required for emission of the DW_AT_call_site attribute. The info about callsites also emitted for lineinfo emission with debug info for profiling like in your example (-O1 -gmlt).

Thank you for looking into this.
 

Call/CallUni instruction is treated as a separate instruction by the debug info generator and it treats it as a separate call and emits labels for it.

Would it help if we were to fold the whole call.uni into a single line? It appears that swarf injects the label after the call instruction line gets printed. 
What is the label supposed to point at? At the call instruction itself? Or at the return point?

I believe it's immediately after the call.

Testing an x86 example:

$ cat test.c

void f1();

void f2();

void f3() {

  f1();

  f2();

}

$ clang-tot -gdwarf-5 test.c -c -O3 && llvm-dwarfdump-tot test.o | grep "DW_TAG\|_call_" && llvm-objdump -d test.o

0x0000000c: DW_TAG_compile_unit

0x00000023:   DW_TAG_subprogram

                DW_AT_call_all_calls    (true)

0x0000002e:     DW_TAG_call_site

                  DW_AT_call_origin     (0x0000003b)

                  DW_AT_call_return_pc  (0x0000000000000008)

0x00000034:     DW_TAG_call_site

                  DW_AT_call_origin     (0x00000041)

                  DW_AT_call_tail_call  (true)

                  DW_AT_call_pc (0x000000000000000b)

0x0000003b:   DW_TAG_subprogram

0x0000003f:     DW_TAG_unspecified_parameters

0x00000041:   DW_TAG_subprogram

0x00000045:     DW_TAG_unspecified_parameters


test.o: file format elf64-x86-64



Disassembly of section .text:


0000000000000000 <f3>:

       0: 50                            pushq   %rax

       1: 31 c0                         xorl    %eax, %eax

       3: e8 00 00 00 00                callq   0x8 <f3+0x8>

       8: 31 c0                         xorl    %eax, %eax

       a: 59                            popq    %rcx

       b: e9 00 00 00 00                jmp     0x10 <f3+0x10>



But the tail call uses the jump location, because it can't do anything else.

You can try to mark the CallUni instructions as hasDelaySlot = 1 or stop treating it as a call instruction in NVPTXInstrInfo.td. Can't say which one is better/correct. Looks like the representation for call/call.uni is not quite compatible with the debug info

David suggested not allowing -fdebug-default-version=5 to override the DWARF version provided by NVPTX back-end. That would make sense, considering that we can't handle the newer DWARF versions anyways.

Let me see if I can fix the lineinfo generation first. 

Yeah, -gdwarf-5 and -fdebug-default-version=5 -g should behave the same. So whatever it is that's disabling/downgrading to DWARFv2 for NVPTX when the user uses -gdwarf-5 shoudl do the same for -fdebug-default-version=5
 

--Artem

-------------
Best regards,
Alexey Bataev
12/2/2020 6:00 PM, Artem Belevich пишет:
Hi, Alexey!

I've ran into an odd case with debug info generation in NVPTX.

Reproduction:
------------------------
__device__ __attribute__((noinline)) void bar() { printf("Hi!"); }
__global__ void foo() { bar(); }
int main(){}
------------------------

$ clang++ -v --cuda-gpu-arch=sm_70 --cuda-device-only -fdebug-default-version=5 a.cu -gmlt -O1 -c

Compilation fails due to a syntax error reported by ptxas.
The reason for the error is that clang generates a label in the middle of a `call.uni` instuction. E.g:

        { // callseq 1, 0
        .reg .b32 temp_param_reg;
        call.uni
Ltmp14:
        _Z3barv,
        (
        );
        } // callseq 1

The odd part is that we're only generating line info and there is no DWARF in the generated PTX.
It appears that this behavior is triggered by `-dwarf-version=5` passed to cc1. 
Looks like another case where PTX syntax breaks DWARF generator assumptions.

It's possible to work around it with an additional `-Xarch_device -fdebug-default-version=2`, 
but I'd appreciate it if you could take a look and see if that could be fixed.

--
--Artem Belevich


--
--Artem Belevich
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev