UNREACHABLE executed! error while trying to generate PTX

I am trying to generate PTX for the kernel of the 'nbody' sample program (nbody_kernel.cu) using clang/LLVM version 3.2. The nbody CUDA program is available in Nvidia's SDK.

I am following the https://github.com/jholewinski/llvm-ptx-samples project.

These are the commands I run:

clang++ -O4 -S -I/usr/local/cuda/include -emit-llvm -target nvptx64 nbody_kernel.cu -o nbody_kernel.ll

opt -O3 -loop-unroll -unroll-allow-partial nbody_kernel.ll -o nbody_kernel.ll

llc nbody_kernel.ll -o nbody_kernel.ptx

The last command (llc) aborts with an "UNREACHABLE executed!" error and the following stack trace:

[DEVICE-C++] nbody.kernel.cpp
unexpected address space
UNREACHABLE executed at /home/pratnali/LLVM/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp:1317!
0  libLLVM-3.3svn.so 0x00007f3857bdf0cb llvm::sys::PrintStackTrace(_IO_FILE*) + 43
1  libLLVM-3.3svn.so 0x00007f3857bde74a
2  libpthread.so.0   0x00007f3856c3c460
3  libc.so.6         0x00007f3855a90b15 gsignal + 53
4  libc.so.6         0x00007f3855a91f96 abort + 390
5  libLLVM-3.3svn.so 0x00007f3857bc30f7 llvm::llvm_unreachable_internal(char const*, char const*, unsigned int) + 359
6  libLLVM-3.3svn.so 0x00007f385722967d
7  libLLVM-3.3svn.so 0x00007f385722b6d7
8  libLLVM-3.3svn.so 0x00007f3857341723 llvm::FPPassManager::doInitialization(llvm::Module&) + 99
9  libLLVM-3.3svn.so 0x00007f385734639d llvm::MPPassManager::runOnModule(llvm::Module&) + 205
10 libLLVM-3.3svn.so 0x00007f3857349b7c llvm::PassManagerImpl::run(llvm::Module&) + 268
11 llc               0x000000000040b534
12 llc               0x000000000040d131 main + 465
13 libc.so.6         0x00007f3855a7d4bd __libc_start_main + 253
14 llc               0x0000000000406e59
Stack dump:
0. Program arguments: llc nbody.kernel.ll -o nbody.kernel.ptx
make: *** [nbody.kernel.ptx] Aborted

I replaced the CUDA built-in index variables, e.g. threadIdx.x, with the corresponding clang builtins such as __builtin_ptx_read_tid_x(). Generating the LLVM IR (the .ll file) works without problems. The error only appears when llc generates PTX from that IR.
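
One possible way to narrow this down (a sketch, not a confirmed diagnosis): the message says "unexpected address space" and the abort happens under doInitialization, so a module-level variable may be involved. Under that assumption, listing the global variables in the textual IR and checking which ones carry an explicit addrspace(N) qualifier might point at a candidate. The helper name list_globals is hypothetical, and it assumes the input is textual IR rather than bitcode.

```shell
#!/bin/sh
# Hypothetical helper: list module-level variable definitions in a textual
# LLVM IR file and tag each one by whether the variable itself carries an
# explicit addrspace(N) qualifier.
# Assumption: the file is textual IR (e.g. written with -S), not bitcode.
list_globals() {
    awk '
        /^@[^ ]+ = / && / (global|constant) / {
            # Look only at the text before the global/constant keyword, so
            # an addrspace inside the value type is not mistaken for the
            # address space of the variable itself.
            head = $0
            sub(/ (global|constant) .*/, "", head)
            tag = (head ~ /addrspace\([0-9]+\)/) ? "explicit" : "default"
            print tag ": " $0
        }
    ' "$1"
}
```

For example, `list_globals nbody_kernel.ll` could be run on the file written by clang++, before the opt step. The opt command above is invoked without -S, so its output is most likely bitcode, which this text-based check cannot read.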

Any pointers on what might be going on here? I would appreciate any help in moving forward.

I have attached my program, together with a README describing my observations, here.

You should be able to reproduce the problem easily with it.