Detecting undefined pointer arithmetic

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

Detecting undefined pointer arithmetic

Manas via cfe-dev
I noticed that none of the sanitizers seems to support checking for
out-of-bounds pointer arithmetic, even though my understanding of
the C standard is that this is undefined behavior.  In particular, I
believe the following trivial program has undefined behavior (assuming
malloc() succeeds), but none of the sanitizers flag any warnings:

#include <stdlib.h>
int main(void) {
   char *buf = malloc(1);
   if (buf) {
      char *this_is_ub = buf + 3;
      free(buf);
   }
}

Of course, I suspect this just has not been implemented yet, but
it still leaves me at a loss for how to track this form of UB down.
Is there a better solution than manual code review?

Sincerely,

Demi Obenour


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

OpenPGP_signature (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Detecting undefined pointer arithmetic

Manas via cfe-dev
> On Jan 10, 2021, at 11:39 PM, Demi M. Obenour via cfe-dev <[hidden email]> wrote:
>
> I noticed that none of the sanitizers seems to support checking for
> out-of-bounds pointer arithmetic, even though my understanding of
> the C standard is that this is undefined behavior.  In particular, I
> believe the following trivial program has undefined behavior (assuming
> malloc() succeeds), but none of the sanitizers flag any warnings:
>
> #include <stdlib.h>
> int main(void) {
>   char *buf = malloc(1);
>   if (buf) {
>      char *this_is_ub = buf + 3;
>      free(buf);
>   }
> }
>
> Of course, I suspect this just has not been implemented yet, but
> it still leaves me at a loss for how to track this form of UB down.
> Is there a better solution than manual code review?


This program does not have UB.
There’s nothing wrong with forming an "out-of-bound” pointer.
If you use it for anything, then that is UB - and address sanitizer will find such usages for you.

Like this:

#include <stdlib.h>
int main(void) {
  int ret = 0;
  char *buf = (char *) malloc(1);
  if (buf) {
     char *this_is_ub = buf + 3;
     ret = *this_is_ub;
     free(buf);
  }
  return ret;
}

——
% clang++ -fsanitize=address bug.cpp && ./a.out
=================================================================
==25602==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6020000000d3 at pc 0x00010531ff18 bp 0x7ffeea8e0970 sp 0x7ffeea8e0968
READ of size 1 at 0x6020000000d3 thread T0
    #0 0x10531ff17 in main (a.out:x86_64+0x100000f17)
    #1 0x7fff6edd5cc8 in start (libdyld.dylib:x86_64+0x1acc8)

0x6020000000d3 is located 2 bytes to the right of 1-byte region [0x6020000000d0,0x6020000000d1)
allocated by thread T0 here:
    #0 0x105e08abd in wrap_malloc (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x45abd)
    #1 0x10531feaf in main (a.out:x86_64+0x100000eaf)
    #2 0x7fff6edd5cc8 in start (libdyld.dylib:x86_64+0x1acc8)

SUMMARY: AddressSanitizer: heap-buffer-overflow (a.out:x86_64+0x100000f17) in main
Shadow bytes around the buggy address:
  0x1c03ffffffc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c03ffffffd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c03ffffffe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c03fffffff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c0400000000: fa fa fd fd fa fa 00 00 fa fa 00 00 fa fa 00 04
=>0x1c0400000010: fa fa 00 00 fa fa 00 06 fa fa[01]fa fa fa fa fa
  0x1c0400000020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c0400000030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c0400000040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c0400000050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c0400000060: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==25602==ABORTING
zsh: abort      ./a.out

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Detecting undefined pointer arithmetic

Manas via cfe-dev
Dear All,

Actually, I think this is technically undefined behavior as (IIRC) C allows a pointer to extend one byte past the end of the referent memory object but not any further than that.

That said, I wouldn’t be surprised if the sanitizers do not catch this case because:

  1. Since the code doesn’t do anything useful, dead code elimination may remove the offending code before the sanitizers instrument the program.
  2. The out of bounds pointer is not used to read or write memory, so it can’t corrupt or leak memory.  About the most danger it poses is that it may enable optimizations that the programmer is not expecting because the code is undefined.

I’ve lost track of all the different sanitizers in LLVM and how they work, but the original Address Sanitizer just checks for out of bounds loads and stores; it doesn’t place any checks on pointer arithmetic operations (like the LLVM gep instruction).  That makes it faster at the cost of not catching all pointer arithmetic errors.

Regards,

John Criswell

--
John Criswell
Associate Professor
University of Rochester





On Jan 14, 2021, at 9:39 AM, Marshall Clow via cfe-dev <[hidden email]> wrote:

On Jan 10, 2021, at 11:39 PM, Demi M. Obenour via cfe-dev <[hidden email]> wrote:

I noticed that none of the sanitizers seems to support checking for
out-of-bounds pointer arithmetic, even though my understanding of
the C standard is that this is undefined behavior.  In particular, I
believe the following trivial program has undefined behavior (assuming
malloc() succeeds), but none of the sanitizers flag any warnings:

#include <stdlib.h>
int main(void) {
 char *buf = malloc(1);
 if (buf) {
    char *this_is_ub = buf + 3;
    free(buf);
 }
}

Of course, I suspect this just has not been implemented yet, but
it still leaves me at a loss for how to track this form of UB down.
Is there a better solution than manual code review?


This program does not have UB.
There’s nothing wrong with forming an "out-of-bound” pointer.
If you use it for anything, then that is UB - and address sanitizer will find such usages for you.

Like this:

#include <stdlib.h>
int main(void) {
 int ret = 0;
 char *buf = (char *) malloc(1);
 if (buf) {
    char *this_is_ub = buf + 3;
    ret = *this_is_ub;
    free(buf);
 }
 return ret;
}

——
% clang++ -fsanitize=address bug.cpp && ./a.out
=================================================================
==25602==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6020000000d3 at pc 0x00010531ff18 bp 0x7ffeea8e0970 sp 0x7ffeea8e0968
READ of size 1 at 0x6020000000d3 thread T0
   #0 0x10531ff17 in main (a.out:x86_64+0x100000f17)
   #1 0x7fff6edd5cc8 in start (libdyld.dylib:x86_64+0x1acc8)

0x6020000000d3 is located 2 bytes to the right of 1-byte region [0x6020000000d0,0x6020000000d1)
allocated by thread T0 here:
   #0 0x105e08abd in wrap_malloc (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x45abd)
   #1 0x10531feaf in main (a.out:x86_64+0x100000eaf)
   #2 0x7fff6edd5cc8 in start (libdyld.dylib:x86_64+0x1acc8)

SUMMARY: AddressSanitizer: heap-buffer-overflow (a.out:x86_64+0x100000f17) in main
Shadow bytes around the buggy address:
 0x1c03ffffffc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 0x1c03ffffffd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 0x1c03ffffffe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 0x1c03fffffff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 0x1c0400000000: fa fa fd fd fa fa 00 00 fa fa 00 00 fa fa 00 04
=>0x1c0400000010: fa fa 00 00 fa fa 00 06 fa fa[01]fa fa fa fa fa
 0x1c0400000020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
 0x1c0400000030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
 0x1c0400000040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
 0x1c0400000050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
 0x1c0400000060: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
 Addressable:           00
 Partially addressable: 01 02 03 04 05 06 07
 Heap left redzone:       fa
 Freed heap region:       fd
 Stack left redzone:      f1
 Stack mid redzone:       f2
 Stack right redzone:     f3
 Stack after return:      f5
 Stack use after scope:   f8
 Global redzone:          f9
 Global init order:       f6
 Poisoned by user:        f7
 Container overflow:      fc
 Array cookie:            ac
 Intra object redzone:    bb
 ASan internal:           fe
 Left alloca redzone:     ca
 Right alloca redzone:    cb
 Shadow gap:              cc
==25602==ABORTING
zsh: abort      ./a.out

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Detecting undefined pointer arithmetic

Manas via cfe-dev
We do actually have (at least) two sanitizers that catch some cases of this source of undefined behavior.

1) We have a pointer overflow sanitizer that catches pointer arithmetic that wraps around the address space.

  buf + very_large_number

2) We have an array bounds sanitizer that catches pointer arithmetic that leaves the bounds of an array of known size.

   int n = 7;
   int arr[5];
   int *p = arr + n;

Both checks are somewhat simplistic and could be extended to catch more cases:
(1) could be taught to catch pointer arithmetic that leaves the bounds determined by __builtin_object_size.
(2) could be taught to determine the known array bound in more cases, for example catching '&n + 2' because (at least in C++) the language rules say that 'n' is treated as an array of bound 1 for the purposes of pointer arithmetic.

On Thu, 14 Jan 2021 at 08:03, John Criswell via cfe-dev <[hidden email]> wrote:
Dear All,

Actually, I think this is technically undefined behavior as (IIRC) C allows a pointer to extend one byte past the end of the referent memory object but not any further than that.

That said, I wouldn’t be surprised if the sanitizers do not catch this case because:

  1. Since the code doesn’t do anything useful, dead code elimination may remove the offending code before the sanitizers instrument the program.
  2. The out of bounds pointer is not used to read or write memory, so it can’t corrupt or leak memory.  About the most danger it poses is that it may enable optimizations that the programmer is not expecting because the code is undefined.

I’ve lost track of all the different sanitizers in LLVM and how they work, but the original Address Sanitizer just checks for out of bounds loads and stores; it doesn’t place any checks on pointer arithmetic operations (like the LLVM gep instruction).  That makes it faster at the cost of not catching all pointer arithmetic errors.

Regards,

John Criswell

--
John Criswell
Associate Professor
University of Rochester





On Jan 14, 2021, at 9:39 AM, Marshall Clow via cfe-dev <[hidden email]> wrote:

On Jan 10, 2021, at 11:39 PM, Demi M. Obenour via cfe-dev <[hidden email]> wrote:

I noticed that none of the sanitizers seems to support checking for
out-of-bounds pointer arithmetic, even though my understanding of
the C standard is that this is undefined behavior.  In particular, I
believe the following trivial program has undefined behavior (assuming
malloc() succeeds), but none of the sanitizers flag any warnings:

#include <stdlib.h>
int main(void) {
 char *buf = malloc(1);
 if (buf) {
    char *this_is_ub = buf + 3;
    free(buf);
 }
}

Of course, I suspect this just has not been implemented yet, but
it still leaves me at a loss for how to track this form of UB down.
Is there a better solution than manual code review?


This program does not have UB.
There’s nothing wrong with forming an "out-of-bound” pointer.
If you use it for anything, then that is UB - and address sanitizer will find such usages for you.

Like this:

#include <stdlib.h>
int main(void) {
 int ret = 0;
 char *buf = (char *) malloc(1);
 if (buf) {
    char *this_is_ub = buf + 3;
    ret = *this_is_ub;
    free(buf);
 }
 return ret;
}

——
% clang++ -fsanitize=address bug.cpp && ./a.out
=================================================================
==25602==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6020000000d3 at pc 0x00010531ff18 bp 0x7ffeea8e0970 sp 0x7ffeea8e0968
READ of size 1 at 0x6020000000d3 thread T0
   #0 0x10531ff17 in main (a.out:x86_64+0x100000f17)
   #1 0x7fff6edd5cc8 in start (libdyld.dylib:x86_64+0x1acc8)

0x6020000000d3 is located 2 bytes to the right of 1-byte region [0x6020000000d0,0x6020000000d1)
allocated by thread T0 here:
   #0 0x105e08abd in wrap_malloc (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x45abd)
   #1 0x10531feaf in main (a.out:x86_64+0x100000eaf)
   #2 0x7fff6edd5cc8 in start (libdyld.dylib:x86_64+0x1acc8)

SUMMARY: AddressSanitizer: heap-buffer-overflow (a.out:x86_64+0x100000f17) in main
Shadow bytes around the buggy address:
 0x1c03ffffffc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 0x1c03ffffffd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 0x1c03ffffffe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 0x1c03fffffff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 0x1c0400000000: fa fa fd fd fa fa 00 00 fa fa 00 00 fa fa 00 04
=>0x1c0400000010: fa fa 00 00 fa fa 00 06 fa fa[01]fa fa fa fa fa
 0x1c0400000020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
 0x1c0400000030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
 0x1c0400000040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
 0x1c0400000050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
 0x1c0400000060: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
 Addressable:           00
 Partially addressable: 01 02 03 04 05 06 07
 Heap left redzone:       fa
 Freed heap region:       fd
 Stack left redzone:      f1
 Stack mid redzone:       f2
 Stack right redzone:     f3
 Stack after return:      f5
 Stack use after scope:   f8
 Global redzone:          f9
 Global init order:       f6
 Poisoned by user:        f7
 Container overflow:      fc
 Array cookie:            ac
 Intra object redzone:    bb
 ASan internal:           fe
 Left alloca redzone:     ca
 Right alloca redzone:    cb
 Shadow gap:              cc
==25602==ABORTING
zsh: abort      ./a.out

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Detecting undefined pointer arithmetic

Manas via cfe-dev
In reply to this post by Manas via cfe-dev
We would/should not exploit UB in such a case, at least not in the shown
example. The pointer computation might yield `poison` but that is it.
If you'd use the pointer afterwards, the situation is different though.

~ Johannes


On 1/14/21 10:03 AM, John Criswell via cfe-dev wrote:

> Dear All,
>
> Actually, I think this is technically undefined behavior as (IIRC) C allows a pointer to extend one byte past the end of the referent memory object but not any further than that.
>
> That said, I wouldn’t be surprised if the sanitizers do not catch this case because:
>
> Since the code doesn’t do anything useful, dead code elimination may remove the offending code before the sanitizers instrument the program.
> The out of bounds pointer is not used to read or write memory, so it can’t corrupt or leak memory.  About the most danger it poses is that it may enable optimizations that the programmer is not expecting because the code is undefined.
>
> I’ve lost track of all the different sanitizers in LLVM and how they work, but the original Address Sanitizer just checks for out of bounds loads and stores; it doesn’t place any checks on pointer arithmetic operations (like the LLVM gep instruction).  That makes it faster at the cost of not catching all pointer arithmetic errors.
>
> Regards,
>
> John Criswell
>
> --
> John Criswell
> Associate Professor
> University of Rochester
> [hidden email]
>
>
>
>
>
>> On Jan 14, 2021, at 9:39 AM, Marshall Clow via cfe-dev <[hidden email]> wrote:
>>
>>> On Jan 10, 2021, at 11:39 PM, Demi M. Obenour via cfe-dev <[hidden email]> wrote:
>>>
>>> I noticed that none of the sanitizers seems to support checking for
>>> out-of-bounds pointer arithmetic, even though my understanding of
>>> the C standard is that this is undefined behavior.  In particular, I
>>> believe the following trivial program has undefined behavior (assuming
>>> malloc() succeeds), but none of the sanitizers flag any warnings:
>>>
>>> #include <stdlib.h>
>>> int main(void) {
>>>   char *buf = malloc(1);
>>>   if (buf) {
>>>      char *this_is_ub = buf + 3;
>>>      free(buf);
>>>   }
>>> }
>>>
>>> Of course, I suspect this just has not been implemented yet, but
>>> it still leaves me at a loss for how to track this form of UB down.
>>> Is there a better solution than manual code review?
>>
>> This program does not have UB.
>> There’s nothing wrong with forming an "out-of-bound” pointer.
>> If you use it for anything, then that is UB - and address sanitizer will find such usages for you.
>>
>> Like this:
>>
>> #include <stdlib.h>
>> int main(void) {
>>   int ret = 0;
>>   char *buf = (char *) malloc(1);
>>   if (buf) {
>>      char *this_is_ub = buf + 3;
>>      ret = *this_is_ub;
>>      free(buf);
>>   }
>>   return ret;
>> }
>>
>> ——
>> % clang++ -fsanitize=address bug.cpp && ./a.out
>> =================================================================
>> ==25602==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6020000000d3 at pc 0x00010531ff18 bp 0x7ffeea8e0970 sp 0x7ffeea8e0968
>> READ of size 1 at 0x6020000000d3 thread T0
>>     #0 0x10531ff17 in main (a.out:x86_64+0x100000f17)
>>     #1 0x7fff6edd5cc8 in start (libdyld.dylib:x86_64+0x1acc8)
>>
>> 0x6020000000d3 is located 2 bytes to the right of 1-byte region [0x6020000000d0,0x6020000000d1)
>> allocated by thread T0 here:
>>     #0 0x105e08abd in wrap_malloc (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x45abd)
>>     #1 0x10531feaf in main (a.out:x86_64+0x100000eaf)
>>     #2 0x7fff6edd5cc8 in start (libdyld.dylib:x86_64+0x1acc8)
>>
>> SUMMARY: AddressSanitizer: heap-buffer-overflow (a.out:x86_64+0x100000f17) in main
>> Shadow bytes around the buggy address:
>>   0x1c03ffffffc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>   0x1c03ffffffd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>   0x1c03ffffffe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>   0x1c03fffffff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>   0x1c0400000000: fa fa fd fd fa fa 00 00 fa fa 00 00 fa fa 00 04
>> =>0x1c0400000010: fa fa 00 00 fa fa 00 06 fa fa[01]fa fa fa fa fa
>>   0x1c0400000020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
>>   0x1c0400000030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
>>   0x1c0400000040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
>>   0x1c0400000050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
>>   0x1c0400000060: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
>> Shadow byte legend (one shadow byte represents 8 application bytes):
>>   Addressable:           00
>>   Partially addressable: 01 02 03 04 05 06 07
>>   Heap left redzone:       fa
>>   Freed heap region:       fd
>>   Stack left redzone:      f1
>>   Stack mid redzone:       f2
>>   Stack right redzone:     f3
>>   Stack after return:      f5
>>   Stack use after scope:   f8
>>   Global redzone:          f9
>>   Global init order:       f6
>>   Poisoned by user:        f7
>>   Container overflow:      fc
>>   Array cookie:            ac
>>   Intra object redzone:    bb
>>   ASan internal:           fe
>>   Left alloca redzone:     ca
>>   Right alloca redzone:    cb
>>   Shadow gap:              cc
>> ==25602==ABORTING
>> zsh: abort      ./a.out
>>
>> _______________________________________________
>> cfe-dev mailing list
>> [hidden email]
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
>
> _______________________________________________
> cfe-dev mailing list
> [hidden email]
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Detecting undefined pointer arithmetic

Manas via cfe-dev
On 1/14/21 7:52 PM, Johannes Doerfert via cfe-dev wrote:
> We would/should not exploit UB in such a case, at least not in the shown
> example. The pointer computation might yield `poison` but that is it.
> If you'd use the pointer afterwards, the situation is different though.
>
> ~ Johannes

The code pattern I see in practice is essentially:

    char *p = malloc(some_bytes);
    if (!p)
        goto fail;
    initialize(p, some_bytes);
    char *end = p + some_bytes;
    int offset = untrusted_user_input();
    if (offset < 0 || offset > (1 << 28))
        goto fail;
    if (p + offset > end)
        goto fail;

    /* assume that offset is in bounds from here on */

RPM uses this quite a bit.

Sincerely,

Demi


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

OpenPGP_signature (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Detecting undefined pointer arithmetic

Manas via cfe-dev
In reply to this post by Manas via cfe-dev
On 1/14/21 7:52 PM, Johannes Doerfert via cfe-dev wrote:
> We would/should not exploit UB in such a case, at least not in the shown
> example. The pointer computation might yield `poison` but that is it.
> If you'd use the pointer afterwards, the situation is different though.
>
> ~ Johannes

The problem is the following pattern in C:

```
#include <stdlib.h>
#include <stdint.h>
#include <arpa/inet.h>
#include <string.h>
#include <stdbool.h>

/* Struct for parsing results */
typedef struct some_struct some_struct;
/* Stream of untrusted data */
typedef struct untrusted_stream untrusted_stream;

/* Parse untrusted data into `output`.  Returns false on error */
bool parse(untrusted_stream *stream, some_struct *output);

/*
 * Read `len` bytes from `stream` into `ptr`.  Returns false if it cannot read
 * the full amount.
 */
bool read_data(untrusted_stream *stream, void *ptr, uint32_t len);

/* Maximum input length */
static const uint32_t MAX_INPUT = 1024 * 1024;

static bool extract_be_u32(const uint8_t *ptr, const uint8_t *end, uint32_t *out);

/**
 * Parse some untrusted data of length `len`.  We are guaranteed that `ptr` points
 * to `len` bytes of data.
 * @param ptr Pointer to `len` bytes of untrusted data.
 * @param len The length of the untrusted data.
 */
bool process_untrusted_input(const uint8_t *ptr, const size_t len, some_struct *bar) {
   const uint8_t *end = ptr + len; /* end is now one-past-the-end */
   uint32_t foo;
   /* some parsing code */
   if (!extract_be_u32(ptr, end, &foo))
      return false;
   foo = ntohl(foo);
   /* more parsing code follows that uses foo */
   return true;
}

bool extract_be_u32(const uint8_t *ptr, const uint8_t *end, uint32_t *out) {
   uint32_t res;
   /* this is the bug: it should be `end - ptr < sizeof res` */
   if (ptr + sizeof res > end)
      return false;
   memcpy(&res, ptr, sizeof res);
   *out = ntohl(res);
   return true;
}

/* Parse some data from `stream` into `output`.  The data is not trusted. */
bool parse(untrusted_stream *stream, some_struct *output) {
   uint32_t len;
   if (!read_data(stream, &len, sizeof len))
      return false;
   len = ntohl(len);
   /* Avoid DoS attacks */
   if (len > MAX_INPUT)
      return false;
   uint8_t *ptr = malloc(len);
   if (!ptr || !read_data(stream, ptr, len))
      return false;
   bool res = process_untrusted_input(ptr, len, output);
   free(ptr);
   return res;
}
```

This code has undefined behavior if `process_untrusted_input` is passed less
than 4 bytes of input, which an attacker can often arrange to happen.  However,
I know of no way to detect this, or even test for it.  Sanitizers and valgrind
won’t catch it, since the out-of-bounds pointer is not dereferenced.  And the
amount by which the pointer is out of bounds is bounded, so it won’t wrap
in practice.  And if Clang optimized away the bounds check, the result is a
security vulnerability.

This isn’t just theoretical; I have seen similar bugs in the wild.
Fortunately, I haven’t seen any cases of them being miscompiled.

Will Clang miscompile code like this?  Even though it is undefined behavior,
it happens often enough that I would prefer if Clang made it defined as
an extension.  Dereferencing the out-of-bounds pointer would still be UB, of
course.  I believe there is precedent for this, in that type-punning via unions
is undefined behavior in C++ but (to my knowledge) works on both GCC and Clang.
Also, GCC supports `-fwrapv-pointer`.

Sincerely,

Demi


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

OpenPGP_signature (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Detecting undefined pointer arithmetic

Manas via cfe-dev
Hi All,

There are two instrumentations in the ASan that are related to the subject:
- pointer-subtract,
- pointer-compare.

They check if a pointer arithmetic operation of a given kind (subtract or compare) is not UB. The pointers must be from the same memory buffer allocation, including the one-element-after end pointer.

> The option must be combined with [...] -fsanitize=address [...] By default the check is disabled at run time. To enable it, add detect_invalid_pointer_pairs=2 to the environment variable ASAN_OPTIONS. Using detect_invalid_pointer_pairs=1 detects invalid operation only when both pointers are non-null.


These were good news. Bad news are:
- The compiler flags and runtime options "work" with clang but these checks are not officially supported. E.g. see https://bugs.llvm.org/show_bug.cgi?id=47626.
- Although these are supported by GCC, I have encountered multiple issues with them in the latest GCC  10. However, most (if not all) issues have been resolved and should be shipped with GCC 10.3 or GCC 11.
  - instrumentation may break constexpr evaluation: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97145
  - these checks report valid issues in std::vector and std::stringbuf implementations in libstdc++: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97659, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97415
  - libasan crashes when the detect_stack_use_after_return runtime check is also enabled: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97414

Despite the above issues, these checks have found some bugs in my projects. I consider them useful if you develop in UB-free pedantic mode.

Have fun,
Paweł Bylica


On Sat, Mar 6, 2021 at 10:56 PM Demi M. Obenour via cfe-dev <[hidden email]> wrote:
On 1/14/21 7:52 PM, Johannes Doerfert via cfe-dev wrote:
> We would/should not exploit UB in such a case, at least not in the shown
> example. The pointer computation might yield `poison` but that is it.
> If you'd use the pointer afterwards, the situation is different though.
>
> ~ Johannes

The problem is the following pattern in C:

```
#include <stdlib.h>
#include <stdint.h>
#include <arpa/inet.h>
#include <string.h>
#include <stdbool.h>

/* Struct for parsing results */
typedef struct some_struct some_struct;
/* Stream of untrusted data */
typedef struct untrusted_stream untrusted_stream;

/* Parse untrusted data into `output`.  Returns false on error */
bool parse(untrusted_stream *stream, some_struct *output);

/*
 * Read `len` bytes from `stream` into `ptr`.  Returns false if it cannot read
 * the full amount.
 */
bool read_data(untrusted_stream *stream, void *ptr, uint32_t len);

/* Maximum input length */
static const uint32_t MAX_INPUT = 1024 * 1024;

static bool extract_be_u32(const uint8_t *ptr, const uint8_t *end, uint32_t *out);

/**
 * Parse some untrusted data of length `len`.  We are guaranteed that `ptr` points
 * to `len` bytes of data.
 * @param ptr Pointer to `len` bytes of untrusted data.
 * @param len The length of the untrusted data.
 */
bool process_untrusted_input(const uint8_t *ptr, const size_t len, some_struct *bar) {
   const uint8_t *end = ptr + len; /* end is now one-past-the-end */
   uint32_t foo;
   /* some parsing code */
   if (!extract_be_u32(ptr, end, &foo))
      return false;
   foo = ntohl(foo);
   /* more parsing code follows that uses foo */
   return true;
}

bool extract_be_u32(const uint8_t *ptr, const uint8_t *end, uint32_t *out) {
   uint32_t res;
   /* this is the bug: it should be `end - ptr < sizeof res` */
   if (ptr + sizeof res > end)
      return false;
   memcpy(&res, ptr, sizeof res);
   *out = ntohl(res);
   return true;
}

/* Parse some data from `stream` into `output`.  The data is not trusted. */
bool parse(untrusted_stream *stream, some_struct *output) {
   uint32_t len;
   if (!read_data(stream, &len, sizeof len))
      return false;
   len = ntohl(len);
   /* Avoid DoS attacks */
   if (len > MAX_INPUT)
      return false;
   uint8_t *ptr = malloc(len);
   if (!ptr || !read_data(stream, ptr, len))
      return false;
   bool res = process_untrusted_input(ptr, len, output);
   free(ptr);
   return res;
}
```

This code has undefined behavior if `process_untrusted_input` is passed less
than 4 bytes of input, which an attacker can often arrange to happen.  However,
I know of no way to detect this, or even test for it.  Sanitizers and valgrind
won’t catch it, since the out-of-bounds pointer is not dereferenced.  And the
amount by which the pointer is out of bounds is bounded, so it won’t wrap
in practice.  And if Clang optimized away the bounds check, the result is a
security vulnerability.

This isn’t just theoretical; I have seen similar bugs in the wild.
Fortunately, I haven’t seen any cases of them being miscompiled.

Will Clang miscompile code like this?  Even though it is undefined behavior,
it happens often enough that I would prefer if Clang made it defined as
an extension.  Dereferencing the out-of-bounds pointer would still be UB, of
course.  I believe there is precedent for this, in that type-punning via unions
is undefined behavior in C++ but (to my knowledge) works on both GCC and Clang.
Also, GCC supports `-fwrapv-pointer`.

Sincerely,

Demi

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev