RFC: proper string handling in CXString

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

RFC: proper string handling in CXString

Louis Dionne via cfe-dev
I opened this issue: https://bugs.llvm.org/show_bug.cgi?id=35896

MSAN tests fail because CXString is trying to reuse string data from
LLVM::StringRef.  However the latter doesn't guarantee that strings
are NUL-terminated, but CXStrings need to be.  Calling
`createRef(llvm::StringRef)` then selectively reuses (if there's a
zero terminator) or copies (otherwise) the bytes referenced in
StringRef.

To do this, CXString needs to possibly look past the end of the string
allocation, e.g. `if (str[str.size()] != 0)`.  If the allocation of
the string bytes ends at &str[str.size()-1], then MSAN fails because
of the out-of-bounds read of those data.

So, I was hoping to get some feedback or ideas on how to fix it.  I'm
trying to make LLVM and Clang MSAN-clean, as I'm trying to debug other
issues and preexisting problems are getting in the way, and I figure
that fixing these issues, however minor some of them are, are a good
thing anyway. :)

Some ideas:

(1) Always copy StringRef data.  Let `createRef(StringRef)` simply
jump to `createDup(StringRef)` instead, or codemod the former away.
Safe but involves more "memcpy"ing.
(2) Let CXString have only data and size fields, without worrying
about internal representation.  Then when using `clang_getCString`, do
the allocation + string copy + extra NUL byte.  Still does a "memcpy",
but only on strings the caller truly needs as C strings.
(3) My least favorite option, selectively ignore this particular MSAN
violation in some blacklist, and get on with life.

Other ideas?

Thanks!

--steveo
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: RFC: proper string handling in CXString

Louis Dionne via cfe-dev

> On Jan 11, 2018, at 10:16 AM, Steve O'Brien via cfe-dev <[hidden email]> wrote:
>
> I opened this issue: https://bugs.llvm.org/show_bug.cgi?id=35896
>
> MSAN tests fail because CXString is trying to reuse string data from
> LLVM::StringRef.  However the latter doesn't guarantee that strings
> are NUL-terminated, but CXStrings need to be.  Calling
> `createRef(llvm::StringRef)` then selectively reuses (if there's a
> zero terminator) or copies (otherwise) the bytes referenced in
> StringRef.
>
> To do this, CXString needs to possibly look past the end of the string
> allocation, e.g. `if (str[str.size()] != 0)`.  If the allocation of
> the string bytes ends at &str[str.size()-1], then MSAN fails because
> of the out-of-bounds read of those data.
>
> So, I was hoping to get some feedback or ideas on how to fix it.  I'm
> trying to make LLVM and Clang MSAN-clean, as I'm trying to debug other
> issues and preexisting problems are getting in the way, and I figure
> that fixing these issues, however minor some of them are, are a good
> thing anyway. :)
>
> Some ideas:
>
> (1) Always copy StringRef data.  Let `createRef(StringRef)` simply
> jump to `createDup(StringRef)` instead, or codemod the former away.
> Safe but involves more "memcpy"ing.
> (2) Let CXString have only data and size fields, without worrying
> about internal representation.  Then when using `clang_getCString`, do
> the allocation + string copy + extra NUL byte.  Still does a "memcpy",
> but only on strings the caller truly needs as C strings.

Of these options, I like (2) the most, provided there's some way to access the contents of a CXString without calling clang_getCString() on it (for performance-sensitive clients).

> (3) My least favorite option, selectively ignore this particular MSAN
> violation in some blacklist, and get on with life.

IMO this is a serious issue. E.g it seems plausible that a thread could store a non-zero value into the null-terminator byte after the CXString is constructed.

best,
vedant

>
> Other ideas?
>
> Thanks!
>
> --steveo
> _______________________________________________
> cfe-dev mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev