Repeated clang-format'ting keeps changing code / use of clang-format in CI

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

Repeated clang-format'ting keeps changing code / use of clang-format in CI

Kristóf Umann via cfe-dev
Hi!


I'm trying to integrate clang-format version 9 with CI in a way that the
CI run only passes if applying clang-format yields the exact same
code, i.e. "git diff --exit-code" returns 0.  The idea is that every
pull request would only pass if clang-format

When trying to put that approach to action with libexpat [1]
I noticed that multiple runs to clang-format do not seem to produce the
same code, at least not with the two version of Clang 9 that I tested
[2], and at least not with libexpat code.

I wonder if that's a known problem, if it's fixed in later versions
of clang-format, if you are aware of workarounds, or if I just need
to say goodbye to combining clang-format and CI for stable style checking.

Thanks in advance!

Best



Sebastian


[1] https://github.com/libexpat/libexpat/pull/293
[2] 9.0.0.9999 commit 28c954cf961a846789a972a8ed179b7108244ae7
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Repeated clang-format'ting keeps changing code / use of clang-format in CI

Kristóf Umann via cfe-dev
Hello again,


a quick update for anyone who wondered the same and ran into my previous
e-mail:

For a workaround, troublesome code can be wrapped by…

  // clang-format off
  [..]
  // clang-format on

…for C++ or…

  /* clang-format off */
  [..]
  /* clang-format on */

…for C to disable re-formatting of that section of code.  That way,
clang-format produces stable results and becomes suitable for during CI.

Best



Sebastian
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Repeated clang-format'ting keeps changing code / use of clang-format in CI

Kristóf Umann via cfe-dev
In general, I would expect any instability to be a bug - no doubt there are bugs, but if you can reduce the example to something easy to communicate in a bug report, etc, and file it, I expect some folks would be interested in fixing it.

On Fri, Jul 26, 2019 at 1:04 PM Sebastian Pipping via cfe-dev <[hidden email]> wrote:
Hello again,


a quick update for anyone who wondered the same and ran into my previous
e-mail:

For a workaround, troublesome code can be wrapped by…

  // clang-format off
  [..]
  // clang-format on

…for C++ or…

  /* clang-format off */
  [..]
  /* clang-format on */

…for C to disable re-formatting of that section of code.  That way,
clang-format produces stable results and becomes suitable for during CI.

Best



Sebastian
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Repeated clang-format'ting keeps changing code / use of clang-format in CI

Kristóf Umann via cfe-dev
In reply to this post by Kristóf Umann via cfe-dev
clang-cl is supposed to reach a fixpoint in one iteration. If it doesn't, that's a bug. If you can, please post a reduced repro case :)

On Thu, Jul 25, 2019 at 5:38 PM Sebastian Pipping via cfe-dev <[hidden email]> wrote:
Hi!


I'm trying to integrate clang-format version 9 with CI in a way that the
CI run only passes if applying clang-format yields the exact same
code, i.e. "git diff --exit-code" returns 0.  The idea is that every
pull request would only pass if clang-format

When trying to put that approach to action with libexpat [1]
I noticed that multiple runs to clang-format do not seem to produce the
same code, at least not with the two version of Clang 9 that I tested
[2], and at least not with libexpat code.

I wonder if that's a known problem, if it's fixed in later versions
of clang-format, if you are aware of workarounds, or if I just need
to say goodbye to combining clang-format and CI for stable style checking.

Thanks in advance!

Best



Sebastian


[1] https://github.com/libexpat/libexpat/pull/293
[2] 9.0.0.9999 commit 28c954cf961a846789a972a8ed179b7108244ae7
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Repeated clang-format'ting keeps changing code / use of clang-format in CI

Kristóf Umann via cfe-dev
Hi Nico,


that's great to hear.  I have attached a script to reproduce the issue
from public code of libexpat 2.2.7 as well as the second iteration diff
I get from clang-format version 10.0.0
(/var/tmp/portage/sys-devel/clang-10.0.0.9999/work/x/y/clang-10.0.0.9999
c0048be7ff340ebba3092e95d82147bc9928b909) of Gentoo.

Best



Sebastian


On 26.07.19 22:07, Nico Weber wrote:

> clang-cl is supposed to reach a fixpoint in one iteration. If it
> doesn't, that's a bug. If you can, please post a reduced repro case :)
>
> On Thu, Jul 25, 2019 at 5:38 PM Sebastian Pipping via cfe-dev
> <[hidden email] <mailto:[hidden email]>> wrote:
>
>     Hi!
>
>
>     I'm trying to integrate clang-format version 9 with CI in a way that the
>     CI run only passes if applying clang-format yields the exact same
>     code, i.e. "git diff --exit-code" returns 0.  The idea is that every
>     pull request would only pass if clang-format
>
>     When trying to put that approach to action with libexpat [1]
>     I noticed that multiple runs to clang-format do not seem to produce the
>     same code, at least not with the two version of Clang 9 that I tested
>     [2], and at least not with libexpat code.
>
>     I wonder if that's a known problem, if it's fixed in later versions
>     of clang-format, if you are aware of workarounds, or if I just need
>     to say goodbye to combining clang-format and CI for stable style
>     checking.
>
>     Thanks in advance!
>
>     Best
>
>
>
>     Sebastian
>
>
>     [1] https://github.com/libexpat/libexpat/pull/293
>     [2] 9.0.0.9999 commit 28c954cf961a846789a972a8ed179b7108244ae7
>     _______________________________________________
>     cfe-dev mailing list
>     [hidden email] <mailto:[hidden email]>
>     https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

clang-format-fixpoint-issue-demo.sh.txt (586 bytes) Download Attachment
expat-2-2-7-clang-format-second-round.diff (25K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Repeated clang-format'ting keeps changing code / use of clang-format in CI

Kristóf Umann via cfe-dev
Hi, this is the smallest reproducer I could find with the default clang-format configuration.

$ clang-format --version
TRC clang-format version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0)
$ clang-format test.c > test.c.1 && clang-format test.c.1 > test.c.2 && diff test.c.1 test.c.2
1,4c1,5
< const char *expected = XCS("ABCDEFGHIJKLMNOPABCDEFGHIJKLMNOPABCDEFGHIJKLMNOPAB"
<                            "C") XCS("ABCDEFGHIJKLMNOPABCDEFGHIJKLMNOPABCDEFGHIJ"
<                                     "KLMNOPABC") XCS(
<     "ABCDEFGHIJKLMNOPABCDEFGHIJKLMNOPABCDEFGHIJKLMNOPABC")
---
> const char *expected = XCS(
>     "ABCDEFGHIJKLMNOPABCDEFGHIJKLMNOPABCDEFGHIJKLMNOPAB"
>     "C") XCS("ABCDEFGHIJKLMNOPABCDEFGHIJKLMNOPABCDEFGHIJ"
>              "KLMNOPABC") XCS("ABCDEFGHIJKLMNOPABCDEFGHIJKLMNOPABCDEFGHIJKLMNOP"
>                               "ABC")


--
Raphaël Londeix

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

test.c (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Repeated clang-format'ting keeps changing code / use of clang-format in CI

Kristóf Umann via cfe-dev
In reply to this post by Kristóf Umann via cfe-dev
Thanks!

Here's a (not minimal) reduction of the first issue; it looks like the other issues are basically the same thing:

$ cat fmt.c
void cdataSectionTok()
{
  switch (1) {
#define LEAD_CASE(n) \
  case BT_LEAD ## n: \
    break
  LEAD_CASE(2) LEAD_CASE(3) LEAD_CASE(4)
#undef LEAD_CASE
  default:
    break;
  }
}

$ out/gn/bin/clang-format fmt.c
void cdataSectionTok() {
  switch (1) {
#define LEAD_CASE(n)                                                           \
  case BT_LEAD##n:                                                             \
    break
    LEAD_CASE(2)
    LEAD_CASE(3) LEAD_CASE(4)
#undef LEAD_CASE
        default : break;
  }
}

As you can see by the weird "default : break" at the end, clang-format gets confused about the whole switch statement. It looks like the macros confuse it. Just adding semicolons after it is sufficient to unconfuse it:

$ cat fmt.c
void cdataSectionTok()
{
  switch (1) {
#define LEAD_CASE(n) \
  case BT_LEAD ## n: \
    break
  LEAD_CASE(2); LEAD_CASE(3); LEAD_CASE(4);
#undef LEAD_CASE
  default:
    break;
  }
}

$ out/gn/bin/clang-format fmt.c
void cdataSectionTok() {
  switch (1) {
#define LEAD_CASE(n)                                                           \
  case BT_LEAD##n:                                                             \
    break
    LEAD_CASE(2);
    LEAD_CASE(3);
    LEAD_CASE(4);
#undef LEAD_CASE
  default:
    break;
  }
}


If you're able to change expat's source, this might be a good approach.

Else, you can explicitly tell clang-format the name of macros that should be handled as statements in your .clang-format file like so:

StatementMacros: ['LEAD_CASE']

That helps with LEAD_CASE. (clang-format should still converge within one round, but it's going to converge to something bad without either of the two things I suggest, so it wouldn't help you much).

Like Raphaël's mail says, XCS is slightly different because it's not a statement. clang-format has special logic for formatting sequences of string literals, and that doesn't fire if each literal is surrounded by a macro.

You can fix this by doing `XCS("foo" "bar")` (with a linebreak in between foo and bar) instead of `XCS("foo") XCS("bar")`. Semantically they do the same thing as far as I can tell, but clang-format does much better with the first form. Having done this transformation locally, this is arguably easier to read for humans too.

There isn't a .clang-format setting for this case as far as I know.

(Again, clang-format should reach a fixed point in one hop, but again the one hop gives you pretty bad formatting so fixing that bug wouldn't help you all that much.)

With these two changes, clang-format does reach an end state in one step, and its output (for the two cases I looked at) looks good too.

Hope this helps!

On Fri, Jul 26, 2019 at 5:03 PM Sebastian Pipping <[hidden email]> wrote:
Hi Nico,


that's great to hear.  I have attached a script to reproduce the issue
from public code of libexpat 2.2.7 as well as the second iteration diff
I get from clang-format version 10.0.0
(/var/tmp/portage/sys-devel/clang-10.0.0.9999/work/x/y/clang-10.0.0.9999
c0048be7ff340ebba3092e95d82147bc9928b909) of Gentoo.

Best



Sebastian


On 26.07.19 22:07, Nico Weber wrote:
> clang-cl is supposed to reach a fixpoint in one iteration. If it
> doesn't, that's a bug. If you can, please post a reduced repro case :)
>
> On Thu, Jul 25, 2019 at 5:38 PM Sebastian Pipping via cfe-dev
> <[hidden email] <mailto:[hidden email]>> wrote:
>
>     Hi!
>
>
>     I'm trying to integrate clang-format version 9 with CI in a way that the
>     CI run only passes if applying clang-format yields the exact same
>     code, i.e. "git diff --exit-code" returns 0.  The idea is that every
>     pull request would only pass if clang-format
>
>     When trying to put that approach to action with libexpat [1]
>     I noticed that multiple runs to clang-format do not seem to produce the
>     same code, at least not with the two version of Clang 9 that I tested
>     [2], and at least not with libexpat code.
>
>     I wonder if that's a known problem, if it's fixed in later versions
>     of clang-format, if you are aware of workarounds, or if I just need
>     to say goodbye to combining clang-format and CI for stable style
>     checking.
>
>     Thanks in advance!
>
>     Best
>
>
>
>     Sebastian
>
>
>     [1] https://github.com/libexpat/libexpat/pull/293
>     [2] 9.0.0.9999 commit 28c954cf961a846789a972a8ed179b7108244ae7
>     _______________________________________________
>     cfe-dev mailing list
>     [hidden email] <mailto:[hidden email]>
>     https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>


_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Repeated clang-format'ting keeps changing code / use of clang-format in CI

Kristóf Umann via cfe-dev
Hello Nico,


On 29.07.19 15:16, Nico Weber wrote:
> As you can see by the weird "default : break" at the end, clang-format
> gets confused about the whole switch statement. It looks like the macros
> confuse it. Just adding semicolons after it is sufficient to unconfuse it:
>
> [..]
>   LEAD_CASE(2); LEAD_CASE(3); LEAD_CASE(4);
> [..]
>
> If you're able to change expat's source, this might be a good approach.

interesting idea!


> Else, you can explicitly tell clang-format the name of macros that
> should be handled as statements in your .clang-format file like so:
>
> StatementMacros: ['LEAD_CASE']

Good to know!


> You can fix this by doing `XCS("foo" "bar")` (with a linebreak in
> between foo and bar) instead of `XCS("foo") XCS("bar")`. Semantically
> they do the same thing as far as I can tell, but clang-format does much
> better with the first form.

Things get interesting here because XCS is either

  # define XCS(s) _XCS(s)
  # define _XCS(s) L ## s

or

  # define XCS(s) s

depending of macro XML_UNICODE_WCHAR_T to turn string literals into
wide or narrow string literals.  For the first version, XCS("foo" "bar")
results in invalid C syntax, mixing wide L"foo" with narrow "bar".  As a
result, I needed to turn off BreakStringLiterals altogether for now.

Best



Sebastian
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Repeated clang-format'ting keeps changing code / use of clang-format in CI

Kristóf Umann via cfe-dev
In reply to this post by Kristóf Umann via cfe-dev
Hello!


I have found existing bug "Inconsistent output when running clang-format
twice" [1] now and attached the instructions to my original case to it.

Best



Sebastian


[1] https://bugs.llvm.org/show_bug.cgi?id=42509

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Repeated clang-format'ting keeps changing code / use of clang-format in CI

Kristóf Umann via cfe-dev
In reply to this post by Kristóf Umann via cfe-dev
On Sat, Aug 3, 2019 at 5:55 PM Sebastian Pipping <[hidden email]> wrote:
Hello Nico,


On 29.07.19 15:16, Nico Weber wrote:
> As you can see by the weird "default : break" at the end, clang-format
> gets confused about the whole switch statement. It looks like the macros
> confuse it. Just adding semicolons after it is sufficient to unconfuse it:
>
> [..]
>   LEAD_CASE(2); LEAD_CASE(3); LEAD_CASE(4);
> [..]
>
> If you're able to change expat's source, this might be a good approach.

interesting idea!


> Else, you can explicitly tell clang-format the name of macros that
> should be handled as statements in your .clang-format file like so:
>
> StatementMacros: ['LEAD_CASE']

Good to know!


> You can fix this by doing `XCS("foo" "bar")` (with a linebreak in
> between foo and bar) instead of `XCS("foo") XCS("bar")`. Semantically
> they do the same thing as far as I can tell, but clang-format does much
> better with the first form.

Things get interesting here because XCS is either

  # define XCS(s) _XCS(s)
  # define _XCS(s) L ## s

or

  # define XCS(s) s

depending of macro XML_UNICODE_WCHAR_T to turn string literals into
wide or narrow string literals.  For the first version, XCS("foo" "bar")
results in invalid C syntax, mixing wide L"foo" with narrow "bar".  As a
result, I needed to turn off BreakStringLiterals altogether for now.

FWIW, C++11 added "If one string-literal has no encoding-prefix, it is treated as a string-literal of the same encoding-prefix as the other operand." to [lex.string]p13, so in C++11 `L"foo" "bar"` is the same as `L"foo" L"bar"`. But granted, it's implementation-defined in C++ before C++11.

The C standard says "If any of the tokens are wide string literal tokens, the resulting multibyte character sequence is treated as a wide string literal; otherwise, it is treated as a character string literal." in 6.4.5 String Literals p4. C89 still said "If a character string literal token is adjacent to a wide string literal token, the behavior is undefined", I think this changed in C99.

So if you need to support compilers that don't implement the C99 behavior (which you likely do need to do :) ), then you're right.

We should probably have a bug for making clang-format handle string literals surrounded by a macro call without semicolons that aren't in StatementMacros the same as bare string literals.
 

Best



Sebastian

_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev