C++ algorithm analysis tool

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

C++ algorithm analysis tool

Richard Pennington via cfe-dev
Hey everyone,

Just wondering if there's a clang tool that can analyse an algorithm to suggest where standard algorithms can be replace handwritten algos?
E.g.

int i = 0;
for (; i < v.size(); ++i)
   if (v[i] == expected)
      break;
if (i != v.size())
   some_function(v[i]);

Can be rewritten to

auto i = find(v.begin(), v.end(), expected);
if (i != v.end())
   some_function(*i);

or in C++17:

if (auto i = find(v.begin(), v.end(), expected); i != v.end())
   some_function(*i);

If not, how difficult a task is it to write such a tool? Is there anything that one should take into special consideration while writing this tool? Do you think it would have a lot of use-cases? (I do, based on my company's code base, and code I have seen while marking assignments).

Cheers,

Chris

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: C++ algorithm analysis tool

Richard Pennington via cfe-dev

Hi Chris,

To my knowledge, there isn't.

I don't recall where I got the idea, but I gave it a try last summer trying to implement a clang-tidy check doing what you proposed. I didn't have enough time to complete it, though, and I only managed to detect one or two very simple patterns.

After thinking about this idea for some time I found that clang-tidy might be a perfect place for that, not sure whether a separate tool would be beneficial. The task of detecting a specific pattern is very similar to what clang-tidy checks do in a wide range of tasks. Also, there'd be a separate heuristic set for each standard algorithm, which makes the partitioning into different checks (for each popular standard library algorithm) natural.

In my opinion, such checks would be useful, I'd be interested in seeing a proof-of-concept of some sort.

One more idea I have in mind: it might be interesting to try using CloneChecker (a check of Clang Static Analyzer) to detect similar patterns in a generic way, but I'm not sure how beneficial that would be in practice. Still, might worth a try.

+CC Alex, he might have some thoughts about this.

Kind regards,
Kirill


On 01/04/17 02:42, Christopher Di Bella via cfe-dev wrote:
Hey everyone,

Just wondering if there's a clang tool that can analyse an algorithm to suggest where standard algorithms can be replace handwritten algos?
E.g.

int i = 0;
for (; i < v.size(); ++i)
   if (v[i] == expected)
      break;
if (i != v.size())
   some_function(v[i]);

Can be rewritten to

auto i = find(v.begin(), v.end(), expected);
if (i != v.end())
   some_function(*i);

or in C++17:

if (auto i = find(v.begin(), v.end(), expected); i != v.end())
   some_function(*i);

If not, how difficult a task is it to write such a tool? Is there anything that one should take into special consideration while writing this tool? Do you think it would have a lot of use-cases? (I do, based on my company's code base, and code I have seen while marking assignments).

Cheers,

Chris


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: C++ algorithm analysis tool

Richard Pennington via cfe-dev
In reply to this post by Richard Pennington via cfe-dev
Hmm, missed this one. Adding to Kirill's answer.

Yep, clang-tidy seems to be the the right place for such check. They
already have the `modernize' package which suggests using tricks from
higher versions of the C++ standard. They support fix-it hints to
suggest how to rewrite the highlighted code. They are also eagerly
accepting checks that are not highlighting bugs, unlike Clang Static
Analyzer, where, at least for now, the reputation requires to only
include critical warnings that most likely require immediate attention.
And your check doesn't seem to require symbolic execution, so relying on
the Analyzer's path-sensitive engine is not necessary.

You should be able to write particular checks easily using ASTMatchers;
with this awesome mechanism, the code for matching a particular pattern
wouldn't be much bigger than the pattern itself, and you'd be able to
re-use sub-patterns.



On 4/1/17 2:42 AM, Christopher Di Bella via cfe-dev wrote:

> Hey everyone,
>
> Just wondering if there's a clang tool that can analyse an algorithm
> to suggest where standard algorithms can be replace handwritten algos?
> E.g.
>
> int i = 0;
> for (; i < v.size(); ++i)
>    if (v[i] == expected)
>       break;
> if (i != v.size())
>    some_function(v[i]);
>
> Can be rewritten to
>
> auto i = find(v.begin(), v.end(), expected);
> if (i != v.end())
>    some_function(*i);
>
> or in C++17:
>
> if (auto i = find(v.begin(), v.end(), expected); i != v.end())
>    some_function(*i);
>
> If not, how difficult a task is it to write such a tool? Is there
> anything that one should take into special consideration while writing
> this tool? Do you think it would have a lot of use-cases? (I do, based
> on my company's code base, and code I have seen while marking
> assignments).
>
> Cheers,
>
> Chris
>
>
> _______________________________________________
> cfe-dev mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: C++ algorithm analysis tool

Richard Pennington via cfe-dev
In reply to this post by Richard Pennington via cfe-dev
+CC CloneDetector guys.

Hmm, the idea of making body-farms and then using CloneChecker to find
clones of synthesized bodies in the actual code looks curious and funny,
though i'm not immediately seeing how is it superior to ASTMatchers.


On 4/1/17 1:49 PM, Kirill Bobyrev via cfe-dev wrote:

>
> Hi Chris,
>
> To my knowledge, there isn't.
>
> I don't recall where I got the idea, but I gave it a try last summer
> trying to implement a clang-tidy check doing what you proposed. I
> didn't have enough time to complete it, though, and I only managed to
> detect one or two very simple patterns.
>
> After thinking about this idea for some time I found that clang-tidy
> might be a perfect place for that, not sure whether a separate tool
> would be beneficial. The task of detecting a specific pattern is very
> similar to what clang-tidy checks do in a wide range of tasks. Also,
> there'd be a separate heuristic set for each standard algorithm, which
> makes the partitioning into different checks (for each popular
> standard library algorithm) natural.
>
> In my opinion, such checks would be useful, I'd be interested in
> seeing a proof-of-concept of some sort.
>
> One more idea I have in mind: it might be interesting to try using
> CloneChecker (a check of Clang Static Analyzer) to detect similar
> patterns in a generic way, but I'm not sure how beneficial that would
> be in practice. Still, might worth a try.
>
> +CC Alex, he might have some thoughts about this.
>
> Kind regards,
> Kirill
>
>
> On 01/04/17 02:42, Christopher Di Bella via cfe-dev wrote:
>> Hey everyone,
>>
>> Just wondering if there's a clang tool that can analyse an algorithm
>> to suggest where standard algorithms can be replace handwritten algos?
>> E.g.
>>
>> int i = 0;
>> for (; i < v.size(); ++i)
>>    if (v[i] == expected)
>>       break;
>> if (i != v.size())
>>    some_function(v[i]);
>>
>> Can be rewritten to
>>
>> auto i = find(v.begin(), v.end(), expected);
>> if (i != v.end())
>>    some_function(*i);
>>
>> or in C++17:
>>
>> if (auto i = find(v.begin(), v.end(), expected); i != v.end())
>>    some_function(*i);
>>
>> If not, how difficult a task is it to write such a tool? Is there
>> anything that one should take into special consideration while
>> writing this tool? Do you think it would have a lot of use-cases? (I
>> do, based on my company's code base, and code I have seen while
>> marking assignments).
>>
>> Cheers,
>>
>> Chris
>>
>>
>> _______________________________________________
>> cfe-dev mailing list
>> [hidden email]
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
>
>
> _______________________________________________
> cfe-dev mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: C++ algorithm analysis tool

Richard Pennington via cfe-dev
I feel the implementation of an algorithm in the STL and the
equivalent implementation written by the user are usually so different
in terms of syntax that the current clone detection won't work that
well. If we had a constraint for estimating semantic equality, then it
would be a different situation, but I don't expect an implementation
of this anytime soon :).

However, if we are satisfied with syntactic equality, then this can be
done in a few minutes with the clone detector infrastructure with
something like this pseudocode:

   detector.findClones(RecursiveTypeIIConstraint(),
MinFunctionBodyCount(1), MinComplexity(20));

This would detect code like this:

void bubblesort(int *array, int length) {
     for (int i = 0; i < length - 1; ++i)
        for (int j = 0; j < length - i - 1; ++j)
            if (array[j] > array[j + 1]) {
                int tmp = array[j];
                array[j] = array[j + 1];
                array[j + 1] = tmp;
            }
}

int main() {
     for (int i = 0; i < length - 1; ++i) // expect-warning{You could
call the function 'bubblesort' instead
        for (int j = 0; j < length - i - 1; ++j)
            if (array[j] > array[j + 1]) {
                int tmp = array[j];
                array[j] = array[j + 1];
                array[j + 1] = tmp;
            }
}

- Raphael

2017-04-04 10:35 GMT+02:00 Artem Dergachev <[hidden email]>:

> +CC CloneDetector guys.
>
> Hmm, the idea of making body-farms and then using CloneChecker to find
> clones of synthesized bodies in the actual code looks curious and funny,
> though i'm not immediately seeing how is it superior to ASTMatchers.
>
>
>
> On 4/1/17 1:49 PM, Kirill Bobyrev via cfe-dev wrote:
>>
>>
>> Hi Chris,
>>
>> To my knowledge, there isn't.
>>
>> I don't recall where I got the idea, but I gave it a try last summer
>> trying to implement a clang-tidy check doing what you proposed. I didn't
>> have enough time to complete it, though, and I only managed to detect one or
>> two very simple patterns.
>>
>> After thinking about this idea for some time I found that clang-tidy might
>> be a perfect place for that, not sure whether a separate tool would be
>> beneficial. The task of detecting a specific pattern is very similar to what
>> clang-tidy checks do in a wide range of tasks. Also, there'd be a separate
>> heuristic set for each standard algorithm, which makes the partitioning into
>> different checks (for each popular standard library algorithm) natural.
>>
>> In my opinion, such checks would be useful, I'd be interested in seeing a
>> proof-of-concept of some sort.
>>
>> One more idea I have in mind: it might be interesting to try using
>> CloneChecker (a check of Clang Static Analyzer) to detect similar patterns
>> in a generic way, but I'm not sure how beneficial that would be in practice.
>> Still, might worth a try.
>>
>> +CC Alex, he might have some thoughts about this.
>>
>> Kind regards,
>> Kirill
>>
>>
>> On 01/04/17 02:42, Christopher Di Bella via cfe-dev wrote:
>>>
>>> Hey everyone,
>>>
>>> Just wondering if there's a clang tool that can analyse an algorithm to
>>> suggest where standard algorithms can be replace handwritten algos?
>>> E.g.
>>>
>>> int i = 0;
>>> for (; i < v.size(); ++i)
>>>    if (v[i] == expected)
>>>       break;
>>> if (i != v.size())
>>>    some_function(v[i]);
>>>
>>> Can be rewritten to
>>>
>>> auto i = find(v.begin(), v.end(), expected);
>>> if (i != v.end())
>>>    some_function(*i);
>>>
>>> or in C++17:
>>>
>>> if (auto i = find(v.begin(), v.end(), expected); i != v.end())
>>>    some_function(*i);
>>>
>>> If not, how difficult a task is it to write such a tool? Is there
>>> anything that one should take into special consideration while writing this
>>> tool? Do you think it would have a lot of use-cases? (I do, based on my
>>> company's code base, and code I have seen while marking assignments).
>>>
>>> Cheers,
>>>
>>> Chris
>>>
>>>
>>> _______________________________________________
>>> cfe-dev mailing list
>>> [hidden email]
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>
>>
>>
>>
>> _______________________________________________
>> cfe-dev mailing list
>> [hidden email]
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
>
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: C++ algorithm analysis tool

Richard Pennington via cfe-dev
On 04/04/17 10:59, Raphael Isemann wrote:

> I feel the implementation of an algorithm in the STL and the
> equivalent implementation written by the user are usually so different
> in terms of syntax that the current clone detection won't work that
> well. If we had a constraint for estimating semantic equality, then it
> would be a different situation, but I don't expect an implementation
> of this anytime soon :).
>
> However, if we are satisfied with syntactic equality, then this can be
> done in a few minutes with the clone detector infrastructure with
> something like this pseudocode:
>
>     detector.findClones(RecursiveTypeIIConstraint(),
> MinFunctionBodyCount(1), MinComplexity(20));
>
> This would detect code like this:
>
> void bubblesort(int *array, int length) {
>       for (int i = 0; i < length - 1; ++i)
>          for (int j = 0; j < length - i - 1; ++j)
>              if (array[j] > array[j + 1]) {
>                  int tmp = array[j];
>                  array[j] = array[j + 1];
>                  array[j + 1] = tmp;
>              }
> }
>
> int main() {
>       for (int i = 0; i < length - 1; ++i) // expect-warning{You could
> call the function 'bubblesort' instead
>          for (int j = 0; j < length - i - 1; ++j)
>              if (array[j] > array[j + 1]) {
>                  int tmp = array[j];
>                  array[j] = array[j + 1];
>                  array[j + 1] = tmp;
>              }
> }
>
> - Raphael

   That being said, patches are welcome. The current infrastructure is
friendly for implementing semantic clone detection constraints (clones
type III). We'd be interested to help if necessary.

-- Vassil

>
> 2017-04-04 10:35 GMT+02:00 Artem Dergachev <[hidden email]>:
>> +CC CloneDetector guys.
>>
>> Hmm, the idea of making body-farms and then using CloneChecker to find
>> clones of synthesized bodies in the actual code looks curious and funny,
>> though i'm not immediately seeing how is it superior to ASTMatchers.
>>
>>
>>
>> On 4/1/17 1:49 PM, Kirill Bobyrev via cfe-dev wrote:
>>>
>>> Hi Chris,
>>>
>>> To my knowledge, there isn't.
>>>
>>> I don't recall where I got the idea, but I gave it a try last summer
>>> trying to implement a clang-tidy check doing what you proposed. I didn't
>>> have enough time to complete it, though, and I only managed to detect one or
>>> two very simple patterns.
>>>
>>> After thinking about this idea for some time I found that clang-tidy might
>>> be a perfect place for that, not sure whether a separate tool would be
>>> beneficial. The task of detecting a specific pattern is very similar to what
>>> clang-tidy checks do in a wide range of tasks. Also, there'd be a separate
>>> heuristic set for each standard algorithm, which makes the partitioning into
>>> different checks (for each popular standard library algorithm) natural.
>>>
>>> In my opinion, such checks would be useful, I'd be interested in seeing a
>>> proof-of-concept of some sort.
>>>
>>> One more idea I have in mind: it might be interesting to try using
>>> CloneChecker (a check of Clang Static Analyzer) to detect similar patterns
>>> in a generic way, but I'm not sure how beneficial that would be in practice.
>>> Still, might worth a try.
>>>
>>> +CC Alex, he might have some thoughts about this.
>>>
>>> Kind regards,
>>> Kirill
>>>
>>>
>>> On 01/04/17 02:42, Christopher Di Bella via cfe-dev wrote:
>>>> Hey everyone,
>>>>
>>>> Just wondering if there's a clang tool that can analyse an algorithm to
>>>> suggest where standard algorithms can be replace handwritten algos?
>>>> E.g.
>>>>
>>>> int i = 0;
>>>> for (; i < v.size(); ++i)
>>>>     if (v[i] == expected)
>>>>        break;
>>>> if (i != v.size())
>>>>     some_function(v[i]);
>>>>
>>>> Can be rewritten to
>>>>
>>>> auto i = find(v.begin(), v.end(), expected);
>>>> if (i != v.end())
>>>>     some_function(*i);
>>>>
>>>> or in C++17:
>>>>
>>>> if (auto i = find(v.begin(), v.end(), expected); i != v.end())
>>>>     some_function(*i);
>>>>
>>>> If not, how difficult a task is it to write such a tool? Is there
>>>> anything that one should take into special consideration while writing this
>>>> tool? Do you think it would have a lot of use-cases? (I do, based on my
>>>> company's code base, and code I have seen while marking assignments).
>>>>
>>>> Cheers,
>>>>
>>>> Chris
>>>>
>>>>
>>>> _______________________________________________
>>>> cfe-dev mailing list
>>>> [hidden email]
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>
>>>
>>>
>>> _______________________________________________
>>> cfe-dev mailing list
>>> [hidden email]
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: C++ algorithm analysis tool

Richard Pennington via cfe-dev
In reply to this post by Richard Pennington via cfe-dev
As Artem noted, there are already a few clang-tidy checks that try to suggest using more compact coding patterns like range-for-loopsmore suitable algorithms, shrink_to_fit(), etc. It should be equally possible to detect certain hand-rolled algorithms and replace them with STL. As for the amount of work it might take, you can take a look at the implementation of modernize-loop-convert, for example. If you would like to try this, you can start here.

On Tue, Apr 4, 2017 at 10:31 AM, Artem Dergachev via cfe-dev <[hidden email]> wrote:
Hmm, missed this one. Adding to Kirill's answer.

Yep, clang-tidy seems to be the the right place for such check. They already have the `modernize' package which suggests using tricks from higher versions of the C++ standard. They support fix-it hints to suggest how to rewrite the highlighted code. They are also eagerly accepting checks that are not highlighting bugs, unlike Clang Static Analyzer, where, at least for now, the reputation requires to only include critical warnings that most likely require immediate attention. And your check doesn't seem to require symbolic execution, so relying on the Analyzer's path-sensitive engine is not necessary.

You should be able to write particular checks easily using ASTMatchers; with this awesome mechanism, the code for matching a particular pattern wouldn't be much bigger than the pattern itself, and you'd be able to re-use sub-patterns.




On 4/1/17 2:42 AM, Christopher Di Bella via cfe-dev wrote:
Hey everyone,

Just wondering if there's a clang tool that can analyse an algorithm to suggest where standard algorithms can be replace handwritten algos?
E.g.

int i = 0;
for (; i < v.size(); ++i)
   if (v[i] == expected)
      break;
if (i != v.size())
   some_function(v[i]);

Can be rewritten to

auto i = find(v.begin(), v.end(), expected);
if (i != v.end())
   some_function(*i);

or in C++17:

if (auto i = find(v.begin(), v.end(), expected); i != v.end())
   some_function(*i);

If not, how difficult a task is it to write such a tool? Is there anything that one should take into special consideration while writing this tool? Do you think it would have a lot of use-cases? (I do, based on my company's code base, and code I have seen while marking assignments).

Cheers,

Chris


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: C++ algorithm analysis tool

Richard Pennington via cfe-dev
In reply to this post by Richard Pennington via cfe-dev
On second thought, maybe it's quite neat.

I mean, like, a person who doesn't know anything about AST and about
matchers and whatever, could probably do something like that

$ cat < pattern.c
void bubblesort(int *array, int length) {
       for (int i = 0; i < length - 1; ++i)
          for (int j = 0; j < length - i - 1; ++j)
              if (array[j] > array[j + 1]) {
                  int tmp = array[j];
                  array[j] = array[j + 1];
                  array[j + 1] = tmp;
              }
}
^D

and then

$ clang-finder --pattern pattern.c my_project.c

and the tool would find all clones of the pattern in the project, by
constructing the AST for the pattern and hashing it and looking for
clones. The pattern would be in a different AST context, but i guess for
the clone checker it shouldn't be a problem, so we wouldn't need to deal
with ASTImporter for that.

On 4/4/17 12:25 PM, Vassil Vassilev wrote:

> On 04/04/17 10:59, Raphael Isemann wrote:
>> I feel the implementation of an algorithm in the STL and the
>> equivalent implementation written by the user are usually so different
>> in terms of syntax that the current clone detection won't work that
>> well. If we had a constraint for estimating semantic equality, then it
>> would be a different situation, but I don't expect an implementation
>> of this anytime soon :).
>>
>> However, if we are satisfied with syntactic equality, then this can be
>> done in a few minutes with the clone detector infrastructure with
>> something like this pseudocode:
>>
>>     detector.findClones(RecursiveTypeIIConstraint(),
>> MinFunctionBodyCount(1), MinComplexity(20));
>>
>> This would detect code like this:
>>
>> void bubblesort(int *array, int length) {
>>       for (int i = 0; i < length - 1; ++i)
>>          for (int j = 0; j < length - i - 1; ++j)
>>              if (array[j] > array[j + 1]) {
>>                  int tmp = array[j];
>>                  array[j] = array[j + 1];
>>                  array[j + 1] = tmp;
>>              }
>> }
>>
>> int main() {
>>       for (int i = 0; i < length - 1; ++i) // expect-warning{You could
>> call the function 'bubblesort' instead
>>          for (int j = 0; j < length - i - 1; ++j)
>>              if (array[j] > array[j + 1]) {
>>                  int tmp = array[j];
>>                  array[j] = array[j + 1];
>>                  array[j + 1] = tmp;
>>              }
>> }
>>
>> - Raphael
>
>   That being said, patches are welcome. The current infrastructure is
> friendly for implementing semantic clone detection constraints (clones
> type III). We'd be interested to help if necessary.
>
> -- Vassil
>>
>> 2017-04-04 10:35 GMT+02:00 Artem Dergachev <[hidden email]>:
>>> +CC CloneDetector guys.
>>>
>>> Hmm, the idea of making body-farms and then using CloneChecker to find
>>> clones of synthesized bodies in the actual code looks curious and
>>> funny,
>>> though i'm not immediately seeing how is it superior to ASTMatchers.
>>>
>>>
>>>
>>> On 4/1/17 1:49 PM, Kirill Bobyrev via cfe-dev wrote:
>>>>
>>>> Hi Chris,
>>>>
>>>> To my knowledge, there isn't.
>>>>
>>>> I don't recall where I got the idea, but I gave it a try last summer
>>>> trying to implement a clang-tidy check doing what you proposed. I
>>>> didn't
>>>> have enough time to complete it, though, and I only managed to
>>>> detect one or
>>>> two very simple patterns.
>>>>
>>>> After thinking about this idea for some time I found that
>>>> clang-tidy might
>>>> be a perfect place for that, not sure whether a separate tool would be
>>>> beneficial. The task of detecting a specific pattern is very
>>>> similar to what
>>>> clang-tidy checks do in a wide range of tasks. Also, there'd be a
>>>> separate
>>>> heuristic set for each standard algorithm, which makes the
>>>> partitioning into
>>>> different checks (for each popular standard library algorithm)
>>>> natural.
>>>>
>>>> In my opinion, such checks would be useful, I'd be interested in
>>>> seeing a
>>>> proof-of-concept of some sort.
>>>>
>>>> One more idea I have in mind: it might be interesting to try using
>>>> CloneChecker (a check of Clang Static Analyzer) to detect similar
>>>> patterns
>>>> in a generic way, but I'm not sure how beneficial that would be in
>>>> practice.
>>>> Still, might worth a try.
>>>>
>>>> +CC Alex, he might have some thoughts about this.
>>>>
>>>> Kind regards,
>>>> Kirill
>>>>
>>>>
>>>> On 01/04/17 02:42, Christopher Di Bella via cfe-dev wrote:
>>>>> Hey everyone,
>>>>>
>>>>> Just wondering if there's a clang tool that can analyse an
>>>>> algorithm to
>>>>> suggest where standard algorithms can be replace handwritten algos?
>>>>> E.g.
>>>>>
>>>>> int i = 0;
>>>>> for (; i < v.size(); ++i)
>>>>>     if (v[i] == expected)
>>>>>        break;
>>>>> if (i != v.size())
>>>>>     some_function(v[i]);
>>>>>
>>>>> Can be rewritten to
>>>>>
>>>>> auto i = find(v.begin(), v.end(), expected);
>>>>> if (i != v.end())
>>>>>     some_function(*i);
>>>>>
>>>>> or in C++17:
>>>>>
>>>>> if (auto i = find(v.begin(), v.end(), expected); i != v.end())
>>>>>     some_function(*i);
>>>>>
>>>>> If not, how difficult a task is it to write such a tool? Is there
>>>>> anything that one should take into special consideration while
>>>>> writing this
>>>>> tool? Do you think it would have a lot of use-cases? (I do, based
>>>>> on my
>>>>> company's code base, and code I have seen while marking assignments).
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Chris
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> cfe-dev mailing list
>>>>> [hidden email]
>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> cfe-dev mailing list
>>>> [hidden email]
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>
>

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: C++ algorithm analysis tool

Richard Pennington via cfe-dev


On 5 April 2017 at 11:05, Artem Dergachev via cfe-dev <[hidden email]> wrote:
On second thought, maybe it's quite neat.

I mean, like, a person who doesn't know anything about AST and about matchers and whatever, could probably do something like that

$ cat < pattern.c
void bubblesort(int *array, int length) {
      for (int i = 0; i < length - 1; ++i)
         for (int j = 0; j < length - i - 1; ++j)
             if (array[j] > array[j + 1]) {
                 int tmp = array[j];
                 array[j] = array[j + 1];
                 array[j + 1] = tmp;
             }
}
^D

and then

$ clang-finder --pattern pattern.c my_project.c

and the tool would find all clones of the pattern in the project, by constructing the AST for the pattern and hashing it and looking for clones. The pattern would be in a different AST context, but i guess for the clone checker it shouldn't be a problem, so we wouldn't need to deal with ASTImporter for that.

Pardon my ignorance, but will the clone checker match patterns that have different variable names (e.g. values instead of array and size instead of length)?

Also, is it possible to use patterns recursively? E.g. Say I want to detect naive swap and std::swap in the bubble sort:

swap1.c:
                 int tmp = array[j];
                 array[j] = array[j + 1];
                 array[j + 1] = tmp;
swap2.c:
                 std::swap(array[j], array[j + 1]);

pattern.c:
void bubblesort(int *array, int length) {
      for (int i = 0; i < length - 1; ++i)
         for (int j = 0; j < length - i - 1; ++j)
             if (array[j] > array[j + 1]) {
                 SWAP(array, j)
             }
}




On 4/4/17 12:25 PM, Vassil Vassilev wrote:
On 04/04/17 10:59, Raphael Isemann wrote:
I feel the implementation of an algorithm in the STL and the
equivalent implementation written by the user are usually so different
in terms of syntax that the current clone detection won't work that
well. If we had a constraint for estimating semantic equality, then it
would be a different situation, but I don't expect an implementation
of this anytime soon :).

However, if we are satisfied with syntactic equality, then this can be
done in a few minutes with the clone detector infrastructure with
something like this pseudocode:

    detector.findClones(RecursiveTypeIIConstraint(),
MinFunctionBodyCount(1), MinComplexity(20));

This would detect code like this:

void bubblesort(int *array, int length) {
      for (int i = 0; i < length - 1; ++i)
         for (int j = 0; j < length - i - 1; ++j)
             if (array[j] > array[j + 1]) {
                 int tmp = array[j];
                 array[j] = array[j + 1];
                 array[j + 1] = tmp;
             }
}

int main() {
      for (int i = 0; i < length - 1; ++i) // expect-warning{You could
call the function 'bubblesort' instead
         for (int j = 0; j < length - i - 1; ++j)
             if (array[j] > array[j + 1]) {
                 int tmp = array[j];
                 array[j] = array[j + 1];
                 array[j + 1] = tmp;
             }
}

- Raphael

  That being said, patches are welcome. The current infrastructure is friendly for implementing semantic clone detection constraints (clones type III). We'd be interested to help if necessary.

-- Vassil

2017-04-04 10:35 GMT+02:00 Artem Dergachev <[hidden email]>:
+CC CloneDetector guys.

Hmm, the idea of making body-farms and then using CloneChecker to find
clones of synthesized bodies in the actual code looks curious and funny,
though i'm not immediately seeing how is it superior to ASTMatchers.



On 4/1/17 1:49 PM, Kirill Bobyrev via cfe-dev wrote:

Hi Chris,

To my knowledge, there isn't.

I don't recall where I got the idea, but I gave it a try last summer
trying to implement a clang-tidy check doing what you proposed. I didn't
have enough time to complete it, though, and I only managed to detect one or
two very simple patterns.

After thinking about this idea for some time I found that clang-tidy might
be a perfect place for that, not sure whether a separate tool would be
beneficial. The task of detecting a specific pattern is very similar to what
clang-tidy checks do in a wide range of tasks. Also, there'd be a separate
heuristic set for each standard algorithm, which makes the partitioning into
different checks (for each popular standard library algorithm) natural.

In my opinion, such checks would be useful, I'd be interested in seeing a
proof-of-concept of some sort.

One more idea I have in mind: it might be interesting to try using
CloneChecker (a check of Clang Static Analyzer) to detect similar patterns
in a generic way, but I'm not sure how beneficial that would be in practice.
Still, might worth a try.

+CC Alex, he might have some thoughts about this.

Kind regards,
Kirill


On 01/04/17 02:42, Christopher Di Bella via cfe-dev wrote:
Hey everyone,

Just wondering if there's a clang tool that can analyse an algorithm to
suggest where standard algorithms can be replace handwritten algos?
E.g.

int i = 0;
for (; i < v.size(); ++i)
    if (v[i] == expected)
       break;
if (i != v.size())
    some_function(v[i]);

Can be rewritten to

auto i = find(v.begin(), v.end(), expected);
if (i != v.end())
    some_function(*i);

or in C++17:

if (auto i = find(v.begin(), v.end(), expected); i != v.end())
    some_function(*i);

If not, how difficult a task is it to write such a tool? Is there
anything that one should take into special consideration while writing this
tool? Do you think it would have a lot of use-cases? (I do, based on my
company's code base, and code I have seen while marking assignments).

Cheers,

Chris


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: C++ algorithm analysis tool

Richard Pennington via cfe-dev
Clone checker currently does discard variable names, as long as they
follow the same pattern, eg. `a = a + b` is a clone of `c = c + d` but
not of `a = b + b`. It also has a mode in which it warns about patterns
that are almost-clones (broken in one variable) as copy-paste errors.

Function names matter though, so if we expect to find both SWAP and
std::swap, we'd need two runs for now.

As soon as https://reviews.llvm.org/D23418 lands, which is right about
now, the clone checker would also be *highly* configurable and modular,
and all sorts of per-use-case tweaks would be possible. Fuzzy-matching
function names is probably not impossible.

On 4/5/17 1:11 PM, Alex L wrote:

>
>
> On 5 April 2017 at 11:05, Artem Dergachev via cfe-dev
> <[hidden email] <mailto:[hidden email]>> wrote:
>
>     On second thought, maybe it's quite neat.
>
>     I mean, like, a person who doesn't know anything about AST and
>     about matchers and whatever, could probably do something like that
>
>     $ cat < pattern.c
>     void bubblesort(int *array, int length) {
>           for (int i = 0; i < length - 1; ++i)
>              for (int j = 0; j < length - i - 1; ++j)
>                  if (array[j] > array[j + 1]) {
>                      int tmp = array[j];
>                      array[j] = array[j + 1];
>                      array[j + 1] = tmp;
>                  }
>     }
>     ^D
>
>     and then
>
>     $ clang-finder --pattern pattern.c my_project.c
>
>     and the tool would find all clones of the pattern in the project,
>     by constructing the AST for the pattern and hashing it and looking
>     for clones. The pattern would be in a different AST context, but i
>     guess for the clone checker it shouldn't be a problem, so we
>     wouldn't need to deal with ASTImporter for that.
>
>
> Pardon my ignorance, but will the clone checker match patterns that
> have different variable names (e.g. values instead of array and size
> instead of length)?
>
> Also, is it possible to use patterns recursively? E.g. Say I want to
> detect naive swap and std::swap in the bubble sort:
>
> swap1.c:
>                  int tmp = array[j];
>                  array[j] = array[j + 1];
>                  array[j + 1] = tmp;
> swap2.c:
>                  std::swap(array[j], array[j + 1]);
>
> pattern.c:
> void bubblesort(int *array, int length) {
>       for (int i = 0; i < length - 1; ++i)
>          for (int j = 0; j < length - i - 1; ++j)
>              if (array[j] > array[j + 1]) {
>                  SWAP(array, j)
>              }
> }
>
>
>
>
>     On 4/4/17 12:25 PM, Vassil Vassilev wrote:
>
>         On 04/04/17 10:59, Raphael Isemann wrote:
>
>             I feel the implementation of an algorithm in the STL and the
>             equivalent implementation written by the user are usually
>             so different
>             in terms of syntax that the current clone detection won't
>             work that
>             well. If we had a constraint for estimating semantic
>             equality, then it
>             would be a different situation, but I don't expect an
>             implementation
>             of this anytime soon :).
>
>             However, if we are satisfied with syntactic equality, then
>             this can be
>             done in a few minutes with the clone detector
>             infrastructure with
>             something like this pseudocode:
>
>                 detector.findClones(RecursiveTypeIIConstraint(),
>             MinFunctionBodyCount(1), MinComplexity(20));
>
>             This would detect code like this:
>
>             void bubblesort(int *array, int length) {
>                   for (int i = 0; i < length - 1; ++i)
>                      for (int j = 0; j < length - i - 1; ++j)
>                          if (array[j] > array[j + 1]) {
>                              int tmp = array[j];
>                              array[j] = array[j + 1];
>                              array[j + 1] = tmp;
>                          }
>             }
>
>             int main() {
>                   for (int i = 0; i < length - 1; ++i) //
>             expect-warning{You could
>             call the function 'bubblesort' instead
>                      for (int j = 0; j < length - i - 1; ++j)
>                          if (array[j] > array[j + 1]) {
>                              int tmp = array[j];
>                              array[j] = array[j + 1];
>                              array[j + 1] = tmp;
>                          }
>             }
>
>             - Raphael
>
>
>           That being said, patches are welcome. The current
>         infrastructure is friendly for implementing semantic clone
>         detection constraints (clones type III). We'd be interested to
>         help if necessary.
>
>         -- Vassil
>
>
>             2017-04-04 10:35 GMT+02:00 Artem Dergachev
>             <[hidden email] <mailto:[hidden email]>>:
>
>                 +CC CloneDetector guys.
>
>                 Hmm, the idea of making body-farms and then using
>                 CloneChecker to find
>                 clones of synthesized bodies in the actual code looks
>                 curious and funny,
>                 though i'm not immediately seeing how is it superior
>                 to ASTMatchers.
>
>
>
>                 On 4/1/17 1:49 PM, Kirill Bobyrev via cfe-dev wrote:
>
>
>                     Hi Chris,
>
>                     To my knowledge, there isn't.
>
>                     I don't recall where I got the idea, but I gave it
>                     a try last summer
>                     trying to implement a clang-tidy check doing what
>                     you proposed. I didn't
>                     have enough time to complete it, though, and I
>                     only managed to detect one or
>                     two very simple patterns.
>
>                     After thinking about this idea for some time I
>                     found that clang-tidy might
>                     be a perfect place for that, not sure whether a
>                     separate tool would be
>                     beneficial. The task of detecting a specific
>                     pattern is very similar to what
>                     clang-tidy checks do in a wide range of tasks.
>                     Also, there'd be a separate
>                     heuristic set for each standard algorithm, which
>                     makes the partitioning into
>                     different checks (for each popular standard
>                     library algorithm) natural.
>
>                     In my opinion, such checks would be useful, I'd be
>                     interested in seeing a
>                     proof-of-concept of some sort.
>
>                     One more idea I have in mind: it might be
>                     interesting to try using
>                     CloneChecker (a check of Clang Static Analyzer) to
>                     detect similar patterns
>                     in a generic way, but I'm not sure how beneficial
>                     that would be in practice.
>                     Still, might worth a try.
>
>                     +CC Alex, he might have some thoughts about this.
>
>                     Kind regards,
>                     Kirill
>
>
>                     On 01/04/17 02:42, Christopher Di Bella via
>                     cfe-dev wrote:
>
>                         Hey everyone,
>
>                         Just wondering if there's a clang tool that
>                         can analyse an algorithm to
>                         suggest where standard algorithms can be
>                         replace handwritten algos?
>                         E.g.
>
>                         int i = 0;
>                         for (; i < v.size(); ++i)
>                             if (v[i] == expected)
>                                break;
>                         if (i != v.size())
>                             some_function(v[i]);
>
>                         Can be rewritten to
>
>                         auto i = find(v.begin(), v.end(), expected);
>                         if (i != v.end())
>                             some_function(*i);
>
>                         or in C++17:
>
>                         if (auto i = find(v.begin(), v.end(),
>                         expected); i != v.end())
>                             some_function(*i);
>
>                         If not, how difficult a task is it to write
>                         such a tool? Is there
>                         anything that one should take into special
>                         consideration while writing this
>                         tool? Do you think it would have a lot of
>                         use-cases? (I do, based on my
>                         company's code base, and code I have seen
>                         while marking assignments).
>
>                         Cheers,
>
>                         Chris
>
>
>                         _______________________________________________
>                         cfe-dev mailing list
>                         [hidden email]
>                         <mailto:[hidden email]>
>                         http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>                         <http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
>
>
>
>
>                     _______________________________________________
>                     cfe-dev mailing list
>                     [hidden email] <mailto:[hidden email]>
>                     http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>                     <http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
>
>
>
>
>     _______________________________________________
>     cfe-dev mailing list
>     [hidden email] <mailto:[hidden email]>
>     http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>     <http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
>
>

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Loading...