Advanced Rewriting

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

Advanced Rewriting

via cfe-dev
Hey everyone

The rewriting API of Clang operates on the source code in textual form.
The user can use AST nodes to figure out what to replace, but in the end
he has to remove and insert snippets in a linear piece of text.

This is very inconvenient when it is required to restructure and nest
replacements. The involvement of macros makes a manual process even more
difficult. See some recent threads expressing difficulty with the API
[1][2].

What do I mean by "nested replacements"? For example in the following:

     int i = x + s->a;

I would want to replace the BinaryOperator with a function call and the
MemberExpr with some constant:

     int i = Addition(x, 7);

When keeping the two replacement rules independent of each other,
achieving this with the current API is extremely difficult. More so when
macros are involved.

I am proposing some kind of helper that aims to solve these issues by
providing an interface that offers to directly replace AST nodes and a
mechanism to nest AST node replacements - without having to worry about
macros.

Potential usage:

- Develop a class that derives from StmtToRewrite to define how
replacements should happen:

     class RewriteAdds : public cu::StmtToRewrite
     {
     public:
         std::string makeReplaceStr() const override
         {
             auto binOp = dyn_cast<BinaryOperator>(replaceS);
             return "Addition(" +
getMgr()->getReplaced(binOp->getLHS()).strToInsert + ", " +
getMgr()->getReplaced(binOp->getRHS()).strToInsert + ")";
         }
     };

     class RewriteMembs : public cu::StmtToRewrite
     {
     public:
         std::string makeReplaceStr() const override
         {
             return "7";
         }
     };

- Construct a RewriteManager:

     cu::RewriteManager mgr(ACtx, PP);

- Add rewriting operations to the manager:

     // std::vector<const Stmt *> AddStmts = /* matched from
binaryOperator() with plus */
     // std::vector<const Stmt *> MembStmts = /* matched from
memberExpr() */
     for (const auto &S : AddStmts) mgr.registerStmt<RewriteAdds>(S);
     for (const auto &S : MembStmts) mgr.registerStmt<RewriteMembs>(S);

- Retrieve and apply the results:

     clang::Rewriter rewriter(SM, LangOpts);
     for (const auto &r : mgr.getReplacements()) {
         rewriter.RemoveText(r.rangeToRemove);
         rewriter.InsertText(r.rangeToRemove.getBegin(), r.strToInsert);
     }


At the end of this mail is my low quality code that kind-of implements
this. TLDR:

- Build a hierarchy of stmts to replace and keep track of which
replacements must be combined
- Move further up in the AST if these replacements are inside a macro
- Recursively lex the file and look for replacements outside-in by
spelling locations. Expand any macros that are encountered during this.
The re-lexing idea is based on the hint in [3].

The code has the following shortcomings:

- I do not know how to distinguish macro argument expansions within
macros. For example in "#define FOO(a) a + a" the two "a"s expand to
different AST nodes that could be replaced with different rules. This is
an important issue, because it can lead to completely broken code with
nesting.
- Limited to Stmts, when Decls should be supported too.
- Very un-optimized with lexing the entire source file many times. Easy
to solve, but didn't want to raise the complexity further for now.
- Could keep written code more clean by only expanding macros if
required. For example not required if just a macro arg is replaced and
all expansions would be the same.


I am very interested in your general thoughts. I'm not very experienced
with clang, but this was my vision how I would want to do replacements.
Are you interested in getting this into clang? I would need help with
ironing out the remaining issues.

-Rafael


[1] http://lists.llvm.org/pipermail/cfe-dev/2018-July/058430.html
[2] http://lists.llvm.org/pipermail/cfe-dev/2018-June/058213.html
[3] http://lists.llvm.org/pipermail/cfe-dev/2017-August/055079.html



----------------------------------------

RewriteManager.h

----------------------------------------

#ifndef CLANGUTIL_REWRITEMANAGER_H
#define CLANGUTIL_REWRITEMANAGER_H

#include "ClangUtil/SourceRangeLess.h"
#include "make_unique.h"
#include "clang/AST/AST.h"
#include <vector>
#include <map>


// TODO extend to decls


namespace cu
{
// Represents a statement in the original AST that should be rewritten.
To implement recursive replacements, call
// getMgr()->getReplaced() on any AST node within the makeReplaceStr
callback.
class StmtToRewrite
{
     friend class RewriteManager;

public:
     // Returns the enclosing RewriteManager.
     class RewriteManager *getMgr() const;
     // Override this to build a replacement string. Implement recursive
replacements with RewriteManager::getReplaced.
     virtual std::string makeReplaceStr() const = 0;

     // The statement to replace.
     const clang::Stmt *replaceS = nullptr;

private:
     RewriteManager *m_mgr;
};

struct RewriteOperation
{
     clang::SourceRange rangeToRemove;
     std::string strToInsert;
};

// A class for managing replacements of AST nodes. It allows to
specifically target AST nodes instead of raw source
// locations to enable easy replacements involving macros and nested
replacements.
// For extended documentation see: doc/rewriting.md
class RewriteManager
{
public:
     RewriteManager(clang::ASTContext &ACtx, clang::Preprocessor &PP);

     clang::ASTContext &getACtx() const { return ACtx; }

     // Registers a StmtToRewrite for use with getReplacements. Call
this on all
     // statements that should be rewritten before calling any rewriting
functions.
     void registerStmt(std::unique_ptr<StmtToRewrite> S);

     // Helper for constructing the custom type from a Stmt.
     template <typename T, typename... Args>
     void registerStmt(const clang::Stmt *S, Args... args)
     {
         auto p = std::make_unique<T>(std::forward<Args>(args)...);
         p->replaceS = S;
         registerStmt(std::move(p));
     }

     // Get the full replacement of an AST node. Note that this function
removes any replaced statements from the work
     // list, so calling it twice will only replace the first time.
     RewriteOperation getReplaced(const clang::Stmt *S);
     // Get all replacements. These may be fewer than the requested ones
because of nesting.
     std::vector<RewriteOperation> getReplacements();

private:
     std::string getExpandedCode(const clang::Stmt *toReplaceS);

private:
     clang::ASTContext &ACtx;
     const clang::LangOptions &LangOpts;
     clang::SourceManager &SM;
     clang::Preprocessor &PP;

     // Manages the pending replacements.
     class WorkList
     {
     public:
         typedef std::map<clang::SourceRange, std::vector<const
StmtToRewrite *>> RangeToRepMap;

         WorkList(clang::ASTContext &ACtx, clang::SourceManager &SM);

         bool isStmtPending(const clang::Stmt *S) const;
         void addStmt(std::unique_ptr<StmtToRewrite> S);
         const RangeToRepMap &getRangeToReplacementsMap() const;
         std::vector<const StmtToRewrite *> getSortedReplacements() const;
         void markDone(const StmtToRewrite *S);
         void cleanup();

     private:
         clang::ASTContext &ACtx;
         clang::SourceManager &SM;
         std::vector<std::unique_ptr<StmtToRewrite>> m_pending;
         std::vector<std::unique_ptr<StmtToRewrite>> m_done;
         RangeToRepMap m_rangeToReplacements;
     };

     WorkList m_workList;
};

} // namespace cu

#endif



----------------------------------------

RewriteManager.cpp

----------------------------------------

#include "ClangUtil/RewriteManager.h"
#include "ClangUtil/ASTUtil.h"
#include "clang/Lex/Lexer.h"
#include "clang/Lex/Preprocessor.h"
#include "clang/Lex/PreprocessorOptions.h"
#include "clang/Lex/TokenConcatenation.h"
#include "clang/Lex/MacroArgs.h"


using namespace cu;


// Returns a Stmt that is the first parent of startS whose expansion
range is within the given range.
static const clang::Stmt *GetFullMacroStmt(clang::SourceRange range,
const clang::Stmt *startS, clang::ASTContext &ACtx)
{
     auto &SM = ACtx.getSourceManager();

     // Walk the tree upwards until ST does no longer expand to within
range.
     const clang::Stmt *ST = startS;
     while (true)
     {
         const auto &parents = ACtx.getParents(*ST);
         if (parents.empty())
         {
             break;
         }
         auto childS = ST;
         ST = parents[0].get<clang::Stmt>();
         if (!ST)
         {
             if (auto D = parents[0].get<clang::Decl>())
             {
                 const auto &parentsD = ACtx.getParents(*D);
                 if (parentsD.empty())
                 {
                     break;
                 }
                 ST = parentsD[0].get<clang::Stmt>();
                 if (!ST)
                 {
                     break;
                 }
             }
             else
             {
                 break;
             }
         }

         auto exLocS = SM.getExpansionLoc(ST->getLocStart());
         auto exLocE = SM.getExpansionLoc(ST->getLocEnd());
         if (SM.isBeforeInTranslationUnit(exLocS, range.getBegin()) ||
             SM.isBeforeInTranslationUnit(range.getEnd(), exLocE))
         {
             return childS;
         }
     }

     return nullptr;
}


RewriteManager *StmtToRewrite::getMgr() const
{
     return m_mgr;
}


RewriteManager::WorkList::WorkList(clang::ASTContext &ACtx,
clang::SourceManager &SM) : ACtx(ACtx), SM(SM) {}
bool RewriteManager::WorkList::isStmtPending(const clang::Stmt *S) const
{
     for (const auto &r : m_pending)
     {
         if (r->replaceS == S)
         {
             return true;
         }
     }
     return false;
}
void RewriteManager::WorkList::addStmt(std::unique_ptr<StmtToRewrite> S)
{
     // Use the expansion range for maximal replacement flexibility in
macros.
     auto replaceRange =
SM.getExpansionRange(S->replaceS->getSourceRange());

     // TODO not quite correct.
     /*auto sortRanges = [&](std::vector<const StmtToRewrite *> &vec) {
         std::sort(vec.begin(), vec.end(), [&](const StmtToRewrite *lhs,
const StmtToRewrite *rhs) {
             auto lhsRange =
SM.getExpansionRange(lhs->replaceS->getSourceRange());
             auto rhsRange =
SM.getExpansionRange(rhs->replaceS->getSourceRange());
             return IsContained(rhsRange, lhsRange, SM);
         });
     };*/

     // Establish hierarchical relation between all ranges.
     bool found = false;
     // First, check if this range is within one we already have.
     for (auto &r : m_rangeToReplacements)
     {
         if (IsContained(replaceRange, r.first, SM))
         {
             // Insert in a sorted order.
             for (auto it = r.second.begin(); it != r.second.end(); ++it)
             {
                 //auto testRange =
SM.getExpansionRange((*it)->replaceS->getSourceRange());
                 // if (IsContained(testRange, replaceRange, SM))
                 if (IsParent(S->replaceS, (*it)->replaceS, ACtx))
                 {
                     r.second.insert(it, S.get());
                     found = true;
                     break;
                 }
             }
             if (!found)
             {
                 r.second.push_back(S.get());
                 found = true;
             }
             break;
         }
     }
     // Not within existing range, add as new top-level range.
     if (!found)
     {
         // Check if any existing ranges are contained within the new one.
         std::vector<const StmtToRewrite *> moveThese;
         auto it = m_rangeToReplacements.begin();
         while (it != m_rangeToReplacements.end())
         {
             if (IsContained(it->first, replaceRange, SM))
             {
                 moveThese.insert(moveThese.end(), it->second.begin(),
it->second.end());
                 it = m_rangeToReplacements.erase(it);
             }
             else
             {
                 ++it;
             }
         }
         auto &accesses = m_rangeToReplacements[replaceRange];
         // The order is important here. We want the first element to be
the one that spans the full range.
         accesses.push_back(S.get());
         // TODO sort "moveThese".
         accesses.insert(accesses.end(), moveThese.begin(),
moveThese.end());
     }

     int count = 0;
     for (const auto &r : m_rangeToReplacements)
     {
         printf("range %i\n", count++);
         for (const auto &a : r.second)
         {
             printf("replacement:\n");
             a->replaceS->dump();
         }
     }

     m_pending.push_back(std::move(S));
}
const RewriteManager::WorkList::RangeToRepMap
&RewriteManager::WorkList::getRangeToReplacementsMap() const
{
     return m_rangeToReplacements;
}
std::vector<const StmtToRewrite *>
RewriteManager::WorkList::getSortedReplacements() const
{
     std::vector<const StmtToRewrite *> result;
     for (auto &r : m_rangeToReplacements)
     {
         result.insert(result.end(), r.second.begin(), r.second.end());
     }
     return result;
}
void RewriteManager::WorkList::markDone(const StmtToRewrite *S)
{
     // Remove from hierarchy.
     for (auto &r : m_rangeToReplacements)
     {
         r.second.erase(std::remove(r.second.begin(), r.second.end(),
S), r.second.end());
     }

     // Move from pending to done list.
     auto it = std::find_if(m_pending.begin(), m_pending.end(),
                            [&](const std::unique_ptr<StmtToRewrite>
&rep) { return rep.get() == S; });
     if (it == m_pending.end())
     {
         throw std::runtime_error("Did not find replacement to mark as
done");
     }
     m_done.push_back(std::move(*it));
     m_pending.erase(it);
}
void RewriteManager::WorkList::cleanup()
{
     m_done.clear();
}


RewriteManager::RewriteManager(clang::ASTContext &ACtx,
clang::Preprocessor &PP)
     : ACtx(ACtx), LangOpts(ACtx.getLangOpts()),
SM(ACtx.getSourceManager()), PP(PP), m_workList(ACtx, SM)
{
}

void RewriteManager::registerStmt(std::unique_ptr<StmtToRewrite> S)
{
     if (!S->replaceS)
     {
         throw std::runtime_error("Must set replaceS");
     }

     if (m_workList.isStmtPending(S->replaceS))
     {
         throw std::runtime_error("This Stmt will already be replaced");
     }

     S->m_mgr = this;
     m_workList.addStmt(std::move(S));
}

RewriteOperation RewriteManager::getReplaced(const clang::Stmt *S)
{
     auto range = SM.getExpansionRange(S->getSourceRange());
     return { range, getExpandedCode(S) };
}

std::vector<RewriteOperation> RewriteManager::getReplacements()
{
     std::vector<RewriteOperation> results;

     for (auto &rangeAndAccesses : m_workList.getRangeToReplacementsMap())
     {
         auto &range = rangeAndAccesses.first;
         auto &accesses = rangeAndAccesses.second;

         // Cannot replace something inside a macro because it would
replace all expansions instead of just the selected
         // AST node. So in a first step, get an enclosing statement
that is no longer inside a macro.
         // TODO we could keep the original code more clean by not
expanding macro args if the whole expansion does not
         // contain the macro arg more than once.
         auto macroS = GetFullMacroStmt(range, accesses[0]->replaceS, ACtx);

         results.push_back(getReplaced(macroS));

         // TODO we could run clang-format on the replacements. this
would especially benefit long macro expansions.
     }

     m_workList.cleanup();

     return results;
}

std::string RewriteManager::getExpandedCode(const clang::Stmt *toReplaceS)
{
     // TODO performance optimization. this is parsing way more than
required.

     using namespace clang;

     printf("getExpandedCode:\n");
     toReplaceS->dump();

     std::string out;

     auto toReplaceExpStart = SM.getExpansionLoc(toReplaceS->getLocStart());
     auto toReplaceExpEnd = SM.getExpansionLoc(toReplaceS->getLocEnd());
     auto toReplaceSpellStart =
SM.getSpellingLoc(toReplaceS->getLocStart());
     auto toReplaceSpellEnd = SM.getSpellingLoc(toReplaceS->getLocEnd());

     auto FID = SM.getFileID(SM.getExpansionLoc(toReplaceS->getLocStart()));

     // The following is inspired by:
clang/Rewrite/HTMLRewrite.cpp:HighlightMacros

     // Re-lex the raw token stream into a token buffer.
     std::vector<Token> TokenStream;

     const llvm::MemoryBuffer *FromFile = SM.getBuffer(FID);
     Lexer L(FID, FromFile, SM, PP.getLangOpts());

     // Lex all the tokens in raw mode, to avoid entering #includes or
expanding
     // macros.
     while (1)
     {
         Token Tok;
         L.LexFromRawLexer(Tok);

         // If this is a # at the start of a line, discard it from the
token stream.
         // We don't want the re-preprocess step to see #defines,
#includes or other
         // preprocessor directives.
         if (Tok.is(tok::hash) && Tok.isAtStartOfLine())
             continue;

         // If this is a ## token, change its kind to unknown so that
repreprocessing
         // it will not produce an error.
         if (Tok.is(tok::hashhash))
             Tok.setKind(tok::unknown);

         // If this raw token is an identifier, the raw lexer won't have
looked up
         // the corresponding identifier info for it.  Do this now so
that it will be
         // macro expanded when we re-preprocess it.
         if (Tok.is(tok::raw_identifier))
             PP.LookUpIdentifierInfo(Tok);

         TokenStream.push_back(Tok);

         for (auto &rep : m_workList.getSortedReplacements())
         {
             auto repS = rep->replaceS;
             auto spellLoc = SM.getSpellingLoc(repS->getLocStart());
             if (SM.getSpellingLoc(Tok.getLocation()) == spellLoc)
             {
                 //
             }
         }

         if (Tok.is(tok::eof))
             break;
     }

     // Temporarily change the diagnostics object so that we ignore any
generated
     // diagnostics from this pass.
     DiagnosticsEngine TmpDiags(PP.getDiagnostics().getDiagnosticIDs(),
&PP.getDiagnostics().getDiagnosticOptions(),
                                new IgnoringDiagConsumer);

     // Copy the preprocessor and all of its state.
     auto PPOpts =
std::make_shared<PreprocessorOptions>(PP.getPreprocessorOpts());
     LangOptions LO = PP.getLangOpts();
     Preprocessor TmpPP(PPOpts, TmpDiags, LO, SM, PP.getPCMCache(),
PP.getHeaderSearchInfo(), PP.getModuleLoader(),
PP.getIdentifierTable().getExternalIdentifierLookup());
     TmpPP.Initialize(PP.getTargetInfo(), PP.getAuxTargetInfo());
     TmpPP.setExternalSource(PP.getExternalSource());
     TmpPP.setPreprocessedOutput(true);

     std::map<const clang::IdentifierInfo *, bool> MacroPreviouslyEnabled;
     for (const auto &m : PP.macros())
     {
         // printf("PREDEF MACRO: %s\n", m.first->getName().str().c_str());
         TmpPP.getMacroDefinition(m.first);

         for (const auto &tmpm : TmpPP.macros())
         {
             if (tmpm.first == m.first)
             {
                 auto MD = m.second.getLatest();
                 auto MI = MD->getMacroInfo();
                 // If this is a recursive call we might be in a macro
expansion and the macro might be disabled. We need
                 // to enable it for now so that all expansions work.
Restore it later.
                 MacroPreviouslyEnabled[tmpm.first] = MI->isEnabled();
                 if (!MI->isEnabled())
                 {
                     MD->getMacroInfo()->EnableMacro();
                 }

                 // This should not change anything since we just copy
data over.
                 auto &mutableState =
const_cast<std::remove_const<decltype(tmpm.second)>::type &>(tmpm.second);
                 mutableState.setLatest(MD);
                 break;
             }
         }
     }

     class MacroArgCollector : public clang::PPCallbacks
     {
     public:
         MacroArgCollector(Preprocessor &TmpPP) : TmpPP(TmpPP) {}

         void MacroExpands(const Token &Tok, const MacroDefinition &MD,
SourceRange Range, const MacroArgs *Args) override
         {
             if (!Args)
             {
                 return;
             }
             printf("GOT MACRO ARGS EXPANSION CALLBACK\n");
             for (int i = 0; i < (int)Args->getNumMacroArguments(); i++)
             {
                 auto TokUnex = Args->getUnexpArgument(i);
                 // Thats just non-const for a cache, so should be fine.
                 auto TokPreExp = const_cast<MacroArgs
*>(Args)->getPreExpArgument(i, TmpPP);
                 printf("unexp: %s\n", TmpPP.getSpelling(*TokUnex).c_str());
                 for (const auto &T : TokPreExp)
                 {
                     printf("preexp: %s\n", TmpPP.getSpelling(T).c_str());
                 }
             }
         }

         Preprocessor &TmpPP;
     };
TmpPP.addPPCallbacks(std::make_unique<MacroArgCollector>(TmpPP));
     // Instead: collect the macro arg info in the law lexing step
above. or do another pass that uses the PP but without expansions.

     /*printf("DUMP MACRO INFO\n");
     for (const auto &m : PP.macros())
         PP.dumpMacroInfo(m.first);
     printf("---\n");
     for (const auto &m : TmpPP.macros())
         TmpPP.dumpMacroInfo(m.first);
     printf("DUMP MACRO INFO END\n");*/

     DiagnosticsEngine *OldDiags = &TmpPP.getDiagnostics();

     // Inform the preprocessor that we don't want comments.
     TmpPP.SetCommentRetentionState(false, false);

     // We don't want pragmas either. Although we filtered out #pragma,
removing
     // _Pragma and __pragma is much harder.
     bool PragmasPreviouslyEnabled = TmpPP.getPragmasEnabled();
     TmpPP.setPragmasEnabled(false);

     // Enter the tokens we just lexed.  This will cause them to be
macro expanded
     // but won't enter sub-files (because we removed #'s).
     TmpPP.EnterTokenStream(TokenStream, false);

     TokenConcatenation ConcatInfo(TmpPP);

     // Lex all the tokens.
     Token Tok;
     TmpPP.Lex(Tok);

     std::map<SourceLocation, int> slocIdx;

     auto checkReplacement = [&]() {
         for (auto &rep : m_workList.getSortedReplacements())
         {
             // auto rep = r.second.get();
             auto repS = rep->replaceS;
             auto spellLoc = SM.getSpellingLoc(repS->getLocStart());
             // TODO we need to check here if the repS spans the full
range (or largest?)
             if (SM.getSpellingLoc(Tok.getLocation()) == spellLoc)
             {
                 if (slocIdx[spellLoc] == 7)
                 {
                     // replace
                 }
                 slocIdx[spellLoc]++;

                 // Done replacing that one, but have to keep it alive
until we're done with it.
                 m_workList.markDone(rep);

                 printf("[[[\n");
                 auto repStr = rep->makeReplaceStr();
                 printf("REPLACED: %s ]]]\n", repStr.c_str());
                 out += repStr;

                 // Skip ahead until after the whole replacement.
                 auto repEnd = SM.getSpellingLoc(repS->getLocEnd());
                 while (repEnd != SM.getSpellingLoc(Tok.getLocation()))
                 {
                     TmpPP.Lex(Tok);
                     assert(!Tok.is(tok::eof) && "End not found");
                 }

                 // Eat one more since we stopped at the end token and
we want to continue after it.
                 TmpPP.Lex(Tok);

                 return true;
             }
         }
         return false;
     };

     while (Tok.isNot(tok::eof))
     {
         printf("TOKEN: %s\n", TmpPP.getSpelling(Tok).c_str());

         auto TokLoc = Tok.getLocation();
         auto TokExp = SM.getExpansionLoc(TokLoc);
         if (SM.isBeforeInTranslationUnit(toReplaceExpEnd, TokExp))
         {
             // Anything after the Stmt we want to replace is not
interesting.
             break;
         }

         // Skip ahead until we are at the expansion start of the Stmt
we want to replace.
         if (!SM.isBeforeInTranslationUnit(TokLoc, toReplaceExpStart))
         {
             if (TokLoc.isMacroID())
             {
                 // This is the first token of a macro expansion.
                 auto LLoc = SM.getExpansionRange(TokLoc);

                 // Ignore tokens whose instantiation location was not
the main file.
                 if (SM.getFileID(LLoc.first) != FID)
                 {
                     TmpPP.Lex(Tok);
                     continue;
                 }

                 assert(SM.getFileID(LLoc.second) == FID &&
                        "Start and end of expansion must be in the same
ultimate file!");

                 bool stopOutputOnNextToken = false;
                 bool toReplaceStartsInMacro = toReplaceExpStart == TokExp;
                 bool toReplaceEndsInMacro = toReplaceExpEnd == TokExp;
                 bool startedOutput = false;

                 Token PrevPrevTok;
                 Token PrevTok = Tok;

                 while (!Tok.is(tok::eof) &&
SM.getExpansionLoc(Tok.getLocation()) == LLoc.first)
                 {
                     printf("TOKEN (in macro): %s\n",
TmpPP.getSpelling(Tok).c_str());

                     auto TokSpell = SM.getSpellingLoc(Tok.getLocation());
                     if (stopOutputOnNextToken)
                     {
                         break;
                     }
                     if (toReplaceEndsInMacro && TokSpell ==
toReplaceSpellEnd)
                     {
                         stopOutputOnNextToken = true;
                     }

                     if (toReplaceStartsInMacro && !startedOutput)
                     {
                         if (TokSpell == toReplaceSpellStart)
                         {
                             startedOutput = true;
                         }
                         else
                         {
                             TmpPP.Lex(Tok);
                             continue;
                         }
                     }

                     // If the tokens were already space separated, or
if they must be to avoid
                     // them being implicitly pasted, add a space
between them.
                     if (Tok.hasLeadingSpace() ||
ConcatInfo.AvoidConcat(PrevPrevTok, PrevTok, Tok))
                         out += ' ';

                     if (checkReplacement())
                     {
                         continue;
                     }

                     out += TmpPP.getSpelling(Tok);
                     TmpPP.Lex(Tok);
                 }
                 if (stopOutputOnNextToken)
                 {
                     break;
                 }
             }
             else
             {
                 if (checkReplacement())
                 {
                     continue;
                 }

                 // Output original code because we are outside of a
replacement.
                 out += TmpPP.getSpelling(Tok);
                 TmpPP.Lex(Tok);
             }
         }
         else
         {
             TmpPP.Lex(Tok);
         }
     }

     // Restore the preprocessor's old state.
     TmpPP.setDiagnostics(*OldDiags);
     TmpPP.setPragmasEnabled(PragmasPreviouslyEnabled);

     for (const auto &tmpm : TmpPP.macros())
     {
         auto it = MacroPreviouslyEnabled.find(tmpm.first);
         if (it != MacroPreviouslyEnabled.end())
         {
             auto MD = tmpm.second.getLatest();
             auto MI = MD->getMacroInfo();
             if (MI->isEnabled() && !it->second)
             {
                 MI->DisableMacro();
             }
             else if (!MI->isEnabled() && it->second)
             {
                 MI->EnableMacro();
             }
         }
     }

     return out;
}


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

smime.p7s (7K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Advanced Rewriting

via cfe-dev

Hi Rafael,

wouldn't your usecase be a task for clang-refactor?

Best,  Jonas


Am 16.07.2018 um 17:08 schrieb Rafael·Stahl via cfe-dev:
Hey everyone

The rewriting API of Clang operates on the source code in textual form. The user can use AST nodes to figure out what to replace, but in the end he has to remove and insert snippets in a linear piece of text.

This is very inconvenient when it is required to restructure and nest replacements. The involvement of macros makes a manual process even more difficult. See some recent threads expressing difficulty with the API [1][2].

What do I mean by "nested replacements"? For example in the following:

    int i = x + s->a;

I would want to replace the BinaryOperator with a function call and the MemberExpr with some constant:

    int i = Addition(x, 7);

When keeping the two replacement rules independent of each other, achieving this with the current API is extremely difficult. More so when macros are involved.

I am proposing some kind of helper that aims to solve these issues by providing an interface that offers to directly replace AST nodes and a mechanism to nest AST node replacements - without having to worry about macros.

Potential usage:

- Develop a class that derives from StmtToRewrite to define how replacements should happen:

    class RewriteAdds : public cu::StmtToRewrite
    {
    public:
        std::string makeReplaceStr() const override
        {
            auto binOp = dyn_cast<BinaryOperator>(replaceS);
            return "Addition(" + getMgr()->getReplaced(binOp->getLHS()).strToInsert + ", " +
getMgr()->getReplaced(binOp->getRHS()).strToInsert + ")";
        }
    };

    class RewriteMembs : public cu::StmtToRewrite
    {
    public:
        std::string makeReplaceStr() const override
        {
            return "7";
        }
    };

- Construct a RewriteManager:

    cu::RewriteManager mgr(ACtx, PP);

- Add rewriting operations to the manager:

    // std::vector<const Stmt *> AddStmts = /* matched from binaryOperator() with plus */
    // std::vector<const Stmt *> MembStmts = /* matched from memberExpr() */
    for (const auto &S : AddStmts) mgr.registerStmt<RewriteAdds>(S);
    for (const auto &S : MembStmts) mgr.registerStmt<RewriteMembs>(S);

- Retrieve and apply the results:

    clang::Rewriter rewriter(SM, LangOpts);
    for (const auto &r : mgr.getReplacements()) {
        rewriter.RemoveText(r.rangeToRemove);
        rewriter.InsertText(r.rangeToRemove.getBegin(), r.strToInsert);
    }


At the end of this mail is my low quality code that kind-of implements this. TLDR:

- Build a hierarchy of stmts to replace and keep track of which replacements must be combined
- Move further up in the AST if these replacements are inside a macro
- Recursively lex the file and look for replacements outside-in by spelling locations. Expand any macros that are encountered during this. The re-lexing idea is based on the hint in [3].

The code has the following shortcomings:

- I do not know how to distinguish macro argument expansions within macros. For example in "#define FOO(a) a + a" the two "a"s expand to different AST nodes that could be replaced with different rules. This is an important issue, because it can lead to completely broken code with nesting.
- Limited to Stmts, when Decls should be supported too.
- Very un-optimized with lexing the entire source file many times. Easy to solve, but didn't want to raise the complexity further for now.
- Could keep written code more clean by only expanding macros if required. For example not required if just a macro arg is replaced and all expansions would be the same.


I am very interested in your general thoughts. I'm not very experienced with clang, but this was my vision how I would want to do replacements. Are you interested in getting this into clang? I would need help with ironing out the remaining issues.

-Rafael


[1] http://lists.llvm.org/pipermail/cfe-dev/2018-July/058430.html
[2] http://lists.llvm.org/pipermail/cfe-dev/2018-June/058213.html
[3] http://lists.llvm.org/pipermail/cfe-dev/2017-August/055079.html



----------------------------------------

RewriteManager.h

----------------------------------------

#ifndef CLANGUTIL_REWRITEMANAGER_H
#define CLANGUTIL_REWRITEMANAGER_H

#include "ClangUtil/SourceRangeLess.h"
#include "make_unique.h"
#include "clang/AST/AST.h"
#include <vector>
#include <map>


// TODO extend to decls


namespace cu
{
// Represents a statement in the original AST that should be rewritten. To implement recursive replacements, call
// getMgr()->getReplaced() on any AST node within the makeReplaceStr callback.
class StmtToRewrite
{
    friend class RewriteManager;

public:
    // Returns the enclosing RewriteManager.
    class RewriteManager *getMgr() const;
    // Override this to build a replacement string. Implement recursive replacements with RewriteManager::getReplaced.
    virtual std::string makeReplaceStr() const = 0;

    // The statement to replace.
    const clang::Stmt *replaceS = nullptr;

private:
    RewriteManager *m_mgr;
};

struct RewriteOperation
{
    clang::SourceRange rangeToRemove;
    std::string strToInsert;
};

// A class for managing replacements of AST nodes. It allows to specifically target AST nodes instead of raw source
// locations to enable easy replacements involving macros and nested replacements.
// For extended documentation see: doc/rewriting.md
class RewriteManager
{
public:
    RewriteManager(clang::ASTContext &ACtx, clang::Preprocessor &PP);

    clang::ASTContext &getACtx() const { return ACtx; }

    // Registers a StmtToRewrite for use with getReplacements. Call this on all
    // statements that should be rewritten before calling any rewriting functions.
    void registerStmt(std::unique_ptr<StmtToRewrite> S);

    // Helper for constructing the custom type from a Stmt.
    template <typename T, typename... Args>
    void registerStmt(const clang::Stmt *S, Args... args)
    {
        auto p = std::make_unique<T>(std::forward<Args>(args)...);
        p->replaceS = S;
        registerStmt(std::move(p));
    }

    // Get the full replacement of an AST node. Note that this function removes any replaced statements from the work
    // list, so calling it twice will only replace the first time.
    RewriteOperation getReplaced(const clang::Stmt *S);
    // Get all replacements. These may be fewer than the requested ones because of nesting.
    std::vector<RewriteOperation> getReplacements();

private:
    std::string getExpandedCode(const clang::Stmt *toReplaceS);

private:
    clang::ASTContext &ACtx;
    const clang::LangOptions &LangOpts;
    clang::SourceManager &SM;
    clang::Preprocessor &PP;

    // Manages the pending replacements.
    class WorkList
    {
    public:
        typedef std::map<clang::SourceRange, std::vector<const StmtToRewrite *>> RangeToRepMap;

        WorkList(clang::ASTContext &ACtx, clang::SourceManager &SM);

        bool isStmtPending(const clang::Stmt *S) const;
        void addStmt(std::unique_ptr<StmtToRewrite> S);
        const RangeToRepMap &getRangeToReplacementsMap() const;
        std::vector<const StmtToRewrite *> getSortedReplacements() const;
        void markDone(const StmtToRewrite *S);
        void cleanup();

    private:
        clang::ASTContext &ACtx;
        clang::SourceManager &SM;
        std::vector<std::unique_ptr<StmtToRewrite>> m_pending;
        std::vector<std::unique_ptr<StmtToRewrite>> m_done;
        RangeToRepMap m_rangeToReplacements;
    };

    WorkList m_workList;
};

} // namespace cu

#endif



----------------------------------------

RewriteManager.cpp

----------------------------------------

#include "ClangUtil/RewriteManager.h"
#include "ClangUtil/ASTUtil.h"
#include "clang/Lex/Lexer.h"
#include "clang/Lex/Preprocessor.h"
#include "clang/Lex/PreprocessorOptions.h"
#include "clang/Lex/TokenConcatenation.h"
#include "clang/Lex/MacroArgs.h"


using namespace cu;


// Returns a Stmt that is the first parent of startS whose expansion range is within the given range.
static const clang::Stmt *GetFullMacroStmt(clang::SourceRange range, const clang::Stmt *startS, clang::ASTContext &ACtx)
{
    auto &SM = ACtx.getSourceManager();

    // Walk the tree upwards until ST does no longer expand to within range.
    const clang::Stmt *ST = startS;
    while (true)
    {
        const auto &parents = ACtx.getParents(*ST);
        if (parents.empty())
        {
            break;
        }
        auto childS = ST;
        ST = parents[0].get<clang::Stmt>();
        if (!ST)
        {
            if (auto D = parents[0].get<clang::Decl>())
            {
                const auto &parentsD = ACtx.getParents(*D);
                if (parentsD.empty())
                {
                    break;
                }
                ST = parentsD[0].get<clang::Stmt>();
                if (!ST)
                {
                    break;
                }
            }
            else
            {
                break;
            }
        }

        auto exLocS = SM.getExpansionLoc(ST->getLocStart());
        auto exLocE = SM.getExpansionLoc(ST->getLocEnd());
        if (SM.isBeforeInTranslationUnit(exLocS, range.getBegin()) ||
            SM.isBeforeInTranslationUnit(range.getEnd(), exLocE))
        {
            return childS;
        }
    }

    return nullptr;
}


RewriteManager *StmtToRewrite::getMgr() const
{
    return m_mgr;
}


RewriteManager::WorkList::WorkList(clang::ASTContext &ACtx, clang::SourceManager &SM) : ACtx(ACtx), SM(SM) {}
bool RewriteManager::WorkList::isStmtPending(const clang::Stmt *S) const
{
    for (const auto &r : m_pending)
    {
        if (r->replaceS == S)
        {
            return true;
        }
    }
    return false;
}
void RewriteManager::WorkList::addStmt(std::unique_ptr<StmtToRewrite> S)
{
    // Use the expansion range for maximal replacement flexibility in macros.
    auto replaceRange = SM.getExpansionRange(S->replaceS->getSourceRange());

    // TODO not quite correct.
    /*auto sortRanges = [&](std::vector<const StmtToRewrite *> &vec) {
        std::sort(vec.begin(), vec.end(), [&](const StmtToRewrite *lhs, const StmtToRewrite *rhs) {
            auto lhsRange = SM.getExpansionRange(lhs->replaceS->getSourceRange());
            auto rhsRange = SM.getExpansionRange(rhs->replaceS->getSourceRange());
            return IsContained(rhsRange, lhsRange, SM);
        });
    };*/

    // Establish hierarchical relation between all ranges.
    bool found = false;
    // First, check if this range is within one we already have.
    for (auto &r : m_rangeToReplacements)
    {
        if (IsContained(replaceRange, r.first, SM))
        {
            // Insert in a sorted order.
            for (auto it = r.second.begin(); it != r.second.end(); ++it)
            {
                //auto testRange = SM.getExpansionRange((*it)->replaceS->getSourceRange());
                // if (IsContained(testRange, replaceRange, SM))
                if (IsParent(S->replaceS, (*it)->replaceS, ACtx))
                {
                    r.second.insert(it, S.get());
                    found = true;
                    break;
                }
            }
            if (!found)
            {
                r.second.push_back(S.get());
                found = true;
            }
            break;
        }
    }
    // Not within existing range, add as new top-level range.
    if (!found)
    {
        // Check if any existing ranges are contained within the new one.
        std::vector<const StmtToRewrite *> moveThese;
        auto it = m_rangeToReplacements.begin();
        while (it != m_rangeToReplacements.end())
        {
            if (IsContained(it->first, replaceRange, SM))
            {
                moveThese.insert(moveThese.end(), it->second.begin(), it->second.end());
                it = m_rangeToReplacements.erase(it);
            }
            else
            {
                ++it;
            }
        }
        auto &accesses = m_rangeToReplacements[replaceRange];
        // The order is important here. We want the first element to be the one that spans the full range.
        accesses.push_back(S.get());
        // TODO sort "moveThese".
        accesses.insert(accesses.end(), moveThese.begin(), moveThese.end());
    }

    int count = 0;
    for (const auto &r : m_rangeToReplacements)
    {
        printf("range %i\n", count++);
        for (const auto &a : r.second)
        {
            printf("replacement:\n");
            a->replaceS->dump();
        }
    }

    m_pending.push_back(std::move(S));
}
const RewriteManager::WorkList::RangeToRepMap &RewriteManager::WorkList::getRangeToReplacementsMap() const
{
    return m_rangeToReplacements;
}
std::vector<const StmtToRewrite *> RewriteManager::WorkList::getSortedReplacements() const
{
    std::vector<const StmtToRewrite *> result;
    for (auto &r : m_rangeToReplacements)
    {
        result.insert(result.end(), r.second.begin(), r.second.end());
    }
    return result;
}
void RewriteManager::WorkList::markDone(const StmtToRewrite *S)
{
    // Remove from hierarchy.
    for (auto &r : m_rangeToReplacements)
    {
        r.second.erase(std::remove(r.second.begin(), r.second.end(), S), r.second.end());
    }

    // Move from pending to done list.
    auto it = std::find_if(m_pending.begin(), m_pending.end(),
                           [&](const std::unique_ptr<StmtToRewrite> &rep) { return rep.get() == S; });
    if (it == m_pending.end())
    {
        throw std::runtime_error("Did not find replacement to mark as done");
    }
    m_done.push_back(std::move(*it));
    m_pending.erase(it);
}
void RewriteManager::WorkList::cleanup()
{
    m_done.clear();
}


RewriteManager::RewriteManager(clang::ASTContext &ACtx, clang::Preprocessor &PP)
    : ACtx(ACtx), LangOpts(ACtx.getLangOpts()), SM(ACtx.getSourceManager()), PP(PP), m_workList(ACtx, SM)
{
}

void RewriteManager::registerStmt(std::unique_ptr<StmtToRewrite> S)
{
    if (!S->replaceS)
    {
        throw std::runtime_error("Must set replaceS");
    }

    if (m_workList.isStmtPending(S->replaceS))
    {
        throw std::runtime_error("This Stmt will already be replaced");
    }

    S->m_mgr = this;
    m_workList.addStmt(std::move(S));
}

RewriteOperation RewriteManager::getReplaced(const clang::Stmt *S)
{
    auto range = SM.getExpansionRange(S->getSourceRange());
    return { range, getExpandedCode(S) };
}

std::vector<RewriteOperation> RewriteManager::getReplacements()
{
    std::vector<RewriteOperation> results;

    for (auto &rangeAndAccesses : m_workList.getRangeToReplacementsMap())
    {
        auto &range = rangeAndAccesses.first;
        auto &accesses = rangeAndAccesses.second;

        // Cannot replace something inside a macro because it would replace all expansions instead of just the selected
        // AST node. So in a first step, get an enclosing statement that is no longer inside a macro.
        // TODO we could keep the original code more clean by not expanding macro args if the whole expansion does not
        // contain the macro arg more than once.
        auto macroS = GetFullMacroStmt(range, accesses[0]->replaceS, ACtx);

        results.push_back(getReplaced(macroS));

        // TODO we could run clang-format on the replacements. this would especially benefit long macro expansions.
    }

    m_workList.cleanup();

    return results;
}

std::string RewriteManager::getExpandedCode(const clang::Stmt *toReplaceS)
{
    // TODO performance optimization. this is parsing way more than required.

    using namespace clang;

    printf("getExpandedCode:\n");
    toReplaceS->dump();

    std::string out;

    auto toReplaceExpStart = SM.getExpansionLoc(toReplaceS->getLocStart());
    auto toReplaceExpEnd = SM.getExpansionLoc(toReplaceS->getLocEnd());
    auto toReplaceSpellStart = SM.getSpellingLoc(toReplaceS->getLocStart());
    auto toReplaceSpellEnd = SM.getSpellingLoc(toReplaceS->getLocEnd());

    auto FID = SM.getFileID(SM.getExpansionLoc(toReplaceS->getLocStart()));

    // The following is inspired by: clang/Rewrite/HTMLRewrite.cpp:HighlightMacros

    // Re-lex the raw token stream into a token buffer.
    std::vector<Token> TokenStream;

    const llvm::MemoryBuffer *FromFile = SM.getBuffer(FID);
    Lexer L(FID, FromFile, SM, PP.getLangOpts());

    // Lex all the tokens in raw mode, to avoid entering #includes or expanding
    // macros.
    while (1)
    {
        Token Tok;
        L.LexFromRawLexer(Tok);

        // If this is a # at the start of a line, discard it from the token stream.
        // We don't want the re-preprocess step to see #defines, #includes or other
        // preprocessor directives.
        if (Tok.is(tok::hash) && Tok.isAtStartOfLine())
            continue;

        // If this is a ## token, change its kind to unknown so that repreprocessing
        // it will not produce an error.
        if (Tok.is(tok::hashhash))
            Tok.setKind(tok::unknown);

        // If this raw token is an identifier, the raw lexer won't have looked up
        // the corresponding identifier info for it.  Do this now so that it will be
        // macro expanded when we re-preprocess it.
        if (Tok.is(tok::raw_identifier))
            PP.LookUpIdentifierInfo(Tok);

        TokenStream.push_back(Tok);

        for (auto &rep : m_workList.getSortedReplacements())
        {
            auto repS = rep->replaceS;
            auto spellLoc = SM.getSpellingLoc(repS->getLocStart());
            if (SM.getSpellingLoc(Tok.getLocation()) == spellLoc)
            {
                //
            }
        }

        if (Tok.is(tok::eof))
            break;
    }

    // Temporarily change the diagnostics object so that we ignore any generated
    // diagnostics from this pass.
    DiagnosticsEngine TmpDiags(PP.getDiagnostics().getDiagnosticIDs(), &PP.getDiagnostics().getDiagnosticOptions(),
                               new IgnoringDiagConsumer);

    // Copy the preprocessor and all of its state.
    auto PPOpts = std::make_shared<PreprocessorOptions>(PP.getPreprocessorOpts());
    LangOptions LO = PP.getLangOpts();
    Preprocessor TmpPP(PPOpts, TmpDiags, LO, SM, PP.getPCMCache(), PP.getHeaderSearchInfo(), PP.getModuleLoader(),
PP.getIdentifierTable().getExternalIdentifierLookup());
    TmpPP.Initialize(PP.getTargetInfo(), PP.getAuxTargetInfo());
    TmpPP.setExternalSource(PP.getExternalSource());
    TmpPP.setPreprocessedOutput(true);

    std::map<const clang::IdentifierInfo *, bool> MacroPreviouslyEnabled;
    for (const auto &m : PP.macros())
    {
        // printf("PREDEF MACRO: %s\n", m.first->getName().str().c_str());
        TmpPP.getMacroDefinition(m.first);

        for (const auto &tmpm : TmpPP.macros())
        {
            if (tmpm.first == m.first)
            {
                auto MD = m.second.getLatest();
                auto MI = MD->getMacroInfo();
                // If this is a recursive call we might be in a macro expansion and the macro might be disabled. We need
                // to enable it for now so that all expansions work. Restore it later.
                MacroPreviouslyEnabled[tmpm.first] = MI->isEnabled();
                if (!MI->isEnabled())
                {
                    MD->getMacroInfo()->EnableMacro();
                }

                // This should not change anything since we just copy data over.
                auto &mutableState = const_cast<std::remove_const<decltype(tmpm.second)>::type &>(tmpm.second);
                mutableState.setLatest(MD);
                break;
            }
        }
    }

    class MacroArgCollector : public clang::PPCallbacks
    {
    public:
        MacroArgCollector(Preprocessor &TmpPP) : TmpPP(TmpPP) {}

        void MacroExpands(const Token &Tok, const MacroDefinition &MD, SourceRange Range, const MacroArgs *Args) override
        {
            if (!Args)
            {
                return;
            }
            printf("GOT MACRO ARGS EXPANSION CALLBACK\n");
            for (int i = 0; i < (int)Args->getNumMacroArguments(); i++)
            {
                auto TokUnex = Args->getUnexpArgument(i);
                // Thats just non-const for a cache, so should be fine.
                auto TokPreExp = const_cast<MacroArgs *>(Args)->getPreExpArgument(i, TmpPP);
                printf("unexp: %s\n", TmpPP.getSpelling(*TokUnex).c_str());
                for (const auto &T : TokPreExp)
                {
                    printf("preexp: %s\n", TmpPP.getSpelling(T).c_str());
                }
            }
        }

        Preprocessor &TmpPP;
    };
TmpPP.addPPCallbacks(std::make_unique<MacroArgCollector>(TmpPP));
    // Instead: collect the macro arg info in the law lexing step above. or do another pass that uses the PP but without expansions.

    /*printf("DUMP MACRO INFO\n");
    for (const auto &m : PP.macros())
        PP.dumpMacroInfo(m.first);
    printf("---\n");
    for (const auto &m : TmpPP.macros())
        TmpPP.dumpMacroInfo(m.first);
    printf("DUMP MACRO INFO END\n");*/

    DiagnosticsEngine *OldDiags = &TmpPP.getDiagnostics();

    // Inform the preprocessor that we don't want comments.
    TmpPP.SetCommentRetentionState(false, false);

    // We don't want pragmas either. Although we filtered out #pragma, removing
    // _Pragma and __pragma is much harder.
    bool PragmasPreviouslyEnabled = TmpPP.getPragmasEnabled();
    TmpPP.setPragmasEnabled(false);

    // Enter the tokens we just lexed.  This will cause them to be macro expanded
    // but won't enter sub-files (because we removed #'s).
    TmpPP.EnterTokenStream(TokenStream, false);

    TokenConcatenation ConcatInfo(TmpPP);

    // Lex all the tokens.
    Token Tok;
    TmpPP.Lex(Tok);

    std::map<SourceLocation, int> slocIdx;

    auto checkReplacement = [&]() {
        for (auto &rep : m_workList.getSortedReplacements())
        {
            // auto rep = r.second.get();
            auto repS = rep->replaceS;
            auto spellLoc = SM.getSpellingLoc(repS->getLocStart());
            // TODO we need to check here if the repS spans the full range (or largest?)
            if (SM.getSpellingLoc(Tok.getLocation()) == spellLoc)
            {
                if (slocIdx[spellLoc] == 7)
                {
                    // replace
                }
                slocIdx[spellLoc]++;

                // Done replacing that one, but have to keep it alive until we're done with it.
                m_workList.markDone(rep);

                printf("[[[\n");
                auto repStr = rep->makeReplaceStr();
                printf("REPLACED: %s ]]]\n", repStr.c_str());
                out += repStr;

                // Skip ahead until after the whole replacement.
                auto repEnd = SM.getSpellingLoc(repS->getLocEnd());
                while (repEnd != SM.getSpellingLoc(Tok.getLocation()))
                {
                    TmpPP.Lex(Tok);
                    assert(!Tok.is(tok::eof) && "End not found");
                }

                // Eat one more since we stopped at the end token and we want to continue after it.
                TmpPP.Lex(Tok);

                return true;
            }
        }
        return false;
    };

    while (Tok.isNot(tok::eof))
    {
        printf("TOKEN: %s\n", TmpPP.getSpelling(Tok).c_str());

        auto TokLoc = Tok.getLocation();
        auto TokExp = SM.getExpansionLoc(TokLoc);
        if (SM.isBeforeInTranslationUnit(toReplaceExpEnd, TokExp))
        {
            // Anything after the Stmt we want to replace is not interesting.
            break;
        }

        // Skip ahead until we are at the expansion start of the Stmt we want to replace.
        if (!SM.isBeforeInTranslationUnit(TokLoc, toReplaceExpStart))
        {
            if (TokLoc.isMacroID())
            {
                // This is the first token of a macro expansion.
                auto LLoc = SM.getExpansionRange(TokLoc);

                // Ignore tokens whose instantiation location was not the main file.
                if (SM.getFileID(LLoc.first) != FID)
                {
                    TmpPP.Lex(Tok);
                    continue;
                }

                assert(SM.getFileID(LLoc.second) == FID &&
                       "Start and end of expansion must be in the same ultimate file!");

                bool stopOutputOnNextToken = false;
                bool toReplaceStartsInMacro = toReplaceExpStart == TokExp;
                bool toReplaceEndsInMacro = toReplaceExpEnd == TokExp;
                bool startedOutput = false;

                Token PrevPrevTok;
                Token PrevTok = Tok;

                while (!Tok.is(tok::eof) && SM.getExpansionLoc(Tok.getLocation()) == LLoc.first)
                {
                    printf("TOKEN (in macro): %s\n", TmpPP.getSpelling(Tok).c_str());

                    auto TokSpell = SM.getSpellingLoc(Tok.getLocation());
                    if (stopOutputOnNextToken)
                    {
                        break;
                    }
                    if (toReplaceEndsInMacro && TokSpell == toReplaceSpellEnd)
                    {
                        stopOutputOnNextToken = true;
                    }

                    if (toReplaceStartsInMacro && !startedOutput)
                    {
                        if (TokSpell == toReplaceSpellStart)
                        {
                            startedOutput = true;
                        }
                        else
                        {
                            TmpPP.Lex(Tok);
                            continue;
                        }
                    }

                    // If the tokens were already space separated, or if they must be to avoid
                    // them being implicitly pasted, add a space between them.
                    if (Tok.hasLeadingSpace() || ConcatInfo.AvoidConcat(PrevPrevTok, PrevTok, Tok))
                        out += ' ';

                    if (checkReplacement())
                    {
                        continue;
                    }

                    out += TmpPP.getSpelling(Tok);
                    TmpPP.Lex(Tok);
                }
                if (stopOutputOnNextToken)
                {
                    break;
                }
            }
            else
            {
                if (checkReplacement())
                {
                    continue;
                }

                // Output original code because we are outside of a replacement.
                out += TmpPP.getSpelling(Tok);
                TmpPP.Lex(Tok);
            }
        }
        else
        {
            TmpPP.Lex(Tok);
        }
    }

    // Restore the preprocessor's old state.
    TmpPP.setDiagnostics(*OldDiags);
    TmpPP.setPragmasEnabled(PragmasPreviouslyEnabled);

    for (const auto &tmpm : TmpPP.macros())
    {
        auto it = MacroPreviouslyEnabled.find(tmpm.first);
        if (it != MacroPreviouslyEnabled.end())
        {
            auto MD = tmpm.second.getLatest();
            auto MI = MD->getMacroInfo();
            if (MI->isEnabled() && !it->second)
            {
                MI->DisableMacro();
            }
            else if (!MI->isEnabled() && it->second)
            {
                MI->EnableMacro();
            }
        }
    }

    return out;
}



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Advanced Rewriting

via cfe-dev

Hi Jonas

Thanks for introducing me to this, I have seen the "Replacement" before, but not clang-refactor.

However it seems to only provide management facilities around rewrite operations and not aid with the rewriting itself. Am I missing something here?

The two core problems for me:

- nesting replacements: When implementing replacements with clang-refactor, I still have to provide replacements that are closed in themselves. I cannot make them depend on others, right?
- macros: clang-refactor only seems to work with spelling locations.

Maybe an even simpler example: Replace all additions with "add(lhs, rhs)". This in itself is very difficult with clang as soon as the Stmts are nested or macros are involved.

Best regards
Rafael


On 16.07.2018 19:06, Jonas Toth via cfe-dev wrote:

Hi Rafael,

wouldn't your usecase be a task for clang-refactor?

Best,  Jonas


Am 16.07.2018 um 17:08 schrieb Rafael·Stahl via cfe-dev:
Hey everyone

The rewriting API of Clang operates on the source code in textual form. The user can use AST nodes to figure out what to replace, but in the end he has to remove and insert snippets in a linear piece of text.

This is very inconvenient when it is required to restructure and nest replacements. The involvement of macros makes a manual process even more difficult. See some recent threads expressing difficulty with the API [1][2].

What do I mean by "nested replacements"? For example in the following:

    int i = x + s->a;

I would want to replace the BinaryOperator with a function call and the MemberExpr with some constant:

    int i = Addition(x, 7);

When keeping the two replacement rules independent of each other, achieving this with the current API is extremely difficult. More so when macros are involved.

I am proposing some kind of helper that aims to solve these issues by providing an interface that offers to directly replace AST nodes and a mechanism to nest AST node replacements - without having to worry about macros.

Potential usage:

- Develop a class that derives from StmtToRewrite to define how replacements should happen:

    class RewriteAdds : public cu::StmtToRewrite
    {
    public:
        std::string makeReplaceStr() const override
        {
            auto binOp = dyn_cast<BinaryOperator>(replaceS);
            return "Addition(" + getMgr()->getReplaced(binOp->getLHS()).strToInsert + ", " +
getMgr()->getReplaced(binOp->getRHS()).strToInsert + ")";
        }
    };

    class RewriteMembs : public cu::StmtToRewrite
    {
    public:
        std::string makeReplaceStr() const override
        {
            return "7";
        }
    };

- Construct a RewriteManager:

    cu::RewriteManager mgr(ACtx, PP);

- Add rewriting operations to the manager:

    // std::vector<const Stmt *> AddStmts = /* matched from binaryOperator() with plus */
    // std::vector<const Stmt *> MembStmts = /* matched from memberExpr() */
    for (const auto &S : AddStmts) mgr.registerStmt<RewriteAdds>(S);
    for (const auto &S : MembStmts) mgr.registerStmt<RewriteMembs>(S);

- Retrieve and apply the results:

    clang::Rewriter rewriter(SM, LangOpts);
    for (const auto &r : mgr.getReplacements()) {
        rewriter.RemoveText(r.rangeToRemove);
        rewriter.InsertText(r.rangeToRemove.getBegin(), r.strToInsert);
    }


At the end of this mail is my low quality code that kind-of implements this. TLDR:

- Build a hierarchy of stmts to replace and keep track of which replacements must be combined
- Move further up in the AST if these replacements are inside a macro
- Recursively lex the file and look for replacements outside-in by spelling locations. Expand any macros that are encountered during this. The re-lexing idea is based on the hint in [3].

The code has the following shortcomings:

- I do not know how to distinguish macro argument expansions within macros. For example in "#define FOO(a) a + a" the two "a"s expand to different AST nodes that could be replaced with different rules. This is an important issue, because it can lead to completely broken code with nesting.
- Limited to Stmts, when Decls should be supported too.
- Very un-optimized with lexing the entire source file many times. Easy to solve, but didn't want to raise the complexity further for now.
- Could keep written code more clean by only expanding macros if required. For example not required if just a macro arg is replaced and all expansions would be the same.


I am very interested in your general thoughts. I'm not very experienced with clang, but this was my vision how I would want to do replacements. Are you interested in getting this into clang? I would need help with ironing out the remaining issues.

-Rafael


[1] http://lists.llvm.org/pipermail/cfe-dev/2018-July/058430.html
[2] http://lists.llvm.org/pipermail/cfe-dev/2018-June/058213.html
[3] http://lists.llvm.org/pipermail/cfe-dev/2017-August/055079.html



----------------------------------------

RewriteManager.h

----------------------------------------

#ifndef CLANGUTIL_REWRITEMANAGER_H
#define CLANGUTIL_REWRITEMANAGER_H

#include "ClangUtil/SourceRangeLess.h"
#include "make_unique.h"
#include "clang/AST/AST.h"
#include <vector>
#include <map>


// TODO extend to decls


namespace cu
{
// Represents a statement in the original AST that should be rewritten. To implement recursive replacements, call
// getMgr()->getReplaced() on any AST node within the makeReplaceStr callback.
class StmtToRewrite
{
    friend class RewriteManager;

public:
    // Returns the enclosing RewriteManager.
    class RewriteManager *getMgr() const;
    // Override this to build a replacement string. Implement recursive replacements with RewriteManager::getReplaced.
    virtual std::string makeReplaceStr() const = 0;

    // The statement to replace.
    const clang::Stmt *replaceS = nullptr;

private:
    RewriteManager *m_mgr;
};

struct RewriteOperation
{
    clang::SourceRange rangeToRemove;
    std::string strToInsert;
};

// A class for managing replacements of AST nodes. It allows to specifically target AST nodes instead of raw source
// locations to enable easy replacements involving macros and nested replacements.
// For extended documentation see: doc/rewriting.md
class RewriteManager
{
public:
    RewriteManager(clang::ASTContext &ACtx, clang::Preprocessor &PP);

    clang::ASTContext &getACtx() const { return ACtx; }

    // Registers a StmtToRewrite for use with getReplacements. Call this on all
    // statements that should be rewritten before calling any rewriting functions.
    void registerStmt(std::unique_ptr<StmtToRewrite> S);

    // Helper for constructing the custom type from a Stmt.
    template <typename T, typename... Args>
    void registerStmt(const clang::Stmt *S, Args... args)
    {
        auto p = std::make_unique<T>(std::forward<Args>(args)...);
        p->replaceS = S;
        registerStmt(std::move(p));
    }

    // Get the full replacement of an AST node. Note that this function removes any replaced statements from the work
    // list, so calling it twice will only replace the first time.
    RewriteOperation getReplaced(const clang::Stmt *S);
    // Get all replacements. These may be fewer than the requested ones because of nesting.
    std::vector<RewriteOperation> getReplacements();

private:
    std::string getExpandedCode(const clang::Stmt *toReplaceS);

private:
    clang::ASTContext &ACtx;
    const clang::LangOptions &LangOpts;
    clang::SourceManager &SM;
    clang::Preprocessor &PP;

    // Manages the pending replacements.
    class WorkList
    {
    public:
        typedef std::map<clang::SourceRange, std::vector<const StmtToRewrite *>> RangeToRepMap;

        WorkList(clang::ASTContext &ACtx, clang::SourceManager &SM);

        bool isStmtPending(const clang::Stmt *S) const;
        void addStmt(std::unique_ptr<StmtToRewrite> S);
        const RangeToRepMap &getRangeToReplacementsMap() const;
        std::vector<const StmtToRewrite *> getSortedReplacements() const;
        void markDone(const StmtToRewrite *S);
        void cleanup();

    private:
        clang::ASTContext &ACtx;
        clang::SourceManager &SM;
        std::vector<std::unique_ptr<StmtToRewrite>> m_pending;
        std::vector<std::unique_ptr<StmtToRewrite>> m_done;
        RangeToRepMap m_rangeToReplacements;
    };

    WorkList m_workList;
};

} // namespace cu

#endif



----------------------------------------

RewriteManager.cpp

----------------------------------------

#include "ClangUtil/RewriteManager.h"
#include "ClangUtil/ASTUtil.h"
#include "clang/Lex/Lexer.h"
#include "clang/Lex/Preprocessor.h"
#include "clang/Lex/PreprocessorOptions.h"
#include "clang/Lex/TokenConcatenation.h"
#include "clang/Lex/MacroArgs.h"


using namespace cu;


// Returns a Stmt that is the first parent of startS whose expansion range is within the given range.
static const clang::Stmt *GetFullMacroStmt(clang::SourceRange range, const clang::Stmt *startS, clang::ASTContext &ACtx)
{
    auto &SM = ACtx.getSourceManager();

    // Walk the tree upwards until ST does no longer expand to within range.
    const clang::Stmt *ST = startS;
    while (true)
    {
        const auto &parents = ACtx.getParents(*ST);
        if (parents.empty())
        {
            break;
        }
        auto childS = ST;
        ST = parents[0].get<clang::Stmt>();
        if (!ST)
        {
            if (auto D = parents[0].get<clang::Decl>())
            {
                const auto &parentsD = ACtx.getParents(*D);
                if (parentsD.empty())
                {
                    break;
                }
                ST = parentsD[0].get<clang::Stmt>();
                if (!ST)
                {
                    break;
                }
            }
            else
            {
                break;
            }
        }

        auto exLocS = SM.getExpansionLoc(ST->getLocStart());
        auto exLocE = SM.getExpansionLoc(ST->getLocEnd());
        if (SM.isBeforeInTranslationUnit(exLocS, range.getBegin()) ||
            SM.isBeforeInTranslationUnit(range.getEnd(), exLocE))
        {
            return childS;
        }
    }

    return nullptr;
}


RewriteManager *StmtToRewrite::getMgr() const
{
    return m_mgr;
}


RewriteManager::WorkList::WorkList(clang::ASTContext &ACtx, clang::SourceManager &SM) : ACtx(ACtx), SM(SM) {}
bool RewriteManager::WorkList::isStmtPending(const clang::Stmt *S) const
{
    for (const auto &r : m_pending)
    {
        if (r->replaceS == S)
        {
            return true;
        }
    }
    return false;
}
void RewriteManager::WorkList::addStmt(std::unique_ptr<StmtToRewrite> S)
{
    // Use the expansion range for maximal replacement flexibility in macros.
    auto replaceRange = SM.getExpansionRange(S->replaceS->getSourceRange());

    // TODO not quite correct.
    /*auto sortRanges = [&](std::vector<const StmtToRewrite *> &vec) {
        std::sort(vec.begin(), vec.end(), [&](const StmtToRewrite *lhs, const StmtToRewrite *rhs) {
            auto lhsRange = SM.getExpansionRange(lhs->replaceS->getSourceRange());
            auto rhsRange = SM.getExpansionRange(rhs->replaceS->getSourceRange());
            return IsContained(rhsRange, lhsRange, SM);
        });
    };*/

    // Establish hierarchical relation between all ranges.
    bool found = false;
    // First, check if this range is within one we already have.
    for (auto &r : m_rangeToReplacements)
    {
        if (IsContained(replaceRange, r.first, SM))
        {
            // Insert in a sorted order.
            for (auto it = r.second.begin(); it != r.second.end(); ++it)
            {
                //auto testRange = SM.getExpansionRange((*it)->replaceS->getSourceRange());
                // if (IsContained(testRange, replaceRange, SM))
                if (IsParent(S->replaceS, (*it)->replaceS, ACtx))
                {
                    r.second.insert(it, S.get());
                    found = true;
                    break;
                }
            }
            if (!found)
            {
                r.second.push_back(S.get());
                found = true;
            }
            break;
        }
    }
    // Not within existing range, add as new top-level range.
    if (!found)
    {
        // Check if any existing ranges are contained within the new one.
        std::vector<const StmtToRewrite *> moveThese;
        auto it = m_rangeToReplacements.begin();
        while (it != m_rangeToReplacements.end())
        {
            if (IsContained(it->first, replaceRange, SM))
            {
                moveThese.insert(moveThese.end(), it->second.begin(), it->second.end());
                it = m_rangeToReplacements.erase(it);
            }
            else
            {
                ++it;
            }
        }
        auto &accesses = m_rangeToReplacements[replaceRange];
        // The order is important here. We want the first element to be the one that spans the full range.
        accesses.push_back(S.get());
        // TODO sort "moveThese".
        accesses.insert(accesses.end(), moveThese.begin(), moveThese.end());
    }

    int count = 0;
    for (const auto &r : m_rangeToReplacements)
    {
        printf("range %i\n", count++);
        for (const auto &a : r.second)
        {
            printf("replacement:\n");
            a->replaceS->dump();
        }
    }

    m_pending.push_back(std::move(S));
}
const RewriteManager::WorkList::RangeToRepMap &RewriteManager::WorkList::getRangeToReplacementsMap() const
{
    return m_rangeToReplacements;
}
std::vector<const StmtToRewrite *> RewriteManager::WorkList::getSortedReplacements() const
{
    std::vector<const StmtToRewrite *> result;
    for (auto &r : m_rangeToReplacements)
    {
        result.insert(result.end(), r.second.begin(), r.second.end());
    }
    return result;
}
void RewriteManager::WorkList::markDone(const StmtToRewrite *S)
{
    // Remove from hierarchy.
    for (auto &r : m_rangeToReplacements)
    {
        r.second.erase(std::remove(r.second.begin(), r.second.end(), S), r.second.end());
    }

    // Move from pending to done list.
    auto it = std::find_if(m_pending.begin(), m_pending.end(),
                           [&](const std::unique_ptr<StmtToRewrite> &rep) { return rep.get() == S; });
    if (it == m_pending.end())
    {
        throw std::runtime_error("Did not find replacement to mark as done");
    }
    m_done.push_back(std::move(*it));
    m_pending.erase(it);
}
void RewriteManager::WorkList::cleanup()
{
    m_done.clear();
}


RewriteManager::RewriteManager(clang::ASTContext &ACtx, clang::Preprocessor &PP)
    : ACtx(ACtx), LangOpts(ACtx.getLangOpts()), SM(ACtx.getSourceManager()), PP(PP), m_workList(ACtx, SM)
{
}

void RewriteManager::registerStmt(std::unique_ptr<StmtToRewrite> S)
{
    if (!S->replaceS)
    {
        throw std::runtime_error("Must set replaceS");
    }

    if (m_workList.isStmtPending(S->replaceS))
    {
        throw std::runtime_error("This Stmt will already be replaced");
    }

    S->m_mgr = this;
    m_workList.addStmt(std::move(S));
}

RewriteOperation RewriteManager::getReplaced(const clang::Stmt *S)
{
    auto range = SM.getExpansionRange(S->getSourceRange());
    return { range, getExpandedCode(S) };
}

std::vector<RewriteOperation> RewriteManager::getReplacements()
{
    std::vector<RewriteOperation> results;

    for (auto &rangeAndAccesses : m_workList.getRangeToReplacementsMap())
    {
        auto &range = rangeAndAccesses.first;
        auto &accesses = rangeAndAccesses.second;

        // Cannot replace something inside a macro because it would replace all expansions instead of just the selected
        // AST node. So in a first step, get an enclosing statement that is no longer inside a macro.
        // TODO we could keep the original code more clean by not expanding macro args if the whole expansion does not
        // contain the macro arg more than once.
        auto macroS = GetFullMacroStmt(range, accesses[0]->replaceS, ACtx);

        results.push_back(getReplaced(macroS));

        // TODO we could run clang-format on the replacements. this would especially benefit long macro expansions.
    }

    m_workList.cleanup();

    return results;
}

std::string RewriteManager::getExpandedCode(const clang::Stmt *toReplaceS)
{
    // TODO performance optimization. this is parsing way more than required.

    using namespace clang;

    printf("getExpandedCode:\n");
    toReplaceS->dump();

    std::string out;

    auto toReplaceExpStart = SM.getExpansionLoc(toReplaceS->getLocStart());
    auto toReplaceExpEnd = SM.getExpansionLoc(toReplaceS->getLocEnd());
    auto toReplaceSpellStart = SM.getSpellingLoc(toReplaceS->getLocStart());
    auto toReplaceSpellEnd = SM.getSpellingLoc(toReplaceS->getLocEnd());

    auto FID = SM.getFileID(SM.getExpansionLoc(toReplaceS->getLocStart()));

    // The following is inspired by: clang/Rewrite/HTMLRewrite.cpp:HighlightMacros

    // Re-lex the raw token stream into a token buffer.
    std::vector<Token> TokenStream;

    const llvm::MemoryBuffer *FromFile = SM.getBuffer(FID);
    Lexer L(FID, FromFile, SM, PP.getLangOpts());

    // Lex all the tokens in raw mode, to avoid entering #includes or expanding
    // macros.
    while (1)
    {
        Token Tok;
        L.LexFromRawLexer(Tok);

        // If this is a # at the start of a line, discard it from the token stream.
        // We don't want the re-preprocess step to see #defines, #includes or other
        // preprocessor directives.
        if (Tok.is(tok::hash) && Tok.isAtStartOfLine())
            continue;

        // If this is a ## token, change its kind to unknown so that repreprocessing
        // it will not produce an error.
        if (Tok.is(tok::hashhash))
            Tok.setKind(tok::unknown);

        // If this raw token is an identifier, the raw lexer won't have looked up
        // the corresponding identifier info for it.  Do this now so that it will be
        // macro expanded when we re-preprocess it.
        if (Tok.is(tok::raw_identifier))
            PP.LookUpIdentifierInfo(Tok);

        TokenStream.push_back(Tok);

        for (auto &rep : m_workList.getSortedReplacements())
        {
            auto repS = rep->replaceS;
            auto spellLoc = SM.getSpellingLoc(repS->getLocStart());
            if (SM.getSpellingLoc(Tok.getLocation()) == spellLoc)
            {
                //
            }
        }

        if (Tok.is(tok::eof))
            break;
    }

    // Temporarily change the diagnostics object so that we ignore any generated
    // diagnostics from this pass.
    DiagnosticsEngine TmpDiags(PP.getDiagnostics().getDiagnosticIDs(), &PP.getDiagnostics().getDiagnosticOptions(),
                               new IgnoringDiagConsumer);

    // Copy the preprocessor and all of its state.
    auto PPOpts = std::make_shared<PreprocessorOptions>(PP.getPreprocessorOpts());
    LangOptions LO = PP.getLangOpts();
    Preprocessor TmpPP(PPOpts, TmpDiags, LO, SM, PP.getPCMCache(), PP.getHeaderSearchInfo(), PP.getModuleLoader(),
PP.getIdentifierTable().getExternalIdentifierLookup());
    TmpPP.Initialize(PP.getTargetInfo(), PP.getAuxTargetInfo());
    TmpPP.setExternalSource(PP.getExternalSource());
    TmpPP.setPreprocessedOutput(true);

    std::map<const clang::IdentifierInfo *, bool> MacroPreviouslyEnabled;
    for (const auto &m : PP.macros())
    {
        // printf("PREDEF MACRO: %s\n", m.first->getName().str().c_str());
        TmpPP.getMacroDefinition(m.first);

        for (const auto &tmpm : TmpPP.macros())
        {
            if (tmpm.first == m.first)
            {
                auto MD = m.second.getLatest();
                auto MI = MD->getMacroInfo();
                // If this is a recursive call we might be in a macro expansion and the macro might be disabled. We need
                // to enable it for now so that all expansions work. Restore it later.
                MacroPreviouslyEnabled[tmpm.first] = MI->isEnabled();
                if (!MI->isEnabled())
                {
                    MD->getMacroInfo()->EnableMacro();
                }

                // This should not change anything since we just copy data over.
                auto &mutableState = const_cast<std::remove_const<decltype(tmpm.second)>::type &>(tmpm.second);
                mutableState.setLatest(MD);
                break;
            }
        }
    }

    class MacroArgCollector : public clang::PPCallbacks
    {
    public:
        MacroArgCollector(Preprocessor &TmpPP) : TmpPP(TmpPP) {}

        void MacroExpands(const Token &Tok, const MacroDefinition &MD, SourceRange Range, const MacroArgs *Args) override
        {
            if (!Args)
            {
                return;
            }
            printf("GOT MACRO ARGS EXPANSION CALLBACK\n");
            for (int i = 0; i < (int)Args->getNumMacroArguments(); i++)
            {
                auto TokUnex = Args->getUnexpArgument(i);
                // Thats just non-const for a cache, so should be fine.
                auto TokPreExp = const_cast<MacroArgs *>(Args)->getPreExpArgument(i, TmpPP);
                printf("unexp: %s\n", TmpPP.getSpelling(*TokUnex).c_str());
                for (const auto &T : TokPreExp)
                {
                    printf("preexp: %s\n", TmpPP.getSpelling(T).c_str());
                }
            }
        }

        Preprocessor &TmpPP;
    };
TmpPP.addPPCallbacks(std::make_unique<MacroArgCollector>(TmpPP));
    // Instead: collect the macro arg info in the law lexing step above. or do another pass that uses the PP but without expansions.

    /*printf("DUMP MACRO INFO\n");
    for (const auto &m : PP.macros())
        PP.dumpMacroInfo(m.first);
    printf("---\n");
    for (const auto &m : TmpPP.macros())
        TmpPP.dumpMacroInfo(m.first);
    printf("DUMP MACRO INFO END\n");*/

    DiagnosticsEngine *OldDiags = &TmpPP.getDiagnostics();

    // Inform the preprocessor that we don't want comments.
    TmpPP.SetCommentRetentionState(false, false);

    // We don't want pragmas either. Although we filtered out #pragma, removing
    // _Pragma and __pragma is much harder.
    bool PragmasPreviouslyEnabled = TmpPP.getPragmasEnabled();
    TmpPP.setPragmasEnabled(false);

    // Enter the tokens we just lexed.  This will cause them to be macro expanded
    // but won't enter sub-files (because we removed #'s).
    TmpPP.EnterTokenStream(TokenStream, false);

    TokenConcatenation ConcatInfo(TmpPP);

    // Lex all the tokens.
    Token Tok;
    TmpPP.Lex(Tok);

    std::map<SourceLocation, int> slocIdx;

    auto checkReplacement = [&]() {
        for (auto &rep : m_workList.getSortedReplacements())
        {
            // auto rep = r.second.get();
            auto repS = rep->replaceS;
            auto spellLoc = SM.getSpellingLoc(repS->getLocStart());
            // TODO we need to check here if the repS spans the full range (or largest?)
            if (SM.getSpellingLoc(Tok.getLocation()) == spellLoc)
            {
                if (slocIdx[spellLoc] == 7)
                {
                    // replace
                }
                slocIdx[spellLoc]++;

                // Done replacing that one, but have to keep it alive until we're done with it.
                m_workList.markDone(rep);

                printf("[[[\n");
                auto repStr = rep->makeReplaceStr();
                printf("REPLACED: %s ]]]\n", repStr.c_str());
                out += repStr;

                // Skip ahead until after the whole replacement.
                auto repEnd = SM.getSpellingLoc(repS->getLocEnd());
                while (repEnd != SM.getSpellingLoc(Tok.getLocation()))
                {
                    TmpPP.Lex(Tok);
                    assert(!Tok.is(tok::eof) && "End not found");
                }

                // Eat one more since we stopped at the end token and we want to continue after it.
                TmpPP.Lex(Tok);

                return true;
            }
        }
        return false;
    };

    while (Tok.isNot(tok::eof))
    {
        printf("TOKEN: %s\n", TmpPP.getSpelling(Tok).c_str());

        auto TokLoc = Tok.getLocation();
        auto TokExp = SM.getExpansionLoc(TokLoc);
        if (SM.isBeforeInTranslationUnit(toReplaceExpEnd, TokExp))
        {
            // Anything after the Stmt we want to replace is not interesting.
            break;
        }

        // Skip ahead until we are at the expansion start of the Stmt we want to replace.
        if (!SM.isBeforeInTranslationUnit(TokLoc, toReplaceExpStart))
        {
            if (TokLoc.isMacroID())
            {
                // This is the first token of a macro expansion.
                auto LLoc = SM.getExpansionRange(TokLoc);

                // Ignore tokens whose instantiation location was not the main file.
                if (SM.getFileID(LLoc.first) != FID)
                {
                    TmpPP.Lex(Tok);
                    continue;
                }

                assert(SM.getFileID(LLoc.second) == FID &&
                       "Start and end of expansion must be in the same ultimate file!");

                bool stopOutputOnNextToken = false;
                bool toReplaceStartsInMacro = toReplaceExpStart == TokExp;
                bool toReplaceEndsInMacro = toReplaceExpEnd == TokExp;
                bool startedOutput = false;

                Token PrevPrevTok;
                Token PrevTok = Tok;

                while (!Tok.is(tok::eof) && SM.getExpansionLoc(Tok.getLocation()) == LLoc.first)
                {
                    printf("TOKEN (in macro): %s\n", TmpPP.getSpelling(Tok).c_str());

                    auto TokSpell = SM.getSpellingLoc(Tok.getLocation());
                    if (stopOutputOnNextToken)
                    {
                        break;
                    }
                    if (toReplaceEndsInMacro && TokSpell == toReplaceSpellEnd)
                    {
                        stopOutputOnNextToken = true;
                    }

                    if (toReplaceStartsInMacro && !startedOutput)
                    {
                        if (TokSpell == toReplaceSpellStart)
                        {
                            startedOutput = true;
                        }
                        else
                        {
                            TmpPP.Lex(Tok);
                            continue;
                        }
                    }

                    // If the tokens were already space separated, or if they must be to avoid
                    // them being implicitly pasted, add a space between them.
                    if (Tok.hasLeadingSpace() || ConcatInfo.AvoidConcat(PrevPrevTok, PrevTok, Tok))
                        out += ' ';

                    if (checkReplacement())
                    {
                        continue;
                    }

                    out += TmpPP.getSpelling(Tok);
                    TmpPP.Lex(Tok);
                }
                if (stopOutputOnNextToken)
                {
                    break;
                }
            }
            else
            {
                if (checkReplacement())
                {
                    continue;
                }

                // Output original code because we are outside of a replacement.
                out += TmpPP.getSpelling(Tok);
                TmpPP.Lex(Tok);
            }
        }
        else
        {
            TmpPP.Lex(Tok);
        }
    }

    // Restore the preprocessor's old state.
    TmpPP.setDiagnostics(*OldDiags);
    TmpPP.setPragmasEnabled(PragmasPreviouslyEnabled);

    for (const auto &tmpm : TmpPP.macros())
    {
        auto it = MacroPreviouslyEnabled.find(tmpm.first);
        if (it != MacroPreviouslyEnabled.end())
        {
            auto MD = tmpm.second.getLatest();
            auto MI = MD->getMacroInfo();
            if (MI->isEnabled() && !it->second)
            {
                MI->DisableMacro();
            }
            else if (!MI->isEnabled() && it->second)
            {
                MI->EnableMacro();
            }
        }
    }

    return out;
}



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

smime.p7s (7K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Advanced Rewriting

via cfe-dev

Hi Rafael,

I did read into clang-refactor a while ago but unfortunatly could not follow that up. If I recall correctly its about source-to-source transformation (as you said) and aims at implementing the primitive refactorings that exist (e.g. extract-method, extract-variable, ....).

Rewriting itself should happen with the normal tooling framework.

(https://clang.llvm.org/docs/RefactoringEngine.html)

Maybe the implementers of the existing code can give better comments on you proposal (and might have considered a similar solution to yours already).

+Alex Lorenz

All the best, Jonas


Am 17.07.2018 um 14:46 schrieb Rafael·Stahl:

Hi Jonas

Thanks for introducing me to this, I have seen the "Replacement" before, but not clang-refactor.

However it seems to only provide management facilities around rewrite operations and not aid with the rewriting itself. Am I missing something here?

The two core problems for me:

- nesting replacements: When implementing replacements with clang-refactor, I still have to provide replacements that are closed in themselves. I cannot make them depend on others, right?
- macros: clang-refactor only seems to work with spelling locations.

Maybe an even simpler example: Replace all additions with "add(lhs, rhs)". This in itself is very difficult with clang as soon as the Stmts are nested or macros are involved.

Best regards
Rafael


On 16.07.2018 19:06, Jonas Toth via cfe-dev wrote:

Hi Rafael,

wouldn't your usecase be a task for clang-refactor?

Best,  Jonas


Am 16.07.2018 um 17:08 schrieb Rafael·Stahl via cfe-dev:
Hey everyone

The rewriting API of Clang operates on the source code in textual form. The user can use AST nodes to figure out what to replace, but in the end he has to remove and insert snippets in a linear piece of text.

This is very inconvenient when it is required to restructure and nest replacements. The involvement of macros makes a manual process even more difficult. See some recent threads expressing difficulty with the API [1][2].

What do I mean by "nested replacements"? For example in the following:

    int i = x + s->a;

I would want to replace the BinaryOperator with a function call and the MemberExpr with some constant:

    int i = Addition(x, 7);

When keeping the two replacement rules independent of each other, achieving this with the current API is extremely difficult. More so when macros are involved.

I am proposing some kind of helper that aims to solve these issues by providing an interface that offers to directly replace AST nodes and a mechanism to nest AST node replacements - without having to worry about macros.

Potential usage:

- Develop a class that derives from StmtToRewrite to define how replacements should happen:

    class RewriteAdds : public cu::StmtToRewrite
    {
    public:
        std::string makeReplaceStr() const override
        {
            auto binOp = dyn_cast<BinaryOperator>(replaceS);
            return "Addition(" + getMgr()->getReplaced(binOp->getLHS()).strToInsert + ", " +
getMgr()->getReplaced(binOp->getRHS()).strToInsert + ")";
        }
    };

    class RewriteMembs : public cu::StmtToRewrite
    {
    public:
        std::string makeReplaceStr() const override
        {
            return "7";
        }
    };

- Construct a RewriteManager:

    cu::RewriteManager mgr(ACtx, PP);

- Add rewriting operations to the manager:

    // std::vector<const Stmt *> AddStmts = /* matched from binaryOperator() with plus */
    // std::vector<const Stmt *> MembStmts = /* matched from memberExpr() */
    for (const auto &S : AddStmts) mgr.registerStmt<RewriteAdds>(S);
    for (const auto &S : MembStmts) mgr.registerStmt<RewriteMembs>(S);

- Retrieve and apply the results:

    clang::Rewriter rewriter(SM, LangOpts);
    for (const auto &r : mgr.getReplacements()) {
        rewriter.RemoveText(r.rangeToRemove);
        rewriter.InsertText(r.rangeToRemove.getBegin(), r.strToInsert);
    }


At the end of this mail is my low quality code that kind-of implements this. TLDR:

- Build a hierarchy of stmts to replace and keep track of which replacements must be combined
- Move further up in the AST if these replacements are inside a macro
- Recursively lex the file and look for replacements outside-in by spelling locations. Expand any macros that are encountered during this. The re-lexing idea is based on the hint in [3].

The code has the following shortcomings:

- I do not know how to distinguish macro argument expansions within macros. For example in "#define FOO(a) a + a" the two "a"s expand to different AST nodes that could be replaced with different rules. This is an important issue, because it can lead to completely broken code with nesting.
- Limited to Stmts, when Decls should be supported too.
- Very un-optimized with lexing the entire source file many times. Easy to solve, but didn't want to raise the complexity further for now.
- Could keep written code more clean by only expanding macros if required. For example not required if just a macro arg is replaced and all expansions would be the same.


I am very interested in your general thoughts. I'm not very experienced with clang, but this was my vision how I would want to do replacements. Are you interested in getting this into clang? I would need help with ironing out the remaining issues.

-Rafael


[1] http://lists.llvm.org/pipermail/cfe-dev/2018-July/058430.html
[2] http://lists.llvm.org/pipermail/cfe-dev/2018-June/058213.html
[3] http://lists.llvm.org/pipermail/cfe-dev/2017-August/055079.html



----------------------------------------

RewriteManager.h

----------------------------------------

#ifndef CLANGUTIL_REWRITEMANAGER_H
#define CLANGUTIL_REWRITEMANAGER_H

#include "ClangUtil/SourceRangeLess.h"
#include "make_unique.h"
#include "clang/AST/AST.h"
#include <vector>
#include <map>


// TODO extend to decls


namespace cu
{
// Represents a statement in the original AST that should be rewritten. To implement recursive replacements, call
// getMgr()->getReplaced() on any AST node within the makeReplaceStr callback.
class StmtToRewrite
{
    friend class RewriteManager;

public:
    // Returns the enclosing RewriteManager.
    class RewriteManager *getMgr() const;
    // Override this to build a replacement string. Implement recursive replacements with RewriteManager::getReplaced.
    virtual std::string makeReplaceStr() const = 0;

    // The statement to replace.
    const clang::Stmt *replaceS = nullptr;

private:
    RewriteManager *m_mgr;
};

struct RewriteOperation
{
    clang::SourceRange rangeToRemove;
    std::string strToInsert;
};

// A class for managing replacements of AST nodes. It allows to specifically target AST nodes instead of raw source
// locations to enable easy replacements involving macros and nested replacements.
// For extended documentation see: doc/rewriting.md
class RewriteManager
{
public:
    RewriteManager(clang::ASTContext &ACtx, clang::Preprocessor &PP);

    clang::ASTContext &getACtx() const { return ACtx; }

    // Registers a StmtToRewrite for use with getReplacements. Call this on all
    // statements that should be rewritten before calling any rewriting functions.
    void registerStmt(std::unique_ptr<StmtToRewrite> S);

    // Helper for constructing the custom type from a Stmt.
    template <typename T, typename... Args>
    void registerStmt(const clang::Stmt *S, Args... args)
    {
        auto p = std::make_unique<T>(std::forward<Args>(args)...);
        p->replaceS = S;
        registerStmt(std::move(p));
    }

    // Get the full replacement of an AST node. Note that this function removes any replaced statements from the work
    // list, so calling it twice will only replace the first time.
    RewriteOperation getReplaced(const clang::Stmt *S);
    // Get all replacements. These may be fewer than the requested ones because of nesting.
    std::vector<RewriteOperation> getReplacements();

private:
    std::string getExpandedCode(const clang::Stmt *toReplaceS);

private:
    clang::ASTContext &ACtx;
    const clang::LangOptions &LangOpts;
    clang::SourceManager &SM;
    clang::Preprocessor &PP;

    // Manages the pending replacements.
    class WorkList
    {
    public:
        typedef std::map<clang::SourceRange, std::vector<const StmtToRewrite *>> RangeToRepMap;

        WorkList(clang::ASTContext &ACtx, clang::SourceManager &SM);

        bool isStmtPending(const clang::Stmt *S) const;
        void addStmt(std::unique_ptr<StmtToRewrite> S);
        const RangeToRepMap &getRangeToReplacementsMap() const;
        std::vector<const StmtToRewrite *> getSortedReplacements() const;
        void markDone(const StmtToRewrite *S);
        void cleanup();

    private:
        clang::ASTContext &ACtx;
        clang::SourceManager &SM;
        std::vector<std::unique_ptr<StmtToRewrite>> m_pending;
        std::vector<std::unique_ptr<StmtToRewrite>> m_done;
        RangeToRepMap m_rangeToReplacements;
    };

    WorkList m_workList;
};

} // namespace cu

#endif



----------------------------------------

RewriteManager.cpp

----------------------------------------

#include "ClangUtil/RewriteManager.h"
#include "ClangUtil/ASTUtil.h"
#include "clang/Lex/Lexer.h"
#include "clang/Lex/Preprocessor.h"
#include "clang/Lex/PreprocessorOptions.h"
#include "clang/Lex/TokenConcatenation.h"
#include "clang/Lex/MacroArgs.h"


using namespace cu;


// Returns a Stmt that is the first parent of startS whose expansion range is within the given range.
static const clang::Stmt *GetFullMacroStmt(clang::SourceRange range, const clang::Stmt *startS, clang::ASTContext &ACtx)
{
    auto &SM = ACtx.getSourceManager();

    // Walk the tree upwards until ST does no longer expand to within range.
    const clang::Stmt *ST = startS;
    while (true)
    {
        const auto &parents = ACtx.getParents(*ST);
        if (parents.empty())
        {
            break;
        }
        auto childS = ST;
        ST = parents[0].get<clang::Stmt>();
        if (!ST)
        {
            if (auto D = parents[0].get<clang::Decl>())
            {
                const auto &parentsD = ACtx.getParents(*D);
                if (parentsD.empty())
                {
                    break;
                }
                ST = parentsD[0].get<clang::Stmt>();
                if (!ST)
                {
                    break;
                }
            }
            else
            {
                break;
            }
        }

        auto exLocS = SM.getExpansionLoc(ST->getLocStart());
        auto exLocE = SM.getExpansionLoc(ST->getLocEnd());
        if (SM.isBeforeInTranslationUnit(exLocS, range.getBegin()) ||
            SM.isBeforeInTranslationUnit(range.getEnd(), exLocE))
        {
            return childS;
        }
    }

    return nullptr;
}


RewriteManager *StmtToRewrite::getMgr() const
{
    return m_mgr;
}


RewriteManager::WorkList::WorkList(clang::ASTContext &ACtx, clang::SourceManager &SM) : ACtx(ACtx), SM(SM) {}
bool RewriteManager::WorkList::isStmtPending(const clang::Stmt *S) const
{
    for (const auto &r : m_pending)
    {
        if (r->replaceS == S)
        {
            return true;
        }
    }
    return false;
}
void RewriteManager::WorkList::addStmt(std::unique_ptr<StmtToRewrite> S)
{
    // Use the expansion range for maximal replacement flexibility in macros.
    auto replaceRange = SM.getExpansionRange(S->replaceS->getSourceRange());

    // TODO not quite correct.
    /*auto sortRanges = [&](std::vector<const StmtToRewrite *> &vec) {
        std::sort(vec.begin(), vec.end(), [&](const StmtToRewrite *lhs, const StmtToRewrite *rhs) {
            auto lhsRange = SM.getExpansionRange(lhs->replaceS->getSourceRange());
            auto rhsRange = SM.getExpansionRange(rhs->replaceS->getSourceRange());
            return IsContained(rhsRange, lhsRange, SM);
        });
    };*/

    // Establish hierarchical relation between all ranges.
    bool found = false;
    // First, check if this range is within one we already have.
    for (auto &r : m_rangeToReplacements)
    {
        if (IsContained(replaceRange, r.first, SM))
        {
            // Insert in a sorted order.
            for (auto it = r.second.begin(); it != r.second.end(); ++it)
            {
                //auto testRange = SM.getExpansionRange((*it)->replaceS->getSourceRange());
                // if (IsContained(testRange, replaceRange, SM))
                if (IsParent(S->replaceS, (*it)->replaceS, ACtx))
                {
                    r.second.insert(it, S.get());
                    found = true;
                    break;
                }
            }
            if (!found)
            {
                r.second.push_back(S.get());
                found = true;
            }
            break;
        }
    }
    // Not within existing range, add as new top-level range.
    if (!found)
    {
        // Check if any existing ranges are contained within the new one.
        std::vector<const StmtToRewrite *> moveThese;
        auto it = m_rangeToReplacements.begin();
        while (it != m_rangeToReplacements.end())
        {
            if (IsContained(it->first, replaceRange, SM))
            {
                moveThese.insert(moveThese.end(), it->second.begin(), it->second.end());
                it = m_rangeToReplacements.erase(it);
            }
            else
            {
                ++it;
            }
        }
        auto &accesses = m_rangeToReplacements[replaceRange];
        // The order is important here. We want the first element to be the one that spans the full range.
        accesses.push_back(S.get());
        // TODO sort "moveThese".
        accesses.insert(accesses.end(), moveThese.begin(), moveThese.end());
    }

    int count = 0;
    for (const auto &r : m_rangeToReplacements)
    {
        printf("range %i\n", count++);
        for (const auto &a : r.second)
        {
            printf("replacement:\n");
            a->replaceS->dump();
        }
    }

    m_pending.push_back(std::move(S));
}
const RewriteManager::WorkList::RangeToRepMap &RewriteManager::WorkList::getRangeToReplacementsMap() const
{
    return m_rangeToReplacements;
}
std::vector<const StmtToRewrite *> RewriteManager::WorkList::getSortedReplacements() const
{
    std::vector<const StmtToRewrite *> result;
    for (auto &r : m_rangeToReplacements)
    {
        result.insert(result.end(), r.second.begin(), r.second.end());
    }
    return result;
}
void RewriteManager::WorkList::markDone(const StmtToRewrite *S)
{
    // Remove from hierarchy.
    for (auto &r : m_rangeToReplacements)
    {
        r.second.erase(std::remove(r.second.begin(), r.second.end(), S), r.second.end());
    }

    // Move from pending to done list.
    auto it = std::find_if(m_pending.begin(), m_pending.end(),
                           [&](const std::unique_ptr<StmtToRewrite> &rep) { return rep.get() == S; });
    if (it == m_pending.end())
    {
        throw std::runtime_error("Did not find replacement to mark as done");
    }
    m_done.push_back(std::move(*it));
    m_pending.erase(it);
}
void RewriteManager::WorkList::cleanup()
{
    m_done.clear();
}


RewriteManager::RewriteManager(clang::ASTContext &ACtx, clang::Preprocessor &PP)
    : ACtx(ACtx), LangOpts(ACtx.getLangOpts()), SM(ACtx.getSourceManager()), PP(PP), m_workList(ACtx, SM)
{
}

void RewriteManager::registerStmt(std::unique_ptr<StmtToRewrite> S)
{
    if (!S->replaceS)
    {
        throw std::runtime_error("Must set replaceS");
    }

    if (m_workList.isStmtPending(S->replaceS))
    {
        throw std::runtime_error("This Stmt will already be replaced");
    }

    S->m_mgr = this;
    m_workList.addStmt(std::move(S));
}

RewriteOperation RewriteManager::getReplaced(const clang::Stmt *S)
{
    auto range = SM.getExpansionRange(S->getSourceRange());
    return { range, getExpandedCode(S) };
}

std::vector<RewriteOperation> RewriteManager::getReplacements()
{
    std::vector<RewriteOperation> results;

    for (auto &rangeAndAccesses : m_workList.getRangeToReplacementsMap())
    {
        auto &range = rangeAndAccesses.first;
        auto &accesses = rangeAndAccesses.second;

        // Cannot replace something inside a macro because it would replace all expansions instead of just the selected
        // AST node. So in a first step, get an enclosing statement that is no longer inside a macro.
        // TODO we could keep the original code more clean by not expanding macro args if the whole expansion does not
        // contain the macro arg more than once.
        auto macroS = GetFullMacroStmt(range, accesses[0]->replaceS, ACtx);

        results.push_back(getReplaced(macroS));

        // TODO we could run clang-format on the replacements. this would especially benefit long macro expansions.
    }

    m_workList.cleanup();

    return results;
}

std::string RewriteManager::getExpandedCode(const clang::Stmt *toReplaceS)
{
    // TODO performance optimization. this is parsing way more than required.

    using namespace clang;

    printf("getExpandedCode:\n");
    toReplaceS->dump();

    std::string out;

    auto toReplaceExpStart = SM.getExpansionLoc(toReplaceS->getLocStart());
    auto toReplaceExpEnd = SM.getExpansionLoc(toReplaceS->getLocEnd());
    auto toReplaceSpellStart = SM.getSpellingLoc(toReplaceS->getLocStart());
    auto toReplaceSpellEnd = SM.getSpellingLoc(toReplaceS->getLocEnd());

    auto FID = SM.getFileID(SM.getExpansionLoc(toReplaceS->getLocStart()));

    // The following is inspired by: clang/Rewrite/HTMLRewrite.cpp:HighlightMacros

    // Re-lex the raw token stream into a token buffer.
    std::vector<Token> TokenStream;

    const llvm::MemoryBuffer *FromFile = SM.getBuffer(FID);
    Lexer L(FID, FromFile, SM, PP.getLangOpts());

    // Lex all the tokens in raw mode, to avoid entering #includes or expanding
    // macros.
    while (1)
    {
        Token Tok;
        L.LexFromRawLexer(Tok);

        // If this is a # at the start of a line, discard it from the token stream.
        // We don't want the re-preprocess step to see #defines, #includes or other
        // preprocessor directives.
        if (Tok.is(tok::hash) && Tok.isAtStartOfLine())
            continue;

        // If this is a ## token, change its kind to unknown so that repreprocessing
        // it will not produce an error.
        if (Tok.is(tok::hashhash))
            Tok.setKind(tok::unknown);

        // If this raw token is an identifier, the raw lexer won't have looked up
        // the corresponding identifier info for it.  Do this now so that it will be
        // macro expanded when we re-preprocess it.
        if (Tok.is(tok::raw_identifier))
            PP.LookUpIdentifierInfo(Tok);

        TokenStream.push_back(Tok);

        for (auto &rep : m_workList.getSortedReplacements())
        {
            auto repS = rep->replaceS;
            auto spellLoc = SM.getSpellingLoc(repS->getLocStart());
            if (SM.getSpellingLoc(Tok.getLocation()) == spellLoc)
            {
                //
            }
        }

        if (Tok.is(tok::eof))
            break;
    }

    // Temporarily change the diagnostics object so that we ignore any generated
    // diagnostics from this pass.
    DiagnosticsEngine TmpDiags(PP.getDiagnostics().getDiagnosticIDs(), &PP.getDiagnostics().getDiagnosticOptions(),
                               new IgnoringDiagConsumer);

    // Copy the preprocessor and all of its state.
    auto PPOpts = std::make_shared<PreprocessorOptions>(PP.getPreprocessorOpts());
    LangOptions LO = PP.getLangOpts();
    Preprocessor TmpPP(PPOpts, TmpDiags, LO, SM, PP.getPCMCache(), PP.getHeaderSearchInfo(), PP.getModuleLoader(),
PP.getIdentifierTable().getExternalIdentifierLookup());
    TmpPP.Initialize(PP.getTargetInfo(), PP.getAuxTargetInfo());
    TmpPP.setExternalSource(PP.getExternalSource());
    TmpPP.setPreprocessedOutput(true);

    std::map<const clang::IdentifierInfo *, bool> MacroPreviouslyEnabled;
    for (const auto &m : PP.macros())
    {
        // printf("PREDEF MACRO: %s\n", m.first->getName().str().c_str());
        TmpPP.getMacroDefinition(m.first);

        for (const auto &tmpm : TmpPP.macros())
        {
            if (tmpm.first == m.first)
            {
                auto MD = m.second.getLatest();
                auto MI = MD->getMacroInfo();
                // If this is a recursive call we might be in a macro expansion and the macro might be disabled. We need
                // to enable it for now so that all expansions work. Restore it later.
                MacroPreviouslyEnabled[tmpm.first] = MI->isEnabled();
                if (!MI->isEnabled())
                {
                    MD->getMacroInfo()->EnableMacro();
                }

                // This should not change anything since we just copy data over.
                auto &mutableState = const_cast<std::remove_const<decltype(tmpm.second)>::type &>(tmpm.second);
                mutableState.setLatest(MD);
                break;
            }
        }
    }

    class MacroArgCollector : public clang::PPCallbacks
    {
    public:
        MacroArgCollector(Preprocessor &TmpPP) : TmpPP(TmpPP) {}

        void MacroExpands(const Token &Tok, const MacroDefinition &MD, SourceRange Range, const MacroArgs *Args) override
        {
            if (!Args)
            {
                return;
            }
            printf("GOT MACRO ARGS EXPANSION CALLBACK\n");
            for (int i = 0; i < (int)Args->getNumMacroArguments(); i++)
            {
                auto TokUnex = Args->getUnexpArgument(i);
                // Thats just non-const for a cache, so should be fine.
                auto TokPreExp = const_cast<MacroArgs *>(Args)->getPreExpArgument(i, TmpPP);
                printf("unexp: %s\n", TmpPP.getSpelling(*TokUnex).c_str());
                for (const auto &T : TokPreExp)
                {
                    printf("preexp: %s\n", TmpPP.getSpelling(T).c_str());
                }
            }
        }

        Preprocessor &TmpPP;
    };
TmpPP.addPPCallbacks(std::make_unique<MacroArgCollector>(TmpPP));
    // Instead: collect the macro arg info in the law lexing step above. or do another pass that uses the PP but without expansions.

    /*printf("DUMP MACRO INFO\n");
    for (const auto &m : PP.macros())
        PP.dumpMacroInfo(m.first);
    printf("---\n");
    for (const auto &m : TmpPP.macros())
        TmpPP.dumpMacroInfo(m.first);
    printf("DUMP MACRO INFO END\n");*/

    DiagnosticsEngine *OldDiags = &TmpPP.getDiagnostics();

    // Inform the preprocessor that we don't want comments.
    TmpPP.SetCommentRetentionState(false, false);

    // We don't want pragmas either. Although we filtered out #pragma, removing
    // _Pragma and __pragma is much harder.
    bool PragmasPreviouslyEnabled = TmpPP.getPragmasEnabled();
    TmpPP.setPragmasEnabled(false);

    // Enter the tokens we just lexed.  This will cause them to be macro expanded
    // but won't enter sub-files (because we removed #'s).
    TmpPP.EnterTokenStream(TokenStream, false);

    TokenConcatenation ConcatInfo(TmpPP);

    // Lex all the tokens.
    Token Tok;
    TmpPP.Lex(Tok);

    std::map<SourceLocation, int> slocIdx;

    auto checkReplacement = [&]() {
        for (auto &rep : m_workList.getSortedReplacements())
        {
            // auto rep = r.second.get();
            auto repS = rep->replaceS;
            auto spellLoc = SM.getSpellingLoc(repS->getLocStart());
            // TODO we need to check here if the repS spans the full range (or largest?)
            if (SM.getSpellingLoc(Tok.getLocation()) == spellLoc)
            {
                if (slocIdx[spellLoc] == 7)
                {
                    // replace
                }
                slocIdx[spellLoc]++;

                // Done replacing that one, but have to keep it alive until we're done with it.
                m_workList.markDone(rep);

                printf("[[[\n");
                auto repStr = rep->makeReplaceStr();
                printf("REPLACED: %s ]]]\n", repStr.c_str());
                out += repStr;

                // Skip ahead until after the whole replacement.
                auto repEnd = SM.getSpellingLoc(repS->getLocEnd());
                while (repEnd != SM.getSpellingLoc(Tok.getLocation()))
                {
                    TmpPP.Lex(Tok);
                    assert(!Tok.is(tok::eof) && "End not found");
                }

                // Eat one more since we stopped at the end token and we want to continue after it.
                TmpPP.Lex(Tok);

                return true;
            }
        }
        return false;
    };

    while (Tok.isNot(tok::eof))
    {
        printf("TOKEN: %s\n", TmpPP.getSpelling(Tok).c_str());

        auto TokLoc = Tok.getLocation();
        auto TokExp = SM.getExpansionLoc(TokLoc);
        if (SM.isBeforeInTranslationUnit(toReplaceExpEnd, TokExp))
        {
            // Anything after the Stmt we want to replace is not interesting.
            break;
        }

        // Skip ahead until we are at the expansion start of the Stmt we want to replace.
        if (!SM.isBeforeInTranslationUnit(TokLoc, toReplaceExpStart))
        {
            if (TokLoc.isMacroID())
            {
                // This is the first token of a macro expansion.
                auto LLoc = SM.getExpansionRange(TokLoc);

                // Ignore tokens whose instantiation location was not the main file.
                if (SM.getFileID(LLoc.first) != FID)
                {
                    TmpPP.Lex(Tok);
                    continue;
                }

                assert(SM.getFileID(LLoc.second) == FID &&
                       "Start and end of expansion must be in the same ultimate file!");

                bool stopOutputOnNextToken = false;
                bool toReplaceStartsInMacro = toReplaceExpStart == TokExp;
                bool toReplaceEndsInMacro = toReplaceExpEnd == TokExp;
                bool startedOutput = false;

                Token PrevPrevTok;
                Token PrevTok = Tok;

                while (!Tok.is(tok::eof) && SM.getExpansionLoc(Tok.getLocation()) == LLoc.first)
                {
                    printf("TOKEN (in macro): %s\n", TmpPP.getSpelling(Tok).c_str());

                    auto TokSpell = SM.getSpellingLoc(Tok.getLocation());
                    if (stopOutputOnNextToken)
                    {
                        break;
                    }
                    if (toReplaceEndsInMacro && TokSpell == toReplaceSpellEnd)
                    {
                        stopOutputOnNextToken = true;
                    }

                    if (toReplaceStartsInMacro && !startedOutput)
                    {
                        if (TokSpell == toReplaceSpellStart)
                        {
                            startedOutput = true;
                        }
                        else
                        {
                            TmpPP.Lex(Tok);
                            continue;
                        }
                    }

                    // If the tokens were already space separated, or if they must be to avoid
                    // them being implicitly pasted, add a space between them.
                    if (Tok.hasLeadingSpace() || ConcatInfo.AvoidConcat(PrevPrevTok, PrevTok, Tok))
                        out += ' ';

                    if (checkReplacement())
                    {
                        continue;
                    }

                    out += TmpPP.getSpelling(Tok);
                    TmpPP.Lex(Tok);
                }
                if (stopOutputOnNextToken)
                {
                    break;
                }
            }
            else
            {
                if (checkReplacement())
                {
                    continue;
                }

                // Output original code because we are outside of a replacement.
                out += TmpPP.getSpelling(Tok);
                TmpPP.Lex(Tok);
            }
        }
        else
        {
            TmpPP.Lex(Tok);
        }
    }

    // Restore the preprocessor's old state.
    TmpPP.setDiagnostics(*OldDiags);
    TmpPP.setPragmasEnabled(PragmasPreviouslyEnabled);

    for (const auto &tmpm : TmpPP.macros())
    {
        auto it = MacroPreviouslyEnabled.find(tmpm.first);
        if (it != MacroPreviouslyEnabled.end())
        {
            auto MD = tmpm.second.getLatest();
            auto MI = MD->getMacroInfo();
            if (MI->isEnabled() && !it->second)
            {
                MI->DisableMacro();
            }
            else if (!MI->isEnabled() && it->second)
            {
                MI->EnableMacro();
            }
        }
    }

    return out;
}



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Advanced Rewriting

via cfe-dev

Hey everyone

the two major limitations are resolved now:

- The macro argument issue
- Support for replacing Decls using clang::ast_type_traits::DynTypedNode

The macro issue vanished by itself once I figured a little unintuitive detail out: The SourceLocations from the original AST and the ones from the new Preprocessor Lexer did not compare equal even though they refer to the exact same location.

I believe this is because expansion SrcLocs have a kind of pointer identity representation and equality just compares raw representations, such that even when they point to the same locations, they don't compare equal. What worked for me was a kind of "deep" comparison:

bool clutil::DeepSrcLocEqual(clang::SourceLocation lhs, clang::SourceLocation rhs, const clang::SourceManager &SM)
{
    if (lhs == rhs)
        return true;

    if (SM.getExpansionLoc(lhs) != SM.getExpansionLoc(rhs))
        return false;
    if (SM.getSpellingLoc(lhs) != SM.getSpellingLoc(rhs))
        return false;

    clang::SourceLocation lhsMacro, rhsMacro;
    if (SM.isMacroArgExpansion(lhs, &lhsMacro))
    {
        if (!SM.isMacroArgExpansion(rhs, &rhsMacro))
            return false;
        if (!DeepSrcLocEqual(lhsMacro, rhsMacro, SM))
            return false;
    }

    return true;
}

Attached is the cleaned up rewriter that I ended up with now. Still, if there is any interest, I'd be happy to contribute this back to clangs libraries, but I would require some feedback how this fits into existing facilities.

Best regards
Rafael

On 17.07.18 16:19, Jonas Toth wrote:

Hi Rafael,

I did read into clang-refactor a while ago but unfortunatly could not follow that up. If I recall correctly its about source-to-source transformation (as you said) and aims at implementing the primitive refactorings that exist (e.g. extract-method, extract-variable, ....).

Rewriting itself should happen with the normal tooling framework.

(https://clang.llvm.org/docs/RefactoringEngine.html)

Maybe the implementers of the existing code can give better comments on you proposal (and might have considered a similar solution to yours already).

+Alex Lorenz

All the best, Jonas


Am 17.07.2018 um 14:46 schrieb Rafael·Stahl:

Hi Jonas

Thanks for introducing me to this, I have seen the "Replacement" before, but not clang-refactor.

However it seems to only provide management facilities around rewrite operations and not aid with the rewriting itself. Am I missing something here?

The two core problems for me:

- nesting replacements: When implementing replacements with clang-refactor, I still have to provide replacements that are closed in themselves. I cannot make them depend on others, right?
- macros: clang-refactor only seems to work with spelling locations.

Maybe an even simpler example: Replace all additions with "add(lhs, rhs)". This in itself is very difficult with clang as soon as the Stmts are nested or macros are involved.

Best regards
Rafael


On 16.07.2018 19:06, Jonas Toth via cfe-dev wrote:

Hi Rafael,

wouldn't your usecase be a task for clang-refactor?

Best,  Jonas


Am 16.07.2018 um 17:08 schrieb Rafael·Stahl via cfe-dev:
Hey everyone

The rewriting API of Clang operates on the source code in textual form. The user can use AST nodes to figure out what to replace, but in the end he has to remove and insert snippets in a linear piece of text.

This is very inconvenient when it is required to restructure and nest replacements. The involvement of macros makes a manual process even more difficult. See some recent threads expressing difficulty with the API [1][2].

What do I mean by "nested replacements"? For example in the following:

    int i = x + s->a;

I would want to replace the BinaryOperator with a function call and the MemberExpr with some constant:

    int i = Addition(x, 7);

When keeping the two replacement rules independent of each other, achieving this with the current API is extremely difficult. More so when macros are involved.

I am proposing some kind of helper that aims to solve these issues by providing an interface that offers to directly replace AST nodes and a mechanism to nest AST node replacements - without having to worry about macros.

Potential usage:

- Develop a class that derives from StmtToRewrite to define how replacements should happen:

    class RewriteAdds : public cu::StmtToRewrite
    {
    public:
        std::string makeReplaceStr() const override
        {
            auto binOp = dyn_cast<BinaryOperator>(replaceS);
            return "Addition(" + getMgr()->getReplaced(binOp->getLHS()).strToInsert + ", " +
getMgr()->getReplaced(binOp->getRHS()).strToInsert + ")";
        }
    };

    class RewriteMembs : public cu::StmtToRewrite
    {
    public:
        std::string makeReplaceStr() const override
        {
            return "7";
        }
    };

- Construct a RewriteManager:

    cu::RewriteManager mgr(ACtx, PP);

- Add rewriting operations to the manager:

    // std::vector<const Stmt *> AddStmts = /* matched from binaryOperator() with plus */
    // std::vector<const Stmt *> MembStmts = /* matched from memberExpr() */
    for (const auto &S : AddStmts) mgr.registerStmt<RewriteAdds>(S);
    for (const auto &S : MembStmts) mgr.registerStmt<RewriteMembs>(S);

- Retrieve and apply the results:

    clang::Rewriter rewriter(SM, LangOpts);
    for (const auto &r : mgr.getReplacements()) {
        rewriter.RemoveText(r.rangeToRemove);
        rewriter.InsertText(r.rangeToRemove.getBegin(), r.strToInsert);
    }


At the end of this mail is my low quality code that kind-of implements this. TLDR:

- Build a hierarchy of stmts to replace and keep track of which replacements must be combined
- Move further up in the AST if these replacements are inside a macro
- Recursively lex the file and look for replacements outside-in by spelling locations. Expand any macros that are encountered during this. The re-lexing idea is based on the hint in [3].

The code has the following shortcomings:

- I do not know how to distinguish macro argument expansions within macros. For example in "#define FOO(a) a + a" the two "a"s expand to different AST nodes that could be replaced with different rules. This is an important issue, because it can lead to completely broken code with nesting.
- Limited to Stmts, when Decls should be supported too.
- Very un-optimized with lexing the entire source file many times. Easy to solve, but didn't want to raise the complexity further for now.
- Could keep written code more clean by only expanding macros if required. For example not required if just a macro arg is replaced and all expansions would be the same.


I am very interested in your general thoughts. I'm not very experienced with clang, but this was my vision how I would want to do replacements. Are you interested in getting this into clang? I would need help with ironing out the remaining issues.

-Rafael


[1] http://lists.llvm.org/pipermail/cfe-dev/2018-July/058430.html
[2] http://lists.llvm.org/pipermail/cfe-dev/2018-June/058213.html
[3] http://lists.llvm.org/pipermail/cfe-dev/2017-August/055079.html



----------------------------------------

RewriteManager.h

----------------------------------------

#ifndef CLANGUTIL_REWRITEMANAGER_H
#define CLANGUTIL_REWRITEMANAGER_H

#include "ClangUtil/SourceRangeLess.h"
#include "make_unique.h"
#include "clang/AST/AST.h"
#include <vector>
#include <map>


// TODO extend to decls


namespace cu
{
// Represents a statement in the original AST that should be rewritten. To implement recursive replacements, call
// getMgr()->getReplaced() on any AST node within the makeReplaceStr callback.
class StmtToRewrite
{
    friend class RewriteManager;

public:
    // Returns the enclosing RewriteManager.
    class RewriteManager *getMgr() const;
    // Override this to build a replacement string. Implement recursive replacements with RewriteManager::getReplaced.
    virtual std::string makeReplaceStr() const = 0;

    // The statement to replace.
    const clang::Stmt *replaceS = nullptr;

private:
    RewriteManager *m_mgr;
};

struct RewriteOperation
{
    clang::SourceRange rangeToRemove;
    std::string strToInsert;
};

// A class for managing replacements of AST nodes. It allows to specifically target AST nodes instead of raw source
// locations to enable easy replacements involving macros and nested replacements.
// For extended documentation see: doc/rewriting.md
class RewriteManager
{
public:
    RewriteManager(clang::ASTContext &ACtx, clang::Preprocessor &PP);

    clang::ASTContext &getACtx() const { return ACtx; }

    // Registers a StmtToRewrite for use with getReplacements. Call this on all
    // statements that should be rewritten before calling any rewriting functions.
    void registerStmt(std::unique_ptr<StmtToRewrite> S);

    // Helper for constructing the custom type from a Stmt.
    template <typename T, typename... Args>
    void registerStmt(const clang::Stmt *S, Args... args)
    {
        auto p = std::make_unique<T>(std::forward<Args>(args)...);
        p->replaceS = S;
        registerStmt(std::move(p));
    }

    // Get the full replacement of an AST node. Note that this function removes any replaced statements from the work
    // list, so calling it twice will only replace the first time.
    RewriteOperation getReplaced(const clang::Stmt *S);
    // Get all replacements. These may be fewer than the requested ones because of nesting.
    std::vector<RewriteOperation> getReplacements();

private:
    std::string getExpandedCode(const clang::Stmt *toReplaceS);

private:
    clang::ASTContext &ACtx;
    const clang::LangOptions &LangOpts;
    clang::SourceManager &SM;
    clang::Preprocessor &PP;

    // Manages the pending replacements.
    class WorkList
    {
    public:
        typedef std::map<clang::SourceRange, std::vector<const StmtToRewrite *>> RangeToRepMap;

        WorkList(clang::ASTContext &ACtx, clang::SourceManager &SM);

        bool isStmtPending(const clang::Stmt *S) const;
        void addStmt(std::unique_ptr<StmtToRewrite> S);
        const RangeToRepMap &getRangeToReplacementsMap() const;
        std::vector<const StmtToRewrite *> getSortedReplacements() const;
        void markDone(const StmtToRewrite *S);
        void cleanup();

    private:
        clang::ASTContext &ACtx;
        clang::SourceManager &SM;
        std::vector<std::unique_ptr<StmtToRewrite>> m_pending;
        std::vector<std::unique_ptr<StmtToRewrite>> m_done;
        RangeToRepMap m_rangeToReplacements;
    };

    WorkList m_workList;
};

} // namespace cu

#endif



----------------------------------------

RewriteManager.cpp

----------------------------------------

#include "ClangUtil/RewriteManager.h"
#include "ClangUtil/ASTUtil.h"
#include "clang/Lex/Lexer.h"
#include "clang/Lex/Preprocessor.h"
#include "clang/Lex/PreprocessorOptions.h"
#include "clang/Lex/TokenConcatenation.h"
#include "clang/Lex/MacroArgs.h"


using namespace cu;


// Returns a Stmt that is the first parent of startS whose expansion range is within the given range.
static const clang::Stmt *GetFullMacroStmt(clang::SourceRange range, const clang::Stmt *startS, clang::ASTContext &ACtx)
{
    auto &SM = ACtx.getSourceManager();

    // Walk the tree upwards until ST does no longer expand to within range.
    const clang::Stmt *ST = startS;
    while (true)
    {
        const auto &parents = ACtx.getParents(*ST);
        if (parents.empty())
        {
            break;
        }
        auto childS = ST;
        ST = parents[0].get<clang::Stmt>();
        if (!ST)
        {
            if (auto D = parents[0].get<clang::Decl>())
            {
                const auto &parentsD = ACtx.getParents(*D);
                if (parentsD.empty())
                {
                    break;
                }
                ST = parentsD[0].get<clang::Stmt>();
                if (!ST)
                {
                    break;
                }
            }
            else
            {
                break;
            }
        }

        auto exLocS = SM.getExpansionLoc(ST->getLocStart());
        auto exLocE = SM.getExpansionLoc(ST->getLocEnd());
        if (SM.isBeforeInTranslationUnit(exLocS, range.getBegin()) ||
            SM.isBeforeInTranslationUnit(range.getEnd(), exLocE))
        {
            return childS;
        }
    }

    return nullptr;
}


RewriteManager *StmtToRewrite::getMgr() const
{
    return m_mgr;
}


RewriteManager::WorkList::WorkList(clang::ASTContext &ACtx, clang::SourceManager &SM) : ACtx(ACtx), SM(SM) {}
bool RewriteManager::WorkList::isStmtPending(const clang::Stmt *S) const
{
    for (const auto &r : m_pending)
    {
        if (r->replaceS == S)
        {
            return true;
        }
    }
    return false;
}
void RewriteManager::WorkList::addStmt(std::unique_ptr<StmtToRewrite> S)
{
    // Use the expansion range for maximal replacement flexibility in macros.
    auto replaceRange = SM.getExpansionRange(S->replaceS->getSourceRange());

    // TODO not quite correct.
    /*auto sortRanges = [&](std::vector<const StmtToRewrite *> &vec) {
        std::sort(vec.begin(), vec.end(), [&](const StmtToRewrite *lhs, const StmtToRewrite *rhs) {
            auto lhsRange = SM.getExpansionRange(lhs->replaceS->getSourceRange());
            auto rhsRange = SM.getExpansionRange(rhs->replaceS->getSourceRange());
            return IsContained(rhsRange, lhsRange, SM);
        });
    };*/

    // Establish hierarchical relation between all ranges.
    bool found = false;
    // First, check if this range is within one we already have.
    for (auto &r : m_rangeToReplacements)
    {
        if (IsContained(replaceRange, r.first, SM))
        {
            // Insert in a sorted order.
            for (auto it = r.second.begin(); it != r.second.end(); ++it)
            {
                //auto testRange = SM.getExpansionRange((*it)->replaceS->getSourceRange());
                // if (IsContained(testRange, replaceRange, SM))
                if (IsParent(S->replaceS, (*it)->replaceS, ACtx))
                {
                    r.second.insert(it, S.get());
                    found = true;
                    break;
                }
            }
            if (!found)
            {
                r.second.push_back(S.get());
                found = true;
            }
            break;
        }
    }
    // Not within existing range, add as new top-level range.
    if (!found)
    {
        // Check if any existing ranges are contained within the new one.
        std::vector<const StmtToRewrite *> moveThese;
        auto it = m_rangeToReplacements.begin();
        while (it != m_rangeToReplacements.end())
        {
            if (IsContained(it->first, replaceRange, SM))
            {
                moveThese.insert(moveThese.end(), it->second.begin(), it->second.end());
                it = m_rangeToReplacements.erase(it);
            }
            else
            {
                ++it;
            }
        }
        auto &accesses = m_rangeToReplacements[replaceRange];
        // The order is important here. We want the first element to be the one that spans the full range.
        accesses.push_back(S.get());
        // TODO sort "moveThese".
        accesses.insert(accesses.end(), moveThese.begin(), moveThese.end());
    }

    int count = 0;
    for (const auto &r : m_rangeToReplacements)
    {
        printf("range %i\n", count++);
        for (const auto &a : r.second)
        {
            printf("replacement:\n");
            a->replaceS->dump();
        }
    }

    m_pending.push_back(std::move(S));
}
const RewriteManager::WorkList::RangeToRepMap &RewriteManager::WorkList::getRangeToReplacementsMap() const
{
    return m_rangeToReplacements;
}
std::vector<const StmtToRewrite *> RewriteManager::WorkList::getSortedReplacements() const
{
    std::vector<const StmtToRewrite *> result;
    for (auto &r : m_rangeToReplacements)
    {
        result.insert(result.end(), r.second.begin(), r.second.end());
    }
    return result;
}
void RewriteManager::WorkList::markDone(const StmtToRewrite *S)
{
    // Remove from hierarchy.
    for (auto &r : m_rangeToReplacements)
    {
        r.second.erase(std::remove(r.second.begin(), r.second.end(), S), r.second.end());
    }

    // Move from pending to done list.
    auto it = std::find_if(m_pending.begin(), m_pending.end(),
                           [&](const std::unique_ptr<StmtToRewrite> &rep) { return rep.get() == S; });
    if (it == m_pending.end())
    {
        throw std::runtime_error("Did not find replacement to mark as done");
    }
    m_done.push_back(std::move(*it));
    m_pending.erase(it);
}
void RewriteManager::WorkList::cleanup()
{
    m_done.clear();
}


RewriteManager::RewriteManager(clang::ASTContext &ACtx, clang::Preprocessor &PP)
    : ACtx(ACtx), LangOpts(ACtx.getLangOpts()), SM(ACtx.getSourceManager()), PP(PP), m_workList(ACtx, SM)
{
}

void RewriteManager::registerStmt(std::unique_ptr<StmtToRewrite> S)
{
    if (!S->replaceS)
    {
        throw std::runtime_error("Must set replaceS");
    }

    if (m_workList.isStmtPending(S->replaceS))
    {
        throw std::runtime_error("This Stmt will already be replaced");
    }

    S->m_mgr = this;
    m_workList.addStmt(std::move(S));
}

RewriteOperation RewriteManager::getReplaced(const clang::Stmt *S)
{
    auto range = SM.getExpansionRange(S->getSourceRange());
    return { range, getExpandedCode(S) };
}

std::vector<RewriteOperation> RewriteManager::getReplacements()
{
    std::vector<RewriteOperation> results;

    for (auto &rangeAndAccesses : m_workList.getRangeToReplacementsMap())
    {
        auto &range = rangeAndAccesses.first;
        auto &accesses = rangeAndAccesses.second;

        // Cannot replace something inside a macro because it would replace all expansions instead of just the selected
        // AST node. So in a first step, get an enclosing statement that is no longer inside a macro.
        // TODO we could keep the original code more clean by not expanding macro args if the whole expansion does not
        // contain the macro arg more than once.
        auto macroS = GetFullMacroStmt(range, accesses[0]->replaceS, ACtx);

        results.push_back(getReplaced(macroS));

        // TODO we could run clang-format on the replacements. this would especially benefit long macro expansions.
    }

    m_workList.cleanup();

    return results;
}

std::string RewriteManager::getExpandedCode(const clang::Stmt *toReplaceS)
{
    // TODO performance optimization. this is parsing way more than required.

    using namespace clang;

    printf("getExpandedCode:\n");
    toReplaceS->dump();

    std::string out;

    auto toReplaceExpStart = SM.getExpansionLoc(toReplaceS->getLocStart());
    auto toReplaceExpEnd = SM.getExpansionLoc(toReplaceS->getLocEnd());
    auto toReplaceSpellStart = SM.getSpellingLoc(toReplaceS->getLocStart());
    auto toReplaceSpellEnd = SM.getSpellingLoc(toReplaceS->getLocEnd());

    auto FID = SM.getFileID(SM.getExpansionLoc(toReplaceS->getLocStart()));

    // The following is inspired by: clang/Rewrite/HTMLRewrite.cpp:HighlightMacros

    // Re-lex the raw token stream into a token buffer.
    std::vector<Token> TokenStream;

    const llvm::MemoryBuffer *FromFile = SM.getBuffer(FID);
    Lexer L(FID, FromFile, SM, PP.getLangOpts());

    // Lex all the tokens in raw mode, to avoid entering #includes or expanding
    // macros.
    while (1)
    {
        Token Tok;
        L.LexFromRawLexer(Tok);

        // If this is a # at the start of a line, discard it from the token stream.
        // We don't want the re-preprocess step to see #defines, #includes or other
        // preprocessor directives.
        if (Tok.is(tok::hash) && Tok.isAtStartOfLine())
            continue;

        // If this is a ## token, change its kind to unknown so that repreprocessing
        // it will not produce an error.
        if (Tok.is(tok::hashhash))
            Tok.setKind(tok::unknown);

        // If this raw token is an identifier, the raw lexer won't have looked up
        // the corresponding identifier info for it.  Do this now so that it will be
        // macro expanded when we re-preprocess it.
        if (Tok.is(tok::raw_identifier))
            PP.LookUpIdentifierInfo(Tok);

        TokenStream.push_back(Tok);

        for (auto &rep : m_workList.getSortedReplacements())
        {
            auto repS = rep->replaceS;
            auto spellLoc = SM.getSpellingLoc(repS->getLocStart());
            if (SM.getSpellingLoc(Tok.getLocation()) == spellLoc)
            {
                //
            }
        }

        if (Tok.is(tok::eof))
            break;
    }

    // Temporarily change the diagnostics object so that we ignore any generated
    // diagnostics from this pass.
    DiagnosticsEngine TmpDiags(PP.getDiagnostics().getDiagnosticIDs(), &PP.getDiagnostics().getDiagnosticOptions(),
                               new IgnoringDiagConsumer);

    // Copy the preprocessor and all of its state.
    auto PPOpts = std::make_shared<PreprocessorOptions>(PP.getPreprocessorOpts());
    LangOptions LO = PP.getLangOpts();
    Preprocessor TmpPP(PPOpts, TmpDiags, LO, SM, PP.getPCMCache(), PP.getHeaderSearchInfo(), PP.getModuleLoader(),
PP.getIdentifierTable().getExternalIdentifierLookup());
    TmpPP.Initialize(PP.getTargetInfo(), PP.getAuxTargetInfo());
    TmpPP.setExternalSource(PP.getExternalSource());
    TmpPP.setPreprocessedOutput(true);

    std::map<const clang::IdentifierInfo *, bool> MacroPreviouslyEnabled;
    for (const auto &m : PP.macros())
    {
        // printf("PREDEF MACRO: %s\n", m.first->getName().str().c_str());
        TmpPP.getMacroDefinition(m.first);

        for (const auto &tmpm : TmpPP.macros())
        {
            if (tmpm.first == m.first)
            {
                auto MD = m.second.getLatest();
                auto MI = MD->getMacroInfo();
                // If this is a recursive call we might be in a macro expansion and the macro might be disabled. We need
                // to enable it for now so that all expansions work. Restore it later.
                MacroPreviouslyEnabled[tmpm.first] = MI->isEnabled();
                if (!MI->isEnabled())
                {
                    MD->getMacroInfo()->EnableMacro();
                }

                // This should not change anything since we just copy data over.
                auto &mutableState = const_cast<std::remove_const<decltype(tmpm.second)>::type &>(tmpm.second);
                mutableState.setLatest(MD);
                break;
            }
        }
    }

    class MacroArgCollector : public clang::PPCallbacks
    {
    public:
        MacroArgCollector(Preprocessor &TmpPP) : TmpPP(TmpPP) {}

        void MacroExpands(const Token &Tok, const MacroDefinition &MD, SourceRange Range, const MacroArgs *Args) override
        {
            if (!Args)
            {
                return;
            }
            printf("GOT MACRO ARGS EXPANSION CALLBACK\n");
            for (int i = 0; i < (int)Args->getNumMacroArguments(); i++)
            {
                auto TokUnex = Args->getUnexpArgument(i);
                // Thats just non-const for a cache, so should be fine.
                auto TokPreExp = const_cast<MacroArgs *>(Args)->getPreExpArgument(i, TmpPP);
                printf("unexp: %s\n", TmpPP.getSpelling(*TokUnex).c_str());
                for (const auto &T : TokPreExp)
                {
                    printf("preexp: %s\n", TmpPP.getSpelling(T).c_str());
                }
            }
        }

        Preprocessor &TmpPP;
    };
TmpPP.addPPCallbacks(std::make_unique<MacroArgCollector>(TmpPP));
    // Instead: collect the macro arg info in the law lexing step above. or do another pass that uses the PP but without expansions.

    /*printf("DUMP MACRO INFO\n");
    for (const auto &m : PP.macros())
        PP.dumpMacroInfo(m.first);
    printf("---\n");
    for (const auto &m : TmpPP.macros())
        TmpPP.dumpMacroInfo(m.first);
    printf("DUMP MACRO INFO END\n");*/

    DiagnosticsEngine *OldDiags = &TmpPP.getDiagnostics();

    // Inform the preprocessor that we don't want comments.
    TmpPP.SetCommentRetentionState(false, false);

    // We don't want pragmas either. Although we filtered out #pragma, removing
    // _Pragma and __pragma is much harder.
    bool PragmasPreviouslyEnabled = TmpPP.getPragmasEnabled();
    TmpPP.setPragmasEnabled(false);

    // Enter the tokens we just lexed.  This will cause them to be macro expanded
    // but won't enter sub-files (because we removed #'s).
    TmpPP.EnterTokenStream(TokenStream, false);

    TokenConcatenation ConcatInfo(TmpPP);

    // Lex all the tokens.
    Token Tok;
    TmpPP.Lex(Tok);

    std::map<SourceLocation, int> slocIdx;

    auto checkReplacement = [&]() {
        for (auto &rep : m_workList.getSortedReplacements())
        {
            // auto rep = r.second.get();
            auto repS = rep->replaceS;
            auto spellLoc = SM.getSpellingLoc(repS->getLocStart());
            // TODO we need to check here if the repS spans the full range (or largest?)
            if (SM.getSpellingLoc(Tok.getLocation()) == spellLoc)
            {
                if (slocIdx[spellLoc] == 7)
                {
                    // replace
                }
                slocIdx[spellLoc]++;

                // Done replacing that one, but have to keep it alive until we're done with it.
                m_workList.markDone(rep);

                printf("[[[\n");
                auto repStr = rep->makeReplaceStr();
                printf("REPLACED: %s ]]]\n", repStr.c_str());
                out += repStr;

                // Skip ahead until after the whole replacement.
                auto repEnd = SM.getSpellingLoc(repS->getLocEnd());
                while (repEnd != SM.getSpellingLoc(Tok.getLocation()))
                {
                    TmpPP.Lex(Tok);
                    assert(!Tok.is(tok::eof) && "End not found");
                }

                // Eat one more since we stopped at the end token and we want to continue after it.
                TmpPP.Lex(Tok);

                return true;
            }
        }
        return false;
    };

    while (Tok.isNot(tok::eof))
    {
        printf("TOKEN: %s\n", TmpPP.getSpelling(Tok).c_str());

        auto TokLoc = Tok.getLocation();
        auto TokExp = SM.getExpansionLoc(TokLoc);
        if (SM.isBeforeInTranslationUnit(toReplaceExpEnd, TokExp))
        {
            // Anything after the Stmt we want to replace is not interesting.
            break;
        }

        // Skip ahead until we are at the expansion start of the Stmt we want to replace.
        if (!SM.isBeforeInTranslationUnit(TokLoc, toReplaceExpStart))
        {
            if (TokLoc.isMacroID())
            {
                // This is the first token of a macro expansion.
                auto LLoc = SM.getExpansionRange(TokLoc);

                // Ignore tokens whose instantiation location was not the main file.
                if (SM.getFileID(LLoc.first) != FID)
                {
                    TmpPP.Lex(Tok);
                    continue;
                }

                assert(SM.getFileID(LLoc.second) == FID &&
                       "Start and end of expansion must be in the same ultimate file!");

                bool stopOutputOnNextToken = false;
                bool toReplaceStartsInMacro = toReplaceExpStart == TokExp;
                bool toReplaceEndsInMacro = toReplaceExpEnd == TokExp;
                bool startedOutput = false;

                Token PrevPrevTok;
                Token PrevTok = Tok;

                while (!Tok.is(tok::eof) && SM.getExpansionLoc(Tok.getLocation()) == LLoc.first)
                {
                    printf("TOKEN (in macro): %s\n", TmpPP.getSpelling(Tok).c_str());

                    auto TokSpell = SM.getSpellingLoc(Tok.getLocation());
                    if (stopOutputOnNextToken)
                    {
                        break;
                    }
                    if (toReplaceEndsInMacro && TokSpell == toReplaceSpellEnd)
                    {
                        stopOutputOnNextToken = true;
                    }

                    if (toReplaceStartsInMacro && !startedOutput)
                    {
                        if (TokSpell == toReplaceSpellStart)
                        {
                            startedOutput = true;
                        }
                        else
                        {
                            TmpPP.Lex(Tok);
                            continue;
                        }
                    }

                    // If the tokens were already space separated, or if they must be to avoid
                    // them being implicitly pasted, add a space between them.
                    if (Tok.hasLeadingSpace() || ConcatInfo.AvoidConcat(PrevPrevTok, PrevTok, Tok))
                        out += ' ';

                    if (checkReplacement())
                    {
                        continue;
                    }

                    out += TmpPP.getSpelling(Tok);
                    TmpPP.Lex(Tok);
                }
                if (stopOutputOnNextToken)
                {
                    break;
                }
            }
            else
            {
                if (checkReplacement())
                {
                    continue;
                }

                // Output original code because we are outside of a replacement.
                out += TmpPP.getSpelling(Tok);
                TmpPP.Lex(Tok);
            }
        }
        else
        {
            TmpPP.Lex(Tok);
        }
    }

    // Restore the preprocessor's old state.
    TmpPP.setDiagnostics(*OldDiags);
    TmpPP.setPragmasEnabled(PragmasPreviouslyEnabled);

    for (const auto &tmpm : TmpPP.macros())
    {
        auto it = MacroPreviouslyEnabled.find(tmpm.first);
        if (it != MacroPreviouslyEnabled.end())
        {
            auto MD = tmpm.second.getLatest();
            auto MI = MD->getMacroInfo();
            if (MI->isEnabled() && !it->second)
            {
                MI->DisableMacro();
            }
            else if (!MI->isEnabled() && it->second)
            {
                MI->EnableMacro();
            }
        }
    }

    return out;
}



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

RewriteManager.h (3K) Download Attachment
RewriteManager.cpp (13K) Download Attachment
smime.p7s (7K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Advanced Rewriting

via cfe-dev

Hi Rafael,

nice to see your progress!

In my opinion these advancements should go in the Tooling and Rewrite directly in clang but integrate with the existing facilities. It would be inconvenient two have two separate ways of doing rewritings.

What I am curious about is how to resolve the macro-situation. If the refactoring will work directly on the AST and ignore macros, how is the refactoring translated into real code again?

Having an functional way of doing the refactorings with just transforming the AST, e.g. replacing one node with a new subtree would be great and elegant on the AST side, but how would we move back from AST to code?

Best, Jonas

Am 23.10.18 um 17:32 schrieb Rafael·Stahl:

Hey everyone

the two major limitations are resolved now:

- The macro argument issue
- Support for replacing Decls using clang::ast_type_traits::DynTypedNode

The macro issue vanished by itself once I figured a little unintuitive detail out: The SourceLocations from the original AST and the ones from the new Preprocessor Lexer did not compare equal even though they refer to the exact same location.

I believe this is because expansion SrcLocs have a kind of pointer identity representation and equality just compares raw representations, such that even when they point to the same locations, they don't compare equal. What worked for me was a kind of "deep" comparison:

bool clutil::DeepSrcLocEqual(clang::SourceLocation lhs, clang::SourceLocation rhs, const clang::SourceManager &SM)
{
    if (lhs == rhs)
        return true;

    if (SM.getExpansionLoc(lhs) != SM.getExpansionLoc(rhs))
        return false;
    if (SM.getSpellingLoc(lhs) != SM.getSpellingLoc(rhs))
        return false;

    clang::SourceLocation lhsMacro, rhsMacro;
    if (SM.isMacroArgExpansion(lhs, &lhsMacro))
    {
        if (!SM.isMacroArgExpansion(rhs, &rhsMacro))
            return false;
        if (!DeepSrcLocEqual(lhsMacro, rhsMacro, SM))
            return false;
    }

    return true;
}

Attached is the cleaned up rewriter that I ended up with now. Still, if there is any interest, I'd be happy to contribute this back to clangs libraries, but I would require some feedback how this fits into existing facilities.

Best regards
Rafael

On 17.07.18 16:19, Jonas Toth wrote:

Hi Rafael,

I did read into clang-refactor a while ago but unfortunatly could not follow that up. If I recall correctly its about source-to-source transformation (as you said) and aims at implementing the primitive refactorings that exist (e.g. extract-method, extract-variable, ....).

Rewriting itself should happen with the normal tooling framework.

(https://clang.llvm.org/docs/RefactoringEngine.html)

Maybe the implementers of the existing code can give better comments on you proposal (and might have considered a similar solution to yours already).

+Alex Lorenz

All the best, Jonas


Am 17.07.2018 um 14:46 schrieb Rafael·Stahl:

Hi Jonas

Thanks for introducing me to this, I have seen the "Replacement" before, but not clang-refactor.

However it seems to only provide management facilities around rewrite operations and not aid with the rewriting itself. Am I missing something here?

The two core problems for me:

- nesting replacements: When implementing replacements with clang-refactor, I still have to provide replacements that are closed in themselves. I cannot make them depend on others, right?
- macros: clang-refactor only seems to work with spelling locations.

Maybe an even simpler example: Replace all additions with "add(lhs, rhs)". This in itself is very difficult with clang as soon as the Stmts are nested or macros are involved.

Best regards
Rafael


On 16.07.2018 19:06, Jonas Toth via cfe-dev wrote:

Hi Rafael,

wouldn't your usecase be a task for clang-refactor?

Best,  Jonas


Am 16.07.2018 um 17:08 schrieb Rafael·Stahl via cfe-dev:
Hey everyone

The rewriting API of Clang operates on the source code in textual form. The user can use AST nodes to figure out what to replace, but in the end he has to remove and insert snippets in a linear piece of text.

This is very inconvenient when it is required to restructure and nest replacements. The involvement of macros makes a manual process even more difficult. See some recent threads expressing difficulty with the API [1][2].

What do I mean by "nested replacements"? For example in the following:

    int i = x + s->a;

I would want to replace the BinaryOperator with a function call and the MemberExpr with some constant:

    int i = Addition(x, 7);

When keeping the two replacement rules independent of each other, achieving this with the current API is extremely difficult. More so when macros are involved.

I am proposing some kind of helper that aims to solve these issues by providing an interface that offers to directly replace AST nodes and a mechanism to nest AST node replacements - without having to worry about macros.

Potential usage:

- Develop a class that derives from StmtToRewrite to define how replacements should happen:

    class RewriteAdds : public cu::StmtToRewrite
    {
    public:
        std::string makeReplaceStr() const override
        {
            auto binOp = dyn_cast<BinaryOperator>(replaceS);
            return "Addition(" + getMgr()->getReplaced(binOp->getLHS()).strToInsert + ", " +
getMgr()->getReplaced(binOp->getRHS()).strToInsert + ")";
        }
    };

    class RewriteMembs : public cu::StmtToRewrite
    {
    public:
        std::string makeReplaceStr() const override
        {
            return "7";
        }
    };

- Construct a RewriteManager:

    cu::RewriteManager mgr(ACtx, PP);

- Add rewriting operations to the manager:

    // std::vector<const Stmt *> AddStmts = /* matched from binaryOperator() with plus */
    // std::vector<const Stmt *> MembStmts = /* matched from memberExpr() */
    for (const auto &S : AddStmts) mgr.registerStmt<RewriteAdds>(S);
    for (const auto &S : MembStmts) mgr.registerStmt<RewriteMembs>(S);

- Retrieve and apply the results:

    clang::Rewriter rewriter(SM, LangOpts);
    for (const auto &r : mgr.getReplacements()) {
        rewriter.RemoveText(r.rangeToRemove);
        rewriter.InsertText(r.rangeToRemove.getBegin(), r.strToInsert);
    }


At the end of this mail is my low quality code that kind-of implements this. TLDR:

- Build a hierarchy of stmts to replace and keep track of which replacements must be combined
- Move further up in the AST if these replacements are inside a macro
- Recursively lex the file and look for replacements outside-in by spelling locations. Expand any macros that are encountered during this. The re-lexing idea is based on the hint in [3].

The code has the following shortcomings:

- I do not know how to distinguish macro argument expansions within macros. For example in "#define FOO(a) a + a" the two "a"s expand to different AST nodes that could be replaced with different rules. This is an important issue, because it can lead to completely broken code with nesting.
- Limited to Stmts, when Decls should be supported too.
- Very un-optimized with lexing the entire source file many times. Easy to solve, but didn't want to raise the complexity further for now.
- Could keep written code more clean by only expanding macros if required. For example not required if just a macro arg is replaced and all expansions would be the same.


I am very interested in your general thoughts. I'm not very experienced with clang, but this was my vision how I would want to do replacements. Are you interested in getting this into clang? I would need help with ironing out the remaining issues.

-Rafael


[1] http://lists.llvm.org/pipermail/cfe-dev/2018-July/058430.html
[2] http://lists.llvm.org/pipermail/cfe-dev/2018-June/058213.html
[3] http://lists.llvm.org/pipermail/cfe-dev/2017-August/055079.html



----------------------------------------

RewriteManager.h

----------------------------------------

#ifndef CLANGUTIL_REWRITEMANAGER_H
#define CLANGUTIL_REWRITEMANAGER_H

#include "ClangUtil/SourceRangeLess.h"
#include "make_unique.h"
#include "clang/AST/AST.h"
#include <vector>
#include <map>


// TODO extend to decls


namespace cu
{
// Represents a statement in the original AST that should be rewritten. To implement recursive replacements, call
// getMgr()->getReplaced() on any AST node within the makeReplaceStr callback.
class StmtToRewrite
{
    friend class RewriteManager;

public:
    // Returns the enclosing RewriteManager.
    class RewriteManager *getMgr() const;
    // Override this to build a replacement string. Implement recursive replacements with RewriteManager::getReplaced.
    virtual std::string makeReplaceStr() const = 0;

    // The statement to replace.
    const clang::Stmt *replaceS = nullptr;

private:
    RewriteManager *m_mgr;
};

struct RewriteOperation
{
    clang::SourceRange rangeToRemove;
    std::string strToInsert;
};

// A class for managing replacements of AST nodes. It allows to specifically target AST nodes instead of raw source
// locations to enable easy replacements involving macros and nested replacements.
// For extended documentation see: doc/rewriting.md
class RewriteManager
{
public:
    RewriteManager(clang::ASTContext &ACtx, clang::Preprocessor &PP);

    clang::ASTContext &getACtx() const { return ACtx; }

    // Registers a StmtToRewrite for use with getReplacements. Call this on all
    // statements that should be rewritten before calling any rewriting functions.
    void registerStmt(std::unique_ptr<StmtToRewrite> S);

    // Helper for constructing the custom type from a Stmt.
    template <typename T, typename... Args>
    void registerStmt(const clang::Stmt *S, Args... args)
    {
        auto p = std::make_unique<T>(std::forward<Args>(args)...);
        p->replaceS = S;
        registerStmt(std::move(p));
    }

    // Get the full replacement of an AST node. Note that this function removes any replaced statements from the work
    // list, so calling it twice will only replace the first time.
    RewriteOperation getReplaced(const clang::Stmt *S);
    // Get all replacements. These may be fewer than the requested ones because of nesting.
    std::vector<RewriteOperation> getReplacements();

private:
    std::string getExpandedCode(const clang::Stmt *toReplaceS);

private:
    clang::ASTContext &ACtx;
    const clang::LangOptions &LangOpts;
    clang::SourceManager &SM;
    clang::Preprocessor &PP;

    // Manages the pending replacements.
    class WorkList
    {
    public:
        typedef std::map<clang::SourceRange, std::vector<const StmtToRewrite *>> RangeToRepMap;

        WorkList(clang::ASTContext &ACtx, clang::SourceManager &SM);

        bool isStmtPending(const clang::Stmt *S) const;
        void addStmt(std::unique_ptr<StmtToRewrite> S);
        const RangeToRepMap &getRangeToReplacementsMap() const;
        std::vector<const StmtToRewrite *> getSortedReplacements() const;
        void markDone(const StmtToRewrite *S);
        void cleanup();

    private:
        clang::ASTContext &ACtx;
        clang::SourceManager &SM;
        std::vector<std::unique_ptr<StmtToRewrite>> m_pending;
        std::vector<std::unique_ptr<StmtToRewrite>> m_done;
        RangeToRepMap m_rangeToReplacements;
    };

    WorkList m_workList;
};

} // namespace cu

#endif



----------------------------------------

RewriteManager.cpp

----------------------------------------

#include "ClangUtil/RewriteManager.h"
#include "ClangUtil/ASTUtil.h"
#include "clang/Lex/Lexer.h"
#include "clang/Lex/Preprocessor.h"
#include "clang/Lex/PreprocessorOptions.h"
#include "clang/Lex/TokenConcatenation.h"
#include "clang/Lex/MacroArgs.h"


using namespace cu;


// Returns a Stmt that is the first parent of startS whose expansion range is within the given range.
static const clang::Stmt *GetFullMacroStmt(clang::SourceRange range, const clang::Stmt *startS, clang::ASTContext &ACtx)
{
    auto &SM = ACtx.getSourceManager();

    // Walk the tree upwards until ST does no longer expand to within range.
    const clang::Stmt *ST = startS;
    while (true)
    {
        const auto &parents = ACtx.getParents(*ST);
        if (parents.empty())
        {
            break;
        }
        auto childS = ST;
        ST = parents[0].get<clang::Stmt>();
        if (!ST)
        {
            if (auto D = parents[0].get<clang::Decl>())
            {
                const auto &parentsD = ACtx.getParents(*D);
                if (parentsD.empty())
                {
                    break;
                }
                ST = parentsD[0].get<clang::Stmt>();
                if (!ST)
                {
                    break;
                }
            }
            else
            {
                break;
            }
        }

        auto exLocS = SM.getExpansionLoc(ST->getLocStart());
        auto exLocE = SM.getExpansionLoc(ST->getLocEnd());
        if (SM.isBeforeInTranslationUnit(exLocS, range.getBegin()) ||
            SM.isBeforeInTranslationUnit(range.getEnd(), exLocE))
        {
            return childS;
        }
    }

    return nullptr;
}


RewriteManager *StmtToRewrite::getMgr() const
{
    return m_mgr;
}


RewriteManager::WorkList::WorkList(clang::ASTContext &ACtx, clang::SourceManager &SM) : ACtx(ACtx), SM(SM) {}
bool RewriteManager::WorkList::isStmtPending(const clang::Stmt *S) const
{
    for (const auto &r : m_pending)
    {
        if (r->replaceS == S)
        {
            return true;
        }
    }
    return false;
}
void RewriteManager::WorkList::addStmt(std::unique_ptr<StmtToRewrite> S)
{
    // Use the expansion range for maximal replacement flexibility in macros.
    auto replaceRange = SM.getExpansionRange(S->replaceS->getSourceRange());

    // TODO not quite correct.
    /*auto sortRanges = [&](std::vector<const StmtToRewrite *> &vec) {
        std::sort(vec.begin(), vec.end(), [&](const StmtToRewrite *lhs, const StmtToRewrite *rhs) {
            auto lhsRange = SM.getExpansionRange(lhs->replaceS->getSourceRange());
            auto rhsRange = SM.getExpansionRange(rhs->replaceS->getSourceRange());
            return IsContained(rhsRange, lhsRange, SM);
        });
    };*/

    // Establish hierarchical relation between all ranges.
    bool found = false;
    // First, check if this range is within one we already have.
    for (auto &r : m_rangeToReplacements)
    {
        if (IsContained(replaceRange, r.first, SM))
        {
            // Insert in a sorted order.
            for (auto it = r.second.begin(); it != r.second.end(); ++it)
            {
                //auto testRange = SM.getExpansionRange((*it)->replaceS->getSourceRange());
                // if (IsContained(testRange, replaceRange, SM))
                if (IsParent(S->replaceS, (*it)->replaceS, ACtx))
                {
                    r.second.insert(it, S.get());
                    found = true;
                    break;
                }
            }
            if (!found)
            {
                r.second.push_back(S.get());
                found = true;
            }
            break;
        }
    }
    // Not within existing range, add as new top-level range.
    if (!found)
    {
        // Check if any existing ranges are contained within the new one.
        std::vector<const StmtToRewrite *> moveThese;
        auto it = m_rangeToReplacements.begin();
        while (it != m_rangeToReplacements.end())
        {
            if (IsContained(it->first, replaceRange, SM))
            {
                moveThese.insert(moveThese.end(), it->second.begin(), it->second.end());
                it = m_rangeToReplacements.erase(it);
            }
            else
            {
                ++it;
            }
        }
        auto &accesses = m_rangeToReplacements[replaceRange];
        // The order is important here. We want the first element to be the one that spans the full range.
        accesses.push_back(S.get());
        // TODO sort "moveThese".
        accesses.insert(accesses.end(), moveThese.begin(), moveThese.end());
    }

    int count = 0;
    for (const auto &r : m_rangeToReplacements)
    {
        printf("range %i\n", count++);
        for (const auto &a : r.second)
        {
            printf("replacement:\n");
            a->replaceS->dump();
        }
    }

    m_pending.push_back(std::move(S));
}
const RewriteManager::WorkList::RangeToRepMap &RewriteManager::WorkList::getRangeToReplacementsMap() const
{
    return m_rangeToReplacements;
}
std::vector<const StmtToRewrite *> RewriteManager::WorkList::getSortedReplacements() const
{
    std::vector<const StmtToRewrite *> result;
    for (auto &r : m_rangeToReplacements)
    {
        result.insert(result.end(), r.second.begin(), r.second.end());
    }
    return result;
}
void RewriteManager::WorkList::markDone(const StmtToRewrite *S)
{
    // Remove from hierarchy.
    for (auto &r : m_rangeToReplacements)
    {
        r.second.erase(std::remove(r.second.begin(), r.second.end(), S), r.second.end());
    }

    // Move from pending to done list.
    auto it = std::find_if(m_pending.begin(), m_pending.end(),
                           [&](const std::unique_ptr<StmtToRewrite> &rep) { return rep.get() == S; });
    if (it == m_pending.end())
    {
        throw std::runtime_error("Did not find replacement to mark as done");
    }
    m_done.push_back(std::move(*it));
    m_pending.erase(it);
}
void RewriteManager::WorkList::cleanup()
{
    m_done.clear();
}


RewriteManager::RewriteManager(clang::ASTContext &ACtx, clang::Preprocessor &PP)
    : ACtx(ACtx), LangOpts(ACtx.getLangOpts()), SM(ACtx.getSourceManager()), PP(PP), m_workList(ACtx, SM)
{
}

void RewriteManager::registerStmt(std::unique_ptr<StmtToRewrite> S)
{
    if (!S->replaceS)
    {
        throw std::runtime_error("Must set replaceS");
    }

    if (m_workList.isStmtPending(S->replaceS))
    {
        throw std::runtime_error("This Stmt will already be replaced");
    }

    S->m_mgr = this;
    m_workList.addStmt(std::move(S));
}

RewriteOperation RewriteManager::getReplaced(const clang::Stmt *S)
{
    auto range = SM.getExpansionRange(S->getSourceRange());
    return { range, getExpandedCode(S) };
}

std::vector<RewriteOperation> RewriteManager::getReplacements()
{
    std::vector<RewriteOperation> results;

    for (auto &rangeAndAccesses : m_workList.getRangeToReplacementsMap())
    {
        auto &range = rangeAndAccesses.first;
        auto &accesses = rangeAndAccesses.second;

        // Cannot replace something inside a macro because it would replace all expansions instead of just the selected
        // AST node. So in a first step, get an enclosing statement that is no longer inside a macro.
        // TODO we could keep the original code more clean by not expanding macro args if the whole expansion does not
        // contain the macro arg more than once.
        auto macroS = GetFullMacroStmt(range, accesses[0]->replaceS, ACtx);

        results.push_back(getReplaced(macroS));

        // TODO we could run clang-format on the replacements. this would especially benefit long macro expansions.
    }

    m_workList.cleanup();

    return results;
}

std::string RewriteManager::getExpandedCode(const clang::Stmt *toReplaceS)
{
    // TODO performance optimization. this is parsing way more than required.

    using namespace clang;

    printf("getExpandedCode:\n");
    toReplaceS->dump();

    std::string out;

    auto toReplaceExpStart = SM.getExpansionLoc(toReplaceS->getLocStart());
    auto toReplaceExpEnd = SM.getExpansionLoc(toReplaceS->getLocEnd());
    auto toReplaceSpellStart = SM.getSpellingLoc(toReplaceS->getLocStart());
    auto toReplaceSpellEnd = SM.getSpellingLoc(toReplaceS->getLocEnd());

    auto FID = SM.getFileID(SM.getExpansionLoc(toReplaceS->getLocStart()));

    // The following is inspired by: clang/Rewrite/HTMLRewrite.cpp:HighlightMacros

    // Re-lex the raw token stream into a token buffer.
    std::vector<Token> TokenStream;

    const llvm::MemoryBuffer *FromFile = SM.getBuffer(FID);
    Lexer L(FID, FromFile, SM, PP.getLangOpts());

    // Lex all the tokens in raw mode, to avoid entering #includes or expanding
    // macros.
    while (1)
    {
        Token Tok;
        L.LexFromRawLexer(Tok);

        // If this is a # at the start of a line, discard it from the token stream.
        // We don't want the re-preprocess step to see #defines, #includes or other
        // preprocessor directives.
        if (Tok.is(tok::hash) && Tok.isAtStartOfLine())
            continue;

        // If this is a ## token, change its kind to unknown so that repreprocessing
        // it will not produce an error.
        if (Tok.is(tok::hashhash))
            Tok.setKind(tok::unknown);

        // If this raw token is an identifier, the raw lexer won't have looked up
        // the corresponding identifier info for it.  Do this now so that it will be
        // macro expanded when we re-preprocess it.
        if (Tok.is(tok::raw_identifier))
            PP.LookUpIdentifierInfo(Tok);

        TokenStream.push_back(Tok);

        for (auto &rep : m_workList.getSortedReplacements())
        {
            auto repS = rep->replaceS;
            auto spellLoc = SM.getSpellingLoc(repS->getLocStart());
            if (SM.getSpellingLoc(Tok.getLocation()) == spellLoc)
            {
                //
            }
        }

        if (Tok.is(tok::eof))
            break;
    }

    // Temporarily change the diagnostics object so that we ignore any generated
    // diagnostics from this pass.
    DiagnosticsEngine TmpDiags(PP.getDiagnostics().getDiagnosticIDs(), &PP.getDiagnostics().getDiagnosticOptions(),
                               new IgnoringDiagConsumer);

    // Copy the preprocessor and all of its state.
    auto PPOpts = std::make_shared<PreprocessorOptions>(PP.getPreprocessorOpts());
    LangOptions LO = PP.getLangOpts();
    Preprocessor TmpPP(PPOpts, TmpDiags, LO, SM, PP.getPCMCache(), PP.getHeaderSearchInfo(), PP.getModuleLoader(),
PP.getIdentifierTable().getExternalIdentifierLookup());
    TmpPP.Initialize(PP.getTargetInfo(), PP.getAuxTargetInfo());
    TmpPP.setExternalSource(PP.getExternalSource());
    TmpPP.setPreprocessedOutput(true);

    std::map<const clang::IdentifierInfo *, bool> MacroPreviouslyEnabled;
    for (const auto &m : PP.macros())
    {
        // printf("PREDEF MACRO: %s\n", m.first->getName().str().c_str());
        TmpPP.getMacroDefinition(m.first);

        for (const auto &tmpm : TmpPP.macros())
        {
            if (tmpm.first == m.first)
            {
                auto MD = m.second.getLatest();
                auto MI = MD->getMacroInfo();
                // If this is a recursive call we might be in a macro expansion and the macro might be disabled. We need
                // to enable it for now so that all expansions work. Restore it later.
                MacroPreviouslyEnabled[tmpm.first] = MI->isEnabled();
                if (!MI->isEnabled())
                {
                    MD->getMacroInfo()->EnableMacro();
                }

                // This should not change anything since we just copy data over.
                auto &mutableState = const_cast<std::remove_const<decltype(tmpm.second)>::type &>(tmpm.second);
                mutableState.setLatest(MD);
                break;
            }
        }
    }

    class MacroArgCollector : public clang::PPCallbacks
    {
    public:
        MacroArgCollector(Preprocessor &TmpPP) : TmpPP(TmpPP) {}

        void MacroExpands(const Token &Tok, const MacroDefinition &MD, SourceRange Range, const MacroArgs *Args) override
        {
            if (!Args)
            {
                return;
            }
            printf("GOT MACRO ARGS EXPANSION CALLBACK\n");
            for (int i = 0; i < (int)Args->getNumMacroArguments(); i++)
            {
                auto TokUnex = Args->getUnexpArgument(i);
                // Thats just non-const for a cache, so should be fine.
                auto TokPreExp = const_cast<MacroArgs *>(Args)->getPreExpArgument(i, TmpPP);
                printf("unexp: %s\n", TmpPP.getSpelling(*TokUnex).c_str());
                for (const auto &T : TokPreExp)
                {
                    printf("preexp: %s\n", TmpPP.getSpelling(T).c_str());
                }
            }
        }

        Preprocessor &TmpPP;
    };
TmpPP.addPPCallbacks(std::make_unique<MacroArgCollector>(TmpPP));
    // Instead: collect the macro arg info in the law lexing step above. or do another pass that uses the PP but without expansions.

    /*printf("DUMP MACRO INFO\n");
    for (const auto &m : PP.macros())
        PP.dumpMacroInfo(m.first);
    printf("---\n");
    for (const auto &m : TmpPP.macros())
        TmpPP.dumpMacroInfo(m.first);
    printf("DUMP MACRO INFO END\n");*/

    DiagnosticsEngine *OldDiags = &TmpPP.getDiagnostics();

    // Inform the preprocessor that we don't want comments.
    TmpPP.SetCommentRetentionState(false, false);

    // We don't want pragmas either. Although we filtered out #pragma, removing
    // _Pragma and __pragma is much harder.
    bool PragmasPreviouslyEnabled = TmpPP.getPragmasEnabled();
    TmpPP.setPragmasEnabled(false);

    // Enter the tokens we just lexed.  This will cause them to be macro expanded
    // but won't enter sub-files (because we removed #'s).
    TmpPP.EnterTokenStream(TokenStream, false);

    TokenConcatenation ConcatInfo(TmpPP);

    // Lex all the tokens.
    Token Tok;
    TmpPP.Lex(Tok);

    std::map<SourceLocation, int> slocIdx;

    auto checkReplacement = [&]() {
        for (auto &rep : m_workList.getSortedReplacements())
        {
            // auto rep = r.second.get();
            auto repS = rep->replaceS;
            auto spellLoc = SM.getSpellingLoc(repS->getLocStart());
            // TODO we need to check here if the repS spans the full range (or largest?)
            if (SM.getSpellingLoc(Tok.getLocation()) == spellLoc)
            {
                if (slocIdx[spellLoc] == 7)
                {
                    // replace
                }
                slocIdx[spellLoc]++;

                // Done replacing that one, but have to keep it alive until we're done with it.
                m_workList.markDone(rep);

                printf("[[[\n");
                auto repStr = rep->makeReplaceStr();
                printf("REPLACED: %s ]]]\n", repStr.c_str());
                out += repStr;

                // Skip ahead until after the whole replacement.
                auto repEnd = SM.getSpellingLoc(repS->getLocEnd());
                while (repEnd != SM.getSpellingLoc(Tok.getLocation()))
                {
                    TmpPP.Lex(Tok);
                    assert(!Tok.is(tok::eof) && "End not found");
                }

                // Eat one more since we stopped at the end token and we want to continue after it.
                TmpPP.Lex(Tok);

                return true;
            }
        }
        return false;
    };

    while (Tok.isNot(tok::eof))
    {
        printf("TOKEN: %s\n", TmpPP.getSpelling(Tok).c_str());

        auto TokLoc = Tok.getLocation();
        auto TokExp = SM.getExpansionLoc(TokLoc);
        if (SM.isBeforeInTranslationUnit(toReplaceExpEnd, TokExp))
        {
            // Anything after the Stmt we want to replace is not interesting.
            break;
        }

        // Skip ahead until we are at the expansion start of the Stmt we want to replace.
        if (!SM.isBeforeInTranslationUnit(TokLoc, toReplaceExpStart))
        {
            if (TokLoc.isMacroID())
            {
                // This is the first token of a macro expansion.
                auto LLoc = SM.getExpansionRange(TokLoc);

                // Ignore tokens whose instantiation location was not the main file.
                if (SM.getFileID(LLoc.first) != FID)
                {
                    TmpPP.Lex(Tok);
                    continue;
                }

                assert(SM.getFileID(LLoc.second) == FID &&
                       "Start and end of expansion must be in the same ultimate file!");

                bool stopOutputOnNextToken = false;
                bool toReplaceStartsInMacro = toReplaceExpStart == TokExp;
                bool toReplaceEndsInMacro = toReplaceExpEnd == TokExp;
                bool startedOutput = false;

                Token PrevPrevTok;
                Token PrevTok = Tok;

                while (!Tok.is(tok::eof) && SM.getExpansionLoc(Tok.getLocation()) == LLoc.first)
                {
                    printf("TOKEN (in macro): %s\n", TmpPP.getSpelling(Tok).c_str());

                    auto TokSpell = SM.getSpellingLoc(Tok.getLocation());
                    if (stopOutputOnNextToken)
                    {
                        break;
                    }
                    if (toReplaceEndsInMacro && TokSpell == toReplaceSpellEnd)
                    {
                        stopOutputOnNextToken = true;
                    }

                    if (toReplaceStartsInMacro && !startedOutput)
                    {
                        if (TokSpell == toReplaceSpellStart)
                        {
                            startedOutput = true;
                        }
                        else
                        {
                            TmpPP.Lex(Tok);
                            continue;
                        }
                    }

                    // If the tokens were already space separated, or if they must be to avoid
                    // them being implicitly pasted, add a space between them.
                    if (Tok.hasLeadingSpace() || ConcatInfo.AvoidConcat(PrevPrevTok, PrevTok, Tok))
                        out += ' ';

                    if (checkReplacement())
                    {
                        continue;
                    }

                    out += TmpPP.getSpelling(Tok);
                    TmpPP.Lex(Tok);
                }
                if (stopOutputOnNextToken)
                {
                    break;
                }
            }
            else
            {
                if (checkReplacement())
                {
                    continue;
                }

                // Output original code because we are outside of a replacement.
                out += TmpPP.getSpelling(Tok);
                TmpPP.Lex(Tok);
            }
        }
        else
        {
            TmpPP.Lex(Tok);
        }
    }

    // Restore the preprocessor's old state.
    TmpPP.setDiagnostics(*OldDiags);
    TmpPP.setPragmasEnabled(PragmasPreviouslyEnabled);

    for (const auto &tmpm : TmpPP.macros())
    {
        auto it = MacroPreviouslyEnabled.find(tmpm.first);
        if (it != MacroPreviouslyEnabled.end())
        {
            auto MD = tmpm.second.getLatest();
            auto MI = MD->getMacroInfo();
            if (MI->isEnabled() && !it->second)
            {
                MI->DisableMacro();
            }
            else if (!MI->isEnabled() && it->second)
            {
                MI->EnableMacro();
            }
        }
    }

    return out;
}



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Advanced Rewriting

via cfe-dev
If you want this in the core libs, a great idea would be to write up a doc or email explaining the design and the use cases.
Note that if this doesn't solve a generic enough use case, it might still be a good idea for it to live in a different place - API surface is a cost to users.

On Tue, Oct 23, 2018 at 5:56 PM Jonas Toth via cfe-dev <[hidden email]> wrote:

Hi Rafael,

nice to see your progress!

In my opinion these advancements should go in the Tooling and Rewrite directly in clang but integrate with the existing facilities. It would be inconvenient two have two separate ways of doing rewritings.

What I am curious about is how to resolve the macro-situation. If the refactoring will work directly on the AST and ignore macros, how is the refactoring translated into real code again?

Having an functional way of doing the refactorings with just transforming the AST, e.g. replacing one node with a new subtree would be great and elegant on the AST side, but how would we move back from AST to code?

Best, Jonas

Am 23.10.18 um 17:32 schrieb Rafael·Stahl:

Hey everyone

the two major limitations are resolved now:

- The macro argument issue
- Support for replacing Decls using clang::ast_type_traits::DynTypedNode

The macro issue vanished by itself once I figured a little unintuitive detail out: The SourceLocations from the original AST and the ones from the new Preprocessor Lexer did not compare equal even though they refer to the exact same location.

I believe this is because expansion SrcLocs have a kind of pointer identity representation and equality just compares raw representations, such that even when they point to the same locations, they don't compare equal. What worked for me was a kind of "deep" comparison:

bool clutil::DeepSrcLocEqual(clang::SourceLocation lhs, clang::SourceLocation rhs, const clang::SourceManager &SM)
{
    if (lhs == rhs)
        return true;

    if (SM.getExpansionLoc(lhs) != SM.getExpansionLoc(rhs))
        return false;
    if (SM.getSpellingLoc(lhs) != SM.getSpellingLoc(rhs))
        return false;

    clang::SourceLocation lhsMacro, rhsMacro;
    if (SM.isMacroArgExpansion(lhs, &lhsMacro))
    {
        if (!SM.isMacroArgExpansion(rhs, &rhsMacro))
            return false;
        if (!DeepSrcLocEqual(lhsMacro, rhsMacro, SM))
            return false;
    }

    return true;
}

Attached is the cleaned up rewriter that I ended up with now. Still, if there is any interest, I'd be happy to contribute this back to clangs libraries, but I would require some feedback how this fits into existing facilities.

Best regards
Rafael

On 17.07.18 16:19, Jonas Toth wrote:

Hi Rafael,

I did read into clang-refactor a while ago but unfortunatly could not follow that up. If I recall correctly its about source-to-source transformation (as you said) and aims at implementing the primitive refactorings that exist (e.g. extract-method, extract-variable, ....).

Rewriting itself should happen with the normal tooling framework.

(https://clang.llvm.org/docs/RefactoringEngine.html)

Maybe the implementers of the existing code can give better comments on you proposal (and might have considered a similar solution to yours already).

+Alex Lorenz

All the best, Jonas


Am 17.07.2018 um 14:46 schrieb Rafael·Stahl:

Hi Jonas

Thanks for introducing me to this, I have seen the "Replacement" before, but not clang-refactor.

However it seems to only provide management facilities around rewrite operations and not aid with the rewriting itself. Am I missing something here?

The two core problems for me:

- nesting replacements: When implementing replacements with clang-refactor, I still have to provide replacements that are closed in themselves. I cannot make them depend on others, right?
- macros: clang-refactor only seems to work with spelling locations.

Maybe an even simpler example: Replace all additions with "add(lhs, rhs)". This in itself is very difficult with clang as soon as the Stmts are nested or macros are involved.

Best regards
Rafael


On 16.07.2018 19:06, Jonas Toth via cfe-dev wrote:

Hi Rafael,

wouldn't your usecase be a task for clang-refactor?

Best,  Jonas


Am 16.07.2018 um 17:08 schrieb Rafael·Stahl via cfe-dev:
Hey everyone

The rewriting API of Clang operates on the source code in textual form. The user can use AST nodes to figure out what to replace, but in the end he has to remove and insert snippets in a linear piece of text.

This is very inconvenient when it is required to restructure and nest replacements. The involvement of macros makes a manual process even more difficult. See some recent threads expressing difficulty with the API [1][2].

What do I mean by "nested replacements"? For example in the following:

    int i = x + s->a;

I would want to replace the BinaryOperator with a function call and the MemberExpr with some constant:

    int i = Addition(x, 7);

When keeping the two replacement rules independent of each other, achieving this with the current API is extremely difficult. More so when macros are involved.

I am proposing some kind of helper that aims to solve these issues by providing an interface that offers to directly replace AST nodes and a mechanism to nest AST node replacements - without having to worry about macros.

Potential usage:

- Develop a class that derives from StmtToRewrite to define how replacements should happen:

    class RewriteAdds : public cu::StmtToRewrite
    {
    public:
        std::string makeReplaceStr() const override
        {
            auto binOp = dyn_cast<BinaryOperator>(replaceS);
            return "Addition(" + getMgr()->getReplaced(binOp->getLHS()).strToInsert + ", " +
getMgr()->getReplaced(binOp->getRHS()).strToInsert + ")";
        }
    };

    class RewriteMembs : public cu::StmtToRewrite
    {
    public:
        std::string makeReplaceStr() const override
        {
            return "7";
        }
    };

- Construct a RewriteManager:

    cu::RewriteManager mgr(ACtx, PP);

- Add rewriting operations to the manager:

    // std::vector<const Stmt *> AddStmts = /* matched from binaryOperator() with plus */
    // std::vector<const Stmt *> MembStmts = /* matched from memberExpr() */
    for (const auto &S : AddStmts) mgr.registerStmt<RewriteAdds>(S);
    for (const auto &S : MembStmts) mgr.registerStmt<RewriteMembs>(S);

- Retrieve and apply the results:

    clang::Rewriter rewriter(SM, LangOpts);
    for (const auto &r : mgr.getReplacements()) {
        rewriter.RemoveText(r.rangeToRemove);
        rewriter.InsertText(r.rangeToRemove.getBegin(), r.strToInsert);
    }


At the end of this mail is my low quality code that kind-of implements this. TLDR:

- Build a hierarchy of stmts to replace and keep track of which replacements must be combined
- Move further up in the AST if these replacements are inside a macro
- Recursively lex the file and look for replacements outside-in by spelling locations. Expand any macros that are encountered during this. The re-lexing idea is based on the hint in [3].

The code has the following shortcomings:

- I do not know how to distinguish macro argument expansions within macros. For example in "#define FOO(a) a + a" the two "a"s expand to different AST nodes that could be replaced with different rules. This is an important issue, because it can lead to completely broken code with nesting.
- Limited to Stmts, when Decls should be supported too.
- Very un-optimized with lexing the entire source file many times. Easy to solve, but didn't want to raise the complexity further for now.
- Could keep written code more clean by only expanding macros if required. For example not required if just a macro arg is replaced and all expansions would be the same.


I am very interested in your general thoughts. I'm not very experienced with clang, but this was my vision how I would want to do replacements. Are you interested in getting this into clang? I would need help with ironing out the remaining issues.

-Rafael


[1] http://lists.llvm.org/pipermail/cfe-dev/2018-July/058430.html
[2] http://lists.llvm.org/pipermail/cfe-dev/2018-June/058213.html
[3] http://lists.llvm.org/pipermail/cfe-dev/2017-August/055079.html



----------------------------------------

RewriteManager.h

----------------------------------------

#ifndef CLANGUTIL_REWRITEMANAGER_H
#define CLANGUTIL_REWRITEMANAGER_H

#include "ClangUtil/SourceRangeLess.h"
#include "make_unique.h"
#include "clang/AST/AST.h"
#include <vector>
#include <map>


// TODO extend to decls


namespace cu
{
// Represents a statement in the original AST that should be rewritten. To implement recursive replacements, call
// getMgr()->getReplaced() on any AST node within the makeReplaceStr callback.
class StmtToRewrite
{
    friend class RewriteManager;

public:
    // Returns the enclosing RewriteManager.
    class RewriteManager *getMgr() const;
    // Override this to build a replacement string. Implement recursive replacements with RewriteManager::getReplaced.
    virtual std::string makeReplaceStr() const = 0;

    // The statement to replace.
    const clang::Stmt *replaceS = nullptr;

private:
    RewriteManager *m_mgr;
};

struct RewriteOperation
{
    clang::SourceRange rangeToRemove;
    std::string strToInsert;
};

// A class for managing replacements of AST nodes. It allows to specifically target AST nodes instead of raw source
// locations to enable easy replacements involving macros and nested replacements.
// For extended documentation see: doc/rewriting.md
class RewriteManager
{
public:
    RewriteManager(clang::ASTContext &ACtx, clang::Preprocessor &PP);

    clang::ASTContext &getACtx() const { return ACtx; }

    // Registers a StmtToRewrite for use with getReplacements. Call this on all
    // statements that should be rewritten before calling any rewriting functions.
    void registerStmt(std::unique_ptr<StmtToRewrite> S);

    // Helper for constructing the custom type from a Stmt.
    template <typename T, typename... Args>
    void registerStmt(const clang::Stmt *S, Args... args)
    {
        auto p = std::make_unique<T>(std::forward<Args>(args)...);
        p->replaceS = S;
        registerStmt(std::move(p));
    }

    // Get the full replacement of an AST node. Note that this function removes any replaced statements from the work
    // list, so calling it twice will only replace the first time.
    RewriteOperation getReplaced(const clang::Stmt *S);
    // Get all replacements. These may be fewer than the requested ones because of nesting.
    std::vector<RewriteOperation> getReplacements();

private:
    std::string getExpandedCode(const clang::Stmt *toReplaceS);

private:
    clang::ASTContext &ACtx;
    const clang::LangOptions &LangOpts;
    clang::SourceManager &SM;
    clang::Preprocessor &PP;

    // Manages the pending replacements.
    class WorkList
    {
    public:
        typedef std::map<clang::SourceRange, std::vector<const StmtToRewrite *>> RangeToRepMap;

        WorkList(clang::ASTContext &ACtx, clang::SourceManager &SM);

        bool isStmtPending(const clang::Stmt *S) const;
        void addStmt(std::unique_ptr<StmtToRewrite> S);
        const RangeToRepMap &getRangeToReplacementsMap() const;
        std::vector<const StmtToRewrite *> getSortedReplacements() const;
        void markDone(const StmtToRewrite *S);
        void cleanup();

    private:
        clang::ASTContext &ACtx;
        clang::SourceManager &SM;
        std::vector<std::unique_ptr<StmtToRewrite>> m_pending;
        std::vector<std::unique_ptr<StmtToRewrite>> m_done;
        RangeToRepMap m_rangeToReplacements;
    };

    WorkList m_workList;
};

} // namespace cu

#endif



----------------------------------------

RewriteManager.cpp

----------------------------------------

#include "ClangUtil/RewriteManager.h"
#include "ClangUtil/ASTUtil.h"
#include "clang/Lex/Lexer.h"
#include "clang/Lex/Preprocessor.h"
#include "clang/Lex/PreprocessorOptions.h"
#include "clang/Lex/TokenConcatenation.h"
#include "clang/Lex/MacroArgs.h"


using namespace cu;


// Returns a Stmt that is the first parent of startS whose expansion range is within the given range.
static const clang::Stmt *GetFullMacroStmt(clang::SourceRange range, const clang::Stmt *startS, clang::ASTContext &ACtx)
{
    auto &SM = ACtx.getSourceManager();

    // Walk the tree upwards until ST does no longer expand to within range.
    const clang::Stmt *ST = startS;
    while (true)
    {
        const auto &parents = ACtx.getParents(*ST);
        if (parents.empty())
        {
            break;
        }
        auto childS = ST;
        ST = parents[0].get<clang::Stmt>();
        if (!ST)
        {
            if (auto D = parents[0].get<clang::Decl>())
            {
                const auto &parentsD = ACtx.getParents(*D);
                if (parentsD.empty())
                {
                    break;
                }
                ST = parentsD[0].get<clang::Stmt>();
                if (!ST)
                {
                    break;
                }
            }
            else
            {
                break;
            }
        }

        auto exLocS = SM.getExpansionLoc(ST->getLocStart());
        auto exLocE = SM.getExpansionLoc(ST->getLocEnd());
        if (SM.isBeforeInTranslationUnit(exLocS, range.getBegin()) ||
            SM.isBeforeInTranslationUnit(range.getEnd(), exLocE))
        {
            return childS;
        }
    }

    return nullptr;
}


RewriteManager *StmtToRewrite::getMgr() const
{
    return m_mgr;
}


RewriteManager::WorkList::WorkList(clang::ASTContext &ACtx, clang::SourceManager &SM) : ACtx(ACtx), SM(SM) {}
bool RewriteManager::WorkList::isStmtPending(const clang::Stmt *S) const
{
    for (const auto &r : m_pending)
    {
        if (r->replaceS == S)
        {
            return true;
        }
    }
    return false;
}
void RewriteManager::WorkList::addStmt(std::unique_ptr<StmtToRewrite> S)
{
    // Use the expansion range for maximal replacement flexibility in macros.
    auto replaceRange = SM.getExpansionRange(S->replaceS->getSourceRange());

    // TODO not quite correct.
    /*auto sortRanges = [&](std::vector<const StmtToRewrite *> &vec) {
        std::sort(vec.begin(), vec.end(), [&](const StmtToRewrite *lhs, const StmtToRewrite *rhs) {
            auto lhsRange = SM.getExpansionRange(lhs->replaceS->getSourceRange());
            auto rhsRange = SM.getExpansionRange(rhs->replaceS->getSourceRange());
            return IsContained(rhsRange, lhsRange, SM);
        });
    };*/

    // Establish hierarchical relation between all ranges.
    bool found = false;
    // First, check if this range is within one we already have.
    for (auto &r : m_rangeToReplacements)
    {
        if (IsContained(replaceRange, r.first, SM))
        {
            // Insert in a sorted order.
            for (auto it = r.second.begin(); it != r.second.end(); ++it)
            {
                //auto testRange = SM.getExpansionRange((*it)->replaceS->getSourceRange());
                // if (IsContained(testRange, replaceRange, SM))
                if (IsParent(S->replaceS, (*it)->replaceS, ACtx))
                {
                    r.second.insert(it, S.get());
                    found = true;
                    break;
                }
            }
            if (!found)
            {
                r.second.push_back(S.get());
                found = true;
            }
            break;
        }
    }
    // Not within existing range, add as new top-level range.
    if (!found)
    {
        // Check if any existing ranges are contained within the new one.
        std::vector<const StmtToRewrite *> moveThese;
        auto it = m_rangeToReplacements.begin();
        while (it != m_rangeToReplacements.end())
        {
            if (IsContained(it->first, replaceRange, SM))
            {
                moveThese.insert(moveThese.end(), it->second.begin(), it->second.end());
                it = m_rangeToReplacements.erase(it);
            }
            else
            {
                ++it;
            }
        }
        auto &accesses = m_rangeToReplacements[replaceRange];
        // The order is important here. We want the first element to be the one that spans the full range.
        accesses.push_back(S.get());
        // TODO sort "moveThese".
        accesses.insert(accesses.end(), moveThese.begin(), moveThese.end());
    }

    int count = 0;
    for (const auto &r : m_rangeToReplacements)
    {
        printf("range %i\n", count++);
        for (const auto &a : r.second)
        {
            printf("replacement:\n");
            a->replaceS->dump();
        }
    }

    m_pending.push_back(std::move(S));
}
const RewriteManager::WorkList::RangeToRepMap &RewriteManager::WorkList::getRangeToReplacementsMap() const
{
    return m_rangeToReplacements;
}
std::vector<const StmtToRewrite *> RewriteManager::WorkList::getSortedReplacements() const
{
    std::vector<const StmtToRewrite *> result;
    for (auto &r : m_rangeToReplacements)
    {
        result.insert(result.end(), r.second.begin(), r.second.end());
    }
    return result;
}
void RewriteManager::WorkList::markDone(const StmtToRewrite *S)
{
    // Remove from hierarchy.
    for (auto &r : m_rangeToReplacements)
    {
        r.second.erase(std::remove(r.second.begin(), r.second.end(), S), r.second.end());
    }

    // Move from pending to done list.
    auto it = std::find_if(m_pending.begin(), m_pending.end(),
                           [&](const std::unique_ptr<StmtToRewrite> &rep) { return rep.get() == S; });
    if (it == m_pending.end())
    {
        throw std::runtime_error("Did not find replacement to mark as done");
    }
    m_done.push_back(std::move(*it));
    m_pending.erase(it);
}
void RewriteManager::WorkList::cleanup()
{
    m_done.clear();
}


RewriteManager::RewriteManager(clang::ASTContext &ACtx, clang::Preprocessor &PP)
    : ACtx(ACtx), LangOpts(ACtx.getLangOpts()), SM(ACtx.getSourceManager()), PP(PP), m_workList(ACtx, SM)
{
}

void RewriteManager::registerStmt(std::unique_ptr<StmtToRewrite> S)
{
    if (!S->replaceS)
    {
        throw std::runtime_error("Must set replaceS");
    }

    if (m_workList.isStmtPending(S->replaceS))
    {
        throw std::runtime_error("This Stmt will already be replaced");
    }

    S->m_mgr = this;
    m_workList.addStmt(std::move(S));
}

RewriteOperation RewriteManager::getReplaced(const clang::Stmt *S)
{
    auto range = SM.getExpansionRange(S->getSourceRange());
    return { range, getExpandedCode(S) };
}

std::vector<RewriteOperation> RewriteManager::getReplacements()
{
    std::vector<RewriteOperation> results;

    for (auto &rangeAndAccesses : m_workList.getRangeToReplacementsMap())
    {
        auto &range = rangeAndAccesses.first;
        auto &accesses = rangeAndAccesses.second;

        // Cannot replace something inside a macro because it would replace all expansions instead of just the selected
        // AST node. So in a first step, get an enclosing statement that is no longer inside a macro.
        // TODO we could keep the original code more clean by not expanding macro args if the whole expansion does not
        // contain the macro arg more than once.
        auto macroS = GetFullMacroStmt(range, accesses[0]->replaceS, ACtx);

        results.push_back(getReplaced(macroS));

        // TODO we could run clang-format on the replacements. this would especially benefit long macro expansions.
    }

    m_workList.cleanup();

    return results;
}

std::string RewriteManager::getExpandedCode(const clang::Stmt *toReplaceS)
{
    // TODO performance optimization. this is parsing way more than required.

    using namespace clang;

    printf("getExpandedCode:\n");
    toReplaceS->dump();

    std::string out;

    auto toReplaceExpStart = SM.getExpansionLoc(toReplaceS->getLocStart());
    auto toReplaceExpEnd = SM.getExpansionLoc(toReplaceS->getLocEnd());
    auto toReplaceSpellStart = SM.getSpellingLoc(toReplaceS->getLocStart());
    auto toReplaceSpellEnd = SM.getSpellingLoc(toReplaceS->getLocEnd());

    auto FID = SM.getFileID(SM.getExpansionLoc(toReplaceS->getLocStart()));

    // The following is inspired by: clang/Rewrite/HTMLRewrite.cpp:HighlightMacros

    // Re-lex the raw token stream into a token buffer.
    std::vector<Token> TokenStream;

    const llvm::MemoryBuffer *FromFile = SM.getBuffer(FID);
    Lexer L(FID, FromFile, SM, PP.getLangOpts());

    // Lex all the tokens in raw mode, to avoid entering #includes or expanding
    // macros.
    while (1)
    {
        Token Tok;
        L.LexFromRawLexer(Tok);

        // If this is a # at the start of a line, discard it from the token stream.
        // We don't want the re-preprocess step to see #defines, #includes or other
        // preprocessor directives.
        if (Tok.is(tok::hash) && Tok.isAtStartOfLine())
            continue;

        // If this is a ## token, change its kind to unknown so that repreprocessing
        // it will not produce an error.
        if (Tok.is(tok::hashhash))
            Tok.setKind(tok::unknown);

        // If this raw token is an identifier, the raw lexer won't have looked up
        // the corresponding identifier info for it.  Do this now so that it will be
        // macro expanded when we re-preprocess it.
        if (Tok.is(tok::raw_identifier))
            PP.LookUpIdentifierInfo(Tok);

        TokenStream.push_back(Tok);

        for (auto &rep : m_workList.getSortedReplacements())
        {
            auto repS = rep->replaceS;
            auto spellLoc = SM.getSpellingLoc(repS->getLocStart());
            if (SM.getSpellingLoc(Tok.getLocation()) == spellLoc)
            {
                //
            }
        }

        if (Tok.is(tok::eof))
            break;
    }

    // Temporarily change the diagnostics object so that we ignore any generated
    // diagnostics from this pass.
    DiagnosticsEngine TmpDiags(PP.getDiagnostics().getDiagnosticIDs(), &PP.getDiagnostics().getDiagnosticOptions(),
                               new IgnoringDiagConsumer);

    // Copy the preprocessor and all of its state.
    auto PPOpts = std::make_shared<PreprocessorOptions>(PP.getPreprocessorOpts());
    LangOptions LO = PP.getLangOpts();
    Preprocessor TmpPP(PPOpts, TmpDiags, LO, SM, PP.getPCMCache(), PP.getHeaderSearchInfo(), PP.getModuleLoader(),
PP.getIdentifierTable().getExternalIdentifierLookup());
    TmpPP.Initialize(PP.getTargetInfo(), PP.getAuxTargetInfo());
    TmpPP.setExternalSource(PP.getExternalSource());
    TmpPP.setPreprocessedOutput(true);

    std::map<const clang::IdentifierInfo *, bool> MacroPreviouslyEnabled;
    for (const auto &m : PP.macros())
    {
        // printf("PREDEF MACRO: %s\n", m.first->getName().str().c_str());
        TmpPP.getMacroDefinition(m.first);

        for (const auto &tmpm : TmpPP.macros())
        {
            if (tmpm.first == m.first)
            {
                auto MD = m.second.getLatest();
                auto MI = MD->getMacroInfo();
                // If this is a recursive call we might be in a macro expansion and the macro might be disabled. We need
                // to enable it for now so that all expansions work. Restore it later.
                MacroPreviouslyEnabled[tmpm.first] = MI->isEnabled();
                if (!MI->isEnabled())
                {
                    MD->getMacroInfo()->EnableMacro();
                }

                // This should not change anything since we just copy data over.
                auto &mutableState = const_cast<std::remove_const<decltype(tmpm.second)>::type &>(tmpm.second);
                mutableState.setLatest(MD);
                break;
            }
        }
    }

    class MacroArgCollector : public clang::PPCallbacks
    {
    public:
        MacroArgCollector(Preprocessor &TmpPP) : TmpPP(TmpPP) {}

        void MacroExpands(const Token &Tok, const MacroDefinition &MD, SourceRange Range, const MacroArgs *Args) override
        {
            if (!Args)
            {
                return;
            }
            printf("GOT MACRO ARGS EXPANSION CALLBACK\n");
            for (int i = 0; i < (int)Args->getNumMacroArguments(); i++)
            {
                auto TokUnex = Args->getUnexpArgument(i);
                // Thats just non-const for a cache, so should be fine.
                auto TokPreExp = const_cast<MacroArgs *>(Args)->getPreExpArgument(i, TmpPP);
                printf("unexp: %s\n", TmpPP.getSpelling(*TokUnex).c_str());
                for (const auto &T : TokPreExp)
                {
                    printf("preexp: %s\n", TmpPP.getSpelling(T).c_str());
                }
            }
        }

        Preprocessor &TmpPP;
    };
TmpPP.addPPCallbacks(std::make_unique<MacroArgCollector>(TmpPP));
    // Instead: collect the macro arg info in the law lexing step above. or do another pass that uses the PP but without expansions.

    /*printf("DUMP MACRO INFO\n");
    for (const auto &m : PP.macros())
        PP.dumpMacroInfo(m.first);
    printf("---\n");
    for (const auto &m : TmpPP.macros())
        TmpPP.dumpMacroInfo(m.first);
    printf("DUMP MACRO INFO END\n");*/

    DiagnosticsEngine *OldDiags = &TmpPP.getDiagnostics();

    // Inform the preprocessor that we don't want comments.
    TmpPP.SetCommentRetentionState(false, false);

    // We don't want pragmas either. Although we filtered out #pragma, removing
    // _Pragma and __pragma is much harder.
    bool PragmasPreviouslyEnabled = TmpPP.getPragmasEnabled();
    TmpPP.setPragmasEnabled(false);

    // Enter the tokens we just lexed.  This will cause them to be macro expanded
    // but won't enter sub-files (because we removed #'s).
    TmpPP.EnterTokenStream(TokenStream, false);

    TokenConcatenation ConcatInfo(TmpPP);

    // Lex all the tokens.
    Token Tok;
    TmpPP.Lex(Tok);

    std::map<SourceLocation, int> slocIdx;

    auto checkReplacement = [&]() {
        for (auto &rep : m_workList.getSortedReplacements())
        {
            // auto rep = r.second.get();
            auto repS = rep->replaceS;
            auto spellLoc = SM.getSpellingLoc(repS->getLocStart());
            // TODO we need to check here if the repS spans the full range (or largest?)
            if (SM.getSpellingLoc(Tok.getLocation()) == spellLoc)
            {
                if (slocIdx[spellLoc] == 7)
                {
                    // replace
                }
                slocIdx[spellLoc]++;

                // Done replacing that one, but have to keep it alive until we're done with it.
                m_workList.markDone(rep);

                printf("[[[\n");
                auto repStr = rep->makeReplaceStr();
                printf("REPLACED: %s ]]]\n", repStr.c_str());
                out += repStr;

                // Skip ahead until after the whole replacement.
                auto repEnd = SM.getSpellingLoc(repS->getLocEnd());
                while (repEnd != SM.getSpellingLoc(Tok.getLocation()))
                {
                    TmpPP.Lex(Tok);
                    assert(!Tok.is(tok::eof) && "End not found");
                }

                // Eat one more since we stopped at the end token and we want to continue after it.
                TmpPP.Lex(Tok);

                return true;
            }
        }
        return false;
    };

    while (Tok.isNot(tok::eof))
    {
        printf("TOKEN: %s\n", TmpPP.getSpelling(Tok).c_str());

        auto TokLoc = Tok.getLocation();
        auto TokExp = SM.getExpansionLoc(TokLoc);
        if (SM.isBeforeInTranslationUnit(toReplaceExpEnd, TokExp))
        {
            // Anything after the Stmt we want to replace is not interesting.
            break;
        }

        // Skip ahead until we are at the expansion start of the Stmt we want to replace.
        if (!SM.isBeforeInTranslationUnit(TokLoc, toReplaceExpStart))
        {
            if (TokLoc.isMacroID())
            {
                // This is the first token of a macro expansion.
                auto LLoc = SM.getExpansionRange(TokLoc);

                // Ignore tokens whose instantiation location was not the main file.
                if (SM.getFileID(LLoc.first) != FID)
                {
                    TmpPP.Lex(Tok);
                    continue;
                }

                assert(SM.getFileID(LLoc.second) == FID &&
                       "Start and end of expansion must be in the same ultimate file!");

                bool stopOutputOnNextToken = false;
                bool toReplaceStartsInMacro = toReplaceExpStart == TokExp;
                bool toReplaceEndsInMacro = toReplaceExpEnd == TokExp;
                bool startedOutput = false;

                Token PrevPrevTok;
                Token PrevTok = Tok;

                while (!Tok.is(tok::eof) && SM.getExpansionLoc(Tok.getLocation()) == LLoc.first)
                {
                    printf("TOKEN (in macro): %s\n", TmpPP.getSpelling(Tok).c_str());

                    auto TokSpell = SM.getSpellingLoc(Tok.getLocation());
                    if (stopOutputOnNextToken)
                    {
                        break;
                    }
                    if (toReplaceEndsInMacro && TokSpell == toReplaceSpellEnd)
                    {
                        stopOutputOnNextToken = true;
                    }

                    if (toReplaceStartsInMacro && !startedOutput)
                    {
                        if (TokSpell == toReplaceSpellStart)
                        {
                            startedOutput = true;
                        }
                        else
                        {
                            TmpPP.Lex(Tok);
                            continue;
                        }
                    }

                    // If the tokens were already space separated, or if they must be to avoid
                    // them being implicitly pasted, add a space between them.
                    if (Tok.hasLeadingSpace() || ConcatInfo.AvoidConcat(PrevPrevTok, PrevTok, Tok))
                        out += ' ';

                    if (checkReplacement())
                    {
                        continue;
                    }

                    out += TmpPP.getSpelling(Tok);
                    TmpPP.Lex(Tok);
                }
                if (stopOutputOnNextToken)
                {
                    break;
                }
            }
            else
            {
                if (checkReplacement())
                {
                    continue;
                }

                // Output original code because we are outside of a replacement.
                out += TmpPP.getSpelling(Tok);
                TmpPP.Lex(Tok);
            }
        }
        else
        {
            TmpPP.Lex(Tok);
        }
    }

    // Restore the preprocessor's old state.
    TmpPP.setDiagnostics(*OldDiags);
    TmpPP.setPragmasEnabled(PragmasPreviouslyEnabled);

    for (const auto &tmpm : TmpPP.macros())
    {
        auto it = MacroPreviouslyEnabled.find(tmpm.first);
        if (it != MacroPreviouslyEnabled.end())
        {
            auto MD = tmpm.second.getLatest();
            auto MI = MD->getMacroInfo();
            if (MI->isEnabled() && !it->second)
            {
                MI->DisableMacro();
            }
            else if (!MI->isEnabled() && it->second)
            {
                MI->EnableMacro();
            }
        }
    }

    return out;
}



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
Reply | Threaded
Open this post in threaded view
|

Re: Advanced Rewriting

via cfe-dev

Hi Jonas

AST replacements are translated back into code by expanding macros where required. This of course might lead to code duplication and nasty inconsistency bugs when the programmer then changes the macro. This is not an issue for my use-case though: The transformations are applied _after_ programming, it's more of a code generation step. Why not apply those on IR level? For compatibility with other source-level tooling.

A great improvement for _during_ programming transformations would be to limit macro expansion to the absolutely necessary cases. Specifically: Don't expand when a change in the macro argument would be equivalent and don't expand when a change in the macro body would be equivalent. You could also do some clever stuff like parameterizing the original macro with the rewritten AST node, but that seems to intrusive to me.

@Manuel Yes, I will do that. Also to see if there is any interest.

-Rafael

On 24.10.18 10:30, Manuel Klimek wrote:
If you want this in the core libs, a great idea would be to write up a doc or email explaining the design and the use cases.
Note that if this doesn't solve a generic enough use case, it might still be a good idea for it to live in a different place - API surface is a cost to users.

On Tue, Oct 23, 2018 at 5:56 PM Jonas Toth via cfe-dev <[hidden email]> wrote:

Hi Rafael,

nice to see your progress!

In my opinion these advancements should go in the Tooling and Rewrite directly in clang but integrate with the existing facilities. It would be inconvenient two have two separate ways of doing rewritings.

What I am curious about is how to resolve the macro-situation. If the refactoring will work directly on the AST and ignore macros, how is the refactoring translated into real code again?

Having an functional way of doing the refactorings with just transforming the AST, e.g. replacing one node with a new subtree would be great and elegant on the AST side, but how would we move back from AST to code?

Best, Jonas

Am 23.10.18 um 17:32 schrieb Rafael·Stahl:

Hey everyone

the two major limitations are resolved now:

- The macro argument issue
- Support for replacing Decls using clang::ast_type_traits::DynTypedNode

The macro issue vanished by itself once I figured a little unintuitive detail out: The SourceLocations from the original AST and the ones from the new Preprocessor Lexer did not compare equal even though they refer to the exact same location.

I believe this is because expansion SrcLocs have a kind of pointer identity representation and equality just compares raw representations, such that even when they point to the same locations, they don't compare equal. What worked for me was a kind of "deep" comparison:

bool clutil::DeepSrcLocEqual(clang::SourceLocation lhs, clang::SourceLocation rhs, const clang::SourceManager &SM)
{
    if (lhs == rhs)
        return true;

    if (SM.getExpansionLoc(lhs) != SM.getExpansionLoc(rhs))
        return false;
    if (SM.getSpellingLoc(lhs) != SM.getSpellingLoc(rhs))
        return false;

    clang::SourceLocation lhsMacro, rhsMacro;
    if (SM.isMacroArgExpansion(lhs, &lhsMacro))
    {
        if (!SM.isMacroArgExpansion(rhs, &rhsMacro))
            return false;
        if (!DeepSrcLocEqual(lhsMacro, rhsMacro, SM))
            return false;
    }

    return true;
}

Attached is the cleaned up rewriter that I ended up with now. Still, if there is any interest, I'd be happy to contribute this back to clangs libraries, but I would require some feedback how this fits into existing facilities.

Best regards
Rafael

On 17.07.18 16:19, Jonas Toth wrote:

Hi Rafael,

I did read into clang-refactor a while ago but unfortunatly could not follow that up. If I recall correctly its about source-to-source transformation (as you said) and aims at implementing the primitive refactorings that exist (e.g. extract-method, extract-variable, ....).

Rewriting itself should happen with the normal tooling framework.

(https://clang.llvm.org/docs/RefactoringEngine.html)

Maybe the implementers of the existing code can give better comments on you proposal (and might have considered a similar solution to yours already).

+Alex Lorenz

All the best, Jonas


Am 17.07.2018 um 14:46 schrieb Rafael·Stahl:

Hi Jonas

Thanks for introducing me to this, I have seen the "Replacement" before, but not clang-refactor.

However it seems to only provide management facilities around rewrite operations and not aid with the rewriting itself. Am I missing something here?

The two core problems for me:

- nesting replacements: When implementing replacements with clang-refactor, I still have to provide replacements that are closed in themselves. I cannot make them depend on others, right?
- macros: clang-refactor only seems to work with spelling locations.

Maybe an even simpler example: Replace all additions with "add(lhs, rhs)". This in itself is very difficult with clang as soon as the Stmts are nested or macros are involved.

Best regards
Rafael


On 16.07.2018 19:06, Jonas Toth via cfe-dev wrote:

Hi Rafael,

wouldn't your usecase be a task for clang-refactor?

Best,  Jonas


Am 16.07.2018 um 17:08 schrieb Rafael·Stahl via cfe-dev:
Hey everyone

The rewriting API of Clang operates on the source code in textual form. The user can use AST nodes to figure out what to replace, but in the end he has to remove and insert snippets in a linear piece of text.

This is very inconvenient when it is required to restructure and nest replacements. The involvement of macros makes a manual process even more difficult. See some recent threads expressing difficulty with the API [1][2].

What do I mean by "nested replacements"? For example in the following:

    int i = x + s->a;

I would want to replace the BinaryOperator with a function call and the MemberExpr with some constant:

    int i = Addition(x, 7);

When keeping the two replacement rules independent of each other, achieving this with the current API is extremely difficult. More so when macros are involved.

I am proposing some kind of helper that aims to solve these issues by providing an interface that offers to directly replace AST nodes and a mechanism to nest AST node replacements - without having to worry about macros.

Potential usage:

- Develop a class that derives from StmtToRewrite to define how replacements should happen:

    class RewriteAdds : public cu::StmtToRewrite
    {
    public:
        std::string makeReplaceStr() const override
        {
            auto binOp = dyn_cast<BinaryOperator>(replaceS);
            return "Addition(" + getMgr()->getReplaced(binOp->getLHS()).strToInsert + ", " +
getMgr()->getReplaced(binOp->getRHS()).strToInsert + ")";
        }
    };

    class RewriteMembs : public cu::StmtToRewrite
    {
    public:
        std::string makeReplaceStr() const override
        {
            return "7";
        }
    };

- Construct a RewriteManager:

    cu::RewriteManager mgr(ACtx, PP);

- Add rewriting operations to the manager:

    // std::vector<const Stmt *> AddStmts = /* matched from binaryOperator() with plus */
    // std::vector<const Stmt *> MembStmts = /* matched from memberExpr() */
    for (const auto &S : AddStmts) mgr.registerStmt<RewriteAdds>(S);
    for (const auto &S : MembStmts) mgr.registerStmt<RewriteMembs>(S);

- Retrieve and apply the results:

    clang::Rewriter rewriter(SM, LangOpts);
    for (const auto &r : mgr.getReplacements()) {
        rewriter.RemoveText(r.rangeToRemove);
        rewriter.InsertText(r.rangeToRemove.getBegin(), r.strToInsert);
    }


At the end of this mail is my low quality code that kind-of implements this. TLDR:

- Build a hierarchy of stmts to replace and keep track of which replacements must be combined
- Move further up in the AST if these replacements are inside a macro
- Recursively lex the file and look for replacements outside-in by spelling locations. Expand any macros that are encountered during this. The re-lexing idea is based on the hint in [3].

The code has the following shortcomings:

- I do not know how to distinguish macro argument expansions within macros. For example in "#define FOO(a) a + a" the two "a"s expand to different AST nodes that could be replaced with different rules. This is an important issue, because it can lead to completely broken code with nesting.
- Limited to Stmts, when Decls should be supported too.
- Very un-optimized with lexing the entire source file many times. Easy to solve, but didn't want to raise the complexity further for now.
- Could keep written code more clean by only expanding macros if required. For example not required if just a macro arg is replaced and all expansions would be the same.


I am very interested in your general thoughts. I'm not very experienced with clang, but this was my vision how I would want to do replacements. Are you interested in getting this into clang? I would need help with ironing out the remaining issues.

-Rafael


[1] http://lists.llvm.org/pipermail/cfe-dev/2018-July/058430.html
[2] http://lists.llvm.org/pipermail/cfe-dev/2018-June/058213.html
[3] http://lists.llvm.org/pipermail/cfe-dev/2017-August/055079.html



// snip



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

smime.p7s (7K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Advanced Rewriting

via cfe-dev

Hi Rafael,

i was looking at this proposal through the lens of tooling, like clang-tidy, thats why the macros were interesting.

If the whole purpose is not to change the actual source-code but more to change the code-generation macros are not so relevant, but then the propsal should _NOT_ mix with current tooling that is supposed to make changes in the source-code.

Not knowing a lot about the Codegeneration phases, but the AST will be translated into IR at some point. They should be equivalent in the program sense, but the AST might contain more interesting information for your transformation as it's on a higher level. In principle you could go to the IR.

Best Jonas

Am 25.10.18 um 09:49 schrieb Rafael·Stahl:

Hi Jonas

AST replacements are translated back into code by expanding macros where required. This of course might lead to code duplication and nasty inconsistency bugs when the programmer then changes the macro. This is not an issue for my use-case though: The transformations are applied _after_ programming, it's more of a code generation step. Why not apply those on IR level? For compatibility with other source-level tooling.

A great improvement for _during_ programming transformations would be to limit macro expansion to the absolutely necessary cases. Specifically: Don't expand when a change in the macro argument would be equivalent and don't expand when a change in the macro body would be equivalent. You could also do some clever stuff like parameterizing the original macro with the rewritten AST node, but that seems to intrusive to me.

@Manuel Yes, I will do that. Also to see if there is any interest.

-Rafael

On 24.10.18 10:30, Manuel Klimek wrote:
If you want this in the core libs, a great idea would be to write up a doc or email explaining the design and the use cases.
Note that if this doesn't solve a generic enough use case, it might still be a good idea for it to live in a different place - API surface is a cost to users.

On Tue, Oct 23, 2018 at 5:56 PM Jonas Toth via cfe-dev <[hidden email]> wrote:

Hi Rafael,

nice to see your progress!

In my opinion these advancements should go in the Tooling and Rewrite directly in clang but integrate with the existing facilities. It would be inconvenient two have two separate ways of doing rewritings.

What I am curious about is how to resolve the macro-situation. If the refactoring will work directly on the AST and ignore macros, how is the refactoring translated into real code again?

Having an functional way of doing the refactorings with just transforming the AST, e.g. replacing one node with a new subtree would be great and elegant on the AST side, but how would we move back from AST to code?

Best, Jonas

Am 23.10.18 um 17:32 schrieb Rafael·Stahl:

Hey everyone

the two major limitations are resolved now:

- The macro argument issue
- Support for replacing Decls using clang::ast_type_traits::DynTypedNode

The macro issue vanished by itself once I figured a little unintuitive detail out: The SourceLocations from the original AST and the ones from the new Preprocessor Lexer did not compare equal even though they refer to the exact same location.

I believe this is because expansion SrcLocs have a kind of pointer identity representation and equality just compares raw representations, such that even when they point to the same locations, they don't compare equal. What worked for me was a kind of "deep" comparison:

bool clutil::DeepSrcLocEqual(clang::SourceLocation lhs, clang::SourceLocation rhs, const clang::SourceManager &SM)
{
    if (lhs == rhs)
        return true;

    if (SM.getExpansionLoc(lhs) != SM.getExpansionLoc(rhs))
        return false;
    if (SM.getSpellingLoc(lhs) != SM.getSpellingLoc(rhs))
        return false;

    clang::SourceLocation lhsMacro, rhsMacro;
    if (SM.isMacroArgExpansion(lhs, &lhsMacro))
    {
        if (!SM.isMacroArgExpansion(rhs, &rhsMacro))
            return false;
        if (!DeepSrcLocEqual(lhsMacro, rhsMacro, SM))
            return false;
    }

    return true;
}

Attached is the cleaned up rewriter that I ended up with now. Still, if there is any interest, I'd be happy to contribute this back to clangs libraries, but I would require some feedback how this fits into existing facilities.

Best regards
Rafael

On 17.07.18 16:19, Jonas Toth wrote:

Hi Rafael,

I did read into clang-refactor a while ago but unfortunatly could not follow that up. If I recall correctly its about source-to-source transformation (as you said) and aims at implementing the primitive refactorings that exist (e.g. extract-method, extract-variable, ....).

Rewriting itself should happen with the normal tooling framework.

(https://clang.llvm.org/docs/RefactoringEngine.html)

Maybe the implementers of the existing code can give better comments on you proposal (and might have considered a similar solution to yours already).

+Alex Lorenz

All the best, Jonas


Am 17.07.2018 um 14:46 schrieb Rafael·Stahl:

Hi Jonas

Thanks for introducing me to this, I have seen the "Replacement" before, but not clang-refactor.

However it seems to only provide management facilities around rewrite operations and not aid with the rewriting itself. Am I missing something here?

The two core problems for me:

- nesting replacements: When implementing replacements with clang-refactor, I still have to provide replacements that are closed in themselves. I cannot make them depend on others, right?
- macros: clang-refactor only seems to work with spelling locations.

Maybe an even simpler example: Replace all additions with "add(lhs, rhs)". This in itself is very difficult with clang as soon as the Stmts are nested or macros are involved.

Best regards
Rafael


On 16.07.2018 19:06, Jonas Toth via cfe-dev wrote:

Hi Rafael,

wouldn't your usecase be a task for clang-refactor?

Best,  Jonas


Am 16.07.2018 um 17:08 schrieb Rafael·Stahl via cfe-dev:
Hey everyone

The rewriting API of Clang operates on the source code in textual form. The user can use AST nodes to figure out what to replace, but in the end he has to remove and insert snippets in a linear piece of text.

This is very inconvenient when it is required to restructure and nest replacements. The involvement of macros makes a manual process even more difficult. See some recent threads expressing difficulty with the API [1][2].

What do I mean by "nested replacements"? For example in the following:

    int i = x + s->a;

I would want to replace the BinaryOperator with a function call and the MemberExpr with some constant:

    int i = Addition(x, 7);

When keeping the two replacement rules independent of each other, achieving this with the current API is extremely difficult. More so when macros are involved.

I am proposing some kind of helper that aims to solve these issues by providing an interface that offers to directly replace AST nodes and a mechanism to nest AST node replacements - without having to worry about macros.

Potential usage:

- Develop a class that derives from StmtToRewrite to define how replacements should happen:

    class RewriteAdds : public cu::StmtToRewrite
    {
    public:
        std::string makeReplaceStr() const override
        {
            auto binOp = dyn_cast<BinaryOperator>(replaceS);
            return "Addition(" + getMgr()->getReplaced(binOp->getLHS()).strToInsert + ", " +
getMgr()->getReplaced(binOp->getRHS()).strToInsert + ")";
        }
    };

    class RewriteMembs : public cu::StmtToRewrite
    {
    public:
        std::string makeReplaceStr() const override
        {
            return "7";
        }
    };

- Construct a RewriteManager:

    cu::RewriteManager mgr(ACtx, PP);

- Add rewriting operations to the manager:

    // std::vector<const Stmt *> AddStmts = /* matched from binaryOperator() with plus */
    // std::vector<const Stmt *> MembStmts = /* matched from memberExpr() */
    for (const auto &S : AddStmts) mgr.registerStmt<RewriteAdds>(S);
    for (const auto &S : MembStmts) mgr.registerStmt<RewriteMembs>(S);

- Retrieve and apply the results:

    clang::Rewriter rewriter(SM, LangOpts);
    for (const auto &r : mgr.getReplacements()) {
        rewriter.RemoveText(r.rangeToRemove);
        rewriter.InsertText(r.rangeToRemove.getBegin(), r.strToInsert);
    }


At the end of this mail is my low quality code that kind-of implements this. TLDR:

- Build a hierarchy of stmts to replace and keep track of which replacements must be combined
- Move further up in the AST if these replacements are inside a macro
- Recursively lex the file and look for replacements outside-in by spelling locations. Expand any macros that are encountered during this. The re-lexing idea is based on the hint in [3].

The code has the following shortcomings:

- I do not know how to distinguish macro argument expansions within macros. For example in "#define FOO(a) a + a" the two "a"s expand to different AST nodes that could be replaced with different rules. This is an important issue, because it can lead to completely broken code with nesting.
- Limited to Stmts, when Decls should be supported too.
- Very un-optimized with lexing the entire source file many times. Easy to solve, but didn't want to raise the complexity further for now.
- Could keep written code more clean by only expanding macros if required. For example not required if just a macro arg is replaced and all expansions would be the same.


I am very interested in your general thoughts. I'm not very experienced with clang, but this was my vision how I would want to do replacements. Are you interested in getting this into clang? I would need help with ironing out the remaining issues.

-Rafael


[1] http://lists.llvm.org/pipermail/cfe-dev/2018-July/058430.html
[2] http://lists.llvm.org/pipermail/cfe-dev/2018-June/058213.html
[3] http://lists.llvm.org/pipermail/cfe-dev/2017-August/055079.html



// snip



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev



_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev