AST Serialization

suyash singh via cfe-dev

Hi,

 

I have a question regarding the assumptions and correct usage of the AST serialization (regarding C and C++ sources).

 

I have done the following:

1) I have implemented a ClangTool which builds ASTs from compilation databases.

2) I have dumped the contents of the ASTs in both textual and binary formats.

3) Then I read the serialized binary back in and dumped the loaded AST again in both formats.

 

What I have noticed is that the dumps of the different generations differ in size (by up to an order of magnitude). The textual dumps also differ.

I would have assumed the serialization and deserialization steps to produce an AST which is the same as the original.

 

Maybe I have done it the wrong way. The following outline gives the gist of the method used:

 

void textual_dump_to_file(const ASTUnit& unit, StringRef file_path) {
  using namespace llvm::sys::fs;
  using namespace llvm::sys::path;

  // mkdir -p
  create_directories(parent_path(file_path));

  std::error_code EC;
  llvm::raw_fd_ostream out {file_path, EC};
  if (EC) {
    // Do not dump into a stream that failed to open.
    llvm::errs() << "cannot open " << file_path << ": " << EC.message() << "\n";
    return;
  }

  unit.getASTContext().getTranslationUnitDecl()->dump(out, /*deserialize*/ true);
}

 

void experiment_with_unit(CompilerInstance& CI, ASTUnit& Unit, StringRef MethodPrefix, StringRef SourcePath) {
  using namespace llvm::sys::fs;
  using namespace llvm::sys::path;

  // Diagnostics engine used while loading the serialized AST.
  IntrusiveRefCntPtr<DiagnosticOptions> DiagOpts = new DiagnosticOptions();
  TextDiagnosticPrinter *DiagClient = new TextDiagnosticPrinter(llvm::errs(), &*DiagOpts);
  IntrusiveRefCntPtr<DiagnosticIDs> DiagID(new DiagnosticIDs());
  IntrusiveRefCntPtr<DiagnosticsEngine> Diags(
      new DiagnosticsEngine(DiagID, &*DiagOpts, DiagClient));

  llvm::SmallString<256> TextDumpPath{MethodPrefix};
  TextDumpPath.append(SourcePath);
  llvm::SmallString<256> BinaryDumpPath {TextDumpPath};

  // First generation: dump the AST as built by the tool.
  replace_extension(TextDumpPath, ".txt1");
  replace_extension(BinaryDumpPath, ".bin1");
  Unit.Save(BinaryDumpPath);
  textual_dump_to_file(Unit, TextDumpPath);

  // Read the serialized AST back in.
  auto Dump1Loaded = ASTUnit::LoadFromASTFile(
      std::string(BinaryDumpPath), CI.getPCHContainerOperations()->getRawReader(),
      ASTUnit::LoadEverything, Diags, CI.getFileSystemOpts());
  if (!Dump1Loaded)
    return; // loading failed, nothing to dump

  // Second generation: dump the AST that was loaded from .bin1.
  replace_extension(TextDumpPath, ".txt2");
  replace_extension(BinaryDumpPath, ".bin2");
  Dump1Loaded->Save(BinaryDumpPath);
  textual_dump_to_file(*Dump1Loaded, TextDumpPath);
}

 

The files with extensions txt1 and txt2 differ, and so do bin1 and bin2.
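
To narrow down where the textual dumps first diverge, a small standalone comparison helper could be used. The following is only a sketch and not part of the tool above: the names normalize_addresses and first_difference are hypothetical, it uses only the C++ standard library, and it assumes that the hexadecimal node addresses printed in the dumps are expected to change between the original and the reloaded AST, so they are masked before the lines are compared.

```cpp
#include <cstddef>
#include <istream>
#include <regex>
#include <sstream>
#include <string>

// Replace every hexadecimal address (for example 0x55d0c8a1b2c8) with a fixed
// token. Assumption: such addresses are only meaningful within one process,
// so they should not count as a difference between two dumps.
std::string normalize_addresses(const std::string& line) {
    static const std::regex address{"0x[0-9a-fA-F]+"};
    return std::regex_replace(line, address, "0xADDR");
}

// Compare two dumps line by line after masking addresses.
// Returns the 1-based number of the first differing line, or 0 if the two
// streams are identical after normalization.
std::size_t first_difference(std::istream& a, std::istream& b) {
    std::string line_a, line_b;
    for (std::size_t n = 1;; ++n) {
        const bool has_a = static_cast<bool>(std::getline(a, line_a));
        const bool has_b = static_cast<bool>(std::getline(b, line_b));
        if (!has_a && !has_b)
            return 0;  // both dumps ended at the same point
        if (has_a != has_b)
            return n;  // one dump is longer than the other
        if (normalize_addresses(line_a) != normalize_addresses(line_b))
            return n;  // contents differ even with addresses masked
    }
}
```

Opening the txt1 and txt2 files with std::ifstream and passing both streams to first_difference would then give the first line that differs for a reason other than a node address.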

I would think that if there is a problem in the reproducibility of the AST, then it would affect modules, and the analyzer as well.

 

Any thoughts on this?

 

Thanks

