AST Serialization


suyash singh via cfe-dev



I have a question about the assumptions behind, and the correct usage of, AST serialization for C and C++ sources.


I have done the following:

1) I implemented a ClangTool which builds ASTs from compilation databases.
2) I dumped the contents of the ASTs in both textual and binary formats.
3) I then read the serialized binary back in and dumped it again in both formats.


What I noticed is that the dumps of the different generations differ in size (by up to an order of magnitude). The textual dumps also differ.

I would have assumed that serialization followed by deserialization produces an AST identical to the original.


Maybe I have done it the wrong way; the following outline gives the gist of the method used:


void textual_dump_to_file(const ASTUnit& unit, StringRef file_path) {
  using namespace llvm::sys::fs;
  using namespace llvm::sys::path;

  // mkdir -p

  std::error_code EC;
  llvm::raw_fd_ostream out {file_path, EC};
  // Dump the whole translation unit, forcing deserialization of lazily loaded declarations.
  unit.getASTContext().getTranslationUnitDecl()->dump(out, /*Deserialize=*/true);
}


void experiment_with_unit(CompilerInstance& CI, ASTUnit& Unit, StringRef MethodPrefix, StringRef SourcePath) {
  using namespace llvm::sys::fs;
  using namespace llvm::sys::path;

  // Diagnostics engine used when loading the serialized AST.
  IntrusiveRefCntPtr<DiagnosticOptions> DiagOpts = new DiagnosticOptions();
  TextDiagnosticPrinter *DiagClient = new TextDiagnosticPrinter(llvm::errs(), &*DiagOpts);
  IntrusiveRefCntPtr<DiagnosticIDs> DiagID(new DiagnosticIDs());
  IntrusiveRefCntPtr<DiagnosticsEngine> Diags(
      new DiagnosticsEngine(DiagID, &*DiagOpts, DiagClient));

  llvm::SmallString<256> TextDumpPath{MethodPrefix};

  llvm::SmallString<256> BinaryDumpPath {TextDumpPath};

  replace_extension(TextDumpPath, ".txt1");
  replace_extension(BinaryDumpPath, ".bin1");

  // Generation 1: textual dump of the freshly built AST.
  textual_dump_to_file(Unit, TextDumpPath);

  // Generation 2: load the serialized generation-1 AST and dump it again.
  auto Dump1Loaded = ASTUnit::LoadFromASTFile(
      std::string(BinaryDumpPath), CI.getPCHContainerOperations()->getRawReader(),
      ASTUnit::LoadEverything, Diags, CI.getFileSystemOpts());
  if (!Dump1Loaded)
    return; // loading failed; diagnostics were printed to llvm::errs()

  replace_extension(TextDumpPath, ".txt2");
  replace_extension(BinaryDumpPath, ".bin2");

  textual_dump_to_file(*Dump1Loaded, TextDumpPath);
}


The files with extensions txt1 and txt2 differ, as do bin1 and bin2.

I would think that if there were a problem with the reproducibility of the AST, it would affect modules and the analyzer as well.
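
One caveat when comparing the txt1 and txt2 files: the textual AST dump prints the in-memory address of each node (the 0x... values), and these cannot be expected to match between the original and the reloaded AST. A plain diff will therefore flag such lines even if the nodes are otherwise identical. The following is a minimal sketch, using only the standard library, that masks the addresses before comparing. The names normalize_addresses and first_difference are hypothetical helpers, not part of Clang, and the masking is deliberately crude: it replaces every 0x-prefixed hex literal, not only node addresses. It may help narrow down the real differences, but it does not by itself explain the difference in size.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <string_view>

// Hypothetical helper: replace every "0x<hex digits>" token with "0xADDR"
// so that node addresses no longer show up as differences.
std::string normalize_addresses(std::string_view line) {
  std::string out;
  out.reserve(line.size());
  std::size_t i = 0;
  while (i < line.size()) {
    const bool starts_hex =
        line[i] == '0' && i + 2 < line.size() && line[i + 1] == 'x' &&
        std::isxdigit(static_cast<unsigned char>(line[i + 2])) != 0;
    if (!starts_hex) {
      out.push_back(line[i++]);
      continue;
    }
    i += 2;  // skip "0x"
    while (i < line.size() &&
           std::isxdigit(static_cast<unsigned char>(line[i])) != 0)
      ++i;  // skip the hex digits
    out += "0xADDR";
  }
  return out;
}

// Hypothetical helper: return the 1-based number of the first line at which
// two dumps differ after address masking, or 0 if no difference is found.
std::size_t first_difference(const std::string &dump_a,
                             const std::string &dump_b) {
  std::istringstream a(dump_a), b(dump_b);
  std::string line_a, line_b;
  std::size_t line_no = 0;
  while (true) {
    const bool has_a = static_cast<bool>(std::getline(a, line_a));
    const bool has_b = static_cast<bool>(std::getline(b, line_b));
    ++line_no;
    if (!has_a && !has_b) return 0;       // both dumps ended together
    if (has_a != has_b) return line_no;   // one dump is longer
    if (normalize_addresses(line_a) != normalize_addresses(line_b))
      return line_no;
  }
}
```

Reading the contents of the txt1 and txt2 files into two strings and passing them to first_difference gives the first line that differs for a reason other than a node address.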


Any thoughts on this?


