RFC: Preprocessor option to assist with parsing a single file only

Xin Wang via cfe-dev
Hey all,

In r305044 I introduced a preprocessor option (bool SingleFileParseMode) and a clang-c/Index.h enumerator (CXTranslationUnit_SingleFileParse) to assist with ‘parsing a single file only’. I’m going to provide some details and context on why such parsing is useful and why a new option is necessary.

Parsing a single file (essentially parsing it normally but without including any other headers) is useful as a way to determine the global symbols that exist in the source files, in an inaccurate but ‘lightning-super-fast’ mode. For example, if the source is like this:

@implementation Foo
-(void)testSomething {}
-(NSString*)returnIt { return @"blah"; }
@end

The parser can determine that there is an ObjC @implementation named ‘Foo’ with 2 methods, -testSomething, and -returnIt. Even if no SDK header gets included and ‘NSString’ becomes unresolved, the parser can still provide the associated global symbols.

In general terms, think of this like approximating the inaccurate parsing that something like SublimeText is doing, where there’s no preprocessor or precise typechecking but it can still provide you with a list of symbols and some rudimentary jump-to-definition.
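
To make the comparison concrete, here is a rough sketch of that kind of editor-style scan, written in Python with regular expressions. It is not how clang or libclang work; the helper name `scan_objc_symbols` and the patterns are made up for illustration, and only the first selector piece of each method is captured.

```python
import re

# Hypothetical helper, not part of clang: a regex-only scan that ignores
# headers, types and the preprocessor entirely.
IMPL_RE = re.compile(r"^\s*@implementation\s+(\w+)")
METHOD_RE = re.compile(r"^\s*([-+])\s*\([^)]*\)\s*(\w+)")
END_RE = re.compile(r"^\s*@end\b")


def scan_objc_symbols(source):
    """Return {class_name: [method, ...]} for each @implementation found."""
    symbols = {}
    current = None
    for line in source.splitlines():
        impl = IMPL_RE.match(line)
        if impl:
            current = impl.group(1)
            symbols.setdefault(current, [])
        elif END_RE.match(line):
            current = None
        elif current is not None:
            method = METHOD_RE.match(line)
            if method:
                # Record the +/- marker plus the first selector piece only.
                symbols[current].append(method.group(1) + method.group(2))
    return symbols


SOURCE = """\
@implementation Foo
-(void)testSomething {}
-(NSString*)returnIt { return @"blah"; }
@end
"""

print(scan_objc_symbols(SOURCE))
```

The unresolved ‘NSString’ does not matter to a scan like this, which is the same property the single-file parse relies on, although the real parser builds an AST rather than matching lines.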

We’ve used this for a while now in Xcode to do something like ‘fast-scanning’ specifically for ObjC unit tests (*). This allows us to show the available unit tests almost immediately once you open a project, without waiting for the full-accurate indexing to complete.
If the ‘fast-scan’ is missing something, e.g. due to preprocessor directives or macros, it will still show up once the accurate indexing catches up.

To clarify, this has been working without any modifications to clang: we were just using libclang to parse the file containing the unit tests and did not pass any search paths, which had the practical effect of not including headers. So why add the option now?

This is due to the limitation of the 'fast scan' not seeing symbols inside preprocessor directives. For example, with code like this:

#if ENABLE_FOO_TESTS

@implementation Foo
-(void)testSomething {}
@end

#endif

‘ENABLE_FOO_TESTS’ is not defined so the preprocessor skips this block and we miss getting these tests via the ‘fast scan’. Here’s what I’d like to propose:

If ‘SingleFileParseMode’ is true, the preprocessor will treat undefined identifiers in preprocessor directives specially. If a directive makes use of an undefined identifier, the preprocessor will ignore the directive and parse all of its blocks (the #if block, and the #else one as well).
If the directive is using literals like:

#if 0
#endif

#if 1
#endif

or making use of defined macros, then there’s no change in behavior.
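
The rule described above can be modelled with a small toy in Python. This is only a sketch under simplifying assumptions, not the clang implementation: the function name `single_file_scan` is hypothetical, conditions are limited to a bare integer literal or a bare macro name, only #if, #else and #endif are recognised, and the input is assumed to be well-formed.

```python
import re

IF_RE = re.compile(r"^\s*#\s*if\s+(.*)$")
ELSE_RE = re.compile(r"^\s*#\s*else\b")
ENDIF_RE = re.compile(r"^\s*#\s*endif\b")


def single_file_scan(lines, defined):
    """Return the non-directive lines that would reach the parser.

    `defined` maps macro names to integer values (an assumption of this toy).
    """
    kept = []
    # One frame per open #if: (parent_active, take_if, take_else).
    frames = []
    active = True
    for line in lines:
        m = IF_RE.match(line)
        if m:
            cond = m.group(1).strip()
            if cond.isdigit():
                # Literal condition: normal behaviour.
                value = int(cond) != 0
                take_if, take_else = value, not value
            elif cond in defined:
                # Defined macro: normal behaviour.
                value = defined[cond] != 0
                take_if, take_else = value, not value
            else:
                # Undefined identifier: ignore the directive, parse all blocks.
                take_if = take_else = True
            frames.append((active, take_if, take_else))
            active = active and take_if
        elif ELSE_RE.match(line):
            parent_active, _, take_else = frames[-1]
            active = parent_active and take_else
        elif ENDIF_RE.match(line):
            active = frames.pop()[0]
        elif active:
            kept.append(line)
    return kept


SOURCE = """\
#if ENABLE_FOO_TESTS
@implementation Foo
-(void)testSomething {}
@end
#else
@implementation Bar
@end
#endif
#if 0
@implementation Hidden
@end
#endif
""".splitlines()

for line in single_file_scan(SOURCE, defined={}):
    print(line)
```

With an empty `defined` map, both the ‘Foo’ and ‘Bar’ blocks are kept while the `#if 0` block is still skipped. Passing `{"ENABLE_FOO_TESTS": 0}` restores the ordinary behaviour and keeps only the #else block.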

With such a change, in this ‘fast-scan-inaccurate-mode’ we’ll be able to gather the symbols that exist in preprocessor directives like the ‘#if ENABLE_FOO_TESTS’ example.

Let me know what you think!


(*) Dealing only with detection of ObjC unit tests has a restricted scope, and unmodified clang was well equipped to help with that. If we want to extend ‘fast/inaccurate’ parsing and try to gather such symbol info from all files, clang would need to be enhanced to improve its error recovery and not drop valuable information from its AST when there are compiler errors. But this is a discussion for another thread at some later point in the future.

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: RFC: Preprocessor option to assist with parsing a single file only

Xin Wang via cfe-dev
Put a patch for review here:

On Jun 14, 2017, at 6:25 PM, Argyrios Kyrtzidis via cfe-dev <[hidden email]> wrote:

Re: RFC: Preprocessor option to assist with parsing a single file only

Xin Wang via cfe-dev
I think this is a useful addition that has been requested multiple times in the past - have you by chance tried this on C++ code? I'd predict it doesn't work well there, but I'd be curious whether you have other results :)

On Fri, Jun 16, 2017 at 2:11 AM Argyrios Kyrtzidis <[hidden email]> wrote:

Re: RFC: Preprocessor option to assist with parsing a single file only

Xin Wang via cfe-dev

On Jun 19, 2017, at 3:31 AM, Manuel Klimek via cfe-dev <[hidden email]> wrote:

I think this is a useful addition that has been requested multiple times in the past - have you by chance tried this on C++ code? I'd predict it doesn't work well there, but I'd be curious whether you have other results :)

We haven’t tried it on C++. It likely depends on the style of the codebase; it would be interesting to see how much info we can get using the clang repo.
Note that, even for ObjC, experiments with generalizing beyond unit test discovery showed that we’d need improvements in error recovery, e.g.:
- when you have ‘@interface A : B’ it should not drop ‘B’ completely from the super-class list if ‘B’ is unresolved
- there were cases where clang was too ‘liberal’ in skipping tokens after a parser error
- unresolved types changing to ‘int’ is not great


