clang memory usage with C++ template metaprogramming

clang memory usage with C++ template metaprogramming

John Bytheway
Several years ago I embarked on a project involving some heavy-duty C++
template metaprogramming.  In the end I abandoned it because the compile
times and memory usage with g++ were too big.

On seeing clang's promised reduction of such requirements, I thought I'd
go back to my project and see how clang fared when compiling it.
Although it does indeed run much faster than g++, it actually uses
*more* memory.  I'm just posting here to ask if this is to be expected.
If it might be indicative of some issue or if you'd like to know where
all this memory is being used then I'd be happy to try some profiling.

Here are some numbers for one particular compilation:

g++ 4.3.3:
  Wall clock time: 9:38.92
  Peak memory usage: ~1.40GiB

g++ 4.4.3:
  Wall clock time: 7:01.17
  Peak memory usage: ~1.37GiB

clang++ (svn r105478):
  Wall clock time: 0:15.59
  Peak memory usage: ~1.50GiB

TBH I'm astonished that clang was able to gobble up so much memory in so
little time!

John Bytheway


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev

Re: clang memory usage with C++ template metaprogramming

Douglas Gregor

On Jun 8, 2010, at 2:55 PM, John Bytheway wrote:

> Several years ago I embarked on a project involving some heavy-duty C++
> template metaprogramming.  In the end I abandoned it because the compile
> times and memory usage with g++ were too big.
>
> On seeing clang's promised reduction of such requirements, I thought I'd
> go back to my project and see how clang fared when compiling it.
> Although it does indeed run much faster than g++, it actually uses
> *more* memory.  I'm just posting here to ask if this is to be expected.
> If it might be indicative of some issue or if you'd like to know where
> all this memory is being used then I'd be happy to try some profiling.
>
> Here are some numbers for one particular compilation:
>
> g++ 4.3.3:
>  Wall clock time: 9:38.92
>  Peak memory usage: ~1.40GiB
>
> g++ 4.4.3:
>  Wall clock time: 7:01.17
>  Peak memory usage: ~1.37GiB
>
> clang++ (svn r105478):
>  Wall clock time: 0:15.59
>  Peak memory usage: ~1.50GiB

I've also seen this with template metaprogramming-heavy code, but aside from some idle speculation (we think it has to do with type-location information in Clang), we haven't looked into it closely.

        - Doug

Re: clang memory usage with C++ template metaprogramming

John Bytheway
On 08/06/10 23:03, Douglas Gregor wrote:

>
> On Jun 8, 2010, at 2:55 PM, John Bytheway wrote:
>
>> Several years ago I embarked on a project involving some heavy-duty
>> C++ template metaprogramming.  In the end I abandoned it because
>> the compile times and memory usage with g++ were too big.
>>
>> On seeing clang's promised reduction of such requirements, I
>> thought I'd go back to my project and see how clang fared when
>> compiling it. Although it does indeed run much faster than g++, it
>> actually uses *more* memory.  I'm just posting here to ask if this
>> is to be expected. If it might be indicative of some issue or if
>> you'd like to know where all this memory is being used then I'd be
>> happy to try some profiling.
<snip>

> I've also seen this with template metaprogramming-heavy code, but
> aside from some idle speculation (we think it has to do with
> type-location information in Clang), we haven't looked into it
> closely.

Fair enough.  I was curious, so I ran valgrind/massif to get an idea.
In short:

16.53% (259,009,024B) in 722 places, all below massif's threshold
14.49% (227,086,336B) clang::DeclContext::CreateStoredDeclsMap
12.85% (201,326,592B) clang::SourceManager::createInstantiationLoc
12.83% (201,068,544B) clang::ASTContext::CreateTypeSourceInfo
08.86% (138,792,960B) clang::ASTContext::getTemplateSpecializationType
06.82% (106,841,236B) clang::TokenLexer::ExpandFunctionArguments
04.94% (77,463,552B) clang::CXXConstructorDecl::Create
02.86% (44,883,968B) clang::ClassTemplateSpecializationDecl::Create
02.71% (42,532,864B) clang::ParmVarDecl::Create
02.25% (35,332,096B) clang::TagDecl::startDefinition
02.15% (33,763,328B) clang::TemplateArgumentList::TemplateArgumentList
02.14% (33,554,432B) std::vector<clang::Type*, std::allocator<clang::Type*> >::_M_insert_aux
02.06% (32,329,728B) clang::CXXRecordDecl::Create
02.05% (32,157,696B) clang::CXXMethodDecl::Create
01.82% (28,585,984B) clang::CXXDestructorDecl::Create
01.59% (24,907,776B) clang::ASTContext::getFunctionType
01.27% (19,861,504B) clang::ASTContext::getLValueReferenceType
01.08% (16,908,288B) clang::TypedefDecl::Create

So type location information is indeed a significant part, but nothing
is overwhelming, which I guess is a good sign, and nothing stands out as
worth changing.

I wonder idly: How plausible would it be to allow execution in a mode
where no source information was maintained, and thus reduce memory usage
(at the expense of useful errors/warnings)?  Such a mode might be useful
at times.  I'm guessing it would be prohibitively difficult.

John Bytheway

Re: clang memory usage with C++ template metaprogramming

Douglas Gregor

On Jun 8, 2010, at 4:03 PM, John Bytheway wrote:

> On 08/06/10 23:03, Douglas Gregor wrote:
>>
>> On Jun 8, 2010, at 2:55 PM, John Bytheway wrote:
>>
>>> Several years ago I embarked on a project involving some heavy-duty
>>> C++ template metaprogramming.  In the end I abandoned it because
>>> the compile times and memory usage with g++ were too big.
>>>
>>> On seeing clang's promised reduction of such requirements, I
>>> thought I'd go back to my project and see how clang fared when
>>> compiling it. Although it does indeed run much faster than g++, it
>>> actually uses *more* memory.  I'm just posting here to ask if this
>>> is to be expected. If it might be indicative of some issue or if
>>> you'd like to know where all this memory is being used then I'd be
>>> happy to try some profiling.
> <snip>
>
>> I've also seen this with template metaprogramming-heavy code, but
>> aside from some idle speculation (we think it has to do with
>> type-location information in Clang), we haven't looked into it
>> closely.
>
> Fair enough.  I was curious, so I ran valgrind/massif to get an idea.
> In short:
>
> 16.53% (259,009,024B) in 722 places, all below massif's threshold
> 14.49% (227,086,336B) clang::DeclContext::CreateStoredDeclsMap

In theory, we might be able to use a smaller data structure for DeclContexts with only a few elements in them, which would probably help reduce memory usage when we're dealing with many instantiations of small templates.
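
As a rough illustration of that idea (a minimal sketch, not Clang's actual code), the container below keeps up to four entries in inline storage and only allocates a hash map once it outgrows them. `Decl`, `SmallDeclMap`, and the capacity of 4 are all hypothetical names and values chosen for the example.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a declaration; not a Clang type.
struct Decl { std::string name; };

// Name-to-declaration table that stores a few entries inline and switches
// to a heap-allocated hash map only when it grows past InlineCapacity.
class SmallDeclMap {
  typedef std::unordered_map<std::string, const Decl*> BigMap;
  static constexpr std::size_t InlineCapacity = 4;  // assumed threshold

  std::array<std::pair<std::string, const Decl*>, InlineCapacity> small_;
  std::size_t smallCount_ = 0;
  std::unique_ptr<BigMap> big_;  // null while the inline storage suffices

public:
  void insert(const std::string& name, const Decl* d) {
    if (big_) { (*big_)[name] = d; return; }
    // Linear scan is cheap for a handful of entries.
    for (std::size_t i = 0; i < smallCount_; ++i)
      if (small_[i].first == name) { small_[i].second = d; return; }
    if (smallCount_ < InlineCapacity) {
      small_[smallCount_++] = std::make_pair(name, d);
      return;
    }
    // Inline storage is full: migrate everything into a hash map.
    big_.reset(new BigMap());
    for (std::size_t i = 0; i < smallCount_; ++i)
      big_->emplace(small_[i].first, small_[i].second);
    smallCount_ = 0;
    (*big_)[name] = d;
  }

  const Decl* lookup(const std::string& name) const {
    if (big_) {
      BigMap::const_iterator it = big_->find(name);
      return it == big_->end() ? nullptr : it->second;
    }
    for (std::size_t i = 0; i < smallCount_; ++i)
      if (small_[i].first == name) return small_[i].second;
    return nullptr;
  }

  bool usesHashMap() const { return big_ != nullptr; }
  std::size_t size() const { return big_ ? big_->size() : smallCount_; }
};
```

With many small instantiations, most tables would stay in the inline form and never pay for the hash map allocation; whether that is a net win in practice would need measuring.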

> 12.85% (201,326,592B) clang::SourceManager::createInstantiationLoc
> 06.82% (106,841,236B) clang::TokenLexer::ExpandFunctionArguments

There must be some preprocessor metaprogramming going on in this example, too? That's pretty big for the preprocessor.

> 12.83% (201,068,544B) clang::ASTContext::CreateTypeSourceInfo

Yes, this is the type-source information I mentioned. If we make template instantiation "perfect" with respect to type-source information, so that any dependent type instantiates down to something that is structurally identical to the form it had when it was written in the source, then we could avoid allocating memory for type-source information in each type instantiation. We're not too far from this goal, but it has to be *perfect* for us to use the optimization.

> 04.94% (77,463,552B) clang::CXXConstructorDecl::Create
> 02.05% (32,157,696B) clang::CXXMethodDecl::Create
> 01.82% (28,585,984B) clang::CXXDestructorDecl::Create

A number of these could be eliminated if we were to lazily create the implicitly-declared default constructor, copy constructor, copy-assignment operator, and destructor.
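
A minimal sketch of the lazy approach, under the assumption that each class carries one slot per special member and fills it only on first request. `Record`, `MemberDecl`, and `implicitMember` are hypothetical names for illustration and do not correspond to Clang's API.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// The four implicitly-declared members mentioned above.
enum class SpecialMember { DefaultCtor = 0, CopyCtor, CopyAssign, Dtor };

// Hypothetical declaration node for a special member.
struct MemberDecl {
  SpecialMember kind;
  std::string owner;
};

// Hypothetical class record: no member nodes exist until someone asks.
class Record {
  static constexpr std::size_t NumSpecial = 4;
  std::string name_;
  std::unique_ptr<MemberDecl> implicit_[NumSpecial];  // all null initially

public:
  explicit Record(std::string name) : name_(std::move(name)) {}

  // Creates the declaration on first use and reuses it afterwards.
  const MemberDecl& implicitMember(SpecialMember k) {
    std::unique_ptr<MemberDecl>& slot = implicit_[static_cast<std::size_t>(k)];
    if (!slot) slot.reset(new MemberDecl{k, name_});
    return *slot;
  }

  // Number of special members that have actually been allocated.
  std::size_t materialized() const {
    std::size_t n = 0;
    for (const auto& p : implicit_)
      if (p) ++n;
    return n;
  }
};
```

A class that is instantiated but never copied or default-constructed would then pay only for the empty slots, not for the declaration nodes themselves.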

> 08.86% (138,792,960B) clang::ASTContext::getTemplateSpecializationType

> 02.15% (33,763,328B) clang::TemplateArgumentList::TemplateArgumentList
> 01.59% (24,907,776B) clang::ASTContext::getFunctionType
> 01.27% (19,861,504B) clang::ASTContext::getLValueReferenceType
> 01.08% (16,908,288B) clang::TypedefDecl::Create

Not much we can do about these, except look for ways to make the various AST nodes smaller.

> So type location information is indeed a significant part, but nothing
> is overwhelming, which I guess is a good sign, and nothing stands out as
> worth changing.
>
> I wonder idly: How plausible would it be to allow execution in a mode
> where no source information was maintained, and thus reduce memory usage
> (at the expense of useful errors/warnings)?  Such a mode might be useful
> at times.  I'm guessing it would be prohibitively difficult.

We discussed this back when we improved type-source location information, but I am very much against having such a mode: the AST should always be the same, for all clients, or the size of the testing matrix explodes and we get far worse coverage. We should spend time optimizing the system as a unified whole rather than trying to separate out the less-efficient bits that provide needed functionality.

        - Doug

Re: clang memory usage with C++ template metaprogramming

John Bytheway
On 09/06/10 00:54, Douglas Gregor wrote:
> On Jun 8, 2010, at 4:03 PM, John Bytheway wrote:
>> Fair enough.  I was curious, so I ran valgrind/massif to get an
>> idea. In short:
<snip>
>> 12.85% (201,326,592B) clang::SourceManager::createInstantiationLoc
>> 06.82% (106,841,236B) clang::TokenLexer::ExpandFunctionArguments
>
> There must be some preprocessor metaprogramming going on in this
> example, too? That's pretty big for the preprocessor.

Yes, there is.  Give me variadic templates and constexpr functions and I
can dispose of most of it :).
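
To make that concrete, here is a small hedged example of the kind of replacement meant: a variadic template stands in for preprocessor-generated overloads of every arity, and a constexpr function stands in for a recursive template metafunction. The names `count_types` and `factorial` are invented for the example and are not from the project discussed.

```cpp
#include <cstddef>

// One variadic template covers every arity, where a C++03 library would
// generate N overloads with preprocessor repetition.
template <typename... Ts>
constexpr std::size_t count_types() {
  return sizeof...(Ts);
}

// A constexpr function evaluated at compile time, in place of a recursive
// class template with one instantiation per step.
constexpr unsigned long long factorial(unsigned n) {
  return n <= 1 ? 1ULL : n * factorial(n - 1);
}

static_assert(count_types<int, char, double>() == 3, "three types in pack");
static_assert(factorial(5) == 120, "5! computed at compile time");
```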

>> 12.83% (201,068,544B) clang::ASTContext::CreateTypeSourceInfo
>
> Yes, this is the type-source information I mentioned. If we make
> template instantiation "perfect" with respect to type-source
> information, so that any dependent type instantiates down to
> something that is structurally identical to the form it had when it was
> written in the source, then we could avoid allocating memory for
> type-source information in each type instantiation. We're not too far
> from this goal, but it has to be *perfect* for us to use the
> optimization.

A laudable goal.

>> 04.94% (77,463,552B) clang::CXXConstructorDecl::Create
>> 02.05% (32,157,696B) clang::CXXMethodDecl::Create
>> 01.82% (28,585,984B) clang::CXXDestructorDecl::Create
>
> A number of these could be eliminated if we were to lazily create the
> implicitly-declared default constructor, copy constructor,
> copy-assignment operator, and destructor.

That sounds like the easiest of these; if it is then it's a shame these
are not a larger proportion of the problem.

>> I wonder idly: How plausible would it be to allow execution in a
>> mode where no source information was maintained, and thus reduce
>> memory usage (at the expense of useful errors/warnings)?  Such a
>> mode might be useful at times.  I'm guessing it would be
>> prohibitively difficult.
>
> We discussed this back when we improved type-source location
> information, but I am very much against having such a mode: the AST
> should always be the same, for all clients, or the size of the
> testing matrix explodes and we get far worse coverage. We should
> spend time optimizing the system as a unified whole rather than
> trying to separate out the less-efficient bits that provide needed
> functionality.

Yeah, that feels wise.

Thanks for the insight,

John Bytheway

Re: clang memory usage with C++ template metaprogramming

John McCall
In reply to this post by Douglas Gregor
On Jun 8, 2010, at 4:54 PM, Douglas Gregor wrote:
>> 08.86% (138,792,960B) clang::ASTContext::getTemplateSpecializationType
>
>> 02.15% (33,763,328B) clang::TemplateArgumentList::TemplateArgumentList
>> 01.59% (24,907,776B) clang::ASTContext::getFunctionType
>> 01.27% (19,861,504B) clang::ASTContext::getLValueReferenceType
>> 01.08% (16,908,288B) clang::TypedefDecl::Create
>
> Not much we can do about these, except look for ways to make the various AST nodes smaller.

Actually, we still don't unique non-canonical TSTs
(TemplateSpecializationTypes); it's possible that doing so would cut down
on memory usage in these cases, at least when none of the template
arguments are expressions.

We also make some unnecessary TSTs during argument expansion.
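
For readers unfamiliar with the term, "uniquing" here means handing back one shared node for each distinct (template, arguments) combination instead of allocating a fresh one every time. A minimal sketch, assuming arguments are plain type names and using hypothetical `SpecializationType` and `TypeInterner` classes rather than anything from Clang:

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node for "Template<Args...>"; arguments are modelled as
// type names only, matching the "no expression arguments" caveat above.
struct SpecializationType {
  std::string templateName;
  std::vector<std::string> typeArgs;
};

// Returns the same node for every request with an equal key, so repeated
// mentions of one specialization share a single allocation.
class TypeInterner {
  typedef std::pair<std::string, std::vector<std::string> > Key;
  std::map<Key, std::unique_ptr<SpecializationType> > nodes_;

public:
  const SpecializationType* get(const std::string& tmpl,
                                const std::vector<std::string>& args) {
    Key key(tmpl, args);
    auto it = nodes_.find(key);
    if (it != nodes_.end()) return it->second.get();  // already uniqued
    std::unique_ptr<SpecializationType> node(
        new SpecializationType{tmpl, args});
    const SpecializationType* raw = node.get();
    nodes_.emplace(std::move(key), std::move(node));
    return raw;
  }

  std::size_t nodeCount() const { return nodes_.size(); }
};
```

The lookup table itself costs memory, so this only pays off when the same specialization is named many times.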

John.