RFC: clacc: translating OpenACC to OpenMP in clang



Joel E. Denny via cfe-dev
Hi,

We are working on a new project, clacc, that extends clang with OpenACC support.  Clacc's approach is to translate OpenACC (a descriptive language) to OpenMP (a prescriptive language) and thus to build on clang's existing OpenMP support.  While we plan to develop clacc to support our own research, an important goal is to contribute clacc as a production-quality component of upstream clang.

We have begun implementing an early prototype of clacc.  Before we get too far into the implementation, we would like to get feedback from the LLVM community to help ensure our design would ultimately be acceptable for contribution.  For that purpose, below is an analysis of several high-level design alternatives we have considered and their various features.  We welcome any feedback.

Thanks.

Joel E. Denny
Future Technologies Group
Oak Ridge National Laboratory


Design Alternatives
-------------------

We have considered three design alternatives for the clacc compiler:

1. acc src  --parser-->                     omp AST  --codegen-->  LLVM IR + omp rt calls
2. acc src  --parser-->  acc AST                     --codegen-->  LLVM IR + omp rt calls
3. acc src  --parser-->  acc AST  --ttx-->  omp AST  --codegen-->  LLVM IR + omp rt calls

In the above diagram:

* acc src = C source code containing acc constructs.
* acc AST = a clang AST in which acc constructs are represented by
  nodes with acc node types.  Of course, such node types do not
  already exist in clang's implementation.
* omp AST = a clang AST in which acc constructs have been lowered
  to omp constructs represented by nodes with omp node types.  Of
  course, such node types do already exist in clang's
  implementation.
* parser = the existing clang parser and semantic analyzer,
  extended to handle acc constructs.
* codegen = the existing clang backend that translates a clang AST
  to LLVM IR, extended if necessary (depending on which design is
  chosen) to perform codegen from acc nodes.
* ttx (tree transformer) = a new clang component that transforms
  acc to omp in clang ASTs.

Design Features
---------------

There are several features to consider when choosing among the designs
in the previous section:

1. acc AST as an artifact -- Because they create acc AST nodes,
   designs 2 and 3 best facilitate the creation of additional acc
   source-level tools (such as pretty printers, analyzers, lint-like
   tools, and editor extensions).  Some of these tools, such as pretty
   printing, would be available immediately or as minor extensions of
   tools that already exist in clang's ecosystem.

2. omp AST/source as an artifact -- Because they create omp AST
   nodes, designs 1 and 3 best facilitate the use of source-level
   tools to help an application developer discover how clacc has
   mapped their acc to omp, possibly in order to debug a mapping
   specification they have supplied.  With design 2 instead, an
   application developer has to examine low-level LLVM IR + omp rt
   calls.  Moreover, with designs 1 and 3, permanently migrating an
   application's acc source to omp source can be automated.

3. omp AST for mapping implementation -- Designs 1 and 3 might
   also make it easier for the compiler developer to reason about and
   implement mappings from acc to omp.  That is, because acc and omp
   syntax is so similar, implementing the translation at the level of
   a syntactic representation is probably easier than translating to
   LLVM IR.

4. omp AST for codegen -- Designs 1 and 3 simplify the
   compiler implementation by enabling reuse of clang's existing omp
   support for codegen.  In contrast, design 2 requires at least some
   extensions to clang codegen to support acc nodes.

5. Full acc AST for mapping -- Designs 2 and 3 potentially
   enable the compiler to analyze the entire source (as opposed to
   just the acc construct currently being parsed) while choosing the
   mapping to omp.  It is not clear if this feature will prove useful,
   but it might enable more optimizations and compiler research
   opportunities.

6. No acc node classes -- Design 1 simplifies the compiler
   implementation by eliminating the need to implement many acc node
   classes.  While we have so far found that implementing these
   classes is mostly mechanical, it does take a non-trivial amount of
   time.

7. No omp mapping -- Design 2 does not require acc to be mapped to
   omp.  That is, it is conceivable that, for some acc constructs,
   there will prove to be no omp syntax to capture the semantics we
   wish to implement.  It is also conceivable that we might one day
   want to represent some acc constructs directly as extensions to
   LLVM IR, where some acc analyses or optimizations might be more
   feasible to implement.  This possibility dovetails with recent
   discussions in the LLVM community about developing LLVM IR
   extensions for various parallel programming models.


Because of features 4 and 6, design 1 is likely the fastest design to
implement, at least at first while we focus on simple acc features and
simple mappings to omp.  However, the only advantage we have found
that design 1 has over design 3 is feature 6, which we see as the
least important of the above features in the long term.

The only advantage we have found that design 2 has over design 3 is
feature 7.  It should be possible to choose design 3
as the default but, for certain acc constructs or scenarios where
feature 7 proves important (if any), incorporate design 2.  In other
words, if we decide not to map a particular acc construct to any omp
construct, ttx would leave it alone, and we would extend codegen to
handle it directly.

Conclusions
-----------

For the above reasons, and because design 3 offers the cleanest
separation of concerns, we have chosen design 3 with the possibility
of incorporating design 2 where it proves useful.

Because of the immutability of clang's AST, the design of our proposed
ttx component requires careful consideration.  To shorten this initial
email, we have omitted those details for now, but we will be happy to
include them as the discussion progresses.


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: RFC: clacc: translating OpenACC to OpenMP in clang

Jeff Hammond via cfe-dev
All of the usage of OpenACC outside of benchmarks/research that I know about is done in Fortran.  Can you provide a list of C/C++ applications using OpenACC today and estimate the number of users that will benefit from this feature?

Thanks,

Jeff


Re: RFC: clacc: translating OpenACC to OpenMP in clang

Hal Finkel via cfe-dev
On 12/05/2017 05:11 PM, Jeff Hammond via cfe-dev wrote:
All of the usage of OpenACC outside of benchmarks/research that I know about is done in Fortran.  Can you provide a list of C/C++ applications using OpenACC today and estimate the number of users that will benefit from this feature?

Such lists exist, although I don't know what can be shared (and Oak Ridge likely has better lists in this regard than I do). I can tell you, from my own experience, that we're seeing an increase in development using OpenACC, in both C/C++ and Fortran, over the last couple of years (essentially because the compiler technology has improved to the point where that is now a potentially-productive choice).

Also, we have a strong desire to enable tooling over code bases using OpenACC. Among many other things, at some point we'll likely want the option to automatically migrate much of this code to using OpenMP. Having an OpenACC-enabled Clang, with an implementation that maps to OpenMP, is an important step in that process.

 -Hal


-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory


Re: RFC: clacc: translating OpenACC to OpenMP in clang

Hal Finkel via cfe-dev


On 12/05/2017 01:06 PM, Joel E. Denny wrote:
Hi,

We are working on a new project, clacc, that extends clang with OpenACC support.  Clacc's approach is to translate OpenACC (a descriptive language) to OpenMP (a prescriptive language) and thus to build on clang's existing OpenMP support.  While we plan to develop clacc to support our own research, an important goal is to contribute clacc as a production-quality component of upstream clang.

Great.


We have begun implementing an early prototype of clacc.  Before we get too far into the implementation, we would like to get feedback from the LLVM community to help ensure our design would ultimately be acceptable for contribution.  For that purpose, below is an analysis of several high-level design alternatives we have considered and their various features.  We welcome any feedback.

Thanks.

Joel E. Denny
Future Technologies Group
Oak Ridge National Laboratory


Design Alternatives
-------------------

We have considered three design alternatives for the clacc compiler:

1. acc src  --parser-->                     omp AST  --codegen-->  LLVM IR + omp rt calls

I don't think that we want this option because, if nothing else, it will preclude building source-level tooling for OpenACC.

2. acc src  --parser-->  acc AST                     --codegen-->  LLVM IR + omp rt calls
3. acc src  --parser-->  acc AST  --ttx-->  omp AST  --codegen-->  LLVM IR + omp rt calls

My recommendation: We should think about the very best way we could refactor the code to implement (2), and if that is too ugly (or otherwise significantly degrades maintainability of the OpenMP code), then we should choose (3).


In the above diagram:

* acc src = C source code containing acc constructs.
* acc AST = a clang AST in which acc constructs are represented by
  nodes with acc node types.  Of course, such node types do not
  already exist in clang's implementation.
* omp AST = a clang AST in which acc constructs have been lowered
  to omp constructs represented by nodes with omp node types.  Of
  course, such node types do already exist in clang's
  implementation.
* parser = the existing clang parser and semantic analyzer,
  extended to handle acc constructs.
* codegen = the existing clang backend that translates a clang AST
  to LLVM IR, extended if necessary (depending on which design is
  chosen) to perform codegen from acc nodes.
* ttx (tree transformer) = a new clang component that transforms
  acc to omp in clang ASTs.

Design Features
---------------

There are several features to consider when choosing among the designs
in the previous section:

1. acc AST as an artifact -- Because they create acc AST nodes,
   designs 2 and 3 best facilitate the creation of additional acc
   source-level tools (such as pretty printers, analyzers, lint-like
   tools, and editor extensions).  Some of these tools, such as pretty
   printing, would be available immediately or as minor extensions of
   tools that already exist in clang's ecosystem.

2. omp AST/source as an artifact -- Because they create omp AST
   nodes, designs 1 and 3 best facilitate the use of source-level
   tools to help an application developer discover how clacc has
   mapped his acc to omp, possibly in order to debug a mapping
   specification he has supplied.  With design 2 instead, an
   application developer has to examine low-level LLVM IR + omp rt
   calls.  Moreover, with designs 1 and 3, permanently migrating an
   application's acc source to omp source can be automated.

3. omp AST for mapping implementation -- Designs 1 and 3 might
   also make it easier for the compiler developer to reason about and
   implement mappings from acc to omp.  That is, because acc and omp
   syntax is so similar, implementing the translation at the level of
   a syntactic representation is probably easier than translating to
   LLVM IR.

4. omp AST for codegen -- Designs 1 and 3 simplify the
   compiler implementation by enabling reuse of clang's existing omp
   support for codegen.  In contrast, design 2 requires at least some
   extensions to clang codegen to support acc nodes.

5. Full acc AST for mapping -- Designs 2 and 3 potentially
   enable the compiler to analyze the entire source (as opposed to
   just the acc construct currently being parsed) while choosing the
   mapping to omp.  It is not clear if this feature will prove useful,
   but it might enable more optimizations and compiler research
   opportunities.

We'll end up doing this, but most of this falls within the scope of the "parallel IR" designs that many of us are working on. Doing this kind of analysis in the frontend is hard (because it essentially requires it to do inlining, simplification, and analysis akin to what the optimizer itself does).


6. No acc node classes -- Design 1 simplifies the compiler
   implementation by eliminating the need to implement many acc node
   classes.  While we have so far found that implementing these
   classes is mostly mechanical, it does take a non-trivial amount of
   time.

7. No omp mapping -- Design 2 does not require acc to be mapped to
   omp.  That is, it is conceivable that, for some acc constructs,
   there will prove to be no omp syntax to capture the semantics we
   wish to implement.

I'm fairly certain that not everything maps exactly. There'll be some things we need to deal with explicitly in CodeGen.

It is also conceivable that we might one day
   want to represent some acc constructs directly as extensions to
   LLVM IR, where some acc analyses or optimizations might be more
   feasible to implement.  This possibility dovetails with recent
   discussions in the LLVM community about developing LLVM IR
   extensions for various parallel programming models.


+1


Because of features 4 and 6, design 1 is likely the fastest design to
implement, at least at first while we focus on simple acc features and
simple mappings to omp.  However, we have so far found no advantage
that design 1 has but that design 3 does not have except for feature
6, which we see as the least important of the above features in the
long term.

The only advantage we have found that design 2 has but that design 3
does not have is feature 7.  It should be possible to choose design 3
as the default but, for certain acc constructs or scenarios where
feature 7 proves important (if any), incorporate design 2.  In other
words, if we decide not to map a particular acc construct to any omp
construct, ttx would leave it alone, and we would extend codegen to
handle it directly.

This makes sense to me, and I think is most likely to leave the CodeGen code easiest to maintain (and has good separation of concerns). Nevertheless, I think we should go through the mental refactoring exercise for (2) to decide on the value of (3).

Thanks again,
Hal


Conclusions
-----------

For the above reasons, and because design 3 offers the cleanest
separation of concerns, we have chosen design 3 with the possibility
of incorporating design 2 where it proves useful.

Because of the immutability of clang's AST, the design of our proposed
ttx component requires careful consideration.  To shorten this initial
email, we have omitted those details for now, but we will be happy to
include them as the discussion progresses.


-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory


Re: RFC: clacc: translating OpenACC to OpenMP in clang

Joel E. Denny via cfe-dev
Hi Jeff, Hal,

Thanks for your feedback.  My comments are inline below.

On Tue, Dec 5, 2017 at 6:43 PM, Hal Finkel <[hidden email]> wrote:
On 12/05/2017 05:11 PM, Jeff Hammond via cfe-dev wrote:
All of the usage of OpenACC outside of benchmarks/research that I know about is done in Fortran.

I agree that it's easier to find real apps that use OpenACC in Fortran than those that use OpenACC in C/C++.  However, the latter certainly exist.  For example:


  Can you provide a list of C/C++ applications using OpenACC today and estimate the number of users that will benefit from this feature?

Such lists exist, although I don't know what can be shared (and Oak Ridge likely has better lists in this regard than I do).

I'll look for a better list that I can share.
 
I can tell you, from my own experience, that we're seeing an increase in development using OpenACC, in both C/C++ and Fortran, over the last couple of years (essentially because the compiler technology has improved to the point where that is now a potentially-productive choice).

Providing support in a production-quality, open-source compiler tool chain like LLVM will hopefully accelerate this trend.

Joel


Also, we have a strong desire to enable tooling over code bases using OpenACC. Among many other things, at some point we'll likely want the option to automatically migrate much of this code to using OpenMP. Having an OpenACC-enabled Clang, with an implementation that maps to OpenMP, is an important step in that process.

 -Hal



   calls.  Moreover, with designs 1 and 3, permanently migrating an
   application's acc source to omp source can be automated.

3. omp AST for mapping implementation -- Designs 1 and 3 might
   also make it easier for the compiler developer to reason about and
   implement mappings from acc to omp.  That is, because acc and omp
   syntax is so similar, implementing the translation at the level of
   a syntactic representation is probably easier than translating to
   LLVM IR.

4. omp AST for codegen -- Designs 1 and 3 simplify the
   compiler implementation by enabling reuse of clang's existing omp
   support for codegen.  In contrast, design 2 requires at least some
   extensions to clang codegen to support acc nodes.

5. Full acc AST for mapping -- Designs 2 and 3 potentially
   enable the compiler to analyze the entire source (as opposed to
   just the acc construct currently being parsed) while choosing the
   mapping to omp.  It is not clear if this feature will prove useful,
   but it might enable more optimizations and compiler research
   opportunities.

6. No acc node classes -- Design 1 simplifies the compiler
   implementation by eliminating the need to implement many acc node
   classes.  While we have so far found that implementing these
   classes is mostly mechanical, it does take a non-trivial amount of
   time.

7. No omp mapping -- Design 2 does not require acc to be mapped to
   omp.  That is, it is conceivable that, for some acc constructs,
   there will prove to be no omp syntax to capture the semantics we
   wish to implement.  It is also conceivable that we might one day
   want to represent some acc constructs directly as extensions to
   LLVM IR, where some acc analyses or optimizations might be more
   feasible to implement.  This possibility dovetails with recent
   discussions in the LLVM community about developing LLVM IR
   extensions for various parallel programming models.


Because of features 4 and 6, design 1 is likely the fastest design to
implement, at least at first while we focus on simple acc features and
simple mappings to omp.  However, we have so far found no advantage
that design 1 has but that design 3 does not have except for feature
6, which we see as the least important of the above features in the
long term.

The only advantage we have found that design 2 has but that design 3
does not have is feature 7.  It should be possible to choose design 3
as the default but, for certain acc constructs or scenarios where
feature 7 proves important (if any), incorporate design 2.  In other
words, if we decide not to map a particular acc construct to any omp
construct, ttx would leave it alone, and we would extend codegen to
handle it directly.

Conclusions
-----------

For the above reasons, and because design 3 offers the cleanest
separation of concerns, we have chosen design 3 with the possibility
of incorporating design 2 where it proves useful.

Because of the immutability of clang's AST, the design of our proposed
ttx component requires careful consideration.  To shorten this initial
email, we have omitted those details for now, but we will be happy to
include them as the discussion progresses.


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev





-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory



Re: RFC: clacc: translating OpenACC to OpenMP in clang

Robinson, Paul via cfe-dev
In reply to this post by Robinson, Paul via cfe-dev
Hi Hal,

Thanks for your feedback.  It sounds like we're basically in agreement, but I've added a few thoughts inline below.

On Wed, Dec 6, 2017 at 4:02 AM, Hal Finkel <[hidden email]> wrote:


On 12/05/2017 01:06 PM, Joel E. Denny wrote:
Hi,

We are working on a new project, clacc, that extends clang with OpenACC support.  Clacc's approach is to translate OpenACC (a descriptive language) to OpenMP (a prescriptive language) and thus to build on clang's existing OpenMP support.  While we plan to develop clacc to support our own research, an important goal is to contribute clacc as a production-quality component of upstream clang.

Great.


We have begun implementing an early prototype of clacc.  Before we get too far into the implementation, we would like to get feedback from the LLVM community to help ensure our design would ultimately be acceptable for contribution.  For that purpose, below is an analysis of several high-level design alternatives we have considered and their various features.  We welcome any feedback.

Thanks.

Joel E. Denny
Future Technologies Group
Oak Ridge National Laboratory


Design Alternatives
-------------------

We have considered three design alternatives for the clacc compiler:

1. acc src  --parser-->                     omp AST  --codegen-->  LLVM IR + omp rt calls

I don't think that we want this option because, if nothing else, it will preclude building source-level tooling for OpenACC.

Agreed.


2. acc src  --parser-->  acc AST                     --codegen-->  LLVM IR + omp rt calls
3. acc src  --parser-->  acc AST  --ttx-->  omp AST  --codegen-->  LLVM IR + omp rt calls

My recommendation: We should think about the very best way we could refactor the code to implement (2), and if that is too ugly (or otherwise significantly degrades maintainability of the OpenMP code), then we should choose (3).

I started out with design 2 in the early prototype I'm experimenting with.  Eventually I figured out some possibilities for how to implement the ttx component above (I'd be happy to discuss that), and I switched to design 3.  So far, I'm finding design 3 to be easier to implement.  Moreover, I can use -ast-print combined with a custom option to print either OpenACC source, OpenMP source, or both with one commented out.  I like that capability.  However, I think it's clear that design 3 has greater potential for running into difficulties as I move forward to more complex OpenACC constructs.




In the above diagram:

* acc src = C source code containing acc constructs.
* acc AST = a clang AST in which acc constructs are represented by
  nodes with acc node types.  Of course, such node types do not
  already exist in clang's implementation.
* omp AST = a clang AST in which acc constructs have been lowered
  to omp constructs represented by nodes with omp node types.  Of
  course, such node types do already exist in clang's
  implementation.
* parser = the existing clang parser and semantic analyzer,
  extended to handle acc constructs.
* codegen = the existing clang backend that translates a clang AST
  to LLVM IR, extended if necessary (depending on which design is
  chosen) to perform codegen from acc nodes.
* ttx (tree transformer) = a new clang component that transforms
  acc to omp in clang ASTs.

Design Features
---------------

There are several features to consider when choosing among the designs
in the previous section:

1. acc AST as an artifact -- Because they create acc AST nodes,
   designs 2 and 3 best facilitate the creation of additional acc
   source-level tools (such as pretty printers, analyzers, lint-like
   tools, and editor extensions).  Some of these tools, such as pretty
   printing, would be available immediately or as minor extensions of
   tools that already exist in clang's ecosystem.

2. omp AST/source as an artifact -- Because they create omp AST
   nodes, designs 1 and 3 best facilitate the use of source-level
   tools to help an application developer discover how clacc has
   mapped his acc to omp, possibly in order to debug a mapping
   specification he has supplied.  With design 2 instead, an
   application developer has to examine low-level LLVM IR + omp rt
   calls.  Moreover, with designs 1 and 3, permanently migrating an
   application's acc source to omp source can be automated.

3. omp AST for mapping implementation -- Designs 1 and 3 might
   also make it easier for the compiler developer to reason about and
   implement mappings from acc to omp.  That is, because acc and omp
   syntax is so similar, implementing the translation at the level of
   a syntactic representation is probably easier than translating to
   LLVM IR.

4. omp AST for codegen -- Designs 1 and 3 simplify the
   compiler implementation by enabling reuse of clang's existing omp
   support for codegen.  In contrast, design 2 requires at least some
   extensions to clang codegen to support acc nodes.

5. Full acc AST for mapping -- Designs 2 and 3 potentially
   enable the compiler to analyze the entire source (as opposed to
   just the acc construct currently being parsed) while choosing the
   mapping to omp.  It is not clear if this feature will prove useful,
   but it might enable more optimizations and compiler research
   opportunities.

We'll end up doing this, but most of this falls within the scope of the "parallel IR" designs that many of us are working on. Doing this kind of analysis in the frontend is hard (because it essentially requires it to do inlining, simplification, and analysis akin to what the optimizer itself does).

I agree.  However, before the parallel IR efforts mature, I need to make progress.  Also, I want to keep my options open, especially at this early stage, so I can experiment with different possibilities.



6. No acc node classes -- Design 1 simplifies the compiler
   implementation by eliminating the need to implement many acc node
   classes.  While we have so far found that implementing these
   classes is mostly mechanical, it does take a non-trivial amount of
   time.

7. No omp mapping -- Design 2 does not require acc to be mapped to
   omp.  That is, it is conceivable that, for some acc constructs,
   there will prove to be no omp syntax to capture the semantics we
   wish to implement.

I'm fairly certain that not everything maps exactly. There'll be some things we need to deal with explicitly in CodeGen.

It is also conceivable that we might one day
   want to represent some acc constructs directly as extensions to
   LLVM IR, where some acc analyses or optimizations might be more
   feasible to implement.  This possibility dovetails with recent
   discussions in the LLVM community about developing LLVM IR
   extensions for various parallel programming models.


+1


Because of features 4 and 6, design 1 is likely the fastest design to
implement, at least at first while we focus on simple acc features and
simple mappings to omp.  However, we have so far found no advantage
that design 1 has but that design 3 does not have except for feature
6, which we see as the least important of the above features in the
long term.

The only advantage we have found that design 2 has but that design 3
does not have is feature 7.  It should be possible to choose design 3
as the default but, for certain acc constructs or scenarios where
feature 7 proves important (if any), incorporate design 2.  In other
words, if we decide not to map a particular acc construct to any omp
construct, ttx would leave it alone, and we would extend codegen to
handle it directly.

This makes sense to me, and I think is most likely to leave the CodeGen code easiest to maintain (and has good separation of concerns). Nevertheless, I think we should go through the mental refactoring exercise for (2) to decide on the value of (3).

At this moment, I'm finding that the easiest way to explore is to just push forward with design 3.  Even so, if developers who have a deeper understanding than I do of clang's OpenMP implementation would like to have an email discussion on the refactoring exercise for design 2, I agree that would be helpful.


Thanks again,
Hal

Thanks.

Joel



Conclusions
-----------

For the above reasons, and because design 3 offers the cleanest
separation of concerns, we have chosen design 3 with the possibility
of incorporating design 2 where it proves useful.

Because of the immutability of clang's AST, the design of our proposed
ttx component requires careful consideration.  To shorten this initial
email, we have omitted those details for now, but we will be happy to
include them as the discussion progresses.





Re: RFC: clacc: translating OpenACC to OpenMP in clang

Robinson, Paul via cfe-dev
In reply to this post by Robinson, Paul via cfe-dev


On Fri, Dec 8, 2017 at 7:51 AM, Joel E. Denny <[hidden email]> wrote:

>
> Hi Jeff, Hal,
>
> Thanks for your feedback.  My comments are inline below.
>
> On Tue, Dec 5, 2017 at 6:43 PM, Hal Finkel <[hidden email]> wrote:
>>
>> On 12/05/2017 05:11 PM, Jeff Hammond via cfe-dev wrote:
>>
>> All of the usage of OpenACC outside of benchmarks/research that I know about is done in Fortran.
>
> I agree that it's easier to find real apps that use OpenACC in Fortran than those that use OpenACC in C/C++.  However, the latter certainly exist.  For example:

Two of the three examples you cite are primarily Fortran and using OpenACC exclusively in Fortran subroutines.

> http://mrfil.github.io/PowerGrid/

/tmp/PowerGrid$ git grep -il "pragma acc"
PowerGrid/Gfft.hpp
PowerGrid/Gnufft.hpp
PowerGrid/ftCpu.hpp
PowerGrid/gridding.hpp
PowerGrid/griddingSupport.hpp

From http://mrfil.github.io/PowerGrid/docs/Installation:

We have experience with PGC++ 15.7 from NVIDIA/The Portland Group as the version we have used most extensively. There is a free license available as part of the OpenACC Toolkit for academic users.

GCC 6.1 has OpenACC support but has not yet been tested by the developers, we welcome reports of anyone trying to compile with it. We hope to support it alongside PGI compilers in the near future.

For those lucky enough to have access to Cray supercomputers, the Cray compiler does support OpenACC, but we have not tried to build with it. Because the Cray compilers are not available on desktops, workstations, or non-Cray branded clusters, we have not dedicated resources to testing PowerGrid on it.


So these folks support OpenACC, but haven't bothered to try the GCC implementation in the 1+ year that it's been available.  How likely are they to use Clang's?

> https://nek5000.mcs.anl.gov/ (look at the openacc branch in github)

(on the openacc branch)

/tmp/Nek5000$ git grep -il "\$acc "
core/acc.f
core/comm_mpi.f
core/gmres.f
core/hmholtz.f
core/hsmg.f
core/math.f
core/navier1.f
core/navier4.f
core/plan4.f
core/prepost.f
core/subs2.f

>
> https://nekcem.mcs.anl.gov/

(on master)
/tmp/svn$ git grep -il "\$acc"
branches/maxwell-experimental/src/cem_dg.F
branches/maxwell-experimental/src/dssum2.F
branches/maxwell-experimental/src/io.F
branches/maxwell-experimental/src/mat1.F
branches/maxwell-experimental/src/maxwell.F
branches/maxwell-experimental/src/maxwell_acc.F
branches/maxwell-experimental/src/mxm_acc.F
branches/trunkQu/src/quantum_csr.F
branches/trunkQu/src/quantum_setup.f
branches/trunkQu/src/quantum_time.F
trunk/examples/openacc_gpu=1/box.usr
trunk/examples/openacc_gpu=8/box.usr
trunk/src/acoustic.F
trunk/src/cem_dg2.F
trunk/src/complex.F
trunk/src/drift1.F
trunk/src/drift1_maud.F
trunk/src/drive.F
trunk/src/drive_maud.F
trunk/src/dssum2.F
trunk/src/hmholtz.F
trunk/src/io.F
trunk/src/mat1.F
trunk/src/maxwell.F
trunk/src/maxwell_acc.F
trunk/src/mg_r2204.F
trunk/src/mxm_acc.F
trunk/src/poisson.F
trunk/src/quantum2.F
www/examples/libs/phpThumb/phpthumb.functions.php
www/examples/phpthumb.functions.php

>>   Can you provide a list of C/C++ applications using OpenACC today and estimate the number of users that will benefit from this feature?
>>
>>
>> Such lists exist, although I don't know what can be shared (and Oak Ridge likely has better lists in this regard than I do).
>
> I'll look for a better list that I can share.

That would be helpful.

Best,

Jeff





--
Jeff Hammond
[hidden email]
http://jeffhammond.github.io/


Re: RFC: clacc: translating OpenACC to OpenMP in clang

Robinson, Paul via cfe-dev
On Fri, Dec 8, 2017 at 11:32 AM, Jeff Hammond <[hidden email]> wrote:


On Fri, Dec 8, 2017 at 7:51 AM, Joel E. Denny <[hidden email]> wrote:

>
> Hi Jeff, Hal,
>
> Thanks for your feedback.  My comments are inline below.
>
> On Tue, Dec 5, 2017 at 6:43 PM, Hal Finkel <[hidden email]> wrote:
>>
>> On 12/05/2017 05:11 PM, Jeff Hammond via cfe-dev wrote:
>>
>> All of the usage of OpenACC outside of benchmarks/research that I know about is done in Fortran.
>
> I agree that it's easier to find real apps that use OpenACC in Fortran than those that use OpenACC in C/C++.  However, the latter certainly exist.  For example:

Two of the three examples you cite are primarily Fortran and using OpenACC exclusively in Fortran subroutines.

Are you saying that the occurrences of "pragma acc" in Nek5000 and NekCEM are unused?


> http://mrfil.github.io/PowerGrid/

/tmp/PowerGrid$ git grep -il "pragma acc"
PowerGrid/Gfft.hpp
PowerGrid/Gnufft.hpp
PowerGrid/ftCpu.hpp
PowerGrid/gridding.hpp
PowerGrid/griddingSupport.hpp

From http://mrfil.github.io/PowerGrid/docs/Installation:

We have experience with PGC++ 15.7 from NVIDIA/The Portland Group as the version we have used most extensively. There is a free license available as part of the OpenACC Toolkit for academic users.

GCC 6.1 has OpenACC support but has not yet been tested by the developers, we welcome reports of anyone trying to compile with it. We hope to support it alongside PGI compilers in the near future.

For those lucky enough to have access to Cray supercomputers, the Cray compiler does support OpenACC, but we have not tried to build with it. Because the Cray compilers are not available on desktops, workstations, or non-Cray branded clusters, we have not dedicated resources to testing PowerGrid on it.


So these folks support OpenACC, but haven't bothered to try the GCC implementation in the 1+ year that it's been available.  How likely are they to use Clang's?

I cannot answer that. Perhaps they were waiting for GCC support to mature?

Thanks.

Joel


>>>    nodes, designs 1 and 3 best facilitate the use of source-level
>>>    tools to help an application developer discover how clacc has
>>>    mapped his acc to omp, possibly in order to debug a mapping
>>>    specification he has supplied.  With design 2 instead, an
>>>    application developer has to examine low-level LLVM IR + omp rt
>>>    calls.  Moreover, with designs 1 and 3, permanently migrating an
>>>    application's acc source to omp source can be automated.
>>>
>>> 3. omp AST for mapping implementation -- Designs 1 and 3 might
>>>    also make it easier for the compiler developer to reason about and
>>>    implement mappings from acc to omp.  That is, because acc and omp
>>>    syntax is so similar, implementing the translation at the level of
>>>    a syntactic representation is probably easier than translating to
>>>    LLVM IR.
>>>
>>> 4. omp AST for codegen -- Designs 1 and 3 simplify the
>>>    compiler implementation by enabling reuse of clang's existing omp
>>>    support for codegen.  In contrast, design 2 requires at least some
>>>    extensions to clang codegen to support acc nodes.
>>>
>>> 5. Full acc AST for mapping -- Designs 2 and 3 potentially
>>>    enable the compiler to analyze the entire source (as opposed to
>>>    just the acc construct currently being parsed) while choosing the
>>>    mapping to omp.  It is not clear if this feature will prove useful,
>>>    but it might enable more optimizations and compiler research
>>>    opportunities.
>>>
>>> 6. No acc node classes -- Design 1 simplifies the compiler
>>>    implementation by eliminating the need to implement many acc node
>>>    classes.  While we have so far found that implementing these
>>>    classes is mostly mechanical, it does take a non-trivial amount of
>>>    time.
>>>
>>> 7. No omp mapping -- Design 2 does not require acc to be mapped to
>>>    omp.  That is, it is conceivable that, for some acc constructs,
>>>    there will prove to be no omp syntax to capture the semantics we
>>>    wish to implement.  It is also conceivable that we might one day
>>>    want to represent some acc constructs directly as extensions to
>>>    LLVM IR, where some acc analyses or optimizations might be more
>>>    feasible to implement.  This possibility dovetails with recent
>>>    discussions in the LLVM community about developing LLVM IR
>>>    extensions for various parallel programming models.
>>>
>>> Because of features 4 and 6, design 1 is likely the fastest design to
>>> implement, at least at first while we focus on simple acc features and
>>> simple mappings to omp.  However, we have so far found no advantage
>>> that design 1 has but that design 3 does not have except for feature
>>> 6, which we see as the least important of the above features in the
>>> long term.
>>>
>>> The only advantage we have found that design 2 has but that design 3
>>> does not have is feature 7.  It should be possible to choose design 3
>>> as the default but, for certain acc constructs or scenarios where
>>> feature 7 proves important (if any), incorporate design 2.  In other
>>> words, if we decide not to map a particular acc construct to any omp
>>> construct, ttx would leave it alone, and we would extend codegen to
>>> handle it directly.
>>>
>>> Conclusions
>>> -----------
>>>
>>> For the above reasons, and because design 3 offers the cleanest
>>> separation of concerns, we have chosen design 3 with the possibility
>>> of incorporating design 2 where it proves useful.
>>>
>>> Because of the immutability of clang's AST, the design of our proposed
>>> ttx component requires careful consideration.  To shorten this initial
>>> email, we have omitted those details for now, but we will be happy to
>>> include them as the discussion progresses.
>>>
>>> _______________________________________________
>>> cfe-dev mailing list
>>> [hidden email]
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>
>>
>>
>>
>> --
>> Jeff Hammond
>> [hidden email]
>> http://jeffhammond.github.io/
>>
>>
>>
>>
>> --
>> Hal Finkel
>> Lead, Compiler Technology and Programming Languages
>> Leadership Computing Facility
>> Argonne National Laboratory
>
>



--
Jeff Hammond
[hidden email]
http://jeffhammond.github.io/


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: RFC: clacc: translating OpenACC to OpenMP in clang

Robinson, Paul via cfe-dev


On Fri, Dec 8, 2017 at 9:00 AM, Joel E. Denny <[hidden email]> wrote:

>
> On Fri, Dec 8, 2017 at 11:32 AM, Jeff Hammond <[hidden email]> wrote:
>>
>>
>>
>> On Fri, Dec 8, 2017 at 7:51 AM, Joel E. Denny <[hidden email]> wrote:
>> >
>> > Hi Jeff, Hal,
>> >
>> > Thanks for your feedback.  My comments are inline below.
>> >
>> > On Tue, Dec 5, 2017 at 6:43 PM, Hal Finkel <[hidden email]> wrote:
>> >>
>> >> On 12/05/2017 05:11 PM, Jeff Hammond via cfe-dev wrote:
>> >>
>> >> All of the usage of OpenACC outside of benchmarks/research that I know about is done in Fortran.
>> >
>> > I agree that it's easier to find real apps that use OpenACC in Fortran than those that use OpenACC in C/C++.  However, the latter certainly exist.  For example:
>>
>> Two of the three examples you cite are primarily Fortran and using OpenACC exclusively in Fortran subroutines.
>
>
> Are you saying that the occurrences of "pragma acc" in Nek5000 and NekCEM are unused?
>

The instances of "pragma acc" in those projects (it's the same code in both) either (1) only cause host-device data synchronization or (2) are commented out.

It's unclear to me what actually happens in the code as currently written.  The OpenACC C/C++ code does no more than copy data to/from the device.  I didn't trace the entire code execution, so I can't tell whether any code touches the device data that OpenACC is updating.  If it is updated, it is updated by Fortran OpenACC code somewhere else in the source tree.

What does the OpenACC standard say about interoperability of compilers and runtimes, as would be required if one used Clang OpenACC for C/C++ and Fortran OpenACC implemented by PGI, Cray, or GCC?  OpenMP definitely does not support this, even if a subset of usage may happen to work when different compilers share the same runtime library.

/tmp/Nek5000$ git grep "pragma acc"
jl/gs.c:#pragma acc update host(sendbuf[0:unit_size*bufSize/2]) if(acc)
jl/gs.c:#pragma acc update device(buf[0:unit_size*bufSize/2]) if(acc)
jl/gs.c:#pragma acc update host(sendbuf[0:unit_size*bufSize/2]) if(acc)
jl/gs.c:#pragma acc update device(buf[0:unit_size*bufSize/2]) if(acc)
jl/gs.c:#pragma acc exit data delete(map0,map1)
jl/gs.c:#pragma acc update host(buf[0:unit_size*bufSize]) if(acc)
jl/gs.c:#pragma acc update device(buf[0:unit_size*bufSize]) if(acc)
jl/gs.c:#pragma acc update host(buf[0:unit_size*bufSize]) if(acc)
jl/gs.c:#pragma acc update device(buf[0:unit_size*bufSize]) if(acc)
jl/gs.c://#pragma acc enter data copyin(stage[0].scatter_map[0:stage[0].s_size],stage[0].scatter_mapf[0:stage[0].s_nt])
jl/gs.c://#pragma acc enter data copyin(stage[i].scatter_map[i:stage[i].s_size],stage[i].scatter_mapf[i:stage[i].s_nt])
jl/gs.c://#pragma acc enter data copyin(stage[i].gather_map[i:stage[i].g_size],stage[i].gather_mapf[i:stage[i].g_nt])
jl/gs.c://#pragma acc enter data copyin(stage[i].scatter_map[i:stage[i].s_size],stage[i].scatter_mapf[i:stage[i].s_nt])
jl/gs.c://#pragma acc enter data copyin(stage[i].gather_map[i:stage[i].g_size],stage[i].gather_mapf[i:stage[i].g_nt])
jl/gs.c://#pragma acc enter data copyin(stage2[0].scatter_map[0:stage2[0].s_size],stage2[0].scatter_mapf[0:stage2[0].s_nt])
jl/gs.c://#pragma acc enter data copyin(stage2[i].scatter_map[i:stage2[i].s_size],stage2[i].scatter_mapf[i:stage2[i].s_nt])
jl/gs.c://#pragma acc enter data copyin(stage2[i].gather_map[i:stage2[i].g_size],stage2[i].gather_mapf[i:stage2[i].g_nt])
jl/gs.c://#pragma acc enter data copyin(stage2[i].scatter_map[i:stage2[i].s_size],stage2[i].scatter_mapf[i:stage2[i].s_nt])
jl/gs.c://#pragma acc enter data copyin(stage2[i].gather_map[i:stage2[i].g_size],stage2[i].gather_mapf[i:stage2[i].g_nt])
jl/gs.c:#pragma acc exit data delete(map,mapf)
jl/gs.c:#pragma acc exit data delete(map,mapf)
jl/gs.c:#pragma acc exit data delete(map,mapf)
jl/gs.c:#pragma acc exit data delete(map,mapf)
jl/gs.c:#pragma acc update host(buf[0:vn*unit_size*bufSize]) if(acc)
jl/gs.c:#pragma acc update device(buf[0:vn*unit_size*bufSize]) if(acc)
jl/gs.c:  //#pragma acc exit data delete(ard->map_to_buf[0],ard->map_to_buf[1],ard->map_from_buf[0],ard->map_from_buf[1])
jl/gs.c:  //#pragma acc enter data copyin(ard->map_to_buf[0][0:ard->mt_size[0]],ard->map_from_buf[0][0:ard->mf_size[0]],ard->map_to_buf_f[0][0:ard->mt_nt[0]],ard->map_from_buf_f[0][0:ard->mf_nt[0]],ard->map_to_buf[1][0:ard->mt_size[1]],ard->map_from_buf[1][0:ard->mf_size[1]],ard->map_to_buf_f[1][0:ard->mt_nt[1]],ard->map_from_buf_f[1][0:ard->mf_nt[1]])
jl/gs.c:#pragma acc update host(a[0:n])
jl/gs.c:#pragma acc update host(a[0:n])
jl/gs.c:#pragma acc exit data delete(bufPtr)
jl/gs.c:#pragma acc enter data create(bufPtr[0:vn*gs_dom_size[dom]*gsh->r.buffer_size])
jl/gs.c:#pragma acc exit data delete(bufPtr)
jl/gs.c:#pragma acc enter data create(bufPtr[0:vn*gs_dom_size[dom]*gsh->r.buffer_size])
jl/gs.c:#pragma acc exit data delete(map_local0,map_local1,flagged_primaries)
jl/gs.c:#pragma acc enter data pcopyin(map[0:*m_size],mapf2[0:2*mf_temp])
jl/gs_acc.c://#pragma acc data present(buf[0:l])
jl/gs_acc.c://#pragma acc host_data use_device(buf)
jl/gs_acc.c://#pragma acc data present(buf[0:l])
jl/gs_acc.c://#pragma acc host_data use_device(buf)
jl/gs_acc.c:  //#pragma acc enter data copyin(t_mapf[0:t_m_nt*2],mapf[0:m_nt*2],snd_mapf[0:snd_m_nt*2],rcv_mapf[0:rcv_m_nt*2],fp_mapf[0:fp_m_nt*2], t_map[0:t_m_size],map[0:m_size],fp_map[0:fp_m_size],snd_map[0:snd_m_size],rcv_map[0:rcv_m_size])
jl/gs_acc.c:  //#pragma acc enter data copyin(t_mapf[0:t_m_nt*2],mapf[0:m_nt*2],snd_mapf[0:snd_m_nt*2],rcv_mapf[0:rcv_m_nt*2],fp_mapf[0:fp_m_nt*2], t_map[0:t_m_size],map[0:m_size],fp_map[0:fp_m_size],snd_map[0:snd_m_size],rcv_map[0:rcv_m_size])
jl/gs_acc.c://#pragma acc enter data pcopyin(t_mapf[0:t_m_nt*2],mapf[0:m_nt*2],snd_mapf[0:snd_m_nt*2],rcv_mapf[0:rcv_m_nt*2],fp_mapf[0:fp_m_nt*2], t_map[0:t_m_size],map[0:m_size],fp_map[0:fp_m_size],snd_map[0:snd_m_size],rcv_map[0:rcv_m_size])
jl/gs_acc.c://#pragma acc data present(u[0:uds],mapf[0:m_nt*2],snd_mapf[0:snd_m_nt*2],rcv_mapf[0:rcv_m_nt*2],fp_mapf[0:fp_m_nt*2],t_map[0:t_m_size],map[0:m_size],fp_map[0:fp_m_size],snd_map[0:snd_m_size],rcv_map[0:rcv_m_size]) 
jl/gs_acc.c://#pragma acc data create(sbuf[0:bl],rbuf[0:bl]) if(bl!=0)
jl/gs_acc.c://#pragma acc parallel loop gang vector present(u[0:uds],map[0:m_size],mapf[0:m_nt*2]) private(i,j,t) async(k+1)
jl/gs_acc.c://#pragma acc loop seq
jl/gs_acc.c://#pragma acc wait      
jl/gs_acc.c://#pragma acc parallel loop gang vector present(u[0:uds],fp_map[0:fp_m_size],fp_mapf[0:fp_m_nt*2]) private(i,j) async(k+1)
jl/gs_acc.c://#pragma acc loop seq
jl/gs_acc.c://#pragma acc wait
jl/gs_acc.c://#pragma acc parallel loop gang vector present(u[0:uds],fp_map[0:fp_m_size]) private(i,k)
jl/gs_acc.c://#pragma acc parallel loop gang vector present(u[0:uds],snd_map[0:snd_m_size],snd_mapf[0:snd_m_nt*2],sbuf[0:bl]) private(i,j,t) async(k+1)
jl/gs_acc.c://#pragma acc loop seq
jl/gs_acc.c://#pragma acc wait      
jl/gs_acc.c://#pragma acc parallel loop gang vector present(u[0:uds],snd_map[0:snd_m_size],sbuf[0:bl]) private(i,j,k)
jl/gs_acc.c://#pragma acc update host(sbuf[0:bl]) async(vn+2)
jl/gs_acc.c://#pragma acc wait
jl/gs_acc.c://#pragma acc update device(rbuf[0:bl]) async(vn+2)
jl/gs_acc.c://#pragma acc wait
jl/gs_acc.c://#pragma acc parallel loop gang vector present(u[0:uds],rcv_map[0:rcv_m_size],rcv_mapf[0:rcv_m_nt*2],rbuf[0:bl]) private(i,j,t) async(k+1)
jl/gs_acc.c://#pragma acc loop seq
jl/gs_acc.c://#pragma acc wait      
jl/gs_acc.c:    //#pragma acc parallel loop gang vector present(u[0:uds],rcv_map[0:rcv_m_size],rbuf[0:bl]) private(i,j,k)
jl/gs_acc.c://#pragma acc parallel loop gang vector present(u[0:uds],t_map[0:t_m_size],t_mapf[0:t_m_nt*2]) private(i,j,t) async(k+1)
jl/gs_acc.c://#pragma acc loop seq
jl/gs_acc.c://#pragma acc wait
 

>>
>>
>> > http://mrfil.github.io/PowerGrid/
>>
>> /tmp/PowerGrid$ git grep -il "pragma acc"
>> PowerGrid/Gfft.hpp
>> PowerGrid/Gnufft.hpp
>> PowerGrid/ftCpu.hpp
>> PowerGrid/gridding.hpp
>> PowerGrid/griddingSupport.hpp
>>
>> From http://mrfil.github.io/PowerGrid/docs/Installation:
>>
>> We have experience with PGC++ 15.7 from NVIDIA/The Portland Group as the version we have used most extensively. There is a free license available as part of the OpenACC Toolkit for academic users.
>>
>> GCC 6.1 has OpenACC support but has not yet been tested by the developers, we welcome reports of anyone trying to compile with it. We hope to support it alongside PGI compilers in the near future.
>>
>> For those lucky enough to have access to Cray supercomputers, the Cray compiler does support OpenACC, but we have not tried to build with it. Because the Cray compilers are not available on desktops, workstations, or non-Cray branded clusters, we have not dedicated resources to testing PowerGrid on it.
>>
>> So these folks support OpenACC, but haven't bothered to try the GCC implementation in the 1+ year that it's been available.  How likely are they to use Clang's?
>
>
> I cannot answer that. Perhaps they were waiting for GCC support to mature?

Or maybe they aren't interested in using OpenACC compiler support outside of PGI.

What I'm really getting at here is: who is going to use OpenACC support in Clang, particularly if there is no compatible Fortran OpenACC compiler?  In addition to justifying the code-maintenance effort, users who are not developers are essential for hardening the implementation.

Best,

Jeff

> Thanks.
>
> Joel
>
>>
>> > https://nek5000.mcs.anl.gov/ (look at the openacc branch in github)
>>
>> (on the openacc branch)
>>
>> /tmp/Nek5000$ git grep -il "\$acc "
>> core/acc.f
>> core/comm_mpi.f
>> core/gmres.f
>> core/hmholtz.f
>> core/hsmg.f
>> core/math.f
>> core/navier1.f
>> core/navier4.f
>> core/plan4.f
>> core/prepost.f
>> core/subs2.f
>>
>> >
>> > https://nekcem.mcs.anl.gov/
>>
>> (on master)
>> /tmp/svn$ git grep -il "\$acc"
>> branches/maxwell-experimental/src/cem_dg.F
>> branches/maxwell-experimental/src/dssum2.F
>> branches/maxwell-experimental/src/io.F
>> branches/maxwell-experimental/src/mat1.F
>> branches/maxwell-experimental/src/maxwell.F
>> branches/maxwell-experimental/src/maxwell_acc.F
>> branches/maxwell-experimental/src/mxm_acc.F
>> branches/trunkQu/src/quantum_csr.F
>> branches/trunkQu/src/quantum_setup.f
>> branches/trunkQu/src/quantum_time.F
>> trunk/examples/openacc_gpu=1/box.usr
>> trunk/examples/openacc_gpu=8/box.usr
>> trunk/src/acoustic.F
>> trunk/src/cem_dg2.F
>> trunk/src/complex.F
>> trunk/src/drift1.F
>> trunk/src/drift1_maud.F
>> trunk/src/drive.F
>> trunk/src/drive_maud.F
>> trunk/src/dssum2.F
>> trunk/src/hmholtz.F
>> trunk/src/io.F
>> trunk/src/mat1.F
>> trunk/src/maxwell.F
>> trunk/src/maxwell_acc.F
>> trunk/src/mg_r2204.F
>> trunk/src/mxm_acc.F
>> trunk/src/poisson.F
>> trunk/src/quantum2.F
>> www/examples/libs/phpThumb/phpthumb.functions.php
>> www/examples/phpthumb.functions.php
>
>



--
Jeff Hammond
[hidden email]
http://jeffhammond.github.io/

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: RFC: clacc: translating OpenACC to OpenMP in clang

Robinson, Paul via cfe-dev
On Fri, Dec 8, 2017 at 1:02 PM, Jeff Hammond <[hidden email]> wrote:


On Fri, Dec 8, 2017 at 9:00 AM, Joel E. Denny <[hidden email]> wrote:

>
> On Fri, Dec 8, 2017 at 11:32 AM, Jeff Hammond <[hidden email]> wrote:
>>
>>
>>
>> On Fri, Dec 8, 2017 at 7:51 AM, Joel E. Denny <[hidden email]> wrote:
>> >
>> > Hi Jeff, Hal,
>> >
>> > Thanks for your feedback.  My comments are inline below.
>> >
>> > On Tue, Dec 5, 2017 at 6:43 PM, Hal Finkel <[hidden email]> wrote:
>> >>
>> >> On 12/05/2017 05:11 PM, Jeff Hammond via cfe-dev wrote:
>> >>
>> >> All of the usage of OpenACC outside of benchmarks/research that I know about is done in Fortran.
>> >
>> > I agree that it's easier to find real apps that use OpenACC in Fortran than those that use OpenACC in C/C++.  However, the latter certainly exist.  For example:
>>
>> Two of the three examples you cite are primarily Fortran and using OpenACC exclusively in Fortran subroutines.
>
>
> Are you saying that the occurrences of "pragma acc" in Nek5000 and NekCEM are unused?
>

The instances of "pragma acc" in those projects (it's the same code in both) either (1) only cause host-device data synchronization or (2) are commented out.

It's unclear to me what actually happens in the code as currently written.  The OpenACC C/C++ code does no more than copy data to/from the device.  I didn't trace the entire code execution, so I can't tell whether any code touches the device data that OpenACC is updating.  If it is updated, it is updated by Fortran OpenACC code somewhere else in the source tree.

The point is that this is some evidence that compiler support for OpenACC in C/C++ is useful.
 

What does the OpenACC standard say about interoperability of compilers and runtimes, as would be required if one used Clang OpenACC for C/C++ and Fortran OpenACC implemented by PGI, Cray, or GCC?  OpenMP definitely does not support this, even if a subset of usage may happen to work when different compilers share the same runtime library.

Flang is under development.  I see no reason to believe it cannot grow OpenACC support eventually as well.


jl/gs_acc.c://#pragma acc loop seq
jl/gs_acc.c://#pragma acc wait      
jl/gs_acc.c://#pragma acc parallel loop gang vector present(u[0:uds],snd_map[0:snd_m_size],sbuf[0:bl]) private(i,j,k)
jl/gs_acc.c://#pragma acc update host(sbuf[0:bl]) async(vn+2)
jl/gs_acc.c://#pragma acc wait
jl/gs_acc.c://#pragma acc update device(rbuf[0:bl]) async(vn+2)
jl/gs_acc.c://#pragma acc wait
jl/gs_acc.c://#pragma acc parallel loop gang vector present(u[0:uds],rcv_map[0:rcv_m_size],rcv_mapf[0:rcv_m_nt*2],rbuf[0:bl]) private(i,j,t) async(k+1)
jl/gs_acc.c://#pragma acc loop seq
jl/gs_acc.c://#pragma acc wait      
jl/gs_acc.c:    //#pragma acc parallel loop gang vector present(u[0:uds],rcv_map[0:rcv_m_size],rbuf[0:bl]) private(i,j,k)
jl/gs_acc.c://#pragma acc parallel loop gang vector present(u[0:uds],t_map[0:t_m_size],t_mapf[0:t_m_nt*2]) private(i,j,t) async(k+1)
jl/gs_acc.c://#pragma acc loop seq
jl/gs_acc.c://#pragma acc wait
 

>>
>>
>> > http://mrfil.github.io/PowerGrid/
>>
>> /tmp/PowerGrid$ git grep -il "pragma acc"
>> PowerGrid/Gfft.hpp
>> PowerGrid/Gnufft.hpp
>> PowerGrid/ftCpu.hpp
>> PowerGrid/gridding.hpp
>> PowerGrid/griddingSupport.hpp
>>
>> From http://mrfil.github.io/PowerGrid/docs/Installation:
>>
>> We have experience with PGC++ 15.7 from NVIDIA/The Portland Group as the version we have used most extensively. There is a free license available as part of the OpenACC Toolkit for academic users.
>>
>> GCC 6.1 has OpenACC support but has not yet been tested by the developers, we welcome reports of anyone trying to compile with it. We hope to support it alongside PGI compilers in the near future.
>>
>> For those lucky enough to have access to Cray supercomputers, the Cray compiler does support OpenACC, but we have not tried to build with it. Because the Cray compilers are not available on desktops, workstations, or non-Cray branded clusters, we have not dedicated resources to testing PowerGrid on it.
>>
>> So these folks support OpenACC, but haven't bothered to try the GCC implementation in the 1+ year that it's been available.  How likely are they to use Clang's?
>
>
> I cannot answer that. Perhaps they were waiting for GCC support to mature?

Or maybe they aren't interested in using OpenACC compiler support outside of PGI.

They said they are interested.  I don't yet see sufficient evidence to believe that interest is not genuine.

Thanks.

Joel
 

What I'm really getting at here is who is going to use OpenACC support in Clang, particularly if there is no compatible Fortran OpenACC compiler?  In addition to justifying the code maintenance effort, users who are not developers are essential for implementation hardening.

Best,

Jeff


> Thanks.
>
> Joel
>
>>
>> > https://nek5000.mcs.anl.gov/ (look at the openacc branch in github)
>>
>> (on the openacc branch)
>>
>> /tmp/Nek5000$ git grep -il "\$acc "
>> core/acc.f
>> core/comm_mpi.f
>> core/gmres.f
>> core/hmholtz.f
>> core/hsmg.f
>> core/math.f
>> core/navier1.f
>> core/navier4.f
>> core/plan4.f
>> core/prepost.f
>> core/subs2.f
>>
>> >
>> > https://nekcem.mcs.anl.gov/
>>
>> (on master)
>> /tmp/svn$ git grep -il "\$acc"
>> branches/maxwell-experimental/src/cem_dg.F
>> branches/maxwell-experimental/src/dssum2.F
>> branches/maxwell-experimental/src/io.F
>> branches/maxwell-experimental/src/mat1.F
>> branches/maxwell-experimental/src/maxwell.F
>> branches/maxwell-experimental/src/maxwell_acc.F
>> branches/maxwell-experimental/src/mxm_acc.F
>> branches/trunkQu/src/quantum_csr.F
>> branches/trunkQu/src/quantum_setup.f
>> branches/trunkQu/src/quantum_time.F
>> trunk/examples/openacc_gpu=1/box.usr
>> trunk/examples/openacc_gpu=8/box.usr
>> trunk/src/acoustic.F
>> trunk/src/cem_dg2.F
>> trunk/src/complex.F
>> trunk/src/drift1.F
>> trunk/src/drift1_maud.F
>> trunk/src/drive.F
>> trunk/src/drive_maud.F
>> trunk/src/dssum2.F
>> trunk/src/hmholtz.F
>> trunk/src/io.F
>> trunk/src/mat1.F
>> trunk/src/maxwell.F
>> trunk/src/maxwell_acc.F
>> trunk/src/mg_r2204.F
>> trunk/src/mxm_acc.F
>> trunk/src/poisson.F
>> trunk/src/quantum2.F
>> www/examples/libs/phpThumb/phpthumb.functions.php
>> www/examples/phpthumb.functions.php
>>
>> >>   Can you provide a list of C/C++ applications using OpenACC today and estimate the number of users that will benefit from this feature?
>> >>
>> >>
>> >> Such lists exist, although I don't know what can be shared (and Oak Ridge likely has better lists in this regard than I do).
>> >
>> > I'll look for a better list that I can share.
>>
>> That would be helpful.
>>
>> Best,
>>
>> Jeff
>>
>>
>>
>> >> I can tell you, from my own experience, that we're seeing an increase in development using OpenACC, in both C/C++ and Fortran, over the last couple of years (essentially because the compiler technology has improved to the point where that is now a potentially-productive choice).
>> >
>> >
>> > Providing support in a production-quality, open-source compiler tool chain like LLVM will hopefully accelerate this trend.
>> >
>> > Joel
>> >
>> >>
>> >> Also, we have a strong desire to enable tooling over code bases using OpenACC. Among many other things, at some point we'll likely want the option to automatically migrate much of this code to using OpenMP. Having an OpenACC-enabled Clang, with an implementation that maps to OpenMP, is an important step in that process.
>> >>
>> >>  -Hal
>> >>
>> >>
>> >>
>> >> Thanks,
>> >>
>> >> Jeff
>> >>
>> >> On Tue, Dec 5, 2017 at 11:06 AM, Joel E. Denny via cfe-dev <[hidden email]> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> We are working on a new project, clacc, that extends clang with OpenACC support.  Clacc's approach is to translate OpenACC (a descriptive language) to OpenMP (a prescriptive language) and thus to build on clang's existing OpenMP support.  While we plan to develop clacc to support our own research, an important goal is to contribute clacc as a production-quality component of upstream clang.
>> >>>
>> >>> We have begun implementing an early prototype of clacc.  Before we get too far into the implementation, we would like to get feedback from the LLVM community to help ensure our design would ultimately be acceptable for contribution.  For that purpose, below is an analysis of several high-level design alternatives we have considered and their various features.  We welcome any feedback.
>> >>>
>> >>> Thanks.
>> >>>
>> >>> Joel E. Denny
>> >>> Future Technologies Group
>> >>> Oak Ridge National Laboratory
>> >>>
>> >>>
>> >>> Design Alternatives
>> >>> -------------------
>> >>>
>> >>> We have considered three design alternatives for the clacc compiler:
>> >>>
>> >>> 1. acc src  --parser-->                     omp AST  --codegen-->  LLVM IR + omp rt calls
>> >>> 2. acc src  --parser-->  acc AST                     --codegen-->  LLVM IR + omp rt calls
>> >>> 3. acc src  --parser-->  acc AST  --ttx-->  omp AST  --codegen-->  LLVM IR + omp rt calls
>> >>>
>> >>> In the above diagram:
>> >>>
>> >>> * acc src = C source code containing acc constructs.
>> >>> * acc AST = a clang AST in which acc constructs are represented by
>> >>>   nodes with acc node types.  Of course, such node types do not
>> >>>   already exist in clang's implementation.
>> >>> * omp AST = a clang AST in which acc constructs have been lowered
>> >>>   to omp constructs represented by nodes with omp node types.  Of
>> >>>   course, such node types do already exist in clang's
>> >>>   implementation.
>> >>> * parser = the existing clang parser and semantic analyzer,
>> >>>   extended to handle acc constructs.
>> >>> * codegen = the existing clang backend that translates a clang AST
>> >>>   to LLVM IR, extended if necessary (depending on which design is
>> >>>   chosen) to perform codegen from acc nodes.
>> >>> * ttx (tree transformer) = a new clang component that transforms
>> >>>   acc to omp in clang ASTs.
>> >>>
>> >>> Design Features
>> >>> ---------------
>> >>>
>> >>> There are several features to consider when choosing among the designs
>> >>> in the previous section:
>> >>>
>> >>> 1. acc AST as an artifact -- Because they create acc AST nodes,
>> >>>    designs 2 and 3 best facilitate the creation of additional acc
>> >>>    source-level tools (such as pretty printers, analyzers, lint-like
>> >>>    tools, and editor extensions).  Some of these tools, such as pretty
>> >>>    printing, would be available immediately or as minor extensions of
>> >>>    tools that already exist in clang's ecosystem.
>> >>>
>> >>> 2. omp AST/source as an artifact -- Because they create omp AST
>> >>>    nodes, designs 1 and 3 best facilitate the use of source-level
>> >>>    tools to help an application developer discover how clacc has
>> >>>    mapped his acc to omp, possibly in order to debug a mapping
>> >>>    specification he has supplied.  With design 2 instead, an
>> >>>    application developer has to examine low-level LLVM IR + omp rt
>> >>>    calls.  Moreover, with designs 1 and 3, permanently migrating an
>> >>>    application's acc source to omp source can be automated.
>> >>>
>> >>> 3. omp AST for mapping implementation -- Designs 1 and 3 might
>> >>>    also make it easier for the compiler developer to reason about and
>> >>>    implement mappings from acc to omp.  That is, because acc and omp
>> >>>    syntax is so similar, implementing the translation at the level of
>> >>>    a syntactic representation is probably easier than translating to
>> >>>    LLVM IR.
>> >>>
>> >>> 4. omp AST for codegen -- Designs 1 and 3 simplify the
>> >>>    compiler implementation by enabling reuse of clang's existing omp
>> >>>    support for codegen.  In contrast, design 2 requires at least some
>> >>>    extensions to clang codegen to support acc nodes.
>> >>>
>> >>> 5. Full acc AST for mapping -- Designs 2 and 3 potentially
>> >>>    enable the compiler to analyze the entire source (as opposed to
>> >>>    just the acc construct currently being parsed) while choosing the
>> >>>    mapping to omp.  It is not clear if this feature will prove useful,
>> >>>    but it might enable more optimizations and compiler research
>> >>>    opportunities.
>> >>>
>> >>> 6. No acc node classes -- Design 1 simplifies the compiler
>> >>>    implementation by eliminating the need to implement many acc node
>> >>>    classes.  While we have so far found that implementing these
>> >>>    classes is mostly mechanical, it does take a non-trivial amount of
>> >>>    time.
>> >>>
>> >>> 7. No omp mapping -- Design 2 does not require acc to be mapped to
>> >>>    omp.  That is, it is conceivable that, for some acc constructs,
>> >>>    there will prove to be no omp syntax to capture the semantics we
>> >>>    wish to implement.  It is also conceivable that we might one day
>> >>>    want to represent some acc constructs directly as extensions to
>> >>>    LLVM IR, where some acc analyses or optimizations might be more
>> >>>    feasible to implement.  This possibility dovetails with recent
>> >>>    discussions in the LLVM community about developing LLVM IR
>> >>>    extensions for various parallel programming models.
>> >>>
>> >>> Because of features 4 and 6, design 1 is likely the fastest design to
>> >>> implement, at least at first while we focus on simple acc features and
>> >>> simple mappings to omp.  However, we have so far found no advantage
>> >>> that design 1 has but that design 3 does not have except for feature
>> >>> 6, which we see as the least important of the above features in the
>> >>> long term.
>> >>>
>> >>> The only advantage we have found that design 2 has but that design 3
>> >>> does not have is feature 7.  It should be possible to choose design 3
>> >>> as the default but, for certain acc constructs or scenarios where
>> >>> feature 7 proves important (if any), incorporate design 2.  In other
>> >>> words, if we decide not to map a particular acc construct to any omp
>> >>> construct, ttx would leave it alone, and we would extend codegen to
>> >>> handle it directly.
>> >>>
>> >>> Conclusions
>> >>> -----------
>> >>>
>> >>> For the above reasons, and because design 3 offers the cleanest
>> >>> separation of concerns, we have chosen design 3 with the possibility
>> >>> of incorporating design 2 where it proves useful.
>> >>>
>> >>> Because of the immutability of clang's AST, the design of our proposed
>> >>> ttx component requires careful consideration.  To shorten this initial
>> >>> email, we have omitted those details for now, but we will be happy to
>> >>> include them as the discussion progresses.
>> >>>
>> >>> _______________________________________________
>> >>> cfe-dev mailing list
>> >>> [hidden email]
>> >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Jeff Hammond
>> >> [hidden email]
>> >> http://jeffhammond.github.io/
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Hal Finkel
>> >> Lead, Compiler Technology and Programming Languages
>> >> Leadership Computing Facility
>> >> Argonne National Laboratory
>> >
>> >
>>
>>
>>
>> --
>> Jeff Hammond
>> [hidden email]
>> http://jeffhammond.github.io/
>
>



--
Jeff Hammond
[hidden email]
http://jeffhammond.github.io/


_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: RFC: clacc: translating OpenACC to OpenMP in clang

Robinson, Paul via cfe-dev
In reply to this post by Robinson, Paul via cfe-dev

Hi, Jeff,

First, I don't think we should spam all of cfe-dev with the source listings of every project using OpenACC :-) -- We actually do need to understand what's out there (both open source and otherwise), for many different reasons, but this is not the place to collect that information. Of course, one of the reasons that we'd like OpenACC support in Clang is to make it easier to do this kind of analysis (current internal work is using GCC).

Second, we have a significant body of code using OpenACC, and moreover, the rate at which OpenACC code is being written is increasing. As I mentioned, this is because the compiler OpenACC support has now reached a point where using it makes sense. On many machines with GPUs, including large DOE machines, OpenACC is the recommended way to exploit them at this point. As OpenMP accelerator support matures, that may change. We're (DOE is) investing heavily in OpenMP for the future. Nevertheless, OpenACC is a cross-vendor standard (https://www.openacc.org), which is actively developed, supported by the Cray and PGI compilers, and also by GCC (as of GCC v7). The GCC support is new, and less mature than what is provided by the vendor compilers, but I expect that it will get better over time as well.

I think that it makes sense for Clang to support OpenACC. We'd like to use Clang's tooling capabilities on this code. Moreover, I'd like to do this in a way that enables maximum interoperability with OpenMP. Over time, this will make it easier to transition code from OpenACC to OpenMP when that's desired. There's a significant overlap in functionality between the two sets of directives. That should lower the implementation cost significantly. OpenACC is not a pure subset of OpenMP, however, and that's also important.

Thanks again,
Hal

On 12/08/2017 12:02 PM, Jeff Hammond wrote:


On Fri, Dec 8, 2017 at 9:00 AM, Joel E. Denny <[hidden email]> wrote:
>
> On Fri, Dec 8, 2017 at 11:32 AM, Jeff Hammond <[hidden email]> wrote:
>>
>>
>>
>> On Fri, Dec 8, 2017 at 7:51 AM, Joel E. Denny <[hidden email]> wrote:
>> >
>> > Hi Jeff, Hal,
>> >
>> > Thanks for your feedback.  My comments are inline below.
>> >
>> > On Tue, Dec 5, 2017 at 6:43 PM, Hal Finkel <[hidden email]> wrote:
>> >>
>> >> On 12/05/2017 05:11 PM, Jeff Hammond via cfe-dev wrote:
>> >>
>> >> All of the usage of OpenACC outside of benchmarks/research that I know about is done in Fortran.
>> >
>> > I agree that it's easier to find real apps that use OpenACC in Fortran than those that use OpenACC in C/C++.  However, the latter certainly exist.  For example:
>>
>> Two of the three examples you cite are primarily Fortran and using OpenACC exclusively in Fortran subroutines.
>
>
> Are you saying that the occurrences of "pragma acc" in Nek5000 and NekCEM are unused?
>

The instances of "pragma acc" in those projects (it's the same code in both) either (1) only cause host-device data synchronization or (2) are commented out.

It's unclear to me what actually happens in the code as currently written.  The OpenACC C/C++ code does no more than copy data to/from the device.  I didn't trace the entire code execution, so I can't tell whether any code touches the device data that OpenACC is updating.  If it is updated, it is updated by Fortran OpenACC code somewhere else in the source tree.

What does the OpenACC standard say about interoperability of compilers and runtimes, as would be required if one used Clang's OpenACC for C/C++ and Fortran OpenACC implemented by PGI, Cray, or GCC?  OpenMP definitely does not support this, even if a subset of usage may work when one uses the same runtime library with different compilers.

>> >>>    enable the compiler to analyze the entire source (as opposed to
>> >>>    just the acc construct currently being parsed) while choosing the
>> >>>    mapping to omp.  It is not clear if this feature will prove useful,
>> >>>    but it might enable more optimizations and compiler research
>> >>>    opportunities.
>> >>>
>> >>> 6. No acc node classes -- Design 1 simplifies the compiler
>> >>>    implementation by eliminating the need to implement many acc node
>> >>>    classes.  While we have so far found that implementing these
>> >>>    classes is mostly mechanical, it does take a non-trivial amount of
>> >>>    time.
>> >>>
>> >>> 7. No omp mapping -- Design 2 does not require acc to be mapped to
>> >>>    omp.  That is, it is conceivable that, for some acc constructs,
>> >>>    there will prove to be no omp syntax to capture the semantics we
>> >>>    wish to implement.  It is also conceivable that we might one day
>> >>>    want to represent some acc constructs directly as extensions to
>> >>>    LLVM IR, where some acc analyses or optimizations might be more
>> >>>    feasible to implement.  This possibility dovetails with recent
>> >>>    discussions in the LLVM community about developing LLVM IR
>> >>>    extensions for various parallel programming models.
>> >>>
>> >>> Because of features 4 and 6, design 1 is likely the fastest design to
>> >>> implement, at least at first while we focus on simple acc features and
>> >>> simple mappings to omp.  However, we have so far found no advantage
>> >>> that design 1 has but that design 3 does not have except for feature
>> >>> 6, which we see as the least important of the above features in the
>> >>> long term.
>> >>>
>> >>> The only advantage we have found that design 2 has but that design 3
>> >>> does not have is feature 7.  It should be possible to choose design 3
>> >>> as the default but, for certain acc constructs or scenarios where
>> >>> feature 7 proves important (if any), incorporate design 2.  In other
>> >>> words, if we decide not to map a particular acc construct to any omp
>> >>> construct, ttx would leave it alone, and we would extend codegen to
>> >>> handle it directly.
>> >>>
>> >>> Conclusions
>> >>> -----------
>> >>>
>> >>> For the above reasons, and because design 3 offers the cleanest
>> >>> separation of concerns, we have chosen design 3 with the possibility
>> >>> of incorporating design 2 where it proves useful.
>> >>>
>> >>> Because of the immutability of clang's AST, the design of our proposed
>> >>> ttx component requires careful consideration.  To shorten this initial
>> >>> email, we have omitted those details for now, but we will be happy to
>> >>> include them as the discussion progresses.
>> >>>
>> >>> _______________________________________________
>> >>> cfe-dev mailing list
>> >>> [hidden email]
>> >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Jeff Hammond
>> >> [hidden email]
>> >> http://jeffhammond.github.io/
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Hal Finkel
>> >> Lead, Compiler Technology and Programming Languages
>> >> Leadership Computing Facility
>> >> Argonne National Laboratory
>> >
>> >
>>
>>
>>
>
>




Re: RFC: clacc: translating OpenACC to OpenMP in clang

Robinson, Paul via cfe-dev
In reply to this post by Robinson, Paul via cfe-dev


On 12/08/2017 09:58 AM, Joel E. Denny wrote:
Hi Hal,

Thanks for your feedback.  It sounds like we're basically in agreement, but I've added a few thoughts inline below.

On Wed, Dec 6, 2017 at 4:02 AM, Hal Finkel <[hidden email]> wrote:


On 12/05/2017 01:06 PM, Joel E. Denny wrote:
Hi,

We are working on a new project, clacc, that extends clang with OpenACC support.  Clacc's approach is to translate OpenACC (a descriptive language) to OpenMP (a prescriptive language) and thus to build on clang's existing OpenMP support.  While we plan to develop clacc to support our own research, an important goal is to contribute clacc as a production-quality component of upstream clang.

Great.


We have begun implementing an early prototype of clacc.  Before we get too far into the implementation, we would like to get feedback from the LLVM community to help ensure our design would ultimately be acceptable for contribution.  For that purpose, below is an analysis of several high-level design alternatives we have considered and their various features.  We welcome any feedback.

Thanks.

Joel E. Denny
Future Technologies Group
Oak Ridge National Laboratory


Design Alternatives
-------------------

We have considered three design alternatives for the clacc compiler:

1. acc src  --parser-->                     omp AST  --codegen-->  LLVM IR + omp rt calls

I don't think that we want this option because, if nothing else, it will preclude building source-level tooling for OpenACC.

Agreed.


2. acc src  --parser-->  acc AST                     --codegen-->  LLVM IR + omp rt calls
3. acc src  --parser-->  acc AST  --ttx-->  omp AST  --codegen-->  LLVM IR + omp rt calls

My recommendation: We should think about the very best way we could refactor the code to implement (2), and if that is too ugly (or otherwise significantly degrades maintainability of the OpenMP code), then we should choose (3).

I started out with design 2 in the early prototype I'm experimenting with.  Eventually I figured out some possibilities for how to implement the ttx component above (I'd be happy to discuss that)

That's probably a good idea. Please share some details on this front.

, and I switched to design 3.  So far, I'm finding design 3 to be easier to implement.  Moreover, I can use -ast-print combined with a custom option to print either OpenACC source, OpenMP source, or both with one commented out.  I like that capability.  However, I think it's clear that design 3 has greater potential for running into difficulties as I move forward to more complex OpenACC constructs.

It is this last part that is potentially concerning. If you try it, however, and it sounds like you are, then we'll know for sure soon enough.

Obviously the most efficient way to write some piece of code, and the way to write it to maximize maintainability and ease of extension, may be different. To the extent that they're the same, in terms of upstream functionality, we'll learn something.





In the above diagram:

* acc src = C source code containing acc constructs.
* acc AST = a clang AST in which acc constructs are represented by
  nodes with acc node types.  Of course, such node types do not
  already exist in clang's implementation.
* omp AST = a clang AST in which acc constructs have been lowered
  to omp constructs represented by nodes with omp node types.  Of
  course, such node types do already exist in clang's
  implementation.
* parser = the existing clang parser and semantic analyzer,
  extended to handle acc constructs.
* codegen = the existing clang backend that translates a clang AST
  to LLVM IR, extended if necessary (depending on which design is
  chosen) to perform codegen from acc nodes.
* ttx (tree transformer) = a new clang component that transforms
  acc to omp in clang ASTs.

Design Features
---------------

There are several features to consider when choosing among the designs
in the previous section:

1. acc AST as an artifact -- Because they create acc AST nodes,
   designs 2 and 3 best facilitate the creation of additional acc
   source-level tools (such as pretty printers, analyzers, lint-like
   tools, and editor extensions).  Some of these tools, such as pretty
   printing, would be available immediately or as minor extensions of
   tools that already exist in clang's ecosystem.

2. omp AST/source as an artifact -- Because they create omp AST
   nodes, designs 1 and 3 best facilitate the use of source-level
   tools to help an application developer discover how clacc has
   mapped his acc to omp, possibly in order to debug a mapping
   specification he has supplied.  With design 2 instead, an
   application developer has to examine low-level LLVM IR + omp rt
   calls.  Moreover, with designs 1 and 3, permanently migrating an
   application's acc source to omp source can be automated.

3. omp AST for mapping implementation -- Designs 1 and 3 might
   also make it easier for the compiler developer to reason about and
   implement mappings from acc to omp.  That is, because acc and omp
   syntax is so similar, implementing the translation at the level of
   a syntactic representation is probably easier than translating to
   LLVM IR.

4. omp AST for codegen -- Designs 1 and 3 simplify the
   compiler implementation by enabling reuse of clang's existing omp
   support for codegen.  In contrast, design 2 requires at least some
   extensions to clang codegen to support acc nodes.

5. Full acc AST for mapping -- Designs 2 and 3 potentially
   enable the compiler to analyze the entire source (as opposed to
   just the acc construct currently being parsed) while choosing the
   mapping to omp.  It is not clear if this feature will prove useful,
   but it might enable more optimizations and compiler research
   opportunities.

We'll end up doing this, but most of this falls within the scope of the "parallel IR" designs that many of us are working on. Doing this kind of analysis in the frontend is hard (because it essentially requires it to do inlining, simplification, and analysis akin to what the optimizer itself does).

I agree.  However, before the parallel IR efforts mature, I need to make progress.  Also, I want to keep my options open, especially at this early stage, so I can experiment with different possibilities.

You're free to prototype things however you'd like :-)

Thanks again,
Hal




6. No acc node classes -- Design 1 simplifies the compiler
   implementation by eliminating the need to implement many acc node
   classes.  While we have so far found that implementing these
   classes is mostly mechanical, it does take a non-trivial amount of
   time.

7. No omp mapping -- Design 2 does not require acc to be mapped to
   omp.  That is, it is conceivable that, for some acc constructs,
   there will prove to be no omp syntax to capture the semantics we
   wish to implement.

I'm fairly certain that not everything maps exactly. There'll be some things we need to deal with explicitly in CodeGen.

It is also conceivable that we might one day
   want to represent some acc constructs directly as extensions to
   LLVM IR, where some acc analyses or optimizations might be more
   feasible to implement.  This possibility dovetails with recent
   discussions in the LLVM community about developing LLVM IR
   extensions for various parallel programming models.


+1


Because of features 4 and 6, design 1 is likely the fastest design to
implement, at least at first while we focus on simple acc features and
simple mappings to omp.  However, we have so far found no advantage
that design 1 has but that design 3 does not have except for feature
6, which we see as the least important of the above features in the
long term.

The only advantage we have found that design 2 has but that design 3
does not have is feature 7.  It should be possible to choose design 3
as the default but, for certain acc constructs or scenarios where
feature 7 proves important (if any), incorporate design 2.  In other
words, if we decide not to map a particular acc construct to any omp
construct, ttx would leave it alone, and we would extend codegen to
handle it directly.

This makes sense to me, and I think is most likely to leave the CodeGen code easiest to maintain (and has good separation of concerns). Nevertheless, I think we should go through the mental refactoring exercise for (2) to decide on the value of (3).

At this moment, I'm finding that the easiest way to explore is to just push forward with design 3.  Even so, if developers who have a deeper understanding than I do of clang's OpenMP implementation would like to have an email discussion on the refactoring exercise for design 2, I agree that would be helpful.


Thanks again,
Hal

Thanks.

Joel



Conclusions
-----------

For the above reasons, and because design 3 offers the cleanest
separation of concerns, we have chosen design 3 with the possibility
of incorporating design 2 where it proves useful.

Because of the immutability of clang's AST, the design of our proposed
ttx component requires careful consideration.  To shorten this initial
email, we have omitted those details for now, but we will be happy to
include them as the discussion progresses.


-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory


-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: RFC: clacc: translating OpenACC to OpenMP in clang

Robinson, Paul via cfe-dev
On Dec 9, 2017 6:39 AM, "Hal Finkel" <[hidden email]> wrote:


On 12/08/2017 09:58 AM, Joel E. Denny wrote:
Hi Hal,

Thanks for your feedback.  It sounds like we're basically in agreement, but I've added a few thoughts inline below.

On Wed, Dec 6, 2017 at 4:02 AM, Hal Finkel <[hidden email]> wrote:


On 12/05/2017 01:06 PM, Joel E. Denny wrote:
Hi,

We are working on a new project, clacc, that extends clang with OpenACC support.  Clacc's approach is to translate OpenACC (a descriptive language) to OpenMP (a prescriptive language) and thus to build on clang's existing OpenMP support.  While we plan to develop clacc to support our own research, an important goal is to contribute clacc as a production-quality component of upstream clang.

Great.


We have begun implementing an early prototype of clacc.  Before we get too far into the implementation, we would like to get feedback from the LLVM community to help ensure our design would ultimately be acceptable for contribution.  For that purpose, below is an analysis of several high-level design alternatives we have considered and their various features.  We welcome any feedback.

Thanks.

Joel E. Denny
Future Technologies Group
Oak Ridge National Laboratory


Design Alternatives
-------------------

We have considered three design alternatives for the clacc compiler:

1. acc src  --parser-->                     omp AST  --codegen-->  LLVM IR + omp rt calls

I don't think that we want this option because, if nothing else, it will preclude building source-level tooling for OpenACC.

Agreed.


2. acc src  --parser-->  acc AST                     --codegen-->  LLVM IR + omp rt calls
3. acc src  --parser-->  acc AST  --ttx-->  omp AST  --codegen-->  LLVM IR + omp rt calls

My recommendation: We should think about the very best way we could refactor the code to implement (2), and if that is too ugly (or otherwise significantly degrades maintainability of the OpenMP code), then we should choose (3).

I started out with design 2 in the early prototype I'm experimenting with.  Eventually I figured out some possibilities for how to implement the ttx component above (I'd be happy to discuss that)

That's probably a good idea. Please share some details on this front.


Will do.  However, due to the holidays, it might take me a little time to put that in an email. 



, and I switched to design 3.  So far, I'm finding design 3 to be easier to implement.  Moreover, I can use -ast-print combined with a custom option to print either OpenACC source, OpenMP source, or both with one commented out.  I like that capability.  However, I think it's clear that design 3 has greater potential for running into difficulties as I move forward to more complex OpenACC constructs.

It is this last part that is potentially concerning. If you try it, however, and it sounds like you are, then we'll know for sure soon enough.

Obviously the most efficient way to write some piece of code, and the way to write it to maximize maintainability and ease of extension, may be different. To the extent that they're the same, in terms of upstream functionality, we'll learn something.

Agreed.  My hope is that we can get the best of both if design 3 and design 2 prove easily composable.

Thanks.

Joel








Re: RFC: clacc: translating OpenACC to OpenMP in clang

Robinson, Paul via cfe-dev
In reply to this post by Robinson, Paul via cfe-dev
Sorry for the top post, but just to chime in here a bit.

As a company that has *ALREADY* implemented OpenACC on top of clang and supported customers for C (and a limited amount of C++), I see a lot of benefit here. Frankly, the amount of accelerated C/C++ code is growing, and what does or doesn't exist now doesn't mean nearly as much as the industry trend towards C++. I'm biased in that I think OMP is just rubbish for offloading and needs a lot of work. I don't think OpenACC is any perfect solution either, but it certainly has the benefit of being easier for users, at the expense of needing smart, high-quality tools.

2nd - there was no community usage requirement when Intel started working on getting OMP4 added to clang; it was allowed to be merged and reviewed incrementally. Raising this now as a blocker just doesn't seem fair. Please stop with the politics and judge this on technical merit alone. OpenACC and OpenMP4+ have similar goals but try to achieve them in different ways. All my biases aside, please stick to technical reasons why this should be blocked (for example, if nobody has pledged to support it long term, etc.).

So basically, if this is just some lame research project with no extended maintenance plan, make a fork, put your code there, and don't bother upstream. Otherwise, I think it's a very welcome idea.
 

On Sat, Dec 9, 2017 at 2:02 AM, Jeff Hammond via cfe-dev <[hidden email]> wrote:


On Fri, Dec 8, 2017 at 9:00 AM, Joel E. Denny <[hidden email]> wrote:

>
> On Fri, Dec 8, 2017 at 11:32 AM, Jeff Hammond <[hidden email]> wrote:
>>
>>
>>
>> On Fri, Dec 8, 2017 at 7:51 AM, Joel E. Denny <[hidden email]> wrote:
>> >
>> > Hi Jeff, Hal,
>> >
>> > Thanks for your feedback.  My comments are inline below.
>> >
>> > On Tue, Dec 5, 2017 at 6:43 PM, Hal Finkel <[hidden email]> wrote:
>> >>
>> >> On 12/05/2017 05:11 PM, Jeff Hammond via cfe-dev wrote:
>> >>
>> >> All of the usage of OpenACC outside of benchmarks/research that I know about is done in Fortran.
>> >
>> > I agree that it's easier to find real apps that use OpenACC in Fortran than those that use OpenACC in C/C++.  However, the latter certainly exist.  For example:
>>
>> Two of the three examples you cite are primarily Fortran and using OpenACC exclusively in Fortran subroutines.
>
>
> Are you saying that the occurrences of "pragma acc" in Nek5000 and NekCEM are unused?
>

The instances of "pragma acc" in those - it's the same code in both projects - are either (1) only causing host-device data synchronization or (2) commented-out.

It's unclear to me what actually happens in the code as currently written.  The OpenACC C/C++ code does no more than copy data to/from the device.  I didn't trace the entire code execution, but I can't tell whether any code touches the device data that OpenACC is updating.  If it is updated, it is updated by Fortran OpenACC code somewhere else in the source tree.

What does the OpenACC standard say about interoperability of compilers+runtimes, as would be required if one used Clang OpenACC for C/C++ and Fortran OpenACC implemented by PGI, Cray, or GCC?  OpenMP definitely does not support this, even if a subset of usage may work when one uses the same runtime library with different compilers.

/tmp/Nek5000$ git grep "pragma acc"
jl/gs.c:#pragma acc update host(sendbuf[0:unit_size*bufSize/2]) if(acc)
jl/gs.c:#pragma acc update device(buf[0:unit_size*bufSize/2]) if(acc)
jl/gs.c:#pragma acc update host(sendbuf[0:unit_size*bufSize/2]) if(acc)
jl/gs.c:#pragma acc update device(buf[0:unit_size*bufSize/2]) if(acc)
jl/gs.c:#pragma acc exit data delete(map0,map1)
jl/gs.c:#pragma acc update host(buf[0:unit_size*bufSize]) if(acc)
jl/gs.c:#pragma acc update device(buf[0:unit_size*bufSize]) if(acc)
jl/gs.c:#pragma acc update host(buf[0:unit_size*bufSize]) if(acc)
jl/gs.c:#pragma acc update device(buf[0:unit_size*bufSize]) if(acc)
jl/gs.c://#pragma acc enter data copyin(stage[0].scatter_map[0:stage[0].s_size],stage[0].scatter_mapf[0:stage[0].s_nt])
jl/gs.c://#pragma acc enter data copyin(stage[i].scatter_map[i:stage[i].s_size],stage[i].scatter_mapf[i:stage[i].s_nt])
jl/gs.c://#pragma acc enter data copyin(stage[i].gather_map[i:stage[i].g_size],stage[i].gather_mapf[i:stage[i].g_nt])
jl/gs.c://#pragma acc enter data copyin(stage[i].scatter_map[i:stage[i].s_size],stage[i].scatter_mapf[i:stage[i].s_nt])
jl/gs.c://#pragma acc enter data copyin(stage[i].gather_map[i:stage[i].g_size],stage[i].gather_mapf[i:stage[i].g_nt])
jl/gs.c://#pragma acc enter data copyin(stage2[0].scatter_map[0:stage2[0].s_size],stage2[0].scatter_mapf[0:stage2[0].s_nt])
jl/gs.c://#pragma acc enter data copyin(stage2[i].scatter_map[i:stage2[i].s_size],stage2[i].scatter_mapf[i:stage2[i].s_nt])
jl/gs.c://#pragma acc enter data copyin(stage2[i].gather_map[i:stage2[i].g_size],stage2[i].gather_mapf[i:stage2[i].g_nt])
jl/gs.c://#pragma acc enter data copyin(stage2[i].scatter_map[i:stage2[i].s_size],stage2[i].scatter_mapf[i:stage2[i].s_nt])
jl/gs.c://#pragma acc enter data copyin(stage2[i].gather_map[i:stage2[i].g_size],stage2[i].gather_mapf[i:stage2[i].g_nt])
jl/gs.c:#pragma acc exit data delete(map,mapf)
jl/gs.c:#pragma acc exit data delete(map,mapf)
jl/gs.c:#pragma acc exit data delete(map,mapf)
jl/gs.c:#pragma acc exit data delete(map,mapf)
jl/gs.c:#pragma acc update host(buf[0:vn*unit_size*bufSize]) if(acc)
jl/gs.c:#pragma acc update device(buf[0:vn*unit_size*bufSize]) if(acc)
jl/gs.c:  //#pragma acc exit data delete(ard->map_to_buf[0],ard->map_to_buf[1],ard->map_from_buf[0],ard->map_from_buf[1])
jl/gs.c:  //#pragma acc enter data copyin(ard->map_to_buf[0][0:ard->mt_size[0]],ard->map_from_buf[0][0:ard->mf_size[0]],ard->map_to_buf_f[0][0:ard->mt_nt[0]],ard->map_from_buf_f[0][0:ard->mf_nt[0]],ard->map_to_buf[1][0:ard->mt_size[1]],ard->map_from_buf[1][0:ard->mf_size[1]],ard->map_to_buf_f[1][0:ard->mt_nt[1]],ard->map_from_buf_f[1][0:ard->mf_nt[1]])
jl/gs.c:#pragma acc update host(a[0:n])
jl/gs.c:#pragma acc update host(a[0:n])
jl/gs.c:#pragma acc exit data delete(bufPtr)
jl/gs.c:#pragma acc enter data create(bufPtr[0:vn*gs_dom_size[dom]*gsh->r.buffer_size])
jl/gs.c:#pragma acc exit data delete(bufPtr)
jl/gs.c:#pragma acc enter data create(bufPtr[0:vn*gs_dom_size[dom]*gsh->r.buffer_size])
jl/gs.c:#pragma acc exit data delete(map_local0,map_local1,flagged_primaries)
jl/gs.c:#pragma acc enter data pcopyin(map[0:*m_size],mapf2[0:2*mf_temp])
jl/gs_acc.c://#pragma acc data present(buf[0:l])
jl/gs_acc.c://#pragma acc host_data use_device(buf)
jl/gs_acc.c://#pragma acc data present(buf[0:l])
jl/gs_acc.c://#pragma acc host_data use_device(buf)
jl/gs_acc.c:  //#pragma acc enter data copyin(t_mapf[0:t_m_nt*2],mapf[0:m_nt*2],snd_mapf[0:snd_m_nt*2],rcv_mapf[0:rcv_m_nt*2],fp_mapf[0:fp_m_nt*2], t_map[0:t_m_size],map[0:m_size],fp_map[0:fp_m_size],snd_map[0:snd_m_size],rcv_map[0:rcv_m_size])
jl/gs_acc.c:  //#pragma acc enter data copyin(t_mapf[0:t_m_nt*2],mapf[0:m_nt*2],snd_mapf[0:snd_m_nt*2],rcv_mapf[0:rcv_m_nt*2],fp_mapf[0:fp_m_nt*2], t_map[0:t_m_size],map[0:m_size],fp_map[0:fp_m_size],snd_map[0:snd_m_size],rcv_map[0:rcv_m_size])
jl/gs_acc.c://#pragma acc enter data pcopyin(t_mapf[0:t_m_nt*2],mapf[0:m_nt*2],snd_mapf[0:snd_m_nt*2],rcv_mapf[0:rcv_m_nt*2],fp_mapf[0:fp_m_nt*2], t_map[0:t_m_size],map[0:m_size],fp_map[0:fp_m_size],snd_map[0:snd_m_size],rcv_map[0:rcv_m_size])
jl/gs_acc.c://#pragma acc data present(u[0:uds],mapf[0:m_nt*2],snd_mapf[0:snd_m_nt*2],rcv_mapf[0:rcv_m_nt*2],fp_mapf[0:fp_m_nt*2],t_map[0:t_m_size],map[0:m_size],fp_map[0:fp_m_size],snd_map[0:snd_m_size],rcv_map[0:rcv_m_size]) 
jl/gs_acc.c://#pragma acc data create(sbuf[0:bl],rbuf[0:bl]) if(bl!=0)
jl/gs_acc.c://#pragma acc parallel loop gang vector present(u[0:uds],map[0:m_size],mapf[0:m_nt*2]) private(i,j,t) async(k+1)
jl/gs_acc.c://#pragma acc loop seq
jl/gs_acc.c://#pragma acc wait      
jl/gs_acc.c://#pragma acc parallel loop gang vector present(u[0:uds],fp_map[0:fp_m_size],fp_mapf[0:fp_m_nt*2]) private(i,j) async(k+1)
jl/gs_acc.c://#pragma acc loop seq
jl/gs_acc.c://#pragma acc wait
jl/gs_acc.c://#pragma acc parallel loop gang vector present(u[0:uds],fp_map[0:fp_m_size]) private(i,k)
jl/gs_acc.c://#pragma acc parallel loop gang vector present(u[0:uds],snd_map[0:snd_m_size],snd_mapf[0:snd_m_nt*2],sbuf[0:bl]) private(i,j,t) async(k+1)
jl/gs_acc.c://#pragma acc loop seq
jl/gs_acc.c://#pragma acc wait      
jl/gs_acc.c://#pragma acc parallel loop gang vector present(u[0:uds],snd_map[0:snd_m_size],sbuf[0:bl]) private(i,j,k)
jl/gs_acc.c://#pragma acc update host(sbuf[0:bl]) async(vn+2)
jl/gs_acc.c://#pragma acc wait
jl/gs_acc.c://#pragma acc update device(rbuf[0:bl]) async(vn+2)
jl/gs_acc.c://#pragma acc wait
jl/gs_acc.c://#pragma acc parallel loop gang vector present(u[0:uds],rcv_map[0:rcv_m_size],rcv_mapf[0:rcv_m_nt*2],rbuf[0:bl]) private(i,j,t) async(k+1)
jl/gs_acc.c://#pragma acc loop seq
jl/gs_acc.c://#pragma acc wait      
jl/gs_acc.c:    //#pragma acc parallel loop gang vector present(u[0:uds],rcv_map[0:rcv_m_size],rbuf[0:bl]) private(i,j,k)
jl/gs_acc.c://#pragma acc parallel loop gang vector present(u[0:uds],t_map[0:t_m_size],t_mapf[0:t_m_nt*2]) private(i,j,t) async(k+1)
jl/gs_acc.c://#pragma acc loop seq
jl/gs_acc.c://#pragma acc wait
 

>>
>>
>> > http://mrfil.github.io/PowerGrid/
>>
>> /tmp/PowerGrid$ git grep -il "pragma acc"
>> PowerGrid/Gfft.hpp
>> PowerGrid/Gnufft.hpp
>> PowerGrid/ftCpu.hpp
>> PowerGrid/gridding.hpp
>> PowerGrid/griddingSupport.hpp
>>
>> From http://mrfil.github.io/PowerGrid/docs/Installation:
>>
>> We have experience with PGC++ 15.7 from NVIDIA/The Portland Group as the version we have used most extensively. There is a free license available as part of the OpenACC Toolkit for academic users.
>>
>> GCC 6.1 has OpenACC support but has not yet been tested by the developers, we welcome reports of anyone trying to compile with it. We hope to support it alongside PGI compilers in the near future.
>>
>> For those lucky enough to have access to Cray supercomputers, the Cray compiler does support OpenACC, but we have not tried to build with it. Because the Cray compilers are not available on desktops, workstations, or non-Cray branded clusters, we have not dedicated resources to testing PowerGrid on it.
>>
>> So these folks support OpenACC, but haven't bothered to try the GCC implementation in the 1+ year that it's been available.  How likely are they to use Clang's?
>
>
> I cannot answer that. Perhaps they were waiting for GCC support to mature?

Or maybe they aren't interested in using OpenACC compiler support outside of PGI.

What I'm really getting at here is who is going to use OpenACC support in Clang, particularly if there is no compatible Fortran OpenACC compiler?  In addition to justifying the code maintenance effort, users who are not developers are essential for implementation hardening.

Best,

Jeff


> Thanks.
>
> Joel
>
>>
>> > https://nek5000.mcs.anl.gov/ (look at the openacc branch in github)
>>
>> (on the openacc branch)
>>
>> /tmp/Nek5000$ git grep -il "\$acc "
>> core/acc.f
>> core/comm_mpi.f
>> core/gmres.f
>> core/hmholtz.f
>> core/hsmg.f
>> core/math.f
>> core/navier1.f
>> core/navier4.f
>> core/plan4.f
>> core/prepost.f
>> core/subs2.f
>>
>> >
>> > https://nekcem.mcs.anl.gov/
>>
>> (on master)
>> /tmp/svn$ git grep -il "\$acc"
>> branches/maxwell-experimental/src/cem_dg.F
>> branches/maxwell-experimental/src/dssum2.F
>> branches/maxwell-experimental/src/io.F
>> branches/maxwell-experimental/src/mat1.F
>> branches/maxwell-experimental/src/maxwell.F
>> branches/maxwell-experimental/src/maxwell_acc.F
>> branches/maxwell-experimental/src/mxm_acc.F
>> branches/trunkQu/src/quantum_csr.F
>> branches/trunkQu/src/quantum_setup.f
>> branches/trunkQu/src/quantum_time.F
>> trunk/examples/openacc_gpu=1/box.usr
>> trunk/examples/openacc_gpu=8/box.usr
>> trunk/src/acoustic.F
>> trunk/src/cem_dg2.F
>> trunk/src/complex.F
>> trunk/src/drift1.F
>> trunk/src/drift1_maud.F
>> trunk/src/drive.F
>> trunk/src/drive_maud.F
>> trunk/src/dssum2.F
>> trunk/src/hmholtz.F
>> trunk/src/io.F
>> trunk/src/mat1.F
>> trunk/src/maxwell.F
>> trunk/src/maxwell_acc.F
>> trunk/src/mg_r2204.F
>> trunk/src/mxm_acc.F
>> trunk/src/poisson.F
>> trunk/src/quantum2.F
>> www/examples/libs/phpThumb/phpthumb.functions.php
>> www/examples/phpthumb.functions.php
>>
>> >>   Can you provide a list of C/C++ applications using OpenACC today and estimate the number of users that will benefit from this feature?
>> >>
>> >>
>> >> Such lists exist, although I don't know what can be shared (and Oak Ridge likely has better lists in this regard than I do).
>> >
>> > I'll look for a better list that I can share.
>>
>> That would be helpful.
>>
>> Best,
>>
>> Jeff
>>
>>
>>
>> >> I can tell you, from my own experience, that we're seeing an increase in development using OpenACC, in both C/C++ and Fortran, over the last couple of years (essentially because the compiler technology has improved to the point where that is now a potentially-productive choice).
>> >
>> >
>> > Providing support in a production-quality, open-source compiler tool chain like LLVM will hopefully accelerate this trend.
>> >
>> > Joel
>> >
>> >>
>> >> Also, we have a strong desire to enable tooling over code bases using OpenACC. Among many other things, at some point we'll likely want the option to automatically migrate much of this code to using OpenMP. Having an OpenACC-enabled Clang, with an implementation that maps to OpenMP, is an important step in that process.
>> >>
>> >>  -Hal
>> >>
>> >>
>> >>
>> >> Thanks,
>> >>
>> >> Jeff
>> >>
>> >> On Tue, Dec 5, 2017 at 11:06 AM, Joel E. Denny via cfe-dev <[hidden email]> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> We are working on a new project, clacc, that extends clang with OpenACC support.  Clacc's approach is to translate OpenACC (a descriptive language) to OpenMP (a prescriptive language) and thus to build on clang's existing OpenMP support.  While we plan to develop clacc to support our own research, an important goal is to contribute clacc as a production-quality component of upstream clang.
>> >>>
>> >>> We have begun implementing an early prototype of clacc.  Before we get too far into the implementation, we would like to get feedback from the LLVM community to help ensure our design would ultimately be acceptable for contribution.  For that purpose, below is an analysis of several high-level design alternatives we have considered and their various features.  We welcome any feedback.
>> >>>
>> >>> Thanks.
>> >>>
>> >>> Joel E. Denny
>> >>> Future Technologies Group
>> >>> Oak Ridge National Laboratory
>> >>>
>> >>>
>> >>> Design Alternatives
>> >>> -------------------
>> >>>
>> >>> We have considered three design alternatives for the clacc compiler:
>> >>>
>> >>> 1. acc src  --parser-->                     omp AST  --codegen-->  LLVM IR + omp rt calls
>> >>> 2. acc src  --parser-->  acc AST                     --codegen-->  LLVM IR + omp rt calls
>> >>> 3. acc src  --parser-->  acc AST  --ttx-->  omp AST  --codegen-->  LLVM IR + omp rt calls
>> >>>
>> >>> In the above diagram:
>> >>>
>> >>> * acc src = C source code containing acc constructs.
>> >>> * acc AST = a clang AST in which acc constructs are represented by
>> >>>   nodes with acc node types.  Of course, such node types do not
>> >>>   already exist in clang's implementation.
>> >>> * omp AST = a clang AST in which acc constructs have been lowered
>> >>>   to omp constructs represented by nodes with omp node types.  Of
>> >>>   course, such node types do already exist in clang's
>> >>>   implementation.
>> >>> * parser = the existing clang parser and semantic analyzer,
>> >>>   extended to handle acc constructs.
>> >>> * codegen = the existing clang backend that translates a clang AST
>> >>>   to LLVM IR, extended if necessary (depending on which design is
>> >>>   chosen) to perform codegen from acc nodes.
>> >>> * ttx (tree transformer) = a new clang component that transforms
>> >>>   acc to omp in clang ASTs.
>> >>>
>> >>> Design Features
>> >>> ---------------
>> >>>
>> >>> There are several features to consider when choosing among the designs
>> >>> in the previous section:
>> >>>
>> >>> 1. acc AST as an artifact -- Because they create acc AST nodes,
>> >>>    designs 2 and 3 best facilitate the creation of additional acc
>> >>>    source-level tools (such as pretty printers, analyzers, lint-like
>> >>>    tools, and editor extensions).  Some of these tools, such as pretty
>> >>>    printing, would be available immediately or as minor extensions of
>> >>>    tools that already exist in clang's ecosystem.
>> >>>
>> >>> 2. omp AST/source as an artifact -- Because they create omp AST
>> >>>    nodes, designs 1 and 3 best facilitate the use of source-level
>> >>>    tools to help an application developer discover how clacc has
>> >>>    mapped his acc to omp, possibly in order to debug a mapping
>> >>>    specification he has supplied.  With design 2 instead, an
>> >>>    application developer has to examine low-level LLVM IR + omp rt
>> >>>    calls.  Moreover, with designs 1 and 3, permanently migrating an
>> >>>    application's acc source to omp source can be automated.
>> >>>
>> >>> 3. omp AST for mapping implementation -- Designs 1 and 3 might
>> >>>    also make it easier for the compiler developer to reason about and
>> >>>    implement mappings from acc to omp.  That is, because acc and omp
>> >>>    syntax is so similar, implementing the translation at the level of
>> >>>    a syntactic representation is probably easier than translating to
>> >>>    LLVM IR.
>> >>>
>> >>> 4. omp AST for codegen -- Designs 1 and 3 simplify the
>> >>>    compiler implementation by enabling reuse of clang's existing omp
>> >>>    support for codegen.  In contrast, design 2 requires at least some
>> >>>    extensions to clang codegen to support acc nodes.
>> >>>
>> >>> 5. Full acc AST for mapping -- Designs 2 and 3 potentially
>> >>>    enable the compiler to analyze the entire source (as opposed to
>> >>>    just the acc construct currently being parsed) while choosing the
>> >>>    mapping to omp.  It is not clear if this feature will prove useful,
>> >>>    but it might enable more optimizations and compiler research
>> >>>    opportunities.
>> >>>
>> >>> 6. No acc node classes -- Design 1 simplifies the compiler
>> >>>    implementation by eliminating the need to implement many acc node
>> >>>    classes.  While we have so far found that implementing these
>> >>>    classes is mostly mechanical, it does take a non-trivial amount of
>> >>>    time.
>> >>>
>> >>> 7. No omp mapping -- Design 2 does not require acc to be mapped to
>> >>>    omp.  That is, it is conceivable that, for some acc constructs,
>> >>>    there will prove to be no omp syntax to capture the semantics we
>> >>>    wish to implement.  It is also conceivable that we might one day
>> >>>    want to represent some acc constructs directly as extensions to
>> >>>    LLVM IR, where some acc analyses or optimizations might be more
>> >>>    feasible to implement.  This possibility dovetails with recent
>> >>>    discussions in the LLVM community about developing LLVM IR
>> >>>    extensions for various parallel programming models.
>> >>>
>> >>> Because of features 4 and 6, design 1 is likely the fastest design to
>> >>> implement, at least at first while we focus on simple acc features and
>> >>> simple mappings to omp.  However, we have so far found no advantage
>> >>> that design 1 has but that design 3 does not have except for feature
>> >>> 6, which we see as the least important of the above features in the
>> >>> long term.
>> >>>
>> >>> The only advantage we have found that design 2 has but that design 3
>> >>> does not have is feature 7.  It should be possible to choose design 3
>> >>> as the default but, for certain acc constructs or scenarios where
>> >>> feature 7 proves important (if any), incorporate design 2.  In other
>> >>> words, if we decide not to map a particular acc construct to any omp
>> >>> construct, ttx would leave it alone, and we would extend codegen to
>> >>> handle it directly.
>> >>>
>> >>> Conclusions
>> >>> -----------
>> >>>
>> >>> For the above reasons, and because design 3 offers the cleanest
>> >>> separation of concerns, we have chosen design 3 with the possibility
>> >>> of incorporating design 2 where it proves useful.
>> >>>
>> >>> Because of the immutability of clang's AST, the design of our proposed
>> >>> ttx component requires careful consideration.  To shorten this initial
>> >>> email, we have omitted those details for now, but we will be happy to
>> >>> include them as the discussion progresses.
>> >>>
>> >>> _______________________________________________
>> >>> cfe-dev mailing list
>> >>> [hidden email]
>> >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Jeff Hammond
>> >> [hidden email]
>> >> http://jeffhammond.github.io/
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Hal Finkel
>> >> Lead, Compiler Technology and Programming Languages
>> >> Leadership Computing Facility
>> >> Argonne National Laboratory
>> >
>> >
>>
>>
>>
>> --
>> Jeff Hammond
>> [hidden email]
>> http://jeffhammond.github.io/
>
>



--
Jeff Hammond
[hidden email]
http://jeffhammond.github.io/

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: RFC: clacc: translating OpenACC to OpenMP in clang

Robinson, Paul via cfe-dev
On Mon, Dec 11, 2017 at 9:26 AM, C Bergström via cfe-dev <[hidden email]> wrote:
2nd - there was no community usage requirement when Intel started working on getting OMP4 added to clang. It was allowed that it could incrementally be merged and reviewed. Adding this as a troll blocker just doesn't seem to be fair. Please stop with the politics and judge this based on technical merit alone. OpenACC and OpenMP4+ have similar goals, but try to achieve them in different ways. All my biases aside please stick to technical reasons for why this should be blocked. (For example, if there isn't anyone who has pledged to continue to support it long term.. etc)

There is an *explicit* requirement in the "Contributing Extensions to Clang" section of "Getting Involved with the Clang Project" document (http://clang.llvm.org/get_involved.html#criteria):

"Evidence of a significant user community: This is based on a number of factors, including an actual, existing user community, the perceived likelihood that users would adopt such a feature if it were available, and any "trickle-down" effects that come from, e.g., a library adopting the feature and providing benefits to its users."

As an example, CilkPlus support was rejected partly due to concerns that no significant user community was present at the time.

I don't remember the exact situation with OpenMP, but perhaps the concern simply hadn't been raised because usage interest (including from your company, no?) was just too obvious to everyone?

Yours,
Andrey
===
Compiler Architect
NXP
 

So basically if this is just some lame research project with no extended maintenance plan, make a fork and put your code there and not bother upstream. Otherwise, I think it's a very welcome idea.

 

On Sat, Dec 9, 2017 at 2:02 AM, Jeff Hammond via cfe-dev <[hidden email]> wrote:


On Fri, Dec 8, 2017 at 9:00 AM, Joel E. Denny <[hidden email]> wrote:

>
> On Fri, Dec 8, 2017 at 11:32 AM, Jeff Hammond <[hidden email]> wrote:
>>
>>
>>
>> On Fri, Dec 8, 2017 at 7:51 AM, Joel E. Denny <[hidden email]> wrote:
>> >
>> > Hi Jeff, Hal,
>> >
>> > Thanks for your feedback.  My comments are inline below.
>> >
>> > On Tue, Dec 5, 2017 at 6:43 PM, Hal Finkel <[hidden email]> wrote:
>> >>
>> >> On 12/05/2017 05:11 PM, Jeff Hammond via cfe-dev wrote:
>> >>
>> >> All of the usage of OpenACC outside of benchmarks/research that I know about is done in Fortran.
>> >
>> > I agree that it's easier to find real apps that use OpenACC in Fortran than those that use OpenACC in C/C++.  However, the latter certainly exist.  For example:
>>
>> Two of the three examples you cite are primarily Fortran and using OpenACC exclusively in Fortran subroutines.
>
>
> Are you saying that the occurrences of "pragma acc" in Nek5000 and NekCEM are unused?
>

The instances of "pragma acc" in those - it's the same code in both projects - are either (1) only causing host-device data synchronization or (2) commented-out.

It's unclear to me what actually happens in the code as currently written.  The OpenACC C/C++ code does not more than copy data to/from the device.  I didn't trace the entire code execution but I can't tell if any code touches the device data that OpenACC is updating.  If it is updated, it is updated by Fortran OpenACC code somewhere else in the source tree.

What does the OpenACC standard say about interoperability of compilers+runtimes, as would be required if one used Clang OpenACC for C/C++ and Fortran OpenACC implemented by PGI, Cray, or GCC.  OpenMP definitely does not support this, even if a subset of usage may work when one uses the same runtime library with different compilers.

/tmp/Nek5000$ git grep "pragma acc"
jl/gs.c:#pragma acc update host(sendbuf[0:unit_size*bufSize/2]) if(acc)
jl/gs.c:#pragma acc update device(buf[0:unit_size*bufSize/2]) if(acc)
jl/gs.c:#pragma acc update host(sendbuf[0:unit_size*bufSize/2]) if(acc)
jl/gs.c:#pragma acc update device(buf[0:unit_size*bufSize/2]) if(acc)
jl/gs.c:#pragma acc exit data delete(map0,map1)
jl/gs.c:#pragma acc update host(buf[0:unit_size*bufSize]) if(acc)
jl/gs.c:#pragma acc update device(buf[0:unit_size*bufSize]) if(acc)
jl/gs.c:#pragma acc update host(buf[0:unit_size*bufSize]) if(acc)
jl/gs.c:#pragma acc update device(buf[0:unit_size*bufSize]) if(acc)
jl/gs.c://#pragma acc enter data copyin(stage[0].scatter_map[0:stage[0].s_size],stage[0].scatter_mapf[0:stage[0].s_nt])
jl/gs.c://#pragma acc enter data copyin(stage[i].scatter_map[i:stage[i].s_size],stage[i].scatter_mapf[i:stage[i].s_nt])
jl/gs.c://#pragma acc enter data copyin(stage[i].gather_map[i:stage[i].g_size],stage[i].gather_mapf[i:stage[i].g_nt])
jl/gs.c://#pragma acc enter data copyin(stage[i].scatter_map[i:stage[i].s_size],stage[i].scatter_mapf[i:stage[i].s_nt])
jl/gs.c://#pragma acc enter data copyin(stage[i].gather_map[i:stage[i].g_size],stage[i].gather_mapf[i:stage[i].g_nt])
jl/gs.c://#pragma acc enter data copyin(stage2[0].scatter_map[0:stage2[0].s_size],stage2[0].scatter_mapf[0:stage2[0].s_nt])
jl/gs.c://#pragma acc enter data copyin(stage2[i].scatter_map[i:stage2[i].s_size],stage2[i].scatter_mapf[i:stage2[i].s_nt])
jl/gs.c://#pragma acc enter data copyin(stage2[i].gather_map[i:stage2[i].g_size],stage2[i].gather_mapf[i:stage2[i].g_nt])
jl/gs.c://#pragma acc enter data copyin(stage2[i].scatter_map[i:stage2[i].s_size],stage2[i].scatter_mapf[i:stage2[i].s_nt])
jl/gs.c://#pragma acc enter data copyin(stage2[i].gather_map[i:stage2[i].g_size],stage2[i].gather_mapf[i:stage2[i].g_nt])
jl/gs.c:#pragma acc exit data delete(map,mapf)
jl/gs.c:#pragma acc exit data delete(map,mapf)
jl/gs.c:#pragma acc exit data delete(map,mapf)
jl/gs.c:#pragma acc exit data delete(map,mapf)
jl/gs.c:#pragma acc update host(buf[0:vn*unit_size*bufSize]) if(acc)
jl/gs.c:#pragma acc update device(buf[0:vn*unit_size*bufSize]) if(acc)
jl/gs.c:  //#pragma acc exit data delete(ard->map_to_buf[0],ard->map_to_buf[1],ard->map_from_buf[0],ard->map_from_buf[1])
jl/gs.c:  //#pragma acc enter data copyin(ard->map_to_buf[0][0:ard->mt_size[0]],ard->map_from_buf[0][0:ard->mf_size[0]],ard->map_to_buf_f[0][0:ard->mt_nt[0]],ard->map_from_buf_f[0][0:ard->mf_nt[0]],ard->map_to_buf[1][0:ard->mt_size[1]],ard->map_from_buf[1][0:ard->mf_size[1]],ard->map_to_buf_f[1][0:ard->mt_nt[1]],ard->map_from_buf_f[1][0:ard->mf_nt[1]])
jl/gs.c:#pragma acc update host(a[0:n])
jl/gs.c:#pragma acc update host(a[0:n])
jl/gs.c:#pragma acc exit data delete(bufPtr)
jl/gs.c:#pragma acc enter data create(bufPtr[0:vn*gs_dom_size[dom]*gsh->r.buffer_size])
jl/gs.c:#pragma acc exit data delete(bufPtr)
jl/gs.c:#pragma acc enter data create(bufPtr[0:vn*gs_dom_size[dom]*gsh->r.buffer_size])
jl/gs.c:#pragma acc exit data delete(map_local0,map_local1,flagged_primaries)
jl/gs.c:#pragma acc enter data pcopyin(map[0:*m_size],mapf2[0:2*mf_temp])
jl/gs_acc.c://#pragma acc data present(buf[0:l])
jl/gs_acc.c://#pragma acc host_data use_device(buf)
jl/gs_acc.c://#pragma acc data present(buf[0:l])
jl/gs_acc.c://#pragma acc host_data use_device(buf)
jl/gs_acc.c:  //#pragma acc enter data copyin(t_mapf[0:t_m_nt*2],mapf[0:m_nt*2],snd_mapf[0:snd_m_nt*2],rcv_mapf[0:rcv_m_nt*2],fp_mapf[0:fp_m_nt*2], t_map[0:t_m_size],map[0:m_size],fp_map[0:fp_m_size],snd_map[0:snd_m_size],rcv_map[0:rcv_m_size])
jl/gs_acc.c:  //#pragma acc enter data copyin(t_mapf[0:t_m_nt*2],mapf[0:m_nt*2],snd_mapf[0:snd_m_nt*2],rcv_mapf[0:rcv_m_nt*2],fp_mapf[0:fp_m_nt*2], t_map[0:t_m_size],map[0:m_size],fp_map[0:fp_m_size],snd_map[0:snd_m_size],rcv_map[0:rcv_m_size])
jl/gs_acc.c://#pragma acc enter data pcopyin(t_mapf[0:t_m_nt*2],mapf[0:m_nt*2],snd_mapf[0:snd_m_nt*2],rcv_mapf[0:rcv_m_nt*2],fp_mapf[0:fp_m_nt*2], t_map[0:t_m_size],map[0:m_size],fp_map[0:fp_m_size],snd_map[0:snd_m_size],rcv_map[0:rcv_m_size])
jl/gs_acc.c://#pragma acc data present(u[0:uds],mapf[0:m_nt*2],snd_mapf[0:snd_m_nt*2],rcv_mapf[0:rcv_m_nt*2],fp_mapf[0:fp_m_nt*2],t_map[0:t_m_size],map[0:m_size],fp_map[0:fp_m_size],snd_map[0:snd_m_size],rcv_map[0:rcv_m_size]) 
jl/gs_acc.c://#pragma acc data create(sbuf[0:bl],rbuf[0:bl]) if(bl!=0)
jl/gs_acc.c://#pragma acc parallel loop gang vector present(u[0:uds],map[0:m_size],mapf[0:m_nt*2]) private(i,j,t) async(k+1)
jl/gs_acc.c://#pragma acc loop seq
jl/gs_acc.c://#pragma acc wait      
jl/gs_acc.c://#pragma acc parallel loop gang vector present(u[0:uds],fp_map[0:fp_m_size],fp_mapf[0:fp_m_nt*2]) private(i,j) async(k+1)
jl/gs_acc.c://#pragma acc loop seq
jl/gs_acc.c://#pragma acc wait
jl/gs_acc.c://#pragma acc parallel loop gang vector present(u[0:uds],fp_map[0:fp_m_size]) private(i,k)
jl/gs_acc.c://#pragma acc parallel loop gang vector present(u[0:uds],snd_map[0:snd_m_size],snd_mapf[0:snd_m_nt*2],sbuf[0:bl]) private(i,j,t) async(k+1)
jl/gs_acc.c://#pragma acc loop seq
jl/gs_acc.c://#pragma acc wait      
jl/gs_acc.c://#pragma acc parallel loop gang vector present(u[0:uds],snd_map[0:snd_m_size],sbuf[0:bl]) private(i,j,k)
jl/gs_acc.c://#pragma acc update host(sbuf[0:bl]) async(vn+2)
jl/gs_acc.c://#pragma acc wait
jl/gs_acc.c://#pragma acc update device(rbuf[0:bl]) async(vn+2)
jl/gs_acc.c://#pragma acc wait
jl/gs_acc.c://#pragma acc parallel loop gang vector present(u[0:uds],rcv_map[0:rcv_m_size],rcv_mapf[0:rcv_m_nt*2],rbuf[0:bl]) private(i,j,t) async(k+1)
jl/gs_acc.c://#pragma acc loop seq
jl/gs_acc.c://#pragma acc wait      
jl/gs_acc.c:    //#pragma acc parallel loop gang vector present(u[0:uds],rcv_map[0:rcv_m_size],rbuf[0:bl]) private(i,j,k)
jl/gs_acc.c://#pragma acc parallel loop gang vector present(u[0:uds],t_map[0:t_m_size],t_mapf[0:t_m_nt*2]) private(i,j,t) async(k+1)
jl/gs_acc.c://#pragma acc loop seq
jl/gs_acc.c://#pragma acc wait
 

>>
>>
>> > http://mrfil.github.io/PowerGrid/
>>
>> /tmp/PowerGrid$ git grep -il "pragma acc"
>> PowerGrid/Gfft.hpp
>> PowerGrid/Gnufft.hpp
>> PowerGrid/ftCpu.hpp
>> PowerGrid/gridding.hpp
>> PowerGrid/griddingSupport.hpp
>>
>> From http://mrfil.github.io/PowerGrid/docs/Installation:
>>
>> We have experience with PGC++ 15.7 from NVIDIA/The Portland Group as the version we have used most extensively. There is a free license available as part of the OpenACC Toolkit for academic users.
>>
>> GCC 6.1 has OpenACC support but has not yet been tested by the developers, we welcome reports of anyone trying to compile with it. We hope to support it alongside PGI compilers in the near future.
>>
>> For those lucky enough to have access to Cray supercomputers, the Cray compiler does support OpenACC, but we have not tried to build with it. Because the Cray compilers are not available on desktops, workstations, or non-Cray branded clusters, we have not dedicated resources to testing PowerGrid on it.
>>
>> So these folks support OpenACC, but haven't bothered to try the GCC implementation in the 1+ year that it's been available.  How likely are they to use Clang's?
>
>
> I cannot answer that. Perhaps they were waiting for GCC support to mature?

Or maybe they aren't interested using in OpenACC compiler support outside of PGI.

What I'm really getting at here is who is going to use OpenACC support in Clang, particularly if there is no compatible Fortran OpenACC compiler?  In addition to justifying the code maintenance effort, users who are not developers are essential for implementation hardening.

Best,

Jeff


> Thanks.
>
> Joel
>
>>
>> > https://nek5000.mcs.anl.gov/ (look at the openacc branch in github)
>>
>> (on the openacc branch)
>>
>> /tmp/Nek5000$ git grep -il "\$acc "
>> core/acc.f
>> core/comm_mpi.f
>> core/gmres.f
>> core/hmholtz.f
>> core/hsmg.f
>> core/math.f
>> core/navier1.f
>> core/navier4.f
>> core/plan4.f
>> core/prepost.f
>> core/subs2.f
>>
>> >
>> > https://nekcem.mcs.anl.gov/
>>
>> (on master)
>> /tmp/svn$ git grep -il "\$acc"
>> branches/maxwell-experimental/src/cem_dg.F
>> branches/maxwell-experimental/src/dssum2.F
>> branches/maxwell-experimental/src/io.F
>> branches/maxwell-experimental/src/mat1.F
>> branches/maxwell-experimental/src/maxwell.F
>> branches/maxwell-experimental/src/maxwell_acc.F
>> branches/maxwell-experimental/src/mxm_acc.F
>> branches/trunkQu/src/quantum_csr.F
>> branches/trunkQu/src/quantum_setup.f
>> branches/trunkQu/src/quantum_time.F
>> trunk/examples/openacc_gpu=1/box.usr
>> trunk/examples/openacc_gpu=8/box.usr
>> trunk/src/acoustic.F
>> trunk/src/cem_dg2.F
>> trunk/src/complex.F
>> trunk/src/drift1.F
>> trunk/src/drift1_maud.F
>> trunk/src/drive.F
>> trunk/src/drive_maud.F
>> trunk/src/dssum2.F
>> trunk/src/hmholtz.F
>> trunk/src/io.F
>> trunk/src/mat1.F
>> trunk/src/maxwell.F
>> trunk/src/maxwell_acc.F
>> trunk/src/mg_r2204.F
>> trunk/src/mxm_acc.F
>> trunk/src/poisson.F
>> trunk/src/quantum2.F
>> www/examples/libs/phpThumb/phpthumb.functions.php
>> www/examples/phpthumb.functions.php
>>
>> >>   Can you provide a list of C/C++ applications using OpenACC today and estimate the number of users that will benefit from this feature?
>> >>
>> >>
>> >> Such lists exist, although I don't know what can be shared (and Oak Ridge likely has better lists in this regard than I do).
>> >
>> > I'll look for a better list that I can share.
>>
>> That would be helpful.
>>
>> Best,
>>
>> Jeff
>>
>>
>>
>> >> I can tell you, from my own experience, that we're seeing an increase in development using OpenACC, in both C/C++ and Fortran, over the last couple of years (essentially because the compiler technology has improved to the point where that is now a potentially-productive choice).
>> >
>> >
>> > Providing support in a production-quality, open-source compiler tool chain like LLVM will hopefully accelerate this trend.
>> >
>> > Joel
>> >
>> >>
>> >> Also, we have a strong desire to enable tooling over code bases using OpenACC. Among many other things, at some point we'll likely want the option to automatically migrate much of this code to using OpenMP. Having an OpenACC-enabled Clang, with an implementation that maps to OpenMP, is an important step in that process.
>> >>
>> >>  -Hal
>> >>
>> >>
>> >>
>> >> Thanks,
>> >>
>> >> Jeff
>> >>
>> >> On Tue, Dec 5, 2017 at 11:06 AM, Joel E. Denny via cfe-dev <[hidden email]> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> We are working on a new project, clacc, that extends clang with OpenACC support.  Clacc's approach is to translate OpenACC (a descriptive language) to OpenMP (a prescriptive language) and thus to build on clang's existing OpenMP support.  While we plan to develop clacc to support our own research, an important goal is to contribute clacc as a production-quality component of upstream clang.
>> >>>
>> >>> We have begun implementing an early prototype of clacc.  Before we get too far into the implementation, we would like to get feedback from the LLVM community to help ensure our design would ultimately be acceptable for contribution.  For that purpose, below is an analysis of several high-level design alternatives we have considered and their various features.  We welcome any feedback.
>> >>>
>> >>> Thanks.
>> >>>
>> >>> Joel E. Denny
>> >>> Future Technologies Group
>> >>> Oak Ridge National Laboratory
>> >>>
>> >>>
>> >>> Design Alternatives
>> >>> -------------------
>> >>>
>> >>> We have considered three design alternatives for the clacc compiler:
>> >>>
>> >>> 1. acc src  --parser-->                     omp AST  --codegen-->  LLVM IR + omp rt calls
>> >>> 2. acc src  --parser-->  acc AST                     --codegen-->  LLVM IR + omp rt calls
>> >>> 3. acc src  --parser-->  acc AST  --ttx-->  omp AST  --codegen-->  LLVM IR + omp rt calls
>> >>>
>> >>> In the above diagram:
>> >>>
>> >>> * acc src = C source code containing acc constructs.
>> >>> * acc AST = a clang AST in which acc constructs are represented by
>> >>>   nodes with acc node types.  Of course, such node types do not
>> >>>   already exist in clang's implementation.
>> >>> * omp AST = a clang AST in which acc constructs have been lowered
>> >>>   to omp constructs represented by nodes with omp node types.  Of
>> >>>   course, such node types do already exist in clang's
>> >>>   implementation.
>> >>> * parser = the existing clang parser and semantic analyzer,
>> >>>   extended to handle acc constructs.
>> >>> * codegen = the existing clang backend that translates a clang AST
>> >>>   to LLVM IR, extended if necessary (depending on which design is
>> >>>   chosen) to perform codegen from acc nodes.
>> >>> * ttx (tree transformer) = a new clang component that transforms
>> >>>   acc to omp in clang ASTs.
>> >>>
>> >>> Design Features
>> >>> ---------------
>> >>>
>> >>> There are several features to consider when choosing among the designs
>> >>> in the previous section:
>> >>>
>> >>> 1. acc AST as an artifact -- Because they create acc AST nodes,
>> >>>    designs 2 and 3 best facilitate the creation of additional acc
>> >>>    source-level tools (such as pretty printers, analyzers, lint-like
>> >>>    tools, and editor extensions).  Some of these tools, such as pretty
>> >>>    printing, would be available immediately or as minor extensions of
>> >>>    tools that already exist in clang's ecosystem.
>> >>>
>> >>> 2. omp AST/source as an artifact -- Because they create omp AST
>> >>>    nodes, designs 1 and 3 best facilitate the use of source-level
>> >>>    tools to help an application developer discover how clacc has
>> >>>    mapped their acc to omp, possibly in order to debug a mapping
>> >>>    specification they have supplied.  With design 2 instead, an
>> >>>    application developer has to examine low-level LLVM IR + omp rt
>> >>>    calls.  Moreover, with designs 1 and 3, permanently migrating an
>> >>>    application's acc source to omp source can be automated.
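>> >>>
>> >>> As a sketch of what such automated migration might look like, the
>> >>> following shows an acc data region next to the omp form a
>> >>> source-to-source tool could emit for it.  The clause correspondence
>> >>> shown (copyin -> map(to:), copyout -> map(from:)) is a natural one,
>> >>> but it is our assumption for illustration, not fixed clacc behavior.

```c
/* Original acc source; a migration tool might rewrite the two acc
 * directives below as (hypothetical output, actual mapping TBD):
 *   #pragma omp target data map(to: in[0:n]) map(from: out[0:n])
 *   #pragma omp target teams distribute parallel for
 */
static void scale2(const double *in, double *out, int n) {
  #pragma acc data copyin(in[0:n]) copyout(out[0:n])
  {
    #pragma acc parallel loop
    for (int i = 0; i < n; ++i)
      out[i] = 2.0 * in[i];
  }
}
```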
>> >>>
>> >>> 3. omp AST for mapping implementation -- Designs 1 and 3 might
>> >>>    also make it easier for the compiler developer to reason about and
>> >>>    implement mappings from acc to omp.  That is, because acc and omp
>> >>>    syntax is so similar, implementing the translation at the level of
>> >>>    a syntactic representation is probably easier than translating to
>> >>>    LLVM IR.
>> >>>
>> >>> 4. omp AST for codegen -- Designs 1 and 3 simplify the
>> >>>    compiler implementation by enabling reuse of clang's existing omp
>> >>>    support for codegen.  In contrast, design 2 requires at least some
>> >>>    extensions to clang codegen to support acc nodes.
>> >>>
>> >>> 5. Full acc AST for mapping -- Designs 2 and 3 potentially
>> >>>    enable the compiler to analyze the entire source (as opposed to
>> >>>    just the acc construct currently being parsed) while choosing the
>> >>>    mapping to omp.  It is not clear if this feature will prove useful,
>> >>>    but it might enable more optimizations and compiler research
>> >>>    opportunities.
>> >>>
>> >>> 6. No acc node classes -- Design 1 simplifies the compiler
>> >>>    implementation by eliminating the need to implement many acc node
>> >>>    classes.  While we have so far found that implementing these
>> >>>    classes is mostly mechanical, it does take a non-trivial amount of
>> >>>    time.
>> >>>
>> >>> 7. No omp mapping -- Design 2 does not require acc to be mapped to
>> >>>    omp.  That is, it is conceivable that, for some acc constructs,
>> >>>    there will prove to be no omp syntax to capture the semantics we
>> >>>    wish to implement.  It is also conceivable that we might one day
>> >>>    want to represent some acc constructs directly as extensions to
>> >>>    LLVM IR, where some acc analyses or optimizations might be more
>> >>>    feasible to implement.  This possibility dovetails with recent
>> >>>    discussions in the LLVM community about developing LLVM IR
>> >>>    extensions for various parallel programming models.
>> >>>
>> >>> Because of features 4 and 6, design 1 is likely the fastest design to
>> >>> implement, at least at first while we focus on simple acc features and
>> >>> simple mappings to omp.  However, the only advantage we have found
>> >>> that design 1 has over design 3 is feature 6, which we see as the
>> >>> least important of the above features in the long term.
>> >>>
>> >>> The only advantage we have found that design 2 has over design 3 is
>> >>> feature 7.  It should be possible to choose design 3
>> >>> as the default but, for certain acc constructs or scenarios where
>> >>> feature 7 proves important (if any), incorporate design 2.  In other
>> >>> words, if we decide not to map a particular acc construct to any omp
>> >>> construct, ttx would leave it alone, and we would extend codegen to
>> >>> handle it directly.
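>> >>>
>> >>> For illustration, the acc cache directive is one candidate for this
>> >>> hybrid treatment: it asks that part of an array be kept in the
>> >>> fastest available memory for a loop, and to our knowledge OpenMP has
>> >>> no direct single-directive counterpart, so ttx could leave the node
>> >>> as an acc node for codegen to lower directly (this choice of example
>> >>> is our assumption, not a decision the RFC has made).

```c
/* A 3-point stencil whose window the acc cache directive asks to keep
 * in fast (e.g. software-managed) memory.  Under the hybrid design,
 * ttx would map the parallel loop to omp but could pass the cache
 * directive through as an acc node for codegen to handle directly. */
static void smooth(const float *x, float *y, int n) {
  #pragma acc parallel loop
  for (int i = 1; i < n - 1; ++i) {
    #pragma acc cache(x[i-1:3])
    y[i] = (x[i-1] + x[i] + x[i+1]) / 3.0f;
  }
}
```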
>> >>>
>> >>> Conclusions
>> >>> -----------
>> >>>
>> >>> For the above reasons, and because design 3 offers the cleanest
>> >>> separation of concerns, we have chosen design 3 with the possibility
>> >>> of incorporating design 2 where it proves useful.
>> >>>
>> >>> Because of the immutability of clang's AST, the design of our proposed
>> >>> ttx component requires careful consideration.  To shorten this initial
>> >>> email, we have omitted those details for now, but we will be happy to
>> >>> include them as the discussion progresses.
>> >>>
>> >>> _______________________________________________
>> >>> cfe-dev mailing list
>> >>> [hidden email]
>> >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Jeff Hammond
>> >> [hidden email]
>> >> http://jeffhammond.github.io/
>> >>
>> >>
>> >> --
>> >> Hal Finkel
>> >> Lead, Compiler Technology and Programming Languages
>> >> Leadership Computing Facility
>> >> Argonne National Laboratory
>> >
>> >
>>
>>
>>
>
>



