[RFC] Open sourcing and contributing TAPI back to the LLVM community

[RFC] Open sourcing and contributing TAPI back to the LLVM community

Eric Fiselier via cfe-dev
Hi @ll,

Over the past years I have been looking into how to reduce the size of the SDK that ships with Xcode and how to improve build times for the overall OS inside Apple. The result is a tool called TAPI, which is used at Apple for all things related to text-based dynamic library files (.tbd).

What are text-based dynamic library files?
Text-based dynamic library files (TBDs) are a textual representation of the information in a dynamic library / shared library that is required by the static linker - basically a symbol list of the exported symbols.
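To make the idea concrete, the following Python sketch models what a text stub has to carry for the static linker. It holds an install name, a version, and the list of exported symbols, and it carries no code or data. The `TextStub` class, its rendering, and the `/usr/lib/libfoo.dylib` path are all hypothetical. The field names are loosely modeled on the YAML layout of .tbd files, but this is a simplified illustration, not the exact schema that TAPI or ld64 reads.

```python
from dataclasses import dataclass, field


@dataclass
class TextStub:
    """Simplified stand-in for a .tbd file (illustrative fields, not TAPI's schema)."""

    install_name: str
    current_version: str
    arch: str
    symbols: list[str] = field(default_factory=list)

    def render(self) -> str:
        # Only linker-relevant metadata is emitted; symbols are de-duplicated
        # and sorted so the output is stable.
        syms = ", ".join(sorted(set(self.symbols)))
        return (
            "---\n"
            f"archs:           [ {self.arch} ]\n"
            f"install-name:    {self.install_name}\n"
            f"current-version: {self.current_version}\n"
            "exports:\n"
            f"  - archs:       [ {self.arch} ]\n"
            f"    symbols:     [ {syms} ]\n"
            "...\n"
        )


# Hypothetical library with a duplicated symbol in the input list.
stub = TextStub("/usr/lib/libfoo.dylib", "1.0", "x86_64",
                ["_foo_init", "_foo_free", "_foo_init"])
print(stub.render())
```

The point is the size difference. The stub holds a few lines of metadata plus symbol names, while a Mach-O stub still has to carry the full load-command and symbol-table structure of a real dylib.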

Apple’s SDKs originally used Mach-O Dynamic Library Stubs. Mach-O Dynamic Library Stubs are dynamic library files, but with all the text and data stripped out. TBD files were introduced to replace Mach-O Dynamic Library Stub files in the SDK to further reduce its overall size.

Over time the TAPI tool has grown and is now used in a variety of ways.

Dynamic Library Stubbing:
As mentioned above, TAPI is used to read the content of a dynamic library / shared library and generate a textual representation that can be used by the static linker. The current implementation reads Mach-O files, but it could be extended to provide the same functionality for other object file formats.
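The stubbing step essentially comes down to collecting the defined, externally visible symbols of a binary. TAPI reads Mach-O directly. Purely as an illustration, this hypothetical Python helper instead filters text in the `address type name` layout that `nm` prints. It keeps uppercase (external) symbol types and drops undefined (`U`) entries. The sample input is made up.

```python
def exported_symbols(nm_output: str) -> list[str]:
    """Return the sorted, defined, external symbols from nm-style text.

    Assumes lines like '0000000000000f50 T _foo'. Undefined symbols are
    printed without an address, so they split into fewer than three fields
    and are skipped along with blank or header lines.
    """
    result = set()
    for line in nm_output.splitlines():
        parts = line.split(maxsplit=2)  # name may contain spaces, e.g. ObjC methods
        if len(parts) != 3:
            continue
        _addr, kind, name = parts
        # Uppercase type letters mark external symbols; 'U' means undefined.
        if kind.isupper() and kind != "U":
            result.add(name)
    return sorted(result)


sample = """\
0000000000000f50 T _foo_init
0000000000000f80 T _foo_free
0000000000001000 t _helper
                 U _malloc
0000000000002000 D _foo_version
"""
print(exported_symbols(sample))  # ['_foo_free', '_foo_init', '_foo_version']
```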

Framework / Dynamic Library Verification:
The symbols exported from a dynamic library should ideally match, or at least contain, all the API specified in the associated header files. TAPI performs this verification by parsing the header files with Clang and comparing the findings to the symbols exported from the library.
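The comparison itself is essentially a set difference. In this minimal Python sketch, `verify_exports` is a hypothetical helper. Both inputs are assumed to already use the same spelling; for C on Mach-O, a function `foo` is exported as `_foo`. The hard part that TAPI handles, extracting the header API with Clang, is out of scope here.

```python
def verify_exports(header_api: set[str], exported: set[str]) -> dict[str, set[str]]:
    """Compare the API declared in public headers with a library's exports."""
    return {
        # Declared in a public header but not exported: clients would fail to link.
        "missing": header_api - exported,
        # Exported but not declared in any public header: possibly accidental API.
        "undeclared": exported - header_api,
    }


report = verify_exports(
    header_api={"_foo_init", "_foo_free", "_foo_reset"},
    exported={"_foo_init", "_foo_free", "_foo_debug_dump"},
)
print(report["missing"], report["undeclared"])  # {'_foo_reset'} {'_foo_debug_dump'}
```

Following the "match, or at least contain" wording above, a non-empty `missing` set is the real error. The `undeclared` entries are candidates for review rather than failures.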

InstallAPI:
InstallAPI is a new build phase that generates the TBD file from header files only. This allows clients that depend on the library to build concurrently, even before the library itself has been built. This can be used to increase parallelism in the build of larger projects or operating systems.
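A back-of-the-envelope model shows where the extra parallelism comes from. In this hypothetical Python sketch, each task has a duration and a list of prerequisites. The build time is the length of the critical path, assuming unlimited workers. Without InstallAPI, the client waits for the full library build. With InstallAPI, the client only waits for a TBD generated from headers. Task names and durations are made up for illustration.

```python
from functools import lru_cache


def build_time(tasks: dict[str, tuple[int, list[str]]]) -> int:
    """Critical-path length of an acyclic task graph with unlimited workers."""

    @lru_cache(maxsize=None)
    def finish(name: str) -> int:
        duration, deps = tasks[name]
        # A task can start only after all of its prerequisites have finished.
        return duration + max((finish(d) for d in deps), default=0)

    return max(finish(t) for t in tasks)


# Without InstallAPI: the client links against the fully built library.
serial = {
    "libfoo": (10, []),
    "client": (8, ["libfoo"]),
}
# With InstallAPI: the client links against a TBD generated from headers.
parallel = {
    "libfoo": (10, []),
    "libfoo.tbd": (1, []),
    "client": (8, ["libfoo.tbd"]),
}
print(build_time(serial), build_time(parallel))  # 18 10
```

The gain grows with the depth of the dependency chain. Each level that can link against a header-derived TBD no longer adds the full library build time to the critical path.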

Misc:
- display and operate on TBD files
- automatically generate API tests from header files
- libtapi, which is used by the linker (ld64) to parse the TBD files


The functionality of the tool is currently limited to Mach-O object files, but that is not a technical limitation. In making the tool open source I hope others will be able to take advantage of it too and extend its functionality to other object file formats.


I initially developed the project as a Clang project, but that was mostly for practical reasons (out-of-tree development, separate repo, etc). For the curious ones I pushed the repo to github (https://github.com/ributzka/tapi).

I imagine, for example, that the reading/writing of TBD files is something that would fit better into the LLVM sources, which would make it available to other libraries and tools (e.g. LLVMObject, llvm-nm, lld, ...).

I created a small patch that integrates it with llvm-nm and LLVMObject. This patch is not complete and I will split it up into smaller patches for review. I am providing it as a reference to get the discussion started.

Please let me know what you think and bikeshed away :)

Thanks

Cheers,
Juergen






_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Attachment: tapi-llvm-nm.patch.tar.bz2 (22K)

Re: [llvm-dev] [RFC] Open sourcing and contributing TAPI back to the LLVM community

Eric Fiselier via cfe-dev
On Thu, Sep 7, 2017 at 5:01 PM, Juergen Ributzka via llvm-dev
<[hidden email]> wrote:

> Misc:
> - display and operate on TBD files
> - automatically generate API tests from header files
> - libtapi, which is used by the linker (ld64) to parse the TBD files
>

I'm interested in whether you plan to have this integrated in lld as well.
As far as I understand, this is going to be the de-facto way of
shipping Mach-O binaries (at least, the ones released by Apple).
Please correct me if I'm wrong.
I tried to self-host lld on El Capitan and it fails because lld
doesn't really know about TBD files.
This, unfortunately, makes the linker not really usable for modern Mac
OS releases.

--
Davide

"There are no solved problems; there are only problems that are more
or less solved" -- Henri Poincare

Re: [llvm-dev] [RFC] Open sourcing and contributing TAPI back to the LLVM community

Eric Fiselier via cfe-dev
On Thu, Sep 7, 2017 at 6:52 PM, Davide Italiano <[hidden email]> wrote:
> I'm interested in whether you plan to have this integrated in lld as well.
> As far as I understand, this is going to be the de-facto way of
> shipping Mach-O binaries (at least, the ones released by Apple).
> Please correct me if I'm wrong.

Yes, this is already the de-facto way of shipping Mach-O files in the SDK. That means self-hosting LLD against the SDK is currently not possible. The system itself obviously still ships full Mach-O files in /System, so you should still be able to self-host against those files.

My plan is to integrate support for TBD files into all LLVM tools where it makes sense (including LLD). This is why I want to start by putting the basic support into LLVM first, so it can be used by other tools and libraries.
 
> I tried to self-host lld on El Capitan and it fails because lld
> doesn't really know about TBD files.
> This, unfortunately, makes the linker not really usable for modern Mac
> OS releases.


Re: [RFC] Open sourcing and contributing TAPI back to the LLVM community

Eric Fiselier via cfe-dev
Hi Paul,

My experience has been the same when it comes to header files, and I am not claiming this will work out of the box for all library projects. It usually requires some cleanup first, which is why the tool comes with a verification mode to make sure the headers are the truth. Also keep in mind that you don't have to parse all the headers, only the small set that gets installed as part of the library API.

The tool does not read the linker script / export file, because those are not necessarily the truth either and may contain wildcards. In my view they are just one way of managing exported symbols. Another way, which I personally prefer, is to build with hidden visibility and annotate only the API with default visibility. That makes the headers the single source of truth for what is API.
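The wildcard point can be illustrated with a small Python sketch. An export-list entry such as `_foo_*` only turns into a concrete symbol list once it is matched against the symbols that a built library actually defines. The export file alone therefore cannot produce a TBD ahead of the build. `resolve_export_list` is hypothetical, and glob matching via `fnmatchcase` is an assumption; a given linker's wildcard rules may differ in detail.

```python
from fnmatch import fnmatchcase


def resolve_export_list(patterns: list[str], defined: list[str]) -> list[str]:
    """Expand glob-style export-list entries against the defined symbols."""
    return sorted(
        sym for sym in set(defined)
        if any(fnmatchcase(sym, pat) for pat in patterns)
    )


patterns = ["_foo_*", "_bar"]
# The result depends on what the compiled library defines, which is not
# known from the export list alone.
print(resolve_export_list(patterns, ["_foo_init", "_foo_free", "_bar", "_baz"]))
# ['_bar', '_foo_free', '_foo_init']
```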

Cheers,
Juergen



On Fri, Sep 8, 2017 at 9:29 AM, Robinson, Paul <[hidden email]> wrote:
> InstallAPI:
> InstallAPI is a new build phase that generates the TBD file from header
> files only. This allows a dependency of the library to build concurrently
> even before the library has been built itself. This can be used to
> increase parallelism in the build or larger projects or operating systems.

My experience is that headers don't necessarily form the best source of
truth about the API exported from a library.  If you follow the Windows
model of marking exported APIs explicitly (__declspec(dllexport) or something)
then okay, but that's a Windows extension and not common in other systems.
Linker scripts seem to be a more popular method; does the tool read linker
scripts to form the content of a TBD file?
Otherwise I'm not seeing a generic improvement in build parallelism.
--paulr





Re: [RFC] Open sourcing and contributing TAPI back to the LLVM community

Eric Fiselier via cfe-dev
I think it makes sense to have support for this input format in the tools.  Since the macOS SDK is slowly switching to this, having the tools work out of the box is a nice feature.  It is rather convenient having a single toolset be sufficient to provide infrastructure for all the targets.

Saleem

On Fri, Sep 8, 2017 at 10:32 AM, Juergen Ributzka via cfe-dev <[hidden email]> wrote:

--
Saleem Abdulrasool
compnerd (at) compnerd (dot) org


Re: [RFC] Open sourcing and contributing TAPI back to the LLVM community

Eric Fiselier via cfe-dev
Hi Juergen,

At a minimum, I think adding the support to libObject etc., so the various LLVM tools can read or even write files from/for OS X, should be fairly non-controversial. How about you go ahead and do that first (I'll happily review if you'd like), and then we can go from there to do anything else with TAPI and LLVM?

Sound good?

-eric
