[monorepo] Downstream branch zipping tool available

Kristóf Umann via cfe-dev
Building on the great work that James Knight did on
migrate-downstream-fork.py (Thanks, James!) [1], I've created a simple
tool to take migrated downstream fork branches and zip them into a
single history given a history containing submodule updates of
subprojects [2].

With migrate-downstream-fork.py, one is left with a set of unrelated
histories, one per subproject:

llvm                             clang                       compiler-rt
* V Add my fancy LLVM feature    * G Fix my dumb clang bug   * Z Merge from upstream compiler-rt

One can do an octopus merge to unify them:

  *-- Merge llvm, clang and compiler-rt
  |\ \
  * \ \  V Add my fancy LLVM feature
  |  * |  G Fix my dumb clang bug
  |  | *  Z Merge from upstream compiler-rt

Unfortunately, that doesn't show the logical history of development,
where changes were effectively applied to subprojects in a linear
fashion.  Among other things, this makes bisecting more difficult,
because none of the downstream integration happens until the octopus
merge.

Let's say that downstream you have a local mirror for each LLVM
subproject you work on.  Suppose also that you have an "umbrella"
repository that holds submodule references to all those local mirrors.
Various commits in the umbrella update submodule references:

  * Update llvm submodule to V
  * Update clang submodule to G
  * Don't update any submodules, fix scripts or something
  * Update compiler-rt submodule to Z
  |

zip-downstream-fork.py will take these submodule updates and "inline"
them into the umbrella history, making it appear that the downstream
commits were applied against the monorepo in the order implied by the
umbrella history:

  * A Add my fancy LLVM feature
  * B Fix my dumb clang bug
  * C Merge from upstream compiler-rt
  |

Parent relationships for merges from upstream are preserved, though as
top-level comments in zip-downstream-fork.py explain, the history graph
can look a little strange.  Commits that don't update submodules are
skipped on the assumption that they modify things uninteresting to a
monorepo history.  Such commits could be preserved but doing so has some
caveats as explained in the comments.  Perhaps your umbrella repository
holds your build scripts.  You'd probably want to migrate those to the
zipped history.  If there's strong demand for this I could look into
doing it.
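
As a rough illustration of the ordering described above (not the actual
zip-downstream-fork.py code), here is a minimal sketch.  It assumes the
umbrella history is available as a list of submodule-to-commit maps,
oldest first.  The function name zip_order and the data layout are made
up for the example.

```python
def zip_order(umbrella_history):
    """Return subproject commits in the order the umbrella picked them up.

    umbrella_history: list of dicts, oldest first; each dict maps a
    submodule path to the commit it points at after that umbrella commit.
    """
    ordered = []
    prev = {}
    for submodules in umbrella_history:
        if submodules == prev:
            # No submodule moved: the commit only touched umbrella files,
            # so it is skipped, mirroring the behaviour described above.
            continue
        for path in sorted(submodules):
            if prev.get(path) != submodules[path]:
                ordered.append((path, submodules[path]))
        prev = submodules
    return ordered


# Hypothetical history matching the diagram above (oldest first).
history = [
    {"compiler-rt": "Z"},
    {"compiler-rt": "Z"},                             # scripts-only commit
    {"compiler-rt": "Z", "clang": "G"},
    {"compiler-rt": "Z", "clang": "G", "llvm": "V"},
]
print(zip_order(history))
```

The scripts-only commit contributes nothing to the result, which is the
skipping behaviour the paragraph above describes.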

There are various other limitations to the tool explained in the
comments.  It was enough to get us going and I'm hopeful it will be
useful for others.  It seems to do the right thing with our repositories
but YMMV.  Feel free to open PRs with bug fixes.  :)

To get this to work, you'll need to apply a PR for
migrate-downstream-fork.py to fix issues with --revmap-out [3].

                        -David

[1] https://github.com/jyknight/llvm-git-migration/blob/master/migrate-downstream-fork.py
[2] https://github.com/jyknight/llvm-git-migration/pull/2/commits/a3b44a294c20f1762cb42b5794e6130c5b27f22d
[3] https://github.com/jyknight/llvm-git-migration/pull/1
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: [monorepo] Downstream branch zipping tool available

Kristóf Umann via cfe-dev
Hi David.

Thanks for sharing your branch zipping migration script.


Unfortunately I think our situation is a little bit more complicated.

We have used llvm as the umbrella repo, so in llvm we have a "master"
branch (from the git single repo version of llvm) and a couple of
downstream branches (let's call them "down0", "down1") containing our
downstream work (with frequent merges from "master").
The downstream branches have tools/clang and runtimes/compiler-rt as
submodules, as well as a couple of downstream submodules.

In our downstream version of clang we have a similar structure.
A "master" branch (mapping to the git single repo version clang),
and a couple of downstream branches. The downstream branches have
tools/extra (i.e. clang-tools-extra) as a submodule.

I can also mention that the clang, compiler-rt and clang-tools-extra
submodules aren't present from the beginning of history. They have
been added later on.


I doubt that zip-downstream-fork.py will work out-of-the-box.
Hopefully I'll be able to patch it for our scenario. Any guidelines
might be helpful. But maybe it isn't even worth trying to adapt
zip-downstream-fork.py to do something useful for our scenario?


If someone else has a similar scenario, let me know. Perhaps we can
do some joint effort in adapting the zipper script.

Regards,
Björn



> -----Original Message-----
> From: llvm-dev <[hidden email]> On Behalf Of David
> Greene via llvm-dev
> Sent: den 12 november 2018 22:27
> To: [hidden email]; [hidden email]; libcxx-
> [hidden email]; [hidden email]; [hidden email];
> [hidden email]; [hidden email]
> Subject: [llvm-dev] [monorepo] Downstream branch zipping tool available

Re: [monorepo] Downstream branch zipping tool available

Kristóf Umann via cfe-dev
Björn Pettersson A <[hidden email]> writes:

> We have used llvm as the umbrella repo, so in llvm we have a "master"
> branch (from the git single repo version of llvm) and a couple of
> downstream branches (let's call them "down0", "down1") containing our
> downstream work (with frequent merges from "master").

Ok.

> The downstream branches have tools/clang and runtimes/compiler-rt as
> submodules, as well as a couple of downstream submodules.

Ok.

> In our downstream version of clang we have a similar structure.
> A "master" branch (mapping to the git single repo version clang),
> and a couple of downstream branches. The downstream branches have
> tools/extra (i.e. clang-tools-extra) as a submodule.

So the clang submodule in llvm has a submodule itself?  I wasn't even
aware that was possible.

> I can also mention that the clang, compiler-rt and clang-tools-extra
> submodules aren't present from the beginning of history. They have
> been added later on.

That shouldn't be a problem for the script.  We have the same sort of
history.

> I doubt that zip-downstream-fork.py will work out-of-the-box.
> Hopefully I'll be able to patch it for our scenario. Any guidelines
> might be helpful. But maybe it isn't even worth trying to adapt
> zip-downstream-fork.py to do something useful for our scenario?

Yeah, non-submodule-update commits in the llvm repository would be
dropped per this comment:

# - The script assumes that any commits in the umbrella history that
#   do not update submodules should be discarded.  It is not clear
#   what should happen if such a commit happens to touch files with
#   the same name as those in the monorepo (README files are typical).
#   Adding support to keep these commits should be straightforward,
#   but because decisions are likely to vary based on particular
#   setups, we just punt for now.

This happens around line 288 in zip-downstream-fork.py:

    if self.prev_submodules == submodules:
      # This is a commit that modified some file in the umbrella and
      # didn't update any submodules..  Assume we don't want it.
      self.debug('No submodule updates')
      return self.substitute_commit(commit, githash)

If you return commit here instead of calling substitute_commit, it
should retain the commit unaltered.  That's not quite what you want for
the monorepo, though: you want commits to llvm to appear under the llvm
directory in the monorepo.  The code to do that is in
migrate-downstream-fork.py around line 106 in commit_filter:

    # OK -- NOT an upstream commit: move the tree under the correct subdir, and
    # preserve everything outside that subdir.  The tricky part is figuring out
    # *which* parent to get the rest of the tree (other than the named subproject)
    # from, in case of a merge.

You could try to copy this verbatim into zip-downstream-fork.py or it
could be factored out into a common library.  If a significant number of
people have a setup similar to yours, it may very well be worth doing
that.  You'd also need to add the check for upstream commits.
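
To make "move the tree under the correct subdir" concrete, here is a
simplified sketch.  It is not the code from commit_filter: it assumes
trees are flat dicts mapping paths to blob ids, and it sidesteps the
question of which parent supplies the rest of the tree for a merge.
The name move_under_subdir is hypothetical.

```python
import posixpath


def move_under_subdir(tree, subdir, base_tree=None):
    """Return a new flat tree with `tree` nested under `subdir`.

    tree, base_tree: dicts mapping repository-relative paths to blob ids.
    Entries of base_tree outside `subdir` are preserved; anything it
    already had under `subdir` is replaced by the contents of `tree`.
    """
    prefix = subdir.rstrip("/") + "/"
    result = {
        path: blob
        for path, blob in (base_tree or {}).items()
        if not path.startswith(prefix)
    }
    for path, blob in tree.items():
        result[posixpath.join(subdir, path)] = blob
    return result


# Hypothetical commit to the llvm umbrella, and the monorepo tree of its parent.
llvm_commit_tree = {"README.txt": "b1", "lib/IR/Foo.cpp": "b2"}
monorepo_parent = {"clang/README.txt": "c1", "llvm/README.txt": "old"}
print(move_under_subdir(llvm_commit_tree, "llvm", monorepo_parent))
```

Note how the README.txt clash mentioned in the script comment goes away
once the llvm files live under llvm/ rather than at the top level.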

Now that I think about it, what you really want is something that runs
migrate-downstream-fork.py on the commits in llvm and something that
runs zip-downstream-fork.py on commits in other projects, but they have
to run simultaneously to keep the commits in the proper order.  If both
migrate-downstream-fork.py and zip-downstream-fork.py were refactored to
put most of their code in a package/library, then a third tool could be
created to do what you need.  Obviously, that will take some work to
accomplish.  You'd also want James' guidance on changing
migrate-downstream-fork.py.  There are certain enhancements to
zip-downstream-fork.py that I didn't make because I didn't want to mess
with migrate-downstream-fork.py (see the comments at the top of
zip-downstream-fork.py).

zip-downstream-fork.py also doesn't consider submodules of other
submodules.  You may be able to get that to work by altering how
find_submodules looks for submodule commits.  It would have to recurse
over the submodules it finds.
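
A sketch of what such a recursion could look like, independent of how
find_submodules is actually written.  It assumes tree listings in the
"mode type sha<TAB>path" form printed by git ls-tree -r, where gitlinks
show up with mode 160000.  The list_tree callback is a hypothetical
hook; a real tool would have to run git in the right repository for
each submodule.  Paths that git would quote are not handled.

```python
GITLINK_MODE = "160000"


def parse_gitlinks(ls_tree_output):
    """Extract {path: commit} for gitlink entries from ls-tree style text."""
    links = {}
    for line in ls_tree_output.splitlines():
        if not line.strip():
            continue
        meta, path = line.split("\t", 1)
        mode, _objtype, sha = meta.split()
        if mode == GITLINK_MODE:
            links[path] = sha
    return links


def find_submodules_recursive(commit, list_tree, prefix=""):
    """Collect gitlinks reachable from `commit`, descending into submodules.

    list_tree: callable taking a commit id and returning ls-tree style
    text for that commit (hypothetical hook, see above).
    """
    found = {}
    for path, sha in parse_gitlinks(list_tree(commit)).items():
        full_path = prefix + path
        found[full_path] = sha
        # Recurse: the submodule's own tree may contain further gitlinks.
        found.update(find_submodules_recursive(sha, list_tree, full_path + "/"))
    return found


# Fake listings standing in for llvm -> tools/clang -> tools/extra.
fake_trees = {
    "llvm1": "100644 blob aaa\tREADME.txt\n160000 commit clang1\ttools/clang\n",
    "clang1": "160000 commit extra1\ttools/extra\n",
    "extra1": "100644 blob bbb\tREADME.txt\n",
}
print(find_submodules_recursive("llvm1", fake_trees.__getitem__))
```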

> If someone else has a similar scenario, let me know. Perhaps we can
> do some joint effort in adapting the zipper script.

Unfortunately, I don't have any bandwidth to hack on this right now.
I'm happy to answer questions, though.

                               -David

Re: [llvm-dev] [monorepo] Downstream branch zipping tool available

Kristóf Umann via cfe-dev
David Greene via llvm-dev <[hidden email]> writes:

> Now that I think about it, what you really want is something that runs
> migrate-downstream-fork.py on the commits in llvm and something that
> runs zip-downstream-fork.py on commits in other projects, but they have
> to run simultaneously to keep the commits in the proper order.

After pondering this overnight, I think a better approach might be to do
the enhancement of zip-downstream-fork.py described in the comments:

# - The script requires a history with submodule updates.  It should
#   be fairly straightforward to enhance the script to take a revlist
#   directly, ordering the commits according to the revlist.  Such a
#   revlist could be generated from an umbrella repository or via
#   site-specific mechanisms.  This would be passed to
#   fast_filter_branch.py directly, rather than generating a list via
#   expand_ref_pattern(self.reflist) in Zipper.run as is currently
#   done.  Changes would need to be made to fast_filter_branch.py to
#   accept a revlist to process directly, bypassing its invocation of
#   git rev-list within do_filter.

If that were done, then it should be possible to write a tool to
generate such a revlist from your llvm master project.  The tool would
examine each commit in llvm and if it were a commit to llvm, it would
add its hash to the revlist.  If it were a submodule update it would
traverse the gitlink to find the commit in the corresponding project
(see find_submodules_in_entry in zip-downstream-fork.py).  It would then
add that commit's hash to the revlist.  If a commit updates multiple
submodules then you just have to pick an arbitrary order.
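
Here is a minimal sketch of such a revlist generator, assuming the
traversal has already reduced each llvm commit to a small record.  The
record layout, the "touches_own" flag (changes to llvm's own files,
gitlinks aside) and the name build_revlist are all invented for the
example.

```python
def build_revlist(umbrella_commits):
    """Return commit hashes in the order they should be replayed.

    umbrella_commits: oldest-first list of dicts with keys
      "hash"        - the umbrella (llvm) commit id
      "touches_own" - True if the commit changes llvm's own files
      "gitlinks"    - {submodule path: commit id} after the commit
    """
    revlist = []
    prev_links = {}
    for commit in umbrella_commits:
        if commit["touches_own"]:
            revlist.append(commit["hash"])
        links = commit["gitlinks"]
        # Several submodules may move in one commit; sorting by path is
        # just one arbitrary but reproducible choice of order.
        for path in sorted(links):
            if prev_links.get(path) != links[path]:
                revlist.append(links[path])
        prev_links = links
    return revlist


commits = [
    {"hash": "L1", "touches_own": True, "gitlinks": {}},
    {"hash": "L2", "touches_own": False, "gitlinks": {"tools/clang": "C1"}},
    {"hash": "L3", "touches_own": True, "gitlinks": {"tools/clang": "C1"}},
    {"hash": "L4", "touches_own": False,
     "gitlinks": {"tools/clang": "C2", "runtimes/compiler-rt": "R1"}},
]
print(build_revlist(commits))
```

Like the tool itself, this takes only the new gitlink target for each
submodule update.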

All of the code to do the traversal is already in
zip-downstream-fork.py.  You could enhance it to output a revlist in the
same way fast_filter_branch can output a revmap and have it not actually
rewrite any commits.  You would have to tell it to not skip
non-submodule-update commits as described in my previous message.

This all assumes that each submodule update only adds one new commit
from the project linked by the submodule (zip-downstream-fork.py also
makes this assumption).  If a submodule update represents moving a
submodule up multiple commits, then you'd need something that can walk
that history and add hashes to the revlist.
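
One way to walk that history, sketched under the assumption that each
project's history between the two gitlink targets is linear along first
parents.  It is roughly what git rev-list --reverse --first-parent
old..new would list in that case.  The first_parent map and the name
commits_between are hypothetical.

```python
def commits_between(old, new, first_parent):
    """List commits after `old` up to and including `new`, oldest first.

    first_parent: dict mapping a commit id to its first parent (or None
    for a root commit).  `old` may be None when the submodule was just
    added, in which case the walk goes back to the root.
    """
    chain = []
    current = new
    while current is not None and current != old:
        chain.append(current)
        current = first_parent.get(current)
    if old is not None and current is None:
        raise ValueError(f"{old} is not a first-parent ancestor of {new}")
    chain.reverse()
    return chain


# Hypothetical clang history: C1 <- C2 <- C3 <- C4
first_parent = {"C1": None, "C2": "C1", "C3": "C2", "C4": "C3"}
print(commits_between("C1", "C4", first_parent))
```

The resulting hashes would be appended to the revlist in place of the
single new gitlink target.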

The more I think about it, the more it seems to me that this is the
easiest way to go.  It's much less work than refactoring two tools and
should require relatively minimal changes to migrate-downstream-fork.py.

                            -David