[monorepo] Much improved downstream zipping tool available

[monorepo] Much improved downstream zipping tool available

David Blaikie via cfe-dev
Hi all,

I've updated the downstream fork zipping tool that I posted about last
November [1].  It is much improved in every way.  The most important
enhancements are:

- Does a better job of simplifying history

- Handles nested submodules

- Will put non-submodule-update content in a subdirectory of the
  monorepo

- Updates tags

In addition there are plenty of the requisite bug fixes.  The latest
version of the tool can be found here:

https://github.com/greened/llvm-git-migration/tree/zip

With the nested submodules and the subdirectory features, the tool can
now take a downstream llvm repository with submodules (e.g. clang in
tools/clang and so on) as an umbrella and order the commits according to
changes in llvm and its submodules.

Björn, this new version may well be able to handle the tasks you
outlined in December [2].

I've written some recipes as proposed additions to the GitHub migration
proposal [3].  If you have a different scenario, please comment there
and if it seems like a common case I can add a recipe for it so we can
all benefit from the learning.

Much of the bugfixing work was the result of some artificial histories I
created to shake out problems.  I believe it is ready for some testing
in the wild.  If you do try it, please let me know how it worked for you
and any problems you run into.  I will try to fix them.  It's easiest if
you can provide me with a test repository showing the problem but even a
verbal description of what is happening can help.

I hope this tool is helpful to the community.

                             -David

[1] http://lists.llvm.org/pipermail/llvm-dev/2018-November/127704.html
[2] http://lists.llvm.org/pipermail/llvm-dev/2018-December/128620.html
[3] https://reviews.llvm.org/D56550
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: [monorepo] Much improved downstream zipping tool available

David Blaikie via cfe-dev
Thanks for working on this David.

One problem that I've found with our downstream repos (and nested
submodule structure) is that we haven't always been in sync when
updating the llvm and clang repos (judging by the svn-ids).

Basically we can have a commit DL1 on our downstream llvm branch
that is based on commit UL1 from the upstream llvm branch and
points at the submodule commit DC1 in clang (from the downstream
branch), and DC1 could in turn be based on UC1 from the upstream
clang branch.

In the new monorepo UC1 may or may not be a parent to UL1.
We could actually have something like this:

  UL4->UC2->UL3->UL2->UL1->UL0->UC1

Our DL1 commit should preferably have UL1 as parent after
conversion

  UL4->UC2->UL3->UL2->UL1->UL0->UC1
                       |
                 ...->DL1

but since it also includes DC1 (via submodule reference) we
want to zip in DC1 before DL1, right?

  UL4->UC2->UL3->UL2->UL1->UL0->UC1
                       |
            ...->DC1->DL1

The problem is that DC1 is based on UC1, so we would get something
like this

  UL4->UC2->UL3->UL2->UL1->UL0->UC1
                       |         |
            ...->DC1->DL1        |
                  ^              |
                  |              |
                   --------------

Which is not correct, since then we also get the UL0 commit
as predecessor to DL1.
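
To make the ancestry problem concrete, here is a minimal Python sketch
of the graph above.  It is only a toy model: the commit names come from
the diagrams, the parent edges are my reading of them (arrows running
from older to newer commits), and the helper name `ancestors` is made
up for illustration.

```python
# Toy model of the zipped history: each commit maps to its list of parents.
# Assumption: arrows in the diagrams run from older to newer commits.
ZIPPED = {
    "UC2": ["UL4"], "UL3": ["UC2"], "UL2": ["UL3"], "UL1": ["UL2"],
    "UL0": ["UL1"], "UC1": ["UL0"],
    "DC1": ["UC1"],          # downstream clang commit, based on UC1
    "DL1": ["UL1", "DC1"],   # downstream llvm commit with DC1 zipped in
}

# What we would prefer: DL1 descends from UL1 only.
PREFERRED = dict(ZIPPED, DL1=["UL1"])


def ancestors(parents, commit):
    """Return every commit reachable from `commit` by following parent edges."""
    seen = set()
    stack = list(parents.get(commit, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


print("UL0" in ancestors(ZIPPED, "DL1"))     # True: UL0 sneaks in via DC1 -> UC1
print("UL0" in ancestors(PREFERRED, "DL1"))  # False
```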


This makes me wonder if zipping really is that interesting for
us. When checking out an old downstream commit like DL1 in the
monorepo I would not be certain that I see the same version of
clang as in the old split repos (with submodule updates).
Often it would be correct, but not always.

I'll take a look at your updated script to see if it would
make any sense for us to use it (to get some kind of zipped
history). Although, I have a feeling that doing the octopus
merge might be the simple solution for us. If we ever want to
build something old, we would use our old split repos. The
octopus merge would indicate how far back we can do bisects etc.
in the monorepo. Even with some kind of zipping it would be
hard to build/bisect older commits (on our downstream branches).

Regards,
Björn


> -----Original Message-----
> From: David Greene <[hidden email]>
> Sent: 29 January 2019 17:02
> To: [hidden email]; [hidden email]; openmp-
> [hidden email]; [hidden email]; libclc-
> [hidden email]; [hidden email]; [hidden email]
> Cc: Björn Pettersson A <[hidden email]>
> Subject: [monorepo] Much improved downstream zipping tool available
>
> Hi all,
>
> I've updated the downstream fork zipping tool that I posted about last
> November [1].  It is much improved in every way.  The most important
> enhancements are:
>
> - Does a better job of simplifying history
>
> - Handles nested submodules
>
> - Will put non-submodule-update content in a subdirectory of the
>   monorepo
>
> - Updates tags
>
> In addition there are plenty of the requisite bug fixes.  The latest
> version of the tool can be found here:
>
> https://github.com/greened/llvm-git-migration/tree/zip
>
> With the nested submodules and the subdirectory features, the tool can
> now take a downstream llvm repository with submodules (e.g. clang in
> tools/clang and so on) as an umbrella and order the commits according to
> changes in llvm and its submodules.
>
> Björn, this new version may well be able to handle the tasks you
> outlined in December [2].
>
> I've written some recipes as proposed additions to the GitHub migration
> proposal [3].  If you have a different scenario, please comment there
> and if it seems like a common case I can add a recipe for it so we can
> all benefit from the learning.
>
> Much of the bugfixing work was the result of some artificial histories I
> created to shake out problems.  I believe it is ready for some testing
> in the wild.  If you do try it, please let me know how it worked for you
> and any problems you run into.  I will try to fix them.  It's easiest if
> you can provide me with a test repository showing the problem but even a
> verbal description of what is happening can help.
>
> I hope this tool is helpful to the community.
>
>                              -David
>
> [1] http://lists.llvm.org/pipermail/llvm-dev/2018-November/127704.html
> [2] http://lists.llvm.org/pipermail/llvm-dev/2018-December/128620.html
> [3] https://reviews.llvm.org/D56550

Re: [monorepo] Much improved downstream zipping tool available

David Blaikie via cfe-dev
Björn Pettersson A <[hidden email]> writes:

> In the new monorepo UC1 may or may not be a parent to UL1.
> We could actually have something like this:
>
>   UL4->UC2->UL3->UL2->UL1->UL0->UC1
>
> Our DL1 commit should preferably have UL1 as parent after
> conversion
>
>   UL4->UC2->UL3->UL2->UL1->UL0->UC1
>                        |
>                  ...->DL1
>
> but since it also includes DC1 (via submodule reference) we
> want to zip in DC1 before DL1, right?
>
>   UL4->UC2->UL3->UL2->UL1->UL0->UC1
>                        |
>             ...->DC1->DL1
>
> The problem is that DC1 is based on UC1, so we would get something
> like this
>
>   UL4->UC2->UL3->UL2->UL1->UL0->UC1
>                        |         |
>             ...->DC1->DL1        |
>                   ^              |
>                   |              |
>                    --------------
>
> Which is not correct, since then we also get the UL0 commit
> as predecessor to DL1.

To be clear, is DC1 a commit that updates the clang submodule to UC1 and
DL1 a separate local commit to llvm that merges in UL1?

When zip-downstream-fork.py runs, it *always* uses the exact trees in
use by each downstream commit, whether from submodules or the umbrella
itself.  It tries very hard to maintain the state of the trees as they
appeared in the umbrella repository.

Since in your case llvm isn't a submodule (it's the "umbrella"), DL1
will absolutely have the tree from UL1, not UL0.  This is how
migrate-downstream-fork.py works and zip-downstream-fork.py won't touch
the llvm tree since it's not a submodule.  The commit DL1 doesn't update
any submodules so it will just use the clang tree from DC1.

I haven't tested this case explicitly but I would expect the resulting
history graph to look as you diagrammed above (reformatted to make it
clear there isn't a cycle):

   UL4->UC2->UL3->UL2->UL1->UL0->UC1 <- monorepo/master
                        |         |
                        \         |
                         `-----------.        
                                  |   \
                           ... ->DC1->DL1 <- zip/master

The "redundant" edge here is indicating that the state of the llvm tree
at DL1 is based on UL1, not UL0.  All other projects will be in the
state at UC1 (assuming you don't have other submodules under llvm).  I
know it looks strange but this is the best I could come up with because
in general there is no guarantee that submodule updates were in any way
correlated with when upstream commits were made (as you discovered!).
There's some discussion of this issue on the documentation I posted [1],
as well as in header comments in zip-downstream-fork.py.
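
For anyone wondering what "redundant" means in graph terms, here is a
small Python sketch.  It is not taken from zip-downstream-fork.py; the
graph is my transcription of the diagram above and the helper names are
invented for illustration.  A parent edge counts as redundant when that
parent is already reachable through one of the commit's other parents.

```python
# Parent lists transcribed from the diagram above (older -> newer arrows).
HISTORY = {
    "UC2": ["UL4"], "UL3": ["UC2"], "UL2": ["UL3"], "UL1": ["UL2"],
    "UL0": ["UL1"], "UC1": ["UL0"],
    "DC1": ["UC1"],
    "DL1": ["DC1", "UL1"],   # the second edge is the "redundant" one
}


def ancestors(parents, commit):
    """Return every commit reachable from `commit` by following parent edges."""
    seen = set()
    stack = list(parents.get(commit, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def redundant_parents(parents, commit):
    """List parents of `commit` that are also ancestors of another parent."""
    direct = parents.get(commit, [])
    redundant = []
    for parent in direct:
        others = [other for other in direct if other != parent]
        if any(parent in ancestors(parents, other) for other in others):
            redundant.append(parent)
    return redundant


print(redundant_parents(HISTORY, "DL1"))  # ['UL1']
```

In a real repository the same question can be asked per pair of commits
with `git merge-base --is-ancestor`.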

The difficulty with this is that going forward, if you merge from
monorepo/master git will think you already have the changes from UL0.
There are at least two ways to work around this issue.  The first is to
just manually apply the llvm diff from UL1 to UL0 on top of zip/master
and then merge from monorepo/master after that.  The other way is to
freeze your local split repositories and merge from the upstream split
masters for all subprojects before running migrate-downstream-fork.py
and zip-downstream-fork.py.  Then everything will have the most
up-to-date trees and you should be fine going forward.  Doing such a
merge isn't possible for everyone at the time they want to migrate, but
the manual diff/patch method should suffice for those situations.  You
just have to somehow remember to do it before the next merge from
upstream.  Creating an auxiliary branch with the patch applied is one
way to remember.
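
The manual diff/patch workaround could be scripted roughly as follows.
This is a sketch under assumptions, not something shipped with the
tools: the function name, the branch name `llvm-catch-up` and the
commit message are made up, `UL1` and `UL0` stand for the real monorepo
commit ids, and the `run` parameter exists only so the git calls can be
replaced in a dry run.

```python
import subprocess


def catch_up_llvm(older, newer, run=subprocess.run, subdir="llvm",
                  base="zip/master", aux_branch="llvm-catch-up"):
    """Apply the upstream `subdir` changes between two commits on a new branch."""
    # Start an auxiliary branch from the zipped history.
    run(["git", "checkout", "-b", aux_branch, base], check=True)
    # Collect the upstream changes that git will believe are already merged.
    diff = run(["git", "diff", "--binary", older, newer, "--", subdir],
               check=True, capture_output=True)
    # Apply them to the index and working tree, then record them.
    run(["git", "apply", "--index"], input=diff.stdout, check=True)
    run(["git", "commit", "-m",
         f"Apply upstream {subdir} changes {older}..{newer}"], check=True)


if __name__ == "__main__":
    # Dry run: print the git commands instead of executing them.
    class FakeResult:
        stdout = b""

    def show(cmd, **kwargs):
        print(" ".join(cmd))
        return FakeResult()

    catch_up_llvm("UL1", "UL0", run=show)
```

After running it for real, merging from monorepo/master into the
auxiliary branch should pick up everything after UL0 without losing the
UL0 changes to llvm.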

I haven't really thought of a better way to handle situations like this
so I'm open to ideas!

                           -David

[1] https://reviews.llvm.org/D56550

Re: [monorepo] Much improved downstream zipping tool available

David Blaikie via cfe-dev


> -----Original Message-----
> From: David Greene <[hidden email]>
> Sent: 29 January 2019 19:33
> To: Björn Pettersson A <[hidden email]>
> Cc: [hidden email]; [hidden email]; openmp-
> [hidden email]; [hidden email]; libclc-
> [hidden email]; [hidden email]; [hidden email]
> Subject: Re: [monorepo] Much improved downstream zipping tool available
>
> Björn Pettersson A <[hidden email]> writes:
>
> > In the new monorepo UC1 may or may not be a parent to UL1.
> > We could actually have something like this:
> >
> >   UL4->UC2->UL3->UL2->UL1->UL0->UC1
> >
> > Our DL1 commit should preferably have UL1 as parent after
> > conversion
> >
> >   UL4->UC2->UL3->UL2->UL1->UL0->UC1
> >                        |
> >                  ...->DL1
> >
> > but since it also includes DC1 (via submodule reference) we
> > want to zip in DC1 before DL1, right?
> >
> >   UL4->UC2->UL3->UL2->UL1->UL0->UC1
> >                        |
> >             ...->DC1->DL1
> >
> > The problem is that DC1 is based on UC1, so we would get something
> > like this
> >
> >   UL4->UC2->UL3->UL2->UL1->UL0->UC1
> >                        |         |
> >             ...->DC1->DL1        |
> >                   ^              |
> >                   |              |
> >                    --------------
> >
> > Which is not correct, since then we also get the UL0 commit
> > as predecessor to DL1.
>
> To be clear, is DC1 a commit that updates the clang submodule to UC1 and
> DL1 a separate local commit to llvm that merges in UL1?


In llvm (split) we have:

  UL4->UL3->UL2->UL1->UL0
                   \
         ...->DL2->DL1

In clang (split) we have:

  UC4->UC3->UC2->UC1->UC0
                   \
         ...->DC2->DC1


DL1 is a commit that updates the clang submodule to DC1 (and in this
scenario at the same time merges UL1 and DL2 in llvm).


>
> When zip-downstream-fork.py runs, it *always* uses the exact trees in
> use by each downstream commit, whether from submodules or the umbrella
> itself.  It tries very hard to maintain the state of the trees as they
> appeared in the umbrella repository.
>
> Since in your case llvm isn't a submodule (it's the "umbrella"), DL1
> will absolutely have the tree from UL1, not UL0.  This is how
> migrate-downstream-fork.py works and zip-downstream-fork.py won't touch
> the llvm tree since it's not a submodule.  The commit DL1 doesn't update
> any submodules so it will just use the clang tree from DC1.
>
> I haven't tested this case explicitly but I would expect the resulting
> history graph to look as you diagrammed above (reformatted to make it
> clear there isn't a cycle):
>
>    UL4->UC2->UL3->UL2->UL1->UL0->UC1 <- monorepo/master
>                         |         |
>                         \         |
>                          `-----------.
>                                   |   \
>                            ... ->DC1->DL1 <- zip/master
>
> The "redundant" edge here is indicating that the state of the llvm tree
> at DL1 is based on UL1, not UL0.  All other projects will be in the
> state at UC1 (assuming you don't have other submodules under llvm).  I
> know it looks strange but this is the best I could come up with because
> in general there is no guarantee that submodule updates were in any way
> correlated with when upstream commits were made (as you discovered!).
> There's some discussion of this issue on the documentation I posted [1],
> as well as in header comments in zip-downstream-fork.py.

How does git know that it should follow the parent relation from
DL1 to UL1 for the llvm subdir, and not the UL0->UC1->DC1->DL1
path? I mean, if I check out commit DC1 I will see the contribution
from UL0 in the llvm subdir, and DL1 includes the changes from DC1.

(I understand that we never really want to check out the old clang commits
after the migration.  It is the DLx commits that matter and that
should have a synced view between the different subdirs; it is also the
DLx commits that may have old release labels in our downstream release
track.)



Re: [monorepo] Much improved downstream zipping tool available

David Blaikie via cfe-dev
Björn Pettersson A <[hidden email]> writes:

> In llvm (split) we have:
>
>   UL4->UL3->UL2->UL1->UL0
>                    \
>          ...->DL2->DL1
>
> In clang (split) we have:
>
>   UC4->UC3->UC2->UC1->UC0
>                    \
>          ...->DC2->DC1
>
>
> DL1 is a commit that updates the clang submodule to DC1 (and in this
> scenario at the same time merges UL1 and DL2 in llvm).

Ok, in that case I would expect the resulting history to look like this:

    UL4->UC2->UL3->UL2->UL1->UL0->UC1 <- monorepo/master
                         |         \
                         \          `---.
                          `------------. \
                                        \|
                            ... ->DL2->DL1/DC2 <- zip/master
                                        /
                            ... ->DC2--'

As a submodule update, DC1 is "inlined" into DL1 and its commit message
is appended to that of DL1.  I'm presuming here that llvm never updated
the clang submodule to DC2, so it remains an independent commit.

The inlining is done assuming that submodule updates represent a single
logical change.  Submodule updates are assumed to be related to whatever
changes happen in the umbrella so they all get smushed together into one
commit.
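
Here is a rough Python model of that inlining step.  It is a sketch of
the idea only, not code from zip-downstream-fork.py: the `Commit` class
and function name are invented, the example messages are placeholders,
and combining the parents of both commits is my reading of the diagram
rather than a statement about what the script does internally.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    name: str
    message: str
    parents: list = field(default_factory=list)


def inline_submodule_update(umbrella, submodule):
    """Fold a submodule commit into the umbrella commit that points at it."""
    # Keep the umbrella's parents first, then add any new submodule parents.
    parents = list(umbrella.parents)
    for parent in submodule.parents:
        if parent not in parents:
            parents.append(parent)
    return Commit(
        name=f"{umbrella.name}/{submodule.name}",
        # The submodule commit message is appended to the umbrella's message.
        message=umbrella.message.rstrip() + "\n\n" + submodule.message.strip() + "\n",
        parents=parents,
    )


dl1 = Commit("DL1", "Merge upstream llvm, bump clang\n", ["DL2", "UL1"])
dc1 = Commit("DC1", "Merge upstream clang\n", ["DC2", "UC1"])
merged = inline_submodule_update(dl1, dc1)
print(merged.name)     # DL1/DC1
print(merged.parents)  # ['DL2', 'UL1', 'DC2', 'UC1']
```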

The edge UC1->DL1 represents the use of UC1 tree for every project
*except* llvm, because clang was a submodule of llvm (and updated to DC1
which merged UC1) and no other project was a submodule in llvm.  DL1
still has the llvm tree from UL1 plus any local changes you may have
made.
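
Put another way, the tree of the zipped commit is assembled per
top-level directory.  The following Python sketch illustrates that
selection with plain labels instead of real git trees; the function
name, the dictionary layout and the `lld` entry are assumptions made
for the example.

```python
def zipped_tree(upstream_tree, umbrella_tree, submodule_trees):
    """Choose the source of each top-level project for one zipped commit."""
    # Default: every project is in the state of the upstream monorepo commit.
    tree = dict(upstream_tree)
    # The umbrella (llvm) keeps exactly the tree it had downstream.
    tree["llvm"] = umbrella_tree
    # Each submodule uses exactly the tree the umbrella commit pointed at.
    tree.update(submodule_trees)
    return tree


# State of the monorepo at upstream commit UC1 (labels, not real tree ids).
upstream_at_uc1 = {"llvm": "llvm@UL0", "clang": "clang@UC1", "lld": "lld@UC1"}

dl1_tree = zipped_tree(
    upstream_at_uc1,
    umbrella_tree="llvm@DL1",              # UL1 plus local changes
    submodule_trees={"clang": "clang@DC1"},
)
print(dl1_tree)
# {'llvm': 'llvm@DL1', 'clang': 'clang@DC1', 'lld': 'lld@UC1'}
```

Note that nothing from UL0 ends up in the llvm directory even though
UL0 is an ancestor of the zipped commit.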

Admittedly, this is tricky to understand.  Believe me, there were a lot
of headaches involved trying to figure out what the right thing to do
is.  This is my best stab at that.

I don't think I have a test that creates this kind of graph.  It would
be interesting to see if it works.  :) At the moment I'm busy with other
things.  Give it a try and see if it does what you expect.

> How does git know that it should follow the parent relation from
> DL1 to UL1 for the llvm subdir, and not the UL0->UC1->DC1->DL1
> path? I mean, if I check out commit DC1 I will see the contribution
> from UL0 in the llvm subdir, and DL1 includes the changes from DC1.

With the history above this is no longer an issue since you can't check
out DC1 as such.  It's related to the llvm tree in DL1.

Let's say we have a commit DC3 and commit DL3 updated llvm's clang
submodule to DC3.  Commit DC4 was never referenced in a submodule
update.  The graph should then look like this:

    UL4->UC2->UL3->UL2->UL1->UL0->UC1 <- monorepo/master
                         |         \
                         \          `-------.
                          `----------------. \
                                            \|
                       ... ->DL3/DC3->DL2->DL1/DC1 <- zip/master
                             /\             /
                 ... ->DC4--'  `--->DC2----'

DC3 is related to DL3 so it got inlined.  DC2 has an llvm tree based on
DL3.

Hopefully, this is now clear as mud.  :)

                             -David

Re: [monorepo] Much improved downstream zipping tool available

David Blaikie via cfe-dev


> -----Original Message-----
> From: David Greene <[hidden email]>
> Sent: 30 January 2019 22:44
> To: Björn Pettersson A <[hidden email]>
> Cc: [hidden email]; [hidden email]; openmp-
> [hidden email]; [hidden email]; libclc-
> [hidden email]; [hidden email]; [hidden email]
> Subject: Re: [monorepo] Much improved downstream zipping tool available
>
> Björn Pettersson A <[hidden email]> writes:
>
> > In llvm (split) we have:
> >
> >   UL4->UL3->UL2->UL1->UL0
> >                    \
> >          ...->DL2->DL1
> >
> > In clang (split) we have:
> >
> >   UC4->UC3->UC2->UC1->UC0
> >                    \
> >          ...->DC2->DC1
> >
> >
> > DL1 is a commit that updates the clang submodule to DC1 (and in this
> > scenario at the same time merges UL1 and DL2 in llvm).
>
> Ok, in that case I would expect the resulting history to look like this:
>
>     UL4->UC2->UL3->UL2->UL1->UL0->UC1 <- monorepo/master
>                          |         \
>                          \          `---.
>                           `------------. \
>                                         \|
>                             ... ->DL2->DL1/DC2 <- zip/master
>                                         /
>                             ... ->DC2--'
>
> As a submodule update, DC1 is "inlined" into DL1 and its commit message
> is appended to that of DL1.  I'm presuming here that llvm never updated
> the clang submodule to DC2, so it remains an independent commit.
>
> The inlining is done assuming that submodule updates represent a single
> logical change.  Submodule updates are assumed to be related to whatever
> changes happen in the umbrella so they all get smushed together into one
> commit.
>
> The edge UC1->DL1 represents the use of UC1 tree for every project
> *except* llvm, because clang was a submodule of llvm (and updated to DC1
> which merged UC1) and no other project was a submodule in llvm.  DL1
> still has the llvm tree from UL1 plus any local changes you may have
> made.

I still do not understand how that actually works technically, but maybe
it does if you say so. But I also believe that "git log" etc on DL1/DC2
will show that commit UL0 is part of my tree (which it isn't?). This will
be really confusing when looking back at the history when debugging etc.


Re: [monorepo] Much improved downstream zipping tool available

David Blaikie via cfe-dev
Björn Pettersson A <[hidden email]> writes:

>> Ok, in that case I would expect the resulting history to look like this:
>>
>>     UL4->UC2->UL3->UL2->UL1->UL0->UC1 <- monorepo/master
>>                          |         \
>>                          \          `---.
>>                           `------------. \
>>                                         \|
>>                             ... ->DL2->DL1/DC2 <- zip/master
>>                                         /
>>                             ... ->DC2--'
>>
>
> I still do not understand how that actually works technically, but maybe
> it does if you say so. But I also believe that "git log" etc on DL1/DC2
> will show that commit UL0 is part of my tree (which it isn't?). This will
> be really confusing when looking back at the history when debugging etc.

Yes, it will look like UL0 is part of your tree.  The edge from
UL1->DL1, which looks redundant, is actually there as a visual reminder
of the state of the llvm tree.

Unfortunately, git just doesn't have a good way to express the kind of
history we're creating here.  Since redundant edges are oddball in git
and git itself never creates them, I thought it would be strange enough
to stick out as a reminder.

If there's some other way to express this (Git notes?  Commit message?)
that would be more helpful, I'd be happy to consider it.

                             -David