RFC: Bugzilla migration plan


RFC: Bugzilla migration plan

David Chisnall via cfe-dev
Dear all,

Over the last few weeks, with the help of GH folks, I've been exploring
the options for the Bugzilla migration. I believe we have finally arrived
at a viable solution, which is detailed below.

It turned out that GitHub has an internal project rehydration tool
that can populate an empty repo from a simple serialized format. This
approach has a big advantage over using the GH API: we are not bound by
its various thresholds and throttling limits (remember that we need to
import 35k+ bz issues). The downside is that rehydration requires an
empty repo, and we cannot delete the current llvm-project: doing so
would lose releases, fork connections, stars and watches. Unfortunately,
there is no way to recreate releases while keeping their original dates,
so this is a no-go for us. Losing fork connections would strongly affect
downstream users as well. This led to the following scheme:

1. Migrate Bugzilla to a new repo, say, llvm-bugzilla-import, using the
internal storage format.
2. Install redirects llvm.org/PR1234 => gh/llvm/llvm-bugzilla-import/issues/1234
3. Wipe existing issues and pull requests in llvm-project
4. Migrate all issues from llvm-bugzilla-import to llvm-project using
the GH API. GitHub will take care of the llvm-bugzilla-import/issues/1234 =>
llvm-project/issues/5678 redirects

The only downside of this approach is that we will be seeing 30k
events like "llvm-bugzilla-import/issues/1234 migrated to
llvm-project/issues/5678".
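
As a rough illustration of the llvm.org-side redirect in step 2, here is a minimal sketch of how a PR path could be mapped to its archive URL. The function name `redirect_target` is hypothetical, and the repo name is only the one suggested above ("say, llvm-bugzilla-import"); the real redirect would more likely live in the web server configuration.

```python
import re

# Assumption: the archive repo name is only the suggestion from the plan.
ARCHIVE_ISSUES = "https://github.com/llvm/llvm-bugzilla-import/issues/"

# Accept exactly "/PR<digits>" and nothing else.
_PR_PATH = re.compile(r"/PR(\d+)")


def redirect_target(path):
    """Return the archive URL for an llvm.org "/PR<N>" path, or None otherwise."""
    match = _PR_PATH.fullmatch(path)
    if match is None:
        return None
    # int() drops leading zeros, so "/PR0042" and "/PR42" map to the same issue.
    return ARCHIVE_ISSUES + str(int(match.group(1)))
```

This only works as a fixed rule because the archive is expected to preserve Bugzilla numbering; the second hop to llvm-project is left to GitHub.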

Here is the tentative timeline / list of action points:

1. Collect the mapping email (used by bugzilla) => GH account name
(used by issues). We are going to collect using different sources:
  - Auto-populating the mapping from the list of known committers
  - Asking GH API (works only if a person made their email public and
only when allowed by local law)
  - Emailing everyone who submitted to Bugzilla over the last year or
maybe two, asking them to fill in a form with their GH username
  - We would likely allow a month or so to let everyone respond.
2. While 1. is in progress, we will work on various format issues for
the migration. For this we will probably use the first 1k issues or so. It
would be nice to include some meta-bugs here to ensure we can
recreate such issues. Things to consider:
  - Comment migration (GH uses markdown everywhere, so we'd need to
carefully escape bugzilla contents)
  - Components => labels mapping and migration
  - Linking between the issues. Maybe automatically replace PR1234 in
the text with #1234 to enable auto linking.
  - Authorship: reporter / commenter
  - Attachments
3. After we are sure everyone is ready, we will do the test migration
of the whole bugzilla.
  - Estimate the time required for such a transition.
  - Fix remaining issues, if any
4. Put bugzilla into read-only mode and perform the final migration to
llvm-bugzilla-archive
5. Wipe issues / PRs in llvm-project repo and perform migration from
llvm-bugzilla-archive to llvm-project
6. Migration done. Bugzilla will probably be kept in read-only mode
for some time, for the sake of consistency and in case any issues
are found.
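
Two of the action points above (collecting the email => GH account mapping from several sources, and escaping Bugzilla text for Markdown) can be sketched as follows. This is only an illustration under stated assumptions: the helper names `merge_mappings` and `escape_markdown` are hypothetical, which source should win on a conflict is left to the caller, and the set of escaped characters is a guess at what GitHub Markdown would otherwise interpret.

```python
import re


def merge_mappings(*sources):
    """Merge several {email: github_login} dicts into one.

    Assumption: sources are passed in priority order, so the first
    source that knows an email wins. Emails are compared case-insensitively.
    """
    merged = {}
    for source in sources:
        for email, login in source.items():
            merged.setdefault(email.strip().lower(), login)
    return merged


# Punctuation that Markdown may treat as markup (an assumed, not exhaustive, set).
_MD_SPECIAL = re.compile(r"([\\`*_{}\[\]<>#|~])")


def escape_markdown(text):
    """Backslash-escape Markdown punctuation so plain Bugzilla text renders literally."""
    return _MD_SPECIAL.sub(r"\\\1", text)
```

For example, `escape_markdown("*p = x[0];")` yields `\*p = x\[0\];`. Since `#` is escaped too, any rewriting of PR1234 into #1234 would have to run after escaping, not before.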

Any comments & ideas?
--
With best regards, Anton Korobeynikov
Department of Statistical Modelling, Saint Petersburg State University
_______________________________________________
cfe-dev mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Re: RFC: Bugzilla migration plan

David Chisnall via cfe-dev
Thank you very much for all the work on this Anton!

The steps outlined seem like they should work.

If I remember correctly, one of the main concerns here is making sure that one can still easily find issues based on existing bugzilla IDs in existing comments, commit messages etc.
Do I understand correctly that after the migration, for an existing reference to "PR1234", you'll need to go to https://github.com/llvm/llvm-bugzilla-import/issues/1234 to find it (after our bugzilla server has been shut down)?
That seems workable if we document that well.

One other area where I thought there was quite a bit of debate was about how components will map to labels; mainly triggered because current bug triagers and watchers are looking for how they will be able to set up filters to see the bug updates they are interested in.
I wonder what the most recent thinking is on that?

Thanks,

Kristof


|

Re: RFC: Bugzilla migration plan

David Chisnall via cfe-dev
Hi Kristof,

> If I remember correctly, one of the main concerns here is making sure that one can still easily find issues based on existing bugzilla IDs in existing comments, commit messages etc.
> Do I understand correctly that after the migration, for an existing reference to "PR1234", you'll need to go to https://github.com/llvm/llvm-bugzilla-import/issues/1234 to find it (after our bugzilla server has been shut down)?
> That seems workable if we document that well.
Oh, maybe I was not clear:

- We will install an llvm.org-side redirect, so links like
llvm.org/PR1234 will redirect to
https://github.com/llvm/llvm-bugzilla-import/issues/1234
- Since we will migrate the issues to llvm-project, GitHub will redirect by
itself from https://github.com/llvm/llvm-bugzilla-import/issues/1234
to https://github.com/llvm/llvm-project/issues/XYZ, for whatever value
XYZ will be.

During the Bugzilla import we will also replace PR1234 in the comment
text with #1234, and GitHub will properly rewrite these references to
llvm-project/issues/XYZ ones during the migration.
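
The PR1234 to #1234 replacement could look roughly like the sketch below. The function name `rewrite_pr_references` is hypothetical, and the pattern deliberately handles only the "PR"-prefixed form; other spellings of bug references are not covered.

```python
import re

# Match "PR1234" as a whole token; the \b anchors avoid touching
# identifiers such as "XPR1234" or "PR1234abc".
_PR_REF = re.compile(r"\bPR(\d+)\b")


def rewrite_pr_references(text):
    """Replace Bugzilla-style "PR<N>" references with "#<N>"."""
    return _PR_REF.sub(r"#\1", text)
```

For example, `rewrite_pr_references("Duplicate of PR1234, see also PR42.")` gives `Duplicate of #1234, see also #42.`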

> One other area where I thought there was quite a bit of debate was about how components will map to labels; mainly triggered because current bug triagers and watchers are looking for how they will be able to set up filters to see the bug updates they are interested in.
> I wonder what the most recent thinking is on that?
I made the first set of labels (they are outlined in the Google
doc I made previously) based on the list of components / products we
have in Bugzilla.

--
With best regards, Anton Korobeynikov
Department of Statistical Modelling, Saint Petersburg State University

Re: [llvm-dev] RFC: Bugzilla migration plan

David Chisnall via cfe-dev
On Fri, 10 Jul 2020 at 09:11, Anton Korobeynikov via llvm-dev
<[hidden email]> wrote:
> 3. Wipe existing issues and pull requests

Does this really wipe the "auto-increment" IDs used by PRs and issues
and start from zero again?

> 4. Migrate all issues from llvm-bugzilla-import to llvm-project using
> GH API. Github will take about llvm-bugzilla-import/issues/1234 =>
> llvm-project/issues/5678 redirects

If we're setting a redirect, PR1234 wouldn't hit #5678. We either
guarantee that the IDs will be identical or we'll need a smart
redirect that will know the delta (or 1:1 relationship).

Re: [llvm-dev] RFC: Bugzilla migration plan

David Chisnall via cfe-dev
> On Fri, 10 Jul 2020 at 09:11, Anton Korobeynikov via llvm-dev
> <[hidden email]> wrote:
> > 3. Wipe existing issues and pull requests
> Does this really wipes the "auto-increment" IDs used by PRs and issues
> and starts from zero again?
I will need to clarify whether we will be able to reset the counter or not.

> > 4. Migrate all issues from llvm-bugzilla-import to llvm-project using
> > GH API. Github will take about llvm-bugzilla-import/issues/1234 =>
> > llvm-project/issues/5678 redirects
> If we're setting a redirect, PR1234 wouldn't hit #5678. We either
> guarantee that the IDs will be identical or we'll need a smart
> redirect that will know the delta (or 1:1 relationship).
Why? If you migrate the issue inside GH, then GH does the necessary
redirects on its side.

So, we will take care of the PR1234 =>
llvm-bugzilla-import/issues/1234 redirect, and GitHub will further
redirect from llvm-bugzilla-import/issues/1234 to
llvm-project/issues/5678.
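
To make the two-hop chain concrete, here is a small simulation of how a request would be resolved. The redirect table is hypothetical (5678 is just the placeholder number used in this thread), and `resolve` is an illustrative helper, not anything GitHub or llvm.org provides.

```python
def resolve(url, redirects, max_hops=10):
    """Follow a {old_url: new_url} table until no further redirect applies."""
    hops = 0
    while url in redirects:
        if hops == max_hops:
            raise RuntimeError("too many redirects")
        url = redirects[url]
        hops += 1
    return url


# Hop 1 is installed on llvm.org; hop 2 is created by GitHub when the issue is migrated.
redirects = {
    "llvm.org/PR1234": "github.com/llvm/llvm-bugzilla-import/issues/1234",
    "github.com/llvm/llvm-bugzilla-import/issues/1234": "github.com/llvm/llvm-project/issues/5678",
}

final_url = resolve("llvm.org/PR1234", redirects)
```

The llvm.org side never needs to know the final number: it only depends on the archive keeping Bugzilla numbering, while the archive-to-project hop is recorded per issue at migration time.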

--
With best regards, Anton Korobeynikov
Department of Statistical Modelling, Saint Petersburg State University

Re: RFC: Bugzilla migration plan

David Chisnall via cfe-dev
If I recall correctly, the previous discussion had a fair bit of
pushback on having two numbering systems. In part because the old bug
numbers are littered throughout the codebase, commit messages, etc,
and aren't always prefixed with "PR", sometimes just "bug XXXX" -
having two numberings is likely to mean some amount of
confusion/friction/ongoing cost of looking in multiple places for
bugs.

(but I'm not personally holding this process up on that issue - I
think I can live with it, if that's what those with the
time/inclination to make this migration happen decide is the right
tradeoff)


Re: [llvm-dev] RFC: Bugzilla migration plan

David Chisnall via cfe-dev
On Fri, 10 Jul 2020 at 16:20, Anton Korobeynikov
<[hidden email]> wrote:
> Why? If you migrate the issue inside GH, then GH does the necessary
> redirects on its side.
>
> So, we will be taking care about PR1234 =>
> llvm-bugzilla-import/issues/1234 redirect and github will further
> redirect from llvm-bugzilla-import/issues/1234 to
> llvm-project/issues/5678.

David's reply is what I was referring to: all the other PRXXX references
all over the place won't change.

Over time, the bugs alive today will end up with two references, 1234
and 5678 and it will be confusing which bug you're talking about.

However, like David, I don't think that's a blocker. I expect the two
ranges to be disjoint, so no clashes.

LGTM otherwise. While I weirdly like Bugzilla, I think we'll be much
better off using anything else.

cheers,
--renato

Re: [llvm-dev] RFC: Bugzilla migration plan

David Chisnall via cfe-dev
> Over time, the bugs alive today will end up with two references, 1234
> and 5678 and it will be confusing which bug you're talking about.
Yes, I agree, this might be a problem; however, I'm not 100% sure we
can reset the counter. Still, we will see if we can ensure the same
numbering during the migration (e.g. by keeping just the first ~250 bz issues
in the archive and trying to continue the numbering in the llvm-project
repo). But at least the redirect would provide the mapping as a last
resort.
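
The idea of continuing the numbering can be sketched as below. It assumes that the issue counter cannot be reset, that new issues receive consecutive numbers, and that the first free number in llvm-project is known; `plan_numbering` and the number 251 in the example are hypothetical.

```python
def plan_numbering(bugzilla_ids, first_free_number):
    """Split Bugzilla IDs by whether they can keep their number in llvm-project.

    Returns (stay_in_archive, keep_number, placeholders):
      - stay_in_archive: IDs below the first free number, already taken in llvm-project
      - keep_number: IDs that can be recreated under the same number
      - placeholders: unused Bugzilla numbers that would need a dummy issue
        so that the counter stays aligned (an assumption of this sketch)
    """
    ids = sorted(set(bugzilla_ids))
    stay_in_archive = [i for i in ids if i < first_free_number]
    keep_number = [i for i in ids if i >= first_free_number]
    placeholders = []
    if keep_number:
        present = set(keep_number)
        placeholders = [
            n for n in range(first_free_number, keep_number[-1]) if n not in present
        ]
    return stay_in_archive, keep_number, placeholders
```

For instance, `plan_numbering([1, 2, 250, 251, 253], 251)` returns `([1, 2, 250], [251, 253], [252])`: the first three stay reachable only through the archive, and number 252 would need a filler issue.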

--
With best regards, Anton Korobeynikov
Department of Statistical Modelling, Saint Petersburg State University

Re: RFC: Bugzilla migration plan

David Chisnall via cfe-dev
On 7/10/20 1:12 PM, David Blaikie via cfe-dev wrote:
> If I recall correctly, the previous discussion had a fair bit of
> pushback on having two numbering systems. In part because the old bug
> numbers are littered throughout the codebase, commit messages, etc,
> and aren't always prefixed with "PR", sometimes just "bug XXXX" -
> having two numberings is likely to mean some amount of
> confusiong/friction/ongoing cost to looking in multiple places for
> bugs.
>

This was my take away from the discussions too.  I think it's important
that we try to preserve the single numbering system.

-Tom


Re: [llvm-dev] RFC: Bugzilla migration plan

David Chisnall via cfe-dev
On Fri, Jul 10, 2020 at 10:47 AM Anton Korobeynikov via llvm-dev
<[hidden email]> wrote:
>
> > Over time, the bugs alive today will end up with two references, 1234
> > and 5678 and it will be confusing which bug you're talking about.
> Yes, I agree, this might be a problem, however I'm not 100% sure we
> could reset the counter. Still we will see if we could ensure the same
> numbering during the migration (e.g just keeping first ~250 bz issues
> in the archive and try to continue the numbering in the llvm-project
> repo). But at least the redirect would provide the mapping as a last
> resort.

I believe that was the sort of notion proposed (maybe by James Y
Knight?) as one solution in the previous thread: importing so they
line up, and potentially rewriting those first 250 (refiling the
original GitHub ones on top of the imported ones, then rewriting the
original GitHub ones to be the original Bugzilla ones). Apparently this
comes with some loss of fidelity: pull requests etc. use the same numbering,
so some of those early Bugzilla bugs would end up being written
into pull requests. But they're so old that some slightly quirky rendering
seems OK.

(honestly, if it made it /way/ easier - probably having those first
250 remain what they are, and adding a comment or something "hey, if
you're looking for the original bugzilla bug 123, it's over at issue
5047" or whatever - I doubt those couple of hundred bugs are often
looked at, etc)


Re: [llvm-dev] RFC: Bugzilla migration plan

David Chisnall via cfe-dev
> (honestly, if it made it /way/ easier - probably having those first
> 250 remain what they are, and adding a comment or something "hey, if
> you're looking for the original bugzilla bug 123, it's over at issue
> 5047" or whatever - I doubt those couple of hundred bugs are often
> looked at, etc)
Right. We will move to this issue as soon as we have an archive in
place with proper numbering. Then we will see how to proceed with the
final step.

--
With best regards, Anton Korobeynikov
Department of Statistical Modelling, Saint Petersburg State University