[Numpy-discussion] Fwd: Github notifications and trac-to-github migration

Fernando Perez fperez.net at gmail.com
Thu Jul 26 13:47:51 EDT 2012


Forwarding Jordi's message on the trac migration; he's having issues
sending it directly.

---------- Forwarded message ----------
From: Jordi Torrents <jordi.t21 at gmail.com>
Date: 2012/7/26
Subject: Re: [Numpy-discussion] Github notifications and
trac-to-github migration
To: Aric Hagberg <aric.hagberg at gmail.com>
Cc: Discussion of Numerical Python <numpy-discussion at scipy.org>

Hi,

Sorry for the delay in reporting feedback on the migration. I was
planning to do it when I came back to Barcelona from Scipy but had
some domestic problems (broken pipe, partially flooded flat) that got
in the way.

2012/7/26 Aric Hagberg <aric.hagberg at gmail.com>:
> On Wed, Jul 25, 2012 at 3:51 AM, Thouis (Ray) Jones <thouis at gmail.com> wrote:
>> On Wed, Jul 25, 2012 at 6:53 AM, Fernando Perez <fperez.net at gmail.com> wrote:
>>> Hi Thouis,
>>>
>>> On Tue, Jul 24, 2012 at 2:46 PM, Thouis (Ray) Jones <thouis at gmail.com> wrote:
>>>> I would estimate I'm between a fourth and halfway through the
>>>> implementation of the trac-to-github-issues migration code.  The work
>>>> lives at https://github.com/thouis/numpy-trac-migration
>>>
>>> mmh, I would have thought you're farther ahead... Aric Hagberg
>>> (@hagberg) and Jordi Torrents (@jtorrents) of NetworkX fame last
>>> weekend completed the trac2github migration for nx, and he said he'd
>>> only had to make a few improvements to your code.
>>>
>>> I'm cc'ing Aric here so he can give us more details, but based on the
>>> fact that they were able to successfully migrate nx completely to GH,
>>> I would have imagined you'd be much, much closer for numpy/scipy.
>>
>> Perhaps my estimate was low.  I hadn't done any work with creating
>> issues on github (only extracting them from Trac into a form that maps
>> onto github issues), but I expect the PyGithub library
>> (https://github.com/jacquev6/PyGithub) helps make the rest of the work
>> easier.  Glad to hear it helped.
>>
>>> Their migration looks pretty solid, including all old comments and
>>> attachments being correctly linked, cf this one:
>>>
>>> https://github.com/networkx/networkx/issues/693
>>
>> Based on that issue, it looks like I wasn't careful enough in temporal
>> ordering of comments, not that it's that critical.
>>
>> Aric, is the code you ended up using available somewhere?
>
> We obviously didn't get all of the details quite right when we
> migrated the Trac tickets to Github issues.  We gave ourselves a day
> or so at the SciPy sprints to do it and made a best effort.  We would
> have never been able to accomplish what we did without the code Ray
> wrote.  Really there wasn't much more to add and we are happy to share
> what we wrote (though it is a hack).  I've cc'd Jordi who wrote the
> extra code.

As Aric says, Ray's code was a life saver for us. We tried several
other scripts for the migration before we learned about Ray's code,
and all of them failed badly. The only important part missing in Ray's
code was the push() method of the class issue in issue.py. We didn't
migrate milestones or labels, so more work is needed to do that. Here
is how we implemented the push method:

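def push(self, repo):
    # Assignee, milestone and labels were not migrated; the arguments that
    # would carry them are left commented out.
    github_issue = repo.create_issue(title=self.github.title,
                                     body=self.github.body)
                                     # assignee=self.github.assignee,
                                     # milestone=self.github.milestone,
                                     # labels=self.github.labels)
    try:
        # One try block around the whole loop: the first failing comment
        # aborts the loop, so later comments are not posted.
        for comment in self.github.comments:
            github_issue.create_comment(comment)
    except Exception:
        print("!!! Error in ticket %s" % self.trac.id)
    finally:
        # Close the issue even if some comments could not be posted.
        if self.github.state == "closed":
            github_issue.edit(state='closed')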

Some formatting in the comments was problematic (more on that below).
In our case this affected approximately 5% of the tickets. Github
returned an ugly HTTP 500 (Internal Server Error), and because the try
block wraps the whole loop, all comments coming after a problematic
one were lost. Handling each comment individually would have prevented
that loss. The move_issues.py code was then simply:

import trac
from ghissues import gh_repo

repo = gh_repo() # Manual auth here

for issue in trac.issues('data/trac.db'):
    print("processing issue %s" % issue.trac.id)
    issue.githubify()
    issue.push(repo)
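
A minimal sketch of the more careful per-comment handling mentioned
above. It assumes only that the issue object offers create_comment(),
as used in push(); the helper name push_comments and the returned list
of failures are hypothetical and were not part of the migration code.

```python
def push_comments(github_issue, comments, ticket_id):
    """Post each comment separately so one failure does not drop the rest.

    Returns a list of (index, error message) pairs for the comments
    that could not be posted.
    """
    failed = []
    for index, comment in enumerate(comments):
        try:
            github_issue.create_comment(comment)
        except Exception as exc:  # e.g. the HTTP 500 described above
            print("!!! Error in ticket %s, comment %d: %s"
                  % (ticket_id, index, exc))
            failed.append((index, str(exc)))
    return failed
```

The returned list could be written to a log so that the failing
comments can be inspected and re-posted by hand later.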

> Briefly here are some of the issues we encountered - Jordi can
> probably add more.
>
> 1) We made and applied a mapping from the changeset hashes in our old
> repository (Mercurial) to the Git changeset hashes.  This mostly
> worked.

We used a regular expression to match the Mercurial hashes of commits
in trac comments, and a map generated by hg-git to translate them. It
turns out that the regular expression was not general enough and we
missed some hashes. The quick and dirty code that we used is:

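import re

# Matches changeset references of the form [<hg hash>/networkx].
m = re.compile(r'\[(.*)/networkx\]')

def load_hg_map(fname='git-mapfile'):
    # In each row the first field is the Git hash and the second field
    # is the Mercurial hash; the map goes from Mercurial to Git.
    hg_map = {}
    with open(fname, 'r') as f:
        for row in f:
            fields = row.split(" ")
            hg_map[fields[1].strip()] = fields[0].strip()
    return hg_map

hg_map = load_hg_map()

def map_hg(hg_hash):
    # Fall back to the original hash when no mapping is known.
    return hg_map.get(hg_hash, hg_hash)

def t2g_markup(s):
    # Only the first changeset reference found in the text is mapped.
    h = m.search(s)
    if h:
        hh = h.group(1)
        s = s.replace(hh, map_hg(hh))
    return s.replace('{{{', "'''").replace('}}}', "'''")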
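
One way to miss fewer hashes would be to substitute every changeset
reference in a comment rather than only the first match. The sketch
below is untested against the real ticket database. It assumes that
references look like [<hex hash>/networkx] and that trac may show
abbreviated hashes, so it falls back to a prefix lookup. The names
CHANGESET_RE, lookup_git_hash and map_changesets are hypothetical.

```python
import re

# Assumed form of a changeset reference: [<hex hash>/networkx]
CHANGESET_RE = re.compile(r'\[([0-9a-f]{6,40})/networkx\]')

def lookup_git_hash(hg_hash, hg_map):
    """Return the Git hash for a full or abbreviated Mercurial hash."""
    if hg_hash in hg_map:
        return hg_map[hg_hash]
    # Abbreviated hash: accept it only if exactly one full hash matches.
    candidates = [git for hg, git in hg_map.items()
                  if hg.startswith(hg_hash)]
    if len(candidates) == 1:
        return candidates[0]
    return hg_hash  # unknown or ambiguous: leave the hash unchanged

def map_changesets(text, hg_map):
    """Rewrite every changeset reference in text, not just the first."""
    def replace(match):
        return "[%s/networkx]" % lookup_git_hash(match.group(1), hg_map)
    return CHANGESET_RE.sub(replace, text)
```

Restricting the pattern to hexadecimal characters keeps one match from
swallowing the text between two references on the same line.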

> 2) We didn't make a mapping between the Git issue numbers and the Trac
> issue numbers so many of the cross references were wrong.  I recommend
> doing that.
>
> 3) You are right that many messages will get sent out so considering
> the impact of that is worthwhile.
>
> 4) Some of the tickets/comments (maybe 50 of the approx 800 tickets
> we converted) had some formatting that broke during conversion.  Jordi
> might have some thoughts on how to fix that.

As I said above, the problematic comments triggered an ugly HTTP 500
(Internal Server Error) from Github. We didn't spend time trying to
debug and fix that. Most of the comments that failed in the migration
contained python code written in trac syntax ({{{#!python ....
}}}). However, not all comments with python code failed. My feeling is
that the problematic parts were the prints that used the percent sign
(%), but I'm not sure about that. Edited trac comments also triggered
an HTTP 500 error, see for instance:

https://networkx.lanl.gov/trac/ticket/609#comment:26
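
Since most failing comments contained trac code blocks, one option
would be to convert them to GitHub fenced code blocks before pushing.
This is only a sketch: it assumes blocks are written as {{{ ... }}}
with an optional #!language line, the names FENCE, TRAC_BLOCK_RE and
convert_code_blocks are hypothetical, and whether this avoids the HTTP
500 has not been tested.

```python
import re

FENCE = "`" * 3  # GitHub Markdown code fence

TRAC_BLOCK_RE = re.compile(
    r'\{\{\{[ \t]*\n?'        # opening braces
    r'(?:#!(\w+)[ \t]*\n)?'   # optional processor line, e.g. #!python
    r'(.*?)'                  # block body
    r'\n?[ \t]*\}\}\}',       # closing braces
    re.DOTALL)

def convert_code_blocks(text):
    """Turn trac {{{ }}} blocks into GitHub code fences or inline code."""
    def replace(match):
        lang = match.group(1) or ""
        body = match.group(2)
        if "\n" not in match.group(0):
            # Single-line {{{code}}} is inline code, not a block.
            return "`%s`" % body
        return "%s%s\n%s\n%s" % (FENCE, lang, body, FENCE)
    return TRAC_BLOCK_RE.sub(replace, text)
```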

The list of almost all issues with problematic comments follows (to
look at them, use https://networkx.lanl.gov/trac/ticket/{id}):

231, 255, 273, 282, 283, 301, 310, 314, 346, 348, 362, 401, 431, 450,
466, 477, 494, 501, 539, 563, 574, 583, 598, 609, 623, 628, 632, 637,
643, 676, 704, 707, 713, 740
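
Returning to point 2 above, the mapping from trac ticket numbers to
GitHub issue numbers could be recorded while pushing and then applied
to the cross references. A minimal sketch, assuming references are
written in trac's #123 or ticket:123 forms and that ticket_map (a
hypothetical dict from trac ticket number to GitHub issue number) was
filled in during the push. The name remap_ticket_refs is hypothetical
too.

```python
import re

# Trac ticket references: "#123" or "ticket:123"
TICKET_REF_RE = re.compile(r'(?<![\w&])(?:#|ticket:)(\d+)\b')

def remap_ticket_refs(text, ticket_map):
    """Rewrite trac ticket references to the matching GitHub issue."""
    def replace(match):
        trac_id = int(match.group(1))
        if trac_id in ticket_map:
            return "#%d" % ticket_map[trac_id]
        return match.group(0)  # no mapping known: leave as written
    return TICKET_REF_RE.sub(replace, text)
```

A comment can reference a ticket that has not been pushed yet, so this
would need either a second pass over the created issues or issue
numbers that are known in advance.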

Ray, thank you very much for your work. Without your code, our
migration would have been a lot more painful and slow.

Salut!


