[Python-Dev] Speeding up 2to3: Results from a GSOC Project

Guido van Rossum guido at python.org
Fri Jul 30 22:43:47 CEST 2010


Great! CS FTW!

--Guido

On Fri, Jul 30, 2010 at 1:28 PM, George Boutsioukis
<gboutsioukis at gmail.com> wrote:
> Hi everyone, I haven't had a chance to introduce myself yet. I'm George
> Boutsioukis, a CS student from Greece and I'm currently enrolled as a GSOC
> student for the PSF. The task I was involved with for the past few weeks was
> speeding up the 2to3 tool.
>
> For those who are not aware of 2to3's internals, the tool matches a series
> of tree patterns to a python syntax tree and applies a code transformation
> on each match. The real bottleneck of this tool is the tree pattern matching
> algorithm, since the current version traverses all nodes of the tree,
> top-to-bottom, and checks each node against a set of tree patterns.
>
> The new algorithm I've implemented reduces the candidate nodes for matching.
> The process works in three steps:
>
> 1) Each pattern tree is reduced to the most 'characteristic' path, i.e. the
> rarest path you'll encounter on a real code tree. (this is simpler than it
> sounds)
>
> 2) From the set of the above linear patterns an Aho-Corasick automaton is
> constructed that is capable of inclusively matching all patterns(meaning
> that it may mark a wrong candidate but will never miss an instance)
>
> 3) The automaton is run on each leaf until the root of the tree is reached
> and the resulting match set is returned. And finally we apply each fixer
> for each node in the match set. Of course the process is a bit more
> complicated since we have to recheck the transformed code for new matches etc.
>
> Since it is quite a hard concept to illustrate in a post, tell me if you
> have any questions or need more info.
>
> The benefit of this somewhat complex process is substantial; the total run
> time is reduced by roughly 50%. The new module is still experimental code,
> but mature enough to pass all tests. I propose to include the new module
> with the next version of Python, perhaps as an explicit option for 2to3
> until it can be thoroughly tested.
>
> The code repository can be found here:
> https://code.google.com/p/2to3-speedup2/source/browse/
> (the code is a fork from py3k's 2to3 and was tested with version 3.1.2)
>
> The blog I've been using for GSOC(containing more details on the matching
> algorithm): http://george-gsoc.blogspot.com/
>
> Regards,
> George
>
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org
>



-- 
--Guido van Rossum (python.org/~guido)


More information about the Python-Dev mailing list