[SoC2010-General] GSoC 2010 Draft Proposal: Python 3.x tools - Speed Up 2to3 Pattern Matching

G. M. Bond bondg at uoguelph.ca
Thu Apr 8 07:56:24 CEST 2010

Hello GSoC List members, my name is Matt Bond and I am an undergraduate,
soon to be graduate student at the University of Guelph in Canada. I'm
applying to overhaul the pattern matching algorithm in the 2to3 tool.
I've included a draft of my application below, any comments and
suggestions would be appreciated before I submit the application

Cheers, and good luck to the other applicants!

Project Overview
The 2to3 utility is a utility that has been written to help convert
python 2.x programs into working python 3.x programs. However, the tool
currently uses inefficient matching algorithms that significantly
negatively impact the execution time especially for large projects. This
project aims to replace the existing matching algorithm with an
algorithm based on state machines that should provide significantly
improve performance.

During the course of the project, I will perform an analysis of the
existing pattern matching algorithm. This analysis should provide useful
metrics about the performance of the 2to3 tool, including counts of
matching operations, as well as information about how the matches map to
the patterns and the nodes of the AST (Abstract Syntax Tree) that is
created during the tool's operation. This analysis should highlight
existing issues, provide a way to gauge the effectiveness of updated
algorithms, and possibly provide insight into other areas of the tool
that suffer performance issues.

Based on this analysis, and provided the analysis indicates that it is
most likely the best path forward, I will create a NFA (Nondeterministic
Finite Automaton/State Machine) which will model the pattern matching
system. From this NFA I will implement a DFA(Deterministic Finite
Automaton/State Machine). The benefit of this approach is that a
correctly built DFA will allow the pattern matching algorithm to be in
several of the NFA states simultaneously, allowing the algorithm to
correctly return return the correct state of the system without
resorting to backtracking when an unexpected token is read (this is a
simplification, the DFA cannot really be in multiple states at once but
the DFA has states that represent collections of NFA states).

Project Timeline
April 26 - May 24: Community Bonding
 * Become familiar with the PSF's development process, community, tools,
coding standards, and codes of conduct
 * Review documentation and code of the 2to3 tool and ensure the
complete understanding of the existing architecture.
 * Correspond with mentor about best ways to profile/benchmark the 2to3
tool so as to provide useful performance metrics

May 25 -  June 1:
 * Create design documents detailing the changes necessary to profile
the 2to3 tool
 * Seek feedback on the design documents from mentor and make
adjustments. Repeat until a good design is reached.

June 2 - June 15:
 * Implement, test, and document the profiling system

June 16 - June 29:
 * Based on the results of the profiling, create design documents
detailing the changes necessary to rework the pattern matching algorithm
to use a state machine.
 * Seek feedback on the design documents documents from mentor and make
adjustments. Repeat until a good design is reached.

June 30 - July 13:
 * Begin implementation and documentation of redesigned pattern matching

July 12 - July 16:
 * Complete Mid-term evaluation

 July 14 - August 3:
 * Continue implementation and documentation of redesigned pattern
matching algorithm
 * Create additional tests for new pattern matching algorithm, ensure
existing tests still work

August 4 - August 9:
 * Finish up any bug fixes or outstanding issues that have not been
dealt with - no new implementation work
 * Pencils down (August 9)

August 10 - August 16:
 * Clean up code
 * complete any missing documentation
 * improve poor documentation
 * improve poor tests or implement missing tests

August 17-20:
 * Final evaluation

August 21-??:
 * Developers porting python applications from 2.x to 3.x shower the PSF
with appreciation for the new and improved 2to3 tool

About Me
I have completed nearly all of my undergraduate Bachelor of Computing
(B. Comp) degree from the University of Guelph. When I originally
started at Guelph I was in the Engineering program, but my interest in
the more abstract parts of software development caused me to switch into
the computing program. The B. Comp program at Guelph requires students
to pick an "area of application" in addition to computer science. This
area of application is very similar in idea to a minor. The area of
application I have completed is in psychology, as I hope to do work in
the field of Human-Computer Interaction (HCI). Upon the completion of my
undergraduate degree, I hope to complete a M. Sc. in Computer Science
with a focus on HCI.

During this year, I successfully completed a course which involved
building a compiler for a pascal-like language, generating MIPS2000
assembly output. As part of the course, I successfully created a pattern
matching lexer program by converting regular expressions into a NFA,
then converting that NFA into a DFA and implementing it. This experience
should be extremely beneficial during the course of this GSoC project,
as my proposal involves creating an NFA/DFA system. Although the lexer
was written in C, I have made it available to demonstrate the experience
I have had with this process - the code and documentation are available
at (~~Address to be added when I have somewhere to host it~~).

I chose the PSF as the only organization to which I would apply because
of my high regard for the python language. When working with other
students in my courses I frequently advocate the use of Python over
other languages for many applications, and have introduced several of my
peers to it, with almost universally positive results. Because I have
found the language so useful and such a joy to use, I can think of no
better organization to which I could offer my skills this term for the

Although I have never submitted code before to an OSS project, I have
submitted a patch for 2to3 that resolve issue #8029
( http://bugs.python.org/issue8029 ) and another minor issue I
discovered during the process of fixing that issue. Although these are
minor fixes, I hope that they demonstrate my ability and willingness to
contribute to the project.

Contact Information
I can be with any questions or feedback about this proposal at:
email: ~to be added~
phone: ~to be added~
IM: ~to be added~

My blog (which is currently empty but will begin being used as part of
this project) is available at:

Thanks in advance for your time, I look forward to working with the PSF
this summer
Geoffrey Matthew "Matt" Bond
April 8, 2010
Guelph, Ontario, Canada

More information about the SoC2010-General mailing list