[pypy-issue] [issue340] rewrite PyPy's tokenizer

Buck Golemon tracker at bugs.pypy.org
Tue Mar 19 21:08:08 CET 2013

Buck Golemon <buck.golemon at gmail.com> added the comment:

Just recording this here for everyone's information.

On Mon, Mar 18, 2013 at 10:42 PM, Benjamin Peterson <benjamin at python.org> wrote:
Hi Buck,
I wanted to say a bit more about what a better PyPy tokenizer would
look like. If you look in pypy/interpreter/pyparser/pytokenizer.py,
you'll see the main routine is generate_tokens(). I think that
routine is fine overall. You'll notice it uses things like "endDFA"
and "whiteSpaceDFA" for matching. Looking under the layers, you'll see
this is basically a bunch of automatically generated icky DFAs. That
would be an excellent place to use rply's regular expressions.
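To make the suggestion concrete, here is a minimal sketch of what a regex-backed replacement for one of those generated DFA objects might look like. It assumes that generate_tokens() only needs to ask "starting at position pos in this line, where does the match end?" and treats -1 as "no match". That interface, the class name RegexMatcher, and every token pattern below are hypothetical illustrations, not PyPy's actual code or token definitions. It uses Python's standard re module rather than rply so that it runs without extra dependencies.

```python
import re


class RegexMatcher:
    """Hypothetical regex-backed stand-in for a generated DFA object.

    Assumption: callers want the end index of a match that starts exactly
    at `pos`, or -1 when nothing matches there.
    """

    def __init__(self, pattern):
        self._regex = re.compile(pattern)

    def recognize(self, line, pos=0):
        # Pattern.match(string, pos) anchors the match at `pos`.
        m = self._regex.match(line, pos)
        return m.end() if m else -1


# Illustrative patterns only; PyPy's real token grammar is far larger.
whiteSpaceMatcher = RegexMatcher(r"[ \f\t]*")  # may match the empty string
nameMatcher = RegexMatcher(r"[A-Za-z_][A-Za-z0-9_]*")
numberMatcher = RegexMatcher(r"[0-9]+")
opMatcher = RegexMatcher(r"\*\*|//|[-+*/=()]")  # longest operators first


def simple_tokens(line):
    """Yield (kind, text) pairs for a single line using the matchers above."""
    pos = 0
    while pos < len(line):
        # Skip leading whitespace; this matcher never fails.
        pos = whiteSpaceMatcher.recognize(line, pos)
        if pos >= len(line):
            break
        for kind, matcher in (("NAME", nameMatcher),
                              ("NUMBER", numberMatcher),
                              ("OP", opMatcher)):
            end = matcher.recognize(line, pos)
            if end > pos:
                yield kind, line[pos:end]
                pos = end
                break
        else:
            raise SyntaxError(
                f"unexpected character {line[pos]!r} at column {pos}")
```

For example, list(simple_tokens("x = y ** 2")) yields the NAME, OP, NAME, OP and NUMBER tokens "x", "=", "y", "**" and "2". The point of the sketch is that each readable pattern replaces a block of generated state tables. The surrounding loop in generate_tokens() would not need to change, as long as the matcher objects keep the same calling convention.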

nosy: +buck

PyPy bug tracker <tracker at bugs.pypy.org>
