[Python-Dev] JITted regex engine from pypy
Maciej Fijalkowski
fijall at gmail.com
Sun Jun 3 15:16:32 CEST 2012
On Sun, Jun 3, 2012 at 3:06 PM, Calvin Spealman <ironfroggy at gmail.com> wrote:
> On Sun, Jun 3, 2012 at 7:49 AM, Maciej Fijalkowski <fijall at gmail.com>
> wrote:
> > Hi
> >
> > I was reading a bit about the regex module and I would like to present
> > some other solution into speeding up the re module for Python.
> >
> > So, as a bit of background - pypy has a re compatible module. It's also
> > JITted and it's also exportable as a C library (that is a library you can
> > call from C with C API, not a python extension module). I wonder if it
> > would be worth to put some work into it to make it a library that CPython
> > can use.
> >
> > On the minus side, the JIT only works on x86 and x86_64, on the plus
> > side, since it's 100% API compatible, it can be used as a _xxx speedup
> > module relatively easy.
> >
> > Do people have opinions?
>
> A few questions and comments about such an idea, from someone who
> hasn't used PyPy yet and doesn't understand the setup involved.
>
> 1) Would PyPy be required to build this as a C-compatible library,
> such that CPython could use it as an extension module? That is, would
> it make PyPy a required part of building CPython?
>
It depends a bit on how we organize things. PyPy (meaning a checkout of the
pypy repository, not the pypy interpreter) would be required to generate the
necessary C files, and therefore also for maintenance, since the C files are
not hand-editable. PyPy would not be required to compile those C files.
>
> 2) Are there benchmarks comparing the performance of this
> implementation to the existing re module and the proposed regex
> module?
>
I don't think so. It is reasonably fast in a lot of cases and it can
definitely be made faster in more of them. The main power comes from
JITting: a piece of assembler is compiled for each regex created. I doubt a
C library can come close to this, approach-wise. Of course results will vary
from case to case, but generally speaking the approach is superior. It would
be cool if someone ran the benchmarks to see how things look *right now*.
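
A minimal sketch of how such a benchmark could be set up, using only the
standard library. The pattern, the input text and the bench() helper are
illustrative assumptions, not an existing benchmark suite. Only the stdlib
re module is timed here; any other engine exposing the same
compile()/findall() interface could be passed in its place.

```python
import re
import timeit

# Hypothetical workload: pattern and text are illustrative only.
PATTERN = r"\b[a-z]+@[a-z]+\.com\b"
TEXT = "contact: alice@example.com, bob@example.com " * 1000


def bench(engine, pattern=PATTERN, text=TEXT, repeat=5, number=20):
    """Return the best per-call time (seconds) of findall() for `engine`.

    `engine` is any module offering an re-compatible compile() function.
    """
    compiled = engine.compile(pattern)  # compile once, outside the timed loop
    timer = timeit.Timer(lambda: compiled.findall(text))
    # Take the minimum over several runs to reduce scheduling noise.
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    print("re: %.6f s per findall" % bench(re))
```

Compiling outside the timed loop measures matching speed only. A JITted
engine may also need a few warm-up calls before its timings settle.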
>
> 3) How would the maintenance work? Where would the module live
> "officially"? Does CPython fork it or is it extracted from PyPy in a
> way it can be installed as an external dependency? How does CPython
> get changes upstream?
>
I would honestly hope it can be maintained as a part of pypy and then
cpython would just use it. But those are just hopes.
>
> 4) I may be remembering wrong, but I recall maintenance ease to be one
> of the justifications for the regex module. How would your proposal
> compare? Is a random developer looking to fix a bug in his way going
> to find this easier or more difficult to get his head around?
>
I think it's relatively easy since it's Python code after all, but what do
I know. Someone has to have a look; it lives here:
https://bitbucket.org/pypy/pypy/src/default/pypy/rlib/rsre
I would like people to form their own opinions on whether it's more or less
maintenance effort. On our side, we'll maintain this particular part of the
code anyway (so it's also easier, because you can leave it to others).
Cheers,
fijal