[Python-Dev] JITted regex engine from pypy

Bill Janssen janssen at parc.com
Sun Jun 3 22:51:22 CEST 2012


Maciej Fijalkowski <fijall at gmail.com> wrote:

> On Sun, Jun 3, 2012 at 5:21 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> 
> >> On the minus side, the JIT only works on x86 and x86_64. On the plus
> >> side, since it's 100% API compatible, it can be used as a _xxx
> >> speedup module relatively easily.
> >>
> >> Do people have opinions?
> >>
> >
> > The main concern for re is not speed, but functionality. The Python re
> > module needs to grow a number of features, and correct a number of bugs.
> > So 100% compatible is actually not good enough. 95% compatible (with
> > the features added and the bugs fixed) would be better.

From my point of view, for textual data reduction, the MRAB regex now
has substantial improvements which enable very different kinds of uses,
like "named lists" and "fuzzy" matching, which I don't believe occur
(together) in any other RE library.  It also keeps features it shares
with the existing CPython "re" library, such as the ability to handle
very large REs (which IronPython, for instance, is unable to handle,
apparently due to its use of the standard .NET RE library), and it does
so fairly efficiently.
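
To make "very large REs" concrete, here is a minimal sketch using only
the stdlib "re" module.  The vocabulary, its size, and the helper name
build_big_alternation are made up for illustration; they are not taken
from any real workload.

```python
import re


def build_big_alternation(words):
    """Compile one pattern that matches any whole word in `words`."""
    # Escape each word so it is treated literally, then join the words
    # into a single non-capturing alternation bounded by \b.
    alternatives = "|".join(re.escape(word) for word in sorted(set(words)))
    return re.compile(r"\b(?:" + alternatives + r")\b")


# Hypothetical vocabulary: 5000 synthetic tokens plus two real words.
vocabulary = ["term%d" % i for i in range(5000)] + ["regex", "engine"]
big_re = build_big_alternation(vocabulary)

found = big_re.findall("the regex engine saw term42 and term4999, not term5000")
print(found)
```

As I understand it, a "named list" in the MRAB regex covers the same
ground without spelling the alternation out in the pattern text, but the
sketch above deliberately sticks to what the stdlib offers.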

Bill

> >
> > OTOH, sharing the re code with PyPy would be a desirable goal, as would
> > be writing the re code in Python (although SRE already implements
> > significant parts in Python).
> >
> 
> We did not reimplement those parts in RPython; they're still in Python (so
> the sre engine does not accept a regex, but instead the lower-level
> description, etc.)
> 
> 
> >
> > As a speedup module, it's uninteresting - we want to simplify maintenance,
> > not complicate it. So this can only work if it replaces
> > SRE.
> >
> > Regards,
> > Martin
> >

