[pypy-dev] Adding a feature to re

Mon Aug 25 09:20:55 CEST 2014

Hey folks,

One of the projects I'm working on in CPython is becoming a little CPU-bound
and I was hoping to use PyPy. One problem though - one of the pieces
uses the regex library (which claims to be CPython's re-next). Running
regex through cpyext works, but is painfully slow.

From reading the docs, it seems like I have a few options:
 - rewrite all of regex in Python - seems like a bad idea
 - rewrite regex so it is not CPython-specific & use cppyy or cffi to
interface with it. I actually looked into this & unfortunately the CPython
API seems to be woven quite deeply into the code.
 - get rid of the dependency somehow. What I'm missing are named lists
(basically "\L<a>", a=["1","2"] will match 1 or 2). Unfortunately, building
one really long re alternation string is out of the question - I have not
seen compile() finish with that approach. Writing a custom DFA could be on
the table, but I was hoping to avoid that error-prone step.
 - somehow factor out the part using regex and keep using CPython for it.
 - add the missing functionality to PyPy's re. This seems like the path of
least resistance.
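
For illustration, here is a minimal sketch of one way the "get rid of the
dependency" option might look using only the standard re module: match a
generic token shape, then check the captured text against a set. The names
`allowed`, `pattern` and `find_allowed`, the "item-" prefix and the sample
data are all hypothetical. It assumes every list entry shares a recognizable
shape (here, a run of digits) and it does not reproduce the backtracking
behaviour of a real named list.

```python
import re

# Hypothetical stand-in for the named list; the real list is presumably
# far longer. Set membership is O(1) regardless of its size.
allowed = frozenset(["1", "2"])

# Generic pattern used in place of the named list: capture any run of
# digits and decide afterwards whether it is one of the allowed entries.
pattern = re.compile(r"item-(\d+)")


def find_allowed(text):
    """Return the captured tokens in `text` that are members of `allowed`."""
    return [m.group(1) for m in pattern.finditer(text) if m.group(1) in allowed]


print(find_allowed("item-1 item-7 item-2"))  # prints ['1', '2']
```

Because the list never becomes part of the pattern, compile() only sees the
short generic expression, whatever the size of the list.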

I've started looking into the sre module and it looks like quite a few bits
(mostly parsing & compiling to bytecode) are reused from CPython. I would
have to change some of those bits. My question, then, is: is there any hope
of getting these changes upstream? Do stdlib pieces have a "no touch"
policy?

Thanks,
Mike.