[pypy-dev] Adding a feature to re

Laura Creighton lac at openend.se
Mon Aug 25 15:30:11 CEST 2014


In a message of Mon, 25 Aug 2014 03:20:55 -0400, Mike Kaplinskiy writes:
>Hey folks,
>
>One of the projects I'm working on in CPython is becoming a little CPU
>bound and I was hoping to use pypy. One problem though - one of the pieces
>uses the regex library (which claims to be CPython's re-next). Running
>regex through cpyext works, but is deadly slow.
>
>From reading the docs it seems like I have a few options:
> - rewrite all of regex in Python - seems like a bad idea
> - rewrite regex to be non-python specific & use cppyy or cffi to interface
>with it. I actually looked into this & unfortunately the CPython API seems
>quite deep in there.
> - get rid of the dependency somehow. What I'm missing are named lists
>(basically "\L<a>", a=["1","2"] will match 1 or 2). Unfortunately creating
>one really long re string is out of the question - I have not seen
>compile() finish with that approach. Writing a custom DFA could be on the
>table, but I was hoping to avoid that error-prone step.
> - somehow factor out the part using regex and keep using CPython for it.
> - add the missing functionality to pypy's re. This seems like the path of
>least resistance.
>
>I've started looking into the sre module and it looks like quite a few bits
>(parsing & compiling to byte code mostly) are reused from CPython. I would
>have to change some of those bits. My question is then - is there any hope
>of getting these changes upstream then? Do stdlib pieces have a "no touch"
>policy?
>
>Thanks,
>Mike.

Do you know about the regex package on PyPI?
https://pypi.python.org/pypi/regex

If I were you, I would try to get the behaviour you want put into the
new replacement version -- which would, of course, be easiest if you
contributed the code.  Then we can see about having pypy do the same ...

Laura
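
For readers following the thread: the "named list" idea Mike describes
can sometimes be approximated with the standard re module alone, without
building one huge alternation. The sketch below is only an illustration
under a strong assumption, namely that every list item is a whole token
matched by a simple pattern such as \w+. The function name
find_named_list_items and its parameters are hypothetical and are not
part of re, regex, or PyPy. It does not reproduce a named list embedded
in the middle of a larger pattern.

```python
import re


def find_named_list_items(text, items, token_pattern=r"\w+"):
    """Return (start, end, token) for each token of `text` found in `items`.

    Hypothetical helper: it scans generic tokens with a small regex and
    filters them by set membership, so compile time stays constant no
    matter how many items the list holds.
    """
    allowed = frozenset(items)          # O(1) membership checks
    token_re = re.compile(token_pattern)
    return [
        (m.start(), m.end(), m.group())
        for m in token_re.finditer(text)
        if m.group() in allowed
    ]


# Same data as the example in the quoted message: a = ["1", "2"]
hits = find_named_list_items("order 1 and 2 but not 3", ["1", "2"])
print(hits)
```

Because the pattern itself never grows with the list, compile() stays
cheap. The trade-off is that items spanning several tokens, or items
that are only part of a token, will not be found this way.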

