
Aug. 25, 2014
7:20 a.m.
Hey folks,

One of the projects I'm working on in CPython is becoming a little CPU-bound, and I was hoping to use PyPy. One problem though: one of the pieces uses the regex library (which claims to be CPython's re-next). Running regex through cpyext works, but is deadly slow.

From reading the docs it seems like I have a few options:

- Rewrite all of regex in Python. This seems like a bad idea.
- Rewrite regex to not be CPython-specific and use cppyy or cffi to interface with it. I actually looked into this, and unfortunately the CPython API seems quite deep in there.
- Get rid of the dependency somehow. What I'm missing are named lists (basically "\L<a>" with a=["1","2"] will match 1 or 2). Unfortunately, creating one really long re string is out of the question; I have not seen compile() finish with that approach. Writing a custom DFA could be on the table, but I was hoping to avoid that error-prone step.
- Somehow factor out the part using regex and keep using CPython for it.
- Add the missing functionality to PyPy's re. This seems like the path of least resistance.

I've started looking into the sre module, and it looks like quite a few bits (mostly parsing and compiling to bytecode) are reused from CPython. I would have to change some of those bits. My question is then: is there any hope of getting these changes upstream? Do stdlib pieces have a "no touch" policy?

Thanks,
Mike.
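
P.S. To make the "custom DFA" option above a bit more concrete, here is a minimal sketch of what I have in mind: a trie over the list of literals, walked one character at a time. It only covers matching a fixed set of plain strings, not embedding that set inside a larger pattern, and it does not claim to reproduce the regex library's named-list semantics. The names LiteralSetMatcher, match_at and search are made up for illustration; nothing here comes from regex or from PyPy's re.

```python
class LiteralSetMatcher:
    """Match any of a fixed set of literal strings using a trie.

    Hypothetical helper for illustration only. Empty strings in the
    input list are stored but never reported as matches.
    """

    _END = object()  # sentinel key marking the end of a stored word

    def __init__(self, words):
        self._root = {}
        for word in words:
            node = self._root
            for ch in word:
                # Walk down the trie, creating child nodes as needed.
                node = node.setdefault(ch, {})
            node[self._END] = True

    def match_at(self, text, pos=0):
        """Return the longest stored word starting at text[pos], or None."""
        node = self._root
        longest = None
        for i in range(pos, len(text)):
            node = node.get(text[i])
            if node is None:
                break  # no stored word continues with this character
            if self._END in node:
                longest = text[pos:i + 1]
        return longest

    def search(self, text):
        """Return (start, word) for the leftmost match, or None."""
        for start in range(len(text)):
            word = self.match_at(text, start)
            if word is not None:
                return start, word
        return None


matcher = LiteralSetMatcher(["1", "2", "12a"])
print(matcher.match_at("12ab"))
print(matcher.search("x2y"))
```

Building the trie is linear in the total length of the literals, so it should not hit the compile() blow-up I see with one huge alternation. The cost is that everything else a regex engine gives you (groups, flags, surrounding pattern context) would have to be handled by hand, which is the error-prone part I would like to avoid.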
Mike Kaplinskiy