
In the checkin message of r8719, hpk wrote:
But I mind! Let me explain. Just before that checkin (r8718), there were the following files in pypy/lib related to regular expressions: re.py, dumbre.py, plexre.py, sre_adapt.py and sre_parse.py. You deleted re.py, sre_adapt.py and sre_parse.py.

re.py contained (or was intended to contain) the backend-neutral interface and utilities, like this:

    def match(pattern, string, flags=0):
        return compile(pattern, flags).match(string)

compile() does memoizing, just as CPython's does, and re.escape() was copied from CPython. These are backend-neutral. re.py imported the Pattern class from the backend. Pattern's initializer has the signature __init__(self, pattern, flags), and it has the methods of the C type _sre.SRE_Pattern, like match, search, etc.

There were three backends (drafts of backends) providing this Pattern class: dumbre.py, plexre.py and sre_adapt.py. Only sre_adapt.py imports CPython's _sre; the others don't! You can still read dumbre.py and plexre.py, since you didn't delete them by mistake. :-)

See, dumbre.py is exactly what you're referring to as "see how far we get". It exits as soon as a regular expression is used, and reports what was used. And this was the backend that re.py imported by default. (See below.)

plexre.py is more interesting: it uses Plex to implement (for now) match(), but returns a bool rather than a full Match object. This is sufficient for all "if re.match(pattern, string):" tests, and it let pickle import and run unmodified. (As I wrote in the log of r8636...) pickle uses a regular expression only once, as follows:

    __all__.extend([x for x in dir() if re.match("[A-Z][A-Z0-9_]+$",x)])

This also means PyPy interprets all of Plex just fine. Since Plex does regular expression parsing, the NFA->DFA transformation and the state machine all in pure Python, this is actually quite cool.

sre_adapt.py is, as you see, cheating. It contains a single line:

    Pattern = sre_compile.compile

And this does not work.
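To make the re.py/backend split described above concrete, here is a minimal sketch in Python 3. It is not the deleted re.py: the names DumbPattern, _unsupported and _cache are hypothetical, and where the real dumbre.py exits, this sketch raises NotImplementedError instead (an assumption, chosen so the failure is easy to catch). The front end only relies on the backend class accepting (pattern, flags) and providing match/search; compile() memoizes on the pattern and flags.

```python
# Hypothetical backend in the spirit of dumbre.py: no matching at all,
# it only reports which regex operation was requested. (The real dumbre.py
# exits; raising NotImplementedError here is an assumption of this sketch.)
class DumbPattern:
    def __init__(self, pattern, flags=0):
        self.pattern = pattern
        self.flags = flags

    def _unsupported(self, method, string):
        raise NotImplementedError(
            "regular expression used: %s(%r) on %r"
            % (method, self.pattern, string))

    def match(self, string):
        self._unsupported("match", string)

    def search(self, string):
        self._unsupported("search", string)


# Backend-neutral front end in the spirit of re.py. Whatever class is
# bound to Pattern is the backend; it must accept (pattern, flags).
Pattern = DumbPattern

_cache = {}


def compile(pattern, flags=0):
    """Return a memoized Pattern object for (pattern, flags)."""
    # Including the backend class in the key means rebinding Pattern
    # never hands back an object built by a previous backend.
    key = (Pattern, pattern, flags)
    if key not in _cache:
        _cache[key] = Pattern(pattern, flags)
    return _cache[key]


def match(pattern, string, flags=0):
    return compile(pattern, flags).match(string)


def search(pattern, string, flags=0):
    return compile(pattern, flags).search(string)
```

With this front end, pickle's re.match("[A-Z][A-Z0-9_]+$", x) raises immediately and names the pattern, which is the "see how far we get" behaviour. Rebinding Pattern to a backend whose match() returns a bool, as plexre.py does, would let that "if" test pass without changing the front end at all.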
Currently C types are not successfully faked, so _sre.SRE_Pattern instances are created (well, see below), but every method call on them results in a long traceback. A long time ago I asked on IRC why this is, and I was told it's because this C type lacks __dict__. I think the same applies to _random.Random; that's why we now have the pure-Python random.py from 2.2.3 in the lib directory.

And why is sre_parse.py patched? Because without those patches no _sre.SRE_Pattern instances would be created at all, since PyPy fails to interpret part of sre_parse.py. How to reproduce this problem (with current PyPy):
Why is this? I wrote my reasoning in the log of this file (r3332 and r3456). To show what's going on, here is an example:

    class A:
        def getwidth(self):
            print 'hahaha'

    class B:
        def getwidth(self):
            print 'lalala'

    class C:
        def __getitem__(self, index):
            return A()
        def __getslice__(self, start, stop):
            return B()

    c = C()
    c[0].getwidth()
    c[0:1].getwidth()

CPython prints hahaha and then lalala. PyPy prints hahaha both times. And sre_parse uses __getslice__... This is the only place __getslice__ is used in the entire Python standard library (except in UserList and UserString)! And __getslice__ has been deprecated since release 2.0, as officially announced in the Language Reference, section 3.3.6.

Okay, can I back out your deletion now? :-)

Seo Sanghyeon
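For comparison: Python 3 removed __getslice__ entirely, and the alternative the Language Reference has pointed to since 2.0 is to let __getitem__ receive a slice object. A class like C in the example above can then tell indexing from slicing itself, with no separate __getslice__ dispatch for an interpreter to get wrong. A minimal Python 3 sketch of that rewrite (getwidth returns its string instead of printing it, purely so the result is easy to check):

```python
class A:
    def getwidth(self):
        return "hahaha"


class B:
    def getwidth(self):
        return "lalala"


class C:
    def __getitem__(self, index):
        # c[0:1] arrives here as slice(0, 1, None) instead of going
        # through the deprecated __getslice__(start, stop) hook.
        if isinstance(index, slice):
            return B()
        return A()


c = C()
print(c[0].getwidth())    # hahaha
print(c[0:1].getwidth())  # lalala
```

Extended slices such as c[::2] take the same path, which is one reason a single __getitem__ is simpler than keeping two hooks in sync.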