Re: [pypy-dev] [pypy-commit] pypy default: Optimize match.group('name') by making it a module dict.
Hi Alex,

I don't really understand the first change of this commit. Why is it a good idea to change the groupdict of the re parser to be a module dict? There are supposed to be "not too many" module dicts, because they are promoted on read. So I don't get why this is a sensible change.

Would you please add a comment to the point where the module dict is instantiated why this is a good idea, and ideally also a test_pypy_c test.

Cheers,

Carl Friedrich

On 01/05/2013 03:55 AM, alex_gaynor wrote:
> Author: Alex Gaynor <alex.gaynor@gmail.com>
> Branch:
> Changeset: r59708:3d2ff1e85bf5
> Date: 2013-01-04 18:55 -0800
> http://bitbucket.org/pypy/pypy/changeset/3d2ff1e85bf5/
>
> Log: Optimize match.group('name') by making it a module dict.
>
> diff --git a/lib-python/2.7/sre_parse.py b/lib-python/2.7/sre_parse.py
> --- a/lib-python/2.7/sre_parse.py
> +++ b/lib-python/2.7/sre_parse.py
> @@ -16,6 +16,12 @@
>
>  from sre_constants import *
>
> +try:
> +    from __pypy__ import newdict
> +except ImportError:
> +    def newdict(tp):
> +        return {}
> +
>  SPECIAL_CHARS = ".\\[{()*+?^$|"
>  REPEAT_CHARS = "*+?{"
>
> @@ -68,7 +74,7 @@
>          self.flags = 0
>          self.open = []
>          self.groups = 1
> -        self.groupdict = {}
> +        self.groupdict = newdict("module")
>      def opengroup(self, name=None):
>          gid = self.groups
>          self.groups = gid + 1
> diff --git a/pypy/module/_sre/interp_sre.py b/pypy/module/_sre/interp_sre.py
> --- a/pypy/module/_sre/interp_sre.py
> +++ b/pypy/module/_sre/interp_sre.py
> @@ -90,7 +90,7 @@
>  # SRE_Pattern class
>
>  class W_SRE_Pattern(Wrappable):
> -    _immutable_fields_ = ["code", "flags", "num_groups"]
> +    _immutable_fields_ = ["code", "flags", "num_groups", "w_indexgroup"]
>
>      def cannot_copy_w(self):
>          space = self.space
> _______________________________________________
> pypy-commit mailing list
> pypy-commit@python.org
> http://mail.python.org/mailman/listinfo/pypy-commit
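The import fallback in the sre_parse.py hunk can be tried on its own. The following is a minimal sketch, assuming only what the diff shows: `newdict` comes from the PyPy-only `__pypy__` module, and on any other interpreter the fallback ignores the requested kind and returns a plain dict. The name `groupdict` and the sample key are illustrative, not taken from the commit.

```python
try:
    # Only available on PyPy; asks for a dict with a specific internal strategy.
    from __pypy__ import newdict
except ImportError:
    def newdict(tp):
        # Fallback for other interpreters: the requested kind is ignored.
        return {}

# Illustrative use, mirroring `self.groupdict = newdict("module")` in the diff.
groupdict = newdict("module")
groupdict["name"] = 1  # hypothetical entry: group name -> group number
```

Either way the result behaves like an ordinary dict from Python code; the difference on PyPy lies only in how the interpreter and JIT treat it internally.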
Hi Carl,

The reason is that the dict has similar properties to a module dict:

1) keys are written only once
2) lookups are almost always by constant strings

In typical usage the groupindex dict is never mutated after its initial creation, and reads from it are by the precise name of a field; therefore, by having it be a moduledict we can make match.group('name') be free.

I'll go ahead and add a comment with this info.

Alex

On Fri, Jan 11, 2013 at 2:17 AM, Carl Friedrich Bolz <cfbolz@gmx.de> wrote:
> Hi Alex,
>
> I don't really understand the first change of this commit. Why is it a good idea to change the groupdict of the re parser to be a module dict? There are supposed to be "not too many" module dicts, because they are promoted on read. So I don't get why this is a sensible change.
>
> Would you please add a comment to the point where the module dict is instantiated why this is a good idea, and ideally also a test_pypy_c test.
>
> Cheers,
>
> Carl Friedrich
-- "I disapprove of what you say, but I will defend to the death your right to say it." -- Evelyn Beatrice Hall (summarizing Voltaire) "The people's good is the highest law." -- Cicero
On 01/11/2013 04:57 PM, Alex Gaynor wrote:
> Hi Carl,
>
> The reason is that the dict has similar properties to a module dict:
>
> 1) keys are written only once
> 2) lookups are almost always by constant strings
>
> In typical usage the groupindex dict is never mutated after its initial creation, and reads from it are by the precise name of a field; therefore, by having it be a moduledict we can make match.group('name') be free.
>
> I'll go ahead and add a comment with this info.
Does that mean that the dict (which is created during parsing) is stored on the regex object? If yes, that is the connection that I didn't understand.

CF
Ah, yes, it is. This dict is used when you do something like:

    re.match(r'(?P<name>\d+)', '12').group('name')

Alex

On Fri, Jan 11, 2013 at 7:59 AM, Carl Friedrich Bolz <cfbolz@gmx.de> wrote:
>> I'll go ahead and add a comment with this info.
>
> Does that mean that the dict (which is created during parsing) is stored on the regex object? If yes, that is the connection that I didn't understand.
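The connection discussed in this exchange can be observed from plain Python. The sketch below uses only the standard `re` API: the name-to-number mapping built while parsing is exposed on the compiled pattern as `groupindex`, and `group('name')` returns the same result as looking the name up there and fetching the group by number. The variable names are illustrative, and nothing here measures or demonstrates the PyPy-specific speedup.

```python
import re

# Compiling the pattern parses it once; the named groups are recorded then.
pattern = re.compile(r'(?P<name>\d+)')

# Mapping of group name -> group number, stored on the pattern object.
group_index = dict(pattern.groupindex)

match = pattern.match('12')

# Lookup by a constant string name ...
by_name = match.group('name')
# ... is equivalent to resolving the name through the mapping first.
by_number = match.group(group_index['name'])
```

Because the mapping is filled during compilation and only read afterwards, it has the "written once, read by constant keys" shape described above.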
Participants (2): Alex Gaynor, Carl Friedrich Bolz