[issue927] Task takes more time on nightly than CPython 2.7
New submission from Zira <NeruYume@hotmail.com>:

    from itertools import combinations
    import time

    string = "ABCDEFGHIJKLMNOPQRSTUWXYZ"   # note: 'V' is absent in the original report
    string = string.lower()
    minlen = 3
    substrlen = 12

    # Find combinations the available letters can be arranged in
    at = time.clock()
    lookup = set()
    for i in range(minlen, substrlen + 1):
        for comb in combinations(string, i):
            lookup.add("".join(sorted(comb)))
    lookup = list(lookup)
    lookup.sort(key=len, reverse=True)
    print time.clock() - at

This takes more time on pypy-c-jit-49069-62bc56457861-win32 than on CPython 2.7 (64-bit): 90 seconds on the PyPy nightly vs. 45 seconds on CPython 2.7.

----------
messages: 3423
nosy: Zira, pypy-issue
priority: performance bug
status: unread
title: Task takes more time on nightly than CPython 2.7
Zira <NeruYume@hotmail.com> added the comment:

OS: Windows 7, 64-bit

----------
status: unread -> chatting
Fijal <fijall@gmail.com> added the comment:

I can reproduce it on my Linux (64-bit) machine vs. CPython 2.6. PyPy is slower, and the offending call is lookup.add (replacing the set with a dict does not help). No clue yet.

----------
nosy: +fijal
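[Editor's note: the thread does not show the exact dict-based replacement Fijal tried; a minimal sketch along those lines, assuming dict keys stand in for set members, might look like this (Python 2, matching the report):]

    from itertools import combinations
    import time

    string = "abcdefghijklmnopqrstuwxyz"   # alphabet as given in the report
    minlen = 3
    substrlen = 12

    at = time.clock()
    lookup = {}
    for i in range(minlen, substrlen + 1):
        for comb in combinations(string, i):
            lookup["".join(sorted(comb))] = None   # dict key in place of set member
    ordered = sorted(lookup, key=len, reverse=True)
    print time.clock() - at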
Alex Gaynor <alex.gaynor@gmail.com> added the comment:

Carl pushed a ton of set-strategies stuff; we should investigate whether that helps at all.

----------
nosy: +agaynor
Justin Peel <peelpy@gmail.com> added the comment:

This looks like the same GC performance problem that we have with making large dicts. Minimark keeps card marks at the front of arrays to determine whether objects need to be examined and possibly moved out of the nursery. Each bit records whether anything changed in a range of objects in the array behind it; I believe the range is a little over a hundred objects. This works well for appending objects to lists, but when adding items to dicts and sets, objects land at pseudo-random positions. When creating huge dicts/sets, essentially the whole low-level array behind the dict/set is checked on every minor collection (the toy model below illustrates the effect).

----------
nosy: +justinpeel
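[Editor's note: a hedged toy model of the card-marking effect described above, not actual PyPy GC code. The card size of 128 is taken from a later message in this thread; the array size and write count are made-up illustration values.]

    # Toy model: one mark bit covers CARD_SIZE consecutive slots of an array.
    # Shows why pseudo-random insertion (dict/set) dirties far more cards
    # than appending (list) between two minor collections.
    import random

    CARD_SIZE = 128            # granularity; 128 is cited later in the thread
    NSLOTS = 1 << 16           # size of the low-level array behind the container
    NWRITES = 2048             # writes between minor collections (made up)

    def cards_dirtied(positions):
        return len(set(pos // CARD_SIZE for pos in positions))

    appends = range(NWRITES)                                       # list-like: consecutive slots
    randoms = [random.randrange(NSLOTS) for _ in range(NWRITES)]   # dict/set-like: scattered

    print "cards dirtied by appends:", cards_dirtied(appends)   # NWRITES/CARD_SIZE = 16
    print "cards dirtied by random:", cards_dirtied(randoms)    # ~500 of the 512 cards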
Armin Rigo <armin.rigo@gmail.com> added the comment:

Maybe it makes sense to use a much smaller card size in this case?

----------
nosy: +arigo
Justin Peel <peelpy@gmail.com> added the comment:

I think a smaller card size for sets/dicts would help (I'll just talk about dicts from now on, but the same applies to sets). However, I also discovered something that may be slowing this down badly. Currently, whenever anything is put into a dict, the object is considered young without checking whether it actually is. There is code to do the check, but a note says it was slower, which is why it isn't used; please correct me if I am wrong here. Treating all inserted objects as young is fine for the example in this issue until the dict needs to be resized. When a dict is resized, every item from the old array is inserted into the new, larger array and is considered young for the purpose of card marking. Since the card size is currently 128 and the array is about 1/3 full after resizing, all of the cards end up set whenever a resize happens (see the worked check below). I think actually checking whether items are young when reinserting them during a resize, combined with a smaller card size, could help a lot here. By the way, lists are fine on the resizing front from what I can see, because direct memory copies are done for everything, including the card marks. The difference with dicts is that the items are inserted at pseudo-random positions, so the card marks cannot simply be copied.
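[Editor's note: a quick worked check of the "all cards get set after a resize" claim, under simplifying assumptions: items land uniformly and independently, 128 slots per card, table about 1/3 full after the resize.]

    # The chance that a given 128-slot card holds no freshly reinserted item
    # is (2/3)**128, which is vanishingly small, so essentially every card
    # gets marked by a resize.
    p_untouched = (2.0 / 3.0) ** 128
    print "fraction of cards left unmarked: %.3g" % p_untouched   # about 3e-23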
Carl Friedrich Bolz <cfbolz@gmx.de> added the comment:

FWIW, on the set-strategies branch this also becomes quite a bit faster, so maybe in combination with the improvements to the GC infrastructure we can beat CPython here.

----------
nosy: +cfbolz
Carl Friedrich Bolz <cfbolz@gmx.de> added the comment:

This problem is still there.

----------
PyPy bug tracker <tracker@bugs.pypy.org>
<https://bugs.pypy.org/issue927>
participants (6)
- Alex Gaynor
- Armin Rigo
- Carl Friedrich Bolz
- Fijal
- Justin Peel
- Zira