[issue718] string.split('\x00') takes 300% longer under pypy1.5 w/jit

New submission from Robert Collins <robertc@robertcollins.net>:

under cpython:
11:37 < lifeless>         3    0.000    0.000    0.280    0.093 journals.py:678(_tokens)

under pypy (1.5, 64-bit build, your official download):
11:38 < lifeless>         3    0.000    0.000    1.161    0.387 journals.py:678(_tokens)

_tokens is defined as

    def _tokens():
        return content.split('\x00')

content in the timed case is a 74MB bytestring consisting of \x00-separated fields. The fields are highly redundant: for example, one in 5 in the case that produced this is 'add', another one in 5 is 'file', and the rest are 'dir', 'symlink', relative paths, and floats (as ASCII).

There are 3331176 fields (len(content.split('\x00')) == 3331176).

----------
effort: ???
messages: 2507
nosy: lifeless, pypy-issue
priority: bug
release: ???
status: unread
title: string.split('\x00') takes 300% longer under pypy1.5 w/jit

_______________________________________________________
PyPy development tracker <pypy-dev-issue@codespeak.net>
<https://codespeak.net/issue/pypy-dev/issue718>
_______________________________________________________
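The report above could be approximated with a small synthetic benchmark. The following is only a sketch, not the reporter's actual test: the original 74MB input is not available, so `build_content` and `time_split` are hypothetical helpers that generate NUL-separated data with a similar mix of fields ('add', 'file', 'dir'/'symlink', relative paths, ASCII floats). It is written for Python 3 `bytes`; under the Python 2 interpreters discussed in the thread the equivalent type would be `str`.

```python
import random
import time


def build_content(n_fields, seed=0):
    """Build a NUL-separated bytestring of highly redundant fields.

    Hypothetical stand-in for the reporter's data: one field in five
    is b'add', one in five is b'file', and the rest are b'dir' or
    b'symlink', relative paths, and floats rendered as ASCII.
    """
    rng = random.Random(seed)
    kinds = [b"dir", b"symlink"]
    fields = []
    for i in range(n_fields):
        slot = i % 5
        if slot == 0:
            fields.append(b"add")
        elif slot == 1:
            fields.append(b"file")
        elif slot == 2:
            fields.append(rng.choice(kinds))
        elif slot == 3:
            fields.append(b"some/relative/path/%d" % (i % 1000))
        else:
            fields.append(repr(rng.random()).encode("ascii"))
    return b"\x00".join(fields)


def time_split(content, repeats=3):
    """Return (best wall-clock seconds, field count) for splitting on NUL."""
    best = float("inf")
    count = 0
    for _ in range(repeats):
        start = time.perf_counter()
        tokens = content.split(b"\x00")
        best = min(best, time.perf_counter() - start)
        count = len(tokens)
    return best, count


if __name__ == "__main__":
    # Far smaller than the 3331176 fields in the report; scale up as needed.
    content = build_content(200000)
    seconds, count = time_split(content)
    print("%d fields, %d bytes, best of 3: %.4fs" % (count, len(content), seconds))
```

Running the same script under each interpreter and comparing the reported times would show whether the gap is reproducible on synthetic data; the absolute numbers will not match the profile output quoted above.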

Robert Collins <robertc@robertcollins.net> added the comment:

I've tested alternate paths based on IRC feedback. mmap is slower; using a generator and doing .find() to find the next \x00 is also slower.

________________________________________
PyPy bug tracker <tracker@bugs.pypy.org>
<https://bugs.pypy.org/issue718>
________________________________________
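The exact code for the alternatives mentioned in this comment was not posted, so the following is only a guess at their general shape. `iter_fields` is a hypothetical generator that uses `.find()` to locate each next `\x00`, and `as_mmap` is a hypothetical helper that copies the data into an anonymous memory map so the same generator can be run over an `mmap` object. As with any Python 3 sketch here, `bytes` stands in for the Python 2 `str` used in the thread.

```python
import mmap


def iter_fields(content, sep=b"\x00"):
    """Yield fields one at a time, using .find() to locate each separator.

    Works on bytes and on mmap objects, since both provide .find()
    and return bytes when sliced.
    """
    start = 0
    sep_len = len(sep)
    while True:
        end = content.find(sep, start)
        if end == -1:
            # No more separators: the remainder is the last field.
            yield content[start:]
            return
        yield content[start:end]
        start = end + sep_len


def as_mmap(data):
    """Copy non-empty bytes into an anonymous memory map (hypothetical helper)."""
    m = mmap.mmap(-1, len(data))
    m.write(data)
    return m


if __name__ == "__main__":
    data = b"add\x00file\x00some/relative/path\x000.5"
    print(list(iter_fields(data)))
    m = as_mmap(data)
    print(list(iter_fields(m)))
    m.close()
```

Both variants produce the same fields as `content.split('\x00')`, but they do one Python-level call per field instead of a single built-in call, which is one plausible reason they measured slower.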

Carl Friedrich Bolz <cfbolz@gmx.de> added the comment:

I just checked, this problem still exists.

----------
nosy: +cfbolz