Re: [Python-Dev] cpython: Issue #22003: When initialized from a bytes object, io.BytesIO() now
30.07.14 02:45, antoine.pitrou wrote:
Did you compare this with issue #15381 [1]? [1] http://bugs.python.org/issue15381
30.07.14 06:59, Serhiy Storchaka wrote:
Using microbenchmark from issue22003:

    $ cat i.py
    import io

    word = b'word'
    line = (word * int(79/len(word))) + b'\n'
    ar = line * int((4 * 1048576) / len(line))

    def readlines():
        return len(list(io.BytesIO(ar)))

    print('lines: %s' % (readlines(),))

    $ ./python -m timeit -s 'import i' 'i.readlines()'

Before patch: 10 loops, best of 3: 46.9 msec per loop
After issue22003 patch: 10 loops, best of 3: 36.4 msec per loop
After issue15381 patch: 10 loops, best of 3: 27.6 msec per loop
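For readers who want to reproduce the measurement without a separate i.py module, the same microbenchmark can be sketched as a single script using the standard timeit module. The helper name build_payload and the number/repeat values are illustrative choices, not part of the original benchmark, and absolute timings will differ by machine and interpreter build.

```python
import io
import timeit


def build_payload(total_bytes=4 * 1048576, width=79):
    """Build roughly 4 MiB of newline-terminated lines, as in the original i.py."""
    word = b'word'
    line = word * (width // len(word)) + b'\n'
    return line * (total_bytes // len(line))


AR = build_payload()


def readlines():
    # Iterating over a BytesIO yields one bytes object per line.
    return len(list(io.BytesIO(AR)))


if __name__ == '__main__':
    # Mirror "10 loops, best of 3": three repeats of ten calls each.
    best = min(timeit.repeat(readlines, number=10, repeat=3)) / 10
    print('lines: %d' % readlines())
    print('best of 3: %.1f msec per loop' % (best * 1e3))
```

Running the script under each interpreter build (before and after a patch) gives numbers comparable to the ones quoted above.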
Hi Serhiy,

At least conceptually, 15381 seems the better approach, but getting a correct implementation may take more iterations than the (IMHO) simpler change in 22003. For my tastes, the current 15381 implementation seems a little too magical in relying on Py_REFCNT() as the sole indication that a PyBytes can be mutated.

For the sake of haste, 22003 only addresses the specific regression introduced in Python 3.x BytesIO, compared to 2.x StringI, where 3.x lacked an equivalent no-copies specialization.

David
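To make the "no-copies specialization" concrete, here is a toy pure-Python model of the general idea, assuming the optimization amounts to sharing the initial bytes object for reads and copying it only when a write happens. The class LazyCopyBytesIO is hypothetical and is not the code of either patch; the real work is done in C inside the _io module, and this sketch does not model the Py_REFCNT() check discussed above.

```python
class LazyCopyBytesIO:
    """Toy model: share the initial bytes object until the first write."""

    def __init__(self, initial=b''):
        self._shared = initial  # immutable bytes, referenced without copying
        self._buf = None        # private bytearray, created on first write
        self._pos = 0

    def _unshare(self):
        # Copy the shared bytes exactly once, right before mutating.
        if self._buf is None:
            self._buf = bytearray(self._shared)
            self._shared = None

    def read(self, size=-1):
        data = self._shared if self._buf is None else self._buf
        if size is None or size < 0:
            end = len(data)
        else:
            end = min(len(data), self._pos + size)
        chunk = bytes(data[self._pos:end])
        self._pos = max(self._pos, end)
        return chunk

    def write(self, b):
        self._unshare()
        end = self._pos + len(b)
        # Slice assignment overwrites existing bytes and extends if needed.
        self._buf[self._pos:end] = b
        self._pos = end
        return len(b)

    def getvalue(self):
        # While still shared, hand back the original object itself.
        return self._shared if self._buf is None else bytes(self._buf)


src = b'abc\ndef\n'
f = LazyCopyBytesIO(src)
first = f.read(4)         # served from the shared bytes, no copy made
shared_before_write = f.getvalue() is src
f.write(b'XY')            # triggers the one-time copy
result = f.getvalue()
```

A read-only consumer never pays for the copy in this model; the cost moves to the first write, and the original bytes object is left untouched.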
On 30/07/2014 02:11, Serhiy Storchaka wrote:
Not really, but David's patch is simple enough and does a good job of accelerating the read-only BytesIO case.
I'm surprised your patch does better here. Any idea why? Regards Antoine.
30.07.14 16:59, Antoine Pitrou wrote:
Ignoring tests and comments, my patch adds/removes/modifies about 200 lines, and David's patch about 150 lines of code. But its __sizeof__ does not look correct, and correcting it requires changing about 50 lines. In sum, the complexity of both patches is about equal.
I haven't looked at David's patch too closely yet. But my patch includes an optimization for end-of-line scanning.
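The thread does not say what the end-of-line optimization consists of. As a loose illustration only, and as an assumption rather than a description of the issue15381 patch, one common way to speed up line splitting is to locate each newline with a single search call instead of inspecting bytes one at a time. The function iter_lines below is a hypothetical helper written for this sketch.

```python
import io


def iter_lines(buf):
    """Yield newline-terminated lines from a bytes object using find()."""
    pos = 0
    n = len(buf)
    while pos < n:
        nl = buf.find(b'\n', pos)        # one search call per line
        end = n if nl < 0 else nl + 1    # keep the newline; last line may lack one
        yield bytes(buf[pos:end])
        pos = end


data = b'a\nbb\nccc'
lines = list(iter_lines(data))
same_as_bytesio = lines == list(io.BytesIO(data))
```

The sketch yields the same lines as iterating over io.BytesIO, which makes it usable as a reference when comparing scanning strategies.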
On 30/07/2014 15:48, Serhiy Storchaka wrote:
I meant that David's approach is conceptually simpler, which makes it easier to review. Regardless, there is no exclusive-OR here: if you can improve over the current version, there's no reason not to consider it.
I haven't looked at David's patch too closely yet. But my patch includes an optimization for end-of-line scanning.
Ahah, unrelated stuff :-)
31.07.14 00:23, Antoine Pitrou wrote:
Unfortunately, the two implementations have nothing in common. Conceptually, David arrived in his last patch at the same idea as in issue15381, but with a different and less general implementation. To apply my patch you first need to roll back the issue22003 changes (except the tests).
participants (3)
- Antoine Pitrou
- dw+python-dev@python.org
- Serhiy Storchaka