file.readinto performance regression in Python 3.2 vs. 2.7?

Hi there,

I was doing some experiments with the buffer interface of bytearray today, for the purpose of quickly reading a file's contents into a bytearray which I can then modify. I decided to do some benchmarking and ran into surprising results. Here are the functions I was timing:

def justread():
    # Just read a file's contents into a string/bytes object
    f = open(FILENAME, 'rb')
    s = f.read()

def readandcopy():
    # Read a file's contents and copy them into a bytearray.
    # An extra copy is done here.
    f = open(FILENAME, 'rb')
    b = bytearray(f.read())

def readinto():
    # Read a file's contents directly into a bytearray,
    # hopefully employing its buffer interface
    f = open(FILENAME, 'rb')
    b = bytearray(os.path.getsize(FILENAME))
    f.readinto(b)

FILENAME is the name of a 3.6MB text file. It is read in binary mode, however, for fullest compatibility between 2.x and 3.x.

Now, running this under Python 2.7.2 I got these results ($1 just reflects the executable name passed to a bash script I wrote to automate these runs):

$1 -m timeit -s'import fileread_bytearray' 'fileread_bytearray.justread()'
1000 loops, best of 3: 461 usec per loop
$1 -m timeit -s'import fileread_bytearray' 'fileread_bytearray.readandcopy()'
100 loops, best of 3: 2.81 msec per loop
$1 -m timeit -s'import fileread_bytearray' 'fileread_bytearray.readinto()'
1000 loops, best of 3: 697 usec per loop

These results make sense: the readinto() approach is much faster than copying the read buffer into the bytearray. But with Python 3.2.2 (built from the 3.2 branch today):

$1 -m timeit -s'import fileread_bytearray' 'fileread_bytearray.justread()'
1000 loops, best of 3: 336 usec per loop
$1 -m timeit -s'import fileread_bytearray' 'fileread_bytearray.readandcopy()'
100 loops, best of 3: 2.62 msec per loop
$1 -m timeit -s'import fileread_bytearray' 'fileread_bytearray.readinto()'
100 loops, best of 3: 2.69 msec per loop

Oops, readinto takes the same time as copying. This is a real shame, because readinto in conjunction with the buffer interface was supposed to avoid the redundant copy.

Is there a real performance regression here, is this a well-known issue, or am I just missing something obvious?

Eli

P.S. The machine is a quad-core i7-2820QM, running 64-bit Ubuntu 10.04
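The benchmark module is assumed to start with "import os" and a module-level FILENAME; the shell wrapper only substitutes the interpreter under test. For anyone reproducing the numbers without the wrapper, the same measurement can be driven from Python directly - a sketch (only the fileread_bytearray module name comes from the post, the rest is illustrative):

import timeit

for name in ('justread', 'readandcopy', 'readinto'):
    timer = timeit.Timer('fileread_bytearray.%s()' % name,
                         setup='import fileread_bytearray')
    # Roughly what "python -m timeit" reports: best of 3 repeats.
    best = min(timer.repeat(repeat=3, number=100)) / 100
    print('%s: %.3f msec per loop' % (name, best * 1e3))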

On Thu, 24 Nov 2011 20:15:25 +0200 Eli Bendersky <eliben@gmail.com> wrote:
Oops, readinto takes the same time as copying. This is a real shame, because readinto in conjunction with the buffer interface was supposed to avoid the redundant copy.
Is there a real performance regression here, is this a well-known issue, or am I just missing something obvious?
Can you try with latest 3.3 (from the default branch)? Thanks Antoine.

On Thu, Nov 24, 2011 at 20:29, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Thu, 24 Nov 2011 20:15:25 +0200 Eli Bendersky <eliben@gmail.com> wrote:
Oops, readinto takes the same time as copying. This is a real shame, because readinto in conjunction with the buffer interface was supposed to avoid the redundant copy.
Is there a real performance regression here, is this a well-known issue, or am I just missing something obvious?
Can you try with latest 3.3 (from the default branch)?
Sure. Updated the default branch just now and built:

$1 -m timeit -s'import fileread_bytearray' 'fileread_bytearray.justread()'
1000 loops, best of 3: 1.14 msec per loop
$1 -m timeit -s'import fileread_bytearray' 'fileread_bytearray.readandcopy()'
100 loops, best of 3: 2.78 msec per loop
$1 -m timeit -s'import fileread_bytearray' 'fileread_bytearray.readinto()'
1000 loops, best of 3: 1.6 msec per loop

Strange. Here, as in Python 2, readinto is close to justread and much faster than readandcopy, but justread itself is much slower than in 2.7 and 3.2!

Eli

What if you broke up the read and built the final string object up incrementally? I always assumed this is where the real gain was with readinto.

On Nov 25, 2011 5:55 AM, "Eli Bendersky" <eliben@gmail.com> wrote:
On Thu, Nov 24, 2011 at 20:29, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Thu, 24 Nov 2011 20:15:25 +0200 Eli Bendersky <eliben@gmail.com> wrote:
Oops, readinto takes the same time as copying. This is a real shame, because readinto in conjunction with the buffer interface was supposed to avoid the redundant copy.
Is there a real performance regression here, is this a well-known issue, or am I just missing something obvious?
Can you try with latest 3.3 (from the default branch)?
Sure. Updated the default branch just now and built:
$1 -m timeit -s'import fileread_bytearray' 'fileread_bytearray.justread()'
1000 loops, best of 3: 1.14 msec per loop
$1 -m timeit -s'import fileread_bytearray' 'fileread_bytearray.readandcopy()'
100 loops, best of 3: 2.78 msec per loop
$1 -m timeit -s'import fileread_bytearray' 'fileread_bytearray.readinto()'
1000 loops, best of 3: 1.6 msec per loop
Strange. Although here, like in python 2, the performance of readinto is close to justread and much faster than readandcopy, but justread itself is much slower than in 2.7 and 3.2!
Eli

On Fri, Nov 25, 2011 at 00:02, Matt Joiner <anacrolix@gmail.com> wrote:
What if you broke up the read and built the final string object up. I always assumed this is where the real gain was with read_into.
Matt, I'm not sure what you mean by this - can you suggest the code?

Also, I'd be happy to know if anyone else reproduces this as well on other machines/OSes.

Eli

Eli,

Example coming shortly, the differences are quite significant.

On Fri, Nov 25, 2011 at 9:41 AM, Eli Bendersky <eliben@gmail.com> wrote:
On Fri, Nov 25, 2011 at 00:02, Matt Joiner <anacrolix@gmail.com> wrote:
What if you broke up the read and built the final string object up. I always assumed this is where the real gain was with read_into.
Matt, I'm not sure what you mean by this - can you suggest the code?
Also, I'd be happy to know if anyone else reproduces this as well on other machines/OSes.
Eli

It's my impression that the readinto method does not fully support the buffer interface I was expecting. I've never had cause to use it until now. I've created a question on SO that describes my confusion: http://stackoverflow.com/q/8263899/149482

Also I saw some comments on "top-posting" - am I guilty of this? Gmail defaults to putting my response above the previous email.

On Fri, Nov 25, 2011 at 11:49 AM, Matt Joiner <anacrolix@gmail.com> wrote:
Eli,
Example coming shortly, the differences are quite significant.
On Fri, Nov 25, 2011 at 9:41 AM, Eli Bendersky <eliben@gmail.com> wrote:
On Fri, Nov 25, 2011 at 00:02, Matt Joiner <anacrolix@gmail.com> wrote:
What if you broke up the read and built the final string object up. I always assumed this is where the real gain was with read_into.
Matt, I'm not sure what you mean by this - can you suggest the code?
Also, I'd be happy to know if anyone else reproduces this as well on other machines/OSes.
Eli

On Fri, 25 Nov 2011 12:02:17 +1100 Matt Joiner <anacrolix@gmail.com> wrote:
It's my impression that the readinto method does not fully support the buffer interface I was expecting. I've never had cause to use it until now. I've created a question on SO that describes my confusion:
Just use a memoryview and slice it:

b = bytearray(...)
m = memoryview(b)
n = f.readinto(m[some_offset:])
Also I saw some comments on "top-posting" am I guilty of this?
Kind of :) Regards Antoine.
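Spelling the memoryview suggestion out as a complete, runnable sketch (the helper name readinto_at is illustrative, not from the thread): a sliced memoryview lets readinto() write straight into a region of an existing bytearray, with no intermediate bytes object.

def readinto_at(f, buf, offset, nbytes):
    # Read up to nbytes from f directly into buf, starting at offset.
    view = memoryview(buf)
    return f.readinto(view[offset:offset + nbytes])

# Usage sketch:
#   buf = bytearray(os.path.getsize(FILENAME))
#   with open(FILENAME, 'rb') as f:
#       n = readinto_at(f, buf, 0, len(buf))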

On Fri, Nov 25, 2011 at 12:07 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Fri, 25 Nov 2011 12:02:17 +1100 Matt Joiner <anacrolix@gmail.com> wrote:
It's my impression that the readinto method does not fully support the buffer interface I was expecting. I've never had cause to use it until now. I've created a question on SO that describes my confusion:
Just use a memoryview and slice it:
b = bytearray(...)
m = memoryview(b)
n = f.readinto(m[some_offset:])
Cheers, this seems to be what I wanted. Unfortunately it doesn't perform noticeably better if I do this.

Eli, the use pattern I was referring to is when you read in chunks, and append to a running buffer. Presumably if you know in advance the size of the data, you can readinto directly to a region of a bytearray, thereby avoiding having to allocate a temporary buffer for the read, and creating a new buffer containing the running buffer plus the new data.

Strangely, I find that your readandcopy is faster at this, but not by much, than readinto. Here's the code; it's a bit explicit, but then so was the original:

BUFSIZE = 0x10000

def justread():
    # Just read a file's contents into a string/bytes object
    f = open(FILENAME, 'rb')
    s = b''
    while True:
        b = f.read(BUFSIZE)
        if not b:
            break
        s += b

def readandcopy():
    # Read a file's contents and copy them into a bytearray.
    # An extra copy is done here.
    f = open(FILENAME, 'rb')
    s = bytearray()
    while True:
        b = f.read(BUFSIZE)
        if not b:
            break
        s += b

def readinto():
    # Read a file's contents directly into a bytearray,
    # hopefully employing its buffer interface
    f = open(FILENAME, 'rb')
    s = bytearray(os.path.getsize(FILENAME))
    o = 0
    while True:
        b = f.readinto(memoryview(s)[o:o+BUFSIZE])
        if not b:
            break
        o += b

And the timings:

$ python3 -O -m timeit 'import fileread_bytearray' 'fileread_bytearray.justread()'
10 loops, best of 3: 298 msec per loop
$ python3 -O -m timeit 'import fileread_bytearray' 'fileread_bytearray.readandcopy()'
100 loops, best of 3: 9.22 msec per loop
$ python3 -O -m timeit 'import fileread_bytearray' 'fileread_bytearray.readinto()'
100 loops, best of 3: 9.31 msec per loop

The file was 10MB. I expected readinto to perform much better than readandcopy. I expected readandcopy to perform slightly better than justread. This clearly isn't the case.
Also I saw some comments on "top-posting" am I guilty of this?
If there's a magical option in gmail someone knows about, please tell.
Kind of :)
Regards
Antoine.

Eli, the use pattern I was referring to is when you read in chunks, and append to a running buffer. Presumably if you know in advance the size of the data, you can readinto directly to a region of a bytearray, thereby avoiding having to allocate a temporary buffer for the read, and creating a new buffer containing the running buffer plus the new data.

Strangely, I find that your readandcopy is faster at this, but not by much, than readinto. Here's the code; it's a bit explicit, but then so was the original:

BUFSIZE = 0x10000

def justread():
    # Just read a file's contents into a string/bytes object
    f = open(FILENAME, 'rb')
    s = b''
    while True:
        b = f.read(BUFSIZE)
        if not b:
            break
        s += b

def readandcopy():
    # Read a file's contents and copy them into a bytearray.
    # An extra copy is done here.
    f = open(FILENAME, 'rb')
    s = bytearray()
    while True:
        b = f.read(BUFSIZE)
        if not b:
            break
        s += b

def readinto():
    # Read a file's contents directly into a bytearray,
    # hopefully employing its buffer interface
    f = open(FILENAME, 'rb')
    s = bytearray(os.path.getsize(FILENAME))
    o = 0
    while True:
        b = f.readinto(memoryview(s)[o:o+BUFSIZE])
        if not b:
            break
        o += b

And the timings:

$ python3 -O -m timeit 'import fileread_bytearray' 'fileread_bytearray.justread()'
10 loops, best of 3: 298 msec per loop
$ python3 -O -m timeit 'import fileread_bytearray' 'fileread_bytearray.readandcopy()'
100 loops, best of 3: 9.22 msec per loop
$ python3 -O -m timeit 'import fileread_bytearray' 'fileread_bytearray.readinto()'
100 loops, best of 3: 9.31 msec per loop

The file was 10MB. I expected readinto to perform much better than readandcopy. I expected readandcopy to perform slightly better than justread. This clearly isn't the case.
What is 'python3' on your machine? If it's 3.2, then this is consistent with my results. Try it with 3.3 and for a larger file (say ~100MB and up), you may see the same speed as on 2.7.

Also, why do you think chunked reads are better here than slurping the whole file into the bytearray in one go? If you need it wholly in memory anyway, why not just issue a single read?

Eli

On Fri, Nov 25, 2011 at 5:41 PM, Eli Bendersky <eliben@gmail.com> wrote:
Eli, the use pattern I was referring to is when you read in chunks, and append to a running buffer. Presumably if you know in advance the size of the data, you can readinto directly to a region of a bytearray, thereby avoiding having to allocate a temporary buffer for the read, and creating a new buffer containing the running buffer plus the new data.

Strangely, I find that your readandcopy is faster at this, but not by much, than readinto. Here's the code; it's a bit explicit, but then so was the original:

BUFSIZE = 0x10000

def justread():
    # Just read a file's contents into a string/bytes object
    f = open(FILENAME, 'rb')
    s = b''
    while True:
        b = f.read(BUFSIZE)
        if not b:
            break
        s += b

def readandcopy():
    # Read a file's contents and copy them into a bytearray.
    # An extra copy is done here.
    f = open(FILENAME, 'rb')
    s = bytearray()
    while True:
        b = f.read(BUFSIZE)
        if not b:
            break
        s += b

def readinto():
    # Read a file's contents directly into a bytearray,
    # hopefully employing its buffer interface
    f = open(FILENAME, 'rb')
    s = bytearray(os.path.getsize(FILENAME))
    o = 0
    while True:
        b = f.readinto(memoryview(s)[o:o+BUFSIZE])
        if not b:
            break
        o += b

And the timings:

$ python3 -O -m timeit 'import fileread_bytearray' 'fileread_bytearray.justread()'
10 loops, best of 3: 298 msec per loop
$ python3 -O -m timeit 'import fileread_bytearray' 'fileread_bytearray.readandcopy()'
100 loops, best of 3: 9.22 msec per loop
$ python3 -O -m timeit 'import fileread_bytearray' 'fileread_bytearray.readinto()'
100 loops, best of 3: 9.31 msec per loop

The file was 10MB. I expected readinto to perform much better than readandcopy. I expected readandcopy to perform slightly better than justread. This clearly isn't the case.
What is 'python3' on your machine? If it's 3.2, then this is consistent with my results. Try it with 3.3 and for a larger file (say ~100MB and up), you may see the same speed as on 2.7
It's Python 3.2. I tried it for larger files and got some interesting results.

readinto() for 10MB files, reading 10MB all at once:

readinto/2.7 100 loops, best of 3: 8.6 msec per loop
readinto/3.2 10 loops, best of 3: 29.6 msec per loop
readinto/3.3 100 loops, best of 3: 19.5 msec per loop

With 100KB chunks for the 10MB file (annotated with #):

matt@stanley:~/Desktop$ for f in read bytearray_read readinto; do for v in 2.7 3.2 3.3; do echo -n "$f/$v "; "python$v" -m timeit -s 'import readinto' "readinto.$f()"; done; done
read/2.7 100 loops, best of 3: 7.86 msec per loop # this is actually faster than the 10MB read
read/3.2 10 loops, best of 3: 253 msec per loop # wtf?
read/3.3 10 loops, best of 3: 747 msec per loop # wtf??
bytearray_read/2.7 100 loops, best of 3: 7.9 msec per loop
bytearray_read/3.2 100 loops, best of 3: 7.48 msec per loop
bytearray_read/3.3 100 loops, best of 3: 15.8 msec per loop # wtf?
readinto/2.7 100 loops, best of 3: 8.93 msec per loop
readinto/3.2 100 loops, best of 3: 10.3 msec per loop # suddenly 3.2 is performing well?
readinto/3.3 10 loops, best of 3: 20.4 msec per loop

Here's the code: http://pastebin.com/nUy3kWHQ
Also, why do you think chunked reads are better here than slurping the whole file into the bytearray in one go? If you need it wholly in memory anyway, why not just issue a single read?
Sometimes it's not available all at once - I do a lot of socket programming, so this case is of interest to me. As shown above, it's also faster for Python 2.7. readinto() should also be significantly faster for this case, though it isn't.
Eli

On Fri, 25 Nov 2011 20:34:21 +1100 Matt Joiner <anacrolix@gmail.com> wrote:
It's Python 3.2. I tried it for larger files and got some interesting results.
readinto() for 10MB files, reading 10MB all at once:
readinto/2.7 100 loops, best of 3: 8.6 msec per loop
readinto/3.2 10 loops, best of 3: 29.6 msec per loop
readinto/3.3 100 loops, best of 3: 19.5 msec per loop
With 100KB chunks for the 10MB file (annotated with #):
matt@stanley:~/Desktop$ for f in read bytearray_read readinto; do for v in 2.7 3.2 3.3; do echo -n "$f/$v "; "python$v" -m timeit -s 'import readinto' "readinto.$f()"; done; done
read/2.7 100 loops, best of 3: 7.86 msec per loop # this is actually faster than the 10MB read
read/3.2 10 loops, best of 3: 253 msec per loop # wtf?
read/3.3 10 loops, best of 3: 747 msec per loop # wtf??
No "wtf" here, the read() loop is quadratic since you're building a new, larger, bytes object every iteration. Python 2 has a fragile optimization for concatenation of strings, which can avoid the quadratic behaviour on some systems (depends on realloc() being fast).
readinto/2.7 100 loops, best of 3: 8.93 msec per loop
readinto/3.2 100 loops, best of 3: 10.3 msec per loop # suddenly 3.2 is performing well?
readinto/3.3 10 loops, best of 3: 20.4 msec per loop
What if you allocate the bytearray outside of the timed function? Regards Antoine.
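One way to take the allocation out of the timed code (a sketch, not Matt's actual updated test; FILENAME is a placeholder): the bytearray is created once at module import, so "-m timeit" measures only the readinto() calls.

import os

FILENAME = 'data.bin'   # placeholder for the 10MB test file
BUFSIZE = 0x10000

BUF = bytearray(os.path.getsize(FILENAME))  # allocated once, outside the timed function

def readinto():
    view = memoryview(BUF)
    o = 0
    with open(FILENAME, 'rb') as f:
        while True:
            n = f.readinto(view[o:o + BUFSIZE])
            if not n:
                break
            o += n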

On Fri, Nov 25, 2011 at 10:04 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Fri, 25 Nov 2011 20:34:21 +1100 Matt Joiner <anacrolix@gmail.com> wrote:
It's Python 3.2. I tried it for larger files and got some interesting results.
readinto() for 10MB files, reading 10MB all at once:
readinto/2.7 100 loops, best of 3: 8.6 msec per loop
readinto/3.2 10 loops, best of 3: 29.6 msec per loop
readinto/3.3 100 loops, best of 3: 19.5 msec per loop
With 100KB chunks for the 10MB file (annotated with #):
matt@stanley:~/Desktop$ for f in read bytearray_read readinto; do for v in 2.7 3.2 3.3; do echo -n "$f/$v "; "python$v" -m timeit -s 'import readinto' "readinto.$f()"; done; done
read/2.7 100 loops, best of 3: 7.86 msec per loop # this is actually faster than the 10MB read
read/3.2 10 loops, best of 3: 253 msec per loop # wtf?
read/3.3 10 loops, best of 3: 747 msec per loop # wtf??
No "wtf" here, the read() loop is quadratic since you're building a new, larger, bytes object every iteration. Python 2 has a fragile optimization for concatenation of strings, which can avoid the quadratic behaviour on some systems (depends on realloc() being fast).
Is there any way to bring back that optimization? A 30 to 100x slowdown on probably one of the most common operations - string concatenation - is very noticeable. In Python 3.3, this represents a 0.7s stall building a 10MB string. Python 2.7 did this in 0.007s.
readinto/2.7 100 loops, best of 3: 8.93 msec per loop
readinto/3.2 100 loops, best of 3: 10.3 msec per loop # suddenly 3.2 is performing well?
readinto/3.3 10 loops, best of 3: 20.4 msec per loop
What if you allocate the bytearray outside of the timed function?
This change makes readinto() faster for 100K chunks than the other 2 methods and clears the differences between the versions.

readinto/2.7 100 loops, best of 3: 6.54 msec per loop
readinto/3.2 100 loops, best of 3: 7.64 msec per loop
readinto/3.3 100 loops, best of 3: 7.39 msec per loop

Updated test code: http://pastebin.com/8cEYG3BD
Regards
Antoine.
So as I think Eli suggested, the readinto() performance issue goes away with large enough reads; I'd put the differences down to some unrelated language changes. However, the performance drop on read() remains: Python 3.2 is 30x slower than 2.7, and 3.3 is 100x slower than 2.7.

On Fri, 25 Nov 2011 22:37:49 +1100 Matt Joiner <anacrolix@gmail.com> wrote:
On Fri, Nov 25, 2011 at 10:04 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Fri, 25 Nov 2011 20:34:21 +1100 Matt Joiner <anacrolix@gmail.com> wrote:
It's Python 3.2. I tried it for larger files and got some interesting results.
readinto() for 10MB files, reading 10MB all at once:
readinto/2.7 100 loops, best of 3: 8.6 msec per loop
readinto/3.2 10 loops, best of 3: 29.6 msec per loop
readinto/3.3 100 loops, best of 3: 19.5 msec per loop
With 100KB chunks for the 10MB file (annotated with #):
matt@stanley:~/Desktop$ for f in read bytearray_read readinto; do for v in 2.7 3.2 3.3; do echo -n "$f/$v "; "python$v" -m timeit -s 'import readinto' "readinto.$f()"; done; done
read/2.7 100 loops, best of 3: 7.86 msec per loop # this is actually faster than the 10MB read
read/3.2 10 loops, best of 3: 253 msec per loop # wtf?
read/3.3 10 loops, best of 3: 747 msec per loop # wtf??
No "wtf" here, the read() loop is quadratic since you're building a new, larger, bytes object every iteration. Python 2 has a fragile optimization for concatenation of strings, which can avoid the quadratic behaviour on some systems (depends on realloc() being fast).
Is there any way to bring back that optimization? a 30 to 100x slow down on probably one of the most common operations... string contatenation, is very noticeable. In python3.3, this is representing a 0.7s stall building a 10MB string. Python 2.7 did this in 0.007s.
Well, extending a bytearray() (as you saw yourself) is the proper solution in such cases. Note that you probably won't see a difference when concatenating very small strings.

It would be interesting if you could run the same benchmarks on other OSes (Windows or OS X, for example).

Regards

Antoine.
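A minimal illustration of the point (my own sketch, not from the thread; the chunk data is synthetic and timings are machine-dependent): concatenating bytes in a loop is quadratic, while extending a bytearray is amortized linear.

import timeit

CHUNK = b'x' * 0x10000   # a synthetic 64KB chunk
N = 160                  # about 10MB in total

def concat_bytes():
    # Quadratic: every += builds a new bytes object and copies everything so far.
    s = b''
    for _ in range(N):
        s += CHUNK
    return s

def extend_bytearray():
    # Amortized linear: the bytearray is resized in place.
    s = bytearray()
    for _ in range(N):
        s += CHUNK
    return bytes(s)

for f in ('concat_bytes', 'extend_bytearray'):
    t = min(timeit.repeat('%s()' % f, setup='from __main__ import %s' % f,
                          repeat=3, number=10)) / 10
    print('%s: %.1f msec per call' % (f, t * 1e3))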

On 25 November 2011 11:37, Matt Joiner <anacrolix@gmail.com> wrote:
No "wtf" here, the read() loop is quadratic since you're building a new, larger, bytes object every iteration. Python 2 has a fragile optimization for concatenation of strings, which can avoid the quadratic behaviour on some systems (depends on realloc() being fast).
Is there any way to bring back that optimization? a 30 to 100x slow down on probably one of the most common operations... string contatenation, is very noticeable. In python3.3, this is representing a 0.7s stall building a 10MB string. Python 2.7 did this in 0.007s.
It's a fundamental, but sadly not well-understood, consequence of having immutable strings. Concatenating immutable strings in a loop is quadratic. There are many ways of working around it (languages like C# and Java have string builder classes, I believe, and in Python you can use StringIO or build a list and join at the end) but that's as far as it goes.

The optimisation mentioned was an attempt (by mutating an existing string when the runtime determined that it was safe to do so) to hide the consequences of this fact from end-users who didn't fully understand the issues. It was relatively effective, but like any such case (floating point is another common example) it did some level of harm at the same time as it helped (by obscuring the issue further).

It would be nice to have the optimisation back if it's easy enough to do so, for quick-and-dirty code, but it is not a good idea to rely on it (and it's especially unwise to base benchmarks on it working :-))

Paul.
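For the binary reads discussed in this thread, the equivalents of the workarounds Paul mentions would be collecting chunks in a list and joining once, or writing into an io.BytesIO - a sketch (FILENAME and BUFSIZE are placeholders):

import io

FILENAME = 'data.bin'   # placeholder
BUFSIZE = 0x10000

def read_via_join():
    # Collect the chunks in a list; pay for a single copy in the final join.
    chunks = []
    with open(FILENAME, 'rb') as f:
        while True:
            b = f.read(BUFSIZE)
            if not b:
                break
            chunks.append(b)
    return b''.join(chunks)

def read_via_bytesio():
    # BytesIO grows an internal buffer; getvalue() makes one final copy.
    out = io.BytesIO()
    with open(FILENAME, 'rb') as f:
        while True:
            b = f.read(BUFSIZE)
            if not b:
                break
            out.write(b)
    return out.getvalue()

Both avoid the quadratic behaviour without relying on any interpreter-specific concatenation optimisation.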

2011/11/25 Paul Moore <p.f.moore@gmail.com>
The optimisation mentioned was an attempt (by mutating an existing string when the runtime determined that it was safe to do so) to hide the consequences of this fact from end-users who didn't fully understand the issues. It was relatively effective, but like any such case (floating point is another common example) it did some level of harm at the same time as it helped (by obscuring the issue further).
It would be nice to have the optimisation back if it's easy enough to do so, for quick-and-dirty code, but it is not a good idea to rely on it (and it's especially unwise to base benchmarks on it working :-))
Note that this string optimization hack is still present in Python 3, but it now acts on *unicode* strings, not bytes. -- Amaury Forgeot d'Arc

On 25 November 2011 15:07, Amaury Forgeot d'Arc <amauryfa@gmail.com> wrote:
2011/11/25 Paul Moore <p.f.moore@gmail.com>
It would be nice to have the optimisation back if it's easy enough to do so, for quick-and-dirty code, but it is not a good idea to rely on it (and it's especially unwise to base benchmarks on it working :-))
Note that this string optimization hack is still present in Python 3, but it now acts on *unicode* strings, not bytes.
Ah, yes. That makes sense. Paul

On 25/11/2011 15:48, Paul Moore wrote:
On 25 November 2011 15:07, Amaury Forgeot d'Arc<amauryfa@gmail.com> wrote:
2011/11/25 Paul Moore<p.f.moore@gmail.com>
It would be nice to have the optimisation back if it's easy enough to do so, for quick-and-dirty code, but it is not a good idea to rely on it (and it's especially unwise to base benchmarks on it working :-))

Note that this string optimization hack is still present in Python 3, but it now acts on *unicode* strings, not bytes.

Ah, yes. That makes sense.
Although for concatenating immutable bytes presumably the same hack would be *possible*.

Michael
Paul

On 11/24/2011 5:02 PM, Matt Joiner wrote:
What if you broke up the read and built the final string object up. I always assumed this is where the real gain was with read_into.
If a pure read takes twice as long in 3.3 as in 3.2, that is a concern regardless of whether there is a better way. -- Terry Jan Reedy

On Thu, 24 Nov 2011 20:53:30 +0200 Eli Bendersky <eliben@gmail.com> wrote:
Sure. Updated the default branch just now and built:
$1 -m timeit -s'import fileread_bytearray' 'fileread_bytearray.justread()'
1000 loops, best of 3: 1.14 msec per loop
$1 -m timeit -s'import fileread_bytearray' 'fileread_bytearray.readandcopy()'
100 loops, best of 3: 2.78 msec per loop
$1 -m timeit -s'import fileread_bytearray' 'fileread_bytearray.readinto()'
1000 loops, best of 3: 1.6 msec per loop
Strange. Although here, like in python 2, the performance of readinto is close to justread and much faster than readandcopy, but justread itself is much slower than in 2.7 and 3.2!
This seems to be a side-effect of http://hg.python.org/cpython/rev/f8a697bc3ca8/

Now I'm not sure if these numbers matter a lot. 1.6ms for a 3.6MB file is still more than 2 GB/s.

Regards

Antoine.

On Thu, 24 Nov 2011 20:53:30 +0200 Eli Bendersky <eliben@gmail.com> wrote:
Sure. Updated the default branch just now and built:
$1 -m timeit -s'import fileread_bytearray' 'fileread_bytearray.justread()'
1000 loops, best of 3: 1.14 msec per loop
$1 -m timeit -s'import fileread_bytearray' 'fileread_bytearray.readandcopy()'
100 loops, best of 3: 2.78 msec per loop
$1 -m timeit -s'import fileread_bytearray' 'fileread_bytearray.readinto()'
1000 loops, best of 3: 1.6 msec per loop
Strange. Although here, like in python 2, the performance of readinto is close to justread and much faster than readandcopy, but justread itself is much slower than in 2.7 and 3.2!
This seems to be a side-effect of http://hg.python.org/cpython/rev/f8a697bc3ca8/
Now I'm not sure if these numbers matter a lot. 1.6ms for a 3.6MB file is still more than 2 GB/s.
Just to be clear, there were two separate issues raised here. One is the speed regression of readinto() from 2.7 to 3.2, and the other is the relative slowness of justread() in 3.3.

Regarding the second, I'm not sure it's an issue, because I tried a larger file (100MB and then also 300MB) and the speed of 3.3 is now on par with 3.2 and 2.7.

However, the original question remains - on the 100MB file also, although in 2.7 readinto is 35% faster than readandcopy(), on 3.2 it's about the same speed (even a few % slower). That said, I now observe with Python 3.3 the same speed as with 2.7, including the readinto() speedup - so it appears that the readinto() regression has been solved in 3.3? Any clue about where it happened (i.e. which bug/changeset)?

Eli

On Fri, 25 Nov 2011 08:38:48 +0200 Eli Bendersky <eliben@gmail.com> wrote:
Just to be clear, there were two separate issues raised here. One is the speed regression of readinto() from 2.7 to 3.2, and the other is the relative slowness of justread() in 3.3
Regarding the second, I'm not sure it's an issue because I tried a larger file (100MB and then also 300MB) and the speed of 3.3 is now on par with 3.2 and 2.7
However, the original question remains - on the 100MB file also, although in 2.7 readinto is 35% faster than readandcopy(), on 3.2 it's about the same speed (even a few % slower). That said, I now observe with Python 3.3 the same speed as with 2.7, including the readinto() speedup - so it appears that the readinto() regression has been solved in 3.3? Any clue about where it happened (i.e. which bug/changeset)?
It would probably be http://hg.python.org/cpython/rev/a1d77c6f4ec1/

Regards

Antoine.

You can see in the tests that at the largest buffer size tested, 8192, the naive "read" actually outperforms readinto(). It's possibly only by extrapolating to significantly larger buffer sizes that readinto() gets left behind. It's also reasonable to assume that this wasn't tested thoroughly.

On Fri, Nov 25, 2011 at 9:55 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Fri, 25 Nov 2011 08:38:48 +0200 Eli Bendersky <eliben@gmail.com> wrote:
Just to be clear, there were two separate issues raised here. One is the speed regression of readinto() from 2.7 to 3.2, and the other is the relative slowness of justread() in 3.3
Regarding the second, I'm not sure it's an issue because I tried a larger file (100MB and then also 300MB) and the speed of 3.3 is now on par with 3.2 and 2.7
However, the original question remains - on the 100MB file also, although in 2.7 readinto is 35% faster than readandcopy(), on 3.2 it's about the same speed (even a few % slower). That said, I now observe with Python 3.3 the same speed as with 2.7, including the readinto() speedup - so it appears that the readinto() regression has been solved in 3.3? Any clue about where it happened (i.e. which bug/changeset)?
It would probably be http://hg.python.org/cpython/rev/a1d77c6f4ec1/
Regards
Antoine.

However, the original question remains - on the 100MB file also, although in 2.7 readinto is 35% faster than readandcopy(), on 3.2 it's about the same speed (even a few % slower). That said, I now observe with Python 3.3 the same speed as with 2.7, including the readinto() speedup - so it appears that the readinto() regression has been solved in 3.3? Any clue about where it happened (i.e. which bug/changeset)?
It would probably be http://hg.python.org/cpython/rev/a1d77c6f4ec1/
Great, thanks. This is an important change, definitely something to wait for in 3.3.

Eli

I was under the impression this is already in 3.3? On Nov 25, 2011 10:58 PM, "Eli Bendersky" <eliben@gmail.com> wrote:
However, the original question remains - on the 100MB file also, although in 2.7 readinto is 35% faster than readandcopy(), on 3.2 it's about the same speed (even a few % slower). That said, I now observe with Python 3.3 the same speed as with 2.7, including the readinto() speedup - so it appears that the readinto() regression has been solved in 3.3? Any clue about where it happened (i.e. which bug/changeset)?
It would probably be http://hg.python.org/cpython/rev/a1d77c6f4ec1/
Great, thanks. This is an important change, definitely something to wait for in 3.3 Eli
participants (7)
- Amaury Forgeot d'Arc
- Antoine Pitrou
- Eli Bendersky
- Matt Joiner
- Michael Foord
- Paul Moore
- Terry Reedy