[Python-Dev] file.readinto performance regression in Python 3.2 vs. 2.7?
Antoine Pitrou
solipsis at pitrou.net
Fri Nov 25 13:11:57 CET 2011
On Fri, 25 Nov 2011 22:37:49 +1100
Matt Joiner <anacrolix at gmail.com> wrote:
> On Fri, Nov 25, 2011 at 10:04 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> > On Fri, 25 Nov 2011 20:34:21 +1100
> > Matt Joiner <anacrolix at gmail.com> wrote:
> >>
> >> It's Python 3.2. I tried it for larger files and got some interesting results.
> >>
> >> readinto() for 10MB files, reading 10MB all at once:
> >>
> >> readinto/2.7 100 loops, best of 3: 8.6 msec per loop
> >> readinto/3.2 10 loops, best of 3: 29.6 msec per loop
> >> readinto/3.3 100 loops, best of 3: 19.5 msec per loop
> >>
> >> With 100KB chunks for the 10MB file (annotated with #):
> >>
> >> matt at stanley:~/Desktop$ for f in read bytearray_read readinto; do for v in 2.7 3.2 3.3; do echo -n "$f/$v "; "python$v" -m timeit -s 'import readinto' "readinto.$f()"; done; done
> >> read/2.7 100 loops, best of 3: 7.86 msec per loop # this is actually faster than the 10MB read
> >> read/3.2 10 loops, best of 3: 253 msec per loop # wtf?
> >> read/3.3 10 loops, best of 3: 747 msec per loop # wtf??
> >
> > No "wtf" here, the read() loop is quadratic since you're building a
> > new, larger, bytes object every iteration. Python 2 has a fragile
> > optimization for concatenation of strings, which can avoid the
> > quadratic behaviour on some systems (depends on realloc() being fast).
>
> Is there any way to bring back that optimization? A 30- to 100-fold
> slowdown on what is probably one of the most common operations, string
> concatenation, is very noticeable. In Python 3.3, this represents a
> 0.7 s stall while building a 10MB string; Python 2.7 did it in 0.007 s.

Well, extending a bytearray() (as you saw yourself) is the proper
solution in such cases. Note that you probably won't see a difference
when concatenating very small strings.
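
For reference, here is a minimal sketch of the three loop styles being
compared, assuming 100KB chunks as in the chunked benchmark. The function
names (read_concat, read_bytearray, read_readinto) are hypothetical; the
actual contents of the "readinto" benchmark module were not posted in this
thread.

```python
import os
import tempfile

CHUNK_SIZE = 100 * 1024  # assumed 100KB chunks, as in the chunked benchmark


def read_concat(path):
    """Builds a new, larger bytes object on every iteration (quadratic)."""
    data = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            data += chunk  # may copy everything read so far
    return data


def read_bytearray(path):
    """Extends one mutable buffer in place (amortized linear)."""
    buf = bytearray()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            buf += chunk  # grows the existing bytearray, with over-allocation
    return bytes(buf)


def read_readinto(path):
    """Preallocates from the file size, then fills the buffer with readinto()."""
    size = os.path.getsize(path)
    buf = bytearray(size)
    view = memoryview(buf)  # slices of the view write straight into buf
    pos = 0
    with open(path, "rb") as f:
        while pos < size:
            n = f.readinto(view[pos:pos + CHUNK_SIZE])
            if not n:
                break  # EOF earlier than expected (file shrank meanwhile)
            pos += n
    return bytes(buf[:pos])


if __name__ == "__main__":
    payload = os.urandom(1024 * 1024)  # a 1MB test file
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    try:
        for func in (read_concat, read_bytearray, read_readinto):
            assert func(tmp.name) == payload, func.__name__
    finally:
        os.remove(tmp.name)
```

All three return the same bytes; only the growth strategy differs, which is
where the timing gap between read and bytearray_read comes from.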

It would be interesting if you could run the same benchmarks on other
OSes (Windows or OS X, for example).
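
A minimal, hypothetical harness for comparing platforms might time the
concatenation alone, without file I/O, so that differences in realloc()
behaviour are not masked by disk caching. It assumes 100 pieces of 100KB
(about 10MB in total) and mimics the best-of-3 reporting of
"python -m timeit"; the names are illustrative, not taken from the original
benchmark module.

```python
from __future__ import print_function  # lets the same script run on 2.7 and 3.x

import platform
import sys
import timeit

CHUNK = 100 * 1024  # assumed 100KB per piece
N_CHUNKS = 100      # 100 pieces, roughly 10MB in total


def concat_bytes():
    """Concatenates immutable bytes (str on 2.7), piece by piece."""
    piece = b"x" * CHUNK
    data = b""
    for _ in range(N_CHUNKS):
        data += piece  # may build a new, larger object each time
    return data


def concat_bytearray():
    """Appends to a mutable bytearray, which grows in place."""
    piece = b"x" * CHUNK
    data = bytearray()
    for _ in range(N_CHUNKS):
        data += piece
    return data


def best_ms(func, number=3, repeat=3):
    """Best per-call time in msec over `repeat` runs of `number` calls each."""
    times = timeit.repeat(func, number=number, repeat=repeat)
    return min(times) / number * 1000.0


if __name__ == "__main__":
    # Record interpreter version and OS so results from different machines line up.
    print(sys.version.split()[0], platform.system(), platform.machine())
    for func in (concat_bytes, concat_bytearray):
        print("%-17s %8.2f msec per loop" % (func.__name__, best_ms(func)))
```

Running it under python2.7, python3.2 and python3.3 on each OS would show
whether the bytes/str path depends on the platform's realloc(), while the
bytearray path should stay roughly flat.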
Regards
Antoine.