[Python-ideas] Keep free list of popular iterator objects

Kyle Fisher anthonyfk at gmail.com
Sun Sep 15 20:50:42 CEST 2013


Hi Raymond,

Thanks for taking the time to respond, I appreciate that!  Please note that
I'm not attempting to be deceptive.  I'm not looping over a list of length
one; I'm just timing the creation of the iterator object.  In doing this,
I've shown that, in a micro-benchmark, creating an iterator object can be
made 15% faster with a freelist.

Yes, in one loop this is focusing on an O(1) setup of an O(n) operation.  I
think the main benefit of this would be for inner loops.  (Which is what my
benchmark tested, no?)  In our case, we tend to have a large list to
iterate over (n) where each item has a couple containers that we also need
to iterate over (size m).  In this case, I'm focusing on the O(n) setup of
the O(n*m) operation where n is large.  Surely this isn't completely
wasteful?

I'm not completely sure what the best way to exercise the case where the
freelist misses.  Create one more than the number of freelisted iterators,
perhaps?  I'm not sure if that would be reflective of real-world uses
though.  Ignoring threads or other spontaneous iterator creation, either a
particular loop is going to have its iterator in the freelist or not.  In
the case not, we'll just check an iterator and fall back to the default
freelist.

In regards to your second paragraph, would a more real-world benchmark
help?  I don't want to put too many more resources into a bad idea, but I
know in our app we tend to iterate over things a lot, so my hunch is that
the iterator freelist would be in cache more often than not.  Forgive my
ignorance, but is there a macro-benchmark suite I could try this against?
Even if the iterator freelist isn't in first-level cache, I'm almost
certain it would exist within last-level cache.  Do you know what the cost
of fetching from this is compared to grabbing a lock like the current
_Py_Malloc does?

Again, thanks for the time.  This is literally the first thing I've tried
hacking into Python because it seemed like a cheap, easy (albeit, minor)
improvement to a very common operation.

Best,
-Kyle

P.S. I would like to put some effort into aligned memory allocations!  I've
been casually browsing the issue on the bug tracker over the last week or
so; I'm somewhat surprised this isn't already the case for the numerical
types!


On Sun, Sep 15, 2013 at 10:38 AM, Raymond Hettinger <
raymond.hettinger at gmail.com> wrote:

>
> On Sep 14, 2013, at 11:26 PM, Kyle Fisher <anthonyfk at gmail.com> wrote:
>
> I've realized that my original example is far too complex, so I've
> simplified it:
>
> Status quo:
> ./python -m timeit -r 100 -s "a=[1]" "iter(a)"
> 10000000 loops, best of 100: 0.0662 usec per loop
>
> With patch:
> ./python -m timeit -r 100 -s "a=[1]" "iter(a)"
> 10000000 loops, best of 100: 0.0557 usec per loop
> List iter allocations: 6
> List iter reuse through freelist: 1011111554
> 100.00% reuse rate
>
> Which seems to show a 15% speedup.  I'd be curious what others get.
>
>
> This 15% claim is incredibly deceptive.  You're looping over a list of
> length one and the "benefits" fall away immediately for anything longer.
>  It seems like it is intentionally ignoring that you're optimizing an O(1)
> setup step in an O(n) operation.  And the timing loop does not exercise the
> cases where the freelist misses.
>
> More important effects are being masked by the tight timing loop that only
> exercises the most favorable case.  In real programs, your patch may
> actually make performance worse.  The default freelisting scheme is heavily
> used and tends to always be in cache.  In contrast, a freelist for less
> frequently used objects tend to not be in cache when you need them.
> Similar logic applies to branch prediction here as well. (In short, I
> believe that the patch serves only to optimize an unrealistic benchmark and
> would make actual programs worse-off).
>
> I'm -1 on adding freelisting to iterators.  I recently removed the
> freelist scheme from Objects/setobject.c because it provided no incremental
> benefit over the default freelisting scheme.
>
> Please focus your optimization efforts elsewhere in the code.  There are
> real improvements to be had for operations that matter.  The time to create
> an iterator is one of the least important operations in Python.   If this
> had been a real win, I would have incorporated it into itertools long ago.
>
>
> Raymond
>
>
> P.S.  If you want to help benchmark the effects of aligned versus
> unaligned memory allocations, that is an area this likely to bear fruit
> (for example, if integer objects with 32 byte aligned, it would guarantee
> that the object head and body would be in the same cache line).
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130915/73e06b86/attachment.html>


More information about the Python-ideas mailing list