[Python-ideas] Keep free list of popular iterator objects

Kyle Fisher anthonyfk at gmail.com
Sun Sep 15 20:52:57 CEST 2013


"In the case not, we'll just check an iterator and fall back to the default
freelist."

Should of course be:

"In the case not, we'll just check an integer and fall back to the default
freelist."


On Sun, Sep 15, 2013 at 12:50 PM, Kyle Fisher <anthonyfk at gmail.com> wrote:

> Hi Raymond,
>
> Thanks for taking the time to respond, I appreciate that!  Please note
> that I'm not attempting to be deceptive.  I'm not looping over a list of
> length one; I'm just timing the creation of the iterator object.  In doing
> this, I've shown that, in a micro-benchmark, creating an iterator object
> can be made 15% faster with a freelist.
>
> Yes, in one loop this is focusing on an O(1) setup of an O(n) operation.
> I think the main benefit of this would be for inner loops.  (Which is what
> my benchmark tested, no?)  In our case, we tend to have a large list to
> iterate over (n) where each item has a couple containers that we also need
> to iterate over (size m).  In this case, I'm focusing on the O(n) setup of
> the O(n*m) operation where n is large.  Surely this isn't completely
> wasteful?
>
> I'm not completely sure what the best way is to exercise the case where the
> freelist misses.  Create one more iterator than the number of freelisted
> iterators, perhaps?  I'm not sure if that would be reflective of real-world uses
> though.  Ignoring threads or other spontaneous iterator creation, either a
> particular loop is going to have its iterator in the freelist or not.  In
> the case not, we'll just check an iterator and fall back to the default
> freelist.
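
[Editorial note: one way to exercise the miss path Kyle describes is to allocate more iterators than the freelist can hold before any are freed. A minimal sketch with `timeit` is below; the freelist depth `N = 80` is a hypothetical value and would need to match whatever cap the patch actually uses.]

```python
import timeit

# Hypothetical freelist depth; adjust to match the patch under test.
SETUP = "a = [1]; N = 80"

# Hit path: each iterator is freed before the next is created, so a
# per-type freelist (if the patch is applied) can serve every request.
hit = timeit.timeit("it = iter(a)", setup=SETUP, number=100000)

# Miss path: allocate N + 1 iterators before any are freed, so a
# depth-N freelist would be drained and the final allocation must
# fall back to the general allocator.
miss = timeit.timeit("its = [iter(a) for _ in range(N + 1)]",
                     setup=SETUP, number=100000)

print("hit-path total:  %.4fs" % hit)
print("miss-path total: %.4fs (divide by N + 1 for per-iterator cost)" % miss)
```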
>
> Regarding your second paragraph, would a more real-world benchmark
> help?  I don't want to put too many more resources into a bad idea, but I
> know in our app we tend to iterate over things a lot, so my hunch is that
> the iterator freelist would be in cache more often than not.  Forgive my
> ignorance, but is there a macro-benchmark suite I could try this against?
> Even if the iterator freelist isn't in first-level cache, I'm almost
> certain it would exist within last-level cache.  Do you know what the cost
> of fetching from this is compared to grabbing a lock like the current
> _Py_Malloc does?
>
> Again, thanks for the time.  This is literally the first thing I've tried
> hacking into Python because it seemed like a cheap, easy (albeit, minor)
> improvement to a very common operation.
>
> Best,
> -Kyle
>
> P.S. I would like to put some effort into aligned memory allocations!
> I've been casually browsing the issue on the bug tracker over the last week
> or so; I'm somewhat surprised this isn't already the case for the numerical
> types!
>
>
> On Sun, Sep 15, 2013 at 10:38 AM, Raymond Hettinger <
> raymond.hettinger at gmail.com> wrote:
>
>>
>> On Sep 14, 2013, at 11:26 PM, Kyle Fisher <anthonyfk at gmail.com> wrote:
>>
>> I've realized that my original example is far too complex, so I've
>> simplified it:
>>
>> Status quo:
>> ./python -m timeit -r 100 -s "a=[1]" "iter(a)"
>> 10000000 loops, best of 100: 0.0662 usec per loop
>>
>> With patch:
>> ./python -m timeit -r 100 -s "a=[1]" "iter(a)"
>> 10000000 loops, best of 100: 0.0557 usec per loop
>> List iter allocations: 6
>> List iter reuse through freelist: 1011111554
>> 100.00% reuse rate
>>
>> Which seems to show a 15% speedup.  I'd be curious what others get.
>>
>>
>> This 15% claim is incredibly deceptive.  You're looping over a list of
>> length one and the "benefits" fall away immediately for anything longer.
>>  It seems like it is intentionally ignoring that you're optimizing an O(1)
>> setup step in an O(n) operation.  And the timing loop does not exercise the
>> cases where the freelist misses.
>>
>> More important effects are being masked by the tight timing loop that
>> only exercises the most favorable case.  In real programs, your patch may
>> actually make performance worse.  The default freelisting scheme is heavily
>> used and tends to always be in cache.  In contrast, a freelist for less
>> frequently used objects tends not to be in cache when you need it.
>> Similar logic applies to branch prediction here as well. (In short, I
>> believe that the patch serves only to optimize an unrealistic benchmark and
>> would make actual programs worse-off).
>>
>> I'm -1 on adding freelisting to iterators.  I recently removed the
>> freelist scheme from Objects/setobject.c because it provided no incremental
>> benefit over the default freelisting scheme.
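
[Editorial note: for readers unfamiliar with the freelist technique under discussion, the idea is to keep a small stack of recently freed objects per type so allocation can often skip the general allocator. A minimal Python sketch follows; the class and method names are hypothetical (CPython's real freelists are in C, e.g. in Objects/listobject.c).]

```python
# Minimal sketch of a per-type freelist (illustrative only).
MAXFREELIST = 80  # cap on cached objects, like CPython's per-type limits


class _IterShell:
    """Stand-in for an iterator object that can be recycled."""
    def __init__(self, seq):
        self.seq, self.pos = seq, 0


class IterPool:
    """Reuse iterator shells instead of allocating fresh ones."""
    def __init__(self):
        self._free = []          # the freelist: recently released shells

    def acquire(self, seq):
        if self._free:           # hit: pop a cached shell, O(1)
            it = self._free.pop()
            it.seq, it.pos = seq, 0
            return it
        return _IterShell(seq)   # miss: fall back to a real allocation

    def release(self, it):
        if len(self._free) < MAXFREELIST:
            it.seq = None        # drop the reference so seq can be freed
            self._free.append(it)


pool = IterPool()
a = pool.acquire([1, 2, 3])
pool.release(a)
b = pool.acquire([4, 5])
assert a is b  # the shell was reused from the freelist
```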
>>
>> Please focus your optimization efforts elsewhere in the code.  There are
>> real improvements to be had for operations that matter.  The time to create
>> an iterator is one of the least important operations in Python.   If this
>> had been a real win, I would have incorporated it into itertools long ago.
>>
>>
>> Raymond
>>
>>
>> P.S.  If you want to help benchmark the effects of aligned versus
>> unaligned memory allocations, that is an area that is likely to bear fruit
>> (for example, if integer objects were 32-byte aligned, it would guarantee
>> that the object head and body would be in the same cache line).
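
[Editorial note: the alignment idea can be prototyped from Python by over-allocating and rounding the start address up to the next multiple of the alignment. The sketch below uses `ctypes`; the helper name `aligned_buffer` is hypothetical.]

```python
import ctypes


def aligned_buffer(size, alignment=32):
    """Return a ctypes view of `size` bytes whose start address is a
    multiple of `alignment`, by over-allocating and rounding up."""
    raw = ctypes.create_string_buffer(size + alignment - 1)
    base = ctypes.addressof(raw)
    offset = (-base) % alignment       # bytes needed to reach alignment
    aligned_addr = base + offset
    # `from_address` does not keep `raw` alive, so return it alongside
    # the aligned view to pin the underlying memory.
    view = (ctypes.c_char * size).from_address(aligned_addr)
    return raw, view, aligned_addr


raw, view, addr = aligned_buffer(1024, 32)
assert addr % 32 == 0  # the view starts on a 32-byte boundary
```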
>>

