[Tutor] How to override getting items from a list for iteration

Sun Feb 10 16:15:57 CET 2013

Walter Prins wrote:

> Hello,
> 
> I have a program where I'm overriding the retrieval of items from a list.
>  As background: The data held by the lists are calculated but then read
> potentially many times thereafter, so in order to prevent needless
> re-calculating the same value over and over, and to remove
> checking/caching code from the calculation logic code, I therefore created
> a subclass of list that will automatically calculate the value in a given
> slot automatically if not yet calculated. (So differently put, I'm
> implemented a kind of list specific caching/memoization with the intent
> that it should be transparent to the client code.)
> 
> The way I've implemented this so far was to simply override
> list.__getitem__(self, key) to check if the value needs to be calculated
> or not and call a calculation method if required, after which the value is
> returned as normal.  On subsequent calls __getitem__ then directly returns
> the value without calculating it again.
> 
> This worked mostly fine, however yesterday I ran into a slightly
> unexpected problem when I found that when the list contents is iterated
> over and values retrieved that way rather than via [], then __getitem__ is
> in fact *not* called on the list to read the item values from the list,
> and consequently I get back the "not yet calculated" entries in the list,
> without the calculation routine being automatically called as is intended.
> 
> Here's a test application that demonstrates the issue:
> 
> class NotYetCalculated:
>     pass
> 
> class CalcList(list):
>     def __init__(self, calcitem):
>         super(CalcList, self).__init__()
>         self.calcitem = calcitem
> 
>     def __getitem__(self, key):
>         """Override __getitem__ to call self.calcitem() if needed"""
>         print "CalcList.__getitem__(): Enter"
>         value = super(CalcList, self).__getitem__(key)
>         if value is NotYetCalculated:
>             print "CalcList.__getitem__(): calculating"
>             value = self.calcitem(key)
>             self[key] = value
>         print "CalcList.__getitem__(): return"
>         return value
> 
> def calcitem(key):
>     # Demo: return square of index
>     return key*key
> 
> 
> def main():
>     # Create a list that calculates its contents via a given
>     # method/fn onece only
>     l = CalcList(calcitem)
>     # Extend with  few entries to demonstrate issue:
>     l.extend([NotYetCalculated, NotYetCalculated, NotYetCalculated,
>               NotYetCalculated])
> 
>     print "1) Directly getting values from list works as expected:
> __getitem__ is called:"
>     print "Retrieving value [2]:\n", l[2]
>     print
>     print "Retrieving value [3]:\n", l[3]
>     print
>     print "Retrieving value [2] again (no calculation this time):\n", l[2]
>     print
> 
>     print "Retrieving values via an iterator doesn't work as expected:"
>     print "(__getitem__ is not called and the code returns "
>     print " NotYetCalcualted entries without calling __getitem__. How do I
> fix this?)"
>     print "List contents:"
>     for x in l: print x
> 
> 
> if __name__ == "__main__":
>     main()
> 
> To reiterate:
> 
> What should happen:  In test 2) above all entries should be automatically
> calculated and output should be numbers only.
> 
> What actually happens: In test 2) above the first 2 list entries
> corresponding to list indexes 0 and 1 are output as "NotYetCalculated" and
> calcitem is not called when required.
> 
> What's the best way to fix this problem?  Do I need to maybe override
> another method, perhaps provide my own iterator implementation?  For that
> matter, why doesn't iterating over the list contents fall back to calling
> __getitem__?

Probably an optimisation for the common case where retrieval of list items 
does not involve any calculation.

You can override the __iter__() along the lines of

def __iter__(self):
    for i in range(len(self)):
        return self[i]

If the items are calculated from the index as in your example there's also 
the option to inherit from collections.Sequence instead of list:

from collections import Sequence

class List(Sequence):
    def __init__(self, getitem, size):
        self.getitem = getitem
        self._cache = [None] * size
    def __getitem__(self, index):
        assert not isinstance(index, (slice, tuple))
        value = self._cache[index]
        if value is None:
            value = self._cache[index] = self.getitem(index)
        return value
    def __len__(self):
        return len(self._cache)

if __name__ == "__main__":
    items = List(lambda x: x*x, 10)
    print("items[4] =", items[4])
    print("items =", list(items))

But first and foremost I'd seriously reinvestigate your caching scheme. Does 
it really save enough time to make it worthwhile?