[Tutor] How to override getting items from a list for iteration

Sun Feb 10 16:18:22 CET 2013

On 02/10/2013 10:10 AM, Dave Angel wrote:
> On 02/10/2013 09:32 AM, Walter Prins wrote:
>> Hello,
>>
>> I have a program where I'm overriding the retrieval of items from a list.
>> As background: the data held by the lists is calculated but then
>> potentially read many times thereafter. To avoid needlessly recalculating
>> the same value over and over, and to keep checking/caching code out of the
>> calculation logic, I created a subclass of list that automatically
>> calculates the value in a given slot if it has not yet been calculated.
>> (Put differently, I've implemented a kind of list-specific
>> caching/memoization with the intent that it should be transparent to the
>> client code.)
>>
>> The way I've implemented this so far was to simply override
>> list.__getitem__(self, key) to check whether the value needs to be
>> calculated and call a calculation method if required, after which the
>> value is returned as normal.  On subsequent calls __getitem__ then returns
>> the value directly without calculating it again.
>>
>> This worked mostly fine. However, yesterday I ran into a slightly
>> unexpected problem: when the list contents are iterated over and values
>> retrieved that way rather than via [], __getitem__ is in fact *not*
>> called on the list to read the item values. Consequently I get back the
>> "not yet calculated" entries in the list, without the calculation routine
>> being called automatically as intended.
>>
>> Here's a test application that demonstrates the issue:
>>
>> class NotYetCalculated:
>>      pass
>>
>> class CalcList(list):
>>      def __init__(self, calcitem):
>>          super(CalcList, self).__init__()
>>          self.calcitem = calcitem
>>
>>      def __getitem__(self, key):
>>          """Override __getitem__ to call self.calcitem() if needed"""
>>          print "CalcList.__getitem__(): Enter"
>>          value = super(CalcList, self).__getitem__(key)
>>          if value is NotYetCalculated:
>>              print "CalcList.__getitem__(): calculating"
>>              value = self.calcitem(key)
>>              self[key] = value
>>          print "CalcList.__getitem__(): return"
>>          return value
>>
>> def calcitem(key):
>>      # Demo: return square of index
>>      return key*key
>>
>>
>> def main():
>>      # Create a list that calculates its contents via a given
>>      # method/fn once only
>>      l = CalcList(calcitem)
>>      # Extend with a few entries to demonstrate the issue:
>>      l.extend([NotYetCalculated, NotYetCalculated, NotYetCalculated,
>>                NotYetCalculated])
>>
>>      print ("1) Directly getting values from list works as expected: "
>>             "__getitem__ is called:")
>>      print "Retrieving value [2]:\n", l[2]
>>      print
>>      print "Retrieving value [3]:\n", l[3]
>>      print
>>      print "Retrieving value [2] again (no calculation this time):\n", l[2]
>>      print
>>
>>      print "2) Retrieving values via an iterator doesn't work as expected:"
>>      print "(__getitem__ is not called and the code returns "
>>      print (" NotYetCalculated entries without calling __getitem__. "
>>             "How do I fix this?)")
>>      print "List contents:"
>>      for x in l: print x
>>
>>
>> if __name__ == "__main__":
>>      main()
>>
>> To reiterate:
>>
>> What should happen:  In test 2) above all entries should be automatically
>> calculated and output should be numbers only.
>>
>> What actually happens: In test 2) above the first 2 list entries,
>> corresponding to list indexes 0 and 1, are output as "NotYetCalculated"
>> and calcitem is not called when required.
>>
>> What's the best way to fix this problem?  Do I need to maybe override
>> another method, perhaps provide my own iterator implementation?  For that
>> matter, why doesn't iterating over the list contents fall back to calling
>> __getitem__?
>>
>
> Implement your own __iter__() special method.
>
> And consider whether you might need __setitem__(), __len__(),
> __setslice__(), __getslice__() and others.
>
> Maybe you'd be better off not inheriting from list at all, and just
> having an attribute that's a list.  It doesn't sound like you're
> defining a very big subset of list, and overriding the methods you
> *don't* want seems to be more work than just implementing the ones you do.
>
> A separate question:  is this likely to be a sparse list?  If it's very
> sparse, perhaps you'd consider using a dict, rather than a list attribute.
>
>
>

BTW, the reason iterating over the list contents doesn't call __getitem__
is that list defines its own __iter__, presumably so it can iterate more
efficiently.
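
As a rough sketch of the first suggestion (giving the subclass its own
__iter__), something like the following might do. It reuses the CalcList and
calcitem names from the original post; the __iter__ body is only my assumption
of one way to route iteration through __getitem__, it handles integer indexes
only, and it is written so that it should also run on Python 3:

```python
class NotYetCalculated(object):
    """Sentinel class marking a slot whose value has not been computed."""


class CalcList(list):
    def __init__(self, calcitem):
        super(CalcList, self).__init__()
        self.calcitem = calcitem

    def __getitem__(self, key):
        # Integer indexes only; slices are not handled in this sketch.
        value = super(CalcList, self).__getitem__(key)
        if value is NotYetCalculated:
            value = self.calcitem(key)
            self[key] = value  # cache so the calculation runs only once
        return value

    def __iter__(self):
        # Route iteration through our __getitem__ instead of list's iterator.
        for index in range(len(self)):
            yield self[index]


def calcitem(key):
    # Demo from the original post: return the square of the index.
    return key * key


l = CalcList(calcitem)
l.extend([NotYetCalculated] * 4)
values = [x for x in l]  # iteration now triggers the calculation
```

This only covers iteration. Other inherited list methods may still read the
stored entries directly and bypass __getitem__.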

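And there is your clue that perhaps you don't want to inherit from list.
You don't want its more efficient version, so all you have to do is not
inherit from list and not implement an __iter__. Iteration then falls back
to calling __getitem__ with successive indexes starting at 0 until
IndexError is raised, and it should just work.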
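
A minimal sketch of that approach, assuming a wrapper class that holds a plain
list as an attribute. The names CalcSequence, size and calls are made up for
illustration and are not from the original post; negative indexes and slices
are not handled:

```python
class NotYetCalculated(object):
    """Sentinel class marking a slot whose value has not been computed."""


class CalcSequence(object):
    # Hypothetical name: wraps a plain list rather than inheriting from it.
    def __init__(self, calcitem, size):
        self.calcitem = calcitem
        self._items = [NotYetCalculated] * size

    def __len__(self):
        return len(self._items)

    def __getitem__(self, key):
        # An out-of-range index raises IndexError here, which is what
        # ends a for loop that relies on the __getitem__ fallback.
        value = self._items[key]
        if value is NotYetCalculated:
            value = self.calcitem(key)
            self._items[key] = value  # cache the calculated value
        return value


calls = []


def calcitem(key):
    calls.append(key)  # record each calculation for the demo
    return key * key


seq = CalcSequence(calcitem, 4)
first_pass = [x for x in seq]   # no __iter__, so __getitem__ is used
second_pass = [x for x in seq]  # served from the cached values
```

Since the class defines no __iter__, the for loop has nothing to use but
__getitem__, so every value passes through the caching code.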

-- 
DaveA