[Tutor] __iter__ loops, partitioning list among children

Eric Abrahamsen eric at ericabrahamsen.net
Sat Aug 23 12:47:15 CEST 2008


Hi,

I've got a problem that takes a bit of explaining, but it's relatively  
simple when you get down to it. This is another django-related thing,  
but the issue itself is pure python.

I made a custom class, called an EventEngine, which represents a span  
of time. You initialize it with a queryset/list of django model  
instances, and the resulting EventEngine object has an 'events'  
attribute, which is a queryset/list of model instances that fall  
within that span of time, according to a datetime attribute on the  
django model.

Next come EventEngine subclasses, representing commonly used spans of  
time – Month, Week, Day, etc. The EventEngine __iter__ method is  
overridden so that each of these subclasses, when iterated over,  
produces child instances of the next smallest EventEngine subclass.  
Iterating on a Month produces Weeks, iterating on a Week produces Days  
and so on.

As each parent produces its children, the events in the 'events'  
attribute get partitioned out to the children as appropriate to their  
date ranges. Right now I'm doing this as stupidly as possible: for  
each child 'c' in the parent, I loop over all the parent events, and  
append them to c.events if their date attribute falls within the  
child's date range:

for e in self.events:
   if c.start <= getattr(e,self.start_attr) < c.stop:
   c.events.append(e)

This is stupid for (at least) two reasons: 1) it evaluates the entire  
queryset and sticks it in memory with the first child instance,  
obviating the whole point of an iterable, and 2) it loops over the  
entire event list for every child, even though the list is already  
properly sorted by date, and we could break from the loop with the  
first event that falls outside the child's date range. Performance is  
important, as some madman could initialize a Year with a huge  
queryset, and do nested iterations all the way down to Minute.

So I've got a few considerations:

1. The parent's events queryset should only be evaluated as much as is  
needed to yield one child at a time.
2. The parent's events queryset should not be duplicated as it is  
distributed among the children, ie it should make references and not  
copies (I don't know how to make sure this happens).
3. I don't want to empty the parent's 'events' list as it is  
distributed among the children, as other methods depend on the list  
being there.
4. The events attribute of the 'top level' instance is a django  
queryset, but it will be a list for all descendent instances (they can  
be nested arbitrarily deeply). This limits the number of functions  
available – ie you can't pop() a django queryset.

At first I thought the bisect module was the way to go, but it is too  
tightly tied to integer list indices, and works very awkwardly when  
bisecting on datetime attributes. List slices produce copies, unless  
I'm mistaken. Something tells me that a plain old while loop with a  
break might do the trick, but I can't quite see it clear. Ideally this  
would be something that starts iterating where the date attribute  
falls into the right range, and then stops iterating as soon as it  
passes out of that range. Probably I'll have to settle for just one or  
the other, but one can dream.

Thanks for reading all this. I'm hoping that all the restrictions and  
conditions will make the solution more obvious, rather than less  
obvious, to those who know more than I...

--Eric


More information about the Tutor mailing list