Interesting list() un-optimization

Thu Mar 7 02:31:26 EST 2013

Roy Smith wrote:

> I stumbled upon an interesting bit of trivia concerning lists and list
> comprehensions today.
> 
> We use mongoengine as a database model layer.  A mongoengine query
> returns an iterable object called a QuerySet.  The "obvious" way to
> create a list of the query results would be:
> 
>     my_objects = list(my_query_set)
> 
> and, indeed, that works.  But, then I found this code:
> 
>    my_objects = [obj for obj in my_query_set]
> 
> which seemed a bit silly.  I called over the guy who wrote it and asked
> him why he didn't just write it using list().  I was astounded when it
> turned out there's a good reason!
> 
> Apparently, list() has an "optimization" where it calls len() on its
> argument to try and discover the number of items it's going to put into
> the list.  Presumably, list() uses this information to pre-allocate the
> right amount of memory the first time, without any resizing.  If len()
> fails, it falls back to just iterating and resizing as needed.
> Normally, this would be a win.
> 
> The problem is, QuerySets have a __len__() method.  Calling it is a lot
> faster than iterating over the whole query set and counting the items,
> but it does result in an additional database query, which is a lot
> slower than the list resizing!  Writing the code as a list comprehension
> prevents list() from trying to optimize when it shouldn't!

Interesting discovery.  Yet isn't this as much an issue with the mongoengine 
library as with list()?  Queryset.count() can be called if the "length" of a 
resultset needs to be retrieved, so the __len__() methd seems redundant.  
And given that it's not unheard of to call list() on iterables, perhaps the 
library designers should either optimise the __len__() method, or document 
the performance implications of calling list on the queryset? 

Anyway, thanks for this thought-provoking post.

Cheers,

Kev