[Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

Steven D'Aprano steve at pearwood.info
Sun Mar 5 21:45:55 EST 2017


On Sun, Mar 05, 2017 at 07:19:43AM +0000, Elliot Gorokhovsky wrote:

> You may remember seeing some messages on here about optimizing list.sort()
> by exploiting type-homogeneity: since comparing apples and oranges is
> uncommon (though possible, i.e. float to int), it pays off to check if the
> list is type-homogeneous 

I sometimes need to know if a list is homogenous, but unfortunately 
checking large lists for a common type in pure Python is quote slow.

Here is a radical thought... why don't lists track their common type 
themselves? There's only a few methods which can add items:

- append
- extend
- insert
- __setitem__


Suppose we gave lists a read-only attrribute, __type_hint__, which 
returns None for hetrogeneous lists and the type for homogeneous lists. Adding 
an item to the list does as follows:

- if the list is empty, adding an item sets __type_hint__ to type(item);
- if the list is not empty, adding an item tests whether type(item) 
  is identical to (not a subclass) of __type_hint__, and if not, sets
  __type_hint__ to None;
- removing an item doesn't change the __type_hint__ unless the list 
  becomes empty, in which case it is reset to None;
- if the internal allocated space of the list shrinks, that triggers
  a recalculation of the __type_hint__ if it is currently None.

(There's no need to recalculate the hint if it is not None.)

Optional: doing a list.sort() could also recalculate the hint.


The effect will be:

- if __type_hint__ is a type object, then you can be sure that 
  the list is homogeneous;

- if the __type_hint__ is None, then it might still be 
  homogeneous, but it isn't safe to assume so.

Not only could sorting take advantage of the type hint without needing 
to do a separate O(N) scan of the list, but so could other code. I know 
I would be interested in using this. I have a fair amount of code that 
has to track the type of any items seen in a list, and swap to a "type 
agnostic but slow" version if the list is not homogeneous. I could 
probably replace that with some variation of:

if thelist.__type_hint__ is None:
    process_slow(thelist)
else:
    process_fast(thelist)


At the very least, I'd be interested in experimenting with this.

Thoughts?



-- 
Steve


More information about the Python-ideas mailing list