[Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

Koos Zevenhoven k7hoven at gmail.com
Fri Mar 10 07:12:29 EST 2017


On Mon, Mar 6, 2017 at 4:45 AM, Steven D'Aprano <steve at pearwood.info> wrote:
> On Sun, Mar 05, 2017 at 07:19:43AM +0000, Elliot Gorokhovsky wrote:
>
>> You may remember seeing some messages on here about optimizing list.sort()
>> by exploiting type-homogeneity: since comparing apples and oranges is
>> uncommon (though possible, i.e. float to int), it pays off to check if the
>> list is type-homogeneous
>
> I sometimes need to know if a list is homogenous, but unfortunately
> checking large lists for a common type in pure Python is quote slow.
>
> Here is a radical thought... why don't lists track their common type
> themselves? There's only a few methods which can add items:
>
> - append
> - extend
> - insert
> - __setitem__
>

I can also imagine other places where knowing those one or two bits of
information about homogeneity might potentially allow speedups:
converting lists or tuples to numpy arrays, min, max, sum etc.

If this extra one or two bits of information were tracked, and the
overhead of doing that was very small, then operations over the
collections could benefit regardless of the complexity of the
algorithm, so also O(n) operations.

Too bad that one common thing done with lists – iterating – does not
have obvious benefits from type homogeneity in CPython.

—Koos


> Suppose we gave lists a read-only attrribute, __type_hint__, which
> returns None for hetrogeneous lists and the type for homogeneous lists. Adding
> an item to the list does as follows:
>
> - if the list is empty, adding an item sets __type_hint__ to type(item);
> - if the list is not empty, adding an item tests whether type(item)
>   is identical to (not a subclass) of __type_hint__, and if not, sets
>   __type_hint__ to None;
> - removing an item doesn't change the __type_hint__ unless the list
>   becomes empty, in which case it is reset to None;
> - if the internal allocated space of the list shrinks, that triggers
>   a recalculation of the __type_hint__ if it is currently None.
>
> (There's no need to recalculate the hint if it is not None.)
>
> Optional: doing a list.sort() could also recalculate the hint.
>
>
> The effect will be:
>
> - if __type_hint__ is a type object, then you can be sure that
>   the list is homogeneous;
>
> - if the __type_hint__ is None, then it might still be
>   homogeneous, but it isn't safe to assume so.
>
> Not only could sorting take advantage of the type hint without needing
> to do a separate O(N) scan of the list, but so could other code. I know
> I would be interested in using this. I have a fair amount of code that
> has to track the type of any items seen in a list, and swap to a "type
> agnostic but slow" version if the list is not homogeneous. I could
> probably replace that with some variation of:
>
> if thelist.__type_hint__ is None:
>     process_slow(thelist)
> else:
>     process_fast(thelist)
>
>
> At the very least, I'd be interested in experimenting with this.
>
> Thoughts?
>
>
>
> --
> Steve
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/



-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +


More information about the Python-ideas mailing list