[Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

Erik python at lucidity.plus.com
Tue Mar 7 15:46:45 EST 2017


On 06/03/17 03:08, David Mertz wrote:
> On Sun, Mar 5, 2017 at 6:45 PM, Steven D'Aprano <steve at pearwood.info
> <mailto:steve at pearwood.info>> wrote:
>
>     Here is a radical thought... why don't lists track their common type
>     themselves? There's only a few methods which can add items:
>
>
> I had exactly the same thought.  Lists would need to grow a new
> attribute, of course. I'm not sure how that would affect the object
> layout and word boundaries.  But there might be free space for another
> attribute slot.
>
> The real question is whether doing this is a win.  On each
> append/mutation operation we would need to do a comparison to the
> __type_hint__ (assuming Steven's spelling of the attribute).  That's not
> free.  Balancing that, however, when we actually *did* a sort, it would
> be O(1) to tell if it was homogeneous (and also the actual type if yes)
> rather than O(N).

I don't think anyone has mentioned this yet, but FWIW I think the 'type 
hint' may need to be tri-state: heterogeneous (NULL), homogeneous (the 
pointer to the type structure) and also "unknown" (a sentinel value - 
the address of a static char or something).

Otherwise, a delete operation on the list would need to scan the list to 
work out if it had changed from heterogeneous to homogeneous (unless it 
was acceptable that once heterogeneous, a list is always considered 
heterogeneous - i.e., delete always sets the hint to NULL).

Instead, delete would change a NULL hint to the sentinel (leaving a 
valid type hint as it is) and then prior to sorting - as the hint is 
being checked anyway - if it's the sentinel value, perform the pre-scan 
that the existing patch is doing to restore the knowledge of just what 
type of list it is.


I'd prefer the sort optimization to be based on what my list contains 
NOW, not on what it may have contained some time in the past, so I'm not 
a fan of the "once heterogeneous, always considered heterogeneous" 
behaviour if it's cheap enough to avoid it.


E.



More information about the Python-ideas mailing list