[Python-ideas] Adding "Typed" collections/iterators to Python

Terry Reedy tjreedy at udel.edu
Tue Dec 20 02:14:49 CET 2011


On 12/19/2011 9:30 AM, Nathan Rice wrote:

> Couple things.
>
> 1. The "broadcasting" that people seemed to have latched on to is only
> part of what I put forward,

Perhaps because it is the most understandable.

> and I agree it is something that would
> have to be done *correctly* to be beneficial.  I would have no issues
> with providing a userspace lib to do this if type decorations were
> included in homogeneous collections/iterables,

The meaning of 'homogeneous' depends on the context -- the purpose and 
use of the collection. For some purposes -- str(o), len(c), o in c, 
c.index(o), and others -- all objects, collections, or sequences 
*are* 'homogeneous' as instances or subclasses of 'object'. On the other 
hand, even [-1, 0, 1] is heterogeneous with respect to both sqrt and 
log, with the dividing line different for each: sqrt rejects only -1, 
while log rejects both -1 and 0. So I do not consider 
'homogeneous' to be a property of collections as such.
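A small sketch of the sqrt/log point, using math.sqrt and math.log. The helper `accepts` is a hypothetical name introduced only for illustration. It shows the same list passing one operation's test and failing another's, so 'homogeneous' is relative to the operation applied.

```python
import math

def accepts(func, values):
    """Hypothetical helper: return the values for which func(v) does not raise ValueError."""
    ok = []
    for v in values:
        try:
            func(v)
        except ValueError:
            continue  # outside func's domain
        ok.append(v)
    return ok

data = [-1, 0, 1]

# For str() and membership, every element qualifies: the list is 'homogeneous'.
print([str(x) for x in data], 0 in data)

# For sqrt and log, the list splits, and at a different place for each.
print(accepts(math.sqrt, data))  # [0, 1]
print(accepts(math.log, data))   # [1]
```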

Python's current restricted-type mutable sequence factory is 
array.array. The types do not even have to be Python types, just machine 
storage types. The typecode is part of the object and exposed as an 
attribute. Such sequences cannot be 'degraded' because type-checking is 
done with all operations. It would not be difficult to make a TypedList 
class that did the same, either subclassing or wrapping list.
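For concreteness, a sketch of both halves of this paragraph. The array.array part uses only the real array API (the typecode attribute and the type check on every operation). `TypedList` is a hypothetical class, not anything in the stdlib; this version wraps a list through collections.abc.MutableSequence rather than subclassing, so every insertion path goes through one check.

```python
from array import array
from collections.abc import MutableSequence

# array.array: the typecode is part of the object and checked on every operation.
a = array('i', [1, 2, 3])
print(a.typecode)  # i
try:
    a.append(1.5)  # rejected: 'i' stores C ints only
except TypeError as exc:
    print("array rejected:", exc)


class TypedList(MutableSequence):
    """Hypothetical list wrapper that type-checks every insertion, like array.array."""

    def __init__(self, item_type, iterable=()):
        self.item_type = item_type  # exposed as an attribute, like array.typecode
        self._data = []
        for item in iterable:
            self.append(item)  # MutableSequence.append delegates to insert()

    def _check(self, item):
        if not isinstance(item, self.item_type):
            raise TypeError(f"expected {self.item_type.__name__}, "
                            f"got {type(item).__name__}")
        return item

    def __getitem__(self, index):
        return self._data[index]

    def __setitem__(self, index, value):
        if isinstance(index, slice):
            value = [self._check(v) for v in value]
        else:
            self._check(value)
        self._data[index] = value

    def __delitem__(self, index):
        del self._data[index]

    def __len__(self):
        return len(self._data)

    def insert(self, index, value):
        self._data.insert(index, self._check(value))

    def __repr__(self):
        return f"TypedList({self.item_type.__name__}, {self._data!r})"


t = TypedList(int, [1, 2])
t.append(3)
print(t)  # TypedList(int, [1, 2, 3])
try:
    t.append("x")
except TypeError as exc:
    print("TypedList rejected:", exc)
```

Wrapping is the simpler choice in this sketch because a list subclass would have to override every mutating method (append, extend, insert, __setitem__, __iadd__, and so on) to keep the check airtight, whereas the MutableSequence mixins route them all through insert and __setitem__.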

What you have noticed is that iter(array(tc, init)) does not get the 
typecode information, so potentially useful information is lost. Your 
first concrete proposal might be that the information be kept and that 
arrayiterators get a type attribute corresponding to the Python type 
that the produced values are converted to. Also, array could expose the 
mapping from typecodes to Python types. These changes would allow 
experiments that could show the value of your basic idea.
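A sketch of what such an experiment might look like with today's Python. `TYPECODE_TO_TYPE` and `TypedIterator` are hypothetical names, not part of the array module, and the mapping covers only the integer and floating-point typecodes.

```python
from array import array

a = array('d', [1.0, 2.5])
print(hasattr(iter(a), 'typecode'))  # False: the iterator drops the typecode

# Hypothetical mapping from typecodes to the Python type of produced values
# (the array module does not currently expose such a table).
TYPECODE_TO_TYPE = {
    'b': int, 'B': int, 'h': int, 'H': int, 'i': int, 'I': int,
    'l': int, 'L': int, 'q': int, 'Q': int,
    'f': float, 'd': float,
}


class TypedIterator:
    """Hypothetical iterator wrapper that keeps the element type as .type."""

    def __init__(self, arr):
        self.type = TYPECODE_TO_TYPE[arr.typecode]
        self._it = iter(arr)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)


ti = TypedIterator(a)
print(ti.type)   # <class 'float'>
print(list(ti))  # [1.0, 2.5]
```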

> as long as the
> implementation of the decoration didn't suffer from some form of
> "string failure" (string subclasses are basically worthless as methods
> return strings, not an instance of the class).

This problem is generic to subclassing built-in classes. List would be a 
better example here since strings already are specialized sequences.
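A quick demonstration that this applies to list as much as to str: methods and operators inherited from the built-in class return instances of the base class, so the subclass (and any extra methods on it) is lost. `MyStr` and `MyList` are illustrative names.

```python
class MyStr(str):
    """A str subclass with one extra method."""
    def shout(self):
        return self.upper() + "!"


class MyList(list):
    """A list subclass with one extra method."""
    def total(self):
        return sum(self)


s = MyStr("abc")
print(type(s.upper()).__name__)  # str, not MyStr

lst = MyList([1, 2, 3])
print(type(lst + [4]).__name__)  # list, not MyList
print(type(lst[:2]).__name__)    # list, not MyList
```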

> 2. A "type decorator" on homogeneous collections and iterables has a
> lot of nice little benefits throughout the toolchain.

That is what you need to demonstrate, because it does not seem clear 
yet. What would you do with an arrayiterator that had a type attribute?

By the way, a 'decorator' in Python is a specific category of callable 
used in a specific way. Perhaps you mean 'type attribute'?
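To make the terminology concrete, a sketch contrasting the two. `logged`, `double`, and `Tagged` are illustrative names only. A decorator is a callable applied with @ syntax at definition time; a type attribute would simply be data stored on an object.

```python
import functools


def logged(func):
    """A decorator: takes a function and returns a replacement that counts calls."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@logged  # equivalent to: double = logged(double)
def double(x):
    return 2 * x


class Tagged:
    """Holds a collection plus a plain 'type attribute' (just data, not a decorator)."""
    def __init__(self, items, item_type):
        self.items = items
        self.type = item_type


print(double(3))                 # 6
print(double.calls)              # 1
print(Tagged([1, 2], int).type)  # <class 'int'>
```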

> 3. Being able to place methods on a "type decorator" is useful,

'Placing methods' on an attribute or even a callable does not mean much.
You can only concretely add methods to concrete classes, not abstract 
categories.

> it solves issues like "foo".join() which really wants to be a method on
> string collections.

No, it does not. 'String collection' is a category, not a class. Nor can 
it be a class without drastically revising Python. It is a category that 
cuts across all generic collection classes. So .join has to be a method 
of the joiner class.
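A short illustration of why the method lives on the joiner: one str.join accepts any iterable of strings, whether a list, a tuple, a generator, or a sorted set, so no collection class needs its own join method.

```python
words = ["typed", "collections"]

# One method on the separator string serves every kind of iterable of strings.
print("-".join(words))                     # typed-collections
print("-".join(("a", "b", "c")))           # a-b-c
print("-".join(w.upper() for w in words))  # TYPED-COLLECTIONS
print(", ".join(sorted({"y", "x"})))       # x, y
```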

> 4. I wanted to gauge people's feelings before I went through the steps
> involved in writing a PEP.  I believe that is the right thing to do,
> so I don't feel the "hand waving" comment is warranted.

To the extent one does not understand what you say, and to the extent 
that it seems disconnected from concrete reality, it is easy to see it 
as hand waving. That you perhaps did not understand why .join is a 
string method points in that direction.

>  I've already
> learned people view collections that provide child object methods in
> vector form as a very big change

Because we understand that non-method functions have virtues, and Python 
already has collection functions.

> even if it is backwards compatible; that is fine.

Backwards compatible duplication needs justification.
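For comparison with a hypothetical vector-form method such as `names.upper()` (which is not valid on a list), the existing function-based spellings already apply an element method across any iterable. The data is illustrative only.

```python
names = ["ada", "guido", "tim"]

# Non-method functions and comprehensions already 'broadcast' element methods,
# and they work with any iterable, not just one collection class.
upper_names = list(map(str.upper, names))
print(upper_names)                      # ['ADA', 'GUIDO', 'TIM']
print([n.capitalize() for n in names])  # ['Ada', 'Guido', 'Tim']
print(sum(len(n) for n in names))       # 11
print(sorted(names, key=len))           # ['ada', 'tim', 'guido']
```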
...
> I agree that changes to syntax and commonly used modules that impact
> how people interface with them should be carefully vetted.  Type
> decorations on homogeneous collections/iterators are effectively

I am still not sure what you are really proposing. You may have the germ 
of a useful idea, but I think it needs clarification and a demonstration.

> invisible in that perspective though;

Slowdowns are not invisible. Requiring a type check on every addition to 
every built-in collection might result in such.

> the main problem with them as I
> see it is that it involves touching a lot of code to implement, even
> if the actual implementation would be simple.

Changes that touch a lot of code are fairly rare and require major 
benefits. One was the switch to new-style classes started in 2.2 and 
ended in 3.0. Several people contributed patches. They must have thought 
that unifying types and classes into one system was worth it.

In 3.3, the two Unicode implementations (one per build) are effectively 
combined, along with a third representation, under a new C-level API. 
Adding and tweaking the new API (which continues today) and converting 
the entire C core and stdlib codebase to the new API has required 
something on the order of 50 patches over 3 months, so far. But it 
improves performance (overall) and removes the inherent bugs in 
representing chars that need 3 bytes with pairs of 2-byte chars and in 
having different Python builds respond differently to the same code. 
Note that the PEP (PEP 393) concretely lays out the new C structures 
and API and that there was a prototype implementation showing benefits 
before it was approved.

-- 
Terry Jan Reedy



