Adding "Typed" collections/iterators to Python
I believe it would be a good idea in instances where it is known that a collection of a single type is going to be returned, to return a subclass with type information and type specific methods "mixed in". You could provide member methods as collection methods that operate in a vectorized manner, returning a new collection or iterator with the results much like the mathematical functions in NumPy. This would also give people a reliable method to make functions operate on both scalar and vector values. I believe this could be implemented without needing subclasses for everything under the sun with a generic collection "type contract" mix-in. If a developer wanted to provide additional type specific collection/iterator methods they would of course need to subclass that. To avoid handcuffing people with types (which is definitely un-pythonic) and maintain backwards compatibility, the standard collection modification methods could be hooked so that if an object of an incorrect type is added, a warning is raised and the collection gracefully degrades by removing mixed-in type information and methods. Additionally, a method could be provided that lets the user "terminate the contract" causing the collection to degrade without a warning. I have several motivations for this: -- Performing a series of operations using comprehensions or map tends to be highly verbose in an uninformative way. Compare the current method with what would be possible using "typed" collections: L2 = [X(e) for e in L1] L3 = [Y(e) for e in L2] vs L2 = X(L1) # assuming X has been updated to work in both vector/scalar L3 = Y(L2) # context... L2 = [Z(Y(X(e))) for e in L1] vs L2 = Z(Y(X(L1))) L2 = [e.X().Y().Z() for e in L1] vs L2 = L1.X().Y().Z() # assuming vectorized versions of member methods #are folded into the collection via the mixin. -- Because collections are type agnostic, it is not possible to place methods on them that are type specific. This leads to a lot of cases where python forces you to read inside out or a the syntax gets very disjoint in general. A good example of this is: "\n".join(l.capitalize() for l in my_string.split("\n")) which could reduce to something far more readable, such as: my_string.split("\n").capitalize().join_items("\n") Besides the benefits to basic language usability (in my opinion) there are tool and development benefits: -- The additional type information would simplify static analysis and provide cues for optimization (I'm looking at pypy here; their list strategies play to this perfectly) -- The warning on "violating the contract" and without first terminating it would be a helpful tool in catching and debugging errors. I have some thoughts on syntax and specifics that I think would work well, however I wanted to solicit more feedback before I go too far down that path. Nathan
participants (9)
-
alex23
-
Devin Jeanpierre
-
Ethan Furman
-
Greg Ewing
-
Nathan Rice
-
Nick Coghlan
-
Serhiy Storchaka
-
Steven D'Aprano
-
Terry Reedy