Adding "Typed" collections/iterators to Python

I believe it would be a good idea, in instances where it is known that a collection of a single type is going to be returned, to return a subclass with type information and type-specific methods "mixed in". You could provide member methods as collection methods that operate in a vectorized manner, returning a new collection or iterator with the results, much like the mathematical functions in NumPy. This would also give people a reliable way to make functions operate on both scalar and vector values. I believe this could be implemented without needing subclasses for everything under the sun, using a generic collection "type contract" mix-in. A developer who wanted to provide additional type-specific collection/iterator methods would of course need to subclass that.

To avoid handcuffing people with types (which is definitely un-pythonic) and to maintain backwards compatibility, the standard collection modification methods could be hooked so that if an object of an incorrect type is added, a warning is raised and the collection gracefully degrades by removing the mixed-in type information and methods. Additionally, a method could be provided that lets the user "terminate the contract", causing the collection to degrade without a warning.

I have several motivations for this:

-- Performing a series of operations using comprehensions or map tends to be highly verbose in an uninformative way. Compare the current method with what would be possible using "typed" collections:

    L2 = [X(e) for e in L1]
    L3 = [Y(e) for e in L2]

vs

    L2 = X(L1)  # assuming X has been updated to work in both vector/scalar
    L3 = Y(L2)  # context...

and

    L2 = [Z(Y(X(e))) for e in L1]

vs

    L2 = Z(Y(X(L1)))

and

    L2 = [e.X().Y().Z() for e in L1]

vs

    L2 = L1.X().Y().Z()  # assuming vectorized versions of member methods
                         # are folded into the collection via the mixin

-- Because collections are type agnostic, it is not possible to place methods on them that are type specific. This leads to a lot of cases where python forces you to read inside out, or the syntax gets very disjoint in general. A good example of this is:

    "\n".join(l.capitalize() for l in my_string.split("\n"))

which could reduce to something far more readable, such as:

    my_string.split("\n").capitalize().join_items("\n")

Besides the benefits to basic language usability (in my opinion), there are tool and development benefits:

-- The additional type information would simplify static analysis and provide cues for optimization (I'm looking at PyPy here; their list strategies play to this perfectly).

-- The warning on violating the contract without first terminating it would be a helpful tool in catching and debugging errors.

I have some thoughts on syntax and specifics that I think would work well, however I wanted to solicit more feedback before I go too far down that path.

Nathan
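A minimal sketch of the degradation behaviour described above (all names here -- TypedList, element_type, terminate_contract -- are hypothetical illustrations, not an existing API):

    import warnings

    class TypedList(list):
        def __init__(self, element_type, iterable=()):
            self.element_type = element_type
            super().__init__(iterable)

        def append(self, item):
            if self.element_type is not None and not isinstance(item, self.element_type):
                warnings.warn("type contract violated; degrading to an untyped list")
                self.element_type = None  # drop the mixed-in type information
            super().append(item)

        def terminate_contract(self):
            self.element_type = None  # degrade deliberately, without a warning

    names = TypedList(str, ["ada", "bob"])
    names.append("carol")  # fine: the contract holds
    names.append(42)       # warns, then the list degrades gracefully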

On Mon, Dec 19, 2011 at 9:28 AM, Nathan Rice <nathan.alexander.rice@gmail.com> wrote:
This use case is why map() remains a builtin, even in Python 3:

    L2 = map(X, L1)
    L3 = map(Y, L2)

Short, but explicit (no under-the-hood guessing about whether or not something should be treated as a scalar or vector value - in the general case, this distinction isn't as clear as you might think, just look at strings).
    def XYZ(arg):
        """Look, I can document what this means!"""
        return Z(Y(X(arg)))

    L2 = map(XYZ, L1)
    def XYZ_methods(arg):
        """I can also document what *this* means"""
        return arg.X().Y().Z()

    L2 = map(XYZ_methods, L1)
Another bad example, since that's just a really verbose way of writing my_string.capitalize(). Short answer: what advantage does your proposal really offer over simply extracting the repetitive operation out to a use case specific function, and making effective use of the existing vectorisation utilities (i.e. map() and itertools)? Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sun, Dec 18, 2011 at 6:45 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Yes, but map(lambda x: getattr(x, "method")(), thing) is ugly, and map(lambda x: x.method_2(param), map(lambda x: x.method(param), thing)) is really ugly. On top of that, it is asking more of code analysis tools to verify that code, and IDEs aren't going to be able to tell you the methods on the ambiguous x in the lambda. Sure, if the only argument is self, you could call class.method, but I don't think that is the majority use case.
Yes, I love that feature of strings, it is a source of lots of bugs, but I digress. The reason this partially solves that problem is that instead of having to do a bunch of guesswork on an iterable to see if you should do the vectorized version of the function, you just check to see if it is an instance of a certain TypedCollectionContract. If so, vectorize. No "isinstance(foo, Iterable) and not isinstance(foo, basestring)" silliness here.
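That dispatch rule might look something like the following sketch (TypedCollectionContract is the hypothetical marker Nathan mentions, not an existing class):

    from abc import ABC

    class TypedCollectionContract(ABC):
        """Hypothetical marker for collections declaring a homogeneous type."""

    def broadcast_call(func, value):
        # Vectorize only when the collection explicitly declares the contract,
        # instead of guessing from "iterable but not a string".
        if isinstance(value, TypedCollectionContract):
            return [func(x) for x in value]
        return func(value)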
What about WXY, and XZ, and WYZ, and YZ, and...
The python interpreter says otherwise...
IDEs can provide context hints, interpreters can use the contract to change how they treat the collection to improve performance, lint-style code analysis will be easier, and it is across the board less verbose than the currently available options, while reading left to right with fewer non-letter characters, i.e. more understandable. Nathan

On Dec 19, 12:24 pm, Nathan Rice <nathan.alexander.r...@gmail.com> wrote:
Yes, but map(lambda x: getattr(x, "method")(), thing) is ugly
    from operator import methodcaller

    method = methodcaller('method')
    result = map(method, thing)
map(lambda x: x.method_2(param), map(lambda x: x.method(param), thing)) is really ugly.
    method = methodcaller('method', param)
    method2 = methodcaller('method_2', param)
    result = map(method2, map(method, thing))

If your code is ugly, stop writing ugly code. :)

Received via email, this isn't a private discussion:
If you consider importing functionality 'clunky', then we're already at an impasse over what we consider to be 'elegant'. I also don't believe that user ignorance of standard library features mandates the inclusion of _everything_ into the base language.
Just because you CAN do something already doesn't mean you can do it in an elegant way.
And just because YOU find it inelegant doesn't make it so. I find using operator & functools _far_ clearer in intent than using lambda, _and it works right now_, which was the point I was trying to make here. You wrote unreadable code and then tried to use that as an argument for your idea. Breaking down complex statements into clearer parts isn't a radical notion; cramming more and more functionality onto a single line isn't something to which Python needs to aspire.
If the only metric was "can we do this?" we would all still be using Fortran.
And if it's "I don't want to have to code to deal with this, let the language change to do it", you end up with PHP.

My apologies for the direct email. Gmail is somewhat retarded when it comes to the python lists, and I don't always remember to change the to:...
There are a lot of warts in the standard library (thus Py3). I'm all for importing things that deserve to be in their own namespace with other, similar things. I'm not for having to import something that is needlessly more verbose than solutions that are available without an import (see comprehensions). If you were just trying to map a single-argument function I'd say "yeah, go for it", but from my perspective, that code is square peg, meet round hole.
What makes a solution elegant? I'd say having access to a cross-cutting paradigm that is backwards compatible, reduces code while making it more readable, and provides opportunities to improve the toolchain is more elegant. At any rate, I couldn't care less about the lambda, that is a strawman.
Well, that would be true if you also added "I don't care if the language is internally consistent in any way" and a few others, but I'm not here to bash PHP.

On Mon, Dec 19, 2011 at 12:24 PM, Nathan Rice <nathan.alexander.rice@gmail.com> wrote:
Ah, my mistake, I was thinking of title() rather than capitalize(). Still:

    def capitalize_lines(s):
        return "\n".join(line.capitalize() for line in s.split("\n"))

There comes a time when the contortions people go through to avoid naming a frequently repeated operation just get silly. If you do something a lot, pull it out into a function and name it. The named function can even be a local closure if the usage is sufficiently specific to one operation - then it can still reference local variables without requiring a lot of additional parameters. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Mon, Dec 19, 2011 at 12:24 PM, Nathan Rice <nathan.alexander.rice@gmail.com> wrote:
If you plan to introduce a new ABC to drive this, then I have a simple proposal:

1. Write a module that implements your "broadcast API for collections"
2. Publish it on PyPI
3. When it has popular uptake to indicate widespread user demand, then come back to us

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

I certainly plan to make an implementation for example purposes. And if this was Ruby and not Python, that module would actually be universally useful, because they can rewire their internals better than we can. Unfortunately, in Python land it is very difficult to hook into list/iterator/etc creation on a global scale. As a result, a user would have to be very careful about calling into other people's code, and the type information would still get blown away pretty frequently. Kind of like if you subclass string: all the string methods still return strings, so the language is working against you there. Regardless, I'm sure everything that gets added to the language was first a hugely popular PyPI project; that is clearly the way the language evolves.
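The string-subclass problem mentioned here is easy to demonstrate:

    class MyStr(str):
        def shout(self):
            return self.upper() + "!"

    s = MyStr("hello")
    print(type(s))               # <class '__main__.MyStr'>
    print(type(s.capitalize()))  # <class 'str'> -- every str method returns a
                                 # plain str, so the subclass (and shout) is lost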

On Mon, Dec 19, 2011 at 2:47 PM, Nathan Rice <nathan.alexander.rice@gmail.com> wrote:
Not everything, no. The real criterion is to have solid use cases where a proposal clearly improves the language. Sometimes that's just obvious (e.g. supporting a popular new compression protocol), sometimes a PEP is enough to explain, other times real world experience on PyPI is the best option. You don't have *any* of those at this point (just some vague hand-waving), so publishing a PyPI package is the most obvious way to start acquiring more data (and to prove that there's even a version of the idea that can be taken beyond the hand-waving stage).

What you seem to be asking for is a general purpose typed container factory along the following lines:

    def typed_container(container_type, data_type):
        class TypedContainer(container_type):
            def __getattr__(self, attr):
                data_type_attr = getattr(data_type, attr)
                if callable(data_type_attr):
                    _result_type = type(self)
                    def _broadcast(*args, **kwds):
                        return _result_type(data_type_attr(x, *args, **kwds) for x in self)
                    return _broadcast
                return data_type_attr
        return TypedContainer

I think it will have a lot of problems in practice (note that NumPy doesn't try to solve the broadcasting problem in general, just for a single specific data type), but, if the concept has any merit at all, that's certainly something that can be demonstrated quite adequately on PyPI. To get a better idea of the level of evidence you're trying to reach if your suggestion is ever going to get anywhere, try taking a look at http://www.boredomandlaziness.org/2011/02/justifying-python-language-changes... and http://www.boredomandlaziness.org/2011/02/status-quo-wins-stalemate.html. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
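For illustration, the factory above can be exercised like this (TypedStrList is just a hypothetical binding of it):

    TypedStrList = typed_container(list, str)
    words = TypedStrList(["alpha", "beta"])
    print(words.upper())        # ['ALPHA', 'BETA'] -- str.upper broadcast over the elements
    print(type(words.upper()))  # the TypedContainer subclass, not a plain list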

Nick Coghlan wrote:
Seems to me this would be better described as a "broadcasting container" than a "typed container". The only use it makes of the element type is to speed things up a bit by extracting bound methods. So the typed-ness is not an essential feature of its functionality, just something required to support an optimisation. Extended to handle the various operator methods, this might be a useful thing to have around. In conjunction with array.array, it could provide a kind of "numpy lite" for when depending on full-blown numpy would seem like overkill. -- Greg
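Greg's "numpy lite" idea can be sketched on top of array.array; BroadcastArray is an illustrative name, and only scalar + and * are shown:

    from array import array

    class BroadcastArray(array):
        def __add__(self, other):
            # element-wise addition with a scalar
            return BroadcastArray(self.typecode, (x + other for x in self))

        def __mul__(self, other):
            # element-wise multiplication with a scalar (overrides repetition)
            return BroadcastArray(self.typecode, (x * other for x in self))

    a = BroadcastArray('d', [1.0, 2.0, 3.0])
    print(list((a + 1.5) * 2))  # [5.0, 7.0, 9.0]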

On Mon, Dec 19, 2011 at 4:48 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Agreed, but that "might" is the killer - hence why PyPI is the appropriate place for this idea. Perhaps when it's been refined for a couple of years, a case might be made for standard lib inclusion. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Couple of things:

1. The "broadcasting" that people seem to have latched on to is only part of what I put forward, and I agree it is something that would have to be done *correctly* to be beneficial. I would have no issues with providing a userspace lib to do this if type decorations were included in homogeneous collections/iterables, as long as the implementation of the decoration didn't suffer from some form of "string failure" (string subclasses are basically worthless, as methods return strings, not an instance of the subclass).

2. A "type decorator" on homogeneous collections and iterables has a lot of nice little benefits throughout the toolchain.

3. Being able to place methods on a "type decorator" is useful; it solves issues like "foo".join(), which really wants to be a method on string collections.

4. I wanted to gauge people's feelings before I went through the steps involved in writing a PEP. I believe that is the right thing to do, so I don't feel the "hand waving" comment is warranted. I've already learned that people view collections that provide child-object methods in vector form as a very big change, even if it is backwards compatible; that is fine, and I'm willing to shelve that if the consensus is that people aren't comfortable with it.
What you seem to be asking for is a general purpose typed container factory along the following lines:
Something along those lines. Again I feel people have latched on to one element of what I proposed here, to the detriment of the proposal as a whole.
This may be the case, thus my request for input. I agree that a thoughtful approach is prudent, however as I have stated, as it currently stands the language would not be very supportive of an add-on module that does this. With PyPy, I could probably get my hooks in deeply enough that I could make something really useful, however given CPython is the current flavor du jour, I doubt it would get much traction. Kind of hard to sell someone on syntactic sugar when they have to constantly wrap return values.
I agree that changes to syntax and commonly used modules that impact how people interface with them should be carefully vetted. Type decorations on homogeneous collections/iterators are effectively invisible from that perspective, though; the main problem with them, as I see it, is that they involve touching a lot of code to implement, even if the actual implementation would be simple. I appreciate your feedback, Nathan

On Tue, Dec 20, 2011 at 12:30 AM, Nathan Rice <nathan.alexander.rice@gmail.com> wrote:
I deliberately ignored everything else, because the broadcasting aspect is the only part that doesn't reek of "Python, why u no use static typing?" and hence the only part I find particularly interesting. If you reframe the typing ideas in terms of Abstract Base Classes and generation of typed proxies, the rest of it may become interesting.
The "Hand waving" comment is absolutely warranted, because without code, we don't really know what you mean.
Everything the numpy folks have done, they have done with the type system as it currently stands. It would be perfectly possible to create proxies for objects that implement the container ABCs that provide the type constraints that you describe, all without touching the language core or standard library. If you take full advantage of the power that descriptors and metaclasses offer, then you can do a *lot* with this concept as a PyPI module. As a specific suggestion, I'd advise exploring a "container.broadcast" descriptor on a mixin type that lets you do things like:

    L2 = L1.broadcast(X, *args, **kwds)
    L3 = L2.broadcast(Y, *args, **kwds)

    L2 = L1.broadcast(X).broadcast(Y).broadcast(Z)
    L2 = L1.broadcast_chain(X)(Y)(Z).apply()

    L2 = L1.broadcast.X().broadcast.Y().broadcast.Z()
    L2 = L1.broadcast_chain.X().Y().Z().apply()

    my_string.split("\n").broadcast.capitalize().join_items("\n")
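The first of those spellings is easy to sketch with a plain mixin (BroadcastMixin and BList are illustrative names, not a proposed API):

    class BroadcastMixin:
        def broadcast(self, func, *args, **kwds):
            # apply func element-wise, preserving the container type
            return type(self)(func(x, *args, **kwds) for x in self)

    class BList(BroadcastMixin, list):
        pass

    L1 = BList(["ab", "cd"])
    print(L1.broadcast(str.upper))              # ['AB', 'CD']
    print(L1.broadcast(str.replace, "a", "x"))  # ['xb', 'cd']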
No, they're not invisible at all - they're a massive conceptual addition, on a similar scale to Abstract Base Classes themselves. Syntax matters for readability purposes, but it's the *semantics* that is at issue here (and also in many other PEPs). We're talking years of exploration and debate here, not something that can be resolved in a few weeks or months. There's zero chance of anything like this making it into 3.3, so think 3.4 at the earliest, and more likely 3.5 (and that's only if coded explorations of the concept prove useful in practice - it's entirely possible that the real outcome will be "never"). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Nick: My main issue with anything you've said is that I felt your original statement about going off, creating a library, evangelizing it vigorously until it is popular, and coming back in several years as a prerequisite to even engaging in a thoughtful discussion on the subject was overly dismissive and phrased in a combative manner. I do appreciate the strong challenges you've presented, though.

Terry: My apologies for being unclear. I do not take enough care in using terminology in the standard way. In my case, I think code is a better communication mechanism, so that is where I will be focusing most of my efforts moving forward.

***Broadcasting/Elementwise operations***

I agree code speaks more clearly than words: I put a module called "elementwise" on PyPI (http://pypi.python.org/pypi/elementwise/0.111220) that implements my idea of what a nice broadcast proxy should do. The proxy itself could be leaner; I started out trying to make it in a very surgical manner, but:

1.) There are serious dragons in the way python handles lookup of operator overloading special methods.

2.) There are serious dragons in how python handles complex inheritance graphs that result in "object.__new__() takes no parameters", despite not having any builtin bases and having no base class overriding __new__ or __init__.

3.) Proxying EVERYTHING is pythonic in some sense.

I am aware that there are probably issues around having some combination of special methods defined that will confuse the interpreter. The syntax is really simple:

    your_iterable_elementwise = ElementwiseProxy(your_iterable)
    your_iterable_elementwise.method_1().method_2().method_3()
    (your_iterable_elementwise + 1) * 2 + 10

The __iter__ method is how you break the cycle, so when you are done working in an elementwise manner, just call {list|set|tuple|...}(your_iterable_elementwise). In-place operator modification should work as well. All operations on an ElementwiseProxy return another ElementwiseProxy with a parent attribute that you can use to backtrack through the operation history, if for some reason you were brought into the loop partway through. I also included an ElementwiseProxyMixin that gives you a Ruby-esque class.each property for iterables, which basically just returns ElementwiseProxy(self). I think the syntax is nicer.

***I can haz moar Metadata (was "typed" collections)***

Because "typed" is sort of a dirty word in the python community, and some people have had issues with other terms I have used, I am going to settle on addressing things in terms of the problem. The problem is metadata - languages with declared types have a lot of it, python has a lot less. Metadata is useful, and we should try to pack as much of it in as we can, because it lets us do things in intelligent ways on a case by case basis. I understand annotations are designed for this, but they are absolutely pointless without some sort of schema or strong conventions. I see from the archived discussions that people wanted to avoid a design-by-committee situation, but people working in the tool-chain aren't going to mess with annotations unless there is one metadata standard that is pretty much ubiquitous (at which point it should be included in the stdlib anyhow). I think it is a decent idea to have it incubate outside the stdlib, but I don't see any progress on that front at all. With that being said, I would like to reopen the discussion about a metadata annotation schema, so we can get an incubation implementation going and begin evangelism.
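The core of such a proxy fits in a few lines; this is a simplified sketch of the idea, not the code actually published in the "elementwise" package (which also proxies operators and records a parent chain):

    class SimpleElementwiseProxy:
        def __init__(self, iterable):
            self._items = list(iterable)

        def __getattr__(self, name):
            def broadcast(*args, **kwds):
                # call the named method on every element, reproxying the results
                return SimpleElementwiseProxy(
                    getattr(item, name)(*args, **kwds) for item in self._items)
            return broadcast

        def __iter__(self):
            # iterating "breaks the cycle" and yields plain values again
            return iter(self._items)

    words = SimpleElementwiseProxy(["hello", "world"])
    print(list(words.capitalize().replace("l", "L")))  # ['HeLLo', 'WorLd']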

On 12/20/2011 1:50 PM, Nathan Rice wrote:
I downloaded and took a brief look. I hope to get back to it later.

2.) there are serious dragons in
Best not to use object as the apex of multiple inheritance.
Because "typed" is sort of a dirty word in the python community,
Not exactly true, and unnecessarily combative. More true is that careless use of 'typed' has gotten tiresome. Python is strongly, dynamically typed. But people occasionally post -- again, the same day you posted to python-list -- that Python is weakly typed. I am tired of explaining that 'typed' is not synonymous with 'statically typed'.

Or consider your subject line. Python collections are typed both as to the collection object and the contents. Python has narrow-content typed sequences. So just saying you want 'typed collections' does not say anything; we already have them. We even have propagation of narrow-content typing for operations on bytes and strings. But we do not have anything similar for numbers or user classes, and that might be worthwhile. So your subject seems more like 'adding generic narrowly typed sequences to Python'. -- Terry Jan Reedy

On 12/20/2011 7:09 PM, Steven D'Aprano wrote:
I believe the recommendation I have seen is something like this:
    class C():
        def __init__(self, *arg, **kwds):
            pass
    >>> C(1)
    <__main__.C object at 0x00000000034C0F98>
whereas
Now, as others have posted, define A and B with super, inheriting from C instead of object, and D with super, inheriting from A and B, and all should go well instead of crashing. But I have not tested this myself. Same for an ordinary single-inheritance chain. -- Terry Jan Reedy
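Written out in full, the recipe Terry describes looks like this, and it does run without a TypeError:

    class C:
        def __init__(self, *args, **kwds):
            pass  # absorb leftover arguments instead of passing them to object

    class A(C):
        def __init__(self, *args, **kwds):
            super().__init__(*args, **kwds)

    class B(C):
        def __init__(self, *args, **kwds):
            super().__init__(*args, **kwds)

    class D(A, B):
        def __init__(self, *args, **kwds):
            super().__init__(*args, **kwds)

    D(1, x=2)  # the MRO is D, A, B, C, object; C swallows the arguments
               # before object.__init__ ever sees them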

I don't find this much less careless. How do you differentiate between the "strong typing" of Python and the "strong typing" of Agda? It isn't a binary quantity. Perhaps, instead, we should stop claiming things are "strong" or "weak". If I said that, relatively speaking, Python is weakly typed, people would get offended -- not because I made any technically incorrect statement (on the spectrum, Python is far closer to assembly than Agda), but because to call it "weak" is insulting. -- Devin

On 12/20/2011 7:51 PM, Devin Jeanpierre wrote:
If you are going to use a term idiosyncratically, then consider giving your definition along with it. See https://en.wikipedia.org/wiki/Strongly_typed for a common usage, by which Python is strongly typed. -- Terry Jan Reedy

Hm, I made a lie. That isn't the point I was trying to make. Or rather, I made two points at once originally, and the second one contradicted the first one :) (I stated that strong vs weak is dumb, and then I said Python was weakly typed. I meant to say "in the sense that Agda is strong", but I guess I screwed that one up.)

*sigh*. It's perfectly possible to say that Python is strongly typed, with a self-consistent definition of "strongly typed". It's also possible to say that it's weakly typed, with another self-consistent definition. There is no _standard_ definition. The definition that Agda uses is not the definition that Python uses. I think people tend to choose the one that is least likely to call their favorite language "weak". Devin

On Wed, Dec 21, 2011 at 10:51 AM, Devin Jeanpierre <jeanpierreda@gmail.com> wrote:
When you can mutate a str object into an int object (or vice-versa), then you can claim Python is weakly typed without being technically incorrect. Weak typing has a very specific meaning: objects can change their type without changing their identity (e.g. via pointer casting in C and C++). Python lets objects *lie* about their types to some degree (by altering __class__), but type(obj) will always reveal the true underlying type (indeed, "obj.__class__ != type(obj)" is one of the ways to detect when you've been given a proxy object, if the distinction matters for a particular use case).

(For CPython, extension module authors can actually use C code to get around the strong typing if they really try, but the authors of such code get no sympathy when it inevitably blows up in obscure and hard-to-debug ways. Old-style classes in 2.x can also be legitimately described as weakly typed, since *all* instances of such classes share a single underlying type, and __class__ is the true determinant of their behaviour.)

Weak vs strong typing and dynamic vs static typing are well-defined concepts - it's just all too common that folks who initially learn to program with a static language confuse the two spectra and think that "static typing" and "strong typing" are the same thing. They're not only not the same, they're actually completely orthogonal. CPython, for example, uses the weak static typing of C to implement Python's strong dynamic typing mechanisms. IronPython and Jython have strong typing at both levels, but retain the static vs dynamic split. I believe PyPy uses strong dynamic typing throughout (although RPython has a type inference mechanism and a few other tricks to support translation to machine code). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
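Nick's proxy-detection check is easy to see in action with a weakref proxy:

    import weakref

    class A:
        pass

    obj = A()
    p = weakref.proxy(obj)
    print(p.__class__)             # reports A -- the proxy lies about its type
    print(type(p))                 # weakref.ProxyType -- the true underlying type
    print(p.__class__ is type(p))  # False: the telltale sign of a proxy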

I realised I gave a quick definition of strong vs weak typing, but not dynamic vs static. Here are explanations for all four:

Strong typing: an object's type is immutable. To change the type, you must change the object's identity (i.e. create a new object).

vs

Weak typing: an object's type (and hence its behaviour) can be changed while leaving its identity untouched. Python allows weak typing at the __class__ level (to support proxy objects and similar metaprogramming tools), but the underlying object model of the language is strongly typed.

Static typing: types are assigned not only to objects, but also to labels that refer to objects. The type of the label and the type of the object must match. Accordingly, variables must be explicitly associated with a type via variable declarations.

vs

Dynamic typing: types are assigned only to objects, and labels themselves are untyped. Accordingly, variables can be defined implicitly just by assigning a value to them. (Some otherwise statically typed languages include explicit support for dynamically typed references.)

(An interesting hybrid variant for static vs dynamic is an approach where labels *are* typed, but they acquire their type from the first value assigned to them. Automatic type inferencing can make static typing significantly less painful to work with, while still picking up most type errors at compile time. C++11 has gone that way with its introduction of "auto" type declarations for initialised variables, where you explicitly tell the compiler "this variable is of the same type as the result of the initialiser".) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
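Both spectra can be illustrated in a couple of lines:

    x = 42
    x = "forty-two"  # dynamic typing: the label 'x' is untyped and can be rebound

    try:
        (42).__class__ = float  # strong typing: an int can never *become* a float
    except TypeError as err:
        print(err)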

Nick Coghlan wrote:
While I like the definitions Nick has given, I think he's a tad optimistic to claim that the various kinds of foo-typing are "well-defined". I think that weak and strong typing aren't dichotomies, but extremes on a continuum. Most languages include elements of both weak and strong typing, particularly coercion of ints to floats. Chris Smith's influential article "What To Know Before Debating Type Systems" goes further, suggesting that weak and strong typing are meaningless terms. I don't go that far, but you should read his article: http://cdsmith.wordpress.com/2011/01/09/an-old-article-i-wrote/ See also http://en.wikipedia.org/wiki/Type_system -- Steven

On Wed, Dec 21, 2011 at 9:16 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Yeah, I'd only ever encountered weak/strong under the meanings I gave, but the Wikipedia article Terry linked was eye-opening (and does indeed suggest that if you're going to use weak/strong as terms, it's necessary to define them in situ). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 12/19/2011 9:30 AM, Nathan Rice wrote:
Perhaps because it is the most understandable.
The meaning of 'homogeneous' depends on the context -- the purpose and use of the collection. For some purposes -- str(o), len(c), o in c, c.index(o), and others -- all objects, collections, or sequences *are* 'homogeneous', as instances or subclasses of 'object'. On the other hand, even [-1, 0, 1] is heterogeneous with respect to both sqrt and log, with the divide different for each. So I do not consider 'homogeneous' to be a property of collections as such.

Python's current restricted-type mutable sequence factory is array.array. The types do not even have to be Python types, just machine storage types. The typecode is part of the object and exposed as an attribute. Such sequences cannot be 'degraded' because type-checking is done with all operations. It would not be difficult to make a TypedList class that did the same, either subclassing or wrapping list.

What you have noticed is that iter(array(tc, init)) does not get the typecode information, so potentially useful information is lost. Your first concrete proposal might be that the information be kept, and that arrayiterators get a type attribute corresponding to the Python type that the produced values are converted to. Also, array could expose the mapping from typecodes to Python types. These changes would allow experiments that would show the value of your basic idea.
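Terry's observation about the lost typecode is directly checkable:

    from array import array

    a = array('d', [1.0, 2.0, 3.0])
    print(a.typecode)                    # 'd': the content type travels with the array
    print(hasattr(iter(a), 'typecode'))  # False: the arrayiterator drops it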
This problem is generic to subclassing built-in classes. List would be a better example here since strings already are specialized sequences.
2. A "type decorator" on homogeneous collections and iterables has a lot of nice little benefits throughout the toolchain.
That is what you need to demonstrate, because it does not seem clear yet. What would you do with an arrayiterator with a type attribute? By the way, a 'decorator' in Python is a specific category of callable used in a specific way. Perhaps you mean 'type attribute'?
3. Being able to place methods on a "type decorator" is useful,
'Placing methods' on an attribute or even a callable does not mean much. You can only concretely add methods to concrete classes, not abstract categories.
it solves issues like "foo".join() which really wants to be a method on string collections.
No it does not. 'String collection' is a category, not a class. Nor can it be a class without drastically revising Python. It is a category that cuts across all generic collection classes. So .join has to be a method of the joiner class.
To the extent one does not understand what you say, and to the extent that it seems disconnected from concrete reality, it is easy to see it as hand waving. That you perhaps did not understand why .join is a string method points in that direction.
Because we understand that non-method functions have virtues, and Python already has collection functions.
even if it is backwards compatible; that is fine.
Backwards compatible duplication needs justification. ...
I am still not sure what you are really proposing. You may have the germ of a useful idea, but I think it needs clarification and a demonstration.
invisible in that perspective though;
Slowdowns are not invisible. Requiring a type check on every addition to every built-in collection might result in such.
Changes that touch a lot of code are fairly rare and require major benefits. One was the switch to new-style classes, started in 2.2 and ended in 3.0. Several people contributed patches. They must have thought that unifying types and classes into one system was worth it.

In 3.3, the two unicode implementations (one per build) are effectively replaced by a third, with a new C-level API. Adding and tweaking the new API (which continues today) and converting the entire C core and stdlib codebase to the new API has required something on the order of 50 patches over 3 months, so far. But it improves performance (overall) and removes the inherent bugs in representing 3-byte chars with 2 2-byte chars, and in having different Python builds respond differently to the same code.

Note that the PEP concretely lays out the new C structures and API, and that there was a prototype implementation showing benefits before it was approved. -- Terry Jan Reedy

On Tue, Dec 20, 2011 at 9:10 AM, Terry Reedy <tjreedy@udel.edu> wrote:
Nope, foo.title() isn't exactly the same - Nathan's variant capitalises the first letter of each line, foo.title() would capitalise the first letter of each word. Not a common operation, but split + capitalise + join is likely the simplest way to spell it. </tangent> Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

I am, and vectorized functions/typed arrays from numpy are a source of inspiration for me in this proposal. The problem with having typed arrays/etc as an add-on is that you lose the benefits as soon as you step into code that isn't meant to interact with the specific library. This isn't such an issue with NumPy since it has a healthy ecosystem, but there are lots of other instances where having type-aware collections would be nice.

Nathan Rice wrote:
-1 to automatic downgrading. Imagine in step C that the collection is auto-downgraded and later in step Q a method is called that no longer exists -- Exception raised. (I find a Warning less than useful -- either you aren't going to use one of the special methods, in which case you didn't need the special container and the warning is just noise, or you do need the special methods and you'll get an exception further on when you try to use it and it's no longer there.)
+1. Either it was not needed to begin with, or the need has been satisfied and now we need to add in objects with a different type.

Do you have some examples of functions where you pass in a container object and get the same type of object back, but it's a different object? ~Ethan~

Personally, I like warnings. When my code passes tests and I see a warning, oftentimes it clues me in to an additional edge case or subtle bug. What are the other options? Throw an exception immediately, or don't downgrade at all, then throw an exception when someone tries to use a method that doesn't exist on a child? Throwing an exception immediately forces people to be explicit, which is good, but it isn't backwards compatible. Not downgrading at all seems like it could be a source of annoyingly subtle bugs, since it is possible that you could put something in the list that fulfills part of the contract, but not all of it, or fulfills the contract in a subtly incorrect way.
The restriction on being the same type is not needed. If a list is homogeneous, an iterator or set derived from the list would be homogeneous, and keeping that information intact would probably be a good thing. With that slight adjustment: list, tuple, set, filter, iter (+ most of itertools), sorted, reversed and heapq, off the top of my head. Looking at the list, you could give type decorations legs just by having iterables of decorated objects retain the decoration, and having the basic collection types inherit type decorations at construction.
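All of those currently discard any subclass information, which is easy to confirm with a plain subclass standing in for a "decorated" collection:

    class TaggedList(list):
        pass  # hypothetical stand-in for a type-decorated list

    tl = TaggedList([3, 1, 2])
    print(type(sorted(tl)))        # <class 'list'> -- sorted() returns a plain list
    print(type(tl[:2]))            # <class 'list'> -- slicing does too
    print(type(filter(None, tl)))  # <class 'filter'> -- a bare iterator, no decoration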

On Mon, Dec 19, 2011 at 9:28 AM, Nathan Rice <nathan.alexander.rice@gmail.com> wrote:
This use case is why map() remains a builtin, even in Python 3: L2 = map(X, L1) L3 = map(Y, L2) Short, but explicit (no under-the-hood guessing about whether or not something should be treated as a scalar or vector value - in the general case, this distinction isn't as clear as you might think, just look at strings).
def XYZ(arg): """Look, I can document what this means!""" return Z(Y(X(arg))) L2 = map(XYZ, L1)
def XYZ_methods(arg): """I can also document what *this* means""" return arg.X().Y().Z() L2 = map(XYZ_methods, L1)
Another bad example, since that's just a really verbose way of writing my_string.capitalize(). Short answer: what advantage does your proposal really offer over simply extracting the repetitive operation out to a use case specific function, and making effective use of the existing vectorisation utilities (i.e. map() and itertools)? Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sun, Dec 18, 2011 at 6:45 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Yes, but map(lambda x: getattr(x, "method")(), thing) is ugly, and map(lambda x: x.method_2(param), map(lambda x: x.method(param), thing)) is really ugly. On top of that, it is asking more of code analysis tools to verify that code, and IDEs aren't going to be able to tell you the methods on the ambiguous x in the lambda. Sure if the only argument is self, you could call class.method, but I don't think that is the majority use case.
Yes, I love that feature of strings, it is a source of lots of bugs, but I digress. The reason this partially solves that problem is that instead of having to do a bunch of guesswork on an iterable to see if you should do the vectorized version of the function, you just check to see if it is an instance of a certain TypedCollectionContract. If so, vectorize. No "isinstance(foo, Iterable) and not isinstance(foo, basestr)" silliness here.
What about WXY, and XZ, and WYZ, and YZ, and...
The python interpreter says otherwise...
IDEs can provide context hints, interpreters can use the contract to change how they treat the collection to improve performance, lint style code analysis will be easier and it is across the board less verbose than the currently available options, while reading left to right with fewer non letter characters I.E. more understandable. Nathan

On Dec 19, 12:24 pm, Nathan Rice <nathan.alexander.r...@gmail.com> wrote:
Yes, but map(lambda x: getattr(x, "method")(), thing) is ugly
from operator import methodcaller method = methodcaller('method') result = map(method, thing)
map(lambda x: x.method_2(param), map(lambda x: x.method(param), thing)) is really ugly.
method = methodcaller('method', param) method2 = methodcaller('method_2', param) result = map(method2, map(method, thing)) If your code is ugly, stop writing ugly code. :)

Received via email, this isn't a private discussion:
If you consider importing functionality 'clunky', then we're already at an impasse over what we consider to be 'elegant'. I also don't believe that user ignorance of standard library features mandates the inclusion of _everything_ into the base langauge.
Just because you CAN do something already doesn't mean you can do it in an elegant way.
And just because YOU find it inelegant doesn't make it so. I find using operator & functools _far_ clearer in intent than using lambda, _and it works right now_, which was the point I was trying to make here. You wrote unreadable code and then tried to use that as an argument for your idea. Breaking down complex statements into clearer parts isn't a radical notion; cramming more and more functionality onto a single line isn't something to which Python needs to aspire.
If the only metric was "can we do this?" we would all still be using Fortran.
And if it's "I don't want to have to code to deal with this, let the language change to do it", you end up with PHP.

My apologies for the direct email. Gmail is somewhat retarded when it comes to the python lists, and I don't always remember to change the to:...
There are a lot of warts in the standard library, (thus Py3). I'm all for importing things that deserve to be in their own namespace with other, similar things. I'm not for having to import something that is needlessly more verbose than solutions that are available without an import (see comprehensions). If you were just trying to map a single argument function I'd say "yeah, go for it" but from my perspective, that code is square peg, meet round hole.
What makes a solution elegant? I'd say having access to a cross cutting paradigm that is backwards compatible, reduces code while making it more readable and provides opportunities to improve the toolchain is more elegant. At any rate, I could care less about the lambda, that is a strawman.
Well, that would be true if you also added "I don't care if the language is internally consistent in any way" and a few others, but I'm not here to bash PHP.

On Mon, Dec 19, 2011 at 12:24 PM, Nathan Rice <nathan.alexander.rice@gmail.com> wrote:
Ah, my mistake, I was thinking of title() rather than capitalize(). Still: def capitalize_lines(s): return "\n".join(s.capitalize() for line in s.split("\n")) There comes a time when the contortions people go to to avoid naming a frequently repeated operation just get silly. If you do something a lot, pull it out into a function and name it. The named function can even be a local closure if the usage is sufficiently specific to one operation - then it can still reference local variables without requiring a lot of additional parameters. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Mon, Dec 19, 2011 at 12:24 PM, Nathan Rice <nathan.alexander.rice@gmail.com> wrote:
If you plan to introduce a new ABC to drive this, then I have a simple proposal: 1. Write a module that implements your "broadcast API for collections" 2. Publish it on PyPI 3. When it has popular uptake to indicate widespread user demand, then come back to us Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

I certainly plan to make an implementation for example purposes. And if this was Ruby and not Python, that module would actually be universally useful, because they can rewire their internals better than we can. Unfortunately, in Python land it is very difficult to hook into list/iterator/etc creation on a global scale. As a result, a user would have to be very careful about calling into other people's code, and the type information would still get blown away pretty frequently. Kind of like if you subclass string; all the string methods still return strings, so the language is working against you there. Regardless, I'm sure everything that gets added to the language was first a hugely popular PyPI project, that is clearly the way the language evolves.

On Mon, Dec 19, 2011 at 2:47 PM, Nathan Rice <nathan.alexander.rice@gmail.com> wrote:
Not everything, no. The real criteria is to have solid use cases where a proposal clearly improves the language. Sometimes that's just obvious (e.g. supporting a popular new compression protocol), sometimes a PEP is enough to explain, other times real world experience on PyPI is the best option. You don't have *any* of those at this point (just some vague hand-waving), so publishing a PyPI package is the most obvious way to start acquiring more data (and to prove that there's even a version of the idea that can be taken beyond the hand-waving stage). What you seem to be asking for is a general purpose typed container factory along the following lines: def typed_container(container_type, data_type): class TypedContainer(container_type): def __getattr__(self, attr): data_type_attr = getattribute(data_type, attr) if callable(data_type_attr): _result_type = type(self) def _broadcast(*args, **kwds): _result_type(data_type_attr(x, *args, **kwds) for x in self) return _broadcast return data_type_attr return TypedContainer I think it will have a lot of problems in practice (note that NumPy doesn't try to solve the broadcasting problem in general, just for a single specific data type), but, if the concept has any merit at all, that's certainly something that can be demonstrated quite adequately on PyPI. To get a better idea of the level of evidence you're trying to reach if your suggestion is ever going to get anywhere, try taking a look at http://www.boredomandlaziness.org/2011/02/justifying-python-language-changes... and http://www.boredomandlaziness.org/2011/02/status-quo-wins-stalemate.html. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Nick Coghlan wrote:
Seems to me this would be better described as a "broadcasting container" than a "typed container". The only use it makes of the element type is to speed things up a bit by extracting bound methods. So the typed-ness is not an essential feature of its functionality, just something required to support an optimisation. Extended to handle the various operator methods, this might be a useful thing to have around. In conjunction with array.array, it could provide a kind of "numpy lite" for when depending on full-blown numpy would seem like overkill. -- Greg

On Mon, Dec 19, 2011 at 4:48 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Agreed, but that "might" is the killer - hence why PyPI is the appropriate place for this idea. Perhaps when it's been refined for a couple of years, a case might be made for standard lib inclusion. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Couple things. 1. The "broadcasting" that people seemed to have latched on to is only part of what I put forward, and I agree it is something that would have to be done *correctly* to be beneficial. I would have no issues with providing a userspace lib to do this if type decorations were included in homogeneous collections/iterables, as long as the implementation of the decoration didn't suffer from some form of "string failure" (string subclasses are basically worthless as methods return strings, not an instance of the class). 2. A "type decorator" on homogeneous collections and iterables has a lot of nice little benefits throughout the toolchain. 3. Being able to place methods on a "type decorator" is useful, it solves issues like "foo".join() which really wants to be a method on string collections. 4. I wanted to gauge people's feelings before I went through the steps involved in writing a PEP. I believe that is the right thing to do, so I don't feel the "hand waving" comment is warranted. I've already learned people view collections that provide child object methods in vector form as a very big change even if it is backwards compatible; that is fine, I'm willing to shelve that if consensus is that people aren't comfortable with it.
What you seem to be asking for is a general purpose typed container factory along the following lines:
Something along those lines. Again I feel people have latched on to one element of what I proposed here, to the detriment of the proposal as a whole.
This may be the case, thus my request for input. I agree that a thoughtful approach is prudent, however as I have stated, as it currently stands the language would not be very supportive of an add-on module that does this. With PyPy, I could probably get my hooks in deeply enough that I could make something really useful, however given CPython is the current flavor du jour, I doubt it would get much traction. Kind of hard to sell someone on syntactic sugar when they have to constantly wrap return values.
I agree that changes to syntax and commonly used modules that impact how people interface with them should be carefully vetted. Type decorations on homogeneous collections/iterators are effectively invisible in that perspective though; the main problem with them as I see it is that it involves touching a lot of code to implement, even if the actual implementation would be simple. I appreciate your feedback, Nathan

On Tue, Dec 20, 2011 at 12:30 AM, Nathan Rice <nathan.alexander.rice@gmail.com> wrote:
I deliberately ignored everything else, because the broadcasting aspect is the only part that doesn't reek of "Python, why u no use static typing?" and hence the only part I find particularly interesting. If you reframe the typing ideas in terms of Abstract Base Classes and generation of typed proxies, the rest of it may become interesting.
The "Hand waving" comment is absolutely warranted, because without code, we don't really know what you mean.
Everything the numpy folks have done, they have done with the type system as it currently stands. It would be perfectly possible to create proxies for objects that implement the container ABCs that provide the type constraints that you describe, all without touching the language core or standard library. If you take full advantage of the power that descriptors and metaclasses offer, then you can do a *lot* with this concept as a PyPI module. As a specific suggestion, I'd advise exploring a "container.broadcast" descriptor on a mixin type that let you do things like: L2 = L1.broadcast(X, *args, **kwds) L3 = L2.broadcast(Y, *args, **kwds) L2 = L1.broadcast(X).broadcast(Y).broadcast(Z) L2 = L1.broadcast_chain(X)(Y)(Z).apply() L2 = L1.broadcast.X().broadcast.Y().broadcast.Z() L2 = L1.broadcast_chain.X().Y().Z().apply() my_string.split("\n").broadcast.capitalize().join_items("\n")
No, they're not invisible at all - they're a massive conceptual addition, on a similar scale to Abstract Base Classes themselves. Syntax matters for readability purposes, but it's the *semantics* that is at issue here (and also in many other PEPs). We're talking years of exploration and debate here, not something that can be resolved in a few weeks or months. There's zero chance of anything like this making it into 3.3., so think 3.4 at the earliest, and more likely 3.5 (and that's only if coded explorations of the concept prove useful in practice - it's entirely possible that the real outcome will be "never"). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Nick: My main issue with anything you've said is that I felt your original statement about going off, creating a library, evangelizing it vigorously until it is popular and coming back in several years as a prerequisite to even engaging in a thoughtful discussion on the subject was overly dismissive and phrased in a combative manner. I do appreciate the strong challenges you've presented though. Terry: My apologies for being unclear. I do not take enough care in using terminology in the standard way. In my case, I think code is a better communication mechanism, so that is where I will be focusing most of my efforts moving forward. ***Broadcasting/Elementwise operations*** I agree code speaks more clearly than words: I put a module called "elementwise" on pypi (http://pypi.python.org/pypi/elementwise/0.111220) that implements my idea of what a nice broadcast proxy should do. The proxy itself could be leaner, I started out trying to make it in a very surgical manner, but 1.) There are serious dragons in the way python handles lookup of operator overloading special methods 2.) there are serious dragons in how python handles complex inheritance graphs that result in "object.__new__() takes no parameters", despite not having any builtin bases and having no base class overriding __new__ or __init__ 3.) Proxying EVERYTHING is pythonic in some sense. I am aware that there are probably issues around having some combination of special methods defined that will confuse the interpreter. The syntax is really simple: your_iterable_elementwise = ElementwiseProxy(your_iterable) your_iterable_elementwise.method_1().method_2().method_3() (your_iterable_elementwise + 1) * 2 + 10 The __iter__ method is how you break the cycle, so when you are done working in an elementwise manner, just call {list|set|tuple|...}(your_iterable_elementwise). In place operator modification should work as well. All operations on an ElementwiseProxy return another ElementwiseProxy with a parent attribute that you can use to backtrack in the operation history, if for some reason you were brought into the loop partway through. I also included an ElementwiseProxyMixin that gives you a ruby-esque class.each property for iterables, which is basically just returns a "ElementwiseProxy(self)". I think the syntax is nicer. ***I can haz moar Metadata (was "typed" collections)*** Because "typed" is sort of a dirty word in the python community, and some people have had issues with other terms I have used, I am going to settle on addressing things in term of the problem. The problem is metadata - languages with declared types have a lot of it, python has a lot less. Metadata is useful, and we should try to pack as much if it in as we can, because it lets us do things in a intelligent ways on a case by case basis. I understand annotations are designed for this, but they are absolutely pointless without some sort of schema or strong conventions. I see from the archived discussions that people wanted to avoid a design by committee situation, but people working in the tool-chain aren't going to mess with annotations unless there is one metadata standard that is pretty much ubiquitous (at which point it should be included in the stdlib anyhow). I think it is a decent idea to have it incubate outside the stdlib, but I don't see any progress on that front at all. With that being said, I would like to reopen the discussion about an metadata annotation schema, so we can get an incubation implementation going and begin evangelism.

On 12/20/2011 1:50 PM, Nathan Rice wrote:
I downloaded and took a brief look. I hope to get back to it later. 2.) there are serious dragons in
Best not to use object as the apex of multiple inheritance.
Because "typed" is sort of a dirty word in the python community,
Not exactly true, and unnecessarily combative. More true is that careless use of 'typed' has gotten tiresome. Python is strongly dynamically typed. But people occasionally post -- again the same day you posted to python list -- that Python is weakly typed. I am tired of explaining that 'typed' is not synonymous with 'statically typed'. Or consider your subject line. Python collections are typed both as to collection object and the contents. Python has narrow-content typed sequences. So just saying you want 'typed collections' does not say anything. We already have them. We even have propagation of narrow-content typing for operations on bytes and strings. But we do not have anything similar for numbers or user classes. And that might be worthwhile. So your subject seems more like 'adding generic narrowly typed sequences to Python'. -- Terry Jan Reedy

On 12/20/2011 7:09 PM, Steven D'Aprano wrote:
I believe the recommendation I have seen is something like this
class C(): def __init__(self, *arg, **kwds): pass
C(1) <__main__.C object at 0x00000000034C0F98>
whereas
Now, so others have posted, define A and B with super, inheriting from C instead of object, and D with super inheriting from A and B, and all should go well instead of crashing. But I have not tested this myself. Same for ordinary single inheritance chain. -- Terry Jan Reedy

I don't find this much less careless. How do you differentiate between the "strong typing" of Python and the "strong typing" of Agda? It isn't a binary quantity. Perhaps, instead, we should stop claiming things are "strong" or "weak". If I said that, relatively speaking, Python is weakly typed, people would get offended -- not because I made any technically incorrect statement (on the spectrum, Python is far closer to assembly than Agda), but because to call it "weak" is insulting. -- Devin On Tue, Dec 20, 2011 at 6:01 PM, Terry Reedy <tjreedy@udel.edu> wrote:

On 12/20/2011 7:51 PM, Devin Jeanpierre wrote:
If you are going to use term idiosyncratically, then consider giving you definition along with it. See https://en.wikipedia.org/wiki/Strongly_typed for a common usage, by which Python is strongly typed. -- Terry Jan Reedy

Hm, I made a lie. That isn't the point I was trying to make. Or rather, I made two points at once originally, and the second one contradicted the first one :) (I stated that strong vs weak is dumb, and then I said Python was weakly typed. I meant to say "in the sense that Agda is strong", but I guess I screwed that one up) *sigh*. It's perfectly possible to say that Python is strongly typed, with a self-consistent definition of "strongly typed". It's also possible to say that it's weakly typed, with another self-consistent definition. There is no _standard_ definition. The definition that Agda uses is not the definition that Python uses. I think people tend to choose the one that is least likely to call their favorite language "weak". Devin On Tue, Dec 20, 2011 at 8:41 PM, Devin Jeanpierre <jeanpierreda@gmail.com> wrote:

On Wed, Dec 21, 2011 at 10:51 AM, Devin Jeanpierre <jeanpierreda@gmail.com> wrote:
When you can mutate a str object into an int object (or vice-versa), then you can claim Python is weakly typed without being technically incorrect. Weak typing has a very specific meaning: objects can change their type without changing their identity (e.g. via pointer casting in C and C++). Python lets objects *lie* about their types to some degree (by altering __class__), but type(obj) will always reveal the true underling type (indeed, "obj.__class__ != type(obj)" is one of the ways to detect when you've been given a proxy object, if the distinction matters for a particular use case). (For CPython, extension module authors can actually use C code to get around the strong typing if they really try, but the authors of such code get no sympathy when it inevitably blows up in obscure and hard to debug ways. Old-style classes in 2.x can also be legitimately described as weakly typed, since *all* instances of such classes share a single underlying type, and __class__ is the true determinant of their behaviour) Weak vs strong typing and dynamic vs static typing are well-defined concepts - it's just all too common that folks that initially learn to program with a static language confuse the two spectra and think that "static typing" and "strong typing" are the same thing. They're not only not the same, they're actually completely orthogonal. CPython, for example, uses the weak static typing of C to implement Python's strong dynamic typing mechanisms. IronPython and Jython have strong typing at both levels, but retain the static vs dynamic split. I believe PyPy uses strong dynamic typing throughout (although RPython has a type inference mechanism and a few other tricks to support translation to machine code) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

I realised I gave a quick definition of strong vs weak typing, but not dynamic vs static. Here are explanations for all four: Strong typing: an object's type is immutable. To change the type, you must change the object's identity (i.e. create a new object). vs Weak typing: an object's type (and hence its behaviour) can be changed while leaving its identity untouched. Python allows weak typing at the __class__ level (to support proxy objects and similar metaprogramming tools), but the underlying object model of the language is strongly typed. Static typing: types are assigned not only to objects, but also to labels that refer to objects. The type of the label and the type of the object must match. Accordingly, variables must be explicitly associated with a type via variable declarations. vs Dynamic typing: types are assigned only to objects, and labels themselves are untyped. Accordingly, variables can be defined implicitly just by assigning a value to them. (Some otherwise statically typed languages include explicit support for dynamically typed references) (An interesting hybrid variant for static vs dynamic is an approach where labels *are* typed, but they acquire their type from the first value assigned to them. Automatic type inferencing can make static typing significantly less painful to work with, while still picking up most type errors at compile time. C++11 has gone that way with its introduction of "auto" type declarations for initialised variables, where you explicitly tell the compiler "this variable is of the same type as the result of the initialiser"). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
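A minimal sketch of those definitions in Python terms (the classes here are made up purely for illustration):

# Dynamic typing: the label "x" is untyped, so rebinding across types is fine
x = 1
x = "one"

# Strong typing: changing an object's type means creating a new object
n = 1
old_id = id(n)
n += 1                   # rebinds the label to a *new* int object
assert id(n) != old_id

# The __class__-level escape hatch mentioned above
class A: pass
class B: pass
a = A()
a.__class__ = B          # allowed for ordinary classes with compatible layouts
assert type(a) is B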

Nick Coghlan wrote:
While I like the definitions Nick has given, I think he's a tad optimistic to claim that the various foo-typings are "well-defined". I think that weak and strong typing aren't dichotomies, but extremes in a continuum. Most languages include elements of both weak and strong typing, particularly implicit coercion of ints to floats. Chris Smith's influential article "What To Know Before Debating Type Systems" goes further, suggesting that weak and strong typing are meaningless terms. I don't go that far, but you should read his article: http://cdsmith.wordpress.com/2011/01/09/an-old-article-i-wrote/ See also http://en.wikipedia.org/wiki/Type_system -- Steven

On Wed, Dec 21, 2011 at 9:16 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Yeah, I'd only ever encountered weak/strong under the meanings I gave, but the Wikipedia article Terry linked was eye-opening (and does indeed suggest that if you're going to use weak/strong as terms, it's necessary to define them in situ). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 12/19/2011 9:30 AM, Nathan Rice wrote:
Perhaps because it is the most understandable.
The meaning of 'homogeneous' depends on the context -- the purpose and use of the collection. For some purposes -- str(o), len(c), o in c, c.index(o), and others -- all objects, collections, or sequences *are* 'homogeneous' as instances or subclasses of 'object'. On the other hand, even [-1, 0, 1] is heterogeneous with respect to both sqrt and log, with the dividing line different for each. So I do not consider 'homogeneous' to be a property of collections as such. Python's current restricted-type mutable sequence factory is array.array. The types do not even have to be Python types, just machine storage types. The typecode is part of the object and exposed as an attribute. Such sequences cannot be 'degraded' because type-checking is done on every operation. It would not be difficult to make a TypedList class that did the same, either subclassing or wrapping list. What you have noticed is that iter(array(tc, init)) does not get the typecode information, so potentially useful information is lost. Your first concrete proposal might be that the information be kept and that arrayiterators get a type attribute corresponding to the Python type that the produced values are converted to. Also, array could expose the mapping from typecodes to Python types. These changes would allow experiments that would show the value of your basic idea.
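The array.array behaviour is easy to poke at interactively; note how the typecode survives on the array but not on its iterator, which is exactly the loss described above:

from array import array

a = array('d', [1.0, 2.0, 3.0])
print(a.typecode)       # 'd' - the storage type travels with the object
try:
    a.append("spam")    # the wrong type is rejected on every operation
except TypeError as e:
    print(e)            # must be real number, not str

it = iter(a)
print(hasattr(it, 'typecode'))  # False - the type information is lost here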
This problem is generic to subclassing built-in classes. List would be a better example here since strings already are specialized sequences.
2. A "type decorator" on homogeneous collections and iterables has a lot of nice little benefits throughout the toolchain.
That is what you need to demonstrate, because it does not seem clear yet. What would you do with an arrayiterator that had a type attribute? By the way, a 'decorator' in Python is a specific category of callable used in a specific way. Perhaps you mean 'type attribute'?
3. Being able to place methods on a "type decorator" is useful,
'Placing methods' on an attribute or even a callable does not mean much. You can only concretely add methods to concrete classes, not abstract categories.
it solves issues like "foo".join() which really wants to be a method on string collections.
No, it does not. 'String collection' is a category, not a class. Nor can it be a class without drastically revising Python. It is a category that cuts across all generic collection classes. So .join has to be a method of the joiner class.
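A short demonstration of the point: because join lives on the separator string, it works uniformly over every iterable of strings, with no need to re-implement it on each collection class:

print("\n".join(["spam", "eggs"]))    # list of str
print(", ".join(("spam", "eggs")))    # tuple of str
print("-".join(c for c in "abc"))     # any iterable of str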
To the extent one does not understand what you say, and to the extent that it seems disconnected from concrete reality, it is easy to see it as hand waving. That you perhaps did not understand why .join is a string method points in that direction.
Because we understand that non-method functions have virtues, and Python already has collection functions.
even if it is backwards compatible; that is fine.
Backwards compatible duplication needs justification. ...
I am still not sure what you are really proposing. You may have the germ of a useful idea, but I think it needs clarification and a demonstration.
invisible in that perspective though;
Slowdowns are not invisible. Requiring a type check on every addition to every built-in collection might well cause one.
Changes that touch a lot of code are fairly rare and require major benefits. One was the switch to new-style classes, started in 2.2 and finished in 3.0. Several people contributed patches. They must have thought that unifying types and classes into one system was worth it. In 3.3, the two unicode implementations (one per build) are effectively replaced by a third, with a new C-level API. Adding and tweaking the new API (which continues today) and converting the entire C core and stdlib codebase to the new API has required something on the order of 50 patches over 3 months, so far. But it improves performance (overall) and removes the inherent bugs in representing 3-byte chars with two 2-byte chars and in having different Python builds respond differently to the same code. Note that the PEP concretely lays out the new C structures and API, and that there was a prototype implementation showing benefits before it was approved. -- Terry Jan Reedy

On Tue, Dec 20, 2011 at 9:10 AM, Terry Reedy <tjreedy@udel.edu> wrote:
Nope, foo.title() isn't exactly the same - Nathan's variant capitalises the first letter of each line, foo.title() would capitalise the first letter of each word. Not a common operation, but split + capitalise + join is likely the simplest way to spell it. </tangent> Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
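The difference is easy to see in the REPL (the example string is mine):

s = "hello world\ngoodbye world"
print("\n".join(line.capitalize() for line in s.split("\n")))
# Hello world
# Goodbye world
print(s.title())
# Hello World
# Goodbye World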

I am, and vectorized functions/typed arrays from numpy are a source of inspiration for me in this proposal. The problem with having typed arrays/etc as an add-on is that you lose the benefits as soon as you step into code that isn't meant to interact with the specific library. This isn't such an issue with NumPy since it has a healthy ecosystem, but there are lots of other instances where having type-aware collections would be nice.
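For readers who have not used NumPy, this is the behaviour being pointed at - the dtype travels with the array and functions apply element-wise:

import numpy as np

a = np.array([1.0, 4.0, 9.0])
print(a.dtype)      # float64 - the element type is part of the object
print(np.sqrt(a))   # [1. 2. 3.] - vectorised application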

Nathan Rice wrote:
-1 to automatic downgrading. Imagine in step C that the collection is auto-downgraded and later in step Q a method is called that no longer exists -- Exception raised. (I find a Warning less than useful -- either you aren't going to use one of the special methods, in which case you didn't need the special container and the warning is just noise, or you do need the special methods and you'll get an exception further on when you try to use it and it's no longer there.)
+1 Either it was not needed to begin with, or the need has been satisfied and now we need to add in objects with a different type. Do you have some examples of functions where you pass in a container object and get the same type of object back, but it's a different object? ~Ethan~

Personally, I like warnings. When my code passes tests and I see a warning, oftentimes it clues me in to an additional edge case or subtle bug. What are the other options? Throw an exception immediately, or don't downgrade at all, then throw an exception when someone tries to use a method that doesn't exist on a child? Throwing an exception immediately forces people to be explicit, which is good, but it isn't backwards compatible. Not downgrading at all seems like it could be a source of annoyingly subtle bugs, since it is possible that you could put something in the list that fulfills part of the contract, but not all of it, or fulfills the contract in a subtly incorrect way.
The restriction on being the same type is not needed. If a list is homogeneous, an iterator or set derived from the list would be homogeneous, and keeping that information intact would probably be a good thing. With that slight adjustment: list, tuple, set, filter, iter (plus most of itertools), sorted, reversed and heapq, off the top of my head. Looking at the list, you could give type decorations legs just by having iterables of decorated objects retain the decoration, and having the basic collection types inherit type decorations at construction. (A rough sketch of what I mean follows.)
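Here is a purely hypothetical sketch of the degrade-with-warning behaviour under discussion; TypedList and its methods are inventions for illustration, not an existing or proposed API:

import warnings

class TypedList(list):
    # Hypothetical: tracks a single element type and degrades with a
    # warning when the contract is violated. Only the type tag is
    # dropped here; a real implementation would also remove any
    # mixed-in type-specific methods.
    def __init__(self, item_type, iterable=()):
        super().__init__(iterable)
        self.item_type = item_type

    def append(self, obj):
        if self.item_type is not None and not isinstance(obj, self.item_type):
            warnings.warn("type contract violated; degrading to a plain list")
            self.item_type = None
        super().append(obj)

    def terminate_contract(self):
        # Degrade silently, without a warning
        self.item_type = None

tl = TypedList(int, [1, 2, 3])
tl.append(4)        # fine - the contract holds
tl.append("five")   # warns, then the list behaves as an untyped list
# Under the adjustment above, sorted(tl), iter(tl), etc. would also
# carry tl.item_type along, rather than returning untyped results.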
participants (9):
- alex23
- Devin Jeanpierre
- Ethan Furman
- Greg Ewing
- Nathan Rice
- Nick Coghlan
- Serhiy Storchaka
- Steven D'Aprano
- Terry Reedy